In Focus:
Machine learning

In the brain’s natural intelligence and in DNA, evolution has produced systems for processing, communicating, and storing information that remain unrivaled by technology – and never cease to inspire scientific exploration and creative innovation. TUM-IAS Fellow Reinhard Heckel and his collaborators combine artificial intelligence and information theory to improve existing technologies and devise new ones, such as data storage in synthetic DNA.

Machine learning is a broad and expanding field of artificial intelligence with diverse, multifaceted branches. The Machine Learning Focus Group of the TUM Institute for Advanced Study centers on the research interests of Reinhard Heckel, who joined the Technical University of Munich in 2019 as an assistant professor in the Department of Electrical and Computer Engineering.

A Rudolf Mößbauer Tenure Track Fellow of the TUM-IAS, Heckel leads research that explores the fundamentals of so-called deep learning, enables systems to learn from few and noisy examples, and applies deep learning to solving inverse problems in science, engineering, and medicine. In addition, he is a pioneer in encoding and decoding information for storage in the biological molecule DNA.

On behalf of the TUM-IAS, science journalist Patrick Regan spoke with Reinhard Heckel via videoconference in January 2022. Their conversation has been edited for clarity and length.

Q: Artificial intelligence in general, and deep networks and deep learning in particular, are not just hot topics but have scored some achievements – in the study of protein folding, for instance – that may have real-world consequences. What are the areas you see as most interesting and promising? Are there any areas where you think proponents might risk claiming too much, or expecting too much?

There are some areas where machine learning works particularly well, and it’s well known that it works well. Those are computer vision and natural language processing. I would say most people who work on deep learning work on applications in one of those two areas. Computer vision is very broad. It includes vision for self-driving cars and apps that can detect how you do an exercise, and it ranges from things like that to surveillance applications and more. Similarly, natural language processing ranges from applications like summarizing news to translating languages, to assistants like chatbots, to more sophisticated things like helping other computer scientists write code faster. There are a lot of applications, and it works really well. I personally work in the area of using deep learning for signal and image reconstruction problems.

Q: Speaking of imaging, you’ve worked on both radar and MRI. How do you approach these different technologies, and how do these relate to each other?

Radar and MRI are both imaging technologies. They have in common that a device collects measurements of an object through a physical process. In MRI, the object may be a person’s brain, and the measurement device is a magnetic coil. In radar, a device sends a signal and receives a response, which is the measurement. In both examples, and really in any imaging system, there’s a physical process that describes the relationship between the image and the measurement.

Traditionally, algorithms that reconstruct an image from measurements were designed without machine learning. Experts – physicists or engineers – handcrafted sophisticated algorithms and designed models for images and other signals. Most current imaging technologies still work with those traditional algorithms.

But scientists and engineers have realized in the past five years that for all those imaging problems, deep networks work extremely well and yield significantly better image quality.
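As a rough illustration of the setup Heckel describes – a known physical forward model mapping an image to measurements, plus a handcrafted prior used for reconstruction – here is a minimal Python sketch. The signal, forward operator, and parameters are invented for illustration and are not the Focus Group’s actual code.

# Minimal sketch of an inverse problem: measurements y = A x + noise, where the
# physics (A) is known, and reconstruction uses a handcrafted smoothness prior.
import numpy as np

rng = np.random.default_rng(0)

n = 128                                        # length of the "image" (a 1D signal here)
x_true = np.cumsum(rng.standard_normal(n))     # a smooth, slowly varying ground truth

# Forward model: keep only half of the entries and add measurement noise.
A = np.eye(n)[rng.choice(n, size=n // 2, replace=False)]
y = A @ x_true + 0.1 * rng.standard_normal(n // 2)

# Handcrafted prior: penalize differences between neighboring entries, i.e.
# solve  min_x ||A x - y||^2 + lam * ||D x||^2  in closed form.
D = np.diff(np.eye(n), axis=0)
lam = 5.0
x_hat = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ y)

print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
# A deep-learning-based reconstruction would replace the handcrafted smoothness
# penalty with structure learned from example images, as discussed in the interview.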

Q: With radar, you have little or no control over the space in which you’re making the measurement, whereas with MRI you can engineer a lot of the constraints. So is learning applicable in different ways to those two situations?

Those technologies are actually quite similar in a way. For an MRI scanner, you can design the sequences that you are running, and thus decide how to take measurements. And similarly in radar you can decide on the signal that you’re sending. The reason deep networks perform well in both domains is that both radar and MRI images are difficult to model accurately using mathematics, and deep networks can learn very good models from data.

For example, it’s difficult to mathematically model how a brain looks. But if I show you a few images of a brain, then it’s easy for you to tell me what is or is not a brain. Or if I show you a brain and then I draw something, it is easy for you to see that my drawing is not an image of a brain. But it’s very difficult to formulate such intuition in computer code. And that’s why, in these kinds of problems, machine learning helps a lot, because it allows the algorithm to have such knowledge built in. If you’re imaging a brain, it’s very helpful for the algorithm to know what a brain looks like.

Q: Do you continue to be active in this area? What does the research focus on?

In our research, we’re interested in developing algorithms that are “learned” from data. We use learning to derive the algorithm itself from data. We are interested, of course, in good performance. So if we image a knee, we want an accurate image so the radiologist can make a good diagnosis based on it.

We’re also interested in understanding the inner workings of deep learning-based algorithms, and in providing guarantees about properties like robustness. Our concern is the following. Let’s say the algorithm is trained on data from measurements with a particular MRI scanner and a particular patient population in Munich. Then someplace else, the algorithm is applied to data from a slightly different scanner, another technician operates the scanner, and the patient population is different. I still need to be sure that my algorithm works as intended. And that is very challenging. Ensuring robustness is what keeps us busy.

Q: How do you improve the robustness?

By measuring robustness, identifying robustness issues, and fixing them. We measure robustness by performing experiments on data collected by us, or we work with scientists who collect data. Then we construct algorithms where we build everything that we understand explicitly into the model, the physics for example. Everything that we know, we want to build into our system. Everything that we don’t know, or that we are not sure of, we want to leave open, so that we can learn it on the basis of data. This algorithm design approach works very well.
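To illustrate the design principle Heckel describes – hard-coding the physics you know and leaving a slot for what is learned from data – here is a minimal Python sketch in which a simple smoothing filter stands in for the learned component. All names and parameters are hypothetical, not the group’s actual algorithm.

# Sketch of a hybrid reconstruction loop: explicit physics-based data-consistency
# steps alternate with a refinement step that, in practice, would be learned from data.
import numpy as np

def reconstruct(y, A, n_iters=50, step=0.5):
    """Alternate physics-based data-consistency steps with a 'learned' refinement."""
    x = A.T @ y                                  # crude initial guess from the measurements
    for _ in range(n_iters):
        # Physics we know for sure: pull x toward agreement with the measurements y.
        x = x - step * A.T @ (A @ x - y)
        # Component that would be learned from data: a moving-average stand-in here.
        x = np.convolve(x, np.ones(5) / 5, mode="same")
    return x

rng = np.random.default_rng(1)
n = 128
x_true = np.cumsum(rng.standard_normal(n))
A = np.eye(n)[rng.choice(n, size=n // 2, replace=False)]   # undersampling forward model
y = A @ x_true + 0.1 * rng.standard_normal(n // 2)
x_hat = reconstruct(y, A)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))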

Q: And then the process of improvement is writing new code, or asking the networks to do something?

Sometimes it’s not clear what can lead to an improvement. It could be that all you need to do to improve your system is to collect more data, or it could be that you need to improve your model, or it could be that with much more data you would actually use a different model. If you have much more data, you have more to learn from, and if you have more to learn from, you need to hard-code less into your model. And if you hard-code less into your model, you also have fewer of the biases that hard-coding introduces. So in a setup where you have a lot of data, for example, you use different models. We are always thinking about these models and performing experiments to validate whether they work well. We carefully test hypotheses to design a good algorithm. Sometimes we might have an idea about what should work, and that then becomes a hypothesis that we test. After testing such a hypothesis on data, we go back and forth until we have a well-performing, robust system.

Q: So there’s an iterative process of refining the models and algorithms.

Yes.

Q: Now, does this connect directly with the DNA storage, or is that a completely separate item?

It’s pretty separate. But I really hope that the idea from imaging that we can learn an algorithm is also applicable in DNA storage. We’ve started looking into that, because in DNA storage systems there are situations where we don’t have satisfactory algorithmic solutions. Specifically, we have algorithms that work but take a lot of computational resources. We want to build algorithms that are significantly faster, and one idea about how to design such algorithms is to learn them from data using deep networks. So there is a connection, but it started out completely separate. Now I am trying to use similar ideas.

Q: Let’s talk about DNA-based information storage. As I understand it: With DNA, you’re translating binary code into sequences of the bases A, T, G, and C? So you use a conventional computer to translate digital information into a sequence of these four nucleotides, and then you chemically synthesize strands of DNA with that exact sequence? Then the physical DNA is somehow applied to a surface that can later provide you with samples that can be sequenced, and the recovered sequence can be translated back into the original digital data? OK so far?

Yes, everything is correct.
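As a rough illustration of the translation step described in the question – two bits per base – here is a minimal Python sketch. The direct mapping is only illustrative: practical DNA storage codes also add redundancy and avoid problematic patterns such as long runs of a single base.

# Translate bytes to a string over {A, C, G, T} (two bits per base) and back.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def bytes_to_dna(data: bytes) -> str:
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def dna_to_bytes(seq: str) -> bytes:
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

seq = bytes_to_dna(b"Hi")
print(seq)                      # "CAGACGGC": eight bases encoding the two bytes of "Hi"
assert dna_to_bytes(seq) == b"Hi"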

Q: So this might, like quantum computing, be practical as long as there are applications that justify the extra effort. For DNA information storage, what would such applications be?

DNA data storage is already practical today for some specific applications. Of course, there are some storage applications, like active working memory in a computer, where DNA doesn’t make sense. But then there are storage applications where DNA makes a lot of sense. Those are applications where we want to store data at a very high information density. The information density that we already achieve on DNA today is orders of magnitude higher than what we achieve on tape or on a hard disk.

Q: So, information per unit of area or volume.

Exactly. A lot of information in a very small space. And the second reason we are interested in DNA storage is longevity. Data on DNA lasts very long.

Q: It’s counterintuitive, isn’t it? We and all other living things are, in the grand scheme of things, fleeting phenomena. But this molecule somehow outlasts us.

Yes, there are some good examples demonstrating DNA’s longevity. DNA has been found, preserved, in the bones of horses that lived 400,000 years ago. And researchers have been able to extract DNA from the bones of a mammoth more than a million years old. Sometimes insects are found in amber, and from those you can also extract the DNA. If DNA is preserved in a dry form, then it can last very long.

Q: I’m still grappling with how you package it so that a person in the future will recognize what it is and know what to do with it. That question seems comparable to labeling nuclear waste as hazardous in a way that future generations should understand, something people are working on right now. The potential longevity of data stored in DNA makes me think of the Voyager space probes, which are carrying information about us and our world into the universe beyond the solar system. As unlikely as it may be that the golden disks will ever be retrieved and understood, if that does happen, the information they contain might have outlived human civilization. Potentially, what you’re working on may not be so different.

That’s a good point. It’s also a separate problem in a way, an important one. If you want to store something for a long time, you still have to remember how you stored it and what you even meant by all that information. Even if we write sentences in English, we have to understand the language, and at a lower level there are additional ways in which we write and encode data, and we have to remember “in what technical language” we wrote that information. This meta information has to be preserved as well, potentially close to the storage medium. The Voyager golden records are a good example. They were included aboard the Voyager spacecraft as a sort of time capsule, and they carry information on how to read the information on the disks.

Q: Right, there are instructions on how to play it. So there are a bunch of important steps involved. Where are you placing most of your emphasis?

For the DNA project, I put all my energy into the encoding and the decoding. My chemist collaborator, Prof. Robert Grass, and his group handle all things chemistry. I essentially tell them, here is what we need to write, and then they write it on DNA, carry out an experiment, and return the read DNA to me. Everything in between encoding and decoding is really the job of a chemist.

Q: Your main collaborators are still in Zurich?

Yes, at ETH Zurich. That’s a collaboration that’s been going for ten years now.

Q: Do you foresee any application that would correspond to short-term working memory in a computer or communication system?

No, not really. There are different things that you want from memory. In some applications you want to store a lot of data. There are applications where the data should be very dense. There are applications where the data should last very long. And there are applications where the data should be accessible very quickly. Depending on what you’re interested in, one storage medium is better than another. DNA will never be extremely fast to access. It will never be as fast to read and write as the working memory in your computer. Some chips have memory access times of 10 nanoseconds or less. But then there are some things you can only do with DNA right now. For example, we had one project where we stored the music of a rock band in DNA. We put it in spray paint, and the band made a graffiti painting with it.

Q: So the painting could be sampled, the sample could be sequenced, and the resulting digital file could then be played?

Yes. If you scratch off just a tiny piece of the painting and take it to the chemist, Prof. Grass, he can extract the DNA, and then you give the data to me to decode it, and you get all the music back.

Q: So in the graffiti painting, there would be millions of copies of this sequence.

Millions.

Q: And each one is like a hologram of the original file?

Right. You have millions and millions of tiny copies everywhere, and if you take just a little bit, you can get all your data back. I don’t see how you could do that with anything else right now. And you can even embed it in plastic and things like that. So there are applications for which DNA is already interesting now.

Q: To what extent is this approach to encoding and decoding tied to classic information theory? At the time information theory was first being developed at Bell Labs, other researchers there were creating some of the essential components of digital computing and communications. Information theory and transistor-based electronics seem made for each other. This makes me wonder: Is there something else that might be “made for” DNA? DNA may suggest possibilities that binary operations, or even superposition scenarios, could not realize or would not even suggest. Are you tempted to rethink information theory?

That’s a good question. We used a lot of the coding theory that was developed at that time, and we build on the information theory developed then. But when a new storage medium like DNA arises, you need to develop new information theory, or update the existing theory, for that particular problem. What is particular about DNA – why it is necessary to develop new codes – is the following. Typically, when you store information or send a signal, you just have one relatively long string of data. On a hard disk, it’s just written on your disk. And you know where you wrote it. It’s just physically there. That’s what’s different about DNA. In our bodies, DNA is very long – a sequence of about three billion nucleotides. But we cannot write long pieces of DNA. It’s not possible now, and it doesn’t look as if it’s going to be possible any time soon. What we can do is write very short pieces of DNA. Those pieces are typically a hundred nucleotides long.

Q: Analogous to packets in data communications? Or so-called shotgun sequencing in DNA analysis?

It’s related to shotgun sequencing. We do shotgun sequencing because it’s difficult to read long pieces of DNA. If we could just read the entire DNA of a person, we wouldn’t need shotgun sequencing. We can only write and read short pieces of DNA, so we have to split long pieces into smaller ones to read them, and that’s why scientists do shotgun sequencing.

Q: So part of the encoding process is splitting it up?

Exactly. However, in conventional storage, even if you split things up into blocks, you typically know where these blocks are. But in DNA, you have the following situation: We can write short pieces of DNA, and we can write as many as we want. But then the writing is imperfect. We make a lot of errors when we write.
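As a rough illustration of the consequence Heckel describes – short pieces that come back unordered, so each piece must carry its own address – here is a minimal Python sketch. The piece length, index length, and encoding are invented for illustration, and the sketch omits the error correction a real system needs.

# Split a long base sequence into short addressed pieces and reassemble them,
# even though the pieces are returned in no particular order.
import random

def split_into_oligos(seq: str, payload_len: int = 88, index_len: int = 12) -> list[str]:
    """Split a long base sequence into short pieces, each prefixed with its index."""
    oligos = []
    for start in range(0, len(seq), payload_len):
        index = start // payload_len
        # Encode the index as base-4 digits over A/C/G/T, padded to a fixed length.
        digits = []
        for _ in range(index_len):
            digits.append("ACGT"[index % 4])
            index //= 4
        oligos.append("".join(reversed(digits)) + seq[start:start + payload_len])
    return oligos

def reassemble(oligos: list[str], index_len: int = 12) -> str:
    """Recover the original order from the address prefixes, whatever order we read them in."""
    def index_of(oligo: str) -> int:
        value = 0
        for base in oligo[:index_len]:
            value = value * 4 + "ACGT".index(base)
        return value
    return "".join(o[index_len:] for o in sorted(oligos, key=index_of))

data = "ACGT" * 100                       # stand-in for an already encoded payload
pieces = split_into_oligos(data)          # pieces of about 100 bases, as in the interview
random.shuffle(pieces)                    # reading returns the pieces unordered
assert reassemble(pieces) == data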

Q: So you had to invent error correction for DNA as well?

We didn’t have to invent it. We could build our codes on the work of Shannon, Hamming, Reed and Solomon, and others. But we had to develop information theory specific to the DNA data storage channel. A channel model describes how the information at its input gets distorted. For DNA, we have a very different channel model, and when you have a different channel model, then you have to study it, and you have to study the capacity again – the capacity that Shannon defined. Then you have to develop codes for it. Essentially, you use the language of information theory.
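To make the notion of a DNA data storage channel concrete, here is a toy Python simulation in which each written piece is read back a random number of times, and every read may contain substitutions, deletions, and insertions. The error rates and coverage are made up for illustration only; codes for this channel must tolerate exactly these kinds of distortions.

# Toy DNA storage channel: unordered, noisy, randomly repeated reads of short pieces.
import random

def noisy_read(oligo: str, p_sub: float = 0.01, p_del: float = 0.01, p_ins: float = 0.01) -> str:
    out = []
    for base in oligo:
        if random.random() < p_ins:
            out.append(random.choice("ACGT"))     # insertion before this base
        if random.random() < p_del:
            continue                              # deletion of this base
        if random.random() < p_sub:
            out.append(random.choice("ACGT"))     # substitution
        else:
            out.append(base)
    return "".join(out)

def dna_channel(oligos: list[str], mean_coverage: int = 5) -> list[str]:
    """Return an unordered pile of noisy reads; some pieces may not be read at all."""
    reads = []
    for oligo in oligos:
        for _ in range(random.randint(0, 2 * mean_coverage)):
            reads.append(noisy_read(oligo))
    random.shuffle(reads)
    return reads

reads = dna_channel(["ACGTACGTAC" * 10, "TTGGCCAATT" * 10])
print(len(reads), "noisy, unordered reads produced from 2 written pieces")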

Q: Even though you’re dealing with this chemically synthesized material.

Exactly. It’s really nice that you brought this up, because that’s really what I’m doing. That’s why the encoding and decoding is interesting. If we could write a long piece of DNA, then we could use established techniques. I would think: All these information theory problems are solved already, so I’m just going to use existing codes. There wouldn’t be much to do from a research perspective. That would also be fine, right? Then I would have worked on something else. But one other thing that’s specific to DNA is that you also have deletions and insertions. That makes it difficult too, because we don’t have good codes for deletions and insertions, and we actually don’t understand fundamental questions about the capacity of deletion and insertion channels.

Q: But I’m guessing you’re not looking for a biomimetic approach to coding.

No. We are not trying to mimic biology in any way. You mentioned that transistors and information theory are kind of made for each other. The one thing you can do very easily with DNA is copy it. That’s very cheap. But other than that, we are not trying to use biological approaches. I don’t think it makes that much sense. There are also some error correction mechanisms in biology, but they are very inefficient because of the computational limitations that biology has. So I don’t think there’s anything there to mimic, unfortunately.

Q: Last question on DNA. What do you imagine a DNA-based archive or library would look like at some point in the future? How do you picture people interacting with this information source, putting things in and taking things out?

I think eventually it’s going to look more or less like a USB stick. There is already a device only slightly larger than a USB stick that can read DNA. If it’s ever really going to become a thing like that, I imagine it will probably be a device that can both read and write the DNA, so you can put information in and take it out.

Q: Something using microfluidics integrated with electronics?

I think that’s the vision. But who knows what will happen? It could also be that DNA storage just won’t take off for long-term storage of information, but then it will likely have another application area. For example, for marking products or embedding information in them, you might use DNA storage as an industrial tool. That’s the first thing that will happen, I think, because it’s much easier. You don’t have to store so much information. And you can make use of the fact that all you need to do is copy it.

Q: So, for example, the information could be in the ink on the label?

Yes, the information can be in the ink. DNA can be embedded in many objects. It can be in olive oil, to mark where the olive oil is coming from. The first successful applications of DNA storage are going to be things like that.

Q: So it could have applications in control and logistics.

That’s right.

Q: Your last position was in the Electrical and Computer Engineering Department at Rice University, and clearly that’s a good place to be. How did TUM lure you away?

Rice is actually a fantastic research environment. It’s a small department, but it functions very well. The people are very nice and supportive, and it’s very good for young scientists. TUM was interesting, first, because within Germany it is the best university, so it attracts good students. And it’s a big university, with lots of students. If you have a lot of students, then there are also many especially good students. That makes it easier to find good PhD candidates at TUM, and good PhD candidates are super important. That was one of the main reasons I came here.

Q: Did the Mößbauer Tenure Track program and the connection to the TUM-IAS offer any special advantages that played a role in your decision?

That did make it more interesting. The TUM-IAS provided funding to start with and to build a lab. That was very helpful and gave me extra freedom at the beginning. The money is not tied to one thing or another, which is very important at the beginning. I traveled quite a bit, and I bought some equipment, like computers, that you can’t easily purchase with public grants. So the freedom of having the funding and being able to use it for anything I want was very important. I think that is very important in general. Ideally, money in research should come with very few strings attached. Then it’s going to be used most efficiently.

Q: What kinds of topics are your current PhD candidates working on?

In imaging, we have seen that image reconstruction problems can be solved by traditional algorithms without any learning, but we can do much better when we learn the algorithms from data, that is, if we take a data-driven approach to imaging. The reason is that machine learning algorithms can learn very effective prior assumptions about the data. So they learn the structure of the data. In imaging, the algorithms learn what natural images look like.

I think this idea is much more broadly applicable. For example, we are interested in other types of data, such as proteins. In medicine and chemistry, it is very important to know what a given protein looks like, for drug discovery and other applications like that. To represent and use a protein in a computer system, we want to view it as a protein, not as an image of a crystal, because it moves and occurs in different configurations. You don’t want to view it as a 3D object. You want to see and model it for what it is, and for this, neural networks and machine learning can be very effective.

One concrete project we are working on is how to learn prior assumptions about such objects. Of course we can work with images of such objects, but then we’re not really interested in the image per se; we’re interested in a representation of the object. The same goes for other types of data. For example, in communications you potentially have a channel, and it’s not an image, but it also has some representation. And what I’m interested in, and what a lot of my PhD candidates are working on, is how we can take the idea of learning algorithms from data to other fields and other types of data.

Q: Are any of them working on the DNA side yet?

I just found an excellent young student. She started in April 2022 to work on DNA data storage, and I’m very excited about what she’ll do. It’s been a bit more difficult to find such an excellent student for DNA storage than for machine learning. A student in that area needs to have similar skills, but many of today’s students want to work on machine learning, because it is hot right now. There was a time when all the smart students wanted to work in information theory. Now a lot of the best students want to do machine learning.