Sep 17 2024
The Potential of AI + CRISPR
In my book, which I will now shamelessly promote – The Skeptics’ Guide to the Future – my coauthors and I discuss the incredible potential of information-based technologies. As we increasingly transition to digital technology, we can leverage the increasing power of computer hardware and software. That power is not just increasing linearly, but geometrically. Further, there are technologies that make other technologies more information-based or digital, such as 3D printing. The physical world and the virtual world are merging.
With current technology this is perhaps most profound when it comes to genetics. The genetic code of life is essentially a digital technology. Efficient gene-editing tools, like CRISPR, give us increasing control over the genetic code. Arguably two of the most dramatic science and technology news stories over the last decade have been advances in gene editing and advances in artificial intelligence (AI). These two technologies also work well together – the genome is a large complex system of interacting information, and AI tools excel at dealing with large complex systems of interacting information. This is definitely a “you got chocolate in my peanut butter” situation.
A recent paper nicely illustrates the synergistic power of these two technologies – Interpreting cis-regulatory interactions from large-scale deep neural networks. Let’s break it down.
Cis-regulatory interactions refer to several regulatory functions of non-coding DNA. Coding DNA, which is contained within genes (genes contain both coding and non-coding elements), directly codes for amino acids, which are assembled into polypeptides and then folded into functional proteins. Remember the ATCG four-letter base code, with three bases coding for a specific amino acid (or coding function, like a stop signal). This is coding DNA. Noncoding DNA regulates how coding DNA is transcribed and ultimately made into proteins.
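To make the "three bases per amino acid" idea concrete, here is a minimal sketch in Python. The codon table is deliberately partial (the real one has 64 entries), and the function name and example sequence are made up for illustration.

```python
# Partial codon table for illustration only; the real table has 64 entries.
CODON_TABLE = {
    "ATG": "Met", "TTT": "Phe", "GGC": "Gly", "GCA": "Ala", "AAA": "Lys",
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}

def translate(dna):
    """Read coding-strand DNA three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        # Codons missing from this partial table are marked as unknown.
        amino_acid = CODON_TABLE.get(codon, "???")
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein

# Made-up example sequence: ATG TTT GGC AAA TAA GCA
print(translate("ATGTTTGGCAAATAAGCA"))  # ['Met', 'Phe', 'Gly', 'Lys']
```

Everything after the stop codon is ignored, which is why the trailing GCA never shows up in the result.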
There are, for example, promoter sequences, which are necessary for transcription in eukaryotes. There are also enhancer sequences which increase transcription, and silencer sequences which decrease transcription. Interactions among these various regulatory segments control how much of which proteins any particular cell will make, while responding dynamically to its metabolic and environmental needs. It is a horrifically complex system, as one might imagine.
CRISPR gives us the ability to not only change the coding sequence of a gene (or remove or splice in entire genes), it can also be used to alter regulation of gene expression. It can reversibly turn off, and then back on again, the transcription of a gene. But doing so messes with this complex system of regulatory sequences, so the more we understand about it, the better. Also, we are discovering that there are genetic diseases that do not involve mutations of coding DNA but of regulatory DNA. So again, the more we understand about the regulatory system, the better we will be able to study and eventually treat diseases of gene expression regulation.
This is a perfect job for AI, and in this case specifically, deep neural networks (DNNs). The problem with conventional research into a massive and complex system like the human genome (or any genome) is that the number of individual experiments you would need to do in order to address even a single question can be vast. You would need the resources of laboratory time, personnel and money to do thousands of individual experiments. Or – we could let AI do those experiments virtually, at a tiny fraction of the cost and time. This is exactly the tool that the researchers have developed. They write:
“Here we present cis-regulatory element model explanations (CREME), an in silico perturbation toolkit that interprets the rules of gene regulation learned by a genomic DNN. Applying CREME to Enformer, a state-of-the-art DNN, we identify cis-regulatory elements that enhance or silence gene expression and characterize their complex interactions.”
Essentially this is a two-step process. Enformer is a DNN that plows through tons of data to learn the rules of gene regulation. The problem with some of these AIs, however, is that they spit out answers but not necessarily the steps that led to the answers. This is the so-called “black box” problem of some AIs. But genetics researchers want to know the steps – they want to know the individual regulatory elements that Enformer identified as the building blocks for the overall rules it produces. That is what CREME does – it looks at the rule output of Enformer and reverse engineers the cis-regulatory elements.
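The general idea of an in silico perturbation experiment can be sketched in a few lines of Python. To be clear, this is not CREME's code or Enformer's API. The "model" is a toy scoring function standing in for a trained DNN, the two motifs and their weights are invented, and masking a tile with N is a simplification chosen to keep the example deterministic.

```python
ENHANCER_MOTIF = "GATA"  # hypothetical motif that raises expression
SILENCER_MOTIF = "CCGG"  # hypothetical motif that lowers expression

def toy_model(seq):
    """Stand-in for a trained DNN: returns a made-up expression score."""
    return 10.0 + 3.0 * seq.count(ENHANCER_MOTIF) - 2.0 * seq.count(SILENCER_MOTIF)

def perturbation_scan(seq, tile_size):
    """Mask one tile at a time and record how the prediction changes."""
    baseline = toy_model(seq)
    effects = []
    for start in range(0, len(seq), tile_size):
        tile = seq[start:start + tile_size]
        masked = seq[:start] + "N" * len(tile) + seq[start + tile_size:]
        effects.append((start, toy_model(masked) - baseline))
    return effects

# Made-up sequence: one enhancer-like tile, one neutral tile, one silencer-like tile.
sequence = "TTTTGATATTTT" + "AAAAAAAAAAAA" + "TTTTCCGGTTTT"

for start, delta in perturbation_scan(sequence, tile_size=12):
    if delta < 0:
        label = "enhancer-like (expression drops when masked)"
    elif delta > 0:
        label = "silencer-like (expression rises when masked)"
    else:
        label = "neutral"
    print(start, delta, label)
```

Each pass through the loop is one "virtual experiment": knock out a stretch of sequence, ask the model what happens to expression, and compare against the untouched baseline. The drop or rise tells you whether that stretch was acting like an enhancer or a silencer, at least according to the model.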
The combination essentially allows genetics researchers to run thousands of virtual experiments in silico to build a picture of cis-regulatory elements and interactions that make up the web of rules that control gene expression. This is a great example of how AI can potentially dramatically increase the pace of scientific research. It also highlights how genetics is perhaps ideally suited to reap the benefits of AI-enhanced research, because it is already an inherently digital science.
This is perhaps the sweet spot for AI-enhanced scientific research – look through billions of potential targets and tell me which 2 or 3 I should focus on. This also applies to drug research and material science, where the number of permutations – the potential space – of possible solutions is incredibly vast. For many types of research, AI is condensing down months or years of research into hours or days of processing time.
For genetics these two technologies (AI and gene-editing such as, but not limited to, CRISPR) combine to give us incredible knowledge and control over the literal code of life. It still takes a lot of time to translate this into specific practical applications, but they are coming. We already, for example, have approved therapies for genetic diseases, like sickle cell, that previously had no treatments that could alter their course. More is coming.
This field is getting so powerful, in fact, that we are discussing the ethics of potential applications. I understand why people might be a little freaked out at the prospect of tinkering with life at its most fundamental level. We need a regulatory framework that allows us to reap the immense benefits without unleashing unintended consequences, which can be similarly immense. For now this largely means that we don’t mess with the germ line, and that anything a company wishes to put out into the world has to be individually approved. But like many technologies, as both AI and genetic manipulation get cheaper, easier, and more powerful, the challenge will be maintaining effective regulation as the tech proliferates.
For now, at least, we can remain focused on ethical biomedical research. I expect in the next 5-20 years we will see not only increasing knowledge of genetics, but specific medical applications. There is still a lot of low-hanging fruit to be picked.