• An `oracle' for predicting the evolution

    From ScienceDaily@1:317/3 to All on Wednesday, March 09, 2022 21:30:48
    An `oracle' for predicting the evolution of gene regulation

    Date:
    March 9, 2022
    Source:
    Massachusetts Institute of Technology Department of Biology
    Summary:
    Computational biologists have created a neural network model
    capable of predicting how changes to non-coding DNA sequences
    in yeast affect gene expression. They also devised a unique way
    of representing this data in two dimensions, making it easy to
    understand the past and future evolution of non-coding sequences
    in organisms beyond yeast -- and even design custom gene expression
    patterns for gene therapies and industrial applications. Despite the
    sheer number of genes that each human cell contains, these so-called
    'coding' DNA sequences comprise just 1% of our entire genome. The
    remaining 99% is made up of 'non-coding' DNA -- which, unlike
    coding DNA, does not carry the instructions to build proteins.



    FULL STORY
    ==========================================================================
    Computational biologists have created a neural network model capable
    of predicting how changes to non-coding DNA sequences in yeast affect
    gene expression. They also devised a unique way of representing this
    data in two dimensions, making it easy to understand the past and
    future evolution of non-coding sequences in organisms beyond yeast --
    and even design custom gene expression patterns for gene therapies and
    industrial applications.


    Despite the sheer number of genes that each human cell contains, these
    so-called "coding" DNA sequences comprise just 1% of our entire genome.
    The remaining 99% is made up of "non-coding" DNA -- which, unlike coding
    DNA, does not carry the instructions to build proteins.

    One vital function of this non-coding DNA, also called "regulatory"
    DNA, is to help turn genes on and off, controlling how much (if any)
    of a protein is made.

    Over time, as cells replicate their DNA to grow and divide, mutations
    often crop up in these non-coding regions -- sometimes tweaking their
    function and changing the way they control gene expression. Many of
    these mutations are trivial, and some are even beneficial. Occasionally,
    though, they can be associated with increased risk of common diseases,
    such as type 2 diabetes, or more life-threatening ones, including cancer.

    To better understand the repercussions of such mutations, researchers
    have been hard at work on mathematical maps that allow them to look at
    an organism's genome, predict which genes will be expressed, and
    determine how that expression will affect the organism's observable
    traits. These maps, called fitness landscapes, were conceptualized
    roughly a century ago to understand how genetic makeup influences one
    common measure of organismal fitness in particular: reproductive
    success. Early fitness landscapes were very simple, often focusing on a
    limited number of mutations. Much richer data sets are now available,
    but researchers still require additional tools to characterize and
    visualize such complex data. This ability would not only facilitate a
    better understanding of how individual genes have evolved over time,
    but would also help to predict what sequence and expression changes
    might occur in the future.

    In a new study published on March 9 in Nature, a team of scientists has
    developed a framework for studying the fitness landscapes of regulatory
    DNA. They created a neural network model that, when trained on hundreds
    of millions of experimental measurements, was capable of predicting
    how changes to these non-coding sequences in yeast affected gene
    expression. They also devised a unique way of representing the
    landscapes in two dimensions, making it easy to understand the past and
    forecast the future evolution of non-coding sequences in organisms
    beyond yeast -- and even design custom gene expression patterns for
    gene therapies and industrial applications.

    "We now have an 'oracle' that can be queried to ask: What if we tried
    all possible mutations of this sequence? Or, what new sequence should we
    design to give us a desired expression?" says Aviv Regev, a professor of
    biology at MIT (on leave), core member of the Broad Institute of Harvard
    and MIT (on leave), head of Genentech Research and Early Development,
    and the study's senior author. "Scientists can now use the model for
    their own evolutionary question or scenario, and for other problems
    like making sequences that control gene expression in desired ways. I
    am also excited about the possibilities for machine learning researchers
    interested in interpretability; they can ask their questions in reverse,
    to better understand the underlying biology."

    Prior to this study, many researchers had simply trained their models
    on known mutations (or slight variations thereof) that exist in nature.
    However, Regev's team wanted to go a step further by creating their own
    unbiased models capable of predicting an organism's fitness and gene
    expression based on any possible DNA sequence -- even sequences they'd
    never seen before. This would also enable researchers to use such models
    to engineer cells for pharmaceutical purposes, including new treatments
    for cancer and autoimmune disorders.



    To accomplish this goal, Eeshit Dhaval Vaishnav, a graduate student
    at MIT and co-first author; Carl de Boer, now an assistant professor
    at the University of British Columbia; and their colleagues created a
    neural network model to predict gene expression. They trained it on a
    data set generated by inserting millions of totally random non-coding
    DNA sequences into yeast, and observing how each random sequence
    affected gene expression. They focused on a particular subset of
    non-coding DNA sequences called promoters, which serve as binding sites
    for proteins that can switch nearby genes on or off.

    "This work highlights what possibilities open up when we design new
    kinds of experiments to generate the right data to train models," Regev
    says. "In the broader sense, I believe these kinds of approaches will
    be important for many problems -- like understanding genetic variants
    in regulatory regions that confer disease risk in the human genome,
    but also for predicting the impact of combinations of mutations, or
    designing new molecules."

    Regev, Vaishnav, de Boer, and their coauthors went on to test their
    model's predictive abilities in a variety of ways, in order to show how
    it could help demystify the evolutionary past -- and possible future --
    of certain promoters.

    "Creating an accurate model was certainly an accomplishment, but, to me,
    it was really just a starting point," Vaishnav explains.

    First, to determine whether their model could help with synthetic
    biology applications like producing antibiotics, enzymes, and food, the
    researchers practiced using it to design promoters that could generate
    desired expression levels for any gene of interest. They then scoured
    other scientific papers to identify fundamental evolutionary questions,
    in order to see if their model could help answer them. The team even went
    so far as to feed their model a real-world population data set from one
    existing study, which contained genetic information from yeast strains
    around the world. In doing so, they were able to delineate thousands of
    years of past selection pressures that sculpted the genomes of today's
    yeast.

    But, in order to create a powerful tool that could probe any genome,
    the researchers knew they'd need to find a way to forecast the evolution
    of non-coding sequences even without such a comprehensive population
    data set. To address this goal, Vaishnav and his colleagues devised a
    computational technique that allowed them to plot the predictions from
    their framework onto a two-dimensional graph. This helped them show,
    in a remarkably simple manner, how any non-coding DNA sequence would
    affect gene expression and fitness, without needing to conduct any
    time-consuming experiments at the lab bench.



    "One of the unsolved problems in fitness landscapes was that we didn't
    have an approach for visualizing them in a way that meaningfully captured
    the evolutionary properties of sequences," Vaishnav explains. "I really
    wanted to find a way to fill that gap, and contribute to the longstanding
    vision of creating a complete fitness landscape."

    Martin Taylor, a professor of genetics at the University of Edinburgh's
    Medical Research Council Human Genetics Unit who was not involved in the
    research, says the study shows that artificial intelligence can not only
    predict the effect of regulatory DNA changes, but also reveal the
    underlying principles that govern millions of years of evolution.

    Despite the fact that the model was trained on just a fraction of yeast
    regulatory DNA in a few growth conditions, he's impressed that it's
    capable of making such useful predictions about the evolution of gene
    regulation in mammals.

    "There are obvious near-term applications, such as the custom design
    of regulatory DNA for yeast in brewing, baking, and biotechnology,"
    he explains.

    "But extensions of this work could also help identify disease mutations
    in human regulatory DNA that are currently difficult to find and largely
    overlooked in the clinic. This work suggests there is a bright future
    for AI models of gene regulation trained on richer, more complex, and
    more diverse data sets."

    Even before the study was formally published, Vaishnav began receiving
    queries from other researchers hoping to use the model to devise
    non-coding DNA sequences for use in gene therapies.

    "People have been studying regulatory evolution and fitness landscapes
    for decades now," Vaishnav says. "I think our framework will go a long
    way in answering fundamental, open questions about the evolution and
    evolvability of gene regulatory DNA -- and even help us design
    biological sequences for exciting new applications."

    ==========================================================================
    Story Source: Materials provided by Massachusetts Institute of
    Technology Department of Biology. Original written by Raleigh McElvery.
    Note: Content may be edited for style and length.


    ==========================================================================
    Journal Reference:
    1. Eeshit Dhaval Vaishnav, Carl G. de Boer, Jennifer Molinet,
       Moran Yassour, Lin Fan, Xian Adiconis, Dawn A. Thompson,
       Joshua Z. Levin, Francisco A. Cubillos, Aviv Regev. The evolution,
       evolvability and engineering of gene regulatory DNA. Nature, 2022;
       DOI: 10.1038/s41586-022-04506-6
    ==========================================================================

    Link to news story: https://www.sciencedaily.com/releases/2022/03/220309131825.htm
