De novo genes, which are new protein-coding genes that arise from previously non-coding sequences, represent an important and understudied source of evolutionary innovation. At present, little is known about the earliest stages of de novo gene formation, in which a genomic sequence gains transcription, acquires an open reading frame, and spreads through the population. To address these fundamental questions, we study newly evolved, expressed open reading frames (neORFs) in species of the genus Drosophila. During the first phase of this project, we used long-read sequencing to assemble the complete genomes and transcriptomes of 30 inbred Drosophila lines from 6 different species spanning divergence times of 2–50 million years. We identified thousands of neORFs, most of which were line- and species-specific. We now plan to use this unique dataset to study the properties of neORFs and the factors that allow them to spread through a population or species and be retained. We will perform ribosome profiling (Ribo-Seq) to determine the translational status of neORFs and bioinformatic analyses to characterize the functional motifs and structural elements of their encoded proteins. We will use the translational data together with additional population-level transcriptomic data (Iso-Seq) to improve and extend our models of neORF emergence and retention. In addition, we will use gene editing (CRISPR-Cas) to knock out candidate neORFs and determine their effects on global gene expression and organismal phenotypes, such as male fertility. Taken together, the results will provide a comprehensive picture of how de novo genes become established in species.
| Bornberg-Bauer, Erich | Research Group Evolutionary Bioinformatics |