Annotating gene structures and functions to genome assemblies is necessary to make assembly resources useful for biological inference. Gene Ontology (GO) term assignment is the most used functional annotation system, and new methods for GO assignment have improved the quality of GO-based function predictions. The Gene Ontology Meta Annotator for Plants (GOMAP) is an optimized, high-throughput, and reproducible pipeline for genome-scale GO annotation of plants. We containerized GOMAP to increase portability and reproducibility and also optimized its performance for HPC environments. Here we report on the pipeline’s availability and performance for annotating large, repetitive plant genomes and describe how GOMAP was used to annotate multiple maize genomes as a test case. Assessment shows that GOMAP expands and improves the number of genes annotated and annotations assigned per gene as well as the quality (based on [Formula: see text]) of GO assignments in maize. GOMAP has been deployed to annotate other species including wheat, rice, barley, cotton, and soy. Instructions and access to the GOMAP Singularity container are freely available online at https://bioinformapping.com/gomap/. A list of annotated genomes and links to data is maintained at https://dill-picl.org/projects/gomap/.
The shoot apical meristem (SAM) orchestrates the balance between stem cell proliferation and organ initiation essential for postembryonic shoot growth. Meristems show a striking diversity in shape and size. How this morphological diversity relates to variation in plant architecture and the molecular circuitries driving it are unclear. By generating a high-resolution gene expression atlas of the vegetative maize shoot apex, we show here that distinct sets of genes govern the regulation and identity of stem cells in maize versus Arabidopsis. Cell identities in the maize SAM reflect the combinatorial activity of transcription factors (TFs) that drive the preferential, differential expression of individual members within gene families functioning in a plethora of cellular processes. Subfunctionalization thus emerges as a fundamental feature underlying cell identity. Moreover, we show that adult plant characters are, to a significant degree, regulated by gene circuitries acting in the SAM, with natural variation modulating agronomically important architectural traits enriched specifically near dynamically expressed SAM genes and the TFs that regulate them. Besides unique mechanisms of maize stem cell regulation, our atlas thus identifies key new targets for crop improvement. © 2019 Knauer et al.; Published by Cold Spring Harbor Laboratory Press.
Most gene expression analysis methods discover groups of genes that are co-expressed, rather than testing whether a specified gene group behaves in a concerted manner. We implemented a novel statistical method designed to assess significance of differences in RNA expression levels among specified groups of genes. Our Shiny web application C-REx (Comparison of RNA Expression) enables researchers to readily test hypotheses about whether specific gene groups share expression profiles and whether those profiles differ from those of other groups of genes. We implemented data transformation, a normality visualizer, and both parametric and non-parametric tests for determining whether gene groups are functioning in concert or in contrast both within and between conditions. Here, we demonstrate that the C-REx application recovers well-known biological phenomena (e.g., response to heat stress).
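As an illustration of the kind of non-parametric group comparison described above, the following is a minimal sketch of a Mann-Whitney U test between two specified gene groups. It is not taken from C-REx: the abstract does not name the tests the application uses, so the choice of Mann-Whitney U, the function name, and the expression values are all assumptions made for illustration. The p-value uses the normal approximation without a tie correction, which is only rough for small groups.

```python
import math


def mann_whitney_u(group_a, group_b):
    """Return (U statistic for group_a, two-sided p-value).

    The p-value uses the normal approximation with no tie correction,
    so treat it as approximate, especially for small groups.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Pool the values, remembering which group each came from (0 = a, 1 = b).
    pooled = sorted(
        [(value, 0) for value in group_a] + [(value, 1) for value in group_b]
    )

    # Assign midranks so that tied values share the average of their ranks.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        i = j

    rank_sum_a = sum(r for r, (_, label) in zip(ranks, pooled) if label == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2

    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u_a, p_value


# Hypothetical log-expression values for two gene groups in one condition.
heat_shock_genes = [7.9, 8.4, 9.1, 8.8]
housekeeping_genes = [5.2, 5.0, 5.6, 5.3]
u_stat, p = mann_whitney_u(heat_shock_genes, housekeeping_genes)
```

A parametric counterpart (for example, a t-test on transformed values) would follow the same pattern: the groups are fixed in advance by the researcher, and the test asks whether their expression distributions differ.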
Diversification of the turtle’s shell comprises remarkable phenotypic transformations. For instance, two divergent species convergently evolved shell‐closing systems with shoulder blade (scapula) segments that enable coordinated movements with the shell. We expected these unusual structures to originate via similar changes in underlying gene networks, as skeletal segment formation is an evolutionarily conserved developmental process. We tested this hypothesis by comparing transcriptomes of scapula tissue across three stages of embryonic development in three emydid turtles from natural populations. We found that alternative strategies for skeletal segmentation were associated with interspecific differences in gene co‐expression networks. Notably, mesenchyme homeobox 2 (MEOX2) and HOXA3‐5 were central hubs driving the activity of 2,806 genes in a candidate network for scapula segmentation, albeit in only one species. Even so, scapula muscle overgrowth corresponded to the activity of similar myogenic networks in both species. This and other derived developmental processes were not observed in the third species, which displayed the ancestral (unsegmented) scapula condition. Differential gene expression tests against this reference lineage supported histological and network analyses. Our findings illustrate that molecular underpinnings of convergent evolution, including during the diversification of the atypical turtle “body plan,” are influenced by variation in underlying developmental processes.
The maize W22 inbred has served as a platform for maize genetics since the mid twentieth century. To streamline maize genome analyses, we have sequenced and de novo assembled a W22 reference genome using short-read sequencing technologies. We show that significant structural heterogeneity exists in comparison to the B73 reference genome at multiple scales, from transposon composition and copy number variation to single-nucleotide polymorphisms. The generation of this reference genome enables accurate placement of thousands of Mutator (Mu) and Dissociation (Ds) transposable element insertions for reverse and forward genetics studies. Annotation of the genome has been achieved using RNA-seq analysis, differential nuclease sensitivity profiling and bisulfite sequencing to map open reading frames, open chromatin sites and DNA methylation profiles, respectively. Collectively, the resources developed here integrate W22 as a community reference genome for functional genomics and provide a foundation for the maize pan-genome.
We created a new high‐coverage, robust, and reproducible functional annotation of maize protein‐coding genes based on Gene Ontology (GO) term assignments. Whereas the existing Phytozome and Gramene maize GO annotation sets only cover 41% and 56% of maize protein‐coding genes, respectively, this study provides annotations for 100% of the genes. We also compared the quality of our newly derived annotations with the existing Gramene and Phytozome functional annotation sets by comparing all three to a manually annotated gold standard set of 1,619 genes where annotations were primarily inferred from direct assay or mutant phenotype. Evaluations based on the gold standard indicate that our new annotation set is measurably more accurate than those from Phytozome and Gramene. To derive this new high‐coverage, high‐confidence annotation set, we used sequence similarity and protein domain presence methods as well as mixed‐method pipelines that were developed for the Critical Assessment of Function Annotation (CAFA) challenge. Our project to improve maize annotations is called maize‐GAMER (GO Annotation Method, Evaluation, and Review), and the newly derived annotations are accessible via MaizeGDB (http://download.maizegdb.org/maize-GAMER) and CyVerse (B73 RefGen_v3 5b+ at doi.org/10.7946/P2S62P and B73 RefGen_v4 Zm00001d.2 at doi.org/10.7946/P2M925).
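The two comparisons described above, coverage of the gene set and agreement with a gold standard, can be sketched as follows. This is an assumption-laden illustration rather than the maize-GAMER evaluation itself: the abstract does not state which metrics were used, so micro-averaged precision, recall, and F1 over exact GO term matches are used here as one simple option (this ignores the ontology hierarchy). The gene identifiers and GO term IDs are made-up placeholders.

```python
def coverage(annotations, all_genes):
    """Fraction of genes that have at least one GO term assigned."""
    annotated = sum(1 for gene in all_genes if annotations.get(gene))
    return annotated / len(all_genes)


def precision_recall_f1(predicted, gold):
    """Micro-averaged precision, recall, and F1 over gold-standard genes.

    Only exact term matches count; parent/child relationships in the
    ontology are ignored in this simplified sketch.
    """
    tp = fp = fn = 0
    for gene, gold_terms in gold.items():
        pred_terms = predicted.get(gene, set())
        tp += len(pred_terms & gold_terms)   # terms correctly predicted
        fp += len(pred_terms - gold_terms)   # predicted but not in gold
        fn += len(gold_terms - pred_terms)   # in gold but not predicted
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Placeholder data: gene names and GO IDs are hypothetical.
gold_standard = {
    "geneA": {"GO:0000001", "GO:0000002"},
    "geneB": {"GO:0000003"},
}
predicted_set = {
    "geneA": {"GO:0000001", "GO:0000004"},
}
cov = coverage(predicted_set, ["geneA", "geneB"])
prec, rec, f1 = precision_recall_f1(predicted_set, gold_standard)
```

Reporting coverage alongside an accuracy measure matters because an annotation set can reach 100% coverage by assigning uninformative or incorrect terms; the gold-standard comparison guards against that.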
MaizeGDB is a highly curated, community-oriented database and informatics service to researchers focused on the crop plant and model organism Zea mays ssp. mays. Although some form of the maize community database has existed over the last 25 years, there have only been two major releases. In 1991, the original maize genetics database MaizeDB was created. In 2003, the combined contents of MaizeDB and the sequence data from ZmDB were made accessible as a single resource named MaizeGDB. Over the next decade, MaizeGDB became more sequence driven while still maintaining traditional maize genetics datasets. This enabled the project to meet the continued growing and evolving needs of the maize research community, yet the interface and underlying infrastructure remained unchanged. In 2015, the MaizeGDB team completed a multi-year effort to update the MaizeGDB resource by reorganizing existing data, upgrading hardware and infrastructure, creating new tools, incorporating new data types (including diversity data, expression data, gene models, and metabolic pathways), and developing and deploying a modern interface. In addition to coordinating a data resource, the MaizeGDB team coordinates activities and provides technical support to the maize research community. MaizeGDB is accessible online at http://www.maizegdb.org. © Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Comparative genetic maps are used to examine genome organization, detect conserved gene order, and explore variation in marker order. YouGenMap is an open-source web tool that offers dynamic comparative mapping of users’ own genetic maps across two or more map sets. Users upload their genetic map data and optional gene annotations, either publicly or privately, using our template, which is available in several standard file formats. The data are parsed and loaded into a MySQL relational database to be displayed and compared against users’ genetic maps or other public data available on YouGenMap. Through highly interactive GUIs, all public maps on YouGenMap are available for visualization, comparison, search, filtering, and download. The YouGenMap web tool is available at http://conifergdb.miamioh.edu/yougenmap, with the source-code repository at http://sourceforge.net/projects/yougenmap/?source=directory.
As a model organism, Chlamydomonas reinhardtii is used not only in biological experiments to understand the chloroplast and flagella but also in biodiesel production. Chlamydomonas promoter regions were extracted based on available RNA-Seq data and the community genome annotation, and these promoters were analyzed to detect core and proximal promoter elements. The evidence suggests that the TATA box (in canonical and non-canonical forms) is the only core promoter element, and it also indicates that the TATA box in Chlamydomonas differs from the Arabidopsis and human TATA boxes. While some of the proximal promoter elements discovered show weak similarity to known promoter elements from other species, most are novel elements present only in Chlamydomonas. Most of the proximal promoter elements detected show significant similarity to each other. This study indicates that promoter architecture in Chlamydomonas appears to be simpler than that of animals and plants.
BACKGROUND: Previous loblolly pine (Pinus taeda L.) genetic linkage maps have been based on a variety of DNA polymorphisms, such as AFLPs, RAPDs, RFLPs, and ESTPs, but only a few SSRs (simple sequence repeats), also known as simple tandem repeats or microsatellites, have been mapped in P. taeda. The objective of this study was to integrate a large set of SSR markers from a variety of sources and published cDNA markers into a composite P. taeda genetic map constructed from two reference mapping pedigrees. A dense genetic map that incorporates SSR loci will benefit complete pine genome sequencing, pine population genetics studies, and pine breeding programs. Careful marker annotation using a variety of references further enhances the utility of the integrated SSR map. RESULTS: The updated P. taeda genetic map, with an estimated genome coverage of 1,515 cM (Kosambi) across 12 linkage groups, incorporated 170 new SSR markers and 290 previously reported SSR, RFLP, and ESTP markers. The average marker interval was 3.1 cM. Of 233 mapped SSR loci, 84 were from cDNA-derived sequences (EST-SSRs) and 149 were from non-transcribed genomic sequences (genomic-SSRs). Of all 311 mapped cDNA-derived markers, 77% were associated with NCBI Pta UniGene clusters, 67% with RefSeq proteins, and 62% with functional Gene Ontology (GO) terms. Duplicate (i.e., redundant accessory) and paralogous markers were tentatively identified by evaluating marker sequences by their UniGene cluster IDs, clone IDs, and relative map positions. The average gene diversity, He, among polymorphic SSR loci, including those that were not mapped, was 0.43 for 94 EST-SSRs and 0.72 for 83 genomic-SSRs. The genetic map can be viewed and queried at http://www.conifergdb.org/pinemap. CONCLUSIONS: Many polymorphic and genetically mapped SSR markers are now available for use in P. taeda population genetics, studies of adaptive traits, and various germplasm management applications. Annotating mapped genes with UniGene clusters and GO terms allowed assessment of redundant and paralogous EST markers and further improved the quality and utility of the genetic map for P. taeda.
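For readers unfamiliar with the gene diversity statistic He reported above, the following is a minimal sketch assuming the standard definition, He = 1 - sum of squared allele frequencies at a locus. The abstract does not say whether a sample-size correction was applied, so none is used here; the function name and allele labels are hypothetical.

```python
from collections import Counter


def gene_diversity(alleles):
    """Expected heterozygosity at one locus: 1 - sum(p_i ** 2).

    `alleles` is a flat list of observed allele calls (one entry per
    sampled gene copy). No small-sample correction is applied.
    """
    n = len(alleles)
    counts = Counter(alleles)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


# Hypothetical SSR allele calls (repeat-length labels) at two loci.
locus_with_two_alleles = ["152", "152", "156", "156"]
locus_with_four_alleles = ["140", "144", "148", "152"]
he_two = gene_diversity(locus_with_two_alleles)
he_four = gene_diversity(locus_with_four_alleles)
```

Under this definition, a locus with more alleles at even frequencies yields a higher He, which is consistent with the direction of the difference reported between genomic-SSRs (0.72) and EST-SSRs (0.43).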