DCAR V3.0 Gene Prediction

Resource Type: 
Analysis
Name: 
DCAR V3.0 Gene Prediction
Description: 

A multi-step approach was used to produce the most comprehensive gene model catalog for carrot genome v3. MAKER and GeMoMa were used to perform gene prediction by integrating de novo and evidence-based predictions. For MAKER, carrot ESTs, DH1 Illumina and IsoSeq transcriptome sequences, gene models from five closely related or model species, and proteins from UniProt/Swiss-Prot were used as evidence. AUGUSTUS v2.5.5 and SNAP were used for de novo prediction. This analysis yielded 28,721 MAKER gene models.

Next, GeMoMa was used to improve the accuracy of the splice junction sites predicted by MAKER and to predict gene models that MAKER had missed. The inputs to GeMoMa were: gene models from the five related or model species used for MAKER prediction; the final gene models produced by the MAKER pipeline; and splice sites mined from the mapping of the DH1 Illumina transcriptome data onto DH1 v3. This step produced an intermediate set of 32,625 gene models.

A final step was performed to refine all gene models and predict any missing ones. In this step, gene models predicted on the DH1 v2 assembly, namely DCARv2 (32,112) and RefSeq (44,484), were transferred or re-predicted on the DH1 v3 assembly using GMAP and GenomeThreader. DCARv2 or RefSeq gene models that were not predicted by MAKER+GeMoMa, that had experimental evidence, and that were not masked were considered new gene models. Where the structures of the RefSeq and DCARv2 gene models disagreed, the correct structure was determined by manual inspection of the experimental evidence. Finally, high-quality IsoSeq transcripts were mapped to the DH1 v3 assembly using GMAP and GenomeThreader, and transcripts that mapped with an appropriate gene structure and had not been predicted in the previous steps were added to the gene model catalog.
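The rule used in the final step to accept a transferred DCARv2 or RefSeq model as a new gene model can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline actually used: the GeneModel record, its has_evidence and masked flags, and the choice to treat "not predicted by MAKER+GeMoMa" as "no same-strand overlap with any MAKER+GeMoMa model" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    """Hypothetical minimal gene-model record (not the pipeline's real format)."""
    model_id: str
    seqid: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    strand: str
    has_evidence: bool = False  # assumed flag: supported by experimental evidence
    masked: bool = False        # assumed flag: falls in masked sequence


def overlaps(a: GeneModel, b: GeneModel) -> bool:
    """True if the two models share at least one base on the same sequence and strand."""
    return (
        a.seqid == b.seqid
        and a.strand == b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def select_new_models(transferred, predicted):
    """Keep transferred models that no MAKER+GeMoMa model overlaps,
    that have evidence, and that are not masked (assumed criteria)."""
    new_models = []
    for cand in transferred:
        already_predicted = any(overlaps(cand, p) for p in predicted)
        if not already_predicted and cand.has_evidence and not cand.masked:
            new_models.append(cand)
    return new_models


if __name__ == "__main__":
    # Toy data with invented IDs and coordinates, for illustration only.
    predicted = [GeneModel("pred1", "chr1", 100, 500, "+")]
    transferred = [
        GeneModel("tx1", "chr1", 400, 900, "+", has_evidence=True),    # overlaps pred1
        GeneModel("tx2", "chr1", 1000, 1500, "+", has_evidence=True),  # new
        GeneModel("tx3", "chr1", 2000, 2500, "+"),                     # no evidence
        GeneModel("tx4", "chr2", 100, 500, "+", has_evidence=True, masked=True),
    ]
    print([m.model_id for m in select_new_models(transferred, predicted)])  # ['tx2']
```

The pairwise scan is quadratic and only suitable for illustration; a genome-scale version would more likely use an interval index per sequence and strand.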

File: 
Publication: 
Coe K, Bostan H, Rolling W, Turner-Hissong S, Macko-Podgórni A, Senalik D, Liu S, Seth R, Curaba J, Mengist MF, Grzebelus D, Van Deynze A, Dawson J, Ellison S, Simon P, Iorizzo M. Population genomics identifies genetic signatures of carrot domestication and improvement and uncovers the origin of high-carotenoid orange carrots. Nature Plants. 2023 Oct; 9(10):1643-1658.
Relationship: 
There is 1 relationship.
The analysis, DCAR V3.0 Gene Prediction, is a part of analysis, Carrot Genome Assembly DH1 v3.0.
Program, Pipeline, Workflow or Method Name: 
MAKER; GeMoMa; AUGUSTUS v2.5.5; SNAP; GMAP; GenomeThreader
Program Version: 
3.0
Algorithm: 
de novo prediction, protein-homology searches, and transcript-evidence-based prediction
Date Performed: 
Thursday, March 10, 2022 - 00:00
Data Source: 
Source Name: Carrot Genome Assembly DH1 v3.0
Source URI: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA798760
Organism: 
Name | Common Name | Comment
Carrot
For a general overview of carrot, see the Carrot Facts Page
Biomaterial: 
Name | Description

This biosample is also known by its germplasm accession, DH1; please see this record for more details.

Project: 
Name | Description

The project goals are to improve the assembly of the genome of cultivated carrot, and improve gene predictions using this improved assembly.

Feature: 
There are 176,000 CDS features
There are 36,958 gene features
There are 36,332 mRNA features
There are 36,332 polypeptide features
There are 11 rRNA features
There are 654 tRNA features
Total: 286,287
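The Total line is the sum of the per-type feature counts. A short Python check, with the counts copied from this page (the dictionary and function names are illustrative only), reproduces it:

```python
# Per-type feature counts as listed on this page.
FEATURE_COUNTS = {
    "CDS": 176000,
    "gene": 36958,
    "mRNA": 36332,
    "polypeptide": 36332,
    "rRNA": 11,
    "tRNA": 654,
}


def total_features(counts: dict) -> int:
    """Sum the per-type counts to reproduce the page's Total line."""
    return sum(counts.values())


if __name__ == "__main__":
    print(total_features(FEATURE_COUNTS))  # 286287
```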