Methodological keywords are added at the discretion of the INSDC from a controlled vocabulary of terms that explicitly describe the type of sequence contained within the entry.
Keyword | Description |
---|---|
BARCODE | A genetic barcode sequence that meets the standards defined by CBoL (Consortium for the Barcode of Life) |
CAGE (Cap Analysis Gene Expression) | Short sequences from the 5′ end of mRNA, obtained by Cap Analysis Gene Expression |
CAP trapper | Sequences obtained from cDNAs created using cap-trapping |
ENV | A sequence derived from an environmental sample (see the description of the /environmental_sample qualifier in the Feature Table document) |
EST | Short single pass cDNA sequences with no annotation other than a source feature. |
expressed sequence tag | Short single pass cDNA sequences with no annotation other than a source feature. |
EST (expressed sequence tag) | Deprecated. |
3′-end sequence (3′-EST) | EST sequence from the 3′ direction |
5′-end sequence (5′-EST) | EST sequence from the 5′ direction |
FLI_CDNA | A sequence derived from cDNA libraries created using full-mRNA cloning methods |
GSS | Genome survey sequence, short single pass genomic sequences |
genome survey sequence | Genome survey sequence, short single pass genomic sequences |
HTC | High-throughput cDNA record from full length cDNA sequencing projects. |
HTG | Sequences submitted mainly by genome sequencing projects that treat a clone as the sequencing unit. |
HTGS_PHASE0 | Sequence consists of an unordered, unoriented set of sequencing reads (typically 100-200), unannotated and containing gaps |
HTGS_PHASE1 | Sequence consists of unfinished contigs that may be unordered and unoriented, with gaps, with or without annotation. |
HTGS_PHASE2 | Sequence consists of unfinished, ordered, oriented contigs, with or without gaps, with or without annotation. |
HTGS_PHASE3 | Deprecated; once the sequence is considered to be finished, the HTGS_PHASEx keyword should be removed |
HTGS_DRAFT | Sequence at draft stage. These should be, on average over all draft depositions, at least 4X coverage (but can be more) and are Phase 1 or Phase 2. |
HTGS_ENRICHED | The BAC assembly is enriched by inclusion of both BAC generated reads and overlapping WGS reads. |
HTGS_POOLED_CLONE | The assembly consists of a specific BAC clone’s reads that were deconvoluted from an array of pooled clones; contains overlapping reads from WGS sequencing if used in conjunction with HTGS_ENRICHED. |
HTGS_POOLED_MULTICLONE | The assembly consists of reads from multiple BAC clone reads that have not yet been deconvoluted from an array of pooled clones. |
UNORDERED | For CON records, order and orientation of components are unknown. |
oligo capping | Sequences obtained from cDNAs created using oligo-capping |
MAG | Metagenome Assembled Genomes (MAGs) are prokaryotic, eukaryotic or viral genomes that have been constructed from DNA or RNA sequences isolated from one or more environmental samples or samples containing more than one organism. The sequences are computationally binned and assembled into genomes, and each MAG is asserted to represent the genome of a single organism. |
Metagenome Assembled Genome | Metagenome Assembled Genomes (MAGs) are prokaryotic, eukaryotic or viral genomes that have been constructed from DNA or RNA sequences isolated from one or more environmental samples or samples containing more than one organism. The sequences are computationally binned and assembled into genomes, and each MAG is asserted to represent the genome of a single organism. |
STS | Sequence tagged site; a tag site for genome sequencing. Chromosome, map, and PCR_condition information is mandatory for this division. |
sequence tagged site | Sequence tagged site; a tag site for genome sequencing. Chromosome, map, and PCR_condition information is mandatory for this division. |
STS (sequence tagged site) | Deprecated. |
TPA | Re-annotations, assemblies, or re-assemblies of primary sequences deposited in INSDC, the Trace Archive (TA) or the Short-Read Archive (SRA) that have been the focus of a peer-reviewed publication. |
Third Party Data | Re-annotations, assemblies, or re-assemblies of primary sequences deposited in INSDC, the Trace Archive (TA) or the Short-Read Archive (SRA) that have been the focus of a peer-reviewed publication. |
TPA:experimental | TPA sequences where the annotations presented are supported by wet-lab experimental evidence |
TPA:inferential | TPA sequences where the annotations presented are not supported by wet-lab experimental evidence |
TPA:assembly | TPA sequences that are a re-assembly of an existing genome or large genomic region, or TPA sequences that are a de novo assembly of existing primary sequences deposited in INSDC |
TPA:specialist_db | TPA sequences that are submitted from an existing authoritative public database that is built using INSDC sequence data and is described in an accepted peer-reviewed publication |
TSA | Shotgun assemblies of primary transcriptome data deposited in INSDC, the Trace Archive (TA) or the Short-Read Archive (SRA) |
Transcriptome Shotgun Assembly | Shotgun assemblies of primary transcriptome data deposited in INSDC, the Trace Archive (TA) or the Short-Read Archive (SRA) |
WGS | Whole genome shotgun data from a sequencing project where the data are likely to be completely reassembled with no tracking between reassembly updates. |
STANDARD_DRAFT | Standard draft genome sequence data may come from any number of reads, or runs on any number of different sequencing platforms, assembled into sequenced contigs. |
HIGH_QUALITY_DRAFT | This should refer to a product with overall coverage representing greater than 90% of the genome or target region. Efforts should be made to include only sequence of the target organism and exclude contaminating sequences (e.g. from tissue cell lines). |
IMPROVED_HIGH_QUALITY_DRAFT | This standard refers to a high-quality-draft assembled sequence where additional work has been performed beyond the initial shotgun sequencing and standard assembly using either manual or automated methods. In addition, this standard should contain no discernable misassemblies, and if feasible, should have attempted to reduce the number of contigs (thus reducing the number of gaps) and supercontigs (or scaffolds). |
ANNOTATION_GRADE | This standard may overlap with the previous two standards of high quality and improved draft, but emphasizes the verification and correction of anomalies within coding regions, such as frameshifts and stop codons, particularly in sequences of interest within the specific genome. |
NON_CONTIGUOUS_FINISHED | Describes high-quality assemblies that have been subject to automated and manual improvement, and where closure approaches have been successful for almost all gaps, misassemblies, and low-quality regions. Attempts have been made to resolve all gap and sequence uncertainties, and only those recalcitrant to resolution remain (with notations in the genome submission as to the nature of the uncertainty). |
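
The table can be applied mechanically when checking the keywords of an entry. The following Python sketch is illustrative only and is not an INSDC tool. It assumes that rows sharing an identical description are synonyms of one keyword, that matching is exact and case-sensitive, and that keywords arrive as a semicolon-separated string ending in a period. The names `SYNONYMS`, `DEPRECATED`, `split_keywords`, `normalize`, `deprecated`, and `strip_htgs_phase` are hypothetical helpers introduced here.

```python
import re

# Assumption: rows in the table with identical descriptions are synonyms.
SYNONYMS = {
    "expressed sequence tag": "EST",
    "genome survey sequence": "GSS",
    "Metagenome Assembled Genome": "MAG",
    "sequence tagged site": "STS",
    "Third Party Data": "TPA",
    "Transcriptome Shotgun Assembly": "TSA",
}

# Forms the table marks as deprecated.
DEPRECATED = {
    "EST (expressed sequence tag)",
    "STS (sequence tagged site)",
    "HTGS_PHASE3",
}

HTGS_PHASE = re.compile(r"^HTGS_PHASE[0-3]$")


def split_keywords(kw_line):
    """Split a string such as 'A; B; C.' into ['A', 'B', 'C']."""
    text = kw_line.strip().rstrip(".")
    return [k.strip() for k in text.split(";") if k.strip()]


def normalize(keywords):
    """Map synonyms to their short form, keeping order and dropping repeats."""
    seen, out = set(), []
    for kw in keywords:
        canon = SYNONYMS.get(kw, kw)
        if canon not in seen:
            seen.add(canon)
            out.append(canon)
    return out


def deprecated(keywords):
    """Return the keywords that the table lists as deprecated."""
    return [k for k in keywords if k in DEPRECATED]


def strip_htgs_phase(keywords):
    """Drop HTGS_PHASEx keywords, as the table asks once a sequence is finished."""
    return [k for k in keywords if not HTGS_PHASE.match(k)]


if __name__ == "__main__":
    kws = split_keywords("HTG; HTGS_PHASE3; expressed sequence tag; EST.")
    print(normalize(kws))                    # ['HTG', 'HTGS_PHASE3', 'EST']
    print(deprecated(kws))                   # ['HTGS_PHASE3']
    print(strip_htgs_phase(normalize(kws)))  # ['HTG', 'EST']
```

Keeping the synonym map separate from the deprecated set means a new row in the table needs an entry in only one place; whether the INSDC databases themselves treat the long and short forms as interchangeable should be confirmed against their own documentation.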