INSDC

INSDC agreed methodological keywords

Methodological keywords are added at the discretion of the INSDC from a controlled vocabulary of terms that explicitly describe the type of sequence contained within the entry.

Keyword Description
BARCODE a genetic barcode sequence that meets the standards as defined by CBoL (Consortium for the Barcode of Life)
CAGE (Cap Analysis Gene Expression) short sequences from the 5′ end of mRNA, obtained using Cap Analysis Gene Expression
CAP trapper sequences obtained from cDNAs created using cap-trapping
ENV A sequence derived from an environmental sample (see the description of the /environmental_sample qualifier in the Feature Table document)
EST Short single pass cDNA sequences with no annotation other than a source feature.
expressed sequence tag Short single pass cDNA sequences with no annotation other than a source feature.
EST (expressed sequence tag) Deprecated.
3′-end sequence (3′-EST) EST sequence from the 3′ direction
5′-end sequence (5′-EST) EST sequence from the 5′ direction
FLI_CDNA A sequence derived from a cDNA library created using full-mRNA cloning methods
GSS Genome survey sequence, short single pass genomic sequences
genome survey sequence Genome survey sequence, short single pass genomic sequences
HTC High-throughput cDNA record from full length cDNA sequencing projects.
HTG High-throughput genomic sequence, submitted mainly by genome sequencing projects that treat a clone as the sequencing unit.
HTGS_PHASE0 Sequence consists of an unordered, unoriented set of sequencing reads (typically 100-200), unannotated and containing gaps
HTGS_PHASE1 Sequence consists of unfinished contigs that may be unordered and unoriented, with gaps, with or without annotation.
HTGS_PHASE2 Sequence consists of unfinished, ordered, oriented contigs, with or without gaps, with or without annotation.
HTGS_PHASE3 deprecated; once the sequence is considered to be finished, the HTGS_PHASEx keyword should be removed
HTGS_DRAFT Sequence at draft stage. These should be, on average over all draft depositions, at least 4X coverage (but can be more) and are Phase 1 or Phase 2.
HTGS_ENRICHED The BAC assembly is enriched by inclusion of both BAC generated reads and overlapping WGS reads.
HTGS_POOLED_CLONE The assembly consists of a specific BAC clone’s reads that were deconvoluted from an array of pooled clones; contains overlapping reads from WGS sequencing if used in conjunction with HTGS_ENRICHED.
HTGS_POOLED_MULTICLONE The assembly consists of reads from multiple BAC clones that have not yet been deconvoluted from an array of pooled clones.
UNORDERED For CON records, order and orientation of components are unknown.
oligo capping sequences obtained from cDNAs created using oligo-capping
MAG Metagenome Assembled Genomes (MAGs) are prokaryotic, eukaryotic or viral genomes that have been constructed from DNA or RNA sequences isolated from one or more environmental samples or samples containing more than one organism. The sequences are computationally binned and assembled into genomes, and each MAG is asserted to represent the genome of a single organism.
Metagenome Assembled Genome Metagenome Assembled Genomes (MAGs) are prokaryotic, eukaryotic or viral genomes that have been constructed from DNA or RNA sequences isolated from one or more environmental samples or samples containing more than one organism. The sequences are computationally binned and assembled into genomes, and each MAG is asserted to represent the genome of a single organism.
STS Sequence tagged site; a tag site for genome sequencing. Chromosome, map and PCR_condition information is mandatory for this division.
sequence tagged site Sequence tagged site; a tag site for genome sequencing. Chromosome, map and PCR_condition information is mandatory for this division.
STS (sequence tagged site) Deprecated.
TPA Re-annotations, assemblies or re-assemblies of primary sequences deposited in INSDC, the Trace Archive (TA) or the Short-Read Archive (SRA) that have been the focus of a peer-reviewed publication.
Third Party Data Re-annotations, assemblies or re-assemblies of primary sequences deposited in INSDC, the Trace Archive (TA) or the Short-Read Archive (SRA) that have been the focus of a peer-reviewed publication.
TPA:experimental TPA sequences where the annotations presented are supported by wet-lab experimental evidence
TPA:inferential TPA sequences where the annotations presented are not supported by wet-lab experimental evidence
TPA:assembly TPA sequences that are a re-assembly of an existing genome or large genomic region or TPA sequences that are a de-novo assembly of existing primary sequences deposited in INSDC
TPA:specialist_db TPA sequences that are submitted from an existing authoritative public database that is built using INSDC sequence data and is described in an accepted peer-reviewed publication
TSA shotgun assemblies of primary transcriptome data deposited in INSDC, the Trace Archive (TA) or the Short-Read Archive (SRA)
Transcriptome Shotgun Assembly shotgun assemblies of primary transcriptome data deposited in INSDC, the Trace Archive (TA) or the Short-Read Archive (SRA)
WGS Whole genome shotgun data from a sequencing project where the data are likely to be completely reassembled with no tracking between reassembly updates.
STANDARD_DRAFT Standard draft genome sequence data: any number of reads, or runs on any number of different sequencing platforms, assembled into contigs.
HIGH_QUALITY_DRAFT This should refer to a product with overall coverage representing greater than 90% of the genome or target region. Efforts should be made to include only sequence of the target organism and exclude contaminating sequences (e.g. from tissue cell lines).
IMPROVED_HIGH_QUALITY_DRAFT This standard refers to a high-quality-draft assembled sequence where additional work has been performed beyond the initial shotgun sequencing and standard assembly using either manual or automated methods. In addition, this standard should contain no discernible misassemblies, and if feasible, should have attempted to reduce the number of contigs (thus reducing the number of gaps) and supercontigs (or scaffolds).
ANNOTATION_GRADE This standard may overlap with the previous two standards of high quality and improved draft, but emphasizes the verification and correction of anomalies within coding regions, such as frameshifts and stop codons, particularly in sequences of interest within the specific genome.
NON_CONTIGUOUS_FINISHED Describes high-quality assemblies that have been subject to automated and manual improvement, and where closure approaches have been successful for almost all gaps, misassemblies, and low-quality regions. Attempts have been made to resolve all gap and sequence uncertainties, and only those recalcitrant to resolution remain (with notations in the genome submission as to the nature of the uncertainty).
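
Because the keywords come from a controlled vocabulary, an entry's keyword list can be checked mechanically against the table above. The following is a minimal Python sketch of such a check, not an official INSDC tool. The function name `classify_keywords`, the semicolon-separated input with an optional trailing period, and the exact-match (case-sensitive) comparison are all assumptions made for illustration. The sets list only the short keyword forms from the table and omit the long-form aliases.

```python
# Short-form keywords taken from the table above (long-form aliases omitted).
ACTIVE = {
    "BARCODE", "CAGE", "ENV", "EST", "FLI_CDNA", "GSS", "HTC", "HTG",
    "HTGS_PHASE0", "HTGS_PHASE1", "HTGS_PHASE2", "HTGS_DRAFT",
    "HTGS_ENRICHED", "HTGS_POOLED_CLONE", "HTGS_POOLED_MULTICLONE",
    "UNORDERED", "MAG", "STS", "TPA", "TPA:experimental", "TPA:inferential",
    "TPA:assembly", "TPA:specialist_db", "TSA", "WGS", "STANDARD_DRAFT",
    "HIGH_QUALITY_DRAFT", "IMPROVED_HIGH_QUALITY_DRAFT", "ANNOTATION_GRADE",
    "NON_CONTIGUOUS_FINISHED",
}

# Keywords the table marks as deprecated.
DEPRECATED = {
    "HTGS_PHASE3",
    "EST (expressed sequence tag)",
    "STS (sequence tagged site)",
}


def classify_keywords(keyword_field):
    """Sort the keywords in a field into active, deprecated and unknown.

    Assumption: the field is a semicolon-separated string that may end
    with a period, e.g. "HTG; HTGS_PHASE1; HTGS_DRAFT.".
    """
    result = {"active": [], "deprecated": [], "unknown": []}
    for token in keyword_field.strip().rstrip(".").split(";"):
        keyword = token.strip()
        if not keyword:
            continue  # skip empty fragments such as a blank field
        if keyword in ACTIVE:
            result["active"].append(keyword)
        elif keyword in DEPRECATED:
            result["deprecated"].append(keyword)
        else:
            result["unknown"].append(keyword)
    return result


if __name__ == "__main__":
    print(classify_keywords("HTG; HTGS_PHASE3; HTGS_DRAFT."))
```

Any keyword reported as unknown may simply be a non-methodological keyword or a long-form alias, so the result is a prompt for review rather than a verdict on the entry.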