INSDC agreed methodological keywords

Methodological keywords are added at the discretion of the INSDC from a controlled vocabulary of terms that explicitly describe the type of sequence contained within the entry.

| Keyword | Description |
| --- | --- |
| BARCODE | A genetic barcode sequence that meets the standards defined by CBoL (Consortium for the Barcode of Life). |
| CAGE (Cap Analysis Gene Expression) | Short sequences from the 5′ end of mRNA, obtained by Cap Analysis Gene Expression. |
| CAP trapper | Sequences obtained from cDNAs created using cap-trapping. |
| ENV | A sequence derived from an environmental sample (see the description of the /environmental_sample qualifier in the Feature Table document). |
| EST | Short single pass cDNA sequences with no annotation other than a source feature. |
| expressed sequence tag | Short single pass cDNA sequences with no annotation other than a source feature. |
| EST (expressed sequence tag) | Deprecated. |
| 3′-end sequence (3′-EST) | EST sequence from the 3′ direction. |
| 5′-end sequence (5′-EST) | EST sequence from the 5′ direction. |
| FLI_CDNA | A sequence derived from cDNA libraries created using full-mRNA cloning methods. |
| GSS | Genome survey sequence; short single pass genomic sequences. |
| genome survey sequence | Genome survey sequence; short single pass genomic sequences. |
| HTC | High-throughput cDNA record from full length cDNA sequencing projects. |
| HTG | Sequences submitted mainly by genome sequencing projects that treat a clone as the sequencing unit. |
| HTGS_PHASE0 | Sequence consists of a set of sequencing reads (typically 100-200) that is unordered, unoriented, unannotated and contains gaps. |
| HTGS_PHASE1 | Sequence consists of unfinished contigs that may be unordered and unoriented, with gaps, with or without annotation. |
| HTGS_PHASE2 | Sequence consists of unfinished, ordered, oriented contigs, with or without gaps, with or without annotation. |
| HTGS_PHASE3 | Deprecated; once the sequence is considered to be finished, the HTGS_PHASEx keyword should be removed. |
| HTGS_DRAFT | Sequence at draft stage. Averaged over all draft depositions, these should have at least 4X coverage (but can have more) and are Phase 1 or Phase 2. |
| HTGS_ENRICHED | The BAC assembly is enriched by inclusion of both BAC generated reads and overlapping WGS reads. |
| HTGS_POOLED_CLONE | The assembly consists of a specific BAC clone’s reads that were deconvoluted from an array of pooled clones; contains overlapping reads from WGS sequencing if used in conjunction with HTGS_ENRICHED. |
| HTGS_POOLED_MULTICLONE | The assembly consists of reads from multiple BAC clones that have not yet been deconvoluted from an array of pooled clones. |
| UNORDERED | For CON records, order and orientation of components are unknown. |
| oligo capping | Sequences obtained from cDNAs created using oligo-capping. |
| MAG | Metagenome Assembled Genomes (MAGs) are prokaryotic, eukaryotic or viral genomes that have been constructed from DNA or RNA sequences isolated from one or more environmental samples or samples containing more than one organism. The sequences are computationally binned and assembled into genomes, and each MAG is asserted to represent the genome of a single organism. |
| Metagenome Assembled Genome | Metagenome Assembled Genomes (MAGs) are prokaryotic, eukaryotic or viral genomes that have been constructed from DNA or RNA sequences isolated from one or more environmental samples or samples containing more than one organism. The sequences are computationally binned and assembled into genomes, and each MAG is asserted to represent the genome of a single organism. |
| STS | Sequence tagged site; the tag site for genome sequencing. Chromosome, map and PCR_condition information is mandatory for this division. |
| sequence tagged site | Sequence tagged site; the tag site for genome sequencing. Chromosome, map and PCR_condition information is mandatory for this division. |
| STS (sequence tagged site) | Deprecated. |
| TPA | Re-annotations, assemblies or re-assemblies of primary sequences deposited in INSDC, the Trace Archive (TA) or the Short-Read Archive (SRA) that have been the focus of a peer-reviewed publication. |
| Third Party Data | Re-annotations, assemblies or re-assemblies of primary sequences deposited in INSDC, the Trace Archive (TA) or the Short-Read Archive (SRA) that have been the focus of a peer-reviewed publication. |
| TPA:experimental | TPA sequences where the annotations presented are supported by wet-lab experimental evidence. |
| TPA:inferential | TPA sequences where the annotations presented are not supported by wet-lab experimental evidence. |
| TPA:assembly | TPA sequences that are a re-assembly of an existing genome or large genomic region, or TPA sequences that are a de novo assembly of existing primary sequences deposited in INSDC. |
| TPA:specialist_db | TPA sequences that are submitted from an existing authoritative public database that is built using INSDC sequence data and is described in an accepted peer-reviewed publication. |
| TSA | Shotgun assemblies of primary transcriptome data deposited in INSDC, the Trace Archive (TA) or the Short-Read Archive (SRA). |
| Transcriptome Shotgun Assembly | Shotgun assemblies of primary transcriptome data deposited in INSDC, the Trace Archive (TA) or the Short-Read Archive (SRA). |
| WGS | Whole genome shotgun data from a sequencing project where the data are likely to be completely reassembled with no tracking between reassembly updates. |
| STANDARD_DRAFT | Standard draft genome sequence data may come from any number of reads, or runs on any number of different sequencing platforms, assembled into sequenced contigs. |
| HIGH_QUALITY_DRAFT | This should refer to a product with overall coverage representing greater than 90% of the genome or target region. Efforts should be made to include only sequence of the target organism and exclude contaminating sequences (e.g. from tissue cell lines). |
| IMPROVED_HIGH_QUALITY_DRAFT | This standard refers to a high-quality-draft assembled sequence where additional work has been performed beyond the initial shotgun sequencing and standard assembly, using either manual or automated methods. In addition, this standard should contain no discernible misassemblies and, if feasible, should have attempted to reduce the number of contigs (thus reducing the number of gaps) and supercontigs (or scaffolds). |
| ANNOTATION_GRADE | This standard may overlap with the previous two standards of high quality and improved draft, but emphasizes the verification and correction of anomalies within coding regions, such as frameshifts and stop codons, particularly in sequences of interest within the specific genome. |
| NON_CONTIGUOUS_FINISHED | Describes high-quality assemblies that have been subject to automated and manual improvement, and where closure approaches have been successful for almost all gaps, misassemblies, and low-quality regions. Attempts have been made to resolve all gap and sequence uncertainties, and only those recalcitrant to resolution remain (with notations in the genome submission as to the nature of the uncertainty). |
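
Because the vocabulary is controlled, the keywords on a record can be checked mechanically. The following is a minimal Python sketch, not an official INSDC tool. It assumes the keyword string is semicolon-separated and ends with a period, as in a flat-file keywords line. It covers only a hand-picked subset of the table above, and it treats rows that share a description (for example EST and expressed sequence tag) as synonyms. The names `CANONICAL`, `DEPRECATED` and `classify_keywords` are hypothetical.

```python
# Hypothetical subset of the vocabulary: each accepted spelling maps to a
# short form. Grouping the long and short spellings as synonyms is an
# assumption based on their identical descriptions in the table.
CANONICAL = {
    "EST": "EST",
    "expressed sequence tag": "EST",
    "GSS": "GSS",
    "genome survey sequence": "GSS",
    "STS": "STS",
    "sequence tagged site": "STS",
    "MAG": "MAG",
    "Metagenome Assembled Genome": "MAG",
    "TSA": "TSA",
    "Transcriptome Shotgun Assembly": "TSA",
    "HTG": "HTG",
    "HTGS_PHASE0": "HTGS_PHASE0",
    "HTGS_PHASE1": "HTGS_PHASE1",
    "HTGS_PHASE2": "HTGS_PHASE2",
    "HTGS_DRAFT": "HTGS_DRAFT",
    "WGS": "WGS",
}

# Entries the table marks as deprecated.
DEPRECATED = {
    "EST (expressed sequence tag)",
    "STS (sequence tagged site)",
    "HTGS_PHASE3",
}


def classify_keywords(keywords_line: str) -> dict:
    """Sort the keywords of one record into recognized, deprecated and other."""
    result = {"recognized": [], "deprecated": [], "other": []}
    # Drop the trailing period, then split on the assumed ';' separator.
    for raw in keywords_line.rstrip(".").split(";"):
        keyword = raw.strip()
        if not keyword:
            continue  # an empty keywords line is just "."
        if keyword in DEPRECATED:
            result["deprecated"].append(keyword)
        elif keyword in CANONICAL:
            result["recognized"].append(CANONICAL[keyword])
        else:
            # Not in this subset: may be a free-text keyword or a
            # methodological keyword the sketch does not list.
            result["other"].append(keyword)
    return result


report = classify_keywords("HTG; HTGS_PHASE3; expressed sequence tag; kinase.")
print(report)
```

Matching here is exact and case-sensitive; a keyword that lands in `other` is not necessarily invalid, since the lookup tables cover only part of the vocabulary.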