The International Nucleotide Sequence Database Collaboration (INSDC) has a standardised missing/null value reporting language. It is to be used where a value of the expected format for sample metadata reporting cannot be provided.
The controlled vocabulary distinguishes between different reasons why a value may be absent. Submitters are strongly encouraged to always provide true values. However, if missing/null value reporting is required, submitters are asked to use the term with the finest granularity that fits their situation. The accepted missing value reporting terms are listed in the table below.
INSDC Missing Value Reporting Terms:
| INSDC term (top level) | INSDC term (lower level) | Definition | INSDC term (reporting level) | Definition |
|---|---|---|---|---|
| not applicable | | Information is inappropriate to report; can indicate that the standard itself fails to model or represent the information appropriately. | control sample | Information is not applicable as the sample represents a negative control sample collected in a lab. |
| | | | sample group | Information is not applicable as the sample represents a group of samples that do not have a single origin, e.g. for co-assembly or transcriptome assembly. |
| missing | not collected | Information of an expected format was not given because it has not been collected. | synthetic construct | Information does not exist as the sample represents an ab-initio synthetic construct. |
| | | | lab stock | Information was not collected as the sample represents a cultured cell line or model organism under long-term lab control. |
| | | | third party data | Information does not exist as the metadata was not collected or reported in records predating the 2023 agreement. For use in third-party data submissions. |
| | not provided | Information of an expected format was not given; a value may be given at a later stage. | data agreement established pre-2023 | Data agreements were established before the 2023 INSDC standard and metadata cannot be provided. A value may be given at a later stage. |
| | restricted access | Information exists but cannot be released openly because of privacy concerns. | endangered species | Information cannot be reported as the target organism is endangered, e.g. on the IUCN Red List. |
| | | | human-identifiable | Information cannot be reported as the metadata would make the sample human-identifiable. |
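The table is a three-level hierarchy, with each reporting-level term sitting under a top-level term and, for "missing", a lower-level term. One way to make that hierarchy explicit is a lookup keyed by reporting-level term. This is a minimal sketch; the names `VOCABULARY` and `lineage` are hypothetical and not part of any INSDC or ENA tool.

```python
from __future__ import annotations

# Hypothetical encoding of the INSDC missing value table.
# Each reporting-level term maps to its (top level, lower level) parents;
# "not applicable" has no lower-level term, so None is stored there.
VOCABULARY: dict[str, tuple[str, str | None]] = {
    "control sample": ("not applicable", None),
    "sample group": ("not applicable", None),
    "synthetic construct": ("missing", "not collected"),
    "lab stock": ("missing", "not collected"),
    "third party data": ("missing", "not collected"),
    "data agreement established pre-2023": ("missing", "not provided"),
    "endangered species": ("missing", "restricted access"),
    "human-identifiable": ("missing", "restricted access"),
}


def lineage(reporting_term: str) -> list[str]:
    """Return the path from the top-level term down to the reporting-level term."""
    top, lower = VOCABULARY[reporting_term]  # raises KeyError for unknown terms
    # Skip the missing lower level for "not applicable" terms.
    return [t for t in (top, lower, reporting_term) if t is not None]
```

For example, `lineage("lab stock")` gives `["missing", "not collected", "lab stock"]`, while `lineage("control sample")` gives `["not applicable", "control sample"]`.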
Usage of INSDC Missing Value Reporting Terms:
Please use the standardised missing value vocabulary above only if a true value of the expected format for a mandatory field is missing. If a true value is missing for a recommended or optional field, leave that field out of the submission entirely. When reporting a missing mandatory field, each of the eight granular ‘reporting level’ terms must be preceded by the term ‘missing: ’ to declare both the absence of a true value and the reason.
Example of usage:
geographic location (country and/or sea): missing: data agreement established pre-2023
collection date: missing: control sample
geographic location (country and/or sea): missing: human-identifiable
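The usage rules above (only for mandatory fields, always the ‘missing: ’ prefix followed by one reporting-level term) can be checked before submission. The following is a minimal sketch, assuming exact, case-sensitive matching against the eight terms as spelled in the table. `format_missing` and `is_valid_missing` are hypothetical helpers, not the behaviour of any official INSDC or ENA validator, which may accept other forms.

```python
from __future__ import annotations

# The eight reporting-level terms, spelled exactly as in the INSDC table.
REPORTING_TERMS = frozenset({
    "control sample",
    "sample group",
    "synthetic construct",
    "lab stock",
    "third party data",
    "data agreement established pre-2023",
    "endangered species",
    "human-identifiable",
})

PREFIX = "missing: "


def format_missing(term: str, mandatory: bool) -> str | None:
    """Build the value to report for a field whose true value is unavailable.

    Returns None for recommended/optional fields, which should be omitted.
    """
    if not mandatory:
        return None
    if term not in REPORTING_TERMS:
        raise ValueError(f"not an INSDC reporting-level term: {term!r}")
    return PREFIX + term


def is_valid_missing(value: str) -> bool:
    """Check that a value is 'missing: ' followed by exactly one reporting-level term."""
    return value.startswith(PREFIX) and value[len(PREFIX):] in REPORTING_TERMS
```

With these helpers, `format_missing("control sample", mandatory=True)` returns `"missing: control sample"`, and a misspelled value such as `"missing: data agreement-established pre-2023"` fails `is_valid_missing` because the hyphenated form does not match the controlled term.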