{"id":309,"date":"2023-03-03T10:03:23","date_gmt":"2023-03-03T10:03:23","guid":{"rendered":"https:\/\/www.insdc.org\/?post_type=news&p=309"},"modified":"2024-05-09T11:53:07","modified_gmt":"2024-05-09T10:53:07","slug":"insdc-spatiotemporal-metadata-minimum-standards-update-03-03-2023","status":"publish","type":"news","link":"https:\/\/www.insdc.org\/news\/insdc-spatiotemporal-metadata-minimum-standards-update-03-03-2023\/","title":{"rendered":"INSDC spatiotemporal metadata – minimum standards update (03-03-2023)"},"content":{"rendered":"\n<p class=\"vf-text--body vf-text-body--2\">INSDC continues its aim to increase the number of sequences for which the origin of the sample can be precisely located in time and space through harmonisation of accurate geographical annotation and date and time of collection information.<\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\">In this update, INSDC will elaborate on the plans for the new standards being introduced for spatiotemporal metadata as well as the next steps for implementation.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\"><strong>Technical implementation<\/strong><\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\">Mandatory spatiotemporal data will be captured in pre-existing fields. For sequence flat files, the data will be captured in the source qualifiers: \u2018country\u2019 and \u2018collection_date\u2019; for BioSamples the data will be captured in country and collection date attributes. 
BioSample fields, implementation and tooling may differ between partners and the INSDC partners may follow this announcement with individual statements about implementation.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\">Minimum reporting requirements for these fields are as follows, though further granularity is encouraged:<\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\"><strong>Location of collection: <\/strong>the locality of isolation of the sequenced sample should be indicated to country level at least and should be provided in terms of political names for nations, oceans or seas using values from the controlled vocabulary at <a href=\"http:\/\/www.insdc.org\/documents\/country-qualifier-vocabulary\">http:\/\/www.insdc.org\/documents\/country-qualifier-vocabulary<\/a><\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\"><strong>Date\/time of collection: <\/strong>the date and time at which the specimen was collected should be provided, at least to the nearest year.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\">INSDC recognises that there are valid exemptions from this rule. Accordingly, the INSDC \u2018<a href=\"https:\/\/www.insdc.org\/submitting-standards\/missing-value-reporting\/\">missing value<\/a>\u2019 reporting standards will be extended to add another layer of granularity so users can report specific use-cases where they are unable to report spatiotemporal metadata. Previous \u2018lower-level\u2019 terms \u2018not collected\u2019, \u2018not provided\u2019 and \u2018restricted access\u2019 will be split into different use-case specific missing values which will form a new set of \u2018reporting-level\u2019 terms. Users will be encouraged to use these new \u2018reporting-level\u2019 INSDC missing value terms going forward. 
Although other missing values may remain in place for backwards compatibility purposes, partners may discontinue using \u2018lower-level\u2019 or \u2018top-level\u2019 terms in future. The \u2018reporting-level\u2019 terms that will be added are detailed below:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-table is-style-regular\"><table><tbody><tr><td class=\"has-text-align-left\" data-align=\"left\"><em>control sample<\/em><\/td><td class=\"has-text-align-left\" data-align=\"left\">Information is not applicable as the sample represents a negative control sample collected in a lab.<\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\"><em>sample group<\/em><\/td><td class=\"has-text-align-left\" data-align=\"left\">Information is not applicable as the sample represents a group of samples that do not have a single origin, e.g. for co-assembly or transcriptome assembly.<\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\"><em>synthetic construct<\/em><\/td><td class=\"has-text-align-left\" data-align=\"left\">Information does not exist as the sample represents an ab-initio synthetic construct.<\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\"><em>lab stock<\/em><\/td><td class=\"has-text-align-left\" data-align=\"left\">Information was not collected as the sample represents a cultured cell line or model organism under long-term lab control.<\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\"><em>third party data<\/em><\/td><td class=\"has-text-align-left\" data-align=\"left\">Information does not exist as the metadata was not collected or reported in records predating the 2023 agreement. 
For use in <a href=\"https:\/\/www.insdc.org\/submitting-standards\/tpa-submission-guidelines\/\">Third Party<\/a> data submissions.<\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\"><em>data agreement established pre-2023<\/em><\/td><td class=\"has-text-align-left\" data-align=\"left\">Data agreements were established before introduction of the 2023 INSDC spatiotemporal metadata standard and metadata cannot be provided. A value may be given at a later stage.<\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\"><em>endangered species<\/em><\/td><td class=\"has-text-align-left\" data-align=\"left\">Information cannot be reported as the target organism is endangered, e.g. on the IUCN Red List.<\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\"><em>human-identifiable<\/em><\/td><td class=\"has-text-align-left\" data-align=\"left\">Information cannot be reported as the metadata would make the sample human-identifiable.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\">Users can expect another announcement in a month’s time with an update to the <a href=\"https:\/\/www.insdc.org\/submitting-standards\/missing-value-reporting\/\">INSDC missing value reporting page<\/a>, at which point these terms will be usable for BioSample registration.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\"><strong>Timeline<\/strong><\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\">The timeline for implementation will be split across two main phases. These phases mark the main milestones at which the new standards will be put in place for different record types. Please note that between these phases INSDC partners may also introduce more complex validation to ensure correct usage of the missing values, e.g. 
cross-referencing \u2018model organism\u2019 exceptions against a list of valid taxa, or ensuring that \u2018sample group\u2019 declarations contain references to more than one individually registered BioSample.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\">See details of these key phases below.<\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\"><strong>Phase I – new standard in place for BioSamples by the end of May 2023<\/strong><\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\">It will become mandatory to provide country and collection date metadata for all newly registered BioSamples associated with INSDC data following this date unless a valid exemption is declared. As a result, all new raw (SRA\/ENA\/DRA) data and genomes will have associated spatiotemporal metadata in the BioSample.<\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\"><strong>Phase II – new standard in place for sequences by the end of Dec 2024<\/strong><\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\">It will become mandatory to provide country and collection date metadata for all newly submitted sequence records through any remaining submission routes within 2 years; this includes sequences submitted without BioSample references.<\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\">We thank users who have provided feedback on this so far and would like to encourage further feedback, particularly on whether there are exemptions you feel are applicable that are not yet catered for. 
Please provide your feedback to the INSDC member database to which you normally submit:<\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\">DDBJ: please email <a href=\"mailto: ddbjsub@ddbj.nig.ac.jp\" data-type=\"mailto\" data-id=\"mailto: ddbjsub@ddbj.nig.ac.jp\">ddbjsub@ddbj.nig.ac.jp <\/a><\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\">ENA (EMBL-EBI): please email <a href=\"mailto:ena-collaborations@ebi.ac.uk\">ena-collaborations@ebi.ac.uk<\/a><\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\">GenBank and SRA (NCBI): please email <a href=\"mailto:gb-admin@ncbi.nlm.nih.gov\">gb-admin@ncbi.nlm.nih.gov<\/a> <\/p>\n","protected":false},"excerpt":{"rendered":"<p>INSDC continues its aim to increase the number of sequences for which the origin of the sample can be precisely located in time and space through harmonisation of accurate geographical annotation and date and time of collection information. In this update, INSDC will elaborate on the plans for the…<\/p>\n","protected":false},"author":6,"featured_media":0,"template":"","acf":[],"_links":{"self":[{"href":"https:\/\/www.insdc.org\/wp-json\/wp\/v2\/news\/309"}],"collection":[{"href":"https:\/\/www.insdc.org\/wp-json\/wp\/v2\/news"}],"about":[{"href":"https:\/\/www.insdc.org\/wp-json\/wp\/v2\/types\/news"}],"author":[{"embeddable":true,"href":"https:\/\/www.insdc.org\/wp-json\/wp\/v2\/users\/6"}],"wp:attachment":[{"href":"https:\/\/www.insdc.org\/wp-json\/wp\/v2\/media?parent=309"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}