{"id":136,"date":"2022-06-22T11:07:53","date_gmt":"2022-06-22T10:07:53","guid":{"rendered":"https:\/\/insdc.org\/?page_id=136"},"modified":"2024-05-10T17:13:20","modified_gmt":"2024-05-10T16:13:20","slug":"insdc-standards-genome-assembly-submission","status":"publish","type":"page","link":"https:\/\/www.insdc.org\/submitting-standards\/insdc-standards-genome-assembly-submission\/","title":{"rendered":"INSDC standards for genome assembly submission"},"content":{"rendered":"\n<p class=\"vf-text--body vf-text-body--2\"><strong>This page represents a brief document describing at a high level INSDC requirements for assembly submission. Differences in submission requirements at lower levels are possible between INSDC databases (DDBJ, NCBI and ENA) and we recommend submitters refer to the corresponding websites of the submitting databases for more detailed instructions.<\/strong><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\">Genome assemblies comprise a number of possible layers of information, including reads, contigs, scaffolds and chromosomes (see figure I). This document lays out the requirements for submission of genome assembly information into INSDC databases.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"vf-figure wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"238\" height=\"300\" class=\"vf-figure__image\" src=\"http:\/\/insdc.org\/wp-content\/uploads\/2022\/06\/assembly_layers-500x631-1-238x300.png\" alt=\"\" class=\"wp-image-137\" srcset=\"https:\/\/www.insdc.org\/wp-content\/uploads\/2022\/06\/assembly_layers-500x631-1-238x300.png 238w, https:\/\/www.insdc.org\/wp-content\/uploads\/2022\/06\/assembly_layers-500x631-1.png 475w\" sizes=\"(max-width: 238px) 100vw, 238px\" \/><\/figure>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\"><strong>Figure I. The figure shows three typical assembly processes and the layers of information that they yield. A) Clone-based assembly with scaffolding and finishing steps. 
B) Shotgun assembly direct to chromosomes. C) Partial assembly to contigs only.<\/strong><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\">Consistent with the variety of assembly processes, submitters approach INSDC with data for different combinations of these layers. Tables I and II show the requirements for new genome assembly submissions and for updates to existing assembly submissions.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Table I. New genome assembly submissions<\/h4>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Component<\/strong><\/td><td><strong>Level<\/strong><\/td><td><strong>Comment<\/strong><\/td><\/tr><tr><td>Reads<\/td><td>Recommended<\/td><td>Complete read and quality data<\/td><\/tr><tr><td>Read to contig mapping<\/td><td rowspan=\"2\">One of, as appropriate, optional<\/td><td>e.g. BAM alignment of reads to contigs<\/td><\/tr><tr><td>Read to chromosome mapping<\/td><td>e.g. BAM alignment of reads to new chromosome<\/td><\/tr><tr><td>Contigs<\/td><td rowspan=\"3\">At least one layer mandatory<\/td><td rowspan=\"3\"> <\/td><\/tr><tr><td>Scaffolds<\/td><\/tr><tr><td>Chromosomes<\/td><\/tr><tr><td>Scaffold to chromosome mapping<\/td><td>Mandatory if both layers are present<\/td><td>e.g. AGP file<\/td><\/tr><tr><td>Contig to scaffold mapping<\/td><td>Mandatory if both layers are present<\/td><td>e.g. AGP file<\/td><\/tr><tr><td>Assembly description<\/td><td>Mandatory<\/td><td>Brief information relating to assembly and future plans<\/td><\/tr><tr><td>Functional annotation<\/td><td>Optional<\/td><td> <\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Table II. 
Update to existing genome assembly<\/h4>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Component<\/strong><\/td><td><strong>Level<\/strong><\/td><td><strong>Comment<\/strong><\/td><\/tr><tr><td>Reads<\/td><td>Recommended<\/td><td>Complete read and quality data<\/td><\/tr><tr><td>Read to contig mapping<\/td><td rowspan=\"2\">One of, as appropriate, optional<\/td><td>e.g. BAM alignment of reads to contigs<\/td><\/tr><tr><td>Read to chromosome mapping<\/td><td>e.g. BAM alignment of reads to new chromosome<\/td><\/tr><tr><td>Contigs<\/td><td rowspan=\"3\">At least one layer mandatory, with highest layer no lower than for existing assembly<\/td><td rowspan=\"3\"> <\/td><\/tr><tr><td>Scaffolds<\/td><\/tr><tr><td>Chromosomes<\/td><\/tr><tr><td>Scaffold to chromosome mapping<\/td><td>Mandatory if both layers are present<\/td><td>e.g. AGP file<\/td><\/tr><tr><td>Contig to scaffold mapping<\/td><td>Mandatory if both layers are present<\/td><td>e.g. AGP file<\/td><\/tr><tr><td>Assembly description<\/td><td>Mandatory<\/td><td>Brief information relating to assembly and future plans<\/td><\/tr><tr><td>Regenerated (or lifted-over) functional annotation<\/td><td>Recommended<\/td><td>If associated with existing assembly<\/td><\/tr><tr><td>Coding annotation mappings between old and new assemblies<\/td><td>Recommended where functional annotation is provided for the updated assembly<\/td><td>Typically through INSDC protein ID mappings<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p class=\"vf-text--body vf-text-body--2\">Third party genome assembly submissions and updates, in which the submitting group does not hold complete ownership of data, are subject to existing third party data rules, including the requirement for presentation of the new\/updated genome assembly in a peer reviewed publication prior to public release from ENA.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This page represents a brief document describing at a high 
level INSDC requirements for assembly submission. Differences in submission requirements at lower levels are possible between INSDC databases (DDBJ, NCBI and ENA) and we recommend submitters refer to the corresponding websites of the…<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":233,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"footnotes":""},"acf":[],"_links":{"self":[{"href":"https:\/\/www.insdc.org\/wp-json\/wp\/v2\/pages\/136"}],"collection":[{"href":"https:\/\/www.insdc.org\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.insdc.org\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.insdc.org\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.insdc.org\/wp-json\/wp\/v2\/comments?post=136"}],"version-history":[{"count":9,"href":"https:\/\/www.insdc.org\/wp-json\/wp\/v2\/pages\/136\/revisions"}],"predecessor-version":[{"id":567,"href":"https:\/\/www.insdc.org\/wp-json\/wp\/v2\/pages\/136\/revisions\/567"}],"up":[{"embeddable":true,"href":"https:\/\/www.insdc.org\/wp-json\/wp\/v2\/pages\/233"}],"wp:attachment":[{"href":"https:\/\/www.insdc.org\/wp-json\/wp\/v2\/media?parent=136"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}