|
Types of sequence deposited:
Two main types of sequence are expected to be deposited at the CG-HLB Genome Resources Website: (1.) short regions of sequence covering a relatively small number of loci (such as those that have been used for phylogenetic analyses of different strains) and (2.) data from larger-scale sequencing efforts that will serve as starting material for multiple iterations of assembly and annotation.
Creation of documentation tables
A documentation table will be created for each new sequence submission, whether submitted directly or via NCBI, and each table will be assigned a reference number (1, 2, 3, etc.).
- Information included
(1.) research group or individual who submitted/generated the data and (2.) information on the source of the genetic material, and methods/programs used for sequencing and/or annotation.
- Intent
(1.) to maintain clear documentation of the source sequence(s) and methods used to generate successive iterations of sequence assembly and annotation and (2.) to make clear which annotation files are associated with which sequence.
Version Control
To minimize confusion arising from the presence of multiple versions of sequence and annotation files, sequence files will be assigned a prefix name according to the presumed species of origin followed by a number (e.g. “las0001-[descriptor].txt”). All annotation files generated and posted on the website for this sequence will also begin with the prefix “las0001” and will be added to the documentation table created for the sequence upon its original posting. Subsequent sequence submissions will warrant creation of new documentation tables.
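The naming convention above can be illustrated with a minimal Python sketch. The four-digit zero padding is inferred from the “las0001” example, and the function name make_filename and the descriptor "annotation" are hypothetical, chosen only for illustration.

```python
def make_filename(species: str, number: int, descriptor: str, ext: str = "txt") -> str:
    """Build a file name such as 'las0001-pseudomol.txt'.

    Assumption: the numeric part is zero-padded to four digits,
    as suggested by the 'las0001' example.
    """
    prefix = f"{species}{number:04d}"
    return f"{prefix}-{descriptor}.{ext}"


# A sequence file and an annotation file for the same sequence share one prefix.
sequence_file = make_filename("las", 1, "pseudomol")
annotation_file = make_filename("las", 1, "annotation")  # hypothetical descriptor
```

Because both names are built from the same species code and number, the shared prefix ties every annotation file to its sequence.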
Changes to existing sequence without addition of novel sequence data
If a subsequent sequence file is created by reordering or truncating a previous submission without addition of new sequence data, it will be treated as a version of the previous submission. For example, reordering of the las0001.txt sequence file would result in a file with the prefix “las0001.1”. All annotations for the “las0001.1” sequence would also be given the prefix number “las0001.1”.
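As a rough sketch of this rule, the version suffix could be derived from an existing prefix as shown below. The function name next_version is hypothetical, and the assumption that a second rearrangement of the same data would become “.2” goes beyond what the guidelines state explicitly.

```python
import re

# Species code, four-digit number, optional ".N" version suffix.
PREFIX_RE = re.compile(r"^([a-z]+)(\d{4})(?:\.(\d+))?$")


def next_version(prefix: str) -> str:
    """Return the prefix for a reordered or truncated copy of `prefix`."""
    match = PREFIX_RE.match(prefix)
    if match is None:
        raise ValueError(f"unrecognized prefix: {prefix!r}")
    species, number, version = match.groups()
    # No suffix yet means this is the first derived version.
    new_version = int(version) + 1 if version else 1
    return f"{species}{number}.{new_version}"


reordered = next_version("las0001")  # "las0001.1", matching the example above
```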
Changes to an existing sequence involving addition of novel sequence
If the newly posted sequence includes novel sequences or sequences derived from multiple previous submissions, the new sequence will be treated as a new sequence worthy of a new primary accession number (e.g. “las0002”). All annotations applicable to the new version of the sequence will be given the prefix number “las0002”.
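A minimal sketch of how a new primary number might be chosen is given below. The function name next_primary is hypothetical, and the approach of taking the highest number used so far plus one is an assumption; the guidelines only require that a new primary number be assigned. Prefixes of archived data should be included in the input so that their numbers are not reused.

```python
import re


def next_primary(species: str, used_prefixes: list[str]) -> str:
    """Return the next unused primary prefix for `species`, e.g. 'las0002'."""
    # Match this species' prefixes, ignoring any ".N" version suffix.
    pattern = re.compile(rf"^{re.escape(species)}(\d{{4}})(?:\.\d+)?$")
    numbers = [
        int(m.group(1))
        for p in used_prefixes
        if (m := pattern.match(p))
    ]
    return f"{species}{max(numbers, default=0) + 1:04d}"


# Novel sequence data added to las0001 yields a new primary prefix.
new_prefix = next_primary("las", ["las0001", "las0001.1"])
```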
Sequences assigned GenBank accession numbers will be identified by those numbers. However, GenBank numbers will not be propagated in new version names so as to avoid confusion with GenBank’s own system of version designation.
Example
An example of a documentation table is posted on http://www.citrusgreening.org/HLB/Data-home.aspx. This page (as of 11-16-08) shows documentation table #1, which contains a posting of the sequence las0001-pseudomol.txt. This sequence represents a reordering and joining of sequence data with NCBI accession number ABQW00000000, but in keeping with the guidelines above, the NCBI accession number is not itself propagated. Associated annotation files also begin with the “las0001” prefix. As with the sequence file itself, a short descriptive term follows the prefix to orient users as to its contents.
Archives
As genome sequencing and annotation progress, it is likely that some earlier submissions will become obsolete for one reason or another. Obsolete files or tables will be moved to an archive page and their location on the main data page replaced with a short note indicating that the file has been archived and for what reason. In cases where data was retracted outright, archiving is not a requirement. Reference numbers previously used for documentation tables and file prefixes of obsoleted data will not be reused for new submissions.
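The archiving rules can be sketched as a small registry in which reference numbers are issued in increasing order and never handed out again, even after a table has been archived. The class name TableRegistry and its fields are hypothetical, and the sketch does not describe how the website itself is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class TableRegistry:
    """Tracks documentation tables by reference number (illustrative only)."""
    active: dict = field(default_factory=dict)    # ref number -> description
    archived: dict = field(default_factory=dict)  # ref number -> (description, reason)
    last_issued: int = 0

    def add(self, description: str) -> int:
        # Numbers only ever increase, so archived numbers are never reused.
        self.last_issued += 1
        self.active[self.last_issued] = description
        return self.last_issued

    def archive(self, ref: int, reason: str) -> None:
        # Move the table to the archive and record why it was archived.
        self.archived[ref] = (self.active.pop(ref), reason)


registry = TableRegistry()
first = registry.add("las0001 pseudomolecule")
registry.archive(first, "superseded by a later assembly")
second = registry.add("las0002 assembly")  # gets number 2, not a recycled 1
```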
|