README for Genome Sequence of Mayetiola destructor, Mdes_0.5 (September 10, 2009)

0. Conditions for use
1. What's New
2. Introduction
3. Description of files
4. Sequence and Scaffold statistics
5. Read statistics
6. History


0. Conditions for Use

The data may be freely downloaded, used in analyses, and repackaged in
databases. Some of the data presented here represent work in progress. The
data are being released by the Baylor College of Medicine Human Genome
Sequencing Center (BCM-HGSC) prior to project completion as a public service
to allow our colleagues to search for genes or functions and speed their
research.

These data have not been edited and are presented "as is." You should regard
the data as preliminary if they are unpublished. The data providers and
associated funding agencies bear no responsibility for the user's reliance
upon or interpretation of these data. The accuracy or reliability of the data
is not guaranteed or warranted in any way and the providers disclaim liability
of any kind.

If you use this preliminary information we request that you honor the
following conditions:

  1. Please communicate your results to us so that we can incorporate them
     into the annotation of the final sequence. Contact us at
     hgsc-help@hgsc.bcm.tmc.edu.

  2. Acknowledge the information obtained from BCM-HGSC in publications by
     stating in Materials and Methods and Acknowledgements: "Preliminary
     sequence data was obtained from Baylor College of Medicine Human Genome
     Sequencing Center website at http://www.hgsc.bcm.tmc.edu." Also
     acknowledge our funding source, which is listed in each project, with a
     statement such as "The DNA sequence of Mayetiola destructor was supported
     by USDA CRESS award 2008-35302-18816 to Stephen Richards at the
     BCM-HGSC." We also request that you notify us when your manuscript is
     accepted and send us a pre-print of the article.

  3. Use of this data or information derived from it on a web page is
     permitted, providing the web page contains the statement that
     "Preliminary sequence data was obtained from the Baylor College of
     Medicine Human Genome Sequencing Center website at
     http://www.hgsc.bcm.tmc.edu." Please inform us of your web page by
     sending email to hgsc-help@hgsc.bcm.tmc.edu.

  4. All other written or oral public disclosures of research using data from
     the BCM-HGSC should follow the acknowledgment guidelines outlined above.

  5. Although we encourage use of this preliminary information for limited
     studies, we request that you not publish whole genome or chromosome scale
     analyses of genes or genomic data prior to the publication of the
     BCM-HGSC report on the final genome sequence and analysis. Contact the
     BCM-HGSC at hgsc-help@hgsc.bcm.tmc.edu to discuss a waiver of this
     request, which could involve simple acknowledgment, co-authorship, or
     other methods.

  6. Any redistribution of the data should carry this notice.


1. What's New

This is the first release (Mdes_0.5) of the draft genome assembly of the
Hessian fly, Mayetiola destructor. The 454 sequencing platform was used to
generate the fragment and mate-pair reads.


2. Introduction

This information is for the first release (Mdes_0.5) of the draft genome
sequence of the Hessian fly, Mayetiola destructor. This is a draft sequence
and may contain errors, so users should exercise caution. Typical errors in
draft genome sequences include mis-assemblies of repeated sequences, collapses
of repeated regions, and unmerged overlaps (e.g. due to polymorphisms)
creating artificial duplications. However, base accuracy in contigs
(contiguous blocks of sequence) is usually very high, with most errors near
the ends of contigs.

The Mdes_0.5 release was produced by assembling whole genome shotgun (WGS)
reads with the 454 Newbler assembler (v2.1-PreRelease-5/13/2009).
The Newbler scaffolds were ordered and oriented based on the BAC end sequences
and linearized into a single consensus sequence. Two types of WGS libraries
were used to produce the data: a 454 Titanium fragment library and a
paired-end library with ~14 kb inserts. About 15.2 million reads were
assembled, representing about 5,489 Mb of sequence and about 34x coverage of
the Hessian fly genome.

The products of this assembly are a set of contigs and scaffolds. Scaffolds
include sequence contigs that can be ordered and oriented with respect to each
other as well as isolated contigs that could not be linked (single contig
scaffolds or singletons). The N50 of the contigs is 9.8 kb and the N50 of the
scaffolds is 271 kb. The N50 size is the length such that 50% of the assembled
genome lies in blocks of the N50 size or longer. The total length of all
contigs is 107.1 Mb. When the gaps between contigs in scaffolds are included,
the total span of the assembly is 146 Mb.


3. Description of files

The files can be found by going to the Hessian fly section of the BCM-HGSC web
site at www.hgsc.bcm.tmc.edu. There are 5 files:

  Mdes20090910-genome.agp         (AGP file)
  Mdes20090910-genome.fa          (fasta file)
  Mdes20090910-genome.fa.qual     (quality file)
  Mdes20090910-contigs.fa         (fasta file)
  Mdes20090910-contigs.fa.qual    (quality file)

All of the contigs are found in the fasta formatted sequence file
(Mdes20090910-contigs.fa) and corresponding quality file
(Mdes20090910-contigs.fa.qual).

The AGP file describes the positions of the contigs in the genome. It follows
the standard NCBI AGP format:
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/agp/AGP_Specification.shtml

A fasta formatted sequence file (Mdes20090910-genome.fa) and corresponding
quality file (Mdes20090910-genome.fa.qual) are provided for the linearized
scaffolds. In these files the gaps between adjacent contigs within a scaffold
are filled with 'N's, and the captured gap size is estimated from the clone
insert size.
Each scaffold is a separate sequence within the files.


4. Sequence and Scaffold statistics

  Genome Scaffolds/Contigs   Number   N50(kb)   Bases+Gaps(Mb)   Bases(Mb)
  All Scaffolds               6,782     271          146             107
  All Contigs                16,209     9.8          107             107


5. Read statistics

  Total reads             16.7M (mate: 6.1M, fragment: 10.6M, BAC: 21.8K)
  Sequence Coverage [1]   34x

  [1] Sequence coverage was calculated as the total trimmed bases divided by
      the estimated genome size (160 Mb).


6. History

Mdes_0.5 (September 10, 2009)
This release is the first, preliminary assembly of the genome of the Hessian
fly, Mayetiola destructor.
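

Worked example: N50 and sequence coverage

As an illustration of the N50 definition in section 2 and the coverage
footnote in section 5, the following Python sketch computes both quantities.
It is not part of the BCM-HGSC pipeline. The function names n50 and coverage
and the toy contig lengths are hypothetical, and the sketch assumes that the
5,489 Mb read total quoted in the Introduction is the trimmed-base count
referred to in footnote [1].

```python
def n50(lengths):
    """Return the N50 of a set of block lengths.

    The N50 is the largest length L such that blocks of length L or longer
    together hold at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


def coverage(total_bases_mb, genome_size_mb):
    """Sequence coverage = total trimmed bases / estimated genome size."""
    return total_bases_mb / genome_size_mb


# Toy contig lengths (hypothetical, not taken from the assembly).
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_lengths)

# Figures quoted in this README: about 5,489 Mb of reads (assumed to be the
# trimmed-base total) and an estimated genome size of 160 Mb.
readme_coverage = coverage(5489, 160)

print(toy_n50)
print(round(readme_coverage, 1))
```

With the figures quoted in this README the coverage comes to about 34.3,
which rounds to the 34x reported in section 5. To compute the contig or
scaffold N50 of the release itself, the same n50 function could be given the
sequence lengths read from Mdes20090910-contigs.fa or Mdes20090910-genome.fa.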