Drosophila pseudoobscura Previous Updates
As of August 2003:
Freeze 1 Assembly of D. pseudoobscura:
We are pleased to announce an improved assembly of D. pseudoobscura, which is available on the FTP site and BLAST pages. The statistics of the main assembly are as follows:
Scaffolds: N50 1,018,646 bp; total size 139.3 Mb; average size 184,465 bp.
Contigs in scaffolds: N50 54,910 bp; total contig size 129.4 Mb; average size 21,122 bp.
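N50 is the length L such that sequences of length L or longer together contain at least half of the total assembled sequence. Because it is weighted toward long sequences, it is usually larger than the simple average. A minimal sketch of this standard definition, assuming a plain list of lengths in bp. The function name `n50` and the example lengths are hypothetical and not taken from the D. pseudoobscura data:

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths (in bp).

    Lengths are sorted from longest to shortest and summed; the N50 is the
    length at which the running total first reaches half of the total size.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # running total has reached half
            return length
    # Not reached for non-negative lengths: the final sum equals the total.
    raise AssertionError("unreachable for non-negative lengths")


# Hypothetical contig lengths, for illustration only.
example = [80_000, 50_000, 30_000, 20_000, 10_000, 5_000, 5_000]
print(n50(example))  # → 50000 (the average is only about 28,571 bp)
```

Conventions differ slightly between assemblers (for example, "at least half" versus "more than half"), so N50 values from different tools may not match exactly.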
In addition, we have tried to assemble reads that did not make it into the high-quality assembly described above. Assemblies of repetitive sequences and of sequences with low overlap are also available on the FTP site, as are the sequence reads that resisted all assembly attempts. The high-quality assembly, the additional assemblies, and the unassembled reads can each be searched separately on the BLAST page. For more details, please see the README on the FTP site.
Finally, a putative chromosome assignment is available for the majority of the larger scaffolds, which together account for more than 90% of the unique sequence. These assignments are generally based on the conservation of Muller elements across Drosophila species.
Future work includes the integration of the repeat assemblies with the main assembly.
As of May 2003:
TBLASTN is now available on the BLAST page: Due to popular demand, the BLAST search page has been updated to include TBLASTN.
ESTs available: 34,000 D. pseudoobscura EST sequences are now available on the FTP site. These reads (which include forward and reverse reads of the cDNA clones) were sequenced from an embryonic full-length library produced by Dr. Ling Hong whilst in the Rubin laboratory and provided by Mark Stapleton and Gerry Rubin.
As of March 2003, the first freeze assembly is now available.
Contigs: average size 18.7Kb and N50 42Kb. The total length of the contigs is 128Mb.
Scaffolds: So far, only fosmid end sequences and medium-sized insert WGS data have been used to generate scaffolds. When BAC end sequence data are added, scaffold sizes are expected to increase dramatically. Current scaffold data: 464 scaffolds with at least 3 contigs per scaffold cover 128.2Mb. There are 288 scaffolds larger than 100Kb, which together total 122.0Mb. Note that only scaffolds containing two or more contigs are included in the linearized scaffold file, and runs of N's in this file represent the estimated size of the gap between adjacent contigs in a scaffold. Both the contigs and the linearized scaffold sequences are available on the FTP site.
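Since each run of N's in a linearized scaffold stands for an estimated gap, the contigs and gap sizes can be recovered by splitting on those runs. A minimal sketch, assuming the scaffold has already been read into a single string (FASTA parsing omitted) and that N's appear only as gap filler. Any ambiguous N bases inside a real contig would be misread as gaps. The function name `split_scaffold` and the example sequence are hypothetical:

```python
import re

# Assumption: every run of 'N' (either case) marks one inter-contig gap.
GAP_RUN = re.compile(r"[Nn]+")


def split_scaffold(seq: str) -> tuple[list[str], list[int]]:
    """Split a linearized scaffold into (contigs, estimated gap sizes in bp)."""
    core = seq.strip("Nn")  # ignore any leading/trailing N padding
    if not core:
        return [], []
    contigs = GAP_RUN.split(core)
    gaps = [len(m.group()) for m in GAP_RUN.finditer(core)]
    return contigs, gaps


# Hypothetical scaffold: three contigs separated by gaps of 5 and 3 bp.
scaffold = "ACGTACGT" + "N" * 5 + "TTGCA" + "N" * 3 + "GGC"
contigs, gaps = split_scaffold(scaffold)
print(contigs)  # ['ACGTACGT', 'TTGCA', 'GGC']
print(gaps)     # [5, 3]
```

There is always one fewer gap than contigs, so the sum of contig lengths plus gap sizes reproduces the scaffold length (excluding any end padding). This requires Python 3.9 or later for the built-in generic type hints.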
An alignment of the D. pseudoobscura and D. melanogaster genomes has been produced in the Genome Sciences Department of LBNL using the Berkeley Genome Pipeline (1), based on the AVID alignment program (2). The alignment is displayed using the VISTA genome browser, which allows the alignment to be searched by the CG number of any D. melanogaster gene. Alignments of a particular region can also be downloaded using the contig details button.
The Berkeley Genome Pipeline is a collaboration between Inna Dubchak's group (http://www-gsd.lbl.gov/dubchak/index.html) and Lior Pachter, and is described in Couronne et al. 2003 (1).
(1) Couronne, O., Poliakov, A., Bray, N., Ishkhanov, T., Ryaboy, D., Rubin, E., Pachter, L., Dubchak, I. (2003) Strategies and tools for whole-genome alignments. Genome Res. 13(1):73-80.
(2) Bray, N., Dubchak, I., Pachter, L. (2003) AVID: A global alignment program. Genome Res. 13(1):97.
As of February 2003, an improved assembly of the D. pseudoobscura genome is available. This version has significantly larger average contig and N50 sizes, 13.8Kb and 28.7Kb respectively, for contigs larger than 1Kb. The number of contigs (and thus gaps) has also been reduced to 9820, so there are now significantly fewer gaps than genes. The assembly will improve further in future iterations and as more data are added.
The D. pseudoobscura sequencing strain is now available. The inbred strain used for sequencing can be obtained from the Tucson Drosophila Species Stock Center under stock number 14011-0121.94. The history of the strain is described in the library and strain info section.
As of January 2003, the Drosophila pseudoobscura interim assembly is available. We have completed sequencing the first whole genome shotgun (WGS) library (2Kb insert). Other libraries moving into sequence production include 10kb, fosmid, BAC, and cDNA libraries. An intermediate assembly has been performed; it contains 138Mb of sequence in contigs with an average size of 9.5Kb and an N50 of 21Kb. The other libraries will add approximately 33% more sequence data, and end sequences from the larger-insert libraries will be used to order and orient the contigs. In response to interest in the data, we have made the current assembly available as unordered contigs, which can be downloaded from the FTP site. You can also BLAST your sequence against the D. pseudoobscura contigs on our BLAST web page.
As of October 31, 2002, the Baylor HGSC has begun sequencing the 2Kb whole genome shotgun library from a new Drosophila pseudoobscura strain. The individual sequence traces are available from the NCBI Trace Archive. Two million sequence reads will be attempted before the end of 2002.
BAC, fosmid and 10kb library sequencing will follow. End sequence from these clones will be used to determine the order and orientation of contigs in the final assembly.
In other news, Mark Stapleton and Gerry Rubin of the Berkeley Drosophila Genome Project have provided a high quality embryonic Drosophila pseudoobscura cDNA library. This library will be used for EST sequencing.
Sixteen random fosmid clones have been selected for full sequencing and finishing to prepare for the annotation.
As of August 26, 2002, the Baylor HGSC has expanded a new Drosophila pseudoobscura strain for large-scale DNA production from embryos. "Mitochondria-free" nuclear DNA will now be isolated from five grams of embryos, and genomic library construction will follow immediately. The BCM-HGSC is also preparing high molecular weight DNA for the creation of new BAC and fosmid libraries. End sequences from these clones will be used to determine the order and orientation of contigs in the final assembly.
In other news, Mark Stapleton and Gerry Rubin of the Berkeley Drosophila Genome Project have provided a high quality embryonic Drosophila pseudoobscura cDNA library. This library will be used for EST sequencing.
Sixteen random fosmid clones have been selected for full sequencing and finishing to prepare for the annotation. At the present time, ten fosmid clones from this collection have been finished to high quality. Accession numbers for the remaining clones are as follows:
As of July 18th, 2002, the D. pseudoobscura project has had to revisit the choice of strain to be sequenced. The original strain selected at the Berkeley Laboratories had an inversion on one arm of chromosome 3. In other sequencing projects, this situation has caused assembly difficulties because the two haplotypes must be assembled independently. To minimize the likelihood of this problem, we decided to switch to a D. pseudoobscura strain without an inversion.
Thanks to the efforts of Wyatt Anderson (Athens, GA), we now have a replacement strain growing. Dr. Anderson has cytologically verified both the species identity and the absence of inversions in the strain. Additionally, this strain has been through 14 generations of brother-sister inbreeding.
To eliminate possible DNA contamination from gut contents, the DNA for the whole genome shotgun libraries will be prepared from embryos. At this time, growing the new D. pseudoobscura strain to collect embryos in quantity is the only obstacle preventing the start of sequencing. We now expect sequencing to take place in September/October 2002. Further updates will be posted to this page.