Running the Atlas Whole Genome Assembly Suite


The atlas-demo example walks through the steps needed to run an Atlas-type whole-genome shotgun (WGS) assembly.

0. Prepare data

  1. Copy the original files supplied in the download (sequence, quality, and vector-masked sequence) to the desired directory.

  2. Create the trimmed reads file demo.fa.pass.gz (scan inward from each end of every read, looking for a 50-base window that is high quality and free of vector).

  3. Create indices for demo.fa, demo.fa.qual, and demo.fa.screen to make it easier to extract reads from the files. Together, the files and indices form a simple file-oriented database; for a large genome, it should be replaced with more efficient database tables.
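
The trimming rule in step 0.2 can be sketched as follows. This is only an illustration, not the code shipped with Atlas: the function name trim_read, the quality threshold of 20, the requirement that every base in the window pass, and the use of 'X' to mark vector are all assumptions made for the example.

```python
def trim_read(quals, screened_seq, window=50, min_qual=20):
    """Return (start, end) of the region to keep, or None if no window passes.

    quals        -- list of per-base quality values
    screened_seq -- vector-masked sequence; 'X' marks vector (assumed)
    A window passes if all of its bases have quality >= min_qual (assumed
    definition of "high quality") and none of them is masked as vector.
    """
    n = len(quals)
    if len(screened_seq) != n:
        raise ValueError("quality and sequence lengths differ")
    if n < window:
        return None

    good = [q >= min_qual and b not in "Xx" for q, b in zip(quals, screened_seq)]

    def window_ok(i):
        return all(good[i:i + window])

    # Scan inward from the left end for the first passing window.
    start = next((i for i in range(n - window + 1) if window_ok(i)), None)
    if start is None:
        return None
    # Scan inward from the right end; a passing window is known to exist.
    end = next(i + window for i in range(n - window, -1, -1) if window_ok(i))
    return start, end
```

A read for which trim_read returns None has no acceptable window and would presumably be left out of the trimmed file.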

1. Run kmer-counter on demo.fa.pass.gz

  1. Run kmer count (here k=32).

  2. Analyze the distribution of k-mer frequencies to find the repeat cutoff: choose cutoff = (2 or 3) * coverage, where coverage is approximately the peak of the distribution curve.

  3. Create a kill-file for atlas-overlapper.
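
The cutoff selection in step 1 can be sketched in a few lines. This is a small in-memory illustration, not the kmer-counter program itself: the function names are hypothetical, k-mers are counted on one strand only, and frequencies below 2 are ignored when looking for the peak on the assumption that they are mostly sequencing errors. The kill-file format expected by atlas-overlapper is not described here, so the sketch stops at listing the over-represented k-mers.

```python
from collections import Counter


def count_kmers(reads, k=32):
    """Count every k-mer in the reads, skipping k-mers with N or X."""
    counts = Counter()
    for seq in reads:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer and "X" not in kmer:
                counts[kmer] += 1
    return counts


def coverage_peak(counts, min_freq=2):
    """Return the most common k-mer frequency that is >= min_freq.

    This is the position of the peak in the frequency histogram and is
    used as an estimate of coverage. Ties go to the lower frequency.
    """
    hist = Counter(counts.values())
    candidates = {f: n for f, n in hist.items() if f >= min_freq}
    if not candidates:
        return None
    return max(candidates, key=lambda f: (candidates[f], -f))


def repeat_kmers(counts, cutoff):
    """Return the k-mers seen more than cutoff times, in sorted order."""
    return sorted(kmer for kmer, c in counts.items() if c > cutoff)
```

With these helpers, a cutoff of 3 * coverage_peak(counts) corresponds to the upper choice in step 1.2, and repeat_kmers(counts, cutoff) gives the k-mers that would go into the kill-file.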

2. Run all-vs-all overlapper

3. Run binner, grouping reads into bins for localized assembly
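
The criteria the binner actually uses are not described in this file. As a rough illustration of what grouping reads by overlap can look like, the sketch below puts reads that are connected through any chain of overlaps into the same bin, using a union-find structure. The function name bin_reads and the input format (pairs of read names) are assumptions for the example only.

```python
def bin_reads(overlaps):
    """Group reads into bins: reads linked by overlaps share a bin.

    overlaps -- iterable of (read_a, read_b) name pairs (assumed format)
    Returns a sorted list of bins, each a sorted list of read names.
    """
    parent = {}

    def find(x):
        # Find the representative of x, halving paths along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in overlaps:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    bins = {}
    for read in list(parent):
        bins.setdefault(find(read), []).append(read)
    return sorted(sorted(members) for members in bins.values())
```

A real binner would likely need to limit bin size and avoid joining reads through repeats, which plain connected components do not do.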

4. Assemble bins

  1. Separate bins from the overall file of bins and readnames (.fon).

  2. Extract reads from the WGS reads pool (using indices on quality files and vector-masked sequences).

  3. Extend vector masking near the ends of each read out to the ends, and undo false masking in the interior of each read.

  4. Assemble each bin with Phrap.
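
The masking fix in step 4.3 can be sketched as below. This is an assumption-laden illustration rather than the Atlas code: the function name repair_masking, the 50-base definition of "near the ends", the 'X' mask character, and the decision to treat all interior masking as false are choices made for the example.

```python
def repair_masking(original, screened, end_window=50, mask="X"):
    """Return a sequence with masking only at the two ends of the read.

    original   -- unmasked read sequence
    screened   -- the same read with vector bases replaced by mask
    Masking found within end_window bases of either end is extended out
    to that end. All other masking is treated as false and undone by
    restoring the bases from the original sequence.
    """
    n = len(original)
    if len(screened) != n:
        raise ValueError("original and screened lengths differ")
    masked = [c == mask for c in screened]

    # Left end: mask everything up to the last masked base in the window.
    left = 0
    for i in range(min(end_window, n)):
        if masked[i]:
            left = i + 1

    # Right end: mask everything from the first masked base in the window.
    right = n
    for i in range(n - 1, max(n - end_window, 0) - 1, -1):
        if masked[i]:
            right = i

    if left >= right:
        return mask * n  # the whole read ends up masked
    return mask * left + original[left:right] + mask * (n - right)
```

The result has the same length as the input read, so quality values stay aligned with the sequence passed on to the assembler.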

5. Combine, check and scaffold the bin assemblies

6. Ultra-scaffolding using FPC, markers and other large-scale or mapping-type data (not part of the current distribution)

For details, please refer to the paper by Havlak et al. listed below.

Reference

Paul Havlak, Rui Chen, K. James Durbin, Amy Egan, Yanru Ren, Xing-Zhi Song, George M. Weinstock, and Richard A. Gibbs. The Atlas Genome Assembly System. Genome Res. 14: 721-732 (2004).