K. James Durbin (Atlas Genome Tools Group)
bang is a fast, repeat-suppressing search tool, written primarily for anchoring reads to genomes but also adaptable to other genome-scale comparison problems.
bang's key features are efficiently coded k-mer hashing, which gives it exceptional speed;
a k-mer thinning scheme that allows it to run large jobs in a comparatively small RAM footprint;
and an ability to use k-mer distributions
to perform effective de novo repeat filtering of matches. bang is also designed to be
convenient to use with even very large data sets (e.g. it places no arbitrary limit on the number of query sequences,
as megablast does).
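The three ideas named above (k-mer hashing, k-mer thinning, and repeat filtering from k-mer distributions) can be illustrated with a small toy. This is only a sketch under stated assumptions, not bang's actual implementation: the function names, the "keep every `step`-th position" thinning rule, and the `max_count` repeat threshold are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield (position, k-mer) for every k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def build_index(genome, k, step=1):
    """Hash genome k-mers to their positions.

    Assumption: thinning is modelled by keeping only every `step`-th
    genome position, which shrinks the index roughly by a factor of `step`.
    """
    index = defaultdict(list)
    for pos, kmer in kmers(genome, k):
        if pos % step == 0:
            index[kmer].append(pos)
    return index


def kmer_counts(genome, k):
    """Count how often each k-mer occurs in the genome (its distribution)."""
    return Counter(kmer for _, kmer in kmers(genome, k))


def anchor(read, index, counts, k, max_count):
    """Return (genome_offset, votes) for the best-supported placement.

    Assumption: k-mers seen more than `max_count` times in the genome are
    treated as repeats and ignored, so they cannot produce spurious anchors.
    """
    votes = Counter()
    for rpos, kmer in kmers(read, k):
        if counts[kmer] > max_count:
            continue  # de novo repeat filter: skip over-represented k-mers
        for gpos in index.get(kmer, ()):
            votes[gpos - rpos] += 1  # each hit votes for one alignment offset
    return votes.most_common(1)[0] if votes else None


# Toy genome: a unique region flanked by two copies of a simple repeat.
genome = "AC" * 8 + "GATTACAGGCTTAACGTC" + "AC" * 8
read = genome[12:26]  # overlaps the repeat and the unique region
k = 4
counts = kmer_counts(genome, k)

full = build_index(genome, k, step=1)
thin = build_index(genome, k, step=2)

print(anchor(read, full, counts, k, max_count=2))
print(anchor(read, thin, counts, k, max_count=2))
print(anchor("ACACACAC", full, counts, k, max_count=2))
```

In this toy, the thinned index stores about half as many positions yet still places the read at the same offset with fewer supporting votes, while a read made entirely of repeat k-mers yields no anchor at all.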
An example of the kind of performance you can expect from bang is given in the
table below.
| Program | Genome size | Reads | RAM | Run time | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| bang | 120 MB | 500K (~450MB) | ~400MB | 10 minutes | 0.981 | 0.973 |
| megablast | 120 MB | 500K (~450MB) | ~300MB | 60 minutes | 0.999 | 0.265 |
| ssaha | 120 MB | 500K (~450MB) | ? (ran out of memory on 4 GB machine) | | | |
This table gives out-of-the-box performance characteristics and is intended only to
give you a relative view of the sort of characteristics bang has. Suitability for any particular
application depends on many factors.
To further illustrate the blinding speed of bang, consider a recent read-mapping task we performed. In this task, 2.5 million human resequencing reads were mapped to the human genome using 8 cluster CPUs in only 2.5 wall-clock hours, for a total of 18 CPU-hours. Peak RAM usage was 600 MB. Put another way, we could have completely mapped 1x coverage of reads to the human genome on a single laptop in less than a day.
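The arithmetic behind the laptop claim can be checked directly from the figures quoted above. The sketch below assumes that one laptop CPU is roughly comparable to one of the cluster CPUs and that the 600 MB peak RAM fits on the laptop; neither assumption is stated in the original figures, and the variable names are illustrative only.

```python
# Figures quoted in the text above.
reads = 2_500_000
cpus = 8
wall_hours = 2.5
cpu_hours = 18

# Throughput per CPU.
reads_per_cpu_hour = reads / cpu_hours
reads_per_cpu_second = reads_per_cpu_hour / 3600

# Fraction of the reserved CPU time (8 CPUs x 2.5 h = 20 h) actually used.
cpu_utilisation = cpu_hours / (cpus * wall_hours)

# Assumption: a single laptop CPU does the same 18 CPU-hours of work serially.
single_cpu_hours = cpu_hours
fits_in_a_day = single_cpu_hours < 24

print(f"{reads_per_cpu_hour:,.0f} reads per CPU-hour")
print(f"{reads_per_cpu_second:.1f} reads per CPU-second")
print(f"{cpu_utilisation:.0%} of reserved CPU time used")
print(f"single CPU: {single_cpu_hours} hours, under a day: {fits_in_a_day}")
```

Under these assumptions the job comes to roughly 139,000 reads per CPU-hour, so running it serially on one comparable CPU would take about 18 hours.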
LICENSE
Please read the terms of the license agreement carefully.
License summary:
By downloading this software you explicitly agree to the terms of the license agreement.
Click to go to bang download page.
CBT++ is Copyright© 2000-2005 Baylor College of Medicine Human Genome Sequencing Center
All rights reserved