Human Genome Sequencing Center, Baylor College of Medicine
 

bang


Author:

K. James Durbin (Atlas Genome Tools Group)


bang is a fast repeat-suppressing search tool, written primarily for anchoring reads to genomes but also adaptable to other genome-scale comparison problems.

bang's key features are efficiently coded k-mer hashing, which gives it exceptional speed; a k-mer thinning scheme that allows it to run large jobs in a comparatively small RAM footprint; and the ability to use k-mer distributions to perform de novo repeat filtering of matches. bang is also designed to be convenient to use with even very large data sets (e.g. no arbitrary limits on the number of query sequences like megablast).
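To make the idea of k-mer hashing with repeat filtering concrete, here is a minimal Python sketch. It is not bang's implementation: the function names (kmers, build_index, anchor), the max_count cutoff, and the offset-voting step are all assumptions chosen for illustration, and the k-mer thinning scheme is not shown. The sketch counts how often each k-mer occurs in the genome, leaves over-represented (repeat-like) k-mers out of the hash, and anchors a read at the genome offset supported by the most remaining k-mer hits.

```python
from collections import Counter


def kmers(seq, k):
    """Yield (position, k-mer) for every window of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def build_index(genome, k, max_count):
    """Hash genome k-mers to their positions.

    K-mers that occur more than max_count times are treated as repeats
    and left out of the index (an illustrative cutoff, not bang's rule).
    """
    counts = Counter(km for _, km in kmers(genome, k))
    index = {}
    for pos, km in kmers(genome, k):
        if counts[km] <= max_count:
            index.setdefault(km, []).append(pos)
    return index


def anchor(read, index, k):
    """Return (offset, votes) for the best-supported placement, or None.

    Each k-mer hit votes for the offset genome_pos - read_pos; hits that
    agree on one offset are consistent with a single placement of the read.
    """
    votes = Counter()
    for rpos, km in kmers(read, k):
        for gpos in index.get(km, ()):
            votes[gpos - rpos] += 1
    return votes.most_common(1)[0] if votes else None


if __name__ == "__main__":
    # Toy genome: a unique segment flanked by two copies of a simple repeat.
    genome = "ACACACACAC" + "GATTACAGGCT" + "ACACACACAC"
    index = build_index(genome, k=4, max_count=1)
    print(anchor("TTACAGG", index, k=4))  # read from the unique segment
    print(anchor("ACACAC", index, k=4))   # read made only of repeat k-mers
```

In this toy setting a read drawn purely from the repeat finds no k-mers in the index and is left unanchored, while a read from the unique segment is still placed. Real genomes need a much larger k and a cutoff derived from the observed k-mer count distribution.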

An example of the kind of performance you can expect from bang is given in the table below.

Program    Genome Size  Reads          RAM     Run Time    Sensitivity  Specificity
bang       120 MB       500K (~450MB)  ~400MB  10 minutes  0.981        0.973
megablast  120 MB       500K (~450MB)  ~300MB  60 minutes  0.999        0.265
ssaha      120 MB       500K (~450MB)  ? (ran out of memory on 4 GB machine)

This table gives out-of-the-box performance characteristics and is intended only to give you a relative view of the sort of characteristics bang has. Suitability for any particular application depends on many factors.

To further illustrate the blinding speed of bang, consider a recent read mapping task we performed. In this task, 2.5 million human resequencing reads were mapped to the human genome using 8 cluster cpus in only 2.5 wall-clock hours, or a total of 18 cpu-hours. Peak RAM usage was 600 MB. Said another way, we could have done a complete mapping of 1x coverage of reads to the human genome on a single laptop in less than a day.
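The arithmetic behind these figures can be checked with a few lines of Python. The numbers come from the paragraph above; the function name and the derived quantities (reads per cpu-hour, fraction of the available cpu time actually used) are only illustrative, and the single-laptop estimate assumes one laptop cpu is roughly as fast as one cluster cpu and that the 600 MB peak RAM fits in memory.

```python
def mapping_stats(reads, cpus, wall_hours, cpu_hours):
    """Derive simple throughput figures from a reported mapping run."""
    available_cpu_hours = cpus * wall_hours       # cpu time the cluster offered
    return {
        "reads_per_cpu_hour": reads / cpu_hours,
        "cpu_utilization": cpu_hours / available_cpu_hours,
        # On one comparable cpu, the job takes the total cpu time.
        "single_cpu_hours": cpu_hours,
    }


if __name__ == "__main__":
    stats = mapping_stats(reads=2_500_000, cpus=8, wall_hours=2.5, cpu_hours=18)
    for name, value in stats.items():
        print(f"{name}: {value:,.2f}")
```

With these inputs the run works out to roughly 139,000 reads per cpu-hour, and the 18 cpu-hours would fit within a single day on one comparable cpu.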

LICENSE

Please read the terms of the license agreement carefully.

License summary:


SYSTEM REQUIREMENTS

By downloading this software you explicitly agree to the terms of the license agreement.

Click to go to bang download page.



Contact James Durbin kdurbin@bcm.tmc.edu for information about obtaining a commercial license for this software.


CBT++ is Copyright© 2000-2005 Baylor College of Medicine Human Genome Sequencing Center
All rights reserved

BCM HGSC

www.hgsc.bcm.tmc.edu/~kdurbin/cbt.docs/