IonCRAM: a reference-based compression tool for Ion Torrent sequence files

Abstract

Background

Ion Torrent is one of the leading next-generation sequencing (NGS) technologies and is frequently used in medical research and diagnostics. The integrated software for Ion Torrent sequencing machines delivers sequencing results in BAM format. In addition to the usual SAM/BAM fields, an Ion Torrent BAM file includes technology-specific flow signal data. Flow signals occupy a large portion of the BAM file (around 75% for the human genome). Converting SAM/BAM files to the CRAM format significantly reduces the space required to store NGS results. However, the tools that generate CRAM files are not designed to handle flow signals. This missing feature motivated us to develop a new program that improves the compression of Ion Torrent files for long-term archiving.

Results

In this article, we introduce IonCRAM, the first reference-based compression tool for Ion Torrent BAM files intended for long-term archiving. IonCRAM achieves space savings of about 43% on BAM files, which is approximately 8-9% more than the savings achieved with the CRAM format.
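To make the reported percentage concrete, the sketch below assumes the common definition of space saving, 1 - (compressed size / original size); the abstract does not state the exact formula used. The helper names `space_saving` and `compressed_size` and the 10 GiB input size are hypothetical and chosen only for illustration.

```python
def space_saving(original_bytes: int, compressed_bytes: int) -> float:
    """Fraction of space saved, assuming saving = 1 - compressed / original."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return 1.0 - compressed_bytes / original_bytes


def compressed_size(original_bytes: int, saving: float) -> float:
    """Size that remains after applying a fractional space saving."""
    if not 0.0 <= saving <= 1.0:
        raise ValueError("saving must be between 0 and 1")
    return original_bytes * (1.0 - saving)


if __name__ == "__main__":
    # Hypothetical 10 GiB Ion Torrent BAM file (illustrative size only).
    bam_bytes = 10 * 1024**3
    # Apply the ~43% saving reported for IonCRAM.
    out_bytes = compressed_size(bam_bytes, 0.43)
    print(f"compressed: {out_bytes / 1024**3:.1f} GiB")
    print(f"saving: {space_saving(bam_bytes, round(out_bytes)):.2f}")
```

Under this definition, a 43% saving leaves roughly 57% of the original file, so the hypothetical 10 GiB BAM would shrink to about 5.7 GiB.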

Conclusions

Reducing the space consumption of NGS data reduces the cost of data storage and transfer. Therefore, developing efficient compression software for clinical NGS data goes beyond computational interest, as it ultimately contributes to reducing the overall cost of clinical trials. The space savings achieved with our tool are a practical step in this direction. The tool is open source and available on Code Ocean, GitHub and http://ioncram.saudigenomeproject.com.
