Home
Bio::Faster is a BioRuby gem that implements a fast and simple parser for FastQ files. The new version drops support for FastA files to focus on the more resource-demanding FastQ parsing. It is a rewrite of the old one: the C extension has been rewritten from scratch, and the parser now also checks for formatting problems in FastQ files. A full set of RSpec tests has been defined, based on the test files available in the official FastQ paper. The gem uses Ruby-FFI to bind against the C extension, and it is also compatible with JRuby. For a full list of supported Rubies, check Travis-CI.
The Bio::Faster class is instantiated with the file name, and the each_record method is then used to parse the whole file. For each sequence it yields an array with the sequence header (ID and comment), the sequence itself, and an array with the quality values. The default quality encoding is expected to be Sanger (Phred33).
fastq = Bio::Faster.new("sequences.fastq")
fastq.each_record do |sequence_header, sequence, quality|
  puts sequence_header, sequence, quality
end
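To make the shape of the yielded values concrete, here is a minimal pure-Ruby sketch that mimics the same block interface without the gem or its C extension. The helper `each_fastq_record` is hypothetical and not part of Bio::Faster. It assumes strict four-line records and the Sanger offset of 33, and stripping the leading "@" from the header is a choice made for this sketch only.

```ruby
require 'stringio'

# Hypothetical helper (not part of Bio::Faster): yields the header, the
# sequence and an array of integer quality scores for each FastQ record.
# Assumes every record spans exactly four lines.
def each_fastq_record(io, offset = 33)
  io.each_slice(4) do |header, sequence, _separator, quality|
    # Each quality character encodes a score as (ASCII code - offset).
    scores = quality.chomp.each_byte.map { |byte| byte - offset }
    # Stripping the leading "@" is an assumption of this sketch.
    yield header.chomp.sub(/\A@/, ''), sequence.chomp, scores
  end
end

fastq = StringIO.new("@read1 sample comment\nACGT\n+\nII5!\n")
each_fastq_record(fastq) do |sequence_header, sequence, quality|
  puts sequence_header, sequence, quality.inspect
end
```

With the Sanger offset, the quality string `II5!` decodes to `[40, 40, 20, 0]`.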
If the quality encoding is Phred64 (i.e. Solexa) you need to specify it:
fastq_solexa = Bio::Faster.new("sequences.fastq", :solexa)
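The reason the encoding has to be stated is that the same quality character maps to a different score under each offset. The sketch below illustrates this in plain Ruby. `PHRED_OFFSETS` and `decode_quality` are hypothetical names for illustration, and `:sanger` is an assumed label for the default; only the `:solexa` symbol comes from the example above.

```ruby
# Hypothetical offset table for this sketch (not the gem's internals).
PHRED_OFFSETS = { sanger: 33, solexa: 64 }.freeze

# Decode a quality string into integer scores for the given encoding.
def decode_quality(quality_string, encoding = :sanger)
  offset = PHRED_OFFSETS.fetch(encoding)
  quality_string.each_byte.map { |byte| byte - offset }
end

puts decode_quality("hhI", :solexa).inspect
puts decode_quality("hhI", :sanger).inspect
```

Read with offset 64 the string gives `[40, 40, 9]`, while offset 33 gives `[71, 71, 40]`, so a wrong encoding silently shifts every score by 31.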
The each_record method can also read directly from STDIN and this can be useful when dealing with compressed FastQ files.
Just specify :stdin as the input:
Bio::Faster.new(:stdin).each_record do |seq|
  ...
end
You can then call the Ruby script through a pipe in a standard Unix terminal:
zcat sequences.fastq.gz | ruby my_parser.rb
So you can read gzipped files without any drop in the parsing performance.
Bio::Faster is about 3-4 times faster than the standard object-oriented FastQ parser (and even faster with JRuby).
This is a comparison of the time needed to parse a 5.4 Gb Illumina 1.8+ FastQ file.
Bio::Faster.new("test_file.fastq").each_record {|sequence_header, sequence, quality|}
Ruby 1.9.3-p194
real 4m1.337s
user 3m56.447s
sys 0m4.339s
JRuby 1.6.7 OpenJDK 64-Bit Server VM 1.6.0_18
real 3m12.023s
user 3m9.040s
sys 0m4.277s
Bio::FlatFile.open(Bio::Fastq,File.open("test_file.fastq")).each_entry {|seq|}
real 11m35.946s
user 11m26.762s
sys 0m7.764s
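As a rough check of the speedup, the ratios can be worked out from the "real" timings listed above. This is a small sketch, with `to_seconds` as a hypothetical helper. It assumes all three runs used the same machine and input file, which the listing implies but does not state for the Bio::FlatFile run.

```ruby
# Convert a `time` reading such as "4m1.337s" into seconds.
def to_seconds(reading)
  minutes, seconds = reading.match(/\A(\d+)m([\d.]+)s\z/).captures
  minutes.to_i * 60 + seconds.to_f
end

# Wall-clock ("real") times copied from the benchmark above.
flatfile = to_seconds("11m35.946s")
faster = { "Ruby 1.9.3-p194" => "4m1.337s", "JRuby 1.6.7" => "3m12.023s" }

faster.each do |name, real|
  puts format("%s: %.2fx", name, flatfile / to_seconds(real))
end
```

This works out to roughly 2.88x on Ruby 1.9.3 and 3.62x on JRuby 1.6.7.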