Data Format

oluwatosin oluwadare edited this page Feb 28, 2018 · 21 revisions

GenomeFlow generates a text file in the medium file format.

Medium format (most common)

A whitespace-separated text file that contains the following fields on each line:

<readname> <str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2> <mapq1> <mapq2>

  • str = strand (0 for forward, anything else for reverse)
  • chr = chromosome (must be a chromosome in the genome)
  • pos = position
  • frag = restriction site fragment
  • mapq = mapping quality score

If the restriction site file option is not used, frag is ignored, but please see the above note on dummy values. If the mapping quality filter is not used, mapq is ignored. readname and strand are also not currently stored within .hic files.
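As an illustration of the field layout described above, here is a minimal Python sketch that parses one medium-format line. The names `MediumRecord`, `parse_medium_line`, and `is_forward` are hypothetical and not part of GenomeFlow, and the sample line is made up. The sketch assumes that pos, frag, and mapq are integers; strand is kept as text because the format only says that 0 means forward and anything else means reverse.

```python
from typing import NamedTuple


class MediumRecord(NamedTuple):
    """One line of a medium-format file (hypothetical helper type)."""
    readname: str
    str1: str   # strand of read end 1, kept as text
    chr1: str
    pos1: int
    frag1: int
    str2: str   # strand of read end 2, kept as text
    chr2: str
    pos2: int
    frag2: int
    mapq1: int
    mapq2: int


def parse_medium_line(line: str) -> MediumRecord:
    """Split a whitespace-separated line into its 11 medium-format fields."""
    fields = line.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    (readname, str1, chr1, pos1, frag1,
     str2, chr2, pos2, frag2, mapq1, mapq2) = fields
    # Assumption: pos, frag and mapq are integer-valued.
    return MediumRecord(
        readname, str1, chr1, int(pos1), int(frag1),
        str2, chr2, int(pos2), int(frag2), int(mapq1), int(mapq2),
    )


def is_forward(strand: str) -> bool:
    """Strand is forward when it is 0; anything else means reverse."""
    return strand == "0"


if __name__ == "__main__":
    # Made-up example line, not taken from the GM06990 test data.
    record = parse_medium_line("read1 0 chr1 10000 0 16 chr1 25000 1 30 30")
    print(record.chr1, record.pos1, is_forward(record.str1))
    print(record.chr2, record.pos2, is_forward(record.str2))
```

Rejecting lines that do not have exactly 11 fields is a design choice for this sketch; a more lenient reader could skip or log such lines instead.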

The test data is the GM06990 cell line data, which can be downloaded from the link below:

More details about other file formats can be found here


.hic file

The .hic file is a binary file containing compressed contact matrices at many resolutions, facilitating visualization and analysis at multiple scales. The .hic file format is described extensively in Durand and Shamim et al., 2016.

To create a .hic file, use the GenomeFlow function "Convert mapped Hi-C reads to hic format file".
