2D Functions

oluwatosin oluwadare edited this page Feb 20, 2019 · 25 revisions

Convert mapped Hi-C reads to .hic file

A. Purpose
To create a binary .hic file, containing contact matrices at different resolutions and normalized by different methods, from a text file describing mapped Hi-C reads.

B. Test Data
Five formats are acceptable: short format, short format with score, medium format, long format and 4DN DCIC format. These formats are described here.

A sample file is: executable/sample_data/GSM1551688_HIC143_merged_nodups.zip (unzip it before use).
Another set of test data is the GM06990 cell line data. This can be downloaded from the link below:

  1. GM06990 Cell line:

C. Output
A binary .hic file containing contact matrices

D. Running
Access the function from the menu toolbar: 2D-Functions/Convert to HiC

E. Convert mapped Hi-C reads to .hic file GUI

| Field | Description | Default |
|---|---|---|
| Input file | A text file that contains the mapped Hi-C reads (format described above) | NA |
| Genome ID | Genome version of the Hi-C data | hg19 |
| Genome ID (If Not listed above) | Enter a unique Genome ID for the data if it is not provided in the dropdown list above. | NA |
| Output Directory | The directory in which the generated .hic file is written. An example output filename is GenomeFlow_Convert_1521343280452.hic | NA |
| Contact Threshold | Interaction count threshold for contacts to be used in creating contact matrices. | 0 |
| MAPQ Score Threshold | Mapping quality score threshold for reads to be considered in creating contact matrices. | 0 |
| Chromosomes | Chromosomes for which contact matrices are created, separated by commas (,). When left blank, all chromosomes are considered. | All (when left blank) |
| Resolutions | List of resolutions of the contact matrices to be created, separated by commas (,). | 2500000, 1000000, 500000, 250000, 100000, 50000, 25000, 10000, 5000 |
| Restriction Site File | Each line starts with a chromosome number, followed by the positions of restriction sites on that chromosome in numeric order, and ends with the size of the chromosome. When provided, 8 additional fragment-delimited resolutions are added: 500f, 250f, 100f, 50f, 20f, 5f, 2f, 1f | blank |
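
The restriction site file layout described above (chromosome, then site positions in numeric order, then chromosome size) can be illustrated with a small parser. This is only a sketch: it assumes the fields are separated by whitespace, and the function name `parse_restriction_sites` and the site positions in the example are hypothetical and not taken from GenomeFlow.

```python
from typing import Dict, List, Tuple


def parse_restriction_sites(text: str) -> Dict[str, Tuple[List[int], int]]:
    """Map chromosome -> (restriction site positions, chromosome size).

    Assumes whitespace-separated fields (an assumption, not confirmed by the manual).
    """
    sites: Dict[str, Tuple[List[int], int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        chrom = fields[0]
        numbers = [int(x) for x in fields[1:]]
        # The last number on the line is the chromosome size; the rest are sites.
        positions, size = numbers[:-1], numbers[-1]
        if positions != sorted(positions):
            raise ValueError(f"sites for chromosome {chrom} are not in numeric order")
        sites[chrom] = (positions, size)
    return sites


# Illustrative input: the site positions are made up for this example.
example = "1 11160 12411 12461 249250621\n2 1000 5000 243199373\n"
parsed = parse_restriction_sites(example)
print(parsed["1"])
```

A check like this can catch a file whose positions are out of order before it is passed to the Convert to HiC function.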

Extract contact matrices from a hic file

A. Purpose
To extract a contact matrix from a hic file into a sparse matrix format in a text file.

B. Input
A local path or an online link to a hic file. An example link: https://www.encodeproject.org/files/ENCFF219YOB/@@download/ENCFF219YOB.hic

C. Output
A contact matrix in sparse matrix format

D. Running
Access the function from the menu toolbar: 2D-Functions/Extract HiC

E. Extract contact matrices from a hic file GUI

| Field | Description | Default |
|---|---|---|
| Path to .hic File | An online link or local path to a hic format file | NA |
| Load | Click this button to fetch information from the header of the hic file. | NA |
| Genome | Genome version of the hic file | NA |
| Chromosomes | List of chromosomes in the hic file | NA |
| From | Start of a fragment (to extract its contact matrix). When From and To are left blank, the whole chromosome is considered. | Blank |
| To | End of a fragment (to extract its contact matrix). When From and To are left blank, the whole chromosome is considered. | Blank |
| Resolution | List of resolutions of contact matrices in the hic file | NA |
| Normalization | List of normalization methods used to normalize contact matrices. None means unnormalized. | NA |
| Output Directory | The directory in which the extracted data is written. An example filename for the generated file is GenomeFlow_Extract_hg19_chr_1_res_BP_2500000_norm_None.txt | |
| Extract Contact Data | Click this button to start extracting contact data | NA |

Normalize unnormalized Hi-C contact matrices

A. Purpose
To normalize an unnormalized contact matrix in sparse matrix format.

B. Input
A contact matrix in sparse matrix format.

C. Output
A normalized contact matrix in sparse matrix format. The matrix is normalized by the Iterative Correction and Eigenvector decomposition (ICE) method.
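
This page names the ICE method but does not show how GenomeFlow implements it. The following is a minimal sketch of the iterative correction idea on a small dense symmetric matrix: each pass divides every entry by the relative row sums of its row and column until all row sums are roughly equal. The function name `ice_normalize`, the dense list-of-lists input, the fixed iteration count and the example matrix are assumptions made for illustration.

```python
from typing import List


def ice_normalize(matrix: List[List[float]], iterations: int = 50) -> List[List[float]]:
    """Iterative correction sketch for a dense symmetric contact matrix."""
    n = len(matrix)
    m = [row[:] for row in matrix]  # work on a copy
    for _ in range(iterations):
        sums = [sum(row) for row in m]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break  # nothing to normalize
        mean = sum(nonzero) / len(nonzero)
        # Relative coverage of each bin; empty bins are left untouched.
        bias = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= bias[i] * bias[j]
    return m


# Hypothetical 3 x 3 contact matrix, used only to demonstrate the iteration.
raw = [[1.0, 2.0, 3.0], [2.0, 1.0, 4.0], [3.0, 4.0, 1.0]]
balanced = ice_normalize(raw)
print([round(sum(row), 6) for row in balanced])
```

After the iterations, the printed row sums should be approximately equal, which is the property that iterative correction aims for. A real implementation would work directly on the sparse format and stop once a convergence tolerance is reached.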

D. Running
Access the function from the menu toolbar: 2D-Functions/Normalized HiC Data

E. Normalize unnormalized Hi-C contact matrices GUI

Visualizing Dataset in 2D format

A. Purpose
To create a two dimensional (2D) graphical representation of a contact matrix from an input file.

B. Input
A contact matrix file in sparse matrix format or square matrix format. Mark the Is Square Matrix? box if the input is a square matrix.

  • An example sparse matrix file can be found here: /executable/sample_data/contact_matrices/chr11_10kb_gm12878_list_125mb_135mb.txt
  • Examples of square matrix files can be found here: /executable/sample_data/contact_matrices/square_matrices/
    Note: Resolution for square matrices = 40000
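
To make the difference between the two input formats concrete, the sketch below expands a sparse matrix into a square (full) matrix. The manual does not spell out the sparse layout, so this assumes three whitespace-separated columns per line (position 1, position 2, contact count) with positions that are multiples of the resolution. The function name `sparse_to_square` and the example values are hypothetical.

```python
from typing import List


def sparse_to_square(text: str, resolution: int) -> List[List[float]]:
    """Expand 'pos1 pos2 count' lines into a symmetric square matrix."""
    triples = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        triples.append((int(fields[0]), int(fields[1]), float(fields[2])))
    if not triples:
        return []
    # The smallest and largest positions define the matrix extent.
    start = min(min(a, b) for a, b, _ in triples)
    end = max(max(a, b) for a, b, _ in triples)
    n = (end - start) // resolution + 1
    square = [[0.0] * n for _ in range(n)]
    for a, b, count in triples:
        i = (a - start) // resolution
        j = (b - start) // resolution
        square[i][j] = count
        square[j][i] = count  # contact matrices are symmetric
    return square


# Made-up contacts at 10 kb resolution starting at 125 Mb.
example = "125000000 125000000 12\n125000000 125010000 5\n125010000 125020000 7\n"
square = sparse_to_square(example, 10000)
for row in square:
    print(row)
```

Bins with no entry in the sparse file are filled with 0, which is why a square matrix file is much larger than its sparse equivalent.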

C. Output
A heatmap: a graphical representation of contact data in which the numeric values of the input contact matrix are represented as colors according to a selected color gradient.

D. Running
Access the function from the menu toolbar: 2D-Functions/Visualize Dataset.

E. Visualization GUI

| Field | Description | Default |
|---|---|---|
| Draw Title | Shows or hides the heatmap title | checked |
| Draw Legend | Shows or hides the color legend | checked |
| Draw X-Axis Title | Shows or hides the X-axis title label on the 2D display window | checked |
| Draw X-Axis Ticks | Shows or hides the X-axis tick labels on the 2D display window | checked |
| Draw Y-Axis Title | Shows or hides the Y-axis title label on the 2D display window | checked |
| Draw Y-Axis Ticks | Shows or hides the Y-axis tick labels on the 2D display window | checked |
| Heatmap Direction(Left/Right) | Changes the Y-axis origin of the heatmap matrix from bottom-left to top-left and vice versa | checked |
| Enable Zoom Mode | Allows the user to zoom in and out of the heatmap matrix | unchecked |
| Save HeatMap | A button that saves the current state of the heatmap in .png format | NA |
| Is Square Matrix?(Input contact file) | Specifies whether the input is a square matrix (a full matrix) or a sparse matrix. If checked, a textbox is displayed for the user to specify the matrix resolution. | unchecked |
| Specify Resolution | Visible only if Is SquareMatrix? is checked. Allows the user to specify the resolution of the input matrix. | NA |
| Input contact file | A text file containing a contact matrix in any of the formats described above. | NA |
| Title | Allows the user to specify the title of the heatmap | Heatmap Display |
| X-Axis Title | Allows the user to specify the X-axis title for the heatmap | Genome bin Resolution (bp) |
| Y-Axis Title | Allows the user to specify the Y-axis title for the heatmap | Genome bin Resolution (bp) |
| X min | Allows the user to specify the minimum X-axis tick for the heatmap | 0 |
| X max | Allows the user to specify the maximum X-axis tick for the heatmap | 200 |
| Y min | Allows the user to specify the minimum Y-axis tick for the heatmap | 0 |
| Y max | Allows the user to specify the maximum Y-axis tick for the heatmap | 200 |
| X min [Genome Location Equivalent] | Shows the genomic position equivalent of the minimum X-axis tick | 0 |
| X max [Genome Location Equivalent] | Shows the genomic position equivalent of the maximum X-axis tick | 8000000 |
| Y min [Genome Location Equivalent] | Shows the genomic position equivalent of the minimum Y-axis tick | 0 |
| Y max [Genome Location Equivalent] | Shows the genomic position equivalent of the maximum Y-axis tick | 8000000 |
| Number of Units detected | Shows the number of regions found in the input matrix | 200 |
| Number of Missing Units | Shows the number of gaps or missing regions in the input matrix | 0 |
| Resolution detected | Displays the resolution of the input matrix | 40000 |
| Initial Start Position | Shows the minimum genome position observed in the input matrix | 0 |
| Initial End Position | Shows the maximum genome position observed in the input matrix | 8000000 |
| Gradient | An array of colors used as a gradient. One color is used as the bottom of the gradient and another as the top, producing a gradient from one color to the other. The gradient colors are explained in the rows below. | HOT |
| GRADIENT_BLACK_TO_WHITE | Produces a gradient from black (low) to white (high) | |
| GRADIENT_BLUE_TO_RED | Produces a gradient from blue (low) to red (high) | |
| GRADIENT_HEAT | Produces a gradient using the colors black, brown, orange, white | |
| GRADIENT_HOT | Produces a gradient using the colors black, red, orange, and yellow to white | |
| GRADIENT_MAROON_TO_GOLD | Produces a gradient from maroon (low) to gold (high) | |
| GRADIENT_RAINBOW | Produces a gradient with the colors violet, blue, green, yellow, orange, and red | |
| GRADIENT_RED_TO_GREEN | Produces a gradient from red (low) to green (high) | |
| GRADIENT_ROY | Produces a gradient through red, orange, yellow | |
| Data Type | Determines the type of data to be displayed. The available types are the raw input data, a TANH of the input data, a Pearson correlation of the input data, and a Spearman correlation of the input data. | TANH |

F. TAD Annotation
The description of the display controls on the display window for TAD annotation is given below.

| Field | Description | Default |
|---|---|---|
| Load TAD file | Browse and load a .bed format file containing the TADs identified for the input matrix | NA |
| Identified TAD | Shows the TADs in the input file | NA |
| Show TAD on Heatmap | Marks the boundaries of the identified TADs on the displayed heatmap | |
| Display Multiple TADs | Once checked, allows TADs from different methods to be overlaid on the same display window. This is useful for comparing TADs identified by different methods for a dataset. | unchecked |
| Choose Display Color | Choose the color for the TAD boundary marks | Color 1 |

G. Demonstration
The figure below shows the TAD annotation for the TADs identified by two TAD identification algorithms (ClusterTAD and DI) for mESC Chromosome 17 from Ren Lab.
Step 1:
To run this demonstration:
(1) Use a sample square matrix as the input contact file. The example file can be found here: /executable/sample_data/contact_matrices/square_matrices/mESC_nij.chr17.
Resolution for the square matrix = 40000
(2) Load the contact file as instructed here: Visualizing Dataset in 2D format
Step 2:
Modify the highlighted fields on the display window. The table below shows the values set for each field in the display control.

| Field | Value |
|---|---|
| Draw Title | checked |
| Draw Legend | checked |
| Draw X-Axis Title | checked |
| Draw X-Axis Ticks | checked |
| Draw Y-Axis Title | checked |
| Draw Y-Axis Ticks | checked |
| Heatmap Direction(Left/Right) | checked |
| Enable Zoom Mode | unchecked |
| Is SquareMatrix?(Input contact file) | checked |
| Specify Resolution | 40000 |
| Input contact file | Path/to/chr17/inputfile |
| Title | HeatMap Display |
| X-Axis Title | Number of Bins |
| Y-Axis Title | Number of Bins |
| X min | 500 |
| X max | 700 |
| Y min | 500 |
| Y max | 700 |
| X min [Genome Location Equivalent] | 20000000 |
| X max [Genome Location Equivalent] | 28000000 |
| Y min [Genome Location Equivalent] | 20000000 |
| Y max [Genome Location Equivalent] | 28000000 |
| Number of Units detected | 2382 |
| Number of Missing Units | 0 |
| Resolution detected | 40000 |
| Initial Start Position | 0 |
| Initial End Position | 95240000 |
| Gradient | HOT |
| Data Type | TANH |
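
The Genome Location Equivalent values in this demonstration are consistent with multiplying the axis tick (a bin index) by the matrix resolution. The short sketch below reproduces that arithmetic; the helper name `bin_to_position` is hypothetical, and the relationship is inferred from the values in the table rather than from the GenomeFlow source.

```python
def bin_to_position(bin_index: int, resolution: int) -> int:
    """Genomic position (bp) at which a bin starts, for a fixed resolution."""
    return bin_index * resolution


resolution = 40000  # resolution of the sample square matrix
for label, bin_index in [("X min", 500), ("X max", 700)]:
    print(label, bin_to_position(bin_index, resolution))
# prints:
# X min 20000000
# X max 28000000
```

The same arithmetic applies to the Y axis, so setting X and Y to 500 and 700 displays the region from 20 Mb to 28 Mb of chromosome 17.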

Step 3:

  • Browse and load the ClusterTAD file found here:
    ClusterTAD: /executable/sample_data/TAD_annotation/mESC_TAD_bed/ClusterTAD/chr17.bed.
  • Select a unique color from Color 1 to Color 4 (e.g., Color 1 for ClusterTAD and Color 2 for DI).
  • Click the Show TAD on Heatmap button.

Step 4:
To display multiple TADs on the heatmap:

  • Check the Display Multiple TADs box.
  • Repeat Step 3 above with the DI file found here:
    DI: /executable/sample_data/TAD_annotation/mESC_TAD_bed/DI/chr17.bed

Demonstration of TAD Annotation on 2D Heatmap

Identify TAD

A. Purpose
To identify topologically associating domains (TADs) from an input contact matrix.

B. Input
An input file in square matrix format or sparse matrix format. An example sparse matrix file can be found here: /executable/sample_data/contact_matrices/chr11_10kb_gm12878_list_125mb_135mb.txt

C. Output
The TADs with the best quality are written in .bed format to a file prefixed with BestTAD_. This file can be found here: /Selected_output_directory_from_GUI/Output/TADs/.

D. Running
Access the function from the menu toolbar: 2D-Functions/Identify TAD.

E. Identify TAD GUI

| Field | Description | Default |
|---|---|---|
| Input contact file | An input file in any of the formats described above | NA |
| Output folder | Directory in which the identified TAD files are written | NA |
| Is SquareMatrix?(Input contact file) | Specifies whether the input is a square matrix (a full matrix) or a sparse matrix. If checked, a textbox is displayed for the user to specify the matrix resolution. | unchecked |
| Data Resolution | Visible only if Is SquareMatrix? is checked. Allows the user to specify the resolution of the input matrix. | 40000 |
| Chromosome (optional) | Allows the user to specify the chromosome of the data | X |
| Run ClusterTAD Algorithm | The default algorithm used for TAD identification from the input contact matrix | checked |
| Run | Starts the identification process. A progress bar is displayed to show the steps taken by the TAD identification algorithm. | NA |
| Stop | Stops the program if pressed during the identification. | NA |

Check TAD consistency between two TADs from different methods

A. Purpose
To compare the TADs identified by two different topologically associating domain (TAD) identification methods.

B. Input
Two files containing TADs in .bed format. The method whose TAD consistency is to be checked is termed Method-1, and the method whose TADs it is compared against is termed Method-2. Choose the same chromosome for both methods.
For example, to compare the TADs from ClusterTAD with those from DI for chromosome 17:
Method-1 = /executable/sample_data/TAD_annotation/mESC_TAD_bed/ClusterTAD/chr17.bed.
Method-2 = /executable/sample_data/TAD_annotation/mESC_TAD_bed/DI/chr17.bed.

C. Output
A report of the consistency of Method-1 with Method-2. The output reports the following cases:

| Field | Description |
|---|---|
| Case 1 | The number of Exact TADs found in both Method-1 and Method-2 |
| Case 2 | The number of Sub-TADs that exist between Method-1 and Method-2 |
| Case 3 | The number of Conflicting TADs. |
| Case 4 | The number of TADs in Method-1 but not found in Method-2 |
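
The manual does not define the four cases precisely, so the sketch below shows only one plausible reading, not GenomeFlow's actual rules. It assumes that an exact TAD has identical start and end in both files, a Sub-TAD is a Method-1 TAD nested inside a Method-2 TAD or the reverse, a conflicting TAD overlaps a Method-2 TAD without nesting, and a Method-1-only TAD overlaps nothing. The function names and all coordinates in the example are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]


def read_bed(text: str) -> List[Interval]:
    """Read (start, end) pairs from BED-style lines: chrom, start, end, ..."""
    tads = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        tads.append((int(fields[1]), int(fields[2])))
    return tads


def classify(method1: List[Interval], method2: List[Interval]) -> Dict[str, int]:
    """Count Method-1 TADs per case, under the assumptions stated above."""
    counts = {"exact": 0, "sub_tad": 0, "conflicting": 0, "method1_only": 0}
    for s1, e1 in method1:
        overlaps = [(s2, e2) for s2, e2 in method2 if s1 < e2 and s2 < e1]
        if (s1, e1) in overlaps:
            counts["exact"] += 1
        elif not overlaps:
            counts["method1_only"] += 1
        elif any((s2 <= s1 and e1 <= e2) or (s1 <= s2 and e2 <= e1)
                 for s2, e2 in overlaps):
            counts["sub_tad"] += 1  # one TAD is nested inside the other
        else:
            counts["conflicting"] += 1  # partial overlap only
    return counts


# Made-up TADs on the same chromosome for two methods.
bed1 = "chr17 0 400000\nchr17 400000 800000\nchr17 800000 1200000\nchr17 2000000 2400000\n"
bed2 = "chr17 0 400000\nchr17 400000 1000000\nchr17 1120000 1520000\n"
report = classify(read_bed(bed1), read_bed(bed2))
print(report)
# prints: {'exact': 1, 'sub_tad': 1, 'conflicting': 1, 'method1_only': 1}
```

The chromosome column is ignored here because both files are expected to describe the same chromosome, as noted in the Input section.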

D. Running
Access the function from the menu toolbar: 2D-Functions/Check TAD Consistency.

E. Check TAD consistency GUI

| Field | Description | Default |
|---|---|---|
| Input Method-1 TAD file(.bed) | Browse to the .bed format file containing the TADs identified by Method-1 | NA |
| Input Method-2 TAD file(.bed) | Browse to the .bed format file containing the TADs identified by Method-2 | NA |
| Data Resolution | The resolution of the dataset from which the TADs were identified. | 40000 |
| Output folder | Directory to output the comparison report | NA |
| Create Report | Once this button is pressed, a progress bar is displayed to show the progress of the consistency check. | NA |
| Stop | Stops the program if pressed during the check. | NA |