
2D Functions

oluwatosin oluwadare edited this page Mar 4, 2018 · 25 revisions

Convert mapped Hi-C reads to .hic file

A. Purpose
To create a binary .hic file from a text file describing mapped Hi-C reads. The .hic file contains contact matrices at different resolutions, normalized by different methods.

B. Test Data
An example test dataset is the GM06990 cell line data, which can be downloaded from the link below:

  1. GM06990 Cell line:

C. Output
A binary .hic file containing contact matrices

D. Running
Access the function from the menu toolbar: 2D-Functions/Convert to HiC

E. Convert mapped Hi-C reads to .hic file GUI

| Field | Description | Default |
| --- | --- | --- |
| Input file | A text file describing mapped Hi-C reads (format described above) | NA |
| Genome ID | Genome version of the Hi-C data | hg19 |
| Output File | Name of the output .hic file | NA |
| Contact Threshold | Threshold on the number of interactions for contacts to be used in creating contact matrices | 0 |
| MAPQ Score Threshold | Mapping quality score threshold for reads to be considered in creating contact matrices | 0 |
| Chromosomes | Chromosomes for which contact matrices are created, separated by commas (,). When left blank, all chromosomes are considered. | All (when left blank) |
| Resolutions | List of resolutions of the contact matrices to create, separated by commas (,) | 2500000, 1000000, 500000, 250000, 100000, 50000, 25000, 10000, 5000 |
| Restriction Site File | Each line starts with a chromosome number, followed by the positions of the restriction sites on that chromosome in numeric order, and ends with the size of the chromosome. When provided, 8 additional fragment-delimited resolutions are added: 500f, 250f, 100f, 50f, 20f, 5f, 2f, 1f | blank |
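
The restriction site file layout described in the table can be checked before running the conversion. The following is a minimal Python sketch, not part of the tool: the function name `parse_restriction_sites` is hypothetical, fields are assumed to be separated by whitespace, and the values in `example` are toy numbers rather than real restriction sites.

```python
def parse_restriction_sites(lines):
    """Map chromosome -> (restriction site positions, chromosome size).

    Assumes whitespace-separated fields: the chromosome first, then the site
    positions in numeric order, then the chromosome size as the last field.
    """
    sites = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        chrom = fields[0]
        # All remaining fields are integers; the last one is the chromosome size.
        *cut_sites, chrom_size = [int(x) for x in fields[1:]]
        if cut_sites != sorted(cut_sites):
            raise ValueError(f"sites for chromosome {chrom} are not in numeric order")
        sites[chrom] = (cut_sites, chrom_size)
    return sites


# Toy input (hypothetical values, for illustration only).
example = ["1 100 250 900 1000", "2 40 75 500"]
parsed = parse_restriction_sites(example)
print(parsed["1"])
```

A check like this catches unsorted positions early, which the description above implies are not expected.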

Extract contact matrices from a hic file

A. Purpose
To extract a contact matrix from a hic file into a sparse matrix format in a text file.

B. Input
A local path or an online link to a hic file. Example link: https://www.encodeproject.org/files/ENCFF219YOB/@@download/ENCFF219YOB.hic

C. Output
A contact matrix in sparse matrix format
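
To work with the extracted file outside the GUI, the sparse format can be expanded into a full square matrix. The sketch below is an illustration under assumptions that this page does not confirm: each line is taken to hold three whitespace-separated values (`position1 position2 count`), with positions given as bin start coordinates in base pairs. The function name `sparse_to_dense` is hypothetical.

```python
def sparse_to_dense(lines, resolution):
    """Build a symmetric square matrix from 'pos1 pos2 count' records.

    Assumed layout (not confirmed by the tool's documentation): three
    whitespace-separated columns, positions being bin starts in base pairs.
    Returns the matrix and the genomic start position of bin 0.
    """
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        records.append((int(fields[0]), int(fields[1]), float(fields[2])))
    if not records:
        return [], 0
    start = min(min(a, b) for a, b, _ in records)
    end = max(max(a, b) for a, b, _ in records)
    n = (end - start) // resolution + 1
    matrix = [[0.0] * n for _ in range(n)]
    for a, b, count in records:
        i = (a - start) // resolution
        j = (b - start) // resolution
        matrix[i][j] = count
        matrix[j][i] = count  # contact matrices are symmetric
    return matrix, start


# Toy records at 10 kb resolution (hypothetical values).
dense, first_pos = sparse_to_dense(["0 0 5", "0 10000 2", "10000 20000 3"], 10000)
print(dense)
```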

D. Running
Access the function from the menu toolbar: 2D-Functions/Extract HiC

E. Extract contact matrices from a hic file GUI

| Field | Description | Default |
| --- | --- | --- |
| Path to .hic File | An online link or local path to a hic file | NA |
| Load | Click this button to fetch information from the header of the hic file | NA |
| Genome | Genome version of the hic file | NA |
| Chromosomes | List of chromosomes in the hic file | NA |
| From | Start of a fragment (to extract its contact matrix). When From and To are left blank, the whole chromosome is considered. | Blank |
| To | End of a fragment (to extract its contact matrix). When From and To are left blank, the whole chromosome is considered. | Blank |
| Resolution | List of resolutions of contact matrices in the hic file | NA |
| Normalization | List of normalization methods used to normalize contact matrices | NA |
| Extract Contact Data | Click this button to start extracting contact data | NA |

Normalize an unnormalized HiC contact matrix

A. Purpose
To normalize an unnormalized contact matrix in sparse matrix format.

B. Input
A contact matrix in sparse matrix format.

C. Output
A normalized contact matrix in sparse matrix format. The matrix is normalized by the Iterative Correction and Eigenvector decomposition (ICE) method
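
For readers unfamiliar with ICE, the iterative correction step can be sketched in a few lines of Python. This is a simplified illustration of the general technique on a small dense matrix, not the tool's implementation: it omits the eigenvector decomposition and any filtering of low-coverage bins, and the function name `ice_normalize` and its parameters are hypothetical.

```python
def ice_normalize(matrix, max_iter=100, tol=1e-6):
    """Iterative correction of a symmetric contact matrix (list of lists).

    Each round divides entry (i, j) by scale[i] * scale[j], where scale[i] is
    row i's sum relative to the mean of the nonzero row sums. Rows with zero
    coverage are left untouched. Returns the corrected matrix and the
    accumulated bias per bin.
    """
    n = len(matrix)
    m = [row[:] for row in matrix]  # work on a copy
    bias = [1.0] * n
    for _ in range(max_iter):
        sums = [sum(row) for row in m]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break
        mean = sum(nonzero) / len(nonzero)
        scale = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= scale[i] * scale[j]
        bias = [b * s for b, s in zip(bias, scale)]
        if max(abs(s - 1.0) for s in scale) < tol:
            break  # row sums are (nearly) equal
    return m, bias


# Toy symmetric matrix (hypothetical values).
raw = [[4.0, 2.0, 1.0], [2.0, 6.0, 3.0], [1.0, 3.0, 2.0]]
normalized, bias = ice_normalize(raw)
print([round(sum(row), 4) for row in normalized])
```

After convergence every bin has approximately the same total contact count, which is the property iterative correction aims for.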

D. Running
Access the function from the menu toolbar: 2D-Functions/Normalized HiC Data

E. Normalize an unnormalized HiC contact matrix GUI

Visualizing Dataset in 2D format

A. Purpose
To create a two dimensional (2D) graphical representation of a contact matrix from an input file.

B. Input
A sparse matrix format or a square matrix format. Mark the Is Square Matrix? box if the input is a square matrix.

  • An example sparse matrix file can be found here: /executable/sample_data/contact_matrices/chr11_10kb_gm12878_list_125mb_135mb.txt
  • Examples of square matrix files can be found here: /executable/sample_data/contact_matrices/square_matrices/
    Note: Resolution for square matrices = 40000

C. Output
A heatmap: a graphical representation of contact data in which the numeric values of the input contact matrix are shown as colors according to a selected color gradient.

D. Running
Access the function from the menu toolbar: 2D-Functions/Visualize Dataset.

E. Visualization GUI

| Field | Description | Default |
| --- | --- | --- |
| Draw Title | Shows or hides the heatmap title | checked |
| Draw Legend | Shows or hides the color legend | checked |
| Draw X-Axis Title | Shows or hides the X-axis title label on the 2D display window | checked |
| Draw X-Axis Ticks | Shows or hides the X-axis tick labels on the 2D display window | checked |
| Draw Y-Axis Title | Shows or hides the Y-axis title label on the 2D display window | checked |
| Draw Y-Axis Ticks | Shows or hides the Y-axis tick labels on the 2D display window | checked |
| Heatmap Direction(Left/Right) | Changes the Y-axis origin of the heatmap matrix from bottom-left to top-left and vice versa | checked |
| Enable Zoom Mode | Allows the user to zoom in and out of the heatmap matrix | unchecked |
| Is Square Matrix?(Input contact file) | Specifies whether the input is a square (full) matrix or a sparse matrix. If checked, a textbox is displayed for the matrix resolution. | unchecked |
| Specify Resolution | Visible only if Is Square Matrix? is checked. Allows the user to specify the resolution of the input matrix. | NA |
| Input contact file | A text file containing a contact matrix in either of the formats described above | NA |
| Title | Allows the user to specify the title of the heatmap | Heatmap Display |
| X-Axis Title | Allows the user to specify the X-axis title of the heatmap | Genome bin Resolution (bp) |
| Y-Axis Title | Allows the user to specify the Y-axis title of the heatmap | Genome bin Resolution (bp) |
| X min | Allows the user to specify the minimum X-axis tick of the heatmap | 0 |
| X max | Allows the user to specify the maximum X-axis tick of the heatmap | 200 |
| Y min | Allows the user to specify the minimum Y-axis tick of the heatmap | 0 |
| Y max | Allows the user to specify the maximum Y-axis tick of the heatmap | 200 |
| X min [Genome Location Equivalent] | Shows the genomic position equivalent of the minimum X-axis tick | 0 |
| X max [Genome Location Equivalent] | Shows the genomic position equivalent of the maximum X-axis tick | 8000000 |
| Y min [Genome Location Equivalent] | Shows the genomic position equivalent of the minimum Y-axis tick | 0 |
| Y max [Genome Location Equivalent] | Shows the genomic position equivalent of the maximum Y-axis tick | 8000000 |
| Number of Units detected | Shows the number of regions found in the input matrix | 200 |
| Number of Missing Units | Shows the number of gaps or missing regions found in the input matrix | 0 |
| Resolution detected | Displays the resolution of the input matrix | 40000 |
| Initial Start Position | Shows the minimum genome position observed in the input matrix | 0 |
| Initial End Position | Shows the maximum genome position observed in the input matrix | 8000000 |
| Gradient | An array of colors used as a gradient. One color is the bottom of the gradient and another is the top, producing a gradient from one color to the other. The gradient options are explained in the rows below. | HOT |
| GRADIENT_BLACK_TO_WHITE | Produces a gradient from black (low) to white (high) | |
| GRADIENT_BLUE_TO_RED | Produces a gradient from blue (low) to red (high) | |
| GRADIENT_HEAT | Produces a different gradient for hot things (black, brown, orange, white) | |
| GRADIENT_HOT | Produces a gradient for hot things (black, red, orange, yellow, white) | |
| GRADIENT_MAROON_TO_GOLD | Produces a gradient from maroon (low) to gold (high) | |
| GRADIENT_RAINBOW | Produces a gradient through the rainbow: violet, blue, green, yellow, orange, red | |
| GRADIENT_RED_TO_GREEN | Produces a gradient from red (low) to green (high) | |
| GRADIENT_ROY | Produces a gradient through red, orange, yellow | |
| Data Type | Determines the type of data displayed. The available types are the raw input data, a tanh of the input data, a Pearson correlation of the input data, and a Spearman correlation of the input data. | TANH |
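
The Data Type options correspond to standard transformations of a matrix. The Python sketch below shows one plausible reading of each, for illustration only: this page does not say whether the tool rescales values before applying tanh or whether it correlates rows or columns (for a symmetric contact matrix the two are equivalent). All function names are hypothetical.

```python
import math


def tanh_transform(matrix):
    """Apply tanh to every entry, squashing large counts toward 1."""
    return [[math.tanh(v) for v in row] for row in matrix]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant row; 0 is a display choice
    return cov / math.sqrt(vx * vy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def correlation_matrix(matrix, method="pearson"):
    """Row-by-row correlation; Spearman is Pearson computed on ranks."""
    rows = [average_ranks(r) for r in matrix] if method == "spearman" else matrix
    return [[pearson(a, b) for b in rows] for a in rows]


# Toy matrix (hypothetical values).
toy = [[5.0, 2.0, 0.0], [2.0, 4.0, 3.0], [0.0, 3.0, 6.0]]
print(correlation_matrix(toy, method="spearman"))
```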

Identify TAD

A. Purpose
To identify Topologically Associating Domains (TADs) from an input contact matrix.

B. Input
An input file in square matrix format or sparse matrix format. An example sparse matrix file can be found here: /executable/sample_data/contact_matrices/chr11_10kb_gm12878_list_125mb_135mb.txt

C. Output
The best-quality TADs are written in .bed format to a file prefixed with BestTAD_. This file can be found here: /Selected_output_directory_from_GUI/Output/TADs/.

D. Running
Access the function from the menu toolbar: 2D-Functions/Identify TAD.

E. Identify TAD GUI

| Field | Description | Default |
| --- | --- | --- |
| Input contact file | An input file in either of the formats described above | NA |
| Output folder | Directory where the output is written | NA |
| Is Square Matrix?(Input contact file) | Specifies whether the input is a square (full) matrix or a sparse matrix. If checked, a textbox is displayed for the matrix resolution. | unchecked |
| Data Resolution | Visible only if Is Square Matrix? is checked. Allows the user to specify the resolution of the input matrix. | 40000 |
| Chromosome (optional) | Allows the user to specify the chromosome of the data | X |
| Run ClusterTAD Algorithm | The default algorithm used for TAD identification from the input contact matrix | checked |
| Run | Starts the identification process. A progress bar shows the steps taken by the TAD identification algorithm. | NA |
| Stop | Stops the program if pressed during identification | NA |

Check TAD consistency between two TADs from different methods

A. Purpose
To compare the TADs produced by two different Topologically Associating Domain (TAD) identification methods.

B. Input
Two files containing TADs in .bed format, one per method. The method whose TADs are checked for consistency is termed Method-1, and the method whose TADs they are compared against is termed Method-2. Choose the same chromosome for both methods.
For example, to compare TADs from ClusterTAD with TADs from DI for chromosome 17:
Method-1 = /executable/sample_data/TAD_annotation/mESC_TAD_bed/ClusterTAD/chr17.bed
Method-2 = /executable/sample_data/TAD_annotation/mESC_TAD_bed/DI/chr17.bed

C. Output
A report of the consistency of Method-1 with Method-2. The report covers the following cases:

Case 1 = The number of Exact TADs found in both Method-1 and Method-2
Case 2 = The number of Sub-TADs that exist between Method-1 and Method-2
Case 3 = The number of Conflicting TADs.
Case 4 = The number of TADs in Method-1 but not found in Method-2
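
The four cases can be illustrated with a short Python sketch. The exact criteria the tool applies are not given on this page, so the definitions below are assumptions made for illustration: a TAD is exact if Method-2 has an identical interval, a sub-TAD if every overlapping Method-2 TAD either contains it or lies inside it, conflicting if any overlapping Method-2 TAD crosses one of its boundaries, and missing if nothing overlaps. Intervals are treated as half-open, and the names `read_bed` and `compare_tads` are hypothetical.

```python
def read_bed(lines):
    """Read (start, end) pairs from BED lines: chrom, start, end, ..."""
    tads = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or not fields[1].isdigit():
            continue  # skip headers, blank lines, and malformed records
        tads.append((int(fields[1]), int(fields[2])))
    return tads


def nested(a, b):
    """True if one interval fully contains the other."""
    return (a[0] <= b[0] and b[1] <= a[1]) or (b[0] <= a[0] and a[1] <= b[1])


def compare_tads(method1, method2):
    """Classify each Method-1 TAD against Method-2 (assumed definitions)."""
    counts = {"exact": 0, "sub_tad": 0, "conflicting": 0, "missing": 0}
    for tad in method1:
        # Half-open intervals: touching boundaries do not count as overlap.
        overlapping = [o for o in method2 if tad[0] < o[1] and o[0] < tad[1]]
        if tad in overlapping:
            counts["exact"] += 1
        elif not overlapping:
            counts["missing"] += 1
        elif all(nested(tad, o) for o in overlapping):
            counts["sub_tad"] += 1
        else:
            counts["conflicting"] += 1
    return counts


# Toy intervals (hypothetical values, not taken from the sample data).
m1 = read_bed(["chr17 0 100", "chr17 100 200", "chr17 200 300", "chr17 400 500"])
m2 = read_bed(["chr17 0 100", "chr17 100 150", "chr17 150 200", "chr17 250 350"])
print(compare_tads(m1, m2))
```

With these toy intervals each Method-1 TAD falls into a different case, which makes the distinction between nesting and boundary crossing easy to see.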

D. Running
Access the function from the menu toolbar: 2D-Functions/Check TAD Consistency.

E. Check TAD consistency GUI
