2D Functions

Convert mapped Hi-C reads to .hic file

A. Purpose
To create a binary .hic file from a text file of mapped Hi-C reads. The .hic file contains contact matrices at multiple resolutions, normalized by different methods.
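A contact matrix at a given resolution is essentially a count of read pairs per pair of genomic bins. The minimal Python sketch below illustrates that binning step at two of the default resolutions. The tuple layout `(chr1, pos1, chr2, pos2)` and the name `bin_contacts` are hypothetical; they do not reflect GenomeFlow's internal code or the .hic binary layout.

```python
from collections import Counter

def bin_contacts(pairs, resolution):
    """Count read pairs per (chr1, bin1, chr2, bin2) at one resolution (bp).

    pairs: iterable of (chr1, pos1, chr2, pos2) tuples (hypothetical layout).
    """
    counts = Counter()
    for chr1, pos1, chr2, pos2 in pairs:
        b1, b2 = pos1 // resolution, pos2 // resolution
        # Store each contact once by ordering its two ends (upper triangle).
        # Chromosome names are compared as strings here, purely for illustration.
        if (chr1, b1) <= (chr2, b2):
            key = (chr1, b1, chr2, b2)
        else:
            key = (chr2, b2, chr1, b1)
        counts[key] += 1
    return counts

pairs = [("1", 1_200_000, "1", 3_400_000),
         ("1", 3_450_000, "1", 1_100_000),
         ("1", 600_000, "2", 90_000)]

for res in (2_500_000, 1_000_000):
    print(res, dict(bin_contacts(pairs, res)))
```

The same reads produce one sparse matrix per resolution; coarser resolutions simply merge more positions into each bin.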

B. Test Data
Five input formats are accepted: short format, short format with score, medium format, long format, and 4DN DCIC format. These formats are described here.

A sample file is: executable/sample_data/GSM1551688_HIC143_merged_nodups.zip (unzip it before use).
Another set of test data is the GM06990 cell line data, which can be downloaded from the links below:

GM06990 Cell line:

Bowtie2:
- Go to http://sysbio.rnet.missouri.edu/bdm_download/GenomeFlow/GM06990/
- Download/Save the GenomeFlow_formatted.bowtie2.input file
Bwa:
- Go to http://sysbio.rnet.missouri.edu/bdm_download/GenomeFlow/GM06990/
- Download/Save the GenomeFlow_formatted.bwa.input file

C. Output
A binary .hic file containing contact matrices

D. Running
Access the function from the menu toolbar: 2D-Functions/Convert to HiC

E. Convert mapped Hi-C reads to .hic file GUI

Field	Description	Default
Input file	A text file that contains the mapped Hi-C reads (format described above)	NA
Genome ID	Genome version of the Hi-C data	hg19
Genome ID (If Not listed above)	Enter a unique Genome ID for the data if it is not provided in the dropdown list above.	NA
Output Directory	Directory in which the generated .hic file is written. An example output filename is GenomeFlow_Convert_1521343280452.hic	NA
Contact Threshold	Threshold on the number of interactions for a contact to be used in creating contact matrices.	0
MAPQ Score Threshold	Mapping quality (MAPQ) score threshold for reads to be considered in creating contact matrices.	0
Chromosomes	Chromosomes for which contact matrices are created, separated by commas (,). When left blank, all chromosomes are considered.	All (when left blank)
Resolutions	Comma-separated (,) list of resolutions (bp) of the contact matrices to be created	2500000, 1000000, 500000, 250000, 100000, 50000, 25000, 10000, 5000
Restriction Site File	A text file in which each line starts with a chromosome, followed by the positions of the restriction sites on that chromosome in numeric order, and ends with the size of the chromosome. When provided, 8 additional fragment-delimited resolutions are added: 500f, 250f, 100f, 50f, 20f, 5f, 2f, 1f	blank
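Fragment-delimited resolutions group reads by restriction fragment instead of by fixed base-pair bins. The sketch below assumes the restriction site file layout described in the table (chromosome, ascending site positions, chromosome size). It also assumes that a resolution such as 5f means bins of 5 consecutive fragments and that a site position starts a new fragment; these conventions and the function names are illustrative assumptions, not GenomeFlow's implementation.

```python
import bisect

def parse_restriction_sites(lines):
    """Parse lines of the form '<chrom> <site1> <site2> ... <chrom_size>'."""
    sites = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Positions are assumed ascending; the last value is the chromosome size.
        sites[fields[0]] = [int(x) for x in fields[1:]]
    return sites

def fragment_index(sites_for_chrom, pos):
    """0-based index of the restriction fragment containing pos.

    bisect_right counts how many site positions are <= pos, so a read at
    exactly a site position is assigned to the fragment starting there.
    """
    return bisect.bisect_right(sites_for_chrom, pos)

def fragment_bin(sites_for_chrom, pos, n_fragments):
    """Bin index at a fragment resolution such as 5f (n_fragments=5)."""
    return fragment_index(sites_for_chrom, pos) // n_fragments

sites = parse_restriction_sites(["1 100 250 400 1000"])["1"]
print(fragment_index(sites, 300), fragment_bin(sites, 300, 2))
```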

Extract contact matrices from a hic file

A. Purpose
To extract a contact matrix from a hic file into a sparse matrix format in a text file.

B. Input
A local path or an online link to a hic file. For example: https://www.encodeproject.org/files/ENCFF219YOB/@@download/ENCFF219YOB.hic

C. Output
A contact matrix in sparse matrix format

D. Running
Access the function from the menu toolbar: 2D-Functions/Extract HiC

E. Extract contact matrices from a hic file GUI

Field	Description	Default
Path to .hic File	An online link or local path to a hic format file	NA
Load	Click this button to fetch information from the header of the hic file.	NA
Genome	Genome version of the hic file	NA
Chromosomes	List of chromosomes in the hic file	NA
From	Start position of the region whose contact matrix is extracted. When From and To are left blank, the whole chromosome is considered.	Blank
To	End position of the region whose contact matrix is extracted. When From and To are left blank, the whole chromosome is considered.	Blank
Resolution	List of resolutions of the contact matrices in the hic file	NA
Normalization	List of normalization methods used to normalize the contact matrices. None means unnormalized.	NA
Output Directory	Directory in which the extracted data are written. An example filename for the generated file is GenomeFlow_Extract_hg19_chr_1_res_BP_2500000_norm_None.txt	NA
Extract Contact Data	Click this button to initiate extracting contact data	NA
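The extracted file is in sparse matrix format. Assuming each line holds two bin start positions and a contact value separated by whitespace (this column layout is an assumption), the Python sketch below reads such a file and restricts it to a From/To region, treating blank From/To as the whole chromosome. The function names are hypothetical.

```python
def read_sparse(lines):
    """Parse 'pos1 pos2 value' lines (assumed layout) into (int, int, float)."""
    entries = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and lines without three columns.
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        entries.append((int(fields[0]), int(fields[1]), float(fields[2])))
    return entries

def restrict_region(entries, start=None, end=None):
    """Keep entries whose two positions both lie in [start, end].

    start=None and end=None mirror leaving From and To blank.
    """
    if start is None and end is None:
        return list(entries)
    lo = start if start is not None else float("-inf")
    hi = end if end is not None else float("inf")
    return [(p1, p2, v) for p1, p2, v in entries
            if lo <= p1 <= hi and lo <= p2 <= hi]

entries = read_sparse(["0 0 10", "0 2500000 3", "# comment", "5000000 7500000 1.5"])
print(restrict_region(entries, 0, 2500000))
```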

Normalize unnormalized Hi-C contact matrices

A. Purpose
To normalize an unnormalized contact matrix in sparse matrix format.

B. Input
A contact matrix in sparse matrix format.

C. Output
A normalized contact matrix in sparse matrix format. The matrix is normalized by the Iterative Correction and Eigenvector decomposition (ICE) method.
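The core of ICE-style iterative correction is to repeatedly divide each entry by the relative coverage of its row and column until all rows have the same sum. The pure-Python sketch below shows that idea on a small symmetric dense matrix. It is not GenomeFlow's implementation, it omits steps such as filtering low-coverage bins, and convergence is not guaranteed for every matrix (a purely diagonal matrix, for example, does not converge), so `max_iter` bounds the loop.

```python
def ice_normalize(matrix, max_iter=100, tol=1e-6):
    """ICE-style iterative correction of a symmetric, non-negative matrix.

    Returns (balanced_matrix, biases) such that
    balanced[i][j] * biases[i] * biases[j] == matrix[i][j].
    Rows summing to zero (empty bins) are left untouched.
    """
    n = len(matrix)
    m = [row[:] for row in matrix]  # work on a copy
    bias = [1.0] * n
    for _ in range(max_iter):
        sums = [sum(row) for row in m]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break
        mean = sum(nonzero) / len(nonzero)
        # Relative coverage of each bin; empty bins keep a factor of 1.
        factors = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= factors[i] * factors[j]
            bias[i] *= factors[i]
        if max(abs(f - 1.0) for f in factors) < tol:
            break  # row sums are (nearly) equal
    return m, bias

balanced, biases = ice_normalize([[4, 2, 1], [2, 3, 1], [1, 1, 2]])
print([round(sum(row), 4) for row in balanced])
```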

D. Running
Access the function from the menu toolbar: 2D-Functions/Normalized HiC Data

E. Normalize unnormalized Hi-C contact matrices GUI

Visualizing Dataset in 2D format

A. Purpose
To create a two dimensional (2D) graphical representation of a contact matrix from an input file.

B. Input
A file in sparse matrix format or square matrix format. Check the Is Square Matrix? box if the input is a square matrix.

An example sparse matrix file can be found here: /executable/sample_data/contact_matrices/chr11_10kb_gm12878_list_125mb_135mb.txt
Examples of square matrix files can be found here: /executable/sample_data/contact_matrices/square_matrices/
Note: the resolution of the sample square matrices is 40000.
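Before drawing, a sparse input has to be laid out as a square grid of bins. The sketch below shows one way to do that and to derive values similar to Resolution detected, Number of Units detected, Number of Missing Units and Initial Start Position. Inferring the resolution as the greatest common divisor of the gaps between observed positions, and counting a unit as missing when no entry mentions it, are assumptions rather than documented GenomeFlow behavior.

```python
from functools import reduce
from math import gcd

def sparse_to_square(entries):
    """Convert (pos1, pos2, value) entries into a symmetric square matrix.

    Returns (square, resolution, start, n_units, missing).
    """
    positions = sorted({p for p1, p2, _ in entries for p in (p1, p2)})
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    # With a single observed position there is no gap to infer from; fall back to 1.
    resolution = reduce(gcd, gaps) if gaps else 1
    start = positions[0]
    n_units = (positions[-1] - start) // resolution + 1
    missing = n_units - len(positions)  # bins never mentioned in the input
    square = [[0.0] * n_units for _ in range(n_units)]
    for p1, p2, value in entries:
        i = (p1 - start) // resolution
        j = (p2 - start) // resolution
        square[i][j] = value  # mirror so the matrix is symmetric
        square[j][i] = value
    return square, resolution, start, n_units, missing

square, res, start, units, missing = sparse_to_square(
    [(0, 0, 5.0), (0, 40000, 2.0), (120000, 120000, 1.0)])
print(res, start, units, missing)
```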

C. Output
A heatmap: a graphical representation of contact data in which the numeric values of the input contact matrix are represented as colors according to the selected color gradient.

D. Running
Access the function from the menu toolbar: 2D-Functions/Visualize Dataset.

E. Visualization GUI

Field	Description	Default
Draw Title	Shows or hides the heatmap title	checked
Draw Legend	Shows or hides the color legend	checked
Draw X-Axis Title	Shows or hides the X-axis title label on the 2D display window	checked
Draw X-Axis Ticks	Shows or hides the X-axis ticks label on the 2D display window	checked
Draw Y-Axis Title	Shows or hides the Y-axis title label on the 2D display window	checked
Draw Y-Axis Ticks	Shows or hides the Y-axis ticks label on the 2D display window	checked
Heatmap Direction(Left/Right)	Changes the Y-axis origin of the heatmap matrix from the Bottom-Left to Top-Left and vice versa	checked
Enable Zoom Mode	Allows the user to zoom in/out of the heat map matrix	unchecked
Save HeatMap	Saves the current state of the heatmap as a .png image	NA
Is Square Matrix?(Input contact file)	Allows the user to specify whether the input is a square matrix (a full matrix) or a sparse matrix. If checked, a textbox is displayed for the user to specify the matrix resolution.	unchecked
Specify Resolution	Visible only if Is Square Matrix? is checked. Allows the user to specify the resolution of the input matrix.	NA
Input contact file	A text file containing a contact matrix in either of the formats described above.	NA
Title	Allows the user to specify the title of the heatmap	Heatmap Display
X-Axis Title	Allows the user to specify the X-axis title of the heatmap	Genome bin Resolution (bp)
Y-Axis Title	Allows the user to specify the Y-axis title of the heatmap	Genome bin Resolution (bp)
X min	Allows the user to specify the minimum X-axis tick of the heatmap	0
X max	Allows the user to specify the maximum X-axis tick of the heatmap	200
Y min	Allows the user to specify the minimum Y-axis tick of the heatmap	0
Y max	Allows the user to specify the maximum Y-axis tick of the heatmap	200
X min [Genome Location Equivalent]	Shows the genomic position equivalent to the minimum X-axis tick of the heatmap	0
X max [Genome Location Equivalent]	Shows the genomic position equivalent to the maximum X-axis tick of the heatmap	8000000
Y min [Genome Location Equivalent]	Shows the genomic position equivalent to the minimum Y-axis tick of the heatmap	0
Y max [Genome Location Equivalent]	Shows the genomic position equivalent to the maximum Y-axis tick of the heatmap	8000000
Number of Units detected	Shows the number of regions found in the input matrix	200
Number of Missing Units	Shows the number of gaps or missing regions noted from the input matrix	0
Resolution detected	Displays the resolution of the input matrix	40000
Initial Start Position	Shows the minimum genome position observed from the input matrix	0
Initial End Position	Shows the maximum genome position observed from the input matrix	8000000
Gradient	The color gradient used to map values to colors, running from the color for the lowest values to the color for the highest values. The available gradients are described below.	HOT
GRADIENT_BLACK_TO_WHITE	Produces a gradient from black (low) to white (high)
GRADIENT_BLUE_TO_RED	Produces a gradient from blue (low) to red (high)
GRADIENT_HEAT	Produces a gradient using the colors black, brown, orange, and white
GRADIENT_HOT	Produces a gradient using the colors black, red, orange, yellow, and white
GRADIENT_MAROON_TO_GOLD	Produces a gradient from maroon (low) to gold (high)
GRADIENT_RAINBOW	Produces a gradient with the colors violet, blue, green, yellow, orange, and red
GRADIENT_RED_TO_GREEN	Produces a gradient from red (low) to green (high)
GRADIENT_ROY	Produces a gradient through red, orange, and yellow
Data Type	Determines the type of data displayed: the raw input data, the hyperbolic tangent (TANH) of the input data, the Pearson correlation of the input data, or the Spearman correlation of the input data.	TANH

F. TAD Annotation
The description of the display controls on the display window for TAD annotation is given below.

Field	Description	Default
Load TAD file	Browse to and load a .bed format file containing the TADs identified for the input matrix	NA
Identified TAD	Lists the TADs in the loaded file	NA
Show TAD on Heatmap	Marks the boundaries of the identified TADs on the displayed heatmap	NA
Display Multiple TADs	When checked, allows TADs from different methods to be overlaid on the same display window. This is useful for comparing the TADs identified by different methods for a dataset	unchecked
Choose Display Color	Choose the color for the TAD boundary marks	Color 1
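To mark TADs on the heatmap, BED coordinates have to be converted to the bin indices used on the axes. The sketch below assumes standard BED semantics (0-based start, exclusive end) and clips each TAD to the current X min/X max window; the function names are hypothetical and this is not GenomeFlow's drawing code.

```python
def read_tads(lines):
    """Read (chrom, start, end) from BED lines; extra columns are ignored."""
    tads = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        chrom, start, end = line.split()[:3]
        tads.append((chrom, int(start), int(end)))
    return tads

def tads_to_bins(tads, resolution, bin_min, bin_max):
    """Convert TADs to (first_bin, last_bin) pairs clipped to the visible window."""
    boxes = []
    for _, start, end in tads:
        first = start // resolution
        last = (end - 1) // resolution  # BED end is exclusive
        if last >= bin_min and first <= bin_max:
            boxes.append((max(first, bin_min), min(last, bin_max)))
    return boxes

tads = read_tads(["chr17\t20000000\t20400000", "chr17\t40000000\t40800000"])
print(tads_to_bins(tads, 40000, 500, 700))
```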

G. Demonstration
The figure below shows the TAD annotation for the TADs identified by two TAD identification algorithms (ClusterTAD and DI) for mESC chromosome 17 data from the Ren Lab.
Step 1:
(1) Load a sample square matrix as the input contact file. The example file can be found here: /executable/sample_data/contact_matrices/square_matrices/mESC_nij.chr17
The resolution of this square matrix is 40000.
(2) Load the contact file as described in Visualizing Dataset in 2D format.
Step 2:
Modify the highlighted fields on the display window. The table below shows the values set for each field in the display control.

Field	Value
Draw Title	checked
Draw Legend	checked
Draw X-Axis Title	checked
Draw X-Axis Ticks	checked
Draw Y-Axis Title	checked
Draw Y-Axis Ticks	checked
Heatmap Direction(Left/Right)	checked
Enable Zoom Mode	unchecked
`Is SquareMatrix?(Input contact file)`	`checked`
`Specify Resolution`	`40000`
`Input contact file`	`Path/to/chr17/inputfile`
Title	HeatMap Display
X-Axis Title	Number of Bins
Y-Axis Title	Number of Bins
`X min`	`500`
`X max`	`700`
`Y min`	`500`
`Y max`	`700`
X min [Genome Location Equivalent]	20000000
X max [Genome Location Equivalent]	28000000
Y min [Genome Location Equivalent]	20000000
Y max [Genome Location Equivalent]	28000000
Number of Units detected	2382
Number of Missing Units	0
Resolution detected	40000
Initial Start Position	0
Initial End Position	95240000
Gradient	HOT
Data Type	TANH
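The Genome Location Equivalent values in this demonstration are consistent with multiplying a bin index by the resolution: 500 × 40000 = 20000000 and 700 × 40000 = 28000000. Likewise, the Initial End Position equals (2382 - 1) × 40000 = 95240000, the start of the last of the 2382 bins. The small helper below reproduces that arithmetic; treating the Initial Start Position as an additive offset is an assumption, since it is 0 here.

```python
def tick_to_genome(tick, resolution, start=0):
    """Genomic position (bp) of a bin index; the start offset is an assumption."""
    return start + tick * resolution

def end_position(start, n_units, resolution):
    """Start position of the last bin, matching Initial End Position."""
    return start + (n_units - 1) * resolution

print(tick_to_genome(500, 40000), tick_to_genome(700, 40000))
print(end_position(0, 2382, 40000))
```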

Step 3:

Browse to and load the ClusterTAD file found here:
ClusterTAD: /executable/sample_data/TAD_annotation/mESC_TAD_bed/ClusterTAD/chr17.bed
Select a unique display color from Color 1 to Color 4 (e.g., Color 1 for ClusterTAD and Color 2 for DI).
Click the Show TAD on Heatmap button.

Step 4:
To display multiple TADs on the heatmap:
- Check Display Multiple TADs, then
- Repeat Step 3 with the DI file found here: /executable/sample_data/TAD_annotation/mESC_TAD_bed/DI/chr17.bed

Demonstration of TAD Annotation on 2D Heatmap

Identify TAD

A. Purpose
To identify topologically associating domains (TADs) from an input contact matrix.

B. Input
An input file in square matrix format or sparse matrix format. An example sparse matrix file can be found here: /executable/sample_data/contact_matrices/chr11_10kb_gm12878_list_125mb_135mb.txt

C. Output
The TADs with the best quality are written in .bed format to a file whose name is prefixed with BestTAD_. This file can be found here: /Selected_output_directory_from_GUI/Output/TADs/.

D. Running
Access the function from the menu toolbar: 2D-Functions/Identify TAD.

E. Identify TAD GUI

Field	Description	Default
Input contact file	An input file in either of the formats described above	NA
Output folder	Directory to which the identified TADs are written	NA
Is SquareMatrix?(Input contact file)	Allows the user to specify whether the input is a square matrix (a full matrix) or a sparse matrix. If checked, it displays a textbox for the user to specify the matrix resolution.	unchecked
Data Resolution	Visible only if Is SquareMatrix? is checked. Allows the user to specify the resolution of the input matrix.	40000
Chromosome (optional)	Allows the user to specify the chromosome of the input data	X
Run ClusterTAD Algorithm	Runs ClusterTAD, the default algorithm used for TAD identification from the input contact matrix	checked
Run	Starts the identification process. A progress bar is displayed to show the steps taken by the TAD identification algorithm.	NA
Stop	Stops the program while the identification is running.	NA

Check TAD consistency between two TADs from different methods

A. Purpose
To compare the TADs identified by two different topologically associating domain (TAD) identification methods.

B. Input
Two files containing TADs in .bed format. The method whose TAD consistency is to be checked is termed Method-1, and the method it is compared with is termed Method-2. Choose the same chromosome for both methods.
For example, to compare the TADs from ClusterTAD with those from DI for chromosome 17:
Method-1 = /executable/sample_data/TAD_annotation/mESC_TAD_bed/ClusterTAD/chr17.bed
Method-2 = /executable/sample_data/TAD_annotation/mESC_TAD_bed/DI/chr17.bed

C. Output
A report of the consistency of the Method-1 with Method-2. The output reports the following cases:

Field	Description
Case 1	The number of exact TADs found in both Method-1 and Method-2
Case 2	The number of sub-TADs between Method-1 and Method-2
Case 3	The number of conflicting TADs
Case 4	The number of TADs in Method-1 that are not found in Method-2
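The four cases can be illustrated with interval comparisons on bins. The rules in the sketch below (exact: identical start and end bins; sub-TAD: nested inside, or containing, every overlapping Method-2 TAD; conflicting: any partial overlap; not found: no overlap) are a hypothetical reading of the case names, not GenomeFlow's documented definitions. TADs are (start, end) pairs in bp, snapped to bins using the Data Resolution.

```python
def classify_tads(method1, method2, resolution):
    """Count Method-1 TADs per case against Method-2 (hypothetical rules)."""
    def to_bins(tads):
        # BED-style exclusive end, so the last bin is (end - 1) // resolution.
        return [(s // resolution, (e - 1) // resolution) for s, e in tads]

    m2 = to_bins(method2)
    counts = {"exact": 0, "sub_tad": 0, "conflicting": 0, "not_found": 0}
    for s1, e1 in to_bins(method1):
        overlapping = [(s2, e2) for s2, e2 in m2 if s2 <= e1 and s1 <= e2]
        if not overlapping:
            counts["not_found"] += 1            # Case 4
        elif (s1, e1) in overlapping:
            counts["exact"] += 1                # Case 1
        elif all((s2 <= s1 and e1 <= e2) or (s1 <= s2 and e2 <= e1)
                 for s2, e2 in overlapping):
            counts["sub_tad"] += 1              # Case 2: nested either way
        else:
            counts["conflicting"] += 1          # Case 3: partial overlap
    return counts

method1 = [(0, 400000), (400000, 600000), (700000, 1000000), (2000000, 2400000)]
method2 = [(0, 400000), (400000, 800000)]
print(classify_tads(method1, method2, 40000))
```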

D. Running
Access the function from the menu toolbar: 2D-Functions/Check TAD Consistency.

E. Check TAD consistency GUI

Field	Description	Default
Input Method-1 TAD file(.bed)	Browse the .bed format file containing the TADs identified by Method-1	NA
Input Method-2 TAD file(.bed)	Browse the .bed format file containing the TADs identified by Method-2	NA
Data Resolution	The resolution of the dataset from which the TADs were identified.	40000
Output folder	Directory to output the comparison report	NA
Create Report	Starts the consistency check. A progress bar is displayed to show the steps taken by the check.	NA
Stop	Stops the program while the check is running.	NA
