2D Functions
A. Purpose
To create a binary .hic file from a text file describing mapped Hi-C reads. The .hic file contains contact matrices at different resolutions, normalized by different methods.
B. Test Data
Five input formats are accepted: short format, short format with score, medium format, long format, and 4DN DCIC format. These formats are described here.
A sample file is executable/sample_data/GSM1551688_HIC143_merged_nodups.zip (unzip it before use).
Another set of test data is the GM06990 cell line data, which can be downloaded as follows:
- GM06990 cell line:
  - Bowtie2:
    - Go to http://sysbio.rnet.missouri.edu/bdm_download/GenomeFlow/GM06990/
    - Download and save the GenomeFlow_formatted.bowtie2.input file.
  - Bwa:
    - Go to http://sysbio.rnet.missouri.edu/bdm_download/GenomeFlow/GM06990/
    - Download and save the GenomeFlow_formatted.bwa.input file.
C. Output
A binary .hic file containing contact matrices
D. Running
Access the function from the menu toolbar: 2D-Functions/Convert to HiC
E. Convert mapped Hi-C reads to .hic file GUI
Field | Description | Default |
---|---|---|
Input file | A text file that contains the mapped Hi-C reads (format described above) | NA |
Genome ID | Genome version of the Hi-C data | hg19 |
Genome ID (If Not listed above) | Enter a unique Genome ID for the data if it is not listed in the dropdown above. | NA |
Output Directory | Directory in which the generated .hic file is written. An example output filename is GenomeFlow_Convert_1521343280452.hic | NA |
Contact Threshold | Threshold on the number of interactions for a contact to be used in creating contact matrices. | 0 |
MAPQ Score Threshold | Mapping quality score threshold for reads to be considered in creating contact matrices. | 0 |
Chromosomes | Chromosomes for which contact matrices are created, separated by commas (,). When left blank, all chromosomes are considered. | All (when left blank) |
Resolutions | Comma-separated (,) list of resolutions (in bp) of the contact matrices to be created. | 2500000, 1000000, 500000, 250000, 100000, 50000, 25000, 10000, 5000 |
Restriction Site File | Each line starts with a chromosome number followed by positions of restriction sites on that chromosome, in numeric order, and ending with the size of the chromosome. When provided, 8 additional fragment-delimited resolutions are added: 500f, 250f, 100f, 50f, 20f, 5f, 2f, 1f | blank |
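The binning controlled by the Resolutions, MAPQ Score Threshold, and Restriction Site File settings can be sketched as follows. This is a minimal Python illustration, not GenomeFlow's implementation. Several details are assumptions: the read-pair tuple layout (pos1, pos2, mapq1, mapq2), the helper names, the rule that a pair is dropped when either read's MAPQ is below the threshold, and the convention that a position equal to a restriction site starts the next fragment. The Contact Threshold is not modeled.

```python
from bisect import bisect_right

def parse_restriction_line(line):
    """Parse one restriction-site line: chromosome, sorted site positions, chromosome size."""
    fields = line.split()
    values = [int(x) for x in fields[1:]]
    # The last value on the line is the chromosome size; the others are cut sites.
    return fields[0], values[:-1], values[-1]

def bp_bin(pos, resolution):
    """Index of the base-pair bin that contains pos at the given resolution."""
    return pos // resolution

def fragment_index(pos, sites):
    """Index of the restriction fragment containing pos (assumed convention:
    fragment 0 covers positions before sites[0])."""
    return bisect_right(sites, pos)

def bin_contacts(pairs, resolution, mapq_threshold=0):
    """Count intra-chromosomal contacts per (bin1, bin2), with bin1 <= bin2.

    pairs: iterable of hypothetical (pos1, pos2, mapq1, mapq2) tuples.
    """
    counts = {}
    for pos1, pos2, mapq1, mapq2 in pairs:
        # Assumed filter: skip the pair if either read is below the MAPQ threshold.
        if mapq1 < mapq_threshold or mapq2 < mapq_threshold:
            continue
        i, j = sorted((bp_bin(pos1, resolution), bp_bin(pos2, resolution)))
        counts[(i, j)] = counts.get((i, j), 0) + 1
    return counts

if __name__ == "__main__":
    chrom, sites, size = parse_restriction_line("1 100 250 400 1000")
    print(chrom, sites, size, fragment_index(300, sites))
    pairs = [(10, 2500, 30, 30), (2600, 90, 30, 30), (10, 20, 5, 30)]
    print(bin_contacts(pairs, resolution=1000, mapq_threshold=10))
```

Each additional resolution in the list corresponds to another call with a different bin size. Fragment-delimited resolutions such as 5f would group fragment indices rather than base-pair positions.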
A. Purpose
To extract a contact matrix from a .hic file and save it to a text file in sparse matrix format.
B. Input
A local path or an online link to a .hic file. Example link: https://www.encodeproject.org/files/ENCFF219YOB/@@download/ENCFF219YOB.hic
C. Output
A contact matrix in sparse matrix format
D. Running
Access the function from the menu toolbar: 2D-Functions/Extract HiC
E. Extract contact matrices from a hic file GUI
Field | Description | Default |
---|---|---|
Path to .hic File | An online link or local path to a hic format file | NA |
Load | Click this button to fetch information from the header of the hic file. | NA |
Genome | Genome version of the hic file | NA |
Chromosomes | List of chromosomes in the .hic file | NA |
From | Start of the region whose contact matrix is extracted. When From and To are left blank, the whole chromosome is considered. | Blank |
To | End of the region whose contact matrix is extracted. When From and To are left blank, the whole chromosome is considered. | Blank |
Resolution | List of resolutions of contact matrices in the hic file | NA |
Normalization | List of normalization methods used to normalize contact matrices. None means unnormalized. | NA |
Output Directory | Directory in which the extracted data is written. An example filename for the generated file is GenomeFlow_Extract_hg19_chr_1_res_BP_2500000_norm_None.txt | |
Extract Contact Data | Click this button to initiate extracting contact data | NA |
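The effect of the From and To fields on an extracted sparse matrix can be sketched in Python as follows. This is a hedged illustration with hypothetical helper names. It assumes each line of the sparse file holds whitespace-separated "position_i position_j value" fields. It also assumes that a contact is kept only when both positions fall inside [From, To]. GenomeFlow's exact file layout and boundary handling may differ.

```python
def read_sparse(lines):
    """Parse 'pos_i pos_j value' lines into (int, int, float) triples, skipping blanks and comments."""
    triples = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pos1, pos2, value = line.split()[:3]
        triples.append((int(pos1), int(pos2), float(value)))
    return triples

def extract_region(triples, start=None, end=None):
    """Keep contacts whose two positions lie in [start, end]; no bounds means the whole chromosome."""
    if start is None and end is None:
        return list(triples)
    lo = start if start is not None else float("-inf")
    hi = end if end is not None else float("inf")
    return [t for t in triples if lo <= t[0] <= hi and lo <= t[1] <= hi]
```

For a file on disk, `read_sparse(open(path))` works because a file object iterates over its lines.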
A. Purpose
To normalize an unnormalized contact matrix in sparse matrix format.
B. Input
A contact matrix in sparse matrix format.
C. Output
A normalized contact matrix in sparse matrix format. The matrix is normalized by the Iterative Correction and Eigenvector decomposition (ICE) method.
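The iterative-correction (matrix balancing) part of ICE can be illustrated with a minimal pure-Python sketch on a small dense symmetric matrix. Each pass computes the coverage (row sum) of every bin and scales it to a mean of 1. It then divides each entry W[i][j] by coverage[i] * coverage[j] and multiplies the scales into a per-bin bias vector, until all coverages are equal. The function name and the max_iter and tol parameters are hypothetical. GenomeFlow's own implementation may differ, for example in how it handles sparse input, filters low-coverage bins, or decides to stop.

```python
def ice_normalize(matrix, max_iter=200, tol=1e-6):
    """Iteratively correct a dense symmetric matrix; return (corrected matrix, biases)."""
    n = len(matrix)
    w = [[float(x) for x in row] for row in matrix]
    bias = [1.0] * n
    for _ in range(max_iter):
        coverage = [sum(row) for row in w]
        nonzero = [c for c in coverage if c > 0]
        if not nonzero:
            break
        mean = sum(nonzero) / len(nonzero)
        # Coverage scaled to mean 1; empty bins are left untouched (scale 1).
        scale = [c / mean if c > 0 else 1.0 for c in coverage]
        for i in range(n):
            for j in range(n):
                w[i][j] /= scale[i] * scale[j]
        for i in range(n):
            bias[i] *= scale[i]
        # Converged when every bin already had (almost) the mean coverage.
        if max(abs(s - 1.0) for s in scale) < tol:
            break
    return w, bias
```

The original matrix can be recovered as corrected[i][j] * bias[i] * bias[j], which is why balancing tools usually store only the bias vector.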
D. Running
Access the function from the menu toolbar: 2D-Functions/Normalized HiC Data
E. Normalize unnormalized HiC contact matrices GUI
A. Purpose
To create a two dimensional (2D) graphical representation of a contact matrix from an input file.
B. Input
A contact matrix in sparse matrix format or square matrix format. Check the Is Square Matrix? box if the input is a square matrix.
- An example sparse matrix file can be found here:
/executable/sample_data/contact_matrices/chr11_10kb_gm12878_list_125mb_135mb.txt
- Examples of square matrix files can be found here:
/executable/sample_data/contact_matrices/square_matrices/
Note: the resolution of the sample square matrices is 40000.
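A square matrix file carries no coordinates, so the resolution is what maps row and column indices to genomic positions. The following minimal Python sketch shows that conversion. It assumes whitespace-separated rows, a symmetric matrix whose first bin starts at position 0, and a sparse layout of (position_i, position_j, value). The helper names are hypothetical.

```python
def read_square(lines):
    """Parse whitespace-separated rows of a square matrix into lists of floats."""
    return [[float(x) for x in line.split()] for line in lines if line.strip()]

def square_to_sparse(rows, resolution=40000):
    """Convert a symmetric dense matrix into (pos_i, pos_j, value) triples.

    Only the upper triangle (j >= i) and non-zero entries are kept; bin k is
    assumed to start at genomic position k * resolution.
    """
    triples = []
    for i, row in enumerate(rows):
        for j in range(i, len(row)):
            if row[j] != 0:
                triples.append((i * resolution, j * resolution, row[j]))
    return triples
```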
C. Output
A heatmap: a graphical representation of contact data in which the numeric values of the input contact matrix are represented as colors according to the selected color gradient.
D. Running
Access the function from the menu toolbar: 2D-Functions/Visualize Dataset.
E. Visualization GUI
Field | Description | Default |
---|---|---|
Draw Title | Shows or hides the heatmap title | checked |
Draw Legend | Shows or hides the color legend | checked |
Draw X-Axis Title | Shows or hides the X-axis title label on the 2D display window | checked |
Draw X-Axis Ticks | Shows or hides the X-axis tick labels on the 2D display window | checked |
Draw Y-Axis Title | Shows or hides the Y-axis title label on the 2D display window | checked |
Draw Y-Axis Ticks | Shows or hides the Y-axis tick labels on the 2D display window | checked |
Heatmap Direction(Left/Right) | Changes the Y-axis origin of the heatmap matrix from the Bottom-Left to Top-Left and vice versa | checked |
Enable Zoom Mode | Allows the user to zoom in/out of the heat map matrix | unchecked |
Save HeatMap | Saves the current state of the heatmap as a .png image | NA |
Is Square Matrix?(Input contact file) | Allows the user to specify whether the input is a square matrix (a full matrix) or a sparse matrix. If checked, a textbox is displayed for the user to specify the matrix resolution. | unchecked |
Specify Resolution | Visible only if Is SquareMatrix? is checked. Allows the user to specify the resolution of the input matrix. | NA |
Input contact file | A text file containing a contact matrix in any of the formats described above. | NA |
Title | Allows user to specify the title of the heatmap | Heatmap Display |
X-Axis Title | Allows user to specify the X-Axis title for the heatmap | Genome bin Resolution (bp) |
Y-Axis Title | Allows user to specify the Y-Axis title for the heatmap | Genome bin Resolution (bp) |
X min | Allows the user to specify the minimum X-axis tick for the heatmap | 0 |
X max | Allows the user to specify the maximum X-axis Tick for the heatmap | 200 |
Y min | Allows the user to specify the minimum Y-axis Tick for the heatmap | 0 |
Y max | Allows the user to specify the maximum Y-axis Tick for the heatmap | 200 |
X min [Genome Location Equivalent] | Shows the genomic position equivalent for the minimum X-axis Tick for the heatmap | 0 |
X max [Genome Location Equivalent] | Shows the genomic position equivalent for the maximum X-axis Tick for the heatmap | 8000000 |
Y min [Genome Location Equivalent] | Shows the genomic position equivalent for the minimum Y-axis Tick for the heatmap | 0 |
Y max [Genome Location Equivalent] | Shows the genomic position equivalent for the maximum Y-axis Tick for the heatmap | 8000000 |
Number of Units detected | Shows the number of regions found in the input matrix | 200 |
Number of Missing Units | Shows the number of gaps or missing regions noted from the input matrix | 0 |
Resolution detected | Displays the resolution of the input matrix | 40000 |
Initial Start Position | Shows the minimum genome position observed from the input matrix | 0 |
Initial End Position | Shows the maximum genome position observed from the input matrix | 8000000 |
Gradient | A set of colors used to map values to colors, running from a low-end color to a high-end color (some gradients pass through intermediate colors). The available gradients are explained below | HOT |
GRADIENT_BLACK_TO_WHITE | Produces a gradient from black (low) to white (high) | |
GRADIENT_BLUE_TO_RED | Produces a gradient from blue (low) to red (high) | |
GRADIENT_HEAT | Produces a gradient using the colors black, brown, orange, white | |
GRADIENT_HOT | Produces a gradient using the colors black, red, orange, and yellow to white | |
GRADIENT_MAROON_TO_GOLD | Produces a gradient from maroon (low) to gold (high) | |
GRADIENT_RAINBOW | Produces a gradient with the colors violet, blue, green, yellow, orange, and red | |
GRADIENT_RED_TO_GREEN | Produces a gradient from red (low) to green (high) | |
GRADIENT_ROY | Produces a gradient through red, orange, yellow | |
Data Type | Determines the type of data to be displayed. The types available are the raw input data, a TANH of input data, a Pearson correlation of input data, and a Spearman correlation of the input data. | TANH |
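The way a gradient turns matrix values into colors can be sketched in Python with linear interpolation between color stops. This is a hedged illustration, not GenomeFlow's rendering code. The RGB values chosen for the stops are assumptions. The tanh transform (tanh of each value divided by the matrix mean) is only one plausible reading of the TANH data type.

```python
import math

# Approximate RGB stops (assumed values) for two of the gradients listed above.
GRADIENT_BLUE_TO_RED = [(0, 0, 255), (255, 0, 0)]
GRADIENT_HOT = [(0, 0, 0), (255, 0, 0), (255, 165, 0), (255, 255, 0), (255, 255, 255)]

def interpolate_color(t, stops):
    """Map t in [0, 1] to an (R, G, B) color by linear interpolation between stops."""
    t = min(max(t, 0.0), 1.0)
    if len(stops) == 1:
        return stops[0]
    pos = t * (len(stops) - 1)
    k = min(int(pos), len(stops) - 2)  # index of the lower stop
    frac = pos - k
    lo, hi = stops[k], stops[k + 1]
    return tuple(round(a + (b - a) * frac) for a, b in zip(lo, hi))

def matrix_to_colors(matrix, stops, use_tanh=True):
    """Color every cell of a contact matrix; optionally compress values with tanh first."""
    values = [v for row in matrix for v in row]
    if use_tanh:
        # Hypothetical TANH transform: tanh of each value relative to the matrix mean.
        mean = (sum(values) / len(values)) or 1.0
        matrix = [[math.tanh(v / mean) for v in row] for row in matrix]
        values = [v for row in matrix for v in row]
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1.0  # avoid division by zero for a constant matrix
    return [[interpolate_color((v - vmin) / span, stops) for v in row] for row in matrix]
```

Compressing values before coloring keeps a few very strong near-diagonal contacts from washing out the rest of the heatmap.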
F. TAD Annotation
The description of the display controls on the display window for TAD annotation is given below.
Field | Description | Default |
---|---|---|
Load TAD file | Browse and load a .bed format file containing the TADs identified for the input matrix | NA |
Identified TAD | Lists the TADs in the loaded file | NA |
Show TAD on Heatmap | Marks the boundaries of the identified TADs on the displayed heatmap | |
Display Multiple TADs | When checked, allows TADs from different methods to be overlaid on the same display window. This is useful for comparing the TADs identified by different methods for a dataset | unchecked |
Choose Display Color | Choose the color for the TAD boundary marks | Color 1 |
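Marking TAD boundaries on the heatmap amounts to converting the genomic coordinates in the .bed file into bin indices. The following Python sketch shows one way to do this, with hypothetical helper names. It reads the first three BED columns (chromosome, start, end). It assumes half-open BED intervals, so the last bin of a TAD is (end - 1) // resolution.

```python
def read_tad_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples, skipping headers and blanks."""
    tads = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        tads.append((fields[0], int(fields[1]), int(fields[2])))
    return tads

def tad_bin_boundaries(tads, resolution=40000):
    """Convert each TAD to (first_bin, last_bin) on the heatmap (half-open BED ends assumed)."""
    return [(start // resolution, (end - 1) // resolution) for _, start, end in tads]
```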
G. Demonstration
The figure below shows the TAD annotation for the TADs identified by two TAD identification algorithms (ClusterTAD and DI) for mESC Chromosome 17 from Ren Lab.
Step 1:
To run this demonstration:
(1) Load a sample square matrix as the input contact file. The example file can be found here: /executable/sample_data/contact_matrices/square_matrices/mESC_nij.chr17 (resolution for the square matrix = 40000).
(2) Load the contact file as instructed here: Visualizing Dataset in 2D format
Step 2:
Modify the highlighted fields on the display window. The table below shows the values set for each field in the display control.
Field | Value |
---|---|
Draw Title | checked |
Draw Legend | checked |
Draw X-Axis Title | checked |
Draw X-Axis Ticks | checked |
Draw Y-Axis Title | checked |
Draw Y-Axis Ticks | checked |
Heatmap Direction(Left/Right) | checked |
Enable Zoom Mode | unchecked |
Is SquareMatrix?(Input contact file) | checked |
Specify Resolution | 40000 |
Input contact file | Path/to/chr17/inputfile |
Title | HeatMap Display |
X-Axis Title | Number of Bins |
Y-Axis Title | Number of Bins |
X min | 500 |
X max | 700 |
Y min | 500 |
Y max | 700 |
X min [Genome Location Equivalent] | 20000000 |
X max [Genome Location Equivalent] | 28000000 |
Y min [Genome Location Equivalent] | 20000000 |
Y max [Genome Location Equivalent] | 28000000 |
Number of Units detected | 2382 |
Number of Missing Units | 0 |
Resolution detected | 40000 |
Initial Start Position | 0 |
Initial End Position | 95240000 |
Gradient | HOT |
Data Type | TANH |
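The values in this table appear to follow a simple rule: each genome location equivalent is the tick (bin index) multiplied by the resolution. For example, 500 × 40000 = 20000000 and 700 × 40000 = 28000000. Likewise, the Initial End Position 95240000 is the start of the last of the 2382 bins, (2382 − 1) × 40000. A small Python sketch of this conversion (the function names are hypothetical):

```python
def tick_to_genome(tick, resolution):
    """Genomic start position (bp) of bin `tick` at the given resolution."""
    return tick * resolution

def genome_to_tick(position, resolution):
    """Bin index containing the genomic position `position` (bp)."""
    return position // resolution

resolution = 40000
print(tick_to_genome(500, resolution), tick_to_genome(700, resolution))  # 20000000 28000000
print(tick_to_genome(2382 - 1, resolution))  # 95240000
```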
Step 3:
- Browse and load the ClusterTAD file found here:
ClusterTAD: /executable/sample_data/TAD_annotation/mESC_TAD_bed/ClusterTAD/chr17.bed
- Select a unique color from Color 1 to Color 4 (e.g., Color 1 for ClusterTAD and Color 2 for DI).
- Click the Show TAD on Heatmap button.
Step 4:
To display multiple TADs on the heatmap:
- Check Display Multiple TADs, then
- Repeat Step 3 with the DI file found here: /executable/sample_data/TAD_annotation/mESC_TAD_bed/DI/chr17.bed
A. Purpose
To identify topologically associating domains (TADs) from an input contact matrix.
B. Input
An input file in square matrix format or sparse matrix format.
An example sparse matrix file can be found here:
/executable/sample_data/contact_matrices/chr11_10kb_gm12878_list_125mb_135mb.txt
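A sparse input such as the sample file above (which, judging by its name, covers a 125 Mb–135 Mb region) has to be placed on a bin grid before a square-matrix method can use it. The following minimal Python sketch builds that grid. It assumes (position_i, position_j, value) triples and indexes bins relative to the smallest position present. The function name is hypothetical.

```python
def sparse_to_square(triples, resolution):
    """Build a symmetric dense matrix from (pos_i, pos_j, value) triples.

    Bin 0 starts at the smallest position in the data, so a region that begins
    at, e.g., 125 Mb does not produce a matrix padded from position 0.
    """
    if not triples:
        return []
    origin = min(min(p1, p2) for p1, p2, _ in triples)
    last = max(max(p1, p2) for p1, p2, _ in triples)
    n = (last - origin) // resolution + 1
    matrix = [[0.0] * n for _ in range(n)]
    for p1, p2, value in triples:
        i, j = (p1 - origin) // resolution, (p2 - origin) // resolution
        matrix[i][j] = value
        matrix[j][i] = value  # mirror so the matrix is symmetric
    return matrix
```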
C. Output
The best-quality TADs are written in .bed format to a file whose name is prefixed with BestTAD_. This file can be found here: /Selected_output_directory_from_GUI/Output/TADs/
D. Running
Access the function from the menu toolbar: 2D-Functions/Identify TAD
E. Identify TAD GUI
Field | Description | Default |
---|---|---|
Input contact file | An input file in any of the formats described above | NA |
Output folder | Directory in which the identified TADs are written | NA |
Is SquareMatrix?(Input contact file) | Allows the user to specify whether the input is a square matrix (a full matrix) or a sparse matrix. If checked, a textbox is displayed for the user to specify the matrix resolution. | unchecked |
Data Resolution | Visible only if Is SquareMatrix? is checked. Allows the user to specify the resolution of the input matrix. | 40000 |
Chromosome (optional) | Allows the user to specify the chromosome of the input data | X |
Run ClusterTAD Algorithm | The default algorithm used for TAD identification from the input contact matrix | checked |
Run | Starts the identification process. A progress bar is displayed to show the steps taken by the TAD identification algorithm. | NA |
Stop | During the identification, if this button is pressed, the program will stop. | NA |
A. Purpose
To compare the TADs identified by two different topologically associating domain (TAD) identification methods.
B. Input
A file containing TADs in .bed format for each method. The method whose TAD consistency is to be checked is termed Method-1, and the method it is compared with is termed Method-2. Choose the same chromosome for both methods.
For example, to compare the TADs from ClusterTAD with those from DI for chromosome 17:
Method-1 = /executable/sample_data/TAD_annotation/mESC_TAD_bed/ClusterTAD/chr17.bed
Method-2 = /executable/sample_data/TAD_annotation/mESC_TAD_bed/DI/chr17.bed
C. Output
A report of the consistency of the Method-1 with Method-2. The output reports the following cases:
Field | Description |
---|---|
Case 1 | The number of Exact TADs found in both Method-1 and Method-2 |
Case 2 | The number of Sub-TADs that exist between Method-1 and Method-2 |
Case 3 | The number of Conflicting TADs. |
Case 4 | The number of TADs in Method-1 but not found in Method-2 |
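The four cases can be made concrete with the following Python sketch. The rules below are hypothetical readings of the case names, not GenomeFlow's documented definitions. TADs are treated as half-open (start, end) intervals on one chromosome. A Method-1 TAD is Case 1 if Method-2 has a TAD with identical boundaries, and Case 2 if it is nested in (or contains) an overlapping Method-2 TAD. It is Case 3 if it only partially overlaps Method-2 TADs, and Case 4 if it overlaps none. Any boundary tolerance tied to the Data Resolution is not modeled.

```python
def classify_tad(tad, others):
    """Classify one Method-1 TAD (start, end) against Method-2 TADs (hypothetical rules)."""
    start, end = tad
    overlapping = [(s, e) for s, e in others if s < end and start < e]
    if not overlapping:
        return "case4_not_found"
    if (start, end) in overlapping:
        return "case1_exact"
    # Nested in, or containing, an overlapping Method-2 TAD.
    if any((s <= start and end <= e) or (start <= s and e <= end) for s, e in overlapping):
        return "case2_sub_tad"
    return "case3_conflicting"

def consistency_report(method1, method2):
    """Count how many Method-1 TADs fall into each case."""
    report = {"case1_exact": 0, "case2_sub_tad": 0, "case3_conflicting": 0, "case4_not_found": 0}
    for tad in method1:
        report[classify_tad(tad, method2)] += 1
    return report
```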
D. Running
Access the function from the menu toolbar: 2D-Functions/Check TAD Consistency
E. Check TAD consistency GUI
Field | Description | Default |
---|---|---|
Input Method-1 TAD file(.bed) | Browse the .bed format file containing the TADs identified by Method-1 | NA |
Input Method-2 TAD file(.bed) | Browse the .bed format file containing the TADs identified by Method-2 | NA |
Data Resolution | The resolution of the dataset from which the TADs were identified. | 40000 |
Output folder | Directory to output the comparison report | NA |
Create Report | Once this button is pressed, a progress bar is displayed to show the steps of the consistency check. | NA |
Stop | During the check, if this button is pressed, the program will stop. | NA |