
johncthomas/jtmethtools


Methylation data tools. Primarily classes and functions to be built upon by other Python tools. Includes Arrow-based tables for efficiently storing and processing Bismark BAMs, and a module for producing pile-up images of regions for CNNs.

Classes

Explore alignments.Alignment and classes.*

Convert a BAM to parquet tables

Script: jtm-write-alignment-data

Outputs two tables plus metadata: one table with locus-level information (nucleotide, individual CpG, etc.) and one with read-level information.

To load the data in R:

library(arrow)
library(jsonlite)

locusTable <- read_parquet("dataset/locus-table.parquet")
readTable <- read_parquet("dataset/read-table.parquet")
metadata <- fromJSON("dataset/metadata.json")
# [[ ]] extracts the element itself; single [ ] would return a sub-list
chrm_ids <- metadata[["locus"]][["chrm_map"]]

Most text data is encoded into integers. These mappings are recorded in the metadata.
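As a rough illustration of what decoding involves, the sketch below inverts a name-to-integer mapping and applies it to a column of codes. The mapping contents, its direction (name to code) and the variable names are assumptions made for the example; check the metadata file of your own dataset for the real structure.

```python
def build_decoder(name_to_code):
    """Invert a {name: integer} mapping into {integer: name}."""
    return {code: name for name, code in name_to_code.items()}


# Hypothetical mapping, standing in for something like the chrm_map
# entry of metadata.json; the real keys and direction may differ.
chrm_map = {"chr1": 0, "chr2": 1, "chrX": 2}

decoder = build_decoder(chrm_map)

# Hypothetical integer-encoded chromosome column from the locus table.
encoded = [0, 0, 2, 1]
decoded = [decoder[code] for code in encoded]
```

If the metadata already stores the mapping as integer to name, the inversion step is unnecessary and the dictionary can be used directly.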

Images for CNN

2D pileups, stored as binary arrays with values between 0 and 1. Each layer represents a sequence feature, such as methylation state, mapping quality or nucleotide sequence.

Paired-end BAMs should be sorted by query name (preferably) or coordinate (may take more memory). Unsorted BAMs will probably work but use a lot of memory.
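One way to check how a BAM is sorted before generating images is to look at the SO field of the @HD header line, which the SAM specification defines as one of unknown, unsorted, queryname or coordinate. The helper below is a minimal sketch that works on a header already parsed into a dictionary; the function name is hypothetical and not part of jtmethtools.

```python
def sort_order(header):
    """Return the declared sort order from a parsed SAM/BAM header dict.

    Falls back to "unknown", the SAM specification's default when the
    @HD line or its SO field is missing.
    """
    return header.get("HD", {}).get("SO", "unknown")


# Example header dictionaries (hypothetical, hand-written for illustration).
by_name = {"HD": {"VN": "1.6", "SO": "queryname"}}
by_coord = {"HD": {"VN": "1.6", "SO": "coordinate"}}
no_hd_line = {"SQ": [{"SN": "chr1", "LN": 1000}]}

orders = [sort_order(h) for h in (by_name, by_coord, no_hd_line)]
```

Assuming pysam is installed, a dictionary of this shape can be obtained with pysam.AlignmentFile(path).header.to_dict(). Note that the header only records the declared order, so a file whose header was edited or lost may not match.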

Generation

After installation, run jtm-generate-images run --help to see the available arguments. The layers that can be passed to the --layer option can be printed with jtm-generate-images layers. The run subcommand produces gzipped TAR files, each containing the binary array and a metadata JSON file that specifies the shape of the array, among other things.

jtm-generate-images invokes the script generate_images.py.
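To make the archive layout concrete, here is a standard-library sketch of the general pattern: raw array bytes and a JSON file recording the shape, bundled into a gzipped TAR. The member names (array.bin, metadata.json), the typecode field and the float32 encoding are all assumptions for illustration and are not taken from jtmethtools; for real output files use jtm.images.read_array.

```python
import io
import json
import tarfile
from array import array


def write_archive(values, shape, typecode="f"):
    """Pack flat values and their shape into an in-memory .tar.gz."""
    payload = array(typecode, values).tobytes()
    meta = json.dumps({"shape": shape, "typecode": typecode}).encode()
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        # Hypothetical member names, chosen for this example only.
        for name, data in (("array.bin", payload), ("metadata.json", meta)):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_archive(blob):
    """Read the archive back into a list of rows plus the metadata dict."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        meta = json.load(tar.extractfile("metadata.json"))
        raw = tar.extractfile("array.bin").read()
    flat = array(meta["typecode"])
    flat.frombytes(raw)
    n_rows, n_cols = meta["shape"]
    # The shape from the metadata is what turns flat bytes into a 2D pileup.
    rows = [list(flat[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]
    return rows, meta


blob = write_archive([0.0, 0.5, 1.0, 0.25], shape=[2, 2])
rows, meta = read_archive(blob)
```

The point of the sketch is that the raw bytes carry no dimensions of their own, which is why the shape has to travel alongside them in the metadata. In practice NumPy would be the usual choice for holding the array.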

Arrays

import jtmethtools as jtm
fn = 'image.region_name.layer.tar.gz'
array, metadata = jtm.images.read_array(fn)

import matplotlib.pyplot as plt
plt.imshow(array, interpolation='nearest', cmap='gray')
