Dependencies

bedtools: bedtools
pysam: pip install pysam
Run on linux

Generating strain-specific manifest with given vcf, manifest, and reference genome

python strain_specific_manifest.py -h

usage: python strain_specific_manifest.py vcf manifest outdir  

Manifest file should contain columns of chr, start, end, M,U,target, col,Probe_ID, mapFlag,mask     
it can be generated by:
zcat MM285.mm10.manifest.tsv.gz | sed '1d' | cut -f 1-6,8-9,11,31 > test/manifest.bed

For vcf file, there must be one line starts with #CHROM

generate strain-specific manifests for SeSAMe

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         Display version
  -vcf <file>, --vcf <file>
                        input vcf
  -mft <file>, --mft <file>
                        Input manifest, a bed file contains columns of chr,start,end, M,U,target,col,Probe_ID, mapFlag,mask,
                        but should not contain header.
  -ref <file>, --Reference <file>
                        reference genome, .fa
  -o <outdir>, --outdir <outdir>
                        output directory

example usage: strain_specific_manifest.py -vcf mgp.v5.merged.snps_all.dbSNP142.vcf.gz -mft MM285.mm10.manifest.bed -ref /mnt/isilon/zhou_lab/projects/20191221_references/mm10/mm10.fa -o test_dir

Please Note:

If the chromosome names in the input vcf, manifest and reference genome were not consistent, the script would handle it automatically by adding or removing 'chr' or replace chrM to chrMT or chrMT to chrM.

If the output of bedtools had already existed under outdir, the process of running bedtools would be skipped.

Demo

mkdir Demo
zcat /mnt/isilon/zhoulab/labprojects/20200228_Mouse_Array_Project/20210324_SeSAme_Wubin/data/strainSpecificManifest/MM285.mm10.manifest.tsv.gz | sed '1d' | cut -f 1-6,8-9,11,31 > Demo/MM285.mm10.manifest.bed
python strain_specific_manifest.py -vcf /mnt/isilon/zhoulab/labprojects/20200228_Mouse_Array_Project/20210324_SeSAme_Wubin/data/strainSpecificManifest/mgp.v5.merged.snps_all.dbSNP142.vcf.gz -mft Demo/MM285.mm10.manifest.bed -ref /mnt/isilon/zhou_lab/projects/20191221_references/mm10/mm10.fa -o Demo

All files would be generated under outdir [Demo]

Demo
├── chr1.bed
├── manifest.6bp.bed
├── manifest.6bp.sorted.bed
├── manifest.checked.bed
├── manifests
│   ├── 129P2_OlaHsd.txt
│   ├── 129S1_SvImJ.txt
│   ├── 129S5SvEvBrd.txt
│   ├── A_J.txt
│   ├── AKR_J.txt
│   ├── BALB_cJ.txt
│   ├── BTBR_T+_Itpr3tf_J.txt
│   ├── BUB_BnJ.txt
│   ├── C3H_HeH.txt
│   ├── C3H_HeJ.txt
│   ├── C57BL_10J.txt
│   ├── C57BL_6NJ.txt
│   ├── C57BR_cdJ.txt
│   ├── C57L_J.txt
│   ├── C58_J.txt
│   ├── CAST_EiJ.txt
│   ├── CBA_J.txt
│   ├── DBA_1J.txt
│   ├── DBA_2J.txt
│   ├── FVB_NJ.txt
│   ├── I_LnJ.txt
│   ├── KK_HiJ.txt
│   ├── LEWES_EiJ.txt
│   ├── LP_J.txt
│   ├── MOLF_EiJ.txt
│   ├── NOD_ShiLtJ.txt
│   ├── NZB_B1NJ.txt
│   ├── NZO_HlLtJ.txt
│   ├── NZW_LacJ.txt
│   ├── PWK_PhJ.txt
│   ├── RF_J.txt
│   ├── SEA_GnJ.txt
│   ├── SPRET_EiJ.txt
│   ├── ST_bJ.txt
│   ├── WSB_EiJ.txt
│   └── ZALENDE_EiJ.txt
├── manifest.var.overlapped.annotated.txt
├── manifest.var.overlapped.bed
├── manifest.var.overlapped.filtered.txt
├── MM285.mm10.manifest.bed
└── strain_specific_snp.txt

head Demo/manifests/A_J.txt

Probe_ID        M       U       col     mask
cg36602742_TC11 28608858.0      20700992        R       FALSE
cg36602743_TC21         55772828                FALSE
cg36602902_BC11 90750550.0      93783238        R       FALSE
cg36603287_TC21         13783923                FALSE
cg36603393_BC21         74727392                FALSE
cg36603791_TC21         90692186                FALSE
cg36603848_TC21         26796263                FALSE
cg36604001_TC21         86700295                FALSE
cg36604489_TC21         58682505                FALSE

generate MM285.addressStrain in R if necessary

strain_manifests_dir="Demo/manifests"
library(sesame)
library(stringr)
manifest=sesameDataGet("MM285.address")
addr <- list()
strains <- list()
for(name in list.files(path=strain_manifests_dir)) {
  print(name)
  strain <- str_replace(name,".txt",'')
  print(strain)
  mft <- read.csv(paste(strain_manifests_dir,name,sep='/'),sep='\t',header=T,check.names = F)
  mft <- mft[match(mft$Probe_ID,manifest$ordering$Probe_ID),]
  mft[mft$col=='','col'] <- NA
  L <- list(col=as.factor(mft$col),mask=mft$mask,Strain=strain)
  strains[[strain]] <- L
}
addr$ordering <- manifest$ordering
addr$strains <- strains
length(names(addr$strains))
class(addr$strains$PWK_PhJ$mask)
saveRDS(addr, file = "/Demo/MM285.addressStrain.rds")

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README.md		README.md
strain_specific_manifest.py		strain_specific_manifest.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Dependencies

Generating strain-specific manifest with given vcf, manifest, and reference genome

Demo

generate MM285.addressStrain in R if necessary

About

Uh oh!

Releases

Packages

Languages

zhou-lab/InfiniumManifestAnnotator

Folders and files

Latest commit

History

Repository files navigation

Dependencies

Generating strain-specific manifest with given vcf, manifest, and reference genome

Demo

generate MM285.addressStrain in R if necessary

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages