Commit d18d41d

first commit on Github
0 parents

72 files changed: 5372 additions, 0 deletions

.gitignore

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

.idea
work
.nextflow*
output

.gitlab-ci.yml

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
image: openjdk:11.0.10-jre-buster

before_script:
  - java -version
  - apt-get update && apt-get --assume-yes install wget make procps
  - wget -qO- https://get.nextflow.io | bash && cp nextflow /usr/local/bin/nextflow
  - nextflow help
  - wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
  - mkdir /root/.conda
  - bash Miniconda3-latest-Linux-x86_64.sh -b && cp /root/miniconda3/bin/* /usr/local/bin/
  - rm -f Miniconda3-latest-Linux-x86_64.sh
  - conda --version
  - conda env create -f environment.yml --name bam2tensor_env
  - conda init bash
  - source ~/.bashrc
  - conda activate bam2tensor_env


test_aligner:
  script:
    - python tests/test_aligner.py

test_aligner_utils:
  script:
    - python tests/test_aligner_utils.py

test_intermediate_matrices:
  script:
    - python tests/test_intermediary_matrices.py

#stages:
#  - test
#
#test:
#  stage: test
#  script:
#    - make

LICENSE

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
Copyright (c) 2023 TRON gGmbH

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

MANIFEST.in

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
graft src
graft tests
prune scripts

recursive-include docs/source *.py
recursive-include docs/source *.rst
include docs/Makefile

global-exclude *.py[cod] __pycache__ *.so *.dylib .DS_Store *.gpickle

exclude .bumpversion.cfg .flake8 .travis.yml
include README.rst LICENSE tox.ini

Makefile

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@

# "check" was listed as a prerequisite of "all" but no such target is defined,
# so it is dropped here; declare the remaining targets as phony.
.PHONY: all clean test

all : clean test

clean:
	rm -rf output
	rm -f .nextflow.log*
	rm -rf .nextflow*


test:
	nextflow main.nf --help
	nextflow main.nf -profile test,conda --output output/test1


README.md

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
# bam2tensor


Toolbox to convert BAM files into tensors

## Installation

Clone this repository, change into its directory, and install it in editable mode:

```
git clone https://github.com/TRON-Bioinformatics/bam2tensor.git
cd bam2tensor
pip install -e .
```

## Requirements

* Python 3.9+
* Packages listed in `environment.yml`
* The required libraries are declared in `setup.cfg` and are installed automatically when you install this package as shown above.
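
As a rough way to confirm that an environment matches the pins, the sketch below compares installed versions against a few of the pins from `environment.yml`. The `PINS` dictionary and the `installed_version` and `check_pins` helpers are illustrative only and are not part of bam2tensor.

```python
import sys
from importlib import metadata

# A few pins copied from environment.yml (an illustrative subset, not the full list).
PINS = {"torch": "2.0.1", "fire": "0.5.0", "numpy": "1.24.3", "pandas": "1.5.3"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    if sys.version_info < (3, 9):
        print("Python 3.9+ is required")
    for package, (expected, found) in check_pins(PINS).items():
        print(f"{package}: expected {expected}, found {found}")
```

The comparison is an exact string match, so a build whose version carries a local suffix may be reported as a mismatch even when it is compatible.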

## Running the Nextflow pipeline

### Usage:
```
nextflow run tron-bioinformatics/bam2tensor \
    -profile conda \
    --input_files input_files \
    --publish_dir out_dir \
    --reference genome_ref.fa \
    --window 150 \
    --max_coverage 500 \
    --read_length 50 \
    --max_mapq 60 \
    --max_baseq 82
```
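
When launching the pipeline from a script, it can help to assemble the same command programmatically. This is a minimal sketch that assumes only the parameters shown in the usage example; `build_command` is a hypothetical helper, not part of bam2tensor.

```python
import shlex


def build_command(params, profile="conda", pipeline="tron-bioinformatics/bam2tensor"):
    """Build the argv list for `nextflow run` from a dict of pipeline parameters."""
    argv = ["nextflow", "run", pipeline, "-profile", profile]
    for name, value in params.items():
        # Pipeline parameters take a double dash; Nextflow's own options take one.
        argv += [f"--{name}", str(value)]
    return argv


# Values copied from the usage example.
params = {
    "input_files": "input_files",
    "publish_dir": "out_dir",
    "reference": "genome_ref.fa",
    "window": 150,
    "max_coverage": 500,
    "read_length": 50,
    "max_mapq": 60,
    "max_baseq": 82,
}

argv = build_command(params)
print(shlex.join(argv))
# To execute it, pass the list to subprocess.run(argv, check=True).
```

Building a list rather than a single string avoids shell quoting problems when paths contain spaces.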

### Input:

* input_files: the path to a tab-separated values file with one sample per row. As in the example below, each row holds the sample name, the tumor BAM and its index, the normal BAM and its index, and a candidates TSV file

The input file does not have a header!

Example input file:

name1 tumor_bam1 tumor_bai1 normal_bam1 normal_bai1 candidates1.tsv

name2 tumor_bam2 tumor_bai2 normal_bam2 normal_bai2 candidates2.tsv
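
A minimal sketch for reading such a table, assuming the six tab-separated columns of the example in that order. The column names and the `read_samples` helper are made up for illustration; the pipeline itself may parse the file differently.

```python
import csv
import io

# Column names inferred from the example rows (an assumption: the file has no header).
COLUMNS = ["name", "tumor_bam", "tumor_bai", "normal_bam", "normal_bai", "candidates"]


def read_samples(handle):
    """Parse a headerless TSV into a list of dicts, one per sample."""
    samples = []
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # skip empty lines
        if len(row) != len(COLUMNS):
            raise ValueError(
                f"line {line_no}: expected {len(COLUMNS)} columns, got {len(row)}"
            )
        samples.append(dict(zip(COLUMNS, row)))
    return samples


# In-memory stand-in for the example input file.
example = io.StringIO(
    "name1\ttumor_bam1\ttumor_bai1\tnormal_bam1\tnormal_bai1\tcandidates1.tsv\n"
    "name2\ttumor_bam2\ttumor_bai2\tnormal_bam2\tnormal_bai2\tcandidates2.tsv\n"
)
samples = read_samples(example)
print(samples[0]["name"], samples[0]["candidates"])
```

For a file on disk, the same function can be called with `open(path, newline="")`.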

* reference: the reference genome FASTA file

* window: length of the window to be included around the variant

* max_coverage: maximum coverage value used to normalize coverage matrices

* read_length: the length of the majority of the reads in the BAM file

* max_mapq: maximum mapping quality used to normalize mapping quality matrices; values indicating unknown mapping quality are ignored

* max_baseq: maximum base quality used to normalize base quality matrices; values indicating unknown base quality are ignored
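
One plausible reading of these parameters, sketched with plain Python lists. It assumes the window is centred on a 0-based variant position, that values are clipped at the maximum and then divided by it, that unknown values are written as 0.0, and that 255 marks an unknown mapping quality (as in the SAM specification). None of this is confirmed by the pipeline code shown here; `window_bounds` and `normalize` are hypothetical helpers.

```python
def window_bounds(position, window):
    """Return (start, end) of a half-open window of `window` bases around `position`.

    The window is centred on the position unless that would start before 0,
    in which case it is shifted to start at 0.
    """
    half = window // 2
    start = max(0, position - half)
    return start, start + window


def normalize(values, maximum, unknown=None):
    """Scale values into [0, 1] by clipping at `maximum` and dividing by it.

    Entries equal to `unknown` are treated as missing and mapped to 0.0
    (an assumption about what "ignored" means).
    """
    scaled = []
    for value in values:
        if unknown is not None and value == unknown:
            scaled.append(0.0)
        else:
            scaled.append(min(value, maximum) / maximum)
    return scaled


coverage = [0, 250, 500, 900]
mapq = [0, 30, 60, 255]

print(window_bounds(1000, 150))
print(normalize(coverage, maximum=500))
print(normalize(mapq, maximum=60, unknown=255))
```

Clipping keeps every normalized entry in the range 0 to 1 even when a position exceeds the configured maximum.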


### Optional input:

* publish_dir: the folder where output is published

* memory: the amount of memory used by each job (default: 15g)

* cpus: the number of CPUs used by each job (default: 8)


### Output:

* Tensors under the output folder


environment.yml

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
name: bam2tensor
channels:
  - defaults
  - conda-forge
  - bioconda
dependencies:
  - python=3.9.*
  - conda-forge::cxx-compiler
  - conda-forge::zlib
  - bioconda::samtools
  - bioconda::bedtools
  - bioconda::bcftools
  - pip
  - pip:
    - torch==2.0.1
    - fire==0.5.0
    - pybedtools==0.9.0
    - pysam
    - matplotlib==3.7.1
    - numpy==1.24.3
    - pandas==1.5.3
    - seaborn==0.12.2

