How can I fix the config file when I only have DNA data? #42

@Donbbit

Hi, I am running ScanNeo2 with only dna_normal and dna_tumor data. I changed the config like this:

Reference

General settings

reference:
release: 111
nonchr: false
threads: 30
mapq: 30 # overall required mapping quality
basequal: 20 # overall required base quality

data:
name: D1
dnaseq:
dna_normal: /hsfscqjf3/DIPSEQ/zfssz8/CNGB_DATA/BGISEQ01/DIPSEQ/DIPSEQT20/P24Z10200N0995_Temp/DT2411609481-1/250210_SEQ081_FP500002421_L01_SP2501130808/FP500002421_L01_375_1.fq.gz /hsfscqjf3/DIPSEQ/zfssz8/CNGB_DATA/BGISEQ01/DIPSEQ/DIPSEQT20/P24Z10200N0995_Temp/DT2411609481-1/250210_SEQ081_FP500002421_L01_SP2501130808/FP500002421_L01_375_2.fq.gz
dna_tumor1: /hsfscqjf3/DIPSEQ/zfssz8/CNGB_DATA/BGISEQ01/DIPSEQ/DIPSEQT20/P24Z10200N0995_Temp/D2411320266/250210_SEQ082_FP500002422_L01_SP2501130799/FP500002422_L01_492_1.fq.gz /hsfscqjf3/DIPSEQ/zfssz8/CNGB_DATA/BGISEQ01/DIPSEQ/DIPSEQT20/P24Z10200N0995_Temp/D2411320266/250210_SEQ082_FP500002422_L01_SP2501130799/FP500002422_L01_492_2.fq.gz
dna_tumor2: /hsfscqjf3/DIPSEQ/zfssz8/CNGB_DATA/BGISEQ01/DIPSEQ/DIPSEQT20/P24Z10200N0995_Temp/D2411320262/250210_SEQ082_FP500002422_L01_SP2501130795/FP500002422_L01_488_1.fq.gz /hsfscqjf3/DIPSEQ/zfssz8/CNGB_DATA/BGISEQ01/DIPSEQ/DIPSEQT20/P24Z10200N0995_Temp/D2411320262/250210_SEQ082_FP500002422_L01_SP2501130795/FP500002422_L01_488_2.fq.gz
rnaseq:
rna_tumor:
normal: dna_normal

custom:
variants:
hlatyping:
MHC-I:
MHC-II:

pre-processing (only applied on fastq reads)

preproc:
activate: true # whether (=true) or not (=false) to include pre-processing
minlen: 10
slidingwindow:
activate: true
wsize: 3
wqual: 20

alignment

align:
chimSegmentMin: 20
chimScoreMin: 10
chimJunctionOverhangMin: 10
chimScoreDropMax: 30
chimScoreSeparation: 10

variant calling

alternative splicing

altsplicing:
activate: true # whether (=true) or not (=false) to include alternative splicing events
confidence: 3 # confidence level (1,2 or 3) - filtering of input alignments
iterations: 5 # number of iterations (when adding intron edges) - increases sensitivity
edgelimit: 250 # limit max number of edges in graph - affects the runtime

exitron splicing

exitronsplicing:
activate: true # whether (=true) or not (=false) to include exitron-splicing events
ao: 3 # allele observation
pso: 0.05 # percent spliced out
#strand: 1 # strand specificity of library (0=unstranded, 1=forward, 2=reverse)
strand: XS # strand specificity of library (0=XS, 1=RF, 2=FR)

gene fusion

genefusion:
activate: true # whether (=true) or not (=false) to include gene fusion events
maxevalue: 0.3
suppreads: 2 # all fusions with less than suppreads are discarded
maxsuppreads: 1000
maxidentity: 0.3 # genes with fraction of identity are discarded (homologs)
hpolymerlen: 6 # removes breakpoints adjacent to homopolymers of length
readthroughdist: 10000 # distance between breakpoints with less than distance
minanchorlen: 20 # removes fusions whose segments are less than minchimlen
splicedevents: 4 # fusions between genes need at least this many spliced breakpoints
maxkmer: 0.6 # remove reads with repetitive 3-mer that make up more than maxkmer
fraglen: 200 # mean fragment length
maxmismatch: 0.01

indel

indel:
activate: true # whether (=true) or not (=false) to include indels
type: all # long, short, all
mode: DNA # DNA, RNA or BOTH -

strategy for optimizing posterior probability threshold

strategy: OPTIMAL_F_SCORE # OPTIMAL_F_SCORE, FALSE_DISCOVERY_RATE, CONSTANT
fscorebeta: 1.0 # rel. weight of recall to precision (when OPTIMAL_F_SCORE is selected)
fdr: 0.05 # false discovery rate (when FALSE_DISCOVERY_RATE is selected)
sliplen: 8 # min number of reference bases to suspect slippage event
sliprate: 0.1 # frequency of slippage when it is suspected

quantification:
mode: DNA # DNA, RNA or BOTH

hlatyping:
class: BOTH # I, II or BOTH

specific path for class II hlatyping (only required when class: II, or BOTH)

MHC-I_mode: DNA # DNA, RNA, or custom (if empty alleles have to be specified in custom)
MHC-II_mode: DNA # DNA, RNA, or custom (if empty alleles have to be specified in custom)

specific path for class II hlatyping (only required when class: II, or BOTH)

freqdata: /hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/soft/hlahd.1.7.0/freq_data/
split: /hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/soft/hlahd.1.7.0/HLA_gene.split.txt
dict: /hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/soft/hlahd.1.7.0/dictionary/

prioritization:
class: I # I, II or BOTH
lengths:
MHC-I: 8,9,10,11
MHC-II: 13,14,15

And I got this error:
Config file /hsfscqjf1/ST_CQ/P24Z32300N0028/lvmeiqi/Project/1.ESCA/1.scanneo2/D1/config.yaml is extended by additional config specified via the command line.
Traceback (most recent call last):
File "/hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/lib/python3.12/site-packages/snakemake/cli.py", line 1898, in args_to_api
dag_api = workflow_api.dag(
^^^^^^^^^^^^^^^^^
File "/hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/lib/python3.12/site-packages/snakemake/api.py", line 326, in dag
return DAGApi(
^^^^^^^
File "<string>", line 6, in __init__
File "/hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/lib/python3.12/site-packages/snakemake/api.py", line 436, in __post_init__
self.workflow_api._workflow.dag_settings = self.dag_settings
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/lib/python3.12/site-packages/snakemake/api.py", line 383, in _workflow
workflow.include(
File "/hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/lib/python3.12/site-packages/snakemake/workflow.py", line 1382, in include
exec(compile(code, snakefile.get_path_or_uri(), "exec"), self.globals)
File "/hsfscqjf1/ST_CQ/P24Z32300N0028/lvmeiqi/Project/1.ESCA/1.scanneo2/D1/workflow/Snakefile", line 27, in <module>
include: "rules/custom.smk"
File "/hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/lib/python3.12/site-packages/snakemake/workflow.py", line 1382, in include
exec(compile(code, snakefile.get_path_or_uri(), "exec"), self.globals)
File "/hsfscqjf1/ST_CQ/P24Z32300N0028/lvmeiqi/Project/1.ESCA/1.scanneo2/D1/workflow/rules/common.smk", line 126, in <module>
config['data'] = data_structure(config['data'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/hsfscqjf1/ST_CQ/P24Z32300N0028/lvmeiqi/Project/1.ESCA/1.scanneo2/D1/workflow/rules/common.smk", line 11, in data_structure
config['data']['rnaseq'], filetype, readtype = handle_seqfiles(config['data']['rnaseq'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/hsfscqjf1/ST_CQ/P24Z32300N0028/lvmeiqi/Project/1.ESCA/1.scanneo2/D1/workflow/rules/common.smk", line 64, in handle_seqfiles
return mod_seqdata, filetype[0], readtype[0]
^^^^^^^^^^^^^^
IndexError: list index out of range

What should I do when I don't have RNA data?
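
For context, here is a minimal sketch of how the last frame of the traceback could end in `IndexError`. It is not the ScanNeo2 source: `handle_seqfiles_sketch`, its `allow_empty` flag, and the way file and read types are detected are all hypothetical, written only to mirror the `return mod_seqdata, filetype[0], readtype[0]` line shown above. The assumption is that the `rnaseq` section is always processed, and that an entry with no value (`rna_tumor:`) leaves the `filetype` and `readtype` lists empty.

```python
def handle_seqfiles_sketch(seqdata, allow_empty=False):
    # Hypothetical stand-in for handle_seqfiles in workflow/rules/common.smk.
    mod_seqdata, filetype, readtype = {}, [], []
    for group, files in (seqdata or {}).items():
        if not files:
            continue  # "rna_tumor:" with no value is parsed from YAML as None
        paths = files.split()
        mod_seqdata[group] = paths
        is_fastq = paths[0].endswith((".fq.gz", ".fastq.gz"))
        filetype.append("fastq" if is_fastq else "other")
        readtype.append("PE" if len(paths) == 2 else "SE")

    # Assumed guard (not in the traceback): tolerate a section without files.
    if allow_empty and not filetype:
        return mod_seqdata, None, None

    # With no RNA files both lists are empty, so [0] raises IndexError.
    return mod_seqdata, filetype[0], readtype[0]


rna_section = {"rna_tumor": None}  # an empty "rna_tumor:" entry after parsing

try:
    handle_seqfiles_sketch(rna_section)
except IndexError as err:
    print("fails like the traceback:", err)

print(handle_seqfiles_sketch(rna_section, allow_empty=True))
```

If the real function behaves this way, the error comes from the empty `rna_tumor` entry rather than from the DNA paths, which would explain why a DNA-only config fails at this point.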
