-
Couldn't load subscription status.
- Fork 13
Configuration
marekkokot edited this page Mar 10, 2025
·
2 revisions
There are a lot of parameters allowing to customize the pipeline. They can be grouped into several categories.
The parameters will be displayed when running splash without parameters (or with --help).
-
input_file— positional argument, path to the file where input samples are defined, the format is: per each line{sample_name} {path}path is a fastq[.gz] file in case of non-10X and txt file for 10X/Visium where the content of text file is{first_file.fastq[.gz]},{second_file.fastq[.gz]}per line
-
--outname_prefix— prefix of output file names (default: result) -
--anchor_len— anchor length (default: 31) -
--gap_len— gap length, if 'auto' it will be inferred from the data, in the opposite case it must be an int (default: 0) -
--target_len— target length (default: 31) -
--anchor_list— list of accepted anchors, this is path to plain text file with one anchor per line without any header (default accept all anchors) -
--pvals_correction_col_name— for which column correction should be applied (default: pval_opt) -
--technology— Technology used to generate the input data, must be one of 'base', '10x', 'visium' (default:base) -
--without_compactors— if used compactors will not be run (default: False) -
--compactors_config— optional json file with compactors configuration, example file content: { "num_threads": 4, "epsilon": 0.001 } (default: ) -
--lookup_table_config— optional json file with configuration of lookup_table, if not specified lookup_tables are not used (default: )
-
--poly_ACGT_len— filter out all anchors containing poly(ACGT) of length at least <poly_ACGT_len> (0 means no filtering) (default: 8) -
--artifacts— path to artifacts, each anchor containing artifact will be filtered out (default: ) -
--dont_filter_illumina_adapters— if used anchors containing Illumina adapters will not be filtered out (default: False) -
--anchor_unique_targets_threshold— filter out all anchors for which the number of unique targets is <= anchor_unique_targets_threshold (default: 1) -
--anchor_count_threshold— filter out all anchors for which the total count <= anchor_count_threshold (default: 50) -
--anchor_samples_threshold— filter out all anchors for which the number of unique samples is <= anchor_samples_threshold (default: 1) -
--anchor_sample_counts_threshold— filter out anchor from sample if its count in this sample is <= anchor_sample_counts_threshold (default: 5) -
--n_most_freq_targets_for_stats— use at most n_most_freq_targets_for_stats for each contingency table, 0 means use all (default: 0) -
--n_most_freq_targets_for_dump— use when dumping satc (txt or binary), resulting file will only contain data for n_most_freq_targets_for_dump targets in each anchor, 0 means use all (default: 0) -
--fdr_threshold— keep anchors having corrected p-val below this value (default: 0.05) -
--min_hamming_threshold— keep only anchors with a pair of targets that differ by >= min_hamming_threshold (default: 0) -
--keep_top_n_target_entropy— select keep_top_n_target_entropy records with highest target entropy (0 means don't select) (default: 10000) -
--keep_top_n_effect_size_bin— select keep_top_n_effect_size_bin records with highest effect size bin (0 means don't select) (default: 20000) -
--keep_significant_anchors_satc— if set there will be additional output file in SATC format with all significant anchors (default: False) -
--keep_top_target_entropy_anchors_satc— if set there will be additional output file in SATC format with top target entropy significant anchors (default: False) -
--keep_top_effect_size_bin_anchors_satc— if set there will be additional output file in SATC format with top effect size bin anchors (default: False)
-
--dump_Cjs— output Cjs (default: False) -
--max_pval_opt_for_Cjs— dump only Cjs for anchors that have pval_opt <= max_pval_opt_for_Cjs (default: 0.1) -
--n_most_freq_targets— number of most frequent targets printed per each anchor (default: 10) -
--with_effect_size_cts— if set effect_size_cts will be computed (default: False) -
--with_pval_asymp_opt— if set pval_asymp_opt will be computed (default: False) -
--without_seqence_entropy— if set sequence entropy for anchor and most freq targets will not be computed (default: False) -
--sample_name_to_id— file name with mapping sample name <-> sample id (default: sample_name_to_id.mapping.txt) -
--dump_sample_anchor_target_count_txt— if set contingency tables will be generated in text format (default: False) -
--dump_sample_anchor_target_count_binary— if set contingency tables will be generated in binary (SATC) format, to convert to text format latersatc_dumpprogram may be used, it may take optionally mapping from id to sample_name (--sample_names param) (default: False) -
--supervised_test_samplesheet— if used script for finding/visualizing anchors with metadata-dependent variation will be run (forces--dump_sample_anchor_target_count_binary) (default: ) -
--supervised_test_anchor_sample_fraction_cutoff— the cutoff for the minimum fraction of samples for each anchor (default: 0.4) -
--supervised_test_num_anchors— maximum number of anchors to be tested example (default: 20000)
-
--opt_num_inits— the number of altMaximize random initializations (default: 10) -
--opt_num_iters— the maximum number of iterations in altMaximize (default: 50) -
--num_rand_cf— the number of random c and f used for pval_base (default: 50) -
--num_splits— the number of contingency table splits (default: 1) -
--opt_train_fraction— use this fraction to create training data from contingency table (default: 0.25) -
--without_alt_max— if set int alt max and related stats will not be computed (default: False) -
--without_sample_spectral_sum— if set sample spectral sum will not be computed (default: False) -
--Cjs_samplesheet— path to file with predefined Cjs for non-10X supervised mode (default: )
-
--bin_path— path to a directory where satc, satc_dump, satc_merge, sig_anch, kmc, kmc_tools, and other binaries are (if any not found there splash will check if installed and use installed) (default: bin) -
--tmp_dir— path to a directory where temporary files will be stored (default: let splash decide)
-
--n_threads_stage_1— number of threads for the first stage, too large value is not recomended because of intensive disk access here, but may be profitable if there is a lot of small size samples in the input (0 means auto adjustment) (default: 0) -
--n_threads_stage_1_internal— number of threads per each stage 1 thread (0 means auto adjustment) (default: 0) -
--n_threads_stage_1_internal_boost— multiply the value of n_threads_stage_1_internal by this (may increase performance but the total number of running threads may be high) (default: 1) -
--n_threads_stage_2— number of threads for the second stage, high value is recommended if possible, single thread will process single bin (0 means auto adjustment) (default: 0) -
--n_bins— the data will be split in a number of bins that will be merged later (default: 128) -
--kmc_use_RAM_only_mode— True here may increase performance but also RAM-usage (default: False) -
--kmc_max_mem_GB— maximal amount of memory (in GB) KMC will try to not extend (default: 12) -
--dont_clean_up— if set then intermediate files will not be removed (default: False) -
--logs_dir— director where run logs of each thread will be stored (default: logs)
-
--cbc_len— call barcode length (in case of 10X/Visium data) (default: 16) -
--umi_len— UMI length (in case of 10X/Visium data) (default: 12) -
--soft_cbc_umi_len_limit— allow additional symbols (beyond cbc_len + umi_len in _1.fastq 10X file UMI (default: 0) -
--cbc_filtering_thr— how to filter cbcs, if 0 do the same as umi tools, in the opposite case keep cbcs with freq >= <cbc_filtering_thr> (default: 0) -
--cell_type_samplesheet— path for mapping barcode to cell type, is used Helmert-based supervised mode is turned on (default: ) -
--export_cbc_logs— use if need cbc log files (default: False) -
--predefined_cbc— path to file with predefined CBCs (default: ) -
--export_filtered_input— use if need filtered FASTQ files (default: False) -
--allow_strange_cbc_umi_reads— use to prevent the application from crashing when the CBC+UMI read length is outside the acceptable range (either shorter than CBC+UMI or longer than CBC+UMI+soft_cbc_umi_len_limit) (default: False)