Configuration

Many parameters are available to customize the pipeline; they are grouped into the categories below. Running splash without arguments (or with --help) prints the full list.

Input

input_file — positional argument, path to the file that defines the input samples. Each line has the format {sample_name} {path}. For non-10X data, path is a fastq[.gz] file. For 10X/Visium data, path is a txt file with one {first_file.fastq[.gz]},{second_file.fastq[.gz]} pair per line.
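The two layouts described above can be sketched in Python as follows. This is only an illustration of the documented line formats; the helper names, sample names, and file paths are hypothetical and not part of splash.

```python
def sample_sheet_text(samples):
    """Format one '{sample_name} {path}' entry per line."""
    return "".join(f"{name} {path}\n" for name, path in samples)


def pair_list_text(pairs):
    """Format one '{first_file},{second_file}' entry per line (10X/Visium)."""
    return "".join(f"{first},{second}\n" for first, second in pairs)


# Non-10X: each sample points directly at a FASTQ file (hypothetical paths).
base_sheet = sample_sheet_text([
    ("sample_A", "data/sample_A.fastq.gz"),
    ("sample_B", "data/sample_B.fastq.gz"),
])

# 10X/Visium: each sample points at a txt file that lists FASTQ pairs.
sample_C_pairs = pair_list_text([
    ("data/sample_C_1.fastq.gz", "data/sample_C_2.fastq.gz"),
])
tenx_sheet = sample_sheet_text([("sample_C", "sample_C.txt")])

print(base_sheet, sample_C_pairs, tenx_sheet, sep="---\n")
```

Saving base_sheet (or tenx_sheet together with the per-sample pair list) to disk gives files that can be passed as input_file.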

Base configuration

--outname_prefix — prefix of output file names (default: result)
--anchor_len — anchor length (default: 31)
--gap_len — gap length; if 'auto', it is inferred from the data, otherwise it must be an int (default: 0)
--target_len — target length (default: 31)
--anchor_list — list of accepted anchors, given as a path to a plain text file with one anchor per line and no header (default: accept all anchors)
--pvals_correction_col_name — name of the column to which the correction is applied (default: pval_opt)
--technology — technology used to generate the input data, must be one of 'base', '10x', 'visium' (default: base)
--without_compactors — if used, compactors will not be run (default: False)
--compactors_config — optional json file with compactors configuration, example file content: { "num_threads": 4, "epsilon": 0.001 } (default: )
--lookup_table_config — optional json file with lookup_table configuration; if not specified, lookup tables are not used (default: )
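As a small sketch, the example content shown for --compactors_config can be produced with Python's json module. Only the two keys from the example above are used; whether other keys are accepted is not covered here.

```python
import json

# The two keys come from the example in the --compactors_config description.
compactors_config = {"num_threads": 4, "epsilon": 0.001}

# Serialize to JSON text; save it to a file and pass that file's path
# to --compactors_config.
config_text = json.dumps(compactors_config, indent=2)
print(config_text)
```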

Filters and thresholds

--poly_ACGT_len — filter out all anchors containing poly(ACGT) of length at least <poly_ACGT_len> (0 means no filtering) (default: 8)
--artifacts — path to artifacts; every anchor containing an artifact is filtered out (default: )
--dont_filter_illumina_adapters — if used, anchors containing Illumina adapters will not be filtered out (default: False)
--anchor_unique_targets_threshold — filter out all anchors for which the number of unique targets is <= anchor_unique_targets_threshold (default: 1)
--anchor_count_threshold — filter out all anchors for which the total count <= anchor_count_threshold (default: 50)
--anchor_samples_threshold — filter out all anchors for which the number of unique samples is <= anchor_samples_threshold (default: 1)
--anchor_sample_counts_threshold — filter out anchor from sample if its count in this sample is <= anchor_sample_counts_threshold (default: 5)
--n_most_freq_targets_for_stats — use at most n_most_freq_targets_for_stats for each contingency table, 0 means use all (default: 0)
--n_most_freq_targets_for_dump — used when dumping SATC (txt or binary); the resulting file contains data only for the n_most_freq_targets_for_dump most frequent targets of each anchor, 0 means use all (default: 0)
--fdr_threshold — keep anchors having corrected p-val below this value (default: 0.05)
--min_hamming_threshold — keep only anchors with a pair of targets that differ by >= min_hamming_threshold (default: 0)
--keep_top_n_target_entropy — select keep_top_n_target_entropy records with highest target entropy (0 means don't select) (default: 10000)
--keep_top_n_effect_size_bin — select keep_top_n_effect_size_bin records with highest effect size bin (0 means don't select) (default: 20000)
--keep_significant_anchors_satc — if set, an additional output file in SATC format is written with all significant anchors (default: False)
--keep_top_target_entropy_anchors_satc — if set, an additional output file in SATC format is written with the top target entropy significant anchors (default: False)
--keep_top_effect_size_bin_anchors_satc — if set, an additional output file in SATC format is written with the top effect size bin anchors (default: False)
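Note that the count-based thresholds above are exclusive: an anchor is removed when its value is less than or equal to the threshold. The sketch below illustrates that rule for three of the thresholds. The AnchorSummary record and the function are hypothetical and only mirror the documented semantics, not splash's implementation.

```python
from dataclasses import dataclass


@dataclass
class AnchorSummary:
    """Hypothetical per-anchor summary, not a splash data structure."""
    anchor: str
    total_count: int
    n_unique_targets: int
    n_unique_samples: int


def passes_count_filters(
    a,
    anchor_count_threshold=50,
    anchor_unique_targets_threshold=1,
    anchor_samples_threshold=1,
):
    """Keep an anchor only if every value is strictly above its threshold."""
    return (
        a.total_count > anchor_count_threshold
        and a.n_unique_targets > anchor_unique_targets_threshold
        and a.n_unique_samples > anchor_samples_threshold
    )


# With the defaults, a total count of exactly 50 is still filtered out.
borderline = AnchorSummary("ACGT", total_count=50, n_unique_targets=3, n_unique_samples=4)
kept = AnchorSummary("TTGA", total_count=51, n_unique_targets=2, n_unique_samples=2)
print(passes_count_filters(borderline), passes_count_filters(kept))
```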

Additional output configuration

--dump_Cjs — output Cjs (default: False)
--max_pval_opt_for_Cjs — dump only Cjs for anchors that have pval_opt <= max_pval_opt_for_Cjs (default: 0.1)
--n_most_freq_targets — number of most frequent targets printed per each anchor (default: 10)
--with_effect_size_cts — if set, effect_size_cts will be computed (default: False)
--with_pval_asymp_opt — if set, pval_asymp_opt will be computed (default: False)
--without_seqence_entropy — if set, sequence entropy for the anchor and most frequent targets will not be computed (default: False)
--sample_name_to_id — file name with mapping sample name <-> sample id (default: sample_name_to_id.mapping.txt)
--dump_sample_anchor_target_count_txt — if set, contingency tables will be generated in text format (default: False)
--dump_sample_anchor_target_count_binary — if set, contingency tables will be generated in binary (SATC) format; the satc_dump program can convert them to text format later and optionally takes a mapping from id to sample_name (--sample_names param) (default: False)
--supervised_test_samplesheet — if used, the script for finding/visualizing anchors with metadata-dependent variation will be run (forces --dump_sample_anchor_target_count_binary) (default: )
--supervised_test_anchor_sample_fraction_cutoff — the cutoff for the minimum fraction of samples for each anchor (default: 0.4)
--supervised_test_num_anchors — maximum number of anchors to be tested (default: 20000)

Tuning statistics computation

--opt_num_inits — the number of altMaximize random initializations (default: 10)
--opt_num_iters — the maximum number of iterations in altMaximize (default: 50)
--num_rand_cf — the number of random c and f used for pval_base (default: 50)
--num_splits — the number of contingency table splits (default: 1)
--opt_train_fraction — use this fraction to create training data from contingency table (default: 0.25)
--without_alt_max — if set, alt max and related stats will not be computed (default: False)
--without_sample_spectral_sum — if set sample spectral sum will not be computed (default: False)
--Cjs_samplesheet — path to file with predefined Cjs for non-10X supervised mode (default: )

Technical and performance-related

--bin_path — path to a directory where satc, satc_dump, satc_merge, sig_anch, kmc, kmc_tools, and other binaries are located (if any of them is not found there, splash checks whether it is installed and uses the installed one) (default: bin)
--tmp_dir — path to a directory where temporary files will be stored (default: let splash decide)

--n_threads_stage_1 — number of threads for the first stage; a very large value is not recommended because of intensive disk access in this stage, but it may pay off if the input contains many small samples (0 means auto adjustment) (default: 0)
--n_threads_stage_1_internal — number of threads per each stage 1 thread (0 means auto adjustment) (default: 0)
--n_threads_stage_1_internal_boost — multiply the value of n_threads_stage_1_internal by this (may increase performance but the total number of running threads may be high) (default: 1)
--n_threads_stage_2 — number of threads for the second stage; a high value is recommended if possible, a single thread processes a single bin (0 means auto adjustment) (default: 0)
--n_bins — the data will be split into this number of bins, which will be merged later (default: 128)
--kmc_use_RAM_only_mode — True here may increase performance but also RAM usage (default: False)
--kmc_max_mem_GB — maximal amount of memory (in GB) that KMC will try not to exceed (default: 12)
--dont_clean_up — if set, intermediate files will not be removed (default: False)
--logs_dir — directory where run logs of each thread will be stored (default: logs)
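A minimal sketch of assembling a command line from the options above. The flag names come from this page, but several things are assumptions: that the executable is called splash, that valued options are written as "--name value", that options described with "if set" are bare switches, and that input_file comes last. The helper itself is hypothetical; check splash --help before relying on it.

```python
import shlex


def build_splash_command(input_file, **options):
    """Build an argv list; assumes '--name value' syntax and bare switches."""
    argv = ["splash"]  # assumed executable name
    for name, value in options.items():
        flag = f"--{name}"
        if value is True:
            argv.append(flag)  # boolean switch, e.g. --dont_clean_up
        elif value is False or value is None:
            continue  # leave the option at its default
        else:
            argv.extend([flag, str(value)])
    argv.append(input_file)  # positional argument, assumed to go last
    return argv


cmd = build_splash_command(
    "input.txt",  # hypothetical sample sheet
    n_threads_stage_2=32,
    n_bins=128,
    kmc_max_mem_GB=24,
    dont_clean_up=True,
)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) once the command has been checked against the help output.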

sc-SPLASH parameters (10x or Visium processing)

--cbc_len — cell barcode length (in case of 10X/Visium data) (default: 16)
--umi_len — UMI length (in case of 10X/Visium data) (default: 12)
--soft_cbc_umi_len_limit — allow this many additional symbols (beyond cbc_len + umi_len) in the _1.fastq 10X file (default: 0)
--cbc_filtering_thr — how to filter CBCs; if 0, filter the same way as UMI-tools, otherwise keep CBCs with freq >= <cbc_filtering_thr> (default: 0)
--cell_type_samplesheet — path to a file mapping barcodes to cell types; if used, Helmert-based supervised mode is turned on (default: )
--export_cbc_logs — use if CBC log files are needed (default: False)
--predefined_cbc — path to file with predefined CBCs (default: )
--export_filtered_input — use if filtered FASTQ files are needed (default: False)
--allow_strange_cbc_umi_reads — use to prevent the application from crashing when the CBC+UMI read length is outside the acceptable range (either shorter than CBC+UMI or longer than CBC+UMI+soft_cbc_umi_len_limit) (default: False)
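The acceptable CBC+UMI read length range described for --allow_strange_cbc_umi_reads can be written as a small check. This is a sketch of the documented rule, not splash's code. The function names are hypothetical, and the split helper additionally assumes the common 10X layout in which the CBC precedes the UMI.

```python
def cbc_umi_read_length_ok(read_len, cbc_len=16, umi_len=12, soft_cbc_umi_len_limit=0):
    """True if read_len lies in [cbc_len + umi_len, cbc_len + umi_len + limit]."""
    min_len = cbc_len + umi_len
    max_len = min_len + soft_cbc_umi_len_limit
    return min_len <= read_len <= max_len


def split_cbc_umi(read, cbc_len=16, umi_len=12):
    """Split a read into (CBC, UMI); assumes the CBC comes first."""
    return read[:cbc_len], read[cbc_len:cbc_len + umi_len]


read = "A" * 16 + "C" * 12  # synthetic 28-symbol read
if cbc_umi_read_length_ok(len(read)):
    cbc, umi = split_cbc_umi(read)
    print(cbc, umi)
```

With the defaults, only reads of exactly 28 symbols are accepted; raising soft_cbc_umi_len_limit widens the upper bound only.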
