This is an early-development version of TRASH 2, an update of the TRASH software for repeat identification (https://github.com/vlothec/TRASH).

New features:

Classification of repeats into repeat families/classes across the fasta file
Better mapping of very short and very long repeats
Additional polishing steps for the repeats found at array edges
Better parallelisation
Better error diagnostics and runtime progress reporting
Full re-write with updates to all algorithms

Installation:

R needs to be installed.
mafft and nhmmer need to be installed.
Running TRASH.R from the /src/ directory for the first time will install any required R packages that are missing. See below for the required run settings.

TRASH.R needs to be called directly from its directory, or its directory needs to be added to the PATH variable for easy access.

If TRASH.R does not execute, add execute permission with chmod +x ./TRASH.R. Running it as Rscript ./TRASH.R might be necessary if the R code is not being recognised.

mafft and nhmmer must also be added to the PATH variable. Alternatively, both can be installed locally and their paths added to the src/main.R script, replacing lines 12 and 13 on Windows or lines 15 and 16 on Linux. Installing nhmmer on Windows requires a Unix-like environment such as Cygwin. A Windows version of mafft is available and can be used by uncommenting line 10 of the src/main.R script.
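
The PATH requirement can be checked before the first run. The following is a minimal sketch, not part of TRASH: the helper name missing_tools is hypothetical, and it relies only on the standard `command -v` lookup to report which of Rscript, mafft and nhmmer cannot be found.

```shell
#!/bin/sh
# Print each named tool that cannot be found on PATH, one per line.
# (missing_tools is an illustrative helper, not shipped with TRASH.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools Rscript mafft nhmmer)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are joined on one line.
    echo "Not found on PATH:" $missing
else
    echo "Rscript, mafft and nhmmer are all on PATH"
fi
```

If a tool is reported as missing but is installed locally, its path can instead be set in src/main.R as described above.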

Run

TRASH is run through the TRASH.R script found in the /src/ directory, with a fasta file and an output directory as arguments.

Required run settings:

-o --output             output directory
-f --fasta              file to process

Optional run settings:

-p --cores_no           number of cores for parallel run, default: 1
-m --max_rep_size       maximum repeat size, default: 1000
-i --min_rep_size       minimum repeat size, default: 7
-t --templates          fasta file with repeat templates and their names
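
As an illustration of how the settings above combine, here is a small dry-run sketch. It only prints a command line built from the documented long flags; the helper name trash_cmd and the file names genome.fasta and trash_out are hypothetical placeholders, and the space-separated `--flag value` form is an assumption about how the script parses its arguments.

```shell
#!/bin/sh
# Build a TRASH.R command line from the documented long flags.
# $1 = fasta file, $2 = output directory, $3 = cores (optional, default 1)
# (trash_cmd is an illustrative helper; paths with spaces are not handled.)
trash_cmd() {
    printf './TRASH.R --fasta %s --output %s --cores_no %s\n' "$1" "$2" "${3:-1}"
}

# Dry run: print the command so it can be inspected before running.
trash_cmd genome.fasta trash_out 4
```

The printed line can then be run from the /src/ directory, optionally with --min_rep_size, --max_rep_size or --templates appended.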

Output

├── [fasta_file]
│   ├── [fasta_file]_repeats_with_seq.csv      main output file with identified repeats
│   ├── [fasta_file]_repeats.gff               main output repeat file in gff format
│   ├── [fasta_file]_repeats.csv               main output file with identified repeats without sequence column
│   ├── [fasta_file]_arrays.csv                repeat arrays, start and end are not perfectly aligned with repeats, but can be used to get locations of repeats without loading in potentially big repeat files
│   ├── [fasta_file]_arrays.gff                repeat arrays as above, in gff format
│   ├── [fasta_file]_run_time.csv              report of the script run time
│   ├── [fasta_file]_regarrays.csv             temp file, can be removed
│   ├── [fasta_file]_aregarrays.csv            temp file, can be removed
│   ├── [fasta_file]_classarrays.csv           temp file, can be removed
│   └── [fasta_file]_no_repeats_arrays.csv     temp file, can be removed
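
For a quick look at the results without loading the larger CSV files, the gff output can be summarised with standard tools. This sketch assumes the file follows the usual GFF layout (nine tab-separated columns, sequence ID in the first, comment lines starting with #); the helper name count_per_sequence is hypothetical.

```shell
#!/bin/sh
# Count GFF feature lines per sequence ID (column 1), skipping comments.
# $1 = path to a GFF file such as the [fasta_file]_repeats.gff output
# (assumes standard nine-column, tab-separated GFF)
count_per_sequence() {
    awk -F '\t' '!/^#/ && NF >= 9 { n[$1]++ }
                 END { for (s in n) printf "%s\t%d\n", s, n[s] }' "$1" | sort
}

# Usage (replace the path with the actual output file):
# count_per_sequence path/to/output_repeats.gff
```

The same function works on the arrays gff file, where each line is a repeat array rather than a single repeat.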

HOR processing

HORT.R should be used instead of TRASH.R, with the following arguments. The third column is the argument code from the script's option specification (apparently 1 = required, 2 = optional); the fourth is the value type.

-o --output_folder      1   character
-t --hor_threshold      2   integer
-l --hor_min_len        2   integer
-c --class              1   character
-r --repeats            1   character
-m --method             1   integer
-A --chrA               1   character
-B --chrB               2   character
-b --repeatsB           2   character
-C --classB             2   character
-g --genomeA            1   character
-G --genomeB            2   character
-s --saveR              2   character
-p --plot_simple        2   character

vlothec/TRASH_2
