Separate

If the original scripts (in the master folder) do not work, copy the scripts from the alternative_versions folder into master and use those instead.

Dependencies

The program was tested on Ubuntu Linux 18.04.2. It should also work on Windows, but this is not recommended.

These scripts require Python 3 and use the following libraries (argparse, os and re are built in; numpy and matplotlib are optional):

argparse
os
re
numpy (optional; used only by the -P flag of separate_Pfam_to_counts.py; install with: $ sudo pip3 install numpy)
matplotlib (optional; used only by the -P flag of separate_Pfam_to_counts.py; install with: $ sudo pip3 install matplotlib)

To install them quickly, run: $ pip3 install -r requirements.txt

For full functionality you will also need to install CLANS.
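
Before using the -P flag it can be handy to check whether the two optional packages are importable. The snippet below is a minimal sketch of such a check; the function name missing_optional is hypothetical and is not part of this repository.

```python
import importlib.util

# Optional packages named in the list above; both are needed only for the -P plot.
OPTIONAL = ("numpy", "matplotlib")


def missing_optional(packages=OPTIONAL):
    """Return the names of the packages that cannot be found for import."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_optional()
    if missing:
        print("Plotting (-P) unavailable, missing:", ", ".join(missing))
    else:
        print("All optional packages found.")
```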

Important note!!!

If you want to use CLANS, for example while running all_in_one.py, remember to save the result once clustering has finished, then create the groups and put them into the "groups_from_CLANS" folder!

all_in_one.py

Automatically processes your data using the default parameters (see below under the respective scripts).

Flag	Description	Default
-C	Path to CLANS (clans.jar)
-P	Path to the file downloaded from Pfam
-F	Path to the file with full sequences

Example usage:
python3 all_in_one.py -P other_files/PF01699_full.txt -F other_files/PF01699_raw.fasta -C here-put-directory-to/clans.jar

separate_Pfam_to_counts.py

Input: File with repeats downloaded from Pfam database.

IMPORTANT:
- Format: FASTA
- Gaps: Gaps as "-" (dashes)
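
The two requirements above can be checked before running the script. The following is only a sketch: read_fasta and uses_dash_gaps are hypothetical helpers, not functions from this repository, and the check assumes that "." is the only alternative gap symbol you might meet in an export.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def uses_dash_gaps(records):
    """True if no sequence contains '.', i.e. gaps can only be dashes."""
    return all("." not in seq for _, seq in records)
```

For example, uses_dash_gaps(read_fasta(open("PF00000_full.txt"))) should be True for a file exported with gaps as dashes.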

Flag	Description	Default
-F	Path to the file
-P	Show the plot with all start/end positions	False
-M	Separate all structures, including those with more than 2 counts	False

Example usage:
python3 separate_Pfam_to_counts.py -F PF00000_full.txt
or to see a plot:
python3 separate_Pfam_to_counts.py -F PF00000_full.txt -P

formatting.py

Deletes all gaps and converts sequence names.

Flag	Description	Default
-F	Path to the file
-R	Recursive search - path to the folder with files
-S	Substitute character	_

Example usage:
python3 formatting.py -F files_from_separate_Pfam_to_counts/put-file-name-here
or for recursive search in folder:
python3 formatting.py -R files_from_separate_Pfam_to_counts
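
As an illustration of what "delete gaps and convert names" can look like, here is a minimal sketch. It is not the code of formatting.py: the function name format_record is hypothetical, and the set of characters that gets substituted (everything except letters, digits, "_", "." and "-") is an assumption.

```python
import re


def format_record(name, sequence, substitute="_"):
    """Remove gap characters from the sequence and clean up the name.

    Assumption: every character in the name other than letters, digits,
    '_', '.' and '-' is replaced by `substitute` (compare flag -S).
    """
    clean_name = re.sub(r"[^A-Za-z0-9_.\-]", lambda _match: substitute, name)
    clean_seq = sequence.replace("-", "")
    return clean_name, clean_seq
```

With the default substitute character, a name such as "Q9XYZ1_HUMAN/12-150" would become "Q9XYZ1_HUMAN_12-150" under this assumption.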

full_seq_extract_n_counts.py

A file downloaded from Pfam may contain sequences with a number of repeats other than n (for example, 2). This program keeps only the sequences of interest, i.e. those with n repeats.

Flag	Description	Default
-F	Path to the file with full sequences
-S	Path to the file from separate_Pfam_to_counts
-N	Number of repeats	2

Example usage:
python3 full_seq_extract_n_counts.py -F PF00000_raw.fasta -S files_from_separate_Pfam_to_counts/put-here-name-of-the-file
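
The idea of keeping only sequences with n repeats can be sketched as follows. This is an illustration under assumptions, not the script itself: it assumes alignment names of the form "ID/start-end" and full-sequence headers that begin with the same ID, and both function names are hypothetical.

```python
from collections import Counter


def ids_with_n_repeats(pfam_names, n=2):
    """Return the IDs that occur exactly n times among 'ID/start-end' names."""
    counts = Counter(name.split("/")[0] for name in pfam_names)
    return {seq_id for seq_id, count in counts.items() if count == n}


def filter_full_sequences(full_records, wanted_ids):
    """Keep (header, sequence) pairs whose first header word is a wanted ID."""
    return [
        (header, seq)
        for header, seq in full_records
        if header.split()[0] in wanted_ids
    ]
```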

separate_groups_to_counts.py

Separates CLANS group files into counts.

Flag	Description	Default
-G	Folder name with groups files	groups_from_CLANS
-S	Folder name with separated files	files_from_separate_Pfam_to_counts
-N	Number of counts	2

Example usage:
python3 separate_groups_to_counts.py

simple_filter_by_length.py

Filters sequences by their length (number of letters). It removes sequences that are longer or shorter than average, but it will not catch a sequence that is missing one part and is longer in another, so that its overall length stays similar to the others.

Flag	Description	Default
-F	Path to the file
-R	Recursive search - path to the folder with files
-A	Allowed deviation (above or below) from the average length

Example usage:
python3 simple_filter_by_length.py -F files_from_separate_groups_to_counts/put-here-file-name -A 15
or for recursive search:
python3 simple_filter_by_length.py -R files_from_separate_groups_to_counts -A 10
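
The length filter can be pictured with the sketch below. It assumes that -A is an absolute number of letters around the average length; whether the script treats it that way (rather than, say, as a percentage) is not stated here, and the function name filter_by_length is hypothetical.

```python
def filter_by_length(records, allowed_deviation):
    """Keep (name, sequence) pairs whose length is close to the average.

    Assumption: a sequence is kept when its length differs from the
    average length by at most `allowed_deviation` letters.
    """
    if not records:
        return []
    average = sum(len(seq) for _, seq in records) / len(records)
    return [
        (name, seq)
        for name, seq in records
        if abs(len(seq) - average) <= allowed_deviation
    ]
```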

advanced_filter_by_value.py

A worked example can be found below, under "About advanced_filter_by_value.py".

Flag	Description	Default
-F	Path to the file
-R	Recursive search - path to the folder with files
-M	Error margin	0.4
-A	Acceptable number of errors	10
-O	Detailed output	False

Example usage:
python3 advanced_filter_by_value.py -F files_from_separate_groups_to_counts/put-here-file-name
for recursive search:
python3 advanced_filter_by_value.py -R files_from_separate_groups_to_counts
to change parameters:
python3 advanced_filter_by_value.py -R files_from_separate_groups_to_counts -M 0.5 -A 20

About advanced_filter_by_value.py

Example for:
margin (-M) = 0.4
acceptable number of errors (-A) = 1

Sequence	col 1	col 2	col 3	col 4	col 5	Comment
S1	-	-	-	-	A
S2	-	-	-	A	A
S3	-	-	A	A	A
S4	-	A	A	A	A
S5	A	A	A	A	A
Value	0.2	0.4	0.6	0.8	1
Compare S1	OK	OK	X	X	OK	Error: 2 - Sequence deleted
Compare S2	OK	OK	X	OK	OK	Error: 1 - Sequence is OK
Compare S3	OK	OK	OK	OK	OK	Error: 0 - Sequence is OK
Compare S4	OK	X	OK	OK	OK	Error: 1 - Sequence is OK
Compare S5	X	X	OK	OK	OK	Error: 2 - Sequence deleted

Sequences that are too long or too short are therefore deleted. Note that the result is the same even if the columns are in a different order.
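
The table above can be reproduced with a short sketch. The rules in it are inferred from the example, not taken from the script: "Value" is read as the fraction of sequences with a letter (not a gap) in a column, a gap counts as an error when Value >= 1 - margin, and a letter counts as an error when Value <= margin. The function names are hypothetical, and fractions are used only to avoid rounding surprises at the thresholds.

```python
from fractions import Fraction


def column_values(alignment):
    """Fraction of sequences that have a letter (non-gap) in each column."""
    n = len(alignment)
    return [Fraction(sum(ch != "-" for ch in col), n) for col in zip(*alignment)]


def count_errors(sequence, values, margin):
    """Count columns where the sequence disagrees with the majority."""
    errors = 0
    for ch, value in zip(sequence, values):
        if ch == "-" and value >= 1 - margin:
            errors += 1  # gap where most sequences have a letter
        elif ch != "-" and value <= margin:
            errors += 1  # letter where most sequences have a gap
    return errors


def filter_alignment(alignment, margin=Fraction(2, 5), acceptable_errors=1):
    """Keep sequences with at most `acceptable_errors` errors."""
    values = column_values(alignment)
    return [
        seq for seq in alignment
        if count_errors(seq, values, margin) <= acceptable_errors
    ]


# The five sequences S1..S5 from the table above.
alignment = ["----A", "---AA", "--AAA", "-AAAA", "AAAAA"]
```

Under these assumptions filter_alignment(alignment) keeps S2, S3 and S4 and drops S1 and S5, as in the table. Each column is judged on its own, which is why reordering the columns does not change the result.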

rename_sequences.py

Adds a prefix to each sequence name, which is helpful if you later want to merge several FASTA files into one.

Flag	Description	Default
-F	Path to the file
-R	Recursive search - path to the folder with files

Example usage:
python3 rename_sequences.py -F directory/filename.txt
or
python3 rename_sequences.py -R ./directory/to/folder
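
Why a prefix helps before merging can be shown with a small sketch. The function add_prefix is hypothetical, and both the prefix and the separator are assumptions, since the prefix the script actually uses is not described here.

```python
def add_prefix(records, prefix, separator="_"):
    """Return (name, sequence) pairs with `prefix` put in front of each name."""
    return [(f"{prefix}{separator}{name}", seq) for name, seq in records]
```

Two files that both contain a sequence called "seq1" would then yield distinct names, for example "groupA_seq1" and "groupB_seq1", so nothing collides after merging.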

merge_into_one_file.py

Merges FASTA files. Note that all the files you want to merge must be in one folder.

Flag	Description	Default
-R	Path to the folder with files
-O	Name of the output file	merged

Example usage:
python3 merge_into_one_file.py -R ./directory/to/folder -O output_filename
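
Merging amounts to concatenating the files of one folder. The sketch below illustrates this; merge_fasta is a hypothetical helper, and the choices it makes (files taken in name order, an explicit output path) are assumptions rather than the script's documented behaviour.

```python
from pathlib import Path


def merge_fasta(folder, output_path):
    """Concatenate every file in `folder` into `output_path`, in name order."""
    folder = Path(folder)
    output_path = Path(output_path)
    parts = []
    for path in sorted(folder.iterdir()):
        # Skip sub-folders and the output file itself if it lives in `folder`.
        if not path.is_file() or path.resolve() == output_path.resolve():
            continue
        text = path.read_text().strip()
        if text:
            parts.append(text)
    # Join with single newlines so no header ends up glued to a sequence.
    output_path.write_text("\n".join(parts) + "\n")
    return output_path
```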
