Graph_Traversal

Find all the connected components in the graph that contain the start nodes.
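As a rough illustration of the idea, the following sketch collects the connected components reachable from a set of start nodes with a breadth-first search. It is not the code in search.py; the function name `components_from_starts`, the adjacency-dictionary layout, and the toy graph are assumptions made for this example only.

```python
from collections import deque


def components_from_starts(adjacency, start_nodes):
    """Return the connected components that contain at least one start node.

    adjacency: dict mapping a node to the set of its neighbours (undirected).
    start_nodes: iterable of nodes to start the search from.
    """
    seen = set()
    components = []
    for start in start_nodes:
        if start in seen:
            continue  # this start node already belongs to a found component
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Toy undirected graph (hypothetical data): {a, b, c}, {d, e}, and isolated f.
toy_graph = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},
    "d": {"e"}, "e": {"d"},
    "f": set(),
}
found = components_from_starts(toy_graph, ["a", "c", "d"])
```

Start nodes that fall in the same component are reported once, and components without any start node (here the one holding `f`) are never visited.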

The graph used as input is generated by the SGA - String Graph Assembler. SGA is a de novo genome assembler based on the concept of string graphs. The major goal of SGA is to be very memory efficient, which is achieved by using a compressed representation of DNA sequence reads. link: https://github.com/jts/sga#sga---string-graph-assembler

The Python version is 3.6.5.

Preprocessing

The graph generated by SGA is saved in ASQG format, which contains all the information about vertices and edges. Only the edge information is needed. Filtering out the unnecessary data is easy with the zgrep command on Linux.

In a Linux terminal, run the following commands:

zgrep "ED.*" filename > output_filename
gzip output_filename
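For readers who prefer to stay in Python, the same filtering can be sketched with the standard gzip module. This is only an illustration under an assumption about the record layout: that each edge line starts with the tag `ED`, followed by the two vertex IDs as the next whitespace-separated fields. Check this against your own file. The helper names `iter_edges` and `read_edges` and the sample lines are hypothetical.

```python
import gzip


def iter_edges(lines):
    """Yield (vertex_1, vertex_2) for every ED record in an iterable of lines.

    Assumes the record tag is the first whitespace-separated field and the
    two vertex IDs are the second and third fields.
    """
    for line in lines:
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "ED":
            yield fields[1], fields[2]


def read_edges(path):
    """Read vertex pairs from a gzip-compressed ASQG (or pre-filtered) file."""
    with gzip.open(path, "rt") as handle:
        return list(iter_edges(handle))


# Hypothetical sample lines: a header, a vertex, and an edge record.
sample = [
    "HT\tVN:i:1",
    "VT\tread1\tACGT",
    "ED\tread1 read2 0 2 4 1 3 4 0 0",
]
edges = list(iter_edges(sample))
```

Matching on the first field, rather than on the substring `ED` anywhere in the line, avoids picking up records whose IDs happen to contain those letters.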

Usage

search.py

If you have enough memory to handle a huge graph, use search.py for better time efficiency: the whole graph is loaded into memory for further processing. A graph that contains two million nodes needs at least 13 GB of memory.

Usage: search.py graph_file sam_file

Example: search.py example.gz example.sam

db_build.py and search_d.py

If you don't have enough memory, use db_build.py to build a SQLite database on your hard disk first.

Usage: db_build.py graph_file graph_database

Example: db_build.py example.gz my_database.db

After the database is built, use search_d.py to do the search.

Usage: search_d.py graph_database sam_file

Example: search_d.py my_database.db example.sam
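To give a feel for the on-disk approach, here is a minimal sketch of storing edges in SQLite and looking up a node's neighbours with a query instead of an in-memory dictionary. The table name `edges`, its columns, and the helper functions are assumptions for illustration; the actual schema used by db_build.py may differ.

```python
import sqlite3


def build_edge_db(db_path, edges):
    """Create a (hypothetical) edge table and fill it from (a, b) pairs."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS edges (source TEXT, target TEXT)")
    # Store both directions so a neighbour lookup is a single query.
    conn.executemany(
        "INSERT INTO edges VALUES (?, ?)",
        ((a, b) for pair in edges for a, b in (pair, pair[::-1])),
    )
    # Index the lookup column so queries do not scan the whole table.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_edges_source ON edges (source)")
    conn.commit()
    return conn


def neighbours(conn, node):
    """Return the set of nodes that share an edge with `node`."""
    rows = conn.execute("SELECT target FROM edges WHERE source = ?", (node,))
    return {row[0] for row in rows}


# ":memory:" keeps this demo off the disk; pass a file path for a real graph.
conn = build_edge_db(":memory:", [("read1", "read2"), ("read2", "read3")])
result = neighbours(conn, "read2")
```

The trade-off is the one described above: each neighbour lookup costs a database query, which is slower than a dictionary access but keeps memory use roughly independent of graph size.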

Example

Change to the directory that contains your data.
```
 cd your_dir
```
Do the preprocessing. Note that graph.asqg.gz is the output of SGA; gzip is used to compress the filtered file to save space.
```
 zgrep "ED.*" graph.asqg.gz > example
 gzip example
```

If you have enough memory:

 python search.py example.gz example.sam

Then check the result in result.txt.

If you do not have enough memory:

 python db_build.py example.gz my_database.db
 python search_d.py my_database.db example.sam

Then check the result in result.txt.
