Dataset generation pipeline for Enigma2 using the NCBI database, along with additional helpers such as a Dataset class to retrieve and create batches for training ML models.
Before setting up EnigmaDB, ensure that you have the following prerequisites installed:
- Python 3.8 or higher
- pip (Python package installer)
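To confirm the interpreter meets the version requirement before installing, a quick check (a minimal sketch, not part of EnigmaDB itself):

```python
import sys

# EnigmaDB requires Python 3.8 or higher; fail fast on older interpreters
assert sys.version_info >= (3, 8), f"Python 3.8+ required, found {sys.version.split()[0]}"
print("Python version OK:", sys.version.split()[0])
```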
```sh
pip install enigmadatabase
```
```sh
git clone https://github.com/delveopers/EnigmaDataset.git
cd EnigmaDataset
```
```python
from EnigmaDB import Database, EntrezQueries

queries = EntrezQueries()  # get the built-in query list
db = Database(topics=queries(), out_dir="./data/raw", email=EMAIL, api_key=API_KEY, retmax=1500, max_rate=10)  # set parameters; EMAIL and API_KEY are your NCBI credentials
db.build(with_index=False)  # start building
```
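The `EMAIL` and `API_KEY` values used above can be supplied via environment variables rather than hard-coded. This is a common pattern, not something EnigmaDB requires; the `NCBI_EMAIL` and `NCBI_API_KEY` variable names are illustrative:

```python
import os

# Hypothetical convention: read NCBI credentials from the environment.
# NCBI_EMAIL / NCBI_API_KEY are illustrative names, not defined by EnigmaDB.
EMAIL = os.environ.get("NCBI_EMAIL", "you@example.com")
API_KEY = os.environ.get("NCBI_API_KEY")  # None is acceptable; requests are just rate-limited lower

print("email:", EMAIL, "| api key set:", API_KEY is not None)
```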
```python
from EnigmaDB import create_index

create_index("./data/raw")  # path to the downloaded data
```
```python
from EnigmaDB import convert_fasta

convert_fasta(input_dir="./data/raw", output_dir="./data/parquet", mode='parquet')  # for Parquet
convert_fasta(input_dir="./data/raw", output_dir="./data/csv", mode='csv')  # for CSV
```
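As a rough illustration of what this conversion step involves (a stdlib-only sketch, not EnigmaDB's actual implementation), FASTA records are flattened into tabular `id, description, sequence` rows:

```python
import csv
import io

def fasta_records(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, seq = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)

sample = ">seq1 demo record\nACGT\nACGT\n>seq2\nTTTT\n"
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "description", "sequence"])
for header, seq in fasta_records(sample):
    ident, _, desc = header.partition(" ")  # split accession from free-text description
    writer.writerow([ident, desc, seq])
print(buf.getvalue())
```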
For more technical information, refer to the documentation:
```
├── docs/
│   ├── Database.md
│   └── Dataset.md
├── src/
│   ├── __init__.py
│   ├── _database.py       # ``Database`` class for downloading data from NCBI
│   ├── _dataset.py        # ``Dataset``, a dataloader class for enigma2
│   └── _queries.py        # queries for the DB pipeline
├── README.md
├── setup.py
├── pyproject.toml
└── requirements.txt       # List of Python dependencies
```
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository.
- Create a feature branch:

  ```sh
  git checkout -b feature-name
  ```

- Commit your changes:

  ```sh
  git commit -m "Add feature"
  ```

- Push to the branch:

  ```sh
  git push origin feature-name
  ```

- Create a pull request.
Please make sure to update tests as appropriate.
MIT License. See the LICENSE file for more info.