📄 SciUploader – Bulk Sci-Hub PDF Downloader

This tool automates the batch download of academic papers from Sci-Hub using DOIs and organizes the PDFs for further metadata processing and upload to decentralized storage on Irys.

📦 Project Structure

sciuploader/
├── doi/                            ← Each page_N.json contains a list of DOIs
├── pdf/                            ← Downloaded PDFs organized by page
├── 0_run_workflow.js              ← Run full workflow script
├── 1_fetch_all_dois.js            ← Fetch DOI list from external source
├── 2_fetch_all_pdfs.js            ← Download PDFs using DOI list
├── 3_generate_basic_metadata.js   ← Generate basic metadata JSON
├── 4_upload_all_basic_metadata.js ← Upload metadata to decentralized storage (TBD)
├── 5_upload_all_pdfs.js           ← Upload PDFs to decentralized storage (TBD)
├── 9_fund.js                      ← Funding registration or helper functions
├── .env.example                   ← Example environment configuration
└── README.md                      ← This file

✅ How to Use

1. Install dependencies

npm install

2. Set environment variables (optional)

Copy .env.example to .env and fill in any required values (e.g., upload keys for later stages).

3. Run full workflow

node 0_run_workflow.js

To split the work across several runs, pass a page range with --start-page and --end-page (for example, --start-page=3 --end-page=4). There are 883431 pages in total.

node 0_run_workflow.js --start-page=300000 --end-page=400000 --batch-size=10
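
When the full range is shared between several machines or terminals, it can help to compute the page ranges up front. The following is a minimal sketch, not part of the repository: splitPages is a hypothetical helper, the worker count of 4 is an arbitrary example, and it assumes --end-page is inclusive, which the README does not state.

```javascript
// Hypothetical helper: split an inclusive page range into roughly equal chunks.
function splitPages(startPage, endPage, workers) {
  const total = endPage - startPage + 1;
  if (total <= 0 || workers <= 0) return [];
  const size = Math.ceil(total / workers);
  const ranges = [];
  for (let start = startPage; start <= endPage; start += size) {
    // Clamp the last chunk so it never runs past endPage.
    ranges.push({ start, end: Math.min(start + size - 1, endPage) });
  }
  return ranges;
}

// 883431 is the page total given in this README; 4 workers is only an example.
const TOTAL_PAGES = 883431;
for (const { start, end } of splitPages(1, TOTAL_PAGES, 4)) {
  console.log(`node 0_run_workflow.js --start-page=${start} --end-page=${end}`);
}
```

Each printed line can then be run in its own terminal or on its own machine.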

Or run step-by-step:

◾️ Step 1: Fetch all DOIs (optional)

node 1_fetch_all_dois.js

This fetches DOIs from an API and saves them into doi/page_N.json files.
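
To inspect a page file before downloading, something like the sketch below may be enough. It assumes each doi/page_N.json holds a plain JSON array of DOI strings; the actual layout written by 1_fetch_all_dois.js may differ. loadDois is a hypothetical name, not a function from this repository.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper. Assumption: page_N.json is a JSON array of DOI strings.
function loadDois(doiDir, page) {
  const file = path.join(doiDir, `page_${page}.json`);
  const parsed = JSON.parse(fs.readFileSync(file, 'utf8'));
  if (!Array.isArray(parsed)) {
    throw new Error(`${file} does not contain a JSON array`);
  }
  // DOIs always begin with the "10." directory indicator; drop anything else.
  return parsed.filter((d) => typeof d === 'string' && d.startsWith('10.'));
}

module.exports = { loadDois };
```

For example, loadDois('doi', 1).length would report how many DOIs page 1 contains under that assumption.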

◾️ Step 2: Download all PDFs

node 2_fetch_all_pdfs.js --start-page=1 --end-page=10

Failed downloads are logged to failed_log_page_N.txt per page.
Already downloaded and valid files are skipped.
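
The skip-and-log behavior can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in 2_fetch_all_pdfs.js: it treats a file as valid when it starts with the PDF magic bytes "%PDF-", and it writes one tab-separated line per failure. isValidPdf, logFailure, and the log line format are all hypothetical; only the failed_log_page_N.txt file name comes from this README.

```javascript
const fs = require('fs');
const path = require('path');

// Assumption: "valid" means the file exists and begins with "%PDF-".
function isValidPdf(filePath) {
  if (!fs.existsSync(filePath)) return false;
  const fd = fs.openSync(filePath, 'r');
  try {
    const header = Buffer.alloc(5);
    const bytesRead = fs.readSync(fd, header, 0, 5, 0);
    return bytesRead === 5 && header.toString('latin1') === '%PDF-';
  } finally {
    fs.closeSync(fd);
  }
}

// Append one failure record to the per-page log (line format is an assumption).
function logFailure(logDir, page, doi, reason) {
  const logFile = path.join(logDir, `failed_log_page_${page}.txt`);
  fs.appendFileSync(logFile, `${doi}\t${reason}\n`);
}

module.exports = { isValidPdf, logFailure };
```

A downloader built this way would call isValidPdf on the target path first and only fetch when it returns false, so re-running a page range does not repeat finished downloads.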

◾️ Step 3: Generate basic metadata

node 3_generate_basic_metadata.js

◾️ Step 4 & 5: Upload

node 4_upload_all_basic_metadata.js
node 5_upload_all_pdfs.js

📜 License

MIT

SciVault/sciuploader

Folders and files

Latest commit

History

Repository files navigation

📄 SciUploader – Bulk Sci-Hub PDF Downloader

📦 Project Structure

✅ How to Use

1. Install dependencies

2. Set environment variables (optional)

3. Run full workflow

◾️ Step 1: Fetch all DOIs (optional)

◾️ Step 2: Download all PDFs

◾️ Step 3: Generate basic metadata

◾️ Step 4 & 5: Upload

📜 License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages