SCAI-Foundation/irysuploader

📄 SciUploader – Bulk Sci-Hub PDF Downloader

This tool automates batch downloads of academic papers from Sci-Hub by DOI and organizes the PDFs for metadata processing and decentralized storage on Irys.


📦 Project Structure

sciuploader/
├── doi/                            ← Each page_N.json contains a list of DOIs
├── pdf/                            ← Downloaded PDFs organized by page
├── 0_run_workflow.js              ← Run full workflow script
├── 1_fetch_all_dois.js            ← Fetch DOI list from external source
├── 2_fetch_all_pdfs.js            ← Download PDFs using DOI list
├── 3_generate_basic_metadata.js   ← Generate basic metadata JSON
├── 4_upload_all_basic_metadata.js ← Upload metadata to decentralized storage (TBD)
├── 5_upload_all_pdfs.js           ← Upload PDFs to decentralized storage (TBD)
├── 9_fund.js                      ← Funding registration or helper functions
├── .env.example                   ← Example environment configuration
└── README.md                      ← This file

run_uploader.sh runs the workflow. It can be fetched and executed directly, with page ranges passed as arguments:

curl -sSL https://raw.githubusercontent.com/SCAI-Foundation/irysuploader/main/run_uploader.sh | bash -s -- --start-page=300000 --end-page=400000 -- --start-page=500000 --end-page=600000

✅ How to Use

1. Install dependencies

npm install

2. Set environment variables (optional)

Copy .env.example to .env and fill in any required values (e.g., upload keys for later stages).
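
As a rough illustration of how a script could read the resulting .env file, the sketch below parses KEY=VALUE lines into an object. The key names (PRIVATE_KEY, RPC_URL) are placeholders, not taken from .env.example, and the repository's scripts may well load the file with a library instead.

```javascript
// Minimal .env parser sketch: KEY=VALUE lines, blank lines, "#" comments,
// and one optional pair of surrounding quotes. Not a full dotenv replacement.
function parseEnv(text) {
  const env = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not a KEY=VALUE line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const first = value[0];
    const last = value[value.length - 1];
    if (value.length >= 2 && (first === '"' || first === "'") && last === first) {
      value = value.slice(1, -1); // strip matching quotes
    }
    if (key !== "") env[key] = value;
  }
  return env;
}

// Hypothetical file content; the real keys are whatever .env.example lists.
const sampleEnv = [
  "# upload credentials (placeholder names)",
  'PRIVATE_KEY="abc123"',
  "RPC_URL=https://example.invalid/rpc",
].join("\n");

console.log(parseEnv(sampleEnv));
```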


3. Run full workflow

node 0_run_workflow.js

To divide the work across runs or machines, pass --start-page and --end-page (for example, --start-page=3 --end-page=4). There are 883431 pages in total.

node 0_run_workflow.js --start-page=300000 --end-page=400000
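
To make the page-range flags concrete, here is a minimal sketch of how they could be parsed and how a range could be split into near-equal chunks for several workers. The function names, the defaults (page 1 through the last page), and the validation rules are assumptions for illustration; the actual scripts may behave differently.

```javascript
const TOTAL_PAGES = 883431; // total page count stated in this README

// Read --start-page=N and --end-page=N from an argv-style array.
// ASSUMPTION: missing flags default to the first and last page.
function parsePageRange(argv, totalPages = TOTAL_PAGES) {
  let start = 1;
  let end = totalPages;
  for (const arg of argv) {
    const m = /^--(start|end)-page=(\d+)$/.exec(arg);
    if (!m) continue;
    if (m[1] === "start") start = Number(m[2]);
    else end = Number(m[2]);
  }
  if (start < 1 || end > totalPages || start > end) {
    throw new RangeError(`invalid page range ${start}..${end}`);
  }
  return { start, end };
}

// Split the inclusive range [start, end] into up to `parts` contiguous chunks.
function splitRange(start, end, parts) {
  const total = end - start + 1;
  const ranges = [];
  let next = start;
  for (let i = 0; i < parts; i++) {
    const size = Math.floor(total / parts) + (i < total % parts ? 1 : 0);
    if (size === 0) continue; // more parts than pages
    ranges.push({ start: next, end: next + size - 1 });
    next += size;
  }
  return ranges;
}

// Print one command per worker for the range used in the example above.
const range = parsePageRange(["--start-page=300000", "--end-page=400000"]);
for (const r of splitRange(range.start, range.end, 4)) {
  console.log(`node 0_run_workflow.js --start-page=${r.start} --end-page=${r.end}`);
}
```

Each printed command covers a disjoint slice of pages, so the commands can run on separate machines without overlapping.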

Or run step-by-step:


◾️ Step 1: Fetch all DOIs (optional)

node 1_fetch_all_dois.js

This fetches DOIs from an API and saves them into doi/page_N.json files.
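
The sketch below shows one way a later step could locate and read a page file. It assumes each doi/page_N.json holds a plain JSON array of DOI strings, which this README implies but does not spell out. The helper names and the loose DOI pattern are illustrative only.

```javascript
// Path of the DOI list for one page, following the doi/page_N.json naming.
function doiPagePath(page, baseDir = "doi") {
  return `${baseDir}/page_${page}.json`;
}

// Parse the text of one page file.
// ASSUMPTION: the file contains a JSON array of DOI strings.
function parseDoiPage(jsonText) {
  const data = JSON.parse(jsonText);
  if (!Array.isArray(data)) {
    throw new TypeError("expected a JSON array of DOIs");
  }
  // Loose sanity check: "10.<registrant digits>/<suffix>". Not a full validator.
  const doiPattern = /^10\.\d{4,9}\/\S+$/;
  return data.filter((d) => typeof d === "string" && doiPattern.test(d));
}

// Example with made-up entries.
const examplePage = JSON.stringify(["10.1000/xyz123", "not-a-doi", "10.1234/abc.def"]);
console.log(doiPagePath(3));
console.log(parseDoiPage(examplePage));
```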


◾️ Step 2: Download all PDFs

node 2_fetch_all_pdfs.js --start-page=1 --end-page=10
  • Failed downloads are logged to failed_log_page_N.txt per page.
  • Already downloaded and valid files are skipped.
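
The skip-and-log behavior described in the bullets above could be sketched as follows. The validity check here only tests for the "%PDF-" header that PDF files start with; the real script may apply a different or stricter test. The log line format (DOI, tab, reason) is also an assumption.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// True when a file exists and starts with the "%PDF-" header bytes.
// Only the first 5 bytes are read, so large files stay cheap to check.
function hasPdfHeader(filePath) {
  let fd;
  try {
    fd = fs.openSync(filePath, "r");
  } catch (err) {
    if (err.code === "ENOENT") return false; // not downloaded yet
    throw err;
  }
  try {
    const head = Buffer.alloc(5);
    const bytesRead = fs.readSync(fd, head, 0, 5, 0);
    return bytesRead === 5 && head.toString("latin1") === "%PDF-";
  } finally {
    fs.closeSync(fd);
  }
}

// Append one failed DOI to failed_log_page_N.txt and return the log path.
function logFailure(logDir, page, doi, reason) {
  const logPath = path.join(logDir, `failed_log_page_${page}.txt`);
  fs.appendFileSync(logPath, `${doi}\t${reason}\n`);
  return logPath;
}

// Demo in a throwaway temp directory with made-up files.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "sciuploader-"));
fs.writeFileSync(path.join(demoDir, "good.pdf"), "%PDF-1.7\n");
fs.writeFileSync(path.join(demoDir, "bad.pdf"), "<html>blocked</html>");
console.log(hasPdfHeader(path.join(demoDir, "good.pdf")));    // true
console.log(hasPdfHeader(path.join(demoDir, "bad.pdf")));     // false
console.log(hasPdfHeader(path.join(demoDir, "missing.pdf"))); // false
logFailure(demoDir, 1, "10.1000/xyz123", "response was not a PDF");
```

A header check like this catches the common case where an HTML error or captcha page was saved under a .pdf name, but it does not detect truncated downloads.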

◾️ Step 3: Generate basic metadata

node 3_generate_basic_metadata.js
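
This README does not list the metadata fields, so the following is only a guess at what a "basic" record could contain: the DOI, the page it came from, the file size, and a SHA-256 digest of the PDF bytes. All field names are hypothetical.

```javascript
const crypto = require("node:crypto");

// Build one metadata record for a downloaded paper.
// ASSUMPTION: the field names are illustrative, not the script's real schema.
function buildBasicMetadata(doi, page, pdfBuffer) {
  return {
    doi,
    page,
    sizeBytes: pdfBuffer.length,
    sha256: crypto.createHash("sha256").update(pdfBuffer).digest("hex"),
  };
}

// Example with a tiny stand-in buffer instead of a real PDF.
const record = buildBasicMetadata("10.1000/xyz123", 3, Buffer.from("%PDF-1.7\n"));
console.log(JSON.stringify(record, null, 2));
```

Including a content hash makes it possible to confirm later that an uploaded file matches the one the metadata was generated from.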

◾️ Steps 4 & 5: Upload

node 4_upload_all_basic_metadata.js
node 5_upload_all_pdfs.js
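
Since the upload steps are marked TBD in the project structure, the sketch below makes no claim about how they call the Irys SDK. It only shows two pieces an uploader would plausibly need: tags in the { name, value } shape that Irys uploads accept, and a retry wrapper around an injected upload function. The "doi" tag name, the retry counts, and the function names are assumptions.

```javascript
// Tags for a PDF upload. "Content-Type" is the conventional tag for the MIME
// type; the "doi" tag name is a hypothetical choice for this sketch.
function buildPdfTags(doi) {
  return [
    { name: "Content-Type", value: "application/pdf" },
    { name: "doi", value: doi },
  ];
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call uploadFn(data, tags) up to `attempts` times, doubling the wait between
// tries. uploadFn is injected, so any real SDK call can be plugged in.
async function uploadWithRetry(uploadFn, data, tags, attempts = 3, baseDelayMs = 500) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await uploadFn(data, tags);
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) await sleep(baseDelayMs * 2 ** i);
    }
  }
  throw lastError;
}

// Usage with a fake uploader that fails once, then succeeds.
let fakeCalls = 0;
const fakeUpload = async (data, tags) => {
  fakeCalls += 1;
  if (fakeCalls === 1) throw new Error("temporary network error");
  return { id: "fake-id", tagCount: tags.length };
};

uploadWithRetry(fakeUpload, Buffer.from("%PDF-1.7\n"), buildPdfTags("10.1000/xyz123"), 3, 10)
  .then((receipt) => console.log(receipt));
```

Injecting the upload function keeps the retry logic testable without network access or funded keys.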

📜 License

MIT

About

A decentralized academic paper repository system built on Irys
