This project focuses on implementing automated bibliometric analysis algorithms to process scientific publications from multiple research databases. It enables data extraction, standardization, and deduplication, providing structured outputs in formats like BibTeX, RIS, and CSV.

esteban2505J/Biliometric-analysis-system

Bibliometric Data Scraper

📌 Project Overview

This project is a bibliometric data analysis tool that collects, processes, and structures information from various academic sources. It extracts bibliographic data (titles, authors, journals, years, etc.) through web scraping, then formats it into structured outputs such as BibTeX for academic use.

The data collected from multiple datasets is also unified into a single dataset and ordered with sorting algorithms. The performance of those algorithms is then measured and visualized in plots.
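As a rough illustration of the sorting and performance step described above, the sketch below times two sorts on synthetic records. The record fields (`title`, `year`), the helper names, and the choice of insertion sort are assumptions made for this example; the README does not state which algorithms the project implements.

```python
import random
import time


def insertion_sort(records, key):
    """Return a new list sorted by `key` using insertion sort (O(n^2))."""
    result = list(records)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger items one slot to the right until `current` fits.
        while j >= 0 and key(result[j]) > key(current):
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


def builtin_sort(records, key):
    """Baseline: Python's built-in sort (Timsort)."""
    return sorted(records, key=key)


def time_sort(sort_fn, records, key):
    """Return (sorted_records, elapsed_seconds) for one run of `sort_fn`."""
    start = time.perf_counter()
    ordered = sort_fn(records, key)
    return ordered, time.perf_counter() - start


# Synthetic records standing in for scraped articles (hypothetical schema).
random.seed(0)
records = [
    {"title": f"Article {i}", "year": random.randint(1990, 2024)}
    for i in range(500)
]

timings = {}
for name, fn in {"insertion": insertion_sort, "timsort": builtin_sort}.items():
    ordered, elapsed = time_sort(fn, records, key=lambda r: r["year"])
    timings[name] = elapsed
```

The `timings` dictionary maps each algorithm name to its elapsed time in seconds, which is the kind of data a matplotlib bar chart could display.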

🚀 Features

  • Web Scraping: Extracts article information from academic databases.
  • JavaScript Handling: Uses Selenium for dynamic content scraping.
  • Data Cleaning: Standardizes records and removes duplicates using pandas.
  • Sorting & Performance Analysis: Orders data using sorting algorithms and visualizes their efficiency.
  • Export Formats: Saves data in RIS and BibTeX formats for citation management.
  • Logging & Error Handling: Implements logging mechanisms to track scraping status.
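The data cleaning feature can be pictured with a small dependency-free sketch. The project itself uses pandas for this step; the version below swaps in plain Python so it runs without extra packages, and the record schema and the `(title, year)` duplicate key are assumptions for illustration only.

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def deduplicate(records):
    """Keep the first record seen for each (normalized title, year) pair."""
    seen = set()
    unique = []
    for record in records:
        key = (normalize_title(record["title"]), record.get("year"))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records, as if merged from two different databases.
records = [
    {"title": "Deep Learning for Bibliometrics", "year": 2021, "source": "A"},
    {"title": "deep learning for bibliometrics.", "year": 2021, "source": "B"},
    {"title": "Análisis Bibliométrico", "year": 2019, "source": "A"},
]
unique = deduplicate(records)
```

With pandas, a comparable result could be obtained by adding a normalized-title column and calling `DataFrame.drop_duplicates` with a `subset` of that column and the year.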

🛠️ Technologies Used

  • Python 3.13.2
  • selenium – Handling JavaScript-loaded pages
  • pandas – Cleaning and processing data
  • numpy – Efficient numerical operations
  • bibtexparser & rispy – Exporting data in academic formats
  • dotenv – Managing environment variables
  • matplotlib – Plotting sorting algorithm performance
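To show what the exported formats look like, here is a minimal sketch that renders one record as BibTeX and as RIS by hand. The project relies on bibtexparser and rispy for this; the functions below are hypothetical stand-ins that avoid those dependencies, and the record fields (`key`, `title`, `authors`, `journal`, `year`) are assumed rather than taken from the code base.

```python
def to_bibtex(record):
    """Render one record as a BibTeX @article entry."""
    fields = {
        "title": record["title"],
        "author": " and ".join(record["authors"]),
        "journal": record["journal"],
        "year": str(record["year"]),
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@article{{{record['key']},\n{body}\n}}"


def to_ris(record):
    """Render one record as a RIS journal-article entry."""
    lines = ["TY  - JOUR", f"TI  - {record['title']}"]
    lines += [f"AU  - {author}" for author in record["authors"]]
    lines += [f"JO  - {record['journal']}", f"PY  - {record['year']}", "ER  - "]
    return "\n".join(lines)


# Hypothetical record used only to exercise the two formatters.
record = {
    "key": "doe2021",
    "title": "Deep Learning for Bibliometrics",
    "authors": ["Doe, Jane", "Roe, Richard"],
    "journal": "Journal of Examples",
    "year": 2021,
}
bibtex_entry = to_bibtex(record)
ris_entry = to_ris(record)
```

Hand-rolled formatting like this does not escape special characters, which is one reason to prefer the dedicated libraries in practice.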

📥 Installation

  1. Clone the Repository

    git clone https://github.com/esteban2505J/Bibliometric-analysis-system.git
    cd Bibliometric-analysis-system
  2. Create a Virtual Environment (Optional but recommended)

    python -m venv venv
    source venv/bin/activate  # On macOS/Linux
    venv\Scripts\activate    # On Windows
  3. Install Dependencies

    pip install -r requirements.txt

🔧 Usage

  1. Run the Scraper

    python main.py
  2. Configure Environment Variables

    • Create a .env file and add the “EMAIL” and “PASSWORD” credentials for your ScienceDirect account (an academic account is required).
  3. Output Files

    • Data is saved in the output/ directory in BibTeX format.
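For the environment variable step, the following sketch shows one way the “EMAIL” and “PASSWORD” values might be read and validated at startup. The `load_credentials` helper is hypothetical and not taken from the project; it calls `load_dotenv` from python-dotenv when that package is installed and otherwise falls back to the variables already set in the process environment.

```python
import os

try:
    # python-dotenv is optional in this sketch; the project lists it as a dependency.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None


def load_credentials(env=None):
    """Return (email, password), raising RuntimeError if either is missing."""
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # copies key=value pairs from a .env file, if present
        env = os.environ
    missing = [name for name in ("EMAIL", "PASSWORD") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["EMAIL"], env["PASSWORD"]
```

Calling `load_credentials()` with no arguments reads the real environment, while passing a dictionary makes the function easy to test without touching a .env file. Failing early with a clear message avoids starting a browser session that cannot log in.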

📄 License

This project is licensed under the MIT License.
