This project is a bibliometric analysis system that collects, processes, and structures information from various academic sources. It extracts bibliographic data (titles, authors, journals, years, etc.) using web scraping, then formats it into structured outputs such as BibTeX for academic use.
The records collected from multiple datasets are then unified, sorted with several sorting algorithms, and the performance of those algorithms is analyzed and visualized through plots.
- Web Scraping: Extracts article information from academic databases.
- JavaScript Handling: Uses Selenium for dynamic content scraping (a scraping sketch follows this list).
- Data Cleaning: Standardizes fields and removes duplicates using pandas (a cleaning sketch follows this list).
- Sorting & Performance Analysis: Orders data using sorting algorithms and visualizes their efficiency.
- Export Formats: Saves data in RIS and BibTeX formats for citation management.
- Logging & Error Handling: Implements logging mechanisms to track scraping status (a logging sketch follows this list).
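
A minimal sketch of the kind of Selenium scraping the JavaScript handling refers to, assuming Chrome and hypothetical CSS selectors (the real URL and selectors live in the project's scraper code):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Hypothetical example: the real URL and selectors depend on the target database.
SEARCH_URL = "https://www.sciencedirect.com/search?qs=bibliometrics"

driver = webdriver.Chrome()
try:
    driver.get(SEARCH_URL)
    # Wait until the JavaScript-rendered result list is present.
    WebDriverWait(driver, 15).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, ".result-item-content"))
    )
    # Collect title text from each result card (selector is an assumption).
    titles = [el.text for el in driver.find_elements(By.CSS_SELECTOR, ".result-item-content h2")]
    print(titles)
finally:
    driver.quit()
```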
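
Data cleaning in this style usually means normalizing key fields before dropping duplicates; a minimal pandas sketch with hypothetical column names:

```python
import pandas as pd

# Hypothetical records; real data comes from the scraper.
df = pd.DataFrame([
    {"title": "Bibliometrics 101 ", "authors": "Doe, J.", "year": "2021"},
    {"title": "bibliometrics 101",  "authors": "Doe, J.", "year": "2021"},
])

# Standardize text fields so near-identical rows compare equal.
df["title"] = df["title"].str.strip().str.lower()
df["year"] = pd.to_numeric(df["year"], errors="coerce")

# Remove duplicates based on the normalized title and year.
df = df.drop_duplicates(subset=["title", "year"]).reset_index(drop=True)
print(df)
```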
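
For logging and error handling, a minimal sketch using the standard library `logging` module (the file name and messages are illustrative, not the project's actual output):

```python
import logging

# Write scraping status to a log file; the file name is an assumption.
logging.basicConfig(
    filename="scraper.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger(__name__)

try:
    logger.info("Starting scrape of results page 1")
    raise TimeoutError("page took too long to render")  # simulated failure
except TimeoutError as exc:
    logger.error("Scraping failed: %s", exc)
```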
- Python 3.13.2
- selenium – Handling JavaScript-loaded pages
- pandas – Cleaning and processing data
- numpy – Efficient numerical operations
- bibtexparser & rispy – Exporting data in academic formats (see the export sketch below)
- dotenv – Managing environment variables
- matplotlib – Plotting sorting algorithm performance (see the timing sketch below)
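
To give an idea of the performance analysis, here is a minimal sketch that times one illustrative algorithm (insertion sort) on synthetic inputs and plots the result with matplotlib; the project's actual algorithms and input data may differ:

```python
import random
import time
import matplotlib.pyplot as plt

def insertion_sort(items):
    """Simple O(n^2) sort, used here only to illustrate the timing/plotting flow."""
    data = list(items)
    for i in range(1, len(data)):
        key = data[i]
        j = i - 1
        while j >= 0 and data[j] > key:
            data[j + 1] = data[j]
            j -= 1
        data[j + 1] = key
    return data

sizes = [200, 400, 800, 1600]
timings = []
for n in sizes:
    sample = [random.random() for _ in range(n)]
    start = time.perf_counter()
    insertion_sort(sample)
    timings.append(time.perf_counter() - start)

plt.plot(sizes, timings, marker="o", label="insertion sort")
plt.xlabel("Number of records")
plt.ylabel("Time (s)")
plt.legend()
plt.savefig("sorting_performance.png")
```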
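
Export to BibTeX can be sketched with bibtexparser, assuming its 1.x API (`BibDatabase`/`BibTexWriter`); the entry and output file name are illustrative:

```python
import os

from bibtexparser.bwriter import BibTexWriter
from bibtexparser.bibdatabase import BibDatabase

# One illustrative entry; real entries come from the cleaned dataset.
db = BibDatabase()
db.entries = [{
    "ENTRYTYPE": "article",
    "ID": "doe2021",
    "title": "Bibliometrics 101",
    "author": "Doe, Jane",
    "journal": "Journal of Examples",
    "year": "2021",
}]

os.makedirs("output", exist_ok=True)  # the README's output directory
with open("output/references.bib", "w", encoding="utf-8") as bibfile:
    bibfile.write(BibTexWriter().write(db))
```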
- Clone the Repository

  ```bash
  git clone https://github.com/esteban2505J/Bibliometric-analysis-system.git
  cd Bibliometric-analysis-system
  ```
- Create a Virtual Environment (optional but recommended)

  ```bash
  python -m venv venv
  source venv/bin/activate   # On macOS/Linux
  venv\Scripts\activate      # On Windows
  ```
- Install Dependencies

  ```bash
  pip install -r requirements.txt
  ```
- Run the Scraper (after configuring the environment variables below)

  ```bash
  python main.py
  ```
- Configure Environment Variables
  - Create a `.env` file and add the `EMAIL` and `PASSWORD` credentials used for ScienceDirect (an academic account is required).
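
  A minimal sketch of loading these credentials with python-dotenv (the variable names come from this README; the rest is illustrative):

  ```python
  import os
  from dotenv import load_dotenv

  load_dotenv()  # reads EMAIL and PASSWORD from the .env file in the project root
  EMAIL = os.getenv("EMAIL")
  PASSWORD = os.getenv("PASSWORD")
  ```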
- Output Files
  - Data is saved in the `output/` directory in BibTeX and RIS formats.
This project is licensed under the MIT License.