This project is a hands-on exploration of graph databases using Neo4j.
It focuses on modeling, loading, evolving, and querying large-scale research article data, inspired by the DBLP dataset.
The main objectives are:
- Modeling research papers, authors, conferences, journals, keywords, and reviews as a property graph.
- Loading real-world or synthetic data into Neo4j using Cypher and bulk loading techniques.
- Evolving the database model by introducing changes such as reviewer feedback and author affiliations.
- Querying the graph using Cypher queries to extract insights like citation counts, author communities, h-indexes, and impact factors.
- Applying Graph Algorithms (PageRank, Community Detection, etc.) using the Neo4j Graph Data Science library to analyze graph structures.
The project emphasizes clean data modeling, scalable graph instantiation, and meaningful domain-specific graph analysis.
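For a flavor of what this looks like in practice, here is a minimal sketch that creates a tiny example graph and counts citations per paper using the official Neo4j Python driver. The connection details, node labels (`Author`, `Paper`), relationship types (`WROTE`, `CITES`), and sample data are illustrative assumptions, not the project's final schema.

```python
# Minimal sketch: assumed labels/relationships and a local Neo4j instance.
from neo4j import GraphDatabase

URI = "bolt://localhost:7687"          # assumed local instance
AUTH = ("neo4j", "your-password")      # placeholder credentials

CREATE_SAMPLE = """
MERGE (a:Author {name: $author})
MERGE (cited:Paper {title: $cited})
MERGE (citing:Paper {title: $citing})
MERGE (a)-[:WROTE]->(cited)
MERGE (citing)-[:CITES]->(cited)
"""

CITATION_COUNTS = """
MATCH (p:Paper)<-[:CITES]-(citing:Paper)
RETURN p.title AS paper, count(citing) AS citations
ORDER BY citations DESC
"""

with GraphDatabase.driver(URI, auth=AUTH) as driver:
    with driver.session() as session:
        # Create a tiny sample graph: one author, one cited paper, one citing paper.
        session.run(CREATE_SAMPLE,
                    author="Ada Lovelace",
                    cited="Notes on the Analytical Engine",
                    citing="A Follow-up Study")
        # Query citation counts per paper.
        for record in session.run(CITATION_COUNTS):
            print(record["paper"], record["citations"])
```

In the project itself the graph is instantiated in bulk from the DBLP-derived CSV files rather than node by node; the per-node `MERGE` above is only meant to show the shape of the model and of a typical Cypher query.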
Contributors:
Environment handling with uv
`uv` is a new, ultra-fast Python package and virtual environment manager from Astral.

- Install `uv` (via Homebrew):

  `brew install uv`

  After installing, confirm it works:

  `uv --version`
- Initiate the `uv` project (only at the start of the project) or sync it:

  - If there is no `pyproject.toml` and `uv.lock` yet, start a new environment. `uv` replaces both `virtualenv` and `pip`; to create and manage an environment:

    `uv init`

    To activate it:

    `source .venv/bin/activate`

  - If the `uv` project has already been created:

    `uv sync`

    This command creates a `.venv` and installs all required dependencies listed in `pyproject.toml`.
- To install new dependencies or packages:

  `uv add <package-name>==<version>`

  This command automatically adds the new requirement to `pyproject.toml` and `uv.lock` and syncs your dependencies (installs it).

- To remove packages:

  `uv remove <package-name>`

  This command automatically removes the requirement from `pyproject.toml` and `uv.lock` and syncs your dependencies (uninstalls it).
- At the same level as this repository, create a folder called `data`. Inside `data`, create another folder called `files`.
- From the DBLP website, download the raw XML data into the `data` folder. Only the files `dblp.dtd` and `dblp.xml.gz` are needed.
- Extract the `dblp.xml` file from `dblp.xml.gz`.
- Clone this repository and execute the following command from the terminal to convert the `.xml` into `.csv` format for preprocessing:

  `python dblp-to-csv/XMLToCSV.py --annotate --neo4j dblp.xml dblp.dtd files/dblp.csv --relations author:authored_by journal:published_in`
To create and load the final database:
- Make sure the environment is activated (`source .venv/bin/activate`).
- To create the final database, execute and follow the instructions inside `main.ipynb`.
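The actual loading logic lives in `main.ipynb`; as a rough idea of what a bulk-load step can look like, the sketch below streams one of the generated CSVs into Neo4j with `LOAD CSV`. The connection details, CSV file name, column names, and node label are assumptions — adapt them to the files actually produced by `XMLToCSV.py`, and note that `LOAD CSV` reads files relative to the Neo4j server (typically its import directory).

```python
# Rough sketch, not the project's actual loader: main.ipynb drives the real load.
# Assumed file name, columns, and label; the CSV must be readable by the Neo4j
# server (e.g. placed in its import directory) for LOAD CSV to find it.
from neo4j import GraphDatabase

URI = "bolt://localhost:7687"        # assumed local instance
AUTH = ("neo4j", "your-password")    # placeholder credentials

CONSTRAINT = """
CREATE CONSTRAINT paper_key IF NOT EXISTS
FOR (p:Paper) REQUIRE p.key IS UNIQUE
"""

LOAD_PAPERS = """
LOAD CSV WITH HEADERS FROM 'file:///dblp_article.csv' AS row
MERGE (p:Paper {key: row.key})
SET p.title = row.title, p.year = toInteger(row.year)
"""

with GraphDatabase.driver(URI, auth=AUTH) as driver:
    with driver.session() as session:
        session.run(CONSTRAINT)   # uniqueness constraint speeds up MERGE lookups
        session.run(LOAD_PAPERS)  # very large files would be batched instead
```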