This repository provides a comprehensive implementation of the PageRank algorithm using the Power Method. It is organized into four deliverables that cover problem definition, mathematical formulation, code implementation, and experimental validation.
pagerank-power-method/
├── 01. Deliverable 1 - Problem Definition & Background/
│ └── 01. Deliverable 1 - Problem Definition & Background.pdf
├── 02. Deliverable 2 - Mathematical Formulation & Algorithm/
│ └── 02. Deliverable 2 - Mathematical Formulation & Algorithm.pdf
├── 03. Deliverable 3 - Code Implementation & Documentation/
│ └── pagerank-power-method/ # Code modules
├── 04. Deliverable 4 - Experiments & Validation/
│ ├── run_toy_experiments.py
│ ├── run_real_experiments.py
│ ├── run_sensitivity.py
│ └── plots/ # Generated figures
├── LICENSE # MIT License
├── requirements.txt # Python dependencies
└── venv/ # Python virtual environment (excluded from version control)
-
Deliverable 1 – Problem Definition & Background
- Describes the PageRank algorithm from a discrete mathematics perspective.
- Includes problem statement, motivation, literature review, and theoretical foundations (graph theory and Markov chains).
-
Deliverable 2 – Mathematical Formulation & Algorithm
- Formalizes PageRank as an eigenvector problem of the Google matrix.
- Defines the link matrix, teleportation adjustment, and damping factor.
- Presents the Power Method pseudocode, convergence criteria, and complexity analysis.
-
Deliverable 3 – Code Implementation & Documentation
-
Implements a modular Python codebase for PageRank computation.
-
Core modules:
graph_loader.py
– Load graphs from edge-list or adjacency-list formats.matrix_builder.py
– Build the sparse transition matrix and teleportation vector.power_method.py
– Compute PageRank via iterative power iterations.utils.py
– Helper routines (vector normalization, residual computation, plotting).run_pagerank.py
– Command-line script to run PageRank on a given graph file.
-
-
Deliverable 4 – Experiments & Validation
- Evaluates convergence on toy graphs and a real-world network (Zachary’s Karate Club).
- Performs sensitivity analysis with respect to the damping factor (α).
- Generates tables and plots illustrating residuals, iteration counts, and runtime.
-
Clone the repository
git clone https://github.com/amr-yasser226/pagerank-power-method.git cd pagerank-power-method
-
Create and activate a Python virtual environment
python3 -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate
-
Install dependencies
pip install -r requirements.txt
Compute PageRank on an edge-list or adjacency-list file:
python run_pagerank.py <input_path> [--format edgelist|adjlist] [--alpha ALPHA] [--tol TOL] [--max_iter MAX_ITER]
<input_path>
: Path to the graph file.--format
(-f
): File format (edgelist
oradjlist
, default:edgelist
).--alpha
(-a
): Damping factor (default: 0.85).--tol
(-t
): Convergence tolerance on L1 residual (default: 1e-6).--max_iter
(-m
): Maximum number of iterations (default: 100).
# Compute PageRank on an edge-list
python run_pagerank.py data/web_graph.txt --alpha 0.85 --tol 1e-8
Deliverable 4 includes three scripts under 04. Deliverable 4 - Experiments & Validation/
:
run_toy_experiments.py
– Convergence on predefined toy graphs.run_real_experiments.py
– Validation on Zachary’s Karate Club network.run_sensitivity.py
– Sensitivity analysis for different α values.
Run each script directly:
python "04. Deliverable 4 - Experiments & Validation/run_toy_experiments.py"
python "04. Deliverable 4 - Experiments & Validation/run_real_experiments.py"
python "04. Deliverable 4 - Experiments & Validation/run_sensitivity.py"
Plots will be saved under 04. Deliverable 4 - Experiments & Validation/plots/
.
-
Functions:
load_edge_list(path: str, directed: bool = True) -> List[Tuple[int, int]]
load_adjacency_list(path: str) -> Dict[int, List[int]]
graph_to_edge_list(graph: nx.Graph) -> List[Tuple[int, int]]
-
Behavior: Reads graph definitions, supports comments/blank lines, and validates node IDs.
- Function:
build_matrix(edges: List[Tuple[int, int]], alpha: float = 0.85) -> Tuple[csr_matrix, np.ndarray]
- Behavior: Constructs a column-stochastic sparse transition matrix (handling dangling nodes) and a uniform teleportation vector.
- Function:
compute_pagerank(P: csr_matrix, v: np.ndarray, alpha: float = 0.85, tol: float = 1e-6, max_iter: int = 100) -> Tuple[np.ndarray, List[float], int, float]
- Behavior: Iteratively applies the Google operator, normalizes the rank vector, and tracks L1 residuals until convergence or max iterations.
-
Functions:
normalize_vector(x: np.ndarray) -> np.ndarray
compute_residual(prev: np.ndarray, curr: np.ndarray) -> float
plot_residuals(history: List[float], title: str, path: Optional[str] = None) -> None
-
Behavior: Provides vector normalization, residual computation, and residual-plotting utilities.
This project is licensed under the MIT License. See the LICENSE file for details.
For questions or contributions, please open an issue or pull request on GitHub.