This repository contains the source code of ParaGrapher, an API and library for loading graphs. For further information about the library, please refer to https://blogs.qub.ac.uk/DIPSA/ParaGrapher/ and the publications.
Please visit the Wiki or download the PDF file using this link.
- PARAGRAPHER_CSX_WG_400_AP : WebGraphs with a 4-byte ID per vertex and no weights on edges or vertices
- PARAGRAPHER_CSX_WG_800_AP : Big WebGraphs with an 8-byte ID per vertex and no weights on edges or vertices
- PARAGRAPHER_CSX_WG_404_AP : WebGraphs with a 4-byte ID per vertex, a 4-byte integer weight per edge, and no weights on vertices
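
As a rough illustration of what the three graph types imply for memory use, the sketch below estimates the size of a graph in CSX form. The struct, constants, and function are hypothetical names made up for this example and are not part of the ParaGrapher API. It assumes that the numeric suffix reads as (ID bytes, vertex-weight bytes, edge-weight bytes), which matches the three descriptions above, and that each offset takes 8 bytes, the width used by the `_offsets.bin` file.

```c
#include <stdint.h>

/* Hypothetical layout descriptor, for illustration only (not a ParaGrapher type). */
typedef struct {
    unsigned id_bytes;            /* bytes per neighbour ID in the edges array */
    unsigned vertex_weight_bytes; /* bytes per vertex weight, 0 if unweighted  */
    unsigned edge_weight_bytes;   /* bytes per edge weight, 0 if unweighted    */
} csx_layout;

/* Assumed reading of the type suffixes 400, 800, and 404. */
static const csx_layout LAYOUT_WG_400 = {4, 0, 0};
static const csx_layout LAYOUT_WG_800 = {8, 0, 0};
static const csx_layout LAYOUT_WG_404 = {4, 0, 4};

/* Estimated bytes needed for a CSX graph with the given layout. */
uint64_t csx_bytes(csx_layout l, uint64_t vertices, uint64_t edges)
{
    uint64_t offsets    = (vertices + 1) * 8;  /* assumption: 8-byte offsets */
    uint64_t neighbours = edges * l.id_bytes;
    uint64_t e_weights  = edges * l.edge_weight_bytes;
    uint64_t v_weights  = vertices * l.vertex_weight_bytes;
    return offsets + neighbours + e_weights + v_weights;
}
```

Under these assumptions, a graph with 3 vertices and 5 edges needs 52 bytes as `LAYOUT_WG_400` and 72 bytes as either `LAYOUT_WG_800` or `LAYOUT_WG_404`.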
- `gcc` with a version greater than 9
- `JDK` with a version greater than 15
- `bc`, `wget`, and `unzip`
- `libfuse3` and `libnuma` for using `pg_fuse` (optional)
- Run `make download_WG400`, `make download_WG404`, or `make download_WG800` to download and store sample datasets into the `test/datasets` folder.
- By commenting out `-DNDEBUG` in Line 19 of the `Makefile`, ParaGrapher will output its logs.
- With `make all`, the C and Java source code is compiled and the required WebGraph libraries are downloaded.
- All compiled and downloaded files are stored in the `lib64` folder, and future calls to the library require setting the `PARAGRAPHER_LIB_FOLDER` environment variable to the `lib64` folder.
- The `test` folder contains sample code for different types of graphs. You may pass the argument `dataset` to specify the location of the dataset, e.g., `make test1_deg_dist_WG400 dataset=path/to/dataset`.
- On the first access to a graph in WebGraph format, a delay may be experienced while the library creates two files:
  - A WebGraph `.offset` file is required, which is created through a call to the WebGraph framework.
  - An `_offsets.bin` file is created that contains the offsets array of the CSX format in binary, little-endian format with an 8-byte value for each of the |V|+1 elements. For MS-BioGraphs, the file named `MS??_offsets.bin` can be downloaded and renamed to `MS??-underlying_offsets.bin` to avoid creating it.
- ParaGrapher creates shared memory objects (in `/dev/shm`) with names starting with `paragrapher_` for communication between the C and Java sides. The files are deleted at the end of a successful execution. Otherwise, they should be manually deleted using `make clean-shm-files`.
- After calling ParaGrapher, the cached contents of the storage should be dropped using `echo 3 > /proc/sys/vm/drop_caches` or by calling the `flushcache` program, which has the same functionality but a longer execution time.
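
The layout of the `_offsets.bin` file described above (|V|+1 little-endian 8-byte values) can be read without the library. The following is a minimal sketch of such a reader, not code from ParaGrapher. The function names `le64_decode` and `read_offsets_bin` are hypothetical, and the sketch assumes the file holds nothing but the offsets array.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Decode one little-endian 8-byte value, independent of host byte order. */
static uint64_t le64_decode(const unsigned char b[8])
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}

/*
 * Hypothetical helper: read all offsets from `path`.
 * On success, returns a malloc'ed array of |V|+1 offsets and stores |V|
 * in *vertices_count. Returns NULL on any error. The caller frees the array.
 */
uint64_t *read_offsets_bin(const char *path, uint64_t *vertices_count)
{
    struct stat st;
    /* The file must hold at least one entry and a whole number of 8-byte values. */
    if (stat(path, &st) != 0 || st.st_size < 8 || st.st_size % 8 != 0)
        return NULL;

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    uint64_t entries = (uint64_t)st.st_size / 8;
    uint64_t *offsets = malloc(entries * sizeof(uint64_t));
    if (offsets == NULL) {
        fclose(f);
        return NULL;
    }

    unsigned char buf[8];
    for (uint64_t i = 0; i < entries; i++) {
        if (fread(buf, 1, 8, f) != 8) {
            free(offsets);
            fclose(f);
            return NULL;
        }
        offsets[i] = le64_decode(buf);
    }

    fclose(f);
    *vertices_count = entries - 1;
    return offsets;
}
```

With the array loaded, the degree of vertex `v` in a CSX graph is `offsets[v + 1] - offsets[v]`, and `offsets[*vertices_count]` is the total number of edges.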
To accelerate loading WebGraphs, ParaGrapher introduces pg_fuse, a custom file system built on top of the
FUSE framework.
To enable pg_fuse, simply pass USE_PG_FUSE as an argument in the args parameter of
the paragrapher_open_graph() function.
For more detailed information, refer to PG-FUSE documentation.
- The file test/read_bandwidth.c contains a benchmark implemented in C to measure the read bandwidth of storage for (i) different thread numbers, (ii) different block sizes, and (iii) different read methods (read(), pread(), mmap()).
- The file test/ReadBandwidth.java contains a benchmark implemented in Java to measure the read bandwidth of storage for (i) different thread numbers, (ii) different block sizes, and (iii) different read methods (read(), mmap()). The script test/java-read-bandwidth.sh may be used for changing parameters.
- The Storage Bandwidth Evaluation shows the execution results of the above programs for three storage types: SSD, HDD, and LustreFS.
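
To give a sense of what such a measurement involves, here is a minimal single-threaded sketch that reads a file with pread() at a chosen block size and times it. It is not the code in test/read_bandwidth.c: the function name `pread_whole_file` is hypothetical, and threading, read() and mmap() variants, and cache dropping are all left out. Unless the cached contents of the storage are dropped beforehand, the result reflects the page cache rather than the storage device.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: read `path` sequentially with pread() in chunks of
 * `block_size` bytes. Returns the total bytes read, or -1 on error, and
 * stores the elapsed wall-clock time in *seconds.
 */
int64_t pread_whole_file(const char *path, size_t block_size, double *seconds)
{
    if (block_size == 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char *buf = malloc(block_size);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    int64_t total = 0;
    for (;;) {
        /* pread() takes an explicit offset, so no shared file position is used. */
        ssize_t n = pread(fd, buf, block_size, (off_t)total);
        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0)  /* end of file */
            break;
        total += n;
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    *seconds = (double)(t1.tv_sec - t0.tv_sec)
             + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;

    free(buf);
    close(fd);
    return total;
}
```

Dividing the returned byte count by `*seconds` gives the bandwidth in bytes per second. Calling the function with several block sizes shows how the block size affects throughput on a given storage type.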
@misc{pg_fuse,
title={Accelerating Loading WebGraphs in ParaGrapher},
author={Mohsen {Koohi Esfahani}},
year={2025},
eprint={2507.00716},
archivePrefix={arXiv},
primaryClass={cs.DC},
url={https://arxiv.org/abs/2507.00716},
}
@misc{paragrapher-arxiv,
title = { Selective Parallel Loading of Large-Scale
Compressed Graphs with {ParaGrapher}},
author = { {Mohsen} {Koohi Esfahani} and Marco D'Antonio and
Syed Ibtisam Tauhidi and Thai Son Mai and
Hans Vandierendonck},
year = {2024},
eprint = {2404.19735},
archivePrefix = {arXiv},
primaryClass = {cs.AR},
doi = {10.48550/arXiv.2404.19735},
url={https://arxiv.org/abs/2404.19735},
}
Licensed under the GNU v3 General Public License, as published by the Free Software Foundation. You must not use this Software except in compliance with the terms of the License. Unless required by applicable law or agreed upon in writing, this Software is distributed on an "as is" basis, without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose, neither express nor implied. For details see terms of the License (see attached file: LICENSE).
