Neo4j CSV export missing proper headers for neo4j-admin import (no :ID, :START_ID, :END_ID)

I generated BI dataset CSVs using the ldbc_snb_datagen_spark repo. Data generation succeeded, but the CSVs cannot be directly imported into Neo4j using neo4j-admin database import because the required Neo4j-style header annotations (:ID, :START_ID, :END_ID) are missing.

🔄 Steps to Reproduce

Clone repo and install dependencies:

git clone https://github.com/ldbc/ldbc_snb_datagen_spark.git
cd ldbc_snb_datagen_spark/scripts
./install-dependencies.sh


Build:

./build.sh


Get Spark:

./get-spark-to-home.sh


Generate BI dataset:

cd ~/ldbc_snb_datagen_spark/tools
./run.py -- --format csv --format-options raw=true,labels=true,header=true,quoteAll=true --scale-factor 1 --mode bi


Data is generated under:

~/ldbc_snb_datagen_spark/tools/out/graphs/csv/bi/composite-merged-fk/


Try Neo4j import:

$NEO4J_HOME/bin/neo4j-admin database import full neo4j \
  --overwrite-destination \
  --delimiter="|" \
  --nodes=Person=~/ldbc_snb_datagen_spark/tools/out/graphs/csv/bi/composite-merged-fk/initial_snapshot/dynamic/Person/*.csv \
  --relationships=KNOWS=~/ldbc_snb_datagen_spark/tools/out/graphs/csv/bi/composite-merged-fk/initial_snapshot/dynamic/Person_knows_Person/*.csv


Import fails with error:

Caused by: org.neo4j.internal.batchimport.input.HeaderException: 
Missing header of type START_ID, among entries [creationDate, Person1Id, Person2Id]
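
To confirm which header line neo4j-admin is actually reading, the first line of each generated file can be printed. This is a minimal sketch; `print_csv_headers` is a hypothetical helper name, not part of the datagen repo or of Neo4j.

```shell
# Hypothetical helper: print "<file>: <first line>" for every .csv file under a directory.
print_csv_headers() {
  dir="$1"
  find "$dir" -type f -name '*.csv' | sort | while IFS= read -r f; do
    printf '%s: %s\n' "$f" "$(head -n 1 "$f")"
  done
}
```

Calling it on the Person_knows_Person output directory from the steps above should list the plain `creationDate|Person1Id|Person2Id` header reported in the error.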

✅ Expected Behavior

Generated CSV files for the Neo4j import format should include proper headers, e.g.:

Node file (Person):

PersonId:ID(Person)|firstName:string|lastName:string|...


Relationship file (KNOWS):

Person1Id:START_ID(Person)|Person2Id:END_ID(Person)|creationDate:long


This would let users import directly with neo4j-admin without manually creating header files.
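
For reference, the manual workaround I am trying to avoid looks roughly like the sketch below, which writes a separate header file for the KNOWS relationship. `write_neo4j_headers` and the header file name are hypothetical. The column order follows the error message (creationDate first), and the `:long` type is copied from the expected header above, so it is an assumption about how creationDate is serialized. The Person header is left out because its full column list is not shown here.

```shell
# Hypothetical helper: write a Neo4j-style header file into the directory given as $1.
# Column order and types are assumptions and must match the generated data files.
write_neo4j_headers() {
  out_dir="$1"
  mkdir -p "$out_dir"
  printf '%s\n' 'creationDate:long|Person1Id:START_ID(Person)|Person2Id:END_ID(Person)' \
    > "$out_dir/Person_knows_Person_header.csv"
}
```

neo4j-admin accepts a header file followed by data files as one comma-separated list, e.g. `--relationships=KNOWS=headers/Person_knows_Person_header.csv,<data files>`. This only works if the data files carry no header row of their own, which presumably means generating with header=false (untested assumption).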

❌ Actual Behavior

The generated CSVs contain plain column names only, e.g.:

Person_knows_Person:

creationDate|Person1Id|Person2Id


Person:

PersonId|firstName|lastName|...


neo4j-admin import rejects these files because no column is tagged as :ID, :START_ID, or :END_ID.
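
One way to adapt an already generated file is to swap its first line for an annotated header. This is a minimal sketch, assuming each file starts with exactly one header line; `replace_header` is a hypothetical helper, and the annotated header passed to it has to be written by hand for each file type.

```shell
# Hypothetical helper: read a CSV on stdin and emit it with its first line
# replaced by the header passed as $1. All other lines pass through unchanged.
replace_header() {
  awk -v hdr="$1" 'NR == 1 { print hdr; next } { print }'
}
```

For example, `replace_header 'creationDate:long|Person1Id:START_ID(Person)|Person2Id:END_ID(Person)' < in.csv > out.csv` would rewrite a single Person_knows_Person file (`in.csv` and `out.csv` are placeholder names, and the `:long` type is an assumption).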

🔧 Environment

Repo: ldbc_snb_datagen_spark

Commit: <insert commit hash>

Neo4j version: 5.26.0

JDK: 21 (runtime), switched to 11 for benchmark driver

OS: Ubuntu 22.04 ARM64 (AWS Graviton)

🙏 Request

Is this a bug in the CSV export logic for --format csv with --mode bi?

Or is the expectation that users must provide Neo4j-specific header files manually?

If headers are intentionally excluded, could the docs clarify the steps needed to adapt the CSVs for Neo4j import?

Neo4j CSV export missing proper headers for neo4j-admin import (no :ID, :START_ID, :END_ID) #442
