Neo4j CSV export missing proper headers for neo4j-admin import (no :ID, :START_ID, :END_ID) #442

@mohakgoel1

I generated BI dataset CSVs using the ldbc_snb_datagen_spark repo. Data generation succeeded, but the CSVs cannot be directly imported into Neo4j using neo4j-admin database import because the required Neo4j-style header annotations (:ID, :START_ID, :END_ID) are missing.

🔄 Steps to Reproduce

Clone repo and install dependencies:

git clone https://github.com/ldbc/ldbc_snb_datagen_spark.git
cd ldbc_snb_datagen_spark/scripts
./install-dependencies.sh

Build:

./build.sh

Get Spark:

./get-spark-to-home.sh

Generate BI dataset:

cd ~/ldbc_snb_datagen_spark/tools
./run.py -- --format csv --format-options raw=true,labels=true,header=true,quoteAll=true --scale-factor 1 --mode bi

Data is generated under:

~/ldbc_snb_datagen_spark/tools/out/graphs/csv/bi/composite-merged-fk/

Try Neo4j import:

$NEO4J_HOME/bin/neo4j-admin database import full neo4j \
--overwrite-destination \
--delimiter="|" \
--nodes=Person=/ldbc_snb_datagen_spark/tools/out/graphs/csv/bi/composite-merged-fk/initial_snapshot/dynamic/Person/*.csv \
--relationships=KNOWS=/ldbc_snb_datagen_spark/tools/out/graphs/csv/bi/composite-merged-fk/initial_snapshot/dynamic/Person_knows_Person/*.csv

Import fails with error:

Caused by: org.neo4j.internal.batchimport.input.HeaderException:
Missing header of type START_ID, among entries [creationDate, Person1Id, Person2Id]

✅ Expected Behavior

Generated CSV files for the Neo4j import format should include proper headers, e.g.:

Node file (Person):

PersonId:ID(Person)|firstName:string|lastName:string|...

Relationship file (KNOWS):

Person1Id:START_ID(Person)|Person2Id:END_ID(Person)|creationDate:long

So that users can directly import with neo4j-admin without manually creating header files.
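The header change requested above is mechanical, so it can be sketched in a few lines of Python. This is only an illustration: the function name `annotate_header` and the `KNOWS_ANNOTATIONS` mapping are hypothetical and not part of the datagen, and the sketch strips surrounding double quotes on the assumption that `quoteAll=true` may also quote header fields.

```python
def annotate_header(header: str, annotations: dict[str, str], delimiter: str = "|") -> str:
    """Append a Neo4j import annotation to each column listed in `annotations`."""
    # Strip surrounding quotes in case the header fields were written quoted.
    columns = [col.strip('"') for col in header.strip().split(delimiter)]
    return delimiter.join(
        f"{col}:{annotations[col]}" if col in annotations else col
        for col in columns
    )


# Hypothetical mapping for the Person_knows_Person file from this issue.
KNOWS_ANNOTATIONS = {
    "Person1Id": "START_ID(Person)",
    "Person2Id": "END_ID(Person)",
}

print(annotate_header("creationDate|Person1Id|Person2Id", KNOWS_ANNOTATIONS))
# creationDate|Person1Id:START_ID(Person)|Person2Id:END_ID(Person)
```

Columns that are not in the mapping are left untouched, so type annotations such as `:long` would have to be added to the mapping explicitly.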

❌ Actual Behavior

The generated CSVs contain plain column names only, e.g.:

Person_knows_Person:

creationDate|Person1Id|Person2Id

Person:

PersonId|firstName|lastName|...

Without the :ID, :START_ID, and :END_ID annotations, neo4j-admin import cannot identify the node ID and relationship endpoint columns.
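To confirm which annotations a generated header actually carries, a small check along these lines could be run against the first line of each file. It is a sketch only: `header_types` is a hypothetical helper, and the regular expression covers just the three annotation types named in this issue.

```python
import re

# Matches a trailing :ID, :START_ID or :END_ID, optionally followed by an ID space such as (Person).
_ANNOTATION = re.compile(r":(START_ID|END_ID|ID)(\(.*\))?$")


def header_types(header: str, delimiter: str = "|") -> set[str]:
    """Return the Neo4j annotation types found in a header line."""
    found = set()
    for col in header.strip().split(delimiter):
        match = _ANNOTATION.search(col.strip('"'))
        if match:
            found.add(match.group(1))
    return found


# The header from the generated Person_knows_Person file carries none of them.
print(header_types("creationDate|Person1Id|Person2Id"))
```

An empty result for a relationship file corresponds to the `Missing header of type START_ID` error shown earlier.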

🔧 Environment

Repo: ldbc_snb_datagen_spark

Commit:

Neo4j version: 5.26.0

JDK: 21 (runtime), switched to 11 for benchmark driver

OS: Ubuntu 22.04 ARM64 (AWS Graviton)

🙏 Request

Is this a bug in the CSV export logic for --format csv with --mode bi?

Or is the expectation that users must provide Neo4j-specific header files manually?

If headers are intentionally excluded, could the docs clarify the steps needed to adapt the CSVs for Neo4j import?
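If manual adaptation does turn out to be the expected route, one possible workaround is to copy each CSV and replace only its first line. The sketch below assumes uncompressed files whose first line is the header row (as produced with `header=true`). `rewrite_header` and the paths in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def rewrite_header(src: Path, dst: Path, new_header: str) -> None:
    """Copy src to dst, replacing the first line with new_header."""
    with src.open("r", encoding="utf-8", newline="") as fin, \
         dst.open("w", encoding="utf-8", newline="") as fout:
        fin.readline()                 # discard the original header row
        fout.write(new_header + "\n")
        shutil.copyfileobj(fin, fout)  # stream the data rows unchanged
```

It could be applied to every part file of one entity, for example:

```python
src_dir = Path("out/Person_knows_Person")        # hypothetical input directory
dst_dir = Path("neo4j-import/Person_knows_Person")  # hypothetical output directory
dst_dir.mkdir(parents=True, exist_ok=True)
for part in sorted(src_dir.glob("*.csv")):
    rewrite_header(
        part,
        dst_dir / part.name,
        "creationDate|Person1Id:START_ID(Person)|Person2Id:END_ID(Person)",
    )
```

Streaming the copy keeps memory use flat, which matters because part files grow with the scale factor.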
