Commit fa33c84

Merge pull request #154 from marklogic/feature/batchSize-tweak
Increasing default batch size for reading documents
2 parents df896c7 + 6578be7 commit fa33c84

2 files changed: +7 -2 lines changed

docs/reading-data/documents.md

Lines changed: 3 additions & 1 deletion
@@ -187,7 +187,9 @@ doc['Department']
 The connector mimics the behavior of the [MarkLogic Data Movement SDK](https://docs.marklogic.com/guide/java/data-movement)
 by creating a Spark partition per forest in the database associated with your REST API app server. Each partition reader
 will return all matching documents from its associated forest. The option `spark.marklogic.read.batchSize` controls how
-many documents will be returned in each call to MarkLogic; its value defaults to 100.
+many documents will be returned in each call to MarkLogic; its value defaults to 500. For smaller documents,
+particularly those with 10 elements or fewer, you may find a batch size of 1,000 or even 10,000 to provide better
+performance.
 
 The `spark.marklogic.read.numPartitions` option does not impact performance when reading document rows, as 1 partition
 is always created for each forest. It is not possible for 2 or more partition readers to read from the same forest.

src/main/java/com/marklogic/spark/reader/document/DocumentContext.java

Lines changed: 4 additions & 1 deletion
@@ -71,6 +71,9 @@ int getBatchSize() {
                 throw new ConnectorException(message);
             }
         }
-        return 100;
+        // Testing has shown that at least for smaller documents, 100 or 200 can be significantly slower than something
+        // like 1000 or even 10000. 500 is thus used as a default that should still be reasonably performant for larger
+        // documents.
+        return 500;
     }
 }
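
The diff shows only the tail of `getBatchSize()`, so the following is a minimal, self-contained sketch of how such a default might be resolved, not the connector's actual code. The option name and the default of 500 come from the diff. The class name `BatchSizeSketch`, the helper `callsNeeded`, the rejection of values below 1, and the use of `IllegalArgumentException` in place of `ConnectorException` are all assumptions made for illustration.

```java
import java.util.Map;

public class BatchSizeSketch {

    // Option name and default value as they appear in the diff above.
    static final String BATCH_SIZE_OPTION = "spark.marklogic.read.batchSize";
    static final int DEFAULT_BATCH_SIZE = 500;

    // Assumption: a missing or blank option falls back to the default, and anything
    // that is not a positive integer is rejected. IllegalArgumentException is a
    // stand-in for the connector's own ConnectorException.
    static int resolveBatchSize(Map<String, String> options) {
        String value = options.get(BATCH_SIZE_OPTION);
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_BATCH_SIZE;
        }
        int batchSize;
        try {
            batchSize = Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                BATCH_SIZE_OPTION + " must be numeric; invalid value: " + value, e);
        }
        if (batchSize < 1) {
            throw new IllegalArgumentException(
                BATCH_SIZE_OPTION + " must be 1 or greater; invalid value: " + value);
        }
        return batchSize;
    }

    // Hypothetical helper: if each call returns at most batchSize documents, reading
    // documentCount documents from one forest takes ceil(documentCount / batchSize) calls.
    static long callsNeeded(long documentCount, int batchSize) {
        return (documentCount + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        int oldDefault = 100;
        int newDefault = resolveBatchSize(Map.of());
        long docsInForest = 100_000L;
        System.out.println("calls at " + oldDefault + ": " + callsNeeded(docsInForest, oldDefault));
        System.out.println("calls at " + newDefault + ": " + callsNeeded(docsInForest, newDefault));
    }
}
```

Under these assumptions, raising the batch size from 100 to 500 cuts the number of calls per forest by a factor of five, which is consistent with the commit's observation that small batches are slower for small documents. The per-call payload grows by the same factor, which is presumably why the default stops at 500 rather than 1,000 or 10,000.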
