The connector also supports the following:
- Use `recursiveFileLookup` to include files in child directories.
- Use `modifiedBefore` and `modifiedAfter` to select files based on their modification time.
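As a sketch of how these file-selection options fit together, the snippet below collects them in a plain dict (so it can run without a live Spark session) and shows the assumed reader call in a comment. The path, timestamp, and `binaryFile` format are illustrative, not taken from this page:

```python
# File-selection options from the list above; values are illustrative.
file_options = {
    "recursiveFileLookup": "true",           # include files in child directories
    "modifiedAfter": "2024-06-01T00:00:00",  # only files modified after this time
}

# With a live SparkSession, the options would be applied like this:
#   df = (spark.read.format("binaryFile")
#         .options(**file_options)
#         .load("/path/to/files"))
for name, value in sorted(file_options.items()):
    print(f"{name}={value}")
```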
## Reading and writing large binary files

The 2.3.2 connector introduces a fix for reading and writing large binary files to MarkLogic, allowing the contents of each file to be streamed from its source to MarkLogic. This avoids an issue where the Spark environment runs out of memory while trying to fit the contents of a file into an in-memory row.
To enable this, include the following in the set of options passed to your reader:

    .option("spark.marklogic.files.stream", "true")
As a result of this option, the `content` column in each row will not contain the contents of the file. Instead, it will contain a serialized object intended to be used during the write phase to read the contents of the file as a stream.
Files read from the MarkLogic Spark connector with the above option can then be written as documents to MarkLogic by passing the same option to the writer. The connector will then stream the contents of each file to MarkLogic, submitting one request to MarkLogic per document.
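A minimal end-to-end sketch of the streaming flow described above: read with the streaming option, then pass the same option to the writer. Because a live Spark session and the connector are required to actually run it, the sketch assembles the shared option in a dict and shows the assumed reader/writer calls in comments; the `marklogic` format name, paths, and connection URI are assumptions for illustration only:

```python
# The single option that enables streaming of large binary files, shared by
# the reader and the writer as described above.
stream_option = {"spark.marklogic.files.stream": "true"}

# With a live SparkSession and the connector on the classpath, the flow
# would look roughly like this (names and paths are illustrative):
#
#   df = (spark.read.format("marklogic")
#         .options(**stream_option)
#         .load("/path/to/large/files"))
#   # Each row's `content` column now holds a serialized object, not the bytes.
#
#   (df.write.format("marklogic")
#      .options(**stream_option)   # same option passed to the writer
#      .mode("append")
#      .save())
print(stream_option["spark.marklogic.files.stream"])
```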
## Reading any file
If you wish to read files without any special handling provided by the connector, you can use the