
Commit fe76f56

Merge pull request #172 from marklogic/feature/docs
Added mention of a load balancer to the docs
2 parents: c02f5fb + 5ef6559

4 files changed (+26 −1 lines changed)


docs/index.md

Lines changed: 3 additions & 0 deletions
@@ -16,4 +16,7 @@ The connector has the following system requirements:
 * For writing data, MarkLogic 9.0-9 or higher.
 * For reading data, MarkLogic 10.0-9 or higher.
 
+In addition, if your MarkLogic cluster has multiple hosts in it, it is highly recommended to put a load balancer in front
+of your cluster and have the MarkLogic Spark connector connect through the load balancer.
+
 Please see the [Getting Started guide](getting-started/getting-started.md) to begin using the connector.
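For illustration, a minimal PySpark sketch of connecting through a load balancer might look like the following. The load balancer address, port, and credentials are placeholders, and the `spark.marklogic.client.*` and `spark.marklogic.read.opticQuery` option names are assumed from the connector's configuration style rather than taken from this commit:

```python
# A minimal sketch; all connection values below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("marklogic-lb-sketch").getOrCreate()

df = (
    spark.read.format("marklogic")
    # Point at the load balancer, not at an individual MarkLogic host.
    .option("spark.marklogic.client.host", "my-load-balancer.example.com")
    .option("spark.marklogic.client.port", "8003")
    .option("spark.marklogic.client.username", "spark-user")
    .option("spark.marklogic.client.password", "password")
    # Assumed option name for an Optic read; view name is hypothetical.
    .option("spark.marklogic.read.opticQuery", "op.fromView('example', 'employee')")
    .load()
)
df.show()
```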

docs/reading-data/documents.md

Lines changed: 9 additions & 1 deletion
@@ -252,9 +252,17 @@ with more partition readers and a higher batch size.
 You can also adjust the level of parallelism by controlling how many threads Spark uses for executing partition reads.
 Please see your Spark distribution's documentation for further information.
 
+### Using a load balancer
+
+If your MarkLogic cluster has multiple hosts, it is highly recommended to put a load balancer in front
+of your cluster and have the connector connect through the load balancer. A typical load balancer will help ensure
+not only that load is spread across the hosts in your cluster, but that any network or connection failures can be
+retried without the error propagating to the connector.
+
 ### Direct connections to hosts
 
-If your Spark program is able to connect to each host in your MarkLogic cluster, you can set the
+If you do not have a load balancer in front of your MarkLogic cluster, and your Spark program is able to connect to
+each host in your MarkLogic cluster, you can set the
 `spark.marklogic.client.connectionType` option to `direct`. Each partition reader will then connect to the
 host on which the reader's assigned forest resides. This will typically improve performance by reducing the network
 traffic, as the host that receives a request will not need to involve any other host in the processing of that request.
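A hedged sketch of the `direct` connection type changed above, for the case where no load balancer is present. The `spark.marklogic.client.connectionType` option appears in the diff itself; the host, credentials, and the `spark.marklogic.read.documents.collections` option are illustrative assumptions:

```python
# A sketch assuming Spark can reach every host in the cluster directly.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("marklogic-direct-read").getOrCreate()

df = (
    spark.read.format("marklogic")
    # Placeholder host and credentials.
    .option("spark.marklogic.client.host", "marklogic-host-1.example.com")
    .option("spark.marklogic.client.port", "8003")
    .option("spark.marklogic.client.username", "spark-user")
    .option("spark.marklogic.client.password", "password")
    # Each partition reader connects straight to the host holding its forest.
    .option("spark.marklogic.client.connectionType", "direct")
    # Assumed option name for selecting documents; collection is hypothetical.
    .option("spark.marklogic.read.documents.collections", "example-collection")
    .load()
)
df.show()
```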

docs/reading-data/optic.md

Lines changed: 7 additions & 0 deletions
@@ -257,6 +257,13 @@ The effectiveness of this approach can be evaluated by executing the Optic query
 [MarkLogic's qconsole application](https://docs.marklogic.com/guide/qconsole/intro), which will execute the query in
 a single request as well.
 
+### Using a load balancer
+
+If your MarkLogic cluster has multiple hosts, it is highly recommended to put a load balancer in front
+of your cluster and have the connector connect through the load balancer. A typical load balancer will help ensure
+not only that load is spread across the hosts in your cluster, but that any network or connection failures can be
+retried without the error propagating to the connector.
+
 ### More detail on partitions
 
 This section is solely informational and is not required understanding for using the connector
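A sketch of an Optic read routed through a load balancer, this time using a single connection-string option. The `spark.marklogic.client.uri` form and the `spark.marklogic.read.opticQuery` option are assumed from the connector's documented configuration style, and all values are placeholders:

```python
# A sketch; connection string and view name are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("marklogic-optic-read").getOrCreate()

df = (
    spark.read.format("marklogic")
    # user:password@host:port, where the host is the load balancer's address.
    .option(
        "spark.marklogic.client.uri",
        "spark-user:password@my-load-balancer.example.com:8003",
    )
    .option("spark.marklogic.read.opticQuery", "op.fromView('example', 'employee')")
    .load()
)
df.show()
```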

docs/writing.md

Lines changed: 7 additions & 0 deletions
@@ -233,6 +233,13 @@ The rule of thumb above can thus be expressed as:
 
 Number of partitions * Value of spark.marklogic.write.threadCount <= Number of hosts * number of app server threads
 
+### Using a load balancer
+
+If your MarkLogic cluster has multiple hosts, it is highly recommended to put a load balancer in front
+of your cluster and have the connector connect through the load balancer. A typical load balancer will help ensure
+not only that load is spread across the hosts in your cluster, but that any network or connection failures can be
+retried without the error propagating to the connector.
+
 ### Error handling
 
 The connector may throw an error during one of two phases of operation - before it begins to write data to MarkLogic,
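To make the rule of thumb in this hunk concrete: with 3 hosts and 32 app server threads per host, the right side is 96, so a DataFrame with 12 partitions should keep `spark.marklogic.write.threadCount` at 8 or below, since 12 * 8 = 96. A hedged write sketch under those assumed numbers, with placeholder connection values:

```python
# A sketch; the arithmetic follows the rule of thumb above, the option
# names are assumed, and all connection values are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("marklogic-write-sketch").getOrCreate()
df = spark.createDataFrame([("1", "hello"), ("2", "world")], ["id", "text"])

(
    df.repartition(12)  # 12 partitions * 8 threads each = 96 total write threads
    .write.format("marklogic")
    # Route writes through the load balancer rather than a single host.
    .option("spark.marklogic.client.host", "my-load-balancer.example.com")
    .option("spark.marklogic.client.port", "8003")
    .option("spark.marklogic.client.username", "spark-user")
    .option("spark.marklogic.client.password", "password")
    .option("spark.marklogic.write.threadCount", "8")
    .mode("append")
    .save()
)
```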
