
Commit fe76f56

Merge pull request #172 from marklogic/feature/docs
Added mention of a load balancer to the docs
2 parents: c02f5fb + 5ef6559

4 files changed (+26 −1 lines changed)


docs/index.md

Lines changed: 3 additions & 0 deletions
@@ -16,4 +16,7 @@ The connector has the following system requirements:
 * For writing data, MarkLogic 9.0-9 or higher.
 * For reading data, MarkLogic 10.0-9 or higher.
 
+In addition, if your MarkLogic cluster has multiple hosts in it, it is highly recommended to put a load balancer in front
+of your cluster and have the MarkLogic Spark connector connect through the load balancer.
+
 Please see the [Getting Started guide](getting-started/getting-started.md) to begin using the connector.
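For illustration, a minimal PySpark sketch of connecting through a load balancer might look like the following. The load balancer address, port, and credentials are placeholders, and the `spark.marklogic.client.*` and `spark.marklogic.read.opticQuery` option names are assumed from the connector's configuration style rather than taken from this commit:

```python
# A minimal sketch; all connection values below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("marklogic-lb-sketch").getOrCreate()

df = (
    spark.read.format("marklogic")
    # Point at the load balancer, not at an individual MarkLogic host.
    .option("spark.marklogic.client.host", "my-load-balancer.example.com")
    .option("spark.marklogic.client.port", "8003")
    .option("spark.marklogic.client.username", "spark-user")
    .option("spark.marklogic.client.password", "password")
    # Assumed option name for an Optic read; view name is hypothetical.
    .option("spark.marklogic.read.opticQuery", "op.fromView('example', 'employee')")
    .load()
)
df.show()
```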

docs/reading-data/documents.md

Lines changed: 9 additions & 1 deletion
@@ -252,9 +252,17 @@ with more partition readers and a higher batch size.
 You can also adjust the level of parallelism by controlling how many threads Spark uses for executing partition reads.
 Please see your Spark distribution's documentation for further information.
 
+### Using a load balancer
+
+If your MarkLogic cluster has multiple hosts, it is highly recommended to put a load balancer in front
+of your cluster and have the connector connect through the load balancer. A typical load balancer will help ensure
+not only that load is spread across the hosts in your cluster, but that any network or connection failures can be
+retried without the error propagating to the connector.
+
 ### Direct connections to hosts
 
-If your Spark program is able to connect to each host in your MarkLogic cluster, you can set the
+If you do not have a load balancer in front of your MarkLogic cluster, and your Spark program is able to connect to
+each host in your MarkLogic cluster, you can set the
 `spark.marklogic.client.connectionType` option to `direct`. Each partition reader will then connect to the
 host on which the reader's assigned forest resides. This will typically improve performance by reducing the network
 traffic, as the host that receives a request will not need to involve any other host in the processing of that request.
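A hedged sketch of the `direct` connection type changed above, for the case where no load balancer is present. The `spark.marklogic.client.connectionType` option appears in the diff itself; the host, credentials, and the `spark.marklogic.read.documents.collections` option are illustrative assumptions:

```python
# A sketch assuming Spark can reach every host in the cluster directly.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("marklogic-direct-read").getOrCreate()

df = (
    spark.read.format("marklogic")
    # Placeholder host and credentials.
    .option("spark.marklogic.client.host", "marklogic-host-1.example.com")
    .option("spark.marklogic.client.port", "8003")
    .option("spark.marklogic.client.username", "spark-user")
    .option("spark.marklogic.client.password", "password")
    # Each partition reader connects straight to the host holding its forest.
    .option("spark.marklogic.client.connectionType", "direct")
    # Assumed option name for selecting documents; collection is hypothetical.
    .option("spark.marklogic.read.documents.collections", "example-collection")
    .load()
)
df.show()
```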

docs/reading-data/optic.md

Lines changed: 7 additions & 0 deletions
@@ -257,6 +257,13 @@ The effectiveness of this approach can be evaluated by executing the Optic query
 [MarkLogic's qconsole application](https://docs.marklogic.com/guide/qconsole/intro), which will execute the query in
 a single request as well.
 
+### Using a load balancer
+
+If your MarkLogic cluster has multiple hosts, it is highly recommended to put a load balancer in front
+of your cluster and have the connector connect through the load balancer. A typical load balancer will help ensure
+not only that load is spread across the hosts in your cluster, but that any network or connection failures can be
+retried without the error propagating to the connector.
+
 ### More detail on partitions
 
 This section is solely informational and is not required understanding for using the connector
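A sketch of an Optic read routed through a load balancer, this time using a single connection-string option. The `spark.marklogic.client.uri` form and the `spark.marklogic.read.opticQuery` option are assumed from the connector's documented configuration style, and all values are placeholders:

```python
# A sketch; connection string and view name are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("marklogic-optic-read").getOrCreate()

df = (
    spark.read.format("marklogic")
    # user:password@host:port, where the host is the load balancer's address.
    .option(
        "spark.marklogic.client.uri",
        "spark-user:password@my-load-balancer.example.com:8003",
    )
    .option("spark.marklogic.read.opticQuery", "op.fromView('example', 'employee')")
    .load()
)
df.show()
```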

docs/writing.md

Lines changed: 7 additions & 0 deletions
@@ -233,6 +233,13 @@ The rule of thumb above can thus be expressed as:
 
 Number of partitions * Value of spark.marklogic.write.threadCount <= Number of hosts * number of app server threads
 
+### Using a load balancer
+
+If your MarkLogic cluster has multiple hosts, it is highly recommended to put a load balancer in front
+of your cluster and have the connector connect through the load balancer. A typical load balancer will help ensure
+not only that load is spread across the hosts in your cluster, but that any network or connection failures can be
+retried without the error propagating to the connector.
+
 ### Error handling
 
 The connector may throw an error during one of two phases of operation - before it begins to write data to MarkLogic,
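To make the rule of thumb in this hunk concrete: with 3 hosts and 32 app server threads per host, the right side is 96, so a DataFrame with 12 partitions should keep `spark.marklogic.write.threadCount` at 8 or below, since 12 * 8 = 96. A hedged write sketch under those assumed numbers, with placeholder connection values:

```python
# A sketch; the arithmetic follows the rule of thumb above, the option
# names are assumed, and all connection values are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("marklogic-write-sketch").getOrCreate()
df = spark.createDataFrame([("1", "hello"), ("2", "world")], ["id", "text"])

(
    df.repartition(12)  # 12 partitions * 8 threads each = 96 total write threads
    .write.format("marklogic")
    # Route writes through the load balancer rather than a single host.
    .option("spark.marklogic.client.host", "my-load-balancer.example.com")
    .option("spark.marklogic.client.port", "8003")
    .option("spark.marklogic.client.username", "spark-user")
    .option("spark.marklogic.client.password", "password")
    .option("spark.marklogic.write.threadCount", "8")
    .mode("append")
    .save()
)
```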
