
Commit 17e0f83

Merge pull request #106 from marklogic/feature/625-docs

DEVEXP-625 Reworked docs to use docker-compose

2 parents ff738c1 + 944db8f

9 files changed: +111 -56 lines changed

docs/configuration.md
Lines changed: 2 additions & 2 deletions

@@ -56,7 +56,7 @@ Using this convenience can provide a much more succinct set of options - for exa
 
 ```
 df = spark.read.format("com.marklogic.spark")\
-    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8020")\
+    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8003")\
     .option("spark.marklogic.read.opticQuery", "op.fromView('example', 'employee')")\
     .load()
 ```

@@ -155,4 +155,4 @@ The following options control how rows can be processed with custom code in Mark
 | spark.marklogic.write.xquery | XQuery code to execute. |
 | spark.marklogic.write.externalVariableName | Name of the external variable in custom code that is populated with row values; defaults to `URI`. |
 | spark.marklogic.write.externalVariableDelimiter | Delimiter used when multiple row values are sent in a single call; defaults to a comma. |
-| spark.marklogic.write.vars. | Prefix for user-defined variables to be sent to the custom code. |
+| spark.marklogic.write.vars. | Prefix for user-defined variables to be sent to the custom code. |
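The `spark.marklogic.write.vars.` row above documents a prefix convention: any option key under the prefix becomes an external variable available to the custom code. As a plain-Python illustration of that convention only (the `extract_external_vars` helper is hypothetical, not part of the connector):

```python
# Illustrative sketch of the "spark.marklogic.write.vars." prefix convention:
# each option key under the prefix maps to an external variable name/value.
VARS_PREFIX = "spark.marklogic.write.vars."

def extract_external_vars(options: dict) -> dict:
    """Return {variable_name: value} for every option under the vars. prefix."""
    return {
        key[len(VARS_PREFIX):]: value
        for key, value in options.items()
        if key.startswith(VARS_PREFIX)
    }

options = {
    "spark.marklogic.write.vars.var1": "Engineering",
    "spark.marklogic.write.vars.var2": "Marketing",
    "spark.marklogic.write.xquery": "(: custom code referencing $var1/$var2 :)",
}
print(extract_external_vars(options))  # {'var1': 'Engineering', 'var2': 'Marketing'}
```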

docs/getting-started/pyspark.md
Lines changed: 3 additions & 3 deletions

@@ -52,7 +52,7 @@ paste the following Python statement into PySpark, adjusting the host and passwo
 ```
 df = spark.read.format("com.marklogic.spark") \
     .option("spark.marklogic.client.host", "localhost") \
-    .option("spark.marklogic.client.port", "8020") \
+    .option("spark.marklogic.client.port", "8003") \
     .option("spark.marklogic.client.username", "spark-example-user") \
     .option("spark.marklogic.client.password", "password") \
     .option("spark.marklogic.read.opticQuery", "op.fromView('example', 'employee')") \

@@ -64,7 +64,7 @@ client options in one option:
 
 ```
 df = spark.read.format("com.marklogic.spark") \
-    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8020") \
+    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8003") \
     .option("spark.marklogic.read.opticQuery", "op.fromView('example', 'employee')") \
     .load()
 ```

@@ -91,7 +91,7 @@ paste the following into PySpark, adjusting the host and password values as need
 
 ```
 df.write.format("com.marklogic.spark") \
-    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8020") \
+    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8003") \
     .option("spark.marklogic.write.collections", "write-test") \
     .option("spark.marklogic.write.permissions", "rest-reader,read,rest-writer,update") \
     .option("spark.marklogic.write.uriPrefix", "/write/") \
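The hunks above touch both spellings of the connection options: the four separate `spark.marklogic.client.*` options and the single `spark.marklogic.client.uri` shorthand. As a plain-Python illustration of how the shorthand decomposes (the `split_client_uri` helper is hypothetical, not part of the connector):

```python
from urllib.parse import urlparse

def split_client_uri(uri: str) -> dict:
    """Split a 'user:password@host:port' string into the four separate
    spark.marklogic.client.* options it abbreviates."""
    # Prepend "//" so urlparse treats the string as a network location.
    parsed = urlparse("//" + uri)
    return {
        "spark.marklogic.client.host": parsed.hostname,
        "spark.marklogic.client.port": str(parsed.port),
        "spark.marklogic.client.username": parsed.username,
        "spark.marklogic.client.password": parsed.password,
    }

print(split_client_uri("spark-example-user:password@localhost:8003"))
```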

docs/getting-started/setup.md
Lines changed: 16 additions & 16 deletions

@@ -25,32 +25,32 @@ projects data from documents in MarkLogic into rows.
 
 To facilitate trying out the connector, perform the following steps to deploy an example application to your
 MarkLogic server that includes a
-[TDE view](https://docs.marklogic.com/guide/app-dev/TDE) and some documents that conform to that view.
+[TDE view](https://docs.marklogic.com/guide/app-dev/TDE) and some documents that conform to that view. These instructions depend on
+[using Docker](https://docs.docker.com/get-docker/) to install and initialize an instance of MarkLogic. If you already
+have an instance of MarkLogic running, you can skip step 4 below, but ensure that the `gradle.properties` file in the
+extracted directory contains valid connection properties for your instance of MarkLogic.
 
 1. From [this repository's Releases page](https://github.com/marklogic/marklogic-spark-connector/releases), select
-   the latest release and download the `marklogic-spark-getting-started-2.0.0.zip` file.
+   the latest release and download the `marklogic-spark-getting-started-2.1.0.zip` file.
 2. Extract the contents of the downloaded zip file.
 3. Open a terminal window and go to the directory created by extracting the zip file; the directory should have a
-   name of "marklogic-spark-getting-started-2.0.0".
-4. Create a file named `gradle-local.properties` and add `mlPassword=changeme`, changing the text "changeme" to the
-   password of your MarkLogic `admin` user.
-5. Open the `gradle.properties` file and verify that the value of the `mlPort` property is an available port on the
-   machine running your MarkLogic server; the default port is 8020.
-6. Ensure that the `./gradlew` file is executable; depending on your operating system, you may need to run
+   name of "marklogic-spark-getting-started-2.1.0".
+4. Run `docker-compose up -d` to start an instance of MarkLogic
+5. Ensure that the `./gradlew` file is executable; depending on your operating system, you may need to run
    `chmod 755 gradlew` to make the file executable.
-7. Run `./gradlew -i mlDeploy` to deploy the example application.
+6. Run `./gradlew -i mlDeploy` to deploy the example application.
 
 After the deployment finishes, your MarkLogic server will now have the following:
 
-- An app server named `spark-example` listening on port 8020 (or the port you chose if you modified the `mlPort`
-  property).
-- A database named `spark-example-content` that contains 1000 JSON documents in the `employee` collection.
+- An app server named `spark-example` listening on port 8003.
+- A database named `spark-example-content` that contains 1000 JSON documents in a collection named `employee`.
 - A TDE with a schema name of `example` and a view name of `employee`.
-- A user named `spark-example-user` that can be used with the Spark connector and [MarkLogic's qconsole tool](https://docs.marklogic.com/guide/qconsole/intro).
+- A user named `spark-example-user` with a password of `password` that can be used with the Spark connector and [MarkLogic's qconsole tool](https://docs.marklogic.com/guide/qconsole/intro).
+
+To verify that your application was deployed correctly, access your MarkLogic server's qconsole tool
+via <http://localhost:8000/qconsole> . You can authenticate as the `spark-example-user` user that was created above,
+as it's generally preferable to test as a non-admin user.
 
-To verify that your application was deployed correctly, access your MarkLogic server's qconsole tool - for example,
-if your MarkLogic server is deployed locally, you will go to <http://localhost:8000/qconsole> . You can authenticate as
-the `spark-example-user` user that was created above, as it's generally preferable to test as a non-admin user.
 After authenticating, perform the following steps:
 
 1. In the "Database" dropdown, select `spark-example-content`.
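The new step 4 (`docker-compose up -d`) assumes the extracted directory ships its own compose file. Purely as a sketch of what such a file might contain (the actual `docker-compose.yml` in the zip may differ; the image name, environment variables, and port mappings below are assumptions, not taken from this commit):

```yaml
# Illustrative only - the getting-started zip provides its own docker-compose.yml.
version: "3.8"
services:
  marklogic:
    image: marklogicdb/marklogic-db:latest   # assumed image name
    environment:
      - MARKLOGIC_INIT=true                  # assumed: auto-initialize the server
      - MARKLOGIC_ADMIN_USERNAME=admin
      - MARKLOGIC_ADMIN_PASSWORD=admin
    ports:
      - "8000-8003:8000-8003"   # 8003 is the spark-example app server port
```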

docs/reading.md
Lines changed: 13 additions & 13 deletions

@@ -22,7 +22,7 @@ see the next section for more information), and zero or more other options:
 
 ```
 df = spark.read.format("com.marklogic.spark") \
-    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8020") \
+    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8003") \
     .option("spark.marklogic.read.opticQuery", "op.fromView('example', 'employee')") \
     .load()
 ```

@@ -42,7 +42,7 @@ the Optic query):
 query = "op.fromView('example', 'employee').where(cts.wordQuery('Drive'))"
 
 df = spark.read.format("com.marklogic.spark") \
-    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8020") \
+    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8003") \
     .option("spark.marklogic.read.opticQuery", query) \
     .load()
 ```

@@ -85,7 +85,7 @@ be provided within PySpark; this assumes that you have deployed the application 
 from pyspark.sql.types import StructField, StructType, StringType
 df = spark.read.format("com.marklogic.spark") \
     .schema(StructType([StructField("example.employee.GivenName", StringType()), StructField("example.employee.Surname", StringType())])) \
-    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8020") \
+    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8003") \
    .option("spark.marklogic.read.opticQuery", "op.fromView('example', 'employee')") \
     .load()
 ```

@@ -105,7 +105,7 @@ op.fromView('example', 'employee', '', joinCol) \
 .select('doc')"
 
 df = spark.read.format("com.marklogic.spark") \
-    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8020") \
+    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8003") \
     .option("spark.marklogic.read.opticQuery", query) \
     .load()
 ```

@@ -128,7 +128,7 @@ deployed in the [Getting Started with PySpark guide](getting-started/pyspark.md)
 ```
 stream = spark.readStream \
     .format("com.marklogic.spark") \
-    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8020") \
+    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8003") \
     .option("spark.marklogic.read.numPartitions", 2) \
     .option("spark.marklogic.read.opticQuery", "op.fromView('example', 'employee')") \
     .load() \

@@ -173,7 +173,7 @@ rows being returned to Spark and far less work having to be done by Spark:
 
 ```
 spark.read.format("com.marklogic.spark") \
-    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8020") \
+    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8003") \
     .option("spark.marklogic.read.opticQuery", "op.fromView('example', 'employee', '')") \
     .load() \
     .filter("HiredDate < '2020-01-01'") \

@@ -286,7 +286,7 @@ configuring the `spark.marklogic.read.javascript` option:
 
 ```
 df = spark.read.format("com.marklogic.spark") \
-    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8020") \
+    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8003") \
     .option("spark.marklogic.read.javascript", "cts.uris(null, null, cts.collectionQuery('employee'))") \
     .load()
 ```

@@ -296,7 +296,7 @@ Or code can be [written in XQuery](https://docs.marklogic.com/guide/getting-star
 
 ```
 df = spark.read.format("com.marklogic.spark") \
-    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8020") \
+    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8003") \
     .option("spark.marklogic.read.xquery", "cts:uris((), (), cts:collection-query('employee'))") \
     .load()
 ```

@@ -306,7 +306,7 @@ You can also invoke a JavaScript or XQuery module in your application's modules 
 
 ```
 df = spark.read.format("com.marklogic.spark") \
-    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8020") \
+    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8003") \
     .option("spark.marklogic.read.invoke", "/read.sjs") \
     .load()
 ```

@@ -326,7 +326,7 @@ JSON objects with columns that conform to the given schema:
 ```
 from pyspark.sql.types import StructField, StructType, IntegerType, StringType
 df = spark.read.format("com.marklogic.spark") \
-    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8020") \
+    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8003") \
     .option("spark.marklogic.read.invoke", "/read-custom-schema.sjs") \
     .schema(StructType([StructField("id", IntegerType()), StructField("name", StringType())])) \
     .load()

@@ -343,14 +343,14 @@ The following demonstrates two custom external variables being configured and us
 
 ```
 df = spark.read.format("com.marklogic.spark") \
-    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8020") \
+    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8003") \
     .option("spark.marklogic.read.vars.var1", "Engineering") \
     .option("spark.marklogic.read.vars.var2", "Marketing") \
     .option("spark.marklogic.read.javascript", "var var1, var2; cts.uris(null, null, cts.wordQuery([var1, var2]))") \
     .load()
 ```
 
-### Streaming with custom code
+### Streaming support
 
 Spark's support for [streaming reads](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html)
 from MarkLogic can be useful when your custom code for reading data may take a long time to execute. Or, based on the

@@ -384,7 +384,7 @@ sent to the writer, which in this example are then printed to the console:
 ```
 stream = spark.readStream \
     .format("com.marklogic.spark") \
-    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8020") \
+    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8003") \
     .option("spark.marklogic.read.batchIds.javascript", "xdmp.databaseForests(xdmp.database('spark-example-content'))") \
     .option("spark.marklogic.read.javascript", "cts.uris(null, null, cts.collectionQuery('employee'), null, [BATCH_ID]);") \
     .load() \
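The streaming example above pairs a `batchIds` script with a per-batch read script. As a language-neutral sketch of that partitioning contract only (the function names and the simulated forest data are hypothetical, not the connector's implementation): the connector first evaluates the batch-IDs code to obtain a list of IDs, then invokes the read code once per ID, substituting the ID for `BATCH_ID`.

```python
# Hypothetical sketch of the batch-ID partitioning contract: obtain the
# batch IDs once, then perform one read call per batch ID.
def stream_by_batch_ids(get_batch_ids, read_batch):
    """Yield (batch_id, rows) pairs, one read call per batch ID."""
    for batch_id in get_batch_ids():
        yield batch_id, read_batch(batch_id)

# Simulated server state: three "forests", each holding some document URIs.
forests = {"f1": ["/a.json"], "f2": ["/b.json", "/c.json"], "f3": []}

results = dict(stream_by_batch_ids(
    get_batch_ids=lambda: sorted(forests),   # stands in for the batchIds script
    read_batch=lambda fid: forests[fid],     # stands in for the read script
))
print(results)  # {'f1': ['/a.json'], 'f2': ['/b.json', '/c.json'], 'f3': []}
```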
