Commit b66aeaa

Reorganized docs and renamed example project
The main change to the docs was to turn "Getting Started" into a section with multiple child pages. And since the example project is no longer specific to PySpark but rather to getting started generally, it's now called "getting-started". Also renamed the project application from "pyspark-example" to "spark-example".
1 parent 4d09e57 commit b66aeaa

27 files changed: 270 additions, 262 deletions

.github/workflows/jekyll-gh-pages.yml
Lines changed: 1 addition & 1 deletion

@@ -3,7 +3,7 @@ name: Deploy Jekyll with GitHub Pages dependencies preinstalled
 
 on:
   # Runs on pushes targeting the default branch.
-  # TODO This will be changed to master before the 1.0 release.
+  # TODO This will be changed to master before the 2.0 release.
   push:
     branches: ["develop"]

CONTRIBUTING.md
Lines changed: 1 addition & 1 deletion

@@ -85,7 +85,7 @@ This will produce a single jar file for the connector in the `./build/libs` directory.
 
 You can then launch PySpark with the connector available via:
 
-    pyspark --jars build/libs/marklogic-spark-connector-1.0-SNAPSHOT.jar
+    pyspark --jars build/libs/marklogic-spark-connector-2.0-SNAPSHOT.jar
 
 The below command is an example of loading data from the test application deployed via the instructions at the top of
 this page.
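
Not part of the diff context, but for orientation: the load command that paragraph refers to follows the same shape as the `docs/configuration.md` example further down in this commit. A minimal sketch in the PySpark shell, where the credentials and view names are the example application's values and should be treated as assumptions for any other deployment:

```python
# Read rows from the example application via the connector (a sketch; the
# client URI credentials and the Optic view names come from the example in
# docs/configuration.md and may differ in your environment).
df = spark.read.format("com.marklogic.spark") \
    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8020") \
    .option("spark.marklogic.read.opticDsl", "op.fromView('example', 'employee')") \
    .load()
df.show()
```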

build.gradle
Lines changed: 1 addition & 1 deletion

@@ -7,7 +7,7 @@ plugins {
 }
 
 group 'com.marklogic'
-version '1.0-SNAPSHOT'
+version '2.0-SNAPSHOT'
 
 java {
     sourceCompatibility = 1.8

caddy/config/Caddyfile
Lines changed: 1 addition & 1 deletion

@@ -10,7 +10,7 @@
     }
 }
 
-# For the examples/pyspark application.
+# For the examples/getting-started application.
 :8820 {
     reverse_proxy bootstrap_3n.local:8020 node2.local:8020 node3.local:8020 {
         # Required for MLCP to work; "ip_hash" also works.

docker-compose.yaml
Lines changed: 1 addition & 1 deletion

@@ -20,7 +20,7 @@ services:
       - 8015:8815
       # For running performance tests against quick-table data
       - 8009:8809
-      # For the pyspark-example project
+      # For the getting-started project
      - 8020:8820
     networks:
       - external_net

docs/configuration.md
Lines changed: 6 additions & 8 deletions

@@ -1,13 +1,13 @@
 ---
 layout: default
 title: Configuration Reference
-nav_order: 7
+nav_order: 5
 ---
 
 The MarkLogic Spark connector has 3 sets of configuration options - connection options, reading options, and writing
 options. Each set of options is defined in a separate table below.
 
-# Connection options
+## Connection options
 
 These options define how the connector connects and authenticates with MarkLogic.
 
@@ -30,7 +30,7 @@ These options define how the connector connects and authenticates with MarkLogic.
 | spark.marklogic.client.sslHostnameVerifier | Either `any`, `common`, or `strict`. |
 | spark.marklogic.client.uri | Shortcut for setting the host, port, username, and password when using `basic` or `digest` authentication. See below for more information. |
 
-## Connecting with a client URI
+### Connecting with a client URI
 
 The `spark.marklogic.client.uri` is a convenience for the common case of using `basic` or `digest` authentication.
 It allows you to specify username, password, host, and port via the following syntax:
@@ -50,7 +50,7 @@ Using this convenience can provide a much more succinct set of options - for example:
 
 ```
 df = spark.read.format("com.marklogic.spark")\
-    .option("spark.marklogic.client.uri", "pyspark-example-user:password@localhost:8020")\
+    .option("spark.marklogic.client.uri", "spark-example-user:password@localhost:8020")\
     .option("spark.marklogic.read.opticDsl", "op.fromView('example', 'employee')")\
     .load()
 ```
@@ -59,7 +59,7 @@ Note that if the username or password contain either a `@` or a `:` character, you ...
 [percent encoding](https://developer.mozilla.org/en-US/docs/Glossary/percent-encoding) into the correct character
 triplet. For example, a password of `sp@r:k` must appear in the `spark.marklogic.client.uri` string as `sp%40r%3Ak`.
 
-# Read options
+## Read options
 
 These options control how the connector reads data from MarkLogic. See [the guide on reading](reading.md) for more
 information on how data is read from MarkLogic.
@@ -70,9 +70,7 @@ information on how data is read from MarkLogic.
 | spark.marklogic.read.numPartitions | The number of Spark partitions to create; defaults to `spark.default.parallelism`. |
 | spark.marklogic.read.batchSize | Approximate number of rows to retrieve in each call to MarkLogic; defaults to 10000. |
 
-## Schema support
-
-# Write options
+## Write options
 
 These options control how the connector writes data to MarkLogic. See [the guide on writing](writing.md) for more
 information on how data is written to MarkLogic.
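
A small, non-authoritative sketch of how one might build that URI value in Python rather than percent-encoding by hand; `urllib.parse.quote` is standard library, and using it here is this note's suggestion, not something the connector documents:

```python
from urllib.parse import quote

# Percent-encode credentials for spark.marklogic.client.uri. With safe='',
# quote() converts '@' to %40 and ':' to %3A while leaving unreserved
# characters (letters, digits, '-', '_', '.', '~') untouched.
username = "spark-example-user"
password = "sp@r:k"
client_uri = f"{quote(username, safe='')}:{quote(password, safe='')}@localhost:8020"
print(client_uri)  # spark-example-user:sp%40r%3Ak@localhost:8020
```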

docs/getting-started-pyspark.md
Lines changed: 0 additions & 159 deletions
This file was deleted.

New file (the "Getting Started" index page; path not shown in this view)
Lines changed: 15 additions & 0 deletions

@@ -0,0 +1,15 @@
+---
+layout: default
+title: Getting Started
+nav_order: 2
+has_children: true
+permalink: /docs/getting-started
+---
+
+This guide provides instructions on using the MarkLogic Spark connector with multiple popular Spark environments.
+Before trying the connector in any of these environments, please [follow the instructions in the Setup guide](setup.md)
+to obtain the connector and deploy an example application to MarkLogic.
+
+For environments not yet listed here, the process for using the connector will typically involve determining how a
+connector JAR file is included in the Spark environment. Please see the documentation for your specific Spark
+environment for instructions on how to accomplish that goal.
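
As a hedged sketch of that general pattern for an environment not covered by the child pages (the app name and jar path below are placeholders, not connector documentation), many Spark setups let you point the session builder at the jar directly:

```python
from pyspark.sql import SparkSession

# Generic way to make a connector jar visible to a Spark environment:
# set spark.jars before the session is created. The jar path is a placeholder.
spark = (
    SparkSession.builder
    .appName("marklogic-example")
    .config("spark.jars", "/path/to/marklogic-spark-connector-2.0.0.jar")
    .getOrCreate()
)
```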

docs/getting-started-jupyter.md renamed to docs/getting-started/jupyter.md
Lines changed: 14 additions & 7 deletions

@@ -1,23 +1,31 @@
 ---
 layout: default
-title: Getting Started with Jupyter
-nav_order: 4
+title: Jupyter
+parent: Getting Started
+nav_order: 3
 ---
 
 [Project Jupyter](https://jupyter.org/) provides a set of tools for working with notebooks, code, and data. The
 MarkLogic Spark connector can be easily integrated into these tools to allow users to access and analyze data in
 MarkLogic.
 
+Before going further, be sure you've followed the instructions in the [setup guide](setup.md) for
+obtaining the connector and deploying an example application to MarkLogic.
+
+## Install Jupyter
+
 To get started, install either [JupyterLab or Jupyter Notebook](https://jupyter.org/install). Both of these tools
 allow you to work with the connector in the same fashion. The rest of this guide will assume the use of Jupyter
 Notebook, though the instructions will work for JupyterLab as well.
 
 Once you have installed, started, and accessed Jupyter Notebook in your web browser - in a default Notebook
 installation, you should be able to access it at http://localhost:8889/tree - click on "New" in the upper right hand
-corner of the Notebook interface and select "Python 3 (ipykernel)" to create a new notebook.
+corner of the Notebook interface and select "Python 3 (ipykernel)" to create a new notebook.
 
-In the first cell in the notebook, enter the following to allow Jupyter Notebook to access the MarkLogic Spark connector
-and also to initialize Spark:
+## Using the connector
+
+In the first cell in the notebook created above, enter the following to allow Jupyter Notebook to access the MarkLogic
+Spark connector and also to initialize Spark:
 
 ```
 import os
@@ -33,8 +41,7 @@ The path of `/path/to/marklogic-spark-connector-2.0.0.jar` should be changed to ...
 jar on your filesystem. You are free to customize the `spark` variable in any manner you see fit as well.
 
 Now that you have an initialized Spark session, you can run any of the examples found in the
-[Getting Started with PySpark](getting-started-pyspark.md) guide.
-
+[guide for using PySpark](pyspark.md).
 
 
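
The initialization cell in the first hunk above is cut off at the hunk boundary, with only `import os` visible. As a sketch of what such a cell commonly looks like (the `PYSPARK_SUBMIT_ARGS` approach and the names below are assumptions, not necessarily the docs' exact contents):

```python
import os
from pyspark.sql import SparkSession

# Put the connector jar on PySpark's classpath before the session is built,
# then create the session. Replace the jar path with the location on your
# filesystem, matching the guidance in the hunk above.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--jars /path/to/marklogic-spark-connector-2.0.0.jar pyspark-shell"
)
spark = SparkSession.builder.master("local[*]").appName("jupyter-example").getOrCreate()
```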
