
Commit d91a92f

Merge pull request #46 from marklogic/feature/466-jupyter

DEVEXP-466 Added guide for using Jupyter

2 parents: 8fff4e8 + c43dc89

4 files changed: +43 −3 lines

docs/configuration.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,7 +1,7 @@
 ---
 layout: default
 title: Configuration Reference
-nav_order: 6
+nav_order: 7
 ---
 
 The MarkLogic Spark connector has 3 sets of configuration options - connection options, reading options, and writing
```

docs/getting-started-jupyter.md

Lines changed: 40 additions & 0 deletions

```diff
@@ -0,0 +1,40 @@
+---
+layout: default
+title: Getting Started with Jupyter
+nav_order: 4
+---
+
+[Project Jupyter](https://jupyter.org/) provides a set of tools for working with notebooks, code, and data. The
+MarkLogic Spark connector can be easily integrated into these tools to allow users to access and analyze data in
+MarkLogic.
+
+To get started, install either [JupyterLab or Jupyter Notebook](https://jupyter.org/install). Both of these tools
+allow you to work with the connector in the same fashion. The rest of this guide will assume the use of Jupyter
+Notebook, though the instructions will work for JupyterLab as well.
+
+Once you have installed, started, and accessed Jupyter Notebook in your web browser - in a default Notebook
+installation, you should be able to access it at http://localhost:8889/tree - click on "New" in the upper right hand
+corner of the Notebook interface and select "Python 3 (ipykernel)" to create a new notebook.
+
+In the first cell in the notebook, enter the following to allow Jupyter Notebook to access the MarkLogic Spark connector
+and also to initialize Spark:
+
+```
+import os
+os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars "/path/to/marklogic-spark-connector-2.0.0.jar" pyspark-shell'
+
+from pyspark.sql import SparkSession
+spark = SparkSession.builder.master("local[*]").appName('My Notebook').getOrCreate()
+spark.sparkContext.setLogLevel("WARN")
+spark
+```
+
+The path of `/path/to/marklogic-spark-connector-2.0.0.jar` should be changed to match the location of the connector
+jar on your filesystem. You are free to customize the `spark` variable in any manner you see fit as well.
+
+Now that you have an initialized Spark session, you can run any of the examples found in the
+[Getting Started with PySpark](getting-started-pyspark.md) guide.
+
+
+
+
```
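The first-cell setup the new guide describes can be sketched as below. The jar path is the guide's own placeholder, not a real location; the f-string construction and the existence check are additions here, added because a mistyped `--jars` path otherwise only surfaces later as a confusing Spark startup failure.

```python
import os

# Placeholder path from the guide; point this at the actual connector jar on your machine.
jar_path = "/path/to/marklogic-spark-connector-2.0.0.jar"

# PYSPARK_SUBMIT_ARGS must be set before pyspark creates its SparkSession,
# which is why the guide puts it in the first notebook cell. Quoting the
# --jars value tolerates spaces in the path.
os.environ["PYSPARK_SUBMIT_ARGS"] = f'--jars "{jar_path}" pyspark-shell'

# A quick sanity check catches a bad path before Spark fails to start.
if not os.path.exists(jar_path):
    print(f"Warning: connector jar not found at {jar_path}")
```

After this cell runs, creating the `SparkSession` proceeds exactly as shown in the guide's snippet.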

docs/reading.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,7 +1,7 @@
 ---
 layout: default
 title: Reading Data
-nav_order: 4
+nav_order: 5
 ---
 
 The MarkLogic Spark connector allows for data to be retrieved from MarkLogic as rows via an
```

docs/writing.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,7 +1,7 @@
 ---
 layout: default
 title: Writing Data
-nav_order: 5
+nav_order: 6
 ---
 
 The MarkLogic Spark connector allows for writing rows in a Spark DataFrame to MarkLogic as documents. The sections below
```
