`docs/reading-data/optic.md`

## Optic query requirements

**Starting with the 2.5.0 release**, an Optic query can use any
[data access function](https://docs.marklogic.com/guide/app-dev/OpticAPI#id_66011) with one caveat: only Optic
queries that use `op.fromView` can be partitioned into multiple calls to MarkLogic. Optic queries that use any other
data access function have the following constraints:

1. The connector will execute the query in a single call to MarkLogic. You will therefore need to ensure that the
call can complete without timing out.
2. The connector requires that the MarkLogic user have the necessary privileges to invoke the
[MarkLogic eval endpoint](https://docs.marklogic.com/REST/POST/v1/eval), along with the `xdmp-invoke` privilege.
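
For illustration, here is a minimal PySpark sketch of a non-`op.fromView` read. It assumes the `spark` session that
the PySpark shell provides and the connector's `spark.marklogic.*` option names; the connection details and SQL view
are placeholders:

```python
# Sketch of a 2.5.0+ read using a non-op.fromView accessor (op.fromSQL here).
# The whole query runs as a single call to MarkLogic, so it is never split
# across Spark partitions and must complete before the request times out.
# Assumes the `spark` session that the PySpark shell provides.
df = (
    spark.read.format("marklogic")
    # Placeholder credentials and host; the user needs the eval endpoint
    # privileges plus xdmp-invoke, as noted above.
    .option("spark.marklogic.client.uri", "spark-user:password@localhost:8003")
    .option("spark.marklogic.read.opticQuery", "op.fromSQL('SELECT * FROM example.employee')")
    .load()
)
df.show()
```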

**Prior to the 2.5.0 release**, the Optic query must use the
[op.fromView](https://docs.marklogic.com/op.fromView) accessor function. In addition, calls to `groupBy`, `orderBy`,
`limit`, and `offset` should be performed via Spark instead of within the initial Optic query. As the connector will
partition `op.fromView` queries into multiple calls to MarkLogic, the aforementioned calls will likely not produce
the expected results when more than one request is made to MarkLogic. See the "Pushing down operations" and
"Tuning performance" sections below for more information.
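
In the meantime, those calls can be made on the Spark side. A minimal sketch, with placeholder connection details and
a hypothetical `department` column:

```python
# Sketch: keep the Optic query a plain op.fromView and let Spark perform
# groupBy/orderBy/limit, so results stay correct even when the connector
# splits the read across multiple MarkLogic requests. Names are placeholders.
df = (
    spark.read.format("marklogic")
    .option("spark.marklogic.client.uri", "spark-user:password@localhost:8003")
    .option("spark.marklogic.read.opticQuery", "op.fromView('example', 'employee')")
    .load()
)
# The equivalent Spark operations, applied after the load:
df.groupBy("department").count().orderBy("count", ascending=False).limit(10).show()
```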

Finally, regardless of the Optic data access function you use, the query must adhere to the handful of limitations
imposed by the [Optic Query DSL](https://docs.marklogic.com/guide/app-dev/OpticAPI#id_46710). A good practice in
validating a query is to run it in your [MarkLogic server's qconsole tool](https://docs.marklogic.com/guide/qconsole)
in a buffer with a query type of "Optic DSL".

## Schema inference

The connector will infer a Spark schema automatically based on your Optic query. Each column returned by your Optic
query will be mapped to a Spark schema column with the same name and an appropriate type.

You may override this feature and provide your own schema instead. The example below shows how a custom schema can
be provided within PySpark; this assumes that you have deployed the application in the […]
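
A stand-in sketch of that pattern, with a hypothetical `example.employee` view and column names; the `.schema()` call
itself is standard Spark:

```python
# Sketch: supplying an explicit schema instead of relying on inference.
# Column names must match the columns the Optic query returns; the view
# and columns here are hypothetical.
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

employee_schema = StructType([
    StructField("ID", IntegerType()),
    StructField("LastName", StringType()),
])

df = (
    spark.read.format("marklogic")
    .option("spark.marklogic.client.uri", "spark-user:password@localhost:8003")
    # An empty qualifier keeps the column names unqualified.
    .option("spark.marklogic.read.opticQuery", "op.fromView('example', 'employee', '')")
    .schema(employee_schema)
    .load()
)
df.show()
```
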
## Accessing documents

If your Optic query uses the `op.fromView` accessor function, documents can still be retrieved via the
[Optic functions for joining documents](https://docs.marklogic.com/guide/app-dev/OpticAPI#id_78437). Starting with
the 2.5.0 release, you can simply use `op.fromSearchDocs` instead, but only if your query can be executed in a single
call to MarkLogic without timing out.

For example, the following query will find all matching rows and then retrieve the documents and URIs associated with
those rows:

[…]
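
A hedged sketch of that shape, using the `joinDoc` and `joinDocUri` functions from the Optic documentation linked
above; the schema, view, and fragment id column names are assumptions:

```python
# Sketch: join the source document and its URI onto each op.fromView row.
# joinDoc/joinDocUri and op.fragmentIdCol are documented Optic functions;
# the schema, view, and column names below are placeholders.
optic_query = """
op.fromView('example', 'employee', '', op.fragmentIdCol('rowId'))
  .joinDoc('doc', op.fragmentIdCol('rowId'))
  .joinDocUri('uri', op.fragmentIdCol('rowId'))
"""

df = (
    spark.read.format("marklogic")
    .option("spark.marklogic.client.uri", "spark-user:password@localhost:8003")
    .option("spark.marklogic.read.opticQuery", optic_query)
    .load()
)
df.show()
```
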
## Tuning performance

If you are using the 2.5.0 connector or later along with an Optic query that does not use the `op.fromView` data
access function, you can ignore this section. The performance of your query will be strictly based on the Optic
query itself, which the connector does not impact.

The primary factor affecting connector performance when reading rows is how many requests are made to MarkLogic. In
general, performance will be best when minimizing the number of requests to MarkLogic while ensuring that no single
request attempts to return or process too much data.
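
For `op.fromView` reads, the request count is primarily controlled through partition and batch settings; a sketch
assuming the `spark.marklogic.read.numPartitions` and `spark.marklogic.read.batchSize` option names, with
illustrative values only:

```python
# Sketch: trading request count against request size for an op.fromView read.
# Fewer partitions and larger batches mean fewer, larger calls to MarkLogic;
# the values below are illustrative, not recommendations.
df = (
    spark.read.format("marklogic")
    .option("spark.marklogic.client.uri", "spark-user:password@localhost:8003")
    .option("spark.marklogic.read.opticQuery", "op.fromView('example', 'employee')")
    .option("spark.marklogic.read.numPartitions", "2")
    .option("spark.marklogic.read.batchSize", "50000")
    .load()
)
```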