* **in-memory** (default) for [org.apache.spark.sql.internal.SessionStateBuilder](SessionStateBuilder.md)
* **hive** for [org.apache.spark.sql.hive.HiveSessionStateBuilder](hive/HiveSessionStateBuilder.md)

## Creating New SparkSession { #newSession }

```scala
newSession(): SparkSession
```
!!! note "SparkSession.newSession and SparkSession.cloneSession"

    `SparkSession.newSession` uses no parent [SessionState](#parentSessionState) while [SparkSession.cloneSession](#cloneSession) (re)uses [SessionState](#sessionState).
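
That difference can be illustrated with a small, self-contained sketch. This is plain Scala, not Spark's actual implementation: `SessionState`, `Session`, and the `shuffle.partitions` key are simplified stand-ins used only to contrast "no parent state" against "copied state".

```scala
import scala.collection.mutable

// Simplified stand-ins for illustration only (not Spark's real classes)
final case class SessionState(conf: mutable.Map[String, String])

final class Session(val state: SessionState) {
  // newSession: no parent SessionState, so the new session starts from defaults
  def newSession(): Session =
    new Session(SessionState(mutable.Map("shuffle.partitions" -> "200")))

  // cloneSession: (re)uses the current SessionState by copying it
  def cloneSession(): Session =
    new Session(SessionState(state.conf.clone()))
}

val spark = new Session(SessionState(mutable.Map("shuffle.partitions" -> "200")))
spark.state.conf("shuffle.partitions") = "4"

val fresh  = spark.newSession()   // does not see the runtime change
val cloned = spark.cloneSession() // sees the runtime change
```

Both sessions would still share the same `SparkContext` in Spark; only the session-scoped state differs.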

## Cloning SparkSession { #cloneSession }

```scala
cloneSession(): SparkSession
```
Internally, `version` uses the `spark.SPARK_VERSION` value that is the `version` property in the `spark-version-info.properties` properties file on CLASSPATH.

Internally, `conf` creates a [RuntimeConfig](RuntimeConfig.md) (when requested the very first time and cached afterwards) with the [SQLConf](SessionState.md#conf) (of the [SessionState](#sessionState)).
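
The create-once-then-cache behaviour is the standard lazy-initialization pattern. A minimal sketch in plain Scala, where `SQLConf`, `RuntimeConfig`, and `Session` are hypothetical stand-ins rather than Spark's real classes:

```scala
// Hypothetical stand-ins used only to show the caching
final class SQLConf
final class RuntimeConfig(val sqlConf: SQLConf)

final class Session(sqlConf: SQLConf) {
  // created on first access, cached for every later access
  lazy val conf: RuntimeConfig = new RuntimeConfig(sqlConf)
}

val session = new Session(new SQLConf)
val first  = session.conf
val second = session.conf
// lazy val guarantees a single instance per session
assert(first eq second)
```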
`instantiateSessionState` is used when `SparkSession` is requested for [SessionState](#sessionState) (based on [spark.sql.catalogImplementation](StaticSQLConf.md#spark.sql.catalogImplementation) configuration property).
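
A hedged sketch of that reflective instantiation pattern in plain Scala. The helper name, the no-argument constructor, and the exception type are simplifications; the real method also passes constructor arguments and wraps failures in its own exception type.

```scala
// Simplified sketch of reflective instantiation with error wrapping
def instantiate[T](className: String): T =
  try {
    Class.forName(className)
      .getDeclaredConstructor()
      .newInstance()
      .asInstanceOf[T]
  } catch {
    case e: Exception =>
      // mirrors the documented "Error while instantiating '[className]'" message
      throw new IllegalArgumentException(s"Error while instantiating '$className'", e)
  }

val sb = instantiate[java.lang.StringBuilder]("java.lang.StringBuilder")
```

A lookup of a missing class surfaces as the wrapped error rather than a raw `ClassNotFoundException`.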

`getDataflowGraphOrThrow` [looks up the GraphRegistrationContext](#getDataflowGraph) for the given `dataflowGraphId` or throws a `SparkException` if it does not exist.

```text
Dataflow graph with id [graphId] could not be found
```

---

`getDataflowGraphOrThrow` is used when:

* `PipelinesHandler` ([Spark Connect]({{ book.spark_connect }})) is requested to [defineDataset](PipelinesHandler.md#defineDataset), [defineFlow](PipelinesHandler.md#defineFlow), [defineSqlGraphElements](PipelinesHandler.md#defineSqlGraphElements), [startRun](PipelinesHandler.md#startRun)

## Find Dataflow Graph { #getDataflowGraph }

```scala
getDataflowGraph(
  graphId: String): Option[GraphRegistrationContext]
```

`getDataflowGraph` finds the [GraphRegistrationContext](GraphRegistrationContext.md) for the given `graphId` (in this [dataflowGraphs](#dataflowGraphs) registry).

---

`getDataflowGraph` is used when:

* `DataflowGraphRegistry` is requested to [getDataflowGraphOrThrow](#getDataflowGraphOrThrow)
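
The two lookups follow a common registry pattern: an `Option`-returning finder plus a throwing variant layered on top of it. A minimal sketch in plain Scala, where `GraphRegistrationContext` is an empty stand-in and `RuntimeException` stands in for the actual `SparkException`:

```scala
import scala.collection.concurrent.TrieMap

// Empty stand-in for the real GraphRegistrationContext
final class GraphRegistrationContext

object DataflowGraphRegistrySketch {
  private val dataflowGraphs = TrieMap.empty[String, GraphRegistrationContext]

  def register(graphId: String): Unit =
    dataflowGraphs.put(graphId, new GraphRegistrationContext)

  // Option-returning finder
  def getDataflowGraph(graphId: String): Option[GraphRegistrationContext] =
    dataflowGraphs.get(graphId)

  // Throwing variant built on the finder (RuntimeException stands in
  // for SparkException here)
  def getDataflowGraphOrThrow(graphId: String): GraphRegistrationContext =
    getDataflowGraph(graphId).getOrElse {
      throw new RuntimeException(
        s"Dataflow graph with id $graphId could not be found")
    }
}

DataflowGraphRegistrySketch.register("g1")
val found = DataflowGraphRegistrySketch.getDataflowGraphOrThrow("g1")
```

Layering the throwing variant on the `Option` finder keeps the failure message in exactly one place.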

`constructFullRefreshSet` gives the following collections:

* [Table](Table.md)s to be refreshed (incl. a full refresh)
* `TableIdentifier`s of the tables to be refreshed (excl. fully refreshed)
* `TableIdentifier`s of the tables to be fully refreshed only

If there are tables to be fully refreshed yet not allowed for a full refresh, `constructFullRefreshSet` prints out the following INFO message to the logs:

```text
Skipping full refresh on some tables because pipelines.reset.allowed was set to false.
Tables: [fullRefreshNotAllowed]
```

`constructFullRefreshSet`...FIXME

---

`constructFullRefreshSet` is used when:

* `PipelineExecution` is requested to [initialize the dataflow graph](PipelineExecution.md#initializeGraph)
* `GraphExecution` is [created](#creating-instance) (to create the [FlowPlanner](#flowPlanner))
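
Under the behaviour described above, the computation amounts to partitioning the full-refresh candidates by whether a full refresh is allowed. A hedged sketch in plain Scala; `Table`, its flag names, and the function name are simplified stand-ins, not the real signatures:

```scala
// Simplified stand-in for the real Table with its refresh-related flags
final case class Table(name: String, fullRefreshRequested: Boolean, resetAllowed: Boolean)

def constructFullRefreshSetSketch(
    tables: Seq[Table]): (Seq[Table], Seq[String], Seq[String]) = {
  // split the requested full refreshes by whether a reset is allowed
  val (fullRefreshable, fullRefreshNotAllowed) =
    tables.filter(_.fullRefreshRequested).partition(_.resetAllowed)

  if (fullRefreshNotAllowed.nonEmpty)
    println(
      "Skipping full refresh on some tables because pipelines.reset.allowed " +
      s"was set to false. Tables: ${fullRefreshNotAllowed.map(_.name).mkString(", ")}")

  val toFullRefresh = fullRefreshable.map(_.name)
  val toRefresh     = tables.filterNot(t => toFullRefresh.contains(t.name)).map(_.name)
  // (tables to refresh incl. full, refresh-only ids, full-refresh-only ids)
  (tables, toRefresh, toFullRefresh)
}

val ts = Seq(Table("a", true, true), Table("b", true, false), Table("c", false, true))
val (_, refreshOnly, fullRefresh) = constructFullRefreshSetSketch(ts)
```

A table whose full refresh is disallowed falls back into the plain refresh set, which matches the INFO message above.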

`planAndStartFlow` is used when:

* `TriggeredGraphExecution` is requested to [topologicalExecution](TriggeredGraphExecution.md#topologicalExecution)

## StreamListener { #streamListener }

`GraphExecution` creates a new [StreamListener](StreamListener.md) when [created](#creating-instance).

The `StreamListener` is created for this [PipelineUpdateContext](#env) and [DataflowGraph](#graphForExecution).

The `StreamListener` is registered (_added_) to the session-bound [StreamingQueryManager](../SparkSession.md#streams) when [started](#start) and deregistered (_removed_) when [stopped](#stop).

## Stop { #stop }

```scala
stop(): Unit
```

`stop` requests this session-bound [StreamingQueryManager](../SparkSession.md#streams) to remove this [StreamListener](#streamListener).

---

`stop` is used when:

* `PipelineExecution` is requested to [stop the pipeline](PipelineExecution.md#stopPipeline)
* `TriggeredGraphExecution` is requested to [create the Topological Execution thread](TriggeredGraphExecution.md#buildTopologicalExecutionThread) and [stopInternal](TriggeredGraphExecution.md#stopInternal)
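
The start/stop lifecycle around the listener is a plain register/deregister pattern. A minimal sketch in plain Scala, where `StreamListener` and the manager are simplified stand-ins for the Spark classes:

```scala
import scala.collection.mutable

// Stand-ins for StreamingQueryManager and StreamListener
final class StreamListener
final class StreamingQueryManager {
  private val listeners = mutable.Set.empty[StreamListener]
  def addListener(l: StreamListener): Unit = listeners += l
  def removeListener(l: StreamListener): Unit = listeners -= l
  def listenerCount: Int = listeners.size
}

final class GraphExecutionSketch(streams: StreamingQueryManager) {
  // the listener is created together with the instance
  private val streamListener = new StreamListener
  def start(): Unit = streams.addListener(streamListener)    // register
  def stop(): Unit  = streams.removeListener(streamListener) // deregister
}

val streams = new StreamingQueryManager
val exec = new GraphExecutionSketch(streams)
exec.start()
val registered = streams.listenerCount
exec.stop()
val afterStop = streams.listenerCount
```

Creating the listener once per instance and removing the same reference on `stop` ensures the manager never accumulates stale listeners across restarts.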