
Commit f66e290

Declarative Pipelines (nearly) complete 😉
1 parent 8b93a9a commit f66e290

15 files changed: +510 -21 lines changed

‎docs/SparkSession.md

Lines changed: 26 additions & 18 deletions
@@ -52,7 +52,15 @@ val spark = SparkSession.builder
 * `SparkSession.Builder` is requested to [getOrCreate](SparkSession-Builder.md#getOrCreate)
 * Indirectly using [newSession](#newSession) or [cloneSession](#cloneSession)
 
-## <span id="sessionState"> SessionState
+## StreamingQueryManager { #streams }
+
+```scala
+streams: StreamingQueryManager
+```
+
+`streams` requests this [SessionState](#sessionState) for the [StreamingQueryManager](SessionState.md#streamingQueryManager)
+
+## SessionState { #sessionState }
 
 ```scala
 sessionState: SessionState
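
A quick way to see the new `streams` accessor in action is to list the active streaming queries of a session. A minimal sketch (assuming a `SparkSession` named `spark` with some streaming queries already started):

```scala
import org.apache.spark.sql.streaming.StreamingQueryManager

val sqm: StreamingQueryManager = spark.streams

// Print the name, id and status of every active streaming query in this session
sqm.active.foreach { q =>
  println(s"${q.name} (id=${q.id}) isActive=${q.isActive}")
}
```
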
@@ -65,7 +73,7 @@ Internally, `sessionState` <<SessionState.md#clone, clones>> the optional <<pare
 * *in-memory* (default) for SessionStateBuilder.md[org.apache.spark.sql.internal.SessionStateBuilder]
 * *hive* for hive/HiveSessionStateBuilder.md[org.apache.spark.sql.hive.HiveSessionStateBuilder]
 
-## <span id="newSession"> Creating New SparkSession
+## Creating New SparkSession { #newSession }
 
 ```scala
 newSession(): SparkSession
@@ -80,7 +88,7 @@ newSession(): SparkSession
 !!! note "SparkSession.newSession and SparkSession.cloneSession"
     `SparkSession.newSession` uses no parent [SessionState](#parentSessionState) while [SparkSession.cloneSession](#cloneSession) (re)uses [SessionState](#sessionState).
 
-## <span id="cloneSession"> Cloning SparkSession
+## Cloning SparkSession { #cloneSession }
 
 ```scala
 cloneSession(): SparkSession
@@ -119,7 +127,7 @@ version: String
 
 Internally, `version` uses `spark.SPARK_VERSION` value that is the `version` property in `spark-version-info.properties` properties file on CLASSPATH.
 
-## <span id="emptyDataset"> Creating Empty Dataset (Given Encoder)
+## Creating Empty Dataset (Given Encoder) { #emptyDataset }
 
 ```scala
 emptyDataset[T: Encoder]: Dataset[T]
@@ -138,7 +146,7 @@ root
 
 `emptyDataset` creates a [LocalRelation](logical-operators/LocalRelation.md) logical operator.
 
-## <span id="createDataset"> Creating Dataset from Local Collections or RDDs
+## Creating Dataset from Local Collections or RDDs { #createDataset }
 
 ```scala
 createDataset[T : Encoder](
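
For illustration, both creators can be exercised from a Spark shell-like session (a sketch; assumes the implicit encoders from `spark.implicits._`):

```scala
import spark.implicits._

// Empty Dataset with the String encoder
val empty = spark.emptyDataset[String]
empty.printSchema()

// Dataset from a local collection
val words = spark.createDataset(Seq("hello", "spark", "pipelines"))
words.show()
```
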
@@ -246,7 +254,7 @@ scala> sql("SELECT *, myUpper(value) UPPER FROM strs").show
 
 Internally, it is simply an alias for [SessionState.udfRegistration](SessionState.md#udfRegistration).
 
-## <span id="table"> Loading Data From Table
+## Loading Data From Table { #table }
 
 ```scala
 table(
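
A small sketch of `table` in use (the temp view name `t1` and its rows are made up for the example):

```scala
import spark.implicits._

// Register a demo table and load it back by name
Seq((1, "one"), (2, "two")).toDF("id", "name").createOrReplaceTempView("t1")

val t1 = spark.table("t1")
t1.show()
```
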
@@ -282,7 +290,7 @@ catalog: Catalog
 ??? note "lazy value"
     `catalog` is a Scala lazy value which is computed once when accessed and cached afterwards.
 
-## <span id="read"> DataFrameReader
+## DataFrameReader { #read }
 
 ```scala
 read: DataFrameReader
@@ -295,7 +303,7 @@ val spark: SparkSession = ... // create instance
 val dfReader: DataFrameReader = spark.read
 ```
 
-## <span id="conf"> Runtime Configuration
+## Runtime Configuration { #conf }
 
 ```scala
 conf: RuntimeConfig
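
As a sketch, `conf` gives runtime (session-scoped) access to configuration properties (the property name below is just a common example):

```scala
// Read and change a session-scoped property at runtime
val shufflePartitions = spark.conf.get("spark.sql.shuffle.partitions")
println(s"spark.sql.shuffle.partitions = $shufflePartitions")

spark.conf.set("spark.sql.shuffle.partitions", "10")
```
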
@@ -305,7 +313,7 @@ conf: RuntimeConfig
 
 Internally, `conf` creates a [RuntimeConfig](RuntimeConfig.md) (when requested the very first time and cached afterwards) with the [SQLConf](SessionState.md#conf) (of the [SessionState](#sessionState)).
 
-## <span id="experimentalMethods"> ExperimentalMethods
+## ExperimentalMethods { #experimentalMethods }
 
 ```scala
 experimental: ExperimentalMethods
@@ -315,7 +323,7 @@ experimental: ExperimentalMethods
 
 `experimental` is used in [SparkPlanner](SparkPlanner.md) and [SparkOptimizer](SparkOptimizer.md).
 
-## <span id="baseRelationToDataFrame"> Create DataFrame for BaseRelation
+## Create DataFrame for BaseRelation { #baseRelationToDataFrame }
 
 ```scala
 baseRelationToDataFrame(
@@ -330,7 +338,7 @@ Internally, `baseRelationToDataFrame` creates a [DataFrame](DataFrame.md) from t
 * `TextInputCSVDataSource` creates a base `Dataset` (of Strings)
 * `TextInputJsonDataSource` creates a base `Dataset` (of Strings)
 
-## <span id="instantiateSessionState"> Creating SessionState
+## Creating SessionState { #instantiateSessionState }
 
 ```scala
 instantiateSessionState(
@@ -348,7 +356,7 @@ Error while instantiating '[className]'
 
 `instantiateSessionState` is used when `SparkSession` is requested for [SessionState](#sessionState) (based on [spark.sql.catalogImplementation](StaticSQLConf.md#spark.sql.catalogImplementation) configuration property).
 
-## <span id="sessionStateClassName"> sessionStateClassName
+## sessionStateClassName { #sessionStateClassName }
 
 ```scala
 sessionStateClassName(
@@ -362,7 +370,7 @@ sessionStateClassName(
 
 `sessionStateClassName` is used when `SparkSession` is requested for the [SessionState](#sessionState) (and one is not available yet).
 
-## <span id="internalCreateDataFrame"> Creating DataFrame From RDD Of Internal Binary Rows and Schema
+## Creating DataFrame From RDD Of Internal Binary Rows and Schema { #internalCreateDataFrame }
 
 ```scala
 internalCreateDataFrame(
@@ -381,31 +389,31 @@ internalCreateDataFrame(
 
 * [InsertIntoDataSourceCommand](logical-operators/InsertIntoDataSourceCommand.md) logical command is executed
 
-## <span id="listenerManager"> ExecutionListenerManager
+## ExecutionListenerManager { #listenerManager }
 
 ```scala
 listenerManager: ExecutionListenerManager
 ```
 
 [ExecutionListenerManager](ExecutionListenerManager.md)
 
-## <span id="sharedState"> SharedState
+## SharedState { #sharedState }
 
 ```scala
 sharedState: SharedState
 ```
 
 [SharedState](SharedState.md)
 
-## <span id="time"> Measuring Duration of Executing Code Block
+## Measuring Duration of Executing Code Block { #time }
 
 ```scala
 time[T](f: => T): T
 ```
 
 `time` executes a code block and prints out (to standard output) the time taken to execute it
 
-## <span id="applyExtensions"> Applying SparkSessionExtensions
+## Applying SparkSessionExtensions { #applyExtensions }
 
 ```scala
 applyExtensions(
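
For illustration, `time` can wrap any code block, e.g. a simple aggregation (the query is arbitrary; the elapsed time goes to standard output):

```scala
// Time an arbitrary code block; the result of the block is returned unchanged
val result = spark.time {
  spark.range(0, 1000000).selectExpr("sum(id)").collect()
}
```
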
@@ -437,7 +445,7 @@ Cannot use [extensionConfClassName] to configure session extensions.
 * `SparkSession.Builder` is requested to [get active or create a new SparkSession instance](SparkSession-Builder.md#getOrCreate)
 * `SparkSession` is [created](#creating-instance) (from a `SparkContext`)
 
-## <span id="leafNodeDefaultParallelism"> Default Parallelism of Leaf Nodes
+## Default Parallelism of Leaf Nodes { #leafNodeDefaultParallelism }
 
 ```scala
 leafNodeDefaultParallelism: Int

‎docs/declarative-pipelines/DataflowGraph.md

Lines changed: 29 additions & 0 deletions
@@ -28,3 +28,32 @@ reanalyzeFlow(
 
 * `BatchTableWrite` is requested to [executeInternal](BatchTableWrite.md#executeInternal)
 * `StreamingTableWrite` is requested to [startStream](StreamingTableWrite.md#startStream)
+
+## Resolve { #resolve }
+
+```scala
+resolve(): DataflowGraph
+```
+
+`resolve`...FIXME
+
+---
+
+`resolve` is used when:
+
+* `DataflowGraph` is requested to [reanalyzeFlow](#reanalyzeFlow)
+* `PipelineExecution` is requested to [initializeGraph](PipelineExecution.md#initializeGraph)
+
+## Validate { #validate }
+
+```scala
+validate(): DataflowGraph
+```
+
+`validate`...FIXME
+
+---
+
+`validate` is used when:
+
+* `PipelineExecution` is requested to [initialize the dataflow graph](PipelineExecution.md#initializeGraph)

‎docs/declarative-pipelines/DataflowGraphRegistry.md

Lines changed: 44 additions & 0 deletions
@@ -1,5 +1,7 @@
 # DataflowGraphRegistry
 
+`DataflowGraphRegistry` is a registry of [Dataflow Graphs](#dataflowGraphs).
+
 !!! note "Scala object"
     `DataflowGraphRegistry` is an `object` in Scala which means it is a class that has exactly one instance (itself).
     A Scala `object` is created lazily when it is referenced for the first time.
@@ -17,6 +19,14 @@ val graphId = DataflowGraphRegistry.createDataflowGraph(
   defaultSqlConf=Map.empty)
 ```
 
+## Dataflow Graphs { #dataflowGraphs }
+
+```scala
+dataflowGraphs: ConcurrentHashMap[String, GraphRegistrationContext]
+```
+
+`DataflowGraphRegistry` creates an empty collection of [GraphRegistrationContext](GraphRegistrationContext.md)s by their UUIDs.
+
 ## createDataflowGraph { #createDataflowGraph }
 
 ```scala
@@ -33,3 +43,37 @@ createDataflowGraph(
 `createDataflowGraph` is used when:
 
 * `PipelinesHandler` ([Spark Connect]({{ book.spark_connect }})) is requested to [createDataflowGraph](PipelinesHandler.md#createDataflowGraph)
+
+## Find Dataflow Graph (or Throw SparkException) { #getDataflowGraphOrThrow }
+
+```scala
+getDataflowGraphOrThrow(
+  dataflowGraphId: String): GraphRegistrationContext
+```
+
+`getDataflowGraphOrThrow` [looks up the GraphRegistrationContext](#getDataflowGraph) for the given `dataflowGraphId` or throws a `SparkException` if it does not exist.
+
+```text
+Dataflow graph with id [graphId] could not be found
+```
+
+---
+
+`getDataflowGraphOrThrow` is used when:
+
+* `PipelinesHandler` ([Spark Connect]({{ book.spark_connect }})) is requested to [defineDataset](PipelinesHandler.md#defineDataset), [defineFlow](PipelinesHandler.md#defineFlow), [defineSqlGraphElements](PipelinesHandler.md#defineSqlGraphElements), [startRun](PipelinesHandler.md#startRun)
+
+## Find Dataflow Graph { #getDataflowGraph }
+
+```scala
+getDataflowGraph(
+  graphId: String): Option[GraphRegistrationContext]
+```
+
+`getDataflowGraph` finds the [GraphRegistrationContext](GraphRegistrationContext.md) for the given `graphId` (in this [dataflowGraphs](#dataflowGraphs) registry).
+
+---
+
+`getDataflowGraph` is used when:
+
+* `DataflowGraphRegistry` is requested to [getDataflowGraphOrThrow](#getDataflowGraphOrThrow)
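
A minimal sketch of the lookup path described above, assuming a `graphId` previously returned by `createDataflowGraph` (imports and the surrounding setup are omitted):

```scala
// Optional lookup
DataflowGraphRegistry.getDataflowGraph(graphId) match {
  case Some(ctx) => println(s"Found graph registration context: $ctx")
  case None      => println(s"No dataflow graph registered under $graphId")
}

// Fail-fast lookup that throws a SparkException for an unknown id
val ctx = DataflowGraphRegistry.getDataflowGraphOrThrow(graphId)
```
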
‎docs/declarative-pipelines/DatasetManager.md

Lines changed: 50 additions & 1 deletion
@@ -1,3 +1,52 @@
 # DatasetManager
 
-`DatasetManager` is...FIXME
+!!! note "Scala object"
+    `DatasetManager` is an `object` in Scala which means it is a class that has exactly one instance (itself).
+    A Scala `object` is created lazily when it is referenced for the first time.
+
+    Learn more in [Tour of Scala](https://docs.scala-lang.org/tour/singleton-objects.html).
+
+## materializeDatasets { #materializeDatasets }
+
+```scala
+materializeDatasets(
+  resolvedDataflowGraph: DataflowGraph,
+  context: PipelineUpdateContext): DataflowGraph
+```
+
+`materializeDatasets`...FIXME
+
+---
+
+`materializeDatasets` is used when:
+
+* `PipelineExecution` is requested to [initialize the dataflow graph](PipelineExecution.md#initializeGraph)
+
+## constructFullRefreshSet { #constructFullRefreshSet }
+
+```scala
+constructFullRefreshSet(
+  graphTables: Seq[Table],
+  context: PipelineUpdateContext): (Seq[Table], Seq[TableIdentifier], Seq[TableIdentifier])
+```
+
+`constructFullRefreshSet` gives the following collections:
+
+* [Table](Table.md)s to be refreshed (incl. a full refresh)
+* `TableIdentifier`s of the tables to be refreshed (excl. fully refreshed)
+* `TableIdentifier`s of the tables to be fully refreshed only
+
+If there are tables to be fully refreshed yet not allowed for a full refresh, `constructFullRefreshSet` prints out the following INFO message to the logs:
+
+```text
+Skipping full refresh on some tables because pipelines.reset.allowed was set to false.
+Tables: [fullRefreshNotAllowed]
+```
+
+`constructFullRefreshSet`...FIXME
+
+---
+
+`constructFullRefreshSet` is used when:
+
+* `PipelineExecution` is requested to [initialize the dataflow graph](PipelineExecution.md#initializeGraph)
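
A sketch of consuming the triple returned by `constructFullRefreshSet` (assumes a resolved `DataflowGraph` named `graph` that exposes its `Table`s as `graph.tables`, and a `PipelineUpdateContext` named `context`; both names are illustrative):

```scala
// Destructure the three collections returned by constructFullRefreshSet
val (tablesToRefresh, refreshIdents, fullRefreshIdents) =
  DatasetManager.constructFullRefreshSet(graph.tables, context)

println(s"Refreshing ${refreshIdents.size} table(s); fully refreshing ${fullRefreshIdents.size} table(s)")
```
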

‎docs/declarative-pipelines/GraphExecution.md

Lines changed: 42 additions & 1 deletion
@@ -2,7 +2,21 @@
 
 `GraphExecution` is an [abstraction](#contract) of [graph executors](#implementations) that can...FIXME
 
-## Contract
+## Contract (Subset) { #contract }
+
+### awaitCompletion { #awaitCompletion }
+
+```scala
+awaitCompletion(): Unit
+```
+
+See:
+
+* [TriggeredGraphExecution](TriggeredGraphExecution.md#awaitCompletion)
+
+Used when:
+
+* `PipelineExecution` is requested to [await completion](PipelineExecution.md#awaitCompletion)
 
 ### streamTrigger { #streamTrigger }
 
@@ -11,6 +25,10 @@ streamTrigger(
   flow: Flow): Trigger
 ```
 
+See:
+
+* [TriggeredGraphExecution](TriggeredGraphExecution.md#streamTrigger)
+
 Used when:
 
 * `GraphExecution` is [created](#creating-instance) (to create the [FlowPlanner](#flowPlanner))
@@ -66,3 +84,26 @@ planAndStartFlow(
 `planAndStartFlow` is used when:
 
 * `TriggeredGraphExecution` is requested to [topologicalExecution](TriggeredGraphExecution.md#topologicalExecution)
+
+## StreamListener { #streamListener }
+
+`GraphExecution` creates a new [StreamListener](StreamListener.md) when [created](#creating-instance).
+
+The `StreamListener` is created for this [PipelineUpdateContext](#env) and [DataflowGraph](#graphForExecution).
+
+The `StreamListener` is registered (_added_) to the session-bound [StreamingQueryManager](../SparkSession.md#streams) when [started](#start), and deregistered (_removed_) when [stopped](#stop).
+
+## Stop { #stop }
+
+```scala
+stop(): Unit
+```
+
+`stop` requests this session-bound [StreamingQueryManager](../SparkSession.md#streams) to remove this [StreamListener](#streamListener).
+
+---
+
+`stop` is used when:
+
+* `PipelineExecution` is requested to [stop the pipeline](PipelineExecution.md#stopPipeline)
+* `TriggeredGraphExecution` is requested to [create the Topological Execution thread](TriggeredGraphExecution.md#buildTopologicalExecutionThread) and [stopInternal](TriggeredGraphExecution.md#stopInternal)
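
The register/deregister dance above follows Spark's standard streaming listener API. A generic sketch with a hypothetical no-op listener standing in for the pipeline's own `StreamListener`:

```scala
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

// A no-op listener used only to illustrate the add/remove lifecycle
val listener = new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryProgress(event: QueryProgressEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
}

// Registered when the graph execution starts...
spark.streams.addListener(listener)

// ...and removed when it is stopped
spark.streams.removeListener(listener)
```
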

0 commit comments
