Commit 2f94fbc

pulling in docs changes

1 parent 54bb9e9 commit 2f94fbc

21 files changed: +954 −684 lines changed

docs/source/index.rst

Lines changed: 8 additions & 8 deletions
@@ -20,15 +20,15 @@ Oracle Accelerated Data Science (ADS)
 .. toctree::
   :hidden:
   :maxdepth: 5
-  :caption: Getting Started:
+  :caption: Getting Started
 
   release_notes
   user_guide/quick_start/quick_start
 
 .. toctree::
   :hidden:
   :maxdepth: 5
-  :caption: Installation and Configuration:
+  :caption: Installation and Configuration
 
   user_guide/cli/quickstart
   user_guide/cli/authentication
@@ -38,19 +38,19 @@ Oracle Accelerated Data Science (ADS)
 .. toctree::
   :hidden:
   :maxdepth: 5
-  :caption: Low-Code AI Operators:
+  :caption: Low-Code AI Operators
 
   user_guide/operators/index
-  user_guide/operators/common/index
-  user_guide/operators/forecasting_operator/index
+  user_guide/operators/forecast_operator/index
   user_guide/operators/anomaly_detection_operator/index
   user_guide/operators/pii_operator/index
   user_guide/operators/recommender_operator/index
+  user_guide/operators/common/index
 
 .. toctree::
   :hidden:
   :maxdepth: 5
-  :caption: Tasks:
+  :caption: Tasks
 
   user_guide/loading_data/connect
   user_guide/data_labeling/index
@@ -62,7 +62,7 @@ Oracle Accelerated Data Science (ADS)
 .. toctree::
   :hidden:
   :maxdepth: 5
-  :caption: Integrations:
+  :caption: Integrations
 
   user_guide/apachespark/spark
   user_guide/big_data_service/index
@@ -76,7 +76,7 @@ Oracle Accelerated Data Science (ADS)
 .. toctree::
   :hidden:
   :maxdepth: 5
-  :caption: Classes:
+  :caption: Classes
 
   modules

docs/source/release_notes.rst

Lines changed: 1 addition & 1 deletion
@@ -146,7 +146,7 @@ Release date: March 20, 2024
 Release date: February 7, 2024
 
 * Releasing v1 of the Anomaly Detection Operator! The Anomaly Detection Operator is a no-code Anomaly or Outlier Detection solution through the OCI Data Science Platform. It uses dozens of models from Oracle’s own proprietary research and the best of open source. See the ``Anomaly Detection`` Section of the ``AI Operators`` tab for full details (:doc:`link <./user_guide/operators/anomaly_detection_operator/index>`).
-* Releasing a new version of the Forecast Operator. This release has faster explainability, improved support for reading from databases, upgrades to the automatic reporting, improved parallelization across all models, and an ability to save models for deferred inference. See the ``Forecast`` Section of the ``AI Operators`` tab for full details (:doc:`link <./user_guide/operators/forecasting_operator/index>`).
+* Releasing a new version of the Forecast Operator. This release has faster explainability, improved support for reading from databases, upgrades to the automatic reporting, improved parallelization across all models, and an ability to save models for deferred inference. See the ``Forecast`` Section of the ``AI Operators`` tab for full details (:doc:`link <./user_guide/operators/forecast_operator/index>`).
 * Change to the default signer such that it now defaults to ``resource_principal`` on any OCI Data Science resource (for example, jobs, notebooks, model deployments, dataflow).
 
 2.10.0

docs/source/user_guide/operators/common/index.rst

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
-===============
-Getting Started
-===============
+====================
+More About Operators
+====================
 
 Welcome to the world of operators! Getting started with operators is a breeze, and this section will guide you through the process step by step. Whether you're a seasoned data scientist or a newcomer, you'll find that harnessing the power of operators is both accessible and rewarding.
Lines changed: 98 additions & 0 deletions

=================
Data Integration
=================

Supported Data Sources
----------------------

The Operator can read data from the following sources:

- Oracle RDBMS
- OCI Object Storage
- OCI Data Lake
- HTTPS
- S3
- Azure Blob Storage
- Google Cloud Storage
- Local file systems

Additionally, the operator supports any data source supported by `fsspec <https://filesystem-spec.readthedocs.io/en/latest/_modules/fsspec/registry.html>`_.
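Because the ``url`` field is resolved through fsspec, any protocol fsspec registers will work. As a rough, illustrative sketch (not part of the operator itself), the same mechanism can be exercised directly with ``fsspec`` and ``pandas``; swap ``file://`` for ``oci://``, ``s3://``, and so on:

.. code-block:: python

  import fsspec
  import pandas as pd

  # Illustration only: the operator resolves ``url`` via fsspec, so any
  # registered protocol (oci://, s3://, gs://, https://, local paths) works.
  # Round-trip a small CSV through the local filesystem to show the API.
  pd.DataFrame({"ds": ["2024-01-01", "2024-01-02"], "y": [1.0, 2.0]}).to_csv(
      "example.csv", index=False
  )
  with fsspec.open("file://example.csv", "r") as f:
      df = pd.read_csv(f)
  print(df.shape)  # (2, 2)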
Examples
--------

Reading from OCI Object Storage
===============================

Below is an example of reading data from OCI Object Storage using the operator:

.. code-block:: yaml

  kind: operator
  type: forecast
  version: v1
  spec:
    datetime_column:
      name: ds
    historical_data:
      url: oci://<bucket_name>@<namespace_name>/example_yosemite_temps.csv
    horizon: 3
    target_column: y

Reading from Oracle Database
============================

Below is an example of reading data from an Oracle Database:

.. code-block:: yaml

  kind: operator
  type: forecast
  version: v1
  spec:
    historical_data:
      connect_args:
        user: XXX
        password: YYY
        dsn: "localhost/orclpdb"
      sql: 'SELECT Store_ID, Sales, Date FROM live_data'
    datetime_column:
      name: ds
    horizon: 1
    target_column: y
Data Preprocessing
------------------

The forecasting operator simplifies powerful data preprocessing. By default, it applies several preprocessing steps to make the dataset compliant with each framework. Users can disable one or more of these steps, but doing so may cause the model to fail; proceed with caution.

The default preprocessing steps are:

- Missing value imputation
- Outlier treatment

To disable ``outlier_treatment``, modify the YAML file as shown below:

.. code-block:: yaml

  kind: operator
  type: forecast
  version: v1
  spec:
    datetime_column:
      name: ds
    historical_data:
      url: https://raw.githubusercontent.com/facebook/prophet/main/examples/example_yosemite_temps.csv
    horizon: 3
    target_column: y
    preprocessing:
      enabled: true
      steps:
        missing_value_imputation: true
        outlier_treatment: false
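For intuition only, here is a minimal sketch of what these two default steps typically amount to; the operator's actual implementation may differ:

.. code-block:: python

  import pandas as pd

  # Hypothetical illustration of the two default preprocessing steps.
  s = pd.Series([1.0, None, 3.0, 100.0, 5.0])

  # Missing value imputation: fill the gap by linear interpolation -> 2.0.
  imputed = s.interpolate(method="linear")

  # Outlier treatment (one common approach): clip values far from the mean.
  mean, std = imputed.mean(), imputed.std()
  treated = imputed.clip(lower=mean - 3 * std, upper=mean + 3 * std)
  print(imputed.tolist())  # [1.0, 2.0, 3.0, 100.0, 5.0]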
Real-Time Trigger
-----------------

The Operator can be run locally or on an OCI Data Science Job. The resulting model can be saved and deployed for later use if needed. For questions regarding this integration, please reach out to the OCI Data Science team.
Lines changed: 176 additions & 0 deletions

============
Development
============

Data Formatting
---------------

Datetime Column
===============

Operators read data in "long" format, which requires a datetime column with a constant frequency (e.g., daily, quarterly, hourly). The operator will attempt to infer the datetime format, but if it is ambiguous, users can specify the format explicitly in the ``format`` field of ``datetime_column``, as shown below:

.. code-block:: yaml

  kind: operator
  type: forecast
  version: v1
  spec:
    datetime_column:
      name: ds
      format: "%Y-%m-%d"
    historical_data:
      url: oci://<bucket_name>@<namespace_name>/example_yosemite_temps.csv
    horizon: 3
    target_column: y
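To see why an explicit format matters, consider an ambiguous date such as ``01-02-2024``, which could mean January 2 or February 1. The ``format`` field uses the same ``strftime`` codes as ``pandas``, so the ambiguity is easy to demonstrate:

.. code-block:: python

  import pandas as pd

  # "01-02-2024" parses to a different month depending on the format string.
  value = pd.to_datetime("01-02-2024", format="%m-%d-%Y")  # January 2
  alt = pd.to_datetime("01-02-2024", format="%d-%m-%Y")    # February 1
  print(value.month, alt.month)  # 1 2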
Target Category Columns
=======================

A target category column, or series column, is optional. Use this field when you have multiple related forecasts over the same time period, such as predicting sales across different stores, forecasting system failures across multiple sensors, or forecasting different line items of a financial statement. ``target_category_columns`` is a list of column names, though it typically contains just one. If ``target_category_columns`` is specified in the ``historical_data``, those columns must also be present, for all time periods, in the ``additional_data``. Below is an example dataset and the corresponding YAML:

Example Dataset:

======= ======= ======
Product Qtr     Sales
======= ======= ======
A       01-2024 $7,500
B       01-2024 $4,500
C       01-2024 $8,500
A       04-2024 $9,500
B       04-2024 $6,500
C       04-2024 $9,500
======= ======= ======

YAML Configuration:

.. code-block:: yaml

  kind: operator
  type: forecast
  version: v1
  spec:
    datetime_column:
      name: Qtr
      format: "%m-%Y"
    historical_data:
      url: historical_data.csv
    target_category_columns:
      - Product
    horizon: 1
    target_column: Sales
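To make the long format concrete, the example dataset above can be built in ``pandas`` as follows (illustration only): each distinct value of the target category column defines one series to forecast.

.. code-block:: python

  import pandas as pd

  # The example dataset in long format: one row per (Product, Qtr) pair.
  df = pd.DataFrame({
      "Product": ["A", "B", "C", "A", "B", "C"],
      "Qtr": ["01-2024"] * 3 + ["04-2024"] * 3,
      "Sales": [7500, 4500, 8500, 9500, 6500, 9500],
  })

  # Each distinct Product value is forecast as its own series.
  n_series = df["Product"].nunique()
  print(n_series)  # 3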
Additional Data
===============

Additional data enables multivariate forecasts and must adhere to similar formatting rules as the historical data:

- It must include a datetime column with formatting identical to the historical data.
- If a target category column is present in the historical data, it must also be present in the additional data.
- The additional data must cover the entire forecast horizon.

Continuing with the previous example, for a horizon of 1, the additional data would look like this:

======= ======= ========= ==================
Product Qtr     Promotion Competitor Release
======= ======= ========= ==================
A       01-2024 0         0
B       01-2024 0         1
C       01-2024 1         1
A       04-2024 1         1
B       04-2024 0         0
C       04-2024 0         0
A       07-2024 0         0
B       07-2024 0         0
C       07-2024 0         0
======= ======= ========= ==================

Corresponding YAML Configuration:

.. code-block:: yaml

  kind: operator
  type: forecast
  version: v1
  spec:
    datetime_column:
      name: Qtr
      format: "%m-%Y"
    historical_data:
      url: data.csv
    additional_data:
      url: additional_data.csv
    target_category_columns:
      - Product
    horizon: 1
    target_column: Sales
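A quick way to sanity-check the third rule: the additional data should extend ``horizon`` periods beyond the last historical date for every series. A rough, illustrative check in ``pandas``:

.. code-block:: python

  import pandas as pd

  # Illustrative check that additional data covers the forecast horizon.
  hist_qtrs = pd.to_datetime(["01-2024", "04-2024"], format="%m-%Y")
  addl_qtrs = pd.to_datetime(["01-2024", "04-2024", "07-2024"], format="%m-%Y")
  horizon = 1

  # Dates present only in the additional data are the horizon periods.
  extra_periods = addl_qtrs.difference(hist_qtrs)
  print(len(extra_periods) == horizon)  # True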
Output Directory
================

Before running operators on a job, users must configure their output directory. By default, results are written locally to a new folder named ``results``. This can be customized as shown below:

.. code-block:: yaml

  kind: operator
  type: forecast
  version: v1
  spec:
    datetime_column:
      name: ds
    historical_data:
      url: oci://<bucket_name>@<namespace_name>/example_yosemite_temps.csv
    output_directory:
      url: oci://<bucket_name>@<namespace_name>/my_results/
    horizon: 3
    target_column: y
Ingesting and Interpreting Outputs
==================================

The forecasting operator generates several output files: ``forecast.csv``, ``metrics.csv``, ``local_explanations.csv``, ``global_explanations.csv``, and ``report.html``. We will review each of these in turn.

**forecast.csv**

This file contains the full historical dataset together with the forecast, with the following columns:

- **Series**: Categorical or numerical index
- **Date**: Time series data
- **Real values**: Target values from the historical data
- **Fitted values**: Model predictions on the historical data
- **Forecasted values**: Predictions for the forecast horizon
- **Upper and lower bounds**: Confidence intervals for the predictions (based on the confidence interval width specified in the YAML file)
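Since ``forecast.csv`` is a plain CSV, it can be post-processed directly, for example to pull out only the horizon rows for one series. The column names below are placeholders, not the operator's actual schema; check the header of your generated file:

.. code-block:: python

  import pandas as pd

  # Hypothetical frame standing in for forecast.csv; real column names differ.
  forecast = pd.DataFrame({
      "Series": ["A", "A", "A"],
      "Date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
      "fitted_value": [9.8, 10.1, None],
      "forecast_value": [None, None, 10.6],
  })

  # Horizon rows are those that carry a forecasted value.
  horizon_rows = forecast[forecast["forecast_value"].notna()]
  print(len(horizon_rows))  # 1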
**report.html**

The ``report.html`` file is customized for each model type. Generally, it contains a summary of the historical and additional data, plots of target values overlaid with fitted and forecasted values, an analysis of the models used, and details about the model components. It also includes a "receipt" YAML file, providing a detailed version of the original ``forecast.yaml``.

**metrics.csv**

This file includes the relevant metrics calculated on the training set.

**Global and Local Explanations in Forecasting Models**

Understanding the predictions and the driving factors behind them is crucial in forecasting models. Global and local explanations offer insights at different levels of granularity.

**Global Explanations:**

Global explanations provide a high-level overview of how a forecasting model operates across the entire dataset. Key aspects include:

1. **Feature Importance**: Identifies and ranks variables based on their contribution to the model's predictions.
2. **Model Structure**: Reveals the architecture, algorithms, parameters, and hyperparameters used in the model.
3. **Trends and Patterns**: Highlights broad trends and patterns captured by the model, such as seasonality and long-term trends.
4. **Assumptions and Constraints**: Uncovers the model's underlying assumptions or constraints.

**Local Explanations:**

Local explanations focus on specific data points or subsets, offering detailed insights into why the model made particular predictions. Key aspects include:

1. **Instance-specific Insights**: Details how individual features contributed to a specific prediction.
2. **Contextual Understanding**: Considers the unique characteristics of the data point in question.
3. **Model Variability**: Shows the model's sensitivity to changes in input variables.
4. **Decision Boundaries**: In classification settings, explains the factors influencing specific outcomes.

In short, global explanations offer a broad understanding of the model, while local explanations provide detailed insights at the level of individual data points.
Lines changed: 23 additions & 0 deletions

====
FAQs
====

**How can I learn more about AutoMLX?**

For more details, refer to the official documentation: `AutoMLX Documentation <https://docs.oracle.com/en-us/iaas/tools/automlx/latest/html/multiversion/latest/automl.html>`_

**How can I learn more about AutoTS?**

For more details, refer to the official documentation: `AutoTS Documentation <https://winedarksea.github.io/AutoTS/build/html/source/tutorial.html>`_

**How do you handle missing values?**

By default, missing values are imputed using linear interpolation.

**Is there a way to specify the percentage increase that should be marked as an anomaly?**

Yes, the ``contamination`` parameter can be used to control the percentage of anomalies. The default value is 0.1 (10%).

**How is seasonality handled?**

Seasonality is handled differently by each modeling framework. Refer to the specific model's documentation for more detailed information.
