Commit a577cce

Updated user guide.
1 parent 07f204b commit a577cce

File tree

6 files changed: +67 -49 lines changed

docs/source/user_guide/data_transformation/data_transformation.rst

Lines changed: 8 additions & 5 deletions

@@ -3,7 +3,7 @@
 Transform Data
 ##############

-When datasets are loaded with DatasetFactory, they can be transformed and manipulated easily with the built-in functions. Underlying, an ``ADSDataset`` object is a Pandas dataframe. Any operation that can be performed to a `Pandas dataframe <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html>`_ can also be applied to an ADS Dataset.
+When datasets are loaded, they can be transformed and manipulated easily with the built-in functions. Underlying, an ``ADSDataset`` object is a Pandas dataframe. Any operation that can be performed to a `Pandas dataframe <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html>`_ can also be applied to an ADS Dataset.

 Loading the Dataset
 ********************
@@ -12,9 +12,9 @@ You can load a ``pandas`` dataframe into an ``ADSDataset`` by calling.

 .. code-block:: python3

-    from ads.dataset.factory import DatasetFactory
+    from ads.dataset.dataset import ADSDataset

-    ds = DatasetFactory.from_dataframe(df)
+    ds = ADSDataset.from_dataframe(df)


 Automated Transformations
@@ -513,11 +513,14 @@ The resulting three data subsets each have separate data (X) and labels (y).
     print(train.X) # print out all features in train dataset
     print(train.y) # print out labels in train dataset

-You can split the dataset right after the ``DatasetFactory.open()`` statement:
+You can split the dataset right after the ``ADSDatasetWithTarget.from_dataframe()`` statement:

 .. code-block:: python3

-    ds = DatasetFactory.open("path/data.csv").set_target('target')
+    ds = ADSDatasetWithTarget.from_dataframe(
+        df=pd.read_csv("path/data.csv"),
+        target="target"
+    )
     train, test = ds.train_test_split(test_size=0.25)

 Text Data
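Taken together, the new split flow shown in these hunks would look roughly like the sketch below. It assumes a recent oracle-ads release in which ``ADSDatasetWithTarget.from_dataframe`` replaces ``DatasetFactory.open(...).set_target(...)``, and a hypothetical ``path/data.csv`` containing a ``target`` column; the sketch itself is not part of the commit.

.. code-block:: python3

    import pandas as pd
    from ads.dataset.dataset_with_target import ADSDatasetWithTarget

    # Build a supervised dataset from a local CSV (hypothetical path and target column).
    ds = ADSDatasetWithTarget.from_dataframe(
        df=pd.read_csv("path/data.csv"),
        target="target",
    )

    # Split into train and test subsets; features (X) and labels (y) stay paired.
    train, test = ds.train_test_split(test_size=0.25)
    print(train.X.shape, train.y.shape)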

docs/source/user_guide/loading_data/connect.rst

Lines changed: 6 additions & 6 deletions

@@ -526,34 +526,34 @@ To load a dataframe from a remote web server source, use ``pandas`` directly and
 Convert Pandas DataFrame to ``ADSDataset``
 ==========================================

-To convert a Pandas dataframe to ``ADSDataset``, pass the ``pandas.DataFrame`` object directly into the ADS ``DatasetFactory.open`` method:
+To convert a Pandas dataframe to ``ADSDataset``, pass the ``pandas.DataFrame`` object directly into the ADS ``ADSDataset`` constructor or ``ADSDataset.from_dataframe()`` method:

 .. code-block:: python3

     import pandas as pd
-    from ads.dataset.factory import DatasetFactory
+    from ads.dataset.dataset import ADSDataset

     df = pd.read_csv('/path/some_data.csv') # load data with Pandas

     # use open...

-    ds = DatasetFactory.open(df) # construct **ADS** Dataset from DataFrame
+    ds = ADSDataset(df) # construct **ADS** Dataset from DataFrame

     # alternative form...

-    ds = DatasetFactory.from_dataframe(df)
+    ds = ADSDataset.from_dataframe(df)

     # an example using Pandas to parse data on the clipboard as a CSV and construct an ADS Dataset object
     # this allows easily transferring data from an application like Microsoft Excel, Apple Numbers, etc.

-    ds = DatasetFactory.from_dataframe(pd.read_clipboard())
+    ds = ADSDataset.from_dataframe(pd.read_clipboard())

     # use Pandas to query a SQL database:

     from sqlalchemy import create_engine
     engine = create_engine('dialect://user:pass@host:port/schema', echo=False)
     df = pd.read_sql_query('SELECT * FROM mytable', engine, index_col = 'ID')
-    ds = DatasetFactory.from_dataframe(df)
+    ds = ADSDataset.from_dataframe(df)


 Using ``PyArrow``
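As a quick check of the replacement calls in this file, here is a minimal sketch with an in-memory frame (assuming a recent oracle-ads release; no storage credentials are involved):

.. code-block:: python3

    import pandas as pd
    from ads.dataset.dataset import ADSDataset

    df = pd.DataFrame({"c1": [1, 2, 3], "c2": ["a", "b", "c"]})

    # The constructor and the factory method shown in the diff are interchangeable here.
    ds_a = ADSDataset(df)
    ds_b = ADSDataset.from_dataframe(df)

    # Both keep the Pandas-like interface described in the quickstart.
    print(ds_a.head())
    print(ds_b.head())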

docs/source/user_guide/loading_data/connect_legacy.rst

Lines changed: 35 additions & 23 deletions

@@ -1,5 +1,5 @@
-Connect with ``DatasetFactory``
-*******************************
+Connect with ``ADSDataset`` and ``ADSDatasetWithTarget``
+********************************************************


 .. admonition:: Deprecation Note |deprecated|
@@ -25,7 +25,8 @@ Begin by loading the required libraries and modules:
     import pandas as pd

     from ads.dataset.dataset_browser import DatasetBrowser
-    from ads.dataset.factory import DatasetFactory
+    from ads.dataset.dataset import ADSDataset
+    from ads.dataset.dataset_with_target import ADSDatasetWithTarget

 Object Storage
 ==============
@@ -37,14 +38,15 @@ To open a dataset from Object Storage using the resource principal method, you c
     import ads
     import os

-    from ads.dataset.factory import DatasetFactory
-
     ads.set_auth(auth='resource_principal')
     bucket_name = <bucket-name>
     file_name = <file-name>
     namespace = <namespace>
     storage_options = {'config':{}, 'tenancy': os.environ['TENANCY_OCID'], 'region': os.environ['NB_REGION']}
-    ds = DatasetFactory.open(f"oci://{bucket_name}@{namespace}/{file_name}", storage_options=storage_options)
+    ds = ADSDataset(
+        df=pd.read_csv(f"oci://{bucket_name}@{namespace}/{file_name}.csv"),
+        storage_options=storage_options
+    )


 To open a dataset from Object Storage using the Oracle Cloud Infrastructure configuration file method, include the location of the file using this format ``oci://<bucket_name>@<namespace>/<file_name>`` and modify the optional parameter ``storage_options``. Insert:
@@ -56,19 +58,22 @@ For example:

 .. code-block:: python3

-    ds = DatasetFactory.open("oci://<bucket_name>@<namespace>/<file_name>", storage_options = {
+    ds = ADSDataset(
+        df=pd.read_csv(f"oci://{bucket_name}@{namespace}/{file_name}.csv"),
+        storage_options={
            "config": "~/.oci/config",
            "profile": "DEFAULT"
-    })
+        }
+    )

 Local Storage
 =============

-To open a dataset from a local source, use ``DatasetFactory.open`` and specify the path of the data file:
+To open a dataset from a local source, use ``ADSDataset`` and specify the path of the data file:

 .. code-block:: python3

-    ds = DatasetFactory.open("/path/to/data.data", format='csv', delimiter=" ")
+    ds = ADSDataset(df=pd.read_csv("/path/to/data.csv"))

 Oracle Database
 ---------------
@@ -122,9 +127,11 @@ You can also use ``cx_Oracle`` within ADS by creating a connection string:
 .. code-block:: python3

     os.environ['TNS_ADMIN'] = creds['tns_admin']
-    from ads.dataset.factory import DatasetFactory
+    from ads.dataset.dataset import ADSDataset
     uri = 'oracle+cx_oracle://' + creds['user'] + ':' + creds['password'] + '@' + creds['sid']
-    ds = DatasetFactory.open(uri, format="sql", table=table, index_col=index_col)
+    ds = ADSDataset(
+        df=pd.read_sql(uri, table=table, index_col=index_col)
+    )

 Autonomous Database
 ===================
@@ -148,13 +155,13 @@ After you have stored the ADB username, password, and database name (SID) as var

     uri = 'oracle+cx_oracle://' + creds['user'] + ':' + creds['password'] + '@' + creds['sid']

-You can use ADS to query a table from your database, and then load that table as an ``ADSDataset`` object through ``DatasetFactory``.
-When you open ``DatasetFactory``, specify the name of the table you want to pull using the ``table`` variable for a given table. For SQL expressions, use the table parameter also. For example, *(`table="SELECT * FROM sh.times WHERE rownum <= 30"`)*.
+You can use ADS to query a table from your database, and then load that table as an ``ADSDatasetWithTarget`` object.
+When you open ``ADSDatasetWithTarget``, specify the name of the table you want to pull using the ``table`` variable for a given table. For SQL expressions, use the table parameter also. For example, *(`table="SELECT * FROM sh.times WHERE rownum <= 30"`)*.

 .. code-block:: python3

     os.environ['TNS_ADMIN'] = creds['tns_admin']
-    ds = DatasetFactory.open(uri, format="sql", table=table, target='label')
+    ds = ADSDatasetWithTarget(df=pd.read_sql(uri, table=table), target='label')

 Query ADB
 ---------
@@ -172,11 +179,11 @@ Query ADB
     engine = create_engine(uri)
     df = pd.read_sql('SELECT * from <TABLENAME>', con=engine)

-You can convert the ``pd.DataFrame`` into ``ADSDataset`` using the ``DatasetFactory.from_dataframe()`` function.
+You can convert the ``pd.DataFrame`` into ``ADSDataset`` using the ``ADSDataset.from_dataframe()`` function.

 .. code-block:: python3

-    ds = DatasetFactory.from_dataframe(df)
+    ds = ADSDataset.from_dataframe(df)

 These two examples run a simple query on ADW data. With ``read_sql_query`` you can use SQL expressions not just for tables, but also to limit the number of rows and to apply conditions with filters, such as (``where``).

@@ -207,7 +214,7 @@ You can also query data from ADW using cx_Oracle. Use the cx_Oracle 7.0.0 versio
     data = results.fetchall()
     df = pd.DataFrame(np.array(data))

-    ds = DatasetFactory.from_dataframe(df)
+    ds = ADSDataset.from_dataframe(df)

 .. code-block:: python3

@@ -230,7 +237,7 @@ This example adds predictions programmatically using cx_Oracle. It uses ``execut

 .. code-block:: python3

-    ds = DatasetFactory.open("iris.csv")
+    ds = ADSDataset(pd.read_csv("iris.csv"))

     create_table = '''CREATE TABLE IRIS_PREDICTED (,
         sepal_length number,
@@ -269,24 +276,29 @@ You can open Amazon S3 public or private files in ADS. For private files, you mu

 .. code-block:: python3

-    ds = DatasetFactory.open("s3://bucket_name/iris.csv", storage_options = {
+    ds = ADSDataset(
+        df=pd.read_csv("s3://bucket_name/iris.csv"),
+        storage_options = {
         'key': 'aws key',
         'secret': 'aws secret',
         'blocksize': 1000000,
         'client_kwargs': {
-            "endpoint_url": "https://s3-us-west-1.amazonaws.com"
+                "endpoint_url": "https://s3-us-west-1.amazonaws.com"
         }
     })


 HTTP(S) Sources
 ===============

-To open a dataset from a remote web server source, use ``DatasetFactory.open()`` and specify the URL of the data:
+To open a dataset from a remote web server source, use ``ADSDatasetWithTarget`` and specify the URL of the data:

 .. code-block:: python3

-    ds = DatasetFactory.open('https://example.com/path/to/data.csv', target='label')
+    ds = ADSDatasetWithTarget(
+        df=pd.read_csv('https://example.com/path/to/data.csv'),
+        target='label'
+    )


 ``DatasetBrowser``
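One caveat on the Object Storage and S3 hunks above: when the object itself sits behind authenticated storage, ``pandas.read_csv`` also needs the credentials, which it accepts through its own ``storage_options`` argument (forwarded to ``ocifs`` or ``s3fs``). A sketch of that variant, offered as an assumption rather than as part of this commit:

.. code-block:: python3

    import os

    import ads
    import pandas as pd
    from ads.dataset.dataset import ADSDataset

    ads.set_auth(auth='resource_principal')

    bucket_name = "<bucket-name>"
    file_name = "<file-name>"
    namespace = "<namespace>"
    storage_options = {'config': {}, 'tenancy': os.environ['TENANCY_OCID'], 'region': os.environ['NB_REGION']}

    # Let pandas authenticate the read, then wrap the resulting frame.
    df = pd.read_csv(
        f"oci://{bucket_name}@{namespace}/{file_name}.csv",
        storage_options=storage_options,
    )
    ds = ADSDataset(df)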

docs/source/user_guide/loading_data/supported_format.rst

Lines changed: 2 additions & 2 deletions

@@ -3,11 +3,11 @@ Supported Formats

 You can load datasets into ADS, either locally or from network file systems.

-You can open datasets with ``DatasetFactory``, ``DatasetBrowser`` or ``pandas``. ``DatasetFactory`` allows datasets to be loaded into ADS.
+You can open datasets with ``DatasetBrowser`` or ``pandas``.

 ``DatasetBrowser`` supports opening the datasets from web sites and libraries, such as scikit-learn directly into ADS.

-When you open a dataset in ``DatasetFactory``, you can get the summary statistics, correlations, and visualizations of the dataset.
+When you load a dataset in ``ADSDataset`` from ``pandas.DataFrame``, you can get the summary statistics, correlations, and visualizations of the dataset.

 ADS Supports:

docs/source/user_guide/model_catalog/model_catalog.rst

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -17,14 +17,15 @@ provenance, reproduced, and deployed.
1717
import os
1818
import tempfile
1919
import warnings
20+
import pandas as pd
2021
2122
from ads.catalog.model import ModelCatalog
2223
from ads.common.model import ADSModel
2324
from ads.common.model_export_util import prepare_generic_model
2425
from ads.common.model_metadata import (MetadataCustomCategory,
2526
UseCaseType,
2627
Framework)
27-
from ads.dataset.factory import DatasetFactory
28+
from ads.dataset.dataset_with_target import ADSDatasetWithTarget
2829
from ads.feature_engineering.schema import Expression, Schema
2930
from os import path
3031
from sklearn.ensemble import RandomForestClassifier
@@ -97,7 +98,7 @@ The ``RandomForestClassifier`` object is converted to into an ``ADSModel`` using
9798
# Load the dataset
9899
ds_path = path.join("/", "opt", "notebooks", "ads-examples", "oracle_data", "oracle_classification_dataset1_150K.csv")
99100
100-
ds = DatasetFactory.open(ds_path, target="class")
101+
ds = ADSDatasetWithTarget(df=pd.read_csv(ds_path), target="class")
101102
102103
# Data preprocessing
103104
transformed_ds = ds.auto_transform(fix_imbalance=False)
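For reference, the updated loading step combined with the preprocessing already in this guide would read roughly as follows; the trailing ``train_test_split`` is my extrapolation from the data-transformation page and is not shown in this hunk:

.. code-block:: python3

    from os import path

    import pandas as pd
    from ads.dataset.dataset_with_target import ADSDatasetWithTarget

    # Example CSV path used throughout the model catalog guide.
    ds_path = path.join("/", "opt", "notebooks", "ads-examples", "oracle_data",
                        "oracle_classification_dataset1_150K.csv")

    ds = ADSDatasetWithTarget(df=pd.read_csv(ds_path), target="class")

    # Data preprocessing, then a train/test split on the transformed dataset.
    transformed_ds = ds.auto_transform(fix_imbalance=False)
    train, test = transformed_ds.train_test_split(test_size=0.25)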

docs/source/user_guide/quickstart/quickstart.rst

Lines changed: 13 additions & 11 deletions

@@ -59,9 +59,9 @@ variable during modeling. The type of this target determines what type of modeli
 to use (regression, binary, and multi-class classification, or time series forecasting).

 There are several ways to turn data into an ``ADSDataset``. The simplest way is to
-use `DatasetFactory`, which takes as its first argument as a string URI or a
-``Pandas Dataframe`` object. The URI supports many formats, such as Object Storage
-or S3 files. The
+use the `ADSDataset` or `ADSDatasetWithTarget` constructor, which takes a
+``Pandas Dataframe`` object as its first argument. The ``Pandas Dataframe`` can be loaded
+from many sources, such as Object Storage or S3 files. The
 `class documentation <https://docs.cloud.oracle.com/en-us/iaas/tools/ads-sdk/latest/modules.html>_` describes all classes.

 For example:
@@ -77,12 +77,12 @@ For example:
     df = pd.DataFrame(data.data, columns=data.feature_names)
     df["species"] = data.target

-    from ads.dataset.factory import DatasetFactory
+    from ads.dataset.dataset_with_target import ADSDatasetWithTarget

     # these two are equivalent:
-    ds = DatasetFactory.open(df, target="species")
+    ds = ADSDatasetWithTarget(df, target="species")
     # OR
-    ds = DatasetFactory.from_dataframe(df, target="species")
+    ds = ADSDatasetWithTarget.from_dataframe(df, target="species")

 The ``ds`` (``ADSDataset``) object is ``Pandas`` like. For example, you can use ``ds.head()``. It's
 an encapsulation of a `Pandas` Dataframe with immutability. Any attempt to
@@ -93,7 +93,7 @@ modify the data yields a new copy-on-write of the ``ADSDataset``.
 to memory. ADS also samples the dataset for visualization purposes, computes
 co-correlation of the columns in the dataset, and performs type discovery on the
 different columns in the dataset. That is why loading a dataset with
-``DatasetFactory`` can be slower than simply reading the same dataset
+``ADSDataset`` can be slower than simply reading the same dataset
 with ``Pandas``. In return, you get the added data visualizations and data
 profiling benefits of the ``ADSDataset`` object.

@@ -113,10 +113,12 @@ modify the data yields a new copy-on-write of the ``ADSDataset``.

     pd.DataFrame({'c1':[1,2,3], 'target': ['yes', 'no', 'yes']}).to_csv('Users/ysz/data/sample.csv')

-    ds = DatasetFactory.open('Users/ysz/data/sample.csv',
-                             target = 'target',
-                             type_discovery = False, # turn off ADS type discovery
-                             types = {'target': 'category'}) # specify target type
+    ds = ADSDatasetWithTarget(
+        df=pd.read_csv('Users/ysz/data/sample.csv'),
+        target='target',
+        type_discovery=False, # turn off ADS type discovery
+        types={'target': 'category'} # specify target type
+    )

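Finally, a runnable version of the quickstart example above; it assumes the sample frame is built from scikit-learn's iris data as in the surrounding guide, while the ``type_discovery`` and ``types`` arguments come straight from the hunk:

.. code-block:: python3

    import pandas as pd
    from sklearn.datasets import load_iris
    from ads.dataset.dataset_with_target import ADSDatasetWithTarget

    # Sample data; the guide builds a similar frame from the iris dataset.
    data = load_iris()
    df = pd.DataFrame(data.data, columns=data.feature_names)
    df["species"] = data.target

    ds = ADSDatasetWithTarget(
        df=df,
        target="species",
        type_discovery=False,           # turn off ADS type discovery
        types={"species": "category"},  # declare the target type explicitly
    )
    print(ds.head())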