docs/source/user_guide/big_data_service/file_management.rst
+3 -4 (3 additions & 4 deletions)
@@ -71,7 +71,7 @@ Upload
 ------

 The `.put() <https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.put>`_ method is used to upload files from local storage to HDFS. The first parameter is the local path of the files to upload. The second parameter is the HDFS path where the files are to be stored.
-`.upload() <https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.upload>`_ is an alias of `.put()`.
+`.upload() <https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.upload>`_ is an alias of ``.put()``.
 .. code-block:: python3

     fs.put(
@@ -82,7 +82,7 @@ The `.put() <https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.sp
 Ibis
 ====

-`Ibis <https://github.com/ibis-project/ibis>`_ is an open-source library by `Cloudera <https://www.cloudera.com/>`_ that provides a Python framework to access data and perform analytical computations from different sources. Ibis allows access to the data using HDFS. You use the ``ibis.impala.hdfs_connect()`` method to make a connection to HDFS, and it returns a handler. This handler has methods such as ``.ls()`` to list, ``.get()`` to download, ``.put()`` to upload, and ``.rm()`` to delete files. These operations support globbing. Ibis' HDFS connector supports a variety of `additional operations <https://ibis-project.org/docs/dev/backends/Impala/#hdfs-interaction>`_.
+`Ibis <https://github.com/ibis-project/ibis>`_ is an open-source library by `Cloudera <https://www.cloudera.com/>`_ that provides a Python framework to access data and perform analytical computations from different sources. Ibis allows access to the data using HDFS. You use the ``ibis.impala.hdfs_connect()`` method to make a connection to HDFS, and it returns a handler. This handler has methods such as ``.ls()`` to list, ``.get()`` to download, ``.put()`` to upload, and ``.rm()`` to delete files. These operations support globbing. Ibis' HDFS connector supports a variety of `additional operations <https://ibis-project.org/backends/impala/#hdfs-interaction>`_.
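The globbing mentioned above can be pictured with a minimal sketch that uses only the Python standard library. It does not call Ibis or HDFS: the paths are hypothetical, and ``fnmatch`` only approximates shell-style wildcards, so the matching rules of the Ibis HDFS handler may differ in detail.

```python
import fnmatch

# Hypothetical HDFS listing; these paths are made up for illustration.
hdfs_paths = [
    "/data/sales_2021.csv",
    "/data/sales_2022.csv",
    "/data/readme.txt",
]

# Select every CSV file under /data with a shell-style wildcard pattern.
csv_files = fnmatch.filter(hdfs_paths, "/data/*.csv")
print(csv_files)
```

A pattern of this kind is what you would pass in place of a literal path when listing, downloading, uploading, or deleting several files at once.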

 Connect
 -------
@@ -159,7 +159,7 @@ Use the `.put() <https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspe
 Pandas
 ======

-Pandas allows access to BDS' HDFS system through :ref:`FSSpec`. This section demonstrates some common operations.
+Pandas allows access to BDS' HDFS system through :ref:`FSSpec`. This section demonstrates some common operations.

 Connect
 -------
@@ -259,4 +259,3 @@ The following sample code shows several different PyArrow methods for working wi
docs/source/user_guide/big_data_service/sql_data_management.rst
+1 -3 (1 addition & 3 deletions)
@@ -13,7 +13,7 @@ Ibis
 Connect
 -------

-After obtaining a Kerberos ticket, depending on your system configuration, you may need to define the ``ibis.options.impala.temp_db`` and ``ibis.options.impala.temp_hdfs_path`` options. The ``ibis.impala.connect()`` method makes a connection to the `Impala execution backend <https://ibis-project.org/docs/dev/backends/Impala/>`_. The ``.sql()`` method allows you to run SQL commands on the data.
+After obtaining a Kerberos ticket, depending on your system configuration, you may need to define the ``ibis.options.impala.temp_db`` and ``ibis.options.impala.temp_hdfs_path`` options. The ``ibis.impala.connect()`` method makes a connection to the `Impala execution backend <https://ibis-project.org/backends/impala/>`_. The ``.sql()`` method allows you to run SQL commands on the data.

 .. code-block:: python3

@@ -167,5 +167,3 @@ It is important to close sessions when you don't need them anymore. This frees u
docs/source/user_guide/data_transformation/data_transformation.rst
+17 -2 (17 additions & 2 deletions)
@@ -20,12 +20,27 @@ You can load a ``pandas`` dataframe into an ``ADSDataset`` by calling.
 Automated Transformations
 *************************

-ADS has built in automatic transform tools for datasets. When the ``get_recommendations()`` tool is applied to an ``ADSDataset`` object, it shows the user detected issues with the data and recommends changes to apply to the dataset. You can accept the changes is as easy as clicking a button in the drop down menu. After all the changes are applied, the transformed dataset can be retrieved by calling ``get_transformed_dataset()``.
+ADS provides built-in automatic transformation tools for datasets. These tools help detect issues with the data and recommend changes to improve the dataset. The recommended changes can be accepted by clicking a button in the drop-down menu. Once the changes are applied, the transformed dataset can be retrieved using the ``get_transformed_dataset()`` method.
+
+To access the recommendations, you can use the ``get_recommendations()`` method on the ``ADSDataset`` object:

 .. code-block:: python3

+    wine_ds = DatasetFactory.from_dataframe(data, target='Price')  # Specify the target variable
     wine_ds.get_recommendations()

+However, please note that ``get_recommendations()`` is not a direct method of the ``ADSDataset`` class. If you created the dataset using ``ADSDataset.from_dataframe(data)``, calling ``get_recommendations()`` directly on the ``ADSDataset`` object will result in an error. Instead, you can retrieve the recommendations by following these steps:
+The ``recommendations`` variable will contain the detected issues with the dataset and the recommended changes. You can then review and accept the recommended changes as needed.
+
 Alternatively, you can use ``auto_transform()`` to apply all the recommended transformations at once. ``auto_transform()`` returns a transformed dataset with several optimizations applied automatically. The optimizations include:

 * Dropping constant and primary key columns, which have no predictive quality.
@@ -242,7 +257,7 @@ You can apply functions to update column values in existing column. This example
 Change Data Type
 ================

-You can change the data type of columns with the ``astype()`` method. ADS uses the Pandas method, ``astype()``, on dataframe objects. For specifics, see `astype for a Pandas Dataframe <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html>`_, `using numpy.dtype <https://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.html#numpy.dtype>`_, or `Pandas dtypes <https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#dtypes>`_.
+You can change the data type of columns with the ``astype()`` method. ADS uses the Pandas method, ``astype()``, on dataframe objects. For specifics, see `astype for a Pandas Dataframe <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html>`_, `using numpy.dtype <https://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.html#numpy.dtype>`_, or `Pandas dtypes <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dtypes.html>`_.

 When you change the type of a column, ADS updates its semantic type to categorical, continuous, datetime, or ordinal. For example, if you update a column type to integer, its semantic type updates to ordinal. For data type details, see :ref:`loading-data-specify-dtype`.
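Because ADS relies on the Pandas ``astype()`` method, the conversion itself can be sketched with plain Pandas. This example is an illustration only: the dataframe and its column names are hypothetical, and it does not show the ADS semantic type update, which happens on the ADS side.

```python
import pandas as pd

# Hypothetical data: "year" arrives as text, "price" as integers.
df = pd.DataFrame({"year": ["2019", "2020"], "price": [12, 15]})

# Convert several columns at once by mapping column names to target dtypes.
converted = df.astype({"year": "int64", "price": "float64"})
print(converted.dtypes)
```

Passing a dictionary converts only the listed columns and returns a new dataframe, leaving the original unchanged.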