
Commit c067bcd

WEB: Clean up Ecosystem page (#61656)
1 parent e31afa1 commit c067bcd

1 file changed: +6, -157 lines

web/pandas/community/ecosystem.md

Lines changed: 6 additions & 157 deletions
@@ -151,20 +151,6 @@ or MATLAB, modified in a GUI, or embedded in apps and dashboards. Plotly
 is free for unlimited sharing, and has cloud, offline, or on-premise
 accounts for private use.
 
-### [Lux](https://github.com/lux-org/lux)
-
-Lux is a Python library that facilitates fast and easy experimentation with data by automating the visual data exploration process. To use Lux, simply add an extra import alongside pandas:
-
-```python
-import lux
-import pandas as pd
-
-df = pd.read_csv("data.csv")
-df  # discover interesting insights!
-```
-
-By printing out a dataframe, Lux automatically [recommends a set of visualizations](https://raw.githubusercontent.com/lux-org/lux-resources/master/readme_img/demohighlight.gif) that highlights interesting trends and patterns in the dataframe. Users can leverage any existing pandas commands without modifying their code, while being able to visualize their pandas data structures (e.g., DataFrame, Series, Index) at the same time. Lux also offers a [powerful, intuitive language](https://lux-api.readthedocs.io/en/latest/source/guide/vis.html) that allows users to create Altair, matplotlib, or Vega-Lite visualizations without having to think at the level of code.
-
 ### [D-Tale](https://github.com/man-group/dtale)
 
 D-Tale is a lightweight web client for visualizing pandas data structures. It
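The D-Tale entry survives this cleanup and is driven by a single call. A minimal sketch of launching it, assuming `dtale` is installed and using a hypothetical `data.csv`:

```python
import dtale
import pandas as pd

# Hypothetical input file; any pandas DataFrame works here.
df = pd.read_csv("data.csv")

# Starts a local web server and opens an interactive grid view of df.
dtale.show(df)
```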
@@ -386,92 +372,14 @@ Use `pandas_gbq.read_gbq` and `pandas_gbq.to_gbq`, instead.
 
 ### [ArcticDB](https://github.com/man-group/ArcticDB)
 
-ArcticDB is a serverless DataFrame database engine designed for the Python Data Science ecosystem. ArcticDB enables you to store, retrieve, and process pandas DataFrames at scale. It is a storage engine designed for object storage and also supports local-disk storage using LMDB. ArcticDB requires zero additional infrastructure beyond a running Python environment and access to object storage and can be installed in seconds. Please find full documentation [here](https://docs.arcticdb.io/latest/).
-
-#### ArcticDB Terminology
-
-ArcticDB is structured to provide a scalable and efficient way to manage and retrieve DataFrames, organized into several key components:
-
-- `Object Store` Collections of libraries. Used to separate logical environments from each other. Analogous to a database server.
-- `Library` Contains multiple symbols which are grouped in a certain way (different users, markets, etc.). Analogous to a database.
-- `Symbol` Atomic unit of data storage. Identified by a string name. Data stored under a symbol strongly resembles a pandas DataFrame. Analogous to tables.
-- `Version` Every modifying action (write, append, update) performed on a symbol creates a new version of that object.
-
-#### Installation
-
-To install, simply run:
-
-```console
-pip install arcticdb
-```
-
-To get started, we can import ArcticDB and instantiate it:
-
-```python
-import arcticdb as adb
-import numpy as np
-import pandas as pd
-# this will set up the storage using the local file system
-arctic = adb.Arctic("lmdb://arcticdb_test")
-```
-
-> **Note:** ArcticDB supports any S3 API-compatible storage, including AWS, and also supports Azure Blob storage.
-> ArcticDB also supports LMDB for local/file-based storage; to use LMDB, pass an LMDB path as the URI: `adb.Arctic('lmdb://path/to/desired/database')`.
-
-#### Library Setup
-
-ArcticDB is geared towards storing many (potentially millions of) tables. Individual tables (DataFrames) are called symbols and are stored in collections called libraries. A single library can store many symbols. Libraries must be initialized prior to use:
-
-```python
-lib = arctic.get_library('sample', create_if_missing=True)
-```
-
-#### Writing Data to ArcticDB
-
-Now that we have a library set up, we can get to reading and writing data. ArcticDB has a set of simple functions for DataFrame storage. Let's write a DataFrame to storage.
-
-```python
-df = pd.DataFrame(
-    {
-        "a": list("abc"),
-        "b": list(range(1, 4)),
-        "c": np.arange(3, 6).astype("u1"),
-        "d": np.arange(4.0, 7.0, dtype="float64"),
-        "e": [True, False, True],
-        "f": pd.date_range("20130101", periods=3),
-    }
-)
-
-df
-df.dtypes
-```
-
-Write to ArcticDB.
-
-```python
-write_record = lib.write("test", df)
-```
-
-> **Note:** When writing pandas DataFrames, ArcticDB supports the following index types:
->
-> - `pandas.Index` containing int64 (or the corresponding dedicated types Int64Index, UInt64Index)
-> - `RangeIndex`
-> - `DatetimeIndex`
-> - `MultiIndex` composed of the above supported types
->
-> The "row" concept in `head`/`tail` refers to the row number ('iloc'), not the value in the `pandas.Index` ('loc').
+ArcticDB is a serverless DataFrame database engine designed for the Python Data Science ecosystem.
+ArcticDB enables you to store, retrieve, and process pandas DataFrames at scale.
+It is a storage engine designed for object storage and also supports local-disk storage using LMDB.
+ArcticDB requires zero additional infrastructure beyond a running Python environment and access
+to object storage and can be installed in seconds.
 
-#### Reading Data from ArcticDB
+Please find full documentation [here](https://docs.arcticdb.io/latest/).
 
-Read the data back from storage:
-
-```python
-read_record = lib.read("test")
-read_record.data
-df.dtypes
-```
-
-ArcticDB also supports appending, updating, and querying data from storage to a pandas DataFrame. Please find more information [here](https://docs.arcticdb.io/latest/api/processing/#arcticdb.QueryBuilder).
 
 ### [Hugging Face](https://huggingface.co/datasets)
 
@@ -524,35 +432,6 @@ def process_data():
 process_data()
 ```
 
-
-### [Cylon](https://cylondata.org/)
-
-Cylon is a fast, scalable, distributed-memory parallel runtime with a
-pandas-like Python DataFrame API. "Core Cylon" is implemented in C++ using the Apache
-Arrow format to represent the data in memory. The Cylon DataFrame API implements
-most of the core operators of pandas, such as merge, filter, join, concat,
-group-by, drop_duplicates, etc. These operators are designed to work across
-thousands of cores to scale applications. It can interoperate with pandas
-DataFrames by reading data from pandas or converting data to pandas, so users
-can selectively scale parts of their pandas DataFrame applications.
-
-```python
-from pycylon import read_csv, DataFrame, CylonEnv
-from pycylon.net import MPIConfig
-
-# Initialize the Cylon distributed environment
-config: MPIConfig = MPIConfig()
-env: CylonEnv = CylonEnv(config=config, distributed=True)
-
-df1: DataFrame = read_csv('/tmp/csv1.csv')
-df2: DataFrame = read_csv('/tmp/csv2.csv')
-
-# Use thousands of cores across the cluster to compute the join
-df3: DataFrame = df1.join(other=df2, on=[0], algorithm="hash", env=env)
-
-print(df3)
-```
-
 ### [Dask](https://docs.dask.org)
 
 Dask is a flexible parallel computing library for analytics. Dask
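Dask, which this commit keeps, covers similar parallel ground to the removed entries. A minimal sketch of a lazy Dask computation, with hypothetical file paths and column names:

```python
import dask.dataframe as dd

# Hypothetical CSV glob; Dask splits the files into partitions.
ddf = dd.read_csv("/tmp/csv*.csv")

# Building the expression is lazy; compute() runs it in parallel.
result = ddf.groupby("key")["value"].mean()
print(result.compute())
```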
@@ -592,36 +471,6 @@ import modin.pandas as pd
 df = pd.read_csv("big.csv")  # use all your cores!
 ```
 
-### [Pandarallel](https://github.com/nalepae/pandarallel)
-
-Pandarallel provides a simple way to parallelize your pandas operations on all your CPUs by changing only one line of code.
-It also displays progress bars.
-
-```python
-from pandarallel import pandarallel
-
-pandarallel.initialize(progress_bar=True)
-
-# df.apply(func)
-df.parallel_apply(func)
-```
-
-### [Vaex](https://vaex.io/docs/)
-
-Increasingly, packages are being built on top of pandas to address
-specific needs in data preparation, analysis, and visualization. Vaex is
-a Python library for out-of-core DataFrames (similar to pandas) that can
-visualize and explore big tabular datasets. It can calculate statistics
-such as mean, sum, count, standard deviation, etc., on an N-dimensional
-grid at up to a billion (10^9) objects/rows per second. Visualization is
-done using histograms, density plots, and 3D volume rendering, allowing
-interactive exploration of big data. Vaex uses memory mapping, a zero
-memory-copy policy, and lazy computations for best performance (no memory
-wasted).
-
-- ``vaex.from_pandas``
-- ``vaex.to_pandas_df``
-
 ### [Hail Query](https://hail.is/)
 
 An out-of-core, preemptible-safe, distributed, dataframe library serving
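The removed Vaex entry names ``vaex.from_pandas`` and ``vaex.to_pandas_df`` as the pandas conversion points. A minimal round trip between the two libraries, assuming `vaex` is installed:

```python
import pandas as pd
import vaex

pdf = pd.DataFrame({"x": range(5), "y": range(5)})

vdf = vaex.from_pandas(pdf)  # pandas -> Vaex (memory-efficient, lazy operations)
back = vdf.to_pandas_df()    # Vaex -> pandas
print(back.equals(pdf))      # expected True for this simple frame
```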
