update org #15

Merged
merged 1 commit into from
Jun 12, 2024
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -162,4 +162,5 @@ cython_debug/
**/checkpoints/*
.vscode/

models/
models/
data/*/
4 changes: 0 additions & 4 deletions benchmark/pyod_.py
Expand Up @@ -23,10 +23,6 @@
KNN, LMDD, LOF, MCD, OCSVM, PCA,
FeatureBagging, IForest)

# TODO: add sklearnex to accelerate sklearn
# from sklearnex import patch_sklearn
# patch_sklearn()

warnings.filterwarnings("ignore")


Expand Down
2 changes: 1 addition & 1 deletion docs/Makefile
Expand Up @@ -6,7 +6,7 @@
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = .. #_build
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
Expand Down
7 changes: 7 additions & 0 deletions docs/_static/custom.css
Expand Up @@ -3,4 +3,11 @@
/* or any size you want */
height: auto;
/* keep aspect ratio */
}

.wy-nav-content {
padding: 1.618em 3.236em;
height: 100%;
max-width: 1600px;
margin: auto;
}
Binary file added docs/_static/flowbench.png
1 change: 1 addition & 0 deletions docs/conf.py
Expand Up @@ -18,6 +18,7 @@
'sphinx.ext.autodoc',
'sphinx.ext.napoleon',
'sphinx.ext.mathjax',
'sphinx.ext.doctest',
]

templates_path = ['_templates']
Expand Down
156 changes: 156 additions & 0 deletions docs/examples.rst
@@ -1,3 +1,159 @@
Examples
========

Load Dataset
------------

- Load data as graphs in ``pytorch_geometric`` format:

.. code-block:: python

    from flowbench.dataset import FlowDataset
    dataset = FlowDataset(root="./", name="montage")
    data = dataset[0]

``data`` holds the graph structure in ``data.edge_index`` and the node features in ``data.x``.
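
To make the two graph fields concrete, here is a minimal sketch of how an ``edge_index`` in PyTorch Geometric's COO layout (shape ``[2, num_edges]``, source row above target row) relates to a node feature matrix. The four-node toy workflow and its feature values are made up for illustration and are not Flow-Bench data; NumPy arrays stand in for the tensors.

```python
import numpy as np

# Hypothetical 4-job workflow DAG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
# Row 0 holds source nodes, row 1 holds target nodes (COO layout).
edge_index = np.array([[0, 0, 1, 2],
                       [1, 2, 3, 3]])

# One feature row per node; the values are placeholders.
x = np.array([[0.1, 1.0],
              [0.4, 2.0],
              [0.3, 2.5],
              [0.9, 4.0]])

num_nodes, num_features = x.shape
num_edges = edge_index.shape[1]

# Build a successor list per node from the (source, target) pairs.
successors = {node: [] for node in range(num_nodes)}
for src, dst in edge_index.T:
    successors[int(src)].append(int(dst))

print(num_nodes, num_features, num_edges)
print(successors)
```

The same indexing applies to a real ``data`` object: column ``i`` of ``data.edge_index`` is one directed edge, and row ``j`` of ``data.x`` is the feature vector of node ``j``.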

- Load data as tabular data in ``pytorch`` format:

.. code-block:: python

    from flowbench.dataset import FlowDataset
    dataset = FlowDataset(root="./", name="montage")
    data = dataset[0]
    Xs = data.x
    ys = data.y

Unlike the full graph object, ``Xs`` and ``ys`` contain only the node features and labels, without the edge structure.

- Load data as tabular data in ``numpy`` format:

.. code-block:: python

    from flowbench.dataset import FlowDataset
    dataset = FlowDataset(root="./", name="montage")
    data = dataset[0]
    Xs = data.x.numpy()
    ys = data.y.numpy()

This is the same as the previous example, except that the arrays are in ``numpy`` format, which models from ``sklearn`` and ``xgboost`` typically expect.
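
Because scikit-learn style estimators take a 2-D feature array and a 1-D label array, a shape check and a shuffled hold-out split are a common next step. The sketch below uses random stand-in arrays rather than Flow-Bench data, and the 80/20 ratio and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for data.x.numpy() and data.y.numpy(): 100 nodes, 5 features.
Xs = rng.normal(size=(100, 5))
ys = rng.integers(0, 2, size=100)

# Shuffle the row indices once, then cut at 80% for a hold-out split.
perm = rng.permutation(len(Xs))
cut = int(0.8 * len(Xs))
train_idx, test_idx = perm[:cut], perm[cut:]

X_train, y_train = Xs[train_idx], ys[train_idx]
X_test, y_test = Xs[test_idx], ys[test_idx]

# Standardize both parts with statistics from the training rows only.
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std

print(X_train.shape, X_test.shape)
```

Computing the mean and standard deviation on the training rows alone keeps information from the held-out rows out of the preprocessing step.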

- Load text data with the ``huggingface`` interface.
  We have uploaded our parsed text data as a ``huggingface`` dataset. You can load it with the following code:

.. code-block:: python

    from datasets import load_dataset
    dataset = load_dataset("cshjin/poseidon", "1000genome")

The returned dataset behaves like a ``dict`` with the keys ``train``, ``test``, and ``validation``.

PyOD Models
-----------

=================  =======  =====================================================================================================  ====  ============================================
Type               Abbr     Algorithm                                                                                              Year  Class
=================  =======  =====================================================================================================  ====  ============================================
Probabilistic      ABOD     Angle-Based Outlier Detection                                                                          2008  :class:`flowbench.unsupervised.pyod.ABOD`
Probabilistic      KDE      Outlier Detection with Kernel Density Functions                                                        2007  :class:`flowbench.unsupervised.pyod.KDE`
Probabilistic      GMM      Probabilistic Mixture Modeling for Outlier Analysis                                                          :class:`flowbench.unsupervised.pyod.GMM`
Linear Model       PCA      Principal Component Analysis (the sum of weighted projected distances to the eigenvector hyperplanes)  2003  :class:`flowbench.unsupervised.pyod.PCA`
Linear Model       OCSVM    One-Class Support Vector Machines                                                                      2001  :class:`flowbench.unsupervised.pyod.OCSVM`
Linear Model       LMDD     Deviation-based Outlier Detection (LMDD)                                                               1996  :class:`flowbench.unsupervised.pyod.LMDD`
Proximity-Based    LOF      Local Outlier Factor                                                                                   2000  :class:`flowbench.unsupervised.pyod.LOF`
Proximity-Based    CBLOF    Clustering-Based Local Outlier Factor                                                                  2003  :class:`flowbench.unsupervised.pyod.CBLOF`
Proximity-Based    kNN      k Nearest Neighbors (use the distance to the kth nearest neighbor as the outlier score)                2000  :class:`flowbench.unsupervised.pyod.KNN`
Outlier Ensembles  IForest  Isolation Forest                                                                                       2008  :class:`flowbench.unsupervised.pyod.IForest`
Outlier Ensembles  INNE     Isolation-based Anomaly Detection Using Nearest-Neighbor Ensembles                                     2018  :class:`flowbench.unsupervised.pyod.INNE`
Outlier Ensembles  LSCP     LSCP: Locally Selective Combination of Parallel Outlier Ensembles                                      2019  :class:`flowbench.unsupervised.pyod.LSCP`
=================  =======  =====================================================================================================  ====  ============================================

- Example of using ``GMM``

.. code-block:: python

    from flowbench.unsupervised.pyod import GMM
    from flowbench.dataset import FlowDataset
    dataset = FlowDataset(root="./", name="1000genome")
    # take the node features of the first graph as a NumPy array
    Xs = dataset[0].x.numpy()
    clf = GMM()
    clf.fit(Xs)
    y_pred = clf.predict(Xs)

- Detailed example in ``example/demo_pyod.py``
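
The kNN row in the table above describes its score as the distance to the k-th nearest neighbor. As a rough illustration of that idea only, the following NumPy sketch scores made-up 2-D points; it is not PyOD's or Flow-Bench's implementation, and ``knn_outlier_scores`` is a hypothetical helper name.

```python
import numpy as np


def knn_outlier_scores(X, k=5):
    """Score each row by the Euclidean distance to its k-th nearest neighbor."""
    # Pairwise Euclidean distances, shape (n, n).
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Exclude each point's zero distance to itself.
    np.fill_diagonal(dists, np.inf)
    # After sorting each row, column k-1 is the k-th nearest neighbor.
    return np.sort(dists, axis=1)[:, k - 1]


rng = np.random.default_rng(0)
# 50 toy inliers around the origin plus one far-away point at index 50.
X = np.vstack([rng.normal(size=(50, 2)), [[10.0, 10.0]]])
scores = knn_outlier_scores(X, k=5)

# The isolated point should receive the largest score.
print(int(scores.argmax()))
```

The full pairwise distance matrix costs O(n^2) memory, so this form is only meant for small arrays; library implementations typically rely on neighbor-search structures instead.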

PyGOD Models
------------

==========  ==========  ====  ================================================
Type        Abbr        Year  Class
==========  ==========  ====  ================================================
Clustering  SCAN        2007  :class:`flowbench.unsupervised.pygod.SCAN`
GNN+AE      GAE         2016  :class:`flowbench.unsupervised.pygod.GAE`
MF          Radar       2017  :class:`flowbench.unsupervised.pygod.Radar`
MF          ANOMALOUS   2018  :class:`flowbench.unsupervised.pygod.ANOMALOUS`
MF          ONE         2019  :class:`flowbench.unsupervised.pygod.ONE`
GNN+AE      DOMINANT    2019  :class:`flowbench.unsupervised.pygod.DOMINANT`
MLP+AE      DONE        2020  :class:`flowbench.unsupervised.pygod.DONE`
MLP+AE      AdONE       2020  :class:`flowbench.unsupervised.pygod.AdONE`
GNN+AE      AnomalyDAE  2020  :class:`flowbench.unsupervised.pygod.AnomalyDAE`
GAN         GAAN        2020  :class:`flowbench.unsupervised.pygod.GAAN`
GNN+AE      DMGD        2020  :class:`flowbench.unsupervised.pygod.DMGD`
GNN         OCGNN       2021  :class:`flowbench.unsupervised.pygod.OCGNN`
GNN+AE+SSL  CoLA        2021  :class:`flowbench.unsupervised.pygod.CoLA`
GNN+AE      GUIDE       2021  :class:`flowbench.unsupervised.pygod.GUIDE`
GNN+AE+SSL  CONAD       2022  :class:`flowbench.unsupervised.pygod.CONAD`
GNN+AE      GADNR       2024  :class:`flowbench.unsupervised.pygod.GADNR`
==========  ==========  ====  ================================================


- Example of using ``GAE``

.. code-block:: python

    from flowbench.unsupervised.pygod import GAE
    from flowbench.dataset import FlowDataset
    dataset = FlowDataset(root="./", name="1000genome")
    data = dataset[0]
    clf = GAE()
    clf.fit(data)

- Detailed example in ``example/demo_pygod.py``


Supervised Models
-----------------

- Example of using ``MLP``

.. code-block:: python

    from flowbench.supervised.mlp import MLPClassifier
    from flowbench.dataset import FlowDataset
    dataset = FlowDataset(root="./", name="1000genome")
    data = dataset[0]
    clf = MLPClassifier()
    clf.fit(data)

- Detailed example in ``example/demo_supervised.py``

Supervised fine-tuned LLMs
--------------------------

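- Example of using LoRA (Low-Rank Adaptation) for supervised fine-tuning of LLMs:

.. code-block:: python

    from datasets import load_dataset
    from peft import LoraConfig, TaskType, get_peft_model
    from transformers import Trainer, TrainingArguments

    dataset = load_dataset("cshjin/poseidon", "1000genome")
    # data processing
    ...
    # LoRA config
    peft_config = LoraConfig(task_type=TaskType.SEQ_CLS, inference_mode=False,
                             r=8, lora_alpha=32, lora_dropout=0.1)
    # wrap the base model with LoRA adapters; `model` is assumed to be
    # loaded in the elided step above
    peft_model = get_peft_model(model, peft_config)
    training_args = TrainingArguments(...)
    # LoRA trainer
    trainer = Trainer(peft_model, ...)
    trainer.train()
    ...

- Detailed example in ``example/demo_sft_lora.py``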
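
For intuition about what ``r`` and ``lora_alpha`` control, here is a small NumPy sketch of the low-rank update that LoRA adds to a frozen weight matrix, scaled by ``lora_alpha / r``. The 64x64 layer size and the random values are assumptions for illustration; this is not the ``peft`` implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in = 64, 64        # toy layer size (assumption)
r, lora_alpha = 8, 32       # same values as in the LoraConfig example
scaling = lora_alpha / r    # LoRA scales the update by alpha / r

W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable, small random init
B = np.zeros((d_out, r))                 # trainable, zero init

# Effective weight: frozen W plus the scaled low-rank update B @ A.
W_eff = W + scaling * (B @ A)

full_params = W.size
lora_params = A.size + B.size
print(scaling, full_params, lora_params)
```

Because ``B`` starts at zero, the adapted layer initially matches the frozen one, and only the ``A`` and ``B`` entries (here a quarter of the full matrix) would be updated during training.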
7 changes: 7 additions & 0 deletions docs/index.rst
Expand Up @@ -10,7 +10,13 @@ Flow-Bench is a benchmark dataset for anomaly detection techniques in computatio
Flow-Bench contains workflow execution traces, executed on distributed infrastructure, that include systematically injected anomalies (labeled), and offers both the raw execution logs and a more compact parsed version.
In this GitHub repository, apart from the logs and traces, you will find sample code to load and process the parsed data using pytorch, as well as the code used to parse the raw logs and events.

.. figure:: _static/flowbench.png
    :alt: FlowBench Outline
    :align: center
    :scale: 50%

    Figure: FlowBench - An Anomaly Detection Benchmark Dataset

.. toctree::
:maxdepth: 2
:caption: Contents:
Expand All @@ -25,6 +31,7 @@ In this GitHub repository, apart from the logs and traces, you will find sample
flowbench.nlp

license

Indices and tables
==================

Expand Down