1 change: 0 additions & 1 deletion .github/workflows/pytest_imputation_beta.yml
@@ -35,7 +35,6 @@ jobs:
python -m pytest ./tests/test_imputation_tkcm.py
python -m pytest ./tests/test_imputation_deepmvi.py
python -m pytest ./tests/test_imputation_brits.py
python -m pytest ./tests/test_imputation_mpin.py
python -m pytest ./tests/test_imputation_pristi.py
python -m pytest ./tests/test_imputation_grin.py
python -m pytest ./tests/test_imputation_gain.py
2 changes: 1 addition & 1 deletion .idea/imputegap.iml


2 changes: 1 addition & 1 deletion .idea/misc.xml


77 changes: 37 additions & 40 deletions README.md
@@ -20,7 +20,7 @@ Access to commonly used datasets in time series research (Datasets).
<br>

![Python](https://img.shields.io/badge/Python-v3.12-blue)
![Release](https://img.shields.io/badge/Release-v1.0.6-brightgreen)
![Release](https://img.shields.io/badge/Release-v1.0.7-brightgreen)
![License](https://img.shields.io/badge/License-GPLv3-blue?style=flat&logo=gnu)
![Coverage](https://img.shields.io/badge/Coverage-93%25-brightgreen)
![PyPI](https://img.shields.io/pypi/v/imputegap?label=PyPI&color=blue)
@@ -40,13 +40,13 @@ Access to commonly used datasets in time series research (Datasets).
# List of available imputation algorithms
| **Family** | **Algorithm** | **Venue -- Year** |
|--------------------|---------------------------|------------------------------|
| Deep Learning | BITGraph [[32]](#ref32) | ICLR -- 2024 |
| Deep Learning | BitGraph [[32]](#ref32) | ICLR -- 2024 |
| Deep Learning | BayOTIDE [[30]](#ref30) | PMLR -- 2024 |
| Deep Learning | MPIN [[25]](#ref25) | PVLDB -- 2024 |
| Deep Learning | MissNet [[27]](#ref27) | KDD -- 2024 |
| Deep Learning | PriSTI [[26]](#ref26) | ICDE -- 2023 |
| Deep Learning | MPIN [[25]](#ref25) | PVLDB -- 2024 |
| Deep Learning | PRISTI [[26]](#ref26) | ICDE -- 2023 |
| Deep Learning | GRIN [[29]](#ref29) | ICLR -- 2022 |
| Deep Learning | HKMF-T [[31]](#ref31) | TKDE -- 2021 |
| Deep Learning | HKMF_T [[31]](#ref31) | TKDE -- 2021 |
| Deep Learning | DeepMVI [[24]](#ref24) | PVLDB -- 2021 |
| Deep Learning | MRNN [[22]](#ref22) | IEEE Trans on BE -- 2019 |
| Deep Learning | BRITS [[23]](#ref23) | NeurIPS -- 2018 |
@@ -60,18 +60,18 @@ Access to commonly used datasets in time series research (Datasets).
| Matrix Completion | SPIRIT [[5]](#ref5) | VLDB -- 2005 |
| Matrix Completion | IterativeSVD [[2]](#ref2) | BIOINFORMATICS -- 2001 |
| Pattern Search | TKCM [[11]](#ref11) | EDBT -- 2017 |
| Pattern Search | ST-MVL [[9]](#ref9) | IJCAI -- 2016 |
| Pattern Search | STMVL [[9]](#ref9) | IJCAI -- 2016 |
| Pattern Search | DynaMMo [[10]](#ref10) | KDD -- 2009 |
| Machine Learning | IIM [[12]](#ref12) | ICDE -- 2019 |
| Machine Learning | XGBI [[13]](#ref13) | KDD -- 2016 |
| Machine Learning | Mice [[14]](#ref14) | Statistical Software -- 2011 |
| Machine Learning | XGBOOST [[13]](#ref13) | KDD -- 2016 |
| Machine Learning | MICE [[14]](#ref14) | Statistical Software -- 2011 |
| Machine Learning | MissForest [[15]](#ref15) | BioInformatics -- 2011 |
| Statistics | KNNImpute | - |
| Statistics | Interpolation | - |
| Statistics | Min Impute | - |
| Statistics | Zero Impute | - |
| Statistics | Mean Impute | - |
| Statistics | Mean Impute By Series | - |
| Statistics | MinImpute | - |
| Statistics | ZeroImpute | - |
| Statistics | MeanImpute | - |
| Statistics | MeanImputeBySeries | - |

---

@@ -155,7 +155,7 @@ ts.normalize(normalizer="z_score")

# plot and print a subset of time series
ts.plot(input_data=ts.data, nbr_series=9, nbr_val=100, save_path="./imputegap_assets")
ts.print(nbr_series=9, nbr_val=100)
ts.print(nbr_series=9, nbr_val=20)
```

---
@@ -235,8 +235,7 @@ imputer.score(ts.data, imputer.recov_data)
ts.print_results(imputer.metrics)

# plot the recovered time series
ts.plot(input_data=ts.data, incomp_data=ts_m, recov_data=imputer.recov_data, nbr_series=9, subplot=True,
save_path="./imputegap_assets")
ts.plot(input_data=ts.data, incomp_data=ts_m, recov_data=imputer.recov_data, nbr_series=9, subplot=True, algorithm=imputer.algorithm, save_path="./imputegap_assets")
```

---
@@ -270,9 +269,16 @@ imputer = Imputation.MatrixCompletion.CDRec(ts_m)
# use Ray Tune to fine tune the imputation algorithm
imputer.impute(user_def=False, params={"input_data": ts.data, "optimizer": "ray_tune"})

# compute and print the imputation metrics
# compute the imputation metrics with optimized parameter values
imputer.score(ts.data, imputer.recov_data)
ts.print_results(imputer.metrics)

# compute the imputation metrics with default parameter values
imputer_def = Imputation.MatrixCompletion.CDRec(ts_m).impute()
imputer_def.score(ts.data, imputer_def.recov_data)

# print the imputation metrics with default and optimized parameter values
ts.print_results(imputer_def.metrics, text="Imputation metrics with default parameter values")
ts.print_results(imputer.metrics, text="Imputation metrics with optimized parameter values")

# plot the recovered time series
ts.plot(input_data=ts.data, incomp_data=ts_m, recov_data=imputer.recov_data, nbr_series=9, subplot=True,
@@ -357,7 +363,7 @@ imputer = Imputation.MatrixCompletion.CDRec(ts_m)
imputer.impute()

# compute and print the downstream results
downstream_config = {"task": "forecast", "model": "hw-add"}
downstream_config = {"task": "forecast", "model": "hw-add", "comparator": "ZeroImpute"}
imputer.score(ts.data, imputer.recov_data, downstream=downstream_config)
ts.print_results(imputer.downstream_metrics, algorithm=imputer.algorithm)
```
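The added `comparator` key pairs the evaluated algorithm with a baseline imputer (here `ZeroImpute`) in the downstream task. Purely as an illustration of that idea, and not ImputeGAP's actual implementation, a comparator-style check could be sketched as follows (hypothetical helper with a naive mean forecaster):

```python
import numpy as np

def downstream_comparison_sketch(ground_truth, recov_data, baseline_data, horizon=10):
    """Toy sketch: forecast the last `horizon` steps of each series with a naive
    mean predictor, then report the forecast error obtained from the algorithm's
    recovery next to the error obtained from the comparator's recovery."""
    def naive_forecast_rmse(matrix):
        train = matrix[:, :-horizon]  # everything before the forecast window
        preds = np.repeat(train.mean(axis=1, keepdims=True), horizon, axis=1)
        true_tail = ground_truth[:, -horizon:]
        return float(np.sqrt(np.mean((preds - true_tail) ** 2)))

    return {"algorithm": naive_forecast_rmse(recov_data),
            "comparator": naive_forecast_rmse(baseline_data)}
```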
@@ -369,7 +375,7 @@ ts.print_results(imputer.downstream_metrics, algorithm=imputer.algorithm)

## Benchmark

ImputeGAP can serve as a common test-bed for comparing the effectiveness and efficiency of time series imputation algorithms [[33]](#ref33). Users have full control over the benchmark by customizing various parameters, including the list of datasets to evaluate, the algorithms to compare, the choice of optimizer to fine-tune the algorithms on the chosen datasets, the missingness patterns, and the range of missing rates.
ImputeGAP can serve as a common test-bed for comparing the effectiveness and efficiency of time series imputation algorithms [[33]](#ref33). Users have full control over the benchmark by customizing various parameters, including the list of datasets to evaluate, the algorithms to compare, the choice of optimizer to fine-tune the algorithms on the chosen datasets, the missingness patterns, and the range of missing rates. The default metrics evaluated include "RMSE", "MAE", "MI", "Pearson", and the runtime.
The documentation for the benchmark is described [here](https://imputegap.readthedocs.io/en/latest/benchmark.html).


@@ -381,22 +387,27 @@ The benchmarking module can be utilized as follows:
```python
from imputegap.recovery.benchmark import Benchmark

save_dir = "./analysis"
nbr_run = 2
save_dir = "./imputegap_assets/benchmark"
nbr_runs = 1

datasets = ["eeg-alcohol", "eeg-reading"]
datasets = ["eeg-alcohol"]

optimizer = {"optimizer": "ray_tune", "options": {"n_calls": 1, "max_concurrent_trials": 1}}
optimizers = [optimizer]
optimizers = ["default_params"]

algorithms = ["MeanImpute", "CDRec", "STMVL", "IIM", "MRNN"]
algorithms = ["SoftImpute", "KNNImpute"]

patterns = ["mcar"]

range = [0.05, 0.1, 0.2, 0.4, 0.6, 0.8]

# launch the evaluation
list_results, sum_scores = Benchmark().eval(algorithms=algorithms, datasets=datasets, patterns=patterns, x_axis=range, optimizers=optimizers, save_dir=save_dir, runs=nbr_run)
list_results, sum_scores = Benchmark().eval(algorithms=algorithms, datasets=datasets, patterns=patterns, x_axis=range, optimizers=optimizers, save_dir=save_dir, runs=nbr_runs)
```

You can change the optimizer using the following command:
```python
optimizer = {"optimizer": "ray_tune", "options": {"n_calls": 1, "max_concurrent_trials": 1}}
optimizers = [optimizer]
```

---
@@ -407,20 +418,6 @@ To add your own imputation algorithm in Python or C++, please refer to the detai

---


## Articles


Mourad Khayati, Alberto Lerner, Zakhar Tymchenko, Philippe Cudre-Mauroux: Mind the Gap: An Experimental Evaluation of Imputation of Missing Values Techniques in Time Series. Proc. VLDB Endow. 13(5): 768-782 (2020)

Mourad Khayati, Quentin Nater, Jacques Pasquier: ImputeVIS: An Interactive Evaluator to Benchmark Imputation Techniques for Time Series Data. Proc. VLDB Endow. 17(12): 4329-4332 (2024)

---





## Citing

If you use ImputeGAP in your research, please cite the paper:
2 changes: 1 addition & 1 deletion build/lib/imputegap/__init__.py
@@ -1 +1 @@
__version__ = "1.0.5"
__version__ = "1.0.7"
2 changes: 1 addition & 1 deletion build/lib/imputegap/algorithms/bayotide.py
@@ -66,6 +66,6 @@ def bay_otide(incomp_data, K_trend=20, K_season=2, n_season=5, K_bias=1, time_sc

end_time = time.time()
if logs:
print(f"\n\t\t> logs, imputation bay_otide - Execution Time: {(end_time - start_time):.4f} seconds\n")
print(f"\n\t> logs, imputation bay_otide - Execution Time: {(end_time - start_time):.4f} seconds\n")

return recov_data
2 changes: 1 addition & 1 deletion build/lib/imputegap/algorithms/bit_graph.py
@@ -64,6 +64,6 @@ def bit_graph(incomp_data, node_number=-1, kernel_set=[1], dropout=0.1, subgraph

end_time = time.time()
if logs:
print(f"\n\t\t> logs, imputation bit graph - Execution Time: {(end_time - start_time):.4f} seconds\n")
print(f"\n\t> logs, imputation bit graph - Execution Time: {(end_time - start_time):.4f} seconds\n")

return recov_data
2 changes: 1 addition & 1 deletion build/lib/imputegap/algorithms/brits.py
@@ -49,6 +49,6 @@ def brits(incomp_data, model="brits", epoch=10, batch_size=7, nbr_features=1, hi

end_time = time.time()
if logs:
print(f"\n\t\t> logs, imputation brits - Execution Time: {(end_time - start_time):.4f} seconds\n")
print(f"\n\t> logs, imputation brits - Execution Time: {(end_time - start_time):.4f} seconds\n")

return recov_data
6 changes: 3 additions & 3 deletions build/lib/imputegap/algorithms/cdrec.py
@@ -29,7 +29,7 @@ def native_cdrec(__py_matrix, __py_rank, __py_epsilon, __py_iterations):
Khayati, M., Cudré-Mauroux, P. & Böhlen, M.H. Scalable recovery of missing blocks in time series with high and low cross-correlations. Knowl Inf Syst 62, 2257–2280 (2020). https://doi.org/10.1007/s10115-019-01421-7
"""

shared_lib = utils.load_share_lib("lib_cdrec.so")
shared_lib = utils.load_share_lib("lib_cdrec")

__py_n = len(__py_matrix);
__py_m = len(__py_matrix[0]);
@@ -87,7 +87,7 @@ def cdrec(incomp_data, truncation_rank, iterations, epsilon, logs=True, lib_path

"""

print(f"\t\t\t\t(PYTHON) CDRec: ({incomp_data.shape[0]},{incomp_data.shape[1]}) for rank {truncation_rank}, "
print(f"(PYTHON) CDRec: ({incomp_data.shape[0]},{incomp_data.shape[1]}) for rank {truncation_rank}, "
f"epsilon {epsilon}, and iterations {iterations}...")

start_time = time.time() # Record start time
Expand All @@ -98,6 +98,6 @@ def cdrec(incomp_data, truncation_rank, iterations, epsilon, logs=True, lib_path
end_time = time.time()

if logs:
print(f"\n\t\t> logs, imputation cdrec - Execution Time: {(end_time - start_time):.4f} seconds\n")
print(f"\n\t> logs, imputation cdrec - Execution Time: {(end_time - start_time):.4f} seconds\n")

return recov_data
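In this file and the ones below, the hard-coded `.so` suffix passed to `utils.load_share_lib` is dropped, and `.dylib` binaries are added further down, which suggests the loader now resolves the platform-specific extension itself. A minimal sketch of such a resolver, using a hypothetical helper name rather than ImputeGAP's actual implementation:

```python
import ctypes
import os
import platform

def load_shared_lib_sketch(lib_dir, base_name):
    """Hypothetical resolver: pick .dylib on macOS, .dll on Windows, .so elsewhere."""
    ext = {"Darwin": ".dylib", "Windows": ".dll"}.get(platform.system(), ".so")
    path = os.path.join(lib_dir, base_name + ext)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"shared library not found: {path}")
    return ctypes.CDLL(path)

# e.g. lib = load_shared_lib_sketch("./imputegap/algorithms/lib", "lib_cdrec")
```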
4 changes: 2 additions & 2 deletions build/lib/imputegap/algorithms/cpp_integration.py
@@ -25,7 +25,7 @@ def native_algo(__py_matrix, __py_param):

"""

shared_lib = utils.load_share_lib("to_adapt.so")
shared_lib = utils.load_share_lib("to_adapt")

__py_n = len(__py_matrix);
__py_m = len(__py_matrix[0]);
@@ -79,6 +79,6 @@ def your_algo(contamination, param, logs=True):
end_time = time.time()

if logs:
print(f"\n\t\t> logs, imputation algo - Execution Time: {(end_time - start_time):.4f} seconds\n")
print(f"\n\t> logs, imputation algo - Execution Time: {(end_time - start_time):.4f} seconds\n")

return recov_data
2 changes: 1 addition & 1 deletion build/lib/imputegap/algorithms/deep_mvi.py
@@ -41,6 +41,6 @@ def deep_mvi(incomp_data, max_epoch=1000, patience=2, lr=0.001, logs=True):

end_time = time.time()
if logs:
print(f"\n\t\t> logs, imputation deep mvi - Execution Time: {(end_time - start_time):.4f} seconds\n")
print(f"\n\t> logs, imputation deep mvi - Execution Time: {(end_time - start_time):.4f} seconds\n")

return recov_data
4 changes: 2 additions & 2 deletions build/lib/imputegap/algorithms/dynammo.py
@@ -29,7 +29,7 @@ def native_dynammo(__py_matrix, __py_h, __py_maxIter, __py_fast):
L. Li, J. McCann, N. S. Pollard, and C. Faloutsos. Dynammo: mining and summarization of coevolving sequences with missing values. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, June 28 - July 1, 2009, pages 507–516, 2009.
"""

shared_lib = utils.load_share_lib("lib_dynammo.so")
shared_lib = utils.load_share_lib("lib_dynammo")

__py_n = len(__py_matrix);
__py_m = len(__py_matrix[0]);
@@ -92,6 +92,6 @@ def dynammo(incomp_data, h, max_iteration, approximation, logs=True, lib_path=No
end_time = time.time()

if logs:
print(f"\n\t\t> logs, imputation DynaMMo - Execution Time: {(end_time - start_time):.4f} seconds\n")
print(f"\n\t> logs, imputation DynaMMo - Execution Time: {(end_time - start_time):.4f} seconds\n")

return recov_data
2 changes: 1 addition & 1 deletion build/lib/imputegap/algorithms/gain.py
@@ -44,6 +44,6 @@ def gain(incomp_data, batch_size=32, hint_rate=0.9, alpha=10, epoch=100, logs=Tr

end_time = time.time()
if logs:
print(f"\n\t\t> logs, imputation gain - Execution Time: {(end_time - start_time):.4f} seconds\n")
print(f"\n\t> logs, imputation gain - Execution Time: {(end_time - start_time):.4f} seconds\n")

return recov_data
2 changes: 1 addition & 1 deletion build/lib/imputegap/algorithms/grin.py
@@ -60,6 +60,6 @@ def grin(incomp_data, d_hidden=32, lr=0.001, batch_size=32, window=10, alpha=10.

end_time = time.time()
if logs:
print(f"\n\t\t> logs, imputation grin - Execution Time: {(end_time - start_time):.4f} seconds\n")
print(f"\n\t> logs, imputation grin - Execution Time: {(end_time - start_time):.4f} seconds\n")

return recov_data
4 changes: 2 additions & 2 deletions build/lib/imputegap/algorithms/grouse.py
@@ -24,7 +24,7 @@ def native_grouse(__py_matrix, __py_rank):
D. Zhang and L. Balzano. Global convergence of a grassmannian gradient descent algorithm for subspace estimation. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016, Cadiz, Spain, May 9-11, 2016, pages 1460–1468, 2016.
"""

shared_lib = utils.load_share_lib("lib_grouse.so")
shared_lib = utils.load_share_lib("lib_grouse")

__py_n = len(__py_matrix);
__py_m = len(__py_matrix[0]);
@@ -80,6 +80,6 @@ def grouse(incomp_data, max_rank, logs=True, lib_path=None):
end_time = time.time()

if logs:
print(f"\n\t\t> logs, imputation GROUSE - Execution Time: {(end_time - start_time):.4f} seconds\n")
print(f"\n\t> logs, imputation GROUSE - Execution Time: {(end_time - start_time):.4f} seconds\n")

return recov_data
2 changes: 1 addition & 1 deletion build/lib/imputegap/algorithms/hkmf_t.py
@@ -48,6 +48,6 @@ def hkmf_t(incomp_data, tags=None, data_names=None, epoch=10, logs=True):

end_time = time.time()
if logs:
print(f"\n\t\t> logs, imputation hkmf_t - Execution Time: {(end_time - start_time):.4f} seconds\n")
print(f"\n\t> logs, imputation hkmf_t - Execution Time: {(end_time - start_time):.4f} seconds\n")

return recov_data
2 changes: 1 addition & 1 deletion build/lib/imputegap/algorithms/iim.py
@@ -45,6 +45,6 @@ def iim(incomp_data, number_neighbor, algo_code, logs=True):

end_time = time.time()
if logs:
print(f"\n\t\t> logs, imputation iim - Execution Time: {(end_time - start_time):.4f} seconds\n")
print(f"\n\t> logs, imputation iim - Execution Time: {(end_time - start_time):.4f} seconds\n")

return recov_data
4 changes: 2 additions & 2 deletions build/lib/imputegap/algorithms/interpolation.py
@@ -30,7 +30,7 @@ def interpolation(incomp_data, method="linear", poly_order=2, logs=True):

"""

print(f"\t\t\t\t(PYTHON) interpolation : ({incomp_data.shape[0]},{incomp_data.shape[1]}) for method {method}"
print(f"(PYTHON) interpolation : ({incomp_data.shape[0]},{incomp_data.shape[1]}) for method {method}"
f", and polynomial order {poly_order}...")

start_time = time.time() # Record start time
@@ -70,6 +70,6 @@ def interpolation(incomp_data, method="linear", poly_order=2, logs=True):

end_time = time.time()
if logs:
print(f"\n\t\t> logs, imputation with interpolation - Execution Time: {(end_time - start_time):.4f} seconds\n")
print(f"\n\t> logs, imputation with interpolation - Execution Time: {(end_time - start_time):.4f} seconds\n")

return recov_data
4 changes: 2 additions & 2 deletions build/lib/imputegap/algorithms/iterative_svd.py
@@ -25,7 +25,7 @@ def native_iterative_svd(__py_matrix, __py_rank):
Olga Troyanskaya, Michael Cantor, Gavin Sherlock, Pat Brown, Trevor Hastie, Robert Tibshirani, David Botstein, Russ B. Altman, Missing value estimation methods for DNA microarrays , Bioinformatics, Volume 17, Issue 6, June 2001, Pages 520–525, https://doi.org/10.1093/bioinformatics/17.6.520
"""

shared_lib = utils.load_share_lib("lib_iterative_svd.so")
shared_lib = utils.load_share_lib("lib_iterative_svd")

__py_n = len(__py_matrix);
__py_m = len(__py_matrix[0]);
@@ -82,6 +82,6 @@ def iterative_svd(incomp_data, truncation_rank, logs=True, lib_path=None):
end_time = time.time()

if logs:
print(f"\n\t\t> logs, imputation iterative svd - Execution Time: {(end_time - start_time):.4f} seconds\n")
print(f"\n\t> logs, imputation iterative svd - Execution Time: {(end_time - start_time):.4f} seconds\n")

return recov_data
4 changes: 2 additions & 2 deletions build/lib/imputegap/algorithms/knn.py
@@ -29,7 +29,7 @@ def knn(incomp_data, k=5, weights="uniform", logs=True):

"""

print(f"\t\t\t\t(PYTHON) KNN: ({incomp_data.shape[0]},{incomp_data.shape[1]}) for k {k}, "
print(f"(PYTHON) KNNImpute: ({incomp_data.shape[0]},{incomp_data.shape[1]}) for k {k}, "
f", and weights {weights}...")

start_time = time.time() # Record start time
@@ -74,6 +74,6 @@ def knn(incomp_data, k=5, weights="uniform", logs=True):

end_time = time.time()
if logs:
print(f"\n\t\t> logs, imputation knn - Execution Time: {(end_time - start_time):.4f} seconds\n")
print(f"\n\t> logs, imputation knn_impute - Execution Time: {(end_time - start_time):.4f} seconds\n")

return recov_data
Binary file added build/lib/imputegap/algorithms/lib/lib_svt.dylib
Binary file added build/lib/imputegap/algorithms/lib/lib_tkcm.dylib