Skip to content

Conversation

@yuxuan-z19
Copy link

@yuxuan-z19 yuxuan-z19 commented Jul 26, 2025

Summary

This PR upgrades GraKeL to support Python 3.8 and above, officially dropping Python 2.x compatibility, which has been end-of-life since January 1, 2020.

In addition:

  • Refactored code to align with modern NumPy and SciPy APIs
  • Updated pyproject.toml to specify minimum Python version and dependencies
  • Removed or updated legacy compatibility code (e.g., six, future imports).

Compatibility

  • BREAKING CHANGE: Python <3.8 is no longer supported.
  • Unit tests now target Python 3.8+ with the latest NumPy/SciPy versions.

Rationale

  • Improves long-term maintainability and compatibility with modern ML pipelines.
  • Fixes deprecation warnings introduced by recent versions of NumPy and SciPy.
  • Unblocks integration with newer frameworks (e.g., PyTorch Geometric, ONNX tools).

Runtime Warnings in Tests

During the upgrade, the following test cases raise RuntimeWarning: invalid value encountered in sqrt or divide:

grakel/tests/test_Kernel.py::test_svm_theta
grakel/tests/test_Kernel.py::test_svm_theta_pd
  /data/zyx/GraKeL/grakel/kernels/kernel.py:196: RuntimeWarning: invalid value encountered in divide
    return km / np.sqrt(np.outer(self._X_diag, self._X_diag))

grakel/tests/test_Kernel.py::test_svm_theta
grakel/tests/test_Kernel.py::test_svm_theta_pd
  /data/zyx/GraKeL/grakel/kernels/kernel.py:161: RuntimeWarning: invalid value encountered in divide
    km /= np.sqrt(np.outer(Y_diag, X_diag))
grakel/tests/test_Kernel.py::test_neighborhood_subgraph_pairwise_distance
  /data/zyx/GraKeL/grakel/kernels/neighborhood_subgraph_pairwise_distance.py:311: RuntimeWarning: invalid value encountered in divide
    Q = K / np.sqrt(np.outer(K_diag, K_diag))

grakel/tests/test_Kernel.py::test_neighborhood_subgraph_pairwise_distance
  /data/zyx/GraKeL/grakel/kernels/neighborhood_subgraph_pairwise_distance.py:275: RuntimeWarning: invalid value encountered in divide
    S += np.nan_to_num(K / np.sqrt(np.outer(np.array(Mp.power(2).sum(-1)), N[key])))
grakel/tests/test_Kernel.py::test_random_walk
grakel/tests/test_Kernel.py::test_random_walk_pd
grakel/tests/test_Kernel.py::test_random_walk_labels
grakel/tests/test_Kernel.py::test_random_walk_labels_pd
  /data/zyx/GraKeL/grakel/kernels/kernel.py:196: RuntimeWarning: invalid value encountered in sqrt
    return km / np.sqrt(np.outer(self._X_diag, self._X_diag))

grakel/tests/test_Kernel.py::test_random_walk
grakel/tests/test_Kernel.py::test_random_walk_pd
grakel/tests/test_Kernel.py::test_random_walk_labels
grakel/tests/test_Kernel.py::test_random_walk_labels_pd
  /data/zyx/GraKeL/grakel/kernels/kernel.py:161: RuntimeWarning: invalid value encountered in sqrt
    km /= np.sqrt(np.outer(Y_diag, X_diag))

These warnings may stem from:

  • Division by zero
  • Negative values under square roots
  • Improper normalization steps

They may indicate numerical instability or missing input validation in kernels. Review is requested to:

  • Confirm whether these are expected (e.g., due to random test data)
  • Improve test assertions if needed

Copy link
Owner

@ysig ysig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add uv.lock in .gitignore

@ysig
Copy link
Owner

ysig commented Jul 26, 2025

@yuxuan-z19 Thank you for your pull request! I'm very willing to proceed and pull it.
I require some major and some minor changes.

Major Please make the kernels work:

  • test_svm_theta
  • test_neighborhood_subgraph_pairwise_distance
  • test_random_walk
    It seems that one of your changes breaks them (it's probably an overflow error?)

Random walk is a very fundamental kernel to approve me have it work while its failing.
@giannisnik maybe can you have a look to help @yuxuan-z19 identify the error so we can fix it?
(it's important that we pull this request as a lot of people are having hard time installing the library now.)

Minor updates @yuxuan-z19 :

  • update the version to 0.1.11
  • add uv.lock in .gitignore

@yuxuan-z19
Copy link
Author

yuxuan-z19 commented Jul 26, 2025

@ysig @giannisnik I've investigated the failing tests in grakel/tests/test_Kernel.py. All of them invoke generate_dataset() and trigger failures reproducibly on commit 6a9cebf.

The issue stems from self._X_diag containing negative values, causing np.outer() to yield invalid results during normalization (i.e., sqrt() over negative values).

As a workaround, I suggest applying a positive semi-definite (PSD) correction by shifting the kernel matrix before normalization. Here is the code snippet implementing this approach:

epsilon = 1e-12
min_diag = np.min(X_diag)
if min_diag < 0:
    shift = abs(min_diag) + epsilon
    km += shift * np.eye(km.shape[0])
    X_diag = np.diagonal(km)
out = np.outer(X_diag, X_diag)
out[out == 0] = epsilon
res = km / np.sqrt(np.outer(X_diag, X_diag))

This ensures the diagonal entries are all positive before computing the normalization denominator.

I will add a new test to verify the correctness and stability of kernel values before and after applying the PSD adjustment on the successful tests in the test_Kernel.py.

@ysig
Copy link
Owner

ysig commented Jul 26, 2025

@yuxuan-z19 Thank you for locating this.
Diagonal elements shouldn't be negative (probably an overflow) nor close to zero.

  • Overflow sounds related to: x_sol, _ = cg(A, b, rtol=1.0e-6, maxiter=20)
  • Close to zero could be related to generated_dataset.
    @giannisnik what do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants