
[ENH] Implement Proximity Forest 2.0 classifier using aeon distances #1978


Open: 27 commits to be merged into base: main

itsdivya1309
Contributor

@itsdivya1309 itsdivya1309 commented Aug 15, 2024

Reference Issues/PRs

Closes: #428
Incorporates changes suggested in #1874 (maybe we can close PR #1876)

What does this implement/fix? Explain your changes.

  1. Created a private distance file to parameterise DTW and ADTW distances.
  2. Created a function to calculate the first_order_derivative of a time series.
  3. Implemented the ProximityTree2 and ProximityForest2 classes, as per the paper.
  4. Added the classes to the API.
  5. Wrote unit tests.
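
The PR's own `first_order_derivative` helper is not shown in this thread. As a rough illustration of item 2 only, the sketch below uses one common definition, the derivative estimate from derivative DTW (the average of the left slope and the centred slope at each interior point). The function name `first_order_derivative_sketch` and the choice of formula are assumptions, not the code in this PR.

```python
import numpy as np


def first_order_derivative_sketch(x):
    """Estimate the first-order derivative of a series (hypothetical helper).

    Assumed formula (derivative-DTW style), for interior points i:
        d_i = ((x_i - x_{i-1}) + (x_{i+1} - x_{i-1}) / 2) / 2
    The result is two points shorter than the input along the last axis.
    """
    x = np.asarray(x, dtype=float)
    if x.shape[-1] < 3:
        raise ValueError("need at least 3 time points")
    left_slope = x[..., 1:-1] - x[..., :-2]            # x_i - x_{i-1}
    centred_slope = (x[..., 2:] - x[..., :-2]) / 2.0   # (x_{i+1} - x_{i-1}) / 2
    return (left_slope + centred_slope) / 2.0
```

The function works on the last axis, so it accepts a single series of shape `(n_timepoints,)` or a collection of shape `(n_cases, n_timepoints)`.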

To do:

  • Improve the computational efficiency by implementing the Early Abandoning and Pruning (EAP) algorithm for elastic distance measures.

@aeon-actions-bot added the classification and enhancement labels Aug 15, 2024
@aeon-actions-bot
Contributor

Thank you for contributing to aeon

I have added the following labels to this PR based on the title: [ enhancement ].
I have added the following labels to this PR based on the changes made: [ classification ]. Feel free to change these if they do not properly represent the PR.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

PR CI actions

These checkboxes will add labels to enable/disable CI functionality for this PR. This may not take effect immediately, and a new commit may be required to run the new configuration.

  • Run pre-commit checks for all files
  • Run all pytest tests and configurations
  • Run all notebook example tests
  • Run numba-disabled codecov tests
  • Stop automatic pre-commit fixes (always disabled for drafts)
  • Push an empty commit to re-run CI checks

Member

@MatthewMiddlehurst MatthewMiddlehurst left a comment

I would put the distances in the same file as PF2 for now.

@TonyBagnall
Contributor

Have we compared the results against the published ones?

@MatthewMiddlehurst
Member

MatthewMiddlehurst commented Sep 16, 2024

Not yet, I need to put it on the cluster. I do not have results for the original yet either.

@itsdivya1309
Contributor Author

Have we compared the results against the published ones?

Actually, the algorithm isn't complete yet. We still need to work on computational efficiency, particularly integrating the EAP technique. I'll resume the work in a couple of days.

@MatthewMiddlehurst MatthewMiddlehurst marked this pull request as draft September 21, 2024 13:05
@itsdivya1309 itsdivya1309 marked this pull request as ready for review October 28, 2024 05:08
@MatthewMiddlehurst
Member

Want to run this soon. This is not blocking that, but some component seems to be non-deterministic.

@baraline
Member

baraline commented Dec 4, 2024

What were the remaining things to do here? In terms of testing, it seems that something non-deterministic is happening, which, given the algorithm, shouldn't be the case if I remember correctly.

@MatthewMiddlehurst
Member

We need to compare to past results and, yes, something is failing the non-deterministic test.
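
To make the determinism concern concrete, here is a minimal sketch of the kind of check being discussed: fit two fresh estimators with the same seed and compare their predictions. This is not aeon's actual test. `is_deterministic` and `ToyRandomClassifier` are hypothetical names used only for illustration, and the toy classifier stands in for any estimator with `fit` and `predict`.

```python
import numpy as np


class ToyRandomClassifier:
    """Hypothetical stand-in estimator: predicts random labels, optionally seeded."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # All randomness flows from random_state, so a fixed seed repeats exactly.
        self._rng = np.random.default_rng(self.random_state)
        return self

    def predict(self, X):
        return self._rng.choice(self.classes_, size=len(X))


def is_deterministic(make_estimator, X, y, X_test):
    """Fit two fresh estimators on the same data and compare predictions."""
    first = make_estimator().fit(X, y).predict(X_test)
    second = make_estimator().fit(X, y).predict(X_test)
    return bool(np.array_equal(first, second))


X = np.zeros((20, 1, 10))
y = np.array([0, 1] * 10)
X_test = np.zeros((200, 1, 10))
seeded_ok = is_deterministic(lambda: ToyRandomClassifier(random_state=0), X, y, X_test)
unseeded_ok = is_deterministic(lambda: ToyRandomClassifier(), X, y, X_test)
```

A check like this fails when some component draws random numbers outside the seeded generator, for example by calling a global random function inside the tree-fitting code.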

@itsdivya1309
Contributor Author

What were the remaining things to do here? In terms of testing, it seems that something non-deterministic is happening, which, given the algorithm, shouldn't be the case if I remember correctly.

Hi, we need to integrate EAP for the distance measures to complete this algorithm. In the current implementation, I've used private distance functions, which use pruning to abandon a distance calculation once it exceeds a threshold.
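
As a rough illustration of threshold-based abandoning (not the private functions in this PR, and not the full EAP algorithm), the sketch below computes squared-error DTW row by row and gives up as soon as every cell in a row exceeds a cutoff. The name `dtw_early_abandon` and the `cutoff` parameter are assumptions made for this example.

```python
import numpy as np


def dtw_early_abandon(x, y, cutoff=np.inf):
    """Full-window DTW on 1D series with simple early abandoning (sketch).

    Returns np.inf as soon as the final distance is guaranteed to exceed
    `cutoff`. Costs are non-negative and every warping path crosses every
    row, so the minimum of a row is a lower bound on the final distance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    prev = np.full(m + 1, np.inf)
    prev[0] = 0.0  # empty-prefix alignment costs nothing
    for i in range(1, n + 1):
        curr = np.full(m + 1, np.inf)
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        if curr[1:].min() > cutoff:
            return np.inf  # abandon: no path can finish below the cutoff
        prev = curr
    return prev[m]
```

In a nearest-exemplar search, the cutoff would typically be the best distance found so far, so most candidate exemplars are rejected after only a few rows.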

@itsdivya1309
Contributor Author

Hi, @MatthewMiddlehurst, I've corrected the code. I think we are good to compare the results now.

@MatthewMiddlehurst
Member

That was not the issue; it will be internal to the fit tree function. I would revert the last change, as the previous version was how we do that elsewhere.

This does not stop me evaluating, I just have to find the time to get it working with our setup and run it 🙂

@MatthewMiddlehurst
Member

MatthewMiddlehurst commented Jun 2, 2025

Hi, I ran this on 90+ UCR datasets with 5 resamples, and it appears to perform worse than the results reported in the paper.

5 resamples

aeon average acc: 0.827681
paper average acc: 0.861343
average acc diff: -0.03366

dataset aeon paper diff
SonyAIBORobotSurface1 0.941764 0.878869 0.062895
Wine 0.822222 0.781481 0.040741
Lightning7 0.775342 0.742466 0.032877
Lightning2 0.809836 0.783607 0.02623
ECG200 0.904 0.89 0.014
RefrigerationDevices 0.7008 0.693333 0.007467
DiatomSizeReduction 0.944444 0.937255 0.00719
HouseTwenty 0.941176 0.934454 0.006723
Ham 0.737143 0.733333 0.00381
DistalPhalanxOutlineCorrect 0.806522 0.803623 0.002899
FaceFour 0.956818 0.954545 0.002273
ItalyPowerDemand 0.957629 0.955879 0.001749
ECG5000 0.944267 0.943111 0.001156
Mallat 0.97177 0.971087 0.000682
Coffee 1 1 0
GunPointOldVersusYoung 1 1 0
InsectEPGRegularTrain 1 1 0
InsectEPGSmallTrain 1 1 0
SmoothSubspace 0.998667 0.998667 0
Trace 1 1 0
TwoPatterns 0.9997 0.99995 -0.00025
GunPointMaleVersusFemale 0.998101 0.999367 -0.00127
Wafer 0.99682 0.998799 -0.00198
SyntheticControl 0.992667 0.995333 -0.00267
Earthquakes 0.748201 0.751079 -0.00288
FaceAll 0.947456 0.951361 -0.00391
Plane 0.994286 1 -0.00571
FacesUCR 0.959024 0.964878 -0.00585
OliveOil 0.88 0.886667 -0.00667
DistalPhalanxOutlineAgeGroup 0.794245 0.801439 -0.00719
MixedShapesSmallTrain 0.92701 0.93468 -0.00767
MixedShapesRegularTrain 0.962392 0.971546 -0.00915
PhalangesOutlinesCorrect 0.812121 0.821678 -0.00956
Chinatown 0.956851 0.96793 -0.01108
MoteStrain 0.916294 0.927476 -0.01118
CBF 0.975778 0.988222 -0.01244
Meat 0.963333 0.976667 -0.01333
CricketZ 0.804103 0.818974 -0.01487
CinCECGTorso 0.973768 0.988696 -0.01493
InsectWingbeatSound 0.615253 0.630202 -0.01495
Worms 0.693506 0.709091 -0.01558
Crop 0.745595 0.762512 -0.01692
GunPointAgeSpan 0.978481 0.99557 -0.01709
CricketX 0.781538 0.798974 -0.01744
PowerCons 0.97 0.987778 -0.01778
ShapeletSim 0.89 0.907778 -0.01778
Strawberry 0.942162 0.963784 -0.02162
Computers 0.768 0.7904 -0.0224
BME 0.973333 1 -0.02667
Herring 0.56875 0.596875 -0.02813
ACSF1 0.712 0.742 -0.03
SwedishLeaf 0.93312 0.96384 -0.03072
ToeSegmentation2 0.887692 0.918462 -0.03077
Symbols 0.934271 0.96804 -0.03377
CricketY 0.767692 0.802051 -0.03436
Haptics 0.446104 0.480519 -0.03442
UWaveGestureLibraryY 0.75321 0.78794 -0.03473
UWaveGestureLibraryX 0.815131 0.851591 -0.03646
ArrowHead 0.861714 0.898286 -0.03657
MiddlePhalanxOutlineCorrect 0.798625 0.837113 -0.03849
UWaveGestureLibraryZ 0.750307 0.789336 -0.03903
MedicalImages 0.749211 0.791316 -0.04211
WordSynonyms 0.748903 0.79185 -0.04295
ECGFiveDays 0.926365 0.973519 -0.04715
Rock 0.764 0.812 -0.048
SonyAIBORobotSurface2 0.843861 0.89192 -0.04806
ShapesAll 0.85 0.898667 -0.04867
WormsTwoClass 0.737662 0.787013 -0.04935
BirdChicken 0.88 0.93 -0.05
GunPoint 0.949333 1 -0.05067
Yoga 0.8544 0.9054 -0.051
ProximalPhalanxTW 0.747317 0.8 -0.05268
FiftyWords 0.793846 0.847473 -0.05363
ProximalPhalanxOutlineAgeGroup 0.794146 0.847805 -0.05366
ProximalPhalanxOutlineCorrect 0.817182 0.876976 -0.05979
Fish 0.908571 0.969143 -0.06057
MiddlePhalanxTW 0.497403 0.558442 -0.06104
DistalPhalanxTW 0.647482 0.709353 -0.06187
Adiac 0.690537 0.75601 -0.06547
MiddlePhalanxOutlineAgeGroup 0.607792 0.675325 -0.06753
ChlorineConcentration 0.584635 0.652344 -0.06771
ScreenType 0.5696 0.6416 -0.072
OSULeaf 0.800826 0.88843 -0.0876
UMD 0.884722 0.975 -0.09028
SmallKitchenAppliances 0.685333 0.780267 -0.09493
BeetleFly 0.76 0.86 -0.1
Beef 0.52 0.62 -0.1
EOGHorizontalSignal 0.668508 0.768508 -0.1
InlineSkate 0.402182 0.505455 -0.10327
LargeKitchenAppliances 0.716267 0.819733 -0.10347
EOGVerticalSignal 0.656354 0.760221 -0.10387
FreezerRegularTrain 0.882456 0.999228 -0.11677
Car 0.763333 0.883333 -0.12
ToeSegmentation1 0.769298 0.89386 -0.12456
FreezerSmallTrain 0.754386 0.892912 -0.13853
TwoLeadECG 0.830378 0.997191 -0.16681

train/test

aeon average acc: 0.815098
paper average acc: 0.848603
average acc diff: -0.0335

dataset aeon paper diff
SonyAIBORobotSurface1 0.908486 0.816972 0.091514
Wine 0.574074 0.518519 0.055556
Ham 0.685714 0.657143 0.028571
ArrowHead 0.897143 0.874286 0.022857
DiatomSizeReduction 0.973856 0.954248 0.019608
HouseTwenty 0.94958 0.932773 0.016807
Lightning2 0.836066 0.819672 0.016393
RefrigerationDevices 0.557333 0.546667 0.010667
DistalPhalanxOutlineCorrect 0.789855 0.786232 0.003623
Coffee 1 1 0
DistalPhalanxOutlineAgeGroup 0.726619 0.726619 0
FaceFour 0.965909 0.965909 0
GunPointMaleVersusFemale 1 1 0
GunPointOldVersusYoung 1 1 0
Haptics 0.457792 0.457792 0
InsectEPGRegularTrain 1 1 0
InsectEPGSmallTrain 1 1 0
OliveOil 0.9 0.9 0
SmoothSubspace 1 1 0
SyntheticControl 0.99 0.99 0
Trace 1 1 0
FacesUCR 0.957073 0.957561 -0.00049
TwoPatterns 0.999 0.99975 -0.00075
CricketY 0.807692 0.810256 -0.00256
CricketZ 0.802564 0.805128 -0.00256
Wafer 0.99562 0.998378 -0.00276
Chinatown 0.973761 0.976676 -0.00292
ECG5000 0.941556 0.945778 -0.00422
ItalyPowerDemand 0.962099 0.96793 -0.00583
FaceAll 0.800592 0.808284 -0.00769
MixedShapesRegularTrain 0.960825 0.969485 -0.00866
CinCECGTorso 0.960145 0.968841 -0.0087
Plane 0.990476 1 -0.00952
MiddlePhalanxTW 0.519481 0.532468 -0.01299
GunPoint 0.986667 1 -0.01333
Lightning7 0.780822 0.794521 -0.0137
Earthquakes 0.748201 0.76259 -0.01439
MixedShapesSmallTrain 0.922887 0.938144 -0.01526
Meat 0.916667 0.933333 -0.01667
GunPointAgeSpan 0.981013 1 -0.01899
SwedishLeaf 0.9312 0.9504 -0.0192
BME 0.98 1 -0.02
CricketX 0.776923 0.797436 -0.02051
Crop 0.746786 0.767321 -0.02054
ProximalPhalanxOutlineCorrect 0.859107 0.879725 -0.02062
MoteStrain 0.904153 0.92492 -0.02077
DistalPhalanxTW 0.640288 0.661871 -0.02158
Mallat 0.955224 0.978252 -0.02303
ToeSegmentation2 0.884615 0.907692 -0.02308
MedicalImages 0.759211 0.782895 -0.02368
EOGHorizontalSignal 0.546961 0.571823 -0.02486
PowerCons 0.961111 0.988889 -0.02778
Symbols 0.948744 0.976884 -0.02814
UWaveGestureLibraryY 0.754886 0.784478 -0.02959
Strawberry 0.935135 0.964865 -0.02973
Herring 0.59375 0.625 -0.03125
Computers 0.712 0.744 -0.032
InsectWingbeatSound 0.611111 0.643434 -0.03232
Beef 0.7 0.733333 -0.03333
UMD 0.951389 0.986111 -0.03472
PhalangesOutlinesCorrect 0.794872 0.829837 -0.03497
Yoga 0.854333 0.889333 -0.035
FiftyWords 0.804396 0.841758 -0.03736
CBF 0.957778 0.996667 -0.03889
Worms 0.662338 0.701299 -0.03896
UWaveGestureLibraryX 0.807929 0.847292 -0.03936
ACSF1 0.79 0.83 -0.04
ECG200 0.9 0.94 -0.04
UWaveGestureLibraryZ 0.745114 0.785874 -0.04076
ShapesAll 0.848333 0.891667 -0.04333
ProximalPhalanxTW 0.746341 0.790244 -0.0439
MiddlePhalanxOutlineAgeGroup 0.551948 0.597403 -0.04545
WordSynonyms 0.731975 0.782132 -0.05016
WormsTwoClass 0.727273 0.779221 -0.05195
ProximalPhalanxOutlineAgeGroup 0.795122 0.84878 -0.05366
Fish 0.925714 0.982857 -0.05714
SonyAIBORobotSurface2 0.828961 0.886674 -0.05771
ChlorineConcentration 0.580469 0.642969 -0.0625
FreezerSmallTrain 0.682456 0.74807 -0.06561
ShapeletSim 0.883333 0.95 -0.06667
ECGFiveDays 0.894309 0.962834 -0.06852
ScreenType 0.466667 0.536 -0.06933
MiddlePhalanxOutlineCorrect 0.776632 0.85567 -0.07904
Rock 0.72 0.8 -0.08
FreezerRegularTrain 0.910175 0.998246 -0.08807
EOGVerticalSignal 0.475138 0.569061 -0.09392
Adiac 0.680307 0.780051 -0.09974
InlineSkate 0.374545 0.494545 -0.12
LargeKitchenAppliances 0.645333 0.765333 -0.12
ToeSegmentation1 0.846491 0.969298 -0.12281
SmallKitchenAppliances 0.658667 0.784 -0.12533
OSULeaf 0.743802 0.871901 -0.1281
BirdChicken 0.8 0.95 -0.15
Car 0.783333 0.933333 -0.15
TwoLeadECG 0.833187 0.998244 -0.16506
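
For anyone wanting to re-derive the summary figures from tables like the ones above, here is a small sketch that parses whitespace-separated rows and reports the mean difference and a win/tie/loss count for aeon against the paper. The helper name `summarise_rows` is hypothetical, and the three example rows are copied from the 5-resample table.

```python
def summarise_rows(text):
    """Parse 'dataset aeon paper diff' rows and summarise aeon vs paper."""
    diffs = []
    for line in text.strip().splitlines():
        name, aeon_acc, paper_acc, _reported_diff = line.split()
        # Recompute the difference rather than trusting the rounded column.
        diffs.append(float(aeon_acc) - float(paper_acc))
    return {
        "mean_diff": sum(diffs) / len(diffs),
        "wins": sum(d > 0 for d in diffs),
        "ties": sum(d == 0 for d in diffs),
        "losses": sum(d < 0 for d in diffs),
    }


example = """
Wine 0.822222 0.781481 0.040741
Coffee 1 1 0
Beef 0.52 0.62 -0.1
"""
summary = summarise_rows(example)
```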
BeetleFly 0.65 0.85 -0.2

@itsdivya1309
Contributor Author

The results are not what I expected :(
Apart from EAP, everything else was adopted per the paper, so accuracy should have been the same. Will look into the code once more.

@MatthewMiddlehurst
Member

I have updated the results to also include just the default train/test split. The paper seems to have changed a bit from the arXiv version. We can try asking the authors if necessary.

@itsdivya1309
Contributor Author

I have updated the results to also include just the default train/test split. The paper seems to have changed a bit from the arXiv version. We can try asking the authors if necessary.

Yes, you are right. In the published version they've also added HYDRA, which wasn't in the arXiv version.

@itsdivya1309
Contributor Author

Hi @MatthewMiddlehurst, could you rerun this on the UCR datasets? I have added Minkowski+HYDRA, so it may be slow compared to ProximityForest.
