Update - 29/01/2025
This issue has slightly morphed into a refactoring follow-up of #418, now that xbootstrap is ported into `scores` and no longer needs an alternate implementation as an option.
However, the internal functions still need to conform to the styles and patterns used in other `scores` metrics. In particular, I believe a lot of the "nested" logic can be replaced with a more readable and extensible design (not trivial, but there are elegant ways to do this).
See #522 (comment) for further discussion.
I would like the following data processing tool to be considered for addition to the `scores` repository.
I want to add an emerging version of block bootstrapping.
A common implementation used by scientific users comes from the `xbootstrap` package. An initial review of the code (albeit by myself) found that while the functionality seems to be correct, it has some aspects that I'm unsure about or find hard to verify, which are explained in #418. Shaping it to conform to the coding standards in `scores` can be tricky due to the original implementation's reliance on multiple coding paradigms.
Furthermore, any updates, bugs, or API incompatibilities in the original implementation would have to be tracked and ported across.
I think it would be good to have an "in-house", redesigned version. This version would be more in line with our code design paradigms, and would hopefully improve maintainability and extensibility.
Please provide a reference data processing tool
See:
- PR to port original tool: Add block bootstrapping #418 (note: this currently houses both `xbootstrap` and a sample emerging implementation; the latter will be ported out to a fork and will be the implementation for this issue)
- original tool: https://github.com/dougiesquire/xbootstrap/blob/main/xbootstrap/core.py
- reference: Wilks, Daniel S. *Statistical Methods in the Atmospheric Sciences*. Vol. 100. Academic Press, 2011.
Note
The concept of block bootstrapping isn't in itself that hard. Rather than resampling a single point per draw, it samples a contiguous block of data per iteration and reshapes the result to fit the original dataset, potentially stacked over several iterations. This is done so that cross-correlations between neighbouring samples don't bias statistical estimators, e.g. `mae`.
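To make the idea concrete, here is a minimal 1-D sketch of a moving block bootstrap in NumPy. This is purely illustrative (the function name and interface are my own, not from `scores` or `xbootstrap`): it draws block start positions with replacement, concatenates the blocks, and trims to the original length.

```python
import numpy as np

def block_bootstrap_1d(data, block_size, rng=None):
    """Moving block bootstrap of a 1-D array (illustrative sketch).

    Draws contiguous blocks of length `block_size` from random start
    positions (with replacement), concatenates them, and trims the
    result back to the original length.
    """
    rng = np.random.default_rng(rng)
    n = len(data)
    n_blocks = -(-n // block_size)  # ceil(n / block_size)
    # Valid start positions keep every block fully inside the array.
    starts = rng.integers(0, n - block_size + 1, size=n_blocks)
    blocks = [data[s : s + block_size] for s in starts]
    return np.concatenate(blocks)[:n]
```

A resample preserves the within-block autocorrelation structure of the data, which is exactly why block (rather than pointwise) resampling is used for correlated series.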
This is readily available in R packages (possibly built-in). The trickiness comes from having to deal with an arbitrary number of axes (nd-arrays) efficiently, and with block sizes that don't tile the nd-axes exactly. The solution is actually fairly straightforward with recursive algorithms/functional programming, but things can become verbose in an iterative implementation.
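As a rough sketch of the nd-array case, the 1-D idea can be applied axis by axis via fancy indexing. This is a simplification of my own (resampling each axis independently, and not the design `xbootstrap` or `scores` actually uses, which also has to handle dask arrays and jointly-sampled dimensions), but it shows how a small index-building helper keeps the loose-fitting block logic contained:

```python
import numpy as np

def _block_indices(n, block_size, rng):
    """Index array for one axis: block starts drawn with replacement,
    expanded to contiguous runs, trimmed to the axis length (so blocks
    that overhang the end are simply truncated)."""
    n_blocks = -(-n // block_size)  # ceil(n / block_size)
    starts = rng.integers(0, n - block_size + 1, size=n_blocks)
    runs = [np.arange(s, s + block_size) for s in starts]
    return np.concatenate(runs)[:n]

def block_bootstrap_nd(data, block_sizes, rng=None):
    """Illustrative per-axis block bootstrap of an nd-array.

    `block_sizes` gives one block length per axis; each axis is
    resampled independently with `np.take`.
    """
    rng = np.random.default_rng(rng)
    out = np.asarray(data)
    for axis, b in enumerate(block_sizes):
        idx = _block_indices(out.shape[axis], b, rng)
        out = np.take(out, idx, axis=axis)
    return out
```

Each axis is handled by the same small helper, so supporting another dimension costs nothing extra; this is the kind of factoring I'd hope the redesigned version achieves, as opposed to nesting the per-axis logic inline.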