This repository was archived by the owner on Oct 24, 2024. It is now read-only.

Commit 5082eb0

Method to match node paths via glob (#267)
* test
* implementation
* documentation
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci
* whatsnew
* API
* correct faulty test
* remove newline
* search-> match
* format continuation lines correctly

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 07ac0ae commit 5082eb0

5 files changed

+89
-1
lines changed

datatree/datatree.py

Lines changed: 50 additions & 0 deletions
@@ -1242,8 +1242,13 @@ def filter(self: DataTree, filterfunc: Callable[[DataTree], bool]) -> DataTree:
         filterfunc: function
             A function which accepts only one DataTree - the node on which filterfunc will be called.
 
+        Returns
+        -------
+        DataTree
+
         See Also
         --------
+        match
         pipe
         map_over_subtree
         """
@@ -1252,6 +1257,51 @@ def filter(self: DataTree, filterfunc: Callable[[DataTree], bool]) -> DataTree:
         }
         return DataTree.from_dict(filtered_nodes, name=self.root.name)
 
+    def match(self, pattern: str) -> DataTree:
+        """
+        Return nodes with paths matching pattern.
+
+        Uses unix glob-like syntax for pattern-matching.
+
+        Parameters
+        ----------
+        pattern: str
+            A pattern to match each node path against.
+
+        Returns
+        -------
+        DataTree
+
+        See Also
+        --------
+        filter
+        pipe
+        map_over_subtree
+
+        Examples
+        --------
+        >>> dt = DataTree.from_dict(
+        ...     {
+        ...         "/a/A": None,
+        ...         "/a/B": None,
+        ...         "/b/A": None,
+        ...         "/b/B": None,
+        ...     }
+        ... )
+        >>> dt.match("*/B")
+        DataTree('None', parent=None)
+        ├── DataTree('a')
+        │   └── DataTree('B')
+        └── DataTree('b')
+            └── DataTree('B')
+        """
+        matching_nodes = {
+            node.path: node.ds
+            for node in self.subtree
+            if NodePath(node.path).match(pattern)
+        }
+        return DataTree.from_dict(matching_nodes, name=self.root.name)
+
     def map_over_subtree(
         self,
         func: Callable,
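
The per-node test used by `match` can be sketched with the standard library alone. This is a minimal illustration, not the library's code: it assumes `NodePath` behaves like `pathlib.PurePosixPath` (its definition is not part of this diff), and the helper `select_paths` and the list `NODE_PATHS` are hypothetical stand-ins for iterating over `self.subtree`. It only selects path strings; it does not rebuild a tree the way `DataTree.from_dict` does.

```python
from pathlib import PurePosixPath

# Hypothetical node paths standing in for `node.path` over `self.subtree`.
NODE_PATHS = ["/", "/a", "/b", "/a/A", "/a/B", "/b/A", "/b/B"]


def select_paths(paths, pattern):
    """Keep the paths whose PurePosixPath glob-matches ``pattern``."""
    return [p for p in paths if PurePosixPath(p).match(pattern)]


matched = select_paths(NODE_PATHS, "*/B")
print(matched)  # ['/a/B', '/b/B']
```

Only the two `B` leaves pass the test. The intermediate nodes `a` and `b` in the docstring output would then come from the tree being rebuilt from the matched paths, not from the pattern itself.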

datatree/tests/test_datatree.py

Lines changed: 19 additions & 0 deletions
@@ -678,6 +678,25 @@ def f(x, tree, y):
 
 
 class TestSubset:
+    def test_match(self):
+        # TODO is this example going to cause problems with case sensitivity?
+        dt = DataTree.from_dict(
+            {
+                "/a/A": None,
+                "/a/B": None,
+                "/b/A": None,
+                "/b/B": None,
+            }
+        )
+        result = dt.match("*/B")
+        expected = DataTree.from_dict(
+            {
+                "/a/B": None,
+                "/b/B": None,
+            }
+        )
+        dtt.assert_identical(result, expected)
+
     def test_filter(self):
         simpsons = DataTree.from_dict(
             d={
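
On the TODO about case sensitivity in `test_match`: the sketch below shows how the two pure-path flavours in the standard library treat case, which may help in judging whether the test is portable. It assumes, as above, that `NodePath` follows POSIX path semantics; that is an assumption about the library, not something this diff shows. The variable names are illustrative only.

```python
from pathlib import PurePosixPath, PureWindowsPath

# POSIX-flavoured pure paths compare case-sensitively on every platform.
posix_upper = PurePosixPath("/a/B").match("*/B")
posix_lower = PurePosixPath("/a/b").match("*/B")

# Windows-flavoured pure paths fold case, so the same pattern matches.
windows_lower = PureWindowsPath("/a/b").match("*/B")

print(posix_upper, posix_lower, windows_lower)
```

If the POSIX assumption holds, the lower-case nodes `a` and `b` would not be confused with the upper-case leaves `A` and `B`, regardless of the operating system running the tests.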

docs/source/api.rst

Lines changed: 1 addition & 0 deletions
@@ -102,6 +102,7 @@ For manipulating, traversing, navigating, or mapping over the tree structure.
    DataTree.find_common_ancestor
    map_over_subtree
    DataTree.pipe
+   DataTree.match
    DataTree.filter
 
 DataTree Contents

docs/source/hierarchical-data.rst

Lines changed: 17 additions & 1 deletion
@@ -379,7 +379,23 @@ Subsetting Tree Nodes
 
 We can subset our tree to select only nodes of interest in various ways.
 
-The :py:meth:`DataTree.filter` method can be used to retain only the nodes of a tree that meet a certain condition.
+Similarly to on a real filesystem, matching nodes by common patterns in their paths is often useful.
+We can use :py:meth:`DataTree.match` for this:
+
+.. ipython:: python
+
+    dt = DataTree.from_dict(
+        {
+            "/a/A": None,
+            "/a/B": None,
+            "/b/A": None,
+            "/b/B": None,
+        }
+    )
+    result = dt.match("*/B")
+
+We can also subset trees by the contents of the nodes.
+:py:meth:`DataTree.filter` retains only the nodes of a tree that meet a certain condition.
 For example, we could recreate the Simpson's family tree with the ages of each individual, then filter for only the adults:
 First lets recreate the tree but with an `age` data variable in every node:
 
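
One point the documentation example leaves implicit is how a relative pattern such as `"*/B"` is anchored. Under `pathlib.PurePosixPath` semantics (assumed here to carry over to `NodePath`, which this diff does not confirm), a relative pattern is matched from the right-hand end of the path, while a pattern starting with `/` must match the whole path. The paths below are hypothetical and are not taken from the documentation.

```python
from pathlib import PurePosixPath

deep = PurePosixPath("/x/a/B")  # hypothetical node three levels down
shallow = PurePosixPath("/a/B")  # node two levels down

# A relative pattern is matched against the trailing path components.
relative_deep = deep.match("*/B")

# An absolute pattern has to account for every component of the path.
absolute_deep = deep.match("/*/B")
absolute_shallow = shallow.match("/*/B")

print(relative_deep, absolute_deep, absolute_shallow)
```

If that assumption holds, `"*/B"` would select `B` nodes at any depth, while `"/*/B"` would restrict the selection to grandchildren of the root.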

docs/source/whats-new.rst

Lines changed: 2 additions & 0 deletions
@@ -23,6 +23,8 @@ v0.0.13 (unreleased)
 New Features
 ~~~~~~~~~~~~
 
+- New :py:meth:`DataTree.match` method for glob-like pattern matching of node paths. (:pull:`267`)
+  By `Tom Nicholas <https://github.com/TomNicholas>`_.
 - Indicate which node caused the problem if error encountered while applying user function using :py:func:`map_over_subtree`
   (:issue:`190`, :pull:`264`). Only works when using python 3.11 or later.
   By `Tom Nicholas <https://github.com/TomNicholas>`_.
