Commit 92156cf

documented method and reasoning for Partitioner "defusing"
1 parent 492ddad commit 92156cf

2 files changed: +45 -0 lines changed
sklearn/tree/_partitioner.pxd

Lines changed: 29 additions & 0 deletions
@@ -1,9 +1,38 @@
+# Authors: Gilles Louppe <g.louppe@gmail.com>
+# Peter Prettenhofer <peter.prettenhofer@gmail.com>
+# Brian Holt <bdholt1@gmail.com>
+# Joel Nothman <joel.nothman@gmail.com>
+# Arnaud Joly <arnaud.v.joly@gmail.com>
+# Jacob Schreiber <jmschreiber91@gmail.com>
+# Adam Li <adam2392@gmail.com>
+# Jong Shin <jshinm@gmail.com>
+# Samuel Carliles <scarlil1@jhu.edu>
+#
+# License: BSD 3 clause
+# SPDX-License-Identifier: BSD-3-Clause
+
 from ..utils._typedefs cimport float32_t, float64_t, intp_t, int8_t, int32_t, uint32_t
 
 # Constant to switch between algorithm non zero value extract algorithm
 # in SparsePartitioner
 cdef float32_t EXTRACT_NNZ_SWITCH = 0.1
 
+# We introduce a different approach to the fused type for {Dense, Sparse}Partitioner.
+# The main drawback of the fused type approach is that it seemed to require a
+# proliferation of concrete Splitter types in order to accommodate holding ownership
+# of each concrete type of Partitioner, hence the
+# {Best, BestSparse, Random, RandomSparse}Splitter classes. This pattern generalizes
+# to any class wishing to hold a concrete instance of Partitioner, which makes
+# reusing the Partitioner code (as we wish to do for honesty and obliqueness) a
+# fractal class-generating process.
+#
+# The alternative we introduce is the same pattern we use all over the place:
+# function pointers. Assigning method implementations as function pointer values
+# in init allows DensePartitioner and SparsePartitioner to be plain old subclasses
+# of Partitioner, and there is no performance hit from virtual method lookup.
+#
+# Since we also seek to reuse Partitioner as its own module, we break it out into
+# its own files.
 
 # Introduce a fused-class to make it possible to share the split implementation
 # between the dense and sparse cases in the node_split_best and node_split_random
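
The comment added above describes the function-pointer pattern only in prose, so a small sketch may help. It is a plain Python analogue, not the Cython in this commit: Python callables stand in for C function pointers. Every name apart from Partitioner, DensePartitioner, SparsePartitioner and Splitter is hypothetical (`find_min`, `_find_min`, `values`, `data`, `n_samples`), and it says nothing about the performance claim. It only shows how assigning an implementation in init lets one Splitter class hold either kind of Partitioner.

```python
from typing import Callable, Sequence


def _not_implemented(self: "Partitioner") -> float:
    # Default value for the slot, so a bare Partitioner fails loudly.
    raise NotImplementedError("no implementation assigned")


class Partitioner:
    """Base class with one 'function pointer' slot (hypothetical: _find_min)."""

    def __init__(self) -> None:
        # The slot plays the role of a C function pointer member.
        self._find_min: Callable[["Partitioner"], float] = _not_implemented

    def find_min(self) -> float:
        # Dispatch through the slot, passing self explicitly as C code would.
        return self._find_min(self)


def _dense_find_min(self: "DensePartitioner") -> float:
    # Dense case: every sample value is stored explicitly.
    return min(self.values)


def _sparse_find_min(self: "SparsePartitioner") -> float:
    # Sparse case: only nonzeros are stored; implicit zeros count if present.
    candidates = list(self.data)
    if len(self.data) < self.n_samples:
        candidates.append(0.0)
    return min(candidates)


class DensePartitioner(Partitioner):
    def __init__(self, values: Sequence[float]) -> None:
        super().__init__()
        self.values = list(values)
        self._find_min = _dense_find_min  # implementation assigned in init


class SparsePartitioner(Partitioner):
    def __init__(self, data: Sequence[float], n_samples: int) -> None:
        super().__init__()
        self.data = list(data)
        self.n_samples = n_samples
        self._find_min = _sparse_find_min  # implementation assigned in init


class Splitter:
    """A single Splitter holds any Partitioner; no Dense/Sparse variants needed."""

    def __init__(self, partitioner: Partitioner) -> None:
        self.partitioner = partitioner

    def lowest_value(self) -> float:
        return self.partitioner.find_min()
```

With this layout, `Splitter(DensePartitioner([3.0, 1.5, 2.0]))` and `Splitter(SparsePartitioner([3.0, 1.5], n_samples=4))` use the same Splitter class. A further holder of a Partitioner would likewise need only one class rather than one per concrete Partitioner type.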

sklearn/tree/_sort.pxd

Lines changed: 16 additions & 0 deletions
@@ -1,5 +1,21 @@
+# Authors: Gilles Louppe <g.louppe@gmail.com>
+# Peter Prettenhofer <peter.prettenhofer@gmail.com>
+# Brian Holt <bdholt1@gmail.com>
+# Joel Nothman <joel.nothman@gmail.com>
+# Arnaud Joly <arnaud.v.joly@gmail.com>
+# Jacob Schreiber <jmschreiber91@gmail.com>
+# Adam Li <adam2392@gmail.com>
+# Jong Shin <jshinm@gmail.com>
+# Samuel Carliles <scarlil1@jhu.edu>
+#
+# License: BSD 3 clause
+# SPDX-License-Identifier: BSD-3-Clause
+
 from ..utils._typedefs cimport float32_t, float64_t, intp_t, int8_t, int32_t, uint32_t
 
+# Since we broke Partitioner out into its own module in order to reuse it, and since
+# both Splitter and Partitioner use these sort functions, we break them out into
+# their own files in order to avoid cyclic file dependency.
 
 # Mitigate precision differences between 32 bit and 64 bit
 cdef float32_t FEATURE_THRESHOLD = 1e-7
