[jvm-packages] support distributed synchronization of customized evaluation metrics #4280


Closed
39 commits
2ee821a  add JNI stub (Mar 18, 2019)
6e7e28a  temp (Mar 19, 2019)
06d48bb  temp (Mar 20, 2019)
5a30277  basic framework (Mar 20, 2019)
4d93614  fixing metrics adding logic (Mar 21, 2019)
fa37eef  investigation (Mar 22, 2019)
640dd19  fix refernce problem (Mar 22, 2019)
642a8fc  fix race condition when registering new customized functions (Mar 23, 2019)
1cabe70  temporarily to private dmlc-core (Mar 24, 2019)
c3540ba  update dmlc-core (Mar 24, 2019)
b6dafee  support multi_classes (Mar 24, 2019)
2f41504  try to fix double free (Mar 24, 2019)
72a51b3  investigate (Mar 24, 2019)
d842ccf  fix compilation (Mar 24, 2019)
c304e55  test (Mar 24, 2019)
63c5cd8  thread safe registering (Mar 25, 2019)
d9d130d  up (Mar 25, 2019)
1a1f462  copy constructor (Mar 25, 2019)
3f67692  fix lambda syntax (Mar 25, 2019)
b517ff0  recover some changes (Mar 25, 2019)
fdbeffd  fix race condition (Mar 25, 2019)
6df2c9c  fix cuda build (Mar 25, 2019)
df1d89f  add ranking (Mar 25, 2019)
e4fa239  add unit test (Mar 25, 2019)
e135e3e  move files (Mar 26, 2019)
e374d82  make lint happy (Mar 26, 2019)
28bf3a3  make lint further happy (Mar 26, 2019)
333f251  fix windows build (Mar 26, 2019)
4d7298d  windows build (Mar 26, 2019)
ee89feb  add docs (Mar 28, 2019)
df90c9b  support both 4j and 4j-spark (Mar 28, 2019)
708bf69  recover header (Mar 28, 2019)
3802324  address comments (Apr 2, 2019)
d8ced76  update file names (Apr 18, 2019)
137d014  sync with upstream (Apr 18, 2019)
17785c5  fix cuda buiod (Apr 19, 2019)
82e0ebb  change include path (Apr 19, 2019)
416d3af  change include path (Apr 19, 2019)
6942142  fix mc_metrics.cc (Apr 19, 2019)
12 changes: 12 additions & 0 deletions doc/jvm/xgboost4j_spark_tutorial.rst
@@ -205,6 +205,18 @@ Training with Evaluation Sets

You can also monitor the performance of the model during training with multiple evaluation datasets. By specifying ``eval_sets`` or calling ``setEvalSets`` on an XGBoostClassifier or XGBoostRegressor, you can pass in multiple evaluation datasets as a Map from String to DataFrame.

Training with Custom Evaluation Metrics
----------------------------------------

With XGBoost4J (including XGBoost4J-Spark), users can implement their own custom evaluation metrics and have the values synchronized in distributed training. To implement a custom evaluation metric, users should implement one of the following interfaces: ``ml.dmlc.xgboost4j.java.IEvalElementWiseDistributed`` (for binary classification and regression), ``ml.dmlc.xgboost4j.java.IEvalMultiClassesDistributed`` (for multi-class classification), or ``ml.dmlc.xgboost4j.java.IEvalRankListDistributed`` (for ranking); a minimal sketch is given at the end of this section.

* ``ml.dmlc.xgboost4j.java.IEvalElementWiseDistributed``: users implement ``float evalRow(float label, float pred);``, which calculates the metric for a single sample given the label and prediction, as well as ``float getFinal(float errorSum, float weightSum);``, which performs the final transformation over the sums of per-sample errors and weights.
Review comment (Collaborator):

For completeness, we need to have weights included in element-wise metric:

float evalRow(float label, float pred, float weight)

We may also want to define an interface to let the user define a custom reduce op.
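
A hedged sketch of the shape this proposal could take; the interface and method names below are hypothetical and not part of this PR:

    // hypothetical only: weight-aware variant per the review comment
    public interface IEvalElementWiseWeightedDistributed {
      // per-sample metric with the sample weight passed in explicitly
      float evalRow(float label, float pred, float weight);
      // user-defined reduce op over per-sample results (default: weighted sum)
      float reduce(float accumulated, float rowResult);
      // final transformation over the reduced sums
      float getFinal(float errorSum, float weightSum);
    }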


* ``ml.dmlc.xgboost4j.java.IEvalMultiClassesDistributed``: the methods to implement are similar to those of ``ml.dmlc.xgboost4j.java.IEvalElementWiseDistributed``, except that the per-row metric method is ``float evalRow(int label, float pred, int numClasses);``.
Review comment (Collaborator):

Should be float evalRow(int label, float pred, float weight, int numClasses);


* ``ml.dmlc.xgboost4j.java.IEvalRankListDistributed``: users implement ``float evalMetric(float[] preds, int[] labels);``, which receives the predictions and labels for the instances in the same group.
Review comment (Collaborator):

Should be float evalMetric(float[] preds, int[] labels, float group_weight);


By default, these interfaces do not support single-machine evaluation; users can change this by overriding the ``float eval(float[][] predicts, DMatrix dmat)`` method.
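
For illustration, below is a minimal sketch of an element-wise metric (mean absolute error) under the proposed interface. The class name is hypothetical, and the ``getMetric`` method is assumed from the existing ``IEvaluation`` contract rather than confirmed by this PR; the object would then be passed wherever a custom evaluation object is accepted (e.g. the ``eval`` argument of ``XGBoost.train``), assuming these interfaces extend ``IEvaluation``.

.. code-block:: java

  import ml.dmlc.xgboost4j.java.IEvalElementWiseDistributed;

  public class DistributedMAE implements IEvalElementWiseDistributed {
    @Override
    public float evalRow(float label, float pred) {
      // per-sample error; XGBoost sums these (weighted) across all workers
      return Math.abs(label - pred);
    }

    @Override
    public float getFinal(float errorSum, float weightSum) {
      // final transformation over the globally reduced sums
      return errorSum / weightSum;
    }

    // metric name reported in evaluation logs; assumed from IEvaluation
    public String getMetric() {
      return "distributed_mae";
    }
  }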

Prediction
==========

1 change: 1 addition & 0 deletions include/xgboost/build_config.h
@@ -7,6 +7,7 @@

// These checks are for the Makefile.
#if !defined(XGBOOST_MM_PREFETCH_PRESENT) && !defined(XGBOOST_BUILTIN_PREFETCH_PRESENT)

/* default logic for software pre-fetching */
#if (defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_AMD64))) || defined(__INTEL_COMPILER)
// Enable _mm_prefetch for Intel compiler and MSVC+x86
5 changes: 5 additions & 0 deletions include/xgboost/c_api.h
@@ -17,6 +17,9 @@
#include <stdint.h>
#endif // __cplusplus

// XGBoost C API will include APIs in Rabit C API
#include <rabit/c_api.h>

#if defined(_MSC_VER) || defined(_WIN32)
#define XGB_DLL XGB_EXTERN_C __declspec(dllexport)
#else
@@ -565,4 +568,6 @@ XGB_DLL int XGBoosterLoadRabitCheckpoint(
*/
XGB_DLL int XGBoosterSaveRabitCheckpoint(BoosterHandle handle);

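/*!
 * \brief Register a customized evaluation metric under the given name so the
 *        booster can use it during evaluation. (Description inferred from this
 *        PR's usage; see the JVM-side IEval*Distributed interfaces.)
 * \param handle handle to the booster
 * \param metrics_name name of the customized metric to register
 */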
XGB_DLL void XGBoosterRegisterNewMetrics(BoosterHandle handle, const char* metrics_name);

#endif // XGBOOST_C_API_H_
7 changes: 4 additions & 3 deletions include/xgboost/learner.h
@@ -16,7 +16,7 @@
#include <vector>
#include "./base.h"
#include "./gbm.h"
#include "./metric.h"
#include "xgboost/metric/metric.h"
#include "./objective.h"

namespace xgboost {
@@ -186,15 +186,16 @@ class Learner : public rabit::Serializable {
*/
virtual const std::map<std::string, std::string>& GetConfigurationArguments() const = 0;

/*! \brief The evaluation metrics used to evaluate the model. */
std::vector<std::unique_ptr<Metric> > metrics_;

protected:
/*! \brief internal base score of the model */
bst_float base_score_;
/*! \brief objective function */
std::unique_ptr<ObjFunction> obj_;
/*! \brief The gradient booster used by the model*/
std::unique_ptr<GradientBooster> gbm_;
/*! \brief The evaluation metrics used to evaluate the model. */
std::vector<std::unique_ptr<Metric> > metrics_;
};

// implementation of inline functions.
194 changes: 194 additions & 0 deletions include/xgboost/metric/elementwise_metric.h
@@ -0,0 +1,194 @@
/*
* Copyright 2015-2019 by Contributors
*/

#ifndef XGBOOST_METRIC_ELEMENTWISE_METRIC_H_
#define XGBOOST_METRIC_ELEMENTWISE_METRIC_H_

#include <xgboost/metric/metric.h>
#include <xgboost/metric/metric_common.h>

#include <functional>
#include <utility>
#include <string>
#include <vector>

#include "../../../src/common/common.h"

#if defined(XGBOOST_USE_CUDA)
#include <thrust/iterator/counting_iterator.h>
#include <thrust/transform_reduce.h>
#include <thrust/execution_policy.h>
#include <thrust/functional.h> // thrust::plus<>

#include "../../../src/common/device_helpers.cuh"
#endif // XGBOOST_USE_CUDA

namespace xgboost {
namespace metric {

/*!
 * \brief reduction helper that accumulates an element-wise metric on the CPU or GPUs
 * \tparam EvalRow the policy class providing the per-row evaluation function
 */
template<typename EvalRow>
class ElementWiseMetricsReduction {
public:
explicit ElementWiseMetricsReduction(EvalRow policy) :
policy_(std::move(policy)) {}

PackedReduceResult CpuReduceMetrics(
const HostDeviceVector<bst_float>& weights,
const HostDeviceVector<bst_float>& labels,
const HostDeviceVector<bst_float>& preds) const {
size_t ndata = labels.Size();

const auto &h_labels = labels.HostVector();
const auto &h_weights = weights.HostVector();
const auto &h_preds = preds.HostVector();

bst_float residue_sum = 0;
bst_float weights_sum = 0;

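// accumulate weighted per-row errors; the OpenMP reduction clause merges
// the thread-local sums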
#pragma omp parallel for reduction(+: residue_sum, weights_sum) schedule(static)
for (omp_ulong i = 0; i < ndata; ++i) {
const bst_float wt = h_weights.size() > 0 ? h_weights[i] : 1.0f;
residue_sum += policy_.EvalRow(h_labels[i], h_preds[i]) * wt;
weights_sum += wt;
}
PackedReduceResult res{residue_sum, weights_sum};
return res;
}

#if defined(XGBOOST_USE_CUDA)

PackedReduceResult DeviceReduceMetrics(
GPUSet::GpuIdType device_id,
size_t device_index,
const HostDeviceVector<bst_float>& weights,
const HostDeviceVector<bst_float>& labels,
const HostDeviceVector<bst_float>& preds) {
size_t n_data = preds.DeviceSize(device_id);

thrust::counting_iterator<size_t> begin(0);
thrust::counting_iterator<size_t> end = begin + n_data;

auto s_label = labels.DeviceSpan(device_id);
auto s_preds = preds.DeviceSpan(device_id);
auto s_weights = weights.DeviceSpan(device_id);

bool const is_null_weight = weights.Size() == 0;

auto d_policy = policy_;

PackedReduceResult result = thrust::transform_reduce(
thrust::cuda::par(allocators_.at(device_index)),
begin, end,
[=] XGBOOST_DEVICE(size_t idx) {
bst_float weight = is_null_weight ? 1.0f : s_weights[idx];

bst_float residue = d_policy.EvalRow(s_label[idx], s_preds[idx]);
residue *= weight;
return PackedReduceResult{ residue, weight };
},
PackedReduceResult(),
thrust::plus<PackedReduceResult>());

return result;
}

#endif // XGBOOST_USE_CUDA

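// Reduce on the CPU when no GPUs are configured; otherwise shard the
// vectors across devices, reduce per GPU, then combine the results.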
PackedReduceResult Reduce(
GPUSet devices,
const HostDeviceVector<bst_float>& weights,
const HostDeviceVector<bst_float>& labels,
const HostDeviceVector<bst_float>& preds) {
PackedReduceResult result;

if (devices.IsEmpty()) {
result = CpuReduceMetrics(weights, labels, preds);
}
#if defined(XGBOOST_USE_CUDA)
else { // NOLINT
if (allocators_.size() != devices.Size()) {
allocators_.clear();
allocators_.resize(devices.Size());
}
preds.Reshard(devices);
labels.Reshard(devices);
weights.Reshard(devices);
std::vector<PackedReduceResult> res_per_device(devices.Size());

#pragma omp parallel for schedule(static, 1) if (devices.Size() > 1)
for (GPUSet::GpuIdType id = *devices.begin(); id < *devices.end(); ++id) {
dh::safe_cuda(cudaSetDevice(id));
size_t index = devices.Index(id);
res_per_device.at(index) = DeviceReduceMetrics(id, index, weights, labels, preds);
}

for (auto const& res : res_per_device) {
result += res;
}
}
#endif // defined(XGBOOST_USE_CUDA)
return result;
}

private:
EvalRow policy_;
#if defined(XGBOOST_USE_CUDA)
std::vector<dh::CubMemory> allocators_;
#endif // defined(XGBOOST_USE_CUDA)
};

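/*!
 * \brief base class of element-wise evaluation
 * \tparam Policy the policy class providing EvalRow, GetFinal, and Name
 */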
template<typename Policy>
struct EvalEWiseBase : public Metric {
EvalEWiseBase() : policy_{}, reducer_{policy_} {}

explicit EvalEWiseBase(Policy &policy) : policy_{policy}, reducer_{policy_} {}

explicit EvalEWiseBase(char const *policy_param) :
policy_{policy_param}, reducer_{policy_} {}

void Configure(
const std::vector<std::pair<std::string, std::string>> &args) override {
param_.InitAllowUnknown(args);
}

bst_float Eval(const HostDeviceVector<bst_float>& preds,
const MetaInfo &info,
bool distributed) override {
CHECK_NE(info.labels_.Size(), 0U) << "label set cannot be empty";
CHECK_EQ(preds.Size(), info.labels_.Size())
<< "label and prediction size not match, "
<< "hint: use merror or mlogloss for multi-class classification";
const auto ndata = static_cast<omp_ulong>(info.labels_.Size());
// Dealing with ndata < n_gpus.
GPUSet devices = GPUSet::All(param_.gpu_id, param_.n_gpus, ndata);

auto result =
reducer_.Reduce(devices, info.weights_, info.labels_, preds);

double dat[2]{result.Residue(), result.Weights()};
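// synchronize the residue and weight sums across all workers via Rabit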
if (distributed) {
rabit::Allreduce<rabit::op::Sum>(dat, 2);
}
return Policy::GetFinal(dat[0], dat[1]);
}

const char *Name() const override {
return policy_.Name();
}

private:
Policy policy_;

MetricParam param_;

ElementWiseMetricsReduction<Policy> reducer_;
};

} // namespace metric
} // namespace xgboost
#endif // XGBOOST_METRIC_ELEMENTWISE_METRIC_H_
16 changes: 8 additions & 8 deletions include/xgboost/metric.h → include/xgboost/metric/metric.h
@@ -1,21 +1,21 @@
/*!
* Copyright 2014 by Contributors
* Copyright 2014-2019 by Contributors
* \file metric.h
* \brief interface of evaluation metric function supported in xgboost.
* \author Tianqi Chen, Kailong Chen
*/
#ifndef XGBOOST_METRIC_H_
#define XGBOOST_METRIC_H_
#ifndef XGBOOST_METRIC_METRIC_H_
#define XGBOOST_METRIC_METRIC_H_

#include <dmlc/registry.h>
#include <xgboost/data.h>
#include <xgboost/base.h>

#include <vector>
#include <string>
#include <functional>
#include <utility>

#include "./data.h"
#include "./base.h"
#include "../../src/common/host_device_vector.h"
#include "../../../src/common/host_device_vector.h"

namespace xgboost {
/*!
@@ -93,4 +93,4 @@ struct MetricReg
::xgboost::MetricReg& __make_ ## MetricReg ## _ ## UniqueId ## __ = \
::dmlc::Registry< ::xgboost::MetricReg>::Get()->__REGISTER__(Name)
} // namespace xgboost
#endif // XGBOOST_METRIC_H_
#endif // XGBOOST_METRIC_METRIC_H_
src/metric/metric_common.h → include/xgboost/metric/metric_common.h
@@ -1,12 +1,11 @@
/*!
* Copyright 2018-2019 by Contributors
* \file metric_param.cc
* Copyright 2019 by Contributors
*/
#ifndef XGBOOST_METRIC_METRIC_COMMON_H_
#define XGBOOST_METRIC_METRIC_COMMON_H_

#include <dmlc/parameter.h>
#include "../common/common.h"
#include "../../../src/common/common.h"

namespace xgboost {
namespace metric {
@@ -39,6 +38,7 @@ class PackedReduceResult {
return PackedReduceResult{residue_sum_ + other.residue_sum_,
weights_sum_ + other.weights_sum_};
}

PackedReduceResult &operator+=(PackedReduceResult const &other) {
this->residue_sum_ += other.residue_sum_;
this->weights_sum_ += other.weights_sum_;