binsmlr

Overview

binsmlr, as in bin similar, is a package to create bins that are assessed to be heterogenous (in terms of default risk) across a range of ordinal values. The binning can be performed automatically, interactively and manually.

The objective is to create groups of data (bins) across a range of ordinal values. The bins produced should be homogenous within and heterogenous across bins in terms of the default risk. A binning approach, in the context of data transformation, is common in different credit risk applications and in particular to credit risk portfolios having a relatively large number of defaults. A prerequisite is that each observation is assigned an ordinal value such as score or rating etc., that indicates the relative default risk as proposed by the rank ordering model. Also, a default indicator is required.

The main process is the following:

Create a relatively large set of initial bins.
Perform homogeneity tests between all adjacent bins
Merge the pair of bins that are the most similar in terms of proportions of state of non-default and state of default.
Repeat step 2 and 3 until all bins are heterogeneous or if a minimum number of bins are obtained.

The process above should result in bins that are statistically different in terms of default risk across a range of ordinal data.

Installation

Install the development version from GitHub.

# install.packages("devtools")
devtools::install_github("rrunner/binsmlr")

Usage

library(binsmlr)
help("binsmlr")

# example of interactive binning

# create initial bins
bins <- create_initial_bins(bin_data, 30, "score", "default")
length(bins)
is_monotonic(bins)

# reduce bins by performing repeated homogeneity tests
new_bins <- reduce_bins(bins, min_required_bins = 7, confidence_level = 0.01)
length(new_bins)
is_monotonic(new_bins)

# plot initial and reduced bins
bins_df <- dplyr::bind_rows(bins)
plot(x = bins_df$mid_score, y = log(bins_df$odds), type = "p",
     col = "lightblue", cex = 1.5, pch = 20, ylab = "log(odds)",
     xlab = "score")

new_bins_df <- dplyr::bind_rows(new_bins)
points(x = new_bins_df$mid_score, y = log(new_bins_df$odds),
       col = "darkblue", cex = 1.5, pch = 20)
legend(x = "topright", legend = c("Initial bins", "Reduced bins"),
       col = c("lightblue", "darkblue"), pch = 20, pt.cex = 1.5, bty = "n")

Getting help and report issues

If you encounter a clear bug, please file an issue with a minimal reproducible example on GitHub.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
R		R
data		data
man		man
tests		tests
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
.travis.yml		.travis.yml
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
README.Rmd		README.Rmd
README.md		README.md
binsmlr.Rproj		binsmlr.Rproj
codecov.yml		codecov.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

binsmlr

Overview

Installation

Usage

Getting help and report issues

About

Uh oh!

Releases

Packages

Languages

rrunner/binsmlr

Folders and files

Latest commit

History

Repository files navigation

binsmlr

Overview

Installation

Usage

Getting help and report issues

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages