GitHub - roverbird/nbd-estimation: Methods to estimate Negative Binomial Distribution

The three methods provided for estimating the parameters of the Negative Binomial Distribution (NBD) differ in terms of their underlying mathematical approach, complexity, and computational efficiency. Below is a detailed comparison of each method, along with recommendations based on the context of their use.

1. Maximum Likelihood Estimation (MLE) - `EstNB1`

This method uses Maximum Likelihood Estimation (MLE), a widely used approach for estimating the parameters of statistical models. In this case, the mle2 function from the bbmle package is employed to estimate the size (k) and probability (p) parameters of the Negative Binomial Distribution.

Strengths:

Accuracy: MLE is considered the gold standard for parameter estimation because it finds the values that maximize the likelihood of observing the given data under the assumed model.
Flexibility: The MLE framework allows for the inclusion of complex models and constraints (e.g., bounds on parameters), making it adaptable to different problem settings.
Well-known: MLE is a familiar method to most statisticians, and it has well-developed theoretical properties such as asymptotic consistency and efficiency.

Weaknesses:

Computational Complexity: MLE can be computationally intensive, especially for large datasets or when the likelihood surface is complex. This method uses iterative optimization, which might take more time and resources.
Potential Convergence Issues: In some cases, MLE might not converge to the true parameter values, especially with small datasets or if the starting values for the optimization are not well-chosen.

Recommendation:

MLE is a strong, reliable choice when you have a large dataset and computational resources. However, it may be slower or less stable with smaller datasets.

2. Digamma Function-Based Estimation - `EstNB2`

This method uses a digamma function-based approach to estimate the parameters. The digamma function (ψ(x)) is the derivative of the log-gamma function and is used here to solve for the k (size) parameter. Once k is estimated, the p (probability) parameter is calculated as ( p = \frac{k}{k + \bar{x}} ).

Strengths:

Efficiency: The digamma function approach is faster because it avoids iterative maximum likelihood optimization. Instead, it directly solves an equation involving the digamma function using nleqslv.
Simplicity: This method is mathematically elegant and may be more intuitive for some, especially in theoretical settings involving NBD.

Weaknesses:

Numerical Stability: The method depends on the numerical solution of an equation involving the digamma function. Depending on the dataset, this can sometimes lead to numerical stability issues.
Less Flexibility: This approach is more rigid compared to MLE and may not generalize well to other distributions or constraints. It also assumes that the data fits the model well.

Recommendation:

This method is well-suited for situations where computational speed is important and the dataset is reasonably well-behaved (i.e., follows the assumptions of the NBD). It can be a good choice for large datasets where MLE would be too slow.

3. Cumulative Summation Approach - `EstNB3`

This method uses a cumulative summation approach to estimate the k parameter by summing terms of the form ( \frac{1}{k + l} ), where ( l ) runs over the observed counts. The k parameter is then solved using the nleqslv solver, and p is computed in the same way as the other methods.

Strengths:

Alternative Method: Provides an alternative estimation strategy to MLE and digamma, which might perform better in specific contexts where the other methods struggle.
Efficient for Large Datasets: Similar to the digamma approach, this method uses an efficient cumulative sum calculation that avoids full maximum likelihood optimization, making it faster.

Weaknesses:

Accuracy: This method can be less accurate than MLE, particularly if the data doesn't follow the distribution closely. The cumulative summation might not fully capture the nuances of the data.
Complexity in Understanding: While the cumulative summation method may be computationally efficient, it is less intuitive than the digamma function approach or MLE in terms of its theoretical foundation.

Recommendation:

Use this method when the dataset is large, and the speed of computation is crucial. However, it may not be as accurate as the other methods in smaller datasets or when the data deviates from the NBD model.

Comparison Summary:

Method	Accuracy	Speed	Complexity	Convergence Issues
MLE (EstNB1)	High	Medium	High	Possible
Digamma (EstNB2)	Medium-High	Fast	Medium	Sometimes stable
Cumulative Sum (EstNB3)	Medium	Fastest	Medium-High	Generally stable

Recommendation:

Use MLE (EstNB1) if accuracy is paramount, particularly when you have a reasonable dataset size and computational resources. It is the most reliable and general method for parameter estimation.
Use Digamma Approach (EstNB2) if computational speed is important, and you are confident that the data fits the NBD well. It offers a good balance between accuracy and speed.
Use Cumulative Summation Approach (EstNB3) if you have a very large dataset and need a quick solution. However, this method may sacrifice some accuracy compared to MLE.

In most cases, MLE (EstNB1) will be the best choice due to its accuracy and robustness, but for very large datasets or real-time applications where speed is critical, EstNB2 is a good alternative.

Три метода оценки параметров отрицательного биномиального распределения (NBD) отличаются друг от друга математическим подходом, сложностью и вычислительной эффективностью. Ниже приводится подробное сравнение каждого метода, а также рекомендации, основанные на контексте их использования.

1. Оценка максимального правдоподобия (MLE) - `EstNB1`.

Этот метод использует Maximum Likelihood Estimation (MLE), широко распространенный подход для оценки параметров статистических моделей. В данном случае для оценки параметров размера (k) и вероятности (p) отрицательного биномиального распределения используется функция mle2 из пакета bbmle.

Сильные стороны:

Точность: MLE считается золотым стандартом для оценки параметров, поскольку он находит значения, которые максимизируют вероятность наблюдения заданных данных при предполагаемой модели.
Гибкость**: Система MLE позволяет включать сложные модели и ограничения (например, ограничения на параметры), что делает ее адаптируемой к различным постановкам задач.
Отличная известность: Метод MLE знаком большинству статистиков, и он обладает хорошо развитыми теоретическими свойствами, такими как асимптотическая согласованность и эффективность.

Слабые стороны:

Вычислительная сложность: MLE может требовать больших вычислительных затрат, особенно для больших наборов данных или когда поверхность правдоподобия имеет сложную форму. Этот метод использует итеративную оптимизацию, что может потребовать больше времени и ресурсов.
Потенциальные проблемы сходимости: В некоторых случаях MLE может не сходиться к истинным значениям параметров, особенно при небольших наборах данных или при плохо подобранных начальных значениях для оптимизации.

2. Оценка на основе функции Дигаммы - `EstNB2`

В этом методе для оценки параметров используется подход на основе функции дигаммы. Функция дигамма (ψ(x)) является производной функции логарифма и используется здесь для решения задачи о параметре k (размер). После того, как k оценен, параметр p (вероятность) вычисляется как ( p = \frac{k}{k + \bar{x}} ).

Сильные стороны:

Эффективность: Подход с использованием функции дигаммы является более быстрым, поскольку он позволяет избежать итеративной оптимизации максимального правдоподобия. Вместо этого он напрямую решает уравнение с дигамма-функцией с помощью nleqslv.
Простота: Этот метод математически элегантен и может быть более интуитивным для некоторых, особенно в теоретических установках, включающих NBD.

Слабые стороны:

Численная стабильность: Метод зависит от численного решения уравнения, включающего функцию дигаммы. В зависимости от набора данных это может иногда приводить к проблемам с численной стабильностью.
Небольшая гибкость: Этот подход является более жестким по сравнению с MLE и может быть плохо применим к другим распределениям или ограничениям. Он также предполагает, что данные хорошо подходят к модели.

3. Подход кумулятивного суммирования - `EstNB3`

Этот метод использует подход кумулятивного суммирования для оценки параметра k путем суммирования членов вида ( \frac{1}{k + l} ), где ( l ) пробегает по наблюдаемым отсчетам. Параметр k затем решается с помощью решателя nleqslv, а p вычисляется так же, как и в других методах.

Сильные стороны:

Альтернативный метод: Предоставляет стратегию оценки, альтернативную MLE и digamma, которая может быть лучше в специфических контекстах, где другие методы не справляются.
Эффективность для больших наборов данных: Подобно методу digamma, этот метод использует эффективное вычисление кумулятивной суммы, которое позволяет избежать полной оптимизации максимального правдоподобия, что делает его более быстрым.

Слабые стороны:

Точность: Этот метод может быть менее точным, чем MLE, особенно если данные не соответствуют распределению. Кумулятивное суммирование может не полностью отражать нюансы данных.
Сложность в понимании: Хотя метод кумулятивного суммирования может быть эффективным с точки зрения вычислений, он менее интуитивен, чем подход с использованием функции дигаммы или MLE, с точки зрения его теоретической основы.

Сравнительный обзор:

Метод	Точность	Скорость	Сложность	Проблемы конвергенции
MLE (EstNB1)	Высокий	Средний	Высокий	Возможный
Дигамма (EstNB2)	Средне-высокий	Быстрый	Средний	Иногда стабильный
Кумулятивная сумма (EstNB3)	Средняя	Самая быстрая	Средне-высокая	Обычно стабильная

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
EstNB1.R		EstNB1.R
EstNB2.R		EstNB2.R
EstNB3.R		EstNB3.R
EstNB3_goodness_of_fit.R		EstNB3_goodness_of_fit.R
NBD_estimation.pdf		NBD_estimation.pdf
cvm.R		cvm.R
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

1. Maximum Likelihood Estimation (MLE) - `EstNB1`

Strengths:

Weaknesses:

Recommendation:

2. Digamma Function-Based Estimation - `EstNB2`

Strengths:

Weaknesses:

Recommendation:

3. Cumulative Summation Approach - `EstNB3`

Strengths:

Weaknesses:

Recommendation:

Comparison Summary:

Recommendation:

1. Оценка максимального правдоподобия (MLE) - `EstNB1`.

Сильные стороны:

Слабые стороны:

Рекомендация:

2. Оценка на основе функции Дигаммы - `EstNB2`

Сильные стороны:

Слабые стороны:

Рекомендация:

3. Подход кумулятивного суммирования - `EstNB3`

Сильные стороны:

Слабые стороны:

Рекомендация:

Сравнительный обзор:

Рекомендация:

About

Uh oh!

Releases

Packages

Languages

roverbird/nbd-estimation

Folders and files

Latest commit

History

Repository files navigation

1. Maximum Likelihood Estimation (MLE) - EstNB1

Strengths:

Weaknesses:

Recommendation:

2. Digamma Function-Based Estimation - EstNB2

Strengths:

Weaknesses:

Recommendation:

3. Cumulative Summation Approach - EstNB3

Strengths:

Weaknesses:

Recommendation:

Comparison Summary:

Recommendation:

1. Оценка максимального правдоподобия (MLE) - EstNB1.

Сильные стороны:

Слабые стороны:

Рекомендация:

2. Оценка на основе функции Дигаммы - EstNB2

Сильные стороны:

Слабые стороны:

Рекомендация:

3. Подход кумулятивного суммирования - EstNB3

Сильные стороны:

Слабые стороны:

Рекомендация:

Сравнительный обзор:

Рекомендация:

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

1. Maximum Likelihood Estimation (MLE) - `EstNB1`

2. Digamma Function-Based Estimation - `EstNB2`

3. Cumulative Summation Approach - `EstNB3`

1. Оценка максимального правдоподобия (MLE) - `EstNB1`.

2. Оценка на основе функции Дигаммы - `EstNB2`

3. Подход кумулятивного суммирования - `EstNB3`

Packages