A collection of optimizer-related papers and code.
In the last column, GD stands for gradient descent (first-order) methods, S for second-order (quasi-Newton) methods, E for evolutionary methods, GF for gradient-free methods, VR for variance-reduced methods, and llm for LLM-based optimization. A minimal usage sketch follows this legend, and an illustrative Adam-style implementation sketch follows the table.
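Most GD-type entries ship as drop-in PyTorch optimizers. As a minimal sketch (assuming PyTorch is installed; the model and data below are placeholders, and `AdamW` stands in for any optimizer linked in the Code column), this is how such an optimizer plugs into a standard training loop:

```python
# Minimal sketch: plugging a "GD"-type optimizer into a PyTorch training loop.
# Assumptions: PyTorch is installed; model, data, and hyperparameters are placeholders.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
loss_fn = nn.MSELoss()

x = torch.randn(32, 10)                   # placeholder batch
y = torch.randn(32, 1)

for step in range(100):
    optimizer.zero_grad()                 # clear gradients from the previous step
    loss = loss_fn(model(x), y)
    loss.backward()                       # backprop: compute gradients
    optimizer.step()                      # apply the optimizer's update rule
```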
| Title | Year | Optimizer | Published | Code | Type |
|---|---|---|---|---|---|
| The AdEMAMix Optimizer: Better, Faster, Older | 2024 | AdEMAMix | arxiv | pytorch | GD |
| FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information | 2024 | FAdam | arxiv | pytorch | GD |
| GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection | 2024 | GaLore | arxiv | pytorch | GD |
| CoRe Optimizer: An All-in-One Solution for Machine Learning | 2023 | CoRe | arxiv | pytorch | GD |
| AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix | 2023 | AGD | arxiv | pytorch | GD,S |
| AdaLomo: Low-memory Optimization with Adaptive Learning Rate | 2023 | AdaLOMO | arxiv | pytorch | GD |
| Large Language Models as Optimizers | 2023 | OPRO | arxiv | python | llm |
| Promoting Exploration in Memory-Augmented Adam using Critical Momenta | 2023 | Adam+CM | arxiv | pytorch | GD |
| CAME: Confidence-guided Adaptive Memory Efficient Optimization | 2023 | CAME | acl'23 | pytorch | GD |
| Full Parameter Fine-tuning for Large Language Models with Limited Resources | 2023 | LOMO | arxiv | pytorch | GD |
| Prodigy: An Expeditiously Adaptive Parameter-Free Learner | 2023 | Prodigy | arxiv | pytorch | GD |
| DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method | 2023 | DoWG | neurips'23 | | GD |
| Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training | 2023 | Sophia | arxiv | pytorch | GD |
| UAdam: Unified Adam-Type Algorithmic Framework for Non-Convex Stochastic Optimization | 2023 | UAdam | arxiv | | GD |
| Sharpness-Aware Minimization Revisited: Weighted Sharpness as a Regularization Term | 2023 | WSAM | kdd'23 | pytorch | GD |
| DP-Adam: Correcting DP Bias in Adam's Second Moment Estimation | 2023 | DP-Adam | iclr-W'23 | | GD |
| An Adam-enhanced Particle Swarm Optimizer for Latent Factor Analysis | 2023 | ADHPL | arxiv | | E |
| DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule | 2023 | DoG | icml'23 | pytorch | GD |
| FOSI: Hybrid First and Second Order Optimization | 2023 | FOSI | HPI'23 | jax | GD,S |
| Symbolic Discovery of Optimization Algorithms | 2023 | Lion | neurips'23 | jax, tf, pytorch | GD |
| Amos: An Adam-style Optimizer with Adaptive Weight Decay towards Model-Oriented Scale | 2022 | Amos | arxiv | jax | GD |
| VeLO: Training Versatile Learned Optimizers by Scaling Up | 2022 | VeLO | arxiv | jax | GD |
| Grad-GradaGrad? A Non-Monotone Adaptive Stochastic Gradient Method | 2022 | GradaGrad | arxiv | | GD |
| CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU | 2022 | CowClip | aaai'23 | tf | GD |
| Smooth momentum: improving lipschitzness in gradient descent | 2022 | Smooth Momentum | APIN | | GD |
| Towards Better Generalization of Adaptive Gradient Methods | 2020 | SAGD | neurips'20 | | GD |
| An Improved Adaptive Optimization Technique for Image Classification | 2020 | Mean-ADAM | ICIEV | | GD |
| SCW-SGD: Stochastically Confidence-Weighted SGD | 2020 | SCWSGD | ICIP | | GD |
| Slime mould algorithm: A new method for stochastic optimization | 2020 | SMA | FGCS | code | E |
| Ranger-Deep-Learning-Optimizer | 2020 | Ranger | github | pytorch | GD |
| pbSGD: Powered Stochastic Gradient Descent Methods for Accelerated Non-Convex Optimization | 2020 | pbSGD | ijcai'20 | pytorch | GD |
| A Variant of Gradient Descent Algorithm Based on Gradient Averaging | 2020 | Grad-Avg | arxiv | | GD |
| Stochastic Gradient Descent with Nonlinear Conjugate Gradient-Style Adaptive Momentum | 2020 | FRSGD | arxiv | | GD |
| CADA: Communication-Adaptive Distributed Adam | 2020 | CADA | arxiv | pytorch, matlab | GD |
| Eigenvalue-corrected Natural Gradient Based on a New Approximation | 2020 | TEKFAC | arxiv | | GD |
| SMG: A Shuffling Gradient-Based Method with Momentum | 2020 | SMG | icml'21 | | GD |
| SALR: Sharpness-aware Learning Rate Scheduler for Improved Generalization | 2020 | SALR | TNNLS | | GD |
| Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering | 2020 | MEKA | neurips-W'21 | | GD |
| Mixing ADAM and SGD: a Combined Optimization Method | 2020 | MAS | arxiv | pytorch | GD |
| EAdam Optimizer: How ε Impact Adam | 2020 | EAdam | arxiv | pytorch | GD |
| Adam+: A Stochastic Method with Adaptive Variance Reduction | 2020 | Adam+ | arxiv | | GD |
| Sharpness-aware Minimization for Efficiently Improving Generalization | 2020 | SAM | iclr'21 | jax | GD |
| Expectigrad: Fast Stochastic Optimization with Robust Convergence Properties | 2020 | Expectigrad | arxiv | tf | GD |
| AEGD: Adaptive Gradient Descent with Energy | 2020 | AEGD | AIMS | pytorch | GD |
| Adam with Bandit Sampling for Deep Learning | 2020 | Adambs | arxiv | | GD |
| AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients | 2020 | AdaBelief | neurips'20 | pytorch | GD |
| Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization | 2020 | Apollo[W] | arxiv | pytorch | GD,S |
| S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise Injection for Reaching Flat Minima | 2020 | S-SGD | arxiv | | GD |
| Gravilon: Applications of a New Gradient Descent Method to Machine Learning | 2020 | Gravilon | arxiv | | GD |
| PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization | 2020 | PAGE | icml'21 | | GD |
| Adaptive Gradient Methods for Constrained Convex Optimization and Variational Inequalities | 2020 | Ada{ACSA,AGD+} | aaai'21 | | GD |
| Stochastic Normalized Gradient Descent with Momentum for Large Batch Training | 2020 | SNGM | arxiv | | GD |
| AdaScale SGD: A User-Friendly Algorithm for Distributed Training | 2020 | AdaScale | icml'21 | | GD |
| Momentum-based variance-reduced proximal stochastic gradient method for composite nonconvex stochastic optimization | 2020 | PSTorm | JOTA | | GD |
| MTAdam: Automatic Balancing of Multiple Training Loss Terms | 2020 | MTAdam | acl'21 | pytorch | GD |
| AdaSGD: Bridging the gap between SGD and Adam | 2020 | AdaSGD | arxiv | | GD |
| AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights | 2020 | AdamP | iclr'21 | pytorch | GD |
| Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes | 2020 | LANS | arxiv | pytorch | GD |
| AdaSwarm: Augmenting Gradient-Based optimizers in Deep Learning with Swarm Intelligence | 2020 | AdaSwarm | TETC | pytorch | E |
| Enhance Curvature Information by Structured Stochastic Quasi-Newton Methods | 2020 | SKQN,S4QN | cvpr'21 | | GD |
| Adaptive Gradient Methods Can Be Provably Faster than SGD after Finite Epochs | 2020 | SHAdaGrad | arxiv | | GD |
| A New Accelerated Stochastic Gradient Method with Momentum | 2020 | SGDM | arxiv | | GD |
| Practical Quasi-Newton Methods for Training Deep Neural Networks | 2020 | K-BFGS[(L)] | neurips'20 | pytorch | GD |
| AdaS: Adaptive Scheduling of Stochastic Gradients | 2020 | AdaS | cvpr'22 | pytorch | GD |
| Adai: Separating the Effects of Adaptive Learning Rate and Momentum Inertia | 2020 | Adai | icml'22 | pytorch | GD |
| ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning | 2020 | ADAHESSIAN | aaai'21 | pytorch | GD |
| Momentum with Variance Reduction for Nonconvex Composition Optimization | 2020 | MVRC-[1,2] | arxiv | | GD |
| CoolMomentum: A Method for Stochastic Optimization by Langevin Dynamics with Simulated Annealing | 2020 | CoolMomentum | arxiv | tf, pytorch | GD |
| Gradient Centralization: A New Optimization Technique for Deep Neural Networks | 2020 | GC | eccv'20 | pytorch, tf | GD |
| AdaX: Adaptive Gradient Descent with Exponential Long Term Memory | 2020 | AdaX[-W] | arxiv | pytorch | GD |
| Weak and Strong Gradient Directions: Explaining Memorization, Generalization, and Hardness of Examples at Scale | 2020 | RM3 | arxiv | tf | GD |
| TAdam: A Robust Stochastic Gradient Optimizer | 2020 | TAdam | arxiv | pytorch | GD |
| Iterative Averaging in the Quest for Best Test Error | 2020 | Gadam | arxiv | | GD |
| On the distance between two neural networks and the stability of learning | 2020 | Fromage | neurips'20 | pytorch | GD |
| Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent | 2020 | SRSGD | arxiv | pytorch | GD |
| Stochastic Runge-Kutta methods and adaptive SGD-G2 stochastic gradient descent | 2020 | SGD-G2 | arxiv | | GD |
| LaProp: Separating Momentum and Adaptivity in Adam | 2020 | LaProp | arxiv | pytorch | GD |
| Compositional ADAM: An Adaptive Compositional Solver | 2020 | C-ADAM | arxiv | | GD |
| Biased Stochastic Gradient Descent for Conditional Stochastic Optimization | 2020 | BSGD | arxiv | | GD |
| On the Trend-corrected Variant of Adaptive Stochastic Optimization Methods | 2020 | AdamT | ijcnn'20 | pytorch | GD |
| Efficient Learning Rate Adaptation for Convolutional Neural Network Training | 2019 | e-AdLR | ijcnn'19 | | GD |
| ProxSGD: Training Structured Neural Networks under Regularization and Constraints | 2019 | ProxSGD | iclr'20 | tf | GD |
| An Adaptive Optimization Algorithm Based on Hybrid Power and Multidimensional Update Strategy | 2019 | AdaHMG | ieee | | GD |
| signSGD via Zeroth-Order Oracle | 2019 | ZO-signSGD | iclr'19 | | GF |
| Fast DENSER: Efficient Deep NeuroEvolution | 2019 | F-DENSER | arxiv | tf | E |
| Adathm: Adaptive Gradient Method Based on Estimates of Third-Order Moments | 2019 | Adathm | DSC | | GD |
| A new perspective in understanding of Adam-Type algorithms and beyond | 2019 | AdamAL | arxiv | pytorch | GD |
| CProp: Adaptive Learning Rate Scaling from Past Gradient Conformity | 2019 | CProp | arxiv | pytorch | GD |
| Domain-independent Dominance of Adaptive Methods | 2019 | AvaGrad, Delayed Adam | cvpr'21 | pytorch | GD |
| Second-order Information in First-order Optimization Methods | 2019 | AdaSqrt | arxiv | tf | GD |
| Does Adam optimizer keep close to the optimal point? | 2019 | AdaFix | arxiv | | GD |
| Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates | 2019 | AdaAlter | arxiv | mxnet | GD |
| UniXGrad: A Universal, Adaptive Algorithm with Optimal Guarantees for Constrained Optimization | 2019 | UniXGrad | neurips'19 | | GD |
| Demon: Improved Neural Network Training with Momentum Decay | 2019 | Demon {SGDM,Adam} | icassp'22 | tf | GD |
| ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization | 2019 | ZO-AdaMM | neurips'19 | tf | GF |
| On Empirical Comparisons of Optimizers for Deep Learning | 2019 | RMSterov | arxiv | | GD |
| An Adaptive and Momental Bound Method for Stochastic Learning | 2019 | AdaMod | arxiv | pytorch | GD |
| On Higher-order Moments in Adam | 2019 | HAdam | arxiv | | GD |
| diffGrad: An Optimization Method for Convolutional Neural Networks | 2019 | diffGrad | TNNLS | pytorch | GD |
| Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM | 2019 | SAMSGrad | arxiv | pytorch | GD |
| On the Variance of the Adaptive Learning Rate and Beyond | 2019 | RAdam | iclr'20 | pytorch, tf | GD |
| BGADAM: Boosting based Genetic-Evolutionary ADAM for Neural Network Optimization | 2019 | BGADAM | arxiv | | GD |
| Adaloss: Adaptive Loss Function for Landmark Localization | 2019 | Adaloss | arxiv | | GD |
| signADAM: Learning Confidences for Deep Neural Networks | 2019 | signADAM[++] | icdmw'19 | pytorch | GD |
| The Role of Memory in Stochastic Optimization | 2019 | PolyAdam | UAI'20 | | GD |
| Lookahead Optimizer: k steps forward, 1 step back | 2019 | Lookahead | neurips'19 | tf, pytorch | GD |
| Momentum-Based Variance Reduction in Non-Convex SGD | 2019 | STORM | neurips'19 | pytorch | GD |
| SAdam: A Variant of Adam for Strongly Convex Functions | 2019 | SAdam | iclr'20 | code | GD |
| Matrix-Free Preconditioning in Online Learning | 2019 | RecursiveOptimizer | icml'19 | tf | GD |
| PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization | 2019 | PowerSGD[M] | neurips'19 | pytorch | GD |
| Fast-DENSER++: Evolving Fully-Trained Deep Artificial Neural Networks | 2019 | F-DENSER++ | arxiv | tf | E |
| Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks | 2019 | Novograd | neurips'19 | pytorch | GD |
| An Adaptive Remote Stochastic Gradient Method for Training Neural Networks | 2019 | NAMS{G,B},ARSG | arxiv | pytorch,mxnet | GD |
| Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates | 2019 | ArmijoLS | neurips'19 | pytorch | GD |
| Large Batch Optimization for Deep Learning: Training BERT in 76 minutes | 2019 | LAMB | iclr'20 | tf,pytorch | GD |
| On the Convergence Proof of AMSGrad and a New Version | 2019 | AdamX | arxiv | | GD |
| An Optimistic Acceleration of AMSGrad for Nonconvex Optimization | 2019 | OPT-AMSGrad | acml'21 | | GD |
| Parabolic Approximation Line Search for DNNs | 2019 | PAL | neurips'20 | pytorch | GD |
| Gradient-only line searches: An Alternative to Probabilistic Line Searches | 2019 | GOLS-I | arxiv | | GD |
| Adaptive Gradient Methods with Dynamic Bound of Learning Rate | 2019 | AdaBound | iclr'19 | pytorch | GD |
| Memory-Efficient Adaptive Optimization | 2019 | SM3 | neurips'19 | tf | GD |
| DADAM: A Consensus-based Distributed Adaptive Gradient Method for Online Optimization | 2019 | DADAM | arxiv | matlab | GD |
| On the Convergence of AdaGrad with Momentum for Training Deep Neural Networks | 2018 | Ada{NAG,HB} | arxiv | | GD |
| SADAGRAD: Strongly Adaptive Stochastic Gradient Methods | 2018 | SADAGRAD | icml'18 | | GD |
| PSA-CMA-ES: CMA-ES with population size adaptation | 2018 | PSA-CMA-ES | gecco'18 | | E |
| Adaptive Methods for Nonconvex Optimization | 2018 | Yogi | neurips'18 | tf | GD |
| Deep Frank-Wolfe For Neural Network Optimization | 2018 | DFW | iclr'19 | pytorch | GD |
| HyperAdam: A Learnable Task-Adaptive Adam for Network Training | 2018 | HyperAdam | aaai'19 | tf, pytorch | GD |
| Practical Bayesian Learning of Neural Networks via Adaptive Optimisation Methods | 2018 | BADAM | icml'20 | tf | GD |
| Kalman Gradient Descent: Adaptive Variance Reduction in Stochastic Optimization | 2018 | KGD | arxiv | tf | GD |
| Quasi-hyperbolic momentum and Adam for deep learning | 2018 | QHM,QHAdam | iclr'19 | pytorch, tf | GD |
| AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods | 2018 | AdaShift | iclr'19 | pytorch | GD |
| Optimal Adaptive and Accelerated Stochastic Gradient Descent | 2018 | A2Grad{Exp,Inc,Uni} | arxiv | pytorch | GD |
| Accelerating SGD with momentum for over-parameterized learning | 2018 | MaSS | arxiv | tf | GD |
| Online Adaptive Methods, Universality and Acceleration | 2018 | AcceleGrad | neurips'18 | | GD |
| On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization | 2018 | AdaFom | iclr'19 | | GD |
| AdaGrad Stepsizes: Sharp Convergence Over Nonconvex Landscapes | 2018 | AdaGrad-Norm | icml'19 | pytorch | GD |
| Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam | 2018 | VAdam | icml'18 | pytorch, tf | GD |
| Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks | 2018 | Padam | ijcai'20 | pytorch | GD |
| Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis | 2018 | EKFAC | neurips'18 | pytorch | GD |
| Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods | 2018 | AdaBayes[FP] | neurips'18 | pytorch | GD |
| Nostalgic Adam: Weighting more of the past gradients when designing the adaptive learning rate | 2018 | NosAdam | ijcai'19 | pytorch | GD |
| Small steps and giant leaps: Minimal Newton solvers for Deep Learning | 2018 | Curveball | iccv'19 | matlab | GD |
| GADAM: Genetic-Evolutionary ADAM for Deep Neural Network Optimization | 2018 | GADAM | arxiv | | GD |
| Adafactor: Adaptive Learning Rates with Sublinear Memory Cost | 2018 | Adafactor | icml'18 | pytorch | GD |
| Aggregated Momentum: Stability Through Passive Damping | 2018 | AggMo | iclr'19 | pytorch, tf | GD |
| Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization | 2018 | Katyusha X | icml'18 | | VR |
| WNGrad: Learn the Learning Rate in Gradient Descent | 2018 | WNGrad | arxiv | C++ | GD |
| VR-SGD: A Simple Stochastic Variance Reduction Method for Machine Learning | 2018 | VR-SGD | TKDE | C++ | GD |
| signSGD: Compressed Optimisation for Non-Convex Problems | 2018 | signSGD | icml'18 | mxnet | GD |
| Shampoo: Preconditioned Stochastic Tensor Optimization | 2018 | Shampoo | icml'18 | tf | GD |
| L4: Practical loss-based stepsize adaptation for deep learning | 2018 | L4{Adam,Momentum} | neurips'18 | pytorch, tf | GD |
| On the Convergence of Adam and Beyond | 2018 | AMSGrad, AdamNC | iclr'18 | pytorch | GD |
| SW-SGD: The Sliding Window Stochastic Gradient Descent Algorithm | 2017 | SW-SGD | PCS | | GD |
| Improving Generalization Performance by Switching from Adam to SGD | 2017 | SWATS | iclr'18 | pytorch | GD |
| Noisy Natural Gradient as Variational Inference | 2017 | Noisy {Adam,K-FAC} | icml'18 | tf | GD |
| AdaComp : Adaptive Residual Gradient Compression for Data-Parallel Distributed Training | 2017 | AdaComp | aaai'18 | | GD |
| AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks | 2017 | AdaBatch | iclr-W'18 | pytorch | GD |
| First-order Stochastic Algorithms for Escaping From Saddle Points in Almost Linear Time | 2017 | NEON | neurips'18 | | GD |
| BPGrad: Towards Global Optimality in Deep Learning via Branch and Pruning | 2017 | BPGrad | cvpr'18 | matlab | GD |
| Decoupled Weight Decay Regularization | 2017 | AdamW,SGDW | iclr'19 | lua | GD |
| Evolving Deep Convolutional Neural Networks for Image Classification | 2017 | EvoCNN | ITEC | python | E |
| Normalized Direction-preserving Adam | 2017 | ND-Adam | arxiv | pytorch, tf | GD |
| Regularizing and Optimizing LSTM Language Models | 2017 | NT-ASGD | iclr'18 | pytorch | GD |
| Natasha 2: Faster Non-Convex Optimization Than SGD | 2017 | Natasha{1.5,2} | neurips'18 | | GD |
| Large Batch Training of Convolutional Networks | 2017 | LARS | arxiv | pytorch | GD |
| Practical Gauss-Newton Optimisation for Deep Learning | 2017 | KFRA, KFLR | icml'17 | | GD |
| YellowFin and the Art of Momentum Tuning | 2017 | YellowFin | arxiv | tf | GD |
| Variants of RMSProp and Adagrad with Logarithmic Regret Bounds | 2017 | SC-{Adagrad,RMSProp} | icml'17 | pytorch | GD |
| Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients | 2017 | M-SVAG | icml'18 | tf | GD |
| Training Deep Networks without Learning Rates Through Coin Betting | 2017 | COCOB | neurips'17 | tf | GD |
| Sub-sampled Cubic Regularization for Non-convex Optimization | 2017 | SCR | icml'17 | numpy | S |
| Online Convex Optimization with Unconstrained Domains and Losses | 2017 | RescaledExp | neurips'16 | | GD |
| Evolving Deep Neural Networks | 2017 | CoDeepNEAT | arxiv | tf | E |
| SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient | 2017 | SARAH | icml'17 | | VR |
| IQN: An Incremental Quasi-Newton Method with Local Superlinear Convergence Rate | 2017 | IQN | icassp'17 | C++ | GD,S |
| NMODE --- Neuro-MODule Evolution | 2017 | NMODE | arxiv | C++ | E |
| The Whale Optimization Algorithm | 2016 | WOA | AES | numpy | E |
| Incorporating Nesterov Momentum into Adam | 2016 | Nadam | arxiv | pytorch | GD |
| Eve: A Gradient Based Optimization Method with Locally and Globally Adaptive Learning Rates | 2016 | Eve | arxiv | pytorch | GD |
| Direct Feedback Alignment Provides Learning in Deep Neural Networks | 2016 | DFA | neurips'16 | numpy | GD |
| SGDR: Stochastic Gradient Descent with Warm Restarts | 2016 | SGDR | iclr'17 | theano | GD |
| Stochastic Quasi-Newton Methods for Nonconvex Stochastic Optimization | 2016 | Damp-oBFGS-Inf | SIAM | pytorch | GD,S |
| A Comprehensive Linear Speedup Analysis for Asynchronous Stochastic Parallel Optimization from Zeroth-Order to First-Order | 2016 | ZO-SCD | neurips'16 | | GF |
| Barzilai-Borwein Step Size for Stochastic Gradient Descent | 2016 | {SGD,SVRG}-BB | neurips'16 | numpy | GD |
| Adaptive Learning Rate via Covariance Matrix Based Preconditioning for Deep Neural Networks | 2016 | SDProp | ijcai'17 | | GD |
| Katyusha: The First Direct Acceleration of Stochastic Gradient Methods | 2016 | Katyusha | stoc'17 | | VR |
| Accelerating SVRG via second-order information | 2015 | SVRG+{I,II} | arxiv | | GD,S |
| adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs | 2015 | adaQN | ecml'16 | numpy | GD,S |
| A Linearly-Convergent Stochastic L-BFGS Algorithm | 2015 | SVRG-SQN | aistats | julia | GD,S |
| Optimizing Neural Networks with Kronecker-factored Approximate Curvature | 2015 | K-FAC | icml'15 | tf | GD |
| Probabilistic Line Searches for Stochastic Optimization | 2015 | ProbLS | JMLR | | GD |
| Scale-Free Algorithms for Online Linear Optimization | 2015 | AdaFTRL | alt'15 | | GD |
| Adam: A Method for Stochastic Optimization | 2014 | Adam, AdaMax | iclr'15 | pytorch | GD |
| Random feedback weights support learning in deep neural networks | 2014 | FA | arxiv | pytorch | GD |
| A Computationally Efficient Limited Memory CMA-ES for Large Scale Optimization | 2014 | LM-CMA-ES | gecco'14 | | E |
| A Proximal Stochastic Gradient Method with Progressive Variance Reduction | 2014 | Prox-SVRG | SIAM | tf, numpy | VR |
| RES: Regularized Stochastic BFGS Algorithm | 2014 | Reg-oBFGS-Inf | arxiv | | GD,S |
| A Stochastic Quasi-Newton Method for Large-Scale Optimization | 2014 | SQN | SIAM | matlab | GD,S |
| SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives | 2014 | SAGA | neurips'14 | numpy | VR |
| Accelerating stochastic gradient descent using predictive variance reduction | 2013 | SVRG | neurips'13 | pytorch | VR |
| Ad Click Prediction: a View from the Trenches | 2013 | FTRL | kdd'13 | pytorch | GD |
| Semi-Stochastic Gradient Descent Methods | 2013 | S2GD | arxiv | | VR |
| Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming | 2013 | ZO-SGD | SIAM | | GF |
| Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization | 2013 | ZO-{ProxSGD,PSGD} | arxiv | | GF |
| Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients | 2013 | vSGD-fd | arxiv | | GD |
| Neural Networks for Machine Learning | 2012 | RMSProp | coursera | tf | GD |
| An Enhanced Hypercube-Based Encoding for Evolving the Placement, Density, and Connectivity of Neurons | 2012 | ES-HyperNEAT | AL | go | E |
| CMA-TWEANN: efficient optimization of neural networks via self-adaptation and seamless augmentation | 2012 | CMA-TWEANN | gecco'12 | | E |
| ADADELTA: An Adaptive Learning Rate Method | 2012 | ADADELTA | arxiv | pytorch | GD |
| No More Pesky Learning Rates | 2012 | vSGD-{b,g,l} | icml'13 | lua | VR |
| A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets | 2012 | SAG | neurips'12 | | VR |
| CMA-ES: evolution strategies and covariance matrix adaptation | 2011 | CMA-ES | gecco'12 | tf | E |
| Adaptive Subgradient Methods for Online Learning and Stochastic Optimization | 2011 | AdaGrad | JMLR | pytorch,C++ | GD |
| AdaDiff: Adaptive Gradient Descent with the Differential of Gradient | 2010 | AdaDiff | iopscience | | GD |
| A Hypercube-Based Encoding for Evolving Large-Scale Neural Networks | 2009 | HyperNEAT | AL | | E |
| Scalable training of L1-regularized log-linear models | 2007 | OWL-QN | icml'07 | javascript | GD,S |
| A Stochastic Quasi-Newton Method for Online Convex Optimization | 2007 | O-LBFGS | icml'07 | | GD,S |
| Online convex programming and generalized infinitesimal gradient ascent | 2003 | OGD | icml'03 | | GD |
| A Limited Memory Algorithm for Bound Constrained Optimization | 2003 | L-BFGS-B | SIAM | fortran, matlab | GD,S |
| Evolving Neural Networks through Augmenting Topologies | 2002 | NEAT | EC | numpy | E |
| Trust region methods | 2000 | Sub-sampled TR | SIAM | | S |
| Particle swarm optimization | 1995 | PSO | icnn'95 | | E |
| A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm | 1993 | RPROP | icnn'93 | pytorch | GD |
| Acceleration of Stochastic Approximation by Averaging | 1992 | ASGD | SIAM | pytorch | GD |
| On the limited memory BFGS method for large scale optimization | 1989 | L-BFGS | MP | | GD,S |
| Large-scale linearly constrained optimization | 1978 | MINOS | MP | pytorch | GD,S |
| Some methods of speeding up the convergence of iteration methods | 1964 | Polyak (momentum) | paper | | GD |
| A Stochastic Approximation Method | 1951 | SGD | paper | pytorch | GD |
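For readers new to the GD column, the sketch below re-implements the core Adam update (Kingma & Ba, 2014; see the Adam row above) as a `torch.optim.Optimizer` subclass. `SimpleAdam` is a name chosen here for illustration; it omits weight decay, AMSGrad, and other refinements, so treat it as a reading aid rather than a replacement for `torch.optim.Adam` or the implementations linked in the table.

```python
# Illustrative sketch of the Adam update rule as a custom torch.optim.Optimizer.
# Assumption: this simplified version only mirrors the update from the 2014 paper
# (first/second moment estimates with bias correction); it is not a drop-in
# replacement for torch.optim.Adam.
import torch
from torch.optim import Optimizer


class SimpleAdam(Optimizer):
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        super().__init__(params, dict(lr=lr, betas=betas, eps=eps))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if not state:                        # lazy state initialization
                    state["step"] = 0
                    state["m"] = torch.zeros_like(p)  # first-moment estimate
                    state["v"] = torch.zeros_like(p)  # second-moment estimate
                state["step"] += 1
                m, v, t = state["m"], state["v"], state["step"]
                g = p.grad
                m.mul_(beta1).add_(g, alpha=1 - beta1)         # m_t = b1*m + (1-b1)*g
                v.mul_(beta2).addcmul_(g, g, value=1 - beta2)  # v_t = b2*v + (1-b2)*g^2
                m_hat = m / (1 - beta1 ** t)                   # bias-corrected moments
                v_hat = v / (1 - beta2 ** t)
                # theta <- theta - lr * m_hat / (sqrt(v_hat) + eps)
                p.addcdiv_(m_hat, v_hat.sqrt().add_(group["eps"]), value=-group["lr"])


# usage: optimizer = SimpleAdam(model.parameters(), lr=1e-3)
```

Roughly speaking, many of the Adam variants in the table tweak exactly these few lines: how the second moment `v` is formed, how the resulting step size is bounded, or how bias correction is applied.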