
JlPang863/Eris-scheduler


Eris: An Online Auction for Scheduling Unbiased Distributed Learning Over Edge Networks

Eris is a customized cluster scheduler for deep learning training jobs that targets high job performance and resource efficiency in production clusters. It builds a resource-performance model for each job on the fly and dynamically allocates resources to jobs based on job progress and cluster load, in order to maximize training performance and resource efficiency. The implementation uses MXNet as the distributed training framework and Kubernetes to schedule jobs.
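The README does not state the form of the resource-performance model or the allocation policy, so the following is only a minimal sketch of the general idea, written in Python to match the repository's scripts. It assumes a hypothetical model in which seconds per training step with w workers is a + b / w (a fixed per-step overhead plus a parallelisable part), fitted by least squares from measurements taken while the job runs, and a hypothetical greedy marginal-gain policy that hands out workers one at a time to the job whose predicted remaining time drops the most. The names fit_step_time, step_time and allocate, and the sample measurements, are illustrative assumptions; this is not the online auction mechanism described in the Eris paper.

```python
# Illustrative sketch only: the model form, function names, and greedy policy
# below are assumptions, not code from the Eris repository.

def fit_step_time(samples):
    """Fit seconds_per_step = a + b / workers by ordinary least squares.

    samples: list of (workers, seconds_per_step) pairs observed during training.
    Returns (a, b), each clamped to be non-negative.
    """
    xs = [1.0 / w for w, _ in samples]
    ys = [t for _, t in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        # Only one distinct worker count seen so far: treat the time as pure overhead.
        return max(mean_y, 0.0), 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return max(a, 0.0), max(b, 0.0)


def step_time(model, workers):
    """Predicted seconds per step for a positive number of workers."""
    a, b = model
    return max(a + b / workers, 1e-9)  # guard against a degenerate all-zero fit


def allocate(models, remaining_steps, total_workers):
    """Give workers out one at a time to the job whose predicted remaining
    time (remaining_steps * seconds_per_step) decreases the most."""
    alloc = {job: 0 for job in models}
    if not models:
        return alloc
    for _ in range(total_workers):
        best_job, best_gain = None, 0.0
        for job, model in models.items():
            w = alloc[job]
            # A job with no workers makes no progress, so its remaining time is infinite.
            current = remaining_steps[job] * step_time(model, w) if w > 0 else float("inf")
            after = remaining_steps[job] * step_time(model, w + 1)
            gain = current - after
            if best_job is None or gain > best_gain:
                best_job, best_gain = job, gain
        alloc[best_job] += 1
    return alloc


# Hypothetical measurements: (workers, seconds per step) for two jobs.
samples = {
    "A": [(1, 1.1), (2, 0.6), (4, 0.35)],
    "B": [(1, 0.7), (2, 0.6), (4, 0.55)],
}
models = {job: fit_step_time(s) for job, s in samples.items()}
print(allocate(models, {"A": 1000, "B": 1000}, total_workers=4))  # → {'A': 3, 'B': 1}
```

Because a job with zero workers has infinite predicted remaining time, this greedy loop first gives each job one worker (while workers last) and only then distributes the rest by marginal gain; job A, whose step time shrinks much faster with more workers, receives the extra capacity.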

Setup

Software Environment

(1) Ubuntu 14.04.5 LTS Server (64-bit);

(2) HDFS 2.8;

(3) Docker 17.06.0-ce;

(4) Kubernetes 1.7;

(5) NVIDIA Driver version >= 375.66;

(6) CUDA version >= 8.0.61;

(7) cuDNN library version >= 6.0.

See docs for the installation guide.

Container Environment

MXNet GPU container (for servers with NVIDIA GPUs): see images.

Usage

The parameter server (PS) load-balancing algorithm and its code are in the mxnet directory. The scheduling code is in the scheduler directory. Before running experimentor.py, make sure the hyper-parameters in params.py are set correctly.
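The README does not describe the PS load-balancing algorithm itself. As a rough illustration of the problem it addresses (spreading model parameters across parameter servers so that no single server becomes a bottleneck), the sketch below uses the classic longest-processing-time (LPT) greedy heuristic: sort parameter blocks by size and place each one on the currently least-loaded server. The function name assign_params and the example parameter names and sizes are hypothetical, and the algorithm in the mxnet directory may work quite differently.

```python
import heapq

# Illustrative LPT-greedy sketch; not the algorithm from the mxnet directory.

def assign_params(param_sizes, num_servers):
    """Assign parameter blocks to servers, largest block first, each to the
    server with the smallest current load. Returns (assignment, loads)."""
    # Heap entries are (current_load, server_id); ties go to the lower server id.
    heap = [(0, s) for s in range(num_servers)]
    heapq.heapify(heap)
    assignment = {}
    # Sort by descending size, then by name so the result is deterministic.
    for name, size in sorted(param_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, server = heapq.heappop(heap)
        assignment[name] = server
        heapq.heappush(heap, (load + size, server))
    loads = [0] * num_servers
    for name, server in assignment.items():
        loads[server] += param_sizes[name]
    return assignment, loads


# Hypothetical parameter blocks and sizes (e.g. in MB).
params = {
    "fc1_weight": 400,
    "conv1_weight": 300,
    "conv2_weight": 250,
    "fc2_weight": 150,
    "bias": 10,
}
assignment, loads = assign_params(params, num_servers=2)
print(loads)  # → [560, 550]
```

LPT is chosen here only because it is simple and usually yields well-balanced loads; a production scheme might also account for per-server bandwidth or split very large tensors across servers.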

More

Read the Eris paper and the Optimus paper (whose code base this implementation builds on) for details.
