This repository was archived by the owner on Oct 12, 2023. It is now read-only.

Commit 3ef7345

Milestone/0.4.0 (#74)
* Added set chunk size
* Added cluster configuration validation function (#30)
* Added pool config test validation
* Added a fix for validation
* Added if checks for null tests and more validation tests
* Install R packages at job run time (#29)
* Added cran/github installation scripts
* Added package installation tests
* Upgraded package version to 0.3.2
* Output file support (#40)
* Output files support
* Added createOutputFile method
* output files readme documentation
* added tests and find container sas
* Added more detailed variable names
* Enable/disable merge task (#39)
* Merge task pass params
* Fixed enableMerge cases
* Merge task documentation on README.md
* Fixed typo on merge task description
* Update doAzureParallel.R
* Changed enableMerge to enableCloudCombine
* Fix/backwards compatible (#68)
* Added backwards compatible in make cluster
* Added deprecated config validator
* Added mismatch label
* Added validation for quota limits and bad getPool requests in waitForNodesToComplete (#52)
* Added validation for quota limits and bad getPool requests
* Fixed based on PR
* Fixed progress bar layout to use switch statements instead of if statements
* Changed clusterId to poolId
* Added comments and fixed messages
* Added running state to the node status
* Reformatted lines for function
* Added end statement for node completion
* Feature/custom script and reduce (#70)
* Added custom scripts and removed dependencies parameter
* Updated roxygen tool version
* Added parallelThreads support
* Added test coverage
* Removed verbose message on command line
* Added Reduce function for group of tasks
* Fix build because of doc semantics mismatch with function
* Removed unused function
* Added command line arg
* Added docs for custom script
* Moved customize cluster to separate doc for future usage
* Fixed typo
* Bug - Waiting for tasks to completion function ends too early (#69)
* Moved wait for tasks to complete to doAzureParallel utility
* Removed unneeded variables and progress
* Fixed camel case for skiptoken
* Travis/lintr (#72)
* Added lintr config file
* Added travis github package installation
* Removed snake case rule
* Fixed documents on doAzureParallel
* Based on lintr default_settins docs, correctly added default rules
* Updated lintr package to use object_name_style
* Added package :: operator
* Reformatted after merge
* Fixed command line tests
* Upgraded roxygen to 6.0.1
* Cluster config docs
* Removed additional delete job
* Fixed warning descriptions in makeCluster
* Ramped up versions for DESCRIPTION
* Updated CHANGELOG to 0.4.0
* Fixed environmentSettings style from merge conflict
* Fixed cluster warning naming style
1 parent 6b215a2 commit 3ef7345

39 files changed, +1745 -803 lines changed

.travis.yml

Lines changed: 1 addition & 0 deletions
@@ -7,3 +7,4 @@ warnings_are_errors: false

 r_github_packages:
   - Azure/rAzureBatch
+  - jimhester/lintr

CHANGELOG.md

Lines changed: 18 additions & 1 deletion
@@ -1,4 +1,21 @@
-0.3.0
+# Change Log
+## [0.4.0] 2017-08-22
+### Added
+- Custom Scripts: Allows users to run commands on the command prompt when nodes boot up
+- Output Files: Able to persistently upload files to Azure Storage after task completion
+- Added cluster configuration validation at runtime
+- Enable/Disable merge task from collecting all the tasks into one list
+### Changed
+- Enable reduce function based on chunk size
+- Support backwards compatibility for older versions of the cluster configuration
+- Improve R package installation using scripts instead of creating R package installation command lines on the fly
+- Automatically load libraries defined in the foreach loop
+### Fixed
+- Paging through all tasks in the `waitForTasksToComplete` function keeps jobs from failing early
+- Added `::` import operators to fix NAMESPACE problems
+
+## [0.3.0] 2017-05-22
+### Added
 - [BREAKING CHANGE] Two configuration files for easier debugging - credentials and cluster settings
 - [BREAKING CHANGE] Added low priority virtual machine support for additional cost saving
 - Added external method for setting chunk size (SetChunkSize)
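The "reduce function based on chunk size" entry can be pictured with a small base-R sketch. This is not the package's implementation: `chunkedReduce`, `chunkSize`, and `combine` are hypothetical names used only for illustration. The assumed idea is that each chunk of iterations is combined locally first, and the per-chunk results are then combined once more.

```r
# Hypothetical illustration only: this is not doAzureParallel code.
chunkedReduce <- function(values, chunkSize, combine) {
  # Assign each value to a chunk holding at most `chunkSize` elements.
  chunkIds <- ceiling(seq_along(values) / chunkSize)
  chunks <- split(values, chunkIds)

  # Reduce inside each chunk (what a single task could do on its node).
  partials <- lapply(chunks, function(chunk) Reduce(combine, chunk))

  # Reduce the per-chunk results into one final value.
  Reduce(combine, partials)
}

result <- chunkedReduce(1:10, chunkSize = 3, combine = `+`)
print(result)
# [1] 55
```

A two-stage reduce like this gives the same answer as a single pass only when `combine` is associative, as addition is here.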

DESCRIPTION

Lines changed: 4 additions & 4 deletions
@@ -1,7 +1,7 @@
 Package: doAzureParallel
 Type: Package
 Title: doAzureParallel
-Version: 0.3.1
+Version: 0.4.0
 Author: Brian Hoang
 Maintainer: Brian Hoang <brhoan@microsoft.com>
 Description: The project is for data experts who use R at scale. The project
@@ -15,7 +15,7 @@ LazyData: TRUE
 Depends:
     foreach (>= 1.4.3),
     iterators (>= 1.0.8),
-    rAzureBatch (>= 0.2.4)
+    rAzureBatch (>= 0.4.0)
 Suggests:
-    testthat, caret, plyr
-RoxygenNote: 5.0.1
+    testthat, caret, plyr, lintr
+RoxygenNote: 6.0.1

NAMESPACE

Lines changed: 3 additions & 0 deletions
@@ -1,5 +1,6 @@
 # Generated by roxygen2: do not edit by hand

+export(createOutputFile)
 export(generateClusterConfig)
 export(generateCredentialsConfig)
 export(getJobList)
@@ -9,6 +10,8 @@ export(registerDoAzureParallel)
 export(resizeCluster)
 export(setChunkSize)
 export(setCredentials)
+export(setReduce)
 export(setVerbose)
 export(stopCluster)
 export(waitForNodesToComplete)
+export(waitForTasksToComplete)

R/autoscale.R

Lines changed: 68 additions & 38 deletions
@@ -1,24 +1,28 @@
-AUTOSCALE_WORKDAY_FORMULA <- paste0(
+autoscaleWorkdayFormula <- paste0(
   "$curTime = time();",
   "$workHours = $curTime.hour >= 8 && $curTime.hour < 18;",
   "$isWeekday = $curTime.weekday >= 1 && $curTime.weekday <= 5;",
   "$isWorkingWeekdayHour = $workHours && $isWeekday;",
-  "$TargetDedicatedNodes = $isWorkingWeekdayHour ? %s:%s;")
+  "$TargetDedicatedNodes = $isWorkingWeekdayHour ? %s:%s;"
+)

-AUTOSCALE_WEEKEND_FORMULA <- paste0(
+autoscaleWeekendFormula <- paste0(
   "$isWeekend = $curTime.weekday >= 6 && $curTime.weekday <= 7;",
-  "$TargetDedicatedNodes = $isWeekend ? %s:%s;")
+  "$TargetDedicatedNodes = $isWeekend ? %s:%s;"
+)

-AUTOSCALE_MAX_CPU_FORMULA <- "$totalNodes =
-(min($CPUPercent.GetSample(TimeInterval_Minute * 10)) > 0.7) ?
-($CurrentDedicated * 1.1) : $CurrentDedicated; $totalNodes =
-(avg($CPUPercent.GetSample(TimeInterval_Minute * 60)) < 0.2) ?
-($CurrentDedicated * 0.9) : $totalNodes;
-$TargetDedicatedNodes = min(%s, $totalNodes)"
+autoscaleMaxCpuFormula <- paste0(
+  "$totalNodes = (min($CPUPercent.GetSample(TimeInterval_Minute * 10)) > 0.7) ? ",
+  "($CurrentDedicated * 1.1) : $CurrentDedicated; $totalNodes = ",
+  "(avg($CPUPercent.GetSample(TimeInterval_Minute * 60)) < 0.2) ? ",
+  "($CurrentDedicated * 0.9) : $totalNodes; ",
+  "$TargetDedicatedNodes = min(%s, $totalNodes)"
+)

-AUTOSCALE_QUEUE_FORMULA <- paste0(
+autoscaleQueueFormula <- paste0(
   "$samples = $ActiveTasks.GetSamplePercent(TimeInterval_Minute * 15);",
-  "$tasks = $samples < 70 ? max(0,$ActiveTasks.GetSample(1)) : max( $ActiveTasks.GetSample(1), avg($ActiveTasks.GetSample(TimeInterval_Minute * 15)));",
+  "$tasks = $samples < 70 ? max(0,$ActiveTasks.GetSample(1)) : ",
+  "max( $ActiveTasks.GetSample(1), avg($ActiveTasks.GetSample(TimeInterval_Minute * 15)));",
   "$maxTasksPerNode = %s;",
   "$round = $maxTasksPerNode - 1;",
   "$targetVMs = $tasks > 0? (($tasks + $round)/ $maxTasksPerNode) : max(0, $TargetDedicated/2) + 0.5;",
@@ -27,30 +31,47 @@ AUTOSCALE_QUEUE_FORMULA <- paste0(
   "$NodeDeallocationOption = taskcompletion;"
 )

-AUTOSCALE_FORMULA = list("WEEKEND" = AUTOSCALE_WEEKEND_FORMULA,
-                         "WORKDAY" = AUTOSCALE_WORKDAY_FORMULA,
-                         "MAX_CPU" = AUTOSCALE_MAX_CPU_FORMULA,
-                         "QUEUE" = AUTOSCALE_QUEUE_FORMULA)
+autoscaleFormula <- list(
+  "WEEKEND" = autoscaleWeekendFormula,
+  "WORKDAY" = autoscaleWorkdayFormula,
+  "MAX_CPU" = autoscaleMaxCpuFormula,
+  "QUEUE" = autoscaleQueueFormula
+)

-getAutoscaleFormula <- function(formulaName, dedicatedMin, dedicatedMax, lowPriorityMin, lowPriorityMax, maxTasksPerNode = 1){
-  formulas <- names(AUTOSCALE_FORMULA)
+getAutoscaleFormula <-
+  function(formulaName,
+           dedicatedMin,
+           dedicatedMax,
+           lowPriorityMin,
+           lowPriorityMax,
+           maxTasksPerNode = 1) {
+    formulas <- names(autoscaleFormula)

-  if(formulaName == formulas[1]){
-    return(sprintf(AUTOSCALE_WEEKEND_FORMULA, dedicatedMin, dedicatedMax))
-  }
-  else if(formulaName == formulas[2]){
-    return(sprintf(AUTOSCALE_WORKDAY_FORMULA, dedicatedMin, dedicatedMax))
+    if (formulaName == formulas[1]) {
+      return(sprintf(autoscaleWeekendFormula, dedicatedMin, dedicatedMax))
+    }
+    else if (formulaName == formulas[2]) {
+      return(sprintf(autoscaleWorkdayFormula, dedicatedMin, dedicatedMax))
+    }
+    else if (formulaName == formulas[3]) {
+      return(sprintf(autoscaleMaxCpuFormula, dedicatedMin))
+    }
+    else if (formulaName == formulas[4]) {
+      return(
+        sprintf(
+          autoscaleQueueFormula,
+          maxTasksPerNode,
+          dedicatedMin,
+          dedicatedMax,
+          lowPriorityMin,
+          lowPriorityMax
+        )
+      )
+    }
+    else{
+      stop("Incorrect autoscale formula: QUEUE, MAX_CPU, WEEKEND, WORKDAY")
+    }
   }
-  else if(formulaName == formulas[3]){
-    return(sprintf(AUTOSCALE_MAX_CPU_FORMULA, dedicatedMin))
-  }
-  else if(formulaName == formulas[4]){
-    return(sprintf(AUTOSCALE_QUEUE_FORMULA, maxTasksPerNode, dedicatedMin, dedicatedMax, lowPriorityMin, lowPriorityMax))
-  }
-  else{
-    stop("Incorrect autoscale formula: QUEUE, MAX_CPU, WEEKEND, WORKDAY")
-  }
-}

 #' Resize an Azure cloud-enabled cluster.
 #'
@@ -74,10 +95,19 @@ resizeCluster <- function(cluster,
                           lowPriorityMin,
                           lowPriorityMax,
                           algorithm = "QUEUE",
-                          timeInterval = "PT5M"){
-  pool <- getPool(cluster$poolId)
+                          timeInterval = "PT5M") {
+  pool <- rAzureBatch::getPool(cluster$poolId)

-  resizePool(cluster$poolId,
-             autoscaleFormula = getAutoscaleFormula(algorithm, dedicatedMin, dedicatedMax, lowPriorityMin, lowPriorityMax, maxTasksPerNode = pool$maxTasksPerNode),
-             autoscaleInterval = timeInterval)
+  rAzureBatch::resizePool(
+    cluster$poolId,
+    autoscaleFormula = getAutoscaleFormula(
+      algorithm,
+      dedicatedMin,
+      dedicatedMax,
+      lowPriorityMin,
+      lowPriorityMax,
+      maxTasksPerNode = pool$maxTasksPerNode
+    ),
+    autoscaleInterval = timeInterval
+  )
 }
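To see what `getAutoscaleFormula` hands to the Batch service, it can help to fill a template by hand. The sketch below copies the workday formula from the diff and substitutes two example values with `sprintf`; the numbers 2 and 10 and the variable names `workdayFormula`, `filled`, and `lastStatement` are assumptions made for illustration, not values from the commit.

```r
# Local copy of the workday template shown in the diff above.
workdayFormula <- paste0(
  "$curTime = time();",
  "$workHours = $curTime.hour >= 8 && $curTime.hour < 18;",
  "$isWeekday = $curTime.weekday >= 1 && $curTime.weekday <= 5;",
  "$isWorkingWeekdayHour = $workHours && $isWeekday;",
  "$TargetDedicatedNodes = $isWorkingWeekdayHour ? %s:%s;"
)

# Hypothetical bounds, passed in the same order as in the diff:
# dedicatedMin first, dedicatedMax second.
dedicatedMin <- 2
dedicatedMax <- 10
filled <- sprintf(workdayFormula, dedicatedMin, dedicatedMax)

# Only the last statement contains placeholders, so inspect that one.
statements <- strsplit(filled, ";", fixed = TRUE)[[1]]
lastStatement <- statements[length(statements)]
print(lastStatement)
# [1] "$TargetDedicatedNodes = $isWorkingWeekdayHour ? 2:10"
```

Note that the function in the diff selects a template by comparing `formulaName` with `names(autoscaleFormula)` by position, so the branch order appears to depend on the order of the list entries.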

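The queue formula sets `$round` to `$maxTasksPerNode - 1` and adds it to `$tasks` before dividing. This looks like the usual add-then-divide idiom for rounding a division up, assuming the result ends up truncated to a whole node count (the source does not say how the service rounds). A minimal sketch of the arithmetic in R, with made-up task counts and the hypothetical names `activeTasks`, `roundUp`, and `nodesNeeded`:

```r
# Hypothetical numbers for illustration; not taken from the commit.
maxTasksPerNode <- 4
activeTasks <- c(0, 1, 4, 5, 9)

# Adding (divisor - 1) before an integer division rounds the quotient up.
roundUp <- maxTasksPerNode - 1
nodesNeeded <- (activeTasks + roundUp) %/% maxTasksPerNode

print(nodesNeeded)
# [1] 0 1 1 2 3

# Same result as an explicit ceiling of the plain quotient.
stopifnot(all(nodesNeeded == ceiling(activeTasks / maxTasksPerNode)))
```

In the formula itself this branch is only taken when `$tasks > 0`; the zero case is handled by the other arm of the conditional.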