Commit 2c62cd9

polish README and scripts
1 parent c525551 commit 2c62cd9

3 files changed: +66 additions, -45 deletions


experiments/README.md

Lines changed: 46 additions & 24 deletions
@@ -4,7 +4,7 @@
 
 Make sure the binary file `simon` has been generated in the `bin` directory (see "🚧 Environment Setup" in [README](../README.md) for details)
 
-### Generate Run Scripts
+### Run Scripts Generation
 
 ```bash
 # pwd: kubernetes-scheduler-simulator/experiments
@@ -25,46 +25,68 @@ $ cat experiments/run_scripts/run_scripts_0511.sh | while read i; do printf "%q\
 # "--max-procs=16" where 16 is the degree of PARALLEL suggested above
 # bash run_scripts_0511.sh will run experiments sequentially
 
-# ..|''|| '|. '|' '||''''| '|| ||' ..|''|| '|. '|' |''||''| '||' '||' '||' | |''||''| '||''''| '||''|.
-# .|' || |'| | || . ||| ||| .|' || |'| | || || || || ||| || || . || ||
-# || || | '|. | ||''| |'|..'|| || || | '|. | || ||''''|| || | || || ||''| ||''|'
-# '|. || | ||| || | '|' || '|. || | ||| || || || || .''''|. || || || |.
-# ''|...|' .|. '| .||.....| .|. | .||. ''|...|' .|. '| .||. .||. .||. .||.....| .|. .||. .||. .||.....| .||. '|'
+# ..|''|| '|. '|' '||''''| '|| ||' ..|''|| '|. '|' |''||''| '||' '||' '||' | |''||''| '||''''| '||''|.
+# .|' || |'| | || . ||| ||| .|' || |'| | || || || || ||| || || . || ||
+# || || | '|. | ||''| |'|..'|| || || | '|. | || ||''''|| || | || || ||''| ||''|'
+# '|. || | ||| || | '|' || '|. || | ||| || || || || .''''|. || || || |.
+# ''|...|' .|. '| .||.....| .|. | .||. ''|...|' .|. '| .||. .||. .||. .||.....| .|. .||. .||. .||.....| .||. '|'
 ```
 
-Roughly, it takes around
-- 10 minutes for 1 experiment on 2 vCPU.
-- 10 hours for 1020 experiments on a 256 vCPU machine with pool size of 128 threads.
+To explain the generated bash script (e.g., `run_scripts_0511.sh`):
+- Each experiment is conducted via [scripts/generate_config_and_run.py](../scripts/generate_config_and_run.py)
+- Firstly, the script generates two configuration yaml files in that folder, which serve as input to `bin/simon apply` (i.e., cluster-config and scheduler-config; see "🔥 Quickstart Example" in the repo [README](../README.md))
+- Then, it executes the `bin/simon apply` command (enabled by passing the `-e` parameter to the script)
+- `bin/simon`, written in Golang, will schedule the tasks and produce a scheduling log file in the corresponding folder
+- Afterwards, [scripts/analysis.py](../scripts/analysis.py) is executed to parse the logs and yield multiple `analysis_*` files in the folder
 
-### Merge
+As a reference, it takes around
+- 10 minutes for 1 experiment on 2 vCPUs, with 9.4MB of disk space for logs
+- 10 hours for 1020 experiments on a 256-vCPU machine with a pool size of 128 threads, with 9.4GB of disk space for logs
+
+### Analysis & Merge
+
+As each experiment has its own folder where the `analysis_*` files are produced, we traverse all these folders and merge the analysis files of the same category into one file under the `analysis/analysis_results` folder.
+
+The top-level experiment folder varies with DATE (e.g., `2023_0511`), while `analysis_merge.sh` is hard-coded to traverse and merge the folders under `data`. Therefore, we need to softlink the folder to be merged as `data` (e.g., `$ ln -s 2023_0511 data`) before executing `$ bash analysis_merge.sh`.
 
 ```bash
 # pwd: kubernetes-scheduler-simulator/experiments
 $ ln -s 2023_0511 data # softlink it to data
 # pwd: kubernetes-scheduler-simulator/experiments/analysis
 $ cd analysis
-$ bash bash_merge.sh
-# The output was generated under "analysis_results"
-# The results of our large-scale experiments are cached in "expected_results" for comparison
+# The output will be placed under the "analysis_results" folder
+$ bash analysis_merge.sh
 ```
 
+Our results of the full 1020 experiments are cached in [analysis/expected_results](./analysis/expected_results/) (9.8MB) for your reference.
+
 ### Plot
 
-Automatically generate figures in the paper based on the analysis results and compare them with the results in the [expected_results](plot/expected_results) directory.
-For example, running `python plot_paib_alloc.py` will generate `paib_alloc.pdf` figure, which corresponds to Fig. 9(a) in the paper.
+To reproduce the figures shown in the paper, we provide the plotting scripts in the [plot](./plot/) folder. As these scripts assume the existence of the merged `analysis_*.csv` files, we need to first copy (or softlink) the files to the [plot](./plot/) folder:
 
 ```bash
 # pwd: kubernetes-scheduler-simulator/experiments/analysis
 $ cd ..
 # pwd: kubernetes-scheduler-simulator/experiments
 $ cp analysis/analysis_results/* plot/ # copy all csv under analysis_results/ to plot/ for analysis
-# cp analysis/expected_results/* plot/ # if skipping the experiments and directly reuse our expected results for plotting
+```
+
+If you have skipped the experiments and/or would like to use our expected analysis results for plotting, replace the last command with:
+```bash
+$ cp analysis/expected_results/* plot/
+```
+
+As the final step, enter the [plot](./plot/) folder and generate the figures (in PDF format) based on the analysis results. For example, running `python plot_openb_alloc.py` will produce `openb_alloc.pdf` in the current directory, which corresponds to Fig. 9(a) in the paper.
+
+```bash
 $ cd plot
-$ python plot_paib_alloc.py # Fig. 9(a)
-$ python plot_paib_frag_amount.py # Fig. 7(a)
-$ python plot_paib_frag_ratio.py # Fig. 7(b)
-$ python plot_paib_gpushare_alloc_bar.py # Fig. 11
-$ python plot_paib_multigpu_alloc_bar.py # Fig. 12
-$ python plot_paib_gpuspec_alloc_bar.py # Fig. 13
-$ python plot_paib_nongpu_alloc_bar.py # Fig. 14
-```
+$ python plot_openb_alloc.py # Fig. 9(a)
+$ python plot_openb_frag_amount.py # Fig. 7(a)
+$ python plot_openb_frag_ratio.py # Fig. 7(b)
+$ python plot_openb_gpushare_alloc_bar.py # Fig. 11
+$ python plot_openb_multigpu_alloc_bar.py # Fig. 12
+$ python plot_openb_gpuspec_alloc_bar.py # Fig. 13
+$ python plot_openb_nongpu_alloc_bar.py # Fig. 14
+```
+
+Our results shown in the paper are cached in [plot/expected_results](plot/expected_results) (164KB) for your reference.
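
The merge step added above is carried out by `analysis_merge.sh`. To make the traversal-and-merge idea concrete, here is a minimal Python sketch of the same pattern; it is an illustration only, not the repository's script, and it assumes each experiment folder under `data/` holds `analysis_*.csv` files that share a header, so that merging a category reduces to concatenation under a single header.

```python
#!/usr/bin/env python3
# Illustrative sketch only -- analysis_merge.sh is the actual implementation.
# Assumption: each experiment folder under data/ contains analysis_<category>.csv
# files that share a header, so merging a category means concatenating rows.
import csv
import glob
import os
from collections import defaultdict

def merge_analysis(data_dir="data", out_dir="analysis/analysis_results"):
    os.makedirs(out_dir, exist_ok=True)
    groups = defaultdict(list)
    # Traverse every experiment folder and group analysis_* files by file name.
    pattern = os.path.join(data_dir, "**", "analysis_*.csv")
    for path in glob.glob(pattern, recursive=True):
        groups[os.path.basename(path)].append(path)
    for name, paths in sorted(groups.items()):
        with open(os.path.join(out_dir, name), "w", newline="") as out:
            writer = csv.writer(out)
            header_written = False
            for p in sorted(paths):
                with open(p, newline="") as f:
                    reader = csv.reader(f)
                    header = next(reader, None)
                    if header is None:
                        continue  # skip empty files
                    if not header_written:
                        writer.writerow(header)  # keep a single header per merged file
                        header_written = True
                    writer.writerows(reader)

if __name__ == "__main__":
    merge_analysis()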

experiments/run_scripts/expected_run_scripts_0511.sh

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-#!/usr/bin/bash
+#!/bin/bash
 # cat run_scripts_0511.sh | while read i; do printf "%q\n" "$i"; done | xargs --max-procs=16 -I CMD bash -c CMD
 
 # 01, Random, random, <none>, <none> @ openb_pod_list_default
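
The comment retained in this script documents the parallel fan-out: every non-comment line of `run_scripts_0511.sh` is a self-contained shell command, and `xargs --max-procs=16` runs up to 16 of them concurrently. The Python sketch below illustrates the same fan-out with a thread pool and `subprocess`; it is an alternative illustration under the same 16-worker assumption, not the tooling the repository uses.

```python
#!/usr/bin/env python3
# Illustrative alternative to the xargs pipeline quoted in the comment above;
# the repository drives parallelism with xargs, not with this script.
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_line(cmd: str) -> int:
    # Each non-comment line of run_scripts_0511.sh is a self-contained shell command.
    return subprocess.run(cmd, shell=True).returncode

def main(script: str = "run_scripts_0511.sh", max_procs: int = 16) -> int:
    with open(script) as f:
        cmds = [line.strip() for line in f
                if line.strip() and not line.lstrip().startswith("#")]
    # Threads suffice here: the real work runs in the child processes.
    with ThreadPoolExecutor(max_workers=max_procs) as pool:
        return_codes = list(pool.map(run_line, cmds))
    return 1 if any(return_codes) else 0

if __name__ == "__main__":
    sys.exit(main(*sys.argv[1:2]))
```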

experiments/run_scripts/generate_run_scripts.py

Lines changed: 19 additions & 20 deletions
@@ -1,31 +1,29 @@
-#
-# 2023_0509
+#
 # Usage: python3 generate_run_scripts.py > run_scripts.sh
 
-PARALLEL=128
-NUM_REPEAT=10
 
-Date = "2023_0511"
-Remark = "Artifacts"
-FileList = [
-    #: Fig.7, Fig.9
+DATE = "2023_0511"  # Used as the folder name under experiments/ to hold all log results. To avoid collision of repeated experiments, may change the date or append _v1, _v2, etc.
+REMARK = "Artifacts"
+REPEAT = 10  # Number of repeated experiments.
+FILELIST = [
+    #: Main results in Fig. 7 and 9
     "data/openb_pod_list_default",
-    #: Fig.
+    #: Fig. 14 Various proportion of non-GPU tasks
     "data/openb_pod_list_cpu050",
     "data/openb_pod_list_cpu100",
     "data/openb_pod_list_cpu200",
     "data/openb_pod_list_cpu250",
-    #:
+    #: Fig. 11 Various proportion of GPU-sharing tasks
     "data/openb_pod_list_gpushare100",
     "data/openb_pod_list_gpushare40",
     "data/openb_pod_list_gpushare60",
     "data/openb_pod_list_gpushare80",
-    #:
+    #: Fig. 13 Various proportion of tasks with GPU-type constraints
     "data/openb_pod_list_gpuspec10",
    "data/openb_pod_list_gpuspec20",
     "data/openb_pod_list_gpuspec25",
     "data/openb_pod_list_gpuspec33",
-    #:
+    #: Fig. 12 Various proportion of multi-GPU tasks
     "data/openb_pod_list_multigpu20",
     "data/openb_pod_list_multigpu30",
     "data/openb_pod_list_multigpu40",
@@ -86,25 +84,25 @@ def get_dir_name_from_policy_id_list(id_list):
 ###########################################################
 ###########################################################
 
-def generate_run_scripts(asyncc=True, parallel=PARALLEL):
-    DateAndRemark = Date + "-" + Remark.replace(' ', "_").replace('(',"_").replace(')',"_")
+def generate_run_scripts(asyncc=True, parallel=16):
+    DateAndRemark = DATE + "-" + REMARK.replace(' ', "_").replace('(',"_").replace(')',"_")
     numJobs=0
     if asyncc:
-        print('#!/usr/bin/bash\n# screen -dmS sim-%s bash -c "bash run_scripts_%s.sh"\n' % (DateAndRemark, Date[-4:]))
+        print('#!/bin/bash\n# screen -dmS sim-%s bash -c "bash run_scripts_%s.sh"\n' % (DateAndRemark, DATE[-4:]))
     else:
-        print('#!/usr/bin/bash\n# cat run_scripts_%s.sh | while read i; do printf "%%q\\n" "$i"; done | xargs --max-procs=16 -I CMD bash -c CMD\n' % (Date[-4:]))
+        print('#!/bin/bash\n# cat run_scripts_%s.sh | while read i; do printf "%%q\\n" "$i"; done | xargs --max-procs=16 -I CMD bash -c CMD\n' % (DATE[-4:]))
     for tune_ratio in [1.3]:
-        tune_seed_end = 42 + NUM_REPEAT if NUM_REPEAT >= 1 else 43
+        tune_seed_end = 42 + REPEAT if REPEAT >= 1 else 43
         for tune_seed in range(42, tune_seed_end, 1):
-            for file in FileList:
+            for file in FILELIST:
                 filename = file.split('/')[-1]
                 for id, policy, gsm, dem, nm in MethodList: # GpuSelMethod, DimExtMethod, NormMethod
                     dir_name = get_dir_name_from_method([id, policy, gsm, dem, nm])
                     gsm = policy if gsm == "<self>" else gsm
                     OUTPUT_YAML = False
                     SHUFFLE_POD = True
                     outstr = "# %s, %s, %s, %s, %s @ %s\n" % (id, policy, gsm, dem, nm, filename)
-                    outstr += 'EXPDIR="experiments/%s/%s/%s/%s/%s' % (Date, filename, dir_name, tune_ratio, tune_seed)
+                    outstr += 'EXPDIR="experiments/%s/%s/%s/%s/%s' % (DATE, filename, dir_name, tune_ratio, tune_seed)
                     outstr += '" && mkdir -p ${EXPDIR} && touch "${EXPDIR}/terminal.out" && '
                     outstr += 'python3 scripts/generate_config_and_run.py -d "${EXPDIR}" '
                     outstr += '-e -b '
@@ -132,7 +130,8 @@ def generate_run_scripts(asyncc=True, parallel=PARALLEL):
     print("wait && date")
 
 if __name__=='__main__':
-    # generate_run_scripts()
+    # generate_run_scripts(asyncc=True)
+    #: $ bash run_scripts.txt
     generate_run_scripts(asyncc=False)
     #: $ cat run_scripts.txt | while read i; do printf "%q\n" "$i"; done | xargs --max-procs=16 -I CMD bash -c CMD
 
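
The loops in the second hunk above compose one shell command per combination of trace file, scheduling policy, tuning ratio, and seed. The sketch below condenses that composition using the format strings visible in the diff; `dir_name` and the omitted policy-specific flags are placeholders, since `MethodList` and `get_dir_name_from_method` are outside this diff.

```python
# Condensed sketch of how generate_run_scripts.py builds one line of the run
# script, based on the format strings visible in the diff above. dir_name and
# the trailing policy-specific flags are placeholders: MethodList and
# get_dir_name_from_method are not part of this diff.
DATE = "2023_0511"

def compose_one_command(filename="openb_pod_list_default",
                        dir_name="01_Random", tune_ratio=1.3, tune_seed=42):
    expdir = "experiments/%s/%s/%s/%s/%s" % (DATE, filename, dir_name, tune_ratio, tune_seed)
    cmd = 'EXPDIR="%s"' % expdir
    cmd += ' && mkdir -p ${EXPDIR} && touch "${EXPDIR}/terminal.out" && '
    cmd += 'python3 scripts/generate_config_and_run.py -d "${EXPDIR}" '
    cmd += '-e -b '  # -e triggers the bin/simon apply run (per the README walkthrough)
    # ...policy-specific flags omitted; see the full script for the rest...
    return cmd

if __name__ == "__main__":
    print(compose_one_command())
```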