Commit eee4da0

add file for snakemake
1 parent 160cbf6 commit eee4da0

1 file changed: +21 -0 lines changed

Snakefile

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+rule run_analysis:
+    input:
+        data_path = "/home/user/RStudio-Data-Repository/clean_data/commits_by_org.feather"
+    output:
+        results_path = directory("results")
+    params:
+        python_hash_seed = 0,
+        n_workers = 8,
+        n_projects = 1000,
+        min_commits = None,
+        min_count = 5,
+        n_personas = 5,
+        n_neurons = 128,
+        n_iter = 10,
+        random_state = 1
+    shell:
+        "PYTHONHASHSEED={params.python_hash_seed} python src/github_analysis/main.py -dp {input.data_path} -rp {output.results_path} -nw {params.n_workers} -np {params.n_projects} -mc {params.min_commits} -mcount {params.min_count} -nps {params.n_personas} -nn {params.n_neurons} -ni {params.n_iter} -rs {params.random_state}"
+
+# Commented out because repo is currently over bandwidth: https://help.github.com/en/articles/about-storage-and-bandwidth-usage
+#rule clone_data_repo:
+#    shell: "git clone https://github.com/UBC-MDS/RStudio-Data-Repository.git"
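
Note on `min_commits = None`: the shell string interpolates this param as text, so the command most likely receives the literal argument `-mc None` rather than an omitted flag. The sketch below illustrates that, using plain `str.format` as a stand-in for Snakemake's placeholder substitution (an assumption; Snakemake uses its own formatter). The `optional_int` converter is hypothetical: the argument parser in `src/github_analysis/main.py` is not shown in this commit, so this is only one way the receiving side could handle the value.

```python
import argparse
from types import SimpleNamespace


def optional_int(text):
    """Hypothetical argparse type: map the literal string 'None' to None, else parse an int."""
    return None if text == "None" else int(text)


# Fragment of the rule's shell string; str.format stands in for Snakemake's substitution.
template = "-mc {params.min_commits} -mcount {params.min_count}"
params = SimpleNamespace(min_commits=None, min_count=5)
rendered = template.format(params=params)
print(rendered)

# Hypothetical parser; only the flag names -mc and -mcount come from the Snakefile.
parser = argparse.ArgumentParser()
parser.add_argument("-mc", type=optional_int, default=None)
parser.add_argument("-mcount", type=int, default=5)
args = parser.parse_args(rendered.split())
```

If the real parser declares `-mc` with `type=int`, the string `None` would be rejected, so it may be worth confirming how `main.py` treats this argument.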
