Commit e2d187d

merge in doc changes; minor bug fix for freq_graph
1 parent a208dea commit e2d187d

4 files changed: +11 -5 lines changed

docs/proposal_presentation/proposal_report.Rmd

Lines changed: 2 additions & 0 deletions
@@ -26,6 +26,8 @@ The first approach we found was Node2Vec, which is an algorithmic framework for
 
 Another approach is to frame this problem as finding common motifs in a network. In network science, motifs are subgraphs which occur in a network at a much higher frequency than random chance[5]. We plan to identify motifs ourselves either manually (by associating common Git patterns with their motif) or algorithmically (by sampling subgraphs) and then counting their occurences in the network of Git commits.
 
+- asfd
+- fsdfgwr
 
 ## What are common workflow patterns across Git repositories?
 For analyzing and comparing features at a project level, we propose Graph2Vec[6]: A neural embedding framework to learn data-driven distributed representations of arbitrary sized graphs. We propose Graph2Vec over other subgraph analysis algorithms (Node2Vec[3] and Sub2Vec[4]) due to their lack of ability to model global structure similarities, instead focusing on local similarities within confined neighbourhoods. Using Graph2Vec, we can learn the differences within Git projects in an unsupervised manner and use the generated embeddings to cluster similar graphs together with widely-used clustering algorithms.
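
Review note on the motif-counting approach described in the proposal text above: the following is a minimal sketch, not the project's implementation. It assumes each sampled subgraph is a small directed graph given as an edge list, and it uses a brute-force canonical form so that isomorphic samples are counted together. The helper names `canonical_form` and `count_motifs` and the sample data are hypothetical, and isolated nodes are ignored.

```python
from collections import Counter
from itertools import permutations


def canonical_form(edges):
    """Return a label-independent signature for a small directed graph.

    Tries every relabelling of the nodes and keeps the smallest sorted
    edge tuple, so isomorphic graphs map to the same signature.
    Brute force: only practical for a handful of nodes.
    """
    nodes = sorted({n for edge in edges for n in edge})
    best = None
    for perm in permutations(range(len(nodes))):
        relabel = dict(zip(nodes, perm))
        candidate = tuple(sorted((relabel[u], relabel[v]) for u, v in edges))
        if best is None or candidate < best:
            best = candidate
    return best


def count_motifs(samples):
    """Count how often each isomorphism class occurs among the samples."""
    return Counter(canonical_form(edges) for edges in samples)


# Hypothetical samples: two linear chains with different labels, one fork.
samples = [
    [("a", "b"), ("b", "c")],
    [("x", "y"), ("y", "z")],
    [("p", "q"), ("p", "r")],
]
motif_counts = count_motifs(samples)
```

The factorial cost of trying every relabelling is acceptable only for very small motifs; for larger ones a graph library's isomorphism test would be the more realistic choice.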
-34 Bytes
Binary file not shown.

src/github_analysis/freq_graph.py

Lines changed: 3 additions & 2 deletions
@@ -1,4 +1,4 @@
-from os import remove
+\from os import remove
 import pickle
 import glob
 
@@ -51,10 +51,11 @@ def visualize_motif_samples_bar_graph(motifs, plot_title='Motif Frequency in Dat
     A bar chart figure of the most common motifs and how often they occurred.
     """
     motifs_sorted = sorted(motifs.items(), key=lambda kv: kv[1], reverse=True)
-
+    single_chain_occurences = 0
     # output files with individual motif images to be used in bar graph
     occurrences = []
     for n, motif in enumerate(motifs_sorted):
+
         # print(motif[1])
         # nx.draw_spectral(motif[0], node_size=500, arrowsize=40, width=6)
         # plt.show()
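
Review note on the sorting step in this hunk: a small sketch of what `sorted(motifs.items(), key=lambda kv: kv[1], reverse=True)` produces. The string keys here are hypothetical stand-ins; in the diff, `motif[0]` is passed to `nx.draw_spectral` in the commented-out line, which suggests the real keys are graph objects.

```python
# Hypothetical motif counts keyed by a label instead of a graph object.
motifs = {"chain": 12, "fork": 30, "merge": 7}

# Sort (motif, count) pairs by count, most frequent first.
motifs_sorted = sorted(motifs.items(), key=lambda kv: kv[1], reverse=True)

# Split the pairs into parallel lists, e.g. for bar labels and bar heights.
labels = [motif for motif, _ in motifs_sorted]
occurrences = [count for _, count in motifs_sorted]
```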

src/github_analysis/motif_finder.py

Lines changed: 6 additions & 3 deletions
@@ -1,4 +1,4 @@
-"""
+\"""
 Sample usage (from project root dir): python src/github_analysis/motif_finder.py 0
 
 Functions for implementing the following algo, suggested by Trevor Campbell:
@@ -92,7 +92,10 @@ def get_sample_motif(self, k, recursion_limit=5000):
         A motif (nx subgraph) of length k.
         """
         sys.setrecursionlimit(recursion_limit)
-        root = self.sample_initial_node()
+        try:
+            root = self.sample_initial_node()
+        except IndexError:
+            return nx.DiGraph()
         edges = nx.bfs_edges(self.G, root) # https://networkx.github.io/documentation/networkx-2.2/reference/algorithms/generated/networkx.algorithms.traversal.breadth_first_search.bfs_edges.html#networkx.algorithms.traversal.breadth_first_search.bfs_edges
         nodes = [root] + [v for u, v in edges]
         if len(nodes) >= k:
@@ -231,4 +234,4 @@ def get_motifs_by_cluster(clusters, data_layer, k_for_motifs=5, number_of_sample
 
 # get_most_common_motifs_from_clusters(clusters, k_for_motifs=i, output_folder_suffix='motif_size_is_' + str(i))
 # for i in range(2,102,10):
-# get_most_common_motifs_from_clusters(clusters, k_for_motifs=i, output_folder_suffix='motif_size_is_' + str(i))
+# get_most_common_motifs_from_clusters(clusters, k_for_motifs=i, output_folder_suffix='motif_size_is_' + str(i))
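
Review note on the `get_sample_motif` change in this file: the new `try`/`except IndexError` returns an empty graph when no initial node can be sampled. The sketch below illustrates that guard using only the standard library. It is a simplified stand-in, not the project's code: the `MotifSampler` class is hypothetical, a dict of adjacency lists replaces `nx.DiGraph`, and `sample_initial_node` is assumed to fail the way `random.choice` does on an empty sequence.

```python
import random
from collections import deque


class MotifSampler:
    """Hypothetical stand-in for the project's motif finder class."""

    def __init__(self, adjacency):
        # adjacency maps each node to the list of nodes it points to
        self.adjacency = adjacency

    def sample_initial_node(self):
        # random.choice raises IndexError when the sequence is empty
        return random.choice(list(self.adjacency))

    def get_sample_motif(self, k):
        try:
            root = self.sample_initial_node()
        except IndexError:
            return {}  # empty graph: nothing to sample from

        # Breadth-first walk from root, keeping at most k nodes.
        visited = [root]
        queue = deque([root])
        while queue and len(visited) < k:
            node = queue.popleft()
            for neighbour in self.adjacency.get(node, []):
                if neighbour not in visited and len(visited) < k:
                    visited.append(neighbour)
                    queue.append(neighbour)

        # Return the subgraph induced by the visited nodes.
        keep = set(visited)
        return {n: [m for m in self.adjacency.get(n, []) if m in keep]
                for n in visited}


empty_motif = MotifSampler({}).get_sample_motif(3)
small_motif = MotifSampler({"a": ["b"], "b": ["c"], "c": []}).get_sample_motif(2)
```

Callers that count motifs may want to skip empty results so that an empty graph is not tallied as a motif of its own.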
