merge in doc changes; minor bug fix for freq_graph

rzitomer · rzitomer · commit e2d187dc1eb7 · 2019-06-24T12:58:06.000-07:00
diff --git a/docs/proposal_presentation/proposal_report.Rmd b/docs/proposal_presentation/proposal_report.Rmd
@@ -26,6 +26,8 @@ The first approach we found was Node2Vec, which is an algorithmic framework for
 
 Another approach is to frame this problem as finding common motifs in a network. In network science, motifs are subgraphs that occur in a network far more often than would be expected by chance[5]. We plan to identify motifs ourselves, either manually (by associating common Git patterns with their motifs) or algorithmically (by sampling subgraphs), and then count their occurrences in the network of Git commits.
 
+- asfd
+- fsdfgwr
 
 ## What are common workflow patterns across Git repositories?
 For analyzing and comparing features at a project level, we propose Graph2Vec[6], a neural embedding framework that learns data-driven distributed representations of arbitrarily sized graphs. We propose Graph2Vec over other subgraph analysis algorithms (Node2Vec[3] and Sub2Vec[4]) because those algorithms cannot model global structural similarities and instead focus on local similarities within confined neighbourhoods. Using Graph2Vec, we can learn the differences between Git projects in an unsupervised manner and use the generated embeddings to cluster similar graphs together with widely used clustering algorithms.
diff --git a/docs/proposal_presentation/proposal_report.pdf b/docs/proposal_presentation/proposal_report.pdf
diff --git a/src/github_analysis/freq_graph.py b/src/github_analysis/freq_graph.py
@@ -1,4 +1,4 @@
-from os import remove
+from os import remove
 import pickle
 import glob
 
@@ -51,10 +51,11 @@ def visualize_motif_samples_bar_graph(motifs, plot_title='Motif Frequency in Dat
         A bar chart figure of the most common motifs and how often they occurred.
     """
     motifs_sorted = sorted(motifs.items(), key=lambda kv: kv[1], reverse=True)
-
+    single_chain_occurences = 0
     # output files with individual motif images to be used in bar graph
     occurrences = []
     for n, motif in enumerate(motifs_sorted):
+
         # print(motif[1])
         # nx.draw_spectral(motif[0], node_size=500, arrowsize=40, width=6)
         # plt.show()
diff --git a/src/github_analysis/motif_finder.py b/src/github_analysis/motif_finder.py
@@ -1,4 +1,4 @@
-"""
+"""
 Sample usage (from project root dir): python src/github_analysis/motif_finder.py 0
 
 Functions for implementing the following algo, suggested by Trevor Campbell:
@@ -92,7 +92,10 @@ def get_sample_motif(self, k, recursion_limit=5000):
             A motif (nx subgraph) of length k.
         """
         sys.setrecursionlimit(recursion_limit)
-        root = self.sample_initial_node()
+        try:
+            root = self.sample_initial_node()
+        except IndexError:
+            return nx.DiGraph()
         edges = nx.bfs_edges(self.G, root) # https://networkx.github.io/documentation/networkx-2.2/reference/algorithms/generated/networkx.algorithms.traversal.breadth_first_search.bfs_edges.html#networkx.algorithms.traversal.breadth_first_search.bfs_edges
         nodes = [root] + [v for u, v in edges]
         if len(nodes) >= k:
@@ -231,4 +234,4 @@ def get_motifs_by_cluster(clusters, data_layer, k_for_motifs=5, number_of_sample
 
     #    get_most_common_motifs_from_clusters(clusters, k_for_motifs=i, output_folder_suffix='motif_size_is_' + str(i))
     # for i in range(2,102,10):
-    #   get_most_common_motifs_from_clusters(clusters, k_for_motifs=i, output_folder_suffix='motif_size_is_' + str(i))
+    #   get_most_common_motifs_from_clusters(clusters, k_for_motifs=i, output_folder_suffix='motif_size_is_' + str(i))
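
Note on the `get_sample_motif` change: the new `try`/`except IndexError` returns an empty graph when no initial node can be sampled. The following is a minimal, standard-library-only sketch of that guard, not the project's implementation. It assumes the root is drawn with `random.choice`, which raises `IndexError` on an empty sequence. The adjacency dict, the `sample_initial_node` stand-in, and the empty-list return value (in place of `nx.DiGraph()`) are all hypothetical.

```python
import random


def sample_initial_node(nodes):
    """Hypothetical stand-in: pick a random starting node from a list."""
    # random.choice raises IndexError when the sequence is empty.
    return random.choice(nodes)


def get_sample_motif(adjacency, k):
    """Return up to k nodes reachable from a random root.

    An empty list is returned when the graph has no nodes, mirroring the
    diff's choice to return an empty nx.DiGraph() instead of crashing.
    """
    try:
        root = sample_initial_node(list(adjacency))
    except IndexError:
        return []  # empty graph: nothing to sample

    # Breadth-first walk from the root, stopping once k nodes are collected.
    seen, queue = [root], [root]
    while queue and len(seen) < k:
        current = queue.pop(0)
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.append(neighbour)
                queue.append(neighbour)
                if len(seen) == k:
                    break
    return seen
```

Callers that count motif frequencies may want to skip empty results so that they are not tallied as a motif of their own.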