Copy file name to clipboardExpand all lines: docs/final_presentation/final_report.Rmd
+4-7Lines changed: 4 additions & 7 deletions
@@ -109,13 +109,11 @@ The two most important things when choosing the motif length to study are 1) bei
109
109
110
110
As you can see from the graph above, motifs of length twenty-five are the smallest motifs that are not overwhelmed by single chains. We also chose to look at motifs of length five because, while they are overwhelmed by single chains, they are easy to inspect visually.
111
111
112
-
### Advantages and Limitations
113
-
114
112
One clear advantage to using motifs to summarize common subgraphs is that it allows us to view the data in a digestible and interpretable way. While approaches to understand graph structures that rely on neural networks (such as Graph2Vec) are incredibly useful for understanding latent features that the human eye might not pick up, the results from these models are not as easily interpretable as motifs.
115
113
116
114
One limitation of this approach is that there is no simple way to get overall numerical summary statistics for a project or group of projects based on just these motifs. A further limitation, this one in the data, is that we have no way to associate commits on the same branch with one another. If we did have this data, we’d be able to color motifs by branch and get a more sophisticated view of the structure.
117
115
118
-
Below are the most common motifs, length 5 and 25.
116
+
Below are the most common motifs of length 5 and 25.
119
117
120
118
121
119

@@ -128,17 +126,16 @@ Below are the most common motifs, length 5 and 25.
128
126
129
127
### Key Findings
130
128
131
-
One thing that stood out in our result set was that there were many more branches than merges. In fact, there were 1.41 times as many commits that had at least one branch coming from them as had at least one merge coming to them. We also observed that many branches that are created aren’t merged back in within the next five or even twenty-five commits. While we can’t directly calculate what percent of branches go five or twenty-five commits without a merge because of the limitations in the data, we can get a rough idea of this percentage by looking at only the motifs that start with a branch and seeing what percentage of those don’t contain even a single merge. These results are summarized below.
129
+
One thing that stood out in our analysis was that there were 1.41 times as many commits that had at least one branch coming from them as had at least one merge coming to them. We also observed that many branches that are created are not merged back within the next five or even twenty-five commits. While we cannot directly calculate what percent of branches go five or twenty-five commits without a merge because of the limitations in the data, we can get a rough idea of this percentage by looking at only the motifs that start with a branch and seeing what percentage of those do not contain a single merge. These results are summarized below.
132
130
131
+
*Table 1: Frequency of Motifs that Start with a Branch and Do Not Contain a Merge*
133
132
| Motif Length | Percentage of Motifs That Start with a Branch and Do Not Contain a Merge |
134
133
|------------|---------------|
135
134
| Length 5 | 42% |
136
135
| Length 25 | 13% |
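
The percentages in this table could be estimated along the following lines. This is only a minimal sketch, not the code used for the report: it assumes each motif is stored as a hypothetical `(root, edges, count)` tuple, where `edges` is a list of `(parent, child)` commit pairs and `count` is how often the motif occurs. A motif "starts with a branch" when its first commit has two or more children, and "contains a merge" when any commit has two or more parents.

```python
from collections import Counter


def starts_with_branch(edges, root):
    """True if the motif's first commit has two or more children."""
    return sum(1 for parent, _ in edges if parent == root) >= 2


def contains_merge(edges):
    """True if any commit in the motif has two or more parents."""
    parent_counts = Counter(child for _, child in edges)
    return any(n >= 2 for n in parent_counts.values())


def share_unmerged(motifs):
    """Among motifs that start with a branch, the fraction with no merge.

    `motifs` is a list of (root, edges, count) tuples (an assumed layout).
    Returns NaN when no motif starts with a branch.
    """
    branch_total = 0
    unmerged = 0
    for root, edges, count in motifs:
        if starts_with_branch(edges, root):
            branch_total += count
            if not contains_merge(edges):
                unmerged += count
    return unmerged / branch_total if branch_total else float("nan")


# Toy data, invented purely for illustration.
example_motifs = [
    ("a", [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")], 3),  # branch, then merge
    ("a", [("a", "b"), ("a", "c"), ("b", "d"), ("c", "e")], 1),  # branch, never merged
    ("a", [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")], 10),  # single chain
]
print(share_unmerged(example_motifs))
```

Weighting by `count` matters here: the statistic is a share of motif occurrences, so a frequent motif should contribute more than a rare one.
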
137
136
138
-
*Table 1: Frequency of Motifs that Start with a Branch and Don’t End in a Merge*
139
-
140
137
### Extensions
141
-
Given the high percentage of motifs that were just a single chain, we were curious about projects that were made up of all or mostly single chains and how they compared with projects that had more branching and merging. To examine this further, we calculated the percent of motifs in a project that were not just single chains as a proxy for graph complexity, and then divided the dataset into projects whose graph complexity was in the top 40% and projects whose graph complexity were in the bottom 40%. We then calculated the mean number of issues, pull requests, and code reviews for these high- and low- complexity projects and compared the differences.
138
+
Given the high percentage of motifs that consisted of only a single chain, we were curious how projects made up entirely or mostly of single chains compared with projects that had more branching and merging. To examine this further, we calculated the percentage of motifs in a project that were not just single chains as a proxy for graph complexity. We then divided the dataset into projects whose graph complexity was in the top 40% and projects whose graph complexity was in the bottom 40%. Finally, we calculated the mean number of issues, pull requests, and code reviews for these high- and low-complexity projects and compared the differences.
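
The comparison described above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the report's actual code: the project records are hypothetical dictionaries, and the field names (`total_motifs`, `single_chain_motifs`, `issues`, `pull_requests`, `code_reviews`) are invented for the example.

```python
from statistics import mean

METRICS = ("issues", "pull_requests", "code_reviews")


def graph_complexity(project):
    """Percentage of a project's motifs that are not single chains.

    Assumes total_motifs > 0 for every project.
    """
    total = project["total_motifs"]
    return 100.0 * (total - project["single_chain_motifs"]) / total


def split_by_complexity(projects, percent=40):
    """Return (low, high): the bottom and top `percent` of projects by complexity."""
    ranked = sorted(projects, key=graph_complexity)
    k = len(ranked) * percent // 100
    return ranked[:k], ranked[len(ranked) - k:]


def compare_means(low, high, metrics=METRICS):
    """Mean of each metric in both groups, plus the high-minus-low difference.

    Both groups must be non-empty, otherwise statistics.mean raises an error.
    """
    result = {}
    for metric in metrics:
        low_mean = mean(p[metric] for p in low)
        high_mean = mean(p[metric] for p in high)
        result[metric] = {"low": low_mean, "high": high_mean, "diff": high_mean - low_mean}
    return result


# Toy records, invented purely for illustration.
example_projects = [
    {"total_motifs": 100, "single_chain_motifs": 100 - c,
     "issues": i, "pull_requests": 2 * i, "code_reviews": 0}
    for c, i in [(0, 1), (10, 3), (20, 5), (30, 7), (40, 9)]
]
low_group, high_group = split_by_complexity(example_projects)
summary = compare_means(low_group, high_group)
```

Dropping the middle 20% of projects, as the text describes, keeps the two groups clearly separated, at the cost of ignoring projects near the median.
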