Commit 05aaba6

some edits on the remainder of the results section
1 parent 19077d0 commit 05aaba6

1 file changed: +4 -7 lines changed

docs/final_presentation/final_report.Rmd

Lines changed: 4 additions & 7 deletions
@@ -109,13 +109,11 @@ The two most important things when choosing the motif length to study are 1) bei
109109

110110
As you can see from the graph above, motifs of length twenty-five are the smallest motifs that are not overwhelmed by single chains. We also chose to look at motifs of length five because, while they are overwhelmed by single chains, they are easy to inspect visually.
111111

112-
### Advantages and Limitations
113-
114112
One clear advantage of using motifs to summarize common subgraphs is that it allows us to view the data in a digestible and interpretable way. While approaches to understanding graph structures that rely on neural networks (such as Graph2Vec) are incredibly useful for uncovering latent features that the human eye might not pick up, the results from these models are not as easily interpretable as motifs.
115113

116114
One limitation of this approach is that there is no simple way to get overall numerical summary statistics for a project or group of projects based on just these motifs. A further limitation, this one in the data, is that we have no way to associate commits on the same branch with one another. If we did have this data, we’d be able to color motifs by branch and get a more sophisticated view of the structure.
117115

118-
Below are the most common motifs, length 5 and 25.
116+
Below are the most common motifs of length 5 and 25.
119117

120118

121119
![](imgs/motif_5_visual.png)
@@ -128,17 +126,16 @@ Below are the most common motifs, length 5 and 25.
128126

129127
### Key Findings
130128

131-
One thing that stood out in our result set was that there were many more branches than merges. In fact, there were 1.41 times as many commits that had at least one branch coming from them as had at least one merge coming to them. We also observed that many branches that are created aren’t merged back in within the next five or even twenty-five commits. While we can’t directly calculate what percent of branches go five or twenty-five commits without a merge because of the limitations in the data, we can get a rough idea of this percentage by looking at only the motifs that start with a branch and seeing what percentage of those don’t contain even a single merge. These results are summarized below.
129+
One thing that stood out in our analysis was that there were 1.41 times as many commits that had at least one branch coming from them as had at least one merge coming to them. We also observed that many branches that are created are not merged back within the next five or even twenty-five commits. While we cannot directly calculate what percent of branches go five or twenty-five commits without a merge because of the limitations in the data, we can get a rough idea of this percentage by looking at only the motifs that start with a branch and seeing what percentage of those do not contain a single merge. These results are summarized below.
132130

131+
*Table 1: Frequency of Motifs that Start with a Branch and Don’t End in a Merge*
133132
| Motif Length| Percentage of Motifs That Start with a Branch and Do Not Contain a Merge|
134133
|------------|---------------|
135134
| Length 5 | 42% |
136135
| Length 25 | 13% |
137136

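The two counts behind these findings can be sketched in a few lines of Python. This is a minimal sketch, not the code used for the report. It assumes each graph or motif is given as a hypothetical mapping from a commit id to the list of its parent ids, that a "branch" commit is one with two or more children, that a "merge" commit is one with two or more parents, and that a motif "starts with a branch" when its root commit has two or more children. All function names are illustrative.

```python
from collections import Counter


def child_counts(parents):
    """Count children per commit from a {commit: [parent, ...]} mapping."""
    counts = Counter()
    for parent_list in parents.values():
        for parent in parent_list:
            counts[parent] += 1
    return counts


def branch_merge_ratio(parents):
    """Commits with >= 2 children divided by commits with >= 2 parents."""
    branch_points = sum(1 for n in child_counts(parents).values() if n >= 2)
    merge_points = sum(1 for ps in parents.values() if len(ps) >= 2)
    return branch_points / merge_points if merge_points else float("inf")


def contains_merge(motif):
    """True if any commit in the motif has two or more parents."""
    return any(len(ps) >= 2 for ps in motif.values())


def share_without_merge(motifs):
    """Fraction of branch-start motifs that contain no merge.

    `motifs` is a list of (root_commit, parents_mapping) pairs, where the
    mapping is assumed to list only parents inside the motif.
    Returns None when no motif starts with a branch.
    """
    branch_start = [m for root, m in motifs if child_counts(m)[root] >= 2]
    if not branch_start:
        return None
    return sum(1 for m in branch_start if not contains_merge(m)) / len(branch_start)


# Toy length-5 motifs (hypothetical data, not taken from the report).
open_branch = {"a": [], "b": ["a"], "c": ["a"], "d": ["b"], "e": ["c"]}
merged_back = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"], "e": ["d"]}
chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["d"]}

share = share_without_merge([("a", open_branch), ("a", merged_back), ("a", chain)])
```

In this toy input the chain is ignored because it does not start with a branch, so `share` is computed over the two remaining motifs only. The report's figures come from the full motif set, which this sketch does not reproduce.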
138-
*Table 1: Frequency of Motifs that Start with a Branch and Don’t End in a Merge*
139-
140137
### Extensions
141-
Given the high percentage of motifs that were just a single chain, we were curious about projects that were made up of all or mostly single chains and how they compared with projects that had more branching and merging. To examine this further, we calculated the percent of motifs in a project that were not just single chains as a proxy for graph complexity, and then divided the dataset into projects whose graph complexity was in the top 40% and projects whose graph complexity were in the bottom 40%. We then calculated the mean number of issues, pull requests, and code reviews for these high- and low- complexity projects and compared the differences.
138+
Given the high percentage of motifs that were made up of only a single chain, we were curious about projects that were made up of all or mostly single chains and how they compared with projects that had more branching and merging. To examine this further, we calculated the percent of motifs in a project that were not just single chains as a proxy for graph complexity. We then divided the dataset into projects whose graph complexity was in the top 40% and projects whose graph complexity was in the bottom 40%. Finally, we calculated the mean number of issues, pull requests, and code reviews for these high- and low-complexity projects and compared the differences.
142139

143140
![](imgs/GH_features_by_complexity.png)
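The complexity split described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the analysis code behind the figure. It assumes each project is a dictionary with hypothetical keys `non_chain_motifs`, `total_motifs` (assumed greater than zero), `issues`, `pull_requests`, and `code_reviews`. The 40% cutoffs are taken by rank, with the group size rounded down, and ties are broken by input order. The report does not say how it handled ties or rounding.

```python
from statistics import mean

METRICS = ("issues", "pull_requests", "code_reviews")


def complexity(project):
    """Share of a project's motifs that are not single chains."""
    return project["non_chain_motifs"] / project["total_motifs"]


def split_by_complexity(projects, share=0.4):
    """Return (low, high): the bottom and top `share` of projects by complexity."""
    ranked = sorted(projects, key=complexity)
    k = int(len(ranked) * share)  # group size, rounded down
    return ranked[:k], ranked[len(ranked) - k:]


def group_means(group, fields=METRICS):
    """Mean of each metric over a group, or None for an empty group."""
    if not group:
        return None
    return {field: mean(p[field] for p in group) for field in fields}
```

Calling `split_by_complexity(projects)` and then `group_means` on each half gives the per-group averages to compare. The middle 20% of projects is left out of both groups, as in the text.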
144141
