docs/final_presentation/final_report.Rmd
8 additions & 2 deletions
@@ -145,12 +145,18 @@ As you can see above, projects with more complex graphs use more GitHub features
145
145
146
146
# Recommendations
147
147
148
-
Given the different levels of uncertainty in the insights we gathered from the data we decided to divide the recommendations in three levels: confident, tentative, and recommendations that require further exploration. For confident recommendations, we start with the insights gained from the global analysis. We observe a consistent use of programming language across the top 8 most used languages in all clusters (Fig 3). This is indicative that language is not a driver either in project size or project complexity and thus shouldn’t be used as a feature driver for a new tool. We then subsetted to work with projects with more than 100 commits, looking more in-depth into the complexity of the projects. One of the main insights from this analysis is that large projects consist mainly of single chains. This is indicative, that contrary to popular belief, people use Git in a centralized manner. We recommend that this kind of centralized workflow (the concept of the master branch) should be carried to a new tool that aims to ease and concile users workflows.
148
+
Given the different levels of uncertainty in the insights we gathered from the data, we decided to divide the recommendations into three levels: confident, tentative, and recommendations that require further exploration. For confident recommendations, we start with the insights gained from the global analysis. We observe a consistent use of programming languages across the top 8 most used languages in all clusters (Fig 3). This indicates that language is not a driver of either project size or project complexity and thus shouldn’t be used as a feature driver for a new tool.
149
149
150
-
Once we started analyzing complexity, our recommendations gain a higher level of uncertainty. One process we could observe was the positive correlation between the complexity a project takes during its history and it’s usage of GitHub features such as issues and pull request reviews. Based on this finding, we recommend that a new tool should direct users towards these features as they seem to be related with complexity. We also recommend that further studies do a time series analysis of GitHub projects to determine the causality between complexity and feature usage. One of the other patterns observed is that branching is 1.4 times more prevalent than merging. This finding led us to consider recommending that a new tool should direct users to keep up to date within a certain number of commits. However, we discarded this recommendation as users might use branches for different purposes such as `gh_pages` branches. Instead of directing users to sync we recommend adding a pointer to these specific branches where users have abandoned the capability of merging them back into the central branch.
150
+
We then focused our analysis on projects with more than 100 commits and looked more in depth into their complexity. One of the main insights from this analysis is that large projects consist mainly of single chains. This indicates that, contrary to popular belief, people use Git in a centralized manner. We recommend that this kind of centralized workflow (the concept of the master branch) be carried over to a new tool that aims to ease and reconcile users’ workflows.
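To make the "single chain" observation concrete, here is a minimal sketch of one way to measure how linear a commit history is. The report does not specify how chains were identified, so the definition below (a commit is "linear" if it has at most one parent and at most one child) and the names `linear_fraction` and `toy_history` are assumptions for illustration only.

```python
def linear_fraction(parents):
    """Return the share of commits that sit on a simple chain.

    `parents` maps each commit id to the list of its parent commit ids.
    Assumption: a commit counts as linear when it has at most one parent
    (it is not a merge) and at most one child (it is not a branch point).
    """
    child_count = {commit: 0 for commit in parents}
    for commit_parents in parents.values():
        for parent in commit_parents:
            child_count[parent] = child_count.get(parent, 0) + 1

    linear = sum(
        1
        for commit, commit_parents in parents.items()
        if len(commit_parents) <= 1 and child_count[commit] <= 1
    )
    return linear / len(parents) if parents else 0.0


# Hypothetical toy history: one short side branch ("e") merged back at "d".
toy_history = {
    "a": [],
    "b": ["a"],
    "c": ["b"],
    "e": ["b"],
    "d": ["c", "e"],
    "f": ["d"],
    "g": ["f"],
    "h": ["g"],
}

share = linear_fraction(toy_history)
print(f"{share:.2f} of commits lie on a simple chain")
```

A value close to 1 would be consistent with the centralized, single-chain usage described above; what threshold counts as "mainly" is a judgment call that this sketch does not settle.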
151
+
152
+
Once we started analyzing complexity, our recommendations gained a higher level of uncertainty. One pattern we observed was the positive correlation between the complexity a project takes on during its history and its usage of GitHub features such as issues and pull request reviews. Based on this finding, we recommend that a new tool direct users towards these features, as they seem to be related to complexity. We also recommend that further studies perform a time series analysis of GitHub projects to determine the direction of causality between complexity and feature usage.
153
+
154
+
Another pattern we observed related to project complexity was that branching is 1.4 times more prevalent than merging. This finding led us to consider recommending that a new tool direct users to keep their branches up to date to within a certain number of commits. However, we discarded this recommendation, as users might use branches for different purposes, such as `gh-pages` branches. Instead of directing users to sync, we recommend adding a pointer to those specific branches where users have given up the ability to merge them back into the central branch. WHY, EXPLAIN.
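As an illustration of how a branching-to-merging ratio like the one above could be computed from a commit graph, here is a minimal sketch. The report does not state how branching and merging events were counted, so this assumes a branch point is a commit with more than one child and a merge is a commit with more than one parent; `branch_merge_counts` and `toy_graph` are hypothetical names.

```python
from collections import defaultdict


def branch_merge_counts(parents):
    """Count branch points and merge commits in a commit graph.

    `parents` maps each commit id to the list of its parent commit ids.
    Assumption: a branch point is a commit with more than one child,
    and a merge is a commit with more than one parent.
    """
    children = defaultdict(set)
    for commit, commit_parents in parents.items():
        for parent in commit_parents:
            children[parent].add(commit)

    branch_points = sum(1 for kids in children.values() if len(kids) > 1)
    merges = sum(1 for ps in parents.values() if len(ps) > 1)
    return branch_points, merges


# Hypothetical toy graph: "a" forks into "b" and "c", which merge at "d";
# "d" forks again into "e" and "f", and that second fork is never merged.
toy_graph = {
    "a": [],
    "b": ["a"],
    "c": ["a"],
    "d": ["b", "c"],
    "e": ["d"],
    "f": ["d"],
}

n_branches, n_merges = branch_merge_counts(toy_graph)
ratio = n_branches / n_merges if n_merges else float("inf")
print(f"branch points: {n_branches}, merges: {n_merges}, ratio: {ratio:.1f}")
```

In this toy graph the unmerged fork at "d" is what pushes the ratio above 1, which mirrors the kind of long-lived, never-merged branch (for example a documentation branch) discussed above.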
151
155
152
156
This finding opens the door to our final recommendation, which requires further exploration. Some organizations suggest that their programmers follow pre-established workflows such as Gitflow. We couldn’t find any evidence of these workflows; however, we have to make clear that this doesn’t mean that they are not present. There are three possible explanations for why we couldn’t observe these workflows. First, the patterns might not be present: Gitflow might simply not be in use, and thus didn’t distinguish itself from other patterns. Second, our unsupervised learning approach might not be able to capture the patterns of Gitflow. This suggests that an approach specifically designed to look for these patterns needs to be implemented to determine their prevalence. Lastly, Gitflow might be loosely defined, and users might be following variants that approximate it but don’t strictly follow it. A study to understand this could focus on projects that explicitly claim to follow Gitflow and apply our clustering methodology to these projects in particular.
153
157
158
+
I THINK YOU HAVE ADDITIONAL RECOMMENDATIONS TO ADD SURROUNDING WHAT ANALYSIS, STUDIES, SURVEYS OR EXPERIMENTS COULD BE MOST FRUITFUL TO DO NEXT.
159
+
154
160
# Conclusion
155
161
156
162
The project had the objective of understanding whether there are identifiable workflow patterns in the way people use Git and what subpatterns account for everyday use. To do this, we performed clustering using the Graph2Vec algorithm and the K-Means algorithm and then extracted motifs from different projects. This enabled us to draw recommendations both for a new tool and for future studies.