Skip to content

Commit 2355f1d

Browse files
authored
Broke up wall of text in recommendation section
1 parent 05aaba6 commit 2355f1d

File tree

1 file changed

+8
-2
lines changed

1 file changed

+8
-2
lines changed

docs/final_presentation/final_report.Rmd

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -145,12 +145,18 @@ As you can see above, projects with more complex graphs use more GitHub features
145145

146146
# Recommendations
147147

148-
Given the different levels of uncertainty in the insights we gathered from the data we decided to divide the recommendations in three levels: confident, tentative, and recommendations that require further exploration. For confident recommendations, we start with the insights gained from the global analysis. We observe a consistent use of programming language across the top 8 most used languages in all clusters (Fig 3). This is indicative that language is not a driver either in project size or project complexity and thus shouldn’t be used as a feature driver for a new tool. We then subsetted to work with projects with more than 100 commits, looking more in-depth into the complexity of the projects. One of the main insights from this analysis is that large projects consist mainly of single chains. This is indicative, that contrary to popular belief, people use Git in a centralized manner. We recommend that this kind of centralized workflow (the concept of the master branch) should be carried to a new tool that aims to ease and concile users workflows.
148+
Given the different levels of uncertainty in the insights we gathered from the data we decided to divide the recommendations in three levels: confident, tentative, and recommendations that require further exploration. For confident recommendations, we start with the insights gained from the global analysis. We observe a consistent use of programming language across the top 8 most used languages in all clusters (Fig 3). This is indicative that language is not a driver either in project size or project complexity and thus shouldn’t be used as a feature driver for a new tool.
149149

150-
Once we started analyzing complexity, our recommendations gain a higher level of uncertainty. One process we could observe was the positive correlation between the complexity a project takes during its history and it’s usage of GitHub features such as issues and pull request reviews. Based on this finding, we recommend that a new tool should direct users towards these features as they seem to be related with complexity. We also recommend that further studies do a time series analysis of GitHub projects to determine the causality between complexity and feature usage. One of the other patterns observed is that branching is 1.4 times more prevalent than merging. This finding led us to consider recommending that a new tool should direct users to keep up to date within a certain number of commits. However, we discarded this recommendation as users might use branches for different purposes such as `gh_pages` branches. Instead of directing users to sync we recommend adding a pointer to these specific branches where users have abandoned the capability of merging them back into the central branch.
150+
We then focused our analysis on projects that had > 100 commits and looked more in-depth into the complexity of the projects. One of the main insights from this analysis is that large projects consist mainly of single chains. This is indicative, that contrary to popular belief, people use Git in a centralized manner. We recommend that this kind of centralized workflow (the concept of the master branch) should be carried to a new tool that aims to ease and concile users workflows.
151+
152+
Once we started analyzing complexity, our recommendations gain a higher level of uncertainty. One process we could observe was the positive correlation between the complexity a project takes during its history and it’s usage of GitHub features such as issues and pull request reviews. Based on this finding, we recommend that a new tool should direct users towards these features as they seem to be related with complexity. We also recommend that further studies do a time series analysis of GitHub projects to determine the causality between complexity and feature usage.
153+
154+
Another pattern we observed related to project complexity was that branching is 1.4 times more prevalent than merging. This finding led us to consider recommending that a new tool should direct users to keep up to date within a certain number of commits. However, we discarded this recommendation as users might use branches for different purposes such as `gh_pages` branches. Instead of directing users to sync we recommend adding a pointer to these specific branches where users have abandoned the capability of merging them back into the central branch. WHY, EXPLAIN.
151155

152156
This finding opens the door to talk about our final recommendation that requires further exploration. Some organizations suggest that their programmers follow pre-established workflows such as the Gitflow. We couldn’t find any evidence of these workflows, however we have to make clear that this doesn’t mean that they are not present. There are three possible scenarios of why we couldn’t observe these workflows. First, the patterns might not be present. The use of Gitflow might not be there and thus didn’t distinguish itself from other patterns. Second, our unsupervised learning approach is not able to capture the patterns of the Gitflow. This suggests that an approach specifically defined to look for this patterns needs to be implemented to determine its prevalence. Lastly, the Gitflow might be loosely defined and users might be following versions of the Gitflow that approximate it but don’t strictly follow it. A study to understand this could focus on projects that explicitly claim to follow the Gitflow and apply our methodology of clustering to these projects in particular.
153157

158+
I THINK YOU HAVE ADDITIONAL RECOMMENDATIONS TO ADD SURROUNDING WHAT ANALYSIS, STUDIES, SURVEYS OR EXPERIMENTS COULD BE MOST FRUITFUL TO DO NEXT.
159+
154160
# Conclusion
155161

156162
The project had the objective of understanding if there were identifiable workflow patterns in the way people use Git and what subpatterns account for everyday use. To do this we performed clustering using the Graph2Vec algorithm and the K Means algorithm and then extracted motifs from different projects. This enabled us to draw recommendations both for a new tool and for future studies.

0 commit comments

Comments
 (0)