De Abreu Santos, Fabio Marcos (2023) Supporting the task-driven skill identification in open source project issue tracking systems. Doctoral thesis, Northern Arizona University.
Abstract
Selecting an appropriate issue, also known as a task, ticket, bug, or feature request, is challenging for newcomers to Open Source Software (OSS) projects. To facilitate filtering and issue selection, researchers and practitioners have proposed several strategies to automatically add labels to the issues. However, the results vary, and these approaches are still far from mainstream adoption, possibly because of a lack of good predictors and relevant labels.
In this research, we investigate how to assist new contributors in finding an issue when onboarding to a new project. To achieve our goal, we predict the domains of the APIs declared in the source code that solves the issues and use this information as labels in the issue tracker. Starting from a case study using one project and an empirical experiment, we found that predicting API domains is feasible and that the API-domain labels are relevant for selecting an issue. Next, we generalized the predictions to five projects with different programming languages, issue trackers, and development modes. Subsequently, we employed interviews and a survey to identify what strategies communities adopt to assist new contributors in finding a task. We found that maintainers, frequent contributors, and new contributors disagree about the importance of the strategies, but labeling issues is one of the most relevant strategies. Additionally, inspired by previous research, we mined conversation data from OSS projects' repositories to investigate whether predictions might benefit from leveraging metrics derived from communication data and social network analysis (SNA). We studied how these "social metrics" improve the automatic labeling of open issues with API domains. We also ran an empirical experiment to measure the influence of the API-domain labels on contribution progress and correctness. We observed that the API-domain labels improved the participants' progress in proposing and coding a solution.
Finally, we designed an OSS demonstration tool that recommends issues to contributors based on the API domains they select in a user interface. The performance of the classifiers reached 0.922 precision, 0.978 recall, and 0.942 F-measure. These results indicate that our models can predict API-domain labels. We also found that assigning labels to issues is relevant for diverse developers in OSS communities because the labels can indicate the skills involved in solving the issues. By investigating this research topic, we expect to assist OSS communities in attracting and onboarding new contributors, who are very important for the sustainability of the projects.
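For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal sketch of how precision, recall, and F-measure can be computed when each issue may carry several API-domain labels. It is not the thesis's evaluation code. The abstract does not state the averaging scheme, so micro-averaging is an assumption here, and the function name `micro_prf` and the label names ("UI", "IO", "Network", "Database") are hypothetical.

```python
from typing import List, Set, Tuple


def micro_prf(
    true_labels: List[Set[str]], predicted_labels: List[Set[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F-measure for multi-label data.

    Each list element is the set of labels attached to one issue.
    Micro-averaging is an assumption, not taken from the thesis.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        tp += len(truth & pred)  # labels predicted and actually present
        fp += len(pred - truth)  # labels predicted but not present
        fn += len(truth - pred)  # labels present but not predicted
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical API-domain labels for three issues.
truth = [{"UI", "IO"}, {"Network"}, {"IO", "Database"}]
pred = [{"UI", "IO"}, {"Network", "IO"}, {"IO"}]
precision, recall, f_measure = micro_prf(truth, pred)
print(precision, recall, f_measure)
```

In this toy data there are four correctly predicted labels, one spurious label, and one missed label, so precision and recall are both 4/5. Scores averaged per label or per project generally differ from the micro-averaged ones.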
| Item Type: | Thesis (Doctoral) |
|---|---|
| Publisher’s Statement: | © Copyright is held by the author. Digital access to this material is made possible by the Cline Library, Northern Arizona University. Further transmission, reproduction or presentation of protected items is prohibited except with permission of the author. |
| Keywords: | Labeling; Machine Learning; Mining Software Repositories; Open Source Software; Skills; Social Network Analysis |
| Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
| NAU Depositing Author Academic Status: | Student |
| Department/Unit: | Graduate College > Theses and Dissertations; College of Engineering, Informatics, and Applied Sciences > School of Informatics, Computing, and Cyber Systems |
| Date Deposited: | 01 Oct 2025 17:01 |
| Last Modified: | 01 Oct 2025 17:01 |
| URI: | https://openknowledge.nau.edu/id/eprint/6104 |