Resources to check - a check mark means we have read the resource thoroughly (ongoing effort, feel free to add and/or update):
- Resources from Tiffany:
  - Rohan Alexander, Lindsay Katz, Callandra Moore, Michael Wing-Cheung Wong, & Zane Schwartz. (2024). Evaluating the Decency and Consistency of Data Validation Tests Generated by LLMs.
  - Gawande, A. (2010). The Checklist Manifesto. Penguin Books India.
  - Pineau, J., Vincent-Lamarre, P., Sinha, K., Larivière, V., Beygelzimer, A., d'Alché-Buc, F., Fox, E., & Larochelle, H. (2021). Improving reproducibility in machine learning research (a report from the NeurIPS 2019 reproducibility program). Journal of Machine Learning Research, 22(164), 1–20.
  - Jeremy Jordan. (2020). Effective testing for machine learning systems.
  - Eugene Yan. (2020). How to Test Machine Learning Code and Systems.
  - Ribeiro, M., Wu, T., Guestrin, C., & Singh, S. (2020). Beyond accuracy: Behavioral testing of NLP models with CheckList. arXiv preprint arXiv:2005.04118.
    - Focuses on NLP models.
    - Three kinds of post-training tests: Invariance Tests, Directional Expectation Tests, and Minimum Functionality Tests (see the CheckList-style sketch after this list).
  - Cheng, D., Cao, C., Xu, C., & Ma, X. (2018). Manifesting Bugs in Machine Learning Code: An Explorative Study with Mutation Testing. In 2018 IEEE International Conference on Software Quality, Reliability and Security (QRS) (pp. 313–324).
  - Openja, M., Khomh, F., Foundjem, A., Ming, Z., Abidi, M., Hassan, A., & others (2023). Studying the Practices of Testing Machine Learning Software in the Wild. arXiv preprint arXiv:2312.12604.
  - Silva, S., & De França, B. (2023). A Case Study on Data Science Processes in an Academia-Industry Collaboration. In Proceedings of the XXII Brazilian Symposium on Software Quality (pp. 1–10).
  - Houssem Ben Braiek, & Foutse Khomh (2020). On testing machine learning programs. Journal of Systems and Software, 164, 110542.
  - Wattanakriengkrai, S., Chinthanet, B., Hata, H., Kula, R., Treude, C., Guo, J., & Matsumoto, K. (2022). GitHub repositories with links to academic papers: Public access, traceability, and evolution. Journal of Systems and Software, 183, 111117.
  - Schäfer, M., Nadi, S., Eghbali, A., & Tip, F. (2024). An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation. IEEE Transactions on Software Engineering, 50(1), 85–105.
  - Arghavan Moradi Dakhel, Amin Nikanjam, Vahid Majdinasab, Foutse Khomh, & Michel C. Desmarais (2024). Effective test generation using pre-trained Large Language Models and mutation testing. Information and Software Technology, 107468.
- Resources from our own research:
  - Yu, B. (2017). Testing on the Toilet: Keep Cause and Effect Clear.
  - Kent, K. (2024). Prefer Narrow Assertions in Unit Tests.
  - Yu, B. (2018). Testing on the Toilet: Keep Tests Focused.
  - Winters, T. (2024). Test Failures Should Be Actionable.
  - Trenk, A. (2014). Testing on the Toilet: Writing Descriptive Test Names.
  - Augustus Odena, & Ian Goodfellow. (2018). TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing.
    - Coverage-guided fuzzing; is it similar to mutation testing?
    - "quantify the area covered by radial neighborhoods around these activation vectors" (see the coverage sketch after this list).