Link to preprint: https://arxiv.org/pdf/2406.06164
We gathered our data from the Stack Overflow (SO) Data Dump released in March 2023.
We have organized our Stack Overflow data into three folders:
1-Full Dataset: CSV files containing data from the Stack Overflow Data Dump, including security analysis information.
data.csv --> All gathered data.
R1.csv --> Posts that violate rule R1.
R2.csv --> Posts that violate the R2 rules.
R3a.csv --> Posts that violate rule R3a.
R3b.csv --> Posts that violate rule R3b.
R3c.csv --> Posts that violate rule R3c.
R3d.csv --> Posts that violate rule R3d.
R3e.csv --> Posts that violate rule R3e.
R3f.csv --> Posts that violate rule R3f.
R3g.csv --> Posts that violate rule R3g.
R4.csv --> Posts that violate the R4 rules.
2-Sample Dataset: CSV files containing data from 400 posts extracted from the Full Dataset, along with details on challenges and concerns.
data.csv --> All sampled posts.
>>Technical issues folder
>>Non-functional issues folder
R1.csv --> Posts that violate rule R1.
R2.csv --> Posts that violate the R2 rules.
R3a.csv --> Posts that violate rule R3a.
R3b.csv --> Posts that violate rule R3b.
R3c.csv --> Posts that violate rule R3c.
R3d.csv --> Posts that violate rule R3d.
R3e.csv --> Posts that violate rule R3e.
R3f.csv --> Posts that violate rule R3f.
R3g.csv --> Posts that violate rule R3g.
R4.csv --> Posts that violate the R4 rules.
3-ChatGPT: CSV files containing links to ChatGPT-generated answers (GPT-3.5) for Stack Overflow questions.
ChatGPT.csv
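
Example: as a quick sanity check after downloading, the per-rule files in a folder can be counted with a short Python script. This is only a minimal sketch. The function name count_posts_per_rule is hypothetical, and the script assumes that each CSV is UTF-8 encoded and starts with a header row; neither is confirmed by this README. It makes no assumption about column names.

```python
import csv
from pathlib import Path

# Rule file stems as listed in this README.
RULES = ["R1", "R2", "R3a", "R3b", "R3c", "R3d", "R3e", "R3f", "R3g", "R4"]


def count_posts_per_rule(folder):
    """Return {rule: number of data rows} for each rule CSV found in `folder`."""
    counts = {}
    for rule in RULES:
        path = Path(folder) / f"{rule}.csv"
        if not path.is_file():
            continue  # skip files that are missing or named differently
        # Assumption: UTF-8 encoding and a single header row.
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # drop the assumed header row
            counts[rule] = sum(1 for _ in reader)
    return counts
```

Calling count_posts_per_rule("1-Full Dataset") from the directory that contains the three folders would return a dictionary that maps each rule found to its number of rows. The folder path is taken from the layout described above and may need adjusting to the local copy.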