MUICT-SERU/CodeCloneExplainability

CodeCloneExplainability

This is the repository for Chayanee Junplong's master's thesis.

-----------------------------------------------------

Dataset

Data: GoogleCodeJam (Java)

Ref: https://github.com/parasol-aser/deepsim/tree/master/dataset

Questions from GoogleCodeJam that we use: Question-GCJ
Other questions (e.g., Code Jam, Kick Start, Hash Code): this website
Code fragments: 1665
True clone pairs: 274959
False clone pairs: 1110321

Data: CodeNet (Java)

Ref: https://developer.ibm.com/exchanges/data/all/project-codenet/

Code fragments: 75000
True clone pairs: 11212500
False clone pairs: 2801250000
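
As a sanity check on the figures above, the true and false pair counts for each dataset add up to the number of unordered pairs of code fragments, n(n-1)/2. The sketch below verifies this for both datasets; the helper name `total_pairs` and the dictionary layout are illustrative and not part of the repository.

```python
from math import comb

def total_pairs(n_fragments: int) -> int:
    """Number of unordered pairs that can be formed from n fragments."""
    return comb(n_fragments, 2)

# Counts copied from the dataset descriptions above.
datasets = {
    "GoogleCodeJam": {"fragments": 1665, "true": 274959, "false": 1110321},
    "CodeNet": {"fragments": 75000, "true": 11212500, "false": 2801250000},
}

for name, d in datasets.items():
    total = total_pairs(d["fragments"])
    # True and false pairs together should cover every unordered pair.
    assert d["true"] + d["false"] == total
    ratio = d["true"] / total
    print(f"{name}: {total} pairs, {ratio:.2%} true clones")
```

The check also shows how imbalanced the two datasets are: roughly 20% of GoogleCodeJam pairs are true clones, against about 0.4% of CodeNet pairs.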

Model

Ref: https://github.com/microsoft/CodeBERT/tree/master

CodeBERT

CodeBERT is a pre-trained model for programming languages. It is a multilingual model pre-trained on natural language-programming language (NL-PL) pairs in 6 programming languages (Python, Java, JavaScript, PHP, Ruby, Go).

GraphCodeBERT

GraphCodeBERT is a pre-trained model for programming languages that takes the inherent structure of code into account, namely its data flow. It is a multilingual model pre-trained on NL-PL pairs in 6 programming languages (Python, Java, JavaScript, PHP, Ruby, Go).

We focus on GraphCodeBERT for clone detection.
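
To give a rough idea of how a pre-trained encoder can be used to compare two code fragments, here is a minimal sketch that scores a pair by the cosine similarity of their embeddings. It is an illustration under stated assumptions, not the method used in this thesis: it assumes the Hugging Face `transformers` and `torch` packages and the `microsoft/graphcodebert-base` checkpoint, it feeds only source tokens (no data-flow input), and it skips the fine-tuned pair classifier that clone detection normally relies on. The function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def embed(code, tokenizer, model):
    """Return the first-token embedding of a code fragment as a list of floats."""
    import torch  # imported lazily so the helper above stays dependency-free

    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # Hidden state of the first token, used here as a summary vector (assumption).
    return outputs.last_hidden_state[0, 0].tolist()

def clone_score(code_a, code_b, model_name="microsoft/graphcodebert-base"):
    """Similarity score for a pair of fragments; higher means more alike."""
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    return cosine_similarity(
        embed(code_a, tokenizer, model),
        embed(code_b, tokenizer, model),
    )
```

Calling `clone_score(java_source_a, java_source_b)` downloads the checkpoint on first use. Any threshold for labelling a pair as a clone would have to be chosen on validation data.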
