Academic study project on JavaScript code duplication using AST parsing and string similarity.

felipelealdefaria/javascript-clone-detection


JavaScript Clone Detection - (v0.6.0)

Academic study project on JavaScript code duplication using AST parsing with text similarity.

Usage

Run:

make init
clone-analisys <PATH> <SIMILARITY INDEX>
// clone-analisys src/api-server 0.85

Current Process

We take a piece of code and convert it into an Abstract Syntax Tree (AST). A cleaning and normalization phase follows, in which we remove unwanted attributes and standardize equivalent structures, for example by treating an arrow function as a regular function.

// both code snippets are classified as Type-2 clones

const arrowFunction = (value) => {
  const { type } = value
  return type
}

function regularFunction(value) {
  // this is a regular function
  const { type } = value
  return type
};
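
The normalization step described above can be sketched roughly as follows. This is only an illustration over hand-built, ESTree-shaped plain objects: the function name normalize, the list of position keys, and the choice to replace every identifier name with the placeholder ID are assumptions, not the project's actual rules. Real parser output carries more fields, and arrow functions differ from regular functions in how they bind this, so a real pass would need more care.

```javascript
// Hypothetical normalization pass over ESTree-shaped nodes (plain objects).
// Assumption: these position keys are the "unwanted attributes" to strip.
const POSITION_KEYS = new Set(['start', 'end', 'loc', 'range']);

function normalize(node) {
  if (Array.isArray(node)) return node.map(normalize);
  if (node === null || typeof node !== 'object') return node;

  // Rewrite `const f = (...) => { ... }` as `function f(...) { ... }`.
  if (
    node.type === 'VariableDeclaration' &&
    node.declarations.length === 1 &&
    node.declarations[0].id.type === 'Identifier' &&
    node.declarations[0].init &&
    node.declarations[0].init.type === 'ArrowFunctionExpression' &&
    node.declarations[0].init.body.type === 'BlockStatement'
  ) {
    const { id, init } = node.declarations[0];
    return normalize({
      type: 'FunctionDeclaration',
      id,
      async: Boolean(init.async),
      generator: false,
      params: init.params,
      body: init.body,
    });
  }

  const out = {};
  for (const [key, value] of Object.entries(node)) {
    if (POSITION_KEYS.has(key)) continue; // drop position metadata
    out[key] = normalize(value);
  }
  // Assumption: names are replaced by a placeholder so renamed clones match.
  if (out.type === 'Identifier') out.name = 'ID';
  return out;
}

// Tiny hand-built ASTs standing in for parser output (bodies reduced to `return value`).
const ident = (name, start) => ({ type: 'Identifier', name, start, end: start + name.length });
const block = (start) => ({
  type: 'BlockStatement',
  start,
  body: [{ type: 'ReturnStatement', argument: ident('value', start + 9) }],
});

const arrowAst = {
  type: 'VariableDeclaration',
  kind: 'const',
  declarations: [{
    type: 'VariableDeclarator',
    id: ident('arrowFunction', 6),
    init: { type: 'ArrowFunctionExpression', async: false, params: [ident('value', 23)], body: block(33) },
  }],
};

const functionAst = {
  type: 'FunctionDeclaration',
  id: ident('regularFunction', 9),
  async: false,
  generator: false,
  params: [ident('value', 25)],
  body: block(32),
};

// After normalization both trees serialize to the same text.
const sameShape =
  JSON.stringify(normalize(arrowAst)) === JSON.stringify(normalize(functionAst));
console.log(sameShape);
```

Replacing identifier names is what lets the two differently named functions match; without it the trees would still differ in the function name.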

Several good libraries can parse code snippets into an AST, such as:

Library                Version
espree                 7.3.1
@babel/parser          7.14.7
abstract-syntax-tree   2.19.1

This project uses abstract-syntax-tree because it offers more convenient utilities for manipulating an AST.

Similarity between ASTs

To compare ASTs, the current version had two options: i) compare the ASTs directly, which only tells us whether they are identical or not; or ii) convert the ASTs to text (strings) and use libraries that measure the textual similarity between the code snippets.

Library             Version   Type
ast-compare         2.1.0     Compare ASTs
string-similarity   4.0.4     Compare strings
string-comparison   1.0.9     Compare strings
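
To illustrate why option (i) yields only a yes/no answer, here is a minimal sketch that uses Node's built-in util.isDeepStrictEqual as a stand-in for a direct tree comparison. It is not the ast-compare API, and the two small node objects are made up for the example.

```javascript
const { isDeepStrictEqual } = require('node:util');

// Two hand-written ESTree-shaped nodes that differ in a single identifier name.
const returnType = { type: 'ReturnStatement', argument: { type: 'Identifier', name: 'type' } };
const returnKind = { type: 'ReturnStatement', argument: { type: 'Identifier', name: 'kind' } };

// A structural copy compares as identical.
const copy = JSON.parse(JSON.stringify(returnType));
const identical = isDeepStrictEqual(returnType, copy); // true

// One renamed identifier is enough to turn the answer into a flat "no",
// with no indication of how close the two trees actually are.
const nearMiss = isDeepStrictEqual(returnType, returnKind); // false

console.log({ identical, nearMiss });
```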

Comparing ASTs directly seems the most coherent choice, but so far the ast-compare library can only report whether two snippets are identical. Even so, the AST representation remains useful: it is uniform and easy to manipulate for pre-processing and normalization, and it can then be converted to text and compared as a string.
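
As a rough sketch of option (ii), the snippet below serializes two ASTs with JSON.stringify and scores them with a hand-written Dice coefficient over character bigrams. The helper names bigrams and diceSimilarity are hypothetical, and this is a plain re-implementation of the general technique, not the code of string-similarity, so its scores may differ from the library's.

```javascript
// Count every two-character window of a string (a multiset of bigrams).
function bigrams(text) {
  const counts = new Map();
  for (let i = 0; i < text.length - 1; i++) {
    const gram = text.slice(i, i + 2);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// Dice coefficient: 2 * shared bigrams / (bigrams in a + bigrams in b).
function diceSimilarity(a, b) {
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0;
  const left = bigrams(a);
  const right = bigrams(b);
  let shared = 0;
  for (const [gram, count] of left) {
    shared += Math.min(count, right.get(gram) || 0);
  }
  return (2 * shared) / (a.length - 1 + (b.length - 1));
}

// Two made-up ESTree-shaped nodes that differ only in an identifier name.
const astA = { type: 'ReturnStatement', argument: { type: 'Identifier', name: 'type' } };
const astB = { type: 'ReturnStatement', argument: { type: 'Identifier', name: 'kind' } };

// Serializing the trees turns the comparison into a graded score instead of yes/no.
const score = diceSimilarity(JSON.stringify(astA), JSON.stringify(astB));
console.log(score);
```

Unlike a strict tree comparison, the score degrades gradually as the serialized trees diverge, which is what makes a similarity threshold such as 0.85 meaningful.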

Results

Using the example code snippets above, we get:

Without pre-processing and normalization

ast-compare:  false
string-similarity (Dice):  0.925351071692535
string-comparison (Cosine):  0.9672041516493517
string-comparison (Levenshtein):  0.9072164948453608
string-comparison (Longest Common Subsequence):  0.9357933579335793
string-comparison (Metric Longest Common Subsequence):  0.9337260677466863

With pre-processing and normalization (v0.3.1)

ast-compare:  true
string-similarity (Dice):  1
string-comparison (Cosine):  1
string-comparison (Levenshtein):  1
string-comparison (Longest Common Subsequence):  1
string-comparison (Metric Longest Common Subsequence):  1
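
A score of 1 across every metric is what one would expect once both normalized ASTs serialize to the same string. The sketch below shows this for an edit-distance based similarity. The normalization 1 - distance / longest length is one common convention and is an assumption here; string-comparison may compute its Levenshtein score differently.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping only two rows.
function levenshteinDistance(a, b) {
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(
        previous[j] + 1,        // deletion
        current[j - 1] + 1,     // insertion
        previous[j - 1] + cost  // substitution (or match)
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Assumed normalization: 1 means identical, 0 means nothing in common.
function levenshteinSimilarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshteinDistance(a, b) / longest;
}

// Identical serialized ASTs have distance 0, hence similarity 1.
const serialized = JSON.stringify({ type: 'Identifier', name: 'ID' });
console.log(levenshteinSimilarity(serialized, serialized));
```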

To learn more about the issues addressed, read: ESTUDO EMPÍRICO SOBRE DUPLICAÇÃO DE CÓDIGO EM APLICAÇÕES REACT.JS (Empirical Study on Code Duplication in React.js Applications).
