Title

Implement a fully new COPY/PASTE detector algorithm

Keywords

CPD, Algorithm, Simian

Description

The current COPY/PASTE detector is based on PMD/CPD. CPD is a pretty good open source implementation, but it has two main drawbacks: there is little development activity on the library, and it requires so much memory that it can't be used to analyse millions of lines of code.

Most COPY/PASTE detectors first need to lex the source code so that they can work with a list of tokens. This lexing mechanism is not part of this project, which is focused on the algorithm itself and on its CPU and memory performance.
With this new algorithm, some very valuable new features could be implemented in Sonar:

  • Find duplications across projects
  • Find code duplicated from Open Source projects
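
To make the token-based approach described above more concrete, here is a minimal sketch in Python. It is only an illustration of one common technique (hashing fixed-size windows of tokens and grouping identical windows), not the algorithm this project is expected to deliver and not how CPD works internally. The function name `find_duplicates`, the `window` parameter and the file names are hypothetical, and the input is assumed to be already lexed into tokens, since lexing is out of scope here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# (file name, index of the first token of the window); hypothetical type for this sketch
Location = Tuple[str, int]


def find_duplicates(
    files: Dict[str, Sequence[str]], window: int = 5
) -> List[List[Location]]:
    """Return groups of locations whose `window` consecutive tokens are identical."""
    # Index every window of tokens by its hash. Only the hash and the
    # location are stored in the index, not the tokens themselves.
    index: Dict[int, List[Location]] = defaultdict(list)
    for name, tokens in files.items():
        for start in range(len(tokens) - window + 1):
            key = hash(tuple(tokens[start:start + window]))
            index[key].append((name, start))

    groups: List[List[Location]] = []
    for locations in index.values():
        if len(locations) < 2:
            continue
        # Compare the real tokens so that a hash collision is never
        # reported as a duplication.
        by_tokens: Dict[tuple, List[Location]] = defaultdict(list)
        for name, start in locations:
            window_tokens = tuple(files[name][start:start + window])
            by_tokens[window_tokens].append((name, start))
        groups.extend(g for g in by_tokens.values() if len(g) > 1)
    return groups


# Whitespace splitting stands in for a real lexer in this example.
sources = {
    "A.java": "int a = b + c ; return a ;".split(),
    "B.java": "x ++ ; int a = b + c ; y -- ;".split(),
}
print(find_duplicates(sources, window=7))
```

Because the index is keyed by hash and holds only locations, the same idea could in principle span several projects, as in the features listed above. This sketch keeps the whole index in memory for brevity; analysing any amount of code with limited memory would require something more, such as a disk-backed index, which is an assumption and not something this page specifies.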

Mentor(s)

Freddy Mallet, Evgeny Mandrikov

Constraint

It should be possible to analyse any amount of source code with a limited amount of memory.

Materials


Below is a list of links to materials that can be very useful when preparing a proposal, so we decided to share them with you:

...