Implement a fully new COPY/PASTE detector algorithm
CPD, Algorithm, Simian
The current COPY/PASTE detector is based on PMD/CPD. CPD is a pretty good open source implementation, but it has two main drawbacks: the library sees little development activity, and it requires a lot of memory, so it can't be used to analyse millions of lines of code.
Most COPY/PASTE detectors first need to lex the source code in order to work with a list of tokens. This lexing mechanism is not part of this project, which is focused on the algorithm itself and on its CPU and memory performance.
With this new algorithm, some valuable new features could be implemented in Sonar:
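To make the idea of working on a list of tokens concrete, here is a minimal sketch in Python. It is not CPD's algorithm and not the algorithm this project is expected to deliver: it simply indexes every window of consecutive tokens and reports the windows that occur more than once. The function name `find_duplicate_windows`, the `window` parameter and the sample token lists are all hypothetical, and the lexer that produces the tokens is assumed to exist elsewhere.

```python
from collections import defaultdict


def find_duplicate_windows(token_streams, window=5):
    """Return every window of `window` consecutive tokens that occurs more than once.

    `token_streams` maps a file name to its list of tokens (already lexed).
    Naive on purpose: the whole index is kept in memory.
    """
    index = defaultdict(list)
    for name, tokens in token_streams.items():
        for start in range(len(tokens) - window + 1):
            key = tuple(tokens[start:start + window])
            index[key].append((name, start))
    # Keep only the windows seen in at least two places.
    return {key: places for key, places in index.items() if len(places) > 1}


# Hypothetical token lists, as a lexer might produce them.
streams = {
    "A.java": ["int", "x", "=", "a", "+", "b", ";", "return", "x", ";"],
    "B.java": ["foo", "(", ")", ";", "int", "x", "=", "a", "+", "b", ";"],
}
duplicates = find_duplicate_windows(streams, window=7)
```

Because every window is stored in a dictionary, memory use grows with the amount of code analysed, which is exactly the kind of limitation a new algorithm would have to avoid.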
- Find duplications across projects
- Find code duplicated from Open Source projects
Freddy Mallet, Evgeny Mandrikov
It should be possible to analyse any amount of source code with a limited amount of memory.
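One possible way to approach this memory goal, sketched here only as an illustration, is to stream the tokens, keep just one window of tokens in memory at a time, and store a hash per window in an on-disk index that can also be queried across projects. The sketch below uses Python's standard `sqlite3` and `hashlib` modules. The table layout, the function names (`open_index`, `index_tokens`, `cross_project_duplicates`) and the project and file names are assumptions made for the example, not part of Sonar or CPD.

```python
import hashlib
import sqlite3
from collections import deque


def open_index(db_path=":memory:"):
    """Open the block index. Pass a file path to keep the index on disk."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blocks "
        "(hash TEXT, project TEXT, path TEXT, start INTEGER)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_hash ON blocks (hash)")
    return conn


def index_tokens(conn, project, path, tokens, window=5):
    """Consume `tokens` (any iterable) and store one hash per token window.

    Only the last `window` tokens are held in memory at any time.
    """
    buf = deque(maxlen=window)
    for pos, tok in enumerate(tokens):
        buf.append(tok)
        if len(buf) == window:
            digest = hashlib.sha1("\x00".join(buf).encode("utf-8")).hexdigest()
            conn.execute(
                "INSERT INTO blocks VALUES (?, ?, ?, ?)",
                (digest, project, path, pos - window + 1),
            )
    conn.commit()


def cross_project_duplicates(conn):
    """List pairs of blocks with the same hash that belong to different projects.

    Equal hashes are treated as equal blocks; collisions are not re-checked here.
    """
    query = """
        SELECT a.hash, a.project, a.path, a.start, b.project, b.path, b.start
        FROM blocks a JOIN blocks b ON a.hash = b.hash
        WHERE a.project < b.project
        ORDER BY a.project, a.path, a.start
    """
    return conn.execute(query).fetchall()


# Hypothetical usage with two tiny projects; generators stand in for a lexer.
conn = open_index()
index_tokens(conn, "proj-a", "A.java",
             iter(["int", "x", "=", "a", "+", "b", ";", "return", "x", ";"]),
             window=7)
index_tokens(conn, "proj-b", "B.java",
             iter(["foo", "(", ")", ";", "int", "x", "=", "a", "+", "b", ";"]),
             window=7)
rows = cross_project_duplicates(conn)
```

The example uses an in-memory database only to stay self-contained. With a file path, the index lives on disk and the working set stays small regardless of how much code has been indexed, at the cost of extra I/O.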
Below is a list of links to some materials that can be very useful when preparing a proposal, so we decided to share them with you: