Sonar is the ultimate open source platform to manage code quality.
Implement a fully new COPY/PASTE detector algorithm
CPD, Algorithm, Simian
The current COPY/PASTE detector is based on PMD/CPD. CPD is a pretty good open source implementation, but it has two main drawbacks:
There is little development activity on this library
It requires a lot of memory to work and can't be used to analyse millions of lines of code
Most COPY/PASTE detectors first need to lex the source code so that they can work with a list of tokens. This lexing mechanism is not part of this project, which is focused on the algorithm itself and on its CPU and memory performance.
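To make "working with a list of tokens" concrete, here is a minimal Java sketch of a naive detector that reports every run of `window` consecutive tokens that has already been seen earlier. It is not PMD/CPD code and not the algorithm this project is meant to deliver; the class name `TokenWindowDuplicates`, the method `find` and the `window` parameter are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a naive token-window duplicate finder.
class TokenWindowDuplicates {

    /**
     * Returns pairs {earlierStart, laterStart} of token indexes where the
     * same run of `window` consecutive tokens occurs a second (or later) time.
     */
    static List<int[]> find(List<String> tokens, int window) {
        if (window <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        List<int[]> matches = new ArrayList<>();
        // Maps each window of tokens to the index where it was first seen.
        Map<List<String>, Integer> firstSeen = new HashMap<>();
        for (int start = 0; start + window <= tokens.size(); start++) {
            // Copy the window so the map key does not depend on the input list.
            List<String> key = new ArrayList<>(tokens.subList(start, start + window));
            Integer earlier = firstSeen.putIfAbsent(key, start);
            if (earlier != null) {
                matches.add(new int[] {earlier, start});
            }
        }
        return matches;
    }
}
```

Because this sketch keeps a copy of every window in memory, its footprint grows with the amount of code analysed, which is exactly the kind of behaviour a new algorithm would have to avoid.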
With this new algorithm, some very valuable new features could be implemented in Sonar:
Find duplications across projects
Find code duplicated from Open Source projects
Freddy Mallet, Evgeny Mandrikov
It should be possible to analyse any amount of source code with a limited amount of memory.
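One way to picture the memory requirement is a streaming pass that never holds more than one window of tokens at a time. The Java sketch below uses a Rabin-Karp style rolling hash and hands each fingerprint to a caller-supplied sink. It is only an illustration under assumptions: the names `StreamingFingerprints`, `emit` and `BASE` are hypothetical, where the fingerprints end up (for instance an on-disk index) is left open, and equal fingerprints would still need to be verified because hashes can collide.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.function.LongConsumer;

// Hypothetical sketch: fingerprints token windows in a single streaming pass.
class StreamingFingerprints {

    private static final long BASE = 31L; // arbitrary multiplier, an assumption

    /**
     * Reads tokens one by one and passes one rolling hash per full window
     * to `sink`. Only `window` token codes are buffered at any time.
     * Returns the number of fingerprints emitted.
     */
    static long emit(Iterator<String> tokens, int window, LongConsumer sink) {
        if (window <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        // BASE^(window-1); overflow wraps consistently in 64-bit arithmetic.
        long highPower = 1;
        for (int i = 1; i < window; i++) {
            highPower *= BASE;
        }
        Deque<Integer> buffer = new ArrayDeque<>(window);
        long hash = 0;
        long emitted = 0;
        while (tokens.hasNext()) {
            int code = tokens.next().hashCode();
            if (buffer.size() == window) {
                // Drop the contribution of the oldest token in the window.
                hash -= buffer.removeFirst() * highPower;
            }
            hash = hash * BASE + code;
            buffer.addLast(code);
            if (buffer.size() == window) {
                sink.accept(hash);
                emitted++;
            }
        }
        return emitted;
    }
}
```

The hashing step here uses memory proportional to the window size rather than to the size of the code base, so the overall footprint depends on how the sink stores and compares the fingerprints.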