Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

This tracking mechanism is based on a probabilistic approach. The goal is to match AS MANY past violations as possible to the ones detected during the last analysis. By definition, probability means possible false-positives but it's definitely better to match to many violations than not enough. Indeed not matching a violation although means closing a review and all related comments. 

Here is the current algoritm implemented in the ViolationTrackingDecorator class : 

  1. Try first to match violations on same rule with same line and with same checkum (but not necessarily with same message)
    • When nothing has changed (at least before the line of the violation)
    • Works well
  2. Try then to match violations on same rule with same message and with same checkum 
    • Basic way to track the move of lines but with some big limitations
  3. Try then to match violations on same rule with same line and with same message
    • Useful for instance when a violation is logged on a method signature but when the method name has been changed
  4. Last check: match violation if same rule and same checksum but different line and different message
    • Useful for instance when a violation is logged on a method signature but when the method name has been changed and when the method has been moved

Here are some use cases badly covered by this algorithm :

Use CaseVersion 1Version 2
1
Code Block
linenumberstrue
languagejava
void method1(){
   System.out.println("toto"); //violation due to sysout
}
Code Block
linenumberstrue
languagejava
void method2(){
   System.out.println("toto"); //old violation whereas it should be a new one
}
 
void method1(){
   System.out.println("toto"); //new violation whereas it should be an old one
}
 
void method3(){
   System.out.println("toto"); //new violation
}
2
Code Block
linenumberstrue
languagejava
void method1(){ // violation as method is too complex
   ... //lot of code
}
Code Block
linenumberstrue
languagejava
void renamedMethod(){ //new violation whereas it should be an old one
  ... //lot of the same code
}
3
Code Block
linenumberstrue
languagejava
void method1() {
  ... // lot of code
  } // violation - Indentation
 
void method2() {
  ... // lot of code
}
Code Block
linenumberstrue
languagejava
void method1() {
  ... // lot of code
  ... // new statement
  } // violation - Indentation
 
void newMethod() {
  ... // lot of code
}
 
void method2() {
  ... // lot of code
}

Without providing the details of the implementation, here is how the tracking mechanism should work :

  1. If the source code hasn't changed, then match everything without making any additional check
  2. If the source code has changed 
    1. Identify all the lines from the past version which have been moved to another line in the new version WITHOUT any ambiguity. Match all violations on lines which have been moved without any ambiguity. (use case 1 for instance)
    2. Identify all the lines from the past version which could have been moved to several destinations (sorted by probability) in the new version (there can't be any intersection with the previous line set). Match violations whose the line matching probability is the greatest one. (use case 2 for instance)

The efficiency of the underlying mechanism directly relates to its capability to generate as many line matching as possible WITHOUT ambiguity. In Sonar 2.15, the goal is really to correctly handle cases when there isn't any ambiguity. We can imagine to improve later the algoritm to handle more ambiguous cases (use case 2 when a method name is changed). 

So in fact a standard diff algorithm would do the job for this first version but I've the feeling that the duplication detection algoritm could help us providing more none-ambiguous line matches.