Change Detection
Algorithm
Here you can find the current way that the change detection is implemented. Possible improvements are always welcome.
1. The State Directory
A directory is created in .gradle/states. The name of the directory is the SHA-1 of the relative directory path agains the project root. This is the state directory for the directory to process. This directory will contain 2 directories ('new' and 'old'). This directory is the state directory.
2. Creating the list files
We get a flat directory listing of relative paths against the directory that needs to be processed. Then we group them by sub directory level and sort the relative paths in the groups alphabetically. Each group sub directory level group is written to a list file in state directory. The list files are have a name 'dirs.' + X '.list' with the X being the sub directory level.
3. Calculating the hash values (the new state)
We start calculating the hash values on the lowest sub directory level. The file hash values are calculated based on the relative path against the directory to process + the last modified date + the size of the file (optionaly the content of the file can be included aswell). The directory hash values are based on the hash values of the files in the directory + the hash values of the directory in the directory. During calculation of the hash values only the information of the current and previous level are kept in memory to limit the amount of memory needed. This all happens multi threaded in a thread pool each sub directory level we wait for all the directory has values to be calculated because they are needed for the next level. The thread pool size is determined based on the available processors * 2 with a minimum value of 4 threads. The file hash values are stored in a file with filename 'dir.' + the SHA-1 value of the relative directory path against the directory to process + '.state' in the 'new' state directory. After all the hash values of a sub directory level are calculated we sort the relative directory paths of the directories of that level and write the hash values of the level to a dirs.X.state file in sorted order (makes the compare step easier and more memory efficient) with the X being the sub directory level.
4. Comparing against the old known state
We start comparing the hash values on the top level sub directory. When a hash value change is detected on the current sub directory level we keep moving on to the next sub directory level and search for changed directory hash values. For each directory hash value that has changed a thread is submited to a thread pool to check the hash values of the files in that directory. For each directory / file that has a different/ new or removed hash value an event is queued on a blocking queue.
5. Change events
The change events are queued on a blocking queue. The events on the blocking queue are currently consumed by on thread. Depending on the old and new state of the event the created / changed / deleted event methods on the change processor are called.
Reflections
Multi threading
During hash value calculation we work by sub directory level, this is a rather simple way of working and making sure that all the information is available before moving on to the next level. I think it is possible to work more in a tree like way (closer to the actual directory layout) but that would make it way more complex. As the current performance is acceptable I would prefer not making this more complicated.
