Performance

The following benchmarks show the performance of sift. While speed is not everything, it makes quite a difference whether you get your results after 2 seconds or after 2 minutes, especially when you grep through many files over and over (e.g. when searching large source code repositories or log files).
These results were achieved even though sift introduces features that other tools do not have, such as conditions and multiline matching.

In this comparison, all tools were configured to search the complete test data. Of course, sift can also be configured to search only specific paths or files, but the aim was to be fast while searching everywhere.

All searches were performed with the complete data files cached by the operating system. Each test was run three times and the best result was taken.
The web log searches were run on a large server, while all other searches were run on an old desktop system.
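
The "three runs, best result" procedure described above can be sketched as follows. This is a minimal illustration and not the harness used for the published numbers: the helper names `best_of` and `time_command` are hypothetical, and the command to be timed is left to the caller.

```python
import subprocess
import time


def best_of(fn, runs=3):
    """Call fn() `runs` times and return the shortest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def time_command(cmd, runs=3):
    """Time an external command (given as an argument list), discarding its output.

    Taking the minimum of several runs reduces noise from other processes
    and assumes the input files stay in the operating system's cache.
    """
    def run_once():
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

    return best_of(run_once, runs)
```

A call such as `time_command([tool, pattern, path])` would then return the best of three timings for one tool, where `tool`, `pattern` and `path` are placeholders for the search command under test.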


Benchmark results (time per search)

Web log files search: 0.579s
This search simulates looking for a specific pattern in web logs.
The search covered 35 GB of data, split over 32 files, using the pattern 'IntWebApp.*ParamName'. The logs were synthesized from real logs and contained exactly one valid match ('IntWebApp' was part of the logged URL path, while 'ParamName' was a query parameter).

Web log files search for 10 strings in parallel: 5.439s
This search used the same data as above, but looked for 10 static strings listed in a file.

Linux source code: 0.426s
Listing all exported crypto symbols with line numbers by searching for "EXPORT_SYMBOL_GPL.*crypto" in the Linux kernel version 3.18.2 source (637 MB).

Wordlist search: 0.738s
Searching a large wordlist (1.8 GB, used in password cracking attempts) for all word variations containing 'qwerty' (returning 2722 results).

Userlist search (ignore case): 0.280s
Searching a list of usernames (125 MB) for all name variations containing 'grep' (ignoring case, returning 121 results).

Log search: 0.230s
This search covered DNS logs (228 files, 15 MB). Some of the log files were already gzip-compressed, while the newest files were not.
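
To make the log benchmarks more concrete, the sketch below shows what such a search does in principle: it applies a regular expression (here the web log pattern 'IntWebApp.*ParamName') line by line to a mix of plain and gzip-compressed files. It is only an illustration in Python and says nothing about how sift is implemented; the function names `open_log` and `search_logs` are hypothetical, and the `.gz` suffix check is an assumption about how compressed files are named.

```python
import gzip
import re
from pathlib import Path


def open_log(path):
    """Open a log file as text, decompressing it if the name ends in .gz."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def search_logs(paths, pattern):
    """Return (file, line number, line) for every line matching the pattern."""
    regex = re.compile(pattern)
    hits = []
    for path in paths:
        with open_log(path) as handle:
            for lineno, line in enumerate(handle, start=1):
                if regex.search(line):
                    hits.append((str(path), lineno, line.rstrip("\n")))
    return hits
```

Calling `search_logs(files, r"IntWebApp.*ParamName")` on a list of log files returns every matching line together with its file name and line number, regardless of whether the file was compressed.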