Testing file for modifications of SimpleSim.
SimpleSim is the original code. When the next matching letter is found before the current offset (i.e. the search wraps around the haystack), the score for that letter is 0.
SimpleSim2 adds a score based on the distance between the found letter and the offset. It should handle letters in the wrong order better.
SimpleSim3 is like SimpleSim2 but offers the option to exit early once a given threshold can no longer be reached. The goal here is speed.
To make the result symmetric, each SimpleSim function is run twice, once in each direction.
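The two-direction idea can be sketched as a small wrapper. This is only an illustration: the text does not say how the two directions are combined, so averaging is an assumption here. The names `symmetric` and `letterCoverage` are hypothetical, and `letterCoverage` is a toy measure, not SimpleSim.

```javascript
// Hypothetical helper: wraps an asymmetric similarity function so that
// the order of the arguments no longer matters.
// Assumption: the two directions are combined by averaging.
function symmetric(sim) {
  return (a, b) => (sim(a, b) + sim(b, a)) / 2;
}

// Toy asymmetric measure for illustration only (NOT SimpleSim):
// the fraction of letters of `needle` that occur anywhere in `haystack`.
function letterCoverage(needle, haystack) {
  if (needle.length === 0) return 0;
  let found = 0;
  for (const ch of needle) {
    if (haystack.includes(ch)) found++;
  }
  return found / needle.length;
}

const symCoverage = symmetric(letterCoverage);
```

With this wrapper, `symCoverage("ab", "abcd")` and `symCoverage("abcd", "ab")` return the same value, although `letterCoverage` itself depends on the argument order.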
Jaro Winkler is an established algorithm.
Levenshtein Distance is an established algorithm. To make it comparable (a similarity in the range 0-1 instead of a distance), we calculate 1 - levenshtein/maxlength.
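A minimal sketch of this normalization, assuming that maxlength is the length of the longer of the two words and that two empty strings count as identical. The function names are illustrative and not necessarily those used in the test file.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping two rows.
function levenshtein(a, b) {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // distance from the first i chars of a to the empty string
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in the range 0-1: 1 - distance / maxlength.
function levenshteinSimilarity(a, b) {
  const maxLength = Math.max(a.length, b.length);
  if (maxLength === 0) return 1; // assumption: two empty strings are identical
  return 1 - levenshtein(a, b) / maxLength;
}
```

For example, "kitten" and "sitting" have a distance of 3 and a maximum length of 7, so the similarity is 1 - 3/7.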
Longest Common Subsequence (LCS) is an established algorithm. To make it comparable, we divide its length by maxlength.
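The LCS normalization might look like the following sketch. It assumes the subsequence variant (matching letters need not be adjacent) and that maxlength is the length of the longer word. The function names are illustrative.

```javascript
// Length of the longest common subsequence, keeping two rows of the table.
function lcsLength(a, b) {
  let prev = new Array(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    const curr = [0];
    for (let j = 1; j <= b.length; j++) {
      curr[j] = a[i - 1] === b[j - 1]
        ? prev[j - 1] + 1                 // letters match: extend the subsequence
        : Math.max(prev[j], curr[j - 1]); // otherwise keep the best so far
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in the range 0-1: LCS length / maxlength.
function lcsSimilarity(a, b) {
  const maxLength = Math.max(a.length, b.length);
  if (maxLength === 0) return 1; // assumption: two empty strings are identical
  return lcsLength(a, b) / maxLength;
}
```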
The test uses a list of 150 000 English words.
Two words are chosen at random. Each function is run. If the average similarity is above 0.7, the result is retained and displayed. The test runs until it has 50 results.
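The selection loop described above could be sketched as follows. The name `collectResults`, its parameters and the `maxTries` guard are assumptions made for this illustration. The sketch also does not exclude pairs where the same word is drawn twice, which the real test may handle differently.

```javascript
// Draws random word pairs, runs every similarity function on each pair and
// keeps the pair when the mean score is above the threshold.
// `random` is injectable so the loop can be tested deterministically.
function collectResults(words, functions, threshold, wanted,
                        random = Math.random, maxTries = 1e7) {
  const results = [];
  let tries = 0;
  while (results.length < wanted && tries < maxTries) {
    tries++;
    const a = words[Math.floor(random() * words.length)];
    const b = words[Math.floor(random() * words.length)];
    const scores = functions.map((fn) => fn(a, b));
    const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
    if (mean > threshold) {
      results.push({ a, b, scores, mean });
    }
  }
  return results;
}
```

A call matching the description would be `collectResults(words, functions, 0.7, 50)`, where `functions` holds the similarity functions under test.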
For each pair of functions, CosineSimilarity is calculated over the retained results. As you can see, it is quite high, even if there were differences for single word pairs.
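As an illustration of this comparison, the scores that two functions give to the retained word pairs can be treated as two vectors of equal length. The sketch below assumes that layout. The function name and the handling of an all-zero vector are choices made for the example.

```javascript
// Cosine similarity between two score vectors of equal length.
// x[i] and y[i] are the scores two functions gave to the same word pair.
function cosineSimilarity(x, y) {
  if (x.length !== y.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normX = 0;
  let normY = 0;
  for (let i = 0; i < x.length; i++) {
    dot += x[i] * y[i];
    normX += x[i] * x[i];
    normY += y[i] * y[i];
  }
  if (normX === 0 || normY === 0) return 0; // assumption: undefined case maps to 0
  return dot / Math.sqrt(normX * normY);
}
```

Because all retained scores are positive and most are above 0.7, the vectors point in similar directions, which is one reason the values tend to be high.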
For the performance test, a shortwords table (length = 5) and a longwords table (length = 10) are used.
Each function is run 100'000 times and the duration is given in milliseconds. All functions use the same sets.
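A timing loop of this kind might be sketched as below. The name `timeFunction` and the way the word pairs are passed in are assumptions. `performance.now()` is available in browsers and in current Node.js versions.

```javascript
// Runs fn on the given word pairs `runs` times in total and returns the
// elapsed time in milliseconds. The pairs are reused cyclically so that
// every function sees exactly the same inputs.
function timeFunction(fn, pairs, runs) {
  const start = performance.now();
  for (let i = 0; i < runs; i++) {
    const [a, b] = pairs[i % pairs.length];
    fn(a, b);
  }
  return performance.now() - start;
}
```

A call such as `timeFunction(someSimilarity, shortPairs, 100000)` would then give one duration per function and word table, where `someSimilarity` and `shortPairs` stand for the function and the prepared pairs.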
Dividing the duration of the longwords test by the duration of the shortwords test gives an indication of the average complexity. If the complexity is O(n), the indicator is 2.0 (double the length, double the comparison time). If the complexity is O(n^2), the indicator is 4.0. A value in between suggests O(n log n).
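The indicator can be computed as in this sketch. The function names are illustrative. The second helper, which is not part of the original test, assumes that the long words are twice as long as the short ones and converts the ratio into an estimated exponent k for O(n^k).

```javascript
// Ratio of the two durations: 2 suggests O(n), 4 suggests O(n^2)
// when the word length doubles.
function complexityIndicator(shortMs, longMs) {
  return longMs / shortMs;
}

// Estimated exponent k in O(n^k), assuming the length doubles:
// ratio = 2^k, so k = log2(ratio).
function estimatedExponent(shortMs, longMs) {
  return Math.log2(longMs / shortMs);
}
```

With durations of 100 ms and 400 ms, for instance, the indicator is 4 and the estimated exponent is 2.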
The test takes about 30 seconds, depending on browser and CPU.
The results vary from run to run, probably because the JavaScript environment may be busy with other work. But it looks like Jaro Winkler is near O(n^2), while Levenshtein, LCS and SimpleSim behave like O(n log n), and SimpleSim3 may even show O(n) behaviour: it has both a linear gain and a complexity gain compared to the other SimpleSim variants. We can retain SimpleSim3.
2025-03-13 matti@belle-nuit.com