Dotplot Version 2
Goal improve the plotting speed.
The principal problem with the original implementation is its lack of speed. It also lacks features but that is another topic.
To improve the speed it is necessary to improve the plotting part of the processas this is where most of the time is spent. The parser is very simple and reasonably quick but would become a problem if it were enhanced to do a better tokenising job.
At the moment the plotter operates in O(n^2) time where n is the average number of tokens in the files, because the plotter consists of two nested loops that pick up tokens from one file and compare them with the token at the same position in the other file. If they are the same a dot is plotted at the current coordinate. This is not the fastest way to do it, partly because it requires a n^2 string comparisons. We could speed up the plotting part by storing the token numbers in a record with the tokens. Each token would have a structure something like:
record begin tokenstring record begin file1tokennumbers() file2tokennumbers() end end
Now the plotter simply picks up each token record and plots a point for each combination of token numbers. For example if the record looks like:
record begin tokenstring="Add" record begin file1tokennumbers()=(1,22) file2tokennumbers()=(10,22,30) end end
Then dots will be plotted at (1, 10), (1, 22), (1, 30), (22, 10), (22, 22), (22, 30).
Plotting the dotplot.frm against itself with this version takes between one and two seconds, with the first version between three and four. That's on my machine of course, but it is the relationship that counts.
No comments:
Post a Comment