Tuesday, 9 March 2010

Dot Plot Version Two

Dotplot Version 2

Goal improve the plotting speed.

The principal problem with the original implementation is its lack of speed. It also lacks features but that is another topic.

To improve the speed it is necessary to improve the plotting part of the processas this is where most of the time is spent. The parser is very simple and reasonably quick but would become a problem if it were enhanced to do a better tokenising job.

At the moment the plotter operates in O(n^2) time where n is the average number of tokens in the files, because the plotter consists of two nested loops that pick up tokens from one file and compare them with the token at the same position in the other file. If they are the same a dot is plotted at the current coordinate. This is not the fastest way to do it, partly because it requires a n^2 string comparisons. We could speed up the plotting part by storing the token numbers in a record with the tokens. Each token would have a structure something like:

record begin
  tokenstring
  record begin
    file1tokennumbers()
    file2tokennumbers()
  end
end

Now the plotter simply picks up each token record and plots a point for each combination of token numbers. For example if the record looks like:

record begin
  tokenstring="Add"
  record begin
    file1tokennumbers()=(1,22)
    file2tokennumbers()=(10,22,30)
  end
end

Then dots will be plotted at (1, 10), (1, 22), (1, 30), (22, 10), (22, 22), (22, 30).

Plotting the dotplot.frm against itself with this version takes between one and two seconds, with the first version between three and four. That's on my machine of course, but it is the relationship that counts.

No comments:

Post a Comment

Blog Archive

Followers