- Mathematical Documents are a Right Royal Pain
- Implementation Musings
- Parse Open Office Formulas
- Executing the Formulas
- Document Variables
- A Demo
- Prototype
- Next Step
Mathematical Documents are a Right Royal Pain
If you are involved in any engineering endeavour based on physics, chemistry or any other hard science (and quite likely the soft ones too), you probably write, or at least read, documents produced in a word processor that has embedded formulas.
I'm pretty certain that most of you will at least have written such a document and found much later, or worse, had it pointed out by someone else, that the formulas were dimensionally inconsistent. I'll bet that a lot of such documents also had actual arithmetic in them. Of course all the arithmetic was done by hand or in some other program, so its accuracy is also suspect because of the manual transfer.
Wouldn't it be nice if the formulas could be automatically checked for dimensional consistency? Even better if the formulas were live, so that if the values of the parameters were known the value of the formula could be inserted automatically.
If you have enough money, and if those with whom you will share such documents are also willing, you could use MathCad. But unless there is only one author and everyone else just reads PDF versions, this is expensive. Also it is Windows/Macintosh only (or was last time I looked; in fact I'm not even sure there is a Macintosh version).
So what is the alternative? There is only one real alternative: do it yourself.
A couple of years ago I wrote a kind of poor man's MathCad. I called it JArithmetic. It was written in VB6 and used JScript instead of formulas. You simply interleaved JScript statements with plain text, and where you wanted to see the result of a formula you typed the name of the variable to which it was assigned together with a special sequence of characters (a macro in the old and meaningful sense of the term). You then clicked a button to update the document. This collected all the JScript, executed it with the Microsoft JScript ActiveX library and then searched for the macros indicating where the results should be placed. It could even make charts using Ploticus. The documents were RichText (using the MS RTF editor control), zipped with an open source zip utility to reduce the file size. The whole thing was presented as a case study in the VisualBasicClassic book on WikiBooks (of which I am the principal author).
So, the basic concept is proven (well, almost: it didn't do dimensional analysis, but the arithmetic side worked perfectly). Of course, writing JScript is not quite the same as writing camera-ready formulas in MicrosoftOffice or OpenOffice. However, it turns out to be quite easy to obtain the text of a formula from OpenOffice (it might be equally easy in MicrosoftOffice, but I'll leave that detail to someone else).
A good reason for going for OpenOffice (apart from it being multiplatform and free) is that Laurent Godard has written a very interesting suite of OpenOffice macros that insert tables, charts, graphics and formulas. It was while examining his code that I saw how easy it was to extract the formulas; Godard does it in order to modify the type size of the selected formula, all formulas or all those in the selected area of the document.
This means that much of the necessary knowledge is already out in the wild. All that remains is to pull it all together, write a parser for the string that represents the formula so that we can turn it into an expression in one of the languages available to OpenOffice, and decide on a few details of syntax for the actual documents. Did I say all? How hard can the parser be? For simple expressions it is very easy: we just have to replace words like times and over with the appropriate signs, and curly brackets with ordinary round ones. More complex ones will need more work, but I think that 90% of documents can work quite well with scalar reals and a few simple indexed expressions.
The document that prompted these musings is quite simple (real numbers, no indexing) but the author found it necessary to alter the values of some of the constants many times before the document was regarded as correct (these constants are things like thermal conductivity so they depend on the material used). It would have been much better (faster and more certain) to have a live document that could recalculate itself instead of having to laboriously correct each result by hand.
If everyone can afford to have the processor for such documents, it becomes possible to distribute not only the raw knowledge of the calculation method but also the ability to immediately answer the question 'What happens if I change this parameter?'.
For an alternative see MavScript.
Implementation Musings
How can we implement the LiveMaths idea?
The first thing to do is limit our ambitions for the first release. Let's make a proof of concept first.
We can start by listing the things that we will definitely not include:
- no dimensional analysis,
- no charts,
- no indexed expressions,
- no complex numbers,
- no exact rationals,
- no attempt to ensure numerical stability,
- no function definitions,
- no features that detract from the proof of concept effort,
- no error checking,
- results will not be presented as formulas,
- no automatic recalculation.
What we will include:
- simple scalar expressions,
- real numbers,
- definition of variables,
- updating of placeholders that tell where to show the results,
- use JavaScript both as the implementation language and as the language used to execute the formulas.
So the basic process is this:
- scan the document for formulas,
- convert each formula into one or more OpenOfficeBasic statements (we might need to declare the variables on the left hand side),
- execute the statements in sequence,
- create a document property for each variable that appeared on the left hand side of an equation, set it to the value.
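The execute-and-record part of this process can be sketched in a few lines. This is purely an illustration in Python, not OpenOffice Basic or JavaScript: it assumes the formulas have already been extracted and converted to plain assignments such as `b=a+2`, `run_statements` is a hypothetical name, the returned dictionary stands in for the document properties, and Python's `eval` plays the part of JavaScript's eval. Disabling builtins keeps the formulas to plain arithmetic but is not a security boundary.

```python
def run_statements(statements):
    """Execute simple 'name=expression' assignments in document order.

    Each right hand side may use any variable defined by an earlier
    statement. Returns a dict of every left hand side name and its value,
    standing in for the document properties.
    """
    env = {}
    for stmt in statements:
        name, rhs = (part.strip() for part in stmt.split("=", 1))
        # Builtins are hidden so only the variables defined so far are visible.
        env[name] = eval(rhs, {"__builtins__": {}}, env)
    return env


print(run_statements(["a=1", "b=a+2", "c=b*3", "d=c/4"]))
```

Because statements run strictly in sequence, a formula that uses a variable before it is defined fails with a `NameError` rather than silently picking up a stale value.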
The user must add fields to the document to receive the results. The field name is the variable name. This places some limitations on the names of variables; tough (wait for the next release, or the one after that, or ...).
Godard's code shows clearly how to collect the text of the formulas so we need to answer three questions:
- How do we parse the formula?
- How do we execute OpenOfficeBasic that we create on the fly?
- How do we create and set document variables in OpenOffice?
Parse Open Office Formulas
Here are some typical formulas created by typing the plain text and using Godard's macros to create the formulas:
| Plain text | OOo formula text | Possible OOo Basic | 
|---|---|---|
| a=1 | size 14{a={1}} | dim a: a=1 | 
| b=a+2 | size 14{b=a+{2}} | dim b: b=a+2 | 
| c=b*3 | size 14{c=b times {3}} | dim c: c=b*3 | 
| d=c/4 | size 13{d=c over {4}} | dim d: d=c/4 | 
| e=a/b | size 13{{nitalic e}=a over b} | dim e: e=a/b | 
| f=a+b/c | size 13{f=a+b over c} | dim f: f=a+b/c | 
| g=(a+b)/c | size 13{g=(a+b) over c} | dim g: g=(a+b)/c | 
| h=a+b/c*d | size 13{h=a+b over c times d} | dim h: h=a+(b/c)*d | 
| i=(a+b)/c*d | {nitalic i}=(a+b) over c times d | dim i: i=(a+b)/c*d | 
| j=(a+b)/(c*d) | {nitalic j}=(a+b) over (c times d) | dim j: j=(a+b)/(c*d) | 
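
Because each row of the table uses variables defined in the rows above it, the table also works as a small test case. Working it through by hand, reading over and times left to right as the Basic column does, gives:

$$
\begin{aligned}
a &= 1, \quad b = a + 2 = 3, \quad c = 3b = 9, \quad d = \frac{c}{4} = 2.25\\
e &= \frac{a}{b} = \frac{1}{3}\\
f &= a + \frac{b}{c} = \frac{4}{3}\\
g &= \frac{a+b}{c} = \frac{4}{9}\\
h &= a + \frac{b}{c}\,d = 1 + \frac{1}{3}\cdot 2.25 = 1.75\\
i &= \frac{a+b}{c}\,d = \frac{4}{9}\cdot 2.25 = 1\\
j &= \frac{a+b}{c\,d} = \frac{4}{20.25} = \frac{16}{81} \approx 0.198
\end{aligned}
$$

Rows h, i and j differ only in grouping, so they are a quick check that the converted code respects the same precedence as the displayed formula.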
Note that some formulas or parts of formulas are prefixed by size n; this happens if you alter the point size of the formula or part of it. You can see how the formula is built up by double-clicking it: OpenOffice opens a new pane where you can edit the formula as a text string, with an explicit size for each part. You can also do a lot of other formatting, such as subscripts and superscripts. We will ignore all that fancy stuff for now.
It looks as though we can treat words like size as function calls in a Logo-like prefix notation language, or perhaps treat size n as a function. That is, if we scan the formula from left to right and encounter the word size, we know that we must then see a number followed by a left curly bracket. We then scan for the matching right curly bracket and can discard the word size and the number. For now we will ignore size tags inside the formula. Because the brackets only group the sized part, we can also discard that one opening and one closing curly bracket. What remains is almost an assignment statement; to turn it into one we replace the word times with an asterisk and the word over with a slash.
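A minimal sketch of this scan in Python, assuming the formula text looks like the examples in the table. The names `tokenize` and `ooo_to_expression` are hypothetical. Going slightly beyond the description above, the sketch strips every size tag (not just the leading one), drops nitalic, and turns the remaining curly brackets into round ones, omitting them when they enclose a single token so that `{nitalic e}` becomes plain `e`.

```python
import re

# One token: a number, a word, or a single bracket/operator character.
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[A-Za-z_]\w*|[{}()=+\-*/])")

# OpenOffice Math words that map directly onto arithmetic operators.
OPERATORS = {"times": "*", "over": "/"}


def tokenize(text):
    """Split OpenOffice formula text into tokens, ignoring whitespace."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def ooo_to_expression(formula):
    """Convert formula text such as 'size 13{d=c over {4}}' to 'd=c/4'."""
    tokens = tokenize(formula)
    out = []
    # One entry per open curly bracket: None if it belongs to a size tag
    # (dropped), otherwise the index in `out` of the round bracket emitted.
    groups = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "size":
            i += 2  # skip the keyword and its point size
            if i < len(tokens) and tokens[i] == "{":
                groups.append(None)
                i += 1
            continue
        if tok == "nitalic":
            pass  # formatting only, no arithmetic meaning
        elif tok == "{":
            groups.append(len(out))
            out.append("(")
        elif tok == "}":
            start = groups.pop()
            if start is None:
                pass  # closing bracket of a size tag
            elif len(out) - start == 2:
                del out[start]  # single token inside: brackets are redundant
            else:
                out.append(")")
        else:
            out.append(OPERATORS.get(tok, tok))
        i += 1
    return "".join(out)


print(ooo_to_expression("size 13{h=a+b over c times d}"))  # h=a+b/c*d
```

The output `h=a+b/c*d` is read by Basic, JavaScript and Python alike as a+(b/c)*d, matching the table.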
So parsing the formula is easy. Creating a string containing the OpenOfficeBasic code is straightforward. Can we execute it?
Executing the Formulas
A search of the OpenOfficeBasic help file reveals no way to execute generated code. But there must be one. If there is no direct way, it must be possible to create a module, populate it with the code as a function and then call the function. This will tell us how well designed this system is. Pity it's not written in Lisp.
I'm not online now, so I can't do a web search.
In Laurent Godard's DMath there is a function called evaluate in the module DMaths2.Parser that looks as though it might suffice for a proof of concept.
This approach is flawed: although it offers the ultimate in control, it is bound to be slow. It is better to bite the JavaScript bullet and use the eval method.
Aha, it turns out that what I knew had to be possible is also easy: OpenOfficeBasic can create new modules and call the code in them. This is what you do to create a module with a single public sub in it (see http://api.openoffice.org/servlets/ReadMsg?listName=dev&msgNo=1963):
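  Sub RunGeneratedCode
    Dim MyLib As Object
    Dim Source As String
    if not BasicLibraries.hasByName( "MyLibrary" ) then
      ' First run: create the library and a module whose source is built as a string.
      MyLib = BasicLibraries.createLibrary( "MyLibrary" )
      Source = "sub test" + chr(13) + "    print 42" + chr(13) + "end sub"
      MyLib.insertByName("Module1", Source)
    else
      ' Later runs: assume the module and sub already exist; load the library so test can be found.
      BasicLibraries.loadLibrary( "MyLibrary" )
    end if
    test ' calling this shows a message box containing *the* answer.
  End Sub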
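Only as an analogy (this is Python, not OpenOffice Basic), here is the same idea of building a module's source as a string at run time and then calling into it. `build_module`, the module name `generated_formulas` and the function name `compute` are hypothetical; `types.ModuleType`, `compile` and `exec` are standard Python.

```python
import types


def build_module(name, assignments):
    """Create a module at run time whose single function `compute`
    executes the generated assignments in order and returns their values."""
    lines = ["def compute():"]
    lines += ["    " + stmt for stmt in assignments]
    lines.append("    return dict(locals())")
    module = types.ModuleType(name)
    # Compile the generated source and run it inside the new module's namespace.
    exec(compile("\n".join(lines), name, "exec"), module.__dict__)
    return module


mod = build_module("generated_formulas", ["a = 1", "b = a + 2", "c = b * 3"])
values = mod.compute()
```

As in the Basic version, the cost of compiling is paid once per recalculation, after which all the formulas run as ordinary code.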
Document Variables
OpenOffice appears not to have a general purpose system for user defined variables (MS Office is much better in this respect).
Instead, OpenOffice allows the user to create fields that define variables. Such fields can be hidden so that they do not show on the printout, but they do show on screen as a thin grey bar about the size of a capital I.
The user could create fields that show the values of defined variables, but he or she would have to create the field that defines the variable first, because OpenOffice only lets you create a field showing the value of a variable that already exists (you pick from a list). Not very convenient, but it might do for this investigation. We can use the macro recorder to find out how to insert a field; the generated code is quite straightforward and easy to hack.
Another option is to have the user create equations where the left hand side is the name of the variable that the user wants displayed and the right hand side is some kind of placeholder: a question mark perhaps, or nothing at all.
The question then is how do we distinguish this equation from the others after we have filled it in with the calculated value? One way is to use distinct operators for equations that define a value and those that are strictly equations:
- := for definition,
- = for statement of equality.
The second kind is used to indicate that the right hand side is to be replaced with the value of the variable named on the left hand side.
So now instead of searching for placeholder text or setting field variables we can modify equations that indicate equality.
This won't work for more complicated results like charts but it should work for matrices and vectors as OpenOfficeEquations can represent those things. But as mentioned, those are for a later version.
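Sorting the equations by operator could look like this minimal Python sketch. It assumes the formula text has already been converted and that a definition reaches us with a literal `:=`; how OpenOffice Math actually stores that operator is not checked here, and `classify` is a hypothetical name.

```python
def classify(equations):
    """Split equation texts into definitions and output requests.

    'name := expression' is a definition; 'name = placeholder' asks for the
    value of name to be written into the right hand side.
    Returns (definitions, outputs): a list of (name, expression) pairs and a
    list of names, both in document order.
    """
    definitions, outputs = [], []
    for eq in equations:
        if ":=" in eq:  # test for := first, because it also contains =
            name, expression = eq.split(":=", 1)
            definitions.append((name.strip(), expression.strip()))
        elif "=" in eq:
            name, _placeholder = eq.split("=", 1)
            outputs.append(name.strip())
        # anything else is not an equation we understand and is ignored
    return definitions, outputs


defs, outs = classify(["a := 1", "b := a + 2", "b = ?"])
```

Checking for `:=` before `=` matters: a plain substring test for `=` would also match every definition.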
A Demo
Using the example code that enumerates the equations, we can find all the equations we need and sort them into two groups: definitions and outputs.
We then have to write a simple routine to strip size tags and convert times to an asterisk, over to a slash, and curly brackets to round ones.
Now we have formulas in very nearly the form needed by JavaScript's eval method, so we can execute each expression and assign the answer to a variable stored in a dictionary.
Now we can enumerate the output equations and replace the right hand side of each with the appropriate variable value.
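The last step, rewriting the right hand side of each output equation, might look like this in Python. `fill_outputs`, the `name = value` layout, the `{:g}` number format and the `?` left for names that were never defined are all assumptions of this sketch; the real macro would write the resulting text back into each formula.

```python
def fill_outputs(outputs, values, fmt="{:g}"):
    """Return the new text of each output equation, in order.

    outputs: names taken from the left hand sides of the output equations.
    values:  dict of computed variable values.
    """
    filled = []
    for name in outputs:
        if name in values:
            filled.append(f"{name} = {fmt.format(values[name])}")
        else:
            filled.append(f"{name} = ?")  # unknown name: keep a visible placeholder
    return filled


print(fill_outputs(["d", "z"], {"d": 2.25}))
```

Leaving a visible `?` for an unknown name makes a misspelt variable obvious in the document instead of failing silently.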
Here is a draft of an OpenOfficeBasic version without eval: LiveMathsOoBasicSource. It works and parses as desired, but there is no really practical way of executing the expressions. So the next step is to add the ability to create modules.
Prototype
So now we have enough information to create a functioning prototype that satisfies my original, rather modest, absolute minimum: the ability to recalculate the values. Dimensional analysis can come later (it needs an object oriented language such as JavaScript or BeanShell, or a ready made system like MavScript).
So we build on the demo that extracted and parsed the formulas, this time creating a module and filling it with the code needed to execute the formulas.
Our formula extraction code can put all the left hand sides of the formulas in an array so that we can refer to them by position; then we can create a module that builds a matching array of values. Finally, we can enumerate the formulas that display the results, search the array of left hand sides for each name, and replace the right hand side with the matching value.
This will work for simple documents, but it has the disadvantage that it ignores the order of evaluation when displaying the results. We should replace all references to variables with array references. The problem is that OpenOfficeBasic has no native hash tables, so lookups are expensive; for now we assume a small document and simple formulas, so this is not a problem.
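With parallel arrays, each lookup is a linear search, so filling k output equations from n definitions costs about k times n comparisons, which is harmless for a small document. A Python sketch of that lookup follows; the function name `lookup` is hypothetical, and the dict at the end only shows what a hash table would buy.

```python
def lookup(names, values, wanted):
    """Return the value paired with `wanted` in two parallel arrays, or None.

    This is a linear search, the best a language without hash tables can do
    directly. If a name was defined more than once, the first definition
    wins, which is one way the order of evaluation can be lost.
    """
    for i, name in enumerate(names):
        if name == wanted:
            return values[i]
    return None


names = ["a", "b", "c"]
values = [1, 3, 9]
print(lookup(names, values, "b"))  # 3

# A hash table (here a Python dict) makes each lookup constant time on average.
table = dict(zip(names, values))
print(table["b"])  # 3
```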
The prototype works and is self contained: LiveMaths20071202.odt. By works I mean that simple formulas are correctly calculated and the results correctly displayed.
What the prototype does not do
Practically everything!
- Cannot define a range,
- Subscripted variables do not work (without ranges there is no point anyway),
- Not sure if the formulas are executed in text order,
- Formatting of output equations is not preserved,
- No way to control output formats,
- No dimensional analysis.
Next Step
Decide which of the many desirable features is to be provided next; see: LiveMathsTwo.
 