Purpose
We must translate OpenOfficeMath formulas into OpenOfficeBasic. To do this we need to know what constitutes legal OpenOfficeMath syntax. We need only deal with the subset that we consider useful.
EBNF
What follows is an attempt at an EBNF syntax for OpenOfficeMath as interpreted by LiveMaths. Only expressions that can be translated into executable code will be considered. We will, at least for now, ignore calculus.
Alternates are separated by a pipe character, optional items are delimited by square brackets.
This will undoubtedly need substantial revision when I reach the stage of coding the functions that emit the target code.
Tokens
Tokens are delimited by characters having codes in the following inclusive ranges:
- U+0000..U+002F
- U+003A..U+0040
- U+007B..U+00BF
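As a rough illustration of the ranges listed above, here is a minimal Python sketch of a delimiter test and a naive splitter. The function names are made up for this example, and dropping whitespace delimiters is an assumption. It does not merge multi-character tokens; that job is left to the lexical analyser.

```python
# Inclusive code-point ranges listed above as token delimiters.
DELIMITER_RANGES = [(0x0000, 0x002F), (0x003A, 0x0040), (0x007B, 0x00BF)]


def is_delimiter(ch):
    """Return True if ch falls in one of the delimiter ranges."""
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in DELIMITER_RANGES)


def split_tokens(text):
    """Naive split: every delimiter is its own token, other runs form one token.

    Assumption: whitespace delimiters are dropped rather than emitted.
    """
    tokens, current = [], ""
    for ch in text:
        if is_delimiter(ch):
            if current:
                tokens.append(current)
                current = ""
            if not ch.isspace():
                tokens.append(ch)
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens
```

Note that digits, letters and the underscore fall outside the three ranges, so a name such as x1 stays in one piece.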
Some of these characters are the initial character of multi-character tokens (e.g. :=), others are complete tokens in themselves (e.g. ). The lexical analyser is responsible for determining the token class.
Formula
A formula is a left hand side followed by either a definition operator or a display operator followed by an expression.
formula := LHS defop RHS
Defop
Defop is either the "is defined as" operator or the display operator. The first defines the LHS as equal to the RHS and the second displays the value that the LHS has as a result of the definition.
defop := ':=' | '='
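A small Python sketch of how the formula rule might be applied to a token list. It assumes the lexical analyser has already merged ':=' into one token and that the first defop token encountered is the one that separates the two sides; split_formula is a name invented for this example.

```python
DEFOPS = (":=", "=")


def split_formula(tokens):
    """Split a token list into (lhs, defop, rhs) at the first defop token."""
    for i, tok in enumerate(tokens):
        if tok in DEFOPS:
            return tokens[:i], tok, tokens[i + 1:]
    raise ValueError("no definition or display operator found")
```

For example, the tokens of f(x) := x + 1 split into the function definition on the left, ':=' in the middle and the expression on the right.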
LHS
The left hand side can be a simple variable name, a subscripted variable name or a function definition.
LHS := var | fndef
RHS
The right hand side is a formatted expression:
RHS := fexpr
This is complicated by the syntax for formatting which, even though it does not cause any code to be emitted, must still be recognized and discarded.
var
var := varname | subvarname |
varname
A variable name is a sequence of characters that complies with the definition of a variable name in OpenOfficeBasic. We will not do rigorous checks and will instead accept any sequence of characters that has no other specific meaning. Errors will be detected either at run time or when the OpenOfficeBasic module is created. In both cases OpenOfficeBasic does the work; we will simply report the error (with enough context for the user to correct it if we can).
varname := any token where a variable name is expected.
subvarname
A subscripted variable represents several different concepts depending on the context. It can mean:
- a simple array item reference or,
- a bulk assignment or,
- a reference to an implicit array item where the index values are auto-generated because they are simply symbols
Which is meant depends on the type of the subscript. If the subscript is an integer then an array dereference is implied but if the subscript is a range then a bulk assignment is meant. If the subscript is a symbol then we have to assign a value to the symbol somehow so that we can use it as an array index.
We can distinguish the cases by maintaining a symbol table. If the subscript is present in the symbol table then we have an array item reference that will use the value of the variable at run time. If the subscript has the form of a variable name but does not appear in the symbol table then it is a symbol, not a variable. We can then concatenate it with the variable name to make a normal variable name.
subvarname := varname ('rsub' | '_' ) fexpr |
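The symbol table test described above might look something like the following Python sketch. It assumes the subscript is a single token, leaves out ranges (their syntax is not defined here), and uses an underscore when concatenating, which is purely an assumption. The function name and the tags it returns are invented for illustration.

```python
def resolve_subscript(varname, subscript, symbol_table):
    """Classify a subscripted variable from the form of its subscript.

    Returns a tuple whose first element names the case.
    """
    if subscript.isdigit():
        # Integer subscript: plain array item reference.
        return ("item", varname, int(subscript))
    if subscript in symbol_table:
        # Known variable: array item reference using its run-time value.
        return ("item_at_runtime", varname, subscript)
    # Unknown name: treat it as a symbol and fold it into the variable name.
    # The "_" separator is an assumption; the text only says "concatenate".
    return ("plain_variable", varname + "_" + subscript)
```

With a symbol table containing only i, the subscripted names v_3, v_i and v_max would come out as an item reference, a run-time item reference and the plain variable v_max respectively.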
fndef
A function name is an identifier. It follows the same rules as OpenOfficeBasic.
fndef := fnname '(' [argslistdef] ')'
fnname
A function name is a sequence of characters that complies with the definition of a function name in OpenOfficeBasic. We will not do rigorous checks and will instead accept any sequence of characters that has no other specific meaning. Errors will be detected either at run time or when the OpenOfficeBasic module is created. In both cases OpenOfficeBasic will do the work; we will simply report the error (with enough context for the user to correct it if we can).
fnname := any token where a function name is expected.
argslistdef
A function definition lists the names of the arguments to the function. Datatypes are neither expected nor allowed. The definition of the argument list on the left hand side is distinct from the argument list that appears in a call to the function because the latter can have arbitrary expressions where the former only allows varnames.
argslistdef := varname [',' argslistdef ]
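The fndef and argslistdef rules map fairly directly onto a recursive descent parser. The Python sketch below is only one possible reading of them: it works on a list of tokens that has already been split, does no checking of the names themselves, and the function names are invented for this example.

```python
def parse_argslistdef(tokens, pos):
    """argslistdef := varname [',' argslistdef]; returns (names, next_pos)."""
    if pos >= len(tokens):
        raise ValueError("expected an argument name")
    names = [tokens[pos]]
    pos += 1
    if pos < len(tokens) and tokens[pos] == ",":
        rest, pos = parse_argslistdef(tokens, pos + 1)
        names += rest
    return names, pos


def parse_fndef(tokens):
    """fndef := fnname '(' [argslistdef] ')'; returns (fnname, argument names)."""
    if len(tokens) < 3 or tokens[1] != "(":
        raise ValueError("expected fnname '('")
    name, pos, args = tokens[0], 2, []
    if tokens[pos] != ")":
        args, pos = parse_argslistdef(tokens, pos)
    if pos >= len(tokens) or tokens[pos] != ")":
        raise ValueError("expected ')'")
    return name, args
```

The recursion in parse_argslistdef mirrors the right-recursive rule; a loop would do the same job.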
expr
An expression describes a procedure that returns a value. OpenOfficeMath expressions include formatting codes as well as algebraic signs, variable and function names, and constants.
fexpr := [format] expr
expr := term [exprtail] | '{' fexpr '}' | '(' fexpr ')' | |
term := [unaryop] ( var | fncall | const | sumexpr) |
fncall := fnname '(' [exprlist] ')'
exprlist := fexpr [ ',' exprlist ]
exprtail := binop fexpr
format := simplefmt | fmt |
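To get a feel for how these rules behave, here is a hedged Python sketch of a recursive descent parser that follows them as written. It is not the LiveMaths implementation. It ignores format, unaryop, sumexpr and subscripts, the set of binary operators is an assumption because binop is not enumerated here, and the tuple shapes it returns are invented for the example.

```python
BINOPS = {"+", "-", "*", "/"}  # assumption: the text does not enumerate binop
CLOSERS = {"{": "}", "(": ")"}
PUNCTUATION = {"(", ")", "{", "}", ","}


def expect(tokens, pos, wanted):
    """Raise unless the token at pos is the wanted one."""
    if pos >= len(tokens) or tokens[pos] != wanted:
        raise ValueError("expected " + repr(wanted))


def parse_fexpr(tokens, pos=0):
    """fexpr := [format] expr, with format ignored in this sketch."""
    return parse_expr(tokens, pos)


def parse_expr(tokens, pos):
    """expr := term [exprtail] | '{' fexpr '}' | '(' fexpr ')'"""
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[pos]
    if tok in CLOSERS:
        node, pos = parse_fexpr(tokens, pos + 1)
        expect(tokens, pos, CLOSERS[tok])
        return ("group", node), pos + 1
    node, pos = parse_term(tokens, pos)
    if pos < len(tokens) and tokens[pos] in BINOPS:
        # exprtail := binop fexpr (right-recursive, so no precedence is applied)
        right, end = parse_fexpr(tokens, pos + 1)
        return ("binop", tokens[pos], node, right), end
    return node, pos


def parse_term(tokens, pos):
    """term := var | fncall | const (unaryop and sumexpr omitted)."""
    name = tokens[pos]
    if name in BINOPS or name in PUNCTUATION:
        raise ValueError("unexpected token " + repr(name))
    if pos + 1 < len(tokens) and tokens[pos + 1] == "(":
        # fncall := fnname '(' [exprlist] ')'
        args, pos = [], pos + 2
        if pos < len(tokens) and tokens[pos] != ")":
            args, pos = parse_exprlist(tokens, pos)
        expect(tokens, pos, ")")
        return ("call", name, args), pos + 1
    return ("atom", name), pos + 1


def parse_exprlist(tokens, pos):
    """exprlist := fexpr [',' exprlist]"""
    first, pos = parse_fexpr(tokens, pos)
    items = [first]
    if pos < len(tokens) and tokens[pos] == ",":
        rest, pos = parse_exprlist(tokens, pos + 1)
        items += rest
    return items, pos
```

Two things show up when the rules are followed literally: a bracketed group cannot be followed by an exprtail, and every operator binds to everything on its right. Both are likely candidates for the revision mentioned earlier.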
sumexpr
Sum and product operators are described by the sum or product keyword followed by from and to expressions, followed by an expression to be summed or multiplied.
sumexpr := (sum | prod) 'from' fexpr 'to' fexpr fexpr |
To do
Try to find a program that can analyse the syntax definition. Perhaps yacc/bison would do. A better choice might be javacc or js/cc because the first is written in and creates Java and the second is written in and creates JavaScript.
Code Generation
The code to be generated is OpenOfficeBasic. All binary operators are to be transformed into calls to predefined functions that examine the datatypes of the operands at run time to determine which code paths to execute (scalar versus matrix arithmetic for instance).
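As a sketch of what that transformation could look like, the Python below turns a tiny expression tree into OpenOfficeBasic source text made of nested function calls. The helper names (LmAdd and friends) are hypothetical, since the predefined functions have not been named yet, and the tree shape is invented for the example.

```python
# Hypothetical names for the predefined run-time helpers; the text does not name them.
BINOP_HELPERS = {"+": "LmAdd", "-": "LmSub", "*": "LmMul", "/": "LmDiv"}


def emit(node):
    """Emit OpenOfficeBasic source text for a tiny expression tree.

    A node is either a string (variable name or constant) or a
    tuple (operator, left, right).
    """
    if isinstance(node, str):
        return node
    op, left, right = node
    return "{}({}, {})".format(BINOP_HELPERS[op], emit(left), emit(right))
```

For instance, emit(("+", "a", ("*", "b", "c"))) gives LmAdd(a, LmMul(b, c)), leaving the scalar versus matrix decision to the helpers at run time.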