Sila Dev Log: Defining Macro for Parser Rules
Posted on 3rd of October 2023So, like I mentioned in a previous post, my
hands have been quite full with Baldur’s Gate 3, so I haven’t been able to
program too much Sila. But thankfully, while I enjoyed the game through and
through, it’s nice to be back to hacking.
I started to add more parser rules for Sila, simple ones still, but crucial
nonetheless. These included stuff like parsing equality (==, !=), relational
(>, <, <=, >=) and unary nodes (-1, +2). While writing these rules, I quickly
realized that I’m repeating myself quite a bit. So as a Lisp hacker, naturally
I decided to reach for macros in this case to make my own life just a little
bit easier.
If we look at the structure on how I decided to parse equality and relational
nodes, they looked something like this:
(defun parse-equality-node (tok)
"equality-node ::== relational-node ( '==' relational-node
| '!=' relational-node ) *"
(multiple-value-bind (node rest)
(parse-relational-node tok)
(loop
(cond ((string= (token-val rest) "==")
(multiple-value-bind (node2 rest2)
(parse-relational-node (token-next rest))
(setf node (make-ast-node :kind :equal :lhs node :rhs node2))
(setf rest rest2)))
((string= (token-val rest) "!=")
(multiple-value-bind (node2 rest2)
(parse-relational-node (token-next rest))
(setf node (make-ast-node :kind :not-equal :lhs node :rhs node2))
(setf rest rest2)))
(t
(return-from parse-equality-node
(values node rest)))))))
(defun parse-relational-node (tok)
"relational-node ::== add ( '<' add
| '<=' add
| '>' add
| '>=' add ) *"
(multiple-value-bind (node rest)
(parse-add-node tok)
(loop
(cond ((string= (token-val rest) "<")
(multiple-value-bind (node2 rest2)
(parse-add-node (token-next rest))
(setf node (make-ast-node :kind :lesser-than :lhs node :rhs node2))
(setf rest rest2)))
((string= (token-val rest) "<=")
(multiple-value-bind (node2 rest2)
(parse-add-node (token-next rest))
(setf node (make-ast-node :kind :lesser-or-equal :lhs node :rhs node2))
(setf rest rest2)))
((string= (token-val rest) ">")
(multiple-value-bind (node2 rest2)
(parse-add-node (token-next rest))
(setf node (make-ast-node :kind :greater-than :lhs node :rhs node2))
(setf rest rest2)))
((string= (token-val rest) ">=")
(multiple-value-bind (node2 rest2)
(parse-add-node (token-next rest))
(setf node (make-ast-node :kind :greater-or-equal :lhs node :rhs node2))
(setf rest rest2)))
(t
(return-from parse-relational-node
(values node rest)))))))So the structure between these are pretty much identical. First, I bind the
values that I get from the next parser rule, e.g. parse-relational-node or
parse-add-node, and I run infinite loop and check the next tokens and create
nodes based on that.
Macro Definition
Function definition can be broken down to a following macro:
(defmacro define-parser (name &key descent-parser
comparison-symbols
bnf)
"Macro for generating new parser rules."
(let ((parser-name (intern (format nil "PARSE-~a-NODE" name)))
(descent-parser-name (intern (format nil "PARSE-~a-NODE" descent-parser))))
`(defun ,parser-name (tok)
,bnf
(multiple-value-bind (node rest)
(,descent-parser-name tok)
(loop
(cond
,@(loop :for symbol in comparison-symbols
:collect `((string= (token-val rest) ,(car symbol))
(multiple-value-bind (node2 rest2)
(,descent-parser-name (token-next rest))
(setf node (make-ast-node :kind ,(cdr symbol)
:lhs node
:rhs node2))
(setf rest rest2))))
(t
(return-from ,parser-name
(values node rest)))))))))So what is happening here:
First I define new symbols to the package that I will use inside the macro.
This is done with the intern
function.
When defining macros in Lisp, you often see code that is inside a backquote
(`), this signals that every expression inside that is not preceded by a
comma is to be quoted. So above you can see some places where there is comma
in front of some expressions, those will be evaluated when the macro is run.
- For example, if
parser-name equals to parse-example-node then `(defun ,parser-name ()) would evaluate to (defun parse-example-node ()).
Last crucial piece in the macro is the way I build the conditional for the
parsing itself.
Essentially how I do this is that I build a list of backquoted
expressions like:
((string= (token-val rest) "<")
(multiple-value-bind (node2 rest2)
(parse-add-node (token-next rest))
(setf node (make-ast-node :kind :lesser-than :lhs node :rhs node2))
(setf rest rest2)))Based on all the comparison symbols are given in to macro.
The collected list is inside ,@ which basically means that evaluate the
following expression (,) and splat the containing list (@). So if
some-list equals to (1 2 3), then `(fn ,@some-list) would equal to
(fn 1 2 3).
Now when the macro is defined, I can just define the parser rules in a
following manner:
(define-parser equality
:descent-parser relational
:comparison-symbols (("==" . :equal)
("!=" . :not-equal))
:bnf "equality-node ::== relational-node ( '==' relational-node | '!=' relational-node ) *")
(define-parser relational
:descent-parser add
:comparison-symbols (("<" . :lesser-than)
("<=" . :lesser-or-equal)
(">" . :greater-than)
(">=" . :greater-or-equal))
:bnf "relational-node ::== add ( '<' add | '<=' add | '>' add | '>=' add ) *")
(define-parser add
:descent-parser multiplicative
:comparison-symbols (("+" . :add)
("-" . :sub))
:bnf "add-node ::== multiplicative-node ( '+' multiplicative-node | '-' multiplicative-node ) *")
(define-parser multiplicative
:descent-parser unary
:comparison-symbols (("*" . :mul)
("/" . :div))
:bnf "multiplicative-node ::== unary-node ( '*' unary-node | '/' unary-node ) *")To see what those macros expand to you can just run macroexpand on them, for
example:
(define-parser relational
:descent-parser add
:comparison-symbols (("<" . :lesser-than)
("<=" . :lesser-or-equal)
(">" . :greater-than)
(">=" . :greater-or-equal))
:bnf "relational-node ::== add ( '<' add | '<=' add | '>' add | '>=' add ) *")Expands to:
(defun parse-relational-node (tok)
"relational-node ::== add ( '<' add | '<=' add | '>' add | '>=' add ) *"
(multiple-value-bind (node rest)
(parse-add-node tok)
(loop
(cond
((string= (token-val rest) "<")
(multiple-value-bind (node2 rest2)
(parse-add-node (token-next rest))
(setf node (make-ast-node :kind :lesser-than :lhs node :rhs node2))
(setf rest rest2)))
((string= (token-val rest) "<=")
(multiple-value-bind (node2 rest2)
(parse-add-node (token-next rest))
(setf node
(make-ast-node :kind :lesser-or-equal :lhs node :rhs node2))
(setf rest rest2)))
((string= (token-val rest) ">")
(multiple-value-bind (node2 rest2)
(parse-add-node (token-next rest))
(setf node (make-ast-node :kind :greater-than :lhs node :rhs node2))
(setf rest rest2)))
((string= (token-val rest) ">=")
(multiple-value-bind (node2 rest2)
(parse-add-node (token-next rest))
(setf node
(make-ast-node :kind :greater-or-equal :lhs node :rhs node2))
(setf rest rest2)))
(t (return-from parse-relational-node (values node rest)))))))Cool, seems to be identical to the earlier definition that I had. So now when
I need to add new parser rules, I can just utilize this macro to do them,
saving me of writing unnecessary boilerplate. I probably am not able to use
this macro for all the definitions. For example currently the topmost parser
rule is defined in a following manner:
(defun parse-expression-node (tok)
"expression-node ::== equality"
(parse-equality-node tok))
So it doesn’t really make sense to use that macro for defining something like
that. Similarly, unary and primary nodes are defined in a slightly different
manner currently:
(defun parse-unary-node (tok)
"unary-node ::== ( '+' | '-' ) unary | primary-node"
(cond ((string= (token-val tok) "+")
(parse-unary-node (token-next tok)))
((string= (token-val tok) "-")
(multiple-value-bind (node rest)
(parse-unary-node (token-next tok))
(values (make-ast-node :kind :neg :lhs node)
rest)))
(t
(parse-primary-node tok))))
(defun parse-primary-node (tok)
"primary-node ::== '(' expression-node ')' | number"
(cond ((eq (token-kind tok) :num)
(values (make-ast-node :kind :number :val (token-val tok))
(token-next tok)))
((string= (token-val tok) "(")
(multiple-value-bind (node rest)
(parse-expression-node (token-next tok))
(values node (token-next (skip-to-token ")" rest)))))
(t (error 'parser-error))))Which I could make it so that the macro above would define these kind of
parser rules if e.g. some special key is given in, but for now, I’m completely
fine by defining these by hand.