Sila Dev Log: Defining Macro for Parser Rules
Posted on 3rd of October 2023 | 1178 wordsPlug: Follow the Sila development here. |
So, like I mentioned in a previous post , my hands have been quite full with Baldur’s Gate 3, so I haven’t been able to program too much Sila. But thankfully, while I enjoyed the game through and through, it’s nice to be back to hacking.
I started to add more parser rules for Sila, simple ones still, but crucial nonetheless. These included stuff like parsing equality (==, !=), relational (>, <, <=, >=) and unary nodes (-1, +2). While writing these rules, I quickly realized that I’m repeating myself quite a bit. So as a Lisp hacker, naturally I decided to reach for macros in this case to make my own life just a little bit easier.
If we look at the structure on how I decided to parse equality and relational nodes, they looked something like this:
(defun parse-equality-node (tok) "equality-node ::== relational-node ( '==' relational-node | '!=' relational-node ) *" (multiple-value-bind (node rest) (parse-relational-node tok) (loop (cond ((string= (token-val rest) "==") (multiple-value-bind (node2 rest2) (parse-relational-node (token-next rest)) (setf node (make-ast-node :kind :equal :lhs node :rhs node2)) (setf rest rest2))) ((string= (token-val rest) "!=") (multiple-value-bind (node2 rest2) (parse-relational-node (token-next rest)) (setf node (make-ast-node :kind :not-equal :lhs node :rhs node2)) (setf rest rest2))) (t (return-from parse-equality-node (values node rest))))))) (defun parse-relational-node (tok) "relational-node ::== add ( '<' add | '<=' add | '>' add | '>=' add ) *" (multiple-value-bind (node rest) (parse-add-node tok) (loop (cond ((string= (token-val rest) "<") (multiple-value-bind (node2 rest2) (parse-add-node (token-next rest)) (setf node (make-ast-node :kind :lesser-than :lhs node :rhs node2)) (setf rest rest2))) ((string= (token-val rest) "<=") (multiple-value-bind (node2 rest2) (parse-add-node (token-next rest)) (setf node (make-ast-node :kind :lesser-or-equal :lhs node :rhs node2)) (setf rest rest2))) ((string= (token-val rest) ">") (multiple-value-bind (node2 rest2) (parse-add-node (token-next rest)) (setf node (make-ast-node :kind :greater-than :lhs node :rhs node2)) (setf rest rest2))) ((string= (token-val rest) ">=") (multiple-value-bind (node2 rest2) (parse-add-node (token-next rest)) (setf node (make-ast-node :kind :greater-or-equal :lhs node :rhs node2)) (setf rest rest2))) (t (return-from parse-relational-node (values node rest)))))))
So the structure between these are pretty much identical. First, I bind the
values that I get from the next parser rule, e.g. parse-relational-node
or
parse-add-node
, and I run infinite loop and check the next tokens and create
nodes based on that.
Macro Definition
Function definition can be broken down to a following macro:
(defmacro define-parser (name &key descent-parser comparison-symbols bnf) "Macro for generating new parser rules." (let ((parser-name (intern (format nil "PARSE-~a-NODE" name))) (descent-parser-name (intern (format nil "PARSE-~a-NODE" descent-parser)))) `(defun ,parser-name (tok) ,bnf (multiple-value-bind (node rest) (,descent-parser-name tok) (loop (cond ,@(loop :for symbol in comparison-symbols :collect `((string= (token-val rest) ,(car symbol)) (multiple-value-bind (node2 rest2) (,descent-parser-name (token-next rest)) (setf node (make-ast-node :kind ,(cdr symbol) :lhs node :rhs node2)) (setf rest rest2)))) (t (return-from ,parser-name (values node rest)))))))))
So what is happening here:
First I define new symbols to the package that I will use inside the macro. This is done with the intern function .
When defining macros in Lisp, you often see code that is inside a backquote (`), this signals that every expression inside that is not preceded by a comma is to be quoted. So above you can see some places where there is comma in front of some expressions, those will be evaluated when the macro is run.
- For example, if
parser-name
equals toparse-example-node
then`(defun ,parser-name ())
would evaluate to(defun parse-example-node ())
.
- For example, if
Last crucial piece in the macro is the way I build the conditional for the parsing itself.
Essentially how I do this is that I build a list of backquoted expressions like:
((string= (token-val rest) "<") (multiple-value-bind (node2 rest2) (parse-add-node (token-next rest)) (setf node (make-ast-node :kind :lesser-than :lhs node :rhs node2)) (setf rest rest2)))
Based on all the comparison symbols are given in to macro.
The collected list is inside
,@
which basically means that evaluate the following expression (,) and splat the containing list (@). So ifsome-list
equals to(1 2 3)
, then`(fn ,@some-list)
would equal to(fn 1 2 3)
.
Now when the macro is defined, I can just define the parser rules in a following manner:
(define-parser equality :descent-parser relational :comparison-symbols (("==" . :equal) ("!=" . :not-equal)) :bnf "equality-node ::== relational-node ( '==' relational-node | '!=' relational-node ) *") (define-parser relational :descent-parser add :comparison-symbols (("<" . :lesser-than) ("<=" . :lesser-or-equal) (">" . :greater-than) (">=" . :greater-or-equal)) :bnf "relational-node ::== add ( '<' add | '<=' add | '>' add | '>=' add ) *") (define-parser add :descent-parser multiplicative :comparison-symbols (("+" . :add) ("-" . :sub)) :bnf "add-node ::== multiplicative-node ( '+' multiplicative-node | '-' multiplicative-node ) *") (define-parser multiplicative :descent-parser unary :comparison-symbols (("*" . :mul) ("/" . :div)) :bnf "multiplicative-node ::== unary-node ( '*' unary-node | '/' unary-node ) *")
To see what those macros expand to you can just run macroexpand
on them, for
example:
(define-parser relational :descent-parser add :comparison-symbols (("<" . :lesser-than) ("<=" . :lesser-or-equal) (">" . :greater-than) (">=" . :greater-or-equal)) :bnf "relational-node ::== add ( '<' add | '<=' add | '>' add | '>=' add ) *")
Expands to:
(defun parse-relational-node (tok) "relational-node ::== add ( '<' add | '<=' add | '>' add | '>=' add ) *" (multiple-value-bind (node rest) (parse-add-node tok) (loop (cond ((string= (token-val rest) "<") (multiple-value-bind (node2 rest2) (parse-add-node (token-next rest)) (setf node (make-ast-node :kind :lesser-than :lhs node :rhs node2)) (setf rest rest2))) ((string= (token-val rest) "<=") (multiple-value-bind (node2 rest2) (parse-add-node (token-next rest)) (setf node (make-ast-node :kind :lesser-or-equal :lhs node :rhs node2)) (setf rest rest2))) ((string= (token-val rest) ">") (multiple-value-bind (node2 rest2) (parse-add-node (token-next rest)) (setf node (make-ast-node :kind :greater-than :lhs node :rhs node2)) (setf rest rest2))) ((string= (token-val rest) ">=") (multiple-value-bind (node2 rest2) (parse-add-node (token-next rest)) (setf node (make-ast-node :kind :greater-or-equal :lhs node :rhs node2)) (setf rest rest2))) (t (return-from parse-relational-node (values node rest)))))))
Cool, seems to be identical to the earlier definition that I had. So now when I need to add new parser rules, I can just utilize this macro to do them, saving me of writing unnecessary boilerplate. I probably am not able to use this macro for all the definitions. For example currently the topmost parser rule is defined in a following manner:
(defun parse-expression-node (tok) "expression-node ::== equality" (parse-equality-node tok))
So it doesn’t really make sense to use that macro for defining something like that. Similarly, unary and primary nodes are defined in a slightly different manner currently:
(defun parse-unary-node (tok) "unary-node ::== ( '+' | '-' ) unary | primary-node" (cond ((string= (token-val tok) "+") (parse-unary-node (token-next tok))) ((string= (token-val tok) "-") (multiple-value-bind (node rest) (parse-unary-node (token-next tok)) (values (make-ast-node :kind :neg :lhs node) rest))) (t (parse-primary-node tok)))) (defun parse-primary-node (tok) "primary-node ::== '(' expression-node ')' | number" (cond ((eq (token-kind tok) :num) (values (make-ast-node :kind :number :val (token-val tok)) (token-next tok))) ((string= (token-val tok) "(") (multiple-value-bind (node rest) (parse-expression-node (token-next tok)) (values node (token-next (skip-to-token ")" rest))))) (t (error 'parser-error))))
Which I could make it so that the macro above would define these kind of parser rules if e.g. some special key is given in, but for now, I’m completely fine by defining these by hand.
NB: If you want to read earlier posts of this dev log, head over here. |