Sila Dev Log: Defining Macro for Parser Rules

Posted on 3rd of October 2023 | 1178 words
Plug: Follow the Sila development here.

So, like I mentioned in a previous post , my hands have been quite full with Baldur’s Gate 3, so I haven’t been able to program too much Sila. But thankfully, while I enjoyed the game through and through, it’s nice to be back to hacking.

I started to add more parser rules for Sila, simple ones still, but crucial nonetheless. These included stuff like parsing equality (==, !=), relational (>, <, <=, >=) and unary nodes (-1, +2). While writing these rules, I quickly realized that I’m repeating myself quite a bit. So as a Lisp hacker, naturally I decided to reach for macros in this case to make my own life just a little bit easier.

If we look at the structure on how I decided to parse equality and relational nodes, they looked something like this:

(defun parse-equality-node (tok)
  "equality-node ::== relational-node ( '==' relational-node
                                      | '!=' relational-node ) *"
  (multiple-value-bind (node rest)
      (parse-relational-node tok)
    (loop
      (cond ((string= (token-val rest) "==")
             (multiple-value-bind (node2 rest2)
                 (parse-relational-node (token-next rest))
               (setf node (make-ast-node :kind :equal :lhs node :rhs node2))
               (setf rest rest2)))
            ((string= (token-val rest) "!=")
             (multiple-value-bind (node2 rest2)
                 (parse-relational-node (token-next rest))
               (setf node (make-ast-node :kind :not-equal :lhs node :rhs node2))
               (setf rest rest2)))
            (t
             (return-from parse-equality-node
               (values node rest)))))))

(defun parse-relational-node (tok)
  "relational-node ::== add ( '<'  add
                            | '<=' add
                            | '>'  add
                            | '>=' add ) *"
  (multiple-value-bind (node rest)
      (parse-add-node tok)
    (loop
      (cond ((string= (token-val rest) "<")
             (multiple-value-bind (node2 rest2)
                 (parse-add-node (token-next rest))
               (setf node (make-ast-node :kind :lesser-than :lhs node :rhs node2))
               (setf rest rest2)))
            ((string= (token-val rest) "<=")
             (multiple-value-bind (node2 rest2)
                 (parse-add-node (token-next rest))
               (setf node (make-ast-node :kind :lesser-or-equal :lhs node :rhs node2))
               (setf rest rest2)))
            ((string= (token-val rest) ">")
             (multiple-value-bind (node2 rest2)
                 (parse-add-node (token-next rest))
               (setf node (make-ast-node :kind :greater-than :lhs node :rhs node2))
               (setf rest rest2)))
            ((string= (token-val rest) ">=")
             (multiple-value-bind (node2 rest2)
                 (parse-add-node (token-next rest))
               (setf node (make-ast-node :kind :greater-or-equal :lhs node :rhs node2))
               (setf rest rest2)))
            (t
             (return-from parse-relational-node
               (values node rest)))))))

So the structure between these are pretty much identical. First, I bind the values that I get from the next parser rule, e.g. parse-relational-node or parse-add-node, and I run infinite loop and check the next tokens and create nodes based on that.

Macro Definition

Function definition can be broken down to a following macro:

(defmacro define-parser (name &key descent-parser
                                   comparison-symbols
                                   bnf)
  "Macro for generating new parser rules."
  (let ((parser-name (intern (format nil "PARSE-~a-NODE" name)))
        (descent-parser-name (intern (format nil "PARSE-~a-NODE" descent-parser))))
    `(defun ,parser-name (tok)
       ,bnf
       (multiple-value-bind (node rest)
           (,descent-parser-name tok)
         (loop
           (cond
             ,@(loop :for symbol in comparison-symbols
                     :collect `((string= (token-val rest) ,(car symbol))
                                (multiple-value-bind (node2 rest2)
                                    (,descent-parser-name (token-next rest))
                                  (setf node (make-ast-node :kind ,(cdr symbol)
                                                            :lhs node
                                                            :rhs node2))
                                  (setf rest rest2))))
             (t
              (return-from ,parser-name
                (values node rest)))))))))

So what is happening here:

Now when the macro is defined, I can just define the parser rules in a following manner:

(define-parser equality
  :descent-parser relational
  :comparison-symbols (("==" . :equal)
                       ("!=" . :not-equal))
  :bnf "equality-node ::== relational-node ( '==' relational-node | '!=' relational-node ) *")

(define-parser relational
  :descent-parser add
  :comparison-symbols (("<" . :lesser-than)
                       ("<=" . :lesser-or-equal)
                       (">" . :greater-than)
                       (">=" . :greater-or-equal))
  :bnf "relational-node ::== add ( '<'  add | '<=' add | '>'  add | '>=' add ) *")

(define-parser add
  :descent-parser multiplicative
  :comparison-symbols (("+" . :add)
                       ("-" . :sub))
  :bnf "add-node ::== multiplicative-node ( '+' multiplicative-node | '-' multiplicative-node ) *")

(define-parser multiplicative
  :descent-parser unary
  :comparison-symbols (("*" . :mul)
                       ("/" . :div))
  :bnf "multiplicative-node ::== unary-node ( '*' unary-node | '/' unary-node ) *")

To see what those macros expand to you can just run macroexpand on them, for example:

(define-parser relational
  :descent-parser add
  :comparison-symbols (("<" . :lesser-than)
                       ("<=" . :lesser-or-equal)
                       (">" . :greater-than)
                       (">=" . :greater-or-equal))
  :bnf "relational-node ::== add ( '<'  add | '<=' add | '>'  add | '>=' add ) *")

Expands to:

(defun parse-relational-node (tok)
  "relational-node ::== add ( '<'  add | '<=' add | '>'  add | '>=' add ) *"
  (multiple-value-bind (node rest)
      (parse-add-node tok)
    (loop
     (cond
      ((string= (token-val rest) "<")
       (multiple-value-bind (node2 rest2)
           (parse-add-node (token-next rest))
         (setf node (make-ast-node :kind :lesser-than :lhs node :rhs node2))
         (setf rest rest2)))
      ((string= (token-val rest) "<=")
       (multiple-value-bind (node2 rest2)
           (parse-add-node (token-next rest))
         (setf node
                 (make-ast-node :kind :lesser-or-equal :lhs node :rhs node2))
         (setf rest rest2)))
      ((string= (token-val rest) ">")
       (multiple-value-bind (node2 rest2)
           (parse-add-node (token-next rest))
         (setf node (make-ast-node :kind :greater-than :lhs node :rhs node2))
         (setf rest rest2)))
      ((string= (token-val rest) ">=")
       (multiple-value-bind (node2 rest2)
           (parse-add-node (token-next rest))
         (setf node
                 (make-ast-node :kind :greater-or-equal :lhs node :rhs node2))
         (setf rest rest2)))
      (t (return-from parse-relational-node (values node rest)))))))

Cool, seems to be identical to the earlier definition that I had. So now when I need to add new parser rules, I can just utilize this macro to do them, saving me of writing unnecessary boilerplate. I probably am not able to use this macro for all the definitions. For example currently the topmost parser rule is defined in a following manner:

(defun parse-expression-node (tok)
  "expression-node ::== equality"
  (parse-equality-node tok))

So it doesn’t really make sense to use that macro for defining something like that. Similarly, unary and primary nodes are defined in a slightly different manner currently:

(defun parse-unary-node (tok)
  "unary-node ::== ( '+' | '-' ) unary | primary-node"
  (cond ((string= (token-val tok) "+")
         (parse-unary-node (token-next tok)))
        ((string= (token-val tok) "-")
         (multiple-value-bind (node rest)
             (parse-unary-node (token-next tok))
           (values (make-ast-node :kind :neg :lhs node)
                   rest)))
        (t
         (parse-primary-node tok))))

(defun parse-primary-node (tok)
  "primary-node ::== '(' expression-node ')' | number"
  (cond ((eq (token-kind tok) :num)
         (values (make-ast-node :kind :number :val (token-val tok))
                 (token-next tok)))
        ((string= (token-val tok) "(")
         (multiple-value-bind (node rest)
             (parse-expression-node (token-next tok))
           (values node (token-next (skip-to-token ")" rest)))))
        (t (error 'parser-error))))

Which I could make it so that the macro above would define these kind of parser rules if e.g. some special key is given in, but for now, I’m completely fine by defining these by hand.

NB: If you want to read earlier posts of this dev log, head over here.