Sila Dev Log: Implementing Local Variables

Posted on 16th of November 2023 | 2838 words

I’ve been slacking a little bit on the writing side of things, but also a little on adding new features to the language. Much of my time has been spent on various bikeshedding topics, important work nonetheless. I won’t be writing too much about those cleanups and refactorings, but you can follow the development of Sila from the link above.

After the bikeshedding, or yak-shaving, I got to writing new features, such as an initial implementation of control flow for the language (conditionals and loops). So I thought I could open up a little bit on the development of those.

Before I started implementing either conditionals or loops, I wanted to implement the ability to assign local variables, such as a := 0. Starting with tokenization, I needed to somehow differentiate between identifiers and reserved keywords. Adding support for those to my tokenize function was simply a matter of adding the following:

(defun tokenize (src)
  "Generate tokens from the given source code."
  (let* ((head (make-token))
         (cur head)
         (src-pos 0))

    (macrolet ((gentoken (kind)
                 (let ((token-gen-fn (intern (format nil "GEN-~a-TOKEN" kind))))
                   `(multiple-value-bind (token pos)
                        (,token-gen-fn src src-pos)
                      (setf (token-next cur) token)
                      (setf cur (token-next cur))
                      (setf src-pos pos)))))

      (loop :while (< src-pos (length src))
            :do (cond
                  ;; [...]

                  ;; Ident or keyword
                  ((alpha-char-p (char src src-pos))
                   (gentoken ident-or-keyword))

                  (t
                   (error 'lexer-error
                          :lexer-input src
                          :error-msg "Invalid token."
                          :token-pos src-pos)))))
    ;; No more tokens.
    (setf (token-next cur) (make-token :kind :eof :position src-pos))
    (setf cur (token-next cur))

    (token-next head)))

This basically checks whether the current character in the given source code is a letter; if so, the input is tokenized as either an identifier or a keyword. I wrote a simple helper macro gentoken to clean up some code: when called with something like (gentoken ident-or-keyword), it calls the function gen-ident-or-keyword-token and inserts the generated token into the token list.
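
To make this concrete, (gentoken ident-or-keyword) expands to roughly the following (expansion derived by hand from the macrolet above):

```lisp
(multiple-value-bind (token pos)
    (gen-ident-or-keyword-token src src-pos)
  ;; Link the new token into the list, then advance the cursor
  ;; and the source position.
  (setf (token-next cur) token)
  (setf cur (token-next cur))
  (setf src-pos pos))
```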

gen-ident-or-keyword-token itself looks like this:

(defvar *sila-keywords*
  #("return" "if" "else" "for" "loop" "break"))

(defun gen-ident-or-keyword-token (src src-pos)
  "Generate IDENT or KEYWORD token and return it and the SRC-POS to the next
token in SRC."
  (flet ((keyword-lookup (input pos)
           (let ((keyword-end (skip-to #'whitespacep input pos)))
             ;; Keyword not found
             (when (null keyword-end)
               (return-from keyword-lookup (values nil pos)))
             (let ((keyword (subseq input pos keyword-end)))
               (if (find keyword *sila-keywords* :test #'string=)
                   (values keyword
                           (skip-to #'(lambda (c) (not (whitespacep c)))
                                    input keyword-end))
                   (values nil pos))))))

    (multiple-value-bind (keyword next-token-pos)
        (keyword-lookup src src-pos)

      (let* ((punct-pos (skip-to #'punctuatorp src src-pos))
             (token-len (cond (keyword (length keyword))
                              (punct-pos (- punct-pos src-pos))
                              (t (- (length src) src-pos))))
             (token-val (if keyword
                            keyword
                            (trim-whitespace
                             (subseq src src-pos (+ src-pos token-len))))))

        (setf src-pos (cond (keyword next-token-pos)
                            (punct-pos punct-pos)
                            (t (length src))))

        (values (make-token :kind (if keyword :keyword :ident)
                            :value token-val
                            :length (length token-val)
                            :position src-pos)
                src-pos)))))

It’s a pretty straightforward function that tokenizes something like a := 0; into the following structure:

SILA/LEXER> (tokenize "a := 0;")
#S(TOKEN
   :KIND :IDENT
   :POSITION 2
   :LENGTH 1
   :VALUE "a"
   :NEXT #S(TOKEN
            :KIND :PUNCT
            :POSITION 4
            :LENGTH 2
            :VALUE ":="
            :NEXT #S(TOKEN
                     :KIND :NUM
                     :POSITION 6
                     :LENGTH 1
                     :VALUE "0"
                     :NEXT #S(TOKEN
                              :KIND :PUNCT
                              :POSITION 7
                              :LENGTH 1
                              :VALUE ";"
                              :NEXT #S(TOKEN
                                       :KIND :EOF
                                       :POSITION 7
                                       :LENGTH 0
                                       :VALUE ""
                                       :NEXT NIL)))))
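
The helper functions used above (skip-to, whitespacep and punctuatorp) aren’t shown in the post. Only the names come from the source; the bodies below are my assumption of what they might roughly look like:

```lisp
(defun whitespacep (char)
  "Return true if CHAR is a whitespace character."
  (member char '(#\Space #\Tab #\Newline #\Return) :test #'char=))

(defun punctuatorp (char)
  "Return true if CHAR starts a punctuator like ':' or ';'."
  (find char ":;{}()=!<>+-*/"))

(defun skip-to (predicate src pos)
  "Return the position of the first character at or after POS that
satisfies PREDICATE, or NIL if no such character is found."
  (position-if predicate src :start pos))
```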

One thing I immediately noticed with my current tokenization setup was that an identifier starting with a digit would be tokenized just like any other number, which of course doesn’t work for identifiers. So I added simple error handling on the number-generation side to reject identifiers that start with a digit:

(defun gen-number-token (src src-pos)
  "Generate token for NUMBER and return it and the SRC-POS to the next token in
SRC."
   ;; [...]

    ;; Idents starting with a letter will be caught with
    ;; a different conditional so if this is hit, ident
    ;; starts with a number but contains letters, which
    ;; isn't acceptable.
    (unless (every #'digit-char-p token-val)
      (error 'lexer-error
             :lexer-input src
             :error-msg "Ident can't start with a number."
             :token-pos src-pos))

    ;; [...]
    )

This setup also handles keywords as intended, which will naturally be needed for implementing control flow:

SILA/LEXER> (print-tokens (tokenize "{ a := 0; return a; }"))
#S(TOKEN :KIND PUNCT	:POSITION 1	:LENGTH 1	:VALUE {)
#S(TOKEN :KIND IDENT	:POSITION 4	:LENGTH 1	:VALUE a)
#S(TOKEN :KIND PUNCT	:POSITION 6	:LENGTH 2	:VALUE :=)
#S(TOKEN :KIND NUM	:POSITION 8	:LENGTH 1	:VALUE 0)
#S(TOKEN :KIND PUNCT	:POSITION 9	:LENGTH 1	:VALUE ;)
#S(TOKEN :KIND KEYWORD	:POSITION 17	:LENGTH 6	:VALUE return)
#S(TOKEN :KIND IDENT	:POSITION 18	:LENGTH 1	:VALUE a)
#S(TOKEN :KIND PUNCT	:POSITION 19	:LENGTH 1	:VALUE ;)
#S(TOKEN :KIND PUNCT	:POSITION 21	:LENGTH 1	:VALUE })
#S(TOKEN :KIND EOF	:POSITION 21	:LENGTH 0	:VALUE )
; No value

The parsing side of things is a little more involved, on the other hand. I started the implementation by writing the following structures:

(defstruct (ast-node-variable
            (:include ast-node)
            (:copier nil))
  (object (util:required 'object) :type object :read-only t))

(defstruct (object
            (:copier nil))
  (name (util:required 'name) :type string :read-only t)
  (offset 0 :type integer)
  (next nil :type t))

(defstruct (ast-node-assign
            (:include ast-node)
            (:copier nil))
  (var (util:required 'var) :type ast-node-variable :read-only t)
  (expr (util:required 'expr) :type t :read-only t))

Parsing the variables and assign statements then looks like this:

(defun parse-assign-node (tok)
  (let (node)
    (multiple-value-bind (eql-node rest)
        (parse-equality-node tok)
      (setf node eql-node
            tok rest))

    (when (string= (lex:token-value tok) ":=")
      (multiple-value-bind (expr-node rest)
          (parse-assign-node (lex:token-next tok))
        (setf node (make-ast-node-assign :var node
                                         :expr expr-node)
              tok rest)))

    (values node tok)))

;;; [...]

(defvar *local-variables* nil
  "Global variable for holding local variable objects.")

(defun parse-primary-node (tok)
  (cond
    ;; [...]

    ((eq (lex:token-kind tok) :ident)
     (let* ((name (lex:token-value tok))
            (var (find-local-var name)))

       (when (null var)
         (setf var (make-object :name name :next *local-variables*))
         ;; New object should be in front of the list.
         (setf *local-variables* var))

       (values (make-ast-node-variable :object var)
               (lex:token-next tok))))

    ;; [...]

    (t (error "Unexpected token value: ~a" tok))))

Since I’m using recursive descent parsing, parsing a variable node happens via the equality parsing function, which trickles down all the way to the primary node. Again, relatively straightforward parsing here. One thing worth noting is how I save all the local variables to a separate variable called *local-variables*. The main reason is that during code generation I need to assign a fixed memory offset to each variable in use, so I need to know all the variables I have parsed. There are probably a bunch of optimizations that could be done, but for now, I’m not interested in those.
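
find-local-var isn’t shown above either. Since *local-variables* is a linked list of object structs, a minimal sketch could look like this (the name comes from the source, the body is my assumption):

```lisp
(defun find-local-var (name)
  "Walk the *LOCAL-VARIABLES* list and return the OBJECT named NAME,
or NIL if no such variable has been parsed yet."
  (loop :for obj := *local-variables* :then (object-next obj)
        :until (null obj)
        :when (string= (object-name obj) name)
          :return obj))
```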

Setting those variable offsets happens after the whole program has been parsed:

(defstruct (func
            (:copier nil))
  (body (util:required 'body) :type t :read-only t)
  (locals (util:required 'locals) :type t :read-only t)
  (stack-size 0 :type integer))

;;; [...]

(defun parse-program (tok)
  (labels ((align-to (n align)
             "Round N to the nearest multiple of ALIGN."
             (* (ceiling n align) align))

           (set-lvar-offsets (program)
             (let ((offset 0))
               (loop :for obj := (func-locals program)
                       :then (object-next obj)
                     :until (null obj)
                     :do (progn
                           (incf offset 8)
                           (setf (object-offset obj) (- offset))))
               (setf (func-stack-size program) (align-to offset 16)))
             (values)))

    (let* ((head (make-ast-node))
           (cur head))

      (loop :until (eq (lex:token-kind tok) :eof)
            :do (multiple-value-bind (node rest)
                    (parse-statement-node tok)
                  (setf (ast-node-next cur) node)
                  (setf cur (ast-node-next cur))
                  (setf tok rest)))

      (let ((program (make-func :body (ast-node-next head)
                                :locals *local-variables*)))
        (set-lvar-offsets program)
        program))))
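
As a quick sanity check of the offset logic: every local takes 8 bytes, offsets grow downwards from the base pointer (hence the negation), and align-to rounds the total up to the 16-byte stack alignment:

```lisp
;; A single 8-byte local ends up at (- 8) => -8(%rbp).
;; The stack size is then rounded up to a 16-byte boundary:
(* (ceiling 8 16) 16)   ; => 16
(* (ceiling 24 16) 16)  ; => 32, e.g. for three 8-byte locals
```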

And that’s about it for tokenization and parsing. Parsing a program with local variables should look something like this:

SILA/PARSER> (parse-program (lex:tokenize "{ a := 0; return a; }"))
#S(FUNC
   :BODY #S(AST-NODE-BLOCK
            :NEXT NIL
            :BODY #S(AST-NODE-EXPRESSION
                     :NEXT #S(AST-NODE-RETURN
                              :NEXT NIL
                              :EXPR #S(AST-NODE-VARIABLE
                                       :NEXT NIL
                                       :OBJECT #S(OBJECT
                                                  :NAME "a"
                                                  :OFFSET -8
                                                  :NEXT NIL)))
                     :EXPR #S(AST-NODE-ASSIGN
                              :NEXT NIL
                              :VAR #S(AST-NODE-VARIABLE
                                      :NEXT NIL
                                      :OBJECT #S(OBJECT
                                                 :NAME "a"
                                                 :OFFSET -8
                                                 :NEXT NIL))
                              :EXPR #S(AST-NODE-INTEGER-LITERAL
                                       :NEXT NIL
                                       :VALUE 0))))
   :LOCALS #S(OBJECT :NAME "a" :OFFSET -8 :NEXT NIL)
   :STACK-SIZE 16)

Finally, we can proceed to the code generation aspects!

To implement local variables with x86 assembly language, we need to write a prologue and an epilogue for our function (currently only the main function), which prepare the stack for use within the function. These will look like this:

	; Prologue
	push %rbp
	mov %rsp, %rbp
	sub STACK_SIZE, %rsp

	; Epilogue
	mov %rbp, %rsp
	pop %rbp
	ret

So what’s happening here:

  • Prologue:

    • push %rbp: Saves the value of the base pointer register (%rbp) onto the stack.
    • mov %rsp, %rbp: Sets the value of the stack pointer (%rsp) as the new base pointer (%rbp).
    • sub STACK_SIZE, %rsp: Allocates space on the stack for local variables by subtracting the amount required by them (STACK_SIZE, calculated above) from the stack pointer (%rsp).
  • Epilogue:

    • mov %rbp, %rsp: Restores the stack pointer to its original value by copying the value of the base pointer back to the stack pointer.
    • pop %rbp: Restores the original value of the base pointer by popping it from the stack.
    • ret: Returns control flow back to the calling function.

In x86, a similar thing could be achieved with the enter and leave instructions, but they do a little more than just pushing/popping and moving. I might start using them at some point if the prologue and epilogue get more complicated. For now, this raw assembly is good enough for me.
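
For reference, here is how those instructions map onto the sequences above (a sketch in the same AT&T syntax):

	; enter with a 16-byte frame and nesting level 0 is
	; equivalent to the prologue:
	push %rbp
	mov %rsp, %rbp
	sub $16, %rsp

	; leave is exactly equivalent to the first two epilogue
	; instructions:
	mov %rbp, %rsp
	pop %rbp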

So, if we want to generate x86-64 code for the program we have above ({ a := 0; return a; }), we have to emit the following assembly:

	.globl main
main:
	; Prologue
	push %rbp
	mov %rsp, %rbp
	sub $16, %rsp

	; Saving 'a' to an address offset relative to the base pointer.
	lea -8(%rbp), %rax
	push %rax
	mov $0, %rax
	pop %rdi
	mov %rax, (%rdi)

	; Calculating the address offset of 'a' and moving it to rax for return.
	lea -8(%rbp), %rax
	mov (%rax), %rax

	; Epilogue
	mov %rbp, %rsp
	pop %rbp
	ret

Again, this code could be cleaned up and optimized quite a bit, and it doesn’t need to be this involved for simple code like this, but I’m leaving the code generation optimization tasks for future me.

With the code above, we now have a great test case for starting to implement the code generation itself. Given the AST that we generated above, generating code for it is also pretty trivial:

(defun generate-expression (node)
  "Recursively generate the x86-64 assembly code."
  (flet ((lea (node)
           "Load effective address"
           (format nil "lea ~d(%rbp), %rax"
                   (parser:object-offset
                    (parser:ast-node-variable-object node)))))
    (let ((insts (make-inst-array)))
      (cond
        ;; [...]

        ((parser:ast-node-variable-p node)
         (vector-push-extend (lea node) insts)
         (vector-push-extend (format nil "mov (%rax), %rax") insts)
         insts)

        ((parser:ast-node-assign-p node)
         (vector-push-extend (lea (parser:ast-node-assign-var node)) insts)
         (vector-push-extend (asm-push) insts)
         (do-vector-push-inst (generate-expression
                               (parser:ast-node-assign-expr node)) insts)
         (vector-push-extend (asm-pop "rdi") insts)
         (vector-push-extend (format nil "mov %rax, (%rdi)") insts)
         insts)

        ;; [...]
        ))))

So the code above essentially emits the following assembly:

	lea OFFSET(%rbp), %rax
	push %rax
	mov VALUE, %rax
	pop %rdi
	mov %rax, (%rdi)

In case we want to return the value, we need to do the following:

(defun generate-statement (node)
  (let ((insts (make-inst-array)))
    (cond
      ;; [...]

      ((parser:ast-node-return-p node)
       (do-vector-push-inst (generate-expression
                             (parser:ast-node-return-expr node)) insts)
       (vector-push-extend (format nil "jmp .L.return") insts)
       insts)

      ;; [...]
      )))

This generates assembly code for a statement like return <expr>;. I decided to use jmp .L.return here so that if I write a return statement deeply nested inside other statements, like if or for, I can exit early and jump directly to the epilogue.
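
Put together, a nested return <expr>; compiles to the expression code followed by a jump to the shared label that is emitted once, right before the epilogue:

	; ... code leaving <expr> in %rax ...
	jmp .L.return

	; emitted once, at the end of main:
.L.return:
	mov %rbp, %rsp
	pop %rbp
	ret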

So printing the whole program might look something like this:

(defun emit-code (src &key (stream nil) (indent 2) (indent-tabs t))
  "Emit assembly code from given source code. Currently emits only x86-64 and
only Linux is tested."
  ;; Init environment
  (setf parser:*local-variables* nil
        *stack-depth* 0
        *label-count* 0)

  (let ((indent (if indent-tabs
                    #\Tab
                    (coerce (make-list indent
                                       :initial-element #\Space)
                            'string))))

    (let ((program (parser:parse-program (lex:tokenize src))))

      ;; TODO(topi): these instructions probably should be collected to some
      ;; structure so they can be divided in to sections more easily when the
      ;; programs become more complex.
      (format stream
              "~{~a~%~}"
              (alexandria:flatten
               (list
                ;; ASM Directive
                (format nil "~a.globl main" indent)

                ;; Main Label
                "main:"

                ;; Prologue
                (format nil "~apush %rbp" indent)
                (format nil "~amov %rsp, %rbp" indent)
                (format nil "~asub $~a, %rsp" indent
                        (parser:func-stack-size program))

                ;; ASM Routine
                (loop :for inst
                        :across (generate-statement (parser:func-body program))
                      :collect (if (string= (subseq inst 0 3) ".L.")
                                   ;; If instruction is label (.L. prefix),
                                   ;; don't indent it.
                                   (format nil "~a" inst)
                                   (format nil "~a~a" indent inst)))

                ;; Return label
                ".L.return:"

                ;; Epilogue
                (format nil "~amov %rbp, %rsp" indent)
                (format nil "~apop %rbp" indent)

                ;; Return
                (format nil "~aret" indent)))))))

The stream, indent and indent-tabs parameters are not needed for the code generation itself; they are just used here as helpers.

With that, we can write a couple of tests to see that programs actually work as intended:

(deftest test-compilation-and-compare-rc
  (testing "Integer"
    (ok (compile-program-and-compare-rc "{ return 0; }" 0))
    (ok (compile-program-and-compare-rc "{ ;;;;; return 1; }" 1)))

  (testing "Arithmetics"
    (ok (compile-program-and-compare-rc "{ return 5 + 40 - 20; }" 25))
    (ok (compile-program-and-compare-rc "{ return 2 / (1 + 1) * 8; }" 8)))

  (testing "Unary"
    (ok (compile-program-and-compare-rc "{ return - -10; }" 10))
    (ok (compile-program-and-compare-rc "{ return -10+20; }" 10)))

  (testing "Comparisons"
    (ok (compile-program-and-compare-rc "{ return 0==1; }" 0))
    (ok (compile-program-and-compare-rc "{ return 1!=1; }" 0))
    (ok (compile-program-and-compare-rc "{ return 0==0; }" 1))
    (ok (compile-program-and-compare-rc "{ return 1!=0; }" 1))
    (ok (compile-program-and-compare-rc "{ return 0<1; }" 1))
    (ok (compile-program-and-compare-rc "{ return 1<1; }" 0))
    (ok (compile-program-and-compare-rc "{ return 1<=1; }" 1))
    (ok (compile-program-and-compare-rc "{ return 2<=1; }" 0))
    (ok (compile-program-and-compare-rc "{ return 0<1; }" 1))
    (ok (compile-program-and-compare-rc "{ return 1<1; }" 0))
    (ok (compile-program-and-compare-rc "{ return 1>=1; }" 1))
    (ok (compile-program-and-compare-rc "{ return 1>=2; }" 0)))

  (testing "Multiple statements"
    (ok (compile-program-and-compare-rc "{ return 1; 2; 3; }" 1))
    (ok (compile-program-and-compare-rc "{ 1; return 2; 3; }" 2))
    (ok (compile-program-and-compare-rc "{ 1; 2; return 3; }" 3)))

  (testing "Variables"
    (ok (compile-program-and-compare-rc "{ a:=8; return a; }" 8))
    (ok (compile-program-and-compare-rc "{ a:=3; b:=5; return a+b; }" 8))
    (ok (compile-program-and-compare-rc "{ foo:=3; bar:=5; return foo+bar; }" 8))
    (ok (compile-program-and-compare-rc "{ foo2:=3; bar2:=5; return foo2+bar2; }" 8)))

  (testing "Block"
    (ok (compile-program-and-compare-rc "{ 1; { 2; } return 3; }" 3))))
Running the tests:

Testing System sila/tests

;; testing 'sila/tests/compiler'
test-compilation-and-compare-rc
  Integer
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0; }" 0) to be true. (492ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ ;;;;; return 1; }" 1) to be true. (441ms)
  Arithmetics
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 5 + 40 - 20; }" 25) to be true. (423ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 2 / (1 + 1) * 8; }" 8) to be true. (431ms)
  Unary
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return - -10; }" 10) to be true. (419ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return -10+20; }" 10) to be true. (454ms)
  Comparisons
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0==1; }" 0) to be true. (476ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1!=1; }" 0) to be true. (483ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0==0; }" 1) to be true. (470ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1!=0; }" 1) to be true. (491ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0<1; }" 1) to be true. (496ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1<1; }" 0) to be true. (487ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1<=1; }" 1) to be true. (474ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 2<=1; }" 0) to be true. (498ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0<1; }" 1) to be true. (503ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1<1; }" 0) to be true. (502ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1>=1; }" 1) to be true. (471ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1>=2; }" 0) to be true. (418ms)
  Multiple statements
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1; 2; 3; }" 1) to be true. (433ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ 1; return 2; 3; }" 2) to be true. (426ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ 1; 2; return 3; }" 3) to be true. (441ms)
  Variables
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ a:=8; return a; }" 8) to be true. (432ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ a:=3; b:=5; return a+b; }" 8) to be true. (454ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ foo:=3; bar:=5; return foo+bar; }" 8) to be true. (467ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC
              "{ foo2:=3; bar2:=5; return foo2+bar2; }" 8) to be true. (452ms)
  Block
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ 1; { 2; } return 3; }" 3) to be true. (427ms)

;; testing 'sila/tests/codegen'
test-codegen-x86-64
  Integer
    ✓ { return 0; }
    ✓ { return 42; }
  Add and subtraction
    ✓ { return 5+20-4; }
    ✓ { return    5   +  20  -  4   ; }
  Division and multiplication
    ✓ { return 2 / (1 + 1) * 8; }
  Unary
    ✓ { return - -10; }
    ✓ { return -10+20; }
    ✓ { return - - -10; }
  Comparison
    ✓ { return 1==1; }
    ✓ { return 1>=1; }
    ✓ { return 1<=1; }
    ✓ { return 1<1; }
    ✓ { return 1>1; }
  Multiple statements
    ✓ { return 1;2;3; }
    ✓ { 1;return 2;3; }
    ✓ { 1;2;return 3; }
  Variables
    ✓ { a:=8;return a; }
    ✓ { foo:=5;bar:=8;return foo+bar; }
  Block
    ✓ { 1; { 2; } return 3; }

✓ 1 test completed

Summary:
  All 1 test passed.

He-hey! Tests seemed to pass! Until next time!

Sila Dev Log: Defining Macro for Parser Rules

Posted on 3rd of October 2023 | 1178 words

So, like I mentioned in a previous post, my hands have been quite full with Baldur’s Gate 3, so I haven’t been able to work on Sila too much. But while I enjoyed the game through and through, it’s nice to be back to hacking.

I started to add more parser rules for Sila, simple ones still, but crucial nonetheless. These included things like parsing equality (==, !=), relational (>, <, <=, >=) and unary nodes (-1, +2). While writing these rules, I quickly realized I was repeating myself quite a bit. So, as a Lisp hacker, I naturally reached for macros to make my own life just a little bit easier.

If we look at how I decided to parse equality and relational nodes, the functions looked something like this:

(defun parse-equality-node (tok)
  "equality-node ::== relational-node ( '==' relational-node
                                      | '!=' relational-node ) *"
  (multiple-value-bind (node rest)
      (parse-relational-node tok)
    (loop
      (cond ((string= (token-val rest) "==")
             (multiple-value-bind (node2 rest2)
                 (parse-relational-node (token-next rest))
               (setf node (make-ast-node :kind :equal :lhs node :rhs node2))
               (setf rest rest2)))
            ((string= (token-val rest) "!=")
             (multiple-value-bind (node2 rest2)
                 (parse-relational-node (token-next rest))
               (setf node (make-ast-node :kind :not-equal :lhs node :rhs node2))
               (setf rest rest2)))
            (t
             (return-from parse-equality-node
               (values node rest)))))))

(defun parse-relational-node (tok)
  "relational-node ::== add ( '<'  add
                            | '<=' add
                            | '>'  add
                            | '>=' add ) *"
  (multiple-value-bind (node rest)
      (parse-add-node tok)
    (loop
      (cond ((string= (token-val rest) "<")
             (multiple-value-bind (node2 rest2)
                 (parse-add-node (token-next rest))
               (setf node (make-ast-node :kind :lesser-than :lhs node :rhs node2))
               (setf rest rest2)))
            ((string= (token-val rest) "<=")
             (multiple-value-bind (node2 rest2)
                 (parse-add-node (token-next rest))
               (setf node (make-ast-node :kind :lesser-or-equal :lhs node :rhs node2))
               (setf rest rest2)))
            ((string= (token-val rest) ">")
             (multiple-value-bind (node2 rest2)
                 (parse-add-node (token-next rest))
               (setf node (make-ast-node :kind :greater-than :lhs node :rhs node2))
               (setf rest rest2)))
            ((string= (token-val rest) ">=")
             (multiple-value-bind (node2 rest2)
                 (parse-add-node (token-next rest))
               (setf node (make-ast-node :kind :greater-or-equal :lhs node :rhs node2))
               (setf rest rest2)))
            (t
             (return-from parse-relational-node
               (values node rest)))))))

So the structure of these two is pretty much identical. First, I bind the values I get from the next parser rule, e.g. parse-relational-node or parse-add-node, then I run an infinite loop, check the next tokens, and create nodes based on them.

Macro Definition

These function definitions can be broken down into the following macro:

(defmacro define-parser (name &key descent-parser
                                   comparison-symbols
                                   bnf)
  "Macro for generating new parser rules."
  (let ((parser-name (intern (format nil "PARSE-~a-NODE" name)))
        (descent-parser-name (intern (format nil "PARSE-~a-NODE" descent-parser))))
    `(defun ,parser-name (tok)
       ,bnf
       (multiple-value-bind (node rest)
           (,descent-parser-name tok)
         (loop
           (cond
             ,@(loop :for symbol in comparison-symbols
                     :collect `((string= (token-val rest) ,(car symbol))
                                (multiple-value-bind (node2 rest2)
                                    (,descent-parser-name (token-next rest))
                                  (setf node (make-ast-node :kind ,(cdr symbol)
                                                            :lhs node
                                                            :rhs node2))
                                  (setf rest rest2))))
             (t
              (return-from ,parser-name
                (values node rest)))))))))

So what is happening here:

  • First I define the new symbols in the package that I will use inside the macro. This is done with the intern function.

  • When defining macros in Lisp, you often see code inside a backquote (`); this signals that every expression inside it that is not preceded by a comma is quoted. So above you can see some places with a comma in front of an expression; those will be evaluated when the macro is expanded.

    • For example, if parser-name equals parse-example-node, then `(defun ,parser-name ()) would evaluate to (defun parse-example-node ()).
  • Last crucial piece in the macro is the way I build the conditional for the parsing itself.

    • Essentially how I do this is that I build a list of backquoted expressions like:

      ((string= (token-val rest) "<")
        (multiple-value-bind (node2 rest2)
            (parse-add-node (token-next rest))
          (setf node (make-ast-node :kind :lesser-than :lhs node :rhs node2))
          (setf rest rest2)))

      One such expression is built for each of the comparison symbols given to the macro.

    • The collected list is placed after ,@ which basically means: evaluate the following expression (,) and splice the resulting list into place (@). So if some-list equals (1 2 3), then `(fn ,@some-list) would evaluate to (fn 1 2 3).
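
Both rules are easy to verify at the REPL:

```lisp
CL-USER> (let ((parser-name 'parse-example-node))
           `(defun ,parser-name (tok)))
(DEFUN PARSE-EXAMPLE-NODE (TOK))

CL-USER> (let ((some-list '(1 2 3)))
           `(fn ,@some-list))
(FN 1 2 3)
```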

Now that the macro is defined, I can define the parser rules in the following manner:

(define-parser equality
  :descent-parser relational
  :comparison-symbols (("==" . :equal)
                       ("!=" . :not-equal))
  :bnf "equality-node ::== relational-node ( '==' relational-node | '!=' relational-node ) *")

(define-parser relational
  :descent-parser add
  :comparison-symbols (("<" . :lesser-than)
                       ("<=" . :lesser-or-equal)
                       (">" . :greater-than)
                       (">=" . :greater-or-equal))
  :bnf "relational-node ::== add ( '<'  add | '<=' add | '>'  add | '>=' add ) *")

(define-parser add
  :descent-parser multiplicative
  :comparison-symbols (("+" . :add)
                       ("-" . :sub))
  :bnf "add-node ::== multiplicative-node ( '+' multiplicative-node | '-' multiplicative-node ) *")

(define-parser multiplicative
  :descent-parser unary
  :comparison-symbols (("*" . :mul)
                       ("/" . :div))
  :bnf "multiplicative-node ::== unary-node ( '*' unary-node | '/' unary-node ) *")

To see what those forms expand to, you can just run macroexpand on them. For example:

(define-parser relational
  :descent-parser add
  :comparison-symbols (("<" . :lesser-than)
                       ("<=" . :lesser-or-equal)
                       (">" . :greater-than)
                       (">=" . :greater-or-equal))
  :bnf "relational-node ::== add ( '<'  add | '<=' add | '>'  add | '>=' add ) *")

Expands to:

(defun parse-relational-node (tok)
  "relational-node ::== add ( '<'  add | '<=' add | '>'  add | '>=' add ) *"
  (multiple-value-bind (node rest)
      (parse-add-node tok)
    (loop
     (cond
      ((string= (token-val rest) "<")
       (multiple-value-bind (node2 rest2)
           (parse-add-node (token-next rest))
         (setf node (make-ast-node :kind :lesser-than :lhs node :rhs node2))
         (setf rest rest2)))
      ((string= (token-val rest) "<=")
       (multiple-value-bind (node2 rest2)
           (parse-add-node (token-next rest))
         (setf node
                 (make-ast-node :kind :lesser-or-equal :lhs node :rhs node2))
         (setf rest rest2)))
      ((string= (token-val rest) ">")
       (multiple-value-bind (node2 rest2)
           (parse-add-node (token-next rest))
         (setf node (make-ast-node :kind :greater-than :lhs node :rhs node2))
         (setf rest rest2)))
      ((string= (token-val rest) ">=")
       (multiple-value-bind (node2 rest2)
           (parse-add-node (token-next rest))
         (setf node
                 (make-ast-node :kind :greater-or-equal :lhs node :rhs node2))
         (setf rest rest2)))
      (t (return-from parse-relational-node (values node rest)))))))

Cool, this seems to be identical to the earlier definition that I had. So now when I need to add new parser rules, I can just utilize this macro, saving me from writing unnecessary boilerplate. I probably won’t be able to use this macro for all the definitions, though. For example, the topmost parser rule is currently defined in the following manner:

(defun parse-expression-node (tok)
  "expression-node ::== equality"
  (parse-equality-node tok))

So it doesn’t really make sense to use the macro for defining something like that. Similarly, the unary and primary nodes are currently defined in a slightly different manner:

(defun parse-unary-node (tok)
  "unary-node ::== ( '+' | '-' ) unary | primary-node"
  (cond ((string= (token-val tok) "+")
         (parse-unary-node (token-next tok)))
        ((string= (token-val tok) "-")
         (multiple-value-bind (node rest)
             (parse-unary-node (token-next tok))
           (values (make-ast-node :kind :neg :lhs node)
                   rest)))
        (t
         (parse-primary-node tok))))

(defun parse-primary-node (tok)
  "primary-node ::== '(' expression-node ')' | number"
  (cond ((eq (token-kind tok) :num)
         (values (make-ast-node :kind :number :val (token-val tok))
                 (token-next tok)))
        ((string= (token-val tok) "(")
         (multiple-value-bind (node rest)
             (parse-expression-node (token-next tok))
           (values node (token-next (skip-to-token ")" rest)))))
        (t (error 'parser-error))))

I could extend the macro above to also cover these kinds of parser rules, e.g. by accepting some special key, but for now I’m completely fine with defining these by hand.
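To check that the hand-written unary and primary rules behave as expected, here is a self-contained sketch. The token and ast-node structs are simplified stand-ins with field names assumed from the accessors above; parenthesized expressions are omitted, and equal replaces string= because this sketch stores the numeric value directly in token-val:

```lisp
;; Minimal stand-ins for the real structs (field names are assumptions
;; based on the accessors used in the snippets above).
(defstruct token kind val next)
(defstruct ast-node kind val lhs rhs)

(defun parse-primary-node (tok)
  "primary-node ::== number (parenthesized expressions omitted here)"
  (if (eq (token-kind tok) :num)
      (values (make-ast-node :kind :number :val (token-val tok))
              (token-next tok))
      (error "unexpected token")))

(defun parse-unary-node (tok)
  "unary-node ::== ( '+' | '-' ) unary | primary-node"
  ;; EQUAL instead of STRING= since token-val may hold a number here.
  (cond ((equal (token-val tok) "+")
         (parse-unary-node (token-next tok)))
        ((equal (token-val tok) "-")
         (multiple-value-bind (node rest)
             (parse-unary-node (token-next tok))
           (values (make-ast-node :kind :neg :lhs node) rest)))
        (t (parse-primary-node tok))))

;; "-5" as a hand-built token list:
(let* ((num (make-token :kind :num :val 5))
       (minus (make-token :kind :punct :val "-" :next num)))
  (multiple-value-bind (node rest) (parse-unary-node minus)
    (declare (ignore rest))
    (ast-node-kind node)))   ; evaluates to :NEG
```

The negation folds into a :neg node whose lhs is the :number node, matching the shape the real parser produces.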

What I Read in September 2023

Posted on 1st of October 2023 | 369 words

During September I seemed to spend quite a bit of time reading books about addiction, due to personal reasons.

Judith Grisel: Never Enough

Grisel’s book focused mainly on how different substances work and how they cause addiction, offering wonderful knowledge about drug use and the development of the human brain. Great book!

Gabor Maté: In the Realm of Hungry Ghosts

Maté’s book approached addiction from the point of view of trying to answer why people become addicts. According to Maté, one of the most common factors among hard addicts seems to be some form of trauma that causes them to seek chemical satisfaction from various substances. The book also raises the great point that addiction is really a spectrum: every one of us lies somewhere on it.

Dr. Tom O’Bryan: You Can Fix Your Brain

While fixing somebody’s brain is definitely a hard task, it is not impossible. O’Bryan’s point in this book is that there is no silver bullet for fixing your brain, but it is possible through small wins in multiple small areas. He calls these the four faces of the brain health pyramid: structure, mindset, biochemistry, and electromagnetism, with which you can design a protocol for yourself.

Hermann Hesse: Siddhartha (reread, but this time in German)

I’ve been tremendously interested in Buddhism ever since I was a teenager, and I would consider myself a practicing Buddhist. I have decided to start taking this practice more and more seriously to try to fix some things in my life. Siddhartha Gautama’s story is obviously a crucial part of the whole thing, and Hesse’s book is a great novel for painting a picture of it. I’ve read this book before, but in Finnish and English. Since I moved to Germany last year, I’ve been practicing my German, and at the same time I’ve never been a huge fan of translated books, since I always feel that something gets lost in translation. So to improve my German, I decided to read the book in its original language, German! While I’m very familiar with the story, both from reading the book earlier and from material like the Pali canon, it’s still one of my favorite books!

So... Baldur's Gate 3 Happened

Posted on 24th of September 2023 | 274 words

Soo… Baldur’s Gate 3 was finally released after many years in open beta, and I have to be honest, it really made me forget pretty much any other project I had going on and just focus on playing it. Yesterday, I was finally able to finish the game after over 100 hours, and all I can say is that it truly was a great experience!

At one point in time, I played quite a bit of video games, but then I lost interest in playing them. Well, losing interest is maybe the wrong word, since I was still interested in what was happening in the gaming world, but I just followed it from afar without spending countless hours in those games.

Baldur’s Gate 3 really sparked this old fire again since it had many things going on in it. I loved the old Baldur’s Gates, and at the same time, many Dungeons and Dragons campaigns with friends hold a dear place in my heart, and naturally, Baldur’s Gate 3 combined those two in a truly remarkable fashion.

So I just wanted to write this small appreciation post to Larian Studios. Yes, the game had some minor bugs here and there, but nothing game-breaking (at least in my experience), and it was an absolutely amazing sequel to one of my favorite gaming series while respecting the legacy of D&D. I doubt that I will start consuming other video games endlessly like I used to, but nonetheless, Baldur’s Gate 3 was an amazing experience, and I’m glad I spent many, many hours in it.

Now that the game is finished, I can finally return to hacking.

What I Read During the Summer (May-Aug) 2023

Posted on 5th of September 2023 | 661 words

The summer months went by fast since I had a lot on my hands. I didn’t feel like keeping the reading log up to date during those months, so I’ll just do one big post-summer update here.

Aldous Huxley: Brave New World (reread)

In May, I seemed to be rereading a bunch of classics, especially ones with the common theme of a dystopian future. Why, you might ask? That I don’t know; maybe something inside me just thinks that the future depicted in these classics is inevitable. Brave New World is probably one of my favorite books ever written: a wonderful vision of an advanced utopia, but at what cost? Its masterful critique of the dehumanizing effects of a highly controlled and pleasure-driven society challenges readers to reflect on the consequences of sacrificing personal freedom for comfort and conformity.

George Orwell: 1984 (reread)

Continuing the series of dystopias and nightmarish tales of a possible future. A tale about the dangers of totalitarianism and government surveillance, it serves as a stark reminder of the importance of safeguarding individual freedoms, critical thinking, and truth itself.

William Golding: Lord of the Flies (reread)

More classics! A compelling exploration of human nature and the thin veneer of civilization that separates order from chaos.

J.D. Salinger: The Catcher in the Rye

The Catcher in the Rye was one of the many classics that I hadn’t read before, and to be honest, I wasn’t a huge fan of it. Maybe because I read it when I was a little bit older, I don’t know. The topics in this book were really interesting, covering teenage alienation, identity, and so on. But while Holden’s adolescent struggles were definitely unique, I just felt more annoyed by him than anything else.

Jacques Ellul: The Technological Society

Jacques Ellul: Propaganda: The Formation of Men’s Attitudes

Ted Kaczynski: Industrial Society and Its Future

Ted Kaczynski: Anti-Tech Revolution: Why and How

Every once in a while my inner Luddite wakes up and I start hating everything about technology. Reading Jacques Ellul was definitely part of this, but this time it was mainly the death of the Unabomber, Ted Kaczynski, that brought me to him. While I don’t agree with or support what Kaczynski did, he had some great points about how technology affects us. Ellul was in many ways an inspiration for him in this sort of anti-technology philosophy and its view of technology’s effect on humans. But there was one big difference between Kaczynski and Ellul: Ellul was a pacifist, whereas Kaczynski was a domestic terrorist.

Nonetheless, all of these books make some great points about technology, so regardless of your opinion on it, they are definitely worth a read.

Cormac McCarthy: The Road

Cormac McCarthy: No Country for Old Men

Unfortunately, the summer brought the passing of a couple of great authors, one of whom was Cormac McCarthy. I had never read his books before, but I was familiar with his work through the Coen Brothers’ movie adaptation of No Country for Old Men, and I had heard great things about his writing. So I grabbed copies of The Road and No Country for Old Men, since those came highly recommended. And the recommendations were definitely spot on.

The Road offers a haunting depiction of a post-apocalyptic America and the struggles of a father and son in this world, immersing the reader in a bleak landscape where hope and love endure against all odds. No Country for Old Men delves into the realms of crime, fate, and the inexorable consequences of one’s choices.

Both of these works showcase McCarthy’s masterful storytelling, unique narrative styles, and philosophical depth, making them essential reads for those interested in literature that explores the human condition in its most challenging and thought-provoking forms.

Richard P. Gabriel: Patterns of Software

Richard P. Gabriel is a pretty well-known name in the software world, and especially in the Lisp world. One of his most famous writings is the essay on the concept of “Worse Is Better”.

Patterns of Software gives a great look into software design, programming, and the business around it. Definitely a must-read for everyone working in this industry.