Sila Dev Log: Initial Control Flow

Posted on 27th of January 2024 | 2526 words
Plug: Follow the Sila development here.

Have had a small pause on programming and writing due to vacations etc., but fortunately, now I’m back to normal schedule, so I’ll try to get back to it as much as possible!

I recently wrote about implementing local variables to my compiler and next I would like to write about implementing the initial control flow to it. With the initial control flow, I mean stuff like loops and conditionals in a relatively simple fashion for now.

Starting with simple conditionals. In my language I’ve decided to go with the usual if and else just not to over-complicate things for me. Starting with the required assembly code. Considering code like:

if 1 == 0 { return 1; } return 0;

Nothing too complicated going on. Of course in that, it would never return 1, but let’s not focus on that for now. Assembly for something like this (with the function prologue and epilogue) would look something like this:

	; Prologue
	push %rbp
	mov %rsp, %rbp
	sub $0, %rsp

	mov $0, %rax	; Move the value 0 into the register %rax.
	push %rax	; Push the value of %rax onto the stack.
	mov $1, %rax	; Move the value 1 into the register %rax.
	pop %rdi	; Pop the topmost value from the stack and store it in the %rdi register.
	cmp %rdi, %rax	; Compare the values in %rdi and %rax.
	sete %al	; Set the least significant byte of the register %rax to 1 if the previous comparison was equal, otherwise, set it to 0.
	movzb %al, %rax	; Zero-extend %al (the least significant byte) to the entire %rax register.
	cmp $0, %rax	; Compare the value in %rax with 0.
	je .L.else.1	; Jump to .L.else.1 if the zero flag is set (if the previous comparison resulted in equality).
.L.else.1:		; Label for the else block.
	nop		; No operation (this instruction does nothing).
.L.end.1:		; Label for the end of the code block.
	mov $0, %rax	; Move the value 0 into %rax.

	; Epilogue
	mov %rbp, %rsp
	pop %rbp
	ret

This probably could be written more cleanly but for now I’m not too worried about code generation optimizations, so I let future Topi worry about those.

In the last part, where we focused on implementing our local variables , I already added support for doing tokenization for our keywords that are required for the control flow. So when it comes to lexer, everything should already be in order.

Parser on the other hand requires little bit of work. First, let’s implement structures for conditionals:

(defstruct (ast-node-cond
            (:include ast-node)
            (:copier nil))
  (expr (util:required 'expr) :type t :read-only t)
  (then (util:required 'then) :type t :read-only t)
  (else (util:required 'else) :type t :read-only t))

(defmethod next-node ((node ast-node-cond))
  (ast-node-cond-next node))

For now, conditionals will be parsed as a statement node, so we’ll need add cases for conditional keywords in our parse-statement-node:

(defun parse-statement-node (tok)
  (alexandria:switch ((lex:token-value tok) :test #'string=)
    #| [...] |#

    ("if"
     (parse-cond-statement-node (lex:token-next tok)))

    #| [...] |#))

parse-cond-statement-node on the other hand looks something like this:

(defun parse-cond-statement-node (tok)
  ;; TOK passed in should be the token after "if".
  (let (expr then else)
    (multiple-value-bind (expr-node rest)
        (parse-expression-node tok)
      (setf expr expr-node
            tok rest))

    (multiple-value-bind (then-node rest)
        (parse-statement-node tok)
      (setf then then-node
            tok rest))

    (when (string= (lex:token-value tok) "else")
      (multiple-value-bind (else-node rest)
          (parse-statement-node (lex:token-next tok))
        (setf else else-node
              tok rest)))

    (values (make-ast-node-cond :expr expr
                                :then then
                                :else else)
            tok)))

So we’ll parse the conditional out of

if <cond> { <then-node> } else { <else-node> }

and proceed to parse the bodies inside curly braces. Parsing of the bodies of the conditional proceed down in a recursive descent manner how we have implemented the parses earlier.

To be able to produce somewhat similar assembly as I wrote above, naturally, the code generation needs some work:

(defun generate-statement (node)
  (let ((insts (make-inst-array)))
    (cond
      #| [...] |#

      ((parser:ast-node-cond-p node)
       (incf *label-count*)

       (let ((count *label-count*))
         (do-vector-push-inst (generate-expression
                               (parser:ast-node-cond-expr node)) insts)
         (vector-push-extend "cmp $0, %rax" insts)
         (vector-push-extend (format nil "je .L.else.~d" count) insts)

         (do-vector-push-inst (generate-statement
                               (parser:ast-node-cond-then node)) insts)
         (vector-push-extend (format nil "jmp .L.end.~d" count) insts)

         (vector-push-extend (format nil ".L.else.~d:" count) insts)
         (if (parser:ast-node-cond-else node)
             (do-vector-push-inst (generate-statement
                                   (parser:ast-node-cond-else node)) insts)
             (vector-push-extend "nop" insts))

         (vector-push-extend (format nil ".L.end.~d:" count) insts)

         (unless (parser:next-node node)
           (vector-push-extend "nop" insts)))
       insts)

      #| [...] |#)))

Essentially what is happening here is that we compare the conditional in the given statement to 0 and if or not comparison is true, we’ll jump to a label .L.else.<label no> or we just skip it and proceed forward to instructions generated from the “then” node. Label numbering is naturally for the reason that pretty much always in real programs functions might have multitude of different conditional in varying levels of nesting. So we need to keep track of which conditional is in question at this point.

But when it comes to initial implementation of conditionals, that should be more or less it and we can proceed on to implementing loops.

Loops in many way works similarly to how conditionals were implemented above, since you often have conditionals e.g. in “for” loops. In assembly, how something like this would be implemented is basically the same as above, but in the end, we would just jump back to the beginning depending of the conditional of the loop itself.

So code something like

{ for ;; { return 3; } return 5; }

would equal to code like:

	.globl main
main:
	push %rbp
	mov %rsp, %rbp
	sub $0, %rsp
.L.begin.1:
	mov $3, %rax
	jmp .L.return
	jmp .L.begin.1
.L.end.1:
	mov $5, %rax
	jmp .L.return
.L.return:
	mov %rbp, %rsp
	pop %rbp
	ret

Or little bit more complex for loop

{ i:=0; j:=0;for i:=0; i<=10; i:=i+1 { j:= i+j; } return j; }

would equal to:

	.globl main
main:
	push %rbp
	mov %rsp, %rbp
	sub $16, %rsp
	lea -16(%rbp), %rax
	push %rax
	mov $0, %rax
	pop %rdi
	mov %rax, (%rdi)
	lea -8(%rbp), %rax
	push %rax
	mov $0, %rax
	pop %rdi
	mov %rax, (%rdi)
	lea -16(%rbp), %rax
	push %rax
	mov $0, %rax
	pop %rdi
	mov %rax, (%rdi)
.L.begin.1:
	mov $10, %rax
	push %rax
	lea -16(%rbp), %rax
	mov (%rax), %rax
	pop %rdi
	cmp %rdi, %rax
	setle %al
	movzb %al, %rax
	cmp $0, %rax
	je .L.end.1
	lea -8(%rbp), %rax
	push %rax
	lea -8(%rbp), %rax
	mov (%rax), %rax
	push %rax
	lea -16(%rbp), %rax
	mov (%rax), %rax
	pop %rdi
	add %rdi, %rax
	pop %rdi
	mov %rax, (%rdi)
	lea -16(%rbp), %rax
	push %rax
	mov $1, %rax
	push %rax
	lea -16(%rbp), %rax
	mov (%rax), %rax
	pop %rdi
	add %rdi, %rax
	pop %rdi
	mov %rax, (%rdi)
	jmp .L.begin.1
.L.end.1:
	lea -8(%rbp), %rax
	mov (%rax), %rax
	jmp .L.return
.L.return:
	mov %rbp, %rsp
	pop %rbp
	ret

Again, let’s start by adding structures for our loops:

(defstruct (ast-node-for
             (:include ast-node)
             (:copier nil))
  (init (util:required 'init) :type t :read-only t)
  (inc (util:required 'inc) :type t :read-only t)
  (cond (util:required 'cond) :type t :read-only t)
  (body (util:required 'body) :type t :read-only t))

(defmethod next-node ((node ast-node-for))
  (ast-node-for-next node))

(defstruct (ast-node-loop
             (:include ast-node)
             (:copier nil))
  (body (util:required 'body) :type t :read-only t))

(defmethod next-node ((node ast-node-loop))
  (ast-node-loop-next node))

(defstruct (ast-node-break
             (:include ast-node)
             (:copier nil))
  (depth (util:required 'depth) :type integer :read-only t))

(defmethod next-node ((node ast-node-break))
  (ast-node-break-next node))

I’ve included ast-node-loop in here, which in my language depicts while loop. Similarly to e.g. Rust.

Parsing loops will similarly to conditionals happen in parse-statement-node which in current state looks fully something like this:

(defvar *break-depth* 0
  "Depth counter for BREAK keyword to know from what level it should break out.")

(defun parse-statement-node (tok)
  (alexandria:switch ((lex:token-value tok) :test #'string=)
    ("return"
     (multiple-value-bind (node rest)
         (parse-expression-node (lex:token-next tok))
       (values (make-ast-node-return :expr node)
               (lex:token-next (skip-to-token ";" rest)))))

    ("if"
     (parse-cond-statement-node (lex:token-next tok)))

    ("for"
     (parse-for-statement-node (lex:token-next tok)))

    ("loop"
     (parse-loop-statement-node (lex:token-next tok)))

    ("break"
     (values (make-ast-node-break :depth *break-depth*)
             (lex:token-next (skip-to-token ";" tok))))

    ("{"
     (parse-compound-statement-node (lex:token-next tok)))

    (otherwise
     (parse-expression-statement-node tok))))

Notice the *break-depth* variable, that’ll work in similar fashion as our label counter in the code generation for conditionals.

Parsing functions for our loop keywords looks like this:

(defun parse-for-statement-node (tok)
  ;; TOK passed in should be the token after "for"
  (let (init cond inc body)
    (multiple-value-bind (init-node rest)
        (parse-expression-statement-node tok)
      (setf init init-node
            tok rest))

    (unless (string= (lex:token-value tok) ";")
      (multiple-value-bind (cond-node rest)
          (parse-expression-node tok)
        (setf cond cond-node
              tok rest)))
    (setf tok (skip-to-token ";" tok))

    ;; Entering "for" scope.
    (incf *break-depth*)

    (unless (string= (lex:token-value (lex:token-next tok)) "{")
      (multiple-value-bind (inc-node rest)
          (parse-expression-node (lex:token-next tok))
        (setf inc inc-node
              tok rest)))
    (setf tok (skip-to-token "{" tok))

    (multiple-value-bind (body-node rest)
        (parse-statement-node tok)
      (setf body body-node
            tok rest))

    ;; Left "for" scope.
    (decf *break-depth*)

    (values (make-ast-node-for :init init
                               :cond cond
                               :inc inc
                               :body body)
            tok)))

(defun parse-loop-statement-node (tok)
  ;; TOK passed in should be the opening brace of the block after the "loop"
  ;; keyword.
  ;; TODO(topi): Add proper error handling.
  (assert (string= (lex:token-value tok) "{"))

  ;; Entering "loop" scope.
  (incf *break-depth*)

  (multiple-value-bind (body rest)
      (parse-statement-node tok)

    ;; Left "loop" scope.
    (decf *break-depth*)

    (values (make-ast-node-loop :body body)
            rest)))

So as you can see, parsing loops work in a very similar fashion to how conditionals are parsed since the structure for the syntax is quite similar. Same also applies to code generation so our generate-statement currently fully looks like this:

(defun generate-statement (node)
  (let ((insts (make-inst-array)))
    (cond
      ((parser:ast-node-block-p node)
       (loop :for body := (parser:ast-node-block-body node)
               :then (setf body (parser:next-node body))
             :until (null body)
             :do (do-vector-push-inst (generate-statement body) insts))
       insts)

      ((parser:ast-node-return-p node)
       (do-vector-push-inst (generate-expression
                             (parser:ast-node-return-expr node)) insts)
       (vector-push-extend "jmp .L.return" insts)
       insts)

      ((parser:ast-node-break-p node)
       (vector-push-extend (format nil "jmp .L.end.~d"
                                   (parser:ast-node-break-depth node)) insts)
       insts)

      ((parser:ast-node-cond-p node)
       (incf *label-count*)

       (let ((count *label-count*))
         (do-vector-push-inst (generate-expression
                               (parser:ast-node-cond-expr node)) insts)
         (vector-push-extend "cmp $0, %rax" insts)
         (vector-push-extend (format nil "je .L.else.~d" count) insts)

         (do-vector-push-inst (generate-statement
                               (parser:ast-node-cond-then node)) insts)
         (vector-push-extend (format nil "jmp .L.end.~d" count) insts)

         (vector-push-extend (format nil ".L.else.~d:" count) insts)
         (if (parser:ast-node-cond-else node)
             (do-vector-push-inst (generate-statement
                                   (parser:ast-node-cond-else node)) insts)
             (vector-push-extend "nop" insts))

         (vector-push-extend (format nil ".L.end.~d:" count) insts)

         (unless (parser:next-node node)
           (vector-push-extend "nop" insts)))
       insts)

      ((parser:ast-node-for-p node)
       (incf *label-count*)

       (let ((count *label-count*))
         (do-vector-push-inst (generate-statement
                               (parser:ast-node-for-init node)) insts)
         (vector-push-extend (format nil ".L.begin.~d:" count) insts)

         (when (parser:ast-node-for-cond node)
           (do-vector-push-inst (generate-expression
                                 (parser:ast-node-for-cond node)) insts)
           (vector-push-extend "cmp $0, %rax" insts)
           (vector-push-extend (format nil "je .L.end.~d" count) insts))

         (do-vector-push-inst (generate-statement
                               (parser:ast-node-for-body node)) insts)

         (when (parser:ast-node-for-inc node)
           (do-vector-push-inst (generate-expression
                                 (parser:ast-node-for-inc node)) insts))

         (vector-push-extend (format nil "jmp .L.begin.~d" count) insts)
         (vector-push-extend (format nil ".L.end.~d:" count) insts)

         (unless (parser:next-node node)
           (vector-push-extend "nop" insts)))
       insts)

      ((parser:ast-node-loop-p node)
       (incf *label-count*)

       (let ((count *label-count*))
         (vector-push-extend (format nil ".L.begin.~d:" count) insts)

         (do-vector-push-inst (generate-statement
                               (parser:ast-node-loop-body node)) insts)

         (vector-push-extend (format nil "jmp .L.begin.~d" count) insts)
         (vector-push-extend (format nil ".L.end.~d:" count) insts)

         (unless (parser:next-node node)
           (vector-push-extend "nop" insts)))
       insts)

      ((parser:ast-node-expression-p node)
       (do-vector-push-inst (generate-expression
                             (parser:ast-node-expression-expr node)) insts)
       insts))))

Mainly the difference between loops and conditionals is just the label .L.begin.<label no> which we use in case we want to jump back to the beginning of the loop with jmp .L.begin.<label no>.

And that’s about it! I have already implemented tests for these and they should now be be passing!

Testing System sila/tests

;; testing 'sila/tests/compiler'
test-compilation-and-compare-rc
  Integer
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0; }" 0) to be true. (1013ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ ;;;;; return 1; }" 1) to be true. (1018ms)
  Arithmetics
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 5 + 40 - 20; }" 25) to be true. (1022ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 2 / (1 + 1) * 8; }" 8) to be true. (1015ms)
  Unary
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return - -10; }" 10) to be true. (1011ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return -10+20; }" 10) to be true. (1022ms)
  Comparisons
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0==1; }" 0) to be true. (1023ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1!=1; }" 0) to be true. (1067ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0==0; }" 1) to be true. (1059ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1!=0; }" 1) to be true. (1059ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0<1; }" 1) to be true. (1061ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1<1; }" 0) to be true. (1087ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1<=1; }" 1) to be true. (1042ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 2<=1; }" 0) to be true. (1055ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0<1; }" 1) to be true. (1021ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1<1; }" 0) to be true. (1014ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1>=1; }" 1) to be true. (1025ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1>=2; }" 0) to be true. (1014ms)
  Multiple statements
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1; 2; 3; }" 1) to be true. (1019ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ 1; return 2; 3; }" 2) to be true. (1023ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ 1; 2; return 3; }" 3) to be true. (1031ms)
  Variables
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ a:=8; return a; }" 8) to be true. (1043ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ a:=3; b:=5; return a+b; }" 8) to be true. (1024ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ foo:=3; bar:=5; return foo+bar; }" 8) to be true. (1032ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC
              "{ foo2:=3; bar2:=5; return foo2+bar2; }" 8) to be true. (1056ms)
  Block
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ 1; { 2; } return 3; }" 3) to be true. (1070ms)
  Conditionals
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ if 1 { return 1; } return 2; }" 1) to be true. (1085ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC
              "{ if 0 { return 1; } else { return 2; } }" 2) to be true. (1077ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC
              "{ if 1<0 { return 1; } else { return 2; } }" 2) to be true. (1024ms)
  For loop
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ for ;; { return 3; } return 5; }" 3) to be true. (1011ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC
              "{ i:=0; j:=0;for i:=0; i<=10; i:=i+1 {j := i+j;} return j; }" 55) to be true. (1019ms)
  Loop
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ loop { return 3; } return 5; }" 3) to be true. (1046ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC
              "{ i := 0; loop { i := 3; break; } return i; }" 3) to be true. (1030ms)
    ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC
              "{ i := 0; loop { i := i + 1; if i == 10 { break; } } return i; }" 10) to be true. (1038ms)

;; testing 'sila/tests/codegen'
test-codegen-x86-64
  Integer
    ✓ { return 0; }
    ✓ { return 42; }
  Add and subtraction
    ✓ { return 5+20-4; }
    ✓ { return    5   +  20  -  4   ; }
  Division and multiplication
    ✓ { return 2 / (1 + 1) * 8; }
  Unary
    ✓ { return - -10; }
    ✓ { return -10+20; }
    ✓ { return - - -10; }
  Comparison
    ✓ { return 1==1; }
    ✓ { return 1>=1; }
    ✓ { return 1<=1; }
    ✓ { return 1<1; }
    ✓ { return 1>1; }
  Multiple statements
    ✓ { return 1;2;3; }
    ✓ { 1;return 2;3; }
    ✓ { 1;2;return 3; }
  Variables
    ✓ { a:=8;return a; }
    ✓ { foo:=5;bar:=8;return foo+bar; }
  Block
    ✓ { 1; { 2; } return 3; }
  Conditional
    ✓ { if 1 == 0 { return 1; } return 0; }
    ✓ { if 0 { return 1; } else { return 2; } }
  For loop
    ✓ { for ;; { return 3; } return 5; }
    ✓ { i:=0; j:=0;for i:=0; i<=10; i:=i+1 {j := i+j;} return j; }

✓ 1 test completed

Summary:
  All 1 test passed.
NB: If you want to read earlier posts of this dev log, head over here.