Sila Dev Log: Initial Control Flow
Posted on 27th of January 2024 | 2526 wordsPlug: Follow the Sila development here. |
Have had a small pause on programming and writing due to vacations etc., but fortunately, now I’m back to normal schedule, so I’ll try to get back to it as much as possible!
I recently wrote about implementing local variables to my compiler and next I would like to write about implementing the initial control flow to it. With the initial control flow, I mean stuff like loops and conditionals in a relatively simple fashion for now.
Starting with simple conditionals. In my language I’ve decided to go with the
usual if
and else
just not to over-complicate things for me. Starting with
the required assembly code. Considering code like:
if 1 == 0 { return 1; } return 0;
Nothing too complicated going on. Of course in that, it would never return 1, but let’s not focus on that for now. Assembly for something like this (with the function prologue and epilogue) would look something like this:
; Prologue push %rbp mov %rsp, %rbp sub $0, %rsp mov $0, %rax ; Move the value 0 into the register %rax. push %rax ; Push the value of %rax onto the stack. mov $1, %rax ; Move the value 1 into the register %rax. pop %rdi ; Pop the topmost value from the stack and store it in the %rdi register. cmp %rdi, %rax ; Compare the values in %rdi and %rax. sete %al ; Set the least significant byte of the register %rax to 1 if the previous comparison was equal, otherwise, set it to 0. movzb %al, %rax ; Zero-extend %al (the least significant byte) to the entire %rax register. cmp $0, %rax ; Compare the value in %rax with 0. je .L.else.1 ; Jump to .L.else.1 if the zero flag is set (if the previous comparison resulted in equality). .L.else.1: ; Label for the else block. nop ; No operation (this instruction does nothing). .L.end.1: ; Label for the end of the code block. mov $0, %rax ; Move the value 0 into %rax. ; Epilogue mov %rbp, %rsp pop %rbp ret
This probably could be written more cleanly but for now I’m not too worried about code generation optimizations, so I let future Topi worry about those.
In the last part, where we focused on implementing our local variables , I already added support for doing tokenization for our keywords that are required for the control flow. So when it comes to lexer, everything should already be in order.
Parser on the other hand requires little bit of work. First, let’s implement structures for conditionals:
(defstruct (ast-node-cond (:include ast-node) (:copier nil)) (expr (util:required 'expr) :type t :read-only t) (then (util:required 'then) :type t :read-only t) (else (util:required 'else) :type t :read-only t)) (defmethod next-node ((node ast-node-cond)) (ast-node-cond-next node))
For now, conditionals will be parsed as a statement node, so we’ll need add
cases for conditional keywords in our parse-statement-node
:
(defun parse-statement-node (tok) (alexandria:switch ((lex:token-value tok) :test #'string=) #| [...] |# ("if" (parse-cond-statement-node (lex:token-next tok))) #| [...] |#))
parse-cond-statement-node
on the other hand looks something like this:
(defun parse-cond-statement-node (tok) ;; TOK passed in should be the token after "if". (let (expr then else) (multiple-value-bind (expr-node rest) (parse-expression-node tok) (setf expr expr-node tok rest)) (multiple-value-bind (then-node rest) (parse-statement-node tok) (setf then then-node tok rest)) (when (string= (lex:token-value tok) "else") (multiple-value-bind (else-node rest) (parse-statement-node (lex:token-next tok)) (setf else else-node tok rest))) (values (make-ast-node-cond :expr expr :then then :else else) tok)))
So we’ll parse the conditional out of
if <cond> { <then-node> } else { <else-node> }
and proceed to parse the bodies inside curly braces. Parsing of the bodies of the conditional proceed down in a recursive descent manner how we have implemented the parses earlier.
To be able to produce somewhat similar assembly as I wrote above, naturally, the code generation needs some work:
(defun generate-statement (node) (let ((insts (make-inst-array))) (cond #| [...] |# ((parser:ast-node-cond-p node) (incf *label-count*) (let ((count *label-count*)) (do-vector-push-inst (generate-expression (parser:ast-node-cond-expr node)) insts) (vector-push-extend "cmp $0, %rax" insts) (vector-push-extend (format nil "je .L.else.~d" count) insts) (do-vector-push-inst (generate-statement (parser:ast-node-cond-then node)) insts) (vector-push-extend (format nil "jmp .L.end.~d" count) insts) (vector-push-extend (format nil ".L.else.~d:" count) insts) (if (parser:ast-node-cond-else node) (do-vector-push-inst (generate-statement (parser:ast-node-cond-else node)) insts) (vector-push-extend "nop" insts)) (vector-push-extend (format nil ".L.end.~d:" count) insts) (unless (parser:next-node node) (vector-push-extend "nop" insts))) insts) #| [...] |#)))
Essentially what is happening here is that we compare the conditional in the
given statement to 0 and if or not comparison is true, we’ll jump to a label
.L.else.<label no>
or we just skip it and proceed forward to instructions
generated from the “then” node. Label numbering is naturally for the reason
that pretty much always in real programs functions might have multitude of
different conditional in varying levels of nesting. So we need to keep track
of which conditional is in question at this point.
But when it comes to initial implementation of conditionals, that should be more or less it and we can proceed on to implementing loops.
Loops in many way works similarly to how conditionals were implemented above, since you often have conditionals e.g. in “for” loops. In assembly, how something like this would be implemented is basically the same as above, but in the end, we would just jump back to the beginning depending of the conditional of the loop itself.
So code something like
{ for ;; { return 3; } return 5; }
would equal to code like:
.globl main main: push %rbp mov %rsp, %rbp sub $0, %rsp .L.begin.1: mov $3, %rax jmp .L.return jmp .L.begin.1 .L.end.1: mov $5, %rax jmp .L.return .L.return: mov %rbp, %rsp pop %rbp ret
Or little bit more complex for loop
{ i:=0; j:=0;for i:=0; i<=10; i:=i+1 { j:= i+j; } return j; }
would equal to:
.globl main main: push %rbp mov %rsp, %rbp sub $16, %rsp lea -16(%rbp), %rax push %rax mov $0, %rax pop %rdi mov %rax, (%rdi) lea -8(%rbp), %rax push %rax mov $0, %rax pop %rdi mov %rax, (%rdi) lea -16(%rbp), %rax push %rax mov $0, %rax pop %rdi mov %rax, (%rdi) .L.begin.1: mov $10, %rax push %rax lea -16(%rbp), %rax mov (%rax), %rax pop %rdi cmp %rdi, %rax setle %al movzb %al, %rax cmp $0, %rax je .L.end.1 lea -8(%rbp), %rax push %rax lea -8(%rbp), %rax mov (%rax), %rax push %rax lea -16(%rbp), %rax mov (%rax), %rax pop %rdi add %rdi, %rax pop %rdi mov %rax, (%rdi) lea -16(%rbp), %rax push %rax mov $1, %rax push %rax lea -16(%rbp), %rax mov (%rax), %rax pop %rdi add %rdi, %rax pop %rdi mov %rax, (%rdi) jmp .L.begin.1 .L.end.1: lea -8(%rbp), %rax mov (%rax), %rax jmp .L.return .L.return: mov %rbp, %rsp pop %rbp ret
Again, let’s start by adding structures for our loops:
(defstruct (ast-node-for (:include ast-node) (:copier nil)) (init (util:required 'init) :type t :read-only t) (inc (util:required 'inc) :type t :read-only t) (cond (util:required 'cond) :type t :read-only t) (body (util:required 'body) :type t :read-only t)) (defmethod next-node ((node ast-node-for)) (ast-node-for-next node)) (defstruct (ast-node-loop (:include ast-node) (:copier nil)) (body (util:required 'body) :type t :read-only t)) (defmethod next-node ((node ast-node-loop)) (ast-node-loop-next node)) (defstruct (ast-node-break (:include ast-node) (:copier nil)) (depth (util:required 'depth) :type integer :read-only t)) (defmethod next-node ((node ast-node-break)) (ast-node-break-next node))
I’ve included ast-node-loop
in here, which in my language depicts while
loop. Similarly to e.g. Rust.
Parsing loops will similarly to conditionals happen in parse-statement-node
which in current state looks fully something like this:
(defvar *break-depth* 0 "Depth counter for BREAK keyword to know from what level it should break out.") (defun parse-statement-node (tok) (alexandria:switch ((lex:token-value tok) :test #'string=) ("return" (multiple-value-bind (node rest) (parse-expression-node (lex:token-next tok)) (values (make-ast-node-return :expr node) (lex:token-next (skip-to-token ";" rest))))) ("if" (parse-cond-statement-node (lex:token-next tok))) ("for" (parse-for-statement-node (lex:token-next tok))) ("loop" (parse-loop-statement-node (lex:token-next tok))) ("break" (values (make-ast-node-break :depth *break-depth*) (lex:token-next (skip-to-token ";" tok)))) ("{" (parse-compound-statement-node (lex:token-next tok))) (otherwise (parse-expression-statement-node tok))))
Notice the
*break-depth*
variable, that’ll work in similar fashion as our label counter in the code generation for conditionals.
Parsing functions for our loop keywords looks like this:
(defun parse-for-statement-node (tok) ;; TOK passed in should be the token after "for" (let (init cond inc body) (multiple-value-bind (init-node rest) (parse-expression-statement-node tok) (setf init init-node tok rest)) (unless (string= (lex:token-value tok) ";") (multiple-value-bind (cond-node rest) (parse-expression-node tok) (setf cond cond-node tok rest))) (setf tok (skip-to-token ";" tok)) ;; Entering "for" scope. (incf *break-depth*) (unless (string= (lex:token-value (lex:token-next tok)) "{") (multiple-value-bind (inc-node rest) (parse-expression-node (lex:token-next tok)) (setf inc inc-node tok rest))) (setf tok (skip-to-token "{" tok)) (multiple-value-bind (body-node rest) (parse-statement-node tok) (setf body body-node tok rest)) ;; Left "for" scope. (decf *break-depth*) (values (make-ast-node-for :init init :cond cond :inc inc :body body) tok))) (defun parse-loop-statement-node (tok) ;; TOK passed in should be the opening brace of the block after the "loop" ;; keyword. ;; TODO(topi): Add proper error handling. (assert (string= (lex:token-value tok) "{")) ;; Entering "loop" scope. (incf *break-depth*) (multiple-value-bind (body rest) (parse-statement-node tok) ;; Left "loop" scope. (decf *break-depth*) (values (make-ast-node-loop :body body) rest)))
So as you can see, parsing loops work in a very similar fashion to how
conditionals are parsed since the structure for the syntax is quite similar.
Same also applies to code generation so our generate-statement
currently
fully looks like this:
(defun generate-statement (node) (let ((insts (make-inst-array))) (cond ((parser:ast-node-block-p node) (loop :for body := (parser:ast-node-block-body node) :then (setf body (parser:next-node body)) :until (null body) :do (do-vector-push-inst (generate-statement body) insts)) insts) ((parser:ast-node-return-p node) (do-vector-push-inst (generate-expression (parser:ast-node-return-expr node)) insts) (vector-push-extend "jmp .L.return" insts) insts) ((parser:ast-node-break-p node) (vector-push-extend (format nil "jmp .L.end.~d" (parser:ast-node-break-depth node)) insts) insts) ((parser:ast-node-cond-p node) (incf *label-count*) (let ((count *label-count*)) (do-vector-push-inst (generate-expression (parser:ast-node-cond-expr node)) insts) (vector-push-extend "cmp $0, %rax" insts) (vector-push-extend (format nil "je .L.else.~d" count) insts) (do-vector-push-inst (generate-statement (parser:ast-node-cond-then node)) insts) (vector-push-extend (format nil "jmp .L.end.~d" count) insts) (vector-push-extend (format nil ".L.else.~d:" count) insts) (if (parser:ast-node-cond-else node) (do-vector-push-inst (generate-statement (parser:ast-node-cond-else node)) insts) (vector-push-extend "nop" insts)) (vector-push-extend (format nil ".L.end.~d:" count) insts) (unless (parser:next-node node) (vector-push-extend "nop" insts))) insts) ((parser:ast-node-for-p node) (incf *label-count*) (let ((count *label-count*)) (do-vector-push-inst (generate-statement (parser:ast-node-for-init node)) insts) (vector-push-extend (format nil ".L.begin.~d:" count) insts) (when (parser:ast-node-for-cond node) (do-vector-push-inst (generate-expression (parser:ast-node-for-cond node)) insts) (vector-push-extend "cmp $0, %rax" insts) (vector-push-extend (format nil "je .L.end.~d" count) insts)) (do-vector-push-inst (generate-statement (parser:ast-node-for-body node)) insts) (when (parser:ast-node-for-inc node) (do-vector-push-inst (generate-expression (parser:ast-node-for-inc node)) insts)) (vector-push-extend (format nil "jmp .L.begin.~d" count) insts) (vector-push-extend (format nil ".L.end.~d:" count) insts) (unless (parser:next-node node) (vector-push-extend "nop" insts))) insts) ((parser:ast-node-loop-p node) (incf *label-count*) (let ((count *label-count*)) (vector-push-extend (format nil ".L.begin.~d:" count) insts) (do-vector-push-inst (generate-statement (parser:ast-node-loop-body node)) insts) (vector-push-extend (format nil "jmp .L.begin.~d" count) insts) (vector-push-extend (format nil ".L.end.~d:" count) insts) (unless (parser:next-node node) (vector-push-extend "nop" insts))) insts) ((parser:ast-node-expression-p node) (do-vector-push-inst (generate-expression (parser:ast-node-expression-expr node)) insts) insts))))
Mainly the difference between loops and conditionals is just the label
.L.begin.<label no>
which we use in case we want to jump back to the
beginning of the loop with jmp .L.begin.<label no>
.
And that’s about it! I have already implemented tests for these and they should now be be passing!
Testing System sila/tests ;; testing 'sila/tests/compiler' test-compilation-and-compare-rc Integer ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0; }" 0) to be true. (1013ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ ;;;;; return 1; }" 1) to be true. (1018ms) Arithmetics ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 5 + 40 - 20; }" 25) to be true. (1022ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 2 / (1 + 1) * 8; }" 8) to be true. (1015ms) Unary ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return - -10; }" 10) to be true. (1011ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return -10+20; }" 10) to be true. (1022ms) Comparisons ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0==1; }" 0) to be true. (1023ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1!=1; }" 0) to be true. (1067ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0==0; }" 1) to be true. (1059ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1!=0; }" 1) to be true. (1059ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0<1; }" 1) to be true. (1061ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1<1; }" 0) to be true. (1087ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1<=1; }" 1) to be true. (1042ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 2<=1; }" 0) to be true. (1055ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0<1; }" 1) to be true. (1021ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1<1; }" 0) to be true. (1014ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1>=1; }" 1) to be true. (1025ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1>=2; }" 0) to be true. (1014ms) Multiple statements ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1; 2; 3; }" 1) to be true. (1019ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ 1; return 2; 3; }" 2) to be true. (1023ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ 1; 2; return 3; }" 3) to be true. (1031ms) Variables ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ a:=8; return a; }" 8) to be true. (1043ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ a:=3; b:=5; return a+b; }" 8) to be true. (1024ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ foo:=3; bar:=5; return foo+bar; }" 8) to be true. (1032ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ foo2:=3; bar2:=5; return foo2+bar2; }" 8) to be true. (1056ms) Block ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ 1; { 2; } return 3; }" 3) to be true. (1070ms) Conditionals ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ if 1 { return 1; } return 2; }" 1) to be true. (1085ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ if 0 { return 1; } else { return 2; } }" 2) to be true. (1077ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ if 1<0 { return 1; } else { return 2; } }" 2) to be true. (1024ms) For loop ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ for ;; { return 3; } return 5; }" 3) to be true. (1011ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ i:=0; j:=0;for i:=0; i<=10; i:=i+1 {j := i+j;} return j; }" 55) to be true. (1019ms) Loop ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ loop { return 3; } return 5; }" 3) to be true. (1046ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ i := 0; loop { i := 3; break; } return i; }" 3) to be true. (1030ms) ✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ i := 0; loop { i := i + 1; if i == 10 { break; } } return i; }" 10) to be true. (1038ms) ;; testing 'sila/tests/codegen' test-codegen-x86-64 Integer ✓ { return 0; } ✓ { return 42; } Add and subtraction ✓ { return 5+20-4; } ✓ { return 5 + 20 - 4 ; } Division and multiplication ✓ { return 2 / (1 + 1) * 8; } Unary ✓ { return - -10; } ✓ { return -10+20; } ✓ { return - - -10; } Comparison ✓ { return 1==1; } ✓ { return 1>=1; } ✓ { return 1<=1; } ✓ { return 1<1; } ✓ { return 1>1; } Multiple statements ✓ { return 1;2;3; } ✓ { 1;return 2;3; } ✓ { 1;2;return 3; } Variables ✓ { a:=8;return a; } ✓ { foo:=5;bar:=8;return foo+bar; } Block ✓ { 1; { 2; } return 3; } Conditional ✓ { if 1 == 0 { return 1; } return 0; } ✓ { if 0 { return 1; } else { return 2; } } For loop ✓ { for ;; { return 3; } return 5; } ✓ { i:=0; j:=0;for i:=0; i<=10; i:=i+1 {j := i+j;} return j; } ✓ 1 test completed Summary: All 1 test passed.
NB: If you want to read earlier posts of this dev log, head over here. |