Posted on 31st of March 2024
| 183 wordsSo, again, I’ve been in this annoying procrastination rut when it comes to my
writing (literature and coding) and reading. I’ve been quite active in other
interests although, especially music, since I just recently had one gig after a
long break of not gigging and it was super fun! But now, I’ve felt the urge to
get back to books and programming, while still of course trying to keep musical
activities relatively active.
I’ve started to read through couple new tech books focusing on compilers and
garbage collection and also rereading some German literature classics, mainly
for trying to learn the language. Also, I should get back to
Sila
development at the same time. Next
steps for the Sila development is probably going to be to start working with
code generation for aarch64. This is due to the fact that I recently got a new
Macbook with ARM processor (M3 Max), so now I need to start worry about code
generation for both x86-64 and aarch64. Which is probably going to be quite a
lot of work but hey… it’s fun (I hope).
Posted on 2nd of February 2024
| 599 wordsSomebody contacted me and mentioned that there is something off with my RSS
feeds, thanks for that. I don’t know what had happened and when, but for some
reason my normal RSS link.
Didn’t contain the correct
information. It seemed to only contain the summary of the post. I recently did
some overhaul in my own Hugo theme and its layouts so most likely I had put
something to wrong place while doing that causing this.
When I started looking at what Hugo generated, it seemed that the correct feed
was actually located at the root of my site, so topikettunen.com/feed.xml
but this isn’t right. I remember making it so at one point that I only
generate the feed in the blog section since - personally speaking - I don’t
really see the benefit of generating the feed elsewhere.
This seemed to be caused by error in my Hugo configs, which most likely I had
accidentally changed at some point. I had to add the following to my
config.toml
:
[outputs]
home = ["HTML"]
section = ["HTML", "RSS"]
taxonomy = ["HTML"]
This makes it so that Hugo generates only HTML when it comes to home and
taxonomy page (so archive
and tag
for me). In section,
like blog
, it’ll generate HTML and RSS. So that was fixed.
Like I mentioned above, the RSS feed generated under section was still wrong
since it only contained the summary. In my Hugo theme, I had already created a
custom layouts/index.rss.xml
in there with the change:
-<description>{{ .Summary | html }}</description>
+<description>{{ .Content | safeHTML }}</description>
So that it’ll generate the whole content in there instead of summary, which
Hugo does by default. Unfortunately, this was one was in the wrong place for
section RSS. It should be in layouts/section/section.rss.xml
, and then it
works fine!
Also having full content under description
in the RSS XML seems quite odd so
I fixed it to look like:
<description>{{ .Summary | html }}</description>
<content:encoded>{{ .Content | html }}</content:encoded>
To have both in their own correct fields.
I also noticed that my custom notice shortcode looked quite ugly in the RSS
feed. The way I had done this shortcode was the having the following in
layouts/shortcodes/notice.html
:
<table class="notice">
<tbody>
<tr>
<td>
{{ .Inner }}
</td>
</tr>
</tbody>
</table>
I know, this probably could be done with something else than table, but hey,
I’m not a designer.
Which I then would call in my Markdown files like:
{{< notice >}}
<strong>Plug</strong>: <a href="https://git.sr.ht/~tok/sila" target="_blank">Follow the Sila development here.</a>
{{< /notice >}}
Also while writing this, I learned how to escape shortcodes in Hugo in case
you want to use them in write them in your Markdown, but not actually use
them. You need wrap the short code inside {{</* notice */>}}
.
But unfortunately it would just generate that in the beginning of the RSS
content without proper HTML tags etc. So instead of using the notice as a
shortcode, I removed the shortcodes from Markdown and just added the following
to my layouts/_default/single.html
:
{{ if in .Params.tag "sila"}}
<table class="notice">
<tbody>
<tr>
<td>
<strong>Plug</strong>: <a href="https://git.sr.ht/~tok/sila" target="_blank">Follow the Sila development here.</a>
</td>
</tr>
</tbody>
</table>
{{ end }}
This only works when blog has a tag sila
in it. Of course, if I would like
to use such a notice in some other posts, this wouldn’t work, but for now,
it’s good enough for me. This way also Hugo doesn’t include the notice in the
generated RSS, making the feed look little bit neater!
But hey, at least now the RSS feed should be working how it should!
Posted on 27th of January 2024
| 2526 wordsHave had a small pause on programming and writing due to vacations etc., but
fortunately, now I’m back to normal schedule, so I’ll try to get back to it as
much as possible!
I recently wrote about implementing local
variables
to my compiler and next I
would like to write about implementing the initial control flow to it. With
the initial control flow, I mean stuff like loops and conditionals in a
relatively simple fashion for now.
Starting with simple conditionals. In my language I’ve decided to go with the
usual if
and else
just not to over-complicate things for me. Starting with
the required assembly code. Considering code like:
if 1 == 0 { return 1; } return 0;
Nothing too complicated going on. Of course in that, it would never return 1,
but let’s not focus on that for now. Assembly for something like this (with
the function prologue and epilogue) would look something like this:
; Prologue
push %rbp
mov %rsp, %rbp
sub $0, %rsp
mov $0, %rax ; Move the value 0 into the register %rax.
push %rax ; Push the value of %rax onto the stack.
mov $1, %rax ; Move the value 1 into the register %rax.
pop %rdi ; Pop the topmost value from the stack and store it in the %rdi register.
cmp %rdi, %rax ; Compare the values in %rdi and %rax.
sete %al ; Set the least significant byte of the register %rax to 1 if the previous comparison was equal, otherwise, set it to 0.
movzb %al, %rax ; Zero-extend %al (the least significant byte) to the entire %rax register.
cmp $0, %rax ; Compare the value in %rax with 0.
je .L.else.1 ; Jump to .L.else.1 if the zero flag is set (if the previous comparison resulted in equality).
.L.else.1: ; Label for the else block.
nop ; No operation (this instruction does nothing).
.L.end.1: ; Label for the end of the code block.
mov $0, %rax ; Move the value 0 into %rax.
; Epilogue
mov %rbp, %rsp
pop %rbp
ret
This probably could be written more cleanly but for now I’m not
too worried about code generation optimizations, so I let future Topi worry
about those.
In the last part, where we focused on implementing our local
variables
, I already added support for
doing tokenization for our keywords that are required for the control flow. So
when it comes to lexer, everything should already be in order.
Parser on the other hand requires little bit of work. First, let’s implement
structures for conditionals:
(defstruct (ast-node-cond
(:include ast-node)
(:copier nil))
(expr (util:required 'expr) :type t :read-only t)
(then (util:required 'then) :type t :read-only t)
(else (util:required 'else) :type t :read-only t))
(defmethod next-node ((node ast-node-cond))
(ast-node-cond-next node))
For now, conditionals will be parsed as a statement node, so we’ll need add
cases for conditional keywords in our parse-statement-node
:
(defun parse-statement-node (tok)
(alexandria:switch ((lex:token-value tok) :test #'string=)
#| [...] |#
("if"
(parse-cond-statement-node (lex:token-next tok)))
#| [...] |#))
parse-cond-statement-node
on the other hand looks something like this:
(defun parse-cond-statement-node (tok)
;; TOK passed in should be the token after "if".
(let (expr then else)
(multiple-value-bind (expr-node rest)
(parse-expression-node tok)
(setf expr expr-node
tok rest))
(multiple-value-bind (then-node rest)
(parse-statement-node tok)
(setf then then-node
tok rest))
(when (string= (lex:token-value tok) "else")
(multiple-value-bind (else-node rest)
(parse-statement-node (lex:token-next tok))
(setf else else-node
tok rest)))
(values (make-ast-node-cond :expr expr
:then then
:else else)
tok)))
So we’ll parse the conditional out of
if <cond> { <then-node> } else { <else-node> }
and proceed to parse the bodies inside curly braces. Parsing of
the bodies of the conditional proceed down in a recursive descent manner how
we have implemented the parses earlier.
To be able to produce somewhat similar assembly as I wrote above, naturally,
the code generation needs some work:
(defun generate-statement (node)
(let ((insts (make-inst-array)))
(cond
#| [...] |#
((parser:ast-node-cond-p node)
(incf *label-count*)
(let ((count *label-count*))
(do-vector-push-inst (generate-expression
(parser:ast-node-cond-expr node)) insts)
(vector-push-extend "cmp $0, %rax" insts)
(vector-push-extend (format nil "je .L.else.~d" count) insts)
(do-vector-push-inst (generate-statement
(parser:ast-node-cond-then node)) insts)
(vector-push-extend (format nil "jmp .L.end.~d" count) insts)
(vector-push-extend (format nil ".L.else.~d:" count) insts)
(if (parser:ast-node-cond-else node)
(do-vector-push-inst (generate-statement
(parser:ast-node-cond-else node)) insts)
(vector-push-extend "nop" insts))
(vector-push-extend (format nil ".L.end.~d:" count) insts)
(unless (parser:next-node node)
(vector-push-extend "nop" insts)))
insts)
#| [...] |#)))
Essentially what is happening here is that we compare the conditional in the
given statement to 0 and if or not comparison is true, we’ll jump to a label
.L.else.<label no>
or we just skip it and proceed forward to instructions
generated from the “then” node. Label numbering is naturally for the reason
that pretty much always in real programs functions might have multitude of
different conditional in varying levels of nesting. So we need to keep track
of which conditional is in question at this point.
But when it comes to initial implementation of conditionals, that should be
more or less it and we can proceed on to implementing loops.
Loops in many way works similarly to how conditionals were implemented above,
since you often have conditionals e.g. in “for” loops. In assembly, how
something like this would be implemented is basically the same as above, but
in the end, we would just jump back to the beginning depending of the
conditional of the loop itself.
So code something like
{ for ;; { return 3; } return 5; }
would equal to code like:
.globl main
main:
push %rbp
mov %rsp, %rbp
sub $0, %rsp
.L.begin.1:
mov $3, %rax
jmp .L.return
jmp .L.begin.1
.L.end.1:
mov $5, %rax
jmp .L.return
.L.return:
mov %rbp, %rsp
pop %rbp
ret
Or little bit more complex for loop
{ i:=0; j:=0;for i:=0; i<=10; i:=i+1 { j:= i+j; } return j; }
would equal to:
.globl main
main:
push %rbp
mov %rsp, %rbp
sub $16, %rsp
lea -16(%rbp), %rax
push %rax
mov $0, %rax
pop %rdi
mov %rax, (%rdi)
lea -8(%rbp), %rax
push %rax
mov $0, %rax
pop %rdi
mov %rax, (%rdi)
lea -16(%rbp), %rax
push %rax
mov $0, %rax
pop %rdi
mov %rax, (%rdi)
.L.begin.1:
mov $10, %rax
push %rax
lea -16(%rbp), %rax
mov (%rax), %rax
pop %rdi
cmp %rdi, %rax
setle %al
movzb %al, %rax
cmp $0, %rax
je .L.end.1
lea -8(%rbp), %rax
push %rax
lea -8(%rbp), %rax
mov (%rax), %rax
push %rax
lea -16(%rbp), %rax
mov (%rax), %rax
pop %rdi
add %rdi, %rax
pop %rdi
mov %rax, (%rdi)
lea -16(%rbp), %rax
push %rax
mov $1, %rax
push %rax
lea -16(%rbp), %rax
mov (%rax), %rax
pop %rdi
add %rdi, %rax
pop %rdi
mov %rax, (%rdi)
jmp .L.begin.1
.L.end.1:
lea -8(%rbp), %rax
mov (%rax), %rax
jmp .L.return
.L.return:
mov %rbp, %rsp
pop %rbp
ret
Again, let’s start by adding structures for our loops:
(defstruct (ast-node-for
(:include ast-node)
(:copier nil))
(init (util:required 'init) :type t :read-only t)
(inc (util:required 'inc) :type t :read-only t)
(cond (util:required 'cond) :type t :read-only t)
(body (util:required 'body) :type t :read-only t))
(defmethod next-node ((node ast-node-for))
(ast-node-for-next node))
(defstruct (ast-node-loop
(:include ast-node)
(:copier nil))
(body (util:required 'body) :type t :read-only t))
(defmethod next-node ((node ast-node-loop))
(ast-node-loop-next node))
(defstruct (ast-node-break
(:include ast-node)
(:copier nil))
(depth (util:required 'depth) :type integer :read-only t))
(defmethod next-node ((node ast-node-break))
(ast-node-break-next node))
I’ve included ast-node-loop
in here, which in my language depicts while
loop. Similarly to e.g. Rust.
Parsing loops will similarly to conditionals happen in parse-statement-node
which in current state looks fully something like this:
(defvar *break-depth* 0
"Depth counter for BREAK keyword to know from what level it should break out.")
(defun parse-statement-node (tok)
(alexandria:switch ((lex:token-value tok) :test #'string=)
("return"
(multiple-value-bind (node rest)
(parse-expression-node (lex:token-next tok))
(values (make-ast-node-return :expr node)
(lex:token-next (skip-to-token ";" rest)))))
("if"
(parse-cond-statement-node (lex:token-next tok)))
("for"
(parse-for-statement-node (lex:token-next tok)))
("loop"
(parse-loop-statement-node (lex:token-next tok)))
("break"
(values (make-ast-node-break :depth *break-depth*)
(lex:token-next (skip-to-token ";" tok))))
("{"
(parse-compound-statement-node (lex:token-next tok)))
(otherwise
(parse-expression-statement-node tok))))
Notice the *break-depth*
variable, that’ll work in similar fashion as our
label counter in the code generation for conditionals.
Parsing functions for our loop keywords looks like this:
(defun parse-for-statement-node (tok)
;; TOK passed in should be the token after "for"
(let (init cond inc body)
(multiple-value-bind (init-node rest)
(parse-expression-statement-node tok)
(setf init init-node
tok rest))
(unless (string= (lex:token-value tok) ";")
(multiple-value-bind (cond-node rest)
(parse-expression-node tok)
(setf cond cond-node
tok rest)))
(setf tok (skip-to-token ";" tok))
;; Entering "for" scope.
(incf *break-depth*)
(unless (string= (lex:token-value (lex:token-next tok)) "{")
(multiple-value-bind (inc-node rest)
(parse-expression-node (lex:token-next tok))
(setf inc inc-node
tok rest)))
(setf tok (skip-to-token "{" tok))
(multiple-value-bind (body-node rest)
(parse-statement-node tok)
(setf body body-node
tok rest))
;; Left "for" scope.
(decf *break-depth*)
(values (make-ast-node-for :init init
:cond cond
:inc inc
:body body)
tok)))
(defun parse-loop-statement-node (tok)
;; TOK passed in should be the opening brace of the block after the "loop"
;; keyword.
;; TODO(topi): Add proper error handling.
(assert (string= (lex:token-value tok) "{"))
;; Entering "loop" scope.
(incf *break-depth*)
(multiple-value-bind (body rest)
(parse-statement-node tok)
;; Left "loop" scope.
(decf *break-depth*)
(values (make-ast-node-loop :body body)
rest)))
So as you can see, parsing loops work in a very similar fashion to how
conditionals are parsed since the structure for the syntax is quite similar.
Same also applies to code generation so our generate-statement
currently
fully looks like this:
(defun generate-statement (node)
(let ((insts (make-inst-array)))
(cond
((parser:ast-node-block-p node)
(loop :for body := (parser:ast-node-block-body node)
:then (setf body (parser:next-node body))
:until (null body)
:do (do-vector-push-inst (generate-statement body) insts))
insts)
((parser:ast-node-return-p node)
(do-vector-push-inst (generate-expression
(parser:ast-node-return-expr node)) insts)
(vector-push-extend "jmp .L.return" insts)
insts)
((parser:ast-node-break-p node)
(vector-push-extend (format nil "jmp .L.end.~d"
(parser:ast-node-break-depth node)) insts)
insts)
((parser:ast-node-cond-p node)
(incf *label-count*)
(let ((count *label-count*))
(do-vector-push-inst (generate-expression
(parser:ast-node-cond-expr node)) insts)
(vector-push-extend "cmp $0, %rax" insts)
(vector-push-extend (format nil "je .L.else.~d" count) insts)
(do-vector-push-inst (generate-statement
(parser:ast-node-cond-then node)) insts)
(vector-push-extend (format nil "jmp .L.end.~d" count) insts)
(vector-push-extend (format nil ".L.else.~d:" count) insts)
(if (parser:ast-node-cond-else node)
(do-vector-push-inst (generate-statement
(parser:ast-node-cond-else node)) insts)
(vector-push-extend "nop" insts))
(vector-push-extend (format nil ".L.end.~d:" count) insts)
(unless (parser:next-node node)
(vector-push-extend "nop" insts)))
insts)
((parser:ast-node-for-p node)
(incf *label-count*)
(let ((count *label-count*))
(do-vector-push-inst (generate-statement
(parser:ast-node-for-init node)) insts)
(vector-push-extend (format nil ".L.begin.~d:" count) insts)
(when (parser:ast-node-for-cond node)
(do-vector-push-inst (generate-expression
(parser:ast-node-for-cond node)) insts)
(vector-push-extend "cmp $0, %rax" insts)
(vector-push-extend (format nil "je .L.end.~d" count) insts))
(do-vector-push-inst (generate-statement
(parser:ast-node-for-body node)) insts)
(when (parser:ast-node-for-inc node)
(do-vector-push-inst (generate-expression
(parser:ast-node-for-inc node)) insts))
(vector-push-extend (format nil "jmp .L.begin.~d" count) insts)
(vector-push-extend (format nil ".L.end.~d:" count) insts)
(unless (parser:next-node node)
(vector-push-extend "nop" insts)))
insts)
((parser:ast-node-loop-p node)
(incf *label-count*)
(let ((count *label-count*))
(vector-push-extend (format nil ".L.begin.~d:" count) insts)
(do-vector-push-inst (generate-statement
(parser:ast-node-loop-body node)) insts)
(vector-push-extend (format nil "jmp .L.begin.~d" count) insts)
(vector-push-extend (format nil ".L.end.~d:" count) insts)
(unless (parser:next-node node)
(vector-push-extend "nop" insts)))
insts)
((parser:ast-node-expression-p node)
(do-vector-push-inst (generate-expression
(parser:ast-node-expression-expr node)) insts)
insts))))
Mainly the difference between loops and conditionals is just the label
.L.begin.<label no>
which we use in case we want to jump back to the
beginning of the loop with jmp .L.begin.<label no>
.
And that’s about it! I have already implemented tests for these and they
should now be be passing!
Testing System sila/tests
;; testing 'sila/tests/compiler'
test-compilation-and-compare-rc
Integer
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0; }" 0) to be true. (1013ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ ;;;;; return 1; }" 1) to be true. (1018ms)
Arithmetics
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 5 + 40 - 20; }" 25) to be true. (1022ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 2 / (1 + 1) * 8; }" 8) to be true. (1015ms)
Unary
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return - -10; }" 10) to be true. (1011ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return -10+20; }" 10) to be true. (1022ms)
Comparisons
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0==1; }" 0) to be true. (1023ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1!=1; }" 0) to be true. (1067ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0==0; }" 1) to be true. (1059ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1!=0; }" 1) to be true. (1059ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0<1; }" 1) to be true. (1061ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1<1; }" 0) to be true. (1087ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1<=1; }" 1) to be true. (1042ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 2<=1; }" 0) to be true. (1055ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0<1; }" 1) to be true. (1021ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1<1; }" 0) to be true. (1014ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1>=1; }" 1) to be true. (1025ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1>=2; }" 0) to be true. (1014ms)
Multiple statements
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1; 2; 3; }" 1) to be true. (1019ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ 1; return 2; 3; }" 2) to be true. (1023ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ 1; 2; return 3; }" 3) to be true. (1031ms)
Variables
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ a:=8; return a; }" 8) to be true. (1043ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ a:=3; b:=5; return a+b; }" 8) to be true. (1024ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ foo:=3; bar:=5; return foo+bar; }" 8) to be true. (1032ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC
"{ foo2:=3; bar2:=5; return foo2+bar2; }" 8) to be true. (1056ms)
Block
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ 1; { 2; } return 3; }" 3) to be true. (1070ms)
Conditionals
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ if 1 { return 1; } return 2; }" 1) to be true. (1085ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC
"{ if 0 { return 1; } else { return 2; } }" 2) to be true. (1077ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC
"{ if 1<0 { return 1; } else { return 2; } }" 2) to be true. (1024ms)
For loop
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ for ;; { return 3; } return 5; }" 3) to be true. (1011ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC
"{ i:=0; j:=0;for i:=0; i<=10; i:=i+1 {j := i+j;} return j; }" 55) to be true. (1019ms)
Loop
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ loop { return 3; } return 5; }" 3) to be true. (1046ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC
"{ i := 0; loop { i := 3; break; } return i; }" 3) to be true. (1030ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC
"{ i := 0; loop { i := i + 1; if i == 10 { break; } } return i; }" 10) to be true. (1038ms)
;; testing 'sila/tests/codegen'
test-codegen-x86-64
Integer
✓ { return 0; }
✓ { return 42; }
Add and subtraction
✓ { return 5+20-4; }
✓ { return 5 + 20 - 4 ; }
Division and multiplication
✓ { return 2 / (1 + 1) * 8; }
Unary
✓ { return - -10; }
✓ { return -10+20; }
✓ { return - - -10; }
Comparison
✓ { return 1==1; }
✓ { return 1>=1; }
✓ { return 1<=1; }
✓ { return 1<1; }
✓ { return 1>1; }
Multiple statements
✓ { return 1;2;3; }
✓ { 1;return 2;3; }
✓ { 1;2;return 3; }
Variables
✓ { a:=8;return a; }
✓ { foo:=5;bar:=8;return foo+bar; }
Block
✓ { 1; { 2; } return 3; }
Conditional
✓ { if 1 == 0 { return 1; } return 0; }
✓ { if 0 { return 1; } else { return 2; } }
For loop
✓ { for ;; { return 3; } return 5; }
✓ { i:=0; j:=0;for i:=0; i<=10; i:=i+1 {j := i+j;} return j; }
✓ 1 test completed
Summary:
All 1 test passed.
Posted on 29th of November 2023
| 29 wordsI just decided to gather all the songs I have included in my posts in the “Now
playing” section to one neat page. This page can be found
here.
Posted on 16th of November 2023
| 2834 wordsI’ve been little bit slacking on the writing aspects of things but also little
bit on the side of adding new features to the language as well. So much of my
time have been spent on various bikeshedding topics. Although important work
nonetheless. I won’t be writing too much about those cleanups and refactorings
but you can follow the development of Sila from the link above.
After the bikeshedding, or yak-shaving, I got to writing new features such as
initial implementation of the control flow for the language (conditionals and
loops). So I could open a little bit on the development of those.
Before I started implementing either conditionals or loops, I wanted to
implement the ability to assign local variables, such as a := 0
. Starting
with tokenization, I needed to differentiate somehow between identifiers and
reserved keywords. Starting with my tokenize
, adding support for those was
simply adding following to it:
(defun tokenize (src)
"Generate tokens from the given source code."
(let* ((head (make-token))
(cur head)
(src-pos 0))
(macrolet ((gentoken (kind)
(let ((token-gen-fn (intern (format nil "GEN-~a-TOKEN" kind))))
`(multiple-value-bind (token pos)
(,token-gen-fn src src-pos)
(setf (token-next cur) token)
(setf cur (token-next cur))
(setf src-pos pos)))))
(loop :while (< src-pos (length src))
:do (cond
;; [...]
;; Ident or keyword
((alpha-char-p (char src src-pos))
(gentoken ident-or-keyword))
(t
(error 'lexer-error
:lexer-input src
:error-msg "Invalid token."
:token-pos src-pos)))))
;; No more tokens.
(setf (token-next cur) (make-token :kind :eof :position src-pos))
(setf cur (token-next cur))
(token-next head)))
Which basically checks that if the current character in the given source code
starts with a letter, it should be tokenized as either identifier or keyword.
I wrote a simple helper macro genmacro
to cleanup some code, which when
called with something like (genmacro ident-or-keyword)
would call function
gen-ident-or-keyword
and inserts the generated token to the token list.
gen-ident-or-keyword
itself looks like this:
(defvar *sila-keywords*
#("return" "if" "else" "for" "loop" "break"))
(defun gen-ident-or-keyword-token (src src-pos)
"Generate IDENT or KEYWORD token and return it and the SRC-POS to the next
token in SRC."
(flet ((keyword-lookup (input pos)
(let ((keyword-end (skip-to #'whitespacep input pos)))
;; Keyword not found
(when (null keyword-end)
(return-from keyword-lookup (values nil pos)))
(let ((keyword (subseq input pos keyword-end)))
(if (find keyword *sila-keywords* :test #'string=)
(values keyword
(skip-to #'(lambda (c) (not (whitespacep c)))
input keyword-end))
(values nil pos))))))
(multiple-value-bind (keyword next-token-pos)
(keyword-lookup src src-pos)
(let* ((punct-pos (skip-to #'punctuatorp src src-pos))
(token-len (cond (keyword (length keyword))
(punct-pos (- punct-pos src-pos))
(t (- (length src) src-pos))))
(token-val (if keyword
keyword
(trim-whitespace
(subseq src src-pos (+ src-pos token-len))))))
(setf src-pos (cond (keyword next-token-pos)
(punct-pos punct-pos)
(t (length src))))
(values (make-token :kind (if keyword :keyword :ident)
:value token-val
:length (length token-val)
:position src-pos)
src-pos)))))
Which is pretty straight-forward function that tries to tokenize something
like a := 0;
into a following structure:
SILA/LEXER> (tokenize "a := 0;")
#S(TOKEN
:KIND :IDENT
:POSITION 2
:LENGTH 1
:VALUE "a"
:NEXT #S(TOKEN
:KIND :PUNCT
:POSITION 4
:LENGTH 2
:VALUE ":="
:NEXT #S(TOKEN
:KIND :NUM
:POSITION 6
:LENGTH 1
:VALUE "0"
:NEXT #S(TOKEN
:KIND :PUNCT
:POSITION 7
:LENGTH 1
:VALUE ";"
:NEXT #S(TOKEN
:KIND :EOF
:POSITION 7
:LENGTH 0
:VALUE ""
:NEXT NIL)))))
One thing I immediately noticed with my current tokenization setup was that if
I wrote an identifier that start with a number, it would have been tokenized
similarly to how rest of the numbers would get tokenized. Which of course
wouldn’t work with identifiers. So I had to add a simple error handling in the
number generation side to check if the identifier can’t start with numbers:
(defun gen-number-token (src src-pos)
"Generate token for NUMBER and return it and the SRC-POS to the next token in
SRC."
#| [...] |#
;; Idents starting with a letter will be caught with
;; a different conditional so if this is hit, ident
;; starts with a number but contains letters, which
;; isn't acceptable.
(unless (every #'digit-char-p token-val)
(error 'lexer-error
:lexer-input src
:error-msg "Ident can't start with a number."
:token-pos src-pos))
#| [...] |#)
This setup also handles keywords as intended, which naturally will be needed
for the implementation of the control flow:
SILA/LEXER> (print-tokens (tokenize "{ a := 0; return a; }"))
#S(TOKEN :KIND PUNCT :POSITION 1 :LENGTH 1 :VALUE {)
#S(TOKEN :KIND IDENT :POSITION 4 :LENGTH 1 :VALUE a)
#S(TOKEN :KIND PUNCT :POSITION 6 :LENGTH 2 :VALUE :=)
#S(TOKEN :KIND NUM :POSITION 8 :LENGTH 1 :VALUE 0)
#S(TOKEN :KIND PUNCT :POSITION 9 :LENGTH 1 :VALUE ;)
#S(TOKEN :KIND KEYWORD :POSITION 17 :LENGTH 6 :VALUE return)
#S(TOKEN :KIND IDENT :POSITION 18 :LENGTH 1 :VALUE a)
#S(TOKEN :KIND PUNCT :POSITION 19 :LENGTH 1 :VALUE ;)
#S(TOKEN :KIND PUNCT :POSITION 21 :LENGTH 1 :VALUE })
#S(TOKEN :KIND EOF :POSITION 21 :LENGTH 0 :VALUE )
; No value
Parsing side of things is little bit more involved on the other hand. I
started the implementation with writing following structures:
(defstruct (ast-node-variable
(:include ast-node)
(:copier nil))
(object (util:required 'object) :type object :read-only t))
(defstruct (object
(:copier nil))
(name (util:required 'name) :type string :read-only t)
(offset 0 :type integer)
(next nil :type t))
(defstruct (ast-node-assign
(:include ast-node)
(:copier nil))
(var (util:required 'var) :type ast-node-variable :read-only t)
(expr (util:required 'expr) :type t :read-only t))
Parsing the variables and assign statements then looks like this:
(defun parse-assign-node (tok)
(let (node)
(multiple-value-bind (eql-node rest)
(parse-equality-node tok)
(setf node eql-node
tok rest))
(when (string= (lex:token-value tok) ":=")
(multiple-value-bind (expr-node rest)
(parse-assign-node (lex:token-next tok))
(setf node (make-ast-node-assign :var node
:expr expr-node)
tok rest)))
(values node tok)))
;;; [...]
(defvar *local-variables* nil
"Global variable for holding local variable objects.")
(defun parse-primary-node (tok)
(cond
#| [...] |#
((eq (lex:token-kind tok) :ident)
(let* ((name (lex:token-value tok))
(var (find-local-var name)))
(when (null var)
(setf var (make-object :name name :next *local-variables*))
;; New object should be in front of the list.
(setf *local-variables* var))
(values (make-ast-node-variable :object var)
(lex:token-next tok))))
#| [...] |#
(t (error "Unexpected token value: ~a" tok))))
Since I’m using recursive descent parsing, parsing the variable node happens
via the equality parsing function which trickles down all the way to primary
node. Again relatively straight forward parsing here. One thing to worth
noting is how I save all the local variables to separate variable called
*local-variables*
. This is mainly for the fact that when it comes to code
generation I need to save the fixed offsets in the memory for each variable in
use. So I need to know all the variables that I have parsed. There is probably
bunch of optimizations that could be done, but for now, I’m not interested in
those.
Setting those variable offsets happens after the whole program has been
parsed:
(defstruct (func
(:copier nil))
(body (util:required 'body) :type t :read-only t)
(locals (util:required locals) :type t :read-only t)
(stack-size 0 :type integer))
;;; [...]
(defun parse-program (tok)
(labels ((align-to (n align)
"Round N to the nearest multiple of ALIGN."
(* (ceiling n align) align))
(set-lvar-offsets (program)
(let ((offset 0))
(loop :for obj := (func-locals program)
:then (setf obj (object-next obj))
:until (null obj)
:do (progn
(incf offset 8)
(setf (object-offset obj) (- offset))))
(setf (func-stack-size program) (align-to offset 16)))
(values)))
(let* ((head (make-ast-node))
(cur head))
(loop :until (eq (lex:token-kind tok) :eof)
:do (multiple-value-bind (node rest)
(parse-statement-node tok)
(setf (ast-node-next cur) node)
(setf cur (ast-node-next cur))
(setf tok rest)))
(let ((program (make-func :body (ast-node-next head)
:locals *local-variables*)))
(set-lvar-offsets program)
program))))
And that’s about for tokenization and parsing. When parsing a program with
local variables it should look something like this:
SILA/PARSER> (parse-program (lex:tokenize "{ a := 0; return a; }"))
#S(FUNC
:BODY #S(AST-NODE-BLOCK
:NEXT NIL
:BODY #S(AST-NODE-EXPRESSION
:NEXT #S(AST-NODE-RETURN
:NEXT NIL
:EXPR #S(AST-NODE-VARIABLE
:NEXT NIL
:OBJECT #S(OBJECT
:NAME "a"
:OFFSET -8
:NEXT NIL)))
:EXPR #S(AST-NODE-ASSIGN
:NEXT NIL
:VAR #S(AST-NODE-VARIABLE
:NEXT NIL
:OBJECT #S(OBJECT
:NAME "a"
:OFFSET -8
:NEXT NIL))
:EXPR #S(AST-NODE-INTEGER-LITERAL
:NEXT NIL
:VALUE 0))))
:LOCALS #S(OBJECT :NAME "a" :OFFSET -8 :NEXT NIL)
:STACK-SIZE 16)
Finally, we can proceed to the code generation aspects!
To implement local variables with x86 assembly language, we need to write epilogue
and prologue for our function (currently only main function) which prepares
the stack for use within the function. These will look like this:
; Prologue
push %rbp
mov %rsp, %rbp
sub STACK_SIZE, %rsp
; Epilogue
mov %rbp, %rsp
pop %rbp
ret
So what’s happening here:
Prologue:
push %rbp
: Saves the value of the base pointer register (%rbp
) onto
the stack.mov %rsp, %rbp
: Sets the value of the stack pointer (%rsp
) as the new
base pointer (%rbp
).sub STACK_SIZE, %rsp
: Allocates space on the stack for local variables
by subtracting a amount required by our local variables (STACK_SIZE
,
calculated above) from the stack pointer (%rsp
).
Epilogue:
mov %rbp, %rsp
: Restores the stack pointer to its original value by
copying the value of the base pointer back to the stack pointer.pop %rbp
: Restores the original value of the base pointer by popping it
from the stack.ret
: Returns control flow back to the calling function.
in x86, similar thing could be achieved with enter
and leave
instructions but they do little bit more than just pushing/popping and
moving. So I might start using those at some point if the prologue and
epilogue starts to get more complicated. For now, this raw assembly is good
enough for me.
So, if we want to write generate x86-64 code for the code we have above ({ a := 0; return a; }
), we have to write the following assembly:
.globl main
main:
; Prologue
push %rbp
mov %rsp, %rbp
sub $16, %rsp
; Saving 'a' to an address offset relative to the base pointer.
lea -8(%rbp), %rax
push %rax
mov $0, %rax
pop %rdi
mov %rax, (%rdi)
; Calculating the address offset of 'a' and moving it to rax for return.
lea -8(%rbp), %rax
mov (%rax), %rax
; Epilogue
mov %rbp, %rsp
pop %rbp
ret
Again, this code could be cleaned up and optimized quite a bit and
doesn’t need to be so involved with simple code like this, but I’m leaving
the code generation optimizations tasks for future me.
With the code above, we now have a great test for starting to implement the code generation itself. In that, with the AST tree that we generated above, generating code for it is also pretty trivial:
(defun generate-expression (node)
"Recursively generate the x86-64 assembly code."
(flet ((lea (node)
"Load effective address"
(format nil "lea ~d(%rbp), %rax"
(parser:object-offset
(parser:ast-node-variable-object node)))))
(let ((insts (make-inst-array)))
(cond
#| [...] |#
((parser:ast-node-variable-p node)
(vector-push-extend (lea node) insts)
(vector-push-extend "mov (%rax), %rax" insts)
insts)
((parser:ast-node-assign-p node)
(vector-push-extend (lea (parser:ast-node-assign-var node)) insts)
(vector-push-extend (asm-push) insts)
(do-vector-push-inst (generate-expression
(parser:ast-node-assign-expr node)) insts)
(vector-push-extend (asm-pop "rdi") insts)
(vector-push-extend "mov %rax, (%rdi)" insts)
insts)
#| [...] |#))))
So the code above essentially is for the following assembly code:
lea OFFSET(%rbp), %rax
push %rax
mov VALUE, %rax
pop %rdi
mov %rax, (%rdi)
In case we want to return the value we need to do the following:
(defun generate-statement (node)
(let ((insts (make-inst-array)))
(cond
#| [...] |#
((parser:ast-node-return-p node)
(do-vector-push-inst (generate-expression
(parser:ast-node-return-expr node)) insts)
(vector-push-extend "jmp .L.return" insts)
insts)
#| [...] |#)))
Which is for generating assembly code for statement like return <expr>;
.
Here I decided the use jmp jmp .L.return
, so in case I would write an
return
statement deeply nested in other statements, like if
or for
etc.,
I could just exit early from those and jump directly to the epilogue.
So printing the whole program might look something like this:
(defun emit-code (src &key (stream nil) (indent 2) (indent-tabs t))
"Emit assembly code from given source code. Currently emits only x86-64 and
only Linux is tested."
;; Init environment
(setf parser:*local-variables* nil
*stack-depth* 0
*label-count* 0)
(let ((indent (if indent-tabs
#\Tab
(coerce (make-list indent
:initial-element #\Space)
'string))))
(let ((program (parser:parse-program (lex:tokenize src))))
;; TODO(topi): these instructions probably should be collected to some
;; structure so they can be divided in to sections more easily when the
;; programs become more complex.
(format stream
"~{~a~%~}"
(alexandria:flatten
(list
;; ASM Directive
(format nil "~a.globl main" indent)
;; Main Label
"main:"
;; Prologue
(format nil "~apush %rbp" indent)
(format nil "~amov %rsp, %rbp" indent)
(format nil "~asub $~a, %rsp" indent
(parser:func-stack-size program))
;; ASM Routine
(loop :for inst
:across (generate-statement (parser:func-body program))
:collect (if (string= (subseq inst 0 3) ".L.")
;; If instruction is label (.L. prefix),
;; don't indent it.
inst
(format nil "~a~a" indent inst)))
;; Return label
".L.return:"
;; Epilogue
(format nil "~amov %rbp, %rsp" indent)
(format nil "~apop %rbp" indent)
;; Return
(format nil "~aret" indent)))))))
Parameters stream
, indent
and indent-tabs
are not needed for
the functionality of code generation itself, but they are just used here as
helpers.
With that we can write a couple of test to see that programs actually work as
intended:
(deftest test-compilation-and-compare-rc
(testing "Integer"
(ok (compile-program-and-compare-rc "{ return 0; }" 0))
(ok (compile-program-and-compare-rc "{ ;;;;; return 1; }" 1)))
(testing "Arithmetics"
(ok (compile-program-and-compare-rc "{ return 5 + 40 - 20; }" 25))
(ok (compile-program-and-compare-rc "{ return 2 / (1 + 1) * 8; }" 8)))
(testing "Unary"
(ok (compile-program-and-compare-rc "{ return - -10; }" 10))
(ok (compile-program-and-compare-rc "{ return -10+20; }" 10)))
(testing "Comparisons"
(ok (compile-program-and-compare-rc "{ return 0==1; }" 0))
(ok (compile-program-and-compare-rc "{ return 1!=1; }" 0))
(ok (compile-program-and-compare-rc "{ return 0==0; }" 1))
(ok (compile-program-and-compare-rc "{ return 1!=0; }" 1))
(ok (compile-program-and-compare-rc "{ return 0<1; }" 1))
(ok (compile-program-and-compare-rc "{ return 1<1; }" 0))
(ok (compile-program-and-compare-rc "{ return 1<=1; }" 1))
(ok (compile-program-and-compare-rc "{ return 2<=1; }" 0))
(ok (compile-program-and-compare-rc "{ return 0<1; }" 1))
(ok (compile-program-and-compare-rc "{ return 1<1; }" 0))
(ok (compile-program-and-compare-rc "{ return 1>=1; }" 1))
(ok (compile-program-and-compare-rc "{ return 1>=2; }" 0)))
(testing "Multiple statements"
(ok (compile-program-and-compare-rc "{ return 1; 2; 3; }" 1))
(ok (compile-program-and-compare-rc "{ 1; return 2; 3; }" 2))
(ok (compile-program-and-compare-rc "{ 1; 2; return 3; }" 3)))
(testing "Variables"
(ok (compile-program-and-compare-rc "{ a:=8; return a; }" 8))
(ok (compile-program-and-compare-rc "{ a:=3; b:=5; return a+b; }" 8))
(ok (compile-program-and-compare-rc "{ foo:=3; bar:=5; return foo+bar; }" 8))
(ok (compile-program-and-compare-rc "{ foo2:=3; bar2:=5; return foo2+bar2; }" 8)))
(testing "Block"
(ok (compile-program-and-compare-rc "{ 1; { 2; } return 3; }" 3))))
Testing System sila/tests
;; testing 'sila/tests/compiler'
test-compilation-and-compare-rc
Integer
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0; }" 0) to be true. (492ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ ;;;;; return 1; }" 1) to be true. (441ms)
Arithmetics
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 5 + 40 - 20; }" 25) to be true. (423ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 2 / (1 + 1) * 8; }" 8) to be true. (431ms)
Unary
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return - -10; }" 10) to be true. (419ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return -10+20; }" 10) to be true. (454ms)
Comparisons
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0==1; }" 0) to be true. (476ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1!=1; }" 0) to be true. (483ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0==0; }" 1) to be true. (470ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1!=0; }" 1) to be true. (491ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0<1; }" 1) to be true. (496ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1<1; }" 0) to be true. (487ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1<=1; }" 1) to be true. (474ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 2<=1; }" 0) to be true. (498ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 0<1; }" 1) to be true. (503ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1<1; }" 0) to be true. (502ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1>=1; }" 1) to be true. (471ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1>=2; }" 0) to be true. (418ms)
Multiple statements
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ return 1; 2; 3; }" 1) to be true. (433ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ 1; return 2; 3; }" 2) to be true. (426ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ 1; 2; return 3; }" 3) to be true. (441ms)
Variables
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ a:=8; return a; }" 8) to be true. (432ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ a:=3; b:=5; return a+b; }" 8) to be true. (454ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ foo:=3; bar:=5; return foo+bar; }" 8) to be true. (467ms)
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC
"{ foo2:=3; bar2:=5; return foo2+bar2; }" 8) to be true. (452ms)
Block
✓ Expect (COMPILE-PROGRAM-AND-COMPARE-RC "{ 1; { 2; } return 3; }" 3) to be true. (427ms)
;; testing 'sila/tests/codegen'
test-codegen-x86-64
Integer
✓ { return 0; }
✓ { return 42; }
Add and subtraction
✓ { return 5+20-4; }
✓ { return 5 + 20 - 4 ; }
Division and multiplication
✓ { return 2 / (1 + 1) * 8; }
Unary
✓ { return - -10; }
✓ { return -10+20; }
✓ { return - - -10; }
Comparison
✓ { return 1==1; }
✓ { return 1>=1; }
✓ { return 1<=1; }
✓ { return 1<1; }
✓ { return 1>1; }
Multiple statements
✓ { return 1;2;3; }
✓ { 1;return 2;3; }
✓ { 1;2;return 3; }
Variables
✓ { a:=8;return a; }
✓ { foo:=5;bar:=8;return foo+bar; }
Block
✓ { 1; { 2; } return 3; }
✓ 1 test completed
Summary:
All 1 test passed.
He-hey! Tests seemed to pass! Until next time!