Building GPT From Scratch

Stumbled upon an interesting video by Andrej Karpathy talking about building Generatively Pretrained Transformer (GPT). I definitely need to take some time to start looking through the papers linked in the video on wonder how things work out.

Youtube video kCc8FmEb1nY

Haskell and C++ FFI for Fun and Profit

As some of you may know, I have a soft spot for functional programming. So most of the code I tend to write outside my professional work tends to be written in Haskell. Sometimes some other functional languages too, but Haskell is definitely the one that I tend to default if I’m able to have a choice. Unfortunately, the world we live in is still living in a world of imperative languages, and due to the sheer amount of that code, we most likely will be living in that world for years to come; I can’t say forever since the planet is not here forever, but we’re talking about a long, long time. It’s also the same for my own professional life. Even though I work as a DevOps consultant, I’m mainly working on projects where I’m contributing code, mainly in Go and C++ and, to a lesser extent, in JavaScript and Python. Of course, I would like to work more with something like Haskell for a living, but at the same time, I’m not very fanatical when it comes to tooling as long as it gets the job done. But still, the interaction between imperative and functional worlds is fascinating. So we are talking about foreign function interface – FFI – here.

Most code I tend to write, at least considering lines of code, is definitely, Haskell and C++, and I’m quite comfortable with both of those languages, although the latter one may be a little more reluctantly. Lately, I’ve been banging my head against the wall with the FFI of Haskell since I wanted to write a particular piece of code in mainly Haskell, but I needed something from the world of C++. Could have I written this in just Haskell? Probably, but despite enjoying functional programming more over imperative code, certain data structures, algorithms and libraries work better in the C++ world and sometimes you might need to “extra oomph” when it comes to performance.

Okay, better is a strong word for this, maybe better on how they work in the imperative world and but they don’t translate one to one to fully functional and pure world of Haskell where there are better options for data structures and algorithms. Sure some libraries are only written in C++, so there is nothing that can be done with it. For functional data structures, I definitely recommend Purely Functional Data Structures by Chris Okasaki.

Generally speaking, when it comes to FFI, interacting with C is pretty straightforward, but when C++ comes into play, some extra ceremony is required due to the nature of the language with things like unstable ABI, more extensive language and more complex features. Thankfully, C++ offers an easy way to write C ABI compatible code with the extern "C" keyword, which makes it so that the code can be called from C. But there is a caveat in this. While you can use C++ features inside the function body, when you use extern "C" functions type signature needs to be compatible with C. So no fancy STL features etc., here.

Also, when we talk about C and/or C++, memory management comes into question. So if you want to bind those languages to some memory-managed language like Haskell, you need to ensure that the memory gets handled correctly. C++ offers fancy features like RAII, smart pointers and stuff for making memory management a little bit easier, but that’s not the case in C.

Let’s start by creating a small Cabal project and some sample C++ library that we would like to interact with from Haskell.

common base
  ghc-options: -Wall -Wextra -Wno-orphans -Wno-name-shadowing
  default-language: Haskell2010
  build-depends: base ^>=4.16.3.0

executable arith
  import: base
  main-is: Main.hs
  hs-source-dirs: app

  -- C++ bits
  cxx-options: -std=c++20 -Wall -Werror -Wextra
  cxx-sources: cbits/arith_capi.cc cbits/arith.cc
  include-dirs: cbits
  extra-libraries: stdc++

While otherwise, it’s a reasonably standard Cabal boilerplate, the interesting bits are the lines relating to C++. Basically, what we do here is define some compiler options for the C++ compiler, where the C++ source files are located, and the relevant header files. Commonly, in Haskell, if you’re library/application has had anything related to C/C++ files relevant for those have usually resided in a directory called cbits, but nothing forces you to follow this convention. When that’s done, we can proceed to write some “earth-shattering” C++ library for our application.

// arith.h

#pragma once

struct arith {
  arith() noexcept;

  int add(int x, int y) noexcept;
  int sub(int x, int y) noexcept;
  int mult(int x, int y) noexcept;
  int div(int x, int y) noexcept;
};
// arith.cc

#include "arith.h"

arith::arith() noexcept {}

int arith::add(int x, int y) noexcept { return x + y; }
int arith::sub(int x, int y) noexcept { return x - y; }
int arith::mult(int x, int y) noexcept { return x * y; }
int arith::div(int x, int y) noexcept { return x / y; }

We’ll just define a simple arith struct/class with some elementary arithmetic operations. Nothing too fancy. This will work as our C++ library that we’ll interact with via Haskell. After that’s done, we need to provide some simple C API for this library so that we can interact with the library via the stable C ABI.

// arith_capi.h

#pragma once

#ifdef __cplusplus
extern "C" {
#endif

typedef struct arith arith;

extern arith *arith_new();

extern void arith_delete(arith *p);

extern int arith_add(arith *p, int x, int y);
extern int arith_sub(arith *p, int x, int y);
extern int arith_mult(arith *p, int x, int y);
extern int arith_div(arith *p, int x, int y);

#ifdef __cplusplus
}
#endif
// arith_capi.cc

#include "arith_capi.h"
#include "arith.h"

extern "C" {
  struct arith *arith_new() { return new arith(); }

  void arith_delete(arith *p) { delete p; }

  int arith_add(arith *p, int x, int y) { return p->add(x, y); }

  int arith_sub(arith *p, int x, int y) { return p->sub(x, y); }

  int arith_mult(arith *p, int x, int y) { return p->mult(x, y); }

  int arith_div(arith *p, int x, int y) { return p->div(x, y); }
}

In our C API, you’ll notice that we need to wrap our functions inside extern "C" to ensure that they’re compatible with C ABI. Also since extern "C" is a C++ keyword, we’ll wrap it in #ifdef __cplusplus directive to ensure that it gets only used if we happen to call this via C++. In the actual implementation side, you can notice that we use new and delete to do the memory management. The thing to note here is that using those keywords in “modern C++” is very much frowned upon since the language offers better ways to do that management with features like RAII, smart pointers etc., which basically makes it so that programmer don’t need memory management explicitly like we do here, but instead, they can let the compiler do it for you. We on the other need to use those keywords since we are similar management but from Haskell with its foreign pointers, which makes it so that we are able to leave the memory management to Haskell’s runtime and garbage collector.

Now we have the C++ bits done, we can proceed on how to interact with that via Haskell. All Haskell’s FFI features reside behind GHC’s {-# LANGUAGE ForeignFunctionInterface #-} language extension. So first, we need to include that in our Haskell files (either on top of the file or in the project’s Cabal file), and we can already import some of the needed modules.

{-# LANGUAGE ForeignFunctionInterface #-}

module Main where

import Control.Exception ( mask_ )
import Foreign.Ptr ( FunPtr, Ptr )
import Foreign.C.Types ( CInt(..) )
import Foreign.ForeignPtr ( ForeignPtr, newForeignPtr, withForeignPtr )

What are we importing here?

After the importing shenanigans, we can proceed on to make foreign imports to our code so that we can actually call the C++ code we just wrote.

data Arith

foreign import ccall unsafe "arith_capi.h arith_new" c_arithNew :: IO (Ptr Arith)
foreign import ccall unsafe "arith_capi.h &arith_delete" c_arithDelete :: FunPtr (Ptr Arith -> IO ())
foreign import ccall unsafe "arith_capi.h arith_add" c_arithAdd :: Ptr Arith -> CInt -> CInt -> IO CInt
foreign import ccall unsafe "arith_capi.h arith_sub" c_arithSub :: Ptr Arith -> CInt -> CInt -> IO CInt
foreign import ccall unsafe "arith_capi.h arith_mult" c_arithMult :: Ptr Arith -> CInt -> CInt -> IO CInt
foreign import ccall unsafe "arith_capi.h arith_div" c_arithDiv :: Ptr Arith -> CInt -> CInt -> IO CInt

So what’s happening here:

Finally, you’re actually able to call these functions.

-- | Create a new foreign object that will be cleaned after it's not in use
-- anymore. It also uses mask_ in case the pointer leaks if an exception happens.
newArith :: IO (ForeignPtr Arith)
newArith = mask_ c_arithNew >>= newForeignPtr c_arithDelete

main :: IO ()
main = newArith >>= \arith -> withForeignPtr arith $ \ptr -> do
  -- Foreign object is now unwrapped to a foreign pointer which you can use in
  -- any FFI function you described above.
  c_arithAdd ptr 1 1 >>= print
  c_arithSub ptr 1 1 >>= print
  c_arithMult ptr 2 2 >>= print
  c_arithDiv ptr 2 2 >>= print

Now you should be able to run your program and interact with C++ in safe manner from the comfortable world of Haskell.

$ cabal run
... cabal stuff ...
2
0
4
1

So you can see that while calling C++ from a foreign language is definitely possible, it just requires a bit more ceremony than calling C from these kinds of languages. I initially started to bang my head against the wall with these FFI shenanigans just for the need to use some C++ interfaces that didn’t offer C API, so hopefully, this proves to be beneficial to some. At least, I can use this to freshen my memory in the future since I can guarantee that I’ll forget about it.

The Poetry of Programming

I found this great quote from Richard P. Gabriel, Lisp hacker and a poet, which summarizes nicely my feelings towards programming and how I see it as a creative field more so than technical:

“Writing code certainly feels very similar to writing poetry. When I’m writing poetry, it feels like the center of my thinking is in a particular place, and when I’m writing code the center of my thinking feels in the same kind of place. It’s the same kind of concentration. So, I’m thinking up possibilities, I’m thinking about, well, so how do I reinvent the code, gee, you know, what’s the simplest way to do this.”

– Richard P. Gabriel

Table-Driven Test Design in C++ with GoogleTest

I’ve always found tremendous value in testing my software. Especially what might be closest to home for developers - or at least should be - are unit tests. While unit tests are not necessarily the best way of making safe working code (this often requires a little bit more exhaustive testing) but at least they’re very beneficial for your future self and/or co-workers who might be working with your code since with them you can quickly see any new errors that might’ve come from regression.

That being said, often, writing unit tests can be quite cumbersome. I would love to see some mature tooling for randomized testing like QuickCheck in Haskell (and later some other languages too) that would “just work”, but often something like that just isn’t possible, especially when the project reaches a certain degree of complexity. Tests and test suites should be designed on their own as well as your code itself. Unfortunately, people tend to forget this. In these kinds of cases, quite simple table-driven test design can come to help!

I first stumbled upon table-driven test design when I was working with Go, since in there, this seems to be a quite popular way of doing unit tests, and at least, in my opinion, it works quite nicely!

Often while writing unit tests, you would want to write various failing and passing test cases, which often leads to quite a bit of duplication. For example:

TEST(TwoSumTests, PassingTest) {
  std::vector<int> nums{2, 7, 11, 15};
  auto got = twoSum(nums, 9);
  std::vector<int> expected{0, 1};
  EXPECT_EQ(got, expected);
}

TEST(TwoSumTests, FailingTest) {
  std::vector<int> nums{2, 7, 11, 15};
  auto got = twoSum(nums, 9);
  std::vector<int> expected{0, 123};
  EXPECT_NEQ(got, expected);
}

So even with this elementary example, we can see that most of the code in the test case is duplicated and/or boilerplate. So we can do better. For example, with quite a simple table for tests, we can loop through multiple tests without duplication and easily add new tests.

Regarding testing functions, we often care about what is going in and what should go out. Everything else in unit tests is often boilerplate. So where table-driven design help in setting up these input and expected outputs.

typedef struct {
  std::vector<int> nums;
  int target;
  std::vector<int> expected;
} twoSumTestCase;

TEST(TwoSumTests, BasicAssertions) {
  twoSumTestCase tests[] = {
    {
      std::vector<int>{2, 7, 11, 15},
      9,
      std::vector<int>{0, 1},
    },
    {
      std::vector<int>{3, 2, 4},
      6,
      std::vector<int>{1, 2},
    },
    {
      std::vector<int>{3, 3},
      6,
      std::vector<int>{0, 1},
    },
  };
  for (auto t : tests) {
    auto got = twoSum(t.nums, t.target);
    EXPECT_EQ(got, t.expected);
  }
}

So when we run this we can easily run all the tests at once:

[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TwoSumTests
[ RUN      ] TwoSumTests.BasicAssertions
[       OK ] TwoSumTests.BasicAssertions (0 ms)
[----------] 1 test from TwoSumTests (0 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (0 ms total)
[  PASSED  ] 1 test.

To demonstrate failing test case, let’s add new test there:

{
  std::vector<int>{3, 3},
  6,
  std::vector<int>{0, 2},
},

We get the following output:

Expected equality of these values:
  got
    Which is: { 0, 1 }
  t.expected
    Which is: { 0, 2 }
[  FAILED  ] TwoSumTests.BasicAssertions (0 ms)
[----------] 1 test from TwoSumTests (0 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (0 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] TwoSumTests.BasicAssertions

 1 FAILED TEST

Extending test cases

Of course, with that information, test logs can be pretty misleading. Thankfully, we can just change the table to our liking. For example, we could add names to the tests:

typedef struct {
  std::string name; 
  std::vector<int> nums;
  int target;
  std::vector<int> expected;
} twoSumTestCases;

That we could then use on diagnostic messages in GTest’s macros:

EXPECT_TRUE(false) << "diagnostic message"; // format to your liking

With this kind of formatting, we easily extend these test cases with just playing around a little bit with your test struct, so it could involve enumeration, subtests and much more. Which could help you making your tests/code easier to fix, but also easier for adding new useful and good tests.

Adventures in Linear Types

Lately, I have dedicated a large part of my free time to audio software. I have done this mainly out of interest in the subject due to my history in music. But at the same time, I also thought writing audio software could be a fun passion project or even a small business that I could work on alongside my day job. I don’t see myself replacing my current job with this, but maybe I could dedicate 20% of my work time to it.

The world of audio software is a pretty exciting place. It involves a lot of low-level systems like signals and real-time operations, complex math at times, and something you can feel or at least hear. And what’s great, I don’t have any background in this stuff!

Now I have programmed most of my life and played around with RTOS, but when it comes to writing algorithms for manipulating digital signals, that’s new stuff for me. However, I have experience with the topic from the user point of view since I have been making music for almost as long as I have programmed. This experience involves playing instruments, how effects affect the sound, how mixing and mastering works etc. But what do linear types have to do with any of this?

Signals in the Wild

Like I said earlier, signal processing (not necessarily just audio) is very low-level stuff. So when working with signals in software, you often need to work with C or C++. This is mainly due to the performant and close-to-hardware nature of the languages required to handle and manipulate signals optimally and efficiently.

Digital signal processing is also full of algorithms. The standard workflow for people in this industry seems to be that these applications are prototyped on some high-level language before being produced. Often in languages/tools like MATLAB, Octave, Mathematica, and similar, very heavily math-oriented languages and tools. Julia has appeared to grow in popularity also in this world. These high-level languages are mainly used due to the speed of development.

It is also not uncommon to see FPGA being used in these applications. For several reasons, they are reconfigurable hardware, so you can tailor and deploy computation units and data buses specifically designed for your particular needs. So if you’re working with digital hardware, you can’t go wrong with FPGAs. In this world, VHDL or Verilog comes in handy.

As you can see, overall, the applications tend to involve a lot of different low-level concepts but, at the same time, high-level topics in terms of prototyping. But, as the post’s header might hint, I’m not interested in the prototyping aspects of signal processing since I think those are all well and good. Instead, I’m interested in having a small thought experiment on whether the low-level elements could be improved somehow.

I would consider myself a functional programmer first and foremost, even though I mainly write imperative and/or object-oriented code, at least professionally. Now in my free time and in non-trivial side projects (that are not signal processing related), I like to work with weird languages like Haskell or Common Lisp. Unfortunately, as I mentioned above, almost all the work in this signal-processing world is written in C or C++, emphasizing the latter. However, I completely understand why these languages are used since we talk about real-time programming, so latency needs to be minimized.

“Real-time” can be understood that the program has to produce the correct result but also on a certain amount of time (which varies between systems).

If we use audio processing as an example, typically, you would have some sort of processing function in your code that would work in the audio callback:

process :: BufferRef -> ()

This function would get its callback from a sound card or some input device, e.g. microphone. After it has received its callback, this block of code (whatever might be inside it) would write the corresponding audio data into the given buffer. Which would then be played by the speakers or vice-versa when recording.

This procedure is basically what should happen in real-time lots of times when we are doing audio processing. Audio software is often set up to send these audio callbacks from a high-priority “real-time” thread with a very short latency between the callbacks, ~1-10ms (varies between systems).

To achieve this minimal latency between callbacks, you often can’t rely on garbage collection since you can’t be sure when your program launches it. I dare to say that most of the software benefits from GC significantly, but in the audio, making GC right is very hard. What makes it hard is that if GC launches at the wrong time or the latency between callbacks gets too large, garbage data will leak into the buffer, causing unwanted sounds.

Most other software might only see a slight latency in their computations if they do profiling, so that might not be the end of the world, of course, depending on your context. But in audio, you cannot let that happen since you can literally hear that glitch, which is unforgivable.

When it comes to C or C++, I think everyone knows their foot guns that involve memory management. Thankfully in modern C++, it’s not that bad (as long as you follow core guidelines), but there is still a lot of unnecessary baggage when it comes to safe code in these languages.

Could there be any way we could use garbage-collected language while doing “real-time” operations and how that could be achieved?

Linear Types

GHC 9.0 introduced support for Linear Haskell, which can be enabled with -XLinearTypes. One of the significant use cases for linear types is implementing latency-sensitive real-time services and analytics jobs. As I mentioned earlier a major issue in this use case is GC pauses, which can happen at arbitrary points for reasonably long periods. The problem is exacerbated as the size of the working set increases. The goal is to partially or entirely eliminate garbage collection by controlling aliasing and making memory management explicit but safe.

So what, then, are linear types. Henry Baker described linear types and their benefits in his paper Lively Linear Lisp — ‘Look Ma, No Garbage!’ and also on “Use-once” variables and linear objects: storage management, reflection and multi-threading. As you can see, we are not talking about a new topic. We are talking about types whose instances must be held in a linear variable. A variable is linear if it’s accessed exactly once in its scope. Same for a linear object, their reference count is always 1. When we have this safety guarantee on the type-level, we can avoid synchronization and GC and also, we could update linear objects in place since that would be referentially transparent.

Avoiding Garbage Collection

So why can we avoid synchronization and GC with linear types? If we would consider the following function as an example:

linearFunc :: a %1-> b

On their own, linear types only gives a type to functions that consume their argument exactly once when their result is consumed precisely once. So, alone, they don’t make your programs any faster or safer for resources. Still, they allow you to write many optimizations and resource-safe abstractions that weren’t possible before.

First, since linear values can only be used once, these values cannot be shared. This means that, in principle, they shouldn’t be subject to GC. But this depends on the value consumer since that may very well do some sort of de-allocation on the spot. One way to mitigate this could be to store these values to heap outside of GC’s control.

While utilizing heap for these values alone would diminish the GC, it would introduce some overhead to your program, which could increase the total running time of your application. But if we continue using real-time systems as an example, this isn’t necessarily bad.

In real-time systems, optimizations often happen only to the worst-case scenarios. This is because you don’t really care about your latencies as long as they stay within the particular window. But you watch that those latencies should never go above your maximum limit, and this is primarily where optimisation utilizing linear types could come in handy.

Practical Linear Types

Linear types are a blessing in GC languages if you intend to do anything safely in the low-level world. I would like to continue this post with some practical examples of how Haskell utilizes these types and how they can make low-level optimizations and resources safer in your Haskell code, but that deserves its own post.