Table-driven test design in C++ with GoogleTest

24th of April, 2022

I've always found tremendous value in testing my software. The kind of testing closest to home for developers - or at least it should be - is unit testing. While unit tests are not necessarily the best way to guarantee safe, working code (that often requires somewhat more exhaustive testing), they're very beneficial for your future self and/or co-workers who might be working with your code, since with them you can quickly spot any new errors introduced by a regression.

That being said, writing unit tests can often be quite cumbersome. I would love to see mature tooling for randomized testing, like QuickCheck in Haskell (and later in some other languages too), that would "just work", but often something like that just isn't possible, especially once a project reaches a certain degree of complexity. Tests and test suites should be designed with the same care as the code itself. Unfortunately, people tend to forget this. In these kinds of cases, a quite simple table-driven test design can come to the rescue!

I first stumbled upon table-driven test design when I was working with Go, where it seems to be quite a popular way of writing unit tests, and in my opinion it works quite nicely!

Often while writing unit tests, you want to cover various failing and passing cases, which easily leads to quite a bit of duplication. For example:

Examples here are written in C++20 using Google Test.
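
For reference, here is a minimal brute-force twoSum (my own sketch; the post itself never shows the implementation): given a vector of numbers and a target, it returns the indices of the two numbers that sum to the target.

#include <cstddef>
#include <vector>

// Brute-force twoSum: return the indices of the two numbers in `nums`
// that add up to `target`, or an empty vector if no pair exists.
std::vector<int> twoSum(const std::vector<int>& nums, int target) {
  for (std::size_t i = 0; i < nums.size(); ++i) {
    for (std::size_t j = i + 1; j < nums.size(); ++j) {
      if (nums[i] + nums[j] == target) {
        return {static_cast<int>(i), static_cast<int>(j)};
      }
    }
  }
  return {};
}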

TEST(TwoSumTests, PassingTest) {
  std::vector<int> nums{2, 7, 11, 15};
  auto got = twoSum(nums, 9);
  std::vector<int> expected{0, 1};
  EXPECT_EQ(got, expected);
}

TEST(TwoSumTests, FailingTest) {
  std::vector<int> nums{2, 7, 11, 15};
  auto got = twoSum(nums, 9);
  std::vector<int> expected{0, 123};
  EXPECT_NE(got, expected);
}

So even with this very simple example, we can see that most of the code in the test cases is duplicated and/or boilerplate. We can do better: with quite a simple table of test cases, we can loop through multiple tests without duplication, and adding new tests becomes easy.

When it comes to testing functions, we often care only about what goes in and what should come out; almost everything else in a unit test is boilerplate. This is where table-driven design helps: it reduces each case to its inputs and expected outputs.

struct TwoSumTestCase {
  std::vector<int> nums;
  int target;
  std::vector<int> expected;
};

TEST(TwoSumTests, BasicAssertions) {
  TwoSumTestCase tests[] = {
    {
      std::vector<int>{2, 7, 11, 15},
      9,
      std::vector<int>{0, 1},
    },
    {
      std::vector<int>{3, 2, 4},
      6,
      std::vector<int>{1, 2},
    },
    {
      std::vector<int>{3, 3},
      6,
      std::vector<int>{0, 1},
    },
  };
  for (const auto& t : tests) {
    auto got = twoSum(t.nums, t.target);
    EXPECT_EQ(got, t.expected);
  }
}

So when we run this, we can easily run all the test cases at once:

[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TwoSumTests
[ RUN      ] TwoSumTests.BasicAssertions
[       OK ] TwoSumTests.BasicAssertions (0 ms)
[----------] 1 test from TwoSumTests (0 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (0 ms total)
[  PASSED  ] 1 test.

To demonstrate a failing case, let's add a new entry to the table:

{
  std::vector<int>{3, 3},
  6,
  std::vector<int>{0, 2},
},

We get the following output:

Expected equality of these values:
  got
    Which is: { 0, 1 }
  t.expected
    Which is: { 0, 2 }
[  FAILED  ] TwoSumTests.BasicAssertions (0 ms)
[----------] 1 test from TwoSumTests (0 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (0 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] TwoSumTests.BasicAssertions

 1 FAILED TEST

Extending test cases

Of course, with output like that, the test log can be quite misleading: it doesn't tell you which entry in the table failed. Thankfully, we can just change the table to our liking. For example, we could add names to the test cases:

struct TwoSumTestCase {
  std::string name;
  std::vector<int> nums;
  int target;
  std::vector<int> expected;
};

We could then use the name in diagnostic messages through GTest's macros:

EXPECT_TRUE(false) << "diagnostic message"; // format to your liking

By playing around a little with your test struct, you can easily extend these test cases further: they could involve enumeration, subtests, and much more. This can make fixing your tests and code easier, and it also lowers the bar for adding new, useful tests.
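
Putting it together, the loop can use the name so that a failure points straight at the offending table entry. This is my sketch; SCOPED_TRACE is a standard GoogleTest macro that attaches its message to every failure inside the current scope:

for (const auto& t : tests) {
  SCOPED_TRACE(t.name);  // prefix any failure below with the case name
  auto got = twoSum(t.nums, t.target);
  EXPECT_EQ(got, t.expected);
}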

Tags: cpp, programming

Now playing: Justin Townes Earle - Appalachian Nightmare


Why are modern programming languages like this?

20th of April, 2022

For some weird reason, I've always tremendously enjoyed the topics of performance and optimization, especially trying to understand why a compiler performs its various instruction-level optimizations for the hardware. It's quite a trip to see years of hardware design expertise reflected in how modern computing works. But recently that got me wondering: is there really a point to all that?

Now don't get me wrong, fewer instructions usually mean slightly faster computation, and that's a good thing, right? But considering modern hardware, is it necessary? Should we be concerned that our compilers work so hard to minimize the number of instructions in the generated code? Of course, that would make sense if we were still living in a world where computation was slow (I don't know, the 50s? The 70s?).

This drive to minimize instruction counts can easily lead to funky situations where familiar operations, like '+' or '<', start behaving unintuitively. And in these situations, if the program happens to behave incorrectly, it's usually considered the programmer's fault.
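
As a concrete illustration (a classic example of my own choosing, not from the original post): in C++, signed integer overflow is undefined behavior, so an optimizing compiler is allowed to assume it never happens and may remove a check like the one below entirely.

// With optimizations enabled, compilers commonly fold this whole
// function to `return false`: signed overflow is undefined behavior,
// so `x + 1` is assumed never to wrap around.
bool looksLikeOverflowCheck(int x) {
  return x + 1 < x;  // a programmer might expect `true` for INT_MAX
}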

On modern hardware, computation is more or less free, and we almost flirt with the idea of a concrete Turing machine with an infinite amount of memory. Shouldn't the fact that we mostly use this kind of hardware be reflected in our programming languages as well? Especially when a single cache miss can easily cost more run time than hundreds of add instructions. If extra instructions don't increase the size of the data or the program itself, what's wrong with them? We could add quite a bit of run-time computation to a program without affecting its total running time much.

So instead of focusing on minimizing the instructions in a language's output, we could focus on improving the semantics of the language and pretty much eliminate these common hard-to-find errors from our software. The problem is especially visible in languages that offer multiple features which do more or less the same thing but differ slightly in performance.

When a language accumulates several of these near-identical features, it easily ends up with an excess of features overall. Using a large variety of features in one code base easily leads to complex, hard-to-understand programs. This in turn often means that the features used in a project get restricted, so that its programmers stick to a common subset of the language.

A great example of this in "modern" programming languages is C++'s regular vs. virtual functions. These kinds of features easily lead programmers to waste their precious time on micro-optimizations that usually aren't worthwhile in the grand scheme of things. Worse, when we focus on such optimizations, we easily lose sight of what really matters: the large-scale behavior of the program.

Can we fix this at all? Probably not, since we are already so invested in these languages. We can point fingers in various directions and assign blame for the situation. A new programming language doesn't really solve the issue, since we simply can't rewrite everything in it, and migration would be a really slow process. Can we fix the existing languages? Probably not either, which is why we rely on external tools to analyze and check our programs, and on various conventions to follow, so that we can write the best code possible in these languages.

So modern computing is very exciting, but it can also be a mess…

Tags: computers, programming

Now playing: Gillian Welch - The Way It Will Be


Adventures in linear types

9th of January, 2022

Lately, I have dedicated a large part of my free time to audio software. I have done this mainly out of interest in the subject due to my history in music. But at the same time, I also thought writing audio software could be a fun passion project or even a small business that I could work on alongside my day job. I don't see myself replacing my current job with this, but maybe I could dedicate 20% of my work time to it.

The world of audio software is a pretty exciting place. It involves a lot of low-level systems stuff like signals and real-time operations, complex math at times, and something that you can feel or at least hear. And what's great, I don't have any background in this stuff!

Now, I have programmed most of my life and played around with RTOSes, but when it comes to writing algorithms for manipulating digital signals, that's new stuff for me. However, I have experience with the topic from the user's point of view, since I have been making music for almost as long as I have programmed: playing instruments, hearing how effects affect the sound, learning how mixing and mastering work, etc. But what do linear types have to do with any of this?

Signals in the wild

Like I said earlier, signal processing (not necessarily just audio) is very low-level stuff. So when it comes to working with signals in software, you often need to work with C or C++, mainly because handling and manipulating signals optimally and efficiently calls for the performance and close-to-hardware nature of these languages.

Digital signal processing is also full of algorithms. The standard workflow in this industry seems to be to prototype these applications in a high-level language before moving them to production, often in languages and tools like MATLAB, Octave, Mathematica, and similar heavily math-oriented environments. Julia also appears to be growing in popularity in this world. These high-level languages are used mainly for their speed of development.

It is also not uncommon to see FPGAs used in these applications, and for good reason: they are reconfigurable hardware, so you can tailor computation units and data buses to your particular needs and deploy them. If you're working with digital hardware, you can't go wrong with FPGAs. In that world, VHDL or Verilog comes in handy.

As you can see, these applications overall tend to involve a lot of low-level concepts, but at the same time high-level topics on the prototyping side. But, as the post's title might hint, I'm not interested in the prototyping aspects of signal processing, since I think those are all well and good. Instead, I want to run a small thought experiment on whether the low-level elements could somehow be improved.

I would consider myself a functional programmer first and foremost, even though I mainly write imperative and/or object-oriented code, at least professionally. In my free time and in non-trivial side projects (that are not signal-processing related), I like to work with weird languages like Haskell or Common Lisp. Unfortunately, as I mentioned above, almost all the work in this signal processing world is done in C or C++, with emphasis on the latter. However, I completely understand why these languages are used, since we are talking about real-time programming, where latency needs to be minimized.

"Real-time" can be understood that the program has to produce the correct result but also on a certain amount of time (which varies between systems).

If we use audio processing as an example, you would typically have some sort of processing function in your code that does its work in the audio callback:

process :: BufferRef -> ()

This function gets its callback from either a sound card or some input device, e.g. a microphone. When the callback fires, this block of code (whatever might be inside it) writes the corresponding audio data into the given buffer, which is then played through the speakers, or vice versa when recording.

This procedure is basically what has to happen, in real time, over and over again while we are doing audio processing. Audio software is often set up to send these audio callbacks from a high-priority "real-time" thread with a very short interval between callbacks, ~1-10 ms (varies between systems).
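
To sketch the shape of such a callback (my own toy illustration, using a raw pointer in place of the BufferRef above, and plain zeroes instead of real audio):

import Foreign.Marshal.Array (allocaArray, peekArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (pokeElemOff)

-- Write n samples of silence into the buffer. A real callback would run
-- on the real-time thread and would have to avoid allocation and
-- anything that can block; this toy version just fills in zeroes.
process :: Ptr Float -> Int -> IO ()
process buf n = mapM_ (\i -> pokeElemOff buf i 0.0) [0 .. n - 1]

main :: IO ()
main = allocaArray 8 $ \buf -> do
  process buf 8
  peekArray 8 buf >>= print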

To achieve this minimal latency between callbacks, you often can't rely on things like garbage collection, since you can't be sure when it will run. I dare say that most software benefits significantly from GC, but in audio, getting GC right is very hard: if GC kicks in at the wrong time, or the latency between callbacks grows too large, garbage data leaks into the buffer, causing unwanted sounds.

Most other software might only notice a slight latency in its computations while profiling, and that might not be the end of the world, depending on the context. But in audio, you cannot let that happen, since you can literally hear the glitch, which is unforgivable.

When it comes to C or C++, I think everyone knows their foot guns around memory management. Thankfully, in modern C++ it's not that bad (as long as you follow the Core Guidelines), but there is still a lot of unnecessary baggage on the way to safe code in these languages.

Could there be a way to use a garbage-collected language for "real-time" operations, and how could that be achieved?

Linear types

GHC 9.0 introduced support for Linear Haskell, which can be enabled with -XLinearTypes. One of the significant use cases for linear types is implementing latency-sensitive real-time services and analytics jobs. As I mentioned earlier, a major issue in this use case is GC pauses, which can happen at arbitrary points for reasonably long periods. The problem is exacerbated as the size of the working set increases. The goal here is to partially or entirely eliminate garbage collection by controlling aliasing and making memory management explicit but safe.

So what, then, are linear types? Henry Baker described linear types and their benefits in his papers Lively Linear Lisp — 'Look Ma, No Garbage!' and "Use-once" variables and linear objects: storage management, reflection and multi-threading, so this is not a new topic. Basically, we are talking about types whose instances must be held in a linear variable: a variable is linear if it's accessed exactly once in its scope, and likewise a linear object's reference count is always 1. With this safety guarantee at the type level, we can avoid synchronization and GC, and we can even update linear objects in place, since that is referentially transparent.

Avoiding garbage collection

So why can we avoid synchronization and GC with linear types? Consider the following function as an example:

linearFunc :: a %1-> b

On their own, linear types only give a type to functions that consume their argument exactly once when their result is consumed exactly once. So alone, they don't make your programs any faster or any safer with resources. Still, they allow you to write many optimizations and resource-safe abstractions that weren't possible before.
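
A minimal sketch of what the linearity checker enforces (my own example; it compiles with GHC 9.0 or later):

{-# LANGUAGE LinearTypes #-}

-- A linear function: both components of the pair are consumed exactly once.
swap :: (a, b) %1 -> (b, a)
swap (x, y) = (y, x)

-- This would be rejected by the type checker, since 'x' is used twice:
-- dup :: a %1 -> (a, a)
-- dup x = (x, x)

main :: IO ()
main = print (swap (1 :: Int, "hello"))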

First, since linear values can only be used once, they cannot be shared. This means that, in principle, they shouldn't be subject to GC. But this depends heavily on the consumer of the values, which may very well de-allocate them on the spot. One way to mitigate this is to store these values in a heap outside the GC's control.
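
For reference, Haskell can already allocate outside the GC heap through Foreign.Marshal.Alloc; a linear API would wrap something like the following so that the type system guarantees the memory is freed exactly once (a plain, non-linear sketch of my own):

import Foreign.Marshal.Alloc (free, malloc)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, poke)

main :: IO ()
main = do
  -- Allocate raw memory for one Double, invisible to the GC.
  p <- malloc :: IO (Ptr Double)
  poke p 42.0
  peek p >>= print
  free p  -- nothing forces this call today; linearity could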

While keeping these values in a separate heap would diminish the GC's work, it would introduce some overhead, which could increase the total running time of your application. But if we continue to use real-time systems as an example, this isn't necessarily a bad thing.

In real-time systems, optimizations often target only the worst-case scenarios. You don't really care about your latencies as long as they stay within a particular window, but you do care that they never exceed your maximum limit, and that is precisely where optimizations utilizing linear types could come in handy.

Practical linear types

Linear types are a blessing in GC'd languages if you intend to do anything low-level safely. I would like to continue this post with practical examples of how Haskell utilizes these types, and how they can make low-level optimizations and resource handling safer in your Haskell code, but that deserves a post of its own.

Tags: computers, dsp, haskell, lisp, programming


Code reading

23rd of June, 2021

Code reading has always been an activity that I've just done without really giving it any thought. But now, when I look back at this habit, I see it as immensely beneficial. It caught my attention when I was reading Peter Seibel's book Coders at Work, in which there is a section where Seibel asks his interviewees about code reading. The interviewees tended to be unanimous that code reading is very beneficial, yet the interviews left the impression that the practice itself was lacking even among those heavyweight programmers, the exceptions being Brad Fitzpatrick and, obviously, Donald Knuth. If these programmers speak for the practice but don't do it in the wild, then who does? Overall, this seems pretty odd to me. Seibel made a great comparison here when he likened programmers to novelists: a novelist who had never read anyone else's work would be unheard of.

I've always enjoyed reading others' source code, mainly, let's face it, to steal some ideas. But in doing so, I've picked up a long list of lessons, ideas, and patterns, which I've been able to put to frequent use in most of the work I've done since.

Pattern Matching

One of the most significant benefits of code reading is that after a while you learn to recognize various patterns. Sure, every project might seem cluttered and hard to understand at first, but once you get the gist of it, you start to realize why this or that has been done the way it is. And when you've understood some of these patterns, it becomes much easier to notice them in other, similar or not-so-similar, projects. Fundamentally, this means your WTFs-per-second rate keeps dropping.

I have also noticed that this pattern matching helps in understanding the project under study as a whole. It's best not to try to comprehend a large open-source project all at once, but in small pieces. When one of these pieces is understood, it can help tremendously in understanding the other pieces.

Benefits of reinventing

It can often be pretty hard to understand the functionality of some part of an extensive program just by looking at the code. So quite often, the way to get a better grasp of foreign code is to reimplement it the way you would write it. This lets you abstract the bread and butter out of the program and utilize it however you want.

This kind of reimplementation can be quite hard in bigger projects. There, the best way to reinvent something is to change something and observe the effect in a fresh build. For example, try changing some text in a menu or in the output. This way, you can easily test how well you understand the foreign code.

Code as a literature medium

Many people say that code is not literature because you read it differently from prose. In my opinion, this doesn't necessarily have to be the case. Overall, code is written for humans first and the machine second. An excellent example of this is Robert C. Martin's ravings, in which he often recites that "code should read like prose to be clean", which I tend to agree with. Another good one is Donald Knuth's approach, literate programming, although that is more about embedding pieces of code amidst what one could call prose. Nonetheless, that kind of system makes the code much more readable, since writing is such a big part of it.

One thing that I believe makes people think code is not literature is syntax highlighting. I don't use it. For some reason, I never grew used to colored text. Of course, I might be a bit biased, but when I turn syntax highlighting on, I tend to focus on the wrong things in the code, and then it doesn't read like prose anymore. Removing syntax highlighting has allowed me to grasp the whole structure better. Is this true for everyone, does it work for everyone? I don't think so, but that's how it feels to me.

Code reading club

Based on these thoughts and Seibel's ideas, I decided to try a code reading club at my workplace. Initially, what I had in mind was choosing one library/program per week/month or whatever, then dissecting the main logic behind it and discussing it. However, I quickly realized that this would most likely not work, since people have different interests in programming. For example, I have no interest in various GUI applications or other frontend technologies, even though they might have some good ideas behind them.

So a much better approach would most likely be for each person to choose a library/program, dissect it, and share the findings with the rest of the group. A dissection done by someone other than yourself could easily inspire you and others to dive deeper into the code itself, even though it might be a little outside your interests. That being said, exploring the world outside your own circles can be mind-opening, since you can easily find new approaches to the same problems you face in your own work.

I want to give this kind of approach a good try, and I could write some "deep thoughts" about it in the form of a review.

Tags: computers, programming