24th of April, 2022
I've always found tremendous value in testing my software. Perhaps closest to
home for developers - or at least it should be - are unit tests. While unit
tests are not necessarily the best way of making safe, working code (that
often requires a little more exhaustive testing), they are at least very
beneficial for your future self and/or co-workers working with your code,
since with them you can quickly catch any new errors introduced by
regressions.
That being said, writing unit tests can often be quite cumbersome. I would
love to see some mature tooling for randomized testing, like QuickCheck in
Haskell (and later in some other languages too), that would "just work", but
often something like that just isn't possible, especially when the project
reaches a certain degree of complexity. Tests and test suites should be
designed in their own right, just like your code itself. Unfortunately,
people tend to forget this. In these kinds of cases, a quite simple
table-driven test design can come to help!
I first stumbled upon table-driven test design when I was working with Go,
where it seems to be quite a popular way of writing unit tests, and at least
in my opinion it works quite nicely!
Often while writing unit tests, you want to cover various failing and passing
test cases, which quickly leads to quite a bit of duplication. For example:
Examples here are written in C++20 using Google Test.
#include <gtest/gtest.h>
#include <vector>

// twoSum is the function under test: it returns the indices of the two
// numbers in nums that add up to target.
TEST(TwoSumTests, PassingTest) {
  std::vector<int> nums{2, 7, 11, 15};
  auto got = twoSum(nums, 9);
  std::vector<int> expected{0, 1};
  EXPECT_EQ(got, expected);
}
TEST(TwoSumTests, FailingTest) {
  std::vector<int> nums{2, 7, 11, 15};
  auto got = twoSum(nums, 9);
  std::vector<int> expected{0, 123};
  EXPECT_NE(got, expected);  // EXPECT_NE: this result should not match
}
Even with this very simple example we can see that most of the code in each
test case is duplicated and/or boilerplate, so we can do better. With a quite
simple table of tests we can loop through multiple cases without any
duplication, and adding new tests becomes easy.
When it comes to testing functions, often we only care about what goes in and
what should come out. Almost everything else in a unit test is
boilerplate. So where table-driven design helps is in setting up these inputs
and expected outputs:
struct twoSumTestCase {
  std::vector<int> nums;
  int target;
  std::vector<int> expected;
};
TEST(TwoSumTests, BasicAssertions) {
  twoSumTestCase tests[] = {
      {
          std::vector<int>{2, 7, 11, 15},
          9,
          std::vector<int>{0, 1},
      },
      {
          std::vector<int>{3, 2, 4},
          6,
          std::vector<int>{1, 2},
      },
      {
          std::vector<int>{3, 3},
          6,
          std::vector<int>{0, 1},
      },
  };

  for (const auto& t : tests) {
    auto got = twoSum(t.nums, t.target);
    EXPECT_EQ(got, t.expected);
  }
}
So when we run this, all the tests run at once:
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TwoSumTests
[ RUN      ] TwoSumTests.BasicAssertions
[       OK ] TwoSumTests.BasicAssertions (0 ms)
[----------] 1 test from TwoSumTests (0 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (0 ms total)
[  PASSED  ] 1 test.
To demonstrate a failing case, let's add a new entry to the table:
{
    std::vector<int>{3, 3},
    6,
    std::vector<int>{0, 2},
},
We get the following output:
Expected equality of these values:
  got
    Which is: { 0, 1 }
  t.expected
    Which is: { 0, 2 }
[  FAILED  ] TwoSumTests.BasicAssertions (0 ms)
[----------] 1 test from TwoSumTests (0 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (0 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] TwoSumTests.BasicAssertions
1 FAILED TEST
Extending test cases
Of course, with output like that the test logs can be quite misleading: the
failure doesn't tell you which case in the table actually went
wrong. Thankfully, we can just change the table to our liking. For example,
we could add names to the test cases:
struct twoSumTestCase {
  std::string name;
  std::vector<int> nums;
  int target;
  std::vector<int> expected;
};
which we could then use in the diagnostic messages GTest's macros support:
EXPECT_TRUE(false) << "diagnostic message"; // format to your liking
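So in our loop, we could attach each case's name to the assertion itself. A
minimal sketch (the case names here are made up for illustration):

for (const auto& t : tests) {
  auto got = twoSum(t.nums, t.target);
  // On failure, the log now names the offending case.
  EXPECT_EQ(got, t.expected) << "test case: " << t.name;
}

With an entry like {"duplicate values", std::vector<int>{3, 3}, 6,
std::vector<int>{0, 1}} in the table, a failure now points straight at the
case by name.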
In the same way, we can easily extend these test cases just by playing around
a little with the test struct, so it could involve enumeration, subtests, and
much more. That could not only make fixing your tests and code easier, but
also make it easier to add new, useful tests.
20th of April, 2022
For some weird reason I've always enjoyed the topic of performance and
optimization tremendously. Especially trying to understand why a compiler
performs its various optimizations at the instruction level for the
hardware. It's quite a trip to see the years of expertise in hardware design
and how it all works in modern computing. But recently that got me wondering:
is there really a point to it?
Now don't get me wrong, fewer instructions usually mean slightly faster
computation, and that's a good thing, right? But considering modern hardware,
is that necessary? Should we be concerned that our compilers work so hard to
minimize the number of instructions in their output? That would of course
make sense if we were still living in a world where computation was slow (I
don't know, the 50s? The 70s?).
These kinds of instruction-minimizing transformations can easily lead to some
funky situations where familiar operations - common ones like '+' or '<' -
start behaving unintuitively. In these situations, if the program happens to
behave incorrectly, it's usually considered the programmer's fault.
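As a concrete illustration (my own sketch, not taken from any particular
compiler's documentation): signed integer overflow is undefined behavior in
C++, so an optimizer is allowed to assume that x + 1 never wraps around, and
a seemingly sensible overflow check can get folded away entirely:

// A programmer might write this check with the best intentions:
bool willOverflow(int x) {
  return x + 1 < x;  // "if adding one made it smaller, we wrapped"
}

// Since signed overflow is undefined behavior, the compiler may assume
// x + 1 < x is always false and compile the whole function down to
// 'return false;', even though the addition wraps in practice at INT_MAX.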
On modern hardware, computations are more or less free, and we almost flirt
with the idea of a concrete Turing machine with an infinite amount of
memory. Shouldn't the fact that we mostly use this kind of hardware be
reflected in our programming languages as well? Especially if we consider
that a single cache miss can easily cost more run time than hundreds of add
instructions - as a rough order of magnitude, a main-memory access takes on
the order of a hundred nanoseconds while a single add takes a fraction of
one. If those extra instructions don't increase the size of the data or the
program itself, what's wrong with them? Especially considering that we could
add quite a bit of run-time computation to a program without affecting its
total running time too much.
So instead of focusing on minimizing instructions in the language's output,
we could focus on improving the semantics of the language and pretty much
completely remove these common hard-to-find errors from our software. This is
especially visible in the many languages that have multiple different
features doing more or less the same thing, with only slight differences in
performance.
When we have multiple of these features that work pretty much the same way as
each other, languages easily end up with an excess number of features. Using
a large variety of features in one code base can easily lead to complex and
hard-to-understand programs. This in turn often means that the features used
in a code base are deliberately limited, so that programmers on the project
only use a common subset of the language.
A great example of this in "modern" programming languages is C++'s regular
vs. virtual functions. These kinds of features easily lead to programmers
wasting their precious time on micro-optimizations which, in the grand scheme
of things, usually aren't really worthwhile. Especially considering that when
we focus on these kinds of optimizations, we can easily lose focus on the
stuff that really matters: the large-scale behavior of the program.
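To make the example concrete (a sketch of my own, not from any real code
base): both features express "call a member function", but one goes through a
vtable while the other can be resolved and inlined at compile time, and that
difference alone is enough to start a micro-optimization debate:

struct Shape {
  virtual ~Shape() = default;
  // Virtual: dispatched through a vtable at run time.
  virtual double area() const = 0;
};

struct Square {
  double side;
  // Regular: resolved at compile time, trivially inlinable.
  double area() const { return side * side; }
};

At the call site both read the same, yet choosing between them easily turns
into a performance argument instead of a design decision.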
Can we fix this somehow? Probably not, since we are already so invested in
these kinds of languages. We can point fingers at various places and blame
them for the situation we're in. A new programming language doesn't really
solve the issue either, since we just can't rewrite everything in it, and the
migration would be a really slow process. Can we fix the existing languages?
Probably not, which is why we rely on various external tools to analyze and
check our programs, and on various conventions to follow, so that we can
write the best code possible in these languages.
So modern computing is very exciting, but it can also be a mess…