Haskell and C++ FFI for Fun and Profit

As some of you may know, I have a soft spot for functional programming, so most of the code I write outside my professional work is in Haskell. Sometimes I use other functional languages too, but Haskell is definitely the one I default to when I have a choice. Unfortunately, we still live in a world of imperative languages, and given the sheer amount of that code, we will most likely be living in that world for years to come; I can’t say forever since the planet won’t be here forever, but we’re talking about a long, long time. The same goes for my own professional life. Even though I work as a DevOps consultant, I mostly work on projects where I contribute code, mainly in Go and C++ and, to a lesser extent, in JavaScript and Python. Of course, I would like to work more with something like Haskell for a living, but at the same time, I’m not very fanatical when it comes to tooling as long as it gets the job done. Still, the interaction between the imperative and functional worlds is fascinating, which brings us to the foreign function interface, or FFI.

Most of the code I write, at least by lines of code, is Haskell and C++, and I’m quite comfortable with both languages, although with the latter a little more reluctantly. Lately, I’ve been banging my head against the wall with Haskell’s FFI since I wanted to write a particular piece of code mainly in Haskell, but I needed something from the world of C++. Could I have written this in just Haskell? Probably, but even though I enjoy functional programming more than imperative code, certain data structures, algorithms and libraries work better in the C++ world, and sometimes you might need that “extra oomph” when it comes to performance.

Okay, better is a strong word for this. They work well in the imperative world, but they don’t translate one to one to the fully functional and pure world of Haskell, where there are better-suited options for data structures and algorithms. And some libraries are only written in C++, so there is nothing that can be done about that. For functional data structures, I definitely recommend Purely Functional Data Structures by Chris Okasaki.

Generally speaking, when it comes to FFI, interacting with C is pretty straightforward, but when C++ comes into play, some extra ceremony is required due to the nature of the language: there is no standardized ABI, names get mangled, and the language is larger with more complex features. Thankfully, C++ offers an easy way to write C ABI compatible code with the extern "C" linkage specification, which makes the code callable from C. But there is a caveat. While you can use C++ features inside the function body, the type signature of an extern "C" function needs to be compatible with C. So no fancy STL types etc. in the signature.
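To make that caveat concrete, here is a minimal sketch. The function sum_ints is a hypothetical helper that is not part of the library built later in this post; it only illustrates a C-compatible signature (pointer plus length) wrapping a body that freely uses the STL.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of the arith library in this post.
// The signature uses only C-compatible types, so it could be declared in a
// C header and imported through an FFI.
extern "C" int sum_ints(const int *xs, std::size_t n) {
  // Inside the body we are free to use C++ features such as the STL.
  std::vector<int> values(xs, xs + n);
  return std::accumulate(values.begin(), values.end(), 0);
}

// Small usage example: the caller only ever sees plain C types.
void sum_ints_example() {
  const int xs[] = {1, 2, 3, 4};
  assert(sum_ints(xs, 4) == 10);
}
```

A signature like int sum_ints(const std::vector<int> &) would still compile as C++, but a C caller (or Haskell’s FFI) would have no way to construct the argument.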

Also, when we talk about C and/or C++, memory management comes into question. So if you want to bind those languages to some memory-managed language like Haskell, you need to ensure that the memory gets handled correctly. C++ offers fancy features like RAII, smart pointers and stuff for making memory management a little bit easier, but that’s not the case in C.
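As a rough illustration of the difference, the sketch below contrasts manual new/delete with a std::unique_ptr. The tracked type and its live counter are hypothetical and exist only so the effect can be observed.

```cpp
#include <cassert>
#include <memory>

// Hypothetical counter of live objects, used only to observe cleanup.
static int tracked_live = 0;

// Hypothetical resource type; not part of this post's library.
struct tracked {
  tracked() noexcept { ++tracked_live; }
  ~tracked() { --tracked_live; }
  tracked(const tracked &) = delete;
  tracked &operator=(const tracked &) = delete;
};

// Manual management, as in C: every new needs a matching delete.
int manual_scope() {
  tracked *p = new tracked();
  int seen = tracked_live;
  delete p;  // forgetting this line would leak the object
  return seen;
}

// RAII: std::unique_ptr deletes the object when it goes out of scope.
int raii_scope() {
  auto p = std::make_unique<tracked>();
  return tracked_live;  // p is destroyed after the return value is computed
}

// Usage example: both variants leave no live objects behind.
void raii_example() {
  assert(manual_scope() == 1);
  assert(raii_scope() == 1);
  assert(tracked_live == 0);
}
```

A garbage-collected language on the other side of an FFI cannot see either mechanism, which is why the cleanup has to be exposed as an explicit function it can call.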

Let’s start by creating a small Cabal project and some sample C++ library that we would like to interact with from Haskell.

common base
  ghc-options: -Wall -Wextra -Wno-orphans -Wno-name-shadowing
  default-language: Haskell2010
  build-depends: base ^>=4.16.3.0

executable arith
  import: base
  main-is: Main.hs
  hs-source-dirs: app

  -- C++ bits
  cxx-options: -std=c++20 -Wall -Werror -Wextra
  cxx-sources: cbits/arith_capi.cc cbits/arith.cc
  include-dirs: cbits
  extra-libraries: stdc++

While it’s otherwise reasonably standard Cabal boilerplate, the interesting bits are the lines relating to C++. Basically, what we do here is define some options for the C++ compiler, list the C++ source files, and point to the directory with the relevant header files; extra-libraries: stdc++ links the C++ standard library into the executable. Conventionally in Haskell, if your library/application has anything related to C/C++, those files usually reside in a directory called cbits, but nothing forces you to follow this convention. When that’s done, we can proceed to write some “earth-shattering” C++ library for our application.

// arith.h

#pragma once

struct arith {
  arith() noexcept;

  int add(int x, int y) noexcept;
  int sub(int x, int y) noexcept;
  int mult(int x, int y) noexcept;
  int div(int x, int y) noexcept;
};

// arith.cc

#include "arith.h"

arith::arith() noexcept {}

int arith::add(int x, int y) noexcept { return x + y; }
int arith::sub(int x, int y) noexcept { return x - y; }
int arith::mult(int x, int y) noexcept { return x * y; }
int arith::div(int x, int y) noexcept { return x / y; }

We’ll just define a simple arith struct/class with some elementary arithmetic operations. Nothing too fancy. This will work as our C++ library that we’ll interact with via Haskell. After that’s done, we need to provide some simple C API for this library so that we can interact with the library via the stable C ABI.

// arith_capi.h

#pragma once

#ifdef __cplusplus
extern "C" {
#endif

typedef struct arith arith;

extern arith *arith_new(void);

extern void arith_delete(arith *p);

extern int arith_add(arith *p, int x, int y);
extern int arith_sub(arith *p, int x, int y);
extern int arith_mult(arith *p, int x, int y);
extern int arith_div(arith *p, int x, int y);

#ifdef __cplusplus
}
#endif

// arith_capi.cc

#include "arith_capi.h"
#include "arith.h"

extern "C" {
  struct arith *arith_new() { return new arith(); }

  void arith_delete(arith *p) { delete p; }

  int arith_add(arith *p, int x, int y) { return p->add(x, y); }

  int arith_sub(arith *p, int x, int y) { return p->sub(x, y); }

  int arith_mult(arith *p, int x, int y) { return p->mult(x, y); }

  int arith_div(arith *p, int x, int y) { return p->div(x, y); }
}

In our C API, you’ll notice that we need to wrap our functions inside extern "C" to ensure that they’re compatible with the C ABI. Since extern "C" is only valid in C++, the header wraps it in an #ifdef __cplusplus directive so that it only takes effect when the header is compiled as C++. On the implementation side, you’ll notice that we use new and delete to do the memory management. Using those keywords in “modern C++” is very much frowned upon since the language offers better ways to manage memory with features like RAII and smart pointers, which mean the programmer doesn’t need to manage memory explicitly like we do here but can let the compiler do it instead. We, on the other hand, need those keywords because we do similar management from Haskell with its foreign pointers, which lets us leave the memory management to Haskell’s runtime and garbage collector.
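If you’d rather keep RAII for as long as possible on the C++ side, one option is to build the object with a smart pointer and only release the raw pointer at the C boundary. The sketch below uses a hypothetical counter type rather than the arith struct from this post, and returning nullptr on failure is my own assumption about how errors could be reported; the point is only that exceptions should not escape through the C ABI.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a C++ class we want to expose.
struct counter {
  int value = 0;
  int bump() noexcept { return ++value; }
};

extern "C" {
// Returns nullptr instead of letting an exception escape through the C ABI.
counter *counter_new() {
  try {
    auto owned = std::make_unique<counter>();  // RAII while still in C++
    return owned.release();                    // the caller now owns the pointer
  } catch (...) {
    return nullptr;
  }
}

// Must be called exactly once per counter_new result; nullptr is allowed.
void counter_delete(counter *p) { delete p; }

int counter_bump(counter *p) { return p->bump(); }
}

// Usage example from the C++ side; a foreign caller would do the same steps.
void counter_example() {
  counter *c = counter_new();
  assert(c != nullptr);
  assert(counter_bump(c) == 1);
  counter_delete(c);
}
```

With a constructor like this, the foreign side would need to check for a null pointer before attaching a finalizer to it.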

Now that we have the C++ bits done, we can look at how to interact with them from Haskell. Haskell’s FFI features are enabled by GHC’s {-# LANGUAGE ForeignFunctionInterface #-} language extension. The FFI is part of the Haskell 2010 standard, so with default-language: Haskell2010 it is already on, but it doesn’t hurt to state it explicitly (either at the top of the file or in the project’s Cabal file). With that in place, we can import the modules we need.

{-# LANGUAGE ForeignFunctionInterface #-}

module Main where

import Control.Exception ( mask_ )
import Foreign.Ptr ( FunPtr, Ptr )
import Foreign.C.Types ( CInt(..) )
import Foreign.ForeignPtr ( ForeignPtr, newForeignPtr, withForeignPtr )

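What are we importing here? mask_ from Control.Exception runs an IO action with asynchronous exceptions masked. Ptr is a raw pointer to a foreign object, and FunPtr is a pointer to a foreign function. CInt is the Haskell type that corresponds to C’s int. ForeignPtr is a pointer with finalizers attached, newForeignPtr creates one from a finalizer and a raw pointer, and withForeignPtr gives temporary access to the underlying Ptr.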

After the importing shenanigans, we can proceed on to make foreign imports to our code so that we can actually call the C++ code we just wrote.

data Arith

foreign import ccall unsafe "arith_capi.h arith_new" c_arithNew :: IO (Ptr Arith)
foreign import ccall unsafe "arith_capi.h &arith_delete" c_arithDelete :: FunPtr (Ptr Arith -> IO ())
foreign import ccall unsafe "arith_capi.h arith_add" c_arithAdd :: Ptr Arith -> CInt -> CInt -> IO CInt
foreign import ccall unsafe "arith_capi.h arith_sub" c_arithSub :: Ptr Arith -> CInt -> CInt -> IO CInt
foreign import ccall unsafe "arith_capi.h arith_mult" c_arithMult :: Ptr Arith -> CInt -> CInt -> IO CInt
foreign import ccall unsafe "arith_capi.h arith_div" c_arithDiv :: Ptr Arith -> CInt -> CInt -> IO CInt

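So what’s happening here: data Arith is a type without constructors that only serves as a tag for pointers to the C++ object, so Ptr Arith is an opaque handle on the Haskell side. Each foreign import ccall line binds a C function, named in the string after the header, to a Haskell name and type. The unsafe keyword makes the call cheaper, but it should only be used for functions that return quickly and never call back into Haskell. The & in &arith_delete imports the address of the function rather than the function itself, which gives us the FunPtr that newForeignPtr expects as a finalizer.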

Finally, you’re actually able to call these functions.

-- | Create a new foreign object that will be cleaned up once it's no longer
-- in use. mask_ covers both the allocation and attaching the finalizer, so an
-- asynchronous exception can't arrive in between and leak the pointer.
newArith :: IO (ForeignPtr Arith)
newArith = mask_ (c_arithNew >>= newForeignPtr c_arithDelete)

main :: IO ()
main = newArith >>= \arith -> withForeignPtr arith $ \ptr -> do
  -- The foreign object is now unwrapped to a raw pointer which you can pass
  -- to any of the FFI functions imported above.
  c_arithAdd ptr 1 1 >>= print
  c_arithSub ptr 1 1 >>= print
  c_arithMult ptr 2 2 >>= print
  c_arithDiv ptr 2 2 >>= print

Now you should be able to run your program and interact with C++ in a safe manner from the comfortable world of Haskell.

$ cabal run
... cabal stuff ...
2
0
4
1

So you can see that while calling C++ from a foreign language is definitely possible, it just requires a bit more ceremony than calling C from these kinds of languages. I initially started to bang my head against the wall with these FFI shenanigans because I needed to use some C++ interfaces that didn’t offer a C API, so hopefully, this proves to be beneficial to some. At least, I can use this to refresh my memory in the future since I can guarantee that I’ll forget about it.

Table-Driven Test Design in C++ with GoogleTest

I’ve always found tremendous value in testing my software. The tests closest to home for developers, or at least the ones that should be, are unit tests. Unit tests are not necessarily the best way of making sure code is safe and working (that often requires more exhaustive testing), but they’re very beneficial for your future self and/or co-workers who work with your code, since with them you can quickly catch regressions.

That being said, writing unit tests can often be quite cumbersome. I would love to see some mature tooling for randomized testing like QuickCheck in Haskell (and later some other languages too) that would “just work”, but often something like that just isn’t possible, especially when the project reaches a certain degree of complexity. Tests and test suites should be designed with the same care as the code itself. Unfortunately, people tend to forget this. In these kinds of cases, a quite simple table-driven test design can help!

I first stumbled upon table-driven test design when I was working with Go, since there it seems to be quite a popular way of doing unit tests, and at least in my opinion, it works quite nicely!

Often while writing unit tests, you would want to write various failing and passing test cases, which often leads to quite a bit of duplication. For example:

TEST(TwoSumTests, PassingTest) {
  std::vector<int> nums{2, 7, 11, 15};
  auto got = twoSum(nums, 9);
  std::vector<int> expected{0, 1};
  EXPECT_EQ(got, expected);
}

TEST(TwoSumTests, FailingTest) {
  std::vector<int> nums{2, 7, 11, 15};
  auto got = twoSum(nums, 9);
  std::vector<int> expected{0, 123};
  EXPECT_NE(got, expected);
}

So even with this elementary example, we can see that most of the code in the test case is duplicated and/or boilerplate. So we can do better. For example, with quite a simple table for tests, we can loop through multiple tests without duplication and easily add new tests.

When testing functions, we often care about what is going in and what should come out. Everything else in unit tests is often boilerplate. This is where table-driven design helps: it is all about setting up these inputs and expected outputs.

struct twoSumTestCase {
  std::vector<int> nums;
  int target;
  std::vector<int> expected;
};

TEST(TwoSumTests, BasicAssertions) {
  twoSumTestCase tests[] = {
    {
      std::vector<int>{2, 7, 11, 15},
      9,
      std::vector<int>{0, 1},
    },
    {
      std::vector<int>{3, 2, 4},
      6,
      std::vector<int>{1, 2},
    },
    {
      std::vector<int>{3, 3},
      6,
      std::vector<int>{0, 1},
    },
  };
  for (const auto &t : tests) {
    auto got = twoSum(t.nums, t.target);
    EXPECT_EQ(got, t.expected);
  }
}

So when we run this we can easily run all the tests at once:

[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TwoSumTests
[ RUN      ] TwoSumTests.BasicAssertions
[       OK ] TwoSumTests.BasicAssertions (0 ms)
[----------] 1 test from TwoSumTests (0 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (0 ms total)
[  PASSED  ] 1 test.

To demonstrate a failing test case, let’s add a new test there:

{
  std::vector<int>{3, 3},
  6,
  std::vector<int>{0, 2},
},

We get the following output:

Expected equality of these values:
  got
    Which is: { 0, 1 }
  t.expected
    Which is: { 0, 2 }
[  FAILED  ] TwoSumTests.BasicAssertions (0 ms)
[----------] 1 test from TwoSumTests (0 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (0 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] TwoSumTests.BasicAssertions

 1 FAILED TEST
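The tests above call twoSum without showing it. Here is a minimal sketch of what such a function could look like, assuming the classic problem statement (return the indices of the two elements that add up to the target) and assuming an empty vector when no pair exists; neither detail is spelled out in this post, so treat both as my guesses.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Assumed behaviour: return the indices of the two elements that sum to
// target, or an empty vector when no such pair exists.
std::vector<int> twoSum(const std::vector<int> &nums, int target) {
  std::unordered_map<int, int> seen;  // value -> index of an earlier element
  for (int i = 0; i < static_cast<int>(nums.size()); ++i) {
    auto it = seen.find(target - nums[i]);
    if (it != seen.end()) {
      return {it->second, i};  // earlier index first, current index second
    }
    seen.emplace(nums[i], i);
  }
  return {};
}

// Usage example with the first row of the table above.
void twoSum_example() {
  assert((twoSum({2, 7, 11, 15}, 9) == std::vector<int>{0, 1}));
}
```

With this version, all three passing rows of the table hold, and the deliberately wrong {0, 2} expectation fails as in the log above.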

Extending test cases

Of course, with only that information, test logs can be pretty misleading, since the failure doesn’t tell us which entry of the table it came from. Thankfully, we can just change the table to our liking. For example, we could add names to the tests:

struct twoSumTestCase {
  std::string name;
  std::vector<int> nums;
  int target;
  std::vector<int> expected;
};

We could then use the name in the diagnostic messages of GoogleTest’s macros:

EXPECT_TRUE(false) << "diagnostic message"; // format to your liking

With this kind of setup, we can easily extend these test cases just by playing around a little with the test struct, so it could involve enumeration, subtests and much more. This can make your tests and code easier to fix, and also make it easier to add new, useful tests.
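To show what the name buys you, here is a framework-free sketch of the same idea, written without GoogleTest so that it compiles on its own. The namedCase struct and the failedCases helper are hypothetical names of mine; in a real GoogleTest suite you would instead stream t.name into the assertion inside the loop.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical table row with a name, mirroring the struct above.
struct namedCase {
  std::string name;
  std::vector<int> nums;
  int target;
  std::vector<int> expected;
};

using twoSumFn = std::function<std::vector<int>(const std::vector<int> &, int)>;

// Runs every row against fn and returns the names of the rows that failed.
std::vector<std::string> failedCases(const std::vector<namedCase> &tests,
                                     const twoSumFn &fn) {
  std::vector<std::string> failed;
  for (const auto &t : tests) {
    if (fn(t.nums, t.target) != t.expected) {
      failed.push_back(t.name);  // the name pinpoints the offending row
    }
  }
  return failed;
}

// Usage example with a stub that always answers {0, 1}.
void failedCases_example() {
  auto stub = [](const std::vector<int> &, int) {
    return std::vector<int>{0, 1};
  };
  std::vector<namedCase> tests = {
      {"first pair", {2, 7, 11, 15}, 9, {0, 1}},
      {"wrong expectation", {3, 3}, 6, {0, 2}},
  };
  auto failed = failedCases(tests, stub);
  assert(failed.size() == 1);
  assert(failed[0] == "wrong expectation");
}
```

Instead of a bare "1 FAILED TEST", you now know exactly which row of the table to look at.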