Posted on 18th of December 2022
As some of you may know, I have a soft spot for functional
programming, so most of the code I write outside my professional
work is in Haskell!
Unfortunately, we still live in a world of imperative languages, and
given the sheer amount of that code, we will most likely be living in
it for years to come; I can’t say forever since the planet won’t be
here forever, but we’re talking about a long, long time. The same goes
for my own professional life.
I’m not very fanatical about tooling as long as it gets the job
done. Still, the interaction between the imperative and functional
worlds is fascinating, which brings us to the foreign function
interface (FFI).
Lately, I’ve been banging my head against the wall with the FFI of
Haskell since I wanted to write a particular piece of code mainly in
Haskell, but I needed something from the world of C++. Could I have
written this in just Haskell? Probably, but despite enjoying
functional programming over imperative code, certain data
structures, algorithms and libraries work better in the C++ world, and
sometimes you might need that “extra oomph” when it comes to
performance.
Okay, “better” is a strong word for this. They work well in the
imperative world, but they don’t translate one to one to the fully
functional and pure world of Haskell, where other data structures and
algorithms are the better options. Sure, some libraries are only
written in C++, so there is nothing that can be done about that. For
functional data structures, I definitely recommend Purely Functional
Data Structures by Chris Okasaki.
Generally speaking, when it comes to FFI, interacting with C is pretty
straightforward, but when C++ comes into play, some extra ceremony is
required: the language has no standardized ABI, names get mangled, and
the language itself is larger with more complex features. Thankfully,
C++ offers an easy way to write C ABI compatible code with the
extern "C" linkage specification, which makes the code callable from
C. But there is a caveat. While you can use C++ features inside the
function body, the type signature of an extern "C" function needs to
be compatible with C. So no fancy STL types etc. in the signature.
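To make the caveat concrete, here is a minimal sketch. The function sum_of_squares is a hypothetical helper and not part of the library built later in this post. The signature sticks to a pointer and a size, which C understands, while the body happily uses the STL.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper for illustration only.
// The signature uses only C-compatible types; the body is ordinary C++.
// noexcept: a failure (e.g. out of memory) terminates the program instead of
// letting an exception unwind into a C caller.
extern "C" int sum_of_squares(const int *xs, std::size_t n) noexcept {
  assert(xs != nullptr || n == 0);  // an empty input may pass a null pointer

  std::vector<int> squares;
  squares.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    squares.push_back(xs[i] * xs[i]);
  }
  return std::accumulate(squares.begin(), squares.end(), 0);
}
```

Changing the parameter to something like const std::vector<int> & would still compile as C++, but a C caller (or Haskell's FFI) would have no way to construct that argument.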
Also, when we talk about C and/or C++, memory management comes into
question. So if you want to bind those languages to a memory-managed
language like Haskell, you need to ensure that the memory gets handled
correctly. C++ offers features like RAII and smart pointers for making
memory management a little bit easier, but that’s not the case in C.
Let’s start by creating a small Cabal project and some sample C++
library that we would like to interact with from Haskell.
common base
  ghc-options: -Wall -Wextra -Wno-orphans -Wno-name-shadowing
  default-language: Haskell2010
  build-depends: base ^>=4.16.3.0

executable arith
  import: base
  main-is: Main.hs
  hs-source-dirs: app
  -- C++ bits
  cxx-options: -std=c++20 -Wall -Werror -Wextra
  cxx-sources: cbits/arith_capi.cc cbits/arith.cc
  include-dirs: cbits
  extra-libraries: stdc++
While it’s otherwise reasonably standard Cabal boilerplate, the
interesting bits are the lines relating to C++. Basically, what we do
here is define some options for the C++ compiler, where the C++ source
files are located, and where to find the relevant header files.
Commonly, in Haskell, if your library/application has anything related
to C/C++, those files have resided in a directory called cbits, but
nothing forces you to follow this convention. When that’s done, we can
proceed to write some “earth-shattering” C++ library for our
application.
// arith.h
#pragma once

struct arith {
  arith() noexcept;
  int add(int x, int y) noexcept;
  int sub(int x, int y) noexcept;
  int mult(int x, int y) noexcept;
  int div(int x, int y) noexcept;
};

// arith.cc
#include "arith.h"

arith::arith() noexcept {}
int arith::add(int x, int y) noexcept { return x + y; }
int arith::sub(int x, int y) noexcept { return x - y; }
int arith::mult(int x, int y) noexcept { return x * y; }
int arith::div(int x, int y) noexcept { return x / y; }
We’ll just define a simple arith struct/class with some elementary
arithmetic operations. Nothing too fancy. This will work as the C++
library that we’ll interact with from Haskell. After that’s done, we
need to provide a simple C API for this library so that we can
interact with it via the stable C ABI.
// arith_capi.h
#pragma once

#ifdef __cplusplus
extern "C" {
#endif

typedef struct arith arith;

extern arith *arith_new();
extern void arith_delete(arith *p);
extern int arith_add(arith *p, int x, int y);
extern int arith_sub(arith *p, int x, int y);
extern int arith_mult(arith *p, int x, int y);
extern int arith_div(arith *p, int x, int y);

#ifdef __cplusplus
}
#endif
// arith_capi.cc
#include "arith_capi.h"
#include "arith.h"

extern "C" {
struct arith *arith_new() { return new arith(); }
void arith_delete(arith *p) { delete p; }
int arith_add(arith *p, int x, int y) { return p->add(x, y); }
int arith_sub(arith *p, int x, int y) { return p->sub(x, y); }
int arith_mult(arith *p, int x, int y) { return p->mult(x, y); }
int arith_div(arith *p, int x, int y) { return p->div(x, y); }
}
In our C API, you’ll notice that we need to wrap our functions inside
extern "C" to ensure that they’re compatible with the C ABI. Also,
since extern "C" is only valid in C++, the header wraps it in an
#ifdef __cplusplus directive so that it is only seen when the header
is compiled as C++. On the implementation side, you can notice that we
use new and delete to do the memory management. The thing to note here
is that using those keywords in “modern C++” is very much frowned upon
since the language offers better ways to do that management with
features like RAII, smart pointers etc., which basically make it so
that programmers don’t need to manage memory explicitly like we do
here, but can instead let the compiler do it for them. We, on the
other hand, need those keywords since we are doing similar management
but from Haskell with its foreign pointers, which lets us leave the
memory management to Haskell’s runtime and garbage collector.
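For readers coming from the C++ side, a rough analogy may help: a Haskell foreign pointer with a finalizer behaves a bit like a std::unique_ptr with a custom deleter. The sketch below is only an illustration of that idea on the C++ side. The names arith_handle, make_arith and add_with_handle are made up for this example, and the arith class is trimmed down to a single method so the block stands alone.

```cpp
#include <cassert>
#include <memory>

// Trimmed stand-in for the arith class and C API from this post.
struct arith {
  int add(int x, int y) noexcept { return x + y; }
};

extern "C" {
arith *arith_new() { return new arith(); }
void arith_delete(arith *p) { delete p; }
int arith_add(arith *p, int x, int y) { return p->add(x, y); }
}

// Hypothetical alias: a smart pointer whose deleter is the C API's
// arith_delete, much like a foreign pointer carrying a finalizer.
using arith_handle = std::unique_ptr<arith, decltype(&arith_delete)>;

// Wrap the raw pointer right after allocation so it cannot leak.
arith_handle make_arith() { return arith_handle(arith_new(), &arith_delete); }

int add_with_handle(int x, int y) {
  arith_handle h = make_arith();
  assert(h != nullptr);
  // h.get() lends out the raw pointer for the duration of the call.
  return arith_add(h.get(), x, y);
}  // arith_delete runs here, when h goes out of scope
```

The difference on the Haskell side is timing: a unique_ptr releases the object at a known point (end of scope), while a finalizer runs whenever the garbage collector notices the foreign pointer is unreachable.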
Now that we have the C++ bits done, we can proceed to interacting with
them from Haskell. Haskell’s FFI features reside behind GHC’s
{-# LANGUAGE ForeignFunctionInterface #-} language extension. The
extension is part of Haskell2010, so it should already be on with our
default-language setting, but being explicit doesn’t hurt. So first,
we include that in our Haskell files (either on top of the file or in
the project’s Cabal file), and we can already import some of the
needed modules.
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where
import Control.Exception ( mask_ )
import Foreign.Ptr ( FunPtr, Ptr )
import Foreign.C.Types ( CInt(..) )
import Foreign.ForeignPtr ( ForeignPtr, newForeignPtr, withForeignPtr )
What are we importing here?
mask_ is needed to avoid leaking the pointer in case an async
exception occurs between allocating it and wrapping it in a foreign
pointer.
Foreign.Ptr is a module holding pointers to foreign data, from which
we’re using only FunPtr, a pointer to a function, and Ptr, a general
pointer to an object.
Foreign.C.Types holds the mappings of C types to Haskell types. We’re
only using int in C++, and the Haskell type CInt corresponds to that.
Foreign.ForeignPtr, as mentioned above, is used for representing an
object that is maintained in a foreign language and not by Haskell’s
own storage manager. The way it differs from the vanilla pointer type,
Ptr, is that we can associate finalizers with it, which Haskell’s
storage manager invokes once the foreign pointer is no longer
referenced, so the foreign object gets cleaned up.
After the importing shenanigans, we can proceed on to make foreign
imports to our code so that we can actually call the C++ code we just
wrote.
data Arith
foreign import ccall unsafe "arith_capi.h arith_new" c_arithNew :: IO (Ptr Arith)
foreign import ccall unsafe "arith_capi.h &arith_delete" c_arithDelete :: FunPtr (Ptr Arith -> IO ())
foreign import ccall unsafe "arith_capi.h arith_add" c_arithAdd :: Ptr Arith -> CInt -> CInt -> IO CInt
foreign import ccall unsafe "arith_capi.h arith_sub" c_arithSub :: Ptr Arith -> CInt -> CInt -> IO CInt
foreign import ccall unsafe "arith_capi.h arith_mult" c_arithMult :: Ptr Arith -> CInt -> CInt -> IO CInt
foreign import ccall unsafe "arith_capi.h arith_div" c_arithDiv :: Ptr Arith -> CInt -> CInt -> IO CInt
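So what’s happening here: data Arith is an empty type that only serves
as a tag, so that a Ptr Arith can’t be mixed up with a pointer to
something else. Each foreign import ccall line binds a C function
declared in arith_capi.h to a Haskell name with a matching type. The &
in front of arith_delete imports the address of the function as a
FunPtr instead of calling it, which is what newForeignPtr expects as a
finalizer. The unsafe keyword makes the calls cheaper, which is fine
for short functions that never block or call back into Haskell.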
Finally, you’re actually able to call these functions.
-- | Create a new foreign object that will be cleaned up once it's no longer
-- in use. mask_ wraps both the allocation and newForeignPtr so that an async
-- exception can't leak the pointer in between.
newArith :: IO (ForeignPtr Arith)
newArith = mask_ $ c_arithNew >>= newForeignPtr c_arithDelete

main :: IO ()
main = newArith >>= \arith -> withForeignPtr arith $ \ptr -> do
  -- Foreign object is now unwrapped to a foreign pointer which you can use in
  -- any FFI function you described above.
  c_arithAdd ptr 1 1 >>= print
  c_arithSub ptr 1 1 >>= print
  c_arithMult ptr 2 2 >>= print
  c_arithDiv ptr 2 2 >>= print
Now you should be able to run your program and interact with C++ in
a safe manner from the comfortable world of Haskell.
$ cabal run
# ... cabal stuff ...
2
0
4
1
So you can see that while calling C++ from a foreign language is
definitely possible, it just requires a bit more ceremony than calling
C from these kinds of languages. I initially started to bang my head
against the wall with these FFI shenanigans because I needed to use
some C++ interfaces that didn’t offer a C API, so hopefully, this
proves to be beneficial to some. At least, I can use this to refresh
my memory in the future since I can guarantee that I’ll forget about
it.