C++ Compile Time Function Tables for Fun and Profit
A technique that I’ve recently found useful is to use compile time function tables to eliminate complicated branching logic from my code. It is especially useful where you have to perform the same actions in multiple different conditions, but the conditional logic gets overly complicated. More generally it is useful when you need to map a contiguous range of values to a small number of actions. Though the best part is that it is implemented in simple and straightforward C++20 code which gets evaluated at compile time without any sort of tricky templates or macros.
At its core the technique consists of the following items:
- A compile time
consteval
function which creates and returns an array of function pointers. - Calling the compile time function and assigning the result into a
constinit
variable at a global/file scope. - A runtime function which uses the array to look up a function to call and calls it with the required arguments.
The generated assembly for this technique ends up being a simple jump table as data and a handful of instructions used to index into it and then jump to the specified address1.
Simplest Single Argument Lookup
To best illustrate this technique I’m going to use a simple example that you may find in an interpreter, with an enum class
that represents a type and a tagged union value struct
which holds the payload, like so:
enum class Type : uint8_t
{
Void, Bool, Signed, Unsigned, Float, String, // ...
COUNT
};
struct RuntimeValue
{
Type type;
union { /* ... */ } value;
};
With these types defined I can use them to write the code for the rest of this simple example, where we only need to handle a single argument.
Note that I’m using the overloaded unary operator+ technique described in another article to make the code more concise. Likewise using unary operator+ on a lambda with no bound variables turns it into a function pointer.
// (1)
using UnaryTypeFunction = void (*)(const RuntimeValue&);
using UnaryTypeTable = std::array<UnaryTypeFunction, +Type::COUNT>;
// (2)
consteval UnaryTypeTable MakeUnaryActionTable()
{
UnaryTypeTable result{};
result[+Type::Bool] = +[](const RuntimeValue&) { /* Boolean stuff */ };
auto integer_function = +[](const RuntimeValue&) { /* Integer stuff */ };
result[+Type::Signed] = integer_function;
result[+Type::Unsigned] = integer_function;
result[+Type::Floating] = +[](const RuntimeValue&) { /* Floating Point stuff */ };
return result;
}
// (3)
static constinit auto unary_type_action_table = MakeUnaryActionTable();
// (4)
void DoUnaryTypeAction(const RuntimeValue& value)
{
auto function = unary_type_action_table[+value.type];
if (function != nullptr)
{
(*function)(value);
}
// Emit error...
}
The example above is split up into four sections, all of which can be implemented in the source (.cpp
) file, and the first three of which can be put into a private or internal namespace to hide the implementation details and make the code somewhat cleaner.
1.) Helper Definitions
Helper type definitions which simplify using the function pointer and function pointer table definitions in the code. In this example these are UnaryTypeFunction
and UnaryTypeTable
. These are not strictly necessary but they do make the code clearer and more concise.
2.) Make Table Function
A make table function MakeUnaryActionTable
which wraps up creation of the table.
There are a few things to note with this function:
- Using
consteval
is key, as it guarantees that the function will be evaluated at compile time or a compiler error will be emitted. - Using aggregate/list initialisation syntax for the
result
variable will assignnullptr
to all entries. Therefore there’s no uninitialised memory in the table, and no need to explicitly specify every entry, only the entries that we want to do something. - We can store a function pointer in a variable and assign it to multiple table entries, like with the
integer_function
variable in the example above. This also means that we can call otherconsteval
functions and pass in function pointers to them and also modify the tables that they return.
3.) Assigning the Function Table
Calling the MakeUnaryActionTable
and storing its result in a constinit
variable, ensuring that if we cannot assign it at compile time then the compiler will emit an error.
You can also use an immediately invoked consteval
lambda to create the table if that is your preference, but I much prefer to use an explicit function instead, as you can give it a proper name and it is more consistent when you want to call other helper consteval
functions.
4.) The Callable Function
The DoUnaryTypeAction
function which actually performs the action at runtime. This is the public facing API of this technique, with the rest being implementation details. This function can also be a member function of a class even if the other components are not.
Multiple Argument Lookup
This technique actually shines when you have to execute code based on multiple input arguments, which would normally result in a complicated tree of conditional if/else
and switch
statements, but instead you encode the functions in a multidimensional lookup table, and then do a single lookup to figure out which code to run.
So in order to handle multiple arguments the previous example expands to this:
using BinaryTypeFunction = RuntimeValue (*)(const RuntimeValue&, const RuntimeValue&);
using BinaryTypeTable = std::array<std::array<BinaryTypeFunction, +Type::COUNT>, +Type::COUNT>;
consteval BinaryTypeTable MakeBinaryActionTable()
{
BinaryTypeTable result{};
result[+Type::Bool][+Type::Bool] = +[](const RuntimeValue&, const RuntimeValue&) { /* Boolean stuff */ };
auto integer_function = +[](const RuntimeValue&, const RuntimeValue&) { /* Integer stuff */ };
result[+Type::Signed][+Type::Signed] = integer_function;
result[+Type::Signed][+Type::Unsigned] = integer_function;
result[+Type::Unsigned][+Type::Signed] = integer_function;
result[+Type::Unsigned][+Type::Unsigned] = integer_function;
auto integer_float_function = +[](const RuntimeValue&, const RuntimeValue&) { /* Integer and Floating Point stuff */ };
result[+Type::Signed][+Type::Floating] = integer_float_function;
result[+Type::Unsigned][+Type::Floating] = integer_float_function;
result[+Type::Floating][+Type::Signed] = integer_float_function;
result[+Type::Floating][+Type::Unsigned] = integer_float_function;
result[+Type::Floating][+Type::Floating] = +[](const RuntimeValue&, const RuntimeValue&) { /* Floating Point stuff */ };
return result;
}
static constinit auto binary_type_action_table = MakeBinaryActionTable();
RuntimeValue DoBinaryTypeAction(const RuntimeValue& left, const RuntimeValue& right)
{
auto function = binary_type_action_table[+left.type][+right.type];
if (function != nullptr)
{
return (*function)(left, right);
}
// Emit error...
return RuntimeValue{};
}
As you can see the code hasn’t changed much at all but now we can select a function based on two input parameters instead of one.
The main down side of this is that the lookup table can grow quite large in size, as each entry is sizeof(void*)
(which is usually 8 bytes on modern platforms), multiplied by Type::COUNT
raised to the number of parameters/dimensions in the table. So in the example above it would have 36 entries and be 288 bytes in size.
Character Based Lookup Table
Another use case where this technique can be applied is in writing a lexer or other similar type of parser which deals with characters. The main benefit in this case is being able to write clear concise code which handles the different starting characters without needing to specify the full table in code or have every valid character in a gigantic switch statement.
An example of the make table function for a simple lexer is:
using CharFunction = bool (*)(const char*&);
using CharLookupTable = std::array<CharFunction, 256>;
consteval CharLookupTable MakeLookupTable()
{
CharLookupTable table{};
// (1)
auto error_func = +[](const char*&) { /* Handle Error */ };
for (int i = 0; i < 256; ++i)
{
table[i] = error_func;
}
// (2)
auto handle_word = +[](const char*&) { /* Handle word */ };
table['_'] = handle_word;
for (char c = 'a'; c <= 'z'; ++c)
{
table[c] = handle_word;
}
for (char c = 'A'; c <= 'Z'; ++c)
{
table[c] = handle_word;
}
auto handle_number = +[](const char*&) { /* Handle number */ };
for (char c = '0'; c <= '9'; ++c)
{
table[c] = handle_number;
}
auto handle_symbol = +[](const char*&) { /* Handle symbol */ };
table['+'] = handle_symbol;
table['-'] = handle_symbol;
// etc
return table;
}
Which can be used in code like so:
void Process(const char*& current)
{
constexpr auto lookup_table = MakeLookupTable();
while (SkipWhitespaceAndComments(current))
{
auto result = lookup_table[*current](current);
if (!result) break;
}
}
The main difference in this example is in the way invalid or error entries are encoded in the table.
In this case every table entry is filled with a pointer to an error handling function that handles an unexpected or illegal character (labelled as (1) in the example). Then the entries which handle specific cases are set up (labelled as (2) in the example), such as the alphabet characters handling the start of a word or the digits handling the start of a number.
Because this function is evaluated at compile time we don’t need to worry about how many times we assign the entries in the table, as the table will be stored as data in the compiled code.
This also results in a more efficient way of handling errors as there doesn’t need to be an explicit check for a null pointer being stored in the table, and instead the function can just be executed directly.
Note that in this case an unsigned 8 bit number is used to index into the array of 256 entries, meaning that there’s no way to index out of bounds and therefore no need to check that. If using a smaller array size, especially one that is not a power of two, that a bounds check will be needed.
I’ve used both methods of error handling in my code, and which method to use depends on a variety of factors. When using a table with null pointers then the error checking is done in the function doing the table lookup, and therefore you might have a better error handling context. In the case above you handle the errors in the error handling functions, so either the error handling need to be simpler (or global), or you need to pass in the context to every function in the table.
As far as performance goes you can only tell by implementing both and measuring to see which method is faster, and with this approach it is easy to switch from one error handling strategy to another by just changing the default table entries.
Using Tables Within Classes
One last detail in using this technique is how to use it to call member functions rather than just calling free functions. There are two main methods to accomplish this, either by storing pointers to member functions, or by binding lambdas which get passed the object in as a parameter and then call the required function.
The main issue with the first method is that the size of the pointer-to-member-function may be larger than a regular pointer-to-function, and therefore drastically increase the size of the lookup array. In the simplest case with a basic style class the sizes will be the same so that’s not an issue, but in more complicated classes with inheritance and virtual functions the size will end up being bigger. It could also be an issue that the pointer-to-member-function syntax in C++ is not used that often, so it will be unknown to some people.
Therefore my preferred method is to bind simple lambdas that take a reference to the class as an additional parameter and then just call the desired function on the object in the lambda. It can be best illustrated with the example below:
class Lexer
{
public:
void HandleError();
void HandleWord();
void HandleNumber();
void HandleSymbol();
};
consteval CharLookupTable MakeLookupTable()
{
CharLookupTable table{};
auto error_func = +[](Lexer& lexer) { lexer.HandleError(); };
// ...
auto handle_word = +[](Lexer& lexer) { lexer.HandleWord(); };
// ...
auto handle_number = +[](Lexer& lexer) { lexer.HandleNumber(); };
// ...
auto handle_symbol = +[](Lexer& lexer) { lexer.HandleSymbol(); };
return table;
}
With this method you end up with somewhat simpler and cleaner looking code, and guaranteed smaller size of table than by using pointer-to-member functions. Though one issue is that you need to make the member functions public as they need to be called from functions outside the class.
-
This happens to be very similar (if not identical) to assembly code generated by compilers for a switch statement. ↩