<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://www.dominikgrabiec.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://www.dominikgrabiec.com/" rel="alternate" type="text/html" /><updated>2026-02-14T01:58:06+00:00</updated><id>https://www.dominikgrabiec.com/feed.xml</id><title type="html">Dominik Grabiec Blog</title><subtitle>Personal blog for describing programming topics, showing off personal projects, and discussing technical opinions.</subtitle><entry><title type="html">What AI Really Is</title><link href="https://www.dominikgrabiec.com/posts/2026/02/14/what_ai_really_is.html" rel="alternate" type="text/html" title="What AI Really Is" /><published>2026-02-14T01:30:00+00:00</published><updated>2026-02-14T01:30:00+00:00</updated><id>https://www.dominikgrabiec.com/posts/2026/02/14/what_ai_really_is</id><content type="html" xml:base="https://www.dominikgrabiec.com/posts/2026/02/14/what_ai_really_is.html"><![CDATA[<p>I see a lot of surprised posts on social media and articles in the news about how badly incorrect AI generated text is, how it constantly gets things wrong, how it mixes up information in search, or how it just completely fabricates statements as if they were fact. This has never surprised me given how LLMs work but it still surprises me that people are surprised by this.</p>

<p>I’d like to think that this is about information and education, and that if people knew how these things actually worked they’d be less surprised by it and in turn less inclined to trust and use it in all applications. So I will try to explain what “AI” is, how it works, and why you will never be able to trust its output.</p>

<!--more-->

<p>Specifically in this case I am talking about Large Language Models (LLMs) which are the fashionable thing in AI these days.</p>

<h2 id="how-do-llms-work">How do LLMs work?</h2>

<p>At the highest level a LLM (Large Language Model) is a program which generates a sequence of words which are most likely to follow the previous sequence of words. The way it knows what word comes after the previous words is by scanning as many texts as possible and recording the connections between words. This is the “Model” of a Large Language Model. It has no knowledge of what the words actually mean, nor does it know if a sequence of words is factual, fictional, sarcasm, or just plain lies. All it knows is just the probabilities of which words could follow a given sequence of words.</p>

<p>A very simple example of this is presented in the ACCU 2025 talk titled “<a href="https://www.youtube.com/watch?v=17RQ4DdywQU">A Very Small Language Model</a>” which is about building a very basic language model which can generate realistic looking text. A LLM is just a slightly more complicated version of this working with a massively larger set of word connections.</p>
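<p>As a rough illustration of “recording the connections between words”, here is a minimal sketch of a toy word-pair model in C++. It is not taken from the talk above and it is far simpler than a real LLM; the names <code class="language-plaintext highlighter-rouge">BigramCounts</code>, <code class="language-plaintext highlighter-rouge">CountBigrams</code>, and <code class="language-plaintext highlighter-rouge">MostLikelyNext</code> are made up for this example. It only counts which word directly follows which, with no notion of what any word means.</p>

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical name: maps a word to the words seen directly after it, with counts.
using BigramCounts = std::map<std::string, std::map<std::string, int>>;

// "Train" by scanning whitespace-separated text and counting adjacent word pairs.
BigramCounts CountBigrams(const std::string& text)
{
	BigramCounts counts;
	std::istringstream stream(text);
	std::string previous;
	std::string current;
	if (!(stream >> previous))
	{
		return counts; // empty input, nothing to count
	}
	while (stream >> current)
	{
		++counts[previous][current];
		previous = current;
	}
	return counts;
}

// Return the most frequently seen follower of a word,
// or an empty string if the word was never followed by anything.
std::string MostLikelyNext(const BigramCounts& counts, const std::string& word)
{
	const auto found = counts.find(word);
	if (found == counts.end())
	{
		return {};
	}
	std::string best;
	int best_count = 0;
	for (const auto& [next, count] : found->second)
	{
		if (count > best_count)
		{
			best = next;
			best_count = count;
		}
	}
	return best;
}
```

<p>Note that the counts are identical whether the scanned sentence was true, false, or fiction; the table has no place to store that distinction.</p>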

<p>The massive Model is built by scanning <em>(legally or otherwise)</em> all the available written texts in existence. Naturally this includes texts which are accurate and factual, but also includes texts which are fictional, politically motivated, discredited, fraudulent, or just plain wrong. More importantly it lumps all of these texts together, meaning the connections between words in factual sentences are treated the same as in sentences containing falsehoods.</p>

<blockquote>
  <p>Scanning all written works in existence has caused authors and publishers to file lawsuits against these companies for copyright infringement, so far with mixed success. It has also caused online publishers to either close their archives or make licensing deals with these companies, and when that content has been user generated those sites have faced backlash from their users.</p>
</blockquote>

<p>Now when you ask the LLM a question what it is actually doing is calculating which word is most likely to follow the words that have been fed into it. This not only includes your question, but also the previous chat history and control text which has been included by the company. It does this to generate one word, which it then adds to the input text and feeds the whole thing back into itself to generate the next word. This is shown in the way a LLM outputs text one word at a time, with the speed mostly being determined by the available processing power.</p>
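<p>The feed-the-output-back-in loop can be sketched as follows. This is a simplified assumption for illustration only: the hypothetical <code class="language-plaintext highlighter-rouge">NextWordTable</code> stands in for the model and looks only at the last word, whereas a real LLM takes the whole input text into account.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a trained model: the single most likely follower of each word.
using NextWordTable = std::map<std::string, std::string>;

// Generate up to max_new_words words, appending each one to the text
// and then using the extended text as the input for the next step.
std::vector<std::string> Generate(const NextWordTable& model,
                                  std::vector<std::string> text,
                                  std::size_t max_new_words)
{
	for (std::size_t i = 0; i < max_new_words && !text.empty(); ++i)
	{
		const auto found = model.find(text.back());
		if (found == model.end())
		{
			break; // the model has never seen anything follow this word
		}
		text.push_back(found->second);
	}
	return text;
}
```

<p>For example, with a table containing <code class="language-plaintext highlighter-rouge">the → cat</code>, <code class="language-plaintext highlighter-rouge">cat → sat</code>, and <code class="language-plaintext highlighter-rouge">sat → down</code>, starting from “the” produces “the cat sat down” one word at a time.</p>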

<p>These calculations are also probabilistic meaning they include randomness in the result, as evidenced by a LLM generating different outputs for an identical input. This means that a LLM will not always pick the most likely word each time, but a word which is statistically likely to appear, which will then alter which words are likely to appear after.</p>
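<p>A minimal sketch of that random selection, assuming the model has already produced a list of candidate words with relative weights. The <code class="language-plaintext highlighter-rouge">Candidate</code> struct and <code class="language-plaintext highlighter-rouge">SampleNextWord</code> function are hypothetical names for this example; the selection itself uses the standard library’s <code class="language-plaintext highlighter-rouge">std::discrete_distribution</code>.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical candidate: a possible next word and how strongly the model favours it.
struct Candidate
{
	std::string word;
	double weight;
};

// Pick the next word at random, in proportion to its weight,
// rather than always taking the single most likely one.
template <typename Generator>
std::string SampleNextWord(const std::vector<Candidate>& candidates, Generator& generator)
{
	if (candidates.empty())
	{
		return {};
	}

	std::vector<double> weights;
	weights.reserve(candidates.size());
	for (const Candidate& candidate : candidates)
	{
		weights.push_back(candidate.weight);
	}

	std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());
	return candidates[pick(generator)].word;
}
```

<p>Calling this repeatedly with the same candidates and a differently seeded generator can return different words, which is one way the same input can lead to different outputs.</p>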

<blockquote>
  <p>A more mathematical explanation behind how a LLM works is provided in this <a href="https://www.youtube.com/watch?v=gqP-Jap_kV0&amp;list=LL">delightful video</a>, which also contains links to videos explaining more of the theoretical foundations of the technology behind LLMs.</p>
</blockquote>

<h2 id="what-does-this-mean">What does this mean?</h2>

<p>Given that a LLM is just a program which probabilistically generates a sequence of words that are statistically most likely to follow the input text that it was given, we should realise that it:</p>

<ul>
  <li>Doesn’t actually know what is fact, truth, real, fiction, falsehood, fraud, or a lie.</li>
  <li>Doesn’t actually know what any of those concepts even are.</li>
  <li>Doesn’t know what is true or false.</li>
  <li>Has no concept of being right or wrong.</li>
  <li>Has no concept of internal consistency of ideas, subjects, or objects.</li>
  <li>Has no concept of what ideas are factual, fraudulent, or fictional.</li>
  <li>Cannot do arithmetic or perform simple operations (like count how many letters are in a word).</li>
</ul>

<p>It just generates plausible looking text based on its input.</p>

<p>Therefore:</p>

<ul>
  <li>When it answers a question there is no certainty or guarantee that the information is correct or factual.</li>
  <li>When it apologises for getting something wrong, it is not sorry, it is just generating words which are most likely to appear after your message correcting it.</li>
  <li>When it says it is doing something in the background, it is not, it is just generating words which are most likely to appear after you ask it to do something.</li>
  <li>It will never be able to generate new ideas or facts, only recombine words based on the text it has been trained on.</li>
</ul>

<p>To reiterate, all a LLM does is generate a random sequence of statistically probable words based on an input. There is no knowledge, no thought, no creative decisions, and no internal consistency behind those words. Which means they can easily contain both true and false statements, or even a mixture of both in some bizarre combination. This makes the output unreliable as it cannot be trusted.</p>

<h3 id="hallucinations">Hallucinations</h3>

<p>Unfortunately the AI industry has come to refer to these false statements as mere <em>“hallucinations”</em>, and claim that it is just a small problem that can be solved with more resources and time, rather than it being a fundamental issue with the technology itself.</p>

<p>This view is present in this <a href="https://www.computerworld.com/article/4059383/openai-admits-ai-hallucinations-are-mathematically-inevitable-not-just-engineering-flaws.html">article</a>, where they at least admit that inaccurate and false statements are a fundamental problem of how LLMs work, but also say that the problems can be somewhat mitigated, and that we should just adapt to getting back unreliable information.</p>

<p>This seems completely counter-intuitive from a computer system which we have grown accustomed to being deterministic in its behaviour and output.</p>

<blockquote>
  <p>I really don’t like the use of the word “hallucinations” because it seems benign, like the LLM is having a temporary memory or mental issue, rather than calling it what it is, a “<a href="https://dictionary.cambridge.org/dictionary/english/fabrication">fabrication</a>”, in both meanings of the word.</p>
</blockquote>

<h3 id="externalising-costs">Externalising Costs</h3>

<p>Another big problem is that LLMs can generate a lot of plausible looking text at a speed that is faster than humans can write. Given that the text is also unreliable and of questionable accuracy, the cost of reading through and fact checking it has been externalised from the writer<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> of the text to the people reading it. It is quicker and cheaper to generate the text, but it costs everyone else more to be able to process the text, even if they discard it.</p>

<p>There are notable examples of this happening in all manner of fields, some of which are very important to life and liberty.</p>
<ul>
  <li>In the legal field where a lawyer uses a LLM to generate a brief for a court which contains completely fictional legal cases and court decisions. In the best case these are discovered to be false and the lawyer is fined for not checking, but in the worst case these actually decide a legal ruling and are then included in court documents, further propagating these fabricated falsehoods.</li>
  <li>In the computer security field where security bug reports are submitted to bug bounty programs, but the actual security issue is pure fiction fabricated by a LLM. This costs people time investigating the issue and keeps them from working on other aspects of software development.</li>
  <li>In reports for governments that are authored by big name management consultancies where LLMs are used and insert references to fabricated statistics, court rulings, and other studies. This is dangerous because it is used to justify certain government actions which can harm people’s lives.</li>
</ul>

<p>So instead of a writer taking more time to write something good and succinct for people to read in a short time, LLMs vomit out a lot of text which takes other people’s time to read and process, even if they discard it quickly.</p>

<blockquote>
  <p>These days it is not just text that generative AI models like LLMs spew out; images, audio, and video are also being generated, making it harder to distinguish fact from fiction.</p>
</blockquote>

<h2 id="what-now">What now?</h2>

<p>Just realise that LLMs are just text generation programs, with no actual knowledge inside of them, and no capability to do actual data processing. With this in mind you can see what tasks a LLM might be useful for - emitting plausible looking text, and what tasks it is wholly unsuited for - emitting factual text, performing calculations, answering questions.</p>

<p>For some it might <em>(seem to)</em> be a useful tool which helps them do things and write what they need to write, but realise that you always need to check and verify its output with actual non LLM sources, as you the writer<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> are ultimately liable for what is in the text that the LLM generates.</p>

<blockquote>
  <p>“A computer can never be held accountable, therefore a computer must never make a management decision”. - IBM Training Manual, 1979</p>
</blockquote>

<p>As for me I see no present value in using LLMs for generating content, as we have existing proven technologies for many of its claimed use cases, I have no interest in spending time to fact check its output<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>, and I prefer to write code myself. I may experiment with it now and again, and if there comes a time in the future when LLMs have an actual useful use case I can always try it then.</p>

<!-- Footnotes -->
<hr />

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>I use the term “writer” here loosely, what they really are is the writer of the prompt used to generate the text, which you can say is the “prompter”. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>For some examples check out <a href="https://stopcitingai.com/">this site</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="Opinion" /><category term="opinion" /><category term="ai" /><summary type="html"><![CDATA[I see a lot of surprised posts on social media and articles in the news about how badly incorrect AI generated text is, how it constantly gets things wrong, how it mixes up information in search, or how it just completely fabricates statements as if they were fact. This has never surprised me given how LLMs work but it still surprises me that people are surprised by this. I’d like to think that this is about information and education, and that if people knew how these things actually worked they’d be less surprised by it and in turn less inclined to trust and use it in all applications. So I will try to explain what “AI” is, how it works, and why you will never be able to trust its output.]]></summary></entry><entry><title type="html">ACCU 2025 Conference Presentations &amp;amp; Review</title><link href="https://www.dominikgrabiec.com/posts/2025/11/13/accu_2025_review.html" rel="alternate" type="text/html" title="ACCU 2025 Conference Presentations &amp;amp; Review" /><published>2025-11-13T11:00:00+00:00</published><updated>2025-11-13T11:00:00+00:00</updated><id>https://www.dominikgrabiec.com/posts/2025/11/13/accu_2025_review</id><content type="html" xml:base="https://www.dominikgrabiec.com/posts/2025/11/13/accu_2025_review.html"><![CDATA[<p>In April this year (2025) I attended the <a href="https://accu.org/">ACCU Conference</a> in Bristol, UK to present a couple of talks. The first was a longer (and better) version of my talk about optimising multi-threaded data building for game development that I initially presented at CppCon last year, and the second was a shorter talk about various mistakes that I’ve seen made in handling data during game development.</p>

<!--more-->

<p>I had originally intended to only present a longer version of my talk from CppCon about Optimising Data Building in Game Development, but I was unsure if it would get accepted for the conference. So I quickly thought about and wrote a shorter presentation about various mistakes that I’ve encountered when handling data while making games. Fortunately <em>(or unfortunately)</em> both of my presentations were accepted for the conference.</p>

<p>I’m posting this now, quite a long time after the event, as both videos for my talks are now up on YouTube and available for all to see, please check them out.</p>

<ul>
  <li><a href="https://www.youtube.com/watch?v=x_5PIxOFknY">Mistakes With Data Made During Game Development</a> <em>- the shorter presentation.</em></li>
  <li><a href="https://www.youtube.com/watch?v=KNAyUjeNewc">Optimising Data Building in Game Development</a> <em>- extended version</em></li>
</ul>

<p>The conference itself felt a lot more community focused, smaller, and a lot less intimidating than CppCon, and covered a wider array of topics than just pure C++ programming. This was actually kind of refreshing because it meant that not every talk was a deep discussion of the technical details of C++, and you got to meet a whole range of different friendly people.</p>

<p>One of the unfortunate things for people reading this is that not all the talks were recorded, at the presenters’ request. One of these was a talk about programming a server for Ultima Online.</p>

<p>Out of the talks that are online I got the most value from seeing:</p>

<ul>
  <li><a href="https://www.youtube.com/watch?v=gbs-qMIlYUg">Tanzt Kaputt, Was Euch Kaputt Macht!</a> by Dom Davis, talking about mental health in software development.</li>
  <li><a href="https://www.youtube.com/watch?v=HUS_vPJbQX4">So You Think You Can Lead a Software Team</a> by Paul Grenyer, talking about leadership coming from a technical background.</li>
  <li><a href="https://www.youtube.com/watch?v=q7OmdusczC8">consteval All The Things?</a> by Jason Turner, a fun interactive talk about compile time C++.</li>
  <li><a href="https://www.youtube.com/watch?v=jlt_fScVl50">Teaching an Old Dog New Tricks</a> by Matt Godbolt, the closing keynote talking about learning new C++ language features and using them in a practical sense. I’m also constantly amazed how easily Matt can cobble together technologies like the ones in the presentation.</li>
</ul>

<p>To anyone at the conference that is reading this, it was a pleasure meeting you and we’ll hopefully get the chance to again sometime.</p>]]></content><author><name></name></author><category term="C++" /><category term="c++" /><category term="ACCU" /><category term="conferences" /><summary type="html"><![CDATA[In April this year (2025) I attended the ACCU Conference in Bristol, UK to present a couple of talks. The first was a longer (and better) version of my talk about optimising multi-threaded data building for game development that I initially presented at CppCon last year, and the second was a shorter talk about various mistakes that I’ve seen made in handling data during game development.]]></summary></entry><entry><title type="html">C++ Compile Time Function Tables for Fun and Profit</title><link href="https://www.dominikgrabiec.com/posts/2025/09/11/compile_time_function_tables.html" rel="alternate" type="text/html" title="C++ Compile Time Function Tables for Fun and Profit" /><published>2025-09-11T11:00:00+00:00</published><updated>2025-09-11T11:00:00+00:00</updated><id>https://www.dominikgrabiec.com/posts/2025/09/11/compile_time_function_tables</id><content type="html" xml:base="https://www.dominikgrabiec.com/posts/2025/09/11/compile_time_function_tables.html"><![CDATA[<p>A technique that I’ve recently found useful is to use compile time function tables to eliminate complicated branching logic from my code. It is especially useful where you have to perform the same actions in multiple different conditions, but the conditional logic gets overly complicated. More generally it is useful when you need to map a contiguous range of values to a small number of actions. Though the best part is that it is implemented in simple and straightforward C++20 code which gets evaluated at compile time without any sort of tricky templates or macros.</p>

<!--more-->

<p>At its core the technique consists of the following items:</p>
<ul>
  <li>A compile time <code class="language-plaintext highlighter-rouge">consteval</code> function which creates and returns an array of function pointers.</li>
  <li>Calling the compile time function and assigning the result into a <code class="language-plaintext highlighter-rouge">constinit</code> variable at a global/file scope.</li>
  <li>A runtime function which uses the array to look up a function to call and calls it with the required arguments.</li>
</ul>

<p>The generated assembly for this technique ends up being a simple jump table as data and a handful of instructions used to index into it and then jump to the specified address<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>

<h2 id="simplest-single-argument-lookup">Simplest Single Argument Lookup</h2>

<p>To best illustrate this technique I’m going to use a simple example that you may find in an interpreter, with an <code class="language-plaintext highlighter-rouge">enum class</code> that represents a type and a tagged union value <code class="language-plaintext highlighter-rouge">struct</code> which holds the payload, like so:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">enum</span> <span class="k">class</span> <span class="nc">Type</span> <span class="o">:</span> <span class="kt">uint8_t</span>
<span class="p">{</span>
	<span class="n">Void</span><span class="p">,</span> <span class="n">Bool</span><span class="p">,</span> <span class="n">Signed</span><span class="p">,</span> <span class="n">Unsigned</span><span class="p">,</span> <span class="n">Float</span><span class="p">,</span> <span class="n">String</span><span class="p">,</span> <span class="c1">// ...</span>
	<span class="n">COUNT</span>
<span class="p">};</span>

<span class="k">struct</span> <span class="nc">RuntimeValue</span>
<span class="p">{</span>
	<span class="n">Type</span> <span class="n">type</span><span class="p">;</span>
	<span class="k">union</span> <span class="p">{</span> <span class="cm">/* ... */</span> <span class="p">}</span> <span class="n">value</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>With these types defined I can use them to write the code for the rest of this simple example, where we only need to handle a single argument.</p>

<blockquote>
  <p>Note that I’m using the <a href="/posts/2025/08/08/overloading_unary_plus_operator.html">overloaded unary operator+</a> technique described in another article to make the code more concise. Likewise using unary operator+ on a lambda with no bound variables turns it into a function pointer.</p>
</blockquote>
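<p>For readers who have not seen the linked article, the two uses of unary <code class="language-plaintext highlighter-rouge">operator+</code> might look roughly like the sketch below. The exact overload used in that article is not shown here, so the signature and return type are assumptions; only the lambda to function pointer conversion is standard language behaviour.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

enum class Type : std::uint8_t { Void, Bool, Signed, Unsigned, Float, String, COUNT };

// Assumed shape of the overload: turn a scoped enum value into an array index.
constexpr std::size_t operator+(Type value) noexcept
{
	return static_cast<std::size_t>(value);
}

static_assert(+Type::Void == 0u);
static_assert(+Type::COUNT == 6u);

// Unary + on a lambda with no captures converts the closure object
// into a plain function pointer.
constexpr auto twice = +[](int x) { return x * 2; };
static_assert(std::is_same_v<decltype(twice), int (*const)(int)>);
```

<p>Because the overload is <code class="language-plaintext highlighter-rouge">constexpr</code>, an expression like <code class="language-plaintext highlighter-rouge">+Type::COUNT</code> can be used as a <code class="language-plaintext highlighter-rouge">std::array</code> size.</p>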

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// (1)</span>
<span class="k">using</span> <span class="n">UnaryTypeFunction</span> <span class="o">=</span> <span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="p">)(</span><span class="k">const</span> <span class="n">RuntimeValue</span><span class="o">&amp;</span><span class="p">);</span>
<span class="k">using</span> <span class="n">UnaryTypeTable</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">array</span><span class="o">&lt;</span><span class="n">UnaryTypeFunction</span><span class="p">,</span> <span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">COUNT</span><span class="o">&gt;</span><span class="p">;</span>

<span class="c1">// (2)</span>
<span class="k">consteval</span> <span class="n">UnaryTypeTable</span> <span class="nf">MakeUnaryActionTable</span><span class="p">()</span>
<span class="p">{</span>
	<span class="n">UnaryTypeTable</span> <span class="n">result</span><span class="p">{};</span>
	<span class="n">result</span><span class="p">[</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Bool</span><span class="p">]</span> <span class="o">=</span> <span class="o">+</span><span class="p">[](</span><span class="k">const</span> <span class="n">RuntimeValue</span><span class="o">&amp;</span><span class="p">)</span> <span class="p">{</span> <span class="cm">/* Boolean stuff */</span> <span class="p">};</span>

	<span class="k">auto</span> <span class="n">integer_function</span> <span class="o">=</span> <span class="o">+</span><span class="p">[](</span><span class="k">const</span> <span class="n">RuntimeValue</span><span class="o">&amp;</span><span class="p">)</span> <span class="p">{</span> <span class="cm">/* Integer stuff */</span> <span class="p">};</span>
	<span class="n">result</span><span class="p">[</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Signed</span><span class="p">]</span> <span class="o">=</span> <span class="n">integer_function</span><span class="p">;</span>
	<span class="n">result</span><span class="p">[</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Unsigned</span><span class="p">]</span> <span class="o">=</span> <span class="n">integer_function</span><span class="p">;</span>

	<span class="n">result</span><span class="p">[</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Float</span><span class="p">]</span> <span class="o">=</span> <span class="o">+</span><span class="p">[](</span><span class="k">const</span> <span class="n">RuntimeValue</span><span class="o">&amp;</span><span class="p">)</span> <span class="p">{</span> <span class="cm">/* Floating Point stuff */</span> <span class="p">};</span>
	<span class="k">return</span> <span class="n">result</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// (3)</span>
<span class="k">static</span> <span class="k">constinit</span> <span class="k">auto</span> <span class="n">unary_type_action_table</span> <span class="o">=</span> <span class="n">MakeUnaryActionTable</span><span class="p">();</span>

<span class="c1">// (4)</span>
<span class="kt">void</span> <span class="n">DoUnaryTypeAction</span><span class="p">(</span><span class="k">const</span> <span class="n">RuntimeValue</span><span class="o">&amp;</span> <span class="n">value</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">auto</span> <span class="n">function</span> <span class="o">=</span> <span class="n">unary_type_action_table</span><span class="p">[</span><span class="o">+</span><span class="n">value</span><span class="p">.</span><span class="n">type</span><span class="p">];</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">function</span> <span class="o">!=</span> <span class="nb">nullptr</span><span class="p">)</span>
	<span class="p">{</span>
		<span class="p">(</span><span class="o">*</span><span class="n">function</span><span class="p">)(</span><span class="n">value</span><span class="p">);</span>
		<span class="k">return</span><span class="p">;</span>
	<span class="p">}</span>
	<span class="c1">// No function registered for this type: emit error...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The example above is split up into four sections, all of which can be implemented in the source (<code class="language-plaintext highlighter-rouge">.cpp</code>) file, and the first three of which can be put into a private or internal namespace to hide the implementation details and make the code somewhat cleaner.</p>

<h4 id="1-helper-definitions">1.) Helper Definitions</h4>
<p>Helper type definitions which simplify using the function pointer and function pointer table definitions in the code. In this example these are <code class="language-plaintext highlighter-rouge">UnaryTypeFunction</code> and <code class="language-plaintext highlighter-rouge">UnaryTypeTable</code>. These are not strictly necessary but they do make the code clearer and more concise.</p>

<h4 id="2-make-table-function">2.) Make Table Function</h4>
<p>A make table function <code class="language-plaintext highlighter-rouge">MakeUnaryActionTable</code> which wraps up creation of the table.</p>

<p>There are a few things to note with this function:</p>
<ul>
  <li>Using <code class="language-plaintext highlighter-rouge">consteval</code> is <strong>key</strong>, as it guarantees that the function will be evaluated at compile time or a compiler error will be emitted.</li>
  <li>Initialising the <code class="language-plaintext highlighter-rouge">result</code> variable with empty braces value-initialises the array, which sets all entries to <code class="language-plaintext highlighter-rouge">nullptr</code>. Therefore there’s no uninitialised memory in the table, and no need to explicitly specify every entry, only the entries that we want to do something.</li>
  <li>We can store a function pointer in a variable and assign it to multiple table entries, like with the <code class="language-plaintext highlighter-rouge">integer_function</code> variable in the example above. This also means that we can call other <code class="language-plaintext highlighter-rouge">consteval</code> functions and pass in function pointers to them and also modify the tables that they return.</li>
</ul>

<h4 id="3-assigning-the-function-table">3.) Assigning the Function Table</h4>
<p>Calling the <code class="language-plaintext highlighter-rouge">MakeUnaryActionTable</code> function and storing its result in a <code class="language-plaintext highlighter-rouge">constinit</code> variable, ensuring that if we cannot assign it at compile time then the compiler will emit an error.</p>

<p>You can also use an immediately invoked <code class="language-plaintext highlighter-rouge">consteval</code> lambda to create the table if that is your preference, but I much prefer to use an explicit function instead, as you can give it a proper name and it is more consistent when you want to call other helper <code class="language-plaintext highlighter-rouge">consteval</code> functions.</p>

<h4 id="4-the-callable-function">4.) The Callable Function</h4>
<p>The <code class="language-plaintext highlighter-rouge">DoUnaryTypeAction</code> function which actually performs the action at runtime. This is the public facing API of this technique, with the rest being implementation details. This function can also be a member function of a class even if the other components are not.</p>

<h2 id="multiple-argument-lookup">Multiple Argument Lookup</h2>

<p>This technique actually shines when you have to execute code based on multiple input arguments, which would normally result in a complicated tree of conditional <code class="language-plaintext highlighter-rouge">if/else</code> and <code class="language-plaintext highlighter-rouge">switch</code> statements, but instead you encode the functions in a multidimensional lookup table, and then do a single lookup to figure out which code to run.</p>

<p>So in order to handle multiple arguments the previous example expands to this:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="n">BinaryTypeFunction</span> <span class="o">=</span> <span class="n">RuntimeValue</span> <span class="p">(</span><span class="o">*</span><span class="p">)(</span><span class="k">const</span> <span class="n">RuntimeValue</span><span class="o">&amp;</span><span class="p">,</span> <span class="k">const</span> <span class="n">RuntimeValue</span><span class="o">&amp;</span><span class="p">);</span>
<span class="k">using</span> <span class="n">BinaryTypeTable</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">array</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">array</span><span class="o">&lt;</span><span class="n">BinaryTypeFunction</span><span class="p">,</span> <span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">COUNT</span><span class="o">&gt;</span><span class="p">,</span> <span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">COUNT</span><span class="o">&gt;</span><span class="p">;</span>

<span class="k">consteval</span> <span class="n">BinaryTypeTable</span> <span class="nf">MakeBinaryActionTable</span><span class="p">()</span>
<span class="p">{</span>
	<span class="n">BinaryTypeTable</span> <span class="n">result</span><span class="p">{};</span>
	<span class="n">result</span><span class="p">[</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Bool</span><span class="p">][</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Bool</span><span class="p">]</span> <span class="o">=</span> <span class="o">+</span><span class="p">[](</span><span class="k">const</span> <span class="n">RuntimeValue</span><span class="o">&amp;</span><span class="p">,</span> <span class="k">const</span> <span class="n">RuntimeValue</span><span class="o">&amp;</span><span class="p">)</span> <span class="p">{</span> <span class="cm">/* Boolean stuff */</span> <span class="p">};</span>

	<span class="k">auto</span> <span class="n">integer_function</span> <span class="o">=</span> <span class="o">+</span><span class="p">[](</span><span class="k">const</span> <span class="n">RuntimeValue</span><span class="o">&amp;</span><span class="p">,</span> <span class="k">const</span> <span class="n">RuntimeValue</span><span class="o">&amp;</span><span class="p">)</span> <span class="p">{</span> <span class="cm">/* Integer stuff */</span> <span class="k">return</span> <span class="n">RuntimeValue</span><span class="p">{};</span> <span class="p">};</span>
	<span class="n">result</span><span class="p">[</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Signed</span><span class="p">][</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Signed</span><span class="p">]</span> <span class="o">=</span> <span class="n">integer_function</span><span class="p">;</span>
	<span class="n">result</span><span class="p">[</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Signed</span><span class="p">][</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Unsigned</span><span class="p">]</span> <span class="o">=</span> <span class="n">integer_function</span><span class="p">;</span>
	<span class="n">result</span><span class="p">[</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Unsigned</span><span class="p">][</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Signed</span><span class="p">]</span> <span class="o">=</span> <span class="n">integer_function</span><span class="p">;</span>
	<span class="n">result</span><span class="p">[</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Unsigned</span><span class="p">][</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Unsigned</span><span class="p">]</span> <span class="o">=</span> <span class="n">integer_function</span><span class="p">;</span>

	<span class="k">auto</span> <span class="n">integer_float_function</span> <span class="o">=</span> <span class="o">+</span><span class="p">[](</span><span class="k">const</span> <span class="n">RuntimeValue</span><span class="o">&amp;</span><span class="p">,</span> <span class="k">const</span> <span class="n">RuntimeValue</span><span class="o">&amp;</span><span class="p">)</span> <span class="p">{</span> <span class="cm">/* Integer and Floating Point stuff */</span> <span class="k">return</span> <span class="n">RuntimeValue</span><span class="p">{};</span> <span class="p">};</span>
	<span class="n">result</span><span class="p">[</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Signed</span><span class="p">][</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Floating</span><span class="p">]</span> <span class="o">=</span> <span class="n">integer_float_function</span><span class="p">;</span>
	<span class="n">result</span><span class="p">[</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Unsigned</span><span class="p">][</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Floating</span><span class="p">]</span> <span class="o">=</span> <span class="n">integer_float_function</span><span class="p">;</span>
	<span class="n">result</span><span class="p">[</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Floating</span><span class="p">][</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Signed</span><span class="p">]</span> <span class="o">=</span> <span class="n">integer_float_function</span><span class="p">;</span>
	<span class="n">result</span><span class="p">[</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Floating</span><span class="p">][</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Unsigned</span><span class="p">]</span> <span class="o">=</span> <span class="n">integer_float_function</span><span class="p">;</span>

	<span class="n">result</span><span class="p">[</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Floating</span><span class="p">][</span><span class="o">+</span><span class="n">Type</span><span class="o">::</span><span class="n">Floating</span><span class="p">]</span> <span class="o">=</span> <span class="o">+</span><span class="p">[](</span><span class="k">const</span> <span class="n">RuntimeValue</span><span class="o">&amp;</span><span class="p">,</span> <span class="k">const</span> <span class="n">RuntimeValue</span><span class="o">&amp;</span><span class="p">)</span> <span class="p">{</span> <span class="cm">/* Floating Point stuff */</span> <span class="k">return</span> <span class="n">RuntimeValue</span><span class="p">{};</span> <span class="p">};</span>
	<span class="k">return</span> <span class="n">result</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">static</span> <span class="k">constinit</span> <span class="k">auto</span> <span class="n">binary_type_action_table</span> <span class="o">=</span> <span class="n">MakeBinaryActionTable</span><span class="p">();</span>

<span class="n">RuntimeValue</span> <span class="n">DoBinaryTypeAction</span><span class="p">(</span><span class="k">const</span> <span class="n">RuntimeValue</span><span class="o">&amp;</span> <span class="n">left</span><span class="p">,</span> <span class="k">const</span> <span class="n">RuntimeValue</span><span class="o">&amp;</span> <span class="n">right</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">auto</span> <span class="n">function</span> <span class="o">=</span> <span class="n">binary_type_action_table</span><span class="p">[</span><span class="o">+</span><span class="n">left</span><span class="p">.</span><span class="n">type</span><span class="p">][</span><span class="o">+</span><span class="n">right</span><span class="p">.</span><span class="n">type</span><span class="p">];</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">function</span> <span class="o">!=</span> <span class="nb">nullptr</span><span class="p">)</span>
	<span class="p">{</span>
		<span class="k">return</span> <span class="p">(</span><span class="o">*</span><span class="n">function</span><span class="p">)(</span><span class="n">left</span><span class="p">,</span> <span class="n">right</span><span class="p">);</span>
	<span class="p">}</span>
	<span class="c1">// Emit error...</span>
	<span class="k">return</span> <span class="n">RuntimeValue</span><span class="p">{};</span>
<span class="p">}</span>
</code></pre></div></div>

<p>As you can see the code hasn’t changed much at all but now we can select a function based on two input parameters instead of one.</p>

<p>The main downside of this is that the lookup table can grow quite large, as each entry is <code class="language-plaintext highlighter-rouge">sizeof(void*)</code> <em>(which is usually 8 bytes on modern platforms)</em>, and the number of entries is <code class="language-plaintext highlighter-rouge">Type::COUNT</code> raised to the power of the number of parameters/dimensions in the table. So in the example above it would have 36 entries and be 288 bytes in size.</p>
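
<p>The arithmetic above can be checked by the compiler. The sketch below assumes a <code class="language-plaintext highlighter-rouge">Type</code> enumeration with six enumerators, as the 36 entries imply. The enumerator list, the empty <code class="language-plaintext highlighter-rouge">RuntimeValue</code> and the names <code class="language-plaintext highlighter-rouge">table_entries</code> and <code class="language-plaintext highlighter-rouge">table_bytes</code> are stand-ins of mine rather than the real definitions:</p>

```cpp
#include <array>
#include <cstddef>

// Stand-in enumeration: only the number of enumerators matters here.
enum class Type : unsigned char { Bool, Signed, Unsigned, Floating, String, Null, COUNT };

constexpr auto operator+(Type value)
{
	return static_cast<std::size_t>(value);
}

// Stand-in for the real runtime value type.
struct RuntimeValue {};

using BinaryTypeFunction = RuntimeValue (*)(const RuntimeValue&, const RuntimeValue&);
using BinaryTypeTable = std::array<std::array<BinaryTypeFunction, +Type::COUNT>, +Type::COUNT>;

// Entries = COUNT raised to the number of dimensions (two in this case).
constexpr std::size_t table_entries = (+Type::COUNT) * (+Type::COUNT);
constexpr std::size_t table_bytes = table_entries * sizeof(BinaryTypeFunction);

static_assert(table_entries == 36);
// The array type may not be smaller than the entries it stores.
static_assert(sizeof(BinaryTypeTable) >= table_bytes);
```

<p>Adding a third dimension to the same enumeration would give 216 entries, which is why this approach is best kept to a small number of parameters.</p>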

<h2 id="character-based-lookup-table">Character Based Lookup Table</h2>

<p>Another use case where this technique can be applied is in writing a lexer or a similar kind of parser which deals with characters. The main benefit in this case is being able to write clear, concise code which handles the different starting characters without needing to specify the full table in code or have every valid character in a gigantic switch statement.</p>

<p>An example of the make table function for a simple lexer is:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="n">CharFunction</span> <span class="o">=</span> <span class="kt">bool</span> <span class="p">(</span><span class="o">*</span><span class="p">)(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*&amp;</span><span class="p">);</span>
<span class="k">using</span> <span class="n">CharLookupTable</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">array</span><span class="o">&lt;</span><span class="n">CharFunction</span><span class="p">,</span> <span class="mi">256</span><span class="o">&gt;</span><span class="p">;</span>

<span class="k">consteval</span> <span class="n">CharLookupTable</span> <span class="nf">MakeLookupTable</span><span class="p">()</span>
<span class="p">{</span>
	<span class="n">CharLookupTable</span> <span class="n">table</span><span class="p">{};</span>

	<span class="c1">// (1)</span>
	<span class="k">auto</span> <span class="n">error_func</span> <span class="o">=</span> <span class="o">+</span><span class="p">[](</span><span class="k">const</span> <span class="kt">char</span><span class="o">*&amp;</span><span class="p">)</span> <span class="p">{</span> <span class="cm">/* Handle Error */</span> <span class="k">return</span> <span class="nb">false</span><span class="p">;</span> <span class="p">};</span>
	<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">256</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
	<span class="p">{</span>
		<span class="n">table</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">error_func</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="c1">// (2)</span>
	<span class="k">auto</span> <span class="n">handle_word</span> <span class="o">=</span> <span class="o">+</span><span class="p">[](</span><span class="k">const</span> <span class="kt">char</span><span class="o">*&amp;</span><span class="p">)</span> <span class="p">{</span> <span class="cm">/* Handle word */</span> <span class="k">return</span> <span class="nb">true</span><span class="p">;</span> <span class="p">};</span>
	<span class="n">table</span><span class="p">[</span><span class="sc">'_'</span><span class="p">]</span> <span class="o">=</span> <span class="n">handle_word</span><span class="p">;</span>
	<span class="k">for</span> <span class="p">(</span><span class="kt">char</span> <span class="n">c</span> <span class="o">=</span> <span class="sc">'a'</span><span class="p">;</span> <span class="n">c</span> <span class="o">&lt;=</span> <span class="sc">'z'</span><span class="p">;</span> <span class="o">++</span><span class="n">c</span><span class="p">)</span>
	<span class="p">{</span>
		<span class="n">table</span><span class="p">[</span><span class="n">c</span><span class="p">]</span> <span class="o">=</span> <span class="n">handle_word</span><span class="p">;</span>
	<span class="p">}</span>
	<span class="k">for</span> <span class="p">(</span><span class="kt">char</span> <span class="n">c</span> <span class="o">=</span> <span class="sc">'A'</span><span class="p">;</span> <span class="n">c</span> <span class="o">&lt;=</span> <span class="sc">'Z'</span><span class="p">;</span> <span class="o">++</span><span class="n">c</span><span class="p">)</span>
	<span class="p">{</span>
		<span class="n">table</span><span class="p">[</span><span class="n">c</span><span class="p">]</span> <span class="o">=</span> <span class="n">handle_word</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="k">auto</span> <span class="n">handle_number</span> <span class="o">=</span> <span class="o">+</span><span class="p">[](</span><span class="k">const</span> <span class="kt">char</span><span class="o">*&amp;</span><span class="p">)</span> <span class="p">{</span> <span class="cm">/* Handle number */</span> <span class="k">return</span> <span class="nb">true</span><span class="p">;</span> <span class="p">};</span>
	<span class="k">for</span> <span class="p">(</span><span class="kt">char</span> <span class="n">c</span> <span class="o">=</span> <span class="sc">'0'</span><span class="p">;</span> <span class="n">c</span> <span class="o">&lt;=</span> <span class="sc">'9'</span><span class="p">;</span> <span class="o">++</span><span class="n">c</span><span class="p">)</span>
	<span class="p">{</span>
		<span class="n">table</span><span class="p">[</span><span class="n">c</span><span class="p">]</span> <span class="o">=</span> <span class="n">handle_number</span><span class="p">;</span>
	<span class="p">}</span>

	<span class="k">auto</span> <span class="n">handle_symbol</span> <span class="o">=</span> <span class="o">+</span><span class="p">[](</span><span class="k">const</span> <span class="kt">char</span><span class="o">*&amp;</span><span class="p">)</span> <span class="p">{</span> <span class="cm">/* Handle symbol */</span> <span class="k">return</span> <span class="nb">true</span><span class="p">;</span> <span class="p">};</span>
	<span class="n">table</span><span class="p">[</span><span class="sc">'+'</span><span class="p">]</span> <span class="o">=</span> <span class="n">handle_symbol</span><span class="p">;</span>
	<span class="n">table</span><span class="p">[</span><span class="sc">'-'</span><span class="p">]</span> <span class="o">=</span> <span class="n">handle_symbol</span><span class="p">;</span>
	<span class="c1">// etc</span>

	<span class="k">return</span> <span class="n">table</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Which can be used in code like so:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">Process</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*&amp;</span> <span class="n">current</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">constexpr</span> <span class="k">auto</span> <span class="n">lookup_table</span> <span class="o">=</span> <span class="n">MakeLookupTable</span><span class="p">();</span>
	<span class="k">while</span> <span class="p">(</span><span class="n">SkipWhitespaceAndComments</span><span class="p">(</span><span class="n">current</span><span class="p">))</span>
	<span class="p">{</span>
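		<span class="k">auto</span> <span class="n">result</span> <span class="o">=</span> <span class="n">lookup_table</span><span class="p">[</span><span class="k">static_cast</span><span class="o">&lt;</span><span class="kt">unsigned</span> <span class="kt">char</span><span class="o">&gt;</span><span class="p">(</span><span class="o">*</span><span class="n">current</span><span class="p">)](</span><span class="n">current</span><span class="p">);</span>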
		<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">result</span><span class="p">)</span> <span class="k">break</span><span class="p">;</span>
	<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The main difference in this example is in the way invalid or error entries are encoded in the table.</p>

<p>In this case every table entry is first filled with a pointer to an error handling function that handles an unexpected or illegal character <em>(labelled as (1) in the example)</em>. Then the entries which handle specific cases are set up <em>(labelled as (2) in the example)</em>, such as the alphabetic characters handling the start of a word or the digits handling the start of a number.</p>

<blockquote>
  <p>Because this function is evaluated at compile time we don’t need to worry about how many times we assign the entries in the table, as the table will be stored as data in the compiled code.</p>
</blockquote>

<p>This also results in a more efficient way of handling errors as there doesn’t need to be an explicit check for a null pointer being stored in the table, and instead the function can just be executed directly.</p>

<blockquote>
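  <p>Note that when an unsigned 8 bit value is used to index into the array of 256 entries there’s no way to index out of bounds, and therefore no need to check for that. A plain <code class="language-plaintext highlighter-rouge">char</code> may be signed, so it should be converted to <code class="language-plaintext highlighter-rouge">unsigned char</code> before being used as the index. If using a smaller array size, especially one that is not a power of two, then a bounds check will be needed.</p>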
</blockquote>

<p>I’ve used both methods of error handling in my code, and which one to use depends on a variety of factors. When using a table with null pointers the error checking is done in the function doing the table lookup, and therefore you might have a better error handling context. In the case above the errors are handled in the error handling function itself, so either the error handling needs to be simpler (or global), or you need to pass the context in to every function in the table.</p>

<p>As far as performance goes you can only tell by implementing both and measuring to see which method is faster, and with this approach it is easy to switch from one error handling strategy to another by just changing the default table entries.</p>

<h2 id="using-tables-within-classes">Using Tables Within Classes</h2>

<p>One last detail in using this technique is how to call member functions rather than just free functions. There are two main ways to accomplish this: either by storing pointers to member functions, or by storing lambdas which take the object as a parameter and then call the required member function.</p>

<p>The main issue with the first method is that a pointer-to-member-function may be larger than a regular pointer-to-function, and therefore drastically increase the size of the lookup array. How much larger depends on the compiler: with MSVC the two are the same size for a simple class and the member pointer only grows for classes using multiple or virtual inheritance, while compilers following the Itanium C++ ABI <em>(such as GCC and Clang on Linux)</em> always use two pointer-sized words. It could also be an issue that the pointer-to-member-function syntax in C++ is not used that often, so it will be unfamiliar to some people.</p>
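
<p>Since the sizes are implementation specific, the easiest way to see what your compiler does is to ask it. The following is a small sketch using made-up classes <em>(<code class="language-plaintext highlighter-rouge">Simple</code>, <code class="language-plaintext highlighter-rouge">BaseA</code>, <code class="language-plaintext highlighter-rouge">BaseB</code> and <code class="language-plaintext highlighter-rouge">Derived</code> are placeholders)</em>, and the numbers it prints will differ between compilers and platforms:</p>

```cpp
#include <cstddef>
#include <cstdio>

// A basic class with no inheritance.
struct Simple
{
	void Handle() {}
};

// A class using multiple inheritance and virtual functions.
struct BaseA { virtual ~BaseA() = default; virtual void A() {} };
struct BaseB { virtual ~BaseB() = default; virtual void B() {} };
struct Derived : BaseA, BaseB
{
	void Handle() {}
};

using FreeFunction = void (*)();
using SimpleMember = void (Simple::*)();
using DerivedMember = void (Derived::*)();

constexpr std::size_t free_size = sizeof(FreeFunction);
constexpr std::size_t simple_size = sizeof(SimpleMember);
constexpr std::size_t derived_size = sizeof(DerivedMember);

// Prints the three sizes so they can be compared on the current compiler.
void PrintPointerSizes()
{
	std::printf("pointer-to-function:          %zu bytes\n", free_size);
	std::printf("pointer-to-member (Simple):   %zu bytes\n", simple_size);
	std::printf("pointer-to-member (Derived):  %zu bytes\n", derived_size);
}
```

<p>Multiplying the reported size by the number of table entries gives the cost of storing member function pointers directly.</p>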

<p>Therefore my preferred method is to bind simple lambdas that take a reference to the class as an additional parameter and then just call the desired function on the object in the lambda. It can be best illustrated with the example below:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Lexer</span>
<span class="p">{</span>
<span class="nl">public:</span>
	<span class="kt">void</span> <span class="n">HandleError</span><span class="p">();</span>
	<span class="kt">void</span> <span class="n">HandleWord</span><span class="p">();</span>
	<span class="kt">void</span> <span class="n">HandleNumber</span><span class="p">();</span>
	<span class="kt">void</span> <span class="n">HandleSymbol</span><span class="p">();</span>
<span class="p">};</span>

<span class="k">consteval</span> <span class="n">CharLookupTable</span> <span class="n">MakeLookupTable</span><span class="p">()</span>
<span class="p">{</span>
	<span class="n">CharLookupTable</span> <span class="n">table</span><span class="p">{};</span>

	<span class="k">auto</span> <span class="n">error_func</span> <span class="o">=</span> <span class="o">+</span><span class="p">[](</span><span class="n">Lexer</span><span class="o">&amp;</span> <span class="n">lexer</span><span class="p">)</span> <span class="p">{</span> <span class="n">lexer</span><span class="p">.</span><span class="n">HandleError</span><span class="p">();</span> <span class="p">};</span>
	<span class="c1">// ...</span>

	<span class="k">auto</span> <span class="n">handle_word</span> <span class="o">=</span> <span class="o">+</span><span class="p">[](</span><span class="n">Lexer</span><span class="o">&amp;</span> <span class="n">lexer</span><span class="p">)</span> <span class="p">{</span> <span class="n">lexer</span><span class="p">.</span><span class="n">HandleWord</span><span class="p">();</span> <span class="p">};</span>
	<span class="c1">// ...</span>

	<span class="k">auto</span> <span class="n">handle_number</span> <span class="o">=</span> <span class="o">+</span><span class="p">[](</span><span class="n">Lexer</span><span class="o">&amp;</span> <span class="n">lexer</span><span class="p">)</span> <span class="p">{</span> <span class="n">lexer</span><span class="p">.</span><span class="n">HandleNumber</span><span class="p">();</span> <span class="p">};</span>
	<span class="c1">// ...</span>

	<span class="k">auto</span> <span class="n">handle_symbol</span> <span class="o">=</span> <span class="o">+</span><span class="p">[](</span><span class="n">Lexer</span><span class="o">&amp;</span> <span class="n">lexer</span><span class="p">)</span> <span class="p">{</span> <span class="n">lexer</span><span class="p">.</span><span class="n">HandleSymbol</span><span class="p">();</span> <span class="p">};</span>

	<span class="k">return</span> <span class="n">table</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>With this method you end up with somewhat simpler and cleaner looking code, and a table that is guaranteed to be no larger than one using pointer-to-member functions. Though one issue is that you need to make the member functions public, as they need to be called from functions outside the class.</p>

<!-- Footnotes -->
<hr />

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>This happens to be very similar <em>(if not identical)</em> to assembly code generated by compilers for a switch statement. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="C++" /><category term="c++" /><category term="code" /><summary type="html"><![CDATA[A technique that I’ve recently found useful is to use compile time function tables to eliminate complicated branching logic from my code. It is especially useful where you have to perform the same actions in multiple different conditions, but the conditional logic gets overly complicated. More generally it is useful when you need to map a contiguous range of values to a small number of actions. Though the best part is that it is implemented in simple and straightforward C++20 code which gets evaluated at compile time without any sort of tricky templates or macros.]]></summary></entry><entry><title type="html">Overloading Unary Operator+ for Enum Classes</title><link href="https://www.dominikgrabiec.com/posts/2025/08/08/overloading_unary_plus_operator.html" rel="alternate" type="text/html" title="Overloading Unary Operator+ for Enum Classes" /><published>2025-08-08T11:00:00+00:00</published><updated>2025-08-08T11:00:00+00:00</updated><id>https://www.dominikgrabiec.com/posts/2025/08/08/overloading_unary_plus_operator</id><content type="html" xml:base="https://www.dominikgrabiec.com/posts/2025/08/08/overloading_unary_plus_operator.html"><![CDATA[<p>The unary plus operator in C++ is one of those lesser known and even less frequently used operators in the language. I have been programming in C++ for many years and only recently started to use it when I found some nifty use cases. The first such case I found in the Carbon language compiler source code, where it was used to convert lambdas into function pointers. The second use case is in creating a convenient function for casting an <code class="language-plaintext highlighter-rouge">enum class</code> into its underlying type, which I will describe in this article.</p>

<!--more-->

<h1 id="casting-enum-class-to-underlying-type">Casting Enum Class to Underlying Type</h1>

<p>In modern C++ we have the <code class="language-plaintext highlighter-rouge">enum class</code> construct (or <code class="language-plaintext highlighter-rouge">enum struct</code>) which should be used as it fixes some issues with C style enums<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. One feature of this is that we have to explicitly cast to get an integer value, and when you need the underlying value in many places this becomes a burden.</p>

<p>The correct way to do this is to do a static cast to the underlying type of the enumeration. In C++ this is properly done by using the expression <code class="language-plaintext highlighter-rouge">static_cast&lt;std::underlying_type_t&lt;EnumType&gt;&gt;</code>, but as you can see it is pretty verbose. If it only needs to be used in a handful of places then using this construct will be fine, but if it needs to be used multiple times in the same piece of code then it will be a distraction from the actual intent of the code.</p>

<p>Some might want to use C-style casts, and while that does reduce the number of characters in the code, it brings with it many issues and removes the type safety that using an <code class="language-plaintext highlighter-rouge">enum class</code> provides. So in this case you might as well just use a plain <code class="language-plaintext highlighter-rouge">enum</code> instead, and if you’re fine with this you can stop reading here.</p>

<p>Some others might just explicitly put the type into the <code class="language-plaintext highlighter-rouge">static_cast&lt;&gt;</code> in order to remove the rather verbose <code class="language-plaintext highlighter-rouge">std::underlying_type_t&lt;&gt;</code> part of the expression. While this is a much better choice than a C-style cast, it will still require a lot of code changes if the underlying type of the enum ever changes, and there is a potential for silent problems if someone forgets to update a cast to the new type.</p>

<h1 id="leveraging-unary-operator">Leveraging Unary Operator+</h1>

<p>Now this is where we can overload the unary <code class="language-plaintext highlighter-rouge">operator+</code> for our specific enum class in order to create a convenient notation to convert the enum to its underlying type. This ends up being a pretty simple bit of code, just a function which wraps up the static cast to the underlying type, like so:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">enum</span> <span class="k">class</span> <span class="nc">Foo</span> <span class="o">:</span> <span class="kt">uint8_t</span><span class="p">;</span>

<span class="k">constexpr</span> <span class="k">auto</span> <span class="k">operator</span><span class="o">+</span><span class="p">(</span><span class="n">Foo</span> <span class="n">value</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">return</span> <span class="k">static_cast</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">underlying_type_t</span><span class="o">&lt;</span><span class="n">Foo</span><span class="o">&gt;&gt;</span><span class="p">(</span><span class="n">value</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Which can be used in code like this:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">enum</span> <span class="k">class</span> <span class="nc">Foo</span> <span class="o">:</span> <span class="kt">uint8_t</span>
<span class="p">{</span>
	<span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="c1">// ...</span>
	<span class="n">COUNT</span><span class="p">,</span>
<span class="p">};</span>

<span class="k">constexpr</span> <span class="k">auto</span> <span class="k">operator</span><span class="o">+</span><span class="p">(</span><span class="n">Foo</span> <span class="n">value</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">return</span> <span class="k">static_cast</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">underlying_type_t</span><span class="o">&lt;</span><span class="n">Foo</span><span class="o">&gt;&gt;</span><span class="p">(</span><span class="n">value</span><span class="p">);</span>
<span class="p">}</span>

<span class="k">constexpr</span> <span class="n">Value</span> <span class="n">ValueFromFoo</span><span class="p">(</span><span class="n">Foo</span> <span class="n">foo</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">constexpr</span> <span class="n">Value</span> <span class="n">values</span><span class="p">[</span><span class="o">+</span><span class="n">Foo</span><span class="o">::</span><span class="n">COUNT</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="cm">/* ... */</span> <span class="p">};</span>
	<span class="k">return</span> <span class="n">values</span><span class="p">[</span><span class="o">+</span><span class="n">foo</span><span class="p">];</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Additionally in Visual Studio you can add the <code class="language-plaintext highlighter-rouge">[[msvc::intrinsic]]</code> attribute to the operator function to make it more performant in debug builds<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>

<p>However this is not the best example of where a function like this is needed, as there are only two places where the unary <code class="language-plaintext highlighter-rouge">operator+</code> is used. I will present a much more extensive example of where this can be used to great benefit in a future article.</p>

<h1 id="caution">Caution</h1>

<p>Lastly a word of caution: only use this where it is absolutely necessary, preferably only defining it in C++ source files near where it is needed, and only for enum classes that actually need the functionality. My reasoning is that this single character operator converts a strongly typed enumeration to an integral type, which can then easily and accidentally be used in arithmetic operations. In the same way you wouldn’t automatically provide an <code class="language-plaintext highlighter-rouge">operator++</code> and <code class="language-plaintext highlighter-rouge">operator--</code> for an <code class="language-plaintext highlighter-rouge">enum class</code> unless you actually need to iterate over it in some code.</p>
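
<p>To illustrate the kind of accident I mean, here is a small sketch. The <code class="language-plaintext highlighter-rouge">brightness</code> array and the <code class="language-plaintext highlighter-rouge">nonsense</code> constant are invented for the example:</p>

```cpp
#include <cstdint>
#include <type_traits>

enum class Foo : std::uint8_t
{
	A, B, C,
	COUNT,
};

constexpr auto operator+(Foo value)
{
	return static_cast<std::underlying_type_t<Foo>>(value);
}

// Intended use: sizing and indexing an array.
constexpr int brightness[+Foo::COUNT] = { 30, 59, 11 };
static_assert(brightness[+Foo::B] == 59);

// Accidental use: adding two enumerators together is meaningless, yet it
// compiles because both operands have already been converted to integers.
constexpr auto nonsense = +Foo::B + +Foo::C;
static_assert(nonsense == 3);

// The operator yields the underlying type, which then promotes to int in arithmetic.
static_assert(std::is_same_v<decltype(+Foo::B), std::uint8_t>);
static_assert(std::is_same_v<decltype(nonsense), const int>);
```

<p>Without the operator the second expression would not compile, which is exactly the protection that an <code class="language-plaintext highlighter-rouge">enum class</code> is meant to give you.</p>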

<!-- Footnotes -->
<hr />

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>Issues include leaking enumeration names into the enclosing namespace and implicit conversion to integer types. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://devblogs.microsoft.com/cppblog/improving-the-state-of-debug-performance-in-c/">Improving the State of Debug Performance in C++</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="C++" /><category term="c++" /><category term="code" /><summary type="html"><![CDATA[The unary plus operator in C++ is one of those lesser known and even less frequently used operators in the language. I have been programming in C++ for many years and only recently started to use it when I found some nifty use cases. The first such case I found in the Carbon language compiler source code, where it was used to convert lambdas into function pointers. The second use case is in creating a convenient function for casting an enum class into its underlying type, which I will describe in this article.]]></summary></entry><entry><title type="html">Presenting at ACCU 2025</title><link href="https://www.dominikgrabiec.com/posts/2025/02/24/presenting_at_accu_2025.html" rel="alternate" type="text/html" title="Presenting at ACCU 2025" /><published>2025-02-24T11:00:00+00:00</published><updated>2025-02-24T11:00:00+00:00</updated><id>https://www.dominikgrabiec.com/posts/2025/02/24/presenting_at_accu_2025</id><content type="html" xml:base="https://www.dominikgrabiec.com/posts/2025/02/24/presenting_at_accu_2025.html"><![CDATA[<p>Just a quick post about me attending and presenting two talks at the <a href="https://accuconference.org/">ACCU conference</a> in April this year (2025).</p>

<p>The main talk will be an expanded version of my CppCon talk from last year titled <a href="https://accuconference.org/2025/session/optimising-data-building-in-game-development">“Optimising Data Building in Game Development”</a>, going into more detail and hopefully with a more consistent presentation. The second talk will be a shorter presentation on <a href="https://accuconference.org/2025/session/mistakes-with-data-made-during-game-development">mistakes that people have made in handling data</a> during game development.</p>

<!--more-->

<p>Originally I had only intended to do the one main presentation, but after submitting the proposal I felt that it would get rejected as I had already presented it. So I came up with some other ideas for shorter talks and submitted the best one, and now I am presenting both.</p>

<p>If you happen to be at the conference then I would be delighted if you would attend my talks, or even meet me in the hallway to say hi.</p>]]></content><author><name></name></author><category term="C++" /><category term="c++" /><category term="ACCU" /><category term="ACCU2025" /><category term="conferences" /><summary type="html"><![CDATA[Just a quick post about me attending and presenting two talks at the ACCU conference in April this year (2025). The main talk will be an expanded version of my CppCon talk from last year titled “Optimising Data Building in Game Development”, going into more detail and hopefully with a more consistent presentation. The second talk will be a shorter presentation on mistakes that people have made in handling data during game development.]]></summary></entry><entry><title type="html">How to Layout Data in C++ Classes</title><link href="https://www.dominikgrabiec.com/posts/2025/02/01/how_to_layout_data_in_classes.html" rel="alternate" type="text/html" title="How to Layout Data in C++ Classes" /><published>2025-02-01T11:00:00+00:00</published><updated>2025-02-01T11:00:00+00:00</updated><id>https://www.dominikgrabiec.com/posts/2025/02/01/how_to_layout_data_in_classes</id><content type="html" xml:base="https://www.dominikgrabiec.com/posts/2025/02/01/how_to_layout_data_in_classes.html"><![CDATA[<p>The layout of data members within a class is an important consideration in writing C++, it affects readability and understanding of the class, and can impact performance as well. There are a lot of things to consider when organising and ordering data members, and in this article I will go through my thoughts and explain the guidelines I used when writing C++ code.</p>

<!--more-->

<p>Take note that these are just guidelines and one size will not fit all situations, so feel free to mix and match them as required. I’ve ordered them roughly from most readable and least packed to least readable and most packed.</p>

<h3 id="initializer-list-order">Initializer List Order</h3>

<p>The most important thing to remember is that member variables are initialised in the order they are declared in the class definition, not in the order they are listed in the constructor’s initializer list. There is a <a href="https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rc-order">C++ Core Guideline</a> which says to keep the order of members in the class definition and in the initializer list the same.</p>

<p>The main effect of this with regard to the layout of data members is that you will want to initialise some data members before others, especially when they are not just simple assignments. For example, you may need to compute the <code class="language-plaintext highlighter-rouge">size</code> from <code class="language-plaintext highlighter-rouge">width</code> and <code class="language-plaintext highlighter-rouge">height</code> parameters before allocating an array to store the data.</p>
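<p>As a minimal sketch of that situation, the hypothetical <code class="language-plaintext highlighter-rouge">Image</code> class below <em>(the class and its member names are my own illustration)</em> declares <code class="language-plaintext highlighter-rouge">size</code> before <code class="language-plaintext highlighter-rouge">pixels</code>, so that it is already initialised when the array is allocated:</p>

```cpp
#include <cstddef>
#include <memory>

// Hypothetical example class, not taken from any real code base.
class Image
{
public:
	Image(std::size_t width, std::size_t height)
		: width(width)
		, height(height)
		, size(width * height)	// Uses the constructor parameters
		, pixels(std::make_unique<unsigned char[]>(size))	// Uses the member `size`
	{
	}

	std::size_t pixel_count() const { return size; }
	unsigned char* data() { return pixels.get(); }

private:
	std::size_t width;
	std::size_t height;
	std::size_t size;	// Must be declared before `pixels`
	std::unique_ptr<unsigned char[]> pixels;	// Value-initialised (zeroed) array
};
```

<p>If <code class="language-plaintext highlighter-rouge">pixels</code> were declared above <code class="language-plaintext highlighter-rouge">size</code>, the allocation would read an uninitialised member, regardless of the order written in the initializer list.</p>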

<h3 id="group-related-members">Group Related Members</h3>

<p>The first <em>(and probably main)</em> method of organisation is to group related members together so that they are logically close in the source code. An example of this is putting members related to storage of data in one group, and members related to efficiently finding the data in another group. I’d like to think that people do this naturally, grouping related members together surrounded with whitespace, but it may just be me.</p>

<p>One could argue that you don’t need to create these groupings because in principle a class should only do one thing, so all its members should be in one group. However, in reality classes can do one thing but still contain many members and sub-systems used to accomplish that, or they are responsible for several things, so it makes sense to group the members for those logically to aid readability.</p>

<p>One downside of this method is that you’re likely to get plenty of padding inside and in between groups of members, as smaller data members such as <code class="language-plaintext highlighter-rouge">bool</code>, <code class="language-plaintext highlighter-rouge">int</code>, etc. are placed next to larger data structures which have bigger alignment requirements. Depending on the class, though, this might not matter, as discussed below.</p>

<h3 id="group-members-by-usage">Group Members By Usage</h3>

<p>A related method is to group members by usage, so that they are physically close to each other in memory and more likely to be on the same cache line in the processor. The main difference between the previous method and this one is that you’re taking into consideration how the data is used and not just what part of the system it logically belongs to.</p>

<p>This can reduce readability of the class definition but in the same vein it can also increase performance in some specific circumstances.</p>
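<p>A rough sketch of what this can look like is below. The <code class="language-plaintext highlighter-rouge">Particle</code> structure, its members, and the 64 byte cache line size are all assumptions made for illustration, and whether an instance actually starts on a cache line boundary depends on how it is allocated.</p>

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example structure.
struct Particle
{
	// Hot: read and written every frame by the update loop
	float position[3];
	float velocity[3];
	float remaining_life;

	// Cold: only touched when spawning or debugging
	std::uint32_t spawner_id;
	const char* debug_name;
};

// Assumes a 64 byte cache line, which is common but not universal
static_assert(offsetof(Particle, remaining_life) + sizeof(float) <= 64,
	"hot members should fit within one cache line");
```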

<h3 id="minimise-padding">Minimise Padding</h3>

<p>The next major method of organisation is to arrange the members of a class in a way that minimises padding and wasted space within the class. This is especially useful when you’re creating many thousands or millions of instances of these classes, as each byte of wasted space becomes significant in aggregate. This also helps to efficiently pack these classes contiguously in memory, but it can come at a cost to readability.</p>

<p>An important detail to remember here is that a class’s size is a multiple of its alignment, and its alignment is the highest alignment of its members. This means that padding will be inserted after the last data member to make the class’s size a multiple of its alignment.</p>

<p>The simplest way to minimise padding is to put the elements with the highest alignment requirements at the beginning of the class, followed by members with successively smaller alignment requirements, with byte sized elements like <code class="language-plaintext highlighter-rouge">bool</code> or <code class="language-plaintext highlighter-rouge">uint8_t</code> at the end. Of course, doing this may create gaps in between the larger members; in that case, fill the gaps by moving appropriately sized elements in between the larger ones. If all goes well then there should not be any padding, or only a few bytes of padding at the end of the class.</p>

<blockquote>
  <p>Think of this as filling a jar with various sized rocks: first put in the biggest rocks (big members), then fill the gaps in with smaller pebbles (smaller members, integers, etc.), and finally pour in the sand to fill in the remaining space (bytes and booleans).</p>
</blockquote>
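<p>To illustrate the reordering, here is a small sketch with two hypothetical structures that hold the same members in different orders. The exact sizes are implementation-defined, but on common 64-bit desktop targets they are 24 and 16 bytes respectively.</p>

```cpp
#include <cstdint>

// Members in "as written" order: padding is needed before each larger member.
struct Loose
{
	bool visible;		// 1 byte, then padding up to the alignment of `position`
	double position;
	bool selected;		// 1 byte, then padding up to the alignment of `id`
	std::uint32_t id;
};

// The same members, highest alignment first and byte sized members last.
struct Tight
{
	double position;
	std::uint32_t id;
	bool visible;
	bool selected;		// Only a little padding remains at the end
};

// These hold on common desktop ABIs; they are not guaranteed by the standard
static_assert(sizeof(Tight) < sizeof(Loose));
static_assert(sizeof(Tight) % alignof(Tight) == 0);
```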

<p>This works best with more plain-old-data style classes and structures that contain many smaller sized members that themselves have smaller alignment requirements and do not have any internal padding. If larger classes are used be aware that they might have their own internal padding which is created automatically due to their alignment requirements. A simple example of this is:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">BufferView</span>	<span class="c1">// Size: 16 bytes, Alignment: 8 bytes</span>
<span class="p">{</span>
	<span class="kt">void</span><span class="o">*</span> <span class="n">data</span><span class="p">;</span>	<span class="c1">// Size: 8 bytes, Alignment: 8 bytes</span>
	<span class="kt">int</span> <span class="n">size</span><span class="p">;</span>	<span class="c1">// Size: 4 bytes, Alignment: 4 bytes</span>
			<span class="c1">// Padding: 4 bytes</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Making this a member of another class will introduce 4 bytes of padding each time it is used, even if it is followed by a 4-byte value which would otherwise fit within the padding.</p>
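<p>A small sketch of that effect, using hypothetical <code class="language-plaintext highlighter-rouge">Mesh</code> and <code class="language-plaintext highlighter-rouge">FlatMesh</code> structures of my own invention. On a typical 64-bit target <code class="language-plaintext highlighter-rouge">Mesh</code> is 24 bytes while <code class="language-plaintext highlighter-rouge">FlatMesh</code>, which stores the same data directly, is 16 bytes.</p>

```cpp
struct BufferView
{
	void* data;
	int size;
	// Tail padding here on 64-bit targets
};

// Hypothetical class that embeds the view: `stride` cannot reuse the
// tail padding inside `vertices`, so it is placed after it.
struct Mesh
{
	BufferView vertices;
	int stride;
};

// The same data stored directly: `stride` sits right after `size`.
struct FlatMesh
{
	void* data;
	int size;
	int stride;
};

static_assert(sizeof(Mesh) >= sizeof(BufferView) + sizeof(int));
static_assert(sizeof(FlatMesh) <= sizeof(Mesh));
```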

<h4 id="packing">Packing</h4>

<p>The remedy for this situation is to use compiler-specific attributes and pragmas to specify the packing of elements within the classes. For MSVC this involves surrounding the class declaration(s) with a <a href="https://learn.microsoft.com/en-us/cpp/preprocessor/pack">pragma pack declaration</a> like so:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#pragma pack(push, 1)	// Could also be 4 instead of 1
</span><span class="k">struct</span> <span class="nc">BufferView</span>	<span class="c1">// Size: 12 bytes, Alignment: 1 byte</span>
<span class="p">{</span>
	<span class="kt">void</span><span class="o">*</span> <span class="n">data</span><span class="p">;</span>	<span class="c1">// Size: 8 bytes, Alignment: 8 bytes</span>
	<span class="kt">int</span> <span class="n">size</span><span class="p">;</span>	<span class="c1">// Size: 4 bytes, Alignment: 4 bytes</span>
<span class="p">};</span>
<span class="cp">#pragma pack(pop)
</span></code></pre></div></div>

<p>Likewise on GCC and Clang you can use the <a href="https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/Common-Type-Attributes.html#index-packed-type-attribute">packed attribute</a> to tell the compiler to pack the members tightly. Note that the packed attribute has to be applied to each class declaration separately in order to pack the elements as tightly as possible.</p>
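<p>A minimal sketch of the attribute form is below. The <code class="language-plaintext highlighter-rouge">EXAMPLE_PACKED</code> macro and the structure names are hypothetical, and the fallback for other compilers simply applies no packing. Keep in mind that members of packed structures may be misaligned, so taking pointers or references to them can be a problem on some targets.</p>

```cpp
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_PACKED __attribute__((packed))
#else
#define EXAMPLE_PACKED	// Assumption: no packing on other compilers in this sketch
#endif

struct EXAMPLE_PACKED PackedBufferView
{
	void* data;
	int size;
};

// The attribute is repeated on the outer class, otherwise `stride` would
// still be aligned (and padded) as normal.
struct EXAMPLE_PACKED PackedMesh
{
	PackedBufferView vertices;
	int stride;
};
```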

<blockquote>
  <p>Visual Studio 2022 version 17.8.0 introduced a neat little feature to show the size of a type or value in a tooltip. It helps as you can quickly and easily see what effect moving a member has on the size of the class. There are also other plugins which help visualise class members and padding, though I do not use them.</p>
</blockquote>

<h3 id="compressing-members">Compressing Members</h3>

<p>In some cases just packing the elements is not enough, as the size of the members just exceeds a multiple of the alignment, thereby creating a relatively large amount of padding at the end of the class.</p>

<p>The following techniques can be used to combat this, helping reduce the size of the data and making it fit more nicely within a multiple of the alignment. This becomes important when these classes are stored contiguously and processed by performance critical code, as more instances can be packed within the same amount of memory.</p>

<h4 id="combine-booleans">Combine Booleans</h4>

<p>The first technique is to combine <code class="language-plaintext highlighter-rouge">bool</code> values into a bitfield, as traditionally each <code class="language-plaintext highlighter-rouge">bool</code> value takes a byte. This can be done by having an integer and manually using masks, or by declaring a C++ bitfield. One trick I learned is to create a bitfield using <code class="language-plaintext highlighter-rouge">bool</code>, like <code class="language-plaintext highlighter-rouge">bool a : 1;</code>, <code class="language-plaintext highlighter-rouge">bool b : 1;</code> etc, which has the advantage of being descriptive but also combining adjacent values together.</p>

<p>So instead of something like this:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Example</span>
<span class="p">{</span>
	<span class="kt">bool</span> <span class="n">a</span><span class="p">;</span>
	<span class="kt">bool</span> <span class="n">b</span><span class="p">;</span>
	<span class="kt">bool</span> <span class="n">c</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">static_assert</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="n">Example</span><span class="p">)</span> <span class="o">==</span> <span class="mi">3</span><span class="p">);</span>
</code></pre></div></div>

<p>It would instead be something like this:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Example</span>
<span class="p">{</span>
	<span class="kt">bool</span> <span class="n">a</span> <span class="o">:</span> <span class="mi">1</span><span class="p">;</span>
	<span class="kt">bool</span> <span class="n">b</span> <span class="o">:</span> <span class="mi">1</span><span class="p">;</span>
	<span class="kt">bool</span> <span class="n">c</span> <span class="o">:</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">static_assert</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="n">Example</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span><span class="p">);</span>
</code></pre></div></div>
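<p>For comparison, the manual mask approach could look something like the sketch below. The flag constants and the helper functions are hypothetical names chosen for this example.</p>

```cpp
#include <cstdint>

// Hypothetical flag masks, one bit each
constexpr std::uint8_t FLAG_A = 1u << 0;
constexpr std::uint8_t FLAG_B = 1u << 1;
constexpr std::uint8_t FLAG_C = 1u << 2;

struct Example
{
	std::uint8_t flags = 0;

	// Returns true if any bit in the mask is set
	constexpr bool test(std::uint8_t mask) const
	{
		return (flags & mask) != 0;
	}

	// Sets or clears every bit in the mask
	constexpr void set(std::uint8_t mask, bool value)
	{
		flags = static_cast<std::uint8_t>(value ? (flags | mask) : (flags & ~mask));
	}
};
static_assert(sizeof(Example) == 1);
```

<p>This is less descriptive than the <code class="language-plaintext highlighter-rouge">bool</code> bitfield, but it makes it easy to test or set several flags in one operation.</p>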

<h4 id="bitpack-values">Bitpack Values</h4>

<p>The next technique is to bitpack smaller integer values <em>(and enumerations)</em> together in a larger field. For example, when you have three integer values that range only from 0 to 1000 you can store them as three 10-bit values inside a single <code class="language-plaintext highlighter-rouge">uint32_t</code> instead of using one <code class="language-plaintext highlighter-rouge">uint32_t</code> for each value. The left-over bits can be used for flags.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Example</span>
<span class="p">{</span>
	<span class="kt">uint32_t</span> <span class="n">r</span> <span class="o">:</span> <span class="mi">10</span><span class="p">;</span>
	<span class="kt">uint32_t</span> <span class="n">g</span> <span class="o">:</span> <span class="mi">10</span><span class="p">;</span>
	<span class="kt">uint32_t</span> <span class="n">b</span> <span class="o">:</span> <span class="mi">10</span><span class="p">;</span>
	<span class="kt">uint32_t</span> <span class="n">a</span> <span class="o">:</span> <span class="mi">2</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">static_assert</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="n">Example</span><span class="p">)</span> <span class="o">==</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">uint32_t</span><span class="p">));</span>
</code></pre></div></div>
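<p>The way a compiler lays out bitfields is implementation-defined, so if the packed value needs a known bit layout <em>(for example when it is written to a file)</em> the same packing can be done by hand with shifts and masks. This is only a sketch, and the function names and the bit order are my own choices:</p>

```cpp
#include <cstdint>

// Packs three 10-bit values and one 2-bit value into 32 bits.
// Layout from the highest bit down: [a:2][b:10][g:10][r:10]
constexpr std::uint32_t pack(std::uint32_t r, std::uint32_t g, std::uint32_t b, std::uint32_t a)
{
	return (r & 0x3FFu) | ((g & 0x3FFu) << 10) | ((b & 0x3FFu) << 20) | ((a & 0x3u) << 30);
}

constexpr std::uint32_t unpack_r(std::uint32_t v) { return v & 0x3FFu; }
constexpr std::uint32_t unpack_g(std::uint32_t v) { return (v >> 10) & 0x3FFu; }
constexpr std::uint32_t unpack_b(std::uint32_t v) { return (v >> 20) & 0x3FFu; }
constexpr std::uint32_t unpack_a(std::uint32_t v) { return v >> 30; }
```

<p>Values larger than the field are silently truncated by the masks, so range checks belong in the calling code.</p>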

<h4 id="encode-values">Encode Values</h4>

<p>A more advanced version of this would be to use some other encoding scheme to store multiple values in the same integer variable. This can be something simple like encoding in the same way that multidimensional array indexes are calculated. For example, storing values <code class="language-plaintext highlighter-rouge">a</code>, <code class="language-plaintext highlighter-rouge">b</code>, <code class="language-plaintext highlighter-rouge">c</code>, as: <code class="language-plaintext highlighter-rouge">v = (a + A * (b + B * c))</code>, where <code class="language-plaintext highlighter-rouge">A</code> and <code class="language-plaintext highlighter-rouge">B</code> are the number of possible values of <code class="language-plaintext highlighter-rouge">a</code> and <code class="language-plaintext highlighter-rouge">b</code>, and then decoding it using <code class="language-plaintext highlighter-rouge">/</code> and <code class="language-plaintext highlighter-rouge">%</code>. There are other encoding schemes that could be used, but the more complex the encoding is the slower it will be to interact with the values.</p>

<p>An example of the simple multidimensional array index calculation:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint64_t</span> <span class="nf">encode</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">x</span><span class="p">,</span> <span class="kt">uint32_t</span> <span class="n">y</span><span class="p">,</span> <span class="kt">uint32_t</span> <span class="n">z</span><span class="p">)</span>
<span class="p">{</span>
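	<span class="c1">// Widen to 64 bits before multiplying to avoid 32-bit overflow</span>
	<span class="k">return</span> <span class="n">x</span> <span class="o">+</span> <span class="k">static_cast</span><span class="o">&lt;</span><span class="kt">uint64_t</span><span class="o">&gt;</span><span class="p">(</span><span class="n">MAX_X</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">y</span> <span class="o">+</span> <span class="k">static_cast</span><span class="o">&lt;</span><span class="kt">uint64_t</span><span class="o">&gt;</span><span class="p">(</span><span class="n">MAX_Y</span><span class="p">)</span> <span class="o">*</span> <span class="n">z</span><span class="p">);</span>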
<span class="p">}</span>

<span class="kt">void</span> <span class="n">decode</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">value</span><span class="p">,</span> <span class="kt">uint32_t</span><span class="o">&amp;</span> <span class="n">x</span><span class="p">,</span> <span class="kt">uint32_t</span><span class="o">&amp;</span> <span class="n">y</span><span class="p">,</span> <span class="kt">uint32_t</span><span class="o">&amp;</span> <span class="n">z</span><span class="p">)</span>
<span class="p">{</span>
	<span class="c1">// x varies fastest in the encoding, so it is extracted first</span>
	<span class="n">x</span> <span class="o">=</span> <span class="k">static_cast</span><span class="o">&lt;</span><span class="kt">uint32_t</span><span class="o">&gt;</span><span class="p">(</span><span class="n">value</span> <span class="o">%</span> <span class="n">MAX_X</span><span class="p">);</span>
	<span class="n">value</span> <span class="o">/=</span> <span class="n">MAX_X</span><span class="p">;</span>
	<span class="n">y</span> <span class="o">=</span> <span class="k">static_cast</span><span class="o">&lt;</span><span class="kt">uint32_t</span><span class="o">&gt;</span><span class="p">(</span><span class="n">value</span> <span class="o">%</span> <span class="n">MAX_Y</span><span class="p">);</span>
	<span class="n">value</span> <span class="o">/=</span> <span class="n">MAX_Y</span><span class="p">;</span>
	<span class="n">z</span> <span class="o">=</span> <span class="k">static_cast</span><span class="o">&lt;</span><span class="kt">uint32_t</span><span class="o">&gt;</span><span class="p">(</span><span class="n">value</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<h4 id="quantise-floating-point-values">Quantise Floating Point Values</h4>

<p>If dealing with floating point values then a good technique is to quantise them and store them in smaller integer types. The simplest version is to store normalised values (ranging from <code class="language-plaintext highlighter-rouge">0.0</code> to <code class="language-plaintext highlighter-rouge">1.0</code>) in unsigned integer types like <code class="language-plaintext highlighter-rouge">uint8_t</code> or <code class="language-plaintext highlighter-rouge">uint16_t</code>, where <code class="language-plaintext highlighter-rouge">0.0</code> maps to <code class="language-plaintext highlighter-rouge">0</code> and <code class="language-plaintext highlighter-rouge">1.0</code> maps to <code class="language-plaintext highlighter-rouge">255</code> or <code class="language-plaintext highlighter-rouge">65535</code> respectively. Values ranging from <code class="language-plaintext highlighter-rouge">-1.0</code> to <code class="language-plaintext highlighter-rouge">1.0</code> can be similarly stored in signed integer types, and as long as the range is limited the values can be quantised into a smaller type while retaining reasonable accuracy. The tricky part is that the range has to be defined in code and not in data for there to be a reduction in size.</p>

<p>For example to quantise a value between 0 and 1 into a smaller integer:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint8_t</span> <span class="nf">encode</span><span class="p">(</span><span class="kt">float</span> <span class="n">value</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">return</span> <span class="k">static_cast</span><span class="o">&lt;</span><span class="kt">uint8_t</span><span class="o">&gt;</span><span class="p">(</span><span class="n">value</span> <span class="o">*</span> <span class="mf">255.0</span><span class="n">f</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">float</span> <span class="n">decode</span><span class="p">(</span><span class="kt">uint8_t</span> <span class="n">value</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">return</span> <span class="n">value</span> <span class="o">/</span> <span class="mf">255.0</span><span class="n">f</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
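<p>For values in the <code class="language-plaintext highlighter-rouge">-1.0</code> to <code class="language-plaintext highlighter-rouge">1.0</code> range a signed variant might look like the sketch below. The function names are hypothetical, and rounding to nearest, clamping the input, and leaving <code class="language-plaintext highlighter-rouge">-128</code> unused to keep the range symmetric are my own choices rather than the only way to do it.</p>

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Maps [-1.0, 1.0] onto [-127, 127], rounding to the nearest step.
inline std::int8_t encode_snorm8(float value)
{
	const float clamped = std::clamp(value, -1.0f, 1.0f);
	return static_cast<std::int8_t>(std::lround(clamped * 127.0f));
}

// Maps [-127, 127] back onto [-1.0, 1.0]; -128 is treated as -1.0.
inline float decode_snorm8(std::int8_t value)
{
	return std::max(static_cast<float>(value) / 127.0f, -1.0f);
}
```

<p>With 8 bits the decoded value lands within half a step, roughly 0.004, of the original, which is the kind of accuracy trade-off to check against your data.</p>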

<h4 id="drop-computable-values">Drop Computable Values</h4>

<p>Another technique which is common in graphics programming is to drop elements of compound types when they can easily be recomputed. When dealing with normalised vectors and unit quaternions, in addition to quantising their values, one of their components can be dropped entirely, because the squares of the components sum to one and the missing component can be recomputed from the others. One caveat to that is the sign of the dropped element must be stored somewhere or explicitly known externally, as squaring a number will make it positive.</p>
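<p>As a rough sketch for a unit length 3D vector, the <code class="language-plaintext highlighter-rouge">z</code> component can be rebuilt from <code class="language-plaintext highlighter-rouge">x</code>, <code class="language-plaintext highlighter-rouge">y</code>, and a stored sign. The structure and function names here are hypothetical, and in real code the sign would more likely live in a spare bit next to the quantised values rather than in its own <code class="language-plaintext highlighter-rouge">bool</code>.</p>

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical storage for a unit vector with z dropped.
struct PackedNormal
{
	float x;
	float y;
	bool z_negative;	// Sign of the dropped component
};

inline PackedNormal drop_z(float x, float y, float z)
{
	return { x, y, z < 0.0f };
}

inline float restore_z(const PackedNormal& n)
{
	// Clamp to zero so rounding errors cannot produce a negative square root
	const float z_squared = std::max(0.0f, 1.0f - n.x * n.x - n.y * n.y);
	const float z = std::sqrt(z_squared);
	return n.z_negative ? -z : z;
}
```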

<h3 id="why-and-when">Why and When</h3>

<p>A lot of this only matters if the class you’re writing will have many <em>(millions of)</em> instances, in which case you need to organise your data for optimum efficiency. In all other cases you should make it easy to read and easy to understand.</p>

<p>To summarise:</p>

<ul>
  <li>If your class will only be instantiated a handful of times, then readability is far more important than data layout, so no need to optimise.</li>
  <li>If your class has a container inside of it, then you will probably be better off optimising the class stored inside the container.</li>
  <li>If your class contains another large class inside of it, then optimise that class first.</li>
  <li>If you instantiate millions of instances then pay special attention to the class and optimise it.</li>
  <li>If you are optimising to get a performance improvement then measure, measure, and measure!</li>
</ul>]]></content><author><name></name></author><category term="C++" /><category term="c++" /><category term="data" /><summary type="html"><![CDATA[The layout of data members within a class is an important consideration in writing C++, it affects readability and understanding of the class, and can impact performance as well. There are a lot of things to consider when organising and ordering data members, and in this article I will go through my thoughts and explain the guidelines I used when writing C++ code.]]></summary></entry><entry><title type="html">CppCon 2024 Presentation &amp;amp; Review</title><link href="https://www.dominikgrabiec.com/posts/2025/01/19/cppcon_2024_review.html" rel="alternate" type="text/html" title="CppCon 2024 Presentation &amp;amp; Review" /><published>2025-01-19T11:00:00+00:00</published><updated>2025-01-19T11:00:00+00:00</updated><id>https://www.dominikgrabiec.com/posts/2025/01/19/cppcon_2024_review</id><content type="html" xml:base="https://www.dominikgrabiec.com/posts/2025/01/19/cppcon_2024_review.html"><![CDATA[<p>In September last year (2024) I attended <a href="https://cppcon.org/">CppCon</a> to present my talk about optimising multi-threaded data building for game development. It was quite a hectic and busy experience, talking to people, attending sessions, many of which were not recorded, and most importantly learning about what other people are doing in the C++ community.</p>

<!--more-->

<p>I’m posting this now quite a few months after the event, as I’ve been busy with work and other projects. Also the video of <a href="https://www.youtube.com/watch?v=ZrpB0gLteUI">my presentation</a> has been officially released, with the other videos for the event also slowly trickling out on the <a href="https://www.youtube.com/@CppCon">CppCon YouTube Channel</a><sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>

<p>This was my first ever public presentation and I was a bit nervous, so I’m thankful to my friends and colleagues who helped me do practice runs of the presentation at home and at work before going. This was incredibly helpful since at the event I had some technical difficulties with my laptop refusing to cooperate with the AV equipment<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>, so I had to give the talk from memory with no speaker notes, just the slides on screen.</p>

<p>The event itself was kind of intense. I tried to attend every talk, especially the ones that weren’t being recorded (the open sessions in the morning, during the lunch break, and in the evening), which meant most days started at 8am and finished at 10pm, though with breaks in between. Luckily for me the hotel was pretty cool, with an impressive view of the Rocky Mountains in Colorado, and warm pools to soak in.</p>

<p>If people are interested in C++ I would definitely recommend going to CppCon. I had a blast there and would love to go again.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>If you want to you can buy access to all the CppCon videos <a href="https://cppcon.programmingarchive.com/">here</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>I suspect this might be due to the laptop being old and not having enough power to transmit the HDMI signal over a distance greater than 1.5 metres, or something like that. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="C++" /><category term="c++" /><category term="CppCon" /><category term="CppCon2024" /><category term="conferences" /><summary type="html"><![CDATA[In September last year (2024) I attended CppCon to present my talk about optimising multi-threaded data building for game development. It was quite a hectic and busy experience, talking to people, attending sessions, many of which were not recorded, and most importantly learning about what other people are doing in the C++ community.]]></summary></entry><entry><title type="html">Presenting at CppCon 2024</title><link href="https://www.dominikgrabiec.com/posts/2024/08/25/presenting_at_cppcon_2024.html" rel="alternate" type="text/html" title="Presenting at CppCon 2024" /><published>2024-08-25T11:00:00+00:00</published><updated>2024-08-25T11:00:00+00:00</updated><id>https://www.dominikgrabiec.com/posts/2024/08/25/presenting_at_cppcon_2024</id><content type="html" xml:base="https://www.dominikgrabiec.com/posts/2024/08/25/presenting_at_cppcon_2024.html"><![CDATA[<p>Earlier in May of this year I came across a call for submissions for the <a href="https://cppcon.org/cfs2024-gamedevchair/">Game Development track at CppCon</a>. Having had the loose idea for a talk in my head for the longest time I submitted a last minute proposal, and to my delighted surprise they accepted it!</p>

<p>The talk is titled <a href="https://cppcon2024.sched.com/event/1gZg6/techniques-to-optimise-multithreaded-data-building-during-game-development">“Techniques to Optimise Multi-threaded Data Building During Game Development”</a>, and I know it’s quite a mouthful but they said to be detailed.</p>

<!--more-->

<p>This will be my first big public presentation and I am honoured to have been selected. So now I am deep into preparing and practicing the presentation, writing, coding, editing, and everything else that comes with it. I’ll see what I can release publicly after the conference but at least the slides will be available on the CppCon GitHub, and if I can I’ll post something on my own GitHub.</p>

<p>If you happen to be at the conference then I would be delighted if you would attend my talk, and if not then please say hi!</p>]]></content><author><name></name></author><category term="C++" /><category term="c++" /><category term="CppCon" /><category term="CppCon2024" /><category term="conferences" /><summary type="html"><![CDATA[Earlier in May of this year I came across a call for submissions for the Game Development track at CppCon. Having had the loose idea for a talk in my head for the longest time I submitted a last minute proposal, and to my delighted surprise they accepted it! The talk is titled “Techniques to Optimise Multi-threaded Data Building During Game Development”, and I know it’s quite a mouthful but they said to be detailed.]]></summary></entry><entry><title type="html">Upgrading Assert Macro in C++</title><link href="https://www.dominikgrabiec.com/posts/2024/06/21/upgrading_assert_macro.html" rel="alternate" type="text/html" title="Upgrading Assert Macro in C++" /><published>2024-06-21T11:00:00+00:00</published><updated>2024-06-21T11:00:00+00:00</updated><id>https://www.dominikgrabiec.com/posts/2024/06/21/upgrading_assert_macro</id><content type="html" xml:base="https://www.dominikgrabiec.com/posts/2024/06/21/upgrading_assert_macro.html"><![CDATA[<p>An article detailing investigations and upgrades to the <a href="/posts/2023/02/28/making_a_flexible_assert.html">Flexible Assert Macro</a> to fix some oversights and add some C++20 features which improve the generated code. These updates are now available on <a href="https://github.com/DominikGrabiec/Assert">Github</a>.</p>

<!--more-->

<h1 id="making-conditions-unlikely">Making Conditions Unlikely</h1>

<p>The first update was to add the <code class="language-plaintext highlighter-rouge">[[unlikely]]</code> <a href="https://en.cppreference.com/w/cpp/language/attributes/likely">attribute</a> to the assert condition. This will tell the compiler to generate the assembly code under the assumption that the condition will not be true at runtime <em>(but not with the assumption it will never be true)</em>.</p>

<p>This actually changes the generated assembly quite a bit in places, moving the handling of the assert to the end of the function and out of the immediate code to execute. While I haven’t measured any performance impact of this change, the assembly code looks tidier, and because the regular function code no longer needs a jump to reach it, it should be more efficient.</p>
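<p>To show where the attribute goes, here is a heavily simplified sketch of an assert macro. It is not the macro from the repository: <code class="language-plaintext highlighter-rouge">EXAMPLE_ASSERT</code>, <code class="language-plaintext highlighter-rouge">example_assert_failed</code>, and <code class="language-plaintext highlighter-rouge">checked_length</code> are hypothetical names, and it needs a C++20 compiler for the attribute to take effect.</p>

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical handler standing in for the real assert handler.
[[noreturn]] inline void example_assert_failed(const char* expression, const char* file, int line)
{
	std::fprintf(stderr, "%s(%d): assertion failed: %s\n", file, line, expression);
	std::abort();
}

// The attribute marks the failure branch as the unlikely path.
#define EXAMPLE_ASSERT(condition) \
	do \
	{ \
		if (!(condition)) [[unlikely]] \
		{ \
			example_assert_failed(#condition, __FILE__, __LINE__); \
		} \
	} while (false)

inline int checked_length(int length)
{
	EXAMPLE_ASSERT(length <= 255);
	return length;
}
```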

<p><strong>Before Unlikely</strong></p>

<pre><code class="language-assembly">; 9    : 	ASSERT(name.length() &lt;= 255);

	cmp	QWORD PTR [rcx+16], 255			; 000000ffH
	mov	rbx, rcx
	jbe	SHORT $LN2@Example
	lea	rax, OFFSET FLAT:??_C@_08MPNMAILL@Test?4cpp@
	mov	DWORD PTR $T1[rsp], 9
	mov	QWORD PTR $T1[rsp+8], rax
	lea	rdx, OFFSET FLAT:??_C@_0BF@CFNDKGCM@name?4length?$CI?$CJ?5?$DM?$DN?5255@
	lea	rax, OFFSET FLAT:??_C@_0HF@HDFDIDJI@int?5__cdecl?5Example?$CIconst?5class@
	mov	DWORD PTR $T1[rsp+4], 2
	lea	rcx, QWORD PTR $T1[rsp]
	mov	QWORD PTR $T1[rsp+16], rax
	call	?handle_assert@error@@YAXUsource_location@std@@PEBD@Z ; error::handle_assert
	int	3
	call	?terminate@std@@YAXXZ			; std::terminate
	int	3
$LN2@Example:

; ... Normal function code here ...
</code></pre>

<p><strong>After Unlikely</strong></p>

<pre><code class="language-assembly">; 9    : 	ASSERT(name.length() &lt;= 255);

	cmp	QWORD PTR [rcx+16], 255			; 000000ffH
	mov	rbx, rcx
	ja	SHORT $LN20@Example

; ... Normal function code here ...

$LN20@Example:

; 9    : 	ASSERT(name.length() &lt;= 255);

	lea	rax, OFFSET FLAT:??_C@_08MPNMAILL@Test?4cpp@
	mov	DWORD PTR $T1[rsp], 9
	mov	QWORD PTR $T1[rsp+8], rax
	lea	rdx, OFFSET FLAT:??_C@_0BF@CFNDKGCM@name?4length?$CI?$CJ?5?$DM?$DN?5255@
	lea	rax, OFFSET FLAT:??_C@_0HF@HDFDIDJI@int?5__cdecl?5Example?$CIconst?5class@
	mov	DWORD PTR $T1[rsp+4], 2
	lea	rcx, QWORD PTR $T1[rsp]
	mov	QWORD PTR $T1[rsp+16], rax
	call	?handle_assert@error@@YAXUsource_location@std@@PEBD@Z ; error::handle_assert
	int	3
	call	?terminate@std@@YAXXZ			; std::terminate
	int	3
</code></pre>

<h1 id="checking-if-debugger-is-attached">Checking if Debugger is Attached</h1>

<p>The next investigation was trying various ways to integrate a check to see if a debugger was attached before triggering the debug break. The main goal behind this was that when running the program with a debugger attached it would trigger the breakpoint and allow the programmer to see the assert that was triggered, and when running outside of a debugger it would just terminate without triggering a breakpoint.</p>

<p>The simplest way of doing this was to wrap the <code class="language-plaintext highlighter-rouge">__debugbreak()</code> (or <code class="language-plaintext highlighter-rouge">DebugBreak()</code>) with a check like <code class="language-plaintext highlighter-rouge">if (IsDebuggerPresent()) { ... }</code>. Doing this added a function call, a test, and a jump to the assert code, which in most cases made the code significantly larger. It also required either forward declaring the function or including <code class="language-plaintext highlighter-rouge">debugapi.h</code> in what is otherwise a fairly low level header.</p>

<p>Another way of doing this was to move the <code class="language-plaintext highlighter-rouge">IsDebuggerPresent()</code> call to be inside the <code class="language-plaintext highlighter-rouge">handle_assert</code> function, and have that return a boolean indicating if the breakpoint should be triggered or not. This eliminated a function call instruction from the assert macro but it didn’t clean up the assembly all that much.</p>
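<p>As a rough illustration of that second approach, here is a minimal portable sketch. Everything in it is hypothetical: the <code class="language-plaintext highlighter-rouge">sketch</code> namespace, the stubbed <code class="language-plaintext highlighter-rouge">debugger_attached()</code> query, and the <code class="language-plaintext highlighter-rouge">SKETCH_</code> macros are stand-ins, and the handler takes a file and line rather than the <code class="language-plaintext highlighter-rouge">std::source_location</code> used by the real macro.</p>

```cpp
#include <cstdio>
#include <exception>

namespace sketch
{
	// Hypothetical stand-in for a platform query such as IsDebuggerPresent() on Windows.
	// Hard-coded to false here so the sketch builds on any platform.
	inline bool debugger_attached()
	{
		return false;
	}

	// Reports the failed assert and returns true if the caller should trigger a breakpoint.
	inline bool handle_assert(const char* file, int line, const char* expression)
	{
		std::fprintf(stderr, "%s(%d): assert failed: %s\n", file, line, expression);
		return debugger_attached();
	}
}

// Hypothetical break hook; with MSVC this would expand to __debugbreak().
#define SKETCH_DEBUG_BREAK() ((void)0)

// The debugger check lives in the handler, so the macro only tests its return value.
#define SKETCH_ASSERT(expr) \
	do \
	{ \
		if (!(expr)) \
		{ \
			if (sketch::handle_assert(__FILE__, __LINE__, #expr)) \
			{ \
				SKETCH_DEBUG_BREAK(); \
			} \
			std::terminate(); \
		} \
	} while (false)
```

<p>The macro still needs a test and a conditional jump around the breakpoint, which is why moving the check into the handler only removes the extra call from each assert site.</p>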

<p>Overall I wasn’t happy with either of these solutions so I ended up looking for alternatives, but not ones which would require me to implement magical assembly or weird intrinsics. <em>(For reference most alternatives involved manually implementing the <code class="language-plaintext highlighter-rouge">IsDebuggerPresent()</code> function by looking up the debugger present flag in the thread information block in Windows. As such I didn’t want the support burden to keep this up to date with newer versions of Windows.)</em></p>

<p>It was when I was investigating how to handle other program faults that I realised that asserts (and error handling in general) need to be handled differently in developer and retail versions of the program. During development you want to use breakpoints to catch problems early, either by running in a debugger or by being able to attach one as easily as possible. However in retail mode you cannot do that so you want to create a detailed error report with plenty of supporting information, and send that to yourself as a package in order to try and figure out what went wrong.</p>

<p>This means that a separate retail version of the assert macro and assert handler function will need to be created, though that can be done at a later time together with a more thorough error reporting system.</p>

<h1 id="actually-making-it-fatal">Actually Making it Fatal</h1>

<p>The last thing to add was a call to <code class="language-plaintext highlighter-rouge">std::terminate()</code> inside the macros to actually make the asserts fatal and exit the program.</p>

<p>One interesting thing discovered by doing this was that in some cases adding the terminate function to the macro caused the compiler to move the implementation of the assert contents to the end of the function, in a similar way as when adding the unlikely attribute. But it did not do this in every situation, therefore using the unlikely attribute is still a good idea.</p>]]></content><author><name></name></author><category term="C++" /><category term="c++" /><category term="preprocessor" /><category term="assert" /><summary type="html"><![CDATA[An article detailing investigations and upgrades to the Flexible Assert Macro to fix some oversights and add some C++20 features which improve the generated code. These updates are now available on Github.]]></summary></entry><entry><title type="html">Classifying Characters with Simple Functions</title><link href="https://www.dominikgrabiec.com/posts/2023/12/08/classifying_characters_with_functions.html" rel="alternate" type="text/html" title="Classifying Characters with Simple Functions" /><published>2023-12-08T11:00:00+00:00</published><updated>2023-12-08T11:00:00+00:00</updated><id>https://www.dominikgrabiec.com/posts/2023/12/08/classifying_characters_with_functions</id><content type="html" xml:base="https://www.dominikgrabiec.com/posts/2023/12/08/classifying_characters_with_functions.html"><![CDATA[<p>This is the second in a <a href="/posts/2023/12/04/classifying_characters_introduction.html">series of articles</a> I’m writing on character classification as used in lexers and compilers. In this I describe the simplest method of character classification which is using plain functions with the logic directly inside.</p>

<!--more-->

<p>This is the simplest method to understand and implement: it’s the logic that you would write within an if statement, just wrapped up in a convenient and descriptive function. In general you would write a function for each character classification that you need to distinguish.</p>

<h2 id="the-code">The Code</h2>

<p>There are two main ways of performing the tests, checking a range of characters such as <code class="language-plaintext highlighter-rouge">'0'</code> - <code class="language-plaintext highlighter-rouge">'9'</code>, and testing individual characters such as <code class="language-plaintext highlighter-rouge">'$'</code>. The way I’ve written the examples below is designed to make it easy to read the ranges being tested in each function.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">constexpr</span> <span class="kt">bool</span> <span class="nf">IsNumber</span><span class="p">(</span><span class="kt">char</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">return</span> <span class="sc">'0'</span> <span class="o">&lt;=</span> <span class="n">c</span> <span class="o">&amp;&amp;</span> <span class="n">c</span> <span class="o">&lt;=</span> <span class="sc">'9'</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">constexpr</span> <span class="kt">bool</span> <span class="n">IsAlpha</span><span class="p">(</span><span class="kt">char</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">return</span> <span class="p">(</span><span class="sc">'A'</span> <span class="o">&lt;=</span> <span class="n">c</span> <span class="o">&amp;&amp;</span> <span class="n">c</span> <span class="o">&lt;=</span> <span class="sc">'Z'</span><span class="p">)</span> <span class="o">||</span> <span class="p">(</span><span class="sc">'a'</span> <span class="o">&lt;=</span> <span class="n">c</span> <span class="o">&amp;&amp;</span> <span class="n">c</span> <span class="o">&lt;=</span> <span class="sc">'z'</span><span class="p">);</span>
<span class="p">}</span>

<span class="k">constexpr</span> <span class="kt">bool</span> <span class="n">IsWhitespace</span><span class="p">(</span><span class="kt">char</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">return</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">' '</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'\t'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'\r'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'\n'</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>These can also be broken down into more specific functions like <code class="language-plaintext highlighter-rouge">IsLower</code> and <code class="language-plaintext highlighter-rouge">IsUpper</code>, and combined to create broader character classifiers. Marking the functions <code class="language-plaintext highlighter-rouge">constexpr</code>, which also makes them implicitly inline, pretty much guarantees <em>(when compiled with any optimisation level enabled)</em> that the function code will be inlined rather than cause a function call in assembly. So much so in fact that I had to remove the <code class="language-plaintext highlighter-rouge">constexpr</code> or make secondary non-constexpr functions when testing in order to see the assembly output for GCC and Clang. In some ways MSVC is nice in that it emits the assembly code for inlined and constexpr functions anyway.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">constexpr</span> <span class="kt">bool</span> <span class="nf">IsNumber</span><span class="p">(</span><span class="kt">char</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">return</span> <span class="sc">'0'</span> <span class="o">&lt;=</span> <span class="n">c</span> <span class="o">&amp;&amp;</span> <span class="n">c</span> <span class="o">&lt;=</span> <span class="sc">'9'</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">constexpr</span> <span class="kt">bool</span> <span class="n">IsLower</span><span class="p">(</span><span class="kt">char</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">return</span> <span class="sc">'a'</span> <span class="o">&lt;=</span> <span class="n">c</span> <span class="o">&amp;&amp;</span> <span class="n">c</span> <span class="o">&lt;=</span> <span class="sc">'z'</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">constexpr</span> <span class="kt">bool</span> <span class="n">IsUpper</span><span class="p">(</span><span class="kt">char</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">return</span> <span class="sc">'A'</span> <span class="o">&lt;=</span> <span class="n">c</span> <span class="o">&amp;&amp;</span> <span class="n">c</span> <span class="o">&lt;=</span> <span class="sc">'Z'</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">constexpr</span> <span class="kt">bool</span> <span class="n">IsAlpha</span><span class="p">(</span><span class="kt">char</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">return</span> <span class="n">IsLower</span><span class="p">(</span><span class="n">c</span><span class="p">)</span> <span class="o">||</span> <span class="n">IsUpper</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
<span class="p">}</span>

<span class="k">constexpr</span> <span class="kt">bool</span> <span class="n">IsAlphaNum</span><span class="p">(</span><span class="kt">char</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">return</span> <span class="n">IsAlpha</span><span class="p">(</span><span class="n">c</span><span class="p">)</span> <span class="o">||</span> <span class="n">IsNumber</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<blockquote>
  <p>Note that this is just a short selection of functions and not an exhaustive set that one might need.</p>
</blockquote>

<h2 id="the-generated-assembly">The Generated Assembly</h2>

<p>In examining the assembly generated by each compiler for the code in this article I made some interesting observations.</p>

<p>In general each compiler will attempt to optimise the code to the best of their ability, and some common optimisation techniques are:</p>
<ul>
  <li>Combining tests for adjacent values into a simpler range test.</li>
  <li>Combining tests for disjoint but close values (within 64) into tests against a computed bit mask.</li>
</ul>

<p>There are also differences between compilers in what assembly instructions they generate, with which you can make some generalisations:</p>
<ul>
  <li><strong>Clang</strong> produces assembly code which tries to avoid small branches whenever possible, either by evaluating all conditions and then combining the result, or by using a jump table. This seems more suited towards newer processor architectures and reflects Clang’s relatively recent creation.</li>
  <li><strong>MSVC</strong> produces similar but smaller assembly code although it uses branches to short circuit evaluating all the conditions. This sort of code reminds me of programming for older processor architectures with more limited memory.</li>
  <li><strong>GCC</strong> produces code that can be seen as a mix of the other two compilers: sometimes closer to MSVC, sometimes closer to Clang, and sometimes unfortunately the most confusing of the three.</li>
</ul>

<h3 id="single-ranges">Single Ranges</h3>

<p>For functions which only test a single range, such as <code class="language-plaintext highlighter-rouge">IsNumber</code>, all compilers effectively generate code similar to:</p>

<pre><code class="language-assembly">IsNumber(char):
	sub     cl, 48
	cmp     cl, 9
	setbe   al
	ret     0
</code></pre>

<p>Which in C++ is equivalent to:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">bool</span> <span class="nf">IsNumber</span><span class="p">(</span><span class="kt">char</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
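	<span class="k">return</span> <span class="k">static_cast</span><span class="o">&lt;</span><span class="kt">unsigned</span> <span class="kt">char</span><span class="o">&gt;</span><span class="p">(</span><span class="n">c</span> <span class="o">-</span> <span class="mi">48</span><span class="p">)</span> <span class="o">&lt;=</span> <span class="mi">9</span><span class="p">;</span> <span class="c1">// '0' == 48, compared as unsigned like setbe</span>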
<span class="p">}</span>
</code></pre></div></div>

<p>This also ends up being the general pattern used for testing ranges, where the minimum of the range is subtracted from the value and the result is then tested against the length of the range using an unsigned comparison. Values below the minimum wrap around to large unsigned numbers and so fail the same test. In this way only a single comparison is needed rather than separately testing and branching for each of the minimum and maximum.</p>
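<p>The range pattern can be checked directly in C++. The sketch below is only an illustration: <code class="language-plaintext highlighter-rouge">IsNumberNaive</code>, <code class="language-plaintext highlighter-rouge">IsNumberRange</code>, and <code class="language-plaintext highlighter-rouge">RangeFormsAgree</code> are hypothetical names, and the check assumes an 8-bit <code class="language-plaintext highlighter-rouge">char</code> and C++14 or later for the loop inside a <code class="language-plaintext highlighter-rouge">constexpr</code> function.</p>

```cpp
// Naive form: two comparisons, as written in the source code.
constexpr bool IsNumberNaive(char c)
{
	return '0' <= c && c <= '9';
}

// Single-comparison form: subtract the minimum, then compare as unsigned.
// Values below '0' wrap around to large unsigned numbers and fail the test.
constexpr bool IsNumberRange(char c)
{
	return static_cast<unsigned char>(c - '0') <= static_cast<unsigned char>('9' - '0');
}

// Hypothetical helper: checks that both forms agree for every possible byte value.
constexpr bool RangeFormsAgree()
{
	for (int i = 0; i < 256; ++i)
	{
		const char c = static_cast<char>(i);
		if (IsNumberNaive(c) != IsNumberRange(c))
		{
			return false;
		}
	}
	return true;
}

static_assert(RangeFormsAgree(), "both forms must classify every byte identically");
```

<p>The cast to <code class="language-plaintext highlighter-rouge">unsigned char</code> is what makes the single comparison valid, as without it the subtraction is done on a signed <code class="language-plaintext highlighter-rouge">int</code> and every character below <code class="language-plaintext highlighter-rouge">'0'</code> would pass.</p>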

<h3 id="multiple-ranges">Multiple Ranges</h3>

<p>For functions which test multiple ranges the compilers generate different code at all optimisation levels. Using the <code class="language-plaintext highlighter-rouge">IsAlphaNum</code> function as our subject for comparison and compiling at the <code class="language-plaintext highlighter-rouge">O2</code> optimisation level we can clearly see the differences.</p>

<p><strong>MSVC</strong> generates assembly code which most accurately reflects the C++ language semantics in the original code. It tests each condition in an optimised form but then jumps to the end if true, mirroring the short-circuit evaluation of the original C++ source code.</p>

<pre><code class="language-assembly">bool IsAlphaNum(char) PROC
        lea     eax, DWORD PTR [rcx-65]
        cmp     al, 25
        jbe     SHORT $LN3@IsAlphaNum
        lea     eax, DWORD PTR [rcx-97]
        cmp     al, 25
        jbe     SHORT $LN3@IsAlphaNum
        sub     cl, 48
        cmp     cl, 9
        jbe     SHORT $LN3@IsAlphaNum
        xor     al, al
        ret     0
$LN3@IsAlphaNum:
        mov     al, 1
        ret     0
bool IsAlphaNum(char) ENDP
</code></pre>

<blockquote>
  <p>The MSVC implementation in the code above uses the <code class="language-plaintext highlighter-rouge">lea</code> instruction to compute the initial subtraction of the minimum value before testing. For example the first <code class="language-plaintext highlighter-rouge">lea</code> computes <code class="language-plaintext highlighter-rouge">eax = ecx - 65</code>.</p>
</blockquote>

<p><strong>GCC</strong> actually does a similar thing, where it jumps to the end if the first <code class="language-plaintext highlighter-rouge">IsAlpha</code> condition is true, but it only has a single branch as the assembly it generates for <code class="language-plaintext highlighter-rouge">IsAlpha</code> has no branches.</p>

<pre><code class="language-assembly">IsAlphaNum(char):
        mov     eax, edi
        mov     edx, 1
        and     eax, -33
        sub     eax, 65
        cmp     al, 25
        jbe     .L6
        sub     edi, 48
        cmp     dil, 9
        setbe   dl
.L6:
        mov     eax, edx
        ret
</code></pre>

<blockquote>
  <p>The GCC implementation uses <code class="language-plaintext highlighter-rouge">and</code> to clear bit 5 of the character, which makes any lower case letter upper case, and then performs a single range test on the result. The constant <code class="language-plaintext highlighter-rouge">-33</code> is <code class="language-plaintext highlighter-rouge">1101 1111</code> in binary when viewed as an 8-bit value.</p>
</blockquote>
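<p>The case-folding trick can be written out in C++ as well. This is a sketch under the assumption of ASCII and an 8-bit <code class="language-plaintext highlighter-rouge">char</code>; the names <code class="language-plaintext highlighter-rouge">IsAlphaNaive</code>, <code class="language-plaintext highlighter-rouge">IsAlphaFolded</code>, and <code class="language-plaintext highlighter-rouge">FoldedFormAgrees</code> are hypothetical.</p>

```cpp
constexpr bool IsAlphaNaive(char c)
{
	return ('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z');
}

// Clearing bit 5 (0x20) maps 'a'..'z' onto 'A'..'Z', so one range test covers both.
constexpr bool IsAlphaFolded(char c)
{
	// 0xDF == 1101 1111, the same bit pattern as the 8-bit value -33.
	const unsigned char folded = static_cast<unsigned char>(static_cast<unsigned char>(c) & 0xDF);
	return static_cast<unsigned char>(folded - 'A') <= ('Z' - 'A');
}

// Hypothetical helper: checks that both forms agree for every possible byte value.
constexpr bool FoldedFormAgrees()
{
	for (int i = 0; i < 256; ++i)
	{
		const char c = static_cast<char>(i);
		if (IsAlphaNaive(c) != IsAlphaFolded(c))
		{
			return false;
		}
	}
	return true;
}

static_assert(FoldedFormAgrees(), "folding must not add or lose any characters");
```

<p>The fold is safe because the only characters that land in <code class="language-plaintext highlighter-rouge">'A'</code> to <code class="language-plaintext highlighter-rouge">'Z'</code> after clearing bit 5 are the letters themselves. Neighbours such as <code class="language-plaintext highlighter-rouge">'{'</code> fold to <code class="language-plaintext highlighter-rouge">'['</code>, which is still outside the range.</p>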

<p><strong>Clang</strong> on the other hand follows the intention of the function and generates assembly which produces the right result but does not strictly represent the language semantics of the C++ code as written. Specifically it does not perform any short-circuit evaluation of the logical code and just tests all conditions, combining the result at the end.</p>

<pre><code class="language-assembly">IsAlphaNum(char):
        mov     eax, edi
        and     al, -33
        add     al, -65
        cmp     al, 26
        setb    cl
        add     dil, -48
        cmp     dil, 10
        setb    al
        or      al, cl
        ret
</code></pre>

<p>My intuition says that the branch-less code that Clang produces should run marginally faster than the other code with branches, as a branch misprediction costs many cycles, whereas executing a handful more simple instructions would almost be free.</p>
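<p>For comparison, the evaluate-everything shape of the Clang listing looks roughly like this in C++. It is a sketch rather than something that forces a particular code generation, since the optimiser is free to turn either form into the other. The names <code class="language-plaintext highlighter-rouge">IsAlphaNumReference</code>, <code class="language-plaintext highlighter-rouge">IsAlphaNumBranchless</code>, and <code class="language-plaintext highlighter-rouge">BranchlessFormAgrees</code> are hypothetical.</p>

```cpp
// Reference form using short-circuit logical operators.
constexpr bool IsAlphaNumReference(char c)
{
	return ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z') || ('0' <= c && c <= '9');
}

// Both range tests are always evaluated and then combined with a bitwise OR,
// mirroring the two setb instructions followed by or in the Clang output.
constexpr bool IsAlphaNumBranchless(char c)
{
	const unsigned char u = static_cast<unsigned char>(c);
	const bool isAlpha = static_cast<unsigned char>((u & 0xDF) - 'A') < 26;
	const bool isDigit = static_cast<unsigned char>(u - '0') < 10;
	return isAlpha | isDigit; // bitwise OR: no short-circuit
}

// Hypothetical helper: checks that both forms agree for every possible byte value.
constexpr bool BranchlessFormAgrees()
{
	for (int i = 0; i < 256; ++i)
	{
		const char c = static_cast<char>(i);
		if (IsAlphaNumReference(c) != IsAlphaNumBranchless(c))
		{
			return false;
		}
	}
	return true;
}

static_assert(BranchlessFormAgrees(), "both forms must classify every byte identically");
```

<p>Skipping the short-circuit is only legal here because the comparisons have no side effects, so evaluating all of them cannot change the observable result.</p>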

<h4 id="reordering-range-conditions">Reordering Range Conditions</h4>

<p>When writing a function made up of several tests, it is generally advisable to put the test that eliminates the most of the search space first. For example the test that can eliminate 90% of values should go before the test that eliminates only 50%.</p>

<p>Applying this here would mean putting the test <code class="language-plaintext highlighter-rouge">c &lt;= '9'</code> (which eliminates 198 of the 256 possible values when the character is treated as an unsigned byte) before <code class="language-plaintext highlighter-rouge">c &gt;= '0'</code> (which eliminates 48), and likewise swapping the order of tests for the alphabet ranges.</p>

<p>However when investigating this with <a href="https://godbolt.org/z/zPMhPWo18">Compiler Explorer</a> these changes generally had no effect on the generated assembly code, but in some cases made the assembly code worse.</p>

<ul>
  <li>For the simplest functions such as <code class="language-plaintext highlighter-rouge">IsNumber</code> there was no difference in the generated assembly.</li>
  <li>For slightly more complicated functions such as <code class="language-plaintext highlighter-rouge">IsAlpha</code> the generated assembly was slightly larger and contained branching on all compilers.</li>
  <li>Interestingly enough the reordered versions which called functions rather than do the comparisons directly were just as optimised as the simplest functions.</li>
</ul>

<p>So the idea to take away from this case is to write simple straightforward code that is easy for both people and the compiler to understand.</p>

<h3 id="multiple-characters">Multiple Characters</h3>

<p>The other type of test compares directly against individual characters rather than ranges of characters. An example of this is the <code class="language-plaintext highlighter-rouge">IsWhitespace</code> function from the beginning of the article, though here is a more complete version which tests all of the white-space characters including the lesser known form feed <em>(<code class="language-plaintext highlighter-rouge">'\f'</code>, <code class="language-plaintext highlighter-rouge">12</code> dec, <code class="language-plaintext highlighter-rouge">0x0C</code> hex)</em> and vertical tab <em>(<code class="language-plaintext highlighter-rouge">'\v'</code>, <code class="language-plaintext highlighter-rouge">11</code> dec, <code class="language-plaintext highlighter-rouge">0x0B</code> hex)</em>.</p>

<blockquote>
  <p>I was not actually aware of these characters myself until I started looking into the Clang and Carbon compiler source code and then cross-referencing with the ASCII table.</p>
</blockquote>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">bool</span> <span class="nf">IsWhitespace</span><span class="p">(</span><span class="kt">char</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">' '</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'\t'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'\v'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'\f'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'\r'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'\n'</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>MSVC</strong> compiles this into a couple of tests, one for the space character <em>(<code class="language-plaintext highlighter-rouge">32</code> dec, <code class="language-plaintext highlighter-rouge">0x20</code> hex)</em>, and one for the range of white-space characters <em>(From <code class="language-plaintext highlighter-rouge">0x09</code> to <code class="language-plaintext highlighter-rouge">0x0D</code>)</em>.</p>

<pre><code class="language-assembly">bool IsWhitespace(char) PROC
        cmp     cl, 32
        je      SHORT $LN3@IsWhitespa
        sub     cl, 9
        cmp     cl, 4
        jbe     SHORT $LN3@IsWhitespa
        xor     al, al
        ret     0
$LN3@IsWhitespa:
        mov     al, 1
        ret     0
</code></pre>

<p><strong>Clang</strong> compiles this code into a range check and test against a computed bit mask, combining the result together using logical operations.</p>

<pre><code class="language-assembly">IsWhitespace(char):
        cmp     dil, 33
        setb    cl
        movabs  rax, 4294983168
        bt      rax, rdi
        setb    al
        and     al, cl
        ret
</code></pre>
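<p>The same range check plus bit mask test can be written by hand. The sketch below assumes ASCII and an 8-bit <code class="language-plaintext highlighter-rouge">char</code>; the names <code class="language-plaintext highlighter-rouge">kWhitespaceMask</code>, <code class="language-plaintext highlighter-rouge">IsWhitespaceMasked</code>, and <code class="language-plaintext highlighter-rouge">MaskedFormAgrees</code> are hypothetical, and the mask is built from the character literals so that it can be compared with the constant in the listing.</p>

```cpp
#include <cstdint>

constexpr bool IsWhitespaceNaive(char c)
{
	return c == ' ' || c == '\t' || c == '\v' || c == '\f' || c == '\r' || c == '\n';
}

// One bit per accepted character code: 9 to 13 and 32.
constexpr std::uint64_t kWhitespaceMask =
	(1ull << '\t') | (1ull << '\n') | (1ull << '\v') | (1ull << '\f') | (1ull << '\r') | (1ull << ' ');

static_assert(kWhitespaceMask == 4294983168ull, "same constant as the movabs in the Clang listing");

constexpr bool IsWhitespaceMasked(char c)
{
	const unsigned char u = static_cast<unsigned char>(c);
	// The range check keeps the shift count below 64, like the cmp against 33 in the assembly.
	return u < 33 && ((kWhitespaceMask >> u) & 1) != 0;
}

// Hypothetical helper: checks that both forms agree for every possible byte value.
constexpr bool MaskedFormAgrees()
{
	for (int i = 0; i < 256; ++i)
	{
		const char c = static_cast<char>(i);
		if (IsWhitespaceNaive(c) != IsWhitespaceMasked(c))
		{
			return false;
		}
	}
	return true;
}

static_assert(MaskedFormAgrees(), "both forms must classify every byte identically");
```

<p>Unlike the assembly, the C++ version has to keep the range check in front of the shift, because shifting a 64-bit value by 64 or more is undefined behaviour.</p>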

<p><strong>GCC</strong> however compiles to something more interesting: assembly code which uses a bit mask to check most of the values and then explicitly checks for carriage-return <em>(<code class="language-plaintext highlighter-rouge">13</code> dec, <code class="language-plaintext highlighter-rouge">0x0D</code> hex)</em> and line-feed <em>(<code class="language-plaintext highlighter-rouge">10</code> dec, <code class="language-plaintext highlighter-rouge">0x0A</code> hex)</em>.</p>

<pre><code class="language-assembly">IsWhitespace(char):
        cmp     dil, 32
        ja      .L12
        movabs  rax, 4294973952
        bt      rax, rdi
        setc    al
        test    al, al
        je      .L12
        ret
.L12:
        cmp     dil, 13
        sete    al
        cmp     dil, 10
        sete    dl
        or      eax, edx
        ret
</code></pre>
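<p>One way to see which characters the <code class="language-plaintext highlighter-rouge">movabs</code> constant in the listing above covers is to rebuild it from character literals. This is a small sketch with a hypothetical helper named <code class="language-plaintext highlighter-rouge">MaskOf</code>, which is only meaningful for character codes below 64.</p>

```cpp
#include <cstdint>

// Hypothetical helper: builds a mask with one bit set per character in `chars`.
// Only valid for character codes below 64; a larger code would make the shift
// undefined, which the compiler rejects when this runs in a constant expression.
constexpr std::uint64_t MaskOf(const char* chars)
{
	std::uint64_t mask = 0;
	for (; *chars != '\0'; ++chars)
	{
		mask |= 1ull << static_cast<unsigned char>(*chars);
	}
	return mask;
}

// The GCC constant is exactly the first four comparisons in source order.
static_assert(MaskOf(" \t\v\f") == 4294973952ull, "space, tab, vertical tab, form feed");

// Carriage return and line feed are not part of it.
static_assert((MaskOf("\r\n") & 4294973952ull) == 0, "CR and LF are tested separately");
```

<p>So the mask holds bits 9, 11, 12, and 32, leaving the two remaining characters to the explicit comparisons at the end of the function.</p>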

<p>My suspicion was that GCC tried to honour the ordering of the comparisons in the written C++ code, and it managed to collapse the first set of 4 comparisons into a bit field lookup, but it did not do so for the last two.</p>

<p>This was confirmed when I sorted the comparisons in the C++ function to match the ASCII values.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">bool</span> <span class="nf">IsWhitespace</span><span class="p">(</span><span class="kt">char</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'\t'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'\n'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'\v'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'\f'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'\r'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">' '</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>With the comparisons sorted, both <strong>GCC</strong> and <strong>Clang</strong> generated nearly identical and heavily optimised assembly. <em>(This is the GCC version)</em></p>

<pre><code class="language-assembly">IsWhitespace(char):
        lea     eax, [rdi-9]
        cmp     al, 4
        setbe   al
        cmp     dil, 32
        sete    dl
        or      eax, edx
        ret
</code></pre>

<p><strong>MSVC</strong> generated the same tests but branched to return the result, therefore adhering to the short-circuit evaluation of the C++ code.</p>

<pre><code class="language-assembly">IsWhitespace(char) PROC
        lea     eax, DWORD PTR [rcx-9]
        cmp     al, 4
        jbe     SHORT $LN5@IsWhitespa
        cmp     cl, 32
        je      SHORT $LN5@IsWhitespa
        xor     al, al
        ret     0
$LN5@IsWhitespa:
        mov     al, 1
        ret     0
</code></pre>

<h3 id="more-complex-comparisons">More Complex Comparisons</h3>

<p>A more complete example is the <code class="language-plaintext highlighter-rouge">IsSymbol</code> function below, which I’ve written to classify all symbols used in ASCII just as they appear on my keyboard (US layout).</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">bool</span> <span class="nf">IsSymbol</span><span class="p">(</span><span class="kt">char</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'~'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'`'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'!'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'@'</span>
        <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'#'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'$'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'%'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'^'</span>
        <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'&amp;'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'*'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'('</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">')'</span>
        <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'_'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'-'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'+'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'='</span>
        <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'['</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">']'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'{'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'}'</span>
        <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'|'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'\\'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">';'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">':'</span>
        <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'\''</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'"'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">','</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'.'</span>
        <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'&lt;'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'&gt;'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'/'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'?'</span>
    <span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>In this initial version <strong>MSVC</strong> ends up with the smallest and possibly cleanest assembly code. It performs a quick test against a short range at the end of the ASCII sequence <em>(for <code class="language-plaintext highlighter-rouge">'{'</code>, <code class="language-plaintext highlighter-rouge">'|'</code>, <code class="language-plaintext highlighter-rouge">'}'</code>, and <code class="language-plaintext highlighter-rouge">'~'</code>)</em>, then checks the value is within the desired range before doing a bit mask test for the remaining characters.</p>

<pre><code class="language-assembly">bool IsSymbol(char) PROC
        lea     eax, DWORD PTR [rcx-123]
        cmp     al, 3
        jbe     SHORT $LN3@IsSymbol
        sub     cl, 33
        cmp     cl, 63
        ja      SHORT $LN5@IsSymbol
        mov     rax, -288230371890266113
        bt      rax, rcx
        jb      SHORT $LN3@IsSymbol
$LN5@IsSymbol:
        xor     al, al
        ret     0
$LN3@IsSymbol:
        mov     al, 1
        ret     0
bool IsSymbol(char) ENDP
</code></pre>
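<p>The structure of the MSVC output can be reproduced by hand as a sanity check on what the mask contains. The sketch below assumes ASCII and an 8-bit <code class="language-plaintext highlighter-rouge">char</code>; <code class="language-plaintext highlighter-rouge">SymbolMask</code>, <code class="language-plaintext highlighter-rouge">kSymbols</code>, <code class="language-plaintext highlighter-rouge">kSymbolMask</code>, <code class="language-plaintext highlighter-rouge">IsSymbolMasked</code>, and the other names are hypothetical.</p>

```cpp
#include <cstdint>

constexpr char kSymbols[] = "~`!@#$%^&*()_-+=[]{}|\\;:'\",.<>/?";

// Hypothetical helper: one bit per symbol in the 64 code window '!' (33) to '`' (96).
// The bit index is the character code minus 33; codes outside the window are skipped.
constexpr std::uint64_t SymbolMask(const char* symbols)
{
	std::uint64_t mask = 0;
	for (; *symbols != '\0'; ++symbols)
	{
		const unsigned index = static_cast<unsigned char>(*symbols) - 33u;
		if (index < 64)
		{
			mask |= 1ull << index;
		}
	}
	return mask;
}

constexpr std::uint64_t kSymbolMask = SymbolMask(kSymbols);

// Same bit pattern as the signed constant in the MSVC listing.
static_assert(kSymbolMask == static_cast<std::uint64_t>(-288230371890266113ll), "matches MSVC");

constexpr bool IsSymbolMasked(char c)
{
	const unsigned char u = static_cast<unsigned char>(c);
	if (static_cast<unsigned char>(u - 123) <= 3) // '{', '|', '}', '~'
	{
		return true;
	}
	const unsigned char index = static_cast<unsigned char>(u - 33);
	return index <= 63 && ((kSymbolMask >> index) & 1) != 0;
}

// Reference form: a linear search through the symbol list.
constexpr bool IsSymbolNaive(char c)
{
	for (const char* s = kSymbols; *s != '\0'; ++s)
	{
		if (*s == c)
		{
			return true;
		}
	}
	return false;
}

// Hypothetical helper: checks that both forms agree for every possible byte value.
constexpr bool SymbolFormsAgree()
{
	for (int i = 0; i < 256; ++i)
	{
		const char c = static_cast<char>(i);
		if (IsSymbolNaive(c) != IsSymbolMasked(c))
		{
			return false;
		}
	}
	return true;
}

static_assert(SymbolFormsAgree(), "both forms must classify every byte identically");
```

<p>The four characters after <code class="language-plaintext highlighter-rouge">'z'</code> need their own range test because they sit more than 64 codes away from <code class="language-plaintext highlighter-rouge">'!'</code> and so cannot share the same 64-bit mask.</p>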

<p><strong>Clang</strong> also generates simple code, although it uses a 93 element jump table instead of testing against a bit mask. While this may be compact code this table takes up 372 bytes of space, which will take up instruction cache space and could affect performance.</p>

<pre><code class="language-assembly">IsSymbol(char):
        add     edi, -33
        cmp     edi, 93
        ja      .LBB1_2
        mov     al, 1
        lea     rcx, [rip + .LJTI1_0]
        movsxd  rdx, dword ptr [rcx + 4*rdi]
        add     rdx, rcx
        jmp     rdx
.LBB1_3:
        ret
.LBB1_2:
        xor     eax, eax
        ret
.LJTI1_0:
        .long   .LBB1_3-.LJTI1_0	# When true
        .long   .LBB1_2-.LJTI1_0	# When false
        # ...
</code></pre>

<p>Much like in the previous unordered <code class="language-plaintext highlighter-rouge">IsWhitespace</code> function it turns out that <strong>GCC</strong> doesn’t like the <code class="language-plaintext highlighter-rouge">IsSymbol</code> C++ code as written and produces quite a long and branchy sequence of assembly.</p>

<p>The actual assembly code is a bit of a mess, doing a lot of individual tests, some range tests, and a bit mask test. The bit mask only has 3 bits set, meaning that it’s only testing for 3 characters even though it could test nearly the entire range of symbols <em>(as the MSVC assembly code does)</em>.</p>

<p><details>
    <summary>Expand GCC assembly code</summary>

    <pre><code class="language-assembly">IsSymbol(char):
        cmp     dil, 33
        je      .L12
        lea     eax, [rdi-64]
        cmp     al, 62
        ja      .L18
        movabs  rdx, 4611686022722355201
        bt      rdx, rax
        setc    al
        test    al, al
        je      .L19
.L5:
        ret
.L19:
        cmp     dil, 95
        jg      .L10
        mov     eax, 1
        cmp     dil, 90
        jg      .L5
.L8:
        and     edi, -17
        cmp     dil, 47
        sete    al
        ret
.L18:
        cmp     dil, 95
        jg      .L8
        cmp     dil, 46
        jg      .L11
        mov     eax, 1
        cmp     dil, 33
        jg      .L5
        jmp     .L8
.L10:
        lea     edx, [rdi-123]
        mov     eax, 1
        cmp     dl, 2
        ja      .L8
        ret
.L11:
        lea     eax, [rdi-58]
        cmp     al, 4
        ja      .L8
.L12:
        mov     eax, 1
        ret
</code></pre>

    <p>Just for fun I reconstructed the C/C++ code from the assembly and it looks something like this:</p>

    <div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">bool</span> <span class="nf">IsSymbol</span><span class="p">(</span><span class="kt">char</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">c</span> <span class="o">==</span> <span class="mi">33</span><span class="p">)</span> <span class="c1">// '!'</span>
	<span class="p">{</span>
		<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
	<span class="p">}</span>
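	<span class="k">if</span> <span class="p">((</span><span class="kt">unsigned</span> <span class="kt">char</span><span class="p">)(</span><span class="n">c</span> <span class="o">-</span> <span class="mi">64</span><span class="p">)</span> <span class="o">&lt;=</span> <span class="mi">62</span><span class="p">)</span> <span class="c1">// unsigned compare, as in the assembly</span>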
	<span class="p">{</span>
		<span class="k">if</span> <span class="p">(((</span><span class="mi">1ULL</span> <span class="o">&lt;&lt;</span> <span class="p">(</span><span class="n">c</span> <span class="o">-</span> <span class="mi">64</span><span class="p">))</span> <span class="o">&amp;</span> <span class="mi">4611686022722355201</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="c1">// '@', '`', '~'</span>
		<span class="p">{</span>
			<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
		<span class="p">}</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">c</span> <span class="o">&gt;</span> <span class="mi">95</span><span class="p">)</span>
		<span class="p">{</span>
			<span class="k">if</span> <span class="p">((</span><span class="kt">unsigned</span> <span class="kt">char</span><span class="p">)(</span><span class="n">c</span> <span class="o">-</span> <span class="mi">123</span><span class="p">)</span> <span class="o">&lt;=</span> <span class="mi">2</span><span class="p">)</span> <span class="c1">// '{', '|', '}'</span>
			<span class="p">{</span>
				<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
			<span class="p">}</span>
		<span class="p">}</span>
		<span class="k">else</span>
		<span class="p">{</span>
			<span class="k">if</span> <span class="p">(</span><span class="n">c</span> <span class="o">&gt;</span> <span class="mi">90</span><span class="p">)</span> <span class="c1">// '[', '\\', ']', '^', '_'</span>
			<span class="p">{</span>
				<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
			<span class="p">}</span>
		<span class="p">}</span>
	<span class="p">}</span>
	<span class="k">else</span>
	<span class="p">{</span>
		<span class="k">if</span> <span class="p">(</span><span class="n">c</span> <span class="o">&lt;=</span> <span class="mi">95</span><span class="p">)</span>
		<span class="p">{</span>
			<span class="k">if</span> <span class="p">(</span><span class="n">c</span> <span class="o">&gt;</span> <span class="mi">46</span><span class="p">)</span>
			<span class="p">{</span>
				<span class="k">if</span> <span class="p">((</span><span class="kt">unsigned</span> <span class="kt">char</span><span class="p">)(</span><span class="n">c</span> <span class="o">-</span> <span class="mi">58</span><span class="p">)</span> <span class="o">&lt;=</span> <span class="mi">4</span><span class="p">)</span> <span class="c1">// ':', ';', '&lt;', '=', '&gt;'</span>
				<span class="p">{</span>
					<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
				<span class="p">}</span>
			<span class="p">}</span>
			<span class="k">else</span>
			<span class="p">{</span>
				<span class="k">if</span> <span class="p">(</span><span class="n">c</span> <span class="o">&gt;</span> <span class="mi">33</span><span class="p">)</span> <span class="c1">// '"', '#', '$', '%', '&amp;', '\'', '(', ')', '*', '+', ',', '-', '.'</span>
				<span class="p">{</span>
					<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
				<span class="p">}</span>
			<span class="p">}</span>
		<span class="p">}</span>
	<span class="p">}</span>
	<span class="k">return</span> <span class="p">(</span><span class="n">c</span> <span class="o">&amp;</span> <span class="mh">0xEF</span><span class="p">)</span> <span class="o">==</span> <span class="mi">47</span><span class="p">;</span> <span class="c1">// '/', '?'</span>
<span class="p">}</span>
</code></pre></div>    </div>

    <p>After doing this, the best I can tell is that GCC groups contiguous sequences of characters together where it can find them, but otherwise attempts to perform the tests in the order they were written. So if the characters tested were organised in a different order then the code would end up looking quite different <em>(in GCC)</em>.</p>

  </details></p>

<h4 id="using-a-switch">Using a Switch</h4>

<p>Given that we’re testing so many different characters it may seem more natural to use a <code class="language-plaintext highlighter-rouge">switch</code> statement for this instead, so converting the previous implementation gives us this:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">bool</span> <span class="nf">IsSymbol_switch</span><span class="p">(</span><span class="kt">char</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">switch</span> <span class="p">(</span><span class="n">c</span><span class="p">)</span>
    <span class="p">{</span>
    <span class="k">case</span> <span class="sc">'~'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'`'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'!'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'@'</span><span class="p">:</span>
    <span class="k">case</span> <span class="sc">'#'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'$'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'%'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'^'</span><span class="p">:</span>
    <span class="k">case</span> <span class="sc">'&amp;'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'*'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'('</span><span class="p">:</span> <span class="k">case</span> <span class="sc">')'</span><span class="p">:</span>
    <span class="k">case</span> <span class="sc">'_'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'-'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'+'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'='</span><span class="p">:</span>
    <span class="k">case</span> <span class="sc">'['</span><span class="p">:</span> <span class="k">case</span> <span class="sc">']'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'{'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'}'</span><span class="p">:</span>
    <span class="k">case</span> <span class="sc">'|'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'\\'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">';'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">':'</span><span class="p">:</span>
    <span class="k">case</span> <span class="sc">'\''</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'"'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">','</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'.'</span><span class="p">:</span>
    <span class="k">case</span> <span class="sc">'&lt;'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'&gt;'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'/'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'?'</span><span class="p">:</span>
        <span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
    <span class="nl">default:</span>
        <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>

</code></pre></div></div>

<p>This greatly simplifies the assembly code that <strong>GCC</strong> emits, reducing it to a short sequence of comparisons and range tests. My intuition is that in a switch statement the compiler is free to reorder the case labels, giving it more opportunity to optimise.</p>

<pre><code class="language-assembly">IsSymbol_switch(char):
        cmp     dil, 96
        jg      .L28
        cmp     dil, 90
        jg      .L31
        cmp     dil, 47
        jg      .L30
        cmp     dil, 32
        setg    al
        ret
.L28:
        sub     edi, 123
        cmp     dil, 3
        setbe   al
        ret
.L30:
        sub     edi, 58
        cmp     dil, 6
        setbe   al
        ret
.L31:
        mov     eax, 1
        ret
</code></pre>
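
<p>Read back into C++, the listing above amounts to four range tests. The following is a hand translation offered as a sketch, not compiler output: <code class="language-plaintext highlighter-rouge">IsSymbol_reference</code>, <code class="language-plaintext highlighter-rouge">IsSymbol_ranges</code> and <code class="language-plaintext highlighter-rouge">RangesMatchReference</code> are hypothetical names, and the reference function simply looks the character up in a string holding the same 32 symbols as the switch statement.</p>

<pre><code class="language-c++">#include &lt;cassert&gt;
#include &lt;cstring&gt;

// Reference: membership test against the 32 symbols used in the switch.
bool IsSymbol_reference(char c)
{
    static const char symbols[] = "~`!@#$%^&amp;*()_-+=[]{}|\\;:'\",.&lt;&gt;/?";
    // strchr would also match the terminating '\0', so exclude it explicitly.
    return c != '\0' &amp;&amp; std::strchr(symbols, c) != nullptr;
}

// Hand translation of the GCC listing: the casts to unsigned char mirror
// the unsigned `setbe` comparisons that follow each `sub`.
bool IsSymbol_ranges(char c)
{
    const signed char s = static_cast&lt;signed char&gt;(c);
    if (s &gt; 96)
    {
        return static_cast&lt;unsigned char&gt;(s - 123) &lt;= 3; // '{' to '~'
    }
    if (s &gt; 90)
    {
        return true; // '[' to '`'
    }
    if (s &gt; 47)
    {
        return static_cast&lt;unsigned char&gt;(s - 58) &lt;= 6; // ':' to '@'
    }
    return s &gt; 32; // '!' to '/'
}

// Returns true when both functions agree on every possible char value.
bool RangesMatchReference()
{
    for (int value = -128; value &lt;= 127; ++value)
    {
        const char c = static_cast&lt;char&gt;(value);
        if (IsSymbol_ranges(c) != IsSymbol_reference(c))
        {
            return false;
        }
    }
    assert(IsSymbol_ranges('~') &amp;&amp; !IsSymbol_ranges('a'));
    return true;
}
</code></pre>

<p>Calling <code class="language-plaintext highlighter-rouge">RangesMatchReference()</code> is a cheap way to confirm that a rewrite like this still accepts exactly the same characters, since a <code class="language-plaintext highlighter-rouge">char</code> only has 256 possible values.</p>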

<p><strong>MSVC</strong> meanwhile now generates code that uses an indirect jump table. It stores the jump addresses in a smaller table, and then has the main 94 element table store a one byte index into the smaller table. This means that there are two table lookups per character, but the tables are about a quarter the size of the Clang version.</p>

<pre><code class="language-assembly">bool IsSymbol_switch(char) PROC
        movsx   eax, cl
        add     eax, -33
        cmp     eax, 93
        ja      SHORT $LN36@IsSymbol_s
        lea     rdx, OFFSET FLAT:__ImageBase
        cdqe
        movzx   eax, BYTE PTR $LN38@IsSymbol_s[rdx+rax]
        mov     ecx, DWORD PTR $LN39@IsSymbol_s[rdx+rax*4]
        add     rcx, rdx
        jmp     rcx
$LN4@IsSymbol_s:
        mov     al, 1
        ret     0
$LN36@IsSymbol_s:
        xor     al, al
        ret     0
        npad    2
$LN39@IsSymbol_s:
        DD      $LN4@IsSymbol_s
        DD      $LN36@IsSymbol_s
$LN38@IsSymbol_s:
        DB      0	# When true
        DB      1 	# When false
        # ...
</code></pre>
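
<p>The same two-level idea can be written out in C++. This is only a sketch under stated assumptions, not MSVC’s actual layout: all of the names are hypothetical, the first table is built at run time rather than emitted by the compiler, and the second table holds the returned values directly where MSVC holds jump addresses.</p>

<pre><code class="language-c++">#include &lt;array&gt;
#include &lt;cassert&gt;
#include &lt;cstdint&gt;

// Hypothetical constants: the table covers '!' (33) through '~' (126).
constexpr int kFirst = 33;
constexpr unsigned kCount = 94;

// The set of symbols, written as four ranges of character values.
bool IsSymbolValue(int v)
{
    return (v &gt;= 33 &amp;&amp; v &lt;= 47) || (v &gt;= 58 &amp;&amp; v &lt;= 64)
        || (v &gt;= 91 &amp;&amp; v &lt;= 96) || (v &gt;= 123 &amp;&amp; v &lt;= 126);
}

// First level: one byte per character, holding an index into kOutcomes.
std::array&lt;std::uint8_t, kCount&gt; MakeIndexTable()
{
    std::array&lt;std::uint8_t, kCount&gt; table{};
    for (unsigned i = 0; i &lt; kCount; ++i)
    {
        table[i] = IsSymbolValue(kFirst + static_cast&lt;int&gt;(i)) ? 0 : 1;
    }
    return table;
}

// Second level: one entry per distinct outcome.
const bool kOutcomes[2] = { true, false };

bool IsSymbol_two_level(char c)
{
    static const std::array&lt;std::uint8_t, kCount&gt; indexTable = MakeIndexTable();

    // A negative offset wraps to a large unsigned value, so a single
    // comparison rejects everything outside '!' to '~'.
    const unsigned offset = static_cast&lt;unsigned&gt;(static_cast&lt;signed char&gt;(c) - kFirst);
    if (offset &gt;= kCount)
    {
        return false;
    }

    const std::uint8_t index = indexTable[offset];
    assert(index &lt; 2);
    return kOutcomes[index];
}
</code></pre>

<p>The saving comes from the entry sizes: 94 one byte indices plus two 4 byte addresses is 102 bytes, compared with 4 bytes for every entry in a table of full addresses.</p>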

<p>Interestingly enough the <strong>Clang</strong> version remains the same between the two C++ implementations, suggesting that somewhere along the way it converts both versions of the code to a common sequence <em>(I suspect this is likely in the intermediate representation that LLVM uses)</em>.</p>

<h4 id="reordering-compared-values">Reordering Compared Values</h4>

<p>Now just as reordering comparisons in the shorter <code class="language-plaintext highlighter-rouge">IsWhitespace</code> case helped generate better assembly code, I wanted to see what effect ordering the characters would have in the more complete <code class="language-plaintext highlighter-rouge">IsSymbol</code> case. So sorting all of the symbols and updating the C++ code gave me this:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">bool</span> <span class="nf">IsSymbol_ordered</span><span class="p">(</span><span class="kt">char</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'!'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'"'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'#'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'$'</span>
        <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'%'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'&amp;'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'\''</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'('</span>
        <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">')'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'*'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'+'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">','</span>
        <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'-'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'.'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'/'</span>
        <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">';'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">':'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'&lt;'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'='</span> 
        <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'&gt;'</span>  <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'?'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'@'</span>
        <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'['</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'\\'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">']'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'^'</span>
        <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'_'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'`'</span>
        <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'{'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'|'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'}'</span> <span class="o">||</span> <span class="n">c</span> <span class="o">==</span> <span class="sc">'~'</span>
    <span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>MSVC</strong> ended up with almost identical code to the original, with some of the tests swapped around but otherwise performing the same tests.</p>

<pre><code class="language-assembly">bool IsSymbol_ordered(char) PROC
        lea     eax, DWORD PTR [rcx-33]
        cmp     al, 63
        ja      SHORT $LN7@TestSymbol
        mov     rdx, -288230371890266113
        bt      rdx, rax
        jb      SHORT $LN5@TestSymbol
$LN7@TestSymbol:
        sub     cl, 123
        cmp     cl, 3
        jbe     SHORT $LN5@TestSymbol
        xor     al, al
        ret     0
$LN5@TestSymbol:
        mov     al, 1
        ret     0
bool IsSymbol_ordered(char) ENDP
</code></pre>

<p><strong>GCC</strong> ended up with the same comparisons as MSVC, just organised differently, and it even generated an identical bit mask to test against.</p>

<pre><code class="language-assembly">IsSymbol_ordered(char):
        lea     eax, [rdi-33]
        cmp     al, 63
        jbe     .L26
.L21:
        sub     edi, 123
        cmp     dil, 3
        setbe   dl
        mov     eax, edx
        ret
.L26:
        movabs  rcx, -288230371890266113
        mov     edx, 1
        bt      rcx, rax
        jnc     .L21
        mov     eax, edx
        ret
</code></pre>

<p><strong>Clang</strong> is the odd one out, generating two extra range comparisons alongside a similar but different bit mask test.</p>

<pre><code class="language-assembly">IsSymbol_ordered(char):
        lea     ecx, [rdi - 33]
        mov     al, 1
        cmp     cl, 15
        jb      .LBB2_5
        movsx   ecx, dil
        lea     edx, [rcx - 91]
        cmp     edx, 35
        ja      .LBB2_2
        movabs  rsi, 64424509503
        bt      rsi, rdx
        jb      .LBB2_5
.LBB2_2:
        add     ecx, -58
        cmp     ecx, 7
        jae     .LBB2_3
.LBB2_5:
        ret
.LBB2_3:
        xor     eax, eax
        ret
</code></pre>

<p>So we see that there is a vast difference between the assembly code generated when the characters being compared are sorted versus when they are unsorted. The best explanation I can think of is the short circuit behaviour of C++’s logical operators <em>(<code class="language-plaintext highlighter-rouge">||</code> and <code class="language-plaintext highlighter-rouge">&amp;&amp;</code>)</em>. These effectively impose an ordering on the tests, which in this case we don’t really need.</p>

<h4 id="reordering-the-switch-statement">Reordering the Switch Statement</h4>

<p>Now the final thing to check is what happens when we reorder the case statements within the switch. The code should look something like:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">bool</span> <span class="nf">IsSymbol_switch_ordered</span><span class="p">(</span><span class="kt">char</span> <span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">switch</span> <span class="p">(</span><span class="n">c</span><span class="p">)</span>
    <span class="p">{</span>
    <span class="k">case</span> <span class="sc">'!'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'"'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'#'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'$'</span><span class="p">:</span>
    <span class="k">case</span> <span class="sc">'%'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'&amp;'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'\''</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'('</span><span class="p">:</span>
    <span class="k">case</span> <span class="sc">')'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'*'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'+'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">','</span><span class="p">:</span>
    <span class="k">case</span> <span class="sc">'-'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'.'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'/'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">':'</span><span class="p">:</span>
    <span class="k">case</span> <span class="sc">';'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'&lt;'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'='</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'&gt;'</span><span class="p">:</span>
    <span class="k">case</span> <span class="sc">'?'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'@'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'['</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'\\'</span><span class="p">:</span>
    <span class="k">case</span> <span class="sc">']'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'^'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'_'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'`'</span><span class="p">:</span>
    <span class="k">case</span> <span class="sc">'{'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'|'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'}'</span><span class="p">:</span> <span class="k">case</span> <span class="sc">'~'</span><span class="p">:</span>
        <span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
    <span class="nl">default:</span>
        <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And the answer is <em>absolutely nothing!</em></p>

<p>The generated assembly code for this function with the sorted case statements is exactly the same as the unordered switch function shown before for every compiler tested. This confirms my intuition from before that the compiler is free to reorder the case statements in this situation.</p>

<h3 id="analysis">Analysis</h3>

<p>There are a few things that we can learn from this investigation, even before the other articles are written. <em>Though the performance evaluation will have to wait until the benchmarks are done.</em></p>

<ol>
  <li>
    <p>All compilers can create small and efficient bits of code from these simple character classification functions. They can combine multiple tests written in C++ into a single range or bit test down at the assembly level.</p>

    <p>So in general terms do not worry about the performance of such functions but instead write simple and clear code, as that also makes it easier for the compiler to optimise.</p>

    <p>If you’re in doubt or curious then check with <a href="https://godbolt.org/">Compiler Explorer</a>.</p>
  </li>
  <li>
    <p>If the order of the comparisons is unimportant then sorting the values being compared will give better code generation than having them in a random order. This will more easily allow the compiler to implement the comparisons as range or bit mask tests rather than single character tests.</p>

    <p>This also applies to testing ranges as keeping the begin and end ranges ordered helps both humans read the code and the compiler optimise the code.</p>

    <p>If code size is not an issue, or you want to avoid sorting the comparisons each time, then using a switch statement might work better for you.</p>

    <p>Of course if you know the distribution of the values in your data then place the most common tests first, because the compiler should retain that order and the code should run faster in practice. As always, test and measure to get a proper understanding.</p>
  </li>
</ol>]]></content><author><name></name></author><category term="C++" /><category term="c++" /><category term="compilers" /><category term="lexers" /><summary type="html"><![CDATA[This is the second in a series of articles I’m writing on character classification as used in lexers and compilers. In this I describe the simplest method of character classification which is using plain functions with the logic directly inside.]]></summary></entry></feed>