<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Josh Haberman &#187; upb</title>
	<atom:link href="http://blog.reverberate.org/category/upb/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.reverberate.org</link>
	<description>parsing, performance, minimalism with C99</description>
	<lastBuildDate>Wed, 02 Dec 2009 20:45:36 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Porting upb to C++?</title>
		<link>http://blog.reverberate.org/2009/11/28/porting-upb-to-c/</link>
		<comments>http://blog.reverberate.org/2009/11/28/porting-upb-to-c/#comments</comments>
		<pubDate>Sat, 28 Nov 2009 20:53:02 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[upb]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=287</guid>
		<description><![CDATA[I am on the verge of trying something I never thought I&#8217;d do.  I&#8217;m considering porting upb to C++.
My reasons aren&#8217;t ideological, they are highly practical.  Basically I am realizing that while object-oriented C is OK for a while, it&#8217;s very weak at inheritance.  Inheritance in C involves a lot of casting, [...]]]></description>
			<content:encoded><![CDATA[<p>I am on the verge of trying something I never thought I&#8217;d do.  I&#8217;m considering porting upb to C++.</p>
<p>My reasons aren&#8217;t ideological, they are highly practical.  Basically I am realizing that while object-oriented C is OK for a while, it&#8217;s very weak at inheritance.  Inheritance in C involves a lot of casting, duplicated code and/or macros, and careful discipline.  The main problems with this are:
<ul>
<li>the code gets longer and less readable</li>
<li>the code involves more possibly-unsafe operations like casts</li>
</ul>
<p>Both of these problems make the code ultimately more difficult to audit for security.  And getting upb audited for security is something I plan to do very soon.</p>
<p>I am coming to believe that porting to C++ would make upb smaller (in lines of code) and easier to verify for security.  However, there are a few major disadvantages that are giving me pause:
<ul>
<li>there are still some contexts in which C++ is a no-go, like the Linux kernel, embedded systems that only have a C compiler (but no C++), or projects that want to stay C-only.  Doing this port would make upb unsuitable for these contexts.</li>
<li>projects that are currently C-only would need to create C++ source files to call upb APIs, and would have to link in the C++ runtime.</li>
<li>(possible) C++ could result in a larger binary.</li>
</ul>
<p>When I look at the downsides though, they don&#8217;t seem to pertain to my initial goals of making upb useful for Python, Lua, Ruby, etc. extensions, and for use inside Google.  Being useful for really restricted embedded systems is a far-off use case.  So it&#8217;s sounding like porting to C++ is the right thing to do.</p>
<p>I hope it significantly reduces the line count, as I expect it will.  That will make me feel better about giving up the minimalism of C.  I will definitely be compiling with <tt>-fno-exceptions -fno-rtti -fvisibility-inlines-hidden</tt> on gcc.  I also won&#8217;t be using any of the C++ standard library (not even &lt;string&gt;).</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/11/28/porting-upb-to-c/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Site Overhaul / upb status</title>
		<link>http://blog.reverberate.org/2009/07/06/site-overhaul-upb-status/</link>
		<comments>http://blog.reverberate.org/2009/07/06/site-overhaul-upb-status/#comments</comments>
		<pubDate>Tue, 07 Jul 2009 06:11:39 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[upb]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=220</guid>
		<description><![CDATA[You&#8217;ll notice that the site has a new theme, a new title, maybe a little bit of a new attitude (being &#8220;josh the outspoken&#8221; isn&#8217;t all it&#8217;s cracked up to be).  All this is in anticipation of upb&#8217;s release, and of a lot of follow-up work that I couldn&#8217;t be more excited about.
A lot [...]]]></description>
			<content:encoded><![CDATA[<p>You&#8217;ll notice that the site has a new theme, a new title, maybe a little bit of a new attitude (being &#8220;josh the outspoken&#8221; isn&#8217;t all it&#8217;s cracked up to be).  All this is in anticipation of upb&#8217;s release, and of a lot of follow-up work that I couldn&#8217;t be more excited about.</p>
<p>A lot of people mentioned that my last theme didn&#8217;t handle preview correctly, which I agree was quite annoying &#8212; sorry about that!  Unfortunately my new theme doesn&#8217;t have preview at all.  Finding the perfect WordPress theme is harder than it seems.</p>
<p>So as I mentioned I expect the first release of upb to come within the next week or two.  It will only have parsing (not serializing), but most of the core functionality besides that is done.  I expect that the core will stay relatively static, and all the enhancements/innovations will come from modules that are layered on top of that.  For example, some of the things I have in mind are:</p>
<ul>
<li>Extensions for every language known to man.  The first ones will be Python, Lua, and Ruby (in that order).  These will take some thought to get just right, because I want the memory management to be tightly integrated.  In particular, I want protobuf objects in these languages to be able to reference string data from the original protobufs, but all be memory-managed appropriately.</li>
<li>Support for the Protocol Buffer text format.</li>
<li>An easy way to parse only selected fields.  For example, I want to be able to say (in Python):

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #dc143c;">time</span>, url = logrecord.<span style="color: black;">parse_fields</span><span style="color: black;">&#40;</span>proto_data, <span style="color: #483d8b;">&quot;time&quot;</span>, <span style="color: #483d8b;">&quot;url&quot;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>This can be optimized out the wazoo, because you can skip all fields and submessages that you don&#8217;t care about, and you can stop parsing once you have the data you need.</li>
</ul>
<p>And of course, performance, performance, performance.  I can&#8217;t <em>wait</em> to get my hands on some real profiles and see what my next optimization target will be.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/07/06/site-overhaul-upb-status/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Amazing Tools: Massif, a heap profiler</title>
		<link>http://blog.reverberate.org/2009/07/06/amazing-tools-massif-a-heap-profiler/</link>
		<comments>http://blog.reverberate.org/2009/07/06/amazing-tools-massif-a-heap-profiler/#comments</comments>
		<pubDate>Tue, 07 Jul 2009 05:54:11 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[upb]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=207</guid>
		<description><![CDATA[I love the feeling of discovering an amazing new tool.  It&#8217;s a pleasant surprise to have some task you want to achieve &#8212; one that you could do manually, given enough time &#8212; and find that some tool you didn&#8217;t even know existed will make the solution easy.
Tonight I was working on the upb [...]]]></description>
			<content:encoded><![CDATA[<p>I love the feeling of discovering an amazing new tool.  It&#8217;s a pleasant surprise to have some task you want to achieve &#8212; one that you could do manually, given enough time &#8212; and find that some tool you didn&#8217;t even know existed will make the solution easy.</p>
<p>Tonight I was working on the upb compiler (and upb&#8217;s first release is impending, by the way), and ran it under Valgrind as I frequently do to catch memory leaks.  There weren&#8217;t any leaks, but I did notice that the program had allocated 80kb of memory over the course of its run.</p>
<p>People who are less OCD than I would probably shrug off 80kb of memory.  But intuitively 80kb sounded high to me given how much data this program was dealing with, and I wanted to know where all those allocations were coming from.</p>
<p>I didn&#8217;t know off the top of my head how I might profile where my memory allocations were happening, but I had a hunch that Valgrind would be there for me.  And sure enough, one of the tools included in Valgrind is <a href="http://valgrind.org/docs/manual/ms-manual.html">Massif: a heap profiler</a>.</p>
<p>A few short shell commands later:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">$ <span style="color: #c20cb9; font-weight: bold;">valgrind</span> <span style="color: #660033;">--tool</span>=massif .<span style="color: #000000; font-weight: bold;">/</span>upbc
$ ms_print massif.out.17604 <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">less</span></pre></div></div>

<p>&#8230;and I had this call graph sitting in front of me:</p>
<pre>93.72% (74,243B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->58.49% (46,336B) 0x40194E: upb_table_init (upb_table.c:34)
| ->38.13% (30,208B) 0x4019E9: upb_strtable_init (upb_table.c:45)
| | ->29.09% (23,040B) 0x40343A: upb_msg_init (upb_msg.c:44)
| | | ->29.09% (23,040B) 0x402CEE: insert_message (upb_context.c:193)
| | |   ->25.85% (20,480B) 0x402E80: addfd (upb_context.c:223)
| | |   | ->12.93% (10,240B) 0x40263F: upb_context_init (upb_context.c:30)
| | |   | | ->12.93% (10,240B) 0x40167C: main (upbc.c:195)
| | |   | |
| | |   | ->12.93% (10,240B) 0x403135: upb_context_parsefds (upb_context.c:283)
| | |   |   ->12.93% (10,240B) 0x4016BA: main (upbc.c:198)
| | |   |
| | |   ->03.23% (2,560B) 0x402D56: insert_message (upb_context.c:203)
| | |     ->03.23% (2,560B) 0x402E80: addfd (upb_context.c:223)
| | |       ->01.62% (1,280B) 0x40263F: upb_context_init (upb_context.c:30)
| | |       | ->01.62% (1,280B) 0x40167C: main (upbc.c:195)
| | |       |
| | |       ->01.62% (1,280B) 0x403135: upb_context_parsefds (upb_context.c:283)
| | |         ->01.62% (1,280B) 0x4016BA: main (upbc.c:198)
| | |
| | ->05.82% (4,608B) 0x4046E0: upb_enum_init (upb_enum.c:14)
| | | ->05.82% (4,608B) 0x402C35: insert_enum (upb_context.c:167)
| | |   ->05.82% (4,608B) 0x402DBC: insert_message (upb_context.c:209)
| | |     ->05.82% (4,608B) 0x402E80: addfd (upb_context.c:223)
| | |       ->03.23% (2,560B) 0x403135: upb_context_parsefds (upb_context.c:283)
| | |       | ->03.23% (2,560B) 0x4016BA: main (upbc.c:198)
| | |       |
| | |       ->02.59% (2,048B) 0x40263F: upb_context_init (upb_context.c:30)
| | |         ->02.59% (2,048B) 0x40167C: main (upbc.c:195)
| | |
| | ->01.62% (1,280B) 0x402612: upb_context_init (upb_context.c:27)
| | | ->01.62% (1,280B) 0x40167C: main (upbc.c:195)
| | |
| | ->01.62% (1,280B) 0x402629: upb_context_init (upb_context.c:28)
| | | ->01.62% (1,280B) 0x40167C: main (upbc.c:195)
| | |
| | ->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%)
| |
| ->20.36% (16,128B) 0x4019C4: upb_inttable_init (upb_table.c:40)
|   ->17.45% (13,824B) 0x403417: upb_msg_init (upb_msg.c:42)
|   | ->17.45% (13,824B) 0x402CEE: insert_message (upb_context.c:193)
|   |   ->15.51% (12,288B) 0x402E80: addfd (upb_context.c:223)
|   |   | ->07.76% (6,144B) 0x40263F: upb_context_init (upb_context.c:30)
|   |   | | ->07.76% (6,144B) 0x40167C: main (upbc.c:195)
|   |   | |
|   |   | ->07.76% (6,144B) 0x403135: upb_context_parsefds (upb_context.c:283)
|   |   |   ->07.76% (6,144B) 0x4016BA: main (upbc.c:198)
|   |   |
|   |   ->01.94% (1,536B) 0x402D56: insert_message (upb_context.c:203)
|   |     ->01.94% (1,536B) 0x402E80: addfd (upb_context.c:223)
|   |       ->01.94% (1,536B) in 2 places, all below massif's threshold (01.00%)
|   |
|   ->02.91% (2,304B) 0x4046F5: upb_enum_init (upb_enum.c:15)
|     ->02.91% (2,304B) 0x402C35: insert_enum (upb_context.c:167)
|       ->02.91% (2,304B) 0x402DBC: insert_message (upb_context.c:209)
|         ->02.91% (2,304B) 0x402E80: addfd (upb_context.c:223)
|           ->01.62% (1,280B) 0x403135: upb_context_parsefds (upb_context.c:283)
|           | ->01.62% (1,280B) 0x4016BA: main (upbc.c:198)
|           |
|           ->01.29% (1,024B) 0x40263F: upb_context_init (upb_context.c:30)
|             ->01.29% (1,024B) 0x40167C: main (upbc.c:195)
|
->07.98% (6,324B) 0x403AF1: upb_msgdata_new (upb_msg.c:158)
| ->07.96% (6,308B) 0x403F05: upb_msg_reuse_submsg (upb_msg.c:241)
| | ->07.96% (6,308B) 0x404497: submsg_start_cb (upb_msg.c:325)
| |   ->07.96% (6,308B) 0x4056E8: push_stack_frame (upb_parse.c:288)
| |     ->07.96% (6,308B) 0x4057ED: parse_delimited (upb_parse.c:316)
| |       ->07.96% (6,308B) 0x405A91: upb_parse (upb_parse.c:369)
| |         ->07.96% (6,308B) 0x4045BA: upb_msg_parse (upb_msg.c:352)
| |           ->07.96% (6,308B) 0x404631: upb_alloc_and_parse (upb_msg.c:361)
| |             ->07.96% (6,308B) 0x4030CA: upb_context_parsefds (upb_context.c:274)
| |               ->07.96% (6,308B) 0x4016BA: main (upbc.c:198)</pre>
<p>This might look somewhat daunting if you&#8217;re not as deeply familiar with upb as I am.  But it immediately told me what I wanted to know: almost 60% of the memory is being used by upb&#8217;s int->record and string->record hash tables.  That seems a little bit high.  And it&#8217;s being allocated right when the tables are constructed (<tt>upb_table_init</tt>), not as a result of a resize.</p>
<p>Breaking open the code, I found a table minimum size that I had imposed as an attempt to limit the number of resizes.  Resizes have a high overhead &#8212; in my hash table implementation, they result in everything being re-hashed and all the memory being re-allocated, so I had imposed a minimum size of 16 in my constructor:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">void</span> upb_table_init<span style="color: #009900;">&#40;</span><span style="color: #993333;">struct</span> upb_table <span style="color: #339933;">*</span>t<span style="color: #339933;">,</span> uint32_t size<span style="color: #339933;">,</span> uint16_t entry_size<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
  t<span style="color: #339933;">-&gt;</span>count <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
  t<span style="color: #339933;">-&gt;</span>entry_size <span style="color: #339933;">=</span> entry_size<span style="color: #339933;">;</span>
  t<span style="color: #339933;">-&gt;</span>size_lg2 <span style="color: #339933;">=</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>
  <span style="color: #b1b100;">while</span><span style="color: #009900;">&#40;</span>size <span style="color: #339933;">&gt;&gt;=</span> <span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span> t<span style="color: #339933;">-&gt;</span>size_lg2<span style="color: #339933;">++;</span>
  t<span style="color: #339933;">-&gt;</span>size_lg2 <span style="color: #339933;">=</span> max<span style="color: #009900;">&#40;</span>t<span style="color: #339933;">-&gt;</span>size_lg2<span style="color: #339933;">,</span> <span style="color: #0000dd;">4</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>  <span style="color: #808080; font-style: italic;">/* Min size of 16. */</span></pre></div></div>

<p>When I inserted some print statements to compare how often this minimum was taking effect, I saw that there were tons of tables that were trying to allocate just a few (0-10) entries.  With my minimum, they were always being allocated at least 16.  And what&#8217;s more, in all these cases I knew <em>up front</em> how many entries I planned to insert!  So there was no danger of a resize anyway.</p>
<p>I removed this minimum size and the memory usage of my program dipped to about 55kb (from 80kb &#8212; a ~30% reduction!)  That seems a bit more reasonable, though I&#8217;m sure it&#8217;s not the last of my efforts to make sure upb&#8217;s memory footprint stays small.</p>
<p>Anyway, the point of this entry is that now I know about a new tool (Massif) that is at my disposal whenever I need it.  It&#8217;s easy to use and requires almost no set-up.  I can run it on a whim whenever I want to collect memory usage data.  I have just become a little bit more resourceful.</p>
<p>Valgrind has tons of spiffy tools of this sort that ship with it.  I wonder how many people know about them.</p>
<p>Another tool I had a similar reaction to was <a href="http://www.wireshark.org/">Wireshark</a>.  I was experiencing a redirect loop bug in my browser and wanted to submit a useful report to the developers.  The useful information here is the contents of all the HTTP traffic that occurred during the loop.  I fired up Wireshark (as a first-time user) and found out relatively quickly how to sniff the network interface, capture my HTTP session, view it at the HTTP layer (as opposed to the TCP layer or something else), and dump it to a text file.  Massively spiffy.</p>
<p>Learn an amazing new tool today!  And then tell me about it!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/07/06/amazing-tools-massif-a-heap-profiler/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>μpb Status Update</title>
		<link>http://blog.reverberate.org/2009/06/26/%ce%bcpb-status-update/</link>
		<comments>http://blog.reverberate.org/2009/06/26/%ce%bcpb-status-update/#comments</comments>
		<pubDate>Sat, 27 Jun 2009 03:01:01 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[upb]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=180</guid>
		<description><![CDATA[I haven&#8217;t posted many status updates for upb lately.  Sometimes that means I&#8217;m busy with other things, but right now it means I am working on it feverishly and can hardly stand to take a break from it.
I&#8217;m extremely happy with how it&#8217;s shaping up.  It&#8217;s getting more and more complete, and yet [...]]]></description>
			<content:encoded><![CDATA[<p>I haven&#8217;t posted many status updates for <a href="http://github.com/haberman/upb/tree/master">upb</a> lately.  Sometimes that means I&#8217;m busy with other things, but right now it means I am working on it feverishly and can hardly stand to take a break from it.</p>
<p>I&#8217;m extremely happy with how it&#8217;s shaping up.  It&#8217;s getting more and more complete, and yet still staying quite &#8220;micro.&#8221;  Notably, it just recently crossed 1500 lines of C (I don&#8217;t count auto-generated code in this), and it compiles to not quite 10kb of object code.  Throw in the auto-generated code (reflection data that describes the types in descriptor.proto &#8212; this is what allows loading other proto definitions at runtime) and this jumps to about 16kb.  Keep in mind that this effectively has all the major features of the main protobuf implementation (albeit packaged in different ways), but the main protobuf implementation is just over 1MB of object code!</p>
<p>I still think I can reach the main protobuf implementation&#8217;s performance, or maybe even exceed it.  I will consider the project a failure if I can&#8217;t get within 15% of its performance.</p>
<p>Anyway, just wanted to be sure everyone knows I&#8217;m still alive and very much working on this.  Now back to work.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/06/26/%ce%bcpb-status-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bit-fields in C99</title>
		<link>http://blog.reverberate.org/2009/06/26/bit-fields-in-c99/</link>
		<comments>http://blog.reverberate.org/2009/06/26/bit-fields-in-c99/#comments</comments>
		<pubDate>Sat, 27 Jun 2009 02:48:09 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[upb]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=168</guid>
		<description><![CDATA[Recently I came upon some spirited discussion on reddit concerning a blog post that discussed the use of bit-fields in C.  As a quick refresher to anyone unfamiliar or rusty on bit-fields, they are a construct in C that lets you specify how many bits different members of a structure should be allocated:

struct foo [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I came upon some <a href="http://www.reddit.com/r/programming/comments/8vj04/a_much_better_way_of_handling_bitfields_in_c_more/">spirited discussion on reddit</a> concerning <a href="http://www.pagetable.com/?p=250">a blog post that discussed the use of bit-fields in C</a>.  As a quick refresher to anyone unfamiliar or rusty on bit-fields, they are a construct in C that lets you specify how many bits different members of a structure should be allocated:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">struct</span> foo <span style="color: #009900;">&#123;</span>
  <span style="color: #993333;">unsigned</span> <span style="color: #993333;">int</span> a<span style="color: #339933;">:</span><span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>
  <span style="color: #993333;">unsigned</span> <span style="color: #993333;">int</span> b<span style="color: #339933;">:</span><span style="color: #0000dd;">2</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>Without the bit-field specifiers (the &#8220;:1&#8221; and &#8220;:2&#8221; above) this structure would be at least four bytes long, and both &#8220;a&#8221; and &#8220;b&#8221; would be capable of representing at least the integers from 0 to 65535.  With the bit-field specifiers, &#8220;a&#8221; and &#8220;b&#8221; together need only three bits and can only represent 0-1 and 0-3, respectively.  The structure as a whole could then be as small as one byte, although many compilers pad it out to the size of an <tt>unsigned int</tt>.  Bit-fields are a way to store many distinct values compactly inside a struct, to save memory.</p>
<p>I am using bit-fields in <a href="http://github.com/haberman/upb/tree/master">upb</a> to store flags for whether each field is set or not.  When I generate a structure definition for a specific proto type, the code in the header file looks something like this:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">struct</span> google_protobuf_DescriptorProto <span style="color: #009900;">&#123;</span>
  <span style="color: #993333;">union</span> <span style="color: #009900;">&#123;</span>
    uint8_t bytes<span style="color: #009900;">&#91;</span><span style="color: #0000dd;">1</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
    <span style="color: #993333;">struct</span> <span style="color: #009900;">&#123;</span>
      bool name<span style="color: #339933;">:</span><span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>  <span style="color: #808080; font-style: italic;">/* = 1, optional. */</span>
      bool field<span style="color: #339933;">:</span><span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>  <span style="color: #808080; font-style: italic;">/* = 2, repeated. */</span>
      bool nested_type<span style="color: #339933;">:</span><span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>  <span style="color: #808080; font-style: italic;">/* = 3, repeated. */</span>
      bool enum_type<span style="color: #339933;">:</span><span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>  <span style="color: #808080; font-style: italic;">/* = 4, repeated. */</span>
      bool extension_range<span style="color: #339933;">:</span><span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>  <span style="color: #808080; font-style: italic;">/* = 5, repeated. */</span>
      bool extension<span style="color: #339933;">:</span><span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>  <span style="color: #808080; font-style: italic;">/* = 6, repeated. */</span>
      bool options<span style="color: #339933;">:</span><span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>  <span style="color: #808080; font-style: italic;">/* = 7, optional. */</span>
    <span style="color: #009900;">&#125;</span> has<span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span> set_flags<span style="color: #339933;">;</span>
  <span style="color: #993333;">struct</span> upb_string<span style="color: #339933;">*</span> name<span style="color: #339933;">;</span>
  UPB_STRUCT_ARRAY<span style="color: #009900;">&#40;</span>google_protobuf_FieldDescriptorProto<span style="color: #009900;">&#41;</span><span style="color: #339933;">*</span> field<span style="color: #339933;">;</span>
  UPB_STRUCT_ARRAY<span style="color: #009900;">&#40;</span>google_protobuf_FieldDescriptorProto<span style="color: #009900;">&#41;</span><span style="color: #339933;">*</span> extension<span style="color: #339933;">;</span>
  UPB_STRUCT_ARRAY<span style="color: #009900;">&#40;</span>google_protobuf_DescriptorProto<span style="color: #009900;">&#41;</span><span style="color: #339933;">*</span> nested_type<span style="color: #339933;">;</span>
  UPB_STRUCT_ARRAY<span style="color: #009900;">&#40;</span>google_protobuf_EnumDescriptorProto<span style="color: #009900;">&#41;</span><span style="color: #339933;">*</span> enum_type<span style="color: #339933;">;</span>
  UPB_STRUCT_ARRAY<span style="color: #009900;">&#40;</span>google_protobuf_DescriptorProto_ExtensionRange<span style="color: #009900;">&#41;</span><span style="color: #339933;">*</span> extension_range<span style="color: #339933;">;</span>
  google_protobuf_MessageOptions<span style="color: #339933;">*</span> options<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span></pre></div></div>

<p>(that&#8217;s from <a href="http://github.com/haberman/upb/blob/c7f2a271ae29066744cf09499f744a0c6b89a27e/descriptor.h">descriptor.h</a>, which is automatically generated from descriptor.proto from the official protobuf implementation).</p>
<p>I &#8220;union&#8221; the bit-field struct with a byte array so I can set all the flags en masse more efficiently.  This seems like a really nice solution, because then client code that wants to test whether a field is set in a protobuf can write code like:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span>fd<span style="color: #339933;">-&gt;</span>set_flags.<span style="color: #202020;">has</span>.<span style="color: #202020;">message_type</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
   <span style="color: #666666; font-style: italic;">// ...</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>So unfortunately the commenters on Reddit were pretty down on bit-fields and the aforementioned article.  The points they raised filled me with despair that I would have to abandon bit-fields.  This is bad because the only real alternative I would have for this kind of code generation is to generate explicit getters and setters for each bit of each structure!  The code would then immediately become quite uglified:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span>google_protobuf_DescriptorProto_has_message_type<span style="color: #009900;">&#40;</span>fd<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #666666; font-style: italic;">// ...</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>Besides being longer and forcing you to re-type the whole name of the enclosed type, this approach forces you to generate lots of nearly-identical code and pollute the symbol namespace with tons of these useless functions that add no real value.  I was bumming about this.</p>
<p>But as I investigated the supposed problems with bit-fields, I became convinced that the problems were tractable and that I could in good conscience press forward with my plans to use them as intended.</p>
<h3>Here Be Dragons</h3>
<p><i>&#8220;Almost everything about [bit] fields is implementation-dependent.&#8221;</i> &#8211;K&#038;R, Section 6.9</p>
<p>From my reading of the <a href="http://www.open-std.org/JTC1/SC22/wg14/www/docs/n1124.pdf">C99 standard</a>, here are the guarantees you get (or don&#8217;t get) with bit-fields.</p>
<p><b>1. Bit-fields will be packed as tightly as possible, provided they don&#8217;t cross storage unit boundaries: YES.</b>  From the standard: (6.7.2.1 #10):</p>
<blockquote><p>An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit.</p></blockquote>
<p>The first part just means that the compiler can allocate any number of bytes to a bit-field, as long as it has enough bits.  So if you write:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">struct</span> foo <span style="color: #009900;">&#123;</span>
  <span style="color: #993333;">unsigned</span> <span style="color: #993333;">int</span> a<span style="color: #339933;">:</span><span style="color: #0000dd;">8</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>&#8230;a will be at least 1 byte long, but could be more.  But the second part is where we get the nice guarantee that:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">struct</span> foo <span style="color: #009900;">&#123;</span>
  <span style="color: #993333;">unsigned</span> <span style="color: #993333;">int</span> a<span style="color: #339933;">:</span><span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>
  <span style="color: #993333;">unsigned</span> <span style="color: #993333;">int</span> b<span style="color: #339933;">:</span><span style="color: #0000dd;">2</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

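<p>&#8230;will always have a and b packed into adjacent bits of the same storage unit, which in practice means the same byte.  <tt>sizeof(struct foo)</tt> could be greater than 1, but you are guaranteed that a and b will not be split across separate storage units.</p>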
<p><b>2. Bits are allocated low-to-high (or high-to-low): NO</b>.  This is not guaranteed by the spec, so an implementation could choose to allocate bits either starting at the high bit or the low bit of each byte.  The following program will test an implementation to see which it is:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &lt;stdio.h&gt;</span>
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #993333;">void</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #993333;">union</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #993333;">struct</span> <span style="color: #009900;">&#123;</span>
      <span style="color: #993333;">unsigned</span> <span style="color: #993333;">int</span> a<span style="color: #339933;">:</span><span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>
      <span style="color: #993333;">unsigned</span> <span style="color: #993333;">int</span> b<span style="color: #339933;">:</span><span style="color: #0000dd;">2</span><span style="color: #339933;">;</span>
      <span style="color: #993333;">unsigned</span> <span style="color: #993333;">int</span> c<span style="color: #339933;">:</span><span style="color: #0000dd;">3</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span> bitfield<span style="color: #339933;">;</span>
    <span style="color: #993333;">unsigned</span> <span style="color: #993333;">char</span> ch<span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span> u <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span>.<span style="color: #202020;">ch</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
&nbsp;
  u.<span style="color: #202020;">bitfield</span>.<span style="color: #202020;">a</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>
  u.<span style="color: #202020;">bitfield</span>.<span style="color: #202020;">c</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">7</span><span style="color: #339933;">;</span>
  <span style="color: #b1b100;">switch</span><span style="color: #009900;">&#40;</span>u.<span style="color: #202020;">ch</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #b1b100;">case</span> <span style="color: #208080;">0x39</span><span style="color: #339933;">:</span> <span style="color: #000066;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;Bits allocated LOW-TO-HIGH.<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">case</span> <span style="color: #208080;">0x9C</span><span style="color: #339933;">:</span> <span style="color: #000066;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;Bits allocated HIGH-TO-LOW.<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">default</span><span style="color: #339933;">:</span> <span style="color: #000066;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;Oddball allocation: 0x%02hhx.<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">,</span> u.<span style="color: #202020;">ch</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
  <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<h3>My Goal</h3>
<p>What I am trying to do is create a C struct like the above, but <i>also</i> be able to perform data-driven reflection on that C struct based on generic routines like:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">INLINE bool upb_msg_is_set<span style="color: #009900;">&#40;</span><span style="color: #993333;">void</span> <span style="color: #339933;">*</span>s<span style="color: #339933;">,</span> <span style="color: #993333;">int</span> index<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
  <span style="color: #b1b100;">return</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #993333;">char</span><span style="color: #339933;">*</span><span style="color: #009900;">&#41;</span>s<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#91;</span>index <span style="color: #339933;">/</span> <span style="color: #0000dd;">8</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">&amp;</span> <span style="color: #009900;">&#40;</span><span style="color: #0000dd;">1</span> <span style="color: #339933;">&lt;&lt;</span> <span style="color: #009900;">&#40;</span>index <span style="color: #339933;">%</span> <span style="color: #0000dd;">8</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>This means that my reflection routines, like the one above, need to mimic the implementation-defined bit ordering described in the previous section.  This routine does not: it assumes that the bits are assigned from low to high.  But a few #ifdefs should make it not too difficult to support the other ordering.</p>
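<p>As a rough sketch of what an order-aware routine could look like, here is a variant that probes the bit order at run time instead of using #ifdefs.  Everything in it is hypothetical rather than actual upb code: the names <code>bits_low_to_high</code>, <code>msg_is_set</code>, <code>struct flags</code> and <code>example_query</code> are invented for illustration, and it assumes the bit-fields start at the first byte of the struct and fill it one byte at a time, which holds on common ABIs but is not guaranteed by the standard.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Run-time probe using the union trick: returns true when the first
 * declared bit-field lands in the lowest bit of the first byte. */
static bool bits_low_to_high(void) {
  union {
    struct { unsigned int a:1; } bitfield;
    unsigned char ch;
  } u;
  memset(&u, 0, sizeof u);
  u.bitfield.a = 1;
  return u.ch == 0x01;
}

/* Hypothetical order-aware test of the index'th one-bit field. */
static bool msg_is_set(const void *s, int index, bool low_to_high) {
  const unsigned char *bytes = s;
  unsigned int bit = (unsigned int)index % 8u;
  /* Low-to-high counts up from bit 0; high-to-low counts down from bit 7. */
  unsigned int mask = low_to_high ? (1u << bit) : (0x80u >> bit);
  assert(index >= 0);
  return (bytes[index / 8] & mask) != 0;
}

struct flags { unsigned int a:1; unsigned int b:1; unsigned int c:1; };

/* Usage sketch: set only b through the struct, then query by index. */
static bool example_query(int index) {
  struct flags f;
  memset(&f, 0, sizeof f);
  f.b = 1;
  return msg_is_set(&f, index, bits_low_to_high());
}
```

<p>The run-time flag costs a branch per lookup.  Running the probe once at configure time and turning the result into a preprocessor define would give the #ifdef version with the same logic.</p>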
<h3>Performance</h3>
<p>In theory, the ideal compiler should love bit-fields because they give it maximum information with which to optimize loads and stores.  I see a lot of people claim that some compilers generate bad code for bit-fields, but that is a possibility with any construct you use.  From what I can see GCC generates very good code for bit-fields.</p>
<p>So my conclusion is that the hurdles of bit-fields are tractable, and unless some new information comes my way, I intend to go ahead and use them.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/06/26/bit-fields-in-c99/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>pbstream&#8217;s name</title>
		<link>http://blog.reverberate.org/2009/05/03/pbstreams-name/</link>
		<comments>http://blog.reverberate.org/2009/05/03/pbstreams-name/#comments</comments>
		<pubDate>Mon, 04 May 2009 03:09:27 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[upb]]></category>
		<category><![CDATA[pbstream]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=158</guid>
		<description><![CDATA[I hate naming things.  I&#8217;m starting to realize that pbstream is outgrowing its name.  That&#8217;s the one thing I always thought was brilliant about the iPod&#8217;s name.  When the iPod started playing video it was no problem &#8212; it&#8217;s a pod!  Why shouldn&#8217;t a pod play video?  There&#8217;s no way [...]]]></description>
			<content:encoded><![CDATA[<p>I hate naming things.  I&#8217;m starting to realize that pbstream is outgrowing its name.  That&#8217;s the one thing I always thought was brilliant about the iPod&#8217;s name.  When the iPod started playing video it was no problem &#8212; it&#8217;s a pod!  Why <i>shouldn&#8217;t</i> a pod play video?  There&#8217;s no way to outgrow the name &#8220;iPod.&#8221;  Brilliant.</p>
<p>I had no such luck with pbstream.  The tagline is currently:</p>
<p><code> pbstream - a stream-oriented implementation of protocol buffers.</code></p>
<p>It&#8217;s true that offering streaming semantics is one distinguishing feature of my implementation, but the way it&#8217;s growing it&#8217;s no longer really &#8220;stream-oriented.&#8221;  It will offer both stream-oriented and structure-oriented semantics.  Both will be equally supported/encouraged &#8212; it will be a matter of what best suits your needs.</p>
<p>If anything, the number one distinguishing feature of my implementation is that it is minimal.  It gives you a set of tools to use whatever paradigm is the right trade-off for you.  It gives you building blocks to assemble as you see fit.  There are never &#8220;riders&#8221; &#8212; things like memory management that you have to take along with the parts you <i>really</i> want.</p>
<p>So what&#8217;s in a name?  Desired characteristics:</p>
<ul>
<li>Unique, reasonably Googleable.</li>
<li>Communicates that it is a protobuf implementation, and if possible communicates the philosophy described above.</li>
<li>The name (or a reasonable abbreviation thereof) makes for a nice prefix that can be prepended to all my identifiers.  Right now it&#8217;s <code>pbstream_this</code> and <code>pbstream_that</code>.</li>
</ul>
<p>Originally I just called it &#8220;pb&#8221; but that was a tad bit too generic.  I slightly like &#8220;pblab&#8221; (communicates that it gives you building blocks for forging your own strategies), but I dislike the connotation that it is unfinished, unpolished, not production quality.</p>
<p>Perhaps I could give it a name, but keep using &#8220;pb_&#8221; in my identifiers.  It could also help me organize the names of the different parts.  I could use names like:</p>
<ul>
<li><code>pb_stream_parser_*</code> for the stream parser.</li>
<li><code>pb_struct_*</code> for the code that defines an in-memory structure.</li>
</ul>
<p>I kind of like that.  But I still need a name for the package itself that is better than &#8220;pb&#8221;.  Ideas?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/05/03/pbstreams-name/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>&#8220;Hey! It’s been months! What happened?&#8221;</title>
		<link>http://blog.reverberate.org/2009/05/03/hey-it%e2%80%99s-been-months-what-happened/</link>
		<comments>http://blog.reverberate.org/2009/05/03/hey-it%e2%80%99s-been-months-what-happened/#comments</comments>
		<pubDate>Mon, 04 May 2009 02:43:22 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[upb]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=152</guid>
		<description><![CDATA[&#8230;reads the latest comment on my last entry.  It&#8217;s true, I&#8217;ve gone silent for a few months.  Work on pbstream has languished.  The answer to &#8220;what happened?&#8221; is a combination of a personal life that got crazy and a brick wall I hit in the design of pbstream.  But I think [...]]]></description>
			<content:encoded><![CDATA[<p>&#8230;reads the latest comment on my last entry.  It&#8217;s true, I&#8217;ve gone silent for a few months.  Work on <a href="http://github.com/haberman/pbstream/tree/master">pbstream</a> has languished.  The answer to &#8220;what happened?&#8221; is a combination of a personal life that got crazy and a brick wall I hit in the design of pbstream.  But I think I have resolved the latter of those at least and work is progressing again.</p>
<p>So what is the brick wall I hit with pbstream?  I had conflicting goals that I couldn&#8217;t figure out how to reconcile.  One goal was to include as <i>little</i> policy related to memory management as humanly possible.  In other words, if you&#8217;re using pbstream to store fully-parsed protobufs in memory &#8212; I&#8217;ll use the <a href="http://en.wikipedia.org/wiki/Simple_API_for_XML">SAX</a> vs. <a href="http://en.wikipedia.org/wiki/Document_Object_Model">DOM</a> analogy again, where this is the DOM case &#8212; then I want pbstream to not define any semantics for how the memory for that tree is managed.  Is it reference-counted, garbage-collected, or is it just allocated/deallocated in one big chunk?  Every runtime already has its own answer to this question.  I don&#8217;t want to include a memory management strategy, which adds to the code size and complexity, just to have it be an annoying thing that you have to integrate into whatever runtime you&#8217;re already using.</p>
<p>Or, as I said in a previous entry: <a href="http://blog.reverberate.org/2008/02/18/porting-gazelle-from-lua-to-javascript/">Oh fantastic! Because definitely the one thing that my application doesn’t already have is a memory manager.</a></p>
<p>On the other hand, I realized that I really <i>do</i> want the ability to pass an in-memory protobuf representation between language implementations.  For example, if there&#8217;s a C library that conjures up some complex protobuf, I want to be able to pass that parsed in-memory protobuf to Python without serializing/deserializing.  I want different runtimes to be able to look at this memory and make sense of it.</p>
<p>So far so good.  This alone doesn&#8217;t require memory management.  But I also wanted two things that <i>do</i> pose a memory-management problem:</p>
<ul>
<li>I wanted to pass protobufs between language implementations <i>without copying</i>.</li>
<li>I wanted one language implementation to be able to <i>modify</i> the protobuf that had been created by a different language implementation.</li>
</ul>
<p>The first is a problem because if you take a protobuf you created in C and pass it to Python, once Python returns, C has no idea whether Python has taken references to any of the protobuf&#8217;s data.  If C frees the protobuf, Python could try to later read from the freed memory.  If C <i>doesn&#8217;t</i> free the protobuf, the memory will leak.  The fundamental problem is that C and Python have no way of cooperating to be sure the memory is freed only when the last reference that <i>either</i> of them holds is dropped.</p>
<p>The second is a problem for a similar reason.  Suppose you take a protobuf you created in C and pass it to Python, and Python wants to mutate that protobuf.  Suppose the protobuf has a submessage, which is implemented as a reference to that other message.  If in Python you say:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">message.<span style="color: black;">submessage</span> = other_submessage</pre></div></div>

<p>&#8230;Python now has to decide what to do with the <code>submessage</code> that <code>message</code> already had.  Should it free it or not?  If you free it and C had a reference to it, C references freed memory.  If you don&#8217;t free it and C <i>doesn&#8217;t</i> have a reference to it, it leaks.</p>
<p>I fretted over the question of what to do about this for weeks.  I didn&#8217;t want to give up the ability to pass protobufs between languages without serializing/deserializing.  But I was loath to include the slightest amount of memory management code because of the implied complexity, run-time overhead, and code size effects.  So everything ground to a halt while I tried to reconcile these conflicting desires in my head.</p>
<p>At one point I convinced myself that the only way out was reference-counting.  The in-memory protobufs themselves would contain one extra integer for the reference count, and the code would contain just these little extra reference count increments/decrements and checks.  Then C and Python would both maintain reference counts to these shared data structures.  For Python it would be a sort of two-stage reference counting, since the individual Python objects would be reference-counted, and when their reference count dropped to zero they would decrement the reference count of the shared protobuf structure by one.</p>
<p>Eww, Eww, EWW!!  Now maybe you can understand a bit better why I haven&#8217;t gotten anything done for a while.  There&#8217;s no way I could be very happy implementing this scheme.</p>
<p>So quite recently I found a way to relax the sharing requirements <i>just enough</i> that I can get away with not implementing any memory management.  My new strategy is:</p>
<ul>
<li>Language A can reference language B&#8217;s protobufs, but only for as long as B guarantees that all the references will continue to be valid.  In most cases this just means that when B calls A and passes a protobuf, A can only reference B&#8217;s protobufs before it returns control back to B.</li>
<li>In general, languages cannot mutate each other&#8217;s protobufs unless they have some special arrangement with regard to memory management.  I thought I really needed this capability in Gazelle, but I&#8217;m no longer convinced I do.</li>
</ul>
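<p>The first rule can be sketched in plain C.  This is a toy illustration and not pbstream code: <code>struct person_msg</code>, <code>struct kept</code>, <code>a_visit</code> and <code>b_lend_once</code> are all invented names, with &#8220;B&#8221; playing the owner and &#8220;A&#8221; the borrower.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory message owned by "language B". */
struct person_msg {
  const char *name;
  int id;
};

/* What A keeps after the call: plain copies, no pointers into B's memory. */
struct kept {
  char name[32];
  int id;
};

/* "Language A" side: may read *msg only until it returns, so it copies out
 * whatever it wants to keep instead of retaining the pointer. */
static void a_visit(const struct person_msg *msg, struct kept *out) {
  assert(msg != NULL && out != NULL);
  strncpy(out->name, msg->name, sizeof out->name - 1);
  out->name[sizeof out->name - 1] = '\0';
  out->id = msg->id;
}

/* "Language B" side: owns the message, lends it for one call, then frees. */
static int b_lend_once(void) {
  struct person_msg *msg = malloc(sizeof *msg);
  struct kept kept;
  if (msg == NULL) return -1;
  msg->name = "Ada";
  msg->id = 42;
  a_visit(msg, &kept);
  free(msg);  /* safe: A was only allowed to look during the call */
  return kept.id;
}
```

<p>Because A never holds a pointer past the end of <code>a_visit</code>, B can free the message without any reference count.  The price is that A has to copy anything it wants to keep.</p>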
<p>So with that resolved, work is resuming, and I hope to have some results soon.  I always hate making promises though, because I&#8217;m unwilling to press on when I don&#8217;t have a satisfactory answer to a question (as the last month demonstrates), and this can delay my progress long beyond my original intentions.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/05/03/hey-it%e2%80%99s-been-months-what-happened/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>C Compilers</title>
		<link>http://blog.reverberate.org/2009/03/03/c-compilers/</link>
		<comments>http://blog.reverberate.org/2009/03/03/c-compilers/#comments</comments>
		<pubDate>Tue, 03 Mar 2009 09:38:28 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[upb]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=146</guid>
		<description><![CDATA[So the more I work on pbstream (and I&#8217;m working on it quite a lot) the more obsessed I get with making it perfect.  I&#8217;m no longer out to just write some good software; my new goal is to utterly demolish this problem space with blindingly utter perfection.  It&#8217;s going to be [...]]]></description>
			<content:encoded><![CDATA[<p>So the more I work on <a href="http://github.com/haberman/pbstream">pbstream</a> (and I&#8217;m working on it quite a lot) the more obsessed I get with making it perfect.  I&#8217;m no longer out to just write some good software; my new goal is to utterly demolish this problem space with blindingly utter perfection.  It&#8217;s going to be one of the best things I&#8217;ve ever done.</p>
<p>So my latest OCD goal in this quest is to make it truly 100% C99 compliant: compilable on the weirdest and most nontraditional target, and free of any undefined behavior.  I, like most people, have spent several years lulled into the x86/GCC monoculture.  It&#8217;s time for me to transcend that narrow-minded view.</p>
<p>So I&#8217;ve set out to obtain every C compiler and lint tool I can possibly find, hoping that each one can show me even the smallest spot of imperfection that I can proceed to purge from my crystalline statements and declarations.  I was ashamed to realize that I couldn&#8217;t even think of a time I had used a non-GCC/MSVC compiler in almost five years, and I didn&#8217;t know which ones were available to me.  I am happy to say that I have eradicated some of that ignorance, and already have found some bugs that GCC didn&#8217;t show me!</p>
<p>So what C compilers are out there?  Here are some of the ones I came across:</p>
<ul>
<li><a href="http://developers.sun.com/sunstudio/">Sun Studio</a>, Sun&#8217;s C and C++ compiler.  To my surprise, it is available for free.  Includes a lint tool.</li>
<li><a href="http://www.intel.com/cd/software/products/asmo-na/eng/284132.htm">Intel C++ Compiler</a>.  You can obtain a non-commercial version for free if you absolutely, totally, cross-your-heart promise that you are not compensated in any way for your work.  I have a vision of the Intel guys following around open-source developers at conferences and revoking their non-commercial licenses at the first sign of any swag.</li>
<li><a href="http://www.digitalmars.com/download/freecompiler.html">Digital Mars C and C++ compiler</a>, which is freely available for Windows and DOS.</li>
<li><a href="http://www.microsoft.com/express/vc/">Visual C++ Express</a> is a free download from Microsoft.  Apparently has very poor C99 support though.</li>
<li><a href="http://www.pgroup.com/">The Portland Group</a> has a C and C++ compiler, but there doesn&#8217;t seem to be any way to use it without paying several hundred dollars.</li>
<li><a href="http://www.comeaucomputing.com/">Comeau</a> has a C and C++ compiler that takes great pride in its standards-compliance.  You just have to get over the curiously awful web page, which <i>actually says</i> &#8220;Bursting With So Much Language Support It Hurts!&#8221;  No free use (though at $50, it&#8217;s not a big hurdle &#8212; I paid more for a parking ticket today.  If only I hadn&#8217;t had six such parking tickets to pay.  For parking on my own street.  But that&#8217;s a story for another day).  There is a web form (!) where you can paste your code and see error/warning messages though, which could be all you need to test standards-compliance.</li>
<li><a href="http://sdcc.sourceforge.net/">SDCC</a>, the Small Device C Compiler, is a GPL&#8217;d C compiler that targets tiny embedded processors.  I was quite looking forward to this one as a target that is wildly different from other C compilers.  Unfortunately, though it has partial C99 support, it does not support 64-bit integer types, and I&#8217;m not quite willing to tilt at that particular windmill of making pbstream portable to compilers without 64-bit integer support.</li>
<li><a href="http://bellard.org/tcc/">TCC</a>, the Tiny C Compiler.  Fabrice Bellard is an absolute bad-ass, but unfortunately TCC doesn&#8217;t seem to support much of C99 (it choked on a for loop that defined a variable).</li>
<li><a href="http://clang.llvm.org/">Clang</a>, the LLVM compiler.  My version of Ubuntu doesn&#8217;t seem to have this available, and I always find LLVM a bit of a pain to compile and install from source, but I would like to try this when I get the chance.</li>
</ul>
<p>So this should be enough variety that I can feel confident making smug assertions about pbstream&#8217;s portability.  Now the only question is whether I&#8217;ll shell out $50 for the Comeau compiler just for that added potential for portability smugness.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/03/03/c-compilers/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The art of hashing</title>
		<link>http://blog.reverberate.org/2009/03/01/the-art-of-hashing/</link>
		<comments>http://blog.reverberate.org/2009/03/01/the-art-of-hashing/#comments</comments>
		<pubDate>Sun, 01 Mar 2009 09:50:29 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[upb]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=131</guid>
		<description><![CDATA[It might sound strange for me to call hashing an art.  After all, it&#8217;s about the most fundamental data structure we&#8217;ve got, after arrays, linked lists, and binary trees.  It&#8217;s been studied extensively for fifty years.  What amount of &#8220;art&#8221; could possibly be left in such a topic?
I&#8217;ve learned to anticipate that [...]]]></description>
			<content:encoded><![CDATA[<p>It might sound strange for me to call hashing an art.  After all, it&#8217;s about the most fundamental data structure we&#8217;ve got, after arrays, linked lists, and binary trees.  It&#8217;s been studied extensively for fifty years.  What amount of &#8220;art&#8221; could possibly be left in such a topic?</p>
<p>I&#8217;ve learned to anticipate that there are dark and unstudied corners in even the most fundamental computer science topics.  And hashing appears to be no exception.  I&#8217;m investigating hashing at the moment because I need hash tables in a few places for my rapidly-developing Protocol Buffers implementation <a href="http://github.com/haberman/pbstream">pbstream</a>.  So I wanted to take a quick survey of the state-of-the-art in hash tables, and implement a minimal hash table that suits my needs.</p>
<p>What are my needs?  If someone defines a protobuf like so:</p>
<p><code>message Person {<br />
  required string name = 1;<br />
  required int32 id = 2;<br />
  optional string email = 3;<br />
}</code></p>
<p>&#8230;I need to build two hash tables for looking up fields: one keyed by number (1, 2, 3) and one keyed by name (name, id, email).</p>
<p>You might say &#8220;why not just use an array for looking up fields by number?&#8221;  Indeed that&#8217;s what I&#8217;ll do in most cases, but field numbers can be as large as 2**29 - 1 and needn&#8217;t be allocated densely.  So gracefully degrading to a hash table is important (one practical reason why the field numbers might not be dense is if the client is using extensions).  To get the best of both worlds I&#8217;ll follow Lua&#8217;s lead and have the field number table be a hybrid array/hashtable, where the size of the array part is such that it&#8217;s at least half full.</p>
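<p>To make the &#8220;at least half full&#8221; rule concrete, here is one way the array part could be sized.  It is a sketch in the spirit of Lua&#8217;s approach, not a copy of it: the function name <code>array_part_size</code> is hypothetical, and restricting the candidates to powers of two is an assumption of this sketch.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the largest power-of-two array size n such that at least n/2 of
 * the field numbers fall in [1, n]; returns 0 if no size qualifies.
 * Field numbers above n would go to the hash part instead. */
static uint32_t array_part_size(const uint32_t *nums, size_t count) {
  uint32_t best = 0;
  uint32_t n;
  assert(nums != NULL || count == 0);
  /* Stop once n/2 exceeds count: such an array could never be half full.
   * The n != 0 check guards against wraparound when doubling. */
  for (n = 1; n != 0 && n / 2 <= count; n *= 2) {
    size_t in_range = 0, i;
    for (i = 0; i < count; i++) {
      if (nums[i] >= 1 && nums[i] <= n) in_range++;
    }
    if (in_range * 2 >= n) best = n;
  }
  return best;
}
```

<p>For the Person message above (fields 1, 2, 3) this picks an array of size 4.  Adding an extension numbered 1000 leaves the array at size 4 and sends only that one field to the hash part.</p>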
<p>My usage pattern for these hashtables is that I&#8217;ll build them once (when I parse a .proto file), then do lookups in the critical path of parsing.  So insert time is practically irrelevant, but lookup time is extremely important, because it&#8217;s in my critical path.  I&#8217;ll gladly trade some memory for fast lookups.</p>
<p>So I figured I&#8217;d take a few hours to discover the state-of-the-art in hashtables, implement that, and be on my merry way.  Unfortunately there doesn&#8217;t seem to be much agreement about what the state of the art even is!</p>
<p>Let&#8217;s start with the simplest, most basic question: how big should a hashtable be?  The literature differs here: some of it claims that hashtable sizes should be prime numbers, and some says they should be powers of two.</p>
<p>But once you&#8217;ve decided that, you&#8217;ve only just entered the jungle of collision-resolution strategies.  There are a lot of them, and it&#8217;s not at all clear to me that the trade-offs are well understood.  A list of ones I&#8217;ve come across, and my current understanding of what they mean:</p>
<ul>
<li>linear probing: if your hash function is h(k), then a collision in T[h(k)] means you should also try T[h(k)+i] for i=1 to some limit.</li>
<li>quadratic probing: like linear probing, but the offset also gets a quadratic term, T[h(k) + i + i^2], to spread it out a little.</li>
<li>double hashing: like linear probing, but you scale the jumps by another hash function: T[h(k) + g(k)*i].</li>
<li>chaining: all entries that collide are in a linked list, so for lookups you search this list.  Chaining can be either internal (table entries have both values and links, links point to other locations in the table) or external (table is just pointers to the head of linked lists, inserts always allocate a new node).</li>
<li>cuckoo hashing: Wikipedia makes this sound like hashing&#8217;s best-kept secret.  You use two hash functions: if you do an insertion and the spot you want is taken, you kick whatever was already there out and put it in its alternate spot.</li>
<li>two-way chaining: use two hash functions (like cuckoo and double hashing), but you use the two hash values to find the two possible chains the value could be in.</li>
</ul>
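<p>The three open-addressing schemes differ only in how the i&#8217;th probe is computed, which a few lines of C can make concrete.  This is a sketch under assumptions, not code from any of the implementations discussed: the function names are hypothetical, the table size is assumed to be a power of two, and the quadratic variant uses triangular-number offsets, which is one common choice among several.</p>

```c
#include <assert.h>
#include <stddef.h>

/* All three helpers assume size is a power of two, so "& (size - 1)"
 * wraps the index into the table. */
static int is_pow2(size_t size) {
  return size != 0 && (size & (size - 1)) == 0;
}

/* Linear probing: step one slot at a time from the home slot h. */
static size_t probe_linear(size_t h, size_t i, size_t size) {
  assert(is_pow2(size));
  return (h + i) & (size - 1);
}

/* Quadratic probing, triangular variant: offsets 0, 1, 3, 6, 10, ...
 * With a power-of-two size the first `size` probes hit distinct slots. */
static size_t probe_quadratic(size_t h, size_t i, size_t size) {
  assert(is_pow2(size));
  return (h + (i + i * i) / 2) & (size - 1);
}

/* Double hashing: the step comes from a second hash value g.  Forcing the
 * step to be odd keeps it coprime with a power-of-two size, so the
 * sequence eventually visits every slot. */
static size_t probe_double(size_t h, size_t g, size_t i, size_t size) {
  assert(is_pow2(size));
  return (h + (g | 1) * i) & (size - 1);
}
```

<p>A lookup loop would call one of these with i = 0, 1, 2, &#8230; until it finds the key or an empty slot.  The chaining and cuckoo schemes need different table layouts and are not covered by this sketch.</p>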
<p>Few papers seem to offer a comprehensive account of the trade-offs between these.  Some authors claim that linear probing is best on modern CPUs because of excellent locality (cache-friendly), but <a href="http://www.siam.org/meetings/alenex05/papers/13gheileman.pdf">this paper [PDF]</a> takes exception to that claim, saying that the better overall performance of double-hashing negates the cache effects.  And indeed, none of the real-world implementations I found use linear probing.</p>
<p>A quick survey of mainstream hashing implementations shows that they can&#8217;t agree on what&#8217;s best either.</p>
<ul>
<li>Lua uses internal chaining with powers-of-two table sizes and &#8220;Brent&#8217;s variation&#8221; (so-named because of a 1973 paper called <a href="http://wwwmaths.anu.edu.au/~brent/pub/pub013.html">Reducing the retrieval time of scatter storage techniques</a> &#8212; so much for caching-conscious schemes).</li>
<li>Ruby uses internal chaining with prime table sizes.</li>
<li>Python uses a custom probing scheme.  You have to read <a href="http://svn.python.org/view/python/trunk/Objects/dictobject.c?revision=68128&#038;view=markup">the comments in the source file itself</a> &#8212; they do more to make hashing sound like black magic than I possibly could.</li>
<li>Perl appears to use external chaining, but its code is a bit hard to follow.</li>
</ul>
<p>Even if you decide on a collision strategy, you still have to decide on a hash function, and there are surprises lurking for you there as well.  Conventional wisdom about hashing is that you want a hash function that randomly distributes the inputs to the outputs.  But Python and Lua both hash integers to themselves (so h(5) = 5)!</p>
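<p>The identity hash is less strange than it sounds when keys are small and dense, as field numbers usually are.  Here is a small experiment, with hypothetical helper names <code>slot_for</code> and <code>count_collisions</code> and an assumed power-of-two table of at most 64 slots:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Identity hash: the key itself, masked down to a power-of-two table size. */
static size_t slot_for(uint32_t key, size_t size) {
  assert(size != 0 && (size & (size - 1)) == 0);
  return (size_t)key & (size - 1);
}

/* Counts keys that land in a slot some earlier key already occupies.
 * The fixed 64-entry scratch array is a limit of this sketch only. */
static size_t count_collisions(const uint32_t *keys, size_t count, size_t size) {
  unsigned char used[64] = {0};
  size_t i, collisions = 0;
  assert(size <= 64);
  for (i = 0; i < count; i++) {
    size_t s = slot_for(keys[i], size);
    if (used[s]) collisions++;
    used[s] = 1;
  }
  return collisions;
}
```

<p>Keys 1 through 5 in an 8-slot table never collide, which no &#8220;random&#8221; hash can promise.  Keys 8, 16 and 24 all land in slot 0, which is the pattern a scrambling hash is meant to break up.</p>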
<p>There are just a dizzying number of variations on hashing.  Given my relatively simple use case (one-time construction, lookup speed trumps all) you&#8217;d think there would be a clear and obvious answer.  But I don&#8217;t see one.</p>
<p>Given my deep respect for Lua&#8217;s implementation, I&#8217;m tempted to just start with that and worry about trying to optimize it later.  I&#8217;m slightly disconcerted with its focus on getting good performance even when the table is quite full, because as I mentioned for my use case I&#8217;ll gladly trade modest amounts of RAM for faster lookup.  At least it&#8217;s someplace to start.</p>
<p>But seriously, who knew that hash tables had so many variations, with no clear winner?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/03/01/the-art-of-hashing/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>
