<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Josh Haberman &#187; upb</title>
	<atom:link href="http://blog.reverberate.org/category/upb/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.reverberate.org</link>
	<description>parsing, performance, minimalism with C99</description>
	<lastBuildDate>Mon, 30 Jan 2012 00:15:14 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Making Knuth&#8217;s wish come true: the x32 ABI</title>
		<link>http://blog.reverberate.org/2011/09/26/making-knuths-wish-come-true-the-x32-abi/</link>
		<comments>http://blog.reverberate.org/2011/09/26/making-knuths-wish-come-true-the-x32-abi/#comments</comments>
		<pubDate>Mon, 26 Sep 2011 23:34:07 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[upb]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=496</guid>
		<description><![CDATA[Several years ago (though I can&#8217;t say exactly how many since it&#8217;s not dated) Knuth made the following complaint: A Flame About 64-bit Pointers It is absolutely idiotic to have 64-bit pointers when I compile a program that uses less &#8230; <a href="http://blog.reverberate.org/2011/09/26/making-knuths-wish-come-true-the-x32-abi/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Several years ago (though I can&#8217;t say exactly how many since it&#8217;s not dated) Knuth <a href="http://www-cs-faculty.stanford.edu/~uno/news08.html">made the following complaint</a>:</p>
<blockquote><p><b>A Flame About 64-bit Pointers</b></p>
<p>It is absolutely idiotic to have 64-bit pointers when I compile a program that uses less than 4 gigabytes of RAM. When such pointer values appear inside a struct, they not only waste half the memory, they effectively throw away half of the cache.</p>
<p>The gcc manpage advertises an option &#8220;-mlong32&#8221; that sounds like what I want. Namely, I think it would compile code for my x86-64 architecture, taking advantage of the extra registers etc., but it would also know that my program is going to live inside a 32-bit virtual address space.</p>
<p>Unfortunately, the -mlong32 option was introduced only for MIPS computers, years ago. Nobody has yet adopted such conventions for today&#8217;s most popular architecture. Probably that happens because programs compiled with this convention will need to be loaded with a special version of libc.</p>
<p>Please, somebody, make that possible.</p></blockquote>
<p>I always thought this made a lot of sense.  <a href="https://launchpad.net/bugs/185263">People have asked distro-makers for this before without a lot of success</a>, but it looks like this is now being worked on by high-profile people in the Linux community.  It is called <a href="https://sites.google.com/site/x32abi/">The x32 ABI</a> (see <a href="http://lwn.net/Articles/456731/">the LWN coverage</a> for a more digestible description).  It&#8217;s exciting because in some benchmarks this can outperform the x86-64 ABI by 10% or more.  It&#8217;s a tradeoff &#8212; if you don&#8217;t need to address more than 4GB of memory, you can get faster programs because smaller pointers have better cache utilization.  You&#8217;ll use less memory too.</p>
<p>This could have been done in a way that operated nearly the same as &#8220;compatibility mode&#8221; (i.e., running 32-bit binaries on a 64-bit CPU/OS), which would have required only minimal changes to the kernel/toolchain.  But it looks like their plans are more ambitious: they want to be able to use the optimized <code>SYSCALL64</code> instruction (which is &#8220;much faster&#8221; than <code>int 0x80</code> <a href="http://article.gmane.org/gmane.linux.kernel/1184885">according to H. Peter Anvin</a>), and they&#8217;re looking at fixing other problems like 32-bit <code>time_t</code>.  So it&#8217;s a more substantial effort, but it looks like there&#8217;s significant interest and momentum behind this.</p>
<p>Thinking about how this would affect upb, my impression is that I could use my x86-64 JIT unmodified with x32, since it appears to have all of the same calling conventions.  It has the same set of callee-save registers and the same set of registers for parameter transfer, and I think these are the main things upb&#8217;s JIT-ted code depends on.</p>
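<p>To make Knuth&#8217;s point about pointers inside structs concrete, here is a small C sketch with hypothetical types written for illustration. On an ordinary x86-64 (LP64) build, a tree node with two pointer links is padded to 24 bytes; the same node with 32-bit links takes 12, which is roughly what the pointer version would shrink to under an ILP32 ABI like x32. Using 32-bit array indices is a swapped-in technique for showing the size difference on a normal 64-bit build, not something x32 itself does, and the 64-byte cache line is an assumption.</p>
<pre lang="c">
#include <stddef.h>
#include <stdint.h>

/* A tree node with pointer links. On LP64 x86-64 this is 8 + 8 + 4 bytes,
   padded to 24 for 8-byte alignment; under x32 it would be 12. */
struct node_ptr {
  struct node_ptr *left;
  struct node_ptr *right;
  uint32_t key;
};

/* The same node with 32-bit links: indices into one node array
   (hypothetical convention: index 0 means "no child"). 12 bytes on common ABIs. */
struct node_idx {
  uint32_t left;
  uint32_t right;
  uint32_t key;
};

/* Assumed cache-line size; 64 bytes is typical for current x86-64 CPUs. */
#define CACHE_LINE 64

/* How many whole nodes of a given size fit in one cache line. */
static size_t nodes_per_line(size_t node_size) {
  return CACHE_LINE / node_size;
}
</pre>
<p>On a typical x86-64 build, <code>nodes_per_line(sizeof(struct node_ptr))</code> is 2 while <code>nodes_per_line(sizeof(struct node_idx))</code> is 5, so each cache line fetched brings in more than twice as many nodes.</p>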
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2011/09/26/making-knuths-wish-come-true-the-x32-abi/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Beating the compiler</title>
		<link>http://blog.reverberate.org/2011/09/17/beating-the-compiler/</link>
		<comments>http://blog.reverberate.org/2011/09/17/beating-the-compiler/#comments</comments>
		<pubDate>Sat, 17 Sep 2011 22:08:25 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[upb]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=491</guid>
		<description><![CDATA[It&#8217;s been a while since I&#8217;ve posted about upb, but I&#8217;ve been busy improving it! I think the biggest achievement I can mention is that the core upb APIs (upb_handlers, upb_def, and upb_bytestream/upb_bytesink) are converging to the point where I&#8217;m &#8230; <a href="http://blog.reverberate.org/2011/09/17/beating-the-compiler/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been a while since I&#8217;ve posted about upb, but I&#8217;ve been busy improving it!  I think the biggest achievement I can mention is that the core upb APIs (upb_handlers, upb_def, and upb_bytestream/upb_bytesink) are converging to the point where I&#8217;m comfortable with people starting to experiment with them.  I&#8217;m not promising they won&#8217;t change at all, but I&#8217;m a lot more confident in their overall structure and semantics than I have been previously.</p>
<p>Notably, I think upb&#8217;s deserialization is ready for casual and experimental use.  Definitely don&#8217;t trust any data to it until it&#8217;s better tested, though.  I won&#8217;t be releasing until things have converged a bit more (and are better tested).</p>
<p>I gave a talk about upb at Google yesterday that was well-received.  One question that comes up is &#8220;how are you beating the generated code from the protobuf compiler?&#8221;  For the record, my JIT appears to be about 25% faster than Google&#8217;s protobuf release on a completely apples-to-apples test.  It is a bit surprising, since my code has basically the same structure as protobuf&#8217;s generated C++: I don&#8217;t invent any new optimizations or anything like that.  I think it really comes down to generating better assembly than gcc does.</p>
<p>We live in a day and age where common wisdom is that you can&#8217;t beat a good C++ compiler, or at least not by much, and I think this is probably true for 99% of use cases.  But I was first inspired to think that I could beat the C++ compiler in this case by reading <a href="http://article.gmane.org/gmane.comp.lang.lua.general/75426">this mailing list post from Mike Pall</a> where he explains why you can still beat the compiler for interpreter main loops.  A protobuf parser is surprisingly similar to a byte-code interpreter main loop, so I thought I&#8217;d give it a shot.</p>
<p>Below is just the simplest example I could dig up of a side-by-side comparison of my code vs. the compiler&#8217;s.  What follows is the code to parse a single fixed64 value:</p>
<pre lang="asm">
  ; upb JIT assembly:
  mov    rdx,QWORD PTR [rbx+0x2]    ; load fixed64 val out of buffer
  add    rbx,0xa                    ; advance buffer by 10 (2 for tag)
  mov    QWORD PTR [r12+0x40],rdx   ; store fixed64 value in message
  or     BYTE PTR [r12+0x1],0x4     ; set hasbit
  cmp    rbx,QWORD PTR [r15+0xaf8]  ; check for end-of-buffer
  jae    <end of buffer>
  mov    rcx,QWORD PTR [rbx]        ; load next tag
  cmp    cx,0x1b0                   ; next field+wt in order?
  je     <expected next field>
</pre>
<p>There&#8217;s not a lot left to cut away here.  Compare with protobuf/gcc-generated code:</p>
<pre lang="asm">
  ; protobuf/gcc code:
  mov    ecx,DWORD PTR [rbx+0x10]   ; load buffer end
  mov    rax,QWORD PTR [rbx+0x8]    ; load buffer
  sub    ecx,eax                    ; if (buffer_end_ - buffer_ <= 7)
  cmp    ecx,0x7
  jle    <error>
  mov    rax,QWORD PTR [rax]        ; load fixed64 val
  mov    rdx,QWORD PTR [rbp-0x48]   ; load this
  mov    QWORD PTR [rdx],rax        ; store fixed64 val in this
  add    QWORD PTR [rbx+0x8],0x8    ; advance buffer
  or     DWORD PTR [r12+0x74],0x800 ; set hasbit
  mov    rdx,QWORD PTR [rbx+0x10]   ; load buffer end
  mov    rax,QWORD PTR [rbx+0x8]    ; load buffer
  mov    ecx,edx
  sub    ecx,eax                    ; if (buffer_end_ - buffer_ <= 1)
  cmp    ecx,0x1
  jle    <end of file>
  cmp    BYTE PTR [rax],0xb0        ; check first byte of next tag
  jne    <lookup field>
  cmp    BYTE PTR [rax+0x1],0x1     ; check second byte of next tag
  jne    <lookup field>
</pre>
<p>There is some poor register allocation going on here &#8212; gcc is repeatedly loading <tt>buffer_</tt> and <tt>buffer_end_</tt> even though it has plenty of registers to play with.  All in all, gcc is generating over twice the number of instructions, over twice the number of loads, and more stores too.  Note that this is taken from the middle of a very large C++ function with a big switch statement and a bunch of gotos, and I think it&#8217;s just difficult for a compiler to do good register allocation under these constraints.</p>
<p>If the C++ compiler could know the difference between fast paths and slow paths and do the register allocation solely for the fast paths (spilling everything for the slow paths) it might have a better shot.  But still I think it&#8217;s just a hard problem.</p>
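<p>For comparison, here is a rough C rendition of the fast path in the JIT listing above, written as a sketch rather than upb&#8217;s actual code. The cursor and end pointer are local variables instead of object fields, which gives a compiler a fair chance of keeping them in registers. The message layout, the hasbit position, and the expected next tag (0xb0 0x01, which decodes to field 22 with wire type varint) are illustrative assumptions chosen to mirror the listing.</p>
<pre lang="c">
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message layout: a small hasbit array plus one fixed64 field. */
typedef struct {
  uint8_t hasbits[2];
  uint64_t fixed64_val;
} msg_t;

/* Parses a fixed64 field whose 2-byte tag starts at *pp, then peeks at the next tag.
   Returns -1 if fewer than 10 bytes remain (a slow path would take over),
   1 if the next tag is the expected 0xb0 0x01, and 0 otherwise (including end of buffer). */
static int parse_fixed64_fast(const uint8_t **pp, const uint8_t *end, msg_t *msg) {
  const uint8_t *p = *pp;  /* local copy: nothing forces a reload from memory */
  if (end - p < 10) return -1;

  /* fixed64 values are little-endian on the wire. */
  uint64_t v = 0;
  for (int i = 7; i >= 0; i--) v = (v << 8) | p[2 + i];
  msg->fixed64_val = v;
  msg->hasbits[1] |= 0x4;  /* set hasbit, like "or BYTE PTR [r12+0x1],0x4" */
  p += 10;                 /* 2 tag bytes + 8 value bytes */
  *pp = p;

  if (end - p < 2) return 0;            /* end of buffer, or too short for the expected tag */
  return p[0] == 0xb0 && p[1] == 0x01;  /* is the next field the one we expect? */
}
</pre>
<p>Whether a compiler actually keeps <code>p</code> and <code>end</code> in registers depends on the surrounding function; the point is only that nothing in this code forces it to reload them between steps.</p>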
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2011/09/17/beating-the-compiler/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>upb status and preliminary performance numbers</title>
		<link>http://blog.reverberate.org/2011/04/25/upb-status-and-preliminary-performance-numbers/</link>
		<comments>http://blog.reverberate.org/2011/04/25/upb-status-and-preliminary-performance-numbers/#comments</comments>
		<pubDate>Tue, 26 Apr 2011 06:18:53 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[upb]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=441</guid>
		<description><![CDATA[The last few weeks have been very exciting for upb. On April 1 I checked in a JIT compiler for parsing protobufs, which one might think was an April Fool&#8217;s joke, but it&#8217;s real and the performance numbers so far &#8230; <a href="http://blog.reverberate.org/2011/04/25/upb-status-and-preliminary-performance-numbers/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The last few weeks have been very exciting for upb.  On April 1 <a href="https://github.com/haberman/upb/commit/9eb4d695c49a85f7f72ad68c3c31affd61fef984">I checked in a JIT compiler for parsing protobufs</a>, which one might think was an April Fool&#8217;s joke, but it&#8217;s real and the performance numbers so far have exceeded my expectations.</p>
<h3>Why JIT?</h3>
<p>Before I get to the numbers, I should explain what it even means for upb to have a JIT.  If you&#8217;re not interested in the technical details, feel free to skip this section to go straight to the impacts and what this means for upb.</p>
<p>So why a JIT?  After all, protobufs are not an imperative language.  You can&#8217;t write an algorithm in a <code>.proto</code> file.  You can&#8217;t apply compiler techniques like SSA, strength reduction, tracing, or really any of the things you&#8217;d expect from a JIT on a platform like the JVM.  There is no byte code or intermediate representation to speak of.  So why would upb have a JIT and what on earth would it do?</p>
<p>To review, a <code>.proto</code> file defines a schema for some messages.  In this sense, it is comparable to <a href="http://json-schema.org/">JSON Schema</a> or <a href="http://en.wikipedia.org/wiki/XML_Schema_(W3C)">XML Schema</a>.</p>
<pre>message Person {
  required int32 id = 1;
  required string name = 2;
  optional string email = 3;
}</pre>
<p>To support forwards and backwards compatibility, we don&#8217;t make any assumptions about which fields we&#8217;re going to see or in what order.  Instead, on the wire each field is preceded by its tag number.  Logically, the serialized version is something like:</p>
<pre>- field number: 1, wire type: varint, value: 1
- field number: 2, wire type: delimited, value: "Josh Haberman"
- field number 3, wire type: delimited, value: "jhaberman@gmail.com"</pre>
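<p>On the wire, the tag in front of each field is itself a varint that packs the field number and wire type together as <code>(field_number &lt;&lt; 3) | wire_type</code>. As a sketch (the helper names are hypothetical, not taken from upb or protobuf), the fields of the <code>Person</code> example could be encoded like this:</p>
<pre lang="c">
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { WIRE_VARINT = 0, WIRE_DELIMITED = 2 };

/* Writes v as a base-128 varint: 7 bits per byte, low bits first,
   high bit set on every byte except the last. Returns bytes written (at most 10). */
static size_t put_varint(uint8_t *out, uint64_t v) {
  size_t n = 0;
  while (v >= 0x80) {
    out[n++] = (uint8_t)(v | 0x80);
    v >>= 7;
  }
  out[n++] = (uint8_t)v;
  return n;
}

/* Writes the tag that precedes a field: field number in the high bits, wire type in the low 3. */
static size_t put_tag(uint8_t *out, uint32_t field_number, uint32_t wire_type) {
  return put_varint(out, ((uint64_t)field_number << 3) | wire_type);
}

/* Writes a length-delimited string field: tag, then length, then the raw bytes. */
static size_t put_string(uint8_t *out, uint32_t field_number, const char *s) {
  size_t len = strlen(s);
  size_t n = put_tag(out, field_number, WIRE_DELIMITED);
  n += put_varint(out + n, len);
  memcpy(out + n, s, len);
  return n + len;
}
</pre>
<p>Encoding <code>id = 1</code> produces the two bytes 0x08 0x01, and <code>name = "Josh"</code> produces 0x12 0x04 followed by the four ASCII bytes of the string.</p>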
<p>If you were going to write a parser for this, you might create a table of fields, keyed by field number, and a main parser loop that looks something like this:</p>
<pre lang="c">
while (!done) {
  int type = fields[get_field_number()];
  switch (type) {
    case WIRE_TYPE_VARINT:
      parse_varint();
      break;
    case WIRE_TYPE_DELIMITED:
      parse_delimited();
      break;
    // etc.
  }
}</pre>
<p>We call this a &#8220;table-based parser.&#8221;  We look up the type in a table of fields, and use that to branch to the correct value parsing function.  There are minor variations on this, like using a function pointer instead of a switch statement, but it&#8217;s the same general idea.  To see an actual program that implements this style of parser, check out my bare-bones 100-line protobuf parser that I wrote a few years ago: <a href="http://blog.reverberate.org/wp-content/uploads/2008/07/pb.c">pb.c</a>.</p>
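<p>The loop above is pseudocode. A runnable version of the same table-based idea might look like the following minimal C sketch. The type names, the field-table layout, and the decision to reject unknown fields instead of skipping them are simplifications made for this illustration; this is not code from pb.c or upb.</p>
<pre lang="c">
#include <stddef.h>
#include <stdint.h>

enum wire_type { WIRE_VARINT = 0, WIRE_FIXED64 = 1, WIRE_DELIMITED = 2, WIRE_FIXED32 = 5 };

/* One entry per field number: whether the field exists and which wire type it uses. */
typedef struct {
  int known;
  enum wire_type type;
} field_def;

/* Stand-in for "do something with the value": just count what was parsed. */
typedef struct {
  int varints, delimited;
  uint64_t last_varint;
} parse_stats;

/* Decodes a varint at *p, advancing *p. Returns 0 on truncated or over-long input. */
static int read_varint(const uint8_t **p, const uint8_t *end, uint64_t *out) {
  uint64_t v = 0;
  for (int shift = 0; shift < 64 && *p < end; shift += 7) {
    uint8_t b = *(*p)++;
    v |= (uint64_t)(b & 0x7f) << shift;
    if (!(b & 0x80)) { *out = v; return 1; }
  }
  return 0;
}

/* The table-based main loop: read a tag, look the field up, branch on its type. */
static int parse(const uint8_t *p, const uint8_t *end,
                 const field_def *fields, size_t nfields, parse_stats *st) {
  while (p < end) {
    uint64_t tag, val;
    if (!read_varint(&p, end, &tag)) return 0;
    uint64_t num = tag >> 3;
    if (num >= nfields || !fields[num].known || (uint64_t)fields[num].type != (tag & 7))
      return 0;  /* unknown field or wrong wire type: a real parser would skip or report it */
    switch (fields[num].type) {
      case WIRE_VARINT:
        if (!read_varint(&p, end, &val)) return 0;
        st->varints++;
        st->last_varint = val;
        break;
      case WIRE_DELIMITED:
        if (!read_varint(&p, end, &val) || val > (uint64_t)(end - p)) return 0;
        p += val;  /* a real parser would copy or reference these bytes */
        st->delimited++;
        break;
      default:
        return 0;  /* fixed32/fixed64 left out to keep the sketch short */
    }
  }
  return 1;
}
</pre>
<p>For <code>Person</code>, the field table would mark field 1 as a varint and fields 2 and 3 as delimited.</p>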
<p>In a way this resembles a byte-code interpreter, and the serialized protobuf resembles byte-code.  For example, see <a href="http://www.lua.org/source/5.1/lvm.c.html#luaV_execute">the main loop of the Lua interpreter</a>.  It has a similar pattern where it extracts an opcode and uses that to branch into a giant switch statement.</p>
<p>It turns out that if we generate code that is specific to one message type, and takes advantage of the fact that fields are usually encoded in order, we can beat this table-based parser significantly.  Google&#8217;s main &#8220;protobuf&#8221; software has done this for a long time: their strategy is to generate C++ classes that are specific to a proto message type (like <code>Person</code> in the above example).  This code has been highly tuned and optimized, and the generated parsing code looks something like this:</p>
<pre lang="cpp">while((tag = input->ReadTag()) != 0) {
  field_number = GetFieldNumber(tag);
  wire_type = GetWireType(tag);
  switch(field_number) {
    // optional float field1 = 1;
    case 1:
      if (wire_type != FLOAT_WIRE_TYPE) goto error;
      input->ParseFloat();
      // Saves the switch() and type check branch if fields are in order.
      if (input->ExpectTag(2, DOUBLE_WIRE_TYPE)) goto parse_field2;
      break;

    // optional double field2 = 2;
    case 2:
      if (wire_type != DOUBLE_WIRE_TYPE) goto error;
parse_field2:
      input->ParseDouble();
      // Saves the switch() and type check branch if fields are in order.
      if (input->ExpectTag(3, INT32_WIRE_TYPE)) goto parse_field3;
      break;

    // optional int32 field3 = 3;
    case 3:
    // [...]
  }
}
</pre>
<p>This code also has a big <code>switch()</code> statement at the top level, but the targets are field numbers instead of wire types.  More importantly, each field&#8217;s case has a line like this:</p>
<pre lang="cpp">if (input->ExpectTag(3, INT32_WIRE_TYPE)) goto parse_field3;</pre>
<p>This is key, because if the fields occur in order (which they usually do) then this branch will always be taken and it will be predicted correctly by the CPU.  This can make a huge difference.</p>
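<p>The <code>ExpectTag()</code> call is what turns field order into a predictable branch. A minimal C sketch of the idea (the helper names are hypothetical, and this is not how the protobuf library implements it) is to precompute the encoded bytes of the tag you expect next and compare them against the input:</p>
<pre lang="c">
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical precomputed tag: the varint bytes of (field_number << 3) | wire_type. */
typedef struct {
  uint8_t bytes[5];  /* a 32-bit tag value needs at most 5 varint bytes */
  size_t len;
} expected_tag;

static expected_tag make_expected_tag(uint32_t field_number, uint32_t wire_type) {
  expected_tag t = {{0}, 0};
  uint32_t v = (field_number << 3) | wire_type;
  while (v >= 0x80) {
    t.bytes[t.len++] = (uint8_t)(v | 0x80);
    v >>= 7;
  }
  t.bytes[t.len++] = (uint8_t)v;
  return t;
}

/* If the input at *p starts with exactly the expected tag, consumes it and returns 1.
   When fields arrive in schema order this almost always returns 1, which is what
   makes the branch on its result easy for the CPU to predict. */
static int expect_tag(const uint8_t **p, const uint8_t *end, const expected_tag *t) {
  if ((size_t)(end - *p) < t->len || memcmp(*p, t->bytes, t->len) != 0) return 0;
  *p += t->len;
  return 1;
}
</pre>
<p>For field 3 with a varint wire type, the expected tag is the single byte 0x18; for field 22 it is the two bytes 0xb0 0x01.</p>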
<p>While this kind of C++ code generation has benefited Google&#8217;s protobuf software in speed, I always found it inconvenient from a dynamic-language perspective.  What dynamic language user wants to have to generate C++ code and link that into their interpreter?  We&#8217;re not talking about a single C++ extension that you compile once: every single message you define in a .proto file generates <i>different</i> C++ code.  So you can&#8217;t just <code>apt-get install libprotobuf-python</code> and be done with it; you have to generate C++ for every message that your specific app wants to use.</p>
<p>This is where the JIT comes in.  If we can generate machine code at runtime, we can get the speed benefits of the generated code without having to generate, compile, and link C++ into our interpreters or VMs.  You could compile the library once, and after that you can dynamically load message definitions but still get the fastest possible parsing.</p>
<h3>Preliminary results</h3>
<p>I knew all of the theory I just described before I started writing a JIT.  But theory is just that &#8212; theory.  What would actually happen when I implemented a JIT?</p>
<p>The results exceeded my expectations.  I am still being somewhat cautious, because there are so many dimensions to any performance equation, and because benchmarks are not guaranteed to correspond to real-world performance.  That said, here are my preliminary results.  You can reproduce these by <a href="https://github.com/haberman/upb">obtaining upb from GitHub</a> and running <code>make benchmarks</code> followed by <code>make benchmark</code> (you have to compile with <code>-DUPB_USE_JIT_X64</code> to get the JIT).</p>
<pre>Parsing an 80k protobuf into a data structure repeatedly,
calling Clear() between each parse.  (proto2 == Google protobuf)

proto2 table-based parser       38 MB/s
proto2 generated code parser   265 MB/s
upb table-based parser         340 MB/s
upb JIT parser                 741 MB/s</pre>
<p>The results above are designed to be as apples-to-apples as possible.  In particular, I disabled upb&#8217;s optimization that avoids <code>memcpy()</code> because Google&#8217;s protobuf doesn&#8217;t support it in its open-source release.  I think the reason that even my table-based parser beats Google&#8217;s generated code here is that proto2 implements Clear() in a sub-optimal way that requires an extra pass over the message tree; see <a href="https://github.com/haberman/upb/blob/7cf5893dcc755a1bc706536088db3d34cfc8c46b/src/upb_msg.h#L232">my comment in upb_msg.h</a> for more info.</p>
<p>Things get even better if we allow ourselves to drop the constraint of being apples-to-apples with proto2.  upb is capable of <i>stream-based parsing</i> like SAX, whereas proto2 only supports a DOM-based approach where you parse into a data structure.  If we include the performance numbers for stream parsing, and for DOM-based parsing that avoids memcpy, we get:</p>
<pre>upb table-based stream parser    420 MB/s
upb JIT no-memcpy() DOM parser   870 MB/s
upb JIT stream parser           1430 MB/s</pre>
<p>If you&#8217;re like me when I first saw these numbers, your jaw is on the floor at seeing almost 1.5GB/s doing stream parsing of protocol buffers.  At this point we are more than 5x as fast as proto2&#8217;s highly optimized generated code.</p>
<p>upb&#8217;s JIT isn&#8217;t complete and can&#8217;t handle all cases, but these performance numbers should still be valid because these benchmarks only used the parts of the format that <i>are</i> currently supported.</p>
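<p>To make the stream-vs-DOM distinction concrete: a stream parser hands each value to a callback as soon as it is decoded, instead of building a message object. The sketch below is a simplified illustration with hypothetical names; it is not upb&#8217;s actual handler interface. Note that the string callback receives a pointer into the input buffer, so no <code>memcpy()</code> is needed unless the handler decides to keep the data.</p>
<pre lang="c">
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SAX-style callbacks; either may be NULL to ignore that kind of field. */
typedef struct {
  void (*on_varint)(void *closure, uint32_t field_number, uint64_t value);
  void (*on_string)(void *closure, uint32_t field_number, const uint8_t *data, size_t len);
} handlers;

/* Decodes a varint at *p, advancing *p. Returns 0 on truncated or over-long input. */
static int read_varint(const uint8_t **p, const uint8_t *end, uint64_t *out) {
  uint64_t v = 0;
  for (int shift = 0; shift < 64 && *p < end; shift += 7) {
    uint8_t b = *(*p)++;
    v |= (uint64_t)(b & 0x7f) << shift;
    if (!(b & 0x80)) { *out = v; return 1; }
  }
  return 0;
}

/* Streams over the buffer, calling a handler per field; builds no data structure. */
static int parse_stream(const uint8_t *p, const uint8_t *end, const handlers *h, void *closure) {
  while (p < end) {
    uint64_t tag, val;
    if (!read_varint(&p, end, &tag)) return 0;
    uint32_t field = (uint32_t)(tag >> 3);
    switch (tag & 7) {
      case 0:  /* varint */
        if (!read_varint(&p, end, &val)) return 0;
        if (h->on_varint) h->on_varint(closure, field, val);
        break;
      case 2:  /* length-delimited: pass a pointer into the input, no copy */
        if (!read_varint(&p, end, &val) || val > (uint64_t)(end - p)) return 0;
        if (h->on_string) h->on_string(closure, field, p, (size_t)val);
        p += val;
        break;
      default:
        return 0;  /* other wire types omitted from this sketch */
    }
  }
  return 1;
}

/* Example handler: totals the bytes of string data without storing any of it. */
static void count_string_bytes(void *closure, uint32_t field_number,
                               const uint8_t *data, size_t len) {
  (void)field_number;
  (void)data;
  *(size_t *)closure += len;
}
</pre>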
<h3>So when can I USE it?  And can I help?</h3>
<p>One reason I&#8217;ve avoided posting these extremely positive results so far is that I hate to get people excited about something that&#8217;s not ready for users yet.</p>
<p>Adding support for the JIT required making extremely large and intrusive changes to the core interfaces, like <a href="https://github.com/haberman/upb/commit/8ef6873e0e14309a1715a252a650bab0ae1a33ef">this 3000-line refactoring</a> that had to be completed before I could write even the first line of the JIT.  I <i>need</i> to keep the flexibility to change core interfaces like this, because the design is still converging.  So as much as I wish I could say it&#8217;s ready to go, I still need to hold off on a real release.</p>
<p>The good news is that the design is still making huge strides.  In the last few weeks I&#8217;ve been refining my scheme for how upb will integrate into different VMs and language runtimes, and I feel more confident than ever that the language bindings for Lua, Python, etc. will be some of the fastest extensions ever offered for this kind of functionality.</p>
<p>As a preview of where this is going, I think that upb will even be usable as a JSON library, offering speed as good as or better than any existing JSON library.  JSON can be directly mapped onto a subset of protobufs (JSON only uses double-precision numbers) and the protobuf text format is already extremely similar to JSON.  So all the work I&#8217;ve done to optimize memory layout and dynamic language object access should apply.</p>
<p>And while I&#8217;m really happy to get offers to help out, it&#8217;s still at a stage of design where I need to be doing most of the work.  Working on upb generally involves pacing around my apartment deep in thought about all the requirements and use cases I want to satisfy and brainstorming a million different approaches until I converge on the one that is the smallest, fastest, and most flexible.  It&#8217;s hard work but I love it, and the more time that passes the more convinced I am that this is going to be big.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2011/04/25/upb-status-and-preliminary-performance-numbers/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Gazelle/upb status</title>
		<link>http://blog.reverberate.org/2011/01/28/gazelleupb-status/</link>
		<comments>http://blog.reverberate.org/2011/01/28/gazelleupb-status/#comments</comments>
		<pubDate>Fri, 28 Jan 2011 18:35:01 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[Gazelle]]></category>
		<category><![CDATA[upb]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=343</guid>
		<description><![CDATA[It has been just over a year since I last posted, leading some people to rightfully wonder whether my projects Gazelle and upb are abandoned. The answer to that question is a resounding &#8220;no.&#8221; I am more motivated to complete &#8230; <a href="http://blog.reverberate.org/2011/01/28/gazelleupb-status/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>It has been just over a year since I last posted, leading some people to rightfully wonder whether my projects Gazelle and upb are abandoned.  The answer to that question is a resounding &#8220;no.&#8221;  I am more motivated to complete Gazelle and upb than I have ever been, and I have been working on upb actively lately (<a href="https://github.com/haberman/upb/commits/src-refactoring">here are my recent commits</a>).</p>
<p>However, if I&#8217;ve learned one thing over the last several years of working on Gazelle and upb, it&#8217;s that I am extraordinarily bad at knowing how close I am to being ready to release.  I don&#8217;t want to make any predictions or promises.  I honestly thought I was almost ready to release upb a year and a half ago, but I&#8217;ve almost completely rewritten it twice since then.  Each time it gets significantly better and closer to being what I want it to be.  Once I have a release I&#8217;ll describe more about the stages of evolution it went through and how each iteration was objectively better than the one before it.</p>
<p>The core interfaces have gotten to a point where I&#8217;m really happy with them and feel no need to rework them again.  I think they will continue to evolve incrementally, but not in a way that requires redoing them completely.  Here are the most central interfaces; if you&#8217;re interested in upb, I recommend reading these headers.  I&#8217;ve just added substantial comments to explain them, and more than anything these will give you a taste of what upb is all about:</p>
<ul>
<li><a href="https://github.com/haberman/upb/blob/fbb9fd35e05b88908beeca2c2b88b15aec1fca01/core/upb_stream.h">upb_stream.h</a>, the streaming interfaces for doing SAX-like tree traversal of protobuf data, and abstractions of fread()/fwrite().  These are probably the most important interfaces in all of upb, since all of the encoders and decoders are based on them.</li>
<li><a href="https://github.com/haberman/upb/blob/fbb9fd35e05b88908beeca2c2b88b15aec1fca01/core/upb_string.h">upb_string.h</a>, an immutable, length-delimited (instead of NULL-terminated), reference-counted string type.</li>
<li><a href="https://github.com/haberman/upb/blob/fbb9fd35e05b88908beeca2c2b88b15aec1fca01/core/upb_def.h">upb_def.h</a>, the data structures for a protobuf schema (.proto file) and routines for loading them.</li>
</ul>
<p>By the way, I also mean to write zero-overhead C++ wrappers around the above to give you C++ programmers a nicer interface at no cost.</p>
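<p>The upb_string.h entry in the list above describes an immutable, length-delimited, reference-counted string. For readers unfamiliar with that combination, here is a minimal C99 sketch of the pattern. The names are hypothetical and this is not the contents of upb_string.h; it also uses a plain <code>int</code> reference count, so it is not thread-safe.</p>
<pre lang="c">
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical immutable string: length-delimited (may contain '\0', needs no
   terminator) and reference-counted. Callers must not modify data after creation. */
typedef struct {
  int refcount;  /* plain int: not safe to share across threads */
  size_t len;
  char data[];   /* C99 flexible array member holding exactly len bytes */
} rcstr;

static rcstr *rcstr_new(const char *bytes, size_t len) {
  rcstr *s = malloc(sizeof(rcstr) + len);
  if (s == NULL) return NULL;
  s->refcount = 1;
  s->len = len;
  if (len) memcpy(s->data, bytes, len);
  return s;
}

/* Sharing a string is just a refcount bump; no copy is made. */
static rcstr *rcstr_ref(rcstr *s) {
  s->refcount++;
  return s;
}

/* Drops one reference and frees the string when the last one is gone. */
static void rcstr_unref(rcstr *s) {
  if (--s->refcount == 0) free(s);
}
</pre>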
<p>With those set, I have been rapidly getting everything building/working again.  It&#8217;s a bit annoying to rewrite upb_def.c for the third time (literally) but it feels good knowing the interfaces are right.</p>
<p>So I have renewed optimism that I&#8217;ll be releasing soon.  And once I&#8217;m happy with upb, it&#8217;s back to Gazelle.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2011/01/28/gazelleupb-status/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Porting upb to C++?</title>
		<link>http://blog.reverberate.org/2009/11/28/porting-upb-to-c/</link>
		<comments>http://blog.reverberate.org/2009/11/28/porting-upb-to-c/#comments</comments>
		<pubDate>Sat, 28 Nov 2009 20:53:02 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[upb]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=287</guid>
		<description><![CDATA[I am on the verge of trying something I never thought I&#8217;d do. I&#8217;m considering porting upb to C++. My reasons aren&#8217;t ideological, they are highly practical. Basically I am realizing that while object-oriented C is OK for a while, &#8230; <a href="http://blog.reverberate.org/2009/11/28/porting-upb-to-c/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I am on the verge of trying something I never thought I&#8217;d do.  I&#8217;m considering porting upb to C++.</p>
<p>My reasons aren&#8217;t ideological; they are highly practical.  Basically, I am realizing that while object-oriented C is OK for a while, it&#8217;s very weak at inheritance.  Inheritance in C involves a lot of casting, duplicated code and/or macros, and careful discipline.  The main problems with this are:</p>
<ul>
<li>the code gets longer and less readable</li>
<li>the code involves more possibly-unsafe operations like casts</li>
</ul>
<p>Both of these problems make the code ultimately more difficult to audit for security.  And getting upb audited for security is something I plan to do very soon.</p>
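<p>For readers who haven&#8217;t written object-oriented C, here is a minimal sketch of the usual idiom (hypothetical types, not code from upb): the &#8220;derived&#8221; struct embeds the &#8220;base&#8221; struct as its first member, and methods dispatch through a table of function pointers. Every downcast is an unchecked cast that is only correct if that layout discipline is followed, which is exactly the kind of operation an auditor has to check by hand.</p>
<pre lang="c">
#include <stddef.h>
#include <string.h>

struct stream;

/* The "base class" vtable: one function pointer per virtual method. */
typedef struct {
  size_t (*read)(struct stream *s, char *buf, size_t len);
} stream_vtbl;

/* The "base class" itself. */
typedef struct stream {
  const stream_vtbl *vtbl;
} stream;

/* A "derived class": the base MUST be the first member for the casts below to be valid. */
typedef struct {
  stream base;
  const char *data;
  size_t pos, len;
} string_stream;

static size_t string_stream_read(stream *s, char *buf, size_t len) {
  string_stream *ss = (string_stream *)s;  /* downcast: nothing checks s really is a string_stream */
  size_t n = ss->len - ss->pos;
  if (n > len) n = len;
  if (n) memcpy(buf, ss->data + ss->pos, n);
  ss->pos += n;
  return n;
}

static const stream_vtbl string_stream_vtbl = {string_stream_read};

static void string_stream_init(string_stream *ss, const char *data, size_t len) {
  ss->base.vtbl = &string_stream_vtbl;
  ss->data = data;
  ss->pos = 0;
  ss->len = len;
}

/* Generic code sees only the base type and dispatches through the vtable. */
static size_t stream_read(stream *s, char *buf, size_t len) {
  return s->vtbl->read(s, buf, len);
}
</pre>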
<p>I am coming to believe that porting to C++ would make upb smaller (in lines of code) and easier to verify for security.  However, there are a few major disadvantages that are giving me pause:</p>
<ul>
<li>there are still some contexts in which C++ is a no-go, like the Linux kernel, embedded systems that only have a C compiler (but no C++), or projects that want to stay C-only.  Doing this port would make upb unsuitable for these contexts.</li>
<li>projects that are currently C-only would need to create C++ source files to call upb APIs, and would have to link in the C++ runtime.</li>
<li>(possible) C++ could result in a larger binary.</li>
</ul>
<p>When I look at the downsides though, they don&#8217;t seem to pertain to my initial goals of making upb useful for Python, Lua, Ruby, etc. extensions, and for use inside Google.  Being useful for really restricted embedded systems is a far-off use case.  So it&#8217;s sounding like porting to C++ is the right thing to do.</p>
<p>I hope it significantly reduces the line count, as I expect it will.  That will make me feel better about giving up the minimalism of C.  I will definitely be compiling with <tt>-fno-exceptions -fno-rtti -fvisibility-inlines-hidden</tt> on gcc.  I also won&#8217;t be using any of the C++ standard library (not even &lt;string&gt;).</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/11/28/porting-upb-to-c/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Site Overhaul / upb status</title>
		<link>http://blog.reverberate.org/2009/07/06/site-overhaul-upb-status/</link>
		<comments>http://blog.reverberate.org/2009/07/06/site-overhaul-upb-status/#comments</comments>
		<pubDate>Tue, 07 Jul 2009 06:11:39 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[upb]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=220</guid>
		<description><![CDATA[You&#8217;ll notice that the site has a new theme, a new title, maybe a little bit of a new attitude (being &#8220;josh the outspoken&#8221; isn&#8217;t all it&#8217;s cracked up to be). All this is in anticipation of upb&#8217;s release, and &#8230; <a href="http://blog.reverberate.org/2009/07/06/site-overhaul-upb-status/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>You&#8217;ll notice that the site has a new theme, a new title, maybe a little bit of a new attitude (being &#8220;josh the outspoken&#8221; isn&#8217;t all it&#8217;s cracked up to be).  All this is in anticipation of upb&#8217;s release, and of a lot of follow-up work that I couldn&#8217;t be more excited about.</p>
<p>A lot of people mentioned that my last theme didn&#8217;t handle preview correctly, which I agree was quite annoying &#8212; sorry about that!  Unfortunately my new theme doesn&#8217;t have preview at all.  Finding the perfect WordPress theme is harder than it seems.</p>
<p>So, as I mentioned, I expect the first release of upb to come within the next week or two.  It will only have parsing (not serializing), but most of the core functionality besides that is done.  I expect that the core will stay relatively static, and all the enhancements/innovations will come from modules that are layered on top of that.  For example, some of the things I have in mind are:</p>
<ul>
<li>Extensions for every language known to man.  The first ones will be Python, Lua, and Ruby (in that order).  These will take some thought to get just right, because I want the memory management to be tightly integrated.  In particular, I want protobuf objects in these languages to be able to reference string data from the original protobufs, but all be memory-managed appropriately.</li>
<li>Support for the Protocol Buffer text format.</li>
<li>An easy way to parse only selected fields.  For example, I want to be able to say (in Python):
<pre lang="Python">time, url = logrecord.parse_fields(proto_data, "time", "url")</pre>
<p>This can be optimized out the wazoo, because you can skip all fields and submessages that you don&#8217;t care about, and you can stop parsing once you have the data you need.</li>
</ul>
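<p>The selective-parsing idea can be sketched in C against the standard protobuf wire format.  This is a minimal sketch, not upb&#8217;s API: <tt>find_varint_field</tt> and <tt>read_varint</tt> are hypothetical names, the sketch only extracts a varint-typed field, and it treats groups (wire types 3 and 4) as unsupported.  Everything it does not want, including strings and whole submessages, is skipped by its length without being decoded, and it returns as soon as the requested field turns up.</p>
<pre lang="C">#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Decode one base-128 varint starting at p. Returns the number of bytes
 * consumed, or 0 if the input is truncated or longer than 10 bytes. */
static size_t read_varint(const uint8_t *p, const uint8_t *end, uint64_t *out)
{
  uint64_t val = 0;
  size_t i = 0;
  for (int shift = 0; shift < 64; shift += 7, i++) {
    if (p + i >= end) return 0;                /* ran out of input */
    val |= (uint64_t)(p[i] & 0x7F) << shift;   /* low 7 bits carry data */
    if ((p[i] & 0x80) == 0) {                  /* high bit clear: last byte */
      *out = val;
      return i + 1;
    }
  }
  return 0;
}

/* Scan a serialized message for the first varint field numbered `want`.
 * Returns 1 if found (value in *out), 0 if absent, -1 if malformed. */
static int find_varint_field(const uint8_t *p, size_t len, uint32_t want,
                             uint64_t *out)
{
  const uint8_t *end = p + len;
  while (p < end) {
    uint64_t key, v;
    size_t n = read_varint(p, end, &key);
    if (n == 0) return -1;
    p += n;
    uint32_t field = (uint32_t)(key >> 3);     /* key = field << 3 | type */
    switch (key & 7) {
      case 0:  /* varint: decode it, since it might be the one we want */
        n = read_varint(p, end, &v);
        if (n == 0) return -1;
        if (field == want) { *out = v; return 1; }  /* stop early */
        p += n;
        break;
      case 1:  /* fixed64: skip 8 bytes unread */
        if (end - p < 8) return -1;
        p += 8;
        break;
      case 2:  /* length-delimited (strings, bytes, submessages): skip */
        n = read_varint(p, end, &v);
        if (n == 0 || v > (uint64_t)((size_t)(end - p) - n)) return -1;
        p += n + v;
        break;
      case 5:  /* fixed32: skip 4 bytes unread */
        if (end - p < 4) return -1;
        p += 4;
        break;
      default:  /* groups (3, 4) and invalid wire types: not handled */
        return -1;
    }
  }
  return 0;
}

int main(void)
{
  /* field 1 = "hi" (string), field 2 = 150 (varint), field 3 = 7 (varint) */
  const uint8_t msg[] = {0x0A, 0x02, 'h', 'i', 0x10, 0x96, 0x01, 0x18, 0x07};
  uint64_t v;
  if (find_varint_field(msg, sizeof msg, 2, &v) == 1)
    printf("field 2 = %llu\n", (unsigned long long)v);
  return 0;
}</pre>
<p>Stopping at the first occurrence assumes each wanted field appears at most once; strictly, protobuf gives the <em>last</em> occurrence of a non-repeated field precedence.  Asking for several fields at once, as in the Python example, would extend the same loop to track which requested fields are still missing and return when none remain.</p>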
<p>And of course, performance, performance, performance.  I can&#8217;t <em>wait</em> to get my hands on some real profiles and see what my next optimization target will be.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/07/06/site-overhaul-upb-status/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Amazing Tools: Massif, a heap profiler</title>
		<link>http://blog.reverberate.org/2009/07/06/amazing-tools-massif-a-heap-profiler/</link>
		<comments>http://blog.reverberate.org/2009/07/06/amazing-tools-massif-a-heap-profiler/#comments</comments>
		<pubDate>Tue, 07 Jul 2009 05:54:11 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[upb]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=207</guid>
		<description><![CDATA[I love the feeling of discovering an amazing new tool. It&#8217;s a pleasant surprise to have some task you want to achieve &#8212; one that you could do manually, given enough time &#8212; and find that some tool you didn&#8217;t &#8230; <a href="http://blog.reverberate.org/2009/07/06/amazing-tools-massif-a-heap-profiler/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I love the feeling of discovering an amazing new tool.  It&#8217;s a pleasant surprise to have some task you want to achieve &#8212; one that you could do manually, given enough time &#8212; and find that some tool you didn&#8217;t even know existed will make the solution easy.</p>
<p>Tonight I was working on the upb compiler (and upb&#8217;s first release is impending, by the way), and ran it under Valgrind as I frequently do to catch memory leaks.  There weren&#8217;t any leaks, but I did notice that the program had allocated 80kb of memory over the course of its run.</p>
<p>People who are less OCD than I would probably shrug off 80kb of memory.  But intuitively 80kb sounded high to me given how much data this program was dealing with, and I wanted to know where all those allocations were coming from.</p>
<p>I didn&#8217;t know off the top of my head how I might profile where my memory allocations were happening, but I had a hunch that Valgrind would be there for me.  And sure enough, one of the tools included in Valgrind is <a href="http://valgrind.org/docs/manual/ms-manual.html">Massif: a heap profiler</a>.</p>
<p>A few short shell commands later:</p>
<pre lang="bash">$ valgrind --tool=massif ./upbc
$ ms_print massif.out.17604 | less</pre>
<p>&#8230;and I had this call graph sitting in front of me:</p>
<pre>93.72% (74,243B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->58.49% (46,336B) 0x40194E: upb_table_init (upb_table.c:34)
| ->38.13% (30,208B) 0x4019E9: upb_strtable_init (upb_table.c:45)
| | ->29.09% (23,040B) 0x40343A: upb_msg_init (upb_msg.c:44)
| | | ->29.09% (23,040B) 0x402CEE: insert_message (upb_context.c:193)
| | |   ->25.85% (20,480B) 0x402E80: addfd (upb_context.c:223)
| | |   | ->12.93% (10,240B) 0x40263F: upb_context_init (upb_context.c:30)
| | |   | | ->12.93% (10,240B) 0x40167C: main (upbc.c:195)
| | |   | |
| | |   | ->12.93% (10,240B) 0x403135: upb_context_parsefds (upb_context.c:283)
| | |   |   ->12.93% (10,240B) 0x4016BA: main (upbc.c:198)
| | |   |
| | |   ->03.23% (2,560B) 0x402D56: insert_message (upb_context.c:203)
| | |     ->03.23% (2,560B) 0x402E80: addfd (upb_context.c:223)
| | |       ->01.62% (1,280B) 0x40263F: upb_context_init (upb_context.c:30)
| | |       | ->01.62% (1,280B) 0x40167C: main (upbc.c:195)
| | |       |
| | |       ->01.62% (1,280B) 0x403135: upb_context_parsefds (upb_context.c:283)
| | |         ->01.62% (1,280B) 0x4016BA: main (upbc.c:198)
| | |
| | ->05.82% (4,608B) 0x4046E0: upb_enum_init (upb_enum.c:14)
| | | ->05.82% (4,608B) 0x402C35: insert_enum (upb_context.c:167)
| | |   ->05.82% (4,608B) 0x402DBC: insert_message (upb_context.c:209)
| | |     ->05.82% (4,608B) 0x402E80: addfd (upb_context.c:223)
| | |       ->03.23% (2,560B) 0x403135: upb_context_parsefds (upb_context.c:283)
| | |       | ->03.23% (2,560B) 0x4016BA: main (upbc.c:198)
| | |       |
| | |       ->02.59% (2,048B) 0x40263F: upb_context_init (upb_context.c:30)
| | |         ->02.59% (2,048B) 0x40167C: main (upbc.c:195)
| | |
| | ->01.62% (1,280B) 0x402612: upb_context_init (upb_context.c:27)
| | | ->01.62% (1,280B) 0x40167C: main (upbc.c:195)
| | |
| | ->01.62% (1,280B) 0x402629: upb_context_init (upb_context.c:28)
| | | ->01.62% (1,280B) 0x40167C: main (upbc.c:195)
| | |
| | ->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%)
| |
| ->20.36% (16,128B) 0x4019C4: upb_inttable_init (upb_table.c:40)
|   ->17.45% (13,824B) 0x403417: upb_msg_init (upb_msg.c:42)
|   | ->17.45% (13,824B) 0x402CEE: insert_message (upb_context.c:193)
|   |   ->15.51% (12,288B) 0x402E80: addfd (upb_context.c:223)
|   |   | ->07.76% (6,144B) 0x40263F: upb_context_init (upb_context.c:30)
|   |   | | ->07.76% (6,144B) 0x40167C: main (upbc.c:195)
|   |   | |
|   |   | ->07.76% (6,144B) 0x403135: upb_context_parsefds (upb_context.c:283)
|   |   |   ->07.76% (6,144B) 0x4016BA: main (upbc.c:198)
|   |   |
|   |   ->01.94% (1,536B) 0x402D56: insert_message (upb_context.c:203)
|   |     ->01.94% (1,536B) 0x402E80: addfd (upb_context.c:223)
|   |       ->01.94% (1,536B) in 2 places, all below massif's threshold (01.00%)
|   |
|   ->02.91% (2,304B) 0x4046F5: upb_enum_init (upb_enum.c:15)
|     ->02.91% (2,304B) 0x402C35: insert_enum (upb_context.c:167)
|       ->02.91% (2,304B) 0x402DBC: insert_message (upb_context.c:209)
|         ->02.91% (2,304B) 0x402E80: addfd (upb_context.c:223)
|           ->01.62% (1,280B) 0x403135: upb_context_parsefds (upb_context.c:283)
|           | ->01.62% (1,280B) 0x4016BA: main (upbc.c:198)
|           |
|           ->01.29% (1,024B) 0x40263F: upb_context_init (upb_context.c:30)
|             ->01.29% (1,024B) 0x40167C: main (upbc.c:195)
|
->07.98% (6,324B) 0x403AF1: upb_msgdata_new (upb_msg.c:158)
| ->07.96% (6,308B) 0x403F05: upb_msg_reuse_submsg (upb_msg.c:241)
| | ->07.96% (6,308B) 0x404497: submsg_start_cb (upb_msg.c:325)
| |   ->07.96% (6,308B) 0x4056E8: push_stack_frame (upb_parse.c:288)
| |     ->07.96% (6,308B) 0x4057ED: parse_delimited (upb_parse.c:316)
| |       ->07.96% (6,308B) 0x405A91: upb_parse (upb_parse.c:369)
| |         ->07.96% (6,308B) 0x4045BA: upb_msg_parse (upb_msg.c:352)
| |           ->07.96% (6,308B) 0x404631: upb_alloc_and_parse (upb_msg.c:361)
| |             ->07.96% (6,308B) 0x4030CA: upb_context_parsefds (upb_context.c:274)
| |               ->07.96% (6,308B) 0x4016BA: main (upbc.c:198)</pre>
<p>This might look somewhat daunting if you&#8217;re not as deeply familiar with upb as I am.  But it immediately told me what I wanted to know: almost 60% of the memory is being used by upb&#8217;s int->record and string->record hash tables.  That seems a little bit high.  And it&#8217;s being allocated right when the tables are constructed (<tt>upb_table_init</tt>), not as a result of a resize.</p>
<p>Breaking open the code, I found a table minimum size that I had imposed as an attempt to limit the number of resizes.  Resizes have a high overhead &#8212; in my hash table implementation, they result in everything being re-hashed and all the memory being re-allocated, so I had imposed a minimum size of 16 in my constructor:</p>
<pre lang="C">void upb_table_init(struct upb_table *t, uint32_t size, uint16_t entry_size)
{
  t->count = 0;
  t->entry_size = entry_size;
  t->size_lg2 = 1;
  while(size >>= 1) t->size_lg2++;
  t->size_lg2 = max(t->size_lg2, 4);  /* Min size of 16. */</pre>
<p>When I inserted some print statements to see how often this minimum was taking effect, I saw that tons of tables were trying to allocate just a few (0-10) entries.  With my minimum, they were always allocated at least 16.  What&#8217;s more, in all these cases I knew <em>up front</em> how many entries I planned to insert!  So there was no danger of a resize anyway.</p>
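<p>Here is a small sketch that reproduces the sizing arithmetic from the <tt>upb_table_init</tt> excerpt above, with and without the minimum.  It assumes the table allocates 2<sup>size_lg2</sup> slots and uses a made-up 16-byte entry size; real entry sizes depend on the table type, and <tt>table_size_lg2</tt>/<tt>table_slots</tt> are illustrative names.</p>
<pre lang="C">#include <stdint.h>
#include <stdio.h>

/* Same arithmetic as the upb_table_init excerpt: start at 1 and add one
 * per right shift, so size_lg2 = floor(log2(size)) + 1 for size >= 1
 * (and 1 for size 0). min_lg2 models the imposed minimum: 4 means at
 * least 16 slots, 0 means no minimum. */
static uint32_t table_size_lg2(uint32_t size, uint32_t min_lg2)
{
  uint32_t size_lg2 = 1;
  while (size >>= 1) size_lg2++;
  return size_lg2 > min_lg2 ? size_lg2 : min_lg2;
}

/* Assumption: the table allocates 2^size_lg2 slots. */
static uint32_t table_slots(uint32_t size, uint32_t min_lg2)
{
  return UINT32_C(1) << table_size_lg2(size, min_lg2);
}

int main(void)
{
  const uint32_t entry_size = 16;  /* hypothetical bytes per entry */
  for (uint32_t n = 0; n <= 10; n++) {
    printf("%2u entries: %3u bytes with the minimum, %3u without\n",
           (unsigned)n,
           (unsigned)(table_slots(n, 4) * entry_size),
           (unsigned)(table_slots(n, 0) * entry_size));
  }
  return 0;
}</pre>
<p>Under these assumptions, requests for 8 to 10 entries get 16 slots either way, so the savings from dropping the minimum come entirely from tables asking for fewer than 8 entries.</p>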
<p>I removed this minimum size and the memory usage of my program dipped to about 55kb (from 80kb &#8212; a ~30% reduction!)  That seems a bit more reasonable, though I&#8217;m sure it&#8217;s not the last of my efforts to make sure upb&#8217;s memory footprint stays small.</p>
<p>Anyway, the point of this entry is that now I know about a new tool (Massif) that is at my disposal whenever I need it.  It&#8217;s easy to use and requires almost no set-up.  I can run it on a whim whenever I want to collect memory usage data.  I have just become a little bit more resourceful.</p>
<p>Valgrind has tons of spiffy tools of this sort that ship with it.  I wonder how many people know about them.</p>
<p>Another tool I had a similar reaction to was <a href="http://www.wireshark.org/">Wireshark</a>.  I was experiencing a redirect loop bug in my browser and wanted to submit a useful report to the developers.  The useful information here is the contents of all the HTTP traffic that occurred during the loop.  I fired up Wireshark (as a first-time user) and found out relatively quickly how to sniff the network interface, capture my HTTP session, view it at the HTTP layer (as opposed to the TCP layer or something else), and dump it to a text file.  Massively spiffy.</p>
<p>Learn an amazing new tool today!  And then tell me about it!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/07/06/amazing-tools-massif-a-heap-profiler/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>&#956;pb Status Update</title>
		<link>http://blog.reverberate.org/2009/06/26/%ce%bcpb-status-update/</link>
		<comments>http://blog.reverberate.org/2009/06/26/%ce%bcpb-status-update/#comments</comments>
		<pubDate>Sat, 27 Jun 2009 03:01:01 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[upb]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=180</guid>
		<description><![CDATA[I haven&#8217;t posted many status updates for upb lately. Sometimes that means I&#8217;m busy with other things, but right now it means I am working on it feverishly and can hardly stand to take a break from it. I&#8217;m extremely &#8230; <a href="http://blog.reverberate.org/2009/06/26/%ce%bcpb-status-update/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I haven&#8217;t posted many status updates for <a href="http://github.com/haberman/upb/tree/master">upb</a> lately.  Sometimes that means I&#8217;m busy with other things, but right now it means I am working on it feverishly and can hardly stand to take a break from it.</p>
<p>I&#8217;m extremely happy with how it&#8217;s shaping up.  It&#8217;s getting more and more complete, and yet still staying quite &#8220;micro.&#8221;  Notably, it just recently crossed 1500 lines of C (I don&#8217;t count auto-generated code in this), and it compiles to not quite 10kb of object code.  Throw in the auto-generated code (reflection data that describes the types in descriptor.proto &#8212; this is what allows loading other proto definitions at runtime) and this jumps to about 16kb.  Keep in mind that this effectively has all the major features of the main protobuf implementation (albeit packaged in different ways), but the main protobuf implementation is just over 1MB of object code!</p>
<p>I still think I can reach the main protobuf implementation&#8217;s performance, or maybe even exceed it.  I will consider the project a failure if I can&#8217;t get within 15% of its performance.</p>
<p>Anyway, just wanted to be sure everyone knows I&#8217;m still alive and very much working on this.  Now back to work.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/06/26/%ce%bcpb-status-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bit-fields in C99</title>
		<link>http://blog.reverberate.org/2009/06/26/bit-fields-in-c99/</link>
		<comments>http://blog.reverberate.org/2009/06/26/bit-fields-in-c99/#comments</comments>
		<pubDate>Sat, 27 Jun 2009 02:48:09 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[upb]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=168</guid>
		<description><![CDATA[Recently I came upon some spirited discussion on reddit concerning a blog post that discussed the use of bit-fields in C. As a quick refresher to anyone unfamiliar or rusty on bit-fields, they are a construct in C that lets &#8230; <a href="http://blog.reverberate.org/2009/06/26/bit-fields-in-c99/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Recently I came upon some <a href="http://www.reddit.com/r/programming/comments/8vj04/a_much_better_way_of_handling_bitfields_in_c_more/">spirited discussion on reddit</a> concerning <a href="http://www.pagetable.com/?p=250">a blog post that discussed the use of bit-fields in C</a>.  As a quick refresher to anyone unfamiliar or rusty on bit-fields, they are a construct in C that lets you specify how many bits different members of a structure should be allocated:</p>
<pre lang="C">
struct foo {
  unsigned int a:1;
  unsigned int b:2;
};</pre>
<p>Without the bit-field specifiers (the &#8220;:1&#8221; and &#8220;:2&#8221; above) this structure would hold two full <tt>unsigned int</tt>s: at least four bytes in total (eight on typical platforms with a 32-bit <tt>int</tt>), with both &#8220;a&#8221; and &#8220;b&#8221; able to represent at least 0 to 65535 (0 to 4294967295 with a 32-bit <tt>int</tt>).  With the bit-field specifiers, &#8220;a&#8221; and &#8220;b&#8221; can only represent 0-1 and 0-3, respectively, and they share a single storage unit, so the structure shrinks to that one unit (typically four bytes, one <tt>unsigned int</tt>, with GCC on x86).  Bit-fields are a way to store many distinct values compactly inside a struct, to save memory.</p>
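<p>To see what your own compiler does, a small program like this one prints both sizes and shows that an unsigned bit-field keeps only as many low bits as its width.  It is a sketch; the struct and function names are made up for illustration.  With GCC on x86 one would typically see sizes of 8 and 4, but the exact numbers are implementation-defined.</p>
<pre lang="C">#include <stdio.h>

/* The same two members, without and with bit-field widths. */
struct without_widths {
  unsigned int a;
  unsigned int b;
};

struct with_widths {
  unsigned int a:1;
  unsigned int b:2;
};

/* Store v in the 2-bit member and read it back. Assigning to an
 * unsigned bit-field reduces the value modulo 2^width, so only the
 * low two bits of v survive. */
static unsigned roundtrip_b(unsigned v)
{
  struct with_widths s = {0, 0};
  s.b = v;
  return s.b;
}

int main(void)
{
  printf("sizeof(struct without_widths) = %zu\n", sizeof(struct without_widths));
  printf("sizeof(struct with_widths)    = %zu\n", sizeof(struct with_widths));
  printf("storing 3 in b reads back %u\n", roundtrip_b(3));
  printf("storing 5 in b reads back %u\n", roundtrip_b(5));
  return 0;
}</pre>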
<p>I am using bit-fields in <a href="http://github.com/haberman/upb/tree/master">upb</a> to store flags for whether each field is set or not.  When I generate a structure definition for a specific proto type, code in the header file looks something like this:</p>
<pre lang="C">struct google_protobuf_DescriptorProto {
  union {
    uint8_t bytes[1];
    struct {
      bool name:1;  /* = 1, optional. */
      bool field:1;  /* = 2, repeated. */
      bool nested_type:1;  /* = 3, repeated. */
      bool enum_type:1;  /* = 4, repeated. */
      bool extension_range:1;  /* = 5, repeated. */
      bool extension:1;  /* = 6, repeated. */
      bool options:1;  /* = 7, optional. */
    } has;
  } set_flags;
  struct upb_string* name;
  UPB_STRUCT_ARRAY(google_protobuf_FieldDescriptorProto)* field;
  UPB_STRUCT_ARRAY(google_protobuf_FieldDescriptorProto)* extension;
  UPB_STRUCT_ARRAY(google_protobuf_DescriptorProto)* nested_type;
  UPB_STRUCT_ARRAY(google_protobuf_EnumDescriptorProto)* enum_type;
  UPB_STRUCT_ARRAY(google_protobuf_DescriptorProto_ExtensionRange)* extension_range;
  google_protobuf_MessageOptions* options;
};</pre>
<p>(that&#8217;s from <a href="http://github.com/haberman/upb/blob/c7f2a271ae29066744cf09499f744a0c6b89a27e/descriptor.h">descriptor.h</a>, which is automatically generated from descriptor.proto from the official protobuf implementation).</p>
<p>I &#8220;union&#8221; the bit-field struct with raw bytes so I can set or clear the flags en masse more efficiently.  This seems like a really nice solution, because then client code that wants to test whether a field is set in a protobuf can write code like:</p>
<pre lang="C">if(fd->set_flags.has.message_type) {
   // ...
}</pre>
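<p>Here is a minimal, self-contained sketch of that union pattern.  <tt>demo_flags</tt> and <tt>set_all_flags</tt> are made-up names for illustration, not part of the generated header.  One detail worth noting: clearing through <tt>sizeof</tt> of the whole union, rather than of the one-byte <tt>bytes</tt> array, stays correct even if a compiler gives the bit-field struct more than one byte.</p>
<pre lang="C">#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* A cut-down, hypothetical flags union in the style of the generated
 * header: one bit per field, overlaid on raw bytes. */
union demo_flags {
  uint8_t bytes[1];
  struct {
    bool name:1;
    bool field:1;
    bool options:1;
  } has;
};

/* Clear or set every flag at once through the byte view. */
static void set_all_flags(union demo_flags *f, bool value)
{
  memset(f, value ? 0xFF : 0x00, sizeof(*f));
}

int main(void)
{
  union demo_flags f;
  set_all_flags(&f, false);
  f.has.options = true;  /* set one flag through the bit-field view */
  printf("name=%d options=%d\n", f.has.name, f.has.options);
  set_all_flags(&f, true);
  printf("name=%d field=%d\n", f.has.name, f.has.field);
  return 0;
}</pre>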
<p>So unfortunately the commenters on Reddit were pretty down on bit-fields and the aforementioned article.  The points they raised filled me with despair that I would have to abandon bit-fields.  This is bad because the only real alternative I would have for this kind of code generation is to generate explicit getters and setters for each bit of each structure!  The code would then immediately become quite uglified:</p>
<pre lang="C">if(google_protobuf_FileDescriptorProto_has_message_type(fd)) {
  // ...
}</pre>
<p>Besides being longer and forcing you to re-type the full name of the message type, this approach forces you to generate lots of nearly-identical code and pollute the symbol namespace with tons of these useless functions that add no real value.  I was bumming about this.</p>
<p>But as I investigated the supposed problems with bit-fields, I became convinced that the problems were tractable and that I could in good conscience press forward with my plans to use them as intended.</p>
<h3>Here Be Dragons</h3>
<p><i>&#8220;Almost everything about [bit] fields is implementation-dependent.&#8221;</i> &#8211;K&#038;R, Section 6.9</p>
<p>From my reading of the <a href="http://www.open-std.org/JTC1/SC22/wg14/www/docs/n1124.pdf">C99 standard</a>, here are the guarantees you get (or don&#8217;t get) with bit-fields.</p>
<p><b>1. Bit-fields will be packed as tightly as possible, provided they don&#8217;t cross storage unit boundaries: YES.</b>  From the standard (6.7.2.1 #10):</p>
<blockquote><p>An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit.</p></blockquote>
<p>The first part just means that the compiler can allocate any number of bytes to a bit-field, as long as it has enough bits.  So if you write:</p>
<pre lang="C">struct foo {
  unsigned int a:8;
};</pre>
<p>&#8230;a will be at least 1 byte long, but could be more.  But the second part is where we get the nice guarantee that:</p>
<pre lang="C">struct foo {
  unsigned int a:1;
  unsigned int b:2;
};</pre>
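<p>&#8230;will always have a and b packed into adjacent bits of the same storage unit (in practice, the same byte).  sizeof(struct foo) could be greater than 1 (with GCC on x86 it is typically 4, the size of an <tt>unsigned int</tt>), but you are guaranteed that a and b share a unit.</p>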
<p><b>2. Bits are allocated low-to-high (or high-to-low): NO</b>.  This is not guaranteed by the spec, so an implementation could choose to allocate bits either starting at the high bit or the low bit of each byte.  The following program will test an implementation to see which it is:</p>
<pre lang="C">#include <stdio.h>
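int main(void) {
  /* Overlay three bit-fields on a single byte so we can see where the
     compiler put them. */
  union {
    struct {
      unsigned int a:1;
      unsigned int b:2;
      unsigned int c:3;
    } bitfield;
    unsigned char ch;
  } u = {.ch = 0};

  u.bitfield.a = 1;  /* one bit:    binary 1   */
  u.bitfield.c = 7;  /* three bits: binary 111 */
  switch(u.ch) {
    /* a in bit 0, b in bits 1-2, c in bits 3-5: 0x01 | 0x38 */
    case 0x39: printf("Bits allocated LOW-TO-HIGH.\n"); break;
    /* a in bit 7, b in bits 6-5, c in bits 4-2: 0x80 | 0x1C */
    case 0x9C: printf("Bits allocated HIGH-TO-LOW.\n"); break;
    default: printf("Oddball allocation: 0x%02hhx.\n", u.ch); break;
  }
  return 0;
}</pre>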
<h3>My Goal</h3>
<p>What I am trying to do is create a C struct like the above, but <i>also</i> be able to perform data-driven reflection on that C struct based on generic routines like:</p>
<pre lang="C">INLINE bool upb_msg_is_set(void *s, int index)
{
  return ((char*)s)[index / 8] &#038; (1 << (index % 8));
}</pre>
<p>This means that reflection routines like the one above need to mimic the implementation-defined bit ordering described earlier.  The routine above does not; it assumes that bits are assigned from low to high.  But a few #ifdefs should make that adaptation not too difficult.</p>
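<p>One way those #ifdefs might look is sketched below.  <tt>HYPOTHETICAL_BITFIELDS_HIGH_TO_LOW</tt>, <tt>FLAG_MASK</tt>, <tt>msg_is_set</tt> and <tt>union probe</tt> are all made-up names: neither C99 nor upb (as far as this post says) defines such a macro, so the build would have to set it, for example based on the output of the probe program above.  The small <tt>main</tt> cross-checks the mask against the compiler&#8217;s own layout.</p>
<pre lang="C">#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical configuration macro: define it when the compiler
 * allocates bit-fields starting at the high bit of each byte. */
#ifdef HYPOTHETICAL_BITFIELDS_HIGH_TO_LOW
#define FLAG_MASK(index) (0x80u >> ((index) % 8))
#else
#define FLAG_MASK(index) (0x01u << ((index) % 8))
#endif

/* Data-driven "is field N set?" test over the raw flag bytes. Using
 * unsigned char keeps byte values in 0..255 whether or not plain char
 * is signed. */
static bool msg_is_set(const void *s, int index)
{
  const unsigned char *bytes = s;
  return (bytes[index / 8] & FLAG_MASK(index)) != 0;
}

/* Three flags laid out by the compiler, used to cross-check FLAG_MASK. */
union probe {
  unsigned char bytes[1];
  struct {
    bool f0:1;
    bool f1:1;
    bool f2:1;
  } has;
};

int main(void)
{
  union probe p;
  memset(&p, 0, sizeof p);
  p.has.f2 = true;  /* set the third flag through the bit-field */
  printf("field 0 set? %d\n", msg_is_set(&p, 0));
  printf("field 2 set? %d\n", msg_is_set(&p, 2));
  return 0;
}</pre>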
<h3>Performance</h3>
<p>In theory, the ideal compiler should love bit-fields because they give it maximum information with which to optimize loads and stores.  I see a lot of people claim that some compilers generate bad code for bit-fields, but that is a possibility with any construct you use.  From what I can see GCC generates very good code for bit-fields.</p>
<p>So my conclusion is that the hurdles of bit-fields are tractable, and unless some new information comes my way, I plan to move forward with using them.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/06/26/bit-fields-in-c99/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>pbstream&#8217;s name</title>
		<link>http://blog.reverberate.org/2009/05/03/pbstreams-name/</link>
		<comments>http://blog.reverberate.org/2009/05/03/pbstreams-name/#comments</comments>
		<pubDate>Mon, 04 May 2009 03:09:27 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[upb]]></category>
		<category><![CDATA[pbstream]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=158</guid>
		<description><![CDATA[I hate naming things. I&#8217;m starting to realize that pbstream is outgrowing its name. That&#8217;s the one thing I always thought was brilliant about the iPod&#8217;s name. When the iPod started playing video it was no problem &#8212; it&#8217;s a &#8230; <a href="http://blog.reverberate.org/2009/05/03/pbstreams-name/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I hate naming things.  I&#8217;m starting to realize that pbstream is outgrowing its name.  That&#8217;s the one thing I always thought was brilliant about the iPod&#8217;s name.  When the iPod started playing video it was no problem &#8212; it&#8217;s a pod!  Why <i>shouldn&#8217;t</i> a pod play video?  There&#8217;s no way to outgrow the name &#8220;iPod.&#8221;  Brilliant.</p>
<p>I had no such luck with pbstream.  The tagline is currently:</p>
<p><code> pbstream - a stream-oriented implementation of protocol buffers.</code></p>
<p>It&#8217;s true that offering streaming semantics is one distinguishing feature of my implementation, but the way it&#8217;s growing, it&#8217;s no longer really &#8220;stream-oriented.&#8221;  It will offer both stream-oriented and structure-oriented semantics.  Both will be equally supported/encouraged &#8212; it will be a matter of what best suits your needs.</p>
<p>If anything, the number one distinguishing feature of my implementation is that it is minimal.  It gives you a set of tools to use whatever paradigm is the right trade-off for you.  It gives you building blocks to assemble as you see fit.  There are never &#8220;riders&#8221; &#8212; things like memory management that you have to take along with the parts you <i>really</i> want.</p>
<p>So what&#8217;s in a name?  Desired characteristics:</p>
<ul>
<li>Unique, reasonably Googleable.</li>
<li>Communicates that it is a protobuf implementation, and if possible communicates the philosophy described above.</li>
<li>The name (or a reasonable abbreviation thereof) makes for a nice prefix that can be prepended to all my identifiers.  Right now it&#8217;s <code>pbstream_this</code> and <code>pbstream_that</code>.</li>
</ul>
<p>Originally I just called it &#8220;pb&#8221; but that was a tad bit too generic.  I slightly like &#8220;pblab&#8221; (communicates that it gives you building blocks for forging your own strategies), but I dislike the connotation that it is unfinished, unpolished, not production quality.</p>
<p>Perhaps I could give it a name, but keep using &#8220;pb_&#8221; in my identifiers.  It could also help me organize the names of the different parts.  I could use names like:</p>
<ul>
<li><code>pb_stream_parser_*</code> for the stream parser.</li>
<li><code>pb_struct_*</code> for the code that defines an in-memory structure.</li>
</ul>
<p>I kind of like that.  But I still need a name for the package itself that is better than &#8220;pb&#8221;.  Ideas?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/05/03/pbstreams-name/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
	</channel>
</rss>

