<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Josh Haberman &#187; Uncategorized</title>
	<atom:link href="http://blog.reverberate.org/category/uncategorized/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.reverberate.org</link>
	<description>parsing, performance, minimalism with C99</description>
	<lastBuildDate>Mon, 30 Jan 2012 00:15:14 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>State of the hash functions, 2012</title>
		<link>http://blog.reverberate.org/2012/01/29/state-of-the-hash-functions-2012/</link>
		<comments>http://blog.reverberate.org/2012/01/29/state-of-the-hash-functions-2012/#comments</comments>
		<pubDate>Mon, 30 Jan 2012 00:15:14 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=536</guid>
		<description><![CDATA[The state-of-the-art in non-cryptographic hash functions has advanced rapidly in the last few years. When I did some searching this week I was happy to see that new cutting-edge hash functions had been released even since last time I looked &#8230; <a href="http://blog.reverberate.org/2012/01/29/state-of-the-hash-functions-2012/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The state-of-the-art in non-cryptographic hash functions has advanced rapidly in the last few years.  When I did some searching this week I was happy to see that new cutting-edge hash functions had been released even since last time I looked into this 6 months or a year ago.</p>
<p>Non-cryptographic hash functions take a string as input and compute an integer output.  The desirable property of a hash function is that the outputs are evenly distributed across the range of possible outputs, especially for inputs that are similar.  Unlike cryptographic hash functions, these functions are <em>not</em> designed to withstand an effort by an attacker to find a collision.  Cryptographic hash functions do withstand such attacks, but are much slower: <a href="http://www.cryptopp.com/benchmarks.html">SHA-1 is on the order of 0.09 bytes/cycle</a> whereas the newest non-cryptographic hash functions are on the order of 3 bytes/cycle.  So non-cryptographic hashes are roughly 33x faster, at the cost of not being able to withstand attacks.  Non-cryptographic hashes are most often used for hash tables.</p>
<p>As an interesting aside, <a href="http://thread.gmane.org/gmane.comp.lang.lua.general/87491">there is a debate going on in the Lua community right now</a> about what, if anything, should be done about the fact that Lua&#8217;s hash function could theoretically be attacked to force its hash table implementation into its O(n) worst-case lookup performance.  This could let an attacker DoS you if he is feeding you input that you are putting into a Lua hash table.  The Lua authors are somewhat skeptical about how realistic this attack is (and whether it would be cheaper than other DoS alternatives), but are moving ahead anyway with a plan to generate a random seed at startup that the hash function will use.  This is an interesting alternative to cryptographic hash functions that should be able to give you the same collision resistance as a cryptographic hash function (presuming you have an entropy source that can give you truly random bits), but at the cost of non-reproducible output.</p>
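<p>To make the seeding idea concrete, here is a minimal sketch in C.  It is not Lua&#8217;s implementation: it uses 32-bit FNV-1a, and both the name <code>seeded_hash</code> and the choice to XOR the seed into the initial state are assumptions made for illustration.  A function this simple only shows the mechanism; I would not claim it resists a determined attacker.</p>
<pre lang="c">#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* 32-bit FNV-1a with a caller-supplied seed folded into the initial state.
 * The seeding scheme is an assumption for illustration, not Lua's code. */
uint32_t seeded_hash(const void *data, size_t len, uint32_t seed) {
  const unsigned char *p = data;
  uint32_t h = 2166136261u ^ seed;  /* FNV offset basis, perturbed by seed */
  for (size_t i = 0; i &lt; len; i++) {
    h ^= p[i];        /* mix in one input byte */
    h *= 16777619u;   /* multiply by the FNV prime */
  }
  return h;
}</pre>
<p>The caller would pick <code>seed</code> once at startup from an entropy source and pass it to every call, so the same key lands in a different bucket from one run to the next.</p>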
<p>Since there are lots of options out there for non-cryptographic hash functions and this number keeps expanding, I thought I&#8217;d summarize my knowledge of what is out there.</p>
<h2>Bob Jenkins&#8217; Functions</h2>
<p><a href="http://burtleburtle.net/bob/">Bob Jenkins</a> has been working on hash functions for 15 years or so.  In 1997 he published an article about hash functions in Dr. Dobb&#8217;s Journal; the article is available now on the web with more content added since its original publication: <a href="http://www.burtleburtle.net/bob/hash/doobs.html">A Hash Function for Hash Table Lookup</a>.  In this article Bob presents an extensive catalog of existing hash functions, as well as his own, called &#8220;lookup2.&#8221;  Bob subsequently published <a href="http://burtleburtle.net/bob/c/lookup3.c">lookup3</a> in 2006, which for the purposes of this article I will consider the first &#8220;modern&#8221; hash function, in the sense that it is both fast (0.5 bytes/cycle, according to Bob) and free of any serious flaws.</p>
<p>More information about Bob&#8217;s functions can be found on Wikipedia: <a href="http://en.wikipedia.org/wiki/Jenkins_hash_function">Jenkins hash function</a>.</p>
<h2>Second generation: MurmurHash</h2>
<p>In 2008 Austin Appleby published a new hash function called <a href="https://sites.google.com/site/murmurhash/">MurmurHash</a>.  In its most recent version it is roughly 2x the speed of lookup3 (so roughly 1 byte/cycle), and it comes in both 32 and 64-bit versions.  The 32-bit version uses only 32-bit math and gives you a 32-bit hash, the 64-bit version uses 64-bit math and gives a 64-bit hash.  According to Austin&#8217;s analysis it has excellent properties, though Bob Jenkins says in his expanded Dr. Dobbs article &#8220;I can see [MurmurHash is] weaker than my lookup3, but I don&#8217;t by how much, I haven&#8217;t tested it.&#8221;  MurmurHash quickly became popular thanks to its excellent speed and statistical properties.</p>
<h2>Third generation: CityHash and SpookyHash</h2>
<p>In 2011 two hash functions were released that both improve on MurmurHash due largely to greater instruction-level parallelism.  Google released <a href="http://code.google.com/p/cityhash/">CityHash</a> (written by Geoff Pike and Jyrki Alakuijala) and Bob Jenkins released a new hash of his own, <a href="http://burtleburtle.net/bob/hash/spooky.html">SpookyHash</a> (so named because it was released on Halloween).  Both functions are on the order of 2x the speed of MurmurHash, but both functions use 64-bit math and have no 32-bit version, and CityHash depends on the CRC32 instruction that is present in SSE 4.2 (Intel Nehalem and later) for its speed.  SpookyHash gives you 128-bit output, whereas CityHash has 64-bit, 128-bit, and 256-bit variants.</p>
<h2>Which function is best/fastest?</h2>
<p>From what I can see, all of the hash functions I mentioned in this article are good enough from a statistical perspective.  One consideration is that only CityHash/SpookyHash give more than 64 bits of output, but for a hash table 32 bits of output is plenty.  Other applications may have use for 128 or 256 bit output.</p>
<p>If you&#8217;re on 32-bit, MurmurHash looks like the clear winner since it&#8217;s the only function faster than lookup3 that has a native 32-bit version.  32-bit machines could probably compile and run City and Spooky, but I would expect them to be much slower because the 64-bit math would have to be emulated.</p>
<p>On 64-bit machines it&#8217;s hard to say which is best without further benchmarking.  I&#8217;d be liable to prefer Spooky to City since the latter depends on the CRC32 instruction for speed which isn&#8217;t available everywhere.</p>
<p>One other consideration is aligned vs. unaligned access.  MurmurHash (unlike City or Spooky) comes in a variant that will only perform aligned reads, since on many architectures unaligned reads will crash or return the wrong data (unaligned reads are undefined behavior in C).  City and Spooky both address the issue by copying the input data into aligned storage with memcpy(); Spooky does the memcpy() a block at a time (if ALLOW_UNALIGNED_READS is not defined), City does the memcpy() an integer at a time!  On machines that can handle unaligned reads (like x86 and x86-64) the memcpy will be optimized away, but I did a test on my little ARM box and found that this:</p>
<pre lang="c">#include &lt;stdint.h&gt;
#include &lt;string.h&gt;
int32_t read32_unaligned(const void *buf) {
  int32_t ret;
  memcpy(&#038;ret, buf, 4);
  return ret;
}</pre>
<p>compiles to this very inefficient code (this would be a single instruction on x86):</p>
<pre lang="asm">   0:	b500      	push	{lr}
   2:	2204      	movs	r2, #4
   4:	b083      	sub	sp, #12
   6:	4601      	mov	r1, r0
   8:	eb0d 0002 	add.w	r0, sp, r2
   c:	f7ff fffe 	bl	0 &lt;memcpy&gt;
  10:	9801      	ldr	r0, [sp, #4]
  12:	b003      	add	sp, #12
  14:	bd00      	pop	{pc}</pre>
<p>To conclude, MurmurHash still looks like the best option if you need 32-bit or aligned-only reads.  CityHash and SpookyHash look to be faster on x86-64, but I would almost think of them as being specific to that architecture, since I&#8217;m not aware of other architectures that are both 64-bit and allow unaligned reads.</p>
<p>Please let me know of any errors in this article.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2012/01/29/state-of-the-hash-functions-2012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Refcounting immutable cyclic graphs</title>
		<link>http://blog.reverberate.org/2012/01/21/refcounting-immutable-cyclic-graphs/</link>
		<comments>http://blog.reverberate.org/2012/01/21/refcounting-immutable-cyclic-graphs/#comments</comments>
		<pubDate>Sat, 21 Jan 2012 19:07:30 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=518</guid>
		<description><![CDATA[Cycles are a thorn in the side of refcounting. But yesterday I discovered (or perhaps rediscovered) a simple, efficient scheme for refcounting cyclic graphs if the defs are immutable: find strongly-connected components and make all nodes in each SCC share &#8230; <a href="http://blog.reverberate.org/2012/01/21/refcounting-immutable-cyclic-graphs/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Cycles are a thorn in the side of refcounting.  But yesterday I discovered (or perhaps rediscovered) a simple, efficient scheme for refcounting cyclic graphs if the defs are immutable: find <a href="http://en.wikipedia.org/wiki/Strongly_connected_component">strongly-connected components</a> and make all nodes in each SCC share a refcount.  Beautiful.</p>
<p>This relies on the graph theory result that if each SCC in a graph is replaced with a single node, the resulting graph forms a <a href="http://en.wikipedia.org/wiki/Directed_acyclic_graph">directed acyclic graph</a>.</p>
<p>I&#8217;m planning to use this scheme to refcount defs in upb (garbage-collection is a non-starter because I don&#8217;t want to have to track a global list of roots or force the client to periodically call a GC function).</p>
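<p>Here is a rough sketch of how the scheme could look in C.  It is not upb&#8217;s code: the names (<code>node_t</code>, <code>group_t</code>, <code>freeze</code>), the fixed-size arrays, and the use of Tarjan&#8217;s algorithm to find the SCCs are all assumptions for illustration.  I also assume that an edge crossing from one SCC to another holds one ref on the target&#8217;s group, which is safe precisely because the graph of SCCs is acyclic.</p>
<pre lang="c">#include &lt;stddef.h&gt;

#define MAX_NODES 16
#define MAX_EDGES 4

typedef struct { int refcount; } group_t;  /* one shared count per SCC */

typedef struct node {
  struct node *edges[MAX_EDGES];
  int nedges;
  group_t *group;               /* assigned by freeze() */
  int index, lowlink, onstack;  /* Tarjan bookkeeping; index 0 = unvisited */
} node_t;

typedef struct {
  node_t *stack[MAX_NODES];
  int sp, next_index;
  group_t groups[MAX_NODES];
  int ngroups;
} scc_state_t;

/* Tarjan's algorithm: every node popped together forms one SCC. */
static void visit(scc_state_t *s, node_t *n) {
  n-&gt;index = n-&gt;lowlink = ++s-&gt;next_index;
  s-&gt;stack[s-&gt;sp++] = n;
  n-&gt;onstack = 1;
  for (int i = 0; i &lt; n-&gt;nedges; i++) {
    node_t *m = n-&gt;edges[i];
    if (m-&gt;index == 0) {
      visit(s, m);
      if (m-&gt;lowlink &lt; n-&gt;lowlink) n-&gt;lowlink = m-&gt;lowlink;
    } else if (m-&gt;onstack &amp;&amp; m-&gt;index &lt; n-&gt;lowlink) {
      n-&gt;lowlink = m-&gt;index;
    }
  }
  if (n-&gt;lowlink == n-&gt;index) {  /* n is the root of an SCC */
    group_t *g = &amp;s-&gt;groups[s-&gt;ngroups++];
    node_t *m;
    g-&gt;refcount = 0;
    do {
      m = s-&gt;stack[--s-&gt;sp];
      m-&gt;onstack = 0;
      m-&gt;group = g;
    } while (m != n);
  }
}

/* Call once the graph is immutable. "nodes" must list every node, and both
 * the nodes and *s must start out zero-initialized. */
void freeze(scc_state_t *s, node_t **nodes, int count) {
  for (int i = 0; i &lt; count; i++)
    if (nodes[i]-&gt;index == 0) visit(s, nodes[i]);
  /* Assumption: an edge that crosses SCCs holds one ref on the target group. */
  for (int i = 0; i &lt; count; i++)
    for (int j = 0; j &lt; nodes[i]-&gt;nedges; j++)
      if (nodes[i]-&gt;edges[j]-&gt;group != nodes[i]-&gt;group)
        nodes[i]-&gt;edges[j]-&gt;group-&gt;refcount++;
}

void node_ref(node_t *n) { n-&gt;group-&gt;refcount++; }

/* Returns 1 when the whole SCC containing n has become unreachable. */
int node_unref(node_t *n) { return --n-&gt;group-&gt;refcount == 0; }</pre>
<p>When <code>node_unref()</code> returns 1, a full implementation would free every node in that group and drop the refs its outgoing cross-SCC edges hold; that part is left out here.</p>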
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2012/01/21/refcounting-immutable-cyclic-graphs/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Using dtrace on OS X to debug a performance problem</title>
		<link>http://blog.reverberate.org/2011/05/08/using-dtrace-on-os-x-to-debug-a-performance-problem/</link>
		<comments>http://blog.reverberate.org/2011/05/08/using-dtrace-on-os-x-to-debug-a-performance-problem/#comments</comments>
		<pubDate>Sun, 08 May 2011 20:28:12 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=467</guid>
		<description><![CDATA[I recently ported upb&#8217;s table-based decoder to use setjmp/longjmp-based error handling. I did this largely for code simplicity and readability, so that the non-error code-paths didn&#8217;t have to check for errors all the time. But unfortunately I noticed a dramatic &#8230; <a href="http://blog.reverberate.org/2011/05/08/using-dtrace-on-os-x-to-debug-a-performance-problem/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I recently ported upb&#8217;s table-based decoder to use <code>setjmp/longjmp</code>-based error handling.  I did this largely for code simplicity and readability, so that the non-error code-paths didn&#8217;t have to check for errors all the time.  But unfortunately I noticed a dramatic 75% performance decrease.  What was going on?</p>
<p>A profile in Shark showed the majority of my time being spent in system calls, but didn&#8217;t make it clear which system calls were involved.  It sounded like a job for dtrace.</p>
<p>I used the following script to dump all the system calls that were being issued by my process:</p>
<pre lang="d">syscall:::entry
/pid == $target/
{
     @[probefunc] = count();
} </pre>
<p>I then ran this script on both my old (fast) benchmark and the new (unexpectedly slow) one:</p>
<pre lang="bash">$ sudo dtrace -c ./old-benchmark -s trace.d

  exit                                             1
  fstat64                                          1
  ioctl                                            1
  write_nocancel                                   1
  munmap                                          93
  mmap                                            94
  getrusage                                     7728    

$ sudo dtrace -c ./new-benchmark -s trace.d

  exit                                             1
  fstat64                                          1
  ioctl                                            1
  munmap                                           1
  write_nocancel                                   1
  getrusage                                      438
  sigaltstack                                 111773
  sigreturn                                   111774
  sigprocmask                                 223546</pre>
<p>Sure enough, the new version is making a ton of system calls that weren&#8217;t happening before, and they appear to be signal related.  I investigated the manpage of <code>setjmp/longjmp</code> and found:</p>
<blockquote><p>The setjmp()/longjmp() pairs save and restore the signal mask while _setjmp()/_longjmp() pairs save and restore only the register set and the stack.  (See sigprocmask(2).)</p>
<p>     The sigsetjmp()/siglongjmp() function pairs save and restore the signal mask if the argument savemask is non-zero; otherwise, only the register set and the stack are saved.</p></blockquote>
<p>I replaced my calls to <code>setjmp/longjmp</code> with <code>sigsetjmp/siglongjmp</code>, passing 0 for the <code>savemask</code> argument so that the signal mask is not saved or restored, and the performance problems went away.  Score one for dtrace.</p>
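<p>For reference, the pattern looks roughly like the sketch below.  This is not upb&#8217;s decoder: <code>decode_byte</code> and <code>decode_all</code> are hypothetical stand-ins, and the only point is the <code>sigsetjmp(env, 0)</code> call, which skips saving the signal mask.</p>
<pre lang="c">#define _POSIX_C_SOURCE 200809L
#include &lt;setjmp.h&gt;

static sigjmp_buf err_env;

/* Hypothetical decoder step: bails out through siglongjmp on bad input,
 * so callers do not have to check a return code on the fast path. */
static int decode_byte(int byte) {
  if (byte &lt; 0) siglongjmp(err_env, 1);
  return byte * 2;
}

/* Returns the decoded total, or -1 if any byte was rejected. */
int decode_all(const int *bytes, int n) {
  int total = 0;
  /* savemask == 0: only registers and stack are saved, no signal mask. */
  if (sigsetjmp(err_env, 0) != 0) return -1;
  for (int i = 0; i &lt; n; i++) total += decode_byte(bytes[i]);
  return total;
}</pre>
<p>If the man page excerpt above holds on your platform, using <code>_setjmp/_longjmp</code> instead should have the same effect.</p>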
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2011/05/08/using-dtrace-on-os-x-to-debug-a-performance-problem/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ARM architecture is mired in oppressive legalese</title>
		<link>http://blog.reverberate.org/2011/04/18/arm-architecture-is-mired-in-oppressive-legalese/</link>
		<comments>http://blog.reverberate.org/2011/04/18/arm-architecture-is-mired-in-oppressive-legalese/#comments</comments>
		<pubDate>Tue, 19 Apr 2011 05:36:47 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=431</guid>
		<description><![CDATA[My Efika MX Smarttop came and I&#8217;ve had some fun compiling upb for it and trying it out. I don&#8217;t have a pressing ARM use case yet, but I want to work ahead a bit and get familiar with the &#8230; <a href="http://blog.reverberate.org/2011/04/18/arm-architecture-is-mired-in-oppressive-legalese/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.reverberate.org/2011/03/31/addicted-to-hardware-new-toy-on-the-way/">My Efika MX Smarttop came</a> and I&#8217;ve had some fun compiling upb for it and trying it out.  I don&#8217;t have a pressing ARM use case yet, but I want to work ahead a bit and get familiar with the architecture so that when I <i>do</i> want to program for it I&#8217;m not starting from scratch.</p>
<p>I went to the ARM website to get my hands on some documentation.  I was prepared to buy physical books if that is their authoritative documentation, but it appears that they have PDFs for the architecture&#8217;s reference manuals.</p>
<p>So far so good &#8212; this is on par with Intel which has <a href="http://www.intel.com/products/processor/manuals/">a nice and easily accessible website where you can download any of their manuals in like 5 seconds.</a>  I&#8217;ve downloaded them and referenced them countless times.</p>
<p>But I get no such love from ARM.  Strike 1: the ARM website won&#8217;t let you download the manuals unless you have registered first.  Not cool ARM, not cool.  I grudgingly give them my name, email address, company name, country, and state.  And the website warns:</p>
<blockquote><p><b>Note</b>: We recommend using your business email address to ensure you can access all your relevant services</p></blockquote>
<p>This is a vague but ominous warning that if you use a personal email address the registration might not work correctly.</p>
<p>But fine, I go through with the registration.  Now I can download the manual, right?  Turns out no: first I have to accept a EULA!  It begins:</p>
<blockquote><p>USER AGREEMENT FOR THE ARM ARCHITECTURE REFERENCE MANUAL</p>
<p>THIS AGREEMENT (&#8220;AGREEMENT&#8221;) IS A LEGAL AGREEMENT BETWEEN YOU (EITHER A SINGLE INDIVIDUAL, OR SINGLE LEGAL ENTITY) AND ARM LIMITED (&#8220;ARM&#8221;) FOR THE USE OF THE ARM ARCHITECTURE REFERENCE MANUAL. ARM IS ONLY WILLING TO PROVIDE ACCESS TO THE ARM ARCHITECTURE REFERENCE MANUAL TO YOU ON CONDITION THAT YOU ACCEPT ALL OF THE TERMS IN THIS AGREEMENT. BY CLICKING &#8220;I AGREE&#8221; OR BY DOWNLOADING OR OTHERWISE COPYING THE DELIVERABLES YOU INDICATE THAT YOU AGREE TO BE BOUND BY ALL THE TERMS OF THIS LICENCE.</p></blockquote>
<p>This is all just to read some documentation.  Not impressed.  ARM, we&#8217;re off to a rocky start.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2011/04/18/arm-architecture-is-mired-in-oppressive-legalese/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>EINTR and PC loser-ing (The &#8220;Worse Is Better&#8221; case study)</title>
		<link>http://blog.reverberate.org/2011/04/18/eintr-and-pc-loser-ing-the-worse-is-better-case-study/</link>
		<comments>http://blog.reverberate.org/2011/04/18/eintr-and-pc-loser-ing-the-worse-is-better-case-study/#comments</comments>
		<pubDate>Mon, 18 Apr 2011 08:57:36 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=399</guid>
		<description><![CDATA[Richard Gabriel&#8217;s 1989 essay Worse Is Better is a famous comparison between LISP and Unix/C that pops up from time to time and is guaranteed to spark a spirited discussion. The philosophical argument itself is not something I want to &#8230; <a href="http://blog.reverberate.org/2011/04/18/eintr-and-pc-loser-ing-the-worse-is-better-case-study/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Richard Gabriel&#8217;s 1989 essay <a href="http://www.jwz.org/doc/worse-is-better.html">Worse Is Better</a> is a famous comparison between LISP and Unix/C that pops up from time to time and is guaranteed to spark a spirited discussion.  The philosophical argument itself is not something I want to get into right now; I am interested in the technical content of the essay.  What always bothered me about this paper is that I never fully understood Gabriel&#8217;s primary example of a dirty hack vs. &#8220;the right thing.&#8221;</p>
<p>His example is &#8220;the PC loser-ing problem,&#8221; which he describes thus:</p>
<blockquote><p>Two famous people, one from MIT and another from Berkeley (but working on Unix) once met to discuss operating system issues. The person from MIT was knowledgeable about ITS (the MIT AI Lab operating system) and had been reading the Unix sources. He was interested in how Unix solved the PC loser-ing problem. The PC loser-ing problem occurs when a user program invokes a system routine to perform a lengthy operation that might have significant state, such as IO buffers. If an interrupt occurs during the operation, the state of the user program must be saved. Because the invocation of the system routine is usually a single instruction, the PC of the user program does not adequately capture the state of the process. The system routine must either back out or press forward. The right thing is to back out and restore the user program PC to the instruction that invoked the system routine so that resumption of the user program after the interrupt, for example, re-enters the system routine. It is called &#8220;PC loser-ing&#8221; because the PC is being coerced into &#8220;loser mode,&#8221; where &#8220;loser&#8221; is the affectionate name for &#8220;user&#8221; at MIT.</p>
<p>The MIT guy did not see any code that handled this case and asked the New Jersey guy how the problem was handled. The New Jersey guy said that the Unix folks were aware of the problem, but the solution was for the system routine to always finish, but sometimes an error code would be returned that signaled that the system routine had failed to complete its action. A correct user program, then, had to check the error code to determine whether to simply try the system routine again. The MIT guy did not like this solution because it was not the right thing.</p>
<p>The New Jersey guy said that the Unix solution was right because the design philosophy of Unix was simplicity and that the right thing was too complex.</p></blockquote>
<p>When I read this I always had a burning desire to know: how did the story end?  How do modern operating systems resolve this problem &#8212; the &#8220;dirty hack&#8221; way or the &#8220;right way?&#8221;  What part of our modern POSIX interfaces are affected by this question?</p>
<p>There are several things that never made sense to me about this example.  First of all, why would you need to abort a system call just because an interrupt occurred?  I investigated the Linux source and it seems quite clear that interrupt handlers can return to either the kernel or userspace &#8212; whichever was running when the interrupt fired.  So I don&#8217;t see why you&#8217;d need to &#8220;coerce&#8221; the system into &#8220;loser mode&#8221; at all.</p>
<p>But let&#8217;s suppose you accept this as a given &#8212; we will assume that when a hardware interrupt occurs, you must exit to user mode.  I still don&#8217;t see the difficulty in automatically re-invoking the system call.  It&#8217;s true that invoking the system routine is a single instruction, but why is it that &#8220;the PC of the user program does not adequately capture the state of the process,&#8221; as Gabriel&#8217;s essay states?  What other process state do we need to capture?  The registers must already be saved when the syscall is entered, because they must be restored even with a completely normal syscall return.  So if we want to re-invoke the system routine, it should be as easy as simply re-executing the instruction that made the system call.  Right?</p>
<p>The whole example confused me quite a lot until I had the idea to replace &#8220;interrupt&#8221; in the above description with &#8220;signal.&#8221;  This is not such a stretch, since signals are essentially user-space software interrupts.  With this small change, everything started to make a lot more sense.  If a <i>signal</i> was delivered to a process that was currently inside a system call, that signal handler could invoke a system call itself, which would cause us to re-enter the kernel.  I could easily see how the complexity of dealing with this could have led early UNIX implementors to simply abort the original system call before delivering the signal.</p>
<p>But this is only speculation about what UNIX was like in the mid to late 80s when &#8220;Worse is Better&#8221; was written.  I could be completely off the mark in this analysis &#8212; maybe returning to the kernel from a hardware interrupt handler really wasn&#8217;t implemented at that time.  Or maybe saving user state really was difficult for some reason.  I&#8217;d love to hear from anyone who has more historical context about this.  But the essay contains an important clue that seems to reinforce my speculation that it&#8217;s actually about signals.</p>
<h3>UNIX and EINTR</h3>
<p>If we look closely at the &#8220;Worse is Better&#8221; essay, we get a strong clue about what the Unix guy in the story might have been talking about:</p>
<blockquote><p>The New Jersey guy said that the Unix folks were aware of the problem, but the solution was for the system routine to always finish, but sometimes an error code would be returned that signaled that the system routine had failed to complete its action. A correct user program, then, had to check the error code to determine whether to simply try the system routine again.</p></blockquote>
<p>As someone who has done a lot of Unix system-level programming, this sounds to me like it <i>must</i> be describing EINTR, the error code in Unix that means &#8220;Interrupted system call.&#8221;  To give a quick description of EINTR I&#8217;ll enlist the help of my trusty copy of &#8220;Advanced Programming in the Unix Environment&#8221; by W. Richard Stevens:</p>
<blockquote><p>A characteristic of earlier UNIX systems is that if a process caught a signal while the process was blocked in a &#8220;slow&#8221; system call, the system call was interrupted.  The system call returned an error and <code>errno</code> was set to <code>EINTR</code>.  This was done under the assumption that since a signal occurred and the process caught it, there is a good chance that something has happened that should wake up the blocked system call.</p>
<p>[...]</p>
<p>The problem with interrupted system calls is that we now have to handle the error return explicitly.  The typical code sequence (assuming a read operation and assuming that we want to restart the read even if it&#8217;s interrupted) would be</p>
<pre lang="c">
again:
  if ((n = read(fd, buf, BUFFSIZE)) < 0) {
    if (errno == EINTR)
      goto again;  /* just an interrupted system call */
    /* handle other errors */
  }
</pre>
</blockquote>
<p>This sounds an awful lot like the New Jersey guy's approach from the story, which required a correct program "to check the error code to determine whether to simply try the system routine again."  And there's nothing else in Unix that I've ever heard of that's anything like this.  This must be what the New Jersey guy from the story was talking about!</p>
<p>But note that in W. Richard Stevens' explanation this isn't some dirty hack!  It's not a case of cutting corners that is justified by favoring implementation simplicity over interface simplicity.  Stevens describes it as a deliberate design decision that gives users the capability to abort a long-running system call if you catch a signal in the meantime.  Now you could easily see this as a rationalization of a dirty hack ("it's not a bug, it's a feature!"), but it certainly seems plausible that if you catch a signal while you're blocked on a long system call, the signal might make you decide that you don't want to wait for the long system call any more.  Indeed, Ulrich Drepper claimed in 2000 that <a href="http://lkml.indiana.edu/hypermail/linux/kernel/0011.0/0494.html">"Returning EINTR is necessary for many applications,"</a> though it would have been helpful if he had expanded on this point by giving some examples.</p>
<p>Of course, the price we have paid for this capability is that we have to wrap all of our potentially long system calls in a loop like the example above.  If we don't, our system calls can start failing and causing program errors whenever we catch a signal.  You may think that you don't use any signals yourself, but are you sure that none of your libraries do?  On the flip side, if you're implementing a library you can never know if the main application will use signals or not, so any library that wants to be robust will have to wrap these system calls in a retry loop.</p>
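<p>A library that wants to be robust here might put the retry loop into a small helper instead of repeating it at every call site.  The sketch below is one way to do that; <code>read_retry</code> is a name I made up, not a standard function.</p>
<pre lang="c">#include &lt;errno.h&gt;
#include &lt;unistd.h&gt;

/* Like read(), but retries when a caught signal interrupts the call.
 * Any other error is returned to the caller with errno untouched. */
ssize_t read_retry(int fd, void *buf, size_t count) {
  ssize_t n;
  do {
    n = read(fd, buf, count);
  } while (n &lt; 0 &amp;&amp; errno == EINTR);
  return n;
}</pre>
<p>The catch is that this throws away the one thing EINTR was good for: a caller that really does want a signal to cancel the read has to call <code>read()</code> directly.</p>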
<p>Since the vast majority of programs will always want their system calls to continue even when a signal is received, 4.2BSD (released in 1983) implemented support for automatically retrying most system calls that could previously fail with EINTR.  To me this sounds exactly like what the MIT guy in Richard Gabriel's story was saying is "the right thing."  <b>In other words, Berkeley UNIX was already doing "the right thing" six years before "Worse is Better" was written!</b></p>
<p>Modern POSIX APIs allow both behaviors (either restarting the system call automatically or returning <code>EINTR</code>) -- this is controlled by the <code>SA_RESTART</code> flag.  The following program illustrates both behaviors:</p>
<pre lang="c">

#include &lt;errno.h&gt;
#include &lt;signal.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;unistd.h&gt;

void doread() {
  char buf[128];
  printf("doing read() into buf %p\n", buf);
  ssize_t ret = read(STDIN_FILENO, buf, sizeof(buf));
  if (ret < 0) {
    printf("read() for buf %p returned error: %s\n", buf, strerror(errno));
  } else {
    printf("read() for buf %p returned data: %.*s", buf, (int)ret, buf);
  }
}

void sighandler(int signo) {
  printf("received signal %d\n", signo);
  doread();
}

int main(int argc, char *argv[]) {
  // Register SIGHUP handler.  Pass any argument to get SA_RESTART.
  struct sigaction action;
  action.sa_handler = &#038;sighandler;
  sigemptyset(&#038;action.sa_mask);
  action.sa_flags = (argc > 1) ? SA_RESTART : 0;
  sigaction(SIGHUP, &#038;action, NULL);

  doread();
  return 0;
}
</pre>
<p>Here are the results of running the program three different times. I've bolded the parts where I typed to give the program input on stdin.  You can also see where I sent the program a <code>SIGHUP</code>.</p>
<pre>
$ ./test
doing read() into buf 0x7ec7959c
<b>INPUT FROM TERMINAL</b>
read() for buf 0x7ec7959c returned data: INPUT FROM TERMINAL
$ ./test
doing read() into buf 0x7ef6659c
received signal 1
doing read() into buf 0x7ef66204
<b>INPUT FROM TERMINAL</b>
read() for buf 0x7ef66204 returned data: INPUT FROM TERMINAL
read() for buf 0x7ef6659c returned error: Interrupted system call
$ ./test give_me_sa_restart
doing read() into buf 0x7eb7657c
received signal 1
doing read() into buf 0x7eb761e4
<b>INPUT FROM TERMINAL</b>
read() for buf 0x7eb761e4 returned data: INPUT FROM TERMINAL
<b>INPUT FROM TERMINAL AGAIN</b>
read() for buf 0x7eb7657c returned data: INPUT FROM TERMINAL AGAIN
</pre>
<h3>Conclusion</h3>
<p>You might ask "why all the fuss over a little example?"  As I mentioned, my primary motivation in researching all of this was to get to the bottom of this issue and understand how it plays out in modern operating systems.</p>
<p>But if we were going to take all of this information and reflect on the "Worse is Better" argument, my personal observations/conclusions would be:</p>
<ul>
<li>The "worse" system (Unix) did indeed do "the right thing" eventually, even if it didn't at first.  "Worse is better" systems incrementally improve by responding to user needs.  Since users got tired of checking for EINTR, the "worse" system added the functionality for addressing this pain point.</li>
<li>The whole thing did leave a rather large wart, though -- all Unix programs have to wrap these system calls in an EINTR retry loop unless they can be absolutely sure the process will never catch signals that don't have SA_RESTART set.  So there is a price to pay for this incremental evolution.</li>
</ul>
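<p>The retry loop mentioned above can be sketched roughly as follows.  This is only a minimal illustration, not code from any particular program: <code>read_retry</code> is a hypothetical helper name, and it assumes that simply reissuing the same <code>read()</code> is the right response to <code>EINTR</code>.</p>

```c
#include <errno.h>
#include <unistd.h>

// Hypothetical helper (not from the post): reissue read() whenever it is
// interrupted by a signal whose handler was installed without SA_RESTART.
static ssize_t read_retry(int fd, void *buf, size_t count) {
  ssize_t n;
  do {
    n = read(fd, buf, count);
  } while (n == -1 && errno == EINTR);  // Only EINTR triggers a retry.
  return n;  // Byte count, 0 on EOF, or -1 with errno set to a real error.
}
```

<p>A caller would use <code>read_retry(fd, buf, sizeof(buf))</code> wherever it previously called <code>read()</code> directly; any error other than <code>EINTR</code> is still returned to the caller unchanged.</p>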
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2011/04/18/eintr-and-pc-loser-ing-the-worse-is-better-case-study/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Addicted to hardware: new toy on the way</title>
		<link>http://blog.reverberate.org/2011/03/31/addicted-to-hardware-new-toy-on-the-way/</link>
		<comments>http://blog.reverberate.org/2011/03/31/addicted-to-hardware-new-toy-on-the-way/#comments</comments>
		<pubDate>Thu, 31 Mar 2011 08:50:24 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=395</guid>
		<description><![CDATA[I&#8217;m addicted to hardware. I can&#8217;t stop thinking about all of the CPUs that currently exist, how they compare to each other, and how to write the fastest possible code on them. (Actually I want to learn FPGA programming too, &#8230; <a href="http://blog.reverberate.org/2011/03/31/addicted-to-hardware-new-toy-on-the-way/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m addicted to hardware.  I can&#8217;t stop thinking about all of the CPUs that currently exist, how they compare to each other, and how to write the fastest possible code on them.  (Actually I want to learn FPGA programming too, in case they ever start bundling FPGAs with computers).</p>
<p>A year and a half ago I bought a <a href="http://en.wikipedia.org/wiki/SheevaPlug">SheevaPlug</a> which is a little ARM computer that&#8217;s hardly bigger than a wall wart AC adapter.  It has USB and Ethernet and that&#8217;s about it (no display).  Unfortunately when I tried to start playing with it again tonight I discovered it was bricked thanks to a faulty power supply, which is apparently <a href="http://www.google.com/search?&#038;q=sheevaplug+power+supply">a very common problem with SheevaPlugs</a>.</p>
<p>I was a bit sad about this, and I started looking for alternatives.  I wanted something that runs on a non-x86 architecture and that I could stick in a closet and SSH to.  I discovered that there is a new form factor of computers known as a <a href="http://en.wikipedia.org/wiki/Nettop">nettop</a>.  I found a nettop that appears to fulfill my wishes perfectly: the <a href="http://www.genesi-usa.com/products/efika">EFIKA MX Smarttop</a> made by a company called Genesi.  It&#8217;s surprisingly capable for $129: it&#8217;s powered by an ARM Cortex-A8 800MHz CPU, has 512MB RAM, 8GB of flash, Ethernet, WiFi, Bluetooth, USB, and 720p video output through HDMI (with hardware-accelerated video decoding).  Not bad for $129!  And it has a maximum power consumption of 15W.</p>
<p>So I ordered one &#8212; I look forward to seeing if it is really all it&#8217;s cracked up to be!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2011/03/31/addicted-to-hardware-new-toy-on-the-way/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>When a compiler&#8217;s slow code actually bites you</title>
		<link>http://blog.reverberate.org/2011/03/19/when-a-compilers-slow-code-actually-bites-you/</link>
		<comments>http://blog.reverberate.org/2011/03/19/when-a-compilers-slow-code-actually-bites-you/#comments</comments>
		<pubDate>Sat, 19 Mar 2011 22:26:12 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=374</guid>
		<description><![CDATA[A few days ago I posted GCC: the impressive and the disappointing where I looked at some cases where GCC produces not-quite-optimal code. One of the comments on that post was (emphasis mine): So, it seems like there is a &#8230; <a href="http://blog.reverberate.org/2011/03/19/when-a-compilers-slow-code-actually-bites-you/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>A few days ago I posted <a href="http://blog.reverberate.org/2011/03/03/gcc-the-impressive-and-the-disappointing/">GCC: the impressive and the disappointing</a> where I looked at some cases where GCC produces not-quite-optimal code.  One of the comments on that post was (emphasis mine):</p>
<blockquote><p>So, it seems like there is a much better way to give the compiler a shot at doing the right thing: [snip suggestion]. I think you will find the compiler will generate quite efficient code in this case, <b>particularly if you look at the real execution overhead, rather than what the assembler looks like.</b></p></blockquote>
<p>This is a common attitude I encounter when I am discussing my attempts to optimize my protocol buffer decoding library <a href="https://github.com/haberman/upb">upb</a>.  Programmers love to tell other programmers that they are prematurely optimizing, and most of the time they&#8217;re right.  I&#8217;m sure to some people it seems ludicrous that I would be looking at assembly language output to determine whether it is efficient enough.  For 99.99% of programs, it would be.  But I&#8217;m working in one of those rare domains where it actually matters.  And today I encountered pretty convincing evidence that the compiler&#8217;s bad code is actually affecting me.</p>
<p>The compiler&#8217;s bad code in this case is an example of a bug I previously filed on GCC: <a href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194">struct returned by value generates useless stores</a>.  Though I had previously observed that bug only by inspecting assembly language output, today I had it show up on an actual profile as clear as day.  Here is a screenshot from Shark (click to get full-size):</p>
<p><a href="http://blog.reverberate.org/wp-content/uploads/2011/03/badcode.png"><img src="http://blog.reverberate.org/wp-content/uploads/2011/03/badcode-300x161.png" alt="Screenshot from Apple Shark showing the bad code." width="300" height="161" class="aligncenter size-medium wp-image-376" /></a></p>
<p>To summarize, the compiler took the code:</p>
<pre lang="c">
typedef struct {
  upb_flow_t flow;  // An enum defined elsewhere.
  void *closure;
} upb_sflow_t;

upb_flow_t upb_dispatch_startsubmsg([...]) {
  // [...]
  upb_sflow_t sflow = f->cb.startsubmsg([...]);
  if (sflow.flow != UPB_CONTINUE) {
    // [...]
  }
</pre>
<p>&#8230;and turned that function call/test into this awful machine code (here in its Intel-syntax form):</p>
<pre lang="asm">
  call   QWORD PTR [r12 + 16]
  mov    DWORD PTR [rbp - 64], eax
  mov    QWORD PTR [rbp - 56], rdx
  mov    rax, QWORD PTR [rbp - 64]    ; loads rax with data it already has.
  mov    QWORD PTR [rbp - 48], rax    ; stores rax into the stack a second time.
  mov    QWORD PTR [rbp - 40], rdx    ; stores rdx into the stack a second time.
  mov    edx, DWORD PTR [rbp - 48]    ; loads edx with data already in rax.
  test   edx, edx
</pre>
<p>&#8230;and <i>then</i> (this is the important part) in an actual profile it shows up as being 43.4% of the execution time of a hot function in my program.</p>
<p>This is not a slam against the GCC developers.  GCC is a big and complex piece of software, and they have to prioritize all sorts of different bugs, feature requests, new hardware, etc.</p>
<p>This is just a reminder to those who jump to dare-I-say &#8220;premature&#8221; conclusions about what is premature optimization: some of us really are working in domains where things like virtual function overhead, branch predictability, and the efficiency of the compiler&#8217;s code make a difference.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2011/03/19/when-a-compilers-slow-code-actually-bites-you/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>GCC: the impressive and the disappointing</title>
		<link>http://blog.reverberate.org/2011/03/03/gcc-the-impressive-and-the-disappointing/</link>
		<comments>http://blog.reverberate.org/2011/03/03/gcc-the-impressive-and-the-disappointing/#comments</comments>
		<pubDate>Fri, 04 Mar 2011 07:58:53 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=361</guid>
		<description><![CDATA[In my work on upb I&#8217;ve looked at a lot of compiler-generated assembly code. I frequently want to know how GCC will compile a certain block of code, so I&#8217;ll write a little test program in C and use objdump &#8230; <a href="http://blog.reverberate.org/2011/03/03/gcc-the-impressive-and-the-disappointing/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>In my work on upb I&#8217;ve looked at a <i>lot</i> of compiler-generated assembly code.  I frequently want to know how GCC will compile a certain block of code, so I&#8217;ll write a little test program in C and use objdump to look at the object file.</p>
<p>Over two years of doing this, I&#8217;ve had many moments where I am pleasantly surprised at how smart the compiler is being.  C compilers can figure out a <i>lot</i>.  I demonstrated a few examples of this on stackoverflow.com &#8212; see <a href="http://stackoverflow.com/questions/895574/what-are-some-good-code-optimization-methods/1354688#1354688">here</a> and <a href="http://stackoverflow.com/questions/1269994/nanoseconds-to-milliseconds-fast-division-by-1000000/1270277#1270277">here</a>.</p>
<p>But once in a while I am disappointed, because GCC doesn&#8217;t figure something out that I really wish it could.  For example, I have a situation where I have a callback interface, but one of the parameters to my callback is something that a lot of clients don&#8217;t need.  For convenience, I want to let them register a callback that doesn&#8217;t take the final parameter.  But to stay within ANSI C, I can&#8217;t cast one callback type to the other and call through the result; I need to store the two possible callback types in a union.  My quick test looked like this:</p>
<pre lang="c">// My two callback types -- the second takes an extra parameter.
typedef union {
  void (*f1)(int);
  void (*f2)(int, int);
} funcs;

// I store "which" separately to track which type of callback was registered.
void foo(funcs f, int which) {
  int a = 5, b = 10;
  if (which) {
    f.f1(a);
  } else {
    f.f2(a, b);
  }
}</pre>
<p>Now on x86-64, parameters are passed in registers (thank god!), so there&#8217;s no actual reason to branch here.  You might as well just always put both values in registers, because if we were calling &#8220;f1&#8221; it will just ignore the register that&#8217;s holding the second parameter, or overwrite it which is fine too.  The only reason we put the &#8220;if&#8221; in the C code was to be ANSI C compliant &#8212; according to the standard you can&#8217;t call a function through a pointer of an incompatible function type.</p>
<p>But alas, GCC didn&#8217;t figure this out:</p>
<pre lang="asm">0000000000000000 <foo>:
   0:	85 f6                	test   esi,esi
   2:	48 89 f8             	mov    rax,rdi
   5:	75 11                	jne    18 <foo+0x18>
   7:	be 0a 00 00 00       	mov    esi,0xa
   c:	bf 05 00 00 00       	mov    edi,0x5
  11:	ff e0                	jmp    rax
  13:	0f 1f 44 00 00       	nop    DWORD PTR [rax+rax*1+0x0]
  18:	bf 05 00 00 00       	mov    edi,0x5
  1d:	ff e0                	jmp    rax</pre>
<p>Notice it put a branch in there (&#8220;jne&#8221;) <i>just to avoid</i> putting the value 10 (0xa) in register esi for the one-argument path.</p>
<p>It&#8217;s even worse if I give both functions one parameter, but of different types:</p>
<pre lang="c">typedef union {
  void (*f1)(int);
  void (*f2)(long);
} funcs;

void foo(funcs f, int which) {
  int a = 5;
  if (which) {
    f.f1(a);
  } else {
    f.f2(a);
  }
}</pre>
<p>In this case, GCC again generates a branch, but both paths have <i>identical</i> code in them!</p>
<pre lang="asm">0000000000000000 <foo>:
   0:	85 f6                	test   esi,esi
   2:	48 89 f8             	mov    rax,rdi
   5:	75 09                	jne    10 <foo+0x10>
   7:	bf 05 00 00 00       	mov    edi,0x5
   c:	ff e0                	jmp    rax
   e:	66 90                	xchg   ax,ax
  10:	bf 05 00 00 00       	mov    edi,0x5
  15:	ff e0                	jmp    rax</pre>
<p>Both branches simply move 5 into edi and jump to rax.  There is absolutely no reason to branch here.  Sigh.</p>
<p>C compilers are smart, but have their limits.  Another thing this demonstrates is how programming in C can be a bit constraining compared to assembly language.  The only reason I&#8217;m jumping through these hoops to begin with is that C has very strict rules about pointer conversion: you can&#8217;t just go around casting one function pointer type to another and calling through it, because you&#8217;ll get undefined behavior.  But if you&#8217;re programming in assembly there <i>is</i> no undefined behavior, no worrying about aliasing, etc.  The code I&#8217;m trying to get C to generate in a standards-compliant way would be trivial to write in assembly language directly.</p>
<p>Of course in that case I&#8217;d have to implement the assembly language on every platform I wanted to target.  In the end I&#8217;ll probably use the branchy version that the compiler will generate; the branch will probably predict pretty well, and more importantly for the <i>real</i> fast path for protobuf decoding I have a JIT on the way&#8230;</p>
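<p>For reference, here is a rough sketch of what the branchy, standards-compliant version might look like once registration is included.  Only the union layout comes from the test above; the <code>callback</code> struct, the <code>register_*</code> and <code>invoke</code> functions, and the two sample callbacks are hypothetical names made up for illustration, not upb&#8217;s actual interface.</p>

```c
// Union of the two callback types, as in the test above.
typedef union {
  void (*f1)(int);
  void (*f2)(int, int);
} funcs;

// Hypothetical wrapper: keeps the tag next to the union so the two cannot
// get out of sync.  The tag plays the role of "which" in foo().
typedef struct {
  funcs f;
  int takes_one_arg;
} callback;

static callback register_short(void (*fn)(int)) {
  callback cb;
  cb.f.f1 = fn;
  cb.takes_one_arg = 1;
  return cb;
}

static callback register_full(void (*fn)(int, int)) {
  callback cb;
  cb.f.f2 = fn;
  cb.takes_one_arg = 0;
  return cb;
}

// Always calls through the member that was actually stored, so no function
// is ever called through a pointer of an incompatible type.
static void invoke(const callback *cb, int a, int b) {
  if (cb->takes_one_arg) {
    cb->f.f1(a);
  } else {
    cb->f.f2(a, b);
  }
}

// Sample callbacks (hypothetical) that record what they were called with.
static int last_a, last_b;
static void on_value(int a) { last_a = a; last_b = -1; }
static void on_value_extra(int a, int b) { last_a = a; last_b = b; }
```

<p>A client that doesn&#8217;t care about the final parameter would call <code>register_short(on_value)</code>, and <code>invoke()</code> pays for the flexibility with the one branch discussed above.</p>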
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2011/03/03/gcc-the-impressive-and-the-disappointing/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Torn over the C++ question</title>
		<link>http://blog.reverberate.org/2009/12/02/torn-over-the-c-question/</link>
		<comments>http://blog.reverberate.org/2009/12/02/torn-over-the-c-question/#comments</comments>
		<pubDate>Wed, 02 Dec 2009 20:17:57 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=301</guid>
		<description><![CDATA[I am having a very difficult time deciding whether to go through with the C++ port of upb or to stay in C. I&#8217;ve ported about one third of upb to C++, on a branch, to see how it would &#8230; <a href="http://blog.reverberate.org/2009/12/02/torn-over-the-c-question/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I am having a very difficult time deciding whether to go through with the C++ port of upb or to stay in C.</p>
<p>I&#8217;ve ported about one third of upb to C++, on a branch, to see how it would turn out.  It was a ton of work.  Here are my current observations:</p>
<ul>
<li>The C++ is cleaner, more readable, less error-prone code.  It&#8217;s just a fact.  Compare for yourself (C: <a href="http://github.com/haberman/upb/blob/a95ab58e79c50b0927eae2b834d3de20a8effc36/src/upb_def.h">upb_def.h</a>, <a href="http://github.com/haberman/upb/blob/a95ab58e79c50b0927eae2b834d3de20a8effc36/src/upb_def.c">upb_def.c</a>; C++: <a href="http://github.com/haberman/upb/blob/cplusplus/src/upb_def.h">upb_def.h</a>, <a href="http://github.com/haberman/upb/blob/cplusplus/src/upb_def.cc">upb_def.cc</a>). This is due to numerous factors:
<ul>
<li>type-safe containers means fewer casts.</li>
<li>&#8220;public&#8221; and &#8220;private&#8221; keywords make it easy to separate the private parts of your interface, without having to specify in comments which is which.</li>
<li>namespaces and class scope mean that I don&#8217;t have to write out my identifiers like upb_fielddef_dothis(), I can just write DoThis().</li>
<li>real inheritance and member classes mean I don&#8217;t have to explicitly call all the right constructors/destructors, or write explicit casts for upcasts</li>
<li>destructors that are guaranteed to run on scope exit mean I can use RAII patterns like mutexes that automatically unlock when the scope is exited</li>
</ul>
</li>
<li>The source got shorter; the portion I ported went from 1483 lines to 1133, or a ~24% reduction.</li>
<li>The binary got a LOT bigger.  I had one function get literally 5x as big.  I haven&#8217;t figured out why this happened yet.  I used templates to make the table generic, but I was extremely careful to make sure that the template only generated a small amount of code &#8212; basically just the hash lookup routine, which is small (note: the hash <i>function</i> for strings was not templated or inlined).  But another issue is that the C++ compiler appears to emit multiple copies of the same function in the same object file!  For example, I found some virtual destructors emitted literally three times in the same file.  Why is this?</li>
<li>I just heard back from a security guru from the Google security team, who said that C is often easier to audit than C++ because it&#8217;s easier to figure out what is actually going on, without having to dig through layers of abstraction.  This surprised me (maybe it shouldn&#8217;t have, since <a href="http://www.emerose.com/">Sam Quigley</a> said the same thing in a comment on my last entry), but I was also a little bit relieved.</li>
</ul>
<p>I&#8217;m leaning towards sticking with C, for the following reasons:</p>
<ul>
<li>C++ compilers aren&#8217;t very good at keeping things small, even when you are judicious with your use of templates.</li>
<li>C++ compilers are much more complicated than C compilers, and therefore not as ubiquitous or as easy to trust generally.</li>
<li>C isn&#8217;t harder to audit for security than C++, and may actually be easier.</li>
</ul>
<p>I&#8217;ll try to take some of the lessons I learned from my partial C++ port to make the C more readable.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/12/02/torn-over-the-c-question/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Gazelle/upb status and plans (aka: On Releasing)</title>
		<link>http://blog.reverberate.org/2009/11/28/gazelleupb-status-and-plans-aka-on-releasing/</link>
		<comments>http://blog.reverberate.org/2009/11/28/gazelleupb-status-and-plans-aka-on-releasing/#comments</comments>
		<pubDate>Sun, 29 Nov 2009 01:24:15 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=293</guid>
		<description><![CDATA[This summer my friends Ben and Mike gave me grief about never releasing anything. Their criticism is definitely valid to some degree. I&#8217;ve been working on Gazelle for about two years now, and upb for almost one. Gazelle has had &#8230; <a href="http://blog.reverberate.org/2009/11/28/gazelleupb-status-and-plans-aka-on-releasing/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>This summer my friends <a href="http://benjaminbernard.blogspot.com/">Ben</a> and <a href="http://www.technofetish.net/buffaloblog/">Mike</a> gave me grief about never releasing anything.  Their criticism is definitely valid to some degree.  I&#8217;ve been working on <a href="http://www.gazelle-parser.org/">Gazelle</a> for about two years now, and <a href="http://wiki.github.com/haberman/upb">upb</a> for almost one.  Gazelle has had <a href="http://github.com/haberman/gazelle/blob/master/ReleaseNotes">four releases</a> in that time, but they have mostly focused on moving Gazelle to where I think it ought to be, as opposed to releasing something hacky that people can actually use now.  There is a class of problems that Gazelle is useful for now, but it is pretty small in comparison to the amount of work I&#8217;ve put in.</p>
<p>I haven&#8217;t released upb at all yet, and my last message indicating I&#8217;m thinking of porting it to C++ will probably make skeptical readers think I&#8217;m moving farther away from a release rather than closer to one.</p>
<p>Since I agree that my progress doesn&#8217;t look too promising to someone observing from the outside, let me say where I think these projects currently are, where they&#8217;re going, and when they&#8217;re likely to release.</p>
<p>First of all, Gazelle is currently pushed on the stack until I have upb released.  The reason is that I realized that Protocol Buffers are the answer to two big problems I was facing with Gazelle:</p>
<ol>
<li><b>byte-code format</b>: right now the Gazelle byte-code format is LLVM&#8217;s <a href="http://llvm.org/docs/BitCodeFormat.html">BitCode</a>, which is the format LLVM uses for storing its byte-code internally.  I invested a lot into BitCode (you&#8217;ll notice my name is on the linked document), including writing a standalone encoder and decoder (<a href="http://github.com/haberman/gazelle/blob/master/compiler/bc.lua">230 lines of Lua</a> and <a href="http://github.com/haberman/gazelle/blob/master/runtime/bc_read_stream.c">856 lines of C</a>, respectively).  But this was before I worked at Google or knew about Protocol Buffers.  Protocol Buffers are much easier to use because they have a formal schema (the .proto file) that can generate nice APIs and help you out with backward compatibility.  Without a formal schema, BitCode makes you resort to things like <a href="http://github.com/haberman/gazelle/blob/master/docs/FILEFORMAT">an ad hoc text file that describes the schema</a>.  This approach was showing its limits.</li>
<li><b>parse tree format</b>: I always knew I wanted Gazelle to be capable of generating parse trees in some kind of standard format.  Protocol Buffers end up being a match made in heaven, since they are isomorphic to parse trees in a very deep way.  Indeed, <a href="http://scottmcpeak.com/elkhound/sources/ast/index.html">the ast system for the Elkhound Parser</a> is very much like Protocol Buffers in that you define your parse tree format and it generates classes for representing your AST.</li>
</ol>
<p>Since Gazelle is gated on upb, the question then is: when will upb release?  Why hasn&#8217;t it released already?</p>
<p>A few months ago I was working on upb for 100% of my time at work.  I had banked 20% time for a while, and I was also a bit burned out on my 80% project, so my manager very graciously gave me the liberty to work on upb for all of my working hours.</p>
<p>During that time upb made progress in several areas.  It got some better benchmarks and tests, and I fleshed out the upb compiler so that it wasn&#8217;t dependent on the official Protocol Buffers compiler for bootstrapping.  Maybe most importantly, I worked a lot on the in-memory message format to figure out how to make it work well with dynamic languages.</p>
<p>My goal during that time was to write a Python extension that a few initial internal-to-Google customers could use.  The value proposition is that it would be API-compatible with what they were already using, but many times faster.  I wrote <a href="http://github.com/haberman/upb/blob/master/lang_ext/python/pb.c">said extension</a>, which was incomplete (supported decoding only, not encoding), but looked complete enough to use for this case.</p>
<p>By this time I was approaching the amount of time I could reasonably ask from my manager at work, so I had to tie up the loose ends and get it into my initial customer&#8217;s hands.  I put all the pieces together and tried it out, but then ran into a problem; I hadn&#8217;t realized that this initial customer was using an old deprecated feature of Protocol Buffers called MessageSet.  There was no way I could support MessageSet without significant changes.  I was defeated for the moment.  I had to take a break for a few months and re-devote my time to my 80% project.</p>
<p>I mention this all just to illustrate that I do have actual customers that I am targeting, and I have had aggressive pushes to deliver something to those customers, but unfortunately my work wasn&#8217;t complete enough for them yet.</p>
<p>This brings us up to now.  In the last week or two, I have made several strides, including executing on part of a design that will get me MessageSet support.  I have also developed an interface for a &#8220;pick parser&#8221;, which lets you pull only a small subset of fields out of a protobuf.  This will be a big win for use cases that only need a few fields from a very large proto, and I have a customer internal-to-Google who is very interested in this interface.</p>
<p>Meanwhile I&#8217;m very interested in trying to get the upb Python extension into AppEngine, because I think it could be a huge win there since users aren&#8217;t allowed to load custom Python extensions.  This means that currently, people trying to use protocol buffers on AppEngine are limited to pure-Python extensions that are much slower than a C extension can be.  But to get into AppEngine I will need to get a security audit, which is part of the reason I am leaning towards C++ at this point.  I think C++ will make the code shorter and less gnarly (fewer casts), which should lead to easier verifiability.  I converted one header file so far, and it got 38% smaller and much easier to read.</p>
<p>I hesitate to make schedule estimates, but my main purpose is to impress on my possibly-impatient audience that:</p>
<ul>
<li>I do have motivation to release.</li>
<li>I do have initial customers and initial use cases.</li>
<li>I am making progress.</li>
<li>I am currently focused on delivering (1) a pick parser, (2) a Python extension, (3) an easily-auditable code-base.</li>
<li>I look forward to being able to announce my first release!</li>
</ul>
<p>Yours,<br />
Josh</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/11/28/gazelleupb-status-and-plans-aka-on-releasing/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

