<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Josh Haberman</title>
	<atom:link href="http://blog.reverberate.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.reverberate.org</link>
	<description>parsing, performance, minimalism with C99</description>
	<lastBuildDate>Wed, 02 Dec 2009 20:45:36 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Torn over the C++ question</title>
		<link>http://blog.reverberate.org/2009/12/02/torn-over-the-c-question/</link>
		<comments>http://blog.reverberate.org/2009/12/02/torn-over-the-c-question/#comments</comments>
		<pubDate>Wed, 02 Dec 2009 20:17:57 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=301</guid>
		<description><![CDATA[I am having a very difficult time deciding whether to go through with the C++ port of upb or to stay in C.
I&#8217;ve ported about one third of upb to C++, on a branch, to see how it would turn out.  It was a ton of work.  Here are my current observations:

The C++ [...]]]></description>
			<content:encoded><![CDATA[<p>I am having a very difficult time deciding whether to go through with the C++ port of upb or to stay in C.</p>
<p>I&#8217;ve ported about one third of upb to C++, on a branch, to see how it would turn out.  It was a ton of work.  Here are my current observations:</p>
<ul>
<li>The C++ is cleaner, more readable, less error-prone code.  It&#8217;s just a fact.  Compare for yourself (C: <a href="http://github.com/haberman/upb/blob/a95ab58e79c50b0927eae2b834d3de20a8effc36/src/upb_def.h">upb_def.h</a>, <a href="http://github.com/haberman/upb/blob/a95ab58e79c50b0927eae2b834d3de20a8effc36/src/upb_def.c">upb_def.c</a>; C++: <a href="http://github.com/haberman/upb/blob/cplusplus/src/upb_def.h">upb_def.h</a>, <a href="http://github.com/haberman/upb/blob/cplusplus/src/upb_def.cc">upb_def.cc</a>). This is due to numerous factors:
<ul>
<li>type-safe containers mean fewer casts.</li>
<li>&#8220;public&#8221; and &#8220;private&#8221; keywords make it easy to separate the private parts of your interface, without having to specify in comments which is which.</li>
<li>namespaces and class scope mean that I don&#8217;t have to write out my identifiers like upb_fielddef_dothis(); I can just write DoThis().</li>
<li>real inheritance and member classes mean I don&#8217;t have to explicitly call all the right constructors/destructors, or write explicit casts for upcasts.</li>
<li>destructors that are guaranteed to run on scope exit mean I can use RAII patterns, like scoped locks that automatically unlock a mutex when the scope is exited.</li>
</ul>
</li>
<li>The source got shorter; the portion I ported went from 1483 lines to 1133, or a ~24% reduction.</li>
<li>The binary got a LOT bigger.  I had one function get literally 5x as big.  I haven&#8217;t figured out why this happened yet.  I used templates to make the table generic, but I was extremely careful to make sure that the template only generated a small amount of code &#8212; basically just the hash lookup routine, which is small (note: the hash <i>function</i> for strings was not templated or inlined).  But another issue is that the C++ compiler appears to emit multiple copies of the same function in the same object file!  For example, I found some virtual destructors emitted literally three times in the same file.  Why is this?</li>
<li>I just heard back from a security guru from the Google security team, who said that C is often easier to audit than C++ because it&#8217;s easier to figure out what is actually going on, without having to dig through layers of abstraction.  This surprised me (maybe it shouldn&#8217;t have, since <a href="http://www.emerose.com/">Sam Quigley</a> said the same thing in a comment on my last entry), but I was also a little bit relieved.</li>
</ul>
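<p>To make the RAII point concrete, here is a minimal sketch of what the C side has to do by hand. Without destructors, every early exit must still release the mutex, so the usual idiom funnels all returns through a single cleanup label. The table type and function name are hypothetical (not upb&#8217;s API), and the sketch assumes POSIX threads.</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">#include &lt;pthread.h&gt;
#include &lt;stddef.h&gt;
#include &lt;string.h&gt;

/* Hypothetical fixed-size table guarded by a mutex (not upb's real API). */
struct locked_table {
  pthread_mutex_t lock;
  const char *keys[8];
  int values[8];
  size_t count;
};

/* Looks up key and stores its value in *out; returns 1 if found, else 0.
 * C has no destructors, so every exit path must reach the unlock by hand:
 * all returns go through the single "done" label. */
int table_lookup_locked(struct locked_table *t, const char *key, int *out) {
  int found = 0;
  size_t i;
  pthread_mutex_lock(&amp;t-&gt;lock);
  if (key == NULL) goto done;           /* early exit must still unlock */
  for (i = 0; i &lt; t-&gt;count; i++) {
    if (strcmp(t-&gt;keys[i], key) == 0) {
      *out = t-&gt;values[i];
      found = 1;
      goto done;
    }
  }
done:
  pthread_mutex_unlock(&amp;t-&gt;lock);      /* what a C++ scoped lock does implicitly */
  return found;
}</pre></div></div>

<p>In C++, the same function can construct a scoped lock object at the top and use plain <code>return</code> statements, because the lock&#8217;s destructor runs on every exit path.</p>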
<p>I&#8217;m leaning towards sticking with C, for the following reasons:
<ul>
<li>C++ compilers aren&#8217;t very good at keeping things small, even when you are judicious with your use of templates.</li>
<li>C++ compilers are much more complicated than C compilers, and therefore not as ubiquitous or as easy to trust generally.</li>
<li>C isn&#8217;t harder to audit for security than C++, and may actually be easier.</li>
</ul>
<p>I&#8217;ll try to take some of the lessons I learned from my partial C++ port to make the C more readable.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/12/02/torn-over-the-c-question/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Gazelle/upb status and plans (aka: On Releasing)</title>
		<link>http://blog.reverberate.org/2009/11/28/gazelleupb-status-and-plans-aka-on-releasing/</link>
		<comments>http://blog.reverberate.org/2009/11/28/gazelleupb-status-and-plans-aka-on-releasing/#comments</comments>
		<pubDate>Sun, 29 Nov 2009 01:24:15 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=293</guid>
		<description><![CDATA[This summer my friends Ben and Mike gave me grief about never releasing anything.  Their criticism is definitely valid to some degree.  I&#8217;ve been working on Gazelle for about two years now, and upb for almost one.  Gazelle has had four releases in that time, but they have mostly focused on moving [...]]]></description>
			<content:encoded><![CDATA[<p>This summer my friends <a href="http://benjaminbernard.blogspot.com/">Ben</a> and <a href="http://www.technofetish.net/buffaloblog/">Mike</a> gave me grief about never releasing anything.  Their criticism is definitely valid to some degree.  I&#8217;ve been working on <a href="http://www.gazelle-parser.org/">Gazelle</a> for about two years now, and <a href="http://wiki.github.com/haberman/upb">upb</a> for almost one.  Gazelle has had <a href="http://github.com/haberman/gazelle/blob/master/ReleaseNotes">four releases</a> in that time, but they have mostly focused on moving Gazelle to where I think it ought to be, as opposed to releasing something hacky that people can actually use now.  There is a class of problems that Gazelle is useful for now, but it is pretty small in comparison to the amount of work I&#8217;ve put in.</p>
<p>I haven&#8217;t released upb at all yet, and my last post, in which I said I&#8217;m thinking of porting it to C++, will probably make skeptical readers think I&#8217;m moving further away from a release rather than closer to one.</p>
<p>Since I agree that my progress doesn&#8217;t look too promising to someone observing from the outside, let me say where I think these projects currently are, where they&#8217;re going, and when they&#8217;re likely to release.</p>
<p>First of all, Gazelle is currently pushed on the stack until I have upb released.  The reason is that I realized that Protocol Buffers are the answer to two big problems I was facing with Gazelle:
<ol>
<li><b>byte-code format</b>: right now the Gazelle byte-code format is LLVM&#8217;s <a href="http://llvm.org/docs/BitCodeFormat.html">BitCode</a>, which is the format LLVM uses for storing its byte-code internally.  I invested a lot into BitCode (you&#8217;ll notice my name is on the linked document), including writing a standalone encoder and decoder (<a href="http://github.com/haberman/gazelle/blob/master/compiler/bc.lua">230 lines of Lua</a> and <a href="http://github.com/haberman/gazelle/blob/master/runtime/bc_read_stream.c">856 lines of C</a>, respectively).  But this was before I worked at Google or knew about Protocol Buffers.  Protocol Buffers are much easier to use because they have a formal schema (the .proto file) that can generate nice APIs and help you out with backward compatibility.  Without a formal schema, BitCode makes you resort to things like <a href="http://github.com/haberman/gazelle/blob/master/docs/FILEFORMAT">an ad hoc text file that describes the schema</a>.  This approach was showing its limits.</li>
<li><b>parse tree format</b>: I always knew I wanted Gazelle to be capable of generating parse trees in some kind of standard format.  Protocol Buffers end up being a match made in heaven, since they are isomorphic to parse trees in a very deep way.  Indeed, <a href="http://scottmcpeak.com/elkhound/sources/ast/index.html">the AST system for the Elkhound parser</a> is very much like Protocol Buffers in that you define your parse tree format and it generates classes for representing your AST.</li>
</ol>
<p>Since Gazelle is gated on upb, the question then is: when will upb release?  Why hasn&#8217;t it released already?</p>
<p>A few months ago I was working on upb for 100% of my time at work.  I had banked 20% time for a while, and I was also a bit burned out on my 80% project, so my manager very graciously gave me the liberty to work on upb for all of my working hours.</p>
<p>During that time upb made progress in several areas.  It got some better benchmarks and tests, and I fleshed out the upb compiler so that it wasn&#8217;t dependent on the official Protocol Buffers compiler for bootstrapping.  Maybe most importantly, I worked a lot on the in-memory message format to figure out how to make it work well with dynamic languages.</p>
<p>My goal during that time was to write a Python extension that a few initial internal-to-Google customers could use.  The value proposition is that it would be API-compatible with what they were already using, but many times faster.  I wrote <a href="http://github.com/haberman/upb/blob/master/lang_ext/python/pb.c">said extension</a>, which was incomplete (supported decoding only, not encoding), but looked complete enough to use for this case.</p>
<p>By this time I was approaching the amount of time I could reasonably ask from my manager at work, so I had to tie up the loose ends and get it into my initial customer&#8217;s hands.  I put all the pieces together and tried it out, but then ran into a problem; I hadn&#8217;t realized that this initial customer was using an old deprecated feature of Protocol Buffers called MessageSet.  There was no way I could support MessageSet without significant changes.  I was defeated for the moment.  I had to take a break for a few months and re-devote my time to my 80% project.</p>
<p>I mention this all just to illustrate that I do have actual customers that I am targeting, and I have had aggressive pushes to deliver something to those customers, but unfortunately my work wasn&#8217;t complete enough for them yet.</p>
<p>This brings us up to now.  In the last week or two, I have made several strides, including executing on part of a design that will get me MessageSet support.  I have also developed an interface for a &#8220;pick parser&#8221;, which lets you pull only a small subset of fields out of a protobuf.  This will be a big win for use cases that only need a few fields from a very large proto, and I have a customer internal-to-Google who is very interested in this interface.</p>
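<p>A pick parser is possible because the protobuf wire format lets a decoder skip any field it doesn&#8217;t want using only that field&#8217;s wire type, without decoding its contents. Here is a minimal sketch in C that extracts a single varint field and skips everything else. The function names are hypothetical, this is not upb&#8217;s actual pick-parser interface, and it does not handle the deprecated group wire types (3 and 4).</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* Decodes a base-128 varint at *p and advances *p past it.
 * Returns 0 on success, -1 if the input ends early or runs past 10 bytes. */
static int read_varint(const uint8_t **p, const uint8_t *end, uint64_t *out) {
  uint64_t val = 0;
  int shift;
  for (shift = 0; shift &lt; 64; shift += 7) {
    uint8_t byte;
    if (*p == end) return -1;
    byte = *(*p)++;
    val |= (uint64_t)(byte &amp; 0x7f) &lt;&lt; shift;
    if ((byte &amp; 0x80) == 0) { *out = val; return 0; }
  }
  return -1;
}

/* Hypothetical pick helper: returns 1 and stores the value if varint field
 * `want` is present, 0 if it is absent, -1 on malformed input. Other fields
 * are skipped by wire type without being decoded. */
int pick_varint_field(const uint8_t *buf, size_t len, uint32_t want,
                      uint64_t *out) {
  const uint8_t *p = buf, *end = buf + len;
  int found = 0;
  while (p &lt; end) {
    uint64_t tag, v;
    if (read_varint(&amp;p, end, &amp;tag) &lt; 0) return -1;
    switch (tag &amp; 7) {               /* low 3 bits: wire type */
      case 0:                          /* varint */
        if (read_varint(&amp;p, end, &amp;v) &lt; 0) return -1;
        if ((tag &gt;&gt; 3) == want) { *out = v; found = 1; }  /* last one wins */
        break;
      case 1:                          /* fixed 64-bit: skip 8 bytes */
        if (end - p &lt; 8) return -1;
        p += 8;
        break;
      case 2:                          /* length-delimited: skip payload */
        if (read_varint(&amp;p, end, &amp;v) &lt; 0 || v &gt; (uint64_t)(end - p)) return -1;
        p += v;
        break;
      case 5:                          /* fixed 32-bit: skip 4 bytes */
        if (end - p &lt; 4) return -1;
        p += 4;
        break;
      default:                         /* groups (3, 4) or invalid */
        return -1;
    }
  }
  return found;
}</pre></div></div>

<p>Because protobuf semantics say the last occurrence of a scalar field wins, this sketch scans the whole buffer rather than stopping at the first match; the savings come from never decoding the skipped submessages and strings.</p>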
<p>Meanwhile I&#8217;m very interested in trying to get the upb Python extension into AppEngine, because I think it could be a huge win there, since users aren&#8217;t allowed to load their own C extensions.  This means that, currently, people trying to use Protocol Buffers on AppEngine are limited to pure-Python implementations that are much slower than a C extension can be.  But to get into AppEngine I will need to get a security audit, which is part of the reason I am leaning towards C++ at this point.  I think C++ will make the code shorter and less gnarly (fewer casts), which should make it easier to verify.  I have converted one header file so far, and it got 38% smaller and much easier to read.</p>
<p>I hesitate to make schedule estimates, but my main purpose is to impress on my possibly-impatient audience that:
<ul>
<li>I do have motivation to release.</li>
<li>I do have initial customers and initial use cases.</li>
<li>I am making progress.</li>
<li>I am currently focused on delivering (1) a pick parser, (2) a Python extension, (3) an easily-auditable code-base.</li>
<li>I look forward to being able to announce my first release!</li>
</ul>
<p>Yours,<br />
Josh</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/11/28/gazelleupb-status-and-plans-aka-on-releasing/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Porting upb to C++?</title>
		<link>http://blog.reverberate.org/2009/11/28/porting-upb-to-c/</link>
		<comments>http://blog.reverberate.org/2009/11/28/porting-upb-to-c/#comments</comments>
		<pubDate>Sat, 28 Nov 2009 20:53:02 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[upb]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=287</guid>
		<description><![CDATA[I am on the verge of trying something I never thought I&#8217;d do.  I&#8217;m considering porting upb to C++.
My reasons aren&#8217;t ideological, they are highly practical.  Basically I am realizing that while object-oriented C is OK for a while, it&#8217;s very weak at inheritance.  Inheritance in C involves a lot of casting, [...]]]></description>
			<content:encoded><![CDATA[<p>I am on the verge of trying something I never thought I&#8217;d do.  I&#8217;m considering porting upb to C++.</p>
<p>My reasons aren&#8217;t ideological; they are highly practical.  Basically I am realizing that while object-oriented C is OK for a while, it&#8217;s very weak at inheritance.  Inheritance in C involves a lot of casting, duplicated code and/or macros, and careful discipline.  The main problems with this are:
<ul>
<li>the code gets longer and less readable</li>
<li>the code involves more possibly-unsafe operations like casts</li>
</ul>
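<p>For readers who haven&#8217;t written object-oriented C, here is a minimal sketch of the usual inheritance idiom and where the casts come from. The base struct is embedded as the first member so that a derived pointer can be converted to a base pointer; C guarantees there is no padding before the first member. The type and function names are hypothetical, not upb&#8217;s actual definitions.</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">#include &lt;stddef.h&gt;

enum { DEF_MSG = 1, DEF_ENUM = 2 };

/* Hypothetical base "class". */
struct def {
  int type;              /* run-time tag, checked before downcasting */
  const char *name;
};

/* Hypothetical derived "class": the base must stay the first member. */
struct msgdef {
  struct def base;
  int num_fields;
};

/* Upcast: always valid, but still written as an explicit cast. */
static struct def *msgdef_upcast(struct msgdef *m) {
  return (struct def *)m;
}

/* Downcast: only correct if the tag matches; the compiler cannot check it. */
static struct msgdef *def_to_msgdef(struct def *d) {
  return d-&gt;type == DEF_MSG ? (struct msgdef *)d : NULL;
}</pre></div></div>

<p>Every subclass needs its own pair of helpers like these (or a macro that generates them), which is where the duplicated code and casting come from.</p>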
<p>Both of these problems make the code ultimately more difficult to audit for security.  And getting upb audited for security is something I plan to do very soon.</p>
<p>I am coming to believe that porting to C++ would make upb smaller (in lines of code) and easier to verify for security.  However, there are a few major disadvantages that are giving me pause:
<ul>
<li>there are still some contexts in which C++ is a no-go, like the Linux kernel, embedded systems that only have a C compiler (but no C++), or projects that want to stay C-only.  Doing this port would make upb unsuitable for these contexts.</li>
<li>projects that are currently C-only would need to create C++ source files to call upb APIs, and would have to link in the C++ runtime</li>
<li>(possible) C++ could result in a larger binary.</li>
</ul>
<p>When I look at the downsides though, they don&#8217;t seem to pertain to my initial goals of making upb useful for Python, Lua, Ruby, etc. extensions, and for use inside Google.  Being useful for really restricted embedded systems is a far-off use case.  So it&#8217;s sounding like porting to C++ is the right thing to do.</p>
<p>I hope it significantly reduces the line count, as I expect it will.  That will make me feel better about giving up the minimalism of C.  I will definitely be compiling with <tt>-fno-exceptions -fno-rtti -fvisibility-inlines-hidden</tt> on gcc.  I also won&#8217;t be using any of the C++ standard library (not even &lt;string&gt;).</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/11/28/porting-upb-to-c/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Wanted: a portable mutex and atomic refcount</title>
		<link>http://blog.reverberate.org/2009/08/14/wanted-a-portable-mutex-and-atomic-refcount/</link>
		<comments>http://blog.reverberate.org/2009/08/14/wanted-a-portable-mutex-and-atomic-refcount/#comments</comments>
		<pubDate>Sat, 15 Aug 2009 02:12:02 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[wanted]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=275</guid>
		<description><![CDATA[upb needs to have some lightweight thread-aware behavior.  I&#8217;m leaving most synchronization up to users (individual messages will not be thread-safe), but there are a few central structures I need to make thread-safe and reference-counted.
I need only the tiniest bit of functionality:

a portable mutex.
a portable atomic_t that lets me atomic_inc() and atomic_dec().

We&#8217;re talking &#8220;lives [...]]]></description>
			<content:encoded><![CDATA[<p>upb needs to have some lightweight thread-aware behavior.  I&#8217;m leaving most synchronization up to users (individual messages will <i>not</i> be thread-safe), but there are a few central structures I need to make thread-safe and reference-counted.</p>
<p>I need only the tiniest bit of functionality:</p>
<ul>
<li>a portable mutex.</li>
<li>a portable atomic_t that lets me atomic_inc() and atomic_dec().</li>
</ul>
<p>We&#8217;re talking &#8220;lives in one single header&#8221; small.  The mutex would just be a wrapper around existing mutex implementations (pthreads, Windows, etc.), and since those routines typically take care of any memory barriers you need to safely read/mutate the shared state, I wouldn&#8217;t have to worry about that.</p>
<p>The atomic type would have to be hand-coded and architecture-specific, since most threading libraries don&#8217;t provide one.  Its purpose would be reference counting.  If you are reference-counting an immutable structure, then you don&#8217;t need to worry about memory barriers to ensure the consistency of that structure; if you&#8217;re reference-counting a <i>mutable</i> structure, then you&#8217;ll need to protect the mutable state with mutexes and acquire the mutex before freeing anything.</p>
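<p>Here is a minimal sketch of such a reference count on an immutable object. Rather than hand-coded assembly, it assumes GCC&#8217;s <tt>__sync</tt> builtins (available since GCC 4.1 and documented as full memory barriers), which is a GCC/Clang-specific choice; a hand-written x86 version would typically use <tt>lock xadd</tt>. The type and function names are hypothetical.</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">#include &lt;stdlib.h&gt;

/* Atomic counter; only touched through the helpers below. */
typedef struct { int v; } tiny_atomic_t;

static inline void tiny_atomic_inc(tiny_atomic_t *a) {
  (void)__sync_fetch_and_add(&amp;a-&gt;v, 1);
}

/* Returns nonzero if this call dropped the count to zero. */
static inline int tiny_atomic_dec(tiny_atomic_t *a) {
  return __sync_sub_and_fetch(&amp;a-&gt;v, 1) == 0;
}

/* Hypothetical immutable, reference-counted object: payload is never
 * written after creation, so only the count itself needs to be atomic. */
typedef struct {
  tiny_atomic_t refcount;
  int payload;
} shared_obj;

shared_obj *shared_obj_new(int payload) {
  shared_obj *o = malloc(sizeof *o);
  if (o == NULL) return NULL;
  o-&gt;refcount.v = 1;           /* the creator holds the first reference */
  o-&gt;payload = payload;
  return o;
}

void shared_obj_ref(shared_obj *o) { tiny_atomic_inc(&amp;o-&gt;refcount); }

/* Drops one reference; frees the object and returns 1 if it was the last. */
int shared_obj_unref(shared_obj *o) {
  if (tiny_atomic_dec(&amp;o-&gt;refcount)) {
    free(o);
    return 1;
  }
  return 0;
}</pre></div></div>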
<p>The library (er, header file) should also support compiling everything to nothing if NO_THREAD_SAFETY is defined as a preprocessor symbol.</p>
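<p>For the mutex half, a minimal sketch of that switch might look like the following. It assumes POSIX threads for the threaded build (a Windows branch would wrap <tt>CRITICAL_SECTION</tt> the same way), and the <tt>tiny_mutex_*</tt> names are hypothetical. POSIX requires <tt>pthread_mutex_lock</tt> and <tt>pthread_mutex_unlock</tt> to synchronize memory, which is why the wrappers need no barriers of their own.</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">#include &lt;stddef.h&gt;

#ifdef NO_THREAD_SAFETY
/* Single-threaded build: every call compiles away. C99 does not allow
 * empty structs, hence the one-byte placeholder member. */
typedef struct { char unused; } tiny_mutex_t;
#define tiny_mutex_init(m)    ((void)(m))
#define tiny_mutex_lock(m)    ((void)(m))
#define tiny_mutex_unlock(m)  ((void)(m))
#define tiny_mutex_destroy(m) ((void)(m))
#else
/* Threaded build: thin wrappers over POSIX threads. */
#include &lt;pthread.h&gt;
typedef pthread_mutex_t tiny_mutex_t;
#define tiny_mutex_init(m)    ((void)pthread_mutex_init((m), NULL))
#define tiny_mutex_lock(m)    ((void)pthread_mutex_lock(m))
#define tiny_mutex_unlock(m)  ((void)pthread_mutex_unlock(m))
#define tiny_mutex_destroy(m) ((void)pthread_mutex_destroy(m))
#endif</pre></div></div>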
<p>Yes, that all sounds good.  Tiny yet functional.  I&#8217;ll be writing this very soon unless something exactly like what I&#8217;ve described happens to already exist.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/08/14/wanted-a-portable-mutex-and-atomic-refcount/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Wanted: a mailing list reader website</title>
		<link>http://blog.reverberate.org/2009/08/10/wanted-a-mailing-list-reader-website/</link>
		<comments>http://blog.reverberate.org/2009/08/10/wanted-a-mailing-list-reader-website/#comments</comments>
		<pubDate>Tue, 11 Aug 2009 05:33:17 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=265</guid>
		<description><![CDATA[There is tons of interesting discussion that happens on technical mailing lists.  Mailing lists are the best snapshot of the state of a software project; they capture what current users are trying to do, where they&#8217;re succeeding, where they&#8217;re running into trouble, and what the current plans are for making things better.
Unfortunately, there is [...]]]></description>
			<content:encoded><![CDATA[<p>There is tons of interesting discussion that happens on technical mailing lists.  Mailing lists are the best snapshot of the state of a software project; they capture what current users are trying to do, where they&#8217;re succeeding, where they&#8217;re running into trouble, and what the current plans are for making things better.</p>
<p>Unfortunately, there is still no good way (AFAICS) to lurk on high-volume mailing lists.  Your current options are:</p>
<ul>
<li><b>Subscribe your personal email address to the mailing list.</b>  <b>Pros:</b> threads well, tracks &#8220;read&#8221; status well, easy to reply.  <b>Cons:</b> too much overhead to subscribe to a new list (subscribe, confirm, set up mail filter), mail builds up as unread if you don&#8217;t read it for a while.  Overwhelming for high volume lists.  Not convenient when your level of interest in a list varies.</li>
<li><b>Read on gmane.org.</b>  <b>Pros:</b> threads somewhat well (new messages on old threads get lost, because the whole thread is sorted based on when its first message arrived), tracks your &#8220;read&#8221; status (but not across browsers or computers), easy to track a single list and only read what interests you.  <b>Cons:</b> not easy to track multiple lists.  There&#8217;s no top-level &#8220;what&#8217;s new on lists I care about&#8221; view.  Not easy to reply (the built-in editor makes you wrap lines yourself).</li>
<li><b>RSS feed from gmane.org.</b>  <b>Pros:</b> easy to track multiple lists, old mail doesn&#8217;t build up if I don&#8217;t read it for a while, lets me read mailing list posts alongside my other feeds.  <b>Cons:</b> RSS is a terrible match for mailing lists.  It doesn&#8217;t understand threads at all.  No easy way to reply.  Also, the gmane RSS feeds link you to the blog interface, which is equally terrible at threading.</li>
<li><b>The <a href="http://lurker.sourceforge.net/">Lurker</a> email archiver. </b> I&#8217;ve been a fan of this project for a while.  <b>Pros:</b> GREAT interface (check out this <a href="http://archives.free.net.ph/list/modperl.en.html">demo site</a>), top-level page that summarizes threads and their activity, thread view that shows you replies according to both time and threading.  <b>Cons:</b> you have to run it yourself (it&#8217;s a project, not a service), doesn&#8217;t remember what you&#8217;ve read, and no easy way to reply.</li>
</ul>
<p>What I would <i>love</i> is a website that would let me easily lurk on mailing lists.  I&#8217;d love an interface like Lurker, but that I log into so that it knows what I&#8217;ve read.  I&#8217;d want a top-level view that shows popular threads across ALL mailing lists I&#8217;m lurking on, not just one mailing list at a time.  And I&#8217;d want the ability to easily reply to mailing list threads.</p>
<p>If anyone knows a better way to get what I want, please let me know!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/08/10/wanted-a-mailing-list-reader-website/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Giving up on AT&amp;T style assembler syntax</title>
		<link>http://blog.reverberate.org/2009/07/31/giving-up-on-att-style-assembler-syntax/</link>
		<comments>http://blog.reverberate.org/2009/07/31/giving-up-on-att-style-assembler-syntax/#comments</comments>
		<pubDate>Fri, 31 Jul 2009 23:07:05 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=262</guid>
		<description><![CDATA[Until recently I had been pretty agnostic about Intel vs. AT&#038;T style assembler syntax.  I always noticed that people who had a strong opinion almost always preferred Intel-style, but I didn&#8217;t care too much one way or the other.
gcc was my first real compiler, and this was back before binutils supported Intel syntax like [...]]]></description>
			<content:encoded><![CDATA[<p>Until recently I had been pretty agnostic about Intel vs. AT&#038;T style assembler syntax.  I always noticed that people who had a strong opinion almost always preferred Intel-style, but I didn&#8217;t care too much one way or the other.</p>
<p>gcc was my first real compiler, and this was back before binutils supported Intel syntax as it does now.  So I read <a href="http://www.delorie.com/djgpp/doc/brennan/brennan_att_inline_djgpp.html">Brennan&#8217;s Guide to Inline Assembly</a> (which I still reference frequently), and didn&#8217;t worry too much about it.</p>
<p>One thing that always bugged me a little bit was how the instruction names weren&#8217;t exactly the same.  AT&#038;T made you put these suffixes on your instructions, so <code>mov</code> would become <code>movl</code>.  The main problem with this is Googleability.</p>
<p>But today what had previously been an annoyance became a serious problem.  I was looking at an instruction listing and saw the instruction <code>movslq</code>.  First I Googled for movsl (presuming that the &#8220;q&#8221; was a &#8220;quadword&#8221; suffix), but that yielded nothing.  Then I tried Googling for <code>movslq</code> in its entirety, but still found nothing that seemed to define the instruction.</p>
<p>When I finally tracked down a definition, I discovered that <code>movslq</code> in AT&#038;T syntax corresponds to <code>movsxd</code> in Intel syntax.  The moment I discovered this, it became quite clear to me that AT&#038;T syntax was a dead end.  &#8220;-M intel&#8221; will be my default parameter to objdump from now on.</p>
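<p>To see the two spellings side by side, here is a minimal C function that sign-extends a 32-bit value to 64 bits. On x86-64, gcc at <tt>-O2</tt> typically compiles it to a single sign-extending move, which <tt>objdump -d</tt> prints in AT&#038;T form (<tt>movslq %edi,%rax</tt>) and <tt>objdump -d -M intel</tt> prints as <tt>movsxd rax,edi</tt>; the exact registers depend on the compiler and calling convention. In the AT&#038;T name, the <tt>l</tt> and <tt>q</tt> are the 32-bit source and 64-bit destination size suffixes.</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">#include &lt;stdint.h&gt;

/* Converting int32_t to int64_t sign-extends; on x86-64 this is
 * typically one instruction (movslq in AT&amp;T, movsxd in Intel syntax). */
int64_t widen(int32_t x) {
  return x;
}</pre></div></div>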
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/07/31/giving-up-on-att-style-assembler-syntax/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Git needs a new interface</title>
		<link>http://blog.reverberate.org/2009/07/30/gits-needs-a-new-interface/</link>
		<comments>http://blog.reverberate.org/2009/07/30/gits-needs-a-new-interface/#comments</comments>
		<pubDate>Thu, 30 Jul 2009 20:38:17 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=257</guid>
		<description><![CDATA[I&#8217;ve been a git advocate for a while, and I use git in two different projects.  I think git is an impressive technical accomplishment, but I think its interface (&#8220;porcelain&#8221;) is not ready for prime-time.  I really hope some UI-focused person will design a &#8220;v2&#8221; for the git interface so that someday git [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been a git advocate for a while, and I use git in two different projects.  I think git is an impressive technical accomplishment, but I think its interface (&#8220;porcelain&#8221;) is not ready for prime-time.  I really hope some UI-focused person will design a &#8220;v2&#8221; for the git interface so that someday git can be the obvious choice for version control for any project.</p>
<p>Specific problems:</p>
<h3>&#8220;checkout&#8221; is a destructive command</h3>
<p>I seriously have no idea what Linus was thinking.  It is insanity that:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">$ git checkout foo.c</pre></div></div>

<p>&#8230;will overwrite any local modifications you may have to foo.c without asking.</p>
<h3>You can&#8217;t merge upstream changes into your local, uncommitted modifications</h3>
<p>Suppose I cloned some repository and I started hacking it up.  My changes are still hacky and not ready to be committed.  Say a few days later I want to pull upstream changes, but without committing my hacky changes to my local repository:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">$ git pull
remote: Counting objects: <span style="color: #000000;">5</span>, done.
remote: Compressing objects: <span style="color: #000000;">100</span><span style="color: #000000; font-weight: bold;">%</span> <span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #000000;">2</span><span style="color: #000000; font-weight: bold;">/</span><span style="color: #000000;">2</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>, done.
remote: Total <span style="color: #000000;">3</span> <span style="color: #7a0874; font-weight: bold;">&#40;</span>delta <span style="color: #000000;">0</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>, reused <span style="color: #000000;">0</span> <span style="color: #7a0874; font-weight: bold;">&#40;</span>delta <span style="color: #000000;">0</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
Unpacking objects: <span style="color: #000000;">100</span><span style="color: #000000; font-weight: bold;">%</span> <span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #000000;">3</span><span style="color: #000000; font-weight: bold;">/</span><span style="color: #000000;">3</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>, done.
From <span style="color: #000000; font-weight: bold;">/</span>tmp<span style="color: #000000; font-weight: bold;">/</span>foo
   e726259..e20c51a  master     -<span style="color: #000000; font-weight: bold;">&gt;</span> origin<span style="color: #000000; font-weight: bold;">/</span>master
Updating e726259..e20c51a
error: Entry <span style="color: #ff0000;">'foo.c'</span> not uptodate. Cannot merge.</pre></div></div>

<p>Git is refusing to perform a merge of my local modifications with the upstream changes.  It wants me to commit my local changes first.  This is annoying.  Every version control system I have ever used supports this, except Git.  Asking me to commit my hacky changes is unreasonable; they&#8217;re hacky and unfinished.  They might not even compile!</p>
<p>Yes, I could do:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">$ git stash
$ git pull
$ git stash apply</pre></div></div>

<p>But why should I have to do this?  CVS, SVN, and P4 don&#8217;t make me.</p>
<h3>Git&#8217;s merge conflict resolution workflow is unintuitive</h3>
<p>Continuing with the above example, now suppose I committed my local changes and then did a pull, but the changes were conflicting:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">$ git pull
Auto-merged foo.c
CONFLICT <span style="color: #7a0874; font-weight: bold;">&#40;</span>content<span style="color: #7a0874; font-weight: bold;">&#41;</span>: Merge conflict <span style="color: #000000; font-weight: bold;">in</span> foo.c
Automatic merge failed; fix conflicts and <span style="color: #000000; font-weight: bold;">then</span> commit the result.</pre></div></div>

<p>OK, git is somewhat helpful here; I&#8217;ll fix the conflicts in foo.c and commit the result:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">$ <span style="color: #c20cb9; font-weight: bold;">vim</span> foo.c
$ git commit
foo.c: needs merge
foo.c: unmerged <span style="color: #7a0874; font-weight: bold;">&#40;</span>f388ef85dd65c39e4c76f5e597d3b67f7d1a0726<span style="color: #7a0874; font-weight: bold;">&#41;</span>
foo.c: unmerged <span style="color: #7a0874; font-weight: bold;">&#40;</span>6f4bf54585ae256236c0d6cfa9f114affb94313f<span style="color: #7a0874; font-weight: bold;">&#41;</span>
foo.c: unmerged <span style="color: #7a0874; font-weight: bold;">&#40;</span>06c974ebbfc04394f4fad8a6dcb31e64866fa1bf<span style="color: #7a0874; font-weight: bold;">&#41;</span>
error: Error building trees</pre></div></div>

<p>OK, maybe it&#8217;s obvious to experienced git users what my error is here, but git&#8217;s error message is worse than unhelpful &#8212; it&#8217;s downright confusing.  <i>I</i> think I&#8217;ve resolved the conflict, but all git can think to do is tell me that &#8220;foo.c needs merge&#8221; and spit some SHA1s at me.  It gives me absolutely no help with what I need to do to fix the problem.</p>
<p>Suppose that I want to resolve the merge by using either my version or their version verbatim (&#8220;accept mine&#8221;/&#8220;accept theirs&#8221;):</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">$ git checkout foo.c
error: path <span style="color: #ff0000;">'foo.c'</span> is unmerged</pre></div></div>

<p>Again, unhelpful (and in this case what I&#8217;m trying to say actually makes sense; git just won&#8217;t let me do it).</p>
<h3>Interface for working with the index almost universally confusing</h3>
<p>I understand the difference between the working directory, the index, and the committed tree pretty well.  But I cannot for the life of me remember the difference between:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">$ git reset <span style="color: #660033;">--soft</span>
$ git reset <span style="color: #660033;">--hard</span>
$ git reset <span style="color: #660033;">--mixed</span></pre></div></div>

<p>I can barely keep them straight even while I&#8217;m reading the manpage.  &#8220;Soft&#8221; resets HEAD but not the working directory or index.  &#8220;Mixed&#8221; resets HEAD and the index, but not the working directory.  &#8220;Hard&#8221; resets HEAD, the index, and the working directory.</p>
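<p>One way to keep the three modes straight is to notice that each one resets strictly more than the one before it.  The C table below is only a mnemonic sketch: the <tt>struct reset_mode</tt> type and the <tt>find_reset_mode()</tt> helper are hypothetical and not part of git.  It encodes the behavior described above, with <tt>--mixed</tt> being what a plain <tt>git reset</tt> does by default.</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">#include &lt;stdbool.h&gt;
#include &lt;stddef.h&gt;
#include &lt;string.h&gt;

/* Hypothetical mnemonic, not git code: which areas each
 * "git reset" mode resets to the target commit. */
struct reset_mode {
  const char *flag;
  bool moves_head;      /* HEAD (the current branch pointer) */
  bool resets_index;    /* the index, a.k.a. the staging area */
  bool resets_worktree; /* the files in the working directory */
};

/* Each mode resets everything the previous one does, plus one more area. */
static const struct reset_mode reset_modes[] = {
  {"--soft",  true, false, false},
  {"--mixed", true, true,  false},  /* what a plain "git reset" does */
  {"--hard",  true, true,  true},
};

/* Returns the entry for a flag such as "--hard", or NULL if unknown. */
const struct reset_mode *find_reset_mode(const char *flag) {
  size_t n = sizeof reset_modes / sizeof reset_modes[0];
  for (size_t i = 0; i != n; i++) {
    if (strcmp(reset_modes[i].flag, flag) == 0) {
      return reset_modes + i;
    }
  }
  return NULL;
}</pre></div></div>
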
<p>In conclusion, this isn&#8217;t meant to be an exhaustive list of problems with git&#8217;s interface; it&#8217;s meant more as a microcosm.  Git&#8217;s interface is not intuitive or easy to learn, and its error messages are not helpful.  That&#8217;s too bad, because as I said I think Git is solid technology.  I just hope someone writes a better porcelain for it.  I&#8217;m not talking about evolutionary changes: I think the suite of top-level commands (checkout, branch, merge, pull, reset, commit, etc.) needs to be redesigned from scratch.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/07/30/gits-needs-a-new-interface/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>The Perils of Writing Good Documentation</title>
		<link>http://blog.reverberate.org/2009/07/30/the-perils-of-writing-good-documentation/</link>
		<comments>http://blog.reverberate.org/2009/07/30/the-perils-of-writing-good-documentation/#comments</comments>
		<pubDate>Thu, 30 Jul 2009 19:41:36 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=250</guid>
		<description><![CDATA[I&#8217;ve been thinking about documentation lately, and I feel unsatisfied with the options I currently have available to me for writing and publishing documents.  This dissatisfaction is not too well defined; I can&#8217;t put my finger on exactly what I want, but when I look at my options I&#8217;m not too excited about any [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been thinking about documentation lately, and I feel unsatisfied with the options I currently have available to me for writing and publishing documents.  This dissatisfaction is not too well defined; I can&#8217;t put my finger on exactly what I want, but when I look at my options I&#8217;m not too excited about any of them.</p>
<p>When I say &#8220;documentation,&#8221; I am talking about several slightly different things:</p>
<ul>
<li><strong>Project Homepage for projects like upb and Gazelle.</strong>  The goal of a project homepage is to answer the question &#8220;what is this project and why should I use it?&#8221;  It should also be attractive enough for a person to feel like this project is high-quality stuff.  And of course it should point them to the relevant resources (downloads, source tree, bug tracker, etc).  Good examples: <a href="http://git-scm.com/">http://git-scm.com/</a>, <a href="http://www.ruby-lang.org/en/">http://www.ruby-lang.org/en/</a>, <a href="http://www.gazelle-parser.org">http://www.gazelle-parser.org</a>.</li>
<li><strong>Manuals for projects like upb and Gazelle.</strong>  The goal of the manual is to provide both tutorial-like and reference-like information about how to use the software.  Manuals have a lot of structure and are a bit more formal, since they are intended to precisely explain how the software should be used.  They tend to track the software itself more closely than the other types of documentation, and are often even checked into the source tree.  For example, <a href="http://www.gazelle-parser.org/docs/manual.html">Gazelle&#8217;s Manual</a>.</li>
<li><strong>Design discussions/rationale.</strong>  These aren&#8217;t quite like a manual because, instead of describing how the software works, they describe <i>why</i> the software is the way it is.  What are the alternatives to your approach and why did you pick the one you did?  What are the trade-offs?  I don&#8217;t think we see as much of this documentation as we should in the open-source world, but one good example is <a href="http://www.python.org/dev/peps/">the Python PEP process</a>.</li>
<li><strong>General articles about a particular subject.</strong>  I mean to write some documents that explain the basic ideas of parsing in a more approachable way than most parsing literature.  The literature can be a bit oblique, and I think I could do a good job of explaining it in a way that anyone can understand.</li>
</ul>
<p>The main options that I see available to me are:</p>
<ul>
<li><strong>Plain HTML</strong>.  Even I have come to the conclusion that this isn&#8217;t a good choice any more.  Too much work to write, too little flexibility, not enough bang for the buck.  Of the four documentation kinds above, the only one it remotely makes sense for is the &#8220;Project Homepage&#8221; case, but even that is too much work for me.  Creating the Gazelle homepage took too much work, and it&#8217;s not even that awesome.</li>
<li><strong>Personal Wiki / MarkDown</strong>.  By &#8220;Personal Wiki&#8221; I mean a wiki that you run yourself.  I put this in the same category as MarkDown because the two tend to have the same advantages and disadvantages.  The advantages are that you can get a reasonably attractive product with minimal effort, and they are fairly customizable.  The big disadvantage is that no two of these lightweight markup languages are compatible, and there are so many to choose from (seriously: <a href="http://daringfireball.net/projects/markdown/">MarkDown</a>, <a href="http://docutils.sourceforge.net/rst.html">reStructuredText</a>, <a href="http://hobix.com/textile/">Textile</a>, <a href="http://www.methods.co.nz/asciidoc/">AsciiDoc</a>, and those are just the ones I know off the top of my head).  It&#8217;s slightly scary to invest a lot in a format that is one of many possible contenders.</li>
<li><strong>Hosted Wiki</strong>, like the Google Code wiki or the GitHub wiki.  In this case hosting is taken care of, but you have less control over the look and more stuff cluttering your page.  Also, I can&#8217;t figure out why, but something about the design of Google Code makes me totally uninspired to write any documents in its wiki.  Another thing to note is that if a hosted wiki disappears (GitHub is only a startup; it could totally go under), it&#8217;s not clear what happens to your documents!</li>
<li><strong>DocBook</strong>, which is a little better than a MarkDown scheme because DocBook seems to have gained some critical mass.  Still, the DocBook people seem to have a mild-to-moderate case of XML-itis, and <a href="http://www.docbook.org/">the DocBook homepage</a> seems more concerned with spitting acronyms at you than telling you if DocBook is capable of something basic like theming your document in different ways.</li>
</ul>
<p>So as you can see, I&#8217;m not super satisfied with any of my options.  The Gazelle Manual uses AsciiDoc, which seems to work ok, and I would probably choose it again.  I guess I&#8217;d be most inclined to choose either AsciiDoc or DocBook for writing general articles (I like <a href="http://www.cafepy.com/article/python_types_and_objects/python_types_and_objects.html">this article about Python types and objects</a>, which was made using DocBook and is attractive).</p>
<p>I can&#8217;t decide what to do for the Project Homepage or the Design Discussions case.  I really want to have attractive Project Homepages, but I don&#8217;t have too much web design talent and HTML is too much work for me.  For Design Discussions I guess I&#8217;m leaning towards the GitHub wiki just because it pairs with the project hosting nicely, though I am somewhat uncomfortable with the idea that GitHub could disappear one day, and that moving my documents from the GitHub wiki somewhere else sounds like a headache.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/07/30/the-perils-of-writing-good-documentation/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Don&#8217;t forget -march!</title>
		<link>http://blog.reverberate.org/2009/07/19/dont-forget-march/</link>
		<comments>http://blog.reverberate.org/2009/07/19/dont-forget-march/#comments</comments>
		<pubDate>Mon, 20 Jul 2009 07:46:50 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=243</guid>
		<description><![CDATA[The -march flag itself is GCC-specific, but the general advice is universal: don&#8217;t forget to tell your compiler that it can take full advantage of your spiffy new CPU!  I should know better but I&#8217;ve been forgetting to specify -march when compiling upb.
Here&#8217;s an extreme example of why.  Take an innocent-looking function like:

int [...]]]></description>
			<content:encoded><![CDATA[<p>The <tt>-march</tt> flag itself is GCC-specific, but the general advice is universal: don&#8217;t forget to tell your compiler that it can take full advantage of your spiffy new CPU!  I should know better but I&#8217;ve been forgetting to specify -march when compiling upb.</p>
<p>Here&#8217;s an extreme example of why.  Take an innocent-looking function like:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">int</span> float_to_int<span style="color: #009900;">&#40;</span><span style="color: #993333;">float</span> f<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #b1b100;">return</span> <span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span><span style="color: #009900;">&#41;</span>f<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>Looks simple enough, right?  Unfortunately, <a href="http://www.mega-nerd.com/FPcast/">float -> int casts are stupidly expensive on x86</a>.  Without any -m flags, gcc compiles this to:</p>

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;"><span style="color: #00007f; font-weight: bold;">sub</span>      $<span style="color: #0000ff;">0x8</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #00007f;">esp</span>       <span style="color: #666666; font-style: italic;">; allocate stack space</span>
<span style="color: #0000ff; font-weight: bold;">fnstcw</span>   <span style="color: #0000ff;">0x6</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #00007f;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span>        <span style="color: #666666; font-style: italic;">; save floating-point control word</span>
flds     <span style="color: #0000ff;">0xc</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #00007f;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span>        <span style="color: #666666; font-style: italic;">; push floating-point param onto fp stack</span>
movzwl   <span style="color: #0000ff;">0x6</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #00007f;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #00007f;">eax</span>  <span style="color: #666666; font-style: italic;">; move prev fp control word into %eax</span>
<span style="color: #00007f; font-weight: bold;">mov</span>      $<span style="color: #0000ff;">0xc</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #00007f;">ah</span>        <span style="color: #666666; font-style: italic;">; set rounding mode of control word to &quot;truncate&quot;</span>
<span style="color: #00007f; font-weight: bold;">mov</span>      <span style="color: #339933;">%</span><span style="color: #00007f;">ax</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0x4</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #00007f;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span>   <span style="color: #666666; font-style: italic;">; save it *back* to the stack</span>
<span style="color: #0000ff; font-weight: bold;">fldcw</span>    <span style="color: #0000ff;">0x4</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #00007f;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span>        <span style="color: #666666; font-style: italic;">; set the floating-point control word to truncate</span>
<span style="color: #0000ff; font-weight: bold;">fistp</span>    <span style="color: #0000ff;">0x2</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #00007f;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span>        <span style="color: #666666; font-style: italic;">; store integer from the fp stack to the stack</span>
<span style="color: #0000ff; font-weight: bold;">fldcw</span>    <span style="color: #0000ff;">0x6</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #00007f;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span>        <span style="color: #666666; font-style: italic;">; set the fp control word back to what it was</span>
movzwl   <span style="color: #0000ff;">0x2</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #00007f;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #00007f;">eax</span>  <span style="color: #666666; font-style: italic;">; read the value into eax (the return value)</span>
<span style="color: #00007f; font-weight: bold;">add</span>      $<span style="color: #0000ff;">0x8</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #00007f;">esp</span>       <span style="color: #666666; font-style: italic;">; give the stack space back</span>
<span style="color: #00007f; font-weight: bold;">ret</span></pre></div></div>

<p>This would be funny if it weren&#8217;t so sad.  All these gymnastics are needed because the C standard requires the cast to truncate toward zero (discard the fractional part), but the x87 floating-point unit normally rounds to nearest, so it has to be switched into a different rounding mode for the conversion and then switched back.</p>
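<p>C&#8217;s cast discards the fractional part (truncation toward zero), which differs from rounding down only for negative non-integers.  A minimal sketch comparing the two, using a hypothetical <tt>float_to_int_floor()</tt> helper written just for the comparison; both functions assume <tt>f</tt> is within <tt>int</tt> range, since out-of-range float-to-int conversions are undefined behavior in C:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">/* What C requires: discard the fractional part (truncate toward zero).
 * Assumes f is within int range; otherwise the cast is undefined. */
int float_to_int(float f) {
  return (int)f;
}

/* Hypothetical helper for comparison: round toward negative infinity.
 * Same range assumption as above. */
int float_to_int_floor(float f) {
  int t = (int)f;                     /* truncated toward zero */
  return ((float)t &gt; f) ? t - 1 : t;  /* negative non-integers moved up, so step down */
}</pre></div></div>

<p>For example, <tt>float_to_int(-2.7f)</tt> is -2, while <tt>float_to_int_floor(-2.7f)</tt> is -3; for positive inputs the two agree.</p>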
<p>Compiling exactly the same code with <tt>-msse2</tt> lets the compiler use <tt>cvttss2si</tt>, an SSE instruction that converts with truncation regardless of the current rounding mode, and the above is replaced with:</p>

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;">cvttss2si  <span style="color: #0000ff;">0x4</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #00007f;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #00007f;">eax</span>     <span style="color: #666666; font-style: italic;">; convert value to integer with truncation</span>
<span style="color: #00007f; font-weight: bold;">ret</span></pre></div></div>

<p>The difference in this case is astounding. Hopefully this will motivate you never to forget the <tt>-march</tt> flag!</p>
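<p>The same instruction is also reachable explicitly through the <tt>_mm_cvttss_si32</tt> intrinsic from <tt>xmmintrin.h</tt>.  Note that the intrinsic still requires SSE code generation to be enabled (for example by an <tt>-march</tt> that includes it), so it is no substitute for the flag.  A minimal sketch, assuming GCC or Clang (which define <tt>__SSE__</tt> when SSE is enabled); <tt>float_to_int_sse()</tt> is a hypothetical name, and it falls back to a plain cast elsewhere:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">#if defined(__SSE__)
#include &lt;xmmintrin.h&gt;
#endif

/* Converts with truncation toward zero, like the (int) cast.
 * On SSE-enabled builds this is implemented with cvttss2si. */
int float_to_int_sse(float f) {
#if defined(__SSE__)
  return _mm_cvttss_si32(_mm_set_ss(f));
#else
  return (int)f;  /* fallback when the compiler isn't targeting SSE */
#endif
}</pre></div></div>
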
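<p>The right thing to do in my case is to compile with <tt>-march=core2</tt>.  When I compile with <tt>-march=core2</tt> or <tt>-msse3</tt>, the compiler emits the not-quite-as-terse code below, which uses SSE3&#8217;s <tt>fisttp</tt>, an x87 store that always truncates and therefore avoids the control-word dance:</p>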

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;"><span style="color: #00007f; font-weight: bold;">sub</span>     $<span style="color: #0000ff;">0x4</span><span style="color: #339933;">,%</span><span style="color: #00007f;">esp</span>
flds    <span style="color: #0000ff;">0x8</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #00007f;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span>
fisttpl <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #00007f;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #00007f; font-weight: bold;">mov</span>     <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #00007f;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,%</span><span style="color: #00007f;">eax</span>
<span style="color: #00007f; font-weight: bold;">add</span>     $<span style="color: #0000ff;">0x4</span><span style="color: #339933;">,%</span><span style="color: #00007f;">esp</span>
<span style="color: #00007f; font-weight: bold;">ret</span></pre></div></div>

<p>I&#8217;m really not sure why gcc prefers this version when SSE3 is available.  It seems to be more work than the SSE2 version.  I tend to believe gcc knows what it&#8217;s doing here, but I&#8217;d love to learn why.</p>
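<p>A cheap way to confirm that your <tt>-march</tt> (or <tt>-m</tt>) flags actually took effect is to check the feature macros GCC and Clang predefine for the target, such as <tt>__SSE2__</tt>, <tt>__SSE3__</tt> and <tt>__SSE4_2__</tt>.  The <tt>compiled_sse_features()</tt> helper and its <tt>HAVE_*</tt> bits are hypothetical names for illustration; the result only reflects what the compiler was told it may use, not what the CPU running the program supports.</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">/* Hypothetical bit flags for the SSE levels the compiler may target. */
enum {
  HAVE_SSE2   = 1,
  HAVE_SSE3   = 2,
  HAVE_SSE4_2 = 4
};

/* Reports which feature macros were predefined for this translation unit,
 * i.e. what the -march/-m flags enabled at compile time. */
unsigned compiled_sse_features(void) {
  unsigned features = 0;
#ifdef __SSE2__
  features |= HAVE_SSE2;
#endif
#ifdef __SSE3__
  features |= HAVE_SSE3;
#endif
#ifdef __SSE4_2__
  features |= HAVE_SSE4_2;
#endif
  return features;
}</pre></div></div>

<p>On a 32-bit x86 build, adding <tt>-march=core2</tt> should turn on both the SSE2 and SSE3 bits, since Core 2 supports both.</p>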
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/07/19/dont-forget-march/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Gazelle is going to love SSE 4.2</title>
		<link>http://blog.reverberate.org/2009/07/18/gazelle-is-going-to-love-sse-4-2/</link>
		<comments>http://blog.reverberate.org/2009/07/18/gazelle-is-going-to-love-sse-4-2/#comments</comments>
		<pubDate>Sat, 18 Jul 2009 21:09:09 +0000</pubDate>
		<dc:creator>Josh</dc:creator>
				<category><![CDATA[Gazelle]]></category>
		<category><![CDATA[Hardware]]></category>

		<guid isPermaLink="false">http://blog.reverberate.org/?p=227</guid>
		<description><![CDATA[SSE 4.2 includes text processing instructions.  In the words of Ars Technica:
Intel has added a number of new instructions to Nehalem and it has sped up others. The 4.2 version of Intel&#8217;s SSE vector extensions takes the x86 ISA back to the future just a bit by adding new string manipulation instructions. I say [...]]]></description>
			<content:encoded><![CDATA[<p>SSE 4.2 includes <a href="http://www.reghardware.co.uk/2008/03/18/intel_sse_4_text_tweaks/">text processing instructions</a>.  In the words of <a href="http://arstechnica.com/hardware/news/2008/04/what-you-need-to-know-about-nehalem.ars/3">Ars Technica</a>:</p>
<blockquote><p>Intel has added a number of new instructions to Nehalem and it has sped up others. The 4.2 version of Intel&#8217;s SSE vector extensions takes the x86 ISA back to the future just a bit by adding new string manipulation instructions. I say &#8220;back to the future&#8221; because ISA-level support for string processing is a hallmark of CISC architectures that was actively deprecated in the post-RISC years; typically, when a writer wants to give an example of crufty old corners of the x86 ISA that have caused pain for chip architects, string manipulation instructions are what he or she reaches for. But the new SSE 4.2 string instructions are aimed at accelerating XML processing, which makes them Web-friendly and therefore modern (i.e., not crufty).</p></blockquote>
<p>I chuckled a bit when I read this.  I&#8217;m not very purist when it comes to hardware.  If these instructions will make my parsers faster, then they sound great to me!</p>
<p>The four new instructions are:</p>
<ul>
<li><strong>pcmpestri</strong>: packed compare of <em>explicit</em> length strings, returning <em>index</em></li>
<li><strong>pcmpestrm</strong>: packed compare of <em>explicit</em> length strings, returning <em>mask</em></li>
<li><strong>pcmpistri</strong>: packed compare of <em>implicit</em> length strings, returning <em>index</em></li>
<li><strong>pcmpistrm</strong>: packed compare of <em>implicit</em> length strings, returning <em>mask</em></li>
</ul>
<p>The variants are as follows:</p>
<ul>
<li><em>implicit</em> length strings are NUL-terminated, while <em>explicit</em> length strings have their lengths passed in separate registers (EAX and EDX).</li>
<li>they can return an <em>index</em> into the source string (if you were searching for something) or a <em>mask</em> (if you wanted to test each character of the input).</li>
</ul>
<p>All four let you scan a 128-bit SSE register (treating it as either sixteen 8-bit characters or eight 16-bit characters) and perform all kinds of searches and comparisons.  The instructions are configurable: you supply a control word (an 8-bit immediate) that selects among the variations, such as whether the input values are signed or unsigned and whether we are comparing against ranges or specific values.</p>
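<p>To make the control word more concrete, here is a minimal sketch of one common mode: using <tt>pcmpistri</tt> (via the <tt>_mm_cmpistri</tt> intrinsic in <tt>nmmintrin.h</tt>) to find the first byte of a 16-byte chunk that matches any byte in a small set, such as whitespace.  It assumes GCC or Clang with <tt>-msse4.2</tt> (which defines <tt>__SSE4_2__</tt>) and that both pointers have at least 16 readable bytes; the <tt>find_any16()</tt> name and the scalar fallback, which mirrors the instruction&#8217;s result for this mode, are illustrative.</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">#ifdef __SSE4_2__
#include &lt;nmmintrin.h&gt;
#endif

/* Returns the index (0-15) of the first byte in chunk that matches any
 * byte in set, or 16 if there is none.  Both arguments are treated as
 * implicit-length (NUL-terminated) strings and must point to at least
 * 16 readable bytes. */
int find_any16(const char *set, const char *chunk) {
#ifdef __SSE4_2__
  __m128i needles = _mm_loadu_si128((const __m128i *)set);
  __m128i text = _mm_loadu_si128((const __m128i *)chunk);
  /* Control word: unsigned bytes, "equal any" comparison,
   * return the least significant matching index. */
  return _mm_cmpistri(needles, text,
                      _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY |
                      _SIDD_LEAST_SIGNIFICANT);
#else
  /* Scalar fallback with the same result for this mode. */
  for (int i = 0; i != 16; i++) {
    if (chunk[i] == '\0') break;  /* end of the implicit-length text */
    for (int j = 0; j != 16; j++) {
      if (set[j] == '\0') break;  /* end of the character set */
      if (chunk[i] == set[j]) return i;
    }
  }
  return 16;
#endif
}</pre></div></div>
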
<p>The throughput of these instructions is good (a reciprocal throughput of 2 cycles), but the latency is annoyingly long (9 cycles).  This means that you have to wait nine cycles after issuing the instruction before you can use the result.  It&#8217;s hard to think of many useful things you can execute in parallel while you&#8217;re waiting for that answer.  As a side note, these figures come from Intel&#8217;s <a href="http://www.intel.com/products/processor/manuals/">Intel® 64 and IA-32 Architectures Optimization Reference Manual</a>, which says that the latency number is a worst-case estimate:</p>
<blockquote><p>Actual performance of these instructions by the out-of-order core execution unit can range from somewhat faster to significantly faster than the latency data shown in these tables.</p></blockquote>
<p>I&#8217;m not enough of a hardware geek to know what to actually expect.</p>
<p>Still, that&#8217;s nine cycles to wait before getting a lot of really useful information.  In addition to returning the index or mask, the instructions set several of the flags in useful ways.</p>
<p>So what processors have SSE 4.2?  Or in other words, how long will my impatient self have to wait to try them out?  Not <a href="http://en.wikipedia.org/wiki/Intel_Core_2#Penryn">Penryn</a>, the second-gen Core 2 that debuted in 2007/2008: it stops at SSE 4.1.  (Penryn uses a &#8220;45 nm process&#8221;, which I&#8217;m sure means something to hardware geeks but not to me.)  All I know is that it&#8217;s not the Core 2 that&#8217;s inside the MacBook Pro sitting on my lap.  SSE 4.2 first appears in the new <a href="http://en.wikipedia.org/wiki/Intel_Nehalem_(microarchitecture)">Nehalem</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.reverberate.org/2009/07/18/gazelle-is-going-to-love-sse-4-2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
