Archive

Archive for the ‘upb’ Category

Porting upb to C++?

November 28th, 2009 Josh 7 comments

I am on the verge of trying something I never thought I’d do. I’m considering porting upb to C++.

My reasons aren’t ideological, they are highly practical. Basically I am realizing that while object-oriented C is OK for a while, it’s very weak at inheritance. Inheritance in C involves a lot of casting, duplicated code and/or macros, and careful discipline. The main problems with this are:

  • the code gets longer and less readable
  • the code involves more possibly-unsafe operations like casts

Both of these problems make the code ultimately more difficult to audit for security. And getting upb audited for security is something I plan to do very soon.

I am coming to believe that porting to C++ would make upb smaller (in lines of code) and easier for verify for security. However, there are a few major disadvantages that are giving me pause:

  • there are still some contexts in which C++ is a no-go, like the Linux kernel, embedded systems that only have a C compiler (but no C++), or projects that want to stay C-only. Doing this port would make upb unsuitable for these contexts.
  • projects that are currently C-only would need to create C++ source files to call upb APIs, and will have to link in the C++ runtime
  • (possible) C++ could result in a larger binary.

When I look at the downsides though, they don’t seem to pertain to my initial goals of making upb useful for Python, Lua, Ruby, etc. extensions, and for use inside Google. Being useful for really restricted embedded systems is a far-off use case. So it’s sounding like porting to C++ is the right thing to do.

I hope it significantly reduces the line count, as I expect it will. That will make me feel better about giving up the minimalism of C. I will definitely be compiling with -fno-exceptions -fno-rtti -fvisibility-inlines-hidden on gcc. I also won’t be using any of the C++ standard library (not even <string>).

Categories: upb Tags:

Site Overhaul / upb status

July 6th, 2009 Josh No comments

You’ll notice that the site has a new theme, a new title, maybe a little bit of a new attitude (being “josh the outspoken” isn’t all it’s cracked up to be). All this is in anticipation of upb’s release, and of a lot of follow-up work that I couldn’t be more excited about.

A lot of people mentioned that my last theme didn’t handle preview correctly, which I agree was quite annoying — sorry about that! Unfortunately my new theme doesn’t have preview at all. Finding the perfect WordPress theme is harder than it seems.

So as I mentioned I expect the first release of upb to come within the next week or two. It will only have parsing (not serializing), but most of the core functionality besides that is done. I expect that the core will stay relatively static, and all the enhancements/innovations will come from modules that are layered on top of that. For example, some of the things I have in mind are:

  • Extensions for every language known to man. The first ones will be Python, Lua, and Ruby (in that order). These will take some thought to get just right, because I want the memory management to be tightly integrated. In particular, I want protobuf objects in these languages to be able to reference string data from the original protobufs, but all be memory-managed appropriately.
  • Support for the Protocol Buffer text format.
  • An easy way to parse only selected fields. For example, I want to be able to say (in Python):
    time, url = logrecord.parse_fields(proto_data, "time", "url")

    This can be optimized out the wazoo, because you can skip all fields and submessages that you don’t care about, and you can stop parsing once you have the data you need.

And of course, performance, performance, performance. I can’t wait to get my hands on some real profiles and see what my next optimization target will be.

Categories: upb Tags:

Amazing Tools: Massif, a heap profiler

July 6th, 2009 Josh 3 comments

I love the feeling of discovering an amazing new tool. It’s a pleasant surprise to have some task you want to achieve — one that you could do manually, given enough time — and find that some tool you didn’t even know existed will make the solution easy.

Tonight I was working on the upb compiler (and upb’s first release is impending, by the way), and ran it under Valgrind as I frequently do to catch memory leaks. There weren’t any leaks, but I did notice that the program had allocated 80kb of memory over the course of its run.

People who are less OCD than I would probably shrug off 80kb of memory. But intuitively 80kb sounded high to me given how much data this program was dealing with, and I wanted to know where all those allocations were coming from.

I didn’t know off the top of my head how I might profile where my memory allocations were happening, but I had a hunch that Valgrind would be there for me. And sure enough, one of the tools included in Valgrind is Massif: a heap profiler.

A few short shell commands later:

$ valgrind --tool=massif ./upbc
$ ms_print massif.out.17604 | less

…and I had this call graph sitting in front of me:

93.72% (74,243B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->58.49% (46,336B) 0x40194E: upb_table_init (upb_table.c:34)
| ->38.13% (30,208B) 0x4019E9: upb_strtable_init (upb_table.c:45)
| | ->29.09% (23,040B) 0x40343A: upb_msg_init (upb_msg.c:44)
| | | ->29.09% (23,040B) 0x402CEE: insert_message (upb_context.c:193)
| | |   ->25.85% (20,480B) 0x402E80: addfd (upb_context.c:223)
| | |   | ->12.93% (10,240B) 0x40263F: upb_context_init (upb_context.c:30)
| | |   | | ->12.93% (10,240B) 0x40167C: main (upbc.c:195)
| | |   | |
| | |   | ->12.93% (10,240B) 0x403135: upb_context_parsefds (upb_context.c:283)
| | |   |   ->12.93% (10,240B) 0x4016BA: main (upbc.c:198)
| | |   |
| | |   ->03.23% (2,560B) 0x402D56: insert_message (upb_context.c:203)
| | |     ->03.23% (2,560B) 0x402E80: addfd (upb_context.c:223)
| | |       ->01.62% (1,280B) 0x40263F: upb_context_init (upb_context.c:30)
| | |       | ->01.62% (1,280B) 0x40167C: main (upbc.c:195)
| | |       |
| | |       ->01.62% (1,280B) 0x403135: upb_context_parsefds (upb_context.c:283)
| | |         ->01.62% (1,280B) 0x4016BA: main (upbc.c:198)
| | |
| | ->05.82% (4,608B) 0x4046E0: upb_enum_init (upb_enum.c:14)
| | | ->05.82% (4,608B) 0x402C35: insert_enum (upb_context.c:167)
| | |   ->05.82% (4,608B) 0x402DBC: insert_message (upb_context.c:209)
| | |     ->05.82% (4,608B) 0x402E80: addfd (upb_context.c:223)
| | |       ->03.23% (2,560B) 0x403135: upb_context_parsefds (upb_context.c:283)
| | |       | ->03.23% (2,560B) 0x4016BA: main (upbc.c:198)
| | |       |
| | |       ->02.59% (2,048B) 0x40263F: upb_context_init (upb_context.c:30)
| | |         ->02.59% (2,048B) 0x40167C: main (upbc.c:195)
| | |
| | ->01.62% (1,280B) 0x402612: upb_context_init (upb_context.c:27)
| | | ->01.62% (1,280B) 0x40167C: main (upbc.c:195)
| | |
| | ->01.62% (1,280B) 0x402629: upb_context_init (upb_context.c:28)
| | | ->01.62% (1,280B) 0x40167C: main (upbc.c:195)
| | |
| | ->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%)
| |
| ->20.36% (16,128B) 0x4019C4: upb_inttable_init (upb_table.c:40)
|   ->17.45% (13,824B) 0x403417: upb_msg_init (upb_msg.c:42)
|   | ->17.45% (13,824B) 0x402CEE: insert_message (upb_context.c:193)
|   |   ->15.51% (12,288B) 0x402E80: addfd (upb_context.c:223)
|   |   | ->07.76% (6,144B) 0x40263F: upb_context_init (upb_context.c:30)
|   |   | | ->07.76% (6,144B) 0x40167C: main (upbc.c:195)
|   |   | |
|   |   | ->07.76% (6,144B) 0x403135: upb_context_parsefds (upb_context.c:283)
|   |   |   ->07.76% (6,144B) 0x4016BA: main (upbc.c:198)
|   |   |
|   |   ->01.94% (1,536B) 0x402D56: insert_message (upb_context.c:203)
|   |     ->01.94% (1,536B) 0x402E80: addfd (upb_context.c:223)
|   |       ->01.94% (1,536B) in 2 places, all below massif's threshold (01.00%)
|   |
|   ->02.91% (2,304B) 0x4046F5: upb_enum_init (upb_enum.c:15)
|     ->02.91% (2,304B) 0x402C35: insert_enum (upb_context.c:167)
|       ->02.91% (2,304B) 0x402DBC: insert_message (upb_context.c:209)
|         ->02.91% (2,304B) 0x402E80: addfd (upb_context.c:223)
|           ->01.62% (1,280B) 0x403135: upb_context_parsefds (upb_context.c:283)
|           | ->01.62% (1,280B) 0x4016BA: main (upbc.c:198)
|           |
|           ->01.29% (1,024B) 0x40263F: upb_context_init (upb_context.c:30)
|             ->01.29% (1,024B) 0x40167C: main (upbc.c:195)
|
->07.98% (6,324B) 0x403AF1: upb_msgdata_new (upb_msg.c:158)
| ->07.96% (6,308B) 0x403F05: upb_msg_reuse_submsg (upb_msg.c:241)
| | ->07.96% (6,308B) 0x404497: submsg_start_cb (upb_msg.c:325)
| |   ->07.96% (6,308B) 0x4056E8: push_stack_frame (upb_parse.c:288)
| |     ->07.96% (6,308B) 0x4057ED: parse_delimited (upb_parse.c:316)
| |       ->07.96% (6,308B) 0x405A91: upb_parse (upb_parse.c:369)
| |         ->07.96% (6,308B) 0x4045BA: upb_msg_parse (upb_msg.c:352)
| |           ->07.96% (6,308B) 0x404631: upb_alloc_and_parse (upb_msg.c:361)
| |             ->07.96% (6,308B) 0x4030CA: upb_context_parsefds (upb_context.c:274)
| |               ->07.96% (6,308B) 0x4016BA: main (upbc.c:198)

This might look somewhat daunting if you’re not as deeply familiar with upb as I am. But it immediately told me what I wanted to know: almost 60% of the memory is being used by upb’s int->record and string->record hash tables. That seems a little bit high. And it’s being allocated right when the tables are constructed (upb_table_init), not as a result of a resize.

Breaking open the code, I found a table minimum size that I had imposed as an attempt to limit the number of resizes. Resizes have a high overhead — in my hash table implementation, they result in everything being re-hashed and all the memory being re-allocated, so I had imposed a minimum size of 16 in my constructor:

void upb_table_init(struct upb_table *t, uint32_t size, uint16_t entry_size)
{
  t->count = 0;
  t->entry_size = entry_size;
  t->size_lg2 = 1;
  while(size >>= 1) t->size_lg2++;
  t->size_lg2 = max(t->size_lg2, 4);  /* Min size of 16. */

When I inserted some print statements to compare how often this minimum was taking effect, I saw that there were tons of tables that were trying to allocate just a few (0-10) entries. With my minimum, they were always being allocated at least 16. And what’s more, in all these cases I knew up front how many entries I planned to insert! So there was no danger of a resize anyway.

I removed this minimum size and the memory usage of my program dipped to about 55kb (from 80kb — a ~30% reduction!) That seems a bit more reasonable, though I’m sure it’s not the last of my efforts to make sure upb’s memory footprint stays small.

Anyway, the point of this entry is that now I know about a new tool (Massif) that is at my disposal whenever I need it. It’s easy to use and requires almost no set-up. I can run it on a whim whenever I want to collect memory usage data. I have just become a little bit more resourceful.

Valgrind has tons of spiffy tools of this sort that ship with it. I wonder how many people know about them.

Another tool I had a similar reaction to was WireShark. I was experiencing a redirect loop bug in my browser and wanted to submit a useful report to the developers. The useful information here is the contents of all the HTTP traffic that was occurring during the loop. I fired up WireShark (as a first-time user) and found out relatively quickly how to sniff the network interface, capture my HTTP session, dump it at the HTTP layer (as opposed to the TCP layer or something else), and dumped it to a text file. Massively spiffy.

Learn an amazing new tool today! And then tell me about it!

Categories: upb Tags:

μpb Status Update

June 26th, 2009 Josh No comments

I haven’t posted many status updates for upb lately. Sometimes that means I’m busy with other things, but right now it means I am working on it feverishly and can hardly stand to take a break from it.

I’m extremely happy with how it’s shaping up. It’s getting more and more complete, and yet still staying quite “micro.” Notably, it just recently crossed 1500 lines of C (I don’t count auto-generated code in this), and it compiles to not quite 10kb of object code. Throw in the auto-generated code (reflection data that describes the types in descriptor.proto — this is what allows loading other proto definitions at runtime) and this jumps to about 16kb. Keep in mind that this effectively has all the major features of the main protobuf implementation (albeit packaged in different ways), but the main protobuf implementation is just over 1MB of object code!

I still think I can reach the main protobuf implementation’s performance, or maybe even exceed it. I will consider the project a failure if I can’t get within 15% of its performance.

Anyway, just wanted to be sure everyone knows I’m still alive and very much working on this. Now back to work.

Categories: upb Tags:

Bit-fields in C99

June 26th, 2009 Josh 2 comments

Recently I came upon some spirited discussion on reddit concerning a blog post that discussed the use of bit-fields in C. As a quick refresher to anyone unfamiliar or rusty on bit-fields, they are a construct in C that lets you specify how many bits different members of a structure should be allocated:

struct foo {
  unsigned int a:1;
  unsigned int b:2;
}

Without the bit-field specifiers (the “:1″ and “:2″ above) this structure would be at least four bytes long, and both “a” and “b” would be capable of representing integers from 0 to 65535. With the bit-field specifiers the structure is likely only one byte long, and “a” and “b” can only represent 0-1 and 0-3, respectively. Bit fields are a way to store many distinct values compactly inside a struct, to save memory.

I am using bitfields in upb to store flags for whether each field is set or not. When I generate a structure definition for a specific proto type, code in the header file looks something like this:

struct google_protobuf_DescriptorProto {
  union {
    uint8_t bytes[1];
    struct {
      bool name:1;  /* = 1, optional. */
      bool field:1;  /* = 2, repeated. */
      bool nested_type:1;  /* = 3, repeated. */
      bool enum_type:1;  /* = 4, repeated. */
      bool extension_range:1;  /* = 5, repeated. */
      bool extension:1;  /* = 6, repeated. */
      bool options:1;  /* = 7, optional. */
    } has;
  } set_flags;
  struct upb_string* name;
  UPB_STRUCT_ARRAY(google_protobuf_FieldDescriptorProto)* field;
  UPB_STRUCT_ARRAY(google_protobuf_FieldDescriptorProto)* extension;
  UPB_STRUCT_ARRAY(google_protobuf_DescriptorProto)* nested_type;
  UPB_STRUCT_ARRAY(google_protobuf_EnumDescriptorProto)* enum_type;
  UPB_STRUCT_ARRAY(google_protobuf_DescriptorProto_ExtensionRange)* extension_range;
  google_protobuf_MessageOptions* options;
};

(that’s from descriptor.h, which is automatically generated from descriptor.proto from the official protobuf implementation).

I “union” the bitfield with bytes so I can set the bytes en-masse more efficiently. This seems like a really nice solution, because then client code that wants to test whether a field is set in a protobuf can write code like:

if(fd->set_flags.has.message_type) {
   // ...
}

So unfortunately the commenters on Reddit were pretty down on bit-fields and the aforementioned article. The points they raised filled me with despair that I would have to abandon bit-fields. This is bad because the only real alternative I would have for this kind of code generation is to generate explicit getters and setters for each bit of each structure! The code would then immediately become quite uglified:

if(google_protobuf_DescriptorProto_has_message_type(fd)) {
  // ...
}

Besides being longer and forcing you to re-type the whole name of the enclosed type, this approach forces you to generate lots of nearly-identical code and pollute the symbol namespace with tons of these useless functions that add no real value. I was bumming about this.

But as I investigated the supposed problems with bit-fields, I became convinced that the problems were tractable and that I could in good conscience press forward with my plans to use them as intended.

Here Be Dragons

“Almost everything about [bit] fields is implementation-dependent.” –K&R, Section 6.9

From my reading of the C99 standard, here are the guarantees you get (or don’t get) with bit-fields.

1. Bit-fields will be packed as tightly as possible, provided they don’t cross storage unit boundaries: YES. From the standard: (6.7.2.1 #10):

An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit.

The first part just means that the compiler can allocate any number of bytes to a bit-field, as long as it has enough bits. So if you write:

struct foo {
  unsigned int a:8;
}

…a will be at least 1 byte long, but could be more. But the second part is where we get the nice guarantee that:

struct foo {
  unsigned int a:1;
  unsigned int b:2;
}

…will always be packed into a single byte. sizeof(struct foo) could be greater than 1, but you are guaranteed that a and b are in the same byte.

2. Bits are allocated low-to-high (or high-to-low): NO. This is not guaranteed by the spec, so an implementation could choose to allocate bits either starting at the high bit or the low bit of each byte. The following program will test an implementation to see which it is:

#include <stdio.h>
int main() {
  union {
    struct {
      unsigned int a:1;
      unsigned int b:2;
      unsigned int c:3;
    } bitfield;
    unsigned char ch;
  } u = {.ch = 0};
 
  u.bitfield.a = 1;
  u.bitfield.c = 7;
  switch(u.ch) {
    case 0x39: printf("Bits allocated LOW-TO-HIGH.\n"); break;
    case 0x9C: printf("Bits allocated HIGH-TO-LOW.\n"); break;
    default: printf("Oddball allocation: 0x%02hhx.\n", u.ch); break;
  }
  return 0;
}

My Goal

What I am trying to do is create a C struct like the above, but also be able to perform data-driven reflection on that C struct based on generic routines like:

INLINE bool upb_msg_is_set(void *s, int index)
{
  return ((char*)s)[index / 8] & (1 << (index % 8));
}

This means that my reflection routines like the one above need to be able to mimic the implementation-defined bit ordering as described above. The routine above does not — it assumes that the bits are assigned from low to high. But a few #ifdefs should make the above not too difficult.

Performance

In theory, the ideal compiler should love bit-fields because they give it maximum information with which to optimize loads and stores. I see a lot of people claim that some compilers generate bad code for bit-fields, but that is a possibility with any construct you use. From what I can see GCC generates very good code for bit-fields.

So my conclusion is that the hurdles of bit-fields are tractable, and unless some new information comes my way, I plan to move forward with my plan to use them.

Categories: upb Tags: