Amazing Tools: Massif, a heap profiler
I love the feeling of discovering an amazing new tool. It’s a pleasant surprise to have some task you want to achieve – one that you could do manually, given enough time – and find that some tool you didn’t even know existed will make the solution easy.
Tonight I was working on the upb compiler (and upb’s first release is impending, by the way), and ran it under Valgrind as I frequently do to catch memory leaks. There weren’t any leaks, but I did notice that the program had allocated 80kb of memory over the course of its run.
People who are less OCD than I would probably shrug off 80kb of memory. But intuitively 80kb sounded high to me given how much data this program was dealing with, and I wanted to know where all those allocations were coming from.
I didn’t know off the top of my head how I might profile where my memory allocations were happening, but I had a hunch that Valgrind would be there for me. And sure enough, one of the tools included in Valgrind is Massif: a heap profiler.
A few short shell commands later:
$ valgrind --tool=massif ./upbc
$ ms_print massif.out.17604 | less
…and I had this call graph sitting in front of me:
93.72% (74,243B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc. ->58.49% (46,336B) 0x40194E: upb_table_init (upb_table.c:34) | ->38.13% (30,208B) 0x4019E9: upb_strtable_init (upb_table.c:45) | | ->29.09% (23,040B) 0x40343A: upb_msg_init (upb_msg.c:44) | | | ->29.09% (23,040B) 0x402CEE: insert_message (upb_context.c:193) | | | ->25.85% (20,480B) 0x402E80: addfd (upb_context.c:223) | | | | ->12.93% (10,240B) 0x40263F: upb_context_init (upb_context.c:30) | | | | | ->12.93% (10,240B) 0x40167C: main (upbc.c:195) | | | | | | | | | ->12.93% (10,240B) 0x403135: upb_context_parsefds (upb_context.c:283) | | | | ->12.93% (10,240B) 0x4016BA: main (upbc.c:198) | | | | | | | ->03.23% (2,560B) 0x402D56: insert_message (upb_context.c:203) | | | ->03.23% (2,560B) 0x402E80: addfd (upb_context.c:223) | | | ->01.62% (1,280B) 0x40263F: upb_context_init (upb_context.c:30) | | | | ->01.62% (1,280B) 0x40167C: main (upbc.c:195) | | | | | | | ->01.62% (1,280B) 0x403135: upb_context_parsefds (upb_context.c:283) | | | ->01.62% (1,280B) 0x4016BA: main (upbc.c:198) | | | | | ->05.82% (4,608B) 0x4046E0: upb_enum_init (upb_enum.c:14) | | | ->05.82% (4,608B) 0x402C35: insert_enum (upb_context.c:167) | | | ->05.82% (4,608B) 0x402DBC: insert_message (upb_context.c:209) | | | ->05.82% (4,608B) 0x402E80: addfd (upb_context.c:223) | | | ->03.23% (2,560B) 0x403135: upb_context_parsefds (upb_context.c:283) | | | | ->03.23% (2,560B) 0x4016BA: main (upbc.c:198) | | | | | | | ->02.59% (2,048B) 0x40263F: upb_context_init (upb_context.c:30) | | | ->02.59% (2,048B) 0x40167C: main (upbc.c:195) | | | | | ->01.62% (1,280B) 0x402612: upb_context_init (upb_context.c:27) | | | ->01.62% (1,280B) 0x40167C: main (upbc.c:195) | | | | | ->01.62% (1,280B) 0x402629: upb_context_init (upb_context.c:28) | | | ->01.62% (1,280B) 0x40167C: main (upbc.c:195) | | | | | ->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%) | | | ->20.36% (16,128B) 0x4019C4: upb_inttable_init (upb_table.c:40) | ->17.45% (13,824B) 0x403417: upb_msg_init (upb_msg.c:42) | | ->17.45% (13,824B) 0x402CEE: insert_message (upb_context.c:193) | | ->15.51% (12,288B) 0x402E80: addfd (upb_context.c:223) | | | ->07.76% (6,144B) 0x40263F: upb_context_init (upb_context.c:30) | | | | ->07.76% (6,144B) 0x40167C: main (upbc.c:195) | | | | | | | ->07.76% (6,144B) 0x403135: upb_context_parsefds (upb_context.c:283) | | | ->07.76% (6,144B) 0x4016BA: main (upbc.c:198) | | | | | ->01.94% (1,536B) 0x402D56: insert_message (upb_context.c:203) | | ->01.94% (1,536B) 0x402E80: addfd (upb_context.c:223) | | ->01.94% (1,536B) in 2 places, all below massif's threshold (01.00%) | | | ->02.91% (2,304B) 0x4046F5: upb_enum_init (upb_enum.c:15) | ->02.91% (2,304B) 0x402C35: insert_enum (upb_context.c:167) | ->02.91% (2,304B) 0x402DBC: insert_message (upb_context.c:209) | ->02.91% (2,304B) 0x402E80: addfd (upb_context.c:223) | ->01.62% (1,280B) 0x403135: upb_context_parsefds (upb_context.c:283) | | ->01.62% (1,280B) 0x4016BA: main (upbc.c:198) | | | ->01.29% (1,024B) 0x40263F: upb_context_init (upb_context.c:30) | ->01.29% (1,024B) 0x40167C: main (upbc.c:195) | ->07.98% (6,324B) 0x403AF1: upb_msgdata_new (upb_msg.c:158) | ->07.96% (6,308B) 0x403F05: upb_msg_reuse_submsg (upb_msg.c:241) | | ->07.96% (6,308B) 0x404497: submsg_start_cb (upb_msg.c:325) | | ->07.96% (6,308B) 0x4056E8: push_stack_frame (upb_parse.c:288) | | ->07.96% (6,308B) 0x4057ED: parse_delimited (upb_parse.c:316) | | ->07.96% (6,308B) 0x405A91: upb_parse (upb_parse.c:369) | | ->07.96% (6,308B) 0x4045BA: upb_msg_parse (upb_msg.c:352) | | ->07.96% (6,308B) 0x404631: upb_alloc_and_parse (upb_msg.c:361) | | ->07.96% (6,308B) 0x4030CA: upb_context_parsefds (upb_context.c:274) | | ->07.96% (6,308B) 0x4016BA: main (upbc.c:198)
This might look somewhat daunting if you’re not as deeply familiar with upb as I am. But it immediately told me what I wanted to know: almost 60% of the memory is being used by upb’s int->record and string->record hash tables. That seems a little bit high. And it’s being allocated right when the tables are constructed (upb_table_init), not as a result of a resize.
Breaking open the code, I found a table minimum size that I had imposed as an attempt to limit the number of resizes. Resizes have a high overhead – in my hash table implementation, they result in everything being re-hashed and all the memory being re-allocated, so I had imposed a minimum size of 16 in my constructor:
void upb_table_init(struct upb_table *t, uint32_t size,
uint16_t entry_size)
{
t->count = 0;
t->entry_size = entry_size;
t->size_lg2 = 1;
while(size >>= 1) t->size_lg2++;
t->size_lg2 = max(t->size_lg2, 4); /* Min size of 16. */
When I inserted some print statements to compare how often this minimum was taking effect, I saw that there were tons of tables that were trying to allocate just a few (0-10) entries. With my minimum, they were always being allocated at least 16. And what’s more, in all these cases I knew up front how many entries I planned to insert! So there was no danger of a resize anyway.
I removed this minimum size and the memory usage of my program dipped to about 55kb (from 80kb – a ~30% reduction!) That seems a bit more reasonable, though I’m sure it’s not the last of my efforts to make sure upb’s memory footprint stays small.
Anyway, the point of this entry is that now I know about a new tool (Massif) that is at my disposal whenever I need it. It’s easy to use and requires almost no set-up. I can run it on a whim whenever I want to collect memory usage data. I have just become a little bit more resourceful.
Valgrind has tons of spiffy tools of this sort that ship with it. I wonder how many people know about them.
Another tool I had a similar reaction to was WireShark. I was experiencing a redirect loop bug in my browser and wanted to submit a useful report to the developers. The useful information here is the contents of all the HTTP traffic that was occurring during the loop. I fired up WireShark (as a first-time user) and found out relatively quickly how to sniff the network interface, capture my HTTP session, dump it at the HTTP layer (as opposed to the TCP layer or something else), and dumped it to a text file. Massively spiffy.
Learn an amazing new tool today! And then tell me about it!