Bloaty McBloatface 1.0
Today I am releasing Bloaty McBloatface 1.0. Bloaty is a size profiler for binaries. It helps you peek into ELF/Mach-O binaries to see what is taking up space inside.
Bloaty has gotten lots of new features, bug fixes, and overall improvements since I announced it in 2016. I listed these changes briefly on the release page, but I wanted to go into a bit more detail here.
Improving Data Quality
Perhaps the biggest overall improvement to Bloaty is its data quality. When I first announced Bloaty, I got very understandable complaints like this one:
I ran it and it gives an awful lot of “[None]”:
$ ~/d/bloaty/bloaty builder/virt-builder -d compileunits
VM SIZE FILE SIZE
-------------- --------------
75.5% 1.96Mi [None] 3.67Mi 85.2%
8.7% 232Ki guestfs-c-actions.c 232Ki 5.3%
8.2% 219Ki guestfs.ml 219Ki 5.0%
2.0% 52.4Ki [Other] 52.4Ki 1.2%
1.3% 33.7Ki _none_ 33.7Ki 0.8%
0.7% 17.5Ki customize_cmdline.ml 17.5Ki 0.4%
0.6% 17.3Ki builder.ml 17.3Ki 0.4%
0.4% 11.8Ki customize_run.ml 11.8Ki 0.3%
0.4% 10.4Ki cmdline.ml 10.4Ki 0.2%
0.3% 7.08Ki firstboot.ml 7.08Ki 0.2%
0.2% 6.21Ki index-scan.c 6.21Ki 0.1%
0.2% 5.90Ki index_parser.ml 5.90Ki 0.1%
0.2% 5.15Ki sigchecker.ml 5.15Ki 0.1%
0.2% 4.87Ki getopt-c.c 4.87Ki 0.1%
[...]
It’s a mixed OCaml/C executable, but I ran it on a build from the local directory and all debug symbols are still available.
Indeed, a profiler tool that has no idea what to say about 85.2% of the binary is not going to be very useful. This was Bloaty’s biggest weakness when I first released it.
At first I misunderstood the nature of this problem.
Bloaty’s design at the time was simple: it was reading .debug_aranges to assign ranges of the binary to compilation units. DWARF’s .debug_aranges section is an {address range -> compileunit} map that debuggers use to decide what compile unit a given function or data variable is from, given its address.
The output above indicates that .debug_aranges was only covering about 15% of the binary. What gives? My theory at the time was that .debug_aranges was supposed to cover the whole binary, but was pretty incomplete for some reason. It seemed like a compiler problem that I was going to have to work around somehow.
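To make the {address range -> compileunit} idea concrete, here is a minimal Python sketch. It is not Bloaty's code (Bloaty is written in C++), and the `ARANGES` table, its addresses, and the `lookup` and `coverage` helpers are all made up for illustration. It shows how a sorted range map answers "which compile unit owns this address?" and how little of an image such a map may actually cover.

```python
from bisect import bisect_right

# Hypothetical, simplified stand-in for .debug_aranges: sorted, non-overlapping
# (start, end, compile_unit) tuples describing half-open [start, end) ranges.
ARANGES = [
    (0x1000, 0x1400, "guestfs-c-actions.c"),
    (0x1400, 0x1500, "builder.ml"),
    (0x2000, 0x2080, "index-scan.c"),
]
STARTS = [start for start, _, _ in ARANGES]


def lookup(addr):
    """Return the compile unit whose range contains addr, or None."""
    i = bisect_right(STARTS, addr) - 1  # last range starting at or before addr
    if i >= 0:
        start, end, unit = ARANGES[i]
        if start <= addr < end:
            return unit
    return None


def coverage(image_start, image_end):
    """Fraction of [image_start, image_end) covered by some address range."""
    covered = sum(
        max(0, min(end, image_end) - max(start, image_start))
        for start, end, _ in ARANGES
    )
    return covered / (image_end - image_start)
```

Any address that falls in a gap (for example `lookup(0x1800)`) has no owner, which is exactly the situation that produced the large [None] row above.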
Later I realized that .debug_aranges is only meant for identifying addresses of functions or data. Large portions of the binary are not functions or program data! For example, ELF/Mach-O binaries have all sorts of stuff in them like:
- symbol tables
- relocations
- debug information
- unwind information
To get better results, I needed to find a way to break down these sections. I needed to determine which parts of the unwind information, for example, I could attribute to each function.
To achieve this, I had to parse the binary more thoroughly than I had before. I had to learn to parse unwind information (.eh_frame and .eh_frame_hdr sections), which is a really esoteric and low-level thing to be doing. I’ll quote my comment in the code about how tricky this is to do correctly:
// Code to read the .eh_frame section. This is not technically DWARF, but it
// is similar to .debug_frame (which is DWARF) so it's convenient to put it
// here.
//
// The best documentation I can find for this format comes from:
//
// * http://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html
// * https://www.airs.com/blog/archives/460
//
// However these are both under-specified. Some details are not mentioned in
// either of these (for example, the fact that the function length uses the FDE
// encoding, but always absolute). libdwarf's implementation contains a comment
// saying "It is not clear if this is entirely correct". Basically the only
// thing you can trust for some of these details is the code that actually
// implements unwinding in production:
//
// * libunwind http://www.nongnu.org/libunwind/
// https://github.com/pathscale/libunwind/blob/master/src/dwarf/Gfde.c
// * LLVM libunwind (a different project!!)
// https://github.com/llvm-mirror/libunwind/blob/master/src/DwarfParser.hpp
// * libgcc
// https://github.com/gcc-mirror/gcc/blob/master/libgcc/unwind-dw2-fde.c
Once I implemented this parser, I could attribute the .eh_frame and .eh_frame_hdr sections properly, and they would no longer show up as [None].
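As a rough illustration of the first step of such a parser, the Python sketch below walks the length-prefixed entries of a raw .eh_frame section and classifies each one as a CIE or an FDE. It is not Bloaty's implementation; the function name `walk_eh_frame` is hypothetical, and the sketch assumes little-endian data. It relies only on the entry layout described in the LSB document linked above: a 4-byte length (with 0xffffffff escaping to an 8-byte extended length), followed by a 4-byte field that is zero for a CIE and a nonzero CIE pointer for an FDE, with a zero length acting as terminator.

```python
import struct


def walk_eh_frame(data):
    """Yield (offset, kind, total_size) for each entry in raw .eh_frame bytes.

    Assumes little-endian data. kind is "CIE" or "FDE"; total_size includes
    the length field itself.
    """
    pos = 0
    while pos + 4 <= len(data):
        (length,) = struct.unpack_from("<I", data, pos)
        if length == 0:
            break  # zero-length terminator entry
        header = 4
        if length == 0xFFFFFFFF:
            # Escape value: the real length is the following 8-byte field.
            (length,) = struct.unpack_from("<Q", data, pos + 4)
            header = 12
        if pos + header + 4 > len(data):
            break  # truncated input; stop rather than read past the end
        # The 4 bytes after the length are 0 for a CIE, or a CIE pointer
        # (nonzero) for an FDE.
        (cie_id,) = struct.unpack_from("<I", data, pos + header)
        kind = "CIE" if cie_id == 0 else "FDE"
        total = header + length
        yield pos, kind, total
        pos += total
```

Attributing an FDE to a function additionally requires decoding its start address and length using the pointer encoding from the CIE's augmentation data, which is the under-specified part the comment above complains about and is deliberately left out here.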
I did the same thing for all the different kinds of DWARF debug info, and for the symbol/string tables and relocations. All of these are somewhat easier since they at least have clear standards that describe them.
But even that wasn’t enough. After implementing all of the above, I still found that some parts of the data section don’t have symbol table entries or debug info at all. Data like string constants or other anonymous data can resist being properly analyzed and attributed. To combat this, Bloaty will actually disassemble the binary looking for references to the data section. If a function references part of .data or .rodata, then we can attribute that part of the binary to the function that references it.
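The attribution step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the disassembly pass itself is not shown, the `references` input (pairs of function name and referenced address) is assumed to come from it, and `attribute_data` is a hypothetical name rather than anything in Bloaty.

```python
from collections import defaultdict


def attribute_data(references, data_start, data_end):
    """Map each referenced data address to the functions that reference it.

    references: iterable of (function_name, target_address) pairs, assumed to
    have been collected by an earlier disassembly pass (not shown).
    Targets outside the half-open range [data_start, data_end) are ignored.
    """
    owners = defaultdict(set)
    for func, target in references:
        if data_start <= target < data_end:
            owners[target].add(func)
    return dict(owners)
```

A real tool still has to decide how many bytes each referenced address stands for and what to do when several functions share the same data; this sketch only records who points at what.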
This was hard and detailed work, but it paid off. We can see the fruits of this labor if we do a hierarchical profile:
$ ./bloaty bloaty -d compileunits,sections
VM SIZE FILE SIZE
-------------- --------------
44.9% 2.07Mi [136 Others] 8.72Mi 33.7%
6.0% 281Ki protobuf/src/google/protobuf/descriptor.cc 4.07Mi 15.7%
0.0% 0 .debug_str 1.16Mi 28.4%
0.0% 0 .debug_info 1.01Mi 24.8%
0.0% 0 .debug_loc 766Ki 18.4%
0.0% 0 .debug_pubnames 383Ki 9.2%
69.6% 195Ki .text 195Ki 4.7%
0.0% 0 .debug_line 177Ki 4.3%
0.0% 0 .debug_pubtypes 158Ki 3.8%
0.0% 0 .debug_ranges 131Ki 3.2%
0.0% 0 .strtab 44.6Ki 1.1%
14.6% 41.2Ki .dynstr 41.2Ki 1.0%
7.1% 19.8Ki .eh_frame 19.8Ki 0.5%
4.6% 12.8Ki .rodata 12.8Ki 0.3%
0.0% 0 .symtab 9.45Ki 0.2%
3.1% 8.62Ki .dynsym 8.62Ki 0.2%
1.0% 2.79Ki .eh_frame_hdr 2.79Ki 0.1%
0.0% 88 .bss 0 0.0%
6.5% 306Ki protobuf/src/google/protobuf/descriptor.pb.cc 2.38Mi 9.2%
0.0% 0 .debug_info 660Ki 27.1%
0.0% 0 .debug_loc 620Ki 25.4%
0.0% 0 .debug_str 256Ki 10.5%
0.0% 0 .debug_pubnames 166Ki 6.8%
0.0% 0 .debug_line 163Ki 6.7%
53.2% 163Ki .text 163Ki 6.7%
0.0% 0 .debug_ranges 154Ki 6.3%
0.0% 0 .strtab 71.1Ki 2.9%
22.3% 68.3Ki .dynstr 68.3Ki 2.8%
10.0% 30.8Ki .eh_frame 30.8Ki 1.3%
0.0% 0 .symtab 27.2Ki 1.1%
8.6% 26.4Ki .dynsym 26.4Ki 1.1%
0.0% 0 .debug_pubtypes 17.6Ki 0.7%
2.5% 7.63Ki .eh_frame_hdr 7.63Ki 0.3%
2.3% 6.91Ki .rodata 6.91Ki 0.3%
1.0% 3.13Ki .bss
[...]
Here we can see that Bloaty has figured out what part of each section (.debug_*, .text, .eh_frame, etc.) it can attribute to each source file. Bloaty has constructed a very granular look into this binary, where each part of the file is attributed to the code that produced it.
I generally see 2% or less of the binary attributed to [None] now. Actually Bloaty never spits out a literal [None] anymore, because if we can’t figure out what function/compileunit/etc. some part of the binary comes from, we at least report its section. So if we’re stumped by some file range, we’ll report something like [section .rodata] instead of the very unhelpful [None].
Debugging Stripped Binaries
People often want to profile stripped binaries. Very often the binaries you ship to customers don’t have full debug info in them, and you want to profile what you are shipping. But some of Bloaty’s more useful data sources (compileunits especially) require debug information. What to do?
Bloaty now supports reading symbols and debug info from separate files. That way you can profile the thing you’re actually trying to shrink, instead of having your results skewed with the overhead of debugging information.
Bloaty uses build IDs to make sure that the debug information always exactly matches the file you are profiling.
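For ELF files, a build ID is stored as a note of type NT_GNU_BUILD_ID with the owner name "GNU". The Python sketch below shows what a build ID comparison could look like given the raw bytes of a note section from each file. It is an illustration, not Bloaty's code: `read_build_id` and `same_build` are hypothetical names, the sketch assumes little-endian notes with 4-byte alignment, and it does not cover Mach-O.

```python
import struct

NT_GNU_BUILD_ID = 3  # ELF note type for GNU build IDs


def _align4(n):
    """Round n up to the next multiple of 4 (note fields are padded)."""
    return (n + 3) & ~3


def read_build_id(note_bytes):
    """Return the build ID from raw ELF note bytes, or None if absent.

    Assumes little-endian notes with 4-byte alignment.
    """
    pos = 0
    while pos + 12 <= len(note_bytes):
        # Note header: name size, descriptor size, type (three 32-bit words).
        namesz, descsz, ntype = struct.unpack_from("<III", note_bytes, pos)
        pos += 12
        name = note_bytes[pos:pos + namesz]
        pos += _align4(namesz)
        desc = note_bytes[pos:pos + descsz]
        pos += _align4(descsz)
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc
    return None


def same_build(binary_notes, debug_notes):
    """True only if both note blobs carry the same, non-missing build ID."""
    a = read_build_id(binary_notes)
    b = read_build_id(debug_notes)
    return a is not None and a == b
```

Refusing to proceed when either ID is missing is a deliberately conservative choice for this sketch: mismatched debug info would silently produce wrong attributions.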
First-class Mach-O Support
When Bloaty was first released, it parsed ELF and DWARF directly, but shelled out to command-line programs to parse Mach-O. This was slow and didn’t give us as much info as we would have liked. As of Bloaty 1.0, we now have first-class Mach-O support. Both fat and single-arch binaries are supported.
DWARF is fortunately a cross-platform standard, which means that Mach-O and ELF can share all of the code that parses DWARF. The code to parse DWARF is about the same size as the ELF and Mach-O parsers combined, so it’s great that so much of this code can be shared.
Experimental WebAssembly Support
I am really excited about WebAssembly. I wanted to learn more about it, so I wrote a basic parser for Bloaty. It can handle sections and functions so far.
I am excited to see that this has been getting some use already!
Using Bloaty as a Presubmit
Some people might wonder how to integrate Bloaty into their workflow. One thing I’ve seen that’s very cool is the way some projects like grpc integrate Bloaty with their pull requests. Here is an example.
This gives quick and useful feedback about how a given PR will affect the binary size of your artifacts. For size-sensitive projects, this is a nice way of keeping tabs and making sure PRs don’t cause unexpected or disproportionate growth.
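The core of such a check is just a comparison of two size profiles. The Python sketch below is a hypothetical example, not how grpc or any other project actually wires it up: it assumes you have already collected per-section (or per-symbol) sizes in bytes for the base build and the PR build into dictionaries, and the `size_regressions` name and the 1024-byte default budget are made up for illustration.

```python
def size_regressions(base, new, max_growth_bytes=1024):
    """Return {name: delta} for entries that grew by more than the budget.

    base/new map a name (section, symbol, compile unit, ...) to its size in
    bytes. Entries missing from one side are treated as size 0.
    """
    regressions = {}
    for name in sorted(set(base) | set(new)):
        delta = new.get(name, 0) - base.get(name, 0)
        if delta > max_growth_bytes:
            regressions[name] = delta
    return regressions
```

A presubmit could fail, or just post a comment on the PR, whenever the returned dictionary is non-empty.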
Post 1.0
Bloaty has become quite capable, but there is always more to do. Maybe the biggest thing on my wishlist is PE/COFF support so people on Windows can benefit.
I would also like to make Bloaty understand references between symbols. This would make it easier to answer questions like “could I shrink the binary a lot by avoiding calls to this one particular function?” It could also show you the benefit you could get by compiling with -ffunction-sections and -fdata-sections if you’re not doing that already. These are options that let the linker strip individual functions if they are unreachable.
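To illustrate the kind of analysis this would enable, here is a small Python sketch of a reachability pass over a symbol reference graph. Nothing here is an existing Bloaty feature: the `unreachable` function, the shape of the `symbol_refs` graph, and the symbol names and sizes in the usage example are all hypothetical.

```python
def unreachable(symbol_refs, roots):
    """Return the symbols in symbol_refs that cannot be reached from roots.

    symbol_refs: {symbol: iterable of symbols it references}.
    roots: symbols that must be kept (for example, the entry point).
    """
    seen = set()
    stack = list(roots)
    while stack:
        sym = stack.pop()
        if sym in seen:
            continue
        seen.add(sym)
        stack.extend(symbol_refs.get(sym, ()))
    return set(symbol_refs) - seen
```

Combined with per-symbol sizes, this gives an estimate of what a linker could drop:

```python
refs = {
    "main": ["parse", "log"],
    "parse": ["log"],
    "log": [],
    "debug_dump": ["log"],
    "unused_helper": ["debug_dump"],
}
sizes = {"main": 120, "parse": 900, "log": 64, "debug_dump": 2048, "unused_helper": 300}

dead = unreachable(refs, ["main"])
potential_savings = sum(sizes[s] for s in dead)  # bytes a linker could strip
```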
I’d also like to do a better job of mapping inlines. The idea of the “inlines” data source is to know if the inlining of a particular function is bloating your binary a lot. If it is, maybe it would be helpful to un-inline it. Right now the “inlines” data source uses the .debug_line section, which is what a debugger uses to decide what source file:line to place the cursor on when your program is stopped at a given address. It would be more convenient to report inlines by function name, but .debug_line doesn’t know anything about functions. If I get my inlining info from .debug_info instead, I should be able to do that.
I’m happy with Bloaty 1.0 and look forward to improving it further!