Introducing Bloaty McBloatface: a size profiler for binaries
I’m very excited to announce that today I’m open-sourcing a tool I’ve been working on for several months at Google. It’s called Bloaty McBloatface, and it lets you explore what’s taking up space in your .o, .a, .so, and executable binary files.
For the TL;DR, here are a couple of examples of what Bloaty’s output looks like.
$ ./bloaty bloaty
VM SIZE FILE SIZE
-------------- --------------
0.0% 0 .debug_info 2.97Mi 38.3%
0.0% 0 .debug_loc 2.30Mi 29.7%
0.0% 0 .debug_str 1.03Mi 13.3%
0.0% 0 .debug_ranges 611Ki 7.7%
72.8% 332Ki .text 332Ki 4.2%
0.0% 0 .debug_line 218Ki 2.8%
0.0% 0 .debug_abbrev 85.4Ki 1.1%
0.0% 0 .strtab 62.8Ki 0.8%
13.2% 60.0Ki .rodata 60.0Ki 0.8%
7.0% 31.8Ki .eh_frame 31.8Ki 0.4%
0.0% 0 .symtab 27.8Ki 0.3%
0.0% 0 .debug_aranges 13.5Ki 0.2%
2.3% 10.5Ki .gcc_except_table 10.5Ki 0.1%
1.5% 6.77Ki [Other] 5.60Ki 0.1%
0.9% 4.18Ki .eh_frame_hdr 4.18Ki 0.1%
0.8% 3.54Ki .dynsym 3.54Ki 0.0%
0.8% 3.52Ki .dynstr 3.52Ki 0.0%
0.7% 2.98Ki .rela.plt 2.98Ki 0.0%
0.1% 568 [ELF Headers] 2.93Ki 0.0%
0.0% 34 [Unmapped] 2.85Ki 0.0%
0.0% 4 [None] 0 0.0%
100.0% 456Ki TOTAL 7.75Mi 100.0%
Or to break down by compile unit / source file:
$ ./bloaty bloaty -d compileunits
VM SIZE FILE SIZE
-------------- --------------
27.9% 128Ki [None] 7.43Mi 95.9%
12.9% 59.2Ki src/bloaty.cc 59.0Ki 0.7%
7.3% 33.4Ki re2/re2.cc 32.3Ki 0.4%
6.9% 31.6Ki re2/dfa.cc 31.6Ki 0.4%
6.8% 31.4Ki re2/parse.cc 31.4Ki 0.4%
6.7% 30.9Ki src/dwarf.cc 30.9Ki 0.4%
6.7% 30.6Ki re2/regexp.cc 27.8Ki 0.4%
5.1% 23.7Ki re2/compile.cc 23.7Ki 0.3%
4.3% 19.7Ki re2/simplify.cc 19.7Ki 0.2%
3.2% 14.8Ki src/elf.cc 14.8Ki 0.2%
3.1% 14.2Ki re2/nfa.cc 14.2Ki 0.2%
1.8% 8.34Ki re2/bitstate.cc 8.34Ki 0.1%
1.7% 7.84Ki re2/prog.cc 7.84Ki 0.1%
1.6% 7.13Ki re2/tostring.cc 7.13Ki 0.1%
1.5% 6.67Ki re2/onepass.cc 6.67Ki 0.1%
1.4% 6.58Ki src/macho.cc 6.58Ki 0.1%
0.7% 3.27Ki src/main.cc 3.27Ki 0.0%
0.2% 797 [Other] 797 0.0%
0.1% 666 util/stringprintf.cc 666 0.0%
0.1% 573 util/strutil.cc 573 0.0%
0.1% 476 util/rune.cc 476 0.0%
100.0% 460Ki TOTAL 7.75Mi 100.0%
Many more examples and further explanation are available in the README on GitHub.
Bloaty is available under the Apache 2 license. All of the code is available on GitHub: github.com/google/bloaty. It is quick and easy to build, though it does require a somewhat recent compiler, since it uses C++11 extensively. Bloaty primarily supports ELF files (Linux, BSD, etc.), but there is some support for Mach-O files on OS X too. I’m interested in expanding Bloaty’s capabilities to more platforms if there is interest!
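A typical build looks something like this (a sketch only: the --recursive clone assumes the dependencies are vendored as git submodules, and the README is the authoritative source for the exact steps on your checkout):
$ git clone --recursive https://github.com/google/bloaty.git
$ cd bloaty
$ make -j4   # or whatever build steps the README describes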
We’ve been using Bloaty a lot on the Protocol Buffers team at Google to evaluate the binary size impacts of our changes. If a change causes a size increase, where did it come from? What sections/symbols grew, and why? Bloaty has a diff mode for understanding changes in binary size:
$ ./bloaty bloaty -- oldbloaty
VM SIZE FILE SIZE
++++++++++++++ GROWING ++++++++++++++
[ = ] 0 .debug_str +41.2Ki +5.0%
[ = ] 0 .debug_info +36.8Ki +1.3%
[ = ] 0 .debug_loc +12.4Ki +0.6%
+1.8% +6.12Ki .text +6.12Ki +1.8%
[ = ] 0 .debug_ranges +4.47Ki +0.8%
[ = ] 0 .debug_line +2.69Ki +1.3%
[ = ] 0 .strtab +1.52Ki +3.1%
+3.9% +1.32Ki .eh_frame +1.32Ki +3.9%
+1.6% +1.12Ki .rodata +1.12Ki +1.6%
[ = ] 0 .symtab +696 +2.3%
[ = ] 0 .debug_aranges +288 +2.4%
+2.7% +272 .gcc_except_table +272 +2.7%
+2.7% +136 .eh_frame_hdr +136 +2.7%
+1.2% +48 .dynsym +48 +1.2%
+1.4% +48 .rela.plt +48 +1.4%
+1.4% +32 .plt +32 +1.4%
+0.6% +22 .dynstr +22 +0.6%
+1.3% +16 .got.plt +16 +1.3%
+1.2% +4 .gnu.version +4 +1.2%
-------------- SHRINKING --------------
-18.5% -10 [Unmapped] -1.14Ki -31.4%
[ = ] 0 .debug_abbrev -72 -0.1%
+1.9% +9.12Ki TOTAL +107Ki +1.5%
Bloaty gives a high-level overview (sections) by default, but command-line switches can help you dig into the details by slicing and dicing the different dimensions (segments, sections, symbols, compile units, and even inlines).
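For example, you can nest one dimension inside another by passing a comma-separated list to -d. This hypothetical invocation on Bloaty’s own binary works in the same spirit as the sections,symbols example below:
$ ./bloaty bloaty -d segments,sections   # sections broken down within each segment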
You can use these slicing and dicing capabilities even if overall binary size isn’t your primary concern. One thing I like doing with Bloaty is seeing what section my variables ended up in. If you have a global variable that shows up in .data but you know it doesn’t actually need to be writable, slap const on the variable and watch it move from .data to .rodata.
$ echo "int foo[500] = {1};" > test.c
$ gcc -c test.c
$ ./bloaty -d sections,symbols test.o
VM SIZE FILE SIZE
-------------- --------------
100.0% 1.95Ki .data 1.95Ki 67.5%
100.0% 1.95Ki foo 1.95Ki 100.0%
0.0% 0 [ELF Headers] 640 21.6%
0.0% 0 .symtab 192 6.5%
0.0% 0 .shstrtab 69 2.3%
0.0% 0 .comment 44 1.5%
0.0% 0 .strtab 12 0.4%
0.0% 0 [Unmapped] 7 0.2%
100.0% 1.95Ki TOTAL 2.89Ki 100.0%
$ echo "const int foo[500] = {1};" > test.c
$ gcc -c test.c
$ ./bloaty -d sections,symbols test.o
VM SIZE FILE SIZE
-------------- --------------
100.0% 1.95Ki .rodata 1.95Ki 65.4%
100.0% 1.95Ki foo 1.95Ki 100.0%
0.0% 0 [ELF Headers] 704 23.0%
0.0% 0 .symtab 216 7.1%
0.0% 0 .shstrtab 77 2.5%
0.0% 0 .comment 44 1.4%
0.0% 0 .strtab 12 0.4%
0.0% 0 [Unmapped] 7 0.2%
100.0% 1.95Ki TOTAL 2.99Ki 100.0%
Before I wrote Bloaty I would use scripts that parse the output of nm(1) to look for big symbols. I would also look at the output of size(1) and use ls to look at the size of the file on disk. It always bothered me that these different sources of information seemed to disagree: the totals never matched up. (There are reasons for this, of course, but those tools didn’t help me understand what those reasons were.)
To counter this, part of my goal with Bloaty was to create measurements that are accurate, reliable, and verifiable. To do this, Bloaty builds a memory map of the entire file, both in the VM domain and the file domain. The TOTAL row in the file domain will always exactly match the file’s actual size. If you ask to measure symbols, Bloaty will track how much of the file and virtual memory space was not covered by any symbol, and report those parts as [None]. If any part of the binary is referred to by more than one symbol, Bloaty will notice and only count the first one. That way the totals are always correct and you can trust the results.
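This is easy to spot-check yourself. For some hypothetical binary mybinary, the file-domain TOTAL should agree with what the filesystem reports (Bloaty rounds to Ki/Mi for display, but the underlying count is exact):
$ ./bloaty -d symbols mybinary   # note the TOTAL row’s file size, including any [None] rows
$ ls -l mybinary                 # the byte count here is what TOTAL must account for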
Bloaty also transparently reports the sizes of various overheads like ELF/AR headers that are otherwise hard to see.
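For instance, reusing the test.o from the const example above, you can measure an archive directly, and the header overhead shows up as its own rows, just like the [ELF Headers] entries in the outputs above (a sketch; the exact row labels may vary):
$ ar rcs libtest.a test.o
$ ./bloaty -d sections libtest.a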
One of the key ideas behind Bloaty is the dual VM/file view of binary size. When we talk about binary size, there are two distinct but related concerns:
- How much space does my binary take on disk, how much bandwidth does it take to download, etc.? This is the file view.
- How much memory does my binary take to load when you actually run it? This is the VM view.
Both of these matter, but they are affected by different things. Debug symbols do make the binary larger, but they don’t take up any memory at runtime because they don’t get loaded. And zero-initialized global variables take up RAM at runtime but no space in the binary, because we know they are just zero. Since Bloaty shows both views, you can take a huge binary full of debug symbols and see just the runtime memory costs. (size(1) can do the same thing, but the results are more vague and harder to interpret.)
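The zero-initialized case is easy to demonstrate. In this sketch (hypothetical file name; assuming a typical GCC/ELF toolchain), a large zeroed array lands in .bss, which needs VM space but essentially no file space:
$ echo "static int zeros[100000];" > zeros.c
$ gcc -c zeros.c
$ ./bloaty -d sections zeros.o   # expect roughly 391Ki of VM size for .bss, but ~0 file size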
Help Wanted
I love working on Bloaty, but unfortunately I don’t have the time to implement everything I want to add. If this sort of thing sounds up your alley, I would love to get some pull requests!
Working on Bloaty has taught me a ton about how linkers, loaders, compilers, and debuggers work. I know more about ELF now than most people. If learning about the bottom of the stack appeals to you, I highly recommend this project as a way to learn more.
Here are some of the things I’d love to see added to Bloaty:
- more fleshed-out Mach-O support. Right now we shell out to otool and other command-line programs; parsing Mach-O directly, as we do with ELF/DWARF, is much faster and more robust.
- dependency analysis. It turns out that you can disassemble the .text section (and look for pointers from .data) to construct a pretty complete dependency graph of what symbols refer to other symbols. This can be useful for all sorts of things. It can find “dead” symbols (symbols not reachable from the program’s entry point), which will tell you how much binary size you’re wasting by not compiling with -ffunction-sections/-fdata-sections -Wl,-strip_unused. It can help you understand why a particular function wasn’t stripped when you thought it might be. I prototyped all this but haven’t had the time to turn my prototype ideas into robust code. There are lots of fun things to explore here. (A rough way to approximate the dead-code measurement today is sketched after this list.)
- refining the data providers to be more complete. I’ve learned that binary files have tons of data to mine, but it can take some creativity to make good use of it. For example, take the compileunits data provider. This uses DWARF data to determine which code came from which source files. The tricky part is that compilers often leave some debugging info out to try to keep the binary size reasonable. For example, if the .debug_aranges section is present, it gives us almost exactly what we need, but it’s often not present, and when it is present it’s sometimes incomplete. There are other places we can look in the debug info to supplement this and give more complete results.
- optimization. Bloaty is pretty fast (particularly the file parsers), but on binaries that are hundreds of megabytes in size you will have to wait a bit for results. Some of the core data structures are showing up as hotspots in profiles. I think there might be some low-hanging fruit here.
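Until that dependency analysis exists, you can approximate the dead-code question with the linker itself. This sketch (hypothetical file names; GNU toolchain assumed) links the same code with and without section GC, then uses Bloaty’s diff mode to see what the linker dropped:
$ cat demo.c
int used(void)   { return 1; }
int unused(void) { return 2; }
int main(void)   { return used(); }
$ gcc -ffunction-sections -fdata-sections -o demo demo.c
$ gcc -ffunction-sections -fdata-sections -Wl,--gc-sections -o demo_gc demo.c
$ ./bloaty demo_gc -- demo   # the SHRINKING rows show what --gc-sections removed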
If you’re interested in working on Bloaty, please read the CONTRIBUTING file for more info!
Acknowledgments
I got a lot of help from coworkers at Google. Thanks to Paul Pluzhnikov and David Blaikie who answered a lot of my questions about the ELF/DWARF formats. Thanks to my teammate Gerben Stavenga for giving it some solid use and offering helpful feedback. Thanks to Vincent Vanhoucke for the suggestion of the name (a fun twist on Boaty McBoatface). Thanks especially to Google and the Open Source team who make it possible to release things like this as open-source. It is one of the great perks of working here!