For a while I’ve been wondering what it would be like to use arenas in Rust. In C and C++ I have been turning to arenas more and more as a fast alternative to heap allocation. If you have a bunch of objects that share a common lifetime, arenas offer cheaper allocation and much cheaper deallocation than the heap. The more I use this pattern, the more it feels downright wasteful to use heap allocation when an arena would do.

I’ve been wanting to know how arenas would play with Rust’s lifetime semantics. An arena must always outlive all the objects allocated from that arena. Rust’s lifetime system seems ideal for expressing a condition like this. I was curious to see how this plays out in practice.

Arena APIs

C and C++

First I will present the arena APIs I am familiar with in C and C++. Here is a simplified version of the C++ Arena API for protobuf:

// The C++ Arena is thread-safe (Functions taking Arena* may be called
// concurrently, except the destructor).
class Arena {
 public:
  Arena();
  ~Arena();  // Frees all objects in the arena.

  // Creates an object on the arena. The object is freed when the arena is
  // destroyed.  The destructor will be run unless it is trivial.
  template <class T, class... Args>
  static T* Create(Arena* arena, Args&&... args);
}

void Test() {
  Arena arena;
  int* i1 = Arena::Create<int>(&arena);
  int* i2 = Arena::Create<int>(&arena);
  int* i3 = Arena::Create<int>(&arena);

  // Use i1, i2, i3...

  // When the arena is destroyed, the individual objects are freed.
}

Here is a similar but somewhat different example in C, from the upb protobuf library:

// The C arena is thread-compatible, but not thread-safe (functions that take
// upb_arena* may not be called concurrently).
upb_arena *upb_arena_new(void);

// Frees all memory in the arena.
void upb_arena_free(upb_arena *a);

// Allocates so memory from the arena.
void *upb_arena_malloc(upb_arena *a, size_t size);

void test() {
  upb_arena *arena = upb_arena_new();

  int* i1 = upb_arena_malloc(arena, sizeof(*i1));
  int* i2 = upb_arena_malloc(arena, sizeof(*i2));
  int* i3 = upb_arena_malloc(arena, sizeof(*i3));

  // Use i1, i2, i3...

  upb_arena_free(arena);  // All the individual objects are freed.
}

In both the C and C++ versions, it is the user’s responsibility to make sure that arena-allocated objects are not used past the lifetime of the arena. C and C++ are not memory safe languages, and they offer no lifetime checking that would help us avoiding dangling pointers here.

Even techniques like unique_ptr can’t help us here since the objects cannot be freed independently. We could get dynamic checking by using unique_ptr with a custom deleter that decrements a refcount on the arena, and then panic if an arena is destroyed with a non-zero refcount. This would be better than nothing, but it’s still a runtime check (not compile time). In any case neither arena uses this technique at the moment, so respecting the arena’s lifetime is entirely the responsibility of the user.

The C and C++ versions above have different thread-safety properties: the C++ arena is thread-safe (concurrent allocations are allowed), while the C arena is thread-compatible (concurrent allocations are not allowed, because they access a mutable pointer). This means the C++ API is paying an efficiency/complexity overhead to allow concurrent mutable access.

Rust

The most established Rust Arena library I could find was called Bumpalo. Its API is slightly different than both of these:

pub struct Bump { /* ... */ }

impl Bump {
    pub fn new() -> Bump { /* ... */ }

    /// Allocate an object in this `Bump` and return an exclusive reference to
    /// it.
    pub fn alloc<T>(&self, val: T) -> &mut T { /* ... */ }
}

// Bump cannot be shared between threads!
impl !Sync for Bump {}

impl Drop for Bump {
    /// Frees all the objects in the arena, without calling their Drop.
    fn drop(&mut self) { /* ... */ }
}

fn Test() {
    let bump = Bump::new();

    let i1 = bump.alloc(1);
    let i1 = bump.alloc(2);
    let i1 = bump.alloc(3);

    // When bump is dropped, the integers are freed.
}

The Bump::alloc() function returns a mutable reference &mut T. Rust’s lifetime system will statically ensure that the reference doesn’t outlive the arena. We will explore some consequences of this below.

From a thread-safety perspective, Rust’s Bump is distinct from both the C and C++ arenas above. Like the C arena, Bump does not perform any internal synchronization, so it avoids the overheads of the C++ thread-safe arena. But unlike the C arena, Bump allows allocation through an immutable reference; in other words it uses “interior mutability.” To ensure that it is not used concurrently, it expressly forbids sharing between threads by being !Sync.

If we use the analysis in my previous article, Bump lives in the upper right quadrant with Cell (indeed Bump internally uses Cell, which is what prevents it from being Sync).

Why is Rust’s “Bump” not “Sync”?

An interesting question is why Bump in Rust chooses not to implement Sync. If we want to avoid synchronization overhead, our two main choices are:

  1. Avoid interior mutability (use &mut self), and implement Sync.
  2. Use interior mutability (use &self), and do not implement Sync (likely using Cell internally).

(1) is safe because it only mutates through a &mut reference, which is guaranteed to be unique and therefore cannot race with anything. (2) is safe because while mutation can happen from any reference, all references are bound to a single thread, so this cannot create a data race.

Bump chooses (2), but we could imagine an alternate world where it had chosen (1) instead:

pub struct SyncBump { /* ... */ }

// Alternate version of Bump that is sync, but stil avoids syncronization
// overhead:
impl SyncBump {
    pub fn new() -> Bump { /* ... */ }

    /// NOTE: takes a &mut self instead of &self.
    pub fn alloc<T>(&mut self, val: T) -> &mut T { /* ... */ }
}

Unfortunately, SyncBump is a very limiting interface. If allocation is only allowed through a unique reference, it is impossible for any data structure to stash away a reference to the allocator and use it internally. For example, consider the arena-aware Vec type that comes along with Bumpalo:

use bumpalo::{Bump, collections::Vec};

let b = Bump::new();

let mut vec = Vec::new();
let mut vec2 = Vec::new();
vec.push(1);
vec.push(2);

If Bump used option (1) above, this code would be impossible, because the two Vec objects would not both be able to have mutable references to the same arena. Instead you would need to inject the arena into every operation that can possibly allocate:

use sync_bump_thought_experiment::{SyncBump, collections::Vec};

let b = SyncBump::new();

let mut vec = Vec::new();
let mut vec2 = Vec::new();

// We have to explicitly inject the arena into every operation that could
// allocate.
vec.push(1, &b);
vec.push(2, &b);

You could try to optimize the ergonomics of this pattern a bit by using a separate builder object:

use sync_bump_thought_experiment::SyncBump;
use sync_bump_thought_experiment::{collections::Vec, collections::VecBuilder};

let b = SyncBump::new();

let mut vec = Vec::new();
let mut vec2 = Vec::new();

// For repeated mutations, we construct a builder.  Only one builder can
// exist for a given arena at a time.
let builder = VecBuilder::new_for(&vec, &b);
builder.push(1);
builder.push(2);

On some level, injecting the arena explicitly for mutating operations is a more optimal solution: it avoids wasting space for a &SyncBump in every arena-aware container. But ergonomically, it is very inconvenient to have to pass &b as an extra parameter all the time.

When I tried to learn more about how Rust historically decided between &self and &mut self in allocators, I found a long and complicated discussion spanning many years:

I don’t have enough practical experience with Rust to have an overall opinion on this issue. It is clearly a complex tradeoff with many downstream implications. It appears that Rust allocators are generally going in the direction of &self (interior mutability).

Ergonomics of Arenas in Rust

What’s it like to use Arenas in Rust? Let’s take a simple struct that doesn’t use arenas:

struct Foo {
    str: String,
    vec: Vec,
}

fn main() {
    let foo = Foo {
        str: "ABC".to_string(),
        vec: Vec::new(),
    };
}

If we want to put the string/vec data on an arena instead, it looks like this:

use bumpalo::collections::String as BumpString;
use bumpalo::collections::Vec as BumpVec;
use bumpalo::Bump;

struct ArenaFoo<'a> {
    str: BumpString<'a>,
    vec: BumpVec<'a, i32>,
}

fn main() {
    let bump = Bump::new();
    let foo = ArenaFoo {
        str: BumpString::from_str_in("ABC", &bump),
        vec: BumpVec::new_in(&bump),
    };
}

We have to use different, arena-aware variations of containers like String and Vec. This is intended to be a temporary situation which will eventually be remedied by adding allocator support to the standard containers.

More interesting and more fundamental is that our struct now has a lifetime parameter 'a that constrains the struct in a way that the original struct was not constrained. This makes sense, as it expresses the fact that our struct must be outlived by the arena bump. Rust will require that we propagate this lifetime parameter to any other struct that contains ArenaFoo.

In my experience trying this out on a hobby project PDF parser, this all worked out reasonably well. I had to put lifetime parameters on most of my types, but it didn’t really cause a problem in my testing.

Having the compiler automatically check the lifetimes was very satisfying; it was a level of static safety checking that I have never gotten to experience in C or C++.

Hiding the lifetime parameter

Things changed once I tried to expose my parser to Python through the PyO3 Python bindings for Rust. A Rust struct exposed as a Python class through #[pyclass] cannot have type parameters. Here is my attempt at a basic “Hello, World” of exposing ArenaFoo to Python:

use pyo3::prelude::*;
use bumpalo::collections::String as BumpString;
use bumpalo::collections::Vec as BumpVec;
use bumpalo::Bump;

#[pyclass]
struct ArenaFoo<'a> {
    str: BumpString<'a>,
    vec: BumpVec<'a, i32>,
}

#[pymodule]
fn arena_test(_py: Python, m: &PyModule) -> PyResult<()> {
    Ok(())
}

The compiler rejects this:

error: #[pyclass] cannot have generic parameters: arena_test
 --> src/lib.rs:9:11
  |
9 | struct ArenaFoo<'a> {
  |           ^

This naturally led me to brainstorm how I could put my ArenaFoo<'a> inside of a struct with no lifetime constraints. This question is not specific to my problem of trying to bind to Python, it is a more general question: is it possible for a type to use arenas internally, in a way that is invisible to the user? Can we hide the lifetime parameter, so that our usage of arenas can be a purely internal concern that does not impose any extra constraints on users of the type?

My first thought was to bundle the Bump inside of the same struct, thereby guaranteeing that it will have the same lifetime:

#[pyclass]
struct ArenaFoo {
    bump: Bump,
    str: BumpString<'a>,
    vec: BumpVec<'a, i32>,
}

By itself this does nothing to convince the compiler that the 'a is unnecessary on ArenaFoo<'a>.

I asked about this on Rust’s Discord, and I was informed that this pattern is known as the “self-referential struct”, and it is notorously difficult to make sound. The problem is that str and vec would have references to bump in the steady state. Rust’s normal rules would allow anyone with &mut ArenaFoo to get &mut Bump, but this would be unsound if other members of the struct have &Bump references.

I was referred to the ouroboros and owning_ref crates, which can help users construct self-referencing structs in a reasonably sound way (I was told that both have soundness holes, but that these are apparently fixable).

Using ouroboros for self-referencing structs

With ouroboros, things were looking promising:

use pyo3::prelude::*;
use bumpalo::collections::String as BumpString;
use bumpalo::collections::Vec as BumpVec;
use bumpalo::Bump;
use ouroboros::self_referencing;

#[pyclass]
#[self_referencing]
struct ArenaFoo {
    bump: Bump,
    #[covariant]
    #[borrows(bump)]
    str: BumpString<'this>,
    #[covariant]
    #[borrows(bump)]
    vec: BumpVec<'this, i32>,
}

#[pymodule]
fn arena_test(_py: Python, m: &PyModule) -> PyResult<()> {
    Ok(())
}

However the Rust compiler threw errors:

error[E0277]: `NonNull<i32>` cannot be sent between threads safely
   --> src/lib.rs:8:1
    |
8   | #[pyclass]
    | ^^^^^^^^^^ `NonNull<i32>` cannot be sent between threads safely
    |
   ::: /Users/haberman/.cargo/registry/src/github.com-1ecc6299db9ec823/pyo3-0.14.5/src/class/impl_.rs:328:33
    |
328 | pub struct ThreadCheckerStub<T: Send>(PhantomData<T>);
    |                                 ---- required by this bound in `ThreadCheckerStub`
    |
    = help: within `SelfReferencingFoo`, the trait `Send` is not implemented for `NonNull<i32>`
    = note: required because it appears within the type `bumpalo::collections::raw_vec::RawVec<'static, i32>`
    = note: required because it appears within the type `bumpalo::collections::Vec<'static, i32>`
note: required because it appears within the type `SelfReferencingFoo`
   --> src/lib.rs:10:8
    |
10  | struct SelfReferencingFoo {
    |        ^^^^^^^^^^^^^^^^^^
    = note: this error originates in the attribute macro `pyclass` (in Nightly builds, run with -Z macro-backtrace for more info)


error[E0277]: `Cell<NonNull<bumpalo::ChunkFooter>>` cannot be shared between threads safely
   --> src/lib.rs:8:1
    |
8   | #[pyclass]
    | ^^^^^^^^^^ `Cell<NonNull<bumpalo::ChunkFooter>>` cannot be shared between threads safely
    |
   ::: /Users/haberman/.cargo/registry/src/github.com-1ecc6299db9ec823/pyo3-0.14.5/src/class/impl_.rs:328:33
    |
328 | pub struct ThreadCheckerStub<T: Send>(PhantomData<T>);
    |                                 ---- required by this bound in `ThreadCheckerStub`
    |
    = help: within `bumpalo::Bump`, the trait `Sync` is not implemented for `Cell<NonNull<bumpalo::ChunkFooter>>`
    = note: required because it appears within the type `bumpalo::Bump`
    = note: required because of the requirements on the impl of `Send` for `&'static bumpalo::Bump`
    = note: required because it appears within the type `bumpalo::collections::raw_vec::RawVec<'static, i32>`
    = note: required because it appears within the type `bumpalo::collections::Vec<'static, i32>`
note: required because it appears within the type `SelfReferencingFoo`
   --> src/lib.rs:10:8
    |
10  | struct SelfReferencingFoo {
    |        ^^^^^^^^^^^^^^^^^^
    = note: this error originates in the attribute macro `pyclass` (in Nightly builds, run with -Z macro-backtrace for more info)

The main problem here is that Bump is not Sync. This unfortunately prevents SelfReferencingFoo from being Send; a significant limitation.

For the PyO3 case, while PyO3 does allow you to specify #[pyclass(unsendable)] to indicate a class that doesn’t support Send, this will cause a runtime panic if a Python user accesses the object from a different Python thread than the one that created it. This might be ok for a library that is just for experimentation, but it would not be an acceptable limitation for a production-quality Python library.

Conclusion

Rust’s lifetime system offers the very attractive possibility of having the type system automatically check lifetimes of arena-allocated objects. And indeed, this lifetime checking worked great in the scenarios where I was able to use it.

I was able to use ouroboros to make my usage of arenas an internal concern of my type. A self-referencing struct can allow the arena to be packaged together with the references, such that we do not need to impose a lifetime parameter on users of the struct.

I was unfortunately not able to find a satisfactory solution to using arenas in Rust while supporting Send, which hampered my ability to wrap my library in Python. The combination of (1) !Sync on Bump and (2) arena-aware containers that store &Bump resulted in types that are not Send, which for my case was too big a limitation to move forward with.

This made me wonder if either of these design choices would make sense to revisit:

  1. Should allocator-aware data structures avoid actually storing a reference to the allocator, and instead take the allocator as a parameter for every mutating operation? The low-level programmer in me says yes: it’s more optimal overall since it saves space in the container, and in this case it has the added benefit of avoiding the problem that made BumpVec a !Send type. But realistically I can’t imagine that the community would accept the ergonomics of having to thread the allocator through every function call that could allocate. It’s just too intrusive.
  2. Should Bump be Sync and avoid interior mutability? This also seems like a hard sell. It would require either doing (1) anyway (since allocator-aware data structures could no longer allocate through an immutable reference) or using some other mitigation to let containers store something that will give them a &mut Bump when they need to allocate.

Ultimately I could not convince myself that either of these two changes would make sense.

The only other idea that came to mind was to make Bump truly thread-safe. Then it could support allocation through &self but also support Sync. Protobuf’s C++ thread-safe allocator has been optimized extensively to keep the synchronization overhead low: a thread-local is used as a cache to quickly resolve to a structure specific to that arena and thread, from which allocations can happen without synchronization. There is definitely some significant complexity and a bit of overhead to this, but it is one option that could potentially solve the issue.