Arenas and Rust
For a while I’ve been wondering what it would be like to use arenas in Rust. In C and C++ I have been turning to arenas more and more as a fast alternative to heap allocation. If you have a bunch of objects that share a common lifetime, arenas offer cheaper allocation and much cheaper deallocation than the heap. The more I use this pattern, the more it feels downright wasteful to use heap allocation when an arena would do.
I’ve been wanting to know how arenas would play with Rust’s lifetime semantics. An arena must always outlive all the objects allocated from that arena. Rust’s lifetime system seems ideal for expressing a condition like this. I was curious to see how this plays out in practice.
Arena APIs
C and C++
First I will present the arena APIs I am familiar with in C and C++. Here is a simplified version of the C++ Arena API for protobuf:
// The C++ Arena is thread-safe (Functions taking Arena* may be called
// concurrently, except the destructor).
class Arena {
public:
Arena();
~Arena(); // Frees all objects in the arena.
// Creates an object on the arena. The object is freed when the arena is
// destroyed. The destructor will be run unless it is trivial.
template <class T, class... Args>
static T* Create(Arena* arena, Args&&... args);
}
void Test() {
Arena arena;
int* i1 = Arena::Create<int>(&arena);
int* i2 = Arena::Create<int>(&arena);
int* i3 = Arena::Create<int>(&arena);
// Use i1, i2, i3...
// When the arena is destroyed, the individual objects are freed.
}
Here is a similar but somewhat different example in C, from the upb protobuf library:
// The C arena is thread-compatible, but not thread-safe (functions that take
// upb_arena* may not be called concurrently).
upb_arena *upb_arena_new(void);
// Frees all memory in the arena.
void upb_arena_free(upb_arena *a);
// Allocates so memory from the arena.
void *upb_arena_malloc(upb_arena *a, size_t size);
void test() {
upb_arena *arena = upb_arena_new();
int* i1 = upb_arena_malloc(arena, sizeof(*i1));
int* i2 = upb_arena_malloc(arena, sizeof(*i2));
int* i3 = upb_arena_malloc(arena, sizeof(*i3));
// Use i1, i2, i3...
upb_arena_free(arena); // All the individual objects are freed.
}
In both the C and C++ versions, it is the user’s responsibility to make sure that arena-allocated objects are not used past the lifetime of the arena. C and C++ are not memory safe languages, and they offer no lifetime checking that would help us avoiding dangling pointers here.
Even techniques like unique_ptr
can’t help us here since the objects cannot
be freed independently. We could get dynamic checking by using unique_ptr
with a custom deleter that decrements a refcount on the arena, and then panic if
an arena is destroyed with a non-zero refcount. This would be better than
nothing, but it’s still a runtime check (not compile time). In any case
neither arena uses this technique at the moment, so respecting the arena’s
lifetime is entirely the responsibility of the user.
The C and C++ versions above have different thread-safety properties: the C++ arena is thread-safe (concurrent allocations are allowed), while the C arena is thread-compatible (concurrent allocations are not allowed, because they access a mutable pointer). This means the C++ API is paying an efficiency/complexity overhead to allow concurrent mutable access.
Rust
The most established Rust Arena library I could find was called Bumpalo. Its API is slightly different than both of these:
pub struct Bump { /* ... */ }
impl Bump {
pub fn new() -> Bump { /* ... */ }
/// Allocate an object in this `Bump` and return an exclusive reference to
/// it.
pub fn alloc<T>(&self, val: T) -> &mut T { /* ... */ }
}
// Bump cannot be shared between threads!
impl !Sync for Bump {}
impl Drop for Bump {
/// Frees all the objects in the arena, without calling their Drop.
fn drop(&mut self) { /* ... */ }
}
fn Test() {
let bump = Bump::new();
let i1 = bump.alloc(1);
let i1 = bump.alloc(2);
let i1 = bump.alloc(3);
// When bump is dropped, the integers are freed.
}
The Bump::alloc()
function returns a mutable reference &mut T
. Rust’s
lifetime system will statically ensure that the reference doesn’t outlive the
arena. We will explore some consequences of this below.
From a thread-safety perspective, Rust’s Bump
is distinct from both the
C and C++ arenas above. Like the C arena, Bump
does not perform any internal
synchronization, so it avoids the overheads of the C++ thread-safe arena.
But unlike the C arena, Bump
allows allocation through an immutable reference;
in other words it uses “interior mutability.” To ensure that it is not used
concurrently, it expressly forbids sharing between threads by being !Sync
.
If we use the analysis in my previous article, Bump
lives in the upper right quadrant
with Cell
(indeed Bump
internally uses Cell
, which is what prevents it
from being Sync
).
Why is Rust’s “Bump” not “Sync”?
An interesting question is why Bump
in Rust chooses not to implement Sync
.
If we want to avoid synchronization overhead, our two main choices are:
- Avoid interior mutability (use
&mut self
), and implementSync
. - Use interior mutability (use
&self
), and do not implementSync
(likely usingCell
internally).
(1) is safe because it only mutates through a &mut
reference, which is
guaranteed to be unique and therefore cannot race with anything. (2) is safe
because while mutation can happen from any reference, all references are bound
to a single thread, so this cannot create a data race.
Bump
chooses (2), but we could imagine an alternate world where it had
chosen (1) instead:
pub struct SyncBump { /* ... */ }
// Alternate version of Bump that is sync, but stil avoids syncronization
// overhead:
impl SyncBump {
pub fn new() -> Bump { /* ... */ }
/// NOTE: takes a &mut self instead of &self.
pub fn alloc<T>(&mut self, val: T) -> &mut T { /* ... */ }
}
Unfortunately this will not actually work. Due to a limitation in Rust’s
syntax, any call to
alloc()
will mutably borrow SyncBump
for the lifetime of the returned
object. The net effect is that you can only allocate a single object from the
arena at a time, a limitation so severe that
it makes SyncBump
useless.
This means that, for the time being, interior mutability (a la Bumpalo) is the only feasible way to implement a non-thread-safe arena in Rust.
Ergonomics of Arenas in Rust
What’s it like to use Arenas in Rust? Let’s take a simple struct that doesn’t use arenas:
struct Foo {
str: String,
vec: Vec,
}
fn main() {
let foo = Foo {
str: "ABC".to_string(),
vec: Vec::new(),
};
}
If we want to put the string/vec data on an arena instead, it looks like this:
use bumpalo::collections::String as BumpString;
use bumpalo::collections::Vec as BumpVec;
use bumpalo::Bump;
struct ArenaFoo<'a> {
str: BumpString<'a>,
vec: BumpVec<'a, i32>,
}
fn main() {
let bump = Bump::new();
let foo = ArenaFoo {
str: BumpString::from_str_in("ABC", &bump),
vec: BumpVec::new_in(&bump),
};
}
We have to use different, arena-aware variations of containers like String
and Vec
. This is intended to be a temporary situation which will eventually
be remedied by adding allocator support to the standard
containers.
More interesting and more fundamental is that our struct now has a lifetime
parameter 'a
that constrains the struct in a way that the original struct was
not constrained. This makes sense, as it expresses the fact that our struct
must be outlived by the arena bump
. Rust will require that we propagate
this lifetime parameter to any other struct that contains ArenaFoo
.
In my experience trying this out on a hobby project PDF parser, this all worked out reasonably well. I had to put lifetime parameters on most of my types, but it didn’t really cause a problem in my testing.
Having the compiler automatically check the lifetimes was very satisfying; it was a level of static safety checking that I have never gotten to experience in C or C++.
Hiding the lifetime parameter
Things changed once I tried to expose my parser to Python through the
PyO3 Python bindings for Rust. A Rust struct exposed as a
Python class through
#[pyclass]
cannot
have type parameters. Here is my
attempt at a basic “Hello, World” of exposing ArenaFoo
to Python:
use pyo3::prelude::*;
use bumpalo::collections::String as BumpString;
use bumpalo::collections::Vec as BumpVec;
use bumpalo::Bump;
#[pyclass]
struct ArenaFoo<'a> {
str: BumpString<'a>,
vec: BumpVec<'a, i32>,
}
#[pymodule]
fn arena_test(_py: Python, m: &PyModule) -> PyResult<()> {
Ok(())
}
The compiler rejects this:
error: #[pyclass] cannot have generic parameters: arena_test
--> src/lib.rs:9:11
|
9 | struct ArenaFoo<'a> {
| ^
This naturally led me to brainstorm how I could put my ArenaFoo<'a>
inside of
a struct with no lifetime constraints. This question is not specific to my
problem of trying to bind to Python, it is a more general question: is it
possible for a type to use arenas internally, in a way that is invisible to the
user? Can we hide the lifetime parameter, so that our usage of arenas can be a
purely internal concern that does not impose any extra constraints on users of
the type?
My first thought was to bundle the Bump
inside of the same struct, thereby
guaranteeing that it will have the same lifetime:
#[pyclass]
struct ArenaFoo {
bump: Bump,
str: BumpString<'a>,
vec: BumpVec<'a, i32>,
}
By itself this does nothing to convince the compiler that the 'a
is
unnecessary on ArenaFoo<'a>
.
I asked about this on Rust’s Discord, and I was informed that this pattern is
known as the “self-referential struct”, and it is notorously difficult to make
sound. The problem is that str
and vec
would have references to bump
in
the steady state. Rust’s normal rules would allow anyone with &mut ArenaFoo
to get &mut Bump
, but this would be unsound if other members of the struct
have &Bump
references.
I was referred to the ouroboros and owning_ref crates, which can help users construct self-referencing structs in a reasonably sound way (I was told that both have soundness holes, but that these are apparently fixable).
Using ouroboros for self-referencing structs
With ouroboros, things were looking promising:
use pyo3::prelude::*;
use bumpalo::collections::String as BumpString;
use bumpalo::collections::Vec as BumpVec;
use bumpalo::Bump;
use ouroboros::self_referencing;
#[pyclass]
#[self_referencing]
struct ArenaFoo {
bump: Bump,
#[covariant]
#[borrows(bump)]
str: BumpString<'this>,
#[covariant]
#[borrows(bump)]
vec: BumpVec<'this, i32>,
}
#[pymodule]
fn arena_test(_py: Python, m: &PyModule) -> PyResult<()> {
Ok(())
}
However the Rust compiler threw errors:
error[E0277]: `NonNull<i32>` cannot be sent between threads safely
--> src/lib.rs:8:1
|
8 | #[pyclass]
| ^^^^^^^^^^ `NonNull<i32>` cannot be sent between threads safely
|
::: /Users/haberman/.cargo/registry/src/github.com-1ecc6299db9ec823/pyo3-0.14.5/src/class/impl_.rs:328:33
|
328 | pub struct ThreadCheckerStub<T: Send>(PhantomData<T>);
| ---- required by this bound in `ThreadCheckerStub`
|
= help: within `SelfReferencingFoo`, the trait `Send` is not implemented for `NonNull<i32>`
= note: required because it appears within the type `bumpalo::collections::raw_vec::RawVec<'static, i32>`
= note: required because it appears within the type `bumpalo::collections::Vec<'static, i32>`
note: required because it appears within the type `SelfReferencingFoo`
--> src/lib.rs:10:8
|
10 | struct SelfReferencingFoo {
| ^^^^^^^^^^^^^^^^^^
= note: this error originates in the attribute macro `pyclass` (in Nightly builds, run with -Z macro-backtrace for more info)
error[E0277]: `Cell<NonNull<bumpalo::ChunkFooter>>` cannot be shared between threads safely
--> src/lib.rs:8:1
|
8 | #[pyclass]
| ^^^^^^^^^^ `Cell<NonNull<bumpalo::ChunkFooter>>` cannot be shared between threads safely
|
::: /Users/haberman/.cargo/registry/src/github.com-1ecc6299db9ec823/pyo3-0.14.5/src/class/impl_.rs:328:33
|
328 | pub struct ThreadCheckerStub<T: Send>(PhantomData<T>);
| ---- required by this bound in `ThreadCheckerStub`
|
= help: within `bumpalo::Bump`, the trait `Sync` is not implemented for `Cell<NonNull<bumpalo::ChunkFooter>>`
= note: required because it appears within the type `bumpalo::Bump`
= note: required because of the requirements on the impl of `Send` for `&'static bumpalo::Bump`
= note: required because it appears within the type `bumpalo::collections::raw_vec::RawVec<'static, i32>`
= note: required because it appears within the type `bumpalo::collections::Vec<'static, i32>`
note: required because it appears within the type `SelfReferencingFoo`
--> src/lib.rs:10:8
|
10 | struct SelfReferencingFoo {
| ^^^^^^^^^^^^^^^^^^
= note: this error originates in the attribute macro `pyclass` (in Nightly builds, run with -Z macro-backtrace for more info)
The main problem here is that Bump
is not Sync
. This unfortunately
prevents SelfReferencingFoo
from being Send
; a significant limitation.
For the PyO3 case, while PyO3 does allow you to specify
#[pyclass(unsendable)]
to indicate a class that doesn’t support Send
, this will cause a runtime
panic if a Python user accesses the object from a different Python thread
than the one that created it. This might be ok for a library that is just
for experimentation, but it would not be an acceptable limitation for a
production-quality Python library.
Conclusion
Rust’s lifetime system offers the very attractive possibility of having the type system automatically check lifetimes of arena-allocated objects. And indeed, this lifetime checking worked great in the scenarios where I was able to use it.
I was able to use ouroboros to make my usage of arenas an internal concern of my type. A self-referencing struct can allow the arena to be packaged together with the references, such that we do not need to impose a lifetime parameter on users of the struct.
I was unfortunately not able to find a satisfactory solution to using arenas in
Rust while supporting Send
, which hampered my ability to wrap my library in
Python. The combination of (1) !Sync
on Bump
and (2) arena-aware
containers that store &Bump
resulted in types that are not Send
, which for
my case was too big a limitation to move forward with.
This led me to conclude that, for the time being, we will need to avoid storing
Bump&
in any structure if we need it to support Send
.
The only other idea that came to mind was to make Bump
truly thread-safe.
Then it could support allocation through &self
but also support Sync
.
Protobuf’s C++ thread-safe allocator has been optimized extensively to keep the
synchronization overhead low: a thread-local is used as a cache to quickly
resolve to a structure specific to that arena and thread, from which
allocations can happen without synchronization. There is definitely some
significant complexity and a bit of overhead to this, but it is one option that
could potentially solve the issue.