How do Native Extensions Manage Memory? Part 2: JavaScript (V8)
This article is the second part in a series. In this article we look at the V8 VM for JavaScript and explore the memory-management APIs it offers to native C++ extensions.
The other articles in the series are:
- Part 1: Ruby (MRI) (also contains a high-level introduction to the problem).
- Part 2: JavaScript (V8) (this article)
Representing JavaScript objects in C++
Let’s start by exploring how objects are represented in the V8 native extension API. The V8 API is written in C++. To refer to a JavaScript object from C++, you use a special kind of C++ object called a handle. There are several kinds of handles defined in the v8.h header, the most common of which is v8::Local:
Handles in V8 are important from a memory management perspective – they function as GC roots. They also provide a layer of indirection that lets V8 move objects around in memory without breaking references from C++.
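The indirection idea can be sketched in plain standard C++, with no V8 involved. This is a toy model, not V8's implementation: ToyHeap, ToyObject, Relocate and AddressOf are hypothetical names invented for illustration. The point it shows is that C++ code holds a handle (here, the index of a slot) rather than the object's address, so the collector is free to move the object and patch the slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an object living on a garbage-collected heap.
struct ToyObject {
  std::string payload;
};

// Toy heap (hypothetical, not a V8 type). Each slot owns one object.
// A handle names a slot, never the object's current address.
class ToyHeap {
 public:
  using Handle = std::size_t;

  Handle Allocate(std::string payload) {
    slots_.push_back(
        std::make_unique<ToyObject>(ToyObject{std::move(payload)}));
    return slots_.size() - 1;
  }

  // Every access goes through the slot, so it always sees the
  // object's current location.
  ToyObject& Deref(Handle h) { return *slots_.at(h); }

  // Current address of the object, exposed only so the move is observable.
  std::uintptr_t AddressOf(Handle h) const {
    return reinterpret_cast<std::uintptr_t>(slots_.at(h).get());
  }

  // Simulates a moving collector: copy the object to fresh storage,
  // then update the slot. The old storage is freed by the assignment.
  void Relocate(Handle h) {
    auto copy = std::make_unique<ToyObject>(*slots_.at(h));
    slots_.at(h) = std::move(copy);
  }

 private:
  std::vector<std::unique_ptr<ToyObject>> slots_;
};
```

After Relocate, the object lives at a different address, yet the same handle still reaches it. A raw ToyObject* saved before the move would now dangle, which is exactly the hazard that handles are there to avoid.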
To go in the other direction (to refer to C++ data from V8 space) you use a special kind of V8 object called v8::External. A v8::External is a JavaScript object that simply stores a void*. The pointer can refer to whatever data you want.
v8::External is a normal JavaScript object, but you don’t normally want to expose these directly to JavaScript programs. For one thing, you can’t define any behavior on them (attributes or methods), so they would appear as just dumb opaque objects that are only useful for passing as arguments to other functions. Worse, a v8::External doesn’t carry any information about the type of its pointer, so you wouldn’t be able to verify that the user passed you the right kind of v8::External. The JavaScript program could pass you a v8::External containing a Foo* when you were expecting a Bar*. This will probably lead to a SIGSEGV.
Instead, you probably want to stash the v8::External in a hidden storage area called the “internal fields” array. This is a per-object array of arbitrary JavaScript objects that is only accessible to C++ extensions.
Here is some sample code from the v8 Embedder’s Guide illustrating how this works (with some comments added by me):
Our strategy then is for all our C++ code to follow the rule “slot 0 in the internal fields for this object type will always contain a Point*.” Since JavaScript can’t mutate this “internal field” array, we can be guaranteed that JavaScript can’t mess this up and crash our program.
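The slot-0 rule can also be sketched without V8 at all. The following is a toy model in standard C++, not V8 API code: ToyJsObject, WrapPoint, UnwrapPoint, GetPointX and kPointSlot are hypothetical names, and the toy stores the raw pointer directly in the slot where real code would store a v8::External. What it illustrates is the split between storage that script can touch and storage that only the extension can touch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// The native data we want to attach to a JavaScript object.
struct Point {
  int x;
  int y;
};

// Toy stand-in for a JavaScript object (hypothetical, not a V8 type).
struct ToyJsObject {
  // Script-visible state: a JavaScript program may change this freely.
  std::map<std::string, int> properties;
  // Extension-only state: script has no way to read or write it.
  std::vector<void*> internal_fields;
};

// The convention all our C++ code agrees on: slot 0 holds a Point*.
constexpr std::size_t kPointSlot = 0;

// Creates a wrapper object whose slot 0 points at `p`.
ToyJsObject WrapPoint(Point* p) {
  ToyJsObject obj;
  obj.internal_fields.resize(1, nullptr);
  obj.internal_fields[kPointSlot] = p;
  return obj;
}

// The cast is only safe because every wrapper of this object type
// was built by WrapPoint and script cannot modify the slot.
Point* UnwrapPoint(const ToyJsObject& obj) {
  return static_cast<Point*>(obj.internal_fields.at(kPointSlot));
}

// What a native getter for `x` would do: unwrap, then read the field.
int GetPointX(const ToyJsObject& self) { return UnwrapPoint(self)->x; }
```

Whatever a script does to the visible properties, GetPointX keeps reading from the Point stored in slot 0, so the unchecked cast never sees a pointer of the wrong type.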
References between plain JavaScript objects occur all the time, especially with container types such as Object and Array.
We can also create pointers between objects in “native extension” space, but we have to be careful that we don’t dereference a pointer to data that has been garbage-collected! We need to talk more about V8’s GC before we can understand how to do this safely.
Garbage Collection with V8
Like Ruby, V8 uses tracing garbage collection. From the previous article we know that there are two primary concerns we have to address to support tracing GC:
1. What are the root objects?
2. Given an object, what other objects does it directly reference?
We’ll start with #1.
Handles: GC roots from C++
To find root objects, V8 looks at all live JavaScript variables as well as handles from C++. It’s as simple as that! We have more to learn about handles though.
V8 has several kinds of handles:
- v8::Local: for handles that you create on the C++ stack.
- v8::Persistent: for handles not on the C++ stack (static or heap).
- v8::Global: like v8::Persistent, but movable according to C++11 move semantics.
You might wonder why multiple handle types are necessary. If you’re like me, you’re defiantly thinking “maybe I want to put a v8::Persistent on the C++ stack, what are you going to do about it?”
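You can certainly do that if you want, and it will work. The main benefit of using v8::Local is that it can be more efficient than v8::Persistent when you have stack-like allocation patterns: every v8::Local belongs to the innermost active v8::HandleScope, and all of a scope’s handles are released together when that scope is destroyed.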
If you need to store a handle somewhere other than the C++ stack, v8::Persistent and v8::Global are for you. They are not associated with any v8::HandleScope, so they are valid until they are destroyed. They have higher overhead than v8::Local though.
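The cost difference is easier to see in a toy model. The sketch below is standard C++ with hypothetical names (ToyRoots, ToyHandleScope, ToyPersistent); it is not how V8 implements handles, only an illustration of the two lifetimes. Scoped handles sit on a stack and are dropped in bulk when their scope ends, while each persistent handle is registered and removed individually.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Toy root table (hypothetical). Every object id stored here is a GC root.
class ToyRoots {
 public:
  void PushLocal(int id) { locals_.push_back(id); }
  std::size_t LocalCount() const { return locals_.size(); }
  void TruncateLocals(std::size_t n) { locals_.resize(n); }

  std::list<int>::iterator AddPersistent(int id) {
    return persistents_.insert(persistents_.end(), id);
  }
  void RemovePersistent(std::list<int>::iterator it) {
    persistents_.erase(it);
  }

  bool IsRoot(int id) const {
    return std::find(locals_.begin(), locals_.end(), id) != locals_.end() ||
           std::find(persistents_.begin(), persistents_.end(), id) !=
               persistents_.end();
  }

 private:
  std::vector<int> locals_;      // stack of scoped handles
  std::list<int> persistents_;   // individually managed handles
};

// Remembers the stack height on entry and restores it on exit, so all
// locals created inside the scope are released in one step.
class ToyHandleScope {
 public:
  explicit ToyHandleScope(ToyRoots& roots)
      : roots_(roots), mark_(roots.LocalCount()) {}
  ~ToyHandleScope() { roots_.TruncateLocals(mark_); }
  ToyHandleScope(const ToyHandleScope&) = delete;
  ToyHandleScope& operator=(const ToyHandleScope&) = delete;

  void NewLocal(int id) { roots_.PushLocal(id); }

 private:
  ToyRoots& roots_;
  std::size_t mark_;
};

// Stays a root across scopes until it is reset or destroyed.
class ToyPersistent {
 public:
  ToyPersistent(ToyRoots& roots, int id)
      : roots_(roots), it_(roots.AddPersistent(id)), live_(true) {}
  ~ToyPersistent() { Reset(); }
  ToyPersistent(const ToyPersistent&) = delete;
  ToyPersistent& operator=(const ToyPersistent&) = delete;

  void Reset() {
    if (live_) {
      roots_.RemovePersistent(it_);
      live_ = false;
    }
  }

 private:
  ToyRoots& roots_;
  std::list<int>::iterator it_;
  bool live_;
};
```

In this model, creating a local is a push, and leaving a scope is a single truncation no matter how many locals it created. Each persistent handle needs its own bookkeeping entry, which is one way to picture the extra overhead.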
Finding References Between Objects
The V8 API does not require us to assist it in any way to find references between objects. We aren’t on the hook to write any mark() function like we did with Ruby. V8 can find all references between objects on its own, because they exist either in regular references (like from an Array to its members) or in the “internal fields” array we discussed earlier. V8 has full visibility into all these references, so it can trace them without our help.
Now that we understand how this picture works as a whole, let’s revisit an earlier question: when is it safe for one C++ object in “native extension space” to have a pointer directly to another?
In the situation pictured above, where Object A’s C++ data holds a raw pointer to Object B’s C++ data but nothing on the JavaScript side links the two objects, we are not safe at all! There is no GC reference preventing Object B from being collected while Object A is still alive. If Object B is collected, its C++ data will be freed too, and the pointer to that data will become invalid. So this is a bad situation.
The clear solution here is that Object A needs to have a reference to Object B. We can accomplish this with the internal fields array. If Object A’s internal fields array references Object B, Object B cannot be collected while Object A is reachable. This keeps the pointer between the two pieces of C++ data always happy and safe.
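To make the argument concrete, here is a toy mark phase in standard C++. It is a sketch under simplifying assumptions, not V8's collector: ToyObject and Mark are hypothetical names, objects are identified by their index in a vector, and the raw pointer between the C++ data of A and B is deliberately absent from the model because the collector cannot see it.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy heap object (hypothetical). Both vectors hold ids of other objects
// and are the only edges the collector knows about.
struct ToyObject {
  std::vector<std::size_t> properties;       // ordinary references
  std::vector<std::size_t> internal_fields;  // extension-only references
};

// Returns the ids of all objects reachable from `roots`. Anything not
// in the result would be collected.
std::set<std::size_t> Mark(const std::vector<ToyObject>& heap,
                           const std::vector<std::size_t>& roots) {
  std::set<std::size_t> live;
  std::vector<std::size_t> worklist(roots);
  while (!worklist.empty()) {
    std::size_t id = worklist.back();
    worklist.pop_back();
    if (!live.insert(id).second) continue;  // already visited
    for (std::size_t ref : heap.at(id).properties) worklist.push_back(ref);
    for (std::size_t ref : heap.at(id).internal_fields) {
      worklist.push_back(ref);
    }
  }
  return live;
}
```

With a two-object heap where A is a root and B is referenced only by a raw C++ pointer, Mark reports A alone, so B would be collected out from under the pointer. Once B’s id is added to A’s internal_fields, Mark reports both, and B lives at least as long as A does.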