This article is the second part in a series. In this article we look at the V8 VM for JavaScript and explore the memory-management APIs it offers to native C++ extensions.

The other articles in the series are:

Representing JavaScript objects in C++

Let’s start by exploring how objects are represented in the V8 native extension API. The V8 API is written in C++. To refer to a JavaScript object from C++, you use a special kind of C++ object called a handle. There are several kinds of handles defined in the v8.h header, the most common of which is v8::Local:

// Create a new v8::String object, with the handle "str".
// NewFromUtf8 returns a v8::MaybeLocal in current V8 releases.
v8::Local<v8::String> str =
    v8::String::NewFromUtf8(isolate, "abc").ToLocalChecked();

Handles in V8 are important from a memory management perspective – they function as GC roots. They also provide a layer of indirection that lets V8 move objects around in memory without breaking references from C++.
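To make the indirection concrete, here is a toy model in plain C++. It is only an assumption-laden sketch, not how V8 actually implements handles; ToyHeap, ToyObject, Handle and Move are hypothetical names. The handle names a slot rather than the object itself, so when the "collector" relocates an object it only has to patch that one slot, and every handle held by C++ code keeps working.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model only: this is NOT how V8 implements handles.
struct ToyObject {
  std::string value;
};

class ToyHeap {
 public:
  // A handle names a slot in the table, not the object itself.
  using Handle = std::size_t;

  Handle Allocate(std::string value) {
    slots_.push_back(std::make_unique<ToyObject>(ToyObject{std::move(value)}));
    return slots_.size() - 1;
  }

  // Every access goes through the slot, so it always finds the
  // object at its current address.
  ToyObject* Get(Handle h) const {
    assert(h < slots_.size());
    return slots_[h].get();
  }

  // Simulate a moving collector: copy the object to a new address,
  // free the old copy, and patch the slot. Existing handles still work.
  void Move(Handle h) {
    slots_[h] = std::make_unique<ToyObject>(*slots_[h]);
  }

 private:
  std::vector<std::unique_ptr<ToyObject>> slots_;
};
```

A caller keeps only the Handle value: after heap.Move(h), heap.Get(h) still returns the object, even though its address changed.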

To go in the other direction (to refer to C++ data from V8 space) you use a special kind of V8 object called v8::External. A v8::External is a JavaScript object that simply stores a void*. The pointer can refer to whatever data you want.

v8::External is a normal JavaScript object, but you usually don’t want to expose one directly to JavaScript programs. For one thing, you can’t define any behavior on it (attributes or methods), so it would appear as a dumb, opaque object that is only useful as an argument to other functions. Worse, a v8::External carries no information about the type of its pointer, so you can’t verify that the caller passed you the right kind of v8::External. A JavaScript program could pass you a v8::External containing a Foo* when you were expecting a Bar*, which will probably lead to a SIGSEGV.

Instead, you probably want to stash the v8::External in a hidden storage area called the “internal fields” array. This is a per-object array of arbitrary JavaScript values that is only accessible to C++ extensions.

Here is some sample code from the V8 Embedder’s Guide illustrating how this works (with some comments added by me):

v8::Local<v8::ObjectTemplate> point_templ = v8::ObjectTemplate::New(isolate);

// Indicate that we want our object to have one slot
// available in the "internal field" array.
point_templ->SetInternalFieldCount(1);

// Create a new object and set slot 0 to point to the
// given External object.
Point* p = ...;
v8::Local<v8::Context> context = isolate->GetCurrentContext();
v8::Local<v8::Object> obj =
    point_templ->NewInstance(context).ToLocalChecked();
obj->SetInternalField(0, v8::External::New(isolate, p));

Our strategy, then, is for all of our C++ code to follow the rule “slot 0 in the internal fields for this object type always contains a v8::External wrapping a Point*.” Since JavaScript can’t mutate the internal fields array, we are guaranteed that JavaScript can’t break this rule and crash our program.
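The convention can be modeled without V8 at all. In this sketch, ToyJsObject, WrapPoint, UnwrapPoint and kPointSlot are hypothetical names, and a plain void* stands in for the v8::External. The point it illustrates is that the cast back to Point* is only safe because C++ code is the sole writer of slot 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point {
  int x;
  int y;
};

// Hypothetical stand-in for a JavaScript object built from a template
// with one internal field. Only C++ code can touch internal_fields.
struct ToyJsObject {
  std::vector<void*> internal_fields;
};

// Our convention: slot 0 of every wrapped object holds a Point*.
constexpr std::size_t kPointSlot = 0;

ToyJsObject WrapPoint(Point* p) {
  ToyJsObject obj;
  obj.internal_fields.resize(1);        // cf. SetInternalFieldCount(1)
  obj.internal_fields[kPointSlot] = p;  // cf. SetInternalField(0, ...)
  return obj;
}

// The void* carries no type information, so this cast is only safe
// because every object of this kind was created by WrapPoint.
Point* UnwrapPoint(const ToyJsObject& obj) {
  assert(obj.internal_fields.size() > kPointSlot);
  return static_cast<Point*>(obj.internal_fields[kPointSlot]);
}
```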

References between plain JavaScript objects occur all the time, especially with container types such as Object and Array.

We can also create pointers between objects in “native extension” space, but we have to be careful that we don’t dereference a pointer to data that has been garbage-collected! We need to talk more about V8’s GC before we can understand how to do this safely.

Garbage Collection with V8

Like Ruby, V8 uses tracing garbage collection. From the previous article we know that there are two primary concerns we have to address to support tracing GC:

  1. What are the root objects?

  2. Given an object, what other objects does it directly reference?

We’ll start with #1.

Handles: GC roots from C++

To find root objects, V8 looks at all live JavaScript variables as well as handles from C++. It’s as simple as that! We have more to learn about handles though.

V8 has several kinds of handles:

  • v8::Local: for handles that you create on the C++ stack.
  • v8::Persistent: for handles not on the C++ stack (static or heap).
  • v8::Global: like v8::Persistent, but movable according to C++11 move semantics.

You might wonder why multiple handle types are necessary. If you’re like me, you’re defiantly thinking “maybe I want to put a v8::Persistent on the C++ stack, what are you going to do about it?”

You can certainly do that if you want, and it will work. The main benefit of using v8::Local is that it can be more efficient than v8::Persistent when you have stack-like allocation patterns.

void Foo(v8::Isolate* isolate) {
  // While this HandleScope lives, all Local<T> handles that
  // are created will be associated with it (even if they
  // aren't directly created in this function).  These
  // handles will be GC roots.
  //
  // When the HandleScope is destroyed, all Local<T> GC
  // roots associated with it will be destroyed also.
  v8::HandleScope handle_scope(isolate);

  // "str" will be a GC root (and thus safe to access) until
  // "handle_scope" is destroyed.  It will be placed on the
  // HandleScope's stack.
  v8::Local<v8::String> str =
      v8::String::NewFromUtf8(isolate, "abc").ToLocalChecked();

  // Warning: this will create a lot of entries on the
  // HandleScope stack!  Even when the v8::Local<T> object
  // is destroyed, the handle remains on the stack until
  // the v8::HandleScope is destroyed.
  for (int i = 0; i < 10000; i++) {
    v8::Local<v8::String> tmp_str =
        v8::String::NewFromUtf8(isolate, "tmp").ToLocalChecked();
  }
}
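The growth in that loop can be reproduced with a toy model of the bookkeeping. This is a hedged sketch, not V8's implementation: ToyHandleStack, ToyHandleScope, MakeLocal and PeakDepth are hypothetical names, and it needs C++17 for std::optional. Each "local handle" pushes an entry, and a scope pops back to the height it saw when it was created. In real V8 the analogous fix is to declare a nested v8::HandleScope inside the loop body, so each iteration's handles are released at the end of that iteration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy model of handle-scope bookkeeping; NOT V8's implementation.
struct ToyHandleStack {
  std::vector<const void*> entries;  // one entry per local handle
};

// Remembers the stack height at construction and pops back to it
// on destruction, releasing every handle created in between.
class ToyHandleScope {
 public:
  explicit ToyHandleScope(ToyHandleStack& stack)
      : stack_(stack), saved_size_(stack.entries.size()) {}
  ~ToyHandleScope() { stack_.entries.resize(saved_size_); }
  ToyHandleScope(const ToyHandleScope&) = delete;
  ToyHandleScope& operator=(const ToyHandleScope&) = delete;

 private:
  ToyHandleStack& stack_;
  std::size_t saved_size_;
};

// "Creating a local handle" pushes a root entry; nothing pops it
// except the enclosing scope.
void MakeLocal(ToyHandleStack& stack, const void* obj) {
  stack.entries.push_back(obj);
}

// Runs the loop from Foo() and reports the deepest the stack got,
// optionally with a nested scope inside each iteration.
std::size_t PeakDepth(bool nested_scope_per_iteration) {
  ToyHandleStack stack;
  ToyHandleScope outer(stack);
  int dummy = 0;
  std::size_t peak = 0;
  for (int i = 0; i < 10000; i++) {
    std::optional<ToyHandleScope> inner;
    if (nested_scope_per_iteration) inner.emplace(stack);
    MakeLocal(stack, &dummy);
    peak = std::max(peak, stack.entries.size());
  }  // "inner" (if engaged) is destroyed here, trimming the stack
  return peak;
}
```

Without a nested scope the toy stack peaks at 10000 entries; with one scope per iteration it never holds more than one.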

If you need to store a handle somewhere other than the C++ stack, v8::Persistent and v8::Global are for you. They are not associated with any v8::HandleScope, so they stay valid until they are explicitly reset (v8::Global also resets itself when it is destroyed). They have higher overhead than v8::Local, though.
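The idea of a root whose lifetime follows a C++ object rather than a scope can be sketched in plain C++. ToyRootSet and ToyGlobal are hypothetical names and this is not V8's API; it only mimics the v8::Global behavior described above: the handle registers a root when constructed, releases it when reset or destroyed, and can be moved but not copied.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>

// Toy root set (hypothetical): objects listed here count as GC roots.
struct ToyRootSet {
  std::multiset<const void*> roots;
};

// Move-only root handle in the spirit of v8::Global.
class ToyGlobal {
 public:
  ToyGlobal(ToyRootSet& root_set, const void* obj)
      : set_(&root_set), obj_(obj) {
    assert(obj != nullptr);
    set_->roots.insert(obj_);
  }
  // Moving transfers the root; it never duplicates it.
  ToyGlobal(ToyGlobal&& other) noexcept
      : set_(other.set_), obj_(std::exchange(other.obj_, nullptr)) {}
  ToyGlobal& operator=(ToyGlobal&& other) noexcept {
    if (this != &other) {
      Reset();
      set_ = other.set_;
      obj_ = std::exchange(other.obj_, nullptr);
    }
    return *this;
  }
  ToyGlobal(const ToyGlobal&) = delete;
  ToyGlobal& operator=(const ToyGlobal&) = delete;
  ~ToyGlobal() { Reset(); }

  // Drop this handle's root (if it still holds one).
  void Reset() {
    if (obj_ != nullptr) {
      set_->roots.erase(set_->roots.find(obj_));
      obj_ = nullptr;
    }
  }

 private:
  ToyRootSet* set_;
  const void* obj_;
};
```

Because the root lives exactly as long as the ToyGlobal, such a handle can sit in a heap-allocated struct or a static and keep its object reachable without any scope being open.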

Finding References Between Objects

The V8 API does not require us to help it find references between objects. Unlike with Ruby, we aren’t on the hook to write a mark() function. V8 can find all references between objects on its own, because they live either in regular references (like those from an Array to its members) or in the internal fields array we discussed earlier. V8 has full visibility into both, so it can trace them without our help.

Now that we understand how this picture works as a whole, let’s revisit an earlier question: when is it safe for one C++ object in “native extension space” to have a pointer directly to another?

In the diagram above (where Object A’s C++ data holds a raw pointer to Object B’s C++ data, but no GC reference connects A to B), we are not safe at all! Nothing prevents Object B from being collected while Object A is still alive. If Object B is collected, its C++ data is freed along with it, and the pointer to that data becomes invalid. So this is a bad situation.

The clear solution is for Object A to hold a reference to Object B, which we can accomplish with the internal fields array. If Object A’s internal fields reference Object B, Object B cannot be collected as long as Object A is reachable, so the pointer between the two objects’ C++ data always stays valid.
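The effect of that internal-field reference can be checked with a toy mark phase. This is a sketch under simplifying assumptions, not V8's collector; ToyObj, MarkLive and BSurvives are hypothetical names. It starts from the roots (question #1) and follows both ordinary references and internal fields (question #2). The raw C++ pointer from A’s data to B’s data is deliberately not modeled as an edge, because the collector cannot see it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy heap object; NOT V8's representation.
struct ToyObj {
  std::vector<std::size_t> properties;       // ordinary JS references
  std::vector<std::size_t> internal_fields;  // C++-only slots
};

// Mark phase: everything reachable from the roots through either
// kind of reference is live; everything else would be collected.
std::vector<bool> MarkLive(const std::vector<ToyObj>& heap,
                           const std::vector<std::size_t>& roots) {
  std::vector<bool> live(heap.size(), false);
  std::vector<std::size_t> worklist(roots);
  while (!worklist.empty()) {
    std::size_t i = worklist.back();
    worklist.pop_back();
    assert(i < heap.size());
    if (live[i]) continue;  // already visited (handles cycles)
    live[i] = true;
    for (std::size_t j : heap[i].properties) worklist.push_back(j);
    for (std::size_t j : heap[i].internal_fields) worklist.push_back(j);
  }
  return live;
}

// Object A (index 0) is a root whose C++ data points at Object B's
// C++ data (index 1). Reports whether B survives the mark phase.
bool BSurvives(bool a_references_b_in_internal_field) {
  std::vector<ToyObj> heap(2);
  if (a_references_b_in_internal_field) {
    heap[0].internal_fields.push_back(1);
  }
  std::vector<bool> live = MarkLive(heap, {0});
  return live[1];
}
```

With only the invisible C++ pointer, BSurvives(false) reports that B would be collected; once A’s internal field references B, BSurvives(true) reports that B stays alive.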