Cycles are a thorn in the side of refcounting. But yesterday I discovered (or perhaps rediscovered) a simple, efficient scheme for refcounting cyclic graphs if the defs are immutable: find strongly-connected components and make all nodes in each SCC share a refcount. Beautiful.
This relies on the graph theory result that if each SCC in a graph is replaced with a single node, the resulting graph forms a directed acyclic graph.
I’m planning to use this scheme to refcount defs in upb (garbage-collection is a non-starter because I don’t want to have to track a global list of roots or force the client to periodically call a GC function).
How do you avoid the expense of frequently running the O(V+E) SCC-finding algorithm whenever an object’s reference count is decremented to 0? I can imagine coloring each object you visit as you go along in a normal refcounting approach, then when you detect an object you’ve already visited (b/c it’s been colored already) switching to the SCC approach. But this algorithm will probably be pretty abusive to the cache for large numbers of objects.
Or is it that in upb you (the programmer) naturally know which objects are SCCs because of some nice structural property of the code, so that you don’t need to “discover” where they are with some sort of search? Perhaps that’s what you mean by “if the defs are immutable”…
Because my objects are immutable once initialized, the graph is immutable also and I can find SCC at initialization time, not when a refcount is decremented to 0. I do have to search, but I can do the search only once per graph.
This scheme wouldn’t work in most situations where GC or refcounting is used, but I have the luxury of an immutable object graph.
If each node is immutable once initialized, how do you build a cycle to begin with? Do you build nodes in batches?
When a node is first created it is mutable (edges can be created or changed), but it does not own refs on nodes it points to, so the client of my library is responsible for keeping nodes alive through this first stage. When they client is ready they can “finalize” a set of nodes which does the SCC search and guarantees, from then on, that all reachable nodes are always kept alive.