Garbage Collection Without Unsafe Code

Hacker News Top 04/19/26, 04:41 AM Tools

Summary

safe-gc is a new Rust crate that provides a garbage collector implemented entirely without unsafe code, using heap-indexing instead of direct pointer dereferencing to maintain memory safety.

No content available

Original Article

View Cached Full Text

Cached at: 04/22/26, 06:15 AM

# Garbage Collection Without Unsafe Code Source: [https://fitzgen.com/2024/02/06/safe-gc.html](https://fitzgen.com/2024/02/06/safe-gc.html) Many people, including myself, have implemented garbage collection \(GC\) libraries for Rust\. Manish Goregaokar wrote up a[fantastic survey](https://manishearth.github.io/blog/2021/04/05/a-tour-of-safe-tracing-gc-designs-in-rust/)of this space a few years ago\. These libraries aim to provide a safe API for their users to consume: an`unsafe`\-free interface which soundly encapsulates and hides the library’s internal`unsafe`code\. The one exception is their mechanism to enumerate the outgoing GC edges of user\-defined GC types, since failure to enumerate all edges can lead the collector to believe that an object is unreachable and collect it, despite the fact that the user still has a reference to the reclaimed object, leading to use\-after\-free bugs\.[1](https://fitzgen.com/2024/02/06/safe-gc.html#fn:edges)This functionality is generally exposed as an`unsafe`trait for the user to implement because it is the user’s responsibility, not the library’s, to uphold this particular critical safety invariant\. However, despite providing safe interfaces, all of these libraries make extensive use of`unsafe`code in their internal implementations\. I’ve always believed it was possible to write a garbage collection library without any`unsafe`code, and no one I’ve asserted this to has disagreed, but there has never been a proof by construction\. So, finally, I created the[`safe\-gc`](https://github.com/fitzgen/safe-gc)crate: a garbage collection library for Rust with zero`unsafe`code\. No`unsafe`in the API\. No`unsafe`in the implementation\. It even has a`forbid\(unsafe\_code\)`pragma at the top\. That said,`safe\-gc`is not a particularly high\-performance garbage collector\. ### Using`safe\-gc` To use`safe\-gc`, first we define our GC\-managed types, using`Gc<T\>`to define references to other GC\-managed objects, and implement the`Trace`trait to report each of those GC edges to the collector: ``` use safe_gc::{Collector, Gc, Trace}; // Define a GC-managed object. struct List { value: u32, // GC-managed references to the next and previous links in the list. prev: Option<Gc<List>>, next: Option<Gc<List>>, } // Report GC edges to the collector. impl Trace for List { fn trace(&self, collector: &mut Collector) { if let Some(prev) = self.prev { collector.edge(prev); } if let Some(next) = self.next { collector.edge(next); } } } ``` This looks pretty similar to other GC libraries in Rust, although it could definitely benefit from an implementation of`Trace`for`Option<T\>`and a`derive\(Trace\)`macro\. The big difference from existing GC libraries is that`Trace`is safe to implement; more on this later\. Next, we create one or more`Heap`s to allocate our objects within\. Each heap is independently garbage collected\. ``` use safe_gc::Heap; let mut heap = Heap::new(); ``` And with a`Heap`in hand, we can allocate objects: ``` let a = heap.alloc(List { value: 42, prev: None, next: None, }); let b = heap.alloc(List { value: 36, prev: Some(a.into()), next: None, }); // Create a bunch of garbage! Who cares! It'll all be cleaned // up eventually! for i in 0..100 { let _ = heap.alloc(List { value: i, prev: None, next: None, }); } ``` The heap will automatically trigger garbage collections, as necessary, but we can also force a collection if we want: ``` // Force a garbage collection! heap.gc() ``` Rather than deref’ing`Gc<T\>`pointers directly, we must index into the`Heap`to access the referenced`T`object\. This contrasts with other GC libraries and is the key that unlocks`safe\-gc`’s lack of`unsafe`code, allowing the implementation to abide by Rust’s ownership and borrowing discipline\.[2](https://fitzgen.com/2024/02/06/safe-gc.html#fn:gc-arena) ``` // Read from a GC object in the heap. let b_value = heap[&b].value; assert_eq!(b_value, 36); // Write to a GC object in the heap. heap[&b].value += 1; assert_eq!(heap[&b].value, 37); ``` Finally, there are actually two types for indexing into`Heap`s to access GC objects: 1. `Gc<T\>`, which we have seen already, and 2. `Root<T\>`, which we have also seen in action, but which was hidden from us by type inference\. The`Gc<T\>`type is`Copy`and should be used when referencing other GC\-managed objects from within a GC\-managed object’s type definition, or when you can prove that a garbage collection will not happen \(i\.e\. you have a shared borrow of its heap\)\. A`Gc<T\>`does*not*root its referenced`T`, keeping it alive across garbage collections, and therefore`Gc<T\>`should not be used to hold onto GC references across any operation that can trigger a garbage collection\. A`Root<T\>`, on the other hand, does indeed root its associated`T`object, preventing the object from being reclaimed during garbage collection\. This makes`Root<T\>`suitable for holding references to GC\-managed objects across operations that can trigger garbage collections\.`Root<T\>`is not`Copy`because dropping it must remove its entry from the heap’s root set\. Allocation returns rooted references; all the`heap\.alloc\(\.\.\.\)`calls from our earlier examples returned`Root<T\>`s\. ### Peeking Under the Hood A`safe\_gc::Heap`is more similar to[an arena newtype over a`Vec`](https://docs.rs/id-arena)than an engineered heap with hierarchies of regions like[Immix](http://users.cecs.anu.edu.au/~steveb/pubs/papers/immix-pldi-2008.pdf)\. Its main storage is a hash map from`std::any::TypeId`to uniform arenas of the associated type\. This lets us ultimately use`Vec`as the storage for heap\-allocated objects, and we don’t need to do any unsafe pointer arithmetic or worry about splitting large blocks in our free lists\. In fact, the free lists only manage indices, not blocks of raw memory\. ``` pub struct Heap { // A map from `type_id(T)` to `Arena<T>`. The `ArenaObject` // trait facilitates crossing the boundary from an untyped // heap to typed arenas. arenas: HashMap<TypeId, Box<dyn ArenaObject>>, // ... } struct Arena<T> { elements: FreeList<T>, // ... } enum FreeListEntry<T> { /// An occupied entry holding a `T`. Occupied(T), /// A free entry that is also part of a linked list /// pointing to the next free entry, if any. Free(Option<u32>), } struct FreeList<T> { // The actual backing storage for our `T`s. entries: Vec<FreeListEntry<T>>, /// The index of the first free entry in the free list. free: Option<u32>, // ... } ``` To allocate a new`T`in the heap, we first get the`T`object arena out of the heap’s hash map, or create it if it doesn’t exist yet\. Then, we check if the arena has capacity to allocate our new`T`\. If it does, we push the object onto the arena and return a rooted reference\. If it does not, we fall back to an out\-of\-line slow path where we trigger a garbage collection to ensure that we have space for the new object, and then try again\. ``` impl Heap { #[inline] pub fn alloc<T>(&mut self, value: T) -> Root<T> where T: Trace, { let heap_id = self.id; let arena = self.ensure_arena::<T>(); // Fast path for when we have available capacity for // allocating into. match arena.try_alloc(heap_id, value) { Ok(root) => root, Err(value) => self.alloc_slow(value), } } // Out-of-line slow path for when we need to GC to free // up or allocate additional space. #[inline(never)] fn alloc_slow<T>(&mut self, value: T) -> Root<T> where T: Trace, { self.gc(); let heap_id = self.id; let arena = self.ensure_arena::<T>(); arena.alloc_slow(heap_id, value) } } ``` `Arena<T\>`allocation bottoms out in allocating from a`FreeList<T\>`, which will attempt to use existing capacity by popping off its internal list of empty entries when possible, or otherwise fall back to reserving additional capacity\. ``` impl<T> FreeList<T> { fn try_alloc(&mut self, value: T) -> Result<u32, T> { if let Some(index) = self.free { // We have capacity. Pop the first free entry off // the free list and put the value in there. let index = usize::try_from(index).unwrap(); let next_free = match self.entries[index] { Entry::Free(next_free) => next_free, Entry::Occupied { .. } => unreachable!(), }; self.free = next_free; self.entries[index] = Entry::Occupied(value); Ok(index) } else { // No capacity to hold the value; give it back. Err(value) } } fn alloc(&mut self, value: T) -> u32 { self.try_alloc(value).unwrap_or_else(|value| { // Reserve additional capacity, since we didn't have // space for the allocation. self.double_capacity(); // After which the allocation will succeed. self.try_alloc(value).ok().unwrap() }) } } ``` Accessing objects in the heap is straightforward: look up the arena for`T`and index into it\. ``` impl Heap { /// Get a shared borrow of the referenced `T`. pub fn get<T>(&self, gc: impl Into<Gc<T>>) -> &T where T: Trace, { let gc = gc.into(); assert_eq!(self.id, gc.heap_id); let arena = self.arena::<T>().unwrap(); arena.elements.get(gc.index) } // Get an exclusive borrow of the referenced `T`. pub fn get_mut<T>(&mut self, gc: impl Into<Gc<T>>) -> &mut T where T: Trace, { let gc = gc.into(); assert_eq!(self.id, gc.heap_id); let arena = self.arena_mut::<T>().unwrap(); arena.elements.get_mut(gc.index) } } ``` Before we get into how`safe\-gc`actually performs garbage collection, we need to look at how it implements the root set\. The root set are the set of things that are definitely alive; things that the application is actively using right now or planning to use in the future\. The goal of the collector is to identify all objects transitively referenced by these roots, since these are the objects that can still be used in the future, and recycle all others\. Each`Arena<T\>`has its own`RootSet<T\>`\. For simplicity a`RootSet<T\>`is a wrapper around a`FreeList<Gc<T\>\>`\. When we add new roots, we insert them into the`FreeList`, and when we drop a root, we remove it from the`FreeList`\. This does mean that the root set can contain duplicates and is therefore not a proper set\. The root set’s`FreeList`is additionally wrapped in an`Rc<RefCell<\.\.\.\>\>`so that we can implement`Clone`for`Root<T\>`, which adds another entry in the root set, and don’t need to explicitly pass around a`Heap`to hold additional references to a rooted object\. Finally, I took care to design`Root<T\>`and`RootSet<T\>`such that`Root<T\>`doesn’t directly hold a`Gc<T\>`\. This allows for updating rooted GC pointers after a collection, which is necessary for moving GC algorithms like generational GC and compaction\. In fact, I originally intended to implement a copying collector, which is a moving GC algorithm, for`safe\-gc`but ran into some issues\. More on those later\. For now, we retain the possibility of introducing moving GC at a later date\. ``` struct Arena<T> { // ... // Each arena has a root set. roots: RootSet<T>, } // The set of rooted `T`s in an arena. struct RootSet<T> { inner: Rc<RefCell<FreeList<Gc<T>>>>, } impl<T: Trace> RootSet<T> { // Rooting a `Gc<T>` adds an entry to the root set. fn insert(&self, gc: Gc<T>) -> Root<T> { let mut inner = self.inner.borrow_mut(); let index = inner.alloc(gc); Root { roots: self.clone(), index, } } fn remove(&self, index: u32) { let mut inner = self.inner.borrow_mut(); inner.dealloc(index); } } pub struct Root<T: Trace> { // Each `Root<T>` holds a reference to the root set. roots: RootSet<T>, // Index of this root in the root set. index: u32, } // Dropping a `Root<T>` removes its entry from the root set. impl<T: Trace> Drop for Root<T> { fn drop(&mut self) { self.roots.remove(self.index); } } ``` With all that out of the way, we can finally look at the core garbage collection algorithm\. `safe\-gc`implements simple mark\-and\-sweep garbage collection\. We begin by resetting the mark bits for each arena, and making sure that there are enough bits for all of our allocated objects, since we keep the mark bits in an out\-of\-line compact bitset rather than in each object’s header word or something like that\. ``` impl Heap { #[inline(never)] pub fn gc(&mut self) { // Reset/pre-allocate the mark bits. for (ty, arena) in &self.arenas { self.collector .mark_bits .entry(*ty) .or_default() .reset(arena.capacity()); } // ... } } ``` Next we begin the mark phase\. This starts by iterating over each root and then setting its mark bit and enqueuing it in the mark stack by calling`collector\.edge\(root\)`\. ``` impl Heap { #[inline(never)] pub fn gc(&mut self) { // ... // Mark all roots. for arena in self.arenas.values() { arena.trace_roots(&mut self.collector); } // ... } } trait ArenaObject: Any { fn trace_roots(&self, collector: &mut Collector); // ... } impl<T: Trace> ArenaObject for Arena<T> { fn trace_roots(&self, collector: &mut Collector) { self.roots.trace(collector); } // ... } impl<T: Trace> RootSet<T> { fn trace(&self, collector: &mut Collector) { let inner = self.inner.borrow(); for (_, root) in inner.iter() { collector.edge(*root); } } } ``` The mark phase continues by marking everything transitively reachable from those roots in a fixed\-point loop\. If we discover an unmarked object, we mark it and enqueue it for tracing\. Whenever we see an already\-marked object, we ignore it\. What is kind of unusual is that we don’t have a single mark stack\. The`Heap`has no`T`type parameter, and contains many different types of objects, so the heap itself doesn’t know how to trace any particular object\. However, each of the heap’s`Arena<T\>`s holds only a single type of object, and an arena*does*know how to trace its objects\. So we have a mark stack for each`T`, or equivalently, each arena\. This means that our fixed\-point loop has two levels: an outer loop that continues while any mark stack has work enqueued, and an inner loop to drain a particular mark stack\. ``` impl Heap { #[inline(never)] pub fn gc(&mut self) { // ... // Mark everything transitively reachable from the roots. while let Some(type_id) = self .collector .next_non_empty_mark_stack() { while let Some(index) = self .collector .pop_mark_stack(type_id) { self.arenas .get_mut(&type_id) .unwrap() .trace_one(index, &mut self.collector); } } // ... } } ``` While the driver loop for marking is inside the`Heap::gc`method, the actual edge tracing and mark bit setting happens inside`Collector`and the arena which, because it has a`T`type parameter, can call the correct`Trace`implementation for each object\. ``` trait ArenaObject: Any { fn trace_one(&mut self, index: u32, collector: &mut Collector); // ... } impl<T: Trace> ArenaObject for Arena<T> { fn trace_one(&mut self, index: u32, collector: &mut Collector) { self.elements.get(index).trace(collector); } // ... } pub struct Collector { heap_id: u32, // The mark stack for each type in the heap. mark_stacks: HashMap<TypeId, Vec<u32>>, // The mark bits for each type in the heap. mark_bits: HashMap<TypeId, MarkBits>, } impl Collector { pub fn edge<T: Trace>(&mut self, to: Gc<T>) { assert_eq!(to.heap_id, self.heap_id); // Get the mark bits for `T` objects. let ty = TypeId::of::<T>(); let mark_bits = self.mark_bits.get_mut(&ty).unwrap(); // Set `to`'s mark bit. If the bit was already set, we're // done. if mark_bits.set(to.index) { return; } // Otherwise this is the first time visiting this GC // object so enqueue it for further marking. let mark_stack = self.mark_stacks.entry(ty).or_default(); mark_stack.push(to.index); } } ``` Once our mark stacks are all empty, we’ve reached our fixed point, and that means we’ve finished marking all objects reachable from the root set\. Now we transition to the sweep phase\. Sweeping iterates over each object in each arena\. If that object’s mark bit is not set, then it is unreachable from the GC roots, i\.e\. it is not a member of the live set, i\.e\. it is garbage\. We drop such objects and push their slots into their arena’s free list, making the slot available for future allocations\. After sweeping each arena we check whether the arena is still close to running out of capacity and, if so, reserve additional space for the arena\. This amortizes the cost of garbage collection and avoids a scenario that could otherwise trigger a full GC on every object allocation: - The arena has zero available capacity\. - The user tries to allocate, triggering a GC\. - The GC is able to reclaim only one slot in the arena\. - The user’s pending allocation fills the reclaimed slot\. - Now the arena is out of capacity again, and the process repeats from the top\. By reserving additional space in the arena after sweeping, we avoid this failure mode\. We could also compact the arena and release excess space back to the global allocator if there was too much available capacity\. This would additionally require a method for updating incoming edges to the compacted objects, and`safe\-gc`does not implement compaction at this time\. ``` impl Heap { #[inline(never)] pub fn gc(&mut self) { // ... // Sweep. for (ty, arena) in &mut self.arenas { let mark_bits = &self.collector.mark_bits[ty]; arena.sweep(mark_bits); } } } trait ArenaObject: Any { // ... fn sweep(&mut self, mark_bits: &MarkBits); } impl<T: Trace> ArenaObject for Arena<T> { // ... fn sweep(&mut self, mark_bits: &MarkBits) { // Reclaim garbage slots. let capacity = self.elements.capacity(); for index in 0..capacity { if !mark_bits.get(index) { self.elements.dealloc(index); } } // Amortize the cost of GC across allocations. let len = self.elements.len(); let available = capacity - len; if available < capacity / 4 { self.elements.double_capacity(); } } } ``` After every arena is swept, garbage collection is complete\! ### Preventing Classic Footguns Now that we know how`safe\-gc`is implemented, we can explore a couple classic GC footguns and analyze how`safe\-gc`either completely nullifies them or downgrades them from critical security vulnerabilities to plain old bugs\. Often an object might represent some external resource that should be cleaned up when the object is no longer in use, like an open file descriptor\. This functionality is typically supported with*finalizers*, the GC\-equivalent of C\+\+ destructors and Rust’s`Drop`trait\. Finalization of GC objects is usually tricky because of the risks of either accessing objects that have already been reclaimed by the collector \(which is a use\-after\-free bug\) or accidentally entrenching objects and making them live again \(which leads to memory leaks\)\. Because of these risks, Rust GC libraries often make finalization an`unsafe`trait and even forbid allocating types that implement`Drop`in their heaps\. However,`safe\-gc`doesn’t need an`unsafe`finalizer trait, or even any additional finalizer trait: it can just use`Drop`\.`Drop`implementations simply do not have access to a`Heap`, which is required to deref GC pointers, so they*cannot*suffer from those finalization footguns\. Next up: why isn’t`Trace`an`unsafe`trait? And what happens if you don’t root a`Gc<T\>`and then index into a`Heap`with it after a garbage collection? These are actually the same question: what happens if I use a dangling`Gc<T\>`? As mentioned at the start, if a`Trace`implementation fails to report all edges to the collector, the collector may believe an object is unreachable and reclaim it, and now the unreported edge is dangling\. Similarly, if the user holds an unrooted`Gc<T\>`, rather than a`Root<T\>`, across a garbage collection then the collector might believe that the referenced object is garbage and reclaim it, leaving the unrooted reference dangling\. Indexing into a`Heap`with a potentially\-dangling`Gc<T\>`will result in one of three possibilities: 1. We got “lucky” and something else happened to keep the object alive\. The access succeeds as it otherwise would have and the potentially\-dangling bug is hidden\. 2. The associated slot in the arena’s free list is empty and contains a`FreeListEntry::Free`variant\. This scenario will raise a panic\. 3. A new object has since been allocated in the same arena slot\. The access will succeed, but it will be to the wrong object\. This is an instance of[the ABA problem](https://en.wikipedia.org/wiki/ABA_problem)\. We could, at the cost of some runtime overhead, turn this into a loud panic instead of silent action at a distance by[adding a generation counter to our arenas](https://docs.rs/generational-arena/latest/generational_arena/)\. Of course, it would be best if users always rooted GC references they held across collections and correctly implemented the`Trace`trait but, should they fail to do that, all three potential outcomes are 100% memory safe\.[3](https://fitzgen.com/2024/02/06/safe-gc.html#fn:sm-rooting-analysis)These failures can’t lead to memory corruption or use\-after\-free bugs, which would be the typical results of this kind of thing with an unsafe GC implementation\. ### Copying Collector False Start I initially intended to implement a copying collector rather than mark\-and\-sweep, but ultimately the borrowing and ownership didn’t pan out\. That isn’t to say it is impossible to implement a copying collector in safe Rust, but it ended up feeling like more of a headache than it was worth\. I spent several hours trying to jiggle things around to experiment with different ownership hierarchies and didn’t get anything satisfactory\. When I decided to try mark\-and\-sweep, it only took me about half an hour to get an initial prototype working\. I found this really surprising, since I had a strong intuition that a copying collector, with its separate from\- and to\-spaces, should play well with Rust’s ownership and borrowing\. Briefly, the algorithm works as follows: - We equally divide the heap into two*semi\-spaces*\. - At any given time in between collections, all objects live in one semi\-space and the other is sitting idle\. - We[bump allocate](https://fitzgeraldnick.com/2019/11/01/always-bump-downwards.html)within the active semi\-space, slowly filling it up, and when the bump pointer reaches the end of the semi\-space, we trigger a collection\. - During collection, as we trace the live set, we copy objects from the old semi\-space that has been active, to the other new semi\-space that has been idle\. At the same time, we maintain a map from the live objects’ location in the old semi\-space to their location in the new semi\-space\. When we trace an object’s edges, we also*update*those edges to point to their new locations\. Once tracing reaches a fixed\-point, we’ve copied the whole live set to the new semi\-space, it becomes the active semi\-space, and the previously\-active semi\-space now sits idle until the next collection\. Copying collection has a number of desirable properties: - The algorithm is relatively simple and easy to understand\. - Allocating new objects is fast: just bumping a pointer and checking that space isn’t exhausted yet\. - The act of copying objects to the new semi\-space compacts the heap, defeating fragmentation\. - It also eliminates the need for a sweep phase, since the whole of the old semi\-space is garbage after the live set has been moved to the new semi\-space\. Copying collection’s primary disadvantage is the memory overhead it imposes: we can only ever use at most half of the heap to store objects\. When I think about a copying collector, I tend to imagine Lisp cons cells, as I was first introduced to this algorithm in that context by[SICP](https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/index.html)\. Here is what a very naive implementation of the core copying collection algorithm might look like in safe Rust: ``` fn copy_collect( roots: &mut [usize], from: &[Cons], to: &mut Vec<Cons>, ) { // Contains a work list of the new indices of cons cells // that have been been copied to `to` but haven't had their // edges traced and updated yet. let mut stack = vec::with_capacity(roots.len()); // The map from each live object's old location, to its new one. let mut old_to_new = HashMap::new(); // Copy each root to the to-space, enqueue it for tracing, and // update its pointer to its new index in the to-space. for root in roots { visit_edge(from, to, &mut old_to_new, &mut stack, root); } // Now do the same for everything transitively reachable from // the roots. while let Some(index) = stack.pop() { let cons = &mut to[index]; if let Some(car) = &mut cons.car { visit_edge(from, to, &mut old_to_new, &mut stack, car); } if let Some(cdr) = &mut cons.cdr { visit_edge(from, to, &mut old_to_new, &mut stack, cdr); } } } // Visit one edge. If the edge's referent has already been copied // to the to-space, just update the edge's pointer so that it points // to the new location. If it hasn't been copied yet, additionally // copy it over and enqueue it in the stack for future tracing. fn visit_edge( from: &[Cons], to: &mut Vec<Cons>, old_to_new: &mut HashMap<usize, usize>, stack: &mut Vec<usize>, edge: &mut usize, ) { let new_location = old_to_new .entry(*edge) .or_insert(|| { let new = to.len(); // Copy the object over. to.push(from[*edge]); // Enqueue it for tracing. stack.push(new); new }); *edge = new_location; } ``` As written, this works and is 100% safe\![4](https://fitzgen.com/2024/02/06/safe-gc.html#fn:works)So where do things start to break down? We’ll get there, but first… The old\-to\-new\-location map needn’t be an additional, separate allocation\. We don’t need that hash map\. Instead, we can reuse the from\-space’s storage and write the address of each copied object’s new location inline into its old location\. These are referred to as*forwarding pointers*\. This is a super standard optimization for copying collection; so much so that it’s rare to see a copying collector without it\. Let’s implement inline forwarding pointers for our safe copying collector\. Because we are mutating the from\-space to write the forwarding pointers, we will need to change it from a shared borrow into an exclusive borrow\. Additionally, to differentiate between forwarding pointers and actual cons cells, our semi\-spaces must become slices of an`enum`rather than slices of cons cells directly\. ``` enum SemiSpaceEntry { // The cons cell. If we see this during tracing, that means // we haven't copied it over to the to-space yet. Occupied(Cons), // This cons cell has already been moved, here is its new // location. Forwarded(usize) } fn copy_collect( roots: &mut [usize], from: &mut [SemiSpaceEntry], to: &mut Vec<SemiSpaceEntry>, ) { // Same as before, but without `old_to_new`... } fn visit_edge( from: &mut [SemiSpaceEntry], to: &mut Vec<SemiSpaceEntry>>, stack: &mut Vec<usize>, edge: &mut usize, ) { let new = match &mut from[*edge] { SemiSpaceEntry::Forwarded(new) => new, SemiSpaceEntry::Occupied(cons) => { let new = to.len(); // Copy the object over. to.push(cons); // Enqueue it for tracing. stack.push(new); // !!! Write the forwarding pointer. !!! from[edge] = SemiSpaceEntry::Forwarded(new); new } }; *edge = new; } ``` Again, this copying collector with forwarding pointers works and is still 100% safe code\. Things break down when we move away from a homogeneously\-typed heap that only contains cons cells towards a heterogeneously\-typed heap that can contain any type of GC object\. Recall how`safe\_gc::Heap`organizes its underlying storage with a hash map keyed by type id to get the`Arena<T\>`storage for that associated type: ``` pub struct Heap { // A map from `type_id(T)` to `Arena<T>`. arenas: HashMap<TypeId, Box<dyn ArenaObject>>, // ... } ``` My idea was that a whole`Heap`would be a semi\-space, and if it was the active semi\-space, the heap would additionally have an owning handle to the idle semi\-space: ``` pub struct Heap { // ... idle_semi_space: Option<Box<Heap>>, } ``` Given that, we would collect the heap by essentially \(papering over some details\) calling`copy\_collect`on each of its internal arenas: ``` impl Heap { pub fn gc(&mut self) { let to_heap = self.idle_semi_space.take().unwrap(); for (ty, from_arena) in &mut self.arenas { copy_collect(from_arena, to_heap); } } } ``` Note that we pass the whole`to\_heap`into`copy\_collect`, not`from\_arena`’s corresponding`Arena<T\>`in the to\-space, because there can be cross\-type edges\. A`Cat`object can have a reference to a`Salami`object as a little treat, and we need access to the whole to\-space, not just its`Arena<Cat\>`, to copy that`Salami`over when tracing`Cat`s\. But here’s where things break down: we also need mutable access the whole from\-space when tracing`Arena<Cat\>`s because we need to write the forwarding pointer in the from\-space’s`Arena<Salami\>`for the`Salami`’s new location in the to\-space\. But we can’t have mutable access to the whole from\-space because we’ve already projected into one of its arenas\. Yeah, I guess we could use something like take the`Arena<Cat\>`out of the from\-space, and then pass both the`Arena<Cat\>`and the whole from\-space into`copy\_collect`\. But then what do we do for`Cat`\-to\-`Cat`edges? Have some kind of check to test for whether we need to follow a given edge into the from\-space`Heap`or the`Arena`we are currently tracing? Like I said, I don’t think it is*impossible*to overcome these hurdles, the question is: is overcoming them worth it? Everything I could think up got pretty inelegant pretty quickly and/or would have laughably poor performance\.[5](https://fitzgen.com/2024/02/06/safe-gc.html#fn:perf)When compared with how easy it was to implement mark\-and\-sweep, I just don’t think a 100%`unsafe`\-free copying collector that supports arbitrary, heterogeneous types is worth the headache\. ### Why`safe\-gc`? `safe\-gc`is certainly a point in the design space of garbage\-collection libraries in Rust\. One could even argue it is an interesting — and maybe even useful? — point in the design space\! Also, it was fun\! At the very least, you don’t have to wonder about the correctness of any`unsafe`code in there, because there isn’t any\. As long as the Rust language and its standard library are sound,`safe\-gc`is too\. ### Conclusion The[`safe\-gc`](https://github.com/fitzgen/safe-gc)crate implements garbage\-collection\-as\-library for Rust with zero`unsafe`code\. It was fun to implement\! Thanks to[Trevor Elliott](https://elliottt.github.io/)and[Jamey Sharp](https://jamey.thesharps.us/)for brainstorming with me and thanks to[Manish Goregaokar](https://manishearth.github.io/)and again to Trevor Elliott for reading early drafts of this blog post\. ---

Garbage Collection Without Unsafe Code

Similar Articles

The Edge of Safe Rust

Safe Made Easy Pt.1: Single Ownership is (Not) Optional

Improving C# Memory Safety

@davidcrawshaw: While the industry is pouring resources into programs without GC (rust), I think the Jane Street OCaml folks have it fi…

Memory safety is a matter of life and death

Submit Feedback

Similar Articles

Safe Made Easy Pt.1: Single Ownership is (Not) Optional

@davidcrawshaw: While the industry is pouring resources into programs without GC (rust), I think the Jane Street OCaml folks have it fi…

Memory safety is a matter of life and death