Collection: Implement garbage collection

We need to add a few more things to get garbage collection working. Specifically, we need to configure the GCWorkerCopyContext, which a GC worker uses for copying objects, and the GC work packets that will be scheduled for a collection.

CopyConfig

CopyConfig defines how a GC plan copies objects. Similar to the MutatorConfig struct, you need to define a CopyConfig for your plan.

In impl<VM: VMBinding> Plan for MyGC<VM>, override the method create_copy_config(). The default implementation provides a default CopyConfig for non-copying plans, so non-copying plans do not need to override the method. Copying plans, however, have to provide a proper copy configuration.

In a semispace GC, objects will be copied between the two copy spaces. We will use one CopySpaceCopyContext for the copying, and will rebind the copy context to the proper tospace in the preparation step of a GC (which will be discussed later when we talk about preparing for collections).

We use CopySemantics::DefaultCopy for our copy operation, and bind it with the first CopySpaceCopyContext (CopySemantics::DefaultCopy => CopySelector::CopySpace(0)). Other copy semantics are unused in this plan. We also provide an initial space binding for CopySpaceCopyContext. However, we will flip tospace in every GC, and rebind the copy context to the new tospace in each GC, so it does not matter which space we use as the initial space here.

    fn create_copy_config(&'static self) -> CopyConfig<Self::VM> {
        use enum_map::enum_map;
        CopyConfig {
            copy_mapping: enum_map! {
                CopySemantics::DefaultCopy => CopySelector::CopySpace(0),
                _ => CopySelector::Unused,
            },
            space_mapping: vec![
                // The tospace argument doesn't matter, we will rebind before a GC anyway.
                (CopySelector::CopySpace(0), &self.copyspace0)
            ],
            constraints: &MYGC_CONSTRAINTS,
        }
    }

Because the semispace GC copies objects in every single GC, we modify the method current_gc_may_move_object() in MyGC so that it always returns true.

    fn current_gc_may_move_object(&self) -> bool {
        true
    }

Introduce collection to MyGC plan

Add a new method, schedule_collection(), to the Plan implementation for MyGC. This function runs when a collection is triggered. It schedules the GC work for the plan, i.e., it stops all mutators, runs the scheduler's prepare stage, and resumes the mutators. The StopMutators work will invoke code from the bindings to scan threads and other roots, and that scanning work will in turn push work packets that compute the transitive closure.

Though you can add those work packets yourself, GCWorkScheduler provides a method, schedule_common_work(), that adds the common work packets for you.

To use schedule_common_work(), we first need to create a type MyGCWorkContext and implement the trait GCWorkContext for it. We create gc_work.rs and add the following implementation. Note that we will use the default SFTProcessEdges, which is a general work packet that a plan can use to trace objects. For plans like semispace, SFTProcessEdges is sufficient. For more complex GC plans, you can write your own work packet that implements the ProcessEdgesWork trait. We discuss this, and the alternatives, later in this section.

pub struct MyGCWorkContext<VM: VMBinding>(std::marker::PhantomData<VM>);
impl<VM: VMBinding> crate::scheduler::GCWorkContext for MyGCWorkContext<VM> {
    type VM = VM;
    type PlanType = MyGC<VM>;
    type DefaultProcessEdges = SFTProcessEdges<Self::VM>;
    type PinningProcessEdges = UnsupportedProcessEdges<Self::VM>;
}

Then we implement schedule_collection() using MyGCWorkContext and schedule_common_work().

    fn schedule_collection(&'static self, scheduler: &GCWorkScheduler<VM>) {
        scheduler.schedule_common_work::<MyGCWorkContext<VM>>(self);
    }

Delete handle_user_collection_request(). This function was an override of a common plan function, used to make NoGC ignore user-requested collections. Now we remove it and allow user-requested collections.

Prepare for collection

The collector has a number of steps it needs to perform before each collection. We'll add these now.

Prepare plan

In mygc/global.rs, find the method prepare. Delete the unreachable!() call, and add the following code:

    fn prepare(&mut self, tls: VMWorkerThread) {
        self.common.prepare(tls, true);

        self.hi
            .store(!self.hi.load(Ordering::SeqCst), Ordering::SeqCst);
        // Flips 'hi' to flip space definitions
        let hi = self.hi.load(Ordering::SeqCst);
        self.copyspace0.prepare(hi);
        self.copyspace1.prepare(!hi);

        self.fromspace_mut()
            .set_copy_for_sft_trace(Some(CopySemantics::DefaultCopy));
        self.tospace_mut().set_copy_for_sft_trace(None);
    }

This function is called at the start of a collection. It prepares the two spaces in the common plan, flips the definitions for which space is 'to' and which is 'from', then prepares the copyspaces with the new definition.

Note that we call set_copy_for_sft_trace() for both spaces. This step is required when using SFTProcessEdges to tell the spaces which copy semantic to use for copying. For fromspace, we use the DefaultCopy semantic, which we have defined earlier in our CopyConfig. So for objects in fromspace that need to be copied, the policy will use the copy context that binds with DefaultCopy (which allocates to the tospace) in the GC worker. For tospace, we set its copy semantics to None, as we do not expect to copy objects from tospace, and if that ever happens, we will simply panic.
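
The prepare() code above relies on the tospace_mut() and fromspace_mut() accessors, and the worker and release steps below use tospace() and fromspace(). These should match the accessors you defined when setting up allocation for MyGC; if you have not added the mutable variants yet, a minimal sketch consistent with the hi convention above (the method names and their placement in mygc/global.rs are our assumption) is:

impl<VM: VMBinding> MyGC<VM> {
    /// When `hi` is set, copyspace1 is the tospace; otherwise copyspace0 is.
    pub fn tospace(&self) -> &CopySpace<VM> {
        if self.hi.load(Ordering::SeqCst) {
            &self.copyspace1
        } else {
            &self.copyspace0
        }
    }

    pub fn fromspace(&self) -> &CopySpace<VM> {
        if self.hi.load(Ordering::SeqCst) {
            &self.copyspace0
        } else {
            &self.copyspace1
        }
    }

    pub fn tospace_mut(&mut self) -> &mut CopySpace<VM> {
        if self.hi.load(Ordering::SeqCst) {
            &mut self.copyspace1
        } else {
            &mut self.copyspace0
        }
    }

    pub fn fromspace_mut(&mut self) -> &mut CopySpace<VM> {
        if self.hi.load(Ordering::SeqCst) {
            &mut self.copyspace0
        } else {
            &mut self.copyspace1
        }
    }
}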

Prepare worker

As we flip tospace for the plan, we also need to rebind the copy context to the new tospace. We will override prepare_worker() in our Plan implementation. Plan.prepare_worker() is executed by each GC worker in the preparation phase of a GC. The code is straightforward -- we get the first CopySpaceCopyContext, and call rebind() on it with the new tospace.

    fn prepare_worker(&self, worker: &mut GCWorker<VM>) {
        unsafe { worker.get_copy_context_mut().copy[0].assume_init_mut() }.rebind(self.tospace());
    }

Prepare mutator

Going back to mutator.rs, create a new function called mygc_mutator_prepare<VM: VMBinding>(_mutator: &mut Mutator<VM>, _tls: VMWorkerThread). This function will be called at the preparation stage of a collection (at the start of a collection) for each mutator. Its body can stay empty, as there aren't any preparation steps for the mutator in this GC. In create_mygc_mutator(), find the field prepare_func and change it from &unreachable_prepare_func to &mygc_mutator_prepare.
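
For concreteness, a minimal sketch of this empty prepare function, using the signature given above, looks like:

pub fn mygc_mutator_prepare<VM: VMBinding>(
    _mutator: &mut Mutator<VM>,
    _tls: VMWorkerThread,
) {
    // Nothing to do: MyGC has no per-mutator preparation work.
}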

💡 Hint: If your plan does nothing when preparing mutators, there is an optimization you can do. You may set the plan constraints field PlanConstraints::needs_prepare_mutator to false so that the PrepareMutator work packets which call prepare_func will not be created in the first place. This optimization is helpful for VMs that run with a large number of mutator threads. If you do this optimization, you may also leave the MutatorConfig::prepare_func field as &unreachable_prepare_func to indicate it should not be called.
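
If you choose this optimization, the change is a one-line addition to the plan constraints. As a sketch (assuming MYGC_CONSTRAINTS is declared with struct update syntax as in the earlier chapters; keep whatever other fields you already set, such as moves_objects):

pub const MYGC_CONSTRAINTS: PlanConstraints = PlanConstraints {
    moves_objects: true,
    // No PrepareMutator work packets will be created, since mygc_mutator_prepare does nothing.
    needs_prepare_mutator: false,
    ..PlanConstraints::default()
};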

Release

Finally, we need to fill out the functions that are, roughly speaking, run after each collection.

Release in plan

Find the method release() in mygc/global.rs. Replace the unreachable!() call with the following code.

    fn release(&mut self, tls: VMWorkerThread) {
        self.common.release(tls, true);
        self.fromspace().release();
    }

This function is called at the end of a collection. It calls the release routines for the common plan spaces and the fromspace.

Release in mutator

Go back to mutator.rs. Create a new function called mygc_mutator_release() that takes the same inputs as the mygc_mutator_prepare() function above.

pub fn mygc_mutator_release<VM: VMBinding>(
    mutator: &mut Mutator<VM>,
    _tls: VMWorkerThread,
) {
    // rebind the allocation bump pointer to the appropriate semispace
    let bump_allocator = unsafe {
        mutator
            .allocators
            .get_allocator_mut(mutator.config.allocator_mapping[AllocationSemantics::Default])
    }
    .downcast_mut::<BumpAllocator<VM>>()
    .unwrap();
    bump_allocator.rebind(
        mutator
            .plan
            .downcast_ref::<MyGC<VM>>()
            .unwrap()
            .tospace(),
    );
}

Then go to create_mygc_mutator() and replace &unreachable_release_func in the release_func field with &mygc_mutator_release. This function will be called at the release stage of a collection (at the end of a collection) for each mutator. It rebinds the allocator for the Default allocation semantics to the new tospace. When the mutator threads resume, any new allocations with the Default semantics will go to the new tospace.

ProcessEdgesWork for MyGC

ProcessEdgesWork is the key work packet for tracing objects in a GC. A ProcessEdgesWork implementation defines how to trace objects, and how to generate more work packets based on the current tracing to finish the object closure.

GCWorkContext specifies a type that implements ProcessEdgesWork, and we used SFTProcessEdges earlier. In this section, we discuss what SFTProcessEdges does, and what the alternatives are.

Approach 1: Use SFTProcessEdges

SFTProcessEdges dispatches the tracing of objects to their respective spaces through the Space Function Table (SFT). As long as every policy in a plan provides an implementation of sft_trace_object() in its SFT, the plan can use SFTProcessEdges. Currently most policies provide an implementation of sft_trace_object(); the exceptions are mark compact and immix. Those two policies use multiple GC traces, and due to a limitation of the SFT, a policy cannot provide more than one sft_trace_object().

SFTProcessEdges is the simplest approach when all the policies support it. Fortunately, we can use it for our GC, semispace.

Approach 2: Derive PlanTraceObject and use PlanProcessEdges

PlanProcessEdges is another general ProcessEdgesWork implementation that can be used by most plans. When a plan implements the PlanTraceObject trait, it can use PlanProcessEdges.

You can manually provide an implementation of PlanTraceObject for MyGC, but you can also use the derive macro MMTk provides, which will generate the implementation of PlanTraceObject for you:

  • Make sure MyGC already has the #[derive(HasSpaces)] attribute because all plans need to implement the HasSpaces trait anyway. (import the macro properly: use mmtk_macros::HasSpaces)
  • Add #[derive(PlanTraceObject)] for MyGC (import the macro properly: use mmtk_macros::PlanTraceObject)
  • Add both #[space] and #[copy_semantics(CopySemantics::DefaultCopy)] to the two copy space fields, copyspace0 and copyspace1. #[space] tells the macro that copyspace0 and copyspace1 are spaces in the MyGC plan, and the generated trace code will check both spaces. #[copy_semantics(CopySemantics::DefaultCopy)] specifies the copy semantics to use when tracing objects in the corresponding space.
  • Add #[parent] to common. This tells the macro that there are more spaces defined in common and its nested structs. If an object is not found in any space with #[space] in this plan, the trace code will try to find the space for the object in the 'parent' plan. In our case, the trace code will proceed by checking spaces in the CommonPlan, as the object may be in large object space or immortal space in the common plan. CommonPlan also implements PlanTraceObject, so it knows how to find a space for the object and trace it in the same way.

With the derive macro, your MyGC struct should look like this:

#[derive(HasSpaces, PlanTraceObject)]
pub struct MyGC<VM: VMBinding> {
    pub hi: AtomicBool,
    #[space]
    #[copy_semantics(CopySemantics::DefaultCopy)]
    pub copyspace0: CopySpace<VM>,
    #[space]
    #[copy_semantics(CopySemantics::DefaultCopy)]
    pub copyspace1: CopySpace<VM>,
    #[parent]
    pub common: CommonPlan<VM>,
}

Once this is done, you can specify PlanProcessEdges as the DefaultProcessEdges in your GC work context:

use crate::policy::gc_work::DEFAULT_TRACE;
use crate::scheduler::gc_work::PlanProcessEdges;
pub struct MyGCWorkContext2<VM: VMBinding>(std::marker::PhantomData<VM>);
impl<VM: VMBinding> crate::scheduler::GCWorkContext for MyGCWorkContext2<VM> {
    type VM = VM;
    type PlanType = MyGC<VM>;
    type DefaultProcessEdges = PlanProcessEdges<Self::VM, MyGC<VM>, DEFAULT_TRACE>;
    type PinningProcessEdges = UnsupportedProcessEdges<Self::VM>;
}

Approach 3: Implement your own ProcessEdgesWork

Apart from the two approaches above, you can always implement your own ProcessEdgesWork. This is overkill for simple plans like semispace, but may be necessary for more complex plans. Here we discuss how to implement it for MyGC.

Create a struct MyGCProcessEdges<VM: VMBinding> in the gc_work module. It includes a reference back to the plan, and a ProcessEdgesBase field:

pub struct MyGCProcessEdges<VM: VMBinding> {
    plan: &'static MyGC<VM>,
    base: ProcessEdgesBase<VM>,
}

Implement ProcessEdgesWork for MyGCProcessEdges. As most methods in the trait have a default implementation, we only need to implement new() and trace_object() for our plan. However, this may not be true for other GC plans, so it is worth checking the default implementations in ProcessEdgesWork.

For trace_object(), what we do is similar to the approach above, except that we write the code ourselves rather than letting the macro generate it for us. We figure out which space the object is in, and invoke trace_object() on that space for the object. If the object is not in either of the copy spaces in the plan, we forward the call to CommonPlan.

impl<VM: VMBinding> ProcessEdgesWork for MyGCProcessEdges<VM> {
    type VM = VM;
    type ScanObjectsWorkType = ScanObjects<Self>;

    fn new(
        slots: Vec<SlotOf<Self>>,
        roots: bool,
        mmtk: &'static MMTK<VM>,
        bucket: WorkBucketStage,
    ) -> Self {
        let base = ProcessEdgesBase::new(slots, roots, mmtk, bucket);
        let plan = base.plan().downcast_ref::<MyGC<VM>>().unwrap();
        Self { base, plan }
    }

    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference {
        let worker = self.worker();
        let queue = &mut self.base.nodes;
        if self.plan.tospace().in_space(object) {
            self.plan.tospace().trace_object(
                queue,
                object,
                Some(CopySemantics::DefaultCopy),
                worker,
            )
        } else if self.plan.fromspace().in_space(object) {
            self.plan.fromspace().trace_object(
                queue,
                object,
                Some(CopySemantics::DefaultCopy),
                worker,
            )
        } else {
            self.plan.common.trace_object(queue, object, worker)
        }
    }

    fn create_scan_work(&self, nodes: Vec<ObjectReference>) -> ScanObjects<Self> {
        ScanObjects::<Self>::new(nodes, false, self.bucket)
    }
}

We also need to implement Deref and DerefMut for MyGCProcessEdges so that it can be dereferenced as ProcessEdgesBase.

impl<VM: VMBinding> Deref for MyGCProcessEdges<VM> {
    type Target = ProcessEdgesBase<VM>;
    fn deref(&self) -> &Self::Target {
        &self.base
    }
}

impl<VM: VMBinding> DerefMut for MyGCProcessEdges<VM> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.base
    }
}

Finally, use MyGCProcessEdges as the DefaultProcessEdges in the GCWorkContext:

pub struct MyGCWorkContext3<VM: VMBinding>(std::marker::PhantomData<VM>);
impl<VM: VMBinding> crate::scheduler::GCWorkContext for MyGCWorkContext3<VM> {
    type VM = VM;
    type PlanType = MyGC<VM>;
    type DefaultProcessEdges = MyGCProcessEdges<Self::VM>;
    type PinningProcessEdges = UnsupportedProcessEdges<Self::VM>;
}
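
To actually use this work context, point schedule_collection() at it instead of the earlier MyGCWorkContext, for example:

    fn schedule_collection(&'static self, scheduler: &GCWorkScheduler<VM>) {
        // Schedule the common work packets, now using our custom ProcessEdgesWork.
        scheduler.schedule_common_work::<MyGCWorkContext3<VM>>(self);
    }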

Summary

You should now have a working MyGC plan that is able to collect garbage. All three benchmarks should now pass.

If the benchmarks pass - good job! You have built a functional copying collector!

If you get particularly stuck, the code for the completed MyGC plan is available here.