# Collection: Implement garbage collection

We need to add a few more things to get garbage collection working.
Specifically, we need to configure the `GCWorkerCopyContext`, which a GC worker
uses for copying objects, and the GC work packets that will be scheduled for a
collection.
## CopyConfig

`CopyConfig` defines how a GC plan copies objects. Similar to the
`MutatorConfig` struct, you need to define a `CopyConfig` for your plan.

In `impl<VM: VMBinding> Plan for MyGC<VM>`, override the method
`create_copy_config()`. The default implementation provides a default
`CopyConfig` for non-copying plans, so non-copying plans do not need to
override the method. Copying plans, however, have to provide a proper copy
configuration.
In a semispace GC, objects are copied between the two copy spaces. We will use
one `CopySpaceCopyContext` for the copying, and will rebind the copy context to
the proper tospace in the preparation step of a GC (discussed later when we
talk about preparing for collections).

We use `CopySemantics::DefaultCopy` for our copy operation, and bind it to the
first `CopySpaceCopyContext`
(`CopySemantics::DefaultCopy => CopySelector::CopySpace(0)`). Other copy
semantics are unused in this plan. We also provide an initial space binding for
the `CopySpaceCopyContext`. However, since we flip tospace in every GC and
rebind the copy context to the new tospace in each GC, it does not matter which
space we use as the initial space here.
```rust
fn create_copy_config(&'static self) -> CopyConfig<Self::VM> {
    use enum_map::enum_map;
    CopyConfig {
        copy_mapping: enum_map! {
            CopySemantics::DefaultCopy => CopySelector::CopySpace(0),
            _ => CopySelector::Unused,
        },
        space_mapping: vec![
            // The tospace argument doesn't matter, we will rebind before a GC anyway.
            (CopySelector::CopySpace(0), &self.copyspace0)
        ],
        constraints: &MYGC_CONSTRAINTS,
    }
}
```
Because the semispace GC copies objects in every single GC, we modify the
method `current_gc_may_move_object()` in `MyGC` so that it always returns
`true`.
```rust
fn current_gc_may_move_object(&self) -> bool {
    true
}
```
## Introduce collection to MyGC plan

Add a new method, `schedule_collection()`, to `Plan for MyGC`. This function
runs when a collection is triggered. It schedules GC work for the plan, i.e.,
it stops all mutators, runs the scheduler's prepare stage, and resumes the
mutators. The `StopMutators` work packet will invoke code from the binding to
scan threads and other roots, and that scanning work will in turn push more
work packets for the transitive closure.

Though you can add those work packets yourself, `GCWorkScheduler` provides a
method, `schedule_common_work()`, that adds the common work packets for you.
To use `schedule_common_work()`, we first need to create a type
`MyGCWorkContext` and implement the trait `GCWorkContext` for it. We create
`gc_work.rs` and add the following implementation. Note that we use the
default `SFTProcessEdges`, which is a general work packet that a plan can use
to trace objects. For plans like semispace, `SFTProcessEdges` is sufficient.
For more complex GC plans, one can write their own work packet that implements
the `ProcessEdgesWork` trait. We will discuss this, and the alternatives, later.
```rust
pub struct MyGCWorkContext<VM: VMBinding>(std::marker::PhantomData<VM>);
impl<VM: VMBinding> crate::scheduler::GCWorkContext for MyGCWorkContext<VM> {
    type VM = VM;
    type PlanType = MyGC<VM>;
    type DefaultProcessEdges = SFTProcessEdges<Self::VM>;
    type PinningProcessEdges = UnsupportedProcessEdges<Self::VM>;
}
```
Then we implement `schedule_collection()` using `MyGCWorkContext` and
`schedule_common_work()`.
```rust
fn schedule_collection(&'static self, scheduler: &GCWorkScheduler<VM>) {
    scheduler.schedule_common_work::<MyGCWorkContext<VM>>(self);
}
```
Delete `handle_user_collection_request()`. This function overrode a common
plan function in order to ignore user-requested collection for NoGC. We now
remove it and allow user-requested collection.
## Prepare for collection

The collector has a number of steps it needs to perform before each
collection. We'll add these now.
### Prepare plan

In `mygc/global.rs`, find the method `prepare()`. Delete the `unreachable!()`
call, and add the following code:
```rust
fn prepare(&mut self, tls: VMWorkerThread) {
    self.common.prepare(tls, true);
    self.hi
        .store(!self.hi.load(Ordering::SeqCst), Ordering::SeqCst); // Flips 'hi' to flip space definitions
    let hi = self.hi.load(Ordering::SeqCst);
    self.copyspace0.prepare(hi);
    self.copyspace1.prepare(!hi);
    self.fromspace_mut()
        .set_copy_for_sft_trace(Some(CopySemantics::DefaultCopy));
    self.tospace_mut().set_copy_for_sft_trace(None);
}
```
This function is called at the start of a collection. It prepares the spaces
in the common plan, flips the definitions of which space is 'to' and which is
'from', then prepares the copyspaces with the new definitions.

Note that we call `set_copy_for_sft_trace()` for both spaces. This step is
required when using `SFTProcessEdges`: it tells each space which copy
semantics to use for copying. For fromspace, we use the `DefaultCopy`
semantics, which we defined earlier in our `CopyConfig`. So for objects in
fromspace that need to be copied, the policy will use the copy context bound
to `DefaultCopy` (which allocates into the tospace) in the GC worker. For
tospace, we set its copy semantics to `None`, as we do not expect to copy
objects from tospace; if that ever happens, we will simply panic.
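To make the flip concrete, here is a std-only sketch of the role-flipping logic (the names `hi`, `tospace`, and `fromspace` mirror the plan's fields; the `Space` enum and everything else are illustrative, not mmtk-core types):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Illustrative stand-ins for the two copy spaces.
#[derive(Debug, PartialEq)]
enum Space {
    CopySpace0,
    CopySpace1,
}

struct Plan {
    hi: AtomicBool,
}

impl Plan {
    // Mirrors MyGC::tospace(): when `hi` is true, copyspace1 is the tospace.
    fn tospace(&self) -> Space {
        if self.hi.load(Ordering::SeqCst) { Space::CopySpace1 } else { Space::CopySpace0 }
    }
    fn fromspace(&self) -> Space {
        if self.hi.load(Ordering::SeqCst) { Space::CopySpace0 } else { Space::CopySpace1 }
    }
    // Mirrors the flip in prepare(): negating `hi` swaps the two roles.
    fn flip(&self) {
        self.hi.store(!self.hi.load(Ordering::SeqCst), Ordering::SeqCst);
    }
}

fn main() {
    let plan = Plan { hi: AtomicBool::new(false) };
    assert_eq!(plan.tospace(), Space::CopySpace0);
    plan.flip(); // what prepare() does at the start of each GC
    assert_eq!(plan.tospace(), Space::CopySpace1);
    assert_eq!(plan.fromspace(), Space::CopySpace0);
    println!("flip ok");
}
```

Because both accessors derive from the single `hi` flag, the two spaces can never both be tospace at once, which is why flipping one bit is sufficient.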
### Prepare worker

As we flip tospace for the plan, we also need to rebind the copy context to
the new tospace. We will override `prepare_worker()` in our `Plan`
implementation. `Plan::prepare_worker()` is executed by each GC worker in the
preparation phase of a GC. The code is straightforward: we get the first
`CopySpaceCopyContext` and call `rebind()` on it with the new tospace.
```rust
fn prepare_worker(&self, worker: &mut GCWorker<VM>) {
    unsafe { worker.get_copy_context_mut().copy[0].assume_init_mut() }.rebind(self.tospace());
}
```
### Prepare mutator

Going back to `mutator.rs`, create a new function called
`mygc_mutator_prepare<VM: VMBinding>(_mutator: &mut Mutator<VM>, _tls: VMWorkerThread)`.
This function will be called at the preparation stage of a collection (at the
start of a collection) for each mutator. Its body can stay empty, as there
aren't any preparation steps for the mutator in this GC. In
`create_mygc_mutator()`, find the field `prepare_func` and change it from
`&unreachable_prepare_func` to `&mygc_mutator_prepare`.
💡 Hint: If your plan does nothing when preparing mutators, there is an
optimization you can do. You may set the plan constraints field
`PlanConstraints::needs_prepare_mutator` to `false` so that the
`PrepareMutator` work packets, which call `prepare_func`, will not be created
in the first place. This optimization is helpful for VMs that run with a large
number of mutator threads. If you do this optimization, you may also leave the
`MutatorConfig::prepare_func` field as `&unreachable_prepare_func` to indicate
it should not be called.
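As a sketch of what that optimization might look like (assuming your plan defines its constraints as a `MYGC_CONSTRAINTS` const built on the base constraints, as other mmtk-core plans do; field names other than `needs_prepare_mutator` are shown for illustration):

```rust
// Sketch only: assumes the usual mmtk-core pattern for plan constraints.
pub const MYGC_CONSTRAINTS: PlanConstraints = PlanConstraints {
    moves_objects: true,
    // Our mutator prepare is a no-op, so skip creating PrepareMutator packets.
    needs_prepare_mutator: false,
    ..PlanConstraints::default()
};
```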
## Release

Finally, we need to fill out the functions that are, roughly speaking, run
after each collection.
### Release in plan

Find the method `release()` in `mygc/global.rs`. Replace the `unreachable!()`
call with the following code.
```rust
fn release(&mut self, tls: VMWorkerThread) {
    self.common.release(tls, true);
    self.fromspace().release();
}
```
This function is called at the end of a collection. It calls the release routines for the common plan spaces and the fromspace.
### Release in mutator

Go back to `mutator.rs`. Create a new function called
`mygc_mutator_release()` that takes the same inputs as the
`mygc_mutator_prepare()` function above.
```rust
pub fn mygc_mutator_release<VM: VMBinding>(
    mutator: &mut Mutator<VM>,
    _tls: VMWorkerThread,
) {
    // Rebind the allocation bump pointer to the appropriate semispace.
    let bump_allocator = unsafe {
        mutator
            .allocators
            .get_allocator_mut(mutator.config.allocator_mapping[AllocationSemantics::Default])
    }
    .downcast_mut::<BumpAllocator<VM>>()
    .unwrap();
    bump_allocator.rebind(
        mutator
            .plan
            .downcast_ref::<MyGC<VM>>()
            .unwrap()
            .tospace(),
    );
}
```
Then go to `create_mygc_mutator()` and replace `&unreachable_release_func` in
the `release_func` field with `&mygc_mutator_release`. This function will be
called at the release stage of a collection (at the end of a collection) for
each mutator. It rebinds the allocator for the `Default` allocation semantics
to the new tospace. When the mutator threads resume, any new allocations for
`Default` will go to the new tospace.
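To see why the rebind matters, here is a std-only model of a bump allocator (the `rebind` name mirrors the real `BumpAllocator::rebind`; the `Region` type and field names are illustrative): rebinding resets the cursor and limit to the new space's region, so every subsequent allocation lands in the new tospace.

```rust
// Illustrative model; not the real mmtk-core BumpAllocator.
struct Region {
    start: usize,
    end: usize,
}

struct BumpAllocator {
    cursor: usize,
    limit: usize,
}

impl BumpAllocator {
    // Mirrors rebind(): point the bump pointer at the new tospace region.
    fn rebind(&mut self, space: &Region) {
        self.cursor = space.start;
        self.limit = space.end;
    }

    // Bump allocation: advance the cursor, fail if past the limit.
    fn alloc(&mut self, size: usize) -> Option<usize> {
        let addr = self.cursor;
        if addr + size > self.limit {
            return None;
        }
        self.cursor = addr + size;
        Some(addr)
    }
}

fn main() {
    let tospace = Region { start: 0x1000, end: 0x2000 };
    let mut alloc = BumpAllocator { cursor: 0, limit: 0 };
    alloc.rebind(&tospace); // what mygc_mutator_release() does per mutator
    assert_eq!(alloc.alloc(16), Some(0x1000));
    assert_eq!(alloc.alloc(16), Some(0x1010));
    println!("allocations go to the new tospace");
}
```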
## ProcessEdgesWork for MyGC

`ProcessEdgesWork` is the key work packet for tracing objects in a GC. A
`ProcessEdgesWork` implementation defines how to trace objects, and how to
generate more work packets based on the current tracing to finish the object
closure. `GCWorkContext` specifies a type that implements `ProcessEdgesWork`,
and we used `SFTProcessEdges` earlier. In this section, we discuss what
`SFTProcessEdges` does, and what the alternatives are.
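The "generate more work to finish the closure" idea can be sketched with a std-only worklist (the object graph, `closure` function, and integer "object references" are all illustrative): tracing an object marks it and enqueues its children as further work, and the trace ends when no pending objects remain.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Illustrative: compute the transitive closure from a set of roots,
// the way ProcessEdgesWork/ScanObjects packets do cooperatively.
fn closure(graph: &HashMap<u32, Vec<u32>>, roots: &[u32]) -> HashSet<u32> {
    let mut marked: HashSet<u32> = HashSet::new();
    let mut worklist: VecDeque<u32> = roots.iter().copied().collect();
    while let Some(obj) = worklist.pop_front() {
        // insert() returns true only the first time we see this object.
        if marked.insert(obj) {
            // "Scanning" obj generates more work: its outgoing edges.
            for &child in graph.get(&obj).into_iter().flatten() {
                worklist.push_back(child);
            }
        }
    }
    marked
}

fn main() {
    let mut graph = HashMap::new();
    graph.insert(1, vec![2, 3]);
    graph.insert(2, vec![3]);
    graph.insert(4, vec![5]); // unreachable from the roots
    let live = closure(&graph, &[1]);
    assert_eq!(live, [1, 2, 3].into_iter().collect());
    println!("live objects: {:?}", live);
}
```

In MMTk the loop is distributed across work packets and workers rather than run in a single function, but the termination condition is the same: the collection's trace is done when every reachable object has been visited exactly once.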
### Approach 1: Use `SFTProcessEdges`

`SFTProcessEdges` dispatches the tracing of objects to their respective spaces
through the Space Function Table (SFT). As long as all the policies in a plan
provide an implementation of `sft_trace_object()` in their SFT
implementations, the plan can use `SFTProcessEdges`. Currently most policies
provide an implementation of `sft_trace_object()`, except mark compact and
immix. Those two policies use multiple GC traces, and due to a limitation of
the SFT design, the SFT does not allow multiple `sft_trace_object()`
implementations for a policy.

`SFTProcessEdges` is the simplest approach when all the policies support it.
Fortunately, we can use it for our semispace GC.
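A std-only sketch of SFT-style dispatch (all names here are illustrative, not the real mmtk-core types): each space contributes one trace entry point to a table indexed by address range, and the work packet looks up the owning space by the object's address instead of matching on the plan's space types.

```rust
// Illustrative model of SFT dispatch: one table entry per address range,
// each with a single trace_object entry point.
trait Sft {
    fn name(&self) -> &'static str;
    fn sft_trace_object(&self, object: usize) -> usize;
}

struct CopySpaceSft;
impl Sft for CopySpaceSft {
    fn name(&self) -> &'static str {
        "CopySpace"
    }
    // A real copy space would copy the object and return its new address.
    fn sft_trace_object(&self, object: usize) -> usize {
        object
    }
}

struct SftMap {
    // (range_start, range_end, table entry for that range)
    entries: Vec<(usize, usize, Box<dyn Sft>)>,
}

impl SftMap {
    // Dispatch by address: no plan-specific matching in the work packet.
    fn get(&self, object: usize) -> &dyn Sft {
        self.entries
            .iter()
            .find(|(start, end, _)| (*start..*end).contains(&object))
            .map(|(_, _, sft)| sft.as_ref())
            .expect("object not in any space")
    }
}

fn main() {
    let map = SftMap {
        entries: vec![(0x1000, 0x2000, Box::new(CopySpaceSft))],
    };
    let sft = map.get(0x1800);
    assert_eq!(sft.name(), "CopySpace");
    assert_eq!(sft.sft_trace_object(0x1800), 0x1800);
}
```

This also shows the limitation mentioned above: because the table holds exactly one `sft_trace_object` entry point per range, a policy that needs several distinct traces cannot express them through this interface.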
### Approach 2: Derive `PlanTraceObject` and use `PlanProcessEdges`

`PlanProcessEdges` is another general `ProcessEdgesWork` implementation that
can be used by most plans. When a plan implements the `PlanTraceObject` trait,
it can use `PlanProcessEdges`.

You can manually provide an implementation of `PlanTraceObject` for `MyGC`,
but you can also use the derive macro MMTk provides, and the macro will
generate an implementation of `PlanTraceObject`:
- Make sure `MyGC` already has the `#[derive(HasSpaces)]` attribute, because
  all plans need to implement the `HasSpaces` trait anyway (import the macro
  properly: `use mmtk_macros::HasSpaces`).
- Add `#[derive(PlanTraceObject)]` for `MyGC` (import the macro properly:
  `use mmtk_macros::PlanTraceObject`).
- Add both `#[space]` and `#[copy_semantics(CopySemantics::DefaultCopy)]` to
  both copy space fields, `copyspace0` and `copyspace1`. `#[space]` tells the
  macro that both `copyspace0` and `copyspace1` are spaces in the `MyGC` plan,
  and the generated trace code will check both spaces.
  `#[copy_semantics(CopySemantics::DefaultCopy)]` specifies the copy semantics
  to use when tracing objects in the corresponding space.
- Add `#[parent]` to `common`. This tells the macro that there are more spaces
  defined in `common` and its nested structs. If an object is not found in any
  space annotated with `#[space]` in this plan, the trace code will try to
  find the space for the object in the 'parent' plan. In our case, the trace
  code will proceed by checking the spaces in the `CommonPlan`, as the object
  may be in the large object space or the immortal space in the common plan.
  `CommonPlan` also implements `PlanTraceObject`, so it knows how to find a
  space for the object and trace it in the same way.
With the derive macro, your `MyGC` struct should look like this:
```rust
#[derive(HasSpaces, PlanTraceObject)]
pub struct MyGC<VM: VMBinding> {
    pub hi: AtomicBool,
    #[space]
    #[copy_semantics(CopySemantics::DefaultCopy)]
    pub copyspace0: CopySpace<VM>,
    #[space]
    #[copy_semantics(CopySemantics::DefaultCopy)]
    pub copyspace1: CopySpace<VM>,
    #[parent]
    pub common: CommonPlan<VM>,
}
```
Once this is done, you can specify `PlanProcessEdges` as the
`DefaultProcessEdges` in your GC work context:
```rust
use crate::policy::gc_work::DEFAULT_TRACE;
use crate::scheduler::gc_work::PlanProcessEdges;

pub struct MyGCWorkContext2<VM: VMBinding>(std::marker::PhantomData<VM>);
impl<VM: VMBinding> crate::scheduler::GCWorkContext for MyGCWorkContext2<VM> {
    type VM = VM;
    type PlanType = MyGC<VM>;
    type DefaultProcessEdges = PlanProcessEdges<Self::VM, MyGC<VM>, DEFAULT_TRACE>;
    type PinningProcessEdges = UnsupportedProcessEdges<Self::VM>;
}
```
### Approach 3: Implement your own `ProcessEdgesWork`

Apart from the two approaches above, you can always implement your own
`ProcessEdgesWork`. This is overkill for simple plans like semispace, but may
be necessary for more complex plans. We discuss how to implement it for
`MyGC`.
Create a struct `MyGCProcessEdges<VM: VMBinding>` in the `gc_work` module. It
includes a reference back to the plan, and a `ProcessEdgesBase` field:
```rust
pub struct MyGCProcessEdges<VM: VMBinding> {
    plan: &'static MyGC<VM>,
    base: ProcessEdgesBase<VM>,
}
```
Implement `ProcessEdgesWork` for `MyGCProcessEdges`. As most methods in the
trait have a default implementation, we only need to implement `new()` and
`trace_object()` for our plan. However, this may not be true when you
implement the trait for other GC plans, so it is worth checking the default
implementations in `ProcessEdgesWork`.
For `trace_object()`, what we do is similar to the approach above, except that
we write the code ourselves rather than letting the macro generate it for us.
We try to figure out which space the object is in, and invoke
`trace_object()` on that space for the object. If the object is not in either
of the semispaces in the plan, we forward the call to `CommonPlan`.
```rust
impl<VM: VMBinding> ProcessEdgesWork for MyGCProcessEdges<VM> {
    type VM = VM;
    type ScanObjectsWorkType = ScanObjects<Self>;

    fn new(
        slots: Vec<SlotOf<Self>>,
        roots: bool,
        mmtk: &'static MMTK<VM>,
        bucket: WorkBucketStage,
    ) -> Self {
        let base = ProcessEdgesBase::new(slots, roots, mmtk, bucket);
        let plan = base.plan().downcast_ref::<MyGC<VM>>().unwrap();
        Self { base, plan }
    }

    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference {
        let worker = self.worker();
        let queue = &mut self.base.nodes;
        if self.plan.tospace().in_space(object) {
            self.plan.tospace().trace_object(
                queue,
                object,
                Some(CopySemantics::DefaultCopy),
                worker,
            )
        } else if self.plan.fromspace().in_space(object) {
            self.plan.fromspace().trace_object(
                queue,
                object,
                Some(CopySemantics::DefaultCopy),
                worker,
            )
        } else {
            self.plan.common.trace_object(queue, object, worker)
        }
    }

    fn create_scan_work(&self, nodes: Vec<ObjectReference>) -> ScanObjects<Self> {
        ScanObjects::<Self>::new(nodes, false, self.bucket)
    }
}
```
We also need to implement `Deref` and `DerefMut` so that our
`ProcessEdgesWork` implementation can be dereferenced as `ProcessEdgesBase`.
```rust
impl<VM: VMBinding> Deref for MyGCProcessEdges<VM> {
    type Target = ProcessEdgesBase<VM>;
    fn deref(&self) -> &Self::Target {
        &self.base
    }
}

impl<VM: VMBinding> DerefMut for MyGCProcessEdges<VM> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.base
    }
}
```
In the end, use `MyGCProcessEdges` as `DefaultProcessEdges` in the
`GCWorkContext`:
```rust
pub struct MyGCWorkContext3<VM: VMBinding>(std::marker::PhantomData<VM>);
impl<VM: VMBinding> crate::scheduler::GCWorkContext for MyGCWorkContext3<VM> {
    type VM = VM;
    type PlanType = MyGC<VM>;
    type DefaultProcessEdges = MyGCProcessEdges<Self::VM>;
    type PinningProcessEdges = UnsupportedProcessEdges<Self::VM>;
}
```
## Summary

You should now have `MyGC` working and able to collect garbage. All three
benchmarks should pass now. If the benchmarks pass, good job! You have built a
functional copying collector!

If you get particularly stuck, the code for the completed `MyGC` plan is
available here.