
Summer of Code 2007: Custom Memory Heaps and Object Allocators

Student: Tim Kelsey

Mentor: Steve Streeting (sinbad)

Location: CVS 'soc07-memory' branch

Status: Underway

Project Summary



The aim of this effort is to produce a thread-safe and flexible memory system for OGRE. The project is composed of two main parts: first, a flexible framework using policy programming techniques to allow control, manipulation and user extension of the system; second, the actual memory allocators. The overall effort results in a user-controlled memory model for OGRE 3D.

Advantages

The most immediate and obvious advantage of implementing a custom allocator and heap framework is debugging and profiling. By utilising a Debug allocator, an application developer is able to access and log detailed information about memory usage, including size, frequency of allocations, most active periods of memory usage and possible corruption or leaking of memory.

Building this functionality directly into the engine has a number of advantages over using external memory checking tools. Firstly, the user is alerted to a problem the moment it arises, when recent code, the likely cause of the issue, is fresh in their mind. Secondly, a custom debug heap can modify the internal specifics of the memory allocation model to integrate with existing debug information paths, resulting in unified debugging information.

As well as being an invaluable debugging and profiling tool, custom memory heaps allow memory footprint boundaries to be tuned. An end user would thus be able to restrict an application's memory footprint to a maximum threshold, which is useful where the installed memory of the development system is larger than the intended system specification.

Functional Overview

Policy Programming

The STL uses policy based techniques throughout its design, and any allocator has to honor the expected interface (ISO C++ section 20.4.1) to be STL compatible. This interface is designed to work in a policy scheme, so I decided not to fight it. A policy based approach also makes for a very flexible system that should be easy to both use and extend as people see fit. Static linkage and empty base class optimisation also make for an efficient system. The downside is some fairly heavy template usage (see the thread safety example later); typedefs will and should be used extensively to alleviate this.
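
For reference, a minimal sketch of the kind of interface ISO C++ 20.4.1 expects from an STL-compatible allocator is shown below. The class name MyPolicyAllocator and the bodies of allocate()/deallocate() are purely illustrative; the real framework routes these calls through its policies.

Code:

#include <cstddef>
#include <limits>
#include <new>

// Illustrative only: these are the members an STL container expects of an
// allocator; a real policy-based allocator would forward allocate() and
// deallocate() to its allocation policy rather than ::operator new/delete.
template <typename T>
class MyPolicyAllocator
{
public:
   typedef T               value_type;
   typedef T*              pointer;
   typedef const T*        const_pointer;
   typedef T&              reference;
   typedef const T&        const_reference;
   typedef std::size_t     size_type;
   typedef std::ptrdiff_t  difference_type;

   // rebind<> lets a container obtain the equivalent allocator for another type
   template <typename U>
   struct rebind { typedef MyPolicyAllocator<U> other; };

   pointer allocate(size_type n, const void* = 0)
   { return static_cast<pointer>(::operator new(n * sizeof(T))); }

   void deallocate(pointer p, size_type)
   { ::operator delete(p); }

   void construct(pointer p, const T& val) { new (p) T(val); }
   void destroy(pointer p)                 { p->~T(); }

   size_type max_size() const
   { return std::numeric_limits<size_type>::max() / sizeof(T); }
};

// allocators must also be comparable
template <typename T, typename U>
bool operator==(const MyPolicyAllocator<T>&, const MyPolicyAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const MyPolicyAllocator<T>&, const MyPolicyAllocator<U>&) { return false; }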

Per-Object Memory Control

To use a specific allocator for a family of objects, the user defines (with a typedef) and inherits from a base class that provides overloaded operator new and operator delete. These are used in place of the global allocation scheme when creating new members of the user's object family. The operators wrap back to an Allocator object (the policy host) that is constructed to match. This is implemented in the framework by the AllocWrapper<T> class template.

Code:

// MyAllocatorType names the allocator composition to use for this object family
typedef AllocWrapper<MyAllocatorType> MyAllocWrapper;

// inheriting the wrapper gives the whole family its operator new/delete overloads
class MyBaseObject : public MyAllocWrapper
{ ... };

Thus, each time a new object belonging to the family is created, the overloaded operator is invoked and the correct allocator is called.
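
The mechanism behind this can be sketched as follows. This is not the branch's actual AllocWrapper<T>; the allocateBytes()/deallocateBytes() method names on the allocator are assumptions used only for illustration.

Code:

#include <cstddef>

// Sketch: a wrapper base class that routes operator new/delete for the
// whole object family to a shared allocator instance.
template <typename AllocatorType>
class AllocWrapperSketch
{
public:
   typedef AllocatorType allocator;   // propagated to user objects (see below)

   static void* operator new(std::size_t sz)
   { return getAllocator().allocateBytes(sz); }   // hypothetical method name

   static void operator delete(void* ptr)
   { getAllocator().deallocateBytes(ptr); }       // hypothetical method name

   // accessor for members that need to allocate internal buffers directly
   static AllocatorType& getAllocator()
   {
      static AllocatorType sAlloc;
      return sAlloc;
   }
};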

If the user needs to allocate memory for objects or buffers internal to a member of the object family, then AllocWrapper provides an accessor that returns an appropriate allocator to use directly. AllocWrapper also provides a typedef, called "allocator", that hides its specific policy composition. This "allocator" type is propagated to user objects and can be employed in STL containers used internally or externally by the object family.

Code:

#include <vector>

class MyBaseObject : public MyAllocWrapper
{
   // the wrapper's policy composition, exposed via the "allocator" typedef
   typedef MyAllocWrapper::allocator allocType;

public:
   // rebind the allocator so the vector's ints are allocated with the same policies
   std::vector<int, allocType::rebind<int>::allocator> v;
};

Note: the rebind<> member is an STL trick used to convert an allocator from one type to another, and is part of the ISO standard allocator interface.

Global Memory Control

Each of the specific memory allocators obtains its memory from a single generic memory manager. This manager also satisfies any calls to ::new/new[] and ::delete/delete[] for objects that have no specific allocator attached. The general manager grabs and maintains regions of memory using mmap()/VirtualAlloc() to cater for small to medium requests; requests over a given threshold are handled with a direct mmap()/VirtualAlloc() call.
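
The split between region-backed and directly mapped allocations can be sketched like this; the threshold value and the two helper functions are illustrative stand-ins, not the branch's actual API.

Code:

#include <cstddef>

void* osDirectMap(std::size_t size);        // wraps mmap()/VirtualAlloc() (stand-in)
void* allocateFromRegion(std::size_t size); // carves from a managed region (stand-in)

static const std::size_t DIRECT_MAP_THRESHOLD = 256 * 1024; // assumed value

void* genericAllocate(std::size_t size)
{
   // large requests bypass the regions and are mapped directly
   if (size > DIRECT_MAP_THRESHOLD)
      return osDirectMap(size);

   // small to medium requests are served from the binned regions
   return allocateFromRegion(size);
}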

Within each region a number of fixed-size, power-of-two bins are maintained, ranging from 8 bytes up to the full size of the region's available space. These bins form quick free-lists of memory fragments, allowing the manager to find and reuse an old chunk of the correct size to satisfy a new memory request. When a chunk is allocated, any excess memory over the desired allocation size is shaved off and added to an appropriately sized bin; conversely, when an allocation is freed it is coalesced with its neighbors to form the largest possible run of memory, which is then added to a bin.
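
Mapping a request to a bin can be sketched as below; binIndexForSize() is an illustrative helper, not necessarily how the branch computes it.

Code:

#include <cstddef>

static const std::size_t MIN_BIN_SIZE = 8;   // smallest bin, as described above

std::size_t binIndexForSize(std::size_t size)
{
   std::size_t index = 0;
   std::size_t binSize = MIN_BIN_SIZE;
   while (binSize < size)   // find the first power-of-two bin large enough
   {
      binSize <<= 1;        // 8, 16, 32, 64, ...
      ++index;
   }
   return index;
}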

If a region becomes depleted or too fragmented to cater for a specific request then a new region is created and managed.

Memory Corruption

The generic allocator adds a head and a tail block to each allocation. These blocks contain the necessary bookkeeping information and an additional "magic" value. When a block is released, or its size queried via sizeOfStorage(), this magic value is tested to confirm that no overflows or general corruption have occurred.
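
A sketch of what such guarded blocks might look like is given below; the field names, layout and the MEM_MAGIC constant are assumptions made for illustration, not the branch's exact bookkeeping format.

Code:

#include <cstddef>
#include <cassert>

static const unsigned int MEM_MAGIC = 0xDEADBEEF;   // assumed sentinel value

struct MemHead
{
   std::size_t  size;    // size of the user allocation that follows
   unsigned int magic;   // checked on release and in sizeOfStorage()
};

struct MemTail
{
   unsigned int magic;   // catches overruns past the end of the block
};

// layout assumed: [MemHead][user data of head->size bytes][MemTail]
void checkBlock(const MemHead* head)
{
   const MemTail* tail = reinterpret_cast<const MemTail*>(
      reinterpret_cast<const char*>(head + 1) + head->size);

   assert(head->magic == MEM_MAGIC && tail->magic == MEM_MAGIC
          && "memory corruption detected");
}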

Thread Safety

Basic thread safety is ensured in the generic manager using a mutex. A more effective system using Thread Local Storage is planned soon.

Memory Aligning

To write

Profiling

Memory profiling is delegated to a second policy (really I can only see the need for two profile policies here, but who knows, maybe a user will conceive a new one). The default profile policy maintains an internal struct holding information about allocations: how many allocations, how many de-allocations, how many bytes allocated and how many bytes released. On construction each profile policy registers with the profile manager. Each tick a call to ProfileManager::update() needs to be performed; this collects the info structs from each registered profile and resets that profile to zero. The ProfileManager works with this info to produce memory profile reports.
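
A sketch of the default policy's bookkeeping might look like the following; the struct fields match the statistics listed above, while the class and method names are illustrative only.

Code:

#include <cstddef>

// per-policy statistics, collected by the ProfileManager each tick
struct ProfileInfo
{
   std::size_t numAllocations;
   std::size_t numDeallocations;
   std::size_t bytesAllocated;
   std::size_t bytesReleased;
};

class DefaultProfilePolicySketch
{
public:
   DefaultProfilePolicySketch()
   {
      reset();
      // the real policy registers itself with the ProfileManager here
   }

   void noteAllocation(std::size_t bytes)
   {
      ++mInfo.numAllocations;
      mInfo.bytesAllocated += bytes;
   }

   void noteDeallocation(std::size_t bytes)
   {
      ++mInfo.numDeallocations;
      mInfo.bytesReleased += bytes;
   }

   // called from ProfileManager::update(): hand over the stats and reset to zero
   ProfileInfo collectAndReset()
   {
      ProfileInfo out = mInfo;
      reset();
      return out;
   }

private:
   void reset() { mInfo = ProfileInfo(); }   // value-initialise all counters to 0

   ProfileInfo mInfo;
};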

Because the usage of an engine like OGRE breaks down easily into distinct sections (initialising, loading, running, unloading, loading again, etc.), a memory profile report can be generated for each section as well as a final global report. Making a call to ProfileManager::flush("Section Name") causes intermediate information to be logged, detailing memory activity since the last call to flush(). By caching all the deallocation info within a single frame and only updating the stats at the end of that frame we can keep the debug heap reasonably fast, as well as allowing a per-frame breakdown in memory usage reports.

Once the debug phase has passed, a DummyProfilePolicy can be substituted for the default. The DummyProfilePolicy does not register on construction, maintains no member data and has only empty inline methods; in short, it does nothing and will be optimised away by the compiler. The only dynamically bound function in the system exists in the default profile policy and is thus an issue only in a debug build.
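
A do-nothing policy along these lines might look like this (the method names mirror the sketch above and are illustrative):

Code:

#include <cstddef>

// no registration, no member data, only empty inline methods:
// the compiler optimises every call site away in a release build
class DummyProfilePolicySketch
{
public:
   inline void noteAllocation(std::size_t)   {}
   inline void noteDeallocation(std::size_t) {}
};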

Thread Safety

The Allocator delegates to a policy for its specific memory management. The available policies fall into two groups, the basic memory management scheme and a thread safe access wrapper. For example:

Code:

typedef Allocator
<
   Something,
   MutexPolicy<BasicAllocPolicy>,
   ObjectTraits<Something>,
   DummyProfilePolicy
>
ThreadSafeSomethingAlloc;

This would create a mutex protected standard allocator type with no profiling (a Release Heap).
It is possible to make a non-thread-safe heap by using BasicAllocPolicy directly in the above example. The reasoning behind not adding thread protection directly into every policy stems from a desire to implement a thread local storage policy that would not need to maintain a mutex or similar mechanism. I plan on using Boost threads for all of this, for maximum portability. Thread safety in the profiling code is enforced by a mutex maintained within the profiling policy.
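
A thread-safe wrapper in the spirit of MutexPolicy<> can be sketched as follows, using Boost threads; the allocateBytes()/deallocateBytes() method names on the wrapped policy are assumptions for illustration.

Code:

#include <cstddef>
#include <boost/thread/mutex.hpp>

// wraps any allocation policy and serialises access to it with a mutex
template <typename AllocPolicy>
class MutexPolicySketch
{
public:
   void* allocateBytes(std::size_t sz)
   {
      boost::mutex::scoped_lock lock(mMutex);
      return mPolicy.allocateBytes(sz);
   }

   void deallocateBytes(void* ptr)
   {
      boost::mutex::scoped_lock lock(mMutex);
      mPolicy.deallocateBytes(ptr);
   }

private:
   AllocPolicy  mPolicy;
   boost::mutex mMutex;
};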

Specific Allocators

Small Object Allocator

A small allocation policy exists based on ideas presented in the book "Modern C++ Design" by Andrei Alexandrescu ISBN 0-201-70431-5. This policy works by grabbing a chunk of memory from the generic allocator and dishing it out in smaller sizes.

The small allocation policy can handle any sized allocation under a given threshold value. The policy class is really a capsule of several other mini allocators, each one catering for one specific size. Each time an allocation comes in it is delegated to the correct sized mini allocator. If a new size is seen for the first time then a new mini allocator is created for it and added. Allocations that are over the threshold are delegated to the global ::new/::delete operators.
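
The delegation can be sketched as below. FixedAllocator here is only a minimal stand-in for the real fixed-size mini allocators (which pool many objects per chunk, as in Alexandrescu's design), and the threshold value is an assumption.

Code:

#include <cstddef>
#include <map>
#include <utility>

// minimal stand-in: a real mini allocator carves many blocks of mBlockSize
// out of larger chunks obtained from the generic allocator
class FixedAllocator
{
public:
   explicit FixedAllocator(std::size_t blockSize = 0) : mBlockSize(blockSize) {}
   void* allocate()          { return ::operator new(mBlockSize); }
   void  deallocate(void* p) { ::operator delete(p); }
private:
   std::size_t mBlockSize;
};

class SmallObjectPolicySketch
{
public:
   explicit SmallObjectPolicySketch(std::size_t threshold = 256)   // assumed threshold
      : mThreshold(threshold) {}

   void* allocateBytes(std::size_t sz)
   {
      if (sz > mThreshold)
         return ::operator new(sz);             // oversize: fall back to the global operators
      return getFixedAllocator(sz).allocate();  // delegate to the matching mini allocator
   }

   void deallocateBytes(void* ptr, std::size_t sz)
   {
      if (sz > mThreshold)
         ::operator delete(ptr);
      else
         getFixedAllocator(sz).deallocate(ptr);
   }

private:
   // find the mini allocator for this size, creating it on first sight
   FixedAllocator& getFixedAllocator(std::size_t sz)
   {
      std::map<std::size_t, FixedAllocator>::iterator it = mPools.find(sz);
      if (it == mPools.end())
         it = mPools.insert(std::make_pair(sz, FixedAllocator(sz))).first;
      return it->second;
   }

   std::size_t mThreshold;
   std::map<std::size_t, FixedAllocator> mPools;
};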

Thread Local Allocator

On its way

The Future

This effort started as a GSoC 2007 project, but it certainly won't be ending there!