[Previous]  [Up]  [Next]

Implementation notes

Curbing template bloat



Code bloat is often associated with the use of parameterized container types: each instantiation of a container class for a different collection of parameter types causes additional code generation. This can be particularly troublesome on platforms which don't provide support for virtual memory or dynamic linking and loading. It can also have a performance impact on platforms which do: virtual memory thrashing is possible, and even where paging isn't an issue larger image sizes mean that the ratio of processor cache size to image size is smaller than it might be, entailing lower average cache hit rates.

Template partial specialization would help by allowing containers of pointers to objects of arbitrary types to share all their implementation. However, this technique has the disadvantage of only being a partial solution to the problem, with the unfortunate consequence that library clients who are aware of it will be tempted to use containers of pointers inappropriately. That strategy has two consequences: first, unnecessary thrashing of the heap (poor runtime performance, heap fragmentation), and second maintenence problems (more pointers entails more opportunities for resource leaks). Template partial specialization is in any case not supported by CFront.

CathLibCPP addresses the bloat problem by an aggressive application of a technique known as "template hoisting": factoring out of common implementation into non-template base classes. The actual implementation device used here is novel, and is unusually flexible and extensible. As expected, the container template classes are lightweight, typesafe veneers over non-template implementation classes. These non-template classes are implemented in terms of opaque (i.e. void*) pointers and blocks of untyped memory, and perform contained-type-specific operations with the assistance of polymorphic helper objects (the novel device) which are constructed and passed to them by the fully typed veneer templates. See list.h/c++ and listbase.h/c++ for an illustration.

The potential for dramatic reduction in code size is illustrated by the files contained in directory testbloat. The file testbloat.c++.testbloat instantiates the standard map template class a compile-time configurable number of times to illustrate the code size increment that each additional template instantiation entails. The increment for CathLibCPP is compared with that for gcc 2.7.2 and the Hewlett Packard implementation of map. The near eightfold advantage of CathLibCPP over Hewlett Packard's clearly dwarfs any possible differences in quality of code generation between Norcroft C and gcc. See the file testbloat.c++.testbloat for more details.

The technique has a runtime cost, but it is estimated to be low: of the order of less than one virtual function call per container operation, averaged over all container operations. This cost may well be small enough to be offset by improved performance due to better cache hit rates, though without extensive profiling (unsupported by CFront) it's difficult to know for sure.

Unfortunately, CFront's poor template instantiation abilities makes this technique slightly awkward to use (automatic template instantiation would make it completely transparent). More details can be found in the section on template instantiation.


 [Previous]  [Up]  [Next]