ArrayAllocators.jl uses the standard array interface to allow faster zeros with calloc, allocation on specific NUMA nodes on multi-processor systems, as well as aligned memory. The allocators are given as an argument to Array{T} in place of undef. Overall, this allows Julia to match the allocation performance of popular numerical libraries such as NumPy, which uses some of these techniques.
In this talk, we will also explore some of the unexpected properties of these allocation methods.
Julia offers an extensible array interface that allows array types to wrap around C pointers obtained from specialized or operating system specific application programming interfaces while integrating into the garbage collection system. ArrayAllocators.jl uses this array interface to allow faster zeros with calloc, allocation on specific NUMA nodes on multi-processor systems, and the allocation of aligned memory for vectorization. The allocators are given as an argument to Array{T} or other subtypes of AbstractArray in place of the undef initializer to provide a familiar interface to the user. In this talk, I will describe how to use ArrayAllocators.jl to optimize applications via calloc, NUMA, and aligned memory.
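To make the pointer-wrapping mechanism concrete, here is a minimal sketch that uses only Base and Libc rather than the package itself. The helper name calloc_array is hypothetical and is not part of ArrayAllocators.jl; it only illustrates how zero-initialized C memory can become an ordinary Julia Array that the garbage collector later frees.

```julia
# Hypothetical helper (not the ArrayAllocators.jl API): wrap calloc memory in an Array.
function calloc_array(::Type{T}, dims::Integer...) where {T}
    isbitstype(T) || throw(ArgumentError("element type must be a bits type"))
    n = prod(dims)  # unchecked product; may overflow for extreme sizes
    ptr = convert(Ptr{T}, Libc.calloc(n, sizeof(T)))
    ptr == C_NULL && throw(OutOfMemoryError())
    # own = true asks Julia to call free on the pointer once the array is unreachable
    return unsafe_wrap(Array, ptr, map(Int, dims); own = true)
end

A = calloc_array(Float64, 1024, 1024)  # behaves like a regular Matrix{Float64}
```

The result can be passed to any code that expects a Matrix. The package interface described above replaces this manual step by passing an allocator to Array{T} where undef would normally go.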
The easy availability of these allocation methods allows Julia to match the performance and caveats of other libraries or code that uses these methods. For example, NumPy's implementation of numpy.zeros uses calloc by default, which may make it appear that NumPy is outperforming Julia in certain microbenchmarks. On some operating systems, the initial allocation is significantly faster than explicitly filling the array with zeros, as is currently done in Base, since the operating system may defer the actual allocation of the memory until a later time. Often the initial allocation time is similar to the allocation time of undef arrays.
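A rough way to observe this effect is sketched below using only Base and Libc. The function name compare_allocations is hypothetical, and the timings depend heavily on the operating system and allocator, so no particular numbers are implied. The last measurement writes to the calloc memory, which is where deferred page allocation would be paid for.

```julia
# Hypothetical microbenchmark; results vary by OS, allocator, and array size.
function compare_allocations(n::Integer)
    t_zeros  = @elapsed a = zeros(UInt8, n)          # allocate, then fill with zeros
    t_undef  = @elapsed b = Array{UInt8}(undef, n)   # allocate only
    t_calloc = @elapsed ptr = Libc.calloc(n, 1)      # zeroing may be deferred by the OS
    ptr == C_NULL && throw(OutOfMemoryError())
    # First write to the calloc memory; deferred allocation cost shows up here.
    t_touch = @elapsed ccall(:memset, Ptr{Cvoid}, (Ptr{Cvoid}, Cint, Csize_t), ptr, 1, n)
    Libc.free(ptr)
    # Use a and b so the allocations above cannot be discarded as unused.
    used = Int(a[1]) + length(b)
    return (; t_zeros, t_undef, t_calloc, t_touch, used)
end

compare_allocations(1024)            # warm-up call to exclude compilation time
result = compare_allocations(2^24)   # 16 MiB; larger sizes make differences clearer
```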
Another application is to make Julia NUMA-aware by allocating memory on specific NUMA nodes. I will demonstrate how to optimize the performance of common memory operations on modern multi-processor systems with multiple NUMA nodes, where the best-performing choice may be counter-intuitive.
A final application is to align memory to power-of-two byte boundaries. This is useful to assist advanced vectorization applications where 64-byte aligned memory may accelerate the use of AVX-512 instructions.
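As an illustration of what aligned allocation involves, the sketch below calls the POSIX function posix_memalign directly and wraps the result. Both helper names, is_aligned and posix_aligned_vector, are hypothetical and not part of ArrayAllocators.jl, and the approach assumes a POSIX system; Windows needs a different allocation and release routine.

```julia
# Check whether an array's data pointer sits on a given byte boundary.
is_aligned(A::Array, alignment::Integer) = UInt(pointer(A)) % UInt(alignment) == 0

# Hypothetical helper: allocate an uninitialized, aligned Vector via posix_memalign.
function posix_aligned_vector(::Type{T}, n::Integer, alignment::Integer = 64) where {T}
    Sys.isunix() || error("posix_memalign is only available on POSIX systems")
    isbitstype(T) || throw(ArgumentError("element type must be a bits type"))
    (ispow2(alignment) && alignment % sizeof(Ptr{Cvoid}) == 0) ||
        throw(ArgumentError("alignment must be a power of two and a multiple of the pointer size"))
    ref = Ref{Ptr{Cvoid}}(C_NULL)
    status = ccall(:posix_memalign, Cint, (Ptr{Ptr{Cvoid}}, Csize_t, Csize_t),
                   ref, alignment, n * sizeof(T))
    status == 0 || throw(OutOfMemoryError())
    # Memory from posix_memalign may be released with free, so own = true is valid here.
    return unsafe_wrap(Array, convert(Ptr{T}, ref[]), Int(n); own = true)
end

if Sys.isunix()
    v = posix_aligned_vector(Float64, 1000, 64)
    fill!(v, 0.0)              # contents are uninitialized until written
    aligned = is_aligned(v, 64)
end
```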
Finally, I will discuss the integer overflow features of ArrayAllocators.jl and how other packages may extend ArrayAllocators.jl to easily add new ways of allocating memory for arrays.
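Why overflow matters when computing an allocation size can be shown with Base alone. The sketch below is an assumption about the kind of problem involved, not a description of how ArrayAllocators.jl handles it; nbytes_checked is a hypothetical name.

```julia
using Base.Checked: checked_mul

# Hypothetical helper: byte count for an array, throwing OverflowError instead of wrapping.
nbytes_checked(::Type{T}, dims::Integer...) where {T} =
    foldl(checked_mul, map(Int, dims); init = sizeof(T))

small = nbytes_checked(Float64, 1024, 1024)   # 8 * 1024 * 1024 bytes

# Plain multiplication silently wraps around for huge dimensions.
wrapped = typemax(Int) * 2

# The checked version raises an error instead of returning a bogus size.
overflowed = try
    nbytes_checked(UInt8, typemax(Int), 2)
    false
catch err
    err isa OverflowError
end
```

A wrapped byte count passed to a C allocator could yield a buffer far smaller than the array dimensions claim, which is why a checked computation is the safer default.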
In summary, ArrayAllocators.jl and its subpackages provide a familiar mechanism for allocating array memory via low-level methods. This allows Julia programs to take advantage of advanced operating system features that may accelerate the initialization and use of the memory.