Memory allocation (GNU libgomp)

11.3 Memory allocation

The description below applies to:

Explicit use of the OpenMP API routines, see Memory Management Routines.
The allocate clause, except when the allocator modifier is a constant expression with value omp_default_mem_alloc and no align modifier has been specified. (In that case, the normal malloc allocation is used.)
The allocate directive for variables in static memory; while the alignment is honored, the normal static memory is used.
Using the allocate directive for automatic/stack variables, except when the allocator clause is a constant expression with value omp_default_mem_alloc and no align clause has been specified. (In that case, the normal allocation is used: stack allocation and, sometimes for Fortran, also malloc [depending on flags such as -fstack-arrays].)
In Fortran, the allocators directive and the executable allocate directive for Fortran pointers and allocatables are supported, but files containing those directives must be compiled with -fopenmp-allocators. Additionally, all files that might explicitly or implicitly deallocate memory allocated that way must also be compiled with that option.
The alignment used is the maximum of the value of the align clause and the alignment of the type after honoring, if present, the aligned (GNU::aligned) attribute, C’s _Alignas, and C++’s alignas. However, the align clause of the allocate directive has no effect on the value of C’s _Alignof and C++’s alignof.
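
The alignment rule in the last item can be sketched in plain C++. This is only an illustration: effective_alignment and Vec8f are hypothetical names introduced here, not part of libgomp or of the OpenMP API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a libgomp function): models the rule
// "maximum of the align clause value and the alignment of the type".
constexpr std::size_t effective_alignment(std::size_t align_clause,
                                          std::size_t type_alignment)
{
  // An align value is expected to be a power of two.
  assert(align_clause != 0 && (align_clause & (align_clause - 1)) == 0);
  return std::max(align_clause, type_alignment);
}

// Hypothetical type whose alignment is raised to 32 by alignas.
struct alignas(32) Vec8f { float v[8]; };

// align(64) exceeds the type's alignment, so 64 would be used.
static_assert(effective_alignment(64, alignof(Vec8f)) == 64, "");
// align(16) is below the type's alignment, so the type's 32 would be used.
static_assert(effective_alignment(16, alignof(Vec8f)) == 32, "");
// alignof reports the type's alignment only; no align clause changes it.
static_assert(alignof(Vec8f) == 32, "");
```

The static assertions mirror the two cases: a larger align value wins, while a smaller one is overridden by the alignment of the type.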

GCC supports the following predefined allocators and predefined memory spaces:

Predefined allocators	Associated predefined memory spaces
omp_default_mem_alloc	omp_default_mem_space
omp_large_cap_mem_alloc	omp_large_cap_mem_space
omp_const_mem_alloc	omp_const_mem_space
omp_high_bw_mem_alloc	omp_high_bw_mem_space
omp_low_lat_mem_alloc	omp_low_lat_mem_space
omp_cgroup_mem_alloc	omp_low_lat_mem_space (implementation defined)
omp_pteam_mem_alloc	omp_low_lat_mem_space (implementation defined)
omp_thread_mem_alloc	omp_low_lat_mem_space (implementation defined)
ompx_gnu_pinned_mem_alloc	omp_default_mem_space (GNU extension)

Each predefined allocator, including omp_null_allocator, has a corresponding allocator class template that meets the C++ allocator completeness requirements. These are located in the omp::allocator namespace, and in the ompx::allocator namespace for GNU extensions. This allows the allocator-aware C++ standard library containers to use OpenMP allocation routines; for instance:

std::vector<int, omp::allocator::cgroup_mem<int>> vec;

The following allocator templates are supported:

Predefined allocators	Associated allocator template
omp_null_allocator	omp::allocator::null_allocator
omp_default_mem_alloc	omp::allocator::default_mem
omp_large_cap_mem_alloc	omp::allocator::large_cap_mem
omp_const_mem_alloc	omp::allocator::const_mem
omp_high_bw_mem_alloc	omp::allocator::high_bw_mem
omp_low_lat_mem_alloc	omp::allocator::low_lat_mem
omp_cgroup_mem_alloc	omp::allocator::cgroup_mem
omp_pteam_mem_alloc	omp::allocator::pteam_mem
omp_thread_mem_alloc	omp::allocator::thread_mem
ompx_gnu_pinned_mem_alloc	ompx::allocator::gnu_pinned_mem

The following traits are available when constructing a new allocator; if a trait is not specified or is specified with the value default, the default value listed in the table is used for that trait. The predefined allocators use the default value of each trait, except that the omp_cgroup_mem_alloc, omp_pteam_mem_alloc, and omp_thread_mem_alloc allocators have the access trait set to cgroup, pteam, and thread, respectively. For each trait, a named constant prefixed by omp_atk_ exists; for each non-numeric value, a named constant prefixed by omp_atv_ exists.

Trait	Allowed values	Default value
`sync_hint`	`contended`, `uncontended`, `serialized`, `private`	`contended`
`alignment`	Positive integer being a power of two	1 byte
`access`	`all`, `cgroup`, `pteam`, `thread`	`all`
`pool_size`	Positive integer (bytes)	See below
`fallback`	`default_mem_fb`, `null_fb`, `abort_fb`, `allocator_fb`	See below
`fb_data`	allocator handle	(none)
`pinned`	`true`, `false`	See below
`partition`	`environment`, `nearest`, `blocked`, `interleaved`	`environment`

For the fallback trait, the default value is null_fb for the omp_default_mem_alloc allocator and any allocator that is associated with device memory; for all other allocators, it is default_mem_fb.

For the pinned trait, the default value is true for the predefined allocator ompx_gnu_pinned_mem_alloc (a GNU extension), and false for all others.

The following description applies to the initial device (the host) and largely also to non-host devices; for the latter, also see Offload-Target Specifics.

For the memory spaces, the following applies:

omp_default_mem_space is supported
omp_const_mem_space maps to omp_default_mem_space
omp_low_lat_mem_space is only available on supported devices, and maps to omp_default_mem_space otherwise.
omp_large_cap_mem_space maps to omp_default_mem_space, unless the memkind library is available
omp_high_bw_mem_space maps to omp_default_mem_space, unless the memkind library is available

On Linux systems where the memkind library (libmemkind.so.0) is available at runtime and the respective memkind kind is supported, it is used when creating memory allocators requesting:

the partition trait interleaved except when the memory space is omp_large_cap_mem_space (uses MEMKIND_HBW_INTERLEAVE)
the memory space is omp_high_bw_mem_space (uses MEMKIND_HBW_PREFERRED)
the memory space is omp_large_cap_mem_space (uses MEMKIND_DAX_KMEM_ALL or, if not available, MEMKIND_DAX_KMEM)

On Linux systems where the numa library (libnuma.so.1) is available at runtime, it is used when creating memory allocators requesting:

the partition trait nearest, except when both the memkind library is available and the memory space is either omp_large_cap_mem_space or omp_high_bw_mem_space

Note that the numa library will round up the allocation size to a multiple of the system page size; therefore, consider using it only with large data or by sharing allocations via the pool_size trait. Furthermore, the Linux kernel does not guarantee that an allocation will always be on the nearest NUMA node nor that after reallocation the same node will be used. Note additionally that, on Linux, the default setting of the memory placement policy is to use the current node; therefore, unless the memory placement policy has been overridden, the partition trait environment (the default) will be effectively a nearest allocation.
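
The cost of the page rounding mentioned above can be estimated with a short sketch. The helpers round_up_to_page and page_rounding_overhead are hypothetical names, not libnuma or libgomp functions, and the 4096-byte page size is an assumption; the actual value depends on the system.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: rounds a request up to the next multiple of the
// page size, which is the size actually reserved for the allocation.
std::size_t round_up_to_page(std::size_t size, std::size_t page_size)
{
  assert(page_size > 0);
  return (size + page_size - 1) / page_size * page_size;
}

// Hypothetical helper: bytes reserved beyond what was requested.
// The default of 4096 is an assumed page size.
std::size_t page_rounding_overhead(std::size_t size,
                                   std::size_t page_size = 4096)
{
  return round_up_to_page(size, page_size) - size;
}
```

Under that assumption, a 64-byte request occupies a full 4096-byte page and leaves 4032 bytes unused, whereas a request of exactly 4096 bytes has no overhead. This is why the note above suggests using such allocators for large data.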

Additional notes regarding the traits:

The pinned trait is supported on Linux hosts, but is subject to the OS ulimit/rlimit locked-memory settings. It currently uses mmap and is therefore best suited to a small number of allocations, which may be large. If the conditions for numa or memkind allocations are fulfilled, those allocators are used instead.
The default for the pool_size trait is no pool; for every (re)allocation, the associated library routine is called, which might internally use a memory pool. Currently, the same applies when a pool_size has been specified, except that once allocations exceed the pool size, the action of the fallback trait applies.
For the partition trait, the partition part size will be the same as the requested size (i.e. interleaved or blocked has no effect), except for interleaved when the memkind library is available. Furthermore, for nearest, unless the numa library is available, the memory might not be on the same NUMA node as the thread that allocated the memory; on Linux, this is in particular the case when the memory placement policy is set to preferred.
The access trait has no effect, so memory is always accessible by all threads (except on supported non-host devices).
The sync_hint trait has no effect.