3.4 Memory Protection

When a page is mapped using mmap, page protection flags can be specified using the protection flags argument. See Memory-mapped I/O.

The following flags are available:

PROT_WRITE

The memory can be written to.

PROT_READ

The memory can be read. On some architectures, this flag implies that the memory can be executed as well (as if PROT_EXEC had been specified at the same time).

PROT_EXEC

The memory can be used to store instructions which can then be executed. On most architectures, this flag implies that the memory can be read (as if PROT_READ had been specified).

PROT_NONE

This flag must be specified on its own.

The memory is reserved, but cannot be read, written, or executed. If this flag is specified in a call to mmap, a virtual memory area will be set aside for future use in the process, and mmap calls without the MAP_FIXED flag will not use it for subsequent allocations. For anonymous mappings, the kernel will not reserve any physical memory for the allocation at the time the mapping is created.

The operating system may keep track of these flags separately even if the underlying hardware treats them the same for the purposes of access checking (as happens with PROT_READ and PROT_EXEC on some platforms). On GNU systems, PROT_EXEC always implies PROT_READ, so that users can view the machine code which is executing on their system.

Inappropriate access will cause a segfault (see Program Error Signals).

After allocation, protection flags can be changed using the mprotect function.

Function: int mprotect (void *address, size_t length, int protection)

Preliminary: | MT-Safe | AS-Safe | AC-Safe | See POSIX Safety Concepts.

A successful call to the mprotect function changes the protection flags of at least length bytes of memory, starting at address.

address must be aligned to the page size for the mapping. The system page size can be obtained by calling sysconf with the _SC_PAGESIZE parameter (see Definition of sysconf). The system page size is the granularity in which the page protection of anonymous memory mappings and most file mappings can be changed. Memory which is mapped from special files or devices may have larger page granularity than the system page size and may require larger alignment.

length is the number of bytes whose protection flags must be changed. It is automatically rounded up to the next multiple of the system page size.

protection is a combination of the PROT_* flags described above.

The mprotect function returns 0 on success and -1 on failure.

The following errno error conditions are defined for this function:

ENOMEM

The system was not able to allocate resources to fulfill the request. This can happen if there is not enough physical memory in the system for the allocation of backing storage. The error can also occur if the new protection flags would cause the memory region to be split from its neighbors, and the process limit for the number of such distinct memory regions would be exceeded.

EINVAL

address is not properly aligned to a page boundary for the mapping, or length (after rounding up to the system page size) is not a multiple of the applicable page size for the mapping, or the combination of flags in protection is not valid.

EACCES

The file for a file-based mapping was not opened with open flags which are compatible with protection.

EPERM

The system security policy does not allow a mapping with the specified flags. For example, mappings which are both PROT_EXEC and PROT_WRITE at the same time might not be allowed.

If the mprotect function is used to make a region of memory inaccessible by specifying the PROT_NONE protection flag and access is later restored, the memory retains its previous contents.

On some systems, it may not be possible to specify additional flags which were not present when the mapping was first created. For example, an attempt to make a region of memory executable could fail if the initial protection flags were ‘PROT_READ | PROT_WRITE’.

In general, the mprotect function can be used to change any process memory, no matter how it was allocated. However, portable use of the function requires that it is only used with memory regions returned by mmap or mmap64.

3.4.1 Memory Protection Keys

On some systems, further access restrictions can be added to specific pages using memory protection keys. These restrictions work as follows:

New threads and subprocesses inherit the access restrictions of the current thread. If a protection key is allocated subsequently, existing threads (except the current) will use an unspecified system default for the access restrictions associated with newly allocated keys.

Upon entering a signal handler, the system resets the access restrictions of the current thread so that pages with the default key can be accessed, but the access restrictions for other protection keys are unspecified.

Applications are expected to allocate a key once using pkey_alloc, and apply the key to memory regions which need special protection with pkey_mprotect:

  int key = pkey_alloc (0, PKEY_DISABLE_ACCESS);
  if (key < 0)
    /* Perform error checking, including fallback for lack of support.  */
    ...;

  /* Apply the key to a special memory region used to store critical
     data.  */
  if (pkey_mprotect (region, region_length,
                     PROT_READ | PROT_WRITE, key) < 0)
    ...; /* Perform error checking (generally fatal).  */

If the key allocation fails due to lack of support for memory protection keys, the pkey_mprotect call can usually be skipped. In this case, the region will not be protected by default. It is also possible to call pkey_mprotect with a key value of -1, in which case it will behave in the same way as mprotect.

After key allocation assignment to memory pages, pkey_set can be used to temporarily acquire access to the memory region and relinquish it again:

  if (key >= 0 && pkey_set (key, 0) < 0)
    ...; /* Perform error checking (generally fatal).  */
  /* At this point, the current thread has read-write access to the
     memory region.  */
  ...
  /* Revoke access again.  */
  if (key >= 0 && pkey_set (key, PKEY_DISABLE_ACCESS) < 0)
    ...; /* Perform error checking (generally fatal).  */

In this example, a negative key value indicates that no key had been allocated, which means that the system lacks support for memory protection keys and it is not necessary to change the the access restrictions of the current thread (because it always has access).

Compared to using mprotect to change the page protection flags, this approach has two advantages: It is thread-safe in the sense that the access restrictions are only changed for the current thread, so another thread which changes its own access restrictions concurrently to gain access to the mapping will not suddenly see its access restrictions updated. And pkey_set typically does not involve a call into the kernel and a context switch, so it is more efficient.

Function: int pkey_alloc (unsigned int flags, unsigned int access_restrictions)

Preliminary: | MT-Safe | AS-Safe | AC-Unsafe corrupt | See POSIX Safety Concepts.

Allocate a new protection key. The flags argument is reserved and must be zero. The access_restrictions argument specifies access restrictions which are applied to the current thread (as if with pkey_set below). Access restrictions of other threads are not changed.

The function returns the new protection key, a non-negative number, or -1 on error.

The following errno error conditions are defined for this function:

ENOSYS

The system does not implement memory protection keys.

EINVAL

The flags argument is not zero.

The access_restrictions argument is invalid.

The system does not implement memory protection keys or runs in a mode in which memory protection keys are disabled.

ENOSPC

All available protection keys already have been allocated.

The system does not implement memory protection keys or runs in a mode in which memory protection keys are disabled.

Function: int pkey_free (int key)

Preliminary: | MT-Safe | AS-Safe | AC-Safe | See POSIX Safety Concepts.

Deallocate the protection key, so that it can be reused by pkey_alloc.

Calling this function does not change the access restrictions of the freed protection key. The calling thread and other threads may retain access to it, even if it is subsequently allocated again. For this reason, it is not recommended to call the pkey_free function.

ENOSYS

The system does not implement memory protection keys.

EINVAL

The key argument is not a valid protection key.

Function: int pkey_mprotect (void *address, size_t length, int protection, int key)

Preliminary: | MT-Safe | AS-Safe | AC-Safe | See POSIX Safety Concepts.

Similar to mprotect, but also set the memory protection key for the memory region to key.

Some systems use memory protection keys to emulate certain combinations of protection flags. Under such circumstances, specifying an explicit protection key may behave as if additional flags have been specified in protection, even though this does not happen with the default protection key. For example, some systems can support PROT_EXEC-only mappings only with a default protection key, and memory with a key which was allocated using pkey_alloc will still be readable if PROT_EXEC is specified without PROT_READ.

If key is -1, the default protection key is applied to the mapping, just as if mprotect had been called.

The pkey_mprotect function returns 0 on success and -1 on failure. The same errno error conditions as for mprotect are defined for this function, with the following addition:

EINVAL

The key argument is not -1 or a valid memory protection key allocated using pkey_alloc.

ENOSYS

The system does not implement memory protection keys, and key is not -1.

Function: int pkey_set (int key, unsigned int access_restrictions)

Preliminary: | MT-Safe | AS-Safe | AC-Safe | See POSIX Safety Concepts.

Change the access restrictions of the current thread for memory pages with the protection key key to access_restrictions. If access_restrictions is zero, no additional access restrictions on top of the page protection flags are applied. Otherwise, access_restrictions is a combination of the following flags:

PKEY_DISABLE_READ

Subsequent attempts to read from memory with the specified protection key will fault. At present only AArch64 platforms with enabled Stage 1 permission overlays feature support this type of restriction.

PKEY_DISABLE_WRITE

Subsequent attempts to write to memory with the specified protection key will fault.

PKEY_DISABLE_ACCESS

Subsequent attempts to write to or read from memory with the specified protection key will fault. On AArch64 platforms with enabled Stage 1 permission overlays feature this restriction value has the same effect as combination of PKEY_DISABLE_READ and PKEY_DISABLE_WRITE.

PKEY_DISABLE_EXECUTE

Subsequent attempts to execute from memory with the specified protection key will fault. At present only AArch64 platforms with enabled Stage 1 permission overlays feature support this type of restriction.

Operations not specified as flags are not restricted. In particular, this means that the memory region will remain executable if it was mapped with the PROT_EXEC protection flag and PKEY_DISABLE_ACCESS has been specified.

Calling the pkey_set function with a protection key which was not allocated by pkey_alloc results in undefined behavior. This means that calling this function on systems which do not support memory protection keys is undefined.

The pkey_set function returns 0 on success and -1 on failure.

The following errno error conditions are defined for this function:

EINVAL

The system does not support the access restrictions expressed in the access_restrictions argument.

Function: int pkey_get (int key)

Preliminary: | MT-Safe | AS-Safe | AC-Safe | See POSIX Safety Concepts.

Return the access restrictions of the current thread for memory pages with protection key key. The return value is zero or a combination of the PKEY_DISABLE_* flags; see the pkey_set function.

The returned value should be checked for presence or absence of specific flags using bitwise operations. Comparing the returned value with any of the flags or their combination using equals will almost certainly fail.

Calling the pkey_get function with a protection key which was not allocated by pkey_alloc results in undefined behavior. This means that calling this function on systems which do not support memory protection keys is undefined.