Dynamic Linker Hardening (The GNU C Library)


37.3 Avoiding Unexpected Issues With Dynamic Linking

This section details recommendations for increasing application robustness, by avoiding potential issues related to dynamic linking. The recommendations have two main aims: reduce the involvement of the dynamic linker in application execution after process startup, and restrict the application to a dynamic linker feature set whose behavior is more easily understood.

Key aspects of limiting dynamic linker usage after startup are: no use of the dlopen function, disabling lazy binding, and using the static TLS model. More easily understood dynamic linker behavior requires avoiding name conflicts (symbols and sonames) and highly customizable features like the audit subsystem.

Note that while these steps can be considered a form of application hardening, they do not guard against potential harm from accidental or deliberate loading of untrusted or malicious code. There is only limited overlap with traditional security hardening for applications running on GNU systems.

Restricted Dynamic Linker Features
Producing Matching Binaries
Checking Binaries
Run-time Considerations

37.3.1 Restricted Dynamic Linker Features

Avoiding certain dynamic linker features can increase predictability of applications and reduce the risk of running into dynamic linker defects.

Do not use the functions dlopen, dlmopen, or dlclose. Dynamic loading and unloading of shared objects introduces substantial complications related to symbol and thread-local storage (TLS) management.
Without the dlopen function, dlsym and dlvsym cannot be used with shared object handles. Minimizing the use of both functions is recommended. If they have to be used, only the RTLD_DEFAULT pseudo-handle should be used.
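If a symbol lookup cannot be avoided, it can be confined to the RTLD_DEFAULT pseudo-handle, which searches the objects that were loaded at startup. The following is a minimal sketch; the helper name lookup_default is hypothetical. RTLD_DEFAULT is a GNU extension and requires _GNU_SOURCE. Since glibc 2.34, dlsym is part of libc itself, while older versions need -ldl.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: look up NAME in the global scope only.
   No shared object handle from dlopen is ever involved.  */
void *
lookup_default (const char *name)
{
  dlerror ();                   /* Clear any stale error state.  */
  /* A NULL result usually means "not found"; call dlerror to tell
     that apart from a symbol whose value happens to be NULL.  */
  return dlsym (RTLD_DEFAULT, name);
}
```

For example, lookup_default ("printf") finds the definition in libc.so.6, which is part of the initially loaded set of objects.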
Use the local-exec or initial-exec TLS models. If dlopen is not used, there are no compatibility concerns for initial-exec TLS. This TLS model avoids most of the complexity around TLS access. In particular, there are no TLS-related run-time memory allocations after process or thread start.
If shared objects are expected to be used more generally, outside the hardened, feature-restricted context, lack of compatibility between dlopen and initial-exec TLS could be a concern. In that case, the second-best alternative is to use global-dynamic TLS with GNU2 TLS descriptors, for targets that fully implement them, including the fast path for access to TLS variables defined in the initially loaded set of objects. Like initial-exec TLS, this avoids memory allocations after thread creation, but only if the dlopen function is not used.
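Besides compiling with -ftls-model=initial-exec, GCC lets you select the TLS model for an individual variable with the tls_model attribute. The sketch below uses a hypothetical per-thread request counter. The accessor compiles to a thread-pointer-relative access and never calls __tls_get_addr.

```c
/* Hypothetical per-thread counter, forced to the initial-exec model
   so that access needs no run-time TLS allocation.  */
static __thread unsigned long request_count
  __attribute__ ((tls_model ("initial-exec")));

/* Each thread sees its own counter, starting at zero.  */
unsigned long
next_request_id (void)
{
  return ++request_count;
}
```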
Do not use lazy binding. Lazy binding may require run-time memory allocation, is not async-signal-safe, and introduces considerable complexity.
Make dependencies on shared objects explicit. Do not assume that certain libraries (such as libc.so.6) are always loaded. Specifically, if a main program or shared object references a symbol, create an ELF DT_NEEDED dependency on that shared object, or on another shared object that is documented (or otherwise guaranteed) to have the required explicit dependency. Referencing a symbol without a matching link dependency results in underlinking, and underlinked objects cannot always be loaded correctly: Initialization of objects may not happen in the required order.
Do not create dependency loops between shared objects (libA.so.1 depending on libB.so.1 depending on libC.so.1 depending on libA.so.1). The GNU C Library has to initialize one of the objects in the cycle first, and the choice of that object is arbitrary and can change over time. The object which is initialized first (and other objects involved in the cycle) may not run correctly because not all of its dependencies have been initialized.
Underlinking (see above) can hide the presence of cycles.
Limit the creation of indirect function (IFUNC) resolvers. These resolvers run during relocation processing, when the GNU C Library is not in a fully consistent state. If you write your own IFUNC resolvers, do not depend on external data or function references in those resolvers.
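If an IFUNC resolver is unavoidable, it can be written so that it only returns the address of a function from its own translation unit, without calling other functions or reading external data. A minimal sketch using GCC's ifunc attribute follows; add_one and its single implementation are hypothetical. A real resolver would usually choose among several implementations.

```c
/* Hypothetical implementation selected by the resolver.  */
static int
add_one_generic (int x)
{
  return x + 1;
}

typedef int add_one_fn (int);

/* IFUNC resolver.  It runs during relocation processing, so it
   refers only to a function in this translation unit and makes no
   external data or function references.  */
static add_one_fn *
resolve_add_one (void)
{
  return add_one_generic;
}

/* Calls to add_one are bound to the address the resolver returns.  */
int add_one (int x) __attribute__ ((ifunc ("resolve_add_one")));
```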
Do not use the audit functionality (LD_AUDIT, DT_AUDIT, DT_DEPAUDIT). Its callback and hooking capabilities introduce a lot of complexity and subtly alter dynamic linker behavior in corner cases even if the audit module is inactive.
Do not use symbol interposition. Without symbol interposition, the exact order in which shared objects are searched is less relevant.
Exceptions to this rule are copy relocations (see the next item), and vague linkage, as used by the C++ implementation (see below).
One potential source of symbol interposition is a combination of static and dynamic linking, namely linking a static archive into multiple dynamic shared objects. For such scenarios, the static library should be converted into its own dynamic shared object.
A different approach to this situation uses hidden visibility for symbols in the static library, but this can cause problems if the library does not expect that multiple copies of its code coexist within the same process, with no or partial sharing of state.
If you use shared objects that are linked with -Wl,-Bsymbolic (or equivalent) or use protected visibility, the code for the main program must be built as -fpic or -fPIC to avoid creating copy relocations (and the main program must not use copy relocations for other reasons). Using -fpie or -fPIE is not an alternative to PIC code in this context.
Be careful about explicit section annotations. Make sure that the target section matches the properties of the declared entity (e.g., no writable objects in .text).
Ensure that all assembler or object input files have the recommended security markup, particularly for non-executable stack.
Avoid using non-default linker flags and features. In particular, do not use the DT_PREINIT_ARRAY dynamic tag, and do not flag objects as DF_1_INITFIRST. Do not change the default linker script of BFD ld. Do not override ABI defaults, such as the dynamic linker path (with --dynamic-linker).
Some features of the GNU C Library indirectly depend on run-time code loading and dlopen. Use iconv_open with built-in converters only (such as UTF-8). Do not use NSS functionality such as getaddrinfo or getpwuid_r unless the system is configured for built-in NSS service modules only (see below).
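As an illustration of iconv_open with a built-in converter, the sketch below decodes UTF-8 into wide characters. It assumes that, as in current glibc, both UTF-8 and the internal WCHAR_T encoding are handled without loading a gconv module. The function name decode_utf8 is hypothetical.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode the UTF-8 string IN into OUT (capacity
   OUT_LEN wide characters).  Returns the number of wide characters
   written, or -1 on error (including insufficient space).  */
long
decode_utf8 (const char *in, wchar_t *out, size_t out_len)
{
  /* Both encodings are built in, so no module is dlopen'ed.  */
  iconv_t cd = iconv_open ("WCHAR_T", "UTF-8");
  if (cd == (iconv_t) -1)
    return -1;

  char *inbuf = (char *) in;
  size_t inleft = strlen (in);
  char *outbuf = (char *) out;
  size_t outleft = out_len * sizeof (wchar_t);

  size_t rc = iconv (cd, &inbuf, &inleft, &outbuf, &outleft);
  iconv_close (cd);
  if (rc == (size_t) -1)
    return -1;
  return (long) ((out_len * sizeof (wchar_t) - outleft) / sizeof (wchar_t));
}
```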

Several considerations apply to ELF constructors and destructors.

The dynamic linker does not take constructor and destructor priorities into account when determining their execution order. Priorities are only used by the link editor for ordering execution within a completely linked object. If a dynamic shared object needs to be initialized before another object, this can be expressed with a DT_NEEDED dependency on the object that needs to be initialized earlier.
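The following sketch shows what constructor priorities do control: ordering inside a single linked object. The priority 102 constructor appears first in the source, but the priority 101 constructor runs first. The function names are hypothetical. Across shared objects, the priorities would have no effect, and ordering would have to come from DT_NEEDED dependencies.

```c
/* Records the order in which this object's constructors ran.  */
static int init_order[2];
static int init_count;

/* Defined first, but runs second because 102 > 101.  */
__attribute__ ((constructor (102))) static void
init_late (void)
{
  init_order[init_count++] = 102;
}

__attribute__ ((constructor (101))) static void
init_early (void)
{
  init_order[init_count++] = 101;
}

/* Hypothetical accessor: priority of the constructor that ran at
   POSITION, or -1 if none did.  */
int
constructor_ran (int position)
{
  return position < init_count ? init_order[position] : -1;
}
```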
The recommendations to avoid cyclic dependencies and symbol interposition make it less likely that ELF objects are accessed before their ELF constructors have run. However, using dlsym and dlvsym, it is still possible to access uninitialized facilities even with these restrictions in place. (Of course, access to uninitialized functionality is also possible within a single shared object or the main executable, without resorting to explicit symbol lookup.) Consider using dynamic, on-demand initialization instead. To deal with access after de-initialization, it may be necessary to implement special cases for that scenario, potentially with degraded functionality.
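One way to implement dynamic, on-demand initialization is pthread_once. State is set up on first use rather than in an ELF constructor, so early callers, including those reaching the function via dlsym, never observe it uninitialized. The sketch below uses a hypothetical lookup table. Since glibc 2.34, pthread_once is part of libc. Older versions need -pthread.

```c
#include <pthread.h>

/* Hypothetical state that would otherwise be set up by an ELF
   constructor.  */
struct lookup_table
{
  int entries[16];
  int ready;
};

static struct lookup_table table;
static pthread_once_t table_once = PTHREAD_ONCE_INIT;

/* Runs exactly once, on the first call to get_table.  */
static void
init_table (void)
{
  for (int i = 0; i < 16; ++i)
    table.entries[i] = i * i;
  table.ready = 1;
}

/* Every access goes through this function, which initializes the
   table on demand in a thread-safe manner.  */
const struct lookup_table *
get_table (void)
{
  pthread_once (&table_once, init_table);
  return &table;
}
```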
Be aware that when ELF destructors are executed, it is possible to reference shared objects whose ELF destructors have already run. This can happen even in the absence of dlsym and dlvsym function calls, for example if client code using a shared object has registered callbacks or objects with another shared object. The ELF destructor for the client code is executed before the ELF destructor for the shared objects that it uses, based on the expected dependency order.
If dlopen and dlmopen are not used, DT_NEEDED dependency information is complete, and lazy binding is disabled, the execution order of ELF destructors is expected to be the reverse of the ELF constructor order. However, two separate dependency sort operations still occur. Even though the listed preconditions should ensure that both sorts produce the same ordering, it is recommended not to depend on the destructor order being the reverse of the constructor order.

The following items provide C++-specific guidance for preparing applications. If another programming language is used and it uses these toolchain features targeted at C++ to implement some language constructs, these restrictions and recommendations still apply in analogous ways.

C++ inline functions, templates, and other constructs may need to be duplicated into multiple shared objects using vague linkage, resulting in symbol interposition. This type of symbol interposition is unproblematic, as long as the C++ one definition rule (ODR) is followed, and all definitions in different translation units are equivalent according to the C++ language rules.
Be aware that under C++ language rules, it is unspecified whether evaluating a string literal results in the same address for each evaluation. This also applies to anonymous objects of static storage duration that GCC creates, for example to implement the compound literals C++ extension. As a result, comparing pointers to such objects, or using them directly as hash table keys, may give unexpected results.
By default, variables of block scope with static storage duration have consistent addresses across different translation units, even if defined in functions that use vague linkage.
Special care is needed if a C++ project uses symbol visibility or symbol version management (for example, the GCC ‘visibility’ attribute, the GCC -fvisibility option, or a linker version script with the linker option --version-script). It is necessary to ensure that the symbol management remains consistent with how the symbols are used. Some C++ constructs are implemented with the help of ancillary symbols, which can make it complicated to achieve consistency. For example, an inline function that is always inlined into its callers has no symbol footprint for the function itself, but if the function contains a variable of static storage duration, this variable may result in the creation of one or more global symbols. For correctness, such symbols must be visible and bound to the same object in all other places where the inline function may be called. This requirement is not met if the symbol visibility is set to hidden, or if symbols are assigned a textually different symbol version (effectively creating two distinct symbols).
Due to the complex interaction between ELF symbol management and C++ symbol generation, it is recommended to use C++ language features for symbol management, in particular inline namespaces.
The toolchain and dynamic linker have multiple mechanisms that bypass the usual symbol binding procedures. This means that the C++ one definition rule (ODR) still holds even if certain symbol-based isolation mechanisms are used, and object addresses are not shared across translation units with incompatible type definitions.
This does not matter if the original (language-independent) advice regarding symbol interposition is followed. However, as the advice may be difficult to implement for C++ applications, it is recommended to avoid ODR violations across the entire process image. Inline namespaces can be helpful in this context because they can be used to create distinct ELF symbols while maintaining source code compatibility at the C++ level.
Be aware that as a special case of interposed symbols, symbols with the STB_GNU_UNIQUE binding type do not follow the usual ELF symbol namespace isolation rules: such symbols bind across RTLD_LOCAL boundaries. Furthermore, symbol versioning is ignored for such symbols; they are bound by symbol name only. All their definitions and uses must therefore be compatible. Hidden visibility still prevents the creation of STB_GNU_UNIQUE symbols and can achieve isolation of incompatible definitions.
C++ constructor priorities only affect constructor ordering within one shared object. Global constructor order across shared objects is consistent with ELF dependency ordering if there are no ELF dependency cycles.
C++ exception handling and run-time type information (RTTI), as implemented in the GNU toolchain, are not address-significant, and therefore are not affected by the symbol binding behavior of the dynamic linker. This means that types of the same fully-qualified name (in non-anonymous namespaces) are always considered the same from an exception-handling or RTTI perspective. This is true even if the type information object or vtable has hidden symbol visibility, or the corresponding symbols are versioned under different symbol versions, or the symbols are not bound to the same objects due to the use of RTLD_LOCAL or dlmopen.
This can cause issues in applications that contain multiple incompatible definitions of the same type. Inline namespaces can be used to create distinct symbols at the ELF layer, avoiding this type of issue.
C++ exception handling across multiple dlmopen namespaces may not work, particularly with the unwinder in GCC versions before 12. Current toolchain versions are able to process unwinding tables across dlmopen boundaries. However, note that type comparison is name-based, not address-based (see the previous item), so exception types may still be matched in unexpected ways. An important special case of exception handling, invoking destructors for variables of block scope, is not impacted by this RTTI type-sharing. Likewise, regular virtual member function dispatch for objects is unaffected (but still requires that the type definitions match in all directly involved translation units).
Once more, inline namespaces can be used to create distinct ELF symbols for different types.
Although the C++ standard requires that destructors for global objects run in the opposite order of their constructors, the Itanium C++ ABI requires a different destruction order in some cases. As a result, do not depend on the precise destructor invocation order in applications that use dlclose.
Registering destructors for later invocation allocates memory and may silently fail if insufficient memory is available. As a result, the destructor is never invoked. This applies to all forms of destructor registration, with the exception of thread-local variables (see the next item). To avoid this issue, ensure that such objects merely have trivial destructors, avoiding the need for registration, and deallocate resources using a different mechanism (for example, from an ELF destructor).
A similar issue exists for thread_local variables of types that have non-trivial destructors. However, in this case, memory allocation failure during registration leads to process termination. If process termination is not acceptable, use thread_local variables with trivial destructors only. Functions for per-thread cleanup can be registered using pthread_key_create (globally for all threads) and activated using pthread_setspecific (on each thread). Note that a pthread_key_create call may still fail (and keys created by pthread_key_create are a limited resource in the GNU C Library), but this failure can be handled without terminating the process.
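The pthread_key_create approach might look like the following sketch, which gives each thread a lazily allocated buffer that is freed at thread exit. Every failure is reported to the caller instead of terminating the process. All names (buffer_key_init, thread_buffer, run_worker, and the 256-byte size) are hypothetical. With glibc before 2.34, link with -pthread.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

static pthread_key_t buffer_key;
static atomic_int buffers_freed;

/* Key destructor: runs on thread exit for non-NULL values.  */
static void
free_buffer (void *p)
{
  free (p);
  atomic_fetch_add (&buffers_freed, 1);
}

/* Call once at startup.  Returns 0 or an error number that the
   caller can handle (for example, EAGAIN if keys are exhausted).  */
int
buffer_key_init (void)
{
  return pthread_key_create (&buffer_key, free_buffer);
}

/* Returns this thread's buffer, allocating it on first use, or NULL
   on failure.  */
void *
thread_buffer (void)
{
  void *p = pthread_getspecific (buffer_key);
  if (p == NULL)
    {
      p = malloc (256);
      if (p == NULL)
        return NULL;
      if (pthread_setspecific (buffer_key, p) != 0)
        {
          free (p);
          return NULL;
        }
    }
  return p;
}

/* Demo helper: a thread that uses its buffer and then exits.  */
static void *
worker (void *arg)
{
  (void) arg;
  return thread_buffer () != NULL ? (void *) 1 : NULL;
}

/* Returns 1 if the worker obtained a buffer, 0 if not, -1 on error.  */
int
run_worker (void)
{
  pthread_t thr;
  void *result = NULL;
  if (pthread_create (&thr, NULL, worker, NULL) != 0)
    return -1;
  if (pthread_join (thr, &result) != 0)
    return -1;
  return result != NULL;
}

int
buffers_freed_count (void)
{
  return atomic_load (&buffers_freed);
}
```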

37.3.2 Producing Matching Binaries

This subsection recommends tools and build flags for producing applications that meet the recommendations of the previous subsection.

Use BFD ld (ld.bfd) from GNU binutils to produce binaries, invoked through a compiler driver such as gcc. The version should not be too far ahead of what was current when the version of the GNU C Library was first released.
Do not use a binutils release that is older than the one used to build the GNU C Library itself.
Compile with -ftls-model=initial-exec to force the initial-exec TLS model.
Link with -Wl,-z,now to disable lazy binding.
Link with -Wl,-z,relro to enable RELRO (which is the default on most targets).
Specify all direct shared objects dependencies using -l options to avoid underlinking. Rely on .so files (which can be linker scripts) and searching with the -l option. Do not specify the file names of shared objects on the linker command line.
Consider using -Wl,-z,defs to treat underlinking as an error condition.
When creating a shared object (linked with -shared), use -Wl,-soname,lib… to set a soname that matches the final installed name of the file.
Do not use the -rpath linker option. (As explained below, all required shared objects should be installed into the default search path.)
Use -Wl,--error-rwx-segments and -Wl,--error-execstack to instruct the link editor to fail the link if the resulting final object would have read-write-execute segments or an executable stack. Such issues usually indicate that the input files are not marked up correctly.
Ensure that for each LOAD segment in the ELF program header, file offsets, memory sizes, and load addresses are multiples of the largest page size supported at run time. Similarly, the start address and size of the GNU_RELRO range should be multiples of the page size.
Avoid creating gaps between LOAD segments. The difference between the load addresses of two subsequent LOAD segments should be the size of the first LOAD segment. (This may require linking with -Wl,-z,noseparate-code.)

This may not be possible to achieve with the currently available link editors.
If the multiple-of-page-size criterion for the GNU_RELRO region cannot be achieved, ensure that the process memory image right before the start of the region does not contain executable or writable memory.
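The segment layout rules above reduce to simple arithmetic on program header fields. The sketch below checks them for 64-bit ELF headers. The helper names are hypothetical, and the page size is a parameter because the largest run-time page size (for example, 64 KiB on some targets) depends on the deployment.

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

static bool
is_multiple (uint64_t value, uint64_t page_size)
{
  return value % page_size == 0;
}

/* True if PH is a LOAD segment whose file offset, load address, and
   memory size are all multiples of PAGE_SIZE.  */
bool
load_segment_is_aligned (const Elf64_Phdr *ph, uint64_t page_size)
{
  return ph->p_type == PT_LOAD
         && is_multiple (ph->p_offset, page_size)
         && is_multiple (ph->p_vaddr, page_size)
         && is_multiple (ph->p_memsz, page_size);
}

/* True if SECOND starts exactly where FIRST ends in memory, that is,
   there is no gap between two subsequent LOAD segments.  */
bool
load_segments_contiguous (const Elf64_Phdr *first, const Elf64_Phdr *second)
{
  return second->p_vaddr - first->p_vaddr == first->p_memsz;
}
```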

37.3.3 Checking Binaries

In some cases, failure to follow the previous recommendations can be detected by examining the produced binaries. This subsection contains suggestions for verifying aspects of these binaries.

To detect underlinking, examine the dynamic symbol table, for example using ‘readelf -sDW’. If the symbol is defined in a shared object that uses symbol versioning, it must carry a symbol version, as in ‘pthread_kill@GLIBC_2.34’.
Examine the dynamic segment with ‘readelf -dW’ to check that all the required NEEDED entries are present. (It is not necessary to list indirect dependencies if these dependencies are guaranteed to remain during the evolution of the explicitly listed direct dependencies.)
The NEEDED entries should not contain full path names including slashes, only sonames.
For a further consistency check, collect all shared objects referenced via NEEDED entries in dynamic segments, transitively, starting at the main program. Then determine their dynamic symbol tables (using ‘readelf -sDW’, for example). Ideally, every symbol should be defined at most once, so that symbol interposition does not happen.
If there are interposed data symbols, check if the single interposing definition is in the main program. In this case, there must be a copy relocation for it. (This only applies to targets with copy relocations.)

Function symbols should only be interposed in C++ applications, to implement vague linkage. (See the discussion in the C++ recommendations above.)
Using the previously collected NEEDED entries, check that the dependency graph does not contain any cycles.
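Once the NEEDED entries have been collected, cycle detection is a standard depth-first search that looks for back edges. In the sketch below, each shared object is assumed to have been assigned an index, and needed[i][j] records that object i has a NEEDED entry for object j. The dep_graph type and the 64-object limit are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_OBJECTS 64          /* Hypothetical limit for this sketch.  */

/* needed[i][j] is true if object i has a NEEDED entry for object j.  */
typedef struct
{
  size_t count;
  bool needed[MAX_OBJECTS][MAX_OBJECTS];
} dep_graph;

enum visit_state { UNVISITED, IN_PROGRESS, DONE };

/* Depth-first search from NODE.  Reaching a node that is still
   IN_PROGRESS means a back edge, that is, a dependency cycle.  */
static bool
visit (const dep_graph *g, size_t node, enum visit_state *state)
{
  state[node] = IN_PROGRESS;
  for (size_t next = 0; next < g->count; ++next)
    {
      if (!g->needed[node][next])
        continue;
      if (state[next] == IN_PROGRESS)
        return true;
      if (state[next] == UNVISITED && visit (g, next, state))
        return true;
    }
  state[node] = DONE;
  return false;
}

/* Returns true if the NEEDED graph contains at least one cycle.  */
bool
has_dependency_cycle (const dep_graph *g)
{
  enum visit_state state[MAX_OBJECTS] = { UNVISITED };
  for (size_t node = 0; node < g->count; ++node)
    if (state[node] == UNVISITED && visit (g, node, state))
      return true;
  return false;
}
```

With indices 0, 1, and 2 standing for libA.so.1, libB.so.1, and libC.so.1, the graph is acyclic until the edge from libC.so.1 back to libA.so.1 is added.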
The dynamic segment should also mention BIND_NOW on the FLAGS line or NOW on the FLAGS_1 line (one is enough).
Ensure that only static TLS relocations (thread-pointer relative offset locations) are used, for example R_AARCH64_TLS_TPREL and R_X86_64_TPOFF64. As the second-best option, and only if compatibility with non-hardened applications using dlopen is needed, GNU2 TLS descriptor relocations can be used (for example, R_AARCH64_TLSDESC or R_X86_64_TLSDESC).
There should not be references to the traditional TLS function symbols __tls_get_addr, __tls_get_offset, __tls_get_addr_opt in the dynamic symbol table (in the ‘readelf -sDW’ output). Supporting global dynamic TLS relocations (such as R_AARCH64_TLS_DTPMOD, R_AARCH64_TLS_DTPREL, R_X86_64_DTPMOD64, R_X86_64_DTPOFF64) should not be used, either.
Likewise, the functions dlopen, dlmopen, dlclose should not be referenced from the dynamic symbol table.
For shared objects, there should be a SONAME entry that matches the file name (the base name, i.e., the part after the slash). The SONAME string must not contain a slash ‘/’.
For all objects, the dynamic segment (as shown by ‘readelf -dW’) should not contain RPATH or RUNPATH entries.
Likewise, the dynamic segment should not show any AUDIT, DEPAUDIT, AUXILIARY, FILTER, or PREINIT_ARRAY tags.
If the dynamic segment contains a (deprecated) HASH tag, it must also contain a GNU_HASH tag.
The INITFIRST flag (under FLAGS_1) should not be used.
The program header must not have LOAD segments that are writable and executable at the same time.
All produced objects should have a GNU_STACK program header that is not marked as executable. (However, on some newer targets, a non-executable stack is the default, so the GNU_STACK program header is not required.)
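Several of the dynamic segment checks can also be performed from inside a running program, as a complement to ‘readelf -dW’. The sketch below uses dl_iterate_phdr, whose first reported object is the main program, and scans its dynamic segment. The DYN_* bit names and the check_main_program function are hypothetical. Only flag values are read, so relocated address entries do not matter here.

```c
#define _GNU_SOURCE
#include <elf.h>
#include <link.h>
#include <stddef.h>

/* Result bits (hypothetical names chosen for this sketch).  */
enum
{
  DYN_BIND_NOW = 1,     /* DF_BIND_NOW in FLAGS or DF_1_NOW in FLAGS_1.  */
  DYN_SEARCH_PATH = 2,  /* An RPATH or RUNPATH entry is present.  */
  DYN_DISCOURAGED = 4,  /* AUDIT, DEPAUDIT, or PREINIT_ARRAY is present.  */
  DYN_INITFIRST = 8     /* DF_1_INITFIRST is set in FLAGS_1.  */
};

static int
scan_main_program (struct dl_phdr_info *info, size_t size, void *closure)
{
  int *result = closure;
  (void) size;

  for (ElfW(Half) i = 0; i < info->dlpi_phnum; ++i)
    {
      if (info->dlpi_phdr[i].p_type != PT_DYNAMIC)
        continue;
      const ElfW(Dyn) *dyn
        = (const ElfW(Dyn) *) (info->dlpi_addr + info->dlpi_phdr[i].p_vaddr);
      *result = 0;
      for (; dyn->d_tag != DT_NULL; ++dyn)
        switch (dyn->d_tag)
          {
          case DT_FLAGS:
            if (dyn->d_un.d_val & DF_BIND_NOW)
              *result |= DYN_BIND_NOW;
            break;
          case DT_FLAGS_1:
            if (dyn->d_un.d_val & DF_1_NOW)
              *result |= DYN_BIND_NOW;
            if (dyn->d_un.d_val & DF_1_INITFIRST)
              *result |= DYN_INITFIRST;
            break;
          case DT_RPATH:
          case DT_RUNPATH:
            *result |= DYN_SEARCH_PATH;
            break;
          case DT_AUDIT:
          case DT_DEPAUDIT:
          case DT_PREINIT_ARRAY:
            *result |= DYN_DISCOURAGED;
            break;
          }
    }
  /* The first object reported is the main program; stop there.  */
  return 1;
}

/* Returns a combination of DYN_* bits for the main program, or -1 if
   it has no dynamic segment (for example, a static non-PIE binary).  */
int
check_main_program (void)
{
  int result = -1;
  dl_iterate_phdr (scan_main_program, &result);
  return result;
}
```

A hardened program would be expected to report exactly DYN_BIND_NOW.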

37.3.4 Run-time Considerations

In addition to preparing program binaries in a recommended fashion, the run-time environment should be set up in such a way that problematic dynamic linker features are not used.

Install shared objects using their sonames in a default search path directory (usually /usr/lib64). Do not use symbolic links.
The default search path must not contain objects with duplicate file names or sonames.
Do not use environment variables (LD_… variables such as LD_PRELOAD or LD_LIBRARY_PATH, or GLIBC_TUNABLES) to change default dynamic linker behavior.
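Because the dynamic linker has already acted on these variables before main runs, a program cannot undo their effect. It can, however, detect and report them, for example in a startup diagnostic. The sketch below counts LD_ variables and GLIBC_TUNABLES in an environment array. The function name count_linker_overrides is hypothetical. In a real program, the array would typically be environ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical diagnostic: count environment entries that alter
   dynamic linker behavior (LD_... variables or GLIBC_TUNABLES).  */
size_t
count_linker_overrides (char *const envp[])
{
  size_t count = 0;
  for (size_t i = 0; envp[i] != NULL; ++i)
    if (strncmp (envp[i], "LD_", 3) == 0
        || strncmp (envp[i], "GLIBC_TUNABLES=", 15) == 0)
      ++count;
  return count;
}
```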
Do not install shared objects in non-default locations. (Such locations are listed explicitly in the configuration file for ldconfig, usually /etc/ld.so.conf, or in files included from there.)
In relation to the previous item, do not install any objects in glibc-hwcaps subdirectories.
Do not configure dynamically-loaded NSS service modules, to avoid accidental internal use of the dlopen facility. The files and dns modules are built in and do not rely on dlopen.
Do not truncate and overwrite files containing programs and shared objects in place, while they are used. Instead, write the new version to a different path and use rename to replace the already-installed version.
Be aware that during a component update procedure that involves multiple object files (shared objects and main programs), concurrently starting processes may observe an inconsistent combination of object files (some already updated, some still at the previous version). For example, this can happen during an update of the GNU C Library itself.