At a high level, we’re doing coverage-guided symbolic execution of the user’s code.
The analyzer implementation works on the gimple-SSA representation. (I chose this in the hopes of making it easy to work with LTO to do whole-program analysis).
The implementation is read-only: it doesn’t attempt to change anything, just emit warnings.
The gimple representation can be seen using -fdump-ipa-analyzer.
Tip: If the analyzer ICEs before this is written out, one workaround is to use --param=analyzer-bb-explosion-factor=0 to force the analyzer to bail out after analyzing the first basic block.
First, we build a directed graph to represent the user’s code.
For historical reasons we call this the supergraph, although
this is now a misnomer as we no longer add callgraph edges to this graph.
The nodes and edges in the supergraph are called “supernodes” and
“superedges”, and often referred to in code as snodes and
sedges.
We make a node in the supergraph before every gimple statement, with edges representing the transitions between statements within a basic block, along with additional nodes and edges at CFG edges.
The nodes in the supergraph represent locations in the user’s code,
and discrete points between operations. The edges represent transitions
between these locations. Each edge in the supergraph can have an optional
operation associated with it, representing a single state transition
that occurs along the edge (such as executing a particular statement, or
following the outcome of a control-flow decision, e.g. a switch case).
There can be multiple nodes and edges in the supergraph corresponding to a single CFG edge so that e.g. we can handle filtering states on a condition separately from handling the effect of the phi nodes if the condition was satisfied.
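This split can be illustrated with a small, hypothetical C function: the true/false outcome of the condition is a constraint to filter on, while the phi node at the join point is a separate effect, handled by its own nodes and edges.

```c
/* Hypothetical example: on each CFG edge into the join point there are
   two distinct effects, modeled by separate supergraph nodes/edges:
   first the condition's outcome ("flag != 0" or "flag == 0") is
   recorded as a constraint, then the phi node at the join selects the
   value of 'x' for that path (roughly, x_3 = PHI <1(then), 2(else)>
   in gimple-SSA).  */
int
phi_example (int flag)
{
  int x;
  if (flag)
    x = 1;
  else
    x = 2;
  return x;
}
```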
The analyzer in GCC 10 - GCC 15 attempted to have a single supernode per basic block for the sake of efficiency, but given that state transitions can happen mid-block, this became unmaintainable, hence we now have fine-grained nodes with one node/edge per gimple statement.
Having built the supergraph from the CFGs of all of the functions in the user’s code, we manipulate it, ensuring in particular that every node has a
location_t value referring to the location in the user’s source.
This is necessary, since in the gimple IR seen by the analyzer, many gimple
statements have no location associated with them.
The supergraph can be seen at each stage using -fdump-analyzer-supergraph, which creates a series of SRC.supergraph.N.KIND.dot GraphViz files showing the state of the supergraph after each of the above.
We then build an analysis_plan which walks the callgraph to
determine which calls might be suitable for being summarized (rather
than fully explored) and thus in what order to explore the functions.
Next is the heart of the analyzer: we use a worklist to explore state within the supergraph, building an "exploded graph". Nodes in the exploded graph correspond to <point, state> pairs, as in "Precise Interprocedural Dataflow Analysis via Graph Reachability" (Thomas Reps, Susan Horwitz and Mooly Sagiv) - but note that we’re not using the algorithm described in that paper, just the “exploded graph” terminology.
We reuse nodes for <point, state> pairs we’ve already seen, and avoid tracking state too closely, so that (hopefully) we rapidly converge on a final exploded graph and terminate the analysis. We also bail out if the number of exploded <point, state> nodes grows larger than a particular multiple of the total number of supernodes (to ensure termination in the face of pathological state-explosion cases, or bugs). We also stop exploring a point once we hit a limit of states for that point.
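As a sketch of this convergence, consider a simple loop (hypothetical example): the analyzer revisits the loop head on each iteration, and once the <point, state> pair at the head matches one already seen, the worklist yields no new node and exploration of the loop terminates.

```c
/* The analyzer explores the loop head repeatedly; once the state there
   matches an already-seen <point, state> pair (e.g. once 'total' and
   'i' are no longer tracked as specific constants), no new exploded
   node is created and analysis of the loop converges.  */
int
sum_below (int n)
{
  int total = 0;
  for (int i = 0; i < n; i++)
    total += i;
  return total;
}
```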
We can identify problems directly when processing a <point, state> instance. For example, if we’re finding the successors of
<point: before-stmt: "free (ptr);",
state: {"ptr": freed}>
then we can detect a double-free of "ptr". We can then emit a path to reach the problem by finding the simplest route through the graph.
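A minimal C function exhibiting this situation might look like the following (hypothetical example):

```c
#include <stdlib.h>

/* Compiling this with -fanalyzer reports a double-free: when the
   analyzer processes the second call, the state already records
   "ptr: freed", matching the <point, state> pair shown above.  */
void
double_free (void *ptr)
{
  free (ptr);
  free (ptr);  /* the analyzer warns here (-Wanalyzer-double-free) */
}
```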
Program points in the analysis are a combination of a supernode
together with a "call string" identifying the
stack of callsites below them, so that paths in the exploded graph
correspond to interprocedurally valid paths: we always return to the
correct call site, propagating state information accordingly.
We avoid infinite recursion by stopping the analysis if a callsite
appears more than analyzer-max-recursion-depth in a callstring
(defaulting to 2).
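For instance, in a directly recursive function (hypothetical example), the recursive callsite would otherwise appear an unbounded number of times in the call string; the depth limit caps how many frames the analyzer pushes before it stops exploring deeper calls.

```c
/* Each recursive call pushes this callsite onto the call string; with
   the default --param=analyzer-max-recursion-depth=2, the analyzer
   stops exploring once the callsite appears more than twice in the
   call string, rather than recursing forever.  (At runtime the
   recursion is bounded by 'n'.)  */
unsigned
factorial (unsigned n)
{
  if (n <= 1)
    return 1;
  return n * factorial (n - 1);  /* the recursive callsite */
}
```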
Nodes and edges in the exploded graph are called “exploded nodes” and
“exploded edges” and often referred to in the code as
enodes and eedges (especially when distinguishing them
from the snodes and sedges in the supergraph).
Each graph numbers its nodes, giving unique identifiers - supernodes are referred to throughout dumps in the form ‘SN: index’ and exploded nodes in the form ‘EN: index’ (e.g. ‘SN: 2’ and ‘EN: 29’).
The supergraph can be seen using -fdump-analyzer-supergraph.
The exploded graph can be seen using -fdump-analyzer-exploded-graph and other dump options. Exploded nodes are color-coded in the .dot output based on state-machine states to make it easier to see state changes at a glance.
There’s a tension between the precision of the analysis (tracking state separately along each path) and the risk of combinatorial explosion of the exploded graph.
For example, in general, given this CFG:
A
/ \
B C
\ /
D
/ \
E F
\ /
G
we want to avoid differences in state-tracking in B and C from leading to blow-up. If we don’t prevent state blowup, we end up with exponential growth of the exploded graph like this:
1:A
/ \
/ \
/ \
2:B 3:C
| |
4:D 5:D (2 exploded nodes for D)
/ \ / \
6:E 7:F 8:E 9:F
| | | |
10:G 11:G 12:G 13:G (4 exploded nodes for G)
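The diamond-of-diamonds CFG above corresponds to C code along these lines (hypothetical example):

```c
/* CFG shape: A is the entry and first condition, B/C are the two
   assignments to 'x', D is the join and second condition, E/F are the
   two assignments to 'y', and G is the final join.  Without state
   merging at D and G, the analyzer would track 2 exploded nodes at D
   and 4 at G, one per combination of paths.  */
int
diamond (int a, int b)
{
  int x, y;
  if (a)        /* A */
    x = 1;      /* B */
  else
    x = 2;      /* C */
  if (b)        /* D */
    y = 3;      /* E */
  else
    y = 4;      /* F */
  return x + y; /* G */
}
```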
Similar issues arise with loops.
To prevent this, we follow various approaches:
We avoid merging pairs of states that have state-machine differences, as these are the kinds of differences that are likely to be most interesting. So, for example, given:
if (condition)
ptr = malloc (size);
else
ptr = local_buf;
.... do things with 'ptr'
if (condition)
free (ptr);
...etc
then we end up with an exploded graph that looks like this:
if (condition)
/ T \ F
--------- ----------
/ \
ptr = malloc (size) ptr = local_buf
| |
copy of copy of
"do things with 'ptr'" "do things with 'ptr'"
with ptr: heap-allocated with ptr: stack-allocated
| |
if (condition) if (condition)
| known to be T | known to be F
free (ptr); |
\ /
-----------------------------
| ('ptr' is pruned, so states can be merged)
etc
where some duplication has occurred, but only for the places where the different paths are worth exploring separately.
Merging can be disabled via -fno-analyzer-state-merge.
Part of the state stored at an exploded_node is a region_model.
This is an implementation of the region-based ternary model described in
"A Memory Model for Static Analysis of C Programs"
(Zhongxing Xu, Ted Kremenek, and Jian Zhang).
A region_model encapsulates a representation of the state of
memory, with a store recording a binding between region
instances, to svalue instances. The bindings are organized into
clusters, where regions accessible via well-defined pointer arithmetic
are in the same cluster. The representation is graph-like because values
can be pointers to regions. It also stores a constraint_manager,
capturing relationships between the values.
Because each node in the exploded_graph has a region_model,
and each of the latter is graph-like, the exploded_graph is in some
ways a graph of graphs.
There are several “dump” functions for use when debugging the analyzer.
Consider this example C code:
void *
calls_malloc (size_t n)
{
void *result = malloc (1024);
return result; /* HERE */
}
void test (size_t n)
{
void *ptr = calls_malloc (n * 4);
/* etc. */
}
and the state at the point /* HERE */ for the interprocedural
analysis case where calls_malloc returns back to test.
Here’s an example of printing a program_state at /* HERE */,
showing the region_model within it, along with state for the
malloc state machine.
(gdb) break region_model::on_return
[..snip...]
(gdb) run
[..snip...]
(gdb) up
[..snip...]
(gdb) call state->dump()
State
├─ Region Model
│ ├─ Current Frame: frame: ‘calls_malloc’@2
│ ├─ Store
│ │ ├─ m_called_unknown_fn: false
│ │ ├─ frame: ‘test’@1
│ │ │ ╰─ _1: (INIT_VAL(n_2(D))*(size_t)4)
│ │ ╰─ frame: ‘calls_malloc’@2
│ │ ├─ result_4: &HEAP_ALLOCATED_REGION(27)
│ │ ╰─ _5: &HEAP_ALLOCATED_REGION(27)
│ ╰─ Dynamic Extents
│ ╰─ HEAP_ALLOCATED_REGION(27): (INIT_VAL(n_2(D))*(size_t)4)
╰─ ‘malloc’ state machine
╰─ 0x468cb40: &HEAP_ALLOCATED_REGION(27): unchecked ({free}) (‘result_4’)
Within the store, there are binding clusters for the SSA names for the
various local variables within frames for test and
calls_malloc. For example,
within the frame for test, the cluster for _1 is bound
to a binop_svalue representing n * 4, and
within the frame for calls_malloc, the cluster for result_4 is bound to a
region_svalue pointing at HEAP_ALLOCATED_REGION(27).
Additionally, this latter pointer has the unchecked state for the
malloc state machine indicating it hasn’t yet been checked against
NULL since the allocation call.
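A version of the function that performs the check might look like this (hypothetical sketch): comparing the result against NULL is what moves the pointer out of the unchecked state.

```c
#include <stdlib.h>

/* Hypothetical variant of calls_malloc: comparing 'result' against
   NULL transitions the pointer out of the malloc state machine's
   'unchecked' state on both outgoing paths, so the analyzer no longer
   treats the value as an unchecked allocation on either of them.  */
void *
calls_malloc_checked (size_t n)
{
  void *result = malloc (n);
  if (result == NULL)  /* the NULL check that 'unchecked' is awaiting */
    return NULL;
  return result;
}
```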
We also see that the state has captured the size of the heap-allocated region (“Dynamic Extents”).
This visualization can also be seen within the output of -fdump-analyzer-exploded-nodes-2 and -fdump-analyzer-exploded-nodes-3.
As well as the above visualizations of states, there are tree-like
visualizations for instances of svalue and region, showing
their IDs and how they are constructed from simpler symbols:
(gdb) break region_model::set_dynamic_extents
[..snip...]
(gdb) run
[..snip...]
(gdb) up
[..snip...]
(gdb) call size_in_bytes->dump()
(17): ‘long unsigned int’: binop_svalue(mult_expr: ‘*’)
├─ (15): ‘size_t’: initial_svalue
│  ╰─ m_reg: (12): ‘size_t’: decl_region(‘n_2(D)’)
│     ╰─ parent: (9): frame_region(‘test’, index: 0, depth: 1)
│        ╰─ parent: (1): stack region
│           ╰─ parent: (0): root region
╰─ (16): ‘size_t’: constant_svalue (‘4’)
i.e. that size_in_bytes is a binop_svalue expressing
the result of multiplying
PARM_DECL n_2(D) for the
parameter n within the frame for test by
4.
The above visualizations rely on the text_art::widget framework,
which performs significant work to lay out the output, so there is also
an earlier, simpler, form of dumping available. For states there is:
(gdb) call state->dump(eg.m_ext_state, true)
rmodel:
stack depth: 2
frame (index 1): frame: ‘calls_malloc’@2
frame (index 0): frame: ‘test’@1
clusters within frame: ‘test’@1
cluster for: _1: (INIT_VAL(n_2(D))*(size_t)4)
clusters within frame: ‘calls_malloc’@2
cluster for: result_4: &HEAP_ALLOCATED_REGION(27)
cluster for: _5: &HEAP_ALLOCATED_REGION(27)
m_called_unknown_fn: FALSE
constraint_manager:
equiv classes:
constraints:
dynamic_extents:
HEAP_ALLOCATED_REGION(27): (INIT_VAL(n_2(D))*(size_t)4)
malloc:
0x468cb40: &HEAP_ALLOCATED_REGION(27): unchecked ({free}) (‘result_4’)
or for region_model just:
(gdb) call state->m_region_model->debug()
stack depth: 2
frame (index 1): frame: ‘calls_malloc’@2
frame (index 0): frame: ‘test’@1
clusters within frame: ‘test’@1
cluster for: _1: (INIT_VAL(n_2(D))*(size_t)4)
clusters within frame: ‘calls_malloc’@2
cluster for: result_4: &HEAP_ALLOCATED_REGION(27)
cluster for: _5: &HEAP_ALLOCATED_REGION(27)
m_called_unknown_fn: FALSE
constraint_manager:
equiv classes:
constraints:
dynamic_extents:
HEAP_ALLOCATED_REGION(27): (INIT_VAL(n_2(D))*(size_t)4)
and for instances of svalue and region there is this
older dump implementation, which takes a bool simple flag
controlling the verbosity of the dump:
(gdb) call size_in_bytes->dump(true)
(INIT_VAL(n_2(D))*(size_t)4)
(gdb) call size_in_bytes->dump(false)
binop_svalue (mult_expr,
  initial_svalue(‘size_t’, decl_region(frame_region(‘test’, index: 0, depth: 1), ‘size_t’, ‘n_2(D)’)),
  constant_svalue(‘size_t’, 4))
We need to explain to the user what the problem is, and to persuade them
that there really is a problem. Hence having a diagnostics::paths::path
isn’t just an incidental detail of the analyzer; it’s required.
Paths ought to be interprocedurally valid and feasible.
Without state-merging, all paths in the exploded graph are feasible (in terms of constraints being satisfied). With state-merging, paths in the exploded graph can be infeasible.
We collate warnings and only emit them for the simplest path; e.g. for a bug in a utility function with lots of routes to calling it, we only emit the simplest path (which could be intraprocedural, if it can be reproduced without a caller).
We thus want to find the shortest feasible path through the exploded graph from the origin to the exploded node at which the diagnostic was saved. Unfortunately, if we simply find the shortest such path and check if it’s feasible we might falsely reject the diagnostic, as there might be a longer path that is feasible. Examples include the cases where the diagnostic requires us to go at least once around a loop for a later condition to be satisfied, or where for a later condition to be satisfied we need to enter a suite of code that the simpler path skips.
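The loop case can be sketched as follows (hypothetical example): the shortest path to the final free skips the loop, but along that path the guard on the first free is unsatisfiable, so only a longer path that iterates the loop makes the double-free feasible.

```c
#include <stdlib.h>

/* The shortest path to the final 'free' takes zero loop iterations,
   but on that path 'went_around' is still 0, so the guarded first
   'free' did not happen and no double-free occurs: the shortest path
   is infeasible for the diagnostic.  A feasible path must go around
   the loop at least once (n > 0), and is therefore longer.  */
void
needs_a_loop_iteration (int n, char *p)
{
  int went_around = 0;
  for (int i = 0; i < n; i++)
    went_around = 1;
  if (went_around)
    free (p);
  free (p);  /* double-free, but only on paths where n > 0 */
}
```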
We attempt to find the shortest feasible path to each diagnostic by first constructing a “trimmed graph” from the exploded graph, containing only those nodes and edges from which there are paths to the target node, and using Dijkstra’s algorithm to order the trimmed nodes by minimal distance to the target.
We then use a worklist to iteratively build a “feasible graph” (actually a tree), capturing the pertinent state along each path, in which every path to a “feasible node” is feasible by construction, restricting ourselves to the trimmed graph to ensure we stay on target, and ordering the worklist so that the first feasible path we find to the target node is the shortest possible path. Hence we start by trying the shortest possible path, but if that fails, we explore progressively longer paths, eventually trying iterations through loops. The exploration is captured in the feasible_graph, which can be dumped as a .dot file via -fdump-analyzer-feasibility to visualize the exploration. The indices of the feasible nodes show the order in which they were created. We effectively explore the tree of feasible paths in order of shortest path until we either find a feasible path to the target node, or hit a limit and give up.
This is something of a brute-force approach, but the trimmed graph hopefully keeps the complexity manageable.
This algorithm can be disabled (for debugging purposes) via -fno-analyzer-feasibility, which simply uses the shortest path, and notes if it is infeasible.
The above gives us a shortest feasible exploded_path through the
exploded_graph (a list of exploded_edge *). We use this
exploded_path to build a diagnostics::paths::path (a list of
events for the diagnostic subsystem) - specifically a
checker_path.
Having built the checker_path, we prune it to try to eliminate
events that aren’t relevant, to minimize how much the user has to read.
After pruning, we notify each event in the path of its ID and record the
IDs of interesting events, allowing for events to refer to other events
in their descriptions. The pending_diagnostic class has various
vfuncs to support emitting more precise descriptions, so that e.g.
a return_event might use
returning possibly-NULL pointer to 'make_obj' from 'allocator'
to make it clearer how the unchecked value moves
from callee back to caller, a double-free diagnostic might use
second 'free' here; first 'free' was at (3)
and a use-after-free might use
use after 'free' here; memory was freed at (2)
At this point we can emit the diagnostic.