Skip to content

Commit b081b01

Browse files
Update book
1 parent 27a6ac0 commit b081b01

File tree

9 files changed

+945
-14
lines changed

9 files changed

+945
-14
lines changed

book/src/SUMMARY.md

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,8 +7,11 @@
77
- [Rules](./rules.md)
88
- [Atoms](./rules/atoms.md)
99
- [Relations](./rules/relations.md)
10+
- [Control Flow Graph](./rules/cfg.md)
11+
- [Move Paths](./rules/paths.md)
1012
- [Initialization analysis](./rules/initialization.md)
1113
- [Liveness analysis](./rules/liveness.md)
12-
- [Loan analysis](./rules/loans.md)
14+
- [Loan analysis](./rules/naive.md)
15+
- [Optimized Variant](./rules/opt.md)
1316
- [Testing Polonius](./testing.md)
1417
- [See also](./see_also.md)

book/src/rules/atoms.md

Lines changed: 22 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,5 @@
11
# Atoms
22

3-
Polonius defines the following **atoms**. To Polonius, these are
4-
opaque identifiers that identify particular things within the input
5-
program (literally they are newtype'd integers). Their meaning and
6-
relationships come entirely from the input relations.
7-
8-
## Example
9-
103
We'll use this snippet of Rust code to illustrate the various kinds of
114
atoms that can exist.
125

@@ -17,13 +10,15 @@ let z = x.0;
1710
drop(y);
1811
```
1912

20-
## Variables
13+
#### `Var`
2114

2215
A **variable** represents a user variable defined by the Rust source
2316
code. In our snippet, `x`, `y`, and `z` are variables. Other kinds of
2417
variables include parameters.
2518

26-
## Path
19+
.type Var <: unsigned
20+
21+
#### `Path`
2722

2823
A **path** indicates a path through memory to a memory location --
2924
these roughly correspond to **places** in MIR, although we only
@@ -49,7 +44,9 @@ atom P1 for the path `x` and another atom P2 for the path `x.0`.
4944
These atoms are related to one another through the `path_parent`
5045
relation.
5146

52-
## Node
47+
.type Path <: unsigned
48+
49+
#### `Node`
5350

5451
Nodes are, well, *nodes* in the control-flow graph. They are related
5552
to one another by the `cfg_edge` relation.
@@ -60,17 +57,30 @@ begun executing -- the other is called the "mid node" -- which
6057
represents the point where S "takes effect". Each start node has
6158
exactly one successor, the mid node.
6259

63-
## Loans
60+
.type Node <: unsigned
61+
62+
#### `Block`
63+
64+
.type Block <: unsigned
65+
66+
#### `Location`
67+
68+
.type Location = [ block: Block, stmt: unsigned, mid: symbol ]
69+
70+
#### `Loan`
6471

6572
A **loan** represents some borrow that occurs in the source. Each
6673
loan has an associated path that was borrowed along with a mutability.
6774
So, in our example, there would be a single loan, for the `&x.1`
6875
expression.
6976

70-
## Origins
77+
.type Loan <: unsigned
78+
79+
#### `Origin`
7180

7281
An **origin** is what it typically called in Rust a **lifetime**. In
7382
Polonius, an **origin** refers to the set of loans from which a
7483
reference may have been created.
7584

85+
.type Origin <: unsigned
7686

book/src/rules/cfg.md

Lines changed: 47 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,47 @@
1+
# Control-flow graph
2+
3+
## Nodes
4+
5+
#### `cfg_edge`
6+
7+
Indicates that an edge exists between `source` and `target`
8+
9+
.decl cfg_edge(sourceNode: Node, targetNode: Node)
10+
.input cfg_edge
11+
12+
#### `cfg_node`
13+
14+
Enumerates all nodes (note that this approach implies that a single node
15+
graph is essentially not a thing).
16+
17+
.decl cfg_node(p: Node)
18+
19+
cfg_node(p) :- cfg_edge(p, _).
20+
cfg_node(p) :- cfg_edge(_, p).
21+
22+
## Basic Blocks
23+
24+
#### `bb_edge`
25+
26+
.decl bb_edge(src: Block, targ: Block)
27+
.input bb_edge
28+
29+
#### `node_is_loc`
30+
31+
.decl node_is_loc(node: Node, loc: Location)
32+
.input node_is_loc
33+
34+
#### `precedes_in_block`
35+
36+
.decl precedes_in_block(loc1: Location, loc2: Location) inline
37+
38+
precedes_in_block([bb, stmt1, mid1], [bb, stmt2, mid2]) :-
39+
stmt2 > stmt1; (stmt1 = stmt2, mid2 = "Mid", mid1 = "Start").
40+
41+
#### `succeeds_in_block`
42+
43+
.decl succeeds_in_block(loc1: Location, loc2: Location) inline
44+
45+
succeeds_in_block([bb, stmt1, mid1], [bb, stmt2, mid2]) :-
46+
stmt2 > stmt1; (stmt1 = stmt2, mid1 = "Mid", mid2 = "Start").
47+

book/src/rules/init.md

Lines changed: 82 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,82 @@
1+
# Initialization
2+
3+
## Relations
4+
5+
#### `path_maybe_initialized_on_exit`
6+
7+
Here we compute the set of paths that *may* contain a value on exit from
8+
each given node in the CFG. This is used later as part of the liveness
9+
analysis. In particular, if a value has been moved, then its drop is a
10+
no-op.
11+
12+
This is not used to compute move errors -- it would be too "optimistic",
13+
since it only computes if a value *may* be initialized. See the next section
14+
on computing *uninitialization*.
15+
16+
.decl path_maybe_initialized_on_exit(path: Path, node: Node) brie
17+
18+
path_maybe_initialized_on_exit(path, node) :-
19+
path_assigned_at(path, node).
20+
21+
path_maybe_initialized_on_exit(path, targetNode) :-
22+
path_maybe_initialized_on_exit(path, sourceNode),
23+
cfg_edge(sourceNode, targetNode),
24+
!path_moved_at(path, targetNode).
25+
26+
#### `var_maybe_partly_initialized_on_exit`
27+
28+
We also compute which **variables** may be initialized (or at least partly
29+
initialized). Drops for variables that are not even partly initialized are
30+
known to be a no-op.
31+
32+
.decl var_maybe_partly_initialized_on_exit(var: Var, node: Node)
33+
34+
var_maybe_partly_initialized_on_exit(var, node) :-
35+
path_maybe_initialized_on_exit(path, node),
36+
path_begins_with_var(path, var).
37+
38+
#### `path_maybe_uninitialized_on_exit`
39+
40+
Here we compute the set of paths that are maybe *uninitialized* on exit from
41+
a node. Naturally, it would be illegal to access a path that is maybe
42+
uninitialized.
43+
44+
We compute "maybe uninitialized" because it is easier than computing "must
45+
be initialized" (though they are equivalent), since the latter requires
46+
intersection, which is not available in "core datalog". It may make sense --
47+
as an optimization -- to try and convert to intersection, although it is
48+
debatable which will result in more tuples overall.
49+
50+
.decl path_maybe_uninitialized_on_exit(path: Path, node: Node) brie
51+
52+
path_maybe_uninitialized_on_exit(path, node) :-
53+
path_moved_at(path, node).
54+
55+
path_maybe_uninitialized_on_exit(path, targetNode) :-
56+
path_maybe_uninitialized_on_exit(path, sourceNode),
57+
cfg_edge(sourceNode, targetNode),
58+
!path_assigned_at(path, targetNode).
59+
60+
#### `path_maybe_accessed_later`
61+
62+
.decl path_maybe_accessed_later(path: Path, node: Node) brie
63+
64+
path_maybe_accessed_later(path, node) :-
65+
path_accessed_at(path, node).
66+
67+
path_maybe_accessed_later(path, src) :-
68+
path_maybe_accessed_later(path, dst),
69+
cfg_edge(src, dst).
70+
71+
## Errors
72+
73+
#### `move_errors`
74+
75+
.decl move_errors(path: Path, node: Node)
76+
.output move_errors
77+
78+
move_errors(path, targetNode) :-
79+
path_maybe_uninitialized_on_exit(path, sourceNode),
80+
cfg_edge(sourceNode, targetNode),
81+
path_accessed_at(path, targetNode).
82+

book/src/rules/liveness.md

Lines changed: 117 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,119 @@
11
# Liveness analysis
22

3-
**These rules are not yet described.**
3+
The role of the liveness computation is to figure out, for each cfg node,
4+
which variables may be accessed at some point in the future. We also
5+
distinguish between variables that may be accessed in general and those that
6+
may only be dropped. This is because a "full access" may potentially
7+
dereference any reference found in the variable, whereas a drop is more
8+
limited in its effects.
9+
10+
One interesting wrinkle around drops is that we also need to consider the
11+
initialization state of each variable. This is because `Drop` statements can
12+
be added for variables which are never initialized, or whose values have
13+
been moved. Such statements are considered no-ops in MIR.
14+
15+
## Inputs
16+
17+
#### `var_used_at`
18+
19+
Variable is used at the given CFG node
20+
21+
.decl var_used_at(variable: Var, node: Node)
22+
.input var_used_at
23+
24+
#### `var_defined_at`
25+
26+
Variable is defined (overwritten) at the given CFG node
27+
28+
.decl var_defined_at(variable: Var, node: Node)
29+
.input var_defined_at
30+
31+
#### `var_dropped_at`
32+
33+
Variable is dropped at this cfg node
34+
35+
.decl var_dropped_at(variable: Var, node: Node)
36+
.input var_dropped_at
37+
38+
#### `use_of_var_derefs_origin`
39+
40+
References with the given origin may be
41+
dereferenced when the variable is used.
42+
43+
In rustc, we generate this whenever the
44+
type of the variable includes the given
45+
origin.
46+
47+
.decl use_of_var_derefs_origin(variable: Var, origin: Origin)
48+
.input use_of_var_derefs_origin
49+
50+
References with the given origin may be
51+
dereferenced when the variable is dropped
52+
53+
#### `drop_of_var_derefs_origin`
54+
55+
In rustc, we generate this by examining the type
56+
and taking into account various
57+
unstable attributes. It is always a subset
58+
of `use_of_var_derefs_origin`.
59+
60+
.decl drop_of_var_derefs_origin(variable: Var, origin: Origin)
61+
.input drop_of_var_derefs_origin
62+
63+
## Relations
64+
65+
#### `var_live_on_entry`
66+
67+
Variables that are live on entry.
68+
69+
.decl var_live_on_entry(var: Var, node: Node)
70+
71+
var_live_on_entry(var, node) :-
72+
var_used_at(var, node).
73+
74+
var_live_on_entry(var, sourceNode) :-
75+
var_live_on_entry(var, targetNode),
76+
cfg_edge(sourceNode, targetNode),
77+
!var_defined_at(var, sourceNode).
78+
79+
#### `var_drop_live_on_entry`
80+
81+
Variables that are "drop live" on entry.
82+
83+
The initial rule is that, when a variable is dropped, that makes it
84+
drop-live -- unless we know that the variable is fully uninitialized, in
85+
which case the drop is a no-op.
86+
87+
**Optimization:** In rustc, we compute drop-live only up to the point where
88+
something becomes "use-live". We could do the same here by adding some `!`
89+
checks against `var_live_on_entry`, though it would require stratification
90+
in the datalog (not a problem).
91+
92+
.decl var_drop_live_on_entry(var: Var, node: Node)
93+
94+
var_drop_live_on_entry(var, targetNode) :-
95+
var_dropped_at(var, targetNode),
96+
cfg_edge(sourceNode, targetNode),
97+
var_maybe_partly_initialized_on_exit(var, sourceNode).
98+
99+
var_drop_live_on_entry(var, sourceNode) :-
100+
var_drop_live_on_entry(var, targetNode),
101+
cfg_edge(sourceNode, targetNode),
102+
!var_defined_at(var, sourceNode),
103+
var_maybe_partly_initialized_on_exit(var, sourceNode).
104+
105+
#### `origin_live_on_entry`
106+
107+
An origin is live at the node N if some reference with that origin may be
108+
dereferenced in the future.
109+
110+
.decl origin_live_on_entry(origin: Origin, node: Node)
111+
112+
origin_live_on_entry(origin, node) :-
113+
var_live_on_entry(var, node),
114+
use_of_var_derefs_origin(var, origin).
115+
116+
origin_live_on_entry(origin, node) :-
117+
var_drop_live_on_entry(var, node),
118+
drop_of_var_derefs_origin(var, origin).
119+

0 commit comments

Comments
 (0)