|
1 | 1 | // Copyright Kani Contributors
|
2 | 2 | // SPDX-License-Identifier: Apache-2.0 OR MIT
|
3 | 3 | //! GOTO binary serializer.
|
| 4 | +//! |
| 5 | +//! # Design overview |
| 6 | +//! |
| 7 | +//! When saving a [SymbolTable] to a binary file, the [Irep] describing each |
| 8 | +//! symbol's type, value and source location are structurally hashed and |
| 9 | +//! uniquely numbered so that structurally identical [Irep] only get written |
| 10 | +//! in full to the file the first time they are encountered and that ulterior |
| 11 | +//! occurrences are referenced by their unique number instead. |
| 12 | +//! The key concept at play is that of a numbering, ie a function that assigns |
| 13 | +//! numbers to values of a given type. |
| 14 | +//! |
| 15 | +//! The [IrepNumbering] struct offers high-level methods to number |
| 16 | +//! [InternedString], [IrepId] and [Irep] values: |
| 17 | +//! - [InternedString] objects get mapped to [NumberedString] objects based on |
| 18 | +//! the characters they contain. |
| 19 | +//! - [IrepId] objects get mapped to [NumberedString] objects based on the |
| 20 | +//! characters of their string representation, in the same number space |
| 21 | +//! as [InternedString]. |
| 22 | +//! - [Irep] objects get mapped to [NumberedIrep] based on: |
| 23 | +//! + the unique numbers assigned to their [Irep::id] attribute, |
| 24 | +//! + the unique numbers of [Irep] found in their [Irep::sub] attribute, |
| 25 | +//! + the pairs of unique numbers assigned to the ([IrepId],[Irep]) pairs |
| 26 | +//! found in their [Ipre::named_sub] attribute. |
| 27 | +//! |
| 28 | +//! In order to assign the same number to structurally identical [Irep] objects, |
| 29 | +//! [IrepNumbering] essentially implements a cache where each [NumberedIrep] is |
| 30 | +//! keyed under an [IrepKey] that describes its internal structure. |
| 31 | +//! |
| 32 | +//! An [IrepKey] is simply the vector of unique numbers describing the |
| 33 | +//! `id`, `sub` and `named_sub` attributes of a [Irep]. |
| 34 | +//! |
| 35 | +//! A [NumberedIrep] is conceptually a pair made of the [IrepKey] itself and the |
| 36 | +//! unique number assigned to that key. |
| 37 | +//! |
| 38 | +//! The cache implemented by [IrepNumbering] is bidirectional. It lets you |
| 39 | +//! compute the [NumberedIrep] of an [Irep], and lets you fetch a numbered |
| 40 | +//! [NumberedIrep] from its unique number. |
| 41 | +//! |
| 42 | +//! In practice: |
| 43 | +//! - the forward directon from [IrepKey] to unique numbers is implemented using a `HashMap<IrepKey,usize>` |
| 44 | +//! - the inverse direction from unique numbers to [NumberedIrep] is implemented usign a `Vec<NumberedIrep>` |
| 45 | +//! called the `index` that stores [NumberedIrep] under their unique number. |
| 46 | +//! |
| 47 | +//! Earlier we said that an [NumberedIrep] is conceptually a pair formed of |
| 48 | +//! an [IrepKey] and its unique number. It is represented using only |
| 49 | +//! a pair formed of a `usize` representing the unique number, and a `usize` |
| 50 | +//! representing the index at which the key can be found inside a single vector |
| 51 | +//! of type `Vec<usize>` called `keys` where all [IrepKey] are concatenated when |
| 52 | +//! they first get numbered. The inverse map of keys is represented this way |
| 53 | +//! because the Rust hash map that implements the forward cache owns |
| 54 | +//! its keys which would make it difficult to store keys references in inverse |
| 55 | +//! cache, which would introduce circular dependencies and require `Rc` and |
| 56 | +//! liftetimes annotations everywhere. |
| 57 | +//! |
| 58 | +//! Numberig an [Irep] consists in recursively traversing it and numbering its |
| 59 | +//! contents in a bottom-up fashion, then assembling its [IrepKey] and querying |
| 60 | +//! the cache to either return an existing [NumberedIrep] or creating a new one |
| 61 | +//! (in case that key has never been seen before). |
| 62 | +//! |
| 63 | +//! The [GotoBinarySerializer] internally uses a [IrepNumbering] and a cache |
| 64 | +//! of [NumberedIrep] and [NumberedString] it has already written to file. |
| 65 | +//! |
| 66 | +//! When given an [InternedString], [IrepId] or [Irep] to serialize, |
| 67 | +//! the [GotoBinarySerializer] first numbers that object using its internal |
| 68 | +//! [IrepNumbering] instance. Then it looks up that unique number in its cache |
| 69 | +//! of already written objects. If the object was seen before, only the unique |
| 70 | +//! number of that object is written to file. If the object was never seen |
| 71 | +//! before, then the unique number of that object is written to file, followed |
| 72 | +//! by the objects describing its contents (themselves only being written fully |
| 73 | +//! if they have never been seen before, or only referenced if they have been |
| 74 | +//! seen before, etc.) |
| 75 | +//! |
| 76 | +//! The [GotoBinaryDeserializer] also uses an [IrepNumbering] and a cache |
| 77 | +//! of [NumberedIrep] and [NumberedString] it has already read from file. |
| 78 | +//! Dually to the serializer, it will only attempt to decode the contents of an |
| 79 | +//! object from the byte stream on the first occurrence. |
4 | 80 |
|
5 | 81 | use crate::irep::{Irep, IrepId, Symbol, SymbolTable};
|
6 | 82 | use crate::{InternString, InternedString};
|
@@ -40,82 +116,6 @@ pub fn read_goto_binary_file(filename: &Path) -> io::Result<()> {
|
40 | 116 | deserializer.read_file()
|
41 | 117 | }
|
42 | 118 |
|
43 |
| -/// # Design overview |
44 |
| -/// |
45 |
| -/// When saving a [SymbolTable] to a binary file, the [Irep] describing each |
46 |
| -/// symbol's type, value and source location are structurally hashed and |
47 |
| -/// uniquely numbered so that structurally identical [Irep] only get written |
48 |
| -/// in full to the file the first time they are encountered and that ulterior |
49 |
| -/// occurrences are referenced by their unique number instead. |
50 |
| -/// The key concept at play is that of a numbering, ie a function that assigns |
51 |
| -/// numbers to values of a given type. |
52 |
| -/// |
53 |
| -/// The [IrepNumbering] struct offers high-level methods to number |
54 |
| -/// [InternedString], [IrepId] and [Irep] values: |
55 |
| -/// - [InternedString] objects get mapped to [NumberedString] objects based on |
56 |
| -/// the characters they contain. |
57 |
| -/// - [IrepId] objects get mapped to [NumberedString] objects based on the |
58 |
| -/// characters of their string representation, in the same number space |
59 |
| -/// as [InternedString]. |
60 |
| -/// - [Irep] objects get mapped to [NumberedIrep] based on: |
61 |
| -/// + the unique numbers assigned to their [Irep::id] attribute, |
62 |
| -/// + the unique numbers of [Irep] found in their [Irep::sub] attribute, |
63 |
| -/// + the pairs of unique numbers assigned to the ([IrepId],[Irep]) pairs |
64 |
| -/// found in their [Ipre::named_sub] attribute. |
65 |
| -/// |
66 |
| -/// In order to assign the same number to structurally identical [Irep] objects, |
67 |
| -/// [IrepNumbering] essentially implements a cache where each [NumberedIrep] is |
68 |
| -/// keyed under an [IrepKey] that describes its internal structure. |
69 |
| -/// |
70 |
| -/// An [IrepKey] is simply the vector of unique numbers describing the |
71 |
| -/// `id`, `sub` and `named_sub` attributes of a [Irep]. |
72 |
| -/// |
73 |
| -/// A [NumberedIrep] is conceptually a pair made of the [IrepKey] itself and the |
74 |
| -/// unique number assigned to that key. |
75 |
| -/// |
76 |
| -/// The cache implemented by [IrepNumbering] is bidirectional. It lets you |
77 |
| -/// compute the [NumberedIrep] of an [Irep], and lets you fetch a numbered |
78 |
| -/// [NumberedIrep] from its unique number. |
79 |
| -/// |
80 |
| -/// In practice: |
81 |
| -/// - the forward directon from [IrepKey] to unique numbers is implemented using a `HashMap<IrepKey,usize>` |
82 |
| -/// - the inverse direction from unique numbers to [NumberedIrep] is implemented usign a `Vec<NumberedIrep>` |
83 |
| -/// called the `index` that stores [NumberedIrep] under their unique number. |
84 |
| -/// |
85 |
| -/// Earlier we said that an [NumberedIrep] is conceptually a pair formed of |
86 |
| -/// an [IrepKey] and its unique number. It is represented using only |
87 |
| -/// a pair formed of a `usize` representing the unique number, and a `usize` |
88 |
| -/// representing the index at which the key can be found inside a single vector |
89 |
| -/// of type `Vec<usize>` called `keys` where all [IrepKey] are concatenated when |
90 |
| -/// they first get numbered. The inverse map of keys is represented this way |
91 |
| -/// because the Rust hash map that implements the forward cache owns |
92 |
| -/// its keys which would make it difficult to store keys references in inverse |
93 |
| -/// cache, which would introduce circular dependencies and require `Rc` and |
94 |
| -/// liftetimes annotations everywhere. |
95 |
| -/// |
96 |
| -/// Numberig an [Irep] consists in recursively traversing it and numbering its |
97 |
| -/// contents in a bottom-up fashion, then assembling its [IrepKey] and querying |
98 |
| -/// the cache to either return an existing [NumberedIrep] or creating a new one |
99 |
| -/// (in case that key has never been seen before). |
100 |
| -/// |
101 |
| -/// The [GotoBinarySerializer] internally uses a [IrepNumbering] and a cache |
102 |
| -/// of [NumberedIrep] and [NumberedString] it has already written to file. |
103 |
| -/// |
104 |
| -/// When given an [InternedString], [IrepId] or [Irep] to serialize, |
105 |
| -/// the [GotoBinarySerializer] first numbers that object using its internal |
106 |
| -/// [IrepNumbering] instance. Then it looks up that unique number in its cache |
107 |
| -/// of already written objects. If the object was seen before, only the unique |
108 |
| -/// number of that object is written to file. If the object was never seen |
109 |
| -/// before, then the unique number of that object is written to file, followed |
110 |
| -/// by the objects describing its contents (themselves only being written fully |
111 |
| -/// if they have never been seen before, or only referenced if they have been |
112 |
| -/// seen before, etc.) |
113 |
| -/// |
114 |
| -/// The [GotoBinaryDeserializer] also uses an [IrepNumbering] and a cache |
115 |
| -/// of [NumberedIrep] and [NumberedString] it has already read from file. |
116 |
| -/// Dually to the serializer, it will only attempt to decode the contents of an |
117 |
| -/// object from the byte stream on the first occurrence. |
118 |
| -
|
119 | 119 | /// A numbered [InternedString]. The number is guaranteed to be in [0,N].
|
120 | 120 | /// Had to introduce this indirection because [InternedString] does not let you
|
121 | 121 | /// access its unique id, so we have to build one ourselves.
|
|
0 commit comments