Commit adba0e7

Merge pull request #185 from Pennycook/doc/2.0.0-features
Add documentation for new features
2 parents 435636f + 5450331 commit adba0e7

4 files changed

+258
-17
lines changed

docs/source/analysis.rst

Lines changed: 100 additions & 0 deletions
@@ -104,6 +104,100 @@ platforms. Plugging these numbers into the equation for code divergence gives
104104
not to (more on that later).
105105

106106

107+
Running ``cbi-tree``
108+
####################
109+
110+
Running ``codebasin`` provides an overview of divergence and coverage, which
111+
can be useful when we want to familiarize ourselves with a new code base,
112+
compare the impact of different code structures upon certain metrics, or track
113+
specialization metrics over time. However, it doesn't provide any *actionable*
114+
insight into how to improve a code base.
115+
116+
To understand how much specialization exists in each source file, we can
117+
replace ``codebasin`` with ``cbi-tree``::
118+
119+
$ cbi-tree analysis.toml
120+
121+
This command performs the same analysis as ``codebasin``, but produces a tree
122+
annotated with information about which files contain specialization:
123+
124+
.. code-block:: text
125+
:emphasize-lines: 8,9,11,16
126+
127+
Legend:
128+
A: cpu
129+
B: gpu
130+
131+
Columns:
132+
[Platforms | SLOC | Coverage (%) | Avg. Coverage (%)]
133+
134+
[AB | 33 | 93.94 | 72.73] o /home/username/code-base-investigator/docs/sample-code-base/src/
135+
[AB | 13 | 100.00 | 92.31] ├── main.cpp
136+
[A- | 7 | 85.71 | 42.86] ├─o cpu/
137+
[A- | 7 | 85.71 | 42.86] │ └── foo.cpp
138+
[AB | 6 | 100.00 | 100.00] ├─o third-party/
139+
[AB | 1 | 100.00 | 100.00] │ ├── library.h
140+
[AB | 5 | 100.00 | 100.00] │ └── library.cpp
141+
[-B | 7 | 85.71 | 42.86] └─o gpu/
142+
[-B | 7 | 85.71 | 42.86] └── foo.cpp
143+
144+
.. tip::
145+
146+
Running ``cbi-tree`` in a modern terminal environment produces colored
147+
output to improve usability for large code bases.
148+
149+
Each node in the tree represents a source file or directory in the code
150+
base and is annotated with four pieces of information:
151+
152+
1. **Platforms**
153+
154+
The set of platforms that use the file or directory.
155+
156+
2. **SLOC**
157+
158+
The number of source lines of code (SLOC) in the file or directory.
159+
160+
3. **Coverage (%)**
161+
162+
The amount of code in the file or directory that is used by at least one platform,
163+
as a percentage of SLOC.
164+
165+
4. **Avg. Coverage (%)**
166+
167+
The amount of code in the file or directory that is used by each platform,
168+
on average, as a percentage of SLOC.
169+
170+
The root of the tree represents the entire code base, and so the values in
171+
the annotations match the ``codebasin`` results: two platforms (``A`` and
172+
``B``) use the directory, there are 33 lines in total, 93.94% of those lines
173+
(i.e., 31 lines) are used by at least one platform, and each platform uses
174+
72.73% of those lines (i.e., 24 lines) on average. By walking the tree, we can
175+
break these numbers down across the individual files and directories in the
176+
code base.
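
The arithmetic behind the root annotation can be reproduced with a short Python sketch. It is only an illustration, not CBI's implementation: the function ``coverage_metrics`` and the ``lines`` helper are hypothetical, and the choice of *which* line in each file is unused is an assumption (only the line counts come from the tree above).

```python
def coverage_metrics(sloc, used_by):
    """Return (coverage %, average coverage %) for one file or directory.

    sloc    -- total source lines of code
    used_by -- dict mapping platform name to the set of line ids it uses
    """
    used_by_any = set().union(*used_by.values())
    coverage = 100.0 * len(used_by_any) / sloc
    total_used = sum(len(used) for used in used_by.values())
    average = 100.0 * total_used / (len(used_by) * sloc)
    return round(coverage, 2), round(average, 2)


def lines(name, count):
    """Hypothetical line ids: (name, 1), (name, 2), ..., (name, count)."""
    return {(name, i) for i in range(1, count + 1)}


main = lines("main.cpp", 13)
cpu_foo = lines("cpu/foo.cpp", 7)
gpu_foo = lines("gpu/foo.cpp", 7)
third_party = lines("library.h", 1) | lines("library.cpp", 5)

# Assumption: each platform skips one (arbitrarily chosen) line of main.cpp,
# and one line of each foo.cpp is unused by every platform.
cpu = (main - {("main.cpp", 13)}) | (cpu_foo - {("cpu/foo.cpp", 7)}) | third_party
gpu = (main - {("main.cpp", 1)}) | (gpu_foo - {("gpu/foo.cpp", 7)}) | third_party

root = coverage_metrics(33, {"cpu": cpu, "gpu": gpu})
main_cpp = coverage_metrics(13, {"cpu": cpu & main, "gpu": gpu & main})
cpu_foo_cpp = coverage_metrics(7, {"cpu": cpu & cpu_foo, "gpu": gpu & cpu_foo})

print(root)         # (93.94, 72.73)
print(main_cpp)     # (100.0, 92.31)
print(cpu_foo_cpp)  # (85.71, 42.86)
```

Coverage counts a line once if any platform uses it, whereas average coverage counts each platform's lines separately and divides by the number of platforms, which is why the two values differ for specialized code.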
177+
178+
Starting with ``main.cpp``, we can see that it is used by both platforms
179+
(``A`` and ``B``), and that 100% of the 13 lines in the file are used by at
180+
least one platform. However, the average coverage is only 92.31%, reflecting
181+
that each platform uses only 12 of those lines.
182+
183+
Turning our attention to ``cpu/foo.cpp`` and ``gpu/foo.cpp``, we can see
184+
that they are each specialized for one platform (``A`` and ``B``,
185+
respectively). The coverage for both files is only 85.71% (i.e., 6 of the 7
186+
lines), which tells us that both files contain some unused code (i.e., 1 line).
187+
The average coverage of 42.86% highlights the extent of the specialization.
188+
189+
.. tip::
190+
191+
Looking at average coverage is the best way to identify highly specialized
192+
regions of code. As the number of platforms targeted by a code base
193+
increases, the average coverage for files used by only a small number of
194+
platforms will approach zero.
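
To see why average coverage shrinks as platforms are added, consider a file like ``cpu/foo.cpp`` (6 of 7 lines used, by a single platform). The sketch below uses a hypothetical helper, ``average_coverage``, and assumes every platform that uses the file uses the same lines:

```python
def average_coverage(used_lines, sloc, platforms_using, total_platforms):
    """Average coverage (%) of a file whose used lines are shared by
    `platforms_using` platforms and unused by all other platforms."""
    return round(
        100.0 * used_lines * platforms_using / (total_platforms * sloc), 2
    )


# One platform uses 6 of 7 lines; the code base targets 2, 4, 8, 16 platforms.
trend = [average_coverage(6, 7, 1, n) for n in (2, 4, 8, 16)]
print(trend)  # [42.86, 21.43, 10.71, 5.36]
```

The coverage of the file stays at 85.71% throughout, so only the average coverage column reveals that the file serves a shrinking fraction of the platforms.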
195+
196+
The remaining files all have a coverage of 100.00% and an average coverage
197+
of 100.00%. This is our ideal case: all of the code in the file is used by
198+
at least one platform, and all of the platforms use all of the code.
199+
200+
107201
Filtering Platforms
108202
###################
109203

@@ -125,3 +219,9 @@ platform as follows:
125219
.. code:: sh
126220
127221
$ codebasin -p cpu analysis.toml
222+
223+
or
224+
225+
.. code:: sh
226+
227+
$ cbi-tree -p cpu analysis.toml

docs/source/cmd.rst

Lines changed: 72 additions & 1 deletion
@@ -25,13 +25,15 @@ Command Line Interface
2525
``-q, --quiet``
2626
Decrease verbosity level.
2727

28+
``--debug``
29+
Enable debug mode.
30+
2831
``-R <report>``
2932
Generate a report of the specified type.
3033

3134
- ``summary``: code divergence information
3235
- ``clustering``: distance matrix and dendrogram
3336
- ``duplicates``: detected duplicate files
34-
- ``files``: information about individual files
3537

3638
``-x <pattern>, --exclude <pattern>``
3739
Exclude files matching this pattern from the code base.
@@ -41,3 +43,72 @@ Command Line Interface
4143
Include the specified platform in the analysis.
4244
May be specified multiple times.
4345
If not specified, all platforms will be included.
46+
47+
Tree Tool
48+
---------
49+
50+
The tree tool generates a visualization of the code base where each file and
51+
directory is annotated with information about platform usage and coverage.
52+
53+
.. code-block:: text
54+
55+
cbi-tree [-h] [--version] [-x <pattern>] [-p <platform>] [--prune] [-L <level>] <analysis-file>
56+
57+
**positional arguments:**
58+
59+
``analysis-file``
60+
TOML file describing the analysis to be performed, including the code base and platform descriptions.
61+
62+
**options:**
63+
64+
``-h, --help``
65+
Display help message and exit.
66+
67+
``--version``
68+
Display version information and exit.
69+
70+
``-x <pattern>, --exclude <pattern>``
71+
Exclude files matching this pattern from the code base.
72+
May be specified multiple times.
73+
74+
``-p <platform>, --platform <platform>``
75+
Include the specified platform in the analysis.
76+
May be specified multiple times.
77+
If not specified, all platforms will be included.
78+
79+
``--prune``
80+
Prune unused files from the tree.
81+
82+
``-L <level>, --levels <level>``
83+
Print only the specified number of levels.
84+
85+
Coverage Tool
86+
-------------
87+
88+
The coverage tool reads a JSON compilation database and generates a JSON
89+
coverage file that is suitable to be read by other tools.
90+
91+
.. code-block:: text
92+
93+
cbi-cov compute [-h] [-S <path>] [-x <pattern>] [-o <output path>] <input path>
94+
95+
**positional arguments:**
96+
97+
``input path``
98+
Path to compilation database JSON file.
99+
100+
**options:**
101+
102+
``-h, --help``
103+
Display help message and exit.
104+
105+
``-S <path>, --source-dir <path>``
106+
Path to source directory.
107+
108+
``-x <pattern>, --exclude <pattern>``
109+
Exclude files matching this pattern from the code base.
110+
May be specified multiple times.
111+
112+
``-o <output path>, --output <output path>``
113+
Path to coverage JSON file.
114+
If not specified, defaults to 'coverage.json'.
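
The input to ``cbi-cov compute`` is a JSON compilation database. As a rough sketch, the Python below writes a minimal database using the ``directory``, ``file`` and ``command`` fields of the Clang JSON compilation database format. The helper name, the source directory and the compile commands are all hypothetical placeholders; in practice the database is usually generated by the build system.

```python
import json
import os
import tempfile


def write_compilation_database(path, source_dir, commands):
    """Write one entry per (file, command) pair and return the entries."""
    entries = [
        {"directory": source_dir, "file": file, "command": command}
        for file, command in commands
    ]
    with open(path, "w") as f:
        json.dump(entries, f, indent=2)
    return entries


# Hypothetical paths and commands, for illustration only.
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "compile_commands.json")
entries = write_compilation_database(
    db_path,
    "/path/to/sample-code-base/src",
    [
        ("main.cpp", "g++ -c main.cpp"),
        ("cpu/foo.cpp", "g++ -c cpu/foo.cpp"),
    ],
)
```

Based on the synopsis above, such a file could then be passed to the tool as ``cbi-cov compute -S /path/to/sample-code-base/src -o coverage.json compile_commands.json``.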

docs/source/emulating-compiler-behavior.rst

Lines changed: 54 additions & 7 deletions
@@ -7,8 +7,8 @@ that are not reflected on the command line (such as their default include
77
paths, or compiler version macros).
88

99
If we believe (or already know!) that these behaviors will impact the
10-
divergence calculation for a code base, we can use a configuration file to
11-
instruct CBI to append additional options when emulating certain compilers.
10+
CBI's analysis of a code base, we can use a configuration file to append
11+
additional options when emulating certain compilers.
1212

1313
.. attention::
1414

@@ -76,12 +76,12 @@ but there is not enough information to decide what the value of
7676
:code:`__GNUC__` should be.
7777

7878

79-
Defining Behaviors
80-
------------------
79+
Defining Implicit Options
80+
-------------------------
8181

82-
``codebasin`` searches for a file called ``.cbi/config``, and uses the
83-
information found in that file to determine implicit compiler behavior. Each
84-
compiler definition is a TOML `table`_, of the form shown below:
82+
CBI searches for a file called ``.cbi/config``, and uses the information found
83+
in that file to determine implicit compiler options. Each compiler definition
84+
is a TOML `table`_, of the form shown below:
8585

8686
.. _`table`: https://toml.io/en/v1.0.0#table
8787

@@ -124,3 +124,50 @@ becomes:
124124
Coverage (%): 100.00
125125
Avg. Coverage (%): 70.37
126126
Total SLOC: 27
127+
128+
129+
Parsing Compiler Options
130+
------------------------
131+
132+
In more complex cases, emulating a compiler's implicit behavior requires CBI to
133+
parse the command-line arguments passed to the compiler. Such emulation
134+
requires CBI to understand which options are important and how they impact
135+
compilation.
136+
137+
CBI ships with a number of compiler definitions (see `here`_), and the
138+
same syntax can be used to define custom compiler behaviors within the
139+
``.cbi/config`` file.
140+
141+
.. _`here`: https://github.com/intel/code-base-investigator/tree/main/codebasin/compilers
142+
143+
For example, the TOML file below defines behavior for the ``gcc`` and ``g++`` compilers:
144+
145+
.. code-block:: toml
146+
147+
[compiler.gcc]
148+
# This example does not define any implicit options.
149+
150+
# g++ inherits all options of gcc.
151+
[compiler."g++"]
152+
alias_of = "gcc"
153+
154+
# The -fopenmp flag enables a dedicated OpenMP compiler "mode".
155+
[[compiler.gcc.parser]]
156+
flags = ["-fopenmp"]
157+
action = "append_const"
158+
dest = "modes"
159+
const = "openmp"
160+
161+
# In OpenMP mode, the _OPENMP macro is defined.
162+
[[compiler.gcc.modes]]
163+
name = "openmp"
164+
defines = ["_OPENMP"]
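
The keys in each ``parser`` entry (``flags``, ``action``, ``dest``, ``const``) resemble the arguments of Python's ``argparse.ArgumentParser.add_argument``. The sketch below is an assumption about how such a definition *could* be interpreted, not a description of CBI's internals: it transcribes the ``gcc`` definition above into a dictionary by hand and uses a hypothetical ``implied_defines`` function to work out which macros a command line implies.

```python
import argparse

# The [compiler.gcc] definition above, transcribed by hand into a dict.
GCC = {
    "parser": [
        {
            "flags": ["-fopenmp"],
            "action": "append_const",
            "dest": "modes",
            "const": "openmp",
        },
    ],
    "modes": [
        {"name": "openmp", "defines": ["_OPENMP"]},
    ],
}


def implied_defines(compiler, argv):
    """Return the macros implied by argv under this compiler definition."""
    parser = argparse.ArgumentParser(add_help=False)
    for rule in compiler["parser"]:
        parser.add_argument(
            *rule["flags"],
            action=rule["action"],
            dest=rule["dest"],
            const=rule["const"],
            default=[],
        )
    # Arguments that no rule recognizes (e.g. -O2, file names) are ignored.
    args, _unknown = parser.parse_known_args(argv)
    active_modes = set(args.modes)
    defines = []
    for mode in compiler["modes"]:
        if mode["name"] in active_modes:
            defines.extend(mode["defines"])
    return defines


with_openmp = implied_defines(GCC, ["-fopenmp", "-O2", "-c", "foo.c"])
without_openmp = implied_defines(GCC, ["-O2", "-c", "foo.c"])
print(with_openmp)     # ['_OPENMP']
print(without_openmp)  # []
```

Separating the flag-to-mode mapping from the mode-to-macro mapping means several flags can enable the same mode without repeating its list of defines.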
165+
166+
This functionality is intended for expert users. In most cases, we expect that
167+
defining implicit options or relying on CBI's built-in compiler emulation
168+
support will be sufficient.
169+
170+
.. attention::
171+
172+
If you encounter a common case where a custom compiler definition is
173+
required, please `open an issue`_.

docs/source/features.rst

Lines changed: 32 additions & 9 deletions
@@ -13,18 +13,18 @@ Although limited, this functionality is sufficient to support analysis of many
1313
HPC codes, and CBI has been tested on C, C++, CUDA and some Fortran code bases.
1414

1515

16-
Computing Code Divergence
17-
#########################
16+
Computing Specialization Metrics
17+
################################
1818

19-
CBI computes code divergence by building a *specialization tree*, like the one
20-
shown below:
19+
CBI computes code divergence and platform coverage by building a
20+
*specialization tree*, like the one shown below:
2121

2222
.. image:: specialization-tree.png
2323
:alt: An example of a specialization tree.
2424

2525
CBI can then walk and evaluate this tree for different platform definitions, to
26-
produce a divergence report providing a breakdown of how many lines of code
27-
are shared between different platform sets.
26+
produce a report providing a breakdown of how many lines of code are shared
27+
between different platform sets.
2828

2929
.. code:: text
3030
@@ -46,9 +46,7 @@ are shared between different platform sets.
4646
Avg. Coverage (%): 42.44
4747
Total SLOC: 41
4848
49-
Future releases of CBI will provide additional ways to visualize the results of
50-
this analysis, in order to highlight exactly *which* lines of code correspond
51-
to different platform sets.
49+
For more information about these metrics, see :doc:`here <specialization>`.
5250

5351

5452
Hierarchical Clustering
@@ -76,3 +74,28 @@ hierarchical clustering by platform similarity.
7674

7775
.. image:: example-dendrogram.png
7876
:alt: A dendrogram representing the distance between platforms.
77+
78+
79+
Visualizing Platform Coverage
80+
#############################
81+
82+
To assist developers in identifying exactly which parts of their code are
83+
specialized and for which platforms, CBI can produce an annotated tree showing
84+
the amount of specialization within each file.
85+
86+
.. code:: text
87+
88+
Legend:
89+
A: cpu
90+
B: gpu
91+
92+
Columns:
93+
[Platforms | SLOC | Coverage (%) | Avg. Coverage (%)]
94+
95+
[AB | 1.0k | 2.59 | 1.83] o /path/to/sample-code-base/src/
96+
[-- | 1.0k | 0.00 | 0.00] |-- unused.cpp
97+
[AB | 13 | 100.00 | 92.31] |-- main.cpp
98+
[A- | 7 | 100.00 | 50.00] |-o cpu/
99+
[A- | 7 | 100.00 | 50.00] | \-- foo.cpp
100+
[-B | 7 | 100.00 | 50.00] \-o gpu/
101+
[-B | 7 | 100.00 | 50.00] \-- foo.cpp
