Commit 00f4a4f

Merge pull request #161 from Pennycook/metric-rework
Rework the metrics used by the FileTree
2 parents ea5c73c + 29fc127 commit 00f4a4f

12 files changed: +308 -226 lines changed

README.md

Lines changed: 3 additions & 3 deletions
@@ -6,9 +6,9 @@
 Code Base Investigator (CBI) is an analysis tool that provides insight into the
 portability and maintainability of an application's source code.

-- Measure [code divergence](http://doi.org/10.1109/P3HPC.2018.00006) to
-  understand how much code is specialized for different compilers, operating
-  systems, hardware micro-architectures and more.
+- Measure [code divergence](http://doi.org/10.1109/P3HPC.2018.00006) and
+  platform coverage to understand how much code is specialized for different
+  compilers, operating systems, hardware micro-architectures and more.

 - Visualize the distance between the code paths used to support different
   compilation targets.

codebasin/report.py

Lines changed: 107 additions & 114 deletions
@@ -58,6 +58,81 @@ def extract_platforms(setmap):
     return list(unique_platforms)


+def coverage(
+    setmap: dict[frozenset[str], int],
+    platforms: set[str] | None = None,
+) -> float:
+    """
+    Compute the percentage of lines in `setmap` required by at least one
+    platform in the supplied `platforms` set.
+
+    Parameters
+    ----------
+    setmap: dict[frozenset[str], int]
+        A mapping from platform set to SLOC count.
+
+    platforms: set[str], optional
+        The set of platforms to use when computing coverage.
+        If not provided, computes coverage for all platforms.
+
+    Returns
+    -------
+    float
+        The amount of code used by at least one platform, as a percentage.
+        If `setmap` contains no lines of code or no platforms, returns NaN.
+    """
+    if not platforms:
+        platforms = set().union(*setmap.keys())
+
+    used = 0
+    total = 0
+    for subset, sloc in setmap.items():
+        total += sloc
+        if subset == frozenset():
+            continue
+        elif any([p in platforms for p in subset]):
+            used += sloc
+
+    if total == 0:
+        return float("nan")
+
+    return (used / total) * 100.0
+
+
+def average_coverage(
+    setmap: dict[frozenset[str], int],
+    platforms: set[str] | None = None,
+) -> float:
+    """
+    Computes the coverage for each platform in the supplied `platforms` set
+    (by calling :py:func:`coverage` for each platform), then returns the
+    average (mean) of these values.
+
+    Parameters
+    ----------
+    setmap: dict[frozenset[str], int]
+        A mapping from platform set to SLOC count.
+
+    platforms: set[str], optional
+        The set of platforms to use when computing coverage.
+        If not provided, computes average over all platforms.
+
+    Returns
+    -------
+    float
+        The average amount of code used by each platform, as a percentage.
+        If `setmap` contains no lines of code or no platforms, returns NaN.
+    """
+    if not platforms:
+        platforms = set().union(*setmap.keys())
+
+    if len(platforms) == 0:
+        return float("nan")
+
+    total = sum([coverage(setmap, [p]) for p in platforms])
+    return total / len(platforms)
+
+
 def distance(setmap, p1, p2):
     """
     Compute distance between two platforms
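
Editor's note: the two functions added in the hunk above can be tried on a small input. The following is a minimal, self-contained sketch that mirrors their logic rather than importing `codebasin`. The `setmap` contents are hypothetical and are not taken from the commit.

```python
def coverage(setmap, platforms=None):
    """Percentage of SLOC used by at least one platform in `platforms`."""
    if not platforms:
        platforms = set().union(*setmap.keys())
    total = sum(setmap.values())
    if total == 0:
        return float("nan")
    # A line counts as used if its platform set overlaps `platforms`.
    used = sum(
        sloc for subset, sloc in setmap.items() if subset & set(platforms)
    )
    return used / total * 100.0


def average_coverage(setmap, platforms=None):
    """Mean of the per-platform coverage values."""
    if not platforms:
        platforms = set().union(*setmap.keys())
    if len(platforms) == 0:
        return float("nan")
    return sum(coverage(setmap, {p}) for p in platforms) / len(platforms)


# Hypothetical data: 2 unused lines, 10 cpu-only, 4 gpu-only, 4 shared.
setmap = {
    frozenset(): 2,
    frozenset({"cpu"}): 10,
    frozenset({"gpu"}): 4,
    frozenset({"cpu", "gpu"}): 4,
}

print(f"{coverage(setmap):.2f}")           # 18 of 20 lines -> 90.00
print(f"{coverage(setmap, {'cpu'}):.2f}")  # 14 of 20 lines -> 70.00
print(f"{coverage(setmap, {'gpu'}):.2f}")  # 8 of 20 lines -> 40.00
print(f"{average_coverage(setmap):.2f}")   # mean of 70 and 40 -> 55.00
```

Coverage over all platforms only excludes lines whose platform set is empty. Average coverage drops as code becomes specific to individual platforms.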
@@ -91,74 +166,6 @@ def divergence(setmap):
     return d / float(npairs)


-def utilization(setmap: defaultdict[frozenset[str], int]) -> float:
-    """
-    Compute the average code utilization for all lines in the setmap.
-    i.e., (reused SLOC / total SLOC)
-
-    Parameters
-    ----------
-    setmap: defaultdict[frozenset[str], int]
-        The mapping from platform sets to SLOC.
-
-    Returns
-    -------
-    float
-        The average code utilization, in the range [0, NumPlatforms].
-        If the number of total SLOC is 0, returns NaN.
-    """
-    reused_sloc = 0
-    total_sloc = 0
-    for k, v in setmap.items():
-        reused_sloc += len(k) * v
-        total_sloc += v
-    if total_sloc == 0:
-        return float("nan")
-
-    return reused_sloc / total_sloc
-
-
-def normalized_utilization(
-    setmap: defaultdict[frozenset[str], int],
-    total_platforms: int | None = None,
-) -> float:
-    """
-    Compute the average code utilization, normalized for a specific number of
-    platforms.
-
-    Parameters
-    ----------
-    setmap: defaultdict[frozenset[str,int]
-        The mapping from platform sets to SLOC.
-
-    total_platforms: int, optional
-        The total number of platforms to use as the denominator.
-        By default, the denominator will be derived from the setmap.
-
-    Returns
-    -------
-    float
-        The average code utilization, in the range [0, 1].
-
-    Raises
-    ------
-    ValueError
-        If `total_platforms` < the number of platforms in `setmap`.
-    """
-    original_platforms = len(extract_platforms(setmap))
-    if total_platforms is None:
-        total_platforms = original_platforms
-    if total_platforms < original_platforms:
-        raise ValueError(
-            "Cannot normalize to fewer platforms than the setmap contains.",
-        )
-
-    if total_platforms == 0:
-        return float("nan")
-    else:
-        return utilization(setmap) / total_platforms
-
-
 def summary(setmap: defaultdict[str, int], stream: TextIO = sys.stdout):
     """
     Produce a summary report for the platform set, including
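
Editor's note: reading the removed and added functions side by side, the removed `normalized_utilization` and the new `average_coverage` appear to compute the same quantity on different scales ([0, 1] versus a percentage) when the platform set is derived from the `setmap` itself. The sketch below checks this numerically. Both functions are simplified re-implementations written for this note, and the data is hypothetical.

```python
import math


def normalized_utilization(setmap):
    """Simplified form of the removed metric: sum(|k| * sloc) / (total * N)."""
    platforms = set().union(*setmap.keys())
    total = sum(setmap.values())
    if total == 0 or not platforms:
        return float("nan")
    reused = sum(len(k) * v for k, v in setmap.items())
    return reused / total / len(platforms)


def average_coverage(setmap):
    """Simplified form of the added metric: mean per-platform coverage (%)."""
    platforms = set().union(*setmap.keys())
    total = sum(setmap.values())
    if total == 0 or not platforms:
        return float("nan")
    per_platform = [
        100.0 * sum(v for k, v in setmap.items() if p in k) / total
        for p in platforms
    ]
    return sum(per_platform) / len(platforms)


# Hypothetical data, not taken from the commit.
setmap = {
    frozenset(): 2,
    frozenset({"cpu"}): 10,
    frozenset({"gpu"}): 4,
    frozenset({"cpu", "gpu"}): 4,
}

nu = normalized_utilization(setmap)
ac = average_coverage(setmap)
print(f"{nu:.2f} {ac:.2f}")
assert math.isclose(ac, 100.0 * nu)
```

The agreement follows because each line shared by `|k|` platforms is counted once per platform in both sums.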
@@ -195,11 +202,11 @@ def summary(setmap: defaultdict[str, int], stream: TextIO = sys.stdout):
     ]

     cd = divergence(setmap)
-    nu = normalized_utilization(setmap)
-    unused = (setmap[frozenset()] / total_count) * 100.0
+    cc = coverage(setmap)
+    ac = average_coverage(setmap)
     lines += [f"Code Divergence: {cd:.2f}"]
-    lines += [f"Code Utilization: {nu:.2f}"]
-    lines += [f"Unused Code (%): {unused:.2f}"]
+    lines += [f"Coverage (%): {cc:.2f}"]
+    lines += [f"Avg. Coverage (%): {ac:.2f}"]
     lines += [f"Total SLOC: {total_count}"]

     print("\n".join(lines), file=stream)
@@ -509,80 +516,67 @@ def _platforms_str(
                 output += f"{color}{value}\033[0m"
             return output

-        def _sloc_str(self, max_used: int, max_total: int) -> str:
+        def _sloc_str(self, max_sloc: int) -> str:
             """
             Parameters
             ----------
-            max_used: int
-                The maximum used SLOC, used to determine formatting width.
-
-            max_total: int
-                The maximum total SLOC, used to determine formatting width.
+            max_sloc: int
+                The maximum SLOC, used to determine formatting width.

             Returns
             -------
             str
-                A string representing the SLOC used by this Node, in the form
-                "used / total" with human-readable numbers.
+                A string representing the SLOC associated with this Node, with
+                human-readable numbers.
             """
             color = ""
             if len(self.platforms) == 0:
                 color = "\033[2m"
             elif self.is_symlink():
                 color = "\033[96m"

-            used_len = len(_human_readable(max_used))
-            total_len = len(_human_readable(max_total))
+            sloc_len = len(_human_readable(max_sloc))
+            sloc = _human_readable(sum(self.setmap.values()))

-            used = _human_readable(self.sloc)
-            total = _human_readable(sum(self.setmap.values()))
+            return f"{color}{sloc:>{sloc_len}}\033[0m"

-            return f"{color}{used:>{used_len}} / {total:>{total_len}}\033[0m"
-
-        def _divergence_str(self) -> str:
+        def _coverage_str(self, platforms: set[str]) -> str:
             """
             Returns
             -------
             str
-                A string representing code divergence in this Node.
+                A string representing code coverage of this Node.
             """
-            cd = divergence(self.setmap)
+            cc = coverage(self.setmap, platforms)
             color = ""
             if len(self.platforms) == 0:
                 color = "\033[2m"
             elif self.is_symlink():
                 color = "\033[96m"
-            elif cd <= 0.25:
+            elif cc >= 50:
                 color = "\033[32m"
-            elif cd >= 0.75 or len(self.platforms) == 1:
+            elif cc < 50:
                 color = "\033[35m"
-            return f"{color}{cd:4.2f}\033[0m"
+            return f"{color}{cc:6.2f}\033[0m"

-        def _utilization_str(self, total_platforms: int) -> str:
+        def _average_coverage_str(self, platforms: set[str]) -> str:
             """
-            Parameters
-            ----------
-            total_platforms: int
-                The number of platforms in the whole FileTree.
-
             Returns
             -------
             str
-                A string representing code utilization in this Node.
+                A string representing average coverage of this Node.
             """
-            nu = normalized_utilization(self.setmap, total_platforms)
-
+            cc = average_coverage(self.setmap, platforms)
             color = ""
             if len(self.platforms) == 0:
                 color = "\033[2m"
             elif self.is_symlink():
                 color = "\033[96m"
-            elif nu > 0.5:
+            elif cc >= 50:
                 color = "\033[32m"
-            elif nu <= 0.5:
+            elif cc < 50:
                 color = "\033[35m"
-
-            return f"{color}{nu:4.2f}\033[0m"
+            return f"{color}{cc:6.2f}\033[0m"

         def _meta_str(self, root: Self) -> str:
             """
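
Editor's note: `_coverage_str` and `_average_coverage_str` in the hunk above share the same color selection. The sketch below isolates that logic in a standalone function so the thresholds can be tried without a `FileTree`. The function name `coverage_cell` and its boolean parameters are hypothetical stand-ins for the node state (`self.platforms`, `self.is_symlink()`). The escape codes and the 50% threshold are copied from the diff.

```python
DIM = "\033[2m"       # node used by no platform
CYAN = "\033[96m"     # symlink
GREEN = "\033[32m"    # coverage of at least 50%
MAGENTA = "\033[35m"  # coverage below 50%
RESET = "\033[0m"


def coverage_cell(
    cc: float,
    has_platforms: bool = True,
    is_symlink: bool = False,
) -> str:
    """Format a coverage percentage as a colored, fixed-width table cell."""
    color = ""
    if not has_platforms:
        color = DIM
    elif is_symlink:
        color = CYAN
    elif cc >= 50:
        color = GREEN
    elif cc < 50:
        color = MAGENTA
    # Width 6 leaves room for "100.00".
    return f"{color}{cc:6.2f}{RESET}"


print(repr(coverage_cell(93.94)))
print(repr(coverage_cell(12.5)))
print(repr(coverage_cell(100.0, is_symlink=True)))
```

Because every comparison with NaN is false, a NaN coverage value on a node that has platforms falls through both threshold branches and is printed without a color.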
@@ -596,13 +590,12 @@ def _meta_str(self, root: Self) -> str:
             str
                 A string representing meta-information for this FileTree.Node.
             """
-            max_used = root.sloc
-            max_total = sum(root.setmap.values())
+            max_sloc = sum(root.setmap.values())
             info = [
                 self._platforms_str(root.platforms),
-                self._sloc_str(max_used, max_total),
-                self._divergence_str(),
-                self._utilization_str(len(root.platforms)),
+                self._sloc_str(max_sloc),
+                self._coverage_str(root.platforms),
+                self._average_coverage_str(root.platforms),
             ]
             return "[" + " | ".join(info) + "]"

@@ -804,10 +797,10 @@ def files(
     legend += [""]
     legend += ["Columns:"]
     header = [
-        "Platform Set",
-        "Used SLOC / Total SLOC",
-        "Code Divergence",
-        "Code Utilization",
+        "Platforms",
+        "SLOC",
+        "Coverage (%)",
+        "Avg. Coverage (%)",
     ]
     legend += ["[" + " | ".join(header) + "]"]
     legend += [""]

docs/source/analysis.rst

Lines changed: 2 additions & 2 deletions
@@ -78,8 +78,8 @@ the following output:
 {cpu, gpu} 17 51.52
 -----------------------
 Code Divergence: 0.45
-Unused Code (%): 6.06
-Total SLOC: 33
+Coverage (%): 93.94
+Avg. Coverage (%): 72.73

 Distance Matrix
 --------------
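
Editor's note: the new figures in this hunk can be cross-checked against the old ones. The hunk shows 17 lines shared by `{cpu, gpu}`, a total of 33 SLOC, and 6.06% unused code, which corresponds to 2 of 33 lines. That leaves 14 lines specific to a single platform. The split between cpu-only and gpu-only lines is not visible in this excerpt, so the 8/6 split below is an assumption. Neither metric depends on that split.

```python
# Setmap reconstructed from the figures in the hunk above.
# The 8/6 split of single-platform lines is hypothetical.
setmap = {
    frozenset(): 2,                # inferred from "Unused Code (%): 6.06"
    frozenset({"cpu"}): 8,         # assumption
    frozenset({"gpu"}): 6,         # assumption
    frozenset({"cpu", "gpu"}): 17,
}

total = sum(setmap.values())


def coverage_for(platforms):
    """Percentage of SLOC used by at least one of `platforms`."""
    used = sum(v for k, v in setmap.items() if k & platforms)
    return used / total * 100.0


cc = coverage_for({"cpu", "gpu"})
ac = (coverage_for({"cpu"}) + coverage_for({"gpu"})) / 2

print(f"Coverage (%): {cc:.2f}")
print(f"Avg. Coverage (%): {ac:.2f}")
print(f"Total SLOC: {total}")
```

Under these assumptions the sketch reproduces 93.94 and 72.73. The coverage value is also 100 minus the old "Unused Code (%)" value of 6.06.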

docs/source/emulating-compiler-behavior.rst

Lines changed: 2 additions & 1 deletion
@@ -121,5 +121,6 @@ becomes:
 {cpu, gpu} 11 40.74
 -----------------------
 Code Divergence: 0.59
-Unused Code (%): 0.00
+Coverage (%): 100.00
+Avg. Coverage (%): 70.37
 Total SLOC: 27

docs/source/excluding-files.rst

Lines changed: 2 additions & 1 deletion
@@ -58,7 +58,8 @@ shared between the cpu and gpu platforms:
 {cpu, gpu} 11 40.74
 -----------------------
 Code Divergence: 0.56
-Unused Code (%): 7.41
+Coverage (%): 92.59
+Avg. Coverage (%): 66.67
 Total SLOC: 27



docs/source/features.rst

Lines changed: 2 additions & 1 deletion
@@ -42,7 +42,8 @@ are shared between different platform sets.
 {FPGA, CPU 1, GPU 2, GPU 1, CPU 2} 9 21.95
 ---------------------------------------------
 Code Divergence: 0.55
-Unused Code (%): 4.88
+Coverage (%): 95.12
+Avg. Coverage (%): 42.44
 Total SLOC: 41

 Future releases of CBI will provide additional ways to visualize the results of

docs/source/index.rst

Lines changed: 3 additions & 3 deletions
@@ -41,9 +41,9 @@ Code Base Investigator
 Code Base Investigator (CBI) is an analysis tool that provides insight into the
 portability and maintainability of an application's source code.

-- Measure ":doc:`code divergence <specialization>`" to understand how much code
-  is *specialized* for different compilers, operating systems, hardware
-  micro-architectures and more.
+- Measure "code divergence" and "platform coverage" to understand how much code
+  is :doc:`specialized <specialization>` for different compilers, operating
+  systems, hardware micro-architectures and more.

 - Visualize the distance between the code paths used to support different
   compilation targets.
