Commit 729e9ae
Generate solacc reports (#2757)
## Changes

Currently, `solacc` outputs advices to the console and does not keep track of linting time. This PR:

- dumps advices to an `advices.txt` file
- collects stats and dumps them to a `stats.json` file (a JSON Lines file, with one JSON object per line)
- uploads both files as a build artifact

### Linked issues

None

### Functionality

None

### Tests

- [x] manually tested

Sample solacc run:

![Screenshot 2024-09-27 at 16 44 22](https://github.com/user-attachments/assets/a0f508e4-4095-45b8-be38-51d08824b15e)

Sample `stats.json` (expanded):

```
{ "run_id": "1", "name": "ab-testing", "start_timestamp": "2024-09-27 10:16:02.512363+00:00", "end_timestamp": "2024-09-27 10:17:07.622161+00:00", "files_count": 6, "files_size": 34934 }
{ "run_id": "1", "name": "adverse-drug-events", "start_timestamp": "2024-09-27 10:17:08.669225+00:00", "end_timestamp": "2024-09-27 10:17:09.399495+00:00", "files_count": 5, "files_size": 48743 }
{ "run_id": "1", "name": "als-recommender", "start_timestamp": "2024-09-27 10:17:09.565942+00:00", "end_timestamp": "2024-09-27 10:17:12.039422+00:00", "files_count": 6, "files_size": 62750 }
```

Sample `advices.txt`:

```
./dist/ab-testing/4. Real time inference.py:76:0: [legacy-context-in-shared-clusters] sc is not supported on UC Shared Clusters. Rewrite it using spark
./dist/ab-testing/4. Real time inference.py:139:0: [jvm-access-in-shared-clusters] Cannot access Spark Driver JVM on UC Shared Clusters
./dist/ab-testing/4. Real time inference.py:139:0: [legacy-context-in-shared-clusters] sc is not supported on UC Shared Clusters. Rewrite it using spark
./dist/ab-testing/RUNME.py:1:1: [library-install-failed] Unsupported 'pip' command: DBTITLE
./dist/ab-testing/5. AB testing metrics.py:50:0: [jvm-access-in-shared-clusters] Cannot access Spark Driver JVM on UC Shared Clusters
./dist/ab-testing/5. AB testing metrics.py:50:0: [legacy-context-in-shared-clusters] sc is not supported on UC Shared Clusters. Rewrite it using spark
./dist/ab-testing/5. AB testing metrics.py:51:0: [jvm-access-in-shared-clusters] Cannot access Spark Driver JVM on UC Shared Clusters
./dist/ab-testing/5. AB testing metrics.py:51:0: [legacy-context-in-shared-clusters] sc is not supported on UC Shared Clusters. Rewrite it using spark
./dist/ab-testing/5. AB testing metrics.py:54:0: [jvm-access-in-shared-clusters] Cannot access Spark Driver JVM on UC Shared Clusters
./dist/ab-testing/5. AB testing metrics.py:54:0: [legacy-context-in-shared-clusters] sc is not supported on UC Shared Clusters. Rewrite it using spark
./dist/ab-testing/5. AB testing metrics.py:58:0: [jvm-access-in-shared-clusters] Cannot access Spark Driver JVM on UC Shared Clusters
./dist/ab-testing/5. AB testing metrics.py:58:0: [legacy-context-in-shared-clusters] sc is not supported on UC Shared Clusters. Rewrite it using spark
./dist/ab-testing/2. Model training.py:34:0: [jvm-access-in-shared-clusters] Cannot access Spark Driver JVM on UC Shared Clusters
./dist/ab-testing/2. Model training.py:34:0: [legacy-context-in-shared-clusters] sc is not supported on UC Shared Clusters. Rewrite it using spark
./dist/ab-testing/2. Model training.py:36:0: [legacy-context-in-shared-clusters] sc is not supported on UC Shared Clusters. Rewrite it using spark
./dist/ab-testing/2. Model training.py:37:0: [legacy-context-in-shared-clusters] sc is not supported on UC Shared Clusters. Rewrite it using spark
./dist/ab-testing/2. Model training.py:37:0: [rdd-in-shared-clusters] RDD APIs are not supported on UC Shared Clusters. Rewrite it using DataFrame API
./dist/ab-testing/2. Model training.py:38:0: [jvm-access-in-shared-clusters] Cannot access Spark Driver JVM on UC Shared Clusters
./dist/ab-testing/2. Model training.py:41:0: [jvm-access-in-shared-clusters] Cannot access Spark Driver JVM on UC Shared Clusters
./dist/ab-testing/2. Model training.py:41:0: [legacy-context-in-shared-clusters] sc is not supported on UC Shared Clusters. Rewrite it using spark
./dist/ab-testing/5. AB testing metrics.py:22:0: [default-format-changed-in-dbr8] The default format changed in Databricks Runtime 8.0, from Parquet to Delta
./dist/ab-testing/5. AB testing metrics.py:36:27: [default-format-changed-in-dbr8] The default format changed in Databricks Runtime 8.0, from Parquet to Delta
./dist/ab-testing/5. AB testing metrics.py:90:2: [default-format-changed-in-dbr8] The default format changed in Databricks Runtime 8.0, from Parquet to Delta
./dist/ab-testing/5. AB testing metrics.py:97:2: [default-format-changed-in-dbr8] The default format changed in Databricks Runtime 8.0, from Parquet to Delta
./dist/ab-testing/risk_demo.dbdash:1:0: [unknown-language] Cannot detect language for /home/runner/work/ucx/ucx/dist/ab-testing/risk_demo.dbdash
./dist/ab-testing/1. Introduction.py:72:2: [direct-filesystem-access] The use of direct filesystem references is deprecated: /tmp/german_credit_data.csv
./dist/ab-testing/4. Real time inference.py:10:0: [direct-filesystem-access] The use of direct filesystem references is deprecated: /FileStore/tmp/streaming_ckpnt_risk_demo
./dist/ab-testing/4. Real time inference.py:10:14: [direct-filesystem-access] The use of direct filesystem references is deprecated: /FileStore/tmp/streaming_ckpnt_risk_demo
./dist/ab-testing/4. Real time inference.py:29:5: [default-format-changed-in-dbr8] The default format changed in Databricks Runtime 8.0, from Parquet to Delta
./dist/ab-testing/4. Real time inference.py:214:32: [direct-filesystem-access] The use of direct filesystem references is deprecated: /FileStore/tmp/streaming_ckpnt_risk_demo
./dist/ab-testing/4. Real time inference.py:236:2: [default-format-changed-in-dbr8] The default format changed in Databricks Runtime 8.0, from Parquet to Delta
./dist/ab-testing/4. Real time inference.py:254:27: [default-format-changed-in-dbr8] The default format changed in Databricks Runtime 8.0, from Parquet to Delta
./dist/ab-testing/2. Model training.py:53:22: [direct-filesystem-access] The use of direct filesystem references is deprecated: /Users/Uninferable/german_credit_experiment
./dist/ab-testing/2. Model training.py:66:5: [default-format-changed-in-dbr8] The default format changed in Databricks Runtime 8.0, from Parquet to Delta
./dist/ab-testing/2. Model training.py:197:67: [direct-filesystem-access] The use of direct filesystem references is deprecated: /tmp/pr-curve-model-a.png
./dist/ab-testing/2. Model training.py:229:67: [direct-filesystem-access] The use of direct filesystem references is deprecated: /tmp/pr-curve-model-b.png
./dist/adverse-drug-events/RUNME.py:1:1: [library-install-failed] Unsupported 'pip' command: DBTITLE
./dist/adverse-drug-events/01-ade-extraction.py:23:0: [jvm-access-in-shared-clusters] Cannot access Spark Driver JVM on UC Shared Clusters
./dist/adverse-drug-events/01-ade-extraction.py:23:0: [legacy-context-in-shared-clusters] sc is not supported on UC Shared Clusters. Rewrite it using spark
./dist/adverse-drug-events/02-ade-analysis.py:13:0: [jvm-access-in-shared-clusters] Cannot access Spark Driver JVM on UC Shared Clusters
./dist/adverse-drug-events/02-ade-analysis.py:13:0: [legacy-context-in-shared-clusters] sc is not supported on UC Shared Clusters. Rewrite it using spark
./dist/adverse-drug-events/02-ade-analysis.py:14:0: [jvm-access-in-shared-clusters] Cannot access Spark Driver JVM on UC Shared Clusters
```

---------

Co-authored-by: Eric Vergnaud <eric.vergnaud@databricks.com>
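The `stats.json` sample above is JSON Lines: one serialized dataclass per line, with `datetime` fields rendered as strings. A minimal sketch of that serialization, using an illustrative `SolaccStats` dataclass mirroring the fields in the sample (not the PR's exact code):

```python
import dataclasses
import json
from datetime import datetime, timezone

# Hypothetical record mirroring the fields shown in the stats.json sample.
@dataclasses.dataclass
class SolaccStats:
    run_id: str
    name: str
    start_timestamp: datetime
    end_timestamp: datetime
    files_count: int
    files_size: int

def to_json_line(stats: SolaccStats) -> str:
    # default=str stringifies the datetime fields, as seen in the sample output
    return json.dumps(dataclasses.asdict(stats), default=str)

stats = SolaccStats(
    run_id="1",
    name="ab-testing",
    start_timestamp=datetime(2024, 9, 27, 10, 16, 2, tzinfo=timezone.utc),
    end_timestamp=datetime(2024, 9, 27, 10, 17, 7, tzinfo=timezone.utc),
    files_count=6,
    files_size=34934,
)
line = to_json_line(stats)
print(line)
```

Because each record is a single line, consumers can stream the file and `json.loads` one line at a time without parsing the whole document.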
1 parent 97b9996 commit 729e9ae

File tree

4 files changed: +73 −12 lines changed

.github/workflows/solacc.yml

Lines changed: 7 additions & 0 deletions

```
@@ -26,3 +26,10 @@ jobs:

       - name: Verify linters on solution accelerators
         run: make solacc
+
+      - name: Upload reports
+        uses: actions/upload-artifact@v4
+        with:
+          name: report
+          path: build/
+          if-no-files-found: error
```

src/databricks/labs/ucx/source_code/base.py

Lines changed: 4 additions & 1 deletion

```
@@ -96,7 +96,10 @@ def message_relative_to(self, base: Path, *, default: Path | None = None) -> str
             logger.debug(f'THIS IS A BUG! {advice.code}:{advice.message} has unknown path')
             if default is not None:
                 path = default
-        path = path.relative_to(base)
+        try:
+            path = path.relative_to(base)
+        except ValueError:
+            logger.debug(f'Not a relative path: {path} to base: {base}')
         # increment start_line because it is 0-based whereas IDEs are usually 1-based
         return f"./{path.as_posix()}:{advice.start_line+1}:{advice.start_col}: [{advice.code}] {advice.message}"
```
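The patch to `message_relative_to` wraps `path.relative_to(base)` in a try/except because `PurePath.relative_to` raises `ValueError` whenever the path does not live under the given base. A quick standalone illustration (the path is taken from the sample output; the variable names are just for this sketch):

```python
from pathlib import PurePosixPath

# A path of the kind seen in the sample advices output
p = PurePosixPath("/home/runner/work/ucx/ucx/dist/ab-testing/RUNME.py")

# Succeeds: p is located under this base
rel = p.relative_to("/home/runner/work/ucx/ucx/dist")
print(rel.as_posix())  # ab-testing/RUNME.py

# Raises ValueError: p is not under this base, so code that cannot
# guarantee containment must catch the error and fall back
try:
    rel = p.relative_to("/tmp")
except ValueError:
    rel = p  # keep the original path, mirroring the patch's fallback
```

Catching the error (rather than pre-checking with `is_relative_to`, added in Python 3.9) keeps the happy path a single call and only pays for the fallback when it is actually needed.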

src/databricks/labs/ucx/source_code/notebooks/sources.py

Lines changed: 7 additions & 1 deletion

```
@@ -163,7 +163,13 @@ def __init__(
         self._python_trees: dict[PythonCell, Tree] = {}  # the original trees to be linted

     def lint(self) -> Iterable[Advice]:
-        yield from self._load_tree_from_notebook(self._notebook, True)
+        has_failure = False
+        for advice in self._load_tree_from_notebook(self._notebook, True):
+            if isinstance(advice, Failure):  # happens when a cell is unparseable
+                has_failure = True
+            yield advice
+        if has_failure:
+            return
         for cell in self._notebook.cells:
             if not self._context.is_supported(cell.language.language):
                 continue
```
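The reworked `lint` drains the notebook-level advices first, remembers whether any of them was a `Failure`, and only then decides whether to continue into cell-level linting. The control flow can be sketched with plain generators (the `Advice`/`Failure` classes below are simplified stand-ins, not the ucx implementations):

```python
from typing import Iterable

class Advice:  # stand-in for ucx's Advice
    def __init__(self, code: str):
        self.code = code

class Failure(Advice):  # stand-in for ucx's Failure subclass
    pass

def lint(notebook_advices: list[Advice], cell_advices: list[Advice]) -> Iterable[Advice]:
    has_failure = False
    for advice in notebook_advices:
        if isinstance(advice, Failure):  # e.g. an unparseable cell
            has_failure = True
        yield advice  # still surface the failure to the caller
    if has_failure:
        return  # short-circuit: skip cell-level linting entirely
    yield from cell_advices

# With a Failure present, cell-level advices are suppressed
out = [a.code for a in lint([Advice("ok"), Failure("parse-error")], [Advice("cell")])]
print(out)  # ['ok', 'parse-error']
```

The point of the pattern is that the failure is still yielded (so it shows up in reports) while the generator stops producing follow-on advices that would be noise for a notebook that did not fully parse.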

tests/integration/source_code/solacc.py

Lines changed: 55 additions & 10 deletions

```
@@ -1,8 +1,11 @@
+import dataclasses
+import json
 import logging
 import os
 import shutil
 import sys
 from dataclasses import dataclass, field
+from datetime import datetime, timezone
 from pathlib import Path

 import requests
@@ -20,6 +23,8 @@

 this_file = Path(__file__)
 dist = (this_file / '../../../../dist').resolve().absolute()
+build = dist.parent / "build"
+build.mkdir(exist_ok=True)


 def _get_repos_to_clone() -> dict[str, str]:
@@ -72,23 +77,41 @@ def _collect_uninferrable_count(advices: list[LocatedAdvice]):


 def _collect_unparseable(advices: list[LocatedAdvice]):
-    return set(located_advice for located_advice in advices if located_advice.advice.code == 'parse-error')
+    return list(located_advice for located_advice in advices if located_advice.advice.code == 'parse-error')


 def _print_advices(advices: list[LocatedAdvice]):
-    for located_advice in advices:
-        message = located_advice.message_relative_to(dist.parent)
-        sys.stdout.write(f"{message}\n")
+    messages = list(
+        located_advice.message_relative_to(dist.parent).replace('\n', ' ') + '\n' for located_advice in advices
+    )
+    if os.getenv("CI"):
+        advices_path = build / "advices.txt"
+        with advices_path.open("a") as advices_file:
+            advices_file.writelines(messages)
+    else:
+        for message in messages:
+            sys.stdout.write(message)
+
+
+@dataclass
+class _SolaccStats:
+    run_id: str
+    name: str
+    start_timestamp: datetime
+    end_timestamp: datetime
+    files_count: int
+    files_size: int


 @dataclass
 class _SolaccContext:
     unparsed_files_path: Path | None = None
-    files_to_skip: set[str] | None = None
+    files_to_skip: set[Path] | None = None
     total_count = 0
     parseable_count = 0
     uninferrable_count = 0
     missing_imports: dict[str, dict[str, int]] = field(default_factory=dict)
+    stats: list[_SolaccStats] = field(default_factory=list)

     @classmethod
     def create(cls, for_all_dirs: bool):
@@ -98,11 +121,11 @@ def create(cls, for_all_dirs: bool):
         unparsed_path = Path(Path(__file__).parent, "solacc-unparsed.txt")
         if unparsed_path.exists():
             os.remove(unparsed_path)
-        files_to_skip: set[str] | None = None
+        files_to_skip: set[Path] | None = None
         malformed = Path(__file__).parent / "solacc-malformed.txt"
         if for_all_dirs and malformed.exists():
             lines = malformed.read_text(encoding="utf-8").split("\n")
-            files_to_skip = set(line for line in lines if len(line) > 0 and not line.startswith("#"))
+            files_to_skip = set(dist / line for line in lines if len(line) > 0 and not line.startswith("#"))
         return _SolaccContext(unparsed_files_path=unparsed_path, files_to_skip=files_to_skip)

     def register_missing_import(self, missing_import: str):
@@ -153,7 +176,19 @@ def _lint_dir(solacc: _SolaccContext, soldir: Path):
     files_to_skip = set(solacc.files_to_skip) if solacc.files_to_skip else set()
     linted_files = set(files_to_skip)
     # lint solution
+    start_timestamp = datetime.now(timezone.utc)
     advices = list(ctx.local_code_linter.lint_path(soldir, linted_files))
+    end_timestamp = datetime.now(timezone.utc)
+    # record stats
+    stats = _SolaccStats(
+        run_id=os.getenv("GITHUB_RUN_ATTEMPT") or "local",
+        start_timestamp=start_timestamp,
+        end_timestamp=end_timestamp,
+        name=soldir.name,
+        files_count=len(all_files),
+        files_size=sum(path.stat().st_size for path in [soldir / filename for filename in all_files]),
+    )
+    solacc.stats.append(stats)
     # collect unparseable files
     unparseables = _collect_unparseable(advices)
     solacc.parseable_count += len(linted_files) - len(files_to_skip) - len(set(advice.path for advice in unparseables))
@@ -162,7 +197,11 @@ def _lint_dir(solacc: _SolaccContext, soldir: Path):
         logger.error(f"Error during parsing of {unparseable.path}: {unparseable.advice.message}".replace("\n", " "))
         # populate solacc-unparsed.txt
         with solacc.unparsed_files_path.open(mode="a", encoding="utf-8") as f:
-            f.write(unparseable.path.relative_to(dist).as_posix())
+            try:
+                path = unparseable.path.relative_to(dist)
+            except ValueError:
+                path = unparseable.path
+            f.write(path.as_posix())
             f.write("\n")
     # collect missing imports
     for missing_import in _collect_missing_imports(advices):
@@ -178,8 +217,8 @@ def _lint_dir(solacc: _SolaccContext, soldir: Path):
 def _lint_repos(clone_urls, sol_to_lint: str | None):
     solacc = _SolaccContext.create(sol_to_lint is not None)
     if sol_to_lint:
-        # don't clone if linting just one file, assumption is we're troubleshooting
-        _lint_dir(solacc, dist / sol_to_lint)
+        sol_dir = _clone_repo(clone_urls[sol_to_lint], sol_to_lint)
+        _lint_dir(solacc, sol_dir)
     else:
         names: list[str] = list(clone_urls.keys())
         for name in sorted(names, key=str.casefold):
@@ -199,6 +238,12 @@ def _lint_repos(clone_urls, sol_to_lint: str | None):
             f"not computed: {solacc.uninferrable_count}"
         )
         solacc.log_missing_imports()
+    # log stats
+    stats_path = build / "stats.json"
+    with stats_path.open("a") as stats_file:
+        for stats in solacc.stats:
+            message = json.dumps(dataclasses.asdict(stats), default=str)
+            stats_file.writelines([message])
     # fail the job if files are unparseable
     if parseable_pct < 100:
         sys.exit(1)
```
