Adds verification of TagNode.location_path to the integration tests #87
Closed
Commits (10)
49bcddb  .editorconfig: Adds specs for Markdown files (funkyfuture)
0f3b09d  integrations-tests: Diversifies progress indication appearance (funkyfuture)
dcfc0b0  integrations-tests: Adds a test that verifies TagNode.location_path (funkyfuture)
9c1d04a  integrations-tests: Amends README with instructive information (funkyfuture)
2239b76  Adds integration-tests hatch env with tqdm as progress indicator (JKatzwinkel)
69038de  integration-tests: Handles network issues when fetching corpora (funkyfuture)
04b634c  integration-tests: Optimizes file walk & polish (funkyfuture)
f8f9dbc  integration-tests: Employ progress bars where feasible (funkyfuture)
1d81efd  integration-tests: Updates usage instructions (funkyfuture)
2108348  integration-tests: Include scripts in formatting and linting (funkyfuture)
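The idea behind the `TagNode.location_path` verification is a round trip: derive a path for a node, query the document with that path, and expect exactly that node back. The following is a minimal sketch of that idea which swaps delb for the standard library's `xml.etree.ElementTree`, so that it runs without further dependencies. The helpers `location_paths` and `failed_round_trips` and the path syntax they build are assumptions made for illustration and are not delb's API.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def location_paths(root):
    """Yield (element, path) pairs, e.g. (<b>, "./a[1]/b[2]")."""
    stack = [(root, ".")]
    while stack:
        element, path = stack.pop()
        yield element, path
        # ElementTree counts positions among siblings with the same tag
        seen = Counter()
        for child in element:
            seen[child.tag] += 1
            stack.append((child, f"{path}/{child.tag}[{seen[child.tag]}]"))


def failed_round_trips(root):
    """Return all paths whose query does not yield exactly the source element."""
    failures = []
    for element, path in location_paths(root):
        matches = root.findall(path)
        if not (len(matches) == 1 and matches[0] is element):
            failures.append(path)
    return failures


root = ET.fromstring("<root><a><b/><b/></a><c/><a/></root>")
# an empty list means that every path led back to its element
print(failed_round_trips(root))
```

The identity comparison (`is`) matters here: a query that returns an equal-looking but different element would otherwise pass unnoticed.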
#!/usr/bin/env python3

import multiprocessing as mp
import random
from collections.abc import Iterable
from itertools import chain
from pathlib import Path
from sys import stderr
from typing import Final

from _delb.plugins.core_loaders import path_loader
from delb import is_tag_node, Document, FailedDocumentLoading, ParserOptions

from utils import indicate_progress

BATCH_SIZE: Final = 64
CPU_COUNT: Final = mp.cpu_count()

CORPORA_PATH: Final = Path(__file__).parent.resolve() / "corpora"

# share of the documents, and of the tag nodes per document, that get tested
DOCUMENT_SAMPLES_PERCENT: Final = 25
LOCATIONS_PATHS_SAMPLES_PERCENT: Final = 25


def verify_location_paths(file: Path):
    try:
        document = Document(
            file,
            parser_options=ParserOptions(
                collapse_whitespace=False, resolve_entities=False, unplugged=True
            ),
        )
    except FailedDocumentLoading as exc:
        print(
            f"\nFailed to load {file.name}: {exc.excuses[path_loader]}",
            end="",
        )
        return

    root = document.root
    for node in chain((root,), root.iterate_descendants(is_tag_node)):
        # only a random sample of the tag nodes is verified
        if random.randint(1, 100) > LOCATIONS_PATHS_SAMPLES_PERCENT:
            continue

        # the location path must lead back to exactly the node it was derived from
        query_results = document.xpath(node.location_path)
        if query_results.size == 1 and query_results.first is node:
            indicate_progress()
        else:
            print(
                f"\nXPath query `{node.location_path}` in {file} yielded unexpected "
                "results."
            )
            stderr.write("🕱")


def dispatch_batch(files: Iterable[Path]):
    for file in files:
        try:
            verify_location_paths(file)
        except Exception as e:
            print(f"\nUnhandled exception while testing {file}: {e}")


def main():
    mp.set_start_method("forkserver")

    all_counter = counter = 0
    selected_files = []
    dispatched_tasks = []

    with mp.Pool(CPU_COUNT) as pool:
        for file in CORPORA_PATH.rglob("*.xml"):
            all_counter += 1
            if random.randint(1, 100) > DOCUMENT_SAMPLES_PERCENT:
                continue

            selected_files.append(file)
            counter += 1
            if len(selected_files) < BATCH_SIZE:
                continue

            dispatched_tasks.append(
                pool.apply_async(dispatch_batch, (tuple(selected_files),))
            )
            selected_files.clear()

            # hold back until a worker is free; the list is rebuilt instead of
            # being mutated while it is iterated over
            while len(dispatched_tasks) >= CPU_COUNT:
                dispatched_tasks = [t for t in dispatched_tasks if not t.ready()]

            stderr.flush()

        # leaving the `with` block terminates the pool, hence pending batches
        # have to finish before
        for task in dispatched_tasks:
            task.wait()

    # the remaining files that did not fill a whole batch
    dispatch_batch(selected_files)
    stderr.flush()

    print(
        f"\n\nTested against {counter} *randomly* selected out of {all_counter} "
        "documents."
        f"\n{LOCATIONS_PATHS_SAMPLES_PERCENT}% of the tag nodes' `location_path` "
        "attributes were verified per document."
    )


if __name__ == "__main__":
    main()
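The dispatch pattern in `main` (sample files, collect them in batches, keep a bounded number of batches in flight) can be looked at in isolation. The following is a minimal sketch, not code from this pull request: `sample_in_batches` and `run_bounded` are hypothetical helpers, and a `ThreadPool` stands in for `mp.Pool` so that the example runs with any start method. Both pool types offer the same `apply_async` interface.

```python
import random
from multiprocessing.pool import ThreadPool


def sample_in_batches(items, percent, batch_size, rng):
    """Yield tuples of at most `batch_size` randomly selected items."""
    batch = []
    for item in items:
        # the same selection rule as in the script: keep roughly `percent`%
        if rng.randint(1, 100) > percent:
            continue
        batch.append(item)
        if len(batch) == batch_size:
            yield tuple(batch)
            batch.clear()
    if batch:  # the final, possibly smaller batch
        yield tuple(batch)


def run_bounded(batches, worker, pool, max_pending):
    """Submit batches while keeping fewer than `max_pending` unfinished tasks."""
    pending, results = [], []
    for batch in batches:
        pending.append(pool.apply_async(worker, (batch,)))
        while len(pending) >= max_pending:
            pending[0].wait(0.01)  # block briefly instead of spinning
            still_pending = []
            for task in pending:
                if task.ready():
                    results.append(task.get())
                else:
                    still_pending.append(task)
            pending = still_pending
    # collect what is still running before the pool goes away
    results.extend(task.get() for task in pending)
    return results


with ThreadPool(2) as pool:
    batches = sample_in_batches(range(10), 100, 4, random.Random(0))
    print(sorted(run_bounded(batches, len, pool, max_pending=2)))
```

Collecting the outstanding results inside the `with` block is the relevant design choice: the context manager of a pool calls `terminate()` on exit, which stops workers regardless of unfinished tasks. Calling `get()` also re-raises exceptions that occurred in a worker.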
import random
from sys import stderr
from typing import Final

# Neukölln's digest
# a tuple of strings rather than a single string, so that symbols which consist of
# more than one code point are always written as a whole
PROGRESS_INDICATION_CHARACTERS: Final = (
    "✓", "→", "🚴", "✊", "★", "☆", "⯪", "𓄁", "𓅯", "▶️", "✴️", "🪇", "⚒️", "🧻",
    "🚬", "🗿", "🎳", "⏳", "🌝", "☕", "🐑", "🐞", "🌼", "🪱", "🌸", "🏵", "💮️",
)


def indicate_progress():
    stderr.write(random.choice(PROGRESS_INDICATION_CHARACTERS))
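One detail about choosing progress symbols at random: `random.choice` on a `str` selects a single code point, not a visible symbol. Some emoji are made of a base character plus the variation selector U+FE0F, so a string of such symbols can yield an invisible selector on its own. The sketch below illustrates this with two example symbols written as escape sequences; the variable names are made up for the illustration.

```python
# "\u25b6\ufe0f" is the play symbol followed by a variation selector (U+FE0F)
as_string = "\u2713\u25b6\ufe0f"
as_tuple = ("\u2713", "\u25b6\ufe0f")

# the string consists of three code points, the tuple of two symbols
print(len(as_string), len(as_tuple))

# a random choice from the string may return the selector alone,
# a choice from the tuple cannot
lone_selector_in_string = "\ufe0f" in set(as_string)
lone_selector_in_tuple = "\ufe0f" in as_tuple
print(lone_selector_in_string, lone_selector_in_tuple)
```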
The font I use for code, which I think is fairly standard, doesn't have a lot of those characters...
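A program cannot find out which glyphs a terminal font provides, so one conceivable way to accommodate this is an opt-in switch to plain ASCII indicators. The following is only a sketch of that idea: the environment variable `PROGRESS_ASCII`, both symbol sets and the helper `pick_indicators` are hypothetical and not part of this pull request.

```python
import os
import random
from sys import stderr

FANCY_INDICATORS = ("✓", "→", "★", "☆")
PLAIN_INDICATORS = (".", "+", "*", "o")


def pick_indicators(environ):
    """Return the ASCII set if the (hypothetical) PROGRESS_ASCII variable is set."""
    return PLAIN_INDICATORS if environ.get("PROGRESS_ASCII") else FANCY_INDICATORS


def indicate_progress(environ=os.environ, stream=stderr):
    stream.write(random.choice(pick_indicators(environ)))
```

With that, a run such as `PROGRESS_ASCII=1 python <script>` would only emit characters that any font is expected to contain.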