Drop tqdm in favor of logger; normalize use of tabulate #35

mcioffi · 2025-08-11T20:32:24Z

The Updates ✏️

TL;DR: The future web app (PR #25) uses websockets to route training output to the front-end, and tqdm's stdout behavior adds complexity over sockets, so this PR proposes removing it. Also included are updates to tabulate printing and a new decorator function.

Trainer: Remove tqdm dependency
Trainer: Reverts to logger.info for multi-processing tasks like train (multiple), gridsearch, and featuresearch
Trainer: Normalize use of tabulate across all tasks — train (single/multiple) now gets this treatment (examples below)
Trainer: New decorator fn called set_redirect_log_stream, to be used in web app
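
For readers unfamiliar with the idea, here is a minimal sketch of what a log-redirecting decorator could look like. It is not the implementation in this PR: the signature, the `logger_name` parameter, the `train_model` logger name, and the format string are all assumptions made for illustration, using only the standard `logging` module.

```python
import functools
import io
import logging


def set_redirect_log_stream(stream, logger_name="train_model"):
    """Hypothetical sketch: route log records to `stream` while the wrapped function runs."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            logger = logging.getLogger(logger_name)
            handler = logging.StreamHandler(stream)
            # Assumed format, modelled on the console output in this PR description.
            handler.setFormatter(
                logging.Formatter("[%(levelname)s] (%(name)s) %(message)s")
            )
            previous_level = logger.level
            logger.addHandler(handler)
            logger.setLevel(logging.INFO)
            try:
                return fn(*args, **kwargs)
            finally:
                # Always detach the handler so repeated runs do not duplicate output.
                logger.removeHandler(handler)
                logger.setLevel(previous_level)

        return wrapper

    return decorator


# Usage: any file-like object works; a web app could pass a stream that
# forwards each write to a websocket instead of an in-memory buffer.
buffer = io.StringIO()


@set_redirect_log_stream(buffer)
def train():
    logging.getLogger("train_model").info("Training model with training data.")
    return "done"


result = train()
captured = buffer.getvalue()
```

Plain line-based log records like these are easier to forward over a socket than a progress bar that repeatedly rewrites the same terminal line.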

$ python3 train.py train --database train/data/training.sqlite3

[INFO] (training_utils) Loading and transforming training data.
[INFO] (training_utils) 81,272 usable vectors.
[INFO] (training_utils) 45 discarded due to OTHER labels.
[INFO] (train_model) 905609147 is the random seed used for the train/test split.
[INFO] (train_model) 65,017 training vectors.
[INFO] (train_model) 16,255 testing vectors.
[INFO] (train_model) Training model with training data.
[INFO] (train_model) Evaluating model with test data.

+------------------------+---------------------------+
| Sentence-level results | Word-level results        |
+------------------------+---------------------------+
| Accuracy: 94.42%       | Accuracy: 97.84%          |
|                        | Precision (micro) 97.82%  |
|                        | Recall (micro) 97.84%     |
|                        | F1 score (micro) 97.83%   |
+------------------------+---------------------------+

$ python3.12 train.py multiple --database train/data/training.sqlite3 --runs 2

[INFO] (training_utils) Loading and transforming training data.
[INFO] (training_utils) 81,272 usable vectors.
[INFO] (training_utils) 45 discarded due to OTHER labels.
[INFO] (train_model) Queued for 2 separate runs. This may take some time.
[INFO] (train_model) 1st run completed
[INFO] (train_model) 2nd run completed

+---------+---------------------+-------------------+-----------+
| Run     | Word/Token accuracy | Sentence accuracy | Seed      |
+---------+---------------------+-------------------+-----------+
| 1st     | 97.75%              | 94.36%            | 94545036  |
| 2nd     | 98.01%              | 94.83%            | 125339459 |
| -       | -                   | -                 | -         |
| Average | 97.88% ± 0.54%      | 94.60% ± 0.98%    | 125339459 |
| -       | -                   | -                 | -         |
| Best    | 98.01%              | 94.83%            | 125339459 |
| Worst   | 97.75%              | 94.36%            | 94545036  |
+---------+---------------------+-------------------+-----------+
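
As a rough illustration of the output above, the sketch below logs one line per completed run with `logger.info` (in place of a progress bar) and then picks the best and worst runs for the summary rows. The helper names `ordinal` and `summarize_runs` are hypothetical and not taken from the codebase. The `±` figures are deliberately not reproduced, because the description does not say how they are computed.

```python
import logging

logger = logging.getLogger("train_model")  # assumed logger name


def ordinal(n: int) -> str:
    """Return 1 -> '1st', 2 -> '2nd', 11 -> '11th', and so on."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def summarize_runs(runs):
    """Summarize runs given as (word_accuracy, sentence_accuracy, seed) tuples."""
    for i, _ in enumerate(runs, start=1):
        # One plain log line per run instead of a tqdm progress bar.
        logger.info("%s run completed", ordinal(i))
    best = max(runs, key=lambda run: run[0])
    worst = min(runs, key=lambda run: run[0])
    return {
        "mean_word": sum(run[0] for run in runs) / len(runs),
        "mean_sentence": sum(run[1] for run in runs) / len(runs),
        "best_seed": best[2],
        "worst_seed": worst[2],
    }


# Values copied from the two runs in the table above.
runs = [(97.75, 94.36, 94545036), (98.01, 94.83, 125339459)]
summary = summarize_runs(runs)
```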


strangetom · 2025-08-12T16:22:30Z

Thanks @mcioffi

A few minor issues:

Can you run the pre-commit hooks to fix the formatting issues?
Using the fancy_grid style for the tables gives better readability when the table columns are wide (e.g. for grid search results).

strangetom · 2025-08-12T17:04:16Z

Perfect, thanks!

@strangetom

* Drop flask front-end
* First batch of vite/react/ts front-end with flask server
* Updates for docs and readability
* Cleanup
* Update assets + readme
* Refactoring webtools + integrating flask-socketio
* Bump readme
* Bump
* dev reqs & concurrent pids bash in npm
* Change handling of edit mode from modal to inline
* cleanup
* Labeller help hover
* Missed doc notes
* Train model tab updates
  - Cleaned up flask code
  - Removed train tab interrupt button (limitations)
  - Added package requirements precheck on app startup
  - Added time elapsed on training screen
* Accidental none check against thread, preventing re-runs
* Asset screenshot
* Move time tracker on trainer to zustand state
  "Time elapsed" tracking needs to live within managed state, and not UI state, because training is asynchronous and the user can use the app while the model trains in the background
* Bump
* Add seed input to training
* Add seed input to training (server)
* Sets up cache reset on model loader
  Addresses one of the comments by @strangetom on #25 (comment) regarding cache resetting by the @lru_cache decorator on the model loading. This ensures completed training cycles done via the web app (which impact the parser) update the parser optimistically to handle the cache resetting
* Bump up third-party package versions against vite and mantine
* WIP - add gridsearch options, revert to logger.info, and refine console output from websockets
  1. Work-in-progress new feature for gridsearch
  2. Revert to logger.info for websocket output, to be aligned with develop branch
  3. Refine the console output, should have originally used <pre> tags to persist reserved str \n, \t, etc
  Note: This commit refers to a function set_redirect_log_stream, that will be introduced in #35, since it is required to set a logging stream handle correctly to pipe results to websockets
* Screenshots
* Remove remnants
* Cleanup, readability, etc
* Remove interrupt and keepModels based on maintainer feedback
* Pre-commit format fixes, part I
* Pre-commit fixes, part II (adds biomejs for webtools linter/formatter/etc)
  Tacks on to the existing pre-commit hooks in the repo with BiomeJS. BiomeJS is specifically for the web side (typescript/js/css). Local configuration modeled after preconfigs at https://github.com/biomejs/pre-commit. Commit includes all file format fixes necessary. Anticipated that config will be modified as necessary if new web contributors participate in future commits.
* Restore training.sqlite3 to previous commit
  Accidentally included new training.sqlite3 in e01430b
* Address bug feedback, round I
  - [x] Parser: The amount flags are not shown in the results table
  - [x] Parser: order labels in token tooltip as per old webapp
  - [x] Parser: Reorder rows in result table as per old webapp
  - [x] General: Can we avoid use of google fonts? Everything else (after running npm install) is entirely local.
  - [x] Parser: show % sign in tooltips for scores for each label
  - [x] Trainer: Output includes debug info app.sockets, which isn't relevant to training.
  - [x] Trainer: html and tsv files saved in webtools directory instead of parent
  - [x] General: on the parser and labeller pages, the contrast between text and colours is a bit low (particularly for anything on a red background). This is probably best fixed by increasing the font size on those pages, since it's a little small anyway.
  - [x] Trainer: default model location incorrect (this was changed recently, it should be in ingredient_parser/en/data/)
  - [x] Trainer: split value does not allow more than one decimal place. There shouldn't really be any limit on the number of decimal places (but in practice, 3 might be a more reasonable limit than 1).
  - [x] Trainer: saveur dataset is selected by default, but doesn't exist.
  - [x] Parser: Missing separate_names option
* Address bugs feedback, round II
  - [x] Trainer: Inputs for seed, split do not focus on click so cannot edit without tabbing to the inputs or finding another way to focus them. This might be a wider problem than just those.
  - [x] Trainer: unable to start another training run after completing a previous one if confusion matrix is generated by the first run. There's a RuntimeError related to Tk.
* Address bugs feedback, round III
  - [x] Labeller: Unable to search unicode fractions
  - [x] Labeller: Unable to quickly select a label for a token using first letter (as per <select> element)
  - [x] Labeller: Bug when searching (coarse, dried returns no results but should return sentence id 4718)
* Address bugs feedback, round IV
  - [x] Trainer: The split value seems to be capped at 0.9. That should probably be changed to 0.999
  - [x] Labeller: There's a weird bug when changing the label for a token: an additional PUNC token gets inserted. This seems to affect any sentence containing a comma when you change the label of the first token.
* Address bugs feedback, round V
  - [x] Labeller: Is it possible to be able to tab between tokens in a sentence to change to the next one?
* Address bugs feedback, round VI
  - [x] Trainer: Gridsearch incomplete

---------

Co-authored-by: tom <tpstrange@gmail.com>

mcioffi added 4 commits August 11, 2025 14:19

Drop tqdm in favor of logger; normalize use of tabulate

d25ccd9

Changes are to accommodate strangetom#25, as webtools will need basic logging to handle websocket logging behavior required for front-end

Missed reference

703b696

Preserve logging bypass from existing branch

f842f3e

Remove flask web package dependencies until PR#25

80ac95f

Introduce decorate fn for redirecting log stream to be later used in web app/tool

3750338

Refer to notes on other PR at strangetom@371a060

Formatting fixes with pre-commit

670e266

strangetom merged commit 50c12f3 into strangetom:develop Aug 12, 2025
4 checks passed

mcioffi deleted the pr/35 branch August 27, 2025 11:57
