Redesign for webtools — move to vite/react/ts #25

mcioffi · 2024-10-30T04:15:49Z

The New 💯

TL;DR: Moves the Flask front-end to a Vite/React/TypeScript front-end, preserving the basic Flask server.

Mantine holds most of the weight across the app for base component functionality
Combines the original labeller and webapp into one webtools
Webtools has three main features: (a) the main parser to test sentences and ingredients, (b) the labeller to edit and improve the dataset, and (c) a trigger to train the model
A favicon/logo
The trigger to train the model, (c) above, still needs to be built; initial thinking is to use websockets
Preserve most, if not all, of the original repo features; not everything is migrated yet
Add some testing framework for unit tests
General cleanup and simpler ways to spin up dev instance of flask plus vite/react/typescript

"Too Early" Sneak Peek 👀

mcioffi · 2025-04-30T12:43:55Z

@strangetom does pycrfsuite expose some sort of progress on the training execution? I looked at their docs on the Trainer class, but figured I would pose the question here, as it relates to this PR, since I am building a piece of the web tool for triggering and working with the training parts of this repo.

ingredient-parser/train/train_model.py

Line 259 in ec619bb

trainer.train(save_model)

strangetom · 2025-04-30T18:36:32Z

It's possible to set trainer = pycrfsuite.Trainer(verbose=True), which prints output like this during model training:

***** Iteration #774 *****
Loss: 1743.588445
Feature norm: 27.243591
Error norm: 18.806226
Active features: 5412
Line search trials: 1
Line search step: 1.000000
Seconds required for this iteration: 0.056

***** Iteration #775 *****
Loss: 1743.586763
Feature norm: 27.243678
Error norm: 6.752622
Active features: 5411
Line search trials: 2
Line search step: 0.500000
Seconds required for this iteration: 0.111

...etc

It's not a measure of progress as such, but it is an indication that something is happening.
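If the web tool wants something more structured than raw text, the verbose lines could be parsed as they arrive. The following is a minimal sketch based only on the sample output quoted above; the `IterationStats` type and `parse_verbose_output` function are hypothetical names, and it assumes the "Iteration" and "Loss" line formats stay as shown.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class IterationStats(NamedTuple):
    iteration: int
    loss: float


# Patterns inferred from the sample output in this thread (an assumption,
# not a documented format guarantee).
ITERATION_RE = re.compile(r"^\*+ Iteration #(\d+) \*+$")
LOSS_RE = re.compile(r"^Loss: ([0-9.eE+-]+)$")


def parse_verbose_output(lines: Iterable[str]) -> Iterator[IterationStats]:
    """Yield one IterationStats per iteration block found in `lines`."""
    current = None
    for raw in lines:
        line = raw.strip()
        match = ITERATION_RE.match(line)
        if match:
            current = int(match.group(1))
            continue
        match = LOSS_RE.match(line)
        if match and current is not None:
            yield IterationStats(current, float(match.group(1)))
            current = None


sample = """\
***** Iteration #774 *****
Loss: 1743.588445
Feature norm: 27.243591

***** Iteration #775 *****
Loss: 1743.586763
Feature norm: 27.243678
"""

stats = list(parse_verbose_output(sample.splitlines()))
for item in stats:
    print(f"iteration {item.iteration}: loss {item.loss}")
```

As noted above, this still only shows that training is moving; without a known iteration cap it cannot be turned into a percentage.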

mcioffi · 2025-05-08T12:38:02Z

@strangetom can you give the PR a spin on your machine, and run through the "Try the Parser" and "Browse & Label" tabs? Node must be installed; it's managed outside of Python, as it's a JS runtime. Once it's globally installed on your machine, you can download the packages with /webtools/ as your working directory:

npm install

Then you need to run these three separate instances, and access the app in your browser the same as you would have before, typically http://localhost:5000 or http://127.0.0.1:5000

npm run flask
npm run flask-sockets
npm run watch

Depending on your Python environment setup, you might need to map the Python path in the npm package.json, in which case you can change these accordingly:

"flask": "python3_path=$(which python3) && \"$python3_path\" app.py",
"flask-sockets": "python3_path=$(which python3) && \"$python3_path\" app.sockets.py",

I have made some good progress on the "Train Model" tab, but there are some outstanding issues; you can ignore that tab for the moment.

strangetom · 2025-05-12T15:25:07Z

Hi @mcioffi

I've got this working and have had a play with it. This is very impressive.

I have a couple of notes:

flask_cors is missing from requirements-dev.txt.
Is there a way to run this without running the three commands?
I'm not sure if there's a need for a separate edit mode when browsing the training data.

mcioffi · 2025-05-13T13:57:21Z

Many thanks for the feedback. I pushed some updates based on your notes. I will have the 'Train model' tab done in coming weeks.

flask_cors is missing from requirements-dev.txt.

All set, thank you

Is there a way to run this without running the three commands?

Pulled everything under npm run dev; it should now spawn three separate processes within the same terminal session

I'm not sure if there's a need for a separate edit mode when browsing the training data.

Good point. I folded that functionality inline now, instead of in a separate user experience

- Cleaned up flask code
- Removed train tab interrupt button (limitations)
- Added package requirements precheck on app startup
- Added time elapsed on training screen

mcioffi · 2025-07-31T22:38:56Z

@strangetom one more batch of requests — could you spin an instance up, and validate the following:

Train a model with configured inputs (e.g. selected datasets, html results file, split value, etc)
Navigate and use other app features while the training is running in the background
Try to close the browser tab / window while the training is running in the background — should receive a prompt
Uninstall a requirements-dev.txt package (e.g. tqdm), and restart the app — should receive a prompt
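For the last item, a startup precheck could look roughly like the sketch below. It is not the code in this PR; `missing_packages` and `precheck` are hypothetical helpers, and the list of names passed in is a placeholder. It checks import names, which can differ from the distribution names listed in requirements-dev.txt, so a real check would need that mapping.

```python
import importlib.util
from typing import List, Sequence


def missing_packages(import_names: Sequence[str]) -> List[str]:
    """Return the top-level import names that cannot be found."""
    return [
        name for name in import_names
        if importlib.util.find_spec(name) is None
    ]


def precheck(import_names: Sequence[str]) -> None:
    """Stop startup with a readable message if anything is missing."""
    missing = missing_packages(import_names)
    if missing:
        raise SystemExit(
            "Missing packages: " + ", ".join(missing)
            + ". Install the dev requirements and restart."
        )


# Placeholder names from the standard library so the sketch runs anywhere;
# a real list would hold the app's actual dependencies.
precheck(["json", "logging"])
print("all required packages found")
```

Raising `SystemExit` makes the app abort on a missing dependency; logging the message and continuing is the other option, and which one is wanted is a design decision.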

Due to how threading in Python works, which Flask-SocketIO uses to spawn separate processes without interfering with websockets, there is no true way to interrupt or terminate the training in the background. I left it out of scope, and will assume this meets the basic needs of most users. Also out of scope is any "new language" database/training (e.g. Spanish), and 1:1 configuration with the command line options for training (e.g. the app uses the default SQLite file/directory/etc versus 100% configuration in a terminal).

Python’s Thread class supports a subset of the behavior of Java’s Thread class; currently, there are no priorities, no thread groups, and threads cannot be destroyed, stopped, suspended, resumed, or interrupted. The static methods of Java’s Thread class, when implemented, are mapped to module-level functions.
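To illustrate the constraint, here is a small sketch of a blocking job on a worker thread. `BackgroundJob` is a hypothetical class, not the one used in webtools. The caller can start the job, poll it, and read the elapsed time, but there is no public call to stop the thread once the blocking function is running.

```python
import threading
import time


class BackgroundJob:
    """Run one blocking callable on a worker thread and expose its status."""

    def __init__(self, target):
        self._target = target
        self._thread = None
        self._started_at = None
        self._finished_at = None
        self.result = None

    def start(self):
        if self.is_running():
            raise RuntimeError("job already running")
        self._started_at = time.monotonic()
        self._finished_at = None
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        try:
            self.result = self._target()  # blocks until the work is done
        finally:
            self._finished_at = time.monotonic()

    def is_running(self):
        return self._thread is not None and self._thread.is_alive()

    def elapsed(self):
        """Seconds since start, frozen once the job has finished."""
        if self._started_at is None:
            return 0.0
        end = self._finished_at
        if end is None:
            end = time.monotonic()
        return end - self._started_at

    def wait(self, timeout=None):
        if self._thread is not None:
            self._thread.join(timeout)


# time.sleep stands in for the blocking training call.
job = BackgroundJob(lambda: time.sleep(0.05) or "model saved")
job.start()
while job.is_running():
    time.sleep(0.01)  # the caller stays free to do other work here
print(job.result, f"after {job.elapsed():.2f}s")
```

Because the elapsed time is kept on the job object rather than in whatever is displaying it, any view can read it at any point while the work continues.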

"Time elapsed" tracking needs to live within managed state, not UI state, because training is asynchronous and the user can keep using the app while the model trains in the background

strangetom · 2025-08-01T18:45:39Z

Hi @mcioffi

I had to add allow_unsafe_werkzeug=True to the socketio.run() function in app.sockets.py to get the training functionality to work.

Training worked with the various options I tried.
- It would be handy if there was an option to set the seed value for the training run. I use it a lot to compare performance when I'm making changes.
Whilst training the model I was able to use the other functionality in the app.
Attempting to close the tab whilst training did display a prompt.
Uninstalling a dependency did show an error in the terminal when starting the app.
- However the app didn't abort, which is what I expected would happen.

I've got a couple more questions:

If I train a new model and then go to use the parser functionality in the app, will it use the new model or the model that was present when the app was started? The load_parser_model function in the package is cached with an LRU cache, so unless you're handling that specifically I think the app would continue with the older model.
Since you originally created this branch there have been some changes to the training process. Some of the command line options have changed and I've switched from using print to logging for displaying progress - there might be some impact on this.
Is it necessary to have the entire node_modules folder committed to git? Does the package-lock.json provide enough information to make it reproducible without copying node_modules around?

mcioffi · 2025-08-01T21:01:36Z

It would be handy if there was an option to set the seed value for the training run. I use it a lot to compare performance when I'm making changes.

👍🏼 makes sense — will add the seed input, mimicking the existing

If I train a new model and then go to use the parser functionality in the app, will it use the new model or the model that was present when the app was started? The load_parser_model function in the package is cached with an LRU cache, so unless you're handling that specifically I think the app would continue with the older model.

Let me play around with the lru_cache functionality. It looks like it should be simple to reset from the web app once the training cycle is complete: "The decorator also provides a cache_clear() function for clearing or invalidating the cache"
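A small sketch of the stale-cache behaviour and the reset, using a stand-in function. `load_model` and `MODEL_VERSION` are made up for illustration and are not the package's `load_parser_model`, but `cache_clear()` is available on any function wrapped with `functools.lru_cache`.

```python
from functools import lru_cache

# Stands in for the model file on disk; training would overwrite it.
MODEL_VERSION = {"value": 1}


@lru_cache(maxsize=None)
def load_model(lang: str) -> str:
    """Pretend to load a model; the return value records the version read."""
    return f"{lang}-model-v{MODEL_VERSION['value']}"


before = load_model("en")      # first call reads version 1 and caches it
MODEL_VERSION["value"] = 2     # a training run writes a new model
stale = load_model("en")       # cache hit: still version 1
load_model.cache_clear()       # invalidate once training completes
fresh = load_model("en")       # reads version 2

print(before, stale, fresh)
```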

Since you originally created this branch there have been some changes to the training process. Some of the command line options have changed and I've switched from using print to logging for displaying progress - there might be some impact on this.

Currently using contextlib.redirect_stdout to swallow the print output. Will adjust to accommodate logging versus print, and check out your develop branch to see what's upcoming
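The difference between the two capture approaches can be sketched as follows. This is an illustration only, not the `set_redirect_log_stream` helper mentioned elsewhere in this thread; the logger name and the two step functions are hypothetical. Output from `print` is captured by swapping `sys.stdout`, whereas log records are delivered to handlers, so capturing them means attaching a handler that writes to the target stream.

```python
import contextlib
import io
import logging


def legacy_step():
    # Old style: progress reported with print
    print("Iteration 1")


logger = logging.getLogger("train_sketch")  # hypothetical logger name


def logged_step():
    # New style: progress reported through logging
    logger.info("Iteration 1")


# 1. print-based output: temporarily replace sys.stdout
print_buf = io.StringIO()
with contextlib.redirect_stdout(print_buf):
    legacy_step()
printed = print_buf.getvalue()

# 2. logging-based output: attach a handler that writes to our stream
log_buf = io.StringIO()
handler = logging.StreamHandler(log_buf)
handler.setFormatter(logging.Formatter("%(message)s"))
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the sketch's records away from root handlers
logger.addHandler(handler)
try:
    logged_step()
finally:
    logger.removeHandler(handler)  # avoid stacking handlers across runs
logged = log_buf.getvalue()

print(repr(printed), repr(logged))
```

Either buffer could be replaced by a file-like object whose `write` forwards each chunk to a websocket.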

Is it necessary to have the entire node_modules folder committed to git? Does the package-lock.json provide enough information to make it reproducible without copying node_modules around?

Correct, node_modules can be removed when copying/moving around the file system. Running npm install restocks the packages, and package-lock.json is usually preserved in source control (from what I have seen)

Changes are to accommodate strangetom#25, as webtools will need basic logging to handle websocket logging behavior required for front-end

…le output from websockets
1. Work-in-progress new feature for gridsearch
2. Revert to logger.info for websocket output, to be aligned with develop branch
4. Refine the console output, should have originally used <pre> tags to persist reserved str \n, \t, etc
Note: This commit refers to a function set_redirect_log_stream, that will be introduced in strangetom#35, since it is required to set a logging stream handle correctly to pipe results to websockets

* Drop tqdm in favor of logger; normalize use of tabulate
  Changes are to accommodate #25, as webtools will need basic logging to handle websocket logging behavior required for front-end
* Missed reference
* Preserve logging bypass from existing branch
* Remove flask web package dependencies until PR#25
* Introduce decorate fn for redirecting log stream to be later used in web app/tool
  Refer to notes on other PR at 371a060
* Formatting fixes with pre-commit

…/etc) Tacks on to the existing pre-commit hooks in the repo with BiomeJS. BiomeJS is specifically for the web side (typescript/js/css). Local configuration modeled after preconfigs at https://github.com/biomejs/pre-commit. Commit includes all file format fixes necessary. Anticipated that config will be modified as necessary if new web contributors participate in future commits.

Accidentally included new training.sqlite3 in strangetom@e01430b

strangetom · 2025-08-16T14:39:55Z

Hi @mcioffi

I've had another, more in-depth look at this. I've found a couple of bugs that will need fixing before merging. These are in no particular order.

Bugs

General: on the parser and labeller pages, the contrast between text and colours is a bit low (particularly for anything on a red background). This is probably best fixed by increasing the font size on those pages, since it's a little small anyway.
Parser: Missing separate_names option
Labeller: Bug when searching (coarse, dried returns no results but should return sentence id 4718)
Labeller: Unable to quickly select a label for a token using first letter (as per <select> element)
Trainer: Inputs for seed, split do not focus on click so cannot edit without tabbing to the inputs or finding another way to focus them. This might be a wider problem than just those.
Trainer: Output includes debug info from app.sockets, which isn't relevant to training.
Trainer: default model location incorrect (this was changed recently, it should be in ingredient_parser/en/data/)
Trainer: saveur dataset is selected by default, but doesn't exist.
Trainer: html and tsv files saved in webtools directory instead of parent
Trainer: unable to start another training run after completing a previous one if confusion matrix is generated by the first run. There's a RuntimeError related to Tk.
Trainer: split value does not allow more than one decimal place. There shouldn't really be any limit on the number of decimal places (but in practice, 3 might be a more reasonable limit than 1).

Not bugs, but would be nice

Parser: show % sign in tooltips for scores for each label
Parser: order labels in token tooltip as per old webapp
Parser: Reorder rows in result table as per old webapp
Parser: The amount flags are not shown in the results table
General: Can we avoid use of google fonts? Everything else (after running npm install) is entirely local.

This is a seriously impressive bit of work, thanks again!

- [x] Parser: The amount flags are not shown in the results table
- [x] Parser: order labels in token tooltip as per old webapp
- [x] Parser: Reorder rows in result table as per old webapp
- [x] General: Can we avoid use of google fonts? Everything else (after running npm install) is entirely local.
- [x] Parser: show % sign in tooltips for scores for each label
- [x] Trainer: Output includes debug info from app.sockets, which isn't relevant to training.
- [x] Trainer: html and tsv files saved in webtools directory instead of parent
- [x] General: on the parser and labeller pages, the contrast between text and colours is a bit low (particularly for anything on a red background). This is probably best fixed by increasing the font size on those pages, since it's a little small anyway.
- [x] Trainer: default model location incorrect (this was changed recently, it should be in ingredient_parser/en/data/)
- [x] Trainer: split value does not allow more than one decimal place. There shouldn't really be any limit on the number of decimal places (but in practice, 3 might be a more reasonable limit than 1).
- [x] Trainer: saveur dataset is selected by default, but doesn't exist.
- [x] Parser: Missing separate_names option

- [x] Trainer: Inputs for seed, split do not focus on click so cannot edit without tabbing to the inputs or finding another way to focus them. This might be a wider problem than just those.
- [x] Trainer: unable to start another training run after completing a previous one if confusion matrix is generated by the first run. There's a RuntimeError related to Tk.

mcioffi · 2025-08-17T18:49:06Z

Labeller: Bug when searching (coarse, dried returns no results but should return sentence id 4718)

Thanks! For this one ^, do you prefer to catch other intermediary punctuations too, so , ; - ——. I also noticed that fractions, e.g. ½, don't work with labeller search, so will address that too. Is there a way to reverse convert an already altered unicode fraction 1#1$2 back to 1 ½? I believe will need to include that too. For example, if you filter only against QTY label in labeller, you hit this scenario where the search term will look against raw tokens, e.g. ["1#1$2", "cup", "heavy", "cream"] sentence id 10414 —>

ingredient-parser/labeller/__init__.py

Line 248 in 9a7946f

partial_sentence = " ".join(

Labeller: Unable to quickly select a label for a token using first letter (as per select element)

And do you have a screenshot handy for this ^, trying to visualize it for clarity, thanks!

strangetom · 2025-08-17T19:15:23Z

Thanks! For this one ^, do you prefer to catch other intermediary punctuations too, so , ; - ——. I also noticed that fractions, e.g. ½, don't work with the labeller search, so will address that too.

I think it should match the search phrase exactly, so coarse, dried should only find sentences that contain the substring coarse, dried.
Good catch on the fractions, I'd not noticed that.
There's also an existing bug with the old labeller app where it would never find any matching sentences if the search phrase ended with punctuation. Fixing that would be handy too.

Labeller: Unable to quickly select a label for a token using first letter (as per select element)

And do you have a screenshot handy for this ^, trying to visualize it for clarity, thanks!

Sorry, my explanation was pretty terrible. With the old labeller app (which uses the <select> element for the dropdowns), when the dropdown is focussed you can type the first letter of the label you want and it will select the first option that starts with that letter. If there are multiple options that start with the same letter, typing the letter again will move on to the next option starting with that letter.
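The cycling behaviour described here can be written down as a small function. This is a sketch of the logic only, in Python for brevity; `next_by_first_letter` is a hypothetical name, and the label list is illustrative (PREP is made up here to give a second label starting with P).

```python
from typing import Sequence


def next_by_first_letter(
    options: Sequence[str], current_index: int, letter: str
) -> int:
    """Return the index of the next option starting with `letter`.

    The search starts just after `current_index` and wraps around, so
    pressing the same letter repeatedly cycles through the matches.
    If nothing matches, the selection stays where it is.
    """
    count = len(options)
    letter = letter.lower()
    for offset in range(1, count + 1):
        index = (current_index + offset) % count
        if options[index].lower().startswith(letter):
            return index
    return current_index


labels = ["B-NAME", "PUNC", "PREP", "QTY"]  # illustrative label list

selected = 0
selected = next_by_first_letter(labels, selected, "p")  # moves to PUNC
first_p = labels[selected]
selected = next_by_first_letter(labels, selected, "p")  # moves to PREP
second_p = labels[selected]
selected = next_by_first_letter(labels, selected, "p")  # wraps to PUNC
third_p = labels[selected]
print(first_p, second_p, third_p)
```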

strangetom · 2025-08-17T19:25:20Z

Is there a way to reverse convert an already altered unicode fraction 1#1$2 back to 1 ½? I believe will need to include that too.

Ah, that is a problem. I don't think I've ever encountered that when using the labeller, which is not to say we shouldn't fix it.
Would it be possible to use the PreProcessor._identify_fractions function on the search phrase to convert to the 1#1$2 form and then try to do the match? We'd probably need to refactor _identify_fractions to move it into the _utils.py file.

Edit: It might actually be easier to just run the search phrase through the PreProcessor and compare the resultant tokens with the tokens in the database.

- [x] Labeller: Unable to search unicode fractions - [x] Labeller: Unable to quickly select a label for a token using first letter (as per <select> element) - [x] Labeller: Bug when searching (coarse, dried returns no results but should return sentence id 4718)

mcioffi · 2025-08-17T21:11:12Z

Edit: It might actually be easier to just run the search phrase through the PreProcessor and compare the resultant tokens with the tokens in the database.

Agreed, went that route. I used PreProcessor("input").sentence, so there is no need to modify PreProcessor._identify_fractions. Re: "trailing sentence period", it appears PreProcessor does strip it.
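Once the search phrase and the stored sentence are in the same normalised form, the comparison itself is a contiguous token match. The sketch below assumes that normalisation has already happened; `toy_tokenize` is a hypothetical stand-in that only splits punctuation from words and does not reproduce what PreProcessor does. The first stored token list is the one quoted earlier in this thread, and the second is made up for illustration.

```python
import re
from typing import List, Sequence


def toy_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: words (including the #/$ fraction
    markers) stay whole, and each punctuation mark becomes its own token."""
    return re.findall(r"[\w#$]+|[^\w\s]", text)


def contains_tokens(sentence: Sequence[str], query: Sequence[str]) -> bool:
    """True if `query` appears as a contiguous run inside `sentence`."""
    size = len(query)
    if size == 0:
        return True
    return any(
        list(sentence[start:start + size]) == list(query)
        for start in range(len(sentence) - size + 1)
    )


stored = ["1#1$2", "cup", "heavy", "cream"]
other = ["2", "cups", "coarse", ",", "dried", "breadcrumbs"]  # made-up sentence

print(contains_tokens(stored, toy_tokenize("1#1$2 cup")))
print(contains_tokens(other, toy_tokenize("coarse, dried")))
print(contains_tokens(other, toy_tokenize("coarse dried")))
```

Matching on tokens rather than on the joined string means the comma in "coarse, dried" has to line up with a comma token, which gives the exact-phrase behaviour asked for above.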

strangetom · 2025-08-18T15:10:19Z

Thanks for those bugfixes. There are a few remaining that I can find:

Thanks for changing the labeller to use the combobox, it makes things much easier. Is it possible to be able to tab between tokens in a sentence to change to the next one?
Trainer: The split value seems to be capped at 0.9. That should probably be changed to 0.999 - I was trying to use 0.95 to quickly train a model to test the webtools.
Labeller: There's a weird bug when changing the label for a token - an additional PUNC token gets inserted.
- Steps to reproduce:
  1. Open labeller
  2. Load the NYT dataset
  3. Enable editing
  4. Change the label of the first token in the first sentence
  5. Observe an additional comma token is inserted (after the word squash).
    This seems to affect any sentence containing a comma when you change the label of the first token.

mcioffi · 2025-08-18T16:37:37Z

Thanks for changing the labeller to use the combobox, it makes things much easier. Is it possible to be able to tab between tokens in a sentence to change to the next one?

Took a look at Mantine's Combobox source code. Unfortunately, it looks like they only reserve ArrowUp and ArrowDown for navigating. Hopefully those keys work for now to navigate via the keyboard in the dropdown

Labeller: There's a weird bug when changing the label for a token - an additional PUNC token gets inserted.

Ah ok, this one is related to React's rendering of lists with keys, plus some code fragility. I forgot that the underlying token data can be repetitive and non-unique, i.e. [[',','PUNC'],['vegetable','B-NAME'],[',','PUNC']]. I will submit some fixes and refactor a bit. Thanks for catching.

Edit: Misinterpreted Combobox question, thought it was referring to the dropdown label options, not the separate tokens in the sentence. Fixed thanks

- [x] Trainer: The split value seems to be capped at 0.9. That should probably be changed to 0.999
- [x] Labeller: There's a weird bug when changing the label for a token - an additional PUNC token gets inserted. This seems to affect any sentence containing a comma when you change the label of the first token.

- [x] Labeller: Is it possible to be able to tab between tokens in a sentence to change to the next one?

- [x] Trainer: Gridsearch incomplete

mcioffi and others added 6 commits October 29, 2024 23:14

Drop flask front-end

4ee2b39

First batch of vite/react/ts front-end with flask server

13e4e75

Updates for docs and readability

3739b16

Cleanup

4e6f681

Merge pull request '2.1.0' (#211) from develop into master

ec619bb

Merge branch 'master' into pr/25

049c245

mcioffi added 3 commits May 8, 2025 08:18

Update assets + readme

a436994

Refactoring webtools + integrating flask-socketio

1258afb

Bump readme

f0a03eb

Bump

1969dd5

mcioffi added 3 commits May 12, 2025 13:16

dev reqs & concurrent pids bash in npm

43b0cfa

Change handling of edit mode from modal to inline

5678e6f

cleanup

fe6d1d5

mcioffi added 3 commits May 13, 2025 10:06

Labeller help hover

b09c9ea

Missed doc notes

8d9fce8

Train model tab updates

b70bf64

- Cleaned up flask code
- Removed train tab interrupt button (limitations)
- Added package requirements precheck on app startup
- Added time elapsed on training screen

mcioffi added 4 commits July 31, 2025 21:43

Accidental none check against thread, preventing re-runs

013bba3

Asset screenshot

17bcc2e

Move time tracker on trainer to zustrand state

394a1ee

"Time elapsed" tracking needs to live within managed state, not UI state, because training is asynchronous and the user can keep using the app while the model trains in the background

Bump

d038b96

mcioffi added 2 commits August 3, 2025 07:25

Add seed input to training

4f7e5c0

Add seed input to training (server)

e35f0bc

mcioffi mentioned this pull request Aug 11, 2025

Drop tqdm in favor of logger; normalize use of tabulate #35

Merged

4 tasks

mcioffi changed the base branch from master to develop August 12, 2025 17:20

mcioffi added 8 commits August 12, 2025 20:08

Merge remote-tracking branch 'upstream/develop' into pr/25

f3d02c8

Screenshots

1e8bcc1

Remove remnants

fc525af

Cleanup, readability, etc

c2522c6

Remove interrupt and keepModels based on maintainer feedback

c60cee9

Pre-commit format fixes, part I

d40e038

Restore training.sqlite3 to previous commit

5dabdd8

Accidentally included new training.sqlite3 in strangetom@e01430b

mcioffi added 2 commits August 16, 2025 13:12

mcioffi added 3 commits August 18, 2025 15:39

Address bugs feedback, round V

960b113

- [x] Labeller: Is it possible to be able to tab between tokens in a sentence to change to the next one?

Address bugs feedback, round VI

f467eab

- [x] Trainer: Gridsearch incomplete

strangetom merged commit 1d8c045 into strangetom:develop Aug 26, 2025
4 checks passed

mcioffi deleted the 1.2.0-webtools branch August 27, 2025 11:57

This was referenced Aug 28, 2025

[WIP] Patches for upcoming 2.3.0 #37

Closed

Patches for upcoming 2.3.0 #38

Merged
