- 
                Notifications
    
You must be signed in to change notification settings  - Fork 17
 
Redesign for webtools β move to vite/react/ts #25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| 
           @strangetom does pycrfsuite expose some sort of progress on the training execution? I looked at their docs on the Trainer class, but figured I would pose the question here, as it relates to this PR, since I am building a piece of the web tool for triggering and working with the training parts of this repo. ingredient-parser/train/train_model.py Line 259 in ec619bb 
  | 
    
| 
           It's possible to set  It's not a measure of progress as such, but it is an indication that something is happening.  | 
    
| 
           @strangetom can you give the PR a spin on your machine, and run through the "Try the Parser" and "Browse & Label" tabs. Node is required to be installed, its managed outside of Python, as its a JS runtime. Once its globally installed on your machine, you can download the packages with /webtools/ as your working directory: Then you need to run these three separate instances, and access the app in your browser same as you would have before, typically  Depending on your python environment setup, you might need map the python path located in the npm  I have been some good progress on the "Train Model" tab, but there are some outstanding issues β you can ignore that at the moment.  | 
    
| 
           Hi @mcioffi I've got this working and have had a play with it. This is very impressive. I have a couple of notes: 
  | 
    
| 
           Many thanks for the feedback. I pushed some updates based on your notes. I will have the 'Train model' tab done in coming weeks. All set, thank you 
 Pulled everything under  
 Good point. I folded that functionality inline now, instead of in a separate user experience 
  | 
    
- Cleaned up flask code - Removed train tab interrupt button (limitations) - Added package requirements precheck on app startup - Added time elapsed on training screen
| 
           @strangetom one more batch of requests β could you spin an instance up, and validate the following: 
 Due to how threading in python works, which Flask Socket IO uses to spawn separate processes without interfering with websockets, there is no true way to interrupt or terminate the training in the background. I left it out of scope, and will assume this meets the basic needs of most users. Also out of scope is any "new language" database/training (e.g. Spanish), and 1:1 configuration with the command line options for training (e.g. the app uses the default SQLite file/directory/etc versus 100% configuration in a terminal). 
  | 
    
"Time elapsed" tracking needs to live within managed state, and not ui state, due to the training being asynchronous and user can use the app while the model is training in the background
| 
           Hi @mcioffi I had to add  
 I've got a couple more questions: 
  | 
    
          
 ππΌ makes sense β will add the seed input, mimicking the existing 
 Let me play around the functionality of the  
 Currently using  
 Correct,   | 
    
Changes are to accommodate strangetom#25, as webtools will need basic logging to handle websocket logging behavior required for front-end
β¦le output from websockets 1. Work-in-progress new feature for gridsearch 2. Rever to logger.info for websocket output, to be aligned with develop branch 4. Refine the console output, should have originally used <pre> tags to persist reserved str \n, \t, etc Note: This commit refers to a function set_redirect_log_stream, that will be introduced in strangetom#35, since it is required to set a logging stream handle correctly to pipe results to websockets
* Drop tqdm in favor of logger; normalize use of tabulate Changes are to accommodate #25, as webtools will need basic logging to handle websocket logging behavior required for front-end * Missed reference * Preserve logging bypass from existing branch * Remove flask web package dependencies until PR#25 * Introduce decorate fn for redirecting logsteam to be later used in web app/tool Refer to notes on other PR at 371a060 * Formatting fixes with pre-commit
β¦/etc) Tacks on to the existing pre-commit hooks in the repo with BiomeJS. BiomeJS is specifically for the web side (typescript/js/css). Local configuration modeled after preconfigs at https://github.com/biomejs/pre-commit. Commit includes all file format fixes necessary. Anticipated that config will be modified as necessary if new web contributors participate in future commits.
Accidentally included new training.sqlite3 in strangetom@e01430b
| 
           Hi @mcioffi I've had another, more in-depth look at this. I've found a couple of bugs that will need fixing before merging. These are in no particular order. Bugs
 Not bugs, but would be nice
 This is a seriously impressive bit of work, thanks again!  | 
    
- [x] Parser: The amount flags are not shown in the results table - [x] Parser: order labels in token tooltip as per old webapp - [x] Parser: Reorder rows in result table as per old webapp - [x] General: Can we avoid use of google fonts? Everything else (after running npm install) is entirely local. - [x] Parser: show % sign in tooltips for scores for each label - [x] Trainer: Output include debug info app.sockets, which isn't relevant to training. - [x] Trainer: html and tsv files saved in webtools directory instead of parent - [x] General: on the parser and labeller pages, the contrast between text and colours is a bit low (particularly for anything on a red background). This is probably best fixed by increasing the font size on those pages, since it's a little small anyway. - [x] Trainer: default model location incorrect (this was changed recently, is should be in ingredient_parser/en/data/) - [x] Trainer: split value does not allow more than one decimal place. There shouldn't really be any limit of the number of decimal places (but in practice, 3 might be a more reasonable limit than 1). - [x] Trainer: saveur dataset is selected by default, but doesn't exist. - [x] Parser: Missing separate_names option
- [x] Trainer: Inputs for seed, split do not focus on click so cannot edit without tabbing to the inputs or finding another way to focus them. This might be a wider problem than just those. - [x] Trainer: unable to start another training run after completing a previous one if confusion matrix is generated by the first run. There's a RuntimeError related to Tk.
          
 Thanks! For this one ^, do you prefer to catch other intermediary punctuations too, so  ingredient-parser/labeller/__init__.py Line 248 in 9a7946f 
 
 And do you have a screenshot handy for this ^, trying to visualize it for clarity, thanks!  | 
    
          
 I think it should match the search phrase exactly, so  
 Sorry, my explanation was pretty terrible. With the old labeller app (which uses the   | 
    
          
 Ah, that is a problem. I don't think I've ever encountered that when using the labeller, which is not to say we shouldn't fix it. Edit: It might actually be easier to just run the search phrase through the   | 
    
- [x] Labeller: Unable to search unicode fractions - [x] Labeller: Unable to quickly select a label for a token using first letter (as per <select> element) - [x] Labeller: Bug when searching (coarse, dried returns no results but should return sentence id 4718)
          
 Agreed went that route. I used   | 
    
| 
           Thanks for those bugfixes. There a few remaining that I can find 
  | 
    
          
 Took a look at Mantine's Combobox source code. Unfortunately, it looks like they only reserve  
 Ah ok, this one is related React's rendering lists with keys plus some code fragility. I forgot underlying token data can be repetitive, non-unique, i.e.  Edit: Misinterpreted Combobox question, thought it was referring to the dropdown label options, not the separate tokens in the sentence. Fixed thanks  | 
    
- [x] Trainer: The split value seems to be capped at 0.9. That should probably changed 0.999 - [x] Labeller: There's a weird bug when changing the label for a token - an addition PUNC token gets inserted. This seems to effect any sentence containing a comma when you change the label of the first token.
- [x] Labeller: Is it possible to be able to tab between tokens in a sentence to change to the next one?
- [x] Trainer: Gridsearch incomplete

The New π―
TL;DR β Moves the
flaskfront-end tovite/react/typescriptfront-end preserving the basicflaskserver.labellerandwebappinto one webtoolsflaskplusvite/react/typescript"Too Early" Sneak Peak π