Open
Description
Problem: iQual’s text vectorization relies on sentence-transformers models, including older ones such as all-MiniLM-L6-v1. Newer models such as all-MiniLM-L12-v2 may offer better accuracy at a comparable cost; this still needs to be benchmarked.
Proposed Solution: Update src/iqual/text_features.py to support all-MiniLM-L12-v2 as an option in add_text_features. This would:
- Add a parameter to select the model (default to current).
- Update notebook examples (Basic Modelling) to demo the new model.
- Include performance benchmarks (e.g., accuracy on politeness dataset).
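To make the proposal concrete, here is a rough sketch of what a model-selection parameter could look like. It is not iQual's current code: `vectorize_texts`, `SUPPORTED_MODELS`, `DEFAULT_MODEL`, and `encoder_loader` are hypothetical names, and I am assuming the current default is all-MiniLM-L6-v1. The only real library call is `SentenceTransformer(model_name).encode` from sentence-transformers.

```python
from typing import Callable, Sequence

# Assumption: the "current" default model; adjust to whatever iQual actually uses.
DEFAULT_MODEL = "all-MiniLM-L6-v1"
# Hypothetical allow-list of selectable sentence-transformers models.
SUPPORTED_MODELS = ("all-MiniLM-L6-v1", "all-MiniLM-L12-v2")


def load_sentence_encoder(model_name: str) -> Callable:
    """Return the encode function of a sentence-transformers model."""
    # Imported lazily so the module can be imported without the dependency.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name).encode


def vectorize_texts(
    texts: Sequence[str],
    model_name: str = DEFAULT_MODEL,
    encoder_loader: Callable[[str], Callable] = load_sentence_encoder,
):
    """Embed texts with the selected model; one embedding row per input text."""
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(
            f"Unsupported model {model_name!r}; choose one of {SUPPORTED_MODELS}"
        )
    encode = encoder_loader(model_name)
    return encode(list(texts))


# Example call (downloads the model on first use):
# embeddings = vectorize_texts(["Thank you so much!"], model_name="all-MiniLM-L12-v2")
```

Keeping the default unchanged means existing pipelines behave as before, and the `encoder_loader` argument lets tests swap in a stub instead of downloading a model.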
Steps:
- Add model option in `text_features.py`.
- Test on sample data (politeness dataset).
- Update `notebooks/Basic_Modelling.ipynb` with example.
- Add tests for vectorization output.
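For the last step, a test for the vectorization output could start from a simple shape check like the one sketched below. `check_embeddings` and `EXPECTED_DIM` are hypothetical names, not existing iQual helpers. The 384 default reflects the embedding size of both all-MiniLM-L6-v1 and all-MiniLM-L12-v2.

```python
import math
from typing import Sequence

# Both all-MiniLM-L6-v1 and all-MiniLM-L12-v2 produce 384-dimensional embeddings.
EXPECTED_DIM = 384


def check_embeddings(
    embeddings: Sequence[Sequence[float]],
    n_texts: int,
    expected_dim: int = EXPECTED_DIM,
) -> None:
    """Raise AssertionError if the embeddings do not have the expected layout."""
    # One row per input text.
    if len(embeddings) != n_texts:
        raise AssertionError(f"expected {n_texts} rows, got {len(embeddings)}")
    for i, row in enumerate(embeddings):
        # Every row must have the model's embedding dimension.
        if len(row) != expected_dim:
            raise AssertionError(
                f"row {i}: expected {expected_dim} values, got {len(row)}"
            )
        # No NaN or infinite values.
        if not all(math.isfinite(float(value)) for value in row):
            raise AssertionError(f"row {i}: contains NaN or infinite values")
```

The function only relies on `len` and iteration, so it should accept plain lists as well as the NumPy arrays that `encode` returns by default.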
Impact: Improves iQual’s NLP accuracy, aligning with World Bank’s AI-for-data goals.
Willing to Implement: I can submit a PR with code and updated notebook.
@addypy @g4brielvs, seeking your thoughts on adding all-MiniLM-L12-v2 to iQual’s text vectorization to boost NLP accuracy for SDG analysis. Happy to refine benchmarks or model choices per your guidance!