Release v0.2.2 · Skill-Extraction Refactor (“ESCO + KSA”)
⚡PR⚡#101
✨ Highlights
- Taxonomy-aware skill extraction
  • Integrates a FAISS index of the ESCO skills taxonomy.
  • Skill_Extractor.get_top_esco_skills() now returns {Skill, index, score}, enabling deterministic Skill Tag values (ESCO.<index>).
- KSA enrichment with vLLM
  • New helper get_ksa_details() generates Knowledge Required and Task Abilities lists for each skill.
  • Automatically invoked when a GPU/vLLM backend is available.
- Unified output schema
  The extractor returns a tidy DataFrame with seven columns: Research ID, Description, Raw Skill, Knowledge Required, Task Abilities, Skill Tag, Correlation Coefficient.
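To illustrate how a {Skill, index, score} hit can map to a deterministic Skill Tag, here is a minimal sketch. It is not the library's implementation: it swaps the FAISS index for a brute-force cosine-similarity search in plain Python, and the skill labels, vectors, and the helper names `top_esco_skills` and `skill_tag` are all hypothetical.

```python
import math

# Hypothetical toy data: three ESCO-like labels with made-up 3-d embeddings.
ESCO_SKILLS = ["manage budgets", "write Python code", "weld metal parts"]
ESCO_VECTORS = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.1, 0.9],
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_esco_skills(query_vector, top_k=2):
    """Return the top_k most similar skills as {Skill, index, score} dicts."""
    scored = [
        {"Skill": skill, "index": i, "score": cosine(query_vector, vec)}
        for i, (skill, vec) in enumerate(zip(ESCO_SKILLS, ESCO_VECTORS))
    ]
    # Highest similarity first (brute force; a FAISS index would replace this scan).
    scored.sort(key=lambda hit: hit["score"], reverse=True)
    return scored[:top_k]


def skill_tag(hit):
    """Build the deterministic tag from the taxonomy row index."""
    return f"ESCO.{hit['index']}"


hits = top_esco_skills([0.2, 0.8, 0.1])
tags = [skill_tag(hit) for hit in hits]
```

Because the tag depends only on the row index in the taxonomy, the same skill receives the same tag on every run, provided the index is built from the same ESCO ordering.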
🔧 Detailed Changes
Area | Description |
---|---|
utils.py | get_top_esco_skills() enhanced to include ESCO index and similarity score. |
llm_methods.py | Added get_ksa_details() plus supporting imports. |
skill_extractor.py | • Ensured self.index is always defined. • build_faiss_index_esco() / load_faiss_index_esco() are now instance methods that store the index under laiser/input. • New taxonomy-first pipeline inserted at the top of extractor(); legacy alignment kept as a fallback. |
⚠️ Deprecated / To Be Removed
align_skills() and align_KSAs() will be dropped in v0.3 once consumers migrate to the new output format.
🚧 Known Issues / Roadmap
- JSON parsing in get_ksa_details() needs additional resilience checks.
- LLM calls are still executed per skill; batching will come in v0.3.
- Duplicate import json lines remain in llm_methods.py.
- Consider CPU-only fallback for KSA generation.
- Persistence of the ESCO vector index should move to a cloud vector DB.
- vLLM isn't supported on MPS/macOS as of now.
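As one possible direction for the JSON resilience item above, the sketch below parses an LLM reply defensively. It is an assumption-laden illustration, not the code in get_ksa_details(): the function name `parse_ksa_response` is hypothetical, and it assumes the model is asked to reply with a JSON object keyed by "Knowledge Required" and "Task Abilities".

```python
import json
import re

# Assumed reply keys, borrowed from the output schema column names.
KSA_KEYS = ("Knowledge Required", "Task Abilities")


def parse_ksa_response(raw_text):
    """Best-effort parse of an LLM reply (a str) into two lists of strings.

    Returns empty lists instead of raising when the JSON is missing or malformed.
    """
    empty = {key: [] for key in KSA_KEYS}
    # Replies often wrap JSON in prose or code fences, so take the
    # span from the first "{" to the last "}".
    match = re.search(r"\{.*\}", raw_text, flags=re.DOTALL)
    if match is None:
        return empty
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty
    result = {}
    for key in KSA_KEYS:
        value = data.get(key, [])
        # Keep only list values and coerce each item to a string.
        result[key] = [str(item) for item in value] if isinstance(value, list) else []
    return result
```

Returning empty lists keeps one bad reply from aborting the whole extraction run; callers that need to distinguish "no KSAs" from "parse failure" would want to log or flag the failure as well.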
⬆️ Upgrade Notes
pip install -U laiser==0.2.2
No changes to input parameters are required, but downstream code should read the new seven-column schema.
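Downstream code may want to fail fast if the expected columns are absent. A minimal sketch, assuming the column names are exactly those listed in the highlights; the helper name `missing_columns` is hypothetical and not part of laiser:

```python
# Column names as listed in these release notes.
EXPECTED_COLUMNS = [
    "Research ID",
    "Description",
    "Raw Skill",
    "Knowledge Required",
    "Task Abilities",
    "Skill Tag",
    "Correlation Coefficient",
]


def missing_columns(columns):
    """Return the expected column names that are absent, in schema order."""
    present = set(columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]
```

With a pandas DataFrame `df` returned by the extractor, `missing_columns(df.columns)` should return an empty list when the output matches the new schema.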
Next up
0.2 → 0.3: adding batching and dropping deprecated APIs; patch versions are incremented for bug fixes.