Skill-Extraction Refactor (ESCO + KSA) #102

phanindra-max · 2025-06-12T17:57:34Z

Taxonomy-aware skill extraction
• Integrates a FAISS index of the ESCO skills taxonomy.
• Skill_Extractor.get_top_esco_skills() now returns {Skill, index, score} for each match, which enables deterministic Skill Tag values of the form ESCO.<index>.
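As a rough illustration of the tag format described above, the sketch below turns match records into Skill Tag strings. The record keys follow the {Skill, index, score} shape from this release; the helper name `skill_tag` and the sample values are hypothetical placeholders, not real ESCO entries or part of the laiser API.

```python
from typing import Dict, List, Union

# One match record in the {Skill, index, score} shape described in this release.
Match = Dict[str, Union[str, int, float]]


def skill_tag(esco_index: int) -> str:
    """Build a deterministic tag of the form ESCO.<index> (hypothetical helper)."""
    return f"ESCO.{esco_index}"


# Placeholder records for illustration only; the values are made up.
matches: List[Match] = [
    {"Skill": "example skill A", "index": 1024, "score": 0.87},
    {"Skill": "example skill B", "index": 7, "score": 0.61},
]

# The same index always yields the same tag, regardless of score or ordering.
tags = [skill_tag(int(m["index"])) for m in matches]
print(tags)
```

Because the tag depends only on the ESCO index, re-running extraction on the same text should give the same tag even if similarity scores shift slightly.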
KSA enrichment with vLLM
• New helper get_ksa_details() generates Knowledge Required and Task Abilities lists for each skill.
• Automatically invoked when a GPU/vLLM backend is available.
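One way to picture the conditional enrichment is sketched below. This is not the library's actual detection logic, which these notes do not describe: the functions `vllm_installed` and `enrich_skills`, and the assumption that a KSA helper takes a skill string and returns a dict with two lists, are all hypothetical.

```python
import importlib.util
from typing import Callable, Dict, List, Optional


def vllm_installed() -> bool:
    """Return True if the vllm package is importable (an assumed availability check)."""
    return importlib.util.find_spec("vllm") is not None


def enrich_skills(
    skills: List[str],
    ksa_fn: Optional[Callable[[str], Dict[str, List[str]]]] = None,
) -> List[dict]:
    """Attach KSA lists to each skill, or empty lists when no KSA helper is given."""
    rows = []
    for skill in skills:
        # ksa_fn stands in for a helper such as get_ksa_details(); its real
        # signature is not documented in this release note.
        details = ksa_fn(skill) if ksa_fn is not None else {}
        rows.append(
            {
                "Raw Skill": skill,
                "Knowledge Required": details.get("Knowledge Required", []),
                "Task Abilities": details.get("Task Abilities", []),
            }
        )
    return rows


# Without a backend, rows still have the same keys, just with empty KSA lists.
rows = enrich_skills(["example skill"])
print(rows)
```

Keeping the row shape identical in both branches means downstream code does not need to special-case machines without a GPU.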
Unified output schema
The extractor returns a tidy DataFrame with seven columns:
Research ID, Description, Raw Skill, Knowledge Required, Task Abilities, Skill Tag, Correlation Coefficient.
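Downstream code that depends on this schema may want to check it up front. The sketch below is a minimal, standard-library-only check; the column names are taken from the list above, while `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names. It accepts any iterable of column names, so passing `df.columns` from the returned DataFrame should work.

```python
from typing import Iterable, List

# The seven columns named in this release, in the order listed.
EXPECTED_COLUMNS: List[str] = [
    "Research ID",
    "Description",
    "Raw Skill",
    "Knowledge Required",
    "Task Abilities",
    "Skill Tag",
    "Correlation Coefficient",
]


def missing_columns(columns: Iterable[str]) -> List[str]:
    """Return the expected columns that are absent, preserving schema order."""
    present = set(columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]


# Example: output that lacks the two KSA columns.
legacy_like = ["Research ID", "Description", "Raw Skill", "Skill Tag", "Correlation Coefficient"]
print(missing_columns(legacy_like))
```

An empty result means all seven columns are present; extra columns are ignored by this check.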

Area	Description
utils.py	`get_top_esco_skills()` enhanced to include ESCO index and similarity score.
llm_methods.py	Added `get_ksa_details()` plus supporting imports.
skill_extractor.py	• Ensured `self.index` is always defined. • `build_faiss_index_esco()` and `load_faiss_index_esco()` are now instance methods that store the index under `laiser/input`. • New taxonomy-first pipeline runs at the top of `extractor()`; the legacy alignment is kept as a fallback.

align_skills() and align_KSAs() are deprecated and will be removed in v0.3 once consumers migrate to the new output format.

pip install -U laiser==0.2.2

No changes to input parameters are required, but downstream code should read the new seven-column schema.

0.2 → 0.3

This discussion was created from the release Skill-Extraction Refactor (ESCO + KSA).