This repository contains the evaluation and analysis code for the paper "Understanding LLMs' Cross-Lingual Context Retrieval: How Good It Is And Where It Comes From".
Run `pip install -r requirements.txt` to install this repository's dependencies.
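For example, typically from the repository root:

```bash
# Install the evaluation and analysis dependencies listed in requirements.txt
pip install -r requirements.txt
```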
- The main evaluation scripts are `test_xquad.sh`, `test_xquad_x-x.sh`, `test_api_xquad.sh`, and `test_api_xquad_x-x.sh` (see the example invocations after this list).
- The script for language error detection is `xquad_lang_detect.sh`.
- The script for generation error detection is `deny_answer_rate.sh`.
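A minimal sketch of how these scripts might be invoked; the argument-free calls below are an assumption, and the scripts may expect parameters such as a model name, language pair, or API key, so check each script's header before running:

```bash
# Main XQuAD-based cross-lingual evaluation, including the x-x (same-language) variants.
# NOTE: argument-free invocation is an assumption made for illustration.
bash test_xquad.sh
bash test_xquad_x-x.sh

# Evaluation variants for API-served models.
bash test_api_xquad.sh
bash test_api_xquad_x-x.sh

# Error analysis: language error detection and generation (answer-denial) error detection.
bash xquad_lang_detect.sh
bash deny_answer_rate.sh
```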
The code files for interpretability analysis are located in the `interpretability` directory; example invocations follow the list below.
- The code for oracle performance estimation is `sentence_attribution_regression.py`.
- The code for MRD measurement is `layerwise_attribution.py`.
- The code for hidden-states similarity calculation is `hidden_states.py`.
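A sketch of running the interpretability analyses, assuming each file is a standalone entry point under `interpretability/`; the real scripts may require arguments such as model paths or dataset settings:

```bash
# Oracle performance estimation via sentence attribution regression.
# NOTE: direct, argument-free invocation is an assumption made for illustration.
python interpretability/sentence_attribution_regression.py

# MRD measurement via layer-wise attribution.
python interpretability/layerwise_attribution.py

# Hidden-states similarity calculation.
python interpretability/hidden_states.py
```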