T-Brain Machine Learning Competition 37: RAG for Financial Data

Final submission to T-Brain's 37th competition: building a retrieval-augmented generation (RAG) system from sample data that answers ambiguous questions about finance, insurance, or other topics by returning the IDs of the relevant documents.

We achieved the best results with minimal data pre-processing. Our RAG method uses a reranker model from the Beijing Academy of Artificial Intelligence (BAAI) to sort documents by relevance to the query. It then splits the top three documents into chunks, reranks the chunks, and returns the top result.
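The two-stage procedure described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the names `chunk_text`, `retrieve`, and `toy_score`, the character-window chunking, and the chunk size and overlap are all assumptions. The `score` callable stands in for the BAAI reranker, which in practice would be a model loaded through `transformers`; a toy word-overlap scorer is used here so the sketch runs without downloading a model.

```python
from typing import Callable, Dict, List

# A scorer takes (query, passage) and returns a relevance score.
# In the real pipeline this role would be played by the reranker model.
Scorer = Callable[[str, str], float]


def chunk_text(text: str, size: int = 256, overlap: int = 64) -> List[str]:
    """Split text into overlapping character windows (sizes are assumed)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text), 1), step)]


def retrieve(query: str, docs: Dict[int, str], score: Scorer,
             top_k: int = 3, size: int = 256, overlap: int = 64) -> int:
    """Return the ID of the document judged most relevant to the query."""
    # Stage 1: rank whole documents and keep the top_k candidates.
    ranked = sorted(docs, key=lambda d: score(query, docs[d]), reverse=True)
    shortlist = ranked[:top_k]

    # Stage 2: chunk each candidate, score every chunk, and keep the
    # document that owns the single best-scoring chunk.
    best_id, best_score = shortlist[0], float("-inf")
    for doc_id in shortlist:
        for chunk in chunk_text(docs[doc_id], size, overlap):
            s = score(query, chunk)
            if s > best_score:
                best_id, best_score = doc_id, s
    return best_id


def toy_score(query: str, passage: str) -> float:
    """Placeholder scorer: number of distinct query words in the passage."""
    return float(len(set(query.lower().split()) & set(passage.lower().split())))


docs = {
    1: "premium payment schedule for life insurance policy",
    2: "quarterly revenue and net income of the bank",
    3: "weather report for taipei",
    4: "net income fell while revenue grew",
}
print(retrieve("what was the net income", docs, toy_score))
```

Passing the scorer in as a parameter keeps the ranking logic independent of the model, so the toy scorer can be swapped for a real reranker without touching `retrieve`.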

Best accuracy on sample data: 136/150 (90.7%).
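For reference, the accuracy figure is simply the share of questions whose predicted document ID matches the ground truth. The sketch below assumes predictions and ground truths are held as dictionaries mapping a question ID to a document ID; that layout and the function name `accuracy` are hypothetical, since the structure of the JSON files is not described here.

```python
from typing import Dict


def accuracy(predicted: Dict[int, int], truth: Dict[int, int]) -> float:
    """Fraction of questions whose predicted document ID matches the truth."""
    if not truth:
        raise ValueError("ground truth is empty")
    correct = sum(1 for qid, doc_id in truth.items()
                  if predicted.get(qid) == doc_id)
    return correct / len(truth)


# Synthetic example: 150 questions, 136 answered correctly.
truth = {q: q for q in range(150)}
predicted = {q: (q if q < 136 else -1) for q in range(150)}
print(f"{accuracy(predicted, truth):.1%}")
```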

To test, run fintech.ipynb (requires JupyterLab or Jupyter Notebook). Required third-party Python packages: tqdm, torch, transformers. The json and os modules are also used, but they ship with Python's standard library.

Folders and files:
result_in_score
.gitattributes
README.md
combined_scores.json
finance_data.json
fintech.ipynb
ground_truths_example.json
insurance_data.json
pid_map_content.json
questions.json

benphamroodman/T-Brain-Machine-Learning-Competition-37-Submission
