WikiChat 2.1 is now available! Key updates include:
- Improved Multilingual Support: Now supports 25 different Wikipedias (up from 10), available via the web interface and the API at search.genie.stanford.edu/wikipedia_20250320: 🇺🇸 English, 🇫🇷 French, 🇩🇪 German, 🇪🇸 Spanish, 🇯🇵 Japanese, 🇷🇺 Russian, 🇵🇹 Portuguese, 🇨🇳 Chinese, 🇮🇹 Italian, 🇸🇦 Arabic, 🇮🇷 Persian, 🇵🇱 Polish, 🇳🇱 Dutch, 🇺🇦 Ukrainian, 🇮🇱 Hebrew, 🇮🇩 Indonesian, 🇹🇷 Turkish, 🇨🇿 Czech, 🇸🇪 Swedish, 🇰🇷 Korean, 🇫🇮 Finnish, 🇻🇳 Vietnamese, 🇭🇺 Hungarian, Catalan, 🇹🇭 Thai.
- Improved Information Retrieval: Higher retrieval accuracy and speed, using Snowflake's latest Arctic embedding model.
- Improved Wikipedia Preprocessing: Wikipedia is now preprocessed using Docling. As always, the preprocessed Wikipedia is available on HuggingFace.
- Improved WikiChat Pipeline:
    - Added inline citations to the final response.
    - The 'generate' stage of the pipeline is now always merged with the 'claim extraction' stage, even in the non-distilled setting, for faster and cheaper inference.
    - Removed date-based reranking in favor of LLM-based reranking.
- Switched to using pixi for package management and loguru for logging.
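
The search API listed above is served over HTTP. The sketch below shows one way to build and send a query from Python using only the standard library. It is a minimal illustration under stated assumptions: the JSON field names `query` (a list of query strings) and `num_blocks` (number of results per query) and the helper names `build_search_request` and `search` are hypothetical and may not match the real API. Check the project documentation for the actual request and response schema.

```python
import json
import urllib.request

# Endpoint taken from the release notes above.
SEARCH_URL = "https://search.genie.stanford.edu/wikipedia_20250320"


def build_search_request(queries: list[str], num_blocks: int = 3) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the search API.

    ASSUMPTION: the payload fields "query" and "num_blocks" are illustrative;
    verify them against the API documentation before use.
    """
    payload = {"query": queries, "num_blocks": num_blocks}
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(queries: list[str], num_blocks: int = 3, timeout: float = 30.0):
    """Send the request and return the decoded JSON response (requires network access)."""
    req = build_search_request(queries, num_blocks)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `search(["What is GPT-4?"], num_blocks=3)` would send a single query and return whatever JSON the service responds with. Building the request separately from sending it keeps the payload easy to inspect or test offline.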