LLM Group, Institute for Advanced Algorithms Research, Shanghai

23 repositories

FTIIBench
Public
(ARXIV24) This is the official code repository for "FTII-Bench: A Comprehensive Multimodal Benchmark for Flow Text with Image Insertion"
Python
•
Apache License 2.0
•1•2•1•0•Updated Aug 18, 2025
GuessArena
Public
[ACL 2025] GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning
benchmark openai evaluation-framework large-language-models chatgpt llm-eval qwen deepseek knowledge-evaluation reliable-evaluation
Python
•
Apache License 2.0
•0•7•0•0•Updated Jul 30, 2025
QAEncoder
Public
[ACL 2025 Oral] QAEncoder: Towards Aligned Representation Learning in Question Answering Systems
Python
•20•176•0•0•Updated Jul 12, 2025
RolePlay_LLMDoctor
Public
Python
•0•3•0•0•Updated Jul 6, 2025
SurveyX
Public
Academic Survey Paper Generation.
nlp literature-search large-language-models llm ai4research deep-research automated-survey-generation literature-synthesis autosurvey
TeX
•78•890•36•0•Updated Jun 22, 2025
SEAP
Public
Python
•1•21•0•0•Updated Jun 10, 2025
Meta-Chunking
Public
Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception
Python
•
Apache License 2.0
•17•242•2•0•Updated Jun 7, 2025
UHGEval
Public
[ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc.
benchmark evaluation dataset openai hallucination huggingface huggingface-transformers ceval gpt-3 openai-api
Python
•
Apache License 2.0
•17•172•0•0•Updated Jun 7, 2025
MARA
Public
[IJCAI 2025] Token-level Accept or Reject: A micro alignment approach for Large Language Models
Python
•
Apache License 2.0
•0•6•0•0•Updated May 27, 2025
MaintainCoder
Public
Python
•9•44•0•0•Updated May 21, 2025
CRUD_RAG
Public
CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models
benchmark large-language-models retrieval-augmented-generation
Python
•24•327•8•0•Updated May 20, 2025
xVerify
Public
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
benchmark regex reliability evaluation llm reliability-tools chatgpt cc-by-nc-nd-4 open-compass llm-as-a-judge
Python
•
Other
•7•128•6•0•Updated Apr 17, 2025
SafeRAG
Public
Python
•5•39•0•0•Updated Mar 11, 2025
Awesome-Attention-Heads
Public
An awesome repository & A comprehensive survey on interpretability of LLM attention heads.
awesome survey transformer gpt attention-mechanism research-paper circuit-analysis interpretability cognitive-neuroscience visualization-tools
TeX
•12•356•4•0•Updated Mar 2, 2025
xFinder
Public
[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
benchmark regex reliability evaluation dataset gpt phi large-language-models llm cc-by-nc-nd-4
Python
•
Other
•7•176•2•0•Updated Feb 26, 2025
ICSFSurvey
Public
Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasoning elevation 🍓 and hallucination alleviation 🍄.
decoding self-improvement knowledge-distillation data-augmentation reasoning self-consistency preference-learning hallucination self-correction attention-head
Jupyter Notebook
•4•169•0•0•Updated Dec 7, 2024
FastMem
Public
Fast Memorization of Prompt Improves Context Awareness of Large Language Models (Findings of EMNLP 2024)
context-awareness llm qwen1-5 llama3
Python
•
Apache License 2.0
•0•22•0•0•Updated Oct 22, 2024
DATG
Public
[ACL 2024] Controlled Text Generation for Large Language Model with Dynamic Attribute Graphs
graph pagerank inference text-generation fudge controlled-text-generation large-language-models llms controllable-text-generation preadd
Jupyter Notebook
•
Apache License 2.0
•3•39•1•0•Updated Sep 24, 2024
CTGSurvey
Public
Controllable Text Generation for Large Language Models: A Survey
decoding survey ctg controlled-text-generation controllable-text-generation
TeX
•9•185•0•0•Updated Aug 27, 2024
PGRAG
Public
PGRAG
Python
•
Other
•5•53•0•0•Updated Jul 16, 2024
NewsBench
Public
[ACL 2024 Main] NewsBench: A Systematic Evaluation Framework for Assessing Editorial Capabilities of Large Language Models in Chinese Journalism
benchmark framework evaluation dataset gpt4 large-language-models llm chatgpt ernie-bot gpt35turbo
Python
•
Apache License 2.0
•1•33•0•0•Updated Jun 25, 2024
Grimoire
Public
Grimoire is All You Need for Enhancing Large Language Models
grimoire llama datasets icl phi2 baichuan gpt-4 in-context-learning llm chatgpt
Python
•
Apache License 2.0
•13•116•0•0•Updated Feb 29, 2024
UHGEval-dataset
Public
The full pipeline of creating UHGEval hallucination dataset
benchmark pipeline evaluation dataset unconstrained hallucinations large-language-models llm chatgpt uhgeval
Python
•0•9•0•0•Updated Feb 15, 2024