| Dataset | Links | Domain | Language | Size |
|---|---|---|---|---|
| FSCS (Niklaus et al., 2021) | 📄 🤗 💻 | Swiss court judgments | 🇩🇪 🇫🇷 🇮🇹 | 85K cases w/ 2 outcomes |
| ECtHR (Chalkidis et al., 2021) | 📄 🤗 | European Court of Human Rights judgments | 🇬🇧 | 11K cases w/ 11 outcomes |
| ECHR (Aletras et al., 2019) | 📄 💾 | European Court of Human Rights judgments | 🇬🇧 | 11.5K cases w/ 11 outcomes |
| CAIL (Xiao et al., 2018) | 📄 💻 | Chinese court judgements | 🇨🇳 | 2.6M cases w/ 6 outcomes |
| AnnoCaseLaw (2025) | 📄 💻 | US Appeals Court negligence cases | 🇺🇸 | 471 annotated cases with expert labels |
| IndianBailJudgments-1200 (2025) | 📄 🤗 💻 | Indian court bail decisions | 🇮🇳 | 1.2K judgments with 20+ structured attributes |
| CaseSumm (2025) | 📄 🤗 | US Supreme Court opinions | 🇺🇸 | 25.6K opinions with official syllabuses |
| JUSTICE (2022) | 📄 💻 | US Supreme Court cases | 🇺🇸 | Benchmark for judgment prediction |
| Cambridge Law Corpus (CLC) (2023) | 📄 | UK court cases | 🇬🇧 | 258K+ cases (16th century–present) |
| Super-SCOTUS (2025) | 📄 💻 | US Supreme Court decisions | 🇺🇸 | Decision direction and related tasks |

| Dataset | Links | Domain | Language | Size |
|---|---|---|---|---|
| GLC (Papaloukas et al., 2021) | 📄 💻 | Greek legislation | 🇬🇷 | 47.5K laws w/ 2.7K labels |
| CUAD (Hendrycks et al., 2021) | 📄 🤗 💻 | Contracts | 🇬🇧 | 510 contracts w/ 41 classes |
| MultiEURLEX (Chalkidis et al., 2021) | 📄 🤗 💻 | EU legislation | 🇬🇧 🇩🇪 🇫🇷 🇮🇹 🇪🇸 (18+) | 65K laws w/ 4.5K labels |
| LEDGAR (Tuggener et al., 2020) | 📄 💾 | Contracts | 🇬🇧 | 60.5K contracts w/ 12.6K labels |
| Contract Discovery (Borchmann et al., 2020) | 📄 💻 | Contracts | 🇬🇧 | 2.6K clauses w/ 21 classes |
| EURLEX-57K (Chalkidis et al., 2019) | 📄 💾 | EU legislation | 🇬🇧 | 57K laws w/ 4.3K labels |
| Unfair-ToS (Lippi et al., 2018) | 📄 💾 | Contracts | 🇬🇧 | 9.4K sentences w/ 9 classes |
| Contract Elements (Chalkidis et al., 2017) | 📄 💾 | Contracts | 🇬🇧 | 2.4K contracts w/ 10 classes |
| OPP-115 (Wilson et al., 2016) | 📄 💾 | Privacy policies | 🇬🇧 | 115 policies w/ 23K labels |
| FairLex (2022) | 📄 🤗 💻 | Multi-jurisdictional legal texts | 🇬🇧 🇩🇪 🇫🇷 🇮🇹 🇨🇳 | Fairness-focused classification datasets |
| Legal Case Document Summarization (Kaggle) | 📄 | Legal case summaries | Various | Large-scale dataset |
| Legal Citation Text Classification Dataset (Kaggle) | 📄 | General legal documents | 🇬🇧 | 25K cases with catchphrases and citations |

| Dataset | Links | Domain | Language | Size |
|---|---|---|---|---|
| BSARD (Louis et al., 2022) | 📄 🤗 💻 | Belgian legislation | 🇫🇷 | 1.1K questions w/ 22.6K candidate statutory articles |
| EU2UK (Chalkidis et al., 2021) | 📄 💾 | EU & UK legislation | 🇬🇧 | 2K query documents w/ 52.5K candidate documents |
| UK2EU (Chalkidis et al., 2021) | 📄 💾 | EU & UK legislation | 🇬🇧 | 2.1K query documents w/ 3.9K candidate documents |
| COLIEE-Case-Law-Retrieval (Rabelo et al., 2020) | 📄 💾 | Canadian precedents | 🇬🇧 | 650 query cases w/ 128K candidate cases |
| COLIEE-Statute-Law-Retrieval (Rabelo et al., 2020) | 📄 💾 | Japanese legislation | 🇬🇧 🇯🇵 | 808 questions w/ 768 candidate statutory articles |
| CAIL2019-SCM (Xiao et al., 2019) | 📄 💻 | Chinese court judgements | 🇨🇳 | 8.9K triplets of cases |
| CLERC (2024) | 📄 🤗 💻 | Legal case retrieval | 🇬🇧 | Large corpus for retrieval and RAG |
| LEAD (2024) | 📄 💻 | Legal case retrieval | Various | 100K+ pairs of similar legal cases |
| Legal IR Philippines (2024) | 📄 | Philippine legal documents | 🇵🇭 | Datasets with synthetic queries |

| Dataset | Links | Domain | Language | Size |
|---|---|---|---|---|
| CaseHOLD (Zheng et al., 2021) | 📄 💻 | US case holdings | 🇬🇧 | 53.1K multiple-choice questions |
| JEC-QA (Zhong et al., 2019) | 📄 💾 | Chinese law | 🇨🇳 | 26.3K multiple-choice questions |
| CJRC (Duan et al., 2019) | 📄 💻 | Chinese court judgements | 🇨🇳 | 50K question-answers from 10K documents |
| PrivacyQA (Ravichander et al., 2019) | 📄 💻 | Privacy policies | 🇬🇧 | 1.7K question-answers from 35 documents |
| LLeQA (2024) | 📄 🤗 💻 | French-Belgian statutes | 🇫🇷 | 1,868 expert-annotated long-form QA |
| IndicLegalQA (2025) | 📄 | Indian Supreme Court judgments | 🇮🇳 | 10K QA pairs from 1,256 judgments |
| GerLayQA (2024) | 📄 💻 | German civil law | 🇩🇪 | 21K laymen legal Qs with lawyer answers |
| LEGAL-UQA (2024) | 📄 | Legal questions | 🇵🇰 | 619 parallel Urdu–English QA pairs |

| Dataset | Links | Domain | Language | Size |
|---|---|---|---|---|
| COLIEE-Case-Law-Entailment (Rabelo et al., 2020) | 📄 💾 | Canadian precedents | 🇬🇧 | 425 cases w/ related case |
| COLIEE-Statute-Law-Entailment (Rabelo et al., 2020) | 📄 💾 | Japanese legislation | 🇬🇧 🇯🇵 | 808 questions w/ related statutory article |
| LAR-ECHR (2024) | 📄 | European Court of Human Rights | 🇬🇧 | Legal argument reasoning task dataset |
| δ-Stance (2025) | 📄 | US legal argumentation | 🇺🇸 | Large-scale stances and arguments |

| Dataset | Links | Domain | Language | Size |
|---|---|---|---|---|
| UK-Abs (Shukla et al., 2022) | 📄 💻 💾 | UK court cases | 🇬🇧 | 793 pairs of (case, abstractive summary) from the UK Supreme Court |
| IN-Abs (Shukla et al., 2022) | 📄 💻 💾 | Indian court cases | 🇬🇧 | 7.1K pairs of (case, abstractive summary) from the Indian Supreme Court |
| IN-Ext (Shukla et al., 2022) | 📄 💻 💾 | Indian court cases | 🇬🇧 | 50 pairs of (case, extractive summary) from the Indian Supreme Court |
| TOS;DR (Keymanesh et al., 2020) | 📄 💻 | Terms of service | 🇬🇧 | 1.6K pairs of (agreement text, summary) from data privacy policies |
| BillSum (Kornilova et al., 2019) | 📄 💻 💾 | US Congressional bills | 🇬🇧 | 22.2K pairs of (bill, summary) |
| TL;DRLegal (Manor et al., 2019) | 📄 💻 | Terms of service | 🇬🇧 | 84 pairs of (agreement text, summary) from software licenses |
| TOS;DR (Manor et al., 2019) | 📄 💻 | Terms of service | 🇬🇧 | 421 pairs of (agreement text, summary) from data privacy policies |
| BVA Cases (Zhong et al., 2019) | 📄 💻 | US court cases | 🇬🇧 | 92 pairs of (case, summary) from the US Board of Veterans' Appeals |
| LCR (Galgani et al., 2012) | 📄 💾 | Australian court cases | 🇬🇧 | 3.9K pairs of (case, catchphrases) |
| EurLexSummarization (2022) | 📄 🤗 💻 | EU legislation | 🌐 | Multilingual summarization across 24 languages |
| Multi-LexSum (2025) | 📄 | Legal documents | 🇬🇧 | 40K+ documents with 9K+ expert summaries |
| CaseSumm (2025) | 📄 🤗 | US Supreme Court opinions | 🇬🇧 | 25.6K opinions with official syllabuses |

| Dataset | Links | Language | Size |
|---|---|---|---|
| Pile of Law (Henderson et al., 2022) | 📄 🤗 💻 | 🇬🇧 | ~256GB of legal and administrative text |
| MultiLegalPile (2024) | 📄 🤗 | 🌐 | 689GB multilingual legal corpus from 17 jurisdictions |

| Dataset | Links | Language | Tasks |
|---|---|---|---|
| FairLex (Chalkidis et al., 2022) | 📄 🤗 💻 | 🇬🇧 🇩🇪 🇫🇷 🇮🇹 🇨🇳 | Classification (x1), legal judgement prediction (x3) |
| LexGLUE (Chalkidis et al., 2022) | 📄 🤗 💻 | 🇬🇧 | Classification (x6), multiple-choice QA (x1) |

| Model | Links | Language | Size |
|---|---|---|---|
| Legal-HeBERT (Chriqui et al., 2022) | 📄 🤗 💻 | 🇮🇱 | 110M |
| PoL-BERT-Large (Henderson et al., 2022) | 📄 🤗 💻 | 🇬🇧 | 336M |
| Italian-LEGAL-BERT (Licari and Comande, 2022) | 📄 🤗 | 🇮🇹 | 110M |
| JuriBERT (Douka et al., 2021) | 📄 💾 | 🇫🇷 | {6M, 15M, 42M, 110M} |
| Custom-LEGAL-BERT (Zheng et al., 2021) | 📄 🤗 💻 | 🇬🇧 | 110M |
| LEGAL-BERT (Chalkidis et al., 2020) | 📄 🤗 | 🇬🇧 | {35M, 110M} |
| LEGAL-GPT-{1,2} (Borchmann et al., 2020) | 📄 💻 | 🇬🇧 | {117M, 1.5B} |
| MultiLegalPile Models (2024-2025) | 📄 🤗 | 🌐 | RoBERTa (multilingual + 24 monolingual), Longformer |
| Legal-BERT Fine-tuned (2024) | 📄 | 🇬🇧 | Domain-adapted classification models |
| LegalCore Models (2025) | 📄 | 🌐 | Event coreference resolution for legal texts |
| Legal LLaMA (2025) | 📄 | 🇨🇳 | Chinese legal domain adaptations |
| FairLex Domain Models (2024-2025) | 🤗 | 🌐 | Domain-specific BERT models for 4 jurisdictions |

- [2017] Artificial Intelligence and Legal Analytics: New Tools for Law Practice in the Digital Age, K. Ashley. [link]
- [2024] Large Language Models and International Law, Chicago Journal of International Law [link]
- [2024] Computational Legal Studies Comes of Age, SSRN [link]
- [2020-05] How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence, H. Zhong et al. [pdf]
- [2019-09] A Brief History of the Changing Roles of Case Prediction in AI and Law, K. Ashley [pdf]
- [2018-12] Deep learning in law: early adaptation and legal word embeddings trained on large corpora, I. Chalkidis et al. [pdf]
- [2024] Natural Language Processing for the Legal Domain: A Survey of Tasks, Datasets, Models and Challenges, F. Ariai et al. [link]
- [2025] Computational Law: Datasets, Benchmarks, and Ontologies, D. Küçük & F. Can [link]
- [2025] A Comprehensive Survey on Legal Summarization, arXiv [link]
- [2024] Large Language Models in Law: A Survey, J. Lai et al. [link]
- [2025] Large Language Models in Argument Mining: A Survey, arXiv [link]
- [2024] When Large Language Models Meet Law: Dual-Lens Survey, arXiv [link]
- [2019-06] Law as Data: The Promise and Challenges of Natural Language Processing for Legal Research, A. Dyevre. [slides]
- [2019-04] Artificial Intelligence and Law – An Overview and History, H. Surden. [video]
- The Natural Legal Language Processing (NLLP) Workshop [website]
- The International Conference on Artificial Intelligence and Law (ICAIL) [website]
- The International Conference on Legal Knowledge and Information Systems (JURIX) [website]
- The EXplainable AI in Law (XAILA) Workshop [website]
- The International Workshop on Juris-informatics (JURISIN) [website]
- The Competition on Legal Information Extraction/Entailment (COLIEE) [website]
- The International Workshop on Legal Information Retrieval [website]
- NLLP 2025 - Natural Legal Language Processing Workshop (EMNLP 2025, Suzhou) [website]
- RegNLP 2025 - Regulatory Natural Language Processing Workshop (COLING 2025) [website]
- JURIX 2025 - 38th International Conference on Legal Knowledge and Information Systems (Turin, December 9-11, 2025) [website]
- ICAIL 2025 - 20th International Conference on Artificial Intelligence and Law (Chicago, June 16-20, 2025) [website]
- MWAiL 2025 - Multilingual Workshop on AI & Law Research (Chicago, June 20, 2025) [website]
- LLMFinLegal 2025 - Workshop on Large Language Models for Finance and Legal (COLING 2025) [website]
- 8th World Legal Tech and AI Summit (Berlin, September 18-19, 2025) [website]
- AI Legal Summit 2025 - Various industry conferences on AI in legal practice [website]
- Legal AI Conferences Online Platform - Centralized platform for legal AI events [website]
- Embedding Benchmarking Tools: MTEB, Hugging Face evaluate, LegalBench, COLIEE [link]
- Legal Argument Mining Tools: RMU:ECHR corpus and mining models [💻]
- Multilingual Legal Processing: Evaluation pipelines for multilingual legal LLMs [link]
- LegalEval-Q: Quality evaluation for LLM-generated legal text [link]
- FairLex Evaluation: Bias and fairness assessment [link]

Last Updated: 2025-09-30. Research Coverage: 2024-01 to 2025-09. Sources: 180+ academic papers, datasets, and conference proceedings.