CONTRIBUTING.md: 27 additions & 4 deletions
@@ -484,19 +484,42 @@ The log output will show detailed information about test execution.

 ### 🏗️ Building the Documentation

-Navigate to the `docs/` directory and run:
+Navigate to the `docs/` directory and choose your preferred build method:
+
+#### For Live Development (Recommended)
+
+Use `sphinx-autobuild` for live reloading during development:
+
+```bash
+# Live rebuild with auto-refresh on file changes
+make livehtml
+# Or on Windows: ./make.bat livehtml
+```
+
+This starts a development server on `http://localhost:9000` with:
+- Automatic rebuilds when files change
+- Browser auto-refresh
+- Pretty URLs without `.html` extensions
+
+#### For Static Builds
+
+For one-time builds or CI-style building:

 ```bash
 # Build with verbose output, ignore cache, and treat warnings as errors
 # (recommended for structural changes)
-uv run sphinx-build -b html source build/html -v -E -W
+uv run sphinx-build -b dirhtml source build/dirhtml -v -E -W
 ```

-The `-E` flag ensures Sphinx completely rebuilds the environment, which is especially important after structural changes like modifying toctree directives or removing files.
+The `-E` flag ensures Sphinx completely rebuilds the environment, which is especially important after structural changes like modifying toctree directives or removing files. The `dirhtml` format creates pretty URLs without `.html` extensions, consistent with the live development server.

 ### 👀 Viewing the Documentation

-After building, open `build/html/index.html` in your web browser to view the documentation.
+**With Live Development:**
+The documentation automatically opens at `http://localhost:9000` when using `make livehtml`.
+
+**With Static Builds:**
+After building, open `build/dirhtml/index.html` in your web browser to view the documentation.
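
Review note: the `dirhtml` builder writes each page as `<page>/index.html` and links to the directory, so navigating a static build straight from `file://` may land on directory listings instead of pages. A minimal sketch of a local preview server using only the Python standard library is below. The `build/dirhtml` path is taken from the command above; the helper name `serve_dirhtml` and port 8000 are assumptions for illustration and are not part of the project.

```python
import functools
import http.server
import threading


def serve_dirhtml(directory="build/dirhtml", port=8000):
    """Start a background HTTP server rooted at a dirhtml build directory.

    Hypothetical helper: the name and default port are illustrative only.
    """
    # Bind the handler to the build directory so that a request for
    # "/page/" is answered with "<directory>/page/index.html".
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Serve in a daemon thread so the caller keeps control of the process.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling `serve_dirhtml()` from the `docs/` directory and opening `http://127.0.0.1:8000/` should then resolve the pretty URLs the same way the live development server does; `server.shutdown()` stops it.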
README.md: 34 additions & 32 deletions
@@ -25,7 +25,7 @@ Most popular LLM frameworks for extracting structured data from documents requir

 ContextGem addresses this challenge by providing a flexible, intuitive framework that extracts structured data and insights from documents with minimal effort. The complex, most time-consuming parts are handled with **powerful abstractions**, eliminating boilerplate code and reducing development overhead.

-📖 Read more on the project [motivation](https://contextgem.dev/motivation.html) in the documentation.
+📖 Read more on the project [motivation](https://contextgem.dev/motivation/) in the documentation.


 ## ⭐ Key features
@@ -151,15 +151,15 @@ ContextGem addresses this challenge by providing a flexible, intuitive framework
-\* See [descriptions](https://contextgem.dev/motivation.html#the-contextgem-solution) of ContextGem abstractions and [comparisons](https://contextgem.dev/vs_other_frameworks.html) of specific implementation examples using ContextGem and other popular open-source LLM frameworks.
+\* See [descriptions](https://contextgem.dev/motivation/#the-contextgem-solution) of ContextGem abstractions and [comparisons](https://contextgem.dev/vs_other_frameworks/) of specific implementation examples using ContextGem and other popular open-source LLM frameworks.

 ## 💡 What you can build

 With **minimal code**, you can:

 - **Extract structured data** from documents (text, images)
-- **Identify and analyze key aspects** (topics, themes, categories) within documents ([learn more](https://contextgem.dev/aspects/aspects.html))
-- **Extract specific concepts** (entities, facts, conclusions, assessments) from documents ([learn more](https://contextgem.dev/concepts/supported_concepts.html))
+- **Identify and analyze key aspects** (topics, themes, categories) within documents ([learn more](https://contextgem.dev/aspects/aspects/))
+- **Extract specific concepts** (entities, facts, conclusions, assessments) from documents ([learn more](https://contextgem.dev/concepts/supported_concepts/))
 - **Build complex extraction workflows** through a simple, intuitive API
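
Review note: every link change in this file follows the same rule, which is how Sphinx's `dirhtml` builder lays out pages (`page.html` becomes `page/`, fragments are kept). A minimal sketch of that mapping is below, assuming plain `https` links as they appear in this diff; the function name `to_dirhtml_url` is hypothetical and not part of ContextGem or Sphinx.

```python
from urllib.parse import urlsplit, urlunsplit


def to_dirhtml_url(url):
    """Rewrite an `html`-builder URL to its `dirhtml` equivalent.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    path = parts.path
    if path.endswith("/index.html"):
        # Index pages keep their directory: /a/index.html -> /a/
        path = path[: -len("index.html")]
    elif path.endswith(".html"):
        # Regular pages become directories: /a/b.html -> /a/b/
        path = path[: -len(".html")] + "/"
    # Query string and #fragment are carried over unchanged.
    return urlunsplit(parts._replace(path=path))
```

For example, `to_dirhtml_url("https://contextgem.dev/motivation.html#the-contextgem-solution")` yields the `motivation/#the-contextgem-solution` form used in the added line above.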
@@ -263,7 +263,7 @@ for item in anomalies_concept.extracted_items:
 </thead>
 <tbody>
 <tr>
-<td>Create a Document that contains text and/or visual content representing your document (contract, invoice, report, CV, etc.), from which an LLM extracts information (aspects and/or concepts). <a href="https://contextgem.dev/documents/document_config.html">Learn more</a></td>
+<td>Create a Document that contains text and/or visual content representing your document (contract, invoice, report, CV, etc.), from which an LLM extracts information (aspects and/or concepts). <a href="https://contextgem.dev/documents/document_config/">Learn more</a></td>
-<td>Define Aspects to extract text segments from the document (sections, topics, themes). You can organize content hierarchically and combine with concepts for comprehensive analysis. <a href="https://contextgem.dev/aspects/aspects.html">Learn more</a></td>
-<td>Define Concepts to extract specific data points with intelligent inference: entities, insights, structured objects, classifications, numerical calculations, dates, ratings, and assessments. <a href="https://contextgem.dev/concepts/supported_concepts.html">Learn more</a></td>
+<td>Define Aspects to extract text segments from the document (sections, topics, themes). You can organize content hierarchically and combine with concepts for comprehensive analysis. <a href="https://contextgem.dev/aspects/aspects/">Learn more</a></td>
+<td>Define Concepts to extract specific data points with intelligent inference: entities, insights, structured objects, classifications, numerical calculations, dates, ratings, and assessments. <a href="https://contextgem.dev/concepts/supported_concepts/">Learn more</a></td>
-<td>Create a reusable collection of predefined aspects and concepts that enables consistent extraction across multiple documents. <a href="https://contextgem.dev/pipelines/extraction_pipelines.html">Learn more</a></td>
+<td>Create a reusable collection of predefined aspects and concepts that enables consistent extraction across multiple documents. <a href="https://contextgem.dev/pipelines/extraction_pipelines/">Learn more</a></td>
-<td>Configure a cloud or local LLM that will extract aspects and/or concepts from the document. DocumentLLM supports fallback models and role-based task routing for optimal performance. <a href="https://contextgem.dev/llms/llm_extraction_methods.html">Learn more</a></td>
-<td>Configure a group of LLMs with unique roles for complex extraction workflows. You can route different aspects and/or concepts to specialized LLMs (e.g., simple extraction vs. reasoning tasks). <a href="https://contextgem.dev/llms/llm_config.html#llm-groups">Learn more</a></td>
+<td>Configure a cloud or local LLM that will extract aspects and/or concepts from the document. DocumentLLM supports fallback models and role-based task routing for optimal performance. <a href="https://contextgem.dev/llms/llm_extraction_methods/">Learn more</a></td>
+<td>Configure a group of LLMs with unique roles for complex extraction workflows. You can route different aspects and/or concepts to specialized LLMs (e.g., simple extraction vs. reasoning tasks). <a href="https://contextgem.dev/llms/llm_config/#llm-groups">Learn more</a></td>
-- [Extracting Aspects and Concepts from a Document](https://contextgem.dev/advanced_usage.html#extracting-aspects-and-concepts-from-a-document)
-- [Using a Multi-LLM Pipeline to Extract Data from Several Documents](https://contextgem.dev/advanced_usage.html#using-a-multi-llm-pipeline-to-extract-data-from-several-documents)
+- [Extracting Aspects and Concepts from a Document](https://contextgem.dev/advanced_usage/#extracting-aspects-and-concepts-from-a-document)
+- [Using a Multi-LLM Pipeline to Extract Data from Several Documents](https://contextgem.dev/advanced_usage/#using-a-multi-llm-pipeline-to-extract-data-from-several-documents)
-📖 Learn more about [DOCX converter features](https://contextgem.dev/converters/docx.html) in the documentation.
+📖 Learn more about [DOCX converter features](https://contextgem.dev/converters/docx/) in the documentation.


 ## 🎯 Focused document analysis

 ContextGem leverages LLMs' long context windows to deliver superior extraction accuracy from individual documents. Unlike RAG approaches that often [struggle with complex concepts and nuanced insights](https://www.linkedin.com/pulse/raging-contracts-pitfalls-rag-contract-review-shcherbak-ai-ptg3f), ContextGem capitalizes on continuously expanding context capacity, evolving LLM capabilities, and decreasing costs. This focused approach enables direct information extraction from complete documents, eliminating retrieval inconsistencies while optimizing for in-depth single-document analysis. While this delivers higher accuracy for individual documents, ContextGem does not currently support cross-document querying or corpus-wide retrieval - for these use cases, modern RAG frameworks (e.g., LlamaIndex, Haystack) remain more appropriate.

-📖 Read more on [how ContextGem works](https://contextgem.dev/how_it_works.html) in the documentation.
+📖 Read more on [how ContextGem works](https://contextgem.dev/how_it_works/) in the documentation.

 ## 🤖 Supported LLMs

@@ -422,20 +422,20 @@ ContextGem supports both cloud-based and local LLMs through [LiteLLM](https://gi
 - **Model Architectures**: Works with both reasoning/CoT-capable (e.g. gpt-5) and non-reasoning models (e.g. gpt-4.1)
 - **Simple API**: Unified interface for all LLMs with easy provider switching

-> **💡 Model Selection Note:** For reliable structured extraction, we recommend using models with performance equivalent to or exceeding `gpt-4o-mini`. Smaller models (such as 8B parameter models) may struggle with ContextGem's detailed extraction instructions. If you encounter issues with smaller models, see our [troubleshooting guide](https://contextgem.dev/optimizations/optimization_small_llm_troubleshooting.html) for potential solutions.
+> **💡 Model Selection Note:** For reliable structured extraction, we recommend using models with performance equivalent to or exceeding `gpt-4o-mini`. Smaller models (such as 8B parameter models) may struggle with ContextGem's detailed extraction instructions. If you encounter issues with smaller models, see our [troubleshooting guide](https://contextgem.dev/optimizations/optimization_small_llm_troubleshooting/) for potential solutions.

-📖 Learn more about [supported LLM providers and models](https://contextgem.dev/llms/supported_llms.html), how to [configure LLMs](https://contextgem.dev/llms/llm_config.html), and [LLM extraction methods](https://contextgem.dev/llms/llm_extraction_methods.html) in the documentation.
+📖 Learn more about [supported LLM providers and models](https://contextgem.dev/llms/supported_llms/), how to [configure LLMs](https://contextgem.dev/llms/llm_config/), and [LLM extraction methods](https://contextgem.dev/llms/llm_extraction_methods/) in the documentation.

 ## ⚡ Optimizations

 ContextGem documentation offers guidance on optimization strategies to maximize performance, minimize costs, and enhance extraction accuracy:

-- [Optimizing for Accuracy](https://contextgem.dev/optimizations/optimization_accuracy.html)
-- [Optimizing for Speed](https://contextgem.dev/optimizations/optimization_speed.html)
-- [Optimizing for Cost](https://contextgem.dev/optimizations/optimization_cost.html)
-- [Dealing with Long Documents](https://contextgem.dev/optimizations/optimization_long_docs.html)
-- [Choosing the Right LLM(s)](https://contextgem.dev/optimizations/optimization_choosing_llm.html)
-- [Troubleshooting Issues with Small Models](https://contextgem.dev/optimizations/optimization_small_llm_troubleshooting.html)
+- [Optimizing for Accuracy](https://contextgem.dev/optimizations/optimization_accuracy/)
+- [Optimizing for Speed](https://contextgem.dev/optimizations/optimization_speed/)
+- [Optimizing for Cost](https://contextgem.dev/optimizations/optimization_cost/)
+- [Dealing with Long Documents](https://contextgem.dev/optimizations/optimization_long_docs/)
+- [Choosing the Right LLM(s)](https://contextgem.dev/optimizations/optimization_choosing_llm/)
+- [Troubleshooting Issues with Small Models](https://contextgem.dev/optimizations/optimization_small_llm_troubleshooting/)


 ## 💾 Serializing results
@@ -446,14 +446,16 @@ ContextGem allows you to save and load Document objects, pipelines, and LLM conf
 - Transfer extraction results between systems
 - Persist pipeline and LLM configurations for later reuse

-📖 Learn more about [serialization options](https://contextgem.dev/serialization.html) in the documentation.
+📖 Learn more about [serialization options](https://contextgem.dev/serialization/) in the documentation.
-📄 **Raw documentation for LLMs:** Available at [`docs/docs-raw-for-llm.txt`](https://github.com/shcherbak-ai/contextgem/blob/main/docs/docs-raw-for-llm.txt) - automatically generated, optimized for LLM ingestion.
+> **⚠️ Official Documentation Notice:** [https://contextgem.dev/](https://contextgem.dev/) is the only official source of ContextGem documentation. Please be aware of unauthorized copies or mirrors that may contain outdated or incorrect information.
+
+📄 **Raw documentation for LLMs:** Available at [`docs/source/llms.txt`](https://github.com/shcherbak-ai/contextgem/blob/main/docs/source/llms.txt) - automatically generated, optimized for LLM ingestion.

 🤖 **AI-powered code exploration:** [DeepWiki](https://deepwiki.com/shcherbak-ai/contextgem) provides visual architecture maps and natural language Q&A for the codebase.
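
Review note: since this change touches dozens of links by hand, a quick scan for leftovers may be useful before merging. The sketch below reports any `https://contextgem.dev/` link in a piece of text that still ends in `.html`. The regular expression and the name `find_stale_links` are assumptions for illustration, not an existing project script.

```python
import re

# Links to the docs host whose path still ends in ".html",
# optionally followed by a #fragment (hypothetical pattern).
STALE_LINK = re.compile(r'https://contextgem\.dev/[^\s)"#]*\.html(?:#[^\s)"]*)?')


def find_stale_links(text):
    """Return (line_number, url) pairs for docs links still using `.html`."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in STALE_LINK.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Running it over the contents of `README.md` after applying the patch should return an empty list if every documentation link was converted.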