Commit 9c77a10

docs: improve seo; add official source notice
1 parent d92e69a commit 9c77a10

15 files changed: +669 -344 lines changed
.github/workflows/docs.yml

Lines changed: 3 additions & 3 deletions
@@ -40,15 +40,15 @@ jobs:
       - name: Build documentation
         run: |
           cd docs
-          uv run sphinx-build -b html source _build/html -v -E -W
+          uv run sphinx-build -b dirhtml source _build/dirhtml -v -E -W

       - name: Create .nojekyll file
-        run: touch docs/_build/html/.nojekyll
+        run: touch docs/_build/dirhtml/.nojekyll

       - name: Upload artifact
         uses: actions/upload-pages-artifact@v3
         with:
-          path: ./docs/_build/html
+          path: ./docs/_build/dirhtml

   deploy:
     environment:
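
Switching the builder from `html` to `dirhtml` changes where each page is written, which is why the artifact path and the documentation links change together in this commit. The sketch below shows that mapping as I understand it from the Sphinx builder documentation. `output_path` is a hypothetical helper for illustration, not a Sphinx API.

```python
from pathlib import PurePosixPath


def output_path(docname: str, builder: str = "dirhtml") -> str:
    """Return the output file, relative to the build directory, for a docname.

    Hypothetical helper: it mirrors the documented behaviour of the `html`
    and `dirhtml` builders, not their actual implementation.
    """
    if builder == "html":
        # One flat file per document: motivation -> motivation.html
        return f"{docname}.html"
    if builder == "dirhtml":
        # Index pages stay in place; every other page gets its own directory,
        # so the server can answer /motivation/ with motivation/index.html.
        if PurePosixPath(docname).name == "index":
            return f"{docname}.html"
        return f"{docname}/index.html"
    raise ValueError(f"unsupported builder: {builder}")


if __name__ == "__main__":
    for name in ("index", "motivation", "aspects/aspects"):
        print(name, "->", output_path(name, "html"), "|", output_path(name))
```

Under this layout a link such as `/motivation/` resolves only if the web server serves `index.html` for directory requests, which GitHub Pages does.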

CONTRIBUTING.md

Lines changed: 27 additions & 4 deletions
@@ -484,19 +484,42 @@ The log output will show detailed information about test execution.

 ### 🏗️ Building the Documentation

-Navigate to the `docs/` directory and run:
+Navigate to the `docs/` directory and choose your preferred build method:
+
+#### For Live Development (Recommended)
+
+Use `sphinx-autobuild` for live reloading during development:
+
+```bash
+# Live rebuild with auto-refresh on file changes
+make livehtml
+# Or on Windows: ./make.bat livehtml
+```
+
+This starts a development server on `http://localhost:9000` with:
+- Automatic rebuilds when files change
+- Browser auto-refresh
+- Pretty URLs without `.html` extensions
+
+#### For Static Builds
+
+For one-time builds or CI-style building:

 ```bash
 # Build with verbose output, ignore cache, and treat warnings as errors
 # (recommended for structural changes)
-uv run sphinx-build -b html source build/html -v -E -W
+uv run sphinx-build -b dirhtml source build/dirhtml -v -E -W
 ```

-The `-E` flag ensures Sphinx completely rebuilds the environment, which is especially important after structural changes like modifying toctree directives or removing files.
+The `-E` flag ensures Sphinx completely rebuilds the environment, which is especially important after structural changes like modifying toctree directives or removing files. The `dirhtml` format creates pretty URLs without `.html` extensions, consistent with the live development server.

 ### 👀 Viewing the Documentation

-After building, open `build/html/index.html` in your web browser to view the documentation.
+**With Live Development:**
+The documentation automatically opens at `http://localhost:9000` when using `make livehtml`.
+
+**With Static Builds:**
+After building, open `build/dirhtml/index.html` in your web browser to view the documentation.

 ### 🌐 Live Documentation
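
The `livehtml` Makefile target itself is not shown in this diff. The sketch below is one plausible shape for the command it wraps, assuming it calls `sphinx-autobuild` through `uv run` with the `dirhtml` builder on port 9000. The function name, the default directories, and the use of `--open-browser` are assumptions, not taken from the repository.

```python
import shlex


def livehtml_command(
    source: str = "source",
    out: str = "build/dirhtml",
    port: int = 9000,
) -> list[str]:
    """Build an argv list for a live-reloading docs server.

    Hypothetical helper: the real `make livehtml` target may use different
    paths or flags. sphinx-autobuild forwards sphinx-build options such as -b.
    """
    return [
        "uv", "run", "sphinx-autobuild",
        "-b", "dirhtml",         # same builder as the static build
        source, out,
        "--port", str(port),     # matches http://localhost:9000 in the text
        "--open-browser",        # assumption: explains "automatically opens"
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(livehtml_command()))
```

To start the server, pass the list to `subprocess.run(..., check=True)` from inside the `docs/` directory.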

NOTICE

Lines changed: 1 addition & 0 deletions
@@ -54,6 +54,7 @@ Development Dependencies:
 - python-dotenv: Environment variable management
 - ruff: Fast Python linter and formatter
 - sphinx: Documentation generator
+- sphinx-autobuild: Live-reloading docs builder for Sphinx
 - sphinx-autodoc-typehints: Type annotation support for Sphinx
 - sphinx-book-theme: Book-like theme for Sphinx
 - sphinx-copybutton: Adds copy button to code blocks in Sphinx docs

README.md

Lines changed: 34 additions & 32 deletions
@@ -25,7 +25,7 @@ Most popular LLM frameworks for extracting structured data from documents requir

 ContextGem addresses this challenge by providing a flexible, intuitive framework that extracts structured data and insights from documents with minimal effort. The complex, most time-consuming parts are handled with **powerful abstractions**, eliminating boilerplate code and reducing development overhead.

-📖 Read more on the project [motivation](https://contextgem.dev/motivation.html) in the documentation.
+📖 Read more on the project [motivation](https://contextgem.dev/motivation/) in the documentation.


 ## ⭐ Key features
@@ -151,15 +151,15 @@ ContextGem addresses this challenge by providing a flexible, intuitive framework
 🟡 - partially supported - requires additional setup<br>
 ◯ - not supported - requires custom logic

-\* See [descriptions](https://contextgem.dev/motivation.html#the-contextgem-solution) of ContextGem abstractions and [comparisons](https://contextgem.dev/vs_other_frameworks.html) of specific implementation examples using ContextGem and other popular open-source LLM frameworks.
+\* See [descriptions](https://contextgem.dev/motivation/#the-contextgem-solution) of ContextGem abstractions and [comparisons](https://contextgem.dev/vs_other_frameworks/) of specific implementation examples using ContextGem and other popular open-source LLM frameworks.

 ## 💡 What you can build

 With **minimal code**, you can:

 - **Extract structured data** from documents (text, images)
-- **Identify and analyze key aspects** (topics, themes, categories) within documents ([learn more](https://contextgem.dev/aspects/aspects.html))
-- **Extract specific concepts** (entities, facts, conclusions, assessments) from documents ([learn more](https://contextgem.dev/concepts/supported_concepts.html))
+- **Identify and analyze key aspects** (topics, themes, categories) within documents ([learn more](https://contextgem.dev/aspects/aspects/))
+- **Extract specific concepts** (entities, facts, conclusions, assessments) from documents ([learn more](https://contextgem.dev/concepts/supported_concepts/))
 - **Build complex extraction workflows** through a simple, intuitive API
 - **Create multi-level extraction pipelines** (aspects containing concepts, hierarchical aspects)

@@ -263,7 +263,7 @@ for item in anomalies_concept.extracted_items:
 </thead>
 <tbody>
 <tr>
-<td>Create a Document that contains text and/or visual content representing your document (contract, invoice, report, CV, etc.), from which an LLM extracts information (aspects and/or concepts). <a href="https://contextgem.dev/documents/document_config.html">Learn more</a></td>
+<td>Create a Document that contains text and/or visual content representing your document (contract, invoice, report, CV, etc.), from which an LLM extracts information (aspects and/or concepts). <a href="https://contextgem.dev/documents/document_config/">Learn more</a></td>
 </tr>
 </tbody>
 </table>
@@ -283,8 +283,8 @@ document = Document(raw_text="Non-Disclosure Agreement...")
 </thead>
 <tbody>
 <tr>
-<td>Define Aspects to extract text segments from the document (sections, topics, themes). You can organize content hierarchically and combine with concepts for comprehensive analysis. <a href="https://contextgem.dev/aspects/aspects.html">Learn more</a></td>
-<td>Define Concepts to extract specific data points with intelligent inference: entities, insights, structured objects, classifications, numerical calculations, dates, ratings, and assessments. <a href="https://contextgem.dev/concepts/supported_concepts.html">Learn more</a></td>
+<td>Define Aspects to extract text segments from the document (sections, topics, themes). You can organize content hierarchically and combine with concepts for comprehensive analysis. <a href="https://contextgem.dev/aspects/aspects/">Learn more</a></td>
+<td>Define Concepts to extract specific data points with intelligent inference: entities, insights, structured objects, classifications, numerical calculations, dates, ratings, and assessments. <a href="https://contextgem.dev/concepts/supported_concepts/">Learn more</a></td>
 </tr>
 </tbody>
 </table>
@@ -313,7 +313,7 @@ document.add_concepts([concept])
 </thead>
 <tbody>
 <tr>
-<td>Create a reusable collection of predefined aspects and concepts that enables consistent extraction across multiple documents. <a href="https://contextgem.dev/pipelines/extraction_pipelines.html">Learn more</a></td>
+<td>Create a reusable collection of predefined aspects and concepts that enables consistent extraction across multiple documents. <a href="https://contextgem.dev/pipelines/extraction_pipelines/">Learn more</a></td>
 </tr>
 </tbody>
 </table>
@@ -329,8 +329,8 @@ document.add_concepts([concept])
 </thead>
 <tbody>
 <tr>
-<td>Configure a cloud or local LLM that will extract aspects and/or concepts from the document. DocumentLLM supports fallback models and role-based task routing for optimal performance. <a href="https://contextgem.dev/llms/llm_extraction_methods.html">Learn more</a></td>
-<td>Configure a group of LLMs with unique roles for complex extraction workflows. You can route different aspects and/or concepts to specialized LLMs (e.g., simple extraction vs. reasoning tasks). <a href="https://contextgem.dev/llms/llm_config.html#llm-groups">Learn more</a></td>
+<td>Configure a cloud or local LLM that will extract aspects and/or concepts from the document. DocumentLLM supports fallback models and role-based task routing for optimal performance. <a href="https://contextgem.dev/llms/llm_extraction_methods/">Learn more</a></td>
+<td>Configure a group of LLMs with unique roles for complex extraction workflows. You can route different aspects and/or concepts to specialized LLMs (e.g., simple extraction vs. reasoning tasks). <a href="https://contextgem.dev/llms/llm_config/#llm-groups">Learn more</a></td>
 </tr>
 </tbody>
 </table>
@@ -345,22 +345,22 @@ document = llm.extract_all(document)
 # print(document.concepts[0].extracted_items)
 ```

-📖 Learn more about ContextGem's [core components](https://contextgem.dev/how_it_works.html) and their practical examples in the documentation.
+📖 Learn more about ContextGem's [core components](https://contextgem.dev/how_it_works/) and their practical examples in the documentation.

 ## 📚 Usage Examples

 🌟 **Basic usage:**
-- [Aspect Extraction from Document](https://contextgem.dev/quickstart.html#aspect-extraction-from-document)
-- [Extracting Aspect with Sub-Aspects](https://contextgem.dev/quickstart.html#extracting-aspect-with-sub-aspects)
-- [Concept Extraction from Aspect](https://contextgem.dev/quickstart.html#concept-extraction-from-aspect)
-- [Concept Extraction from Document (text)](https://contextgem.dev/quickstart.html#concept-extraction-from-document-text)
-- [Concept Extraction from Document (vision)](https://contextgem.dev/quickstart.html#concept-extraction-from-document-vision)
-- [LLM chat interface](https://contextgem.dev/quickstart.html#lightweight-llm-chat-interface)
+- [Aspect Extraction from Document](https://contextgem.dev/quickstart/#aspect-extraction-from-document)
+- [Extracting Aspect with Sub-Aspects](https://contextgem.dev/quickstart/#extracting-aspect-with-sub-aspects)
+- [Concept Extraction from Aspect](https://contextgem.dev/quickstart/#concept-extraction-from-aspect)
+- [Concept Extraction from Document (text)](https://contextgem.dev/quickstart/#concept-extraction-from-document-text)
+- [Concept Extraction from Document (vision)](https://contextgem.dev/quickstart/#concept-extraction-from-document-vision)
+- [LLM chat interface](https://contextgem.dev/quickstart/#lightweight-llm-chat-interface)

 🚀 **Advanced usage:**
-- [Extracting Aspects Containing Concepts](https://contextgem.dev/advanced_usage.html#extracting-aspects-with-concepts)
-- [Extracting Aspects and Concepts from a Document](https://contextgem.dev/advanced_usage.html#extracting-aspects-and-concepts-from-a-document)
-- [Using a Multi-LLM Pipeline to Extract Data from Several Documents](https://contextgem.dev/advanced_usage.html#using-a-multi-llm-pipeline-to-extract-data-from-several-documents)
+- [Extracting Aspects Containing Concepts](https://contextgem.dev/advanced_usage/#extracting-aspects-with-concepts)
+- [Extracting Aspects and Concepts from a Document](https://contextgem.dev/advanced_usage/#extracting-aspects-and-concepts-from-a-document)
+- [Using a Multi-LLM Pipeline to Extract Data from Several Documents](https://contextgem.dev/advanced_usage/#using-a-multi-llm-pipeline-to-extract-data-from-several-documents)


 ## 🔄 Document converters
@@ -405,14 +405,14 @@ docx_text = converter.convert_to_text_format(

 ```

-📖 Learn more about [DOCX converter features](https://contextgem.dev/converters/docx.html) in the documentation.
+📖 Learn more about [DOCX converter features](https://contextgem.dev/converters/docx/) in the documentation.


 ## 🎯 Focused document analysis

 ContextGem leverages LLMs' long context windows to deliver superior extraction accuracy from individual documents. Unlike RAG approaches that often [struggle with complex concepts and nuanced insights](https://www.linkedin.com/pulse/raging-contracts-pitfalls-rag-contract-review-shcherbak-ai-ptg3f), ContextGem capitalizes on continuously expanding context capacity, evolving LLM capabilities, and decreasing costs. This focused approach enables direct information extraction from complete documents, eliminating retrieval inconsistencies while optimizing for in-depth single-document analysis. While this delivers higher accuracy for individual documents, ContextGem does not currently support cross-document querying or corpus-wide retrieval - for these use cases, modern RAG frameworks (e.g., LlamaIndex, Haystack) remain more appropriate.

-📖 Read more on [how ContextGem works](https://contextgem.dev/how_it_works.html) in the documentation.
+📖 Read more on [how ContextGem works](https://contextgem.dev/how_it_works/) in the documentation.

 ## 🤖 Supported LLMs

@@ -422,20 +422,20 @@ ContextGem supports both cloud-based and local LLMs through [LiteLLM](https://gi
 - **Model Architectures**: Works with both reasoning/CoT-capable (e.g. gpt-5) and non-reasoning models (e.g. gpt-4.1)
 - **Simple API**: Unified interface for all LLMs with easy provider switching

-> **💡 Model Selection Note:** For reliable structured extraction, we recommend using models with performance equivalent to or exceeding `gpt-4o-mini`. Smaller models (such as 8B parameter models) may struggle with ContextGem's detailed extraction instructions. If you encounter issues with smaller models, see our [troubleshooting guide](https://contextgem.dev/optimizations/optimization_small_llm_troubleshooting.html) for potential solutions.
+> **💡 Model Selection Note:** For reliable structured extraction, we recommend using models with performance equivalent to or exceeding `gpt-4o-mini`. Smaller models (such as 8B parameter models) may struggle with ContextGem's detailed extraction instructions. If you encounter issues with smaller models, see our [troubleshooting guide](https://contextgem.dev/optimizations/optimization_small_llm_troubleshooting/) for potential solutions.

-📖 Learn more about [supported LLM providers and models](https://contextgem.dev/llms/supported_llms.html), how to [configure LLMs](https://contextgem.dev/llms/llm_config.html), and [LLM extraction methods](https://contextgem.dev/llms/llm_extraction_methods.html) in the documentation.
+📖 Learn more about [supported LLM providers and models](https://contextgem.dev/llms/supported_llms/), how to [configure LLMs](https://contextgem.dev/llms/llm_config/), and [LLM extraction methods](https://contextgem.dev/llms/llm_extraction_methods/) in the documentation.

 ## ⚡ Optimizations

 ContextGem documentation offers guidance on optimization strategies to maximize performance, minimize costs, and enhance extraction accuracy:

-- [Optimizing for Accuracy](https://contextgem.dev/optimizations/optimization_accuracy.html)
-- [Optimizing for Speed](https://contextgem.dev/optimizations/optimization_speed.html)
-- [Optimizing for Cost](https://contextgem.dev/optimizations/optimization_cost.html)
-- [Dealing with Long Documents](https://contextgem.dev/optimizations/optimization_long_docs.html)
-- [Choosing the Right LLM(s)](https://contextgem.dev/optimizations/optimization_choosing_llm.html)
-- [Troubleshooting Issues with Small Models](https://contextgem.dev/optimizations/optimization_small_llm_troubleshooting.html)
+- [Optimizing for Accuracy](https://contextgem.dev/optimizations/optimization_accuracy/)
+- [Optimizing for Speed](https://contextgem.dev/optimizations/optimization_speed/)
+- [Optimizing for Cost](https://contextgem.dev/optimizations/optimization_cost/)
+- [Dealing with Long Documents](https://contextgem.dev/optimizations/optimization_long_docs/)
+- [Choosing the Right LLM(s)](https://contextgem.dev/optimizations/optimization_choosing_llm/)
+- [Troubleshooting Issues with Small Models](https://contextgem.dev/optimizations/optimization_small_llm_troubleshooting/)


 ## 💾 Serializing results
@@ -446,14 +446,16 @@ ContextGem allows you to save and load Document objects, pipelines, and LLM conf
 - Transfer extraction results between systems
 - Persist pipeline and LLM configurations for later reuse

-📖 Learn more about [serialization options](https://contextgem.dev/serialization.html) in the documentation.
+📖 Learn more about [serialization options](https://contextgem.dev/serialization/) in the documentation.


 ## 📚 Documentation

 📖 **Full documentation:** [contextgem.dev](https://contextgem.dev)

-📄 **Raw documentation for LLMs:** Available at [`docs/docs-raw-for-llm.txt`](https://github.com/shcherbak-ai/contextgem/blob/main/docs/docs-raw-for-llm.txt) - automatically generated, optimized for LLM ingestion.
+> **⚠️ Official Documentation Notice:** [https://contextgem.dev/](https://contextgem.dev/) is the only official source of ContextGem documentation. Please be aware of unauthorized copies or mirrors that may contain outdated or incorrect information.
+
+📄 **Raw documentation for LLMs:** Available at [`docs/source/llms.txt`](https://github.com/shcherbak-ai/contextgem/blob/main/docs/source/llms.txt) - automatically generated, optimized for LLM ingestion.

 🤖 **AI-powered code exploration:** [DeepWiki](https://deepwiki.com/shcherbak-ai/contextgem) provides visual architecture maps and natural language Q&A for the codebase.
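
Every README change above follows one pattern: a `contextgem.dev` link ending in `.html` becomes a trailing-slash URL, with any `#fragment` preserved. The sketch below shows how such a rewrite, and a check for leftover old-style links, could be automated. `to_pretty_urls` and `stale_links` are hypothetical names, and the commit does not say whether the links were edited by hand or by a script.

```python
import re

# Matches docs links such as https://contextgem.dev/llms/llm_config.html
# The path is captured so the ".html" suffix can be swapped for "/".
_HTML_LINK = re.compile(r"(https://contextgem\.dev/[\w\-/]+)\.html\b")


def _rewrite(match: re.Match) -> str:
    base = match.group(1)
    # Assumption: index pages map to their directory (/index.html -> /).
    if base.endswith("/index"):
        return base[: -len("index")]
    return base + "/"


def to_pretty_urls(text: str) -> str:
    """Rewrite old-style .html docs links to dirhtml-style URLs."""
    return _HTML_LINK.sub(_rewrite, text)


def stale_links(text: str) -> list[str]:
    """Return any old-style .html docs links still present in the text."""
    return [m.group(0) for m in _HTML_LINK.finditer(text)]


if __name__ == "__main__":
    old = "See https://contextgem.dev/llms/llm_config.html#llm-groups for details."
    print(to_pretty_urls(old))
    print(stale_links(old))
```

Running `stale_links` over the README and the Python sources in CI would flag any link that still points at the old layout.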

contextgem/internal/base/llms.py

Lines changed: 4 additions & 4 deletions
@@ -3186,7 +3186,7 @@ def _post_init(self, __context: Any):
 logger.info(
     "Using local model provider. If you experience issues like JSON validation errors "
     "with smaller models, see our troubleshooting guide: "
-    "https://contextgem.dev/optimizations/optimization_small_llm_troubleshooting.html"
+    "https://contextgem.dev/optimizations/optimization_small_llm_troubleshooting/"
 )

 # Recommend `ollama_chat` prefix for better responses for Ollama models (text-only processing)
@@ -4008,7 +4008,7 @@ def _validate_document_llm_post(self) -> Self:
     f"while the model is reasoning-capable. If you intend to route reasoning tasks "
     f"to this model, consider using a `reasoner_*` role to match aspect/concept `llm_role` "
     f"and keep pipeline roles consistent. See "
-    f"https://contextgem.dev/optimizations/optimization_choosing_llm.html",
+    f"https://contextgem.dev/optimizations/optimization_choosing_llm/",
     stacklevel=2,
 )

@@ -4086,7 +4086,7 @@ def _validate_input_tokens(self, messages: list[dict[str, str]]) -> None:
     f"(for text) or `max_images_to_analyze_per_call` (for images) to process the "
     f"document in smaller chunks. "
     f"See the optimization guide for long documents: "
-    f"https://contextgem.dev/optimizations/optimization_long_docs.html"
+    f"https://contextgem.dev/optimizations/optimization_long_docs/"
 )

 logger.debug(
@@ -4153,7 +4153,7 @@ def _validate_output_tokens(self) -> None:
     f"(for text) or `max_images_to_analyze_per_call` (for images) to process the "
     f"document in smaller chunks. "
     f"See the optimization guide for long documents: "
-    f"https://contextgem.dev/optimizations/optimization_long_docs.html"
+    f"https://contextgem.dev/optimizations/optimization_long_docs/"
 )

 logger.debug(
