
Commit 608f0b7

Merge pull request #11 from shcherbak-ai/dev
Refactored DOCX converter internals for better maintainability. Updated README. Added CHANGELOG.
2 parents a64c81b + 4285453 commit 608f0b7

16 files changed: +1878 -1597 lines changed


CHANGELOG.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+# Changelog
+All notable changes to ContextGem will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), with the following additional categories:
+
+- **Refactor**: Code reorganization that doesn't change functionality but improves structure or maintainability
+
+## [0.2.2](https://github.com/shcherbak-ai/contextgem/releases/tag/v0.2.2) - 2025-05-02
+### Refactor
+- Refactored DOCX converter internals for better maintainability
+
+## [0.2.1](https://github.com/shcherbak-ai/contextgem/releases/tag/v0.2.1) - 2025-04-30
+### Fixed
+- Fixed litellm dependency issue, pinning to version ==1.67.1 to avoid encoding bug in newer versions of litellm
+
+## [0.2.0](https://github.com/shcherbak-ai/contextgem/releases/tag/v0.2.0) - 2025-04-21
+### Added
+- Added DocxConverter for converting DOCX files into ContextGem Document objects

README.md

Lines changed: 3 additions & 0 deletions
@@ -16,6 +16,7 @@
 [![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/)
 [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-blue?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
 [![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg)](CODE_OF_CONDUCT.md)
+[![Reddit](https://img.shields.io/badge/Reddit-Community-FF4500?logo=reddit&logoColor=white)](https://reddit.com/r/ContextGem)
 
 <img src="https://contextgem.dev/_static/tab_solid.png" alt="ContextGem: 2nd Product of the week" width="250">
 <br/><br/>
@@ -340,6 +341,8 @@ A raw text version of the full documentation is available at [`docs/docs-raw-for
 
 If you have a feature request or a bug report, feel free to [open an issue](https://github.com/shcherbak-ai/contextgem/issues/new) on GitHub. If you'd like to discuss a topic or get general advice on using ContextGem for your project, start a thread in [GitHub Discussions](https://github.com/shcherbak-ai/contextgem/discussions/new/).
 
+Join our [Reddit community](https://reddit.com/r/ContextGem) to connect with other users, share your projects, ask questions, and participate in discussions about ContextGem.
+
 
 ## 🤝 Contributing
 

contextgem/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@
 ContextGem - Effortless LLM extraction from documents
 """
 
-__version__ = "0.2.1"
+__version__ = "0.2.2"
 __author__ = "Shcherbak AI AS"
 
 from contextgem.public import (

contextgem/internal/__init__.py

Lines changed: 18 additions & 0 deletions
@@ -28,6 +28,15 @@
     _PostInitCollectorMixin,
     _RefParasAndSentsAttrituteProcessor,
 )
+from contextgem.internal.converters import (
+    WORD_XML_NAMESPACES,
+    DocxContentError,
+    DocxConverterError,
+    DocxFormatError,
+    DocxXmlError,
+    _DocxConverterBase,
+    _DocxPackage,
+)
 from contextgem.internal.data_models import (
     _LLMCall,
     _LLMCost,
@@ -155,4 +164,13 @@
     "_is_json_serializable",
     "_get_sat_model",
     "_setup_jinja2_template",
+    # Converters
+    # DOCX
+    "WORD_XML_NAMESPACES",
+    "DocxContentError",
+    "DocxConverterError",
+    "DocxFormatError",
+    "DocxXmlError",
+    "_DocxConverterBase",
+    "_DocxPackage",
 ]
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+#
+# ContextGem
+#
+# Copyright 2025 Shcherbak AI AS. All rights reserved. Developed by Sergii Shcherbak.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from contextgem.internal.converters.docx import (
+    WORD_XML_NAMESPACES,
+    DocxContentError,
+    DocxConverterError,
+    DocxFormatError,
+    DocxXmlError,
+    _DocxConverterBase,
+    _DocxPackage,
+)
+
+__all__ = [
+    "WORD_XML_NAMESPACES",
+    "DocxConverterError",
+    "DocxFormatError",
+    "DocxXmlError",
+    "DocxContentError",
+    "_DocxConverterBase",
+    "_DocxPackage",
+]
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+#
+# ContextGem
+#
+# Copyright 2025 Shcherbak AI AS. All rights reserved. Developed by Sergii Shcherbak.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from contextgem.internal.converters.docx.base import _DocxConverterBase
+from contextgem.internal.converters.docx.exceptions import (
+    DocxContentError,
+    DocxConverterError,
+    DocxFormatError,
+    DocxXmlError,
+)
+from contextgem.internal.converters.docx.namespaces import WORD_XML_NAMESPACES
+from contextgem.internal.converters.docx.package import _DocxPackage
+
+__all__ = [
+    "_DocxConverterBase",
+    "DocxConverterError",
+    "DocxFormatError",
+    "DocxXmlError",
+    "DocxContentError",
+    "WORD_XML_NAMESPACES",
+    "_DocxPackage",
+]
