Commit d6991e5

🔧 rework custom internals & fix custom page_id (#200)
1 parent 99ac17b commit d6991e5

17 files changed: 227 additions & 75 deletions

docs/extras/code_samples/custom_v1.txt

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ from mindee import Client, PredictResponse, product
 # Init a new client
 mindee_client = Client(api_key="my-api-key")
 
-custom_endpoint = mindee_client.create_endpoint("field_test", "solution-eng-tests")
+custom_endpoint = mindee_client.create_endpoint("my-endpoint", "my-account")
 
 # Load a file from disk
 input_doc = mindee_client.source_from_path("/path/to/the/file.ext")

docs/extras/guide/custom_v1.md

Lines changed: 17 additions & 6 deletions
@@ -47,16 +47,27 @@ If it is not set, it will default to "1".
 
 A `ListField` is a special type of custom list that implements the following:
 
+
 * **confidence** (`float`): the confidence score of the field prediction.
-* **page_id** (`int`): the ID of the page.
 * **reconstructed** (`bool`): indicates whether or not an object was reconstructed (not extracted as the API gave it).
+* **values** (`List[`[ListFieldValue](#list-field-value)`]`): list of value fields
 
 Since the inner contents can vary, the value isn't accessed through a property, but rather through the following functions:
 * **contents_list()** (`-> List[Union[str, float]]`): returns a list of values for each element.
 * **contents_string(separator=" ")** (`-> str`): returns a list of concatenated values, with an optional **separator** `str` between them.
 * **__str__()**: returns a string representation of all values, with an empty space between each of them.
 
 
+#### List Field Value
+
+Values of `ListField`s are stored in a `ListFieldValue` structure, which is implemented as follows:
+* **content** (`str`): extracted content of the prediction
+* **confidence** (`float`): the confidence score of the prediction
+* **bounding_box** (`BBox`): 4 relative vertices corrdinates of a rectangle containing the word in the document.
+* **polygon** (`Polygon`): vertices of a polygon containing the word.
+* **page_id** (`int`): the ID of the page, is `undefined` when at document-level.
+
+
 ### Classification Field
 
 A `ClassificationField` is a special type of custom classification that implements the following:
@@ -99,7 +110,7 @@ The **columns_to_line_items()** function can be called from the document and pag
 
 It takes the following arguments:
 
-* **anchor_names** (`List[str]`): a list of the names of possible anchor (field) candidate for the horizontal placement a line. If all provided anchors are invalid, the `LineItemV1` won't be built.
+* **anchor_names** (`List[str]`): a list of the names of possible anchor (field) candidate for the horizontal placement a line. If all provided anchors are invalid, the `CustomLine` won't be built.
 * **field_names** (`List[str]`): a list of fields to retrieve the values from
 * **height_tolerance** (`float`): Optional, the height tolerance used to build the line. It helps when the height of a line can vary unexpectedly.
 
@@ -121,14 +132,14 @@ response.document.pages[0].prediction.columns_to_line_items(
 )
 ```
 
-It returns a list of [CustomLineV1](#CustomlineV1) objects.
+It returns a list of [CustomLine](#CustomLine) objects.
 
-## CustomlineV1
+## CustomLine
 
-`CustomlineV1` represents a line as it has been read from column fields. It has the following attributes:
+`CustomLine` represents a line as it has been read from column fields. It has the following attributes:
 
 * **row_number** (`int`): Number of a given line. Starts at 1.
-* **fields** (`Dict[str, ListFieldValueV1]`[]): List of the fields associated with the line, indexed by their column name.
+* **fields** (`Dict[str, ListFieldValue]`[]): List of the fields associated with the line, indexed by their column name.
 * **bbox** (`BBox`): Simple bounding box of the current line representing the 4 minimum & maximum coordinates as `float` values.
 
 
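Reviewer note: the guide changes above move `page_id` from `ListField` down to each `ListFieldValue` and expose the values through a `values` list. The sketch below illustrates that shape with hypothetical stand-in classes (`SketchListField`, `SketchListFieldValue`); they are not the Mindee classes, they only mirror the attribute and accessor names documented in the guide. For brevity the stand-in returns `str` contents only, and using `None` for a document-level `page_id` is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchListFieldValue:
    """Hypothetical stand-in mirroring some documented ListFieldValue attributes."""

    content: str
    confidence: float = 0.0
    page_id: Optional[int] = None  # assumption: None at document level


@dataclass
class SketchListField:
    """Hypothetical stand-in mirroring the documented ListField accessors."""

    confidence: float = 0.0
    reconstructed: bool = False
    values: List[SketchListFieldValue] = field(default_factory=list)

    def contents_list(self) -> List[str]:
        # One entry per value, in order.
        return [value.content for value in self.values]

    def contents_string(self, separator: str = " ") -> str:
        # Contents joined into a single string.
        return separator.join(self.contents_list())

    def __str__(self) -> str:
        return self.contents_string()


sketch_field = SketchListField(
    confidence=0.9,
    values=[
        SketchListFieldValue("Jane", 0.95, page_id=0),
        SketchListFieldValue("Doe", 0.92, page_id=0),
    ],
)
print(sketch_field.contents_list())
print(sketch_field.contents_string(", "))
print(str(sketch_field))
```

With this layout the page is read per value (`sketch_field.values[0].page_id`) rather than from the field itself.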

docs/parsing/custom.rst

Lines changed: 4 additions & 4 deletions
@@ -4,26 +4,26 @@ Custom Fields
 
 Classification
 ==============
-.. autoclass:: mindee.parsing.custom.classification.ClassificationFieldV1
+.. autoclass:: mindee.parsing.custom.classification.ClassificationField
    :members:
 
 
 Line Items
 ==========
-.. autoclass:: mindee.parsing.custom.line_items.CustomLineV1
+.. autoclass:: mindee.parsing.custom.line_items.CustomLine
    :members:
 
 Lists
 =====
 
 List Field
 ----------
-.. autoclass:: mindee.parsing.custom.list.ListFieldV1
+.. autoclass:: mindee.parsing.custom.list.ListField
    :members:
 
 List Field Value
 ----------------
-.. autoclass:: mindee.parsing.custom.list.ListFieldValueV1
+.. autoclass:: mindee.parsing.custom.list.ListFieldValue
    :members:
 
 String Dict

mindee/error/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 from mindee.error.geometry_error import GeometryError
 from mindee.error.mimetype_error import MimeTypeError
-from mindee.error.mindee_error import MindeeClientError, MindeeError
+from mindee.error.mindee_error import MindeeClientError, MindeeError, MindeeProductError
 from mindee.error.mindee_http_error import (
     MindeeHTTPClientError,
     MindeeHTTPError,

mindee/error/mindee_error.py

Lines changed: 4 additions & 0 deletions
@@ -16,3 +16,7 @@ class MindeeApiError(MindeeError):
 
 class MindeeSourceError(MindeeError):
     """An exception relating to document loading."""
+
+
+class MindeeProductError(MindeeApiError):
+    """An exception relating to the use of an incorrect product/version."""
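Reviewer note: because `MindeeProductError` subclasses `MindeeApiError`, existing handlers that catch `MindeeApiError` should also catch the new exception. The sketch below reproduces only the hierarchy visible in this diff using local stand-in classes; the `RuntimeError` base and the `load_product` helper are assumptions made for illustration, not part of the library.

```python
class MindeeError(RuntimeError):
    """Stand-in base class (its own parent class is an assumption)."""


class MindeeApiError(MindeeError):
    """Stand-in for the API error class named in the hunk header."""


class MindeeProductError(MindeeApiError):
    """An exception relating to the use of an incorrect product/version."""


def load_product(version: str) -> str:
    """Hypothetical helper that rejects any version other than "1"."""
    if version != "1":
        raise MindeeProductError(f"Unsupported product version: {version}")
    return f"product v{version}"


try:
    load_product("2")
except MindeeApiError as exc:
    # The broader handler also receives the more specific product error.
    caught = type(exc).__name__
    print(f"caught {caught}: {exc}")
```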

mindee/input/sources.py

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ class LocalInputSource:
     filename: str
     file_mimetype: str
     input_type: InputType
-    filepath: Optional[str] = None
+    filepath: Optional[str]
 
     def __init__(self, input_type: InputType):
         self.input_type = input_type

mindee/parsing/common/job.py

Lines changed: 3 additions & 1 deletion
@@ -14,7 +14,7 @@ class Job:
     """ID of the job sent by the API in response to an enqueue request."""
     issued_at: datetime
     """Timestamp of the request reception by the API."""
-    available_at: Optional[datetime] = None
+    available_at: Optional[datetime]
     """Timestamp of the request after it has been completed."""
     status: str
     """Status of the request, as seen by the API."""
@@ -30,6 +30,8 @@ def __init__(self, json_response: dict) -> None:
         self.issued_at = datetime.fromisoformat(json_response["issued_at"])
         if json_response.get("available_at"):
             self.available_at = datetime.fromisoformat(json_response["available_at"])
+        else:
+            self.available_at = None
         self.id = json_response["id"]
         self.status = json_response["status"]
         if self.available_at:
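Reviewer note: the two `job.py` hunks belong together. A bare annotation such as `available_at: Optional[datetime]` does not create a class attribute, so once the `= None` default is removed, `__init__` has to assign the attribute on every path; otherwise the later `if self.available_at:` check would raise `AttributeError`. The sketch below demonstrates this with hypothetical stand-in classes (`AnnotatedOnly`, `JobSketch`) rather than the real `Job` class.

```python
from datetime import datetime
from typing import Optional


class AnnotatedOnly:
    # Annotation only: recorded in __annotations__, but no attribute is created.
    available_at: Optional[datetime]


class JobSketch:
    """Hypothetical reduction of the Job constructor shown in the diff."""

    available_at: Optional[datetime]

    def __init__(self, json_response: dict) -> None:
        if json_response.get("available_at"):
            self.available_at = datetime.fromisoformat(json_response["available_at"])
        else:
            # Without this branch the attribute would not exist at all.
            self.available_at = None


has_default = hasattr(AnnotatedOnly(), "available_at")
pending = JobSketch({"available_at": None})
done = JobSketch({"available_at": "2023-11-01T10:00:00"})
print(has_default, pending.available_at, done.available_at)
```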

mindee/parsing/custom/__init__.py

Lines changed: 3 additions & 3 deletions
@@ -1,3 +1,3 @@
-from mindee.parsing.custom.classification import ClassificationFieldV1
-from mindee.parsing.custom.line_items import CustomLineV1, get_line_items
-from mindee.parsing.custom.list import ListFieldV1, ListFieldValueV1
+from mindee.parsing.custom.classification import ClassificationField
+from mindee.parsing.custom.line_items import CustomLine, get_line_items
+from mindee.parsing.custom.list import ListField, ListFieldValue

mindee/parsing/custom/classification.py

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 from mindee.parsing.common.string_dict import StringDict
 
 
-class ClassificationFieldV1:
+class ClassificationField:
     """A classification field."""
 
     value: str

mindee/parsing/custom/line_items.py

Lines changed: 19 additions & 21 deletions
@@ -4,10 +4,10 @@
 from mindee.geometry.bbox import BBox, extend_bbox, get_bbox
 from mindee.geometry.minmax import MinMax, get_min_max_y
 from mindee.geometry.quadrilateral import get_bounding_box
-from mindee.parsing.custom.list import ListFieldV1, ListFieldValueV1
+from mindee.parsing.custom.list import ListField, ListFieldValue
 
 
-def _find_best_anchor(anchors: Sequence[str], fields: Dict[str, ListFieldV1]) -> str:
+def _find_best_anchor(anchors: Sequence[str], fields: Dict[str, ListField]) -> str:
     """
     Find the anchor with the most rows, in the order specified by `anchors`.
 
@@ -23,12 +23,12 @@ def _find_best_anchor(anchors: Sequence[str], fields: Dict[str, ListFieldV1]) ->
     return anchor
 
 
-class CustomLineV1:
+class CustomLine:
     """Represent a single line."""
 
     row_number: int
     """Index of the row of a given line."""
-    fields: Dict[str, ListFieldValueV1]
+    fields: Dict[str, ListFieldValue]
     """Fields contained in the line."""
     bbox: BBox
     """Simplified bounding box of the line."""
@@ -38,7 +38,7 @@ def __init__(self, row_number: int):
         self.bbox = BBox(1, 1, 0, 0)
         self.fields = {}
 
-    def update_field(self, field_name: str, field_value: ListFieldValueV1) -> None:
+    def update_field(self, field_name: str, field_value: ListFieldValue) -> None:
         """
         Updates a field value if it exists.
 
@@ -61,7 +61,7 @@ def update_field(self, field_name: str, field_value: ListFieldValueV1) -> None:
             merged_confidence = field_value.confidence
             merged_polygon = get_bounding_box(field_value.polygon)
 
-        self.fields[field_name] = ListFieldValueV1(
+        self.fields[field_name] = ListFieldValue(
             {
                 "content": merged_content,
                 "confidence": merged_confidence,
@@ -70,9 +70,7 @@ def update_field(self, field_name: str, field_value: ListFieldValueV1) -> None:
         )
 
 
-def is_box_in_line(
-    line: CustomLineV1, bbox: BBox, height_line_tolerance: float
-) -> bool:
+def is_box_in_line(line: CustomLine, bbox: BBox, height_line_tolerance: float) -> bool:
     """
     Checks if the bbox fits inside the line.
 
@@ -86,25 +84,25 @@ def is_box_in_line(
 
 
 def prepare(
-    anchor_name: str, fields: Dict[str, ListFieldV1], height_line_tolerance: float
-) -> List[CustomLineV1]:
+    anchor_name: str, fields: Dict[str, ListField], height_line_tolerance: float
+) -> List[CustomLine]:
     """
     Prepares lines before filling them.
 
     :param anchor_name: name of the anchor.
     :param fields: fields to build lines from.
     :param height_line_tolerance: line height tolerance for custom line reconstruction.
     """
-    lines_prepared: List[CustomLineV1] = []
+    lines_prepared: List[CustomLine] = []
     try:
-        anchor_field: ListFieldV1 = fields[anchor_name]
+        anchor_field: ListField = fields[anchor_name]
     except KeyError as exc:
         raise MindeeError("No lines have been detected.") from exc
 
     current_line_number: int = 1
-    current_line = CustomLineV1(current_line_number)
+    current_line = CustomLine(current_line_number)
     if anchor_field and len(anchor_field.values) > 0:
-        current_value: ListFieldValueV1 = anchor_field.values[0]
+        current_value: ListFieldValue = anchor_field.values[0]
         current_line.bbox = extend_bbox(
             current_line.bbox,
             current_value.polygon,
@@ -118,7 +116,7 @@ def prepare(
             ):
                 lines_prepared.append(current_line)
                 current_line_number += 1
-                current_line = CustomLineV1(current_line_number)
+                current_line = CustomLine(current_line_number)
                 current_line.bbox = extend_bbox(
                     current_line.bbox,
                     current_value.polygon,
@@ -140,26 +138,26 @@ def prepare(
 def get_line_items(
     anchors: Sequence[str],
     field_names: Sequence[str],
-    fields: Dict[str, ListFieldV1],
+    fields: Dict[str, ListField],
     height_line_tolerance: float = 0.01,
-) -> List[CustomLineV1]:
+) -> List[CustomLine]:
     """
     Reconstruct line items from fields.
 
     :anchors: Possible fields to use as an anchor
     :columns: All fields which are columns
     :fields: List of field names to reconstruct table with
     """
-    line_items: List[CustomLineV1] = []
-    fields_to_transform: Dict[str, ListFieldV1] = {}
+    line_items: List[CustomLine] = []
+    fields_to_transform: Dict[str, ListField] = {}
     for field_name, field_value in fields.items():
         if field_name in field_names:
             fields_to_transform[field_name] = field_value
     anchor = _find_best_anchor(anchors, fields_to_transform)
     if not anchor:
         print(Warning("Could not find an anchor!"))
         return line_items
-    lines_prepared: List[CustomLineV1] = prepare(
+    lines_prepared: List[CustomLine] = prepare(
         anchor, fields_to_transform, height_line_tolerance
     )
 
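Reviewer note: for readers unfamiliar with the line reconstruction that `prepare()` and `get_line_items()` perform, the sketch below shows the general idea of grouping anchor values into rows using a height tolerance. It is a simplified illustration only: `group_rows` and the `(y_min, y_max)` span representation are hypothetical, and the exact comparison used by `is_box_in_line` is not visible in this diff, so the containment rule here is an assumption.

```python
from typing import List, Tuple

# Hypothetical representation: vertical extent of an anchor value,
# as relative coordinates (y_min, y_max).
Span = Tuple[float, float]


def group_rows(anchor_spans: List[Span], tolerance: float = 0.01) -> List[List[Span]]:
    """Group spans into rows, top to bottom.

    A span joins the current row when it lies inside the row's vertical
    range widened by `tolerance`; otherwise it starts a new row.
    """
    rows: List[List[Span]] = []
    row_min = row_max = 0.0
    for y_min, y_max in anchor_spans:
        fits = (
            bool(rows)
            and y_min >= row_min - tolerance
            and y_max <= row_max + tolerance
        )
        if fits:
            rows[-1].append((y_min, y_max))
            # Extend the row's range to cover the new span.
            row_min, row_max = min(row_min, y_min), max(row_max, y_max)
        else:
            rows.append([(y_min, y_max)])
            row_min, row_max = y_min, y_max
    return rows


rows = group_rows([(0.10, 0.12), (0.105, 0.125), (0.20, 0.22)], tolerance=0.01)
for row_number, row in enumerate(rows, start=1):  # rows are numbered from 1
    print(row_number, row)
```

A larger tolerance merges more values into the same row, which matches the guide's remark that `height_tolerance` helps when line heights vary unexpectedly.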
