Description
Environment
Item | Value |
---|---|
Studio build | 2025-07-08 (visible in browser dev-tools “appVersion”) |
API version | 2024-11-30 |
Resource region | eastus |
Browser | Chrome 138 on Linux |
Storage SDK | N/A (UI only) |
Bug description
Since the Studio update that rolled out during the week of 8 July 2025, every time I run Run layout, Auto-label, or manually save labels:
- the newly generated `*.labels.json`, `*.ocr.json`, and `fields.json` files are saved at the root of the blob container (`trainingdata-invoices/`)
- they are not saved beside the corresponding PDFs inside the folder path I set in the UI (`trainingdata-invoices/printsolutions/`).
Because the project’s “Folder path” remains `printsolutions/`, the subsequent Train call only scans that sub-folder, does not see any label/OCR files, and throws:
ModelBuildError: Could not build the model: Can't find any valid labels for provided dataset …
Expected behaviour
Per the product documentation:
- “The `.labels.json` and `.ocr.json` files correspond to each document in your training dataset.”
- “If your documents are in a subfolder, enter the relative path from the container root in the Folder path field.”
- “When you train, you need to direct the API to a subfolder.”
Therefore the Studio should create the helper JSON files inside the same sub-folder that was configured in Folder path, so that training can find them.
Actual behaviour
Helper JSON files are always written to the container root, ignoring the selected folder path.
Impact
- Severity = High – Every custom extraction project that keeps its PDFs in sub-folders now fails to train.
- The only workaround is to copy or move the JSON files into the correct sub-folder before pressing Train, or to abandon sub-folders altogether.
Steps to reproduce
- Create a Custom extraction project in Document Intelligence Studio.
- Storage
  - Container: `trainingdata-invoices`
  - Folder path: `printsolutions`
- Observe that the UI correctly lists `printsolutions/#64311.pdf` … `#64315.pdf`.
- Select any document → Run layout or Auto-label.
- Check the storage container:
  - `trainingdata-invoices/*.ocr.json` and `trainingdata-invoices/*.labels.json` now exist.
  - The folder `trainingdata-invoices/printsolutions/` still contains only PDFs.
- Click Train → build fails with Can’t find any valid labels.
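The storage check in the steps above can be automated against a plain listing of blob names. The following is a minimal sketch, not part of Studio or any SDK: `find_unlabelled` is a hypothetical helper, and it assumes the per-document helper files are named by appending `.labels.json` and `.ocr.json` to the full PDF name (for example `#64311.pdf.ocr.json`). Adjust the suffixes if your container uses a different convention.

```python
# Assumed per-document helper suffixes (appended to the full PDF blob name).
HELPER_SUFFIXES = (".labels.json", ".ocr.json")


def find_unlabelled(blob_names, folder):
    """Return {pdf_blob_name: [missing helper blob names]} for PDFs under `folder`.

    `blob_names` is any iterable of blob names relative to the container root.
    `folder` is the project's Folder path; an empty string means the root.
    """
    folder = folder.strip("/")
    prefix = folder + "/" if folder else ""
    names = set(blob_names)
    missing = {}
    for name in sorted(names):
        if name.startswith(prefix) and name.lower().endswith(".pdf"):
            # A PDF counts as labelled only if its helper files sit beside it.
            absent = [name + s for s in HELPER_SUFFIXES if name + s not in names]
            if absent:
                missing[name] = absent
    return missing


if __name__ == "__main__":
    # Listing that mirrors the reported state: helpers at the root, PDF in the folder.
    listing = [
        "printsolutions/#64311.pdf",
        "#64311.pdf.ocr.json",
        "#64311.pdf.labels.json",
        "fields.json",
    ]
    for pdf, absent in find_unlabelled(listing, "printsolutions").items():
        print(pdf, "is missing", absent)
```

A non-empty result for the configured folder matches the state in which Train reports that it cannot find any valid labels.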
Workarounds
- Manual copy: Move all newly created JSON files from the container root into the expected sub-folder before training.
- Flatten container: Put each dataset in its own container and leave Folder path blank (sacrifices folder organisation).
- API-only training: Trigger training via SDK/REST without a `sourceFilter.prefix`, so the service looks at the root where Studio now writes the files.
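For the manual-copy workaround, it can help to compute the list of moves before touching the container. The sketch below only plans the renames from a listing of blob names; `plan_moves` is a hypothetical helper, and the suffix list and the `fields.json` name are taken from the files described in this report. Performing the actual copy and delete is left to whatever storage tooling is at hand (for example Azure Storage Explorer or the `azure-storage-blob` package).

```python
import posixpath

# Helper files that Studio is reported to write at the container root.
HELPER_SUFFIXES = (".labels.json", ".ocr.json")
PROJECT_FILES = ("fields.json",)


def plan_moves(blob_names, folder):
    """Map each root-level helper JSON blob to its destination under `folder`."""
    folder = folder.strip("/")
    if not folder:
        return {}  # Folder path is the root, so nothing needs to move.
    moves = {}
    for name in blob_names:
        if "/" in name:
            continue  # Already inside a sub-folder; leave it alone.
        if name.endswith(HELPER_SUFFIXES) or name in PROJECT_FILES:
            moves[name] = posixpath.join(folder, name)
    return moves


if __name__ == "__main__":
    listing = [
        "#64311.pdf.ocr.json",
        "#64311.pdf.labels.json",
        "fields.json",
        "printsolutions/#64311.pdf",
    ]
    for src, dst in plan_moves(listing, "printsolutions").items():
        print(f"{src} -> {dst}")
```

Only blobs directly at the root are considered, so files that already sit in another project's sub-folder are not touched.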
Requested action
Please confirm the regression and restore the previous behaviour where label/OCR/fields JSON files are written to the folder specified in Folder path.
If the change is intentional, please update the Studio and documentation accordingly and provide guidance on how to train datasets that are organised in sub-folders without manual file moves.