Document Intelligence Studio (build ~2025-07-08) writes .labels.json / .ocr.json to the *container root* instead of the selected sub-folder → training fails with “Can’t find any valid labels” #41969

@Buschleague

Description

Environment

Item             Value
Studio build     2025-07-08 (visible in browser dev-tools “appVersion”)
API version      2024-11-30
Resource region  eastus
Browser          Chrome 138 on Linux
Storage SDK      N/A (UI only)

Bug description

Since the Studio update that rolled out during the week of 8 July 2025, every time I use Run layout or Auto-label, or manually save labels:

  • the newly-generated *.labels.json, *.ocr.json, and fields.json files are saved at the root of the blob container
    (trainingdata-invoices/)
  • they are not saved beside the corresponding PDFs inside the folder path I set in the UI
    (trainingdata-invoices/printsolutions/).

Because the project’s “Folder path” remains printsolutions/, the subsequent Train call only scans that sub-folder, does not see any label/OCR files, and throws:

ModelBuildError: Could not build the model: Can't find any valid labels for provided dataset …

Expected behaviour

Per the product documentation:

  • “The .labels.json and .ocr.json files correspond to each document in your training dataset.”
  • “If your documents are in a subfolder, enter the relative path from the container root in the Folder path field.”
  • “When you train, you need to direct the API to a subfolder.”

Therefore the Studio should create the helper JSON files inside the same sub-folder that was configured in Folder path, so that training can find them.
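To make the "direct the API to a subfolder" expectation concrete, here is a minimal sketch of the request body a build call might carry. It is not taken from the Studio's actual traffic: the field names (`azureBlobSource`, `containerUrl`, `prefix`, `buildMode`) are my understanding of the 2024-11-30 build request and should be checked against the REST reference, and the model ID and container URL are hypothetical placeholders.

```python
import json


def build_request_body(model_id: str, container_url: str, folder_path: str = "") -> dict:
    """Assemble a JSON body for a custom-model build call.

    Field names are assumptions based on the 2024-11-30 REST API.
    """
    source = {"containerUrl": container_url}
    prefix = folder_path.strip("/")
    if prefix:
        # Trailing slash so "printsolutions" does not also match
        # a sibling folder such as "printsolutions-old/".
        source["prefix"] = prefix + "/"
    return {"modelId": model_id, "buildMode": "template", "azureBlobSource": source}


# Hypothetical model ID and container URL (a real call would need a SAS token
# or a managed identity with access to the container).
body = build_request_body(
    "invoices-v1",
    "https://<account>.blob.core.windows.net/trainingdata-invoices",
    "printsolutions",
)
print(json.dumps(body, indent=2))
```

With a prefix of `printsolutions/`, only blobs under that folder are in scope for training, which is why helper files written to the container root are invisible to the build.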


Actual behaviour

Helper JSON files are always written to the container root, ignoring the selected folder path.


Impact

  • Severity = High – Every custom extraction project that keeps its PDFs in sub-folders now fails to train.
  • The only workaround is to copy or move the JSON files into the correct sub-folder before pressing Train, or to abandon sub-folders altogether.

Steps to reproduce

  1. Create a Custom extraction project in Document Intelligence Studio.
  2. Storage
    • Container: trainingdata-invoices
    • Folder path: printsolutions
  3. Observe that the UI correctly lists the PDFs under printsolutions/ (for example #64311.pdf and #64315.pdf).
  4. Select any document → Run layout or Auto-label.
  5. Check the storage container:
    • trainingdata-invoices/*.ocr.json and trainingdata-invoices/*.labels.json now exist.
    • The folder trainingdata-invoices/printsolutions/ still contains only PDFs.
  6. Click Train → build fails with Can’t find any valid labels.
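Step 5 can be checked without clicking through the portal. The sketch below classifies a blob listing into helper files found inside the configured folder and helper files sitting at the container root. It works on a plain list of blob names; the example names are hypothetical (I am assuming the helper files are named after their PDF, e.g. `<name>.pdf.ocr.json`), and in practice the list could come from something like `ContainerClient.list_blobs()` in the azure-storage-blob package.

```python
HELPER_SUFFIXES = (".labels.json", ".ocr.json")


def is_helper_file(blob_name: str) -> bool:
    """True for the JSON files the Studio generates next to training documents."""
    base_name = blob_name.rsplit("/", 1)[-1]
    return base_name == "fields.json" or base_name.endswith(HELPER_SUFFIXES)


def locate_helper_files(blob_names, folder_path):
    """Split helper files into (inside folder_path, at the container root)."""
    prefix = folder_path.strip("/")
    prefix = prefix + "/" if prefix else ""
    in_folder, at_root = [], []
    for name in blob_names:
        if not is_helper_file(name):
            continue
        if prefix and name.startswith(prefix):
            in_folder.append(name)
        elif "/" not in name:
            # No path separator means the blob lives at the container root.
            at_root.append(name)
    return in_folder, at_root


# Hypothetical listing matching the state described in step 5.
listing = [
    "printsolutions/#64311.pdf",
    "printsolutions/#64315.pdf",
    "#64311.pdf.ocr.json",
    "#64311.pdf.labels.json",
    "fields.json",
]
in_folder, at_root = locate_helper_files(listing, "printsolutions")
print("helper files in folder:", in_folder)
print("helper files at root:  ", at_root)
```

An empty first list together with a non-empty second list is the signature of the behaviour reported here.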

Workarounds

  • Manual copy
    Move all newly-created JSON files from the container root into the expected sub-folder before training.
  • Flatten container
    Put each dataset in its own container and leave Folder path blank (sacrifices folder organisation).
  • API-only training
    Trigger training via SDK/REST without a folder prefix (sourceFilter.prefix in the v2.1 REST API, azureBlobSource.prefix in v3.0 and later), so the service looks at the root where Studio now writes the files.
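The "Manual copy" workaround can be scripted. As a rough sketch, the function below only plans the moves: given a blob listing and the folder path, it returns (source, target) pairs for helper files at the container root and separately reports targets that already exist, so nothing is overwritten silently. The function name and the example blob names are hypothetical; applying the plan is left out on purpose.

```python
def plan_moves(blob_names, folder_path):
    """Plan moving root-level helper JSON files into folder_path.

    Returns (moves, conflicts), each a list of (source, target) pairs.
    Conflicts are targets that already exist in the listing.
    """
    prefix = folder_path.strip("/") + "/"
    existing = set(blob_names)
    moves, conflicts = [], []
    for name in blob_names:
        if "/" in name:
            continue  # already inside some folder, leave it alone
        is_helper = name == "fields.json" or name.endswith((".labels.json", ".ocr.json"))
        if not is_helper:
            continue
        target = prefix + name
        if target in existing:
            conflicts.append((name, target))
        else:
            moves.append((name, target))
    return moves, conflicts


# Hypothetical listing: one stale fields.json already sits in the sub-folder.
listing = [
    "printsolutions/#64311.pdf",
    "printsolutions/fields.json",
    "#64311.pdf.labels.json",
    "#64311.pdf.ocr.json",
    "fields.json",
]
moves, conflicts = plan_moves(listing, "printsolutions")
for source, target in moves:
    print(f"move     {source} -> {target}")
for source, target in conflicts:
    print(f"conflict {source} -> {target} (target exists)")
```

Each planned pair could then be applied with a server-side copy followed by deleting the source, for example `BlobClient.start_copy_from_url` and `delete_blob` from azure-storage-blob; I have not tested that part against a live account. Conflicts deserve a manual look, because the root copy is likely the newer one after the regression.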

Requested action

Please confirm the regression and restore the previous behaviour where label/OCR/fields JSON files are written to the folder specified in Folder path.

If the change is intentional, please update the Studio and documentation accordingly and provide guidance on how to train datasets that are organised in sub-folders without manual file moves.

Metadata

Assignees: No one assigned

Labels:
  • Document Intelligence
  • Service Attention (Workflow: This issue is responsible by Azure service team.)
  • bug (This issue requires a change to an existing behavior in the product in order to be resolved.)
  • customer-reported (Issues that are reported by GitHub users external to the Azure organization.)
  • needs-team-attention (Workflow: This issue needs attention from Azure service team or SDK team)

Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests