
xlsx number and formula parsing issue #38691

@Aznath

  • Package Name: azure-ai-documentintelligence
  • Package Version: 1.0.0b4
  • Operating System: Windows
  • Python Version: Python 3.11.0

Describe the bug
When parsing a .xlsx file with the begin_analyze_document() function and the prebuilt-layout model, integers smaller than 10 are not parsed. However, decimal numbers smaller than 10 (for example 1.5) are parsed. Cells whose values come from Excel formulas are not parsed either.

To Reproduce
Steps to reproduce the behavior:

  1. Create a .xlsx file with integers smaller than 10 and a cell containing a formula
  2. Parse the file with begin_analyze_document() and the prebuilt-layout model
  3. Check the parsing output

Expected behavior
Integers smaller than 10 and formula cells should be parsed. The parsed output should be as follows:

"page_content":"# Sheet1\ntest\nint < 10\n1\n2\n3\n4\n5\n6\n7\n8\n9\nfloat\n1.5\nint >= 10\n10\n11\n12\n13\n14\n15\nsum\n121.5\n"

Screenshots
Excel sheet:
[Image: Excel sheet]

Parsed output:
"page_content":"# Sheet1\ntest\nint < 10\nfloat\n1.5\nint >= 10\n10\n11\n12\n13\n14\n15\nsum\n"
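To make the gap between the two outputs explicit, here is a minimal sketch that diffs the expected and actual `page_content` strings quoted in this report. It uses only the Python standard library and does not call the service; the helper name `missing_lines` is hypothetical and chosen for illustration.

```python
import difflib

# Strings copied from the "Expected behavior" and "Parsed output" sections of this report.
expected = (
    "# Sheet1\ntest\nint < 10\n1\n2\n3\n4\n5\n6\n7\n8\n9\n"
    "float\n1.5\nint >= 10\n10\n11\n12\n13\n14\n15\nsum\n121.5\n"
)
actual = (
    "# Sheet1\ntest\nint < 10\nfloat\n1.5\nint >= 10\n"
    "10\n11\n12\n13\n14\n15\nsum\n"
)


def missing_lines(expected_text: str, actual_text: str) -> list[str]:
    """Return the lines of expected_text that are absent from actual_text, in order.

    Hypothetical helper: an order-aware line diff, not part of any Azure SDK.
    """
    exp = expected_text.splitlines()
    act = actual_text.splitlines()
    matcher = difflib.SequenceMatcher(a=exp, b=act, autojunk=False)
    missing: list[str] = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" and "replace" both mean these expected lines have no match in actual.
        if tag in ("delete", "replace"):
            missing.extend(exp[i1:i2])
    return missing


missing = missing_lines(expected, actual)
print(missing)
```

For the two strings above, the missing lines are the integers 1 through 9 and the formula result 121.5, which matches the behavior described in this report.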

Labels: Client, Document Intelligence, Service Attention, customer-reported, needs-team-attention, question
