Description:
- The Form Recognizer API evolves quickly to improve transcription accuracy, add newly supported languages, and onboard new features. custom_read_api is an Azure Cognitive Search custom skill that integrates the Azure Form Recognizer Read API into an Azure Cognitive Search skillset. By default, the built-in OCR skill uses the latest GA version of the Read API (which does not include the latest developments), so this skill lets you leverage fresh previews.
Languages:
- Python
Products:
- Azure Cognitive Search
- Azure Cognitive Services (Form Recognizer Read API https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-read)
- Azure Functions
Steps:
- Create or reuse a Form Recognizer resource. Creation can be done from the Azure portal or programmatically
- Create a Python Function in Azure; for example, this is a good starting point
- Clone this repository
- Open the folder in VS Code and deploy the function; you can find a tutorial here
- Fill in your Function's app settings with the details from your deployment ('FR_ENDPOINT' and 'FR_ENDPOINT_KEY', using the values from the Azure portal; FR_ENDPOINT can be something like https://westeurope.api.cognitive.microsoft.com). A sketch of how the function reads these settings follows this list
- Add a field to your index where you will store the enriched entities; more info here
- Add the skill to your skillset as described below
- Add the output field mapping in your indexer as seen in the sample
- Run the indexer
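As a reference for the app settings step above, here is a minimal sketch of how the Python function can read those two settings at runtime (the helper name is illustrative; Azure Functions surfaces app settings as environment variables):

import os

def get_form_recognizer_config():
    """Read the Form Recognizer connection details from the Function's app settings.

    Azure Functions exposes app settings as environment variables, so the
    values configured in the Azure portal are available through os.environ.
    """
    endpoint = os.environ["FR_ENDPOINT"]  # e.g. https://westeurope.api.cognitive.microsoft.com
    key = os.environ["FR_ENDPOINT_KEY"]
    return endpoint, key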
You can find a sample input for the skill here:
{
    "values": [
        {
            "recordId": "0",
            "data": {
                "Url": "yourbase64encoded",
                "SasToken": "?sp=rac_restofyoursastokenhere"
            }
        }
    ]
}
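The Url input is the base64-encoded blob path mapped from metadata_storage_path, and SasToken grants access to the blob. Here is a minimal sketch of how the function might rebuild a usable URL from these two inputs (the helper name is illustrative, and the URL-safe base64 decoding is an assumption based on the sample above):

import base64

def blob_url_from_skill_input(encoded_url, sas_token):
    """Rebuild a reachable blob URL from the skill's 'Url' and 'SasToken' inputs."""
    # Re-add any stripped '=' padding before decoding the URL-safe base64 value.
    padding = "=" * (-len(encoded_url) % 4)
    url = base64.urlsafe_b64decode(encoded_url + padding).decode("utf-8")
    return url + sas_token  # the SAS token already starts with '?'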
And a sample output:
{
    "values": [
        {
            "recordId": "0",
            "data": {
                "text": "your merged text, concatenated from the line-by-line OCR transcription within each of the pages"
            }
        }
    ]
}
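For completeness, a minimal sketch of how the function can wrap the merged text in the response envelope shown above (the helper name is illustrative):

import json

def build_skill_response(record_id, merged_text):
    """Wrap the merged text in the envelope Azure Cognitive Search expects."""
    body = {
        "values": [
            {
                "recordId": record_id,
                "data": {"text": merged_text}
            }
        ]
    }
    return json.dumps(body)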
In order to use this skill in a Cognitive Search pipeline, you'll need to add a skill definition to your skillset. Here's a sample skill definition for this example (inputs and outputs should be updated to reflect your particular scenario and skillset environment):
{
    "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
    "name": "#1",
    "description": "Takes the merged text and processes it to filter only the key phrases",
    "context": "/document/merged_content",
    "defaultLanguageCode": "en",
    "maxKeyPhraseCount": null,
    "modelVersion": null,
    "inputs": [
        {
            "name": "text",
            "source": "/document/merged_content"
        }
    ],
    "outputs": [
        {
            "name": "keyPhrases",
            "targetName": "keyphrases"
        }
    ]
},
{
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "name": "#2",
    "description": null,
    "context": "/document",
    "uri": "https://yourappfuncname.azurewebsites.net/api/function_name?code=unique_code_for_auth_here",
    "httpMethod": "POST",
    "timeout": "PT3M",
    "batchSize": 2,
    "degreeOfParallelism": null,
    "inputs": [
        {
            "name": "Url",
            "source": "/document/metadata_storage_path"
        },
        {
            "name": "SasToken",
            "source": "/document/metadata_storage_sas_token"
        }
    ],
    "outputs": [
        {
            "name": "text",
            "targetName": "merged_content"
        }
    ],
    "httpHeaders": {}
}
This skill internally calls the given Form Recognizer endpoint, navigates the output, trims the response to keep only the text (leaving aside layout and confidence levels), and concatenates the text into a single string.
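A minimal sketch of that flow against the Read API REST endpoint (the helper name and the pinned api-version are assumptions; swap in whichever preview version you want the skill to use):

import time
import requests

def read_text(endpoint, key, document_url, api_version="2022-06-30-preview"):
    """Run the prebuilt-read model on a document URL and return its text as one string."""
    analyze_url = (f"{endpoint}/formrecognizer/documentModels/prebuilt-read:analyze"
                   f"?api-version={api_version}")
    headers = {"Ocp-Apim-Subscription-Key": key}
    response = requests.post(analyze_url, headers=headers, json={"urlSource": document_url})
    response.raise_for_status()
    # The analysis is asynchronous: poll the Operation-Location URL until it finishes.
    operation_url = response.headers["Operation-Location"]
    while True:
        result = requests.get(operation_url, headers=headers).json()
        if result["status"] in ("succeeded", "failed"):
            break
        time.sleep(1)
    if result["status"] == "failed":
        raise RuntimeError("Form Recognizer analysis failed")
    # Keep only the text, leaving aside layout and confidence information.
    pages = result["analyzeResult"]["pages"]
    return " ".join(line["content"] for page in pages for line in page["lines"])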
The index fields that receive these enrichments can be defined as follows:
{
    "name": "keyphrases",
    "type": "Collection(Edm.String)",
    "facetable": false,
    "filterable": false,
    "retrievable": true,
    "searchable": true,
    "analyzer": "standard.lucene",
    "indexAnalyzer": null,
    "searchAnalyzer": null,
    "synonymMaps": [],
    "fields": []
},
{
    "name": "merged_content",
    "type": "Edm.String",
    "facetable": false,
    "filterable": false,
    "key": false,
    "retrievable": true,
    "searchable": true,
    "sortable": false,
    "analyzer": "standard.lucene",
    "indexAnalyzer": null,
    "searchAnalyzer": null,
    "synonymMaps": [],
    "fields": []
},
The output enrichments of your skill can be mapped directly to the fields described above. By default, an enrichment is mapped automatically whenever an index field has the same name as the enrichment target (as is the case here for merged_content and keyphrases), so there is no need to map them manually. This corresponds to the indexer setting:
"outputFieldMappings": []