This project demonstrates how to use Amazon Bedrock to extract data from documents and images using your LLM of choice. The infrastructure is defined using AWS CDK with Java.
Below are the services that are required for deploying the solution.
Before getting started, make sure you have the following:
- AWS Account
- Java Development Kit (JDK) installed on your local machine
- Apache Maven installed (the build step uses the mvn command)
- AWS CLI configured with valid credentials
  - AWS CLI. If it is missing, install the latest AWS CLI from here. Check the installed version with:
    aws --version
- Node.js and npm installed (required for CDK)
  - Node.js 22.x or later. If it is missing, install Node.js from here. Check the installed version with:
    node --version
- AWS CDK. Install the latest AWS CDK Toolkit globally using the following command, then check the version:
  npm install -g aws-cdk
  cdk --version
  - CDK Bootstrap. Bootstrap your AWS account for CDK. This only needs to be done once per account and region:
    cdk bootstrap aws://<account>/<region>
Clone this repository and navigate to the project directory.
git clone https://<Repo-Url>.git
cd sample-amazon-bedrock-idp-java-cdk
Run the following build command from the root directory of the project.
mvn clean install
Change to the Infra directory of the project.
cd Infra
Run the following command to deploy the application. Note: by default, CDK asks for your approval before deploying changes that broaden security permissions, such as new IAM roles.
cdk deploy
To skip the approval prompt, use the following command instead.
cdk deploy --require-approval never
Make sure you are in the right AWS account and region.
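One way to enforce the account and region check is to have the CDK app fail fast when the CLI resolves an unexpected target. The AWS CDK CLI sets CDK_DEFAULT_ACCOUNT and CDK_DEFAULT_REGION from your active profile when it runs the app; the class DeploymentTargetGuard, its check method, and the expected values are hypothetical and not part of this project. A minimal sketch:

```java
import java.util.Map;

/** Hypothetical guard: stops synthesis if the CDK CLI resolved an unexpected account or region. */
final class DeploymentTargetGuard {

    private DeploymentTargetGuard() {}

    /**
     * Compares the account and region exposed by the CDK CLI against the expected values.
     * The environment is passed in as a map so the check can be tested without real variables.
     */
    static void check(Map<String, String> env, String expectedAccount, String expectedRegion) {
        String account = env.get("CDK_DEFAULT_ACCOUNT");
        String region = env.get("CDK_DEFAULT_REGION");
        // A missing variable yields null, which never equals the expected value, so it also fails.
        if (!expectedAccount.equals(account) || !expectedRegion.equals(region)) {
            throw new IllegalStateException(
                    "Refusing to deploy: CLI resolved aws://" + account + "/" + region
                    + " but expected aws://" + expectedAccount + "/" + expectedRegion);
        }
    }
}
```

In a CDK app this would be called as DeploymentTargetGuard.check(System.getenv(), "111122223333", "us-west-2") at the top of the main method, before any stack is created, so a wrong profile aborts cdk deploy before CloudFormation is touched.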
AWS CloudFormation will create resources similar to those shown below.
Note: Not all resources are shown in the screenshot below.
Navigate to Amazon Bedrock -> Model Access and enable the model as shown below:
- Make sure you are in the right AWS region. Example: us-west-2
- Enable specific models or all models.
- The default model used in this sample is "Anthropic Claude 3.5 Sonnet v2".
- Make sure you enable this model.
- To use a different model, enable that model on the 'Model Access' page as well.
- Then update the "Model_ID" environment variable of the AWS Lambda function (BedrockIDPFunction) created by CDK to the new model's ID.
- Navigate to your source Amazon S3 bucket created by CDK.
- Upload the documents you want to extract information from.
Important Note:
- Make sure you upload documents of the same type so that the extracted data matches your expectations. Example: all the documents are images of driver's licenses.
- Both the Amazon S3 buckets and the Amazon DynamoDB table where data is stored are encrypted by this solution.
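The "Model_ID" environment variable on BedrockIDPFunction selects which Bedrock model is called. As a minimal sketch of how such a lookup might behave (the class and method names are hypothetical, and the fallback anthropic.claude-3-5-sonnet-20241022-v2:0 is assumed to be the Bedrock model ID for Claude 3.5 Sonnet v2; confirm it in the Bedrock console, since some regions require an inference profile ID instead):

```java
import java.util.Map;

/** Hypothetical helper showing how a Lambda could pick its Bedrock model ID. */
final class ModelConfig {

    /** Assumed Bedrock model ID for Anthropic Claude 3.5 Sonnet v2; verify it for your region. */
    static final String DEFAULT_MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0";

    private ModelConfig() {}

    /** Returns Model_ID from the environment, or the default when it is unset or blank. */
    static String resolveModelId(Map<String, String> env) {
        String configured = env.get("Model_ID");
        return (configured == null || configured.isBlank()) ? DEFAULT_MODEL_ID : configured.trim();
    }
}
```

Inside a function this would be called with System.getenv(). Editing the variable in the Lambda console affects later invocations, but a subsequent cdk deploy may reset it to whatever value the CDK code defines.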
Navigate to AWS Systems Manager -> Parameter Store as shown below:
- Click 'Edit' and add or update the extraction prompt sent to the LLM.
- This prompt will be used for data extraction from your documents.
Note: Make sure your prompt explicitly asks the model to return a valid JSON document with no filler text in the response.
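Even with a well-written prompt, a model may occasionally wrap its JSON in extra text. A minimal, hypothetical safeguard (not necessarily what this project's Lambda does) is to keep only the span from the first opening brace to the last closing brace before parsing; the class and method names below are illustrative:

```java
/** Hypothetical defensive step that isolates the JSON object in a model response. */
final class JsonResponseExtractor {

    private JsonResponseExtractor() {}

    /**
     * Returns the text from the first '{' to the last '}' inclusive.
     * This only trims surrounding filler; it does not check that the result is valid JSON.
     */
    static String extractJsonObject(String response) {
        int start = response.indexOf('{');
        int end = response.lastIndexOf('}');
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("No JSON object found in model response");
        }
        return response.substring(start, end + 1);
    }
}
```

This is a fallback rather than a substitute for the prompt instruction above: the trimmed text should still go through a real JSON parser, and a response with no braces at all is rejected outright.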
Below are the details of the BedrockIDP-Table created by CDK:
- It stores the information extracted from the documents in the source S3 bucket.
| Attribute Name | Attribute Type | Key Type |
| --- | --- | --- |
| fileName | String | Primary Key |
Note: All other attributes are created from the extracted data.
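Because only fileName is declared up front, each extracted field becomes its own attribute on the item. A hypothetical sketch of that mapping, using plain string values for simplicity (the real function may store other DynamoDB types, and the class and method names are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical mapping from extracted fields to a BedrockIDP-Table item keyed by fileName. */
final class ItemBuilder {

    private ItemBuilder() {}

    /** Builds an item whose key attribute is fileName and whose other attributes come from the extracted data. */
    static Map<String, String> toItem(String fileName, Map<String, String> extracted) {
        Map<String, String> item = new LinkedHashMap<>();
        item.put("fileName", fileName); // key attribute from the table definition
        for (Map.Entry<String, String> field : extracted.entrySet()) {
            // Assumption: an extracted field that is also called "fileName" must not overwrite the key.
            if (!field.getKey().equals("fileName")) {
                item.put(field.getKey(), field.getValue());
            }
        }
        return item;
    }
}
```

Skipping a colliding fileName field is a design choice of this sketch: it guarantees that each document maps to exactly one item, so re-processing the same file overwrites its previous item instead of creating a new one.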
This Step Functions state machine extracts data from each file in the source S3 bucket into the DynamoDB table created by CDK.
Navigate to the Step Functions console and open the state machine created by CDK.
Click on the state machine name, then choose View details -> Start Execution.
Wait until the execution completes successfully, as shown below.
After a successful execution, the data extracted from all files in the source S3 bucket is stored in the following locations:
- The DynamoDB table created by CDK.
- JSON files in the destination S3 bucket created by CDK.
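The JSON file written to the destination bucket needs correct escaping even when extracted values contain quotes or line breaks. A hypothetical sketch for a flat map of string fields, with an assumed "<fileName>.json" naming convention (the project's actual object keys and document layout may differ):

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical serializer for the per-file JSON document written to the destination bucket. */
final class DestinationDocument {

    private DestinationDocument() {}

    /** Assumed naming convention: the source file name with ".json" appended. */
    static String destinationKey(String fileName) {
        return fileName + ".json";
    }

    /** Serializes a flat map of string fields as a JSON object. */
    static String toJson(Map<String, String> fields) {
        StringJoiner json = new StringJoiner(",", "{", "}"); // an empty map yields "{}"
        for (Map.Entry<String, String> field : fields.entrySet()) {
            json.add(quote(field.getKey()) + ":" + quote(field.getValue()));
        }
        return json.toString();
    }

    /** Wraps a value in double quotes, escaping quotes, backslashes, and control characters. */
    private static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"') {
                out.append("\\\"");
            } else if (c == '\\') {
                out.append("\\\\");
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c)); // e.g. newlines and tabs
            } else {
                out.append(c);
            }
        }
        return out.append('"').toString();
    }
}
```

In practice a JSON library would usually handle this; the hand-written escaping here only keeps the sketch free of external dependencies.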
Navigate to the Amazon DynamoDB console, select the table whose name ends with '-Contacts', and choose Items to verify the import.
Run the following command from the Infra directory to delete the application.
cdk destroy
This will delete all the provisioned resources from your AWS account.