aws-samples/sample-amazon-bedrock-idp-java-cdk

Amazon Bedrock IDP using Java CDK

This project demonstrates how to use Amazon Bedrock to extract data from documents and images using your LLM of choice. The infrastructure is defined using AWS CDK with Java.

Architecture

Architecture Diagram

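The AWS services named in this guide are Amazon Bedrock, Amazon S3, AWS Lambda, AWS Step Functions, Amazon DynamoDB, AWS Systems Manager Parameter Store, and AWS CloudFormation (through CDK).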

Deploy using AWS CDK

Pre-requisites

Before getting started, make sure you have the following:

  • AWS Account

  • Java Development Kit (JDK) installed on your local machine

    • Java 21 or later. If it is missing, install Amazon Corretto 21.

      java --version
    • Maven 3.9 or later. If it is missing, install Maven.

    • Note: The Java version shown in the output of the command below should be 21 or later.

      mvn --version
  • AWS CLI configured with valid credentials

    • AWS CLI. If it is missing, install the latest AWS CLI.
      aws --version
  • Node.js and npm installed (required for CDK)

    • Node.js 22.x or later. If it is missing, install Node.js.
      node --version
  • AWS CDK - Install the latest AWS CDK Toolkit globally using the following command:

    npm install -g aws-cdk
    cdk --version
    • CDK Bootstrap - Bootstrap your AWS account for CDK. This only needs to be done once per account/region.
      cdk bootstrap aws://<account>/<region>
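
The bootstrap target has the form aws://<account>/<region>. As a small sketch of how that string is put together, the helper below builds it from an account ID and a region. The class and method names are hypothetical and not part of this project; the sample account ID is a placeholder. The only check it relies on is that AWS account IDs are 12 digits.

```java
public class BootstrapTarget {

    // Builds the "aws://<account>/<region>" argument expected by `cdk bootstrap`.
    // Hypothetical helper for illustration; not part of this repository.
    static String bootstrapTarget(String account, String region) {
        if (account == null || !account.matches("\\d{12}")) {
            throw new IllegalArgumentException("AWS account IDs are 12 digits: " + account);
        }
        if (region == null || region.isBlank()) {
            throw new IllegalArgumentException("A region is required, for example us-west-2");
        }
        return "aws://" + account + "/" + region.trim();
    }

    public static void main(String[] args) {
        // Placeholder account ID; replace with your own.
        System.out.println("cdk bootstrap " + bootstrapTarget("123456789012", "us-west-2"));
    }
}
```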

Installation

Clone this repository and navigate to the project directory.

git clone https://<Repo-Url>.git
cd sample-amazon-bedrock-idp-java-cdk

Build

Run the following build command from the root directory of the project.

mvn clean install

Deployment

Change to the Infra directory of the project.

cd Infra

CDK deployment:

Run the following command to deploy the application. Note: CDK asks for your approval before deploying.

cdk deploy

To skip the approval step, use the following command instead.

cdk deploy --require-approval never

Verify

Make sure you are in the right AWS account and region.

AWS CloudFormation creates resources similar to those shown below.
Note: Not all the resources are shown in the screenshot. AWSCloudformation_Resources.png

Bedrock IDP Process

Step 1: Enable the Bedrock Model Access

Navigate to Amazon Bedrock -> Model Access and enable the model as shown below: ModelAccess.png

  1. Make sure you are in the right AWS region. Example: us-west-2
  2. Enable specific models or all models.
    1. The default model used in this sample is "Anthropic Claude 3.5 Sonnet v2".
    2. Make sure this model is enabled.
  3. To use a different model, enable it here in 'Model Access'.
    1. Then update the "Model_ID" environment variable on the AWS Lambda function (BedrockIDPFunction) created by CDK.
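
To illustrate how a function might pick up the "Model_ID" setting, here is a minimal sketch that reads the variable and falls back to a default when it is unset. This is not the code of BedrockIDPFunction; the class and method names are hypothetical, and the default model ID string is an assumption about the Bedrock identifier for Claude 3.5 Sonnet v2, so verify it in the Bedrock console before relying on it.

```java
import java.util.Map;

public class ModelIdResolver {

    // Assumed Bedrock model ID for "Anthropic Claude 3.5 Sonnet v2"; verify in the Bedrock console.
    static final String DEFAULT_MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0";

    // Returns the "Model_ID" value from the given environment map,
    // or the default when the variable is missing or blank.
    static String resolveModelId(Map<String, String> env) {
        String configured = env.get("Model_ID");
        if (configured == null || configured.isBlank()) {
            return DEFAULT_MODEL_ID;
        }
        return configured.trim();
    }

    public static void main(String[] args) {
        // In a Lambda function the environment comes from System.getenv().
        System.out.println(resolveModelId(System.getenv()));
    }
}
```

Passing the environment in as a map keeps the lookup easy to test without changing real environment variables.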

Step 2: Upload your documents to S3 Bucket

  1. Navigate to your source Amazon S3 bucket created by CDK.
  2. Upload the documents from which you want to extract information.

Important Note:

  1. Upload documents of the same type so that the extracted data matches your expectations.
    Example: All the documents are images of driver's licenses.
  2. This solution encrypts the data it stores in both Amazon S3 and Amazon DynamoDB.

Step 3: Update your extraction prompt in Parameter Store created by CDK

Navigate to AWS Systems Manager -> Parameter Store as shown below:

ExtractionPromptSSM.png

  1. Click 'Edit' and add or update your extraction prompt for the LLM.
  2. This prompt is used to extract data from your documents.

Note: Make sure your prompt instructs the model to return only a valid JSON document, with no filler text in the response.
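
Even with a careful prompt, a model may occasionally wrap its JSON in extra text. The following is a minimal sketch of a defensive cleanup step, assuming the response contains a single JSON object. The class and method names are hypothetical and are not taken from this project's Lambda code. The method only trims surrounding text; it does not validate that the result is well-formed JSON.

```java
public class JsonResponseCleaner {

    // Returns the text from the first '{' to the last '}' in the response,
    // or null when no such span exists. This does not parse or validate JSON.
    static String extractJsonObject(String response) {
        if (response == null) {
            return null;
        }
        int start = response.indexOf('{');
        int end = response.lastIndexOf('}');
        if (start < 0 || end < start) {
            return null;
        }
        return response.substring(start, end + 1);
    }

    public static void main(String[] args) {
        // Made-up model reply with filler text before and after the JSON object.
        String reply = "Here is the data:\n{\"name\": \"Jane Doe\"}\nLet me know if you need more.";
        System.out.println(extractJsonObject(reply));
    }
}
```

A stricter setup would parse the extracted text with a JSON library and reject anything that fails to parse.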

Step 4: Confirm the DynamoDB Table created by CDK

Below are the details of the BedrockIDP-Table created by CDK.

  • This table stores the information extracted from the documents in the source S3 bucket.

    | Attribute Name | Attribute Type | Key Type    |
    |----------------|----------------|-------------|
    | fileName       | String         | Primary Key |

Note: Other attributes are created from the extracted data.
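
To make the item shape concrete, here is a minimal sketch of how extracted fields could be combined with the fileName key into one record. It uses plain Java maps rather than the DynamoDB SDK, treats every extracted value as a string, and the class name, method name, and sample fields are hypothetical. The actual function in this project may build its items differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ItemBuilder {

    // Builds one record keyed by fileName, followed by the extracted fields.
    // An extracted field that happens to be called "fileName" is ignored so it
    // cannot overwrite the key.
    static Map<String, String> buildItem(String fileName, Map<String, String> extracted) {
        if (fileName == null || fileName.isBlank()) {
            throw new IllegalArgumentException("fileName is required");
        }
        Map<String, String> item = new LinkedHashMap<>();
        item.put("fileName", fileName);
        for (Map.Entry<String, String> field : extracted.entrySet()) {
            if (!"fileName".equals(field.getKey())) {
                item.put(field.getKey(), field.getValue());
            }
        }
        return item;
    }

    public static void main(String[] args) {
        // Made-up extraction result for a driver's license image.
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("name", "Jane Doe");
        extracted.put("licenseNumber", "X0000000");
        System.out.println(buildItem("license1.png", extracted));
    }
}
```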

Step 5: Execute the AWS Step Function for BedrockIDP Process

This Step Function extracts data from each file in the source S3 bucket and writes it to the DynamoDB table created by CDK.

Navigate to the Step Functions Console and click on the Step Function created by CDK. AWSStepFunctions.png

Click on the Step Function name and click on View details -> Start Execution. AWSStepFunctionsStartExecution.png

Wait until the execution completes successfully, as shown below: AWSStepFunctionsExecutionSuccess.png

After a successful execution, the data from all the files in the source S3 bucket is extracted and written to the following:

  1. The DynamoDB table created by CDK.
  2. A JSON file in the destination S3 bucket created by CDK.

To verify the import, navigate to the Amazon DynamoDB Console, click the table created by CDK (the BedrockIDP-Table from Step 4), and then click Items.

Cleanup

Run the following command to delete the application.

cdk destroy

This will delete all the provisioned resources from your AWS account.
