This project demonstrates how to build a cost-effective Retrieval-Augmented Generation (RAG) solution using Amazon DynamoDB as a vector store for small use cases, enabling small businesses to implement AI personalization without the high costs typically associated with specialized vector databases.

aws-solutions-library-samples/guidance-for-low-cost-semantic-search-on-aws

Guidance for Low-Cost Semantic Search on AWS

Table of Contents

  1. Overview
  2. Prerequisites
  3. Deployment Steps
  4. Deployment Validation
  5. Running the Guidance
  6. Next Steps
  7. Cleanup
  8. FAQ, known issues, additional considerations, and limitations
  9. Notices
  10. Authors

Overview

Creating RAG architectures tends to be cost-prohibitive for small and medium businesses given the relatively high cost of vector databases. This solution aims to provide a cost-effective vector store, taking into account the smaller data sizes typically used for RAG in small and medium businesses.

Architecture Diagram

[Architecture diagram]

Cost

You are responsible for the cost of the AWS services used while running this Guidance. As of February 2025, the cost of running this Guidance with the default settings in US East (N. Virginia) is approximately $29.26 per month for processing and querying (200 PDF docs, 200KB average file size (10 pages), 6 queries per hour).

We recommend creating a Budget through AWS Cost Explorer to help manage costs. Prices are subject to change. For full details, refer to the pricing webpage for each AWS service used in this Guidance.

Sample Cost Table

The following scenarios cover a range of document volumes and query rates, providing examples of how to estimate costs for this Guidance. Each scenario lists the assumptions behind the estimate: document count, average file size, and queries per hour.

  • Scenario 1: 100 PDF docs, 200KB average file size (10 pages), 6 queries per hour
  • Scenario 2: 200 PDF docs, 200KB average file size (10 pages), 6 queries per hour
  • Scenario 3: 300 PDF docs, 200KB average file size (10 pages), 6 queries per hour
  • Scenario 4: 400 PDF docs, 200KB average file size (10 pages), 10 queries per hour

The following table provides a sample cost breakdown for deploying this Guidance with the default parameters in the US East (N. Virginia) Region for one month.

| Service                        | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 |
|--------------------------------|------------|------------|------------|------------|
| Storage (S3)                   | $0.03      | $0.03      | $0.01      | $0.25      |
| Compute (Lambda & API Gateway) | $0.00      | $0.00      | $0.00      | $0.63      |
| Amazon DynamoDB r/w            | $4.96      | $9.89      | $14.81     | $32.88     |
| Amazon Textract*               | $1.50      | $3.00      | $4.50      | $6.00      |
| Amazon Bedrock Embedding       | $2.00      | $4.00      | $6.00      | $8.00      |
| Amazon Bedrock Claude          | $7.14      | $7.14      | $7.14      | $11.90     |
| AWS WAF                        | $8.10      | $8.10      | $8.10      | $8.10      |
| AWS Step Functions*            | $0.05      | $0.10      | $0.14      | $0.19      |
| Total                          | $23.78     | $29.26     | $40.70     | $67.95     |

* AWS Step Functions and Amazon Textract are billed only during the document ingestion process, so they are not recurring monthly costs.

Prerequisites

Operating System

This project uses the AWS CDK for deployment, and you will need a container engine: Docker, Podman, or Finch. Follow the installation guide for your OS.

If you are using a container engine other than Docker, add export CDK_DOCKER=$(which finch) or export CDK_DOCKER=$(which podman) to your shell profile.
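As a sketch, the export can be made conditional so the same profile works on machines with either engine; the detection order (Finch first, then Podman) is an illustrative choice, not a requirement:

```shell
# Append to your shell profile (~/.bashrc, ~/.zshrc, ...) so every cdk
# invocation uses the installed container engine instead of Docker.
# Detection order (finch, then podman) is an illustrative assumption.
if command -v finch >/dev/null 2>&1; then
  export CDK_DOCKER="$(command -v finch)"
elif command -v podman >/dev/null 2>&1; then
  export CDK_DOCKER="$(command -v podman)"
fi
```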

If you don't have the CDK installed, follow the official AWS documentation: Prerequisites and Getting Started.

Ensure that your deployment server/computer has credentials with enough permissions to deploy the solution; one option is configuring them in ~/.aws/config:

[default]
region = us-east-1
output = json

And your ~/.aws/credentials

[default]
aws_access_key_id = XXXXXXXXXXXXXXX
aws_secret_access_key = XXXXXX/XXXXXXXXXXXXXXXXX
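As a quick sanity check, the structure of the two files above can be validated with the Python standard library alone. This is an illustrative sketch (the validate_aws_profile helper is hypothetical, not part of the solution) showing which fields the Guidance relies on:

```python
# Sketch: verify an AWS config/credentials pair defines the fields this
# Guidance needs. Uses only the standard library; names are hypothetical.
import configparser

def validate_aws_profile(config_text: str, credentials_text: str,
                         profile: str = "default") -> bool:
    """Return True when the profile has a region and both key fields."""
    cfg = configparser.ConfigParser()
    cfg.read_string(config_text)
    creds = configparser.ConfigParser()
    creds.read_string(credentials_text)
    return (
        cfg.has_option(profile, "region")
        and creds.has_option(profile, "aws_access_key_id")
        and creds.has_option(profile, "aws_secret_access_key")
    )
```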

AWS account requirements

This solution requires access to Amazon Bedrock, specifically the Titan Text Embeddings model and an Anthropic Claude model.

Below we configure model access in Amazon Bedrock. Amazon Bedrock provides a variety of foundation models from several providers.

Follow these instructions to set up Amazon Bedrock:

  1. Find Amazon Bedrock by searching in the AWS console.

  2. Expand the side menu.

  3. From the side menu, select Model access. Select the Manage model access button.

  4. Use the checkboxes to select the models you wish to enable. If running from your own account, there is no cost to activate the models - you only pay for what you use. Review the applicable EULAs as needed. Select the following models:

  • Claude 3 Haiku Model
  • Titan Text Embeddings V2

  5. Click Request model access to activate the models in your account.

  6. Monitor the model access status. It may take a few minutes for the models to reach Access granted status. You can use the Refresh button to periodically check for updates.

  7. Verify that the model access status is Access granted for the selected Anthropic Claude model and Titan embeddings.

AWS cdk bootstrap

This Guidance uses aws-cdk. If you are using aws-cdk for the first time, please perform the bootstrapping steps below:

  • If you are using a virtual environment, activate it for your OS (e.g., pyenv activate environment) and install the pip libraries so the cdk deploy command can be used

  • Navigate into the CDK directory. cd source/cdk

  • Install python libraries pip install -r requirements.txt

  • Bootstrap your environment. cdk bootstrap

Service limits

Keep in mind that all Bedrock models have service limits that affect whether a user's InvokeModel operations are throttled; for up-to-date values, refer to Bedrock Limits.

  • On-demand InvokeModel requests per minute for Anthropic Claude 3 Haiku (us-east-1): 1,000
  • On-demand InvokeModel tokens per minute for Anthropic Claude 3 Haiku: 2,000,000

Supported Regions

This Guidance is built for the us-east-1 Region.

Deployment Steps

  • Be sure you have completed all the steps in Prerequisites

  • Navigate to the CDK directory: cd source/cdk

  • Activate the environment: pyenv activate environment

  • Review the CloudFormation template: cdk synth

  • Deploy the stack: cdk deploy

  • If you want to enable self user sign-up (not recommended), use the following deploy command instead: cdk deploy -c selfSignup=True

Deployment Validation

  • Open the CloudFormation console and verify the status of the template with the name starting with ChatbotStack.
  • If the deployment was successful, you will see a CDK output named ChatbotStack.AdminPortal.

Running the Guidance

User Creation

You will need to create a user in Amazon Cognito. If you deployed with self sign-up enabled, navigate to the CloudFront distribution; you will be redirected to the Amazon Cognito Hosted UI for user creation.

If you deployed WITHOUT -c selfSignup=True, you need to create a user directly in the Amazon Cognito API/console and add them to the default group that is pre-created.

Document Ingestion Workflow

  1. User uploads documents to the portal
  2. Through an API, raw docs are stored on Amazon S3
  3. The upload triggers the ingestion state machine on Step Functions. Documents are processed by Textract and Claude to extract text; Textract and Claude results are stored on S3 in raw format and as JSON
  4. A Lambda function splits the text into chunks and converts each chunk to a vector with the Titan Embeddings model using the Bedrock APIs
  5. According to chunk size, vectors are stored in separate DynamoDB tables
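Steps 4 and 5 above can be sketched in plain Python. The table names, size buckets, and helper functions below are hypothetical illustrations, not the deployed code; the key detail is that DynamoDB has no native float type, so vector components are stored as Decimals:

```python
# Sketch of chunk -> vector -> DynamoDB routing. Table names and size
# buckets are illustrative assumptions, not the deployed configuration.
from decimal import Decimal

SIZE_TABLES = {  # hypothetical chunk-size (chars) -> table-name mapping
    512: "AIbotVectors512",
    1024: "AIbotVectors1024",
}

def table_for_chunk(text: str) -> str:
    """Pick the smallest size bucket that fits the chunk."""
    for size in sorted(SIZE_TABLES):
        if len(text) <= size:
            return SIZE_TABLES[size]
    return SIZE_TABLES[max(SIZE_TABLES)]

def to_dynamo_item(chunk_id: str, text: str, vector: list[float]) -> dict:
    """DynamoDB stores no floats, so embedding components become Decimals."""
    return {
        "chunk_id": chunk_id,
        "text": text,
        "vector": [Decimal(str(v)) for v in vector],
    }
```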

Config, Prompting and Inference Workflow

  1. User selects to either query the Textract or LLM vector store and updates prompt on the portal
  2. Through an API, the user query is sent to a Lambda Function
  3. User prompt is converted to vectors using Titan Embeddings model using Bedrock APIs
  4. Lambda retrieves vectors with semantic similarities from the DynamoDB tables
  5. User prompt + context (vectors with semantic similarities) are sent to the Claude 3 Haiku model using Bedrock APIs
  6. Response from Bedrock model is returned to user
  7. Conversation History is stored on a DynamoDB table for context
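The retrieval in steps 3-5 can be sketched as brute-force cosine similarity over the stored vectors, which is what makes DynamoDB viable for small corpora. This is a pure-Python illustration with hypothetical function names; the deployed Lambda additionally paginates reads (DYNAMO_PAGE_SIZE) and filters matches by a threshold (TOLERANCE):

```python
# Sketch of semantic retrieval: score every stored vector against the
# query embedding and return the best-matching chunks.
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def top_k(query_vec, items, k=3, tolerance=0.0):
    """items: list of (chunk_text, vector); keep scores >= tolerance."""
    scored = [(cosine_similarity(query_vec, v), t) for t, v in items]
    scored = [(s, t) for s, t in scored if s >= tolerance]
    scored.sort(reverse=True)
    return [t for _, t in scored[:k]]
```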

Next Steps

Multiple configurations may be applied to the process using Lambda environment variables:

AIBotDockerLambda(prediction_lambda)

  • DYNAMO_PAGE_SIZE
  • TOLERANCE
  • EMBEDDING_MODEL_ID
  • MODEL_ID

StoreChunkDynamo(step4)

  • CHUNK_SIZE
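A sketch of how a Lambda handler might read these variables. The default values shown are illustrative assumptions, not the deployed defaults (the two model IDs are current Bedrock identifiers for Titan Text Embeddings V2 and Claude 3 Haiku):

```python
# Sketch: reading the tuning knobs above with illustrative fallbacks.
import os

DYNAMO_PAGE_SIZE = int(os.environ.get("DYNAMO_PAGE_SIZE", "100"))
TOLERANCE = float(os.environ.get("TOLERANCE", "0.5"))
EMBEDDING_MODEL_ID = os.environ.get(
    "EMBEDDING_MODEL_ID", "amazon.titan-embed-text-v2:0")
MODEL_ID = os.environ.get(
    "MODEL_ID", "anthropic.claude-3-haiku-20240307-v1:0")
CHUNK_SIZE = int(os.environ.get("CHUNK_SIZE", "1024"))
```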

You can also integrate this Guidance into other applications or systems using the pre-provided Amazon Lex bot deployment.

Cleanup

  • Navigate into the CDK directory. cd source/cdk
  • Destroy the stack: cdk destroy

FAQ, known issues, additional considerations, and limitations

FAQ

Why does the first query fail?

Since the AI agent runs on an AWS Lambda container, the first execution always takes more than 30 seconds due to the cold start, which causes an Amazon API Gateway timeout. If the user tries again, everything should work correctly.

Is my file already processed?

This solution uses two AWS Step Functions state machines to process uploaded documents. To verify the status of a document, log in to the AWS console and look up the state machines named AIbotSMLLMParser and AIbotSM. In the Executions tab of each state machine you will find the status of each document.

Is my file already deleted?

This solution uses an AWS Step Functions state machine to delete uploaded documents. To verify the status of a deletion, log in to the AWS console and look up the state machine named AIbotSMDeletion. In the Executions tab of the state machine you will find the status of each document.

CognitoGroupNotFound error?

This means the Cognito user group is not assigned to the user making the requests; refer to User Creation. If you have already assigned the group to the user and you are still receiving this error, try using the Logout button and authenticating again.

Notices

Customers are responsible for making their own independent assessment of the information in this Guidance. This Guidance: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. AWS responsibilities and liabilities to its customers are controlled by AWS agreements, and this Guidance is not part of, nor does it modify, any agreement between AWS and its customers.

Authors

| Name                   | Contact   | LinkedIn         |
|------------------------|-----------|------------------|
| Gabriel Montero        | gasamoma@ | LinkedIn Profile |
| Jose Miguel Gomez      | jmgomez@  | LinkedIn Profile |
| Rafael Hernando Franco | raffran@  | LinkedIn Profile |
| Camilo Cortes          | hercamil@ | LinkedIn Profile |
| Daniela Rojas          | lzanda@   | LinkedIn Profile |
