This project demonstrates how to build a cost-effective Retrieval-Augmented Generation (RAG) solution using Amazon DynamoDB as a vector store for small use cases, enabling small businesses to implement AI personalization without the high costs typically associated with specialized vector databases.

aws-solutions-library-samples/guidance-for-low-cost-semantic-search-on-aws

Guidance for Low-Cost Semantic Search on AWS

Table of Contents

  1. Overview
  2. Prerequisites
  3. Deployment Steps
  4. Deployment Validation
  5. Running the Guidance
  6. Next Steps
  7. Cleanup
  8. FAQ, known issues, additional considerations, and limitations
  9. Notices
  10. Authors

Overview

Creating RAG architectures tends to be cost-prohibitive for small and medium businesses given the relatively high cost of vector databases. This solution aims to provide a cost-effective vector store, taking into account the smaller data sizes typically used for RAG in small and medium businesses.

Architecture Diagram

[Architecture diagram]

Cost

You are responsible for the cost of the AWS services used while running this Guidance. As of February 2025, the cost of running this Guidance with the default settings in US East (N. Virginia) is approximately $29.26 per month for processing and querying (200 PDF docs, 200KB average file size (10 pages), 6 queries per hour).

We recommend creating a Budget through AWS Cost Explorer to help manage costs. Prices are subject to change. For full details, refer to the pricing webpage for each AWS service used in this Guidance.

Sample Cost Table

The following scenarios cover a range of document volumes and query rates, providing examples of how to estimate costs for this Guidance. Each scenario lists the assumptions behind the estimate: document count, average file size, and queries per hour.

  • Scenario 1: 100 PDF docs, 200KB average file size (10 pages), 6 queries per hour
  • Scenario 2: 200 PDF docs, 200KB average file size (10 pages), 6 queries per hour
  • Scenario 3: 300 PDF docs, 200KB average file size (10 pages), 6 queries per hour
  • Scenario 4: 400 PDF docs, 200KB average file size (10 pages), 10 queries per hour

The following table provides a sample cost breakdown for deploying this Guidance with the default parameters in the US East (N. Virginia) Region for one month.

| Service                        | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 |
|--------------------------------|------------|------------|------------|------------|
| Storage (S3)                   | $0.03      | $0.03      | $0.01      | $0.25      |
| Compute (Lambda & API Gateway) | $0.00      | $0.00      | $0.00      | $0.63      |
| Amazon DynamoDB r/w            | $4.96      | $9.89      | $14.81     | $32.88     |
| Amazon Textract*               | $1.50      | $3.00      | $4.50      | $6.00      |
| Amazon Bedrock Embedding       | $2.00      | $4.00      | $6.00      | $8.00      |
| Amazon Bedrock Claude          | $7.14      | $7.14      | $7.14      | $11.90     |
| AWS WAF                        | $8.10      | $8.10      | $8.10      | $8.10      |
| AWS Step Functions*            | $0.05      | $0.10      | $0.14      | $0.19      |
| Total                          | $23.78     | $29.26     | $40.70     | $67.95     |

* AWS Step Functions and Amazon Textract are billed only during the document ingestion process, so they are not recurring monthly costs.

Prerequisites

Operating System

This project uses the AWS CDK for deployment, and you will need a container engine: Docker, Podman, or Finch. Follow the installation guide for your OS.

If you are using a container engine other than Docker, add export CDK_DOCKER=$(which finch) or export CDK_DOCKER=$(which podman) to your shell profile.
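As a sketch, the export can be made conditional so the same profile works on machines with either engine; the detection order (Finch first, then Podman) is an illustrative choice, not a requirement:

```shell
# Append to your shell profile (~/.bashrc, ~/.zshrc, ...) so every cdk
# invocation uses the installed container engine instead of Docker.
# Detection order (finch, then podman) is an illustrative assumption.
if command -v finch >/dev/null 2>&1; then
  export CDK_DOCKER="$(command -v finch)"
elif command -v podman >/dev/null 2>&1; then
  export CDK_DOCKER="$(command -v podman)"
fi
```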

If you don't have the CDK installed, follow the official AWS documentation: Prerequisites and Getting Started.

Ensure that your deployment server/computer has credentials with enough permissions to deploy the solution; one option is configuring them in ~/.aws/config:

[default]
region = us-east-1
output = json

And your ~/.aws/credentials

[default]
aws_access_key_id = XXXXXXXXXXXXXXX
aws_secret_access_key = XXXXXX/XXXXXXXXXXXXXXXXX
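As a quick sanity check, the structure of the two files above can be validated with the Python standard library alone. This is an illustrative sketch (the validate_aws_profile helper is hypothetical, not part of the solution) showing which fields the Guidance relies on:

```python
# Sketch: verify an AWS config/credentials pair defines the fields this
# Guidance needs. Uses only the standard library; names are hypothetical.
import configparser

def validate_aws_profile(config_text: str, credentials_text: str,
                         profile: str = "default") -> bool:
    """Return True when the profile has a region and both key fields."""
    cfg = configparser.ConfigParser()
    cfg.read_string(config_text)
    creds = configparser.ConfigParser()
    creds.read_string(credentials_text)
    return (
        cfg.has_option(profile, "region")
        and creds.has_option(profile, "aws_access_key_id")
        and creds.has_option(profile, "aws_secret_access_key")
    )
```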

AWS account requirements

This solution requires access to Amazon Bedrock, specifically the Titan Text Embeddings model and an Anthropic Claude model.

Below we configure model access in Amazon Bedrock. Amazon Bedrock provides a variety of foundation models from several providers.

Follow these instructions to set up Amazon Bedrock:

  1. Find Amazon Bedrock by searching in the AWS console.

  2. Expand the side menu.

  3. From the side menu, select Model access. Select the Manage model access button.

  4. Use the checkboxes to select the models you wish to enable. If running from your own account, there is no cost to activate the models - you only pay for what you use. Review the applicable EULAs as needed. Select the following models:

  • Claude 3 Haiku Model
  • Titan Text Embeddings V2

  5. Click Request model access to activate the models in your account.

  6. Monitor the model access status. It may take a few minutes for the models to reach Access granted status. You can use the Refresh button to periodically check for updates.

  7. Verify that the model access status is Access granted for the selected Anthropic Claude model and Titan embeddings.

AWS cdk bootstrap

This Guidance uses aws-cdk. If you are using aws-cdk for the first time, please perform the bootstrapping steps below:

  • If you are using a virtual environment, activate it for your OS (e.g., pyenv activate environment) and install the pip libraries so the cdk deploy command can be used

  • Navigate into the CDK directory. cd source/cdk

  • Install python libraries pip install -r requirements.txt

  • Bootstrap your environment. cdk bootstrap

Service limits

Keep in mind that all Bedrock models have service limits that affect whether a user's InvokeModel operations are throttled; for up-to-date values, refer to Bedrock Limits.

  • On-demand InvokeModel requests per minute for Anthropic Claude 3 Haiku (us-east-1): 1,000
  • On-demand InvokeModel tokens per minute for Anthropic Claude 3 Haiku: 2,000,000

Supported Regions

This Guidance is built for the us-east-1 Region.

Deployment Steps

  • Be sure you have completed all the steps in Prerequisites

  • Navigate to the CDK directory: cd source/cdk

  • Activate the environment: pyenv activate environment

  • Review the CloudFormation template: cdk synth

  • Deploy the stack: cdk deploy

  • If you want to enable self user sign-up (not recommended), use the following deploy command instead: cdk deploy -c selfSignup=True

Deployment Validation

  • Open the CloudFormation console and verify the status of the template with the name starting with ChatbotStack.
  • If the deployment was successful, you will see a CDK output named ChatbotStack.AdminPortal.

Running the Guidance

User Creation

You will need to create a user in Amazon Cognito. If you deployed with self sign-up enabled, navigate to the CloudFront distribution; you will be redirected to the Amazon Cognito Hosted UI for user creation.

If you deployed WITHOUT -c selfSignup=True, you need to create a user directly in the Amazon Cognito API/console and add them to the default group that is pre-created.

Document Ingestion Workflow

  1. User uploads documents to the portal
  2. Through an API, raw docs are stored on Amazon S3
  3. The upload triggers the ingestion state machine on Step Functions. Documents are processed by Textract and Claude to extract text; Textract and Claude results are stored on S3 in raw format and as JSON
  4. A Lambda function splits the text into chunks and converts each chunk to a vector with the Titan Embeddings model using the Bedrock APIs
  5. According to chunk size, vectors are stored in separate DynamoDB tables
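Steps 4 and 5 above can be sketched in plain Python. The table names, size buckets, and helper functions below are hypothetical illustrations, not the deployed code; the key detail is that DynamoDB has no native float type, so vector components are stored as Decimals:

```python
# Sketch of chunk -> vector -> DynamoDB routing. Table names and size
# buckets are illustrative assumptions, not the deployed configuration.
from decimal import Decimal

SIZE_TABLES = {  # hypothetical chunk-size (chars) -> table-name mapping
    512: "AIbotVectors512",
    1024: "AIbotVectors1024",
}

def table_for_chunk(text: str) -> str:
    """Pick the smallest size bucket that fits the chunk."""
    for size in sorted(SIZE_TABLES):
        if len(text) <= size:
            return SIZE_TABLES[size]
    return SIZE_TABLES[max(SIZE_TABLES)]

def to_dynamo_item(chunk_id: str, text: str, vector: list[float]) -> dict:
    """DynamoDB stores no floats, so embedding components become Decimals."""
    return {
        "chunk_id": chunk_id,
        "text": text,
        "vector": [Decimal(str(v)) for v in vector],
    }
```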

Config, Prompting and Inference Workflow

  1. User selects to either query the Textract or LLM vector store and updates prompt on the portal
  2. Through an API, the user query is sent to a Lambda Function
  3. User prompt is converted to vectors using Titan Embeddings model using Bedrock APIs
  4. Lambda retrieves vectors with semantic similarities from the DynamoDB tables
  5. User prompt + context (vectors with semantic similarities) are sent to the Claude 3 Haiku model using Bedrock APIs
  6. Response from Bedrock model is returned to user
  7. Conversation History is stored on a DynamoDB table for context
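The retrieval in steps 3-5 can be sketched as brute-force cosine similarity over the stored vectors, which is what makes DynamoDB viable for small corpora. This is a pure-Python illustration with hypothetical function names; the deployed Lambda additionally paginates reads (DYNAMO_PAGE_SIZE) and filters matches by a threshold (TOLERANCE):

```python
# Sketch of semantic retrieval: score every stored vector against the
# query embedding and return the best-matching chunks.
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def top_k(query_vec, items, k=3, tolerance=0.0):
    """items: list of (chunk_text, vector); keep scores >= tolerance."""
    scored = [(cosine_similarity(query_vec, v), t) for t, v in items]
    scored = [(s, t) for s, t in scored if s >= tolerance]
    scored.sort(reverse=True)
    return [t for _, t in scored[:k]]
```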

Next Steps

Multiple configurations may be applied to the process using Lambda environment variables:

AIBotDockerLambda(prediction_lambda)

  • DYNAMO_PAGE_SIZE
  • TOLERANCE
  • EMBEDDING_MODEL_ID
  • MODEL_ID

StoreChunkDynamo(step4)

  • CHUNK_SIZE
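A sketch of how a Lambda handler might read these variables. The default values shown are illustrative assumptions, not the deployed defaults (the two model IDs are current Bedrock identifiers for Titan Text Embeddings V2 and Claude 3 Haiku):

```python
# Sketch: reading the tuning knobs above with illustrative fallbacks.
import os

DYNAMO_PAGE_SIZE = int(os.environ.get("DYNAMO_PAGE_SIZE", "100"))
TOLERANCE = float(os.environ.get("TOLERANCE", "0.5"))
EMBEDDING_MODEL_ID = os.environ.get(
    "EMBEDDING_MODEL_ID", "amazon.titan-embed-text-v2:0")
MODEL_ID = os.environ.get(
    "MODEL_ID", "anthropic.claude-3-haiku-20240307-v1:0")
CHUNK_SIZE = int(os.environ.get("CHUNK_SIZE", "1024"))
```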

You can also integrate this Guidance into other applications or systems using the pre-provided Amazon Lex bot deployment.

Cleanup

  • Navigate into the CDK directory. cd source/cdk
  • Destroy the stack: cdk destroy

FAQ, known issues, additional considerations, and limitations

FAQ

Why does the first query fail?

Since the AI agent runs on an AWS Lambda container, the first execution always takes more than 30 seconds due to the cold start, which causes an Amazon API Gateway timeout. If the user tries again, everything should work correctly.

Is my file already processed?

This solution uses two AWS Step Functions state machines to process uploaded documents. To verify the status of a document, log in to the AWS console and look up the state machines named AIbotSMLLMParser and AIbotSM. In the Executions tab of each state machine you will find the status of each document.

Is my file already deleted?

This solution uses an AWS Step Functions state machine to delete uploaded documents. To verify the status of a deletion, log in to the AWS console and look up the state machine named AIbotSMDeletion. In the Executions tab of the state machine you will find the status of each document.

CognitoGroupNotFound error?

This means the Cognito user group is not assigned to the user making the requests; refer to User Creation. If you have already assigned the group to the user and you are still receiving this error, try using the Logout button and authenticating again.

Notices

Customers are responsible for making their own independent assessment of the information in this Guidance. This Guidance: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. AWS responsibilities and liabilities to its customers are controlled by AWS agreements, and this Guidance is not part of, nor does it modify, any agreement between AWS and its customers.

Authors

| Name                   | Contact   | LinkedIn         |
|------------------------|-----------|------------------|
| Gabriel Montero        | gasamoma@ | LinkedIn Profile |
| Jose Miguel Gomez      | jmgomez@  | LinkedIn Profile |
| Rafael Hernando Franco | raffran@  | LinkedIn Profile |
| Camilo Cortes          | hercamil@ | LinkedIn Profile |
| Daniela Rojas          | lzanda@   | LinkedIn Profile |
