File Processing System

Overview

This project implements a serverless file processing system using AWS Lambda, S3, and DynamoDB. The system automatically processes CSV files uploaded to an S3 bucket, extracts metadata, and stores it in a DynamoDB table. The design follows least-privilege access control and handles files up to 10 MB.

Features

  • Automatic Processing: Uploading a CSV file to the S3 bucket triggers a Lambda function.
  • Metadata Extraction: The Lambda function extracts relevant metadata (row count, column count, column names, etc.).
  • Storage in DynamoDB: The extracted metadata is stored in a DynamoDB table.
  • Minimal Permissions: Follows the principle of least privilege, ensuring secure access.
  • Fully Customizable: Can be adapted for different file formats, metadata extraction logic, and database choices.

Architecture

  1. A CSV file is uploaded to <bucket_name>.
  2. An S3 event triggers Lambda (<lambda_function_name>).
  3. Lambda extracts metadata and stores it in DynamoDB (<database_name>).
  4. (Optional) A notification mechanism can be integrated.
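
To make the flow concrete, below is a minimal handler sketch of steps 1-3. It is an illustration, not the repository's actual code: the attribute names (row_count, column_count, column_names) and the DYNAMODB_TABLE environment variable match the setup described later, but the real implementation may differ.

import csv
import os
import time
import urllib.parse

import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table(os.environ["DYNAMODB_TABLE"])

def lambda_handler(event, context):
    # Step 2: the S3 event record names the bucket and object key that fired it.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    # Step 3: read the CSV (files up to 10 MB fit comfortably in memory)
    # and derive the metadata fields.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    rows = list(csv.reader(body.splitlines()))
    header, data = rows[0], rows[1:]

    # Store the metadata under the table's composite key.
    table.put_item(Item={
        "filename": key,
        "upload_timestamp": int(time.time()),
        "row_count": len(data),
        "column_count": len(header),
        "column_names": header,
    })
    return {"statusCode": 200}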

Setup Instructions

1. IAM Role Setup

Create an IAM role (<iam_role_name>) that the Lambda function assumes at runtime. Following least privilege, grant it only what the function needs: read access to the S3 bucket, write access to the DynamoDB table, and permission to write CloudWatch Logs.
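
A boto3 sketch of the role creation follows. The trust policy lets the Lambda service assume the role, and the inline policy grants only the actions listed above; <account_id> is an extra placeholder for your AWS account ID, and the policy name is illustrative.

import json

import boto3

iam = boto3.client("iam")

# Trust policy: the Lambda service may assume this role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Least-privilege permissions: read the bucket, write the table, emit logs.
permissions = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::<bucket_name>/*"},
        {"Effect": "Allow", "Action": "dynamodb:PutItem",
         "Resource": "arn:aws:dynamodb:us-east-1:<account_id>:table/<database_name>"},
        {"Effect": "Allow",
         "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
         "Resource": "*"},
    ],
}

iam.create_role(RoleName="<iam_role_name>",
                AssumeRolePolicyDocument=json.dumps(trust_policy))
iam.put_role_policy(RoleName="<iam_role_name>",
                    PolicyName="file-processing-least-privilege",
                    PolicyDocument=json.dumps(permissions))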

2. S3 Bucket Setup

Create an S3 bucket (<bucket_name>) and configure an event notification that invokes the Lambda function on object-created (s3:ObjectCreated:Put) events for keys with the suffix .csv.
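
A boto3 sketch of the notification wiring, assuming the function already exists and has granted S3 permission to invoke it (via lambda add_permission); the ARN is built from the same placeholders:

import boto3

s3 = boto3.client("s3")

# Invoke the function whenever a .csv object is uploaded to the bucket.
s3.put_bucket_notification_configuration(
    Bucket="<bucket_name>",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": (
                "arn:aws:lambda:us-east-1:<account_id>"
                ":function:<lambda_function_name>"
            ),
            "Events": ["s3:ObjectCreated:Put"],
            "Filter": {"Key": {"FilterRules": [
                {"Name": "suffix", "Value": ".csv"},
            ]}},
        }],
    },
)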

3. DynamoDB Table Setup

Create a DynamoDB table (<database_name>) with:

  • Partition Key: filename (String)
  • Sort Key: upload_timestamp (Number)
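
A boto3 sketch of the table creation. The billing mode is an assumption (the README does not specify one); on-demand keeps the example free of capacity planning.

import boto3

dynamodb = boto3.client("dynamodb")

# Composite primary key: repeated uploads of the same filename become
# separate items, distinguished by their upload timestamp.
dynamodb.create_table(
    TableName="<database_name>",
    AttributeDefinitions=[
        {"AttributeName": "filename", "AttributeType": "S"},
        {"AttributeName": "upload_timestamp", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "filename", "KeyType": "HASH"},            # partition key
        {"AttributeName": "upload_timestamp", "KeyType": "RANGE"},   # sort key
    ],
    BillingMode="PAY_PER_REQUEST",  # assumption: on-demand capacity
)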

4. Lambda Function Setup

  • Create a Lambda function (<lambda_function_name>) with Python 3.12 runtime.
  • Attach the IAM Role (<iam_role_name>).
  • Configure environment variables:
    S3_BUCKET_NAME = <bucket_name>
    DYNAMODB_TABLE = <database_name>
    
  • Set memory to 256 MB, timeout to 30 s, and ephemeral storage to 512 MB.
  • Attach the S3 trigger with the suffix filter .csv and guard against recursive invocation (the function must not write objects back into the bucket that triggers it). The equivalent boto3 call is sketched below.
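
The same configuration expressed as a boto3 sketch. The handler name lambda_function.lambda_handler is an assumption (match it to the actual module and function in this repository), and function.zip comes from the packaging step in the Deployment section.

import boto3

lambda_client = boto3.client("lambda")

with open("function.zip", "rb") as f:
    code = f.read()

lambda_client.create_function(
    FunctionName="<lambda_function_name>",
    Runtime="python3.12",
    Role="arn:aws:iam::<account_id>:role/<iam_role_name>",
    Handler="lambda_function.lambda_handler",  # assumption: module.function
    Code={"ZipFile": code},
    MemorySize=256,                  # MB
    Timeout=30,                      # seconds
    EphemeralStorage={"Size": 512},  # MB
    Environment={"Variables": {
        "S3_BUCKET_NAME": "<bucket_name>",
        "DYNAMODB_TABLE": "<database_name>",
    }},
)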

Deployment

1. Install Dependencies

Ensure the AWS CLI is installed, then install the required Python packages:

pip install -r requirements.txt

2. Package and Upload Lambda Code

Run the following commands to update the Lambda function:

rm -f function.zip   # -f: don't fail if the archive doesn't exist yet
zip -r function.zip . -x "*.zip" "*.pyc" "__pycache__/*"
aws lambda update-function-code --function-name <lambda_function_name> --zip-file fileb://function.zip --region us-east-1

3. Verify Data in DynamoDB

Check if metadata is stored correctly in DynamoDB:

aws dynamodb scan --table-name <database_name>
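
A scan reads the whole table; to inspect the items for a single file, a key-condition query is cheaper. A boto3 sketch, where data.csv is an illustrative filename and the non-key attributes follow the handler sketch above:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("<database_name>")

# All metadata items recorded for one filename, newest upload first.
resp = table.query(
    KeyConditionExpression=Key("filename").eq("data.csv"),
    ScanIndexForward=False,  # descending upload_timestamp
)
for item in resp["Items"]:
    print(item["upload_timestamp"], item.get("row_count"), item.get("column_names"))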

Limitations

  • Designed for CSV files up to 10 MB.

Customization

This system can be modified to:

  • Process different file types (JSON, XML, etc.).
  • Extract custom metadata fields.
  • Integrate notification mechanisms (SNS, SQS, etc.).
  • Use alternative databases (RDS, Elasticsearch, etc.).

Conclusion

This project provides a scalable, serverless, and secure file processing solution using AWS services. It follows best practices like least privilege IAM policies and event-driven automation. Future improvements can include error handling, notifications, and local development using LocalStack.


Note: Ensure that <bucket_name>, <database_name>, <lambda_function_name>, <iam_role_name>, and <account_id> (used in the example ARNs above) are replaced with the actual resource names and account ID in your AWS account.
