- Overview
- Features
- Prerequisites
- Installation
- Configuration
- Usage
- API Reference
- Error Handling
- Best Practices
- Troubleshooting
- Security Considerations
- Contributing
- License
MarkerUnmaker is a Python utility designed to efficiently remove delete markers from Amazon S3 versioned buckets. Delete markers are created when objects are deleted in versioned S3 buckets, and over time, these can accumulate and impact storage costs and bucket performance.
Delete markers are placeholder objects created when you delete an object in a versioned S3 bucket. They don't contain actual data but mark the object as "deleted" while preserving previous versions. Removing unnecessary delete markers can:
- Reduce storage costs
- Improve bucket listing performance
- Clean up bucket organization
- Optimize backup and replication processes
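If you want to see the delete markers that MarkerUnmaker would target, you can list them yourself with boto3 before running anything. This is a minimal read-only sketch, assuming boto3 is installed and AWS credentials are configured; the bucket name and prefix are placeholders.

```python
# Minimal sketch: list delete markers in a versioned bucket (read-only).
# "my-versioned-bucket" and "logs/" are placeholders.
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_object_versions")

for page in paginator.paginate(Bucket="my-versioned-bucket", Prefix="logs/"):
    for marker in page.get("DeleteMarkers", []):
        print(marker["Key"], marker["VersionId"], marker["IsLatest"])
```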
- Batch Processing: Efficiently processes up to 1,000 delete markers per API call
- Pagination Support: Handles buckets with any number of objects
- Robust Error Handling: Graceful handling of AWS API errors with fallback mechanisms
- Security First: Supports AWS credential best practices
- Prefix Filtering: Target specific object prefixes within buckets
- Progress Tracking: Real-time feedback on processing status
- Dry Run Mode: Preview operations before execution
- Enterprise Ready: Suitable for production environments
- Python 3.7 or higher
- AWS CLI configured (recommended) or valid AWS credentials
- Network access to AWS S3 endpoints
- Valid AWS account with S3 access
- IAM permissions for S3 operations (see Security Considerations)
- S3 bucket with versioning enabled
boto3>=1.26.0
botocore>=1.29.0
# Download the script
curl -O https://raw.githubusercontent.com/AoGnyan/S3-Million-Version-Markers-Annihilator-/refs/heads/main/markerunmaker.py
# Install dependencies
pip install boto3 botocore
git clone https://github.com/AoGnyan/S3-Million-Version-Markers-Annihilator-.git
cd markerunmaker
pip install -r requirements.txt
pip install markerunmaker
The tool uses environment variables for configuration, following AWS best practices:
# Required
export S3_BUCKET_NAME="your-bucket-name"
# Optional
export S3_PREFIX="path/to/objects/" # Default: "" (all objects)
export AWS_REGION="us-west-2" # Default: "us-east-1"
export AWS_PROFILE="your-aws-profile" # Default: "default"
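A sketch of how a boto3 script typically consumes these variables is shown below; the exact handling inside markerunmaker.py may differ.

```python
# Illustrative sketch of reading the documented environment variables.
import os
import boto3

bucket = os.environ["S3_BUCKET_NAME"]               # required
prefix = os.environ.get("S3_PREFIX", "")             # default: all objects
region = os.environ.get("AWS_REGION", "us-east-1")
profile = os.environ.get("AWS_PROFILE", "default")

session = boto3.Session(profile_name=profile, region_name=region)
s3 = session.client("s3")
```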
Create a config.json file for advanced configuration:
{
"bucket_name": "your-bucket-name",
"prefix": "logs/2023/",
"region": "us-west-2",
"batch_size": 1000,
"dry_run": false,
"verbose": true
}
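If you prefer the file-based configuration, a loader along these lines is enough; the snippet is illustrative and only assumes the keys shown in the example above.

```python
# Illustrative loader for config.json using the keys from the example above.
import json

with open("config.json") as f:
    config = json.load(f)

bucket_name = config["bucket_name"]
prefix = config.get("prefix", "")
region = config.get("region", "us-east-1")
batch_size = config.get("batch_size", 1000)
dry_run = config.get("dry_run", False)
verbose = config.get("verbose", True)
```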
aws configure
# Enter your AWS Access Key ID, Secret Access Key, and region
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION="us-east-1"
No additional configuration needed when running on AWS services with attached IAM roles.
# Set required environment variables
export S3_BUCKET_NAME="my-versioned-bucket"
# Run MarkerUnmaker
python markerunmaker.py
export S3_BUCKET_NAME="my-bucket"
export S3_PREFIX="logs/2023/"
python markerunmaker.py
export S3_BUCKET_NAME="my-bucket"
export DRY_RUN="true"
python markerunmaker.py
export S3_BUCKET_NAME="my-bucket"
export VERBOSE="true"
python markerunmaker.py
# Basic usage
python markerunmaker.py --bucket my-bucket
# With options
python markerunmaker.py \
--bucket my-bucket \
--prefix logs/2023/ \
--region us-west-2 \
--dry-run \
--verbose
# Using configuration file
python markerunmaker.py --config config.json
Main entry point for delete marker removal.
- Parameters: None (uses environment variables)
- Returns: None
- Raises: botocore.exceptions.ClientError for AWS API errors
Retrieves all delete markers from the specified bucket and prefix.
- Returns: List of dictionaries, each containing:
  - Key: Object key
  - VersionId: Version ID of the delete marker
Example:
delete_markers = get_all_delete_markers()
print(f"Found {len(delete_markers)} delete markers")
Deletes a batch of objects with fallback to individual deletion.
- Parameters:
  - objects_to_delete: List of objects to delete
- Returns: True if successful, False otherwise
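The batch path maps to the S3 DeleteObjects API, which accepts at most 1,000 keys per request. A sketch of the batch-then-fallback pattern, assuming each item carries Key and VersionId and that an s3 client and bucket name are passed in (the function name here is illustrative):

```python
from botocore.exceptions import ClientError

def delete_batch_with_fallback(s3, bucket, objects_to_delete):
    """Delete up to 1,000 object versions in one call, falling back to
    per-object deletes if the batch call fails (illustrative sketch)."""
    try:
        response = s3.delete_objects(
            Bucket=bucket,
            Delete={"Objects": objects_to_delete, "Quiet": True},
        )
        # delete_objects can partially fail; treat any per-key error as failure.
        return not response.get("Errors")
    except ClientError:
        success = True
        for obj in objects_to_delete:
            try:
                s3.delete_object(Bucket=bucket, Key=obj["Key"], VersionId=obj["VersionId"])
            except ClientError:
                success = False
        return success
```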
Centralized error handling for S3 operations.
- Parameters:
  - e: The ClientError exception
  - context: Additional context for the error
- Returns: AWS error code string
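For reference, a ClientError carries the AWS error code inside its response dictionary, which is what a handler like this returns; this is a minimal sketch, not the exact implementation.

```python
from botocore.exceptions import ClientError

def handle_client_error(e: ClientError, context: str) -> str:
    """Log a ClientError with context and return the AWS error code (sketch)."""
    error = e.response.get("Error", {})
    print(f"{context}: {error.get('Code', 'Unknown')} - {error.get('Message', '')}")
    return error.get("Code", "Unknown")
```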
| Error Code | Description | Resolution |
|---|---|---|
| NoSuchBucket | Bucket doesn't exist | Verify bucket name and region |
| AccessDenied | Insufficient permissions | Check IAM policies |
| NoSuchKey | Object key not found | Normal during concurrent operations |
| InvalidBucketName | Invalid bucket name format | Use valid S3 bucket naming |
| BucketNotEmpty | Bucket contains objects | Expected for non-empty buckets |
MarkerUnmaker implements automatic error recovery:
- Batch Failures: Falls back to individual object deletion
- Transient Errors: Implements exponential backoff retry
- Permission Errors: Provides clear guidance for resolution
- Network Issues: Graceful handling with retry mechanisms
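As a rough illustration of the backoff idea (the actual retry parameters in MarkerUnmaker may differ, and boto3 also has built-in retries configurable through botocore's Config):

```python
import time
from botocore.exceptions import ClientError

def with_backoff(operation, max_attempts=5, base_delay=1.0):
    """Retry a callable on ClientError with exponential backoff (illustrative)."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ClientError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```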
Enable detailed logging for troubleshooting:
import logging
logging.basicConfig(level=logging.DEBUG)
You can use the following sketch as a guide for adding progress tracking and richer logging to the script:
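This is an illustrative pattern only; process_in_batches and the deletion step inside it are placeholders, not functions from markerunmaker.py.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("markerunmaker")

def process_in_batches(delete_markers, batch_size=1000):
    """Report progress while working through delete markers in batches (sketch)."""
    total = len(delete_markers)
    for start in range(0, total, batch_size):
        batch = delete_markers[start:start + batch_size]
        # ... delete the batch here ...
        logger.info("Processed %d/%d delete markers", min(start + batch_size, total), total)
```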
- Backup Important Data: Ensure you have backups of critical objects
- Test with Small Prefix: Start with a limited scope using prefixes
- Use Dry Run: Always test with dry run mode first
- Check Permissions: Verify IAM permissions before execution
- Monitor Costs: Understand potential cost implications
- Monitor Progress: Watch console output for progress updates
- Check CloudWatch: Monitor S3 API metrics during execution
- Avoid Concurrent Operations: Don't run multiple instances simultaneously
- Network Stability: Ensure stable internet connection for large operations
- Verify Results: Check bucket contents to confirm expected results
- Monitor Costs: Track storage cost changes after cleanup
- Document Changes: Keep records of cleanup operations
- Update Lifecycle Policies: Consider implementing lifecycle rules
Solution: Verify bucket name and region configuration
- Check S3_BUCKET_NAME environment variable
- Ensure bucket exists in the specified region
- Verify AWS credentials have access to the bucket
Solution: Check IAM permissions
Required permissions:
- s3:ListBucketVersions
- s3:DeleteObject
- s3:DeleteObjectVersion
Solution: Verify bucket has versioning enabled and contains delete markers
- Check bucket versioning status
- Verify objects have been deleted (creating delete markers)
- Check if prefix filter is too restrictive
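You can confirm the versioning status programmatically; get_bucket_versioning returns no Status key for buckets where versioning has never been enabled. The bucket name below is a placeholder.

```python
import boto3

s3 = boto3.client("s3")
resp = s3.get_bucket_versioning(Bucket="my-bucket")  # placeholder bucket name
print(resp.get("Status", "Versioning has never been enabled"))  # "Enabled" or "Suspended"
```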
Solution: Optimize execution
- Use more specific prefixes to reduce scope
- Check network connectivity
- Consider running from AWS EC2 for better performance
Enable debug mode for detailed troubleshooting:
export DEBUG="true"
export VERBOSE="true"
python markerunmaker.py
- Check the troubleshooting section
- Review AWS CloudTrail logs for API call details
- Enable debug logging for detailed error information
- Consult AWS S3 documentation for specific error codes
Create a minimal IAM policy for MarkerUnmaker:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:ListBucketVersions",
"s3:DeleteObject",
"s3:DeleteObjectVersion"
],
"Resource": [
"arn:aws:s3:::your-bucket-name",
"arn:aws:s3:::your-bucket-name/*"
]
}
]
}
- Principle of Least Privilege: Grant only necessary permissions
- Use IAM Roles: Prefer IAM roles over access keys when possible
- Rotate Credentials: Regularly rotate AWS access keys
- Audit Access: Monitor CloudTrail logs for API usage
- Secure Storage: Never commit credentials to version control
- Network Security: Use VPC endpoints for enhanced security
- AWS IAM roles (for EC2/Lambda)
- AWS CLI profiles
- Environment variables (for local development)
- AWS SSO integration
- Hardcoded credentials in source code
- Sharing credentials via email or chat
- Using root account credentials
- Storing credentials in version control
# Adjust batch size based on your needs
BATCH_SIZE = 1000 # Maximum allowed by S3 API
# For smaller operations, reduce to minimize impact:
BATCH_SIZE = 100 # More granular progress updates
For very large buckets, consider parallel processing:
# Example: Process different prefixes in parallel
prefixes = ['logs/2023/01/', 'logs/2023/02/', 'logs/2023/03/']
# Use threading or multiprocessing to handle prefixes concurrently
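Because the work is I/O-bound, a thread pool is usually enough. The sketch below assumes a hypothetical process_prefix(prefix) helper that runs the cleanup for a single prefix; it is not part of the current script.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

prefixes = ["logs/2023/01/", "logs/2023/02/", "logs/2023/03/"]

def process_prefix(prefix):
    # Hypothetical helper: run the delete-marker cleanup for one prefix.
    ...

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = {executor.submit(process_prefix, p): p for p in prefixes}
    for future in as_completed(futures):
        future.result()  # re-raise any worker exception
        print(f"Finished prefix {futures[future]}")
```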
Track performance metrics:
import time
start_time = time.time()
# ... processing ...
end_time = time.time()
print(f"Processing completed in {end_time - start_time:.2f} seconds")
We welcome contributions! Please follow these guidelines:
git clone https://github.com/your-repo/markerunmaker.git
cd markerunmaker
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements-dev.txt
- Follow PEP 8 style guidelines
- Add type hints for all functions
- Include docstrings for public functions
- Write unit tests for new features
- Update documentation for changes
# Run unit tests
python -m pytest tests/
# Run integration tests (requires AWS credentials)
python -m pytest tests/integration/
# Run linting
flake8 markerunmaker.py
mypy markerunmaker.py
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests and documentation
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2024 MarkerUnmaker Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
- Documentation: This comprehensive guide
- Bug Reports: GitHub Issues
- Feature Requests: GitHub Discussions
- Email Support: partnerldm@gmail.com
pip install boto3 botocore
curl -O https://raw.githubusercontent.com/your-repo/markerunmaker/main/markerunmaker.py
aws configure
export S3_BUCKET_NAME="your-bucket-name"
export S3_PREFIX="optional/prefix/" # Optional
export DRY_RUN="true"
python markerunmaker.py
unset DRY_RUN # Remove dry run mode
python markerunmaker.py
- AWS SDK for Python (Boto3) team
- AWS documentation team for excellent API documentation
MarkerUnmaker Mission: Simplifying S3 bucket maintenance by efficiently removing unnecessary delete markers, reducing costs, and improving performance.