will examine the sender's domain, and if it matches specific patterns (like those seen in your examples), it will copy the email to an S3 bucket (cache) and then move it out of the Gmail inbox (e.g., to Trash or a specific label).
graph LR
A[Gmail Account] -- New Email --> B(Gmail API);
B -- 1. users.watch --> C[Google Cloud Pub/Sub Topic];
A -- New Email Event --> C;
C -- 2. Push Notification --> D[AWS API Gateway Endpoint];
D -- 3. Trigger --> E[AWS Lambda Function];
E -- 4a. Get Secret --> F[AWS Secrets Manager];
F -- OAuth Credentials --> E;
E -- 4b. Get/Modify Email --> B;
E -- 5. Filter Logic --> G{Domain Match?};
G -- Yes --> H[Cache Email];
H -- 6a. PutObject --> I[AWS S3 Bucket];
H -- 6b. Move Email --> B;
G -- No --> J[Ignore];
E -- Logs --> K[AWS CloudWatch Logs];
style E fill:#f9f,stroke:#333,stroke-width:2px
style I fill:#ccf,stroke:#333,stroke-width:2px
style B fill:#f8d7da,stroke:#721c24
style C fill:#f8d7da,stroke:#721c24
Note: This workflow provides an automated way to isolate these specific types of emails for later investigation while keeping your main inbox cleaner. Remember to handle credentials securely and to renew the `users.watch` registration periodically.
- Google Cloud Project & Pub/Sub: Needed to enable Gmail API push notifications.
- AWS API Gateway (or other endpoint): To receive push notifications from Google Cloud Pub/Sub.
- AWS Lambda Function: The core processing logic resides here.
- AWS S3 Bucket: To cache the filtered emails.
- AWS IAM Roles: For granting necessary permissions to Lambda.
- AWS Secrets Manager (Recommended): To securely store Gmail API credentials (refresh token).
- Gmail API: To read and modify emails.
- Google Cloud Setup:
  - Create a Google Cloud Project.
  - Enable the Gmail API and Google Cloud Pub/Sub API.
  - Create OAuth 2.0 Credentials (Client ID and Secret) for a Web Application or Desktop Application. You'll need to authorize this app to access your Gmail account (scope: `https://www.googleapis.com/auth/gmail.modify`, for reading and moving emails). Perform the OAuth flow once to get a Refresh Token for your Gmail account.
  - Create a Google Cloud Pub/Sub Topic (e.g., `gmail-push-notifications`).
  - Grant the Gmail API service account (`gmail-api-push@system.gserviceaccount.com`) permission to publish messages to this Pub/Sub topic.
- AWS Setup:
  - Create an S3 Bucket (e.g., `my-gmail-filter-cache`) to store the filtered emails.
  - Create an IAM Role for the Lambda function. This role needs permissions for:
    - `logs:CreateLogGroup`, `logs:CreateLogStream`, `logs:PutLogEvents` (for logging).
    - `s3:PutObject` (to write emails to the S3 bucket).
    - `secretsmanager:GetSecretValue` (if using Secrets Manager for the refresh token).
    - (Optionally) `execute-api:ManageConnections` if using a WebSocket API Gateway. An HTTP API needs no extra permission on this role, because API Gateway is allowed to invoke the function through the function's own resource-based policy.
    - (Optionally) `sns:Publish` if using SNS as an intermediary.
  - (Recommended) Store the obtained Google OAuth Refresh Token and your Client ID/Secret securely in AWS Secrets Manager.
  - Create the AWS Lambda Function (e.g., using Python 3.x):
    - Assign the created IAM Role.
    - Include necessary libraries (e.g., `google-auth`, `google-api-python-client`, `boto3`). You'll need to package these as a Lambda layer or include them in your deployment package.
    - Configure environment variables (e.g., S3 bucket name, Secrets Manager secret ARN, desired Gmail label for filtered emails).
  - Create an API Gateway Endpoint (HTTP API is simpler and cheaper) that triggers the Lambda function. Note the invocation URL.
- Connecting Google to AWS:
  - Create a Push Subscription for your Google Cloud Pub/Sub Topic. Configure it to send notifications to the HTTPS endpoint URL of your AWS API Gateway.
- Initiate Gmail Watch:
  - Run a script (can be a one-off local script or another Lambda) using the Gmail API's `users.watch` method. Provide:
    - Your Gmail address (`userId`: `'me'`).
    - The full name of the Google Cloud Pub/Sub Topic created earlier.
    - (Optional) Labels to watch (e.g., `INBOX`).
  - This tells Gmail to send notifications to your Pub/Sub topic when changes occur (like new emails arriving in the INBOX). Note: A watch expires after 7 days, so it must be renewed at least weekly; Google recommends calling `users.watch` once a day. A scheduled Lambda triggered by an EventBridge (formerly CloudWatch Events) rule could automate this renewal.
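
The watch registration step could look roughly like the sketch below. It assumes the `google-auth` and `google-api-python-client` packages mentioned above; the helper names (`build_gmail_service`, `build_watch_request`, `start_watch`) and the project ID `my-project` are placeholders invented for illustration, not part of any library.

```python
"""Sketch: register (or renew) a Gmail push watch. Helper names are illustrative."""

GMAIL_SCOPE = "https://www.googleapis.com/auth/gmail.modify"


def build_gmail_service(client_id, client_secret, refresh_token):
    """Build a Gmail API client from a stored refresh token.

    The Google libraries are imported lazily so the other helpers can be
    exercised without them installed.
    """
    from google.oauth2.credentials import Credentials
    from googleapiclient.discovery import build

    creds = Credentials(
        token=None,  # no access token yet; one is obtained via the refresh token
        refresh_token=refresh_token,
        token_uri="https://oauth2.googleapis.com/token",
        client_id=client_id,
        client_secret=client_secret,
        scopes=[GMAIL_SCOPE],
    )
    return build("gmail", "v1", credentials=creds)


def build_watch_request(project_id, topic_id, label_ids=("INBOX",)):
    """Return the request body for users.watch.

    Gmail expects the fully qualified topic name, not just the short ID.
    """
    return {
        "topicName": f"projects/{project_id}/topics/{topic_id}",
        "labelIds": list(label_ids),
    }


def start_watch(service, request_body):
    """Call users.watch for the authenticated user ('me').

    The response carries a 'historyId' and an 'expiration' timestamp.
    """
    return service.users().watch(userId="me", body=request_body).execute()


if __name__ == "__main__":
    # Placeholder values; in practice these would come from Secrets Manager.
    service = build_gmail_service("CLIENT_ID", "CLIENT_SECRET", "REFRESH_TOKEN")
    body = build_watch_request("my-project", "gmail-push-notifications")
    print(start_watch(service, body))
```

Calling `start_watch` again before the expiration renews the watch, so the same function could serve as the body of the scheduled renewal job.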
- Email Arrival: A new email arrives in your Gmail INBOX.
- Gmail Notification: Gmail detects the new email and publishes a notification message to your Google Cloud Pub/Sub topic.
- Pub/Sub Push: Google Cloud Pub/Sub pushes this notification message (which usually contains the user's email address and history ID) to your configured AWS API Gateway endpoint.
- API Gateway Trigger: API Gateway receives the request and triggers the associated AWS Lambda function, passing the notification payload.
- Lambda Execution:
  - Parse Notification: The Lambda function parses the incoming payload from Pub/Sub via API Gateway. Pub/Sub push wraps the notification in a JSON envelope whose `message.data` field is base64-encoded; decoding it yields the user's email address and a history ID. Alternatively, instead of using the history ID, the Lambda could just query for recent unread messages.
  - Get Credentials: Retrieve the OAuth 2.0 Client ID, Client Secret, and Refresh Token from AWS Secrets Manager (or environment variables, which is less secure).
  - Authenticate: Use the credentials and refresh token to obtain a short-lived Access Token for the Gmail API.
  - Fetch Email(s): Use the history ID (via `users.history.list`, passing the previously stored history ID as `startHistoryId`, since the ID in the notification marks the mailbox state after the change) or query recent messages (`users.messages.list` with `q='is:unread in:inbox'`) to get the ID(s) of new message(s). For each message ID:
    - Fetch the message headers using `users.messages.get` with `format='metadata'` and `metadataHeaders=['From']` (or `format='full'` or `format='raw'` if the body is needed later).
  - Filter Logic:
    - Extract the sender's email address from the `From` header.
    - Parse the domain from the sender's email address.
    - Define a list or regex pattern for the domains/subdomains to filter (e.g., `.*\.get-me-jobs\.com$`, `.*\.jobcase\.com$`, `.*\.jobhat\.com$`, `.*\.californiajobdepartment\.com$`, `updates@umail\..*`). Note that the last pattern matches the full sender address rather than only the domain.
    - Check if the extracted sender domain (or address) matches any of the filter patterns.
  - Action (If Match):
    - Fetch Full Email: If not already fetched, get the full raw email content using `users.messages.get` with `format='raw'`. Decode the base64url-encoded string.
    - Cache to S3: Upload the raw email content (e.g., as an `.eml` file) to the configured S3 bucket. Use the message ID as part of the object key for uniqueness (e.g., `filtered-emails/YYYY/MM/DD/{message-id}.eml`).
    - Move in Gmail: Use one of:
      - Option A (Safer): `users.messages.modify` to add a specific label (e.g., `FilteredJobSpam`) and remove the `INBOX` label. This archives the email under that label for review.
      - Option B (Direct Removal): `users.messages.trash` to move the email to Trash.
  - Action (No Match): Do nothing. The email remains in the inbox.
  - Logging: Log the decision (filtered/cached/moved or ignored) and the message ID to CloudWatch Logs.
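
The pure-Python parts of the Lambda steps above (parsing the notification, matching the sender, building the S3 key, decoding the raw message) might be sketched as follows. This is a minimal illustration using only the standard library. It assumes an API Gateway proxy-style event with `body` and `isBase64Encoded` fields, and all function names are hypothetical. The Gmail and S3 calls themselves are left out.

```python
"""Sketch of the Lambda's network-free helpers. All names are illustrative."""
import base64
import json
import re
from datetime import datetime, timezone
from email.utils import parseaddr

# Patterns from the text: most apply to the sender's domain, one to the address.
DOMAIN_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (
    r".*\.get-me-jobs\.com$",
    r".*\.jobcase\.com$",
    r".*\.jobhat\.com$",
    r".*\.californiajobdepartment\.com$",
)]
ADDRESS_PATTERNS = [re.compile(r"updates@umail\..*", re.IGNORECASE)]


def _b64decode(text):
    """Decode standard or URL-safe base64, tolerating missing padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def decode_pubsub_push(event):
    """Extract (email_address, history_id) from a Pub/Sub push delivered
    through API Gateway (assumed proxy event shape)."""
    body = event.get("body") or "{}"
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    envelope = json.loads(body)
    payload = json.loads(_b64decode(envelope["message"]["data"]))
    return payload["emailAddress"], int(payload["historyId"])


def is_filtered(from_header):
    """Return True if the From header matches any filter pattern."""
    _, address = parseaddr(from_header)
    domain = address.rpartition("@")[2]
    return (any(p.match(domain) for p in DOMAIN_PATTERNS)
            or any(p.match(address) for p in ADDRESS_PATTERNS))


def s3_key(message_id, when=None):
    """Build filtered-emails/YYYY/MM/DD/{message-id}.eml."""
    when = when or datetime.now(timezone.utc)
    return f"filtered-emails/{when:%Y/%m/%d}/{message_id}.eml"


def decode_raw_message(raw):
    """Decode the base64url 'raw' field of users.messages.get(format='raw')."""
    return _b64decode(raw)


def archive_request_body(label_id):
    """Body for users.messages.modify (Option A). Gmail expects the ID of a
    user-created label here, not its display name."""
    return {"addLabelIds": [label_id], "removeLabelIds": ["INBOX"]}
```

Keeping these helpers free of network calls makes them easy to unit-test; the handler would then combine them with `users.messages.get`, `s3.put_object`, and `users.messages.modify`. Note that the domain patterns as written require a subdomain, so a sender at the bare domain `jobcase.com` would not match.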
- Access the S3 bucket (`my-gmail-filter-cache`).
- Download or process the cached `.eml` files.
- Analyze the email headers (especially `Received`, `Return-Path`, `Sender`, `X-Originating-IP`, etc.) and content to understand the true source or purpose behind these emails sent via the filtered domains.
- Refine the filter patterns in the Lambda function based on your findings.
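
For the header analysis, a small script along these lines could pull the interesting headers out of a downloaded `.eml` file. It is a sketch using Python's standard `email` package; `summarize_headers` is a hypothetical helper name, and the file path in the usage comment is a placeholder.

```python
"""Sketch: list the routing-related headers of a cached .eml file."""
from email import policy
from email.parser import BytesParser

HEADERS_OF_INTEREST = ("Received", "Return-Path", "Sender", "X-Originating-IP")


def summarize_headers(raw_eml):
    """Map each header of interest to the list of its values.

    Headers such as Received usually occur several times (one per hop),
    so every occurrence is kept, in the order it appears in the message.
    """
    message = BytesParser(policy=policy.default).parsebytes(raw_eml)
    return {
        name: [str(value) for value in message.get_all(name, [])]
        for name in HEADERS_OF_INTEREST
    }


# Example usage with a local copy of a cached message (path is a placeholder):
#   from pathlib import Path
#   for name, values in summarize_headers(Path("message.eml").read_bytes()).items():
#       print(name, values)
```

The `Received` headers are listed newest hop first, so the last entry is typically the closest to the original sending host.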