This project provides a sample implementation demonstrating how to use the Salesforce Data Cloud Document AI APIs. It creates a simple web interface that allows users to upload documents (PDFs or images) along with a JSON schema, and then processes these documents using Salesforce's document extraction capabilities.
The application showcases how to:
- Make authenticated API calls to Salesforce Data Cloud Document AI endpoints using OAuth 2.0
- Process document uploads and convert them to base64 encoding
- Submit documents for intelligent extraction with custom schema configurations
- Display the extracted structured data from documents
This testbed serves as a reference implementation for developers looking to integrate Salesforce Data Cloud's document processing capabilities into their own applications.
Youtube video walkthrough: https://youtu.be/H8cgvUP7Ytg
sf-datacloud-idp-testbed/
├── app.py # Main Flask application
├── config.py # Configuration settings (API URLs, OAuth settings)
├── api_client.py # API client for token management
├── requirements.txt # Python dependencies
├── static/ # Static assets
│ ├── css/ # CSS stylesheets
│ │ └── style.css # Main stylesheet
│ ├── js/ # JavaScript files
│ │ └── script.js # Client-side functionality
│ └── json-jazz.html # JSON Schema Generator tool
├── templates/ # HTML templates
│ └── index.html # Main interface page
├── .env.example # Example environment file
└── README.md # Project documentation
-
Clone the Repository
git clone https://github.com/ananth-anto/sf-datacloud-idp-testbed cd sf-datacloud-idp-testbed
-
Create Environment File
cp .env.example .env
-
Configure Salesforce Connected App
- Follow the Setting Up External Client App guide
- Important: Use callback URL as
http://localhost:3000/auth/callback
- Copy your
ClientId
,ClientSecret
, andLoginUrl
from your Salesforce Connected App to your.env
file
Example
.env
:LOGIN_URL=your-salesforce-login-url CLIENT_ID=your-salesforce-connected-app-client-id CLIENT_SECRET=your-salesforce-connected-app-client-secret API_VERSION=vXX.X TOKEN_FILE=access-token.secret
-
Create a Virtual Environment (Recommended)
python -m venv venv
- On macOS/Linux:
source venv/bin/activate
- On Windows:
venv\Scripts\activate
- On macOS/Linux:
-
Install Dependencies
pip install -r requirements.txt
-
Run the Application
python app.py
- The application will start and be available at
http://localhost:3000/
in your web browser.
- The application will start and be available at
- First Time Setup: When you first visit the application, you'll see an "Authenticate with Salesforce" button
- Click Authenticate: This will redirect you to Salesforce login page
- Login: Enter your Salesforce credentials
- Authorize: Grant permissions to the connected app
- Return: You'll be redirected back to the application with a code in the URL
- Token Exchange: The backend exchanges the code for an access token and instance URL
- Ready to Use: The "Analyze Document" button will now be enabled
- Upload a document (supported formats: PDF, PNG, JPG, JPEG, TIFF, BMP)
- Enter a JSON schema defining the data structure you want to extract
- Click "Analyze Document" to process the document
- View the extracted structured data in the results section
The schema should be formatted as a JSON object that defines the fields you want to extract from the document. For example:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"VendorName": {
"type": "string",
"description": "Name of the vendor issuing the invoice"
},
"Vendor Address": {
"type": "string",
"description": "Full address of the vendor"
},
"Vendor Phone number": {
"type": "string",
"description": "Phone number"
},
"Billing Address": {
"type": "string",
"description": "Full name and address of the billing entity"
},
"Product Item List": {
"type": "array",
"items": {
"type": "object",
"description": "Table containing the list of products",
"properties": {
"Item ID": {
"type": "string",
"description": "Item id"
},
"Item Description": {
"type": "string",
"description": "Full description for the item"
},
"Quantity": {
"type": "number",
"description": "Number of items"
},
"Unit Price": {
"type": "number"
},
"Line total": {
"type": "number",
"description": "Line total for the product line item"
}
}
},
"description": "Table containing the list of products"
},
"Total amount": {
"type": "number"
}
}
}
The application uses OAuth 2.0 Web Server Flow (Authorization Code Grant) for authentication with Salesforce Data Cloud. The authentication flow:
- User clicks "Authenticate" button - Frontend calls
/api/auth-info
to get Salesforce configuration - User logs into Salesforce - User enters credentials on Salesforce login page
- Salesforce redirects back - With a code in the URL
- Token is exchanged and saved - Backend exchanges the code for an access token and instance URL, which are stored in
access-token.secret
- Token is used for API calls - All subsequent API calls use the stored token and instance URL
- Storage: Access tokens and instance URLs are stored locally in
access-token.secret
- Security: The token file is automatically added to
.gitignore
to prevent accidental commits - Expiration: Salesforce access tokens expire. When expired, users will need to re-authenticate
- Validation: The application checks authentication status before allowing document processing
-
Authentication Errors: If you receive a 401 or 403 error, your access token may be expired. Click the "Authenticate" button to get a new token.
-
Connected App Configuration: Ensure your Salesforce Connected App has the correct callback URL and OAuth scopes configured.
-
Schema Errors: Ensure your schema is valid JSON and follows the expected format for the Salesforce Data Cloud Document AI API.
-
File Format Issues: Check that your document is in one of the supported formats and is not corrupted.
The application has logging enabled. Check the console output for detailed error messages and debugging information.
This testbed is intended for development and testing purposes only. For production use:
- Never hardcode authentication tokens in your code
- Implement proper user authentication and authorization
- Use HTTPS in production for secure token transmission
- Consider implementing token refresh logic for long-running applications
- Store tokens in secure, encrypted storage rather than local files
GET /
- Main application interfaceGET /api/status
- Check authentication statusGET /api/auth-info
- Get OAuth configurationGET /auth/callback
- OAuth callback page (handles code exchange)POST /extract-data
- Process document extraction
- Flask==3.0.2
- Werkzeug==3.0.1
- requests==2.31.0
This project is licensed under the MIT License - see the LICENSE file for details.