A production-ready FastAPI application for accessing Databricks Lakebase data. Features scalable architecture, automatic token refresh, and optimized database connection management.
Learn more about Databricks Lakebase in the Databricks documentation.

Why serve your Lakebase data through an API?
- Database Abstraction & Security: APIs prevent direct database access and provide controlled access through authenticated apps.
- Standardized Access Patterns: APIs create consistent ways to interact with data across different teams and applications.
- Development Velocity: APIs reduce duplicate code in applications. Write your API logic once and let every application leverage your endpoint.
- Performance Optimization & Caching: APIs leverage connection pooling, query optimization, and results caching for high performance workloads.
- Cross Platform Capability: Any programming language can leverage the REST protocol.
- Audit Trails & Monitoring: Custom logging, request tracking, and usage analytics give visibility into data access.
- Future Proof: APIs simplify switching between databases, adding new data sources, or changing infrastructure.
Key features:

- FastAPI REST API with async/await support
- Databricks Lakebase Integration with OAuth token management
- Automatic Resource Management - create and delete Lakebase resources on-demand
- Dynamic Endpoint Registration - endpoints are conditionally loaded based on database availability
- Automatic Token Refresh with configurable intervals
- Production-Ready Architecture with domain-driven design
- Connection Pooling with optimized settings for high-traffic scenarios
- Environment-Based Configuration for different deployment environments
- Comprehensive Error Handling and logging
- Immediate Example that plugs into the Databricks sample datasets (TPC-H orders)
Prerequisites:

- Databricks Workspace: Permissions to create apps and database instances
- Python 3.11+ and uv package manager
- Environment Variables configured (see Configuration section)
Local development:

1. Clone the repository and install dependencies:

```bash
git clone https://github.com/databricks-solutions/lakebase-fastapi-app.git
cd lakebase-fastapi-app
uv sync
```
2. Configure environment variables:

```bash
cp .env.example .env  # Edit .env with your Databricks configuration
```
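A minimal .env sketch using the example values from the Configuration tables below (illustrative; adjust to your workspace):

```bash
# Illustrative values only; see the Configuration section for descriptions
LAKEBASE_INSTANCE_NAME=my-lakebase-instance
LAKEBASE_DATABASE_NAME=demo-database
LAKEBASE_CATALOG_NAME=my-lakebase-catalog
SYNCHED_TABLE_STORAGE_CATALOG=my_catalog
SYNCHED_TABLE_STORAGE_SCHEMA=my_schema
DATABRICKS_DATABASE_PORT=5432
DEFAULT_POSTGRES_SCHEMA=public
DEFAULT_POSTGRES_TABLE=orders_synced
```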
3. Run the application:

```bash
uv run uvicorn src.main:app --reload
```
4. Access the API:
   - API: http://localhost:8000
   - Interactive docs: http://localhost:8000/docs
5. Create Lakebase resources for the orders endpoints:
   - Open the docs page: http://localhost:8000/docs
   - Expand /api/v1/resources/create-lakebase-resources
   - Click 'Try it out'
   - Set create_resources = true
   - Click Execute (resources take several minutes to deploy)
6. Enable the orders endpoints:
   - Confirm the resources from step 5 are deployed.
   - Restart the FastAPI service.
   - You should now see the /orders endpoints.
7. Delete Lakebase resources when you are done:
   - Open the docs page: http://localhost:8000/docs
   - Expand /api/v1/resources/delete-lakebase-resources
   - Click 'Try it out'
   - Set confirm_deletion = true
   - Click Execute
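Equivalently, from the command line once the app is running (the JSON body shape is an assumption based on the confirm_deletion flag above):

```bash
curl -X DELETE http://localhost:8000/api/v1/resources/delete-lakebase-resources \
  -H "Content-Type: application/json" \
  -d '{"confirm_deletion": true}'
```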
Deploy to Databricks Apps (*assumes the local development steps above have been completed):
1. Databricks UI: Create a custom app.
2. Databricks UI: Grant the app access to the database instance:
   - Copy the App Service Principal ID from App -> Authorization
   - Go to Compute -> Database Instances -> <your_instance> -> Permissions
   - Add PostgreSQL Role -> enter the app service principal ID -> assign Databricks superuser
   - Grant the App Service Principal permissions on the Postgres catalog.
3. Configure environment variables in app.yaml:

| Variable | Description | Example |
|---|---|---|
| LAKEBASE_INSTANCE_NAME | Lakebase database instance name | my-lakebase-instance |
| LAKEBASE_DATABASE_NAME | Lakebase database name | demo-database |
| LAKEBASE_CATALOG_NAME | Lakebase catalog name | my-lakebase-catalog |
| SYNCHED_TABLE_STORAGE_CATALOG | Catalog for synced table metadata | my_catalog |
| SYNCHED_TABLE_STORAGE_SCHEMA | Schema for synced table metadata | my_schema |
| DATABRICKS_DATABASE_PORT | Postgres port | 5432 |
| DEFAULT_POSTGRES_SCHEMA | Database schema | public |
| DEFAULT_POSTGRES_TABLE | Table name | orders_synced |
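A minimal app.yaml sketch (illustrative; the env names mirror the table above, and the command should match how you run the app locally):

```yaml
command: ["uvicorn", "src.main:app"]
env:
  - name: "LAKEBASE_INSTANCE_NAME"
    value: "my-lakebase-instance"
  - name: "LAKEBASE_DATABASE_NAME"
    value: "demo-database"
  - name: "LAKEBASE_CATALOG_NAME"
    value: "my-lakebase-catalog"
```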
Connection pool tuning variables:

| Variable | Default | Description |
|---|---|---|
| DB_POOL_SIZE | 5 | Connection pool size |
| DB_MAX_OVERFLOW | 10 | Max overflow connections |
| DB_POOL_TIMEOUT | 30 | Pool checkout timeout (seconds) |
| DB_COMMAND_TIMEOUT | 10 | Query timeout (seconds) |
| DB_POOL_RECYCLE_INTERVAL | 3600 | Pool recycle interval (seconds) |
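A sketch of how these settings can map onto SQLAlchemy's async engine (illustrative; the app's actual wiring lives in src/core/database.py and may differ):

```python
import os

from sqlalchemy.ext.asyncio import create_async_engine

# Placeholder DSN; the app builds it from the Lakebase instance settings,
# using a fresh OAuth token as the PostgreSQL password.
dsn = "postgresql+asyncpg://<client-id>:<oauth-token>@<instance-host>:5432/demo-database"

engine = create_async_engine(
    dsn,
    pool_size=int(os.getenv("DB_POOL_SIZE", "5")),
    max_overflow=int(os.getenv("DB_MAX_OVERFLOW", "10")),
    pool_timeout=int(os.getenv("DB_POOL_TIMEOUT", "30")),
    pool_recycle=int(os.getenv("DB_POOL_RECYCLE_INTERVAL", "3600")),
    # asyncpg's per-query timeout, in seconds
    connect_args={"command_timeout": int(os.getenv("DB_COMMAND_TIMEOUT", "10"))},
)
```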
4. Deploy app files using the Databricks CLI:

```bash
databricks sync --watch . /Workspace/Users/<your_username>/<project_folder>
# May need -p <profile_name> depending on .databrickscfg
```
5. Databricks UI: Deploy the application:
   - App -> Deploy
   - Source code path = /Workspace/Users/<your_username>/<project_folder> (the project root, where app.yaml lives)
   - Check the logs for a successful deploy: src.main - INFO - Application startup initiated
   - View your API docs at <your_app_url>/docs
Project structure:

```
src/
├── main.py                 # Main FastAPI application with dynamic endpoint loading
├── core/
│   └── database.py         # Database connection with automatic token refresh
├── models/
│   ├── lakebase.py         # Lakebase resource management models
│   └── orders.py           # Orders models using SQLModel
└── routers/
    ├── __init__.py         # Dynamic router registration logic
    └── v1/                 # API v1 endpoints
        ├── healthcheck.py  # Health check endpoints
        ├── lakebase.py     # Lakebase resource management endpoints
        └── orders.py       # Orders endpoints (loaded dynamically)
```
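A sketch of the dynamic registration idea in src/routers/__init__.py (illustrative names and structure, not the module's exact code):

```python
from fastapi import FastAPI

def register_routers(app: FastAPI, database_available: bool) -> None:
    """Register core routers always; orders only when the database exists."""
    from .v1 import healthcheck, lakebase

    app.include_router(healthcheck.router, prefix="/api/v1")
    app.include_router(lakebase.router, prefix="/api/v1/resources")

    if database_available:
        # Imported lazily so startup still succeeds before resources exist
        from .v1 import orders
        app.include_router(orders.router, prefix="/api/v1/orders")
```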
Important Note: OAuth tokens expire after one hour, but expiration is enforced only at login. Open connections remain active even if the token expires. However, any PostgreSQL command that requires authentication fails if the token has expired. Read More: https://docs.databricks.com/aws/en/oltp/oauth
Automatic Token Refresh:
- Tokens are refreshed every 50 minutes by a background async task that does not impact requests
- Refresh is guaranteed to happen before expiry (safe for the 1-hour token lifespan)
- Optimized for high-traffic production applications
- Pool connections are recycled every hour, preventing expired tokens on long-lived connections
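A minimal sketch of the refresh pattern, assuming the Databricks SDK's generate_database_credential call (names and structure here are illustrative, not the app's exact code):

```python
import asyncio
import uuid

from databricks.sdk import WorkspaceClient

REFRESH_INTERVAL_SECONDS = 50 * 60  # refresh well inside the 1-hour lifespan

class LakebaseTokenProvider:
    """Holds the OAuth token used as the PostgreSQL password."""

    def __init__(self, instance_name: str):
        self._client = WorkspaceClient()
        self._instance_name = instance_name
        self.token: str | None = None

    def refresh(self) -> None:
        cred = self._client.database.generate_database_credential(
            request_id=str(uuid.uuid4()),
            instance_names=[self._instance_name],
        )
        self.token = cred.token

async def token_refresh_loop(provider: LakebaseTokenProvider) -> None:
    # Run as a background task, e.g. started from FastAPI's lifespan handler,
    # so refreshes never block request handling.
    while True:
        provider.refresh()
        await asyncio.sleep(REFRESH_INTERVAL_SECONDS)
```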
Core endpoints:

| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Simple health check |
| /api/v1/health/database | GET | Database health check |
| /api/v1/resources/create-lakebase-resources | POST | Create Lakebase resources |
| /api/v1/resources/delete-lakebase-resources | DELETE | Delete Lakebase resources |
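A sketch of what the database health check can look like (illustrative; the actual implementation lives in src/routers/v1/healthcheck.py and may differ):

```python
from fastapi import APIRouter
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncEngine

def build_health_router(engine: AsyncEngine) -> APIRouter:
    router = APIRouter()

    @router.get("/health/database")
    async def database_health():
        # A trivial round-trip proves the pool can hand out a live connection
        async with engine.connect() as conn:
            await conn.execute(text("SELECT 1"))
        return {"status": "healthy"}

    return router
```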
These endpoints are only available when a Lakebase database instance exists:
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/orders/count | GET | Get total order count |
| /api/v1/orders/sample | GET | Get 5 random order keys |
| /api/v1/orders/{order_key} | GET | Get order by key |
| /api/v1/orders/pages | GET | Page-based pagination (traditional) |
| /api/v1/orders/stream | GET | Cursor-based pagination (high performance) |
| /api/v1/orders/{order_key}/status | POST | Update order status |
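Cursor-based pagination avoids OFFSET scans by seeking past the last key the client saw; a sketch of the query shape behind /orders/stream (assumed, not the app's exact code):

```python
from sqlalchemy import text

# Keyset (cursor) pagination: an index seek on the primary key instead of
# an O(n) OFFSET scan, so page cost stays flat as you go deeper.
STREAM_QUERY = text(
    """
    SELECT *
    FROM orders_synced
    WHERE o_orderkey > :cursor
    ORDER BY o_orderkey
    LIMIT :page_size
    """
)
# Clients pass the last o_orderkey they received as :cursor (0 for page one).
```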
```bash
# Check if the app is running
curl http://localhost:8000/health

# Check database health
curl http://localhost:8000/api/v1/health/database

# Create Lakebase resources
curl -X POST http://localhost:8000/api/v1/resources/create-lakebase-resources \
  -H "Content-Type: application/json" \
  -d '{}'

# Get order count (only available after resources are created)
curl http://localhost:8000/api/v1/orders/count

# Get a specific order
curl http://localhost:8000/api/v1/orders/1

# Get paginated orders
curl "http://localhost:8000/api/v1/orders/pages?page=1&page_size=10"

# Get cursor-based orders
curl "http://localhost:8000/api/v1/orders/stream?cursor=0&page_size=10"

# Update order status
curl -X POST http://localhost:8000/api/v1/orders/1/status \
  -H "Content-Type: application/json" \
  -d '{"o_orderstatus": "F"}'
```
Example order response:

```json
{
  "o_orderkey": 1,
  "o_custkey": 36901,
  "o_orderstatus": "F",
  "o_totalprice": 172799.49,
  "o_orderdate": "1996-01-02",
  "o_orderpriority": "5-LOW",
  "o_clerk": "Clerk#000000951",
  "o_shippriority": 0,
  "o_comment": "nstructions sleep furiously among"
}
```
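These fields map naturally onto a SQLModel table model; a minimal sketch inferred from the TPC-H columns above (assumed, not the app's exact model in src/models/orders.py):

```python
from datetime import date
from decimal import Decimal

from sqlmodel import Field, SQLModel

class Order(SQLModel, table=True):
    __tablename__ = "orders_synced"

    o_orderkey: int = Field(primary_key=True)
    o_custkey: int
    o_orderstatus: str
    o_totalprice: Decimal
    o_orderdate: date
    o_orderpriority: str
    o_clerk: str
    o_shippriority: int
    o_comment: str
```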
View the app-cookbook to learn how to:
- Connect Local Machine to Apps
- Connect External App to Databricks App
- Connect Databricks App to Databricks App
For applications handling thousands of requests per minute:
1. Increase the pool size:

```bash
DB_POOL_SIZE=20
DB_MAX_OVERFLOW=50
```

   With these values the app can open at most 70 concurrent connections (pool size plus overflow).

2. Monitor connection pool metrics in the application logs.
Security:

- OAuth token rotation prevents credential staleness
- SSL/TLS enforcement for all database connections
- Environment variable isolation for sensitive configuration
- No credential logging in production builds
Key metrics to monitor:
- Request latency (X-Process-Time header)
- Token refresh frequency (log analysis)
- Connection pool utilization
- Database query performance
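The X-Process-Time header is a standard FastAPI middleware pattern; a sketch (illustrative, not necessarily the app's exact middleware):

```python
import time

from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def add_process_time_header(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Expose request latency to clients and log scrapers
    response.headers["X-Process-Time"] = f"{elapsed_ms:.1f}ms"
    return response
```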
Log lines to watch for:

```
# Token refresh events
"Background token refresh: Generating fresh PostgreSQL OAuth token"
"Background token refresh: Token updated successfully"

# Performance tracking
"Request: GET /orders/1 - 8.3ms"
```
Connection timeouts:
- Increase DB_COMMAND_TIMEOUT for slow queries
- Check database instance performance
Databricks support doesn't cover this content. For questions or bugs, please open a GitHub issue and the team will help on a best-effort basis.
© 2025 Databricks, Inc. All rights reserved. The source in this notebook is provided subject to the Databricks License.
| Library | Description | License | Source |
|---|---|---|---|
| FastAPI | High-performance API framework | MIT | GitHub |
| SQLAlchemy | SQL toolkit and ORM | MIT | GitHub |
| Databricks SDK | Official Databricks SDK | Apache 2.0 | GitHub |
| asyncpg | Async PostgreSQL driver | Apache 2.0 | GitHub |
| Pydantic | Data validation using Python type hints | MIT | GitHub |
| Dataset | Disclaimer |
|---|---|
| TPC-H | The TPC-H Dataset is available without charge from TPC under the terms of the TPC End User License Agreement. |