The APIs and Controller -> Backend Server (Flask) section should be finalized.
The backend part of the swimlane diagram in Model and Engine -> Data and Control Flow Diagram is also finalized.
This section outlines how to build and run the project, along with the direct third-party tools, libraries, SDKs, and APIs used.
For the frontend, check out the following repo:
https://github.com/JI-DeepSleep/DocuSnap-Frontend
For the backend, check out the following repo:
https://github.com/JI-DeepSleep/DocuSnap-Backend
Built with Android Studio targeting Android 13 (API Level 33).
- Encryption:
- Bouncy Castle – Cryptographic algorithms.
- Android Keystore System – Secure key storage.
- Networking:
- OkHttp – Underlying HTTP/2 support.
Note: For a more detailed "Get Started" guide, check out the backend repo.
Built with Python, using the following core dependencies:
- Web Framework:
- Database:
- SQLite – Embedded relational database.
- OCR & AI Tools:
- CnOcr – Chinese and English OCR library.
- Zhipu AI API – Integration for generative AI tasks.
- Android Setup: Ensure Android SDK 33 is configured in Android Studio.
- Back-End Setup: Use `pip install -r requirements.txt`
- The frontend stack has not been finalized. The backend stack is unlikely to change much from this version, but we are considering adding support for edge processing (moving everything except the LLM onto the phone) for better security and privacy.
1. User Frontend: Handles UI interactions on the user's device.
2. Camera/Gallery: Accesses device camera and photo storage.
3. Geo/Color Processor: Performs image correction and enhancement.
4. Document/Form Handler: Manages processing workflows and local data.
5. Frontend DB: Stores processed documents/forms on device.
6. Backend Server: Routes requests and manages tasks.
7. Backend Worker: Executes asynchronous jobs.
8. Cache Server: Temporary storage for processing results.
9. OCR Server: Handles text extraction from images.
10. Zhipu LLM (External Service): Provides AI enrichment via API.
- Device components (1-5) use Android OS capabilities
- Backend services (6-9) run on our infrastructure
- Zhipu LLM (10) is an external dependency
We present the entity relationship in our app mainly through a swimlane diagram because we find it to be the most informative. Two block diagrams that best fit the assignment requirement but are less informative are also shown below.
Below is the example flow of data and control if we want to parse a document A and a form B, and use the current document database to fill form B (fill task C).
%%{init: {'theme': 'default', 'themeVariables': { 'primaryColor': '#f0f0f0'}}}%%
sequenceDiagram
participant User as User Frontend<br>(User's Phone)
participant CameraGallery as Camera/Gallery<br>(User's Phone)
participant GeoColor as Geo/Color<br>(User's Phone)
participant Handler as Document/Form Handler<br>(User's Phone)
participant FEDB as Frontend DB<br>(User's Phone)
participant Backend as Backend Server
participant Worker as Backend Worker
participant Cache as Cache Server
participant OCR as OCR Server
participant LLM as Zhipu LLM
%% Document A Processing
rect rgba(200,230,255,0.5)
note over User: Document A (Camera)
User->>CameraGallery: captureImage("camera")
CameraGallery->>User: rawImage
User->>GeoColor: correctGeometry(rawImage)
GeoColor->>User: correctedImage
User->>GeoColor: enhanceColors(correctedImage)
GeoColor->>User: enhancedImage
User->>Handler: processDocument(enhancedImage)
Handler->>Backend: /api/process<br>(type=doc, SHA256_A, content=encrypted_payload)
Backend->>Handler: 202 Accepted (processing)
Backend->>Worker: Start processing thread
par Polling and Processing
loop Polling
Handler->>Backend: /api/process<br>(type=doc, SHA256_A, has_content=false)
Backend->>Cache: /api/cache/query<br>(client_id, SHA256_A, "doc")
Cache->>Backend: 404 Not Found
Backend->>Handler: 202 Accepted (processing)
end
Worker->>OCR: /api/ocr/extract
OCR->>Worker: text
Worker->>LLM: /api/llm/enrich
LLM->>Worker: formatted_json
Worker->>Cache: /api/cache/store<br>(client_id, SHA256_A, "doc", data)
Cache->>Worker: 201 Created
end
Handler->>Backend: /api/process<br>(type=doc, SHA256_A, has_content=false)
Backend->>Cache: /api/cache/query<br>(client_id, SHA256_A, "doc")
Cache->>Backend: 200 OK (data)
Backend->>Handler: 200 OK (result)
Handler->>FEDB: saveDocument(sha256_A, metadata)
Handler->>Backend: /api/clear<br>(client_id, SHA256_A)
Backend->>Cache: /api/cache/clear<br>(client_id, SHA256_A, "doc")
Cache->>Backend: 200 OK (cleared:1)
Backend->>Handler: 200 OK (cleared:1)
Handler->>User: processComplete
end
%% Form B Processing
rect rgba(230,255,230,0.5)
note over User: Form B (Gallery)
User->>CameraGallery: captureImage("gallery")
CameraGallery->>User: rawImage
User->>GeoColor: correctGeometry(rawImage)
GeoColor->>User: correctedImage
User->>GeoColor: enhanceColors(correctedImage)
GeoColor->>User: enhancedImage
User->>Handler: processForm(enhancedImage, "formB")
Handler->>Backend: /api/process<br>(type=form, SHA256_B, content=encrypted_payload)
Backend->>Handler: 202 Accepted (processing)
Backend->>Worker: Start processing thread
par Polling and Processing
loop Polling
Handler->>Backend: /api/process<br>(type=form, SHA256_B, has_content=false)
Backend->>Cache: /api/cache/query<br>(client_id, SHA256_B, "form")
Cache->>Backend: 404 Not Found
Backend->>Handler: 202 Accepted (processing)
end
Worker->>OCR: /api/ocr/extract
OCR->>Worker: text
Worker->>LLM: /api/llm/enrich
LLM->>Worker: formatted_json
Worker->>Cache: /api/cache/store<br>(client_id, SHA256_B, "form", data)
Cache->>Worker: 201 Created
end
Handler->>Backend: /api/process<br>(type=form, SHA256_B, has_content=false)
Backend->>Cache: /api/cache/query<br>(client_id, SHA256_B, "form")
Cache->>Backend: 200 OK (data)
Backend->>Handler: 200 OK (result)
Handler->>FEDB: saveFormData("formB", data)
Handler->>Backend: /api/clear<br>(client_id, SHA256_B)
Backend->>Cache: /api/cache/clear<br>(client_id, SHA256_B, "form")
Cache->>Backend: 200 OK (cleared:1)
Backend->>Handler: 200 OK (cleared:1)
Handler->>User: processComplete
end
%% Fill Task C
rect rgba(255,230,200,0.5)
note over User: Fill Task C
User->>Handler:fillForm("formB")
Handler->>Backend: /api/process<br>(type=fill, content=encrypted_payload)
Backend->>Handler: 202 Accepted (processing)
Backend->>Worker: Start processing thread
par Polling and Processing
loop Polling
Handler->>Backend: /api/process<br>(type=fill, has_content=false)
Backend->>Cache: /api/cache/query<br>(client_id, "composite_sha", "fill")
Cache->>Backend: 404 Not Found
Backend->>Handler: 202 Accepted (processing)
end
Worker->>LLM: /api/llm/enrich
LLM->>Worker: filled_form
Worker->>Cache: /api/cache/store<br>(client_id, "composite_sha", "fill", data)
Cache->>Worker: 201 Created
end
Handler->>Backend: /api/process<br>(type=fill, has_content=false)
Backend->>Cache: /api/cache/query<br>(client_id, "composite_sha", "fill")
Cache->>Backend: 200 OK (data)
Backend->>Handler: 200 OK (result)
Handler->>FEDB: updateDocumentData(sha256_A, updates)
Handler->>Backend: /api/clear<br>(client_id, "composite_sha")
Backend->>Cache: /api/cache/clear<br>(client_id, "composite_sha", "fill")
Cache->>Backend: 200 OK (cleared:1)
Backend->>Handler: 200 OK (cleared:1)
Handler->>User: processComplete
end
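The submit-then-poll pattern that repeats in all three lanes of the diagram can be sketched as below. This is a minimal sketch, not the project's client code: `post` is a hypothetical callable standing in for an HTTP POST to /api/process, and `max_polls`, `delay_s`, and the `make_fake_backend` test double are assumptions introduced only for illustration.

```python
import time
from typing import Callable, Tuple

# Hypothetical transport: takes a JSON-like dict, returns (http_status, body).
Post = Callable[[dict], Tuple[int, dict]]


def submit_and_poll(post: Post, request: dict, max_polls: int = 10,
                    delay_s: float = 0.0) -> dict:
    """Submit once with content, then poll without content until a 200 arrives."""
    status, body = post({**request, "has_content": True})
    polls = 0
    while status == 202 and polls < max_polls:
        time.sleep(delay_s)
        # Poll with the same identifiers but without the encrypted payload.
        poll = {k: v for k, v in request.items()
                if k not in ("content", "aes_key")}
        status, body = post({**poll, "has_content": False})
        polls += 1
    if status != 200:
        raise RuntimeError(f"processing did not complete (last status {status})")
    return body


def make_fake_backend(ready_after: int) -> Post:
    """Test double: answers 202 for the first `ready_after` calls, then 200."""
    calls = {"n": 0}

    def post(payload: dict) -> Tuple[int, dict]:
        calls["n"] += 1
        if calls["n"] <= ready_after:
            return 202, {"status": "processing"}
        return 200, {"status": "completed", "result": "ENCRYPTED_RESULT_STRING"}

    return post
```

A real client would likely add a delay between polls and call /api/clear once the result has been saved locally, as the diagram shows.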
- User Frontend
  - Functionality: UI rendering and interaction
  - Implementation: Android Studio (API 33); built from scratch
- Camera/Gallery
  - Functionality: Image capture/selection
  - Implementation: Android Studio (API 33) and Gallery APIs
- Geo/Color Processor
  - Functionality: Image correction/enhancement
  - Implementation: Android Studio (API 33); built from scratch
- Document/Form Handler
  - Functionality: Workflow coordination
  - Implementation: Android Studio (API 33); built from scratch
- Frontend DB
  - Functionality: Local data persistence
  - Implementation: SQLite via Android Room
- Backend Server
  - Functionality: API routing
  - Implementation: Flask + Gunicorn
- Backend Worker
  - Functionality: Async processing
  - Implementation: Python threading
- Cache Server
  - Functionality: Temporary data storage
  - Implementation: Flask + Gunicorn + SQLite
- OCR Server
  - Functionality: Text extraction
  - Implementation: Flask + Gunicorn + CnOcr library
- Zhipu LLM (External Service)
  - Functionality: Data enrichment
  - Implementation: External API integration
Internal frontend APIs via function calls.
function captureImage(source: "camera" | "gallery"): Image
- Captures/selects image from device camera or gallery
- Returns raw image object
function correctGeometry(image: Image): Image
- Applies perspective correction and deskewing
- Returns geometrically corrected image
function enhanceColors(image: Image): Image
- Optimizes contrast, brightness and color balance
- Returns color-enhanced image
function processDocument(enhancedImage: Image): { encryptedDoc: string, sha256: string }
- Processes generic documents
- Returns the encrypted document (AES-encrypted content, with the AES key itself RSA-encrypted) and its SHA256 hash
function processForm(enhancedImage: Image, formType: string): { encryptedDoc: string, sha256: string }
function fillForm(formId: string): JSON
processForm:
- Processes structured forms using DB templates
- Returns encrypted document and SHA256 hash
fillForm:
- Fills the given form
// Document storage
function saveDocument(sha256: string, metadata: JSON): boolean
function getDocument(sha256: string): Document
function updateDocumentData(sha256: string, updates: JSON): boolean
// Form data storage
function saveFormData(formId: string, data: JSON): boolean
function getFormData(formId: string): JSON
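The intended semantics of the document storage calls can be illustrated with a small sketch. The app itself uses SQLite via Android Room, so the Python below only shows the behaviour; the `DocumentStore` class, its table layout, and the shallow-merge behaviour of updates are assumptions, not the project's schema.

```python
import json
import sqlite3
from typing import Optional


class DocumentStore:
    """Illustrative key-value store keyed by the document's SHA256."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS documents ("
            "sha256 TEXT PRIMARY KEY, metadata TEXT NOT NULL)"
        )

    def save_document(self, sha256: str, metadata: dict) -> bool:
        # The connection context manager commits on success.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO documents VALUES (?, ?)",
                (sha256, json.dumps(metadata)),
            )
        return True

    def get_document(self, sha256: str) -> Optional[dict]:
        row = self.db.execute(
            "SELECT metadata FROM documents WHERE sha256 = ?", (sha256,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def update_document_data(self, sha256: str, updates: dict) -> bool:
        current = self.get_document(sha256)
        if current is None:
            return False  # nothing to update
        current.update(updates)  # shallow merge (assumed behaviour)
        return self.save_document(sha256, current)
```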
Main entry point for processing requests and status checks.
Handles all document processing types (doc/form/fill) through a single interface.
Request Body (JSON):
Key | Type | Required | Description |
---|---|---|---|
`client_id` | String (UUID) | Yes | Client identifier |
`type` | String | Yes | Processing type: "doc", "form", or "fill" |
`SHA256` | String | Yes | SHA256 hash computed as per rules below |
`has_content` | Boolean | Yes | Indicates whether content payload is included |
`content` | String(base64(AES(actual_json_string))) | No | Required when has_content=true: base64(AES(actual_json_string)) |
`aes_key` | String(RSA(real_aes_key)) | No | Required when has_content=true: RSA(real_aes_key) |
SHA256 Computation:
SHA256( content_string )
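As a minimal sketch, the hash can be computed with Python's standard `hashlib`. The UTF-8 encoding and the lowercase hex output are assumptions; the text above only specifies SHA256 over the content string.

```python
import hashlib


def content_sha256(content_string: str) -> str:
    """Return the hex SHA256 digest of the content string (UTF-8 encoded)."""
    return hashlib.sha256(content_string.encode("utf-8")).hexdigest()
```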
Content Payload Structure (After 1. base64 decoding and then 2. AES decryption):
{
"to_process": ["base64_img1", "base64_img2"], // For doc/form
"to_process": form_obj, // For fill
"file_lib": {
"docs": [doc_obj_1, doc_obj_2, ...],
"forms": [form_obj_1, form_obj_2, ...]
}
}
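Building the plaintext payload for a doc or form request might look like the sketch below. It covers only the JSON structure shown above; the AES encryption and base64 wrapping that produce the final `content` field are deliberately left out, and the `build_doc_payload` helper name is hypothetical.

```python
import base64
import json
from typing import List


def build_doc_payload(images: List[bytes], docs: list, forms: list) -> str:
    """Serialize images and the file library into the plaintext payload JSON."""
    payload = {
        # Each image is sent as a base64 string.
        "to_process": [base64.b64encode(img).decode("ascii") for img in images],
        "file_lib": {"docs": docs, "forms": forms},
    }
    return json.dumps(payload)
```

For a fill request, `to_process` would instead hold the form object, as noted in the structure above.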
Validation:
- `has_content=true` requires the `content` field (else 400)
- Computed SHA256 must match the provided `SHA256` (else 400)
- Backend decrypts `aes_key` using its private RSA key to get the real AES key.
- Backend decrypts `content` using base64 decoding and then decryption with the real AES key.
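The first two validation rules can be sketched as a pure function. This is an illustration under assumptions, not the backend's actual code: it assumes the hash is taken over the `content` field exactly as transmitted, it also requires `aes_key` alongside `content` (per the request table), and the `validation_error` name and error messages are hypothetical. The RSA and AES decryption steps are not shown.

```python
import hashlib
from typing import Optional


def validation_error(body: dict) -> Optional[str]:
    """Return a message for a 400 response, or None if the request passes."""
    for key in ("client_id", "type", "SHA256", "has_content"):
        if key not in body:
            return f"missing field: {key}"
    if body["type"] not in ("doc", "form", "fill"):
        return "invalid type"
    if body["has_content"]:
        if "content" not in body or "aes_key" not in body:
            return "content and aes_key are required when has_content is true"
        # Assumption: the digest covers the content string as received.
        digest = hashlib.sha256(body["content"].encode("utf-8")).hexdigest()
        if digest != body["SHA256"]:
            return "SHA256 mismatch"
    return None
```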
Response:
{
"status": "processing|completed|error",
"error_detail": "Description", // Only for error status
"result": "base64(AES(actual json string))" // Only for completed status, the content is
}
Result Structures (after decryption):
// Doc type
{
"title": "a few words",
"tags": ["array", "of", "words"],
"description": "a few sentences",
"kv": {
"key1": "value1",
"key2": "value2" // Extracted key-value pairs
},
"related": [ // array of related docs
{"type": "xxx", "resource_id": "xxx"}
]
}
// Form type
{
"title": "a few words",
"tags": ["array", "of", "words"],
"description": "a few sentences",
"kv": {
"key1": "value1",
"key2": "value2" // Extracted key-value pairs
},
"fields": ["field1", "field2"],
"related": [ // array of related docs
{"type": "xxx", "resource_id": "xxx"}
]
}
// Fill type
{ // only includes fields that have a match in file_lib; it is okay if not all fields appear here
"field1": {
"value": "value1",
"source": {"type": "xxx", "resource_id": "xxx"} // type is either doc or form, and resource_id is uuid
},
"field2": {
"value": "value2",
"source": {"type": "xxx", "resource_id": "xxx"}
}
}
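Because a fill result may omit fields that had no match, a client has to tolerate missing keys. A minimal sketch of that handling follows; the `apply_fill_result` helper is hypothetical and not part of the documented API.

```python
from typing import Dict, List, Optional


def apply_fill_result(fields: List[str],
                      fill_result: Dict[str, dict]) -> Dict[str, Optional[str]]:
    """Map every form field to its filled value, or None if no match came back."""
    return {name: fill_result.get(name, {}).get("value") for name in fields}
```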
Status Codes:
Code | Description |
---|---|
200 | Result available (status=completed) |
202 | Processing in progress (status=processing) |
400 | Invalid input/SHA256 mismatch/SHA256 not recognized |
500 | Internal server error |
Example Request:
{
"client_id": "550e8400-e29b-41d4-a716-446655440000",
"type": "doc",
"SHA256": "9f86d081...b4b9a5",
"has_content": true,
"aes_key": "rsa encrypted",
"content": "base64(AES(actual json string))"
}
Example Response:
{
"status": "completed",
"result": { // Decrypted
"title": "Lease Agreement",
"tags": ["legal", "contract"],
"description": "Standard residential lease agreement for 12 months",
"kv": {
"landlord": "Jane Smith",
"tenant": "John Doe",
"term": "12 months"
},
"related": [
{"type": "form", "resource_id": "that form's uuid"}
]
}
}
Clears processing results from the system.
Request Body (JSON):
Key | Type | Required | Description |
---|---|---|---|
`client_id` | String (UUID) | Yes | Client identifier |
`type` | String | No | Processing type: "doc", "form", or "fill" |
`SHA256` | String | No | Specific document hash to clear |
Response:
{
"status": "ok"
}
Status Codes:
Code | Description |
---|---|
200 | Clearance successful |
400 | Missing client_id |
500 | Internal clearance error |
Stores and retrieves encrypted processing results using composite keys `(client_id, SHA256, type)`.
The client (app) should not directly call this. This should be called by the backend server.
Retrieves cached processing results.
Query Parameters:
Key | Type | Required | Description |
---|---|---|---|
`client_id` | String (UUID) | Yes | Client identifier |
`SHA256` | String | Yes | Document hash |
`type` | String | Yes | `doc`, `form`, or `fill` |
Response:
// Success (200)
{"data": "ENCRYPTED_RESULT_STRING"}
// Not found (404)
{"error": "Cache entry missing"}
Stores processing results in cache.
Request Body (JSON):
Key | Type | Required | Description |
---|---|---|---|
`client_id` | String (UUID) | Yes | Client identifier |
`type` | String | Yes | Processing type: "doc", "form", or "fill" |
`SHA256` | String | Yes | Document hash |
`data` | String | Yes | Encrypted result data |
Response: 201 Created
(Empty body)
Clears cached entries.
Request Body (JSON):
Key | Type | Required | Description |
---|---|---|---|
`client_id` | String (UUID) | Yes | Client identifier |
`type` | String | No | Processing type: "doc", "form", or "fill" |
`SHA256` | String | No | Specific document hash to clear |
Response:
{
"status": "ok"
}
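The query, store, and clear operations over the composite key `(client_id, SHA256, type)` can be sketched with the standard `sqlite3` module. This is one possible design under assumptions, not the cache server's actual schema: the `ResultCache` class and table layout are hypothetical, and the HTTP layer is omitted.

```python
import sqlite3
from typing import Optional


class ResultCache:
    """Illustrative cache keyed by (client_id, sha256, type)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "client_id TEXT, sha256 TEXT, type TEXT, data TEXT, "
            "PRIMARY KEY (client_id, sha256, type))"
        )

    def store(self, client_id: str, sha256: str, type_: str, data: str) -> None:
        with self.db:  # commits on success
            self.db.execute(
                "INSERT OR REPLACE INTO cache VALUES (?, ?, ?, ?)",
                (client_id, sha256, type_, data),
            )

    def query(self, client_id: str, sha256: str, type_: str) -> Optional[str]:
        row = self.db.execute(
            "SELECT data FROM cache WHERE client_id = ? AND sha256 = ? AND type = ?",
            (client_id, sha256, type_),
        ).fetchone()
        return row[0] if row else None  # None corresponds to a 404

    def clear(self, client_id: str, sha256: Optional[str] = None,
              type_: Optional[str] = None) -> int:
        """Delete matching entries; optional filters narrow the match."""
        sql, params = "DELETE FROM cache WHERE client_id = ?", [client_id]
        if sha256 is not None:
            sql += " AND sha256 = ?"
            params.append(sha256)
        if type_ is not None:
            sql += " AND type = ?"
            params.append(type_)
        with self.db:
            return self.db.execute(sql, params).rowcount
```

Leaving `sha256` and `type_` unset clears every entry for the client, which mirrors both fields being optional in the clear request.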
Performs text extraction from images.
The client (app) should not directly call this. This should be called by the backend server.
Request Body (JSON):
Key | Type | Required | Description |
---|---|---|---|
`image_data` | String (Base64) | Yes | Decrypted image |
Response:
{
"text": "Extracted document text..."
}
Status Code: 200 OK
Format OCR data using LLM.
Chinese/English OCR tool for text recognition.
- API Documentation: https://github.com/JI-DeepSleep/DocuSnap-Frontend/blob/main/README.md