
Commit 8bbcb38

committed
Add basic docs
1 parent 049c348 commit 8bbcb38

27 files changed

+1427
-38
lines changed

docs/Architecture-Hosted.md

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
1+
This page outlines the architecture and deployment features of the BC Gov Hosted COMS service. It is mainly intended for a technical audience, and for people who want to have a better understanding of how we have the service deployed.
2+
3+
**Note:** For more details of the COMS application itself and how it works, see the [Architecture](Architecture) overview.
4+
5+
## Table of Contents
6+
7+
- [Infrastructure](#infrastructure)
8+
- [High Availability](#high-availability)
9+
- [Network Connectivity](#network-connectivity)
10+
- [Database Connection Pooling](#database-connection-pooling)
11+
- [Horizontal Autoscaling](#horizontal-autoscaling)
12+
13+
## Infrastructure
14+
15+
The BC Govt. Hosted COMS service runs on the OpenShift container ecosystem. The following diagram provides a general logical overview of main component relations. Main network traffic flows are shown in fat arrows, while secondary network traffic relations are shown with a simple black line.
16+
17+
![Hosted COMS Architecture](images/coms_architecture.png)
18+
19+
**Figure 1 - The general infrastructure and network topology of the BC Govt. hosted COMS**
20+
21+
### High Availability
22+
23+
The COMS API and Database are both designed to be highly available within an OpenShift environment. The Database achieves high availability by leveraging [Patroni](https://patroni.readthedocs.io/en/latest/). COMS is designed to be a scalable and atomic microservice. On the OCP4 platform, between 2 and 16 replicas of the COMS microservice run at any time, depending on service load. This allows the service to reliably handle a wide range of request volumes and scale resources appropriately.
24+
25+
### Network Connectivity
26+
27+
In general, all network traffic enters through the BC Govt. API Gateway. A specifically tailored Network Policy rule exists to allow only the network traffic we expect to receive from the API Gateway. When a client connects to the COMS API, the connection goes through OpenShift's router and load balancer before landing on the API Gateway. That connection then gets forwarded to one of the COMS API pod replicas. Figure 1 represents the general network traffic direction with the outlined fat arrows. The direction of those arrows shows which component initiates the TCP/IP connection.
28+
29+
COMS uses a database connection pool to maintain persistent database connections. Pooling allows the service to avoid the overhead of a TCP/IP 3-way handshake each time it needs a connection. By reusing existing connections in the pool, we reduce per-transaction overhead and improve network efficiency. We pool connections from COMS to Patroni within our architecture. The OpenShift load balancer follows general default Kubernetes scheduling behavior.
30+
31+
### Database Connection Pooling
32+
33+
We introduced network pooling for Patroni connections to mitigate network traffic overhead. As our volume of traffic increased, it became expensive to create and destroy network connections for each transaction. While low volumes of traffic are capable of operating without any notable delay to the user, we started encountering issues when scaling up and improving total transaction flow within COMS.
34+
35+
By reusing connections whenever possible, we were able to avoid the TCP/IP 3-way handshake done on every new connection. Instead, we could leverage existing connections to pipeline traffic and improve general efficiency. We observed up to almost a 3x increase in total transaction volume after switching to pooling.
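The benefit of pooling can be illustrated with a minimal, language-neutral sketch in Python. It is not the COMS implementation: `FakeConnection`, `ConnectionPool`, and the pool size of 4 are hypothetical names and values chosen for illustration. The point is only that the number of connections opened stays fixed no matter how many transactions run.

```python
import queue


class FakeConnection:
    """Stand-in for a database connection; counts how many were opened."""

    opened = 0

    def __init__(self):
        # Opening a real connection is where the TCP handshake cost is paid.
        FakeConnection.opened += 1


class ConnectionPool:
    """Tiny fixed-size pool: connections are created once and then reused."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection())

    def acquire(self):
        # Blocks until a connection is free instead of opening a new one.
        return self._idle.get()

    def release(self, conn):
        self._idle.put(conn)


def run_transactions(pool, count):
    for _ in range(count):
        conn = pool.acquire()
        try:
            pass  # a real service would run its query on `conn` here
        finally:
            pool.release(conn)


pool = ConnectionPool(size=4)  # hypothetical pool size
run_transactions(pool, count=1000)
print(FakeConnection.opened)  # 4 connections opened for 1000 transactions
```

Without a pool, the same loop would open and close one connection per transaction, which is the overhead described above.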
36+
37+
### Horizontal Autoscaling
38+
39+
In order to make sure our application can horizontally scale (run many copies of itself), we had to ensure that all processes in the application are self-contained and atomic. Since we do not have any guarantees of which pod instance would be handling what task at any specific moment, the only thing we can do is to ensure that every unit of work is clearly defined and atomic so that we can prevent situations where there is deadlock, or double executions.
40+
41+
While implementing Horizontal Autoscaling is relatively simple by using a [Horizontal Pod Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) construct in OpenShift, we can only take advantage of it if the application is able to handle the different types of lifecycles. Based on usage metrics such as CPU and memory load, the HPA can increase or decrease the number of replicas on the platform in order to meet the demand.
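As a rough sketch of how the autoscaler arrives at a replica count, the Kubernetes documentation describes the core calculation as `ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, clamped to the configured minimum and maximum. The Python below reproduces that arithmetic only. The metric values are made-up examples, the function name is hypothetical, and the real HPA also applies tolerances and stabilization windows that are omitted here. The bounds of 2 and 16 are the ones stated on this page.

```python
import math

MIN_REPLICAS = 2   # lower bound stated on this page
MAX_REPLICAS = 16  # upper bound stated on this page


def desired_replicas(current_replicas, current_metric, target_metric):
    """Core HPA ratio calculation, clamped to the configured bounds."""
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    return max(MIN_REPLICAS, min(MAX_REPLICAS, raw))


# Example: 4 pods averaging 90% CPU against a (hypothetical) 60% target.
print(desired_replicas(4, current_metric=90, target_metric=60))    # 6
# Heavy load never scales past the cap.
print(desired_replicas(16, current_metric=120, target_metric=60))  # 16
# Light load never scales below the floor.
print(desired_replicas(2, current_metric=10, target_metric=60))    # 2
```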
42+
43+
In our testing, we were able to reliably scale up to around 17 pods before our Patroni database began to crash. While we have not been able to isolate the cause, we suspect either that the underlying Postgres database can only handle up to 100 concurrent connections (and is thus ignoring Patroni's max connection limit of 500), or that the database containers are simply running out of memory before being able to handle more connections. For this reason, we decided to cap our HPA at a maximum of 16 pods at this time.
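The connection-limit hypothesis can be checked with simple arithmetic: total database connections are roughly the number of pods multiplied by each pod's maximum pool size. The sketch below assumes a per-pod pool maximum of 6, which is a purely hypothetical value picked so the numbers are easy to follow. The actual pool size is not stated on this page and should be read from the deployment configuration.

```python
POSTGRES_MAX_CONNECTIONS = 100  # suspected effective limit mentioned above
POOL_MAX_PER_POD = 6            # hypothetical; not stated on this page


def total_connections(pods, pool_max=POOL_MAX_PER_POD):
    """Worst case: every pod has filled its connection pool."""
    return pods * pool_max


def max_safe_pods(limit=POSTGRES_MAX_CONNECTIONS, pool_max=POOL_MAX_PER_POD):
    """Largest pod count whose worst-case connections stay within the limit."""
    return limit // pool_max


print(total_connections(16))  # 96, under the suspected limit
print(total_connections(17))  # 102, over the suspected limit
print(max_safe_pods())        # 16
```

Plugging in the real pool size and the connection limit reported by the database would show whether this explanation fits the observed behaviour or whether memory pressure is the more likely cause.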
44+
45+
Our current limiting factor for scaling higher is the number of connections our database can support. If we need to scale past 16 pods, we will need to consider more managed solutions for pooling database connections, such as [PgBouncer](https://www.pgbouncer.org/).

docs/Architecture.md

Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,57 @@
1+
This page outlines the general architecture and design principles of COMS. It is mainly intended for a technical audience, and for people who want to have a better understanding of how the system works.
2+
3+
## Table of Contents
4+
5+
- [Infrastructure](#infrastructure)
6+
- [Database Structure](#database-structure)
7+
- [Code Design](#code-design)
8+
9+
## Infrastructure
10+
11+
![COMS Architecture](images/coms_self_architecture.png)
12+
13+
**Figure 1 - The general infrastructure and network topology of COMS**
14+
15+
## Database Structure
16+
17+
The PostgreSQL database is written and handled via managed, code-first migrations. We generally store tables containing users, objects, buckets, permissions, and how they relate to each other. As COMS is a back-end microservice, lines of business can leverage COMS without being tied to a specific framework or language. The following figures depict the database schema structure as of April 2023 for the v0.4.0 release.
18+
19+
![COMS Public ERD](images/coms_erd_public.png)
20+
21+
**Figure 3 - The public schema for a COMS database**
22+
23+
Database design focuses on simplicity and succinctness. It effectively tracks the user, the object, the bucket, the permissions, and how they relate to each other. We enforce foreign key integrity by invoking onUpdate and onDelete cascades in Postgres. This ensures that we do not have dangling references when entries are removed from the system. Metadata and tags are represented as many-to-many relationships to maximize reverse search speed.
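The effect of cascading foreign keys can be demonstrated with a small self-contained sketch. It uses Python's built-in SQLite driver as a stand-in for Postgres, and the table and column names (`object`, `object_permission`, `perm_code`) are hypothetical rather than the actual COMS schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on;
# Postgres enforces them by default.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE object (id TEXT PRIMARY KEY);
    CREATE TABLE object_permission (
        id INTEGER PRIMARY KEY,
        object_id TEXT NOT NULL
            REFERENCES object (id) ON UPDATE CASCADE ON DELETE CASCADE,
        perm_code TEXT NOT NULL
    );
    INSERT INTO object (id) VALUES ('obj-1');
    INSERT INTO object_permission (object_id, perm_code) VALUES ('obj-1', 'MANAGE');
    """
)

# Deleting the parent row removes the dependent permission row as well,
# so no dangling reference is left behind.
conn.execute("DELETE FROM object WHERE id = 'obj-1'")
remaining = conn.execute("SELECT COUNT(*) FROM object_permission").fetchone()[0]
print(remaining)  # 0
```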
24+
25+
![COMS Audit ERD](images/coms_erd_audit.png)
26+
27+
**Figure 4 - The audit schema for a COMS database**
28+
29+
We use a generic audit schema table to track any update and delete operations done on the database. This table is only modified by the database via table triggers, and is not normally accessible by the COMS application itself. This should meet most general security, tracking and auditing requirements.
30+
31+
## Code Design
32+
33+
COMS is a relatively small and compact microservice with a very focused approach to handling and managing objects. However, not all design choices are self-evident just from inspecting the codebase. The following section will cover some of the main reasons why the code was designed the way it is.
34+
35+
### Organization
36+
37+
The code in COMS follows a simple, layered structure based on best practice recommendations for Express, Node, and ES6 coding styles. The application has the following discrete layers:
38+
39+
| Layer | Purpose |
40+
| ---------- | -------------------------------------------------------------------------------------------- |
41+
| Controller | Contains Express controller logic for determining what services to invoke and in what order |
42+
| DB | Contains the direct database table model definitions and typical modification queries |
43+
| Middleware | Contains middleware functions for handling authentication, authorization and feature toggles |
44+
| Routes | Contains defined Express routes for defining the COMS API shape and invokes controllers |
45+
| Services | Contains logic for interacting with either S3 or the Database for specific tasks |
46+
| Validators | Contains logic which examines and enforces incoming request shapes and patterns |
47+
48+
Each layer is designed to focus on one specific aspect of business logic. Calls between layers are designed to be deliberate, scoped, and contained. This hopefully makes it easier to tell at a glance what each piece of code is doing and what it depends on. For example, the validation layer sits between the routes and controllers. It ensures that incoming network calls are properly formatted before proceeding with execution.
49+
50+
#### Middleware
51+
52+
COMS middleware focuses on ensuring that the appropriate business logic filters are applied as early as possible. Concerns such as feature toggles, authentication and authorization are handled here. Express executes middleware in the order in which it is registered. It will sequentially execute each function and then invoke the next callback as a part of its call stack. Because of this, we must ensure that the order in which we introduce and execute our middleware adheres to the following pattern:
53+
54+
1. Run the `require*` middleware functions first (these generally invoke the middleware found in `featureToggle.js`)
55+
2. Validation and structural checks
56+
3. Permission and authorization checks
57+
4. Any remaining middleware hooks before invoking the controller
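The ordering above can be sketched as a simple chain that stops at the first rejection. This is a language-neutral illustration in Python, not the Express code used by COMS. All function names, request fields, and rejection messages are hypothetical.

```python
def require_feature(request):
    # 1. require* style check (feature toggle)
    return None if request.get("feature_enabled") else "rejected: feature disabled"


def validate(request):
    # 2. validation and structural checks
    return None if "objectId" in request else "rejected: invalid request"


def check_permission(request):
    # 3. permission and authorization checks
    return None if request.get("can_read") else "rejected: not permitted"


# Registration order matters: cheap, broad filters run before costly ones.
CHAIN = [require_feature, validate, check_permission]


def run_chain(request, middleware=CHAIN):
    """Run each middleware in registration order; stop at the first rejection."""
    for step in middleware:
        error = step(request)
        if error is not None:
            return error
    return "controller invoked"


print(run_chain({"feature_enabled": True, "objectId": "abc", "can_read": True}))
print(run_chain({"feature_enabled": False, "objectId": "abc", "can_read": True}))
```

Because the feature toggle check runs first, a request for a disabled feature is rejected before any validation or permission lookup is attempted.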

docs/Authentication.md

Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
1+
This page describes how to authenticate requests to the COMS API. The [Authentication Modes](Configuration#authentication-modes) must be enabled in the COMS configuration.
2+
3+
**Note:** The BC Gov Hosted COMS service only allows OIDC Authentication using JWTs issued by the [Pathfinder SSO `standard` keycloak realm](https://github.com/bcgov/sso-keycloak/wiki#standard-service).
4+
5+
## OIDC Authentication
6+
7+
With [OIDC mode](Configuration#oidc-keycloak) enabled, requests to the COMS API can be authenticated using a **User ID token** (JWT) issued by an OIDC authentication realm. The JWT should be added in an Authorization header (type `Bearer` token).
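A minimal sketch of attaching the token to a request, using only the Python standard library. The host name and the token value are placeholders, and `build_request` is a hypothetical helper. The request is constructed but not sent.

```python
import urllib.request


def build_request(url, jwt):
    """Attach the user's JWT as a Bearer token in the Authorization header."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {jwt}"})


# Placeholder host and token; substitute your COMS base URL and a real JWT.
req = build_request("https://coms.example.invalid/api/v1/object", "<user-id-token>")
print(req.get_header("Authorization"))  # Bearer <user-id-token>
```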
8+
9+
COMS will only accept JWTs issued by one OIDC realm (specified in the COMS config). JWTs are typically issued to an application and saved to a user's browser when the user signs in to a website through the [Authorization Code Flow](https://openid.net/specs/openid-connect-core-1_0.html#CodeFlowAuth). Both the website (client app) and the instance of COMS must be [configured to use the same OIDC authentication realm](https://github.com/bcgov/common-object-management-service/blob/master/app/README.md#keycloak-variables) in order for the JWT to be valid.
10+
11+
When COMS receives the request, it will validate the JWT (by calling the OIDC realm's token endpoint). The JWT is a reliable way of verifying the user's identity, on which the COMS permission model is based.
12+
13+
The authentication when downloading an object also uses S3 pre-signed URLs:
14+
15+
### Authentication flow for readObject
16+
17+
See the [API Specification](https://coms.api.gov.bc.ca/api/v1/docs#tag/Object/operation/readObject) for more details.
18+
19+
A common use case for COMS is to download a specific object from object storage.
20+
Depending on the `download` mode specified in the request, the COMS `readObject` endpoint will return one of the following:
21+
22+
1. The file directly from S3, by first doing an HTTP 302 redirect to a temporary pre-signed S3 object URL
23+
2. The file streamed/proxied through COMS
24+
3. The temporary pre-signed S3 object URL itself
25+
26+
COMS uses the redirect flow by default because it avoids unnecessary network hops. For very large object transfers, redirection also has the added benefit of maximizing COMS microservice availability. Since the large transfer does not pass through COMS, the service remains free to handle other client requests.
27+
28+
![COMS Network Flow](images/coms_network_flow.png)
29+
30+
**Figure 2 - The general network flow for a typical COMS object request**
31+
32+
## Basic Auth
33+
34+
If [Basic Auth Mode](Configuration#basic-auth) is enabled in your COMS instance, requests to the COMS API can be authenticated using an HTTP Authorization header (type `Basic`) containing the username and password configured in COMS.
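For illustration, a `Basic` Authorization header is the base64 encoding of `username:password`, as defined by the HTTP Basic scheme. The sketch below uses placeholder credentials, and `basic_auth_header` is a hypothetical helper name.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Placeholder credentials; use the values configured in your COMS instance.
print(basic_auth_header("user", "pass"))  # Basic dXNlcjpwYXNz
```

Base64 is an encoding, not encryption, so this header should only ever be sent over HTTPS.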
35+
36+
This mode offers more direct access for a 'service account' authorized in the scope of the application, rather than for a specific user, and bypasses the COMS object/bucket permission model.
37+
38+
Basic Auth mode is not available on the BC Gov hosted COMS service.
39+
40+
## Unauthenticated Mode
41+
42+
[Unauthenticated Mode](Configuration#unauthenticated-auth) is generally recommended when you expect to run COMS in a highly secured network environment and are not concerned about access control to objects because another application already handles it.

docs/Buckets.md

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
1+
2+
### Configuring Buckets
3+
4+
- COMS is [configured with a 'default' bucket](Configuration#object-storage). Various object management endpoints will use this bucket if no `bucketId` parameter is provided. (**Note:** the default bucket fall-back behaviour is not available in the BC Gov Hosted COMS service.)
5+
6+
- Additional buckets can be added to the COMS system using the [createBucket](https://coms.api.gov.bc.ca/api/v1/docs#tag/Bucket/operation/createBucket) endpoint.
7+
8+
- When a bucket is created, if the createBucket API request is authenticated with a User ID token (JWT), that user will be granted all [5 permissions](Permissions#permission-codes). Bucket Permissions can be granted to other users ([bucketAddPermissions](https://coms.api.gov.bc.ca/api/v1/docs#tag/Permission/operation/bucketAddPermissions)), if the request is authenticated with a JWT for a user with `MANAGE` permission.
9+
10+
If you are self-hosting COMS you can also manage permissions for any object or bucket by using these endpoints with [basic authentication](Authentication#basic-auth).
11+
12+
### Using the Bucket **Key**
13+
14+
When you create a bucket in COMS, technically you are 'mounting' your S3 bucket (actual bucket provisioned) at a specified path in the `key` property of the [createBucket](https://coms-dev.api.gov.bc.ca/api/v1/docs#tag/Bucket/operation/createBucket) request body.
15+
16+
COMS will only operate with objects at that 'folder' within the actual bucket. A COMS `bucket` can more accurately be thought of as a 'mount' to a single path within a bucket.
17+
18+
To work with objects in 'sub-folders' (with other prefixes), you can create multiple COMS 'buckets' mounted at different paths by specifying different keys.
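The mount idea can be sketched as a lookup from an object's S3 key to the COMS 'bucket' whose `key` matches its parent path. This assumes, as the text above suggests, that a mount covers only the objects directly at that path and not those in deeper 'sub-folders'. The function name and example keys are hypothetical, and the actual matching rules in COMS may differ.

```python
import posixpath


def mount_for(object_key, mount_keys):
    """Return the mount key whose path is the object's parent 'folder', if any."""
    parent = posixpath.dirname(object_key.strip("/"))
    for mount in mount_keys:
        if mount.strip("/") == parent:
            return mount
    return None


mounts = ["photos", "photos/2023"]  # two COMS 'buckets' on the same S3 bucket
print(mount_for("photos/a.png", mounts))       # photos
print(mount_for("photos/2023/b.png", mounts))  # photos/2023
print(mount_for("docs/c.pdf", mounts))         # None
```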

docs/Endpoint-Notes.md

Lines changed: 68 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,68 @@
1+
This page outlines the general usage patterns and organization of the COMS API. This article is intended for a technical audience, and for people who are planning on using the API endpoints.
2+
3+
**The COMS API is documented using the [Open API Specification](https://coms.api.gov.bc.ca/api/v1/docs)**
4+
5+
## Table of Contents
6+
7+
- [Bucket](#bucket)
8+
- [Object](#object)
9+
- [Metadata](#metadata)
10+
- [Tag](#tag)
11+
- [Versions](#versions)
12+
- [Permission](#permission)
13+
- [Sync](#sync)
14+
- [User](#user)
15+
16+
## Bucket
17+
18+
Bucket operations offer the usual CRUD operations for bucket resource management. For example:
19+
20+
- `CREATE /bucket` and `PATCH /bucket/{bucketId}` will pre-emptively check to see if the proposed credential changes represent a network-accessible bucket. These endpoints will yield an error if they are unable to validate the bucket.
21+
22+
## Object
23+
24+
Object endpoints directly influence and manipulate S3 objects and information inherent to them. These endpoints serve as the main core of COMS, focusing on CRUD operations for the objects themselves.
25+
26+
- Uploading (`POST /object`) or updating (`POST /object/{objectId}`) an object accepts a file in a multipart/form-data body. You can include metadata (via headers) and tags (using query params) in this request.
27+
- `GET /object/{objectId}` is the main endpoint for users to directly access and download a single object.
28+
- `HEAD /object/{objectId}` should be used for situations where you need to get information about the object, but do not want the binary stream of the object itself.
29+
- `DELETE /object/{objectId}` deletes either the object or a specific version of the object. COMS follows the S3 standard for [deleting versioned objects](https://docs.aws.amazon.com/AmazonS3/latest/userguide/DeletingObjectVersions.html).
30+
- If versioning is enabled, calling `DELETE /object/{objectId}` is a soft-delete that adds a 'delete-marker' version. To restore the object, remove the delete-marker with `DELETE /object/{objectId}?versionId={VersionId of delete-marker}`. To hard-delete a versioned object, you must delete the last version with `DELETE /object/{objectId}?versionId={last version}`.
31+
- Calling the delete endpoint on a bucket without versioning is a hard-delete.
32+
- The `GET /object` search and `PATCH /object/{objectId}/public` public toggle require a backing database in order to function.
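The soft-delete and restore behaviour described in the list above can be modelled with a small in-memory sketch. It is a simplified illustration of S3-style versioning semantics, not COMS or S3 code. The `VersionedObject` class and its version ids are hypothetical.

```python
import itertools


class VersionedObject:
    """Simplified model of one object in a versioning-enabled bucket."""

    def __init__(self):
        self._ids = itertools.count(1)
        # Oldest first; each entry is (version_id, is_delete_marker).
        self.versions = []

    def put(self):
        version_id = f"v{next(self._ids)}"
        self.versions.append((version_id, False))
        return version_id

    def delete(self, version_id=None):
        if version_id is None:
            # Soft delete: nothing is removed, a delete marker is added on top.
            marker_id = f"v{next(self._ids)}"
            self.versions.append((marker_id, True))
            return marker_id
        # Deleting a specific version (or a marker) removes it permanently.
        self.versions = [v for v in self.versions if v[0] != version_id]
        return version_id

    def is_visible(self):
        # The object is readable only if its newest version is not a marker.
        return bool(self.versions) and not self.versions[-1][1]


obj = VersionedObject()
first = obj.put()
marker = obj.delete()            # soft delete
print(obj.is_visible())          # False
obj.delete(version_id=marker)    # removing the marker restores the object
print(obj.is_visible())          # True
obj.delete(version_id=first)     # deleting the last version is a hard delete
print(obj.versions)              # []
```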
33+
34+
### Metadata
35+
36+
Metadata operation endpoints directly focus on the manipulation of metadata of S3 Objects. Each endpoint will create a copy of the object with the modified metadata attached.
37+
38+
More details found here: [Metadata and Tags](Metadata-Tag)
39+
40+
### Tag
41+
42+
Tag operation endpoints directly focus on the manipulation of tags of S3 Objects. Unlike Metadata, Tags can be modified without the need to create new versions of the object.
43+
44+
More details found here: [Metadata and Tags](Metadata-Tag)
45+
46+
### Versions
47+
48+
Version-specific operations focus on listing and discovering versioning information known by COMS. While the majority of version-specific operations are available as query parameters in the Object endpoints, the `GET /object/{objectId}/version` endpoint focuses on letting users discover and list what versions are available to work with.
49+
50+
## Permission
51+
52+
Permission operation endpoints directly focus on associating users to objects with specific permissions. All of these endpoints require a database to function. Existing permissions can be searched for using `GET /permission/object` and `GET /permission/bucket`, and standard create, read and delete operations for permissions exist to allow users to modify access control for specific objects they have management permissions over.
53+
54+
More details found here: [Permissions](Permissions)
55+
56+
## Sync
57+
58+
*Available in COMS v0.7+*
59+
60+
Sync endpoints allow synchronizing COMS' internal state with that of the actual S3 bucket/object. This can be useful for setting up an S3 bucket with preexisting files for use with COMS without having to re-upload everything through the COMS API, or for synchronizing changes made through an external S3 client (e.g. S3 Browser, Cyberduck, etc.) to an object already managed by COMS.
61+
62+
API calls to the sync endpoints do not immediately add all detected changes to COMS' internal database; instead, they are added to a queue where they are eventually processed. The endpoint `GET /sync/status` returns the number of items that are currently sitting in this queue.
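The queued behaviour can be pictured with a minimal in-memory sketch. The `SyncQueue` class and its method names are hypothetical and only illustrate the idea that a sync request records work for later, while a status check reports how much is still waiting. The actual COMS queue implementation may differ.

```python
from collections import deque


class SyncQueue:
    """In-memory stand-in for the queue of pending sync jobs."""

    def __init__(self):
        self._pending = deque()

    def enqueue(self, path):
        # A sync request only records the work; nothing is processed yet.
        self._pending.append(path)

    def status(self):
        # Mirrors the idea of `GET /sync/status`: how many items are waiting.
        return len(self._pending)

    def process_next(self):
        # A background worker would call this to handle one queued item.
        return self._pending.popleft() if self._pending else None


jobs = SyncQueue()
jobs.enqueue("photos/a.png")
jobs.enqueue("photos/b.png")
print(jobs.status())   # 2
jobs.process_next()
print(jobs.status())   # 1
```

A client that needs to know when a sync has finished could poll the status endpoint until the reported count drops to zero.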
63+
64+
At the time of writing, synchronization is not done automatically, so the sync endpoints must be used in order for COMS to know of any changes to the bucket/object.
65+
66+
## User
67+
68+
User operation endpoints focus on exposing known tracked users and identity providers. These endpoints serve as a reference point for finding the right user and identity to manipulate in the Permission endpoints. As COMS is relatively agnostic to how a user logs in (it only cares that you exist), the onus of determining which identity provider a user uses falls onto the line of business to handle, should that be something that needs monitoring.
