docs/dev/custom-domains.md
Lines changed: 26 additions & 30 deletions
@@ -1,31 +1,30 @@
1
-
# The Hitchhiker's Guide to Custom Domains
2
1
Lets territory owners attach a domain to their territories.
3
2
TODO: change the title
4
3
5
4
Index:
6
5
TODO
7
6
8
7
# Middleware
9
-
Every time we hit a custom domain, middleware checks if it's allowed via a cached list of `ACTIVE` domains, coupled with their `subName`.
10
-
If it's allowed, we redirect and rewrite to give custom domains a seamless territory-centered SN experience.
8
+
Every time we hit a custom domain, our middleware:
9
+
- Looks up a cached map of `ACTIVE` custom domains and their `subName`
10
+
- Redirects and rewrites URLs to provide a seamless territory-specific SN experience.
11
11
##### Main middleware
12
-
Referral cookies and security headers gets applied the same way as before on SN, with the exception of being their own functions, so that now we can apply them also to the customDomainMiddleware resulting response.
13
-
##### customDomainMiddleware
14
-
A `x-stacker-news-subname` header with the `subName` is injected into the request headers to give the SN code awareness of the territory attached to a custom domain.
12
+
Referral cookies and security headers are applied as usual, ensuring that the same Stacker News functionality applies to responses returned by `customDomainMiddleware`.
13
+
##### Custom Domain Middleware
14
+
Injects an `x-stacker-news-subname` header into the request, so that we can avoid checking the cached map of domains again in other parts of the code.
15
15
16
16
Since SN has several paths that depend on the `sub` parameter or the `~subName/` paths, it manipulates the URL to always stay on the right territory:
17
-
- It forces the `sub` parameter to match the custom domain's `subName`
18
-
- Rewrites `/` to `~subName/`
17
+
- Forces the `sub` parameter to match the custom domain's `subName`
Rewriting `~subName` to `/` gives custom domains an **independent-like look**, so that things like `/~subName/post` can now look like `/post`, etc.
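
The URL handling described above can be sketched roughly as follows. This is a minimal illustration, not the actual middleware: `activeDomains`, `resolveCustomDomainRequest` and the returned object shape are hypothetical names, and it assumes every path on a custom domain is prefixed with `/~subName`.

```javascript
// Hypothetical cache of ACTIVE custom domains -> subName (illustrative data).
const activeDomains = new Map([['www.pizza.com', 'pizza']])

// Returns the internal path and the extra request header for a custom-domain
// hit, or null when the host is not an ACTIVE custom domain.
function resolveCustomDomainRequest (requestUrl, domains = activeDomains) {
  const url = new URL(requestUrl)
  const subName = domains.get(url.hostname)
  if (!subName) return null

  // Force an existing `sub` parameter to match the custom domain's subName.
  if (url.searchParams.has('sub')) url.searchParams.set('sub', subName)

  // Rewrite `/post` to `/~subName/post` so that territory pages are served.
  const prefix = `/~${subName}`
  if (url.pathname !== prefix && !url.pathname.startsWith(`${prefix}/`)) {
    url.pathname = url.pathname === '/' ? prefix : `${prefix}${url.pathname}`
  }

  return {
    rewriteTo: url.pathname + url.search,
    headers: { 'x-stacker-news-subname': subName }
  }
}
```

Keeping this logic in a pure function makes it easy to unit test without a running server; the real middleware would turn the result into a rewrite response.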
25
25
# Domain Verification
26
-
Domain Verification is a pgboss Job that checks correct DNS values and handles AWS external requests.
27
-
28
-
On domain creation, we schedule a job that starts in 30 seconds sending also the domain ID, and a `singletonKey` that protects this job from being ran from other workers, avoiding concurrency issues.
26
+
We use a pgboss job called `domainVerification` to verify domains, manage AWS integrations and update domain status.
27
+
A new job is scheduled 30 seconds after domain creation, or after a `domainVerification` run that leaves the domain `PENDING`. Each job includes a `singletonKey` so that other workers cannot process the same domain concurrently.
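
As a rough sketch of the scheduling described above, the job arguments could be built like this. The `buildVerificationJob` helper and the `singletonKey` format are hypothetical; `startAfter`, `singletonKey` and `retryLimit` are the pgboss send options as we understand them, so check them against the installed pgboss version.

```javascript
// Builds the arguments for scheduling a domain verification job.
// The helper name and the singletonKey format are illustrative assumptions.
function buildVerificationJob (domainId) {
  return {
    name: 'domainVerification',
    data: { domainId },
    options: {
      startAfter: 30, // seconds before the job becomes eligible to run
      singletonKey: `domainVerification:${domainId}`, // one job per domain
      retryLimit: 3 // pgboss retries on thrown errors
    }
  }
}
```

With a started pgboss instance `boss`, the result would be passed along as `boss.send(job.name, job.data, job.options)`.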
29
28
30
29
The Domain Verification Flow is structured this way:
31
30
```
@@ -67,18 +66,19 @@ domain is PENDING
67
66
```
68
67
69
68
### DNS Verification
70
-
It uses the `Resolver`class from `node:dns/promises` to resolve CNAME records on a domain.
69
+
It uses `Resolver` from `node:dns/promises` to fetch CNAME records.
71
70
72
-
If the CNAME record is correct, it logs a `DomainVerificationAttempt`tied with the `DomainVerificationRecord`, having status `VERIFIED`. This resulting status is shared with the connected`DomainVerificationRecord`thanks to a trigger.
73
-
##### dnsmasq
71
+
A successful CNAME lookup logs a `DomainVerificationAttempt` with status `VERIFIED`; a database trigger then updates the corresponding `DomainVerificationRecord`.
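
A minimal sketch of the CNAME check, assuming the expected target is passed in by the caller. `normalize`, `cnameMatches` and `verifyCname` are hypothetical helper names, and mapping every lookup failure to `PENDING` is an assumption made for brevity, not necessarily what the job does.

```javascript
const { Resolver } = require('node:dns/promises')

// Normalizes a DNS name: lowercase, no trailing dot.
function normalize (name) {
  return name.toLowerCase().replace(/\.$/, '')
}

// True when any resolved CNAME target matches the expected value.
function cnameMatches (records, expected) {
  return records.some(record => normalize(record) === normalize(expected))
}

// Resolves the CNAME records of `domain` and maps the result to a status.
// `dnsServer` is optional, e.g. the dnsmasq address in local development.
async function verifyCname (domain, expected, dnsServer) {
  const resolver = new Resolver()
  if (dnsServer) resolver.setServers([dnsServer])
  try {
    const records = await resolver.resolveCname(domain)
    return cnameMatches(records, expected) ? 'VERIFIED' : 'PENDING'
  } catch (err) {
    // Assumption: any lookup error (e.g. no CNAME record yet) means "try again later".
    return 'PENDING'
  }
}
```

Separating the comparison from the network call lets the matching rules be tested without a DNS server.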
72
+
##### Local testing with dnsmasq
74
73
In local development, **dnsmasq** is used as a DNS server to mock records for the domain verification job.
75
74
To have a dedicated IP for the `node:dns` Resolver, the `worker` container is part of a dedicated docker network that gives dnsmasq the `172.30.0.2` IP address.
76
75
77
76
You can also set your machine's DNS configuration to point to 127.0.0.1:5353 and access custom rules that you might've set. For example, if you have a CNAME record www.pizza.com pointing to `local.sndev`, you can access www.pizza.com from your browser.
78
77
79
78
For more information on how to add/remove records, take a look at the `Custom domains` section of `README.md`.
80
79
### AWS management
81
-
The domain verification job also handles critical AWS operations, such as:
80
+
AWS operations are handled within the verification job. Each step logs its attempts and allows up to 3 pgboss job retries when a critical error is thrown.
81
+
82
82
- certificate issuance
83
83
- certificate validation values
84
84
- certificate polling
@@ -88,39 +88,36 @@ The domain verification job also handles critical AWS operations, such as:
88
88
After DNS checks, if we don't have a certificate already, we request a new certificate for the domain from ACM.
89
89
ACM will return a `certificateArn`, which is the unique ID of an ACM certificate, that is immediately used to check its status. This information is then stored in the `DomainCertificate` table.
90
90
91
-
If we couldn't request a certificate, check its status or store it in the DB, it throws an error so that pgboss can retry the job.
92
-
93
91
##### Certificate validation values
94
92
ACM needs to verify domain ownership in order to validate the certificate; in this case we use the DNS method.
95
93
96
94
We ask ACM for the DNS records so that we can store them as a `DomainVerificationRecord` and present them to the user. Finally, we re-schedule the job so that the user can adjust their DNS configuration.
97
95
98
-
If we couldn't get validation values or store them in the DB, it throws an error so that pgboss can retry the job.
99
-
100
-
##### Certificate validation polling
96
+
##### Certificate validation and status polling
101
97
We asked ACM for a certificate, got its validation values and presented them to the user. Now we need to poll ACM to know if the verification was successful.
102
98
103
-
Since we're directly checking the certificate status, we also update DomainCertificate on our DB with the new status.
99
+
Since we're directly checking the certificate status, we also update `DomainCertificate` on our DB with the new status.
104
100
105
-
AWS timings are unpredictable, if the verification returns a negative result, we re-schedule the job to repeat this step.
106
-
And If we couldn't contact ACM, it throws an error so that pgboss can retry the job.
101
+
AWS validation timings are unpredictable: if the verification returns a negative result, we re-schedule the job to repeat this step.
107
102
108
103
##### Certificate attachment to the ALB listener
109
104
This is the last AWS step in our domain verification job: it attaches a fully verified ACM certificate to our load balancer listener.
110
105
111
106
The ALB listener is the gatekeeper of the application load balancer (ALB); it determines how incoming requests are routed to the target server.
112
107
113
-
In the case of Stacker News, the domain points directly at the load balancer listener, this means that we can both direct the user to point their `CNAME` record to `stacker.news` and that we can serve their ACM certificate directly from the load balancer.
108
+
In the case of Stacker News, the domain points directly to the load balancer listener. This means that we can both direct the user to point their `CNAME` record to `stacker.news` and serve their ACM certificate directly from the load balancer.
109
+
110
+
### Error handling
111
+
Every AWS or DNS step is wrapped in try/catch:
112
+
If something throws an error, we catch it to log the attempt and then re-throw it to let pgboss retry up to 3 times.
113
+
114
+
Using the `jobId` that we pass with each job, we can check whether we have reached 3 retries with pgboss' `getJobById`. If we have, we put the domain on `HOLD`, stopping and deleting future jobs tied to this domain.
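
The catch, log and re-throw pattern could look roughly like this. It is a sketch under assumptions: `runStep` and `shouldHold` are hypothetical helpers, the `logAttempt` argument is a caller-supplied stand-in with an invented payload shape, and the retry count is assumed to have been read from the job returned by `getJobById`.

```javascript
const MAX_RETRIES = 3

// Wraps one verification step: on failure, log the attempt and re-throw
// so that the job queue can retry the whole job.
async function runStep (stepName, step, logAttempt) {
  try {
    return await step()
  } catch (err) {
    // Payload shape is illustrative only.
    await logAttempt({ stage: stepName, status: 'FAILED', message: err.message })
    throw err
  }
}

// Decides whether the domain should be put on HOLD, given the number of
// retries already performed for the current job.
function shouldHold (retryCount, maxRetries = MAX_RETRIES) {
  return retryCount >= maxRetries
}
```

Re-throwing after logging keeps the retry decision with pgboss, while the attempt log still records every failure.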
114
115
115
116
### End of the job
116
117
When we finish a step in the domain verification job, and the resulting status is still `PENDING`, we re-schedule a job using pgboss' `sendDebounced`.
117
118
118
119
Since we use a `singletonKey` to avoid same-domain concurrent jobs, and you can't schedule another job if one is already running, `sendDebounced` will try to schedule a job when it can, e.g. when the job finishes or after 30 seconds.
119
120
120
-
### Error handling
121
-
If something throws an error, we catch it to log the attempt and then re-throw it to let pgboss retry up to 3 times.
122
-
Using the `jobId` that we pass with each job, we can know if we're reaching 3 retries using pgboss' `getJobById`. And if we did reach 3 retries, we put the domain on `HOLD`, stopping and deleting future jobs tied to this domain.
123
-
124
121
### Domain Verification logger
125
122
We need to be able to track where, when and what went wrong during domain verification. To do this, every step of the job calls `logAttempt`.
126
123
##### logAttempt
@@ -155,7 +152,7 @@ Since in local we don't have the possibility to use Localstack to mock the ALB,
155
152
156
153
As the ALB is really important to reach stacker.news, we only implemented Attach/Detach certificate functions that take a specific `certificateArn` (unique ID). This way we can't possibly mess with the default ALB configuration.
157
154
158
-
# plpgSQL functions and triggers
155
+
# Triggers, cleanup and maintenance
159
156
### Clear Long Held Domains
160
157
Every midnight, the `clearLongHeldDomains` job gets executed to remove domains that have been on `HOLD` for more than 30 days.
161
158
@@ -166,7 +163,6 @@ The `DomainVerification` job logs every step into `DomainVerificationAttempt`, w
166
163
167
164
If the result of a DNS verification on the `CNAME` record is `VERIFIED`, it triggers an update of the `status` field on the related `DomainVerificationRecord`, keeping the record **statuses** in sync with the `DomainVerification` job results.
168
165
169
-
170
166
### HOLD domain on territory STOP
171
167
Let's say the territory owner doesn't renew their territory, and they have a custom domain attached to it. We can't let the custom domain access Stacker News, as the domain may have been transferred or may no longer be under the original owner's control.