Commit 5a57a77

Merge branch 'custom_domains_base' into custom_domains_authsync
2 parents de48c27 + 6a0b2cc commit 5a57a77

1 file changed

+26
-30
lines changed

docs/dev/custom-domains.md

Lines changed: 26 additions & 30 deletions
@@ -1,31 +1,30 @@
1-
# The Hitchhiker's Guide to Custom Domains
21
Lets territory owners attach a domain to their territories.
32
TODO: change the title
43

54
Index:
65
TODO
76

87
# Middleware
9-
Every time we hit a custom domain, middleware checks if it's allowed via a cached list of `ACTIVE` domains, coupled with their `subName`.
10-
If it's allowed, we redirect and rewrite to give custom domains a seamless territory-centered SN experience.
8+
Every time we hit a custom domain, our middleware:
9+
- Looks up a cached map of `ACTIVE` custom domains and their `subName`
10+
- Redirects and rewrites URLs to provide a seamless territory-specific SN experience.
1111
##### Main middleware
12-
Referral cookies and security headers gets applied the same way as before on SN, with the exception of being their own functions, so that now we can apply them also to the customDomainMiddleware resulting response.
13-
##### customDomainMiddleware
14-
A `x-stacker-news-subname` header with the `subName` is injected into the request headers to give the SN code awareness of the territory attached to a custom domain.
12+
Referral cookies and security headers are applied as usual, ensuring that the same Stacker News functionality is applied to responses returned by `customDomainMiddleware`.
13+
##### Custom Domain Middleware
14+
Injects an `x-stacker-news-subname` header into the request, so that we can avoid checking the cached map of domains again in other parts of the code.
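As a rough illustration of the lookup and header injection described above, here is a minimal sketch. The `activeDomains` map, the `HeaderBag` type and the `injectSubName` helper are hypothetical names introduced for this example; only the header name comes from the text.

```typescript
// Hypothetical cache: hostname -> subName, containing ACTIVE domains only.
const activeDomains = new Map<string, string>([["www.pizza.com", "pizza"]]);

// Plain object standing in for request headers (assumption for this sketch).
type HeaderBag = Record<string, string>;

// Returns a copy of the headers with the territory injected,
// or null when the host is not an ACTIVE custom domain.
function injectSubName(host: string, headers: HeaderBag): HeaderBag | null {
  // Normalize case and drop an optional port before the lookup.
  const [hostname = ""] = host.toLowerCase().split(":");
  const subName = activeDomains.get(hostname);
  if (subName === undefined) return null;
  return { ...headers, "x-stacker-news-subname": subName };
}
```

Downstream code could then read the header instead of consulting the cached map a second time.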
1515

1616
Since SN has several paths that depend on the `sub` parameter or the `~subName/` paths, it manipulates the URL to always stay on the right territory:
17-
- It forces the `sub` parameter to match the custom domain's `subName`
18-
- Rewrites `/` to `~subName/`
17+
- Forces the `sub` parameter to match the custom domain's `subName`
18+
- Internally rewrites `/` to `/~subName/`
1919
- Redirects paths that use `~subName` to `/`
2020

2121
The territory paths are the following:
2222
`['/~', '/recent', '/random', '/top', '/post', '/edit']`
2323

2424
Rewriting `~subName` to `/` gives custom domains an **independent-like look**, so that things like `/~subName/post` can now look like `/post`, etc.
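The redirect and rewrite rules above could be sketched roughly as follows. This is not the actual middleware: `resolveCustomDomainUrl`, the `Decision` type and the exact prefix-matching rule are assumptions made for illustration, and the `sub` parameter is only overwritten when it is already present.

```typescript
// Territory paths as listed in the text.
const TERRITORY_PATHS = ["/~", "/recent", "/random", "/top", "/post", "/edit"];

// Hypothetical result type: either redirect the browser or rewrite internally.
type Decision = { action: "redirect" | "rewrite"; url: string };

function resolveCustomDomainUrl(requestUrl: string, subName: string): Decision {
  const url = new URL(requestUrl);
  const prefix = `/~${subName}`;

  // Paths that expose `/~subName` are redirected to their clean form.
  if (url.pathname === prefix || url.pathname.startsWith(`${prefix}/`)) {
    url.pathname = url.pathname.slice(prefix.length) || "/";
    return { action: "redirect", url: url.toString() };
  }

  // Force the `sub` parameter to match the custom domain's territory.
  if (url.searchParams.has("sub")) url.searchParams.set("sub", subName);

  // Internally rewrite `/` and territory paths so they live under `/~subName`.
  const isTerritoryPath = TERRITORY_PATHS.some(
    (p) => url.pathname === p || url.pathname.startsWith(`${p}/`)
  );
  if (url.pathname === "/") {
    url.pathname = `${prefix}/`;
  } else if (isTerritoryPath) {
    url.pathname = `${prefix}${url.pathname}`;
  }
  return { action: "rewrite", url: url.toString() };
}
```

With this sketch, a visitor on `www.pizza.com/post` stays on `/post` in the address bar while the application serves `/~pizza/post` internally.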
2525
# Domain Verification
26-
Domain Verification is a pgboss Job that checks correct DNS values and handles AWS external requests.
27-
28-
On domain creation, we schedule a job that starts in 30 seconds sending also the domain ID, and a `singletonKey` that protects this job from being ran from other workers, avoiding concurrency issues.
26+
We use a pgboss job called `domainVerification` to verify domains, manage AWS integrations and update domain status.
27+
A new job is scheduled 30 seconds after domain creation, or after a `domainVerification` run that leaves the domain `PENDING`. Each job includes a `singletonKey` to prevent concurrent runs by other workers.
2928

3029
The Domain Verification Flow is structured this way:
3130
```
@@ -67,18 +66,19 @@ domain is PENDING
6766
```
6867

6968
### DNS Verification
70-
It uses the `Resolver` class from `node:dns/promises` to resolve CNAME records on a domain.
69+
It uses `Resolver` from `node:dns/promises` to fetch CNAME records.
7170

72-
If the CNAME record is correct, it logs a `DomainVerificationAttempt` tied with the `DomainVerificationRecord`, having status `VERIFIED`. This resulting status is shared with the connected `DomainVerificationRecord` thanks to a trigger.
73-
##### dnsmasq
71+
A successful CNAME lookup logs a `DomainVerificationAttempt` with status `VERIFIED`, triggering an update to the corresponding `DomainVerificationRecord` via database triggers.
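A minimal sketch of the CNAME check, under stated assumptions: the `CnameResolver` interface, `cnameMatches` and `verifyCname` are hypothetical names, and treating a failed lookup as `PENDING` is a simplification chosen for this example rather than the job's documented behavior.

```typescript
// Subset of the resolver API this sketch relies on. In Node, an instance of
// `Resolver` from `node:dns/promises` provides `resolveCname` with this shape.
interface CnameResolver {
  resolveCname(hostname: string): Promise<string[]>;
}

// Compare CNAME targets case-insensitively, ignoring a trailing dot.
function cnameMatches(records: string[], expected: string): boolean {
  const normalize = (name: string) => name.toLowerCase().replace(/\.$/, "");
  return records.some((record) => normalize(record) === normalize(expected));
}

// Hypothetical step: VERIFIED when the domain's CNAME points at the expected target.
async function verifyCname(
  resolver: CnameResolver,
  domain: string,
  expected: string
): Promise<"VERIFIED" | "PENDING"> {
  try {
    const records = await resolver.resolveCname(domain);
    return cnameMatches(records, expected) ? "VERIFIED" : "PENDING";
  } catch {
    // Assumption: a failed lookup (e.g. no CNAME yet) leaves the record PENDING.
    return "PENDING";
  }
}
```

In practice one would pass `new Resolver()` here; its `setServers` method accepts a list of DNS server addresses, which is how a dedicated DNS server can be targeted.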
72+
##### local testing with dnsmasq
7473
Locally, **dnsmasq** is used as a DNS server to mock records for the domain verification job.
7574
To have a dedicated IP for the `node:dns` Resolver, the `worker` container is part of a dedicated docker network that gives dnsmasq the `172.30.0.2` IP address.
7675

7776
You can also set your machine's DNS configuration to point to 127.0.0.1:5353 and access custom rules that you might've set. For example, if you have a CNAME record www.pizza.com pointing to `local.sndev`, you can access www.pizza.com from your browser.
7877

7978
For more information on how to add/remove records, take a look at `README.md` on the `Custom domains` section.
8079
### AWS management
81-
The domain verification job also handles critical AWS operations, such as:
80+
AWS operations are handled within the verification job. Each step logs its attempts and allows up to 3 pgboss job retries on critical thrown errors.
81+
8282
- certificate issuance
8383
- certificate validation values
8484
- certificate polling
@@ -88,39 +88,36 @@ The domain verification job also handles critical AWS operations, such as:
8888
After DNS checks, if we don't have a certificate already, we request a new certificate for the domain from ACM.
8989
ACM will return a `certificateArn`, which is the unique ID of an ACM certificate, that is immediately used to check its status. This information is then stored in the `DomainCertificate` table.
9090

91-
If we couldn't request a certificate, check its status or store it in the DB, it throws an error so that pgboss can retry the job.
92-
9391
##### Certificate validation values
9492
ACM needs to verify domain ownership in order to validate the certificate; in this case we use the DNS method.
9593

9694
We ask ACM for the DNS records so that we can store them as a `DomainVerificationRecord` and present them to the user. Finally, we re-schedule the job so that the user can adjust their DNS configuration.
9795

98-
If we couldn't get validation values or store them in the DB, it throws an error so that pgboss can retry the job.
99-
100-
##### Certificate validation polling
96+
##### Certificate validation and status polling
10197
We asked ACM for a certificate, got its validation values and presented them to the user. Now we need to poll ACM to know if the verification was successful.
10298

103-
Since we're directly checking the certificate status, we also update DomainCertificate on our DB with the new status.
99+
Since we're directly checking the certificate status, we also update `DomainCertificate` on our DB with the new status.
104100

105-
AWS timings are unpredictable, if the verification returns a negative result, we re-schedule the job to repeat this step.
106-
And If we couldn't contact ACM, it throws an error so that pgboss can retry the job.
101+
AWS validation timings are unpredictable: if the verification returns a negative result, we re-schedule the job to repeat this step.
107102

108103
##### Certificate attachment to the ALB listener
109104
This is the last AWS step in our domain verification job: it attaches a fully verified ACM certificate to our load balancer listener.
110105

111106
The ALB listener is the gatekeeper of the application load balancer (ALB): it determines how incoming requests should be routed to the target server.
112107

113-
In the case of Stacker News, the domain points directly at the load balancer listener, this means that we can both direct the user to point their `CNAME` record to `stacker.news` and that we can serve their ACM certificate directly from the load balancer.
108+
In the case of Stacker News, the domain points directly to the load balancer listener. This means that we can direct the user to point their `CNAME` record to `stacker.news`, and that we can serve their ACM certificate directly from the load balancer.
109+
110+
### Error handling
111+
Every AWS or DNS step is wrapped in try/catch:
112+
If something throws an error, we catch it to log the attempt and then re-throw it to let pgboss retry up to 3 times.
113+
114+
Using the `jobId` that we pass with each job, we can check whether we have reached 3 retries with pgboss' `getJobById`. If we have, we put the domain on `HOLD`, stopping and deleting future jobs tied to this domain.
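The catch, log, re-throw pattern and the retry check could look roughly like this. Everything here is illustrative: `Attempt`, `JobInfo`, `runStep` and `shouldHold` are hypothetical, and the `retryCount` field stands in for whatever the job record returned by pgboss actually exposes.

```typescript
// Hypothetical shapes; the real `logAttempt` arguments and job record may differ.
interface Attempt { stage: string; message: string }
interface JobInfo { retryCount: number }

const MAX_RETRIES = 3;

// Runs one verification step. On failure, the attempt is logged and the
// error is re-thrown so the job queue can count the failure and retry.
async function runStep<T>(
  stage: string,
  step: () => Promise<T>,
  logAttempt: (attempt: Attempt) => Promise<void>
): Promise<T> {
  try {
    return await step();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    await logAttempt({ stage, message });
    throw err;
  }
}

// True once the job has used up its retries, i.e. the domain should go on HOLD.
function shouldHold(job: JobInfo | null): boolean {
  return job !== null && job.retryCount >= MAX_RETRIES;
}
```

Keeping the retry check separate from the wrapper means the `HOLD` decision can be made once per job run, after looking the job up by its ID.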
114115

115116
### End of the job
116117
When we finish a step in the domain verification job, and the resulting status is still `PENDING`, we re-schedule a job using pgboss' `sendDebounced`.
117118

118119
Since we use a `singletonKey` to avoid same-domain concurrent jobs, and you can't schedule another job if one is already running, `sendDebounced` will try to schedule a job when it can, e.g. when the job finishes or after 30 seconds.
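A hedged sketch of the re-scheduling call. The `DebouncedSender` interface is a local stand-in whose argument order (name, data, options, seconds, key) is assumed to mirror pgboss' `sendDebounced`; the key format and the `rescheduleVerification` helper are hypothetical.

```typescript
// Local stand-in for the part of the job queue client used here (assumed shape).
interface DebouncedSender {
  sendDebounced(
    name: string,
    data: object,
    options: object,
    seconds: number,
    key?: string
  ): Promise<string | null>;
}

// Re-schedule verification for a domain that is still PENDING.
function rescheduleVerification(
  boss: DebouncedSender,
  domainId: number
): Promise<string | null> {
  // Hypothetical key format: one key per domain keeps same-domain jobs from overlapping.
  const key = `domainVerification:${domainId}`;
  return boss.sendDebounced(
    "domainVerification",
    { domainId },
    { retryLimit: 3 }, // up to 3 retries, as described in the text
    30, // debounce window in seconds
    key
  );
}
```

Because the key is derived from the domain ID, repeated calls for the same domain within the window collapse into a single pending job.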
119120

120-
### Error handling
121-
If something throws an error, we catch it to log the attempt and then re-throw it to let pgboss retry up to 3 times.
122-
Using the `jobId` that we pass with each job, we can know if we're reaching 3 retries using pgboss' `getJobById`. And if we did reach 3 retries, we put the domain on `HOLD`, stopping and deleting future jobs tied to this domain.
123-
124121
### Domain Verification logger
125122
We need to be able to track where, when and what went wrong during domain verification. To do this, every step of the job calls `logAttempt`.
126123
##### logAttempt
@@ -155,7 +152,7 @@ Since in local we don't have the possibility to use Localstack to mock the ALB,
155152

156153
As the ALB is really important to reach stacker.news, we only implemented Attach/Detach certificate functions that take a specific `certificateArn` (unique ID). This way we can't possibly mess with the default ALB configuration.
157154

158-
# plpgSQL functions and triggers
155+
# Triggers, cleanup and maintenance
159156
### Clear Long Held Domains
160157
Every midnight, the `clearLongHeldDomains` job gets executed to remove domains that have been on `HOLD` for more than 30 days.
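The selection rule can be illustrated with a small in-memory sketch. The `DomainRow` shape and its `updatedAt` field are assumptions for this example; the actual job presumably works against the database rather than an array.

```typescript
// Hypothetical row shape; `updatedAt` is assumed to record when HOLD was set.
interface DomainRow {
  id: number;
  status: "PENDING" | "ACTIVE" | "HOLD";
  updatedAt: Date;
}

const HOLD_LIMIT_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

// Domains that have been on HOLD for more than 30 days as of `now`.
function longHeldDomains(rows: DomainRow[], now: Date): DomainRow[] {
  return rows.filter(
    (row) =>
      row.status === "HOLD" &&
      now.getTime() - row.updatedAt.getTime() > HOLD_LIMIT_MS
  );
}
```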
161158

@@ -166,7 +163,6 @@ The `DomainVerification` job logs every step into `DomainVerificationAttempt`, w
166163

167164
If the result of a DNS verification on the `CNAME` record is `VERIFIED`, it triggers an update of the `status` field on the related `DomainVerificationRecord`, keeping the record **statuses** in sync with the `DomainVerification` job results.
168165

169-
170166
### HOLD domain on territory STOP
171167
Let's say the territory owner doesn't renew their territory, and they have a custom domain attached to it. We can't let the custom domain keep accessing Stacker News, as the domain may have been transferred or may be out of the original owner's control.
172168
