
Commit 52aed31

BYK and bitsandfoxes authored and committed
ci(build): Parallelize and cache mdx pipeline - fix md cache (#14109)

Follow up to #14096.

- Makes the entire `mdx.ts` and accompanying modules async
- Limits concurrency in `mdx.ts` async ops (otherwise we crash Vercel Functions / AWS Lambda) (check https://sentry-docs-git-byk-cimdx-cache.sentry.dev/platform-redirect/ -- should not crash)
- Adds compression to caches (as Vercel complained about the function size)
- Removes the `<script>` blocks from the HTML for cache key calculation and faster `md` generation. Script blocks are already ignored and they are not stable across builds even when nothing changes, causing cache misses.

Cuts down build times from ~21-22 minutes to ~13-14 minutes.
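The `mdx.ts` changes themselves are not part of this excerpt, but the concurrency cap described above is the standard usage pattern for `p-limit`, the dependency this commit adds in `package.json`. A minimal sketch under that assumption -- the cap of 8 and the `readDocFrontMatter` helper are illustrative, not values from the commit:

```ts
import pLimit from 'p-limit';

// Hypothetical stand-in for the per-file async work done in `mdx.ts`.
declare function readDocFrontMatter(path: string): Promise<Record<string, unknown>>;

// Without a limiter, mapping over a few thousand docs fires every read at
// once, which is what crashed Vercel Functions / AWS Lambda. p-limit keeps
// at most N callbacks in flight; the rest queue up.
const limit = pLimit(8); // illustrative cap

export function readAllFrontMatter(paths: string[]) {
  return Promise.all(paths.map(p => limit(() => readDocFrontMatter(p))));
}
```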
1 parent 73b524a commit 52aed31

File tree

13 files changed, +454 -248 lines changed

.babelrc.js.bak

Lines changed: 0 additions & 16 deletions
This file was deleted.

app/sitemap.ts

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ import {getDevDocsFrontMatter, getDocsFrontMatter} from 'sentry-docs/mdx';

 export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
   if (isDeveloperDocs) {
-    const docs = getDevDocsFrontMatter();
+    const docs = await getDevDocsFrontMatter();
     const baseUrl = 'https://develop.sentry.dev';
     return docsToSitemap(docs, baseUrl);
   }

docs/product/explore/session-replay/web/index.mdx

Lines changed: 0 additions & 2 deletions
@@ -4,8 +4,6 @@ sidebar_order: 10
 description: "Learn about Session Replay and its video-like reproductions of user interactions, which can help you see when users are frustrated and build a better web experience."
 ---

-<Include name="feature-stage-beta-session-replay.mdx" />
-
 Session Replay allows you to see video-like reproductions of user sessions which can help you understand what happened before, during, and after an error or performance issue occurred. You'll be able to gain deeper debugging context into issues so that you can reproduce and resolve problems faster without the guesswork. As you play back each session, you'll be able to see every user interaction in relation to network requests, DOM events, and console messages. It’s effectively like having [DevTools](https://developer.chrome.com/docs/devtools/overview/) active in your production user sessions.

 Replays are integrated with other parts of the Sentry product so you can see how the user experience is impacted by errors and slow transactions. You'll see session replays associated with error events on the [Issue Details](/product/issues/issue-details/) page, and those associated with slow transactions on the [Transaction Summary](/product/insights/overview/transaction-summary/) page. For [backend error replays](/product/explore/session-replay/web/getting-started/#replays-for-backend-errors), any contributing backend errors will be included in the replay's timeline, [breadcrumbs](https://docs.sentry.io/product/issues/issue-details/breadcrumbs/), and errors.

docs/product/sentry-basics/performance-monitoring.mdx

Lines changed: 0 additions & 2 deletions
@@ -4,8 +4,6 @@ sidebar_order: 1
 description: "Understand and monitor how your application performs in production. Track key metrics, analyze bottlenecks, and resolve performance issues with distributed tracing, detailed transaction data, and automated issue detection."
 ---

-<Include name="performance-moving.mdx" />
-
 In many tools, Performance Monitoring is just about tracking a few key metrics on your web pages. Sentry takes a different approach. By setting up [Tracing](/concepts/key-terms/tracing/), Sentry captures detailed performance data for every transaction in your entire application stack and automatically presents it in a variety of easy-to-use but powerful features so you can rapidly identify and resolve performance issues as they happen - all in one place.

 <Alert>

package.json

Lines changed: 5 additions & 3 deletions
@@ -78,6 +78,7 @@
     "next-plausible": "^3.12.4",
     "next-themes": "^0.3.0",
     "nextjs-toploader": "^1.6.6",
+    "p-limit": "^6.2.0",
     "platformicons": "^8.0.4",
     "prism-sentry": "^1.0.2",
     "query-string": "^6.13.1",
@@ -116,7 +117,7 @@
     "@tailwindcss/forms": "^0.5.7",
     "@tailwindcss/typography": "^0.5.10",
     "@types/dompurify": "3.0.5",
-    "@types/node": "^20",
+    "@types/node": "^22",
     "@types/react": "18.3.12",
     "@types/react-dom": "18.3.1",
     "@types/ws": "^8.5.10",
@@ -140,10 +141,11 @@
   },
   "resolutions": {
     "dompurify": "3.2.4",
-    "@types/dompurify": "3.0.5"
+    "@types/dompurify": "3.0.5",
+    "@types/node": "^22"
   },
   "volta": {
-    "node": "20.11.0",
+    "node": "22.16.0",
     "yarn": "1.22.22"
   }
 }

platform-includes/sourcemaps/overview/javascript.cloudflare.mdx

Lines changed: 1 addition & 3 deletions
@@ -19,9 +19,7 @@ If you want to configure source maps to upload manually, follow the guide for yo

 ### Guides for Source Maps

-- <PlatformLink to="/sourcemaps/uploading/typescript/">
-    TypeScript (tsc)
-  </PlatformLink>
+- <PlatformLink to="/sourcemaps/uploading/typescript/">TypeScript (tsc)</PlatformLink>

 <Alert>
   If you're using a bundler like Webpack, Vite, Rollup, or Esbuild, use the

platform-includes/sourcemaps/upload/primer/javascript.cloudflare.mdx

Lines changed: 2 additions & 2 deletions
@@ -7,6 +7,6 @@ If you can't find the tool of your choice in the list below, we recommend you ch

 </Alert>

-<Include name="sourcemaps/overview/javascript.cloudflare.mdx" />
+<Include name="../platform-includes/sourcemaps/overview/javascript.cloudflare.mdx" />

-<PageGrid />
+<PageGrid />

scripts/algolia.ts

Lines changed: 2 additions & 2 deletions
@@ -64,9 +64,9 @@ indexAndUpload();
 async function indexAndUpload() {
   // the page front matters are the source of truth for the static doc routes
   // as they are used directly by generateStaticParams() on [[..path]] page
-  const pageFrontMatters = isDeveloperDocs
+  const pageFrontMatters = await (isDeveloperDocs
     ? getDevDocsFrontMatter()
-    : await getDocsFrontMatter();
+    : getDocsFrontMatter());
   const records = await generateAlogliaRecords(pageFrontMatters);
   console.log('🔥 Generated %d new Algolia records.', records.length);
   const existingRecordIds = await fetchExistingRecordIds(index);

scripts/generate-md-exports.mjs

Lines changed: 98 additions & 50 deletions
@@ -1,28 +1,40 @@
 #!/usr/bin/env node
-
+/* eslint-disable no-console */
 import {selectAll} from 'hast-util-select';
 import {createHash} from 'node:crypto';
-import {constants as fsConstants, existsSync} from 'node:fs';
-import {copyFile, mkdir, opendir, readFile, rm, writeFile} from 'node:fs/promises';
+import {createReadStream, createWriteStream, existsSync} from 'node:fs';
+import {mkdir, opendir, readFile, rm} from 'node:fs/promises';
 import {cpus} from 'node:os';
 import * as path from 'node:path';
+import {Readable} from 'node:stream';
+import {pipeline} from 'node:stream/promises';
 import {fileURLToPath} from 'node:url';
 import {isMainThread, parentPort, Worker, workerData} from 'node:worker_threads';
+import {
+  constants as zlibConstants,
+  createBrotliCompress,
+  createBrotliDecompress,
+} from 'node:zlib';
 import rehypeParse from 'rehype-parse';
 import rehypeRemark from 'rehype-remark';
 import remarkGfm from 'remark-gfm';
 import remarkStringify from 'remark-stringify';
 import {unified} from 'unified';
 import {remove} from 'unist-util-remove';

+const CACHE_COMPRESS_LEVEL = 4;
+
 function taskFinishHandler(data) {
   if (data.failedTasks.length === 0) {
-    console.log(`✅ Worker[${data.id}]: ${data.success} files successfully.`);
-  } else {
-    hasErrors = true;
-    console.error(`❌ Worker[${data.id}]: ${data.failedTasks.length} files failed:`);
-    console.error(data.failedTasks);
+    console.log(
+      `💰 Worker[${data.id}]: Cache hits: ${data.cacheHits} (${Math.round((data.cacheHits / data.success) * 100)}%)`
+    );
+    console.log(`✅ Worker[${data.id}]: converted ${data.success} files successfully.`);
+    return false;
   }
+  console.error(`❌ Worker[${data.id}]: ${data.failedTasks.length} files failed:`);
+  console.error(data.failedTasks);
+  return true;
 }

 async function createWork() {
@@ -37,20 +49,21 @@ async function createWork() {
   const INPUT_DIR = path.join(root, '.next', 'server', 'app');
   const OUTPUT_DIR = path.join(root, 'public', 'md-exports');

-  const CACHE_VERSION = 1;
-  const CACHE_DIR = path.join(root, '.next', 'cache', 'md-exports', `v${CACHE_VERSION}`);
-  const noCache = !existsSync(CACHE_DIR);
-  if (noCache) {
-    await mkdir(CACHE_DIR, {recursive: true});
-  }
-
   console.log(`🚀 Starting markdown generation from: ${INPUT_DIR}`);
   console.log(`📁 Output directory: ${OUTPUT_DIR}`);

   // Clear output directory
   await rm(OUTPUT_DIR, {recursive: true, force: true});
   await mkdir(OUTPUT_DIR, {recursive: true});

+  const CACHE_DIR = path.join(root, '.next', 'cache', 'md-exports');
+  console.log(`💰 Cache directory: ${CACHE_DIR}`);
+  const noCache = !existsSync(CACHE_DIR);
+  if (noCache) {
+    console.log(`ℹ️ No cache directory found, this will take a while...`);
+    await mkdir(CACHE_DIR, {recursive: true});
+  }
+
   // On a 16-core machine, 8 workers were optimal (and slightly faster than 16)
   const numWorkers = Math.max(Math.floor(cpus().length / 2), 2);
   const workerTasks = new Array(numWorkers).fill(null).map(() => []);
@@ -86,7 +99,7 @@ async function createWork() {
       workerData: {id, noCache, cacheDir: CACHE_DIR, tasks: workerTasks[id]},
     });
     let hasErrors = false;
-    worker.on('message', taskFinishHandler);
+    worker.on('message', data => (hasErrors = taskFinishHandler(data)));
     worker.on('error', reject);
     worker.on('exit', code => {
       if (code !== 0) {
@@ -104,7 +117,11 @@ async function createWork() {
       cacheDir: CACHE_DIR,
       tasks: workerTasks[workerTasks.length - 1],
       id: workerTasks.length - 1,
-    }).then(taskFinishHandler)
+    }).then(data => {
+      if (taskFinishHandler(data)) {
+        throw new Error(`Worker[${data.id}] had some errors.`);
+      }
+    })
   );

   await Promise.all(workerPromises);
@@ -116,62 +133,93 @@ async function createWork() {
 const md5 = data => createHash('md5').update(data).digest('hex');

 async function genMDFromHTML(source, target, {cacheDir, noCache}) {
-  const text = await readFile(source, {encoding: 'utf8'});
+  const text = (await readFile(source, {encoding: 'utf8'}))
+    // Remove all script tags, as they are not needed in markdown
+    // and they are not stable across builds, causing cache misses
+    .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '');
   const hash = md5(text);
   const cacheFile = path.join(cacheDir, hash);
   if (!noCache) {
     try {
-      await copyFile(cacheFile, target, fsConstants.COPYFILE_FICLONE);
-      return;
+      await pipeline(
+        createReadStream(cacheFile),
+        createBrotliDecompress(),
+        createWriteStream(target, {
+          encoding: 'utf8',
+        })
+      );
+
+      return true;
     } catch {
       // pass
     }
   }

-  await writeFile(
-    target,
-    String(
-      await unified()
-        .use(rehypeParse)
-        // Need the `main div > hgroup` selector for the headers
-        .use(() => tree => selectAll('main div > hgroup, div#main', tree))
-        // If we don't do this wrapping, rehypeRemark just returns an empty string -- yeah WTF?
-        .use(() => tree => ({
-          type: 'element',
-          tagName: 'div',
-          properties: {},
-          children: tree,
-        }))
-        .use(rehypeRemark, {
-          document: false,
-          handlers: {
-            // Remove buttons as they usually get confusing in markdown, especially since we use them as tab headers
-            button() {},
-          },
-        })
-        // We end up with empty inline code blocks, probably from some tab logic in the HTML, remove them
-        .use(() => tree => remove(tree, {type: 'inlineCode', value: ''}))
-        .use(remarkGfm)
-        .use(remarkStringify)
-        .process(text)
-    )
+  const data = String(
+    await unified()
+      .use(rehypeParse)
+      // Need the `main div > hgroup` selector for the headers
+      .use(() => tree => selectAll('main div > hgroup, div#main', tree))
+      // If we don't do this wrapping, rehypeRemark just returns an empty string -- yeah WTF?
+      .use(() => tree => ({
+        type: 'element',
+        tagName: 'div',
+        properties: {},
+        children: tree,
+      }))
+      .use(rehypeRemark, {
+        document: false,
+        handlers: {
+          // Remove buttons as they usually get confusing in markdown, especially since we use them as tab headers
+          button() {},
+        },
+      })
+      // We end up with empty inline code blocks, probably from some tab logic in the HTML, remove them
+      .use(() => tree => remove(tree, {type: 'inlineCode', value: ''}))
+      .use(remarkGfm)
+      .use(remarkStringify)
+      .process(text)
   );
-  await copyFile(target, cacheFile, fsConstants.COPYFILE_FICLONE);
+  const reader = Readable.from(data);
+
+  await Promise.all([
+    pipeline(
+      reader,
+      createWriteStream(target, {
+        encoding: 'utf8',
+      })
+    ),
+    pipeline(
+      reader,
+      createBrotliCompress({
+        chunkSize: 32 * 1024,
+        params: {
+          [zlibConstants.BROTLI_PARAM_MODE]: zlibConstants.BROTLI_MODE_TEXT,
+          [zlibConstants.BROTLI_PARAM_QUALITY]: CACHE_COMPRESS_LEVEL,
+          [zlibConstants.BROTLI_PARAM_SIZE_HINT]: data.length,
+        },
+      }),
+      createWriteStream(cacheFile)
+    ).catch(err => console.warn('Error writing cache file:', err)),
+  ]);
+
+  return false;
 }

 async function processTaskList({id, tasks, cacheDir, noCache}) {
   const failedTasks = [];
+  let cacheHits = 0;
   for (const {sourcePath, targetPath} of tasks) {
     try {
-      await genMDFromHTML(sourcePath, targetPath, {
+      cacheHits += await genMDFromHTML(sourcePath, targetPath, {
        cacheDir,
        noCache,
      });
     } catch (error) {
       failedTasks.push({sourcePath, targetPath, error});
     }
   }
-  return {id, success: tasks.length - failedTasks.length, failedTasks};
+  return {id, success: tasks.length - failedTasks.length, failedTasks, cacheHits};
 }

 async function doWork(work) {
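
The `md` cache fix in the commit title is visible in `genMDFromHTML` above: the md5 cache key is now computed over the HTML with `<script>` blocks stripped. A small sketch of why that stabilizes the key -- the `self.__next_f` payloads are illustrative stand-ins for whatever build-specific content the scripts actually carry:

```ts
import {createHash} from 'node:crypto';

// Same regex as the diff above: drop script blocks before hashing, since
// their contents change across builds even when the page content doesn't.
const stripScripts = (html: string) =>
  html.replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '');

const cacheKey = (html: string) =>
  createHash('md5').update(stripScripts(html)).digest('hex');

// Two builds differing only in injected script payloads now share a key:
const buildA = '<main><h1>Docs</h1></main><script>self.__next_f=1</script>';
const buildB = '<main><h1>Docs</h1></main><script>self.__next_f=2</script>';
console.log(cacheKey(buildA) === cacheKey(buildB)); // true
```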

src/docTree.ts

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ export function getDocsRootNode(): Promise<DocNode> {

 async function getDocsRootNodeUncached(): Promise<DocNode> {
   return frontmatterToTree(
-    isDeveloperDocs ? getDevDocsFrontMatter() : await getDocsFrontMatter()
+    await (isDeveloperDocs ? getDevDocsFrontMatter() : getDocsFrontMatter())
   );
 }
