Description
In the explainer
According to an HTTP Archive query, roughly 45% of all HTML responses in July 2025 had multiple occurrences of the same ID. Correcting for declarative shadow DOM would only make a small dent in this number.
However, @tunetheweb and I looked into this and the query had problems. The following queries rely on custom metrics rather than regex on an HTTPPayload and also correct other query issues:
Each row is a group of pages. The leftmost column is the number of ID conflicts in that group (0, exactly 1 conflict, 2 conflicts, 3, and so on). The second column is how many pages in the crawl have that many conflicts. For example, roughly 17m pages in the crawl have no ID conflicts, which is 10.3m sites (each site is 2 pages). The last column is a random sample URL from that group. Keep in mind the note at the top: these are counts of pages, not page loads or any other usage measure. Finally, you can see the % of sites/pages.
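To make the shape of that table concrete, here is a rough Python sketch of the same grouping on a handful of in-memory pages. It is not the HTTP Archive query or its custom metric. The names `IdCollector`, `count_id_conflicts`, and `bucket_pages` are made up for illustration, a "conflict" is assumed to mean one distinct ID value that appears more than once on a page, and the sample URL is simply the first one seen rather than a random pick.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Counts how often each id attribute value appears in a document."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name == "id" and value:
                self.ids[value] += 1


def count_id_conflicts(html: str) -> int:
    """Assumed definition: number of distinct ids used more than once."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return sum(1 for n in collector.ids.values() if n > 1)


def bucket_pages(pages: dict[str, str]) -> list[tuple[int, int, float, str]]:
    """Returns rows of (conflicts, page count, % of pages, sample URL)."""
    groups: dict[int, list[str]] = {}
    for url, html in pages.items():
        groups.setdefault(count_id_conflicts(html), []).append(url)
    total = len(pages)
    return [
        # Sample is the first URL in the group, not a random one.
        (conflicts, len(urls), 100 * len(urls) / total, urls[0])
        for conflicts, urls in sorted(groups.items())
    ]
```

Calling `bucket_pages` with a dict of URL to HTML string gives one row per conflict count, sorted ascending, which mirrors the columns described above on a toy scale.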
h/t to @tunetheweb for the query help