[html-element-references] Error in HTTPArchive query/data #10

@bkardell

In the explainer:

According to an HTTP archive query, roughly 45% of all HTML responses in July 2025 had multiple occurrences of the same ID. Correcting for declarative shadow DOM would only make a small dent in this number.

However, @tunetheweb and I looked into this and the query had problems. The following queries rely on custom metrics rather than regex on an HTTPPayload and also correct other query issues:

https://docs.google.com/spreadsheets/d/19Or5bTZlf68VH-Mwb97uBl2Xsan1YSpR3wUxLTuZVDE/edit?gid=434696809#gid=434696809

Each row represents a group of pages. The leftmost column gives the number of ID conflicts for that group (exactly 1 conflict, 2 conflicts, 3, etc.). The second column tells you how many pages in the crawl have that many conflicts; for example, roughly 17M pages in the crawl have no ID conflicts (10.3M sites; each site is 2 pages). The last column is a random sample URL from that group. Keep in mind the note at the top of the sheet that pages != loads. Finally, you can see the percentage of sites/pages.
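To make the table layout concrete, here is a minimal Python sketch of how rows of that shape could be derived. It is not the actual HTTP Archive query. The function names, the input format (a mapping from URL to the list of `id` values found on the page), and the sample data are all hypothetical. It also assumes a "conflict" means a distinct `id` value that occurs more than once on a page, which may differ from how the custom metric counts them.

```python
from collections import Counter


def count_id_conflicts(ids):
    """Count distinct id values that appear more than once on one page.

    Assumption: this is what "conflict" means; the real metric may differ.
    """
    return sum(1 for n in Counter(ids).values() if n > 1)


def conflict_histogram(pages):
    """Group pages by conflict count.

    pages: dict mapping URL -> list of id values found on that page.
    Returns rows of (conflicts, page_count, pct_of_pages, sample_url),
    sorted by conflict count. The sample here is simply the first URL
    seen in each group, not a random one as in the spreadsheet.
    """
    groups = {}
    for url, ids in pages.items():
        groups.setdefault(count_id_conflicts(ids), []).append(url)
    total = len(pages)
    return [
        (conflicts, len(urls), 100.0 * len(urls) / total, urls[0])
        for conflicts, urls in sorted(groups.items())
    ]


# Made-up pages purely for illustration.
pages = {
    "https://a.example/": ["nav", "main", "footer"],
    "https://b.example/": ["nav", "nav", "main"],
    "https://c.example/": ["x", "x", "y", "y", "y"],
    "https://d.example/": [],
}

rows = conflict_histogram(pages)
for row in rows:
    print(row)
```

Grouping by pages rather than by sites or loads mirrors the caveat above: the percentages describe the share of crawled pages, not traffic.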

h/t to @tunetheweb for the query help
