Add documentation pages to the Sitemap #2078
Replies: 17 comments 6 replies
-
I wonder if it would be advantageous to support nested sitemaps? https://developers.google.com/search/docs/advanced/sitemaps/large-sitemaps This is where we would generate an OWNER/PACKAGE/sitemap.xml which includes all the pages for that package. The top-level sitemap then references all of those individual sitemaps instead of itself containing every single line. It may offer more flexibility and scalability as we add more pages under the package route.
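To make the nested layout concrete, here is a minimal sketch of a top-level sitemap index that points at one sitemap per package. The function name, the base URL argument, and the `example-owner/example-package` entry are assumptions for illustration; only the OWNER/PACKAGE/sitemap.xml path shape comes from the comment above.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(base_url, packages):
    """Return a sitemap index that references one sitemap per (owner, package)."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for owner, package in packages:
        sitemap = ET.SubElement(root, "sitemap")
        loc = ET.SubElement(sitemap, "loc")
        # Hypothetical location of the per-package sitemap.
        loc.text = f"{base_url}/{owner}/{package}/sitemap.xml"
    return ET.tostring(root, encoding="unicode")


index_xml = build_sitemap_index(
    "https://swiftpackageindex.com",
    [("vapor", "postgres-nio"), ("example-owner", "example-package")],
)
print(index_xml)
```

The top-level file then only grows by one entry per package, however many pages each package sitemap lists.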
-
It's possible, yes. Certainly, if we're able to get every documentation path then this makes sense. If we're only able to add
Looks like sitemaps can be up to 50MB (uncompressed) and have 50,000 entries. I'd say we're fine for now.
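If the entry limit ever did become a concern, the split itself is simple. A minimal sketch, assuming the URLs are already collected in a list; the helper name and the example.com URLs are made up, and the 50MB size limit is not checked here.

```python
MAX_URLS_PER_SITEMAP = 50_000  # per-file entry limit mentioned above


def chunk_urls(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive groups of at most `limit` entries."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


# Placeholder URLs, only there to exercise the split.
urls = [f"https://example.com/page/{n}" for n in range(120_000)]
chunks = chunk_urls(urls)
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20000]
```

Each chunk would become its own sitemap file, referenced from a sitemap index.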
-
Imagine a world where...
On a server, after generating a set of statically transformed DocC documentation...
Where the
We used our ability to execute a command before uploading build results to execute this command...
jq "{ title: .metadata.title, path: .variants[0].paths[0] }" data/**/*.json | jq "[inputs]"
… and received this as the output…
[
{
"title": "SemanticVersion",
"path": "/documentation/semanticversion/semanticversion"
},
{
"title": "!=(_:_:)",
"path": "/documentation/semanticversion/semanticversion/!=(_:_:)"
},
{
"title": "...(_:)",
"path": "/documentation/semanticversion/semanticversion/'...(_:)-40b95"
},
{
"title": "...(_:)",
"path": "/documentation/semanticversion/semanticversion/'...(_:)-bfr8"
},
{
"title": "...(_:_:)",
"path": "/documentation/semanticversion/semanticversion/'...(_:_:)"
},
{
"title": "..<(_:)",
"path": "/documentation/semanticversion/semanticversion/'.._(_:)"
},
{
"title": "..<(_:_:)",
"path": "/documentation/semanticversion/semanticversion/'.._(_:_:)"
},
{
"title": " Implementations",
"path": "/documentation/semanticversion/semanticversion/-implementations"
},
{
"title": "<(_:_:)",
"path": "/documentation/semanticversion/semanticversion/_(_:_:)-1ojsm"
},
{
"title": ">(_:_:)",
"path": "/documentation/semanticversion/semanticversion/_(_:_:)-4ftn7"
},
{
"title": ">=(_:_:)",
"path": "/documentation/semanticversion/semanticversion/_=(_:_:)-3q5ap"
},
{
"title": "<=(_:_:)",
"path": "/documentation/semanticversion/semanticversion/_=(_:_:)-9elz8"
},
{
"title": "build",
"path": "/documentation/semanticversion/semanticversion/build"
},
{
"title": "Comparable Implementations",
"path": "/documentation/semanticversion/semanticversion/comparable-implementations"
},
{
"title": "CustomStringConvertible Implementations",
"path": "/documentation/semanticversion/semanticversion/customstringconvertible-implementations"
},
{
"title": "description",
"path": "/documentation/semanticversion/semanticversion/description"
},
{
"title": "Equatable Implementations",
"path": "/documentation/semanticversion/semanticversion/equatable-implementations"
},
{
"title": "init(_:)",
"path": "/documentation/semanticversion/semanticversion/init(_:)"
},
{
"title": "init(_:_:_:_:_:)",
"path": "/documentation/semanticversion/semanticversion/init(_:_:_:_:_:)"
},
{
"title": "init(from:)",
"path": "/documentation/semanticversion/semanticversion/init(from:)"
},
{
"title": "isInitialRelease",
"path": "/documentation/semanticversion/semanticversion/isinitialrelease"
},
{
"title": "isMajorRelease",
"path": "/documentation/semanticversion/semanticversion/ismajorrelease"
},
{
"title": "isMinorRelease",
"path": "/documentation/semanticversion/semanticversion/isminorrelease"
},
{
"title": "isPatchRelease",
"path": "/documentation/semanticversion/semanticversion/ispatchrelease"
},
{
"title": "isPreRelease",
"path": "/documentation/semanticversion/semanticversion/isprerelease"
},
{
"title": "isStable",
"path": "/documentation/semanticversion/semanticversion/isstable"
},
{
"title": "LosslessStringConvertible Implementations",
"path": "/documentation/semanticversion/semanticversion/losslessstringconvertible-implementations"
},
{
"title": "major",
"path": "/documentation/semanticversion/semanticversion/major"
},
{
"title": "minor",
"path": "/documentation/semanticversion/semanticversion/minor"
},
{
"title": "patch",
"path": "/documentation/semanticversion/semanticversion/patch"
},
{
"title": "preRelease",
"path": "/documentation/semanticversion/semanticversion/prerelease"
}
]
Wouldn't that be a wonderful world? 😂
-
If we send back that JSON, we could use it to generate complete sitemaps for all documentation that we host.
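As a rough sketch of that step, the title/path entries could be turned into a `<urlset>` like this. The function name and the `https://example.com/owner/package` prefix are assumptions; how a DocC path maps onto a hosted URL is not specified in this thread.

```python
import json
from urllib.parse import quote
from xml.sax.saxutils import escape


def sitemap_from_entries(entries, base_url):
    """Build a <urlset> sitemap from a list of {"title", "path"} entries."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for entry in entries:
        # Percent-encode characters such as "!" or "(" found in symbol paths,
        # then XML-escape the result before embedding it in <loc>.
        loc = escape(base_url + quote(entry["path"], safe="/"))
        lines.append(f"  <url><loc>{loc}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines)


# Two entries in the same shape as the jq output above.
entries = json.loads("""[
  {"title": "build", "path": "/documentation/semanticversion/semanticversion/build"},
  {"title": "!=(_:_:)", "path": "/documentation/semanticversion/semanticversion/!=(_:_:)"}
]""")
# Hypothetical URL prefix for the hosted documentation.
sitemap_xml = sitemap_from_entries(entries, "https://example.com/owner/package")
print(sitemap_xml)
```

Percent-encoding everything except "/" is a conservative choice for the operator paths; a real implementation might keep more characters literal.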
-
It can also spit it out minified with
jq "{ title: .metadata.title, path: .variants[0].paths[0] }" data/**/*.json | jq -c "[inputs]"
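For anyone without jq to hand, roughly the same extraction can be sketched in Python. This assumes each DocC JSON file has the `metadata.title` and `variants[0].paths[0]` fields that the jq filter reads; the function name and the temporary directory used for the demo are illustrative only.

```python
import json
import tempfile
from pathlib import Path


def collect_doc_paths(data_dir):
    """Collect {"title", "path"} for every JSON file under data_dir."""
    entries = []
    for json_file in sorted(Path(data_dir).rglob("*.json")):
        doc = json.loads(json_file.read_text(encoding="utf-8"))
        entries.append({
            "title": doc["metadata"]["title"],
            "path": doc["variants"][0]["paths"][0],
        })
    return entries


# Build a tiny stand-in for the data/ directory so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "documentation" / "semanticversion" / "semanticversion.json"
    page.parent.mkdir(parents=True)
    page.write_text(json.dumps({
        "metadata": {"title": "SemanticVersion"},
        "variants": [{"paths": ["/documentation/semanticversion/semanticversion"]}],
    }), encoding="utf-8")
    entries = collect_doc_paths(tmp)

# Compact separators give minified output, similar to jq -c.
compact = json.dumps(entries, separators=(",", ":"))
print(compact)
```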
-
We could also generate the sitemap XML as part of the builder, upload the sitemap to S3 along with the documentation archives, and serve the sitemaps through our docc-proxy mechanism.
-
Of course, since we're running the builder the
-
Possibly of interest. I tack on the option
I crunch that down (with that lovely
(And my process has a second step where I grep-sed-rage across the results of that to transform them from identifiers into symbol names)
If you grab the
-
Thanks Joe! It's funny. Before I started looking at just using the directory/file structure, I was sure I had read about an option that could output a set of doc paths. I tried looking for an option on DocC to do this but didn't find
Now you remind me of its name, a quick Google later reminds me where I read about it... https://rhonabwy.com/2022/02/10/tips-for-getting-the-most-out-of-docc/ 😂
-
I learned it from Ethan on the Swift Forums when I was whining about not having a means to see the list of all possible symbols in a package. (I still like to dump all the symbols into a single doc and then sort them out as a curation process.)
-
I'm going to move this back to being a discussion for now. We may look at a more comprehensive sitemap in the future, but the documentation links are crawlable from package pages, and we don't list any other package sub-pages right now.
The really interesting one that would give a significant advantage would be to use the
-
I've received a user report that searching for
Another point was to use
-
Having to reply via email - apologies, no direct access to GitHub at the moment.

If I’m tracking correctly, we’d like to have the content of doc pages within the full text search index. Externally that might be easier, as Google et al index off the content of crawled pages with JS enabled. Current DocC content only works when viewing it through a JavaScript-enabled lens. The underlying bits we’d like to index ARE included in the JSON files, so they’re sort of available to scrape, normalize, and pull into a full text index - but that’s going to be rather obnoxiously difficult. I was experimenting with rendering to static HTML, but it’s nowhere near close (or even decent looking) - a variation of that might be useful enough to get a series of plain text strings from the encoded JSON data though.

If we only want the symbol names, that’s a bit easier target to hit - and there are potentially indexes pre-built of all the symbols available (in a URI form) that can be disambiguated into strings we might be able to use for this purpose.

-joe

On Nov 3, 2022, at 8:38 PM, Sven A. Schmidt wrote:
> I was also searching for something doc related the other day and realised we're not really ranking. I can't remember what the search actually was 😕 but it'd be great if we could boost this. Docs are very compelling content to rank on - I find that most of my use of SPI is already docs related.
-
I have some thoughts on this, too. Will write them up on Monday or Tuesday.
-
You’ll note that I didn’t say which Monday or Tuesday! 😂

So. The reason I closed this/moved it to a discussion was because while it would be helpful to let Google know about every one of the documentation pages that we host, it should not be necessary.

Having a sitemap for the main package pages is essential. Most of the content on the site is only available through search, and so if we want Google to know about every package, we need to give it a full index of every package. That’s what our sitemap does right now. Once Google knows about each package page, it should (and does) crawl from there.

To prove this is working, here are a couple of queries that return results both on package sub-pages and also on documentation pages:

So, fundamentally, the documentation we host is being indexed by Google. This also proves that the JavaScript-based rendering of DocC is not tripping Google up as those pages in the "ExpressibleByStringLiteral" search are not on top level documentation pages so it’s successfully following links rendered by the DocC JavaScript renderer.

The implementation of a documentation sitemap would be complex, and require a lot of work:
It was at this point that, given that crawling was working and faced with a large implementation like that, I decided it wasn’t worth the effort.

Now we get to the meat of the problem. I presume the “PostgresQuery ExpressibleByStringLiteral” query you mentioned was referring to this page not being findable through Google search? I checked in the Google Search Console and, sure enough, it’s not in Google’s index:

The cause of this documentation sub-page not being indexed is that the top-level documentation page for that package is also not in Google’s index:

I can’t explain why Google isn’t crawling this link, but it does lead me to a slightly easier solution that might save us from the full job of per-package sitemaps and moving the

We could add a single line to the existing sitemap for packages that have documentation:

<url>
  <loc>https://swiftpackageindex.com/vapor/postgres-nio</loc>
  <changefreq>daily</changefreq>
</url>
<url>
  <loc>https://swiftpackageindex.com/vapor/postgres-nio/documentation</loc>
  <changefreq>daily</changefreq>
</url>

I’ll add a ticket for that change now and we can see if that helps.
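The change could look something like the sketch below: every package gets its existing entry, and packages with documentation get one extra entry. The function name and the `has_documentation` flag are hypothetical; this is not how the site's sitemap code is actually structured.

```python
from xml.sax.saxutils import escape

BASE_URL = "https://swiftpackageindex.com"


def package_sitemap_entries(owner, package, has_documentation):
    """Return the <url> entries for one package as an XML fragment."""
    urls = [f"{BASE_URL}/{owner}/{package}"]
    if has_documentation:
        # The extra line: point crawlers at the documentation root as well.
        urls.append(f"{BASE_URL}/{owner}/{package}/documentation")
    return "\n".join(
        f"<url>\n  <loc>{escape(url)}</loc>\n  <changefreq>daily</changefreq>\n</url>"
        for url in urls
    )


with_docs = package_sitemap_entries("vapor", "postgres-nio", has_documentation=True)
without_docs = package_sitemap_entries("vapor", "postgres-nio", has_documentation=False)
print(with_docs)
```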
-
I came to check on this today and unfortunately the postgres-nio package has moved its documentation away from SPI. 😬 I checked a few other packages, though, and it looks like this change has not worked. For example, none of these pages are indexed by Google yet:
And yet at the same time, these are:
It’s very, very odd.
-
Tracked as #2200
-
They're not included in the SiteURL so it didn't happen automatically. Also, we should look into whether DocC outputs anything we can use to grab all of the documentation page paths, not just the /[owner]/[package]/[ref]/documentation page.