-
Notifications
You must be signed in to change notification settings - Fork 17
feat: rewrite metadata urls #509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -223,3 +223,37 @@ export function toArray(v) { | |
} | ||
return Array.isArray(v) ? v : [v]; | ||
} | ||
|
||
/** | ||
* Returns an updated value of the provided metadata value: | ||
* if the metadata is a URL or an array of URLs (comma separated) | ||
* and if the URL(s) contains the preview or live host, | ||
* the URL(s) are replaced by the absolute URL(s) to the production host | ||
* @param {PipelineState} state the request state | ||
* @param {string} meta the metadata value | ||
* @returns {string} meta the update metadata value or the original value | ||
*/ | ||
export function makeAbsoluteURLForMeta(state, meta) { | ||
if (meta && meta.startsWith('http')) { | ||
const rewrite = []; | ||
const split = meta.split(/\s*,\s*/).map((v) => v.trim()); | ||
|
||
const previewSuffix = state.previewHost?.replace(/^.*?--/, '--'); | ||
const liveSuffix = state.liveHost?.replace(/^.*?--/, '--'); | ||
split.forEach((v) => { | ||
try { | ||
const u = new URL(v); | ||
if ((previewSuffix && u.host.endsWith(previewSuffix)) | ||
|| (liveSuffix && u.host.endsWith(liveSuffix))) { | ||
rewrite.push(getAbsoluteUrl(state, `${u.pathname}${u.search}${u.hash}`)); | ||
} else { | ||
rewrite.push(v); | ||
} | ||
} catch (e) { | ||
rewrite.push(v); | ||
} | ||
}); | ||
Comment on lines
+243
to
+255
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. also, we need to be careful not to change the behaviour of the canonical url until we decided how it should be handled. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The new code only makes sure no |
||
return rewrite.join(', '); | ||
} | ||
return meta; | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
<head> | ||
<title>ACME CORP</title> | ||
<link rel="canonical" href="https://www.adobe.com/page-metadata-block-urls"> | ||
<meta name="description" content="Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed euismod, urna eu tempor congue, nisi erat condimentum nunc, eget tincidunt nisl nunc euismod."> | ||
<meta property="og:title" content="ACME CORP"> | ||
<meta property="og:description" content="Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed euismod, urna eu tempor congue, nisi erat condimentum nunc, eget tincidunt nisl nunc euismod."> | ||
<meta property="og:url" content="https://www.adobe.com/page-metadata-block-urls"> | ||
<meta property="og:image" content="https://www.adobe.com/default-meta-image.png?width=1200&format=pjpg&optimize=medium"> | ||
<meta property="og:image:secure_url" content="https://www.adobe.com/default-meta-image.png?width=1200&format=pjpg&optimize=medium"> | ||
<meta name="twitter:card" content="summary_large_image"> | ||
<meta name="twitter:title" content="ACME CORP"> | ||
<meta name="twitter:description" content="Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed euismod, urna eu tempor congue, nisi erat condimentum nunc, eget tincidunt nisl nunc euismod."> | ||
<meta name="twitter:image" content="https://www.adobe.com/default-meta-image.png?width=1200&format=pjpg&optimize=medium"> | ||
<meta name="locale" content="en-US"> | ||
<meta name="zero-cell" content="0"> | ||
<meta name="page-url-branch" content="https://www.adobe.com/page.html"> | ||
<meta name="live-url-branch" content="https://www.adobe.com/live.html"> | ||
<meta name="page-url-main" content="https://www.adobe.com/page.html"> | ||
<meta name="live-url-main" content="https://www.adobe.com/live.html"> | ||
<meta name="prod-url" content="https://www.adobe.com/prod-page.html"> | ||
<meta name="hlx-page-url" content="https://www.adobe.com/express/"> | ||
<meta name="hlx-page-urls-1" content="https://www.adobe.com/express/page1, https://www.adobe.com/express/page2"> | ||
<meta name="hlx-page-urls-2" content="https://www.adobe.com/express/page3, https://www.adobe.com/express/page4"> | ||
<meta name="hlx-page-urls-3" content="https://www.adobe.com/express/page5, https://www.adobe.com/express/page6"> | ||
<meta name="hlx-live-url" content="https://www.adobe.com/express/"> | ||
<meta name="branch-hlx-live-url" content="https://www.adobe.com/express/"> | ||
<link id="favicon" rel="icon" type="image/svg+xml" href="/icons/spark.svg"> | ||
<meta name="viewport" content="width=device-width, initial-scale=1"> | ||
<script src="/scripts.js" type="module"></script> | ||
<link rel="stylesheet" href="/styles.css"> | ||
</head> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
# Metadata URLs rewrite Test | ||
|
||
<table> | ||
<tr> | ||
<td>Metadata</td> | ||
</tr> | ||
<tr> | ||
<td>hlx.page url</td> | ||
<td><a href="https://main--helix-pages--adobe.hlx.page/express/">https://main--helix-pages--adobe.hlx.page/express/</a></td> | ||
</tr> | ||
<tr> | ||
<td>hlx.page urls 1</td> | ||
<td> | ||
<p><a href="https://main--helix-pages--adobe.hlx.page/express/page1">https://main--helix-pages--adobe.hlx.page/express/page1</a></p> | ||
<p><a href="https://main--helix-pages--adobe.hlx.page/express/page2">https://main--helix-pages--adobe.hlx.page/express/page2</a></p></td> | ||
</tr> | ||
<tr> | ||
<td>hlx.page urls 2</td> | ||
<td> | ||
<p><a href="https://main--helix-pages--adobe.hlx.page/express/page3">https://main--helix-pages--adobe.hlx.page/express/page3</a> </p> | ||
<p><a href="https://main--helix-pages--adobe.hlx.page/express/page4">https://main--helix-pages--adobe.hlx.page/express/page4</a> </p></td> | ||
</tr> | ||
<tr> | ||
<td>hlx.page urls 3</td> | ||
<td> | ||
<ul> | ||
<li><a href="https://main--helix-pages--adobe.hlx.page/express/page5">https://main--helix-pages--adobe.hlx.page/express/page5</a></li> | ||
<li><a href="https://main--helix-pages--adobe.hlx.page/express/page6">https://main--helix-pages--adobe.hlx.page/express/page6</a></li> | ||
</ul> | ||
</td> | ||
</tr> | ||
<tr> | ||
<td>hlx.live url</td> | ||
<td><a href="https://main--helix-pages--adobe.hlx.live/express/">https://main--helix-pages--adobe.hlx.page/express/</a></td> | ||
</tr> | ||
<tr> | ||
<td>branch hlx.live url</td> | ||
<td><a href="https://another-ref--helix-pages--adobe.hlx.live/express/">https://another-ref--helix-pages--adobe.hlx.page/express/</a></td> | ||
</tr> | ||
</table> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could reuse
rewriteUrls
here. That util is also used in therewrite-urls
step and turns URLs like this into path-only URLs. Or do we really need absolute URLs here?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or add a flag
makeAbsolute
torewrite-urls
()There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, it would probably be better to apply the rewriting in render, as the logic in metadata is already a bit complex:
helix-html-pipeline/src/steps/render.js
Lines 62 to 72 in c7b98c4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meta are strings only (not anchor with href), I think it must stay absolute urls - you would need a special treatment to consume the meta if it can be a relative or an absolute url (if external link). In fact, making them relative might break all existing logic...
I initially started with
rewriteUrls
but several things:makeAbsolute
flag would be required and this adds extra logic which is currently isolated in "my"makeAbsoluteURLForMeta
function.custombranch--repo--owner.hlx.page
, it is not considered - this could be an improvement of the current methodrewriteUrls
is used in different places, I did not want to interfere and take the risk to change some behaviourI can re-try with
rewriteUrls
and see the impact.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I assume we do not get the values from the metadata sheet in here... no ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dev branch for testing can be easily achieved in the client side code consuming the metadata: it needs the path only anyway and use the current host. And it is already the case today: the code cannot assume it gets the correct host anyway if the content is managed by an author.
The only problem that this solves is authors copy/pasting urls from any of the 3 hosts in the metadata of their Word / gdoc documents, as we tell them to do. If they paste a hlx.page url, we should just make sure it does not appear in production. All the rest is anyway out of scope and does to change here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I do not think this is a "BREAKING CHANGE", slight change in behaviour, yes, but you have a URL today, you still get a URL after this. Different host, yes but points to same "content" in prod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, switching to relative links may be too breaking, but isn't similar behavior alreasy in
makeCanonicalHtmlUrl
today? I wouldn really try to reuse existing rewriting logic as much as possible.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope.
makeCanonicalHtmlUrl
is doing something else (only appendingindex.html
where needed).We have a large collection of methods doing similar but different things. That's why it is hard to plug on top without breaking backward compatibility.
Also
rewriteUrl
should be namedmakeRelative
because this is really what it is doing.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it is. projects might expect .hlx.page or hlx.live urls, and might be not be able to deal with the production url.