Entities Changed When Converting from markdown to HTML #10822
Replies: 5 comments 1 reply
-
I assume you're using the So, the answer is no.
-
Hi John,
Thanks for your informative reply.
No biggie about the entities, as Kysko would say.
Maybe a kind person on this list could point me to an example of a filter
that I could modify to my own questionable ends?
Thanks!
John
-
@johnoregan -- this Lua filter should do what you want. You can add additional entities to `html_entities`.

```lua
local html_entities = pandoc.List{
  ["\u{00A0}"] = "&nbsp;",
  ["\u{2026}"] = "&hellip;"
}

-- Collect every mapped character into one string for use in character classes.
local char_str = ""
for k in pairs(html_entities) do
  char_str = char_str .. k
end

---Replace appropriate Unicode glyphs with HTML entities.
---@param str Str
---@return List<(Str | RawInline)>, false
---@overload fun(str: Str): nil
function Str(str)
  local inlines = pandoc.List{}
  local text = str.text
  repeat
    -- Match either a run of ordinary characters or a run of mapped characters.
    local start, _end = re.find(text, "([^" .. char_str .. "]+) / ([" .. char_str .. "]+)")
    if start then
      local segment = text:sub(start, _end)
      text = text:sub(_end + 1)
      if re.find(segment, "[" .. char_str .. "]") then
        for entity in pairs(html_entities) do
          segment = segment:gsub(entity, html_entities)
        end
        inlines:insert(pandoc.RawInline('html', segment))
      else
        inlines:insert(pandoc.Str(segment))
      end
    end
  until not (start or _end)
  if #inlines > 0 then
    return inlines, false
  end
end
```

This could probably have been done much more elegantly using LPeg, but I haven't had the time yet to familiarise myself with that.
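For comparison, here is a minimal sketch of the same idea that avoids `re` and walks the string one UTF-8 character at a time, so multi-byte characters are never split. It assumes a pandoc build with Lua 5.3 or later (for `utf8.charpattern` and `\u{...}` escapes). The helper `split_entities` and the `FORMAT` check that limits the filter to HTML output are my own additions for illustration, not part of the filter above.

```lua
-- Map from Unicode character to the named entity to emit (extend as needed).
local html_entities = {
  ["\u{00A0}"] = "&nbsp;",
  ["\u{2026}"] = "&hellip;",
}

--- Split `text` into plain-text and entity segments (hypothetical helper).
--- Returns a list of tables of the form {kind = "text" | "entity", value = string}.
local function split_entities(text)
  local segments = {}
  local plain = {}
  -- utf8.charpattern matches exactly one UTF-8 encoded character.
  for ch in text:gmatch(utf8.charpattern) do
    local entity = html_entities[ch]
    if entity then
      -- Flush any plain text collected so far, then record the entity.
      if #plain > 0 then
        segments[#segments + 1] = {kind = "text", value = table.concat(plain)}
        plain = {}
      end
      segments[#segments + 1] = {kind = "entity", value = entity}
    else
      plain[#plain + 1] = ch
    end
  end
  if #plain > 0 then
    segments[#segments + 1] = {kind = "text", value = table.concat(plain)}
  end
  return segments
end

--- Filter function: pandoc calls this for every Str element.
function Str(el)
  -- Assumption: raw HTML is only wanted for HTML-based output formats.
  if not FORMAT:match("html") then
    return nil
  end
  local inlines = pandoc.List{}
  local found = false
  for _, seg in ipairs(split_entities(el.text)) do
    if seg.kind == "entity" then
      found = true
      inlines:insert(pandoc.RawInline("html", seg.value))
    else
      inlines:insert(pandoc.Str(seg.value))
    end
  end
  -- Returning nil leaves elements without mapped characters untouched.
  if found then
    return inlines
  end
end
```

Saved under a name of your choosing (say `named-entities.lua`, a hypothetical file name), it would be run with `pandoc --lua-filter named-entities.lua input.md -o output.html`.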
-
Hi RNW,
Many thanks!
Top notch stuff.
John
-
Hi Nat,
Thanks for the pointer.
Will do rn!
John
-
Hello All,
I recently upgraded to Pandoc 3.6.4 on Windows.
I've noticed that HTML named entities in my markdown are being transformed into hexadecimal entities when converted to HTML. For instance, `&nbsp;` becomes `&#xA0;` and `&hellip;` becomes `&#x2026;`, while `&lt;`, `&amp;`, and `&gt;` pass through unchanged.
Besides doing something like `&hellip;`{=html}, is there a way to prevent Pandoc from transmogrifying my named entities?
Thanks!
John