Parsing not always round-tripable #67

@sh54

Assumptions

This assumes that a goal of the library is for a test like this to pass for any given Org document:

(let [org "* Headline"]
    (is (= org (write-str (read-str org)))))

The bug

Parsing certain Org mode documents drops information about spacing, resulting in a data structure that, when written back out, does not match the original document.

To Reproduce

Note the use of extra spacing:

*   TODO   [#A]   Buy raspberries   :purchase:   

Result from parse:

[:S
 [:headline
  [:stars "*"]
  [:keyword "TODO"]
  [:priority "A"]
  [:text [:text-normal "Buy raspberries   :purchase:   "]]]]

Only the extra spacing around the tags is preserved; the spacing after the stars, keyword, and priority is gone.

Result from (comp transform parse):

{:headlines
 [{:headline
   {:level 1,
    :title [[:text-normal "Buy raspberries"]],
    :planning [],
    :tags ["purchase"]}}]}

Now the spacing around the tags is lost as well.

Expected behavior

The extra spaces between the stars, keyword, priority, and title surely violate most people's style guides, but they seem to be perfectly valid Org. Extra spacing before the tags allows them to be right-aligned. Extraneous spacing should be removed by a separate formatting pass, not dropped during parsing.

I feel that extra spaces should be preserved, ideally in a way that does not make the AST much harder to manipulate.

Suggested parse structure

[:S
 [:headline
  [:stars "*" [:s "   "]]
  [:keyword "TODO" [:s "   "]]
  [:priority "A" [:s "   "]]
  [:text [:text-normal "Buy raspberries   :purchase:   "]]]]
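
To illustrate why this structure round-trips, here is a minimal sketch of a writer over the suggested tree. `node->str` is a hypothetical helper and not part of the library. It assumes whitespace is carried in `[:s ...]` child nodes and that the priority cookie is rebuilt as `[#X]` from its letter.

```clojure
;; Hypothetical helper: serialise a node of the suggested parse tree.
;; Strings are emitted verbatim, so the [:s "   "] nodes keep their spacing.
(defn node->str
  [node]
  (if (string? node)
    node
    (let [[tag & children] node]
      (if (= tag :priority)
        ;; First child is the priority letter; the rest is trailing spacing.
        (str "[#" (first children) "]"
             (apply str (map node->str (rest children))))
        (apply str (map node->str children))))))

(def suggested-tree
  [:S
   [:headline
    [:stars "*" [:s "   "]]
    [:keyword "TODO" [:s "   "]]
    [:priority "A" [:s "   "]]
    [:text [:text-normal "Buy raspberries   :purchase:   "]]]])

(node->str suggested-tree)
;; => "*   TODO   [#A]   Buy raspberries   :purchase:   "
```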

Suggested transform structure

{:headlines
 [{:headline
   {:level 1,
    :level-post-spacing "   "
    :keyword "TODO" ;; keyword not present is another issue
    :keyword-post-spacing "   "
    :priority "A" ;; priority not present is another issue
    :priority-post-spacing "   "
    :title [[:text-normal "Buy raspberries"]],
    :title-post-spacing "   "
    :planning [],
    :tags ["purchase"]
    :tags-post-spacing "   "}}]}

The original document can be reproduced from either structure. If someone wants to fix the formatting by manipulating the data structure, that should be easy too.
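
As a sketch of both points for the transform structure: the hypothetical `headline->str` below rebuilds the headline text from the suggested map, and the hypothetical `normalize-spacing` shows a formatting pass done purely on the data. Neither function exists in the library. The `*-post-spacing` keys are the ones proposed above, and the title is assumed to be a flat list of `[tag text]` pairs.

```clojure
(require '[clojure.string :as string])

;; Hypothetical writer for one headline map in the suggested shape.
(defn headline->str
  [h]
  (str (apply str (repeat (:level h) "*")) (:level-post-spacing h)
       (when-let [kw (:keyword h)]
         (str kw (:keyword-post-spacing h)))
       (when-let [p (:priority h)]
         (str "[#" p "]" (:priority-post-spacing h)))
       ;; Assumes each title element is a [tag text] pair.
       (apply str (map second (:title h))) (:title-post-spacing h)
       (when (seq (:tags h))
         (str ":" (string/join ":" (:tags h)) ":" (:tags-post-spacing h)))))

;; Hypothetical formatting pass: collapse every gap to a single space
;; and drop trailing whitespace after the tags.
(defn normalize-spacing
  [h]
  (assoc h
         :level-post-spacing " "
         :keyword-post-spacing " "
         :priority-post-spacing " "
         :title-post-spacing " "
         :tags-post-spacing ""))

(def example-headline
  {:level 1
   :level-post-spacing "   "
   :keyword "TODO"
   :keyword-post-spacing "   "
   :priority "A"
   :priority-post-spacing "   "
   :title [[:text-normal "Buy raspberries"]]
   :title-post-spacing "   "
   :planning []
   :tags ["purchase"]
   :tags-post-spacing "   "})

(headline->str example-headline)
;; => "*   TODO   [#A]   Buy raspberries   :purchase:   "

(headline->str (normalize-spacing example-headline))
;; => "* TODO [#A] Buy raspberries :purchase:"
```

Keeping the spacing in sibling keys means code that only cares about `:title` or `:tags` can ignore it entirely.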
