Remove dependency on "word-to-markdown" #146

@ronaldtse

The only purpose of the following code is to convert a Word document into HTML and then into Coradoc.

We should replace it, because the "word-to-markdown" gem introduces unnecessary dependencies that we don't want.

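require "tmpdir"            # Dir.mktmpdir
require "fileutils"         # FileUtils.rm_rf
require "word-to-markdown"  # the dependency this issue proposes to remove

module Coradoc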
  module Input::Docx
    def self.processor_id
      :docx
    end

    def self.processor_match?(filename)
      %w[.docx .doc].any? { |i| filename.downcase.end_with?(i) }
    end

    def self.processor_execute(input, options = {})
      image_dir = Dir.mktmpdir
      options = options.merge(sourcedir: image_dir)
      doc = WordToMarkdown.new(input, image_dir)
      doc = Coradoc::Input::HTML.cleaner.preprocess_word_html(doc.document.html)
      options = WordToMarkdown::REVERSE_MARKDOWN_OPTIONS.merge(options)
      Coradoc::Input::HTML.to_coradoc(doc, options)
    ensure
      FileUtils.rm_rf(image_dir)
    end

    def self.processor_postprocess(data, options)
      Coradoc::Input::HTML.processor_postprocess(data, options)
    end

    # This processor prefers to work on original files.
    def self.processor_wants_filenames; true; end

    Coradoc::Input.define(self)
  end
end
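
As a rough illustration of the Word-to-HTML step without the gem, here is a minimal sketch that shells out to LibreOffice directly. It assumes a `soffice` binary is available on the `PATH`. The module name `DocxToHtmlSketch` and its methods are hypothetical and not part of Coradoc. Image extraction and the `REVERSE_MARKDOWN_OPTIONS` defaults are deliberately left out.

```ruby
require "tmpdir"
require "open3"

# Hypothetical helper, not part of Coradoc: converts a Word file to HTML
# by calling LibreOffice in headless mode.
module DocxToHtmlSketch
  # Build the LibreOffice command line as an argv array (no shell quoting needed).
  def self.command(input, outdir, soffice: "soffice")
    [soffice, "--headless", "--convert-to", "html", "--outdir", outdir, input]
  end

  # LibreOffice names the output after the input file's basename.
  def self.html_path(input, outdir)
    File.join(outdir, "#{File.basename(input, '.*')}.html")
  end

  # Run the conversion in a temporary directory and return the HTML string.
  # The block form of Dir.mktmpdir removes the directory afterwards.
  def self.convert(input, soffice: "soffice")
    Dir.mktmpdir do |outdir|
      _out, err, status = Open3.capture3(*command(input, outdir, soffice: soffice))
      raise "conversion failed: #{err}" unless status.success?

      File.read(html_path(input, outdir))
    end
  end
end
```

The returned HTML could then be handed to `Coradoc::Input::HTML.cleaner.preprocess_word_html` and `Coradoc::Input::HTML.to_coradoc`, as the existing processor does. Because the temporary directory is deleted when `convert` returns, a real replacement would need to keep it alive until any extracted images have been processed.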


Labels: Tech debt, enhancement
