
Invalid byte sequence in US-ASCII (ArgumentError) when running sitediff diff #132

@brnquester

Summary

I have no problem running sitediff store or sitediff crawl; however, when running sitediff diff I keep getting the following error:

sitediff diff
/usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff/cache.rb:44:in `split': invalid byte sequence in US-ASCII (ArgumentError)
	from /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff/cache.rb:44:in `get'
	from /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff/fetch.rb:46:in `block in queue_path'
	from /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff/fetch.rb:45:in `each'
	from /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff/fetch.rb:45:in `queue_path'
	from /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff/fetch.rb:35:in `block in run'
	from /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff/fetch.rb:35:in `each'
	from /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff/fetch.rb:35:in `run'
	from /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff.rb:184:in `run'
	from /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff/api.rb:117:in `diff'
	from /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff/cli.rb:127:in `diff'
	from /usr/local/bundle/gems/thor-0.20.3/lib/thor/command.rb:27:in `run'
	from /usr/local/bundle/gems/thor-0.20.3/lib/thor/invocation.rb:126:in `invoke_command'
	from /usr/local/bundle/gems/thor-0.20.3/lib/thor.rb:387:in `dispatch'
	from /usr/local/bundle/gems/thor-0.20.3/lib/thor/base.rb:466:in `start'
	from /usr/local/bundle/gems/sitediff-1.1.1/bin/sitediff:12:in `<top (required)>'
	from /usr/local/bundle/bin/sitediff:23:in `load'
	from /usr/local/bundle/bin/sitediff:23:in `<main>'
Reading config file: /website/sitediff/sitediff.yaml
Read 4582 paths from: /website/sitediff/paths.txt
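For reference, the first trace can be reproduced in isolation. The sketch below assumes, without confirmation from the logs, that some entries in paths.txt contain non-ASCII UTF-8 bytes (the path `/café/menu` is a hypothetical example) and that Ruby has labeled those strings as US-ASCII. Under that assumption `String#split` refuses to scan the bytes, which matches the `split` frame at cache.rb:44. `us_ascii_path` and `split_like_cache` are illustrative names, not sitediff code.

```ruby
# Hypothetical sample: the UTF-8 bytes for "é" (0xC3 0xA9) inside a string
# that Ruby has labeled as US-ASCII.
def us_ascii_path
  "/caf\xC3\xA9/menu".dup.force_encoding(Encoding::US_ASCII)
end

# Mirrors the failing call in the trace: path.split(File::SEPARATOR)
def split_like_cache(path)
  path.split(File::SEPARATOR)
end

path = us_ascii_path
puts path.encoding        # US-ASCII
puts path.valid_encoding? # false: 0xC3 and 0xA9 are not valid US-ASCII bytes

begin
  split_like_cache(path)
rescue ArgumentError => e
  puts e.message          # invalid byte sequence in US-ASCII
end
```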

Solution attempts

I was able to get past that error by patching cache.rb with:

sed -i 's/path.split(File::SEPARATOR)/path.encode('\''UTF-8'\'', :invalid => :replace).split(File::SEPARATOR)/g' /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff/cache.rb
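The sed patch transcodes each path from US-ASCII to UTF-8 and replaces every byte that is invalid in US-ASCII. That stops the crash, but the replaced segments no longer match the original path. The sketch below compares it with relabeling the same bytes via `force_encoding`, using the same hypothetical `/café/menu` path. `patched_split` and `relabeled_split` are illustrative names, and relabeling only helps if the bytes really are UTF-8, which is an assumption here.

```ruby
# What the sed patch does: transcode US-ASCII -> UTF-8, replacing bytes
# that are invalid in the source encoding (same as :invalid => :replace).
def patched_split(path)
  path.encode("UTF-8", invalid: :replace).split(File::SEPARATOR)
end

# Hypothetical alternative: keep the bytes and only change the label,
# assuming they are already valid UTF-8.
def relabeled_split(path)
  path.dup.force_encoding(Encoding::UTF_8).split(File::SEPARATOR)
end

raw = "/caf\xC3\xA9/menu".dup.force_encoding(Encoding::US_ASCII)

p patched_split(raw)   # no exception, but the "é" bytes are replaced
p relabeled_split(raw) # no exception, and the "café" segment is kept intact
```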

But then I started getting a different error:

sitediff diff
/usr/local/bundle/gems/addressable-2.5.2/lib/addressable/uri.rb:107:in `scan': invalid byte sequence in US-ASCII (ArgumentError)
	from /usr/local/bundle/gems/addressable-2.5.2/lib/addressable/uri.rb:107:in `parse'
	from /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff/uriwrapper.rb:52:in `initialize'
	from /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff/fetch.rb:54:in `new'
	from /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff/fetch.rb:54:in `block in queue_path'
	from /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff/fetch.rb:45:in `each'
	from /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff/fetch.rb:45:in `queue_path'
	from /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff/fetch.rb:35:in `block in run'
	from /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff/fetch.rb:35:in `each'
	from /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff/fetch.rb:35:in `run'
	from /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff.rb:184:in `run'
	from /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff/api.rb:117:in `diff'
	from /usr/local/bundle/gems/sitediff-1.1.1/lib/sitediff/cli.rb:127:in `diff'
	from /usr/local/bundle/gems/thor-0.20.3/lib/thor/command.rb:27:in `run'
	from /usr/local/bundle/gems/thor-0.20.3/lib/thor/invocation.rb:126:in `invoke_command'
	from /usr/local/bundle/gems/thor-0.20.3/lib/thor.rb:387:in `dispatch'
	from /usr/local/bundle/gems/thor-0.20.3/lib/thor/base.rb:466:in `start'
	from /usr/local/bundle/gems/sitediff-1.1.1/bin/sitediff:12:in `<top (required)>'
	from /usr/local/bundle/bin/sitediff:23:in `load'
	from /usr/local/bundle/bin/sitediff:23:in `<main>'
Reading config file: /website/sitediff/sitediff.yaml
Read 4581 paths from: /website/sitediff/paths.txt
Using sites from cache: before
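The second trace has the same shape: Addressable's `scan` fails on the same kind of US-ASCII string, so patching one call site only moves the error to the next one. A hedged alternative is to fix the encoding where the paths are read. This sketch assumes paths.txt is UTF-8 and uses a hypothetical `read_paths` helper (not sitediff's actual reader) to show that the encoding chosen when opening the file decides whether the resulting strings are valid.

```ruby
require "tempfile"

# Hypothetical helper (not sitediff's code): read one path per line and
# label the strings with an explicit external encoding instead of relying
# on Encoding.default_external, which comes from the locale.
def read_paths(file, external_encoding)
  File.open(file, "r:#{external_encoding}") { |f| f.readlines.map(&:chomp) }
end

# Assumption: paths.txt is UTF-8 and contains at least one non-ASCII path.
Tempfile.create("paths") do |tmp|
  tmp.binmode
  tmp.write("/caf\xC3\xA9/menu\n/about\n".b)
  tmp.flush

  as_ascii = read_paths(tmp.path, "US-ASCII") # what a C/POSIX locale yields
  as_utf8 = read_paths(tmp.path, "UTF-8")

  puts as_ascii.map(&:valid_encoding?).inspect # [false, true]
  puts as_utf8.map(&:valid_encoding?).inspect  # [true, true]
end
```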

I have also tried declaring the encoding in the container before installing and running sitediff, with no success:

export LANG="en_US.UTF-8"
export LANGUAGE="en_US.UTF-8"
export LC_CTYPE="en_US.UTF-8"
export LC_NUMERIC="en_US.UTF-8"
export LC_TIME="en_US.UTF-8"
export LC_COLLATE="en_US.UTF-8"
export LC_MONETARY="en_US.UTF-8"
export LC_MESSAGES="en_US.UTF-8"
export LC_PAPER="en_US.UTF-8"
export LC_NAME="en_US.UTF-8"
export LC_ADDRESS="en_US.UTF-8"
export LC_TELEPHONE="en_US.UTF-8"
export LC_MEASUREMENT="en_US.UTF-8"
export LC_IDENTIFICATION="en_US.UTF-8"
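When no encoding is given, Ruby labels strings read from files with `Encoding.default_external`, which it derives from the locale at startup. If the exports above have no visible effect, one possible reason (an assumption, not verified here) is that the en_US.UTF-8 locale is not installed in the container, so Ruby falls back to US-ASCII. A small diagnostic sketch follows; `encoding_report` is a hypothetical helper name.

```ruby
# Diagnostic sketch: report what Ruby derived from the environment at startup.
# encoding_report is a hypothetical helper name.
def encoding_report
  {
    "LANG" => ENV["LANG"],
    "LC_ALL" => ENV["LC_ALL"],
    "locale_charmap" => Encoding.locale_charmap,
    "default_external" => Encoding.default_external.name
  }
end

encoding_report.each { |key, value| puts "#{key}: #{value.inspect}" }

# Hypothetical in-process override. It only affects files opened after this
# line; the startup equivalents are `ruby -EUTF-8` or RUBYOPT=-EUTF-8.
Encoding.default_external = Encoding::UTF_8
```

Running it in the same shell as `sitediff diff` shows whether the exports take effect; a `locale_charmap` of "ANSI_X3.4-1968" would suggest the locale was not applied.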

Any thoughts?

Tech stack

  • Ubuntu 21.04
  • Docker container with Ruby v2.6.9
  • Sitediff v1.1.1
