[advice] Siegfried fails to identify a txt file when the filename extension is "wrong" #257

@amayita

Hello there!

I am using Siegfried within Archivematica to identify files, and came across an issue that may be minor, but is perhaps easy to fix.

When a plain ASCII text file has a .doc extension in its file name, like ASCII_text.doc, Siegfried fails to identify it. It seems to assume the file is a Word document and does not attempt to identify what is actually in there.

The file command does identify the ASCII_text.doc file as ASCII text 😄

Is there any way we could improve this behavior upstream? Or is this too small to spend time on?
I really think an ASCII text file should not fail to be identified, no matter the filename.
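
To illustrate the kind of content-based check I have in mind, here is a minimal Python sketch. It is not how Siegfried or the file command work internally. The function names `looks_like_ascii_text` and `sniff_file` and the 4096-byte sample size are hypothetical choices made only for this example.

```python
# Minimal sketch of a content-based fallback: decide whether a file's bytes
# look like plain ASCII text, ignoring the filename extension entirely.
# Assumption: "ASCII text" means printable ASCII plus tab, LF and CR.

ALLOWED_BYTES = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def looks_like_ascii_text(data: bytes) -> bool:
    """Return True if data is non-empty and contains only allowed bytes."""
    if not data:
        return False
    return all(byte in ALLOWED_BYTES for byte in data)


def sniff_file(path: str, sample_size: int = 4096) -> bool:
    """Read the first sample_size bytes of path and test them."""
    with open(path, "rb") as handle:
        return looks_like_ascii_text(handle.read(sample_size))
```

With something like this, `sniff_file("ASCII_text.doc")` would return True for a file containing only printable ASCII, while a real Word document would fail the check because its binary header contains bytes outside the allowed set.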

More info:

$ file bitstream_76d19645-b414-4c03-a268-27a6fb73f157.doc
bitstream_76d19645-b414-4c03-a268-27a6fb73f157.doc: ASCII text, with very long lines (690), with CRLF line terminators

But the Siegfried output for the same file:

$ sf bitstream_76d19645-b414-4c03-a268-27a6fb73f157.doc
---
siegfried   : 1.11.0
scandate    : 2024-07-29T09:58:17Z
signature   : archivematica.sig
created     : 2023-12-17T15:55:42+01:00
identifiers :
  - name    : 'archivematica'
    details : 'wikidata-definitions-3.0.0; extensions: archivematica-fmt2.xml, archivematica-fmt3.xml, archivematica-fmt4.xml, archivematica-fmt5.xml'
---
filename : 'bitstream_76d19645-b414-4c03-a268-27a6fb73f157.doc'
filesize : 12736
modified : 2024-04-24T15:36:38Z
errors   :
matches  :
  - ns      : 'archivematica'
    id      : 'UNKNOWN'
    format  :
    version :
    mime    :
    class   :
    basis   :
    warning : 'no match; possibilities based on extension are x-fmt/42, x-fmt/43, x-fmt/44, x-fmt/131, x-fmt/274, x-fmt/275, x-fmt/276, x-fmt/329, fmt/39, fmt/40, fmt/37, fmt/38, x-fmt/393, x-fmt/394, fmt/473, fmt/609, fmt/754, fmt/892, fmt/1282, fmt/1283, fmt/1688'

Thanks for any input on this!
