- UnicodeFix
- Why Is This Happening?
- Installation
- Usage
- Brief Examples
- What's New / What's Cool
- Shortcut for macOS
- What's in This Repository
- Testing and CI/CD
- Contributing
- Support This and Other Projects
- Changelog
- License
Finally - a tool that blasts AI fingerprints, torches those infuriating smart quotes, and leaves your code & docs squeaky clean for real humans.
Ever open up a file and instantly know it came from ChatGPT, Copilot, or one of their AI cousins? (Yeah, so can everyone else now.) UnicodeFix vaporizes all the weird dashes, curly quotes, invisible space ninjas, and digital "tells" that out you as an AI user - or just make your stuff fail linters and code reviews.
Whether you're a student, a dev, or an open-source rebel: this is your "eraser for AI breadcrumbs."
Yes, it helps students cheat on their homework. It also makes blog posts and AI-proofed emails look like you sweated over every character. Nearly a thousand people have grabbed it. Nobody's bought me a coffee yet, but hey… there's a first time for everything.
Some folks think all this Unicode cruft is a side-effect of generative AI's training data. Others believe it's a deliberate move - baked-in "watermarks" to ID machine-generated text. Either way: these artifacts leave a trail. UnicodeFix wipes it.
Clone the repository and run the setup script:
git clone https://github.com/unixwzrd/UnicodeFix.git
cd UnicodeFix
bash setup.sh
The setup.sh script:
- Creates a Python virtual environment just for UnicodeFix
- Installs dependencies
- Adds handy startup config to your .bashrc for one-command usage
See setup.sh for the nitty-gritty.
For serious environment nerds: VenvUtil is my full-featured Python env toolkit.
Once installed and activated:
(python-3.10-PA-dev) [unixwzrd@xanax: UnicodeFix]$ cleanup-text --help
usage: cleanup-text [-h] [-i] [-o OUTPUT] [-t] [-p] [-n] [infile ...]

Clean Unicode quirks from text. If no input files are given, reads from STDIN and writes to STDOUT (filter mode). If input files are given, creates cleaned files with .clean before the extension (e.g., foo.txt -> foo.clean.txt). Use -o - to force output to STDOUT for all input files, or -o <file> to specify a single output file (only with one input file).

positional arguments:
  infile                Input file(s)

options:
  -h, --help            show this help message and exit
  -i, --invisible       Preserve invisible Unicode characters (zero-width, non-breaking, etc.)
  -o OUTPUT, --output OUTPUT
                        Output file name, or '-' for STDOUT. Only valid with one input file, or use '-' for STDOUT with multiple files.
  -t, --temp            In-place cleaning: Move each input file to .tmp, clean it, write cleaned output to original name, and delete .tmp after success.
  -p, --preserve-tmp    With -t, preserve the .tmp file after cleaning (do not delete it). Useful for backup or manual recovery.
  -n, --no-newline      Do not add a newline at the end of the output file (suppress final newline).
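The help text above describes two ways output lands on disk: the default `.clean` naming and the `-t` in-place flow. Here is a rough Python sketch of both, just to make the behavior concrete. It is not lifted from cleanup-text.py. The function names are made up, and the exact `.tmp` file name (here `<name>.tmp`) and the handling of files with no extension are assumptions.

```python
import os
from pathlib import Path
from typing import Callable


def clean_name(path: str) -> Path:
    """Default mode: put '.clean' before the extension (foo.txt -> foo.clean.txt)."""
    p = Path(path)
    # Assumption: a file with no extension simply gets '.clean' appended.
    return p.with_name(f"{p.stem}.clean{p.suffix}")


def clean_in_place(path: str, clean: Callable[[str], str],
                   preserve_tmp: bool = False) -> None:
    """Sketch of the -t flow; the '<name>.tmp' naming is an assumption."""
    original = Path(path)
    backup = original.with_name(original.name + ".tmp")
    os.replace(original, backup)                 # move the input aside
    try:
        cleaned = clean(backup.read_text(encoding="utf-8"))
        original.write_text(cleaned, encoding="utf-8")
    except Exception:
        os.replace(backup, original)             # put the untouched original back
        raise
    if not preserve_tmp:                         # -p keeps the .tmp as a backup
        backup.unlink()


print(clean_name("foo.txt"))  # foo.clean.txt
```

The point of moving the original aside first is that a failed run never leaves you with a half-written file and no way back.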
cat file.txt | cleanup-text > cleaned.txt
cleanup-text *.txt
cleanup-text -t myfile.txt
cleanup-text -t -p myfile.txt
:%!cleanup-text
You can run it from Vim, VS Code in Vim mode, or as a pre-commit hook. Use it for email, blog posts, whatever. Ignore the naysayers - this is real-world convenience.
See cleanup-text.md for deeper dives and arcane options.
- Make sure your Python environment is activated before launching your editor, or wrap it in a shell script that does it for you.
- Adjust your editor's shell settings as needed for best results.
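If you want to drive filter mode from a script (say, a pre-commit style check), something like the sketch below should do. It relies only on the documented behavior that cleanup-text with no input files reads STDIN and writes STDOUT. The helper names `run_filter` and `dirty_files` are hypothetical, and it assumes `cleanup-text` is on your PATH with the environment activated.

```python
import subprocess
import sys
from typing import List, Sequence


def run_filter(text: str, command: Sequence[str] = ("cleanup-text",)) -> str:
    """Pipe text through a STDIN -> STDOUT filter and return what comes back."""
    result = subprocess.run(
        list(command),
        input=text,
        capture_output=True,
        encoding="utf-8",
        check=True,          # raise if the filter exits non-zero
    )
    return result.stdout


def dirty_files(paths: Sequence[str],
                command: Sequence[str] = ("cleanup-text",)) -> List[str]:
    """Return the paths whose content would change after cleaning."""
    changed = []
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            original = fh.read()
        if run_filter(original, command) != original:
            changed.append(path)
    return changed


def main(paths: Sequence[str]) -> int:
    """Hook-style entry point: report offenders, return 1 if there are any."""
    changed = dirty_files(paths)
    for path in changed:
        print(f"needs cleaning: {path}", file=sys.stderr)
    return 1 if changed else 0
```

Wire it up with `sys.exit(main(sys.argv[1:]))` in whatever hook runner you use. The `command` parameter is there so you can point it at a wrapper script that activates the environment first.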
- Vaporizes invisible Unicode (unless you tell it not to)
- Normalizes em and en dashes to plain ASCII hyphens - no more AI dash nonsense
- Wipes AI "tells," watermarks, and digital fingerprints
- Fixes trailing whitespace, normalizes newlines, burns the digital junk
- Portable (Python 3.7+), cross-platform
- Integrated macOS Shortcut for right-click cleaning in Finder
- Can be used in CI/CD - but also by normal humans, not just pipeline freaks
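To give a feel for what that list means in practice, here is a minimal Python sketch of this kind of cleanup. It is an illustration, not the code in cleanup-text.py: the character tables are hypothetical and deliberately small, and the real tool may map more (or different) characters. The `keep_invisible` and `final_newline` parameters loosely mirror the `-i` and `-n` options.

```python
import re

# Hypothetical tables; the real tool's mappings may differ.
PUNCTUATION = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
}
INVISIBLE = {
    "\u00a0": " ",                  # non-breaking space -> plain space
    "\u200b": "", "\u200c": "",     # zero-width space, zero-width non-joiner
    "\u200d": "", "\u2060": "",     # zero-width joiner, word joiner
    "\ufeff": "",                   # byte order mark / zero-width no-break space
}


def clean_text(text: str, keep_invisible: bool = False,
               final_newline: bool = True) -> str:
    mapping = dict(PUNCTUATION)
    if not keep_invisible:
        mapping.update(INVISIBLE)
    text = text.translate(str.maketrans(mapping))
    text = text.replace("\r\n", "\n").replace("\r", "\n")    # normalize newlines
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)  # trailing whitespace
    if final_newline and not text.endswith("\n"):
        text += "\n"
    return text


print(repr(clean_text("\u201cHi\u201d \u2014 it\u2019s\u200b done  \r\n")))
# '"Hi" - it\'s done\n'
```

A lookup table plus `str.translate` keeps the whole pass to a single scan of the text, which is plenty for files of README size.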
Fun fact: Python 3 refuses to run code whose string delimiters are "curly quotes" (it raises a SyntaxError), yet your IDE, email client, and browser all sneak these in. UnicodeFix hunts them down and torches them.
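How much damage a curly quote does depends on where it lands. A quick check you can run yourself (the `compiles` helper is just a throwaway name for this example): inside a string literal a curly quote is an ordinary character, but used as the delimiter it stops Python 3 from compiling the line at all.

```python
def compiles(source: str) -> bool:
    """Return True if Python can compile the snippet, False on SyntaxError."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


straight = 'greeting = "hello"'
curly_delims = "greeting = \u201chello\u201d"       # curly quotes as delimiters
curly_inside = 'greeting = "\u201chello\u201d"'     # curly quotes inside a normal string

print(compiles(straight))      # True
print(compiles(curly_delims))  # False
print(compiles(curly_inside))  # True
```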
UnicodeFix ships with a macOS Shortcut for direct Finder integration.
Right-click files, pick a Quick Action, and - bam - no terminal required.
- Open the Shortcuts app.
- Choose File -> Import.
- Select the Shortcut in macOS/Strip Unicode.shortcut.
- Edit it to point to your local cleanup-text.py.
- Relaunch Finder (Cmd+Opt+Esc → select Finder → Relaunch) if needed.
- After setup, right-click files, choose Quick Actions, select Strip Unicode.
- bin/cleanup-text.py - Main cleaning script
- bin/cleanup-text - Symlink for CLI usage
- setup.sh - Easy setup and env configuration
- requirements.txt - Python dependencies
- macOS/ - Shortcuts, scripts for Finder
- data/ - Example test files
- test/ - Automated test suite for all features/edge cases
- docs/ - Documentation and screenshots
- LICENSE
- README.md - This file
UnicodeFix comes with a full, automated test suite:
- Runs every feature & scenario on files in data/
- Outputs to test_output/ (by scenario, with diffs and word counts)
- Clean up with: ./test/test_all.sh clean
- Plug into your CI/CD pipeline or just use as a "paranoia check" before shipping anything
Pro tip: Run the tests before you merge, publish, or email a "final" version.
See docs/test-suite.md for the deep dive.
Feedback, bug reports, and patches welcome.
If you've got a better integration path for your favorite platform, let's make it happen. Pull requests with attitude, creativity, and clean diffs appreciated.
If UnicodeFix (or my other projects) saved your bacon or made you smile, please consider fueling my caffeine habit and indie dev obsession:
One coffee = one more tool released to the wild.
Thank you for keeping solo development alive!
See CHANGELOG.md for the latest drop.
Copyright 2025 unixwzrd@unixwzrd.ai
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.