This repository provides a Python script for rewriting academic texts (from .docx or .pdf) with optional expansion, using Groq Cloud's language models.
Revising and extending an existing book by adding paragraphs, text chunks, or other details is one of the trickier parts of publishing. With Python and Groq Cloud, this kind of editorial audit can be automated. This repo offers a flexible tool for rewriting text with slight expansion.
- Multi-format Input Support: Automatically processes either `.docx` or `.pdf` files, depending on the file extension.
- Chunk-wise Rewriting: Splits input text into manageable chunks (by paragraphs in `.docx`, or reconstructed paragraph-like segments in `.pdf`) before rewriting, which helps preserve context and improves output quality.
- Groq Cloud API Integration: Uses Groq language models for semantic rewriting with optional content expansion.
- Preserves Logical Structure: Special handling of headings or section openers (e.g., lines ending with a colon) to keep text coherent across chunks.
- Terminal-based CLI Tool: No GUI needed; just run the script from terminal with a few parameters.
- Automatic File Detection: The script detects the first supported file (`.docx` or `.pdf`) in the directory if no input is explicitly provided.
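The chunking and heading-handling features above can be illustrated with a minimal sketch. This is not the code from `rewrIT-er.py`; the function name `build_chunks` and the `max_chars` limit are hypothetical, and the only behavior taken from the feature list is that a line ending with a colon is kept together with the text that follows it.

```python
def build_chunks(paragraphs, max_chars=1500):
    """Group paragraphs into chunks of roughly max_chars characters.

    Assumption for illustration: a paragraph ending with ':' is a section
    opener, so a chunk is never closed right after it while text follows.
    """
    chunks = []
    current = []  # paragraphs collected for the chunk being built
    size = 0      # total characters in `current`

    for para in paragraphs:
        text = para.strip()
        if not text:
            continue  # skip empty paragraphs

        last_is_opener = bool(current) and current[-1].endswith(":")
        # Close the current chunk when it would grow past the limit,
        # unless that would strand a section opener at its end.
        if current and size + len(text) > max_chars and not last_is_opener:
            chunks.append("\n\n".join(current))
            current, size = [], 0

        current.append(text)
        size += len(text)

    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    sample = ["Intro text.", "Key points:", "First point.", "Second point."]
    for number, chunk in enumerate(build_chunks(sample, max_chars=20), start=1):
        print(f"--- chunk {number} ---")
        print(chunk)
```

With the small limit used in the demo, "Key points:" stays in the same chunk as "First point." even though the pair exceeds the limit, which is the trade-off this sketch makes in favor of coherence.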
Go to https://console.groq.com/keys. Log in, create an API key, copy the token, and keep it safe.
- Download the `requirements.txt` and `rewrIT-er.py` files from this repository.
- Place them in the same folder as the `.docx` or `.pdf` file you want to rewrite.
- Open a terminal (Command Prompt or Bash) and navigate to this folder:

      cd [your-folder-name]

- Create a virtual environment (optional but recommended):

      python3 -m venv venv
      source venv/bin/activate # On Unix/macOS
      venv\Scripts\activate # On Windows

- To install Python 3.11.9 if not already installed (Windows, using winget):

      winget install --id Python.Python.3.11 --source winget

- To install the required libraries, run:

      pip install -r requirements.txt

- To run the script:

      python rewrIT-er.py --token [your Groq-API-Key] [input_file] [output_file]

- To rewrite the content of a DOCX or PDF file:

      python rewrIT-er.py --token gsk_... book_origin.docx book_copy.docx

IMPORTANT: If no input file is given, the script processes the first `.docx` or `.pdf` file it finds in the directory. Make sure the folder contains only the one `.docx` or `.pdf` file you want to rewrite!
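For readers who want to see how a command line of this shape could be handled, here is a rough sketch using the standard library's `argparse` and `pathlib`. It is an assumption-laden illustration, not the actual contents of `rewrIT-er.py`: the helper names (`find_first_supported`, `parse_args`, `resolve_paths`), the alphabetical search order, and the `_copy` default output name are all hypothetical.

```python
import argparse
from pathlib import Path

SUPPORTED = (".docx", ".pdf")


def find_first_supported(folder="."):
    """Return the first .docx or .pdf file in `folder`, or None.

    Assumption: "first" means first in alphabetical order.
    """
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            return path
    return None


def parse_args(argv=None):
    """Parse a command line shaped like the usage line in this README."""
    parser = argparse.ArgumentParser(description="Rewrite a .docx or .pdf file.")
    parser.add_argument("--token", required=True, help="Groq API key")
    parser.add_argument("input_file", nargs="?", default=None)
    parser.add_argument("output_file", nargs="?", default=None)
    return parser.parse_args(argv)


def resolve_paths(args, folder="."):
    """Fill in the input and output paths when they are not given."""
    source = Path(args.input_file) if args.input_file else find_first_supported(folder)
    if source is None:
        raise FileNotFoundError(f"No .docx or .pdf file found in {folder!r}")
    if args.output_file:
        target = Path(args.output_file)
    else:
        # Hypothetical default: book.docx -> book_copy.docx
        target = source.with_name(f"{source.stem}_copy{source.suffix}")
    return source, target


if __name__ == "__main__":
    demo = parse_args(["--token", "gsk_example", "book_origin.docx", "book_copy.docx"])
    print(resolve_paths(demo))
```

Because the fallback simply picks the first match, a folder holding several documents can lead to the wrong file being rewritten, which is why passing the input file explicitly is the safer habit.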
This project is licensed under the MIT License - see the MIT.md file for details.
- Processing 1 pages/chunks...
📄 Page 1
- Done successfuly! Head to: book_copy.docx