PDF Converter Pro is a web application that allows you to convert PDF files into various formats while aiming to preserve links and content structure. It features a Python (FastAPI) backend and a modern HTML/CSS/JavaScript frontend.
The project is automatically built and deployed using GitHub Actions, with the frontend hosted on GitHub Pages.
- Convert PDFs to:
  - Markdown (`.md`)
  - Word (`.docx`)
  - LibreOffice/OpenDocument Text (`.odt`)
  - Plain Text (`.txt`)
- Link Preservation: Efforts are made to keep hyperlinks active in the converted documents (especially for `.md` and where `pandoc` supports it).
- User-Friendly Interface:
  - Simple file upload.
  - Clear format selection.
  - Real-time progress bar.
  - Status messages for ongoing operations.
  - Direct download of converted files.
- Modern Design: Dark theme, responsive, and minimalist UI.
- Automated CI/CD:
  - Backend testing via GitHub Actions.
  - Frontend deployment to GitHub Pages via GitHub Actions.
To enable the live demo:
- After the first successful run of the "Deploy Frontend to gh-pages" workflow (see the Actions tab), a `gh-pages` branch will be created in your repository.
- Go to your GitHub repository's Settings page.
- In the left sidebar, click on Pages.
- Under "Build and deployment", for the Source, select Deploy from a branch.
- Under "Branch", select `gh-pages` as the branch and `/ (root)` as the folder.
- Click Save.
Your application will then be available at: https://<your-github-username>.github.io/pdf-converter-app/ (replace `<your-github-username>` with your actual GitHub username). It might take a few minutes for the site to become active after saving.
- Backend: Python, FastAPI, PyMuPDF, Pandoc
- Frontend: HTML, CSS (vanilla), JavaScript
- CI/CD: GitHub Actions
- Hosting: GitHub Pages (for frontend)
- Python 3.8+
- `pip` (Python package installer)
- `pandoc` (system-wide installation required for `.docx` and `.odt` conversion)
  - Linux (Debian/Ubuntu): `sudo apt-get install pandoc`
  - macOS (Homebrew): `brew install pandoc`
  - Windows: Download the installer from pandoc.org
- (Optional) LibreOffice for `unoconv` if you plan to use it for certain conversions (Pandoc is the primary tool here).
git clone https://github.com/<your-github-username>/pdf-converter-app.git
cd pdf-converter-app
cd backend
# Create and activate a virtual environment (recommended)
python -m venv venv
# On Windows:
# venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run the FastAPI backend server
uvicorn main:app --reload --host 0.0.0.0 --port 8000
The backend API will be available at http://localhost:8000. You can see the API docs at http://localhost:8000/docs.
Important Note on CORS:
The backend is configured with CORS (Cross-Origin Resource Sharing) to allow requests from typical frontend development servers (like http://localhost:8080 and http://127.0.0.1:8080) and from all origins (`*`). This `"*"` setting is for ease of development and testing with services like GitHub Pages. For a production deployment of the backend, you should restrict the `origins` list in `backend/main.py` to only the specific domains where your frontend is hosted.
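One way to restrict the `origins` list for production is to build it from configuration instead of hard-coding `"*"`. The sketch below is only an illustration: the `ALLOWED_ORIGINS` environment variable and the `parse_allowed_origins` helper are hypothetical names, and it assumes `backend/main.py` passes the list to FastAPI's `CORSMiddleware`.

```python
import os

# Hypothetical environment variable name; the project does not define it.
ENV_VAR = "ALLOWED_ORIGINS"

# Development origins mentioned in this README.
DEV_ORIGINS = ["http://localhost:8080", "http://127.0.0.1:8080"]


def parse_allowed_origins(raw):
    """Turn a comma-separated string into a clean list of origins."""
    if not raw:
        # Nothing configured: fall back to the local development servers.
        return list(DEV_ORIGINS)
    origins = [item.strip().rstrip("/") for item in raw.split(",")]
    # Drop empty entries and the wildcard, which would re-open the API to every site.
    return [origin for origin in origins if origin and origin != "*"]


origins = parse_allowed_origins(os.environ.get(ENV_VAR))

# Assumed usage in backend/main.py, replacing the "*" entry:
#   app.add_middleware(CORSMiddleware, allow_origins=origins, ...)
```

With this approach, a deployment could set `ALLOWED_ORIGINS=https://<your-github-username>.github.io` without changing code, while local runs keep the development defaults.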
- Navigate to the `frontend` directory: `cd frontend`
- Open the `index.html` file in your web browser.
The frontend will attempt to connect to the backend API running at http://localhost:8000.
To run the backend unit tests:
- Ensure you are in the `backend` directory and your virtual environment is activated.
- Make sure test dependencies are installed (usually included in `requirements.txt`, or you might need `pip install unittest-xml-reporting` if you want XML reports).
- Run the tests:
# Ensure the main project directory is in PYTHONPATH for imports
# If you are in the 'pdf-converter-app/backend' directory:
export PYTHONPATH=$(pwd)/..:$PYTHONPATH
# Or on Windows (PowerShell):
# $env:PYTHONPATH="$(Get-Location)/..;$env:PYTHONPATH"
python -m unittest discover -s ./tests -p "test_*.py"
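For orientation, here is a minimal sketch of a module that the `test_*.py` discovery pattern would pick up if saved under `tests/`. The `output_filename` helper is hypothetical and stands in for whatever backend function is under test; it is not taken from the project.

```python
import unittest


def output_filename(pdf_name, target_format):
    """Hypothetical helper: swap a trailing .pdf extension for the target one."""
    stem = pdf_name[:-4] if pdf_name.lower().endswith(".pdf") else pdf_name
    return f"{stem}.{target_format}"


class OutputFilenameTests(unittest.TestCase):
    # unittest discovery collects methods whose names start with "test".
    def test_replaces_pdf_extension(self):
        self.assertEqual(output_filename("report.pdf", "md"), "report.md")

    def test_keeps_names_without_pdf_extension(self):
        self.assertEqual(output_filename("notes", "txt"), "notes.txt")
```

No `unittest.main()` call is needed here, because `python -m unittest discover` loads and runs the test case classes itself.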
The frontend application, whether run locally (`index.html`) or deployed via GitHub Pages, needs to connect to a running backend API.
- Backend Location: The Python backend (FastAPI server) must be running. This can be:
  - Locally on your machine (e.g., `python backend/main.py` or `uvicorn backend.main:app --reload --port 8000`).
  - Deployed to a hosting service (e.g., Heroku, AWS, Google Cloud).
- Configuring the API URL in Frontend: The frontend needs to know the backend's URL. This is set in `frontend/script.js`: `const API_BASE_URL = 'http://localhost:8000';`
  - For local development: If your backend is running on `http://localhost:8000`, this default URL is correct.
  - For GitHub Pages with a local backend: If you are using the GitHub Pages frontend, you still need to run the backend locally. The GitHub Pages site will try to make requests to `http://localhost:8000` on your computer. Ensure your browser can access this (some browsers/extensions might block localhost access from `https://` sites, but generally this works for development).
  - For GitHub Pages with a deployed backend: If you have deployed your backend to a public URL (e.g., `https://your-backend-api.com`), you must update `API_BASE_URL` in `frontend/script.js` to this public URL, then commit and push this change so your GitHub Pages site uses the correct API endpoint.
If you see "NetworkError" or the conversion button doesn't work on GitHub Pages:
- Ensure your local Python backend is running.
- Check the browser's developer console (usually F12) for error messages. CORS errors or mixed content warnings might appear if the backend is not configured correctly or if the `API_BASE_URL` is wrong.
- The backend's CORS policy is currently set to allow all origins (`*`) for easier testing. If you changed this, ensure your GitHub Pages URL (`https://<username>.github.io`) is in the allowed list.
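To rule out the first cause quickly, a small script can check whether the backend answers at all. This is a minimal sketch using only the standard library; the `backend_is_reachable` name is hypothetical, and it assumes the FastAPI docs page at `/docs` (mentioned earlier) is enabled.

```python
import urllib.error
import urllib.request


def backend_is_reachable(base_url="http://localhost:8000", timeout=3.0):
    """Return True if GET <base_url>/docs answers with HTTP 200."""
    url = base_url.rstrip("/") + "/docs"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, DNS failures, timeouts and HTTP errors.
        return False
```

Calling `backend_is_reachable()` while the server is stopped should return `False`. If it returns `True` but the browser still reports errors, the problem is more likely the `API_BASE_URL` value or the CORS configuration.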
- Backend CI: Pushes or pull requests to the `main` branch affecting the `backend/` directory or `.github/workflows/backend.yml` will trigger the backend CI workflow. This workflow installs dependencies (including `pandoc`) and runs unit tests.
- Frontend Deployment: Pushes to the `main` branch affecting the `frontend/` directory or `.github/workflows/frontend.yml` will trigger the "Deploy Frontend to gh-pages" workflow. This workflow:
  - Checks out the `main` branch.
  - Pushes the entire content of the `frontend/` directory to the `gh-pages` branch.
  - Important: You need to configure GitHub Pages in your repository settings to serve from the `gh-pages` branch (see "Live Demo" section above for instructions).
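The path-based triggers described above can be pictured with a short sketch. It only illustrates the rule; the real filters live in the workflow YAML files, GitHub's own path matching is richer than the simple prefix check used here, and the `triggered_workflows` function is a hypothetical name.

```python
# Path prefixes taken from the description above.
WORKFLOW_PATHS = {
    "backend CI": ("backend/", ".github/workflows/backend.yml"),
    "Deploy Frontend to gh-pages": ("frontend/", ".github/workflows/frontend.yml"),
}


def triggered_workflows(changed_files, branch="main"):
    """Return the workflows whose path prefixes match any changed file."""
    if branch != "main":
        # Both workflows are described as reacting to the main branch only.
        return []
    return [
        name
        for name, prefixes in WORKFLOW_PATHS.items()
        if any(path.startswith(prefixes) for path in changed_files)
    ]
```

For example, a push that only touches `README.md` would start neither workflow, while a push that changes files under both `backend/` and `frontend/` would start both.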
You can view the status of these actions under the "Actions" tab of your GitHub repository.
To manually trigger a frontend deployment (if `workflow_dispatch` is enabled in `frontend.yml`):
- Go to the "Actions" tab in your GitHub repository.
- Select the "Deploy Frontend to gh-pages" workflow from the list.
- Click on "Run workflow", choose the branch (usually `main`), and click "Run workflow".
- Using the Web UI (Local or Deployed):
  - Ensure the backend is running locally if testing locally.
  - Open `frontend/index.html` or the GitHub Pages URL.
  - Upload a PDF file.
  - Select your desired output format.
  - Click "Convert File".
  - Monitor the progress bar and status messages.
  - Once complete, click the "Download File" button.
  - Inspect the downloaded file for content and link preservation.
- Directly via `converter.py` (for backend debugging): The `backend/converter.py` script has an `if __name__ == '__main__':` block that can be used for direct testing: modify it if needed and run the script with a Python interpreter. It creates a dummy PDF and attempts to convert it.
cd backend
# Make sure venv is active
python converter.py
This will save output files to the `output/` directory in the project root.
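The "inspect for link preservation" step can be partly automated for Markdown output. The following is a minimal sketch, not part of the project: `extract_markdown_links` is a hypothetical helper, and the pattern is deliberately simple (inline `http`/`https` links only, no reference-style links).

```python
import re

# Matches inline Markdown links such as [text](https://example.com).
# The negative lookbehind skips images, which are written as ![alt](url).
LINK_PATTERN = re.compile(r"(?<!!)\[([^\]]+)\]\((https?://[^)\s]+)\)")


def extract_markdown_links(markdown_text):
    """Return a list of (link text, URL) tuples found in the Markdown text."""
    return LINK_PATTERN.findall(markdown_text)
```

Reading a converted `.md` file and passing its text to `extract_markdown_links` gives a list that can be compared against the links you expect from the source PDF.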
Contributions are welcome! If you have suggestions or find bugs, please:
- Fork the repository.
- Create a new branch for your feature or bugfix (e.g., `feature/new-converter` or `fix/upload-error`).
- Make your changes, ensuring you add or update tests where appropriate.
- Ensure backend tests pass (`python -m unittest discover -s ./backend/tests`).
- Commit your changes with clear and descriptive messages.
- Push your branch to your forked repository.
- Create a Pull Request to the original repository's `main` branch.
Please ensure your code follows the existing style and that any new dependencies are added to `backend/requirements.txt`.
This project is intended to be open source under the MIT License. The repository does not yet include a LICENSE file (one can be generated at choosealicense.com); until it is added, assume the code is proprietary.