A pure Python implementation of Punycode (RFC 3492) for converting Unicode domain names to ASCII. Features bidirectional encoding/decoding, full RFC compliance, and zero dependencies. Perfect for IDNA processing and internationalized domain handling.

justavik/PunyCodePython


PunyCode Python Implementation

A pure Python implementation of the Punycode algorithm (RFC 3492) for encoding and decoding Unicode domain names.

License: MIT | Python 3.6+

Overview

Punycode is an encoding that represents Unicode strings using only the limited ASCII subset allowed in DNS host names (letters, digits, and hyphens). Internationalized domain names use it to store each non-ASCII label in an ASCII-compatible form prefixed with "xn--". This implementation provides a straightforward, RFC-compliant way to encode and decode Punycode strings in Python.
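Python's standard library already ships a `punycode` codec and an `idna` codec built on it (the latter follows the older IDNA 2003 rules). They are handy for seeing what Punycode output looks like and for cross-checking this project. The snippet below uses only those built-in codecs, not this repository's module:

```python
# Built-in codecs from the standard library (not this project's module).
label = "münchen"

raw = label.encode("punycode")   # bytes
print(raw.decode("ascii"))       # mnchen-3ya
print(raw.decode("punycode"))    # münchen

# The idna codec applies Punycode per label and adds the "xn--" prefix.
print("münchen.de".encode("idna"))  # b'xn--mnchen-3ya.de'
```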

Features

  • 🔄 Bidirectional conversion between Unicode and Punycode
  • 📜 Full compliance with the RFC 3492 specification
  • 🐍 Pure Python implementation with no external dependencies
  • 🌐 Support for both basic ASCII and non-ASCII Unicode characters
  • 📚 Comprehensive documentation and examples

Installation

Clone the repository:

```bash
git clone https://github.com/justavik/PunyCodePython.git
cd PunyCodePython
```

No additional dependencies are required as this is a pure Python implementation.

Usage

As a Command Line Tool

```bash
python punycode.py
```

Follow the interactive prompts to encode or decode strings.

As a Library

```python
from punycode import punycode_encode, punycode_decode

# Encoding example
unicode_str = "München"
encoded = punycode_encode(unicode_str)
print(encoded)  # Output: Mnchen-3ya

# Decoding example
punycode_str = "Mnchen-3ya"
decoded = punycode_decode(punycode_str)
print(decoded)  # Output: München
```
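Punycode itself works on a single string. To turn a whole domain name into its ASCII form, each non-ASCII label is encoded separately and prefixed with `xn--`. The sketch below shows that wrapping step. To keep it self-contained it uses the standard library's `punycode` codec as a stand-in; substituting this project's `punycode_encode`/`punycode_decode` should be equivalent, assuming they take and return `str`. The helper names `to_ascii_domain` and `to_unicode_domain` are hypothetical, and the sketch only lowercases labels instead of performing the full IDNA mapping and normalization steps.

```python
def to_ascii_domain(domain: str) -> str:
    """Hypothetical helper: convert each label to its ASCII-compatible form.

    Simplified: labels are lowercased, but the full IDNA mapping and
    normalization steps are NOT performed.
    """
    ascii_labels = []
    for label in domain.split("."):
        label = label.lower()
        if all(ord(ch) < 0x80 for ch in label):
            # Pure-ASCII labels are left unchanged.
            ascii_labels.append(label)
        else:
            # Non-ASCII labels become "xn--" + Punycode.
            encoded = label.encode("punycode").decode("ascii")
            ascii_labels.append("xn--" + encoded)
    return ".".join(ascii_labels)


def to_unicode_domain(domain: str) -> str:
    """Hypothetical inverse: decode labels that carry the "xn--" prefix."""
    labels = []
    for label in domain.split("."):
        if label.lower().startswith("xn--"):
            labels.append(label[4:].encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)


print(to_ascii_domain("München.de"))           # xn--mnchen-3ya.de
print(to_unicode_domain("xn--mnchen-3ya.de"))  # münchen.de
```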

Technical Details

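The implementation uses the parameter values defined in RFC 3492 (section 5):

  • BASE: 36 (digit values 0-25 are written as a-z and 26-35 as 0-9; letters are case-insensitive when decoding)
  • TMIN: 1
  • TMAX: 26
  • SKEW: 38
  • DAMP: 700
  • INITIAL_BIAS: 72
  • INITIAL_N: 0x80 (128, the first non-ASCII code point)
  • DELIMITER: '-'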
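These parameters feed two small helpers that both directions share: the digit mapping and the bias adaptation function from RFC 3492 (section 6.1). The sketch below is written from the RFC rather than copied from this repository, so the names `encode_digit`, `decode_digit`, and `adapt` are hypothetical and may not match `punycode.py`. In the usage lines, 869 is the delta produced when encoding the "ü" in "München" ((252 - 128) × 7 + 1, with 7 possible insertion positions).

```python
# Hypothetical helper names; written from RFC 3492, not copied from punycode.py.
BASE, TMIN, TMAX = 36, 1, 26
SKEW, DAMP = 38, 700


def encode_digit(d: int) -> str:
    """Map a digit value 0-35 to a character: 0-25 -> 'a'-'z', 26-35 -> '0'-'9'."""
    if not 0 <= d < BASE:
        raise ValueError(f"digit out of range: {d}")
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)


def decode_digit(ch: str) -> int:
    """Inverse of encode_digit; letters are accepted in either case."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 26
    if "a" <= ch <= "z":
        return ord(ch) - ord("a")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A")
    raise ValueError(f"invalid Punycode digit: {ch!r}")


def adapt(delta: int, num_points: int, first_time: bool) -> int:
    """Bias adaptation after each encoded delta (RFC 3492, section 6.1)."""
    # Scale delta down: strongly after the first insertion, by half otherwise.
    delta = delta // DAMP if first_time else delta // 2
    # Compensate for the longer string the next delta will index into.
    delta += delta // num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


print(encode_digit(29), decode_digit("3"))  # 3 29
print(adapt(869, 7, True))                  # 0
```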

Algorithm Overview

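  1. Encoding Process:

    • Basic (ASCII) code points are copied to the output in their original order, followed by the delimiter if there are any
    • Non-ASCII code points are inserted in ascending code-point order; each insertion is described by a delta that combines the code-point increase and the insertion position
    • Each delta is written as a variable-length integer using base-36 digits, and the bias is re-adapted after every insertion
  2. Decoding Process:

    • Splits input at the last delimiter; everything before it is copied to the output as basic code points
    • Decodes the remaining base-36 digits back into deltas, tracking the same bias as the encoder
    • Reconstructs the original Unicode string by inserting each decoded code point at its recovered position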
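To make these steps concrete, here is a compact sketch of both procedures written directly from the pseudocode in RFC 3492 (sections 6.2 and 6.3). It is not this repository's code: `sketch_encode`, `sketch_decode`, and the helpers are hypothetical names. The RFC's overflow checks are omitted because Python integers do not overflow.

```python
# Hypothetical standalone sketch of RFC 3492 encode/decode (not punycode.py).
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N, DELIMITER = 72, 0x80, "-"
DIGITS = "abcdefghijklmnopqrstuvwxyz0123456789"  # digit value -> character


def _adapt(delta: int, num_points: int, first_time: bool) -> int:
    """Bias adaptation function (RFC 3492, section 6.1)."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def _threshold(k: int, bias: int) -> int:
    """t = k - bias, clamped to [TMIN, TMAX]."""
    return max(TMIN, min(TMAX, k - bias))


def sketch_encode(text: str) -> str:
    """Encode text following RFC 3492, section 6.3 (overflow checks omitted)."""
    code_points = [ord(ch) for ch in text]
    # Copy basic (ASCII) code points in order, then the delimiter if any exist.
    output = [ch for ch in text if ord(ch) < 0x80]
    basic_count = len(output)
    if basic_count:
        output.append(DELIMITER)
    n, delta, bias, h = INITIAL_N, 0, INITIAL_BIAS, basic_count
    while h < len(code_points):
        # Next code point to insert: the smallest one not yet handled.
        m = min(c for c in code_points if c >= n)
        delta += (m - n) * (h + 1)
        n = m
        for c in code_points:
            if c < n:
                delta += 1
            elif c == n:
                # Emit delta as a generalized variable-length integer.
                q, k = delta, BASE
                while True:
                    t = _threshold(k, bias)
                    if q < t:
                        break
                    output.append(DIGITS[t + (q - t) % (BASE - t)])
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(DIGITS[q])
                bias = _adapt(delta, h + 1, h == basic_count)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)


def sketch_decode(encoded: str) -> str:
    """Decode a Punycode string following RFC 3492, section 6.2."""
    pos = encoded.rfind(DELIMITER)
    # Code points before the last delimiter are basic and copied as-is.
    output = list(encoded[:pos]) if pos > 0 else []
    if any(ord(ch) >= 0x80 for ch in output):
        raise ValueError("non-basic code point before the delimiter")
    rest = encoded[pos + 1:] if pos > 0 else encoded
    n, i, bias, idx = INITIAL_N, 0, INITIAL_BIAS, 0
    while idx < len(rest):
        old_i, w, k = i, 1, BASE
        # Read one variable-length integer and add it to i.
        while True:
            if idx >= len(rest):
                raise ValueError("truncated Punycode input")
            digit = DIGITS.find(rest[idx].lower())
            if digit < 0:
                raise ValueError(f"invalid Punycode digit {rest[idx]!r}")
            idx += 1
            i += digit * w
            t = _threshold(k, bias)
            if digit < t:
                break
            w *= BASE - t
            k += BASE
        bias = _adapt(i - old_i, len(output) + 1, old_i == 0)
        # i encodes both the code-point increase and the insertion position.
        n += i // (len(output) + 1)
        i %= len(output) + 1
        output.insert(i, chr(n))
        i += 1
    return "".join(output)


print(sketch_encode("München"))     # Mnchen-3ya
print(sketch_decode("Mnchen-3ya"))  # München
```

Comparing results with the standard library's codec (for example `"bücher".encode("punycode")`) is a quick way to validate this or any other implementation.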

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License - see the LICENSE file for details.

References

Author

Avik Chatterjee

Acknowledgments

  • Thanks to the authors of RFC 3492 for the detailed specification
  • The Unicode Consortium for their standards and documentation
