A Rust library for parsing Korean Hangul Word Processor (HWP) files with full layout rendering support.
- Complete HWP 5.0 Format Support: Parse all document components including text, formatting, tables, and embedded objects
- Visual Layout Rendering: Reconstruct documents with pixel-perfect accuracy when layout data is available
- Font and Style Preservation: Extract and apply original fonts, sizes, colors, and text formatting
- Advanced Layout Engine: Support for multi-column layouts, line-by-line positioning, and character-level formatting
- SVG Export: Render documents to scalable vector graphics
- Zero-copy Parsing: Efficient parsing with minimal memory allocation
- Safe Rust: Memory-safe implementation with comprehensive error handling
- Basic Document Creation: Create simple HWP documents with text content
- Paragraph Formatting: Text alignment, line spacing, paragraph spacing
- Page Layout: Custom page sizes, margins, orientation
- Hyperlinks: URL, email, file, and bookmark links (partial support)
- Header/Footer: Basic header and footer support (limited functionality)
Add this to your Cargo.toml:
[dependencies]
hwpers = "0.3"
use hwpers::HwpReader;
// Parse an HWP file
let document = HwpReader::from_file("document.hwp")?;
// Extract text content
let text = document.extract_text();
println!("{}", text);
// Access document properties
if let Some(props) = document.get_properties() {
    println!("Pages: {}", props.total_page_count);
}
// Iterate through sections and paragraphs
for (i, section) in document.sections().enumerate() {
    println!("Section {}: {} paragraphs", i, section.paragraphs.len());
    for paragraph in &section.paragraphs {
        if let Some(text) = &paragraph.text {
            println!(" {}", text.content);
        }
    }
}
use hwpers::{HwpReader, render::{HwpRenderer, RenderOptions}};
let document = HwpReader::from_file("document.hwp")?;
// Create renderer with custom options
let options = RenderOptions {
    dpi: 96,
    scale: 1.0,
    show_margins: false,
    show_baselines: false,
};
let renderer = HwpRenderer::new(&document, options);
let result = renderer.render();
// Export first page to SVG
if let Some(svg) = result.to_svg(0) {
    std::fs::write("page1.svg", svg)?;
}
println!("Rendered {} pages", result.pages.len());
use hwpers::writer::HwpWriter;
use hwpers::model::hyperlink::Hyperlink;
// Create a new document
let mut writer = HwpWriter::new();
// Add formatted text
writer.add_aligned_paragraph(
    "제목",
    hwpers::writer::style::ParagraphAlignment::Center,
)?;
// Add hyperlinks
let link = Hyperlink::new_url("Rust", "https://rust-lang.org");
writer.add_paragraph_with_hyperlinks(
    "Visit Rust website",
    vec![link],
)?;
// Configure page layout (210 x 297 mm is A4)
writer.set_custom_page_size(
    210.0,
    297.0,
    hwpers::model::page_layout::PageOrientation::Portrait,
)?;
writer.set_page_margins_mm(20.0, 20.0, 20.0, 20.0)?;
// Add header and footer
writer.add_header("Document Header");
writer.add_footer_with_page_number(
    "Page ",
    hwpers::model::header_footer::PageNumberFormat::Numeric,
);
// Save the document
writer.save_to_file("output.hwp")?;
// Access character and paragraph formatting
for section in document.sections() {
    for paragraph in &section.paragraphs {
        // Get paragraph formatting
        if let Some(para_shape) = document.get_para_shape(paragraph.para_shape_id as usize) {
            println!("Indent: {}, Alignment: {}",
                para_shape.indent,
                para_shape.get_alignment()
            );
        }
        // Get character formatting runs
        if let Some(char_shapes) = &paragraph.char_shapes {
            for pos_shape in &char_shapes.char_positions {
                if let Some(char_shape) = document.get_char_shape(pos_shape.char_shape_id as usize) {
                    println!("Position {}: Size {}, Bold: {}",
                        pos_shape.position,
                        char_shape.base_size / 100,
                        char_shape.is_bold()
                    );
                }
            }
        }
    }
}
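The `base_size / 100` division above reflects the unit that HWP uses for sizes. As far as Hancom's published format specification describes it, lengths are stored in HWPUNIT, which is 1/7200 of an inch and therefore 1/100 of a point. The helpers below are a minimal sketch of those conversions under that assumption. The function and constant names are hypothetical and are not part of the hwpers API.

```rust
// Assumption: 1 HWPUNIT = 1/7200 inch, as described in the public HWP 5.0 spec.
const HWPUNIT_PER_INCH: f64 = 7200.0;
const MM_PER_INCH: f64 = 25.4;
const POINTS_PER_INCH: f64 = 72.0;

/// Hypothetical helper: convert a raw HWPUNIT value to millimetres.
pub fn hwpunit_to_mm(units: i32) -> f64 {
    f64::from(units) / HWPUNIT_PER_INCH * MM_PER_INCH
}

/// Hypothetical helper: convert a raw HWPUNIT value to typographic points.
pub fn hwpunit_to_pt(units: i32) -> f64 {
    f64::from(units) / HWPUNIT_PER_INCH * POINTS_PER_INCH
}

/// Hypothetical helper: convert millimetres to the nearest HWPUNIT value.
pub fn mm_to_hwpunit(mm: f64) -> i32 {
    (mm / MM_PER_INCH * HWPUNIT_PER_INCH).round() as i32
}
```

Under this assumption a stored font size of 1000 corresponds to 10 pt, and the 210 mm width of an A4 page corresponds to 59528 units.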
- ✅ File header and version detection
- ✅ Document properties and metadata
- ✅ Section definitions and page layout
- ✅ Paragraph and character formatting
- ✅ Font definitions (FaceName)
- ✅ Styles and templates
- ✅ Text content with full Unicode support
- ✅ Tables and structured data
- ✅ Control objects (images, OLE objects)
- ✅ Numbering and bullet lists
- ✅ Tab stops and alignment
- ✅ Page dimensions and margins
- ✅ Multi-column layouts
- ✅ Line-by-line positioning (when available)
- ✅ Character-level positioning (when available)
- ✅ Borders and fill patterns
- ✅ SVG export with accurate positioning
- ✅ Compressed document support
- ✅ CFB (Compound File Binary) format handling
- ✅ Text encoding support (UTF-16LE)
- ✅ Error recovery and partial parsing
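To illustrate two of the items in the list above, here is a small standalone sketch of what CFB container detection and UTF-16LE text decoding involve. It uses only the Rust standard library. It is not how hwpers implements these steps, and both function names are hypothetical.

```rust
/// Magic bytes that open every Compound File Binary (CFB) container.
const CFB_SIGNATURE: [u8; 8] = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];

/// Hypothetical helper: cheap check that a buffer starts like a CFB file.
/// This only inspects the signature; it does not validate the container.
pub fn looks_like_cfb(bytes: &[u8]) -> bool {
    bytes.starts_with(&CFB_SIGNATURE)
}

/// Hypothetical helper: decode a UTF-16LE byte buffer into a String.
/// Returns None for an odd byte count or for unpaired surrogates.
pub fn decode_utf16le(bytes: &[u8]) -> Option<String> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    String::from_utf16(&units).ok()
}
```

A signature check like this is useful for rejecting non-HWP input early, before attempting a full parse. A parser that aims for error recovery might prefer `String::from_utf16_lossy` over the strict variant used here.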
The library includes a command-line tool for inspecting HWP files:
# Install the tool
cargo install hwpers
# Inspect an HWP file
hwp_info document.hwp
This library supports HWP 5.0 format files. For older HWP formats, consider using format conversion tools first.
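If you need to check the format version yourself before handing a file to the library, the sketch below shows one way to read it from the raw bytes of the `FileHeader` stream. It assumes the layout given in Hancom's public HWP 5.0 specification: a 32-byte signature field starting with `HWP Document File`, a little-endian version word at offset 32, and a flags word at offset 36 whose lowest bit marks compression. The `HeaderInfo` type and `parse_file_header` function are hypothetical and are not part of hwpers. Extracting the stream from the CFB container is out of scope here.

```rust
/// Text that opens the FileHeader stream (assumed from the public spec).
const SIGNATURE: &[u8] = b"HWP Document File";

/// Hypothetical summary of the fields this sketch reads.
#[derive(Debug, PartialEq, Eq)]
pub struct HeaderInfo {
    /// (major, minor, patch, revision), e.g. (5, 0, 3, 0).
    pub version: (u8, u8, u8, u8),
    /// True when bit 0 of the flags word is set.
    pub compressed: bool,
}

/// Hypothetical helper: parse the start of a FileHeader stream.
/// Returns None if the buffer is too short or the signature does not match.
pub fn parse_file_header(data: &[u8]) -> Option<HeaderInfo> {
    if data.len() < 40 || !data.starts_with(SIGNATURE) {
        return None;
    }
    // The version word is laid out as 0xMMnnPPrr and stored little-endian,
    // so the bytes on disk are [revision, patch, minor, major].
    let (revision, patch, minor, major) = (data[32], data[33], data[34], data[35]);
    let flags = u32::from_le_bytes([data[36], data[37], data[38], data[39]]);
    Some(HeaderInfo {
        version: (major, minor, patch, revision),
        compressed: flags & 1 != 0,
    })
}
```

A caller could treat any result whose major version is not 5 as an older or unknown format and route it to a conversion tool instead.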
The HWP writer functionality is currently in early development with several limitations:
- Hyperlinks: Basic structure implemented, but position tracking within paragraphs needs refinement
- Header/Footer: 40-byte structure implemented, but text content storage mechanism incomplete
- Page Layout: Basic settings work, but multi-column layouts not supported
- Styles: Currently uses hardcoded style IDs; proper style management system needed
- Images: Control structure exists but BinData stream integration missing - images won't display
- Tables: Table creation and formatting not implemented
- Lists/Numbering: Bullet points and numbered lists not supported
- Text Boxes: Text box controls not implemented
- Shapes/Drawing: Shape and drawing objects not supported
- Advanced Formatting: Character styles, fonts, colors need proper style manager
- Document Properties: Metadata and document properties not fully implemented
- Generated files may not open correctly in some versions of Hanword
- Style IDs are hardcoded and may conflict with document defaults
- Position tracking for hyperlinks within paragraphs is imprecise
- No compression support for writer (reader supports both compressed and uncompressed)
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under either of
- Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
- HWP file format specification by Hancom Inc.
- Korean text processing community
- Rust parsing and document processing ecosystem