A Rust library for parsing EPUB files. It provides a simple and efficient way to extract metadata, chapters, the table of contents, and file contents from EPUB documents. It was mostly made for my own purposes, since I needed content grouped by chapter rather than by TOC entry.
- ✅ Parse EPUB metadata (title, author, language, etc.)
- ✅ Extract chapters and their content
- ✅ Generate table of contents
- ✅ Access individual files within the EPUB
- ✅ HTML content parsing and extraction
- ✅ Support for EPUB 2.0 and 3.0 formats
Add this to your `Cargo.toml`:

```toml
[dependencies]
epubie-lib = "0.1.0"
```
```rust
use epubie_lib::Epub;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open an EPUB file
    let epub = Epub::new("path/to/your/book.epub".to_string())?;

    // Get basic metadata
    println!("Title: {}", epub.get_title());
    println!("Author: {}", epub.get_creator());

    // Iterate through chapters
    for (i, chapter) in epub.get_chapters().iter().enumerate() {
        println!("Chapter {}: {}", i + 1, chapter.get_title());

        // Access files in each chapter
        for file in chapter.get_files() {
            if file.is_html() {
                println!("  HTML content: {} bytes", file.get_html_bytes().len());
            }
        }
    }

    Ok(())
}
```
```rust
use epubie_lib::Epub;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let epub = Epub::new("book.epub".to_string())?;

    println!("Title: {}", epub.get_title());
    println!("Creator: {}", epub.get_creator());
    println!("Language: {}", epub.get_language());
    println!("Identifier: {}", epub.get_identifier());
    println!("Publication Date: {}", epub.get_date());

    // Optional fields such as the description return Option<String>
    if let Some(description) = epub.get_description() {
        println!("Description: {}", description);
    }

    Ok(())
}
```
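In an EPUB, fields like the title, creator, language, identifier and date come from Dublin Core (`dc:*`) elements inside the `<metadata>` block of the OPF package document. To show where these values live, here is a minimal std-only sketch that pulls the text of the first matching element. The helper `first_element_text` is hypothetical and uses a naive string scan, not a real XML parser. It ignores self-closing tags, CDATA and entities, and it is not how epubie-lib does it (the library lists `serde-xml-rs` for XML parsing).

```rust
/// Hypothetical helper: returns the trimmed text of the first `<tag ...>...</tag>`
/// element in `xml`. Naive string scan, not a real XML parser.
fn first_element_text<'a>(xml: &'a str, tag: &str) -> Option<&'a str> {
    let open = format!("<{}", tag);
    let close = format!("</{}>", tag);
    let start = xml.find(open.as_str())?;
    // Skip past the end of the opening tag, which may carry attributes.
    let content_start = start + xml[start..].find('>')? + 1;
    let content_end = content_start + xml[content_start..].find(close.as_str())?;
    Some(xml[content_start..content_end].trim())
}

fn main() {
    // A trimmed-down OPF <metadata> block (hypothetical book).
    let opf = r#"<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example Book</dc:title>
  <dc:creator opf:role="aut">Jane Doe</dc:creator>
  <dc:language>en</dc:language>
</metadata>"#;

    for tag in ["dc:title", "dc:creator", "dc:language", "dc:publisher"] {
        match first_element_text(opf, tag) {
            Some(text) => println!("{}: {}", tag, text),
            // Optional fields may simply be absent, hence Option in the API.
            None => println!("{}: (missing)", tag),
        }
    }
}
```

Absent elements such as `dc:publisher` are the reason the optional getters in this library return `Option<String>`.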
```rust
use epubie_lib::Epub;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let epub = Epub::new("book.epub".to_string())?;
    println!("Total chapters: {}", epub.get_chapter_count());

    for (i, chapter) in epub.get_chapters().iter().enumerate() {
        println!("Chapter {}: {}", i + 1, chapter.get_title());
        println!("  Files: {}", chapter.get_file_count());

        for file in chapter.get_files() {
            // get_title() returns Option<&str>, so fall back to a placeholder
            println!(
                "    - {} ({})",
                file.get_title().unwrap_or("Untitled"),
                file.get_href()
            );
        }
    }

    Ok(())
}
```
```rust
use epubie_lib::Epub;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let epub = Epub::new("book.epub".to_string())?;
    let toc = epub.get_table_of_contents();
    println!("Table of Contents ({} entries):", toc.get_entry_count());

    for entry in toc.get_entries() {
        // Indent each entry by its nesting level
        let indent = "  ".repeat(entry.get_level() as usize);
        println!("{}{} -> {}", indent, entry.get_title(), entry.get_href());
    }
    Ok(())
}
```
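A table of contents is a tree (nested `navPoint`s in an EPUB 2 NCX, nested lists in an EPUB 3 navigation document), while `TocEntry` exposes a flat title, href and nesting level. One common way to get such a flat list is a depth-first, pre-order walk that records each node's depth. The sketch below assumes top-level entries have level 0; the types `NavPoint` and `FlatEntry` and the function `flatten` are hypothetical and not taken from epubie-lib.

```rust
/// Hypothetical nested TOC node, loosely modelled on an NCX `navPoint`
/// or a nested `<li>` in an EPUB 3 navigation document.
struct NavPoint {
    title: String,
    href: String,
    children: Vec<NavPoint>,
}

/// Hypothetical flat entry carrying the same data `TocEntry` exposes.
struct FlatEntry {
    title: String,
    href: String,
    level: u32,
}

/// Depth-first, pre-order walk: each node is emitted before its children,
/// so the flat list keeps reading order. `level` is the depth of `points`.
fn flatten(points: &[NavPoint], level: u32, out: &mut Vec<FlatEntry>) {
    for point in points {
        out.push(FlatEntry {
            title: point.title.clone(),
            href: point.href.clone(),
            level,
        });
        flatten(&point.children, level + 1, out);
    }
}

/// Small constructor to keep the example tree readable.
fn nav(title: &str, href: &str, children: Vec<NavPoint>) -> NavPoint {
    NavPoint { title: title.to_string(), href: href.to_string(), children }
}

fn main() {
    let toc = vec![
        nav("Part One", "part1.xhtml", vec![
            nav("Chapter 1", "ch1.xhtml", vec![]),
            nav("Chapter 2", "ch2.xhtml#start", vec![]),
        ]),
        nav("Appendix", "appendix.xhtml", vec![]),
    ];

    let mut entries = Vec::new();
    flatten(&toc, 0, &mut entries);

    for entry in &entries {
        // Two spaces of indentation per nesting level.
        let indent = "  ".repeat(entry.level as usize);
        println!("{}{} -> {}", indent, entry.title, entry.href);
    }
}
```

Pre-order traversal is the natural choice here because it reproduces the order in which a reader would see the entries in a printed contents page.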
```rust
use epubie_lib::Epub;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let epub = Epub::new("book.epub".to_string())?;

    for file in epub.get_all_files() {
        println!("File: {} ({})", file.get_href(), file.get_media_type());

        if file.is_html() {
            // Get raw HTML content
            let html_content = file.get_html_bytes();
            println!("  HTML size: {} bytes", html_content.len());

            // Get parsable HTML (if needed for further processing)
            if let Some(parsed_html) = file.get_parsable_html() {
                println!("  Parsed HTML: {} bytes", parsed_html.len());
            }
        }
    }

    Ok(())
}
```
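Deciding whether a file is HTML most plausibly comes down to its manifest media type, which `get_media_type()` returns. The sketch below is an assumption about how such a check could work, not epubie-lib's actual `is_html()`: it accepts `application/xhtml+xml` (the EPUB content-document type) and `text/html`, compares case-insensitively, and ignores parameters. The function name `is_html_media_type` is hypothetical.

```rust
/// Hypothetical helper (not epubie-lib's implementation): treats a manifest
/// media type as HTML if it is XHTML, the EPUB content-document type, or
/// plain `text/html`, which some non-conforming books use.
fn is_html_media_type(media_type: &str) -> bool {
    // Media types are case-insensitive; drop parameters such as "; charset=utf-8".
    let essence = media_type
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();
    matches!(essence.as_str(), "application/xhtml+xml" | "text/html")
}

fn main() {
    // Media types as they might appear in an OPF manifest (hypothetical book).
    let manifest = [
        ("chapter1.xhtml", "application/xhtml+xml"),
        ("styles.css", "text/css"),
        ("cover.jpg", "image/jpeg"),
        ("legacy.html", "text/html"),
    ];
    for (href, media_type) in manifest {
        println!("{}: html = {}", href, is_html_media_type(media_type));
    }
}
```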
`Epub`: The main struct for working with EPUB files.

- `new(file_path: String) -> Result<Epub, Box<dyn std::error::Error>>`: Create a new EPUB instance
- `get_title() -> &str`: Get the book title
- `get_creator() -> &str`: Get the book author/creator
- `get_language() -> &str`: Get the book language
- `get_identifier() -> &str`: Get the book identifier
- `get_date() -> &str`: Get the publication date
- `get_publisher() -> Option<String>`: Get the publisher
- `get_description() -> Option<String>`: Get the book description
- `get_rights() -> Option<String>`: Get the rights information
- `get_cover() -> Option<String>`: Get the cover image path
- `get_tags() -> Option<Vec<String>>`: Get book tags
- `get_chapters() -> &Vec<Chapter>`: Get all chapters
- `get_chapter_count() -> usize`: Get the number of chapters
- `get_table_of_contents() -> &TableOfContents`: Get the table of contents
- `get_all_files() -> &Vec<EpubFile>`: Get all files in the EPUB
- `get_file_count() -> usize`: Get the total number of files
`Chapter`: Represents a chapter in the EPUB.

- `get_title() -> &str`: Get the chapter title
- `get_files() -> &Vec<EpubFile>`: Get the files in this chapter
- `get_file_count() -> usize`: Get the number of files in this chapter
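Because a `Chapter` can span several files, and the library was built to get content by chapter rather than by TOC entry, it helps to picture how such grouping could work. One plausible approach, assumed here and not taken from epubie-lib's source, walks the spine (reading order) and starts a new chapter at every file a TOC entry points to, ignoring `#fragment` parts. Files in between join the current chapter, and files before the first TOC target form an untitled leading group. `ChapterSketch` and `group_into_chapters` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical chapter: a title plus the spine files it spans, in reading order.
struct ChapterSketch {
    title: String,
    files: Vec<String>,
}

/// Groups spine files into chapters. A new chapter starts at each spine file
/// that some TOC entry (title, href) points to; fragments are ignored, and the
/// first TOC title that targets a file wins.
fn group_into_chapters(spine: &[&str], toc: &[(&str, &str)]) -> Vec<ChapterSketch> {
    // Map each TOC target file to the title of the first entry pointing at it.
    let mut starts: HashMap<&str, &str> = HashMap::new();
    for &(title, href) in toc {
        let file = href.split('#').next().unwrap_or(href);
        starts.entry(file).or_insert(title);
    }

    let mut chapters: Vec<ChapterSketch> = Vec::new();
    for &file in spine {
        if let Some(&title) = starts.get(&file) {
            // A TOC entry points at this file: it begins a new chapter.
            chapters.push(ChapterSketch { title: title.to_string(), files: Vec::new() });
        } else if chapters.is_empty() {
            // Files before the first TOC target go into an untitled leading chapter.
            chapters.push(ChapterSketch { title: "Untitled".to_string(), files: Vec::new() });
        }
        // Append the file to the current (most recently started) chapter.
        chapters
            .last_mut()
            .expect("a chapter was just ensured")
            .files
            .push(file.to_string());
    }
    chapters
}

fn main() {
    // Reading order from the OPF spine (hypothetical book).
    let spine = ["cover.xhtml", "ch1.xhtml", "ch1-part2.xhtml", "ch2.xhtml"];
    // TOC entries as (title, href); two of them point into ch2.xhtml.
    let toc = [
        ("Chapter 1", "ch1.xhtml"),
        ("Chapter 2", "ch2.xhtml"),
        ("Chapter 2, Section A", "ch2.xhtml#section-a"),
    ];
    for chapter in group_into_chapters(&spine, &toc) {
        println!("{}: {:?}", chapter.title, chapter.files);
    }
}
```

Here `cover.xhtml` lands in an untitled leading group, `ch1-part2.xhtml` joins Chapter 1, and the section entry pointing into `ch2.xhtml` does not start a chapter of its own, which is the difference between grouping by chapter and grouping by TOC entry.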
`EpubFile`: Represents a file within the EPUB.

- `get_id() -> &str`: Get the file ID
- `get_href() -> &str`: Get the file href/path
- `get_title() -> Option<&str>`: Get the file title
- `get_content() -> &str`: Get the file content as a string
- `get_media_type() -> &str`: Get the MIME type
- `get_html_bytes() -> &[u8]`: Get the raw HTML content as bytes
- `is_html() -> bool`: Check whether the file is HTML
- `get_parsable_html() -> Option<String>`: Get parsable HTML content
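`get_href()` is documented only as the file's href/path. In the OPF manifest, hrefs are relative URLs resolved against the package document's location, so `text/ch1.xhtml` declared in `OEBPS/content.opf` lives at `OEBPS/text/ch1.xhtml` inside the ZIP archive. If you need to map a manifest href to an archive path yourself, a minimal sketch could look like this. `resolve_href` is a hypothetical helper, not an epubie-lib function, and percent-decoding and `#fragment` handling are left out.

```rust
/// Hypothetical helper: resolves a manifest `href` against the directory of
/// the OPF file, giving a path inside the ZIP archive. Handles `.` and `..`
/// segments; percent-decoding and fragments are out of scope for this sketch.
fn resolve_href(opf_path: &str, href: &str) -> String {
    // Directory part of the OPF path ("" if the OPF sits at the archive root).
    let base = match opf_path.rfind('/') {
        Some(i) => &opf_path[..i],
        None => "",
    };
    let mut parts: Vec<&str> = base.split('/').filter(|s| !s.is_empty()).collect();
    for segment in href.split('/') {
        match segment {
            // Empty and "." segments do not change the path.
            "" | "." => {}
            // ".." moves up one directory (never above the archive root).
            ".." => {
                parts.pop();
            }
            other => parts.push(other),
        }
    }
    parts.join("/")
}

fn main() {
    println!("{}", resolve_href("OEBPS/content.opf", "text/ch1.xhtml"));
    println!("{}", resolve_href("OEBPS/content.opf", "../images/cover.jpg"));
    println!("{}", resolve_href("content.opf", "ch1.xhtml"));
}
```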
`TableOfContents`: Represents the table of contents.

- `get_entries() -> &Vec<TocEntry>`: Get all TOC entries
- `get_entry_count() -> usize`: Get the number of TOC entries
`TocEntry`: Represents an entry in the table of contents.

- `get_title() -> &str`: Get the entry title
- `get_href() -> &str`: Get the entry href/link
- `get_level() -> u32`: Get the nesting level
The library includes example code demonstrating various use cases:
```bash
# Run the basic usage example
cargo run --example basic_usage

# Run tests
cargo test
```
The library depends on the following crates:

- `chrono`: Date and time handling
- `uuid`: UUID generation and parsing
- `zip`: ZIP file handling (EPUB files are ZIP archives)
- `regex`: Regular expression support
- `serde`: Serialization framework
- `serde-xml-rs`: XML parsing
Supported formats and specifications:

- ✅ EPUB 2.0 and 3.0 formats
- ✅ OCF (Open Container Format) parsing
- ✅ OPF (Open Packaging Format) metadata extraction
- ✅ Navigation document parsing
- ✅ NCX (Navigation Control XML) support
- ✅ HTML content extraction
- ✅ Chapter organization and grouping
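OCF parsing starts from the container itself: an EPUB is a ZIP archive whose first entry is a file named `mimetype`, stored without compression, containing exactly `application/epub+zip`. Because that entry is uncompressed, a quick sanity check can read it straight out of the first ZIP local file header without unzipping anything. The std-only sketch below does that; `looks_like_epub` is hypothetical and not part of epubie-lib, which uses the `zip` crate. The header offsets follow the ZIP format's local file header layout.

```rust
use std::env;
use std::fs;

/// Hypothetical sanity check (not part of epubie-lib): does `bytes` start with
/// the OCF `mimetype` entry, i.e. a ZIP local file header for an uncompressed
/// file named `mimetype` whose content is exactly `application/epub+zip`?
fn looks_like_epub(bytes: &[u8]) -> bool {
    const LOCAL_HEADER_SIG: &[u8] = b"PK\x03\x04";
    const NAME: &[u8] = b"mimetype";
    const MIME: &[u8] = b"application/epub+zip";

    // The fixed part of a ZIP local file header is 30 bytes long.
    if bytes.len() < 30 || &bytes[0..4] != LOCAL_HEADER_SIG {
        return false;
    }
    // Little-endian u16 header field at byte offset `i`.
    let u16_at = |i: usize| u16::from_le_bytes([bytes[i], bytes[i + 1]]) as usize;
    let compression = u16_at(8); // 0 = stored (no compression)
    let name_len = u16_at(26);
    let extra_len = u16_at(28); // OCF 3 forbids an extra field here; we just skip it
    let name_start = 30;
    let data_start = name_start + name_len + extra_len;

    compression == 0
        && bytes.get(name_start..name_start + name_len) == Some(NAME)
        && bytes.get(data_start..data_start + MIME.len()) == Some(MIME)
}

fn main() {
    // A hand-built header for demonstration: signature, zeroed fields
    // (stored, no extra field), name length 8, then the name and content.
    let mut header = vec![0u8; 30];
    header[0..4].copy_from_slice(b"PK\x03\x04");
    header[26..28].copy_from_slice(&8u16.to_le_bytes());
    header.extend_from_slice(b"mimetype");
    header.extend_from_slice(b"application/epub+zip");
    println!("synthetic header looks like EPUB: {}", looks_like_epub(&header));

    // Optionally check a real file passed as the first command-line argument.
    if let Some(path) = env::args().nth(1) {
        match fs::read(&path) {
            Ok(bytes) => println!("{}: {}", path, looks_like_epub(&bytes)),
            Err(e) => eprintln!("{}: {}", path, e),
        }
    }
}
```

Skipping a non-empty extra field rather than rejecting it keeps the check lenient toward slightly non-conforming files, while still requiring the uncompressed `mimetype` entry that makes the check possible at all.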
Contributions are welcome! Please feel free to submit a Pull Request.
- Initial release
- Basic EPUB parsing functionality
- Metadata extraction
- Chapter and file organization
- Table of contents generation