A lightweight, idiomatic Kotlin library for XML parsing with a fluent DSL. Built on top of StAX and Kotlin Coroutines, it provides both a low-level API and a high-level DSL for efficient XML processing.
- Features
- Installation
- Documentation
- Architecture
- Getting Started
- Advanced Usage
- Extending the Library
- Best Practices
- API Reference
- Contributing
- License
- Idiomatic Kotlin DSL for clean, readable XML parsing code
- Type-safe conversions for XML values (Int, Long, Double, Boolean)
- Nullable handling for optional XML elements
- Flow-based API for asynchronous and streaming processing
- Minimal dependencies (only Kotlin stdlib, Coroutines, and StAX)
- Extensible design for custom processors
repositories {
    mavenCentral()
}
dependencies {
    implementation("com.github.asm0dey:staks:1.1.0")
}
<dependency>
    <groupId>com.github.asm0dey</groupId>
    <artifactId>staks</artifactId>
    <version>1.1.0</version>
</dependency>
The project's API documentation is generated using Dokka version 2.0.0 and is available online:
- API Documentation - Comprehensive documentation of all public APIs
The documentation is automatically generated and deployed to GitHub Pages whenever changes are pushed to the main branch.
If you want to generate the documentation locally, you can run:
./gradlew dokkaGenerate
The generated documentation will be available in the build/dokka/html directory.
Staks provides two complementary approaches to XML parsing:
The low-level API gives you direct access to the XML event stream through Kotlin Flows and the StaksContext class. It's designed for:
- Maximum flexibility and control over the parsing process
- Streaming large XML documents efficiently
- Building custom parsing logic for complex XML structures
Key components of the low-level API include:
- staks(input): Creates a Flow of XML events from various input sources
- StaksContext.collectText(): Collects text content from specific elements
- StaksContext.collectAttribute(): Extracts attribute values from elements
- StaksContext.collectElements(): Processes elements and their content
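To make the idea of an XML event stream concrete, here is a minimal sketch that uses only the JDK's StAX reader and a Kotlin Sequence. It is not Staks code: the Event types and the events and textsOf functions are hypothetical names made up for this illustration, and a Sequence stands in for the Flow that the library exposes.

```kotlin
import java.io.StringReader
import javax.xml.stream.XMLInputFactory
import javax.xml.stream.XMLStreamConstants

// Hypothetical event model for this sketch; the library's own XmlEvent type may differ.
sealed interface Event
data class Start(val name: String, val attributes: Map<String, String>) : Event
data class Text(val value: String) : Event
data class End(val name: String) : Event

// Lazily turns a StAX cursor into a sequence of events (pull-based, no DOM is built).
fun events(xml: String): Sequence<Event> = sequence {
    val factory = XMLInputFactory.newInstance()
    // Merge adjacent character data (including CDATA) into a single event.
    factory.setProperty(XMLInputFactory.IS_COALESCING, true)
    val reader = factory.createXMLStreamReader(StringReader(xml))
    try {
        while (reader.hasNext()) {
            when (reader.next()) {
                XMLStreamConstants.START_ELEMENT -> {
                    val attrs = (0 until reader.attributeCount)
                        .associate { reader.getAttributeLocalName(it) to reader.getAttributeValue(it) }
                    yield(Start(reader.localName, attrs))
                }
                XMLStreamConstants.CHARACTERS, XMLStreamConstants.CDATA ->
                    if (!reader.isWhiteSpace) yield(Text(reader.text))
                XMLStreamConstants.END_ELEMENT -> yield(End(reader.localName))
            }
        }
    } finally {
        reader.close()
    }
}

// Collects the text of every element with the given name by folding over the events.
fun textsOf(xml: String, name: String): List<String> {
    var inside = false
    val result = mutableListOf<String>()
    for (e in events(xml)) when (e) {
        is Start -> inside = e.name == name
        is Text -> if (inside) result += e.value
        is End -> inside = false
    }
    return result
}

fun main() {
    val xml = """
        <library>
            <book id="1"><title>Kotlin in Action</title></book>
            <book id="2"><title>Effective Kotlin</title></book>
        </library>
    """.trimIndent()
    println(textsOf(xml, "title")) // [Kotlin in Action, Effective Kotlin]
}
```

Because the sequence is lazy, a consumer that stops early (for example with first()) never reads the rest of the document, which is the property that makes event-based parsing suitable for large inputs.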
Built on top of the low-level API, the high-level DSL provides a more concise and intuitive way to parse XML:
- Fluent, type-safe interface for common parsing tasks
- Automatic type conversions (string to int, boolean, etc.)
- Convenient handling of optional elements
- Simplified navigation of nested structures
Key components of the high-level DSL include:
- StaksContext.tagValue(): Gets the text content of a specific tag
- StaksContext.attribute(): Gets an attribute value
- StaksContext.list(): Collects and transforms a list of elements
- StaksContext.flow(): Creates a Flow of transformed elements
import com.github.asm0dey.kxml.staks
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking
import java.io.File
// Sample XML data
val xmlString = """
<library>
<book id="1">
<title>Kotlin in Action</title>
<year>2017</year>
</book>
<book id="2">
<title>Effective Kotlin</title>
<year>2020</year>
</book>
</library>
""".trimIndent()
// Define a data class to hold our parsed data
data class Book(val id: Int, val title: String, val year: Int)
// Using the high-level DSL for concise, readable parsing
val booksWithDsl = runBlocking {
staks(xmlString) {
// The list() function collects all matching elements and transforms them
list("book") {
// Inside this block, we're in the context of a single book element
val id = attribute("id").int() // Convert attribute to Int
val title = tagValue("title").string() // Get text content as String
val year = tagValue("year").int() // Get text content as Int
Book(id, title, year) // Return a Book object
}
}
}
// booksWithDsl is now a List<Book> with two entries
// You can also use an InputStream
val xmlInputStream = xmlString.byteInputStream()
val booksFromStream = runBlocking {
staks(xmlInputStream) {
// Same parsing logic as above, but with more explicit comments
list("book") {
// Get the 'id' attribute from the current element and convert to Int
val id = attribute("id").int()
// Get the text content of the 'title' child element
val title = tagValue("title").string()
// Get the text content of the 'year' child element and convert to Int
val year = tagValue("year").int()
// Create and return a Book object with the extracted data
Book(id, title, year)
}
}
}
// Or parse from a File
val xmlFile = File("books.xml") // Assuming this file exists
val booksFromFile = runBlocking {
staks(xmlFile) {
// Using the flow() function instead of list() for streaming processing
// This is useful for large XML files as it processes elements one by one
flow("book") {
val id = attribute("id").int()
val title = tagValue("title").string()
val year = tagValue("year").int()
Book(id, title, year)
}.toList() // Collect the flow into a list
}
}
The low-level API provides more control over the parsing process, which is useful for complex XML structures or when you need maximum performance:
import com.github.asm0dey.kxml.staks
import kotlinx.coroutines.flow.first
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking
// Sample XML data
val xmlString = """
<library>
<book id="1">
<title>Kotlin in Action</title>
<year>2017</year>
</book>
<book id="2">
<title>Effective Kotlin</title>
<year>2020</year>
</book>
</library>
""".trimIndent()
data class Book(val id: Int, val title: String, val year: Int)
// Using the low-level API for more control over the parsing process
val books = runBlocking {
staks(xmlString) {
// collectElements creates a flow of elements matching the given name
collectElements("book") {
// For each book element, collect its attributes and child elements
val id = collectAttribute("book", "id").first().toInt()
// collectText creates a flow of text content from matching elements
val title = collectText("title").first()
val year = collectText("year").first().toInt()
Book(id, title, year)
}.toList() // Collect all books into a list
}
}
The library provides functions to access data from the root element:
import com.github.asm0dey.kxml.staks
import kotlinx.coroutines.runBlocking
val xml = """
<library version="1.0" count="2">
Library content
<book>Book 1</book>
<book>Book 2</book>
</library>
""".trimIndent()
val result = runBlocking {
staks(xml) {
// Get the root element name - in this example we know it's "library"
val rootElementName = "library"
// Get attributes from the root element using the attribute function
// First parameter is the element name, second is the attribute name
val version = attribute("library", "version").string()
val count = attribute("library", "count").int() // Automatically converts to Int
// Get text content from the root element using tagValue
val rootText = tagValue("library").string()
// Return a Triple with the extracted data
Triple(rootElementName, version, "Count: $count, Text: $rootText")
}
}
// result = Triple("library", "1.0", "Count: 2, Text: Library content")
val xml = """
<products>
<product id="1" available="true">
<name>Product 1</name>
<price>99.99</price>
<stock>100</stock>
<categories>
<category id="1">Electronics</category>
<category id="2">Computers</category>
</categories>
</product>
</products>
""".trimIndent()
data class Category(val id: Int, val name: String)
data class Product(val id: Int, val name: String, val price: Double,
val stock: Int, val available: Boolean,
val categories: List<Category>)
val products = staks(xml) {
list("product") {
val id = attribute("product", "id").int()
val available = attribute("product", "available").boolean()
val name = tagValue("name").string()
val price = tagValue("price").double()
val stock = tagValue("stock").int()
val categories = list("category") {
val categoryId = attribute("category", "id").int()
val categoryName = tagValue("category").string()
Category(categoryId, categoryName)
}
Product(id, name, price, stock, available, categories)
}
}
// Using flow instead of list
val productsFlow = staks(xml) {
flow("product") {
val id = attribute("product", "id").int()
val available = attribute("product", "available").boolean()
val name = tagValue("name").string()
val price = tagValue("price").double()
val stock = tagValue("stock").int()
val categories = list("category") {
val categoryId = attribute("category", "id").int()
val categoryName = tagValue("category").string()
Category(categoryId, categoryName)
}
Product(id, name, price, stock, available, categories)
}
// The result is a Flow<Product> that can be processed asynchronously
}
val xml = "<root><item>value</item></root>"
val existingTag = staks(xml) {
tagValue("item").nullable().string()
}
val nonExistingTag = staks(xml) {
tagValue("non-existing").nullable().string()
}
// existingTag = "value"
// nonExistingTag = null
The library provides functions to work directly with the current element context, which is especially useful when parsing lists of elements:
val xml = """
<root>
<item>1</item>
<item id="2">2</item>
<item id="3" active="true">3</item>
</root>
""".trimIndent()
data class Item(val id: Int?, val value: Int, val active: Boolean?)
val items = staks(xml) {
list("item") {
val id = attribute("id").nullable().int()
val value = text().int()
val active = attribute("active").nullable().boolean()
Item(id, value, active)
}
}
// items = [
// Item(id=null, value=1, active=null),
// Item(id=2, value=2, active=null),
// Item(id=3, value=3, active=true)
// ]
This approach is more concise than specifying the element name for each attribute or text value, especially when working with nested structures.
The library provides a convenient unary plus operator (+) as a shorthand for .value():
val name: String = +tagValue("name") // Same as tagValue("name").value()
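For readers unfamiliar with operator overloading, the following sketch shows how a unary plus shorthand of this kind can be declared in Kotlin. RawValue is a hypothetical stand-in for the library's result type; how Staks actually declares its operator is not shown here.

```kotlin
// Hypothetical stand-in for the library's result type; only value() matters here.
class RawValue(private val raw: String) {
    fun value(): String = raw
}

// Declaring unaryPlus as an operator lets callers write +x instead of x.value().
operator fun RawValue.unaryPlus(): String = value()

fun main() {
    val name: String = +RawValue("Kotlin in Action")
    println(name) // Kotlin in Action
}
```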
The library provides comprehensive support for XML namespaces. You can work with namespaces in several ways:
You can directly use namespace prefixes in element and attribute names:
val xml = """
<root xmlns:ns1="http://example.com/ns1">
<ns1:element>Value</ns1:element>
</root>
""".trimIndent()
val value = staks(xml) {
tagValue("ns1:element").string()
}
// value = "Value"
You can also use namespace URIs directly, which is useful when the prefix in the XML document might change:
val xml = """
<root xmlns:ns1="http://example.com/ns1">
<ns1:element>Value</ns1:element>
</root>
""".trimIndent()
val value = staks(xml) {
tagText("element", "http://example.com/ns1").string()
}
// value = "Value"
For more complex scenarios, you can pass a map of namespaces as an argument to the staks function:
val xml = """
<root xmlns:ns1="http://example.com/ns1">
<ns1:element>Value</ns1:element>
</root>
""".trimIndent()
// Define namespaces in a map
val namespaces = mapOf("myns" to "http://example.com/ns1")
val value = staks(xml, namespaces) {
// Use the namespace in a query
tagText("element", namespaces["myns"]).string()
}
// value = "Value"
This approach is particularly useful when working with XML documents where the prefix might change but the namespace URI remains the same.
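The reason URI-based lookup survives prefix changes can be seen with plain StAX, which reports each element's local name and namespace URI separately from its prefix. The sketch below uses only the JDK; textByNamespace is a hypothetical helper written for this illustration, not part of Staks.

```kotlin
import java.io.StringReader
import javax.xml.stream.XMLInputFactory
import javax.xml.stream.XMLStreamConstants

// Returns the text of the first element with the given local name and namespace URI, or null.
fun textByNamespace(xml: String, localName: String, namespaceUri: String): String? {
    val factory = XMLInputFactory.newInstance()
    factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true)
    val reader = factory.createXMLStreamReader(StringReader(xml))
    try {
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT &&
                reader.localName == localName &&
                reader.namespaceURI == namespaceUri
            ) {
                // The prefix is never consulted, only the resolved URI.
                return reader.elementText
            }
        }
        return null
    } finally {
        reader.close()
    }
}

fun main() {
    val uri = "http://example.com/ns1"
    val withNs1 = """<root xmlns:ns1="$uri"><ns1:element>Value</ns1:element></root>"""
    val withOther = """<root xmlns:x="$uri"><x:element>Value</x:element></root>"""
    println(textByNamespace(withNs1, "element", uri))   // Value
    println(textByNamespace(withOther, "element", uri)) // Value
}
```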
You can also get information about namespaces declared in the XML document:
val xml = """
<root xmlns:ns1="http://example.com/ns1" xmlns:ns2="http://example.com/ns2">
<ns1:element>Value 1</ns1:element>
<ns2:element>Value 2</ns2:element>
</root>
""".trimIndent()
val namespaces = staks(xml) {
getNamespaces()
}
// namespaces = {"ns1" to "http://example.com/ns1", "ns2" to "http://example.com/ns2"}
val uri = staks(xml) {
resolveNamespace("ns1")
}
// uri = "http://example.com/ns1"
The library handles CDATA sections transparently, treating them as regular text content. This is useful for parsing XML with embedded HTML, JavaScript, or other content that might contain characters that would normally need to be escaped in XML.
val xml = """
<root>
<item><![CDATA[<tag>This & that</tag>]]></item>
</root>
""".trimIndent()
val content = staks(xml) {
tagValue("item").string()
}
// content = "<tag>This & that</tag>"
CDATA sections can be used in any text content, including mixed content:
val xml = """
<root>
<item>Regular text <![CDATA[<CDATA text>]]> more regular text</item>
</root>
""".trimIndent()
val content = staks(xml) {
tagValue("item").string()
}
// content = "Regular text <CDATA text> more regular text"
Multiple CDATA sections are concatenated:
val xml = """
<root>
<item><![CDATA[First]]><![CDATA[Second]]></item>
</root>
""".trimIndent()
val content = staks(xml) {
tagValue("item").string()
}
// content = "FirstSecond"
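The same concatenation behaviour can be observed directly in StAX, whose XMLStreamReader.getElementText() joins regular character data and CDATA sections of a text-only element. The sketch below uses only the JDK; firstElementText is a hypothetical helper, and whether Staks relies on this exact call internally is an assumption.

```kotlin
import java.io.StringReader
import javax.xml.stream.XMLInputFactory
import javax.xml.stream.XMLStreamConstants

// Returns the full text of the first element named `name`, with CDATA merged in, or null.
fun firstElementText(xml: String, name: String): String? {
    val reader = XMLInputFactory.newInstance().createXMLStreamReader(StringReader(xml))
    try {
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT && reader.localName == name) {
                // getElementText() concatenates CHARACTERS and CDATA up to the end tag.
                return reader.elementText
            }
        }
        return null
    } finally {
        reader.close()
    }
}

fun main() {
    val mixed = "<root><item>Regular text <![CDATA[<CDATA text>]]> more regular text</item></root>"
    println(firstElementText(mixed, "item")) // Regular text <CDATA text> more regular text

    val twoSections = "<root><item><![CDATA[First]]><![CDATA[Second]]></item></root>"
    println(firstElementText(twoSections, "item")) // FirstSecond
}
```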
When parsing XML, it's important to handle potential errors gracefully. Here are some best practices for error handling:
For optional elements or attributes, use the nullable() method to avoid NullPointerExceptions:
val optionalValue = staks(xml) {
tagValue("optional-element").nullable().string()
}
// optionalValue will be null if the element doesn't exist
The library may throw exceptions in certain cases, such as when converting values to incorrect types. Always wrap your parsing code in try-catch blocks when dealing with untrusted XML:
try {
val number = staks(xml) {
tagValue("number").int()
}
} catch (e: NumberFormatException) {
// Handle the case where the value is not a valid integer
} catch (e: Exception) {
// Handle other exceptions
}
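As an alternative to try-catch at every call site, conversion failures can be turned into values. The sketch below uses only the Kotlin standard library; parseIntOrNull and safely are hypothetical helper names, and they operate on plain strings and lambdas rather than on Staks types.

```kotlin
// Hypothetical helper: converts raw text to Int, returning null instead of throwing.
fun parseIntOrNull(raw: String?): Int? = raw?.trim()?.toIntOrNull()

// Wraps any throwing parse step in a Result so the caller decides how to handle failure.
fun <T> safely(parse: () -> T): Result<T> = runCatching(parse)

fun main() {
    println(parseIntOrNull(" 42 "))  // 42
    println(parseIntOrNull("abc"))   // null

    val year = safely { "20x7".toInt() }.getOrElse { -1 }
    println(year)                    // -1
}
```

Note that runCatching also catches cancellation exceptions, so inside coroutine code an explicit try-catch for the specific exception type is usually the safer choice.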
For complex XML structures, consider validating the structure before parsing:
val isValid = staks(xml) {
// Check if required elements exist
val hasRequiredElements = tagValue("required-element").nullable().string() != null
// Check if values are in expected format
val isValidFormat = try {
tagValue("number").int()
true
} catch (e: Exception) {
false
}
hasRequiredElements && isValidFormat
}
if (isValid) {
// Proceed with parsing
} else {
// Handle invalid XML
}
The library is designed to be efficient, but there are some best practices to ensure optimal performance:
For large XML files, use the Flow-based API to process elements as they are parsed, rather than loading the entire document into memory:
staks(largeXmlFile) {
    flow("item") {
        // Process each item as it's parsed
        processItem(tagValue("name").string())
    }.collect() // Flows are cold: nothing is processed until the flow is collected
}
When processing large XML documents, you can limit the size of the internal event cache to reduce memory usage:
val result = staks(largeXmlFile, maxCacheSize = 100) {
// Process the XML with a limited cache size
list("item") {
// ...
}
}
This is particularly useful when you're processing a large XML document but only need to access a small portion of it. The maxCacheSize parameter limits the number of XML events that are cached in memory, which can significantly reduce memory usage for large documents.
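To illustrate what bounding a cache by a maximum number of entries means, here is a small generic sketch. BoundedCache is a hypothetical class written for this explanation; the library's actual cache structure and eviction policy are not documented here and may differ.

```kotlin
// Hypothetical bounded buffer: keeps at most maxSize of the most recently added items.
class BoundedCache<T>(private val maxSize: Int) {
    init {
        require(maxSize > 0) { "maxSize must be positive" }
    }

    private val items = ArrayDeque<T>()

    fun add(item: T) {
        // Evict the oldest entry once the limit is reached, so memory stays bounded.
        if (items.size == maxSize) items.removeFirst()
        items.addLast(item)
    }

    fun snapshot(): List<T> = items.toList()
}

fun main() {
    val cache = BoundedCache<Int>(maxSize = 3)
    (1..5).forEach(cache::add)
    println(cache.snapshot()) // [3, 4, 5]
}
```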
For repeated parsing tasks, define reusable extension functions:
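// Define an extension function for parsing books
// (assumes a Book(title: String, author: String, year: Int) data class, unlike the earlier Book with an id)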
// Note: This doesn't need to be a suspend function since the DSL functions handle suspension
fun StaksContext.parseBook(): Book {
// Extract data from the current book element
val title = tagValue("title").string()
val author = tagValue("author").string()
val year = tagValue("year").int()
return Book(title, author, year)
}
// Usage in a coroutine context
val books = runBlocking {
staks(xml) {
// The list function collects all book elements and applies our parsing function
list("book") {
// Inside this lambda, we're in the context of a single book element
// Call our helper function to parse the current book
parseBook()
}
}
}
When working with namespaces, define a namespace map and pass it to the staks function to minimize repeated lookups:
// Define namespaces in a map
val namespaces = mapOf(
"ns1" to "http://example.com/ns1",
"ns2" to "http://example.com/ns2"
)
val result = staks(xml, namespaces) {
// Use namespaces["ns1"] instead of resolveNamespace("ns1") in multiple places
list("element", namespaces["ns1"]) {
// ...
}
}
The library is designed to be extensible. You can create your own processors on top of the existing primitives.
You can extend the ValueResult interface to create custom value processors:
// Create a custom processor for dates
fun ValueResult.date(pattern: String = "yyyy-MM-dd"): LocalDate {
val formatter = DateTimeFormatter.ofPattern(pattern)
return LocalDate.parse(value(), formatter)
}
// Usage
val publishDate = staks(xml) {
tagValue("publishDate").date()
}
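The date conversion itself can be tried without the library. In the sketch below, RawValue is a hypothetical stand-in for ValueResult that exposes only value(); the formatter logic is the same as in the extension above, with trimming added as an assumption about surrounding whitespace in XML text.

```kotlin
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical stand-in for the library's ValueResult; only value() is assumed.
fun interface RawValue {
    fun value(): String
}

// Parses the raw text as a date using the given pattern (ISO-like by default).
fun RawValue.date(pattern: String = "yyyy-MM-dd"): LocalDate =
    LocalDate.parse(value().trim(), DateTimeFormatter.ofPattern(pattern))

fun main() {
    println(RawValue { "2017-02-19" }.date())             // 2017-02-19
    println(RawValue { "19.02.2017" }.date("dd.MM.yyyy")) // 2017-02-19
}
```

A value that does not match the pattern makes LocalDate.parse throw a DateTimeParseException, so the error-handling advice above applies to custom processors as well.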
You can create custom element collectors for specific XML structures:
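// Custom collector for address elements
// (needs kotlinx.coroutines.flow.emitAll in addition to flow and first)
fun StaksContext.collectAddress(): Flow<Address> = flow {
    // collectElements returns a Flow, so it has to be collected; emitAll forwards its values
    emitAll(
        collectElements("address") {
            val street = collectText("street").first()
            val city = collectText("city").first()
            val zipCode = collectText("zipCode").first()
            Address(street, city, zipCode)
        }
    )
}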
// Usage
val addresses = staks(xml) {
collectAddress().toList()
}
You can create domain-specific extensions for your particular XML format:
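// Extension for RSS feeds
// (needs kotlinx.coroutines.flow.emitAll in addition to flow, first and firstOrNull)
fun StaksContext.collectRssItems(): Flow<RssItem> = flow {
    // collectElements returns a Flow, so it has to be collected; emitAll forwards its values
    emitAll(
        collectElements("item") {
            val title = collectText("title").first()
            val link = collectText("link").first()
            val description = collectText("description").firstOrNull()
            RssItem(title, link, description)
        }
    )
}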
// Usage
val rssItems = staks(rssXml) {
collectRssItems().toList()
}
- staks(input: InputStream): Flow<XmlEvent> - Creates a Flow of XML events from an input stream
- staks(input: InputStream, namespaces: Map<String, String>, enableNamespaces: Boolean, maxCacheSize: Int, block: suspend StaksContext.() -> T): T - Main entry point for the DSL with input stream
- staks(input: String, namespaces: Map<String, String>, enableNamespaces: Boolean, maxCacheSize: Int, block: suspend StaksContext.() -> T): T - Main entry point for the DSL with string input
- staks(input: File, namespaces: Map<String, String>, enableNamespaces: Boolean, maxCacheSize: Int, block: suspend StaksContext.() -> T): T - Main entry point for the DSL with file input
- collectText(elementName: String, namespaceURI: String?): Flow<String> - Collects text content of a specific element
- collectCurrentText(): Flow<String> - Collects text content of the current element
- collectAttribute(elementName: String, attributeName: String, elementNamespaceURI: String?, attributeNamespaceURI: String?): Flow<String> - Collects attributes of a specific element
- collectCurrentAttribute(attributeName: String, attributeNamespaceURI: String?): Flow<String> - Collects an attribute of the current element
- collectElements(elementName: String, namespaceURI: String?, transform: suspend StaksContext.() -> T): Flow<T> - Collects and transforms elements
- tagValue(tagName: String): TagValueResult - Gets a tag value
- tagText(tagName: String, namespaceURI: String?): TagValueResult - Gets a tag value with a specific namespace URI
- text(): TagValueResult - Gets the text content of the current element
- attribute(tagName: String, attributeName: String, tagNamespaceURI: String?, attributeNamespaceURI: String?): AttributeResult - Gets an attribute value
- attribute(attributeName: String, attributeNamespaceURI: String?): AttributeResult - Gets an attribute value from the current element
- list(tagName: String, namespaceURI: String?, block: suspend StaksContext.() -> T): List<T> - Parses a list of elements
- list(tagName: String, block: suspend StaksContext.() -> T): List<T> - Parses a list of elements without namespace URI
- flow(tagName: String, namespaceURI: String?, block: suspend StaksContext.() -> T): Flow<T> - Parses a flow of elements
- nullable(): NullableValueResult<T> - Makes a value nullable
- rootName(): String? - Gets the name of the root element
- rootAttribute(attributeName: String): AttributeResult - Gets an attribute value from the root element
- rootText(): TagValueResult - Gets the text content of the root element
- .int() - Converts to Int
- .long() - Converts to Long
- .double() - Converts to Double
- .boolean() - Converts to Boolean
- .string() - Gets the string value
- .value() - Gets the raw string value
- +result - Shorthand for result.value()
Contributions are welcome! Here's how you can contribute:
- Fork the repository
- Create a feature branch: git checkout -b feature/my-feature
- Commit your changes: git commit -am 'Add my feature'
- Push to the branch: git push origin feature/my-feature
- Submit a pull request
Please make sure to update tests as appropriate.
This project is licensed under the MIT License - see the LICENSE file for details.