Home
This site offers background information on using the spark-xml-utils library within an Apache Spark application. It also provides tips, lessons learned, and Java examples for working with Apache Spark.
The Javadoc for spark-xml-utils is also available and can help with understanding how the classes interact.
The spark-xml-utils library was developed because our big datasets contain a large amount of XML, and I felt this data could be better served by some helpful XML utilities. These include the ability to filter documents based on an XPath/XQuery expression, return specific nodes for an XPath/XQuery expression, and transform documents using an XSLT stylesheet. By providing basic wrappers around Saxon, the spark-xml-utils library exposes XPath, XSLT, and XQuery functionality that any Spark application can readily use.
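To make the "filter" and "evaluate" ideas concrete, here is a minimal sketch that uses only the XPath support built into the JDK (`javax.xml.xpath`). It is not the spark-xml-utils API: the class name `XPathSketch` and the methods `matches` and `evaluate` are hypothetical names chosen for illustration. The JDK's built-in engine implements XPath 1.0, whereas Saxon supports later versions of the language, so treat this only as an outline of the two operations.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of spark-xml-utils.
public class XPathSketch {

    // "Filter": returns true when the XPath expression selects something in the document.
    static boolean matches(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            InputSource source = new InputSource(new StringReader(xml));
            return (Boolean) xpath.evaluate(expression, source, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    // "Evaluate": returns the string value of the first node the expression selects.
    static String evaluate(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            InputSource source = new InputSource(new StringReader(xml));
            return xpath.evaluate(expression, source);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample document for the sketch.
        String doc = "<book year=\"2015\"><title>Spark</title></book>";
        System.out.println(matches(doc, "/book[@year > 2010]"));
        System.out.println(evaluate(doc, "/book/title"));
    }
}
```

In a Spark job, a predicate like `matches` would typically be passed to a `filter` over a dataset of XML strings, and a function like `evaluate` to a `map`.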