WPAA is a tool for analyzing and visualizing the HTML structure of web pages. It renders both static and dynamic (JavaScript-rendered) pages as a tree, which makes the DOM structure easier to inspect.
- 🌳 Tree Visualization: Hierarchical representation of HTML structure
- 🔄 Change Detection: Automatic detection and comparison of webpage structure changes
- 🌐 Web Interface: Intuitive web UI for easy analysis
- 📊 Multiple Export Formats: Support for SVG, interactive HTML, CSV, and Markdown
- ⚡ Performance Optimization: Asynchronous processing, caching, and memory optimization
- 🔧 Static/Dynamic Analysis: Support for JavaScript-rendered web pages
- 🎯 Custom Filtering: CSS selector and attribute filtering capabilities
- 📈 Performance Monitoring: Track execution time, memory usage, and cache efficiency
Python 3.7+
pip install -r requirements.txt
Required External Programs:
- Graphviz Installation:
  - Download the installer from the official website
  - Add the bin directory to the system PATH (e.g., C:\Program Files\Graphviz\bin)
- ChromeDriver Installation (for dynamic page analysis):
  - Download from the ChromeDriver website
  - Save it to an appropriate location and update the path in the code:
    service = Service('your/path/to/chromedriver')
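Before running an analysis, it can help to confirm that the external programs are actually discoverable. The following is a minimal sketch using only the standard library; it is not part of WPAA, and the helper name `find_tools` is hypothetical. It assumes Graphviz exposes its usual `dot` executable on PATH. ChromeDriver appears here only if you placed it on PATH; if you stored it elsewhere and pass the full path to `Service(...)`, a "NOT FOUND" result for it is expected.

```python
import shutil


def find_tools(names=("dot", "chromedriver")):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        # shutil.which returns None when the executable cannot be located.
        status = path if path else "NOT FOUND on PATH"
        print(f"{name}: {status}")
```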
python run_web_interface.py
Access http://127.0.0.1:5000 in your browser for intuitive web-based analysis.
Web Interface Features:
- 📱 User-friendly web UI
- 🔄 Real-time analysis progress display
- 📊 Download various output formats
- 🔍 Change comparison functionality
- 📈 Performance statistics
Basic usage:
python wpaa_run.py --urls https://example.com
Advanced options:
python wpaa_run.py --urls https://example.com https://test.com \
--exclude script style \
--include-attrs class href \
--custom-filter "div.content" \
--max-depth 3 \
--export-html \
--compare-changes \
--show-performance
- --urls: List of webpage URLs to analyze (required)
- --use-selenium: Use Selenium for dynamic content fetching
- --exclude: HTML tags to exclude (e.g., script style)
- --include-attrs: HTML attributes to include in nodes (e.g., class id href)
- --custom-filter: Filter specific elements using CSS selectors (e.g., div.classname)
- --max-depth: Limit maximum tree depth
- --include-text: Include text content
- --output: Choose output format (text or json)
- --visualize: Visualize tree structure as PNG file
- --export-svg: Export to SVG format
- --export-html: Export to interactive HTML
- --export-csv: Export to CSV format
- --export-markdown: Export to Markdown format
- --compare-changes: Compare with previous version
- --show-performance: Display performance report
- --optimize-tree: Optimize tree structure
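To show how the options above combine on one command line, here is a hypothetical `argparse` sketch that mirrors the documented flags. It is not WPAA's actual parser. The `text` default for `--output`, and the treatment of every option without a documented value as an on/off switch, are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser covering the documented flags (not WPAA's own code)."""
    parser = argparse.ArgumentParser(prog="wpaa_run.py")
    parser.add_argument("--urls", nargs="+", required=True,
                        help="webpage URLs to analyze")
    parser.add_argument("--exclude", nargs="*", default=[],
                        help="HTML tags to exclude")
    parser.add_argument("--include-attrs", nargs="*", default=[],
                        help="HTML attributes to include in nodes")
    parser.add_argument("--custom-filter", help="CSS selector filter")
    parser.add_argument("--max-depth", type=int, help="maximum tree depth")
    # Assumption: "text" is the default output format.
    parser.add_argument("--output", choices=["text", "json"], default="text")
    # Assumption: the remaining documented options are simple on/off switches.
    for flag in ("--use-selenium", "--include-text", "--visualize",
                 "--export-svg", "--export-html", "--export-csv",
                 "--export-markdown", "--compare-changes",
                 "--show-performance", "--optimize-tree"):
        parser.add_argument(flag, action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args([
        "--urls", "https://example.com", "https://test.com",
        "--exclude", "script", "style",
        "--max-depth", "3",
        "--export-html",
    ])
    print(args.urls, args.exclude, args.max_depth, args.export_html)
```

Multi-value options such as `--urls` and `--exclude` take space-separated values and stop at the next flag, which matches the command-line examples in this README.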
python wpaa_run.py --urls https://news.ycombinator.com
python wpaa_run.py --urls https://www.example.com --use-selenium
python wpaa_run.py --urls https://www.example.com --exclude script style meta link --visualize
python wpaa_run.py --urls https://www.example.com --include-attrs class id href --output json
python wpaa_run.py --urls https://www.example.com --export-html --compare-changes --show-performance
python wpaa_run.py --urls https://www.example.com --export-svg --export-csv --export-markdown
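To make the effect of `--exclude` and `--max-depth` in the commands above concrete, the sketch below builds a small tag tree from an HTML string. It deliberately uses only the standard library's `html.parser` and plain dictionaries rather than WPAA's own code or the anytree library, so it is an illustration of the idea, not the tool's implementation. The names `TreeBuilder` and `render` are hypothetical, the depth convention (the `html` element sits at depth 1) is an assumption, and the sketch expects well-formed markup with matching closing tags.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Collect tags into nested dicts, dropping excluded or too-deep subtrees."""

    def __init__(self, exclude=(), max_depth=None):
        super().__init__()
        self.exclude = set(exclude)
        self.max_depth = max_depth
        self.root = {"tag": "[document]", "children": []}
        self.stack = [self.root]  # open elements that are kept in the tree
        self.skip_depth = 0       # open elements inside a dropped subtree

    def handle_starttag(self, tag, attrs):
        depth = len(self.stack)  # depth the new node would get
        hidden = (self.skip_depth > 0
                  or tag in self.exclude
                  or (self.max_depth is not None and depth > self.max_depth))
        if tag in VOID_TAGS:
            if not hidden:
                self.stack[-1]["children"].append({"tag": tag, "children": []})
            return
        if hidden:
            self.skip_depth += 1
            return
        node = {"tag": tag, "children": []}
        self.stack[-1]["children"].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth > 0:
            self.skip_depth -= 1
        elif len(self.stack) > 1:
            self.stack.pop()


def render(node, indent=0):
    """Return the tree as a list of indented lines, one per tag."""
    lines = ["  " * indent + node["tag"]]
    for child in node["children"]:
        lines.extend(render(child, indent + 1))
    return lines


if __name__ == "__main__":
    html = ("<html><head><script>var x;</script><title>t</title></head>"
            "<body><div><p>hi</p></div></body></html>")
    builder = TreeBuilder(exclude=["script"], max_depth=3)
    builder.feed(html)
    print("\n".join(render(builder.root)))
```

With `exclude=["script"]` and `max_depth=3`, the `script` element is dropped and the `p` element (depth 4) is cut off, leaving `html`, `head`, `title`, `body`, and `div` in the printed tree.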
- Caching: Performance optimization for repeated URL analysis
- Asynchronous Processing: Concurrent analysis of multiple URLs
- Error Handling: Consistent error handling through decorators
- Tree Structure: HTML DOM visualization using anytree library
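The caching and decorator-based error handling mentioned above can be sketched roughly as follows. This is an illustrative pattern, not WPAA's actual code: `handle_errors`, `analyze`, and the placeholder result are hypothetical, and `functools.lru_cache` stands in for whatever cache the project uses.

```python
import functools


def handle_errors(default=None):
    """Decorator factory: report the exception and return `default` instead of raising."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                print(f"[error] {func.__name__}: {exc}")
                return default
        return wrapper
    return decorator


calls = []  # records every time the real work runs, to make cache hits visible


@handle_errors(default=None)
@functools.lru_cache(maxsize=128)
def analyze(url):
    """Hypothetical stand-in for an expensive fetch-and-parse step."""
    calls.append(url)
    if not url.startswith("http"):
        raise ValueError(f"unsupported URL: {url}")
    return {"url": url}  # placeholder result


if __name__ == "__main__":
    analyze("https://example.com")
    analyze("https://example.com")  # served from the cache
    print(len(calls))               # the function body ran once
    print(analyze("not-a-url"))     # error is reported, None is returned
```

Placing the error handler outside the cache means failed calls are never cached: `lru_cache` stores only values that were returned, so a URL that raised once is retried on the next call.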
MK-II_2523: Feature improvements completed
- Tree comparison functionality for detecting site changes
- Web interface implementation
- Support for more output formats (SVG, interactive HTML)
- Performance optimization and memory usage improvements