Visual Webpage Section Detection

Dataset for Visual Section Detection

A dataset for training an object detection model that extracts sections (important regions of a web page) from a page screenshot. The dataset consists of 3,452 annotated screenshots captured from web pages. Each image file comes with a corresponding XML annotation file.
The dataset was collected using our annotation tool.
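The annotation schema is not documented here. The sketch below assumes the XML files follow the Pascal VOC layout used by many labeling tools, with one `object` element per region holding a `name` and a `bndbox` with `xmin`/`ymin`/`xmax`/`ymax`. If the actual files use different tag names, adjust them accordingly. The function `parse_annotation`, the file name `page_0001.png`, and the label `section` in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    label: str
    xmin: int
    ymin: int
    xmax: int
    ymax: int


def parse_annotation(xml_text: str) -> List[Box]:
    """Parse one annotation file. ASSUMPTION: Pascal VOC-style tags."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bnd = obj.find("bndbox")
        # VOC coordinates are sometimes written as floats ("12.0"), so parse via float().
        coords = [int(float(bnd.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(Box(obj.findtext("name"), *coords))
    return boxes


if __name__ == "__main__":
    # Hypothetical sample; real files may carry extra fields (size, source, etc.).
    sample = """<annotation>
      <filename>page_0001.png</filename>
      <object><name>section</name>
        <bndbox><xmin>0</xmin><ymin>80</ymin><xmax>1280</xmax><ymax>640</ymax></bndbox>
      </object>
    </annotation>"""
    for box in parse_annotation(sample):
        print(box)
```

Running the script prints `Box(label='section', xmin=0, ymin=80, xmax=1280, ymax=640)` for the sample.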

Trained Model Checkpoints (TensorFlow.js)

We provide trained model checkpoints for EfficientDet-D0 and EfficientDet-D3. The checkpoints are converted to TensorFlow.js format for easy deployment on the web.

Demo Program

We provide a Chrome extension that extracts sections from a web page by combining the trained object detection model with our DOM-based structural analysis method.

Example results on Stack Overflow and Coursera

Step 0. Take a page screenshot
Step 1. Run the object detection model
Step 2. Postprocess the detections using the DOM tree
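This README does not describe how Step 2 uses the DOM tree. As one illustrative possibility, not necessarily the extension's actual method, a detected box can be snapped to the DOM element whose bounding rectangle overlaps it most, measured by intersection-over-union (IoU). Rectangles are `(x1, y1, x2, y2)` tuples in page pixels; in the browser, element rectangles could be collected with `getBoundingClientRect()`. The names `iou` and `snap_to_dom` and the 0.5 threshold are hypothetical.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Rect, b: Rect) -> float:
    """Intersection-over-union of two axis-aligned rectangles."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def snap_to_dom(box: Rect, elements: List[Tuple[str, Rect]], min_iou: float = 0.5) -> Optional[str]:
    """Return the id of the element that best overlaps `box`, or None if no IoU reaches min_iou.

    Hypothetical postprocessing step, shown only to illustrate box-to-DOM matching.
    """
    best_id, best_score = None, 0.0
    for elem_id, rect in elements:
        score = iou(box, rect)
        if score > best_score:
            best_id, best_score = elem_id, score
    return best_id if best_score >= min_iou else None


if __name__ == "__main__":
    elements = [("header", (0, 0, 1280, 80)), ("main", (0, 80, 1280, 900))]
    print(snap_to_dom((0, 90, 1280, 880), elements))  # → main
```

Detected boxes are in screenshot pixels, so on high-DPI screens they may need to be divided by `window.devicePixelRatio` before being compared with CSS-pixel element rectangles.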

How to use

0. Clone the repository.
1. Go to chrome://extensions/ and enable Developer mode.
2. Click 'Load unpacked' and select the cloned repository folder.
3. Open the web page you want to analyze, click the extension icon, and click 'Start Detection'.
4. When detection finishes, two browser tabs open: one with the object detection results and one with the postprocessed results.

(Note) Detailed results are printed to the console log (SegmentedScreen); open DevTools (F12) to view them.

How to change the object detection model
- Replace the files (.bin, .json) in the tfjs_models/model folder with the target model's files.
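After swapping models, a quick sanity check is to confirm that every weight shard listed in the new .json manifest is actually present in the folder. The sketch below assumes the manifest is named `model.json` (the `tensorflowjs_converter` default) and reads its `weightsManifest[].paths` entries; if the extension loads a different file name, pass it as `manifest_name`. The function `missing_weight_shards` is hypothetical.

```python
import json
from pathlib import Path
from typing import List


def missing_weight_shards(model_dir: str, manifest_name: str = "model.json") -> List[str]:
    """Return the .bin shard names referenced by the TF.js manifest but absent from model_dir."""
    model_path = Path(model_dir)
    manifest = json.loads((model_path / manifest_name).read_text(encoding="utf-8"))
    # Each weightsManifest group lists the shard files it is split across in "paths".
    referenced = [
        shard
        for group in manifest.get("weightsManifest", [])
        for shard in group.get("paths", [])
    ]
    return [shard for shard in referenced if not (model_path / shard).is_file()]
```

Run from the repository root, `missing_weight_shards("tfjs_models/model")` should return an empty list once all shards of the new model are in place.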

Repository: jeho-lee/visual-webpage-section-detection
