<i>💡 First time? Run <b>"Scrape-LE: Setup Browser"</b> from Command Palette to install Chromium (~130MB one-time setup)</i>
 </p>
 
-## 🙏 Thank You!
+## 🙏 Thank You
 
-Thank you for your interest in Scrape-LE! If this extension has helped verify your scraper targets or validate site accessibility, please consider leaving a rating on [VS Code Marketplace](https://marketplace.visualstudio.com/items?itemName=nolindnaidoo.scrape-le). Your feedback helps other developers discover this tool and motivates continued development.
+If Scrape-LE saves you time, a quick rating helps other developers discover it:
 ⭐ **Star this repository** to get notified about updates and new features!
+## ✅ Why Scrape-LE?
 
-## ✅ Why Scrape-LE
+Validate scraper targets **before debugging your code**. Check if sites are reachable, detect auth walls, and verify selectors — all from your editor.
 
-**Web scraping projects fail when target sites are unreachable or behave unexpectedly.** Debugging scraper failures is time-consuming when you don't know if the issue is your code or the target site.
-
-**Scrape-LE makes site validation effortless.**
-Quickly verify that pages load, render JavaScript correctly, and are accessible before deploying your scrapers.
+Scrape-LE uses real browser automation (Playwright) to catch issues early: JavaScript rendering errors, authentication requirements, CAPTCHA detection, and selector validation. Stop wasting time debugging code when the problem is the target site.
 
 - **Pre-deployment validation**
 Test target URLs before writing scraper code. Catch unreachable sites, JS errors, and rendering issues early.
@@ -70,81 +68,26 @@ Quickly verify that pages load, render JavaScript correctly, and are accessible
 
 ## 🚀 More from the LE Family
 
-**Scrape-LE** is part of a growing family of developer tools designed to make your workflow effortless:
-
-- **Strings-LE** - Extract every user-visible string from JSON, YAML, CSV, TOML, INI, and .env files with zero hassle
+- **[String-LE](https://marketplace.visualstudio.com/items?itemName=nolindnaidoo.string-le)** - Extract user-visible strings for i18n and validation • [Open VSX](https://open-vsx.org/extension/nolindnaidoo/string-le)
+- **[Numbers-LE](https://marketplace.visualstudio.com/items?itemName=nolindnaidoo.numbers-le)** - Extract and analyze numeric data with statistics • [Open VSX](https://open-vsx.org/extension/nolindnaidoo/numbers-le)
+- **[EnvSync-LE](https://marketplace.visualstudio.com/items?itemName=nolindnaidoo.envsync-le)** - Keep .env files in sync with visual diffs • [Open VSX](https://open-vsx.org/extension/nolindnaidoo/envsync-le)
+- **[Paths-LE](https://marketplace.visualstudio.com/items?itemName=nolindnaidoo.paths-le)** - Extract file paths from imports and dependencies • [Open VSX](https://open-vsx.org/extension/nolindnaidoo/paths-le)
+- **[URLs-LE](https://marketplace.visualstudio.com/items?itemName=nolindnaidoo.urls-le)** - Audit API endpoints and external resources • [Open VSX](https://open-vsx.org/extension/nolindnaidoo/urls-le)
+- **[Colors-LE](https://marketplace.visualstudio.com/items?itemName=nolindnaidoo.colors-le)** - Extract and analyze colors from stylesheets • [Open VSX](https://open-vsx.org/extension/nolindnaidoo/colors-le)
+- **[Dates-LE](https://marketplace.visualstudio.com/items?itemName=nolindnaidoo.dates-le)** - Extract temporal data from logs and APIs • [Open VSX](https://open-vsx.org/extension/nolindnaidoo/dates-le)
 
-### robots.txt Compliance
+## 💡 Use Cases
 
-Verify crawling is allowed:
+- **Pre-Scraper Validation** - Check if sites are reachable before writing scraper code
+- **Anti-Bot Detection** - Identify Cloudflare, reCAPTCHA, hCaptcha before deployment
+- **Rate Limit Discovery** - Find rate limits before hitting them in production
+- **robots.txt Compliance** - Verify crawling is allowed by site policies
+- **Auth Wall Detection** - Check if login or paywalls block access
+Disallow: /admin/, /api/internal/
+Crawl-delay: 10 seconds
+Sitemap: https://example.com/sitemap.xml
 
-```typescript
-// Check robots.txt
-Disallow: /admin/, /api/internal/
-Crawl-delay: 10 seconds
-Sitemap: https://example.com/sitemap.xml
-```
+````
 
 ## 🚀 Quick Start
 
@@ -170,7 +113,7 @@ On first use, Scrape-LE automatically detects if Chromium is installed and promp
 
 ```bash
 bunx playwright install chromium
-```
+````
 
 Or run from Command Palette: **"Scrape-LE: Setup Browser"**
 
@@ -249,152 +192,65 @@ See [`docs/CONFIGURATION.md`](docs/CONFIGURATION.md).
 
 ## ⚡ Performance
 
-Scrape-LE is optimized for fast feedback:
+Scrape-LE performance varies by target website and network. See [detailed benchmarks](docs/PERFORMANCE.md).