Allowing to ignore errors and proceed. #25

base: master

Conversation
Sometimes weird errors happen, terminating the download session. Specifying the `--ignore-errors` flag allows you to, well, ignore errors and keep going, which may be desirable behavior in some cases.
Walkthrough: Replaces direct Playwright usage with a new browser abstraction (`createBrowser` in `browser.js`).
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant CLI as index.js (CLI)
    participant Browser as browser.js (createBrowser)
    participant Page as Page / Navigator
    participant Store as ProgressStore
    User->>CLI: run command (flags)
    CLI->>CLI: parse options (commander, parseBooleanOption)
    CLI->>Browser: createBrowser(userDataDir, headless, userAgent)
    Browser-->>CLI: persistent context
    rect rgb(240, 248, 255)
        CLI->>Page: open / navigate loop
        Page->>Store: getProgress()
        loop per-photo
            Page->>Page: downloadPhoto(), write metadata
            Page->>CLI: increment photosDownloadedInSession
            alt photosDownloadedInSession >= maxPerSession
                CLI->>Browser: close / restart
                CLI->>Browser: createBrowser(...)
            end
        end
    end
    alt error thrown
        Page->>CLI: throw
        CLI->>Store: saveProgress(currentUrl)
        alt ignoreErrors = true
            CLI->>Page: continue loop
        else
            CLI-->>User: log error + suggest --ignore-errors, exit(1)
        end
    else success
        CLI-->>User: complete normally
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
index.js (1)
28-33: Consider using a dedicated argument parsing library.

The current argument parsing using `process.argv.includes()` is functional but basic. For better robustness and future extensibility (help text, validation, type checking), consider using a library like `yargs` or `commander`.

Example with `yargs`:

```js
import yargs from 'yargs';
import { hideBin } from 'yargs/helpers';

const argv = yargs(hideBin(process.argv))
  .option('headless', {
    type: 'boolean',
    default: true,
    description: 'Run browser in headless mode',
  })
  .option('ignore-errors', {
    type: 'boolean',
    default: false,
    description: 'Continue downloading despite errors',
  })
  .help()
  .argv;

const headless = argv.headless;
const ignoreErrors = argv.ignoreErrors;
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
`index.js` (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
index.js (1)
setup.js (1)
`page` (33-33)
```js
try {
  await page.evaluate(() => document.getElementsByClassName('SxgK2b OQEhnd')[0].click())

  // we wait until new photo is loaded
  await page.waitForURL((url) => {
    return url.host === 'photos.google.com' && url.href !== currentUrl
  },
  {
    timeout: timeoutValue,
  })

  await downloadPhoto(page)
  await saveProgress(page)
} catch (error) {
  console.error('An error occurred:', error);
  await saveProgress(page);
  if (!ignoreErrors) {
    console.log('Aborting due to error. Use --ignore-errors to continue on errors.');
    process.exit(1);
  }
}
```
Add consecutive error limit to prevent infinite loops.
When `ignoreErrors` is enabled and errors persist (e.g., DOM structure changed, network issues), the script will loop indefinitely, repeatedly encountering the same error. This wastes resources and provides no clear indication when manual intervention is needed.
Add a consecutive error counter to abort after a threshold is reached:
```diff
 chromium.use(stealth())
 const timeoutValue = 300000
 const userDataDir = './session'
 const downloadPath = './download'
 let headless = true
 // accept --headless=false argument to run in headful mode
 if (process.argv[2] === '--headless=false') {
   headless = false
 }
 let ignoreErrors = false;
 // accept --ignore-errors argument to control error handling behavior
 if (process.argv.includes('--ignore-errors')) {
   ignoreErrors = true;
 }
+
+const MAX_CONSECUTIVE_ERRORS = 3;
+let consecutiveErrors = 0;
```

Then update the catch block:

```diff
 } catch (error) {
   console.error('An error occurred:', error);
   await saveProgress(page);
+  consecutiveErrors++;
+
+  if (consecutiveErrors >= MAX_CONSECUTIVE_ERRORS) {
+    console.error(`Aborting after ${MAX_CONSECUTIVE_ERRORS} consecutive errors.`);
+    process.exit(1);
+  }
+
   if (!ignoreErrors) {
     console.log('Aborting due to error. Use --ignore-errors to continue on errors.');
     process.exit(1);
   }
+  console.log(`Continuing despite error (${consecutiveErrors}/${MAX_CONSECUTIVE_ERRORS} consecutive errors)...`);
 }
```

And reset the counter on success (after line 153):

```diff
 await downloadPhoto(page)
 await saveProgress(page)
+consecutiveErrors = 0; // Reset on success
 } catch (error) {
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:

```js
try {
  await page.evaluate(() => document.getElementsByClassName('SxgK2b OQEhnd')[0].click())
  // we wait until new photo is loaded
  await page.waitForURL((url) => {
    return url.host === 'photos.google.com' && url.href !== currentUrl
  },
  {
    timeout: timeoutValue,
  })
  await downloadPhoto(page)
  await saveProgress(page)
} catch (error) {
  console.error('An error occurred:', error);
  await saveProgress(page);
  if (!ignoreErrors) {
    console.log('Aborting due to error. Use --ignore-errors to continue on errors.');
    process.exit(1);
  }
}
```

After:

```js
chromium.use(stealth())
const timeoutValue = 300000
const userDataDir = './session'
const downloadPath = './download'
let headless = true
// accept --headless=false argument to run in headful mode
if (process.argv[2] === '--headless=false') {
  headless = false
}
let ignoreErrors = false
// accept --ignore-errors argument to control error handling behavior
if (process.argv.includes('--ignore-errors')) {
  ignoreErrors = true
}

const MAX_CONSECUTIVE_ERRORS = 3
let consecutiveErrors = 0

// … inside your main loop …
try {
  await page.evaluate(() => document.getElementsByClassName('SxgK2b OQEhnd')[0].click())
  // we wait until new photo is loaded
  await page.waitForURL(
    (url) => url.host === 'photos.google.com' && url.href !== currentUrl,
    { timeout: timeoutValue }
  )
  await downloadPhoto(page)
  await saveProgress(page)
  consecutiveErrors = 0 // Reset on success
} catch (error) {
  console.error('An error occurred:', error)
  await saveProgress(page)
  consecutiveErrors++
  if (consecutiveErrors >= MAX_CONSECUTIVE_ERRORS) {
    console.error(`Aborting after ${MAX_CONSECUTIVE_ERRORS} consecutive errors.`)
    process.exit(1)
  }
  if (!ignoreErrors) {
    console.log('Aborting due to error. Use --ignore-errors to continue on errors.')
    process.exit(1)
  }
  console.log(
    `Continuing despite error (${consecutiveErrors}/${MAX_CONSECUTIVE_ERRORS} consecutive errors)...`
  )
}
```
🤖 Prompt for AI Agents
In index.js around lines 141 to 161, add a consecutive error counter and
threshold to avoid infinite loops when ignoreErrors is true: declare a let
consecutiveErrorCount = 0 and const MAX_CONSECUTIVE_ERRORS = <reasonable number>
in the surrounding scope, increment consecutiveErrorCount inside the catch block
before deciding whether to continue, and if consecutiveErrorCount >=
MAX_CONSECUTIVE_ERRORS log a clear fatal message and call process.exit(1)
regardless of ignoreErrors; after successful operations (immediately after line
153 where the success path continues) reset consecutiveErrorCount = 0 so
transient failures don't trigger the abort.
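The suggested policy (cap consecutive errors, honor `--ignore-errors`, reset on success) can be distilled into a small pure function for reasoning about the three outcomes; the name `errorPolicy` and its return values are illustrative, not part of the PR:

```javascript
// Illustrative sketch of the decision logic the review suggests:
// abort once the consecutive-error cap is hit (even with --ignore-errors),
// abort immediately when errors are not ignored, otherwise continue.
function errorPolicy(consecutiveErrors, maxConsecutiveErrors, ignoreErrors) {
  if (consecutiveErrors >= maxConsecutiveErrors) return 'abort';
  if (!ignoreErrors) return 'abort';
  return 'continue';
}

console.log(errorPolicy(3, 3, true));  // 'abort'  — cap reached despite --ignore-errors
console.log(errorPolicy(1, 3, false)); // 'abort'  — errors not ignored
console.log(errorPolicy(1, 3, true));  // 'continue'
```

Checking the cap before the `ignoreErrors` flag mirrors the suggested diff, where hitting the cap aborts regardless of the flag.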
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
index.js (2)
40-47: Trim `.lastdone` and preserve original error.

Trailing newlines commonly break `new URL(...)`. Also `throw new Error(error)` loses stack/message.

```diff
 const getProgress = async () => {
   try {
-    const lastDone = await fsP.readFile('.lastdone', 'utf-8')
-    if (lastDone === '') throw new Error('Please add the starting link in .lastdone file')
-    return lastDone
+    const lastDone = (await fsP.readFile('.lastdone', 'utf-8')).trim();
+    if (lastDone === '') throw new Error('Please add the starting link in .lastdone file');
+    return lastDone;
   } catch (error) {
-    throw new Error(error)
+    throw error; // preserves original message/stack
   }
 }
```
202-214: Fix ReferenceError: `path` is undefined in catch; recompute target path and validate.

In the catch block, `path` is out of scope (declared inside try). This will throw and skip moving the file.

```diff
 } catch (error) {
-  const randomNumber = Math.floor(Math.random() * 1000000)
-  const fileName = await download.suggestedFilename().replace(/(\.[\w\d_-]+)$/i, `_${randomNumber}$1`)
-
-  var downloadFilePath = path
-
-  // check for long paths that could result in ENAMETOOLONG and truncate if necessary
-  if (downloadFilePath.length > 225) {
-    downloadFilePath = truncatePath(downloadFilePath)
-  }
-
-  await moveFile(temp, `${downloadFilePath}`)
-  console.log('Download Complete:', `${downloadFilePath}`)
+  const randomNumber = Math.floor(Math.random() * 1000000);
+  const randomizedFileName = (await download.suggestedFilename()).replace(/(\.[\w\d_-]+)$/i, `_${randomNumber}$1`);
+  let downloadFilePath = validatePath(`${downloadPath}/${year}/${month}/${randomizedFileName}`);
+  // check for long paths that could result in ENAMETOOLONG and truncate if necessary
+  if (downloadFilePath.length > 225) {
+    downloadFilePath = truncatePath(downloadFilePath);
+  }
+  await moveFile(temp, downloadFilePath);
+  console.log('Download Complete:', downloadFilePath);
 }
```
♻️ Duplicate comments (1)
index.js (1)
157-164: Consecutive error limit to avoid infinite loops (already suggested).

This loop can spin forever with `--ignore-errors`. The prior review’s recommendation to add a max consecutive error counter still applies.
🧹 Nitpick comments (2)
browser.js (2)
12-15: Avoid hard‑coding `channel: 'chromium'` (portability).

For most environments the Playwright-managed Chromium is preferred. Forcing a system channel can fail if it is not installed. Make it configurable or drop it.

```diff
 return chromium.launchPersistentContext(path.resolve(userDataDir), {
   headless,
-  channel: 'chromium',
+  // channel: 'chromium', // consider making this configurable
```

16-24: Guard `--no-sandbox` usage.

Disabling the sandbox weakens security; it is only needed in specific container/root setups. Consider toggling it via CLI flag or environment variable and omitting it by default.
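A minimal sketch of how the sandbox flag could be made opt-in; the `NO_SANDBOX` environment variable name is an assumption for illustration, not something the PR defines:

```javascript
// Hypothetical helper: only emit --no-sandbox when explicitly requested,
// e.g. for containerized or root environments where Chromium requires it.
function buildLaunchArgs(env) {
  const args = [];
  if (env.NO_SANDBOX === '1') {
    args.push('--no-sandbox');
  }
  return args;
}

console.log(buildLaunchArgs({ NO_SANDBOX: '1' })); // [ '--no-sandbox' ]
console.log(buildLaunchArgs({}));                  // []
```

The resulting array would be passed to the launch options, keeping the secure default for everyone else.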
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`package-lock.json` is excluded by `!**/package-lock.json`
📒 Files selected for processing (4)
- `browser.js` (1 hunks)
- `download/.gitignore` (0 hunks)
- `index.js` (4 hunks)
- `package.json` (1 hunks)
💤 Files with no reviewable changes (1)
- download/.gitignore
✅ Files skipped from review due to trivial changes (1)
- package.json
🧰 Additional context used
🧬 Code graph analysis (2)
browser.js (1)
index.js (3)
`userDataDir` (15-15), `headless` (34-34), `userAgent` (9-12)
index.js (1)
browser.js (2)
`createBrowser` (7-29)
🔇 Additional comments (1)
index.js (1)
18-31: CLI boolean parsing looks solid.

Good use of a coercer to support true/false/1/0/yes/no and defaulting behavior.
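The coercer itself is not shown in this review; here is a sketch of what a `parseBooleanOption` helper with the described behavior might look like (the exact name and defaulting are assumptions based on the review's description):

```javascript
// Hypothetical coercer: accepts true/false/1/0/yes/no (case-insensitive);
// missing or unrecognized values fall back to the supplied default.
function parseBooleanOption(value, defaultValue) {
  if (value === undefined || value === null) return defaultValue;
  const normalized = String(value).trim().toLowerCase();
  if (['true', '1', 'yes'].includes(normalized)) return true;
  if (['false', '0', 'no'].includes(normalized)) return false;
  return defaultValue;
}

console.log(parseBooleanOption('YES', false));    // true
console.log(parseBooleanOption('0', true));       // false
console.log(parseBooleanOption(undefined, true)); // true
```

With commander, such a function would typically be passed as the option's argument parser so `--headless=false` and `--headless=no` behave the same.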
```js
// Check if the maximum number of photos in session has been reached
if (maxNumPhotosInSession !== -1 && photosDownloadedInSession >= maxNumPhotosInSession) {
  console.log(`Downloaded ${photosDownloadedInSession} photos in this session. Restarting browser...`);
  await browser.close();
  photosDownloadedInSession = 0;
  browser = await createBrowser(userDataDir, headless, userAgent)
  page = await browser.newPage();
  await page.goto(currentUrl);
}
```
Session restart uses stale URL; resume from the current page instead.

`currentUrl` was captured before navigation. On restart you’ll revisit the old photo and risk duplicates.
```diff
 if (maxNumPhotosInSession !== -1 && photosDownloadedInSession >= maxNumPhotosInSession) {
   console.log(`Downloaded ${photosDownloadedInSession} photos in this session. Restarting browser...`);
-  await browser.close();
+  const resumeUrl = await page.url(); // capture current page before closing
+  await browser.close();
   photosDownloadedInSession = 0;
   browser = await createBrowser(userDataDir, headless, userAgent)
   page = await browser.newPage();
-  await page.goto(currentUrl);
+  await page.goto(resumeUrl);
 }
```

📝 Committable suggestion
Before:

```js
// Check if the maximum number of photos in session has been reached
if (maxNumPhotosInSession !== -1 && photosDownloadedInSession >= maxNumPhotosInSession) {
  console.log(`Downloaded ${photosDownloadedInSession} photos in this session. Restarting browser...`);
  await browser.close();
  photosDownloadedInSession = 0;
  browser = await createBrowser(userDataDir, headless, userAgent)
  page = await browser.newPage();
  await page.goto(currentUrl);
}
```

After:

```js
// Check if the maximum number of photos in session has been reached
if (maxNumPhotosInSession !== -1 && photosDownloadedInSession >= maxNumPhotosInSession) {
  console.log(`Downloaded ${photosDownloadedInSession} photos in this session. Restarting browser...`);
  const resumeUrl = await page.url(); // capture current page before closing
  await browser.close();
  photosDownloadedInSession = 0;
  browser = await createBrowser(userDataDir, headless, userAgent)
  page = await browser.newPage();
  await page.goto(resumeUrl);
}
```
🤖 Prompt for AI Agents
In index.js around lines 148-156, the restart logic uses the stale variable
currentUrl captured earlier, causing a revisit of the old photo; before closing
the browser capture the actual current page URL (e.g., const resumeUrl = await
page.url()), then after creating the new browser and page navigate to that
captured resumeUrl instead of currentUrl so scraping resumes where it left off.
Summary by CodeRabbit

- New Features
- Bug Fixes