Conversation

@sgzmd sgzmd commented Oct 13, 2025

Sometimes weird errors happen, terminating the download session. Specifying the `--ignore-errors` flag allows the tool to, well, ignore errors and keep going, which may be desirable behavior in some cases.

Summary by CodeRabbit

  • New Features

    • Added CLI options to control headless mode, error handling, and session limits: --headless, --ignore-errors, and --max-num-photos-in-session.
    • Progress is now persisted/restored so processing can resume from the last saved position.
    • Sessions automatically restart when the per-session photo limit is reached to continue downloads.
  • Bug Fixes

    • Improved error handling: errors are logged and progress is saved; with --ignore-errors processing continues instead of aborting.

coderabbitai bot commented Oct 13, 2025

Walkthrough

Replaces direct Playwright usage with a new browser abstraction (createBrowser), adds a commander-based CLI (headless, ignore-errors, max photos per session), robust boolean parsing, progress save/load, per-session download limits with browser restarts, and try/catch error handling that logs and persists progress and optionally continues.

Changes

Cohort / File(s): Summary
  • Main CLI & flow (index.js): Reworked CLI to use commander; added parseBooleanOption, options object, --headless, --ignore-errors, --max-num-photos-in-session; introduced progress helpers (getProgress, saveProgress); wrapped navigation/download loop in try/catch; tracks photosDownloaded and photosDownloadedInSession; restarts browser when per-session max reached; on error logs, saves progress, and either continues or exits based on --ignore-errors.
  • Browser abstraction (browser.js): New module exporting createBrowser(userDataDir, headless, userAgent) that resolves userAgent, applies stealth plugin, and launches a persistent Playwright Chromium context with configured args, viewport, and download settings.
  • Dependencies (package.json): Added commander dependency (^14.0.1).
  • Ignored file removed (download/.gitignore): Deleted .gitignore that previously ignored all files except itself.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant CLI as index.js (CLI)
  participant Browser as browser.js (createBrowser)
  participant Page as Page / Navigator
  participant Store as ProgressStore

  User->>CLI: run command (flags)
  CLI->>CLI: parse options (commander, parseBooleanOption)

  CLI->>Browser: createBrowser(userDataDir, headless, userAgent)
  Browser-->>CLI: persistent context

  rect rgb(240, 248, 255)
  CLI->>Page: open / navigate loop
  Page->>Store: getProgress()
  loop per-photo
    Page->>Page: downloadPhoto(), write metadata
    Page->>CLI: increment photosDownloadedInSession
    alt photosDownloadedInSession >= maxPerSession
      CLI->>Browser: close / restart
      CLI->>Browser: createBrowser(...)
    end
  end
  end

  alt error thrown
    Page->>CLI: throw
    CLI->>Store: saveProgress(currentUrl)
    alt ignoreErrors = true
      CLI->>Page: continue loop
    else
      CLI-->>User: log error + suggest --ignore-errors, exit(1)
    end
  else success
    CLI-->>User: complete normally
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

I hop through code with whiskers keen,
New tunnels built and pathways clean.
I save my steps when storms appear,
Restart my paws when limits near.
Flag me brave or make me rest—I'll cheer. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title succinctly conveys the main feature added by the pull request, enabling the tool to ignore errors and continue processing, without extraneous details, and it aligns directly with the described objectives of the changeset.
  • Docstring Coverage: ✅ Passed. No functions found in the changes; docstring coverage check skipped.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
index.js (1)

28-33: Consider using a dedicated argument parsing library.

The current argument parsing using process.argv.includes() is functional but basic. For better robustness and future extensibility (help text, validation, type checking), consider using a library like yargs or commander.

Example with yargs:

import yargs from 'yargs';
import { hideBin } from 'yargs/helpers';

const argv = yargs(hideBin(process.argv))
  .option('headless', {
    type: 'boolean',
    default: true,
    description: 'Run browser in headless mode'
  })
  .option('ignore-errors', {
    type: 'boolean',
    default: false,
    description: 'Continue downloading despite errors'
  })
  .help()
  .argv;

const headless = argv.headless;
const ignoreErrors = argv.ignoreErrors;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a5593b and f510525.

📒 Files selected for processing (1)
  • index.js (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
index.js (1)
setup.js (1)
  • page (33-33)

Comment on lines 141 to 161
try {
  await page.evaluate(() => document.getElementsByClassName('SxgK2b OQEhnd')[0].click())

  // we wait until new photo is loaded
  await page.waitForURL((url) => {
    return url.host === 'photos.google.com' && url.href !== currentUrl
  },
  {
    timeout: timeoutValue,
  })

  await downloadPhoto(page)
  await saveProgress(page)
} catch (error) {
  console.error('An error occurred:', error);
  await saveProgress(page);
  if (!ignoreErrors) {
    console.log('Aborting due to error. Use --ignore-errors to continue on errors.');
    process.exit(1);
  }
}

⚠️ Potential issue | 🟠 Major

Add consecutive error limit to prevent infinite loops.

When ignoreErrors is enabled and errors persist (e.g., DOM structure changed, network issues), the script will loop indefinitely, repeatedly encountering the same error. This wastes resources and provides no clear indication when manual intervention is needed.

Add a consecutive error counter to abort after a threshold is reached:

 chromium.use(stealth())
 
 const timeoutValue = 300000
 const userDataDir = './session'
 const downloadPath = './download'
 
 let headless = true
 
 // accept --headless=false argument to run in headful mode
 if (process.argv[2] === '--headless=false') {
   headless = false
 }
 
 let ignoreErrors = false;
 
 // accept --ignore-errors argument to control error handling behavior
 if (process.argv.includes('--ignore-errors')) {
   ignoreErrors = true;
 }
+
+const MAX_CONSECUTIVE_ERRORS = 3;
+let consecutiveErrors = 0;

Then update the catch block:

     } catch (error) {
       console.error('An error occurred:', error);
       await saveProgress(page);
+      consecutiveErrors++;
+      
+      if (consecutiveErrors >= MAX_CONSECUTIVE_ERRORS) {
+        console.error(`Aborting after ${MAX_CONSECUTIVE_ERRORS} consecutive errors.`);
+        process.exit(1);
+      }
+      
       if (!ignoreErrors) {
         console.log('Aborting due to error. Use --ignore-errors to continue on errors.');
         process.exit(1);
       }
+      console.log(`Continuing despite error (${consecutiveErrors}/${MAX_CONSECUTIVE_ERRORS} consecutive errors)...`);
     }
   }

And reset the counter on success (after line 153):

       await downloadPhoto(page)
       await saveProgress(page)
+      consecutiveErrors = 0;  // Reset on success
     } catch (error) {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Before:

try {
  await page.evaluate(() => document.getElementsByClassName('SxgK2b OQEhnd')[0].click())
  // we wait until new photo is loaded
  await page.waitForURL((url) => {
    return url.host === 'photos.google.com' && url.href !== currentUrl
  },
  {
    timeout: timeoutValue,
  })
  await downloadPhoto(page)
  await saveProgress(page)
} catch (error) {
  console.error('An error occurred:', error);
  await saveProgress(page);
  if (!ignoreErrors) {
    console.log('Aborting due to error. Use --ignore-errors to continue on errors.');
    process.exit(1);
  }
}

After:

chromium.use(stealth())

const timeoutValue = 300000
const userDataDir = './session'
const downloadPath = './download'

let headless = true

// accept --headless=false argument to run in headful mode
if (process.argv[2] === '--headless=false') {
  headless = false
}

let ignoreErrors = false

// accept --ignore-errors argument to control error handling behavior
if (process.argv.includes('--ignore-errors')) {
  ignoreErrors = true
}

const MAX_CONSECUTIVE_ERRORS = 3
let consecutiveErrors = 0

// … inside your main loop …
try {
  await page.evaluate(() => document.getElementsByClassName('SxgK2b OQEhnd')[0].click())

  // we wait until new photo is loaded
  await page.waitForURL(
    (url) => url.host === 'photos.google.com' && url.href !== currentUrl,
    { timeout: timeoutValue }
  )

  await downloadPhoto(page)
  await saveProgress(page)
  consecutiveErrors = 0 // Reset on success
} catch (error) {
  console.error('An error occurred:', error)
  await saveProgress(page)
  consecutiveErrors++

  if (consecutiveErrors >= MAX_CONSECUTIVE_ERRORS) {
    console.error(`Aborting after ${MAX_CONSECUTIVE_ERRORS} consecutive errors.`)
    process.exit(1)
  }

  if (!ignoreErrors) {
    console.log('Aborting due to error. Use --ignore-errors to continue on errors.')
    process.exit(1)
  }

  console.log(
    `Continuing despite error (${consecutiveErrors}/${MAX_CONSECUTIVE_ERRORS} consecutive errors)...`
  )
}
🤖 Prompt for AI Agents
In index.js around lines 141 to 161, add a consecutive error counter and
threshold to avoid infinite loops when ignoreErrors is true: declare a let
consecutiveErrorCount = 0 and const MAX_CONSECUTIVE_ERRORS = <reasonable number>
in the surrounding scope, increment consecutiveErrorCount inside the catch block
before deciding whether to continue, and if consecutiveErrorCount >=
MAX_CONSECUTIVE_ERRORS log a clear fatal message and call process.exit(1)
regardless of ignoreErrors; after successful operations (immediately after line
153 where the success path continues) reset consecutiveErrorCount = 0 so
transient failures don't trigger the abort.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
index.js (2)

40-47: Trim .lastdone and preserve original error.

Trailing newlines commonly break new URL(...). Also throw new Error(error) loses stack/message.

-const getProgress = async () => {
+const getProgress = async () => {
   try {
-    const lastDone = await fsP.readFile('.lastdone', 'utf-8')
-    if (lastDone === '') throw new Error('Please add the starting link in .lastdone file')
-    return lastDone
+    const lastDone = (await fsP.readFile('.lastdone', 'utf-8')).trim();
+    if (lastDone === '') throw new Error('Please add the starting link in .lastdone file');
+    return lastDone;
   } catch (error) {
-    throw new Error(error)
+    throw error; // preserves original message/stack
   }
 }

202-214: Fix ReferenceError: path is undefined in catch; recompute target path and validate.

In the catch block, path is out of scope (declared inside try). This will throw and skip moving the file.

   } catch (error) {
-    const randomNumber = Math.floor(Math.random() * 1000000)
-    const fileName = await download.suggestedFilename().replace(/(\.[\w\d_-]+)$/i, `_${randomNumber}$1`)
-
-    var downloadFilePath = path
-
-    // check for long paths that could result in ENAMETOOLONG and truncate if necessary
-    if (downloadFilePath.length > 225) {
-      downloadFilePath = truncatePath(downloadFilePath)
-    }
-
-    await moveFile(temp, `${downloadFilePath}`)
-    console.log('Download Complete:', `${downloadFilePath}`)
+    const randomNumber = Math.floor(Math.random() * 1000000);
+    const randomizedFileName = (await download.suggestedFilename()).replace(/(\.[\w\d_-]+)$/i, `_${randomNumber}$1`);
+    let downloadFilePath = validatePath(`${downloadPath}/${year}/${month}/${randomizedFileName}`);
+    // check for long paths that could result in ENAMETOOLONG and truncate if necessary
+    if (downloadFilePath.length > 225) {
+      downloadFilePath = truncatePath(downloadFilePath);
+    }
+    await moveFile(temp, downloadFilePath);
+    console.log('Download Complete:', downloadFilePath);
   }
♻️ Duplicate comments (1)
index.js (1)

157-164: Consecutive error limit to avoid infinite loops (already suggested).

This loop can spin forever with --ignore-errors. The prior review’s recommendation to add a max consecutive error counter still applies.

🧹 Nitpick comments (2)
browser.js (2)

12-15: Avoid hard‑coding channel: 'chromium' (portability).

For most environments the Playwright-managed Chromium is preferred. Forcing a system channel can fail if not installed. Make it configurable or drop it.

-  return chromium.launchPersistentContext(path.resolve(userDataDir), {
-    headless,
-    channel: 'chromium',
+  return chromium.launchPersistentContext(path.resolve(userDataDir), {
+    headless,
+    // channel: 'chromium', // consider making this configurable

16-24: Guard --no-sandbox usage.

Disabling the sandbox weakens security; only needed in specific containers/root setups. Consider toggling via CLI/env and omit by default.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f510525 and fe6888b.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (4)
  • browser.js (1 hunks)
  • download/.gitignore (0 hunks)
  • index.js (4 hunks)
  • package.json (1 hunks)
💤 Files with no reviewable changes (1)
  • download/.gitignore
✅ Files skipped from review due to trivial changes (1)
  • package.json
🧰 Additional context used
🧬 Code graph analysis (2)
browser.js (1)
index.js (3)
  • userDataDir (15-15)
  • headless (34-34)
  • userAgent (9-12)
index.js (1)
browser.js (2)
  • createBrowser (7-29)
  • createBrowser (7-29)
🔇 Additional comments (1)
index.js (1)

18-31: CLI boolean parsing looks solid.

Good use of a coercer to support true/false/1/0/yes/no and defaulting behavior.

Comment on lines +148 to +156
// Check if the maximum number of photos in session has been reached
if (maxNumPhotosInSession !== -1 && photosDownloadedInSession >= maxNumPhotosInSession) {
  console.log(`Downloaded ${photosDownloadedInSession} photos in this session. Restarting browser...`);
  await browser.close();
  photosDownloadedInSession = 0;
  browser = await createBrowser(userDataDir, headless, userAgent)
  page = await browser.newPage();
  await page.goto(currentUrl);
}

⚠️ Potential issue | 🔴 Critical

Session restart uses stale URL; resume from the current page instead.

currentUrl was captured before navigation. On restart you’ll revisit the old photo and risk duplicates.

-      if (maxNumPhotosInSession !== -1 && photosDownloadedInSession >= maxNumPhotosInSession) {
+      if (maxNumPhotosInSession !== -1 && photosDownloadedInSession >= maxNumPhotosInSession) {
         console.log(`Downloaded ${photosDownloadedInSession} photos in this session. Restarting browser...`);
-        await browser.close();
+        const resumeUrl = await page.url(); // capture current page before closing
+        await browser.close();
         photosDownloadedInSession = 0;
         browser = await createBrowser(userDataDir, headless, userAgent)
         page = await browser.newPage();
-        await page.goto(currentUrl);
+        await page.goto(resumeUrl);
       }
📝 Committable suggestion


Suggested change
Before:

// Check if the maximum number of photos in session has been reached
if (maxNumPhotosInSession !== -1 && photosDownloadedInSession >= maxNumPhotosInSession) {
  console.log(`Downloaded ${photosDownloadedInSession} photos in this session. Restarting browser...`);
  await browser.close();
  photosDownloadedInSession = 0;
  browser = await createBrowser(userDataDir, headless, userAgent)
  page = await browser.newPage();
  await page.goto(currentUrl);
}

After:

// Check if the maximum number of photos in session has been reached
if (maxNumPhotosInSession !== -1 && photosDownloadedInSession >= maxNumPhotosInSession) {
  console.log(`Downloaded ${photosDownloadedInSession} photos in this session. Restarting browser...`);
  const resumeUrl = await page.url(); // capture current page before closing
  await browser.close();
  photosDownloadedInSession = 0;
  browser = await createBrowser(userDataDir, headless, userAgent)
  page = await browser.newPage();
  await page.goto(resumeUrl);
}
🤖 Prompt for AI Agents
In index.js around lines 148-156, the restart logic uses the stale variable
currentUrl captured earlier, causing a revisit of the old photo; before closing
the browser capture the actual current page URL (e.g., const resumeUrl = await
page.url()), then after creating the new browser and page navigate to that
captured resumeUrl instead of currentUrl so scraping resumes where it left off.
