Description
Which package is this bug report for? If unsure which one to select, leave blank
None
Issue description
I've written a script that logs every place ps is spawned while Crawlee's CheerioCrawler runs:
🔍 PROCESS CALL DETECTED: spawn
Command: ps
Args: ["-A","-o","ppid,pid,stat,rss,comm"]
Full Stack trace:
1: at traceProcessCall (/Users/alexpchin/Documents/work/agent247/packages/trigger.dev/scripts/debug-branch-scraping.ts:20:19)
2: at <anonymous> (/Users/alexpchin/Documents/work/agent247/packages/trigger.dev/scripts/debug-branch-scraping.ts:49:13)
3: at childrenOfPid (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@apify+ps-tree@1.2.0/node_modules/@apify/ps-tree/index.js:53:21)
4: at node:internal/util:442:7
5: at new Promise (<anonymous>)
6: at childrenOfPid (node:internal/util:428:12)
7: at getMemoryInfo (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+utils@3.14.1/node_modules/@crawlee/src/internals/memory-info.ts:78:33)
8: at LocalEventManager.createMemoryInfo (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+core@3.14.1/node_modules/@crawlee/src/events/local_event_manager.ts:113:29)
9: at LocalEventManager.createSystemInfo (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+core@3.14.1/node_modules/@crawlee/src/events/local_event_manager.ts:81:17)
10: at LocalEventManager.emitSystemInfoEvent (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+core@3.14.1/node_modules/@crawlee/src/events/local_event_manager.ts:46:22)
---
INFO CheerioCrawler: Starting the crawler.
🔍 PROCESS CALL DETECTED: spawn
Command: ps
Args: ["-A","-o","ppid,pid,stat,rss,comm"]
Full Stack trace:
1: at traceProcessCall (/Users/alexpchin/Documents/work/agent247/packages/trigger.dev/scripts/debug-branch-scraping.ts:20:19)
2: at <anonymous> (/Users/alexpchin/Documents/work/agent247/packages/trigger.dev/scripts/debug-branch-scraping.ts:49:13)
3: at childrenOfPid (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@apify+ps-tree@1.2.0/node_modules/@apify/ps-tree/index.js:53:21)
4: at node:internal/util:442:7
5: at new Promise (<anonymous>)
6: at childrenOfPid (node:internal/util:428:12)
7: at getMemoryInfo (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+utils@3.14.1/node_modules/@crawlee/src/internals/memory-info.ts:78:33)
8: at Snapshotter.start (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+core@3.14.1/node_modules/@crawlee/src/autoscaling/snapshotter.ts:205:33)
9: at AutoscaledPool.run (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+core@3.14.1/node_modules/@crawlee/src/autoscaling/autoscaled_pool.ts:368:9)
10: at CheerioCrawler.run (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+basic@3.14.1/node_modules/@crawlee/src/internals/basic-crawler.ts:977:13)
---
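The tracing script itself isn't included in the issue. A minimal sketch of the technique it appears to use (replace a method with a wrapper that captures new Error().stack on every call, then delegates to the original) could look like this. traceMethod and TraceRecord are hypothetical names, and the demo runs against a stand-in object rather than the real child_process module.

```typescript
// Hypothetical tracing helper (not the reporter's actual script): replaces
// target[method] with a wrapper that reports the call's arguments and stack
// trace, then delegates to the original implementation.
type TraceRecord = { method: string; args: unknown[]; stack: string[] };

function traceMethod<T extends object>(
  target: T,
  method: keyof T & string,
  onCall: (record: TraceRecord) => void,
): () => void {
  const mutable = target as unknown as Record<string, unknown>;
  const original = mutable[method];
  if (typeof original !== 'function') {
    throw new TypeError(`${method} is not a function`);
  }
  mutable[method] = function (this: unknown, ...args: unknown[]): unknown {
    // new Error().stack starts with "Error"; keep only the "at ..." frames.
    const stack = (new Error().stack ?? '')
      .split('\n')
      .slice(1)
      .map((frame) => frame.trim());
    onCall({ method, args, stack });
    return original.apply(this, args);
  };
  // Calling the returned function restores the original method.
  return () => {
    mutable[method] = original;
  };
}

// Usage against a stand-in object. To trace real ps calls, the same helper
// would presumably be applied to the child_process module before Crawlee is
// loaded (an assumption about load order, not confirmed by the report).
const fakeChildProcess = {
  spawn: (command: string, args: string[]) => `${command} ${args.join(' ')}`,
};
const restore = traceMethod(fakeChildProcess, 'spawn', (record) => {
  console.log(`PROCESS CALL DETECTED: ${record.method}`, JSON.stringify(record.args));
});
fakeChildProcess.spawn('ps', ['-A', '-o', 'ppid,pid,stat,rss,comm']);
restore();
```

Returning a restore function keeps the patch reversible, so tracing can be limited to the part of a run being debugged.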
It seems the ps command is being spawned from two different call sites, although both ultimately go through getMemoryInfo and childrenOfPid:
1. System info detection (frame 10 of the first stack trace)
LocalEventManager.emitSystemInfoEvent (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+core@3.14.1/node_modules/@crawlee/src/events/local_event_manager.ts:46:22)
2. Memory snapshotter (frames 3-9 of the second stack trace)
childrenOfPid (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@apify+ps-tree@1.2.0/node_modules/@apify/ps-tree/index.js:53:21)
getMemoryInfo (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+utils@3.14.1/node_modules/@crawlee/src/internals/memory-info.ts:78:33)
Snapshotter.start (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+core@3.14.1/node_modules/@crawlee/src/autoscaling/snapshotter.ts:205:33)
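Both call sites bottom out in childrenOfPid from @apify/ps-tree, which spawns ps -A -o ppid,pid,stat,rss,comm. Judging by the function names, that output is presumably used to find the processes descended from the crawler and add up their resident memory. The hypothetical sketch below (parsePsOutput and descendantRssBytes are illustrative names, not @apify/ps-tree or Crawlee code) shows what that data makes possible, using made-up sample output.

```typescript
// Hypothetical sketch, not the @apify/ps-tree or Crawlee implementation:
// parse `ps -A -o ppid,pid,stat,rss,comm` output and sum the RSS of every
// process descended from rootPid.
interface PsRow {
  ppid: number;
  pid: number;
  stat: string;
  rssKiB: number; // ps reports RSS in 1024-byte units
  comm: string;
}

function parsePsOutput(output: string): PsRow[] {
  return output
    .split('\n')
    .slice(1) // drop the header line (PPID PID STAT RSS COMM)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const [ppid, pid, stat, rss, ...comm] = line.split(/\s+/);
      return {
        ppid: Number(ppid),
        pid: Number(pid),
        stat: stat ?? '',
        rssKiB: Number(rss),
        comm: comm.join(' '), // command names may contain spaces
      };
    });
}

function descendantRssBytes(rows: PsRow[], rootPid: number): number {
  // Index processes by parent PID so each level of the tree is one lookup.
  const childrenByParent = new Map<number, PsRow[]>();
  for (const row of rows) {
    const siblings = childrenByParent.get(row.ppid) ?? [];
    siblings.push(row);
    childrenByParent.set(row.ppid, siblings);
  }
  // Depth-first walk from rootPid; `visited` guards against PID cycles.
  const visited = new Set<number>([rootPid]);
  const pending = [rootPid];
  let totalKiB = 0;
  while (pending.length > 0) {
    const parent = pending.pop()!;
    for (const child of childrenByParent.get(parent) ?? []) {
      if (visited.has(child.pid)) continue;
      visited.add(child.pid);
      totalKiB += child.rssKiB;
      pending.push(child.pid);
    }
  }
  return totalKiB * 1024;
}

// Usage with made-up sample output: PID 100 has a child (101) and a grandchild (102).
const sample = [
  ' PPID   PID STAT    RSS COMM',
  '    1   100 S     50000 node',
  '  100   101 S      2000 chrome',
  '  101   102 S      3000 chrome helper',
  '    1   200 S      9999 other',
].join('\n');
console.log(descendantRssBytes(parsePsOutput(sample), 100)); // 5120000
```

The root process's own RSS is deliberately excluded, so the result covers only child processes (here (2000 + 3000) KiB).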
I've tried disabling the ps-based detection with:
import { Configuration } from '@crawlee/cheerio';

// Enable experimental system info v2 for better containerized environment support
Configuration.set('systemInfoV2', true);
// Force containerized mode to avoid ps-based system detection
Configuration.set('containerized', true);
// Set a fixed memory value to avoid ps-based memory detection
// Using 1024 MB (1 GB) to match the small-2x machine configuration
Configuration.set('memoryMbytes', 1024);
But I'm still seeing the following error in production:
ERROR Memory snapshot failed.
spawn ps ENOENT
    at Process.ChildProcess._handle.onexit (node:internal/child_process:286:19)
    at onErrorNT (node:internal/child_process:484:16)
    at processTicksAndRejections (node:internal/process/task_queues:82:21)
With these options set, the code now runs, but the error is still logged, which suggests that some other code path also needs to respect these options.
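spawn ps ENOENT most likely means Node could not find a ps executable on the PATH of the production environment; minimal container images, for example, may not ship one. A small sketch for checking this before starting the crawler, assuming POSIX-style PATH lookup (no Windows PATHEXT handling). isOnPath is a hypothetical helper, not a Crawlee API.

```typescript
import { accessSync, constants, statSync } from 'node:fs';
import { delimiter, join } from 'node:path';

// Hypothetical helper: returns true if `command` names an executable regular
// file in one of the PATH directories, roughly mirroring how spawn() resolves
// bare command names such as "ps" on POSIX systems.
function isOnPath(command: string, pathEnv: string = process.env.PATH ?? ''): boolean {
  for (const dir of pathEnv.split(delimiter)) {
    if (dir === '') continue; // skip empty PATH segments
    const candidate = join(dir, command);
    try {
      if (statSync(candidate).isFile()) {
        accessSync(candidate, constants.X_OK); // throws if not executable
        return true;
      }
    } catch {
      // Missing or not executable here; keep searching the remaining directories.
    }
  }
  return false;
}

// Usage: warn early instead of failing later inside the memory snapshot.
if (!isOnPath('ps')) {
  console.warn('ps is not on PATH; anything that spawns it will fail with ENOENT.');
}
```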
Code sample
Package version
"@crawlee/cheerio": "^3.14.1"
Node.js version
v20.11.1
Operating system
macOS Sequoia
Apify platform
- Tick me if you encountered this issue on the Apify platform
I have tested this on the next release: 3.14.2-beta.14
Other context
No response