Use of ps in containerized environments like Trigger.dev causes error #3152

@alexpchin

Which package is this bug report for? If unsure which one to select, leave blank

None

Issue description

I've written a script to show where ps is used when using Crawlee's Cheerio scraper:

🔍 PROCESS CALL DETECTED: spawn
   Command: ps
   Args: ["-A","-o","ppid,pid,stat,rss,comm"]
   Full Stack trace:
     1: at traceProcessCall (/Users/alexpchin/Documents/work/agent247/packages/trigger.dev/scripts/debug-branch-scraping.ts:20:19)
     2: at <anonymous> (/Users/alexpchin/Documents/work/agent247/packages/trigger.dev/scripts/debug-branch-scraping.ts:49:13)
     3: at childrenOfPid (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@apify+ps-tree@1.2.0/node_modules/@apify/ps-tree/index.js:53:21)
     4: at node:internal/util:442:7
     5: at new Promise (<anonymous>)
     6: at childrenOfPid (node:internal/util:428:12)
     7: at getMemoryInfo (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+utils@3.14.1/node_modules/@crawlee/src/internals/memory-info.ts:78:33)
     8: at LocalEventManager.createMemoryInfo (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+core@3.14.1/node_modules/@crawlee/src/events/local_event_manager.ts:113:29)
     9: at LocalEventManager.createSystemInfo (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+core@3.14.1/node_modules/@crawlee/src/events/local_event_manager.ts:81:17)
     10: at LocalEventManager.emitSystemInfoEvent (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+core@3.14.1/node_modules/@crawlee/src/events/local_event_manager.ts:46:22)
---
INFO  CheerioCrawler: Starting the crawler.
🔍 PROCESS CALL DETECTED: spawn
   Command: ps
   Args: ["-A","-o","ppid,pid,stat,rss,comm"]
   Full Stack trace:
     1: at traceProcessCall (/Users/alexpchin/Documents/work/agent247/packages/trigger.dev/scripts/debug-branch-scraping.ts:20:19)
     2: at <anonymous> (/Users/alexpchin/Documents/work/agent247/packages/trigger.dev/scripts/debug-branch-scraping.ts:49:13)
     3: at childrenOfPid (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@apify+ps-tree@1.2.0/node_modules/@apify/ps-tree/index.js:53:21)
     4: at node:internal/util:442:7
     5: at new Promise (<anonymous>)
     6: at childrenOfPid (node:internal/util:428:12)
     7: at getMemoryInfo (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+utils@3.14.1/node_modules/@crawlee/src/internals/memory-info.ts:78:33)
     8: at Snapshotter.start (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+core@3.14.1/node_modules/@crawlee/src/autoscaling/snapshotter.ts:205:33)
     9: at AutoscaledPool.run (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+core@3.14.1/node_modules/@crawlee/src/autoscaling/autoscaled_pool.ts:368:9)
     10: at CheerioCrawler.run (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+basic@3.14.1/node_modules/@crawlee/src/internals/basic-crawler.ts:977:13)
---
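
For readers who want to reproduce a trace like the one above, here is a minimal sketch of how such a tracer might be built. It is not the script from this report: `wrapSpawn` and `TracedCall` are hypothetical names, and the sketch only assumes that a `spawn`-like function can be wrapped so that each call is recorded together with a stack trace.

```typescript
// Hypothetical helper types and names; not taken from the original debug script.
type SpawnLike = (...args: any[]) => unknown;

interface TracedCall {
    command: string;
    args: string[];
    stack: string[];
}

// Returns a function that behaves like `original` but reports every call
// (command, arguments, and the caller's stack frames) to `onCall` first.
function wrapSpawn<T extends SpawnLike>(original: T, onCall: (call: TracedCall) => void): T {
    const wrapped = function (this: unknown, ...callArgs: any[]): unknown {
        const [command, maybeArgs] = callArgs;
        // Drop the "Error" header line and the frame for this wrapper itself.
        const stack = (new Error().stack ?? '')
            .split('\n')
            .slice(2)
            .map((line) => line.trim());
        onCall({
            command: String(command),
            args: Array.isArray(maybeArgs) ? maybeArgs.map(String) : [],
            stack,
        });
        // Delegate to the real implementation with the original arguments.
        return original.apply(this, callArgs);
    };
    return wrapped as unknown as T;
}
```

To trace real process calls, the wrapped function would have to be assigned back onto the `child_process` module object before Crawlee is loaded. Whether a given dependency then sees the wrapper depends on how it obtains its reference to `spawn`, so treat this as a debugging aid rather than a guaranteed interception point.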

It seems like the ps command is being called from two different locations:

1. System Info Detection (frame 10 in the first stack trace)
LocalEventManager.emitSystemInfoEvent (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+core@3.14.1/node_modules/@crawlee/src/events/local_event_manager.ts:46:22)

2. Memory Snapshotter (frames 3, 7, and 8 in the second stack trace)
childrenOfPid (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@apify+ps-tree@1.2.0/node_modules/@apify/ps-tree/index.js:53:21)
getMemoryInfo (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+utils@3.14.1/node_modules/@crawlee/src/internals/memory-info.ts:78:33)
Snapshotter.start (/Users/alexpchin/Documents/work/agent247/node_modules/.pnpm/@crawlee+core@3.14.1/node_modules/@crawlee/src/autoscaling/snapshotter.ts:205:33)

I've tried disabling it with:

  // Enable experimental system info v2 for better containerized environment support
  Configuration.set('systemInfoV2', true);

  // Force containerized mode to avoid ps-based system detection
  Configuration.set('containerized', true);

  // Set fixed memory value to avoid ps-based memory detection
  // Using 1024MB (1GB) to match small-2x machine configuration
  Configuration.set('memoryMbytes', 1024);

But I'm still seeing this error in production:

ERROR Memory snapshot failed.
  spawn ps ENOENT
      at Process.ChildProcess._handle.onexit (node:internal/child_process:286:19)
      at onErrorNT (node:internal/child_process:484:16)
      at processTicksAndRejections (node:internal/process/task_queues:82:21)

With the above configuration set, the code now runs, but the error is still logged, which suggests that some other code path also needs to respect those options.
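
An ENOENT from spawn generally means the executable could not be found, so one way to confirm that the container image simply lacks `ps` is to look for it on the PATH at startup. The following is a minimal sketch under that assumption; `isCommandOnPath` is a hypothetical helper, not part of Crawlee, and it only uses Node's built-in `fs` and `path` modules.

```typescript
import { accessSync, constants } from 'node:fs';
import { delimiter, join } from 'node:path';

// Hypothetical helper: returns true if an executable file named `command`
// exists in any directory listed in `pathEnv` (defaults to process.env.PATH).
function isCommandOnPath(command: string, pathEnv: string = process.env.PATH ?? ''): boolean {
    return pathEnv
        .split(delimiter)
        .filter((dir) => dir.length > 0)
        .some((dir) => {
            try {
                // X_OK checks that the current user may execute the file.
                accessSync(join(dir, command), constants.X_OK);
                return true;
            } catch {
                return false;
            }
        });
}

// Log a hint early instead of waiting for the first failed memory snapshot.
if (!isCommandOnPath('ps')) {
    console.warn('ps was not found on PATH; ps-based memory detection will fail with ENOENT.');
}
```

This does not stop Crawlee from calling `ps`. It only makes the missing binary visible up front, which helps separate "the image has no `ps`" from "the options are being ignored".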

Package version

"@crawlee/cheerio": "^3.14.1"

Node.js version

v20.11.1

Operating system

macOS Sequoia

I have tested this on the next release

3.14.2-beta.14

Labels

bug, t-tooling