diff --git a/docs/config/config-file.mdx b/docs/config/config-file.mdx
index e80b7685f9..5ac5e21f41 100644
--- a/docs/config/config-file.mdx
+++ b/docs/config/config-file.mdx
@@ -4,6 +4,7 @@ sidebarTitle: "trigger.config.ts"
 description: "This file is used to configure your project and how it's built."
 ---
 
+import ScrapingWarning from "/snippets/web-scraping-warning.mdx";
 import BundlePackages from "/snippets/bundle-packages.mdx";
 
 The `trigger.config.ts` file is used to configure your Trigger.dev project. It is a TypeScript file at the root of your project that exports a default configuration object. Here's an example:
@@ -473,6 +474,33 @@ export default defineConfig({
 });
 ```
 
+#### puppeteer
+
+
+
+To use Puppeteer in your project, add these build settings to your `trigger.config.ts` file:
+
+```ts trigger.config.ts
+import { defineConfig } from "@trigger.dev/sdk/v3";
+import { puppeteer } from "@trigger.dev/build/extensions/puppeteer";
+
+export default defineConfig({
+  project: "",
+  // Your other config settings...
+  build: {
+    extensions: [puppeteer()],
+  },
+});
+```
+
+Then add the following environment variable on the Environment Variables page of your Trigger.dev dashboard:
+
+```bash
+PUPPETEER_EXECUTABLE_PATH="/usr/bin/google-chrome-stable"
+```
+
+Follow [this example](/examples/puppeteer) to get set up with Trigger.dev and Puppeteer in your project.
+
 #### ffmpeg
 
 You can add the `ffmpeg` build extension to your build process:
@@ -482,7 +510,7 @@ import { defineConfig } from "@trigger.dev/sdk/v3";
 import { ffmpeg } from "@trigger.dev/build/extensions/core";
 
 export default defineConfig({
-  //..other stuff
+  // Your other config settings...
   build: {
     extensions: [ffmpeg()],
   },
@@ -505,6 +533,8 @@ export default defineConfig({
 
 This extension will also add the `FFMPEG_PATH` and `FFPROBE_PATH` to your environment variables, making it easy to use popular ffmpeg libraries like `fluent-ffmpeg`.
 
+Follow [this example](/examples/ffmpeg-video-processing) to get set up with Trigger.dev and FFmpeg in your project.
+
 #### esbuild plugins
 
 You can easily add existing or custom esbuild plugins to your build process using the `esbuildPlugin` extension:
diff --git a/docs/examples/intro.mdx b/docs/examples/intro.mdx
index ee1a2844b1..391db784cb 100644
--- a/docs/examples/intro.mdx
+++ b/docs/examples/intro.mdx
@@ -11,6 +11,7 @@ description: "Learn how to use Trigger.dev with these practical task examples."
 | [OpenAI with retrying](/examples/open-ai-with-retrying) | Create a reusable OpenAI task with custom retry options. |
 | [PDF to image](/examples/pdf-to-image) | Use `MuPDF` to turn a PDF into images and save them to Cloudflare R2. |
 | [React to PDF](/examples/react-pdf) | Use `react-pdf` to generate a PDF and save it to Cloudflare R2. |
+| [Puppeteer](/examples/puppeteer) | Use Puppeteer to generate a PDF or scrape a web page. |
 | [Resend email sequence](/examples/resend-email-sequence) | Send a sequence of emails over several days using Resend with Trigger.dev. |
 | [Sharp image processing](/examples/sharp-image-processing) | Use Sharp to process an image and save it to Cloudflare R2. |
 | [Stripe webhook](/examples/stripe-webhook) | Trigger a task from Stripe webhook events. |
diff --git a/docs/examples/puppeteer.mdx b/docs/examples/puppeteer.mdx
new file mode 100644
index 0000000000..94bd20d2ff
--- /dev/null
+++ b/docs/examples/puppeteer.mdx
@@ -0,0 +1,223 @@
+---
+title: "Puppeteer"
+sidebarTitle: "Puppeteer"
+description: "These examples demonstrate how to use Puppeteer with Trigger.dev."
+---
+
+import LocalDevelopment from "/snippets/local-development-extensions.mdx";
+import ScrapingWarning from "/snippets/web-scraping-warning.mdx";
+
+## Overview
+
+There are three example tasks on this page:
+
+1. [Basic example](/examples/puppeteer#basic-example)
+2. [Generate a PDF from a web page](/examples/puppeteer#generate-a-pdf-from-a-web-page)
+3. [Scrape content from a web page](/examples/puppeteer#scrape-content-from-a-web-page)
+
+
+
+## Build configurations
+
+To use all the examples on this page, you'll first need to add these build settings to your `trigger.config.ts` file:
+
+```ts trigger.config.ts
+import { defineConfig } from "@trigger.dev/sdk/v3";
+import { puppeteer } from "@trigger.dev/build/extensions/puppeteer";
+
+export default defineConfig({
+  project: "",
+  // Your other config settings...
+  build: {
+    // This is required to use the Puppeteer library
+    extensions: [puppeteer()],
+  },
+});
+```
+
+Learn more about [build configurations](/config/config-file#build-configuration), including setting default retry settings, customizing the build environment, and more.
+
+## Set an environment variable
+
+Set the following environment variable in your [Trigger.dev dashboard](/deploy-environment-variables) or [using the SDK](/deploy-environment-variables#in-your-code):
+
+```bash
+PUPPETEER_EXECUTABLE_PATH="/usr/bin/google-chrome-stable"
+```
+
+## Basic example
+
+### Overview
+
+In this example, we use [Puppeteer](https://pptr.dev/) to log the title of a web page, in this case the [Trigger.dev](https://trigger.dev) landing page.
+
+### Task code
+
+```ts trigger/puppeteer-basic-example.ts
+import { logger, task } from "@trigger.dev/sdk/v3";
+import puppeteer from "puppeteer";
+
+export const puppeteerTask = task({
+  id: "puppeteer-log-title",
+  run: async () => {
+    const browser = await puppeteer.launch();
+
+    try {
+      const page = await browser.newPage();
+      await page.goto("https://trigger.dev");
+
+      // Read the page title and log it
+      const title = await page.title();
+      logger.info("Page title", { title });
+    } finally {
+      // Close the browser even if navigation fails
+      await browser.close();
+    }
+  },
+});
+```
+
+### Testing your task
+
+This task doesn't require a payload, so you can click "Run test" on the Test page in the dashboard. Learn more about [testing tasks](/run-tests).
+
+## Generate a PDF from a web page
+
+### Overview
+
+In this example, we use [Puppeteer](https://pptr.dev/) to generate a PDF from the [Trigger.dev](https://trigger.dev) landing page and upload it to [Cloudflare R2](https://developers.cloudflare.com/r2/).
+
+### Task code
+
+```ts trigger/puppeteer-generate-pdf.ts
+import { logger, task } from "@trigger.dev/sdk/v3";
+import puppeteer from "puppeteer";
+import { PutObjectCommand, S3Client } from "@aws-sdk/client-s3";
+
+// Initialize the S3 client (Cloudflare R2 exposes an S3-compatible API)
+const s3Client = new S3Client({
+  region: "auto",
+  endpoint: process.env.S3_ENDPOINT,
+  credentials: {
+    accessKeyId: process.env.R2_ACCESS_KEY_ID ?? "",
+    secretAccessKey: process.env.R2_SECRET_ACCESS_KEY ?? "",
+  },
+});
+
+export const puppeteerWebpageToPDF = task({
+  id: "puppeteer-webpage-to-pdf",
+  run: async () => {
+    const browser = await puppeteer.launch();
+    const page = await browser.newPage();
+    const response = await page.goto("https://trigger.dev");
+    const url = response?.url() ?? "No URL found";
+
+    // Generate a PDF from the web page
+    const pdfBuffer = await page.pdf();
+
+    logger.info("PDF generated from URL", { url });
+
+    await browser.close();
+
+    // Upload to R2
+    const s3Key = `pdfs/test.pdf`;
+    const uploadParams = {
+      Bucket: process.env.S3_BUCKET,
+      Key: s3Key,
+      Body: pdfBuffer,
+      ContentType: "application/pdf",
+    };
+
+    logger.log("Uploading to R2", { bucket: uploadParams.Bucket, key: s3Key });
+
+    // Upload the PDF to R2 and return the URL.
+    await s3Client.send(new PutObjectCommand(uploadParams));
+    // This URL format is for a public AWS S3 bucket. R2 serves public objects from the
+    // bucket's r2.dev subdomain or a custom domain instead, so adjust it to match your setup.
+    const s3Url = `https://${process.env.S3_BUCKET}.s3.amazonaws.com/${s3Key}`;
+    logger.log("PDF uploaded to R2", { url: s3Url });
+    return { pdfUrl: s3Url };
+  },
+});
+```
+
+### Testing your task
+
+This task doesn't require a payload, so you can click "Run test" on the Test page in the dashboard. Learn more about [testing tasks](/run-tests).
+
+## Scrape content from a web page
+
+### Overview
+
+In this example, we use [Puppeteer](https://pptr.dev/) with a [Browserbase](https://www.browserbase.com/) proxy to scrape the GitHub star count from the [Trigger.dev](https://trigger.dev) landing page and log it. See [this list](/examples/puppeteer#proxying) for other proxy services we recommend.
+
+
+  When web scraping, you MUST use the technique below, which connects Puppeteer to a proxy. Direct scraping without using `browserWSEndpoint` is prohibited and will result in account suspension.
+
+
+### Task code
+
+```ts trigger/scrape-website.ts
+import { logger, task } from "@trigger.dev/sdk/v3";
+import puppeteer from "puppeteer-core";
+
+export const puppeteerScrapeWithProxy = task({
+  id: "puppeteer-scrape-with-proxy",
+  run: async () => {
+    const browser = await puppeteer.connect({
+      browserWSEndpoint: `wss://connect.browserbase.com?apiKey=${process.env.BROWSERBASE_API_KEY}`,
+    });
+
+    try {
+      const page = await browser.newPage();
+
+      // Set up Browserbase proxy authentication
+      await page.authenticate({
+        username: "api",
+        password: process.env.BROWSERBASE_API_KEY ?? "",
+      });
+
+      // Navigate to the target website
+      await page.goto("https://trigger.dev", { waitUntil: "networkidle0" });
+
+      // Scrape the GitHub star count
+      const starCount = await page.evaluate(() => {
+        const starElement = document.querySelector(".github-star-count");
+        const text = starElement?.textContent ?? "0";
+        const numberText = text.replace(/[^0-9]/g, "") || "0";
+        return parseInt(numberText, 10);
+      });
+
+      logger.info("GitHub star count", { starCount });
+
+      return { starCount };
+    } catch (error) {
+      logger.error("Error during scraping", {
+        error: error instanceof Error ? error.message : String(error),
+      });
+      throw error;
+    } finally {
+      await browser.close();
+    }
+  },
+});
+```
+
+### Testing your task
+
+This task doesn't require a payload, so you can click "Run test" on the Test page in the dashboard. Learn more about [testing tasks](/run-tests).
+
+
+
+## Proxying
+
+If you're using Trigger.dev Cloud with Puppeteer or any other tool to scrape content from websites you don't own, you'll need to proxy your requests. **If you don't, you risk getting our IP address blocked, and we will ban you from our service.**
+
+Here is a list of proxy services we recommend:
+
+- [Browserbase](https://www.browserbase.com/)
+- [Brightdata](https://brightdata.com/)
+- [Browserless](https://browserless.io/)
+- [Oxylabs](https://oxylabs.io/)
+- [ScrapingBee](https://scrapingbee.com/)
+- [Smartproxy](https://smartproxy.com/)
\ No newline at end of file
diff --git a/docs/mint.json b/docs/mint.json
index 187a44f604..63eb479d9d 100644
--- a/docs/mint.json
+++ b/docs/mint.json
@@ -324,6 +324,7 @@
         "examples/ffmpeg-video-processing",
         "examples/open-ai-with-retrying",
         "examples/pdf-to-image",
+        "examples/puppeteer",
         "examples/sharp-image-processing",
         "examples/stripe-webhook",
         "examples/supabase-storage-upload",
diff --git a/docs/snippets/web-scraping-warning.mdx b/docs/snippets/web-scraping-warning.mdx
new file mode 100644
index 0000000000..651d69da22
--- /dev/null
+++ b/docs/snippets/web-scraping-warning.mdx
@@ -0,0 +1,3 @@
+
+  **WEB SCRAPING:** When web scraping, you MUST use a proxy to comply with our terms of service. Scraping third-party websites directly from Trigger.dev Cloud without the site owner's permission is prohibited and will result in account suspension. See [this example](/examples/puppeteer#scrape-content-from-a-web-page), which uses a proxy.
+
\ No newline at end of file
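The scraping task in `docs/examples/puppeteer.mdx` strips every non-digit character from the star label, so an abbreviated count such as `9.2k` would be read as `92`. If the landing page renders abbreviated counts (an assumption; check the live markup before relying on it), a parsing helper along these lines handles the `k`/`M` suffixes. `parseStarCount` is a hypothetical name for illustration, not part of Trigger.dev or Puppeteer:

```typescript
// Hypothetical helper (not part of Trigger.dev or Puppeteer): parses a
// star-count label such as "1,234", "9.2k" or "1.1M" into a number.
// Returns 0 when the label doesn't match the expected format.
export function parseStarCount(label: string): number {
  // Trim surrounding whitespace and drop thousands separators, e.g. " 1,234 " -> "1234"
  const normalized = label.trim().replace(/,/g, "");

  // A number with an optional decimal part, followed by an optional k/M suffix
  const match = normalized.match(/^(\d+(?:\.\d+)?)\s*([kKmM])?$/);
  if (!match) {
    return 0;
  }

  const value = Number.parseFloat(match[1]);
  const suffix = match[2]?.toLowerCase();
  const multiplier = suffix === "k" ? 1_000 : suffix === "m" ? 1_000_000 : 1;

  // Round to absorb any floating-point error from the multiplication
  return Math.round(value * multiplier);
}
```

Because the callback passed to `page.evaluate` runs inside the browser, it can't call functions defined in your task file. Return the raw label from `page.evaluate` instead (for example `document.querySelector(".github-star-count")?.textContent ?? ""`) and pass it to `parseStarCount` in the task itself.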