Commit 8d64ce0

Small fixes to AI Web Scraper (#130)
* docs: update tutorial content, instructions, and styling for AI web scraper guide
  - Modified tutorial text to clarify workflow steps and architecture overview
  - Added import for Steps component and updated content structure
  - Updated descriptions to improve clarity on AI analysis and parallel processing
  - Changed image class for responsiveness and added CSS for better SVG scaling
  - Updated environment setup instructions and prerequisites for clarity
  - Minor formatting and wording improvements for consistency and readability
* docs: update tutorial documentation by removing outdated learning steps
* docs: update tutorial steps for AI web scraper with clearer instructions
  - Rephrased step descriptions for clarity and consistency
  - Changed "Setting up database tables for results" to "Create database tables for storing scraping results"
  - Updated "Creating modular task functions" to "Build modular task functions for web scraping"
  - Clarified "Integrating OpenAI with structured outputs for AI analysis"
  - Rephrased "Building parallel DAG workflows" to "Design parallel DAG workflows for concurrent processing"
  - Changed "Executing flows in Supabase Edge Functions" to "Execute flows using Supabase Edge Functions"
* feat: add JoinCommunity component and integrate it into tutorial pages
  - Created a new JoinCommunity component with Discord link
  - Replaced previous inline Discord links with the new component in multiple docs
  - Improved consistency and maintainability of community links across pages
* docs: update tutorial steps and instructions in backend.md
* docs: add README detailing pgflow functions directory structure and best practices
  - Introduces a comprehensive README for the pgflow functions directory
  - Describes flow and task organization, naming conventions, and error handling
  - Provides guidance on designing modular, reusable, and type-safe functions
  - Includes information on edge function workers and supporting files
  - Enhances documentation to improve maintainability and developer onboarding
* feat: update tutorial documentation with revised backend setup and flow configuration
  - Clarify steps for creating database table, task functions, and pgflow workflow
  - Improve code snippets and instructions for flow compilation and deployment
  - Add notes on parallel execution, retries, and debugging
  - Enhance overall clarity and structure of the tutorial content
* docs: add guide for deleting flow and its data during development
  - Introduced a new documentation page explaining how to completely remove a flow and all associated data
  - Updated related flow management documentation to reference the new delete flow procedure
  - Clarified usage scenarios, warnings, and post-deletion steps for development workflows
  - Enhanced navigation with links to versioning, flow options, and flow code organization guides
* feat: add SQL functions and tests for deleting flows and pruning old records
  Introduce a new SQL function to delete a flow and all associated data, along with unit tests to verify its correctness. Also add a pruning function for old records and instructions for manual installation. Update documentation to include usage examples and warnings about destructive operations. These changes facilitate development workflows and data management during testing and development phases.
* docs: add note about flow immutability and its benefits in the documentation
  - Updated compile-to-sql.md to include a note explaining that flow definitions are immutable
  - Added details on why immutability matters for production stability, audit trails, and consistency
  - Included a collapsible section with further explanation on the implications of immutability
1 parent 2f13e8b commit 8d64ce0

11 files changed: +418 -191 lines changed

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
# pgflow Functions Directory Structure

This directory contains pgflow functions organized according to best practices for maintainability, reusability, and clarity.

## Key Components

### `_flows/` Directory

Contains flow definitions that compose tasks into directed acyclic graphs (DAGs):

- **analyze_website.ts** - Orchestrates website analysis by coordinating scraping, summarization, tagging, and saving tasks

Flows define:

- Execution order
- Parallelism opportunities
- Data dependencies between tasks
- Error handling and retry logic
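
How dependencies determine execution order and parallelism can be sketched in plain TypeScript. This is only an illustration and not the pgflow DSL: the `StepDef` type, the `planWaves` function, and the step slugs are hypothetical names chosen for this example.

```typescript
// Hypothetical step shape for illustration; this is not the pgflow DSL.
type StepDef = { slug: string; dependsOn: string[] };

// Group steps into "waves": every step in a wave has all of its dependencies
// satisfied by earlier waves, so steps within one wave could run in parallel.
function planWaves(steps: StepDef[]): string[][] {
  const done = new Set<string>();
  const waves: string[][] = [];
  let remaining = [...steps];

  while (remaining.length > 0) {
    const ready = remaining.filter((s) => s.dependsOn.every((d) => done.has(d)));
    if (ready.length === 0) {
      throw new Error("Cycle or unknown dependency: the steps do not form a DAG");
    }
    waves.push(ready.map((s) => s.slug));
    for (const s of ready) done.add(s.slug);
    remaining = remaining.filter((s) => !done.has(s.slug));
  }
  return waves;
}

// Step slugs are assumed; they loosely follow the task files in this directory.
const analyzeWebsite: StepDef[] = [
  { slug: "website", dependsOn: [] },
  { slug: "summary", dependsOn: ["website"] },
  { slug: "tags", dependsOn: ["website"] },
  { slug: "saveToDb", dependsOn: ["summary", "tags"] },
];

console.log(planWaves(analyzeWebsite));
// [["website"], ["summary", "tags"], ["saveToDb"]]
```

In this sketch `summary` and `tags` land in the same wave because both depend only on `website`, which is the kind of parallelism opportunity a flow definition expresses.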

### `_tasks/` Directory

Contains small, focused functions that each perform a single unit of work:

- **scrapeWebsite.ts** - Fetches content from a given URL
- **convertToCleanMarkdown.ts** - Converts HTML to clean Markdown format
- **summarizeWithAI.ts** - Uses AI to generate content summaries
- **extractTags.ts** - Extracts relevant tags from content using AI
- **saveWebsite.ts** - Persists website data to the database

Tasks are:

- Modular and reusable across different flows
- Testable in isolation
- Designed with clear inputs and outputs
- JSON-serializable (required by pgflow)
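
As a rough illustration of the last two points, the sketch below shows a task-style function with an explicit input and a plain-data output, plus a structural check for JSON-serializability. The `ScrapeResult` shape and both function names are assumptions made for this example; the real task files may look different.

```typescript
// Hypothetical output shape for a scraping task; the real files may differ.
type ScrapeResult = { url: string; status: number; content: string };

// Checks structurally that a value is plain JSON: null, booleans, strings,
// finite numbers, arrays, and plain objects built from the same.
function isJsonValue(value: unknown): boolean {
  if (value === null) return true;
  switch (typeof value) {
    case "string":
    case "boolean":
      return true;
    case "number":
      return Number.isFinite(value);
    case "object": {
      if (Array.isArray(value)) return value.every((item) => isJsonValue(item));
      const proto = Object.getPrototypeOf(value);
      // Reject Date, Map, class instances, and other non-plain objects.
      if (proto !== Object.prototype && proto !== null) return false;
      return Object.values(value as Record<string, unknown>).every((v) => isJsonValue(v));
    }
    default:
      return false; // undefined, function, symbol, bigint
  }
}

// A task-style function: explicit input, plain-data output.
function buildScrapeResult(url: string, status: number, html: string): ScrapeResult {
  return { url, status, content: html.trim() };
}

const scrapeResult = buildScrapeResult("https://example.com", 200, " <h1>Hi</h1> ");
console.log(isJsonValue(scrapeResult)); // true
console.log(isJsonValue({ fetchedAt: new Date() })); // false: Date is not plain JSON
```

Returning plain objects like `ScrapeResult` rather than `Response` or `Date` instances keeps task outputs safe to store and pass between steps.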

### Edge Function Workers

Each flow has a corresponding edge function worker that executes the flow logic. By convention, workers are numbered (e.g., `analyze_website_worker_0`, `analyze_website_worker_1`) to enable multiple concurrent workers for the same flow.
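
The numbering convention can be expressed as a small helper. This is a sketch only; `workerNames` is a hypothetical function and not part of pgflow.

```typescript
// Build worker names following the "<flow_slug>_worker_<index>" convention
// described above, starting at index 0.
function workerNames(flowSlug: string, count: number): string[] {
  return Array.from({ length: count }, (_, index) => `${flowSlug}_worker_${index}`);
}

console.log(workerNames("analyze_website", 2));
// ["analyze_website_worker_0", "analyze_website_worker_1"]
```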

### Supporting Files

- **utils.ts** - Shared utilities for database connections and common operations
- **database-types.d.ts** - TypeScript type definitions generated from the database schema
- **deno.json** - Configuration for Deno runtime in Edge Functions
- **deno.lock** - Lock file ensuring consistent dependency versions

## Best Practices

1. **Task Design**: Keep tasks focused on a single responsibility
2. **Flow Organization**: Use descriptive names and group related logic
3. **Type Safety**: Leverage TypeScript for flow inputs/outputs
4. **Error Handling**: Configure appropriate retries and timeouts
5. **JSON Serialization**: Ensure all data is JSON-serializable
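
When choosing retry settings (point 4), it can help to see how long a failing task would keep waiting. The sketch below assumes the delay doubles after every failed attempt; that schedule and the option names `maxAttempts` and `baseDelaySeconds` are assumptions of this example, so check the pgflow documentation for the options it actually supports.

```typescript
// Hypothetical option names, used only for this illustration.
type RetryOptions = { maxAttempts: number; baseDelaySeconds: number };

// Delay before each retry, assuming the delay doubles after every failure.
function retryDelays({ maxAttempts, baseDelaySeconds }: RetryOptions): number[] {
  const delays: number[] = [];
  // The first attempt runs immediately; a retry follows each failed attempt
  // except the last one.
  for (let failed = 1; failed < maxAttempts; failed++) {
    delays.push(baseDelaySeconds * 2 ** (failed - 1));
  }
  return delays;
}

const retrySchedule = retryDelays({ maxAttempts: 4, baseDelaySeconds: 5 });
console.log(retrySchedule); // [5, 10, 20]
console.log(retrySchedule.reduce((total, delay) => total + delay, 0)); // 35
```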

For more details on organizing pgflow code, see the documentation at:
https://pgflow.io/how-to/organize-flows-code/
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
/**
 * Deletes a flow and all its associated data.
 * WARNING: This is destructive and should only be used during development.
 *
 * @param flow_slug - The slug of the flow to delete
 */
create or replace function pgflow.delete_flow_and_data(
  flow_slug TEXT
) returns void language plpgsql as $$
BEGIN
  -- Drop queue and archive table
  PERFORM pgmq.drop_queue(delete_flow_and_data.flow_slug);

  -- Delete all associated data in the correct order
  DELETE FROM pgflow.step_tasks WHERE step_tasks.flow_slug = delete_flow_and_data.flow_slug;
  DELETE FROM pgflow.step_states WHERE step_states.flow_slug = delete_flow_and_data.flow_slug;
  DELETE FROM pgflow.runs WHERE runs.flow_slug = delete_flow_and_data.flow_slug;
  DELETE FROM pgflow.deps WHERE deps.flow_slug = delete_flow_and_data.flow_slug;
  DELETE FROM pgflow.steps WHERE steps.flow_slug = delete_flow_and_data.flow_slug;
  DELETE FROM pgflow.flows WHERE flows.flow_slug = delete_flow_and_data.flow_slug;

  RAISE NOTICE 'Flow % and all associated data has been deleted', delete_flow_and_data.flow_slug;
END
$$;
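
The "correct order" in the function above presumably means deleting referencing rows before the rows they reference. The TypeScript sketch below checks an order against a reference map. The map is an assumption about the pgflow schema made for illustration, not something taken from this commit, and `isSafeDeleteOrder` is a hypothetical helper.

```typescript
// Assumed foreign-key layout: table -> tables it references.
// This is a guess at the schema for illustration only.
const references: Record<string, string[]> = {
  step_tasks: ["step_states", "runs", "flows"],
  step_states: ["runs", "steps", "flows"],
  runs: ["flows"],
  deps: ["steps", "flows"],
  steps: ["flows"],
  flows: [],
};

// An order is safe when each table is deleted only after every table that
// references it has already been deleted.
function isSafeDeleteOrder(order: string[], refs: Record<string, string[]>): boolean {
  const deleted = new Set<string>();
  for (const table of order) {
    const stillReferenced = Object.entries(refs).some(
      ([child, parents]) => parents.includes(table) && !deleted.has(child),
    );
    if (stillReferenced) return false;
    deleted.add(table);
  }
  return true;
}

// The order used by the DELETE statements in the function above.
const deleteOrder = ["step_tasks", "step_states", "runs", "deps", "steps", "flows"];
console.log(isSafeDeleteOrder(deleteOrder, references)); // true
console.log(isSafeDeleteOrder([...deleteOrder].reverse(), references)); // false
```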
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
begin;
select plan(9);
select pgflow_tests.reset_db();

-- Load the delete_flow_and_data function
\i _shared/delete_flow_and_data.sql.raw

-- Create test flow with steps and dependencies
select pgflow.create_flow('test_flow_to_delete', max_attempts => 0);
select pgflow.add_step('test_flow_to_delete', 'step1');
select pgflow.add_step('test_flow_to_delete', 'step2', ARRAY['step1']);

-- Start a flow run to generate data
select pgflow.start_flow('test_flow_to_delete', '{}'::jsonb);

-- Test that data exists before deletion
select is(
  (select count(*) from pgflow.flows where flow_slug = 'test_flow_to_delete'),
  1::bigint,
  'Flow should exist before deletion'
);
select is(
  (select count(*) from pgflow.steps where flow_slug = 'test_flow_to_delete'),
  2::bigint,
  'Steps should exist before deletion'
);
select is(
  (select count(*) from pgflow.deps where flow_slug = 'test_flow_to_delete'),
  1::bigint,
  'Dependencies should exist before deletion'
);
select is(
  (select count(*) from pgflow.runs where flow_slug = 'test_flow_to_delete'),
  1::bigint,
  'Run should exist before deletion'
);
select is(
  (select count(*) from pgflow.step_states where flow_slug = 'test_flow_to_delete'),
  2::bigint,
  'Step states should exist before deletion'
);

-- Execute the delete function
select pgflow.delete_flow_and_data('test_flow_to_delete');

-- Test that all data has been deleted
select is(
  (select count(*) from pgflow.flows where flow_slug = 'test_flow_to_delete'),
  0::bigint,
  'Flow should be deleted'
);
select is(
  (select count(*) from pgflow.steps where flow_slug = 'test_flow_to_delete'),
  0::bigint,
  'Steps should be deleted'
);
select is(
  (select count(*) from pgflow.runs where flow_slug = 'test_flow_to_delete'),
  0::bigint,
  'Runs should be deleted'
);
select is(
  (select count(*) from pgflow.step_states where flow_slug = 'test_flow_to_delete'),
  0::bigint,
  'Step states should be deleted'
);

select finish();
rollback;
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
---
import { Card } from '@astrojs/starlight/components';
---

<h2>Join the Community</h2>

<Card title="Connect on Discord" icon="discord">
  Have questions or need help? pgflow is just getting started - join us on Discord to ask questions, share feedback, or discuss partnership opportunities.

  <h5><strong><a href="https://discord.com/invite/NpffdEyb">Join Discord →</a></strong></h5>
</Card>

pkgs/website/src/content/docs/getting-started/compile-to-sql.mdx

Lines changed: 23 additions & 0 deletions
@@ -90,3 +90,26 @@ If successful, you should see output like:
 ```
 Applying migration 20250505120000_create_greet_user_flow.sql...done
 ```
+
+:::note[Flow definitions are immutable]
+Once a flow is registered in the database, its structure cannot be modified. To change a flow, you can either [delete it](/how-to/delete-flow-and-data/) (development only) or use [versioning](/how-to/version-your-flows/).
+
+<details>
+<summary>Why immutability matters</summary>
+
+<br/>
+
+**What does immutability mean?**
+
+Once a flow is registered, you cannot modify its structure - no adding/removing steps, changing dependencies, or renaming steps.
+
+**Why this protects production:**
+
+- Running workflows continue safely through deployments
+- Historical runs remain intact with their original structure
+- Flows behave consistently from start to finish
+- Complete audit trails are preserved
+
+This ensures your production workflows are stable, predictable, and never break mid-execution.
+</details>
+:::
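
To make "structure" concrete, the following TypeScript sketch compares two simplified flow shapes and reports whether they differ in steps or dependencies, which is the kind of change the note says requires deleting or versioning the flow. `FlowShape` and `hasStructuralChange` are hypothetical names for this illustration and are not pgflow APIs.

```typescript
// Hypothetical, simplified view of a flow: step slug -> slugs it depends on.
type FlowShape = Record<string, string[]>;

// True when the two shapes differ in their steps or in any step's
// dependencies. Renaming a step shows up as one removal plus one addition.
function hasStructuralChange(registered: FlowShape, next: FlowShape): boolean {
  const before = Object.keys(registered).sort();
  const after = Object.keys(next).sort();
  if (before.length !== after.length || before.some((slug, i) => slug !== after[i])) {
    return true;
  }
  return before.some((slug) => {
    const oldDeps = [...registered[slug]].sort();
    const newDeps = [...next[slug]].sort();
    return oldDeps.length !== newDeps.length || oldDeps.some((dep, i) => dep !== newDeps[i]);
  });
}

const registeredShape: FlowShape = { website: [], summary: ["website"] };
const withExtraStep: FlowShape = { website: [], summary: ["website"], tags: ["website"] };

console.log(hasStructuralChange(registeredShape, registeredShape)); // false
console.log(hasStructuralChange(registeredShape, withExtraStep)); // true
```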
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
---
title: Delete Flow and its Data
description: Completely remove a flow and all its data during development
sidebar:
  order: 55
---

import { Aside, Code } from "@astrojs/starlight/components";
import deleteFlowFunctionCode from '../../../../../core/supabase/tests/_shared/delete_flow_and_data.sql.raw?raw';

During development, you may want to completely remove a flow and all its associated data to start fresh. This operation is destructive and should **never be used in production**.

<Aside type="danger" title="Data Loss Warning">
This permanently deletes all flow data including:
- All run history
- All queued and archived messages
- All task outputs
- All flow definitions
</Aside>

## When to Use This

This approach is useful when:
- You need to make breaking changes during development
- You want to clean up test data
- You're iterating on flow structure and need a fresh start

For production environments, always use [versioned flows](/how-to/version-your-flows/) instead (e.g., `my_flow_v1`, `my_flow_v2`) to safely deploy new versions while maintaining complete flow history.
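
The `_vN` suffix convention can be automated with a small helper. The sketch below is hypothetical (`nextVersionSlug` is not part of pgflow) and assumes that a slug without a version suffix counts as version 1.

```typescript
// Return the slug for the next version, following the "_vN" suffix
// convention. A slug without a suffix is treated as version 1 (an
// assumption of this sketch).
function nextVersionSlug(slug: string): string {
  const match = /^(.*)_v(\d+)$/.exec(slug);
  if (!match) return `${slug}_v2`;
  return `${match[1]}_v${Number(match[2]) + 1}`;
}

console.log(nextVersionSlug("my_flow_v1")); // "my_flow_v2"
console.log(nextVersionSlug("my_flow_v9")); // "my_flow_v10"
console.log(nextVersionSlug("my_flow")); // "my_flow_v2"
```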

## Using the Delete Function

pgflow includes a delete function that accepts a flow slug parameter:

```sql
pgflow.delete_flow_and_data(flow_slug TEXT)
```

Example:
```sql
-- Delete a specific flow
SELECT pgflow.delete_flow_and_data('your_flow_slug');
```
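
If you call the function from a development script rather than from psql, a guard against running it in production may be worthwhile. The TypeScript sketch below only builds a parameterized query object and does not talk to a database. `buildDeleteFlowQuery`, the `environment` argument, and the slug pattern are assumptions of this example; the `{ text, values }` shape mirrors what clients such as node-postgres accept.

```typescript
type Query = { text: string; values: string[] };

// Build a parameterized call to pgflow.delete_flow_and_data, refusing to do
// so when the caller reports a production environment.
function buildDeleteFlowQuery(flowSlug: string, environment: string): Query {
  if (environment === "production") {
    throw new Error("Refusing to delete a flow in production");
  }
  // Assumed slug format: letters, digits, and underscores, not starting with a digit.
  if (!/^[a-zA-Z_][a-zA-Z0-9_]*$/.test(flowSlug)) {
    throw new Error(`Unexpected flow slug: ${flowSlug}`);
  }
  return { text: "SELECT pgflow.delete_flow_and_data($1)", values: [flowSlug] };
}

console.log(buildDeleteFlowQuery("your_flow_slug", "development"));
// { text: "SELECT pgflow.delete_flow_and_data($1)", values: ["your_flow_slug"] }
```

Passing the slug as a parameter instead of concatenating it into the SQL string avoids quoting problems.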

## Installing the Function

To install this function, run the following SQL directly in your database using psql or Supabase Studio. This approach helps prevent accidentally deploying this destructive function to production.

<Aside type="caution">
This function is not yet included in the default pgflow migrations since pgflow is still in early development and I'm learning how users interact with it. It has been thoroughly unit tested, but you'll need to manually add it to your project as shown in the <a href="#the-delete-function">Delete Function</a> section below.
</Aside>

## The Delete Function

Run this SQL to install the delete function:

<Code lang="sql" code={deleteFlowFunctionCode} />

## After Deleting

Once you've deleted the flow:

1. You can compile and deploy a fresh version without conflicts
2. The flow slug becomes available for reuse
3. All historical data is permanently lost

## See Also

- [Version your flows](/how-to/version-your-flows/) - Safe flow updates for production
- [Update flow options](/how-to/update-flow-options/) - Non-breaking configuration changes

pkgs/website/src/content/docs/how-to/prune-old-records.mdx

Lines changed: 2 additions & 3 deletions
@@ -59,8 +59,7 @@ Make sure to adjust intervals and time of pruning based on size of your tables a
 
 ### 1. Install the pruning function
 
-Create a new migration and paste [contents of the pruning function](#the-pruning-function),
-then run `supabase migrations up`.
+To install this function, run the [pruning function SQL](#the-pruning-function) directly in your database using psql or Supabase Studio.
 
 ### 2. Setup pg_cron schedule
 
@@ -84,6 +83,6 @@ SELECT * FROM cron.job;
 
 ## The Pruning Function
 
-Add this SQL function to a migration file:
+Run this SQL to install the pruning function:
 
 <Code lang="sql" code={pruningFunctionCode} />

pkgs/website/src/content/docs/how-to/version-your-flows.mdx

Lines changed: 21 additions & 0 deletions
@@ -5,6 +5,8 @@ sidebar:
   order: 50
 ---
 
+import { CardGrid, LinkCard } from '@astrojs/starlight/components';
+
 ## Current Compilation Limitations
 
 **Important:** The current version of pgflow's compiler has several limitations:
@@ -65,3 +67,22 @@ We've created a detailed guide on [how to update flow options](/how-to/update-fl
 - Best practices for maintaining compatibility
 
 For any non-breaking changes to existing flows, refer to this guide rather than recompiling.
+
+## Development Workarounds
+
+During development, if you need to make breaking changes to a flow, you can [delete the flow and its data](/how-to/delete-flow-and-data/) entirely and start fresh. This approach deletes all flow data and should never be used in production.
+
+## See Also
+
+<CardGrid>
+  <LinkCard
+    title="Delete flow and its data"
+    description="Remove a flow completely during development"
+    href="/how-to/delete-flow-and-data/"
+  />
+  <LinkCard
+    title="Update flow options"
+    description="Non-breaking configuration changes"
+    href="/how-to/update-flow-options/"
+  />
+</CardGrid>
