-
Notifications
You must be signed in to change notification settings - Fork 56
feat(slackBotMessageIngestion): Added option to include bot message #1097
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…uring channel ingestion
Summary of ChangesHello @naSim087, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a new feature that allows users to explicitly include bot messages when performing manual Slack channel ingestion. Previously, only user-generated messages were ingested. This change provides a UI option and updates the backend logic to correctly parse and store bot messages, enhancing the completeness of ingested Slack data. Additionally, a new script has been added to manage the re-ingestion of existing channels, demonstrating the flexibility of the updated ingestion pipeline. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds an end-to-end includeBotMessage flag: frontend form field and checkbox, API payload forwarding, types/schema update, ingestion pipeline changes to process and insert bot messages, and a new Vespa-driven Slack channel re-ingestion script with batching, continuation, concurrency, and stats. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Admin as Admin User
participant UI as Frontend ManualIngestionForm
participant API as IngestMoreChannelApi
participant Ingest as handleSlackChannelIngestion
participant Slack as Slack API
Admin->>UI: Toggle "Include Bot Messages" + Submit
UI->>API: POST { ..., includeBotMessage }
API->>Ingest: handleSlackChannelIngestion(..., includeBotMessages)
Ingest->>Slack: Fetch channel messages
alt includeBotMessages = true
Ingest->>Ingest: extractBotMessageText, processBotMessageMentions, apply bot metadata
end
Ingest->>Ingest: insert messages (user messages + optional bot messages)
Ingest-->>API: Done
API-->>UI: Response
sequenceDiagram
autonumber
participant Script as runSlackChannelProcessing
participant Vespa as SlackContainerVespaClient
participant DB as DB (Users/Connectors)
participant Ingest as handleSlackChannelIngestion
Script->>DB: getUsersWithSlackSyncJobs(appId)
loop batches
Script->>Vespa: fetch batch (continuation)
Vespa-->>Script: docs[], continuationToken?
loop per doc
Script->>DB: getSlackConnectorForUser(email)
alt valid user+connector
Script->>Ingest: handleSlackChannelIngestion(connId, [channel], ..., includeBotMessages?)
Ingest-->>Script: status
else no valid user
Script-->>Script: record skipped
end
end
Script->>Script: update stats, store continuation
end
Script-->>Script: final stats
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧬 Code graph analysis (1)server/integrations/slack/channelIngest.ts (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds a new feature to include bot messages during Slack channel ingestion. The changes span the frontend, API, and backend ingestion logic. The core logic in channelIngest.ts
has been updated to handle bot messages, but this has introduced significant code duplication and a potential performance issue. Additionally, a new script for re-ingesting channels has been added, which contains some minor issues like using console.log
and not supporting the new feature. My review focuses on improving the maintainability and performance of the new ingestion logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
server/integrations/slack/channelIngest.ts (1)
640-745
: Bot replies: same gating and missing user issues as parent messagesApply analogous fixes for replies:
-const isBotReply = - includeBotMessages && - reply.type === "message" && - reply.bot_id && - reply.text != "" +const hasTextOrBlocksReply = + (reply.text && reply.text.trim().length > 0) || + (Array.isArray(reply.blocks) && reply.blocks.length > 0) +const isBotReply = + includeBotMessages && + reply.type === "message" && + reply.bot_id && + hasTextOrBlocksReply @@ - reply.team = await getTeam(client, reply) + if (!reply.team) { + try { + const t = await safeGetTeamInfo(client) + reply.team = t.id! + } catch {} + } @@ if (isBotReply) { @@ - reply.client_msg_id = reply.client_msg_id || customBotId + reply.client_msg_id = reply.client_msg_id || customBotId + reply.user = reply.user || (reply.bot_profile?.id as any) || reply.bot_id @@ - reply.text = finalBotText + reply.text = finalBotText || reply.text || "(no text)"server/api/admin.ts (1)
1501-1506
: Change ingestMoreChannelSchema connectorId to z.coerce.number()
ingestMoreChannelSchema (server/types.ts:558) usesz.number()
but IngestMoreChannelApi receivesconnectorId
as a string and parses it after validation—this will 400 on string input. Update toz.coerce.number()
or accept/transform a string before validation.
Optional: renameincludeBotMessage
→includeBotMessages
for consistency.
🧹 Nitpick comments (5)
server/types.ts (1)
558-563
: Add includeBotMessage: good, but align naming and connectorId type across layers
- Field addition LGTM and matches frontend payload.
- Naming is singular here but backend code paths use includeBotMessages (plural). Align to one name to avoid future confusion.
- Also, admin API and frontend send connectorId as string; this schema requires a number. Prefer coercion to avoid client breakage.
Suggested updates:
export const ingestMoreChannelSchema = z.object({ - connectorId: z.number(), + connectorId: z.coerce.number(), channelsToIngest: z.array(z.string()), startDate: z.string(), endDate: z.string(), - includeBotMessage: z.boolean().optional().default(false), + includeBotMessage: z.boolean().optional().default(false), })Optionally rename consistently:
- includeBotMessage: z.boolean().optional().default(false), + includeBotMessages: z.boolean().optional().default(false),server/scripts/slackChannelProcessing.ts (3)
118-121
: Debug console.log in server pathconsole.log noise in server-side script; prefer structured logger to keep logs consistent.
- const data = await response.json() - console.log(data) + const data = await response.json() + logger.debug({ docCount: data.documents?.length ?? 0 }, "Visit response")Also applies to: 129-133
231-239
: Bot messages not enabled in re-ingestion scriptYou always pass includeBotMessages: false to handleSlackChannelIngestion. If the intent is to support bot messages in batch re-ingestion, make it configurable via options.
- await handleSlackChannelIngestion( + await handleSlackChannelIngestion( connectorId, [docId], "", "", selectedUser, - false + options?.includeBotMessages ?? false )
18-26
: Unused config/options fields
- Stats field channelsAlreadyProcessed and option skipProcessed are never used.
- Either implement or remove to reduce confusion.
Also applies to: 258-269
server/integrations/slack/channelIngest.ts (1)
350-408
: processBotMessageMentions: OK, but be mindful of rate limitsLogic is fine and memoizes users; sequential users.info calls may hit rate limits for many mentions. Consider batching or p-limit if this becomes hot.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
frontend/src/routes/_authenticated/admin/integrations/slack.tsx
(5 hunks)server/api/admin.ts
(2 hunks)server/integrations/slack/channelIngest.ts
(7 hunks)server/scripts/slackChannelProcessing.ts
(1 hunks)server/types.ts
(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
server/integrations/slack/channelIngest.ts (1)
server/integrations/slack/index.ts (2)
getTeam
(680-701)insertChatMessage
(764-836)
frontend/src/routes/_authenticated/admin/integrations/slack.tsx (1)
frontend/src/components/ui/label.tsx (1)
Label
(24-24)
server/scripts/slackChannelProcessing.ts (6)
server/logger/index.ts (2)
getLogger
(36-93)Subsystem
(15-15)server/config.ts (2)
NAMESPACE
(20-20)CLUSTER
(21-21)server/types.ts (1)
TxnOrClient
(319-319)server/db/schema/syncJobs.ts (1)
syncJobs
(27-59)server/db/connector.ts (1)
getConnectorByAppAndEmailId
(449-477)server/integrations/slack/channelIngest.ts (1)
handleSlackChannelIngestion
(1115-1376)
🔇 Additional comments (8)
server/types.ts (1)
23-26
: Enum string normalization looks fineNo behavior change; consistent quoting.
frontend/src/routes/_authenticated/admin/integrations/slack.tsx (4)
176-181
: ManualIngestionFormData updatedType addition LGTM and matches server/types default.
1131-1133
: Default form valuesincludeBotMessage: false default aligns with schema default. Good.
1164-1181
: Payload wiringincludeBotMessage forwarded correctly in both admin and non-admin branches.
If the server route is validated with zod, ensure connectorId accepts strings or coerce to number server-side (see server/types.ts suggestion).
1261-1281
: UI checkboxCheckbox is bound and labelled correctly with accessible htmlFor/checked/onChange wiring. LGTM.
server/integrations/slack/channelIngest.ts (2)
412-442
: extractBotMessageTextReasonable aggregation of block text. LGTM.
458-459
: Threading includeBotMessages through insertChannelMessages and usagePlumbing looks correct. After above fixes, bot messages should ingest reliably when flag is true.
Also applies to: 1312-1313
server/scripts/slackChannelProcessing.ts (1)
70-98
: Selection and concurrency are valid on/document/v1/.../docid
Per Vespa API reference, bothselection
andconcurrency
are supported query parameters for visiting documents.Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (5)
server/integrations/slack/channelIngest.ts (3)
350-407
: Avoid sequential users.info calls in mention processing; fetch concurrently and replace via regexCurrent loop fetches user info per mention sequentially → slow and rate-limit prone. Batch missing user IDs and fetch concurrently (limited), then do a single regex replace.
export async function processBotMessageMentions( @@ - // Find all user mentions - const mentions = text.match(mentionRegex) - if (!mentions) return processedText - - for (const mention of mentions) { - const userId = mention.slice(2, -1) // Remove <@ and > - // Check if user is already in memberMap - let user = memberMap.get(userId) - // If not in map, fetch user info - if (!user) { - try { - const userResponse = await client.users.info({ user: userId }) - if (userResponse.ok && userResponse.user) { - user = userResponse.user as User - memberMap.set(userId, user) - } - } catch (error) { - // If fetching fails, continue with original mention - continue - } - } - if (user) { - let replacementName: string - // Check if this is a bot user - if (user.is_bot) { - // For bots, use the bot's name or real_name - replacementName = - user.profile?.real_name || - user.real_name || - user.name || - "Unknown Bot" - } else { - // For regular users, use display_name or name - replacementName = - user.profile?.display_name || user.name || "Unknown User" - } - // Replace the mention with @username format - processedText = processedText.replace(mention, `@${replacementName}`) - } - } + const mentions = text.match(mentionRegex) + if (!mentions) return processedText + + // Fetch missing users concurrently with a small limit + const userIds = [...new Set(mentions.map((m) => m.slice(2, -1)))] + const toFetch = userIds.filter((id) => !memberMap.has(id)) + if (toFetch.length) { + const limit = pLimit(5) + await Promise.all( + toFetch.map((userId) => + limit(async () => { + try { + const res = await client.users.info({ user: userId }) + if (res.ok && res.user) memberMap.set(userId, res.user as User) + } catch {} + }), + ), + ) + } + + // Replace mentions using the map + processedText = processedText.replace(mentionRegex, (_match, userId) => { + const user = memberMap.get(userId) + if (!user) return _match + const replacementName = user.is_bot + ? user.profile?.real_name || + user.real_name || + user.name || + "Unknown Bot" + : user.profile?.display_name || user.name || "Unknown User" + return `@${replacementName}` + }) @@ return processedText
412-442
: Remove @ts-ignore; add type-safe block parsing (include rich_text)Avoid @ts-ignore and cover common bot block shapes.
export function extractBotMessageText(message: SlackMessage): string { let combinedText = "" @@ - if (message.blocks) { - for (const block of message.blocks) { - // @ts-ignore - if (block.type === "section" && block.text?.text) { - // @ts-ignore - combinedText += block.text.text + "\n" - } - // @ts-ignore - if (block.type === "section" && block.fields) { - // @ts-ignore - for (const field of block.fields) { - if (field.text) { - combinedText += field.text + "\n" - } - } - } - } - } + if (Array.isArray(message.blocks)) { + for (const block of message.blocks as Array< + | { type: "section"; text?: { text?: string }; fields?: Array<{ text?: string }> } + | { type: "rich_text"; elements?: any[] } + | { type: string } + >) { + if (block.type === "section") { + const section = block + if (section.text?.text) combinedText += section.text.text + "\n" + if (Array.isArray(section.fields)) { + for (const f of section.fields) { + if (f?.text) combinedText += f.text + "\n" + } + } + } else if (block.type === "rich_text" && Array.isArray(block.elements)) { + for (const el of block.elements) { + if (Array.isArray(el?.elements)) { + for (const child of el.elements) { + if (child?.type === "text" && child?.text) { + combinedText += child.text + "\n" + } + } + } + } + } + } + }
516-605
: Reduce duplication: factor shared “process and insert message” logicParent vs reply paths for bot/regular are nearly identical. Extract a helper (e.g., processAndInsertMessage(msg, { isBot, channelId, client, memberMap, channelMap })) and reuse in both loops to cut drift and complexity.
Also applies to: 646-756
server/scripts/slackChannelProcessing.ts (2)
28-33
: Propagate includeBotMessages to re-ingestion (configurable, not hardcoded false)Wire an options.includeBotMessages through to handleSlackChannelIngestion.
export interface SlackChannelProcessingOptions { batchSize?: number concurrency?: number skipProcessed?: boolean initialContinuationToken?: string + includeBotMessages?: boolean } @@ export async function processSlackChannels( app: Apps = Apps.Slack, options: SlackChannelProcessingOptions = {}, ): Promise<SlackChannelProcessingStats> { const { batchSize = 100, concurrency = 3, // Lower concurrency to avoid overwhelming Slack API skipProcessed = true, initialContinuationToken, + includeBotMessages = false, } = options @@ -async function processSlackChannel( - doc: any, - validUsers: Set<string>, -): Promise<{ +async function processSlackChannel( + doc: any, + validUsers: Set<string>, + includeBotMessages = false, +): Promise<{ @@ - await handleSlackChannelIngestion( + await handleSlackChannelIngestion( connectorId, [docId], // Array with single channel ID "", // No start date - full ingestion "", // No end date - full ingestion selectedUser, - false, + includeBotMessages, ) @@ - const results = await Promise.allSettled( - slackChannels.map((doc: any) => - processLimit(() => processSlackChannel(doc, validUsers)), - ), - ) + const results = await Promise.allSettled( + slackChannels.map((doc: any) => + processLimit(() => + processSlackChannel(doc, validUsers, includeBotMessages), + ), + ), + ) @@ const stats = await processSlackChannels(Apps.Slack, { batchSize: 50, // Smaller batches for Slack to avoid rate limits concurrency: 2, // Lower concurrency for Slack API skipProcessed: false, // We want to re-process all channels initialContinuationToken: continuationToken, + includeBotMessages: false, // TODO: make CLI/param-configurable })Also applies to: 256-266, 188-197, 229-237, 369-373, 443-448
4-4
: Use server/shared enums and avoid string literal entity filter
- Import Apps (and SlackEntity) from shared types to match DB values; keep chatContainerSchema from vespa types.
- Replace "channel" literal with SlackEntity.Channel.
-import { Apps, chatContainerSchema } from "@xyne/vespa-ts/types" +import { chatContainerSchema } from "@xyne/vespa-ts/types" +import { Apps, SlackEntity } from "@/shared/types" @@ - const slackChannels = documents.filter( - (doc) => - doc?.fields?.app === Apps.Slack && doc?.fields?.entity === "channel", - ) + const slackChannels = documents.filter( + (doc) => + doc?.fields?.app === Apps.Slack && + doc?.fields?.entity === SlackEntity.Channel, + )Also applies to: 339-342
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
server/integrations/slack/channelIngest.ts
(7 hunks)server/scripts/slackChannelProcessing.ts
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
server/scripts/slackChannelProcessing.ts (6)
server/logger/index.ts (2)
getLogger
(36-93)Subsystem
(15-15)server/config.ts (2)
NAMESPACE
(20-20)CLUSTER
(21-21)server/types.ts (1)
TxnOrClient
(319-319)server/db/schema/syncJobs.ts (1)
syncJobs
(27-59)server/db/connector.ts (1)
getConnectorByAppAndEmailId
(449-477)server/integrations/slack/channelIngest.ts (1)
handleSlackChannelIngestion
(1126-1387)
server/integrations/slack/channelIngest.ts (1)
server/integrations/slack/index.ts (3)
getTeam
(680-701)formatSlackSpecialMentions
(271-298)insertChatMessage
(764-836)
# [3.14.0](v3.13.8...v3.14.0) (2025-10-13) ### Features * **slackBotMessageIngestion:** Added option to include bot message ([#1097](#1097)) ([a2e4746](a2e4746))
Description
add option for user to include the bot message during the channel ingestion.
Testing
Additional Notes
Summary by CodeRabbit