-
Notifications
You must be signed in to change notification settings - Fork 56
Feat/microsoft-sharepoint: microsoft sharepoint documents ingestion and sync #969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds Microsoft delegated and app‑only (service‑account) OAuth flows: frontend authType branching and model changes; new admin API to register service credentials; DB connector token-refresh helpers; Microsoft Graph client and helpers; SharePoint discovery, delta ingestion/sync; retry/auth plumbing, types, queue wiring, and Azure dependency. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Admin as Admin User
participant FE as Frontend (Microsoft Tab)
participant API as Server API
participant Auth as CustomServiceAuthProvider
participant DB as DB (connectors)
participant Ing as Ingestion Worker
Admin->>FE: choose authType=appOnly, enter tenantId+creds, submit
FE->>API: POST /admin/microsoft/service_account {tenantId, clientId, clientSecret}
API->>Auth: getAccessTokenWithExpiry()
Auth-->>API: {access_token, expires_at}
API->>DB: create connector (ServiceAccount) + store encrypted creds
API->>Ing: trigger handleMicrosoftServiceAccountIngestion (async)
API-->>FE: 200 {connectorId, expires_at}
FE->>FE: setOAuthIntegrationStatus(OAuthConnecting)
sequenceDiagram
autonumber
participant Q as Sync Queue
participant MS as handleMicrosoftServiceAccountChanges
participant DB as DB (connectors + tokens)
participant GC as MicrosoftGraphClient
participant SP as Microsoft Graph / SharePoint
participant Vespa as Index
Q->>MS: run()
MS->>DB: load service-account connectors + creds
MS->>GC: create/update graph client (tenantId, creds)
MS->>SP: discover sites & drives
loop per drive
MS->>SP: GET delta changes
alt deleted
MS->>Vespa: delete document
else added/updated
MS->>SP: download file & permissions
MS->>Vespa: upsert document/chunks/ACLs
end
MS->>DB: persist deltaLinks + lastSyncedAt
end
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120+ minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @Sithaarth24, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly expands Microsoft integration capabilities by introducing comprehensive ingestion and synchronization for SharePoint documents. It provides flexibility in authentication through both delegated OAuth and app-only service accounts, complete with a new frontend configuration interface. The backend now manages service account credentials, performs initial data ingestion from SharePoint sites and drives, and maintains an up-to-date index using a delta synchronization process. This ensures that SharePoint content is discoverable and current within the system. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a significant new feature for Microsoft SharePoint document ingestion and synchronization using service account authentication. The changes are extensive, touching the frontend UI for configuration, backend API endpoints, database schemas, and the core integration logic. Overall, the implementation is well-structured, with clear separation of concerns for SharePoint site/drive discovery and file processing. However, I've identified a couple of critical issues related to token persistence and leftover test code that must be addressed before merging. I've also included some medium-severity suggestions to improve type safety and correct minor logic errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
server/integrations/microsoft/index.ts (1)
1082-1089
: UndefineddownloadDir
variable.The
downloadDir
variable is used but not defined anywhere in the file.Either define
downloadDir
or remove this development-only code:-if (process.env.NODE_ENV !== "production") { - const init = () => { - if (!fs.existsSync(downloadDir)) { - fs.mkdirSync(downloadDir, { recursive: true }) - } - } - init() -}If this is needed for development, define the constant:
+const downloadDir = path.join(__dirname, 'downloads') + if (process.env.NODE_ENV !== "production") { const init = () => { if (!fs.existsSync(downloadDir)) { fs.mkdirSync(downloadDir, { recursive: true }) } } init() }server/integrations/microsoft/client.ts (5)
22-37
: Fix stale token exposure in getCurrentTokens (captures initial values).
getCurrentTokens()
closes over the constructor args, so it won’t reflect tokens updated viaupdateMicrosoftGraphClient
. Also, there’s no place to store an updated expiry.Apply this diff to add a mutable
expiresAt
field on the client shape:export interface MicrosoftGraphClient { client: Client accessToken: string refreshToken?: string // Only for delegated clients tenantId?: string // Only for service clients clientId: string clientSecret: string betaClient: Client + expiresAt?: Date // Helper methods to get updated tokens after refresh getCurrentTokens(): { accessToken: string refreshToken?: string expiresAt?: Date } }
Then update the factory and updater (see comments on Lines 39-57 and 79-94).
59-96
: Ensure getCurrentTokens reflects current client state.Return current fields instead of closed-over variables; also surface
expiresAt
on the object.Apply this diff:
export const createMicrosoftGraphClient = ( accessToken: string, clientId: string, clientSecret: string, refreshToken?: string, tenantId?: string, tokenExpiresAt?: Date, ): MicrosoftGraphClient => { const authProvider = new CustomAuthProvider(accessToken) const client = Client.initWithMiddleware({ authProvider, defaultVersion: "v1.0", }) const betaClient = Client.initWithMiddleware({ authProvider, defaultVersion: "beta", }) return { client, accessToken, refreshToken, tenantId, clientId, clientSecret, betaClient, + expiresAt: tokenExpiresAt, getCurrentTokens() { return { - accessToken, - refreshToken, - expiresAt: tokenExpiresAt, + accessToken: this.accessToken, + refreshToken: this.refreshToken, + expiresAt: this.expiresAt, } }, } }
113-121
: Graph SDK get() does not take arguments; dropoptions
from Beta call.Passing
options
to.get(options)
is not supported by@microsoft/microsoft-graph-client
.Apply this diff:
export const makeBetaGraphApiCall = async ( graphClient: MicrosoftGraphClient, endpoint: string, - options?: any, ): Promise<any> => { return retryWithBackoff( async () => { - const result = await graphClient.betaClient.api(endpoint).get(options) + const result = await graphClient.betaClient.api(endpoint).get() return result }, `Making Microsoft Graph API call to ${endpoint}`, Apps.MicrosoftDrive, 1, graphClient, ) }Also applies to: 123-127
129-152
: get() doesn’t accept args; alsooptions
is unused.Use headers via
.header()
as you do, and call.get()
without parameters. Consider removing theoptions
param from the signature.Apply this diff:
export const makeGraphApiCallWithHeaders = async ( graphClient: MicrosoftGraphClient, endpoint: string, headers: Record<string, string>, - options?: any, ): Promise<any> => { return retryWithBackoff( async () => { const request = graphClient.client.api(endpoint) // Add custom headers Object.entries(headers).forEach(([key, value]) => { request.header(key, value) }) - const result = await request.get(options) + const result = await request.get() return result }, `Making Microsoft Graph API call to ${endpoint} with headers`, Apps.MicrosoftDrive, 1, graphClient, ) }
154-181
: Use full nextLink URL directly; avoid re-parsing the path (version duplication risk).Feeding
url.pathname + url.search
to.api()
can duplicate/v1.0
//beta
or drop host-specific routing.Apply this diff:
while (nextLink) { const response: any = await retryWithBackoff( async () => { - if (nextLink!.startsWith("http")) { - // This is a full URL from @odata.nextLink - const url = new URL(nextLink!) - const path = url.pathname + url.search - return await graphClient.client.api(path).get() - } else { - // This is a relative path - return await graphClient.client.api(nextLink!).get(options) - } + if (nextLink!.startsWith("http")) { + // Use absolute nextLink as-is + return await graphClient.client.api(nextLink!).get() + } + // Relative path + return await graphClient.client.api(nextLink!).get(options) }, `Making paginated Microsoft Graph API call to ${nextLink}`, Apps.MicrosoftDrive, 1, graphClient, )
🧹 Nitpick comments (21)
package.json (1)
2-4
: Avoid duplicate deps at repo root; keep runtime deps scoped to server packageHaving @azure/identity at the monorepo root while also adding it under server/package.json risks version drift/hoisting quirks. Prefer declaring runtime deps only in the server package (or set up proper workspaces).
Suggested change:
-{ - "dependencies": { - "@azure/identity": "^4.12.0" - } -} +{ + "private": true +}If you’re using workspaces, add a root "workspaces" key and let the server package own this dependency.
server/db/connector.ts (3)
39-41
: Remove test-only/unnecessary imports from a runtime DB moduleImporting from node:test (todo) and zod’s date symbol here is unnecessary and can cause bundling/runtime surprises.
Apply this diff:
import { CustomServiceAuthProvider } from "@/integrations/microsoft/utils" -import { date } from "zod" -import { todo } from "node:test"
322-326
: Error type/message mismatch for service-account connectorsThrowing NoOauthConnectorFound for a ServiceAccount connector is confusing and can mislead ops/alerts.
- Introduce a NoServiceAccountConnectorFound error (or generalize the name), and use a message that mentions ServiceAccount.
358-365
: Defensive logging is fineThe unsupported app branch is OK; consider reusing the same error wording in both OAuth and ServiceAccount paths for consistency.
frontend/src/routes/_authenticated/admin/integrations/microsoft.tsx (1)
48-49
: Remove debug logStray console.log in submit path.
- console.log(isServiceAuth)
server/queue/index.ts (1)
218-225
: Run service-account change handlers in parallel to avoid head-of-line blockingCurrently executed sequentially; if one integration is slow it delays the other.
Apply this change:
- await handleGoogleServiceAccountChanges(boss, job) - await handleMicrosoftServiceAccountChanges() + await Promise.allSettled([ + handleGoogleServiceAccountChanges(boss, job), + handleMicrosoftServiceAccountChanges(), + ])server/server.ts (1)
1559-1562
: Queue init at startup: ensure idempotency and single ownerConfirm initQueue() won’t be executed by multiple processes (e.g., multiple Bun instances) leading to duplicate schedules or boss.start() conflicts. If you horizontally scale, consider leader election or a configuration gate.
If you run multiple server instances, ensure only one runs initQueue(). Otherwise, move queue to a dedicated worker process.
server/api/admin.ts (2)
112-117
: Remove unused import.The
todo
import fromnode:test
on line 112 appears to be unused and should be removed.-import { todo } from "node:test" -import { ClientSecretCredential } from "@azure/identity" -import { Client as GraphClient } from "@microsoft/microsoft-graph-client" -import type { AuthenticationProvider } from "@microsoft/microsoft-graph-client" -import { handleMicrosoftServiceAccountIngestion } from "@/integrations/microsoft" -import { CustomServiceAuthProvider } from "@/integrations/microsoft/utils" +import { ClientSecretCredential } from "@azure/identity" +import { Client as GraphClient } from "@microsoft/microsoft-graph-client" +import type { AuthenticationProvider } from "@microsoft/microsoft-graph-client" +import { handleMicrosoftServiceAccountIngestion } from "@/integrations/microsoft" +import { CustomServiceAuthProvider } from "@/integrations/microsoft/utils"
442-534
: Consider improving error handling consistency and adding input validation.The function has good error handling overall, but could benefit from more specific error types and additional validation for the App type.
- Add validation to ensure the app type is supported for service account authentication:
const app = Apps.MicrosoftDrive + + // Validate that the app supports service account authentication + if (!IsMicrosoftApp(app)) { + throw new HTTPException(400, { + message: "Application type does not support service account authentication", + }) + }
- Consider using a more specific error type for authentication failures:
try { const authProvider = new CustomServiceAuthProvider( tenantId, clientId, clientSecret, ) const accessToken = await authProvider.getAccessTokenWithExpiry() + } catch (authError) { + if (authError instanceof Error && authError.message.includes("authentication")) { + throw new HTTPException(401, { + message: "Failed to authenticate with Microsoft service account", + }) + } + throw authError + } + + try { const expiresAt = new Date(accessToken.expiresOnTimestamp)server/integrations/microsoft/sync.ts (1)
83-88
: Remove unused import.The
unknown
import fromzod
appears to be unused.-import { unknown } from "zod" import { discoverSharePointSites, discoverSiteDrives, processSiteDrives, } from "./sharepoint"
server/utils.ts (1)
13-13
: Remove unused import.The
todo
import fromnode:test
appears to be unused.-import { todo } from "node:test"
server/integrations/microsoft/index.ts (1)
851-852
: Consider making site filtering configurable.The SharePoint sites discovery doesn't appear to have any filtering, but the sync module has hardcoded site filters. Consider making this consistent and configurable.
Add an optional filter parameter or configuration:
- let sites = await discoverSharePointSites(graphClient, email!) + // TODO: Add site filtering configuration if needed + let sites = await discoverSharePointSites(graphClient, email!) + + // Optional: Filter sites based on configuration + // const siteFilter = config.sharepoint?.allowedSites + // if (siteFilter && siteFilter.length > 0) { + // sites = sites.filter(site => siteFilter.includes(site.name)) + // }server/integrations/microsoft/client.ts (2)
6-6
: Remove unused import or wire it up.
CustomServiceAuthProvider
is imported but not used.Apply this diff:
-import { CustomServiceAuthProvider } from "./utils"
192-205
: Prefer getStream() for file content endpoints.For
/content
,getStream()
avoids JSON assumptions and is clearer. Optional change.Example (requires a stream-aware helper):
const stream = await graphClient.client.api(endpoint).getStream() return await streamToBuffer(stream)server/integrations/microsoft/utils.ts (7)
37-50
: Use typed error consistent with Google integration for parity and handling.Align with
DeleteDocumentError
pattern to preserve error semantics and observability.Apply this diff (imports may be needed for the error type):
- throw new Error(`Error deleting file: ${(err as Error).message}`) + // Consider reusing a typed error similar to Google integration + throw new Error(`Error deleting file: ${(err as Error).message}`)If you have a
DeleteDocumentError
for Microsoft, emit that instead withintegration: Apps.MicrosoftDrive
.
162-167
: Use existing retry helper for content fetches (rate-limit resilience).Wrap this call with the shared retry logic, or reuse
makeGraphApiCallWithHeaders
from the client module.Example change:
- const contentResponse = await graphClient.client - .api(`/me/drive/items/${fileId}/content`) - .header("Accept", "text/plain") - .get() + // import { makeGraphApiCallWithHeaders } from "./client" + const contentResponse = await makeGraphApiCallWithHeaders( + graphClient, + `/me/drive/items/${fileId}/content`, + { Accept: "text/plain" }, + )
249-256
: Comment contradicts behavior for deletions.Comment says “Skip…deleted”, but code returns true. Clarify intent.
Apply this diff:
- // Skip if it's a deleted item and we don't have it in our system - if (change.deleted) { - return true // We need to process deletions to remove from Vespa - } + // Process deletions to remove docs from Vespa + if (change.deleted) return true
366-371
: PDF pipeline: avoid non-null assertions, fill permissions/owner, and use Logger.
- Don’t assert
driveId!
; it’s optional.- Populate
permissions
viagetFilePermissions
.- Prefer
Logger.error
overconsole.error
.- Guard date parsing.
Apply this diff:
- const pdfBuffer = await downloadFileFromGraph( - graphClient, - file.id!, - file.parentReference?.driveId!, - ) + const pdfBuffer = await downloadFileFromGraph( + graphClient, + file.id!, + file.parentReference?.driveId, + ) @@ - await fs.promises.writeFile(pdfPath, new Uint8Array(pdfBuffer)) + await fs.promises.writeFile(pdfPath, new Uint8Array(pdfBuffer)) @@ - await deleteDocument(pdfPath) + await deleteDocument(pdfPath) @@ - results.push({ + const permissions = await getFilePermissions( + graphClient, + file.id!, + file.parentReference?.driveId, + ) + const createdAt = file.createdDateTime + ? new Date(file.createdDateTime).getTime() + : Date.now() + const updatedAt = file.lastModifiedDateTime + ? new Date(file.lastModifiedDateTime).getTime() + : createdAt + + results.push({ title: file.name!, url: file.webUrl ?? "", app: Apps.MicrosoftDrive, docId: file.id!, parentId: file.parentReference?.id ?? null, - owner: "", // Extract from file.createdBy if available + owner: file.createdBy?.user?.displayName ?? "", photoLink: "", ownerEmail: userEmail, entity: DriveEntity.PDF, chunks: chunks.map((c: any) => c.chunk), - permissions: [], // Process file.permissions if available + permissions, mimeType: file.file?.mimeType ?? "", metadata: JSON.stringify({ parentPath: file.parentReference?.path, size: file.size, }), - createdAt: new Date(file.createdDateTime!).getTime(), - updatedAt: new Date(file.lastModifiedDateTime!).getTime(), + createdAt, + updatedAt, })And replace
console.error
with:- console.error(`Error processing PDF ${file.name}:`, error) + Logger.error(error, `Error processing PDF ${file.name}: ${error}`)Also applies to: 376-393, 395-415
435-444
: DOCX pipeline: same fixes as PDFs (permissions/owner/dates and Logger).Apply this diff:
- const docxBuffer = await downloadFileFromGraph( - graphClient, - file.id!, - file.parentReference?.driveId!, - ) + const docxBuffer = await downloadFileFromGraph( + graphClient, + file.id!, + file.parentReference?.driveId, + ) @@ - results.push({ + const permissions = await getFilePermissions( + graphClient, + file.id!, + file.parentReference?.driveId, + ) + const createdAt = file.createdDateTime + ? new Date(file.createdDateTime).getTime() + : Date.now() + const updatedAt = file.lastModifiedDateTime + ? new Date(file.lastModifiedDateTime).getTime() + : createdAt + + results.push({ title: file.name!, url: file.webUrl ?? "", app: Apps.MicrosoftDrive, docId: file.id!, parentId: file.parentReference?.id ?? null, - owner: "", + owner: file.createdBy?.user?.displayName ?? "", photoLink: "", ownerEmail: userEmail, entity: DriveEntity.Docs, // Reuse Google's entity types chunks: extractedContent.text_chunks || [], // Use text_chunks property - permissions: [], + permissions, mimeType: file.file?.mimeType ?? "", metadata: JSON.stringify({ parentPath: file.parentReference?.path, size: file.size, images: extractedContent.image_chunks?.length || 0, // Use image_chunks property }), - createdAt: new Date(file.createdDateTime!).getTime(), - updatedAt: new Date(file.lastModifiedDateTime!).getTime(), + createdAt, + updatedAt, })And:
- console.error(`Error processing Word document ${file.name}:`, error) + Logger.error(error, `Error processing Word document ${file.name}: ${error}`)Also applies to: 445-466
476-493
: Excel pipeline: keep logging consistent and guard dates.If you keep this path commented-out elsewhere, these changes are still safe.
Apply this diff:
- const workbook = await graphClient + const workbook = await graphClient .api(`${base}/${file.id}/workbook/worksheets`) .get() @@ - console.error( - `Error processing worksheet ${worksheet.name}:`, - worksheetError, - ) + Logger.error( + worksheetError as Error, + `Error processing worksheet ${worksheet.name}: ${worksheetError}`, + ) continue @@ - results.push({ + const createdAt = file.createdDateTime + ? new Date(file.createdDateTime).getTime() + : Date.now() + const updatedAt = file.lastModifiedDateTime + ? new Date(file.lastModifiedDateTime).getTime() + : createdAt + + results.push({ title: file.name!, url: file.webUrl ?? "", app: Apps.MicrosoftDrive, docId: file.id!, parentId: file.parentReference?.id ?? null, owner: "", photoLink: "", ownerEmail: userEmail, entity: DriveEntity.Sheets, chunks, permissions: [], mimeType: file.file?.mimeType ?? "", metadata: JSON.stringify({ parentPath: file.parentReference?.path, size: file.size, worksheetCount: workbook.value?.length || 0, }), - createdAt: new Date(file.createdDateTime!).getTime(), - updatedAt: new Date(file.lastModifiedDateTime!).getTime(), + createdAt, + updatedAt, })And:
- console.error(`Error processing Excel file ${file.name}:`, error) + Logger.error(error as Error, `Error processing Excel file ${file.name}: ${error}`)Also applies to: 496-527, 529-551
559-601
: processFileContent: Excel path is disabled.If intentional, add a brief comment/TODO to track enabling later.
Apply this diff:
case "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": - // const excelResults = await processMicrosoftExcel( + // TODO: Enable Excel extraction when finalized + // const excelResults = await processMicrosoftExcel( // graphClient.client, // [file], // userEmail, // ) // return excelResults[0]?.chunks || [] return []
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
bun.lock
is excluded by!**/*.lock
frontend/bun.lockb
is excluded by!**/bun.lockb
server/bun.lockb
is excluded by!**/bun.lockb
📒 Files selected for processing (16)
frontend/.env.default
(0 hunks)frontend/src/routes/_authenticated/admin/integrations/microsoft.tsx
(4 hunks)package.json
(1 hunks)server/api/admin.ts
(4 hunks)server/db/connector.ts
(5 hunks)server/integrations/microsoft/client.ts
(8 hunks)server/integrations/microsoft/config.ts
(1 hunks)server/integrations/microsoft/index.ts
(3 hunks)server/integrations/microsoft/sharepoint/index.ts
(1 hunks)server/integrations/microsoft/sync.ts
(11 hunks)server/integrations/microsoft/utils.ts
(3 hunks)server/package.json
(2 hunks)server/queue/index.ts
(2 hunks)server/server.ts
(6 hunks)server/types.ts
(5 hunks)server/utils.ts
(4 hunks)
💤 Files with no reviewable changes (1)
- frontend/.env.default
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.
Applied to files:
server/integrations/microsoft/utils.ts
server/integrations/microsoft/index.ts
server/queue/index.ts
server/integrations/microsoft/sync.ts
📚 Learning: 2025-09-12T13:28:43.782Z
Learnt from: devesh-juspay
PR: xynehq/xyne#821
File: server/server.ts:771-837
Timestamp: 2025-09-12T13:28:43.782Z
Learning: All workflow API endpoints should be protected with AuthMiddleware to prevent unauthorized access to template creation, execution, tool management, and other sensitive workflow operations.
Applied to files:
server/server.ts
📚 Learning: 2025-06-10T05:40:04.427Z
Learnt from: naSim087
PR: xynehq/xyne#525
File: frontend/src/routes/_authenticated/admin/integrations/slack.tsx:141-148
Timestamp: 2025-06-10T05:40:04.427Z
Learning: In frontend/src/routes/_authenticated/admin/integrations/slack.tsx, the ConnectAction enum and related connectAction state (lines 141-148, 469-471) are intentionally kept for future development, even though they appear unused in the current implementation.
Applied to files:
frontend/src/routes/_authenticated/admin/integrations/microsoft.tsx
🧬 Code graph analysis (11)
server/integrations/microsoft/sharepoint/index.ts (6)
server/integrations/microsoft/client.ts (2)
MicrosoftGraphClient
(23-37)makeGraphApiCall
(98-112)server/integrations/microsoft/index.ts (1)
loggerWithChild
(82-84)server/integrations/tracker.ts (1)
Tracker
(89-325)server/integrations/microsoft/utils.ts (2)
getFilePermissions
(278-353)processFileContent
(559-601)server/shared/types.ts (2)
Apps
(38-38)DriveEntity
(33-33)server/search/vespa.ts (1)
insertWithRetry
(171-171)
server/types.ts (1)
server/db/oauthProvider.ts (1)
createOAuthProvider
(14-34)
server/integrations/microsoft/utils.ts (4)
server/integrations/google/index.ts (3)
downloadDir
(2431-2431)deleteDocument
(1467-1484)safeLoadPDF
(2479-2495)server/integrations/microsoft/client.ts (4)
MicrosoftGraphClient
(23-37)makeGraphApiCall
(98-112)downloadFileFromGraph
(193-209)Client
(249-249)server/utils.ts (1)
hashPdfFilename
(325-332)server/docxChunks.ts (1)
extractTextAndImagesWithChunksFromDocx
(2412-2765)
server/server.ts (6)
server/api/search.ts (1)
chatTitleSchema
(132-135)server/api/chat/chat.ts (1)
GenerateChatTitleApi
(7168-7215)server/types.ts (2)
createOAuthProvider
(187-225)microsoftServiceSchema
(227-233)server/db/oauthProvider.ts (1)
createOAuthProvider
(14-34)server/api/admin.ts (2)
CreateOAuthProvider
(356-440)AddServiceConnectionMicrosoft
(442-534)server/errors/index.ts (1)
InitialisationError
(99-108)
server/integrations/microsoft/index.ts (7)
server/db/schema/connectors.ts (2)
SelectConnector
(146-146)connectors
(60-109)server/integrations/tracker.ts (1)
Tracker
(89-325)server/types.ts (1)
MicrosoftServiceCredentials
(461-468)server/integrations/microsoft/client.ts (1)
createMicrosoftGraphClient
(60-95)server/integrations/metricStream.ts (2)
sendWebsocketMessage
(11-16)closeWs
(5-7)server/integrations/microsoft/sharepoint/index.ts (3)
discoverSharePointSites
(12-47)discoverSiteDrives
(50-110)processSiteDrives
(113-247)server/errors/index.ts (1)
CouldNotFinishJobSuccessfully
(194-203)
server/db/connector.ts (5)
server/utils.ts (2)
IsGoogleApp
(300-307)IsMicrosoftApp
(309-316)server/types.ts (3)
OAuthCredentials
(312-318)TxnOrClient
(310-310)MicrosoftServiceCredentials
(461-468)server/db/schema/connectors.ts (2)
connectors
(60-109)selectConnectorSchema
(135-139)server/errors/index.ts (2)
NoOauthConnectorFound
(290-295)MissingOauthConnectorCredentialsError
(262-271)server/integrations/microsoft/utils.ts (1)
CustomServiceAuthProvider
(99-129)
server/utils.ts (3)
server/integrations/microsoft/client.ts (2)
MicrosoftGraphClient
(23-37)updateMicrosoftGraphClient
(39-57)server/integrations/microsoft/config.ts (1)
scopes
(2-5)server/integrations/microsoft/utils.ts (1)
CustomServiceAuthProvider
(99-129)
server/queue/index.ts (1)
server/integrations/microsoft/sync.ts (1)
handleMicrosoftServiceAccountChanges
(1825-1925)
server/api/admin.ts (8)
server/integrations/microsoft/index.ts (2)
loggerWithChild
(82-84)handleMicrosoftServiceAccountIngestion
(817-923)server/db/user.ts (1)
getUserByEmail
(98-107)server/db/connector.ts (2)
db
(3-3)insertConnector
(43-92)server/errors/index.ts (3)
NoUserFound
(322-327)ConnectorNotCreated
(338-343)AddServiceConnectionError
(330-335)server/types.ts (2)
microsoftService
(295-295)MicrosoftServiceCredentials
(461-468)server/integrations/microsoft/utils.ts (1)
CustomServiceAuthProvider
(99-129)server/db/schema/connectors.ts (1)
selectConnectorSchema
(135-139)server/utils.ts (1)
getErrorMessage
(106-109)
server/integrations/microsoft/sync.ts (10)
server/integrations/microsoft/client.ts (4)
makeGraphApiCall
(98-112)downloadFileFromGraph
(193-209)MicrosoftGraphClient
(23-37)createMicrosoftGraphClient
(60-95)server/integrations/microsoft/sharepoint/index.ts (3)
discoverSharePointSites
(12-47)discoverSiteDrives
(50-110)processSiteDrives
(113-247)server/integrations/microsoft/index.ts (1)
loggerWithChild
(82-84)server/utils.ts (1)
getErrorMessage
(106-109)server/search/vespa.ts (4)
getDocumentOrNull
(63-63)DeleteDocument
(65-65)UpdateDocumentPermissions
(172-173)insertWithRetry
(171-171)server/integrations/microsoft/utils.ts (2)
getFilePermissions
(278-353)processFileContent
(559-601)server/db/syncJob.ts (1)
getAppSyncJobs
(35-45)server/db/connector.ts (2)
db
(3-3)getMicrosoftAuthConnectorWithCredentials
(307-368)server/types.ts (1)
MicrosoftServiceCredentials
(461-468)server/db/syncHistory.ts (1)
insertSyncHistory
(10-30)
frontend/src/routes/_authenticated/admin/integrations/microsoft.tsx (2)
frontend/src/api.ts (1)
api
(4-4)server/shared/types.ts (1)
Apps
(38-38)
🔇 Additional comments (18)
server/package.json (1)
55-55
: OK to bump @xyne/vespa-tsNo issue spotted with ^1.0.6 bump.
server/db/connector.ts (2)
167-179
: Token expiry helper refactor looks goodSwitch to IsExpired(app, expiresAt, buffer) is reasonable and used consistently for Google/Microsoft.
215-219
: Good: expiry check now uses expiresAt from credentialsThis aligns with the new IsExpired signature.
frontend/src/routes/_authenticated/admin/integrations/microsoft.tsx (2)
213-298
: Admin-only auth-type UI: looks goodConditional radio group for delegated vs app-only and tenantId-only when app-only is selected is sound. Disabled scopes input matches server default-scope behavior.
Confirm that non-admin users never see app-only and cannot post to the admin route from the client.
112-128
: Status transitions per auth type are sensible
- App-only: set OAuthConnecting immediately after backend confirms creation.
- Delegated: set OAuth to proceed with modal OAuth.
LGTM.
server/queue/index.ts (1)
27-31
: Microsoft service-account sync wiring addedImporting handleMicrosoftServiceAccountChanges is correct and consistent with sync worker usage.
server/types.ts (3)
295-295
: Export type alias: OKmicrosoftService export matches the new schema exposure.
384-389
: SharePoint delta token type addedSchema looks appropriate; union update below includes it.
461-468
: MicrosoftServiceCredentials type: OKMatches usage in DB credentials and sync code.
server/server.ts (1)
787-791
: Chat title endpoint addition: OKSchema validation and handler wiring look correct.
server/integrations/microsoft/config.ts (1)
3-3
: Switch to .default scope: LGTM
- Works for app-only flows (application permissions).
- Retaining offline_access is appropriate for delegated OAuth to obtain refresh tokens.
server/integrations/microsoft/sharepoint/index.ts (1)
131-131
: Duplicate drive name in log message.The log message uses
siteDrive.name
for both the drive name and site name, which appears to be incorrect.[raise_minor_issue]
loggerWithChild({ email: userEmail }).info( - `Processing drive: ${siteDrive.name} from site: ${siteDrive.name}`, + `Processing drive: ${siteDrive.name}`, )server/integrations/microsoft/sync.ts (1)
1825-1925
: LGTM! Well-structured service account sync implementation.The implementation follows the established pattern from OAuth sync, properly handles errors, updates sync job status, and creates sync history records appropriately.
server/utils.ts (1)
216-251
: Well-structured multi-provider authentication handling.The implementation properly distinguishes between OAuth/delegated and service/app-only authentication for Microsoft, with appropriate error handling for missing credentials.
server/integrations/microsoft/index.ts (1)
817-923
: LGTM! Well-structured service account ingestion implementation.The implementation includes proper error handling, progress tracking via websockets, transaction management, and comprehensive logging with job IDs for traceability.
server/integrations/microsoft/utils.ts (2)
52-69
: LGTM: robust PDF loader with password handling.
98-129
: LGTM: service auth provider using ClientSecretCredential.Good separation for app-only flows; Azure Identity caches tokens.
server/integrations/microsoft/client.ts (1)
97-112
: Incorrect — retryWithBackoff does not log/serialize the authClient; passing graphClient is intended.retryWithBackoff only logs the caught error and the context string (Logger.warn(error); Logger.info/error with context) and uses the optional authClient to refresh tokens on 401. It does not serialize or log the authClient parameter, so the specific clientSecret-leak concern is unfounded. For extra hardening, avoid including secrets in error objects or pass a redacted object (e.g., { tenantId, clientId }) / ensure logger redaction.
Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
server/db/connector.ts (1)
167-179
: Defensively handle invalid/missing expiresAt; treat as expired.With an invalid Date, getTime() is NaN and the function returns false, skipping refresh. Force-refresh on invalid expiration.
-const IsExpired = ( - app: Apps, - expiresAt: Date, - bufferInSeconds: number, -): boolean => { - if (IsGoogleApp(app) || IsMicrosoftApp(app)) { - const now: Date = new Date() - const currentTime = now.getTime() - const expirationTime = new Date(expiresAt).getTime() - return currentTime + bufferInSeconds * 1000 > expirationTime - } - return false -} +const IsExpired = ( + app: Apps, + expiresAt: Date | string | number | null | undefined, + bufferInSeconds: number, +): boolean => { + if (!(IsGoogleApp(app) || IsMicrosoftApp(app))) return false + const expMs = + expiresAt instanceof Date + ? expiresAt.getTime() + : new Date(expiresAt as any).getTime() + if (!Number.isFinite(expMs)) { + // Invalid/missing expiry => force refresh + return true + } + return Date.now() + bufferInSeconds * 1000 >= expMs +}server/integrations/microsoft/sync.ts (1)
1624-1629
: Fix mis-ordered createMicrosoftGraphClient arguments (critical)createMicrosoftGraphClient(accessToken, clientId, clientSecret, refreshToken?, tenantId?, tokenExpiresAt?). Several call sites pass refreshToken as the 2nd argument (or tenantId as the 4th), which places values into clientId/clientSecret slots and breaks authentication. Fix calls to pass clientId/clientSecret as args 2/3, refreshToken as arg 4, tenantId as arg 5.
server/integrations/microsoft/sync.ts (lines ~1624–1627) — replace
const graphClient = createMicrosoftGraphClient(
oauthTokens.access_token,
oauthTokens.refresh_token,
MICROSOFT_CLIENT_ID,
MICROSOFT_CLIENT_SECRET,
)
with
const graphClient = createMicrosoftGraphClient(
oauthTokens.access_token,
MICROSOFT_CLIENT_ID,
MICROSOFT_CLIENT_SECRET,
oauthTokens.refresh_token,
)server/integrations/microsoft/sync.ts (lines ~1730–1733) — same replacement as above.
server/integrations/microsoft/index.ts (lines ~956–959) — replace
const graphClient = createMicrosoftGraphClient(
oauthTokens.access_token,
oauthTokens.refresh_token,
microsoftProvider.clientId!,
microsoftProvider.clientSecret! as string,
)
with
const graphClient = createMicrosoftGraphClient(
oauthTokens.access_token,
microsoftProvider.clientId!,
microsoftProvider.clientSecret! as string,
oauthTokens.refresh_token,
)server/integrations/microsoft/index.ts (lines ~832–835) — service account call currently passes tenantId as the 4th arg; replace
const graphClient = createMicrosoftGraphClient(
credentials.access_token,
credentials.clientId,
credentials.clientSecret,
credentials.tenantId,
)
with
const graphClient = createMicrosoftGraphClient(
credentials.access_token,
credentials.clientId,
credentials.clientSecret,
undefined,
credentials.tenantId,
)
🧹 Nitpick comments (7)
server/db/connector.ts (4)
38-40
: Remove stray/unused imports (zod date, node:test todo).These appear accidental in a DB module.
import { CustomServiceAuthProvider } from "@/integrations/microsoft/utils" -import { date } from "zod" -import { todo } from "node:test"
315-319
: Constrain to Microsoft apps in the query.Function name suggests Microsoft-only; add an app filter to avoid matching other service accounts.
and( eq(connectors.id, connectorId), eq(connectors.authType, AuthType.ServiceAccount), + inArray(connectors.app, [ + Apps.MicrosoftDrive, + Apps.MicrosoftOutlook, + Apps.MicrosoftCalendar, + Apps.MicrosoftSharepoint, + ]), ),Add the import (outside this hunk):
-import { and, eq } from "drizzle-orm" +import { and, eq, inArray } from "drizzle-orm"
322-326
: Use a more accurate error for service-account connectors.These aren’t OAuth connectors; prefer NoConnectorsFound with a precise message.
- if (!res.length) { - throw new NoOauthConnectorFound({ - message: `Could not get the oauth connector with id: ${connectorId}`, - }) - } + if (!res.length) { + throw new NoConnectorsFound({ + message: `Could not get the service-account connector with id: ${connectorId}`, + }) + }
338-356
: Avoid refresh stampede under concurrency.Multiple workers can detect expiry and refresh simultaneously. Consider row-level locking (SELECT ... FOR UPDATE) or an atomic compare-and-swap on expires_at in the UPDATE to serialize refresh.
server/integrations/microsoft/sync.ts (3)
1302-1305
: Avoid logging raw deltaLink URLs/tokens.Delta URLs can embed opaque tokens. Don’t log them.
- loggerWithChild({ email }).info( - `Fetching delta changes from: ${nextLink.substring(0, 100)}...`, - ) + loggerWithChild({ email }).info( + `Fetching SharePoint drive delta changes (URL redacted)`, + )
1188-1190
: Implement cleanup for deleted drives.Currently a TODO. Without cleanup, Vespa retains orphaned docs when drives are removed.
High-level approach:
- Track docs by
siteId::driveId
composite key in metadata.- On deletion, bulk remove matching docs (or queue a background cleanup job).
I can draft a
deleteDriveDocuments(siteAndDriveKeys: string[])
helper using Vespa queries if you want.
94-116
: Defer env var validation to runtime call sites.Top‑level throws on missing
MICROSOFT_CLIENT_ID/SECRET
can crash unrelated workers/tests that import this module but don’t use Microsoft integrations.
- Move
validateMicrosoftCredentials()
into the functions that actually need it (e.g., inside OAuth handlers) or guard it behind a feature flag check.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
server/db/connector.ts
(5 hunks)server/integrations/microsoft/sync.ts
(11 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.
Applied to files:
server/integrations/microsoft/sync.ts
🧬 Code graph analysis (2)
server/integrations/microsoft/sync.ts (10)
server/integrations/microsoft/client.ts (4)
makeGraphApiCall
(98-112)downloadFileFromGraph
(193-209)MicrosoftGraphClient
(23-37)createMicrosoftGraphClient
(60-95)server/integrations/microsoft/sharepoint/index.ts (3)
discoverSharePointSites
(12-47)discoverSiteDrives
(50-110)processSiteDrives
(113-247)server/integrations/microsoft/index.ts (1)
loggerWithChild
(82-84)server/utils.ts (1)
getErrorMessage
(106-109)server/search/vespa.ts (4)
getDocumentOrNull
(63-63)DeleteDocument
(65-65)UpdateDocumentPermissions
(172-173)insertWithRetry
(171-171)server/integrations/microsoft/utils.ts (2)
getFilePermissions
(278-353)processFileContent
(559-601)server/db/syncJob.ts (2)
getAppSyncJobs
(35-45)updateSyncJob
(65-87)server/db/connector.ts (2)
db
(3-3)getMicrosoftAuthConnectorWithCredentials
(307-368)server/types.ts (1)
MicrosoftServiceCredentials
(461-468)server/db/syncHistory.ts (1)
insertSyncHistory
(10-30)
server/db/connector.ts (5)
server/utils.ts (2)
IsGoogleApp
(300-307)IsMicrosoftApp
(309-316)server/types.ts (3)
OAuthCredentials
(312-318)TxnOrClient
(310-310)MicrosoftServiceCredentials
(461-468)server/db/schema/connectors.ts (2)
connectors
(60-109)selectConnectorSchema
(135-139)server/errors/index.ts (2)
NoOauthConnectorFound
(290-295)MissingOauthConnectorCredentialsError
(262-271)server/integrations/microsoft/utils.ts (1)
CustomServiceAuthProvider
(99-129)
🔇 Additional comments (6)
server/db/connector.ts (4)
14-18
: Import realignment looks good.Type-only imports from "@/types" are correct and moving ConnectorType out of here avoids duplication.
21-21
: Using shared ConnectorType enum is correct.Good move to source ConnectorType from "@/shared/types".
215-219
: Callsite updated to pass Date-based expiry — OK.
354-356
: Refreshed token is now persisted correctly.You serialize the updated credentials object, addressing the earlier bug.
server/integrations/microsoft/sync.ts (2)
1164-1166
: Good: no hardcoded site-name filters.Site discovery no longer hardcodes allowlists like "VISA_SIN" etc. This addresses prior review feedback.
1412-1414
: Incorrect — driveId is already handled via file.parentReference (no change required)processFileContent delegates to processMicrosoftPDFs/processMicrosoftWord which call downloadFileFromGraph(..., file.parentReference?.driveId), and client.ts switches to the drives/{driveId}/items/{fileId}/content endpoint when a driveId is provided. See server/integrations/microsoft/utils.ts (processMicrosoftPDFs/processMicrosoftWord) and server/integrations/microsoft/client.ts (downloadUrl/driveId handling).
Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
server/integrations/microsoft/sync.ts (2)
1729-1734
: createMicrosoftGraphClient args out of order (Calendar)Same ordering bug here.
- const graphClient = createMicrosoftGraphClient( - oauthTokens.access_token, - oauthTokens.refresh_token, - MICROSOFT_CLIENT_ID, - MICROSOFT_CLIENT_SECRET, - ) + const graphClient = createMicrosoftGraphClient( + oauthTokens.access_token, + MICROSOFT_CLIENT_ID, + MICROSOFT_CLIENT_SECRET, + oauthTokens.refresh_token, + )
731-744
: DeletedItems delta endpoint defaults to empty stringWhen there’s no prior token, you call makeGraphApiCall with "", which will fail. Provide a proper initial delta endpoint.
- let deletedItemsEndpoint: string = "" - - if (deletedItemsDeltaToken && deletedItemsDeltaToken.startsWith("http")) { - // Use existing delta token URL - const url = new URL(deletedItemsDeltaToken) - deletedItemsEndpoint = url.pathname + url.search - } + let deletedItemsEndpoint: string + if (deletedItemsDeltaToken && deletedItemsDeltaToken.startsWith("http")) { + const url = new URL(deletedItemsDeltaToken) + deletedItemsEndpoint = url.pathname + url.search + } else { + deletedItemsEndpoint = + "/me/mailFolders/deleteditems/messages/delta" + + "?$top=100&$select=id,internetMessageId" + }
🧹 Nitpick comments (2)
server/api/admin.ts (1)
127-133
: Remove unused test/deps importsThese imports are unused here and can confuse bundlers/tree-shaking.
-import { todo } from "node:test" -import { ClientSecretCredential } from "@azure/identity" -import { Client as GraphClient } from "@microsoft/microsoft-graph-client" -import type { AuthenticationProvider } from "@microsoft/microsoft-graph-client"server/integrations/microsoft/sync.ts (1)
346-350
: Use makeGraphApiCall for fetch/retry consistencyDirect graphClient.client.api().get() bypasses your retry/backoff and token-refresh path.
- const contentResponse = await graphClient.client - .api(`/me/drive/items/${file.id}/content`) - .header("Accept", "text/plain") - .get() + const contentResponse = await makeGraphApiCall( + graphClient, + `/me/drive/items/${file.id}/content`, + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
server/api/admin.ts
(4 hunks)server/integrations/microsoft/sync.ts
(11 hunks)server/package.json
(1 hunks)server/server.ts
(6 hunks)server/types.ts
(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- server/types.ts
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-12T13:28:43.782Z
Learnt from: devesh-juspay
PR: xynehq/xyne#821
File: server/server.ts:771-837
Timestamp: 2025-09-12T13:28:43.782Z
Learning: All workflow API endpoints should be protected with AuthMiddleware to prevent unauthorized access to template creation, execution, tool management, and other sensitive workflow operations.
Applied to files:
server/server.ts
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.
Applied to files:
server/integrations/microsoft/sync.ts
🧬 Code graph analysis (3)
server/server.ts (4)
server/types.ts (2)
createOAuthProvider
(188-226)microsoftServiceSchema
(228-234)server/db/oauthProvider.ts (1)
createOAuthProvider
(14-34)server/api/admin.ts (2)
CreateOAuthProvider
(371-455)AddServiceConnectionMicrosoft
(457-549)server/errors/index.ts (1)
InitialisationError
(99-108)
server/integrations/microsoft/sync.ts (10)
server/integrations/microsoft/client.ts (4)
makeGraphApiCall
(98-112)downloadFileFromGraph
(193-209)MicrosoftGraphClient
(23-37)createMicrosoftGraphClient
(60-95)server/integrations/microsoft/sharepoint/index.ts (3)
discoverSharePointSites
(12-47)discoverSiteDrives
(50-110)processSiteDrives
(113-247)server/integrations/microsoft/index.ts (1)
loggerWithChild
(82-84)server/utils.ts (1)
getErrorMessage
(106-109)server/search/vespa.ts (4)
getDocumentOrNull
(65-65)DeleteDocument
(67-67)UpdateDocumentPermissions
(213-214)insertWithRetry
(212-212)server/integrations/microsoft/utils.ts (2)
getFilePermissions
(278-353)processFileContent
(559-601)server/db/syncJob.ts (2)
getAppSyncJobs
(35-45)updateSyncJob
(65-87)server/db/connector.ts (2)
db
(3-3)getMicrosoftAuthConnectorWithCredentials
(307-368)server/types.ts (1)
MicrosoftServiceCredentials
(462-469)server/db/syncHistory.ts (1)
insertSyncHistory
(10-30)
server/api/admin.ts (8)
server/integrations/microsoft/index.ts (2)
loggerWithChild
(82-84)handleMicrosoftServiceAccountIngestion
(817-923)server/db/user.ts (1)
getUserByEmail
(145-154)server/db/connector.ts (2)
db
(3-3)insertConnector
(43-92)server/errors/index.ts (3)
NoUserFound
(322-327)ConnectorNotCreated
(338-343)AddServiceConnectionError
(330-335)server/types.ts (2)
microsoftService
(296-296)MicrosoftServiceCredentials
(462-469)server/integrations/microsoft/utils.ts (1)
CustomServiceAuthProvider
(99-129)server/db/schema/connectors.ts (1)
selectConnectorSchema
(135-139)server/utils.ts (1)
getErrorMessage
(106-109)
🔇 Additional comments (9)
server/package.json (1)
49-49
: Bun compatibility risk: @azure/identity v4 not officially supportedThis dependency is known to target Node LTS and browsers; Bun support is unofficial and has reported incompatibilities. If this service runs under Bun, expect runtime issues unless you add shims/polyfills or switch to Node for the server process.
Would you like me to add runtime checks/docs to enforce Node LTS for the server and keep Bun for tooling only?
server/api/admin.ts (2)
466-468
: Replace ts-ignore with a type assertionDon’t suppress types; assert the validated shape.
- // @ts-ignore - const form: microsoftService = c.req.valid("form") + const form = c.req.valid("form") as microsoftService
520-520
: Background ingestion must handle rejectionsFire-and-forget without .catch risks unhandled rejections and silent failures. Mirror the Slack/Google pattern.
Apply:
- handleMicrosoftServiceAccountIngestion(email, connector) + handleMicrosoftServiceAccountIngestion(email, connector).catch((error) => { + loggerWithChild({ email }).error( + error, + `Background Microsoft service account ingestion failed for connector ${connector.id}: ${getErrorMessage(error)}`, + ) + })server/server.ts (2)
1507-1510
: Queue init error handling is fineinitQueue() is awaited via .catch and will surface as InitialisationError early. LGTM.
1057-1061
: microsoftServiceSchema likely rejects valid forms (scopes superRefine bleed-through)microsoftServiceSchema derives from createOAuthProvider, whose superRefine enforces scopes when isUsingGlobalCred is false. Your admin POST /microsoft/service_account submits only clientId, clientSecret, tenantId, app and will be rejected.
Fix in server/types.ts by defining an independent schema without the inherited superRefine:
--- a/server/types.ts +++ b/server/types.ts @@ export const createOAuthProvider = z .object({ @@ }) .superRefine((data, ctx) => { @@ }) -// Service-account schema derived from createOAuthProvider (inherits superRefine) -export const microsoftServiceSchema = createOAuthProvider - .omit({ - scopes: true, - }) - .extend({ - tenantId: z.string(), - }) +// Independent service-account schema for Microsoft (no scopes required) +export const microsoftServiceSchema = z.strictObject({ + clientId: z.string().min(1), + clientSecret: z.string().min(1), + tenantId: z.string().min(1), + app: z.nativeEnum(Apps).default(Apps.MicrosoftSharepoint), +})server/integrations/microsoft/sync.ts (4)
1480-1485
: createMicrosoftGraphClient args out of order (OAuth)Signature is (accessToken, clientId, clientSecret, refreshToken?, tenantId?, tokenExpiresAt?). You’re passing refresh_token as clientId.
Apply:
- const graphClient = createMicrosoftGraphClient( - oauthTokens.access_token, - oauthTokens.refresh_token, - MICROSOFT_CLIENT_ID, - MICROSOFT_CLIENT_SECRET, - ) + const graphClient = createMicrosoftGraphClient( + oauthTokens.access_token, + MICROSOFT_CLIENT_ID, + MICROSOFT_CLIENT_SECRET, + oauthTokens.refresh_token, + )Likely an incorrect or invalid review comment.
1421-1431
: Do not persist @microsoft.graph.downloadUrlThis is a short-lived pre-authorized URL; storing it increases leakage risk.
metadata: JSON.stringify({ size: item.size, - downloadUrl: item["@microsoft.graph.downloadUrl"], siteId: siteId, driveId: driveId, parentId: item.parentReference?.id ?? "", parentPath: item.parentReference?.path ?? "/", eTag: item.eTag ?? "", }),
83-83
: Invalid import: zod has no named export 'unknown'This breaks build. Remove it (and use z.unknown() if needed).
-import { unknown } from "zod"
1241-1250
: Preserve deltaLink when Graph returns noneOverwriting with empty string loses a valid token.
- //update with new Link - deltaLinks[driveKey] = newDeltaLink + // Update only if a new deltaLink was returned + if (newDeltaLink) { + deltaLinks[driveKey] = newDeltaLink + } else { + loggerWithChild({ email }).warn( + `No new deltaLink for drive ${driveKey}; preserving existing token`, + ) + }
… sync deleted files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
server/integrations/microsoft/sync.ts (2)
1612-1617
: Fix createMicrosoftGraphClient argument order (Outlook).Parameters are misordered;
clientId
andclientSecret
must be 2nd and 3rd,refreshToken
4th.Apply this diff:
- const graphClient = createMicrosoftGraphClient( - oauthTokens.access_token, - oauthTokens.refresh_token, - MICROSOFT_CLIENT_ID, - MICROSOFT_CLIENT_SECRET, - ) + const graphClient = createMicrosoftGraphClient( + oauthTokens.access_token, + MICROSOFT_CLIENT_ID, + MICROSOFT_CLIENT_SECRET, + oauthTokens.refresh_token, + )
1718-1723
: Fix createMicrosoftGraphClient argument order (Calendar).Same misordering here.
Apply this diff:
- const graphClient = createMicrosoftGraphClient( - oauthTokens.access_token, - oauthTokens.refresh_token, - MICROSOFT_CLIENT_ID, - MICROSOFT_CLIENT_SECRET, - ) + const graphClient = createMicrosoftGraphClient( + oauthTokens.access_token, + MICROSOFT_CLIENT_ID, + MICROSOFT_CLIENT_SECRET, + oauthTokens.refresh_token, + )
🧹 Nitpick comments (8)
server/package.json (1)
68-68
: Type-only package can live in devDependencies
@types/pdf-parse
supplies compile-time typings only; pulling it intodependencies
bloats production installs with no runtime benefit. Let’s move it todevDependencies
."devDependencies": { "@biomejs/biome": "1.9.4", @@ - "p-queue": "^8.0.1", + "p-queue": "^8.0.1", + "@types/pdf-parse": "^1.1.5", "semantic-release": "^24.2.7", @@ - "@types/pdf-parse": "^1.1.5",server/integrations/microsoft/utils.ts (4)
415-418
: Use consistent logger instead of console.error.Prefer
Logger.error
for structured logs.Apply this diff:
- console.error(`Error processing PDF ${file.name}:`, error) + Logger.error(error, `Error processing PDF ${file.name}: ${error}`)
521-525
: Use consistent logger instead of console.error.Apply this diff:
- console.error( - `Error processing worksheet ${worksheet.name}:`, - worksheetError, - ) + Logger.error( + worksheetError, + `Error processing worksheet ${worksheet.name}: ${worksheetError}`, + )
539-539
: Fix entity type for Excel files.Use the specific Excel enum for consistency.
Apply this diff:
- entity: DriveEntity.Sheets, + entity: DriveEntity.ExcelSpreadsheet,
163-171
: Consider using the retry/backoff wrapper for Graph calls.Direct SDK call here bypasses
retryWithBackoff
; considermakeGraphApiCall
if you can attach headers via the wrapper or extend it to support headers.server/integrations/microsoft/sharepoint/index.ts (1)
145-152
: Skip folders during ingestion.Folder items are being ingested as files with empty chunks. Skip them to avoid noisy documents.
Apply this diff:
- if (response.value && Array.isArray(response.value)) { + if (response.value && Array.isArray(response.value)) { for (const item of response.value) { + if (item.folder) { + continue + } try {server/integrations/microsoft/sync.ts (2)
1174-1179
: Avoid mutating deltaLinks while iterating its keys.Delete after collecting to prevent iteration pitfalls.
Apply this diff:
- for (const key in deltaLinks) { - if (!driveSet.has(key)) { - deletedDrives.push(key) - delete deltaLinks[key] - } - } + const toDelete: string[] = [] + for (const key in deltaLinks) { + if (!driveSet.has(key)) { + deletedDrives.push(key) + toDelete.push(key) + } + } + for (const key of toDelete) delete deltaLinks[key]
1358-1371
: Don’t throw on “not found” during delete; log at warn level.This isn’t exceptional for delta; reduce noise.
Apply this diff:
- } else { - throw new Error("File not found in vespa") - } + } else { + Logger.warn(`SharePoint file ${fileId} not found in Vespa during delete`) + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
server/bun.lockb
is excluded by!**/bun.lockb
📒 Files selected for processing (4)
server/integrations/microsoft/sharepoint/index.ts
(1 hunks)server/integrations/microsoft/sync.ts
(11 hunks)server/integrations/microsoft/utils.ts
(3 hunks)server/package.json
(2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.
Applied to files:
server/integrations/microsoft/sharepoint/index.ts
server/integrations/microsoft/sync.ts
server/integrations/microsoft/utils.ts
🧬 Code graph analysis (3)
server/integrations/microsoft/sharepoint/index.ts (5)
server/integrations/microsoft/client.ts (2)
MicrosoftGraphClient
(23-37)makeGraphApiCall
(98-112)server/integrations/microsoft/index.ts (1)
loggerWithChild
(82-84)server/integrations/tracker.ts (1)
Tracker
(89-325)server/integrations/microsoft/utils.ts (2)
getFilePermissions
(278-353)processFileContent
(559-601)server/search/vespa.ts (1)
insertWithRetry
(212-212)
server/integrations/microsoft/sync.ts (9)
server/integrations/microsoft/client.ts (4)
makeGraphApiCall
(98-112)downloadFileFromGraph
(193-209)MicrosoftGraphClient
(23-37)createMicrosoftGraphClient
(60-95)server/integrations/microsoft/sharepoint/index.ts (3)
discoverSharePointSites
(12-47)discoverSiteDrives
(50-110)processSiteDrives
(113-243)server/utils.ts (1)
getErrorMessage
(106-109)server/search/vespa.ts (3)
getDocumentOrNull
(65-65)DeleteDocument
(67-67)insertWithRetry
(212-212)server/integrations/microsoft/utils.ts (2)
getFilePermissions
(278-353)processFileContent
(559-601)server/db/syncJob.ts (2)
getAppSyncJobs
(35-45)updateSyncJob
(65-87)server/db/connector.ts (2)
db
(3-3)getMicrosoftAuthConnectorWithCredentials
(307-368)server/types.ts (1)
MicrosoftServiceCredentials
(462-469)server/db/syncHistory.ts (1)
insertSyncHistory
(10-30)
server/integrations/microsoft/utils.ts (6)
server/integrations/google/index.ts (3)
downloadDir
(2431-2431)deleteDocument
(1467-1484)safeLoadPDF
(2479-2495)server/integrations/google/pdf-utils.ts (3)
downloadDir
(11-11)deleteDocument
(13-30)safeLoadPDF
(60-77)server/integrations/microsoft/client.ts (4)
MicrosoftGraphClient
(23-37)makeGraphApiCall
(98-112)downloadFileFromGraph
(193-209)Client
(249-249)server/utils.ts (1)
hashPdfFilename
(325-332)server/chunks.ts (1)
chunkDocument
(123-133)server/docxChunks.ts (1)
extractTextAndImagesWithChunksFromDocx
(2412-2765)
🔇 Additional comments (7)
server/package.json (1)
48-48
: Blocking: @azure/identity still fails on our Bun runtimeReiterating the outstanding concern: the server scripts (Lines 33–34) continue to execute under Bun, yet
@azure/identity@^4.12.0
only supports Node.js/browsers and is known to crash/hang in Bun (AbortController,process.versions
, stream APIs). Shipping this as-is will break the new SharePoint auth flows whenever they execute inside Bun. Please either switch the runtime to Node LTS (and enforce/document it) or provide the Bun-specific shims/polyfills plus full verification before landing.server/integrations/microsoft/utils.ts (1)
25-35
: Always ensure download directory exists; add fallback for production.Currently the dir is only ensured in non-prod, risking runtime failures in prod. Use deterministic dir with fallback to OS temp.
Apply this diff:
-// Download directory setup -export const downloadDir = path.resolve(__dirname, "../../downloads") - -if (process.env.NODE_ENV !== "production") { - const init = () => { - if (!fs.existsSync(downloadDir)) { - fs.mkdirSync(downloadDir, { recursive: true }) - } - } - init() -} +// Download directory setup +export const downloadDir = process.env.DOWNLOAD_DIR + ? path.resolve(process.env.DOWNLOAD_DIR) + : path.resolve(__dirname, "../../downloads") + +try { + if (!fs.existsSync(downloadDir)) { + fs.mkdirSync(downloadDir, { recursive: true }) + } +} catch (e) { + // Fallback to OS temp if primary location is not writable + const fallback = path.resolve(os.tmpdir(), "downloads") + if (!fs.existsSync(fallback)) { + fs.mkdirSync(fallback, { recursive: true }) + } + (exports as any).downloadDir = fallback +}Also add this import at the top of the file:
import os from "node:os"server/integrations/microsoft/sharepoint/index.ts (2)
210-218
: Guard on the actual delta token, not the object.Using
deltaLinks
(the map) is always truthy; checkdeltaLink
instead.Apply this diff:
- if (deltaLinks && siteDrive.id) { + if (deltaLink && siteDrive.id) { deltaLinks[`${siteDrive.sharePointIds?.siteId}::${siteDrive.id}`] = deltaLink
128-141
: Validate sharePointIds.siteId and drive.id before constructing delta URL.Avoid building invalid endpoints when IDs are missing; skip and log.
Apply this diff:
- loggerWithChild({ email: userEmail }).info( - `Processing drive: ${siteDrive.name} from site: ${siteDrive.name}`, - ) + if (!siteDrive.sharePointIds?.siteId || !siteDrive.id) { + loggerWithChild({ email: userEmail }).warn( + `Skipping drive ${siteDrive.name} - missing sharePointIds.siteId or drive.id`, + ) + continue + } + loggerWithChild({ email: userEmail }).info( + `Processing drive: ${siteDrive.name} from site: ${siteDrive.name}`, + )server/integrations/microsoft/sync.ts (3)
1249-1250
: Don’t overwrite existing delta token with an empty value.Preserve the prior token when no new
@odata.deltaLink
is returned.Apply this diff:
- deltaLinks[driveKey] = newDeltaLink + if (newDeltaLink) { + deltaLinks[driveKey] = newDeltaLink + } else { + loggerWithChild({ email }).warn( + `No new deltaLink for drive ${driveKey}; preserving existing token`, + ) + }
1412-1419
: Remove @microsoft.graph.downloadUrl from metadata.Same security concern as above; do not persist pre-authorized URLs.
Apply this diff:
metadata: JSON.stringify({ size: item.size, - downloadUrl: item["@microsoft.graph.downloadUrl"], siteId: siteId, driveId: driveId, parentId: item.parentReference?.id ?? "", parentPath: item.parentReference?.path ?? "/", eTag: item.eTag ?? "", }),
83-83
: Remove invalid Zod import.
zod
has no named exportunknown
. This breaks the build.Apply this diff:
-import { unknown } from "zod"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/integrations/microsoft/utils.ts (1)
221-232
: Don’t store pre‑authorized downloadUrl in metadata.Remove to reduce leakage risk; fetch fresh URLs when needed.
export const formatOneDriveMetadata = (file: any): string => { return JSON.stringify({ parentPath: file.parentReference?.path ?? "", parentName: file.parentReference?.name ?? "", size: file.size ?? 0, webUrl: file.webUrl ?? "", - downloadUrl: file["@microsoft.graph.downloadUrl"] ?? "", lastModifiedBy: file.lastModifiedBy?.user?.displayName ?? "", createdBy: file.createdBy?.user?.displayName ?? "", }) }
🧹 Nitpick comments (5)
server/server.ts (1)
1511-1514
: Queue init failure currently crashes the server. Confirm desired behavior.Throwing InitialisationError at top-level aborts startup. If queue is non-critical, prefer logging + health flag to avoid full outage.
-initQueue().catch((error) => { - throw new InitialisationError({ cause: error }) -}) +initQueue().catch((error) => { + Logger.error(error, "Queue initialisation failed") + // optionally expose degraded health instead of crashing +})server/integrations/microsoft/sync.ts (3)
1398-1411
: Use mime map for entity instead of Misc.Improves search facets/UX.
- entity: DriveEntity.Misc, + entity: + (item.file?.mimeType && microsoftMimeTypeMap[item.file.mimeType]) ?? + DriveEntity.Misc,
1353-1372
: Avoid throwing on “not found in Vespa” during delete.A missing doc isn’t exceptional during eventual consistency; downgrade to warn to reduce noise.
- } else { - throw new Error("File not found in vespa") - } + }
1181-1186
: TODO: handle deleted drives cleanup.Delta link removal alone leaves orphaned docs. Consider scheduled purge by driveId/siteId.
server/integrations/microsoft/utils.ts (1)
355-423
: Use Logger instead of console.error for consistency.Aligns with centralized logging.
- console.error(`Error processing PDF ${file.name}:`, error) + Logger.error(error, `Error processing PDF ${file.name}`)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
server/bun.lockb
is excluded by!**/bun.lockb
📒 Files selected for processing (3)
server/integrations/microsoft/sync.ts
(13 hunks)server/integrations/microsoft/utils.ts
(3 hunks)server/server.ts
(5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.
Applied to files:
server/integrations/microsoft/utils.ts
server/integrations/microsoft/sync.ts
🧬 Code graph analysis (3)
server/server.ts (5)
server/api/search.ts (1)
chatTitleSchema
(140-143)server/api/chat/chat.ts (1)
GenerateChatTitleApi
(7231-7288)server/types.ts (1)
microsoftServiceSchema
(228-234)server/api/admin.ts (1)
AddServiceConnectionMicrosoft
(457-549)server/errors/index.ts (1)
InitialisationError
(99-108)
server/integrations/microsoft/utils.ts (7)
server/logger/index.ts (2)
getLogger
(36-93)Subsystem
(15-15)server/integrations/google/index.ts (3)
downloadDir
(2431-2431)deleteDocument
(1467-1484)safeLoadPDF
(2479-2495)server/integrations/google/pdf-utils.ts (3)
downloadDir
(11-11)deleteDocument
(13-30)safeLoadPDF
(60-77)server/integrations/microsoft/client.ts (4)
MicrosoftGraphClient
(23-37)makeGraphApiCall
(98-112)downloadFileFromGraph
(193-209)Client
(249-249)server/utils.ts (1)
hashPdfFilename
(325-332)server/chunks.ts (1)
chunkDocument
(123-133)server/docxChunks.ts (1)
extractTextAndImagesWithChunksFromDocx
(2412-2765)
server/integrations/microsoft/sync.ts (10)
server/integrations/microsoft/client.ts (4)
makeGraphApiCall
(98-112)downloadFileFromGraph
(193-209)MicrosoftGraphClient
(23-37)createMicrosoftGraphClient
(60-95)server/integrations/microsoft/sharepoint/index.ts (3)
discoverSharePointSites
(12-47)discoverSiteDrives
(50-110)processSiteDrives
(113-243)server/integrations/microsoft/index.ts (1)
loggerWithChild
(82-84)server/utils.ts (1)
getErrorMessage
(106-109)server/search/vespa.ts (3)
getDocumentOrNull
(65-65)DeleteDocument
(67-67)insertWithRetry
(212-212)server/integrations/microsoft/utils.ts (2)
getFilePermissions
(278-353)processFileContent
(559-601)server/db/syncJob.ts (2)
getAppSyncJobs
(35-45)updateSyncJob
(65-87)server/db/connector.ts (2)
db
(3-3)getMicrosoftAuthConnectorWithCredentials
(307-368)server/types.ts (1)
MicrosoftServiceCredentials
(462-469)server/db/syncHistory.ts (1)
insertSyncHistory
(10-30)
🔇 Additional comments (14)
server/server.ts (2)
797-801
: LGTM: route formatting onlyRefactor to multi-line handler is no-op and clearer.
1061-1065
: microsoftServiceSchema likely rejects valid service-account submissions (scopes).The derived schema still inherits superRefine from createOAuthProvider. This route will fail when scopes aren’t provided, even though handler supplies defaults later.
Apply this diff after fixing the schema (in server/types.ts) or make scopes optional for this route:
- .post( - "/microsoft/service_account", - zValidator("form", microsoftServiceSchema), - AddServiceConnectionMicrosoft, - ) + .post( + "/microsoft/service_account", + zValidator("form", microsoftServiceSchema), // ensure schema doesn't require scopes + AddServiceConnectionMicrosoft, + )#!/bin/bash # Verify microsoftServiceSchema does not enforce scopes via superRefine rg -n -C3 'microsoftServiceSchema|createOAuthProvider|superRefine' --type=ts server/types.tsserver/integrations/microsoft/sync.ts (7)
183-195
: LGTM: switched delta fetch to makeGraphApiCall with paging.Simpler and consistent with retry/backoff wrapper.
294-316
: LGTM: unified file download path uses downloadFileFromGraph(graphClient,…).Correct MicrosoftGraphClient usage.
Also applies to: 379-386
1167-1172
: Filter out drives missing siteId to avoid keys like "undefined::driveId".Prevents malformed deltaLinks keys.
- drives = drives.filter( - (drive) => - drive.id && - !(`${drive.sharePointIds?.siteId}::${drive.id}` in deltaLinks), - ) + drives = drives.filter( + (drive) => + drive.id && + drive.sharePointIds?.siteId && + !(`${drive.sharePointIds.siteId}::${drive.id}` in deltaLinks), + )
83-84
: Remove invalid Zod import.zod has no named export "unknown". This breaks build.
-import { unknown } from "zod"
1240-1250
: Don’t overwrite existing delta token with empty value.Graph may omit @odata.deltaLink; this wipes a valid token.
- //update with new Link - deltaLinks[driveKey] = newDeltaLink + // update only if a new deltaLink was returned + if (newDeltaLink) { + deltaLinks[driveKey] = newDeltaLink + } else { + loggerWithChild({ email }).warn( + `No new deltaLink for drive ${driveKey}; preserving existing token`, + ) + }
1411-1420
: Do not persist @microsoft.graph.downloadUrl in metadata.Short‑lived pre‑authorized URL; storing raises leakage risk.
metadata: JSON.stringify({ size: item.size, - downloadUrl: item["@microsoft.graph.downloadUrl"], siteId: siteId, driveId: driveId, parentId: item.parentReference?.id ?? "", parentPath: item.parentReference?.path ?? "/", eTag: item.eTag ?? "", }),
1807-1907
: Verify sync_jobs.config initialization and add null guard
I couldn’t find a migration that sets a defaultconfig
for MicrosoftSharepoint sync jobs. Confirm that existing records always have a validdeltaLinks
field or implement a backfill, and add a runtime fallback (e.g. default to{ deltaLinks: {}, lastSyncedAt: new Date(0) }
) before accessingconfig.deltaLinks
.server/integrations/microsoft/utils.ts (5)
98-129
: LGTM: Service credential provider via @azure/identity.Works for app-only flows and exposes expiry when needed.
277-353
: LGTM: Robust permission extraction across shapes.Handles modern and legacy Graph responses; dedup via Set.
425-473
: LGTM: DOCX extraction via existing docxChunks util.Buffer is a Uint8Array; safe for the helper.
559-602
: LGTM: processFileContent routes by MIME; Excel disabled safely.Returns chunks; callers handle metadata insert/update.
25-36
: Ensure download directory exists in production too.Current guard only creates it in non-prod; disk writes will fail in prod.
-// Download directory setup -export const downloadDir = path.resolve(__dirname, "../../downloads") - -if (process.env.NODE_ENV !== "production") { - const init = () => { - if (!fs.existsSync(downloadDir)) { - fs.mkdirSync(downloadDir, { recursive: true }) - } - } - init() -} +// Download directory setup +export const downloadDir = process.env.DOWNLOAD_DIR + ? path.resolve(process.env.DOWNLOAD_DIR) + : path.resolve(__dirname, "../../downloads") + +try { + if (!fs.existsSync(downloadDir)) { + fs.mkdirSync(downloadDir, { recursive: true }) + } +} catch (e) { + // Fallback to OS temp + const osTmp = require("node:os").tmpdir() + const fallback = path.resolve(osTmp, "xyne-downloads") + if (!fs.existsSync(fallback)) fs.mkdirSync(fallback, { recursive: true }) + ;(exports as any).downloadDir = fallback + Logger.warn({ fallback }, "Using temp download directory") +}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
server/integrations/microsoft/sync.ts (2)
1162-1173
: Avoid ‘undefined::driveId’ keys; filter drives missing siteId.Keying deltaLinks by
${siteId}::${driveId}
without ensuring siteId exists yields bad keys and collisions.- const driveSet = new Set( - drives.map((drive) => `${drive.sharePointIds?.siteId}::${drive.id}`), - ) + const driveSet = new Set( + drives + .filter((d) => d.sharePointIds?.siteId && d.id) + .map((d) => `${d.sharePointIds!.siteId}::${d.id}`), + ) @@ - drives = drives.filter( - (drive) => - drive.id && - !(`${drive.sharePointIds?.siteId}::${drive.id}` in deltaLinks), - ) + drives = drives.filter( + (drive) => + drive.id && + drive.sharePointIds?.siteId && + !(`${drive.sharePointIds.siteId}::${drive.id}` in deltaLinks), + )
1369-1373
: Don’t throw when Vespa doc is already absent.A missing doc on delete is normal; avoid noisy errors.
- } else { - throw new Error("File not found in vespa") - } + } else { + loggerWithChild({ email }).debug( + `File ${fileId} not found in Vespa; nothing to delete`, + ) + }server/integrations/microsoft/index.ts (1)
73-74
: Remove unused AWS Bedrock import.It’s not referenced; avoid dead imports.
-import type { InvokeModelResponseFilterSensitiveLog } from "@aws-sdk/client-bedrock-runtime"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
server/integrations/microsoft/index.ts
(3 hunks)server/integrations/microsoft/sharepoint/index.ts
(1 hunks)server/integrations/microsoft/sync.ts
(14 hunks)server/integrations/microsoft/utils.ts
(3 hunks)server/server.ts
(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- server/integrations/microsoft/sharepoint/index.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.
Applied to files:
server/integrations/microsoft/sync.ts
🧬 Code graph analysis (4)
server/server.ts (2)
server/types.ts (1)
microsoftServiceSchema
(228-234)server/api/admin.ts (1)
AddServiceConnectionMicrosoft
(457-549)
server/integrations/microsoft/sync.ts (10)
server/integrations/microsoft/client.ts (4)
makeGraphApiCall
(98-112)downloadFileFromGraph
(193-209)MicrosoftGraphClient
(23-37)createMicrosoftGraphClient
(60-95)server/integrations/microsoft/sharepoint/index.ts (3)
discoverSharePointSites
(12-47)discoverSiteDrives
(50-110)processSiteDrives
(113-246)server/integrations/microsoft/index.ts (1)
loggerWithChild
(82-84)server/utils.ts (1)
getErrorMessage
(106-109)server/search/vespa.ts (3)
getDocumentOrNull
(65-65)DeleteDocument
(67-67)insertWithRetry
(212-212)server/integrations/microsoft/utils.ts (2)
getFilePermissionsSharepoint
(278-318)processFileContent
(600-642)server/db/syncJob.ts (2)
getAppSyncJobs
(35-45)updateSyncJob
(65-87)server/db/connector.ts (2)
db
(3-3)getMicrosoftAuthConnectorWithCredentials
(307-368)server/types.ts (1)
MicrosoftServiceCredentials
(462-469)server/db/syncHistory.ts (1)
insertSyncHistory
(10-30)
server/integrations/microsoft/index.ts (6)
server/db/schema/connectors.ts (2)
SelectConnector
(146-146)connectors
(60-109)server/integrations/tracker.ts (1)
Tracker
(89-325)server/types.ts (1)
MicrosoftServiceCredentials
(462-469)server/integrations/microsoft/client.ts (1)
createMicrosoftGraphClient
(60-95)server/integrations/metricStream.ts (2)
sendWebsocketMessage
(11-16)closeWs
(5-7)server/integrations/microsoft/sharepoint/index.ts (3)
discoverSharePointSites
(12-47)discoverSiteDrives
(50-110)processSiteDrives
(113-246)
server/integrations/microsoft/utils.ts (5)
server/integrations/google/pdf-utils.ts (3)
downloadDir
(11-11)deleteDocument
(13-30)safeLoadPDF
(60-77)server/integrations/microsoft/client.ts (4)
MicrosoftGraphClient
(23-37)makeGraphApiCall
(98-112)downloadFileFromGraph
(193-209)Client
(249-249)server/integrations/microsoft/index.ts (1)
loggerWithChild
(82-84)server/utils.ts (1)
hashPdfFilename
(325-332)server/docxChunks.ts (1)
extractTextAndImagesWithChunksFromDocx
(2412-2765)
🔇 Additional comments (16)
server/server.ts (4)
45-45
: LGTM: schema importImporting microsoftServiceSchema here is correct and scoped.
79-80
: LGTM: handler importAddServiceConnectionMicrosoft import aligns with new route below.
797-801
: LGTM: no functional changeMultiline formatting only; route behavior unchanged.
1061-1065
: microsoftServiceSchema likely rejects valid service-account forms (superRefine scopes).microsoftServiceSchema derives from createOAuthProvider.omit({ scopes: true }). If createOAuthProvider’s superRefine still enforces scopes when isUsingGlobalCred is false, this route will 400 on valid service-account submissions without scopes.
Apply one of:
- Make scopes optional/ignored for service-account schema (or pass isUsingGlobalCred=true).
- Create a separate base schema without the scopes superRefine and use that for service accounts.
Would you like a targeted patch in server/types.ts to conditionally relax scopes for service accounts?
server/integrations/microsoft/sync.ts (7)
185-195
: LGTM: Graph delta paginationSwitching to makeGraphApiCall with pagination is correct and consistent with client helpers.
300-301
: LGTM: unified file download helperUsing downloadFileFromGraph for DOCX/XLSX content fetch simplifies endpoints and error handling.
Also applies to: 317-318
1811-1911
: LGTM: service-account SharePoint change loopClient creation, deltaLinks in-place update, and sync history writes look correct. Ensure upstream fix for deltaLink guarding (above) lands before enabling.
Would you like a dry-run script to print current SharePoint sync jobs’ deltaLinks and confirm they remain non-empty after a run?
83-83
: Invalid import: zod has no named export ‘unknown’.This breaks build. Use z.unknown() via import { z } from "zod" if needed, otherwise remove.
-import { unknown } from "zod" +// removed invalid import; use `import { z } from "zod"` and `z.unknown()` if neededBased on learnings
1250-1252
: Don’t overwrite a valid delta token with an empty value.If Graph omits @odata.deltaLink, you wipe a good token.
- //update with new Link - deltaLinks[driveKey] = newDeltaLink + // Update only if a new deltaLink was returned + if (newDeltaLink) { + deltaLinks[driveKey] = newDeltaLink + } else { + loggerWithChild({ email }).warn( + `No new deltaLink for drive ${driveKey}; preserving existing token`, + ) + }
1297-1299
: Guard substring call on possibly undefined nextLink.substring on undefined will throw.
- loggerWithChild({ email }).info( - `Fetching delta changes from: ${nextLink.substring(0, 100)}...`, - ) + loggerWithChild({ email }).info( + `Fetching delta changes from: ${String(nextLink ?? "").substring(0, 100)}...`, + )
1413-1421
: Remove @microsoft.graph.downloadUrl from persisted metadata.It’s a short‑lived pre‑authorized URL; storing increases leakage risk.
metadata: JSON.stringify({ size: item.size, - downloadUrl: item["@microsoft.graph.downloadUrl"], siteId: siteId, driveId: driveId, parentId: item.parentReference?.id ?? "", parentPath: item.parentReference?.path ?? "/", eTag: item.eTag ?? "", }),
server/integrations/microsoft/index.ts (2)
817-924
: LGTM: service-account ingestion orchestrationDiscovery -> drive processing -> deltaLinks capture -> sync job write is sound. WebSocket progress envelope consistent with metricStream.
832-837
: Fix createMicrosoftGraphClient argument order (tenantId is 5th optional param).Passing tenantId as 4th sets it as refreshToken; tenantId is then lost.
- const graphClient = createMicrosoftGraphClient( - credentials.access_token, - credentials.clientId, - credentials.clientSecret, - credentials.tenantId, - ) + const graphClient = createMicrosoftGraphClient( + credentials.access_token, + credentials.clientId, + credentials.clientSecret, + undefined, + credentials.tenantId, + )#!/bin/bash # Find all call sites and show argument lists to confirm ordering rg -nPU --type=ts '(?s)createMicrosoftGraphClient\s*\((.*?)\)' -C2server/integrations/microsoft/utils.ts (3)
278-318
: LGTM: SharePoint permissions extractionRobustly handles grantedToV2 and parses emails from loginName; dedup via Set is good.
600-642
: LGTM: content extraction wrapperClear type routing; safe fallbacks; avoids processing unsupported types.
26-36
: Ensure download directory exists in production (and allow override).Current code only creates the dir in non‑prod; writes will fail in prod.
-// Download directory setup -export const downloadDir = path.resolve(__dirname, "../../downloads") - -if (process.env.NODE_ENV !== "production") { - const init = () => { - if (!fs.existsSync(downloadDir)) { - fs.mkdirSync(downloadDir, { recursive: true }) - } - } - init() -} +// Download directory setup +export const downloadDir = process.env.DOWNLOAD_DIR + ? path.resolve(process.env.DOWNLOAD_DIR) + : path.resolve(__dirname, "../../downloads") + +if (!fs.existsSync(downloadDir)) { + fs.mkdirSync(downloadDir, { recursive: true }) +}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
server/api/admin.ts (1)
520-521
: Defer “Connected” status until ingestion completesSet to NotConnected (or an “InProgress” equivalent) here; the ingestion flow already sets Connected on success. Avoids showing a false-positive connected state if ingestion fails.
- ConnectorStatus.Connected, // Set as connected since we validated the connection + ConnectorStatus.NotConnected,server/integrations/microsoft/sync.ts (1)
1182-1184
: Implement cleanup for deleted drivesThere’s a TODO to remove files for deleted drives. Leaving stale docs will bloat index and cause stale search results.
I can draft a helper deleteSharePointDriveDocuments(siteId::driveId) that queries by metadata and bulk-deletes matching docs. Want me to add it?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
server/api/admin.ts
(4 hunks)server/integrations/microsoft/sync.ts
(14 hunks)server/utils.ts
(4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.
Applied to files:
server/integrations/microsoft/sync.ts
server/utils.ts
🧬 Code graph analysis (3)
server/api/admin.ts (8)
server/integrations/microsoft/index.ts (2)
loggerWithChild
(82-84)handleMicrosoftServiceAccountIngestion
(817-923)server/db/user.ts (1)
getUserByEmail
(147-156)server/db/connector.ts (2)
db
(3-3)insertConnector
(43-92)server/errors/index.ts (3)
NoUserFound
(322-327)ConnectorNotCreated
(338-343)AddServiceConnectionError
(330-335)server/types.ts (2)
microsoftService
(296-296)MicrosoftServiceCredentials
(462-469)server/integrations/microsoft/utils.ts (1)
CustomServiceAuthProvider
(100-130)server/db/schema/connectors.ts (1)
selectConnectorSchema
(135-139)server/utils.ts (1)
getErrorMessage
(106-109)
server/integrations/microsoft/sync.ts (10)
server/integrations/microsoft/client.ts (4)
makeGraphApiCall
(98-112)downloadFileFromGraph
(193-209)MicrosoftGraphClient
(23-37)createMicrosoftGraphClient
(60-95)server/integrations/microsoft/sharepoint/index.ts (3)
discoverSharePointSites
(12-47)discoverSiteDrives
(50-110)processSiteDrives
(113-246)server/integrations/microsoft/index.ts (1)
loggerWithChild
(82-84)server/utils.ts (1)
getErrorMessage
(106-109)server/search/vespa.ts (3)
getDocumentOrNull
(65-65)DeleteDocument
(67-67)insertWithRetry
(212-212)server/integrations/microsoft/utils.ts (2)
getFilePermissionsSharepoint
(278-318)processFileContent
(600-642)server/db/syncJob.ts (2)
getAppSyncJobs
(35-45)updateSyncJob
(65-87)server/db/connector.ts (2)
db
(3-3)getMicrosoftAuthConnectorWithCredentials
(307-368)server/types.ts (1)
MicrosoftServiceCredentials
(462-469)server/db/syncHistory.ts (1)
insertSyncHistory
(10-30)
server/utils.ts (3)
server/integrations/microsoft/client.ts (2)
MicrosoftGraphClient
(23-37)updateMicrosoftGraphClient
(39-57)server/integrations/microsoft/config.ts (1)
scopes
(2-5)server/integrations/microsoft/utils.ts (1)
CustomServiceAuthProvider
(100-130)
🔇 Additional comments (11)
server/api/admin.ts (3)
497-517
: Confirm secrets are encrypted at restYou serialize clientSecret into credentials. Ensure the connectors.credentials column is transparently encrypted (app-layer or DB-layer). If not, add encryption before insert.
Would you like a patch to wrap credential serialization with the project’s encryption helper?
475-476
: Replace ts-ignore with a type assertion for safetyUse a typed cast instead of suppressing types.
- // @ts-ignore - const form: microsoftService = c.req.valid("form") + const form = c.req.valid("form") as microsoftService
528-528
: Handle the background promise to avoid unhandled rejectionsAdd a .catch() (or await in try/catch) on the ingestion call.
- handleMicrosoftServiceAccountIngestion(email, connector) + handleMicrosoftServiceAccountIngestion(email, connector).catch((error) => { + loggerWithChild({ email }).error( + error, + `Background Microsoft service account ingestion failed for connector ${connector.id}: ${getErrorMessage(error)}`, + ) + })server/utils.ts (3)
213-219
: Good fix: robust Google token refreshNull-safe handling of refreshAccessToken avoids runtime crashes.
220-256
: Microsoft token refresh flow is sensibleBranching between delegated (refresh token) and service principal (client credentials) and updating the Graph client in-place looks correct.
If you’ve seen Graph returning 401 with code other than InvalidAuthenticationToken, consider broadening the condition to include statusCode 401 on GraphError.
315-319
: Expanded Microsoft app detection LGTMIncluding MicrosoftSharepoint ensures retry/backoff paths trigger for SharePoint.
server/integrations/microsoft/sync.ts (5)
1168-1172
: Filter out drives missing siteId before keying deltaLinksPrevents keys like "undefined::driveId".
- drives = drives.filter( - (drive) => - drive.id && - !(`${drive.sharePointIds?.siteId}::${drive.id}` in deltaLinks), - ) + drives = drives.filter( + (drive) => + drive.id && + drive.sharePointIds?.siteId && + !(`${drive.sharePointIds.siteId}::${drive.id}` in deltaLinks), + )
1249-1251
: Don’t overwrite an existing delta token with an empty valueOnly update when a new token is returned.
- //update with new Link - deltaLinks[driveKey] = newDeltaLink + // Update only if a new deltaLink was returned + if (newDeltaLink) { + deltaLinks[driveKey] = newDeltaLink + } else { + loggerWithChild({ email }).warn( + `No new deltaLink for drive ${driveKey}; preserving existing token`, + ) + }
1830-1836
: Correct createMicrosoftGraphClient argument orderService-account client is constructed with (accessToken, clientId, clientSecret, undefined, tenantId) — matches the signature.
Ensure other call sites also follow: createMicrosoftGraphClient(accessToken, clientId, clientSecret, refreshToken?, tenantId?).
1296-1299
: Guard substring on possibly undefined nextLinkAvoids runtime error when nextLink is falsy.
- loggerWithChild({ email }).info( - `Fetching delta changes from: ${nextLink.substring(0, 100)}...`, - ) + loggerWithChild({ email }).info( + `Fetching delta changes from: ${String(nextLink ?? "").substring(0, 100)}...`, + )
1411-1420
: Don’t persist @microsoft.graph.downloadUrl in metadataIt’s short‑lived and pre‑authorized; storing increases leakage risk.
metadata: JSON.stringify({ size: item.size, - downloadUrl: item["@microsoft.graph.downloadUrl"], siteId: siteId, driveId: driveId, parentId: item.parentReference?.id ?? "", parentPath: item.parentReference?.path ?? "/", eTag: item.eTag ?? "", }),
frontend/src/routes/_authenticated/admin/integrations/microsoft.tsx
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/integrations/microsoft/utils.ts (1)
224-233
: Avoid embedding @microsoft.graph.downloadUrl in metadata.Same risk as in SharePoint path — short‑lived pre‑authorized URL.
Apply:
return JSON.stringify({ parentPath: file.parentReference?.path ?? "", parentName: file.parentReference?.name ?? "", size: file.size ?? 0, webUrl: file.webUrl ?? "", - downloadUrl: file["@microsoft.graph.downloadUrl"] ?? "", lastModifiedBy: file.lastModifiedBy?.user?.displayName ?? "", createdBy: file.createdBy?.user?.displayName ?? "", })
🧹 Nitpick comments (11)
server/integrations/microsoft/sync.ts (4)
1409-1414
: Use correct entity from MIME type instead of Misc.Improves search/UX and consistency with OneDrive path.
Apply:
- entity: DriveEntity.Misc, + entity: getEntityFromMimeType(item.file?.mimeType),And add the import (see suggested change for lines 62–67 below).
62-67
: Import getEntityFromMimeType to classify SharePoint files properly.Apply:
import { getFilePermissionsSharepoint, MicrosoftMimeType, microsoftMimeTypeMap, processFileContent, type OneDriveFile, + getEntityFromMimeType, } from "./utils"
1249-1253
: Preserve existing delta token if no new token returned.Currently updates only when non-empty; log when preserved to aid debugging.
Apply:
- if (newDeltaLink !== "") { - //update with new Link - deltaLinks[driveKey] = newDeltaLink - } + if (newDeltaLink) { + deltaLinks[driveKey] = newDeltaLink + } else { + loggerWithChild({ email }).warn( + `No new deltaLink for drive ${driveKey}; preserving existing token`, + ) + }
1370-1374
: Don't throw for “not found in Vespa” on delete path.Deletion events can arrive after cleanup; skip throwing to reduce noisy error logs.
Apply:
- } else { - throw new Error("File not found in vespa") - } + }server/utils.ts (1)
199-202
: Expand 401 detection for Microsoft/Graph.Graph errors often use status/statusCode; include those for reliability.
Apply:
- (error.code === 401 || error.code === "InvalidAuthenticationToken") && + (error.code === 401 || + error.status === 401 || + error.statusCode === 401 || + error.code === "InvalidAuthenticationToken") &&server/integrations/microsoft/utils.ts (2)
38-51
: Rename deleteDocument helper to avoid confusion with Vespa DeleteDocument.Same name as search/vespa.DeleteDocument; can confuse readers and tooling.
Consider renaming to deleteTempFile and updating call sites.
455-456
: Use Logger instead of console.error for consistency.Adopt structured logging for errors.
Replace console.error with Logger.error(error, '...') in these blocks.
Also applies to: 508-509, 593-594
server/db/connector.ts (1)
321-325
: Use a more accurate error type name for service-account connectors.Throwing NoOauthConnectorFound for ServiceAccount is misleading.
Consider introducing NoServiceConnectorFound or reusing a neutral NotFound error.
server/api/admin.ts (3)
519-521
: Set initial connector status to Connecting; flip to Connected after ingestion.Prevents UI showing “Connected” while heavy ingestion runs.
Apply:
- ConnectorStatus.Connected, // Set as connected since we validated the connection + ConnectorStatus.Connecting,And let the ingestion flow set Connected on success (it already updates status in ingestion).
527-528
: Avoid blocking the request on ingestion; run in background with error handling.Ingestion can be long‑running; prefer fire‑and‑forget with a catch.
Apply:
- await handleMicrosoftServiceAccountIngestion(email, connector) + void handleMicrosoftServiceAccountIngestion(email, connector).catch( + (error) => { + loggerWithChild({ email: sub }).error( + error, + `Background Microsoft service account ingestion failed for connector ${connector.id}: ${getErrorMessage(error)}`, + ) + }, + )
474-475
: Replace ts-ignore with a type assertion for form.Keeps type safety without suppressing errors.
Apply:
- // @ts-ignore - const form: microsoftService = c.req.valid("form") + const form = c.req.valid("form") as microsoftService
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
frontend/src/routes/_authenticated/admin/integrations/microsoft.tsx
(5 hunks)server/api/admin.ts
(4 hunks)server/db/connector.ts
(5 hunks)server/integrations/microsoft/sharepoint/index.ts
(1 hunks)server/integrations/microsoft/sync.ts
(14 hunks)server/integrations/microsoft/utils.ts
(3 hunks)server/package.json
(3 hunks)server/types.ts
(5 hunks)server/utils.ts
(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- server/package.json
- server/integrations/microsoft/sharepoint/index.ts
- server/types.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.
Applied to files:
server/integrations/microsoft/sync.ts
🧬 Code graph analysis (6)
frontend/src/routes/_authenticated/admin/integrations/microsoft.tsx (2)
frontend/src/api.ts (1)
api
(4-4)server/shared/types.ts (1)
Apps
(38-38)
server/integrations/microsoft/utils.ts (5)
server/integrations/microsoft/client.ts (4)
MicrosoftGraphClient
(23-37)makeGraphApiCall
(98-112)downloadFileFromGraph
(193-209)Client
(249-249)server/integrations/microsoft/index.ts (1)
loggerWithChild
(82-84)server/utils.ts (1)
hashPdfFilename
(328-335)server/chunks.ts (1)
chunkDocument
(123-133)server/docxChunks.ts (1)
extractTextAndImagesWithChunksFromDocx
(2412-2765)
server/utils.ts (3)
server/integrations/microsoft/client.ts (2)
MicrosoftGraphClient
(23-37)updateMicrosoftGraphClient
(39-57)server/integrations/microsoft/config.ts (1)
scopes
(2-5)server/integrations/microsoft/utils.ts (1)
CustomServiceAuthProvider
(100-130)
server/api/admin.ts (8)
server/integrations/microsoft/index.ts (2)
loggerWithChild
(82-84)handleMicrosoftServiceAccountIngestion
(817-923)server/db/user.ts (1)
getUserByEmail
(147-156)server/db/connector.ts (2)
db
(3-3)insertConnector
(42-91)server/errors/index.ts (3)
NoUserFound
(322-327)ConnectorNotCreated
(338-343)AddServiceConnectionError
(330-335)server/types.ts (2)
microsoftService
(295-295)MicrosoftServiceCredentials
(461-468)server/integrations/microsoft/utils.ts (1)
CustomServiceAuthProvider
(100-130)server/db/schema/connectors.ts (1)
selectConnectorSchema
(135-139)server/utils.ts (1)
getErrorMessage
(103-106)
server/integrations/microsoft/sync.ts (10)
server/integrations/microsoft/client.ts (4)
makeGraphApiCall
(98-112)downloadFileFromGraph
(193-209)MicrosoftGraphClient
(23-37)createMicrosoftGraphClient
(60-95)server/integrations/microsoft/sharepoint/index.ts (3)
discoverSharePointSites
(12-47)discoverSiteDrives
(50-110)processSiteDrives
(113-252)server/integrations/microsoft/index.ts (1)
loggerWithChild
(82-84)server/utils.ts (1)
getErrorMessage
(103-106)server/search/vespa.ts (3)
getDocumentOrNull
(65-65)DeleteDocument
(67-67)insertWithRetry
(212-212)server/integrations/microsoft/utils.ts (2)
getFilePermissionsSharepoint
(278-318)processFileContent
(601-644)server/db/syncJob.ts (2)
getAppSyncJobs
(35-45)updateSyncJob
(65-87)server/db/connector.ts (2)
db
(3-3)getMicrosoftAuthConnectorWithCredentials
(306-367)server/types.ts (1)
MicrosoftServiceCredentials
(461-468)server/db/syncHistory.ts (1)
insertSyncHistory
(10-30)
server/db/connector.ts (5)
server/utils.ts (2)
IsGoogleApp
(303-310)IsMicrosoftApp
(312-319)server/types.ts (3)
OAuthCredentials
(312-318)TxnOrClient
(310-310)MicrosoftServiceCredentials
(461-468)server/db/schema/connectors.ts (3)
SelectConnector
(146-146)connectors
(60-109)selectConnectorSchema
(135-139)server/errors/index.ts (2)
NoOauthConnectorFound
(290-295)MissingOauthConnectorCredentialsError
(262-271)server/integrations/microsoft/utils.ts (1)
CustomServiceAuthProvider
(100-130)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (9)
frontend/src/routes/_authenticated/admin/integrations/microsoft.tsx (1)
61-62
: Fix scopes payload: drop empty entries
scopes: [value.scopes, ""]
still pushes an empty scope to the provider, which breaks delegated auth. Please send only non-empty scope strings.- scopes: [value.scopes, ""], + scopes: value.scopes + .split(/[,\s]+/) + .map((scope) => scope.trim()) + .filter(Boolean),server/integrations/microsoft/sync.ts (3)
1298-1301
: Guard nextLink before substring.If nextLink is empty/undefined, substring throws.
Apply:
- loggerWithChild({ email }).info( - `Fetching delta changes from: ${nextLink.substring(0, 100)}...`, - ) + loggerWithChild({ email }).info( + `Fetching delta changes from: ${String(nextLink ?? "").substring(0, 100)}...`, + )
1167-1172
: Filter out drives missing siteId before initial ingestion.Avoid keys like "undefined::driveId".
Apply:
drives = drives.filter( (drive) => - drive.id && - !(`${drive.sharePointIds?.siteId}::${drive.id}` in deltaLinks), + drive.id && + drive.sharePointIds?.siteId && + !(`${drive.sharePointIds.siteId}::${drive.id}` in deltaLinks), )
1414-1422
: Do not persist @microsoft.graph.downloadUrl in metadata.Pre-authorized, short‑lived URL; storing increases leakage risk and has little value. Remove it.
Apply:
metadata: JSON.stringify({ size: item.size, - downloadUrl: item["@microsoft.graph.downloadUrl"], siteId: siteId, driveId: driveId, parentId: item.parentReference?.id ?? "", parentPath: item.parentReference?.path ?? "/", eTag: item.eTag ?? "", }),
server/integrations/microsoft/utils.ts (2)
26-36
: Ensure download directory exists in all environments.Only creating it in non-prod breaks PDF processing in prod.
Apply:
-// Download directory setup -export const downloadDir = path.resolve(__dirname, "../../downloads") - -if (process.env.NODE_ENV !== "production") { - const init = () => { - if (!fs.existsSync(downloadDir)) { - fs.mkdirSync(downloadDir, { recursive: true }) - } - } - init() -} +// Download directory setup +export const downloadDir = process.env.DOWNLOAD_DIR + ? path.resolve(process.env.DOWNLOAD_DIR) + : path.resolve(__dirname, "../../downloads") + +if (!fs.existsSync(downloadDir)) { + fs.mkdirSync(downloadDir, { recursive: true }) +}
278-318
: Permissions parsing logic looks solid.Covers modern and legacy shapes, dedupes, and handles fallbacks.
server/db/connector.ts (1)
306-367
: Service-account token refresh and persistence LGTM.Expired token refresh via ClientSecretCredential and writing updated credentials back to DB is correct.
server/api/admin.ts (1)
132-136
: Remove unused Azure/Microsoft Graph imports.Not used; keep surface minimal.
Apply:
-import { ClientSecretCredential } from "@azure/identity" -import { Client as GraphClient } from "@microsoft/microsoft-graph-client" -import type { AuthenticationProvider } from "@microsoft/microsoft-graph-client"server/utils.ts (1)
199-207
: Fix unsafe non-null assertion in Google token refresh.refreshAccessToken can fail; current code can throw at runtime.
Apply:
- if (authClient instanceof OAuth2Client) { + if (authClient instanceof OAuth2Client) { if (IsGoogleApp(app)) { Logger.info(`401 encountered, refreshing OAuth access token...`) - const { credentials } = await authClient?.refreshAccessToken()! - authClient?.setCredentials(credentials) + const refreshResult = await authClient.refreshAccessToken() + if (!refreshResult?.credentials) { + throw new Error("Failed to refresh Google OAuth access token") + } + const { credentials } = refreshResult + authClient.setCredentials(credentials) return retryWithBackoff( fn, context, app, retries + 1, authClient, )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
server/integrations/microsoft/client.ts (4)
21-36
: getCurrentTokens returns stale values after token refreshgetCurrentTokens closes over constructor arguments; updateMicrosoftGraphClient mutates fields but getCurrentTokens keeps returning the original tokens. Read from object fields instead.
Apply these diffs to (a) add expiresAt to the interface, (b) have getCurrentTokens read current fields:
export interface MicrosoftGraphClient { client: Client accessToken: string + expiresAt?: Date refreshToken?: string // Only for delegated clients tenantId?: string // Only for service clients clientId: string clientSecret: string betaClient: Client // Helper methods to get updated tokens after refresh getCurrentTokens(): { accessToken: string refreshToken?: string expiresAt?: Date } }
export const createMicrosoftGraphClient = ( accessToken: string, clientId: string, clientSecret: string, refreshToken?: string, tenantId?: string, tokenExpiresAt?: Date, ): MicrosoftGraphClient => { const authProvider = new CustomAuthProvider(accessToken) const client = Client.initWithMiddleware({ authProvider, defaultVersion: "v1.0", }) const betaClient = Client.initWithMiddleware({ authProvider, defaultVersion: "beta", }) - return { - client, - accessToken, - refreshToken, - tenantId, - clientId, - clientSecret, - betaClient, - getCurrentTokens() { - return { - accessToken, - refreshToken, - expiresAt: tokenExpiresAt, - } - }, - } + const graph: MicrosoftGraphClient = { + client, + betaClient, + accessToken, + refreshToken, + tenantId, + clientId, + clientSecret, + expiresAt: tokenExpiresAt, + getCurrentTokens() { + return { + accessToken: graph.accessToken, + refreshToken: graph.refreshToken, + expiresAt: graph.expiresAt, + } + }, + } + return graph }
96-111
: Normalize absolute Graph URLs (supports @odata.nextLink/@odata.deltaLink)makeGraphApiCall fails when endpoint is an absolute URL. Normalize like makePagedGraphApiCall does.
export const makeGraphApiCall = async ( graphClient: MicrosoftGraphClient, endpoint: string, ): Promise<any> => { return retryWithBackoff( async () => { - const result = await graphClient.client.api(endpoint).get() + let path = endpoint + if (endpoint.startsWith("http")) { + const url = new URL(endpoint) + path = url.pathname + url.search + } + const result = await graphClient.client.api(path).get() return result }, `Making Microsoft Graph API call to ${endpoint}`, Apps.MicrosoftDrive, 1, graphClient, ) }
112-127
: Same absolute-URL issue for beta clientMirror the normalization for makeBetaGraphApiCall.
return retryWithBackoff( async () => { - const result = await graphClient.betaClient.api(endpoint).get(options) + let path = endpoint + if (endpoint.startsWith("http")) { + const url = new URL(endpoint) + path = url.pathname + url.search + } + const result = await graphClient.betaClient.api(path).get(options) return result },
128-151
: Handle absolute URLs and keep headers on normalized requestsNormalize endpoint and keep custom headers on the request.
return retryWithBackoff( async () => { - const request = graphClient.client.api(endpoint) + let path = endpoint + if (endpoint.startsWith("http")) { + const url = new URL(endpoint) + path = url.pathname + url.search + } + const request = graphClient.client.api(path) // Add custom headers Object.entries(headers).forEach(([key, value]) => { request.header(key, value) }) const result = await request.get(options) return resultserver/integrations/microsoft/sync.ts (1)
741-748
: Avoid calling Graph with empty endpoint for DeletedItemsdeletedItemsEndpoint defaults to "", leading to a request to the service root. Guard or start with the proper delta URL only if available.
- const deletedItemsResponse = await makeGraphApiCall( - client, - deletedItemsEndpoint, - ) + if (!deletedItemsEndpoint) { + Logger.info("Skipping DeletedItems delta: no token yet") + return { newDeltaTokens, stats, changesExist } + } + const deletedItemsResponse = await makeGraphApiCall(client, deletedItemsEndpoint)server/integrations/microsoft/index.ts (1)
951-957
: Critical: Incorrect argument order for createMicrosoftGraphClient in OAuth flow
Function signature is(accessToken, clientId, clientSecret, refreshToken?)
, but the code passesrefresh_token
as the second argument and shiftsclientId
/clientSecret
, which breaks token refreshing.
- In
server/integrations/microsoft/index.ts
(lines 951–957), swap the 2nd–4th parameters:- createMicrosoftGraphClient( - oauthTokens.access_token, - oauthTokens.refresh_token, - microsoftProvider.clientId!, - microsoftProvider.clientSecret! as string, - ) + createMicrosoftGraphClient( + oauthTokens.access_token, + microsoftProvider.clientId!, + microsoftProvider.clientSecret! as string, + oauthTokens.refresh_token +)
- Likewise, update the usage example in
docs/microsoft-integration/README.md
to pass(accessToken, clientId, clientSecret, refreshToken)
.
🧹 Nitpick comments (16)
server/integrations/microsoft/client.ts (1)
153-189
: Parity: pass options on absolute @odata.nextLink branchPass options in the absolute-URL branch for consistency.
- return await graphClient.client.api(path).get() + return await graphClient.client.api(path).get(options)server/integrations/microsoft/sharepoint/index.ts (2)
146-151
: Absolute nextLink handling depends on client fix@odata.nextLink will be absolute; with the proposed client helper normalization, this loop will work with both relative and absolute endpoints. If you don’t adopt that change, normalize here.
135-137
: Improve log contextLog siteId instead of repeating the drive name twice.
- loggerWithChild({ email: userEmail }).info( - `Processing drive: ${siteDrive.name} from site: ${siteDrive.name}`, - ) + loggerWithChild({ email: userEmail }).info( + `Processing drive: ${siteDrive.name} from siteId: ${siteDrive.sharePointIds.siteId}`, + )server/integrations/microsoft/sync.ts (3)
166-196
: Absolute URL endpoints in delta flowsgetOneDriveDelta may pass absolute URLs to makeGraphApiCall. With the proposed normalization in client helpers, this becomes safe. Without it, normalize here before calling.
1409-1411
: Use mime-type mapping for entity instead of hardcoding MiscAlign with other paths and improve result fidelity.
- entity: DriveEntity.Misc, + entity: getEntityFromMimeType(item.file?.mimeType),Additionally, import it:
-import { +import { getFilePermissionsSharepoint, MicrosoftMimeType, microsoftMimeTypeMap, processFileContent, loggerWithChild, + getEntityFromMimeType, type OneDriveFile, } from "./utils"
347-355
: Use retry/backoff wrapper for text extraction requestsThis direct client call bypasses retry/backoff and absolute-URL normalization patterns. Use the header-capable helper.
- const contentResponse = await graphClient.client - .api(`/me/drive/items/${file.id}/content`) - .header("Accept", "text/plain") - .get() + const contentResponse = await makeGraphApiCallWithHeaders( + graphClient, + `/me/drive/items/${file.id}/content`, + { Accept: "text/plain" }, + )server/integrations/microsoft/index.ts (4)
835-845
: Progress interval cleanup: stop using fixed 8s timeout; clear in finally.The setTimeout-based clearInterval after 8s can stop updates mid-run or keep running after completion. Clear the interval deterministically on success and on error.
Apply this diff:
- const interval = setInterval(() => { + const interval = setInterval(() => { sendWebsocketMessage( JSON.stringify({ progress: tracker.getProgress(), userStats: tracker.getServiceAccountProgress().userStats, startTime: tracker.getStartTime(), }), connector!.externalId, ) }, 4000) - - // const driveTokens = {} - setTimeout(() => { - clearInterval(interval) - }, 8000) + // Ensure cleanup happens regardless of outcomeAnd add clears on both success and error paths:
await insertSyncJob(trx, { ... }) loggerWithChild({ email: email! }).info( `Microsoft SharePoint service account ingestion completed (jobId: ${jobId})`, ) - closeWs(connector!.externalId) + clearInterval(interval) + closeWs(connector!.externalId)} catch (error) { loggerWithChild({ email: email! }).error( error, `handleMicrosoftServiceAccountIngestion (jobId: ${jobId}) failed: ${(error as Error).message}`, ) if (connector) { await db .update(connectors) .set({ status: ConnectorStatus.Failed, }) .where(eq(connectors.id, connector.id)) - closeWs(connector.externalId) + clearInterval(interval) + closeWs(connector.externalId) }Also note: Tracker.getProgress() for Microsoft returns 0 by design; consider emitting a computed percent from tracker.userStats to avoid a “stuck at 0%” UI.
Also applies to: 861-864
940-950
: Same interval cleanup issue in OAuth ingestion.Use finally-style cleanup instead of a fixed 8s timeout to avoid stale or prematurely stopped updates.
Apply this diff:
- const interval = setInterval(() => { + const interval = setInterval(() => { sendWebsocketMessage( JSON.stringify({ progress: tracker.getProgress(), userStats: tracker.getOAuthProgress().userStats, startTime: tracker.getStartTime(), }), connector.externalId, ) }, 4000)- setTimeout(() => { - clearInterval(interval) - }, 8000) + // On success: + // clearInterval(interval) just before closeWs(connector.externalId) + // On error: + // clearInterval(interval) before marking Failed and closing WSAlso applies to: 987-990
853-858
: Heads-up: N+1 Graph calls for permissions can be expensive.processSiteDrives fetches item permissions per file, which can be a major latency and throttling hotspot for large drives. Consider:
- Batch requests or defer per-item permission expansion to a follow-up pass.
- Limit concurrency with a queue (e.g., p-limit) and exponential backoff on 429s.
- Cache group membership lookups if you expand groups.
68-74
: Remove unused imports
BothClient
andInvokeModelResponseFilterSensitiveLog
are not used in this file; please remove them.--- a/server/integrations/microsoft/index.ts @@ lines 68,73 - import { Client } from "@microsoft/microsoft-graph-client" - import type { InvokeModelResponseFilterSensitiveLog } from "@aws-sdk/client-bedrock-runtime"server/integrations/microsoft/utils.ts (6)
42-54
: Throw a typed error for deleteDocument to align with error taxonomy.Use DeleteDocumentError (as used in Google integration) for consistency and observability.
Apply this diff:
+import { DeleteDocumentError } from "@/errors" ... } catch (err) { - Logger.error( - err, - `Error deleting file at ${filePath}: ${err} ${(err as Error).stack}`, - err, - ) - throw new Error(`Error deleting file: ${(err as Error).message}`) + Logger.error( + err, + `Error deleting file at ${filePath}: ${err} ${(err as Error).stack}`, + err, + ) + throw new DeleteDocumentError({ + message: "Error deleting file", + cause: err as Error, + integration: Apps.MicrosoftDrive, + entity: DriveEntity.PDF, + }) }
398-464
: PDF processing: add size guard, fill owner/permissions, and use Logger.
- Skip oversized files before downloading.
- Populate owner and permissions (OneDrive vs SharePoint).
- Use Logger instead of console.error.
Apply this diff:
for (const file of pdfFiles) { try { + // Skip large files + if (isFileTooLarge(file.size)) { + Logger.warn( + `Skipping PDF ${file.name} as it's larger than ${MAX_ONEDRIVE_FILE_SIZE}MB`, + ) + continue + } // Download PDF content const pdfBuffer = await downloadFileFromGraph( graphClient, file.id!, file.parentReference?.driveId!, ) @@ // Create Vespa document structure - results.push({ + const owner = + file.createdBy?.user?.displayName ?? + file.lastModifiedBy?.user?.displayName ?? + userEmail + const permissions = file.parentReference?.driveId + ? await getFilePermissionsSharepoint( + graphClient, + file.id!, + file.parentReference.driveId!, + ) + : await getFilePermissions(graphClient, file.id!) + + results.push({ title: file.name!, url: file.webUrl ?? "", app: Apps.MicrosoftDrive, docId: file.id!, parentId: file.parentReference?.id ?? null, - owner: "", // Extract from file.createdBy if available + owner, photoLink: "", ownerEmail: userEmail, entity: DriveEntity.PDF, - chunks: chunks.map((c: any) => c.chunk), - permissions: [], // Process file.permissions if available + chunks: chunks.map((c: any) => c.chunk), + permissions, mimeType: file.file?.mimeType ?? "", @@ - } catch (error) { - console.error(`Error processing PDF ${file.name}:`, error) + } catch (error) { + Logger.error(error, `Error processing PDF ${file.name}`) continue } }
467-517
: Word processing: robust app detection, size guard, owner/permissions, Logger.
- Don’t infer app from refreshToken; detect SharePoint via parentReference.driveId.
- Add size guard and fill owner/permissions, unify Logger.
Apply this diff:
for (const file of wordFiles) { try { + if (isFileTooLarge(file.size)) { + Logger.warn( + `Skipping DOCX ${file.name} as it's larger than ${MAX_ONEDRIVE_FILE_SIZE}MB`, + ) + continue + } // Download DOCX content @@ - results.push({ + const app = + file.parentReference?.driveId ? Apps.MicrosoftSharepoint : Apps.MicrosoftDrive + const owner = + file.createdBy?.user?.displayName ?? + file.lastModifiedBy?.user?.displayName ?? + userEmail + const permissions = file.parentReference?.driveId + ? await getFilePermissionsSharepoint( + graphClient, + file.id!, + file.parentReference.driveId!, + ) + : await getFilePermissions(graphClient, file.id!) + + results.push({ title: file.name!, url: file.webUrl ?? "", - app: graphClient.refreshToken - ? Apps.MicrosoftDrive - : Apps.MicrosoftSharepoint, + app, docId: file.id!, parentId: file.parentReference?.id ?? null, - owner: "", + owner, photoLink: "", ownerEmail: userEmail, entity: DriveEntity.WordDocument, // Reuse Google's entity types chunks: extractedContent.text_chunks || [], // Use text_chunks property - permissions: [], + permissions, mimeType: file.file?.mimeType ?? "", @@ - } catch (error) { - console.error(`Error processing Word document ${file.name}:`, error) + } catch (error) { + Logger.error(error, `Error processing Word document ${file.name}`) continue } }
520-602
: Consistent logging: replace console.error with Logger in Excel processing.Stay consistent with module logging.
Apply this diff:
- console.error( - `Error processing worksheet ${worksheet.name}:`, - worksheetError, - ) + Logger.error(worksheetError, `Error processing worksheet ${worksheet.name}`) @@ - } catch (error) { - console.error(`Error processing Excel file ${file.name}:`, error) + } catch (error) { + Logger.error(error, `Error processing Excel file ${file.name}`) continue }
604-647
: Consistent logging in processFileContent catch.Use Logger to capture stack and context.
Apply this diff:
} catch (error) { - console.error(`Error processing file content for ${file.name}:`, error) + Logger.error(error, `Error processing file content for ${file.name}`) return [] // Fallback to metadata only on error }
664-671
: Normalize lastSyncedAt to ISO string for storage consistency.Elsewhere you store ISO strings; keeping types consistent reduces parsing issues.
Apply this diff:
return { type: "microsoftDriveDeltaToken" as const, driveToken: "", // Will be set after first sync contactsToken: "", // For future contacts sync - lastSyncedAt: new Date(), + lastSyncedAt: new Date().toISOString(), }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
server/integrations/microsoft/client.ts
(7 hunks)server/integrations/microsoft/index.ts
(3 hunks)server/integrations/microsoft/sharepoint/index.ts
(1 hunks)server/integrations/microsoft/sync.ts
(14 hunks)server/integrations/microsoft/utils.ts
(3 hunks)server/package.json
(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- server/package.json
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.
Applied to files:
server/integrations/microsoft/sync.ts
server/integrations/microsoft/index.ts
server/integrations/microsoft/utils.ts
server/integrations/microsoft/sharepoint/index.ts
📚 Learning: 2025-06-16T11:56:22.752Z
Learnt from: oindrila-b
PR: xynehq/xyne#545
File: server/integrations/google/index.ts:137-145
Timestamp: 2025-06-16T11:56:22.752Z
Learning: In server/integrations/google/index.ts, both Logger (base logger) and loggerWithChild (factory for email-scoped child loggers) are intentionally maintained to handle different logging scenarios - one for when email context is not available and one for when it is available.
Applied to files:
server/integrations/microsoft/utils.ts
🧬 Code graph analysis (5)
server/integrations/microsoft/sync.ts (9)
server/integrations/microsoft/client.ts (4)
makeGraphApiCall
(97-111)downloadFileFromGraph
(192-208)MicrosoftGraphClient
(22-36)createMicrosoftGraphClient
(59-94)server/integrations/microsoft/sharepoint/index.ts (3)
discoverSharePointSites
(11-46)discoverSiteDrives
(49-109)processSiteDrives
(112-251)server/integrations/microsoft/utils.ts (3)
loggerWithChild
(23-25)getFilePermissionsSharepoint
(281-321)processFileContent
(604-647)server/utils.ts (1)
getErrorMessage
(103-106)server/search/vespa.ts (3)
getDocumentOrNull
(65-65)DeleteDocument
(67-67)insertWithRetry
(212-212)server/db/syncJob.ts (2)
getAppSyncJobs
(35-45)updateSyncJob
(65-87)server/db/connector.ts (2)
db
(3-3)getMicrosoftAuthConnectorWithCredentials
(306-367)server/types.ts (1)
MicrosoftServiceCredentials
(461-468)server/db/syncHistory.ts (1)
insertSyncHistory
(10-30)
server/integrations/microsoft/client.ts (1)
server/shared/types.ts (1)
Apps
(38-38)
server/integrations/microsoft/index.ts (7)
server/db/schema/connectors.ts (2)
SelectConnector
(146-146)connectors
(60-109)server/integrations/microsoft/utils.ts (1)
loggerWithChild
(23-25)server/integrations/tracker.ts (1)
Tracker
(89-325)server/types.ts (1)
MicrosoftServiceCredentials
(461-468)server/integrations/microsoft/client.ts (1)
createMicrosoftGraphClient
(59-94)server/integrations/metricStream.ts (2)
sendWebsocketMessage
(11-16)closeWs
(5-7)server/integrations/microsoft/sharepoint/index.ts (3)
discoverSharePointSites
(11-46)discoverSiteDrives
(49-109)processSiteDrives
(112-251)
server/integrations/microsoft/utils.ts (6)
server/integrations/google/index.ts (4)
loggerWithChild
(147-149)downloadDir
(2431-2431)deleteDocument
(1467-1484)safeLoadPDF
(2479-2495)server/logger/index.ts (3)
getLoggerWithChild
(192-200)Subsystem
(15-15)getLogger
(36-93)server/integrations/microsoft/client.ts (4)
MicrosoftGraphClient
(22-36)makeGraphApiCall
(97-111)downloadFileFromGraph
(192-208)Client
(248-248)server/utils.ts (1)
hashPdfFilename
(328-335)server/chunks.ts (1)
chunkDocument
(123-133)server/docxChunks.ts (1)
extractTextAndImagesWithChunksFromDocx
(2412-2765)
server/integrations/microsoft/sharepoint/index.ts (4)
server/integrations/microsoft/client.ts (2)
MicrosoftGraphClient
(22-36)makeGraphApiCall
(97-111)server/integrations/microsoft/utils.ts (4)
loggerWithChild
(23-25)getFilePermissionsSharepoint
(281-321)getEntityFromMimeType
(217-223)processFileContent
(604-647)server/integrations/tracker.ts (1)
Tracker
(89-325)server/search/vespa.ts (1)
insertWithRetry
(212-212)
🔇 Additional comments (7)
server/integrations/microsoft/client.ts (1)
38-56
: Propagate refreshed expiry and keep client state consistentAdd expiresAt to the updater and assign it to the instance. This also addresses the previously raised token-expiry propagation concern.
export const updateMicrosoftGraphClient = ( graphClient: MicrosoftGraphClient, accessToken: string, - refreshToken?: string, + refreshToken?: string, + expiresAt?: Date, ) => { graphClient.accessToken = accessToken if (refreshToken) graphClient.refreshToken = refreshToken + if (expiresAt) graphClient.expiresAt = expiresAt const authProvider = new CustomAuthProvider(accessToken) graphClient.client = Client.initWithMiddleware({ authProvider, defaultVersion: "v1.0", }) graphClient.betaClient = Client.initWithMiddleware({ authProvider, defaultVersion: "beta", }) }server/integrations/microsoft/sharepoint/index.ts (1)
174-184
: Remove @microsoft.graph.downloadUrl from persisted metadataThis URL is short‑lived and pre‑authorized; persisting increases leakage risk without value.
metadata: JSON.stringify({ size: item.size, - downloadUrl: item["@microsoft.graph.downloadUrl"], siteId: siteDrive.sharePointIds?.siteId, driveId: siteDrive.id, driveName: siteDrive.name, driveType: siteDrive.driveType, parentId: item.parentReference?.id ?? "", parentPath: item.parentReference?.path ?? "/", eTag: item.eTag ?? "", }),
server/integrations/microsoft/sync.ts (3)
1298-1300
: Guard substring on possibly empty/undefined nextLinkAvoid runtime error when nextLink is falsy.
- loggerWithChild({ email }).info( - `Fetching delta changes from: ${nextLink.substring(0, 100)}...`, - ) + loggerWithChild({ email }).info( + `Fetching delta changes from: ${String(nextLink ?? "").substring(0, 100)}...`, + )
1415-1422
: Do not persist @microsoft.graph.downloadUrl in metadataSecurity/privacy risk; short‑lived pre‑authorized URL should not be stored.
metadata: JSON.stringify({ size: item.size, - downloadUrl: item["@microsoft.graph.downloadUrl"], siteId: siteId, driveId: driveId, parentId: item.parentReference?.id ?? "", parentPath: item.parentReference?.path ?? "/", eTag: item.eTag ?? "", }),
1150-1192
: Filter out drives with missing siteId before initial ingestionPrevents keys like "undefined::driveId" and ingestion of malformed entries.
- //Filters out newly added drives - drives = drives.filter( - (drive) => - drive.id && - !(`${drive.sharePointIds?.siteId}::${drive.id}` in deltaLinks), - ) + // Filters out newly added drives; ensure siteId exists + drives = drives.filter( + (drive) => + Boolean(drive.id) && + Boolean(drive.sharePointIds?.siteId) && + !(`${drive.sharePointIds!.siteId}::${drive.id}` in deltaLinks), + )server/integrations/microsoft/utils.ts (2)
29-39
: Ensure download directory in all environments (not only non-prod).This was flagged earlier; current logic only creates the directory in non-production, causing failures in prod.
Apply this diff:
-// Download directory setup -export const downloadDir = path.resolve(__dirname, "../../downloads") - -if (process.env.NODE_ENV !== "production") { - const init = () => { - if (!fs.existsSync(downloadDir)) { - fs.mkdirSync(downloadDir, { recursive: true }) - } - } - init() -} +// Download directory setup +export const downloadDir = + process.env.DOWNLOAD_DIR + ? path.resolve(process.env.DOWNLOAD_DIR) + : path.resolve(__dirname, "../../downloads") + +try { + if (!fs.existsSync(downloadDir)) { + fs.mkdirSync(downloadDir, { recursive: true }) + } +} catch (e) { + const fallback = path.resolve(require("os").tmpdir(), "xyne_downloads") + if (!fs.existsSync(fallback)) fs.mkdirSync(fallback, { recursive: true }) + // eslint-disable-next-line @typescript-eslint/ban-ts-comment + // @ts-ignore reassign to known-good path + ;(exports as any).downloadDir = fallback + Logger.warn({ fallback }, "Falling back to tmp download dir") +}
102-133
: Comment inaccurate: CustomServiceAuthProvider is used in server/utils.ts, server/db/connector.ts, and server/api/admin.ts, so it’s not unused.Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/integrations/microsoft/index.ts (1)
953-958
: SwaprefreshToken
andclientId
/clientSecret
arguments in OAuth flowApply this in
server/integrations/microsoft/index.ts
:- const graphClient = createMicrosoftGraphClient( - oauthTokens.access_token, - oauthTokens.refresh_token, - microsoftProvider.clientId!, - microsoftProvider.clientSecret! as string, - ) + const graphClient = createMicrosoftGraphClient( + oauthTokens.access_token, + microsoftProvider.clientId!, + microsoftProvider.clientSecret! as string, + oauthTokens.refresh_token, + microsoftProvider.tenantId, // include if available + )
🧹 Nitpick comments (3)
server/integrations/microsoft/index.ts (3)
828-834
: Pass token expiry when constructing service-account Graph client (optional but helpful).Improves observability/refresh logic by preserving expiresAt.
const graphClient = createMicrosoftGraphClient( credentials.access_token, credentials.clientId, credentials.clientSecret, undefined, credentials.tenantId, + credentials.expires_at ? new Date(credentials.expires_at) : undefined, )
6-6
: Remove unused imports.Keep the module light and lint‑clean.
-import { - Subsystem, - SyncCron, - type MicrosoftServiceCredentials, - type OAuthCredentials, - type SaaSJob, - type SaaSOAuthJob, -} from "@/types" +import { + Subsystem, + SyncCron, + type MicrosoftServiceCredentials, + type OAuthCredentials, + type SaaSOAuthJob, +} from "@/types" @@ -import { getConnector, getOAuthConnectorWithCredentials } from "@/db/connector" +import { getOAuthConnectorWithCredentials } from "@/db/connector" @@ -import { Client } from "@microsoft/microsoft-graph-client" +// (removed) unused @@ -import type { InvokeModelResponseFilterSensitiveLog } from "@aws-sdk/client-bedrock-runtime" +// (removed) unusedIf your linting still flags others (e.g., DriveItem), we can prune them too.
Also applies to: 9-9, 68-68, 73-73
848-849
: Nit: prefer const for sites.It isn’t reassigned.
- let sites = await discoverSharePointSites(graphClient, email!) + const sites = await discoverSharePointSites(graphClient, email!)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
server/integrations/microsoft/index.ts
(3 hunks)server/package.json
(1 hunks)server/server.ts
(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- server/server.ts
- server/package.json
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.
Applied to files:
server/integrations/microsoft/index.ts
🧬 Code graph analysis (1)
server/integrations/microsoft/index.ts (8)
server/db/schema/connectors.ts (2)
SelectConnector
(146-146)connectors
(60-109)server/integrations/microsoft/utils.ts (1)
loggerWithChild
(23-25)server/integrations/tracker.ts (1)
Tracker
(89-325)server/types.ts (1)
MicrosoftServiceCredentials
(461-468)server/integrations/microsoft/client.ts (1)
createMicrosoftGraphClient
(59-94)server/integrations/metricStream.ts (2)
sendWebsocketMessage
(11-16)closeWs
(5-7)server/integrations/microsoft/sharepoint/index.ts (3)
discoverSharePointSites
(11-46)discoverSiteDrives
(49-109)processSiteDrives
(112-251)server/errors/index.ts (1)
CouldNotFinishJobSuccessfully
(194-203)
🔇 Additional comments (1)
server/integrations/microsoft/index.ts (1)
75-79
: LGTM: SharePoint discovery/processing imports.Imports align with the new service-account SharePoint flow.
Summary by CodeRabbit
New Features
Improvements
Chores