
feat(knowledge): connectors, user exclusions, expanded tools & airtable integration #3230

Merged
waleedlatif1 merged 17 commits into feat/mothership-copilot from feat/kb
Mar 5, 2026

Conversation

@waleedlatif1
Collaborator

Summary

  • Knowledge base connectors: sync engine with SHA-256 change detection, connector registry, add/edit/delete UI
  • User-excluded documents persist across syncs — viewable and restorable from edit connector modal
  • Bulk delete paths set userExcluded for connector docs via SQL CASE
  • New knowledge tools: list_documents, list_chunks, delete_document, delete_chunk, update_chunk, list_tags
  • Airtable: list_bases and get_base_schema tools
  • Connector-aware delete confirmations in document and KB views
  • Refactored query hooks into hooks/queries/kb/ and hooks/queries/oauth/
  • DB migration: connector tables, sync logs, userExcluded column, document fields
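
A minimal sketch of how SHA-256 change detection could work in a sync engine like the one summarized above. A later commit in this PR names a shared `computeContentHash` helper, but its signature is not shown here, so the signature below is an assumption, and `StoredDoc` and `hasChanged` are hypothetical names used only for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape: the real document row has more fields.
interface StoredDoc {
  externalId: string;
  contentHash: string;
}

// Assumed signature; the PR names this helper but does not show its parameters.
function computeContentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// True when freshly fetched content differs from what was stored on the last sync.
function hasChanged(stored: StoredDoc, fetchedContent: string): boolean {
  return stored.contentHash !== computeContentHash(fetchedContent);
}

const stored: StoredDoc = {
  externalId: "page-1",
  contentHash: computeContentHash("hello"),
};
console.log(hasChanged(stored, "hello")); // false
console.log(hasChanged(stored, "hello, world")); // true
```

Comparing hashes rather than full content means only a short digest has to be stored per document, and unchanged documents can be skipped without re-uploading or re-processing them.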

Type of Change

  • New feature

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel bot commented Feb 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Mar 5, 2026 11:32pm


@greptile-apps
Contributor

greptile-apps bot commented Feb 17, 2026

Greptile Summary

This PR introduces a significant knowledge-base expansion: a connector sync framework (Confluence, GitHub, Google Drive, Jira, Linear, Notion, Airtable) with SHA-256 change detection, a user-exclusion system that persists across syncs, six new knowledge tools (list_documents, list_chunks, delete_document, delete_chunk, update_chunk, list_tags), two new Airtable tools (list_bases, get_base_schema), and the associated DB migration adding knowledgeConnector, knowledgeConnectorSyncLog, and new columns on document.

Key findings:

  • LIKE wildcard injection in tag filters (service.ts buildTagFilterCondition): % and _ in user-supplied tag filter values are not escaped before being embedded in LIKE patterns. Although Drizzle parameterizes the value (preventing SQL injection), unescaped metacharacters cause incorrect query semantics — e.g. a value of % matches every row, _ matches any single character.
  • restore PATCH missing isNull(deletedAt) guard (connectors/[connectorId]/documents/route.ts): The restore operation doesn't guard against restoring soft-deleted documents. A direct API call with IDs that are both userExcluded = true and soft-deleted will silently un-delete them. The sibling exclude operation correctly includes this guard.
  • offset: 0 treated as falsy in list_chunks.ts: if (params.offset) drops a value of 0, mirroring the same bug already flagged in list_documents.ts.
  • get_base_schema.ts swallows API errors: transformResponse unconditionally returns success: true, hiding 4xx/5xx failures from callers — the same pattern flagged for the knowledge delete/update tools.
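
The LIKE wildcard finding can be illustrated with a small escaping helper. This is a sketch, not the code in `service.ts`: `escapeLikePattern` and `containsPattern` are hypothetical names, and it assumes the database treats backslash as the LIKE escape character (PostgreSQL's default).

```typescript
// Escape LIKE metacharacters so user input is matched literally.
// A single pass handles backslash, % and _ together, so nothing is escaped twice.
function escapeLikePattern(value: string): string {
  return value.replace(/[\\%_]/g, (ch) => `\\${ch}`);
}

// Build a "contains" pattern: only the outer % characters act as wildcards.
function containsPattern(userValue: string): string {
  return `%${escapeLikePattern(userValue)}%`;
}

console.log(containsPattern("50%_off")); // %50\%\_off%
```

The resulting string would still be passed as a bound parameter; escaping fixes the matching semantics, while parameterization continues to handle injection.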

Confidence Score: 3/5

  • PR introduces several real bugs that need fixing before merge — incorrect LIKE wildcard matching on tag filters, a missing soft-delete guard that allows unintended document restoration, and silent error swallowing in the new Airtable tool.
  • Core sync engine logic and the connector registry are sound. The schema migration and new API routes are well-structured with proper auth and soft-delete handling. However, the LIKE wildcard bug in buildTagFilterCondition will produce wrong query results for any user whose tag values contain % or _; the missing isNull(deletedAt) check in the restore path is a logic gap exploitable via direct API; and the get_base_schema tool silently reports success on API failures. None of these are environment-breaking on deploy, but they affect data correctness and developer trust in the new tools.
  • apps/sim/lib/knowledge/documents/service.ts (LIKE wildcard escaping), apps/sim/app/api/knowledge/[id]/connectors/[connectorId]/documents/route.ts (restore guard), apps/sim/tools/knowledge/list_chunks.ts (offset falsy check), apps/sim/tools/airtable/get_base_schema.ts (error handling)
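
For the error-swallowing finding, one way a response transform could surface failures is sketched below. `ToolResult`, `toToolResult`, and `extractErrorMessage` are hypothetical names, and the error body shapes handled here (`{ error: "..." }` or `{ error: { message: "..." } }`) are assumptions rather than a confirmed description of the Airtable API or of this repository's tool types.

```typescript
// Hypothetical result shape; the real tool response type may differ.
interface ToolResult<T> {
  success: boolean;
  output?: T;
  error?: string;
}

// Pull a human-readable message out of an error body, if one is present.
function extractErrorMessage(body: unknown): string | undefined {
  if (typeof body !== "object" || body === null) return undefined;
  const err = (body as { error?: unknown }).error;
  if (typeof err === "string") return err;
  if (typeof err === "object" && err !== null) {
    const message = (err as { message?: unknown }).message;
    if (typeof message === "string") return message;
  }
  return undefined;
}

// Report success only for 2xx statuses; otherwise return the failure to the caller.
function toToolResult<T>(status: number, body: unknown): ToolResult<T> {
  if (status < 200 || status >= 300) {
    const fallback = `Request failed with status ${status}`;
    return { success: false, error: extractErrorMessage(body) ?? fallback };
  }
  return { success: true, output: body as T };
}

console.log(toToolResult(403, { error: { message: "Missing scope" } }));
console.log(toToolResult(200, { tables: [] }));
```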

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/lib/knowledge/connectors/sync-engine.ts | Core sync engine: correct full/incremental logic, SHA-256 change detection, stale-doc deletion guarded behind full-sync check. No new issues found beyond previously-flagged items. |
| apps/sim/lib/knowledge/documents/service.ts | New buildTagFilterCondition function correctly whitelists tag slots, but interpolates raw user input into LIKE patterns without escaping % and _ metacharacters, causing incorrect wildcard matching. |
| apps/sim/app/api/knowledge/[id]/connectors/[connectorId]/documents/route.ts | GET query now correctly filters excluded docs with isNull(deletedAt), but the restore PATCH operation is missing the same isNull(deletedAt) guard, allowing it to silently un-delete soft-deleted documents via direct API calls. |
| apps/sim/tools/knowledge/list_chunks.ts | offset: 0 is silently dropped by a falsy check (if (params.offset)), making it impossible for an LLM to explicitly request the first page (same pattern as the flagged list_documents.ts bug). |
| apps/sim/tools/airtable/get_base_schema.ts | New Airtable schema tool; transformResponse always returns success: true regardless of HTTP status, silently swallowing 4xx/5xx API errors (missing scope, invalid baseId, etc.). |
| apps/sim/connectors/airtable/airtable.ts | Well-structured Airtable connector with field-name caching via syncContext, pagination via offset cursor, and proper validateConfig that checks table and optional view access. |
| apps/sim/connectors/registry.ts | Simple registry mapping 7 connector IDs to their configs; all entries are consistent with the connector files added in this PR. |
| packages/db/schema.ts | Adds knowledgeConnector and knowledgeConnectorSyncLog tables plus connector fields on document; partial unique index on (connectorId, externalId) where deletedAt IS NULL is correct for soft-delete semantics. |
| apps/sim/app/api/knowledge/[id]/connectors/route.ts | Connector list/create endpoints with tag-slot allocation, credential validation, and initial sync dispatch; well-structured with proper auth checks throughout. |
| apps/sim/app/api/knowledge/connectors/sync/route.ts | Cron endpoint correctly gates on verifyCronAuth, queries only active/error connectors with an overdue nextSyncAt, and dispatches syncs fire-and-forget. |

Sequence Diagram

sequenceDiagram
    participant User
    participant API as Next.js API
    participant SyncEngine as Sync Engine
    participant Connector as Connector (Notion/GDrive/…)
    participant DB as Database
    participant Storage as Storage Service
    participant Processor as Document Processor

    User->>API: POST /connectors (credentialId, sourceConfig)
    API->>Connector: validateConfig(accessToken, sourceConfig)
    Connector-->>API: { valid: true }
    API->>DB: INSERT knowledgeConnector + tag definitions (tx)
    API->>SyncEngine: dispatchSync(connectorId)

    SyncEngine->>DB: SELECT connector + KB owner userId
    SyncEngine->>SyncEngine: refreshAccessToken
    SyncEngine->>DB: UPDATE connector status=syncing (lock)
    SyncEngine->>DB: INSERT sync log (started)

    loop Paginated fetch
        SyncEngine->>Connector: listDocuments(accessToken, sourceConfig, cursor)
        Connector-->>SyncEngine: ExternalDocumentList
    end

    SyncEngine->>DB: SELECT existing docs (connectorId, deletedAt IS NULL)
    SyncEngine->>DB: SELECT excluded docs (userExcluded=true, deletedAt IS NULL)

    loop For each external doc
        alt New doc
            SyncEngine->>Storage: uploadFile(content as .txt)
            SyncEngine->>DB: INSERT document (pending)
            SyncEngine->>Processor: processDocumentAsync (fire & forget)
        else Content hash changed
            SyncEngine->>Storage: uploadFile(new content)
            SyncEngine->>DB: UPDATE document (pending)
            SyncEngine->>Processor: processDocumentAsync (fire & forget)
        else Unchanged / excluded
            SyncEngine->>SyncEngine: skip
        end
    end

    alt Full sync mode
        SyncEngine->>DB: soft-delete stale docs not seen in source
    end

    SyncEngine->>DB: UPDATE sync log (completed)
    SyncEngine->>DB: UPDATE connector (status=active, nextSyncAt)

    User->>API: PATCH /connectors/:id/documents {operation: exclude, documentIds}
    API->>DB: UPDATE document SET userExcluded=true WHERE deletedAt IS NULL

    User->>API: PATCH /connectors/:id/documents {operation: restore, documentIds}
    API->>DB: UPDATE document SET userExcluded=false, deletedAt=null, enabled=true

Last reviewed commit: eec872d
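
The per-document branch in the sequence diagram (new, hash changed, unchanged or excluded, then stale removal in full sync mode) can be sketched as a pure planning function. All names here (`ExternalDoc`, `SyncPlan`, `planSync`) are hypothetical, and the choice to never soft-delete user-excluded documents as stale is an assumption, not something the diagram states.

```typescript
interface ExternalDoc {
  externalId: string;
  contentHash: string;
}

interface SyncPlan {
  toAdd: string[];
  toUpdate: string[];
  toSoftDelete: string[];
}

function planSync(
  external: ExternalDoc[],
  existing: Map<string, string>, // externalId -> stored content hash
  excluded: Set<string>, // externalIds the user has excluded
  fullSync: boolean,
): SyncPlan {
  const plan: SyncPlan = { toAdd: [], toUpdate: [], toSoftDelete: [] };
  const seen = new Set<string>();

  for (const doc of external) {
    seen.add(doc.externalId);
    if (excluded.has(doc.externalId)) continue; // user exclusions persist across syncs
    const storedHash = existing.get(doc.externalId);
    if (storedHash === undefined) plan.toAdd.push(doc.externalId);
    else if (storedHash !== doc.contentHash) plan.toUpdate.push(doc.externalId);
    // equal hashes: unchanged, nothing to do
  }

  // Stale detection is only safe when the whole source was listed.
  if (fullSync) {
    existing.forEach((_hash, id) => {
      if (!seen.has(id) && !excluded.has(id)) plan.toSoftDelete.push(id);
    });
  }
  return plan;
}

const plan = planSync(
  [
    { externalId: "a", contentHash: "h1" },
    { externalId: "b", contentHash: "h2-new" },
    { externalId: "c", contentHash: "h3" },
    { externalId: "x", contentHash: "hx" },
  ],
  new Map([["b", "h2"], ["c", "h3"], ["d", "h4"]]),
  new Set(["x"]),
  true,
);
console.log(plan); // toAdd: ["a"], toUpdate: ["b"], toSoftDelete: ["d"]
```

Guarding stale deletion behind `fullSync` matters because an incremental listing only returns recently changed items, so "not seen" would not imply "removed at the source".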

Contributor

@greptile-apps greptile-apps bot left a comment

88 files reviewed, 1 comment


@waleedlatif1
Collaborator Author

@greptile
@cursor review

Contributor

@greptile-apps greptile-apps bot left a comment

89 files reviewed, 3 comments


@waleedlatif1
Collaborator Author

@cursor review

@waleedlatif1
Collaborator Author

bugbot review

@waleedlatif1
Collaborator Author

@cursor review

@waleedlatif1
Collaborator Author

bugbot review

@waleedlatif1
Collaborator Author

@greptile review

Contributor

@greptile-apps greptile-apps bot left a comment

88 files reviewed, 5 comments


@waleedlatif1
Collaborator Author

@cursor review

@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@greptile

Contributor

@greptile-apps greptile-apps bot left a comment

98 files reviewed, 4 comments


@waleedlatif1
Collaborator Author

@cursor review

@waleedlatif1
Collaborator Author

@greptile

Contributor

@greptile-apps greptile-apps bot left a comment

98 files reviewed, 5 comments


@waleedlatif1 waleedlatif1 changed the base branch from staging to feat/mothership-copilot March 5, 2026 21:19
waleedlatif1 and others added 2 commits March 5, 2026 13:25
…leedlatif1/resolve-kb-conflicts

# Conflicts:
#	apps/sim/app/workspace/[workspaceId]/w/[workflowId]/components/panel/components/editor/components/sub-block/components/credential-selector/components/oauth-required-modal.tsx
#	apps/sim/app/workspace/[workspaceId]/w/[workflowId]/components/panel/components/editor/components/sub-block/components/credential-selector/credential-selector.tsx
#	apps/sim/app/workspace/[workspaceId]/w/[workflowId]/components/panel/components/editor/components/sub-block/components/tool-input/components/tools/credential-selector.tsx
#	apps/sim/app/workspace/[workspaceId]/w/components/sidebar/components/settings-modal/components/integrations/integrations.tsx
#	apps/sim/blocks/blocks/airtable.ts
#	apps/sim/blocks/blocks/knowledge.ts
#	apps/sim/tools/airtable/list_bases.ts
#	apps/sim/tools/registry.ts
#	packages/db/migrations/meta/0155_snapshot.json
#	packages/db/migrations/meta/_journal.json
#	packages/db/schema.ts
Generated migration 0162 for knowledge_connector and
knowledge_connector_sync_log tables after resolving merge
conflicts with feat/mothership-copilot.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ge tools

- Extract shared computeContentHash to connectors/utils.ts (dedup across 7 connectors)
- Include error'd connectors in cron auto-retry query
- Add syncContext caching for Confluence (cloudId, spaceId)
- Batch Confluence label fetches with concurrency limit of 10
- Enforce maxPages in Confluence v2 path
- Clean up stale storage files on document update
- Retry stuck documents (pending/failed) after sync completes
- Soft-delete documents and reclaim tag slots on connector deletion
- Add incremental sync support to ConnectorConfig interface
- Fix offset:0 falsy check in list_documents tool

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…optimize API calls

- Extract shared htmlToPlainText to connectors/utils.ts (dedup Confluence + Google Drive)
- Add syncContext caching for Jira cloudId, Notion/Linear/Google Drive cumulative limits
- Fix cumulative maxPages/maxIssues/maxFiles enforcement across pagination pages
- Bump Notion page_size from 20 to 100 (5x fewer API round-trips)
- Batch Notion child page fetching with concurrency=5 (was serial N+1)
- Bump Confluence v2 limit from 50 to 250 (v2 API supports it)
- Pass syncContext through Confluence CQL path for cumulative tracking
- Upgrade GitHub tree truncation warning to error level
- Fix sync-engine test mock to include inArray export

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
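
The cumulative limit fix described in this commit (enforcing maxPages/maxIssues/maxFiles across pagination pages rather than per page) could look roughly like the sketch below. `SyncContext` is modeled here as a plain mutable record shared between `listDocuments` calls, and `takeWithinLimit` is a hypothetical helper; neither is taken from the repository.

```typescript
// Assumed: a mutable bag that survives across paginated calls within one sync run.
type SyncContext = Record<string, unknown>;

// Keep only as many items from this page as the cumulative budget still allows.
function takeWithinLimit<T>(
  page: T[],
  ctx: SyncContext,
  key: string,
  max: number,
): { items: T[]; limitReached: boolean } {
  const soFar = typeof ctx[key] === "number" ? (ctx[key] as number) : 0;
  const remaining = Math.max(0, max - soFar);
  const items = page.slice(0, remaining);
  const total = soFar + items.length;
  ctx[key] = total; // running total is stored in the context, not in a local variable
  return { items, limitReached: total >= max };
}

const ctx: SyncContext = {};
const first = takeWithinLimit([1, 2, 3], ctx, "pagesFetched", 5);
const second = takeWithinLimit([4, 5, 6], ctx, "pagesFetched", 5);
console.log(first.items.length, first.limitReached); // 3 false
console.log(second.items.length, second.limitReached); // 2 true
```

A counter that lives only inside one page handler resets on every call, which is the kind of bug a shared context avoids.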
…te broken tests

- Add parseTagDate and joinTagArray helpers to connectors/utils.ts
- Update all 7 connectors to use shared tag mapping helpers (removes 12+ duplication instances)
- Fix Notion listFromParentPage cumulative maxPages check (was using local count)
- Rewrite 3 broken connector route test files to use vi.hoisted() + static vi.mock()
  pattern instead of deprecated vi.doMock/vi.resetModules (all 86 tests now pass)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
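
The commit above names two shared helpers, `parseTagDate` and `joinTagArray`, without showing them. The sketch below is a guess at what such helpers might do; the signatures, the handling of invalid input, and the `", "` separator are all assumptions.

```typescript
// Assumed behavior: return a valid Date, or undefined for anything unparseable.
function parseTagDate(value: unknown): Date | undefined {
  if (value instanceof Date) {
    return Number.isNaN(value.getTime()) ? undefined : value;
  }
  if (typeof value !== "string" && typeof value !== "number") return undefined;
  const date = new Date(value);
  return Number.isNaN(date.getTime()) ? undefined : date;
}

// Assumed behavior: join non-empty string entries, or undefined if none remain.
function joinTagArray(value: unknown, separator = ", "): string | undefined {
  if (!Array.isArray(value)) return undefined;
  const parts = value
    .filter((v): v is string => typeof v === "string")
    .map((v) => v.trim())
    .filter((v) => v !== "");
  return parts.length > 0 ? parts.join(separator) : undefined;
}

console.log(parseTagDate("2026-03-05T00:00:00Z")?.toISOString()); // 2026-03-05T00:00:00.000Z
console.log(joinTagArray(["bug", " urgent ", 3, ""])); // bug, urgent
```

Centralizing this kind of coercion means each connector maps its source fields to tags the same way, which is presumably what removed the duplication the commit mentions.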
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

…se feedback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cursor

cursor bot commented Mar 5, 2026

PR Summary

High Risk
Adds new connector CRUD/sync APIs, cron-triggered background sync, and connector-driven document updates/exclusions, touching core data paths (documents/tags) and OAuth token usage. Bugs here could cause unintended document deletion/exclusion or runaway sync scheduling.

Overview
Adds knowledge-base connectors end-to-end: new connector list/create/update/delete APIs (including source config validation, tag-slot allocation, and soft-delete cascading), per-connector document listing plus exclude/restore operations, manual sync trigger, and a cron scheduler endpoint that dispatches due sync jobs.

Updates the knowledge base UI to connect/manage sources (AddConnectorModal, ConnectorsSection, EditConnectorModal with sync history + per-document exclude/restore), show connector icons/source links, and warn that deleting connector-synced docs permanently excludes them.

Expands Knowledge and Airtable capabilities: Knowledge block/docs now cover full document/chunk/tag CRUD (including list/delete/update tools) and the documents API gains tagFilters query support; Airtable block/docs add get_base_schema, and a new airtableConnector syncs table records (schema-aware text extraction + hashing) and is registered in CONNECTOR_REGISTRY.
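
As an illustration of the "dispatches due sync jobs" step in the overview, the selection rule reported elsewhere in this review (active or error connectors with an overdue nextSyncAt) can be written as a filter. The real endpoint presumably does this in SQL; `ConnectorRow` and `selectDueConnectors` are hypothetical names, and the status union lists only values that appear in this conversation.

```typescript
type ConnectorStatus = "active" | "error" | "syncing";

interface ConnectorRow {
  id: string;
  status: ConnectorStatus;
  nextSyncAt: Date | null;
}

// A connector is due when it is not already syncing and its scheduled time has passed.
function selectDueConnectors(rows: ConnectorRow[], now: Date): ConnectorRow[] {
  return rows.filter(
    (row) =>
      (row.status === "active" || row.status === "error") &&
      row.nextSyncAt !== null &&
      row.nextSyncAt.getTime() <= now.getTime(),
  );
}

const now = new Date("2026-03-05T12:00:00Z");
const due = selectDueConnectors(
  [
    { id: "c1", status: "active", nextSyncAt: new Date("2026-03-05T11:00:00Z") },
    { id: "c2", status: "error", nextSyncAt: new Date("2026-03-05T11:30:00Z") },
    { id: "c3", status: "syncing", nextSyncAt: new Date("2026-03-05T10:00:00Z") },
    { id: "c4", status: "active", nextSyncAt: new Date("2026-03-05T13:00:00Z") },
    { id: "c5", status: "active", nextSyncAt: null },
  ],
  now,
);
console.log(due.map((row) => row.id)); // ["c1", "c2"]
```

Including `error` connectors gives failed syncs an automatic retry on the next cron tick, while excluding `syncing` avoids dispatching a second run for a connector that is already locked.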

Written by Cursor Bugbot for commit bd07ab5.


@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.


…, fix offset=0

- Escape %, _, \ in tag filter LIKE patterns to prevent incorrect matches
- Add isNull(deletedAt) guard to restore operation to prevent un-deleting soft-deleted docs
- Change offset check from falsy to != null so offset=0 is not dropped

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
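
The offset fix in this commit comes down to the difference between a falsy check and a null check. The sketch below shows the corrected form in isolation; `ListParams` and `buildListQuery` are hypothetical and do not reproduce the actual tool code.

```typescript
interface ListParams {
  limit?: number | null;
  offset?: number | null;
}

function buildListQuery(params: ListParams): string {
  const query = new URLSearchParams();
  // `!= null` rejects only null and undefined, so an explicit 0 is kept.
  // A plain `if (params.offset)` would drop 0 because 0 is falsy.
  if (params.limit != null) query.set("limit", String(params.limit));
  if (params.offset != null) query.set("offset", String(params.offset));
  return query.toString();
}

console.log(buildListQuery({ limit: 10, offset: 0 })); // limit=10&offset=0
console.log(buildListQuery({ limit: 10 })); // limit=10
```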
@waleedlatif1 waleedlatif1 merged commit dbef14b into feat/mothership-copilot Mar 5, 2026
3 checks passed
@waleedlatif1 waleedlatif1 deleted the feat/kb branch March 5, 2026 23:40