
[Feat] KB Export, Folder Upload & Vision LLM for Image Processing #1207

Open
CREDO23 wants to merge 26 commits into MODSetter:dev from CREDO23:feat/kb-export-and-folder-upload


CREDO23 commented Apr 9, 2026

Description

  • KB-level and folder-level export as ZIP (markdown files preserving folder structure)
  • Web folder upload with structure preservation
  • Vision LLM integration for image processing with document-parser fallback
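The ZIP export described above could be sketched roughly as follows. This is a minimal illustration, not the code in `export_service.py`: the function name `export_as_zip` and the `(folder_path, title, markdown)` tuple shape are hypothetical, chosen only to show how folder structure can be preserved through archive member names.

```python
import io
import zipfile


def export_as_zip(documents):
    """Build an in-memory ZIP from (folder_path, title, markdown) tuples.

    Hypothetical helper: the tuple shape is an assumption for illustration.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder_path, title, markdown in documents:
            # ZIP member names always use "/" as the separator, so the
            # folder hierarchy is encoded directly in the member name.
            parts = [part for part in folder_path.split("/") if part]
            member_name = "/".join(parts + [f"{title}.md"])
            archive.writestr(member_name, markdown)
    return buffer.getvalue()
```

A document in the root folder (empty `folder_path`) ends up at the top level of the archive, while nested folders become nested directories on extraction.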

API Changes

  • This PR includes API changes
    • GET /search-spaces/{id}/export?folder_id= — export KB/folder as ZIP
    • POST /documents/folder-upload — upload folder with structure
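As a rough client-side sketch of the two endpoints listed above: the helpers below only build the export URL and gather the relative paths a folder upload would need to carry. The base URL, the helper names, and the assumption that omitting `folder_id` exports the whole KB are not confirmed by this PR, and the request body of `POST /documents/folder-upload` is deliberately left out because its fields are not described here.

```python
from pathlib import Path
from urllib.parse import urlencode


def build_export_url(base_url, search_space_id, folder_id=None):
    """Build the export URL; folder_id is optional (assumed: omitted = whole KB)."""
    url = f"{base_url.rstrip('/')}/search-spaces/{search_space_id}/export"
    if folder_id is not None:
        url += "?" + urlencode({"folder_id": folder_id})
    return url


def collect_relative_paths(root):
    """List every file under root as a POSIX-style path relative to root.

    These relative paths are what lets a server rebuild the folder hierarchy.
    """
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )
```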

Change Type

  • Bug fix
  • New feature

Testing Performed

  • Tested locally
  • Manual/QA verification

Checklist

  • Follows project coding standards and conventions
  • No lint/build errors or new warnings
  • All relevant tests are passing

High-level PR Summary

This PR adds three major features: KB-level and folder-level export as ZIP files preserving folder structure, web folder upload with structure preservation, and Vision LLM integration for image processing with document-parser fallback. The Vision LLM integration allows images to be analyzed using a language model with vision capabilities (like GPT-4V or Claude with vision), falling back to the existing document parser if the vision LLM is unavailable or fails. The export functionality creates ZIP archives of markdown documents organized by folder hierarchy, with batch processing to handle large knowledge bases efficiently. The folder upload feature enables users to upload entire directory structures via the web interface, maintaining the folder hierarchy. Additionally, a hydration error in the mobile upload drop zone was fixed by replacing a button-in-button structure with a proper interactive div.
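The vision-LLM-with-fallback behaviour summarized above might look something like the sketch below. It is not the implementation in `vision_llm.py` or `etl_pipeline_service.py`: `parse_image`, the two callables, and the choice to treat empty output as a failure are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def parse_image(path, vision_parser, fallback_parser):
    """Try the vision LLM first, then fall back to the document parser.

    vision_parser may be None (no vision model configured); both parsers
    are hypothetical callables that take a file path and return text.
    Returns a (text, source) tuple so callers can see which path was used.
    """
    if vision_parser is not None:
        try:
            text = vision_parser(path)
            # Assumption: blank output counts as a failed vision pass.
            if text and text.strip():
                return text, "vision_llm"
        except Exception:
            logger.warning("Vision LLM failed for %s, using fallback", path)
    return fallback_parser(path), "document_parser"
```

Returning the source alongside the text keeps the fallback observable, which can help when checking how often the vision path is actually used.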

⏱️ Estimated Review Time: 30-90 minutes

💡 Review Order Suggestion
Order File Path
1 surfsense_backend/app/utils/file_extensions.py
2 surfsense_backend/app/etl_pipeline/file_classifier.py
3 surfsense_backend/app/etl_pipeline/parsers/vision_llm.py
4 surfsense_backend/app/etl_pipeline/etl_pipeline_service.py
5 surfsense_backend/app/tasks/document_processors/file_processors.py
6 surfsense_backend/app/tasks/connector_indexers/local_folder_indexer.py
7 surfsense_backend/tests/unit/etl_pipeline/test_etl_pipeline_service.py
8 surfsense_backend/tests/unit/utils/test_file_extensions.py
9 surfsense_backend/app/services/export_service.py
10 surfsense_backend/app/routes/export_routes.py
11 surfsense_backend/app/routes/__init__.py
12 surfsense_backend/app/routes/documents_routes.py
13 surfsense_web/lib/apis/documents-api.service.ts
14 surfsense_web/messages/en.json
15 surfsense_web/messages/es.json
16 surfsense_web/messages/hi.json
17 surfsense_web/messages/pt.json
18 surfsense_web/messages/zh.json
19 surfsense_web/components/sources/DocumentUploadTab.tsx
20 surfsense_web/components/documents/DocumentsFilters.tsx
21 surfsense_web/components/documents/FolderNode.tsx
22 surfsense_web/components/documents/FolderTreeView.tsx
23 surfsense_web/components/layout/ui/sidebar/DocumentsSidebar.tsx


CREDO23 added 23 commits April 9, 2026 11:18

recurseml bot left a comment


Review by RecurseML

🔍 Review performed on b96c049..7e14df6

✨ No bugs found, your code is sparkling clean

✅ Files analyzed, no issues (22)

surfsense_backend/app/etl_pipeline/etl_pipeline_service.py
surfsense_backend/app/etl_pipeline/file_classifier.py
surfsense_backend/app/etl_pipeline/parsers/vision_llm.py
surfsense_backend/app/routes/__init__.py
surfsense_backend/app/routes/documents_routes.py
surfsense_backend/app/routes/export_routes.py
surfsense_backend/app/services/export_service.py
surfsense_backend/app/tasks/connector_indexers/local_folder_indexer.py
surfsense_backend/app/tasks/document_processors/file_processors.py
surfsense_backend/app/utils/file_extensions.py
surfsense_backend/tests/unit/etl_pipeline/test_etl_pipeline_service.py
surfsense_backend/tests/unit/utils/test_file_extensions.py
surfsense_web/components/documents/DocumentsFilters.tsx
surfsense_web/components/documents/FolderNode.tsx
surfsense_web/components/documents/FolderTreeView.tsx
surfsense_web/components/layout/ui/sidebar/DocumentsSidebar.tsx
surfsense_web/components/sources/DocumentUploadTab.tsx
surfsense_web/messages/en.json
surfsense_web/messages/es.json
surfsense_web/messages/hi.json
surfsense_web/messages/pt.json
surfsense_web/messages/zh.json


vercel bot commented Apr 9, 2026

@CREDO23 is attempting to deploy a commit to the Rohan Verma's projects Team on Vercel.

A member of the Team first needs to authorize it.

CREDO23 marked this pull request as draft April 9, 2026 19:32
CREDO23 marked this pull request as ready for review April 10, 2026 15:38
Owner


@CREDO23 update the migration number
