# Changelog

All notable changes to the OpenClaw RAG Knowledge System will be documented in this file.

## [1.0.7] - 2026-02-14

### Fixed

- **Bug in `chunk_messages()`**: Fixed undefined variable `session_key` referenced in metadata generation
  - Added `session_key` parameter to the `chunk_messages()` function signature
  - Fixed bug identified in ClawHub security scan report
  - Passed `session_key` from the ingestion loop to the `chunk_messages()` call
  - Resolved scope issue where the function referenced a non-existent variable

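The scope fix above can be illustrated with a minimal sketch. Only the names `chunk_messages()` and `session_key` come from this changelog; the function body, the metadata keys, and the `sessions` mapping are hypothetical simplifications, not the project's actual code.

```python
def chunk_messages(messages, session_key):
    """Build one record per message.

    session_key is an explicit parameter, so the function no longer
    depends on a variable that only exists in the caller's scope.
    """
    return [
        {"text": msg, "metadata": {"session_key": session_key, "index": str(i)}}
        for i, msg in enumerate(messages)
    ]


# Hypothetical ingestion loop: the caller owns session_key and passes it in.
sessions = {"session-a": ["hello", "world"]}
chunks = []
for session_key, messages in sessions.items():
    chunks.extend(chunk_messages(messages, session_key))
```

Passing the key as an argument keeps the function testable on its own, without relying on loop variables in the calling module.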
### Security

- Fixed code quality issue identified in security scan (implementation bug)

---

## [1.0.6] - 2026-02-14

### Changed

- **Repository URL**: Updated git repository URL to https://openclaw-rag-skill.projects.theta42.com
  - Updated in `package.json`, `README.md`, `SKILL.md`, and `index.html`
- **Website tracking**: Added analytics tracking script to `index.html` for usage statistics
- **Version bump**: Updated version to 1.0.6 in `package.json` and `index.html` footer

### Documentation

- Updated all repository references from git.theta42 to projects.theta42
- Updated footer version display on website

---

## [1.0.0] - 2026-02-11

### Added

- Initial release of RAG Knowledge System for OpenClaw
- Semantic search using ChromaDB with all-MiniLM-L6-v2 embeddings
- Multi-source indexing: sessions, workspace files, skill documentation
- CLI tools: `rag_query.py`, `rag_manage.py`, `ingest_sessions.py`, `ingest_docs.py`
- Python API: `rag_query_wrapper.py` for programmatic access
- Automatic integration wrapper: `rag_context.py` for transparent RAG queries
- RAG-enhanced agent wrapper: `rag_agent.py`
- Type filtering: search by document type (session, workspace, skill, memory)
- Document management: add, delete, reset collection
- Batch ingestion with intelligent chunking
- Session parser for OpenClaw event format
- Automatic daily updates via cron job
- Comprehensive documentation: `README.md`, `SKILL.md`

### Features

- **Semantic Search**: Find relevant context by meaning, not keywords
- **Local Vector Store**: ChromaDB with persistent storage (~100 MB per 1,000 docs)
- **No External Services**: No API keys required (all-MiniLM-L6-v2 is free and runs locally)
- **Smart Chunking**: Messages grouped into chunks of 20, with overlap between chunks to preserve context
- **Multi-Format Support**: Python, JavaScript, Markdown, JSON, YAML, shell scripts
- **Automatic Updates**: Scheduled cron job runs daily at 4:00 AM UTC
- **State Tracking**: Avoids re-processing unchanged files
- **Debug Mode**: Verbose output for troubleshooting

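The chunking feature listed above can be sketched roughly as follows. The chunk size of 20 comes from this changelog; the overlap of 5 and the function name `chunk_with_overlap` are assumptions for illustration only, since the actual overlap size is not stated here.

```python
def chunk_with_overlap(messages, chunk_size=20, overlap=5):
    """Split messages into windows of chunk_size that share `overlap` items."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(messages), step):
        chunks.append(messages[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(messages):
            break
    return chunks


example = chunk_with_overlap(list(range(45)))
```

With 45 messages this yields three windows, and the last five items of each window reappear at the start of the next one, so context is not lost at chunk boundaries.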
### Bug Fixes

- Fixed duplicate ID errors by including `chunk_index` in hash generation
- Fixed session parser to handle OpenClaw event format correctly
- Fixed metadata conversion errors (all metadata values are now stored as strings)

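A minimal sketch of the first and third fixes above, under stated assumptions: the helper names `make_chunk_id` and `stringify_metadata`, the choice of SHA-256, and the 16-character truncation are all hypothetical and not taken from the project's code.

```python
import hashlib


def make_chunk_id(source, chunk_index, text):
    """Derive a stable ID; chunk_index keeps identical text in different chunks distinct."""
    payload = f"{source}:{chunk_index}:{text}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def stringify_metadata(metadata):
    """Convert every metadata value to a string before storing it."""
    return {key: str(value) for key, value in metadata.items()}


id_a = make_chunk_id("session-a", 0, "same text")
id_b = make_chunk_id("session-a", 1, "same text")
meta = stringify_metadata({"chunk_index": 3, "type": "session"})
```

Without the index in the hash input, two chunks with identical text from the same source would collide on the same ID.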
### Performance

- Indexing speed: ~1,000 docs/minute
- Search time: <100 ms (after the embedding model is loaded)
- Embedding model: 79 MB (cached locally)
- Storage: ~100 MB per 1,000 documents

### Documentation

- Complete `SKILL.md` with OpenClaw integration guide
- Comprehensive `README.md` with examples and troubleshooting
- Inline help in all CLI tools
- Best practices and limitations documented

---

## [1.0.1] - 2026-02-11

### Added

- `package.json` with complete OpenClaw skill metadata
- `CHANGELOG.md` for version tracking
- `LICENSE` (MIT) for proper licensing

### Changed

- `package.json` explicitly declares NO required environment variables (fully local system)
- Documented data storage path: `~/.openclaw/data/rag/`
- Enhanced `README.md` with clearer installation instructions
- Added references to CHANGELOG, LICENSE, and package.json in README
- Clarified that no API keys or credentials are required

### Documentation

- Improved documentation transparency to meet security scanner best practices
- Clearly documented the fully-local nature of the system (no external dependencies)

---

## [1.0.3] - 2026-02-12

### Fixed

- **Hard-coded paths**: Replaced all absolute paths with dynamic resolution
  - `rag_context.py`: Now uses `os.path.dirname(os.path.abspath(__file__))`
  - `scripts/rag-auto-update.sh`: Uses `$HOME`, `OPENCLAW_DIR`, and relative paths
  - Removed hard-coded `/home/william/.openclaw/` references
  - All scripts now portable across different user environments

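The path fix above relies on resolving locations at runtime instead of embedding a user's home directory. A small sketch, assuming hypothetical helper names `script_dir` and `data_dir`; the `os.path` calls and the `~/.openclaw/data/rag/` location are the ones named in this changelog.

```python
import os


def script_dir(script_path):
    """Directory containing the given script, as an absolute path."""
    return os.path.dirname(os.path.abspath(script_path))


def data_dir(home=None):
    """RAG data directory under the current user's home (or an explicit home)."""
    home = home or os.path.expanduser("~")
    return os.path.join(home, ".openclaw", "data", "rag")


# Inside a script, the usual call would be script_dir(__file__).
here = script_dir("rag_context.py")
storage = data_dir()
```

Because both helpers derive their result from the running environment, the same code works unchanged for any user account.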
### Changed

- **Documentation**: Updated `SKILL.md` with path portability notes
  - Documented that all paths use dynamic resolution
  - Confirmed no custom network calls or external telemetry
  - Added "Network Calls" section addressing security scan concerns
- **`rag_query_wrapper.py`**: Removed hard-coded path example from docstring

### Security

- Verified: `rag_system.py` has no network calls (only imports chromadb)
- Verified: `scripts/rag-auto-update.sh` has no network activity
- Confirmed: ChromaDB telemetry is disabled (`anonymized_telemetry=False`)
- Confirmed: All processing and storage is local-only

### Addressed Feedback

- Fixed ClawHub security scan concerns about hard-coded paths
- Fixed concerns about missing code review (`rag_system.py` is fully auditable)
- Documented network behavior (only model download by ChromaDB on first run)

---

## [1.0.5] - 2026-02-13

### Security

- **Removed hard-coded API key**: Fixed `scripts/moltbook_post.py`, which contained a hard-coded Moltbook API key
  - Removed fallback to embedded API key credential
  - Script now requires explicit user configuration (env var or credentials file)
  - Core RAG functionality is unaffected and has no external dependencies
  - Addresses ClawHub security scan finding about embedded credentials

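One way to require explicit configuration, as described above, is to look in the environment first, then in a credentials file, and fail loudly otherwise. This is a sketch only: the variable name `MOLTBOOK_API_KEY`, the JSON file layout with an `api_key` field, and the function name `load_api_key` are assumptions, not the script's documented interface.

```python
import json
import os


def load_api_key(env_var="MOLTBOOK_API_KEY", credentials_path=None, environ=None):
    """Return the API key from the environment or a credentials file.

    There is deliberately no built-in default: a missing key is an error.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var)
    if key:
        return key
    if credentials_path and os.path.isfile(credentials_path):
        with open(credentials_path, encoding="utf-8") as fh:
            key = json.load(fh).get("api_key")
        if key:
            return key
    raise RuntimeError(
        f"No API key configured: set {env_var} or provide a credentials file"
    )
```

Raising an error instead of falling back to an embedded value means a misconfigured install fails visibly rather than silently using someone else's credential.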
### Changed

- Updated `SKILL.md` Moltbook configuration section to clarify that the API key must be configured by the user
- Added note that Moltbook posting is optional and not required for core RAG functionality

---

## [1.0.4] - 2026-02-13

### Fixed

- **Hard-coded paths in `launch_rag_agent.sh`**: Applied the portability update missed in v1.0.3
  - Replaced `/home/william/.openclaw/workspace/rag` with `os.path.expanduser("~/.openclaw/workspace/rag")`
  - Replaced `/home/william/.local/bin/openclaw` with dynamic PATH resolution
  - Now checks for `openclaw` in PATH first, then falls back to `~/.local/bin/openclaw`
  - Proper error message if `openclaw` is not found

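The lookup order described above (PATH first, then `~/.local/bin/openclaw`, then an error) can be sketched in Python. The real fix lives in a shell script, so this is an illustration of the logic rather than the script's code; the function name `find_openclaw` and its injectable `which` and `home` parameters are hypothetical.

```python
import os
import shutil


def find_openclaw(which=shutil.which, home=None):
    """Locate the openclaw executable: PATH first, then ~/.local/bin."""
    found = which("openclaw")
    if found:
        return found
    home = home or os.path.expanduser("~")
    fallback = os.path.join(home, ".local", "bin", "openclaw")
    if os.path.isfile(fallback) and os.access(fallback, os.X_OK):
        return fallback
    raise FileNotFoundError(
        f"openclaw not found in PATH or at {fallback}"
    )
```

Checking PATH first respects however the user installed the tool, and the explicit error replaces a confusing failure later in the launch sequence.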
### Security

- Removed all user-specific hard-coded paths from `launch_rag_agent.sh`
- Verified portability across different user environments
- Script now installs correctly in OpenClaw skill packages for any user

---

## [Unreleased]

### Planned

- API documentation indexing from external URLs
- Automatic re-indexing on file system events (inotify)
- Better chunking strategies for long documents
- Integration with external vector stores (Pinecone, Weaviate)
- Webhook notifications for automated content processing
- Hybrid search (semantic + keyword)
- Query history and analytics
- Export/import of vector database

---

## [1.0.2] - 2026-02-12

### Added

- YAML front matter to `SKILL.md` with `name: rag` and `description` for ClawHub compatibility
- `Security Considerations` section documenting privacy implications and sensitive data risks
- `scripts/rag-auto-update.sh` included in skill package (previously in separate location)
- `.skill` package for ClawHub distribution (28 KB, 14 files)

### Changed

- Updated `package.json` description to match `SKILL.md` front matter
- Documented auto-update script behavior for security review (local-only ingestion)
- Clarified ChromaDB storage location and data deletion procedures

### Fixed

- **Cron job HTTP 500 errors**: Changed `sessionTarget` from `"main"` to `isolated` to avoid flooding chat with thousands of lines of output
- **Cron schedule**: Changed from `0 4 * * *` to `0 0 * * *` to match the actual midnight UTC execution time

### Security

- Documented that RAG indexes all session transcripts and workspace files (may contain API keys, credentials, private messages)
- Added recommendations for privacy-conscious use: review sessions before ingestion, use `rag_manage.py reset` to delete all indexed data
- Confirmed that the auto-update script only runs local ingestion scripts (no remote code fetching)

### Documentation

- Added detailed security warnings in `SKILL.md`
- Explained how to delete ChromaDB persistence directory (`~/.openclaw/data/rag/`)
- Provided guidance on redacting sensitive data before ingestion

---

## Version Guidelines

This project follows [Semantic Versioning](https://semver.org/):

- **MAJOR** version: Incompatible API changes
- **MINOR** version: Backwards-compatible functionality additions
- **PATCH** version: Backwards-compatible bug fixes

## Categories

- **Added**: New features
- **Changed**: Changes in existing functionality
- **Deprecated**: Soon-to-be removed features
- **Removed**: Removed features
- **Fixed**: Bug fixes
- **Security**: Security vulnerability fixes