Changelog
All notable changes to the Content Recommender system are documented in this file.
v1.1.0
October 3, 2025
Major reliability improvements to content extraction, addressing SSL certificate issues, title extraction problems, and caching inconsistencies. This release significantly improves the success rate of content extraction and ensures accurate article titles and recommendations.
Added
- 4-tier retry strategy for content extraction with different user agents and SSL settings
- Automatic content fetching in extract_metadata() for title extraction
- Comprehensive reprocessing system for failed articles
- Intelligent cache invalidation system for recommendations
- New API endpoint POST /api/reprocess-failed for manual article reprocessing
- New API endpoint POST /api/update-titles for batch title updates
- New API endpoint POST /api/clear-cache for manual cache clearing
- Release notes page with comprehensive documentation
Changed
- Enhanced extract_main_text() with SSL fallback strategy
- Enhanced extract_pdf_content() with SSL error handling
- Increased retry attempts from 3 to 4 for better success rate
- Added SSL warning suppression with urllib3.disable_warnings()
- Improved title extraction to fetch content when not provided
- Cache invalidation now triggers automatically on article addition, import, reprocessing, and title updates
Fixed
- SSL certificate verification errors causing extraction failures
- "No title found" issue affecting most articles
- Stale recommendations due to cached extraction results
- Missing lru_cache import causing cache management issues
- Recommendations showing outdated titles after database updates
- Failed article recovery system not working properly
Performance
- Content extraction success rate improved to 97.3%
- Title extraction success rate improved to 100% for processed articles
- Automatic cache invalidation ensures fresh recommendations
- Robust SSL handling for problematic websites
v1.0.0
Initial Release
Initial release of the Content Recommender system with basic functionality for content extraction, semantic search, and article recommendations.
Added
- Basic content extraction using Trafilatura
- OpenAI embeddings integration for semantic search
- SQLite database for storing URLs, content, and embeddings
- Web interface for adding URLs and searching content
- Article similarity clustering and recommendations
- RSS feed import functionality
- Import/export capabilities for database backup
- Statistics dashboard
- Docker containerization