skill-seekers/Skill_Seekers
MCP Developer ToolsTransform 17 source types (docs, GitHub repos, PDFs, videos, Jupyter, Confluence, Notion, Slack/Discord) into AI-ready skills and RAG knowledge. 35 MCP tools for scraping, packaging, enhancing, and exporting to vector databases (Weaviate, Chroma, FAISS, Qdrant). Supports 16+ target platforms.
Features
- Converts documentation, repos, PDFs, videos, notebooks and more into structured knowledge assets
- Generates ready‑to‑use packages for Claude, Gemini, OpenAI/GPT, LangChain, LlamaIndex, Haystack, Pinecone, ChromaDB, FAISS, Qdrant, IBM Bob and Cursor/Windsurf/Cline AI coding assistants
- One command (skill-seekers create) produces a universal asset that can be exported to multiple targets
Recent releases
View all 20 releases →- IBM Bob packaging target via `--target bob`
- GitHub scraper filters: issue state, labels, and since date
- Per-issue Markdown files for GitHub issues
Full changelog
[3.6.0] - 2026-05-03
Theme: Quality-of-life release — packaging targets, GitHub issue workflow, codebase analysis fixes, and source detection hardening.
Added
- IBM Bob packaging target — new
--target bobadaptor and agent install support for IBM's Bob agent platform (#366) - GitHub issue filtering —
--github-issue-state,--github-issue-labels, and--github-issue-sincefilters in the GitHub scraper for narrowing which issues are pulled (#367) - Per-issue files — GitHub scraper now writes one Markdown file per issue instead of a single bundle, improving navigation and downstream chunking (#367)
- Pinecone frontmatter — Pinecone vector exports now include consistent YAML frontmatter for metadata round-tripping (#367)
Fixed
- Unified scraper now generates
codebase_analysis/index — local sources were producing C3.x outputs with broken SKILL.md links; the unified skill builder now wires up the index and resolves links correctly (#362, #376) - Guides fallback fires correctly —
unified_skill_builderwas emitting a truthy placeholder for empty guides which suppressed the fallback content; placeholder removed (#364, #375) - HTML URLs no longer treated as local files —
source_detectornow checks forhttp(s)://before falling through to the local-path branch, fixing false-positive routing (#373) - PDF extracted images appear in markdown —
pdf_scrapernow insertsreferences for images extracted from PDFs so they render in the generated SKILL.md (#369) - C3.x output for local sources —
unifiedcommand was skipping the C3.x analysis pipeline for local codebase sources; now emits the full pattern/test/guide/config/router output (#363, #372) - Language filter passed to C3.x clone analysis — repos cloned for analysis now respect
--languagesinstead of analyzing every file (fixes #361, #370) - Unity vs Unreal detection — Unity projects with C# imports were being misidentified as Unreal; detection now keys on C# import patterns (fixes #365, #368)
- max_pages default changed from 500 to -1 (unlimited)
- removal of hardcoded magic numbers in constants.py; now reads defaults.json
- Centralized `defaults.json` config as single source of truth for all default values
- Low‑signal code snippet filtering via `_is_low_signal_code_snippet()`
- Pattern description normalization with `_normalize_pattern_description()`
Full changelog
[3.5.1] - 2026-04-12
Added
- Centralized
defaults.jsonconfig — single source of truth for all default values (rate_limit,max_pages,workers,async_mode, enhancement, analysis, RAG settings). Newdefaults.pyloader module. All 15+ files that previously hardcoded defaults now read from this file (#356) - Low-signal code snippet filtering —
_is_low_signal_code_snippet()filters junk patterns like bareTrue,options, single identifiers from quick references (#360) - Pattern description normalization —
_normalize_pattern_description()cleans boilerplate prefixes and truncates to first meaningful sentence (#360) - Example language priority ranking —
_example_language_priority()ranks Python > Bash > JSON > etc. for SKILL.md examples (#360) checkpoint_exists()method onDocToSkillConverter— was called but never defined (#360)- Unified config source normalization —
DocToSkillConverter.__init__merges fields fromsources[0]into flat config for compatibility (#360) display_namesupport in SKILL.md generation — produces cleaner titles and slugs (#360)- New tests:
test_doc_scraper_entrypoint.py(regression for_run_scraping), quick-reference quality tests, docs-only compatibility tests, nested reference coverage tests (#360)
Changed
max_pagesdefault is now unlimited (-1) — the scraper fetches all pages unless the user explicitly sets--max-pages. Previously defaulted to 500 (#356)--no-rate-limitflag now works — was defined in CLI arguments but never consumed byExecutionContext(#356)constants.pyreads fromdefaults.json— no longer contains hardcoded magic numbers (#356)ExecutionContext.ScrapingSettings—rate_limitandmax_pagesnow use real defaults instead ofNone, preventing None-poisoning downstream (#356)- SKILL.md frontmatter cleanup — empty
doc_version:andversion:fields are now omitted; placeholder sections removed (#360) - Enhancement routing through platform adaptors instead of importing nonexistent
enhance_skill_mdhelper (#360) quality_metrics.pyusesrglobfor nested reference directories in unified skills (#360)
Fixed
TypeError: '>' not supported between instances of 'NoneType' and 'int'—rate_limitdefaulted toNoneinExecutionContext, which flowed throughconfig.get("rate_limit", DEFAULT)(dict.get returns None when the key exists with value None, ignoring the fallback). Fixed indoc_scraper.py(sync + async paths),estimate_pages.py, andsync_config.py(#356, #359)discover_urls()loop never executed with unlimitedmax_pages—len(discovered) < -1is always False. Added unlimited mode guard (#356)converter.scrape()called nonexistent method in_run_scraping()— changed toconverter.scrape_all()(#360)- None-safety for BeautifulSoup attributes —
link["href"],sitemap.text,meta_desc["content"]guarded against None XML text nodes (#360) - Python 3.10 compatibility — backslash in f-string in
quality_metrics.pynot supported before 3.12 (#360)
- All content extraction features (pattern detection, test examples, how‑to guides, config extraction, router generation) are now enabled by default; no opt‑in required
- Dynamic routing via `_build_argv()` replaces manual argument forwarding and adds 7 previously missing CLI flags
- Renamed `claude-enhanced` merge mode to `ai-enhanced` (backward‑compatible alias retained)
- Removed hardcoded Claude references across the codebase
- Removed GitHub API analysis limit of 50 files and config extraction limit of 100 files
- Removed command injection vulnerability from cloned repo script execution
- Replaced `git add -A` with targeted staging in marketplace publisher
- Cleared auth tokens from cached `.git/config` after clone
- Grand Unification: single `create` command for 18 source types with auto‑detection and direct converters
- Agent‑agnostic `AgentClient` abstraction supporting Claude, Kimi, Codex, Copilot, OpenCode, and custom agents via API‑key detection
- Headless browser rendering (`--browser` flag) using Playwright to handle JavaScript SPAs
Full changelog
[3.5.0] - 2026-04-09
Theme: Grand Unification — one command, one interface, direct converters. Agent-agnostic architecture, marketplace pipeline, smart SPA discovery, all content extraction enabled by default. 80+ files changed across the codebase.
Added
- Grand Unification — unified
createcommand as single entry point for all 18 source types with auto-detection, direct converter invocation, and centralized enhancement (#346) - Agent-agnostic
AgentClientabstraction — all 5 enhancers now support Claude, Kimi, Codex, Copilot, OpenCode, and custom agents via a unified interface. Auto-detects agent from API keys instead of hardcoding (#336) - Kimi CLI integration with stdin piping and output parsing (#336)
MarketplacePublisher— publish skills to Claude Code plugin marketplace repos (#336)MarketplaceManager— register and manage marketplace repositories (#336)ConfigPublisher— push configs to registered config source repos (#336)push_configMCP tool for automated config publishing (#336)- Smart SPA discovery engine — three-layer discovery: sitemap.xml, llms.txt, SPA nav rendering (#336)
"browser": trueconfig support for JavaScript SPA sites with browser renderer timeout defaults (60s, domcontentloaded) (#336)- Dynamic routing via
_build_argv()— replaced manual arg forwarding with dynamic forwarder, added 7 missing CLI flags (#336) - Kotlin language support for codebase analysis — Full C3.x pipeline support: AST parsing (classes, objects, functions, data/sealed classes, extension functions, coroutines), dependency extraction, design pattern recognition (object declaration→Singleton, companion object→Factory, sealed class→Strategy), test example extraction (JUnit, Kotest, MockK, Spek), language detection patterns, config detection (build.gradle.kts), and extension maps across all analyzers (#287)
- Headless browser rendering (
--browserflag) — uses Playwright to render JavaScript SPA sites (React, Vue, etc.) that return empty HTML shells. Auto-installs Chromium on first use. Optional dep:pip install "skill-seekers[browser]"(#321) skill-seekers doctorcommand — 8 diagnostic checks (Python version, package install, git, core/optional deps, API keys, MCP server, output dir) with pass/warn/fail status and--verboseflag (#316)- Prompt injection check workflow — bundled
prompt-injection-checkworkflow scans scraped content for injection patterns (role assumption, instruction overrides, delimiter injection, hidden instructions). Added as first stage indefaultandsecurity-focusworkflows. Flags suspicious content without removing it (#324) - Codex CLI plugin manifest (
.codex-plugin/plugin.json) for OpenAI Codex integration (#350) - 6 behavioral UML diagrams — 3 sequence (create pipeline, GitHub+C3.x flow, MCP invocation), 2 activity (source detection, enhancement pipeline), 1 component (runtime dependencies with interface contracts)
- 134 new tests —
test_agent_client.py,test_config_publisher.py,_build_argvtests. Total: 3194 passed, 39 expected skips (#336)
Changed
- All content extraction features enabled by default — pattern detection, test examples, how-to guides, config extraction, and router generation no longer require explicit opt-in
- Renamed
claude-enhancedmerge mode toai-enhanced— backward compatibility alias kept (#336) - Removed 118+ hardcoded Claude references across 60+ files (#336)
- Refactored 5 enhancers to use
AgentClientabstraction (#336) - Removed 50-file GitHub API analysis limit (#336)
- Removed 100-file config extraction limit (#336)
- Fixed unified scraper default
max_pagesfrom 100 to 500 (#336) - Centralized enhancement timeouts to 45min default with unlimited support (#336)
- Excluded slow MCP/e2e tests from CI coverage step to prevent timeout
Fixed
glob('*.md')replaced withrglob('*.md')in all adaptors — fixes packaging when skills are in nested directories (#349)scraped_datalist-vs-dict bug in conflict detection (#336)base_urlpassthrough to doc scraper subprocess (#336)- URL filtering now uses base directory correctly (#336)
- C3.x analysis data loss (#336)
--enhance-levelflag not passed correctly (#336)guide_enhancermethod rename —_call_claude_apirenamed to_call_ai(#336)- 11 pre-existing test failures fixed (#336)
- Per-file language detection in GitHub scraper (#336)
- GitHub language detection crashes with
TypeErrorwhen API response contains non-integer metadata keys (e.g.,"url") — now filters to integer values only (#322) - C3.x codebase analysis crashes with
TypeError—_run_c3_analysis()and_analyze_c3x()passed removedenhance_with_ai/ai_modekwargs toanalyze_codebase()instead ofenhance_level(#323)
Security
- Removed command injection via cloned repo script execution (#336)
- Replaced
git add -Awith targeted staging in marketplace publisher (#336) - Clear auth tokens from cached
.git/configafter clone (#336) - Use
defusedxmlfor sitemap XML parsing (XXE protection) (#336) - Path traversal validation for config names (#336)
- 8 new LLM platform adaptors (OpenCode, Kimi, DeepSeek, Qwen, OpenRouter, Together AI, Fireworks AI) bringing total to 12
- 7 new CLI agent install paths (roo, cline, aider, bolt, kilo, continue, kimi-code) raising count to 18
- OpenCode skill tools: auto‑splitter and bi‑directional converter
Full changelog
What's New in v3.4.0
Theme: 8 new LLM platform adaptors (12 total), 7 new CLI agent paths (18 total), OpenCode skill tools, SPA site detection, 8 bug fixes, and full UML architecture documentation.
Platform Expansion: 5 → 12 LLM Targets
| New Platform | Flag | Base |
|---|---|---|
| OpenCode | --target opencode | Directory-based, dual YAML |
| Kimi | --target kimi | OpenAI-compatible |
| DeepSeek | --target deepseek | OpenAI-compatible |
| Qwen | --target qwen | OpenAI-compatible |
| OpenRouter | --target openrouter | OpenAI-compatible |
| Together AI | --target together | OpenAI-compatible |
| Fireworks AI | --target fireworks | OpenAI-compatible |
All new platforms inherit from a shared OpenAI-compatible base class for consistent behavior.
Agent Expansion: 11 → 18 Install Paths
New agents: roo, cline, aider, bolt, kilo, continue, kimi-code
OpenCode Skill Tools
- Skill splitter — auto-split large docs into focused sub-skills with router
- Bi-directional converter — import/export between OpenCode and any platform format
Distribution
- Smithery manifest (
smithery.yaml) - GitHub Actions template for automated skill updates
- Claude Code Plugin with slash commands
Bug Fixes
sanitize_url()crash on Python 3.14 stricturlparse(#284)- Blind
/index.html.mdappend breaking non-Docusaurus sites (#277) - Unified scraper temp config format (#317)
- Unicode arrows breaking Windows cp1252 terminals
- CLI flags in plugin slash commands
- MiniMax adaptor improvements (#319)
- Misleading "Scraped N pages" count — now shows
(N saved, M skipped)(#320) - SPA site detection — warns when site requires JavaScript rendering (#320, #321)
Documentation
- Full UML architecture — 14 class diagrams synced from source code via StarUML
- StarUML HTML API reference export
- Ecosystem section linking all Skill Seekers repos
- Architecture references in README and CONTRIBUTING
- Consolidated
Docs/intodocs/
Test Results
2929 passed, 39 skipped, 0 failures
Install / Upgrade
pip install --upgrade skill-seekers
Full changelog: https://github.com/yusufkaraaslan/Skill_Seekers/blob/main/CHANGELOG.md
Weekly OSS security release digest.
The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.
No spam, unsubscribe anytime.