
skill-seekers/Skill_Seekers

MCP Developer Tools

Transform 17 source types (docs, GitHub repos, PDFs, videos, Jupyter, Confluence, Notion, Slack/Discord) into AI-ready skills and RAG knowledge. 35 MCP tools for scraping, packaging, enhancing, and exporting to vector databases (Weaviate, Chroma, FAISS, Qdrant). Supports 16+ target platforms.

Python · Latest v3.7.0 (4d ago)

Features

  • Converts documentation, repos, PDFs, videos, notebooks and more into structured knowledge assets
  • Generates ready‑to‑use packages for Claude, Gemini, OpenAI/GPT, LangChain, LlamaIndex, Haystack, Pinecone, ChromaDB, FAISS, Qdrant, IBM Bob and Cursor/Windsurf/Cline AI coding assistants
  • One command (skill-seekers create) produces a universal asset that can be exported to multiple targets

Recent releases

No immediate action
v3.7.0 Breaking risk

scan command + opt-in submission

v3.6.0 Breaking risk
Notable features
  • IBM Bob packaging target via `--target bob`
  • GitHub scraper filters: issue state, labels, and since date
  • Per-issue Markdown files for GitHub issues
Full changelog

[3.6.0] - 2026-05-03

Theme: Quality-of-life release — packaging targets, GitHub issue workflow, codebase analysis fixes, and source detection hardening.

Added

  • IBM Bob packaging target — new --target bob adaptor and agent install support for IBM's Bob agent platform (#366)
  • GitHub issue filtering: `--github-issue-state`, `--github-issue-labels`, and `--github-issue-since` filters in the GitHub scraper for narrowing which issues are pulled (#367)
  • Per-issue files — GitHub scraper now writes one Markdown file per issue instead of a single bundle, improving navigation and downstream chunking (#367)
  • Pinecone frontmatter — Pinecone vector exports now include consistent YAML frontmatter for metadata round-tripping (#367)
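The issue filters and per-issue files above can be sketched in plain Python. This is a minimal illustration, not the scraper's code: the `Issue` record, `filter_issues()`, `issue_filename()` and the file-naming scheme are hypothetical, and the sketch assumes `--github-issue-labels` requires every listed label and `--github-issue-since` compares against the issue's last-update time (as GitHub's REST `since` parameter does).

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Iterable, List, Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass
class Issue:
    """Minimal stand-in for a GitHub issue record (hypothetical shape)."""
    number: int
    title: str
    state: str  # "open" or "closed"
    labels: list = field(default_factory=list)
    updated_at: datetime = EPOCH
    body: str = ""


def filter_issues(issues: Iterable[Issue], state: str = "all",
                  labels: Optional[Iterable[str]] = None,
                  since: Optional[datetime] = None) -> List[Issue]:
    """Keep issues in `state` that carry every requested label and were updated on/after `since`."""
    wanted = set(labels or [])
    kept = []
    for issue in issues:
        if state != "all" and issue.state != state:
            continue
        if not wanted.issubset(issue.labels):  # an empty label set matches everything
            continue
        if since is not None and issue.updated_at < since:
            continue
        kept.append(issue)
    return kept


def issue_filename(issue: Issue) -> str:
    """One file per issue, e.g. issue-00042-crash-on-start.md."""
    slug = re.sub(r"[^a-z0-9]+", "-", issue.title.lower())[:50].strip("-")
    return f"issue-{issue.number:05d}-{slug or 'untitled'}.md"


def issue_markdown(issue: Issue) -> str:
    """Render a single issue as a small standalone Markdown document."""
    labels = ", ".join(issue.labels) or "none"
    return f"# #{issue.number}: {issue.title}\n\nState: {issue.state}\nLabels: {labels}\n\n{issue.body}\n"
```

Writing each kept issue with `Path(out_dir, issue_filename(i)).write_text(issue_markdown(i), encoding="utf-8")` yields one small file per issue, which is what makes downstream chunking simpler than a single bundle.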

Fixed

  • Unified scraper now generates codebase_analysis/ index — local sources were producing C3.x outputs with broken SKILL.md links; the unified skill builder now wires up the index and resolves links correctly (#362, #376)
  • Guides fallback fires correctly: `unified_skill_builder` was emitting a truthy placeholder for empty guides, which suppressed the fallback content; placeholder removed (#364, #375)
  • HTML URLs no longer treated as local files: `source_detector` now checks for `http(s)://` before falling through to the local-path branch, fixing false-positive routing (#373)
  • PDF-extracted images appear in Markdown: `pdf_scraper` now inserts `![](…)` references for images extracted from PDFs so they render in the generated SKILL.md (#369)
  • C3.x output for local sources: the `unified` command was skipping the C3.x analysis pipeline for local codebase sources; it now emits the full pattern/test/guide/config/router output (#363, #372)
  • Language filter passed to C3.x clone analysis — repos cloned for analysis now respect --languages instead of analyzing every file (fixes #361, #370)
  • Unity vs Unreal detection — Unity projects with C# imports were being misidentified as Unreal; detection now keys on C# import patterns (fixes #365, #368)
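The source-routing fix comes down to check order: test for an `http://` or `https://` scheme before any extension-based local-file logic, otherwise a URL ending in `.html` can be routed as a file. A minimal sketch; `detect_source_kind()` and its return labels are hypothetical and cover only a few of the supported source types.

```python
import re
from pathlib import Path

_URL_RE = re.compile(r"^https?://", re.IGNORECASE)


def detect_source_kind(source: str) -> str:
    """Classify a CLI source argument (hypothetical helper, not source_detector's API)."""
    # Scheme check first: "https://example.com/guide.html" must not reach
    # the extension-based branches below.
    if _URL_RE.match(source.strip()):
        return "url"
    suffix = Path(source).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix == ".ipynb":
        return "notebook"
    return "local"
```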
v3.5.1 Breaking risk
Breaking changes
  • max_pages default changed from 500 to -1 (unlimited)
  • removal of hardcoded magic numbers in constants.py; now reads defaults.json
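Treating -1 as "unlimited" needs an explicit guard wherever the cap is compared, because `len(discovered) < -1` is never true (the `discover_urls()` bug in this release's Fixed list). A hypothetical breadth-first discovery helper showing the guard; `under_page_limit()`, `discover()` and the `get_links` callback are illustrative names, not the scraper's API.

```python
from collections import deque
from typing import Callable, Iterable, List

UNLIMITED = -1  # sentinel the changelog uses for "no page cap"


def under_page_limit(count: int, max_pages: int) -> bool:
    """True while another page may be fetched.

    A plain `count < max_pages` is always False when max_pages is -1,
    which is how a discovery loop can silently never run.
    """
    return max_pages == UNLIMITED or count < max_pages


def discover(start_urls: Iterable[str], get_links: Callable[[str], Iterable[str]],
             max_pages: int = UNLIMITED) -> List[str]:
    """Breadth-first URL discovery with an optional page cap (hypothetical helper)."""
    discovered: List[str] = []
    seen = set()
    queue = deque(start_urls)
    while queue and under_page_limit(len(discovered), max_pages):
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        discovered.append(url)
        queue.extend(get_links(url))
    return discovered
```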
Notable features
  • Centralized `defaults.json` config as single source of truth for all default values
  • Low‑signal code snippet filtering via `_is_low_signal_code_snippet()`
  • Pattern description normalization with `_normalize_pattern_description()`
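A single defaults file is easiest to keep authoritative when it is loaded once and shared read-only. The sketch below assumes a flat JSON object; `load_defaults()` and `get_default()` are hypothetical names, not the contents of the project's `defaults.py`.

```python
import json
from functools import lru_cache
from types import MappingProxyType
from typing import Any, Mapping


@lru_cache(maxsize=None)
def load_defaults(path: str) -> Mapping[str, Any]:
    """Read defaults.json once per path and expose it read-only.

    Caching keeps every module looking at the same values; the
    MappingProxyType wrapper stops callers from mutating the shared copy
    (nested dicts inside it are still mutable).
    """
    with open(path, encoding="utf-8") as fh:
        return MappingProxyType(json.load(fh))


def get_default(path: str, key: str) -> Any:
    """Fetch one default; a missing key raises KeyError instead of guessing."""
    return load_defaults(path)[key]
```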
Full changelog

[3.5.1] - 2026-04-12

Added

  • Centralized defaults.json config — single source of truth for all default values (rate_limit, max_pages, workers, async_mode, enhancement, analysis, RAG settings). New defaults.py loader module. All 15+ files that previously hardcoded defaults now read from this file (#356)
  • Low-signal code snippet filtering: `_is_low_signal_code_snippet()` filters junk patterns such as bare `True`, `options`, and single identifiers from quick references (#360)
  • Pattern description normalization: `_normalize_pattern_description()` cleans boilerplate prefixes and truncates to the first meaningful sentence (#360)
  • Example language priority ranking: `_example_language_priority()` ranks Python > Bash > JSON > etc. for SKILL.md examples (#360)
  • checkpoint_exists() method on DocToSkillConverter — was called but never defined (#360)
  • Unified config source normalization: `DocToSkillConverter.__init__` merges fields from `sources[0]` into a flat config for compatibility (#360)
  • display_name support in SKILL.md generation — produces cleaner titles and slugs (#360)
  • New tests: test_doc_scraper_entrypoint.py (regression for _run_scraping), quick-reference quality tests, docs-only compatibility tests, nested reference coverage tests (#360)
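The snippet filter and language ranking can be approximated with a couple of small heuristics. This is an illustrative sketch, not the project's `_is_low_signal_code_snippet()` or `_example_language_priority()`: the rules (empty text, a bare keyword such as `True`, or a lone identifier such as `options`) and the rank numbers are assumptions based on the descriptions above.

```python
import keyword
import re
from typing import List, Tuple

# A lone identifier or dotted name, e.g. `options` or `self.value`.
_BARE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")

# Lower rank = preferred. The Python > Bash > JSON order comes from the
# changelog; the exact numbers and the fallback rank are assumptions.
LANGUAGE_RANK = {"python": 0, "bash": 1, "shell": 1, "json": 2}


def is_low_signal_snippet(code: str) -> bool:
    """Heuristic: True for empty snippets, bare keywords (True, None) or lone names."""
    text = code.strip()
    return not text or keyword.iskeyword(text) or _BARE_NAME.fullmatch(text) is not None


def rank_examples(examples: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop low-signal (language, code) pairs, then sort by language preference."""
    useful = [ex for ex in examples if not is_low_signal_snippet(ex[1])]
    fallback = max(LANGUAGE_RANK.values()) + 1  # unknown languages sort last
    return sorted(useful, key=lambda ex: LANGUAGE_RANK.get(ex[0].lower(), fallback))
```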

Changed

  • max_pages default is now unlimited (-1) — the scraper fetches all pages unless the user explicitly sets --max-pages. Previously defaulted to 500 (#356)
  • --no-rate-limit flag now works — was defined in CLI arguments but never consumed by ExecutionContext (#356)
  • constants.py reads from defaults.json — no longer contains hardcoded magic numbers (#356)
  • ExecutionContext.ScrapingSettings: `rate_limit` and `max_pages` now use real defaults instead of `None`, preventing None-poisoning downstream (#356)
  • SKILL.md frontmatter cleanup — empty doc_version: and version: fields are now omitted; placeholder sections removed (#360)
  • Enhancement routing through platform adaptors instead of importing nonexistent enhance_skill_md helper (#360)
  • quality_metrics.py uses rglob for nested reference directories in unified skills (#360)
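The None-poisoning fixed here (and the `TypeError` in this release's Fixed list) stems from how `dict.get` works: its fallback applies only when the key is missing, not when the key holds `None`. A short demonstration plus one way to guard against it; the `DEFAULT_RATE_LIMIT` value and the `setting()` helper are hypothetical.

```python
DEFAULT_RATE_LIMIT = 0.5  # hypothetical value for illustration, not the real default

config = {"rate_limit": None}  # key present, value explicitly None

# dict.get only uses its fallback when the key is *absent*, so this is None.
naive = config.get("rate_limit", DEFAULT_RATE_LIMIT)


def setting(cfg: dict, key: str, default):
    """Return cfg[key], treating an explicit None the same as a missing key.

    `cfg.get(key) or default` would be shorter but would also discard
    legitimate falsy values such as 0.
    """
    value = cfg.get(key)
    return default if value is None else value


safe = setting(config, "rate_limit", DEFAULT_RATE_LIMIT)
```

Comparing `naive > 0` reproduces exactly the `'>' not supported between instances of 'NoneType' and 'int'` error quoted in the changelog.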

Fixed

  • `TypeError: '>' not supported between instances of 'NoneType' and 'int'`: `rate_limit` defaulted to `None` in `ExecutionContext`, which flowed through `config.get("rate_limit", DEFAULT)` (`dict.get` returns `None` when the key exists with value `None`, ignoring the fallback). Fixed in doc_scraper.py (sync + async paths), estimate_pages.py, and sync_config.py (#356, #359)
  • `discover_urls()` loop never executed with unlimited `max_pages`: `len(discovered) < -1` is always False. Added an unlimited-mode guard (#356)
  • `_run_scraping()` called the nonexistent `converter.scrape()` method; changed to `converter.scrape_all()` (#360)
  • None-safety for BeautifulSoup attributes: `link["href"]`, `sitemap.text`, and `meta_desc["content"]` are now guarded against `None` XML text nodes (#360)
  • Python 3.10 compatibility — backslash in f-string in quality_metrics.py not supported before 3.12 (#360)
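For the None-safety fix, the usual pattern is to read optional values defensively: `tag.get("href")` instead of `tag["href"]` in BeautifulSoup, and `(node.text or "")` for XML nodes that may be empty. A minimal sitemap example using the standard library; `sitemap_urls()` is a hypothetical helper, and for untrusted input the `defusedxml` parser adopted in v3.5.0 is the safer choice (its `defusedxml.ElementTree.fromstring` is a drop-in replacement).

```python
import xml.etree.ElementTree as ET
from typing import List

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> List[str]:
    """Extract <loc> URLs from a sitemap, skipping empty nodes."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        # <loc/> parses with loc.text == None, so never call .strip() on it directly.
        text = (loc.text or "").strip()
        if text:
            urls.append(text)
    return urls
```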
v3.5.0 Breaking risk
⚠ Upgrade required
  • All content extraction features (pattern detection, test examples, how‑to guides, config extraction, router generation) are now enabled by default; no opt‑in required
  • Dynamic routing via `_build_argv()` replaces manual argument forwarding and adds 7 previously missing CLI flags
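Dynamic forwarding replaces a hand-maintained list of flags with one generic conversion, so a newly added option cannot be silently dropped. A sketch under stated assumptions: `build_argv()` is hypothetical (not the project's `_build_argv()`), it expects options as a dict such as `vars(argparse_namespace)`, and it follows common argparse conventions (underscores become hyphens, `True` means a bare switch, `None`/`False` mean "not set").

```python
from typing import Any, Dict, List


def build_argv(options: Dict[str, Any]) -> List[str]:
    """Convert parsed options back into CLI flags for a sub-command."""
    argv: List[str] = []
    for name, value in options.items():
        if value is None or value is False:
            continue  # option not set, or switch turned off: forward nothing
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)  # bare boolean switch
        elif isinstance(value, (list, tuple)):
            argv.append(flag)
            argv.extend(str(v) for v in value)  # multi-value option
        else:
            argv.extend([flag, str(value)])  # ints such as 0 are still forwarded
    return argv
```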
Breaking changes
  • Renamed `claude-enhanced` merge mode to `ai-enhanced` (backward‑compatible alias retained)
  • Removed hardcoded Claude references across the codebase
  • Removed GitHub API analysis limit of 50 files and config extraction limit of 100 files
Security fixes
  • Removed command injection vulnerability from cloned repo script execution
  • Replaced `git add -A` with targeted staging in marketplace publisher
  • Cleared auth tokens from cached `.git/config` after clone
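One way to clear a token after an authenticated clone is to rewrite the remote URL without its credentials. This sketch is an assumption about the approach, not the project's code: `strip_credentials()` and `scrub_origin()` are hypothetical, while `git remote get-url` and `git remote set-url` are standard git commands.

```python
import subprocess
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url: str) -> str:
    """Drop any user:token@ prefix from a URL's network location."""
    parts = urlsplit(url)
    host = parts.netloc.rpartition("@")[2]  # keep host[:port] only
    return urlunsplit(parts._replace(netloc=host))


def scrub_origin(repo_dir: str) -> None:
    """Rewrite a clone's origin URL so the token no longer sits in .git/config."""
    url = subprocess.run(
        ["git", "-C", repo_dir, "remote", "get-url", "origin"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    # Argument lists (no shell=True) also avoid shell injection via repo_dir or url.
    subprocess.run(
        ["git", "-C", repo_dir, "remote", "set-url", "origin", strip_credentials(url)],
        check=True,
    )
```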
Notable features
  • Grand Unification: single `create` command for 18 source types with auto‑detection and direct converters
  • Agent‑agnostic `AgentClient` abstraction supporting Claude, Kimi, Codex, Copilot, OpenCode, and custom agents via API‑key detection
  • Headless browser rendering (`--browser` flag) using Playwright to handle JavaScript SPAs
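Auto-detecting the agent from API keys amounts to checking environment variables in a fixed precedence order, with an explicit choice taking priority. A hypothetical sketch: the variable names, their precedence and the agent labels are assumptions, not Skill Seekers' actual configuration.

```python
import os
from typing import Mapping, Optional

# Hypothetical precedence: the first variable that is set and non-blank wins.
AGENT_KEYS = [
    ("ANTHROPIC_API_KEY", "claude"),
    ("MOONSHOT_API_KEY", "kimi"),
    ("OPENAI_API_KEY", "codex"),
]


def detect_agent(env: Optional[Mapping[str, str]] = None,
                 override: Optional[str] = None) -> Optional[str]:
    """Pick an agent name: explicit override first, then the first API key found."""
    if override:
        return override
    env = os.environ if env is None else env
    for var, agent in AGENT_KEYS:
        if env.get(var, "").strip():
            return agent
    return None  # caller decides how to report "no agent configured"
```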
Full changelog

[3.5.0] - 2026-04-09

Theme: Grand Unification — one command, one interface, direct converters. Agent-agnostic architecture, marketplace pipeline, smart SPA discovery, all content extraction enabled by default. 80+ files changed across the codebase.

Added

  • Grand Unification — unified create command as single entry point for all 18 source types with auto-detection, direct converter invocation, and centralized enhancement (#346)
  • Agent-agnostic AgentClient abstraction — all 5 enhancers now support Claude, Kimi, Codex, Copilot, OpenCode, and custom agents via a unified interface. Auto-detects agent from API keys instead of hardcoding (#336)
  • Kimi CLI integration with stdin piping and output parsing (#336)
  • MarketplacePublisher — publish skills to Claude Code plugin marketplace repos (#336)
  • MarketplaceManager — register and manage marketplace repositories (#336)
  • ConfigPublisher — push configs to registered config source repos (#336)
  • push_config MCP tool for automated config publishing (#336)
  • Smart SPA discovery engine — three-layer discovery: sitemap.xml, llms.txt, SPA nav rendering (#336)
  • "browser": true config support for JavaScript SPA sites with browser renderer timeout defaults (60s, domcontentloaded) (#336)
  • Dynamic routing via _build_argv() — replaced manual arg forwarding with dynamic forwarder, added 7 missing CLI flags (#336)
  • Kotlin language support for codebase analysis — Full C3.x pipeline support: AST parsing (classes, objects, functions, data/sealed classes, extension functions, coroutines), dependency extraction, design pattern recognition (object declaration→Singleton, companion object→Factory, sealed class→Strategy), test example extraction (JUnit, Kotest, MockK, Spek), language detection patterns, config detection (build.gradle.kts), and extension maps across all analyzers (#287)
  • Headless browser rendering (--browser flag) — uses Playwright to render JavaScript SPA sites (React, Vue, etc.) that return empty HTML shells. Auto-installs Chromium on first use. Optional dep: pip install "skill-seekers[browser]" (#321)
  • skill-seekers doctor command — 8 diagnostic checks (Python version, package install, git, core/optional deps, API keys, MCP server, output dir) with pass/warn/fail status and --verbose flag (#316)
  • Prompt injection check workflow — bundled prompt-injection-check workflow scans scraped content for injection patterns (role assumption, instruction overrides, delimiter injection, hidden instructions). Added as first stage in default and security-focus workflows. Flags suspicious content without removing it (#324)
  • Codex CLI plugin manifest (.codex-plugin/plugin.json) for OpenAI Codex integration (#350)
  • 6 behavioral UML diagrams — 3 sequence (create pipeline, GitHub+C3.x flow, MCP invocation), 2 activity (source detection, enhancement pipeline), 1 component (runtime dependencies with interface contracts)
  • 134 new tests: test_agent_client.py, test_config_publisher.py, and `_build_argv` tests. Total: 3194 passed, 39 expected skips (#336)
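The prompt-injection stage flags rather than removes content. A minimal pattern-based sketch of that idea; the category names mirror the four listed above, but `scan_for_injection()` and every regex are illustrative assumptions and far from exhaustive.

```python
import re
from dataclasses import dataclass
from typing import List

# Illustrative patterns only, one per category named in the changelog.
INJECTION_PATTERNS = {
    "role_assumption": re.compile(r"\byou are now\b|\bact as (?:an?|the)\b", re.I),
    "instruction_override": re.compile(
        r"\bignore (?:all |any )?(?:previous|prior|above) instructions\b", re.I),
    "delimiter_injection": re.compile(r"</?(?:system|assistant|user)>|\[/?INST\]", re.I),
    "hidden_instruction": re.compile(r"<!--.*?(?:instruction|prompt).*?-->", re.I | re.S),
}


@dataclass
class Finding:
    category: str
    line: int      # 1-based line where the match starts
    excerpt: str   # matched text, truncated for reports


def scan_for_injection(text: str) -> List[Finding]:
    """Report suspicious spans; the input text is never modified."""
    findings = []
    for category, pattern in INJECTION_PATTERNS.items():
        for match in pattern.finditer(text):
            line = text.count("\n", 0, match.start()) + 1
            findings.append(Finding(category, line, match.group(0)[:80]))
    return sorted(findings, key=lambda f: (f.line, f.category))
```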

Changed

  • All content extraction features enabled by default — pattern detection, test examples, how-to guides, config extraction, and router generation no longer require explicit opt-in
  • Renamed claude-enhanced merge mode to ai-enhanced — backward compatibility alias kept (#336)
  • Removed 118+ hardcoded Claude references across 60+ files (#336)
  • Refactored 5 enhancers to use AgentClient abstraction (#336)
  • Removed 50-file GitHub API analysis limit (#336)
  • Removed 100-file config extraction limit (#336)
  • Fixed unified scraper default max_pages from 100 to 500 (#336)
  • Centralized enhancement timeouts to 45min default with unlimited support (#336)
  • Excluded slow MCP/e2e tests from CI coverage step to prevent timeout

Fixed

  • glob('*.md') replaced with rglob('*.md') in all adaptors — fixes packaging when skills are in nested directories (#349)
  • scraped_data list-vs-dict bug in conflict detection (#336)
  • base_url passthrough to doc scraper subprocess (#336)
  • URL filtering now uses base directory correctly (#336)
  • C3.x analysis data loss (#336)
  • --enhance-level flag not passed correctly (#336)
  • guide_enhancer method rename: `_call_claude_api` renamed to `_call_ai` (#336)
  • 11 pre-existing test failures fixed (#336)
  • Per-file language detection in GitHub scraper (#336)
  • GitHub language detection crashes with TypeError when API response contains non-integer metadata keys (e.g., "url") — now filters to integer values only (#322)
  • C3.x codebase analysis crashes with TypeError: `_run_c3_analysis()` and `_analyze_c3x()` passed the removed `enhance_with_ai`/`ai_mode` kwargs to `analyze_codebase()` instead of `enhance_level` (#323)
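The language-detection fix boils down to keeping only integer byte counts before doing arithmetic; summing a mapping that contains a string such as a `url` entry raises `TypeError`. A hypothetical `language_breakdown()` helper, assuming the response is a `{language: bytes}` dict:

```python
from typing import Any, Dict


def language_breakdown(raw: Dict[str, Any]) -> Dict[str, float]:
    """Turn {language: bytes} into percentages, skipping non-integer entries."""
    counts = {
        name: value
        for name, value in raw.items()
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool)
    }
    total = sum(counts.values())
    if total == 0:
        return {}
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return {name: round(100 * n / total, 1) for name, n in ranked}
```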

Security

  • Removed command injection via cloned repo script execution (#336)
  • Replaced git add -A with targeted staging in marketplace publisher (#336)
  • Clear auth tokens from cached .git/config after clone (#336)
  • Use defusedxml for sitemap XML parsing (XXE protection) (#336)
  • Path traversal validation for config names (#336)
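Path-traversal validation for config names usually combines a strict name whitelist with a check that the resolved path stays inside the config directory. A sketch under assumptions: the allowed characters, the 64-character limit, the `.json` suffix and `config_path()` are hypothetical.

```python
import re
from pathlib import Path

# Letters, digits, "_", "." and "-", starting with a letter or digit (assumed policy).
_CONFIG_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]{0,63}")


def config_path(config_dir: str, name: str) -> Path:
    """Map a user-supplied config name to <config_dir>/<name>.json, or raise ValueError."""
    if not _CONFIG_NAME.fullmatch(name) or ".." in name:
        raise ValueError(f"invalid config name: {name!r}")
    base = Path(config_dir).resolve()
    candidate = (base / f"{name}.json").resolve()
    # Second check after resolving symlinks: the file must still sit inside base.
    if base not in candidate.parents:
        raise ValueError(f"config name escapes {base}: {name!r}")
    return candidate
```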
v3.4.0 New feature
Notable features
  • 8 new LLM platform adaptors (OpenCode, Kimi, DeepSeek, Qwen, OpenRouter, Together AI, Fireworks AI) bringing total to 12
  • 7 new CLI agent install paths (roo, cline, aider, bolt, kilo, continue, kimi-code) raising count to 18
  • OpenCode skill tools: auto‑splitter and bi‑directional converter
Full changelog

What's New in v3.4.0

Theme: 8 new LLM platform adaptors (12 total), 7 new CLI agent paths (18 total), OpenCode skill tools, SPA site detection, 8 bug fixes, and full UML architecture documentation.

Platform Expansion: 5 → 12 LLM Targets

| New Platform | Flag | Base |
|---|---|---|
| OpenCode | --target opencode | Directory-based, dual YAML |
| Kimi | --target kimi | OpenAI-compatible |
| DeepSeek | --target deepseek | OpenAI-compatible |
| Qwen | --target qwen | OpenAI-compatible |
| OpenRouter | --target openrouter | OpenAI-compatible |
| Together AI | --target together | OpenAI-compatible |
| Fireworks AI | --target fireworks | OpenAI-compatible |

All new platforms inherit from a shared OpenAI-compatible base class for consistent behavior.

Agent Expansion: 11 → 18 Install Paths

New agents: roo, cline, aider, bolt, kilo, continue, kimi-code

OpenCode Skill Tools

  • Skill splitter — auto-split large docs into focused sub-skills with router
  • Bi-directional converter — import/export between OpenCode and any platform format
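A skill splitter of this kind can be sketched as: cut the document at second-level headings, write each section as its own sub-skill, and generate a router file that links to them. The sketch below is hypothetical (function name, file naming, router layout) and does not special-case `## ` lines inside fenced code blocks, which a real splitter would need to handle.

```python
import re
from typing import Dict


def split_skill(markdown: str, title: str) -> Dict[str, str]:
    """Split a large Markdown doc at '## ' headings into sub-skill files plus a router.

    Returns {filename: content}. '## ' lines inside fenced code blocks
    are not special-cased here.
    """
    preamble, *sections = re.split(r"(?m)^## ", markdown)
    files: Dict[str, str] = {}
    router = [f"# {title}", "", preamble.strip(), "", "## Sub-skills", ""]
    for section in sections:
        heading, _, body = section.partition("\n")
        heading = heading.strip()
        slug = re.sub(r"[^a-z0-9]+", "-", heading.lower()).strip("-") or "section"
        filename, n = f"{slug}.md", 2
        while filename in files:  # keep duplicate headings from overwriting each other
            filename, n = f"{slug}-{n}.md", n + 1
        files[filename] = f"# {heading}\n{body}"
        router.append(f"- [{heading}]({filename})")
    files["SKILL.md"] = "\n".join(router).strip() + "\n"
    return files
```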

Distribution

  • Smithery manifest (smithery.yaml)
  • GitHub Actions template for automated skill updates
  • Claude Code Plugin with slash commands

Bug Fixes

  • sanitize_url() crash on Python 3.14 strict urlparse (#284)
  • Blind /index.html.md append breaking non-Docusaurus sites (#277)
  • Unified scraper temp config format (#317)
  • Unicode arrows breaking Windows cp1252 terminals
  • CLI flags in plugin slash commands
  • MiniMax adaptor improvements (#319)
  • Misleading "Scraped N pages" count — now shows (N saved, M skipped) (#320)
  • SPA site detection — warns when site requires JavaScript rendering (#320, #321)
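The Windows terminal fix follows a common pattern: try the console's encoding first and fall back to ASCII stand-ins only when it cannot represent a symbol (cp1252 has no arrow characters, so `"→".encode("cp1252")` raises `UnicodeEncodeError`). `console_safe()` and its fallback table are hypothetical.

```python
import sys
from typing import Optional

# ASCII stand-ins for symbols that cp1252 consoles cannot encode (hypothetical table).
ASCII_FALLBACKS = {"\u2192": "->", "\u2190": "<-", "\u2713": "[ok]", "\u2717": "[x]"}


def console_safe(text: str, encoding: Optional[str] = None) -> str:
    """Return text unchanged if the console can encode it, else an ASCII-friendly copy."""
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    try:
        text.encode(encoding)
        return text
    except UnicodeEncodeError:
        for symbol, ascii_form in ASCII_FALLBACKS.items():
            text = text.replace(symbol, ascii_form)
        # Anything still unencodable becomes "?" instead of crashing print().
        return text.encode(encoding, errors="replace").decode(encoding)
```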

Documentation

  • Full UML architecture — 14 class diagrams synced from source code via StarUML
  • StarUML HTML API reference export
  • Ecosystem section linking all Skill Seekers repos
  • Architecture references in README and CONTRIBUTING
  • Consolidated Docs/ into docs/

Test Results

2929 passed, 39 skipped, 0 failures

Install / Upgrade

pip install --upgrade skill-seekers

Full changelog: https://github.com/yusufkaraaslan/Skill_Seekers/blob/main/CHANGELOG.md


About

Stars: 13,909
Forks: 1,438
Languages: Python, Shell, Dockerfile

Install & Platforms

Install via: pip
