Hugging Face Model Scout
Finds and compares Hugging Face models for a task, weighing license, size, latency, and benchmarks.
What it does
Hugging Face Model Scout turns an under-specified modeling need into a ranked, decision-ready shortlist of models on the Hugging Face Hub. Instead of pointing you at the most-downloaded checkpoint, it weighs the dimensions that actually decide a deployment:
- License — Apache-2.0 vs. MIT vs. Llama Community vs. Gemma vs. CC-BY-NC vs. gated/proprietary, with non-commercial and access restrictions flagged loudly.
- Size & memory footprint — parameter count translated into real memory at fp16 / int8 / 4-bit, including KV-cache overhead.
- Latency & throughput — characteristics for your specific hardware (single GPU, CPU-only, edge, serverless).
- Context length & modality — max sequence, input/output format, and whether the model even fits your task.
- Benchmarks — cited scores from the relevant leaderboard (MTEB, Open LLM Leaderboard, MMLU, HumanEval, Open ASR Leaderboard, …), not vague "best in class" claims.
It returns a comparison table, a recommendation with a runner-up and a budget option, and honest caveats about what you still need to verify.
Built on Claude Sonnet 4.6 (claude-sonnet-4-6) for a strong balance of reasoning quality and cost.
Setup
Set your Anthropic API key:
export ANTHROPIC_API_KEY=sk-ant-...
Then instantiate the agent with one of the configs below (YAML, JSON, TypeScript, CLI, or Python). All five represent the same agent — same model, same system prompt.
Example
Prompt:
I need a sentence-embedding model for semantic search over English support tickets. Must be commercially usable, run on CPU, and fit in ~2 GB RAM. Quality matters but latency more so.
Response (abridged):
Task: English semantic-search embeddings · CPU-only · commercial license · ≤2 GB RAM · latency-priority.
| Model | License | Params | ~Memory (fp32) | Context | MTEB (Retrieval avg) | Notes |
|---|
BAAI/bge-small-en-v1.5 | MIT | 33M | ~130 MB | 512 | strong for size | Fast on CPU, drop-in |
sentence-transformers/all-MiniLM-L6-v2 | Apache-2.0 | 22M | ~90 MB | 256 | solid baseline | Ubiquitous, very fast |
BAAI/bge-base-en-v1.5 | MIT | 109M | ~440 MB | 512 | higher than small | More accurate, ~3× slower |
Recommendation: Start with BAAI/bge-small-en-v1.5 — best quality-per-millisecond on CPU, MIT (commercial-safe), trivially under your RAM budget. Keep all-MiniLM-L6-v2 as the latency floor if bge-small is too slow at your QPS. If accuracy lags, bge-base-en-v1.5 still fits 2 GB and lifts retrieval quality.
Verify: Benchmark all three on your ticket set with sentence-transformers before committing — MTEB averages don't always predict domain performance. Confirm current MTEB numbers on the live leaderboard, since rankings shift.
Notes
- The agent cites benchmarks by name and refuses to invent leaderboard numbers — when a figure is uncertain it says so and points you to the model card or live leaderboard.
- Memory estimates are derived from parameter count and precision; the agent shows the arithmetic so you can sanity-check.
- Model availability, licenses, and benchmark rankings change frequently. Treat the shortlist as a starting point and confirm details on huggingface.co/models and the relevant leaderboard.
- For long candidate lists or large model cards, prefer streaming (
.stream() / client.messages.stream) to avoid request timeouts.