Question 1

What is the cheapest production embedding model in 2026?

Accepted Answer

OpenAI text-embedding-3-small and Voyage voyage-3-lite tie at $0.02 per 1M input tokens, with Jina jina-embeddings-v3 also at $0.02 (its weights can be self-hosted, but under the non-commercial CC-BY-NC-4.0 license). For workloads where storage cost dominates, voyage-3-lite's 512-dim output takes one third the storage of the OpenAI 1536-dim default.
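
To make the storage comparison concrete, here is a rough sketch of the arithmetic. It assumes vectors are stored as float32 (4 bytes per dimension) with no index or metadata overhead, and the corpus size of one million vectors is a hypothetical figure chosen only for illustration.

```python
# Raw vector storage, assuming float32 values and no index overhead.
BYTES_PER_FLOAT32 = 4


def storage_bytes(num_vectors: int, dims: int) -> int:
    """Bytes needed to hold num_vectors dense vectors of the given width."""
    return num_vectors * dims * BYTES_PER_FLOAT32


n = 1_000_000  # hypothetical corpus size
small = storage_bytes(n, 512)    # voyage-3-lite output width
large = storage_bytes(n, 1536)   # text-embedding-3-small default width

print(f"512-dim:  {small / 1e9:.2f} GB")
print(f"1536-dim: {large / 1e9:.2f} GB")
print(f"ratio:    {small / large:.3f}")
```

Real deployments add index structures and metadata on top of this, so treat the numbers as a lower bound rather than a bill estimate.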

Question 2

Which embedding model has the longest input context?

Accepted Answer

Voyage AI models (voyage-3, voyage-3-large, voyage-code-3) accept up to 32k input tokens, the longest in the catalog. OpenAI text-embedding-3 supports 8191 tokens. Cohere embed-v3 caps at 512 tokens, the shortest among hosted providers.
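
In practice the context limit decides how many chunks a long document has to be split into before embedding. The sketch below is a back-of-the-envelope estimate only: it reads "32k" as 32,000 tokens (an assumption), ignores chunk overlap, and takes the token count as given rather than running a real tokenizer. The dictionary keys and the `chunks_needed` helper are illustrative names, not part of any provider SDK.

```python
import math

# Context limits in tokens, as quoted in the answer above.
# "32k" is read as 32,000 here, which is an assumption.
MAX_INPUT_TOKENS = {
    "voyage-3": 32_000,
    "text-embedding-3": 8_191,
    "embed-v3": 512,
}


def chunks_needed(token_count: int, model: str) -> int:
    """Minimum number of non-overlapping chunks for a document of token_count tokens."""
    return max(1, math.ceil(token_count / MAX_INPUT_TOKENS[model]))


doc_tokens = 20_000  # hypothetical document length
for model in MAX_INPUT_TOKENS:
    print(model, chunks_needed(doc_tokens, model))
```

A 20,000-token document fits in a single Voyage request but needs 3 chunks for OpenAI and 40 for Cohere under these assumptions.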

Question 3

What is a reranker and when do I need one?

Accepted Answer

A reranker is a second-stage model in a RAG pipeline that takes a query plus N candidate documents from your initial vector retrieval and re-scores them, evaluating each (query, document) pair jointly rather than comparing precomputed vectors. Rerankers materially improve precision when your initial retriever returns a noisy top-K. Cohere rerank-v3.5 and Voyage rerank-2 are the production defaults; jina-reranker-v2 is the open-weights option.
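
The two-stage shape can be sketched in a few lines. This is a minimal illustration, not any provider's API: `rerank` and `toy_overlap_score` are hypothetical names, and the word-overlap scorer is a stand-in for a real reranker model, which you would call in its place.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Score every (query, document) pair and keep the top_n highest-scoring documents."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def toy_overlap_score(query: str, doc: str) -> float:
    """Stand-in scorer: fraction of query words that also appear in the document."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


# Pretend these came back from the first-stage vector search.
candidates = [
    "Embedding models map text to vectors",
    "A reranker re-scores retrieved documents for a query",
    "Vector databases store embeddings",
]
top = rerank("what does a reranker do for a query", candidates, toy_overlap_score, top_n=2)
for doc, score in top:
    print(f"{score:.2f}  {doc}")
```

The usual pattern is to retrieve a generous top-K from the vector index, then pass only that short list through the slower, more accurate second stage.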

Question 4

Can I self-host the open-weights embedding models?

Accepted Answer

Yes. Jina embeddings v3, Nomic Embed v1.5, mxbai-embed-large, and BGE-M3 are all open-weights and can be served via vLLM, Ollama, or text-embeddings-inference. The license column distinguishes Apache-2.0, MIT, and CC-BY-NC-4.0 (Jina is non-commercial; the rest are commercially usable).
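
If you are shortlisting models for a commercial deployment, the license check is easy to automate. The sketch below is illustrative only: the per-model license values are my understanding of the published model cards and are an assumption beyond what this answer states (it confirms only that Jina is non-commercial), so verify each one before relying on it. The identifiers are informal labels, not exact repository names.

```python
# License per model. Values are assumptions to verify against each model card.
MODEL_LICENSES = {
    "jina-embeddings-v3": "CC-BY-NC-4.0",
    "nomic-embed-v1.5": "Apache-2.0",
    "mxbai-embed-large": "Apache-2.0",
    "bge-m3": "MIT",
}

# Licenses from the list above that permit commercial use.
COMMERCIAL_OK = {"Apache-2.0", "MIT"}


def commercially_usable(licenses: dict) -> list:
    """Return the model names whose license allows commercial use, sorted by name."""
    return sorted(name for name, lic in licenses.items() if lic in COMMERCIAL_OK)


print(commercially_usable(MODEL_LICENSES))
```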

Embedding Models