Why Vector Databases Are Essential for AI
Every RAG pipeline, semantic search engine, and recommendation system depends on vector similarity search. Embedding models produce vectors with hundreds or thousands of dimensions, a workload that traditional databases handle poorly. Purpose-built vector databases use specialized indexing algorithms (HNSW, IVF, DiskANN) to perform approximate nearest-neighbor search at scale.
Choosing the right vector database for a Kubernetes deployment requires evaluating performance, operational complexity, scalability, and ecosystem fit. We benchmarked the three most popular open-source options — Qdrant, Weaviate, and Milvus — across these dimensions.
The Contenders
Qdrant — Written in Rust, focused on performance and simplicity. Offers a clean gRPC/REST API and supports advanced filtering alongside vector search.
Weaviate — Written in Go, positions itself as an "AI-native" database with built-in vectorization modules. Supports GraphQL queries and hybrid (vector + keyword) search out of the box.
Milvus — A CNCF project written in Go/C++, designed for massive scale. Uses a distributed architecture with separate storage and compute components.
Kubernetes Deployment Comparison
Qdrant
Qdrant's Helm chart deploys a StatefulSet with straightforward configuration:
```shell
helm repo add qdrant https://qdrant.github.io/qdrant-helm
helm install qdrant qdrant/qdrant \
  --set replicaCount=3 \
  --set persistence.size=50Gi \
  --set resources.requests.memory=4Gi \
  --set resources.requests.cpu=2
```

Qdrant's single-binary architecture means fewer moving parts. A 3-node cluster requires exactly 3 pods, a headless service, and persistent volumes. Replication and sharding are configured at the collection level via the API.
Operational complexity: Low. Upgrades are rolling StatefulSet updates. Backup is a snapshot API call.
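Both points can be sketched against Qdrant's REST API. The collection name, shard count, and replica count below are illustrative, not recommendations: replication and sharding are set when the collection is created, and a backup is a single snapshot call.

```python
import json

def create_collection_request(name: str, dim: int = 768,
                              shards: int = 6, replicas: int = 2):
    """Build the Qdrant REST call that creates a sharded, replicated
    collection. Returns (method, path, JSON body)."""
    body = {
        "vectors": {"size": dim, "distance": "Cosine"},
        "shard_number": shards,          # shards spread across the cluster
        "replication_factor": replicas,  # each shard kept on this many nodes
    }
    return "PUT", f"/collections/{name}", json.dumps(body)

def snapshot_request(name: str):
    """Backup is one call: POST /collections/{name}/snapshots."""
    return "POST", f"/collections/{name}/snapshots", None

method, path, body = create_collection_request("docs")
# Send with any HTTP client, e.g.:
#   curl -X PUT http://qdrant:6333/collections/docs \
#        -H 'Content-Type: application/json' -d "$BODY"
```

Because these settings live on the collection rather than in the Helm chart, resharding or changing the replication factor does not require touching the StatefulSet.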
Weaviate
Weaviate also deploys as a StatefulSet but has optional module pods for vectorization:
```shell
helm repo add weaviate https://weaviate.github.io/weaviate-helm
helm install weaviate weaviate/weaviate \
  --set replicas=3 \
  --set storage.size=50Gi \
  --set modules.text2vec-transformers.enabled=true
```

Enabling vectorization modules adds sidecar containers or separate deployments, increasing resource consumption. If you're generating embeddings externally (which we recommend for production), disable these modules.
Operational complexity: Medium. Module management adds configuration surface. Schema management requires more planning upfront.
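To follow the external-embedding advice above, a Weaviate class can be declared with server-side vectorization switched off; a minimal sketch (the class and property names are illustrative):

```python
import json

# Class definition with no vectorizer: Weaviate stores and indexes
# client-supplied vectors, so no text2vec module pods are needed.
article_class = {
    "class": "Article",          # illustrative class name
    "vectorizer": "none",
    "properties": [
        {"name": "title",  "dataType": ["text"]},
        {"name": "tenant", "dataType": ["text"]},
    ],
}

# POST this to /v1/schema; at import time each object then carries its
# own "vector" field produced by your external embedding pipeline.
print(json.dumps(article_class, indent=2))
```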
Milvus
Milvus has the most complex architecture with separate components:
```shell
helm repo add milvus https://milvus-io.github.io/milvus-helm
helm install milvus milvus/milvus \
  --set cluster.enabled=true \
  --set etcd.replicaCount=3 \
  --set minio.mode=distributed \
  --set pulsar.enabled=true
```

A distributed Milvus deployment includes proxy nodes, query nodes, data nodes, index nodes, etcd (metadata), MinIO (object storage), and Pulsar (message queue). That's 15+ pods minimum for a production cluster.
Operational complexity: High. More components mean more failure modes, more monitoring, and more upgrade coordination.
Performance Benchmarks
We benchmarked all three databases on identical Kubernetes infrastructure (3x m5.2xlarge nodes) using a dataset of 5 million 768-dimensional vectors (OpenAI embeddings):
Ingestion Speed (vectors/second)
| Database | Batch Insert (batch size 1,000) | Single Insert |
|---|---|---|
| Qdrant | 45,000 | 2,800 |
| Weaviate | 32,000 | 1,900 |
| Milvus | 52,000 | 1,200 |
Milvus leads on batch ingestion due to its distributed write path. Qdrant wins on single-insert throughput thanks to its lean, single-binary write path.
Query Performance (queries/second at recall@10 > 0.95)
| Database | No Filter | With Filter (10% selectivity) |
|---|---|---|
| Qdrant | 4,200 | 3,800 |
| Weaviate | 3,100 | 2,400 |
| Milvus | 3,800 | 3,200 |
Qdrant consistently delivers the best query performance, especially with filtering. Its payload index allows efficient pre-filtering before vector search, avoiding the post-filter recall degradation that affects other databases.
Memory Efficiency (GB used at 5M vectors)
| Database | RAM Usage | Disk Usage |
|---|---|---|
| Qdrant | 8.2 GB | 12.1 GB |
| Weaviate | 11.4 GB | 18.3 GB |
| Milvus | 14.8 GB | 15.6 GB |
Qdrant is the most memory-efficient, largely because it can keep vectors on disk in memory-mapped files rather than holding them entirely in RAM; its Rust implementation also keeps per-vector overhead low.
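A sketch of the settings involved, assuming Qdrant's collection-creation API (the threshold value below is illustrative, not a tuned recommendation): raw vectors can be kept on disk, and larger index segments can be memory-mapped instead of loaded into RAM.

```python
# Collection settings that trade RAM for disk: raw vectors are stored
# on disk, and segments above the threshold (in KB) are served from
# read-only memory-mapped files rather than held fully in memory.
low_memory_collection = {
    "vectors": {"size": 768, "distance": "Cosine", "on_disk": True},
    "optimizers_config": {"memmap_threshold": 20000},  # illustrative value
}
```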
Filtered Search: The Production Differentiator
In production, you almost never search all vectors. You filter by tenant, category, date range, or access control. This is where the databases diverge significantly:
Qdrant supports native payload filtering with indexed fields, applied before the vector search. This maintains high recall even with restrictive filters.
Weaviate supports filtering through its GraphQL API with a where clause. Performance is good but degrades more steeply with complex multi-field filters.
Milvus has improved its filtering significantly with partition keys and scalar indexes, but complex filter expressions can still cause query planning overhead.
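As a sketch of the Qdrant approach described above (collection and field names are illustrative): a payload index is created for the filter field, and the filter then travels with the query vector so it is applied during the vector search rather than after it.

```python
import json

def index_field_request(collection: str, field: str):
    """Create a keyword payload index so filters on this field are cheap.
    Returns (method, path, JSON body) for Qdrant's REST API."""
    body = {"field_name": field, "field_schema": "keyword"}
    return "PUT", f"/collections/{collection}/index", json.dumps(body)

def filtered_search_request(collection: str, query_vec, tenant: str,
                            k: int = 10):
    """Search body with a tenant filter; pre-filtering preserves
    recall@k even under restrictive filters."""
    body = {
        "vector": query_vec,
        "filter": {"must": [{"key": "tenant", "match": {"value": tenant}}]},
        "limit": k,
    }
    return "POST", f"/collections/{collection}/points/search", json.dumps(body)
```

The same shape extends to date-range or ACL filters by adding more clauses to the `must` list.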
High Availability and Disaster Recovery
| Feature | Qdrant | Weaviate | Milvus |
|---|---|---|---|
| Replication | Yes (configurable per collection) | Yes (built-in) | Yes (segment-level) |
| Sharding | Yes (automatic or manual) | Yes (automatic) | Yes (channel-based) |
| Backup/Restore | Snapshot API | Backup modules | MinIO snapshots |
| Zero-downtime upgrade | Yes (rolling) | Yes (rolling) | Partial (component-dependent) |
| Cross-region | Manual (snapshot transfer) | Manual | Manual |
Our Recommendation
After deploying all three in production Kubernetes environments for clients:
- Choose Qdrant for most use cases. Best performance-to-complexity ratio. Ideal for teams that want a fast, reliable vector database without operational overhead. Our default recommendation at MBB AI Studio.
- Choose Weaviate if you need built-in hybrid search (vector + BM25) and prefer a GraphQL API. Good for teams already in the Weaviate ecosystem.
- Choose Milvus if you're operating at massive scale (100M+ vectors) and have a dedicated platform team to manage the infrastructure. Its distributed architecture shines at scale but is overkill for most deployments.
Conclusion
The vector database landscape is maturing rapidly. For Kubernetes-native deployments, operational simplicity should be weighted heavily alongside raw performance. A database that's 10% faster but requires 5x the operational effort isn't a good trade. Start with Qdrant for most AI applications, evaluate Weaviate for hybrid search needs, and reserve Milvus for truly large-scale scenarios. Whatever you choose, invest in proper Kubernetes operators, monitoring, and backup automation from day one.