Why Vector Databases Are Essential for AI
Every RAG pipeline, semantic search engine, and recommendation system depends on vector similarity search. Embedding models produce vectors with hundreds or thousands of dimensions, a workload that traditional databases handle poorly. Purpose-built vector databases use specialized indexing algorithms (HNSW, IVF, DiskANN) to perform approximate nearest-neighbor search at scale.
Choosing the right vector database for a Kubernetes deployment requires evaluating performance, operational complexity, scalability, and ecosystem fit. We benchmarked the three most popular open-source options — Qdrant, Weaviate, and Milvus — across these dimensions.
The Contenders
Qdrant — Written in Rust, focused on performance and simplicity. Offers a clean gRPC/REST API and supports advanced filtering alongside vector search.
Weaviate — Written in Go, positions itself as an "AI-native" database with built-in vectorization modules. Supports GraphQL queries and hybrid (vector + keyword) search out of the box.
Milvus — A CNCF project written in Go/C++, designed for massive scale. Uses a distributed architecture with separate storage and compute components.
Kubernetes Deployment Comparison
Qdrant
Qdrant's Helm chart deploys a StatefulSet with straightforward configuration:
```shell
helm repo add qdrant https://qdrant.github.io/qdrant-helm
helm install qdrant qdrant/qdrant \
  --set replicaCount=3 \
  --set persistence.size=50Gi \
  --set resources.requests.memory=4Gi \
  --set resources.requests.cpu=2
```

Qdrant's single-binary architecture means fewer moving parts. A 3-node cluster requires exactly 3 pods, a headless service, and persistent volumes. Replication and sharding are configured at the collection level via the API.
Operational complexity: Low. Upgrades are rolling StatefulSet updates. Backup is a snapshot API call.
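Both points can be sketched against Qdrant's REST API. The collection name, shard count, and replica count below are illustrative, not recommendations: replication and sharding are set when the collection is created, and a backup is a single snapshot call.

```python
import json

def create_collection_request(name: str, dim: int = 768,
                              shards: int = 6, replicas: int = 2):
    """Build the Qdrant REST call that creates a sharded, replicated
    collection. Returns (method, path, JSON body)."""
    body = {
        "vectors": {"size": dim, "distance": "Cosine"},
        "shard_number": shards,          # shards spread across the cluster
        "replication_factor": replicas,  # each shard kept on this many nodes
    }
    return "PUT", f"/collections/{name}", json.dumps(body)

def snapshot_request(name: str):
    """Backup is one call: POST /collections/{name}/snapshots."""
    return "POST", f"/collections/{name}/snapshots", None

method, path, body = create_collection_request("docs")
# Send with any HTTP client, e.g.:
#   curl -X PUT http://qdrant:6333/collections/docs \
#        -H 'Content-Type: application/json' -d "$BODY"
```

Because these settings live on the collection rather than in the Helm chart, resharding or changing the replication factor does not require touching the StatefulSet.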
Weaviate
Weaviate also deploys as a StatefulSet but has optional module pods for vectorization:
```shell
helm repo add weaviate https://weaviate.github.io/weaviate-helm
helm install weaviate weaviate/weaviate \
  --set replicas=3 \
  --set storage.size=50Gi \
  --set modules.text2vec-transformers.enabled=true
```

Enabling vectorization modules adds sidecar containers or separate deployments, increasing resource consumption. If you're generating embeddings externally (which we recommend for production), disable these modules.
Operational complexity: Medium. Module management adds configuration surface. Schema management requires more planning upfront.
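To follow the external-embedding advice above, a Weaviate class can be declared with server-side vectorization switched off; a minimal sketch (the class and property names are illustrative):

```python
import json

# Class definition with no vectorizer: Weaviate stores and indexes
# client-supplied vectors, so no text2vec module pods are needed.
article_class = {
    "class": "Article",          # illustrative class name
    "vectorizer": "none",
    "properties": [
        {"name": "title",  "dataType": ["text"]},
        {"name": "tenant", "dataType": ["text"]},
    ],
}

# POST this to /v1/schema; at import time each object then carries its
# own "vector" field produced by your external embedding pipeline.
print(json.dumps(article_class, indent=2))
```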
Milvus
Milvus has the most complex architecture with separate components:
```shell
helm repo add milvus https://milvus-io.github.io/milvus-helm
helm install milvus milvus/milvus \
  --set cluster.enabled=true \
  --set etcd.replicaCount=3 \
  --set minio.mode=distributed \
  --set pulsar.enabled=true
```

A distributed Milvus deployment includes proxy nodes, query nodes, data nodes, index nodes, etcd (metadata), MinIO (object storage), and Pulsar (message queue). That's 15+ pods minimum for a production cluster.
Operational complexity: High. More components mean more failure modes, more monitoring, and more upgrade coordination.
Performance Benchmarks
We benchmarked all three databases on identical Kubernetes infrastructure (3x m5.2xlarge nodes) using a dataset of 5 million 768-dimensional vectors (OpenAI embeddings):
Ingestion Speed (vectors/second)
| Database | Batch Insert (batch size 1,000) | Single Insert |
|---|---|---|
| Qdrant | 45,000 | 2,800 |
| Weaviate | 32,000 | 1,900 |
| Milvus | 52,000 | 1,200 |
Milvus leads on batch ingestion due to its distributed write path. Qdrant wins on single-insert throughput thanks to its lean, single-binary write path.
Query Performance (queries/second at recall@10 > 0.95)
| Database | No Filter | With Filter (10% selectivity) |
|---|---|---|
| Qdrant | 4,200 | 3,800 |
| Weaviate | 3,100 | 2,400 |
| Milvus | 3,800 | 3,200 |
Qdrant consistently delivers the best query performance, especially with filtering. Its payload index allows efficient pre-filtering before vector search, avoiding the post-filter recall degradation that affects other databases.
Memory Efficiency (GB used at 5M vectors)
| Database | RAM Usage | Disk Usage |
|---|---|---|
| Qdrant | 8.2 GB | 12.1 GB |
| Weaviate | 11.4 GB | 18.3 GB |
| Milvus | 14.8 GB | 15.6 GB |
Qdrant is the most memory-efficient, largely because it can keep vectors on disk in memory-mapped files rather than holding them entirely in RAM; its Rust implementation also keeps per-vector overhead low.
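A sketch of the settings involved, assuming Qdrant's collection-creation API (the threshold value below is illustrative, not a tuned recommendation): raw vectors can be kept on disk, and larger index segments can be memory-mapped instead of loaded into RAM.

```python
# Collection settings that trade RAM for disk: raw vectors are stored
# on disk, and segments above the threshold (in KB) are served from
# read-only memory-mapped files rather than held fully in memory.
low_memory_collection = {
    "vectors": {"size": 768, "distance": "Cosine", "on_disk": True},
    "optimizers_config": {"memmap_threshold": 20000},  # illustrative value
}
```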
Filtered Search: The Production Differentiator
In production, you almost never search all vectors. You filter by tenant, category, date range, or access control. This is where the databases diverge significantly:
Qdrant supports native payload filtering with indexed fields, applied before the vector search. This maintains high recall even with restrictive filters.
Weaviate supports filtering through its GraphQL API with a where clause. Performance is good but degrades more steeply with complex multi-field filters.
Milvus has improved its filtering significantly with partition keys and scalar indexes, but complex filter expressions can still cause query planning overhead.
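As a sketch of the Qdrant approach described above (collection and field names are illustrative): a payload index is created for the filter field, and the filter then travels with the query vector so it is applied during the vector search rather than after it.

```python
import json

def index_field_request(collection: str, field: str):
    """Create a keyword payload index so filters on this field are cheap.
    Returns (method, path, JSON body) for Qdrant's REST API."""
    body = {"field_name": field, "field_schema": "keyword"}
    return "PUT", f"/collections/{collection}/index", json.dumps(body)

def filtered_search_request(collection: str, query_vec, tenant: str,
                            k: int = 10):
    """Search body with a tenant filter; pre-filtering preserves
    recall@k even under restrictive filters."""
    body = {
        "vector": query_vec,
        "filter": {"must": [{"key": "tenant", "match": {"value": tenant}}]},
        "limit": k,
    }
    return "POST", f"/collections/{collection}/points/search", json.dumps(body)
```

The same shape extends to date-range or ACL filters by adding more clauses to the `must` list.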
High Availability and Disaster Recovery
| Feature | Qdrant | Weaviate | Milvus |
|---|---|---|---|
| Replication | Yes (configurable per collection) | Yes (built-in) | Yes (segment-level) |
| Sharding | Yes (automatic or manual) | Yes (automatic) | Yes (channel-based) |
| Backup/Restore | Snapshot API | Backup modules | MinIO snapshots |
| Zero-downtime upgrade | Yes (rolling) | Yes (rolling) | Partial (component-dependent) |
| Cross-region | Manual (snapshot transfer) | Manual | Manual |
Our Recommendation
After deploying all three in production Kubernetes environments for clients:
- Choose Qdrant for most use cases. Best performance-to-complexity ratio. Ideal for teams that want a fast, reliable vector database without operational overhead. Our default recommendation at MBB AI Studio.
- Choose Weaviate if you need built-in hybrid search (vector + BM25) and prefer a GraphQL API. Good for teams already in the Weaviate ecosystem.
- Choose Milvus if you're operating at massive scale (100M+ vectors) and have a dedicated platform team to manage the infrastructure. Its distributed architecture shines at scale but is overkill for most deployments.
Conclusion
The vector database landscape is maturing rapidly. For Kubernetes-native deployments, operational simplicity should be weighted heavily alongside raw performance. A database that's 10% faster but requires 5x the operational effort isn't a good trade. Start with Qdrant for most AI applications, evaluate Weaviate for hybrid search needs, and reserve Milvus for truly large-scale scenarios. Whatever you choose, invest in proper Kubernetes operators, monitoring, and backup automation from day one.