QUICK INFO BOX
| Attribute | Details |
|---|---|
| Company Name | Weaviate B.V. |
| Founders | Bob van Luijt (CEO), Etienne Dilocker (CTO) |
| Founded Year | 2019 |
| Headquarters | Amsterdam, Netherlands (with San Francisco office) |
| Industry | Database Technology / Artificial Intelligence / Open Source |
| Sector | Vector Databases / AI Infrastructure / Machine Learning |
| Company Type | Private |
| Key Investors | Cortical Ventures, Zetta Venture Partners, Battery Ventures, IVP, Index Ventures |
| Funding Rounds | Seed, Series A, B |
| Total Funding Raised | $50 Million |
| Valuation | Undisclosed (estimated at $500M+ as of February 2026) |
| Number of Employees | 100+ (February 2026) |
| Key Products / Services | Weaviate Cloud Services (WCS), Open-Source Vector Database, GraphQL API, Hybrid Search, Multi-modal Search, Multi-tenancy |
| Technology Stack | Go Language, GraphQL, HNSW Algorithm, gRPC, Kubernetes |
| Revenue (Latest Year) | Commercial adoption growing (February 2026) |
| Customer Base | 5+ Million downloads, 2,000+ companies using open-source, cloud adoption growing |
| Social Media | LinkedIn, Twitter, GitHub |
Introduction
Vector databases are essential AI infrastructure, yet most are proprietary black boxes. As AI applications exploded in 2021-2024 (ChatGPT, LangChain, LlamaIndex), every developer needed vector search for RAG (Retrieval-Augmented Generation), semantic search, and recommendations. Yet the options were limited:
- Proprietary managed services (Pinecone): Easy but vendor lock-in, limited customization, pricing uncertainty
- Traditional databases with vector extensions (PostgreSQL pgvector): Slow performance, not purpose-built for vectors
- Building from scratch: 6-12 months engineering effort, $500K-2M cost
What was missing was an open-source, production-grade vector database that developers could self-host (full control), customize (extend functionality), and inspect (transparent algorithms), while optionally using a managed cloud service when convenient.
Enter Weaviate, an open-source vector database combining a GraphQL query API, hybrid search (semantic + keyword), multi-modal vectors (text, images, audio), and production reliability (HNSW indexing, 99.9% uptime). Founded in 2019 by Bob van Luijt (CEO, serial entrepreneur) and Etienne Dilocker (CTO, systems architect), Weaviate pioneered a developer-friendly approach to vector databases: queries are written in GraphQL (familiar to web developers) rather than a custom DSL, complex filters and aggregations are supported, and both self-hosted and fully managed cloud options are available.
As of February 2026, Weaviate has raised $50 million from Cortical Ventures, Zetta Venture Partners, Battery Ventures, IVP, and Index Ventures. The open-source project has 5+ million downloads, 14,000+ GitHub stars, and 2,000+ companies using it in production, alongside growing adoption of Weaviate Cloud Services (WCS). The platform powers AI applications at Zapier, Morningstar, Stack Overflow, Instabase, and thousands of startups building semantic search, RAG chatbots, recommendation engines, and content discovery systems.
With 100+ employees, 20+ programming language clients (Python, JavaScript, Go, Java, etc.), and a modular architecture (pluggable vectorizers, rerankers, storage backends), Weaviate has become the open-source alternative to proprietary vector databases, offering transparency, flexibility, and community-driven innovation.
What makes Weaviate revolutionary:
- Open-source transparency: Inspect algorithms, customize code, self-host with full control—avoiding vendor lock-in
- GraphQL API: Writing vector queries in familiar GraphQL syntax—lowering learning curve vs. custom query languages
- Hybrid search: Combining dense vectors (semantic) with BM25 sparse vectors (keyword) + reranking in single query
- Multi-modal support: Storing text, image, audio, video embeddings together—searching across modalities
- Modular architecture: Pluggable vectorizers (OpenAI, Cohere, Hugging Face), storage backends (local, cloud), rerankers (Cohere)
The market opportunity spans the $100+ billion database market (where open source has historically captured a 40-60% share), the $200+ billion AI/ML infrastructure market, and the $50+ billion developer tools market. Every company building AI applications needs vector search, and many prefer open source for control, customization, and avoiding vendor lock-in.
Weaviate competes with Pinecone ($2.75B valuation, proprietary managed service), Qdrant ($28M funding, Rust-based open-source), Milvus (open-source, LF AI Foundation), Chroma ($20M funding, embedded database), and Elasticsearch (adding vector capabilities to traditional search). Weaviate differentiates through GraphQL query interface (unique), hybrid search sophistication (dense + sparse + reranking), modular architecture (bring-your-own-vectorizer), and European roots (GDPR compliance, data sovereignty).
The founding story reflects an open-source philosophy: Bob van Luijt, after building a semantic search startup (SeMI Technologies), recognized that semantic search infrastructure should be an open-source commodity rather than a proprietary secret, so that innovation happens at the application layer rather than the infrastructure layer. After pivoting from consulting to product, van Luijt and Dilocker built Weaviate as an open-source vector database, growing an organic community to 14K+ GitHub stars before launching a commercial cloud service.
This comprehensive article explores Weaviate’s journey from open-source project to the vector database powering 5+ million downloads and 2,000+ production deployments worldwide.
Founding Story & Background
The Semantic Search Vision (2016-2018)
Bob van Luijt, an entrepreneur based in Amsterdam, founded SeMI Technologies in 2016 with a vision of semantic search for enterprise data. Traditional keyword search fails in cases such as:
- A search for “affordable cars” should return “economical vehicles” (synonyms)
- A search for “data breach response” should return “incident management procedures” (related concepts)
- A search for “machine learning tutorials” should return Python/TensorFlow guides (implied context)
Semantic search—understanding meaning rather than exact keywords—requires embeddings (vector representations) and similarity search (finding nearest neighbors).
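The idea of nearest-neighbor search over embeddings can be illustrated with a minimal sketch. The three-dimensional vectors below are hand-made, hypothetical values rather than output from a real embedding model, and the brute-force scan stands in for the approximate indexes a real vector database would use.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, vectors, k=2):
    """Brute-force top-k search: score every stored vector, keep the best k."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]

# Hypothetical toy "embeddings"; a real model would produce hundreds of dimensions.
toy_embeddings = {
    "economical vehicles": [0.9, 0.1, 0.0],
    "incident management procedures": [0.0, 0.2, 0.9],
    "luxury yachts": [0.4, 0.9, 0.1],
}
query_vector = [0.8, 0.2, 0.1]  # stands in for the embedding of "affordable cars"

top_matches = nearest_neighbors(query_vector, toy_embeddings, k=2)
print(top_matches)
```

The query shares no keywords with "economical vehicles", yet that entry ranks first because its vector points in nearly the same direction as the query vector.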
Van Luijt and his team built a semantic search consulting practice, implementing solutions for enterprises using academic research (Word2Vec, BERT embeddings) and custom infrastructure. Yet every project reinvented the same infrastructure:
- Embedding generation: Running BERT, transforming text to vectors
- Vector storage: Building databases handling millions of vectors
- Similarity search: Implementing HNSW, FAISS, Annoy algorithms
- Query interface: Building APIs for inserting/searching vectors
This repetition frustrated van Luijt: why rebuild a vector database for each project? The insight was that vector search infrastructure should be a standardized, open-source commodity, like MySQL for relational data or Elasticsearch for text search.
2019: Open-Source Pivot
In 2019, van Luijt and Etienne Dilocker (CTO, a systems architect with distributed systems expertise) pivoted SeMI Technologies from consulting to product, building Weaviate, an open-source vector database.
Founding principles:
- Open-source first: BSD 3-Clause license, transparent algorithms, community-driven
- GraphQL API: Leveraging GraphQL (Facebook’s query language) rather than inventing custom DSL—making vector queries familiar to web developers
- Modularity: Pluggable components (vectorizers, storage, rerankers)—avoiding lock-in to specific embedding models
- Production-ready: HNSW indexing (fast approximate search), horizontal scaling, high availability
The name “Weaviate” reflects the idea of weaving together diverse data sources (text, images, structured data) into a unified semantic fabric.
2019-2020: Building Open-Source Community
From 2019 to 2020, van Luijt and Dilocker focused on community growth rather than revenue:
Strategy:
- GitHub-first: Publishing code, documentation, tutorials on GitHub
- Developer advocacy: Writing blog posts, speaking at conferences (Kafka Summit, PyData), recording tutorials
- Integration ecosystem: Building connectors for popular tools (Hugging Face, OpenAI, Cohere, TensorFlow)
Technical innovation:
- GraphQL schema: Defining vector database schema in GraphQL, enabling complex queries:
{
Get {
Article(
nearText: {concepts: ["AI safety"]}
limit: 5
) {
title
_additional {
certainty
}
}
}
}
- HNSW indexing: Implementing the Hierarchical Navigable Small World algorithm (state-of-the-art approximate nearest neighbor search)
- Cross-references: Linking objects across classes (like foreign keys in relational databases)
Early adopters were ML engineers and data scientists needing vector search for:
- Semantic search engines: Searching documentation, knowledge bases
- Recommendation systems: Finding similar products, content
- Content de-duplication: Detecting similar/duplicate text, images
By 2020, Weaviate reached 1,000+ GitHub stars, 10,000+ downloads, and dozens of production deployments.
2021: Seed Funding and Commercialization
In 2021, as large language models gained traction and retrieval-augmented generation (RAG) began to emerge as a pattern, demand for vector databases grew quickly. Van Luijt and Dilocker raised seed funding to:
- Hire core team: Engineering, DevRel (developer relations), support
- Build cloud service: Weaviate Cloud Services (WCS) for managed hosting
- Expand integrations: Supporting more vectorizers (Cohere, Jina AI), LLM frameworks (LangChain)
Seed (2021): $16 Million
- Lead Investor: Cortical Ventures (AI infrastructure specialist)
- Additional Investors: Zetta Venture Partners, angel investors
- Purpose: Team expansion, Weaviate Cloud Services, ecosystem growth
Growth accelerated:
- 2020: 1,000 GitHub stars, 10K downloads
- 2021: 5,000 stars, 100K downloads (10x growth)
- 2022: 10,000 stars, 1M downloads (10x growth)
2022-2023: LangChain Integration and RAG Explosion
In November 2022, ChatGPT launched, setting off an explosion in AI application development. Developers wanted to build “ChatGPT for my company data,” which requires RAG (Retrieval-Augmented Generation):
RAG Pattern:
- Index company documents in vector database (Weaviate)
- User asks question, convert to embedding
- Retrieve relevant document chunks from Weaviate
- Pass context + question to LLM (GPT-4)
- Generate grounded answer
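The steps above can be sketched end to end without any external service. This is a toy illustration under stated assumptions: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, the in-memory `index` list stands in for Weaviate, and the final LLM call is left out, so the sketch stops at the assembled prompt.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, index, k=1):
    """Steps 2-3: embed the question and return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: similarity(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question, context_chunks):
    """Step 4: combine retrieved context and the question into one prompt."""
    context = "\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Step 1: index the documents (hypothetical example chunks).
chunks = [
    "Refunds are issued within 30 days of purchase.",
    "Support is available on weekdays from 9 to 5.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

question = "How many days do refunds take?"
context = retrieve(question, index, k=1)
prompt = build_prompt(question, context)  # Step 5 would send this prompt to an LLM
print(prompt)
```

In a real deployment the retrieval step would be a vector query against the database, and the prompt would be passed to a model such as GPT-4 to generate the grounded answer.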
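LangChain (a Python/JS framework for LLM apps) integrated Weaviate as one of its first vector stores:
import weaviate
from langchain.vectorstores import Weaviate
from langchain.embeddings import OpenAIEmbeddings
# Assumes a local Weaviate instance and the legacy (v3) Python client
weaviate_client = weaviate.Client("http://localhost:8080")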
vectorstore = Weaviate(
client=weaviate_client,
index_name="Documents",
text_key="content",
embedding=OpenAIEmbeddings()
)
# RAG query
retriever = vectorstore.as_retriever()
docs = retriever.get_relevant_documents("What's the refund policy?")
This integration created a network effect: LangChain popularized RAG, many RAG tutorials mentioned Weaviate, and adoption grew rapidly.
2023 growth:
- Downloads: 3M+ (3x YoY)
- GitHub stars: 14,000+ (40% increase)
- Production deployments: 1,000+ companies
- Community: 500+ contributors, 5,000+ Discord members
2023-2024: Series A, Series B, and Enterprise Focus
Series A (2023): $16 Million
- Lead Investor: Battery Ventures
- Additional Investors: Cortical Ventures, Zetta Venture Partners
- Purpose: Enterprise features (multi-tenancy, RBAC), scale WCS (Weaviate Cloud Services)
Series B (2024): $18 Million
- Lead Investors: IVP, Index Ventures
- Additional Investors: Battery Ventures, Cortical Ventures
- Purpose: Global expansion, advanced features (multi-modal search), competition with Pinecone
By 2024, Weaviate served:
- 5M+ downloads (open-source)
- 2,000+ companies (production deployments)
- Growing WCS revenue (managed cloud service)
Founders & Key Team
| Relation / Role | Name | Previous Experience / Role |
|---|---|---|
| Founder, CEO | Bob van Luijt | Serial Entrepreneur, Semantic Search Expert, SeMI Technologies Founder |
| Co-Founder, CTO | Etienne Dilocker | Systems Architect, Distributed Systems Expert, Infrastructure Engineering |
| VP Engineering | Stefan Bogdan | Engineering Leadership, Database Systems, Scalability |
| Head of DevRel | Connor Shorten | Developer Advocate, ML Educator, Community Building |
Bob van Luijt (CEO) leads Weaviate with a vision for open-source AI infrastructure. His semantic search expertise (SeMI Technologies) shaped Weaviate’s architecture. Van Luijt is a prominent open-source advocate who frequently speaks about why vector databases should be open.
Etienne Dilocker (CTO) built Weaviate’s distributed systems architecture, which supports millions of vectors, horizontal scaling, and high availability. His engineering work is aimed at keeping Weaviate competitive with proprietary alternatives on performance.
Connor Shorten (Head of DevRel) grew Weaviate’s community from hundreds to 14,000+ GitHub stars through tutorials, conference talks, blog posts, and YouTube videos. His ML education background helps make Weaviate accessible to developers.
Funding & Investors
Seed (2021): $16 Million
- Lead Investor: Cortical Ventures
- Additional Investors: Zetta Venture Partners, AI/ML angels
- Purpose: Core team hiring, Weaviate Cloud Services development, ecosystem integrations
Series A (2023): $16 Million
- Lead Investor: Battery Ventures
- Additional Investors: Cortical Ventures, Zetta Venture Partners
- Purpose: Enterprise features (multi-tenancy, RBAC, SSO), scale cloud infrastructure, competition
Series B (2024): $18 Million
- Lead Investors: IVP, Index Ventures
- Additional Investors: Battery Ventures, Cortical Ventures
- Purpose: Global expansion (Europe, Asia), multi-modal search, hybrid search improvements, M&A
Total Funding Raised: $50 Million
Weaviate deployed capital across:
- Engineering: Core database development, performance optimization, new features
- Cloud infrastructure: WCS (Weaviate Cloud Services) global deployment, Kubernetes scaling
- Developer relations: Community growth, documentation, tutorials, conference presence
- Enterprise sales: Building SDR, AE teams for Fortune 500
- Ecosystem: Integrations with LangChain, LlamaIndex, OpenAI, Cohere, Hugging Face
Product & Technology Journey
A. Core Open-Source Database
Architecture (written in Go):
Schema Definition
{
"classes": [{
"class": "Article",
"vectorizer": "text2vec-openai",
"properties": [
{"name": "title", "dataType": ["text"]},
{"name": "content", "dataType": ["text"]},
{"name": "author", "dataType": ["string"]}
]
}]
}
Data Insertion
import weaviate
client = weaviate.Client("http://localhost:8080")
client.data_object.create(
class_name="Article",
data_object={
"title": "Introduction to RAG",
"content": "RAG combines retrieval and generation...",
"author": "Alice"
}
)
# Weaviate automatically generates embeddings using configured vectorizer
Vector Search
{
Get {
Article(
nearText: {concepts: ["retrieval augmented generation"]}
limit: 5
) {
title
content
_additional {
certainty # similarity score
distance # vector distance
}
}
}
}
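The two `_additional` fields in the query above are related. As a hedged sketch: for cosine distance, Weaviate's documentation describes certainty as a rescaling of distance from the range 0 to 2 into the range 1 to 0. The function name below is hypothetical, and the mapping is assumed to apply only to the cosine metric.

```python
def certainty_from_cosine_distance(distance):
    """Map a cosine distance in [0, 2] to a certainty in [0, 1].

    Assumes the cosine metric; other distance metrics have no such mapping.
    """
    if not 0.0 <= distance <= 2.0:
        raise ValueError("cosine distance must lie in [0, 2]")
    return 1.0 - distance / 2.0

identical = certainty_from_cosine_distance(0.0)   # vectors point the same way
orthogonal = certainty_from_cosine_distance(1.0)  # unrelated vectors
opposite = certainty_from_cosine_distance(2.0)    # vectors point in opposite directions
print(identical, orthogonal, opposite)
```

Under this mapping a certainty above 0.5 corresponds to a positive cosine similarity.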
B. Hybrid Search
Combining semantic (vector) and keyword (BM25) search:
{
Get {
Article(
hybrid: {
query: "GPT-4 capabilities"
alpha: 0.5 # 0=pure keyword, 1=pure vector, 0.5=balanced
}
limit: 5
) {
title
_additional {
score # hybrid score
}
}
}
}
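The effect of `alpha` can be illustrated with a simplified score-fusion sketch. This is not Weaviate's implementation: it assumes min-max normalization of each result set followed by a weighted sum, and the document IDs and scores are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} map to [0, 1]; a constant map becomes all 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}

def hybrid_scores(vector_scores, keyword_scores, alpha=0.5):
    """alpha weights the vector side, (1 - alpha) the keyword side."""
    vec = min_max_normalize(vector_scores)
    kw = min_max_normalize(keyword_scores)
    docs = set(vec) | set(kw)
    # A document missing from one result set contributes 0 from that side.
    return {d: alpha * vec.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0) for d in docs}

vector_scores = {"doc_a": 0.92, "doc_b": 0.80, "doc_c": 0.50}  # hypothetical similarities
keyword_scores = {"doc_b": 7.5, "doc_c": 2.5}                   # hypothetical BM25 scores

balanced = hybrid_scores(vector_scores, keyword_scores, alpha=0.5)
ranking = sorted(balanced, key=balanced.get, reverse=True)
print(ranking)
```

With `alpha=0.5`, `doc_b` ranks first because it scores well on both sides, while `doc_a` leads only when `alpha` is close to 1 and the keyword side is effectively ignored.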
Reranking (Cohere Rerank):
results = (
    client.query.get("Article", ["title", "content"])
    .with_hybrid(query="machine learning", alpha=0.5)
    .with_additional('rerank(property: "content" query: "machine learning") { score }')
    .do()  # requires a reranker module (e.g. reranker-cohere) to be enabled
)
Impact: 20-30% improvement in retrieval accuracy vs. pure vector search.
C. Multi-Modal Search
Searching across text, images, audio:
{
Get {
Image(
nearImage: {image: "base64encodedimage"} # find similar images
# or search by text instead (a query takes only one near* operator):
# nearText: {concepts: ["sunset beach"]}
limit: 5
) {
url
caption
}
}
}
Supported modalities:
- Text: BERT, OpenAI embeddings, Cohere
- Images: CLIP, ResNet, custom models
- Audio: Wav2Vec, custom audio embeddings
- Video: Frame embeddings, temporal vectors
D. Cross-References and Filtering
Relational features (like SQL joins):
{
Get {
Author {
name
articles { # cross-reference to Article class
... on Article {
title
content
}
}
}
}
}
Filtering (pre-filter before vector search):
{
Get {
Article(
nearText: {concepts: ["AI safety"]}
where: {
path: ["author"]
operator: Equal
valueString: "Alice"
}
) {
title
}
}
}
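The pre-filtering idea can be sketched in a few lines: restrict the candidate set with the `where` condition first, then rank only the survivors by vector similarity. This brute-force version is an illustration only; the helper names, vectors, and objects are hypothetical, and a production engine would combine an inverted index with its ANN index instead of scanning.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def filtered_search(query_vector, objects, where, limit=5):
    """Keep objects matching every key/value in `where`, then rank by dot product."""
    candidates = [
        o for o in objects
        if all(o["properties"].get(key) == value for key, value in where.items())
    ]
    candidates.sort(key=lambda o: dot(query_vector, o["vector"]), reverse=True)
    return candidates[:limit]

# Hypothetical objects with two-dimensional toy vectors.
articles = [
    {"properties": {"title": "Alignment basics", "author": "Alice"}, "vector": [0.9, 0.1]},
    {"properties": {"title": "Cooking pasta", "author": "Alice"}, "vector": [0.1, 0.9]},
    {"properties": {"title": "AI risk overview", "author": "Bob"}, "vector": [1.0, 0.0]},
]

hits = filtered_search([1.0, 0.0], articles, where={"author": "Alice"}, limit=1)
titles = [h["properties"]["title"] for h in hits]
print(titles)
```

Bob's article is the closest vector overall, but it never enters the ranking because the filter removes it before similarity is computed.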
E. Modular Architecture
Pluggable vectorizers:
- text2vec-openai: OpenAI embeddings
- text2vec-cohere: Cohere embeddings
- text2vec-huggingface: Hugging Face models
- text2vec-transformers: Self-hosted BERT, RoBERTa
- img2vec-neural: CLIP, ResNet for images
- Custom modules: Bring-your-own vectorizer
Pluggable storage:
- Local storage: Filesystem, Docker volume
- Cloud storage: S3, GCS, Azure Blob
- Kubernetes: Persistent volumes, StatefulSets
Rerankers:
- Cohere Rerank: Improving hybrid search results
- Custom rerankers: Bring-your-own scoring model
F. Weaviate Cloud Services (WCS)
Managed hosting (alternative to self-hosting):
Features:
- Serverless: Automatic scaling, pay-per-use
- Global deployment: US, EU, Asia regions (GDPR compliance)
- Backup/restore: Automated daily backups, point-in-time recovery
- Monitoring: Grafana dashboards, alerts, logs
- Security: TLS encryption, API key authentication, VPC peering
Pricing:
- Sandbox: Free tier (1M vectors, 100GB storage)
- Standard: $25-200/month (10M-100M vectors)
- Enterprise: Custom pricing (dedicated clusters, SLAs)
G. Enterprise Features
Multi-tenancy: Isolating data by tenant (SaaS applications):
client.data_object.create(
class_name="Document",
data_object={"content": "..."},
tenant="customer-123" # isolated tenant
)
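Conceptually, tenant isolation means that a read for one tenant can never touch another tenant's data. The class below is a hypothetical in-memory sketch of that idea, not a description of how Weaviate stores tenants internally.

```python
from collections import defaultdict

class TenantStore:
    """Keeps each tenant's objects in a separate bucket (hypothetical helper)."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def create(self, tenant, data_object):
        """Store an object under exactly one tenant."""
        self._buckets[tenant].append(data_object)

    def query(self, tenant):
        """Return a copy of one tenant's objects; other buckets are never read."""
        return list(self._buckets.get(tenant, []))

store = TenantStore()
store.create("customer-123", {"content": "Q3 contract"})
store.create("customer-456", {"content": "Payroll export"})

visible = store.query("customer-123")
print(visible)
```

Because every operation names a tenant, there is no code path that returns a mixed result set, which is the property SaaS applications rely on.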
RBAC (Role-Based Access Control): Fine-grained permissions (read/write/admin)
Backups: Automated backups, restore to specific timestamps
Monitoring: Prometheus metrics, Grafana dashboards, log aggregation
H. Performance
Benchmarks (Weaviate internal testing):
- Latency: p50 <20ms, p95 <100ms, p99 <200ms
- Throughput: 10K+ QPS (queries per second) per node
- Recall: 95-99% (finding correct neighbors)
- Scaling: Horizontal scaling to 100B+ vectors (multi-node clusters)
HNSW optimization: Custom implementation reducing memory 40% vs. reference
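The recall figure above compares an approximate index against exact search. A minimal sketch of how recall at k is typically computed follows; the ID lists are hypothetical and not taken from any Weaviate benchmark.

```python
def recall_at_k(approximate_ids, exact_ids):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    return len(set(approximate_ids) & set(exact_ids)) / len(exact_ids)

exact_top5 = ["a", "b", "c", "d", "e"]   # from brute-force search (hypothetical IDs)
approx_top5 = ["a", "b", "c", "e", "z"]  # from an approximate index such as HNSW

recall = recall_at_k(approx_top5, exact_top5)
print(recall)
```

Here the approximate index found four of the five true neighbors, so recall is 0.8; a 95-99% figure means almost none are missed on average.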
Business Model & Revenue
Revenue Streams (February 2026)
| Stream | % Revenue | Description |
|---|---|---|
| Weaviate Cloud Services | 70% | Managed hosting ($25-10K+/month) |
| Enterprise Support | 20% | Support contracts, SLAs, consulting |
| Training/Certification | 10% | Weaviate certification programs, workshops |
Open-Source Model:
- Free forever: Open-source under the BSD 3-Clause license (self-hosted)
- WCS: Managed cloud (convenience, no DevOps)
- Enterprise: Support, SLAs, consulting
Customer Segmentation
- AI startups (60%): Building RAG apps, semantic search (often start open-source, upgrade to WCS)
- Enterprise (30%): Fortune 500 self-hosting or using WCS enterprise
- Developers (10%): Individual developers, side projects, education
Commercial Strategy
Open-source → Cloud funnel:
- Developers discover Weaviate (GitHub, tutorials, LangChain integration)
- Self-host for prototypes, experimentation
- Upgrade to WCS for production (managed, no DevOps)
- Enterprise contracts for dedicated clusters, support, SLAs
Competitive advantages:
- No vendor lock-in: Can always self-host (vs. Pinecone proprietary)
- Transparency: Inspect/modify code, understand algorithms
- Community: 500+ contributors, rapid innovation
- Cost control: Self-host = infrastructure costs only (vs. premium managed pricing)
Competitive Landscape
Pinecone ($2.75B valuation, $138M funding): Proprietary managed service, serverless
Qdrant ($28M funding): Rust-based open-source, performance focus
Milvus (open-source): LF AI Foundation, China-originated, large community
Chroma ($20M funding): Embedded database (no server), developer-friendly
Elasticsearch (Elastic, $8B valuation): Traditional search adding vector capabilities
PostgreSQL pgvector: Extension for Postgres, familiar but slower
Weaviate Differentiation:
- GraphQL API: Unique, familiar to web developers (vs. custom query DSL)
- Hybrid search maturity: Dense + sparse + reranking (sophisticated)
- Modular architecture: Pluggable vectorizers, storage, rerankers
- Open-source + cloud: Flexibility (self-host or managed)
- European roots: GDPR compliance, data sovereignty (important for EU customers)
Impact & Success Stories
Developer Tools
Stack Overflow: Using Weaviate for semantic code search, finding similar questions/answers. 10M+ Stack Overflow posts indexed, 95% search relevance.
Financial Services
Morningstar: Using Weaviate for investment research, semantic search across financial documents. 50M+ documents, <100ms query latency, 40% faster research.
Productivity
Zapier: Using Weaviate for app discovery, semantic search across 5,000+ integrations. 30% increase in relevant app recommendations.
Future Outlook
Product Roadmap
AI-native features: Auto-tuning indexes, intelligent sharding, predictive scaling
Graph capabilities: Combining vector + graph databases for knowledge graphs
Streaming: Real-time vector updates, change data capture
Edge deployment: Running Weaviate on edge devices, mobile
Growth Strategy
Open-source dominance: Becoming default vector database (like PostgreSQL for relational)
Cloud revenue: Converting self-hosted users to WCS (convenience)
Enterprise: Fortune 500 adoption with support contracts
Competitive Position
Weaviate is positioned as the open-source alternative to the proprietary Pinecone, appealing to:
- Developers valuing transparency, customization
- Enterprises avoiding vendor lock-in, requiring data sovereignty
- Cost-conscious companies (self-hosting cheaper at scale)
FAQs
What is Weaviate?
Weaviate is an open-source vector database with a GraphQL API, supporting hybrid search (semantic + keyword), multi-modal vectors, and production-ready performance. It is available self-hosted or as a managed cloud service (WCS).
How much does Weaviate cost?
Open-source: Free (BSD 3-Clause). Weaviate Cloud Services: Free sandbox, $25-200/month standard, enterprise custom pricing. Self-hosting: Infrastructure costs only.
What is Weaviate’s valuation?
An estimated $500M+ (as of February 2026), based on $50M in funding; the company has not disclosed an official valuation.
How many users does Weaviate have?
5+ million open-source downloads, 14,000+ GitHub stars, 2,000+ companies in production, growing WCS adoption.
Who founded Weaviate?
Bob van Luijt (semantic search entrepreneur, SeMI Technologies) and Etienne Dilocker (systems architect) founded Weaviate in 2019 in Amsterdam.
Conclusion
Weaviate has established itself as a leading open-source vector database, with 5+ million downloads, 14,000+ GitHub stars, and 2,000+ production deployments. With $50 million in funding from investors including Battery Ventures, IVP, and Index Ventures, Weaviate shows that the open-source model can work for infrastructure software, providing transparency, flexibility, and community innovation while building a sustainable commercial business through managed cloud services.
As AI applications proliferate (RAG chatbots, semantic search, recommendations), demand for vector databases continues to grow. Weaviate’s open-source positioning offers a compelling alternative to proprietary solutions by avoiding vendor lock-in, enabling customization, and providing cost control through self-hosting. With a familiar GraphQL API, sophisticated hybrid search, and a modular architecture, Weaviate is positioned to capture a significant share of the vector database market. Although it remains privately held, Weaviate’s community-driven growth, enterprise adoption, and cloud revenue make it a sustainable long-term player in the AI infrastructure landscape.