
Together AI


QUICK INFO BOX

Company Name: Together AI (formerly Together Computer)
Founders: Vipul Ved Prakash (CEO), Ce Zhang (Chief Scientist), Christopher Ré (Co-Founder)
Founded: 2022
Headquarters: San Francisco, California, USA
Industry: Artificial Intelligence / Cloud Computing / Open Source
Sector: AI Infrastructure / Model Hosting / Inference Platform
Company Type: Private
Key Investors: Kleiner Perkins, NVIDIA, Emergence Capital, Lux Capital, Salesforce Ventures
Funding Rounds: Seed, Series A
Total Funding Raised: $102.5 Million
Valuation: $1.25 Billion (February 2026)
Number of Employees: 120+ (February 2026)
Key Products / Services: Together Inference API, Model Fine-tuning, Open-source Model Hosting (Llama 3, Mixtral, Falcon), RedPajama Dataset, Custom Model Training
Technology Stack: PyTorch, NVIDIA GPUs (H100/A100), Flash Attention, Optimized Inference, Kubernetes
Revenue (Latest Year): $40+ Million ARR (February 2026)
Customer Base: 10,000+ developers, 1,000+ companies using API (Salesforce, Adobe, ServiceNow)
Social Media: LinkedIn, Twitter, GitHub

Introduction

Closed AI dominates today, yet open-source is the future. OpenAI’s GPT-4 and Anthropic’s Claude deliver state-of-the-art performance but impose fundamental constraints:

  1. Vendor lock-in: Proprietary APIs, pricing control, service interruptions (OpenAI outages costing businesses millions)
  2. Data privacy: Sending sensitive data to external APIs (legal/compliance concerns for healthcare, finance, government)
  3. Customization limits: Can’t modify model architecture, training data, or behavior beyond prompt engineering
  4. Cost unpredictability: $0.01-0.06 per 1K tokens, bills scaling unpredictably with traffic
  5. Geopolitical risk: US-based services, export controls, data sovereignty issues

Meanwhile, open-source LLMs (Llama 3, Mixtral, Falcon) match proprietary models on many benchmarks—yet running them in production requires:

  1. GPU infrastructure: Renting A100/H100 GPUs ($2-5 per hour each), managing clusters
  2. Optimization expertise: Implementing Flash Attention, quantization, tensor parallelism (4-8 weeks engineering)
  3. Serving infrastructure: Building APIs, load balancing, auto-scaling, monitoring
  4. Model customization: Fine-tuning on proprietary data (requiring ML expertise, GPU resources)

Result: $100K-500K setup cost, 2-4 months engineering time, and ongoing DevOps burden—making open-source AI inaccessible to most companies despite its advantages.

Enter Together AI, the open-source AI cloud platform providing hosted inference for 100+ open-source models (Llama 3, Mixtral, Falcon, Mistral) with 5-10x lower cost than proprietary APIs, zero infrastructure management, and full customization (fine-tuning, custom training). Founded in 2022 by Vipul Ved Prakash (CEO, serial entrepreneur and anti-spam pioneer), Ce Zhang (Chief Scientist, ETH Zurich professor), and Christopher Ré (Co-Founder, Stanford professor and MacArthur “Genius Grant” recipient), Together AI democratizes open-source AI by making world-class models as easy to use as OpenAI, while preserving data sovereignty.

As of February 2026, Together AI operates at a $1.25 billion valuation with $102.5 million in funding from Kleiner Perkins, NVIDIA, Emergence Capital, Lux Capital, and Salesforce Ventures. The platform serves 10,000+ developers and 1,000+ companies (February 2026) including Salesforce, Adobe, ServiceNow, and AI startups building production applications. Together AI’s annual recurring revenue (ARR) exceeds $40 million (February 2026), making it the leading open-source AI cloud.

With 120+ employees, 100+ hosted models, sub-second inference latency (optimized kernels, Flash Attention 2), and RedPajama—the open-source dataset for training LLMs (1.2 trillion tokens, community-driven)—Together AI has become essential infrastructure for companies choosing open-source AI over proprietary alternatives.

What makes Together AI revolutionary:

  1. Open-source model hosting: 100+ models (Llama 3, Mixtral 8x7B, Falcon 180B) with single API—no GPU management required
  2. 5-10x cost savings: $0.001-0.006 per 1K tokens vs. $0.01-0.06 for GPT-4/Claude—90% reduction
  3. Custom fine-tuning: Training models on proprietary data (customer support, legal, medical) in hours—creating specialized experts
  4. RedPajama dataset: Open-source LLM training data (1.2T tokens) rivaling proprietary datasets—democratizing model training
  5. Enterprise sovereignty: Self-hosted deployment (VPC, on-premise) for regulated industries—full data control

The market opportunity spans the $50+ billion AI infrastructure market, the $200+ billion cloud computing market, and $150+ billion in enterprise AI applications. Every company building AI applications must choose: proprietary APIs (expensive, lock-in) or open-source (complex, DevOps burden). Together AI provides a third option: open-source with proprietary convenience.

Together AI competes with OpenAI (proprietary, expensive), Anthropic Claude (proprietary), Replicate ($60M funding, open-source model hosting), Hugging Face ($4.5B valuation, model hub + inference), Anyscale ($260M funding, Ray-based infrastructure), and self-hosting (AWS/GCP compute). Together AI differentiates through open-source focus (100+ models vs. Hugging Face’s 10-20), performance optimization (Flash Attention, custom kernels), fine-tuning platform (easier than competitors), and RedPajama (only platform creating training datasets).

The founding story reflects open-source philosophy: Vipul Ved Prakash (founder of Cloudmark, an anti-spam company acquired for $100M+) and professors Ce Zhang (ETH Zurich) and Christopher Ré (Stanford) recognized that AI’s future depends on open-source models escaping proprietary control. They founded Together AI in 2022, and Meta’s release of Llama 2 (July 2023) validated the bet: the company set out to make open-source LLMs as accessible as OpenAI, ensuring AI remains democratized rather than monopolized.

This comprehensive article explores Together AI’s journey from research vision to the $1.25 billion open-source AI cloud platform powering 10,000+ developers worldwide.


Founding Story & Background

The Open vs. Closed AI Debate (2022-2023)

By 2022, the AI landscape had divided into two camps:

Closed AI (OpenAI, Anthropic):

  • Strengths: State-of-the-art performance, easy API, rapid improvements
  • Weaknesses: Expensive ($0.01-0.06 per 1K tokens), vendor lock-in, data privacy concerns, no customization

Open-Source AI (Meta’s LLaMA, EleutherAI):

  • Strengths: Free (compute costs only), customizable, data sovereignty, no vendor lock-in
  • Weaknesses: Complex to deploy, requires ML expertise, lacking infrastructure

The inflection point came in February 2023 when Meta released LLaMA (65B parameters)—demonstrating open-source models could match GPT-3.5 performance. Yet adoption remained limited: running LLaMA required 8x A100 GPUs ($20K+/month), implementing serving infrastructure, and optimizing inference (Flash Attention, quantization).

Vipul Ved Prakash, serial entrepreneur who founded Cloudmark (anti-spam company using machine learning, acquired by Proofpoint), watched this dynamic with frustration. Prakash believed open-source AI would win long-term (like Linux vs. Windows, PostgreSQL vs. Oracle) but lacked infrastructure layer making it accessible.

Ce Zhang (ETH Zurich professor, database systems and ML expert) and Christopher Ré (Stanford professor, MacArthur “Genius Grant” winner, database/ML research) shared this conviction. Ré’s research lab at Stanford had pioneered techniques for efficient ML training, including Snorkel (weak supervision framework) and contributions to DAWN (Data Analytics at Web Scale) project.

The insight: Open-source AI needs its AWS—cloud platform abstracting complexity, providing hosted inference, enabling fine-tuning, and fostering community.

2022: Founding and Vision

In 2022, Prakash, Zhang, and Ré founded Together Computer (later rebranded Together AI) in San Francisco with the mission: “Democratize AI through open-source models and community-driven infrastructure.”

Founding principles:

  1. Open-source first: Supporting community models (Llama, Falcon, Mistral) rather than competing with proprietary
  2. Inference as service: Hosting models with optimized performance—no GPU management
  3. Customization: Fine-tuning platform enabling companies to create specialized models
  4. Community: Contributing datasets, research, tools back to open-source ecosystem

Initial focus: Building serving infrastructure for LLaMA models (February 2023 release created demand wave).

2023: Meta’s Llama 2 and Validation

In July 2023, Meta released Llama 2 (7B, 13B, 70B parameters): commercially usable under Meta’s Llama 2 Community License and matching GPT-3.5 on many benchmarks. This validated the open-source thesis: companies could use production-grade LLMs without OpenAI dependency.

Together AI became the first platform offering hosted Llama 2 inference at scale:

Value proposition:

  • API compatibility: OpenAI-compatible endpoints (drop-in replacement)
  • 10x cost savings: $0.0006 per 1K tokens (Llama 2 70B) vs. $0.006 for GPT-3.5
  • Performance: <500ms latency through Flash Attention, optimized kernels
  • Data sovereignty: Data never leaves customer’s region, available for VPC deployment

Early customers were AI startups and enterprises seeking:

  • Cost reduction: 90% savings vs. GPT-3.5/4
  • Data privacy: Healthcare, finance, legal use cases requiring on-premise
  • Customization: Fine-tuning models on proprietary data

2023 traction:

  • 1,000+ developers (first 3 months)
  • 100+ enterprise customers (by year-end)
  • $5M ARR (December 2023)

2023-2024: RedPajama and Community Leadership

In April 2023, Together AI launched RedPajama, an open-source dataset for training LLMs:

Problem: OpenAI, Anthropic train on proprietary datasets (undisclosed sources). Open-source models lacked high-quality training data.

Solution: Together AI curated RedPajama-1T, 1.2 trillion tokens drawn from:

  • CommonCrawl: Web data (878B tokens)
  • C4: Filtered web data (175B tokens)
  • GitHub: Code (59B tokens)
  • Wikipedia: Knowledge (24B tokens)
  • Books: Literature (26B tokens)
  • ArXiv: Scientific papers (28B tokens)
  • StackExchange: Technical Q&A (20B tokens)
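As a quick sanity check, the published per-source counts sum to the headline figure (counts in billions of tokens; RedPajama-V1's published composition includes a C4 web-text slice of roughly 175B tokens):

```python
# Billions of tokens per source in RedPajama-V1's published composition
# (including the C4 web-text slice).
sources_b = {
    "CommonCrawl": 878,
    "C4": 175,
    "GitHub": 59,
    "Books": 26,
    "ArXiv": 28,
    "Wikipedia": 24,
    "StackExchange": 20,
}

total_b = sum(sources_b.values())
print(f"Total: {total_b}B tokens (~{total_b / 1000:.1f}T)")
```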

Impact: RedPajama enabled community to train competitive models—democratizing AI beyond Big Tech. RedPajama-V2 (2024) added 30 trillion tokens (multilingual, higher quality).

This community contribution positioned Together AI as open-source leader (not just commercial player)—building goodwill, attracting talent, differentiating from pure infrastructure providers.

2024: Series A and Enterprise Focus

Seed (2022): $20 Million

  • Lead: Lux Capital
  • Purpose: Core team, infrastructure, initial models

Series A (May 2024): $82.5 Million

  • Lead: Kleiner Perkins
  • Additional: NVIDIA, Emergence Capital, Lux Capital, Salesforce Ventures
  • Valuation: $1.25 Billion (unicorn status)
  • Purpose: GPU infrastructure (H100 clusters), enterprise sales, custom training platform

NVIDIA’s investment provided:

  • Strategic GPU access: Priority H100 allocation (scarce resource)
  • Technical collaboration: Optimizing CUDA kernels, inference performance
  • Ecosystem integration: NVIDIA NIM, enterprise AI tools

Salesforce Ventures’ investment signaled enterprise validation—Salesforce using Together AI for Einstein AI platform.

By 2024, Together AI served:

  • 5,000+ developers
  • 500+ companies (Salesforce, Adobe, ServiceNow)
  • $20M ARR (4x YoY growth)

2025-2026: Mixtral, Llama 3, and Scale

Through 2024 and into 2025-2026, open-source AI accelerated:

Mixtral 8x7B (Mistral AI, December 2023): Mixture-of-experts model matching GPT-3.5, inference-efficient

Llama 3 (Meta, April 2024): 8B and 70B models, with the 405B Llama 3.1 following in July 2024, approaching GPT-4 performance

Falcon 180B (Technology Innovation Institute, September 2023): the largest open-source model at its release

Together AI hosted these models on day one, with optimizations:

  • Flash Attention 2: 2x faster inference
  • Tensor parallelism: Distributing large models across GPUs
  • Quantization: 4-bit/8-bit reducing memory, cost

By February 2026:

  • 10,000+ developers
  • 1,000+ companies
  • $40M+ ARR (2x YoY growth)

Founders & Key Team

Vipul Ved Prakash (Founder, CEO): Founder/CEO of Cloudmark (anti-spam, acquired for $100M+); serial entrepreneur
Ce Zhang (Co-Founder, Chief Scientist): Professor at ETH Zurich (database systems and ML); PhD, Wisconsin-Madison
Christopher Ré (Co-Founder): Professor at Stanford; MacArthur “Genius Grant” recipient; database/ML research; co-founder of Snorkel AI
Michael Carbin (VP Engineering): MIT professor (programming languages, compiler optimization)
Tri Dao (Head of Research): Creator of Flash Attention; Stanford PhD; inference optimization

Vipul Ved Prakash (CEO) leads Together AI with entrepreneurial track record (Cloudmark acquired by Proofpoint) and vision for open-source AI democratization. His experience building large-scale ML systems (anti-spam processing billions of emails) informs Together AI’s infrastructure.

Ce Zhang (Chief Scientist) brings academic rigor from ETH Zurich research on efficient ML training, data management, and systems optimization. His work on database-ML intersection enables Together AI’s data-centric approach.

Christopher Ré (Co-Founder) is MacArthur Fellow and Stanford professor whose research pioneered weak supervision (Snorkel), foundation model training, and data-centric AI. His academic credibility attracts top-tier researchers.

Tri Dao (Head of Research) created Flash Attention—breakthrough algorithm making transformer inference 2-10x faster. His innovations are core to Together AI’s performance advantages.


Funding & Investors

Seed (2022): $20 Million

  • Lead Investor: Lux Capital
  • Additional Investors: Angel investors (AI researchers, entrepreneurs)
  • Purpose: Core team, initial infrastructure, Llama 1 hosting

Series A (May 2024): $82.5 Million

  • Lead Investor: Kleiner Perkins
  • Additional Investors: NVIDIA, Emergence Capital, Lux Capital, Salesforce Ventures
  • Valuation: $1.25 Billion (unicorn status)
  • Purpose: H100 GPU clusters, enterprise sales, fine-tuning platform, RedPajama expansion, M&A

NVIDIA’s strategic investment provided:

  • GPU allocation: Priority access to scarce H100s
  • Technical support: CUDA optimization, inference kernels
  • Go-to-market: Joint enterprise customers, NVIDIA ecosystem

Salesforce Ventures enabled:

  • Enterprise validation: Salesforce using Together AI for Einstein
  • Distribution: Access to Salesforce customer base (150K+ companies)
  • Integration: Native Salesforce platform integration

Total Funding Raised: $102.5 Million

Together AI deployed capital across:

  • GPU infrastructure: $40-50M+ in H100/A100 clusters (largest open-source AI compute)
  • Engineering: ML engineers, systems engineers, DevOps (60+ person team)
  • Research: Inference optimization, Flash Attention improvements, custom kernels
  • Community: RedPajama dataset creation, open-source contributions
  • Sales/marketing: Enterprise sales (Salesforce, Adobe deal teams), developer marketing

Product & Technology Journey

A. Inference API (Core Product)

Hosted models (100+ open-source LLMs):

Popular Models

  • Llama 3 70B: Meta’s latest, GPT-4-class performance ($0.006/1K tokens)
  • Mixtral 8x7B: Mixture-of-experts, cost-efficient ($0.0006/1K tokens)
  • Falcon 180B: Largest open-source model ($0.018/1K tokens)
  • Code Llama 70B: Code generation specialist ($0.006/1K tokens)
  • Mistral 7B: Efficient small model ($0.0002/1K tokens)

API Usage

# Requires: pip install together
# This example uses the pre-1.0 SDK's module-level interface; newer SDK
# versions expose a Together() client object instead.
import together

together.api_key = "your_api_key"  # placeholder

response = together.Complete.create(
    model="meta-llama/Llama-3-70b-chat",
    prompt="Explain quantum computing in simple terms",
    max_tokens=500,
    temperature=0.7
)

# The pre-1.0 SDK returns a plain dict
print(response['output']['choices'][0]['text'])

OpenAI compatibility:

# This example targets the pre-1.0 openai SDK (module-level configuration);
# with openai>=1.0, pass base_url and api_key to the OpenAI() client instead.
import openai

openai.api_base = "https://api.together.xyz"
openai.api_key = "your_together_api_key"

# Drop-in replacement for OpenAI
response = openai.ChatCompletion.create(
    model="meta-llama/Llama-3-70b-chat",
    messages=[{"role": "user", "content": "Hello!"}]
)
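Both SDK snippets wrap the same OpenAI-style REST call. A minimal sketch of the request body the SDKs construct (field names follow the OpenAI-compatible chat schema; the endpoint path in the docstring is an assumption based on Together's documented base URL):

```python
import json

def build_chat_request(model, user_message, max_tokens=500, temperature=0.7):
    """Build the JSON body for a POST to the /v1/chat/completions endpoint."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    })

body = build_chat_request("meta-llama/Llama-3-70b-chat", "Hello!")
print(body)
```

Because the wire format matches OpenAI's, switching providers is a matter of changing the base URL and API key rather than rewriting application code.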

B. Fine-Tuning Platform

Custom model training:

# Upload training data (JSONL format); the pre-1.0 SDK returns a dict
# containing the uploaded file's id
file_resp = together.Files.upload(file="training_data.jsonl")

# Start a fine-tuning job
job = together.Finetune.create(
    model="meta-llama/Llama-3-8b",
    training_file=file_resp["id"],
    validation_file="file-def456",  # optional: id of a previously uploaded file
    n_epochs=3,
    learning_rate=1e-5
)

# Monitor training (the job response is a plain dict)
status = together.Finetune.retrieve(job["id"])

# Use the fine-tuned model: query it under the output name reported in the
# job status (field naming varies across SDK versions)
response = together.Complete.create(
    model=status["model_output_name"],
    prompt="Your custom prompt"
)
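The uploaded training file is plain JSONL, one example per line. A minimal sketch of producing one (the single "text" field per line follows the convention Together's older fine-tuning docs used; treat the field name and the prompt markup as assumptions, and the support-bot examples as hypothetical):

```python
import json

# Hypothetical customer-support examples; the "<human>/<bot>" markup is an
# illustrative convention, not a requirement of the platform.
examples = [
    {"text": "<human>: How do I reset my password?\n<bot>: Open Settings > Security and choose Reset Password."},
    {"text": "<human>: Where can I find my invoice?\n<bot>: Invoices are listed under Billing > History."},
]

# One JSON object per line, as expected by the upload endpoint
with open("training_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```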

Use cases:

  • Customer support: Fine-tuning on support tickets, FAQs
  • Legal: Training on case law, contracts (confidential data)
  • Medical: Clinical notes, patient data (HIPAA-compliant)
  • Code: Company-specific codebases, internal APIs

Pricing: $0.50-2.00 per 1K training tokens (10x cheaper than OpenAI fine-tuning)

C. RedPajama Dataset

Open-source training data:

RedPajama-V1 (2023)

  • 1.2 trillion tokens from CommonCrawl, GitHub, Wikipedia, books, ArXiv, StackExchange
  • Pre-processed, deduplicated, filtered for quality
  • Fully open: Apache 2.0 license, available for commercial use

RedPajama-V2 (2024)

  • 30 trillion tokens (25x larger)
  • 100+ languages (multilingual coverage)
  • Quality scores: Filtering low-quality content
  • Safety filters: Removing toxic, biased data

Impact: Enabled community models (OpenLLaMA, RedPajama-INCITE, and others) to approach proprietary performance.

D. Performance Optimizations

Flash Attention 2 (Tri Dao):

  • 2-4x faster inference vs. standard attention
  • Reduced memory: Enabling larger batch sizes
  • Implementation: CUDA kernels optimized for A100/H100

Tensor parallelism:

  • Multi-GPU: Distributing 70B/180B models across 4-8 GPUs
  • Zero pipeline bubbles: Maximizing utilization

Quantization:

  • 4-bit/8-bit: Reducing memory 2-4x, cost savings
  • GPTQ, AWQ: Advanced quantization preserving accuracy
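The memory claim is easy to verify with back-of-envelope arithmetic: the weights-only footprint of a 70B-parameter model at each precision (activations and KV cache are excluded, so real deployments need additional headroom):

```python
# Weights-only memory for an n-parameter model at a given precision.
def weight_memory_gb(n_params, bits_per_weight):
    return n_params * bits_per_weight / 8 / 1e9

n = 70e9  # Llama 3 70B
for bits, label in [(16, "fp16"), (8, "int8"), (4, "int4")]:
    print(f"{label}: {weight_memory_gb(n, bits):.0f} GB")
# fp16 needs ~140 GB just for weights (two 80 GB H100s);
# int4 fits in ~35 GB (a single GPU): the 4x end of the 2-4x reduction above.
```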

Results:

  • Latency: <500ms for Llama 3 70B (1K token input)
  • Throughput: 10K+ tokens/second per GPU
  • Cost: 90% cheaper than GPT-4 (equivalent performance)

E. Enterprise Features

VPC Deployment: Deploying Together AI in customer’s AWS/GCP VPC (data never leaves network)

On-Premise: Self-hosted Together AI stack (regulated industries)

SOC 2 Type 2: Security compliance, annual audits

SSO: Okta, Azure AD, Google Workspace integration

Usage Analytics: Tracking API calls, costs, latency per team/project

F. Ecosystem Integrations

LangChain, LlamaIndex: Native Together AI integrations (first-class support)

Hugging Face: Together AI hosting Hugging Face models via API

NVIDIA NIM: Integration with NVIDIA’s inference microservices

Kubernetes: Helm charts for self-deployment


Business Model & Revenue

Revenue Streams (February 2026)

Inference API (80% of revenue): Pay-per-token pricing ($0.0002-0.018 per 1K tokens)
Fine-tuning (15% of revenue): Custom model training ($0.50-2.00 per 1K training tokens)
Enterprise (5% of revenue): VPC deployment, on-premise, dedicated support

Pricing Model:

  • Free tier: $25 credits (experimenting)
  • Pay-as-you-go: $0.0002-0.018 per 1K tokens (model-dependent)
  • Enterprise: Volume discounts, dedicated capacity, custom pricing

Cost advantage:

Llama 3 70B: $0.006 per 1K tokens vs. GPT-4 at $0.06 (90% savings)
Mixtral 8x7B: $0.0006 per 1K tokens vs. GPT-3.5 at $0.002 (70% savings)
Mistral 7B: $0.0002 per 1K tokens (no proprietary comparison listed)
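The savings figures above follow directly from the listed per-token prices. A small sketch of the arithmetic (prices in USD per 1K tokens mirror the comparison above and are illustrative; real prices drift over time):

```python
# Listed prices in USD per 1K tokens (illustrative snapshot).
PRICE_PER_1K = {
    "llama-3-70b (Together)": 0.006,
    "gpt-4 (OpenAI)": 0.06,
}

def request_cost(model, tokens):
    """Cost of a single request given total tokens processed."""
    return PRICE_PER_1K[model] * tokens / 1000

tokens = 1500  # e.g., a 1,000-token prompt plus a 500-token completion
together_cost = request_cost("llama-3-70b (Together)", tokens)
gpt4_cost = request_cost("gpt-4 (OpenAI)", tokens)
savings = 1 - together_cost / gpt4_cost
print(f"Together: ${together_cost:.4f}  GPT-4: ${gpt4_cost:.4f}  savings: {savings:.0%}")
```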

Customer Segmentation

  1. AI startups (50%): Building products on open-source models (cost savings)
  2. Enterprise (40%): Regulated industries (healthcare, finance, government)
  3. Developers (10%): Side projects, prototyping, education

Unit Economics

  • CAC: $100-300 (developer-focused marketing, free tier funnel)
  • LTV: $10K+ (annual contracts, expanding usage)
  • Gross Margin: 60-70% (GPU costs, economies of scale)
  • Payback Period: 12-18 months
  • Churn: 20% annually (startups pivot, experimentation)

Total ARR: $40+ Million (February 2026), growing 100%+ YoY


Competitive Landscape

OpenAI ($80B valuation): Proprietary, expensive, vendor lock-in
Anthropic Claude ($18B valuation): Proprietary, safety-focused
Replicate ($60M funding): Open-source model hosting, narrower catalog
Hugging Face ($4.5B valuation): Model hub + inference, less optimization
Anyscale ($260M funding): Ray-based infrastructure, broader scope (not just LLMs)
Self-hosting: AWS/GCP compute, complex DevOps

Together AI Differentiation:

  1. Open-source focus: 100+ models vs. 10-20 (Hugging Face, Replicate)
  2. Performance: Flash Attention, custom kernels (2-4x faster)
  3. Cost: 90% cheaper than GPT-4 (equivalent models)
  4. RedPajama: Only platform creating training datasets (community contribution)
  5. Fine-tuning ease: Simpler than competitors (API-driven)

Impact & Success Stories

Enterprise

Salesforce Einstein: Using Together AI for custom model training on CRM data. 50% cost reduction vs. proprietary APIs, data sovereignty maintained.

Media

Adobe: Using Llama 3 via Together AI for Firefly (generative AI creative tools). 10M+ API calls/day, <300ms latency.

Healthcare

Health-tech startup: Fine-tuned Llama 3 on medical notes (HIPAA-compliant VPC deployment). 95% accuracy on diagnosis suggestions, zero data leakage.


Future Outlook

Product Roadmap

Multimodal models: Vision (Llama 3.2 Vision), audio, video
Agent frameworks: Pre-built agents for common tasks (research, coding, analysis)
Model marketplace: Developers selling fine-tuned models
Edge deployment: Running models on-device (mobile, IoT)

Growth Strategy

Open-source leadership: RedPajama-V3, research contributions, community engagement
Enterprise expansion: Fortune 500 adoption (regulated industries)
International: EU/Asia deployments (data sovereignty)

IPO Timeline

With $40M ARR, 100%+ growth, and a $1.25B valuation, Together AI is positioned for an IPO in 2028-2030 or a strategic acquisition by a cloud provider (AWS, Google Cloud, Microsoft Azure).


FAQs

What is Together AI?

Together AI is an open-source AI cloud platform hosting 100+ models (Llama 3, Mixtral, Falcon) with an inference API, fine-tuning, and the RedPajama training dataset, providing roughly 90% cost savings vs. proprietary APIs.

How much does Together AI cost?

Free tier ($25 credits), Pay-as-you-go ($0.0002-0.018 per 1K tokens depending on model), Enterprise (custom pricing, volume discounts).

What is Together AI’s valuation?

$1.25 billion (May 2024) following $82.5M Series A led by Kleiner Perkins with NVIDIA and Salesforce Ventures participation.

How many users does Together AI have?

10,000+ developers, 1,000+ companies including Salesforce, Adobe, ServiceNow using inference API and fine-tuning platform.

Who founded Together AI?

Vipul Ved Prakash (Cloudmark founder), Ce Zhang (ETH Zurich professor), Christopher Ré (Stanford professor, MacArthur Fellow), founded 2022 in San Francisco.


Conclusion

Together AI has established itself as the leading open-source AI cloud, achieving a $1.25 billion valuation, 10,000+ developers, and $40M+ ARR by democratizing access to state-of-the-art open-source models. With $102.5 million in funding from Kleiner Perkins, NVIDIA, and Salesforce Ventures, Together AI proves that open-source AI can compete commercially with proprietary alternatives, providing 90% cost savings, data sovereignty, and full customization.

As open-source models (Llama 3, Mixtral) approach GPT-4 performance, demand for Together AI’s infrastructure grows exponentially—enterprises seeking escape from OpenAI/Anthropic lock-in, startups optimizing costs, regulated industries requiring data control. Together AI’s RedPajama dataset contribution, performance optimizations (Flash Attention), and enterprise features (VPC, on-premise) position it as essential infrastructure for open-source AI era. With 100%+ growth, NVIDIA partnership, and Salesforce distribution, Together AI is positioned as compelling IPO candidate within 3-5 years, potentially achieving $5B+ valuation as open-source AI adoption accelerates.
