Open-source framework for building real-time voice AI agents!
Pipecat is a Python framework for orchestrating audio, video, AI services, transports, and conversation pipelines. Voice-first architecture with pluggable components.
What you can build: voice assistants, AI companions, multimodal interfaces, interactive storytelling, business agents (customer support, intake), and complex dialog systems.
The framework handles speech recognition, text-to-speech, conversation logic, and real-time interaction. WebRTC and WebSocket transport built in. Ultra-low latency for natural conversations.
Why Pipecat:
• Voice-first: Integrates STT, TTS, and conversation handling in one framework
• Pluggable: Supports multiple AI service providers for each capability
• Composable pipelines: Build complex behavior from modular components
• Real-time: Low-latency interaction with streaming audio/video
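To make the "composable pipelines" idea concrete, here is a toy sketch in plain Python. It is not Pipecat's actual API: every name in it (fake_stt, fake_llm, fake_tts, build_pipeline, mic) is hypothetical, and the stages only tag strings instead of calling real services. It just shows how streaming stages can be chained and swapped.

```python
import asyncio
from typing import AsyncIterator, Callable, List

# A stage maps one async stream to another (hypothetical type, for illustration).
Stage = Callable[[AsyncIterator[str]], AsyncIterator[str]]


async def fake_stt(chunks: AsyncIterator[str]) -> AsyncIterator[str]:
    """Stand-in for speech-to-text: wraps each audio chunk label."""
    async for chunk in chunks:
        yield f"text({chunk})"


async def fake_llm(texts: AsyncIterator[str]) -> AsyncIterator[str]:
    """Stand-in for an LLM: wraps each transcript in a reply."""
    async for text in texts:
        yield f"reply({text})"


async def fake_tts(replies: AsyncIterator[str]) -> AsyncIterator[str]:
    """Stand-in for text-to-speech: wraps each reply as audio."""
    async for reply in replies:
        yield f"audio({reply})"


def build_pipeline(stages: List[Stage]) -> Stage:
    """Chain stages so each one consumes the previous stage's stream."""
    def run(source: AsyncIterator[str]) -> AsyncIterator[str]:
        stream = source
        for stage in stages:
            stream = stage(stream)
        return stream
    return run


async def mic() -> AsyncIterator[str]:
    """Pretend microphone that emits two audio chunks."""
    for chunk in ["chunk1", "chunk2"]:
        yield chunk


async def collect(stream: AsyncIterator[str]) -> List[str]:
    """Drain a stream into a list."""
    return [item async for item in stream]


def run_demo() -> List[str]:
    """Run the full STT -> LLM -> TTS chain over the pretend microphone."""
    pipeline = build_pipeline([fake_stt, fake_llm, fake_tts])
    return asyncio.run(collect(pipeline(mic())))


if __name__ == "__main__":
    print(run_demo())
```

Swapping a provider here means replacing one function in the list passed to build_pipeline, and because every stage is an async generator, items flow through one at a time rather than waiting for the whole input.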
Supported services:
• Speech-to-Text: Deepgram, AssemblyAI, OpenAI Whisper, Groq, Azure, AWS, Google, and more
• LLMs: OpenAI, Anthropic, Gemini, Groq, Mistral, Ollama, AWS, Azure, and more
• Text-to-Speech: OpenAI, ElevenLabs, Deepgram, Cartesia, Azure, AWS, Google, and more
• Speech-to-Speech: OpenAI Realtime, Gemini Multimodal Live, AWS Nova Sonic, Ultravox, Grok Voice Agent
10.3k+ stars on GitHub.
I've shared the link to the repo in the comments!
Introducing ElevenLabs Devs, a new YouTube channel for AI engineers.
Expect deep dives, demos, and clear explanations of key concepts across Text to Speech, Speech to Text, ElevenAgents, and broader AI systems.
Subscribe: