Search Results for "real voice text to speech" - Page 2

97 projects for "real voice text to speech" with 1 filter applied:

  • IT Asset Management (ITAM) Software Icon
    IT Asset Management (ITAM) Software

    Supercharge Your IT Assets, the Easy Way

    EZO AssetSonar is a comprehensive IT asset management platform that provides real-time visibility into your entire digital infrastructure. Track and optimize hardware, software, and license management to reduce risks, control IT spend, and improve compliance.
    Learn More
  • CloudZero: The Cloud Cost Optimization Platform Icon
    CloudZero: The Cloud Cost Optimization Platform

    CloudZero automates the collection, allocation, and analysis of your infrastructure and AI spend to uncover waste and improve unit economics.

    CloudZero is the leader in proactive cloud cost efficiency. We enable engineers to build cost-efficient software without slowing down innovation. CloudZero's next-generation cloud cost optimization platform automates the collection, allocation, and analysis of cloud costs to uncover savings opportunities and improve unit economics. We are the only platform that enables companies to understand 100% of their operational cloud spend and take an engineering-led approach to optimizing that spend. CloudZero is used by industry leaders worldwide, such as Coinbase, Klaviyo, Miro, Nubank, and Rapid7.
    Learn More
  • 1
    WhisperSpeech

    WhisperSpeech

    An Open Source text-to-speech system built by inverting Whisper

    WhisperSpeech is an open-source text-to-speech system created by “inverting” OpenAI’s Whisper, reusing its strengths as a semantic audio model to generate speech instead of only transcribing it. The project aims to be for speech what Stable Diffusion is for images: powerful, hackable, and safe for commercial use, with code under Apache-2.0/MIT and models trained only on properly licensed data.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    Bailing

    Bailing

    Bailing is a voice dialogue robot similar to GPT-4o

    Bailing is an open-source voice-dialogue assistant designed to deliver natural voice-based conversations by combining automatic speech recognition (ASR), voice activity detection (VAD), a large language model (LLM), and text-to-speech (TTS) in a single pipeline. Its goal is to offer a “voice-first” chat experience similar to what one might expect from a system like GPT-4o, but fully open and deployable by users.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    RunAnywhere

    RunAnywhere

    Production ready toolkit to run AI locally

    ...It also includes integrated pipelines that combine speech-to-text, large language models, and text-to-speech into a complete conversational system.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Qwen3-ASR

    Qwen3-ASR

    Qwen3-ASR is an open-source series of ASR models

    Qwen3-ASR is an automatic speech recognition system in the QwenLM family, developed to convert spoken language into text with strong accuracy and real-time performance. As a specialized ASR variant of the broader Qwen language model ecosystem, it focuses on capturing reliable transcriptions from audio sources such as recordings, live streams, or conversational inputs while supporting low latency use cases.
    Downloads: 3 This Week
    Last Update:
    See Project
  • The complete IT asset and license management platform Icon
    The complete IT asset and license management platform

    Gain full visibility and control over your IT assets, licenses, usage and spend in one place with Setyl.

    The platform seamlessly integrates with 100+ IT systems, including MDM, RMM, IDP, SSO, HR, finance, helpdesk tools, and more.
    Learn More
  • 5
    Fun Audio Chat

    Fun Audio Chat

    Large Audio Language Model built for natural interactions

    Fun Audio Chat is an interactive voice-first conversational AI platform designed to let users engage in natural spoken dialogue with large language models in real time, turning speech into context-aware responses while maintaining a smooth back-and-forth experience. It combines speech recognition, audio processing, and AI generation so users can speak simply and receive spoken replies, enabling applications such as virtual assistants, voice bots, and hands-free chat interfaces. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 6
    WhisperLive

    WhisperLive

    A nearly-live implementation of OpenAI's Whisper

    WhisperLive is a “nearly live” implementation of OpenAI’s Whisper model focused on real-time transcription. It runs as a server–client system in which the server hosts a Whisper backend and clients stream audio to be transcribed with very low delay. The project supports multiple inference backends, including Faster-Whisper, NVIDIA TensorRT, and OpenVINO, allowing you to target GPUs and different CPU architectures efficiently. It can handle microphone input, pre-recorded audio files, and...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 7
    edge-tts

    edge-tts

    Use Microsoft Edge's online text-to-speech service from Python

    edge-tts is a Python module and command-line tool that gives you direct access to Microsoft Edge’s online text-to-speech service without needing the Edge browser, Windows, or any API key. It wraps the same cloud voices used by Edge, exposing them through a simple CLI (edge-tts, edge-playback) and a Python API, so you can script high-quality speech generation in your own applications. The tool lets you list available voices, specify locale and voice name, and generate audio files in common formats like MP3 or WAV. ...
    Downloads: 38 This Week
    Last Update:
    See Project
  • 8
    Orpheus TTS

    Orpheus TTS

    Towards Human-Sounding Speech

    Orpheus TTS is a state-of-the-art open-source text-to-speech system built on a Llama-3B backbone, treating speech synthesis as a large language model problem instead of a traditional TTS pipeline. It is designed to produce human-like speech with natural intonation, emotion, and rhythm, targeting quality comparable to or better than many closed-source systems. The project ships both pretrained and finetuned English models, as well as a family of multilingual models released as a research preview, and includes data-processing scripts so users can train or finetune their own variants. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    LiveKit Agents

    LiveKit Agents

    Framework for building realtime multimodal voice AI agents apps

    LiveKit Agents is an open source framework designed for building realtime AI agents that can participate as programmable entities within communication sessions. It enables developers to create conversational and multimodal agents capable of processing voice, audio, and other inputs in realtime environments. These agents can join LiveKit rooms as participants and interact with users or systems through speech, text, and other modalities. LiveKit Agents provides libraries and tooling that allow developers to combine speech-to-text, large language models, and text-to-speech services to build interactive AI experiences. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • Point of Sale. Powerful and Simple. Icon
    Point of Sale. Powerful and Simple.

    For retail store owners and multi-location retail operations needing a tool to manage sales, inventory, staff and channels in one place

    Vibe Retail is an all-in-one retail point-of-sale and operations platform built for single-store and multi-location retailers seeking to unify inventory, sales, staff and customer data from one mobile-friendly interface. The system lets you track inventory across locations and warehouses, handle item variations (size, color, material), manage purchase orders and supplier deliveries, print custom barcodes, and transfer stock between stores in real time. On the sales side, Vibe supports multiple payment types (cards, cash, checks, gift cards, EBT), layaway workflows, serial number tracking, delivery management, loyalty programs and branded receipts. Retailers can integrate with online platforms (such as Shopify and WooCommerce), sync in-store and online sales, access 40+ real-time reports on sales, inventory and performance, set up promotions and discounts, and print receipts from mobile devices.
    Learn More
  • 10
    Luna AI

    Luna AI

    Virtual AI anchor that combines state-of-the-art technology

    ...For voice, it integrates with numerous TTS engines (Edge-TTS, VITS-Fast, ElevenLabs, VALL-E-X, OpenVoice, GPT-SoVITS, Azure TTS, fish-speech, ChatTTS, CosyVoice, F5-TTS, MultiTTS, MeloTTS, and others), and can optionally pass the output through voice conversion systems like so-vits-svc or DDSP-SVC to change timbre.
    Downloads: 18 This Week
    Last Update:
    See Project
  • 11
    MARS5

    MARS5

    MARS5 speech model (TTS) from CAMB.AI

    MARS5-TTS is CAMB.AI’s open-source English speech model designed for high-quality text-to-speech and voice emulation. It uses a two-stage architecture that combines an autoregressive (AR) model with a non-autoregressive (NAR) model, giving it both expressiveness and speed. The model is built to handle prosodically challenging content such as sports commentary, anime dialogue, and other high-energy or highly varied speech patterns with realistic rhythm and intonation. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    StreamSpeech

    StreamSpeech

    StreamSpeech is a seamless model for offline speech recognition

    ...The model supports eight tasks: offline ASR, speech-to-text translation, speech-to-speech translation, and TTS, as well as their streaming or simultaneous counterparts, all handled by the same underlying system. During simultaneous translation, StreamSpeech can optionally output intermediate ASR transcripts and text translations, giving users or downstream applications real-time visibility into what the system is hearing and how it is translating.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    Open-LLM-VTuber

    Open-LLM-VTuber

    Open source AI VTuber platform with voice chat and Live2D avatars

    Open-LLM-VTuber is an open source platform designed to create AI-powered VTuber characters that can interact with users through voice and animated avatars. It enables hands-free conversations with large language models by combining speech recognition, language processing, and text-to-speech synthesis into a single system. Users can speak directly to the AI character, and the system can respond with a generated voice while animating a Live2D avatar to simulate a talking virtual personality. ...
    Downloads: 31 This Week
    Last Update:
    See Project
  • 14
    TADA

    TADA

    Open Source Speech Language Model

    TADA is an open-source speech-language modeling framework designed to unify spoken audio and text representations within a single generative architecture. The system focuses on aligning speech and text streams using a dual-alignment mechanism that synchronizes the acoustic signal with its textual representation. By modeling both modalities together, the framework allows developers to build systems capable of generating, understanding, and transforming speech and language simultaneously. This...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Textream

    Textream

    Textream is a free macOS teleprompter app for streamers, interviewers

    Textream is an open-source, free macOS teleprompter application designed for streamers, podcasters, presenters, and interviewers who want a smooth, distraction-free way to stay on script. It runs natively on macOS and leverages on-device speech recognition to highlight each word in real time as you speak, keeping your focus where it belongs — on delivery rather than memorization. The interface supports multiple modes of use, such as classic constant-scroll auto-scrolling, voice-activated scrolling that pauses when you’re silent, and direct word tracking that syncs the displayed script to your spoken pace. ...
    Downloads: 16 This Week
    Last Update:
    See Project
  • 16
    Supertonic

    Supertonic

    Lightning-fast, on-device TTS, running natively via ONNX

    Supertonic is a lightning-fast, on-device text-to-speech system built around ONNX Runtime for maximum speed and portability. It focuses on running entirely locally, eliminating the need for cloud APIs and providing low latency and strong privacy guarantees, even on constrained devices like Raspberry Pi boards and e-readers. The core model is highly compact at around 66 million parameters, yet benchmarks show it can generate speech up to 167× faster than real time on modern consumer hardware and significantly outpace popular cloud TTS APIs in throughput and real-time factor. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 17
    Dia

    Dia

    A TTS model capable of generating ultra-realistic dialogue

    Dia is a neural text-to-speech model designed specifically for generating ultra-realistic dialogue in a single pass. Instead of focusing on isolated sentences or flat narration, it is optimized for conversational audio, complete with natural turn-taking, prosody, and pacing. The model can be conditioned on a reference audio sample, allowing you to control emotion, tone, and other stylistic aspects of the speech. It can also produce nonverbal vocalizations like laughter, coughs, clearing the...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    TTS WebUI

    TTS WebUI

    A single Gradio + React WebUI with extensions for ACE-Step

    TTS-WebUI is a unified Gradio + React web interface that brings together a large ecosystem of text-to-speech, voice conversion, and audio generation models under a single UI. It supports a wide range of models such as Bark, MusicGen, Tortoise, RVC, StyleTTS2, ParlerTTS, CosyVoice, XTTSv2, Stable Audio, SeamlessM4T, and many others, exposing them as interchangeable backends for speech and music synthesis. The project provides an installer that sets up Conda, Python environments, and all necessary dependencies, so users can focus on experimenting with voices instead of managing tooling. ...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 19
    LLPlayer

    LLPlayer

    The media player for language learning, with dual subtitles

    ...Unlike traditional media players, the application focuses on advanced subtitle-related features that help learners understand and interact with foreign language media more effectively. The player supports dual subtitles so users can simultaneously view text in both the original language and their native language while watching videos. It can also automatically generate subtitles in real time using speech-to-text systems such as Whisper, allowing subtitles to be created even when none are available. Real-time translation capabilities enable subtitles to be translated using multiple translation engines and language models. ...
    Downloads: 46 This Week
    Last Update:
    See Project
  • 20
    ChatTTS_colab

    ChatTTS_colab

    One-click deployment (including offline integration package)

    ChatTTS_colab is a wrapper project around the ChatTTS model that focuses on “one-click” deployment, especially in Google Colab. It provides an integrated offline bundle and scripts for Windows and macOS so users can run ChatTTS locally without wrestling with complex environment setup. The repository includes Colab notebooks that launch a Gradio-based web UI and expose streaming TTS, making it possible to listen to generated audio as it is produced. A distinctive feature is the “voice gacha”...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    DragonianVoice

    DragonianVoice

    C++ inference library for multiple SVC/TTS

    DragonianVoice is a C++ inference library that unifies multiple speech synthesis, voice conversion, and singing voice synthesis models under a single, high-performance ONNX-based framework. It focuses on being a reusable native library rather than a full UI product, with bindings for C, C++, and C# so it can be embedded into other applications or engines. The project supports a wide range of model families: TTS models such as Tacotron2, VITS, EmotionalVITS, BERTVits2, GPT-SoVITS, SVC systems...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Voice Accounting For Blind & Mute People

    Voice Accounting For Blind & Mute People

    Free & Easy AI Voice Accounting Software For Blind & Speechless People

    Just download the above zip file, extract it and then open the index.html file on internet browsers like Firefox ( preferable ) or Google Chrome. Also, please view and download my full collection of softwares for people with disabilities, here : https://sourceforge.net/projects/softwares-for-disabled-people/ This full collection also includes the Voice Accounting Software as well.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    FAY

    FAY

    Framework for building AI-powered interactive digital humans and agent

    ...Fay supports various types of digital humans, including 2.5D and 3D avatars, and can be integrated with applications running on mobile devices, PCs, web platforms, and embedded systems. Its architecture allows developers to combine different AI components such as speech recognition, text-to-speech, and large language models to create conversational digital agents. Fay provides multiple interfaces for text, voice, and digital human control, enabling developers to build interactive assistants, virtual presenters, or automated service agents. It also supports custom knowledge bases and configurable behaviors so developers can tailor the personality and responses of the digital human.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 24
    OuteTTS

    OuteTTS

    Interface for OuteTTS models

    OuteTTS is an interface library for running OuteTTS text-to-speech models across a range of backends, making it easier to deploy the same model on different hardware and runtimes. It provides a high-level Interface API that wraps model configuration, speaker handling, and audio generation so you can focus on integrating speech into your application rather than wiring up low-level engines.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 25
    CSM (Conversational Speech Model)

    CSM (Conversational Speech Model)

    A Conversational Speech Generation Model

    The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing.
    Downloads: 4 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB