Showing 30 open source projects for "python tools"

View related business solutions
  • Striven | All In One Business Management Software Icon
    Striven | All In One Business Management Software

    Striven is an all-in-one business management software suite with everything your organization needs for success.

    Striven is the all-in-one business management software that lowers your costs, improves your operations, and makes work easier. Make your company’s data coherent, connected, and relevant.
    Learn More
  • Taking the Paper Out of Work Icon
    Taking the Paper Out of Work

    For organizations that need powerful ECM and document automation software

    The Square 9 AI-powered intelligent document processing platform takes the paper out of work and makes it easier to get things done with digital workflows.
    Learn More
  • 1
    Pocket TTS

    Pocket TTS

    A TTS that fits in your CPU (and pocket)

    Pocket TTS is a lightweight text-to-speech project designed to run efficiently on CPUs, targeting developers who want local speech generation without depending on GPUs or hosted web APIs. It is built to feel practical in everyday applications, where installation and usage should be as simple as adding a dependency and calling a function. The project focuses on keeping the runtime footprint manageable while still producing natural-sounding speech, which makes it attractive for offline tools,...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 2
    pyttsx3

    pyttsx3

    Offline Text To Speech synthesis for python

    pyttsx3 is an offline text-to-speech library for Python that wraps native speech engines instead of calling cloud APIs. It is designed to work entirely without an internet connection, making it suitable for local automation, kiosks, accessibility tools, and embedded applications. On Windows it uses SAPI5, on Linux it typically uses eSpeak or eSpeak-NG, and on macOS it can use NSSpeechSynthesizer or AVSpeechSynthesizer, giving it broad cross-platform compatibility.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 3
    Style-Bert-VITS2

    Style-Bert-VITS2

    Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles

    Style-Bert-VITS2 is a text-to-speech system based on Bert-VITS2 that focuses on highly controllable voice styles and emotional expression. It takes the original Bert-VITS2 v2.1 and its Japanese-Extra variant and extends them so you can control emotion and speaking style with fine-grained intensity, not just choose a generic tone. The project targets both power users and beginners: Windows users without Git or Python can install and run it using bundled .bat scripts, while advanced users can...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 4
    Qwen3-TTS

    Qwen3-TTS

    Qwen3-TTS is an open-source series of TTS models

    Qwen3-TTS is an open-source text-to-speech (TTS) project built around the Qwen3 large language model family, focused on generating high-quality, natural-sounding speech from plain text input. It provides researchers and developers with tools to transform text into expressive, intelligible audio, supporting multiple languages and voice characteristics tuned for clarity and fluidity. The project includes pre-trained models and inference scripts that let users synthesize speech locally or...
    Downloads: 11 This Week
    Last Update:
    See Project
  • Powerful Business Process Automation Icon
    Powerful Business Process Automation

    With ThinkAutomation, you get an open-ended studio to build any and every automated workflow you could ever need.

    When a message is received ThinkAutomation automatically executes one or more Automations. Automations are created using an easy to use drag-and-drop interface to run simple or complex tasks. Automations can perform many business process Actions, including: updating company databases, CRM systems and cloud services, sending outgoing emails, Teams & SMS messages, document processing, custom scripting, integration and much more. Over 100 built-in actions are included, plus ThinkAutomation is extensible with Custom Actions.  
    Learn More
  • 5
    VoxCPM

    VoxCPM

    TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

    VoxCPM is a tokenizer-free text-to-speech system that models speech in a continuous space, aiming for extremely realistic, context-aware synthesis and true-to-life zero-shot voice cloning. Instead of converting speech into discrete tokens, it uses an end-to-end diffusion-autoregressive architecture built on the MiniCPM-4 backbone, combining hierarchical language modeling, finite scalar quantization (FSQ), and local Diffusion Transformers. This design helps decouple semantic and acoustic...
    Downloads: 62 This Week
    Last Update:
    See Project
  • 6
    TTS WebUI

    TTS WebUI

    A single Gradio + React WebUI with extensions for ACE-Step

    ...It supports a wide range of models such as Bark, MusicGen, Tortoise, RVC, StyleTTS2, ParlerTTS, CosyVoice, XTTSv2, Stable Audio, SeamlessM4T, and many others, exposing them as interchangeable backends for speech and music synthesis. The project provides an installer that sets up Conda, Python environments, and all necessary dependencies, so users can focus on experimenting with voices instead of managing tooling. It offers both a Gradio backend and an optional React frontend, which can be accessed on separate ports and even run inside Docker for more reproducible deployments. An extension system lets you enable extra models and tools, install community extensions from a catalog, and manage them via a dedicated GUI or CLI extension manager.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 7
    AI Runner

    AI Runner

    Offline inference engine for art, real-time voice conversations

    ...The project has a strong focus on developer ergonomics, with thorough development guidelines, environment configuration using .env variables, and a clear structure for tests, tools and agents.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 8
    abogen

    abogen

    Generate audiobooks from EPUBs, PDFs and text with captions

    abogen is a tool designed to generate audiobooks (or speech narrations) from textual sources such as EPUBs, PDFs, or plain text, with synchronized captions. In other words, it automates the pipeline of reading a digital book (or document), converting its text into speech via a TTS engine, and packaging the result into an audiobook format — likely along with timestamped captions or subtitles that align with the spoken audio. This can be very useful for accessibility, content consumption on...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 9
    OmniVoice

    OmniVoice

    High-Quality Voice Cloning TTS for 600+ Languages

    The OmniVoice project is a cutting-edge multilingual text-to-speech system designed to generate high-quality speech across more than 600 languages. Built on a diffusion language model-style architecture, it combines scalability with strong performance, enabling both natural-sounding voice synthesis and efficient inference speeds. One of its most notable capabilities is zero-shot voice cloning, allowing users to replicate a speaker’s voice using only a short reference audio clip. In addition,...
    Downloads: 3 This Week
    Last Update:
    See Project
  • Bitdefender Ultimate Small Business Security Icon
    Bitdefender Ultimate Small Business Security

    Protect the big future of your small business

    Get exceptional protection against all digital threats for your business and employees.
    Learn More
  • 10
    Sopro TTS

    Sopro TTS

    A lightweight text-to-speech model with zero-shot voice cloning

    ...The model is designed to work with a small set of dependencies and to be accessible for developers who want offline TTS with customizable voice style, including options for streaming or non-streaming generation modes. Users can install it with standard Python tools, run a demo server locally, and experiment with CLI or Python API usage for producing synthetic speech.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    NVIDIA NeMo Framework

    NVIDIA NeMo Framework

    Scalable generative AI framework built for researchers and developers

    ...NeMo is designed to scale: with tools like NeMo-Run, users can orchestrate large-scale experiments across thousands of GPUs.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    FastRTC

    FastRTC

    The python library for real-time communication

    FastRTC is a Python library designed to simplify real-time communication (RTC), especially for audio and video streaming applications. It abstracts away much of the complexity that typically comes with implementing WebRTC by providing a simple interface — e.g. a Stream class — that can be mounted within a web backend (for example a FastAPI application).
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    MiniMax-MCP

    MiniMax-MCP

    Official MiniMax Model Context Protocol (MCP) server

    MiniMax-MCP is the official Model Context Protocol (MCP) server for accessing MiniMax’s multimodal generative APIs from MCP-compatible clients. It acts as a bridge between tools like Claude Desktop, Cursor, Windsurf, OpenAI Agents, and the MiniMax platform, exposing capabilities such as text-to-speech, voice cloning, image generation, text-to-image, video generation, image-to-video, text-to-video, and music generation. The server is written in Python and distributed under the MIT license, with a pyproject.toml and uv-based workflow that makes installation and execution reproducible. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    IndexTTS2

    IndexTTS2

    Industrial-level controllable zero-shot text-to-speech system

    IndexTTS is a modern, zero-shot text-to-speech (TTS) system engineered to deliver high-quality, natural-sounding speech synthesis with few requirements and strong voice-cloning capabilities. It builds on state-of-the-art models such as XTTS and other modern neural TTS backbones, improving them with a conformer-based speech conditional encoder and upgrading the decoder to a high-quality vocoder (BigVGAN2), leading to clearer and more natural audio output. The system supports zero-shot voice...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 15
    RealtimeTTS

    RealtimeTTS

    Converts text to speech in realtime

    RealtimeTTS is a low-latency text-to-speech library built for real-time applications such as voice chat with LLMs, assistants, and interactive tools. It is designed around a streaming model: you can feed it text incrementally (for example, as an LLM responds) and get audio output almost immediately, which keeps end-to-end latency very low. The library is engine-agnostic and plugs into a wide range of cloud and local TTS systems, including OpenAI, ElevenLabs, Azure, Coqui, Piper, StyleTTS2,...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    openctp

    openctp

    Provides CTP stock options and Zhongtai Securities XTP

    ...The project offers a comprehensive simulation environment similar to SimNow that supports futures, options, A share stocks, funds, bonds, and stock options, and even extends to Hong Kong and US markets. In addition to the core library, openctp supplies Python bindings for CTPAPI and stock options APIs, making it easier to build strategies, tools, and analytics in Python. It also develops full featured and lightweight trading clients like TickTrader, TickTraderMini, and ViTrader, which support multiple desks and markets.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    OpenAI-Compatible Edge-TTS API

    OpenAI-Compatible Edge-TTS API

    Free, high-quality text-to-speech API endpoint to replace OpenAI

    OpenAI-Compatible Edge-TTS API is a local, OpenAI-compatible text-to-speech API that uses edge-tts—Microsoft Edge’s online TTS service—as the backend. The project emulates the /v1/audio/speech endpoint used by OpenAI, so any client that can talk to the OpenAI TTS API can be redirected to this service with minimal changes. It exposes parameters for input text, voice selection, audio format, and playback speed, mirroring the OpenAI interface while mapping popular OpenAI voice names to...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    Matcha-TTS

    Matcha-TTS

    A fast TTS architecture with conditional flow matching

    Matcha-TTS is a non-autoregressive neural text-to-speech architecture that uses conditional flow matching to generate speech quickly while maintaining natural quality. It models speech as an ODE-based generative process, and conditional flow matching lets it reach high-quality audio in only a few synthesis steps, which greatly reduces latency compared to score-matching diffusion approaches. The model is fully probabilistic, so it can generate diverse realizations of the same text while still...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    Lingvo

    Lingvo

    Framework for building neural networks

    Lingvo is a TensorFlow based framework focused on building and training sequence models, especially for language and speech tasks. It was originally developed for internal research and later open sourced to support reproducible experiments and shared model implementations. The framework provides a structured way to define models, input pipelines, and training configurations using a common interface for layers, which encourages reuse across different tasks. It has been used to implement state...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    IMS Toucan

    IMS Toucan

    Controllable and fast Text-to-Speech for over 7000 languages

    IMS-Toucan is a toolkit for training, using, and teaching state-of-the-art text-to-speech systems, built at the Institute for Natural Language Processing (IMS), University of Stuttgart. It is the official home of ToucanTTS, a massively multilingual TTS system designed to support over 7,000 languages with a single unified framework. The toolkit focuses on being fast and controllable while not requiring huge amounts of compute, making it practical for research labs and smaller teams. It...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    ESPnet

    ESPnet

    End-to-end speech processing toolkit

    ESPnet is a comprehensive end-to-end speech processing toolkit covering a wide spectrum of tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speech translation (ST), speech enhancement, speaker diarization, and spoken language understanding. It uses PyTorch as its deep learning engine and adopts a Kaldi-style data processing pipeline for features, data formats, and experimental recipes. This combination allows researchers to leverage modern neural architectures while...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    CSM (Conversational Speech Model)

    CSM (Conversational Speech Model)

    A Conversational Speech Generation Model

    The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 23
    Bert-VITS2

    Bert-VITS2

    VITS2 backbone with multilingual-bert

    Bert-VITS2 is a neural text-to-speech project that combines a VITS2 backbone with a multilingual BERT front-end to produce high-quality speech in multiple languages. The core idea is to use BERT-style contextual embeddings for text encoding while relying on a refined VITS2 architecture for acoustic generation and vocoding. The repository includes everything needed to train, fine-tune, and run the model, from configuration files to preprocessing scripts, spectrogram utilities, and training...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24
    AudioBC

    AudioBC

    Offline desktop app to convert EPUB to MP3 using Kokoro-82M neural TTS

    AudioBC is a powerful desktop application designed to turn your digital library into a personal audiobook collection. Unlike most Text-to-Speech (TTS) tools that require expensive cloud API subscriptions or an active internet connection, AudioBC runs entirely on your local machine. Powered by the state-of-the-art Kokoro-82M neural engine, AudioBC produces natural, human-like speech that rivals premium cloud services. It is built with a focus on privacy and simplicity, offering a...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 25
    EmotiVoice

    EmotiVoice

    Multi-Voice and Prompt-Controlled TTS Engine

    EmotiVoice is a multi-voice, prompt-controlled text-to-speech engine designed to generate highly expressive speech across thousands of voices. It supports both English and Chinese and ships with over 2,000 preset voices, making it suitable for everything from characters and virtual anchors to narration and dialogue. The core idea is prompt-based emotional and style control: you can ask the engine to speak “happy,” “sad,” “excited,” or with other high-level style prompts that shape prosody,...
    Downloads: 5 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB