Search Results for "speech & image processing based project in matlab"

Showing 97 open source projects for "speech & image processing based project in matlab"

View related business solutions
  • Enterprise AI Agents for Every Customer Moment Icon
    Enterprise AI Agents for Every Customer Moment

    For enterprise companies looking for AI Agents

    From chat to voice to SMS, every conversation gets a smart, personalized response powered by your policies, tone, and data.
    Learn More
  • Go beyond a virtual data room with Datasite Diligence Icon
    Go beyond a virtual data room with Datasite Diligence

    Datasite Diligence, helps dealmakers in more than 170 countries close more deals, faster.

    The data room with a view. Evolved for next-generation M&A. Built on decades of deal experience. Packed with expert tools, yet intuitive for novices. A fully mobile platform with frictionless processes. Smart AI tools that let you close more deals, faster, plus end-to-end support at all times. Do due diligence with intelligence.
    Learn More
  • 1
    Image-Editor

    Image-Editor

    AI based photo editing website for changing image background

    Welcome to Image-Editor, the AI-based photo editing website that lets you change backgrounds, colors, crop, sharpen images, and much more with just a single click. With exceptional image quality and fast processing times, Image-Editor is the ultimate tool for all your photo editing needs. To get started, simply run pip install -r requirements.txt to download all the necessary libraries.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 2
    pyVideoTrans

    pyVideoTrans

    Translate the video from one language to another and embed dubbing

    pyVideoTrans is an ambitious open-source multimedia processing project that assembles speech recognition, subtitle generation, AI translation, voice synthesis, and video assembly into a unified pipeline for converting videos from one language to another with embedded dubbing and captions. At its core it runs speech-to-text models to transcribe audio tracks, translates the resulting text into a target language using local or cloud-based translation engines, synthesizes new speech to match the translated subtitles, and then merges that speech back into the video, creating a fully localized media file. ...
    Downloads: 30 This Week
    Last Update:
    See Project
  • 3
    GLM

    GLM

    OpenGL Mathematics (GLM)

    OpenGL Mathematics (GLM) is a header only C++ mathematics library for graphics software based on the OpenGL Shading Language (GLSL) specifications. GLM provides classes and functions designed and implemented with the same naming conventions and functionality than GLSL so that anyone who knows GLSL, can use GLM as well in C++. This project isn't limited to GLSL features. An extension system, based on the GLSL extension conventions, provides extended capabilities: matrix transformations, quaternions, data packing, random numbers, noise, etc. ...
    Downloads: 64 This Week
    Last Update:
    See Project
  • 4
    Point Cloud Library

    Point Cloud Library

    A standalone, large scale, open project for 2D/3D image processing

    The Point Cloud Library (PCL) is a standalone, large scale, open project for 2D/3D image and point cloud processing. PCL is released under the terms of the BSD license, and thus free for commercial and research use. Whether you’ve just discovered PCL or you’re a long time veteran, this page contains links to a set of resources that will help consolidate your knowledge on PCL and 3D processing. An additional Wiki resource for developers is available too. To simplify both usage and...
    Downloads: 18 This Week
    Last Update:
    See Project
  • Create stunning, professional email signatures in minutes Icon
    Create stunning, professional email signatures in minutes

    For companies looking to create, assign and manage all their employees email signatures and add targeted marketing banners.

    Create, assign and manage all your employees’ email signatures and add targeted marketing banners. Stop getting worked up about your signatures! Leverage a centralized interface to easily create and manage the email signatures of all your employees. Take advantage of each email to broadcast and amplify your brand. Letsignit helps you regain control over your digital identity. Harmonize 100% of your employee’s email signatures in just a few clicks! 121 professional emails are received and 40 are sent every day by an employee. With Letsignit, turn every email into a powerful communication opportunity: send the right message to the right person at the right time! Innovative more than tech, inspiring more than following. Authentic more than overrated, close more than "think big", trustworthy more than doubtful. Hands-on more than complex, available but yet premium, fun but yet expert.
    Learn More
  • 5
    AI Runner

    AI Runner

    Offline inference engine for art, real-time voice conversations

    AI Runner is an offline inference engine designed to run a collection of AI workloads on your own machine, including image generation for art, real-time voice conversations, LLM-powered chatbots and automated workflows. It is implemented as a desktop-oriented Python application and emphasizes privacy and self-hosting, allowing users to work with text-to-speech, speech-to-text, text-to-image and multimodal models without sending data to external services. ...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 6
    Readest

    Readest

    Readest is a modern, feature-rich ebook reader

    Readest is a project meant to facilitate reading, studying, or consuming content by integrating reading tools with AI-powered assistance. Although the repository is not as widely documented or popular as some, the idea is that Readest supports features to help with reading comprehension — likely combining OCR / text retrieval, translation, note-taking, or summarization for reading materials (eBooks, articles, PDFs). The goal appears to be to let users feed in arbitrary reading material and...
    Downloads: 41 This Week
    Last Update:
    See Project
  • 7
    FaceFusion

    FaceFusion

    Industry leading face manipulation platform

    FaceFusion is an open-source face swapping and facial enhancement toolkit designed for high-quality video and image manipulation workflows. The project enables users to replace faces in images or videos while maintaining temporal consistency and visual realism. It integrates modern deep learning models for face detection, alignment, and blending to produce smoother results than traditional approaches. FaceFusion is built with a modular pipeline that allows users to customize processing steps...
    Downloads: 252 This Week
    Last Update:
    See Project
  • 8
    WhisperJAV

    WhisperJAV

    Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD

    WhisperJAV is an open-source speech transcription pipeline designed specifically for generating subtitles for Japanese adult video content. The project addresses challenges that standard speech recognition models face when transcribing this type of audio, which often includes low signal-to-noise ratios and large numbers of non-verbal vocalizations. Traditional automatic speech recognition systems can misinterpret these sounds as words, leading to inaccurate transcripts. WhisperJAV introduces...
    Downloads: 22 This Week
    Last Update:
    See Project
  • 9
    Orpheus TTS

    Orpheus TTS

    Towards Human-Sounding Speech

    Orpheus TTS is a state-of-the-art open-source text-to-speech system built on a Llama-3B backbone, treating speech synthesis as a large language model problem instead of a traditional TTS pipeline. It is designed to produce human-like speech with natural intonation, emotion, and rhythm, targeting quality comparable to or better than many closed-source systems. The project ships both pretrained and finetuned English models, as well as a family of multilingual models released as a research...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Free and Open Source HR Software Icon
    Free and Open Source HR Software

    OrangeHRM provides a world-class HRIS experience and offers everything you and your team need to be that HR hero you know that you are.

    Give your HR team the tools they need to streamline administrative tasks, support employees, and make informed decisions with the OrangeHRM free and open source HR software.
    Learn More
  • 10
    SD.Next

    SD.Next

    All-in-one WebUI for AI generative image and video creation

    SD.Next is an all-in-one web user interface for generative image creation that expands beyond basic Stable Diffusion workflows to cover broader image and video generation, captioning, and processing tasks. It is designed as a power-user environment where model management, generation features, and workflow controls are centralized in a single UI rather than spread across separate scripts and utilities.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 11
    comfyui-mixlab-nodes

    comfyui-mixlab-nodes

    Workflow and speech recognition app

    comfyui-mixlab-nodes is a large collection of custom nodes for ComfyUI that turns workflows into interactive apps and adds real-time multimedia, LLM, and TTS capabilities. It introduces a “Workflow-to-APP” concept, where a ComfyUI graph can be transformed into a Web App through an AppInfo node, complete with categories, batch prompts, and editable configurations. The project also brings Real-time Design features like screen capture and floating video nodes, enabling creative pipelines that...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 12
    OpenAI-Compatible Edge-TTS API

    OpenAI-Compatible Edge-TTS API

    Free, high-quality text-to-speech API endpoint to replace OpenAI

    OpenAI-Compatible Edge-TTS API is a local, OpenAI-compatible text-to-speech API that uses edge-tts—Microsoft Edge’s online TTS service—as the backend. The project emulates the /v1/audio/speech endpoint used by OpenAI, so any client that can talk to the OpenAI TTS API can be redirected to this service with minimal changes. It exposes parameters for input text, voice selection, audio format, and playback speed, mirroring the OpenAI interface while mapping popular OpenAI voice names to...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    Streamer-Sales

    Streamer-Sales

    LLM Large Model of Selling Anchor

    Streamer-Sales is an open-source large language model system designed specifically for e-commerce live streaming and automated product promotion. The project focuses on generating persuasive product descriptions and live presentation scripts that mimic the style of professional online sales hosts. By analyzing product characteristics and marketing information, the model can produce engaging explanations that emphasize benefits, features, and emotional appeal to encourage viewers to make...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 14
    Luna AI

    Luna AI

    Virtual AI anchor that combines state-of-the-art technology

    Luna AI is a virtual AI streamer framework designed to power an interactive VTuber that can go live on major platforms and chat with viewers in real time. It is built around a core assistant persona called “Luna AI,” which can be driven by a wide range of large language models and platforms, including GPT-style APIs, Claude, LangChain-based backends, ChatGLM, Kimi, Ollama, and many others. The project supports multiple rendering backends for the avatar, such as Live2D, Unreal Engine (UE),...
    Downloads: 18 This Week
    Last Update:
    See Project
  • 15
    pyAudioAnalysis

    pyAudioAnalysis

    Python Audio Analysis Library: Feature Extraction, Classification

    pyAudioAnalysis is an open-source Python library designed for audio signal analysis, machine learning, and music information retrieval tasks. The project provides a collection of tools that allow developers to extract meaningful features from audio files and use those features for classification, segmentation, and analysis. The library supports multiple audio processing workflows, including feature extraction from raw audio signals, training of machine learning models, and automatic audio...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 16
    Dict UK

    Dict UK

    Project to generate POS tag dictionary for Ukrainian language

    A Java-based tool for generating full morphological dictionaries for Ukrainian, applying affix rules to base lexemes to produce all inflected forms with part-of-speech tags—used for natural language processing and spell-checking.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 17
    reverse-SynthID

    reverse-SynthID

    Reverse engineering Gemini's SynthID detection

    Reverse-SynthID is a research-focused project that analyzes and reverse-engineers Google’s SynthID watermarking system used in AI-generated images. It leverages signal processing and spectral analysis techniques to identify hidden watermark patterns without access to proprietary encoding methods. The project introduces a multi-resolution “SpectralCodebook” that maps watermark characteristics across different image sizes. Using this approach, it can detect SynthID watermarks with high...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 18
    OpenAI

    OpenAI

    Swift community driven package for OpenAI public API

    MacPaw OpenAI is a community-driven Swift SDK that provides developers with a structured and type-safe way to interact with the OpenAI API and compatible providers within Apple ecosystem applications. It simplifies the integration of AI capabilities into iOS, macOS, and other Swift-based applications by offering a clean abstraction over the underlying REST API, enabling developers to focus on functionality rather than low-level implementation details. The SDK supports a wide range of...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 19
    HivisionIDPhoto

    HivisionIDPhoto

    HivisionIDPhotos: a lightweight and efficient AI ID photos tools

    ...The software analyzes portrait images, performs background removal, aligns the face according to ID photo standards, and produces images in various official size formats. It also allows the generation of layout sheets such as six-inch photo arrangements for printing multiple ID photos on a single page. The project focuses on building a practical pipeline for automated ID photo production using AI-based segmentation and image processing techniques.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 20
    ComfyUI Examples

    ComfyUI Examples

    Examples of ComfyUI workflows

    ComfyUI_examples is the companion repository for ComfyUI that collects ready-made example workflows, nodes, and compositions to help users learn the node-based interface for AI image generation. Instead of starting from an empty graph, you can open an example and see how prompts, samplers, models, and image processing steps are wired together. This makes ComfyUI more approachable for people coming from “one text box” generators, because they can reverse-engineer complex pipelines visually....
    Downloads: 3 This Week
    Last Update:
    See Project
  • 21
    AI App Lab

    AI App Lab

    Implementing large models into scenario-based applications

    AI App Lab is an open-source platform developed by Volcengine that provides tools, SDKs, and example applications for building real-world AI applications powered by large language models. The project focuses on helping developers bridge the gap between AI models and practical business use cases by offering a structured environment for creating production-ready AI systems. It includes a high-level SDK called Arkitect, which provides workflows and tools for integrating models, plugins, and...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 22
    Open Vision Agents by Stream

    Open Vision Agents by Stream

    Build Vision Agents quickly with any model or video provider

    Open Vision Agents by Stream is an open source framework from Stream for building real time, multimodal AI agents that watch, listen, and respond to live video streams. It focuses on combining video understanding models, such as YOLO and Roboflow based detectors, with real time large language models like OpenAI Realtime and Gemini Live to create interactive experiences. The framework uses Stream’s ultra low latency edge network so agents can join sessions quickly and maintain very low audio...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 23
    Anime4KCPP

    Anime4KCPP

    A high performance anime upscaler

    Anime4KCPP provides an optimized bloc97's Anime4K algorithm version 0.9, and it also provides its own CNN algorithm ACNet, it provides a variety of way to use, including preprocessing and real-time playback, it aims to be a high-performance tool to process both image and video. This project is for learning and the exploration task of the algorithm course in SWJTU. Anime4K is a simple high-quality anime upscale algorithm. Version 0.9 does not use any machine learning approaches and can be very fast in real-time processing or pretreatment. ACNet is a CNN-based anime upscale algorithm. It aims to provide both high-quality and high-performance. ...
    Downloads: 19 This Week
    Last Update:
    See Project
  • 24
    VCClient

    VCClient

    Software that uses AI to perform real-time voice conversion

    VCClient is a real-time voice conversion system that uses machine learning models to transform a speaker’s voice into another voice with minimal latency. It is designed for live applications such as streaming, gaming, and virtual communication, where immediate feedback is essential. The system supports multiple voice conversion models, including RVC and other neural network-based approaches, allowing users to switch between different voices or customize their output. It provides both a...
    Downloads: 20 This Week
    Last Update:
    See Project
  • 25
    React Native AI

    React Native AI

    Full stack framework for building cross-platform mobile AI apps

    React Native AI is a full-stack framework designed to simplify the development of AI-powered mobile applications using React Native. The project provides a ready-to-use infrastructure for building cross-platform apps that integrate large language models and other AI services. It supports real-time streaming responses from multiple AI providers and enables developers to build chat interfaces, AI-driven image generation tools, and natural language features within mobile apps. The framework...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • Next
MongoDB Logo MongoDB