Search Results for "speech & image processing based project in matlab"

Showing 97 open source projects for "speech & image processing based project in matlab"

1

Image-Editor

AI-based photo editing website for changing image backgrounds

Welcome to Image-Editor, the AI-based photo editing website that lets you change backgrounds, colors, crop, sharpen images, and much more with just a single click. With exceptional image quality and fast processing times, Image-Editor is the ultimate tool for all your photo editing needs. To get started, simply run pip install -r requirements.txt to download all the necessary libraries.

Downloads: 4 This Week

Last Update: 2024-06-06
See Project
2

pyVideoTrans

Translate the video from one language to another and embed dubbing

pyVideoTrans is an ambitious open-source multimedia processing project that assembles speech recognition, subtitle generation, AI translation, voice synthesis, and video assembly into a unified pipeline for converting videos from one language to another with embedded dubbing and captions. At its core it runs speech-to-text models to transcribe audio tracks, translates the resulting text into a target language using local or cloud-based translation engines, synthesizes new speech to match the translated subtitles, and then merges that speech back into the video, creating a fully localized media file. ...

Downloads: 30 This Week

Last Update: 7 hours ago
See Project
3

GLM

OpenGL Mathematics (GLM)

OpenGL Mathematics (GLM) is a header-only C++ mathematics library for graphics software based on the OpenGL Shading Language (GLSL) specifications. GLM provides classes and functions designed and implemented with the same naming conventions and functionality as GLSL, so that anyone who knows GLSL can use GLM in C++ as well. This project isn't limited to GLSL features. An extension system, based on the GLSL extension conventions, provides extended capabilities: matrix transformations, quaternions, data packing, random numbers, noise, etc. ...

Downloads: 64 This Week

Last Update: 2025-12-31
See Project
4

Point Cloud Library

A standalone, large-scale, open project for 2D/3D image processing

The Point Cloud Library (PCL) is a standalone, large-scale, open project for 2D/3D image and point cloud processing. PCL is released under the terms of the BSD license and is thus free for commercial and research use. Whether you’ve just discovered PCL or you’re a long-time veteran, this page contains links to a set of resources that will help consolidate your knowledge of PCL and 3D processing. An additional Wiki resource for developers is available too. To simplify both usage and...

Downloads: 18 This Week

Last Update: 2025-08-27
See Project
5

AI Runner

Offline inference engine for art, real-time voice conversations

AI Runner is an offline inference engine designed to run a collection of AI workloads on your own machine, including image generation for art, real-time voice conversations, LLM-powered chatbots and automated workflows. It is implemented as a desktop-oriented Python application and emphasizes privacy and self-hosting, allowing users to work with text-to-speech, speech-to-text, text-to-image and multimodal models without sending data to external services. ...

Downloads: 13 This Week

Last Update: 2025-12-11
See Project
6

Readest

Readest is a modern, feature-rich ebook reader

Readest is a project meant to facilitate reading, studying, or consuming content by integrating reading tools with AI-powered assistance. Although the repository is not as widely documented or popular as some, the idea is that Readest supports features to help with reading comprehension — likely combining OCR / text retrieval, translation, note-taking, or summarization for reading materials (eBooks, articles, PDFs). The goal appears to be to let users feed in arbitrary reading material and...

Downloads: 41 This Week

Last Update: 1 day ago
See Project
7

FaceFusion

Industry leading face manipulation platform

FaceFusion is an open-source face swapping and facial enhancement toolkit designed for high-quality video and image manipulation workflows. The project enables users to replace faces in images or videos while maintaining temporal consistency and visual realism. It integrates modern deep learning models for face detection, alignment, and blending to produce smoother results than traditional approaches. FaceFusion is built with a modular pipeline that allows users to customize processing steps...

Downloads: 252 This Week

Last Update: 2026-03-16
See Project
8

WhisperJAV

Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD

WhisperJAV is an open-source speech transcription pipeline designed specifically for generating subtitles for Japanese adult video content. The project addresses challenges that standard speech recognition models face when transcribing this type of audio, which often includes low signal-to-noise ratios and large numbers of non-verbal vocalizations. Traditional automatic speech recognition systems can misinterpret these sounds as words, leading to inaccurate transcripts. WhisperJAV introduces...

Downloads: 22 This Week

Last Update: 5 days ago
See Project
9

Orpheus TTS

Towards Human-Sounding Speech

Orpheus TTS is a state-of-the-art open-source text-to-speech system built on a Llama-3B backbone, treating speech synthesis as a large language model problem instead of a traditional TTS pipeline. It is designed to produce human-like speech with natural intonation, emotion, and rhythm, targeting quality comparable to or better than many closed-source systems. The project ships both pretrained and finetuned English models, as well as a family of multilingual models released as a research...

Downloads: 1 This Week

Last Update: 2025-12-05
See Project
10

SD.Next

All-in-one WebUI for AI generative image and video creation

SD.Next is an all-in-one web user interface for generative image creation that expands beyond basic Stable Diffusion workflows to cover broader image and video generation, captioning, and processing tasks. It is designed as a power-user environment where model management, generation features, and workflow controls are centralized in a single UI rather than spread across separate scripts and utilities.

Downloads: 10 This Week

Last Update: 2026-04-02
See Project
11

comfyui-mixlab-nodes

Workflow and speech recognition app

comfyui-mixlab-nodes is a large collection of custom nodes for ComfyUI that turns workflows into interactive apps and adds real-time multimedia, LLM, and TTS capabilities. It introduces a “Workflow-to-APP” concept, where a ComfyUI graph can be transformed into a Web App through an AppInfo node, complete with categories, batch prompts, and editable configurations. The project also brings Real-time Design features like screen capture and floating video nodes, enabling creative pipelines that...

Downloads: 11 This Week

Last Update: 2025-11-28
See Project
12

OpenAI-Compatible Edge-TTS API

Free, high-quality text-to-speech API endpoint to replace OpenAI

OpenAI-Compatible Edge-TTS API is a local, OpenAI-compatible text-to-speech API that uses edge-tts—Microsoft Edge’s online TTS service—as the backend. The project emulates the /v1/audio/speech endpoint used by OpenAI, so any client that can talk to the OpenAI TTS API can be redirected to this service with minimal changes. It exposes parameters for input text, voice selection, audio format, and playback speed, mirroring the OpenAI interface while mapping popular OpenAI voice names to...

Downloads: 1 This Week

Last Update: 2025-11-28
See Project
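
For the OpenAI-Compatible Edge-TTS API entry above, a client request might be assembled roughly as in the following minimal sketch. Only the /v1/audio/speech path and the four kinds of parameter (input text, voice, audio format, speed) come from the description. The base URL and port, the model name, the voice name, and the API key are assumptions for illustration, and the field names follow OpenAI's own speech endpoint. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical local address; the project's real host and port may differ.
BASE_URL = "http://localhost:5050"


def build_speech_request(text, voice="alloy", audio_format="mp3",
                         speed=1.0, api_key="placeholder-key"):
    """Build (but do not send) a POST request for the /v1/audio/speech endpoint."""
    payload = {
        "model": "tts-1",  # assumed model name, following OpenAI's convention
        "input": text,
        "voice": voice,
        "response_format": audio_format,
        "speed": speed,
    }
    return urllib.request.Request(
        url=BASE_URL + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


request = build_speech_request("Hello from a local TTS endpoint.")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it. If the server behaves as described, the response body would hold the audio in the requested format.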
13

Streamer-Sales

Large language model for e-commerce live-stream sales hosts

Streamer-Sales is an open-source large language model system designed specifically for e-commerce live streaming and automated product promotion. The project focuses on generating persuasive product descriptions and live presentation scripts that mimic the style of professional online sales hosts. By analyzing product characteristics and marketing information, the model can produce engaging explanations that emphasize benefits, features, and emotional appeal to encourage viewers to make...

Downloads: 7 This Week

Last Update: 2026-03-05
See Project
14

Luna AI

Virtual AI anchor that combines state-of-the-art technology

Luna AI is a virtual AI streamer framework designed to power an interactive VTuber that can go live on major platforms and chat with viewers in real time. It is built around a core assistant persona called “Luna AI,” which can be driven by a wide range of large language models and platforms, including GPT-style APIs, Claude, LangChain-based backends, ChatGLM, Kimi, Ollama, and many others. The project supports multiple rendering backends for the avatar, such as Live2D, Unreal Engine (UE),...

Downloads: 18 This Week

Last Update: 2025-11-28
See Project
15

pyAudioAnalysis

Python Audio Analysis Library: Feature Extraction, Classification

pyAudioAnalysis is an open-source Python library designed for audio signal analysis, machine learning, and music information retrieval tasks. The project provides a collection of tools that allow developers to extract meaningful features from audio files and use those features for classification, segmentation, and analysis. The library supports multiple audio processing workflows, including feature extraction from raw audio signals, training of machine learning models, and automatic audio...

Downloads: 3 This Week

Last Update: 2026-03-10
See Project
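
To make "feature extraction from raw audio signals" in the pyAudioAnalysis entry above more concrete, here is a minimal standard-library sketch of two classic short-term features, zero-crossing rate and energy, computed over overlapping frames. It does not use pyAudioAnalysis or reproduce its API. The function names, frame size, and hop size are illustrative assumptions.

```python
import math


def frame_signal(samples, frame_size, hop_size):
    """Split a sequence of samples into overlapping fixed-size frames."""
    return [
        samples[start:start + frame_size]
        for start in range(0, len(samples) - frame_size + 1, hop_size)
    ]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def energy(frame):
    """Mean of the squared sample values in the frame."""
    return sum(x * x for x in frame) / len(frame)


def extract_features(samples, frame_size=400, hop_size=200):
    """Return one (zero_crossing_rate, energy) pair per frame."""
    return [
        (zero_crossing_rate(f), energy(f))
        for f in frame_signal(samples, frame_size, hop_size)
    ]


# Synthetic input: one second of a 440 Hz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
features = extract_features(tone)
print(len(features), features[0])
```

Per-frame vectors like these are the kind of input a classifier or segmenter could then be trained on.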
16

Dict UK

Project to generate POS tag dictionary for Ukrainian language

A Java-based tool for generating full morphological dictionaries for Ukrainian, applying affix rules to base lexemes to produce all inflected forms with part-of-speech tags—used for natural language processing and spell-checking.

Downloads: 7 This Week

Last Update: 2026-02-28
See Project
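
The affix-rule expansion described in the Dict UK entry above can be sketched as follows. This is a toy illustration in Python, not the project's Java implementation. The rule format, flags, and tags are hypothetical, and English words stand in for Ukrainian lexemes to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffixRule:
    strip: str  # ending removed from the base form ("" removes nothing)
    add: str    # ending appended after stripping
    tag: str    # part-of-speech tag attached to the generated form


# Hypothetical rule groups keyed by a flag; real dictionaries define their own.
RULES = {
    "N": [AffixRule("", "", "noun:sg"), AffixRule("", "s", "noun:pl")],
    "V": [
        AffixRule("", "", "verb:inf"),
        AffixRule("e", "ing", "verb:ger"),
        AffixRule("e", "ed", "verb:past"),
    ],
}


def expand(lemma, flag):
    """Apply every rule in the flag's group to the lemma, returning tagged forms."""
    forms = []
    for rule in RULES[flag]:
        if not lemma.endswith(rule.strip):
            continue  # rule does not apply to this lemma
        stem = lemma[: len(lemma) - len(rule.strip)]
        forms.append((stem + rule.add, lemma, rule.tag))
    return forms


for form, lemma, tag in expand("bake", "V"):
    print(form, lemma, tag)
```

Each output row pairs an inflected form with its lemma and tag, which is the kind of record a tagger or spell-checker would look up.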
17

reverse-SynthID

Reverse engineering Gemini's SynthID detection

Reverse-SynthID is a research-focused project that analyzes and reverse-engineers Google’s SynthID watermarking system used in AI-generated images. It leverages signal processing and spectral analysis techniques to identify hidden watermark patterns without access to proprietary encoding methods. The project introduces a multi-resolution “SpectralCodebook” that maps watermark characteristics across different image sizes. Using this approach, it can detect SynthID watermarks with high...

Downloads: 7 This Week

Last Update: 4 days ago
See Project
18

OpenAI

Community-driven Swift package for the OpenAI public API

MacPaw OpenAI is a community-driven Swift SDK that provides developers with a structured and type-safe way to interact with the OpenAI API and compatible providers within Apple ecosystem applications. It simplifies the integration of AI capabilities into iOS, macOS, and other Swift-based applications by offering a clean abstraction over the underlying REST API, enabling developers to focus on functionality rather than low-level implementation details. The SDK supports a wide range of...

Downloads: 2 This Week

Last Update: 2026-03-30
See Project
19

HivisionIDPhoto

HivisionIDPhotos: a lightweight and efficient AI ID photo tool

...The software analyzes portrait images, performs background removal, aligns the face according to ID photo standards, and produces images in various official size formats. It also allows the generation of layout sheets such as six-inch photo arrangements for printing multiple ID photos on a single page. The project focuses on building a practical pipeline for automated ID photo production using AI-based segmentation and image processing techniques.

Downloads: 7 This Week

Last Update: 2026-03-10
See Project
20

ComfyUI Examples

Examples of ComfyUI workflows

ComfyUI_examples is the companion repository for ComfyUI that collects ready-made example workflows, nodes, and compositions to help users learn the node-based interface for AI image generation. Instead of starting from an empty graph, you can open an example and see how prompts, samplers, models, and image processing steps are wired together. This makes ComfyUI more approachable for people coming from “one text box” generators, because they can reverse-engineer complex pipelines visually....

Downloads: 3 This Week

Last Update: 2025-11-26
See Project
21

AI App Lab

Implementing large models into scenario-based applications

AI App Lab is an open-source platform developed by Volcengine that provides tools, SDKs, and example applications for building real-world AI applications powered by large language models. The project focuses on helping developers bridge the gap between AI models and practical business use cases by offering a structured environment for creating production-ready AI systems. It includes a high-level SDK called Arkitect, which provides workflows and tools for integrating models, plugins, and...

Downloads: 7 This Week

Last Update: 2026-03-17
See Project
22

Open Vision Agents by Stream

Build Vision Agents quickly with any model or video provider

Open Vision Agents by Stream is an open source framework from Stream for building real-time, multimodal AI agents that watch, listen, and respond to live video streams. It focuses on combining video understanding models, such as YOLO and Roboflow-based detectors, with real-time large language models like OpenAI Realtime and Gemini Live to create interactive experiences. The framework uses Stream’s ultra-low-latency edge network so agents can join sessions quickly and maintain very low audio...

Downloads: 9 This Week

Last Update: 18 hours ago
See Project
23

Anime4KCPP

A high performance anime upscaler

Anime4KCPP provides an optimized implementation of bloc97's Anime4K algorithm version 0.9, along with its own CNN algorithm, ACNet. It offers a variety of ways to use them, including preprocessing and real-time playback, and aims to be a high-performance tool for processing both images and video. This project is for learning and for the exploration task of the algorithm course at SWJTU. Anime4K is a simple, high-quality anime upscaling algorithm. Version 0.9 does not use any machine learning approaches and can be very fast in real-time processing or preprocessing. ACNet is a CNN-based anime upscaling algorithm. It aims to provide both high quality and high performance. ...

Downloads: 19 This Week

Last Update: 2025-08-01
See Project
24

VCClient

Software that uses AI to perform real-time voice conversion

VCClient is a real-time voice conversion system that uses machine learning models to transform a speaker’s voice into another voice with minimal latency. It is designed for live applications such as streaming, gaming, and virtual communication, where immediate feedback is essential. The system supports multiple voice conversion models, including RVC and other neural network-based approaches, allowing users to switch between different voices or customize their output. It provides both a...

Downloads: 20 This Week

Last Update: 2026-03-23
See Project
25

React Native AI

Full stack framework for building cross-platform mobile AI apps

React Native AI is a full-stack framework designed to simplify the development of AI-powered mobile applications using React Native. The project provides a ready-to-use infrastructure for building cross-platform apps that integrate large language models and other AI services. It supports real-time streaming responses from multiple AI providers and enables developers to build chat interfaces, AI-driven image generation tools, and natural language features within mobile apps. The framework...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project

© 2026 Slashdot Media. All Rights Reserved.