Join/Login
Business Software
Open Source Software
For Vendors
Blog
About
More

For Vendors Help Create Join Login

Business Software

Open Source Software

SourceForge Podcast

Resources

Articles
Case Studies
Blog

Menu

Help
Create
Join
Login

Home
Open Source Software
Search Results

Search Results for "speech recognition machine learning"

x

Sort By:

Relevance

Clear All Filters

OS

Windows 81
Linux 79
Mac 78
More...
BSD 21
ChromeOS 20
Mobile Operating Systems 2

Category

Artificial Intelligence 82
Software Development 14
Multimedia 6
System 3
Business 2
Education 2
Scientific/Engineering 2
Text Editors 1

License

OSI-Approved Open Source 82
Creative Commons Attribution License 1
Other License 1

Programming Language

Python 87
C++ 3
C 2
C# 1
Common Lisp 1
More...
Java 1
JavaScript 1
Perl 1
Ruby 1
Unix Shell 1

Status

Beta 4
Alpha 1
Production/Stable 1

Showing 87 open source projects for "speech recognition machine learning"

View related business solutions

Python Clear Filters & Widen Search

Respond 100x faster, more accurately, and improve your documentation
Designed for forward-thinking security, sales, and compliance teams

Slash response times for questionnaires, audits, and RFPs by up to 90%. OptiValue.ai automates the heavy lifting, freeing your team to focus on strategic priorities with intuitive tools for seamless review and validation.

Learn More
Attack Surface Management | Criminal IP ASM
For security operations, threat-intelligence and risk teams wanting a tool to get access to auto-monitored assets exposed to attack surfaces

Criminal IP’s Attack Surface Management (ASM) is a threat-intelligence–driven platform that continuously discovers, inventories, and monitors every internet-connected asset associated with an organization, including shadow and forgotten resources, so teams see their true external footprint from an attacker’s perspective. The solution combines automated asset discovery with OSINT techniques, AI enrichment and advanced threat intelligence to surface exposed hosts, domains, cloud services, IoT endpoints and other Internet-facing vectors, capture evidence (screenshots and metadata), and correlate findings to known exploitability and attacker tradecraft. ASM prioritizes exposures by business context and risk, highlights vulnerable components and misconfigurations, and provides real-time alerts and dashboards to speed investigation and remediation.

Learn More
1

Hugging Face - Speech To Speech

Open speech-to-speech models and pipelines by Hugging Face toolkit AI

This project from Hugging Face focuses on enabling direct speech-to-speech processing using modern machine learning models. It provides tools and reference implementations that allow audio input to be transformed into audio output without requiring an intermediate text representation. Hugging Face - Speech To Speech builds on recent advances in speech modeling, combining components such as speech recognition, translation, and synthesis into unified pipelines. ...

Downloads: 3 This Week

Last Update: 2026-03-18
See Project
2

The SpeechBrain Toolkit

A PyTorch-based Speech Toolkit

...It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains. SpeechBrain supports state-of-the-art methods for end-to-end speech recognition, including models based on CTC, CTC+attention, transducers, transformers, and neural language models relying on recurrent neural networks and transformers. Speaker recognition is already deployed in a wide variety of realistic applications. SpeechBrain provides different models for speaker recognition, including X-vector, ECAPA-TDNN, PLDA, and contrastive learning. ...

Downloads: 4 This Week

Last Update: 2026-03-30
See Project
3

whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps

Multilingual Automatic Speech Recognition with word-level timestamps and confidence. Whisper is a set of multi-lingual, robust speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. Whisper models were trained to predict approximate timestamps on speech segments (most of the time with 1-second accuracy), but they cannot originally predict word timestamps. This repository proposes an implementation to predict word timestamps and provide a more...

Downloads: 8 This Week

Last Update: 2025-09-09
See Project
4

Diffgram

Training data (data labeling, annotation, workflow) for all data types

...Training Data is the art of supervising machines through data. This includes the activities of annotation, which produces structured data; ready to be consumed by a machine learning model. Annotation is required because raw media is considered to be unstructured and not usable without it. That’s why training data is required for many modern machine learning use cases including computer vision, natural language processing and speech recognition.

Downloads: 8 This Week

Last Update: 2024-10-14
See Project
Regpack: All-in-One Online Registration and Payment Software
For camps, courses, virtual classes, client billing, events, conferences, meetings, afterschool programs, educational travel, retreats

Regpack is a powerful onboarding, registration, and payments platform trusted by thousands of organizations worldwide. Our mission is simple: to give you the tools to automate busywork, streamline your processes, and keep your focus where it belongs, on growing your programs and serving your clients.

Learn More
5

Transformers

State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX

Hugging Face Transformers provides APIs and tools to easily download and train state-of-the-art pre-trained models. Using pre-trained models can reduce your compute costs, carbon footprint, and save you the time and resources required to train a model from scratch. These models support common tasks in different modalities. Text, for tasks like text classification, information extraction, question answering, summarization, translation, text generation, in over 100 languages. Images, for tasks...

Downloads: 19 This Week

Last Update: 4 days ago
See Project
6

NVIDIA NeMo

Toolkit for conversational AI

NVIDIA NeMo, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures. Conversational AI...

Downloads: 3 This Week

Last Update: 2026-03-23
See Project
7

Best-of Machine Learning with Python

A ranked list of awesome machine learning Python libraries

This curated list contains 900 awesome open-source projects with a total of 3.3M stars grouped into 34 categories. All projects are ranked by a project-quality score, which is calculated based on various metrics automatically collected from GitHub and different package managers. If you like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Contributions are very welcome! General-purpose machine learning and deep learning...

Downloads: 2 This Week

Last Update: 2025-10-30
See Project
8

pyAudioAnalysis

Python Audio Analysis Library: Feature Extraction, Classification

...It also includes utilities for visualizing audio features and analyzing patterns within sound recordings, which can be useful in applications such as speech recognition, music classification, and acoustic event detection. Because the library integrates machine learning algorithms with signal processing tools, it enables researchers to develop complete audio analysis pipelines using a single framework.

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
9

deepface

A Lightweight Face Recognition and Facial Attribute Analysis

DeepFace is a lightweight face recognition and facial attribute analysis (age, gender, emotion and race) framework for python. It is a hybrid face recognition framework wrapping state-of-the-art models: VGG-Face, FaceNet, OpenFace, DeepFace, DeepID, ArcFace, Dlib, SFace and GhostFaceNet. Experiments show that human beings have 97.53% accuracy on facial recognition tasks whereas those models already reached and passed that accuracy level.

Downloads: 37 This Week

Last Update: 2026-03-01
See Project
JS7 JobScheduler is an open source workload automation solution.
JS7 offers cross-platform job execution, managed file transfer, complex no-code job dependencies and a real REST API.

JS7 JobScheduler is an open source workload automation solution. It is used to run executable files, shell scripts etc. and database procedures.

Learn More
10

TorchAudio

Data manipulation and transformation for audio signal processing

The aim of torchaudio is to apply PyTorch to the audio domain. By supporting PyTorch, torchaudio follows the same philosophy of providing strong GPU acceleration, having a focus on trainable features through the autograd system, and having consistent style (tensor names and dimension names). Therefore, it is primarily a machine learning library and not a general signal processing library. The benefits of PyTorch can be seen in torchaudio through having all the computations be through PyTorch...

Downloads: 4 This Week

Last Update: 2026-02-17
See Project
11

SimpleHTR

Handwritten Text Recognition (HTR) system implemented with TensorFlow

SimpleHTR is an open-source implementation of a handwriting text recognition system based on deep learning techniques. The project focuses on converting images of handwritten text into machine-readable digital text using neural networks. The system uses a combination of convolutional neural networks and recurrent neural networks to extract visual features and model sequential character patterns in handwriting.

Downloads: 0 This Week

Last Update: 2026-03-12
See Project
12

spaCy models

Models for the spaCy Natural Language Processing (NLP) library

spaCy is designed to help you do real work, to build real products, or gather real insights. The library respects your time, and tries to avoid wasting it. It's easy to install, and its API is simple and productive. spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. If your application needs to process entire web dumps, spaCy is the library you want to be using. Since its release in 2015, spaCy has become an industry...

Downloads: 6 This Week

Last Update: 2026-03-18
See Project
13

SCAIL

Towards Studio-Grade Character Animation via In-Context Learning of 3D

...While specific documentation about SCAIL’s exact goals and implementation is limited from the repository context alone, the project appears to be part of a collection of machine learning and AI research tools that facilitate scalable model development, evaluation, or application workflows. Given its listing alongside other ZAI projects like speech recognition and text-to-speech systems, SCAIL likely emphasizes scalable, composable AI learning frameworks that support researchers and practitioners in experimenting with learning algorithms, datasets, and model components. ...

Downloads: 0 This Week

Last Update: 2026-01-30
See Project
14

pyVideoTrans

Translate the video from one language to another and embed dubbing

pyVideoTrans is an ambitious open-source multimedia processing project that assembles speech recognition, subtitle generation, AI translation, voice synthesis, and video assembly into a unified pipeline for converting videos from one language to another with embedded dubbing and captions. At its core it runs speech-to-text models to transcribe audio tracks, translates the resulting text into a target language using local or cloud-based translation engines, synthesizes new speech to match the...

Downloads: 21 This Week

Last Update: 3 days ago
See Project
15

HanLP

Han Language Processing

HanLP is a multilingual Natural Language Processing (NLP) library composed of a series of models and algorithms. Built on TensorFlow 2.0, it was designed to advance state-of-the-art deep learning techniques and popularize the application of natural language processing in both academia and industry. HanLP is capable of lexical analysis (Chinese word segmentation, part-of-speech tagging, named entity recognition), syntax analysis, text classification, and sentiment analysis. It comes with pretrained models for numerous languages including Chinese and English. ...

Downloads: 8 This Week

Last Update: 2025-03-07
See Project
16

face.evoLVe

High-Performance Face Recognition Library on PaddlePaddle & PyTorch

face.evoLVe is a high-performance face recognition library designed for research and real-world applications in computer vision. The project provides a comprehensive framework for building and training modern face recognition models using deep learning architectures. It includes components for face alignment, landmark localization, data preprocessing, and model training pipelines that allow developers to construct end-to-end facial recognition systems. The repository supports multiple neural...

Downloads: 0 This Week

Last Update: 2026-03-11
See Project
17

Video-subtitle-extractor

A GUI tool for extracting hard-coded subtitle (hardsub) from videos

Video hard subtitle extraction, generate srt file. There is no need to apply for a third-party API, and text recognition can be implemented locally. A deep learning-based video subtitle extraction framework, including subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsub) from videos and generating srt files. Use local OCR recognition, no need to set up and call any API, and do not need to access online OCR services such as Baidu...

1 Review

Downloads: 66 This Week

Last Update: 2026-04-05
See Project
18

DocTR

Library for OCR-related tasks powered by Deep Learning

DocTR provides an easy and powerful way to extract valuable information from your documents. Seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents. Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters. User-friendly, 3 lines of code to load a document and extract text with a predictor. State-of-the-art performances on public document...

Downloads: 8 This Week

Last Update: 2026-02-04
See Project
19

Advanced NLP with spaCy

Advanced NLP with spaCy: A free online course

Advanced NLP with spaCy is an open-source educational repository that provides the materials for an interactive course on advanced natural language processing using the spaCy library. The course is designed to teach developers how to build real-world NLP systems by combining rule-based techniques with machine learning models. The repository includes lessons, exercises, and examples that guide learners through tasks such as tokenization, named entity recognition, text classification, and training custom NLP models. It also demonstrates how spaCy pipelines work and how developers can extend them with custom components and training data. ...

Downloads: 0 This Week

Last Update: 2026-03-12
See Project
20

flair

A very simple framework for state-of-the-art NLP

A very simple framework for state-of-the-art NLP. Developed by Humboldt University of Berlin and friends. A powerful NLP library. Flair allows you to apply our state-of-the-art natural language processing (NLP) models to your text, such as named entity recognition (NER), sentiment analysis, part-of-speech tagging (PoS), special support for biomedical texts, sense disambiguation and classification, with support for a rapidly growing number of languages. A text embedding library. Flair has...

Downloads: 0 This Week

Last Update: 2025-02-05
See Project
21

ESPnet

End-to-end speech processing toolkit

ESPnet is a comprehensive end-to-end speech processing toolkit covering a wide spectrum of tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speech translation (ST), speech enhancement, speaker diarization, and spoken language understanding. It uses PyTorch as its deep learning engine and adopts a Kaldi-style data processing pipeline for features, data formats, and experimental recipes.

Downloads: 2 This Week

Last Update: 2026-04-07
See Project
22

Python Client For NLP Cloud

NLP Cloud serves high performance pre-trained or custom models for NER

NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, dialogue summarization, paraphrasing, intent classification, product description and ad generation, chatbot, grammar and spelling correction, keywords and keyphrases extraction, text generation, image generation, blog post generation, source code generation, question answering, automatic speech recognition, machine translation, language detection, semantic search, semantic similarity, tokenization, POS tagging, embeddings, and dependency parsing. It is ready for production, served through a REST API. You can either use the NLP Cloud pre-trained models, fine-tune your own models, or deploy your own models.

Downloads: 0 This Week

Last Update: 2024-11-27
See Project
23

Jittor

Jittor is a high-performance deep learning framework

Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators. The whole framework and meta-operators are compiled just in time. A powerful op compiler and tuner are integrated into Jittor. It allowed us to generate high-performance code specialized for your model. Jittor also contains a wealth of high-performance model libraries, including image recognition, detection, segmentation, generation, differentiable rendering, geometric learning, reinforcement...

Downloads: 2 This Week

Last Update: 2025-07-28
See Project
24

spaCy

Industrial-strength Natural Language Processing (NLP)

spaCy is a library built on the very latest research for advanced Natural Language Processing (NLP) in Python and Cython. Since its inception it was designed to be used for real world applications-- for building real products and gathering real insights. It comes with pretrained statistical models and word vectors, convolutional neural network models, easy deep learning integration and so much more. spaCy is the fastest syntactic parser in the world according to independent benchmarks, with...

Downloads: 76 This Week

Last Update: 2026-03-29
See Project
25

Open Model Zoo

Pre-trained Deep Learning models and demos

Open Model Zoo is a large repository of high-quality pre-trained deep learning models and demonstration applications designed to work with the OpenVINO™ toolkit, offering a comprehensive starting point for a wide range of AI and computer vision workloads. It includes hundreds of models covering object detection, classification, segmentation, pose estimation, speech recognition, text-to-speech, and more, many of which are already converted into formats optimized for inference on CPUs, GPUs, VPUs, and other accelerators supported by OpenVINO. ...

Downloads: 0 This Week

Last Update: 2026-01-10
See Project

Previous
You're on page 1
2
3
4
Next

Related Searches

video to srt

video text ocr

video subtitle extractor

deepface

subtitle

conversational ai

roof

transformers

nvidia

artificial intelligence projects

Related Categories

Artificial Intelligence

Software Development

Multimedia

System

Business

SourceForge

Create a Project
Open Source Software
Business Software
Top Downloaded Projects

Company

About
Team
SourceForge Headquarters
1320 Columbia Street Suite 310
San Diego, CA 92101
+1 (858) 422-6466

Resources

Support
Site Documentation
Site Status
SourceForge Reviews

© 2026 Slashdot Media. All Rights Reserved.

Terms Privacy Privacy Choices Advertise