Results for: audio generation

Suggested Categories:

AI Audio Generators

AI audio generators are tools that create speech, music, and sound effects using artificial intelligence. They use deep learning models, such as neural text-to-speech (TTS) and generative networks, to produce high-quality and realistic audio. These generators create audio and sound effects that can be used in movies, videos, video games, voiceovers, audiobooks, virtual assistants, and music production. Some can replicate human voices with natural tone, emotion, and accents, while others generate immersive sound effects for films and interactive media. As AI technology evolves, these tools continue to improve in realism, customization, and creative potential across various industries.

Audio Recording Software

Audio recording software allows users to capture, edit, and manipulate audio files on various devices, ranging from simple voice recordings to complex sound production. These platforms typically provide tools for recording sound from microphones, instruments, or other audio sources, along with features for editing, mixing, and enhancing audio quality. Audio recording software often supports multi-track editing, noise reduction, and effects processing, making it ideal for creating professional audio content. By using this software, users can produce high-quality recordings for podcasts, music, voiceovers, and other audio projects.

Audio Editing Software

Audio editing software is a tool that allows users to modify, enhance, and manipulate audio recordings for various purposes, such as music production, podcasting, and sound design. It provides a range of features, including trimming, cutting, merging, and applying effects to audio files, giving users precise control over their sound. Many audio editors also offer advanced tools like noise reduction, pitch correction, and equalization to refine audio quality. With a user-friendly interface and compatibility with multiple audio formats, audio editing software caters to beginners and professionals alike. It’s an essential tool for anyone working with sound, enabling them to craft polished and professional audio.

51 Products for "audio generation" with 1 filter applied:

Filter: Linux

1

LALAL.AI

LALAL.AI

LALAL.AI is a next-generation audio separation service powered by advanced AI technology. With a suite of tools (Stem Splitter, Voice Cleaner, Voice Changer, and Voice Cloner), LALAL.AI enables users to take their audio content to the next level. Stem Splitter, the core service of LALAL.AI, allows users to extract individual vocals or instruments from audio tracks.

4,912 Ratings

Starting Price: $20 one-time payment

2

HunyuanVideo-Avatar

Tencent-Hunyuan

HunyuanVideo‑Avatar animates any input avatar image into high‑dynamic, emotion‑controllable video using simple audio conditions. It is a multimodal diffusion transformer (MM‑DiT)‑based model capable of generating dynamic, emotion‑controllable, multi‑character dialogue videos. It accepts multi‑style avatar inputs (photorealistic, cartoon, 3D‑rendered, and anthropomorphic) at arbitrary scales from portrait to full body. It provides a character image injection module that ensures strong character consistency while enabling dynamic motion; an Audio Emotion Module (AEM) that extracts emotional cues from a reference image to enable fine‑grained emotion control over generated video; and a Face‑Aware Audio Adapter (FAA) that isolates audio influence to specific face regions via latent‑level masking, supporting independent audio‑driven animation in multi‑character scenarios.

Starting Price: Free

3

iTranscribe

iTranscribe

iTranscribe is an AI-powered web transcription tool that converts audio, video, and links into accurate text with summaries and translations. Upload files or record live and get searchable transcripts in minutes, no software installation required.

Key Features:
- Smart Transcription: Upload audio/video files and get AI-generated text with 95%+ accuracy. Process hours of content in minutes.
- AI Summaries & Translations: Automatically generate concise summaries and translate transcripts into multiple languages, all in one place. ...

1 Rating

Starting Price: $5.99/week & $99/year

4

Pyramix

Merging

Pyramix is a digital audio workstation used by professional studios and engineers the world over for music production, mastering, and TV and film post-production. In combination with its networked audio interfaces, Pyramix offers the only end-to-end solution on the market for producing music digitally in an audio format that sounds like analog to humans. We enable music producers to work without compromise throughout the whole production chain, and give them all the tools needed for broad outreach, including the latest "Next Generation Audio" streaming formats such as Dolby Atmos®. ...

1 Rating

5

MediaHuman Audio Converter

MediaHuman

MediaHuman Audio Converter is a free application for macOS and Windows. With it, you can convert music to formats like MP3, AAC, WMA, and OGG, as well as to lossless formats like FLAC, Apple Lossless, AIFF, and WAV (up to 32-bit). A simple and intuitive interface converts between all key audio formats. It splits lossless tracks by CUE sheet and exports to iTunes/Music.app.

Starting Price: Free

6

HunyuanCustom

Tencent

...To enable audio- and video-conditioned generation, it further proposes modality-specific condition injection mechanisms, an AudioNet module that achieves hierarchical alignment via spatial cross-attention, and a video-driven injection module that integrates latent-compressed conditional video through a patchify-based feature-alignment network. Extensive experiments on single- and multi-subject scenarios demonstrate that HunyuanCustom significantly outperforms state-of-the-art open and closed source methods in terms of ID consistency, realism, and text-video alignment.

7

Sonic Visualiser

Sonic Visualiser

Sonic Visualiser is a free, open source application for Windows, Linux, and Mac, designed to be the first program you reach for when you want to study a music recording closely. It's designed for musicologists, archivists, signal-processing researchers, and anyone else looking for a friendly way to look at what lies inside the audio file. Sonic Visualiser itself is the most general program for highly configurable detailed visualization, analysis, and annotation of audio recordings. It supports rapid visualization of multiple audio files containing versions of the same source material, such as performances from the same score, or different takes of an instrumental part. ...

Starting Price: Free

8

Loudly

Loudly

Drawing on a massive library of curated audio loops, Loudly's advanced playback engine combines, warps, and follows chord progressions in real time. Loudly's unique blend of expert systems and generative adversarial networks ensures musically meaningful compositions. Collaboration between Loudly's music team and ML experts fuels their success. An easy-to-use tool that creates AI-generated songs in a matter of seconds.

1 Rating

Starting Price: $9.99 per month

9

AnswerBank

AnswerBank

...Not only can you offer customer-facing chat, you can publish an FAQ, a newsletter, or even a podcast, all from your private documents and without sacrificing their security. Domain-level access control. Public-facing bot pages. AI-generated audio. Embeddable routes. Zero exposure of source files. AnswerBank.

Starting Price: $29/month/tenant

10

OmniHuman-1

ByteDance

OmniHuman-1 is a cutting-edge AI framework developed by ByteDance that generates realistic human videos from a single image and motion signals, such as audio or video. The platform utilizes multimodal motion conditioning to create lifelike avatars with accurate gestures, lip-syncing, and expressions that align with speech or music. OmniHuman-1 can work with a range of inputs, including portraits, half-body, and full-body images, and is capable of producing high-quality video content even from weak signals like audio-only input. ...

11

AI Document Suite

AI Doc Suite

Aidocsuite.com's AI Document Suite is a next-generation, AI-powered workspace for creating, editing, and converting documents, slides, spreadsheets, images, and audio-video content, all in one browser-based suite. Powered by Free Document Maker (fdmGTP Engine), it delivers intelligent document generation, smart PDF editing, AI writing assistance, and instant file conversion with zero sign-up or watermark.

1 Rating

Starting Price: $0

12

Sonantic

Sonantic

Reduce production timelines from months to minutes by rapidly transforming scripts into audio. Use the desktop app to create a stellar voice without any code. Or try the developer page to explore our API and CLI tools. Create highly expressive, nuanced performances by incorporating rich emotions into your narrative. Dial-in the precise level of intensity. Sit in the director’s chair. Shape scenes with full control over voice performance parameters.

13

Ekiga

Ekiga

...The GNU/Linux desktop was in its infancy, to say nothing of its multimedia capabilities. Most webcam drivers were buggy, ALSA had not been released yet, and full-duplex audio was difficult to achieve. General performance could also be an issue, especially when the most efficient codecs were closed source. Generally speaking, the technology was not ready yet, but Ekiga was already kicking!

14

ACE

u-he

...Once you start connecting the sixteen modules in ACE, exploring new combinations, and cross-pollinating ideas, the vast potential of modular soon becomes clear. The two VCOs act as the main sound generators, but as ACE does not differentiate between audio and control (modulation) signals, the full-range LFOs can also be used to generate audio frequencies. ACE’s oscillators are modeled on analog circuits, including instabilities and various non-linear characteristics.
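The idea that one oscillator can act as either a modulation source or a sound source, depending only on its rate, can be illustrated outside ACE. The following Python sketch is not ACE's code or API: `sine_osc`, `count_cycles`, the 44.1 kHz sample rate, and the 2 Hz and 440 Hz rates are all assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 44_100  # samples per second (assumed for this sketch)

def sine_osc(freq_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Return n_samples of a sine wave at freq_hz, values in [-1.0, 1.0]."""
    step = 2.0 * math.pi * freq_hz / sample_rate
    return [math.sin(step * i) for i in range(n_samples)]

def count_cycles(signal):
    """Count upward zero crossings as a rough measure of oscillation rate."""
    return sum(1 for a, b in zip(signal, signal[1:]) if a < 0.0 <= b)

one_second = SAMPLE_RATE
lfo = sine_osc(2.0, one_second)     # 2 Hz: slow enough to act as a control signal
tone = sine_osc(440.0, one_second)  # 440 Hz: the same function at an audible rate

# Use the slow signal as modulation: scale the tone's amplitude (tremolo).
tremolo = [t * (0.5 + 0.5 * m) for t, m in zip(tone, lfo)]
```

Both signals come from the same function, and only the frequency argument decides whether the result is heard as a pitch or used to shape another signal.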

Starting Price: €69 one-time payment

15

Hugging Face Transformers

Hugging Face

Transformers is a library of pretrained natural language processing, computer vision, audio, and multimodal models for inference and training. Use Transformers to train models on your data, build inference applications, and generate text with large language models. Explore the Hugging Face Hub today to find a model and use Transformers to help you get started right away. Simple and optimized inference class for many machine learning tasks like text generation, image segmentation, automatic speech recognition, document question answering, and more. ...

Starting Price: $9 per month

16

Rage

Enlightenment

Rage is a video and audio player written using the Enlightenment Foundation Libraries (EFL) with some interesting features. Rage is a simple video and audio player intended to be slick yet simplistic, much like Mplayer. Use the command line to play media files or just drag and drop them onto the Rage window to add them to a playlist. Run Rage with no command-line arguments to enter video browser mode.

Starting Price: Free

17

gTTS

gTTS

gTTS (Google Text-to-Speech) is a Python library and CLI tool to interface with Google Translate's text-to-speech API. Write spoken mp3 data to a file, a file-like object (bytestring) for further audio manipulation, or stdout. Or simply pre-generate Google Translate TTS request URLs to feed to an external program. A customizable speech-specific sentence tokenizer allows for unlimited lengths of text to be read, all while keeping proper intonation, abbreviations, decimals and more. Customizable text pre-processors can, for example, provide pronunciation corrections.
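The general idea behind a sentence tokenizer for long text can be sketched in plain Python. This is not gTTS's tokenizer or API: the helper `split_for_tts` and its `max_chars` parameter are hypothetical, and the sketch only shows one simple way to cut text at sentence ends so that each piece stays under a size limit.

```python
import re

def split_for_tts(text, max_chars=100):
    """Split text into one chunk per sentence, breaking long sentences at spaces.

    Hypothetical helper for illustration; not part of gTTS.
    A single word longer than max_chars is kept whole.
    """
    # Break after '.', '!' or '?' when followed by whitespace, so a decimal
    # such as 3.14 is not split.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    for sentence in sentences:
        current = ""
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if len(candidate) <= max_chars or not current:
                current = candidate
            else:
                # Adding this word would exceed the limit: close the chunk.
                chunks.append(current)
                current = word
        if current:
            chunks.append(current)
    return chunks

chunks = split_for_tts("Pi is about 3.14. Is that close enough? Yes!")
```

A rule this simple would also split after abbreviations such as "Dr.", which is the kind of case a speech-specific tokenizer has to handle more carefully.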

Starting Price: Free

18

Wavel

Wavel.ai

Wavel AI is a powerful AI-driven platform designed to revolutionize video and audio content creation. It offers a complete set of intelligent tools including AI Dubbing, AI Video Translator, and Auto Subtitle Generation to make multilingual content accessible and engaging. The platform also features AI Text-to-Video generation, AI Avatars for dynamic presentations, and AI Video to Shorts for creating attention-grabbing short clips.

11 Ratings

Starting Price: $0

19

Vital

Vital Audio

...Turn your own samples into wavetables by using Vital's pitch-splice or vocode wavetable converter. Create wavetables from scratch using the built-in wavetable editor and even generate wavetables from text. Vital is a visual synthesizer. See what's happening behind the scenes with animated controls, filter responses, waveforms, smooth LFOs, oscilloscopes, spectrograms, and more. All animations run at 60 frames per second and are GPU optimized which leaves your CPU to do its real job, the audio processing. Modulate Vital's controls with a fast, drag-and-drop workflow. ...

Starting Price: $5 per month

20

AIVA

AIVA

The artificial intelligence that composes emotional soundtrack music. Whether you are an independent game developer, a complete novice in music, or a seasoned professional composer, AIVA assists you in your creative process. Create compelling themes for your projects faster than ever before, by leveraging the power of AI-generated music. Use our preset algorithms to compose music in pre-defined styles. If you need to create an original score that has a similar emotional impact as another...

Starting Price: €11 per month

21

Speechelo

Speechelo

Just paste the text you want to be transformed into our online text-to-voice tool. Our A.I. text-to-audio converter engine will check your text and will add all the punctuation marks needed to make the speech sound natural. We offer over 30 voices for you to choose from. You can preview each voice to hear and find the one that best fits your needs. Also, you can add breathing sounds, long pauses in the speech, and even choose the tone of the speech.

Starting Price: $47 one-time payment

22

Layer

Layer

Layer is an AI platform empowering game studios and entertainment brands with tools to generate images, video, 3D, and audio using 149+ models for marketing and monetization creatives. It offers custom models trained on top visual styles used in games, automated workflows, and enterprise-grade security that meets the demands of the world's top IP owners.

Starting Price: Free

23

KOLSprite

KOLSprite

...Unlock TikTok Success:
● Global Reach: 5+ languages (EN, CN, JP, VN, ID & more).
● Watermark-Free Downloads: Bulk TikTok video downloads.
● AI-Powered Scripting: Summarize existing scripts & generate fresh AI content.
● Unlimited Audio Extraction: 1-click MP3/MP4 audio extraction.
● Instant Script Analysis & Translation: Easily understand & adapt scripts.
● Real-Time Creator Analytics: Track location, followers, engagement, category.
● Advanced Video Filtering & Export: Sort by views/likes/comments; export data & covers....

3 Ratings

Starting Price: Free

24

Acapela Cloud

Acapela Group

...It features an easy to integrate API, a web interface with advanced UX, new layouts as well as prompt editing capabilities. Cost effective and very easy to use, it gives all content a natural (digital) voice. It provides an immediate solution to answer all needs for voice interface or audio interactivity, in a wide range of languages and voices. With only a few lines of code, connect to the Acapela Cloud server, send the text to be spoken and let the service do its job! Acapela Cloud will instantly generate the voice file that will be played on your applications or devices. Over 30 languages and 100 standard voices are available, 24/7. ...

25

Voicv

Voicv

Voicv is a cutting-edge voice cloning platform that transforms your voice into a digital asset in minutes using zero-shot learning. It allows users to clone any voice with just a 10-30-second audio sample, maintaining high fidelity and natural expression. It supports multiple languages, including English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish. Voicv offers real-time processing, enabling fast voice generation suitable for quick iterations and production needs. It achieves professional-quality output with extremely low error rates, ensuring clear and accurate speech generation. ...

Starting Price: $23.99 per month

26

Piper TTS

Rhasspy

Piper is a fast, local neural text-to-speech (TTS) system optimized for devices like the Raspberry Pi 4, designed to deliver high-quality speech synthesis without relying on cloud services. It uses neural network models trained with VITS and exported to ONNX for use with ONNX Runtime, enabling efficient and natural-sounding speech generation. Piper supports a wide range of languages, including English (US and UK), Spanish (Spain and Mexico), French, German, and many others, with voices available for download. Users can run Piper via the command line or integrate it into Python applications using the piper-tts package. The system allows for real-time audio streaming and JSON input for batch processing, and it supports multi-speaker models. ...

Starting Price: Free

27

MiniMax Agent

MiniMax

MiniMax Agent is an AI super-companion that helps you think faster and achieve more by combining a natural-language chat interface with a suite of creativity, productivity, and learning tools. Its modules include a meditation audio generator for calming, three-minute guided sessions; a podcast assistant for scripting and episode planning; a code builder and debugger that writes, refactors, and explains code; a data analyst for charting and interpreting datasets; an itinerary planner that creates detailed, multi-day trip schedules; a story crafter for children’s picture books with illustration prompts; an interactive quiz maker that turns any topic into engaging learning exercises; a fact-checker that verifies citations and sources; a stock insight tool that analyzes performance and suggests strategies; a video brainstormer for naming projects and generating domain ideas; and a tech finder for discovering the latest gadgets.

28

ReadSpeaker

ReadSpeaker

Lifelike text to speech for your customers. Make your products more engaging with our voice solutions. Add speech to your website & apps to make your content available to a larger audience. Produce your own audio files with our natural-sounding text to speech voices. Give a voice to robots, public announcement systems, IVRs and more with text to speech. Text to speech enables brands, companies, and organizations to deliver enhanced end-user experience, while minimizing costs. Whether you’re...

29

CreateAIvoiceovers

The Seaplace Group, LLC

...Then, process and download your final MP3 audio file. That's it. CreateAIvoiceovers caters to diverse text-to-speech needs. It is best for:
- Product and business promotions
- Explainer videos
- E-learning narrations
- Podcasts
- Marketing videos
- Presentations
- Software and App demos
- YouTube Videos
- Audiobooks
- Documentaries
- Animations
- Games
- Content for people with reading disabilities or visual impairment

Starting Price: $47 per user per month

30

Amazon Q Business

Amazon

Amazon Q Business is a fully managed, generative AI–powered assistant designed to help employees find information, gain insights, and take action at work. It enables users to interact using natural language to request information, generate content, or create lightweight apps that automate workflows. It provides a unified search experience across systems and data, delivering quick, accurate, and relevant answers to complex questions based on documents, images, audio, and video files, and other application data, with results including citations and references for transparency. ...

Starting Price: $20 per month
