simd free download - SourceForge

30 projects for "simd" with 1 filter applied:

BSD

1

HighwayHash

Fast strong hash functions: SipHash/HighwayHash

HighwayHash is a fast, keyed hash function intended for scenarios where you need strong, DoS-resistant hashing without the full overhead of a general-purpose cryptographic hash. It’s designed to defeat hash-flooding attacks by mixing input with wide SIMD operations and a branch-free inner loop, so adversaries can’t cheaply craft many colliding keys. The implementation targets multiple CPU families with vectorized code paths while keeping a portable fallback, yielding high throughput across platforms. It exposes simple one-shot and streaming APIs, so you can hash short keys or long byte streams with the same function. ...

Downloads: 0 This Week

Last Update: 2025-10-10
See Project
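The HighwayHash entry above describes a keyed hash with both one-shot and streaming interfaces. The sketch below illustrates that usage pattern only. It uses keyed BLAKE2b from Python's standard library as a stand-in, not HighwayHash itself, and the function names and keys are hypothetical.

```python
import hashlib

def keyed_hash(key: bytes, data: bytes) -> bytes:
    """One-shot keyed hash (BLAKE2b stand-in, 64-bit digest)."""
    return hashlib.blake2b(data, key=key, digest_size=8).digest()

def keyed_hash_stream(key: bytes, chunks) -> bytes:
    """Streaming variant: feed the input piece by piece."""
    h = hashlib.blake2b(key=key, digest_size=8)
    for chunk in chunks:
        h.update(chunk)
    return h.digest()

key_a = bytes(range(32))       # hypothetical secret key
key_b = bytes(range(1, 33))    # a different hypothetical key
message = b"user-supplied table key"

one_shot = keyed_hash(key_a, message)
streamed = keyed_hash_stream(key_a, [message[:5], message[5:]])
other_key = keyed_hash(key_b, message)
```

With the same key, the one-shot and streamed digests agree. With a different key, the digest changes, which is what prevents an attacker who does not know the key from precomputing colliding inputs.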
2

Polars

Dataframes powered by a multithreaded, vectorized query engine

Polars is a high-performance, multi-language DataFrame library built in Rust using Apache Arrow. It delivers blazing-fast, vectorized, and parallel data manipulation with both eager and lazy execution, making it an excellent tool for data processing in Python, Rust, Node.js, R, and SQL contexts.

Downloads: 1 This Week

Last Update: 2026-03-20
See Project
3

wllama

WebAssembly binding for llama.cpp - Enabling on-browser LLM inference

...Built as a binding for the llama.cpp inference engine, the project allows developers to run LLM models locally without requiring a server backend or dedicated GPU hardware. The library leverages WebAssembly SIMD capabilities to achieve efficient execution within modern browsers while maintaining compatibility across platforms. By running models locally on the user’s device, wllama enables privacy-preserving AI applications that do not require sending data to remote servers. The framework provides both high-level APIs for common tasks such as text generation and embeddings, as well as low-level APIs that expose tokenization, sampling controls, and model state management.

Downloads: 6 This Week

Last Update: 2026-03-10
See Project
4

Zerocopy

Zerocopy makes zero-cost memory manipulation effortless

Zerocopy is a Rust library designed to make zero-cost memory manipulation both safe and effortless. It allows developers to reinterpret or convert raw byte sequences into structured types—and vice versa—without writing unsafe code directly. The crate provides safe abstractions for transmuting data while preserving Rust’s strict safety guarantees, removing the need for manual memory manipulation. Zerocopy introduces a suite of conversion traits such as TryFromBytes, FromBytes, IntoBytes, and...

Downloads: 9 This Week

Last Update: 14 minutes ago
See Project
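The Zerocopy entry above is about reinterpreting raw bytes as typed data without copying. As a rough analogy only, the sketch below does this in Python with `memoryview.cast`. It is not the Zerocopy API, and it offers none of the compile-time checks the crate provides.

```python
import struct

# Raw bytes holding four native-endian unsigned ints (typically 32-bit each).
buf = bytearray(struct.pack("4I", 10, 20, 30, 40))

# Reinterpret the same memory as unsigned ints; no bytes are copied.
words = memoryview(buf).cast("I")
first = words[0]

# A write through the typed view is visible in the underlying buffer.
words[1] = 99
second_raw = struct.unpack_from("I", buf, words.itemsize)[0]
```

Because the view shares memory with `buf`, the write to `words[1]` changes the bytes that `struct.unpack_from` reads back.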
5

sleef

Vectorized libm

SLEEF stands for SIMD Library for Evaluating Elementary Functions. It implements vectorized versions of all C99 math functions, using the SIMD instructions of modern processors to make computation more efficient. The library also includes vectorized DFT subroutines.

Downloads: 4 This Week

Last Update: 2025-01-28
See Project
6

Armadillo

fast C++ library for linear algebra & scientific computing

* Fast C++ library for linear algebra (matrix maths) and scientific computing
* Easy to use functions and syntax, deliberately similar to Matlab / Octave
* Uses template meta-programming techniques to increase efficiency
* Provides user-friendly wrappers for OpenBLAS, Intel MKL, LAPACK, ATLAS, ARPACK, SuperLU and FFTW libraries
* Useful for machine learning, pattern recognition, signal processing, bioinformatics, statistics, finance, etc.
* Downloads:...

Downloads: 2,730 This Week

Last Update: 2026-03-15
See Project
7

Fosite - advection problem solver

numerical simulation code for solving transport equations in 1D/2D/3D

...Fosite is written with object-oriented patterns in Fortran 2003 and follows the Structure of Arrays (SoA) layout, operating on generic field datatypes. This allows for high performance on modern SIMD architectures. It is parallelized and vectorized, and is optimized for the NEC SX-Aurora TSUBASA Vector Engine.

Downloads: 2 This Week

Last Update: 2025-02-05
See Project
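The Fosite entry above mentions the Structure of Arrays (SoA) layout. The sketch below contrasts it with an Array of Structures (AoS) layout. The field names `density` and `velocity` are hypothetical, and Python only illustrates the data layout here; it does not vectorize the loop the way a Fortran compiler would.

```python
from array import array

# Array of Structures (AoS): one record per cell, fields interleaved.
cells_aos = [
    {"density": 1.0, "velocity": 0.5},
    {"density": 2.0, "velocity": 0.25},
    {"density": 4.0, "velocity": 0.125},
]

# Structure of Arrays (SoA): one contiguous array of doubles per field.
fields_soa = {
    "density": array("d", (c["density"] for c in cells_aos)),
    "velocity": array("d", (c["velocity"] for c in cells_aos)),
}

def momentum_soa(fields):
    """Element-wise density * velocity over the contiguous field arrays."""
    return array(
        "d",
        (rho * v for rho, v in zip(fields["density"], fields["velocity"])),
    )

momentum = momentum_soa(fields_soa)
```

In the SoA form, all values of one field sit next to each other in memory, which is the access pattern that SIMD loads favor.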
8

libsombrero

Astronomical object/structure detection from 1D and 2D data sets.

Sombrero is a fast wavelet image processing and object detection C library for astronomical images. Sombrero is named after the "Mexican Hat" shape of the wavelet masks used in image convolution and is released under the GNU LGPL library.

Downloads: 0 This Week

Last Update: 2026-02-22
See Project
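The libsombrero entry above refers to the "Mexican Hat" wavelet shape used as a convolution mask. A minimal 1D sketch of that shape (also called the Ricker wavelet) follows. The normalization, mask width, and function names are assumptions for illustration and are not taken from the library.

```python
import math

def mexican_hat(t: float, sigma: float = 1.0) -> float:
    """Ricker ("Mexican Hat") wavelet evaluated at t."""
    norm = 2.0 / (math.sqrt(3.0 * sigma) * math.pi ** 0.25)
    x2 = (t / sigma) ** 2
    return norm * (1.0 - x2) * math.exp(-x2 / 2.0)

def mexican_hat_mask(half_width: int, sigma: float = 1.0) -> list:
    """Sample the wavelet on integer offsets -half_width..half_width."""
    return [mexican_hat(float(i), sigma)
            for i in range(-half_width, half_width + 1)]

def convolve_same(signal, mask):
    """Direct 1D convolution, zero-padded, output as long as the signal."""
    half = len(mask) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(mask):
            j = i + half - k
            if 0 <= j < len(signal):
                acc += signal[j] * w
        out.append(acc)
    return out

mask = mexican_hat_mask(half_width=5, sigma=1.0)
signal = [0.0] * 21
signal[10] = 1.0                      # a single point source
response = convolve_same(signal, mask)
```

The mask has a positive central peak surrounded by negative lobes and sums to roughly zero, so flat backgrounds are suppressed while compact sources produce a strong response.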
9

libjpeg-turbo

SIMD-accelerated libjpeg-compatible JPEG codec library

libjpeg-turbo is a JPEG image codec that uses SIMD instructions (MMX, SSE2, NEON, AltiVec) to accelerate baseline JPEG compression and decompression on x86, x86-64, ARM, and PowerPC systems. On such systems, libjpeg-turbo is generally 2-6x as fast as libjpeg, all else being equal. On other types of systems, libjpeg-turbo can still outperform libjpeg by a significant amount, by virtue of its highly-optimized Huffman coding routines.

16 Reviews

Downloads: 44,917 This Week

Last Update: 2024-01-13
See Project
10

TurboPFor

Fastest Integer Compression

All functions are available for AMD/Intel, 64-bit ARMv8 NEON (Linux and macOS/M1), and Power9 AltiVec. 100% C (C++ headers), as simple as memcpy. OS: Linux amd64, arm64, Power9; macOS (AMD/Intel and Apple M1).

Downloads: 0 This Week

Last Update: 2024-05-30
See Project
11

dispy

Distributed and Parallel Computing with/for Python.

dispy is a generic and comprehensive, yet easy-to-use framework for creating and using compute clusters to execute computations in parallel across multiple processors in a single machine (SMP), or among many machines in a cluster, grid or cloud. dispy is well suited to the data-parallel (SIMD) paradigm, where a computation (a Python function or standalone program) is evaluated with different (large) datasets independently. dispy supports public / private / hybrid cloud computing and fog / edge computing.

3 Reviews

Downloads: 12 This Week

Last Update: 2022-10-06
See Project
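The dispy entry above describes the data-parallel pattern: one computation evaluated independently on different datasets. The sketch below shows that pattern with the standard library's `concurrent.futures` thread pool as a stand-in. It does not use dispy's own API, and the `compute` function and datasets are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def compute(dataset):
    """The same computation applied to every dataset: a sum of squares."""
    return sum(x * x for x in dataset)

# Independent inputs; no dataset depends on the result of another.
datasets = [range(10), range(100), range(1000)]

with ThreadPoolExecutor(max_workers=3) as pool:
    # map() submits one job per dataset and returns results in input order.
    results = list(pool.map(compute, datasets))
```

Because the jobs share no state, they can be scheduled on any worker in any order, which is the property a cluster framework relies on when spreading them across machines.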
12

LightPCC

Parallel pairwise correlation computation on Intel Xeon Phi clusters

The first parallel and distributed library for pairwise correlation/dependence computation on Intel Xeon Phi clusters. This library is written in C++ template classes and achieves high speed by exploring the SIMD-instruction-level and thread-level parallelism within Xeon Phis as well as accelerator-level parallelism among multiple Xeon Phis. To facilitate balanced workload distribution, we have proposed a general framework for symmetric all-pairs computation by building provable bijective functions between job identifier and coordinate space for the first time.

Downloads: 0 This Week

Last Update: 2017-04-05
See Project
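The LightPCC entry above mentions bijective functions between a job identifier and the coordinate space of a symmetric all-pairs computation. The sketch below shows one common triangular-number mapping of this kind. It is a generic illustration, not necessarily the mapping the authors use, and it assumes that self-pairs (i, i) are included.

```python
import math

def pair_to_job(i: int, j: int) -> int:
    """Map an unordered pair (i, j) to a unique job id."""
    if j > i:
        i, j = j, i                   # symmetric: (i, j) and (j, i) are one job
    return i * (i + 1) // 2 + j

def job_to_pair(k: int):
    """Inverse mapping: recover (i, j) with j <= i from job id k."""
    i = (math.isqrt(8 * k + 1) - 1) // 2
    j = k - i * (i + 1) // 2
    return i, j

def split_jobs(n: int, workers: int):
    """Split the n*(n+1)/2 job ids into contiguous, near-equal ranges."""
    total = n * (n + 1) // 2
    base, extra = divmod(total, workers)
    ranges, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges

# Each worker turns its own id range back into coordinates independently.
worker_ranges = split_jobs(n=6, workers=4)
pairs_for_worker_0 = [job_to_pair(k) for k in worker_ranges[0]]
```

Since every job id corresponds to exactly one pair, dividing the id range evenly also divides the pairs evenly, with no shared work list needed.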
13

smartIDS

Lightweight intrusion detection for IoT and embedded devices.

The aim of the project is a lightweight intrusion detection library for embedded devices which supports MSP430 and ARM Cortex based devices. Features include DSP/SIMD support, IoT and embedded protocols, distributed operation, event and history management, and tool-supported configuration and visualization. There is a Java port that supports fewer features.

Downloads: 0 This Week

Last Update: 2016-11-28
See Project
14

RandomLib

Random number library

RandomLib is a C++ interface to the Mersenne Twister random number generator MT19937 and to the SIMD-oriented Fast Mersenne Twister random number generator, SFMT19937. For documentation, visit http://randomlib.sf.net

Downloads: 1 This Week

Last Update: 2016-02-04
See Project
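The RandomLib entry above wraps the MT19937 Mersenne Twister. Python's standard `random` module is built on the same generator, so the sketch below uses it to show two properties such a generator offers: reproducible streams from a seed, and state that can be saved and restored. This is not RandomLib's C++ interface, and the seed value is arbitrary.

```python
import random

# Two independent generator objects seeded identically.
gen_a = random.Random(12345)
gen_b = random.Random(12345)

seq_a = [gen_a.getrandbits(32) for _ in range(5)]
seq_b = [gen_b.getrandbits(32) for _ in range(5)]

# Snapshot the generator state, draw some values, then rewind and replay.
state = gen_a.getstate()
next_values = [gen_a.random() for _ in range(3)]
gen_a.setstate(state)
replayed = [gen_a.random() for _ in range(3)]
```

Identical seeds give identical sequences, and restoring a saved state replays the same values, which is useful for reproducible simulations.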
15

SWAPHI-LS: Alignment on Xeon Phi Cluster

Smith-Waterman long DNA sequence alignment on Xeon Phi clusters

The first parallel Smith-Waterman algorithm exploiting Intel Xeon Phi clusters to accelerate the alignment of long DNA sequences. This algorithm is written in C++ (with a set of SIMD intrinsic extensions), OpenMP and MPI. The performance evaluation revealed that our algorithm achieves very stable performance, and yields a performance of up to 30.1 GCUPS on a single Xeon Phi and up to 111.4 GCUPS on four Xeon Phis sharing a host.

Downloads: 0 This Week

Last Update: 2016-05-13
See Project
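The SWAPHI-LS entry above reports throughput in GCUPS, that is, billions of dynamic-programming cell updates per second. The sketch below turns those figures into rough run times. The sequence lengths are hypothetical, and the estimate assumes the peak rates quoted in the entry are sustained.

```python
def cell_updates(query_len: int, target_len: int) -> int:
    """Smith-Waterman fills one matrix cell per pair of residues."""
    return query_len * target_len

def seconds_at(gcups: float, query_len: int, target_len: int) -> float:
    """Estimated run time at a given rate in giga cell updates per second."""
    return cell_updates(query_len, target_len) / (gcups * 1e9)

# Hypothetical workload: two sequences of one million bases each.
single_phi = seconds_at(30.1, 1_000_000, 1_000_000)
four_phis = seconds_at(111.4, 1_000_000, 1_000_000)
speedup = single_phi / four_phis
```

Under these assumptions the single-device run takes about 33 seconds and the four-device run about 9 seconds, a speedup of roughly 3.7.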
16

libSIMD

Mathematical library utilising SIMD features of common processors to accelerate many commonly-used algorithms where compilers fear to tread.

Downloads: 1 This Week

Last Update: 2015-01-15
See Project
17

Virtual Lighttable and Darkroom

Darktable is a virtual lighttable and darkroom for photographers: it manages your digital negatives in a database and lets you view them through a zoomable light table. It also enables you to develop raw images and enhance them.

24 Reviews

Downloads: 35 This Week

Last Update: 2014-04-23
See Project
18

Block Matrix library

Highly efficient implementation of BLAS for sparse block matrices.

Accelerated using heavy-duty C++ meta-programming, SIMD instructions and GPU.

Downloads: 0 This Week

Last Update: 2015-04-15
See Project
19

Vector3D SSE

A C++ header library for fast operations on vectors/matrices (3D/3x3) using Streaming SIMD Extensions (SSE, SSE2, SSE3, SSE4). It tends to be used in 3D graphics applications and game development.

Downloads: 0 This Week

Last Update: 2013-04-24
See Project
20

Fast SIFT Image Features Library

A cross-platform library that computes fast and accurate SIFT image features. libsiftfast provides Octave/Matlab scripts, a command line interface, and a Python interface (siftfastpy). Optimized with SIMD instructions and OpenMP.

2 Reviews

Downloads: 0 This Week

Last Update: 2015-12-02
See Project
21

diagonalsw

An efficient implementation of the Smith-Waterman algorithm that takes advantage of SIMD instruction sets in modern CPUs. The Smith-Waterman algorithm is used for sequence alignment in bioinformatics.

Downloads: 0 This Week

Last Update: 2019-02-22
See Project
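The diagonalsw entry above implements the Smith-Waterman local alignment algorithm. For reference, here is a minimal scalar sketch of the scoring recurrence. It uses a linear gap penalty and hypothetical scoring parameters, keeps only two rows of the matrix, and makes no attempt at the SIMD optimizations the project itself provides.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)          # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                     # start a new local alignment
                prev[j - 1] + sub,     # align a[i-1] with b[j-1]
                prev[j] + gap,         # gap in b
                curr[j - 1] + gap,     # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best

identical = smith_waterman_score("ACGT", "ACGT")
unrelated = smith_waterman_score("AAAA", "TTTT")
shared_core = smith_waterman_score("TTACGTTT", "GGACGGG")
```

The clamp at zero is what makes the alignment local: a poorly matching prefix never drags down the score of a good region that follows it.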
22

SSEPlus

SSEPlus is a SIMD function library. It provides optimized emulation for newer SSE instructions. It also provides a rich set of high performance routines for common operations such as arithmetic, bitwise logic, and data packing and unpacking.

1 Review

Downloads: 0 This Week

Last Update: 2013-04-24
See Project
23

parallel for

A data-parallel scientific programming model. Compiles efficiently to different platforms such as distributed memory (MPI), shared-memory multiprocessor (pthreads), Cell BE processor, NVIDIA CUDA, SIMD vectorization (SSE, AltiVec), and sequential C++ code.

Downloads: 0 This Week

Last Update: 2013-04-10
See Project
24

GENIAL

GENIAL is a C++ library for signal and image processing. It uses template-generic techniques, multithreading, cache optimization and SIMD instructions for Pentium (MMX, SSE, SSE2, SSE3) to achieve high performance: FFT, DCT, Convolution, Linear Algebra...

1 Review

Downloads: 0 This Week

Last Update: 2015-06-04
See Project
25

mubench

Low-level processor benchmark for x86 and amd64 processors: measures exact latency and throughput for each assembly instruction, and automatically finds execution units. Special focus on SIMD (MMX, SSE) instructions.

Downloads: 0 This Week

Last Update: 2013-04-22
See Project