cuda-gpumemtest free download

Showing 72 open source projects for "cuda-gpumemtest"

1

CUDA Python

Performance meets Productivity

CUDA Python is a unified Python interface for accessing and working with the NVIDIA CUDA platform, enabling developers to build GPU-accelerated applications entirely in Python. It acts as a metapackage composed of multiple submodules that provide both high-level and low-level access to CUDA functionality, including runtime APIs, driver APIs, and JIT compilation tools.

Downloads: 3 This Week

Last Update: 4 days ago
See Project
2

Numba CUDA Target

The CUDA target for Numba

Numba CUDA Target is the NVIDIA-maintained CUDA backend for the Numba JIT compiler, enabling developers to write GPU-accelerated code directly in Python. It allows users to define CUDA kernels using Python syntax, which are then compiled into efficient GPU code at runtime using LLVM-based toolchains. This approach significantly lowers the barrier to entry for GPU programming by eliminating the need to write CUDA C++ while still delivering high performance. ...

Downloads: 18 This Week

Last Update: 2026-04-06
See Project
3

CUDA API Wrappers

Thin, unified, C++-flavored wrappers for the CUDA APIs

...In a nutshell - making CUDA API work more fun.

Downloads: 2 This Week

Last Update: 2026-02-09
See Project
4

CUDA Core Compute Libraries (CCCL)

CUDA Core Compute Libraries

CCCL, or CUDA Core Compute Libraries, is a unified repository that consolidates several foundational CUDA C++ libraries into a single, cohesive development platform. It brings together Thrust, CUB, and libcudacxx, which collectively provide high-level abstractions, low-level performance primitives, and a CUDA-compatible standard library for GPU programming.

Downloads: 13 This Week

Last Update: 4 days ago
See Project
5

CuPy

A NumPy-compatible array library accelerated by CUDA

CuPy is an open source implementation of NumPy-compatible multi-dimensional array accelerated with NVIDIA CUDA. It consists of cupy.ndarray, a core multi-dimensional array class and many functions on it. CuPy offers GPU accelerated computing with Python, using CUDA-related libraries to fully utilize the GPU architecture. According to benchmarks, it can even speed up some operations by more than 100X. CuPy is highly compatible with NumPy, serving as a drop-in replacement in most cases. ...

Downloads: 22 This Week

Last Update: 2026-02-20
See Project
6

Tiny CUDA Neural Networks

Lightning fast C++/CUDA neural network framework

...It will likely only work on an RTX 3090, an RTX 2080 Ti, or high-end enterprise GPUs. Lower-end cards must reduce the n_neurons parameter or use the CutlassMLP (better compatibility but slower) instead. tiny-cuda-nn comes with a PyTorch extension that allows using the fast MLPs and input encodings from within a Python context. These bindings can be significantly faster than full Python implementations; in particular for the multiresolution hash encoding.

Downloads: 1 This Week

Last Update: 2025-07-08
See Project
7

Numbast

Build an automated pipeline that converts CUDA APIs into Numba

Numbast is an automated toolchain that bridges CUDA C++ and Python by generating Numba-compatible bindings directly from CUDA header files. Its primary goal is to eliminate the manual effort required to expose CUDA libraries to Python, enabling developers to use GPU-accelerated functionality in Python environments more easily. The system parses CUDA C++ declarations and converts them into Python bindings that can be used within Numba, allowing seamless integration with Python-based GPU workflows. ...

Downloads: 0 This Week

Last Update: 2026-03-18
See Project
8

NVIDIA Warp

A Python framework for accelerated simulation, data generation

NVIDIA Warp is a high-performance Python framework developed by NVIDIA for building and accelerating simulation, graphics, and physics-based workloads using GPU computing. It enables developers to write kernel-level code in Python that is automatically compiled into efficient CUDA kernels, combining ease of use with near-native performance. The framework is designed for applications such as robotics, reinforcement learning, physical simulation, and differentiable computing, where performance and flexibility are critical. Warp provides a set of primitives for working with arrays, geometry, and physics operations, allowing users to implement complex simulations without writing low-level CUDA code directly. ...

Downloads: 17 This Week

Last Update: 2026-04-06
See Project
9

Triton

Development repository for the Triton language and compiler

Triton is a programming language and compiler framework specifically designed for writing highly efficient custom deep learning operations, particularly for GPUs. It aims to bridge the gap between low-level GPU programming, such as CUDA, and higher-level abstractions by providing a more productive and flexible environment for developers. Triton enables users to write optimized kernels for machine learning workloads while maintaining readability and control over performance-critical aspects like memory access patterns and parallel execution. The project leverages LLVM and MLIR to compile code into efficient GPU instructions, supporting both NVIDIA and AMD hardware. ...

Downloads: 6 This Week

Last Update: 2026-03-20
See Project
10

opencvsharp

OpenCV wrapper for .NET

...The native binding (libOpenCvSharpExtern) is already built in the Docker image, so you don't need to worry about it. OpenCvSharp won't work on the Unity and Xamarin platforms. For Unity, please consider using OpenCV for Unity or some other solution. OpenCvSharp does not support CUDA. If you want to use the CUDA features, you need to customize the native bindings yourself. Objects of classes such as Mat and MatExpr hold unmanaged resources and need to be manually released by calling the Dispose() method. Worst of all, the +, -, *, and other operators create new objects each time, and these objects need to be disposed of, or there will be memory leaks. ...

Downloads: 9 This Week

Last Update: 2026-03-17
See Project
11

Halide

A language for fast, portable data-parallel computation

...It was designed to make writing high-performance image and array processing code much easier on modern machines. It works on all major operating systems and with several CPU architectures (X86, ARM, MIPS, Hexagon, PowerPC) and GPU compute APIs (CUDA, OpenCL, OpenGL, among others). It isn't a standalone programming language, however; rather, it is embedded in C++, which means that you write C++ code that builds an in-memory representation of a Halide pipeline using Halide's C++ API. This representation can then be compiled to an object file, or JIT-compiled and run in the same process. ...

Downloads: 6 This Week

Last Update: 2025-09-17
See Project
12

TensorRT

C++ library for high performance inference on NVIDIA GPUs

...With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and deploy to hyperscale data centers, embedded, or automotive product platforms. TensorRT is built on CUDA®, NVIDIA’s parallel programming model, and enables you to optimize inference leveraging libraries, development tools, and technologies in CUDA-X™ for artificial intelligence, autonomous machines, high-performance computing, and graphics. With new NVIDIA Ampere Architecture GPUs, TensorRT also leverages sparse tensor cores providing an additional performance boost.

Downloads: 19 This Week

Last Update: 2026-03-25
See Project
13

SuiteSparse

The official SuiteSparse library: a suite of sparse matrix algorithms

The official SuiteSparse library: a suite of sparse matrix algorithms authored or co-authored by Tim Davis, Texas A&M University.

Downloads: 4 This Week

Last Update: 2026-02-10
See Project
14

cuDF

GPU DataFrame Library

...It relies on NVIDIA® CUDA® primitives for low-level compute optimization but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

Downloads: 5 This Week

Last Update: 2026-04-08
See Project
15

Shumai

Fast Differentiable Tensor Library in JavaScript & TypeScript with Bun

...The library supports matrix operations, gradient computation, and tensor conversions with intuitive APIs and near-native speed, thanks to Bun’s low-overhead FFI bindings. It can automatically leverage GPU acceleration on Linux (via CUDA) and CPU computation on macOS.

Downloads: 0 This Week

Last Update: 11 hours ago
See Project
16

PyTorch Geometric

Geometric deep learning extension library for PyTorch

...We have outsourced a lot of functionality of PyTorch Geometric to other packages, which need to be installed separately. These packages come with their own CPU and GPU kernel implementations based on C++/CUDA extensions. We do not recommend installing as the root user on your system Python. Please set up an Anaconda/Miniconda environment or create a Docker image. We provide pip wheels for all major OS/PyTorch/CUDA combinations.

Downloads: 0 This Week

Last Update: 2025-10-14
See Project
17

Jittor

Jittor is a high-performance deep learning framework

...Module design and dynamic graph execution are used in the front end, which is the most popular design for deep learning framework interfaces. The back end is implemented in high-performance languages such as CUDA and C++. Jittor's ops are similar to NumPy's. Let's try some operations. We create Var a and b via operation jt.float32, and add them. Printing those variables shows they have the same shape and dtype.

Downloads: 1 This Week

Last Update: 2025-07-28
See Project
18

Ccache

A fast compiler cache

...Supports GCC, Clang, MSVC (Microsoft Visual C++) and other similar compilers. Works on Linux, macOS, other Unix-like operating systems and Windows. Understands C, C++, assembler, CUDA, Objective-C and Objective-C++. Supports secondary storage over HTTP (e.g. using Nginx or Google Cloud Storage), Redis or local filesystem, optionally sharding data onto a server cluster. Supports fast "direct" and "depend" modes that don't rely on using the preprocessor. Supports compression using Zstandard. Checksums cache content using XXH3 to detect data corruption. ...

Downloads: 28 This Week

Last Update: 6 days ago
See Project
19

CubeCL

Multi-platform high-performance compute language extension for Rust

...It provides an abstraction layer that allows developers to write portable, hardware-efficient compute kernels without directly dealing with complex GPU APIs such as CUDA or OpenCL. CubeCL focuses on delivering predictable performance and composability by exposing explicit control over memory layouts, parallelism, and execution patterns while still maintaining a developer-friendly syntax. The framework is built to integrate tightly with modern ML stacks, enabling efficient tensor operations and custom kernel development that can outperform generic libraries in specialized workloads. ...

Downloads: 6 This Week

Last Update: 2026-03-18
See Project
20

NVIDIA GPU Operator

NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes

...The NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPUs. These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Runtime, automatic node labeling, DCGM-based monitoring, and others.

Downloads: 1 This Week

Last Update: 2026-03-19
See Project
21

Face Alignment

2D and 3D face alignment library built using PyTorch

...However, users can alternatively use dlib, BlazeFace, or pre-existing ground truth bounding boxes. While not required, for optimal performance (especially for the detector) it is highly recommended to run the code using a CUDA-enabled GPU. While the work is presented here as a black box, if you want to know more about the intricacies of the method, please check the original paper either on arXiv or my webpage.

Downloads: 4 This Week

Last Update: 2026-04-06
See Project
22

ArrayFire

ArrayFire, a general purpose GPU library

...Together we can fulfill The ArrayFire Mission under an excellent Code of Conduct that promotes a respectful and friendly building experience. Rigorous benchmarks and tests ensuring top performance and numerical accuracy. Cross-platform compatibility with support for CUDA, OpenCL, and native CPU on Windows, Mac, and Linux. Built-in visualization functions through Forge.

Downloads: 2 This Week

Last Update: 2025-09-05
See Project
23

The Futhark Programming Language

A data-parallel functional programming language

...It is a statically typed, data-parallel, and purely functional array language in the ML family, and comes with a heavily optimizing ahead-of-time compiler that presently generates either GPU code via CUDA and OpenCL, or multi-threaded CPU code. Futhark is not designed for graphics programming, but can instead use the compute power of the GPU to accelerate data-parallel array computations. The language supports regular nested data-parallelism, as well as a form of imperative-style in-place modification of arrays, while still preserving the purity of the language via the use of a uniqueness type system. ...

Downloads: 2 This Week

Last Update: 2026-04-02
See Project
24

Faiss

Library for efficient similarity search and clustering of dense vectors

Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python/numpy. Some of the most useful algorithms are implemented on the GPU. It is developed by Facebook AI Research. Faiss contains several methods for similarity search. It...

Downloads: 5 This Week

Last Update: 2026-03-06
See Project
25

AWS Deep Learning Containers

A set of Docker images for training and serving models in TensorFlow

AWS Deep Learning Containers (DLCs) are a set of Docker images for training and serving models in TensorFlow, TensorFlow 2, PyTorch, and MXNet. Deep Learning Containers provide optimized environments with TensorFlow and MXNet, Nvidia CUDA (for GPU instances), and Intel MKL (for CPU instances) libraries and are available in the Amazon Elastic Container Registry (Amazon ECR). The AWS DLCs are used in Amazon SageMaker as the default vehicles for your SageMaker jobs such as training, inference, transforms etc. They've been tested for machine learning workloads on Amazon EC2, Amazon ECS and Amazon EKS services as well. ...

Downloads: 4 This Week

Last Update: 24 hours ago
See Project