TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs)
PyTorch Extension Library of Optimized Scatter Operations
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dependency Scheduler; for Python, R, Julia, Scala, ...
Functional Programming Library for C++. Write concise and readable C++ code.
Machine Intelligence Shader Autogen. AMDGPU ML shader code generator. (previously iGEMMgen)
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators