News

NVIDIA Technical Blog
developer.nvidia.com > blog > how-to-write-high-performance-matrix-multiply-in-nvidia-cuda-tile

How to Write High-Performance Matrix Multiply in NVIDIA CUDA Tile

6+ hour, 49+ min ago   (775+ words) This blog post is part of a series designed to help developers learn NVIDIA CUDA Tile programming for building high-performance GPU kernels, using matrix multiplication as a core example. In this post, you'll learn: Before you begin, be sure your…

developer.nvidia.com
developer.nvidia.com > blog

Delivering Massive Performance Leaps for Mixture of Experts Inference on NVIDIA Blackwell

1+ week, 58+ min ago   (636+ words) As AI models continue to get smarter, people can rely on them for an expanding set of tasks. This leads users, from consumers to enterprises, to interact with AI more frequently, meaning that more tokens need to be generated. To serve these…

developer.nvidia.com
developer.nvidia.com > blog

Introducing NVIDIA BlueField-4-Powered Inference Context Memory Storage Platform for the Next

1+ week, 1+ day ago   (1289+ words) AI-native organizations increasingly face scaling challenges as agentic AI workflows drive context windows to millions of tokens and models scale toward trillions of parameters. These systems currently rely on agentic long-term memory for context that persists across turns, tools, and…

NVIDIA Developer
developer.nvidia.com > blog > open-source-ai-tool-upgrades-speed-up-llm-and-diffusion-models-on-nvidia-rtx-pcs

Open-Source AI Tool Upgrades Speed Up LLM and Diffusion Models on NVIDIA RTX PCs

1+ week, 1+ day ago   (576+ words) At CES 2026, NVIDIA is announcing several new updates for the AI PC developer ecosystem, including: NVIDIA collaborated with the open-source community to boost inference performance across the AI PC stack. On the diffusion front, ComfyUI optimized performance on NVIDIA GPUs…

NVIDIA Developer
developer.nvidia.com > blog > new-software-and-model-optimizations-supercharge-nvidia-dgx-spark

New Software and Model Optimizations Supercharge NVIDIA DGX Spark

1+ week, 2+ day ago   (658+ words) Since its release, NVIDIA has continued to push the performance of the Grace Blackwell-powered DGX Spark through continuous software optimization and close collaboration with software partners and the open-source community. These efforts are delivering meaningful gains across inference, training, and creative…

developer.nvidia.com
developer.nvidia.com > blog

Inside the NVIDIA Rubin Platform: Six New Chips, One AI Supercomputer

1+ week, 2+ day ago   (1598+ words) AI has entered an industrial phase. What began as systems performing discrete AI model training and human-facing inference has evolved into always-on AI factories that continuously convert power, silicon, and data into intelligence at scale. These factories now underpin applications…

developer.nvidia.com
developer.nvidia.com > blog

Accelerate AI Inference for Edge and Robotics with NVIDIA Jetson T4000 and NVIDIA JetPack 7.1

1+ week, 2+ day ago   (722+ words) The module includes 1x NVENC and 1x NVDEC hardware video codec engines, enabling real-time 4K video encoding and decoding. This balanced design is built for platforms that combine advanced vision processing and I/O capabilities with power and thermal efficiency. The Jetson T4000 module…

NVIDIA Technical Blog
developer.nvidia.com > blog

Real-Time Decoding, Algorithmic GPU Decoders, and AI Inference Enhancements in NVIDIA CUDA-Q QEC

4+ week, 6+ hour ago   (588+ words) To help solve these problems and enable research into better solutions, NVIDIA CUDA-Q QEC version 0.5.0 includes a range of improvements. These include support for online real-time decoding, new GPU-accelerated algorithmic decoders, infrastructure for high-performance AI decoder inference, sliding window decoder…

NVIDIA Technical Blog
developer.nvidia.com > blog

Migrate Apache Spark Workloads to GPUs at Scale on Amazon EMR with Project Aether

4+ week, 8+ hour ago   (566+ words) Building on this foundation, we introduce a smart and efficient way to migrate existing CPU-based Spark workloads running on Amazon Elastic MapReduce (EMR). Project Aether is an NVIDIA tool engineered to automate this transition. It works by taking existing CPU…

NVIDIA Technical Blog
developer.nvidia.com > blog

Solving Large-Scale Linear Sparse Problems with NVIDIA cuDSS

4+ week, 9+ hour ago   (966+ words) You can leverage hybrid CPU/GPU memory mode to run larger problems that would otherwise not fit in a single GPU's memory, run a workload across multiple GPUs, or even scale to multiple nodes effortlessly. This blog…