CUDA Matrix Multiplication: Shared Memory
This note demonstrates matrix multiplication in CUDA, using shared memory to cut redundant global-memory reads. It includes two examples: a tiled kernel that stages data in shared memory, and a host-side implementation using the Thrust library.
CUDA Kernel with Shared Memory
The following CUDA kernel performs tiled matrix multiplication. Each block loads one TILE_SIZE x TILE_SIZE tile of A and of B into shared memory, so every element is fetched from global memory once per tile and then reused TILE_SIZE times from fast on-chip storage:
#define TILE_SIZE 16 // tile width; must match the block dimensions used at launch

__global__ void matMulShared(int *A, int *B, int *C, int rowsA, int colsA, int colsB) {
    __shared__ int tile_A[TILE_SIZE][TILE_SIZE], tile_B[TILE_SIZE][TILE_SIZE];
    int row = blockIdx.y * TILE_SIZE + threadIdx.y, col = blockIdx.x * TILE_SIZE + threadIdx.x, temp = 0;
    for (int i = 0; i < (colsA + TILE_SIZE - 1) / TILE_SIZE; ++i) {
        // Cooperatively stage one tile of A and one tile of B, padding out-of-range elements with 0
        tile_A[threadIdx.y][threadIdx.x] = (row < rowsA && i * TILE_SIZE + threadIdx.x < colsA) ? A[row * colsA + i * TILE_SIZE + threadIdx.x] : 0;
        tile_B[threadIdx.y][threadIdx.x] = (i * TILE_SIZE + threadIdx.y < colsA && col < colsB) ? B[(i * TILE_SIZE + threadIdx.y) * colsB + col] : 0;
        __syncthreads(); // both tiles must be fully loaded before they are read
        for (int j = 0; j < TILE_SIZE; ++j) temp += tile_A[threadIdx.y][j] * tile_B[j][threadIdx.x]; // partial dot product over this tile
        __syncthreads(); // finish reading before the next iteration overwrites the tiles
    }
    if (row < rowsA && col < colsB) C[row * colsB + col] = temp; // guarded write of the result
}
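To show how the kernel is invoked, a minimal host-side launch sketch follows. The device buffer names (d_A, d_B, d_C) and the matrix dimensions are illustrative assumptions, not part of the original note; the grid is sized so each thread computes exactly one element of C.

// Minimal host-side launch sketch (illustrative buffer names and sizes).
int rowsA = 512, colsA = 256, colsB = 128;
int *d_A, *d_B, *d_C;
cudaMalloc(&d_A, rowsA * colsA * sizeof(int));
cudaMalloc(&d_B, colsA * colsB * sizeof(int));
cudaMalloc(&d_C, rowsA * colsB * sizeof(int));
// ... fill d_A and d_B with cudaMemcpy from host arrays ...
dim3 block(TILE_SIZE, TILE_SIZE); // one thread per element of an output tile
dim3 grid((colsB + TILE_SIZE - 1) / TILE_SIZE, (rowsA + TILE_SIZE - 1) / TILE_SIZE);
matMulShared<<<grid, block>>>(d_A, d_B, d_C, rowsA, colsA, colsB);
cudaDeviceSynchronize(); // wait for the kernel to finish before copying C back

Note that the block dimensions must equal TILE_SIZE in both x and y, since the kernel assumes one thread per tile element when staging shared memory.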