Introduction
This is an individual assignment. In this assignment, you will implement a tiled matrix multiplication using CUDA.
0) Install the CUDA SDK in your home directory (see the "How to install CUDA SDK" instructions).
1) Untar hw3.tar.gz into ~/NVIDIA_CUDA_SDK/projects and build it.
Instructions:
cd ~/NVIDIA_CUDA_SDK/projects
tar -xvf hw3.tar.gz
cd hw3
make
2) Edit the source files matrixmul.cu and matrixmul_kernel.cu to implement the matrix multiplication on the device. The two input matrices may be of any size, but the result matrix is guaranteed to have fewer than 64,000 elements.
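In a tiled multiplication, each thread block computes one tile of P and stages matching tiles of M and N in shared memory, so every element fetched from global memory is reused by a whole row or column of the tile instead of being re-read by each thread that needs it. The sketch below emulates that scheme on the CPU in plain C so it can be checked without a GPU; it is not the skeleton's code. The name `tiled_matmul`, the row-major float arrays, and `MAX_TILE` are assumptions. In a real CUDA kernel the `ty`/`tx` loops disappear (each thread derives `row` and `col` from `blockIdx` and `threadIdx`), `Ms`/`Ns` become `__shared__` arrays, and `__syncthreads()` separates the load and compute steps. The zero-fill on out-of-range loads and the guarded store are what let it handle sizes that are not a multiple of the tile width.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TILE 32 /* assumed upper bound on the tile width */

/* Hypothetical CPU emulation of a tiled CUDA kernel.
   M is hM x wM, N is wM x wN, P is hM x wN, all row-major.
   Each (by, bx) iteration plays one thread block; each (ty, tx) plays one
   thread. Ms and Ns stand in for the kernel's __shared__ tiles. */
void tiled_matmul(const float *M, const float *N, float *P,
                  int hM, int wM, int wN, int tile)
{
    assert(tile > 0 && tile <= MAX_TILE);
    float Ms[MAX_TILE][MAX_TILE], Ns[MAX_TILE][MAX_TILE];
    float acc[MAX_TILE][MAX_TILE]; /* one running sum per "thread" */

    int gridY = (hM + tile - 1) / tile;  /* ceil(hM / tile) blocks down */
    int gridX = (wN + tile - 1) / tile;  /* ceil(wN / tile) blocks across */
    int phases = (wM + tile - 1) / tile; /* tiles along the shared dimension */

    for (int by = 0; by < gridY; ++by)
        for (int bx = 0; bx < gridX; ++bx) {
            for (int ty = 0; ty < tile; ++ty)
                for (int tx = 0; tx < tile; ++tx)
                    acc[ty][tx] = 0.0f;

            for (int ph = 0; ph < phases; ++ph) {
                /* Load step: each "thread" copies one element of each tile,
                   substituting 0 when the element lies outside the matrix. */
                for (int ty = 0; ty < tile; ++ty)
                    for (int tx = 0; tx < tile; ++tx) {
                        int row = by * tile + ty, col = bx * tile + tx;
                        int mCol = ph * tile + tx, nRow = ph * tile + ty;
                        Ms[ty][tx] = (row < hM && mCol < wM)
                                         ? M[(size_t)row * wM + mCol] : 0.0f;
                        Ns[ty][tx] = (nRow < wM && col < wN)
                                         ? N[(size_t)nRow * wN + col] : 0.0f;
                    }
                /* On the GPU, a __syncthreads() barrier goes here. */
                for (int ty = 0; ty < tile; ++ty)
                    for (int tx = 0; tx < tile; ++tx)
                        for (int k = 0; k < tile; ++k)
                            acc[ty][tx] += Ms[ty][k] * Ns[k][tx];
                /* A second __syncthreads() goes here before tiles are reused. */
            }

            /* Store step: only "threads" that map inside P write a result. */
            for (int ty = 0; ty < tile; ++ty)
                for (int tx = 0; tx < tile; ++tx) {
                    int row = by * tile + ty, col = bx * tile + tx;
                    if (row < hM && col < wN)
                        P[(size_t)row * wN + col] = acc[ty][tx];
                }
        }
}
```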
3) The application has several modes of operation, selected by the number of command-line arguments.
No arguments: The application will create two randomly sized and randomly initialized matrices such that the product M * N is valid, and P is properly sized to hold the result. After invoking the device multiplication, it will compute the correct solution on the CPU and compare it with the device-computed solution. If they match (within a certain tolerance), it will print "Test PASSED" to the screen before exiting.
One argument: The application will use the random initialization to create the input matrices, and write the device-computed output to the file specified by the argument.
Three arguments: The application will read the input matrices from the provided files. The first argument should be a file containing three integers, which will be used as M.height, M.width, and N.width, respectively (N.height must equal M.width for M * N to be defined). The second and third arguments are expected to be files containing exactly enough entries to fill matrices M and N, respectively. No output is written to a file.
Four arguments: The application will read its inputs from the files provided by the first three arguments as described above, and write its output to the file provided in the fourth.
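In the no-argument mode, the pass/fail decision compares each device value with the CPU reference within a tolerance. The provided code's exact tolerance rule is not stated here, so this sketch assumes a relative tolerance with a floor of 1.0 for values near zero; the name `matrices_match` is hypothetical. A check like this is also handy for your own tests, because float sums accumulated in a different order on the device rarely match the CPU bit for bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tolerance check: |ref - dev| must not exceed
   rel_tol * max(|ref|, 1.0). The floor of 1.0 makes it absolute near zero. */
bool matrices_match(const float *ref, const float *dev, int count, float rel_tol)
{
    assert(count >= 0 && rel_tol >= 0.0f);
    for (int i = 0; i < count; ++i) {
        float diff = ref[i] - dev[i];
        float scale = ref[i];
        if (diff < 0.0f) diff = -diff;
        if (scale < 0.0f) scale = -scale;
        if (scale < 1.0f) scale = 1.0f;
        /* Written as !(diff <= limit) so a NaN from the device also fails. */
        if (!(diff <= rel_tol * scale))
            return false;
    }
    return true;
}
```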
Note that to use the output of one run as the input to another, you must delete the first line of the output file. That line shows the accuracy of the values in the file, which is not relevant for this application.
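In the three- and four-argument modes, three integers are enough because M * N requires N.height to equal M.width: M is M.height x M.width, N is M.width x N.width, and P is M.height x N.width. This sketch assumes the third integer is N.width, parses the dimensions file's contents from a string, and derives how many entries each matrix file must contain; `Dims` and `parse_dims` are hypothetical names, not part of the provided code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for the three integers and the derived sizes. */
typedef struct {
    int mHeight, mWidth, nWidth; /* N.height is implied: it equals M.width */
    long mCount, nCount, pCount; /* entries expected in M, N and P */
} Dims;

/* Returns 1 on success, 0 if the text does not hold three positive ints. */
int parse_dims(const char *text, Dims *d)
{
    assert(text != NULL && d != NULL);
    if (sscanf(text, "%d %d %d", &d->mHeight, &d->mWidth, &d->nWidth) != 3)
        return 0;
    if (d->mHeight <= 0 || d->mWidth <= 0 || d->nWidth <= 0)
        return 0;
    d->mCount = (long)d->mHeight * d->mWidth;
    d->nCount = (long)d->mWidth * d->nWidth;
    d->pCount = (long)d->mHeight * d->nWidth;
    return 1;
}
```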
4) Measure the following cases.
For matrix size 1024, vary the block size (8 and 16) and measure the speedup.
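Assuming "matrix size 1024" means square 1024 x 1024 matrices and that the tile width equals the block size, this sketch computes the launch shape and the number of element loads from global memory for each block size, plus a speedup ratio. The assignment does not name the speedup baseline; CPU time divided by device time is one common choice. `describe_launch` and `speedup` are hypothetical helpers, and the load count is a traffic model, not a timing prediction.

```c
#include <assert.h>

/* Hypothetical summary of one launch for square n x n matrices. */
typedef struct {
    int gridDim;           /* blocks per grid dimension: ceil(n / tile) */
    int threadsPerBlock;   /* tile * tile */
    long long globalLoads; /* element loads from global memory, whole product */
} LaunchInfo;

LaunchInfo describe_launch(int n, int tile)
{
    assert(n > 0 && tile > 0);
    LaunchInfo info;
    info.gridDim = (n + tile - 1) / tile;
    info.threadsPerBlock = tile * tile;
    /* Each block runs gridDim phases and loads two tile x tile tiles per
       phase. Exact when tile divides n; otherwise it also counts padded
       positions a guarded kernel skips, so it is an upper bound. */
    long long blocks = (long long)info.gridDim * info.gridDim;
    info.globalLoads = blocks * info.gridDim * 2LL * tile * tile;
    return info;
}

/* Speedup of a measured run relative to a baseline run. */
double speedup(double baseline_ms, double measured_ms)
{
    assert(measured_ms > 0.0);
    return baseline_ms / measured_ms;
}
```

For n = 1024 this model gives 2^28 element loads with 8 x 8 tiles and 2^27 with 16 x 16 tiles, against 2 * 1024^3 = 2^31 for an untiled kernel in which every thread reads a full row of M and column of N from global memory.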
5) Submission:
The hw3.tar.gz file should contain the provided hw3 folder with all the changes and additions you have made to the source code. Include a PDF file with your answer to question 4.
Instructions:
cd ~/NVIDIA_CUDA_SDK/projects/hw3
make clean
cd ..
tar cvf hw3.tar hw3
gzip hw3.tar
Upload the hw3.tar.gz file to T-Square.
6) Grading.
We will grade the functionality of your code by varying the matrix size. If your code does not handle arbitrary matrix sizes, you will not receive full credit. Be sure to test matrix sizes that are not a multiple of the block size.
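Sizes that are not a multiple of the block size are usually handled by rounding the grid up with ceiling division and guarding every memory access. This sketch counts how many launched threads map to positions outside P for a given size; `Coverage` and `coverage` are hypothetical names. Those threads must not write a result, but in a tiled kernel they should still help load the tiles (substituting zeros for out-of-range elements) and reach every `__syncthreads()` instead of returning early, because a barrier that only some threads of a block reach can hang or produce wrong results.

```c
#include <assert.h>

/* Hypothetical coverage summary for a height x width result P. */
typedef struct {
    int gridX, gridY;   /* ceil(width / block), ceil(height / block) */
    long long launched; /* total threads launched */
    long long outside;  /* threads whose (row, col) lies outside P */
} Coverage;

Coverage coverage(int height, int width, int block)
{
    assert(height > 0 && width > 0 && block > 0);
    Coverage c;
    c.gridX = (width + block - 1) / block;
    c.gridY = (height + block - 1) / block;
    c.launched = (long long)c.gridX * block * c.gridY * block;
    c.outside = c.launched - (long long)height * width;
    return c;
}
```

For example, a 1000 x 1000 result with 16 x 16 blocks needs a 63 x 63 grid (1,016,064 threads), of which 16,064 fall outside P.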
(Source: UIUC ECE 498AL)