No-stress CUDA programming using Go and C

James Hook

Programming CUDA from Go is a bit more involved than from other languages. There are some excellent packages, such as mumax, but their documentation is poor and short on examples, which makes them difficult to use.

CUDA is built for C and C++, so the best alternative is to use cgo and call an external function that wraps your CUDA kernel. That is what I do in this example, where I multiply two matrices using CUDA.

If you want to know more about CUDA programming, read my article.

Kernel

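I created a simple source file that holds the kernel function and a helper function to be called from outside. Note that the helper is wrapped in extern "C": nvcc compiles the file as C++, and cgo can only link against functions with C linkage, that is, with unmangled names: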

#include <stdio.h>
#include <cuda.h>
 

__global__ void vecmul(float *A, float* B, float *C, int size)
{
    // Row and Column indexes: 
    int row = blockIdx.y*blockDim.y+threadIdx.y;
    int col = blockIdx.x*blockDim.x+threadIdx.x;

    // Only threads that map to a valid matrix element do any work:
    if (col < size && row < size) {
       float result = 0;
       for(int ix=0;ix<size;ix++) {
          result += A[row*size+ix]*B[ix*size+col];
       }
       C[row*size+col] = result;
    }
}

extern "C" {

    void maxmul(float *A, float* B, float *C, int size) {

        int total = size*size;

        // Allocate device memory:
        float* gpu_A;
        float* gpu_B;
        float* gpu_C;
        int msize = total * sizeof(float);
        cudaMalloc((void**)&gpu_A, msize);
        cudaMemcpy(gpu_A,A,msize,cudaMemcpyHostToDevice);
        cudaMalloc((void**)&gpu_B, msize);
        cudaMemcpy(gpu_B,B,msize,cudaMemcpyHostToDevice);
        cudaMalloc((void**)&gpu_C,msize);

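        // Threads per block and blocks per grid. A block holds at most
        // 1024 threads, so tile the matrix into 16x16 blocks and round
        // the grid up to cover every element:
        dim3 blocks(16,16);
        dim3 grid((size+blocks.x-1)/blocks.x,(size+blocks.y-1)/blocks.y);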

        // Call the kernel:
        vecmul<<<grid,blocks>>>(gpu_A,gpu_B,gpu_C,size);

        // Get the result Matrix:
        cudaMemcpy(C,gpu_C,msize,cudaMemcpyDeviceToHost);

        //Free device matrices
        cudaFree(gpu_A);
        cudaFree(gpu_B);
        cudaFree(gpu_C);
    }

}

The vecmul() function is the kernel (despite its name, it multiplies matrices) and the maxmul() function is the helper. The helper allocates memory on the GPU, copies the input matrices to the device, launches the kernel, and copies the result back to the host. The matrices are passed as pointers.
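To check what the kernel is expected to produce, it can help to have a plain CPU version of the same computation. The sketch below is not part of the original project: matmulCPU is a hypothetical helper name, and it uses float32 instead of C.float so that it runs without cgo or a GPU. It uses the same flat, row-major indexing as the kernel.

```go
package main

import "fmt"

// matmulCPU (hypothetical helper, not in the original code) multiplies two
// size x size matrices stored as flat, row-major slices. The indexing
// mirrors the CUDA kernel: element (row, col) lives at row*size+col.
func matmulCPU(a, b []float32, size int) []float32 {
	c := make([]float32, size*size)
	for row := 0; row < size; row++ {
		for col := 0; col < size; col++ {
			var sum float32
			for k := 0; k < size; k++ {
				sum += a[row*size+k] * b[k*size+col]
			}
			c[row*size+col] = sum
		}
	}
	return c
}

func main() {
	// Same input matrices as the Go program in this article.
	a := []float32{-1, 2, 4, 0, 5, 3, 6, 2, 1}
	b := []float32{3, 0, 2, 3, 4, 5, 4, 7, 2}
	fmt.Println(matmulCPU(a, b, 3)) // [19 36 16 27 41 31 28 15 24]
}
```

Each GPU thread computes one iteration of the two outer loops, so the kernel body corresponds to the innermost loop plus the final store.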

Go code

The program maxmul.go calls the helper function and prints the result:

package main

/*
void maxmul(float *A, float* B, float *C, int size);
#cgo LDFLAGS: -L. -lmaxmul
*/
import "C"

import "fmt"

func Maxmul(a []C.float, b []C.float, c []C.float, size int) {
	C.maxmul(&a[0], &b[0], &c[0], C.int(size))
}

func main() {
	// 3x3 matrices stored as flat, row-major slices:
	a := []C.float{-1, 2, 4, 0, 5, 3, 6, 2, 1}
	b := []C.float{3, 0, 2, 3, 4, 5, 4, 7, 2}
	c := make([]C.float, 9)
	Maxmul(a, b, c, 3)
	fmt.Println(c)
}

Before importing the C package, which lets Go call external functions with C linkage (extern "C"), I give cgo its configuration in the comment block: the prototype of the C function, the path to the library, and the library name.

To make things easier, I created a wrapper function in Go that calls the external function. It simply passes a pointer to each array (the address of its first element) and the matrix dimension. In CUDA we work with flat matrices, so each 3×3 matrix is stored as a vector of 3×3 = 9 elements.

I used the type C.float to create the slices that hold my matrices. Then I called the function. Note that I passed the size of each row (or column), which is 3, and not the total number of elements.
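The flat layout can be illustrated with a small sketch. The names flatten and at are hypothetical helpers, not part of the original code, and the example uses float32 so it runs without cgo; the values would still need converting to C.float before being handed to the wrapper.

```go
package main

import "fmt"

// flatten (hypothetical helper) copies a square matrix, given as a slice of
// rows, into one row-major slice. It assumes every row has len(m) elements.
func flatten(m [][]float32) []float32 {
	size := len(m)
	flat := make([]float32, 0, size*size)
	for _, row := range m {
		flat = append(flat, row...)
	}
	return flat
}

// at (hypothetical helper) returns element (row, col) of a flat
// size x size matrix, using the same formula as the kernel.
func at(flat []float32, size, row, col int) float32 {
	return flat[row*size+col]
}

func main() {
	m := [][]float32{{-1, 2, 4}, {0, 5, 3}, {6, 2, 1}}
	flat := flatten(m)
	fmt.Println(flat)              // [-1 2 4 0 5 3 6 2 1]
	fmt.Println(at(flat, 3, 1, 2)) // 3
}
```

The flattened slice matches the literal used for the first matrix in maxmul.go, which is why the program can write its matrices directly as nine-element slices.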

Compiling

To compile the CUDA code into a shared library, use the command:

nvcc --ptxas-options=-v --compiler-options '-fPIC' -o libmaxmul.so --shared maxmul.cu

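You need to have CUDA and the NVIDIA driver installed! On Linux, the dynamic loader must also be able to find libmaxmul.so at run time, for example through LD_LIBRARY_PATH. Then just run the Go code with the command: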

go run maxmul.go
...
[19 36 16 27 41 31 28 15 24]

And this is the result of the matrix multiplication! The full source code is here: https://github.com/cleuton/golang-network/tree/master/english/cuda/nostress
