OpenCL 学习step by step (2) 一个简单的OpenCL的程序
现在,我们开始写一个简单的OpenCL程序,计算两个数组相加的和,放到另一个数组中去。程序用CPU和GPU分别计算,最后验证它们是否相等。OpenCL程序的流程大致如下: 下面是source code中的主要代码: int main(int argc, char* argv[]) { //在host内存中创建
现在,我们开始写一个简单的OpenCL程序,计算两个数组相加的和,放到另一个数组中去。程序用CPU和GPU分别计算,最后验证它们是否相等。OpenCL程序的流程大致如下:
下面是source code中的主要代码:
int main(int argc, char* argv[])
{
//在host内存中创建三个缓冲区
float *buf1 = 0;
float *buf2 = 0;
float *buf = 0;
buf1 =(float *)malloc(BUFSIZE * sizeof(float));
buf2 =(float *)malloc(BUFSIZE * sizeof(float));
buf =(float *)malloc(BUFSIZE * sizeof(float));
//用一些随机值初始化buf1和buf2的内容
int i;
srand( (unsigned)time( NULL ) );
for(i = 0; i
buf1[i] = rand()%65535;
srand( (unsigned)time( NULL ) +1000);
for(i = 0; i
buf2[i] = rand()%65535;
//cpu计算buf1,buf2的和
for(i = 0; i
buf[i] = buf1[i] + buf2[i];
cl_uint status;
cl_platform_id platform;
//创建平台对象
status = clGetPlatformIDs( 1, &platform, NULL );
注意:如果我们系统中安装不止一个opencl平台,比如我的os中,有intel和amd两家opencl平台,用上面这行代码,有可能会出错,因为它得到了intel的opencl平台,而intel的平台只支持cpu,而我们后面的操作都是基于gpu,这时我们可以用下面的代码,得到AMD的opencl平台。
cl_uint numPlatforms;<p>std::string platformVendor;</p><p>status = clGetPlatformIDs(0, NULL, &numPlatforms);</p><p><span>if</span>(status != CL_SUCCESS)</p><p>{</p><p><span>return</span> 0;</p><p>}</p><p><span>if</span> (0 </p><p>{</p><p>cl_platform_id* platforms = <span>new</span> cl_platform_id[numPlatforms];</p><p>status = clGetPlatformIDs(numPlatforms, platforms, NULL);</p><p><span>char</span> platformName[100];</p><p><span>for</span> (<span>unsigned</span> i = 0; i </p><p>{</p><p>status = clGetPlatformInfo(platforms[i],</p><p>CL_PLATFORM_VENDOR,</p><p><span>sizeof</span>(platformName),</p><p>platformName,</p><p>NULL);</p><p>platform = platforms[i];</p><p>platformVendor.assign(platformName);</p><p><span>if</span> (!strcmp(platformName, <span>"Advanced Micro Devices, Inc."</span>))</p><p>{</p><p><span>break</span>;</p><p>}</p><p>}</p><p>std::cout "Platform found : " "\n";</p><p><span>delete</span>[] platforms;</p><p>}</p>
cl_device_id device;
//创建GPU设备
clGetDeviceIDs( platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
//创建context
cl_context context = clCreateContext( NULL, 1, &device, NULL, NULL, NULL);
//创建命令队列
cl_command_queue queue = clCreateCommandQueue( context,
device,
CL_QUEUE_PROFILING_ENABLE, NULL );
//创建三个OpenCL内存对象,并把buf1的内容通过隐式拷贝的方式
//拷贝到clbuf1,buf2的内容通过显示拷贝的方式拷贝到clbuf2
cl_mem clbuf1 = clCreateBuffer(context,
CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
BUFSIZE*sizeof(cl_float),buf1,
NULL );
cl_mem clbuf2 = clCreateBuffer(context,
CL_MEM_READ_ONLY ,
BUFSIZE*sizeof(cl_float),NULL,
NULL );
cl_event writeEvt;
status = clEnqueueWriteBuffer(queue, clbuf2, 1, 0, BUFSIZE*sizeof(cl_float), buf2, 0, 0, 0);
上面这行代码把buf2中的内容拷贝到clbuf2,因为buf2位于host端,clbuf2位于device端,所以这个函数会执行一次host到device的传输操作,或者说一次system memory到video memory的拷贝操作,所以我在该函数的后面放置了clFush函数,表示把command queue中的所有命令提交到device(注意:该命令并不保证命令执行完成),所以我们调用函数waitForEventAndRelease来等待write缓冲的完成,swaitForEventAndReleae 是一个用户定义的函数,它的内容如下,主要代码就是通过event来查询我们的操作是否完成,没完成的话,程序就一直block在这行代码处,另外我们也可以用opencl中内置的函数clWaitForEvents来代替clFlush和swaitForEventAndReleae。
<span>//等待事件完成</span><p><span>int</span> waitForEventAndRelease(cl_event *event)</p><p>{</p><p>cl_int status = CL_SUCCESS;</p><p>cl_int eventStatus = CL_QUEUED;</p><p><span>while</span>(eventStatus != CL_COMPLETE)</p><p>{</p><p>status = clGetEventInfo(</p><p>*event,</p><p>CL_EVENT_COMMAND_EXECUTION_STATUS,</p><p><span>sizeof</span>(cl_int),</p><p>&eventStatus,</p><p>NULL);</p><p>}</p><p>status = clReleaseEvent(*event);</p><p><span>return</span> 0;</p><p>}</p>
status = clFlush(queue);
//等待数据传输完成再继续往下执行
waitForEventAndRelease(&writeEvt);
cl_mem buffer = clCreateBuffer( context,
CL_MEM_WRITE_ONLY,
BUFSIZE * sizeof(cl_float),
NULL, NULL );
kernel文件中放的是gpu中执行的代码,它被放在一个单独的文件add.cl中,本程序中kernel代码非常简单,只是执行两个数组相加。kernel的代码为:
__kernel <span>void</span> vecadd(__global <span>const</span> <span>float</span>* A, __global <span>const</span> <span>float</span>* B, __global <span>float</span>* C)<p>{</p><p><span>int</span> id = get_global_id(0);</p><p>C[id] = A[id] + B[id];</p><p>}</p>
//kernel文件为add.cl
const char * filename = "add.cl"
std::string sourceStr;
status = convertToString(filename, sourceStr);
convertToString也是用户定义的函数,该函数把kernel源文件读入到一个string中,它的代码如下:
<span>//把文本文件读入一个string中,用来读入kernel源文件</span><p><span>int</span> convertToString(<span>const</span> <span>char</span> *filename, std::string& s)</p><p>{</p><p>size_t size;</p><p><span>char</span>* str;</p><p>std::fstream f(filename, (std::fstream::in | std::fstream::binary));</p><p><span>if</span>(f.is_open())</p><p>{</p><p>size_t fileSize;</p><p>f.seekg(0, std::fstream::end);</p><p>size = fileSize = (size_t)f.tellg();</p><p>f.seekg(0, std::fstream::beg);</p><p>str = <span>new</span> <span>char</span>[size+1];</p><p><span>if</span>(!str)</p><p>{</p><p>f.close();</p><p><span>return</span> NULL;</p><p>}</p><p>f.read(str, fileSize);</p><p>f.close();</p><p>str[size] = <span>'\0'</span>;</p><p>s = str;</p><p><span>delete</span>[] str;</p><p><span>return</span> 0;</p><p>}</p><p>printf(<span>"Error: Failed to open file %s\n"</span>, filename);</p><p><span>return</span> 1;</p><p>}</p>
const char * source = sourceStr.c_str();
size_t sourceSize[] = { strlen(source) };
//创建程序对象
cl_program program = clCreateProgramWithSource( context, 1, &source, sourceSize, NULL);
//编译程序对象
status = clBuildProgram( program, 1, &device, NULL, NULL, NULL );
if(status != 0)
{
printf("clBuild failed:%d\n", status);
char tbuf[0x10000];
clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0x10000, tbuf, NULL);
printf("\n%s\n", tbuf);
return -1;
}
//创建Kernel对象
cl_kernel kernel = clCreateKernel( program, "vecadd", NULL );
//设置Kernel参数
cl_int clnum = BUFSIZE;
clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*) &clbuf1);
clSetKernelArg(kernel, 1, sizeof(cl_mem), (void*) &clbuf2);
clSetKernelArg(kernel, 2, sizeof(cl_mem), (void*) &buffer);
注意:在执行kernel时候,我们只设置了global work items数量,没有设置group size,这时候,系统会使用默认的work group size,通常可能是256之类的。
//执行kernel,Range用1维,work itmes size为BUFSIZE
cl_event ev;
size_t global_work_size = BUFSIZE;
clEnqueueNDRangeKernel( queue, kernel, 1, NULL, &global_work_size, NULL, 0, NULL, &ev);
status = clFlush( queue );
waitForEventAndRelease(&ev);
//数据拷回host内存
cl_float *ptr;
cl_event mapevt;
ptr = (cl_float *) clEnqueueMapBuffer( queue, buffer, CL_TRUE, CL_MAP_READ, 0, BUFSIZE * sizeof(cl_float), 0, NULL, NULL, NULL );
status = clFlush( queue );
waitForEventAndRelease(&mapevt);
//结果验证,和cpu计算的结果比较
if(!memcmp(buf, ptr, BUFSIZE))
printf("Verify passed\n");
else printf("verify failed");
if(buf)
free(buf);
if(buf1)
free(buf1);
if(buf2)
free(buf2);
程序结束后,这些opencl对象一般会自动释放,但是为了程序完整,养成一个好习惯,这儿我加上了手动释放opencl对象的代码。
//删除OpenCL资源对象
clReleaseMemObject(clbuf1);
clReleaseMemObject(clbuf2);
clReleaseMemObject(buffer);
clReleaseProgram(program);
clReleaseCommandQueue(queue);
clReleaseContext(context);
return 0;
}
程序执行后的界面如下:
完整的代码请参考:
工程文件gclTutorial1
代码下载:http://files.cnblogs.com/mikewolf2002/gclTutorial.zip
原文作者:迈克老狼

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



The default map on the iPhone is Maps, Apple's proprietary geolocation provider. Although the map is getting better, it doesn't work well outside the United States. It has nothing to offer compared to Google Maps. In this article, we discuss the feasible steps to use Google Maps to become the default map on your iPhone. How to Make Google Maps the Default Map in iPhone Setting Google Maps as the default map app on your phone is easier than you think. Follow the steps below – Prerequisite steps – You must have Gmail installed on your phone. Step 1 – Open the AppStore. Step 2 – Search for “Gmail”. Step 3 – Click next to Gmail app

The hard disk serial number is an important identifier of the hard disk and is usually used to uniquely identify the hard disk and identify the hardware. In some cases, we may need to query the hard drive serial number, such as when installing an operating system, finding the correct device driver, or performing hard drive repairs. This article will introduce some simple methods to help you check the hard drive serial number. Method 1: Use Windows Command Prompt to open the command prompt. In Windows system, press Win+R keys, enter "cmd" and press Enter key to open the command

Is the clock app missing from your phone? The date and time will still appear on your iPhone's status bar. However, without the Clock app, you won’t be able to use world clock, stopwatch, alarm clock, and many other features. Therefore, fixing missing clock app should be at the top of your to-do list. These solutions can help you resolve this issue. Fix 1 – Place the Clock App If you mistakenly removed the Clock app from your home screen, you can put the Clock app back in its place. Step 1 – Unlock your iPhone and start swiping to the left until you reach the App Library page. Step 2 – Next, search for “clock” in the search box. Step 3 – When you see “Clock” below in the search results, press and hold it and

Are you getting "Unable to allow access to camera and microphone" when trying to use the app? Typically, you grant camera and microphone permissions to specific people on a need-to-provide basis. However, if you deny permission, the camera and microphone will not work and will display this error message instead. Solving this problem is very basic and you can do it in a minute or two. Fix 1 – Provide Camera, Microphone Permissions You can provide the necessary camera and microphone permissions directly in settings. Step 1 – Go to the Settings tab. Step 2 – Open the Privacy & Security panel. Step 3 – Turn on the “Camera” permission there. Step 4 – Inside, you will find a list of apps that have requested permission for your phone’s camera. Step 5 – Open the “Camera” of the specified app

Learn Pygame from scratch: complete installation and configuration tutorial, specific code examples required Introduction: Pygame is an open source game development library developed using the Python programming language. It provides a wealth of functions and tools, allowing developers to easily create a variety of type of game. This article will help you learn Pygame from scratch, and provide a complete installation and configuration tutorial, as well as specific code examples to get you started quickly. Part One: Installing Python and Pygame First, make sure you have

The Charm of Learning C Language: Unlocking the Potential of Programmers With the continuous development of technology, computer programming has become a field that has attracted much attention. Among many programming languages, C language has always been loved by programmers. Its simplicity, efficiency and wide application make learning C language the first step for many people to enter the field of programming. This article will discuss the charm of learning C language and how to unlock the potential of programmers by learning C language. First of all, the charm of learning C language lies in its simplicity. Compared with other programming languages, C language

When editing text content in Word, you sometimes need to enter formula symbols. Some guys don’t know how to input the root number in Word, so Xiaomian asked me to share with my friends a tutorial on how to input the root number in Word. Hope it helps my friends. First, open the Word software on your computer, then open the file you want to edit, and move the cursor to the location where you need to insert the root sign, refer to the picture example below. 2. Select [Insert], and then select [Formula] in the symbol. As shown in the red circle in the picture below: 3. Then select [Insert New Formula] below. As shown in the red circle in the picture below: 4. Select [Radical Formula], and then select the appropriate root sign. As shown in the red circle in the picture below:
![Scripted diagnostics native host has stopped working [FIXED]](https://img.php.cn/upload/article/000/465/014/171012105385034.jpg?x-oss-process=image/resize,m_fill,h_207,w_330)
When running a program or troubleshooting, if you get an error message indicating that script diagnostics localhost has stopped working, this could be due to a number of reasons. Fixing this issue on a Windows 11/10 PC may require a different approach, as each computer's situation may be different. One common reason is that the script itself is buggy or corrupted, preventing it from running properly. Solutions to this problem may include repairing or reinstalling the script, or trying a different version of the script. Another possible cause is corrupted or missing system files, which may affect the script's operation. In this case, you can try running a system file check tool to repair any damaged files, or perform a system recovery to restore to the previous
