cuda - new operator in a kernel: strange behaviour

I was wondering whether anyone could shed some light on this behaviour of the new operator inside a kernel. Here is the code:

#include <stdio.h>
#include "cuda_runtime.h"
#include "cuComplex.h"
using namespace std;
__global__ void test()
{

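    cuComplex *store;
    store = new cuComplex[30000];
    if (store == NULL) printf("Unable to allocate %u\n", blockIdx.y);
    // Access the array only while it is still allocated
    // (with one thread per block this branch never runs).
    if (store != NULL && threadIdx.x == 10000) store->x = 0.0f;
    // Memory from array new must be released with delete[], not delete.
    delete[] store;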
}

int main(int argc, char *argv[])
{
    float timestamp;
    cudaEvent_t event_start,event_stop;
    // Initialise


    cudaEventCreate(&event_start);
    cudaEventCreate(&event_stop);
    dim3 threadsPerBlock;
    dim3 blocks;
    threadsPerBlock.x=1;
    threadsPerBlock.y=1;
    threadsPerBlock.z=1;
    blocks.x=1;
    blocks.y=500;
    blocks.z=1;

    cudaEventRecord(event_start);
    test<<<blocks,threadsPerBlock,0>>>();
    cudaEventRecord(event_stop, 0);
    cudaEventSynchronize(event_stop);
    cudaEventElapsedTime(&timestamp, event_start, event_stop);
    printf("test took  %fms \n", timestamp);
}

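Running it on a GTX680 with CUDA 5 and inspecting the output, you will notice that memory randomly fails to be allocated :( I thought this might be because all of the global memory had been used up, but I have 2 GB of memory, and since the maximum number of active blocks is 16, the amount of memory allocated this way should be at most 16*30000*8 = 3.84x10^6 bytes, i.e. about 3.84 MB. So what else should I take into account?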

Best answer

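The problem is related to the size of the heap used by the malloc() and free() device system calls; in-kernel new and delete allocate from the same heap. For details, see section 3.2.9 Call Stack and Appendix B.16.1 Heap Memory Allocation in the NVIDIA CUDA C Programming Guide. Unless you specify otherwise, the device heap is only 8 MB, so allocations start failing (returning NULL) once the blocks that are alive at the same time have used it up, regardless of how much global memory the card has.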

If you set the heap size to meet your kernel's requirements before launching the kernel, your test will succeed:

    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 500*30000*sizeof(cuComplex));

https://stackoverflow.com/questions/13072624/
