Kernel CUDA error #296

nuneslu · 2021-01-15T12:29:43Z

Describe the bug
I have been trying to use the SparseConvs on my GPU. Everything works on CPU, but when I switch to CUDA it throws this error:

assertion (!kernel.is_cuda()) failed. kernel must be CPU

I'm using the Sparse ResNet as in the repo examples but when I load the variables with CUDA and try to run the forward pass it throws this error.

To Reproduce
Steps to reproduce the behavior.

from models.resnet import *
import MinkowskiEngine as ME
import torch

if args.use_cuda:
    dtype = torch.cuda.FloatTensor
    device = torch.device("cuda")
    torch.cuda.set_device(0)
    print('GPU')
else:
    dtype = torch.FloatTensor
    device = torch.device("cpu")

net = SparseResNet14(in_channels=4, out_channels=args.feature_size).type(dtype)

...

net(x)

The data is from modelnet40 dataset

Desktop (please complete the following information):

OS: Ubuntu 20.04
Python version: 3.6
CUDA version: 11.1
NVIDIA Driver version: 455.45.01
Minkowski Engine version 0.5.0
Output of the following command. (If you installed the latest MinkowskiEngine, simply call MinkowskiEngine.print_diagnostics())

wget -q https://raw.githubusercontent.com/NVIDIA/MinkowskiEngine/master/MinkowskiEngine/diagnostics.py ; python diagnostics.py

==========System==========
Linux-5.4.0-58-generic-x86_64-with-glibc2.29
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.1 LTS"
3.8.5 (default, Jul 28 2020, 12:59:40)
[GCC 9.3.0]
==========Pytorch==========
1.7.1
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 455.45.01
CUDA Version 11.1
VBIOS Version 90.04.7A.80.B2
Image Version G001.0000.02.04
==========NVCC==========
/usr/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
==========CC==========
/usr/bin/c++
c++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

==========MinkowskiEngine==========
/home/lucas/PhD/Implementations/SparseConvModels/sparseconv_venv/lib/python3.8/site-packages/MinkowskiEngine/init.py:36: UserWarning: The environment variable OMP_NUM_THREADS not set. MinkowskiEngine will automatically set OMP_NUM_THREADS=16. If you want to set OMP_NUM_THREADS manually, please export it on the command line before running a python script. e.g. export OMP_NUM_THREADS=12; python your_program.py. It is recommended to set it below 24.
warnings.warn(
0.5.0
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 10010
CUDART version MinkowskiEngine is compiled: 10010



chrischoy · 2021-01-15T12:32:25Z

Make sure to create a sparse tensor on the GPU.

model = ResNet().cuda()
sinput = ME.SparseTensor(..., device="cuda")
model(sinput)

chrischoy · 2021-01-15T13:08:47Z

Also, I noticed that the CUDA version used to compile ME is 10.1, but PyTorch is using 11.0. This mismatch could cause problems.

Try to compile ME with

export CUDA_HOME=/usr/local/cuda-11.1; pip install MinkowskiEngine -v --no-deps
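
To make the version comparison above concrete, here is a minimal sketch that decodes the integer printed by the diagnostics (for example 10010) and compares it with a dotted CUDA version string. It assumes the usual CUDART_VERSION encoding of major * 1000 + minor * 10. The helper names are hypothetical and not part of MinkowskiEngine.

```python
def decode_cuda_version(encoded: int) -> tuple:
    """Decode an integer such as 10010 into (major, minor).

    Assumes the CUDART_VERSION convention:
    encoded = major * 1000 + minor * 10.
    """
    return encoded // 1000, (encoded % 1000) // 10


def parse_version_string(text: str) -> tuple:
    """Turn a dotted string such as "11.0" into (11, 0)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def cuda_versions_match(me_encoded: int, torch_cuda: str) -> bool:
    """Return True when both builds report the same CUDA major.minor."""
    return decode_cuda_version(me_encoded) == parse_version_string(torch_cuda)


# Values taken from this thread: ME reports 10010 and PyTorch is said
# to be using 11.0.
print(decode_cuda_version(10010))          # (10, 1)
print(cuda_versions_match(10010, "11.0"))  # False
```

In a real session the dotted string would typically come from torch.version.cuda, and the integer from the "NVCC version MinkowskiEngine is compiled" line of the diagnostics output.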

volvox292 · 2022-09-14T14:44:33Z

Hi,
I am getting the same error when trying a convolution! Moving the tensor to the GPU works just fine, but then the convolution fails. On CPU it works without any problem.
Thanks for your help!

A = ME.SparseTensor(coordinates=coords, features=feats, device='cuda')
conv = ME.MinkowskiConvolution(in_channels=1, out_channels=128, kernel_size=3, stride=2, dimension=2)
B = conv(A)
B

RuntimeError: /tmp/pip-req-build-qvlqpmi2/src/convolution_gpu.cu:65, assertion (kernel.is_cuda()) failed. kernel must be CUDA

==========System==========
Linux-3.10.0-1160.53.1.el7.x86_64-x86_64-with-debian-buster-sid
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.5 LTS"
3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0]
==========Pytorch==========
1.9.0
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 510.39.01
CUDA Version 11.6
VBIOS Version 88.00.48.00.02
Image Version G500.0202.00.02
GSP Firmware Version N/A
==========NVCC==========
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
==========CC==========
CC=icpc
==========MinkowskiEngine==========
0.5.4
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 11010
CUDART version MinkowskiEngine is compiled: 11010
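
One possible reading of the assertion in this comment, not confirmed anywhere in the thread: the input A is on CUDA, but conv was constructed without being moved to the GPU, so its kernel weights would still be on the CPU. The sketch below shows a small, hypothetical helper that lists which named tensors sit on an unexpected device. It relies only on a .device attribute and uses stand-in objects so that it runs without a GPU or PyTorch installed.

```python
from types import SimpleNamespace


def device_type(obj) -> str:
    """Return 'cpu' or 'cuda' for anything exposing a .device attribute."""
    device = obj.device
    # torch.device has a .type field; plain strings like "cuda:0" do not,
    # so fall back to the part before the colon.
    return getattr(device, "type", str(device).split(":")[0])


def find_device_mismatches(named_tensors, expected: str) -> list:
    """List the names whose device type differs from `expected`."""
    return [name for name, t in named_tensors if device_type(t) != expected]


# Stand-ins for real parameters (assumption: only .device matters here).
fake_params = [
    ("kernel", SimpleNamespace(device="cpu")),
    ("bias", SimpleNamespace(device="cuda:0")),
]
print(find_device_mismatches(fake_params, "cuda"))  # ['kernel']
```

With real objects, the list returned by conv.named_parameters() could be passed in instead of fake_params. If it reports a CPU tensor, moving the layer with .cuda() or .to('cuda'), as in the model = ResNet().cuda() snippet earlier in the thread, would be the thing to try.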

jrockholt3 · 2023-02-15T05:40:15Z

I am getting this error when using the pruning function. All my tensors in the forward pass are on 'cuda', but when pruning, keep cuda fails.

pphuangyi · 2023-08-21T16:38:42Z

I encountered the same problem and the cause of my problem is probably not very common but I want to share it anyway in the hope that it might help someone out there 😸

My server has multiple GPU cards, and CUDA_VISIBLE_DEVICES=[id] doesn't point to the GPU id shown in nvidia-smi unless I include os.environ['CUDA_DEVICE_ORDER'] = "PCI_BUS_ID" at the beginning of my script to align them.

After I added the above line, I can do something like:

id = 0
device = f'cuda:{id}'
model = ResNet().to(device)
sinput = ME.SparseTensor(..., device=device)
model(sinput)

And the problem is solved.

I am very naive with respect to computer hardware, so my guess might be totally wrong. But I feel that the problem might be that the network and the input were sent to different GPU cards when the ids were not aligned, so they simply didn't find each other.
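
For anyone who wants to try the environment-variable approach described in this comment, here is a minimal sketch. It assumes the variables are set before the first CUDA call in the process (in practice, before importing torch). The pin_gpu helper is a hypothetical name, and the index 2 is an arbitrary example.

```python
import os


def pin_gpu(index: int) -> str:
    """Expose a single GPU, numbered the way nvidia-smi numbers it.

    Must run before CUDA is initialized in this process; otherwise the
    variables are read too late to have any effect.
    """
    # Enumerate devices by PCI bus id instead of the default ordering.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    # Only one GPU remains visible, so inside the process it is index 0.
    return "cuda:0"


device = pin_gpu(2)  # 2 is an arbitrary example index
print(device, os.environ["CUDA_VISIBLE_DEVICES"])  # cuda:0 2
```

The returned string can then be used for both the model and the sparse tensor, as in the snippet above, so that they end up on the same card.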

chrischoy closed this as completed Jan 15, 2021

AbhishekKaushikCV mentioned this issue Oct 2, 2023

inference_vis.py -> ERROR: kernel must be CUDA. PRBonn/segcontrast#19

Closed
