Kernel CUDA error #296

Closed
nuneslu opened this issue Jan 15, 2021 · 5 comments
@nuneslu

nuneslu commented Jan 15, 2021

Describe the bug
I have been trying to use the SparseConvs on my GPU. Everything works on CPU, but when I switch to CUDA it throws the error:

assertion (!kernel.is_cuda()) failed. kernel must be CPU

I'm using the Sparse ResNet from the repo examples, but when I load the variables with CUDA and try to run the forward pass, it throws this error.

To Reproduce
Steps to reproduce the behavior.

from models.resnet import *
import MinkowskiEngine as ME
import torch

if args.use_cuda:
    dtype = torch.cuda.FloatTensor
    device = torch.device("cuda")
    torch.cuda.set_device(0)
    print('GPU')
else:
    dtype = torch.FloatTensor
    device = torch.device("cpu")

net = SparseResNet14(in_channels=4, out_channels=args.feature_size).type(dtype)

...

net(x)

The data is from the ModelNet40 dataset.

Desktop (please complete the following information):

  • OS: Ubuntu 20.04
  • Python version: 3.6
  • CUDA version: 11.1
  • NVIDIA Driver version: 455.45.01
  • Minkowski Engine version 0.5.0
  • Output of the following command. (If you installed the latest MinkowskiEngine, simply call MinkowskiEngine.print_diagnostics())
wget -q https://raw.githubusercontent.com/NVIDIA/MinkowskiEngine/master/MinkowskiEngine/diagnostics.py ; python diagnostics.py

==========System==========
Linux-5.4.0-58-generic-x86_64-with-glibc2.29
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.1 LTS"
3.8.5 (default, Jul 28 2020, 12:59:40)
[GCC 9.3.0]
==========Pytorch==========
1.7.1
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 455.45.01
CUDA Version 11.1
VBIOS Version 90.04.7A.80.B2
Image Version G001.0000.02.04
==========NVCC==========
/usr/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
==========CC==========
/usr/bin/c++
c++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

==========MinkowskiEngine==========
/home/lucas/PhD/Implementations/SparseConvModels/sparseconv_venv/lib/python3.8/site-packages/MinkowskiEngine/__init__.py:36: UserWarning: The environment variable OMP_NUM_THREADS not set. MinkowskiEngine will automatically set OMP_NUM_THREADS=16. If you want to set OMP_NUM_THREADS manually, please export it on the command line before running a python script. e.g. export OMP_NUM_THREADS=12; python your_program.py. It is recommended to set it below 24.
warnings.warn(
0.5.0
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 10010
CUDART version MinkowskiEngine is compiled: 10010

Additional context
Add any other context about the problem here.

@chrischoy
Contributor

chrischoy commented Jan 15, 2021

Make sure to create a sparse tensor on the GPU.

model = ResNet().cuda()
sinput = ME.SparseTensor(..., device="cuda")
model(sinput)

@chrischoy
Contributor

chrischoy commented Jan 15, 2021

Also, I noticed that the CUDA version used to compile ME is 10.1, but PyTorch is using 11.0. This mismatch could cause problems.

Try to compile ME with

export CUDA_HOME=/usr/local/cuda-11.1; pip install MinkowskiEngine -v --no-deps
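
The version mismatch mentioned above can be read directly from the diagnostics output. CUDA encodes its version as a single integer, 1000 * major + 10 * minor, so the `10010` reported for MinkowskiEngine means 10.1. The following is only a minimal sketch of that comparison; the helper names `decode_cuda_version` and `parse_release` are hypothetical and not part of MinkowskiEngine or PyTorch.

```python
from typing import Tuple


def decode_cuda_version(code: int) -> Tuple[int, int]:
    """Decode CUDA's integer encoding (1000 * major + 10 * minor).

    Hypothetical helper for illustration, not a MinkowskiEngine API.
    """
    major, remainder = divmod(code, 1000)
    return major, remainder // 10


def parse_release(text: str) -> Tuple[int, int]:
    """Parse a dotted release string such as "11.1" into (major, minor)."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


# Values copied from the diagnostics posted in this issue.
compiled_with = decode_cuda_version(10010)  # "NVCC version MinkowskiEngine is compiled"
reported_by_driver = parse_release("11.1")  # "CUDA Version" from nvidia-smi

if compiled_with != reported_by_driver:
    print(
        "MinkowskiEngine was compiled with CUDA %d.%d, but the system reports %d.%d"
        % (compiled_with + reported_by_driver)
    )
```

If the two tuples differ, rebuilding with `CUDA_HOME` pointing at the matching toolkit, as in the command above, is the suggested remedy.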

@volvox292

Hi,
I am getting the same error when trying a convolution! Moving the tensor to the GPU works just fine, but then the convolution fails. On the CPU it works without a problem.
Thanks for your help!

A = ME.SparseTensor(coordinates=coords, features=feats, device='cuda')
conv = ME.MinkowskiConvolution(in_channels=1, out_channels=128, kernel_size=3, stride=2, dimension=2)
B = conv(A)
B

RuntimeError: /tmp/pip-req-build-qvlqpmi2/src/convolution_gpu.cu:65, assertion (kernel.is_cuda()) failed. kernel must be CUDA

==========System==========
Linux-3.10.0-1160.53.1.el7.x86_64-x86_64-with-debian-buster-sid
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.5 LTS"
3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0]
==========Pytorch==========
1.9.0
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 510.39.01
CUDA Version 11.6
VBIOS Version 88.00.48.00.02
Image Version G500.0202.00.02
GSP Firmware Version N/A
==========NVCC==========
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
==========CC==========
CC=icpc
==========MinkowskiEngine==========
0.5.4
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 11010
CUDART version MinkowskiEngine is compiled: 11010

@jrockholt3

I am getting this error when using the Pruning function. All my tensors in the forward pass are on 'cuda', but when pruning, keep cuda fails.

@pphuangyi

I encountered the same problem and the cause of my problem is probably not very common but I want to share it anyway in the hope that it might help someone out there 😸

My server has multiple GPU cards, and CUDA_VISIBLE_DEVICES=[id] doesn't point to the GPU id shown in nvidia-smi unless I include os.environ['CUDA_DEVICE_ORDER'] = "PCI_BUS_ID" at the beginning of my script to align them.

After I added the above line, I can do something like:

id = 0
device = f'cuda:{id}'
model = ResNet().to(device)
sinput = ME.SparseTensor(..., device=device)
model(sinput)

And the problem is solved.

I am very naive with respect to computer hardware, so my guess might be totally wrong. But I feel that the problem might be that the network and the input were sent to different GPU cards when the ids were not aligned, so they simply didn't find each other.
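
For readers who want to try the device-ordering fix described above, here is a minimal sketch. It assumes the environment variables are set before CUDA is initialised (in practice, before `torch` or `MinkowskiEngine` is imported). The helper name `select_gpu` is hypothetical. Unlike the snippet above, it also restricts the process to a single card, so inside the process that card is always `cuda:0`.

```python
import os


def select_gpu(index: int) -> str:
    """Expose only one physical GPU, numbered as in nvidia-smi.

    Hypothetical helper for illustration. It must run before CUDA is
    initialised, i.e. before importing torch or MinkowskiEngine.
    """
    # Enumerate devices by PCI bus id, the order nvidia-smi uses,
    # instead of CUDA's default "fastest first" order.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Hide every other card from this process.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    # With a single visible device, it is always index 0 in-process.
    return "cuda:0"


device = select_gpu(2)  # 2 is the id shown by nvidia-smi (example value)
print(device)
```

The returned string can then be passed to both `model.to(device)` and `ME.SparseTensor(..., device=device)` so that the network and the input end up on the same card.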
