Cannot find libcublas.so.9.0 with nvidia-docker and DSVM RRS feed

  • Question

  • Hi,

    I'm trying to run a script using Azure ML Workbench and tensorflow on a DSVM with a GPU.

    When I run the script, I get the following error: 'ImportError: libcublas.so.9.0: cannot open shared object file'

    The full trace is available here: pastebin.com/mtpgRghX (cannot use hyperlink, sorry)

    I'm not sure whether it's a CUDA problem (CUDA 9.0 is installed on the DSVM), a problem with the Docker image (mmlspark:plus-gpu-0.10.9), or a mismatch with my tensorflow version (I use tensorflow-gpu 1.7).

    Any advice on this?

    Friday, April 27, 2018 9:42 AM

Answers

  • Ok, solved this, posting my solution for anyone having a similar problem. The main issue was with my 'conda_dependencies.yml' file.

    Originally, tensorflow was installed with conda, but that installed an old version (1.3), so I used pip to get version 1.7. Then, when I tried to install tensorflow-gpu with pip, it did not work (I suppose because pip has more trouble with non-Python dependencies?).
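
    When conda and pip have both been used, it can be unclear which TensorFlow distribution is actually in the environment. A minimal sketch of one way to check, assuming Python 3.8 or newer (for `importlib.metadata`) and that the packages were installed with standard metadata; the helper name `installed_versions` is hypothetical:

```python
# Sketch: report which TensorFlow distributions are visible in the current
# Python environment. `installed_versions` is a hypothetical helper name.
from importlib import metadata


def installed_versions(names=("tensorflow", "tensorflow-gpu")):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(name, "->", version or "not installed")
```

    If both names report a version, the CPU and GPU builds are installed side by side, which may be worth cleaning up before debugging further.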

    Anyway, I found a way to add more channels to the conda dependencies file:

    > name: project_environment
    > channels:
    > - conda-forge
    > - defaults
    > - anaconda
    > dependencies:
    > ...
    > - tensorflow-gpu

    So it seems to be working now. The key was to install tensorflow-gpu with 'conda install' rather than pip, and to add extra channels to get an up-to-date version.
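
    To confirm from inside the container that the CUDA library named in the error can be opened, a small check along these lines may help. This is only a sketch using `ctypes` from the standard library; the helper name `can_load` is hypothetical and not part of any Azure ML or TensorFlow API:

```python
# Sketch: ask the dynamic loader whether a shared library can be opened.
# `can_load` is a hypothetical helper name.
import ctypes


def can_load(library_name):
    """Return (True, None) if the library loads, else (False, loader error text)."""
    try:
        ctypes.CDLL(library_name)
    except OSError as exc:
        # Raised when the loader cannot find or open the shared object.
        return False, str(exc)
    return True, None


if __name__ == "__main__":
    ok, error = can_load("libcublas.so.9.0")
    print("libcublas.so.9.0 loadable:", ok)
    if not ok:
        print("loader error:", error)
```

    Running it with the same Python interpreter that runs the training script shows whether the loader in that environment can see the library, independently of TensorFlow.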

    Friday, April 27, 2018 12:35 PM

All replies

  • Glad you figured it out! For future reference, AzureML's support forum is here:

    https://social.msdn.microsoft.com/forums/azure/en-US/home?forum=MachineLearning

    Friday, April 27, 2018 8:34 PM