Spark kernel not working with Jupyter on a fresh install of the newer Ubuntu 18.04 (preview) DSVM

  • Question

  • Hello,

    After provisioning a new DSVM with the newer Ubuntu 18.04 (preview) image, Spark doesn't work out of the box with Jupyter.

    Jupyter gives this general error when trying to use the Spark (local) kernel with the DSVM's Spark sample notebook "pySpark 2.0 modeling.ipynb":
    "A connection to the notebook server could not be established. The notebook will continue trying to reconnect. Check your network connection or notebook server configuration."

    One of the docs suggests enabling "a local single node Hadoop HDFS and Yarn instance":
    the "dsvm-tools-data-platforms" page on the Microsoft docs site (can't post the link as a new member)

    # Generate a passwordless SSH key for the hadoop user and authorize it for localhost logins
    echo -e 'y\n' | ssh-keygen -t rsa -P '' -f ~hadoop/.ssh/id_rsa
    cat ~hadoop/.ssh/id_rsa.pub >> ~hadoop/.ssh/authorized_keys
    chmod 0600 ~hadoop/.ssh/authorized_keys
    chown hadoop:hadoop ~hadoop/.ssh/id_rsa
    chown hadoop:hadoop ~hadoop/.ssh/id_rsa.pub
    chown hadoop:hadoop ~hadoop/.ssh/authorized_keys
    # Start the single-node HDFS and YARN services
    systemctl start hadoop-namenode hadoop-datanode hadoop-yarn


    But those commands from the docs don't work... there is no ~hadoop home directory (so apparently no hadoop user at all).
    And the only directories named "hadoop" on the VM are inside tensorflow or xgboost subdirectories.
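
    For example, a quick check along these lines (my own sanity check, not from the docs) should show whether the hadoop user and its home directory exist at all:

    # Does a hadoop user account exist?
    getent passwd hadoop
    # Does its home directory exist?
    ls -ld ~hadoop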


    I also tried the older (not preview) Ubuntu DSVM, and the Spark (local) kernel in Jupyter made it further, but still had errors on the sample.


    Spark seems to be a heavily featured package in the DSVM, so I'm not sure why it doesn't work more smoothly "out of the box". Maybe I'm missing something simple here?

    And is it easier to use Spark on the newer (preview) or older Ubuntu DSVM?  Which DSVM should I be using here?

    Thanks!
    Rich
    Saturday, January 25, 2020 7:58 PM

Answers

  • Hi,

    Thanks for trying out the preview image, and thanks for reporting this issue. You can fix this issue on your existing preview image by pointing the kernel to the right conda environment: edit the file /usr/local/share/jupyter/kernels/spark-3-python/kernel.sh and replace "py37_base" with "py37_default". There are two occurrences you need to replace.
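
    For example, a single sed one-liner along these lines should make both replacements in place (assuming you have sudo rights on the VM):

    # Replace both occurrences of the old conda environment name in the kernel script
    sudo sed -i 's/py37_base/py37_default/g' /usr/local/share/jupyter/kernels/spark-3-python/kernel.sh

    You may need to restart the kernel (or the notebook server) for the change to take effect.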

    We will publish an updated image soon. 

    We recommend that you use the preview image going forward.

    -Paul

    • Marked as answer by richds Friday, January 31, 2020 2:01 PM
    Wednesday, January 29, 2020 12:49 AM

All replies

  • Hello Paul,

    That worked, Thanks!

    Hey, just a very minor thing for when you publish the new image:
    After changing kernel.sh, Jupyter prompted me with a question about which kernel to use (it wasn't a typical Jupyter "popup message").
    No biggie, it's not prompting me anymore! 

    Thank you!
    -Rich

    Friday, January 31, 2020 2:01 PM