DSVM Ubuntu Spark version upgrade

  • Question

  • Hello, I am very new to the DSVM and to data science tools. Has anyone tried to upgrade the Spark version on the DSVM from 2.1.1 to 2.3.0? We need a specific function that is already fixed in that version, and I would like to ask how to properly upgrade the Spark version on the DSVM.

    Any guidance will be greatly appreciated.

    -Ryan

    Friday, March 16, 2018 8:07 AM

Answers

  • Hi Paul

    Yes, it's Ubuntu Linux.

    Here are the detailed steps of what I did:

    1. Extract the latest version of Spark under /dsvm/tools/spark.

    2. Update the "current" symbolic link to point to the newly created directory for Spark v2.3. (This symbolic link appears to be used by the Jupyter notebook.)

    current -> /dsvm/tools/spark/spark-2.3.0-bin-hadoop2.7/

    3. Create a symbolic link for each Azure-related jar from its common directory; the ones I found are:

    /opt/adls-jars/
    /opt/azure-storage-jars/
    /opt/DocumentDB-jars/

    4. Update the Spark profile.

    I would also like to know if there's anything I missed in these steps.

    So far we haven't encountered any issues running our code written for version 2.1 with this configuration.
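    Put together, the four steps above might look roughly like this (a sketch only: the exact archive name, jar filenames, and the target directory for the linked jars are assumptions, and the commands have not been verified on a DSVM):

    ```shell
    # 1. Extract the new Spark release under /dsvm/tools/spark
    sudo tar -xzf spark-2.3.0-bin-hadoop2.7.tgz -C /dsvm/tools/spark

    # 2. Repoint the "current" symlink (apparently used by Jupyter) at the new version
    cd /dsvm/tools/spark
    sudo ln -sfn spark-2.3.0-bin-hadoop2.7 current

    # 3. Link the Azure-related jars into the new installation's jars directory
    #    (assuming current/jars/ is where Spark picks them up)
    for d in /opt/adls-jars /opt/azure-storage-jars /opt/DocumentDB-jars; do
        sudo ln -sf "$d"/*.jar /dsvm/tools/spark/current/jars/
    done

    # 4. Then update the Spark profile (step 4 above)
    ```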

    -Ryan

    Tuesday, March 20, 2018 6:41 AM

All replies

  • Hi Ryan,

    Is this on Linux? You can download 2.3.0 from here. I think it will work if you extract it over the version that is already in place. We change a few things so Spark can talk to blob storage and use a different location as scratch space, and I think extracting over the version there will keep those changes in place. (If that doesn't work, try deleting the contents of /dsvm/tools/spark and starting from scratch.) 
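    On an Ubuntu DSVM, downloading and extracting over the existing installation might look like this (a sketch: the Apache archive URL is the standard public mirror for old releases, but the build variant you need may differ):

    ```shell
    # Download Spark 2.3.0 pre-built for Hadoop 2.7 from the Apache archive
    wget https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz

    # Extract over the version already in place under /dsvm/tools/spark
    sudo tar -xzf spark-2.3.0-bin-hadoop2.7.tgz -C /dsvm/tools/spark
    ```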


    Monday, March 19, 2018 11:47 PM
    Owner
  • (Ryan's detailed steps, marked as the answer above.)

    Tuesday, March 20, 2018 6:41 AM
  • This looks good.

    We edit two configuration files. spark-env.sh moves the scratch space to the user's directory so there are no permission issues when multiple users run jobs, and spark-defaults.conf sets the driver memory and loads the jars. These might be covered under #4, "update the spark profile"; I just wanted to be sure we were on the same page.
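    The two files mentioned might contain entries along these lines (illustrative values only, not the DSVM's actual settings):

    ```shell
    # conf/spark-env.sh -- per-user scratch space to avoid permission clashes
    export SPARK_LOCAL_DIRS="${HOME}/spark-scratch"

    # conf/spark-defaults.conf -- driver memory and extra jars
    # (shown here as comments; this file uses "key value" syntax, not shell)
    #   spark.driver.memory   4g
    #   spark.jars            /opt/azure-storage-jars/example.jar
    ```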

    Glad this is working for you.

    Thursday, March 22, 2018 12:25 AM
    Owner
  • Hi

    Thank you for giving me other points to look at.

    I'm referring to /etc/profile.d/spark.sh in item #4 :) because instead of overwriting the existing version, I just extracted the archive file under /dsvm/tools/spark.
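    For reference, the change to /etc/profile.d/spark.sh might be as small as pointing SPARK_HOME at the new location (illustrative; the actual file on the DSVM may set more variables):

    ```shell
    # /etc/profile.d/spark.sh (illustrative sketch)
    export SPARK_HOME=/dsvm/tools/spark/current
    export PATH="$SPARK_HOME/bin:$PATH"
    ```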

    -Ryan
    • Edited by Ryz_24 Friday, March 23, 2018 3:56 AM
    Thursday, March 22, 2018 6:01 AM