DSVM Ubuntu Spark version upgrade

Question
-
Hello, I am very new to the DSVM and to data science tools. Has anyone tried to upgrade the Spark version of the DSVM from 2.1.1 to 2.3.0? We need a specific function with a fix that is already included in that version, and I would like to know how to properly upgrade the Spark version of the DSVM.
Any guidance will be greatly appreciated.
-Ryan
Friday, March 16, 2018 8:07 AM
All replies
-
Hi Ryan,
Is this on Linux? You can download 2.3.0 from here. I think it will work if you extract it over the version that is already in place. We change a few things so Spark can talk to blob storage and use a different location as scratch space, and I think extracting over the existing version will keep those changes in place. (If that doesn't work, try deleting the contents of /dsvm/tools/spark and starting from scratch.)
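A minimal sketch of that approach (the Apache archive URL is standard for this release, but the original link is not preserved here, and stripping the top-level directory is just one way to read "extract it over the version that is already in place"):

# Download Spark 2.3.0 from the Apache release archive
cd /tmp
wget https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz
# Extract over the existing install: strip the archive's top-level
# directory so the files land directly in the current Spark directory
sudo tar -xzf spark-2.3.0-bin-hadoop2.7.tgz --strip-components=1 -C /dsvm/tools/spark/current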
- Edited by Paul Shealy [MSFT] Monday, March 19, 2018 11:47 PM
Monday, March 19, 2018 11:47 PM
-
Hi Paul,
Yes, it's Ubuntu Linux.
Here are the detailed steps of what I did (a command sketch follows the list):
1. Extract the latest version of Spark under /dsvm/tools/spark.
2. Update the "current" symbolic link to point to the newly created directory for Spark 2.3 (this symbolic link seems to be used by the Jupyter notebook):
current -> /dsvm/tools/spark/spark-2.3.0-bin-hadoop2.7/
3. Create a symbolic link for each of the Azure-related jars from the common directories I found:
/opt/adls-jars/
/opt/azure-storage-jars/
/opt/DocumentDB-jars/
4. Update the Spark profile.
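Roughly, those steps map to shell commands like the following (a sketch only: the jar filenames are illustrative, and whether the links go into Spark's jars/ directory or the jars are loaded via spark-defaults.conf is an assumption):

cd /dsvm/tools/spark
# 1. Extract the new release next to the old one
sudo tar -xzf /tmp/spark-2.3.0-bin-hadoop2.7.tgz
# 2. Repoint the "current" symlink (used by the Jupyter notebook)
sudo ln -sfn /dsvm/tools/spark/spark-2.3.0-bin-hadoop2.7 current
# 3. Link the Azure-related jars into the new install
#    (check each /opt directory with ls for the actual filenames)
sudo ln -s /opt/adls-jars/*.jar current/jars/
sudo ln -s /opt/azure-storage-jars/*.jar current/jars/
sudo ln -s /opt/DocumentDB-jars/*.jar current/jars/
# 4. Update the Spark profile (see /etc/profile.d/spark.sh later in the thread)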
I would also like to know if there is anything I missed in these steps.
So far we haven't encountered any issues running our code written under version 2.1 with this configuration.
-Ryan
- Marked as answer by Paul Shealy [MSFT] Thursday, March 22, 2018 6:08 PM
Tuesday, March 20, 2018 6:41 AM
-
This looks good.
We edit two configuration files. spark-env.sh moves the scratch space to the user's directory so there are no permission issues when multiple users run jobs, and spark-defaults.conf sets the driver memory and loads the jars; a sketch of both is below. These might be covered under #4, "update the spark profile"; I just wanted to be sure we were on the same page.
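For illustration, the relevant entries might look something like this (SPARK_LOCAL_DIRS, spark.driver.memory, and spark.jars are standard Spark settings, but the exact values and jar paths the DSVM ships are assumptions):

# spark-env.sh: move scratch space into the user's home directory
# so concurrent users don't collide on permissions
export SPARK_LOCAL_DIRS=$HOME/spark-scratch

# spark-defaults.conf: driver memory and the Azure jars
# (value and jar filenames illustrative)
spark.driver.memory 4g
spark.jars /opt/adls-jars/example.jar,/opt/azure-storage-jars/example.jar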
Glad this is working for you.
Thursday, March 22, 2018 12:25 AM
-
Hi,
Thank you for giving me other points to look at.
I'm referring to /etc/profile.d/spark.sh in item #4 :) because instead of overwriting the existing version, I just extracted the archive file to /dsvm/tools/spark.
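For context, a script like /etc/profile.d/spark.sh usually just exports the Spark environment on login; something along these lines (the DSVM's actual contents are an assumption):

# Point SPARK_HOME at the "current" symlink so an upgrade only
# requires repointing the link
export SPARK_HOME=/dsvm/tools/spark/current
export PATH=$SPARK_HOME/bin:$PATH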
-Ryan
- Edited by Ryan_Elfa Friday, March 23, 2018 3:56 AM
Thursday, March 22, 2018 6:01 AM