Ubuntu DSVM - PySpark 2.0 modeling notebook - runtime error (details below)


  • Hi, I just created the Ubuntu-based DSVM, opened the Jupyter notebooks, and ran the PySpark 2.0 modeling notebook (which uses the NYC Taxi dataset). It fails at cell 16 with a very long runtime error. Searching through it, I found this line, which I hope helps pin down the issue:

    Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /home/jw/notebooks/SparkML/pySpark/metastore_db.

    Also, on the same system I started SparkR and ran a simple test:

    head(faithful) # works fine

    sparkfaithful <- createDataFrame(faithful) #try to create a spark dataframe

    That produces an error several hundred lines long, and I'm not even sure I captured all of it. Something seems to be going wrong systemically at the Spark level.

    FWIW, notebooks or code that don't touch Spark behave just fine.




    I rebooted the VM and reran the notebook cell by cell instead of using "Run All". That succeeded, as did a subsequent "Run All". My best guess is that something went wrong during the original boot or when Spark first loaded.

    • Edited by jimwill Monday, April 24, 2017 4:16 PM update
    Friday, April 21, 2017 12:37 PM


All replies

  • Hi Jim,

    Thanks for reaching out. Glad to hear you got it working. We've seen this error before when multiple processes access the same metastore database; embedded Derby allows only one process to boot a given database at a time. Perhaps you somehow ended up with multiple Jupyter processes? Regardless, glad it's resolved.
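    If you suspect two sessions are competing for the same metastore, one quick check is whether Derby's lock file is present in the metastore_db directory next to the notebook. This is a minimal sketch, assuming Spark created metastore_db in the notebook's working directory (as the path in the error suggests) and that Derby marks a booted database with a db.lck file (plus dbex.lck on some platforms). The function name derby_lock_present is hypothetical. A lock file can also be left behind after a crash, so a positive result is a hint to look for another Spark or Jupyter process, not proof of one.

```python
import os
import sys


def derby_lock_present(notebook_dir: str) -> bool:
    """Return True if a Derby lock file exists in <notebook_dir>/metastore_db.

    Assumption: Derby writes db.lck (and dbex.lck on some platforms) into the
    database directory while a process has it booted. A stale file left
    after a crash also makes this return True.
    """
    db_dir = os.path.join(notebook_dir, "metastore_db")
    return any(
        os.path.exists(os.path.join(db_dir, name))
        for name in ("db.lck", "dbex.lck")
    )


if __name__ == "__main__":
    # Check the directory given on the command line, or the current one.
    path = sys.argv[1] if len(sys.argv) > 1 else os.getcwd()
    if derby_lock_present(path):
        print("metastore_db appears locked; look for another Spark/Jupyter session")
    else:
        print("no Derby lock file found")
```

    Usage: run it against the notebook folder from the error, e.g. /home/jw/notebooks/SparkML/pySpark, before starting a second session.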

    Please let us know if you encounter any other issues.

    Tuesday, April 25, 2017 12:09 AM
  • Yes, that is possible. I was demoing it at a meetup and probably had two Jupyter sessions: one on the local DSVM and a remote one from my desktop browser. Thanks.



    Friday, April 28, 2017 3:27 PM