none
Accessing Excel file using Excel Interop COM API RRS feed

  • Question

  • Hello,

    I would like to deploy a job on the HPC 2012. The job requires to read some input from some Excel files. I have implemented this using the Excel interop COM API with C#. This works just fine if I run the program interactively on any computer that has Excel installed. But when running the same program as a batch job in HPC it always produces an exception:

    Microsoft Office Excel can't access the file
    
    \\TASQL\3aDaten\TestCluster\ExcelTest\Test.xlsx. There is multiple possibilities :
    
        The file's name or the path doesn't existe
        The file is currently used par another program
        The workbook you are trying to use has the same name of another workbook already open

    The code I am using is something very close to:

    IExcel._Application application = new IExcel.Application();
    IExcel.Workbooks workbooks = null;
    IExcel._Workbook workbook = null;
    IExcel.Sheets worksheets = null;
    IExcel.Worksheet worksheet = null;
    
    application = new IExcel.Application();
    application.Visible = false;
    application.UserControl = false;
    application.DisplayAlerts = false;
    
    workbooks = application.Workbooks;
    try
    {
       workbook = workbooks.Open(file, 0, false, 3, "", "", missing, missing, missing, missing, missing, missing, missing, missing, missing);
    }
    catch (Exception ee)
    {
       log.WriteLine("ERROR: open workbook" + ee.Message);
       log.Flush();
    }
    worksheets = workbook.Worksheets;
    worksheet = worksheets.get_Item(1);
    IExcel.Range range = worksheet.get_Range("A1:B2");
    IExcel.Range cells = range.Cells;
    System.Array data = (System.Array)cells.Value2;
    Console.WriteLine("Cell: " + data.GetValue(1, 1).ToString());

    On the HPC I have Office 2010 installed. I performed the "Excel Workbook Configuration Test" and the result is success.

    Could someone point me what the problem is and how could I make this work?

    Thanks in advance!

    Thursday, November 19, 2015 11:15 AM

Answers

  • Hi Valentin S,

    You may try following two things,

    1. Open the workbook with the argument list as below,

       Application xlApp = new Application();
                Workbook workbook = xlApp.Workbooks.Open(
                    @"c:\test.xlsx",
                    0,
                    true,
                    5,
                    null,
                    null,
                    true,
                    XlPlatform.xlWindows,
                    string.Empty,
                    false,
                    false,
                    0,
                    false,
                    false,
                    XlCorruptLoad.xlNormalLoad);

       Check here for the API reference.

    2. Run the HPC job under an interactive remote desktop or console session by setting the HPC_ATTACHTOSESSION or HPC_ATTACHTOCONSOLE job environment variable with value True when submitting the job.

        For details, pelase refer this link.

       Note, an interactive session would be required when there are too many e.g. 20 workbook instances opened at the same time on a same machine.

    BR,

    Yutong Sun

    • Marked as answer by Valentin S Friday, November 27, 2015 9:17 AM
    Thursday, November 19, 2015 3:36 PM
    Moderator

All replies

  • Hi Valentin S,

    You may try following two things,

    1. Open the workbook with the argument list as below,

       Application xlApp = new Application();
                Workbook workbook = xlApp.Workbooks.Open(
                    @"c:\test.xlsx",
                    0,
                    true,
                    5,
                    null,
                    null,
                    true,
                    XlPlatform.xlWindows,
                    string.Empty,
                    false,
                    false,
                    0,
                    false,
                    false,
                    XlCorruptLoad.xlNormalLoad);

       Check here for the API reference.

    2. Run the HPC job under an interactive remote desktop or console session by setting the HPC_ATTACHTOSESSION or HPC_ATTACHTOCONSOLE job environment variable with value True when submitting the job.

        For details, pelase refer this link.

       Note, an interactive session would be required when there are too many e.g. 20 workbook instances opened at the same time on a same machine.

    BR,

    Yutong Sun

    • Marked as answer by Valentin S Friday, November 27, 2015 9:17 AM
    Thursday, November 19, 2015 3:36 PM
    Moderator
  • Here is a useful reply from VijayRock on another thread. You may find the class in the assembly Microsoft.Hpc.Excel.dll from the HPC Pack SDK or under the GAC location %systemdrive%\Windows\Microsoft.NET\assembly\GAC_MSIL\Microsoft.Hpc.Excel.

    >>Microsoft.Hpc.Excel.ExcelDriver: provides a wrapper around the Excel Primary Interop Assembly (PIA) interface that enables managed code to interact with a Microsoft Officeapplication's COM-based Microsoft.Hpc.Excel.ExcelDriver, which allows a user to open a workbook, launch an Excel process, and invoke a macro.

         
    >>Friday, November 20, 2015 6:07 AM

    BR,

    Yutong Sun

    Wednesday, November 25, 2015 4:09 AM
    Moderator
  • Did you imported Microsoft.Hpc.Excel Interop to your class? Otherwise, you have to call the entire object hierarchy everytime
    Wednesday, November 25, 2015 7:20 PM
  • Thanks for the answers! And sorry for my late reply (have been deviated at work with other matters).

    I have tried your first solution and I can confirm that it works:

    * using the arguments in the Workbooks.Open() function as suggested by you.

    * using the HPC_ATTACHTOSESSION=true environment variable.


    Regarding the second solution, I havent tried it since I would have to change a lot of my existing code. But I think it is a goot tip and I will investigate it more in the future in case I encounter problems again.

    Friday, November 27, 2015 9:36 AM
  • So without an interactive remte desktop session, the limit of workbook instances was on the compute node instead of the head node? Suppose I have a cluster with 20 nodes and each node has only 12 cores, such a cluster won't encounter "unreadable content" issue, am I right?
    Monday, March 28, 2016 9:44 AM
  • Hi MChen19th,

    That's right. The limit is on the compute node. The "readable content" issue may occur when there are too many Excel instances, each for one task/core, on a compute node. This issue can be eliminated by running the Excel instances in an interactive session.

    BR,

    Yutong Sun  

    Monday, March 28, 2016 12:26 PM
    Moderator
  • Okay, thanks.

    I was checking the document of https://technet.microsoft.com/en-us/library/ff919452.aspx

    There was a property "Cores per node", so if I allocate 120 cores to an Excel job and set the Cores per node=12,  Number of nodes = 10, so the job would be running on 10 nodes and each node would only have 12 excel instances running, and this could eliminate the issue as well, right?

    I know that for ISchedulerJob there are MaxCoresPerNode and MaxCoresPerNode properties, but for Excel job we submit job by using BrokerClient<T> and Session, seems no such properties in this way.

    Sorry to bother you again. The reason that I'm asking this was because we have some issue from IT division that the "remote desktop session" solution is not allowed, so I have to figure out another approach.


    • Edited by MChen19th Tuesday, March 29, 2016 1:22 AM
    Tuesday, March 29, 2016 1:06 AM
  • Hi MChen19th,

    Yes, you may use MaxCoresPerNode job property for node selection to choose the nodes with less cores. When running Excel jobs by SOA Session APIs or Excel VBA Macros, this job property can be set through the "Maximum Cores Per Node" property of the job template with which the SOA Session job is submitted.

    Another solution could be using the under-subscribe cores feature (see how-to) if you have admin rights on the cluster. 

    BR,

    Yutong Sun

    Tuesday, March 29, 2016 6:59 AM
    Moderator
  • Hi Yutong,

    Not sure if you had ever seen this reply, so I'm just trying to post my question here. I'd appreciate it very much if you can take a look my issue.

    I don't think I can use the under-subscribe feature because the nodes run different kind of jobs, not only excel jobs.

    Regarding MaxCoresPerNode, I am a little confused. You said "use MaxCoresPerNode job property for node selection to choose the nodes with less cores", so this property is not to limit the number of cores on a node allocated? it is used to filter compute nodes? If we have 10 nodes, each of them has 32 cores, does this mean this property won't be useful in this case?

    Looking at the doc https://msdn.microsoft.com/en-us/library/microsoft.hpc.scheduler.schedulerjob.maxcorespernode.aspx, it says

        Retrieves or sets the maximum number of cores that a node can have for the job to run on it.

    ...seems different from what you said, I'm a little confused.

    As I said I didn't find a way to use the property when submitting an Excel job because the SessionStartInfo doesn't have such property. However, when I tried to use job template, there was also no such property, is that available for HPC pack 2008 R2?

       

    Friday, April 1, 2016 12:59 AM
  • Hi MChen19th,

    The MaxCoresPerNode job property is used to filter the nodes which has no more than the specified number of cores. If you specify the number as 16, but all your nodes have 32 cores each, then the job would fail because the scheduler cannot find any node has only 16 or less number of cores.

    For the "Maximum Cores Per Node" property of the job template, it is available in HPC Pack 2012 R2.

    BR,

    Yutong Sun

    Saturday, April 2, 2016 3:32 AM
    Moderator
  • Thanks Yutong, if so I think this is not the right way to go.

    So for either HPC 2012 or HPC 2008 R2,  there is no way to run a job with too many (e.g., 24) excel instances on a single node without a remote session, is my understanding correct? Even in HPC 2012, set HPC_ATTACHTOSESSION=True, still needs a remote desktop on the compute nodes that run excel jobs. If the IT division doesn't allow to create remote desktop session, upgrading the HPC cluster to 2012 R2 won't help, right?

    Monday, April 4, 2016 3:02 PM
  • Hi MChen19th,

    If upgraded to HPC Pack 2012 R2, you may use the "Maximum Cores Per Node" property of the job template to choose the nodes with less cores for your Excel workload, so that there is no need to set HPC_ATTACHTOSESSION=True and create user log-on session.

    You may also try to under subscribe sockets instead of cores, and use socket as resource type for your Excel workload. So other jobs using cores won't be affected.

    BR,

    Yutong Sun

    Tuesday, April 5, 2016 6:23 AM
    Moderator