File system inconsistency within an MPI application

  • Question

  • Hi all,

    I have an inconsistent file view problem with my Fortran MPI application running on a Windows HPC cluster. The application runs in a global NTFS network directory shared by the head node. The program is as simple as the following code block shows.

        program File_open
    
        implicit none
    
        include 'mpif.h'
    
        ! Variables
        integer:: ierr, ios, myid, size
        ! Body of File_open
        
        call MPI_INIT(ierr)
        call MPI_COMM_RANK(MPI_COMM_WORLD,myid,ierr)
        call MPI_COMM_SIZE(MPI_COMM_WORLD,size,ierr)
        
        if( myid .eq. 0) then
            open(unit=103,File='test.data',STATUS='REPLACE')
            write(103,*) "Number of processes: ", size ," myid: ", myid
            close(103)
        endif
    
        call MPI_BARRIER(MPI_COMM_WORLD,ierr)
    
        open(unit=104,File='test.data',STATUS='OLD',ACTION='read',IOSTAT=ios)
        if( ios .ne. 0 ) then
            write(*,*) "Process ", myid, " Error: File test.data does not exist"
        end if
        close(104)
        
        call MPI_FINALIZE(ierr)
        
        end program File_open

    The symptom is that every process running on a node other than the one hosting process zero does not see the file and reports an error. When I insert a sleep of 10 seconds after the MPI_BARRIER, everything is fine, but inserting a sleep is not an option. Any idea where the inconsistency comes from and how to fix it? We believe it's a synchronization issue in the file service providing the resources.

    Thanks in advance,
    Jens
    Friday, December 11, 2009 3:08 PM

Answers

  • Hi Jens,

    Another thing to try is to create the file, say test.data, beforehand and then run your program. This is just to confirm that the problem really does come from the sync delay. By the way, what does the specific ios value indicate? Is it that the file doesn't exist, or could it be something else?

    Thanks,
    James
    • Marked as answer by Don Pattee Wednesday, January 12, 2011 2:50 AM
    Wednesday, January 20, 2010 12:39 AM
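
To find out what the ios value means, the failing OPEN can be asked for the runtime's own error text. The following is a minimal sketch, assuming a compiler that supports the Fortran 2003 IOMSG= specifier and the Fortran 2008 NEWUNIT= specifier. The module name open_diag and the subroutine try_open are hypothetical names chosen for illustration, and both the numeric IOSTAT codes and the message wording are compiler-specific.

```fortran
module open_diag
    implicit none
contains
    ! Hypothetical helper: try to open `fname` read-only and report both
    ! the IOSTAT code and the runtime's error message.
    subroutine try_open(fname, ios, msg)
        character(len=*), intent(in)  :: fname
        integer,          intent(out) :: ios
        character(len=*), intent(out) :: msg
        integer :: u

        ! IOMSG= is only assigned when an error occurs, so clear it first.
        msg = ''
        open(newunit=u, file=fname, status='old', action='read', &
             iostat=ios, iomsg=msg)
        ! Close the unit again only if the open actually succeeded.
        if (ios == 0) close(u)
    end subroutine try_open
end module open_diag
```

In the program from the question, calling try_open('test.data', ios, msg) after the barrier and printing ios and trim(msg) together with myid would show whether the failure really is "file not found" or something else, such as a sharing or permission error.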

All replies

  • Hi Jens,

    I agree that it is probably due to a synchronization delay in your file service. Have you tried prefacing your file OPEN (after the barrier) with a loop around an INQUIRE statement checking whether the file exists?

    Regards,

    Patrick
    Monday, December 14, 2009 11:43 PM
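
The INQUIRE loop suggested in the replies could look roughly like the sketch below. It is a sketch under stated assumptions, not a confirmed fix: the module name file_wait, the subroutine wait_for_file, and the timeout argument are hypothetical, and whether repeated INQUIRE calls ever see the new file before the client's cached directory view expires depends on the file service, which the thread does not establish. The loop busy-waits on SYSTEM_CLOCK because standard Fortran has no sleep routine.

```fortran
module file_wait
    use, intrinsic :: iso_fortran_env, only: int64
    implicit none
contains
    ! Hypothetical helper: poll with INQUIRE until `fname` exists or
    ! `timeout_s` seconds have elapsed. `found` reports the final state.
    subroutine wait_for_file(fname, timeout_s, found)
        character(len=*), intent(in)  :: fname
        real,             intent(in)  :: timeout_s
        logical,          intent(out) :: found
        integer(int64) :: t0, t1, rate

        call system_clock(t0, rate)
        do
            inquire(file=fname, exist=found)
            if (found) exit
            ! A rate of zero means no clock is available; give up
            ! rather than loop forever.
            if (rate <= 0) exit
            call system_clock(t1)
            if (real(t1 - t0) / real(rate) >= timeout_s) exit
        end do
    end subroutine wait_for_file
end module file_wait
```

In the program from the question, call wait_for_file('test.data', 30.0, found) would go between the MPI_BARRIER and the second OPEN, with the error branch taken only if found is still false. Unlike a fixed 10-second sleep, this returns as soon as the file becomes visible, at the cost of keeping one core busy while it waits.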