Windows HPC Server 2008 cluster diagnostics fail

Question
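I'm a beginner and recently set up a cluster. The diagnostic tests produce the errors below. What do these errors mean? A detailed explanation would be appreciated. Thanks.
Head node: CLUSTER1, IP: 192.168.10.10, subnet mask: 255.255.255.0, DNS: 127.0.0.1
Compute node: CLUSTER2, IP: 192.168.10.20, DNS: 192.168.10.10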
MPI Ping-Pong: Lightweight Throughput
2012/10/4 10:50:44 The operation failed and will not be retried.
2012/10/4 10:50:44 ---- error analysis -----
2012/10/4 10:50:44
2012/10/4 10:50:44 mpi has detected a fatal error and aborted mpipingpong.exe
2012/10/4 10:50:44 [1] on CLUSTER2
2012/10/4 10:50:44
2012/10/4 10:50:44 ---- error analysis -----
2012/10/4 10:50:44
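2012/10/4 10:50:44 ConnectFailed(977)......: unable to connect to 192.168.10.10 on port 57089, A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. (errno 10060)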
2012/10/4 10:50:44 ConnectFailed(986)......: unable to connect to 192.168.10.10 on port 57089, exhausted all endpoints
2012/10/4 10:50:44 ConnectFailed(1061).....: [ch3:sock] failed to connnect to remote process 468AEC1E-97B2-4acf-A324-48572EFBDABB:0
2012/10/4 10:50:44 MPIDI_CH3I_Progress(244): handle_sock_op failed
2012/10/4 10:50:44 MPIC_Wait(277)..........:
2012/10/4 10:50:44 MPIC_Sendrecv(123)......:
2012/10/4 10:50:44 MPIR_Allgather(185).....:
2012/10/4 10:50:44 MPI_Allgather(864)......: MPI_Allgather(sbuf=0x000000000024F940, scount=128, MPI_CHAR, rbuf=0x0000000000D38150, rcount=128, MPI_CHAR, MPI_COMM_WORLD) failed
2012/10/4 10:50:44 Fatal error in MPI_Allgather: Other MPI error, error stack:
2012/10/4 10:50:44 [1] fatal error
2012/10/4 10:50:44
2012/10/4 10:50:44 [0] terminated
2012/10/4 10:50:44
2012/10/4 10:50:44 [ranks] message
2012/10/4 10:50:44 job aborted:
2012/10/4 10:50:44

MPI Ping-Pong: Quick Check
2012/10/4 10:49:06 The operation failed and will not be retried.
2012/10/4 10:49:06 ---- error analysis -----
2012/10/4 10:49:06
2012/10/4 10:49:06 mpi has detected a fatal error and aborted mpipingpong.exe
2012/10/4 10:49:06 [0] on CLUSTER1
2012/10/4 10:49:06
2012/10/4 10:49:06 ---- error analysis -----
2012/10/4 10:49:06
2012/10/4 10:49:06 [1] terminated
2012/10/4 10:49:06
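2012/10/4 10:49:06 ConnectFailed(977)......: unable to connect to 192.168.10.20 on port 50579, A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. (errno 10060)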
2012/10/4 10:49:06 ConnectFailed(986)......: unable to connect to 192.168.10.20 on port 50579, exhausted all endpoints
2012/10/4 10:49:06 ConnectFailed(1061).....: [ch3:sock] failed to connnect to remote process 15AC5D45-632C-468c-A151-DBE9BC23801A:1
2012/10/4 10:49:06 MPIDI_CH3I_Progress(244): handle_sock_op failed
2012/10/4 10:49:06 MPIC_Wait(277)..........:
2012/10/4 10:49:06 MPIC_Sendrecv(123)......:
2012/10/4 10:49:06 MPIR_Allgather(186).....:
2012/10/4 10:49:06 MPI_Allgather(865)......: MPI_Allgather(sbuf=0x000000000029FBD0, scount=128, MPI_CHAR, rbuf=0x0000000000C88400, rcount=128, MPI_CHAR, MPI_COMM_WORLD) failed
2012/10/4 10:49:06 Fatal error in MPI_Allgather: Other MPI error, error stack:
2012/10/4 10:49:06 [0] fatal error
2012/10/4 10:49:06
2012/10/4 10:49:06 [ranks] message
2012/10/4 10:49:06 job aborted:
2012/10/4 10:49:06
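
Both runs stop at the same point: a `ConnectFailed` line with errno 10060, the Winsock connection-timeout code, once from CLUSTER2 to the head node and once from the head node to CLUSTER2. The sketch below is not part of the original diagnostics. It assumes Python is available on the nodes, and the helper names `failed_endpoints` and `probe` are made up for illustration. It pulls the failing host, port, and errno out of log text like the above, and offers a simple TCP probe for testing connectivity by hand.

```python
import re
import socket

# Matches the "unable to connect ... (errno N)" lines as they appear in the log.
CONNECT_FAILED = re.compile(
    r"unable to connect to (?P<host>\d+\.\d+\.\d+\.\d+) on port (?P<port>\d+)"
    r".*\(errno (?P<errno>\d+)\)"
)

# Two lines copied from the log above (system message shortened with "...").
SAMPLE_LOG = """\
2012/10/4 10:50:44 ConnectFailed(977)......: unable to connect to 192.168.10.10 on port 57089, ... (errno 10060)
2012/10/4 10:50:44 ConnectFailed(986)......: unable to connect to 192.168.10.10 on port 57089, exhausted all endpoints
2012/10/4 10:49:06 ConnectFailed(977)......: unable to connect to 192.168.10.20 on port 50579, ... (errno 10060)
"""


def failed_endpoints(log_text):
    """Return sorted, unique (host, port, errno) tuples found in the log text."""
    found = set()
    for line in log_text.splitlines():
        match = CONNECT_FAILED.search(line)
        if match:
            found.add(
                (match["host"], int(match["port"]), int(match["errno"]))
            )
    return sorted(found)


def probe(host, port, timeout=3.0):
    """Attempt one TCP connection and classify the outcome (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No reply at all within the timeout, like errno 10060 in the log.
        return "timed out"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        return f"error: {exc}"
```

The two logged ports differ (57089 and 50579), which suggests they are assigned per job, so probing them after the job has ended mainly shows how the target host responds. A result of `"timed out"` often means packets are being dropped on the way, for example by a firewall, while `"refused"` means the host is reachable but nothing is listening on that port. This is a starting point for narrowing things down, not a diagnosis.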