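I am trying to enable Slurm accounting for multiple clusters created with AWS ParallelCluster 3, following this guide. I have successfully enabled accounting for the first cluster (cluster-one), and I am now trying to set up the second one (cluster-two) in the way the page recommends under "replicating the process on multiple clusters", that is, using a single slurmdbd instance.

However, the connection from cluster-two to the slurmdbd on cluster-one is not working. This is from the slurmdbd.log file on cluster-one: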
[2022-05-22T15:09:02.965] error: Munge decode failed: Invalid credential
[2022-05-22T15:09:02.966] auth/munge: _print_cred: ENCODED: Thu Jan 01 00:00:00 1970
[2022-05-22T15:09:02.966] auth/munge: _print_cred: DECODED: Thu Jan 01 00:00:00 1970
[2022-05-22T15:09:02.966] error: slurm_unpack_received_msg: auth_g_verify: REQUEST_PERSIST_INIT has authentication error: Unspecified error
[2022-05-22T15:09:02.966] error: slurm_unpack_received_msg: Protocol authentication error
[2022-05-22T15:09:02.976] error: CONN:10 Failed to unpack SLURM_PERSIST_INIT message
And this is slurmctld.log on cluster-two:
[2022-05-09T21:39:36.773] error: slurmdbd: Invalid message version=6500, type:1432
[2022-05-09T21:39:37.250] error: auth_g_pack: protocol_version 6500 not supported
[2022-05-09T21:39:37.250] error: slurm