[Mckernel-users 12] Re: Some questions about large scale tests with mckernel

Balazs Gerofi bgerofi at riken.jp
Thu Apr 20 13:11:24 JST 2017


Hello Jeremie,

I have added the mckernel-users mailing list to CC so that we all see your
messages, please keep it in the loop!

On Wed, Apr 19, 2017 at 8:07 AM, FINIEL, JEREMIE <jeremie.finiel at atos.net>
wrote:

> We had difficulty launching McKernel on a Nehalem processor (an E5540 in
> this case). When starting mcreboot.sh, we got this error twice: “error:
> booting”. Please find the dmesg log attached.
>

I haven't had a chance so far to try McKernel on Nehalem, but I would
expect the problem is more related to your Linux kernel version. Which
version are you running? Also, I looked at the dmesg output, but
unfortunately the root cause of the failure is not visible; I will need to
adjust how the kmsg is printed to make that part visible. Let me try to get
a patch done this week, and I'll contact you again.

> We tried to launch an MPI program with OpenMPI, but we got an error from
> the hwloc pinning functions (hwloc_set_cpubind returned "Error" for bitmap "0").
> As described in your paper, we tried MVAPICH2, which works perfectly.
> Could you let me know what specific development you had to do to get
> MVAPICH working with McKernel? I'm trying to see how hard it would be to
> get OpenMPI working too.
>

Are you trying to run OpenMPI on another host? Could you let me know the
platform and the configuration you use to boot McKernel?
hwloc is generally supported; what do you get for mcexec hwloc-ls? Does
that work? Another thing you could try, just as a test, is to disable
binding in OpenMPI.
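For the binding test, one way to assemble the launch line is sketched in Python. It assumes Open MPI's `mpirun`, where `--bind-to none` is the standard switch for turning binding off. It also places mcexec after mpirun so that each rank is started through mcexec on its own node, matching the thread's observation that "mpirun ./mcexec …" works while "./mcexec mpirun …" does not. The helper name `mckernel_mpirun_argv` and its defaults (such as `./mcexec`) are hypothetical and are not part of McKernel.

```python
import shlex
from typing import List, Sequence


def mckernel_mpirun_argv(nprocs: int, app: str, app_args: Sequence[str] = (),
                         mcexec: str = "./mcexec",
                         disable_binding: bool = False) -> List[str]:
    """Hypothetical helper: build an Open MPI command line running each rank via mcexec.

    mcexec goes *after* mpirun so the launcher starts mcexec on every node;
    wrapping mpirun itself in mcexec left remote ranks running on Linux.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    argv = ["mpirun", "-np", str(nprocs)]
    if disable_binding:
        # Open MPI option that disables process binding (useful as a test only).
        argv += ["--bind-to", "none"]
    argv += [mcexec, app, *app_args]
    return argv


if __name__ == "__main__":
    # Prints: mpirun -np 4 --bind-to none ./mcexec ./a.out
    print(shlex.join(mckernel_mpirun_argv(4, "./a.out", disable_binding=True)))
```

If binding turns out to be the culprit, dropping `disable_binding` restores Open MPI's default placement while keeping the mcexec ordering unchanged.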

> Now we would like to do some tests at a larger scale. So we developed a
> script to launch McKernel on each machine, but we ran into difficulties
> with access rights. mcreboot.sh must be launched with privileged access, but
> mcexec can be executed by a user only if /dev/mcos0 is accessible by that
> user. For the moment, to avoid using root for every execution, I can
> change the owner of /dev/mcos0 as a workaround.
> Furthermore, when executing “./mcexec mpirun …”, we noticed that execution
> on the other machine is done on the Linux side and not in McKernel, but
> “mpirun ./mcexec …” seems to do the job. I would be interested to know how
> you launch your tests at large scale. If, by any chance, you have a
> template script, it would be helpful.
>

Ahh, yes, we are aware of this. One way to get around it is to set your
umask to 0002 and run the script with sudo (i.e., not directly as root).
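Under POSIX, the umask clears permission bits from files a process creates: effective mode = requested & ~umask. With the common default of 0022, the group write bit is dropped. With 0002, it survives. The mail does not say how mcreboot.sh consumes the umask, so this Python sketch only illustrates that generic rule. It also adds a hypothetical check for whether the calling user can open /dev/mcos0 for reading and writing. The helper names (`effective_mode`, `current_user_can_use`, `describe`) are assumptions for illustration and are not McKernel tools.

```python
import os
import stat

# Hypothetical helpers for illustration; not part of McKernel's tooling.


def effective_mode(requested: int, umask: int) -> int:
    """Permission bits a newly created file gets under POSIX: requested & ~umask."""
    return requested & ~umask & 0o7777


def current_user_can_use(dev_path: str) -> bool:
    """True if the calling user may open dev_path for both reading and writing."""
    return os.access(dev_path, os.R_OK | os.W_OK)


def describe(dev_path: str = "/dev/mcos0") -> str:
    """Summarize ownership, mode, and usability of a device node (path assumed)."""
    if not os.path.exists(dev_path):
        return f"{dev_path}: not present (has McKernel been booted?)"
    st = os.stat(dev_path)
    mode = stat.S_IMODE(st.st_mode)
    return (f"{dev_path}: mode {mode:o}, uid {st.st_uid}, gid {st.st_gid}, "
            f"usable by current user: {current_user_can_use(dev_path)}")


if __name__ == "__main__":
    # Default umask 0022 drops group write; 0002 keeps it.
    print(oct(effective_mode(0o666, 0o022)))  # 0o644
    print(oct(effective_mode(0o666, 0o002)))  # 0o664
    print(describe())
```

Running `describe()` as the unprivileged user after booting McKernel gives a quick way to confirm that the device node is usable before launching mcexec.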

Best,
Balazs

> Thank you in advance.
>
> Best regards,
> Jérémie Finiel
>
>