<div dir="ltr">Hello Jeremie,<div><br></div><div>I would suggest that you use MVAPICH or Intel MPI, but let's focus on the boot bug first. I've added a change to make sure we can see the full kmsg when boot fails.</div><div>Could you please pull the IHK repository, retry the boot script, and send me the new dmesg log?</div><div><br></div><div>Thanks,</div><div>Balazs</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Apr 20, 2017 at 6:37 AM, FINIEL, JEREMIE <span dir="ltr"><<a href="mailto:jeremie.finiel@atos.net" target="_blank">jeremie.finiel@atos.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div lang="FR" link="blue" vlink="purple">
<div class="m_-4778413642558593912WordSection1">
<p class="MsoNormal"><span lang="EN-US">Hello Balazs,<br>
<br>
Thank you for your quick reply.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">I captured a set of hwloc and mpi_hello_world runs in the attached ‘execution’ file.<br>
It contains my console output from running hwloc-ls on both Linux and McKernel, and from running mpi_hello_world with and without pinning.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">As you can see (line 161), when pinning is not set explicitly, the command prints an error and then hangs.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">When pinning is explicitly disabled (line 217), the error does not appear, but the command still hangs.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Here is my environment:<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">1 node<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">CPU Xeon E5-2680, 2 sockets, 12 cores per socket, 1 thread per core<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">uname -r: 3.10.0-327.el7.x86_64<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">OpenMPI version: 2.0.0<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Best regards,<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jérémie Finiel<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> Balazs Gerofi [mailto:<a href="mailto:bgerofi@riken.jp" target="_blank">bgerofi@riken.jp</a>]
<br>
<b>Sent:</b> Thursday, April 20, 2017 6:11 AM<br>
<b>To:</b> FINIEL, JEREMIE; <a href="mailto:mckernel-users@pccluster.org" target="_blank">mckernel-users@pccluster.org</a><br>
<b>Cc:</b> LAFERRIERE, Christophe; WELTERLEN, BENOIT; Olivier Gruber<br>
<b>Subject:</b> Re: Some questions about large scale tests with mckernel<u></u><u></u></span></p><div><div class="h5">
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">Hello Jeremie,<u></u><u></u></p>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">I have added the mckernel-users mailing list to CC so that we all see your messages. Please keep it in the loop!<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">On Wed, Apr 19, 2017 at 8:07 AM, FINIEL, JEREMIE <<a href="mailto:jeremie.finiel@atos.net" target="_blank">jeremie.finiel@atos.net</a>> wrote:<u></u><u></u></p>
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">We had difficulty launching McKernel on a Nehalem processor (E5540 in this case). When starting mcreboot.sh, we got this error twice: “error: booting”. Please find
the dmesg log attached.<u></u><u></u></span></p>
</div>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">I haven't had a chance to try McKernel on Nehalem so far, but I would expect the problem to be related to your Linux kernel version rather than to the processor. Which version are you running? Also, I looked at the dmesg log, but unfortunately the root cause of the failure
is not visible; I will need to adjust how the kmsg is printed to make that part visible. Let me try to get a patch done this week, and I'll contact you again.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">We tried to launch an MPI program with OpenMPI, but we got an error from the hwloc pinning functions (hwloc_set_cpubind returned "Error" for bitmap "0").<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">As described in your paper, we tried MVAPICH2, which works perfectly.
<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Could you let me know what specific development you had to do to get MVAPICH working with McKernel? I'm trying to see how hard it would be to get OpenMPI working too.<u></u><u></u></span></p>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Are you trying to run OpenMPI on another host? Could you let me know the platform and the configuration you use to boot McKernel?<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">hwloc is generally supported. What do you get for mcexec hwloc-ls? Does that work? Another thing you could try, just as a test, is to disable binding in OpenMPI.<u></u><u></u></p>
</div>
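<div>
<p class="MsoNormal">As a rough sketch of the binding test suggested above, the snippet below assembles an mpirun command line with binding turned off. It assumes Open MPI's <code>--bind-to none</code> option and an <code>mcexec</code> binary on the PATH; the helper name <code>build_launch_command</code> and the program path are hypothetical and only serve the illustration.</p>
<pre>
```python
import shlex


def build_launch_command(ranks, program, bind=False):
    """Return an argv list that starts `program` through mcexec on every rank.

    Hypothetical helper: it only builds the command line, it does not run it.
    """
    argv = ["mpirun", "-np", str(ranks)]
    if not bind:
        # Open MPI option that turns off process binding (assumed available).
        argv += ["--bind-to", "none"]
    # mcexec is placed after mpirun so that each rank is started through mcexec.
    argv += ["mcexec", program]
    return argv


if __name__ == "__main__":
    # Print the command as a single shell-quoted string for copy and paste.
    print(shlex.join(build_launch_command(2, "./mpi_hello_world")))
```
</pre>
<p class="MsoNormal">Passing <code>bind=True</code> leaves the binding option out, which makes it easy to compare the two cases.</p>
</div>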
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Now we would like to run some tests at a larger scale. We developed a script to launch McKernel on each machine, but we ran into difficulties with access rights. mcreboot.sh
must be launched with privileged access, but mcexec can be executed by a user only if /dev/mcos0 is accessible to that user. For the moment, to avoid using root for every execution, I change the owner of /dev/mcos0 as a workaround.
<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Furthermore, when executing “./mcexec mpirun …” we noticed that execution on the other machine happens on the Linux side and not in McKernel, whereas “mpirun ./mcexec …” seems
to do the job. I would be interested to know how you launch your tests at large scale. If by any chance you have a template script, it would be helpful.<u></u><u></u></span></p>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Ahh, yes, we are aware of this. One way to get around it is to set your umask to 0002 and use sudo to run the script (that is, not running it directly as root).<u></u><u></u></p>
</div>
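<div>
<p class="MsoNormal">To illustrate what the umask change does in general, here is a minimal sketch of the usual POSIX rule that a new file's permission bits are the requested bits with the umask bits cleared. How mcreboot.sh actually creates /dev/mcos0 is not shown in this thread, so applying the rule to the device node is an assumption; <code>effective_mode</code> is a hypothetical helper.</p>
<pre>
```python
import stat


def effective_mode(requested, umask):
    """Permission bits that remain after the umask bits are cleared."""
    return requested & ~umask


if __name__ == "__main__":
    # Compare a common default umask (0022) with the suggested one (0002)
    # for a file requested with mode 0666.
    for mask in (0o022, 0o002):
        mode = effective_mode(0o666, mask)
        print(f"umask {mask:04o}: {stat.filemode(stat.S_IFREG | mode)}")
```
</pre>
<p class="MsoNormal">With umask 0002 the group write bit survives, so members of the owning group can write to files created by the script.</p>
</div>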
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Best,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Balazs<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Thank you in advance.<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> <u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Best regards,<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Jérémie Finiel<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> <u></u><u></u></span></p>
</div>
</div>
</blockquote>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</div>
</div></div></div>
</div>
</blockquote></div><br></div>