[Mckernel-users 19] Re: Some questions about large scale tests with mckernel

Fri Apr 28 22:18:54 JST 2017

With attachments. Sorry …

Jérémie Finiel

From: FINIEL, JEREMIE
Sent: Friday, April 28, 2017 3:16 PM
To: 'bgerofi at riken.jp'
Cc: mckernel-users at pccluster.org; LAFERRIERE, Christophe; WELTERLEN, BENOIT; Olivier Gruber
Subject: RE: Some questions about large scale tests with mckernel

Hello Balazs,

The new dmesg file is attached; it seems that nothing has changed.

I am now trying to use InfiniBand. Since the driver bypasses the OS, no syscalls should be triggered and no offloading should be necessary, right?
I tried to run InfiniBand test commands such as ib_read_bw (from perftest), but the command hangs when executed in McKernel and leaves the machine in a strange state (commands such as ps hang as well).
So I tried a simpler application that uses the ibverbs library (found here: https://blog.zhaw.ch/icclab/infiniband-an-introduction-simple-ib-verbs-program-with-rdma-write/).
But when I execute it in McKernel, the application hangs too.

As I am trying to reduce the number of syscalls in a communication-intensive MPI application, I have started to read some of the IHK/McKernel code, in particular the parts that handle syscall interception and syscall offloading.

I have seen that a syscall tracking feature is available (by defining TRACK_SYSCALLS and using “track_syscalls”). I guess that is what you used to evaluate syscall offloading time in the article “Exploring the Design Space of Combining Linux with Lightweight Kernels for Extreme Scale Computing”, isn’t it?
I have tried to define TRACK_SYSCALLS, but now McKernel does not start. Please find the dmesg_track_syscalls.log file attached.
Could you tell me how I can enable this feature (if it is a feature)? Is it deprecated?

I think it would be helpful to be able to debug an application under McKernel. Do you have a tool to do so?

When I enable debug traces in McKernel, a lot of information is written with kprintf, more than ihkosctl kmsg is able to print. How can I increase the size of this buffer? Is there a way to follow kmsg in real time (like dmesg -w)?
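
One possible way to approximate "dmesg -w" is to poll the message buffer and print only what has not been shown yet. The sketch below is a workaround under stated assumptions, not a McKernel feature: it assumes that "ihkosctl 0 kmsg" prints the whole current buffer on every call (the OS index 0 and the command being on PATH are assumptions), and it does not enlarge the buffer, so messages overwritten between two polls are still lost. The helper names are made up for illustration.

```python
import subprocess
import sys
import time

# Assumption: "ihkosctl 0 kmsg" dumps the whole McKernel message buffer for
# OS instance 0. Adjust the path and index to match the local installation.
KMSG_CMD = ["ihkosctl", "0", "kmsg"]


def new_lines(previous, current):
    """Return the lines of `current` that come after its overlap with `previous`.

    The overlap is the longest suffix of `previous` that is also a prefix of
    `current`, so a buffer whose oldest lines were dropped is handled too.
    This is a heuristic: repeated identical lines can confuse it.
    """
    for k in range(min(len(previous), len(current)), 0, -1):
        if previous[-k:] == current[:k]:
            return current[k:]
    return current


def follow(cmd=KMSG_CMD, interval=1.0):
    """Poll `cmd` forever and print only the lines not seen before."""
    seen = []
    while True:
        out = subprocess.run(cmd, capture_output=True, text=True).stdout
        lines = out.splitlines()
        for line in new_lines(seen, lines):
            print(line)
        sys.stdout.flush()
        seen = lines
        time.sleep(interval)
```

Calling follow() starts the polling loop; a shorter interval lowers the chance of missing messages at the cost of more ihkosctl invocations.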

Thank you in advance for your help.

Best regards,

Jérémie Finiel

From: Balazs Gerofi [mailto:bgerofi at riken.jp]
Sent: Sunday, April 23, 2017 2:01 AM
To: FINIEL, JEREMIE
Cc: mckernel-users at pccluster.org; LAFERRIERE, Christophe; WELTERLEN, BENOIT; Olivier Gruber
Subject: Re: Some questions about large scale tests with mckernel

Hello Jeremie,

I would suggest that you use MVAPICH or Intel MPI, but let's focus on the boot bug first. I've added a change to make sure we can see the full kmsg when boot fails.
Could you please pull the IHK repository, retry the boot script, and send me the new dmesg log?

Thanks,
Balazs

On Thu, Apr 20, 2017 at 6:37 AM, FINIEL, JEREMIE <jeremie.finiel at atos.net> wrote:
Hello Balazs,

Thank you for your quick reply.

I captured a set of hwloc and mpi_hello_world executions in the attached ‘execution’ file.
It contains my console output for hwloc-ls run on both Linux and McKernel, and for mpi_hello_world run with and without pinning.
As you can see (line 161), when pinning is not set explicitly, the command prints an error and then hangs.
When pinning is explicitly disabled (line 217), the error does not appear, but the command still hangs.

Here is my environment:
1 node
CPU Xeon E5-2680, 2 sockets, 12 cores per socket, 1 thread per core
uname -r: 3.10.0-327.el7.x86_64
OpenMPI version: 2.0.0

Best regards,

Jérémie Finiel

From: Balazs Gerofi [mailto:bgerofi at riken.jp]
Sent: Thursday, April 20, 2017 6:11 AM
To: FINIEL, JEREMIE; mckernel-users at pccluster.org
Cc: LAFERRIERE, Christophe; WELTERLEN, BENOIT; Olivier Gruber
Subject: Re: Some questions about large scale tests with mckernel

Hello Jeremie,

I have added the mckernel-users mailing list to CC so that we all see your messages; please keep it in the loop!

On Wed, Apr 19, 2017 at 8:07 AM, FINIEL, JEREMIE <jeremie.finiel at atos.net> wrote:
We had difficulty launching McKernel on a Nehalem processor (an E5540 in this case). When starting mcreboot.sh, we just got this error twice: “error: booting”. Please find the dmesg log attached.

I haven't had a chance so far to try McKernel on Nehalem, but I would expect the problem to be related to your Linux kernel version rather than to the processor. Which version are you running? Also, I looked at the dmesg, but unfortunately the root cause of the failure is not visible; I will need to adjust how the kmsg is printed to make that part visible. Let me try to get a patch done this week, and I'll contact you again.

We tried to launch an MPI program with OpenMPI, but we got an error about the hwloc pinning functions (hwloc_set_cpubind returned "Error" for bitmap "0").
As described in your paper, we tried MVAPICH2, which works perfectly.
Could you let me know what specific development was needed to get MVAPICH working with McKernel? I'm trying to see how hard it would be to get OpenMPI working too.

Are you trying to run OpenMPI on another host? Could you let me know the platform and the configuration you use to boot McKernel?
Hwloc is generally supported. What do you get for mcexec hwloc-ls? Does that work? Another thing you could try is to disable binding in OpenMPI, just as a test.
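
As an illustration of the "disable binding" test, the sketch below builds an mpirun command line with Open MPI's "--bind-to none" option. It is only a sketch: the helper name, the binary name and the rank count are placeholders, and placing mcexec after mpirun follows the ordering reported to work elsewhere in this thread.

```python
import shlex


def mpirun_command(app, nprocs, mcexec="mcexec", bind=False):
    """Build an argv list that runs `app` under McKernel through Open MPI.

    mcexec is placed after mpirun so that every rank is started by mcexec.
    With bind=False, Open MPI's "--bind-to none" option is added so that
    mpirun does not try to pin the ranks itself.
    """
    cmd = ["mpirun", "-np", str(nprocs)]
    if not bind:
        cmd += ["--bind-to", "none"]
    return cmd + [mcexec, app]


# Placeholder binary name and rank count, for illustration only.
print(shlex.join(mpirun_command("./mpi_hello_world", 4)))
```

Comparing the behaviour of this command with the same command built with bind=True would show whether the hang depends on Open MPI's own binding step.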

Now we would like to run some tests at a larger scale, so we are developing a script to launch McKernel on each machine, but we ran into difficulties with access rights. mcreboot.sh must be launched with root privileges, but mcexec can be executed by a regular user only if /dev/mcos0 is accessible to that user. For the moment, to avoid using root for every execution, I change the owner of /dev/mcos0 as a workaround.
Furthermore, when executing “./mcexec mpirun …” we noticed that execution on the other machine takes place on the Linux side and not in McKernel, whereas “mpirun ./mcexec …” seems to do the job. I would be interested to know how you launch your tests at large scale. If, by any chance, you have a template script, it would be helpful.

Ahh, yes, we are aware of this. One way to get around it is to set your umask to 0002 and use sudo to run the script (I mean not directly by root).
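
A launch script could check on each node whether the current user is able to open /dev/mcos0 before calling mcexec. The sketch below is one hypothetical way to do that; the function names are made up for illustration, and the assumption that mcexec needs both read and write access to the device is not confirmed in this thread.

```python
import os


def can_use_mcos(device="/dev/mcos0"):
    """Return True if the current user can read and write the device file.

    Assumption: mcexec needs read and write access to the device.
    """
    return os.path.exists(device) and os.access(device, os.R_OK | os.W_OK)


def describe(device="/dev/mcos0"):
    """Summarise owner, group and permission bits of the device, if present."""
    if not os.path.exists(device):
        return f"{device}: not present (McKernel not booted?)"
    st = os.stat(device)
    mode = oct(st.st_mode & 0o777)
    return f"{device}: uid={st.st_uid} gid={st.st_gid} mode={mode}"
```

Printing describe() on every node after booting makes it easy to see whether the owner, group and mode of the device match what the launching user needs.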

Best,
Balazs

Thank you in advance.

Best regards,
Jérémie Finiel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg.log
Type: application/octet-stream
Size: 2196 bytes
Desc: dmesg.log
URL: <http://www.pccluster.org/pipermail/mckernel-users/attachments/20170428/10fee610/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg_track_syscalls.log
Type: application/octet-stream
Size: 3214 bytes
Desc: dmesg_track_syscalls.log
URL: <http://www.pccluster.org/pipermail/mckernel-users/attachments/20170428/10fee610/attachment-0003.obj>