[Mckernel-users 27] Re: Some questions about large scale tests with mckernel
Balazs Gerofi
bgerofi at riken.jp
Fri May 5 02:18:37 JST 2017
Hello Jeremie,
On Thu, May 4, 2017 at 6:58 AM, FINIEL, JEREMIE <jeremie.finiel at atos.net>
wrote:
> I’m interested in the changes you have done in MPI and McKernel for the
> Thread Private Shared Library (Toward Operating System Support for Scalable
> Multithreaded Message Passing).
>
>
>
> Is it already integrated in McKernel or do you have a branch for this?
> How can I try this functionality?
>
This functionality is implemented in another branch, which is currently not
on the git server. It's a little outdated compared to the master branch.
I can send you a tarball, but it assumes the IB network is working.
> I’m also interested in the article “Revisiting RDMA Buffer Registration in
> the Context of Lightweight Multi-kernels”. I have tried to download it from
> your web page but it looks like the link is down.
> http://www-sys-aics.riken.jp/Members/bgerofi/papers/bgerofi-eurompi2016.pdf
>
Sorry, I fixed the link, you should be able to access the paper now.
> IHK was developed to be able to host LWK, right?
>
> McKernel is one instance of a LWK, but did you try with others? Which one?
>
> Is McKernel a simple LWK to allow proof of concept of IHK or does it have
> specific assets beside the TPSL capability?
>
McKernel is mostly about memory management and scalability. We have submitted
a paper to SC this year; if it gets into the program, you will be able
to see more details.
As for other LWKs, it is primarily McKernel at the moment, but we have had
some versions of it that were considerably different, for example the one
with the thread private shared library feature.
I haven't tried to port another OS (e.g., Kitten or MINIX) on top of IHK,
but it shouldn't be that difficult. You could give it a try! :)
Thanks,
Balazs
> Thank you in advance.
>
>
>
> Best regards,
>
>
>
> Jérémie Finiel
>
>
>
>
>
> *From:* Balazs Gerofi [mailto:bgerofi at riken.jp]
> *Sent:* Wednesday, May 03, 2017 7:17 AM
>
> *To:* FINIEL, JEREMIE
> *Cc:* mckernel-users at pccluster.org; LAFERRIERE, Christophe; WELTERLEN,
> BENOIT; Olivier Gruber
> *Subject:* Re: Some questions about large scale tests with mckernel
>
>
>
> Hello Jeremie,
>
>
>
> On Tue, May 2, 2017 at 7:14 AM, FINIEL, JEREMIE <jeremie.finiel at atos.net>
> wrote:
>
> I am sorry but it is not possible to access our machines.
>
> It looks like the problem is in the init_fpu() function (in the
> mckernel/arch/x86/kernel/cpu.c file). When enabling debug traces, it looks
> like xgetbv(0) is the issue (the call just before dkprintf("init_fpu():
> xsave_mask = 0x%016lX\n", xsave_mask);).
>
>
>
> I think that the processor doesn’t have the xgetbv instruction available.
> Do I need to undefine ENABLE_SSE?
>
>
>
> I am not quite sure about this, I will have to investigate it first. You
> could meanwhile try it though!
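A quick way to test the xgetbv hypothesis from the Linux side is to look for the xsave flag in /proc/cpuinfo, since XGETBV belongs to the XSAVE feature set. The sketch below is not McKernel code; the helper name has_cpu_flag and its optional file argument are hypothetical and only there for illustration.

```shell
# Report whether a CPU feature flag appears in a cpuinfo-style file.
# Usage: has_cpu_flag FLAG [FILE]   (FILE defaults to /proc/cpuinfo)
# has_cpu_flag is a hypothetical helper, not part of IHK/McKernel.
has_cpu_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    # The "flags" line is a space-separated list. -w matches whole words
    # only, so "xsave" does not match "xsaveopt".
    grep '^flags' "$file" | head -n 1 | grep -qw -- "$flag"
}

# Example:
#   has_cpu_flag xsave && echo "xsave present" || echo "xsave missing"
```

If the flag is missing on the E5540 host, the xgetbv explanation becomes more plausible. It does not by itself say whether undefining ENABLE_SSE is the right switch.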
>
>
>
> I have checked out the performance branch, but I get an error at compile
> time. In mckernel/executer/kernel/mcctrl/driver.c, IHK_OS_AUX_PERF_* are not
> declared.
>
>
>
> Grep shows that these variables are used in
> mckernel/executer/kernel/mcctrl/driver.c and
> ./executer/kernel/mcctrl/control.c, but never declared.
>
>
>
> Please pull the ihk repository as well, I think it's due to the mismatch
> between the mckernel and ihk repository versions.
>
>
>
> Balazs
>
>
>
>
>
> Thank you for your help.
>
>
>
> Best regards,
>
> Jérémie Finiel
>
>
>
>
>
> *From:* Balazs Gerofi [mailto:bgerofi at riken.jp]
> *Sent:* Saturday, April 29, 2017 1:01 AM
>
>
> *To:* FINIEL, JEREMIE
> *Cc:* mckernel-users at pccluster.org; LAFERRIERE, Christophe; WELTERLEN,
> BENOIT; Olivier Gruber
> *Subject:* Re: Some questions about large scale tests with mckernel
>
>
>
> Hello Jeremie,
>
>
>
> On Fri, Apr 28, 2017 at 6:16 AM, FINIEL, JEREMIE <jeremie.finiel at atos.net>
> wrote:
>
> The new dmesg file is attached; it seems like nothing has changed.
>
>
>
> That indeed looks the same as previously. Is there any way I could access
> your machine?
>
>
>
> Otherwise you could try to comment out the call to x86_init_perfctr() in
> init_cpu() in the file mckernel/arch/x86/kernel/cpu.c, that is where the
> next kmsg is supposed to show up so perhaps something goes wrong there.
> This change would disable the performance counter initialization code.
>
>
>
> I’m now trying to use InfiniBand. As the driver bypasses the OS, no syscall
> should be triggered and no offloading should be necessary. Right?
>
> I tried to execute IB test commands like ib_read_bw (from perftest), but
> the command hangs when executed in McKernel and puts the machine in a
> strange state (commands like ps hang).
>
> So I tried a simpler application which uses the ibverbs library (found
> here: https://blog.zhaw.ch/icclab/infiniband-an-introduction-simple-ib-verbs-program-with-rdma-write/).
>
> But when I execute it in McKernel, the application hangs too.
>
>
>
> IB works on our machines, but I am a little confused now, are you using
> another machine for the IB tests? Are you using real HW or a VM?
>
>
>
> As I am trying to reduce the number of syscalls in a communication-intensive
> MPI application, I have started to read some IHK/McKernel code, in
> particular the parts regarding syscall catching and syscall offloading.
>
>
>
> I have seen that a syscall tracking feature is available (by
> defining TRACK_SYSCALLS and using “track_syscalls”). I guess that’s what
> you used to evaluate syscall offloading time in the “Exploring the Design
> Space of Combining Linux with Lightweight Kernels for Extreme Scale
> Computing” article, isn’t it?
>
> I have tried to define TRACK_SYSCALLS, but now McKernel doesn’t start.
> Please find the dmesg_track_syscalls.log file attached.
>
> Could you tell me how I can enable this feature (if this is a feature)? Is
> it deprecated?
>
>
>
> TRACK_SYSCALLS is a deprecated feature in the master branch. There is
> another branch called "performance" that provides much more elaborate
> profiling code; if you are interested in trying it, please check out that
> branch (in mckernel: git fetch; git checkout performance). You can then
> pass --profile to mcexec, for example: mcexec --profile ls -ls. When it
> returns, you can look at the kmsg and see a log of various syscalls and
> kernel events.
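The profiling steps described here (run under mcexec --profile, then read the kmsg) could be wrapped as follows. This is a minimal sketch: profile_run, MCEXEC, IHKOSCTL and MCOS_INDEX are hypothetical names, and passing an OS instance index (0 here) to ihkosctl is an assumption about the local setup. It also assumes a McKernel built from the performance branch is already booted.

```shell
# Run a command under McKernel with profiling enabled, then dump the
# kernel message buffer where the profiling log is expected to appear.
# MCEXEC and IHKOSCTL can be overridden if the tools are not on PATH.
profile_run() {
    "${MCEXEC:-mcexec}" --profile "$@" || return
    # Assumption: OS instance index 0; adjust MCOS_INDEX if needed.
    "${IHKOSCTL:-ihkosctl}" "${MCOS_INDEX:-0}" kmsg
}

# Example:
#   profile_run ls -ls
```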
>
>
>
> I think it would be helpful to be able to debug an application under
> McKernel. Do you have a tool to do so?
>
>
>
> GDB is half-way supported; you could give it a try, but it lacks many
> features. Catching segfaults in single-threaded apps sort of works.
>
> For example, you would run it as: mcexec gdb --args ls -ls
>
>
>
> When I activate debug traces in McKernel, a lot of information is written
> with kprintf, more than ihkosctl kmsg is able to print. How can I extend
> this buffer size? Is there a way to follow kmsg in real time (like dmesg
> -w)?
>
>
>
> There is a macro called IHK_KMSG_SIZE that controls the size of the buffer.
> A dmesg -w like usage model doesn't quite work yet, I believe.
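Until a real follow mode exists, one rough approximation is to poll the kmsg dump and print only the lines that were not there on the previous poll. The sketch below is an illustration under stated assumptions, not an IHK feature: follow_log is a hypothetical helper, it takes the dump command as arguments, and it only works while the buffer has not wrapped (once old lines are dropped, new ones can be missed).

```shell
# Emulate "dmesg -w" by polling a log-dump command.
# Usage: follow_log INTERVAL POLLS CMD [ARGS...]
#   INTERVAL  seconds to sleep between polls
#   POLLS     number of polls, so the loop always terminates
follow_log() {
    interval=$1
    polls=$2
    shift 2
    seen=0
    while [ "$polls" -gt 0 ]; do
        snapshot=$("$@") || return
        if [ -n "$snapshot" ]; then
            total=$(($(printf '%s\n' "$snapshot" | wc -l)))
            # A shorter dump means the buffer was reset: start over.
            if [ "$total" -lt "$seen" ]; then seen=0; fi
            # Print only the lines after the ones already shown.
            printf '%s\n' "$snapshot" | tail -n "+$((seen + 1))"
            seen=$total
        fi
        polls=$((polls - 1))
        if [ "$polls" -gt 0 ]; then sleep "$interval"; fi
    done
}

# Example (OS index 0 is an assumption):
#   follow_log 1 60 ihkosctl 0 kmsg
```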
>
>
>
> Balazs
>
>
>
> Thank you in advance for your help.
>
>
>
> Best regards,
>
>
>
> Jérémie Finiel
>
>
>
> *From:* Balazs Gerofi [mailto:bgerofi at riken.jp]
> *Sent:* Sunday, April 23, 2017 2:01 AM
> *To:* FINIEL, JEREMIE
> *Cc:* mckernel-users at pccluster.org; LAFERRIERE, Christophe; WELTERLEN,
> BENOIT; Olivier Gruber
>
>
> *Subject:* Re: Some questions about large scale tests with mckernel
>
>
>
> Hello Jeremie,
>
>
>
> I would suggest that you use MVAPICH or Intel MPI, but let's focus on the
> boot bug first. I've added a change to make sure we can see the full kmsg
> when boot fails.
>
> Could you please pull the IHK repository, retry the boot script, and send
> me the new dmesg log?
>
>
>
> Thanks,
>
> Balazs
>
>
>
>
>
> On Thu, Apr 20, 2017 at 6:37 AM, FINIEL, JEREMIE <jeremie.finiel at atos.net>
> wrote:
>
> Hello Balazs,
>
> Thank you for your quick reply.
>
>
>
> I captured a set of hwloc and mpi_hello_world executions in the attached
> ‘execution’ file.
> It contains my console output with hwloc-ls executed on both Linux and
> McKernel, and executions of mpi_hello_world with and without pinning.
>
> As you can see (Line 161), when pinning is not explicit, the
> command prints an error and then hangs.
>
> When pinning is explicitly disabled (Line 217), the error does not appear,
> but the command hangs too.
>
>
>
> Here is my environment:
>
> 1 node
>
> CPU Xeon E5-2680, 2 sockets, 12 cores per socket, 1 thread per core
>
> uname -r: 3.10.0-327.el7.x86_64
>
> OpenMPI version: 2.0.0
>
>
>
> Best regards,
>
>
>
> Jérémie Finiel
>
>
>
>
>
>
>
> *From:* Balazs Gerofi [mailto:bgerofi at riken.jp]
> *Sent:* Thursday, April 20, 2017 6:11 AM
> *To:* FINIEL, JEREMIE; mckernel-users at pccluster.org
> *Cc:* LAFERRIERE, Christophe; WELTERLEN, BENOIT; Olivier Gruber
> *Subject:* Re: Some questions about large scale tests with mckernel
>
>
>
> Hello Jeremie,
>
>
>
> I have added the mckernel-users mailing list to CC so that we all see your
> messages, please keep it in the loop!
>
>
>
> On Wed, Apr 19, 2017 at 8:07 AM, FINIEL, JEREMIE <jeremie.finiel at atos.net>
> wrote:
>
> We had difficulty launching McKernel on a Nehalem processor (an E5540 in
> this case). When starting mcreboot.sh, we just got this error twice: “error:
> booting”. Please find the dmesg log attached.
>
>
>
> I haven't had a chance so far to try McKernel on Nehalem, but I would
> expect it's more related to your Linux kernel version. Which version are
> you running? Also, I looked at the dmesg, but unfortunately the root
> cause of the failure is not visible; I will need to adjust how the kmsg is
> printed to make that part visible. Let me try to get a patch done this
> week, and I'll contact you again.
>
>
>
> We tried to launch an MPI program with OpenMPI, but we got an error from the
> hwloc pinning functions (hwloc_set_cpubind returned "Error" for bitmap "0").
>
> As written in your paper, we tried with mvapich2, which works perfectly.
>
> Could you let me know what specific development you had to do to get
> mvapich working with McKernel? I'm trying to see how hard it would be to
> get OpenMPI working too.
>
>
>
> Are you trying to run OpenMPI on another host? Could you let me know the
> platform and the configuration you use to boot McKernel?
>
> Hwloc is generally supported. What do you get for mcexec hwloc-ls? Does
> that work? Another thing you could try is to disable binding in OpenMPI,
> just as a test.
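The two checks suggested here could look roughly like this. It is a sketch under assumptions: check_hwloc, run_unbound, MCEXEC and MPIRUN are hypothetical names, --bind-to none is the Open MPI mpirun option for turning process binding off, and mcexec is placed after mpirun, the order reported in this thread to run the ranks on the McKernel side.

```shell
# (1) Does hwloc work at all under McKernel?
check_hwloc() {
    "${MCEXEC:-mcexec}" hwloc-ls
}

# (2) Run an MPI program with Open MPI process binding disabled.
# Usage: run_unbound NPROCS PROGRAM [ARGS...]
run_unbound() {
    nprocs=$1
    shift
    "${MPIRUN:-mpirun}" --bind-to none -np "$nprocs" \
        "${MCEXEC:-mcexec}" "$@"
}

# Example:
#   check_hwloc
#   run_unbound 2 ./mpi_hello_world
```

If the job still hangs with binding disabled, the hwloc_set_cpubind error is probably not the only problem.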
>
>
>
> Now we would like to do some tests at a larger scale. We developed a
> script to launch McKernel on each machine, but we ran into difficulties with
> access rights. mcreboot.sh must be launched with privileged access, but
> mcexec can be executed by a user only if /dev/mcos0 is accessible to this
> user. For the moment, in order to avoid using root for every execution, I
> can change the owner of /dev/mcos0 as a workaround.
>
> Furthermore, when executing “./mcexec mpirun …” we noticed that execution
> on the other machine is done on the Linux side and not in McKernel, but
> “mpirun ./mcexec …” seems to do the job. I would be interested to know how
> you launch your tests at large scale. If, by any chance, you have a
> template script, it would be helpful.
>
>
>
> Ahh, yes, we are aware of this. One way to get around it is to set your
> umask to 0002 and use sudo to run the script (I mean not directly by root).
>
>
>
> Best,
>
> Balazs
>
>
>
> Thank you in advance.
>
>
>
> Best regards,
>
> Jérémie Finiel