[Mckernel-users 40] Re: McKernel and Turbo

Balazs Gerofi bgerofi at riken.jp
Fri Jun 2 08:18:16 JST 2017


Hello Scott,

this is bizarre. I just tried the exact same thing:

-----------------------------------------------------------------
SYSCALL_DECLARE(my_call)
{
    uint64_t ret = 0x123456789ABCDEF1ULL;
    return ret;
}

-----------------------------------------------------------------
#define _GNU_SOURCE
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>

int main(int argc, char *argv[])
{
        unsigned long l;

        l = syscall(700);
        printf("%s: %lX\n", __FUNCTION__, l);

        return 0;
}
-----------------------------------------------------------------

Output:
$ mcexec ./syscall
main: 123456789ABCDEF1

How do you declare the syscall?
Is it SYSCALL_HANDLED(number, sys_rdmsr)
in mckernel/arch/x86/kernel/include/syscall_list.h?

Best,
Balazs


On Thu, Jun 1, 2017 at 3:03 PM, Scott Walker <walker8 at email.arizona.edu>
wrote:

> Hi Balazs,
> I just tested the exact same McKernel source on a Sandy Bridge machine
> with the same linux version. I still see the same bug with the 64 bit
> return values. This experiment was identical to the code I sent you.
>
> Thanks,
> Scott Walker
>
> On May 31, 2017 2:52 PM, "Scott Walker" <walker8 at email.arizona.edu> wrote:
>
>> Hi Balazs,
>> I checked the userspace program and the system call again and everything
>> looks OK. However, I still don't get the correct value when returning 64
>> bit numbers. I tried creating another syscall test and switched everything
>> involving return values to use "uint64_t" to ensure that the data types
>> were 64 bit. Is it possible that this is a problem with running McKernel on
>> Kaby Lake? Normally I would doubt that but it is unusual that you don't see
>> this result at all.
>>
>> Here is the syscall:
>>
>> SYSCALL_DECLARE(sys_rdmsr)
>> {
>>     unsigned reg = ihk_mc_syscall_arg0(ctx);
>>     unsigned long *addr = (unsigned long *) ihk_mc_syscall_arg1(ctx);
>>     uint64_t ret = 0x123456789ABCDEF1ULL;
>>     *addr = ret;
>>     return ret;
>> }
>>
>> Here is the userspace program:
>> int main()
>> {
>>         uint64_t tsc = 0;
>>         uint64_t ret = 0;
>>         unsigned long i;
>>         for (i = 0; i < 1000000UL; i++)
>>         {
>>                 // this syscall is hardcoded to return 0x123456789ABCDEF1
>>                 ret = syscall(312, 0x10, &tsc);
>>                 printf("t %lu(%lx), ", tsc, tsc);
>>                 printf("ret is %lu(%lx)\n", ret, ret);
>>         }
>>         return 0;
>> }
>>
>> Here are some results of running this code with different values returned
>> from the syscall. The 2 values on the left are the correct values, stored
>> at *addr, in decimal and hex. The values on the right are the values
>> returned from the syscall. It appears that the upper 32 bits are all set to
>> true when bit 31 becomes true. If bit 31 is false and there are bits true
>> in the upper 32 bits, they get discarded.
>>
>> format: correct(hex), returnval(returnval as hex)
>> t 2147483648(80000000), ret is 18446744071562067968(ffffffff80000000)
>> t 1879048192(70000000), ret is 1879048192(70000000)
>> t 66303557632(f70000000), ret is 1879048192(70000000)
>> t 1311768464867721217(1234567800000001), ret is 1(1)
>>
>>
>>
>> I came across another problem yesterday in McKernel. When I cause a
>> floating point exception, McKernel locks up. A normal restart (with
>> mcreboot or through linux) does not stop McKernel, I need to hold the power
>> button down to stop it.
>>
>> thanks,
>> Scott
>>
>> On Mon, May 29, 2017 at 1:48 AM, Balazs Gerofi <bgerofi at riken.jp> wrote:
>>
>>> Hello Scott,
>>>
>>> On Wed, May 24, 2017 at 3:08 PM, Balazs Gerofi <bgerofi at riken.jp> wrote:
>>>
>>>> I will look into the 64 bit return value issue hopefully sometime
>>>> later this week.
>>>>
>>>
>>> I looked at the 64 bit return value issue just now, but it seems to me
>>> that it is working okay.
>>> I added a system call that receives an integer and shifts a bit by the
>>> argument, it looks like this:
>>>
>>> SYSCALL_DECLARE(shift)
>>> {
>>>     int shift = (int)ihk_mc_syscall_arg0(ctx);
>>>     unsigned long ret = 1UL << shift;
>>>     kprintf("%s: shift: %d, ret: %lu\n", __FUNCTION__, shift, ret);
>>>     return ret;
>>> }
>>>
>>> I call this from userspace and this is what I get:
>>>
>>> $ for i in 0 7 31 48 63; do ./bin/mcexec ~/src/syscall $i; done
>>> main: 1
>>> main: 128
>>> main: 2147483648
>>> main: 281474976710656
>>> main: 9223372036854775808
>>>
>>> This seems to suggest that the 64 bit values work fine (the return value
>>> for shifting by 63 is 2^63).
>>> What exactly are you comparing against? Isn't it possible that something
>>> is wrong with the userspace part of your code?
>>>
>>> Bests,
>>> Balazs
>>>
>>>
>>>
>>>> For the time being, you could just try:
>>>>
>>>> unsigned long *addr = (unsigned long *)ihk_mc_syscall_arg1(ctx);
>>>> *addr = __rdmsr(reg);
>>>>
>>>> in your sys_rdmsr() implementation, because on x86 it is possible to
>>>> access userspace from the kernel (assuming that you are in the given
>>>> process' context).  This should enable you to proceed with your experiments.
>>>> Note that copy_to_user() is slower because it goes through various
>>>> verification steps to make sure the address is valid.
>>>>
>>>> Best,
>>>> Balazs
>>>>
>>>>
>>>> On Tue, May 23, 2017 at 3:51 PM, Scott Walker <
>>>> walker8 at email.arizona.edu> wrote:
>>>>
>>>>> Hi Balazs,
>>>>> When I went to get my code to send to you, I noticed something which
>>>>> may better indicate the problem.
>>>>>
>>>>> When I first made my read/write MSR system calls, I was passing and
>>>>> returning the register values to the syscall. This was resulting in a bug
>>>>> where 64 bit values passed and returned had strange behavior. I did not
>>>>> realize that I left debug code in the system call when I was trying to
>>>>> figure this out, and that was causing the readings to come back indicating
>>>>> the aforementioned strange turbo behavior.
>>>>>
>>>>> I changed my code to use the only way I could get the read/write MSR
>>>>> syscalls to work, which is using copy_to/from_user to handle all 64 bit
>>>>> values. This shows that turbo is indeed enabled when I use the "-t" flag
>>>>> (and off without the -t flag). However, the problem still remains where the
>>>>> 64 bit values being passed and returned have odd behavior. It looks like
>>>>> returned 64 bit values have the sign bit flipped sometimes, and passed 64
>>>>> bit values have the top 32 bits truncated.
>>>>>
>>>>> I have attached a test program which shows this strange behavior,
>>>>> along with my two McKernel read/write system calls. In addition to this
>>>>> program's output, you can uncomment the "kprintf" and check the McKernel
>>>>> klog, which shows the problems with 64 bit values being passed to the
>>>>> system call. Please let me know if there is a fix for this problem as using
>>>>> copy_to/from_user has too high overhead for our experiment.
>>>>>
>>>>> Additional info:
>>>>>
>>>>> All of these tests have hyper threading off.
>>>>>
>>>>> In my test program I am comparing the value of a system call reading
>>>>> the TSC, and using the "RDTSC" instruction in the user program. Both
>>>>> counters still increment at the same rate but the TSC value returned from
>>>>> the kernel has strange behavior.
>>>>>
>>>>> I tried using my own rdmsr/wrmsr assembly functions instead of the one
>>>>> in "registers.h" but they both had the same result. However, you can verify
>>>>> correct values within the kernel with kprintf.
>>>>>
>>>>> thanks,
>>>>> Scott
>>>>>
>>>>> On Tue, May 23, 2017 at 12:13 PM, Balazs Gerofi <bgerofi at riken.jp>
>>>>> wrote:
>>>>>
>>>>>> Hello Scott,
>>>>>>
>>>>>> On Mon, May 22, 2017 at 2:07 PM, Scott Walker <
>>>>>> walker8 at email.arizona.edu> wrote:
>>>>>>
>>>>>>> I have attached a tarball with some results in it. The 5th column
>>>>>>> and the 7th and 8th columns are the interesting results. The 5th column is
>>>>>>> the measured frequency. The 7th column is one of the turbo disable bits,
>>>>>>> and the 8th column is another turbo disable bit. The 7th and 8th column
>>>>>>> should be non-zero if turbo is disabled.
>>>>>>>
>>>>>>
>>>>>> I took a look at the tarball, but I cannot find any results file in
>>>>>> it. Which file should I look at?
>>>>>>
>>>>>>
>>>>>>> In the linux results the frequency stays in turbo and the turbo bits
>>>>>>> are both off, as expected. In the McKernel results, the processor is not in
>>>>>>> a turbo frequency but the turbo disable bits do not stay off, sometimes
>>>>>>> both of them are zero.
>>>>>>>
>>>>>>> I have noticed the following bugs regarding turbo:
>>>>>>>
>>>>>>> The "-t" flag to mcreboot does not enable turbo. It appears to do
>>>>>>> nothing.
>>>>>>>
>>>>>>
>>>>>> This is strange, I just double-checked it by printing out the MSR and
>>>>>> it does toggle the turbo bit.
>>>>>> Turbo is bit 32 of MSR_IA32_PERF_CTL (0x199) and is configured in the
>>>>>> init_pstate_and_turbo() function in arch/x86/kernel/cpu.c. I actually have
>>>>>> the text from the Intel manual copied there:
>>>>>>
>>>>>>     /* Turbo boost setting:
>>>>>>      * Bit 1 of EAX in Leaf 06H (i.e. CPUID.06H:EAX[1]) indicates opportunistic
>>>>>>      * processor performance operation, such as IDA, has been enabled by BIOS.
>>>>>>      *
>>>>>>      * IA32_PERF_CTL (0x199H) bit 32: IDA (i.e., turbo boost) Engage. (R/W)
>>>>>>      * When set to 1: disengages IDA
>>>>>>      * When set to 0: enables IDA
>>>>>>      */
>>>>>>
>>>>>> Is this the MSR you are looking at?
>>>>>>
>>>>>>> When turbo is supposedly disabled in McKernel, I sometimes see the
>>>>>>> processor in turbo frequencies.
>>>>>>>
>>>>>>
>>>>>> What platform are you using and how do you configure Linux and
>>>>>> McKernel CPUs exactly? Do you split HW threads of the same CPU core between
>>>>>> the two kernels?
>>>>>>
>>>>>>
>>>>>>> mcstop+release sometimes does not re-enable turbo. I'm not sure how
>>>>>>> to replicate this bug, it just seems to happen sometimes.
>>>>>>>
>>>>>>
>>>>>> Thanks for reporting this, I am adding a bug report.
>>>>>>
>>>>>>
>>>>>>> As for trying these experiments yourself, it may be a little tricky.
>>>>>>> I modified the McKernel source code to provide two system calls which allow
>>>>>>> me to read and write MSRs. I can send you a diff of the modifications if
>>>>>>> that helps.
>>>>>>>
>>>>>>
>>>>>> I do have a patch in a development branch that does similar things
>>>>>> actually, but yes, send me your code please!
>>>>>>
>>>>>> Thanks,
>>>>>> Balazs
>>>>>>
>>>>>>
>>>>>>> thanks,
>>>>>>> Scott
>>>>>>>
>>>>>>> On Thu, May 4, 2017 at 1:20 PM, Balazs Gerofi <bgerofi at riken.jp>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello Scott,
>>>>>>>>
>>>>>>>> could you let me know what this test is exactly?
>>>>>>>> If you can share it, I could try to run on one of our machines to
>>>>>>>> investigate what's going on.
>>>>>>>>
>>>>>>>> Balazs
>>>>>>>>
>>>>>>>> On Thu, May 4, 2017 at 12:01 AM, Scott Walker <
>>>>>>>> walker8 at email.arizona.edu> wrote:
>>>>>>>>
>>>>>>>>> Hi Balazs,
>>>>>>>>>
>>>>>>>>> I tried executing the same test in Linux and I didn't see it
>>>>>>>>> modifying those registers at all. I am currently either partitioning one or
>>>>>>>>> two cores to McKernel, and never core 0.
>>>>>>>>>
>>>>>>>>> I'll let you know if I find out more.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Scott Walker
>>>>>>>>>
>>>>>>>>> On May 3, 2017 10:50 PM, "Balazs Gerofi" <bgerofi at riken.jp> wrote:
>>>>>>>>>
>>>>>>>>> Hi Scott,
>>>>>>>>>
>>>>>>>>> that sounds strange, McKernel touches those MSRs only during boot.
>>>>>>>>> How do you partition your CPUs?  Isn't it possible that someone
>>>>>>>>> else (e.g., Linux) makes modifications to some MSRs on the fly?
>>>>>>>>>
>>>>>>>>> Balazs
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, May 3, 2017 at 10:26 PM, Scott Walker <
>>>>>>>>> walker8 at email.arizona.edu> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Balazs,
>>>>>>>>>>
>>>>>>>>>> Thanks, that's exactly what we need.
>>>>>>>>>>
>>>>>>>>>> For "sometimes enabled" here is what I observed:
>>>>>>>>>>
>>>>>>>>>> If I run a bunch of mcexec jobs where I check to see if those 2
>>>>>>>>>> turbo disable bits are set to 1, I noticed that they independently change
>>>>>>>>>> states. Sometimes both bits would be true, sometimes only one or the other
>>>>>>>>>> would be true, and at other times they would both be false.
>>>>>>>>>>
>>>>>>>>>> I'm not sure if Turbo actually becomes active during the latter
>>>>>>>>>> observation, I could find out if you want. Also, if I try to change the
>>>>>>>>>> values of those bits then McKernel locks up.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Scott Walker
>>>>>>>>>>
>>>>>>>>>> On May 3, 2017 10:14 PM, "Balazs Gerofi" <bgerofi at riken.jp>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello Scott,
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 3, 2017 at 5:45 PM, Scott Walker <
>>>>>>>>>>> walker8 at email.arizona.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The experiments we are doing require Intel Turbo to be enabled
>>>>>>>>>>>> but we noticed that McKernel is disabling it. I am seeing that the Turbo
>>>>>>>>>>>> disable bits in the PERF_CONTROL MSR and MISC_ENABLE are sometimes enabled.
>>>>>>>>>>>>
>>>>>>>>>>>> This is not happening with the same system running linux. Is
>>>>>>>>>>>> there a way we can disable this in McKernel? I've been unable to track this
>>>>>>>>>>>> down in the McKernel source code.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Turbo mode is disabled by default. If you want to enable it
>>>>>>>>>>> please pass -t to the mcreboot script.
>>>>>>>>>>> Also, what exactly do you mean by "sometimes enabled"?
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Balazs
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Scott Walker
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>