[Mckernel-users 39] Re: McKernel and Turbo
Scott Walker
walker8 at email.arizona.edu
Fri Jun 2 07:03:34 JST 2017
Hi Balazs,
I just tested the exact same McKernel source on a Sandy Bridge machine with
the same Linux version. I still see the same bug with the 64-bit return
values. This experiment was identical to the code I sent you.
Thanks,
Scott Walker
On May 31, 2017 2:52 PM, "Scott Walker" <walker8 at email.arizona.edu> wrote:
> Hi Balazs,
> I checked the userspace program and the system call again and everything
> looks OK. However, I still don't get the correct value when returning 64
> bit numbers. I tried creating another syscall test and switched everything
> involving return values to use "uint64_t" to ensure that the data types
> were 64 bit. Is it possible that this is a problem with running McKernel on
> Kaby Lake? Normally I would doubt that but it is unusual that you don't see
> this result at all.
>
> Here is the syscall:
>
> SYSCALL_DECLARE(sys_rdmsr)
> {
>     unsigned reg = ihk_mc_syscall_arg0(ctx);
>     unsigned long *addr = (unsigned long *) ihk_mc_syscall_arg1(ctx);
>     uint64_t ret = 0x123456789ABCDEF1ULL;
>     *addr = ret;
>     return ret;
> }
>
> Here is the userspace program:
> int main()
> {
>     uint64_t tsc = 0;
>     uint64_t ret = 0;
>     unsigned long i;
>     for (i = 0; i < 1000000UL; i++)
>     {
>         // this syscall is hardcoded to return 0x123456789ABCDEF1
>         ret = syscall(312, 0x10, &tsc);
>         printf("t %lu(%lx), ", tsc, tsc);
>         printf("ret is %lu(%lx)\n", ret, ret);
>     }
>     return 0;
> }
>
> Here are some results of running this code with different values returned
> from the syscall. The two values on the left are the correct value, stored
> at *addr, in decimal and hex. The two values on the right are the value
> returned from the syscall, in the same two forms. It appears that the upper
> 32 bits are all set to 1 whenever bit 31 is 1. If bit 31 is 0, any bits set
> in the upper 32 bits are discarded.
>
> format: correct(hex), returnval(returnval as hex)
> t 2147483648(80000000), ret is 18446744071562067968(ffffffff80000000)
> t 1879048192(70000000), ret is 1879048192(70000000)
> t 66303557632(f70000000), ret is 1879048192(70000000)
> t 1311768464867721217(1234567800000001), ret is 1(1)
>
>
>
> I came across another problem yesterday in McKernel. When I cause a
> floating point exception, McKernel locks up. A normal restart (with
> mcreboot or through Linux) does not stop McKernel; I need to hold the power
> button down to stop it.
>
> thanks,
> Scott
>
> On Mon, May 29, 2017 at 1:48 AM, Balazs Gerofi <bgerofi at riken.jp> wrote:
>
>> Hello Scott,
>>
>> On Wed, May 24, 2017 at 3:08 PM, Balazs Gerofi <bgerofi at riken.jp> wrote:
>>
>>> I will look into the 64 bit return value issue hopefully sometime later
>>> this week.
>>>
>>
>> I looked at the 64 bit return value issue just now, but it seems to me
>> that it is working okay.
>> I added a system call that receives an integer and shifts a bit by the
>> argument, it looks like this:
>>
>> SYSCALL_DECLARE(shift)
>> {
>>     int shift = (int)ihk_mc_syscall_arg0(ctx);
>>     unsigned long ret = 1UL << shift;
>>     kprintf("%s: shift: %d, ret: %lu\n", __FUNCTION__, shift, ret);
>>     return ret;
>> }
>>
>> I call this from userspace and this is what I get:
>>
>> $ for i in 0 7 31 48 63; do ./bin/mcexec ~/src/syscall $i; done
>> main: 1
>> main: 128
>> main: 2147483648
>> main: 281474976710656
>> main: 9223372036854775808
>>
>> This seems to suggest that the 64 bit values work fine (the return value
>> for shifting by 63 is 2^63).
>> What exactly are you comparing against? Isn't it possible that something
>> is wrong with the userspace part of your code?
>>
>> Bests,
>> Balazs
>>
>>
>>
>>> For the time being, you could just try:
>>>
>>> unsigned long *addr = ihk_mc_syscall_arg1(ctx);
>>> *addr = __rdmsr(reg);
>>>
>>> in your sys_rdmsr() implementation, because on x86 it is possible to
>>> access userspace from the kernel (assuming that you are in the given
>>> process' context). This should enable you to proceed with your experiments.
>>> Note that copy_to_user() is slower because it goes through various
>>> verification steps to make sure the address is valid.
>>>
>>> Best,
>>> Balazs
>>>
>>>
>>> On Tue, May 23, 2017 at 3:51 PM, Scott Walker <walker8 at email.arizona.edu>
>>> wrote:
>>>
>>>> Hi Balazs,
>>>> When I went to get my code to send to you, I noticed something which
>>>> may better indicate the problem.
>>>>
>>>> When I first made my read/write MSR system calls, I was passing and
>>>> returning the register values to the syscall. This was resulting in a bug
>>>> where 64 bit values passed and returned had strange behavior. I did not
>>>> realize that I left debug code in the system call when I was trying to
>>>> figure this out, and that was causing the readings to come back indicating
>>>> the aforementioned strange turbo behavior.
>>>>
>>>> I changed my code to use the only way I could get the read/write MSR
>>>> syscalls to work, which is using copy_to/from_user to handle all 64 bit
>>>> values. This shows that turbo is indeed enabled when I use the "-t" flag
>>>> (and off without the -t flag). However, the problem still remains where the
>>>> 64 bit values being passed and returned have odd behavior. It looks like
>>>> returned 64 bit values have the sign bit flipped sometimes, and passed 64
>>>> bit values have the top 32 bits truncated.
>>>>
>>>> I have attached a test program which shows this strange behavior,
>>>> along with my two McKernel read/write system calls. In addition to this
>>>> program's output, you can uncomment the "kprintf" and check the McKernel
>>>> klog, which shows the problems with 64 bit values being passed to the
>>>> system call. Please let me know if there is a fix for this problem as using
>>>> copy_to/from_user has too high overhead for our experiment.
>>>>
>>>> Additional info:
>>>>
>>>> All of these tests have hyper threading off.
>>>>
>>>> In my test program I am comparing the value of a system call reading
>>>> the TSC, and using the "RDTSC" instruction in the user program. Both
>>>> counters still increment at the same rate but the TSC value returned from
>>>> the kernel has strange behavior.
>>>>
>>>> I tried using my own rdmsr/wrmsr assembly functions instead of the one
>>>> in "registers.h" but they both had the same result. However, you can verify
>>>> correct values within the kernel with kprintf.
>>>>
>>>> thanks,
>>>> Scott
>>>>
>>>> On Tue, May 23, 2017 at 12:13 PM, Balazs Gerofi <bgerofi at riken.jp>
>>>> wrote:
>>>>
>>>>> Hello Scott,
>>>>>
>>>>> On Mon, May 22, 2017 at 2:07 PM, Scott Walker <
>>>>> walker8 at email.arizona.edu> wrote:
>>>>>
>>>>>> I have attached a tarball with some results in it. The 5th column and
>>>>>> the 7th and 8th columns are the interesting results. The 5th column is the
>>>>>> measured frequency. The 7th column is one of the turbo disable bits, and
>>>>>> the 8th column is another turbo disable bit. The 7th and 8th column should
>>>>>> be non-zero if turbo is disabled.
>>>>>>
>>>>>
>>>>> I took a look at the tarball, but I cannot find any results file in
>>>>> it. Which file should I look at?
>>>>>
>>>>>
>>>>>> In the linux results the frequency stays in turbo and the turbo bits
>>>>>> are both off, as expected. In the McKernel results, the processor is not in
>>>>>> a turbo frequency but the turbo disable bits do not stay off, sometimes
>>>>>> both of them are zero.
>>>>>>
>>>>>> I have noticed the following bugs regarding turbo:
>>>>>>
>>>>>> The "-t" flag to mcreboot does not enable turbo. It appears to do
>>>>>> nothing.
>>>>>>
>>>>>
>>>>> This is strange, I just double-checked it by printing out the MSR and
>>>>> it does toggle the turbo bit.
>>>>> Turbo is bit 32 of MSR_IA32_PERF_CTL (0x199) and is configured in
>>>>> init_pstate_and_turbo() function in arch/x86/kernel/cpu.c, I actually have
>>>>> the text from the Intel manual copied there:
>>>>>
>>>>> /* Turbo boost setting:
>>>>>  * Bit 1 of EAX in Leaf 06H (i.e. CPUID.06H:EAX[1]) indicates opportunistic
>>>>>  * processor performance operation, such as IDA, has been enabled by BIOS.
>>>>>  *
>>>>>  * IA32_PERF_CTL (0x199H) bit 32: IDA (i.e., turbo boost) Engage. (R/W)
>>>>>  * When set to 1: disengages IDA
>>>>>  * When set to 0: enables IDA
>>>>>  */
>>>>>
>>>>> Is this the MSR you are looking at?
>>>>>
>>>>>> When turbo is supposedly disabled in McKernel, I sometimes see the
>>>>>> processor in turbo frequencies.
>>>>>>
>>>>>
>>>>> What platform are you using and how do you configure Linux and
>>>>> McKernel CPUs exactly? Do you split HW threads of the same CPU core between
>>>>> the two kernels?
>>>>>
>>>>>
>>>>>> mcstop+release sometimes does not re-enable turbo. I'm not sure how
>>>>>> to replicate this bug, it just seems to happen sometimes.
>>>>>>
>>>>>
>>>>> Thanks for reporting this, I am adding a bug report.
>>>>>
>>>>>
>>>>>> As for trying these experiments yourself, it may be a little tricky.
>>>>>> I modified the McKernel source code to provide two system calls which allow
>>>>>> me to read and write MSRs. I can send you a diff of the modifications if
>>>>>> that helps.
>>>>>>
>>>>>
>>>>> I do have a patch in a development branch that does similar things
>>>>> actually, but yes, send me your code please!
>>>>>
>>>>> Thanks,
>>>>> Balazs
>>>>>
>>>>>
>>>>>> thanks,
>>>>>> Scott
>>>>>>
>>>>>> On Thu, May 4, 2017 at 1:20 PM, Balazs Gerofi <bgerofi at riken.jp>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello Scott,
>>>>>>>
>>>>>>> could you let me know what this test is exactly?
>>>>>>> If you can share it, I could try to run on one of our machines to
>>>>>>> investigate what's going on.
>>>>>>>
>>>>>>> Balazs
>>>>>>>
>>>>>>> On Thu, May 4, 2017 at 12:01 AM, Scott Walker <
>>>>>>> walker8 at email.arizona.edu> wrote:
>>>>>>>
>>>>>>>> Hi Balazs,
>>>>>>>>
>>>>>>>> I tried executing the same test in Linux and I didn't see it
>>>>>>>> modifying those registers at all. I am currently either partitioning one or
>>>>>>>> two cores to McKernel, and never core 0.
>>>>>>>>
>>>>>>>> I'll let you know if I find out more.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Scott Walker
>>>>>>>>
>>>>>>>> On May 3, 2017 10:50 PM, "Balazs Gerofi" <bgerofi at riken.jp> wrote:
>>>>>>>>
>>>>>>>> Hi Scott,
>>>>>>>>
>>>>>>>> that sounds strange, McKernel touches those MSRs only during boot.
>>>>>>>> How do you partition your CPUs? Isn't it possible that someone
>>>>>>>> else (e.g., Linux) makes modifications to some MSRs on the fly?
>>>>>>>>
>>>>>>>> Balazs
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, May 3, 2017 at 10:26 PM, Scott Walker <
>>>>>>>> walker8 at email.arizona.edu> wrote:
>>>>>>>>
>>>>>>>>> Hi Balazs,
>>>>>>>>>
>>>>>>>>> Thanks, that's exactly what we need.
>>>>>>>>>
>>>>>>>>> For "sometimes enabled" here is what I observed:
>>>>>>>>>
>>>>>>>>> If I run a bunch of mcexec jobs where I check to see if those 2
>>>>>>>>> turbo disable bits are set to 1, I noticed that they independently change
>>>>>>>>> states. Sometimes both bits would be true, sometimes only one or the other
>>>>>>>>> would be true, and at other times they would both be false.
>>>>>>>>>
>>>>>>>>> I'm not sure if Turbo actually becomes active during the latter
>>>>>>>>> observation, I could find out if you want. Also, if I try to change the
>>>>>>>>> values of those bits then McKernel locks up.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Scott Walker
>>>>>>>>>
>>>>>>>>> On May 3, 2017 10:14 PM, "Balazs Gerofi" <bgerofi at riken.jp> wrote:
>>>>>>>>>
>>>>>>>>>> Hello Scott,
>>>>>>>>>>
>>>>>>>>>> On Wed, May 3, 2017 at 5:45 PM, Scott Walker <
>>>>>>>>>> walker8 at email.arizona.edu> wrote:
>>>>>>>>>>
>>>>>>>>>>> The experiments we are doing require Intel Turbo to be enabled
>>>>>>>>>>> but we noticed that McKernel is disabling it. I am seeing that the Turbo
>>>>>>>>>>> disable bits in the PERF_CONTROL MSR and MISC_ENABLE are sometimes enabled.
>>>>>>>>>>>
>>>>>>>>>>> This is not happening with the same system running linux. Is
>>>>>>>>>>> there a way we can disable this in McKernel? I've been unable to track this
>>>>>>>>>>> down in the McKernel source code.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Turbo mode is disabled by default. If you want to enable it
>>>>>>>>>> please pass -t to the mcreboot script.
>>>>>>>>>> Also, what do you exactly mean by "sometimes enabled"?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Balazs
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Scott Walker
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>