
Lilu CPU detection broken for KabyLake vs. CoffeeLake #290

Closed

RehabMan opened this issue Oct 21, 2018 · 9 comments

@RehabMan

I made this commit based on our discussion in the AppleALC project:
acidanthera/Lilu@b4e77b9

While it fixed the problem I had with the CoffeeLake i7-8559U (now correctly detected as CFL), it broke detection for KabyLake-R i7-8650U.

i7-8650U (KabyLake-R) CPUID 000806EA, (microcode 00000070)
i7-8559U (CoffeeLake) CPUID 000806EA, (microcode 00000096)

What is your opinion on differentiating this scenario?

@vit9696 (Contributor) commented Oct 21, 2018

Hi, I was aware of that nuance when suggesting the fix, and the behaviour is intentional. While Kaby Lake Refresh is called "Kaby", it has everything of Coffee Lake (e.g. the UHD GPU), so it is more appropriate to consider it Coffee with Kaby manufacturing technology than an actual Kaby. This should be closed, I believe.

@RehabMan (Author)

UHD 620 in KabyLake-R is a bit of a strange case because it is supported by the KBL framebuffer kext by injecting device-id=0x5916. So it is a KabyLake-R CPU with KabyLake graphics.

I found this issue when implementing support in WEG for KBL->SKL spoofing (KBL as SKL is one of those spoofs that actually works quite well). Of course, it wasn't working because I added it to the CpuGeneration::KabyLake case, while the CPU was detected as CoffeeLake, which meant the right kexts weren't load-hooked.

I ended up implementing support for yet another currentFramebuffer pointer (ending up with currentFramebuffer, currentFramebufferOpt1, currentFramebufferOpt2). It seems to work, but the code is getting ugly due to the special cases... and that code doesn't even include the (yet another) special case to hook the SKL accelerator kext (which would need currentGraphicsOpt). That is in my github fork as branch 'spoof_fix1'.

Which brings me back to my original proposal for these issues, which is to have WEG load hook all of the involved graphics/framebuffer kexts. This effort is in my github fork as branch 'spoof_fix2'.

I would appreciate any feedback you have on those two approaches. Keep in mind the code there is a WIP.

BTW, I'm noticing that on 10.14.1 (db4), there is some issue resolving the _gPlatformInformationList symbol (maybe others). The symbol is there according to the 'nm' utility... Not sure if it is related to my changes here, or just a change in the beta that will need Lilu work. Investigating still.

@RehabMan (Author) commented Oct 22, 2018

Weird... today I have no problem with a few things that were giving me issues yesterday (such as the 10.14.1 _gPlatformInformationList problem, and some other strangeness with Lilu::onKextLoad). I did make this change in an effort to track down all the strange things I was seeing:

NUC6i7KYK:lilu.git rehabman$ git diff
diff --git a/Lilu/Headers/kern_util.hpp b/Lilu/Headers/kern_util.hpp
index 17e627f..5c93dbb 100644
--- a/Lilu/Headers/kern_util.hpp
+++ b/Lilu/Headers/kern_util.hpp
@@ -660,8 +660,13 @@ public:
         */
        T *reserve(size_t num) {
                if (rsvd < num) {
-                       T *nPtr = static_cast<T *>(kern_os_realloc(ptr, num * sizeof(T)));
+                       //T *nPtr = static_cast<T *>(kern_os_realloc(ptr, num * sizeof(T)));
+                       T *nPtr = static_cast<T *>(kern_os_malloc(num * sizeof(T)));
                        if (nPtr) {
+                               if (ptr) {
+                                       lilu_os_memmove(nPtr, ptr, rsvd * sizeof(T));
+                                       kern_os_free(ptr);
+                               }
                                ptr = nPtr;
                                rsvd = num;
                        } else {

(it would seem unlikely realloc is broken, but I've seen it before...)

I will undo that and re-do all tests with 10.14 and 10.14.1 now.

BTW, it would be better if evector::reserve did NOT do a realloc for each added entry. Perhaps it should grow the reservation 10 items at a time instead.

@vit9696 (Contributor) commented Oct 22, 2018

Firstly, regarding the CPUID issue.

I checked the CPUs, and I am pretty certain these chipsets should be considered the same. It is not rare for KBL or CFL drivers to work better on some IGPUs regardless of their generation, and in the majority of cases they can be used interchangeably. This is why we permit KBL drivers for CFL without issue.

For other generations, I believe we should drop ICL and CNL code from both Lilu and WhateverGreen, as drivers for them are not currently present in any operating system and were never known to function.

With this in mind, I suppose we should have the ability to spoof CFL as KBL/SKL, and KBL as SKL. While surely imperfect, I do not have an issue with it. I saw your spoof_fix2 branch, and while with a certain level of refactoring it could be "ok", I do not believe there is any reason to go with this approach. Also, I am not positive the SKL spoof is actually needed on the latest platforms, which we primarily target. You are the first I have heard this from, and I suspect the cause was likely related to other issues; we should reconsider this.

Next, regarding evector.

This is the second time somebody has brought up evector (e.g. acidanthera/Lilu#46), and the point is that people do not see the reasoning behind it, and often simply break it.

kern_os_realloc is provided by macOS and is certainly not broken. It also takes care of memory reservation, which is why there is no need to implement it yourself: for most calls reallocation will not cost anything at all. UPD: actually it does not :/, the condition is nsize == osize instead of nsize <= osize; see libkern/c++/OSRuntime.cpp.

The reason the reserve method exists is not to avoid nonexistent performance issues, but to guarantee no kern_os_realloc calls (which may invoke the allocator) during the insertion of N elements. This property is crucial for code executing in page-fault context, like the interrupt emulation in VirtualSMC. You are also not allowed to allocate while holding a spinlock, and in some other cases.

@RehabMan (Author)

Is there a performance implication with the spoof_fix2 technique? I ask because aside from that, the resulting code is arguably cleaner than other methods. With some refactoring, I can impose limits on the range of spoofing allowed, and like you, I think that is probably a good idea. But, I would like to understand any performance implications of unrestrained onKextLoad.

I'm ok with dropping ICL and CNL until such hardware arrives, or we can leave it in as a template for the future (as it does no harm remaining there). Either way.

The KBL->SKL spoof is already being used (and I made no announcement of my fork), since the KBL graphics kexts in Mojave are fairly broken for certain laptops while the SKL kexts work just fine.

As far as this specific issue regarding lilucpu, I'm willing to close but perhaps with a README comment regarding flawed Kaby vs. CoffeeLake detection for certain Kaby-R CPUs.

And if we are to continue to switch on detected CPU in WhateverGreen, I would suggest a separate flag, specific for WhateverGreen (wegcpu?) such that the user could provide the alternate CPU just for WEG graphics detect. Because perhaps I want to do some graphics spoofing without affecting other Lilu plugins also dependent on core Lilu CPU detection.

And regarding kern_os_realloc, it does seem unlikely it is broken... but I was seeing some really strange stuff, so I started trying to eliminate possibilities.

For evector::reserve, I understand your issue/design in the context of interrupt-time calls... My proposal would be to pre-allocate the evector(s) (LiluAPI) to a reasonable size (e.g. call reserve(10) in the constructor), as it would avoid multiple realloc calls in the most common situations of Lilu callback/kext setup.

@RehabMan (Author)

Note that my proposal for evector::reserve was more like this (I would probably make the constant 10 a template param, but you get the gist):

NUC6i7KYK:lilu.git rehabman$ git diff
diff --git a/Lilu/Headers/kern_util.hpp b/Lilu/Headers/kern_util.hpp
index 17e627f..4568bc5 100644
--- a/Lilu/Headers/kern_util.hpp
+++ b/Lilu/Headers/kern_util.hpp
@@ -660,10 +660,10 @@ public:
         */
        T *reserve(size_t num) {
                if (rsvd < num) {
-                       T *nPtr = static_cast<T *>(kern_os_realloc(ptr, num * sizeof(T)));
+                       T *nPtr = static_cast<T *>(kern_os_realloc(ptr, (num + 10) * sizeof(T)));
                        if (nPtr) {
                                ptr = nPtr;
-                               rsvd = num;
+                               rsvd = num + 10;
                        } else {
                                return nullptr;
                        }

It would not affect your concerns regarding interrupt time calls and unexpected allocator execution.

@vit9696 (Contributor) commented Oct 22, 2018

Nobody measured the performance, but I believe it will not be noticeable with the latest Lilu, which does take care of not invoking disk I/O when kexts are disabled. The only overhead is larger storedKexts, but it is processed only once.

A bigger concern, as I see it, is the actual overcomplication of the logic. Besides KBL/CFL spoofing there is no need to use kexts from a different CPU generation. Why should one load a random IGPU kext on a random CPU model? For SKL, I am afraid it is a hack there is no reason to support: Apple will update the SKL code to match the KBL code sooner or later, as both are still maintained, and you will have to solve the root cause of the issue anyway. I do not believe using SKL kexts is a solution at all; it should either be addressed properly, wait for bugfixes from Apple, or not be used at all (for instance, I am still on 10.13). If you think I am mistaken, you will have to be a lot more specific about the technical details of the existing issues.

For reserve, allocating more items than necessary by default feels wrong in most of our use cases. I believe it is fine to have a template param in push_back and reserve that expands to a multiplier/divisor for extra storage reservation and equals 1 by default, but the use case and actual performance gain need to be justified. This way the existing code continues to function as-is, and the code on the hot path gets the desired optimisation.

@RehabMan (Author)

FYI: KBL as SKL is more applicable than CFL as KBL (more changed with CFL; in particular, as we are discovering with laptops, the PWM backlight register layout is different).

I'll work up another branch that limits spoofing to those combos that make sense and the wegcpu idea.

As far as the evector issue, I'll let you decide/implement what you like... I was just letting you know about the inefficiency of doing a realloc on every push_back.

@RehabMan (Author)

Closing this issue, as we are punting on the CPU detection bug.

vit9696 added a commit to acidanthera/Lilu that referenced this issue Oct 23, 2018
…habMan.

This commit provides finer control over `evector` allocation, to work around the naive implementation of `kern_os_realloc`.
Initial bugreport: #46
Further information: acidanthera/bugtracker#290 (comment)