[amd64] Fix more regressions due to "efi: Build our own page table structure"
- efi: Fix boot crash by always mapping boot service regions into new EFI page tables (Closes: #815125) - mm/pat: Fix boot crash when 1GB pages are not supported by cpu Although I don't have confirmation from Norbert Preining for #815125, it does look like these fix all the remaining regressions.
This commit is contained in:
parent
20994ffba4
commit
328762c833
|
@ -73,6 +73,11 @@ linux (4.4.5-1) UNRELEASED; urgency=medium
|
||||||
* [x86] drm/i915: Fix oops caused by fbdev initialization failure
|
* [x86] drm/i915: Fix oops caused by fbdev initialization failure
|
||||||
* module: Fix ABI change in 4.4.5
|
* module: Fix ABI change in 4.4.5
|
||||||
* Revert "libata: Align ata_device's id on a cacheline" to avoid ABI change
|
* Revert "libata: Align ata_device's id on a cacheline" to avoid ABI change
|
||||||
|
* [amd64] Fix more regressions due to "efi: Build our own page table
|
||||||
|
structure":
|
||||||
|
- efi: Fix boot crash by always mapping boot service regions into new EFI
|
||||||
|
page tables (Closes: #815125)
|
||||||
|
- mm/pat: Fix boot crash when 1GB pages are not supported by cpu
|
||||||
|
|
||||||
[ Ian Campbell ]
|
[ Ian Campbell ]
|
||||||
* [arm64] Enable ARCH_HISI (Hisilicon) and the set of currently available
|
* [arm64] Enable ARCH_HISI (Hisilicon) and the set of currently available
|
||||||
|
|
197
debian/patches/bugfix/x86/x86-efi-fix-boot-crash-by-always-mapping-boot-servic.patch
vendored
Normal file
197
debian/patches/bugfix/x86/x86-efi-fix-boot-crash-by-always-mapping-boot-servic.patch
vendored
Normal file
|
@ -0,0 +1,197 @@
|
||||||
|
From: Matt Fleming <matt@codeblueprint.co.uk>
|
||||||
|
Date: Fri, 11 Mar 2016 11:19:23 +0000
|
||||||
|
Subject: x86/efi: Fix boot crash by always mapping boot service regions into
|
||||||
|
new EFI page tables
|
||||||
|
Origin: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit?id=452308de61056a539352a9306c46716d7af8a1f1
|
||||||
|
Bug-Debian: https://bugs.debian.org/815125
|
||||||
|
|
||||||
|
Some machines have EFI regions in page zero (physical address
|
||||||
|
0x00000000) and historically that region has been added to the e820
|
||||||
|
map via trim_bios_range(), and ultimately mapped into the kernel page
|
||||||
|
tables. It was not mapped via efi_map_regions() as one would expect.
|
||||||
|
|
||||||
|
Alexis reports that with the new separate EFI page tables some boot
|
||||||
|
services regions, such as page zero, are not mapped. This triggers an
|
||||||
|
oops during the SetVirtualAddressMap() runtime call.
|
||||||
|
|
||||||
|
For the EFI boot services quirk on x86 we need to memblock_reserve()
|
||||||
|
boot services regions until after SetVirtualAddressMap(). Doing that
|
||||||
|
while respecting the ownership of regions that may have already been
|
||||||
|
reserved by the kernel was the motivation behind this commit:
|
||||||
|
|
||||||
|
7d68dc3f1003 ("x86, efi: Do not reserve boot services regions within reserved areas")
|
||||||
|
|
||||||
|
That patch was merged at a time when the EFI runtime virtual mappings
|
||||||
|
were inserted into the kernel page tables as described above, and the
|
||||||
|
trick of setting ->numpages (and hence the region size) to zero to
|
||||||
|
track regions that should not be freed in efi_free_boot_services()
|
||||||
|
meant that we never mapped those regions in efi_map_regions(). Instead
|
||||||
|
we were relying solely on the existing kernel mappings.
|
||||||
|
|
||||||
|
Now that we have separate page tables we need to make sure the EFI
|
||||||
|
boot services regions are mapped correctly, even if someone else has
|
||||||
|
already called memblock_reserve(). Instead of stashing a tag in
|
||||||
|
->numpages, set the EFI_MEMORY_RUNTIME bit of ->attribute. Since it
|
||||||
|
generally makes no sense to mark a boot services region as required at
|
||||||
|
runtime, it's pretty much guaranteed the firmware will not have
|
||||||
|
already set this bit.
|
||||||
|
|
||||||
|
For the record, the specific circumstances under which Alexis
|
||||||
|
triggered this bug was that an EFI runtime driver on his machine was
|
||||||
|
responding to the EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE event during
|
||||||
|
SetVirtualAddressMap().
|
||||||
|
|
||||||
|
The event handler for this driver looks like this,
|
||||||
|
|
||||||
|
sub rsp,0x28
|
||||||
|
lea rdx,[rip+0x2445] # 0xaa948720
|
||||||
|
mov ecx,0x4
|
||||||
|
call func_aa9447c0 ; call to ConvertPointer(4, & 0xaa948720)
|
||||||
|
mov r11,QWORD PTR [rip+0x2434] # 0xaa948720
|
||||||
|
xor eax,eax
|
||||||
|
mov BYTE PTR [r11+0x1],0x1
|
||||||
|
add rsp,0x28
|
||||||
|
ret
|
||||||
|
|
||||||
|
Which is pretty typical code for an EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE
|
||||||
|
handler. The "mov r11, QWORD PTR [rip+0x2424]" was the faulting
|
||||||
|
instruction because ConvertPointer() was being called to convert the
|
||||||
|
address 0x0000000000000000, which when converted is left unchanged and
|
||||||
|
remains 0x0000000000000000.
|
||||||
|
|
||||||
|
The output of the oops trace gave the impression of a standard NULL
|
||||||
|
pointer dereference bug, but because we're accessing physical
|
||||||
|
addresses during ConvertPointer(), it wasn't. EFI boot services code
|
||||||
|
is stored at that address on Alexis' machine.
|
||||||
|
|
||||||
|
Reported-by: Alexis Murzeau <amurzeau@gmail.com>
|
||||||
|
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
|
||||||
|
Cc: Andy Lutomirski <luto@amacapital.net>
|
||||||
|
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
|
||||||
|
Cc: Ben Hutchings <ben@decadent.org.uk>
|
||||||
|
Cc: Borislav Petkov <bp@alien8.de>
|
||||||
|
Cc: Brian Gerst <brgerst@gmail.com>
|
||||||
|
Cc: Denys Vlasenko <dvlasenk@redhat.com>
|
||||||
|
Cc: H. Peter Anvin <hpa@zytor.com>
|
||||||
|
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
||||||
|
Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
|
||||||
|
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
|
||||||
|
Cc: Peter Zijlstra <peterz@infradead.org>
|
||||||
|
Cc: Raphael Hertzog <hertzog@debian.org>
|
||||||
|
Cc: Roger Shimizu <rogershimizu@gmail.com>
|
||||||
|
Cc: Thomas Gleixner <tglx@linutronix.de>
|
||||||
|
Cc: linux-efi@vger.kernel.org
|
||||||
|
Link: http://lkml.kernel.org/r/1457695163-29632-2-git-send-email-matt@codeblueprint.co.uk
|
||||||
|
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=815125
|
||||||
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
||||||
|
---
|
||||||
|
arch/x86/platform/efi/quirks.c | 79 +++++++++++++++++++++++++++++++++---------
|
||||||
|
1 file changed, 62 insertions(+), 17 deletions(-)
|
||||||
|
|
||||||
|
--- a/arch/x86/platform/efi/quirks.c
|
||||||
|
+++ b/arch/x86/platform/efi/quirks.c
|
||||||
|
@@ -130,6 +130,27 @@ efi_status_t efi_query_variable_store(u3
|
||||||
|
EXPORT_SYMBOL_GPL(efi_query_variable_store);
|
||||||
|
|
||||||
|
/*
|
||||||
|
+ * Helper function for efi_reserve_boot_services() to figure out if we
|
||||||
|
+ * can free regions in efi_free_boot_services().
|
||||||
|
+ *
|
||||||
|
+ * Use this function to ensure we do not free regions owned by somebody
|
||||||
|
+ * else. We must only reserve (and then free) regions:
|
||||||
|
+ *
|
||||||
|
+ * - Not within any part of the kernel
|
||||||
|
+ * - Not the BIOS reserved area (E820_RESERVED, E820_NVS, etc)
|
||||||
|
+ */
|
||||||
|
+static bool can_free_region(u64 start, u64 size)
|
||||||
|
+{
|
||||||
|
+ if (start + size > __pa_symbol(_text) && start <= __pa_symbol(_end))
|
||||||
|
+ return false;
|
||||||
|
+
|
||||||
|
+ if (!e820_all_mapped(start, start+size, E820_RAM))
|
||||||
|
+ return false;
|
||||||
|
+
|
||||||
|
+ return true;
|
||||||
|
+}
|
||||||
|
+
|
||||||
|
+/*
|
||||||
|
* The UEFI specification makes it clear that the operating system is free to do
|
||||||
|
* whatever it wants with boot services code after ExitBootServices() has been
|
||||||
|
* called. Ignoring this recommendation a significant bunch of EFI implementations
|
||||||
|
@@ -146,26 +167,50 @@ void __init efi_reserve_boot_services(vo
|
||||||
|
efi_memory_desc_t *md = p;
|
||||||
|
u64 start = md->phys_addr;
|
||||||
|
u64 size = md->num_pages << EFI_PAGE_SHIFT;
|
||||||
|
+ bool already_reserved;
|
||||||
|
|
||||||
|
if (md->type != EFI_BOOT_SERVICES_CODE &&
|
||||||
|
md->type != EFI_BOOT_SERVICES_DATA)
|
||||||
|
continue;
|
||||||
|
- /* Only reserve where possible:
|
||||||
|
- * - Not within any already allocated areas
|
||||||
|
- * - Not over any memory area (really needed, if above?)
|
||||||
|
- * - Not within any part of the kernel
|
||||||
|
- * - Not the bios reserved area
|
||||||
|
- */
|
||||||
|
- if ((start + size > __pa_symbol(_text)
|
||||||
|
- && start <= __pa_symbol(_end)) ||
|
||||||
|
- !e820_all_mapped(start, start+size, E820_RAM) ||
|
||||||
|
- memblock_is_region_reserved(start, size)) {
|
||||||
|
- /* Could not reserve, skip it */
|
||||||
|
- md->num_pages = 0;
|
||||||
|
- memblock_dbg("Could not reserve boot range [0x%010llx-0x%010llx]\n",
|
||||||
|
- start, start+size-1);
|
||||||
|
- } else
|
||||||
|
+
|
||||||
|
+ already_reserved = memblock_is_region_reserved(start, size);
|
||||||
|
+
|
||||||
|
+ /*
|
||||||
|
+ * Because the following memblock_reserve() is paired
|
||||||
|
+ * with free_bootmem_late() for this region in
|
||||||
|
+ * efi_free_boot_services(), we must be extremely
|
||||||
|
+ * careful not to reserve, and subsequently free,
|
||||||
|
+ * critical regions of memory (like the kernel image) or
|
||||||
|
+ * those regions that somebody else has already
|
||||||
|
+ * reserved.
|
||||||
|
+ *
|
||||||
|
+ * A good example of a critical region that must not be
|
||||||
|
+ * freed is page zero (first 4Kb of memory), which may
|
||||||
|
+ * contain boot services code/data but is marked
|
||||||
|
+ * E820_RESERVED by trim_bios_range().
|
||||||
|
+ */
|
||||||
|
+ if (!already_reserved) {
|
||||||
|
memblock_reserve(start, size);
|
||||||
|
+
|
||||||
|
+ /*
|
||||||
|
+ * If we are the first to reserve the region, no
|
||||||
|
+ * one else cares about it. We own it and can
|
||||||
|
+ * free it later.
|
||||||
|
+ */
|
||||||
|
+ if (can_free_region(start, size))
|
||||||
|
+ continue;
|
||||||
|
+ }
|
||||||
|
+
|
||||||
|
+ /*
|
||||||
|
+ * We don't own the region. We must not free it.
|
||||||
|
+ *
|
||||||
|
+ * Setting this bit for a boot services region really
|
||||||
|
+ * doesn't make sense as far as the firmware is
|
||||||
|
+ * concerned, but it does provide us with a way to tag
|
||||||
|
+ * those regions that must not be paired with
|
||||||
|
+ * free_bootmem_late().
|
||||||
|
+ */
|
||||||
|
+ md->attribute |= EFI_MEMORY_RUNTIME;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
@@ -182,8 +227,8 @@ void __init efi_free_boot_services(void)
|
||||||
|
md->type != EFI_BOOT_SERVICES_DATA)
|
||||||
|
continue;
|
||||||
|
|
||||||
|
- /* Could not reserve boot area */
|
||||||
|
- if (!size)
|
||||||
|
+ /* Do not free, someone else owns it: */
|
||||||
|
+ if (md->attribute & EFI_MEMORY_RUNTIME)
|
||||||
|
continue;
|
||||||
|
|
||||||
|
free_bootmem_late(start, size);
|
58
debian/patches/bugfix/x86/x86-mm-pat-fix-boot-crash-when-1gb-pages-are-not-supported.patch
vendored
Normal file
58
debian/patches/bugfix/x86/x86-mm-pat-fix-boot-crash-when-1gb-pages-are-not-supported.patch
vendored
Normal file
|
@ -0,0 +1,58 @@
|
||||||
|
From: Matt Fleming <matt@codeblueprint.co.uk>
|
||||||
|
Date: Mon, 14 Mar 2016 10:33:01 +0000
|
||||||
|
Subject: x86/mm/pat: Fix boot crash when 1GB pages are not supported by cpu
|
||||||
|
Origin: http://mid.gmane.org/1457951581-27353-2-git-send-email-matt@codeblueprint.co.uk
|
||||||
|
|
||||||
|
Scott reports that with the new separate EFI page tables he's seeing
|
||||||
|
the following error on boot, caused by setting reserved bits in the
|
||||||
|
page table structures (fault code is PF_RSVD | PF_PROT),
|
||||||
|
|
||||||
|
swapper/0: Corrupted page table at address 17b102020
|
||||||
|
PGD 17b0e5063 PUD 1400000e3
|
||||||
|
Bad pagetable: 0009 [#1] SMP
|
||||||
|
|
||||||
|
On first inspection the PUD is using a 1GB page size (_PAGE_PSE) and
|
||||||
|
looks fine but that's only true if support for 1GB PUD pages
|
||||||
|
("pdpe1gb") is present in the cpu.
|
||||||
|
|
||||||
|
Scott's Intel Celeron N2820 does not have that feature and so the
|
||||||
|
_PAGE_PSE bit is reserved. Fix this issue by making the 1GB mapping
|
||||||
|
code in conditional on "cpu_has_gbpages".
|
||||||
|
|
||||||
|
This issue didn't come up in the past because the required mapping for
|
||||||
|
the faulting address (0x17b102020) will already have been setup by the
|
||||||
|
kernel in early boot before we got to efi_map_regions(), but we no
|
||||||
|
longer use the standard kernel page tables during EFI calls.
|
||||||
|
|
||||||
|
Reported-by: Scott Ashcroft <scott.ashcroft@talk21.com>
|
||||||
|
Tested-by: Scott Ashcroft <scott.ashcroft@talk21.com>
|
||||||
|
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
|
||||||
|
Cc: Ben Hutchings <ben@decadent.org.uk>
|
||||||
|
Cc: Borislav Petkov <bp@alien8.de>
|
||||||
|
Cc: Brian Gerst <brgerst@gmail.com>
|
||||||
|
Cc: Denys Vlasenko <dvlasenk@redhat.com>
|
||||||
|
Cc: "H. Peter Anvin" <hpa@zytor.com>
|
||||||
|
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
||||||
|
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
|
||||||
|
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
|
||||||
|
Cc: Peter Zijlstra <peterz@infradead.org>
|
||||||
|
Cc: Raphael Hertzog <hertzog@debian.org>
|
||||||
|
Cc: Roger Shimizu <rogershimizu@gmail.com>
|
||||||
|
Cc: Thomas Gleixner <tglx@linutronix.de>
|
||||||
|
Cc: linux-efi@vger.kernel.org
|
||||||
|
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
|
||||||
|
---
|
||||||
|
arch/x86/mm/pageattr.c | 2 +-
|
||||||
|
1 file changed, 1 insertion(+), 1 deletion(-)
|
||||||
|
|
||||||
|
--- a/arch/x86/mm/pageattr.c
|
||||||
|
+++ b/arch/x86/mm/pageattr.c
|
||||||
|
@@ -1036,7 +1036,7 @@ static int populate_pud(struct cpa_data
|
||||||
|
/*
|
||||||
|
* Map everything starting from the Gb boundary, possibly with 1G pages
|
||||||
|
*/
|
||||||
|
- while (end - start >= PUD_SIZE) {
|
||||||
|
+ while (cpu_has_gbpages && end - start >= PUD_SIZE) {
|
||||||
|
set_pud(pud, __pud(cpa->pfn << PAGE_SHIFT | _PAGE_PSE |
|
||||||
|
massage_pgprot(pud_pgprot)));
|
||||||
|
|
|
@ -137,3 +137,5 @@ bugfix/powerpc/powerpc-fix-dedotify-for-binutils-2.26.patch
|
||||||
bugfix/x86/drm-i915-Fix-oops-caused-by-fbdev-initialization-fai.patch
|
bugfix/x86/drm-i915-Fix-oops-caused-by-fbdev-initialization-fai.patch
|
||||||
debian/revert-libata-align-ata_device-s-id-on-a-cacheline.patch
|
debian/revert-libata-align-ata_device-s-id-on-a-cacheline.patch
|
||||||
debian/module-fix-abi-change-in-4.4.5.patch
|
debian/module-fix-abi-change-in-4.4.5.patch
|
||||||
|
bugfix/x86/x86-efi-fix-boot-crash-by-always-mapping-boot-servic.patch
|
||||||
|
bugfix/x86/x86-mm-pat-fix-boot-crash-when-1gb-pages-are-not-supported.patch
|
||||||
|
|
Loading…
Reference in New Issue