From: Michael Ellerman <mpe@ellerman.id.au>
Date: Wed, 12 Jun 2019 23:35:07 +1000
Subject: powerpc/mm/64s/hash: Reallocate context ids on fork
Origin: https://git.kernel.org/linus/ca72d88378b2f2444d3ec145dd442d449d3fefbc
Bug-Debian-Security: https://security-tracker.debian.org/tracker/CVE-2019-12817

When using the Hash Page Table (HPT) MMU, userspace memory mappings
are managed at two levels: first in the Linux page tables, much like
other architectures, and second in the SLB (Segment Lookaside Buffer)
and HPT. It's the SLB and HPT that are actually used by the hardware
to do translations.

As part of the series adding support for 4PB user virtual address
space using the hash MMU, we added support for allocating multiple
"context ids" per process, one for each 512TB chunk of address space.
These are tracked in an array called extended_id in the mm_context_t
of a process that has done a mapping above 512TB.

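A rough user-space model of the lookup this implies: each 512TB (2^49
byte) chunk of the 4PB (2^52 byte) address space gets its own slot, so
an effective address selects its context id by shifting right by 49
bits, which gives 8 slots. The names EA_BITS_PER_CONTEXT, NUM_CONTEXTS,
struct ctx_model and context_for_ea() are hypothetical stand-ins, not
the kernel's definitions, and a zero slot is assumed to mean "no id
allocated", consistent with the non-zero test in realloc_context_ids()
in the patch.

```c
#include <stdint.h>

/* Hypothetical constants derived from the text: one context id per
 * 512TB (2^49 bytes) chunk of a 4PB (2^52 bytes) address space. */
#define EA_BITS_PER_CONTEXT 49
#define USER_VA_SIZE (UINT64_C(1) << 52)
#define NUM_CONTEXTS (USER_VA_SIZE >> EA_BITS_PER_CONTEXT) /* 8 slots */

/* Simplified stand-in for mm_context_t: slot 0 is the primary id,
 * 0 in any slot means no id has been allocated for that chunk. */
struct ctx_model {
	int extended_id[NUM_CONTEXTS];
};

/* Return the context id that translates effective address ea,
 * or -1 if ea lies outside the modelled 4PB address space. */
static int context_for_ea(const struct ctx_model *ctx, uint64_t ea)
{
	uint64_t index = ea >> EA_BITS_PER_CONTEXT;

	if (index >= NUM_CONTEXTS)
		return -1;
	return ctx->extended_id[index];
}
```

Addresses below 512TB always resolve through slot 0, which is why a
process that never maps above 512TB only ever uses its primary id.
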
If such a process forks (ie. clone(2) without CLONE_VM set), its mm is
copied, including the mm_context_t, and then init_new_context() is
called to reinitialise parts of the mm_context_t as appropriate to
separate the address spaces of the two processes.

The key step in ensuring the two processes have separate address
spaces is to allocate a new context id for the process; this is done
at the beginning of hash__init_new_context(). If we didn't allocate a
new context id, the two processes would share mappings as far as the
SLB and HPT are concerned, even though their Linux page tables would
be separate.

For mappings above 512TB, which use the extended_id array, we
neglected to allocate new context ids on fork, meaning the parent and
child use the same ids and therefore share those mappings even though
they're supposed to be separate. This can lead to the parent seeing
writes done by the child, which is essentially memory corruption.

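The difference between the old and new fork behaviour can be sketched
in user space. This is a simplified model, not kernel code: struct
ctx_model, alloc_id(), fork_buggy(), fork_fixed() and shares_ids() are
hypothetical, the allocator never fails, and sharing is checked slot
by slot, which is sufficient here because the child's array starts as
a copy of the parent's.

```c
#include <stdbool.h>
#include <string.h>

#define NUM_CONTEXTS 8

/* Simplified stand-in for mm_context_t; 0 means "no id allocated". */
struct ctx_model {
	int extended_id[NUM_CONTEXTS];
};

/* Hypothetical allocator: hands out increasing ids, never fails. */
static int next_id = 1;
static int alloc_id(void) { return next_id++; }

/* Pre-fix behaviour: copy the parent's context, then replace only the
 * primary id in slot 0. Slots 1..7 still hold the parent's ids. */
static void fork_buggy(const struct ctx_model *parent, struct ctx_model *child)
{
	memcpy(child, parent, sizeof(*child));
	child->extended_id[0] = alloc_id();
}

/* Post-fix behaviour: slot 0 and every other slot in use get fresh ids. */
static void fork_fixed(const struct ctx_model *parent, struct ctx_model *child)
{
	memcpy(child, parent, sizeof(*child));
	for (int i = 0; i < NUM_CONTEXTS; i++)
		if (i == 0 || child->extended_id[i])
			child->extended_id[i] = alloc_id();
}

/* True if both contexts translate the same chunk with the same id,
 * i.e. the SLB/HPT would treat those mappings as shared. */
static bool shares_ids(const struct ctx_model *a, const struct ctx_model *b)
{
	for (int i = 0; i < NUM_CONTEXTS; i++)
		if (a->extended_id[i] && a->extended_id[i] == b->extended_id[i])
			return true;
	return false;
}
```

With a parent holding ids in slots 0 and 1, fork_buggy() leaves both
processes with the same slot 1 id, while fork_fixed() gives the child
fresh ids in every slot that was in use.
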
There is an additional exposure: if the child process exits, all its
context ids are freed, including the context ids that are still in
use by the parent for mappings above 512TB. One or more of those ids
can then be reallocated to a third process, which can then read/write
the parent's mappings above 512TB. Additionally, if a freed id is
reused as the third process's primary context id, the parent is able
to read/write the third process's mappings *below* 512TB.

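The id reuse can be replayed with a toy allocator. The sketch assumes
a lowest-free-id policy, modelled on the kernel's IDA handing out the
smallest available id; alloc_id(), free_id() and
third_process_primary_id() are hypothetical helpers, and only one
extended id is tracked for brevity.

```c
#include <stdbool.h>

#define MAX_IDS 64

/* Hypothetical lowest-free-id allocator. Id 0 is reserved ("no id"). */
static bool id_used[MAX_IDS];

static int alloc_id(void)
{
	for (int id = 1; id < MAX_IDS; id++) {
		if (!id_used[id]) {
			id_used[id] = true;
			return id;
		}
	}
	return -1; /* pool exhausted */
}

static void free_id(int id)
{
	if (id > 0 && id < MAX_IDS)
		id_used[id] = false;
}

/* Replays the pre-fix scenario on a fresh pool: the parent owns a
 * primary id and one extended id; the buggy fork gives the child a new
 * primary id but copies the extended id. When the child exits it frees
 * both of its ids, so the parent's extended id returns to the pool.
 * Returns the primary id handed to a third process. */
static int third_process_primary_id(int *parent_extended)
{
	int parent_primary = alloc_id();
	*parent_extended = alloc_id();

	int child_primary = alloc_id();
	int child_extended = *parent_extended; /* copied, not reallocated */

	/* Child exits and frees every id in its context. */
	free_id(child_primary);
	free_id(child_extended);

	(void)parent_primary; /* still in use by the parent */
	return alloc_id();
}
```

Starting from an empty pool, the parent gets ids 1 and 2 and the child
gets 3. After the child frees 3 and 2, the next allocation returns 2,
so the third process's primary id is the parent's extended id, which
is the *below* 512TB case described above.
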
All of these are fundamental failures to enforce separation between
processes. The only mitigating factor is that the bug only occurs if a
process creates mappings above 512TB, and most applications still do
not create such mappings.

Only machines using the hash page table MMU are affected, eg. PowerPC
970 (G5), PA6T, Power5/6/7/8/9. By default Power9 bare metal machines
(powernv) use the Radix MMU and are not affected, unless the machine
has been explicitly booted in HPT mode (using disable_radix on the
kernel command line). KVM guests on Power9 may be affected if the host
or guest is configured to use the HPT MMU. LPARs under PowerVM on
Power9 are affected as they always use the HPT MMU. Kernels built with
PAGE_SIZE=4K are not affected.

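A rough way to check a running system against two of the conditions
above. It assumes that powerpc kernels report the active MMU in
/proc/cpuinfo as a line such as "MMU : Hash" or "MMU : Radix" (the
exact spacing is an assumption worth verifying on your system), and
that the page size comes from sysconf(_SC_PAGESIZE). The helper
hash_mmu_exposure() is hypothetical; it does not check the kernel
version or whether this fix is applied.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if the given /proc/cpuinfo text
 * reports the hash MMU and the page size is not 4K. A missing "MMU"
 * line is treated as "not a hash-MMU powerpc system". */
static bool hash_mmu_exposure(const char *cpuinfo, long page_size)
{
	if (page_size == 4096)
		return false; /* PAGE_SIZE=4K kernels are not affected */

	for (const char *line = cpuinfo; line && *line; ) {
		const char *end = strchr(line, '\n');
		size_t len = end ? (size_t)(end - line) : strlen(line);

		if (len >= 3 && strncmp(line, "MMU", 3) == 0) {
			const char *colon = memchr(line, ':', len);

			if (colon) {
				/* Skip whitespace after the colon, then compare. */
				const char *val = colon + 1;
				while (val < line + len && (*val == ' ' || *val == '\t'))
					val++;
				size_t vlen = (size_t)(line + len - val);
				return vlen >= 4 && strncmp(val, "Hash", 4) == 0;
			}
		}
		line = end ? end + 1 : NULL;
	}
	return false;
}
```

A caller would read /proc/cpuinfo into a buffer and pass it together
with sysconf(_SC_PAGESIZE).
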
The fix is relatively simple: we need to reallocate context ids for
all extended mappings on fork.

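The reallocation loop and its error handling can be exercised outside
the kernel. This sketch mirrors the structure of realloc_context_ids()
in the patch, but alloc_id(), free_id(), allocs_left and realloc_ids()
are hypothetical user-space stand-ins for hash__alloc_context_id() and
ida_free(), with an injectable failure count so the rollback path can
be tested.

```c
#include <stdbool.h>

#define NUM_CONTEXTS 8
#define MAX_IDS 64

/* Hypothetical lowest-free-id allocator with a failure budget.
 * Id 0 is reserved ("no id"). */
static bool id_used[MAX_IDS];
static int allocs_left = MAX_IDS;

static int alloc_id(void)
{
	if (allocs_left-- <= 0)
		return -1; /* simulated allocation failure */
	for (int id = 1; id < MAX_IDS; id++) {
		if (!id_used[id]) {
			id_used[id] = true;
			return id;
		}
	}
	return -1;
}

static void free_id(int id)
{
	if (id > 0 && id < MAX_IDS)
		id_used[id] = false;
}

/* Slot 0 always gets a new id; other slots only if the copied value is
 * non-zero. On failure, only ids allocated by this call (slots before
 * i) are freed; the copied ids in slot i and beyond still belong to
 * the parent and must not be freed. */
static int realloc_ids(int ids[NUM_CONTEXTS])
{
	int i, id = -1;

	for (i = 0; i < NUM_CONTEXTS; i++) {
		if (i == 0 || ids[i]) {
			id = alloc_id();
			if (id < 0)
				goto error;
			ids[i] = id;
		}
	}
	return ids[0];

error:
	for (i--; i >= 0; i--)
		if (ids[i])
			free_id(ids[i]);
	return id;
}
```

With the parent holding ids 1 (slot 0) and 2 (slot 3) and only one
allocation allowed, slot 0 receives id 3, slot 3 fails, and the
rollback frees 3 while leaving the parent's 1 and 2 allocated, which
is the use-after-free the comment in the patch guards against.
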
Fixes: f384796c40dc ("powerpc/mm: Add support for handling > 512TB address in SLB miss")
Cc: stable@vger.kernel.org # v4.17+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/mm/mmu_context_book3s64.c | 46 +++++++++++++++++++++++---
 1 file changed, 42 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index dbd8f762140b..68984d85ad6b 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -53,14 +53,48 @@ int hash__alloc_context_id(void)
 }
 EXPORT_SYMBOL_GPL(hash__alloc_context_id);
 
+static int realloc_context_ids(mm_context_t *ctx)
+{
+	int i, id;
+
+	/*
+	 * id 0 (aka. ctx->id) is special, we always allocate a new one, even if
+	 * there wasn't one allocated previously (which happens in the exec
+	 * case where ctx is newly allocated).
+	 *
+	 * We have to be a bit careful here. We must keep the existing ids in
+	 * the array, so that we can test if they're non-zero to decide if we
+	 * need to allocate a new one. However in case of error we must free the
+	 * ids we've allocated but *not* any of the existing ones (or risk a
+	 * UAF). That's why we decrement i at the start of the error handling
+	 * loop, to skip the id that we just tested but couldn't reallocate.
+	 */
+	for (i = 0; i < ARRAY_SIZE(ctx->extended_id); i++) {
+		if (i == 0 || ctx->extended_id[i]) {
+			id = hash__alloc_context_id();
+			if (id < 0)
+				goto error;
+
+			ctx->extended_id[i] = id;
+		}
+	}
+
+	/* The caller expects us to return id */
+	return ctx->id;
+
+error:
+	for (i--; i >= 0; i--) {
+		if (ctx->extended_id[i])
+			ida_free(&mmu_context_ida, ctx->extended_id[i]);
+	}
+
+	return id;
+}
+
 static int hash__init_new_context(struct mm_struct *mm)
 {
 	int index;
 
-	index = hash__alloc_context_id();
-	if (index < 0)
-		return index;
-
 	/*
 	 * The old code would re-promote on fork, we don't do that when using
 	 * slices as it could cause problem promoting slices that have been
@@ -78,6 +112,10 @@ static int hash__init_new_context(struct mm_struct *mm)
 	if (mm->context.id == 0)
 		slice_init_new_context_exec(mm);
 
+	index = realloc_context_ids(&mm->context);
+	if (index < 0)
+		return index;
+
 	subpage_prot_init_new_context(mm);
 
 	pkey_mm_init(mm);
-- 
2.20.1