Cherry-pick important fixes from 3.2 stable patch queue

Mostly networking security fixes; also one mm denial-of-service fix
and one other networking fix that involves an ABI change.

svn path=/dists/sid/linux/; revision=19380
This commit is contained in:
Ben Hutchings 2012-09-16 00:04:13 +00:00
parent cfbfe7ebbc
commit 1fd7124a8b
25 changed files with 1489 additions and 0 deletions

12
debian/changelog vendored
View File

@ -134,6 +134,18 @@ linux (3.2.29-1) UNRELEASED; urgency=low
* [x86] drm/i915: Fix i8xx interrupt handling (Closes: #655152)
* [armel/kirkwood] ahci: Add JMicron 362 device IDs (Closes: #634180)
* speakup: lower default software speech rate (Closes: #686742)
* e1000e: Fix potential DoS when TSO enabled
* mm: Remove user-triggerable BUG from mpol_to_str
* sfc: Fix maximum number of TSO segments and minimum TX queue size
(CVE-2012-3412)
- tcp: Apply device TSO segment limit earlier
* net_sched: gact: Fix potential panic in tcf_gact().
* af_packet: remove BUG statement in tpacket_destruct_skb
* net: Fix various information leaks
* af_packet: don't emit packet on orig fanout group
* af_netlink: force credentials passing (CVE-2012-3520)
* netlink: fix possible spoofing from non-root processes
* net: ipv4: ipmr_expire_timer causes crash when removing net namespace
[ Bastian Blank ]
* Make xen-linux-system meta-packages depend on xen-system. This allows

View File

@ -0,0 +1,91 @@
From: Eric Dumazet <edumazet@google.com>
Date: Tue, 21 Aug 2012 06:21:17 +0000
Subject: af_netlink: force credentials passing [CVE-2012-3520]
[ Upstream commit e0e3cea46d31d23dc40df0a49a7a2c04fe8edfea ]
Pablo Neira Ayuso discovered that avahi and
potentially NetworkManager accept spoofed Netlink messages because of a
kernel bug. The kernel passes all-zero SCM_CREDENTIALS ancillary data
to the receiver if the sender did not provide such data, instead of not
including any such data at all or including the correct data from the
peer (as it is the case with AF_UNIX).
This bug was introduced in commit 16e572626961
(af_unix: dont send SCM_CREDENTIALS by default)
This patch forces passing credentials for netlink, as
before the regression.
Another fix would be to not add SCM_CREDENTIALS in
netlink messages if not provided by the sender, but it
might break some programs.
With help from Florian Weimer & Petr Matousek
This issue is designated as CVE-2012-3520
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
include/net/scm.h | 4 +++-
net/netlink/af_netlink.c | 2 +-
net/unix/af_unix.c | 4 ++--
3 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/include/net/scm.h b/include/net/scm.h
index d456f4c..0c0017c 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -71,9 +71,11 @@ static __inline__ void scm_destroy(struct scm_cookie *scm)
}
static __inline__ int scm_send(struct socket *sock, struct msghdr *msg,
- struct scm_cookie *scm)
+ struct scm_cookie *scm, bool forcecreds)
{
memset(scm, 0, sizeof(*scm));
+ if (forcecreds)
+ scm_set_cred(scm, task_tgid(current), current_cred());
unix_get_peersec_dgram(sock, scm);
if (msg->msg_controllen <= 0)
return 0;
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index a99fb41..1af8542 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1333,7 +1333,7 @@ static int netlink_sendmsg(struct kiocb *kiocb, struct socket *sock,
if (NULL == siocb->scm)
siocb->scm = &scm;
- err = scm_send(sock, msg, siocb->scm);
+ err = scm_send(sock, msg, siocb->scm, true);
if (err < 0)
return err;
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index d99678a..317bfe3 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1435,7 +1435,7 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
if (NULL == siocb->scm)
siocb->scm = &tmp_scm;
wait_for_unix_gc();
- err = scm_send(sock, msg, siocb->scm);
+ err = scm_send(sock, msg, siocb->scm, false);
if (err < 0)
return err;
@@ -1596,7 +1596,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
if (NULL == siocb->scm)
siocb->scm = &tmp_scm;
wait_for_unix_gc();
- err = scm_send(sock, msg, siocb->scm);
+ err = scm_send(sock, msg, siocb->scm, false);
if (err < 0)
return err;

View File

@ -0,0 +1,102 @@
From: Eric Leblond <eric@regit.org>
Date: Thu, 16 Aug 2012 22:02:58 +0000
Subject: af_packet: don't emit packet on orig fanout group
[ Upstream commit c0de08d04215031d68fa13af36f347a6cfa252ca ]
If a packet is emitted on one socket in one group of fanout sockets,
it is transmitted again. It is thus read again on one of the sockets
of the fanout group. This result in a loop for software which
generate packets when receiving one.
This retransmission is not the intended behavior: a fanout group
must behave like a single socket. The packet should not be
transmitted on a socket if it originates from a socket belonging
to the same fanout group.
This patch fixes the issue by changing the transmission check to
take fanout group info account.
Reported-by: Aleksandr Kotov <a1k@mail.ru>
Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
include/linux/netdevice.h | 2 ++
net/core/dev.c | 16 ++++++++++++++--
net/packet/af_packet.c | 9 +++++++++
3 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d178fb8..00ca32b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1513,6 +1513,8 @@ struct packet_type {
struct sk_buff **(*gro_receive)(struct sk_buff **head,
struct sk_buff *skb);
int (*gro_complete)(struct sk_buff *skb);
+ bool (*id_match)(struct packet_type *ptype,
+ struct sock *sk);
void *af_packet_priv;
struct list_head list;
};
diff --git a/net/core/dev.c b/net/core/dev.c
index 75da76d..832ba6d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1631,6 +1631,19 @@ static inline int deliver_skb(struct sk_buff *skb,
return pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
}
+static inline bool skb_loop_sk(struct packet_type *ptype, struct sk_buff *skb)
+{
+ if (ptype->af_packet_priv == NULL)
+ return false;
+
+ if (ptype->id_match)
+ return ptype->id_match(ptype, skb->sk);
+ else if ((struct sock *)ptype->af_packet_priv == skb->sk)
+ return true;
+
+ return false;
+}
+
/*
* Support routine. Sends outgoing frames to any network
* taps currently in use.
@@ -1648,8 +1661,7 @@ static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
* they originated from - MvS (miquels@drinkel.ow.org)
*/
if ((ptype->dev == dev || !ptype->dev) &&
- (ptype->af_packet_priv == NULL ||
- (struct sock *)ptype->af_packet_priv != skb->sk)) {
+ (!skb_loop_sk(ptype, skb))) {
if (pt_prev) {
deliver_skb(skb2, pt_prev, skb->dev);
pt_prev = ptype;
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 13b14dc..85afc13 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1281,6 +1281,14 @@ static void __fanout_unlink(struct sock *sk, struct packet_sock *po)
spin_unlock(&f->lock);
}
+bool match_fanout_group(struct packet_type *ptype, struct sock * sk)
+{
+ if (ptype->af_packet_priv == (void*)((struct packet_sock *)sk)->fanout)
+ return true;
+
+ return false;
+}
+
static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
{
struct packet_sock *po = pkt_sk(sk);
@@ -1333,6 +1341,7 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
match->prot_hook.dev = po->prot_hook.dev;
match->prot_hook.func = packet_rcv_fanout;
match->prot_hook.af_packet_priv = match;
+ match->prot_hook.id_match = match_fanout_group;
dev_add_pack(&match->prot_hook);
list_add(&match->list, &fanout_list);
}

View File

@ -0,0 +1,47 @@
From: "danborkmann@iogearbox.net" <danborkmann@iogearbox.net>
Date: Fri, 10 Aug 2012 22:48:54 +0000
Subject: af_packet: remove BUG statement in tpacket_destruct_skb
[ Upstream commit 7f5c3e3a80e6654cf48dfba7cf94f88c6b505467 ]
Here's a quote of the comment about the BUG macro from asm-generic/bug.h:
Don't use BUG() or BUG_ON() unless there's really no way out; one
example might be detecting data structure corruption in the middle
of an operation that can't be backed out of. If the (sub)system
can somehow continue operating, perhaps with reduced functionality,
it's probably not BUG-worthy.
If you're tempted to BUG(), think again: is completely giving up
really the *only* solution? There are usually better options, where
users don't need to reboot ASAP and can mostly shut down cleanly.
In our case, the status flag of a ring buffer slot is managed from both sides,
the kernel space and the user space. This means that even though the kernel
side might work as expected, the user space screws up and changes this flag
right between the send(2) is triggered when the flag is changed to
TP_STATUS_SENDING and a given skb is destructed after some time. Then, this
will hit the BUG macro. As David suggested, the best solution is to simply
remove this statement since it cannot be used for kernel side internal
consistency checks. I've tested it and the system still behaves /stable/ in
this case, so in accordance with the above comment, we should rather remove it.
Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
net/packet/af_packet.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index d9d4970..13b14dc 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1931,7 +1931,6 @@ static void tpacket_destruct_skb(struct sk_buff *skb)
if (likely(po->tx_ring.pg_vec)) {
ph = skb_shinfo(skb)->destructor_arg;
- BUG_ON(__packet_get_status(po, ph) != TP_STATUS_SENDING);
BUG_ON(atomic_read(&po->tx_ring.pending) == 0);
atomic_dec(&po->tx_ring.pending);
__packet_set_status(po, ph, TP_STATUS_AVAILABLE);

View File

@ -0,0 +1,29 @@
From: Mathias Krause <minipli@googlemail.com>
Date: Wed, 15 Aug 2012 11:31:44 +0000
Subject: atm: fix info leak in getsockopt(SO_ATMPVC)
[ Upstream commit e862f1a9b7df4e8196ebec45ac62295138aa3fc2 ]
The ATM code fails to initialize the two padding bytes of struct
sockaddr_atmpvc inserted for alignment. Add an explicit memset(0)
before filling the structure to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
net/atm/common.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/atm/common.c b/net/atm/common.c
index 14ff9fe..0ca06e8 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -784,6 +784,7 @@ int vcc_getsockopt(struct socket *sock, int level, int optname,
if (!vcc->dev || !test_bit(ATM_VF_ADDR, &vcc->flags))
return -ENOTCONN;
+ memset(&pvc, 0, sizeof(pvc));
pvc.sap_family = AF_ATMPVC;
pvc.sap_addr.itf = vcc->dev->number;
pvc.sap_addr.vpi = vcc->vpi;

View File

@ -0,0 +1,29 @@
From: Mathias Krause <minipli@googlemail.com>
Date: Wed, 15 Aug 2012 11:31:45 +0000
Subject: atm: fix info leak via getsockname()
[ Upstream commit 3c0c5cfdcd4d69ffc4b9c0907cec99039f30a50a ]
The ATM code fails to initialize the two padding bytes of struct
sockaddr_atmpvc inserted for alignment. Add an explicit memset(0)
before filling the structure to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
net/atm/pvc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/atm/pvc.c b/net/atm/pvc.c
index 3a73491..ae03240 100644
--- a/net/atm/pvc.c
+++ b/net/atm/pvc.c
@@ -95,6 +95,7 @@ static int pvc_getname(struct socket *sock, struct sockaddr *sockaddr,
return -ENOTCONN;
*sockaddr_len = sizeof(struct sockaddr_atmpvc);
addr = (struct sockaddr_atmpvc *)sockaddr;
+ memset(addr, 0, sizeof(*addr));
addr->sap_family = AF_ATMPVC;
addr->sap_addr.itf = vcc->dev->number;
addr->sap_addr.vpi = vcc->vpi;

View File

@ -0,0 +1,33 @@
From: Mathias Krause <minipli@googlemail.com>
Date: Wed, 15 Aug 2012 11:31:46 +0000
Subject: Bluetooth: HCI - Fix info leak in getsockopt(HCI_FILTER)
[ Upstream commit e15ca9a0ef9a86f0477530b0f44a725d67f889ee ]
The HCI code fails to initialize the two padding bytes of struct
hci_ufilter before copying it to userland -- that for leaking two
bytes kernel stack. Add an explicit memset(0) before filling the
structure to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
net/bluetooth/hci_sock.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index f6afe3d..e4c8bc0 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -671,6 +671,7 @@ static int hci_sock_getsockopt(struct socket *sock, int level, int optname, char
{
struct hci_filter *f = &hci_pi(sk)->filter;
+ memset(&uf, 0, sizeof(uf));
uf.type_mask = f->type_mask;
uf.opcode = f->opcode;
uf.event_mask[0] = *((u32 *) f->event_mask + 0);

View File

@ -0,0 +1,33 @@
From: Mathias Krause <minipli@googlemail.com>
Date: Wed, 15 Aug 2012 11:31:47 +0000
Subject: Bluetooth: HCI - Fix info leak via getsockname()
[ Upstream commit 3f68ba07b1da811bf383b4b701b129bfcb2e4988 ]
The HCI code fails to initialize the hci_channel member of struct
sockaddr_hci and that for leaks two bytes kernel stack via the
getsockname() syscall. Initialize hci_channel with 0 to avoid the
info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
net/bluetooth/hci_sock.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index e4c8bc0..8361ee4 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -388,6 +388,7 @@ static int hci_sock_getname(struct socket *sock, struct sockaddr *addr, int *add
*addr_len = sizeof(*haddr);
haddr->hci_family = AF_BLUETOOTH;
haddr->hci_dev = hdev->id;
+ haddr->hci_channel= 0;
release_sock(sk);
return 0;

View File

@ -0,0 +1,33 @@
From: Mathias Krause <minipli@googlemail.com>
Date: Wed, 15 Aug 2012 11:31:51 +0000
Subject: Bluetooth: L2CAP - Fix info leak via getsockname()
[ Upstream commit 792039c73cf176c8e39a6e8beef2c94ff46522ed ]
The L2CAP code fails to initialize the l2_bdaddr_type member of struct
sockaddr_l2 and the padding byte added for alignment. It that for leaks
two bytes kernel stack via the getsockname() syscall. Add an explicit
memset(0) before filling the structure to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
net/bluetooth/l2cap_sock.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index 5c406d3..6dedd6f 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -293,6 +293,7 @@ static int l2cap_sock_getname(struct socket *sock, struct sockaddr *addr, int *l
BT_DBG("sock %p, sk %p", sock, sk);
+ memset(la, 0, sizeof(struct sockaddr_l2));
addr->sa_family = AF_BLUETOOTH;
*len = sizeof(struct sockaddr_l2);

View File

@ -0,0 +1,33 @@
From: Mathias Krause <minipli@googlemail.com>
Date: Wed, 15 Aug 2012 11:31:48 +0000
Subject: Bluetooth: RFCOMM - Fix info leak in getsockopt(BT_SECURITY)
[ Upstream commit 9ad2de43f1aee7e7274a4e0d41465489299e344b ]
The RFCOMM code fails to initialize the key_size member of struct
bt_security before copying it to userland -- that for leaking one
byte kernel stack. Initialize key_size with 0 to avoid the info
leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
net/bluetooth/rfcomm/sock.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index 5417f61..03584bc 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -835,6 +835,7 @@ static int rfcomm_sock_getsockopt(struct socket *sock, int level, int optname, c
}
sec.level = rfcomm_pi(sk)->sec_level;
+ sec.key_size = 0;
len = min_t(unsigned int, len, sizeof(sec));
if (copy_to_user(optval, (char *) &sec, len))

View File

@ -0,0 +1,37 @@
From: Mathias Krause <minipli@googlemail.com>
Date: Wed, 15 Aug 2012 11:31:49 +0000
Subject: Bluetooth: RFCOMM - Fix info leak in ioctl(RFCOMMGETDEVLIST)
[ Upstream commit f9432c5ec8b1e9a09b9b0e5569e3c73db8de432a ]
The RFCOMM code fails to initialize the two padding bytes of struct
rfcomm_dev_list_req inserted for alignment before copying it to
userland. Additionally there are two padding bytes in each instance of
struct rfcomm_dev_info. The ioctl() that for disclosures two bytes plus
dev_num times two bytes uninitialized kernel heap memory.
Allocate the memory using kzalloc() to fix this issue.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
net/bluetooth/rfcomm/tty.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/bluetooth/rfcomm/tty.c b/net/bluetooth/rfcomm/tty.c
index c258796..bc1eb56 100644
--- a/net/bluetooth/rfcomm/tty.c
+++ b/net/bluetooth/rfcomm/tty.c
@@ -471,7 +471,7 @@ static int rfcomm_get_dev_list(void __user *arg)
size = sizeof(*dl) + dev_num * sizeof(*di);
- dl = kmalloc(size, GFP_KERNEL);
+ dl = kzalloc(size, GFP_KERNEL);
if (!dl)
return -ENOMEM;

View File

@ -0,0 +1,33 @@
From: Mathias Krause <minipli@googlemail.com>
Date: Wed, 15 Aug 2012 11:31:50 +0000
Subject: Bluetooth: RFCOMM - Fix info leak via getsockname()
[ Upstream commit 9344a972961d1a6d2c04d9008b13617bcb6ec2ef ]
The RFCOMM code fails to initialize the trailing padding byte of struct
sockaddr_rc added for alignment. It that for leaks one byte kernel stack
via the getsockname() syscall. Add an explicit memset(0) before filling
the structure to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
net/bluetooth/rfcomm/sock.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index 03584bc..7ee4ead 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -547,6 +547,7 @@ static int rfcomm_sock_getname(struct socket *sock, struct sockaddr *addr, int *
BT_DBG("sock %p, sk %p", sock, sk);
+ memset(sa, 0, sizeof(*sa));
sa->rc_family = AF_BLUETOOTH;
sa->rc_channel = rfcomm_pi(sk)->channel;
if (peer)

View File

@ -0,0 +1,32 @@
From: Mathias Krause <minipli@googlemail.com>
Date: Wed, 15 Aug 2012 11:31:55 +0000
Subject: dccp: fix info leak via getsockopt(DCCP_SOCKOPT_CCID_TX_INFO)
[ Upstream commit 7b07f8eb75aa3097cdfd4f6eac3da49db787381d ]
The CCID3 code fails to initialize the trailing padding bytes of struct
tfrc_tx_info added for alignment on 64 bit architectures. It that for
potentially leaks four bytes kernel stack via the getsockopt() syscall.
Add an explicit memset(0) before filling the structure to avoid the
info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
net/dccp/ccids/ccid3.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 3d604e1..4caf63f 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -532,6 +532,7 @@ static int ccid3_hc_tx_getsockopt(struct sock *sk, const int optname, int len,
case DCCP_SOCKOPT_CCID_TX_INFO:
if (len < sizeof(tfrc))
return -EINVAL;
+ memset(&tfrc, 0, sizeof(tfrc));
tfrc.tfrctx_x = hc->tx_x;
tfrc.tfrctx_x_recv = hc->tx_x_recv;
tfrc.tfrctx_x_calc = hc->tx_x_calc;

View File

@ -0,0 +1,175 @@
From: Bruce Allan <bruce.w.allan@intel.com>
Date: Fri, 24 Aug 2012 20:38:11 +0000
Subject: e1000e: DoS while TSO enabled caused by link partner with small MSS
commit d821a4c4d11ad160925dab2bb009b8444beff484 upstream.
With a low enough MSS on the link partner and TSO enabled locally, the
networking stack can periodically send a very large (e.g. 64KB) TCP
message for which the driver will attempt to use more Tx descriptors than
are available by default in the Tx ring. This is due to a workaround in
the code that imposes a limit of only 4 MSS-sized segments per descriptor
which appears to be a carry-over from the older e1000 driver and may be
applicable only to some older PCI or PCIx parts which are not supported in
e1000e. When the driver gets a message that is too large to fit across the
configured number of Tx descriptors, it stops the upper stack from queueing
any more and gets stuck in this state. After a timeout, the upper stack
assumes the adapter is hung and calls the driver to reset it.
Remove the unnecessary limitation of using up to only 4 MSS-sized segments
per Tx descriptor, and put in a hard failure test to catch when attempting
to check for message sizes larger than would fit in the whole Tx ring.
Refactor the remaining logic that limits the size of data per Tx descriptor
from a seemingly arbitrary 8KB to a limit based on the dynamic size of the
Tx packet buffer as described in the hardware specification.
Also, fix the logic in the check for space in the Tx ring for the next
largest possible packet after the current one has been successfully queued
for transmit, and use the appropriate defines for default ring sizes in
e1000_probe instead of magic values.
This issue goes back to the introduction of e1000e in 2.6.24 when it was
split off from e1000.
Reported-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Adjust context
- Adjust for use of net_device vs e1000_ring parameter]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -302,6 +302,7 @@ struct e1000_adapter {
*/
struct e1000_ring *tx_ring /* One per active queue */
____cacheline_aligned_in_smp;
+ u32 tx_fifo_limit;
struct napi_struct napi;
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3386,6 +3386,15 @@ void e1000e_reset(struct e1000_adapter *
}
/*
+ * Alignment of Tx data is on an arbitrary byte boundary with the
+ * maximum size per Tx descriptor limited only to the transmit
+ * allocation of the packet buffer minus 96 bytes with an upper
+ * limit of 24KB due to receive synchronization limitations.
+ */
+ adapter->tx_fifo_limit = min_t(u32, ((er32(PBA) >> 16) << 10) - 96,
+ 24 << 10);
+
+ /*
* Disable Adaptive Interrupt Moderation if 2 full packets cannot
* fit in receive buffer and early-receive not supported.
*/
@@ -4647,13 +4656,9 @@ static bool e1000_tx_csum(struct e1000_a
return 1;
}
-#define E1000_MAX_PER_TXD 8192
-#define E1000_MAX_TXD_PWR 12
-
static int e1000_tx_map(struct e1000_adapter *adapter,
struct sk_buff *skb, unsigned int first,
- unsigned int max_per_txd, unsigned int nr_frags,
- unsigned int mss)
+ unsigned int max_per_txd, unsigned int nr_frags)
{
struct e1000_ring *tx_ring = adapter->tx_ring;
struct pci_dev *pdev = adapter->pdev;
@@ -4882,20 +4887,19 @@ static int e1000_maybe_stop_tx(struct ne
{
struct e1000_adapter *adapter = netdev_priv(netdev);
+ BUG_ON(size > adapter->tx_ring->count);
+
if (e1000_desc_unused(adapter->tx_ring) >= size)
return 0;
return __e1000_maybe_stop_tx(netdev, size);
}
-#define TXD_USE_COUNT(S, X) (((S) >> (X)) + 1 )
static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
struct net_device *netdev)
{
struct e1000_adapter *adapter = netdev_priv(netdev);
struct e1000_ring *tx_ring = adapter->tx_ring;
unsigned int first;
- unsigned int max_per_txd = E1000_MAX_PER_TXD;
- unsigned int max_txd_pwr = E1000_MAX_TXD_PWR;
unsigned int tx_flags = 0;
unsigned int len = skb_headlen(skb);
unsigned int nr_frags;
@@ -4915,18 +4919,8 @@ static netdev_tx_t e1000_xmit_frame(stru
}
mss = skb_shinfo(skb)->gso_size;
- /*
- * The controller does a simple calculation to
- * make sure there is enough room in the FIFO before
- * initiating the DMA for each buffer. The calc is:
- * 4 = ceil(buffer len/mss). To make sure we don't
- * overrun the FIFO, adjust the max buffer len if mss
- * drops.
- */
if (mss) {
u8 hdr_len;
- max_per_txd = min(mss << 2, max_per_txd);
- max_txd_pwr = fls(max_per_txd) - 1;
/*
* TSO Workaround for 82571/2/3 Controllers -- if skb->data
@@ -4956,12 +4950,12 @@ static netdev_tx_t e1000_xmit_frame(stru
count++;
count++;
- count += TXD_USE_COUNT(len, max_txd_pwr);
+ count += DIV_ROUND_UP(len, adapter->tx_fifo_limit);
nr_frags = skb_shinfo(skb)->nr_frags;
for (f = 0; f < nr_frags; f++)
- count += TXD_USE_COUNT(skb_frag_size(&skb_shinfo(skb)->frags[f]),
- max_txd_pwr);
+ count += DIV_ROUND_UP(skb_frag_size(&skb_shinfo(skb)->frags[f]),
+ adapter->tx_fifo_limit);
if (adapter->hw.mac.tx_pkt_filtering)
e1000_transfer_dhcp_info(adapter, skb);
@@ -5000,13 +4994,16 @@ static netdev_tx_t e1000_xmit_frame(stru
tx_flags |= E1000_TX_FLAGS_IPV4;
/* if count is 0 then mapping error has occurred */
- count = e1000_tx_map(adapter, skb, first, max_per_txd, nr_frags, mss);
+ count = e1000_tx_map(adapter, skb, first, adapter->tx_fifo_limit,
+ nr_frags);
if (count) {
netdev_sent_queue(netdev, skb->len);
e1000_tx_queue(adapter, tx_flags, count);
/* Make sure there is space in the ring for the next send. */
- e1000_maybe_stop_tx(netdev, MAX_SKB_FRAGS + 2);
-
+ e1000_maybe_stop_tx(netdev,
+ (MAX_SKB_FRAGS *
+ DIV_ROUND_UP(PAGE_SIZE,
+ adapter->tx_fifo_limit) + 2));
} else {
dev_kfree_skb_any(skb);
tx_ring->buffer_info[first].time_stamp = 0;
@@ -6150,8 +6147,8 @@ static int __devinit e1000_probe(struct
adapter->hw.phy.autoneg_advertised = 0x2f;
/* ring size defaults */
- adapter->rx_ring->count = 256;
- adapter->tx_ring->count = 256;
+ adapter->rx_ring->count = E1000_DEFAULT_RXD;
+ adapter->tx_ring->count = E1000_DEFAULT_TXD;
/*
* Initial Wake on LAN setting - If APM wake is enabled in

View File

@ -0,0 +1,34 @@
From: Mathias Krause <minipli@googlemail.com>
Date: Wed, 15 Aug 2012 11:31:56 +0000
Subject: ipvs: fix info leak in getsockopt(IP_VS_SO_GET_TIMEOUT)
[ Upstream commit 2d8a041b7bfe1097af21441cb77d6af95f4f4680 ]
If at least one of CONFIG_IP_VS_PROTO_TCP or CONFIG_IP_VS_PROTO_UDP is
not set, __ip_vs_get_timeouts() does not fully initialize the structure
that gets copied to userland and that for leaks up to 12 bytes of kernel
stack. Add an explicit memset(0) before passing the structure to
__ip_vs_get_timeouts() to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
net/netfilter/ipvs/ip_vs_ctl.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index e1a66cf..72f4253 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -2713,6 +2713,7 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
{
struct ip_vs_timeout_user t;
+ memset(&t, 0, sizeof(t));
__ip_vs_get_timeouts(net, &t);
if (copy_to_user(user, &t, sizeof(t)) != 0)
ret = -EFAULT;

View File

@ -0,0 +1,44 @@
From: Mathias Krause <minipli@googlemail.com>
Date: Wed, 15 Aug 2012 11:31:53 +0000
Subject: llc: fix info leak via getsockname()
[ Upstream commit 3592aaeb80290bda0f2cf0b5456c97bfc638b192 ]
The LLC code wrongly returns 0, i.e. "success", when the socket is
zapped. Together with the uninitialized uaddrlen pointer argument from
sys_getsockname this leads to an arbitrary memory leak of up to 128
bytes kernel stack via the getsockname() syscall.
Return an error instead when the socket is zapped to prevent the info
leak. Also remove the unnecessary memset(0). We don't directly write to
the memory pointed by uaddr but memcpy() a local structure at the end of
the function that is properly initialized.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
net/llc/af_llc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index a18e6c3..99a60d5 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -966,14 +966,13 @@ static int llc_ui_getname(struct socket *sock, struct sockaddr *uaddr,
struct sockaddr_llc sllc;
struct sock *sk = sock->sk;
struct llc_sock *llc = llc_sk(sk);
- int rc = 0;
+ int rc = -EBADF;
memset(&sllc, 0, sizeof(sllc));
lock_sock(sk);
if (sock_flag(sk, SOCK_ZAPPED))
goto out;
*uaddrlen = sizeof(sllc);
- memset(uaddr, 0, *uaddrlen);
if (peer) {
rc = -ENOTCONN;
if (sk->sk_state != TCP_ESTABLISHED)

View File

@ -0,0 +1,69 @@
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Mon, 30 Jul 2012 15:57:00 +0000
Subject: net: Allow driver to limit number of GSO segments per skb
[ Upstream commit 30b678d844af3305cda5953467005cebb5d7b687 ]
A peer (or local user) may cause TCP to use a nominal MSS of as little
as 88 (actual MSS of 76 with timestamps). Given that we have a
sufficiently prodigious local sender and the peer ACKs quickly enough,
it is nevertheless possible to grow the window for such a connection
to the point that we will try to send just under 64K at once. This
results in a single skb that expands to 861 segments.
In some drivers with TSO support, such an skb will require hundreds of
DMA descriptors; a substantial fraction of a TX ring or even more than
a full ring. The TX queue selected for the skb may stall and trigger
the TX watchdog repeatedly (since the problem skb will be retried
after the TX reset). This particularly affects sfc, for which the
issue is designated as CVE-2012-3412.
Therefore:
1. Add the field net_device::gso_max_segs holding the device-specific
limit.
2. In netif_skb_features(), if the number of segments is too high then
mask out GSO features to force fall back to software GSO.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
include/linux/netdevice.h | 2 ++
net/core/dev.c | 4 ++++
2 files changed, 6 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cb52340..d178fb8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1299,6 +1299,8 @@ struct net_device {
/* for setting kernel sock attribute on TCP connection setup */
#define GSO_MAX_SIZE 65536
unsigned int gso_max_size;
+#define GSO_MAX_SEGS 65535
+ u16 gso_max_segs;
#ifdef CONFIG_DCB
/* Data Center Bridging netlink ops */
diff --git a/net/core/dev.c b/net/core/dev.c
index 4b18703..e6f2635 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2108,6 +2108,9 @@ u32 netif_skb_features(struct sk_buff *skb)
__be16 protocol = skb->protocol;
u32 features = skb->dev->features;
+ if (skb_shinfo(skb)->gso_segs > skb->dev->gso_max_segs)
+ features &= ~NETIF_F_GSO_MASK;
+
if (protocol == htons(ETH_P_8021Q)) {
struct vlan_ethhdr *veh = (struct vlan_ethhdr *)skb->data;
protocol = veh->h_vlan_encapsulated_proto;
@@ -5990,6 +5993,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
dev_net_set(dev, &init_net);
dev->gso_max_size = GSO_MAX_SIZE;
+ dev->gso_max_segs = GSO_MAX_SEGS;
INIT_LIST_HEAD(&dev->napi_list);
INIT_LIST_HEAD(&dev->unreg_list);

View File

@ -0,0 +1,31 @@
From: Mathias Krause <minipli@googlemail.com>
Date: Wed, 15 Aug 2012 11:31:57 +0000
Subject: net: fix info leak in compat dev_ifconf()
[ Upstream commit 43da5f2e0d0c69ded3d51907d9552310a6b545e8 ]
The implementation of dev_ifconf() for the compat ioctl interface uses
an intermediate ifc structure allocated in userland for the duration of
the syscall. Though, it fails to initialize the padding bytes inserted
for alignment and that for leaks four bytes of kernel stack. Add an
explicit memset(0) before filling the structure to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
net/socket.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/socket.c b/net/socket.c
index 273cbce..68879db 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2645,6 +2645,7 @@ static int dev_ifconf(struct net *net, struct compat_ifconf __user *uifc32)
if (copy_from_user(&ifc32, uifc32, sizeof(struct compat_ifconf)))
return -EFAULT;
+ memset(&ifc, 0, sizeof(ifc));
if (ifc32.ifcbuf == 0) {
ifc32.ifc_len = 0;
ifc.ifc_len = 0;

View File

@ -0,0 +1,82 @@
From: Francesco Ruggeri <fruggeri@aristanetworks.com>
Date: Fri, 24 Aug 2012 07:38:35 +0000
Subject: net: ipv4: ipmr_expire_timer causes crash when removing net namespace
[ Upstream commit acbb219d5f53821b2d0080d047800410c0420ea1 ]
When tearing down a net namespace, ipv4 mr_table structures are freed
without first deactivating their timers. This can result in a crash in
run_timer_softirq.
This patch mimics the corresponding behaviour in ipv6.
Locking and synchronization seem to be adequate.
We are about to kfree mrt, so existing code should already make sure that
no other references to mrt are pending or can be created by incoming traffic.
The functions invoked here do not cause new references to mrt or other
race conditions to be created.
Invoking del_timer_sync guarantees that ipmr_expire_timer is inactive.
Both ipmr_expire_process (whose completion we may have to wait in
del_timer_sync) and mroute_clean_tables internally use mfc_unres_lock
or other synchronizations when needed, and they both only modify mrt.
Tested in Linux 3.4.8.
Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
net/ipv4/ipmr.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index d2aae27..0064394 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -125,6 +125,8 @@ static DEFINE_SPINLOCK(mfc_unres_lock);
static struct kmem_cache *mrt_cachep __read_mostly;
static struct mr_table *ipmr_new_table(struct net *net, u32 id);
+static void ipmr_free_table(struct mr_table *mrt);
+
static int ip_mr_forward(struct net *net, struct mr_table *mrt,
struct sk_buff *skb, struct mfc_cache *cache,
int local);
@@ -132,6 +134,7 @@ static int ipmr_cache_report(struct mr_table *mrt,
struct sk_buff *pkt, vifi_t vifi, int assert);
static int __ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
struct mfc_cache *c, struct rtmsg *rtm);
+static void mroute_clean_tables(struct mr_table *mrt);
static void ipmr_expire_process(unsigned long arg);
#ifdef CONFIG_IP_MROUTE_MULTIPLE_TABLES
@@ -272,7 +275,7 @@ static void __net_exit ipmr_rules_exit(struct net *net)
list_for_each_entry_safe(mrt, next, &net->ipv4.mr_tables, list) {
list_del(&mrt->list);
- kfree(mrt);
+ ipmr_free_table(mrt);
}
fib_rules_unregister(net->ipv4.mr_rules_ops);
}
@@ -300,7 +303,7 @@ static int __net_init ipmr_rules_init(struct net *net)
static void __net_exit ipmr_rules_exit(struct net *net)
{
- kfree(net->ipv4.mrt);
+ ipmr_free_table(net->ipv4.mrt);
}
#endif
@@ -337,6 +340,13 @@ static struct mr_table *ipmr_new_table(struct net *net, u32 id)
return mrt;
}
+static void ipmr_free_table(struct mr_table *mrt)
+{
+ del_timer_sync(&mrt->ipmr_expire_timer);
+ mroute_clean_tables(mrt);
+ kfree(mrt);
+}
+
/* Service routines creating virtual interfaces: DVMRP tunnels and PIMREG */
static void ipmr_del_tunnel(struct net_device *dev, struct vifctl *v)

View File

@ -0,0 +1,66 @@
From: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Date: Fri, 3 Aug 2012 19:57:52 +0900
Subject: net_sched: gact: Fix potential panic in tcf_gact().
[ Upstream commit 696ecdc10622d86541f2e35cc16e15b6b3b1b67e ]
gact_rand array is accessed by gact->tcfg_ptype whose value
is assumed to less than MAX_RAND, but any range checks are
not performed.
So add a check in tcf_gact_init(). And in tcf_gact(), we can
reduce a branch.
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
net/sched/act_gact.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index b77f5a0..bdacd8d 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -67,6 +67,9 @@ static int tcf_gact_init(struct nlattr *nla, struct nlattr *est,
struct tcf_common *pc;
int ret = 0;
int err;
+#ifdef CONFIG_GACT_PROB
+ struct tc_gact_p *p_parm = NULL;
+#endif
if (nla == NULL)
return -EINVAL;
@@ -82,6 +85,12 @@ static int tcf_gact_init(struct nlattr *nla, struct nlattr *est,
#ifndef CONFIG_GACT_PROB
if (tb[TCA_GACT_PROB] != NULL)
return -EOPNOTSUPP;
+#else
+ if (tb[TCA_GACT_PROB]) {
+ p_parm = nla_data(tb[TCA_GACT_PROB]);
+ if (p_parm->ptype >= MAX_RAND)
+ return -EINVAL;
+ }
#endif
pc = tcf_hash_check(parm->index, a, bind, &gact_hash_info);
@@ -103,8 +112,7 @@ static int tcf_gact_init(struct nlattr *nla, struct nlattr *est,
spin_lock_bh(&gact->tcf_lock);
gact->tcf_action = parm->action;
#ifdef CONFIG_GACT_PROB
- if (tb[TCA_GACT_PROB] != NULL) {
- struct tc_gact_p *p_parm = nla_data(tb[TCA_GACT_PROB]);
+ if (p_parm) {
gact->tcfg_paction = p_parm->paction;
gact->tcfg_pval = p_parm->pval;
gact->tcfg_ptype = p_parm->ptype;
@@ -133,7 +141,7 @@ static int tcf_gact(struct sk_buff *skb, const struct tc_action *a,
spin_lock(&gact->tcf_lock);
#ifdef CONFIG_GACT_PROB
- if (gact->tcfg_ptype && gact_rand[gact->tcfg_ptype] != NULL)
+ if (gact->tcfg_ptype)
action = gact_rand[gact->tcfg_ptype](gact);
else
action = gact->tcf_action;

View File

@ -0,0 +1,72 @@
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Thu, 23 Aug 2012 02:09:11 +0000
Subject: netlink: fix possible spoofing from non-root processes
[ Upstream commit 20e1db19db5d6b9e4e83021595eab0dc8f107bef ]
Non-root user-space processes can send Netlink messages to other
processes that are well-known for being subscribed to Netlink
asynchronous notifications. This allows ilegitimate non-root
process to send forged messages to Netlink subscribers.
The userspace process usually verifies the legitimate origin in
two ways:
a) Socket credentials. If UID != 0, then the message comes from
some ilegitimate process and the message needs to be dropped.
b) Netlink portID. In general, portID == 0 means that the origin
of the messages comes from the kernel. Thus, discarding any
message not coming from the kernel.
However, ctnetlink sets the portID in event messages that has
been triggered by some user-space process, eg. conntrack utility.
So other processes subscribed to ctnetlink events, eg. conntrackd,
know that the event was triggered by some user-space action.
Neither of the two ways to discard ilegitimate messages coming
from non-root processes can help for ctnetlink.
This patch adds capability validation in case that dst_pid is set
in netlink_sendmsg(). This approach is aggressive since existing
applications using any Netlink bus to deliver messages between
two user-space processes will break. Note that the exception is
NETLINK_USERSOCK, since it is reserved for netlink-to-netlink
userspace communication.
Still, if anyone wants that his Netlink bus allows netlink-to-netlink
userspace, then they can set NL_NONROOT_SEND. However, by default,
I don't think it makes sense to allow to use NETLINK_ROUTE to
communicate two processes that are sending no matter what information
that is not related to link/neighbouring/routing. They should be using
NETLINK_USERSOCK instead for that.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
net/netlink/af_netlink.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 1af8542..38b78b9 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1344,7 +1344,8 @@ static int netlink_sendmsg(struct kiocb *kiocb, struct socket *sock,
dst_pid = addr->nl_pid;
dst_group = ffs(addr->nl_groups);
err = -EPERM;
- if (dst_group && !netlink_capable(sock, NL_NONROOT_SEND))
+ if ((dst_group || dst_pid) &&
+ !netlink_capable(sock, NL_NONROOT_SEND))
goto out;
} else {
dst_pid = nlk->dst_pid;
@@ -2103,6 +2104,7 @@ static void __init netlink_add_usersock_entry(void)
rcu_assign_pointer(nl_table[NETLINK_USERSOCK].listeners, listeners);
nl_table[NETLINK_USERSOCK].module = THIS_MODULE;
nl_table[NETLINK_USERSOCK].registered = 1;
+ nl_table[NETLINK_USERSOCK].nl_nonroot = NL_NONROOT_SEND;
netlink_table_ungrab();
}

View File

@ -0,0 +1,40 @@
From: Dave Jones <davej@redhat.com>
Date: Thu, 6 Sep 2012 12:01:00 -0400
Subject: Remove user-triggerable BUG from mpol_to_str
commit 80de7c3138ee9fd86a98696fd2cf7ad89b995d0a upstream.
Trivially triggerable, found by trinity:
kernel BUG at mm/mempolicy.c:2546!
Process trinity-child2 (pid: 23988, threadinfo ffff88010197e000, task ffff88007821a670)
Call Trace:
show_numa_map+0xd5/0x450
show_pid_numa_map+0x13/0x20
traverse+0xf2/0x230
seq_read+0x34b/0x3e0
vfs_read+0xac/0x180
sys_pread64+0xa2/0xc0
system_call_fastpath+0x1a/0x1f
RIP: mpol_to_str+0x156/0x360
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
mm/mempolicy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index bd92431..4ada3be 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2562,7 +2562,7 @@ int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int no_context)
break;
default:
- BUG();
+ return -EINVAL;
}
l = strlen(policy_modes[mode]);

View File

@ -0,0 +1,171 @@
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Mon, 30 Jul 2012 15:57:44 +0000
Subject: sfc: Fix maximum number of TSO segments and minimum TX queue size
[ Upstream commit 7e6d06f0de3f74ca929441add094518ae332257c ]
Currently an skb requiring TSO may not fit within a minimum-size TX
queue. The TX queue selected for the skb may stall and trigger the TX
watchdog repeatedly (since the problem skb will be retried after the
TX reset). This issue is designated as CVE-2012-3412.
Set the maximum number of TSO segments for our devices to 100. This
should make no difference to behaviour unless the actual MSS is less
than about 700. Increase the minimum TX queue size accordingly to
allow for 2 worst-case skbs, so that there will definitely be space
to add an skb after we wake a queue.
To avoid invalidating existing configurations, change
efx_ethtool_set_ringparam() to fix up values that are too small rather
than returning -EINVAL.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
drivers/net/ethernet/sfc/efx.c | 6 ++++++
drivers/net/ethernet/sfc/efx.h | 14 ++++++++++----
drivers/net/ethernet/sfc/ethtool.c | 16 +++++++++++-----
drivers/net/ethernet/sfc/nic.h | 5 +++++
drivers/net/ethernet/sfc/tx.c | 19 +++++++++++++++++++
5 files changed, 51 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index d5731f1..a6611f1 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -1383,6 +1383,11 @@ static int efx_probe_all(struct efx_nic *efx)
goto fail2;
}
+ BUILD_BUG_ON(EFX_DEFAULT_DMAQ_SIZE < EFX_RXQ_MIN_ENT);
+ if (WARN_ON(EFX_DEFAULT_DMAQ_SIZE < EFX_TXQ_MIN_ENT(efx))) {
+ rc = -EINVAL;
+ goto fail3;
+ }
efx->rxq_entries = efx->txq_entries = EFX_DEFAULT_DMAQ_SIZE;
rc = efx_probe_channels(efx);
if (rc)
@@ -1973,6 +1978,7 @@ static int efx_register_netdev(struct efx_nic *efx)
net_dev->irq = efx->pci_dev->irq;
net_dev->netdev_ops = &efx_netdev_ops;
SET_ETHTOOL_OPS(net_dev, &efx_ethtool_ops);
+ net_dev->gso_max_segs = EFX_TSO_MAX_SEGS;
/* Clear MAC statistics */
efx->mac_op->update_stats(efx);
diff --git a/drivers/net/ethernet/sfc/efx.h b/drivers/net/ethernet/sfc/efx.h
index 4764793..1355245 100644
--- a/drivers/net/ethernet/sfc/efx.h
+++ b/drivers/net/ethernet/sfc/efx.h
@@ -34,6 +34,7 @@ extern netdev_tx_t
efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb);
extern void efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index);
extern int efx_setup_tc(struct net_device *net_dev, u8 num_tc);
+extern unsigned int efx_tx_max_skb_descs(struct efx_nic *efx);
/* RX */
extern int efx_probe_rx_queue(struct efx_rx_queue *rx_queue);
@@ -56,10 +57,15 @@ extern void efx_schedule_slow_fill(struct efx_rx_queue *rx_queue);
#define EFX_MAX_EVQ_SIZE 16384UL
#define EFX_MIN_EVQ_SIZE 512UL
-/* The smallest [rt]xq_entries that the driver supports. Callers of
- * efx_wake_queue() assume that they can subsequently send at least one
- * skb. Falcon/A1 may require up to three descriptors per skb_frag. */
-#define EFX_MIN_RING_SIZE (roundup_pow_of_two(2 * 3 * MAX_SKB_FRAGS))
+/* Maximum number of TCP segments we support for soft-TSO */
+#define EFX_TSO_MAX_SEGS 100
+
+/* The smallest [rt]xq_entries that the driver supports. RX minimum
+ * is a bit arbitrary. For TX, we must have space for at least 2
+ * TSO skbs.
+ */
+#define EFX_RXQ_MIN_ENT 128U
+#define EFX_TXQ_MIN_ENT(efx) (2 * efx_tx_max_skb_descs(efx))
/* Filters */
extern int efx_probe_filters(struct efx_nic *efx);
diff --git a/drivers/net/ethernet/sfc/ethtool.c b/drivers/net/ethernet/sfc/ethtool.c
index f3cd96d..90158c9 100644
--- a/drivers/net/ethernet/sfc/ethtool.c
+++ b/drivers/net/ethernet/sfc/ethtool.c
@@ -690,21 +690,27 @@ static int efx_ethtool_set_ringparam(struct net_device *net_dev,
struct ethtool_ringparam *ring)
{
struct efx_nic *efx = netdev_priv(net_dev);
+ u32 txq_entries;
if (ring->rx_mini_pending || ring->rx_jumbo_pending ||
ring->rx_pending > EFX_MAX_DMAQ_SIZE ||
ring->tx_pending > EFX_MAX_DMAQ_SIZE)
return -EINVAL;
- if (ring->rx_pending < EFX_MIN_RING_SIZE ||
- ring->tx_pending < EFX_MIN_RING_SIZE) {
+ if (ring->rx_pending < EFX_RXQ_MIN_ENT) {
netif_err(efx, drv, efx->net_dev,
- "TX and RX queues cannot be smaller than %ld\n",
- EFX_MIN_RING_SIZE);
+ "RX queues cannot be smaller than %u\n",
+ EFX_RXQ_MIN_ENT);
return -EINVAL;
}
- return efx_realloc_channels(efx, ring->rx_pending, ring->tx_pending);
+ txq_entries = max(ring->tx_pending, EFX_TXQ_MIN_ENT(efx));
+ if (txq_entries != ring->tx_pending)
+ netif_warn(efx, drv, efx->net_dev,
+ "increasing TX queue size to minimum of %u\n",
+ txq_entries);
+
+ return efx_realloc_channels(efx, ring->rx_pending, txq_entries);
}
static int efx_ethtool_set_pauseparam(struct net_device *net_dev,
diff --git a/drivers/net/ethernet/sfc/nic.h b/drivers/net/ethernet/sfc/nic.h
index 5fb24d3..392e36e 100644
--- a/drivers/net/ethernet/sfc/nic.h
+++ b/drivers/net/ethernet/sfc/nic.h
@@ -65,6 +65,11 @@ enum {
#define FALCON_GMAC_LOOPBACKS \
(1 << LOOPBACK_GMAC)
+/* Alignment of PCIe DMA boundaries (4KB) */
+#define EFX_PAGE_SIZE 4096
+/* Size and alignment of buffer table entries (same) */
+#define EFX_BUF_SIZE EFX_PAGE_SIZE
+
/**
* struct falcon_board_type - board operations and type information
* @id: Board type id, as found in NVRAM
diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index df88c543..807d515 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -115,6 +115,25 @@ efx_max_tx_len(struct efx_nic *efx, dma_addr_t dma_addr)
return len;
}
+unsigned int efx_tx_max_skb_descs(struct efx_nic *efx)
+{
+ /* Header and payload descriptor for each output segment, plus
+ * one for every input fragment boundary within a segment
+ */
+ unsigned int max_descs = EFX_TSO_MAX_SEGS * 2 + MAX_SKB_FRAGS;
+
+ /* Possibly one more per segment for the alignment workaround */
+ if (EFX_WORKAROUND_5391(efx))
+ max_descs += EFX_TSO_MAX_SEGS;
+
+ /* Possibly more for PCIe page boundaries within input fragments */
+ if (PAGE_SIZE > EFX_PAGE_SIZE)
+ max_descs += max_t(unsigned int, MAX_SKB_FRAGS,
+ DIV_ROUND_UP(GSO_MAX_SIZE, EFX_PAGE_SIZE));
+
+ return max_descs;
+}
+
/*
* Add a socket buffer to a TX queue
*

View File

@ -0,0 +1,136 @@
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Mon, 30 Jul 2012 16:11:42 +0000
Subject: tcp: Apply device TSO segment limit earlier
[ Upstream commit 1485348d2424e1131ea42efc033cbd9366462b01 ]
Cache the device gso_max_segs in sock::sk_gso_max_segs and use it to
limit the size of TSO skbs. This avoids the need to fall back to
software GSO for local TCP senders.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
include/net/sock.h | 2 ++
net/core/sock.c | 1 +
net/ipv4/tcp.c | 4 +++-
net/ipv4/tcp_cong.c | 3 ++-
net/ipv4/tcp_output.c | 21 ++++++++++++---------
5 files changed, 20 insertions(+), 11 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 32e3937..ddf523c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -194,6 +194,7 @@ struct sock_common {
* @sk_route_nocaps: forbidden route capabilities (e.g NETIF_F_GSO_MASK)
* @sk_gso_type: GSO type (e.g. %SKB_GSO_TCPV4)
* @sk_gso_max_size: Maximum GSO segment size to build
+ * @sk_gso_max_segs: Maximum number of GSO segments
* @sk_lingertime: %SO_LINGER l_linger setting
* @sk_backlog: always used with the per-socket spinlock held
* @sk_callback_lock: used with the callbacks in the end of this struct
@@ -310,6 +311,7 @@ struct sock {
int sk_route_nocaps;
int sk_gso_type;
unsigned int sk_gso_max_size;
+ u16 sk_gso_max_segs;
int sk_rcvlowat;
unsigned long sk_lingertime;
struct sk_buff_head sk_error_queue;
diff --git a/net/core/sock.c b/net/core/sock.c
index 8d095b9..018fd41 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1308,6 +1308,7 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
} else {
sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
sk->sk_gso_max_size = dst->dev->gso_max_size;
+ sk->sk_gso_max_segs = dst->dev->gso_max_segs;
}
}
}
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index ad466a7..043d49b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -740,7 +740,9 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
old_size_goal + mss_now > xmit_size_goal)) {
xmit_size_goal = old_size_goal;
} else {
- tp->xmit_size_goal_segs = xmit_size_goal / mss_now;
+ tp->xmit_size_goal_segs =
+ min_t(u16, xmit_size_goal / mss_now,
+ sk->sk_gso_max_segs);
xmit_size_goal = tp->xmit_size_goal_segs * mss_now;
}
}
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 850c737..6cebfd2 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -290,7 +290,8 @@ int tcp_is_cwnd_limited(const struct sock *sk, u32 in_flight)
left = tp->snd_cwnd - in_flight;
if (sk_can_gso(sk) &&
left * sysctl_tcp_tso_win_divisor < tp->snd_cwnd &&
- left * tp->mss_cache < sk->sk_gso_max_size)
+ left * tp->mss_cache < sk->sk_gso_max_size &&
+ left < sk->sk_gso_max_segs)
return 1;
return left <= tcp_max_burst(tp);
}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index c51dd5b..921cbac 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1318,21 +1318,21 @@ static void tcp_cwnd_validate(struct sock *sk)
* when we would be allowed to send the split-due-to-Nagle skb fully.
*/
static unsigned int tcp_mss_split_point(const struct sock *sk, const struct sk_buff *skb,
- unsigned int mss_now, unsigned int cwnd)
+ unsigned int mss_now, unsigned int max_segs)
{
const struct tcp_sock *tp = tcp_sk(sk);
- u32 needed, window, cwnd_len;
+ u32 needed, window, max_len;
window = tcp_wnd_end(tp) - TCP_SKB_CB(skb)->seq;
- cwnd_len = mss_now * cwnd;
+ max_len = mss_now * max_segs;
- if (likely(cwnd_len <= window && skb != tcp_write_queue_tail(sk)))
- return cwnd_len;
+ if (likely(max_len <= window && skb != tcp_write_queue_tail(sk)))
+ return max_len;
needed = min(skb->len, window);
- if (cwnd_len <= needed)
- return cwnd_len;
+ if (max_len <= needed)
+ return max_len;
return needed - needed % mss_now;
}
@@ -1560,7 +1560,8 @@ static int tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb)
limit = min(send_win, cong_win);
/* If a full-sized TSO skb can be sent, do it. */
- if (limit >= sk->sk_gso_max_size)
+ if (limit >= min_t(unsigned int, sk->sk_gso_max_size,
+ sk->sk_gso_max_segs * tp->mss_cache))
goto send_now;
/* Middle in queue won't get any more data, full sendable already? */
@@ -1786,7 +1787,9 @@ static int tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
limit = mss_now;
if (tso_segs > 1 && !tcp_urg_mode(tp))
limit = tcp_mss_split_point(sk, skb, mss_now,
- cwnd_quota);
+ min_t(unsigned int,
+ cwnd_quota,
+ sk->sk_gso_max_segs));
if (skb->len > limit &&
unlikely(tso_fragment(sk, skb, limit, mss_now, gfp)))

25
debian/patches/series vendored
View File

@ -396,3 +396,28 @@ bugfix/alpha/alpha-use-large-data-model.diff
bugfix/x86/drm-i915-i8xx-interrupt-handler.patch
features/arm/ahci-Add-JMicron-362-device-IDs.patch
bugfix/all/speakup-lower-default-software-speech-rate.patch
# These were all picked from the 3.2.30 patch queue
bugfix/all/e1000e-dos-while-tso-enabled-caused-by-link-partner-with-small-mss.patch
bugfix/all/remove-user-triggerable-bug-from-mpol_to_str.patch
bugfix/all/net-allow-driver-to-limit-number-of-gso-segments-per-skb.patch
bugfix/all/sfc-fix-maximum-number-of-tso-segments-and-minimum-tx-queue-size.patch
bugfix/all/tcp-apply-device-tso-segment-limit-earlier.patch
bugfix/all/net_sched-gact-fix-potential-panic-in-tcf_gact.patch
bugfix/all/af_packet-remove-bug-statement-in-tpacket_destruct_skb.patch
bugfix/all/atm-fix-info-leak-in-getsockopt-so_atmpvc.patch
bugfix/all/atm-fix-info-leak-via-getsockname.patch
bugfix/all/bluetooth-hci-fix-info-leak-in-getsockopt-hci_filter.patch
bugfix/all/bluetooth-hci-fix-info-leak-via-getsockname.patch
bugfix/all/bluetooth-rfcomm-fix-info-leak-in-getsockopt-bt_security.patch
bugfix/all/bluetooth-rfcomm-fix-info-leak-in-ioctl-rfcommgetdevlist.patch
bugfix/all/bluetooth-rfcomm-fix-info-leak-via-getsockname.patch
bugfix/all/bluetooth-l2cap-fix-info-leak-via-getsockname.patch
bugfix/all/llc-fix-info-leak-via-getsockname.patch
bugfix/all/dccp-fix-info-leak-via-getsockopt-dccp_sockopt_ccid_tx_info.patch
bugfix/all/ipvs-fix-info-leak-in-getsockopt-ip_vs_so_get_timeout.patch
bugfix/all/net-fix-info-leak-in-compat-dev_ifconf.patch
bugfix/all/af_packet-don-t-emit-packet-on-orig-fanout-group.patch
bugfix/all/af_netlink-force-credentials-passing.patch
bugfix/all/netlink-fix-possible-spoofing-from-non-root-processes.patch
bugfix/all/net-ipv4-ipmr_expire_timer-causes-crash-when-removing-net-namespace.patch