aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorMichael Bommarito <michael.bommarito@gmail.com>2026-05-13 13:53:24 -0400
committerJason Gunthorpe <jgg@nvidia.com>2026-05-14 15:30:54 -0300
commit0ce1bc9e46ecabe84772bb561e373c0d9876d6f2 (patch)
tree3c2507971276b16ed0905e76d42b99ae1b510b41
parentf6b079629becfa977f9c51fe53ad2e6dcc55ef44 (diff)
RDMA/siw: Reject MPA FPDU length underflow before signed receive math
A malicious connected siw peer can send an iWARP FPDU whose MPA length field (c_hdr->mpa_len, 16 bit big-endian, peer-controlled) is smaller than the fixed DDP/RDMAP header for the announced opcode. Soft-iWARP parses the full header in siw_get_hdr() based on iwarp_pktinfo[opcode] .hdr_len, but never compares mpa_len against that header length. siw_tcp_rx_data() then derives srx->fpdu_part_rem = be16_to_cpu(mpa_len) - fpdu_part_rcvd + MPA_HDR_SIZE; where fpdu_part_rcvd equals iwarp_pktinfo[opcode].hdr_len at this point. For a tagged WRITE (hdr_len 16, MPA_HDR_SIZE 2) the smallest on-wire mpa_len of 0 yields fpdu_part_rem = -14, and any mpa_len below hdr_len - MPA_HDR_SIZE underflows to a negative int. The signed value then flows into siw_proc_write()/siw_proc_rresp() as bytes = min(srx->fpdu_part_rem, srx->skb_new); is handed to siw_check_mem() as an int len (whose interval check addr + len > mem->va + mem->len is satisfied for a valid base when len is negative), and reaches siw_rx_data() -> siw_rx_kva() / siw_rx_umem() -> skb_copy_bits() as a signed copy length. The header copy branch in skb_copy_bits() promotes that to size_t, producing a multi-gigabyte read. KASAN under a KUnit harness that drives the real kernel TCP receive path -- a loopback AF_INET socketpair, the malformed FPDU written via kernel_sendmsg, sk_data_ready firing in softirq, tcp_read_sock dispatching to siw_tcp_rx_data -- reports: BUG: KASAN: use-after-free in skb_copy_bits+0x284/0x480 Read of size 4294967295 at addr ffff888... Call Trace: skb_copy_bits siw_rx_kva siw_rx_data siw_check_mem siw_proc_write siw_tcp_rx_data __tcp_read_sock siw_qp_llp_data_ready tcp_data_ready tcp_data_queue Add the missing invariant at the earliest point where the peer header is fully assembled. iwarp_pktinfo[*].hdr_len - MPA_HDR_SIZE is exactly the value the siw transmitter uses as the minimum mpa_len for each opcode (drivers/infiniband/sw/siw/siw_qp.c:33), so this matches the protocol contract. Out-of-range FPDUs terminate the connection with TERM_ERROR_LAYER_LLP / LLP_ETYPE_MPA / LLP_ECODE_FPDU_START -- which is RFC 5044 Section 8 error code 3 ("Marker and ULPDU Length fields do not agree on the start of an FPDU"), the correct framing-error class for this inconsistency. Fixes: 8b6a361b8c48 ("rdma/siw: receive path") Link: https://patch.msgid.link/r/20260513175325.2042630-2-michael.bommarito@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Assisted-by: Claude:claude-opus-4-7 Acked-by: Bernard Metzler <bernard.metzler@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-rw-r--r--drivers/infiniband/sw/siw/siw_qp_rx.c15
1 files changed, 15 insertions, 0 deletions
diff --git a/drivers/infiniband/sw/siw/siw_qp_rx.c b/drivers/infiniband/sw/siw/siw_qp_rx.c
index e8a88b378d51..34d03584160c 100644
--- a/drivers/infiniband/sw/siw/siw_qp_rx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_rx.c
@@ -1082,6 +1082,21 @@ static int siw_get_hdr(struct siw_rx_stream *srx)
}
/*
+ * Peer-controlled mpa_len must not underflow srx->fpdu_part_rem
+ * in siw_tcp_rx_data(); a negative value flows as a signed copy
+ * length into siw_check_mem() and skb_copy_bits().
+ */
+ if (unlikely(be16_to_cpu(c_hdr->mpa_len) + MPA_HDR_SIZE <
+ iwarp_pktinfo[opcode].hdr_len)) {
+ pr_warn_ratelimited("siw: short mpa_len %u for opcode %u (hdr_len %u)\n",
+ be16_to_cpu(c_hdr->mpa_len), opcode,
+ iwarp_pktinfo[opcode].hdr_len);
+ siw_init_terminate(rx_qp(srx), TERM_ERROR_LAYER_LLP,
+ LLP_ETYPE_MPA, LLP_ECODE_FPDU_START, 0);
+ return -EINVAL;
+ }
+
+ /*
* DDP/RDMAP header receive completed. Check if the current
* DDP segment starts a new RDMAP message or continues a previously
* started RDMAP message.