aboutsummaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorIlpo Järvinen <ij@kernel.org>2025-09-16 10:24:26 +0200
committerPaolo Abeni <pabeni@redhat.com>2025-09-18 08:47:51 +0200
commit3cae34274c79e0c60ccd1c10516973af1aed2a7c (patch)
tree63e7f21fb47b12148c4dd23da67d942e4c854c6e /Documentation
parent542a495cbaa6dc57a310da62b501fdf318657cad (diff)
tcp: accecn: AccECN negotiation
Accurate ECN negotiation parts based on the specification: https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt Accurate ECN is negotiated using ECE, CWR and AE flags in the TCP header. TCP falls back into using RFC3168 ECN if one of the ends supports only RFC3168-style ECN. The AccECN negotiation includes reflecting IP ECN field value seen in SYN and SYNACK back using the same bits as negotiation to allow responding to SYN CE marks and to detect ECN field mangling. CE marks should not occur currently because SYN=1 segments are sent with Non-ECT in IP ECN field (but proposal exists to remove this restriction). Reflecting SYN IP ECN field in SYNACK is relatively simple. Reflecting SYNACK IP ECN field in the final/third ACK of the handshake is more challenging. Linux TCP code is not well prepared for using the final/third ACK a signalling channel which makes things somewhat complicated here. tcp_ecn sysctl can be used to select the highest ECN variant (Accurate ECN, ECN, No ECN) that is attemped to be negotiated and requested for incoming connection and outgoing connection: TCP_ECN_IN_NOECN_OUT_NOECN, TCP_ECN_IN_ECN_OUT_ECN, TCP_ECN_IN_ECN_OUT_NOECN, TCP_ECN_IN_ACCECN_OUT_ACCECN, TCP_ECN_IN_ACCECN_OUT_ECN, and TCP_ECN_IN_ACCECN_OUT_NOECN. After this patch, the size of tcp_request_sock remains unchanged and no new holes are added. Below are the pahole outcomes before and after this patch: [BEFORE THIS PATCH] struct tcp_request_sock { [...] u32 rcv_nxt; /* 352 4 */ u8 syn_tos; /* 356 1 */ /* size: 360, cachelines: 6, members: 16 */ } [AFTER THIS PATCH] struct tcp_request_sock { [...] u32 rcv_nxt; /* 352 4 */ u8 syn_tos; /* 356 1 */ bool accecn_ok; /* 357 1 */ u8 syn_ect_snt:2; /* 358: 0 1 */ u8 syn_ect_rcv:2; /* 358: 2 1 */ u8 accecn_fail_mode:4; /* 358: 4 1 */ /* size: 360, cachelines: 6, members: 20 */ } After this patch, the size of tcp_sock remains unchanged and no new holes are added. Also, 4 bits of the existing 2-byte hole are exploited. Below are the pahole outcomes before and after this patch: [BEFORE THIS PATCH] struct tcp_sock { [...] u8 dup_ack_counter:2; /* 2761: 0 1 */ u8 tlp_retrans:1; /* 2761: 2 1 */ u8 unused:5; /* 2761: 3 1 */ u8 thin_lto:1; /* 2762: 0 1 */ u8 fastopen_connect:1; /* 2762: 1 1 */ u8 fastopen_no_cookie:1; /* 2762: 2 1 */ u8 fastopen_client_fail:2; /* 2762: 3 1 */ u8 frto:1; /* 2762: 5 1 */ /* XXX 2 bits hole, try to pack */ [...] u8 keepalive_probes; /* 2765 1 */ /* XXX 2 bytes hole, try to pack */ [...] /* size: 3200, cachelines: 50, members: 164 */ } [AFTER THIS PATCH] struct tcp_sock { [...] u8 dup_ack_counter:2; /* 2761: 0 1 */ u8 tlp_retrans:1; /* 2761: 2 1 */ u8 syn_ect_snt:2; /* 2761: 3 1 */ u8 syn_ect_rcv:2; /* 2761: 5 1 */ u8 thin_lto:1; /* 2761: 7 1 */ u8 fastopen_connect:1; /* 2762: 0 1 */ u8 fastopen_no_cookie:1; /* 2762: 1 1 */ u8 fastopen_client_fail:2; /* 2762: 2 1 */ u8 frto:1; /* 2762: 4 1 */ /* XXX 3 bits hole, try to pack */ [...] u8 keepalive_probes; /* 2765 1 */ u8 accecn_fail_mode:4; /* 2766: 0 1 */ /* XXX 4 bits hole, try to pack */ /* XXX 1 byte hole, try to pack */ [...] /* size: 3200, cachelines: 50, members: 166 */ } Signed-off-by: Ilpo Järvinen <ij@kernel.org> Co-developed-by: Olivier Tilmans <olivier.tilmans@nokia.com> Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia.com> Co-developed-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250916082434.100722-3-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/networking/ip-sysctl.rst36
-rw-r--r--Documentation/networking/net_cachelines/tcp_sock.rst3
2 files changed, 25 insertions, 14 deletions
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 9f5891c9b07b..3d8eb54b84f9 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -443,20 +443,28 @@ tcp_early_retrans - INTEGER
tcp_ecn - INTEGER
Control use of Explicit Congestion Notification (ECN) by TCP.
- ECN is used only when both ends of the TCP connection indicate
- support for it. This feature is useful in avoiding losses due
- to congestion by allowing supporting routers to signal
- congestion before having to drop packets.
-
- Possible values are:
-
- = =====================================================
- 0 Disable ECN. Neither initiate nor accept ECN.
- 1 Enable ECN when requested by incoming connections and
- also request ECN on outgoing connection attempts.
- 2 Enable ECN when requested by incoming connections
- but do not request ECN on outgoing connections.
- = =====================================================
+ ECN is used only when both ends of the TCP connection indicate support
+ for it. This feature is useful in avoiding losses due to congestion by
+ allowing supporting routers to signal congestion before having to drop
+ packets. A host that supports ECN both sends ECN at the IP layer and
+ feeds back ECN at the TCP layer. The highest variant of ECN feedback
+ that both peers support is chosen by the ECN negotiation (Accurate ECN,
+ ECN, or no ECN).
+
+ The highest negotiated variant for incoming connection requests
+ and the highest variant requested by outgoing connection
+ attempts:
+
+ ===== ==================== ====================
+ Value Incoming connections Outgoing connections
+ ===== ==================== ====================
+ 0 No ECN No ECN
+ 1 ECN ECN
+ 2 ECN No ECN
+ 3 AccECN AccECN
+ 4 AccECN ECN
+ 5 AccECN No ECN
+ ===== ==================== ====================
Default: 2
diff --git a/Documentation/networking/net_cachelines/tcp_sock.rst b/Documentation/networking/net_cachelines/tcp_sock.rst
index 31313a9adccc..4f71ece7c655 100644
--- a/Documentation/networking/net_cachelines/tcp_sock.rst
+++ b/Documentation/networking/net_cachelines/tcp_sock.rst
@@ -103,6 +103,9 @@ u32 delivered read_mostly read_w
u32 delivered_ce read_mostly read_write tcp_rate_skb_sent(tx);tcp_rate_gen(rx)
u32 received_ce read_mostly read_write
u8:4 received_ce_pending read_mostly read_write
+u8:2 syn_ect_snt write_mostly read_write
+u8:2 syn_ect_rcv read_mostly read_write
+u8:4 accecn_fail_mode
u32 lost read_mostly tcp_ack
u32 app_limited read_write read_mostly tcp_rate_check_app_limited,tcp_rate_skb_sent(tx);tcp_rate_gen(rx)
u64 first_tx_mstamp read_write tcp_rate_skb_sent