diff options
| author | Ilpo Järvinen <ij@kernel.org> | 2025-09-16 10:24:26 +0200 |
|---|---|---|
| committer | Paolo Abeni <pabeni@redhat.com> | 2025-09-18 08:47:51 +0200 |
| commit | 3cae34274c79e0c60ccd1c10516973af1aed2a7c (patch) | |
| tree | 63e7f21fb47b12148c4dd23da67d942e4c854c6e /Documentation | |
| parent | 542a495cbaa6dc57a310da62b501fdf318657cad (diff) | |
tcp: accecn: AccECN negotiation
Accurate ECN negotiation parts based on the specification:
https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt
Accurate ECN is negotiated using ECE, CWR and AE flags in the
TCP header. TCP falls back into using RFC3168 ECN if one of the
ends supports only RFC3168-style ECN.
The AccECN negotiation includes reflecting IP ECN field value
seen in SYN and SYNACK back using the same bits as negotiation
to allow responding to SYN CE marks and to detect ECN field
mangling. CE marks should not occur currently because SYN=1
segments are sent with Non-ECT in IP ECN field (but proposal
exists to remove this restriction).
Reflecting SYN IP ECN field in SYNACK is relatively simple.
Reflecting SYNACK IP ECN field in the final/third ACK of
the handshake is more challenging. Linux TCP code is not well
prepared for using the final/third ACK a signalling channel
which makes things somewhat complicated here.
tcp_ecn sysctl can be used to select the highest ECN variant
(Accurate ECN, ECN, No ECN) that is attemped to be negotiated and
requested for incoming connection and outgoing connection:
TCP_ECN_IN_NOECN_OUT_NOECN, TCP_ECN_IN_ECN_OUT_ECN,
TCP_ECN_IN_ECN_OUT_NOECN, TCP_ECN_IN_ACCECN_OUT_ACCECN,
TCP_ECN_IN_ACCECN_OUT_ECN, and TCP_ECN_IN_ACCECN_OUT_NOECN.
After this patch, the size of tcp_request_sock remains unchanged
and no new holes are added. Below are the pahole outcomes before
and after this patch:
[BEFORE THIS PATCH]
struct tcp_request_sock {
[...]
u32 rcv_nxt; /* 352 4 */
u8 syn_tos; /* 356 1 */
/* size: 360, cachelines: 6, members: 16 */
}
[AFTER THIS PATCH]
struct tcp_request_sock {
[...]
u32 rcv_nxt; /* 352 4 */
u8 syn_tos; /* 356 1 */
bool accecn_ok; /* 357 1 */
u8 syn_ect_snt:2; /* 358: 0 1 */
u8 syn_ect_rcv:2; /* 358: 2 1 */
u8 accecn_fail_mode:4; /* 358: 4 1 */
/* size: 360, cachelines: 6, members: 20 */
}
After this patch, the size of tcp_sock remains unchanged and no new
holes are added. Also, 4 bits of the existing 2-byte hole are exploited.
Below are the pahole outcomes before and after this patch:
[BEFORE THIS PATCH]
struct tcp_sock {
[...]
u8 dup_ack_counter:2; /* 2761: 0 1 */
u8 tlp_retrans:1; /* 2761: 2 1 */
u8 unused:5; /* 2761: 3 1 */
u8 thin_lto:1; /* 2762: 0 1 */
u8 fastopen_connect:1; /* 2762: 1 1 */
u8 fastopen_no_cookie:1; /* 2762: 2 1 */
u8 fastopen_client_fail:2; /* 2762: 3 1 */
u8 frto:1; /* 2762: 5 1 */
/* XXX 2 bits hole, try to pack */
[...]
u8 keepalive_probes; /* 2765 1 */
/* XXX 2 bytes hole, try to pack */
[...]
/* size: 3200, cachelines: 50, members: 164 */
}
[AFTER THIS PATCH]
struct tcp_sock {
[...]
u8 dup_ack_counter:2; /* 2761: 0 1 */
u8 tlp_retrans:1; /* 2761: 2 1 */
u8 syn_ect_snt:2; /* 2761: 3 1 */
u8 syn_ect_rcv:2; /* 2761: 5 1 */
u8 thin_lto:1; /* 2761: 7 1 */
u8 fastopen_connect:1; /* 2762: 0 1 */
u8 fastopen_no_cookie:1; /* 2762: 1 1 */
u8 fastopen_client_fail:2; /* 2762: 2 1 */
u8 frto:1; /* 2762: 4 1 */
/* XXX 3 bits hole, try to pack */
[...]
u8 keepalive_probes; /* 2765 1 */
u8 accecn_fail_mode:4; /* 2766: 0 1 */
/* XXX 4 bits hole, try to pack */
/* XXX 1 byte hole, try to pack */
[...]
/* size: 3200, cachelines: 50, members: 166 */
}
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Co-developed-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Co-developed-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250916082434.100722-3-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Diffstat (limited to 'Documentation')
| -rw-r--r-- | Documentation/networking/ip-sysctl.rst | 36 | ||||
| -rw-r--r-- | Documentation/networking/net_cachelines/tcp_sock.rst | 3 |
2 files changed, 25 insertions, 14 deletions
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index 9f5891c9b07b..3d8eb54b84f9 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -443,20 +443,28 @@ tcp_early_retrans - INTEGER tcp_ecn - INTEGER Control use of Explicit Congestion Notification (ECN) by TCP. - ECN is used only when both ends of the TCP connection indicate - support for it. This feature is useful in avoiding losses due - to congestion by allowing supporting routers to signal - congestion before having to drop packets. - - Possible values are: - - = ===================================================== - 0 Disable ECN. Neither initiate nor accept ECN. - 1 Enable ECN when requested by incoming connections and - also request ECN on outgoing connection attempts. - 2 Enable ECN when requested by incoming connections - but do not request ECN on outgoing connections. - = ===================================================== + ECN is used only when both ends of the TCP connection indicate support + for it. This feature is useful in avoiding losses due to congestion by + allowing supporting routers to signal congestion before having to drop + packets. A host that supports ECN both sends ECN at the IP layer and + feeds back ECN at the TCP layer. The highest variant of ECN feedback + that both peers support is chosen by the ECN negotiation (Accurate ECN, + ECN, or no ECN). + + The highest negotiated variant for incoming connection requests + and the highest variant requested by outgoing connection + attempts: + + ===== ==================== ==================== + Value Incoming connections Outgoing connections + ===== ==================== ==================== + 0 No ECN No ECN + 1 ECN ECN + 2 ECN No ECN + 3 AccECN AccECN + 4 AccECN ECN + 5 AccECN No ECN + ===== ==================== ==================== Default: 2 diff --git a/Documentation/networking/net_cachelines/tcp_sock.rst b/Documentation/networking/net_cachelines/tcp_sock.rst index 31313a9adccc..4f71ece7c655 100644 --- a/Documentation/networking/net_cachelines/tcp_sock.rst +++ b/Documentation/networking/net_cachelines/tcp_sock.rst @@ -103,6 +103,9 @@ u32 delivered read_mostly read_w u32 delivered_ce read_mostly read_write tcp_rate_skb_sent(tx);tcp_rate_gen(rx) u32 received_ce read_mostly read_write u8:4 received_ce_pending read_mostly read_write +u8:2 syn_ect_snt write_mostly read_write +u8:2 syn_ect_rcv read_mostly read_write +u8:4 accecn_fail_mode u32 lost read_mostly tcp_ack u32 app_limited read_write read_mostly tcp_rate_check_app_limited,tcp_rate_skb_sent(tx);tcp_rate_gen(rx) u64 first_tx_mstamp read_write tcp_rate_skb_sent |
