summaryrefslogtreecommitdiffstats
path: root/Documentation/networking/ip-sysctl.rst
diff options
context:
space:
mode:
authorMauro Carvalho Chehab <mchehab+huawei@kernel.org>2020-04-28 00:01:49 +0200
committerDavid S. Miller <davem@davemloft.net>2020-04-28 14:40:18 -0700
commit1cec2cacaaec5d53adc04dd3ecfdb687b26c0e89 (patch)
tree89e5f3347d8d28ae4b2826a12f89f4db092f7778 /Documentation/networking/ip-sysctl.rst
parent355e656e017c3b42deb57d125d86c4cbd277d6db (diff)
docs: networking: convert ip-sysctl.txt to ReST
- add SPDX header; - adjust titles and chapters, adding proper markups; - mark code blocks and literals as such; - mark lists as such; - mark tables as such; - use footnote markup; - adjust identation, whitespaces and blank lines; - add to networking/index.rst. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'Documentation/networking/ip-sysctl.rst')
-rw-r--r--Documentation/networking/ip-sysctl.rst2649
1 files changed, 2649 insertions, 0 deletions
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
new file mode 100644
index 000000000000..38f811d4b2f0
--- /dev/null
+++ b/Documentation/networking/ip-sysctl.rst
@@ -0,0 +1,2649 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========
+IP Sysctl
+=========
+
+/proc/sys/net/ipv4/* Variables
+==============================
+
+ip_forward - BOOLEAN
+ - 0 - disabled (default)
+ - not 0 - enabled
+
+ Forward Packets between interfaces.
+
+ This variable is special, its change resets all configuration
+ parameters to their default state (RFC1122 for hosts, RFC1812
+ for routers)
+
+ip_default_ttl - INTEGER
+ Default value of TTL field (Time To Live) for outgoing (but not
+ forwarded) IP packets. Should be between 1 and 255 inclusive.
+ Default: 64 (as recommended by RFC1700)
+
+ip_no_pmtu_disc - INTEGER
+ Disable Path MTU Discovery. If enabled in mode 1 and a
+ fragmentation-required ICMP is received, the PMTU to this
+ destination will be set to min_pmtu (see below). You will need
+ to raise min_pmtu to the smallest interface MTU on your system
+ manually if you want to avoid locally generated fragments.
+
+ In mode 2 incoming Path MTU Discovery messages will be
+ discarded. Outgoing frames are handled the same as in mode 1,
+ implicitly setting IP_PMTUDISC_DONT on every created socket.
+
+ Mode 3 is a hardened pmtu discover mode. The kernel will only
+ accept fragmentation-needed errors if the underlying protocol
+ can verify them besides a plain socket lookup. Current
+ protocols for which pmtu events will be honored are TCP, SCTP
+ and DCCP as they verify e.g. the sequence number or the
+ association. This mode should not be enabled globally but is
+ only intended to secure e.g. name servers in namespaces where
+ TCP path mtu must still work but path MTU information of other
+ protocols should be discarded. If enabled globally this mode
+ could break other protocols.
+
+ Possible values: 0-3
+
+ Default: FALSE
+
+min_pmtu - INTEGER
+ default 552 - minimum discovered Path MTU
+
+ip_forward_use_pmtu - BOOLEAN
+ By default we don't trust protocol path MTUs while forwarding
+ because they could be easily forged and can lead to unwanted
+ fragmentation by the router.
+ You only need to enable this if you have user-space software
+ which tries to discover path mtus by itself and depends on the
+ kernel honoring this information. This is normally not the
+ case.
+
+ Default: 0 (disabled)
+
+ Possible values:
+
+ - 0 - disabled
+ - 1 - enabled
+
+fwmark_reflect - BOOLEAN
+ Controls the fwmark of kernel-generated IPv4 reply packets that are not
+ associated with a socket for example, TCP RSTs or ICMP echo replies).
+ If unset, these packets have a fwmark of zero. If set, they have the
+ fwmark of the packet they are replying to.
+
+ Default: 0
+
+fib_multipath_use_neigh - BOOLEAN
+ Use status of existing neighbor entry when determining nexthop for
+ multipath routes. If disabled, neighbor information is not used and
+ packets could be directed to a failed nexthop. Only valid for kernels
+ built with CONFIG_IP_ROUTE_MULTIPATH enabled.
+
+ Default: 0 (disabled)
+
+ Possible values:
+
+ - 0 - disabled
+ - 1 - enabled
+
+fib_multipath_hash_policy - INTEGER
+ Controls which hash policy to use for multipath routes. Only valid
+ for kernels built with CONFIG_IP_ROUTE_MULTIPATH enabled.
+
+ Default: 0 (Layer 3)
+
+ Possible values:
+
+ - 0 - Layer 3
+ - 1 - Layer 4
+ - 2 - Layer 3 or inner Layer 3 if present
+
+fib_sync_mem - UNSIGNED INTEGER
+ Amount of dirty memory from fib entries that can be backlogged before
+ synchronize_rcu is forced.
+
+ Default: 512kB Minimum: 64kB Maximum: 64MB
+
+ip_forward_update_priority - INTEGER
+ Whether to update SKB priority from "TOS" field in IPv4 header after it
+ is forwarded. The new SKB priority is mapped from TOS field value
+ according to an rt_tos2priority table (see e.g. man tc-prio).
+
+ Default: 1 (Update priority.)
+
+ Possible values:
+
+ - 0 - Do not update priority.
+ - 1 - Update priority.
+
+route/max_size - INTEGER
+ Maximum number of routes allowed in the kernel. Increase
+ this when using large numbers of interfaces and/or routes.
+
+ From linux kernel 3.6 onwards, this is deprecated for ipv4
+ as route cache is no longer used.
+
+neigh/default/gc_thresh1 - INTEGER
+ Minimum number of entries to keep. Garbage collector will not
+ purge entries if there are fewer than this number.
+
+ Default: 128
+
+neigh/default/gc_thresh2 - INTEGER
+ Threshold when garbage collector becomes more aggressive about
+ purging entries. Entries older than 5 seconds will be cleared
+ when over this number.
+
+ Default: 512
+
+neigh/default/gc_thresh3 - INTEGER
+ Maximum number of non-PERMANENT neighbor entries allowed. Increase
+ this when using large numbers of interfaces and when communicating
+ with large numbers of directly-connected peers.
+
+ Default: 1024
+
+neigh/default/unres_qlen_bytes - INTEGER
+ The maximum number of bytes which may be used by packets
+ queued for each unresolved address by other network layers.
+ (added in linux 3.3)
+
+ Setting negative value is meaningless and will return error.
+
+ Default: SK_WMEM_MAX, (same as net.core.wmem_default).
+
+ Exact value depends on architecture and kernel options,
+ but should be enough to allow queuing 256 packets
+ of medium size.
+
+neigh/default/unres_qlen - INTEGER
+ The maximum number of packets which may be queued for each
+ unresolved address by other network layers.
+
+ (deprecated in linux 3.3) : use unres_qlen_bytes instead.
+
+ Prior to linux 3.3, the default value is 3 which may cause
+ unexpected packet loss. The current default value is calculated
+ according to default value of unres_qlen_bytes and true size of
+ packet.
+
+ Default: 101
+
+mtu_expires - INTEGER
+ Time, in seconds, that cached PMTU information is kept.
+
+min_adv_mss - INTEGER
+ The advertised MSS depends on the first hop route MTU, but will
+ never be lower than this setting.
+
+IP Fragmentation:
+
+ipfrag_high_thresh - LONG INTEGER
+ Maximum memory used to reassemble IP fragments.
+
+ipfrag_low_thresh - LONG INTEGER
+ (Obsolete since linux-4.17)
+ Maximum memory used to reassemble IP fragments before the kernel
+ begins to remove incomplete fragment queues to free up resources.
+ The kernel still accepts new fragments for defragmentation.
+
+ipfrag_time - INTEGER
+ Time in seconds to keep an IP fragment in memory.
+
+ipfrag_max_dist - INTEGER
+ ipfrag_max_dist is a non-negative integer value which defines the
+ maximum "disorder" which is allowed among fragments which share a
+ common IP source address. Note that reordering of packets is
+ not unusual, but if a large number of fragments arrive from a source
+ IP address while a particular fragment queue remains incomplete, it
+ probably indicates that one or more fragments belonging to that queue
+ have been lost. When ipfrag_max_dist is positive, an additional check
+ is done on fragments before they are added to a reassembly queue - if
+ ipfrag_max_dist (or more) fragments have arrived from a particular IP
+ address between additions to any IP fragment queue using that source
+ address, it's presumed that one or more fragments in the queue are
+ lost. The existing fragment queue will be dropped, and a new one
+ started. An ipfrag_max_dist value of zero disables this check.
+
+ Using a very small value, e.g. 1 or 2, for ipfrag_max_dist can
+ result in unnecessarily dropping fragment queues when normal
+ reordering of packets occurs, which could lead to poor application
+ performance. Using a very large value, e.g. 50000, increases the
+ likelihood of incorrectly reassembling IP fragments that originate
+ from different IP datagrams, which could result in data corruption.
+ Default: 64
+
+INET peer storage
+=================
+
+inet_peer_threshold - INTEGER
+ The approximate size of the storage. Starting from this threshold
+ entries will be thrown aggressively. This threshold also determines
+ entries' time-to-live and time intervals between garbage collection
+ passes. More entries, less time-to-live, less GC interval.
+
+inet_peer_minttl - INTEGER
+ Minimum time-to-live of entries. Should be enough to cover fragment
+ time-to-live on the reassembling side. This minimum time-to-live is
+ guaranteed if the pool size is less than inet_peer_threshold.
+ Measured in seconds.
+
+inet_peer_maxttl - INTEGER
+ Maximum time-to-live of entries. Unused entries will expire after
+ this period of time if there is no memory pressure on the pool (i.e.
+ when the number of entries in the pool is very small).
+ Measured in seconds.
+
+TCP variables
+=============
+
+somaxconn - INTEGER
+ Limit of socket listen() backlog, known in userspace as SOMAXCONN.
+ Defaults to 4096. (Was 128 before linux-5.4)
+ See also tcp_max_syn_backlog for additional tuning for TCP sockets.
+
+tcp_abort_on_overflow - BOOLEAN
+ If listening service is too slow to accept new connections,
+ reset them. Default state is FALSE. It means that if overflow
+ occurred due to a burst, connection will recover. Enable this
+ option _only_ if you are really sure that listening daemon
+ cannot be tuned to accept connections faster. Enabling this
+ option can harm clients of your server.
+
+tcp_adv_win_scale - INTEGER
+ Count buffering overhead as bytes/2^tcp_adv_win_scale
+ (if tcp_adv_win_scale > 0) or bytes-bytes/2^(-tcp_adv_win_scale),
+ if it is <= 0.
+
+ Possible values are [-31, 31], inclusive.
+
+ Default: 1
+
+tcp_allowed_congestion_control - STRING
+ Show/set the congestion control choices available to non-privileged
+ processes. The list is a subset of those listed in
+ tcp_available_congestion_control.
+
+ Default is "reno" and the default setting (tcp_congestion_control).
+
+tcp_app_win - INTEGER
+ Reserve max(window/2^tcp_app_win, mss) of window for application
+ buffer. Value 0 is special, it means that nothing is reserved.
+
+ Default: 31
+
+tcp_autocorking - BOOLEAN
+ Enable TCP auto corking :
+ When applications do consecutive small write()/sendmsg() system calls,
+ we try to coalesce these small writes as much as possible, to lower
+ total amount of sent packets. This is done if at least one prior
+ packet for the flow is waiting in Qdisc queues or device transmit
+ queue. Applications can still use TCP_CORK for optimal behavior
+ when they know how/when to uncork their sockets.
+
+ Default : 1
+
+tcp_available_congestion_control - STRING
+ Shows the available congestion control choices that are registered.
+ More congestion control algorithms may be available as modules,
+ but not loaded.
+
+tcp_base_mss - INTEGER
+ The initial value of search_low to be used by the packetization layer
+ Path MTU discovery (MTU probing). If MTU probing is enabled,
+ this is the initial MSS used by the connection.
+
+tcp_mtu_probe_floor - INTEGER
+ If MTU probing is enabled this caps the minimum MSS used for search_low
+ for the connection.
+
+ Default : 48
+
+tcp_min_snd_mss - INTEGER
+ TCP SYN and SYNACK messages usually advertise an ADVMSS option,
+ as described in RFC 1122 and RFC 6691.
+
+ If this ADVMSS option is smaller than tcp_min_snd_mss,
+ it is silently capped to tcp_min_snd_mss.
+
+ Default : 48 (at least 8 bytes of payload per segment)
+
+tcp_congestion_control - STRING
+ Set the congestion control algorithm to be used for new
+ connections. The algorithm "reno" is always available, but
+ additional choices may be available based on kernel configuration.
+ Default is set as part of kernel configuration.
+ For passive connections, the listener congestion control choice
+ is inherited.
+
+ [see setsockopt(listenfd, SOL_TCP, TCP_CONGESTION, "name" ...) ]
+
+tcp_dsack - BOOLEAN
+ Allows TCP to send "duplicate" SACKs.
+
+tcp_early_retrans - INTEGER
+ Tail loss probe (TLP) converts RTOs occurring due to tail
+ losses into fast recovery (draft-ietf-tcpm-rack). Note that
+ TLP requires RACK to function properly (see tcp_recovery below)
+
+ Possible values:
+
+ - 0 disables TLP
+ - 3 or 4 enables TLP
+
+ Default: 3
+
+tcp_ecn - INTEGER
+ Control use of Explicit Congestion Notification (ECN) by TCP.
+ ECN is used only when both ends of the TCP connection indicate
+ support for it. This feature is useful in avoiding losses due
+ to congestion by allowing supporting routers to signal
+ congestion before having to drop packets.
+
+ Possible values are:
+
+ = =====================================================
+ 0 Disable ECN. Neither initiate nor accept ECN.
+ 1 Enable ECN when requested by incoming connections and
+ also request ECN on outgoing connection attempts.
+ 2 Enable ECN when requested by incoming connections
+ but do not request ECN on outgoing connections.
+ = =====================================================
+
+ Default: 2
+
+tcp_ecn_fallback - BOOLEAN
+ If the kernel detects that ECN connection misbehaves, enable fall
+ back to non-ECN. Currently, this knob implements the fallback
+ from RFC3168, section 6.1.1.1., but we reserve that in future,
+ additional detection mechanisms could be implemented under this
+ knob. The value is not used, if tcp_ecn or per route (or congestion
+ control) ECN settings are disabled.
+
+ Default: 1 (fallback enabled)
+
+tcp_fack - BOOLEAN
+ This is a legacy option, it has no effect anymore.
+
+tcp_fin_timeout - INTEGER
+ The length of time an orphaned (no longer referenced by any
+ application) connection will remain in the FIN_WAIT_2 state
+ before it is aborted at the local end. While a perfectly
+ valid "receive only" state for an un-orphaned connection, an
+ orphaned connection in FIN_WAIT_2 state could otherwise wait
+ forever for the remote to close its end of the connection.
+
+ Cf. tcp_max_orphans
+
+ Default: 60 seconds
+
+tcp_frto - INTEGER
+ Enables Forward RTO-Recovery (F-RTO) defined in RFC5682.
+ F-RTO is an enhanced recovery algorithm for TCP retransmission
+ timeouts. It is particularly beneficial in networks where the
+ RTT fluctuates (e.g., wireless). F-RTO is sender-side only
+ modification. It does not require any support from the peer.
+
+ By default it's enabled with a non-zero value. 0 disables F-RTO.
+
+tcp_fwmark_accept - BOOLEAN
+ If set, incoming connections to listening sockets that do not have a
+ socket mark will set the mark of the accepting socket to the fwmark of
+ the incoming SYN packet. This will cause all packets on that connection
+ (starting from the first SYNACK) to be sent with that fwmark. The
+ listening socket's mark is unchanged. Listening sockets that already
+ have a fwmark set via setsockopt(SOL_SOCKET, SO_MARK, ...) are
+ unaffected.
+
+ Default: 0
+
+tcp_invalid_ratelimit - INTEGER
+ Limit the maximal rate for sending duplicate acknowledgments
+ in response to incoming TCP packets that are for an existing
+ connection but that are invalid due to any of these reasons:
+
+ (a) out-of-window sequence number,
+ (b) out-of-window acknowledgment number, or
+ (c) PAWS (Protection Against Wrapped Sequence numbers) check failure
+
+ This can help mitigate simple "ack loop" DoS attacks, wherein
+ a buggy or malicious middlebox or man-in-the-middle can
+ rewrite TCP header fields in manner that causes each endpoint
+ to think that the other is sending invalid TCP segments, thus
+ causing each side to send an unterminating stream of duplicate
+ acknowledgments for invalid segments.
+
+ Using 0 disables rate-limiting of dupacks in response to
+ invalid segments; otherwise this value specifies the minimal
+ space between sending such dupacks, in milliseconds.
+
+ Default: 500 (milliseconds).
+
+tcp_keepalive_time - INTEGER
+ How often TCP sends out keepalive messages when keepalive is enabled.
+ Default: 2hours.
+
+tcp_keepalive_probes - INTEGER
+ How many keepalive probes TCP sends out, until it decides that the
+ connection is broken. Default value: 9.
+
+tcp_keepalive_intvl - INTEGER
+ How frequently the probes are send out. Multiplied by
+ tcp_keepalive_probes it is time to kill not responding connection,
+ after probes started. Default value: 75sec i.e. connection
+ will be aborted after ~11 minutes of retries.
+
+tcp_l3mdev_accept - BOOLEAN
+ Enables child sockets to inherit the L3 master device index.
+ Enabling this option allows a "global" listen socket to work
+ across L3 master domains (e.g., VRFs) with connected sockets
+ derived from the listen socket to be bound to the L3 domain in
+ which the packets originated. Only valid when the kernel was
+ compiled with CONFIG_NET_L3_MASTER_DEV.
+
+ Default: 0 (disabled)
+
+tcp_low_latency - BOOLEAN
+ This is a legacy option, it has no effect anymore.
+
+tcp_max_orphans - INTEGER
+ Maximal number of TCP sockets not attached to any user file handle,
+ held by system. If this number is exceeded orphaned connections are
+ reset immediately and warning is printed. This limit exists
+ only to prevent simple DoS attacks, you _must_ not rely on this
+ or lower the limit artificially, but rather increase it
+ (probably, after increasing installed memory),
+ if network conditions require more than default value,
+ and tune network services to linger and kill such states
+ more aggressively. Let me to remind again: each orphan eats
+ up to ~64K of unswappable memory.
+
+tcp_max_syn_backlog - INTEGER
+ Maximal number of remembered connection requests (SYN_RECV),
+ which have not received an acknowledgment from connecting client.
+
+ This is a per-listener limit.
+
+ The minimal value is 128 for low memory machines, and it will
+ increase in proportion to the memory of machine.
+
+ If server suffers from overload, try increasing this number.
+
+ Remember to also check /proc/sys/net/core/somaxconn
+ A SYN_RECV request socket consumes about 304 bytes of memory.
+
+tcp_max_tw_buckets - INTEGER
+ Maximal number of timewait sockets held by system simultaneously.
+ If this number is exceeded time-wait socket is immediately destroyed
+ and warning is printed. This limit exists only to prevent
+ simple DoS attacks, you _must_ not lower the limit artificially,
+ but rather increase it (probably, after increasing installed memory),
+ if network conditions require more than default value.
+
+tcp_mem - vector of 3 INTEGERs: min, pressure, max
+ min: below this number of pages TCP is not bothered about its
+ memory appetite.
+
+ pressure: when amount of memory allocated by TCP exceeds this number
+ of pages, TCP moderates its memory consumption and enters memory
+ pressure mode, which is exited when memory consumption falls
+ under "min".
+
+ max: number of pages allowed for queueing by all TCP sockets.
+
+ Defaults are calculated at boot time from amount of available
+ memory.
+
+tcp_min_rtt_wlen - INTEGER
+ The window length of the windowed min filter to track the minimum RTT.
+ A shorter window lets a flow more quickly pick up new (higher)
+ minimum RTT when it is moved to a longer path (e.g., due to traffic
+ engineering). A longer window makes the filter more resistant to RTT
+ inflations such as transient congestion. The unit is seconds.
+
+ Possible values: 0 - 86400 (1 day)
+
+ Default: 300
+
+tcp_moderate_rcvbuf - BOOLEAN
+ If set, TCP performs receive buffer auto-tuning, attempting to
+ automatically size the buffer (no greater than tcp_rmem[2]) to
+ match the size required by the path for full throughput. Enabled by
+ default.
+
+tcp_mtu_probing - INTEGER
+ Controls TCP Packetization-Layer Path MTU Discovery. Takes three
+ values:
+
+ - 0 - Disabled
+ - 1 - Disabled by default, enabled when an ICMP black hole detected
+ - 2 - Always enabled, use initial MSS of tcp_base_mss.
+
+tcp_probe_interval - UNSIGNED INTEGER
+ Controls how often to start TCP Packetization-Layer Path MTU
+ Discovery reprobe. The default is reprobing every 10 minutes as
+ per RFC4821.
+
+tcp_probe_threshold - INTEGER
+ Controls when TCP Packetization-Layer Path MTU Discovery probing
+ will stop in respect to the width of search range in bytes. Default
+ is 8 bytes.
+
+tcp_no_metrics_save - BOOLEAN
+ By default, TCP saves various connection metrics in the route cache
+ when the connection closes, so that connections established in the
+ near future can use these to set initial conditions. Usually, this
+ increases overall performance, but may sometimes cause performance
+ degradation. If set, TCP will not cache metrics on closing
+ connections.
+
+tcp_no_ssthresh_metrics_save - BOOLEAN
+ Controls whether TCP saves ssthresh metrics in the route cache.
+
+ Default is 1, which disables ssthresh metrics.
+
+tcp_orphan_retries - INTEGER
+ This value influences the timeout of a locally closed TCP connection,
+ when RTO retransmissions remain unacknowledged.
+ See tcp_retries2 for more details.
+
+ The default value is 8.
+
+ If your machine is a loaded WEB server,
+ you should think about lowering this value, such sockets
+ may consume significant resources. Cf. tcp_max_orphans.
+
+tcp_recovery - INTEGER
+ This value is a bitmap to enable various experimental loss recovery
+ features.
+
+ ========= =============================================================
+ RACK: 0x1 enables the RACK loss detection for fast detection of lost
+ retransmissions and tail drops. It also subsumes and disables
+ RFC6675 recovery for SACK connections.
+
+ RACK: 0x2 makes RACK's reordering window static (min_rtt/4).
+
+ RACK: 0x4 disables RACK's DUPACK threshold heuristic
+ ========= =============================================================
+
+ Default: 0x1
+
+tcp_reordering - INTEGER
+ Initial reordering level of packets in a TCP stream.
+ TCP stack can then dynamically adjust flow reordering level
+ between this initial value and tcp_max_reordering
+
+ Default: 3
+
+tcp_max_reordering - INTEGER
+ Maximal reordering level of packets in a TCP stream.
+ 300 is a fairly conservative value, but you might increase it
+ if paths are using per packet load balancing (like bonding rr mode)
+
+ Default: 300
+
+tcp_retrans_collapse - BOOLEAN
+ Bug-to-bug compatibility with some broken printers.
+ On retransmit try to send bigger packets to work around bugs in
+ certain TCP stacks.
+
+tcp_retries1 - INTEGER
+ This value influences the time, after which TCP decides, that
+ something is wrong due to unacknowledged RTO retransmissions,
+ and reports this suspicion to the network layer.
+ See tcp_retries2 for more details.
+
+ RFC 1122 recommends at least 3 retransmissions, which is the
+ default.
+
+tcp_retries2 - INTEGER
+ This value influences the timeout of an alive TCP connection,
+ when RTO retransmissions remain unacknowledged.
+ Given a value of N, a hypothetical TCP connection following
+ exponential backoff with an initial RTO of TCP_RTO_MIN would
+ retransmit N times before killing the connection at the (N+1)th RTO.
+
+ The default value of 15 yields a hypothetical timeout of 924.6
+ seconds and is a lower bound for the effective timeout.
+ TCP will effectively time out at the first RTO which exceeds the
+ hypothetical timeout.
+
+ RFC 1122 recommends at least 100 seconds for the timeout,
+ which corresponds to a value of at least 8.
+
+tcp_rfc1337 - BOOLEAN
+ If set, the TCP stack behaves conforming to RFC1337. If unset,
+ we are not conforming to RFC, but prevent TCP TIME_WAIT
+ assassination.
+
+ Default: 0
+
+tcp_rmem - vector of 3 INTEGERs: min, default, max
+ min: Minimal size of receive buffer used by TCP sockets.
+ It is guaranteed to each TCP socket, even under moderate memory
+ pressure.
+
+ Default: 4K
+
+ default: initial size of receive buffer used by TCP sockets.
+ This value overrides net.core.rmem_default used by other protocols.
+ Default: 87380 bytes. This value results in window of 65535 with
+ default setting of tcp_adv_win_scale and tcp_app_win:0 and a bit
+ less for default tcp_app_win. See below about these variables.
+
+ max: maximal size of receive buffer allowed for automatically
+ selected receiver buffers for TCP socket. This value does not override
+ net.core.rmem_max. Calling setsockopt() with SO_RCVBUF disables
+ automatic tuning of that socket's receive buffer size, in which
+ case this value is ignored.
+ Default: between 87380B and 6MB, depending on RAM size.
+
+tcp_sack - BOOLEAN
+ Enable select acknowledgments (SACKS).
+
+tcp_comp_sack_delay_ns - LONG INTEGER
+ TCP tries to reduce number of SACK sent, using a timer
+ based on 5% of SRTT, capped by this sysctl, in nano seconds.
+ The default is 1ms, based on TSO autosizing period.
+
+ Default : 1,000,000 ns (1 ms)
+
+tcp_comp_sack_nr - INTEGER
+ Max number of SACK that can be compressed.
+ Using 0 disables SACK compression.
+
+ Default : 44
+
+tcp_slow_start_after_idle - BOOLEAN
+ If set, provide RFC2861 behavior and time out the congestion
+ window after an idle period. An idle period is defined at
+ the current RTO. If unset, the congestion window will not
+ be timed out after an idle period.
+
+ Default: 1
+
+tcp_stdurg - BOOLEAN
+ Use the Host requirements interpretation of the TCP urgent pointer field.
+ Most hosts use the older BSD interpretation, so if you turn this on
+ Linux might not communicate correctly with them.
+
+ Default: FALSE
+
+tcp_synack_retries - INTEGER
+ Number of times SYNACKs for a passive TCP connection attempt will
+ be retransmitted. Should not be higher than 255. Default value
+ is 5, which corresponds to 31seconds till the last retransmission
+ with the current initial RTO of 1second. With this the final timeout
+ for a passive TCP connection will happen after 63seconds.
+
+tcp_syncookies - INTEGER
+ Only valid when the kernel was compiled with CONFIG_SYN_COOKIES
+ Send out syncookies when the syn backlog queue of a socket
+ overflows. This is to prevent against the common 'SYN flood attack'
+ Default: 1
+
+ Note, that syncookies is fallback facility.
+ It MUST NOT be used to help highly loaded servers to stand
+ against legal connection rate. If you see SYN flood warnings
+ in your logs, but investigation shows that they occur
+ because of overload with legal connections, you should tune
+ another parameters until this warning disappear.
+ See: tcp_max_syn_backlog, tcp_synack_retries, tcp_abort_on_overflow.
+
+ syncookies seriously violate TCP protocol, do not allow
+ to use TCP extensions, can result in serious degradation
+ of some services (f.e. SMTP relaying), visible not by you,
+ but your clients and relays, contacting you. While you see
+ SYN flood warnings in logs not being really flooded, your server
+ is seriously misconfigured.
+
+ If you want to test which effects syncookies have to your
+ network connections you can set this knob to 2 to enable
+ unconditionally generation of syncookies.
+
+tcp_fastopen - INTEGER
+ Enable TCP Fast Open (RFC7413) to send and accept data in the opening
+ SYN packet.
+
+ The client support is enabled by flag 0x1 (on by default). The client
+ then must use sendmsg() or sendto() with the MSG_FASTOPEN flag,
+ rather than connect() to send data in SYN.
+
+ The server support is enabled by flag 0x2 (off by default). Then
+ either enable for all listeners with another flag (0x400) or
+ enable individual listeners via TCP_FASTOPEN socket option with
+ the option value being the length of the syn-data backlog.
+
+ The values (bitmap) are
+
+ ===== ======== ======================================================
+ 0x1 (client) enables sending data in the opening SYN on the client.
+ 0x2 (server) enables the server support, i.e., allowing data in
+ a SYN packet to be accepted and passed to the
+ application before 3-way handshake finishes.
+ 0x4 (client) send data in the opening SYN regardless of cookie
+ availability and without a cookie option.
+ 0x200 (server) accept data-in-SYN w/o any cookie option present.
+ 0x400 (server) enable all listeners to support Fast Open by
+ default without explicit TCP_FASTOPEN socket option.
+ ===== ======== ======================================================
+
+ Default: 0x1
+
+ Note that that additional client or server features are only
+ effective if the basic support (0x1 and 0x2) are enabled respectively.
+
+tcp_fastopen_blackhole_timeout_sec - INTEGER
+ Initial time period in second to disable Fastopen on active TCP sockets
+ when a TFO firewall blackhole issue happens.
+ This time period will grow exponentially when more blackhole issues
+ get detected right after Fastopen is re-enabled and will reset to
+ initial value when the blackhole issue goes away.
+ 0 to disable the blackhole detection.
+
+ By default, it is set to 1hr.
+
+tcp_fastopen_key - list of comma separated 32-digit hexadecimal INTEGERs
+ The list consists of a primary key and an optional backup key. The
+ primary key is used for both creating and validating cookies, while the
+ optional backup key is only used for validating cookies. The purpose of
+ the backup key is to maximize TFO validation when keys are rotated.
+
+ A randomly chosen primary key may be configured by the kernel if
+ the tcp_fastopen sysctl is set to 0x400 (see above), or if the
+ TCP_FASTOPEN setsockopt() optname is set and a key has not been
+ previously configured via sysctl. If keys are configured via
+ setsockopt() by using the TCP_FASTOPEN_KEY optname, then those
+ per-socket keys will be used instead of any keys that are specified via
+ sysctl.
+
+ A key is specified as 4 8-digit hexadecimal integers which are separated
+ by a '-' as: xxxxxxxx-xxxxxxxx-xxxxxxxx-xxxxxxxx. Leading zeros may be
+ omitted. A primary and a backup key may be specified by separating them
+ by a comma. If only one key is specified, it becomes the primary key and
+ any previously configured backup keys are removed.
+
+tcp_syn_retries - INTEGER
+ Number of times initial SYNs for an active TCP connection attempt
+ will be retransmitted. Should not be higher than 127. Default value
+ is 6, which corresponds to 63seconds till the last retransmission
+ with the current initial RTO of 1second. With this the final timeout
+ for an active TCP connection attempt will happen after 127seconds.
+
+tcp_timestamps - INTEGER
+ Enable timestamps as defined in RFC1323.
+
+ - 0: Disabled.
+ - 1: Enable timestamps as defined in RFC1323 and use random offset for
+ each connection rather than only using the current time.
+ - 2: Like 1, but without random offsets.
+
+ Default: 1
+
+tcp_min_tso_segs - INTEGER
+ Minimal number of segments per TSO frame.
+
+ Since linux-3.12, TCP does an automatic sizing of TSO frames,
+ depending on flow rate, instead of filling 64Kbytes packets.
+ For specific usages, it's possible to force TCP to build big
+ TSO frames. Note that TCP stack might split too big TSO packets
+ if available window is too small.
+
+ Default: 2
+
+tcp_pacing_ss_ratio - INTEGER
+ sk->sk_pacing_rate is set by TCP stack using a ratio applied
+ to current rate. (current_rate = cwnd * mss / srtt)
+ If TCP is in slow start, tcp_pacing_ss_ratio is applied
+ to let TCP probe for bigger speeds, assuming cwnd can be
+ doubled every other RTT.
+
+ Default: 200
+
+tcp_pacing_ca_ratio - INTEGER
+ sk->sk_pacing_rate is set by TCP stack using a ratio applied
+ to current rate. (current_rate = cwnd * mss / srtt)
+ If TCP is in congestion avoidance phase, tcp_pacing_ca_ratio
+ is applied to conservatively probe for bigger throughput.
+
+ Default: 120
+
+tcp_tso_win_divisor - INTEGER
+ This allows control over what percentage of the congestion window
+ can be consumed by a single TSO frame.
+ The setting of this parameter is a choice between burstiness and
+ building larger TSO frames.
+
+ Default: 3
+
+tcp_tw_reuse - INTEGER
+ Enable reuse of TIME-WAIT sockets for new connections when it is
+ safe from protocol viewpoint.
+
+ - 0 - disable
+ - 1 - global enable
+ - 2 - enable for loopback traffic only
+
+ It should not be changed without advice/request of technical
+ experts.
+
+ Default: 2
+
+tcp_window_scaling - BOOLEAN
+ Enable window scaling as defined in RFC1323.
+
+tcp_wmem - vector of 3 INTEGERs: min, default, max
+ min: Amount of memory reserved for send buffers for TCP sockets.
+ Each TCP socket has rights to use it due to fact of its birth.
+
+ Default: 4K
+
+ default: initial size of send buffer used by TCP sockets. This
+ value overrides net.core.wmem_default used by other protocols.
+
+ It is usually lower than net.core.wmem_default.
+
+ Default: 16K
+
+ max: Maximal amount of memory allowed for automatically tuned
+ send buffers for TCP sockets. This value does not override
+ net.core.wmem_max. Calling setsockopt() with SO_SNDBUF disables
+ automatic tuning of that socket's send buffer size, in which case
+ this value is ignored.
+
+ Default: between 64K and 4MB, depending on RAM size.
+
+tcp_notsent_lowat - UNSIGNED INTEGER
+ A TCP socket can control the amount of unsent bytes in its write queue,
+ thanks to TCP_NOTSENT_LOWAT socket option. poll()/select()/epoll()
+ reports POLLOUT events if the amount of unsent bytes is below a per
+ socket value, and if the write queue is not full. sendmsg() will
+ also not add new buffers if the limit is hit.
+
+ This global variable controls the amount of unsent data for
+ sockets not using TCP_NOTSENT_LOWAT. For these sockets, a change
+ to the global variable has immediate effect.
+
+ Default: UINT_MAX (0xFFFFFFFF)
+
+tcp_workaround_signed_windows - BOOLEAN
+ If set, assume no receipt of a window scaling option means the
+ remote TCP is broken and treats the window as a signed quantity.
+ If unset, assume the remote TCP is not broken even if we do
+ not receive a window scaling option from them.
+
+ Default: 0
+
+tcp_thin_linear_timeouts - BOOLEAN
+ Enable dynamic triggering of linear timeouts for thin streams.
+ If set, a check is performed upon retransmission by timeout to
+ determine if the stream is thin (less than 4 packets in flight).
+ As long as the stream is found to be thin, up to 6 linear
+ timeouts may be performed before exponential backoff mode is
+ initiated. This improves retransmission latency for
+ non-aggressive thin streams, often found to be time-dependent.
+ For more information on thin streams, see
+ Documentation/networking/tcp-thin.txt
+
+ Default: 0
+
+tcp_limit_output_bytes - INTEGER
+ Controls TCP Small Queue limit per tcp socket.
+ TCP bulk sender tends to increase packets in flight until it
+ gets losses notifications. With SNDBUF autotuning, this can
+ result in a large amount of packets queued on the local machine
+ (e.g.: qdiscs, CPU backlog, or device) hurting latency of other
+ flows, for typical pfifo_fast qdiscs. tcp_limit_output_bytes
+ limits the number of bytes on qdisc or device to reduce artificial
+ RTT/cwnd and reduce bufferbloat.
+
+ Default: 1048576 (16 * 65536)
+
+tcp_challenge_ack_limit - INTEGER
+ Limits number of Challenge ACK sent per second, as recommended
+ in RFC 5961 (Improving TCP's Robustness to Blind In-Window Attacks)
+ Default: 1000
+
+tcp_rx_skb_cache - BOOLEAN
+ Controls a per TCP socket cache of one skb, that might help
+ performance of some workloads. This might be dangerous
+ on systems with a lot of TCP sockets, since it increases
+ memory usage.
+
+ Default: 0 (disabled)
+
+UDP variables
+=============
+
+udp_l3mdev_accept - BOOLEAN
+ Enabling this option allows a "global" bound socket to work
+ across L3 master domains (e.g., VRFs) with packets capable of
+ being received regardless of the L3 domain in which they
+ originated. Only valid when the kernel was compiled with
+ CONFIG_NET_L3_MASTER_DEV.
+
+ Default: 0 (disabled)
+
+udp_mem - vector of 3 INTEGERs: min, pressure, max
+ Number of pages allowed for queueing by all UDP sockets.
+
+ min: Below this number of pages UDP is not bothered about its
+ memory appetite. When amount of memory allocated by UDP exceeds
+ this number, UDP starts to moderate memory usage.
+
+ pressure: This value was introduced to follow format of tcp_mem.
+
+ max: Number of pages allowed for queueing by all UDP sockets.
+
+ Default is calculated at boot time from amount of available memory.
+
+udp_rmem_min - INTEGER
+ Minimal size of receive buffer used by UDP sockets in moderation.
+ Each UDP socket is able to use the size for receiving data, even if
+ total pages of UDP sockets exceed udp_mem pressure. The unit is byte.
+
+ Default: 4K
+
+udp_wmem_min - INTEGER
+ Minimal size of send buffer used by UDP sockets in moderation.
+ Each UDP socket is able to use the size for sending data, even if
+ total pages of UDP sockets exceed udp_mem pressure. The unit is byte.
+
+ Default: 4K
+
+RAW variables
+=============
+
+raw_l3mdev_accept - BOOLEAN
+ Enabling this option allows a "global" bound socket to work
+ across L3 master domains (e.g., VRFs) with packets capable of
+ being received regardless of the L3 domain in which they
+ originated. Only valid when the kernel was compiled with
+ CONFIG_NET_L3_MASTER_DEV.
+</