From dca145ffaa8d39ea1904491ac81b92b7049372c0 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 27 Oct 2014 21:45:24 -0700 Subject: tcp: allow for bigger reordering level While testing upcoming Yaogong patch (converting out of order queue into an RB tree), I hit the max reordering level of linux TCP stack. Reordering level was limited to 127 for no good reason, and some network setups [1] can easily reach this limit and get limited throughput. Allow a new max limit of 300, and add a sysctl to allow admins to even allow bigger (or lower) values if needed. [1] Aggregation of links, per packet load balancing, fabrics not doing deep packet inspections, alternative TCP congestion modules... Signed-off-by: Eric Dumazet Cc: Yaogong Wang Signed-off-by: David S. Miller --- Documentation/networking/bonding.txt | 7 ++----- Documentation/networking/ip-sysctl.txt | 10 +++++++++- 2 files changed, 11 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index eeb5b2e97bed..83bf4986baea 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -2230,11 +2230,8 @@ balance-rr: This mode is the only mode that will permit a single It is possible to adjust TCP/IP's congestion limits by altering the net.ipv4.tcp_reordering sysctl parameter. The - usual default value is 3, and the maximum useful value is 127. - For a four interface balance-rr bond, expect that a single - TCP/IP stream will utilize no more than approximately 2.3 - interface's worth of throughput, even after adjusting - tcp_reordering. + usual default value is 3. But keep in mind TCP stack is able + to automatically increase this when it detects reorders. Note that the fraction of packets that will be delivered out of order is highly variable, and is unlikely to be zero. The level diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 0307e2875f21..9028b879a97b 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -376,9 +376,17 @@ tcp_orphan_retries - INTEGER may consume significant resources. Cf. tcp_max_orphans. tcp_reordering - INTEGER - Maximal reordering of packets in a TCP stream. + Initial reordering level of packets in a TCP stream. + TCP stack can then dynamically adjust flow reordering level + between this initial value and tcp_max_reordering Default: 3 +tcp_max_reordering - INTEGER + Maximal reordering level of packets in a TCP stream. + 300 is a fairly conservative value, but you might increase it + if paths are using per packet load balancing (like bonding rr mode) + Default: 300 + tcp_retrans_collapse - BOOLEAN Bug-to-bug compatibility with some broken printers. On retransmit try to send bigger packets to work around bugs in -- cgit v1.2.3 From 7fd2561e4ebdd070ebba6d3326c4c5b13942323f Mon Sep 17 00:00:00 2001 From: Erik Kline Date: Tue, 28 Oct 2014 18:11:14 +0900 Subject: net: ipv6: Add a sysctl to make optimistic addresses useful candidates Add a sysctl that causes an interface's optimistic addresses to be considered equivalent to other non-deprecated addresses for source address selection purposes. Preferred addresses will still take precedence over optimistic addresses, subject to other ranking in the source address selection algorithm. This is useful where different interfaces are connected to different networks from different ISPs (e.g., a cell network and a home wifi network). The current behaviour complies with RFC 3484/6724, and it makes sense if the host has only one interface, or has multiple interfaces on the same network (same or cooperating administrative domain(s), but not in the multiple distinct networks case. For example, if a mobile device has an IPv6 address on an LTE network and then connects to IPv6-enabled wifi, while the wifi IPv6 address is undergoing DAD, IPv6 connections will try use the wifi default route with the LTE IPv6 address, and will get stuck until they time out. Also, because optimistic nodes can receive frames, issue an RTM_NEWADDR as soon as DAD starts (with the IFA_F_OPTIMSTIC flag appropriately set). A second RTM_NEWADDR is sent if DAD completes (the address flags have changed), otherwise an RTM_DELADDR is sent. Also: add an entry in ip-sysctl.txt for optimistic_dad. Signed-off-by: Erik Kline Acked-by: Lorenzo Colitti Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'Documentation') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 9028b879a97b..368e3251c553 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -1460,6 +1460,19 @@ suppress_frag_ndisc - INTEGER 1 - (default) discard fragmented neighbor discovery packets 0 - allow fragmented neighbor discovery packets +optimistic_dad - BOOLEAN + Whether to perform Optimistic Duplicate Address Detection (RFC 4429). + 0: disabled (default) + 1: enabled + +use_optimistic - BOOLEAN + If enabled, do not classify optimistic addresses as deprecated during + source address selection. Preferred addresses will still be chosen + before optimistic addresses, subject to other ranking in the source + address selection algorithm. + 0: disabled (default) + 1: enabled + icmp/*: ratelimit - INTEGER Limit the maximal rates for sending ICMPv6 packets. -- cgit v1.2.3 From 06745729c48e3677a64db63481184cc7aef1ea69 Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Wed, 29 Oct 2014 10:45:02 -0700 Subject: dsa: Add new optional devicetree property to describe EEPROM size The dsa core now supports reading from and writing to a switch EEPROM if connected. Describe optional devicetree property indicating that an EEPROM is present and its size. Signed-off-by: Guenter Roeck Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/dsa/dsa.txt | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt index a62c889aafca..e124847443f8 100644 --- a/Documentation/devicetree/bindings/net/dsa/dsa.txt +++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt @@ -10,7 +10,7 @@ Required properties: - dsa,ethernet : Should be a phandle to a valid Ethernet device node - dsa,mii-bus : Should be a phandle to a valid MDIO bus device node -Optionnal properties: +Optional properties: - interrupts : property with a value describing the switch interrupt number (not supported by the driver) @@ -23,6 +23,13 @@ Each of these switch child nodes should have the following required properties: - #address-cells : Must be 1 - #size-cells : Must be 0 +A switch child node has the following optional property: + +- eeprom-length : Set to the length of an EEPROM connected to the + switch. Must be set if the switch can not detect + the presence and/or size of a connected EEPROM, + otherwise optional. + A switch may have multiple "port" children nodes Each port children node must have the following mandatory properties: -- cgit v1.2.3 From 9227dc5e579b6b2ef58ad0d3d0d23ddac77846ef Mon Sep 17 00:00:00 2001 From: "Lendacky, Thomas" Date: Tue, 4 Nov 2014 16:06:56 -0600 Subject: amd-xgbe: Add support for per DMA channel interrupts This patch provides support for interrupts that are generated by the Tx/Rx DMA channel pairs of the device. This allows for Tx and Rx processing to run across multiple processsors. Signed-off-by: Tom Lendacky Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/amd-xgbe.txt | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/amd-xgbe.txt b/Documentation/devicetree/bindings/net/amd-xgbe.txt index 41354f730beb..26efd526d16c 100644 --- a/Documentation/devicetree/bindings/net/amd-xgbe.txt +++ b/Documentation/devicetree/bindings/net/amd-xgbe.txt @@ -7,7 +7,10 @@ Required properties: - PCS registers - interrupt-parent: Should be the phandle for the interrupt controller that services interrupts for this device -- interrupts: Should contain the amd-xgbe interrupt +- interrupts: Should contain the amd-xgbe interrupt(s). The first interrupt + listed is required and is the general device interrupt. If the optional + amd,per-channel-interrupt property is specified, then one additional + interrupt for each DMA channel supported by the device should be specified - clocks: - DMA clock for the amd-xgbe device (used for calculating the correct Rx interrupt watchdog timer value on a DMA channel @@ -23,6 +26,9 @@ Optional properties: - mac-address: mac address to be assigned to the device. Can be overridden by UEFI. - dma-coherent: Present if dma operations are coherent +- amd,per-channel-interrupt: Indicates that Rx and Tx complete will generate + a unique interrupt for each DMA channel - this requires an additional + interrupt be configured for each DMA channel Example: xgbe@e0700000 { @@ -30,7 +36,9 @@ Example: reg = <0 0xe0700000 0 0x80000>, <0 0xe0780000 0 0x80000>; interrupt-parent = <&gic>; - interrupts = <0 325 4>; + interrupts = <0 325 4>, + <0 326 1>, <0 327 1>, <0 328 1>, <0 329 1>; + amd,per-channel-interrupt; clocks = <&xgbe_dma_clk>, <&xgbe_ptp_clk>; clock-names = "dma_clk", "ptp_clk"; phy-handle = <&phy>; -- cgit v1.2.3 From ba7a46f16dd29f93303daeb1fee8af316c5a07f4 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Tue, 11 Nov 2014 10:59:17 -0800 Subject: net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited Use the more common dynamic_debug capable net_dbg_ratelimited and remove the LIMIT_NETDEBUG macro. All messages are still ratelimited. Some KERN_ uses are changed to KERN_DEBUG. This may have some negative impact on messages that were emitted at KERN_INFO that are not not enabled at all unless DEBUG is defined or dynamic_debug is enabled. Even so, these messages are now _not_ emitted by default. This also eliminates the use of the net_msg_warn sysctl "/proc/sys/net/core/warnings". For backward compatibility, the sysctl is not removed, but it has no function. The extern declaration of net_msg_warn is removed from sock.h and made static in net/core/sysctl_net_core.c Miscellanea: o Update the sysctl documentation o Remove the embedded uses of pr_fmt o Coalesce format fragments o Realign arguments Signed-off-by: Joe Perches Signed-off-by: David S. Miller --- Documentation/sysctl/net.txt | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt index 04892b821157..e26c607468a6 100644 --- a/Documentation/sysctl/net.txt +++ b/Documentation/sysctl/net.txt @@ -120,10 +120,14 @@ seconds. warnings -------- -This controls console messages from the networking stack that can occur because -of problems on the network like duplicate address or bad checksums. Normally, -this should be enabled, but if the problem persists the messages can be -disabled. +This sysctl is now unused. + +This was used to control console messages from the networking stack that +occur because of problems on the network like duplicate address or bad +checksums. + +These messages are now emitted at KERN_DEBUG and can generally be enabled +and controlled by the dynamic_debug facility. netdev_budget ------------- -- cgit v1.2.3 From 71783576b5345d63df048c0f18974037eea6e4f9 Mon Sep 17 00:00:00 2001 From: Hauke Mehrtens Date: Sat, 1 Nov 2014 16:54:56 +0100 Subject: bcma: get IRQ numbers from dt It is not possible to auto detect the irq numbers used by the cores on an arm SoC. If bcma was registered with device tree it will search for some device tree nodes with the irq number and add it to the core configuration. Signed-off-by: Hauke Mehrtens Signed-off-by: John W. Linville --- Documentation/devicetree/bindings/bus/bcma.txt | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/bus/bcma.txt b/Documentation/devicetree/bindings/bus/bcma.txt index 62a48348ac15..edd44d802139 100644 --- a/Documentation/devicetree/bindings/bus/bcma.txt +++ b/Documentation/devicetree/bindings/bus/bcma.txt @@ -8,6 +8,11 @@ Required properties: The cores on the AXI bus are automatically detected by bcma with the memory ranges they are using and they get registered afterwards. +Automatic detection of the IRQ number is not working on +BCM47xx/BCM53xx ARM SoCs. To assign IRQ numbers to the cores, provide +them manually through device tree. Use an interrupt-map to specify the +IRQ used by the devices on the bus. The first address is just an index, +because we do not have any special register. The top-level axi bus may contain children representing attached cores (devices). This is needed since some hardware details can't be auto @@ -22,6 +27,22 @@ Example: ranges = <0x00000000 0x18000000 0x00100000>; #address-cells = <1>; #size-cells = <1>; + #interrupt-cells = <1>; + interrupt-map-mask = <0x000fffff 0xffff>; + interrupt-map = + /* Ethernet Controller 0 */ + <0x00024000 0 &gic GIC_SPI 147 IRQ_TYPE_LEVEL_HIGH>, + + /* Ethernet Controller 1 */ + <0x00025000 0 &gic GIC_SPI 148 IRQ_TYPE_LEVEL_HIGH>; + + /* PCIe Controller 0 */ + <0x00012000 0 &gic GIC_SPI 126 IRQ_TYPE_LEVEL_HIGH>, + <0x00012000 1 &gic GIC_SPI 127 IRQ_TYPE_LEVEL_HIGH>, + <0x00012000 2 &gic GIC_SPI 128 IRQ_TYPE_LEVEL_HIGH>, + <0x00012000 3 &gic GIC_SPI 129 IRQ_TYPE_LEVEL_HIGH>, + <0x00012000 4 &gic GIC_SPI 130 IRQ_TYPE_LEVEL_HIGH>, + <0x00012000 5 &gic GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>; chipcommon { reg = <0x00000000 0x1000>; -- cgit v1.2.3 From c88c7d3204e66faf1f24402d551ac1d0b400e3ff Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Tue, 11 Nov 2014 20:00:07 +0100 Subject: dt/bindings: fix documentation of ethernet-phy compatible property A recent commit extended the documentation of the ethernet-phy compatible property, but placed the new paragraph under the max-speed property. Fixes: f00e756ed12d ("dt: Document a compatible entry for MDIO ethernet Phys") Cc: devicetree@vger.kernel.org Signed-off-by: Johan Hovold Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/phy.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/phy.txt b/Documentation/devicetree/bindings/net/phy.txt index 5b8c58903077..40831fbaff72 100644 --- a/Documentation/devicetree/bindings/net/phy.txt +++ b/Documentation/devicetree/bindings/net/phy.txt @@ -19,7 +19,6 @@ Optional Properties: specifications. If neither of these are specified, the default is to assume clause 22. The compatible list may also contain other elements. -- max-speed: Maximum PHY supported speed (10, 100, 1000...) If the phy's identifier is known then the list may contain an entry of the form: "ethernet-phy-idAAAA.BBBB" where @@ -29,6 +28,8 @@ Optional Properties: 4 hex digits. This is the chip vendor OUI bits 19:24, followed by 10 bits of a vendor specific ID. +- max-speed: Maximum PHY supported speed (10, 100, 1000...) + Example: ethernet-phy@0 { -- cgit v1.2.3 From 7b52314cc44569f56aa07abdbe43e6ccfcef9478 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Tue, 11 Nov 2014 20:00:15 +0100 Subject: net: phy: micrel: enable led-mode for KSZ8081/KSZ8091 Enable led-mode configuration for KSZ8081 and KSZ8091. Signed-off-by: Johan Hovold Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/micrel.txt | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/micrel.txt b/Documentation/devicetree/bindings/net/micrel.txt index e1d99b95c4ec..30062fae5623 100644 --- a/Documentation/devicetree/bindings/net/micrel.txt +++ b/Documentation/devicetree/bindings/net/micrel.txt @@ -14,6 +14,8 @@ Optional properties: KSZ8021: register 0x1f, bits 5..4 KSZ8031: register 0x1f, bits 5..4 KSZ8051: register 0x1f, bits 5..4 + KSZ8081: register 0x1f, bits 5..4 + KSZ8091: register 0x1f, bits 5..4 See the respective PHY datasheet for the mode values. -- cgit v1.2.3 From 9488e1e5b319751c71eebfd49027bf9e2377f38c Mon Sep 17 00:00:00 2001 From: Hisashi Nakamura Date: Thu, 13 Nov 2014 15:59:07 +0900 Subject: net: sh_eth: Add r8a7793 support The device tree probing for R-Car M2N (r8a7793) is added. Signed-off-by: Hisashi Nakamura Signed-off-by: Yoshihiro Kaneko Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/sh_eth.txt | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/sh_eth.txt b/Documentation/devicetree/bindings/net/sh_eth.txt index 34d4db1a4e25..2f6ec85fda8e 100644 --- a/Documentation/devicetree/bindings/net/sh_eth.txt +++ b/Documentation/devicetree/bindings/net/sh_eth.txt @@ -9,6 +9,7 @@ Required properties: "renesas,ether-r8a7779" if the device is a part of R8A7779 SoC. "renesas,ether-r8a7790" if the device is a part of R8A7790 SoC. "renesas,ether-r8a7791" if the device is a part of R8A7791 SoC. + "renesas,ether-r8a7793" if the device is a part of R8A7793 SoC. "renesas,ether-r8a7794" if the device is a part of R8A7794 SoC. "renesas,ether-r7s72100" if the device is a part of R7S72100 SoC. - reg: offset and length of (1) the E-DMAC/feLic register block (required), -- cgit v1.2.3 From 960fb622f85180f36d3aff82af53e2be3db2f888 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sun, 16 Nov 2014 06:23:05 -0800 Subject: net: provide a per host RSS key generic infrastructure RSS (Receive Side Scaling) typically uses Toeplitz hash and a 40 or 52 bytes RSS key. Some drivers use a constant (and well known key), some drivers use a random key per port, making bonding setups hard to tune. Well known keys increase attack surface, considering that number of queues is usually a power of two. This patch provides infrastructure to help drivers doing the right thing. netdev_rss_key_fill() should be used by drivers to initialize their RSS key, even if they provide ethtool -X support to let user redefine the key later. A new /proc/sys/net/core/netdev_rss_key file can be used to get the host RSS key even for drivers not providing ethtool -x support, in case some applications want to precisely setup flows to match some RX queues. Tested: myhost:~# cat /proc/sys/net/core/netdev_rss_key 11:63:99:bb:79:fb:a5:a7:07:45:b2:20:bf:02:42:2d:08:1a:dd:19:2b:6b:23:ac:56:28:9d:70:c3:ac:e8:16:4b:b7:c1:10:53:a4:78:41:36:40:74:b6:15:ca:27:44:aa:b3:4d:72 myhost:~# ethtool -x eth0 RX flow hash indirection table for eth0 with 8 RX ring(s): 0: 0 1 2 3 4 5 6 7 RSS hash key: 11:63:99:bb:79:fb:a5:a7:07:45:b2:20:bf:02:42:2d:08:1a:dd:19:2b:6b:23:ac:56:28:9d:70:c3:ac:e8:16:4b:b7:c1:10:53:a4:78:41 Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- Documentation/sysctl/net.txt | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) (limited to 'Documentation') diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt index e26c607468a6..666594b43cff 100644 --- a/Documentation/sysctl/net.txt +++ b/Documentation/sysctl/net.txt @@ -142,6 +142,28 @@ netdev_max_backlog Maximum number of packets, queued on the INPUT side, when the interface receives packets faster than kernel can process them. +netdev_rss_key +-------------- + +RSS (Receive Side Scaling) enabled drivers use a 40 bytes host key that is +randomly generated. +Some user space might need to gather its content even if drivers do not +provide ethtool -x support yet. + +myhost:~# cat /proc/sys/net/core/netdev_rss_key +84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8: ... (52 bytes total) + +File contains nul bytes if no driver ever called netdev_rss_key_fill() function. +Note: +/proc/sys/net/core/netdev_rss_key contains 52 bytes of key, +but most drivers only use 40 bytes of it. + +myhost:~# ethtool -x eth0 +RX flow hash indirection table for eth0 with 8 RX ring(s): + 0: 0 1 2 3 4 5 6 7 +RSS hash key: +84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89 + netdev_tstamp_prequeue ---------------------- -- cgit v1.2.3 From 3ff9027ca6b00e194d2eae353febf7233cfcc1ea Mon Sep 17 00:00:00 2001 From: Roger Quadros Date: Fri, 14 Nov 2014 17:37:39 +0200 Subject: can: c_can: Add syscon/regmap RAMINIT mechanism Some TI SoCs like DRA7 have a RAMINIT register specification different from the other AMxx SoCs and as expected by the existing driver. To add more insanity, this register is shared with other IPs like DSS, PCIe and PWM. Provides a more generic mechanism to specify the RAMINIT register location and START/DONE bit position and use the syscon/regmap framework to access the register. Signed-off-by: Roger Quadros Signed-off-by: Marc Kleine-Budde --- Documentation/devicetree/bindings/net/can/c_can.txt | 3 +++ 1 file changed, 3 insertions(+) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/can/c_can.txt b/Documentation/devicetree/bindings/net/can/c_can.txt index 8f1ae81228e3..a3ca3ee53546 100644 --- a/Documentation/devicetree/bindings/net/can/c_can.txt +++ b/Documentation/devicetree/bindings/net/can/c_can.txt @@ -12,6 +12,9 @@ Required properties: Optional properties: - ti,hwmods : Must be "d_can" or "c_can", n being the instance number +- syscon-raminit : Handle to system control region that contains the + RAMINIT register, register offset to the RAMINIT + register and the CAN instance number (0 offset). Note: "ti,hwmods" field is used to fetch the base address and irq resources from TI, omap hwmod data base during device registration. -- cgit v1.2.3 From 0f4da3a8da5fe3a079b1adf613d121a0fafd63f1 Mon Sep 17 00:00:00 2001 From: Roger Quadros Date: Fri, 7 Nov 2014 16:49:21 +0200 Subject: can: c_can: Add support for TI DRA7 DCAN DRA7 SoC has 2 CAN IPs. Provide compatible IDs and RAMINIT register data for both. Signed-off-by: Roger Quadros Signed-off-by: Marc Kleine-Budde --- Documentation/devicetree/bindings/net/can/c_can.txt | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/can/c_can.txt b/Documentation/devicetree/bindings/net/can/c_can.txt index a3ca3ee53546..f682fdb6d921 100644 --- a/Documentation/devicetree/bindings/net/can/c_can.txt +++ b/Documentation/devicetree/bindings/net/can/c_can.txt @@ -4,6 +4,7 @@ Bosch C_CAN/D_CAN controller Device Tree Bindings Required properties: - compatible : Should be "bosch,c_can" for C_CAN controllers and "bosch,d_can" for D_CAN controllers. + Can be "ti,dra7-d_can". - reg : physical base address and size of the C_CAN/D_CAN registers map - interrupts : property with a value describing the interrupt -- cgit v1.2.3 From c71d0b31bf71b84ad566292ea2412755da7934b2 Mon Sep 17 00:00:00 2001 From: Roger Quadros Date: Fri, 7 Nov 2014 16:49:22 +0200 Subject: can: c_can: Add support for TI am3352 DCAN AM3352 SoC has 2 DCAN modules. Add compatible id and raminit driver data for am3352 DCAN. Signed-off-by: Roger Quadros Acked-by: Wolfram Sang Signed-off-by: Marc Kleine-Budde --- Documentation/devicetree/bindings/net/can/c_can.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/can/c_can.txt b/Documentation/devicetree/bindings/net/can/c_can.txt index f682fdb6d921..6731730eec0d 100644 --- a/Documentation/devicetree/bindings/net/can/c_can.txt +++ b/Documentation/devicetree/bindings/net/can/c_can.txt @@ -4,7 +4,7 @@ Bosch C_CAN/D_CAN controller Device Tree Bindings Required properties: - compatible : Should be "bosch,c_can" for C_CAN controllers and "bosch,d_can" for D_CAN controllers. - Can be "ti,dra7-d_can". + Can be "ti,dra7-d_can" or "ti,am3352-d_can". - reg : physical base address and size of the C_CAN/D_CAN registers map - interrupts : property with a value describing the interrupt -- cgit v1.2.3 From f2bf2589834faec7af8c02c3949c90788d21b790 Mon Sep 17 00:00:00 2001 From: Roger Quadros Date: Mon, 17 Nov 2014 14:09:03 +0200 Subject: net: can: c_can: Add support for TI am4372 DCAN AM4372 SoC has 2 DCAN modules. Add compatible id and raminit driver data for it. The driver data is same as AM3352 but this gives us flexibility to add AM4372 specific quirks if required later. Signed-off-by: Roger Quadros Signed-off-by: Marc Kleine-Budde --- Documentation/devicetree/bindings/net/can/c_can.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/can/c_can.txt b/Documentation/devicetree/bindings/net/can/c_can.txt index 6731730eec0d..5a1d8b0c39e9 100644 --- a/Documentation/devicetree/bindings/net/can/c_can.txt +++ b/Documentation/devicetree/bindings/net/can/c_can.txt @@ -4,7 +4,8 @@ Bosch C_CAN/D_CAN controller Device Tree Bindings Required properties: - compatible : Should be "bosch,c_can" for C_CAN controllers and "bosch,d_can" for D_CAN controllers. - Can be "ti,dra7-d_can" or "ti,am3352-d_can". + Can be "ti,dra7-d_can", "ti,am3352-d_can" or + "ti,am4372-d_can". - reg : physical base address and size of the C_CAN/D_CAN registers map - interrupts : property with a value describing the interrupt -- cgit v1.2.3 From 098ea6bc4cb5890d09b1b79154fa11182e0f71e0 Mon Sep 17 00:00:00 2001 From: Amitkumar Karwar Date: Wed, 19 Nov 2014 01:28:32 -0800 Subject: Bluetooth: btmrvl: add DT bindings documentation Calibration data can be downloaded through device tree method. This patch adds the documentation. Also, instead of searching device tree node by name using of_find_node_by_name() API, let's use for_each_compatible_node(). Signed-off-by: Amitkumar Karwar Signed-off-by: Cathy Luo Signed-off-by: Avinash Patil Signed-off-by: Marcel Holtmann --- Documentation/devicetree/bindings/btmrvl.txt | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) create mode 100644 Documentation/devicetree/bindings/btmrvl.txt (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/btmrvl.txt b/Documentation/devicetree/bindings/btmrvl.txt new file mode 100644 index 000000000000..cdd57680a749 --- /dev/null +++ b/Documentation/devicetree/bindings/btmrvl.txt @@ -0,0 +1,22 @@ +btmrvl +------ + +Required properties: + + - compatible : must be "btmrvl,cfgdata" + +Optional properties: + + - btmrvl,cal-data : Calibration data downloaded to the device during + initialization. This is an array of 28 values(u8). + +Example: + +btmrvl { + compatible = "btmrvl,cfgdata"; + + btmrvl,cal-data = /bits/ 8 < + 0x37 0x01 0x1c 0x00 0xff 0xff 0xff 0xff 0x01 0x7f 0x04 0x02 + 0x00 0x00 0xba 0xce 0xc0 0xc6 0x2d 0x00 0x00 0x00 0x00 0x00 + 0x00 0x00 0xf0 0x00>; +}; -- cgit v1.2.3 From 025a60a752261fd479a4bb74cf7cb04fdc313ec7 Mon Sep 17 00:00:00 2001 From: Amitkumar Karwar Date: Wed, 19 Nov 2014 01:28:33 -0800 Subject: Bluetooth: btmrvl: add DT-bindings for gpio-gap This can be used to have GPIO host wakeup method suitable for the platform and configurable GAP for host sleep handshake. Signed-off-by: Amitkumar Karwar Signed-off-by: Cathy Luo Signed-off-by: Avinash Patil Signed-off-by: Marcel Holtmann --- Documentation/devicetree/bindings/btmrvl.txt | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/btmrvl.txt b/Documentation/devicetree/bindings/btmrvl.txt index cdd57680a749..58f964bb0a52 100644 --- a/Documentation/devicetree/bindings/btmrvl.txt +++ b/Documentation/devicetree/bindings/btmrvl.txt @@ -10,8 +10,14 @@ Optional properties: - btmrvl,cal-data : Calibration data downloaded to the device during initialization. This is an array of 28 values(u8). + - btmrvl,gpio-gap : gpio and gap (in msecs) combination to be + configured. + Example: +GPIO pin 13 is configured as a wakeup source and GAP is set to 100 msecs +in below example. + btmrvl { compatible = "btmrvl,cfgdata"; @@ -19,4 +25,5 @@ btmrvl { 0x37 0x01 0x1c 0x00 0xff 0xff 0xff 0xff 0x01 0x7f 0x04 0x02 0x00 0x00 0xba 0xce 0xc0 0xc6 0x2d 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xf0 0x00>; + btmrvl,gpio-gap = <0x0d64>; }; -- cgit v1.2.3 From 233b36cf1f83ef4a7b3670ffa5eeac1ed9d41889 Mon Sep 17 00:00:00 2001 From: Giuseppe CAVALLARO Date: Tue, 18 Nov 2014 09:46:59 +0100 Subject: stmmac: update driver documentation Recently many changes have been done inside the driver so this patch updates the driver's doc for example reviewing information for the rx and tx processes that are managed by napi method, adding new information for missing glue-logic files etc. Signed-off-by: Giuseppe Cavallaro Signed-off-by: David S. Miller --- Documentation/networking/stmmac.txt | 132 ++++++++++++++++++------------------ 1 file changed, 65 insertions(+), 67 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt index 2090895b08d4..e655e2453c98 100644 --- a/Documentation/networking/stmmac.txt +++ b/Documentation/networking/stmmac.txt @@ -1,12 +1,12 @@ STMicroelectronics 10/100/1000 Synopsys Ethernet driver -Copyright (C) 2007-2013 STMicroelectronics Ltd +Copyright (C) 2007-2014 STMicroelectronics Ltd Author: Giuseppe Cavallaro This is the driver for the MAC 10/100/1000 on-chip Ethernet controllers (Synopsys IP blocks). -Currently this network device driver is for all STM embedded MAC/GMAC +Currently this network device driver is for all STi embedded MAC/GMAC (i.e. 7xxx/5xxx SoCs), SPEAr (arm), Loongson1B (mips) and XLINX XC2V3000 FF1152AMT0221 D1215994A VIRTEX FPGA board. @@ -22,6 +22,9 @@ The kernel configuration option is STMMAC_ETH: Device Drivers ---> Network device support ---> Ethernet (1000 Mbit) ---> STMicroelectronics 10/100/1000 Ethernet driver (STMMAC_ETH) +CONFIG_STMMAC_PLATFORM: is to enable the platform driver. +CONFIG_STMMAC_PCI: is to enable the pci driver. + 2) Driver parameters list: debug: message level (0: no output, 16: all); phyaddr: to manually provide the physical address to the PHY device; @@ -45,10 +48,11 @@ Driver parameters can be also passed in command line by using: The xmit method is invoked when the kernel needs to transmit a packet; it sets the descriptors in the ring and informs the DMA engine that there is a packet ready to be transmitted. -Once the controller has finished transmitting the packet, an interrupt is -triggered; So the driver will be able to release the socket buffers. By default, the driver sets the NETIF_F_SG bit in the features field of the -net_device structure enabling the scatter/gather feature. +net_device structure enabling the scatter-gather feature. This is true on +chips and configurations where the checksum can be done in hardware. +Once the controller has finished transmitting the packet, napi will be +scheduled to release the transmit resources. 4.2) Receive process When one or more packets are received, an interrupt happens. The interrupts @@ -58,20 +62,12 @@ This is based on NAPI so the interrupt handler signals only if there is work to be done, and it exits. Then the poll method will be scheduled at some future point. The incoming packets are stored, by the DMA, in a list of pre-allocated socket -buffers in order to avoid the memcpy (Zero-copy). +buffers in order to avoid the memcpy (zero-copy). 4.3) Interrupt Mitigation The driver is able to mitigate the number of its DMA interrupts using NAPI for the reception on chips older than the 3.50. New chips have an HW RX-Watchdog used for this mitigation. - -On Tx-side, the mitigation schema is based on a SW timer that calls the -tx function (stmmac_tx) to reclaim the resource after transmitting the -frames. -Also there is another parameter (like a threshold) used to program -the descriptors avoiding to set the interrupt on completion bit in -when the frame is sent (xmit). - Mitigation parameters can be tuned by ethtool. 4.4) WOL @@ -79,7 +75,7 @@ Wake up on Lan feature through Magic and Unicast frames are supported for the GMAC core. 4.5) DMA descriptors -Driver handles both normal and enhanced descriptors. The latter has been only +Driver handles both normal and alternate descriptors. The latter has been only tested on DWC Ether MAC 10/100/1000 Universal version 3.41a and later. STMMAC supports DMA descriptor to operate both in dual buffer (RING) @@ -91,9 +87,20 @@ In CHAINED mode each descriptor will have pointer to next descriptor in the list, hence creating the explicit chaining in the descriptor itself, whereas such explicit chaining is not possible in RING mode. +4.5.1) Extended descriptors + The extended descriptors give us information about the Ethernet payload + when it is carrying PTP packets or TCP/UDP/ICMP over IP. + These are not available on GMAC Synopsys chips older than the 3.50. + At probe time the driver will decide if these can be actually used. + This support also is mandatory for PTPv2 because the extra descriptors + are used for saving the hardware timestamps and Extended Status. + 4.6) Ethtool support -Ethtool is supported. Driver statistics and internal errors can be taken using: -ethtool -S ethX command. It is possible to dump registers etc. +Ethtool is supported. + +For example, driver statistics (including RMON), internal errors can be taken +using: + # ethtool -S ethX command 4.7) Jumbo and Segmentation Offloading Jumbo frames are supported and tested for the GMAC. @@ -101,12 +108,11 @@ The GSO has been also added but it's performed in software. LRO is not supported. 4.8) Physical -The driver is compatible with PAL to work with PHY and GPHY devices. +The driver is compatible with Physical Abstraction Layer to be connected with +PHY and GPHY devices. 4.9) Platform information -Several driver's information can be passed through the platform -These are included in the include/linux/stmmac.h header file -and detailed below as well: +Several information can be passed through the platform and device-tree. struct plat_stmmacenet_data { char *phy_bus_name; @@ -125,15 +131,18 @@ struct plat_stmmacenet_data { int force_sf_dma_mode; int force_thresh_dma_mode; int riwt_off; + int max_speed; + int maxmtu; void (*fix_mac_speed)(void *priv, unsigned int speed); void (*bus_setup)(void __iomem *ioaddr); void *(*setup)(struct platform_device *pdev); + void (*free)(struct platform_device *pdev, void *priv); int (*init)(struct platform_device *pdev, void *priv); void (*exit)(struct platform_device *pdev, void *priv); void *custom_cfg; void *custom_data; void *bsp_priv; - }; +}; Where: o phy_bus_name: phy bus name to attach to the stmmac. @@ -258,32 +267,43 @@ and the second one, with a real PHY device attached to the bus, by using the stmmac_mdio_bus_data structure (to provide the id, the reset procedure etc). -4.10) List of source files: - o Kconfig - o Makefile - o stmmac_main.c: main network device driver; - o stmmac_mdio.c: mdio functions; - o stmmac_pci: PCI driver; - o stmmac_platform.c: platform driver - o stmmac_ethtool.c: ethtool support; - o stmmac_timer.[ch]: timer code used for mitigating the driver dma interrupts - (only tested on ST40 platforms based); +Note that, starting from new chips, where it is available the HW capability +register, many configurations are discovered at run-time for example to +understand if EEE, HW csum, PTP, enhanced descriptor etc are actually +available. As strategy adopted in this driver, the information from the HW +capability register can replace what has been passed from the platform. + +4.10) Device-tree support. + +Please see the following document: + Documentation/devicetree/bindings/net/stmmac.txt + +and the stmmac_of_data structure inside the include/linux/stmmac.h header file. + +4.11) This is a summary of the content of some relevant files: + o stmmac_main.c: to implement the main network device driver; + o stmmac_mdio.c: to provide mdio functions; + o stmmac_pci: this the PCI driver; + o stmmac_platform.c: this the platform driver (OF supported) + o stmmac_ethtool.c: to implement the ethtool support; o stmmac.h: private driver structure; o common.h: common definitions and VFTs; o descs.h: descriptor structure definitions; - o dwmac1000_core.c: GMAC core functions; - o dwmac1000_dma.c: dma functions for the GMAC chip; - o dwmac1000.h: specific header file for the GMAC; - o dwmac100_core: MAC 100 core and dma code; - o dwmac100_dma.c: dma functions for the MAC chip; + o dwmac1000_core.c: dwmac GiGa core functions; + o dwmac1000_dma.c: dma functions for the GMAC chip; + o dwmac1000.h: specific header file for the dwmac GiGa; + o dwmac100_core: dwmac 100 core code; + o dwmac100_dma.c: dma functions for the dwmac 100 chip; o dwmac1000.h: specific header file for the MAC; - o dwmac_lib.c: generic DMA functions shared among chips; + o dwmac_lib.c: generic DMA functions; o enh_desc.c: functions for handling enhanced descriptors; o norm_desc.c: functions for handling normal descriptors; o chain_mode.c/ring_mode.c:: functions to manage RING/CHAINED modes; o mmc_core.c/mmc.h: Management MAC Counters; - o stmmac_hwtstamp.c: HW timestamp support for PTP - o stmmac_ptp.c: PTP 1588 clock + o stmmac_hwtstamp.c: HW timestamp support for PTP; + o stmmac_ptp.c: PTP 1588 clock; + o dwmac-.c: these are for the platform glue-logic file; e.g. dwmac-sti.c + for STMicroelectronics SoCs. 5) Debug Information @@ -298,23 +318,14 @@ to get statistics: e.g. using: ethtool -S ethX (that shows the Management counters (MMC) if supported) or sees the MAC/DMA registers: e.g. using: ethtool -d ethX -Compiling the Kernel with CONFIG_DEBUG_FS and enabling the -STMMAC_DEBUG_FS option the driver will export the following +Compiling the Kernel with CONFIG_DEBUG_FS the driver will export the following debugfs entries: /sys/kernel/debug/stmmaceth/descriptors_status To show the DMA TX/RX descriptor rings -Developer can also use the "debug" module parameter to get -further debug information. - -In the end, there are other macros (that cannot be enabled -via menuconfig) to turn-on the RX/TX DMA debugging, -specific MAC core debug printk etc. Others to enable the -debug in the TX and RX processes. -All these are only useful during the developing stage -and should never enabled inside the code for general usage. -In fact, these can generate an huge amount of debug messages. +Developer can also use the "debug" module parameter to get further debug +information (please see: NETIF Msg Level). 6) Energy Efficient Ethernet @@ -337,15 +348,7 @@ To enter in Tx LPI mode the driver needs to have a software timer that enable and disable the LPI mode when there is nothing to be transmitted. -7) Extended descriptors -The extended descriptors give us information about the receive Ethernet payload -when it is carrying PTP packets or TCP/UDP/ICMP over IP. -These are not available on GMAC Synopsys chips older than the 3.50. -At probe time the driver will decide if these can be actually used. -This support also is mandatory for PTPv2 because the extra descriptors 6 and 7 -are used for saving the hardware timestamps. - -8) Precision Time Protocol (PTP) +7) Precision Time Protocol (PTP) The driver supports the IEEE 1588-2002, Precision Time Protocol (PTP), which enables precise synchronization of clocks in measurement and control systems implemented with technologies such as network @@ -355,7 +358,7 @@ In addition to the basic timestamp features mentioned in IEEE 1588-2002 Timestamps, new GMAC cores support the advanced timestamp features. IEEE 1588-2008 that can be enabled when configure the Kernel. -9) SGMII/RGMII supports +8) SGMII/RGMII supports New GMAC devices provide own way to manage RGMII/SGMII. This information is available at run-time by looking at the HW capability register. This means that the stmmac can manage @@ -364,8 +367,3 @@ In fact, the HW provides a subset of extended registers to restart the ANE, verify Full/Half duplex mode and Speed. Also thanks to these registers it is possible to look at the Auto-negotiated Link Parter Ability. - -10) TODO: - o XGMAC is not supported. - o Complete the TBI & RTBI support. - o extend VLAN support for 3.70a SYNP GMAC. -- cgit v1.2.3 From 86dc1342bcbb1905b2ac9653a559b303f62bd728 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Wed, 19 Nov 2014 12:59:19 +0100 Subject: net: phy: micrel: add support for clock-mode select to KSZ8081/KSZ8091 Micrel KSZ8081 and KSZ8091 PHYs have the RMII Reference Clock Select bit, which is used to select 25 or 50 MHz clock mode. Note that on some revisions of the PHY (e.g. KSZ8081RND) the function of this bit is inverted so that setting it enables 25 rather than 50 MHz mode. Add a new device-tree property "micrel,rmii-reference-clock-select-25-mhz" to describe this. Signed-off-by: Johan Hovold Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/micrel.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/micrel.txt b/Documentation/devicetree/bindings/net/micrel.txt index 30062fae5623..a1bab5eaae02 100644 --- a/Documentation/devicetree/bindings/net/micrel.txt +++ b/Documentation/devicetree/bindings/net/micrel.txt @@ -22,5 +22,5 @@ Optional properties: - clocks, clock-names: contains clocks according to the common clock bindings. supported clocks: - - KSZ8021, KSZ8031: "rmii-ref": The RMII refence input clock. Used - to determine the XI input clock. + - KSZ8021, KSZ8031, KSZ8081, KSZ8091: "rmii-ref": The RMII + refence input clock. Used to determine the XI input clock. -- cgit v1.2.3 From 6d01329444725a5c17cf75ba6c5c0c5e42843613 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Wed, 19 Nov 2014 12:59:20 +0100 Subject: dt/bindings: reformat micrel eth-phy documentation Reduce indentation of Micrel PHY binding documentations somewhat. Also fix "reference input clock" typo while at it. Cc: devicetree@vger.kernel.org Signed-off-by: Johan Hovold Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/micrel.txt | 26 ++++++++++++------------ 1 file changed, 13 insertions(+), 13 deletions(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/micrel.txt b/Documentation/devicetree/bindings/net/micrel.txt index a1bab5eaae02..20a6cac7abc6 100644 --- a/Documentation/devicetree/bindings/net/micrel.txt +++ b/Documentation/devicetree/bindings/net/micrel.txt @@ -6,21 +6,21 @@ Optional properties: - micrel,led-mode : LED mode value to set for PHYs with configurable LEDs. - Configure the LED mode with single value. The list of PHYs and - the bits that are currently supported: + Configure the LED mode with single value. The list of PHYs and the + bits that are currently supported: - KSZ8001: register 0x1e, bits 15..14 - KSZ8041: register 0x1e, bits 15..14 - KSZ8021: register 0x1f, bits 5..4 - KSZ8031: register 0x1f, bits 5..4 - KSZ8051: register 0x1f, bits 5..4 - KSZ8081: register 0x1f, bits 5..4 - KSZ8091: register 0x1f, bits 5..4 + KSZ8001: register 0x1e, bits 15..14 + KSZ8041: register 0x1e, bits 15..14 + KSZ8021: register 0x1f, bits 5..4 + KSZ8031: register 0x1f, bits 5..4 + KSZ8051: register 0x1f, bits 5..4 + KSZ8081: register 0x1f, bits 5..4 + KSZ8091: register 0x1f, bits 5..4 - See the respective PHY datasheet for the mode values. + See the respective PHY datasheet for the mode values. - clocks, clock-names: contains clocks according to the common clock bindings. - supported clocks: - - KSZ8021, KSZ8031, KSZ8081, KSZ8091: "rmii-ref": The RMII - refence input clock. Used to determine the XI input clock. + supported clocks: + - KSZ8021, KSZ8031, KSZ8081, KSZ8091: "rmii-ref": The RMII reference + input clock. Used to determine the XI input clock. -- cgit v1.2.3 From c3a8e1eddd6c140fbaaf957f7da8f2175f79623d Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Wed, 19 Nov 2014 12:59:21 +0100 Subject: dt/bindings: add clock-select function property to micrel phy binding Add "micrel,rmii-reference-clock-select-25-mhz" to Micrel ethernet PHY binding documentation. This property is needed to properly describe some revisions of Micrel PHYs which has the function of this configuration bit inverted so that setting it enables 25 MHz rather than 50 MHz clock mode. Note that a clock reference ("rmii-ref") is still needed to actually select either mode. Cc: devicetree@vger.kernel.org Signed-off-by: Johan Hovold Acked-by: Sascha Hauer Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/micrel.txt | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/micrel.txt b/Documentation/devicetree/bindings/net/micrel.txt index 20a6cac7abc6..87496a8c64ab 100644 --- a/Documentation/devicetree/bindings/net/micrel.txt +++ b/Documentation/devicetree/bindings/net/micrel.txt @@ -19,6 +19,17 @@ Optional properties: See the respective PHY datasheet for the mode values. + - micrel,rmii-reference-clock-select-25-mhz: RMII Reference Clock Select + bit selects 25 MHz mode + + Setting the RMII Reference Clock Select bit enables 25 MHz rather + than 50 MHz clock mode. + + Note that this option in only needed for certain PHY revisions with a + non-standard, inverted function of this configuration bit. + Specifically, a clock reference ("rmii-ref" below) is always needed to + actually select a mode. + - clocks, clock-names: contains clocks according to the common clock bindings. supported clocks: -- cgit v1.2.3 From 2ad7bf3638411cb547f2823df08166c13ab04269 Mon Sep 17 00:00:00 2001 From: Mahesh Bandewar Date: Sun, 23 Nov 2014 23:07:46 -0800 Subject: ipvlan: Initial check-in of the IPVLAN driver. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This driver is very similar to the macvlan driver except that it uses L3 on the frame to determine the logical interface while functioning as packet dispatcher. It inherits L2 of the master device hence the packets on wire will have the same L2 for all the packets originating from all virtual devices off of the same master device. This driver was developed keeping the namespace use-case in mind. Hence most of the examples given here take that as the base setup where main-device belongs to the default-ns and virtual devices are assigned to the additional namespaces. The device operates in two different modes and the difference in these two modes in primarily in the TX side. (a) L2 mode : In this mode, the device behaves as a L2 device. TX processing upto L2 happens on the stack of the virtual device associated with (namespace). Packets are switched after that into the main device (default-ns) and queued for xmit. RX processing is simple and all multicast, broadcast (if applicable), and unicast belonging to the address(es) are delivered to the virtual devices. (b) L3 mode : In this mode, the device behaves like a L3 device. TX processing upto L3 happens on the stack of the virtual device associated with (namespace). Packets are switched to the main-device (default-ns) for the L2 processing. Hence the routing table of the default-ns will be used in this mode. RX processins is somewhat similar to the L2 mode except that in this mode only Unicast packets are delivered to the virtual device while main-dev will handle all other packets. The devices can be added using the "ip" command from the iproute2 package - ip link add link type ipvlan mode [ l2 | l3 ] Signed-off-by: Mahesh Bandewar Cc: Eric Dumazet Cc: Maciej Żenczykowski Cc: Laurent Chavey Cc: Tim Hockin Cc: Brandon Philips Cc: Pavel Emelianov Signed-off-by: David S. Miller --- Documentation/networking/ipvlan.txt | 107 ++++++++++++++++++++++++++++++++++++ 1 file changed, 107 insertions(+) create mode 100644 Documentation/networking/ipvlan.txt (limited to 'Documentation') diff --git a/Documentation/networking/ipvlan.txt b/Documentation/networking/ipvlan.txt new file mode 100644 index 000000000000..cf996394e466 --- /dev/null +++ b/Documentation/networking/ipvlan.txt @@ -0,0 +1,107 @@ + + IPVLAN Driver HOWTO + +Initial Release: + Mahesh Bandewar + +1. Introduction: + This is conceptually very similar to the macvlan driver with one major +exception of using L3 for mux-ing /demux-ing among slaves. This property makes +the master device share the L2 with it's slave devices. I have developed this +driver in conjuntion with network namespaces and not sure if there is use case +outside of it. + + +2. Building and Installation: + In order to build the driver, please select the config item CONFIG_IPVLAN. +The driver can be built into the kernel (CONFIG_IPVLAN=y) or as a module +(CONFIG_IPVLAN=m). + + +3. Configuration: + There are no module parameters for this driver and it can be configured +using IProute2/ip utility. + + ip link add link type ipvlan mode { l2 | L3 } + + e.g. ip link add link ipvl0 eth0 type ipvlan mode l2 + + +4. Operating modes: + IPvlan has two modes of operation - L2 and L3. For a given master device, +you can select one of these two modes and all slaves on that master will +operate in the same (selected) mode. The RX mode is almost identical except +that in L3 mode the slaves wont receive any multicast / broadcast traffic. +L3 mode is more restrictive since routing is controlled from the other (mostly) +default namespace. + +4.1 L2 mode: + In this mode TX processing happens on the stack instance attached to the +slave device and packets are switched and queued to the master device to send +out. In this mode the slaves will RX/TX multicast and broadcast (if applicable) +as well. + +4.2 L3 mode: + In this mode TX processing upto L3 happens on the stack instance attached +to the slave device and packets are switched to the stack instance of the +master device for the L2 processing and routing from that instance will be +used before packets are queued on the outbound device. In this mode the slaves +will not receive nor can send multicast / broadcast traffic. + + +5. What to choose (macvlan vs. ipvlan)? + These two devices are very similar in many regards and the specific use +case could very well define which device to choose. if one of the following +situations defines your use case then you can choose to use ipvlan - + (a) The Linux host that is connected to the external switch / router has +policy configured that allows only one mac per port. + (b) No of virtual devices created on a master exceed the mac capacity and +puts the NIC in promiscous mode and degraded performance is a concern. + (c) If the slave device is to be put into the hostile / untrusted network +namespace where L2 on the slave could be changed / misused. + + +6. Example configuration: + + +=============================================================+ + | Host: host1 | + | | + | +----------------------+ +----------------------+ | + | | NS:ns0 | | NS:ns1 | | + | | | | | | + | | | | | | + | | ipvl0 | | ipvl1 | | + | +----------#-----------+ +-----------#----------+ | + | # # | + | ################################ | + | # eth0 | + +==============================#==============================+ + + + (a) Create two network namespaces - ns0, ns1 + ip netns add ns0 + ip netns add ns1 + + (b) Create two ipvlan slaves on eth0 (master device) + ip link add link eth0 ipvl0 type ipvlan mode l2 + ip link add link eth0 ipvl1 type ipvlan mode l2 + + (c) Assign slaves to the respective network namespaces + ip link set dev ipvl0 netns ns0 + ip link set dev ipvl1 netns ns1 + + (d) Now switch to the namespace (ns0 or ns1) to configure the slave devices + - For ns0 + (1) ip netns exec ns0 bash + (2) ip link set dev ipvl0 up + (3) ip link set dev lo up + (4) ip -4 addr add 127.0.0.1 dev lo + (5) ip -4 addr add $IPADDR dev ipvl0 + (6) ip -4 route add default via $ROUTER dev ipvl0 + - For ns1 + (1) ip netns exec ns1 bash + (2) ip link set dev ipvl1 up + (3) ip link set dev lo up + (4) ip -4 addr add 127.0.0.1 dev lo + (5) ip -4 addr add $IPADDR dev ipvl1 + (6) ip -4 route add default via $ROUTER dev ipvl1 -- cgit v1.2.3 From 007f790c8276271de26416f90d55561bcc96588a Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Fri, 28 Nov 2014 14:34:17 +0100 Subject: net: introduce generic switch devices support The goal of this is to provide a possibility to support various switch chips. Drivers should implement relevant ndos to do so. Now there is only one ndo defined: - for getting physical switch id is in place. Note that user can use random port netdevice to access the switch. Signed-off-by: Jiri Pirko Reviewed-by: Thomas Graf Acked-by: Andy Gospodarek Signed-off-by: David S. Miller --- Documentation/networking/switchdev.txt | 59 ++++++++++++++++++++++++++++++++++ 1 file changed, 59 insertions(+) create mode 100644 Documentation/networking/switchdev.txt (limited to 'Documentation') diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt new file mode 100644 index 000000000000..f981a9295a39 --- /dev/null +++ b/Documentation/networking/switchdev.txt @@ -0,0 +1,59 @@ +Switch (and switch-ish) device drivers HOWTO +=========================== + +Please note that the word "switch" is here used in very generic meaning. +This include devices supporting L2/L3 but also various flow offloading chips, +including switches embedded into SR-IOV NICs. + +Lets describe a topology a bit. Imagine the following example: + + +----------------------------+ +---------------+ + | SOME switch chip | | CPU | + +----------------------------+ +---------------+ + port1 port2 port3 port4 MNGMNT | PCI-E | + | | | | | +---------------+ + PHY PHY | | | | NIC0 NIC1 + | | | | | | + | | +- PCI-E -+ | | + | +------- MII -------+ | + +------------- MII ------------+ + +In this example, there are two independent lines between the switch silicon +and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are +separate from the switch driver. SOME switch chip is by managed by a driver +via PCI-E device MNGMNT. Note that MNGMNT device, NIC0 and NIC1 may be +connected to some other type of bus. + +Now, for the previous example show the representation in kernel: + + +----------------------------+ +---------------+ + | SOME switch chip | | CPU | + +----------------------------+ +---------------+ + sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT | PCI-E | + | | | | | +---------------+ + PHY PHY | | | | eth0 eth1 + | | | | | | + | | +- PCI-E -+ | | + | +------- MII -------+ | + +------------- MII ------------+ + +Lets call the example switch driver for SOME switch chip "SOMEswitch". This +driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX +created for each port of a switch. These netdevices are instances +of "SOMEswitch" driver. sw0pX netdevices serve as a "representation" +of the switch chip. eth0 and eth1 are instances of some other existing driver. + +The only difference of the switch-port netdevice from the ordinary netdevice +is that is implements couple more NDOs: + + ndo_switch_parent_id_get - This returns the same ID for two port netdevices + of the same physical switch chip. This is + mandatory to be implemented by all switch drivers + and serves the caller for recognition of a port + netdevice. + ndo_switch_parent_* - Functions that serve for a manipulation of the switch + chip itself (it can be though of as a "parent" of the + port, therefore the name). They are not port-specific. + Caller might use arbitrary port netdevice of the same + switch and it will make no difference. + ndo_switch_port_* - Functions that serve for a port-specific manipulation. -- cgit v1.2.3 From aecbe01e7410ad2de022796472f531ae6941f15e Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Fri, 28 Nov 2014 14:34:19 +0100 Subject: net-sysfs: expose physical switch id for particular device Signed-off-by: Jiri Pirko Reviewed-by: Thomas Graf Acked-by: John Fastabend Acked-by: Andy Gospodarek Acked-by: Florian Fainelli Signed-off-by: David S. Miller --- Documentation/ABI/testing/sysfs-class-net | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net index e1b2e785bba8..beb8ec4dabbc 100644 --- a/Documentation/ABI/testing/sysfs-class-net +++ b/Documentation/ABI/testing/sysfs-class-net @@ -216,3 +216,11 @@ Contact: netdev@vger.kernel.org Description: Indicates the interface protocol type as a decimal value. See include/uapi/linux/if_arp.h for all possible values. + +What: /sys/class/net//phys_switch_id +Date: November 2014 +KernelVersion: 3.19 +Contact: netdev@vger.kernel.org +Description: + Indicates the unique physical switch identifier of a switch this + port belongs to, as a string. -- cgit v1.2.3 From 829ae9d611651467fe6cd7be834bd33ca6b28dfe Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Sun, 30 Nov 2014 22:22:34 -0500 Subject: net-timestamp: allow reading recv cmsg on errqueue with origin tstamp Allow reading of timestamps and cmsg at the same time on all relevant socket families. One use is to correlate timestamps with egress device, by asking for cmsg IP_PKTINFO. on AF_INET sockets, call the relevant function (ip_cmsg_recv). To avoid changing legacy expectations, only do so if the caller sets a new timestamping flag SOF_TIMESTAMPING_OPT_CMSG. on AF_INET6 sockets, IPV6_PKTINFO and all other recv cmsg are already returned for all origins. only change is to set ifindex, which is not initialized for all error origins. In both cases, only generate the pktinfo message if an ifindex is known. This is not the case for ACK timestamps. The difference between the protocol families is probably a historical accident as a result of the different conditions for generating cmsg in the relevant ip(v6)_recv_error function: ipv4: if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) { ipv6: if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) { At one time, this was the same test bar for the ICMP/ICMP6 distinction. This is no longer true. Signed-off-by: Willem de Bruijn ---- Changes v1 -> v2 large rewrite - integrate with existing pktinfo cmsg generation code - on ipv4: only send with new flag, to maintain legacy behavior - on ipv6: send at most a single pktinfo cmsg - on ipv6: initialize fields if not yet initialized The recv cmsg interfaces are also relevant to the discussion of whether looping packet headers is problematic. For v6, cmsgs that identify many headers are already returned. This patch expands that to v4. If it sounds reasonable, I will follow with patches 1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY (http://patchwork.ozlabs.org/patch/366967/) 2. sysctl to conditionally drop all timestamps that have payload or cmsg from users without CAP_NET_RAW. Signed-off-by: David S. Miller --- Documentation/networking/timestamping.txt | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt index 1d6d02d6ba52..b08e27261ff9 100644 --- a/Documentation/networking/timestamping.txt +++ b/Documentation/networking/timestamping.txt @@ -122,7 +122,7 @@ SOF_TIMESTAMPING_RAW_HARDWARE: 1.3.3 Timestamp Options -The interface supports one option +The interface supports the options SOF_TIMESTAMPING_OPT_ID: @@ -145,6 +145,16 @@ SOF_TIMESTAMPING_OPT_ID: stream sockets, it increments with every byte. +SOF_TIMESTAMPING_OPT_CMSG: + + Support recv() cmsg for all timestamped packets. Control messages + are already supported unconditionally on all packets with receive + timestamps and on IPv6 packets with transmit timestamp. This option + extends them to IPv4 packets with transmit timestamp. One use case + is to correlate packets with their egress device, by enabling socket + option IP_PKTINFO simultaneously. + + 1.4 Bytestream Timestamps The SO_TIMESTAMPING interface supports timestamping of bytes in a -- cgit v1.2.3 From cbd3aad5ce66f5a266a185aa37e0eb9be9ba4154 Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Sun, 30 Nov 2014 22:22:35 -0500 Subject: net-timestamp: expand documentation and test Documentation: expand explanation of timestamp counter Test: new: flag -I requests and prints PKTINFO new: flag -x prints payload (possibly truncated) fix: remove pretty print that breaks common flag '-l 1' Signed-off-by: Willem de Bruijn Signed-off-by: David S. Miller --- Documentation/networking/timestamping.txt | 23 ++++-- .../networking/timestamping/txtimestamp.c | 90 +++++++++++++++++++--- 2 files changed, 93 insertions(+), 20 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt index b08e27261ff9..a5c784c89312 100644 --- a/Documentation/networ