summaryrefslogtreecommitdiffstats
path: root/net/netfilter/ipvs
AgeCommit message (Collapse)Author
2015-02-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/vxlan.c drivers/vhost/net.c include/linux/if_vlan.h net/core/dev.c The net/core/dev.c conflict was the overlap of one commit marking an existing function static whilst another was adding a new function. In the include/linux/if_vlan.h case, the type used for a local variable was changed in 'net', whereas the function got rewritten to fix a stacked vlan bug in 'net-next'. In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next' overlapped with an endainness fix for VHOST 1.0 in 'net'. In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter in 'net-next' whereas in 'net' there was a bug fix to pass in the correct network namespace pointer in calls to this function. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-30ipvs: rerouting to local clients is not needed anymoreJulian Anastasov
commit f5a41847acc5 ("ipvs: move ip_route_me_harder for ICMP") from 2.6.37 introduced ip_route_me_harder() call for responses to local clients, so that we can provide valid rt_src after SNAT. It was used by TCP to provide valid daddr for ip_send_reply(). After commit 0a5ebb8000c5 ("ipv4: Pass explicit daddr arg to ip_send_reply()." from 3.0 this rerouting is not needed anymore and should be avoided, especially in LOCAL_IN. Fixes 3.12.33 crash in xfrm reported by Florian Wiessner: "3.12.33 - BUG xfrm_selector_match+0x25/0x2f6" Reported-by: Smart Weblications GmbH - Florian Wiessner <f.wiessner@smart-weblications.de> Tested-by: Smart Weblications GmbH - Florian Wiessner <f.wiessner@smart-weblications.de> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-01-18netlink: make nlmsg_end() and genlmsg_end() voidJohannes Berg
Contrary to common expectations for an "int" return, these functions return only a positive value -- if used correctly they cannot even return 0 because the message header will necessarily be in the skb. This makes the very common pattern of if (genlmsg_end(...) < 0) { ... } be a whole bunch of dead code. Many places also simply do return nlmsg_end(...); and the caller is expected to deal with it. This also commonly (at least for me) causes errors, because it is very common to write if (my_function(...)) /* error condition */ and if my_function() does "return nlmsg_end()" this is of course wrong. Additionally, there's not a single place in the kernel that actually needs the message length returned, and if anyone needs it later then it'll be very easy to just use skb->len there. Remove this, and make the functions void. This removes a bunch of dead code as described above. The patch adds lines because I did - return nlmsg_end(...); + nlmsg_end(...); + return 0; I could have preserved all the function's return values by returning skb->len, but instead I've audited all the places calling the affected functions and found that none cared. A few places actually compared the return value with <= 0 in dump functionality, but that could just be changed to < 0 with no change in behaviour, so I opted for the more efficient version. One instance of the error I've made numerous times now is also present in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't check for <0 or <=0 and thus broke out of the loop every single time. I've preserved this since it will (I think) have caused the messages to userspace to be formatted differently with just a single message for every SKB returned to userspace. It's possible that this isn't needed for the tools that actually use this, but I don't even know what they are so couldn't test that changing this behaviour would be acceptable. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10ipvs: uninitialized data with IP_VS_IPV6Dan Carpenter
The app_tcp_pkt_out() function expects "*diff" to be set and ends up using uninitialized data if CONFIG_IP_VS_IPV6 is turned on. The same issue is there in app_tcp_pkt_in(). Thanks to Julian Anastasov for noticing that. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-12-09dst: no need to take reference on DST_NOCACHE dstsHannes Frederic Sowa
Since commit f8864972126899 ("ipv4: fix dst race in sk_dst_get()") DST_NOCACHE dst_entries get freed by RCU. So there is no need to get a reference on them when we are in rcu protected sections. Cc: Eric Dumazet <edumazet@google.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== netfilter/ipvs updates for net-next The following patchset contains Netfilter updates for your net-next tree, this includes the NAT redirection support for nf_tables, the cgroup support for nft meta and conntrack zone support for the connlimit match. Coming after those, a bunch of sparse warning fixes, missing netns bits and cleanups. More specifically, they are: 1) Prepare IPv4 and IPv6 NAT redirect code to use it from nf_tables, patches from Arturo Borrero. 2) Introduce the nf_tables redir expression, from Arturo Borrero. 3) Remove an unnecessary assignment in ip_vs_xmit/__ip_vs_get_out_rt(). Patch from Alex Gartrell. 4) Add nft_log_dereference() macro to the nf_log infrastructure, patch from Marcelo Leitner. 5) Add some extra validation when registering logger families, also from Marcelo. 6) Some spelling cleanups from stephen hemminger. 7) Fix sparse warning in nf_logger_find_get(). 8) Add cgroup support to nf_tables meta, patch from Ana Rey. 9) A Kconfig fix for the new redir expression and fix sparse warnings in the new redir expression. 10) Fix several sparse warnings in the netfilter tree, from Florian Westphal. 11) Reduce verbosity when OOM in nfnetlink_log. User can basically do nothing when this situation occurs. 12) Add conntrack zone support to xt_connlimit, again from Florian. 13) Add netnamespace support to the h323 conntrack helper, contributed by Vasily Averin. 14) Remove unnecessary nul-pointer checks before free_percpu() and module_put(), from Markus Elfring. 15) Use pr_fmt in nfnetlink_log, again patch from Marcelo Leitner. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-20netfilter: Deletion of unnecessary checks before two function callsMarkus Elfring
The functions free_percpu() and module_put() test whether their argument is NULL and then return immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12ipvs: Keep skb->sk when allocating headroom on tunnel xmitCalvin Owens
ip_vs_prepare_tunneled_skb() ignores ->sk when allocating a new skb, either unconditionally setting ->sk to NULL or allowing the uninitialized ->sk from a newly allocated skb to leak through to the caller. This patch properly copies ->sk and increments its reference count. Signed-off-by: Calvin Owens <calvinowens@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-10-28ipvs: remove unnecessary assignment in __ip_vs_get_out_rtAlex Gartrell
It is a precondition of the function that daddr be equal to dest->addr.ip if dest is non-NULL, so this additional assignment is just confusing for stupid engineers like me. Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-10-28ipvs: Avoid null-pointer deref in debug codeAlex Gartrell
Use daddr instead of reaching into dest. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-10-14netfilter: replace strnicmp with strncasecmpRasmus Villemoes
The kernel used to contain two functions for length-delimited, case-insensitive string comparison, strnicmp with correct semantics and a slightly buggy strncasecmp. The latter is the POSIX name, so strnicmp was renamed to strncasecmp, and strnicmp made into a wrapper for the new strncasecmp to avoid breaking existing users. To allow the compat wrapper strnicmp to be removed at some point in the future, and to avoid the extra indirection cost, do s/strnicmp/strncasecmp/g. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-18ipvs: Allow heterogeneous pools now that we support themAlex Gartrell
Remove the temporary consistency check and add a case statement to only allow ipip mixed dests. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-18ipvs: use the new dest addr family fieldJulian Anastasov
Use the new address family field cp->daf when printing cp->daddr in logs or connection listing. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-18ipvs: use correct address family in scheduler logsJulian Anastasov
Needed to support svc->af != dest->af. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16ipvs: address family of LBLCR entry depends on svc familyJulian Anastasov
The LBLCR entries should use svc->af, not dest->af. Needed to support svc->af != dest->af. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16ipvs: address family of LBLC entry depends on svc familyJulian Anastasov
The LBLC entries should use svc->af, not dest->af. Needed to support svc->af != dest->af. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16ipvs: support ipv4 in ipv6 and ipv6 in ipv4 tunnel forwardingAlex Gartrell
Pull the common logic for preparing an skb to prepend the header into a single function and then set fields such that they can be used in either case (generalize tos and tclass to dscp, hop_limit and ttl to ttl, etc) Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16ipvs: Add generic ensure_mtu_is_adequate to handle mixed poolsAlex Gartrell
The out_rt functions check to see if the mtu is large enough for the packet and, if not, send icmp messages (TOOBIG or DEST_UNREACH) to the source and bail out. We needed the ability to send ICMP from the out_rt_v6 function and DEST_UNREACH from the out_rt function, so we just pulled it out into a common function. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16ipvs: Pull out update_pmtu codeAlex Gartrell
Another step toward heterogeneous pools, this removes another piece of functionality currently specific to each address family type. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16ipvs: Pull out crosses_local_route_boundary logicAlex Gartrell
This logic is repeated in both out_rt functions so it was redundant. Additionally, we'll need to be able to do checks to route v4 to v6 and vice versa in order to deal with heterogeneous pools. This patch also updates the callsites to add an additional parameter to the out route functions. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16ipvs: prevent mixing heterogeneous pools and synchronizationAlex Gartrell
The synchronization protocol is not compatible with heterogeneous pools, so we need to verify that we're not turning both on at the same time. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16ipvs: Supply destination address family to ip_vs_conn_newAlex Gartrell
The assumption that dest af is equal to service af is now unreliable, so we must specify it manually so as not to copy just the first 4 bytes of a v6 address or doing an illegal read of 16 butes on a v6 address. We "lie" in two places: for synchronization (which we will explicitly disallow from happening when we have heterogeneous pools) and for black hole addresses where there's no real dest. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16ipvs: Pass destination address family to ip_vs_trash_get_destAlex Gartrell
Part of a series of diffs to tease out destination family from virtual family. This diff just adds a parameter to ip_vs_trash_get and then uses it for comparison rather than svc->af. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16ipvs: Supply destination addr family to ip_vs_{lookup_dest,find_dest}Alex Gartrell
We need to remove the assumption that virtual address family is the same as real address family in order to support heterogeneous services (that is, services with v4 vips and v6 backends or the opposite). Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16ipvs: Add destination address family to netlink interfaceAlex Gartrell
This is necessary to support heterogeneous pools. For example, if you have an ipv6 addressed network, you'll want to be able to forward ipv4 traffic into it. This patch enforces that destination address family is the same as service family, as none of the forwarding mechanisms support anything else. For the old setsockopt mechanism, we simply set the dest address family to AF_INET as we do with the service. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16ipvs: Add simple weighted failover schedulerKenny Mathis
Add simple weighted IPVS failover support to the Linux kernel. All other scheduling modules implement some form of load balancing, while this offers a simple failover solution. Connections are directed to the appropriate server based solely on highest weight value and server availability. Tested functionality with keepalived. Signed-off-by: Kenny Mathis <kmathis@chokepoint.net> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== nf-next pull request The following patchset contains Netfilter/IPVS updates for your net-next tree. Regarding nf_tables, most updates focus on consolidating the NAT infrastructure and adding support for masquerading. More specifically, they are: 1) use __u8 instead of u_int8_t in arptables header, from Mike Frysinger. 2) Add support to match by skb->pkttype to the meta expression, from Ana Rey. 3) Add support to match by cpu to the meta expression, also from Ana Rey. 4) A smatch warning about IPSET_ATTR_MARKMASK validation, patch from Vytas Dauksa. 5) Fix netnet and netportnet hash types the range support for IPv4, from Sergey Popovich. 6) Fix missing-field-initializer warnings resolved, from Mark Rustad. 7) Dan Carperter reported possible integer overflows in ipset, from Jozsef Kadlecsick. 8) Filter out accounting objects in nfacct by type, so you can selectively reset quotas, from Alexey Perevalov. 9) Move specific NAT IPv4 functions to the core so x_tables and nf_tables can share the same NAT IPv4 engine. 10) Use the new NAT IPv4 functions from nft_chain_nat_ipv4. 11) Move specific NAT IPv6 functions to the core so x_tables and nf_tables can share the same NAT IPv4 engine. 12) Use the new NAT IPv6 functions from nft_chain_nat_ipv6. 13) Refactor code to add nft_delrule(), which can be reused in the enhancement of the NFT_MSG_DELTABLE to remove a table and its content, from Arturo Borrero. 14) Add a helper function to unregister chain hooks, from Arturo Borrero. 15) A cleanup to rename to nft_delrule_by_chain for consistency with the new nft_*() functions, also from Arturo. 16) Add support to match devgroup to the meta expression, from Ana Rey. 17) Reduce stack usage for IPVS socket option, from Julian Anastasov. 18) Remove unnecessary textsearch state initialization in xt_string, from Bojan Prtvar. 19) Add several helper functions to nf_tables, more work to prepare the enhancement of NFT_MSG_DELTABLE, again from Arturo Borrero. 20) Enhance NFT_MSG_DELTABLE to delete a table and its content, from Arturo Borrero. 21) Support NAT flags in the nat expression to indicate the flavour, eg. random fully, from Arturo. 22) Add missing audit code to ebtables when replacing tables, from Nicolas Dichtel. 23) Generalize the IPv4 masquerading code to allow its re-use from nf_tables, from Arturo. 24) Generalize the IPv6 masquerading code, also from Arturo. 25) Add the new masq expression to support IPv4/IPv6 masquerading from nf_tables, also from Arturo. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09ipvs: reduce stack usage for sockopt dataJulian Anastasov
Use union to reserve the required stack space for sockopt data which is less than the currently hardcoded value of 128. Now the tables for commands should be more readable. The checks added for readability are optimized by compiler, others warn at compile time if command uses too much stack or exceeds the storage of set_arglen and get_arglen. As Dan Carpenter points out, we can run for unprivileged user, so we can silent some error messages. Signed-off-by: Julian Anastasov <ja@ssi.bg> CC: Dan Carpenter <dan.carpenter@oracle.com> CC: Andrey Utkin <andrey.krieger.utkin@gmail.com> CC: David Binderman <dcb314@hotmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-08-28ipvs: fix ipv6 hook registration for local repliesJulian Anastasov
commit fc604767613b6d2036cdc35b660bc39451040a47 ("ipvs: changes for local real server") from 2.6.37 introduced DNAT support to local real server but the IPv6 LOCAL_OUT handler ip_vs_local_reply6() is registered incorrectly as IPv4 hook causing any outgoing IPv4 traffic to be dropped depending on the IP header values. Chris tracked down the problem to CONFIG_IP_VS_IPV6=y Bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349768 Reported-by: Chris J Arges <chris.j.arges@canonical.com> Tested-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-08-27ipvs: properly declare tunnel encapsulationJulian Anastasov
The tunneling method should properly use tunnel encapsulation. Fixes problem with CHECKSUM_PARTIAL packets when TCP/UDP csum offload is supported. Thanks to Alex Gartrell for reporting the problem, providing solution and for all suggestions. Reported-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-08-08netfilter: don't use mutex_lock_interruptible()Pablo Neira Ayuso
Eric Dumazet reports that getsockopt() or setsockopt() sometimes returns -EINTR instead of -ENOPROTOOPT, causing headaches to application developers. This patch replaces all the mutex_lock_interruptible() by mutex_lock() in the netfilter tree, as there is no reason we should sleep for a long time there. Reported-by: Eric Dumazet <edumazet@google.com> Suggested-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Julian Anastasov <ja@ssi.bg>
2014-08-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/Makefile net/ipv6/sysctl_net_ipv6.c Two ipv6_table_template[] additions overlap, so the index of the ipv6_table[x] assignments needed to be adjusted. In the drivers/net/Makefile case, we've gotten rid of the garbage whereby we had to list every single USB networking driver in the top-level Makefile, there is just one "USB_NETWORKING" that guards everything. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains updates for your net-next tree, they are: 1) Use kvfree() helper function from x_tables, from Eric Dumazet. 2) Remove extra timer from the conntrack ecache extension, use a workqueue instead to redeliver lost events to userspace instead, from Florian Westphal. 3) Removal of the ulog targets for ebtables and iptables. The nflog infrastructure superseded this almost 9 years ago, time to get rid of this code. 4) Replace the list of loggers by an array now that we can only have two possible non-overlapping logger flavours, ie. kernel ring buffer and netlink logging. 5) Move Eric Dumazet's log buffer code to nf_log to reuse it from all of the supported per-family loggers. 6) Consolidate nf_log_packet() as an unified interface for packet logging. After this patch, if the struct nf_loginfo is available, it explicitly selects the logger that is used. 7) Move ip and ip6 logging code from xt_LOG to the corresponding per-family loggers. Thus, x_tables and nf_tables share the same code for packet logging. 8) Add generic ARP packet logger, which is used by nf_tables. The format aims to be consistent with the output of xt_LOG. 9) Add generic bridge packet logger. Again, this is used by nf_tables and it routes the packets to the real family loggers. As a result, we get consistent logging format for the bridge family. The ebt_log logging code has been intentionally left in place not to break backward compatibility since the logging output differs from xt_LOG. 10) Update nft_log to explicitly request the required family logger when needed. 11) Finish nft_log so it supports arp, ip, ip6, bridge and inet families. Allowing selection between netlink and kernel buffer ring logging. 12) Several fixes coming after the netfilter core logging changes spotted by robots. 13) Use IS_ENABLED() macros whenever possible in the netfilter tree, from Duan Jiong. 14) Removal of a couple of unnecessary branch before kfree, from Fabian Frederick. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-17ipvs: Maintain all DSCP and ECN bits for ipv6 tun forwardingAlex Gartrell
Previously, only the four high bits of the tclass were maintained in the ipv6 case. This matches the behavior of ipv4, though whether or not we should reflect ECN bits may be up for debate. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-07-16ipvs: Remove dead debug codeYannick Brosseau
This code section cannot compile as it refer to non existing variable It also pre-date git history. Signed-off-by: Yannick Brosseau <scientist@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-07-16ipvs: remove null test before kfreeFabian Frederick
Fix checkpatch warning: WARNING: kfree(NULL) is safe this check is probably not required Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-07-16ipvs: avoid netns exit crash on ip_vs_conn_drop_conntrackJulian Anastasov
commit 8f4e0a18682d91 ("IPVS netns exit causes crash in conntrack") added second ip_vs_conn_drop_conntrack call instead of just adding the needed check. As result, the first call still can cause crash on netns exit. Remove it. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-06-16Merge branch 'ipvs'Pablo Neira Ayuso
Simon Horman says: ==================== Fix for panic due use of tot_stats estimator outside of CONFIG_SYSCTL It has been present since v3.6.39. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-13ipvs: stop tot_stats estimator only under CONFIG_SYSCTLJulian Anastasov
The tot_stats estimator is started only when CONFIG_SYSCTL is defined. But it is stopped without checking CONFIG_SYSCTL. Fix the crash by moving ip_vs_stop_estimator into ip_vs_control_net_cleanup_sysctl. The change is needed after commit 14e405461e664b ("IPVS: Add __ip_vs_control_{init,cleanup}_sysctl()") from 2.6.39. Reported-by: Jet Chen <jet.chen@intel.com> Tested-by: Jet Chen <jet.chen@intel.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-06-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: include/net/inetpeer.h net/ipv6/output_core.c Changes in net were fixing bugs in code removed in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02inetpeer: get rid of ip_id_countEric Dumazet
Ideally, we would need to generate IP ID using a per destination IP generator. linux kernels used inet_peer cache for this purpose, but this had a huge cost on servers disabling MTU discovery. 1) each inet_peer struct consumes 192 bytes 2) inetpeer cache uses a binary tree of inet_peer structs, with a nominal size of ~66000 elements under load. 3) lookups in this tree are hitting a lot of cache lines, as tree depth is about 20. 4) If server deals with many tcp flows, we have a high probability of not finding the inet_peer, allocating a fresh one, inserting it in the tree with same initial ip_id_count, (cf secure_ip_id()) 5) We garbage collect inet_peer aggressively. IP ID generation do not have to be 'perfect' Goal is trying to avoid duplicates in a short period of time, so that reassembly units have a chance to complete reassembly of fragments belonging to one message before receiving other fragments with a recycled ID. We simply use an array of generators, and a Jenkin hash using the dst IP as a key. ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it belongs (it is only used from this file) secure_ip_id() and secure_ipv6_id() no longer are needed. Rename ip_select_ident_more() to ip_select_ident_segs() to avoid unnecessary decrement/increment of the number of segments. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next This small patchset contains three accumulated Netfilter/IPVS updates, they are: 1) Refactorize common NAT code by encapsulating it into a helper function, similarly to what we do in other conntrack extensions, from Florian Westphal. 2) A minor format string mismatch fix for IPVS, from Masanari Iida. 3) Add quota support to the netfilter accounting infrastructure, now you can add quotas to accounting objects via the nfnetlink interface and use them from iptables. You can also listen to quota notifications from userspace. This enhancement from Mathieu Poirier. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-26ipvs: Fix panic due to non-linear skbPeter Christensen
Receiving a ICMP response to an IPIP packet in a non-linear skb could cause a kernel panic in __skb_pull. The problem was introduced in commit f2edb9f7706dcb2c0d9a362b2ba849efe3a97f5e ("ipvs: implement passive PMTUD for IPIP packets"). Signed-off-by: Peter Christensen <pch@ordbogen.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-05-12net: rename local_df to ignore_dfWANG Cong
As suggested by several people, rename local_df to ignore_df, since it means "ignore df bit if it is set". Cc: Maciej Żenczykowski <maze@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-23netfilter: Fix format string mismatch in ip_vs_proto_name()Masanari Iida
Fix string mismatch in ip_vs_proto_name() Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-17Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next, most relevantly they are: * cleanup to remove double semicolon from stephen hemminger. * calm down sparse warning in xt_ipcomp, from Fan Du. * nf_ct_labels support for nf_tables, from Florian Westphal. * new macros to simplify rcu dereferences in the scope of nfnetlink and nf_tables, from Patrick McHardy. * Accept queue and drop (including reason for drop) to verdict parsing in nf_tables, also from Patrick. * Remove unused random seed initialization in nfnetlink_log, from Florian Westphal. * Allow to attach user-specific information to nf_tables rules, useful to attach user comments to rule, from me. * Return errors in ipset according to the manpage documentation, from Jozsef Kadlecsik. * Fix coccinelle warnings related to incorrect bool type usage for ipset, from Fengguang Wu. * Add hash:ip,mark set type to ipset, from Vytas Dauksa. * Fix message for each spotted by ipset for each netns that is created, from Ilia Mirkin. * Add forceadd option to ipset, which evicts a random entry from the set if it becomes full, from Josh Hunt. * Minor IPVS cleanups and fixes from Andi Kleen and Tingwei Liu. * Improve conntrack scalability by removing a central spinlock, original work from Eric Dumazet. Jesper Dangaard Brouer took them over to address remaining issues. Several patches to prepare this change come in first place. * Rework nft_hash to resolve bugs (leaking chain, missing rcu synchronization on element removal, etc. from Patrick McHardy. * Restore context in the rule deletion path, as we now release rule objects synchronously, from Patrick McHardy. This gets back event notification for anonymous sets. * Fix NAT family validation in nft_nat, also from Patrick. * Improve scalability of xt_connlimit by using an array of spinlocks and by introducing a rb-tree of hashtables for faster lookup of accounted objects per network. This patch was preceded by several patches and refactorizations to accomodate this change including the use of kmem_cache, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irqEric W. Biederman
Replace the bh safe variant with the hard irq safe variant. We need a hard irq safe variant to deal with netpoll transmitting packets from hard irq context, and we need it in most if not all of the places using the bh safe variant. Except on 32bit uni-processor the code is exactly the same so don't bother with a bh variant, just have a hard irq safe variant that everyone can use. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07ipvs: Reduce checkpatch noise in ip_vs_lblc.cTingwei Liu
Add whitespace after operator and put open brace { on the previous line Cc: Tingwei Liu <liutingwei@hisense.com> Cc: lvs-devel@vger.kernel.org Signed-off-by: Tingwei Liu <tingw.liu@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-03-07sections, ipvs: Remove useless __read_mostly for ipvs genl_opsAndi Kleen
const __read_mostly does not make any sense, because const data is already read-only. Remove the __read_mostly for the ipvs genl_ops. This avoids a LTO section conflict compile problem. Cc: Wensong Zhang <wensong@linux-vs.org> Cc: Simon Horman <horms@verge.net.au> Cc: Patrick McHardy <kaber@trash.net> Cc: lvs-devel@vger.kernel.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Simon Horman <horms@verge.net.au>