summaryrefslogtreecommitdiffstats
path: root/net/netfilter/core.c
AgeCommit message (Collapse)Author
2015-07-23netfilter: fix possible removal of wrong hookPablo Neira Ayuso
nf_unregister_net_hook() uses the nf_hook_ops fields as tuple to look up for the corresponding hook in the list. However, we may have two hooks with exactly the same configuration. This shouldn't be a problem for nftables since every new chain has an unique priv field set, but this may still cause us problems in the future, so better address this problem now by keeping a reference to the original nf_hook_ops structure to make sure we delete the right hook from nf_unregister_net_hook(). Fixes: 085db2c04557 ("netfilter: Per network namespace netfilter hooks.") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-07-23netfilter: nf_queue: fix nf_queue_nf_hook_drop()Pablo Neira Ayuso
This function reacquires the rtnl_lock() which is already held by nf_unregister_hook(). This can be triggered via: modprobe nf_conntrack_ipv4 && rmmod nf_conntrack_ipv4 [ 720.628746] INFO: task rmmod:3578 blocked for more than 120 seconds. [ 720.628749] Not tainted 4.2.0-rc2+ #113 [ 720.628752] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 720.628754] rmmod D ffff8800ca46fd58 0 3578 3571 0x00000080 [...] [ 720.628783] Call Trace: [ 720.628790] [<ffffffff8152ea0b>] schedule+0x6b/0x90 [ 720.628795] [<ffffffff8152ecb3>] schedule_preempt_disabled+0x13/0x20 [ 720.628799] [<ffffffff8152ff55>] mutex_lock_nested+0x1f5/0x380 [ 720.628803] [<ffffffff81462622>] ? rtnl_lock+0x12/0x20 [ 720.628807] [<ffffffff81462622>] ? rtnl_lock+0x12/0x20 [ 720.628812] [<ffffffff81462622>] rtnl_lock+0x12/0x20 [ 720.628817] [<ffffffff8148ab25>] nf_queue_nf_hook_drop+0x15/0x160 [ 720.628825] [<ffffffff81488d48>] nf_unregister_net_hook+0x168/0x190 [ 720.628831] [<ffffffff81488e24>] nf_unregister_hook+0x64/0x80 [ 720.628837] [<ffffffff81488e60>] nf_unregister_hooks+0x20/0x30 [...] Moreover, nf_unregister_net_hook() should only destroy the queue for this netns, not for every netns. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Fixes: 085db2c04557 ("netfilter: Per network namespace netfilter hooks.") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-07-20netfilter: Fix memory leak in nf_register_net_hookEric W. Biederman
In the rare case that when it is a attempted to use a per network device netfilter hook and the network device does not exist the newly allocated structure can leak. Be a good citizen and free the newly allocated structure in the error handling code. Fixes: 085db2c04557 ("netfilter: Per network namespace netfilter hooks.") Reported-by: kbuild@01.org Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-15netfilter: move tee_active to coreFlorian Westphal
This prepares for a TEE like expression in nftables. We want to ensure only one duplicate is sent, so both will use the same percpu variable to detect duplication. The other use case is detection of recursive call to xtables, but since we don't want dependency from nft to xtables core its put into core.c instead of the x_tables core. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-15netfilter: Per network namespace netfilter hooks.Eric W. Biederman
- Add a new set of functions for registering and unregistering per network namespace hooks. - Modify the old global namespace hook functions to use the per network namespace hooks in their implementation, so their remains a single list that needs to be walked for any hook (this is important for keeping the hook priority working and for keeping the code walking the hooks simple). - Only allow registering the per netdevice hooks in the network namespace where the network device lives. - Dynamically allocate the structures in the per network namespace hook list in nf_register_net_hook, and unregister them in nf_unregister_net_hook. Dynamic allocate is required somewhere as the number of network namespaces are not fixed so we might as well allocate them in the registration function. The chain of registered hooks on any list is expected to be small so the cost of walking that list to find the entry we are unregistering should also be small. Performing the management of the dynamically allocated list entries in the registration and unregistration functions keeps the complexity from spreading. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-07-15netfilter: Factor out the hook list selection from nf_register_hookEric W. Biederman
- Add a new function find_nf_hook_list to select the nf_hook_list - Fail nf_register_hook when asked for a per netdevice hook list when support for per netdevice hook lists is not built into the kernel. - Move the hook list head selection outside of nf_hook_mutex as nothing in the selection requires the hook list, and error handling is simpler if a mutex is not held. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-15netfilter: Simply the tests for enabling and disabling the ingress queue hookEric W. Biederman
Replace an overcomplicated switch statement with a simple if statement. This also removes the ingress queue enable outside of nf_hook_mutex as the protection provided by the mutex is not necessary and the code is clearer having both of the static key increments together. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-23netfilter: nf_qeueue: Drop queue entries on nf_unregister_hookEric W. Biederman
Add code to nf_unregister_hook to flush the nf_queue when a hook is unregistered. This guarantees that the pointer that the nf_queue code retains into the nf_hook list will remain valid while a packet is queued. I tested what would happen if we do not flush queued packets and was trivially able to obtain the oops below. All that was required was to stop the nf_queue listening process, to delete all of the nf_tables, and to awaken the nf_queue listening process. > BUG: unable to handle kernel paging request at 0000000100000001 > IP: [<0000000100000001>] 0x100000001 > PGD b9c35067 PUD 0 > Oops: 0010 [#1] SMP > Modules linked in: > CPU: 0 PID: 519 Comm: lt-nfqnl_test Not tainted > task: ffff8800b9c8c050 ti: ffff8800ba9d8000 task.ti: ffff8800ba9d8000 > RIP: 0010:[<0000000100000001>] [<0000000100000001>] 0x100000001 > RSP: 0018:ffff8800ba9dba40 EFLAGS: 00010a16 > RAX: ffff8800bab48a00 RBX: ffff8800ba9dba90 RCX: ffff8800ba9dba90 > RDX: ffff8800b9c10128 RSI: ffff8800ba940900 RDI: ffff8800bab48a00 > RBP: ffff8800b9c10128 R08: ffffffff82976660 R09: ffff8800ba9dbb28 > R10: dead000000100100 R11: dead000000200200 R12: ffff8800ba940900 > R13: ffffffff8313fd50 R14: ffff8800b9c95200 R15: 0000000000000000 > FS: 00007fb91fc34700(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000100000001 CR3: 00000000babfb000 CR4: 00000000000007f0 > Stack: > ffffffff8206ab0f ffffffff82982240 ffff8800bab48a00 ffff8800b9c100a8 > ffff8800b9c10100 0000000000000001 ffff8800ba940900 ffff8800b9c10128 > ffffffff8206bd65 ffff8800bfb0d5e0 ffff8800bab48a00 0000000000014dc0 > Call Trace: > [<ffffffff8206ab0f>] ? nf_iterate+0x4f/0xa0 > [<ffffffff8206bd65>] ? nf_reinject+0x125/0x190 > [<ffffffff8206dee5>] ? nfqnl_recv_verdict+0x255/0x360 > [<ffffffff81386290>] ? nla_parse+0x80/0xf0 > [<ffffffff8206c42c>] ? nfnetlink_rcv_msg+0x13c/0x240 > [<ffffffff811b2fec>] ? __memcg_kmem_get_cache+0x4c/0x150 > [<ffffffff8206c2f0>] ? nfnl_lock+0x20/0x20 > [<ffffffff82068159>] ? netlink_rcv_skb+0xa9/0xc0 > [<ffffffff820677bf>] ? netlink_unicast+0x12f/0x1c0 > [<ffffffff82067ade>] ? netlink_sendmsg+0x28e/0x650 > [<ffffffff81fdd814>] ? sock_sendmsg+0x44/0x50 > [<ffffffff81fde07b>] ? ___sys_sendmsg+0x2ab/0x2c0 > [<ffffffff810e8f73>] ? __wake_up+0x43/0x70 > [<ffffffff8141a134>] ? tty_write+0x1c4/0x2a0 > [<ffffffff81fde9f4>] ? __sys_sendmsg+0x44/0x80 > [<ffffffff823ff8d7>] ? system_call_fastpath+0x12/0x6a > Code: Bad RIP value. > RIP [<0000000100000001>] 0x100000001 > RSP <ffff8800ba9dba40> > CR2: 0000000100000001 > ---[ end trace 08eb65d42362793f ]--- Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14netfilter: add netfilter ingress hook after handle_ing() under unique static keyPablo Neira
This patch adds the Netfilter ingress hook just after the existing tc ingress hook, that seems to be the consensus solution for this. Note that the Netfilter hook resides under the global static key that enables ingress filtering. Nonetheless, Netfilter still also has its own static key for minimal impact on the existing handle_ing(). * Without this patch: Result: OK: 6216490(c6216338+d152) usec, 100000000 (60byte,0frags) 16086246pps 7721Mb/sec (7721398080bps) errors: 100000000 42.46% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core 25.92% kpktgend_0 [kernel.kallsyms] [k] kfree_skb 7.81% kpktgend_0 [pktgen] [k] pktgen_thread_worker 5.62% kpktgend_0 [kernel.kallsyms] [k] ip_rcv 2.70% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal 2.34% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk 1.44% kpktgend_0 [kernel.kallsyms] [k] __build_skb * With this patch: Result: OK: 6214833(c6214731+d101) usec, 100000000 (60byte,0frags) 16090536pps 7723Mb/sec (7723457280bps) errors: 100000000 41.23% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core 26.57% kpktgend_0 [kernel.kallsyms] [k] kfree_skb 7.72% kpktgend_0 [pktgen] [k] pktgen_thread_worker 5.55% kpktgend_0 [kernel.kallsyms] [k] ip_rcv 2.78% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal 2.06% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk 1.43% kpktgend_0 [kernel.kallsyms] [k] __build_skb * Without this patch + tc ingress: tc filter add dev eth4 parent ffff: protocol ip prio 1 \ u32 match ip dst 4.3.2.1/32 Result: OK: 9269001(c9268821+d179) usec, 100000000 (60byte,0frags) 10788648pps 5178Mb/sec (5178551040bps) errors: 100000000 40.99% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core 17.50% kpktgend_0 [kernel.kallsyms] [k] kfree_skb 11.77% kpktgend_0 [cls_u32] [k] u32_classify 5.62% kpktgend_0 [kernel.kallsyms] [k] tc_classify_compat 5.18% kpktgend_0 [pktgen] [k] pktgen_thread_worker 3.23% kpktgend_0 [kernel.kallsyms] [k] tc_classify 2.97% kpktgend_0 [kernel.kallsyms] [k] ip_rcv 1.83% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal 1.50% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk 0.99% kpktgend_0 [kernel.kallsyms] [k] __build_skb * With this patch + tc ingress: tc filter add dev eth4 parent ffff: protocol ip prio 1 \ u32 match ip dst 4.3.2.1/32 Result: OK: 9308218(c9308091+d126) usec, 100000000 (60byte,0frags) 10743194pps 5156Mb/sec (5156733120bps) errors: 100000000 42.01% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core 17.78% kpktgend_0 [kernel.kallsyms] [k] kfree_skb 11.70% kpktgend_0 [cls_u32] [k] u32_classify 5.46% kpktgend_0 [kernel.kallsyms] [k] tc_classify_compat 5.16% kpktgend_0 [pktgen] [k] pktgen_thread_worker 2.98% kpktgend_0 [kernel.kallsyms] [k] ip_rcv 2.84% kpktgend_0 [kernel.kallsyms] [k] tc_classify 1.96% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal 1.57% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk Note that the results are very similar before and after. I can see gcc gets the code under the ingress static key out of the hot path. Then, on that cold branch, it generates the code to accomodate the netfilter ingress static key. My explanation for this is that this reduces the pressure on the instruction cache for non-users as the new code is out of the hot path, and it comes with minimal impact for tc ingress users. Using gcc version 4.8.4 on: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 [...] L1d cache: 16K L1i cache: 64K L2 cache: 2048K L3 cache: 8192K Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14netfilter: add hook list to nf_hook_statePablo Neira
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04netfilter: Make nf_hookfn use nf_hook_state.David S. Miller
Pass the nf_hook_state all the way down into the hook functions themselves. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04netfilter: Create and use nf_hook_state.David S. Miller
Instead of passing a large number of arguments down into the nf_hook() entry points, create a structure which carries this state down through the hook processing layers. This makes is so that if we want to change the types or signatures of any of these pieces of state, there are less places that need to be changed. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13netfilter: fix various sparse warningsFlorian Westphal
net/bridge/br_netfilter.c:870:6: symbol 'br_netfilter_enable' was not declared. Should it be static? no; add include net/ipv4/netfilter/nft_reject_ipv4.c:22:6: symbol 'nft_reject_ipv4_eval' was not declared. Should it be static? yes net/ipv6/netfilter/nf_reject_ipv6.c:16:6: symbol 'nf_send_reset6' was not declared. Should it be static? no; add include net/ipv6/netfilter/nft_reject_ipv6.c:22:6: symbol 'nft_reject_ipv6_eval' was not declared. Should it be static? yes net/netfilter/core.c:33:32: symbol 'nf_ipv6_ops' was not declared. Should it be static? no; add include net/netfilter/xt_DSCP.c:40:57: cast truncates bits from constant value (ffffff03 becomes 3) net/netfilter/xt_DSCP.c:57:59: cast truncates bits from constant value (ffffff03 becomes 3) add __force, 3 is what we want. net/ipv4/netfilter/nf_log_arp.c:77:6: symbol 'nf_log_arp_packet' was not declared. Should it be static? yes net/ipv4/netfilter/nf_reject_ipv4.c:17:6: symbol 'nf_send_reset' was not declared. Should it be static? no; add include Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-08-25netfilter: HAVE_JUMP_LABEL instead of CONFIG_JUMP_LABELZhouyi Zhou
Use HAVE_JUMP_LABEL as elsewhere in the kernel to ensure that the toolchain has the required support in addition to CONFIG_JUMP_LABEL being set. Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-08-08netfilter: don't use mutex_lock_interruptible()Pablo Neira Ayuso
Eric Dumazet reports that getsockopt() or setsockopt() sometimes returns -EINTR instead of -ENOPROTOOPT, causing headaches to application developers. This patch replaces all the mutex_lock_interruptible() by mutex_lock() in the netfilter tree, as there is no reason we should sleep for a long time there. Reported-by: Eric Dumazet <edumazet@google.com> Suggested-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Julian Anastasov <ja@ssi.bg>
2013-10-14netfilter: pass hook ops to hookfnPatrick McHardy
Pass the hook ops to the hookfn to allow for generic hook functions. This change is required by nf_tables. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31netfilter: nf_conntrack: constify sk_buff argument to nf_ct_attach()Patrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-06Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Conflicts: net/netfilter/nf_log.c The conflict in nf_log.c is that in 'net' we added CONFIG_PROC_FS protection around foo_proc_entry() calls to fix a build failure, whereas in Pablo's tree a guard if() test around a call is remove_proc_entry() was removed. Trivially resolved. Pablo Neira Ayuso says: ==================== The following patchset contains the first batch of Netfilter/IPVS updates for your net-next tree, they are: * Three patches with improvements and code refactorization for nfnetlink_queue, from Florian Westphal. * FTP helper now parses replies without brackets, as RFC1123 recommends, from Jeff Mahoney. * Rise a warning to tell everyone about ULOG deprecation, NFLOG has been already in the kernel tree for long time and supersedes the old logging over netlink stub, from myself. * Don't panic if we fail to load netfilter core framework, just bail out instead, from myself. * Add cond_resched_rcu, used by IPVS to allow rescheduling while walking over big hashtables, from Simon Horman. * Change type of IPVS sysctl_sync_qlen_max sysctl to avoid possible overflow, from Zhang Yanfei. * Use strlcpy instead of strncpy to skip zeroing of already initialized area to write the extension names in ebtables, from Chen Gang. * Use already existing per-cpu notrack object from xt_CT, from Eric Dumazet. * Save explicit socket lookup in xt_socket now that we have early demux, also from Eric Dumazet. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-23netfilter: don't panic on error while walking through the init pathPablo Neira Ayuso
Don't panic if we hit an error while adding the nf_log or pernet netfilter support, just bail out. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-05-23netfilter: add nf_ipv6_ops hook to fix xt_addrtype with IPv6Florian Westphal
Quoting https://bugzilla.netfilter.org/show_bug.cgi?id=812: [ ip6tables -m addrtype ] When I tried to use in the nat/PREROUTING it messes up the routing cache even if the rule didn't matched at all. [..] If I remove the --limit-iface-in from the non-working scenario, so just use the -m addrtype --dst-type LOCAL it works! This happens when LOCAL type matching is requested with --limit-iface-in, and the default ipv6 route is via the interface the packet we test arrived on. Because xt_addrtype uses ip6_route_output, the ipv6 routing implementation creates an unwanted cached entry, and the packet won't make it to the real/expected destination. Silently ignoring --limit-iface-in makes the routing work but it breaks rule matching (--dst-type LOCAL with limit-iface-in is supposed to only match if the dst address is configured on the incoming interface; without --limit-iface-in it will match if the address is reachable via lo). The test should call ipv6_chk_addr() instead. However, this would add a link-time dependency on ipv6. There are two possible solutions: 1) Revert the commit that moved ipt_addrtype to xt_addrtype, and put ipv6 specific code into ip6t_addrtype. 2) add new "nf_ipv6_ops" struct to register pointers to ipv6 functions. While the former might seem preferable, Pablo pointed out that there are more xt modules with link-time dependeny issues regarding ipv6, so lets go for 2). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-18netfilter: add my copyright statementsPatrick McHardy
Add copyright statements to all netfilter files which have had significant changes done by myself in the past. Some notes: - nf_conntrack_ecache.c was incorrectly attributed to Rusty and Netfilter Core Team when it got split out of nf_conntrack_core.c. The copyrights even state a date which lies six years before it was written. It was written in 2005 by Harald and myself. - net/ipv{4,6}/netfilter.c, net/netfitler/nf_queue.c were missing copyright statements. I've added the copyright statement from net/netfilter/core.c, where this code originated - for nf_conntrack_proto_tcp.c I've also added Jozsef, since I didn't want it to give the wrong impression Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05netfilter: remove unneeded variable proc_net_netfilterPablo Neira Ayuso
Now that this supports net namespace for nflog and nfqueue, we can remove the global proc_net_netfilter which has no clients anymore. Based on patch from Gao feng. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05netfilter: make /proc/net/netfilter pernetGao feng
This patch makes this proc dentry pernet. So far only init_net had a /proc/net/netfilter directory. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-03netfilter: kill support for per-af queue backendsFlorian Westphal
We used to have several queueing backends, but nowadays only nfnetlink_queue remains. In light of this there doesn't seem to be a good reason to support per-af registering -- just hook up nfnetlink_queue on module load and remove it on unload. This means that the userspace BIND/UNBIND_PF commands are now obsolete; the kernel will ignore them. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-03netfilter: pass 'nf_hook_ops' instead of 'list_head' to nf_queue()Michael Wang
Since 'list_for_each_continue_rcu' has already been replaced by 'list_for_each_entry_continue_rcu', pass 'list_head' to nf_queue() as a parameter can not benefit us any more. This patch will replace 'list_head' with 'nf_hook_ops' as the parameter of nf_queue() and __nf_queue() to save code. Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-03netfilter: pass 'nf_hook_ops' instead of 'list_head' to nf_iterate()Michael Wang
Since 'list_for_each_continue_rcu' has already been replaced by 'list_for_each_entry_continue_rcu', pass 'list_head' to nf_iterate() as a parameter can not benefit us any more. This patch will replace 'list_head' with 'nf_hook_ops' as the parameter of nf_iterate() to save code. Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-30netfilter: add protocol independent NAT corePatrick McHardy
Convert the IPv4 NAT implementation to a protocol independent core and address family specific modules. Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-22netfilter: replace list_for_each_continue_rcu with new interfaceMichael Wang
This patch replaces list_for_each_continue_rcu() with list_for_each_entry_continue_rcu() to allow removing list_for_each_continue_rcu(). Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-22netfilter: nfnetlink_queue: fix compilation with CONFIG_NF_NAT=m and ↵Pablo Neira Ayuso
CONFIG_NF_CT_NETLINK=y LD init/built-in.o net/built-in.o:(.data+0x4408): undefined reference to `nf_nat_tcp_seq_adjust' make: *** [vmlinux] Error 1 This patch adds a new pointer hook (nfq_ct_nat_hook) similar to other existing in Netfilter to solve our complicated configuration dependencies. Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-20netfilter: nfq_ct_hook needs __rcu and __read_mostlyPablo Neira Ayuso
This removes some sparse warnings. Reported-by: Fengguang Wu <wfg@linux.intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16netfilter: add glue code to integrate nfnetlink_queue and ctnetlinkPablo Neira Ayuso
This patch allows you to include the conntrack information together with the packet that is sent to user-space via NFQUEUE. Previously, there was no integration between ctnetlink and nfnetlink_queue. If you wanted to access conntrack information from your libnetfilter_queue program, you required to query ctnetlink from user-space to obtain it. Thus, delaying the packet processing even more. Including the conntrack information is optional, you can set it via NFQA_CFG_F_CONNTRACK flag with the new NFQA_CFG_FLAGS attribute. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-04-20net: Delete all remaining instances of ctl_pathEric W. Biederman
We don't use struct ctl_path anymore so delete the exported constants. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-24static keys: Introduce 'struct static_key', static_key_true()/false() and ↵Ingo Molnar
static_key_slow_[inc|dec]() So here's a boot tested patch on top of Jason's series that does all the cleanups I talked about and turns jump labels into a more intuitive to use facility. It should also address the various misconceptions and confusions that surround jump labels. Typical usage scenarios: #include <linux/static_key.h> struct static_key key = STATIC_KEY_INIT_TRUE; if (static_key_false(&key)) do unlikely code else do likely code Or: if (static_key_true(&key)) do likely code else do unlikely code The static key is modified via: static_key_slow_inc(&key); ... static_key_slow_dec(&key); The 'slow' prefix makes it abundantly clear that this is an expensive operation. I've updated all in-kernel code to use this everywhere. Note that I (intentionally) have not pushed through the rename blindly through to the lowest levels: the actual jump-label patching arch facility should be named like that, so we want to decouple jump labels from the static-key facility a bit. On non-jump-label enabled architectures static keys default to likely()/unlikely() branches. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jason Baron <jbaron@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: a.p.zijlstra@chello.nl Cc: mathieu.desnoyers@efficios.com Cc: davem@davemloft.net Cc: ddaney.cavm@gmail.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-16net:netfilter: use IS_ENABLEDIgor Maravić
Use IS_ENABLED(CONFIG_FOO) instead of defined(CONFIG_FOO) || defined (CONFIG_FOO_MODULE) Signed-off-by: Igor Maravić <igorm@etf.rs> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-21netfilter: use jump_label for nf_hooksEric Dumazet
On configs where CONFIG_JUMP_LABEL=y, we can replace in fast path a load/compare/conditional jump by a single jump with no dcache reference. Jump target is modified as soon as nf_hooks[pf][hook] switches from empty state to non empty states. jump_label state is kept outside of nf_hooks array so has no cost on cpu caches. This patch removes the test on CONFIG_NETFILTER_DEBUG : No need to call nf_hook_slow() at all if nf_hooks[pf][hook] is empty, this didnt give useful information, but slowed down things a lot. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> CC: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-01netfilter: do not propagate nf_queue errors in nf_hook_slowFlorian Westphal
commit f15850861860636c905b33a9a5be3dcbc2b0d56a (netfilter: nfnetlink_queue: return error number to caller) erronously assigns the return value of nf_queue() to the "ret" value. This can cause bogus return values if we encounter QUEUE verdict when bypassing is enabled, the listener does not exist and the next hook returns NF_STOLEN. In this case nf_hook_slow returned -ESRCH instead of 0. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-08-02rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTERStephen Hemminger
When assigning a NULL value to an RCU protected pointer, no barrier is needed. The rcu_assign_pointer, used to handle that but will soon change to not handle the special case. Convert all rcu_assign_pointer of NULL value. //smpl @@ expression P; @@ - rcu_assign_pointer(P, NULL) + RCU_INIT_POINTER(P, NULL) // </smpl> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-19Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/net/e1000e/netdev.c net/xfrm/xfrm_policy.c
2011-02-14netfilter: nf_iterate: fix incorrect RCU usagePatrick McHardy
As noticed by Eric, nf_iterate doesn't use RCU correctly by accessing the prev pointer of a RCU protected list element when a verdict of NF_REPEAT is issued. Fix by jumping backwards to the hook invocation directly instead of loading the previous list element before continuing the list iteration. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18netfilter: allow NFQUEUE bypass if no listener is availableFlorian Westphal
If an skb is to be NF_QUEUE'd, but no program has opened the queue, the packet is dropped. This adds a v2 target revision of xt_NFQUEUE that allows packets to continue through the ruleset instead. Because the actual queueing happens outside of the target context, the 'bypass' flag has to be communicated back to the netfilter core. Unfortunately the only choice to do this without adding a new function argument is to use the target function return value (i.e. the verdict). In the NF_QUEUE case, the upper 16bit already contain the queue number to use. The previous patch reduced NF_VERDICT_MASK to 0xff, i.e. we now have extra room for a new flag. If a hook issued a NF_QUEUE verdict, then the netfilter core will continue packet processing if the queueing hook returns -ESRCH (== "this queue does not exist") and the new NF_VERDICT_FLAG_QUEUE_BYPASS flag is set in the verdict value. Note: If the queue exists, but userspace does not consume packets fast enough, the skb will still be dropped. Signed-off-by: Florian Westphal <fwestphal@astaro.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18netfilter: reduce NF_VERDICT_MASK to 0xffFlorian Westphal
NF_VERDICT_MASK is currently 0xffff. This is because the upper 16 bits are used to store errno (for NF_DROP) or the queue number (NF_QUEUE verdict). As there are up to 0xffff different queues available, there is no more room to store additional flags. At the moment there are only 6 different verdicts, i.e. we can reduce NF_VERDICT_MASK to 0xff to allow storing additional flags in the 0xff00 space. NF_VERDICT_BITS would then be reduced to 8, but because the value is exported to userspace, this might cause breakage; e.g.: e.g. 'queuenr = (1 << NF_VERDICT_BITS) | NF_QUEUE' would now break. Thus, remove NF_VERDICT_BITS usage in the kernel and move the old value to the 'userspace compat' section. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18netfilter: nfnetlink_queue: do not free skb on errorFlorian Westphal
Move free responsibility from nf_queue to caller. This enables more flexible error handling; we can now accept the skb instead of freeing it. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18netfilter: nfnetlink_queue: return error number to callerFlorian Westphal
instead of returning -1 on error, return an error number to allow the caller to handle some errors differently. ECANCELED is used to indicate that the hook is going away and should be ignored. A followup patch will introduce more 'ignore this hook' conditions, (depending on queue settings) and will move kfree_skb responsibility to the caller. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-13Merge branch 'master' of ↵Simon Horman
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 into HEAD
2010-11-17netfilter: allow hooks to pass error code back up the stackEric Paris
SELinux would like to pass certain fatal errors back up the stack. This patch implements the generic netfilter support for this functionality. Based-on-patch-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15netfilter: add __rcu annotationsEric Dumazet
Add some __rcu annotations and use helpers to reduce number of sparse warnings (CONFIG_SPARSE_RCU_POINTER=y) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits) bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL. vlan: Calling vlan_hwaccel_do_receive() is always valid. tproxy: use the interface primary IP address as a default value for --on-ip tproxy: added IPv6 support to the socket match cxgb3: function namespace cleanup tproxy: added IPv6 support to the TPROXY target tproxy: added IPv6 socket lookup function to nf_tproxy_core be2net: Changes to use only priority codes allowed by f/w tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled tproxy: added tproxy sockopt interface in the IPV6 layer tproxy: added udp6_lib_lookup function tproxy: added const specifiers to udp lookup functions tproxy: split off ipv6 defragmentation to a separate module l2tp: small cleanup nf_nat: restrict ICMP translation for embedded header can: mcp251x: fix generation of error frames can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set can-raw: add msg_flags to distinguish local traffic 9p: client code cleanup rds: make local functions/variables static ... Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and drivers/net/wireless/ath/ath9k/debug.c as per David
2010-10-04netfilter: unregister nf hooks, matches and targets in the reverse orderChangli Gao
Since we register nf hooks, matches and targets in order, we'd better unregister them in the reverse order. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-08-19net/netfilter: __rcu annotationsArnd Bergmann
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Patrick McHardy <kaber@trash.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>