summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorDavid S. Miller <davem@davemloft.net>2013-10-17 15:22:05 -0400
committerDavid S. Miller <davem@davemloft.net>2013-10-17 15:22:05 -0400
commitda33edccebcc36d387423dcdb557094fbda55994 (patch)
tree9f426a52f875169ae24e54a395beedc697c96e02
parent78dea8cc4942c6adbcccc8f483463906a078f039 (diff)
parented683f138b3dbc8a5e878e24a0bfa0bb61043a09 (diff)
Merge branch 'net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables
Pablo Neira Ayuso says: ==================== netfilter updates: nf_tables pull request The following patchset contains the current original nf_tables tree condensed in 17 patches. I have organized them by chronogical order since the original nf_tables code was released in 2009 and by dependencies between the different patches. The patches are: 1) Adapt all existing hooks in the tree to pass hook ops to the hook callback function, required by nf_tables, from Patrick McHardy. 2) Move alloc_null_binding to nf_nat_core, as it is now also needed by nf_tables and ip_tables, original patch from Patrick McHardy but required major changes to adapt it to the current tree that I made. 3) Add nf_tables core, including the netlink API, the packet filtering engine, expressions and built-in tables, from Patrick McHardy. This patch includes accumulated fixes since 2009 and minor enhancements. The patch description contains a list of references to the original patches for the record. For those that are not familiar to the original work, see [1], [2] and [3]. 4) Add netlink set API, this replaces the original set infrastructure to introduce a netlink API to add/delete sets and to add/delete set elements. This includes two set types: the hash and the rb-tree sets (used for interval based matching). The main difference with ipset is that this infrastructure is data type agnostic. Patch from Patrick McHardy. 5) Allow expression operation overload, this API change allows us to provide define expression subtypes depending on the configuration that is received from user-space via Netlink. It is used by follow up patches to provide optimized versions of the payload and cmp expressions and the x_tables compatibility layer, from Patrick McHardy. 6) Add optimized data comparison operation, it requires the previous patch, from Patrick McHardy. 7) Add optimized payload implementation, it requires patch 5, from Patrick McHardy. 8) Convert built-in tables to chain types. Each chain type have special semantics (filter, route and nat) that are used by userspace to configure the chain behaviour. The main chain regarding iptables is that tables become containers of chain, with no specific semantics. However, you may still configure your tables and chains to retain iptables like semantics, patch from me. 9) Add compatibility layer for x_tables. This patch adds support to use all existing x_tables extensions from nf_tables, this is used to provide a userspace utility that accepts iptables syntax but used internally the nf_tables kernel core. This patch includes missing features in the nf_tables core such as the per-chain stats, default chain policy and number of chain references, which are required by the iptables compatibility userspace tool. Patch from me. 10) Fix transport protocol matching, this fix is a side effect of the x_tables compatibility layer, which now provides a pointer to the transport header, from me. 11) Add support for dormant tables, this feature allows you to disable all chains and rules that are contained in one table, from me. 12) Add IPv6 NAT support. At the time nf_tables was made, there was no NAT IPv6 support yet, from Tomasz Bursztyka. 13) Complete net namespace support. This patch register the protocol family per net namespace, so tables (thus, other objects contained in tables such as sets, chains and rules) are only visible from the corresponding net namespace, from me. 14) Add the insert operation to the nf_tables netlink API, this requires adding a new position attribute that allow us to locate where in the ruleset a rule needs to be inserted, from Eric Leblond. 15) Add rule batching support, including atomic rule-set updates by using rule-set generations. This patch includes a change to nfnetlink to include two new control messages to indicate the beginning and the end of a batch. The end message is interpreted as the commit message, if it's missing, then the rule-set updates contained in the batch are aborted, from me. 16) Add trace support to the nf_tables packet filtering core, from me. 17) Add ARP filtering support, original patch from Patrick McHardy, but adapted to fit into the chain type infrastructure. This was recovered to be used by nft userspace tool and our compatibility arptables userspace tool. There is still work to do to fully replace x_tables [4] [5] but that can be done incrementally by extending our netlink API. Moreover, looking at netfilter-devel and the amount of contributions to nf_tables we've been getting, I think it would be good to have it mainstream to avoid accumulating large patchsets skip continuous rebases. I tried to provide a reasonable patchset, we have more than 100 accumulated patches in the original nf_tables tree, so I collapsed many of the small fixes to the main patch we had since 2009 and provide a small batch for review to netdev, while trying to retain part of the history. For those who didn't give a try to nf_tables yet, there's a quick howto available from Eric Leblond that describes how to get things working [6]. Comments/reviews welcome. Thanks! [1] http://lwn.net/Articles/324251/ [2] http://workshop.netfilter.org/2013/wiki/images/e/ee/Nftables-osd-2013-developer.pdf [3] http://lwn.net/Articles/564095/ [4] http://people.netfilter.org/pablo/map-pending-work.txt [4] http://people.netfilter.org/pablo/nftables-todo.txt [5] https://home.regit.org/netfilter-en/nftables-quick-howto/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-rw-r--r--include/linux/netfilter.h14
-rw-r--r--include/linux/netfilter/nfnetlink.h5
-rw-r--r--include/net/net_namespace.h4
-rw-r--r--include/net/netfilter/nf_nat.h3
-rw-r--r--include/net/netfilter/nf_tables.h522
-rw-r--r--include/net/netfilter/nf_tables_core.h42
-rw-r--r--include/net/netfilter/nf_tables_ipv4.h23
-rw-r--r--include/net/netfilter/nf_tables_ipv6.h30
-rw-r--r--include/net/netns/nftables.h19
-rw-r--r--include/uapi/linux/netfilter/Kbuild2
-rw-r--r--include/uapi/linux/netfilter/nf_conntrack_common.h4
-rw-r--r--include/uapi/linux/netfilter/nf_tables.h718
-rw-r--r--include/uapi/linux/netfilter/nf_tables_compat.h38
-rw-r--r--include/uapi/linux/netfilter/nfnetlink.h10
-rw-r--r--net/bridge/br_netfilter.c22
-rw-r--r--net/bridge/netfilter/Kconfig3
-rw-r--r--net/bridge/netfilter/Makefile2
-rw-r--r--net/bridge/netfilter/ebtable_filter.c16
-rw-r--r--net/bridge/netfilter/ebtable_nat.c16
-rw-r--r--net/bridge/netfilter/nf_tables_bridge.c65
-rw-r--r--net/decnet/netfilter/dn_rtmsg.c2
-rw-r--r--net/ipv4/netfilter/Kconfig21
-rw-r--r--net/ipv4/netfilter/Makefile6
-rw-r--r--net/ipv4/netfilter/arptable_filter.c5
-rw-r--r--net/ipv4/netfilter/ipt_CLUSTERIP.c2
-rw-r--r--net/ipv4/netfilter/ipt_SYNPROXY.c2
-rw-r--r--net/ipv4/netfilter/iptable_filter.c7
-rw-r--r--net/ipv4/netfilter/iptable_mangle.c10
-rw-r--r--net/ipv4/netfilter/iptable_nat.c26
-rw-r--r--net/ipv4/netfilter/iptable_raw.c6
-rw-r--r--net/ipv4/netfilter/iptable_security.c7
-rw-r--r--net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c12
-rw-r--r--net/ipv4/netfilter/nf_defrag_ipv4.c6
-rw-r--r--net/ipv4/netfilter/nf_tables_arp.c102
-rw-r--r--net/ipv4/netfilter/nf_tables_ipv4.c128
-rw-r--r--net/ipv4/netfilter/nft_chain_nat_ipv4.c205
-rw-r--r--net/ipv4/netfilter/nft_chain_route_ipv4.c90
-rw-r--r--net/ipv4/netfilter/nft_reject_ipv4.c123
-rw-r--r--net/ipv6/netfilter/Kconfig13
-rw-r--r--net/ipv6/netfilter/Makefile5
-rw-r--r--net/ipv6/netfilter/ip6t_SYNPROXY.c2
-rw-r--r--net/ipv6/netfilter/ip6table_filter.c5
-rw-r--r--net/ipv6/netfilter/ip6table_mangle.c10
-rw-r--r--net/ipv6/netfilter/ip6table_nat.c27
-rw-r--r--net/ipv6/netfilter/ip6table_raw.c5
-rw-r--r--net/ipv6/netfilter/ip6table_security.c5
-rw-r--r--net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c14
-rw-r--r--net/ipv6/netfilter/nf_defrag_ipv6_hooks.c6
-rw-r--r--net/ipv6/netfilter/nf_tables_ipv6.c127
-rw-r--r--net/ipv6/netfilter/nft_chain_nat_ipv6.c211
-rw-r--r--net/ipv6/netfilter/nft_chain_route_ipv6.c88
-rw-r--r--net/netfilter/Kconfig52
-rw-r--r--net/netfilter/Makefile18
-rw-r--r--net/netfilter/core.c2
-rw-r--r--net/netfilter/ipvs/ip_vs_core.c42
-rw-r--r--net/netfilter/nf_nat_core.c20
-rw-r--r--net/netfilter/nf_tables_api.c3275
-rw-r--r--net/netfilter/nf_tables_core.c270
-rw-r--r--net/netfilter/nfnetlink.c175
-rw-r--r--net/netfilter/nft_bitwise.c146
-rw-r--r--net/netfilter/nft_byteorder.c173
-rw-r--r--net/netfilter/nft_cmp.c223
-rw-r--r--net/netfilter/nft_compat.c768
-rw-r--r--net/netfilter/nft_counter.c113
-rw-r--r--net/netfilter/nft_ct.c258
-rw-r--r--net/netfilter/nft_expr_template.c94
-rw-r--r--net/netfilter/nft_exthdr.c133
-rw-r--r--net/netfilter/nft_hash.c231
-rw-r--r--net/netfilter/nft_immediate.c132
-rw-r--r--net/netfilter/nft_limit.c119
-rw-r--r--net/netfilter/nft_log.c146
-rw-r--r--net/netfilter/nft_lookup.c141
-rw-r--r--net/netfilter/nft_meta.c228
-rw-r--r--net/netfilter/nft_meta_target.c117
-rw-r--r--net/netfilter/nft_nat.c220
-rw-r--r--net/netfilter/nft_payload.c160
-rw-r--r--net/netfilter/nft_rbtree.c247
-rw-r--r--security/selinux/hooks.c10
78 files changed, 10217 insertions, 132 deletions
diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 61223c52414f..2077489f9887 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -42,7 +42,8 @@ int netfilter_init(void);
struct sk_buff;
-typedef unsigned int nf_hookfn(unsigned int hooknum,
+struct nf_hook_ops;
+typedef unsigned int nf_hookfn(const struct nf_hook_ops *ops,
struct sk_buff *skb,
const struct net_device *in,
const struct net_device *out,
@@ -52,12 +53,13 @@ struct nf_hook_ops {
struct list_head list;
/* User fills in from here down. */
- nf_hookfn *hook;
- struct module *owner;
- u_int8_t pf;
- unsigned int hooknum;
+ nf_hookfn *hook;
+ struct module *owner;
+ void *priv;
+ u_int8_t pf;
+ unsigned int hooknum;
/* Hooks are ordered in ascending priority. */
- int priority;
+ int priority;
};
struct nf_sockopt_ops {
diff --git a/include/linux/netfilter/nfnetlink.h b/include/linux/netfilter/nfnetlink.h
index 4f68cd7141d2..28c74367e900 100644
--- a/include/linux/netfilter/nfnetlink.h
+++ b/include/linux/netfilter/nfnetlink.h
@@ -14,6 +14,9 @@ struct nfnl_callback {
int (*call_rcu)(struct sock *nl, struct sk_buff *skb,
const struct nlmsghdr *nlh,
const struct nlattr * const cda[]);
+ int (*call_batch)(struct sock *nl, struct sk_buff *skb,
+ const struct nlmsghdr *nlh,
+ const struct nlattr * const cda[]);
const struct nla_policy *policy; /* netlink attribute policy */
const u_int16_t attr_count; /* number of nlattr's */
};
@@ -23,6 +26,8 @@ struct nfnetlink_subsystem {
__u8 subsys_id; /* nfnetlink subsystem ID */
__u8 cb_count; /* number of callbacks */
const struct nfnl_callback *cb; /* callback for individual types */
+ int (*commit)(struct sk_buff *skb);
+ int (*abort)(struct sk_buff *skb);
};
int nfnetlink_subsys_register(const struct nfnetlink_subsystem *n);
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index bcc4a8ed4450..da68c9a90ac5 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -22,6 +22,7 @@
#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
#include <net/netns/conntrack.h>
#endif
+#include <net/netns/nftables.h>
#include <net/netns/xfrm.h>
struct user_namespace;
@@ -101,6 +102,9 @@ struct net {
#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
struct netns_ct ct;
#endif
+#if defined(CONFIG_NF_TABLES) || defined(CONFIG_NF_TABLES_MODULE)
+ struct netns_nftables nft;
+#endif
#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
struct netns_nf_frag nf_frag;
#endif
diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h
index c29b4e545f87..07eaaf604092 100644
--- a/include/net/netfilter/nf_nat.h
+++ b/include/net/netfilter/nf_nat.h
@@ -45,6 +45,9 @@ unsigned int nf_nat_setup_info(struct nf_conn *ct,
const struct nf_nat_range *range,
enum nf_nat_manip_type maniptype);
+extern unsigned int nf_nat_alloc_null_binding(struct nf_conn *ct,
+ unsigned int hooknum);
+
/* Is this tuple already taken? (not by us)*/
int nf_nat_used_tuple(const struct nf_conntrack_tuple *tuple,
const struct nf_conn *ignored_conntrack);
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
new file mode 100644
index 000000000000..54c4a5cafb64
--- /dev/null
+++ b/include/net/netfilter/nf_tables.h
@@ -0,0 +1,522 @@
+#ifndef _NET_NF_TABLES_H
+#define _NET_NF_TABLES_H
+
+#include <linux/list.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/nf_tables.h>
+#include <net/netlink.h>
+
+#define NFT_JUMP_STACK_SIZE 16
+
+struct nft_pktinfo {
+ struct sk_buff *skb;
+ const struct net_device *in;
+ const struct net_device *out;
+ u8 hooknum;
+ u8 nhoff;
+ u8 thoff;
+ /* for x_tables compatibility */
+ struct xt_action_param xt;
+};
+
+static inline void nft_set_pktinfo(struct nft_pktinfo *pkt,
+ const struct nf_hook_ops *ops,
+ struct sk_buff *skb,
+ const struct net_device *in,
+ const struct net_device *out)
+{
+ pkt->skb = skb;
+ pkt->in = pkt->xt.in = in;
+ pkt->out = pkt->xt.out = out;
+ pkt->hooknum = pkt->xt.hooknum = ops->hooknum;
+ pkt->xt.family = ops->pf;
+}
+
+struct nft_data {
+ union {
+ u32 data[4];
+ struct {
+ u32 verdict;
+ struct nft_chain *chain;
+ };
+ };
+} __attribute__((aligned(__alignof__(u64))));
+
+static inline int nft_data_cmp(const struct nft_data *d1,
+ const struct nft_data *d2,
+ unsigned int len)
+{
+ return memcmp(d1->data, d2->data, len);
+}
+
+static inline void nft_data_copy(struct nft_data *dst,
+ const struct nft_data *src)
+{
+ BUILD_BUG_ON(__alignof__(*dst) != __alignof__(u64));
+ *(u64 *)&dst->data[0] = *(u64 *)&src->data[0];
+ *(u64 *)&dst->data[2] = *(u64 *)&src->data[2];
+}
+
+static inline void nft_data_debug(const struct nft_data *data)
+{
+ pr_debug("data[0]=%x data[1]=%x data[2]=%x data[3]=%x\n",
+ data->data[0], data->data[1],
+ data->data[2], data->data[3]);
+}
+
+/**
+ * struct nft_ctx - nf_tables rule/set context
+ *
+ * @net: net namespace
+ * @skb: netlink skb
+ * @nlh: netlink message header
+ * @afi: address family info
+ * @table: the table the chain is contained in
+ * @chain: the chain the rule is contained in
+ * @nla: netlink attributes
+ */
+struct nft_ctx {
+ struct net *net;
+ const struct sk_buff *skb;
+ const struct nlmsghdr *nlh;
+ const struct nft_af_info *afi;
+ const struct nft_table *table;
+ const struct nft_chain *chain;
+ const struct nlattr * const *nla;
+};
+
+struct nft_data_desc {
+ enum nft_data_types type;
+ unsigned int len;
+};
+
+extern int nft_data_init(const struct nft_ctx *ctx, struct nft_data *data,
+ struct nft_data_desc *desc, const struct nlattr *nla);
+extern void nft_data_uninit(const struct nft_data *data,
+ enum nft_data_types type);
+extern int nft_data_dump(struct sk_buff *skb, int attr,
+ const struct nft_data *data,
+ enum nft_data_types type, unsigned int len);
+
+static inline enum nft_data_types nft_dreg_to_type(enum nft_registers reg)
+{
+ return reg == NFT_REG_VERDICT ? NFT_DATA_VERDICT : NFT_DATA_VALUE;
+}
+
+static inline enum nft_registers nft_type_to_reg(enum nft_data_types type)
+{
+ return type == NFT_DATA_VERDICT ? NFT_REG_VERDICT : NFT_REG_1;
+}
+
+extern int nft_validate_input_register(enum nft_registers reg);
+extern int nft_validate_output_register(enum nft_registers reg);
+extern int nft_validate_data_load(const struct nft_ctx *ctx,
+ enum nft_registers reg,
+ const struct nft_data *data,
+ enum nft_data_types type);
+
+/**
+ * struct nft_set_elem - generic representation of set elements
+ *
+ * @cookie: implementation specific element cookie
+ * @key: element key
+ * @data: element data (maps only)
+ * @flags: element flags (end of interval)
+ *
+ * The cookie can be used to store a handle to the element for subsequent
+ * removal.
+ */
+struct nft_set_elem {
+ void *cookie;
+ struct nft_data key;
+ struct nft_data data;
+ u32 flags;
+};
+
+struct nft_set;
+struct nft_set_iter {
+ unsigned int count;
+ unsigned int skip;
+ int err;
+ int (*fn)(const struct nft_ctx *ctx,
+ const struct nft_set *set,
+ const struct nft_set_iter *iter,
+ const struct nft_set_elem *elem);
+};
+
+/**
+ * struct nft_set_ops - nf_tables set operations
+ *
+ * @lookup: look up an element within the set
+ * @insert: insert new element into set
+ * @remove: remove element from set
+ * @walk: iterate over all set elemeennts
+ * @privsize: function to return size of set private data
+ * @init: initialize private data of new set instance
+ * @destroy: destroy private data of set instance
+ * @list: nf_tables_set_ops list node
+ * @owner: module reference
+ * @features: features supported by the implementation
+ */
+struct nft_set_ops {
+ bool (*lookup)(const struct nft_set *set,
+ const struct nft_data *key,
+ struct nft_data *data);
+ int (*get)(const struct nft_set *set,
+ struct nft_set_elem *elem);
+ int (*insert)(const struct nft_set *set,
+ const struct nft_set_elem *elem);
+ void (*remove)(const struct nft_set *set,
+ const struct nft_set_elem *elem);
+ void (*walk)(const struct nft_ctx *ctx,
+ const struct nft_set *set,
+ struct nft_set_iter *iter);
+
+ unsigned int (*privsize)(const struct nlattr * const nla[]);
+ int (*init)(const struct nft_set *set,
+ const struct nlattr * const nla[]);
+ void (*destroy)(const struct nft_set *set);
+
+ struct list_head list;
+ struct module *owner;
+ u32 features;
+};
+
+extern int nft_register_set(struct nft_set_ops *ops);
+extern void nft_unregister_set(struct nft_set_ops *ops);
+
+/**
+ * struct nft_set - nf_tables set instance
+ *
+ * @list: table set list node
+ * @bindings: list of set bindings
+ * @name: name of the set
+ * @ktype: key type (numeric type defined by userspace, not used in the kernel)
+ * @dtype: data type (verdict or numeric type defined by userspace)
+ * @ops: set ops
+ * @flags: set flags
+ * @klen: key length
+ * @dlen: data length
+ * @data: private set data
+ */
+struct nft_set {
+ struct list_head list;
+ struct list_head bindings;
+ char name[IFNAMSIZ];
+ u32 ktype;
+ u32 dtype;
+ /* runtime data below here */
+ const struct nft_set_ops *ops ____cacheline_aligned;
+ u16 flags;
+ u8 klen;
+ u8 dlen;
+ unsigned char data[]
+ __attribute__((aligned(__alignof__(u64))));
+};
+
+static inline void *nft_set_priv(const struct nft_set *set)
+{
+ return (void *)set->data;
+}
+
+extern struct nft_set *nf_tables_set_lookup(const struct nft_table *table,
+ const struct nlattr *nla);
+
+/**
+ * struct nft_set_binding - nf_tables set binding
+ *
+ * @list: set bindings list node
+ * @chain: chain containing the rule bound to the set
+ *
+ * A set binding contains all information necessary for validation
+ * of new elements added to a bound set.
+ */
+struct nft_set_binding {
+ struct list_head list;
+ const struct nft_chain *chain;
+};
+
+extern int nf_tables_bind_set(const struct nft_ctx *ctx, struct nft_set *set,
+ struct nft_set_binding *binding);
+extern void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set,
+ struct nft_set_binding *binding);
+
+
+/**
+ * struct nft_expr_type - nf_tables expression type
+ *
+ * @select_ops: function to select nft_expr_ops
+ * @ops: default ops, used when no select_ops functions is present
+ * @list: used internally
+ * @name: Identifier
+ * @owner: module reference
+ * @policy: netlink attribute policy
+ * @maxattr: highest netlink attribute number
+ */
+struct nft_expr_type {
+ const struct nft_expr_ops *(*select_ops)(const struct nft_ctx *,
+ const struct nlattr * const tb[]);
+ const struct nft_expr_ops *ops;
+ struct list_head list;
+ const char *name;
+ struct module *owner;
+ const struct nla_policy *policy;
+ unsigned int maxattr;
+};
+
+/**
+ * struct nft_expr_ops - nf_tables expression operations
+ *
+ * @eval: Expression evaluation function
+ * @size: full expression size, including private data size
+ * @init: initialization function
+ * @destroy: destruction function
+ * @dump: function to dump parameters
+ * @type: expression type
+ * @validate: validate expression, called during loop detection
+ * @data: extra data to attach to this expression operation
+ */
+struct nft_expr;
+struct nft_expr_ops {
+ void (*eval)(const struct nft_expr *expr,
+ struct nft_data data[NFT_REG_MAX + 1],
+ const struct nft_pktinfo *pkt);
+ unsigned int size;
+
+ int (*init)(const struct nft_ctx *ctx,
+ const struct nft_expr *expr,
+ const struct nlattr * const tb[]);
+ void (*destroy)(const struct nft_expr *expr);
+