From 102c9323c35a83789ad5ebd3c45fa8fb389add88 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (Red Hat)" Date: Fri, 12 Jul 2013 17:07:27 -0400 Subject: tracing: Add __tracepoint_string() to export string pointers There are several tracepoints (mostly in RCU), that reference a string pointer and uses the print format of "%s" to display the string that exists in the kernel, instead of copying the actual string to the ring buffer (saves time and ring buffer space). But this has an issue with userspace tools that read the binary buffers that has the address of the string but has no access to what the string itself is. The end result is just output that looks like: rcu_dyntick: ffffffff818adeaa 1 0 rcu_dyntick: ffffffff818adeb5 0 140000000000000 rcu_dyntick: ffffffff818adeb5 0 140000000000000 rcu_utilization: ffffffff8184333b rcu_utilization: ffffffff8184333b The above is pretty useless when read by the userspace tools. Ideally we would want something that looks like this: rcu_dyntick: Start 1 0 rcu_dyntick: End 0 140000000000000 rcu_dyntick: Start 140000000000000 0 rcu_callback: rcu_preempt rhp=0xffff880037aff710 func=put_cred_rcu 0/4 rcu_callback: rcu_preempt rhp=0xffff880078961980 func=file_free_rcu 0/5 rcu_dyntick: End 0 1 The trace_printk() which also only stores the address of the string format instead of recording the string into the buffer itself, exports the mapping of kernel addresses to format strings via the printk_format file in the debugfs tracing directory. The tracepoint strings can use this same method and output the format to the same file and the userspace tools will be able to decipher the address without any modification. The tracepoint strings need its own section to save the strings because the trace_printk section will cause the trace_printk() buffers to be allocated if anything exists within the section. trace_printk() is only used for debugging and should never exist in the kernel, we can not use the trace_printk sections. Add a new tracepoint_str section that will also be examined by the output of the printk_format file. Cc: Paul E. McKenney Signed-off-by: Steven Rostedt --- include/asm-generic/vmlinux.lds.h | 7 ++++++- include/linux/ftrace_event.h | 34 ++++++++++++++++++++++++++++++++++ 2 files changed, 40 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 69732d279e8b..83e2c31e8b00 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -122,8 +122,12 @@ #define TRACE_PRINTKS() VMLINUX_SYMBOL(__start___trace_bprintk_fmt) = .; \ *(__trace_printk_fmt) /* Trace_printk fmt' pointer */ \ VMLINUX_SYMBOL(__stop___trace_bprintk_fmt) = .; +#define TRACEPOINT_STR() VMLINUX_SYMBOL(__start___tracepoint_str) = .; \ + *(__tracepoint_str) /* Trace_printk fmt' pointer */ \ + VMLINUX_SYMBOL(__stop___tracepoint_str) = .; #else #define TRACE_PRINTKS() +#define TRACEPOINT_STR() #endif #ifdef CONFIG_FTRACE_SYSCALLS @@ -190,7 +194,8 @@ VMLINUX_SYMBOL(__stop___verbose) = .; \ LIKELY_PROFILE() \ BRANCH_PROFILE() \ - TRACE_PRINTKS() + TRACE_PRINTKS() \ + TRACEPOINT_STR() /* * Data section helpers diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h index 4372658c73ae..81af18a75f4d 100644 --- a/include/linux/ftrace_event.h +++ b/include/linux/ftrace_event.h @@ -357,6 +357,40 @@ do { \ __trace_printk(ip, fmt, ##args); \ } while (0) +/** + * tracepoint_string - register constant persistent string to trace system + * @str - a constant persistent string that will be referenced in tracepoints + * + * If constant strings are being used in tracepoints, it is faster and + * more efficient to just save the pointer to the string and reference + * that with a printf "%s" instead of saving the string in the ring buffer + * and wasting space and time. + * + * The problem with the above approach is that userspace tools that read + * the binary output of the trace buffers do not have access to the string. + * Instead they just show the address of the string which is not very + * useful to users. + * + * With tracepoint_string(), the string will be registered to the tracing + * system and exported to userspace via the debugfs/tracing/printk_formats + * file that maps the string address to the string text. This way userspace + * tools that read the binary buffers have a way to map the pointers to + * the ASCII strings they represent. + * + * The @str used must be a constant string and persistent as it would not + * make sense to show a string that no longer exists. But it is still fine + * to be used with modules, because when modules are unloaded, if they + * had tracepoints, the ring buffers are cleared too. As long as the string + * does not change during the life of the module, it is fine to use + * tracepoint_string() within a module. + */ +#define tracepoint_string(str) \ + ({ \ + static const char *___tp_str __tracepoint_string = str; \ + ___tp_str; \ + }) +#define __tracepoint_string __attribute__((section("__tracepoint_str"))) + #ifdef CONFIG_PERF_EVENTS struct perf_event; -- cgit v1.2.3 From e66c33d579ea566d10e8c8695a7168aae3e02992 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (Red Hat)" Date: Fri, 12 Jul 2013 16:50:28 -0400 Subject: rcu: Add const annotation to char * for RCU tracepoints and functions All the RCU tracepoints and functions that reference char pointers do so with just 'char *' even though they do not modify the contents of the string itself. This will cause warnings if a const char * is used in one of these functions. The RCU tracepoints store the pointer to the string to refer back to them when the trace output is displayed. As this can be minutes, hours or even days later, those strings had better be constant. This change also opens the door to allow the RCU tracepoint strings and their addresses to be exported so that userspace tracing tools can translate the contents of the pointers of the RCU tracepoints. Signed-off-by: Steven Rostedt --- include/linux/rcupdate.h | 4 +-- include/trace/events/rcu.h | 82 +++++++++++++++++++++++----------------------- 2 files changed, 43 insertions(+), 43 deletions(-) (limited to 'include') diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 4b14bdc911d7..0c38abbe6e35 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -52,7 +52,7 @@ extern int rcutorture_runnable; /* for sysctl */ #if defined(CONFIG_TREE_RCU) || defined(CONFIG_TREE_PREEMPT_RCU) extern void rcutorture_record_test_transition(void); extern void rcutorture_record_progress(unsigned long vernum); -extern void do_trace_rcu_torture_read(char *rcutorturename, +extern void do_trace_rcu_torture_read(const char *rcutorturename, struct rcu_head *rhp, unsigned long secs, unsigned long c_old, @@ -65,7 +65,7 @@ static inline void rcutorture_record_progress(unsigned long vernum) { } #ifdef CONFIG_RCU_TRACE -extern void do_trace_rcu_torture_read(char *rcutorturename, +extern void do_trace_rcu_torture_read(const char *rcutorturename, struct rcu_head *rhp, unsigned long secs, unsigned long c_old, diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h index 59ebcc89f148..ee2376cfaab3 100644 --- a/include/trace/events/rcu.h +++ b/include/trace/events/rcu.h @@ -19,12 +19,12 @@ */ TRACE_EVENT(rcu_utilization, - TP_PROTO(char *s), + TP_PROTO(const char *s), TP_ARGS(s), TP_STRUCT__entry( - __field(char *, s) + __field(const char *, s) ), TP_fast_assign( @@ -51,14 +51,14 @@ TRACE_EVENT(rcu_utilization, */ TRACE_EVENT(rcu_grace_period, - TP_PROTO(char *rcuname, unsigned long gpnum, char *gpevent), + TP_PROTO(const char *rcuname, unsigned long gpnum, const char *gpevent), TP_ARGS(rcuname, gpnum, gpevent), TP_STRUCT__entry( - __field(char *, rcuname) + __field(const char *, rcuname) __field(unsigned long, gpnum) - __field(char *, gpevent) + __field(const char *, gpevent) ), TP_fast_assign( @@ -89,21 +89,21 @@ TRACE_EVENT(rcu_grace_period, */ TRACE_EVENT(rcu_future_grace_period, - TP_PROTO(char *rcuname, unsigned long gpnum, unsigned long completed, + TP_PROTO(const char *rcuname, unsigned long gpnum, unsigned long completed, unsigned long c, u8 level, int grplo, int grphi, - char *gpevent), + const char *gpevent), TP_ARGS(rcuname, gpnum, completed, c, level, grplo, grphi, gpevent), TP_STRUCT__entry( - __field(char *, rcuname) + __field(const char *, rcuname) __field(unsigned long, gpnum) __field(unsigned long, completed) __field(unsigned long, c) __field(u8, level) __field(int, grplo) __field(int, grphi) - __field(char *, gpevent) + __field(const char *, gpevent) ), TP_fast_assign( @@ -132,13 +132,13 @@ TRACE_EVENT(rcu_future_grace_period, */ TRACE_EVENT(rcu_grace_period_init, - TP_PROTO(char *rcuname, unsigned long gpnum, u8 level, + TP_PROTO(const char *rcuname, unsigned long gpnum, u8 level, int grplo, int grphi, unsigned long qsmask), TP_ARGS(rcuname, gpnum, level, grplo, grphi, qsmask), TP_STRUCT__entry( - __field(char *, rcuname) + __field(const char *, rcuname) __field(unsigned long, gpnum) __field(u8, level) __field(int, grplo) @@ -168,12 +168,12 @@ TRACE_EVENT(rcu_grace_period_init, */ TRACE_EVENT(rcu_preempt_task, - TP_PROTO(char *rcuname, int pid, unsigned long gpnum), + TP_PROTO(const char *rcuname, int pid, unsigned long gpnum), TP_ARGS(rcuname, pid, gpnum), TP_STRUCT__entry( - __field(char *, rcuname) + __field(const char *, rcuname) __field(unsigned long, gpnum) __field(int, pid) ), @@ -195,12 +195,12 @@ TRACE_EVENT(rcu_preempt_task, */ TRACE_EVENT(rcu_unlock_preempted_task, - TP_PROTO(char *rcuname, unsigned long gpnum, int pid), + TP_PROTO(const char *rcuname, unsigned long gpnum, int pid), TP_ARGS(rcuname, gpnum, pid), TP_STRUCT__entry( - __field(char *, rcuname) + __field(const char *, rcuname) __field(unsigned long, gpnum) __field(int, pid) ), @@ -224,14 +224,14 @@ TRACE_EVENT(rcu_unlock_preempted_task, */ TRACE_EVENT(rcu_quiescent_state_report, - TP_PROTO(char *rcuname, unsigned long gpnum, + TP_PROTO(const char *rcuname, unsigned long gpnum, unsigned long mask, unsigned long qsmask, u8 level, int grplo, int grphi, int gp_tasks), TP_ARGS(rcuname, gpnum, mask, qsmask, level, grplo, grphi, gp_tasks), TP_STRUCT__entry( - __field(char *, rcuname) + __field(const char *, rcuname) __field(unsigned long, gpnum) __field(unsigned long, mask) __field(unsigned long, qsmask) @@ -268,15 +268,15 @@ TRACE_EVENT(rcu_quiescent_state_report, */ TRACE_EVENT(rcu_fqs, - TP_PROTO(char *rcuname, unsigned long gpnum, int cpu, char *qsevent), + TP_PROTO(const char *rcuname, unsigned long gpnum, int cpu, const char *qsevent), TP_ARGS(rcuname, gpnum, cpu, qsevent), TP_STRUCT__entry( - __field(char *, rcuname) + __field(const char *, rcuname) __field(unsigned long, gpnum) __field(int, cpu) - __field(char *, qsevent) + __field(const char *, qsevent) ), TP_fast_assign( @@ -308,12 +308,12 @@ TRACE_EVENT(rcu_fqs, */ TRACE_EVENT(rcu_dyntick, - TP_PROTO(char *polarity, long long oldnesting, long long newnesting), + TP_PROTO(const char *polarity, long long oldnesting, long long newnesting), TP_ARGS(polarity, oldnesting, newnesting), TP_STRUCT__entry( - __field(char *, polarity) + __field(const char *, polarity) __field(long long, oldnesting) __field(long long, newnesting) ), @@ -352,12 +352,12 @@ TRACE_EVENT(rcu_dyntick, */ TRACE_EVENT(rcu_prep_idle, - TP_PROTO(char *reason), + TP_PROTO(const char *reason), TP_ARGS(reason), TP_STRUCT__entry( - __field(char *, reason) + __field(const char *, reason) ), TP_fast_assign( @@ -376,13 +376,13 @@ TRACE_EVENT(rcu_prep_idle, */ TRACE_EVENT(rcu_callback, - TP_PROTO(char *rcuname, struct rcu_head *rhp, long qlen_lazy, + TP_PROTO(const char *rcuname, struct rcu_head *rhp, long qlen_lazy, long qlen), TP_ARGS(rcuname, rhp, qlen_lazy, qlen), TP_STRUCT__entry( - __field(char *, rcuname) + __field(const char *, rcuname) __field(void *, rhp) __field(void *, func) __field(long, qlen_lazy) @@ -412,13 +412,13 @@ TRACE_EVENT(rcu_callback, */ TRACE_EVENT(rcu_kfree_callback, - TP_PROTO(char *rcuname, struct rcu_head *rhp, unsigned long offset, + TP_PROTO(const char *rcuname, struct rcu_head *rhp, unsigned long offset, long qlen_lazy, long qlen), TP_ARGS(rcuname, rhp, offset, qlen_lazy, qlen), TP_STRUCT__entry( - __field(char *, rcuname) + __field(const char *, rcuname) __field(void *, rhp) __field(unsigned long, offset) __field(long, qlen_lazy) @@ -447,12 +447,12 @@ TRACE_EVENT(rcu_kfree_callback, */ TRACE_EVENT(rcu_batch_start, - TP_PROTO(char *rcuname, long qlen_lazy, long qlen, long blimit), + TP_PROTO(const char *rcuname, long qlen_lazy, long qlen, long blimit), TP_ARGS(rcuname, qlen_lazy, qlen, blimit), TP_STRUCT__entry( - __field(char *, rcuname) + __field(const char *, rcuname) __field(long, qlen_lazy) __field(long, qlen) __field(long, blimit) @@ -477,12 +477,12 @@ TRACE_EVENT(rcu_batch_start, */ TRACE_EVENT(rcu_invoke_callback, - TP_PROTO(char *rcuname, struct rcu_head *rhp), + TP_PROTO(const char *rcuname, struct rcu_head *rhp), TP_ARGS(rcuname, rhp), TP_STRUCT__entry( - __field(char *, rcuname) + __field(const char *, rcuname) __field(void *, rhp) __field(void *, func) ), @@ -506,12 +506,12 @@ TRACE_EVENT(rcu_invoke_callback, */ TRACE_EVENT(rcu_invoke_kfree_callback, - TP_PROTO(char *rcuname, struct rcu_head *rhp, unsigned long offset), + TP_PROTO(const char *rcuname, struct rcu_head *rhp, unsigned long offset), TP_ARGS(rcuname, rhp, offset), TP_STRUCT__entry( - __field(char *, rcuname) + __field(const char *, rcuname) __field(void *, rhp) __field(unsigned long, offset) ), @@ -539,13 +539,13 @@ TRACE_EVENT(rcu_invoke_kfree_callback, */ TRACE_EVENT(rcu_batch_end, - TP_PROTO(char *rcuname, int callbacks_invoked, + TP_PROTO(const char *rcuname, int callbacks_invoked, bool cb, bool nr, bool iit, bool risk), TP_ARGS(rcuname, callbacks_invoked, cb, nr, iit, risk), TP_STRUCT__entry( - __field(char *, rcuname) + __field(const char *, rcuname) __field(int, callbacks_invoked) __field(bool, cb) __field(bool, nr) @@ -577,13 +577,13 @@ TRACE_EVENT(rcu_batch_end, */ TRACE_EVENT(rcu_torture_read, - TP_PROTO(char *rcutorturename, struct rcu_head *rhp, + TP_PROTO(const char *rcutorturename, struct rcu_head *rhp, unsigned long secs, unsigned long c_old, unsigned long c), TP_ARGS(rcutorturename, rhp, secs, c_old, c), TP_STRUCT__entry( - __field(char *, rcutorturename) + __field(const char *, rcutorturename) __field(struct rcu_head *, rhp) __field(unsigned long, secs) __field(unsigned long, c_old) @@ -623,13 +623,13 @@ TRACE_EVENT(rcu_torture_read, */ TRACE_EVENT(rcu_barrier, - TP_PROTO(char *rcuname, char *s, int cpu, int cnt, unsigned long done), + TP_PROTO(const char *rcuname, const char *s, int cpu, int cnt, unsigned long done), TP_ARGS(rcuname, s, cpu, cnt, done), TP_STRUCT__entry( - __field(char *, rcuname) - __field(char *, s) + __field(const char *, rcuname) + __field(const char *, s) __field(int, cpu) __field(int, cnt) __field(unsigned long, done) -- cgit v1.2.3 From b778ae25366e6f3891fe51306f56a3bca211975d Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 23 Apr 2013 12:51:11 -0700 Subject: debugobjects: Make debug_object_activate() return status In order to better respond to things like duplicate invocations of call_rcu(), RCU needs to see the status of a call to debug_object_activate(). This would allow RCU to leak the callback in order to avoid adding freelist-reuse mischief to the duplicate invoations. This commit therefore makes debug_object_activate() return status, zero for success and -EINVAL for failure. Signed-off-by: Paul E. McKenney Cc: Mathieu Desnoyers Cc: Sedat Dilek Cc: Davidlohr Bueso Cc: Rik van Riel Cc: Thomas Gleixner Cc: Linus Torvalds Tested-by: Sedat Dilek Reviewed-by: Josh Triplett --- include/linux/debugobjects.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/linux/debugobjects.h b/include/linux/debugobjects.h index 0e5f5785d9f2..98ffcbd4888e 100644 --- a/include/linux/debugobjects.h +++ b/include/linux/debugobjects.h @@ -63,7 +63,7 @@ struct debug_obj_descr { extern void debug_object_init (void *addr, struct debug_obj_descr *descr); extern void debug_object_init_on_stack(void *addr, struct debug_obj_descr *descr); -extern void debug_object_activate (void *addr, struct debug_obj_descr *descr); +extern int debug_object_activate (void *addr, struct debug_obj_descr *descr); extern void debug_object_deactivate(void *addr, struct debug_obj_descr *descr); extern void debug_object_destroy (void *addr, struct debug_obj_descr *descr); extern void debug_object_free (void *addr, struct debug_obj_descr *descr); @@ -85,8 +85,8 @@ static inline void debug_object_init (void *addr, struct debug_obj_descr *descr) { } static inline void debug_object_init_on_stack(void *addr, struct debug_obj_descr *descr) { } -static inline void -debug_object_activate (void *addr, struct debug_obj_descr *descr) { } +static inline int +debug_object_activate (void *addr, struct debug_obj_descr *descr) { return 0; } static inline void debug_object_deactivate(void *addr, struct debug_obj_descr *descr) { } static inline void -- cgit v1.2.3 From c34ac00caefbe49d40058ae7200bd58725cebb45 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Fri, 28 Jun 2013 10:34:48 -0700 Subject: rculist: list_first_or_null_rcu() should use list_entry_rcu() list_first_or_null() should test whether the list is empty and return pointer to the first entry if not in a RCU safe manner. It's broken in several ways. * It compares __kernel @__ptr with __rcu @__next triggering the following sparse warning. net/core/dev.c:4331:17: error: incompatible types in comparison expression (different address spaces) * It doesn't perform rcu_dereference*() and computes the entry address using container_of() directly from the __rcu pointer which is inconsitent with other rculist interface. As a result, all three in-kernel users - net/core/dev.c, macvlan, cgroup - are buggy. They dereference the pointer w/o going through read barrier. * While ->next dereference passes through list_next_rcu(), the compiler is still free to fetch ->next more than once and thus nullify the "__ptr != __next" condition check. Fix it by making list_first_or_null_rcu() dereference ->next directly using ACCESS_ONCE() and then use list_entry_rcu() on it like other rculist accessors. v2: Paul pointed out that the compiler may fetch the pointer more than once nullifying the condition check. ACCESS_ONCE() added on ->next dereference. v3: Restored () around macro param which was accidentally removed. Spotted by Paul. Signed-off-by: Tejun Heo Reported-by: Fengguang Wu Cc: Dipankar Sarma Cc: "Paul E. McKenney" Cc: "David S. Miller" Cc: Li Zefan Cc: Patrick McHardy Cc: stable@vger.kernel.org Signed-off-by: Paul E. McKenney Reviewed-by: Josh Triplett --- include/linux/rculist.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/rculist.h b/include/linux/rculist.h index f4b1001a4676..4106721c4e5e 100644 --- a/include/linux/rculist.h +++ b/include/linux/rculist.h @@ -267,8 +267,9 @@ static inline void list_splice_init_rcu(struct list_head *list, */ #define list_first_or_null_rcu(ptr, type, member) \ ({struct list_head *__ptr = (ptr); \ - struct list_head __rcu *__next = list_next_rcu(__ptr); \ - likely(__ptr != __next) ? container_of(__next, type, member) : NULL; \ + struct list_head *__next = ACCESS_ONCE(__ptr->next); \ + likely(__ptr != __next) ? \ + list_entry_rcu(__next, type, member) : NULL; \ }) /** -- cgit v1.2.3 From feed66ed26a53e700ca02ce1744fed7d0c647292 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 9 May 2013 08:55:54 -0700 Subject: rcu: Eliminate unused APIs intended for adaptive ticks The rcu_user_enter_after_irq() and rcu_user_exit_after_irq() functions were intended for use by adaptive ticks, but changes in implementation have rendered them unnecessary. This commit therefore removes them. Reported-by: Frederic Weisbecker Signed-off-by: Paul E. McKenney Reviewed-by: Josh Triplett --- include/linux/rcupdate.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include') diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 0c38abbe6e35..30bea9c25735 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -229,13 +229,9 @@ extern void rcu_irq_exit(void); #ifdef CONFIG_RCU_USER_QS extern void rcu_user_enter(void); extern void rcu_user_exit(void); -extern void rcu_user_enter_after_irq(void); -extern void rcu_user_exit_after_irq(void); #else static inline void rcu_user_enter(void) { } static inline void rcu_user_exit(void) { } -static inline void rcu_user_enter_after_irq(void) { } -static inline void rcu_user_exit_after_irq(void) { } static inline void rcu_user_hooks_switch(struct task_struct *prev, struct task_struct *next) { } #endif /* CONFIG_RCU_USER_QS */ -- cgit v1.2.3 From 5a581b367b5df0531265311fc681c2abd377e5e6 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Sat, 27 Jul 2013 03:53:54 -0700 Subject: jiffies: Avoid undefined behavior from signed overflow According to the C standard 3.4.3p3, overflow of a signed integer results in undefined behavior. This commit therefore changes the definitions of time_after(), time_after_eq(), time_after64(), and time_after_eq64() to avoid this undefined behavior. The trick is that the subtraction is done using unsigned arithmetic, which according to 6.2.5p9 cannot overflow because it is defined as modulo arithmetic. This has the added (though admittedly quite small) benefit of shortening four lines of code by four characters each. Note that the C standard considers the cast from unsigned to signed to be implementation-defined, see 6.3.1.3p3. However, on a two's-complement system, an implementation that defines anything other than a reinterpretation of the bits is free to come to me, and I will be happy to act as a witness for its being committed to an insane asylum. (Although I have nothing against saturating arithmetic or signals in some cases, these things really should not be the default when compiling an operating-system kernel.) Signed-off-by: Paul E. McKenney Cc: John Stultz Cc: "David S. Miller" Cc: Arnd Bergmann Cc: Ingo Molnar Cc: Linus Torvalds Cc: Eric Dumazet Cc: Kevin Easton [ paulmck: Included time_after64() and time_after_eq64(), as suggested by Eric Dumazet, also fixed commit message.] Reviewed-by: Josh Triplett --- include/linux/jiffies.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include') diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h index 97ba4e78a37e..d235e88cfd7c 100644 --- a/include/linux/jiffies.h +++ b/include/linux/jiffies.h @@ -101,13 +101,13 @@ static inline u64 get_jiffies_64(void) #define time_after(a,b) \ (typecheck(unsigned long, a) && \ typecheck(unsigned long, b) && \ - ((long)(b) - (long)(a) < 0)) + ((long)((b) - (a)) < 0)) #define time_before(a,b) time_after(b,a) #define time_after_eq(a,b) \ (typecheck(unsigned long, a) && \ typecheck(unsigned long, b) && \ - ((long)(a) - (long)(b) >= 0)) + ((long)((a) - (b)) >= 0)) #define time_before_eq(a,b) time_after_eq(b,a) /* @@ -130,13 +130,13 @@ static inline u64 get_jiffies_64(void) #define time_after64(a,b) \ (typecheck(__u64, a) && \ typecheck(__u64, b) && \ - ((__s64)(b) - (__s64)(a) < 0)) + ((__s64)((b) - (a)) < 0)) #define time_before64(a,b) time_after64(b,a) #define time_after_eq64(a,b) \ (typecheck(__u64, a) && \ typecheck(__u64, b) && \ - ((__s64)(a) - (__s64)(b) >= 0)) + ((__s64)((a) - (b)) >= 0)) #define time_before_eq64(a,b) time_after_eq64(b,a) #define time_in_range64(a, b, c) \ -- cgit v1.2.3 From 0edd1b1784cbdad55aca2c1293be018f53c0ab1d Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Fri, 21 Jun 2013 16:37:22 -0700 Subject: nohz_full: Add full-system-idle state machine This commit adds the state machine that takes the per-CPU idle data as input and produces a full-system-idle indication as output. This state machine is driven out of RCU's quiescent-state-forcing mechanism, which invokes rcu_sysidle_check_cpu() to collect per-CPU idle state and then rcu_sysidle_report() to drive the state machine. The full-system-idle state is sampled using rcu_sys_is_idle(), which also drives the state machine if RCU is idle (and does so by forcing RCU to become non-idle). This function returns true if all but the timekeeping CPU (tick_do_timer_cpu) are idle and have been idle long enough to avoid memory contention on the full_sysidle_state state variable. The rcu_sysidle_force_exit() may be called externally to reset the state machine back into non-idle state. For large systems the state machine is driven out of RCU's force-quiescent-state logic, which provides good scalability at the price of millisecond-scale latencies on the transition to full-system-idle state. This is not so good for battery-powered systems, which are usually small enough that they don't need to care about scalability, but which do care deeply about energy efficiency. Small systems therefore drive the state machine directly out of the idle-entry code. The number of CPUs in a "small" system is defined by a new NO_HZ_FULL_SYSIDLE_SMALL Kconfig parameter, which defaults to 8. Note that this is a build-time definition. Signed-off-by: Paul E. McKenney Cc: Frederic Weisbecker Cc: Steven Rostedt Cc: Lai Jiangshan [ paulmck: Use true and false for boolean constants per Lai Jiangshan. ] Reviewed-by: Josh Triplett [ paulmck: Simplify logic and provide better comments for memory barriers, based on review comments and questions by Lai Jiangshan. ] --- include/linux/rcupdate.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) (limited to 'include') diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 30bea9c25735..f1f1bc39346b 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -1011,4 +1011,22 @@ static inline bool rcu_is_nocb_cpu(int cpu) { return false; } #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */ +/* Only for use by adaptive-ticks code. */ +#ifdef CONFIG_NO_HZ_FULL_SYSIDLE +extern bool rcu_sys_is_idle(void); +extern void rcu_sysidle_force_exit(void); +#else /* #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */ + +static inline bool rcu_sys_is_idle(void) +{ + return false; +} + +static inline void rcu_sysidle_force_exit(void) +{ +} + +#endif /* #else #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */ + + #endif /* __LINUX_RCUPDATE_H */ -- cgit v1.2.3