summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2015-07-04block: Add support for DAX reads/writes to block devicesMatthew Wilcox
If a block device supports the ->direct_access methods, bypass the normal DIO path and use DAX to go straight to memcpy() instead of allocating a DIO and a BIO. Includes support for the DIO_SKIP_DIO_COUNT flag in DAX, as is done in do_blockdev_direct_IO(). Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-04dax: Use copy_from_iter_nocacheMatthew Wilcox
When userspace does a write, there's no need for the written data to pollute the CPU cache. This matches the original XIP code. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-04dax: Add block size note to documentationMatthew Wilcox
For block devices which are small enough, mkfs will default to creating a filesystem with block sizes smaller than page size. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Except for the preempt notifiers fix, these are all small bugfixes that could have been waited for -rc2. Sending them now since I was taking care of Peter's patch anyway" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: add hyper-v crash msrs values KVM: x86: remove data variable from kvm_get_msr_common KVM: s390: virtio-ccw: don't overwrite config space values KVM: x86: keep track of LVT0 changes under APICv KVM: x86: properly restore LVT0 KVM: x86: make vapics_in_nmi_mode atomic sched, preempt_notifier: separate notifier registration from static_key inc/dec
2015-07-04NTB: Add split BAR output for debugfs statsDave Jiang
When split BAR is enabled, the driver needs to dump out the split BAR registers rather than the original 64bit BAR registers. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Change WARN_ON_ONCE to pr_warn_once on unsafeDave Jiang
The unsafe doorbell and scratchpad access should display reason when WARN is called. Otherwise we get a stack dump without any explanation. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Print driver name and version in module initDave Jiang
Printouts driver name and version to indicate what is being loaded. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Increase transport MTU to 64k from 16kDave Jiang
Benchmarking showed a significant performance increase with the MTU size to 64k instead of 16k. Change the driver default to 64k. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Rename Intel code names to platform namesDave Jiang
Instead of using the platform code names, use the correct platform names to identify the respective Intel NTB hardware. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Default to CPU memcpy for performanceDave Jiang
Disable DMA usage by default, since the CPU provides much better performance with write combining. Provide a module parameter to enable DMA usage when offloading the memcpy is preferred. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Improve performance with write combiningDave Jiang
Changing the memory window BAR mappings to write combining significantly boosts the performance. We will also use memcpy that uses non-temporal store, which showed performance improvement when doing non-cached memcpys. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Use NUMA memory in Intel driverAllen Hubbe
Allocate memory for the NUMA node of the NTB device. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Use NUMA memory and DMA chan in transportAllen Hubbe
Allocate memory and request the DMA channel for the same NUMA node as the NTB device. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Rate limit ntb_qp_link_workAllen Hubbe
When the ntb transport is connecting and waiting for the peer, the debug console receives lots of debug level messages about the remote qp link status being down. Rate limit those messages. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Add tool test clientAllen Hubbe
This is a simple debugging driver that enables the doorbell and scratch pad registers to be read and written from the debugfs. This tool enables more complicated debugging to be scripted from user space. This driver may be used to test that your ntb hardware and drivers are functioning at a basic level. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Add ping pong test clientAllen Hubbe
This is a simple ping pong driver that exercises the scratch pads and doorbells of the ntb hardware. This driver may be used to test that your ntb hardware and drivers are functioning at a basic level. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Add parameters for Intel SNB B2B addressesAllen Hubbe
Add module parameters for the addresses to be used in B2B topology. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Reset transport QP link stats on downAllen Hubbe
Reset the link stats when the link goes down. In particular, the TX and RX index and count must be reset, or else the TX side will be sending packets to the RX side where the RX side is not expecting them. Reset all the stats, to be consistent. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Do not advance transport RX on link downAllen Hubbe
On link down, don't advance RX index to the next entry. The next entry should never be valid after receiving the link down flag. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Differentiate transport link down messagesAllen Hubbe
The same message "qp %d: Link Down\n" was printed at two locations in ntb_transport. Change the messages so they are distinct. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Check the device ID to set errata flagsDave Jiang
Set errata flags for the specific device IDs to which they apply, instead of the whole Xeon hardware class. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Enable link for Intel root port mode in probeDave Jiang
Link training should be enabled in the driver probe for root port mode. We should not have to wait for transport to be loaded for this to happen. Otherwise the ntb device will not show up on the transparent bridge side of the link. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Read peer info from local SPAD in transportDave Jiang
The transport was writing and then reading the peer scratch pad, essentially reading what it just wrote instead of exchanging any information with the peer. The transport expects the peer values to be the same as the local values, so this issue was not obvious. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Split ntb_hw_intel and ntb_transport driversAllen Hubbe
Change ntb_hw_intel to use the new NTB hardware abstraction layer. Split ntb_transport into its own driver. Change it to use the new NTB hardware abstraction layer. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Add NTB hardware abstraction layerAllen Hubbe
Abstract the NTB device behind a programming interface, so that it can support different hardware and client drivers. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq update from Thomas Gleixner: "The last update for 4.2 is just moving a macro from a local header to the global one, so it can be used in architecture code as well. Cleanup of the now empty local header is 4.3 material" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip: Move IRQCHIP_DECLARE macro to include/linux/irqchip.h
2015-07-04Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two FPU rewrite related fixes. This addresses all known x86 regressions at this stage. Also some other misc fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Fix boot crash in the early FPU code x86/asm/entry/64: Update path names x86/fpu: Fix FPU related boot regression when CPUID masking BIOS feature is enabled x86/boot/setup: Clean up the e820_reserve_setup_data() code x86/kaslr: Fix typo in the KASLR_FLAG documentation
2015-07-04Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Debug info and other statistics fixes and related enhancements" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/numa: Fix numa balancing stats in /proc/pid/sched sched/numa: Show numa_group ID in /proc/sched_debug task listings sched/debug: Move print_cfs_rq() declaration to kernel/sched/sched.h sched/stat: Expose /proc/pid/schedstat if CONFIG_SCHED_INFO=y sched/stat: Simplify the sched_info accounting dependency
2015-07-04Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "This tree includes an x86 PMU scheduling fix, but most changes are late breaking tooling fixes and updates: User visible fixes: - Create config.detected into OUTPUT directory, fixing parallel builds sharing the same source directory (Aaro Kiskinen) - Allow to specify custom linker command, fixing some MIPS64 builds. (Aaro Kiskinen) - Fix to show proper convergence stats in 'perf bench numa' (Srikar Dronamraju) User visible changes: - Validate syscall list passed via -e argument to 'perf trace'. (Arnaldo Carvalho de Melo) - Introduce 'perf stat --per-thread' (Jiri Olsa) - Check access permission for --kallsyms and --vmlinux (Li Zhang) - Move toggling event logic from 'perf top' and into hists browser, allowing freeze/unfreeze with event lists with more than one entry (Namhyung Kim) - Add missing newlines when dumping PERF_RECORD_FINISHED_ROUND and showing the Aggregated stats in 'perf report -D' (Adrian Hunter) Infrastructure fixes: - Add missing break for PERF_RECORD_ITRACE_START, which caused those events samples to be parsed as well as PERF_RECORD_LOST_SAMPLES. ITRACE_START only appears when Intel PT or BTS are present, so.. (Jiri Olsa) - Call the perf_session destructor when bailing out in the inject, kmem, report, kvm and mem tools (Taeung Song) Infrastructure changes: - Move stuff out of 'perf stat' and into the lib for further use (Jiri Olsa) - Reference count the cpu_map and thread_map classes (Jiri Olsa) - Set evsel->{cpus,threads} from the evlist, if not set, allowing the generalization of some 'perf stat' functions that previously were accessing private static evlist variable (Jiri Olsa) - Delete an unnecessary check before the calling free_event_desc() (Markus Elfring) - Allow auxtrace data alignment (Adrian Hunter) - Allow events with dot (Andi Kleen) - Fix failure to 'perf probe' events on arm (He Kuang) - Add testing for Makefile.perf (Jiri Olsa) - Add test for make install with prefix (Jiri Olsa) - Fix single target build dependency check (Jiri Olsa) - Access thread_map entries via accessors, prep patch to hold more info per entry, for ongoing 'perf stat --per-thread' work (Jiri Olsa) - Use __weak definition from compiler.h (Sukadev Bhattiprolu) - Split perf_pmu__new_alias() (Sukadev Bhattiprolu)" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) perf tools: Allow to specify custom linker command perf tools: Create config.detected into OUTPUT directory perf mem: Fill in the missing session freeing after an error occurs perf kvm: Fill in the missing session freeing after an error occurs perf report: Fill in the missing session freeing after an error occurs perf kmem: Fill in the missing session freeing after an error occurs perf inject: Fill in the missing session freeing after an error occurs perf tools: Add missing break for PERF_RECORD_ITRACE_START perf/x86: Fix 'active_events' imbalance perf symbols: Check access permission when reading symbol files perf stat: Introduce --per-thread option perf stat: Introduce print_counters function perf stat: Using init_stats instead of memset perf stat: Rename print_interval to process_interval perf stat: Remove perf_evsel__read_cb function perf stat: Move perf_stat initialization counter process code perf stat: Move zero_per_pkg into counter process code perf stat: Separate counters reading and processing perf stat: Introduce read_counters function perf stat: Introduce perf_evsel__read function ...
2015-07-04Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull max log buf size increase from Ingo Molnar: "Ran into this limit recently, so increase it by an order of magnitude" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: printk: Increase maximum CONFIG_LOG_BUF_SHIFT from 21 to 25
2015-07-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull second round of input updates from Dmitry Torokhov: "A new driver for Weida wdt87xx touch controllers, and a bunch of fixups for other drivers" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: wdt87xx_i2c - add a scaling factor for TOUCH_MAJOR event Input: wdt87xx_i2c - remove stray newline in diagnostic message Input: arc_ps2 - add HAS_IOMEM dependency Input: wdt87xx_i2c - fix format warning Input: improve parsing OF parameters for touchscreens Input: edt-ft5x06 - mark as direct input device Input: use for_each_set_bit() where appropriate Input: add a driver for wdt87xx touchscreen controller Input: axp20x-pek - fix reporting button state as inverted Input: xpad - re-send LED command on present event Input: xpad - set the LEDs properly on XBox Wireless controllers Input: imx_keypad - check for clk_prepare_enable() error
2015-07-04x86/fpu: Fix boot crash in the early FPU codeIngo Molnar
Jan Kara and Thomas Gleixner reported boot crashes in the FPU code: general protection fault: 0000 [#1] SMP RIP: 0010:[<ffffffff81048a6c>] [<ffffffff81048a6c>] mxcsr_feature_mask_init+0x1c/0x40 2b:* 0f ae 85 00 fe ff ff fxsave -0x200(%rbp) and bisected it down to the following FPU commit: 91a8c2a5b43f ("x86/fpu: Clean up and fix MXCSR handling") The reason is that the on-stack FPU registers state variable, used by the FXSAVE instruction, did not have the required minimum alignment of 16 bytes, causing the general protection fault. This is most likely a GCC bug in older GCC versions, but the offending commit also added a bogus extra 32-byte alignment (which GCC ignored too). So fix this bug by making the variable static again, but also mark it __initdata this time, because fpu__init_system_mxcsr() is now an __init function. Reported-and-bisected-by: Jan Kara <jack@suse.cz> Reported-bisected-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Kara <jack@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150704075819.GA9201@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-04sched/numa: Fix numa balancing stats in /proc/pid/schedSrikar Dronamraju
Commit 44dba3d5d6a1 ("sched: Refactor task_struct to use numa_faults instead of numa_* pointers") modified the way tsk->numa_faults stats are accounted. However that commit never touched show_numa_stats() that is displayed in /proc/pid/sched and thus the numbers displayed in /proc/pid/sched don't match the actual numbers. Fix it by making sure that /proc/pid/sched reflects the task fault numbers. Also add group fault stats too. Also couple of more modifications are added here: 1. Format changes: - Previously we would list two entries per node, one for private and one for shared. Also the home node info was listed in each entry. - Now preferred node, total_faults and current node are displayed separately. - Now there is one entry per node, that lists private,shared task and group faults. 2. Unit changes: - p->numa_pages_migrated was getting reset after every read of /proc/pid/sched. It's more useful to have absolute numbers since differential migrations between two accesses can be more easily calculated. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Iulia Manda <iulia.manda21@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1435252903-1081-4-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-04sched/numa: Show numa_group ID in /proc/sched_debug task listingsSrikar Dronamraju
Having the numa group ID in /proc/sched_debug helps to see how the numa groups have spread across the system. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Iulia Manda <iulia.manda21@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1435252903-1081-3-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-04sched/debug: Move print_cfs_rq() declaration to kernel/sched/sched.hSrikar Dronamraju
Currently print_cfs_rq() is declared in include/linux/sched.h. However it's not used outside kernel/sched. Hence move the declaration to kernel/sched/sched.h Also some functions are only available for CONFIG_SCHED_DEBUG=y. Hence move the declarations to within the #ifdef. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Iulia Manda <iulia.manda21@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1435252903-1081-2-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-04sched/stat: Expose /proc/pid/schedstat if CONFIG_SCHED_INFO=yNaveen N. Rao
Expand /proc/pid/schedstat output: - enable it on CONFIG_TASK_DELAY_ACCT=y && !CONFIG_SCHEDSTATS kernels. - dump all zeroes on kernels that are booted with the 'nodelayacct' option, which boot option disables delay accounting on CONFIG_TASK_DELAY_ACCT=y kernels. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: a.p.zijlstra@chello.nl Cc: ricklind@us.ibm.com Link: http://lkml.kernel.org/r/5ccbef17d4bc841084ea6e6421d4e4a23b7b806f.1435654789.git.naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-04sched/stat: Simplify the sched_info accounting dependencyNaveen N. Rao
Both CONFIG_SCHEDSTATS=y and CONFIG_TASK_DELAY_ACCT=y track task sched_info, which results in ugly #if clauses. Simplify the code by introducing a synthethic CONFIG_SCHED_INFO switch, selected by both. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: a.p.zijlstra@chello.nl Cc: ricklind@us.ibm.com Link: http://lkml.kernel.org/r/8d19eef800811a94b0f91bcbeb27430a884d7433.1435255405.git.naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-03Merge branch 'next' into for-linusDmitry Torokhov
Prepare second round of input updates for 4.2 merge window.
2015-07-04ext4: correctly migrate a file with a hole at the beginningEryu Guan
Currently ext4_ind_migrate() doesn't correctly handle a file which contains a hole at the beginning of the file. This caused the migration to be done incorrectly, and then if there is a subsequent following delayed allocation write to the "hole", this would reclaim the same data blocks again and results in fs corruption. # assmuing 4k block size ext4, with delalloc enabled # skip the first block and write to the second block xfs_io -fc "pwrite 4k 4k" -c "fsync" /mnt/ext4/testfile # converting to indirect-mapped file, which would move the data blocks # to the beginning of the file, but extent status cache still marks # that region as a hole chattr -e /mnt/ext4/testfile # delayed allocation writes to the "hole", reclaim the same data block # again, results in i_blocks corruption xfs_io -c "pwrite 0 4k" /mnt/ext4/testfile umount /mnt/ext4 e2fsck -nf /dev/sda6 ... Inode 53, i_blocks is 16, should be 8. Fix? no ... Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2015-07-03ext4: be more strict when migrating to non-extent based fileEryu Guan
Currently the check in ext4_ind_migrate() is not enough before doing the real conversion: a) delayed allocated extents could bypass the check on eh->eh_entries and eh->eh_depth This can be demonstrated by this script xfs_io -fc "pwrite 0 4k" -c "pwrite 8k 4k" /mnt/ext4/testfile chattr -e /mnt/ext4/testfile where testfile has two extents but still be converted to non-extent based file format. b) only extent length is checked but not the offset, which would result in data lose (delalloc) or fs corruption (nodelalloc), because non-extent based file only supports at most (12 + 2^10 + 2^20 + 2^30) blocks This can be demostrated by xfs_io -fc "pwrite 5T 4k" /mnt/ext4/testfile chattr -e /mnt/ext4/testfile sync If delalloc is enabled, dmesg prints EXT4-fs warning (device dm-4): ext4_block_to_path:105: block 1342177280 > max in inode 53 EXT4-fs (dm-4): Delayed block allocation failed for inode 53 at logical offset 1342177280 with max blocks 1 with error 5 EXT4-fs (dm-4): This should not happen!! Data will be lost If delalloc is disabled, e2fsck -nf shows corruption Inode 53, i_size is 5497558142976, should be 4096. Fix? no Fix the two issues by a) forcing all delayed allocation blocks to be allocated before checking eh->eh_depth and eh->eh_entries b) limiting the last logical block of the extent is within direct map Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2015-07-03ext4: fix reservation release on invalidatepage for delalloc fsLukas Czerner
On delalloc enabled file system on invalidatepage operation in ext4_da_page_release_reservation() we want to clear the delayed buffer and remove the extent covering the delayed buffer from the extent status tree. However currently there is a bug where on the systems with page size > block size we will always remove extents from the start of the page regardless where the actual delayed buffers are positioned in the page. This leads to the errors like this: EXT4-fs warning (device loop0): ext4_da_release_space:1225: ext4_da_release_space: ino 13, to_free 1 with only 0 reserved data blocks This however can cause data loss on writeback time if the file system is in ENOSPC condition because we're releasing reservation for someones else delayed buffer. Fix this by only removing extents that corresponds to the part of the page we want to invalidate. This problem is reproducible by the following fio receipt (however I was only able to reproduce it with fio-2.1 or older. [global] bs=8k iodepth=1024 iodepth_batch=60 randrepeat=1 size=1m directory=/mnt/test numjobs=20 [job1] ioengine=sync bs=1k direct=1 rw=randread filename=file1:file2 [job2] ioengine=libaio rw=randwrite direct=1 filename=file1:file2 [job3] bs=1k ioengine=posixaio rw=randwrite direct=1 filename=file1:file2 [job5] bs=1k ioengine=sync rw=randread filename=file1:file2 [job7] ioengine=libaio rw=randwrite filename=file1:file2 [job8] ioengine=posixaio rw=randwrite filename=file1:file2 [job10] ioengine=mmap rw=randwrite bs=1k filename=file1:file2 [job11] ioengine=mmap rw=randwrite direct=1 filename=file1:file2 Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org
2015-07-03Merge tag 'topic/drm-fixes-2015-07-04' of ↵Linus Torvalds
git://anongit.freedesktop.org/drm-intel Pull drm EDID fix from Daniel Vetter: "Since Dave is enjoying vacation I figured I'll send you this drm core fix directly" * tag 'topic/drm-fixes-2015-07-04' of git://anongit.freedesktop.org/drm-intel: drm/crtc: Fix edid length computation
2015-07-03Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio/vhost cross endian support from Michael Tsirkin: "I have just queued some more bugfix patches today but none fix regressions and none are related to these ones, so it looks like a good time for a merge for -rc1. The motivation for this is support for legacy BE guests on the new LE hosts. There are two redeeming properties that made me merge this: - It's a trivial amount of code: since we wrap host/guest accesses anyway, almost all of it is well hidden from drivers. - Sane platforms would never set flags like VHOST_CROSS_ENDIAN_LEGACY, and when it's clear, there's zero overhead (as some point it was tested by compiling with and without the patches, got the same stripped binary). Maybe we could create a Kconfig symbol to enforce the second point: prevent people from enabling it eg on x86. I will look into this" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio-pci: alloc only resources actually used. macvtap/tun: cross-endian support for little-endian hosts vhost: cross-endian support for legacy devices virtio: add explicit big-endian support to memory accessors vhost: introduce vhost_is_little_endian() helper vringh: introduce vringh_is_little_endian() helper macvtap: introduce macvtap_is_little_endian() helper tun: add tun_is_little_endian() helper virtio: introduce virtio_is_little_endian() helper
2015-07-04drm/crtc: Fix edid length computationShixin Zeng
The length of each EDID block is EDID_LENGTH, and number of blocks is (1 + edid->extensions) - we need to multiply not add them. This causes wrong EDID to be passed on, and is a regression introduced by d2ed34362a52 (drm: Introduce helper for replacing blob properties) Signed-off-by: Shixin Zeng <zeng.shixin@gmail.com> Cc: Daniel Stone <daniels@collabora.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Daniel Stone <daniels@collabora.com> [danvet: Add Cc: and fix commit summary.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-03Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace updates from Eric Biederman: "Long ago and far away when user namespaces where young it was realized that allowing fresh mounts of proc and sysfs with only user namespace permissions could violate the basic rule that only root gets to decide if proc or sysfs should be mounted at all. Some hacks were put in place to reduce the worst of the damage could be done, and the common sense rule was adopted that fresh mounts of proc and sysfs should allow no more than bind mounts of proc and sysfs. Unfortunately that rule has not been fully enforced. There are two kinds of gaps in that enforcement. Only filesystems mounted on empty directories of proc and sysfs should be ignored but the test for empty directories was insufficient. So in my tree directories on proc, sysctl and sysfs that will always be empty are created specially. Every other technique is imperfect as an ordinary directory can have entries added even after a readdir returns and shows that the directory is empty. Special creation of directories for mount points makes the code in the kernel a smidge clearer about it's purpose. I asked container developers from the various container projects to help test this and no holes were found in the set of mount points on proc and sysfs that are created specially. This set of changes also starts enforcing the mount flags of fresh mounts of proc and sysfs are consistent with the existing mount of proc and sysfs. I expected this to be the boring part of the work but unfortunately unprivileged userspace winds up mounting fresh copies of proc and sysfs with noexec and nosuid clear when root set those flags on the previous mount of proc and sysfs. So for now only the atime, read-only and nodev attributes which userspace happens to keep consistent are enforced. Dealing with the noexec and nosuid attributes remains for another time. This set of changes also addresses an issue with how open file descriptors from /proc/<pid>/ns/* are displayed. Recently readlink of /proc/<pid>/fd has been triggering a WARN_ON that has not been meaningful since it was added (as all of the code in the kernel was converted) and is not now actively wrong. There is also a short list of issues that have not been fixed yet that I will mention briefly. It is possible to rename a directory from below to above a bind mount. At which point any directory pointers below the renamed directory can be walked up to the root directory of the filesystem. With user namespaces enabled a bind mount of the bind mount can be created allowing the user to pick a directory whose children they can rename to outside of the bind mount. This is challenging to fix and doubly so because all obvious solutions must touch code that is in the performance part of pathname resolution. As mentioned above there is also a question of how to ensure that developers by accident or with purpose do not introduce exectuable files on sysfs and proc and in doing so introduce security regressions in the current userspace that will not be immediately obvious and as such are likely to require breaking userspace in painful ways once they are recognized" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: vfs: Remove incorrect debugging WARN in prepend_path mnt: Update fs_fully_visible to test for permanently empty directories sysfs: Create mountpoints with sysfs_create_mount_point sysfs: Add support for permanently empty directories to serve as mount points. kernfs: Add support for always empty directories. proc: Allow creating permanently empty directories that serve as mount points sysctl: Allow creating permanently empty directories that serve as mountpoints. fs: Add helper functions for permanently empty directories. vfs: Ignore unlocked mounts in fs_fully_visible mnt: Modify fs_fully_visible to deal with locked ro nodev and atime mnt: Refactor the logic for mounting sysfs and proc in a user namespace
2015-07-03Merge tag 'remoteproc-4.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc Pull remoteproc updates from Ohad Ben-Cohen: - remoteproc fixes/cleanups from Suman Anna - new remoteproc TI Wakeup M3 driver from Dave Gerlach - remoteproc core support for TI's Wakeup M3 driver from both Dave and Suman - tiny remoteproc build fix from myself * tag 'remoteproc-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc: remoteproc: fix !CONFIG_OF build breakage remoteproc/wkup_m3: add a remoteproc driver for TI Wakeup M3 Documentation: dt: add bindings for TI Wakeup M3 processor remoteproc: add a rproc ops for performing address translation remoteproc: introduce rproc_get_by_phandle API remoteproc: fix various checkpatch warnings remoteproc/davinci: fix quoted split string checkpatch warning remoteproc/ste: add blank lines after declarations
2015-07-03Merge tag 'hwspinlock-4.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ohad/hwspinlock Pull hwspinlock updates from Ohad Ben-Cohen: - hwspinlock core DT support from Suman Anna - OMAP hwspinlock DT support from Suman Anna - QCOM hwspinlock DT support from Bjorn Andersson - a new CSR atlas7 hwspinlock driver from Wei Chen - CSR atlas7 hwspinlock DT binding document from Wei Chen - a tiny QCOM hwspinlock driver fix from Bjorn Andersson * tag 'hwspinlock-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/hwspinlock: hwspinlock: qcom: Correct msb in regmap_field DT: hwspinlock: add the CSR atlas7 hwspinlock bindings document hwspinlock: add a CSR atlas7 driver hwspinlock: qcom: Add support for Qualcomm HW Mutex block DT: hwspinlock: Add binding documentation for Qualcomm hwmutex hwspinlock/omap: add support for dt nodes Documentation: dt: add the omap hwspinlock bindings document hwspinlock/core: add device tree support Documentation: dt: add common bindings for hwspinlock
2015-07-03ARM: avoid unwanted GCC memset()/memcpy() optimisations for IO variantsRussell King
We don't want GCC optimising our memset_io(), memcpy_fromio() or memcpy_toio() variants, so we must not call one of the standard functions. Provide a separate name for our assembly memcpy() and memset() functions, and use that instead, thereby bypassing GCC's ability to optimise these operations. GCCs optimisation may introduce unaligned accesses which are invalid for device mappings. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-07-03Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes (and cleanups) from Catalin Marinas: "Various arm64 fixes: - suspicious RCU usage warning - BPF (out of bounds array read and endianness conversion) - perf (of_node usage after of_node_put, cpu_pmu->plat_device assignment) - huge pmd/pud check for value 0 - rate-limiting should only take unhandled signals into account Clean-up: - incorrect use of pgprot_t type - unused header include - __init annotation to arm_cpuidle_init - pr_debug instead of pr_error for disabled GICC entries in ACPI/MADT" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Fix show_unhandled_signal_ratelimited usage ARM64 / SMP: Switch pr_err() to pr_debug() for disabled GICC entry arm64: cpuidle: add __init section marker to arm_cpuidle_init arm64: Don't report clear pmds and puds as huge arm64: perf: fix unassigned cpu_pmu->plat_device when probing PMU PPIs arm64: perf: Don't use of_node after putting it arm64: fix incorrect use of pgprot_t variable arm64/hw_breakpoint.c: remove unnecessary header arm64: bpf: fix endianness conversion bugs arm64: bpf: fix out-of-bounds read in bpf2a64_offset() ARM64: smp: Fix suspicious RCU usage with ipi tracepoints
2015-07-03Merge tag 'nios2-v4.2' of git://git.rocketboards.org/linux-socfpga-nextLinus Torvalds
Pull nios2 update from Ley Foon Tan: "Check number of timer instances" * tag 'nios2-v4.2' of git://git.rocketboards.org/linux-socfpga-next: nios2: check number of timer instances