author | Linus Torvalds <torvalds@linux-foundation.org> | 2017-05-01 10:39:57 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2017-05-01 10:39:57 -0700 |
commit | 694752922b12bd318aa80191bd9d8c3dcfb39055 (patch) | |
tree | 5afe83fd99100bea546dd5a1c1f778c58f41e5c0 /drivers/block | |
parent | a351e9b9fc24e982ec2f0e76379a49826036da12 (diff) | |
parent | 9438b3e080beccf6022138ea62192d55cc7dc4ed (diff) |
Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
- Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ
was initially a fork of CFQ, but subsequently changed to implement
fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant
to be used on desktop type single drives, providing good fairness.
From Paolo.
- Add Kyber IO scheduler. This is a full multiqueue aware scheduler,
using a scalable token based algorithm that throttles IO based on
live completion IO stats, similarly to blk-wbt. From Omar.
- A series from Jan, moving users to separately allocated backing
devices. This continues the work of separating backing device life
times, solving various problems with hot removal.
- A series of updates for lightnvm, mostly from Javier. Includes a
'pblk' target that exposes an open channel SSD as a physical block
device.
- A series of fixes and improvements for nbd from Josef.
- A series from Omar, removing queue sharing between devices on mostly
legacy drivers. This helps us clean up other bits, if we know that a
queue only has a single device backing. This has been overdue for
more than a decade.
- Fixes for the blk-stats, and improvements to unify the stats and user
windows. This both improves blk-wbt, and enables other users to
register a need to receive IO stats for a device. From Omar.
- blk-throttle improvements from Shaohua. This provides a scalable
framework for implementing scalable prioritization - particularly for
blk-mq, but applicable to any type of block device. The interface is
marked experimental for now.
- Bucketized IO stats for IO polling from Stephen Bates. This improves
efficiency of polled workloads in the presence of mixed block size
IO.
- A few fixes for opal, from Scott.
- A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics.
From a variety of folks, mostly Sagi and James Smart.
- A series from Bart, improving our exposed info and capabilities from
the blk-mq debugfs support.
- A series from Christoph, cleaning up how we handle WRITE_ZEROES.
- A series from Christoph, cleaning up the block layer handling of how
we track errors in a request. On top of being a nice cleanup, it also
shrinks the size of struct request a bit.
- Removal of mg_disk and hd (sorry Linus) by Christoph. The former was
never used by platforms, and the latter has outlived its usefulness.
- Various little bug fixes and cleanups from a wide variety of folks.
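The request error-tracking cleanup in the list above is visible in the drivers/block diff further down: ataflop counts failed attempts in `error_count` instead of `errors`, and resets the counter when it fetches a request. The following is a minimal userspace sketch of that retry policy, not the driver's code. The `demo_` names, the struct, and both threshold values are assumptions made for illustration.

```c
#include <assert.h>

/* Illustrative thresholds; the real driver defines its own values. */
#define DEMO_MAX_ERRORS         8
#define DEMO_RECALIBRATE_ERRORS 3

enum demo_action { DEMO_RETRY, DEMO_RECALIBRATE, DEMO_FAIL };

/* Hypothetical stand-in for a request that carries its own retry counter. */
struct demo_request {
	int error_count;
};

/* Reset the counter when a new request is picked up. */
static void demo_start_request(struct demo_request *rq)
{
	assert(rq != 0);
	rq->error_count = 0;
}

/* Count one failure and decide what to do next. */
static enum demo_action demo_on_error(struct demo_request *rq)
{
	rq->error_count++;
	if (rq->error_count >= DEMO_MAX_ERRORS)
		return DEMO_FAIL;          /* give up, complete with an I/O error */
	if (rq->error_count == DEMO_RECALIBRATE_ERRORS)
		return DEMO_RECALIBRATE;   /* recalibrate, then try again */
	return DEMO_RETRY;
}
```

With these assumed thresholds, the third failure asks for a recalibration and the eighth fails the request. Resetting the counter in `demo_start_request()` keeps one request's failures from counting against the next.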
* 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits)
block: hide badblocks attribute by default
blk-mq: unify hctx delay_work and run_work
block: add kblock_mod_delayed_work_on()
blk-mq: unify hctx delayed_run_work and run_work
nbd: fix use after free on module unload
MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler
blk-mq-sched: alloate reserved tags out of normal pool
mtip32xx: use runtime tag to initialize command header
scsi: Implement blk_mq_ops.show_rq()
blk-mq: Add blk_mq_ops.show_rq()
blk-mq: Show operation, cmd_flags and rq_flags names
blk-mq: Make blk_flags_show() callers append a newline character
blk-mq: Move the "state" debugfs attribute one level down
blk-mq: Unregister debugfs attributes earlier
blk-mq: Only unregister hctxs for which registration succeeded
blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
blk-mq: Let blk_mq_debugfs_register() look up the queue name
blk-mq: Register <dev>/queue/mq after having registered <dev>/queue
ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
ide-pm: always pass 0 error to __blk_end_request_all
..
Diffstat (limited to 'drivers/block')
33 files changed, 1487 insertions, 3282 deletions
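One change in the drivers/block diff is easier to follow in isolation: on the sending side DRBD now marks both discard and write-zeroes bios with the same wire flag, and on the receiving side that flag is turned into a write-zeroes operation rather than a discard. The sketch below mirrors that mapping in plain userspace C. The `demo_`/`DEMO_` names and all numeric values are stand-ins chosen for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the block layer operation codes (values are arbitrary). */
enum demo_op {
	DEMO_OP_WRITE,
	DEMO_OP_DISCARD,
	DEMO_OP_WRITE_SAME,
	DEMO_OP_WRITE_ZEROES
};

/* Stand-ins for the DRBD wire flags (bit positions are arbitrary). */
#define DEMO_DP_DISCARD (1u << 0)
#define DEMO_DP_WSAME   (1u << 1)

/* Sender: discard and write-zeroes share one wire flag. */
static uint32_t demo_op_to_wire(enum demo_op op)
{
	return (op == DEMO_OP_WRITE_SAME   ? DEMO_DP_WSAME   : 0) |
	       (op == DEMO_OP_DISCARD      ? DEMO_DP_DISCARD : 0) |
	       (op == DEMO_OP_WRITE_ZEROES ? DEMO_DP_DISCARD : 0);
}

/* Receiver: the shared flag is always applied as write-zeroes. */
static enum demo_op demo_wire_to_op(uint32_t dpf)
{
	if (dpf & DEMO_DP_DISCARD)
		return DEMO_OP_WRITE_ZEROES;
	return DEMO_OP_WRITE;
}
```

In this sketch a discard submitted on one node arrives as a write-zeroes request on the peer, which matches the diff replacing the removed discard-or-zero-out helper with plain `blkdev_issue_zeroout()` calls.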
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index f744de7a0f9b..19df4918e37e 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -312,22 +312,6 @@ config BLK_DEV_SKD
 
 	Use device /dev/skd$N amd /dev/skd$Np$M.
 
-config BLK_DEV_OSD
-	tristate "OSD object-as-blkdev support"
-	depends on SCSI_OSD_ULD
-	---help---
-	  Saying Y or M here will allow the exporting of a single SCSI
-	  OSD (object-based storage) object as a Linux block device.
-
-	  For example, if you create a 2G object on an OSD device,
-	  you can then use this module to present that 2G object as
-	  a Linux block device.
-
-	  To compile this driver as a module, choose M here: the
-	  module will be called osdblk.
-
-	  If unsure, say N.
-
 config BLK_DEV_SX8
 	tristate "Promise SATA SX8 support"
 	depends on PCI
@@ -434,23 +418,6 @@ config ATA_OVER_ETH
 	This driver provides Support for ATA over Ethernet block
 	devices like the Coraid EtherDrive (R) Storage Blade.
 
-config MG_DISK
-	tristate "mGine mflash, gflash support"
-	depends on ARM && GPIOLIB
-	help
-	  mGine mFlash(gFlash) block device driver
-
-config MG_DISK_RES
-	int "Size of reserved area before MBR"
-	depends on MG_DISK
-	default 0
-	help
-	  Define size of reserved area that usually used for boot. Unit is KB.
-	  All of the block device operation will be taken this value as start
-	  offset
-	  Examples:
-			1024 => 1 MB
-
 config SUNVDC
 	tristate "Sun Virtual Disk Client support"
 	depends on SUN_LDOMS
@@ -512,19 +479,7 @@ config VIRTIO_BLK_SCSI
 	  Enable support for SCSI passthrough (e.g. the SG_IO ioctl) on
 	  virtio-blk devices. This is only supported for the legacy
 	  virtio protocol and not enabled by default by any hypervisor.
-	  Your probably want to virtio-scsi instead.
-
-config BLK_DEV_HD
-	bool "Very old hard disk (MFM/RLL/IDE) driver"
-	depends on HAVE_IDE
-	depends on !ARM || ARCH_RPC || BROKEN
-	help
-	  This is a very old hard disk driver that lacks the enhanced
-	  functionality of the newer ones.
-
-	  It is required for systems with ancient MFM/RLL/ESDI drives.
-
-	  If unsure, say N.
+	  You probably want to use virtio-scsi instead.
 
 config BLK_DEV_RBD
 	tristate "Rados block device (RBD)"
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index 1e9661e26f29..ec8c36897b75 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -19,10 +19,8 @@ obj-$(CONFIG_BLK_CPQ_CISS_DA) += cciss.o
 obj-$(CONFIG_BLK_DEV_DAC960) += DAC960.o
 obj-$(CONFIG_XILINX_SYSACE) += xsysace.o
 obj-$(CONFIG_CDROM_PKTCDVD) += pktcdvd.o
-obj-$(CONFIG_MG_DISK) += mg_disk.o
 obj-$(CONFIG_SUNVDC) += sunvdc.o
 obj-$(CONFIG_BLK_DEV_SKD) += skd.o
-obj-$(CONFIG_BLK_DEV_OSD) += osdblk.o
 
 obj-$(CONFIG_BLK_DEV_UMEM) += umem.o
 obj-$(CONFIG_BLK_DEV_NBD) += nbd.o
@@ -30,7 +28,6 @@ obj-$(CONFIG_BLK_DEV_CRYPTOLOOP) += cryptoloop.o
 obj-$(CONFIG_VIRTIO_BLK) += virtio_blk.o
 
 obj-$(CONFIG_BLK_DEV_SX8) += sx8.o
-obj-$(CONFIG_BLK_DEV_HD) += hd.o
 
 obj-$(CONFIG_XEN_BLKDEV_FRONTEND) += xen-blkfront.o
 obj-$(CONFIG_XEN_BLKDEV_BACKEND) += xen-blkback/
diff --git a/drivers/block/ataflop.c b/drivers/block/ataflop.c
index 2104b1b4ccda..fa69ecd52cb5 100644
--- a/drivers/block/ataflop.c
+++ b/drivers/block/ataflop.c
@@ -617,12 +617,12 @@ static void fd_error( void )
 	if (!fd_request)
 		return;
 
-	fd_request->errors++;
-	if (fd_request->errors >= MAX_ERRORS) {
+	fd_request->error_count++;
+	if (fd_request->error_count >= MAX_ERRORS) {
 		printk(KERN_ERR "fd%d: too many errors.\n", SelectedDrive );
 		fd_end_request_cur(-EIO);
 	}
-	else if (fd_request->errors == RECALIBRATE_ERRORS) {
+	else if (fd_request->error_count == RECALIBRATE_ERRORS) {
 		printk(KERN_WARNING "fd%d: recalibrating\n", SelectedDrive );
 		if (SelectedDrive != -1)
 			SUD.track = -1;
@@ -1386,7 +1386,7 @@ static void setup_req_params( int drive )
 	ReqData = ReqBuffer + 512 * ReqCnt;
 
 	if (UseTrackbuffer)
-		read_track = (ReqCmd == READ && fd_request->errors == 0);
+		read_track = (ReqCmd == READ && fd_request->error_count == 0);
 	else
 		read_track = 0;
 
@@ -1409,8 +1409,10 @@ static struct request *set_next_request(void)
 		fdc_queue = 0;
 		if (q) {
 			rq = blk_fetch_request(q);
-			if (rq)
+			if (rq) {
+				rq->error_count = 0;
 				break;
+			}
 		}
 	} while (fdc_queue != old_pos);
 
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 3adc32a3153b..4ec84d504780 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -134,28 +134,6 @@ static struct page *brd_insert_page(struct brd_device *brd, sector_t sector)
 	return page;
 }
 
-static void brd_free_page(struct brd_device *brd, sector_t sector)
-{
-	struct page *page;
-	pgoff_t idx;
-
-	spin_lock(&brd->brd_lock);
-	idx = sector >> PAGE_SECTORS_SHIFT;
-	page = radix_tree_delete(&brd->brd_pages, idx);
-	spin_unlock(&brd->brd_lock);
-	if (page)
-		__free_page(page);
-}
-
-static void brd_zero_page(struct brd_device *brd, sector_t sector)
-{
-	struct page *page;
-
-	page = brd_lookup_page(brd, sector);
-	if (page)
-		clear_highpage(page);
-}
-
 /*
  * Free all backing store pages and radix tree. This must only be called when
  * there are no other users of the device.
@@ -212,24 +190,6 @@ static int copy_to_brd_setup(struct brd_device *brd, sector_t sector, size_t n)
 	return 0;
 }
 
-static void discard_from_brd(struct brd_device *brd,
-			sector_t sector, size_t n)
-{
-	while (n >= PAGE_SIZE) {
-		/*
-		 * Don't want to actually discard pages here because
-		 * re-allocating the pages can result in writeback
-		 * deadlocks under heavy load.
-		 */
-		if (0)
-			brd_free_page(brd, sector);
-		else
-			brd_zero_page(brd, sector);
-		sector += PAGE_SIZE >> SECTOR_SHIFT;
-		n -= PAGE_SIZE;
-	}
-}
-
 /*
  * Copy n bytes from src to the brd starting at sector. Does not sleep.
  */
@@ -338,14 +298,6 @@ static blk_qc_t brd_make_request(struct request_queue *q, struct bio *bio)
 	if (bio_end_sector(bio) > get_capacity(bdev->bd_disk))
 		goto io_error;
 
-	if (unlikely(bio_op(bio) == REQ_OP_DISCARD)) {
-		if (sector & ((PAGE_SIZE >> SECTOR_SHIFT) - 1) ||
-		    bio->bi_iter.bi_size & ~PAGE_MASK)
-			goto io_error;
-		discard_from_brd(brd, sector, bio->bi_iter.bi_size);
-		goto out;
-	}
-
 	bio_for_each_segment(bvec, bio, iter) {
 		unsigned int len = bvec.bv_len;
 		int err;
@@ -357,7 +309,6 @@ static blk_qc_t brd_make_request(struct request_queue *q, struct bio *bio)
 		sector += len >> SECTOR_SHIFT;
 	}
 
-out:
 	bio_endio(bio);
 	return BLK_QC_T_NONE;
 io_error:
@@ -464,11 +415,6 @@ static struct brd_device *brd_alloc(int i)
 	 * is harmless)
 	 */
 	blk_queue_physical_block_size(brd->brd_queue, PAGE_SIZE);
-
-	brd->brd_queue->limits.discard_granularity = PAGE_SIZE;
-	blk_queue_max_discard_sectors(brd->brd_queue, UINT_MAX);
-	brd->brd_queue->limits.discard_zeroes_data = 1;
-	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, brd->brd_queue);
 #ifdef CONFIG_BLK_DEV_RAM_DAX
 	queue_flag_set_unlocked(QUEUE_FLAG_DAX, brd->brd_queue);
 #endif
diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index 8e1a4554951c..cd375503f7b0 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -1864,8 +1864,7 @@ static void cciss_softirq_done(struct request *rq)
 	/* set the residual count for pc requests */
 	if (blk_rq_is_passthrough(rq))
 		scsi_req(rq)->resid_len = c->err_info->ResidualCnt;
-
-	blk_end_request_all(rq, (rq->errors == 0) ? 0 : -EIO);
+	blk_end_request_all(rq, scsi_req(rq)->result ? -EIO : 0);
 
 	spin_lock_irqsave(&h->lock, flags);
 	cmd_free(h, c);
@@ -3140,18 +3139,19 @@ static inline void complete_command(ctlr_info_t *h, CommandList_struct *cmd,
 {
 	int retry_cmd = 0;
 	struct request *rq = cmd->rq;
+	struct scsi_request *sreq = scsi_req(rq);
 
-	rq->errors = 0;
+	sreq->result = 0;
 
 	if (timeout)
-		rq->errors = make_status_bytes(0, 0, 0, DRIVER_TIMEOUT);
+		sreq->result = make_status_bytes(0, 0, 0, DRIVER_TIMEOUT);
 
 	if (cmd->err_info->CommandStatus == 0) /* no error has occurred */
 		goto after_error_processing;
 
 	switch (cmd->err_info->CommandStatus) {
 	case CMD_TARGET_STATUS:
-		rq->errors = evaluate_target_status(h, cmd, &retry_cmd);
+		sreq->result = evaluate_target_status(h, cmd, &retry_cmd);
 		break;
 	case CMD_DATA_UNDERRUN:
 		if (!blk_rq_is_passthrough(cmd->rq)) {
@@ -3169,7 +3169,7 @@ static inline void complete_command(ctlr_info_t *h, CommandList_struct *cmd,
 	case CMD_INVALID:
 		dev_warn(&h->pdev->dev, "cciss: cmd %p is "
 		       "reported invalid\n", cmd);
-		rq->errors = make_status_bytes(SAM_STAT_GOOD,
+		sreq->result = make_status_bytes(SAM_STAT_GOOD,
 			cmd->err_info->CommandStatus, DRIVER_OK,
 			blk_rq_is_passthrough(cmd->rq) ?
 				DID_PASSTHROUGH : DID_ERROR);
@@ -3177,7 +3177,7 @@ static inline void complete_command(ctlr_info_t *h, CommandList_struct *cmd,
 	case CMD_PROTOCOL_ERR:
 		dev_warn(&h->pdev->dev, "cciss: cmd %p has "
 		       "protocol error\n", cmd);
-		rq->errors = make_status_bytes(SAM_STAT_GOOD,
+		sreq->result = make_status_bytes(SAM_STAT_GOOD,
 			cmd->err_info->CommandStatus, DRIVER_OK,
 			blk_rq_is_passthrough(cmd->rq) ?
 				DID_PASSTHROUGH : DID_ERROR);
@@ -3185,7 +3185,7 @@ static inline void complete_command(ctlr_info_t *h, CommandList_struct *cmd,
 	case CMD_HARDWARE_ERR:
 		dev_warn(&h->pdev->dev, "cciss: cmd %p had "
 		       " hardware error\n", cmd);
-		rq->errors = make_status_bytes(SAM_STAT_GOOD,
+		sreq->result = make_status_bytes(SAM_STAT_GOOD,
 			cmd->err_info->CommandStatus, DRIVER_OK,
 			blk_rq_is_passthrough(cmd->rq) ?
 				DID_PASSTHROUGH : DID_ERROR);
@@ -3193,7 +3193,7 @@ static inline void complete_command(ctlr_info_t *h, CommandList_struct *cmd,
 	case CMD_CONNECTION_LOST:
 		dev_warn(&h->pdev->dev, "cciss: cmd %p had "
 		       "connection lost\n", cmd);
-		rq->errors = make_status_bytes(SAM_STAT_GOOD,
+		sreq->result = make_status_bytes(SAM_STAT_GOOD,
 			cmd->err_info->CommandStatus, DRIVER_OK,
 			blk_rq_is_passthrough(cmd->rq) ?
 				DID_PASSTHROUGH : DID_ERROR);
@@ -3201,7 +3201,7 @@ static inline void complete_command(ctlr_info_t *h, CommandList_struct *cmd,
 	case CMD_ABORTED:
 		dev_warn(&h->pdev->dev, "cciss: cmd %p was "
 		       "aborted\n", cmd);
-		rq->errors = make_status_bytes(SAM_STAT_GOOD,
+		sreq->result = make_status_bytes(SAM_STAT_GOOD,
 			cmd->err_info->CommandStatus, DRIVER_OK,
 			blk_rq_is_passthrough(cmd->rq) ?
 				DID_PASSTHROUGH : DID_ABORT);
@@ -3209,7 +3209,7 @@ static inline void complete_command(ctlr_info_t *h, CommandList_struct *cmd,
 	case CMD_ABORT_FAILED:
 		dev_warn(&h->pdev->dev, "cciss: cmd %p reports "
 		       "abort failed\n", cmd);
-		rq->errors = make_status_bytes(SAM_STAT_GOOD,
+		sreq->result = make_status_bytes(SAM_STAT_GOOD,
 			cmd->err_info->CommandStatus, DRIVER_OK,
 			blk_rq_is_passthrough(cmd->rq) ?
 				DID_PASSTHROUGH : DID_ERROR);
@@ -3224,21 +3224,21 @@ static inline void complete_command(ctlr_info_t *h, CommandList_struct *cmd,
 		} else
 			dev_warn(&h->pdev->dev,
 				"%p retried too many times\n", cmd);
-		rq->errors = make_status_bytes(SAM_STAT_GOOD,
+		sreq->result = make_status_bytes(SAM_STAT_GOOD,
 			cmd->err_info->CommandStatus, DRIVER_OK,
 			blk_rq_is_passthrough(cmd->rq) ?
 				DID_PASSTHROUGH : DID_ABORT);
 		break;
 	case CMD_TIMEOUT:
 		dev_warn(&h->pdev->dev, "cmd %p timedout\n", cmd);
-		rq->errors = make_status_bytes(SAM_STAT_GOOD,
+		sreq->result = make_status_bytes(SAM_STAT_GOOD,
 			cmd->err_info->CommandStatus, DRIVER_OK,
 			blk_rq_is_passthrough(cmd->rq) ?
 				DID_PASSTHROUGH : DID_ERROR);
 		break;
 	case CMD_UNABORTABLE:
 		dev_warn(&h->pdev->dev, "cmd %p unabortable\n", cmd);
-		rq->errors = make_status_bytes(SAM_STAT_GOOD,
+		sreq->result = make_status_bytes(SAM_STAT_GOOD,
 			cmd->err_info->CommandStatus, DRIVER_OK,
 			blk_rq_is_passthrough(cmd->rq) ?
 				DID_PASSTHROUGH : DID_ERROR);
@@ -3247,7 +3247,7 @@ static inline void complete_command(ctlr_info_t *h, CommandList_struct *cmd,
 		dev_warn(&h->pdev->dev, "cmd %p returned "
 		       "unknown status %x\n", cmd,
 				cmd->err_info->CommandStatus);
-		rq->errors = make_status_bytes(SAM_STAT_GOOD,
+		sreq->result = make_status_bytes(SAM_STAT_GOOD,
 			cmd->err_info->CommandStatus, DRIVER_OK,
 			blk_rq_is_passthrough(cmd->rq) ?
 				DID_PASSTHROUGH : DID_ERROR);
@@ -3380,9 +3380,9 @@ static void do_cciss_request(struct request_queue *q)
 			if (dma_mapping_error(&h->pdev->dev, temp64.val)) {
 				dev_warn(&h->pdev->dev,
 					"%s: error mapping page for DMA\n", __func__);
-				creq->errors = make_status_bytes(SAM_STAT_GOOD,
-							0, DRIVER_OK,
-							DID_SOFT_ERROR);
+				scsi_req(creq)->result =
+					make_status_bytes(SAM_STAT_GOOD, 0, DRIVER_OK,
+							  DID_SOFT_ERROR);
 				cmd_free(h, c);
 				return;
 			}
@@ -3395,9 +3395,9 @@ static void do_cciss_request(struct request_queue *q)
 		if (cciss_map_sg_chain_block(h, c, h->cmd_sg_list[c->cmdindex],
 			(seg - (h->max_cmd_sgentries - 1)) *
 				sizeof(SGDescriptor_struct))) {
-			creq->errors = make_status_bytes(SAM_STAT_GOOD,
-							0, DRIVER_OK,
-							DID_SOFT_ERROR);
+			scsi_req(creq)->result =
+				make_status_bytes(SAM_STAT_GOOD, 0, DRIVER_OK,
+						  DID_SOFT_ERROR);
 			cmd_free(h, c);
 			return;
 		}
diff --git a/drivers/block/drbd/drbd_debugfs.c b/drivers/block/drbd/drbd_debugfs.c
index de5c3ee8a790..494837e59f23 100644
--- a/drivers/block/drbd/drbd_debugfs.c
+++ b/drivers/block/drbd/drbd_debugfs.c
@@ -236,9 +236,6 @@ static void seq_print_peer_request_flags(struct seq_file *m, struct drbd_peer_re
 	seq_print_rq_state_bit(m, f & EE_CALL_AL_COMPLETE_IO, &sep, "in-AL");
 	seq_print_rq_state_bit(m, f & EE_SEND_WRITE_ACK, &sep, "C");
 	seq_print_rq_state_bit(m, f & EE_MAY_SET_IN_SYNC, &sep, "set-in-sync");
-
-	if (f & EE_IS_TRIM)
-		__seq_print_rq_state_bit(m, f & EE_IS_TRIM_USE_ZEROOUT, &sep, "zero-out", "trim");
 	seq_print_rq_state_bit(m, f & EE_WRITE_SAME, &sep, "write-same");
 	seq_putc(m, '\n');
 }
diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index 724d1c50fc52..d5da45bb03a6 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -437,9 +437,6 @@ enum {
 
 	/* is this a TRIM aka REQ_DISCARD? */
 	__EE_IS_TRIM,
-	/* our lower level cannot handle trim,
-	 * and we want to fall back to zeroout instead */
-	__EE_IS_TRIM_USE_ZEROOUT,
 
 	/* In case a barrier failed,
 	 * we need to resubmit without the barrier flag. */
@@ -482,7 +479,6 @@ enum {
 #define EE_CALL_AL_COMPLETE_IO (1<<__EE_CALL_AL_COMPLETE_IO)
 #define EE_MAY_SET_IN_SYNC (1<<__EE_MAY_SET_IN_SYNC)
 #define EE_IS_TRIM (1<<__EE_IS_TRIM)
-#define EE_IS_TRIM_USE_ZEROOUT (1<<__EE_IS_TRIM_USE_ZEROOUT)
 #define EE_RESUBMITTED (1<<__EE_RESUBMITTED)
 #define EE_WAS_ERROR (1<<__EE_WAS_ERROR)
 #define EE_HAS_DIGEST (1<<__EE_HAS_DIGEST)
@@ -1561,8 +1557,6 @@ extern void start_resync_timer_fn(unsigned long data);
 extern void drbd_endio_write_sec_final(struct drbd_peer_request *peer_req);
 
 /* drbd_receiver.c */
-extern int drbd_issue_discard_or_zero_out(struct drbd_device *device,
-		sector_t start, unsigned int nr_sectors, bool discard);
 extern int drbd_receiver(struct drbd_thread *thi);
 extern int drbd_ack_receiver(struct drbd_thread *thi);
 extern void drbd_send_ping_wf(struct work_struct *ws);
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 92c60cbd04ee..84455c365f57 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -931,7 +931,6 @@ void assign_p_sizes_qlim(struct drbd_device *device, struct p_sizes *p, struct r
 		p->qlim->io_min = cpu_to_be32(queue_io_min(q));
 		p->qlim->io_opt = cpu_to_be32(queue_io_opt(q));
 		p->qlim->discard_enabled = blk_queue_discard(q);
-		p->qlim->discard_zeroes_data = queue_discard_zeroes_data(q);
 		p->qlim->write_same_capable = !!q->limits.max_write_same_sectors;
 	} else {
 		q = device->rq_queue;
@@ -941,7 +940,6 @@ void assign_p_sizes_qlim(struct drbd_device *device, struct p_sizes *p, struct r
 		p->qlim->io_min = cpu_to_be32(queue_io_min(q));
 		p->qlim->io_opt = cpu_to_be32(queue_io_opt(q));
 		p->qlim->discard_enabled = 0;
-		p->qlim->discard_zeroes_data = 0;
 		p->qlim->write_same_capable = 0;
 	}
 }
@@ -1668,7 +1666,8 @@ static u32 bio_flags_to_wire(struct drbd_connection *connection,
 			(bio->bi_opf & REQ_FUA ? DP_FUA : 0) |
 			(bio->bi_opf & REQ_PREFLUSH ? DP_FLUSH : 0) |
 			(bio_op(bio) == REQ_OP_WRITE_SAME ? DP_WSAME : 0) |
-			(bio_op(bio) == REQ_OP_DISCARD ? DP_DISCARD : 0);
+			(bio_op(bio) == REQ_OP_DISCARD ? DP_DISCARD : 0) |
+			(bio_op(bio) == REQ_OP_WRITE_ZEROES ? DP_DISCARD : 0);
 	else
 		return bio->bi_opf & REQ_SYNC ? DP_RW_SYNC : 0;
 }
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 908c704e20aa..02255a0d68b9 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1199,10 +1199,6 @@ static void decide_on_discard_support(struct drbd_device *device,
 	struct drbd_connection *connection = first_peer_device(device)->connection;
 	bool can_do = b ? blk_queue_discard(b) : true;
 
-	if (can_do && b && !b->limits.discard_zeroes_data && !discard_zeroes_if_aligned) {
-		can_do = false;
-		drbd_info(device, "discard_zeroes_data=0 and discard_zeroes_if_aligned=no: disabling discards\n");
-	}
 	if (can_do && connection->cstate >= C_CONNECTED && !(connection->agreed_features & DRBD_FF_TRIM)) {
 		can_do = false;
 		drbd_info(connection, "peer DRBD too old, does not support TRIM: disabling discards\n");
@@ -1217,10 +1213,12 @@ static void decide_on_discard_support(struct drbd_device *device,
 		blk_queue_discard_granularity(q, 512);
 		q->limits.max_discard_sectors = drbd_max_discard_sectors(connection);
 		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
+		q->limits.max_write_zeroes_sectors = drbd_max_discard_sectors(connection);
 	} else {
 		queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, q);
 		blk_queue_discard_granularity(q, 0);
 		q->limits.max_discard_sectors = 0;
+		q->limits.max_write_zeroes_sectors = 0;
 	}
 }
 
@@ -1482,8 +1480,7 @@ static void sanitize_disk_conf(struct drbd_device *device, struct disk_conf *dis
 	if (disk_conf->al_extents > drbd_al_extents_max(nbc))
 		disk_conf->al_extents = drbd_al_extents_max(nbc);
 
-	if (!blk_queue_discard(q)
-	    || (!q->limits.discard_zeroes_data && !disk_conf->discard_zeroes_if_aligned)) {
+	if (!blk_queue_discard(q)) {
 		if (disk_conf->rs_discard_granularity) {
 			disk_conf->rs_discard_granularity = 0; /* disable feature */
 			drbd_info(device, "rs_discard_granularity feature disabled\n");
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index aa6bf9692eff..1b0a2be24f39 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1448,105 +1448,14 @@ void drbd_bump_write_ordering(struct drbd_resource *resource, struct drbd_backin
 	drbd_info(resource, "Method to ensure write ordering: %s\n", write_ordering_str[resource->write_ordering]);
 }
 
-/*
- * We *may* ignore the discard-zeroes-data setting, if so configured.
- *
- * Assumption is that it "discard_zeroes_data=0" is only because the backend
- * may ignore partial unaligned discards.
- *
- * LVM/DM thin as of at least
- *   LVM version: 2.02.115(2)-RHEL7 (2015-01-28)
- *   Library version: 1.02.93-RHEL7 (2015-01-28)
- *   Driver version: 4.29.0
- * still behaves this way.
- *
- * For unaligned (wrt. alignment and granularity) or too small discards,
- * we zero-out the initial (and/or) trailing unaligned partial chunks,
- * but discard all the aligned full chunks.
- *
- * At least for LVM/DM thin, the result is effectively "discard_zeroes_data=1".
- */
-int drbd_issue_discard_or_zero_out(struct drbd_device *device, sector_t start, unsigned int nr_sectors, bool discard)
-{
-	struct block_device *bdev = device->ldev->backing_bdev;
-	struct request_queue *q = bdev_get_queue(bdev);
-	sector_t tmp, nr;
-	unsigned int max_discard_sectors, granularity;
-	int alignment;
-	int err = 0;
-
-	if (!discard)
-		goto zero_out;
-
-	/* Zero-sector (unknown) and one-sector granularities are the same. */
-	granularity = max(q->limits.discard_granularity >> 9, 1U);
-	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
-
-	max_discard_sectors = min(q->limits.max_discard_sectors, (1U << 22));
-	max_discard_sectors -= max_discard_sectors % granularity;
-	if (unlikely(!max_discard_sectors))
-		goto zero_out;
-
-	if (nr_sectors < granularity)
-		goto zero_out;
-
-	tmp = start;
-	if (sector_div(tmp, granularity) != alignment) {
-		if (nr_sectors < 2*granularity)
-			goto zero_out;
-		/* start + gran - (start + gran - align) % gran */
-		tmp = start + granularity - alignment;
-		tmp = start + granularity - sector_div(tmp, granularity);
-
-		nr = tmp - start;
-		err |= blkdev_issue_zeroout(bdev, start, nr, GFP_NOIO, 0);
-		nr_sectors -= nr;
-		start = tmp;
-	}
-	while (nr_sectors >= granularity) {
-		nr = min_t(sector_t, nr_sectors, max_discard_sectors);
-		err |= blkdev_issue_discard(bdev, start, nr, GFP_NOIO, 0);
-		nr_sectors -= nr;
-		start += nr;
-	}
- zero_out:
-	if (nr_sectors) {
-		err |= blkdev_issue_zeroout(bdev, start, nr_sectors, GFP_NOIO, 0);
-	}
-	return err != 0;
-}
-
-static bool can_do_reliable_discards(struct drbd_device *device)
-{
-	struct request_queue *q = bdev_get_queue(device->ldev->backing_bdev);
-	struct disk_conf *dc;
-	bool can_do;
-
-	if (!blk_queue_discard(q))
-		return false;
-
-	if (q->limits.discard_zeroes_data)
-		return true;
-
-	rcu_read_lock();
-	dc = rcu_dereference(device->ldev->disk_conf);
-	can_do = dc->discard_zeroes_if_aligned;
-	rcu_read_unlock();
-	return can_do;
-}
-
 static void drbd_issue_peer_discard(struct drbd_device *device, struct drbd_peer_request *peer_req)
 {
-	/* If the backend cannot discard, or does not guarantee
-	 * read-back zeroes in discarded ranges, we fall back to
-	 * zero-out. Unless configuration specifically requested
-	 * otherwise. */
-	if (!can_do_reliable_discards(device))
-		peer_req->flags |= EE_IS_TRIM_USE_ZEROOUT;
+	struct block_device *bdev = device->ldev->backing_bdev;
 
-	if (drbd_issue_discard_or_zero_out(device, peer_req->i.sector,
-	    peer_req->i.size >> 9, !(peer_req->flags & EE_IS_TRIM_USE_ZEROOUT)))
+	if (blkdev_issue_zeroout(bdev, peer_req->i.sector, peer_req->i.size >> 9,
+			GFP_NOIO, 0))
 		peer_req->flags |= EE_WAS_ERROR;
+
 	drbd_endio_write_sec_final(peer_req);
 }
 
@@ -2376,7 +2285,7 @@ static unsigned long wire_flags_to_bio_flags(u32 dpf)
 static unsigned long wire_flags_to_bio_op(u32 dpf)
 {
 	if (dpf & DP_DISCARD)
-		return REQ_OP_DISCARD;
+		return REQ_OP_WRITE_ZEROES;
 	else
 		return REQ_OP_WRITE;
 }
@@ -2567,7 +2476,7 @@ static int receive_Data(struct drbd_connection *connection, struct packet_info *
 	op_flags = wire_flags_to_bio_flags(dp_flags);
 	if (pi->cmd == P_TRIM) {
 		D_ASSERT(peer_device, peer_req->i.size > 0);
-		D_ASSERT(peer_device, op == REQ_OP_DISCARD);
+		D_ASSERT(peer_device, op == REQ_OP_WRITE_ZEROES);
 		D_ASSERT(peer_device, peer_req->pages == NULL);
 	} else if (peer_req->pages == NULL) {
 		D_ASSERT(device, peer_req->i.size == 0);
@@ -4880,7 +4789,7 @@ static int receive_rs_deallocated(struct drbd_connection *connection, struct pac
 
 	if (get_ldev(device)) {
 		struct drbd_peer_request *peer_req;
-		const int op = REQ_OP_DISCARD;
+		const int op = REQ_OP_WRITE_ZEROES;
 
 		peer_req = drbd_alloc_peer_req(peer_device, ID_SYNCER, sector,
 					       size, 0, GFP_NOIO);
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 652114ae1a8a..b5730e17b455 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -59,6 +59,7 @@ static struct drbd_request *drbd_req_new(struct drbd_device *device, struct bio
 	drbd_req_make_private_bio(req, bio_src);
 	req->rq_state = (bio_data_dir(bio_src) == WRITE ? RQ_WRITE : 0)
 		      | (bio_op(bio_src) == REQ_OP_WRITE_SAME ? RQ_WSAME : 0)
+		      | (bio_op(bio_src) == REQ_OP_WRITE_ZEROES ? RQ_UNMAP : 0)
 		      | (bio_op(bio_src) == REQ_OP_DISCARD ? RQ_UNMAP : 0);
 	req->device = device;
 	req->master_bio = bio_src;
@@ -1148,10 +1149,10 @@ static int drbd_process_write_request(struct drbd_request *req)
 
 static void drbd_process_discard_req(struct drbd_request *req)
 {
-	int err = drbd_issue_discard_or_zero_out(req->device,
-				req->i.sector, req->i.size >> 9, true);
+	struct block_device *bdev = req->device->ldev->backing_bdev;
 
-	if (err)
+	if (blkdev_issue_zeroout(bdev, req->i.sector, req->i.size >> 9,
+			GFP_NOIO, 0))
 		req->private_bio->bi_error = -EIO;
 	bio_endio(req->private_bio);
 }
@@ -1180,7 +1181,8 @@ drbd_submit_req_private_bio(struct drbd_request *req)
 	if (get_ldev(device)) {
 		if (drbd_insert_fault(device, type))
 			bio_io_error(bio);
-		else if (bio_op(bio) == REQ_OP_DISCARD)
+		else if (bio_op(bio) == REQ_OP_WRITE_ZEROES ||