Commit Graph

264389 Commits (a38eb630fa224d6fba8c14a4063174bc5e0f63bb)

Author SHA1 Message Date
Vivek Goyal a38eb630fa blk-throttle: Take blkcg->lock while traversing blkcg->policy_list
blkcg->policy_list is protected by blkcg->lock. Its not rcu protected
list. So even for readers, they need to take blkcg->lock. There are
few functions which were reading the list without taking lock. Fix it.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-25 15:48:12 +02:00
Vivek Goyal e060f00bee blk-throttle: Free up policy node associated with deleted rule
If a rule is being deleted, free up associated policy node. Otherwise
that memory is leaked.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-25 15:48:12 +02:00
Tao Ma 5e08159197 block: warn if tag is greater than real_max_depth.
In case tag depth is reduced, it is max_depth not real_max_depth.
So we should allow a request with tag >= max_depth, but for a
tag >= real_max_depth, there really should be some problem.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-25 10:20:05 +02:00
Jens Axboe 83157223de Merge branch 'for-linus' into for-3.2/core 2011-10-24 16:24:38 +02:00
Tejun Heo f992ae801a block: make gendisk hold a reference to its queue
The following command sequence triggers an oops.

# mount /dev/sdb1 /mnt
# echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
 RIP: 0010:[<ffffffff810d0879>]  [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60
...
 Call Trace:
  [<ffffffff810d2845>] lock_acquire+0x95/0x140
  [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
  [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70
  [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
  [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
  [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0
  [<ffffffff811c40df>] blkdev_put+0x5f/0x190
  [<ffffffff8118f18d>] kill_block_super+0x4d/0x80
  [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
  [<ffffffff8119003a>] deactivate_super+0x4a/0x70
  [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
  [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0
  [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b

This is because bdev holds on to disk but disk doesn't pin the
associated queue.  If a SCSI device is removed while the device is
still open, the sdev puts the base reference to the queue on release.
When the bdev is finally released, the associated queue is already
gone along with the bdi and bdev_inode_switch_bdi() ends up
dereferencing already freed bdi.

Even if it were not for this bug, disk not holding onto the associated
queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and
put it on disk_release() and ensuring that disk and its fops owner are
put in that order after all accesses to the disk and queue are
complete.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-24 16:24:31 +02:00
Jeff Moyer e67b77c791 blk-flush: move the queue kick into
A dm-multipath user reported[1] a problem when trying to boot
a kernel with commit 4853abaae7
(block: fix flush machinery for stacking drivers with differring
flush flags) applied.  It turns out that an empty flush request
can be sent into blk_insert_flush.  When the BUG_ON was fixed
to allow for this, I/O on the underlying device would stall.  The
reason is that blk_insert_cloned_request does not kick the queue.
In the aforementioned commit, I had added a special case to
kick the queue if data was sent down but the queue flags did
not require a flush.  A better solution is to push the queue
kick up into blk_insert_cloned_request.

This patch, along with a follow-on which fixes the BUG_ON, fixes
the issue reported.

[1] http://www.redhat.com/archives/dm-devel/2011-September/msg00154.html

Reported-by: Christophe Saout <christophe@saout.de>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>

Stable note: 3.1
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-24 16:24:31 +02:00
Jeff Moyer 834f9f61a5 blk-flush: fix invalid BUG_ON in blk_insert_flush
A user reported a regression due to commit
4853abaae7 (block: fix flush
machinery for stacking drivers with differring flush flags).
Part of the problem is that blk_insert_flush required a
single bio be attached to the request.  In reality, having
no attached bio is also a valid case, as can be observed with
an empty flush.

[1] http://www.redhat.com/archives/dm-devel/2011-September/msg00154.html

Reported-by: Christophe Saout <christophe@saout.de>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com
Acked-by: Tejun Heo <tj@kernel.org>

Stable note: 3.1
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-24 16:24:30 +02:00
Tao Ma 9562ad9ab3 block: Remove the control of complete cpu from bio.
bio originally has the functionality to set the complete cpu, but
it is broken.

Chirstoph said that "This code is unused, and from the all the
discussions lately pretty obviously broken.  The only thing keeping
it serves is creating more confusion and possibly more bugs."

And Jens replied with "We can kill bio_set_completion_cpu(). I'm fine
with leaving cpu control to the request based drivers, they are the
only ones that can toggle the setting anyway".

So this patch tries to remove all the work of controling complete cpu
from a bio.

Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-24 16:11:30 +02:00
Jie Liu e890413af4 block: fix a typo in the blk-cgroup.h file
byptes -> bytes.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-24 16:08:38 +02:00
David Vrabel 3bcfeaf93f block: initialize the bounce pool if high memory may be added later
init_emergency_pool() does not create the page pool for bouncing block
requests if the current count of high pages is zero.  If high memory
may be added later (either via memory hotplug or a balloon driver in a
virtualized system) then a oops occurs if a request with a high page
need bouncing because the pool does not exist.

So, always create the pool if memory hotplug is enabled and change the
test so it's valid even if all high pages are currently in the balloon
(the balloon drivers adjust totalhigh_pages but not max_pfn).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-20 21:24:30 +02:00
Tejun Heo c9a929dde3 block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown
request_queue is refcounted but actually depdends on lifetime
management from the queue owner - on blk_cleanup_queue(), block layer
expects that there's no request passing through request_queue and no
new one will.

This is fundamentally broken.  The queue owner (e.g. SCSI layer)
doesn't have a way to know whether there are other active users before
calling blk_cleanup_queue() and other users (e.g. bsg) don't have any
guarantee that the queue is and would stay valid while it's holding a
reference.

With delay added in blk_queue_bio() before queue_lock is grabbed, the
following oops can be easily triggered when a device is removed with
in-flight IOs.

 sd 0:0:1:0: [sdb] Stopping disk
 ata1.01: disabled
 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 648, comm: test_rawio Not tainted 3.1.0-rc3-work+ #56 Bochs Bochs
 RIP: 0010:[<ffffffff8137d651>]  [<ffffffff8137d651>] elv_rqhash_find+0x61/0x100
 ...
 Process test_rawio (pid: 648, threadinfo ffff880019efa000, task ffff880019ef8a80)
 ...
 Call Trace:
  [<ffffffff8137d774>] elv_merge+0x84/0xe0
  [<ffffffff81385b54>] blk_queue_bio+0xf4/0x400
  [<ffffffff813838ea>] generic_make_request+0xca/0x100
  [<ffffffff81383994>] submit_bio+0x74/0x100
  [<ffffffff811c53ec>] dio_bio_submit+0xbc/0xc0
  [<ffffffff811c610e>] __blockdev_direct_IO+0x92e/0xb40
  [<ffffffff811c39f7>] blkdev_direct_IO+0x57/0x60
  [<ffffffff8113b1c5>] generic_file_aio_read+0x6d5/0x760
  [<ffffffff8118c1ca>] do_sync_read+0xda/0x120
  [<ffffffff8118ce55>] vfs_read+0xc5/0x180
  [<ffffffff8118cfaa>] sys_pread64+0x9a/0xb0
  [<ffffffff81afaf6b>] system_call_fastpath+0x16/0x1b

This happens because blk_queue_cleanup() destroys the queue and
elevator whether IOs are in progress or not and DEAD tests are
sprinkled in the request processing path without proper
synchronization.

Similar problem exists for blk-throtl.  On queue cleanup, blk-throtl
is shutdown whether it has requests in it or not.  Depending on
timing, it either oopses or throttled bios are lost putting tasks
which are waiting for bio completion into eternal D state.

The way it should work is having the usual clear distinction between
shutdown and release.  Shutdown drains all currently pending requests,
marks the queue dead, and performs partial teardown of the now
unnecessary part of the queue.  Even after shutdown is complete,
reference holders are still allowed to issue requests to the queue
although they will be immmediately failed.  The rest of teardown
happens on release.

This patch makes the following changes to make blk_queue_cleanup()
behave as proper shutdown.

* QUEUE_FLAG_DEAD is now set while holding both q->exit_mutex and
  queue_lock.

* Unsynchronized DEAD check in generic_make_request_checks() removed.
  This couldn't make any meaningful difference as the queue could die
  after the check.

* blk_drain_queue() updated such that it can drain all requests and is
  now called during cleanup.

* blk_throtl updated such that it checks DEAD on grabbing queue_lock,
  drains all throttled bios during cleanup and free td when queue is
  released.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-19 14:42:16 +02:00
Tejun Heo bd87b5898a block: drop @tsk from attempt_plug_merge() and explain sync rules
attempt_plug_merge() accesses elevator without holding queue_lock and
may call into ->elevator_bio_merge_fn().  The elvator is guaranteed to
be valid because it's accessed iff the plugged list has requests and
elevator is never exited with live requests, so as long as the
elevator method can deal with unlocked access, this is safe.

Explain the sync rules around attempt_plug_merge() and drop the
unnecessary @tsk parameter.

This patch doesn't introduce any functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-19 14:33:08 +02:00
Tejun Heo da8303c63b block: make get_request[_wait]() fail if queue is dead
Currently get_request[_wait]() allocates request whether queue is dead
or not.  This patch makes get_request[_wait]() return NULL if @q is
dead.  blk_queue_bio() is updated to fail the submitted bio if request
allocation fails.  While at it, add docbook comments for
get_request[_wait]().

Note that the current code has rather unclear (there are spurious DEAD
tests scattered around) assumption that the owner of a queue
guarantees that no request travels block layer if the queue is dead
and this patch in itself doesn't change much; however, this will allow
fixing the broken assumption in the next patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-19 14:33:05 +02:00
Tejun Heo bc16a4f933 block: reorganize throtl_get_tg() and blk_throtl_bio()
blk_throtl_bio() and throtl_get_tg() have rather unusual interface.

* throtl_get_tg() returns pointer to a valid tg or ERR_PTR(-ENODEV),
  and drops queue_lock in the latter case.  Different locking context
  depending on return value is error-prone and DEAD state is scheduled
  to be protected by queue_lock anyway.  Move DEAD check inside
  queue_lock and return valid tg or NULL.

* blk_throtl_bio() indicates return status both with its return value
  and in/out param **@bio.  The former is used to indicate whether
  queue is found to be dead during throtl processing.  The latter
  whether the bio is throttled.

  There's no point in returning DEAD check result from
  blk_throtl_bio().  The queue can die after blk_throtl_bio() is
  finished but before make_request_fn() grabs queue lock.

  Make it take *@bio instead and return boolean result indicating
  whether the request is throttled or not.

This patch doesn't cause any visible functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-19 14:33:01 +02:00
Tejun Heo e3c78ca524 block: reorganize queue draining
Reorganize queue draining related code in preparation of queue exit
changes.

* Factor out actual draining from elv_quiesce_start() to
  blk_drain_queue().

* Make elv_quiesce_start/end() responsible for their own locking.

* Replace open-coded ELVSWITCH clearing in elevator_switch() with
  elv_quiesce_end().

This patch doesn't cause any visible functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-19 14:32:38 +02:00
Tejun Heo 315fceee81 block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg()
blk_get/put_queue() in scsi_cmd_ioctl() and throtl_get_tg() are
completely bogus.  The caller must have a reference to the queue on
entry and taking an extra reference doesn't change anything.

For scsi_cmd_ioctl(), the only effect is that it ends up checking
QUEUE_FLAG_DEAD on entry; however, this is bogus as queue can die
right after blk_get_queue().  Dead queue should be and is handled in
request issue path (it's somewhat broken now but that's a separate
problem and doesn't affect this one much).

throtl_get_tg() incorrectly assumes that q is rcu freed.  Also, it
doesn't check return value of blk_get_queue().  If the queue is
already dead, it ends up doing an extra put.

Drop them.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-19 14:31:25 +02:00
Tejun Heo 75eb6c372d block: pass around REQ_* flags instead of broken down booleans during request alloc/free
blk_alloc_request() and freed_request() take different combinations of
REQ_* @flags, @priv and @is_sync when @flags is superset of the latter
two.  Make them take @flags only.  This cleans up the code a bit and
will ease updating allocation related REQ_* flags.

This patch doesn't introduce any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-19 14:31:22 +02:00
Tejun Heo bc9fcbf9cb block: move blk_throtl prototypes to block/blk.h
blk_throtl interface is block internal and there's no reason to have
them in linux/blkdev.h.  Move them to block/blk.h.

This patch doesn't introduce any functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-19 14:31:18 +02:00
Tejun Heo ece84241b9 block: fix genhd refcounting in blkio_policy_parse_and_set()
blkio_policy_parse_and_set() calls blkio_check_dev_num() to check
whether the given dev_t is valid.  blkio_check_dev_num() uses
get_gendisk() for verification but never puts the returned genhd
leaking the reference.

This patch collapses blkio_check_dev_num() into its caller and updates
it such that the genhd is put before returning.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-19 14:31:15 +02:00
Tejun Heo 523e1d399c block: make gendisk hold a reference to its queue
The following command sequence triggers an oops.

# mount /dev/sdb1 /mnt
# echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
 RIP: 0010:[<ffffffff810d0879>]  [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60
...
 Call Trace:
  [<ffffffff810d2845>] lock_acquire+0x95/0x140
  [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
  [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70
  [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
  [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
  [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0
  [<ffffffff811c40df>] blkdev_put+0x5f/0x190
  [<ffffffff8118f18d>] kill_block_super+0x4d/0x80
  [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
  [<ffffffff8119003a>] deactivate_super+0x4a/0x70
  [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
  [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0
  [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b

This is because bdev holds on to disk but disk doesn't pin the
associated queue.  If a SCSI device is removed while the device is
still open, the sdev puts the base reference to the queue on release.
When the bdev is finally released, the associated queue is already
gone along with the bdi and bdev_inode_switch_bdi() ends up
dereferencing already freed bdi.

Even if it were not for this bug, disk not holding onto the associated
queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and
put it on disk_release() and ensuring that disk and its fops owner are
put in that order after all accesses to the disk and queue are
complete.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-19 14:31:07 +02:00
Jens Axboe 5c04b426f2 Merge branch 'v3.1-rc10' into for-3.2/core
Conflicts:
	block/blk-core.c
	include/linux/blkdev.h

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-19 14:30:42 +02:00
Linus Torvalds 899e3ee404 Linux 3.1-rc10 2011-10-17 21:06:23 -07:00
Linus Torvalds a84a79e4d3 Avoid using variable-length arrays in kernel/sys.c
The size is always valid, but variable-length arrays generate worse code
for no good reason (unless the function happens to be inlined and the
compiler sees the length for the simple constant it is).

Also, there seems to be some code generation problem on POWER, where
Henrik Bakken reports that register r28 can get corrupted under some
subtle circumstances (interrupt happening at the wrong time?).  That all
indicates some seriously broken compiler issues, but since variable
length arrays are bad regardless, there's little point in trying to
chase it down.

"Just don't do that, then".

Reported-by: Henrik Grindal Bakken <henribak@cisco.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-17 08:24:24 -07:00
Christoph Hellwig 456be1484f loop: remove the incorrect write_begin/write_end shortcut
Currently the loop device tries to call directly into write_begin/write_end
instead of going through ->write if it can.  This is a fairly nasty shortcut
as write_begin and write_end are only callbacks for the generic write code
and expect to be called with filesystem specific locks held.

This code currently causes various issues for clustered filesystems as it
doesn't take the required cluster locks, and it also causes issues for XFS
as it doesn't properly lock against the swapext ioctl as called by the
defragmentation tools.  This in case causes data corruption if
defragmentation hits a busy loop device in the wrong time window, as
reported by RH QA.

The reason why we have this shortcut is that it saves a data copy when
doing a transformation on the loop device, which is the technical term
for using cryptoloop (or an XOR transformation).  Given that cryptoloop
has been deprecated in favour of dm-crypt my opinion is that we should
simply drop this shortcut instead of finding complicated ways to to
introduce a formal interface for this shortcut.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-17 12:57:20 +02:00
Linus Torvalds 8bc03e8f3a Merge branch 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm
* 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm:
  ARM: 7128/1: vic: Don't write to the read-only register VIC_IRQ_STATUS
  ARM: 7122/1: localtimer: add header linux/errno.h explicitly
  ARM: 7117/1: perf: fix HW_CACHE_* events on Cortex-A9
  ARM: 7113/1: mm: Align bank start to MAX_ORDER_NR_PAGES
2011-10-16 13:08:27 -07:00
Zoltan Devai f8be12d153 ARM: 7128/1: vic: Don't write to the read-only register VIC_IRQ_STATUS
This is unneeded and causes an abort on the SPMP8000 platform.

Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Zoltan Devai <zoss@devai.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-10-15 11:04:22 +01:00
Shawn Guo bb1ac3ec95 ARM: 7122/1: localtimer: add header linux/errno.h explicitly
Per the text in  Documentation/SubmitChecklist as below, we should
explicitly have header linux/errno.h in localtimer.h for ENXIO
reference.

1: If you use a facility then #include the file that defines/declares
   that facility.  Don't depend on other header files pulling in ones
   that you use.

Otherwise, we may run into some compiling error like the following one,
if any file includes localtimer.h without CONFIG_LOCAL_TIMERS defined.

  arch/arm/include/asm/localtimer.h: In function ‘local_timer_setup’:
  arch/arm/include/asm/localtimer.h:53:10: error: ‘ENXIO’ undeclared (first use in this function)

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-10-15 11:04:22 +01:00
Will Deacon 29a541f6c1 ARM: 7117/1: perf: fix HW_CACHE_* events on Cortex-A9
Using COHERENT_LINE_{MISS,HIT} for cache misses and references
respectively is completely wrong. Instead, use the L1D events which
are a better and more useful approximation despite ignoring instruction
traffic.

Reported-by: Alasdair Grant <alasdair.grant@arm.com>
Reported-by: Matt Horsnell <matt.horsnell@arm.com>
Reported-by: Michael Williams <michael.williams@arm.com>
Cc: stable@kernel.org
Cc: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-10-15 11:04:22 +01:00
Linus Torvalds 4c41042d1d Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (w83627ehf) Properly report thermal diode sensors
2011-10-15 08:29:09 +12:00
Linus Torvalds e9308cfd5a Merge branch 'gpio/merge' of git://git.secretlab.ca/git/linux-2.6
* 'gpio/merge' of git://git.secretlab.ca/git/linux-2.6:
  gpio-pca953x: fix gpio_base
  gpio/omap: fix build error with certain OMAP1 configs
2011-10-14 17:07:52 +12:00
Linus Torvalds 480082968a Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: revert to using a kthread for AIL pushing
  xfs: force the log if we encounter pinned buffers in .iop_pushbuf
  xfs: do not update xa_last_pushed_lsn for locked items
2011-10-14 17:06:39 +12:00
Linus Torvalds 95bc156c62 Merge branch 'stable' of git://github.com/cmetcalf-tilera/linux-tile
* 'stable' of git://github.com/cmetcalf-tilera/linux-tile:
  tile: revert change from <asm/atomic.h> to <linux/atomic.h> in asm files
2011-10-14 16:59:11 +12:00
Linus Torvalds 2ad53110d6 Merge branch 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip
* 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  x86: Default to vsyscall=native for now
2011-10-14 16:54:56 +12:00
Mika Westerberg 153b19a3b9 x86, mrst: use a temporary variable for SFI irq
SFI tables reside in RAM and should not be modified once they are
written.  Current code went to set pentry->irq to zero which causes
subsequent reads to fail with invalid SFI table checksum.  This will
break kexec as the second kernel fails to validate SFI tables.

To fix this we use temporary variable for irq number.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-14 16:53:27 +12:00
Jean Delvare bf164c58e5 hwmon: (w83627ehf) Properly report thermal diode sensors
The w83627ehf driver is improperly reporting thermal diode sensors as
type 2, instead of 3. This caused "sensors" and possibly other
monitoring tools to report these sensors as "transistor" instead of
"thermal diode".

Furthermore, diode subtype selection (CPU vs. external) is only
supported by the original W83627EHF/EHG. All later models only support
CPU diode type, and some (NCT6776F) don't even have the register in
question so we should avoid reading from it.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: stable@kernel.org
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
2011-10-13 16:51:29 -07:00
Hartmut Knaack 25fcf2b7f1 gpio-pca953x: fix gpio_base
gpio_base was set to 0 if no system platform data or open firmware
platform data was provided. This led to conflicts, if any other gpiochip
with a gpiobase of 0 was instantiated already. Setting it to -1 will
automatically use the first one available.

Signed-off-by: Hartmut Knaack <knaack.h@gmx.de>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-10-13 13:49:37 -06:00
Janusz Krzysztofik 78a4315872 gpio/omap: fix build error with certain OMAP1 configs
With commit f64ad1a0e2, "gpio/omap: cleanup _set_gpio_wakeup(), remove
ifdefs", access to build time conditionally omitted 'suspend_wakeup'
member of the 'gpio_bank' structure has been placed unconditionally in
function _set_gpio_wakeup(), which is always built. This resulted in the
driver compilation broken for certain OMAP1, i.e., non-OMAP16xx,
configurations.

Really required or not in previously excluded cases, define this
structure member unconditionally as a fix.

Tested with a custom OMAP1510 only configuration.

Signed-off-by: Janusz Krzysztofik <jkrzyszt@tis.icnet.pl>
Acked-by: Kevin Hilman <khilman@ti.com>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-10-13 11:48:25 -06:00
Chris Metcalf d52104b29a tile: revert change from <asm/atomic.h> to <linux/atomic.h> in asm files
The 32-bit TILEPro support uses some #defines in <asm/atomic_32.h>
for atomic support routines in assembly.  To make this more explicit,
I've turned those includes into includes of <asm/atomic_32.h>, which
should hopefully make it clear that they shouldn't be bombed into
<linux/atomic.h> in any cleanups.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-10-13 08:25:01 -04:00
Linus Torvalds 37cf95162a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  mscan: too much data copied to CAN frame due to 16 bit accesses
  gro: refetch inet6_protos[] after pulling ext headers
  bnx2x: fix cl_id allocation for non-eth clients for NPAR mode
  mlx4_en: fix endianness with blue frame support
2011-10-13 18:25:45 +12:00
Johann Felix Soden 1d11360121 ide: Fix file references in drivers/ide/
Fix file references in drivers/ide/

There are a lot of file references to now moved or deleted files in the
whole tree, especially in documentation and Kconfig files.  This patch
fixes the references in drivers/ide/.

Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net>
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-13 18:25:11 +12:00
Linus Torvalds b2f9452bd5 Merge branch 'btrfs-3.0' of git://github.com/chrismason/linux
* 'btrfs-3.0' of git://github.com/chrismason/linux:
  Btrfs: make sure not to defrag extents past i_size
  Btrfs: fix recursive auto-defrag
2011-10-13 18:20:40 +12:00
Christoph Hellwig 0030807c66 xfs: revert to using a kthread for AIL pushing
Currently we have a few issues with the way the workqueue code is used to
implement AIL pushing:

 - it accidentally uses the same workqueue as the syncer action, and thus
   can be prevented from running if there are enough sync actions active
   in the system.
 - it doesn't use the HIGHPRI flag to queue at the head of the queue of
   work items

At this point I'm not confident enough in getting all the workqueue flags and
tweaks right to provide a perfectly reliable execution context for AIL
pushing, which is the most important piece in XFS to make forward progress
when the log fills.

Revert back to use a kthread per filesystem which fixes all the above issues
at the cost of having a task struct and stack around for each mounted
filesystem.  In addition this also gives us much better ways to diagnose
any issues involving hung AIL pushing and removes a small amount of code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Tested-by: Stefan Priebe <s.priebe@profihost.ag>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-11 11:02:49 -05:00
Christoph Hellwig 17b38471c3 xfs: force the log if we encounter pinned buffers in .iop_pushbuf
We need to check for pinned buffers even in .iop_pushbuf given that inode
items flush into the same buffers that may be pinned directly due operations
on the unlinked inode list operating directly on buffers.  To do this add a
return value to .iop_pushbuf that tells the AIL push about this and use
the existing log force mechanisms to unpin it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Tested-by: Stefan Priebe <s.priebe@profihost.ag>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-11 11:02:48 -05:00
Christoph Hellwig bc6e588a89 xfs: do not update xa_last_pushed_lsn for locked items
If an item was locked we should not update xa_last_pushed_lsn and thus skip
it when restarting the AIL scan as we need to be able to lock and write it
out as soon as possible.  Otherwise heavy lock contention might starve AIL
pushing too easily, especially given the larger backoff once we moved
xa_last_pushed_lsn all the way to the target lsn.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Tested-by: Stefan Priebe <s.priebe@profihost.ag>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-11 11:02:48 -05:00
Chris Mason f7f43cc841 Btrfs: make sure not to defrag extents past i_size
The btrfs file defrag code will loop through the extents and
force COW on them.  But there is a concurrent truncate in the middle of
the defrag, it might end up defragging the same range over and over
again.

The problem is that writepage won't go through and do anything on pages
past i_size, so the cow won't happen, so the file will appear to still
be fragmented.  defrag will end up hitting the same extents again and
again.

In the worst case, the truncate can actually live lock with the defrag
because the defrag keeps creating new ordered extents which the truncate
code keeps waiting on.

The fix here is to make defrag check for i_size inside the main loop,
instead of just once before the looping starts.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-10-11 11:45:55 -04:00
Adrian Bunk 2b666859ec x86: Default to vsyscall=native for now
This UML breakage:

  linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
  linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790

Is caused by commit 3ae36655 ("x86-64: Rework vsyscall emulation and add
vsyscall= parameter") - the vsyscall emulation code is not fully cooked
yet as UML relies on some rather fragile SIGSEGV semantics.

Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default
to vsyscall=native for now, this patch implements that.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Andrew Lutomirski <luto@mit.edu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/20111005214047.GE14406@localhost.pp.htv.fi
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-11 08:23:34 +02:00
Li Zefan 2a0f7f5769 Btrfs: fix recursive auto-defrag
Follow those steps:

  # mount -o autodefrag /dev/sda7 /mnt
  # dd if=/dev/urandom of=/mnt/tmp bs=200K count=1
  # sync
  # dd if=/dev/urandom of=/mnt/tmp bs=8K count=1 conv=notrunc

and then it'll go into a loop: writeback -> defrag -> writeback ...

It's because writeback writes [8K, 200K] and then writes [0, 8K].

I tried to make writeback know if the pages are dirtied by defrag,
but the patch was a bit intrusive. Here I simply set writeback_index
when we defrag a file.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-10-10 15:43:34 -04:00
Wolfgang Grandegger a3a4bfde8a mscan: too much data copied to CAN frame due to 16 bit accesses
Due to the 16 bit access to mscan registers there's too much data copied to
the zero initialized CAN frame when having an odd number of bytes to copy.
This patch ensures that only the requested bytes are copied by using an
8 bit access for the remaining byte.

Reported-by: Andre Naujoks <nautsch@gmail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-10 14:31:00 -04:00
Yan, Zheng cdaf557034 gro: refetch inet6_protos[] after pulling ext headers
ipv6_gro_receive() doesn't update the protocol ops after pulling
the ext headers. It looks like a typo.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-10 14:26:16 -04:00
Dmitry Kravkov 134d0f974c bnx2x: fix cl_id allocation for non-eth clients for NPAR mode
There are some consolidations of NPAR configuration
when FCoE and iSCSI L2 clients will get the same id,
in this case FCoE ring will be non-functional.

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-10 14:21:26 -04:00