Commit graph

311747 commits

Author SHA1 Message Date
Jeff Layton
ec01d738a1 cifs: when server doesn't set CAP_LARGE_READ_X, cap default rsize at MaxBufferSize
When the server doesn't advertise CAP_LARGE_READ_X, then MS-CIFS states
that you must cap the size of the read at the client's MaxBufferSize.
Unfortunately, testing with many older servers shows that they often
can't service a read larger than their own MaxBufferSize.

Since we can't assume what the server will do in this situation, we must
be conservative here for the default. When the server can't do large
reads, then assume that it can't satisfy any read larger than its
MaxBufferSize either.

Luckily almost all modern servers can do large reads, so this won't
affect them. This is really just for older win9x and OS/2 era servers.
Also, note that this patch just governs the default rsize. The admin can
always override this if he so chooses.

Cc: <stable@vger.kernel.org> # 3.2
Reported-by: David H. Durgee <dhdurgee@acm.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steven French <sfrench@w500smf.(none)>
2012-07-03 12:54:42 -05:00
Linus Torvalds
2fb748d265 md: collection of bug fixes for 3.5
You go away for 2 weeks vacation and what do you get when you come back?
 Piles of bugs :-)
 
 Some found by inspection, some by testing, some during use in the field,
 and some while developing for the next window...
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQIVAwUAT/KkEDnsnt1WYoG5AQJ7Wg/7Bsx63RRwcek5nr2qyUJEaNN8gW8U+69T
 cobf1epSZiVwjeR6DqyB7J3SbNGubH8ZFUV1qdjpoypJq6Wvt18Bh6NV70RVJtpu
 QTalNuu4xBYBn9t7vtgG9L31/zPz9XkwXFUm7oPsr1iMqyahTPe51/6rlmn3axNI
 o92scqAOIk2XpSA8T/fbe/ixi6wC1eF4MVv/0gVrwm4mCBIas1EhA6FjpkE8GPVi
 CWMwfN789w/A5rFBbjzKW/F5F1OhELB9M0kGM80JmGUKm3y3dU/aL7HLQb+bnnfo
 89BQeWLVcCwWhu2DwJYRPmZZRkqsTbEyLSrSLPv3nSzGHSyPJp8QpEdGaNVFf44A
 Emq4JqfOpBcjm4kP9tnNV8l+T0k0rjYo81Db4tg9eVtQ3Ld3ZCHosiJWjIMhPJq9
 vWQRYUc0OrRopdy0ugd3AYeHTkOaNVp781Ieph1v6lScdg38wHsA1cJ89jceeiwQ
 MJSFhNLckDlYdvmu5Hzp/St2FDiGihlIxHcaqVmvnJINCegdpmXQOPKJKp16fBdP
 XxE81wbHjtJtlQnjlx9jTMMVpSw3UxOcv6aW6avT/y4Zr8r143Z12YySG/DbfBg6
 SJtzIt3md5ebrTV3yVjWZ22jxKKxs3Eg/KU6XzlZmVTHKZ/cFN14FSFMugBk3dr9
 +NpaoIMZfls=
 =vv4r
 -----END PGP SIGNATURE-----

Merge tag 'md-3.5-fixes' of git://neil.brown.name/md

Pull md fixes from NeilBrown:
 "md: collection of bug fixes for 3.5

  You go away for 2 weeks vacation and what do you get when you come
  back? Piles of bugs :-)

  Some found by inspection, some by testing, some during use in the
  field, and some while developing for the next window..."

* tag 'md-3.5-fixes' of git://neil.brown.name/md:
  md: fix up plugging (again).
  md: support re-add of recovering devices.
  md/raid1: fix bug in read_balance introduced by hot-replace
  raid5: delayed stripe fix
  md/raid456: When read error cannot be recovered, record bad block
  md: make 'name' arg to md_register_thread non-optional.
  md/raid10: fix failure when trying to repair a read error.
  md/raid5: fix refcount problem when blocked_rdev is set.
  md:Add blk_plug in sync_thread.
  md/raid5: In ops_run_io, inc nr_pending before calling md_wait_for_blocked_rdev
  md/raid5: Do not add data_offset before call to is_badblock
  md/raid5: prefer replacing failed devices over want-replacement devices.
  md/raid10: Don't try to recovery unmatched (and unused) chunks.
2012-07-03 10:40:43 -07:00
Linus Torvalds
3bfd245476 Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security layer fixes from James Morris.

A documentation update, and a nommu build fix.

* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  security: Fix nommu build.
  security: document no_new_privs
2012-07-03 10:39:40 -07:00
Milan Broz
18068bdd5f dm: verity fix documentation
Veritysetup is now part of cryptsetup package.
Remove on-disk header description (which is not parsed in kernel)
and point users to cryptsetup where it the format is documented.
Mention units for block size paramaters.
Fix target line specification and dmsetup parameters.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-03 12:55:41 +01:00
Mike Snitzer
b0239faaf8 dm persistent data: fix allocation failure in space map checker init
If CONFIG_DM_DEBUG_SPACE_MAPS is enabled and memory is fragmented and a
sufficiently-large metadata device is used in a thin pool then the space
map checker will fail to allocate the memory it requires.

Switch from kmalloc to vmalloc to allow larger virtually contiguous
allocations for the space map checker's internal count arrays.

Reported-by: Vivek Goyal <vgoyal@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-03 12:55:37 +01:00
Mike Snitzer
62662303e7 dm persistent data: handle space map checker creation failure
If CONFIG_DM_DEBUG_SPACE_MAPS is enabled and dm_sm_checker_create()
fails, dm_tm_create_internal() would still return success even though it
cleaned up all resources it was supposed to have created.  This will
lead to a kernel crash:

general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
...
RIP: 0010:[<ffffffff81593659>]  [<ffffffff81593659>] dm_bufio_get_block_size+0x9/0x20
Call Trace:
  [<ffffffff81599bae>] dm_bm_block_size+0xe/0x10
  [<ffffffff8159b8b8>] sm_ll_init+0x78/0xd0
  [<ffffffff8159c1a6>] sm_ll_new_disk+0x16/0xa0
  [<ffffffff8159c98e>] dm_sm_disk_create+0xfe/0x160
  [<ffffffff815abf6e>] dm_pool_metadata_open+0x16e/0x6a0
  [<ffffffff815aa010>] pool_ctr+0x3f0/0x900
  [<ffffffff8158d565>] dm_table_add_target+0x195/0x450
  [<ffffffff815904c4>] table_load+0xe4/0x330
  [<ffffffff815917ea>] ctl_ioctl+0x15a/0x2c0
  [<ffffffff81591963>] dm_ctl_ioctl+0x13/0x20
  [<ffffffff8116a4f8>] do_vfs_ioctl+0x98/0x560
  [<ffffffff8116aa51>] sys_ioctl+0x91/0xa0
  [<ffffffff81869f52>] system_call_fastpath+0x16/0x1b

Fix the space map checker code to return an appropriate ERR_PTR and have
dm_sm_disk_create() and dm_tm_create_internal() check for it with
IS_ERR.

Reported-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-03 12:55:35 +01:00
Mike Snitzer
25d7cd6faa dm persistent data: fix shadow_info_leak on dm_tm_destroy
Cleanup the shadow table before destroying the transaction manager.

Reference: leak was identified with kmemleak when running
test_discard_random_sectors in the thinp-test-suite.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-03 12:55:33 +01:00
Joe Thornber
0d200aefd4 dm thin: commit metadata before creating metadata snapshot
Userland sometimes sees a corrupt metadata block if metadata is changing
rapidly when a metadata snapshot is reserved for userland,  To make the
problem go away, commit before we take the metadata snapshot (which is a
sensible thing to do anyway).

The checksums mean userland spots this corruption immediately so there's
no risk of acting on incorrect data.  No corruption exists from the
kernel's point of view, and thin_check passes after pool shutdown.

I believe this is to do with shared blocks at the first level of the
{device, mapping} btree.  Prior to the metadata-snap support no sharing
at this level was possible, so this patch is only required after commit
cc8394d86f ("dm thin: provide userspace
access to pool metadata").

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-03 12:55:31 +01:00
Paul Mundt
75331a597c security: Fix nommu build.
The security + nommu configuration presently blows up with an undefined
reference to BDI_CAP_EXEC_MAP:

security/security.c: In function 'mmap_prot':
security/security.c:687:36: error: dereferencing pointer to incomplete type
security/security.c:688:16: error: 'BDI_CAP_EXEC_MAP' undeclared (first use in this function)
security/security.c:688:16: note: each undeclared identifier is reported only once for each function it appears in

include backing-dev.h directly to fix it up.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-07-03 21:41:03 +10:00
Daniel Vetter
9f846a16d2 drm/i915: kick any firmware framebuffers before claiming the gtt
Especially vesafb likes to map everything as uc- (yikes), and if that
mapping hangs around still while we try to map the gtt as wc the
kernel will downgrade our request to uc-, resulting in abyssal
performance.

Unfortunately we can't do this as early as readon does (i.e. as the
first thing we do when initializing the hw) because our fb/mmio space
region moves around on a per-gen basis. So I've had to move it below
the gtt initialization, but that seems to work, too. The important
thing is that we do this before we set up the gtt wc mapping.

Now an altogether different question is why people compile their
kernels with vesafb enabled, but I guess making things just work isn't
bad per se ...

v2:
- s/radeondrmfb/inteldrmfb/
- fix up error handling

v3: Kill #ifdef X86, this is Intel after all. Noticed by Ben Widawsky.

v4: Jani Nikula complained about the pointless bool primary
initialization.

v5: Don't oops if we can't allocate, noticed by Chris Wilson.

v6: Resolve conflicts with agp rework and fixup whitespace.

This is commit e188719a28 in drm-next.

Backport to 3.5 -fixes queue requested by Dave Airlie - due to grub
using vesa on fedora their initrd seems to load vesafb before loading
the real kms driver. So tons more people actually experience a
dead-slow gpu. Hence also the Cc: stable.

Cc: stable@vger.kernel.org
Reported-and-tested-by: "Kilarski, Bernard R" <bernard.r.kilarski@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-07-03 11:18:48 +01:00
Takashi Iwai
7b668ebe2f drm: edid: Don't add inferred modes with higher resolution
When a monitor EDID doesn't give the preferred bit, driver assumes
that the mode with the higest resolution and rate is the preferred
mode.  Meanwhile the recent changes for allowing more modes in the
GFT/CVT ranges give actually more modes, and some modes may be over
the native size.  Thus such a mode would be picked up as the preferred
mode although it's no native resolution.

For avoiding such a problem, this patch limits the addition of
inferred modes by checking not to be greater than other modes.
Also, it checks the duplicated mode entry at the same time.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-07-03 11:18:10 +01:00
Jerome Glisse
1ef5325b23 drm/radeon: fix rare segfault
In gem idle/busy ioctl the radeon object was derefenced after
drm_gem_object_unreference_unlocked which in case the object
have been destroyed lead to use of a possibly free pointer with
possibly wrong data.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-07-03 11:17:09 +01:00
NeilBrown
b357f04a67 md: fix up plugging (again).
The value returned by "mddev_check_plug" is only valid until the
next 'schedule' as that will unplug things.  This could happen at any
call to mempool_alloc.
So just calling mddev_check_plug at the start doesn't really make
sense.

So call it just before, or just after, queuing things for the thread.
As the action that happens at unplug is to wake the thread, this makes
lots of sense.
If we cannot add a plug (which requires a small GFP_ATOMIC alloc) we
wake thread immediately.

RAID5 is a bit different.  Requests are queued for the thread and the
thread is woken by release_stripe.  So we don't need to wake the
thread on failure.
However the thread doesn't perform certain actions when there is any
active plug, so it is important to install a plug before waking the
thread.  So for RAID5 we install the plug *before* queuing the request
and waking the thread.

Without this patch it is possible for raid1 or raid10 to queue a
request without then waking the thread, resulting in the array locking
up.

Also change raid10 to only flush_pending_write when there are not
active plugs, just like raid1.

This patch is suitable for 3.0 or later.  I plan to submit it to
-stable, but I'll like to let it spend a few weeks in mainline
first to be sure it is completely safe.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03 17:45:31 +10:00
NeilBrown
f456309106 md: support re-add of recovering devices.
We currently only allow a device to be re-added if it appear to be
in-sync.  This is overly restrictive as it may be desirable to re-add
a device that is in the middle of recovery.

So remove the test for "InSync" - the test on rdev->raid_disk is
sufficient to ensure that the re-add will succeed.

Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Tested-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03 15:59:06 +10:00
NeilBrown
32644afd89 md/raid1: fix bug in read_balance introduced by hot-replace
When we added hot_replace we doubled the number of devices
that could be in a RAID1 array.  So we doubled how far read_balance
would search.  Unfortunately we didn't double the point at which
it looped back to the beginning - so it effectively loops over
all non-replacement disks twice.
This doesn't cause bad behaviour, but it pointless and means we
never read from replacement devices.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03 15:58:42 +10:00
Shaohua Li
fab363b5ff raid5: delayed stripe fix
There isn't locking setting STRIPE_DELAYED and STRIPE_PREREAD_ACTIVE bits, but
the two bits have relationship. A delayed stripe can be moved to hold list only
when preread active stripe count is below IO_THRESHOLD. If a stripe has both
the bits set, such stripe will be in delayed list and preread count not 0,
which will make such stripe never leave delayed list.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03 15:57:19 +10:00
majianpeng
2e8ac30312 md/raid456: When read error cannot be recovered, record bad block
We may not be able to fix a bad block if:
 - the array is degraded
 - the over-write fails.

In these cases we currently eject the device, but we should
record a bad block if possible.

Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03 15:57:02 +10:00
NeilBrown
0232605d98 md: make 'name' arg to md_register_thread non-optional.
Having the 'name' arg optional and defaulting to the current
personality name is no necessary and leads to errors, as when
changing the level of an array we can end up using the
name of the old level instead of the new one.

So make it non-optional and always explicitly pass the name
of the level that the array will be.

Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03 15:56:52 +10:00
NeilBrown
055d3747db md/raid10: fix failure when trying to repair a read error.
commit 58c54fcca3
     md/raid10: handle further errors during fix_read_error better.

in 3.1 added "r10_sync_page_io" which takes an IO size in sectors.
But we were passing the IO size in bytes!!!
This resulting in bio_add_page failing, and empty request being sent
down, and a consequent BUG_ON in scsi_lib.

[fix missing space in error message at same time]

This fix is suitable for 3.1.y and later.

Cc: stable@vger.kernel.org
Reported-by: Christian Balzer <chibi@gol.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03 15:55:33 +10:00
Linus Torvalds
9d4056aa9e Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull a couple more powerpc fixes from Benjamin Herrenschmidt:
 "Here are two more fixes that I "missed" when scrubbing patchwork last
  week which are worth still having in 3.5."

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/kvm: sldi should be sld
  powerpc/xmon: Use cpumask iterator to avoid warning
2012-07-02 19:52:25 -07:00
Andy Lutomirski
09b243577b security: document no_new_privs
Document no_new_privs.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-07-03 12:35:36 +10:00
NeilBrown
5f066c632f md/raid5: fix refcount problem when blocked_rdev is set.
commit 43220aa0f2
    md/raid5: fix a hang on device failure.

fixed a hang, but introduced a refcounting in-balance so
that if the presence of bad-blocks ever caused an rdev to
be 'blocked' we would increment the refcount on the rdev and
never decrement it.

So added the needed rdev_dec_pending when md_wait_for_blocked_rdev
is not called.

Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03 12:13:29 +10:00
majianpeng
7c2c57c9a9 md:Add blk_plug in sync_thread.
Add blk_plug in sync_thread will increase the performance of sync.
Because sync_thread did not blk_plug,so when raid sync, the bio merge
not well.

Testing environment:
SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI
Controller.
OS:Linux xxx 3.5.0-rc2+ #340 SMP Tue Jun 12 09:00:25 CST 2012
x86_64 x86_64 x86_64 GNU/Linux.
RAID5: four ST31000524NS disk.

Without blk_plug:recovery speed about 63M/Sec;
Add blk_plug:recovery speed about 120M/Sec.

Using blktrace:
blktrace -d /dev/sdb -w 60  -o -|blkparse -i -

without blk_plug:
Total (8,16):
 Reads Queued:      309811,     1239MiB	 Writes Queued:           0,        0KiB
 Read Dispatches:   283583,     1189MiB	 Write Dispatches:        0,        0KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:   273351,     1149MiB	 Writes Completed:        0,        0KiB
 Read Merges:        23533,    94132KiB	 Write Merges:            0,        0KiB
 IO unplugs:             0        	 Timer unplugs:           0

add blk_plug:
Total (8,16):
 Reads Queued:      428697,     1714MiB	 Writes Queued:           0,        0KiB
 Read Dispatches:     3954,     1714MiB	 Write Dispatches:        0,        0KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:     3956,     1715MiB	 Writes Completed:        0,        0KiB
 Read Merges:       424743,     1698MiB	 Write Merges:            0,        0KiB
 IO unplugs:             0        	 Timer unplugs:        3384

The ratio of merge will be markedly increased.

Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03 12:12:26 +10:00
majianpeng
1850753d2e md/raid5: In ops_run_io, inc nr_pending before calling md_wait_for_blocked_rdev
In ops_run_io(), the call to md_wait_for_blocked_rdev will decrement
nr_pending so we lose the reference we hold on the rdev.
So atomic_inc it first to maintain the reference.

This bug was introduced by commit  73e92e51b7
    md/raid5.  Don't write to known bad block on doubtful devices.

which appeared in 3.0, so patch is suitable for stable kernels since
then.

Cc: stable@vger.kernel.org
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03 12:11:54 +10:00
majianpeng
6c0544e255 md/raid5: Do not add data_offset before call to is_badblock
In chunk_aligned_read() we are adding data_offset before calling
is_badblock.  But is_badblock also adds data_offset, so that is bad.

So move the addition of data_offset to after the call to
is_badblock.

This bug was introduced by commit 31c176ecdf
     md/raid5: avoid reading from known bad blocks.
which first appeared in 3.0.  So that patch is suitable for any
-stable kernel from 3.0.y onwards.  However it will need minor
revision for most of those (as the comment didn't appear until
recently).

Cc: stable@vger.kernel.org
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03 12:09:57 +10:00
NeilBrown
5cfb22a1f8 md/raid5: prefer replacing failed devices over want-replacement devices.
If a RAID5 has both a failed device and a device marked as
'WantReplacement', then we should preferentially replace the failed
device.
However the current code replaces whichever is found first.
So split into 2 loops, check fail failed/missing first, and only check
for WantReplacement if nothing is failed or missing.

Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03 11:46:53 +10:00
NeilBrown
fc448a18ae md/raid10: Don't try to recovery unmatched (and unused) chunks.
If a RAID10 has an odd number of chunks - as might happen when there
are an odd number of devices - the last chunk has no pair and so is
not mirrored.  We don't store data there, but when recovering the last
device in an array we retry to recover that last chunk from a
non-existent location.  This results in an error, and the recovery
aborts.

When we get to that last chunk we should just stop - there is nothing
more to do anyway.

This bug has been present since the introduction of RAID10, so the
patch is appropriate for any -stable kernel.

Cc: stable@vger.kernel.org
Reported-by: Christian Balzer <chibi@gol.com>
Tested-by: Christian Balzer <chibi@gol.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03 10:37:30 +10:00
Alex Williamson
326cf0334b KVM: Sanitize KVM_IRQFD flags
We only know of one so far.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-02 21:10:30 -03:00
Alex Williamson
f36992e312 KVM: Add missing KVM_IRQFD API documentation
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-02 21:10:30 -03:00
Alex Williamson
d4db2935e4 KVM: Pass kvm_irqfd to functions
Prune this down to just the struct kvm_irqfd so we can avoid
changing function definition for every flag or field we use.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-02 21:10:30 -03:00
Arnd Bergmann
d19550e5b8 Merge branch 'fixes' of git://github.com/hzhuang1/linux into fixes
* 'fixes' of git://github.com/hzhuang1/linux:
  ARM: pxa: hx4700: Fix basic suspend/resume
2012-07-02 23:14:29 +02:00
Chris Mason
b6305567e7 Btrfs: run delayed directory updates during log replay
While we are resolving directory modifications in the
tree log, we are triggering delayed metadata updates to
the filesystem btrees.

This commit forces the delayed updates to run so the
replay code can find any modifications done.  It stops
us from crashing because the directory deleltion replay
expects items to be removed immediately from the tree.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
cc: stable@kernel.org
2012-07-02 15:39:19 -04:00
Josef Bacik
7fd1a3f73f Btrfs: hold a ref on the inode during writepages
We can race with unlink and not actually be able to do our igrab in
btrfs_add_ordered_extent.  This will result in all sorts of problems.
Instead of doing the complicated work to try and handle returning an error
properly from btrfs_add_ordered_extent, just hold a ref to the inode during
writepages.  If we cannot grab a ref we know we're freeing this inode anyway
and can just drop the dirty pages on the floor, because screw them we're
going to invalidate them anyway.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-02 15:39:18 -04:00
Josef Bacik
bdb7d303b3 Btrfs: fix tree log remove space corner case
The tree log stuff can have allocated space that we end up having split
across a bitmap and a real extent.  The free space code does not deal with
this, it assumes that if it finds an extent or bitmap entry that the entire
range must fall within the entry it finds.  This isn't necessarily the case,
so rework the remove function so it can handle this case properly.  This
fixed two panics the user hit, first in the case where the space was
initially in a bitmap and then in an extent entry, and then the reverse
case.  Thanks,

Reported-and-tested-by: Shaun Reich <sreich@kde.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-02 15:39:18 -04:00
Liu Bo
6bf02314d9 Btrfs: fix wrong check during log recovery
When we're evicting an inode during log recovery, we need to ensure that the inode
is not in orphan state any more, which means inode's run_time flags has _no_
BTRFS_INODE_HAS_ORPHAN_ITEM.  Thus, the BUG_ON was triggered because of a wrong
check for the flags.

Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-02 15:39:17 -04:00
Alexander Block
d3a94048c9 Btrfs: use _IOR for BTRFS_IOC_SUBVOL_GETFLAGS
We used the wrong ioctl macro for the getflags ioctl before.
As we don't have the set/getflags ioctls in the user space ioctl.h
at the moment, it's safe to fix it now.

Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
2012-07-02 15:39:17 -04:00
Ilya Dryomov
2b6ba629b5 Btrfs: resume balance on rw (re)mounts properly
This introduces btrfs_resume_balance_async(), which, given that
restriper state was recovered earlier by btrfs_recover_balance(),
resumes balance in btrfs-balance kthread.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-07-02 15:39:17 -04:00
Ilya Dryomov
68310a5e42 Btrfs: restore restriper state on all mounts
Fix a bug that triggered asserts in btrfs_balance() in both normal and
resume modes -- restriper state was not properly restored on read-only
mounts.  This factors out resuming code from btrfs_restore_balance(),
which is now also called earlier in the mount sequence to avoid the
problem of some early writes getting the old profile.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-07-02 15:39:16 -04:00
Josef Bacik
c3473e8300 Btrfs: fix dio write vs buffered read race
Miao pointed out there's a problem with mixing dio writes and buffered
reads.  If the read happens between us invalidating the page range and
actually locking the extent we can bring in pages into page cache.  Then
once the write finishes if somebody tries to read again it will just find
uptodate pages and we'll read stale data.  So we need to lock the extent and
check for uptodate bits in the range.  If there are uptodate bits we need to
unlock and invalidate again.  This will keep this race from happening since
we will hold the extent locked until we create the ordered extent, and then
teh read side always waits for ordered extents.  There was also a race in
how we updated i_size, previously we were relying on the generic DIO stuff
to adjust the i_size after the DIO had completed, but this happens outside
of the extent lock which means reads could come in and not see the updated
i_size.  So instead move this work into where we create the extents, and
then this way the update ordered i_size stuff works properly in the endio
handlers.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-07-02 15:36:23 -04:00
Stefan Behrens
597a60fade Btrfs: don't count I/O statistic read errors for missing devices
It is normal behaviour of the low level btrfs function btrfs_map_bio()
to complete a bio with -EIO if the device is missing, instead of just
preventing the bio creation in an earlier step.
This used to cause I/O statistic read error increments and annoying
printk_ratelimited messages. This commit fixes the issue.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Reported-by: Carey Underwood <cwillu@cwillu.com>
2012-07-02 15:36:23 -04:00
Kevin Hilman
5941b8142e ARM: OMAP4: TWL6030: ensure sys_nirq1 is mux'd and wakeup enabled
The SYS_NIRQ1 pin is the interupt line for the PMIC part of the TWL6030
and interrupts from the PMIC are needed as wakeup sources.

Ensure this pin is mux'd as input and has wakeup enabled so PMIC
interupts (e.g. RTC) can be used as wakeup sources.

Tested on OMAP4430/Panda.

Signed-off-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2012-07-02 04:59:04 -07:00
Kevin Hilman
95669d7881 ARM: OMAP2: Overo: init I2C before MMC to fix MMC suspend/resume failure
In order for suspend/resume dependencies to work correctly, I2C has to
be initialized (more specifically, registered with the driver core)
before MMC.  Without this, the MMC driver fails to adjust the VMMC
regulator (using i2c writes) during the suspend path.

Problem found testing suspend/resume on 3730/OveroSTORM platform.

Signed-off-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2012-07-02 04:58:47 -07:00
Fabio Estevam
396c89b327 ARM: imx27_visstrim_m10: Do not include <asm/system.h>
commit 435ca24 (ARM i.MX: Visstrim_M10: Add board version detection)
included <asm/system.h>, which is a header file about to be deleted according to
9f97da (Disintegrate asm/system.h for ARM)

Include <asm/system_info.h> instead.

Reported-by: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
2012-07-02 11:35:34 +02:00
Michael Neuling
2f584a146a powerpc/kvm: sldi should be sld
Since we are taking a registers, this should never have been an sldi.
Talking to paulus offline, this is the correct fix.

Was introduced by:
 commit 19ccb76a19
 Author: Paul Mackerras <paulus@samba.org>
 Date:   Sat Jul 23 17:42:46 2011 +1000

Talking to paulus, this shouldn't be a literal.

Signed-off-by: Michael Neuling <mikey@neuling.org>
CC: <stable@kernel.org> [v3.2+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-02 14:30:12 +10:00
Anton Blanchard
bc1d770291 powerpc/xmon: Use cpumask iterator to avoid warning
We have a bug report where the kernel hits a warning in the cpumask
code:

WARNING: at include/linux/cpumask.h:107

Which is:
        WARN_ON_ONCE(cpu >= nr_cpumask_bits);

The backtrace is:
        cpu_cmd
        cmds
        xmon_core
        xmon
        die

xmon is iterating through 0 to NR_CPUS. I'm not sure why we are still
open coding this but iterating above nr_cpu_ids is definitely a bug.

This patch iterates through all possible cpus, in case we issue a
system reset and CPUs in an offline state call in.

Perhaps the old code was trying to handle CPUs that were in the
partition but were never started (eg kexec into a kernel with an
nr_cpus= boot option). They are going to die way before we get into
xmon since we haven't set any kernel state up for them.

Signed-off-by: Anton Blanchard <anton@samba.org>
CC: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-02 14:30:11 +10:00
Linus Torvalds
ca24a14557 Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Pull two ARM fixes from Russell King:
 "It's been fairly quiet with the fixes.  Just two this time.  One fixes
  a long standing problem with KALLSYMS needing an additional pass, and
  the other sorts a problem with the vmalloc space interacting with
  static IO mappings."

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7438/1: fill possible PMD empty section gaps
  ARM: 7428/1: Prevent KALLSYM size mismatch on ARM.
2012-07-01 11:02:25 -07:00
Nicolas Pitre
19b52abe3c ARM: 7438/1: fill possible PMD empty section gaps
On ARM with the 2-level page table format, a PMD entry is represented by
two consecutive section entries covering 2MB of virtual space.

However, static mappings always were allowed to use separate 1MB section
entries.  This means in practice that a static mapping may create half
populated PMDs via create_mapping().

Since commit 0536bdf33f (ARM: move iotable mappings within the vmalloc
region) those static mappings are located in the vmalloc area. We must
ensure no such half populated PMDs are accessible once vmalloc() or
ioremap() start looking at the vmalloc area for nearby free virtual
address ranges, or various things leading to a kernel crash will happen.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reported-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: "R, Sricharan" <r.sricharan@ti.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-07-01 14:21:35 +01:00
Bruce Allan
2e1706f234 e1000e: remove use of IP payload checksum
Currently only used when packet split mode is enabled with jumbo frames,
IP payload checksum (for fragmented UDP packets) is mutually exclusive with
receive hashing offload since the hardware uses the same space in the
receive descriptor for the hardware-provided packet checksum and the RSS
hash, respectively.  Users currently must disable jumbos when receive
hashing offload is enabled, or vice versa, because of this incompatibility.
Since testing has shown that IP payload checksum does not provide any real
benefit, just remove it so that there is no longer a choice between jumbos
or receive hashing offload but not both as done in other Intel GbE drivers
(e.g. e1000, igb).

Also, add a missing check for IP checksum error reported by the hardware;
let the stack verify the checksum when this happens.

CC: stable <stable@vger.kernel.org> [3.4]
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-01 00:25:32 -07:00
Paul Parsons
6416c0409d ARM: pxa: hx4700: Fix basic suspend/resume
Basic suspend/resume is fixed by ensuring that the PGSR registers are
set correctly before sleep mode is entered. In particular four of the
active low resets need to be driven high while in sleep mode, otherwise
the unit resets itself instead of suspending. Another problem was that
the PCFR_GPROD bit is set by the HTC bootloader; this caused GPIO reset
(i.e. the reset button) to fail immediately after returning from sleep
mode.

Signed-off-by: Paul Parsons <lost.distance@yahoo.com>
Cc: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com>
2012-07-01 14:40:58 +08:00
Neil Horman
4244854d22 sctp: be more restrictive in transport selection on bundled sacks
It was noticed recently that when we send data on a transport, its possible that
we might bundle a sack that arrived on a different transport.  While this isn't
a major problem, it does go against the SHOULD requirement in section 6.4 of RFC
2960:

 An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
   etc.) to the same destination transport address from which it
   received the DATA or control chunk to which it is replying.  This
   rule should also be followed if the endpoint is bundling DATA chunks
   together with the reply chunk.

This patch seeks to correct that.  It restricts the bundling of sack operations
to only those transports which have moved the ctsn of the association forward
since the last sack.  By doing this we guarantee that we only bundle outbound
saks on a transport that has received a chunk since the last sack.  This brings
us into stricter compliance with the RFC.

Vlad had initially suggested that we strictly allow only sack bundling on the
transport that last moved the ctsn forward.  While this makes sense, I was
concerned that doing so prevented us from bundling in the case where we had
received chunks that moved the ctsn on multiple transports.  In those cases, the
RFC allows us to select any of the transports having received chunks to bundle
the sack on.  so I've modified the approach to allow for that, by adding a state
variable to each transport that tracks weather it has moved the ctsn since the
last sack.  This I think keeps our behavior (and performance), close enough to
our current profile that I think we can do this without a sysctl knob to
enable/disable it.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yaseivch <vyasevich@gmail.com>
CC: David S. Miller <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
Reported-by: Michele Baldessari <michele@redhat.com>
Reported-by: sorin serban <sserban@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-30 22:44:35 -07:00