Commit graph

310440 commits

Author SHA1 Message Date
Michael S. Tsirkin
3bbf372c6c virtio-net: remove useless disable on freeze
disable_cb is just an optimization: it
can not guarantee that there are no callbacks.
In particular it doesn't have any effect when
event index is on.

Instead, detach, napi disable and reset on freeze ensure we don't run
concurrently with a callback.

Remove the useless calls so we get same behaviour
with and without event index.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-30 16:36:15 -04:00
Joe Perches
0053ea9c34 netdevice: Update netif_dbg for CONFIG_DYNAMIC_DEBUG
Make netif_dbg use dynamic debugging whenever
CONFIG_DYNAMIC_DEBUG is enabled.

commit b558c96ffa
("dynamic_debug: make dynamic-debug supersede DEBUG ccflag")
missed updating the netif_dbg variant.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-30 16:34:27 -04:00
H. Peter Anvin
bbd771474e Merge branch 'x86/trampoline' into x86/urgent
x86/trampoline contains an urgent commit which is necessarily on a
newer baseline.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-05-30 12:11:32 -07:00
Linus Torvalds
af56e0aa35 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph updates from Sage Weil:
 "There are some updates and cleanups to the CRUSH placement code, a bug
  fix with incremental maps, several cleanups and fixes from Josh Durgin
  in the RBD block device code, a series of cleanups and bug fixes from
  Alex Elder in the messenger code, and some miscellaneous bounds
  checking and gfp cleanups/fixes."

Fix up trivial conflicts in net/ceph/{messenger.c,osdmap.c} due to the
networking people preferring "unsigned int" over just "unsigned".

* git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (45 commits)
  libceph: fix pg_temp updates
  libceph: avoid unregistering osd request when not registered
  ceph: add auth buf in prepare_write_connect()
  ceph: rename prepare_connect_authorizer()
  ceph: return pointer from prepare_connect_authorizer()
  ceph: use info returned by get_authorizer
  ceph: have get_authorizer methods return pointers
  ceph: ensure auth ops are defined before use
  ceph: messenger: reduce args to create_authorizer
  ceph: define ceph_auth_handshake type
  ceph: messenger: check return from get_authorizer
  ceph: messenger: rework prepare_connect_authorizer()
  ceph: messenger: check prepare_write_connect() result
  ceph: don't set WRITE_PENDING too early
  ceph: drop msgr argument from prepare_write_connect()
  ceph: messenger: send banner in process_connect()
  ceph: messenger: reset connection kvec caller
  libceph: don't reset kvec in prepare_write_banner()
  ceph: ignore preferred_osd field
  ceph: fully initialize new layout
  ...
2012-05-30 11:17:19 -07:00
Linus Torvalds
65a50c951a Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar.

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  perf ui browser: Stop using 'self'
  perf annotate browser: Read perf config file for settings
  perf config: Allow '_' in config file variable names
  perf annotate browser: Make feature toggles global
  perf annotate browser: The idx_asm field should be used in asm only view
  perf tools: Convert critical messages to ui__error()
  perf ui: Make --stdio default when TUI is not supported
  tools lib traceevent: Silence compiler warning on 32bit build
  perf record: Fix branch_stack type in perf_record_opts
  perf tools: Reconstruct event with modifiers from perf_event_attr
  perf top: Fix counter name fixup when fallbacking to cpu-clock
  perf tools: fix thread_map__new_by_pid_str() memory leak in error path
  perf tools: Do not use _FORTIFY_SOURCE when DEBUG=1 is specified
  tools lib traceevent: Fix signature of create_arg_item()
  tools lib traceevent: Use proper function parameter type
  tools lib traceevent: Fix freeing arg on process_dynamic_array()
  tools lib traceevent: Fix a possibly wrong memory dereference
  tools lib traceevent: Fix a possible memory leak
  tools lib traceevent: Allow expressions in __print_symbolic() fields
  perf evlist: Explicititely initialize input_name
  ...
2012-05-30 11:12:00 -07:00
H. Peter Anvin
319b6ffc6d x86, realmode: Unbreak the ia64 build of drivers/acpi/sleep.c
Revert usage of acpi_wakeup_address and move definition
to x86 architecture code in order to make compilation work
in ia64.

[jsakkine: tested compilation in ia64/x86-64 and added
proper commit message]

Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Originally-by: H. Peter Anvin <hpa@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Link: http://lkml.kernel.org/r/1338370421-27735-1-git-send-email-jarkko.sakkinen@intel.com
Cc: Tony Luck <tony.luck@intel.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-05-30 10:12:48 -07:00
Linus Torvalds
42fe55ce90 Merge branch 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
Pull i2c updates from Jean Delvare.

* 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  i2c: Split I2C_M_NOSTART support out of I2C_FUNC_PROTOCOL_MANGLING
  i2c-dev: Add support for I2C_M_RECV_LEN
2012-05-30 10:03:46 -07:00
Linus Torvalds
19ce0a995f Merge git://www.linux-watchdog.org/linux-watchdog
Pull second set of watchdog updates from Wim Van Sebroeck:
 "This changeset contains following changes:
   * Add support for multiple watchdog devices.  We use dynamically
     allocated device id's for this.
   * Add locking into the generic watchdog infrastructure.
   * Add support for dynamically allocated watchdog_device structs so
     that we can deal with devices that get unbound.
   * convert following drivers to the generic watchdog framework:
     sch5627, sch5636 and sp805_wdt.
   * Add DA9052/53 PMIC watchdog support
   * Fix printk format warnings for iTCO_wdt.c"

* git://www.linux-watchdog.org/linux-watchdog:
  watchdog: iTCO_wdt.c: fix printk format warnings
  watchdog: sp805_wdt: Add clk_{un}prepare support
  watchdog: sp805_wdt: convert to watchdog core
  hwmon/sch56xx: Depend on watchdog for watchdog core functions
  watchdog: sch56xx-common: set correct bits in register()
  Watchdog: DA9052/53 PMIC watchdog support
  watchdog: sch56xx-common: Add proper ref-counting of watchdog data
  watchdog: sch56xx: Remove unnecessary checks for register changes
  watchdog: sch56xx: Use watchdog core
  watchdog: Add support for dynamically allocated watchdog_device structs
  watchdog: Add Locking support
  watchdog: watchdog_dev: Rewrite wrapper code
  watchdog: use dev_ functions
  watchdog: create all the proper device files
  watchdog: Add a flag to indicate the watchdog doesn't reboot things
  watchdog: Add multiple device support
  watchdog: watchdog_core.h: make functions extern
  watchdog: correct the name of the watchdog_core inlude file
  watchdog: Add watchdog_active() routine
  watchdog: watchdog_dev: include private header to pickup global symbol prototypes
2012-05-30 09:59:13 -07:00
Linus Torvalds
6bb340c786 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "Just regular fixes, bunch from intel, quieting some of the over
  zealous power warnings, and the rest just misc.

  I've got another pull with the remaining dma-buf bits, since the vmap
  bits are in your tree now.  I'll send tomorrow just to space things
  out a bit."

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (22 commits)
  drm/edid/quirks: ViewSonic VA2026w
  drm/udl: remove unused variables.
  drm/radeon: fix XFX quirk
  drm: Use stdint types for consistency
  drm: Constify params to format_check() and framebuffer_checks()
  drm/radeon: fix typo in trinity tiling setup
  drm/udl: unlock before returning in udl_gem_mmap()
  radeon: make radeon_cs_update_pages static.
  drm/i915: tune down the noise of the RP irq limit fail
  drm/i915: Remove the error message for unbinding pinned buffers
  drm/i915: Limit page allocations to lowmem (dma32) for i965
  drm/i915: always use RPNSWREQ for turbo change requests
  drm/i915: reject doubleclocked cea modes on dp
  drm/i915: Adding TV Out Missing modes.
  drm/i915: wait for a vblank to pass after tv detect
  drm/i915: no lvds quirk for HP t5740e Thin Client
  drm/i915: enable vdd when switching off the eDP panel
  drm/i915: Fix PCH PLL assertions to not assume CRTC:PLL relationship
  drm/i915: Always update RPS interrupts thresholds along with frequency
  drm/i915: properly handle interlaced bit for sdvo dtd conversion
  ...
2012-05-30 09:55:53 -07:00
Jan Schmidt
95a06077f7 Btrfs: use delayed ref sequence numbers for all fs-tree updates
The sequence number for delayed refs is needed to postpone certain delayed
refs for a very short period while walking backrefs. Before the tree
modification log, we thought we'd only have to hold back those references
that don't have a counter operation.

While now we've the tree mod log, we're rewinding fs tree blocks to a
defined consistent state. We cannot know in advance for which tree block
we'll be doing rewind operations later. Therefore, we must postpone all the
delayed refs for fs-tree blocks, even those having a counter operation.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 18:18:21 +02:00
Linus Torvalds
a70f35af4e Merge branch 'for-3.5/drivers' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
 "Here are the driver related changes for 3.5.  It contains:

   - The floppy changes from Jiri.  Jiri is now also marked as the
     maintainer of floppy.c, I shall be publically branding his forehead
     with red hot iron at the next opportune moment.

   - A batch of drbd updates and fixes from the linbit crew, as well as
     fixes from others.

   - Two small fixes for xen-blkfront courtesy of Jan."

* 'for-3.5/drivers' of git://git.kernel.dk/linux-block: (70 commits)
  floppy: take over maintainership
  floppy: remove floppy-specific O_EXCL handling
  floppy: convert to delayed work and single-thread wq
  xen-blkfront: module exit handling adjustments
  xen-blkfront: properly name all devices
  drbd: grammar fix in log message
  drbd: check MODULE for THIS_MODULE
  drbd: Restore the request restart logic
  drbd: introduce a bio_set to allocate housekeeping bios from
  drbd: remove unused define
  drbd: bm_page_async_io: properly initialize page->private
  drbd: use the newly introduced page pool for bitmap IO
  drbd: add page pool to be used for meta data IO
  drbd: allow bitmap to change during writeout from resync_finished
  drbd: fix race between drbdadm invalidate/verify and finishing resync
  drbd: fix resend/resubmit of frozen IO
  drbd: Ensure that data_size is not 0 before using data_size-1 as index
  drbd: Delay/reject other state changes while establishing a connection
  drbd: move put_ldev from __req_mod() to the endio callback
  drbd: fix WRITE_ACKED_BY_PEER_AND_SIS to not set RQ_NET_DONE
  ...
2012-05-30 09:05:47 -07:00
Chris Mason
cfc442b696 Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into HEAD 2012-05-30 11:55:38 -04:00
Linus Torvalds
0d167518e0 Merge branch 'for-3.5/core' of git://git.kernel.dk/linux-block
Merge block/IO core bits from Jens Axboe:
 "This is a bit bigger on the core side than usual, but that is purely
  because we decided to hold off on parts of Tejun's submission on 3.4
  to give it a bit more time to simmer.  As a consequence, it's seen a
  long cycle in for-next.

  It contains:

   - Bug fix from Dan, wrong locking type.
   - Relax splice gifting restriction from Eric.
   - A ton of updates from Tejun, primarily for blkcg.  This improves
     the code a lot, making the API nicer and cleaner, and also includes
     fixes for how we handle and tie policies and re-activate on
     switches.  The changes also include generic bug fixes.
   - A simple fix from Vivek, along with a fix for doing proper delayed
     allocation of the blkcg stats."

Fix up annoying conflict just due to different merge resolution in
Documentation/feature-removal-schedule.txt

* 'for-3.5/core' of git://git.kernel.dk/linux-block: (92 commits)
  blkcg: tg_stats_alloc_lock is an irq lock
  vmsplice: relax alignement requirements for SPLICE_F_GIFT
  blkcg: use radix tree to index blkgs from blkcg
  blkcg: fix blkcg->css ref leak in __blkg_lookup_create()
  block: fix elvpriv allocation failure handling
  block: collapse blk_alloc_request() into get_request()
  blkcg: collapse blkcg_policy_ops into blkcg_policy
  blkcg: embed struct blkg_policy_data in policy specific data
  blkcg: mass rename of blkcg API
  blkcg: style cleanups for blk-cgroup.h
  blkcg: remove blkio_group->path[]
  blkcg: blkg_rwstat_read() was missing inline
  blkcg: shoot down blkgs if all policies are deactivated
  blkcg: drop stuff unused after per-queue policy activation update
  blkcg: implement per-queue policy activation
  blkcg: add request_queue->root_blkg
  blkcg: make request_queue bypassing on allocation
  blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing
  blkcg: make blkg_conf_prep() take @pol and return with queue lock held
  blkcg: remove static policy ID enums
  ...
2012-05-30 08:52:42 -07:00
Linus Torvalds
2f83766d4b IOMMU Updates for Linux 3.5
Not much stuff this time. The only change to the IOMMU core code is the
 addition of a handle to the fault handling code. A few updates to the
 AMD IOMMU driver to work around new errata. The other patches are mostly
 fixes and enhancements to the existing ARM IOMMU drivers and
 documentation updates.
 
 A new IOMMU driver for the Exynos platform was also underway but got
 merged via the Samsung tree and is not part of this tree.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPxfseAAoJECvwRC2XARrjvL4QAL39988y7ajHSI3ym3Dxovn9
 w8md63xKNlTpCB8NJPRIJpcGrE7QFtNXPFCagTqO713ulwCoKayEwKGOU7VQagFc
 0/JoHxE5usE5OuA6tyAJbpWK10kWKDzu6HjZfqF2yoa0q/REbsu65KsY7zc7HbpF
 qEAXX1xr9IC7GUM7gv75OR8CP2VJCW3+6VyhiD/37t3KpNwINMpRDO/eN/KiwoUI
 1t+/DVwO6pH5UrGReWrmjs/gcxFMzkeelt+iCA32kzkWLtyWjeWBujVWnFvVtpkz
 R4pV2T2jvs6fWPU5MMBXZRd5AvLLqcu/g/Yr21WYHz07jCcGxlCUp9qpnGLt2el0
 /YTY3LBZUQJ5sx3OSJV+oQVTtI5x0EkAiOrJ8Dx20wNAFqun9bhJb1WX0IXflmZc
 oC7SF5wjXq8pUQmX/wpGMbW7XYompypJGqlEsftJEytf4dfR6KJ2Vo1h3pHtpaex
 IaY6TqmdW44e0EgbFTM7RMNFtC7GrIY9NE+WKlrFtsHhUFrqt1NVBEcO3faU0ES6
 UAguFRPM/HAdkVmY620+DUT/JkEMemWq2jgWExLGLC9gI8L1Xj2cdU8esstuMUoV
 GGG4u9a5W1rALwg+zPCQGoVxPKmd6fpeC3U+Rmg2639chy+h4c/cBXkzfUsxe2lg
 wvMDVbjDN1Fz0c29YJit
 =K23I
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:
 "Not much stuff this time.  The only change to the IOMMU core code is
  the addition of a handle to the fault handling code.  A few updates to
  the AMD IOMMU driver to work around new errata.  The other patches are
  mostly fixes and enhancements to the existing ARM IOMMU drivers and
  documentation updates.

  A new IOMMU driver for the Exynos platform was also underway but got
  merged via the Samsung tree and is not part of this tree."

* tag 'iommu-updates-v3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  Documentation: kernel-parameters.txt Add amd_iommu_dump
  iommu/core: pass a user-provided token to fault handlers
  iommu/tegra: gart: Fix register offset correctly
  iommu: OMAP: device detach on domain destroy
  iommu: tegra/gart: Add device tree support
  iommu: tegra/gart: use correct gart_device
  iommu/tegra: smmu: Print device name correctly
  iommu/amd: Add workaround for event log erratum
  iommu/amd: Check for the right TLP prefix bit
  dma-debug: release free_entries_lock before saving stack trace
2012-05-30 08:49:28 -07:00
Dave Hansen
4523e14585 mm: fix vma_resv_map() NULL pointer
hugetlb_reserve_pages() can be used for either normal file-backed
hugetlbfs mappings, or MAP_HUGETLB.  In the MAP_HUGETLB, semi-anonymous
mode, there is not a VMA around.  The new call to resv_map_put() assumed
that there was, and resulted in a NULL pointer dereference:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
  IP: vma_resv_map+0x9/0x30
  PGD 141453067 PUD 1421e1067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP
  ...
  Pid: 14006, comm: trinity-child6 Not tainted 3.4.0+ #36
  RIP: vma_resv_map+0x9/0x30
  ...
  Process trinity-child6 (pid: 14006, threadinfo ffff8801414e0000, task ffff8801414f26b0)
  Call Trace:
    resv_map_put+0xe/0x40
    hugetlb_reserve_pages+0xa6/0x1d0
    hugetlb_file_setup+0x102/0x2c0
    newseg+0x115/0x360
    ipcget+0x1ce/0x310
    sys_shmget+0x5a/0x60
    system_call_fastpath+0x16/0x1b

This was reported by Dave Jones, but was reproducible with the
libhugetlbfs test cases, so shame on me for not running them in the
first place.

With this, the oops is gone, and the output of libhugetlbfs's
run_tests.py is identical to plain 3.4 again.

[ Marked for stable, since this was introduced by commit c50ac05081
  ("hugetlb: fix resv_map leak in error path") which was also marked for
  stable ]

Reported-by: Dave Jones <davej@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>        [2.6.32+]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-30 08:48:13 -07:00
John W. Linville
a0f68763e1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-05-30 10:32:16 -04:00
Stefan Behrens
48235a68a3 Btrfs: fix false positive in check-integrity on unmount
During unmount, it could happen that the integrity checker printed a
warning message "attempt to free ... on umount which is not yet iodone"
which turned out to be a false positive.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-05-30 10:23:44 -04:00
Stefan Behrens
86ff7ffce0 Btrfs: fix runtime warning in check-integrity check data mode
If a file_extent_item was located at the very end of a leaf and there was
not enough space to hold a full item, but there was enough space to hold
one of type BTRFS_FILE_EXTENT_INLINE or PREALLOC, and it was only such a
short item, a warning was printed anyway. This check is now fixed.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-05-30 10:23:43 -04:00
Stefan Behrens
3d136a1131 Btrfs: set ioprio of scrub readahead to idle
Reduce ioprio class of scrub readahead threads to idle priority.
This setting is fixed. This priority has shown the best performance
during all measurements.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-05-30 10:23:43 -04:00
Josef Bacik
5bdbeb2187 Btrfs: fix return code in drop_objectid_items
So dpkg fsync()'s the file and the directory containing the file whenever it
writes to a file which is really slow in btrfs.  This is partly because
fsync()'ing a directory _always_ committed the transaction instead of just
going to the tree log.  This is because drop_objectid_items() would return 1
since it does a btrfs_search_slot() which returns 1.  In tree-log jargon
this means that we have to commit the transaction to be safe.  So just check
if ret is greater than 0 and set it to 0 if it does.  With this patch we now
use the tree-log instead of committing the entire transaction, which is
twice as fast on my box.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:42 -04:00
Josef Bacik
22ee6985de Btrfs: check to see if the inode is in the log before fsyncing
We have this check down in the actual logging code, but this is after we
start a transaction and all that good stuff.  So move the helper
inode_in_log() out so we can call it in fsync() and avoid starting a
transaction altogether and just exit if we've already fsync()'ed this file
recently.  You would notice this issue if you fsync()'ed a file over and
over again until the transaction committed.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:42 -04:00
Tsutomu Itoh
018642a1f1 Btrfs: return value of btrfs_read_buffer is checked correctly
btrfs_read_buffer() has the possibility of returning the error.
Therefore, I add the code in which the return value of btrfs_read_buffer()
is checked.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
2012-05-30 10:23:41 -04:00
Stefan Behrens
733f4fbbc1 Btrfs: read device stats on mount, write modified ones during commit
The device statistics are written into the device tree with each
transaction commit. Only modified statistics are written.
When a filesystem is mounted, the device statistics for each involved
device are read from the device tree and used to initialize the
counters.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-05-30 10:23:41 -04:00
Stefan Behrens
c11d2c236c Btrfs: add ioctl to get and reset the device stats
An ioctl interface is added to get the device statistic counters.
A second ioctl is added to atomically get and reset these counters.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-05-30 10:23:40 -04:00
Stefan Behrens
442a4f6308 Btrfs: add device counters for detected IO and checksum errors
The goal is to detect when drives start to get an increased error rate,
when drives should be replaced soon. Therefore statistic counters are
added that count IO errors (read, write and flush). Additionally, the
software detected errors like checksum errors and corrupted blocks are
counted.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-05-30 10:23:39 -04:00
Asias He
d07eb91170 btrfs: Drop unused function btrfs_abort_devices()
1) This function is not used anywhere.

2) Using the blk_abort_queue() to abort the queue seems not correct.
blk_abort_queue() is used for timeout handling (block/blk-timeout.c).

Cc: Chris Mason <chris.mason@oracle.com>
Cc: linux-btrfs@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Asias He <asias@redhat.com>
2012-05-30 10:23:39 -04:00
Miao Xie
762f226326 Btrfs: fix the same inode id problem when doing auto defragment
Two files in the different subvolumes may have the same inode id, so
The rb-tree which is used to manage the defragment object must take it
into account. This patch fix this problem.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2012-05-30 10:23:38 -04:00
Josef Bacik
2adcac1a73 Btrfs: fall back to non-inline if we don't have enough space
If cow_file_range_inline fails with ENOSPC we abort the transaction which
isn't very nice.  This really shouldn't be happening anyways but there's no
sense in making it a horrible error when we can easily just go allocate
normal data space for this stuff.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:38 -04:00
Josef Bacik
8a35d95ff4 Btrfs: fix how we deal with the orphan block rsv
Ceph was hitting this race where we would remove an inode from the per-root
orphan list before we would release the space we had reserved for the inode.
We actually don't need a list or anything, we just need to make sure the
root doesn't try to free up the orphan reserve until after the inodes have
released their reservations.  So use an atomic counter instead of a list on
the root and only decrement the counter after we've released our
reservation.  I've tested this as well as several others and we no longer
see the warnings that you would see while running ceph.  Thanks,
Btrfs: fix how we deal with the orphan block rsv

Ceph was hitting this race where we would remove an inode from the per-root
orphan list before we would release the space we had reserved for the inode.
We actually don't need a list or anything, we just need to make sure the
root doesn't try to free up the orphan reserve until after the inodes have
released their reservations.  So use an atomic counter instead of a list on
the root and only decrement the counter after we've released our
reservation.  I've tested this as well as several others and we no longer
see the warnings that you would see while running ceph.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:37 -04:00
Josef Bacik
72ac3c0d79 Btrfs: convert the inode bit field to use the actual bit operations
Miao pointed this out while I was working on an orphan problem that messing
with a bitfield where different ranges are protected by different locks
doesn't work out right.  Turns out we've been doing this forever where we
have different parts of the bit field protected by either no lock at all or
different locks which could cause all sorts of weird problems including the
issue I was hitting.  So instead make a runtime_flags thing that we use the
normal bit operations on that are all atomic so we can keep having our
no/different locking for the different flags and then make force_compress
it's own thing so it can be treated normally.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:36 -04:00
Josef Bacik
cd023e7b17 Btrfs: merge contigous regions when loading free space cache
When we write out the free space cache we will write out everything that is
in our in memory tree, and then we will just walk the pinned extents tree
and write anything we see there.  The problem with this is that during
normal operations the pinned extents will be merged back into the free space
tree normally, and then we can allocate space from the merged areas and
commit them to the tree log.  If we crash and replay the tree log we will
crash again because the tree log will try to free up space from what looks
like 2 seperate but contiguous entries, since one entry is from the original
free space cache and the other was a pinned extent that was merged back.  To
fix this we just need to walk the free space tree after we load it and merge
contiguous entries back together.  This will keep the tree log stuff from
breaking and it will make the allocator behave more nicely.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:36 -04:00
Liu Bo
9ba1f6e44e Btrfs: do not do balance in readonly mode
In normal cases, we would not be allowed to do balance in RO mode.
However, when we're using a seeding device and adding another device to sprout,
things will change:

$ mkfs.btrfs /dev/sdb7
$ btrfstune -S 1 /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs -o ro
$ btrfs fi bal /mnt/btrfs   -----------------------> fail.
$ btrfs dev add /dev/sdb8 /mnt/btrfs
$ btrfs fi bal /mnt/btrfs   -----------------------> works!

It should not be designed as an exception, and we'd better add another check for
mnt flags.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:35 -04:00
Liu Bo
d1ac6e41d5 Btrfs: use fastpath in extent state ops as much as possible
Fully utilize our extent state's new helper functions to use
fastpath as much as possible.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:34 -04:00
Liu Bo
f8c5d0b443 Btrfs: fix wrong error returned by adding a device
Reproduce:
$ mkfs.btrfs /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs -o ro
$ btrfs dev add /dev/sdb8 /mnt/btrfs
ERROR: error adding the device '/dev/sdb8' - Invalid argument

Since we mount with readonly options, and /dev/sdb7 is not a seeding one,
a readonly notification is preferred.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:34 -04:00
Josef Bacik
5fd0204355 Btrfs: finish ordered extents in their own thread
We noticed that the ordered extent completion doesn't really rely on having
a page and that it could be done independantly of ending the writeback on a
page.  This patch makes us not do the threaded endio stuff for normal
buffered writes and direct writes so we can end page writeback as soon as
possible (in irq context) and only start threads to do the ordered work when
it is actually done.  Compression needs to be reworked some to take
advantage of this as well, but atm it has to do a find_get_page in its endio
handler so it must be done in its own thread.  This makes direct writes
quite a bit faster.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:33 -04:00
Josef Bacik
4e89915220 Btrfs: do not check delalloc when updating disk_i_size
We are checking delalloc to see if it is ok to update the i_size.  There are
2 cases it stops us from updating

1) If there is delalloc between our current disk_i_size and this ordered
extent

2) If there is delalloc between our current ordered extent and the next
ordered extent

These tests are racy however since we can set delalloc for these ranges at
any time.  Also for the first case if we notice there is delalloc between
disk_i_size and our ordered extent we will not update disk_i_size and assume
that when that delalloc bit gets written out it will update everything
properly.  However if we crash before that we will have file extents outside
of our i_size, which is not good, so this test is dangerous as well as racy.
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:33 -04:00
Jim Meyering
f60d16a892 Btrfs: avoid buffer overrun in mount option handling
There is an off-by-one error: allocating room for a maximal result
string but without room for a trailing NUL.  That, can lead to
returning a transformed string that is not NUL-terminated, and
then to a caller reading beyond end of the malloc'd buffer.

Rewrite to s/kzalloc/kmalloc/, remove unwarranted use of strncpy
(the result is guaranteed to fit), remove dead strlen at end, and
change a few variable names and comments.

Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Jim Meyering <meyering@redhat.com>
2012-05-30 10:23:32 -04:00
Jim Meyering
a27202fbe9 Btrfs: NUL-terminate path buffer in DEV_INFO ioctl result
A device with name of length BTRFS_DEVICE_PATH_NAME_MAX or longer
would not be NUL-terminated in the DEV_INFO ioctl result buffer.

Signed-off-by: Jim Meyering <meyering@redhat.com>
2012-05-30 10:23:31 -04:00
Jim Meyering
f07c9a79f0 Btrfs: avoid buffer overrun in btrfs_printk
The buffer read-overrun would be triggered by a printk format
starting with <N>, where N is a single digit.  NUL-terminate
after strncpy.  Use memcpy, not strncpy, since we know the
string we're copying fits in the destination buffer and
contains no NUL byte.

Signed-off-by: Jim Meyering <meyering@redhat.com>
2012-05-30 10:23:31 -04:00
Daniel J Blueman
2eec6c8102 Fix minor type issues
Address some minor type issues identified by sparse checker.

Signed-off-by: Daniel J Blueman <daniel@quora.org>
2012-05-30 10:23:30 -04:00
Sergei Trofimovich
0d2450abfa btrfs: allow changing 'thread_pool' size at remount time
Changing 'mount -oremount,thread_pool=2 /' didn't make any effect:

maximum amount of worker threads is specified in 2 places:
- in 'strict btrfs_fs_info::thread_pool_size'
- in each worker struct: 'struct btrfs_workers::max_workers'

'mount -oremount' updated only 'btrfs_fs_info::thread_pool_size'.

Fix it by pushing new maximum value to all created worker structures
as well.

Cc: Josef Bacik <josef@redhat.com>
Cc: Chris Mason <chris.mason@oracle.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
2012-05-30 10:23:30 -04:00
Josef Bacik
0885ef5b56 Btrfs: do not do filemap_write_and_wait_range in fsync
We already do the btrfs_wait_ordered_range which will do this for us, so
just remove this call so we don't call it twice.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:29 -04:00
Josef Bacik
551ebb2d34 Btrfs: remove useless waiting and extra filemap work
In btrfs_wait_ordered_range we have been calling filemap_fdata_write() twice
because compression does strange things and then waiting.  Then we look up
ordered extents and if we find any we will always schedule_timeout(); once
and then loop back around and do it all again.  We will even check to see if
there is delalloc pages on this range and loop again.  So this patch gets
rid of the multipe fdata_write() calls and just does
filemap_write_and_wait().  In the case of compression we will still find the
ordered extents and start those individually if we need to so that is ok,
but in the normal buffered case we avoid all this weird overhead.

Then in the case of the schedule_timeout(1), we don't need it.  All callers
either 1) don't care, they just want to make sure what they just wrote maeks
it to disk or 2) are doing the lock()->lookup ordered->unlock->flush thing
in which case it will lock and check for ordered extents _anyway_ so get
back to them as quickly as possible.  The delaloc check is simply not
needed, this only catches the case where we write to the file again since
doing the filemap_write_and_wait() and if the caller truly cares about that
it will take care of everything itself.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:28 -04:00
Josef Bacik
d7dbe9e7f6 Btrfs: fix compile warnings in extent_io.c
These warnings are bogus since we will always have at least one page in an
eb, but to make the compiler happy just set ret = 0 in these two cases.
Thanks,
Btrfs: fix compile warnings in extent_io.c

These warnings are bogus since we will always have at least one page in an
eb, but to make the compiler happy just set ret = 0 in these two cases.
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:28 -04:00
Josef Bacik
30f8fe3e47 Btrfs: cache no acl on new inodes
When running compilebench I noticed we were spending some time looking up
acls on new inodes, which shouldn't be happening since there were no acls.
This is because when we init acls on the inode after creating them we don't
cache the fact there are no acls if there aren't any.  Doing this adds a
little bit of a bump to my compilebench runs.  Thanks,
Btrfs: cache no acl on new inodes

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:27 -04:00
Josef Bacik
0c4d2d95d0 Btrfs: use i_version instead of our own sequence
We've been keeping around the inode sequence number in hopes that somebody
would use it, but nobody uses it and people actually use i_version which
serves the same purpose, so use i_version where we used the incore inode's
sequence number and that way the sequence is updated properly across the
board, and not just in file write.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:27 -04:00
Jan Schmidt
20b297d620 Btrfs: tree mod log sanity checks in join_transaction
When a fresh transaction begins, the tree mod log must be clean. Users of
the tree modification log must ensure they never span across transaction
boundaries.

We reset the sequence to 0 in this safe situation to make absolutely sure
overflow can't happen.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:36 +02:00
Jan Schmidt
19ae4e8133 Btrfs: fs_info variable for join_transaction
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:35 +02:00
Jan Schmidt
8445f61cad Btrfs: use the tree modification log for backref resolving
This enables backref resolving on life trees while they are changing. This
is a prerequisite for quota groups and just nice to have for everything
else.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:34 +02:00
Jan Schmidt
5d9e75c41d Btrfs: add btrfs_search_old_slot
The tree modification log together with the current state of the tree gives
a consistent, old version of the tree. btrfs_search_old_slot is used to
search through this old version and return old (dummy!) extent buffers.
Naturally, this function cannot do any tree modifications.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:33 +02:00