Commit Graph

891 Commits (06d9a8d7c24fe22836bf0b0f82db59d6f98e271e)

Author SHA1 Message Date
Chris Mason 06d9a8d7c2 Btrfs: Change btrfs_truncate_inode_items to stop when it hits the inode
btrfs_truncate_inode_items is setup to stop doing btree searches when
it has finished removing the items for the inode.  It used to detect the
end of the inode by looking for an objectid that didn't match the
one we were searching for.

But, this would result in an extra search through the btree, which
adds extra balancing and cow costs to the operation.

This commit adds a check to see if we found the inode item, which means
we can stop searching early.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04 09:30:58 -05:00
Chris Mason f03d9301f1 Btrfs: Don't try to compress pages past i_size
The compression code had some checks to make sure we were only
compressing bytes inside of i_size, but it wasn't catching every
case.  To make things worse, some incorrect math about the number
of bytes remaining would make it try to compress more pages than the
file really had.

The fix used here is to fall back to the non-compression code in this
case, which does all the proper cleanup of delalloc and other accounting.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04 09:31:06 -05:00
Josef Bacik 811449496b Btrfs: join the transaction in __btrfs_setxattr
With selinux on we end up calling __btrfs_setxattr when we create an inode,
which calls btrfs_start_transaction().  The problem is we've already called
that in btrfs_new_inode, and in btrfs_start_transaction we end up doing a
wait_current_trans().  If btrfs-transaction has started committing it will wait
for all handles to finish, while the other process is waiting for the
transaction to commit.  This is fixed by using btrfs_join_transaction, which
won't wait for the transaction to commit.  Thanks,

Signed-off-by: Josef Bacik <jbacik@redhat.com>
2009-02-04 09:18:33 -05:00
Chris Ball 8c087b5183 Btrfs: Handle SGID bit when creating inodes
Before this patch, new files/dirs would ignore the SGID bit on their
parent directory and always be owned by the creating user's uid/gid.

Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04 09:29:54 -05:00
Chris Mason bd56b30205 Btrfs: Make btrfs_drop_snapshot work in larger and more efficient chunks
Every transaction in btrfs creates a new snapshot, and then schedules the
snapshot from the last transaction for deletion.  Snapshot deletion
works by walking down the btree and dropping the reference counts
on each btree block during the walk.

If if a given leaf or node has a reference count greater than one,
the reference count is decremented and the subtree pointed to by that
node is ignored.

If the reference count is one, walking continues down into that node
or leaf, and the references of everything it points to are decremented.

The old code would try to work in small pieces, walking down the tree
until it found the lowest leaf or node to free and then returning.  This
was very friendly to the rest of the FS because it didn't have a huge
impact on other operations.

But it wouldn't always keep up with the rate that new commits added new
snapshots for deletion, and it wasn't very optimal for the extent
allocation tree because it wasn't finding leaves that were close together
on disk and processing them at the same time.

This changes things to walk down to a level 1 node and then process it
in bulk.  All the leaf pointers are sorted and the leaves are dropped
in order based on their extent number.

The extent allocation tree and commit code are now fast enough for
this kind of bulk processing to work without slowing the rest of the FS
down.  Overall it does less IO and is better able to keep up with
snapshot deletions under high load.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04 09:27:02 -05:00
Chris Mason b4ce94de9b Btrfs: Change btree locking to use explicit blocking points
Most of the btrfs metadata operations can be protected by a spinlock,
but some operations still need to schedule.

So far, btrfs has been using a mutex along with a trylock loop,
most of the time it is able to avoid going for the full mutex, so
the trylock loop is a big performance gain.

This commit is step one for getting rid of the blocking locks entirely.
btrfs_tree_lock takes a spinlock, and the code explicitly switches
to a blocking lock when it starts an operation that can schedule.

We'll be able get rid of the blocking locks in smaller pieces over time.
Tracing allows us to find the most common cause of blocking, so we
can start with the hot spots first.

The basic idea is:

btrfs_tree_lock() returns with the spin lock held

btrfs_set_lock_blocking() sets the EXTENT_BUFFER_BLOCKING bit in
the extent buffer flags, and then drops the spin lock.  The buffer is
still considered locked by all of the btrfs code.

If btrfs_tree_lock gets the spinlock but finds the blocking bit set, it drops
the spin lock and waits on a wait queue for the blocking bit to go away.

Much of the code that needs to set the blocking bit finishes without actually
blocking a good percentage of the time.  So, an adaptive spin is still
used against the blocking bit to avoid very high context switch rates.

btrfs_clear_lock_blocking() clears the blocking bit and returns
with the spinlock held again.

btrfs_tree_unlock() can be called on either blocking or spinning locks,
it does the right thing based on the blocking bit.

ctree.c has a helper function to set/clear all the locked buffers in a
path as blocking.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04 09:25:08 -05:00
Chris Mason c487685d7c Btrfs: hash_lock is no longer needed
Before metadata is written to disk, it is updated to reflect that writeout
has begun.  Once this update is done, the block must be cow'd before it
can be modified again.

This update was originally synchronized by using a per-fs spinlock.  Today
the buffers for the metadata blocks are locked before writeout begins,
and everyone that tests the flag has the buffer locked as well.

So, the per-fs spinlock (called hash_lock for no good reason) is no
longer required.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04 09:24:25 -05:00
Chris Mason 3935127c50 Btrfs: disable leak debugging checks in extent_io.c
extent_io.c has debugging code to report and free leaked extent_state
and extent_buffer objects at rmmod time.  This helps track down
leaks and it saves you from rebooting just to properly remove the
kmem_cache object.

But, the code runs under a fairly expensive spinlock and the checks to
see if it is currently enabled are not entirely consistent.  Some use
#ifdef and some #if.

This changes everything to #if and disables the leak checking.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04 09:24:05 -05:00
Chris Mason b7a9f29fcf Btrfs: sort references by byte number during btrfs_inc_ref
When a block goes through cow, we update the reference counts of
everything that block points to.  The internal pointers of the block
can be in just about any order, and it is likely to have clusters of
things that are close together and clusters of things that are not.

To help reduce the seeks that come with updating all of these reference
counts, sort them by byte number before actual updates are done.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04 09:23:45 -05:00
Chris Mason b51912c91f Btrfs: async threads should try harder to find work
Tracing shows the delay between when an async thread goes to sleep
and when more work is added is often very short.  This commit adds
a little bit of delay and extra checking to the code right before
we schedule out.

It allows more work to be added to the worker
without requiring notifications from other procs.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04 09:23:24 -05:00
Jim Owens 0279b4cd86 Btrfs: selinux support
Add call to LSM security initialization and save
resulting security xattr for new inodes.

Add xattr support to symlink inode ops.

Set inode->i_op for existing special files.

Signed-off-by: jim owens <jowens@hp.com>
2009-02-04 09:29:13 -05:00
Christian Hesse bef62ef339 Btrfs: make btrfs acls selectable
This patch adds a menu entry to kconfig to enable acls for btrfs.
This allows you to enable FS_POSIX_ACL at kernel compile time.

(updated by Jeff Mahoney to make the changes in fs/btrfs/Kconfig instead)

Signed-off-by: Christian Hesse <mail@earthworm.de>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
2009-02-04 09:28:28 -05:00
Chris Mason a683705153 Btrfs: Catch missed bios in the async bio submission thread
The async bio submission thread was missing some bios that were
added after it had decided there was no work left to do.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04 09:19:41 -05:00
Chris Mason 89f135d8b5 Btrfs: fix readdir on 32 bit machines
After btrfs_readdir has gone through all the directory items, it
sets the directory f_pos to the largest possible int.  This way
applications that mix readdir with creating new files don't
end up in an endless loop finding the new directory items as they go.

It was a workaround for a bug in git, but the assumption was that if git
could make this looping mistake than it would be a common problem.

The largest possible int chosen was INT_LIMIT(typeof(file->f_pos),
and it is possible for that to be a larger number than 32 bit glibc
expects to come out of readdir.

This patches switches that to INT_LIMIT(off_t), which should keep
applications happy on 32 and 64 bit machines.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-28 15:34:27 -05:00
Chris Mason e4f722fa42 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
Fix fs/btrfs/super.c conflict around #includes
2009-01-28 20:29:43 -05:00
Chris Mason a717531942 Btrfs: do less aggressive btree readahead
Just before reading a leaf, btrfs scans the node for blocks that are
close by and reads them too.  It tries to build up a large window
of IO looking for blocks that are within a max distance from the top
and bottom of the IO window.

This patch changes things to just look for blocks within 64k of the
target block.  It will trigger less IO and make for lower latencies on
the read size.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-22 09:23:10 -05:00
Alexey Dobriyan 335debee07 fs/Kconfig: move btrfs out
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 13:15:54 +03:00
Yehuda Sadeh 1506fcc818 Btrfs: fiemap support
Now that bmap support is gone, this is the only way to get extent
mappings for userland.  These are still not valid for IO, but they
can tell us if a file has holes or how much fragmentation there is.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2009-01-21 14:39:14 -05:00
Chris Mason 35054394c4 Btrfs: stop providing a bmap operation to avoid swapfile corruptions
Swapfiles use bmap to build a list of extents belonging to the file,
and they assume these extents won't change over the life of the file.
They also use resulting list to do IO directly to the block device.

This causes problems for btrfs in a few ways:

btrfs returns logical block numbers through bmap, and these are not suitable
for IO.  They might translate to different devices, raid etc.

COW means that file block mappings are going to change frequently.

Using swapfiles on btrfs will lead to corruption, so we're avoiding the
problem for now by dropping bmap support entirely.  A later commit
will add fiemap support for people that really want to know how
a file is laid out.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-21 13:11:13 -05:00
Yan Zheng 7237f18336 Btrfs: fix tree logs parallel sync
To improve performance, btrfs_sync_log merges tree log sync
requests. But it wrongly merges sync requests for different
tree logs. If multiple tree logs are synced at the same time,
only one of them actually gets synced.

This patch has following changes to fix the bug:

Move most tree log related fields in btrfs_fs_info to
btrfs_root. This allows merging sync requests separately
for each tree log.

Don't insert root item into the log root tree immediately
after log tree is allocated. Root item for log tree is
inserted when log tree get synced for the first time. This
allows syncing the log root tree without first syncing all
log trees.

At tree-log sync, btrfs_sync_log first sync the log tree;
then updates corresponding root item in the log root tree;
sync the log root tree; then update the super block.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
2009-01-21 12:54:03 -05:00
Qinghuang Feng 7e6628544a Btrfs: open_ctree() error handling can oops on fs_info
a bug in open_ctree:

struct btrfs_root *open_ctree(..)
{
....
	if (!extent_root || !tree_root || !fs_info ||
	    !chunk_root || !dev_root || !csum_root) {
		err = -ENOMEM;
		goto fail;
//When code flow goes to "fail", fs_info may be NULL or uninitialized.
	}
....

fail:
	btrfs_close_devices(fs_info->fs_devices);// !
	btrfs_mapping_tree_free(&fs_info->mapping_tree);// !

	kfree(extent_root);
	kfree(tree_root);
	bdi_destroy(&fs_info->bdi);// !
...
)

Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-21 10:49:16 -05:00
Yan Zheng 86288a198d Btrfs: fix stop searching test in replace_one_extent
replace_one_extent searches tree leaves for references to a given extent. It
stops searching if it goes beyond the last possible position.

The last possible position is computed by adding the starting offset of a found
file extent to the full size of the extent. The code uses physical size of the
extent as the full size. This is incorrect when compression is used.

The fix is get the full size from ram_bytes field of file extent item.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
2009-01-21 10:49:16 -05:00
Jan Engelhardt 95029d7d59 Btrfs: change/remove typedef
Change one typedef to a regular enum, and remove an unused one.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-21 10:49:16 -05:00
Huang Weiyi 653249ff9a Btrfs: remove duplicated #include
Removed duplicated #include "compat.h"in
fs/btrfs/extent-tree.c

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-21 10:49:16 -05:00
Yan Zheng 5a7be515b1 Btrfs: Fix infinite loop in btrfs_extent_post_op
btrfs_extent_post_op calls finish_current_insert and del_pending_extents. They
both may enter infinite loops.

finish_current_insert enters infinite loop if it only finds some backrefs to
update.  The fix is to check for pending backref updates before restarting the
loop.

The infinite loop in del_pending_extents is due to a the skipped variable
not being properly reset before looping around.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
2009-01-21 10:49:16 -05:00
Yan Zheng 3dfdb9348a Btrfs: fix locking issue in btrfs_remove_block_group
We should hold the block_group_cache_lock while modifying the
block groups red-black tree. Thank you,

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
2009-01-21 10:49:16 -05:00
Qinghuang Feng c6e308713a Btrfs: simplify iteration codes
Merge list_for_each* and list_entry to list_for_each_entry*

Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-21 10:59:08 -05:00
Qinghuang Feng 57506d50ed Btrfs: check return value for kthread_run() correctly
kthread_run() returns the kthread or ERR_PTR(-ENOMEM), not NULL.

Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-21 10:49:16 -05:00
Roland Dreier 119e10cf1b Btrfs: Remove extra KERN_INFO in the middle of a line
The "devid <xxx> transid <xxx>" printk in btrfs_scan_one_device()
actually follows another printk that doesn't end in a newline (since the
intention is for the two printks to make one line of output), so the
KERN_INFO just ends up messing up the output:

    device label exp <6>devid 1 transid 9 /dev/sda5

Fix this by changing the extra KERN_INFO to KERN_CONT.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-21 10:49:16 -05:00
Huang Weiyi 7eaebe7d50 Btrfs: removed unused #include <version.h>'s
Removed unused #include <version.h>'s in btrfs

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-21 10:49:16 -05:00
Josef Bacik 070604040b Btrfs: cleanup xattr code
Andrew's review of the xattr code revealed some minor issues that this patch
addresses.  Just an error return fix, got rid of a useless statement and
commented one of the trickier parts of __btrfs_getxattr.

Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-21 10:49:16 -05:00
Wang Cong 19d00cc196 Btrfs: cleanup fs/btrfs/super.c::btrfs_control_ioctl()
- Remove the unused local variable 'len';
- Check return value of kmalloc().

Signed-off-by: Wang Cong <wangcong@zeuux.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-21 10:49:16 -05:00
Linus Torvalds 4b48d9d44e Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: fix ioctl arg size (userland incompatible change!)
  Btrfs: Clear the device->running_pending flag before bailing on congestion
2009-01-16 09:32:33 -08:00
Chris Mason c071fcfdb6 Btrfs: fix ioctl arg size (userland incompatible change!)
The structure used to send device in btrfs ioctl calls was not
properly aligned, and so 32 bit ioctls would not work properly on
64 bit kernels.

We could fix this with compat ioctls, but we're just one byte away
and it doesn't make sense at this stage to carry about the compat ioctls
forever at this stage in the project.

This patch brings the ioctl arg up to an evenly aligned 4k.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-16 11:59:08 -05:00
Chris Mason 1d9e2ae949 Btrfs: Clear the device->running_pending flag before bailing on congestion
Btrfs maintains a queue of async bio submissions so the checksumming
threads don't have to wait on get_request_wait.  In order to avoid
extra wakeups, this code has a running_pending flag that is used
to tell new submissions they don't need to wake the thread.

When the threads notice congestion on a single device, they
may decide to requeue the job and move on to other devices.  This
makes sure the running_pending flag is cleared before the
job is requeued.

It should help avoid IO stalls by making sure the task is woken up
when new submissions come in.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-16 11:58:19 -05:00
Qinghuang Feng 1bcbf31337 btrfs & squashfs: Move btrfs and squashfsto's magic number to <linux/magic.h>
Use the standard magic.h for btrfs and squashfs.

Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-15 16:39:38 -08:00
Linus Torvalds 0176260fc3 btrfs: fix for write_super_lockfs/unlockfs error handling
Commit c4be0c1dc4 added the ability for
write_super_lockfs to return errors, and renamed them to match.  But
btrfs didn't get converted.

Do the minimal conversion to make it compile again.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-10 06:09:52 -08:00
Chris Mason e293e97e36 Btrfs: explicitly mark the tree log root for writeback
Each subvolume has an extent_state_tree used to mark metadata
that needs to be sent to disk while syncing the tree.  This is
used in addition to the dirty bits on the pages themselves so that
a single subvolume can be sent to disk efficiently in disk order.

Normally this marking happens in btrfs_alloc_free_block, which also does
special recording of dirty tree blocks for the tree log roots.

Yan Zheng noticed that when the root of the log tree is allocated, it is added
to the wrong writeback list.  The fix used here is to explicitly set
it dirty as part of tree log creation.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-09 13:14:17 -05:00
Chris Mason 755efdc3c4 Btrfs: Drop the hardware crc32c asm code
This is already in the arch specific directories in mainline and
shouldn't be copied into btrfs.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-07 19:56:59 -05:00
David Woodhouse 709ac06a14 Btrfs: Add Documentation/filesystem/btrfs.txt, remove old COPYING
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-07 09:54:24 -05:00
Chris Mason 9ab86c8e01 Btrfs: kmap_atomic(KM_USER0) is safe for btrfs_readpage_end_io_hook
None of the checksum verification code schedules, so we can use the faster
kmap_atomic

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-07 09:48:51 -05:00
Chris Mason cc7172defc Btrfs: Don't use kmap_atomic(..., KM_IRQ0) during checksum verifies
Checksum verification happens in a helper thread, and there is no
need to mess with interrupts.  This switches to kmap() instead.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-06 13:26:40 -05:00
Yan Zheng 07d400a6df Btrfs: tree logging checksum fixes
This patch contains following things.

1) Limit the max size of btrfs_ordered_sum structure to PAGE_SIZE.  This
struct is kmalloced so we want to keep it reasonable.

2) Replace copy_extent_csums by btrfs_lookup_csums_range.  This was
duplicated code in tree-log.c

3) Remove replay_one_csum. csum items are replayed at the same time as
   replaying file extents. This guarantees we only replay useful csums.

4) nbytes accounting fix.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
2009-01-06 11:42:00 -05:00
Yan Zheng 1ba12553f3 Btrfs: don't change file extent's ram_bytes in btrfs_drop_extents
btrfs_drop_extents doesn't change file extent's ram_bytes
in the case of booked extent. To be consistent, we should
also not change ram_bytes when truncating existing extent.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
2009-01-06 09:58:02 -05:00
Yan Zheng 180591bcfe Btrfs: Use btrfs_join_transaction to avoid deadlocks during snapshot creation
Snapshot creation happens at a specific time during transaction commit.  We
need to make sure the code called by snapshot creation doesn't wait
for the running transaction to commit.

This changes btrfs_delete_inode and finish_pending_snaps to use
btrfs_join_transaction instead of btrfs_start_transaction to avoid deadlocks.

It would be better if btrfs_delete_inode didn't use the join, but the
call path that triggers it is:

btrfs_commit_transaction->create_pending_snapshots->
create_pending_snapshot->btrfs_lookup_dentry->
fixup_tree_root_location->btrfs_read_fs_root->
btrfs_read_fs_root_no_name->btrfs_orphan_cleanup->iput

This will be fixed in a later patch by moving the orphan cleanup to the
cleaner thread.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-06 09:58:06 -05:00
Chris Mason 9ca03b997f Btrfs: drop remaining LINUX_KERNEL_VERSION checks and compat code
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-06 09:38:55 -05:00
Chris Mason 43b774ba13 Btrfs: drop EXPORT symbols from extent_io.c
They should stay out until this is turned into generic code.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-05 22:05:48 -05:00
Chris Mason d397712bcc Btrfs: Fix checkpatch.pl warnings
There were many, most are fixed now.  struct-funcs.c generates some warnings
but these are bogus.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-05 21:25:51 -05:00
Liu Hui 1f3c79a28c Btrfs: Fix free block discard calls down to the block layer
This is a patch to fix discard semantic to make Btrfs work with FTL and SSD.
We can improve FTL's performance by telling it which sectors are freed by file
system. But if we don't tell FTL the information of free sectors in proper
time, the transaction mechanism of Btrfs will be destroyed and Btrfs could not
roll back the previous transaction under the power loss condition.

There are some problems in the old implementation:
1, In __free_extent(), the pinned down extents should not be discarded.
2, In free_extents(), the free extents are all pinned, so they need to
be discarded in transaction committing time instead of free_extents().
3, The reserved extent used by log tree should be discard too.

This patch change discard behavior as follows:
1, For the extents which need to be free at once,
   we discard them in update_block_group().
2, Delay discarding the pinned extent in btrfs_finish_extent_commit()
   when committing transaction.
3, Remove discarding from free_extents() and __free_extent()
4, Add discard interface into btrfs_free_reserved_extent()
5, Discard sectors before updating the free space cache, otherwise,
   FTL will destroy file system data.
2009-01-05 15:57:51 -05:00
Yan Zheng ec051c0f92 Btrfs: avoid orphan inode caused by log replay
drop_one_dir_item does not properly update inode's link count. It can be
reproduced by executing following commands:

#touch test
#sync
#rm -f test
#dd if=/dev/zero bs=4k count=1 of=test conv=fsync
#echo b > /proc/sysrq-trigger

This fixes it by adding an BTRFS_ORPHAN_ITEM_KEY for the inode

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
2009-01-05 15:43:42 -05:00