Commit Graph

415 Commits (eead19115329c5615ba03cbaf1c3fe24c14858a3)

Author SHA1 Message Date
Nick Piggin b6af1bcd87 ocfs2: convert to new aops
Plug ocfs2 into the ->write_begin and ->write_end aops.

A bunch of custom code is now gone - the iovec iteration stuff during write
and the ocfs2 splice write actor.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:58 -07:00
Linus Torvalds efefc6eb38 Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (75 commits)
  PM: merge device power-management source files
  sysfs: add copyrights
  kobject: update the copyrights
  kset: add some kerneldoc to help describe what these strange things are
  Driver core: rename ktype_edd and ktype_efivar
  Driver core: rename ktype_driver
  Driver core: rename ktype_device
  Driver core: rename ktype_class
  driver core: remove subsystem_init()
  sysfs: move sysfs file poll implementation to sysfs_open_dirent
  sysfs: implement sysfs_open_dirent
  sysfs: move sysfs_dirent->s_children into sysfs_dirent->s_dir
  sysfs: make sysfs_root a regular directory dirent
  sysfs: open code sysfs_attach_dentry()
  sysfs: make s_elem an anonymous union
  sysfs: make bin attr open get active reference of parent too
  sysfs: kill unnecessary NULL pointer check in sysfs_release()
  sysfs: kill unnecessary sysfs_get() in open paths
  sysfs: reposition sysfs_dirent->s_mode.
  sysfs: kill sysfs_update_file()
  ...
2007-10-12 15:49:37 -07:00
Greg Kroah-Hartman 34980ca8fa Drivers: clean up direct setting of the name of a kset
A kset should not have its name set directly, so dynamically set the
name at runtime.

This is needed to remove the static array in the kobject structure which
will be changed in a future patch.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:02 -07:00
Mark Fasheh e7b3401960 ocfs2: Optionally return filldir errors
Modify ocfs2_dir_foreach_blk() to optionally return any error from the
filldir callback. This way ocfs2_dirforeach() can terminate early, as
opposed to always passing through the entire directory. This fixes a bug
introduced during a previous code refactor where ocfs2_empty_dir() would
loop infinitely.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:41 -07:00
Mark Fasheh 5b6a3a2b4a ocfs2: Write support for directories with inline data
Create all new directories with OCFS2_INLINE_DATA_FL and the inline data
bytes formatted as an empty directory. Inode size field reflects the actual
amount of inline data available, which makes searching for dirent space
very similar to the regular directory search.

Inline-data directories are automatically pushed out to extents on any
insert request which is too large for the available space.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:41 -07:00
Mark Fasheh 23193e513d ocfs2: Read support for directories with inline data
This splits out extent based directory read support and implements
inline-data versions of those functions. All knowledge of inline-data versus
extent based directories is internalized. For lookups the code uses
ocfs2_find_entry_id(), full dir iterations make use of
ocfs2_dir_foreach_blk_id().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:40 -07:00
Mark Fasheh 1afc32b952 ocfs2: Write support for inline data
This fixes up write, truncate, mmap, and RESVSP/UNRESVP to understand inline
inode data.

For the most part, the changes to the core write code can be relied on to do
the heavy lifting. Any code calling ocfs2_write_begin (including shared
writeable mmap) can count on it doing the right thing with respect to
growing inline data to an extent tree.

Size reducing truncates, including UNRESVP can simply zero that portion of
the inode block being removed. Size increasing truncatesm, including RESVP
have to be a little bit smarter and grow the inode to an extent tree if
necessary.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:40 -07:00
Mark Fasheh 6798d35a31 ocfs2: Read support for inline data
This hooks up ocfs2_readpage() to populate a page with data from an inode
block. Direct IO reads from inline data are modified to fall back to
buffered I/O. Appropriate checks are also placed in the extent map code to
avoid reading an extent list when inline data might be stored.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:39 -07:00
Mark Fasheh 15b1e36bdb ocfs2: Structure updates for inline data
Add the disk, network and memory structures needed to support data in inode.

Struct ocfs2_inline_data is defined and embedded in ocfs2_dinode for storing
inline data.

A new inode field, i_dyn_features, is added to facilitate tracking of
dynamic inode state. Since it will be used often, we want to mirror it on
ocfs2_inode_info, and transfer it via the meta data lvb.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:39 -07:00
Mark Fasheh 8553cf4f36 ocfs2: Cleanup dirent size check
The check to see if a new dirent would fit in an old one is pretty ugly, and
it's done at least twice. Clean things up by putting this in it's own
easier-to-read function.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:39 -07:00
Mark Fasheh 38760e2432 ocfs2: Rename cleanups
ocfs2_rename() does direct manipulation of the dirent it's gotten back from
a directory search. Wrap this manipulation inside of a function so that we
can transparently change directory update behavior in the future. As an
added bonus, this gets rid of an ugly macro.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:38 -07:00
Mark Fasheh be94d11704 ocfs2: Provide convenience function for ino lookup
A couple paths which needed to just match a parent dir + name pair to an
inode number were a bit messy because they had to deal with
ocfs2_find_files_on_disk() which returns a larger number of values. Provide
a convenience function, ocfs2_lookup_ino_from_name() which internalizes all
the extra accounting.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:38 -07:00
Mark Fasheh 0bfbbf62a8 ocfs2: Implement ocfs2_empty_dir() as a caller of ocfs2_dir_foreach()
We can preserve the behavior of ocfs2_empty_dir(), while getting rid of the
open coded directory walk by just providing a smart filldir callback. This
also automatically gets to use the dir readahead code, though in this case
any advantage is minor at best.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:37 -07:00
Mark Fasheh 5eae5b96fc ocfs2: Remove open coded readdir()
ocfs2_queue_orphans() has an open coded readdir loop which can easily just
use a directory accessor function.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:37 -07:00
Mark Fasheh 7e8536797d ocfs2: Pass raw u64 to filldir
filldir_t can take this, so don't turn de->inode into a 32 bit value. Right
now this doesn't make a difference since no ocfs2 inodes overflow that, but
it could be a nasty surprise later on if some kernel code is calling
ocfs2_dir_foreach_blk() and expecting real inode numbers back...

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:37 -07:00
Mark Fasheh b8bc5f4fde ocfs2: Abstract out core dir listing functionality
Put this in it's own function so that the functionality can be overridden.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:36 -07:00
Mark Fasheh 316f4b9f98 ocfs2: Move directory manipulation code into dir.c
The code for adding, removing, deleting directory entries was splattered all
over namei.c. I'd rather have this all centralized, so that it's easier to
make changes for inline dir data, and eventually indexed directories.

None of the code in any of the functions was changed. I only removed the
static keyword from some prototypes so that they could be exported.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:36 -07:00
Mark Fasheh 1d410a6e33 ocfs2: Small refactor of truncate zeroing code
We'll want to reuse most of this when pushing inline data back out to an
extent. Keeping this part as a seperate patch helps to keep the upcoming
changes for write support uncluttered.

The core portion of ocfs2_zero_cluster_pages() responsible for making sure a
page is mapped and properly dirtied is abstracted out into it's own
function, ocfs2_map_and_dirty_page(). Actual functionality doesn't change,
though zeroing becomes optional.

We also turn part of ocfs2_free_write_ctxt() into  a common function for
unlocking and freeing a page array. This operation is very common (and
uniform) for Ocfs2 cluster sizes greater than page size, so it makes sense
to keep the code in one place.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:35 -07:00
Mark Fasheh 65ed39d6ca ocfs2: move nonsparse hole-filling into ocfs2_write_begin()
By doing this, we can remove any higher level logic which has to have
knowledge of btree functionality - any callers of ocfs2_write_begin() can
now expect it to do anything necessary to prepare the inode for new data.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:35 -07:00
Mark Fasheh 92e91ce2a3 ocfs2: Sync ocfs2_fs.h with ocfs2-tools
ocfs2-tools added some on-disk fields and flags which are used by
tunefs.ocfs2.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:34 -07:00
Denis Cheng bddb8eb37f [PATCH] fs/ocfs2/: removed unneeded initial value and function's return value
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:34 -07:00
Sunil Mushran d550071c03 ocfs2: Implement show_options()
Implement sops->show_options() so as to allow /proc/mounts to show the mount
options.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:33 -07:00
Mark Fasheh 19b613d410 ocfs2: Clear slot map when umounting a local volume
This is technically harmless (recovery will clean it out later), but leaves
a bogus entry in the slot_map which really shouldn't be there.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:33 -07:00
Mark Fasheh 015452b15f ocfs2: Remove unused structure field
c_used_tail_recs in struct ocfs2_merge_ctxt is only ever set, so we can
remove it.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:32 -07:00
Tao Mao 518d7269f3 ocfs2: remove unused variable
delete_tail_recs in ocfs2_try_to_merge_extent() was only ever set, remove
it.

Signed-off-by: Tao Mao <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:32 -07:00
Tao Mao c77534f6fb ocfs2: remove mostly unused field from insert structure
ocfs2_insert_type->ins_free_records was only used in one place, and was set
incorrectly in most places. We can free up some memory and lose some code by
removing this.

* Small warning fixup contributed by Andrew Mortom <akpm@linux-foundation.org>

Signed-off-by: Tao Mao <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:32 -07:00
Al Viro 782e3b3b38 Fix up more bio fallout
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-12 00:29:50 -07:00
NeilBrown 6712ecf8f6 Drop 'size' argument from bio_endio and bi_end_io
As bi_end_io is only called once when the reqeust is complete,
the 'size' argument is now redundant.  Remove it.

Now there is no need for bio_endio to subtract the size completed
from bi_size.  So don't do that either.

While we are at it, change bi_end_io to return void.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-10 09:25:57 +02:00
Sunil Mushran bda0233b89 ocfs2: Unlock mutex in local alloc failure case
The fs was not unlocking the local alloc inode mutex in the code path in
which it failed to find a window of free bits in the global bitmap.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-03 11:14:45 -07:00
Sunil Mushran 813d974c53 ocfs2: Pack vote message and response structures
The ocfs2_vote_msg and ocfs2_response_msg structs needed to be
packed to ensure similar sizeofs in 32-bit and 64-bit arches. Without this,
we had inadvertantly broken 32/64 bit cross mounts.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-20 15:06:10 -07:00
Mark Fasheh 5c26a7b70f ocfs2: Don't double set write parameters
The target page offsets were being incorrectly set a second time in
ocfs2_prepare_page_for_write(), which was causing problems on a 16k page
size kernel. Additionally, ocfs2_write_failure() was incorrectly using those
parameters instead of the parameters for the individual page being cleaned
up.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-20 15:06:10 -07:00
Mark Fasheh db56246c69 ocfs2: Fix pos/len passed to ocfs2_write_cluster
This was broken for file systems whose cluster size is greater than page
size. Pos needs to be incremented as we loop through the descriptors, and
len needs to be capped to the size of a single cluster.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-20 15:06:09 -07:00
Mark Fasheh 415cb80037 ocfs2: Allow smaller allocations during large writes
The ocfs2 write code loops through a page much like the block code, except
that ocfs2 allocation units can be any size, including larger than page
size. Typically it's equal to or larger than page size - most kernels run 4k
pages, the minimum ocfs2 allocation (cluster) size.

Some changes introduced during 2.6.23 changed the way writes to pages are
handled, and inadvertantly broke support for > 4k page size. Instead of just
writing one cluster at a time, we now handle the whole page in one pass.

This means that multiple (small) seperate allocations might happen in the
same pass. The allocation code howver typically optimizes by getting the
maximum which was reserved. This triggered a BUG_ON in the extend code where
it'd ask for a single bit (for one part of a > 4k page) and get back more
than it asked for.

Fix this by providing a variant of the high level allocation function which
allows the caller to specify a maximum. The traditional function remains and
just calls the new one with a maximum determined from the initial
reservation.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-20 15:06:09 -07:00
Mark Fasheh e535e2efd2 ocfs2: Fix calculation of i_blocks during truncate
We were setting i_blocks too early - before truncating any allocation.
Correct things to set i_blocks after the allocation change.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-11 11:39:46 -07:00
tao.ma@oracle.com 30b8548f2c [PATCH] ocfs2: Fix a wrong cluster calculation.
In ocfs2_alloc_write_write_ctxt, the written clusters length is calculated
by the byte length only. This may cause some problems if we start to write
at some position in the end of one cluster and last to a second cluster
while the "len" is smaller than a cluster size. In that case, we have to
write 2 clusters actually.
So we have to take the start position into consideration also.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-11 11:39:05 -07:00
Tiger Yang c0123adef6 [PATCH] ocfs2: fix mount option parsing
For some mount option types, ocfs2_parse_options() will try to access
sb->s_fs_info to get at the ocfs2 private superblock. Unfortunately, that
hasn't been allocated yet and will cause a kernel crash.

Fix this by storing options in a struct which can then get pushed into the
ocfs2_super once it's been allocated later. If we need more options which
store to the ocfs2_super in the future, we can just fields to this struct.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-11 11:38:48 -07:00
Mark Fasheh e0dceaf0a4 ocfs2: set non-default s_time_gran during mount
We need to manually set this to '1' during mount, otherwise inode_setattr()
will chop off the nanosecond portion of our timestamps.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09 17:27:58 -07:00
Sunil Mushran ce17204ae6 ocfs2: Retry sendpage() if it returns EAGAIN
Instead of treating EAGAIN, returned from sendpage(), as an error, this
patch retries the operation.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09 17:27:38 -07:00
Sunil Mushran 480214d71f ocfs2: Fix rename/extend race
If one process is extending a file while another is renaming it, there
exists a window when rename could flush the old inode's stale i_size to
disk. This patch recognizes the fact that rename is only updating the old
inode's ctime, so it ensures only that value is flushed to disk.

Signed-off-by: Sunil Mushran <sunil.musran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09 17:27:10 -07:00
Adrian Bunk 6a18380e7d [2.6 patch] ocfs2_insert_extent(): remove dead code
This patch removes some now dead code.

Spotted by the Coverity checker.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09 17:26:03 -07:00
Mark Fasheh 5a25403175 ocfs2: Fix max offset calculations
ocfs2_max_file_offset() was over-estimating the largest file size for
several cases. This wasn't really a problem before, but now that we support
sparse files, it needs to be more accurate.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09 17:25:49 -07:00
Mark Fasheh ce76fd30ce ocfs2: check ia_size limits in setattr
We have to manually check the requested truncate size as the check in
vmtruncate() comes too late for Ocfs2.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09 17:25:38 -07:00
Mark Fasheh 7c08d70c69 ocfs2: Fix some casting errors related to file writes
ocfs2_align_clusters_to_page_index() needs to cast the clusters shift to
pgoff_t and ocfs2_file_buffered_write() needs loff_t when calculating
destination start for memcpy.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09 17:25:27 -07:00
Mark Fasheh a00cce356b ocfs2: use s_maxbytes directly in ocfs2_change_file_space()
There's no need to recalculate things via ocfs2_max_file_offset() as we've
already done that to fill s_maxbytes, so use that instead. We can also
un-export ocfs2_max_file_offset() then.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09 17:25:07 -07:00
Mark Fasheh c11e9fafb3 ocfs2: Restrict inode changes in ocfs2_update_inode_atime()
ocfs2_update_inode_atime() calls ocfs2_mark_inode_dirty() to push changes
from the struct inode into the ocfs2 disk inode. The problem is,
ocfs2_mark_inode_dirty() might change other fields, depending on what
happened to the struct inode. Since we don't always have locking to
serialize changes to other fields (like i_size, etc), just fix things up to
only touch the atime field.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09 17:23:50 -07:00
Jens Axboe 3836df6b52 ocfs2: bad kunmap_atomic()
kunmap_atomic() takes the virtual address, not the mapped page as
argument.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-24 16:02:55 -07:00
Nick Piggin 1833633803 fix some conversion overflows
Fix page index to offset conversion overflows in buffer layer, ecryptfs,
and ocfs2.

It would be nice to convert the whole tree to page_offset, but for now
just fix the bugs.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-20 08:44:19 -07:00
Paul Mundt 20c2df83d2 mm: Remove slab destructors from kmem_cache_create().
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-07-20 10:11:58 +09:00
Linus Torvalds f745bb1c73 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2: ->fallocate() support
2007-07-19 14:16:44 -07:00
Nick Piggin d0217ac04c mm: fault feedback #1
Change ->fault prototype.  We now return an int, which contains
VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte.
 FAULT_RET_ code tells the VM whether a page was found, whether it has been
locked, and potentially other things.  This is not quite the way he wanted
it yet, but that's changed in the next patch (which requires changes to
arch code).

This means we no longer set VM_CAN_INVALIDATE in the vma in order to say
that a page is locked which requires filemap_nopage to go away (because we
can no longer remain backward compatible without that flag), but we were
going to do that anyway.

struct fault_data is renamed to struct vm_fault as Linus asked. address
is now a void __user * that we should firmly encourage drivers not to use
without really good reason.

The page is now returned via a page pointer in the vm_fault struct.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:41 -07:00