Commit Graph

638 Commits (08413899db89d8d636c2a2d4ba5c356ab587d7ef)

Author SHA1 Message Date
Tao Ma 4338ab6a75 ocfs2: Fix an endian bug in online resize.
In ocfs2_group_add, 'cr' is a disk field of type 'ocfs2_chain_rec', and we
were putting cpu byteorder values into it. Swap things to the right endian
before storing.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-10 15:14:07 -07:00
Jan Engelhardt 90d99779a4 [PATCH] [OCFS2]: constify function pointer tables
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-10 15:13:57 -07:00
Joel Becker 0f71b7b40f ocfs2: Fix endian bug in o2dlm protocol negotiation.
struct dlm_query_join_packet is made up of four one-byte fields.  They
are effectively in big-endian order already.  However, little-endian
machines swap them before putting the packet on the wire (because
query_join's response is a status, and that status is treated as a u32
on the wire).  Thus, a big-endian and little-endian machines will
treat this structure differently.

The solution is to have little-endian machines swap the structure when
converting from the structure to the u32 representation.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-10 15:13:54 -07:00
Tao Ma 2af37ce82d ocfs2: Use dlm_print_one_lock_resource for lock resource print
__dlm_print_one_lock_resource must be called with spin_lock
the res->spinlock. While in some cases, we use it without this
precondition and lead to the failure of assert_spin_locked.
So call dlm_print_one_lock_resource instead.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-10 15:13:50 -07:00
Andrew Morton 3a4780a85d [PATCH] fs/ocfs2/dlm/dlmdomain.c: fix printk warning
fs/ocfs2/dlm/dlmdomain.c: In function 'dlm_send_join_cancels':
fs/ocfs2/dlm/dlmdomain.c:983: warning: format '%u' expects type 'unsigned int', but argument 7 has type 'long unsigned int'

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-10 15:13:39 -07:00
Julia Lawall 86c838b03d [PATCH] fs/ocfs2/aops.c: Correct use of ! and &
In commit e6bafba5b4, a bug was fixed that
involved converting !x & y to !(x & y).  The code below shows the same
pattern, and thus should perhaps be fixed in the same way.

This is not tested and clearly changes the semantics, so it is only
something to consider.

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-03 15:50:21 -08:00
Adrian Bunk 05488bbebe [2.6 patch] ocfs2: make dlm_do_assert_master() static
This patch makes the needlessly global dlm_do_assert_master() static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-03 15:50:21 -08:00
Adrian Bunk 200bfae37a [2.6 patch] make ocfs2_downconvert_thread() static
This patch makes the needlessly global ocfs2_downconvert_thread()
static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-03 15:50:21 -08:00
Adrian Bunk 006000566d [2.6 patch] fs/ocfs2/: possible cleanups
This patch contains the following cleanups that are now possible:
- make the following needlessly global functions static:
  - dlmglue.c:ocfs2_process_blocked_lock()
  - heartbeat.c:ocfs2_node_map_init()
- #if 0 the following unused global function plus support functions:
  - heartbeat.c:ocfs2_node_map_is_only()

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-03 15:50:21 -08:00
Marcin Slusarz 0dd3256e04 [PATCH] ocfs2: le*_add_cpu conversion
replace all:
little_endian_variable = cpu_to_leX(leX_to_cpu(little_endian_variable) +
					expression_in_cpu_byteorder);
with:
	leX_add_cpu(&little_endian_variable, expression_in_cpu_byteorder);
generated with semantic patch

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-03 15:50:21 -08:00
Mark Fasheh 1044e401af ocfs2: Fix writeout in ocfs2_data_convert_worker()
Commit f1f540688e "optimized"
ocfs2_data_convert_worker() to "only do work for regular files".
Unfortunately, I left out a '!', which casued it to *skip* regular files.
This was hidden from testing until recently because the default data
journaling mode (data=ordered) doesn't exercise this code.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2008-03-03 15:50:21 -08:00
Sunil Mushran 7ad8b3d30e ocfs2: Enable localalloc for local mounts
Commit 2fbe8d1ebe disabled localalloc
for local mounts. This caused issues as ocfs2 uses localalloc to
provide write locality. This patch enables localalloc for local mounts.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-03 15:50:21 -08:00
Marcin Slusarz 8b5f688368 byteorder: move le32_add_cpu & friends from OCFS2 to core
This patchset moves le*_add_cpu and be*_add_cpu functions from OCFS2 to core
header (1st), converts ext3 filesystem to this API (2nd) and replaces XFS
different named functions with new ones (3rd).

There are many places where these functions will be useful.  Just look at:
grep -r 'cpu_to_[ble12346]*([ble12346]*_to_cpu.*[-+]' linux-src/ Patch for
ext3 is an example how conversions will probably look like.

This patch:

- move inline functions which add native byte order variable to
  little/big endian variable to core header
  * le16_add_cpu(__le16 *var, u16 val)
  * le32_add_cpu(__le32 *var, u32 val)
  * le64_add_cpu(__le64 *var, u64 val)
  * be32_add_cpu(__be32 *var, u32 val)
- add for completeness:
  * be16_add_cpu(__be16 *var, u16 val)
  * be64_add_cpu(__be64 *var, u64 val)

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Acked-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: Timothy Shimmin <tes@sgi.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:32 -08:00
Joel Becker d24fbcda0c ocfs2: Negotiate locking protocol versions.
Currently, when ocfs2 nodes connect via TCP, they advertise their
compatibility level.  If the versions do not match, two nodes cannot speak
to each other and they disconnect. As a result, this provides no forward or
backwards compatibility.

This patch implements a simple protocol negotiation at the dlm level by
introducing a major/minor version number scheme for entities that
communicate.  Specifically, o2dlm has a major/minor version for interaction
with o2dlm on other nodes, and ocfs2 itself has a major/minor version for
interacting with the filesystem on other nodes.

This will allow rolling upgrades of ocfs2 clusters when changes to the
locking or network protocols can be done in a backwards compatible manner.
In those cases, only the minor number is changed and the negotatied protocol
minor is returned from dlm join. In the far less likely event that a
required protocol change makes backwards compatibility impossible, we simply
bump the major number.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-02-06 16:11:29 -08:00
Christoph Lameter eebd2aa355 Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user
Simplify page cache zeroing of segments of pages through 3 functions

zero_user_segments(page, start1, end1, start2, end2)

        Zeros two segments of the page. It takes the position where to
        start and end the zeroing which avoids length calculations and
	makes code clearer.

zero_user_segment(page, start, end)

        Same for a single segment.

zero_user(page, start, length)

        Length variant for the case where we know the length.

We remove the zero_user_page macro. Issues:

1. Its a macro. Inline functions are preferable.

2. The KM_USER0 macro is only defined for HIGHMEM.

   Having to treat this special case everywhere makes the
   code needlessly complex. The parameter for zeroing is always
   KM_USER0 except in one single case that we open code.

Avoiding KM_USER0 makes a lot of code not having to be dealing
with the special casing for HIGHMEM anymore. Dealing with
kmap is only necessary for HIGHMEM configurations. In those
configurations we use KM_USER0 like we do for a series of other
functions defined in highmem.h.

Since KM_USER0 is depends on HIGHMEM the existing zero_user_page
function could not be a macro. zero_user_* functions introduced
here can be be inline because that constant is not used when these
functions are called.

Also extract the flushing of the caches to be outside of the kmap.

[akpm@linux-foundation.org: fix nfs and ntfs build]
[akpm@linux-foundation.org: fix ntfs build some more]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:13 -08:00
Joe Perches c78bad11fb fs/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:33:42 +02:00
Joel Becker 6b11d8179d ocfs2: Fix userspace ABI breakage in sysfs
The userspace ABI of ocfs2's internal cluster stack (o2cb) was broken by
commit c60b717879 "kset: convert ocfs2 to
use kset_create".  Specifically, the '/sys/o2cb' kset was moved to
'/sys/fs/o2cb'.  This breaks all ocfs2 tools and renders the
filesystem unmountable.

This fix moves '/sys/o2cb' back where it belongs.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-28 19:10:23 -08:00
Mark Fasheh 2fe5c1d7eb ocfs2: clean up bh null checks
If we know a buffer_head is non-null, then brelse() is unnecessary and
put_bh() can be used instead. Also, an explicit check for NULL is
unnecessary when using brelse(). This patch only covers buffer_head_io.c and
resize.c, which have recently added code which exhibits this problem.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:48 -08:00
Mark Fasheh 7ec373cf33 ocfs2: document access rules for blocked_lock_list
ocfs2_super->blocked_lock_list and ocfs2_super->blocked_lock_count have some
usage restrictions which aren't immediately obvious to anyone reading the
code. It's a good idea to document this so that we avoid making costly
mistakes in the future.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:48 -08:00
Mark Fasheh 0e5ae03203 ocfs2: bump version number
Bump the printed version to 1.5.0. This helps us quickly identify which
version of Ocfs2 a bug filer is running.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:46 -08:00
Tao Ma 2d4b1cbb44 ocfs2/dlm: Clear joining_node on hearbeat node down
Currently the process of dlm join contains 2 steps: query join and assert join.
After query join, the joined node will set its joining_node. So if the joining
node happens to panic before the 2nd step, the joined node will fail to clear
its joining_node flag because that node isn't in the domain map. It at least
cause 2 problems.
1. All the new join request will fail. So no new node can mount the volume.
2. The joined node can't umount the volume since during the umount process it
   has to wait for the joining_node to be unknown. So the umount will be hanged.

The solution is to clear the joining_node before we check the domain map.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:46 -08:00
Marcin Slusarz 4092d49f70 ocfs2: convert byte order of constant instead of variable
Convert byte order of constant instead of variable it will be done at
compile time vs run time. Remove unused le32_and_cpu.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:46 -08:00
Sunil Mushran 17104683d2 ocfs2: Update default cluster timeouts
Lots of people are having trouble with the default timeouts, which are too
low. These new values are derived from an informal survey taken on
ocfs2-users, as well as data from bug reports. This should reduce the amount
of cluster disconnects and subsequent fencing seen during normal workloads.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:45 -08:00
Jan Kara 634bf74d1e ocfs2: printf fixes
Explicitely convert loff_t to long long in printf. Just for sure...

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:45 -08:00
Jan Kara 32c3c0e2e5 ocfs2: Use generic_file_llseek
We should use generic_file_llseek() and not default_llseek() so that
s_maxbytes gets properly checked when seeking.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:45 -08:00
Jan Kara d2849fb294 ocfs2: Safer read_inline_data()
In ocfs2_read_inline_data() we should store file size in loff_t. Although
the file size should fit in 32 bits we cannot be sure in case filesystem is
corrupted.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:44 -08:00
Jan Kara 5fa0613ea5 ocfs2: Silence false lockdep warnings
Create separate lockdep lock classes for system file's i_mutexes. They are
used to guard allocations and similar things and thus rank differently
than i_mutex of a regular file or directory.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:44 -08:00
Mark Fasheh 53fc622b9e [PATCH 2/2] ocfs2: cluster aware flock()
Hook up ocfs2_flock(), using the new flock lock type in dlmglue.c. A new
mount option, "localflocks" is added so that users can revert to old
functionality as need be.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:43 -08:00
Mark Fasheh cf8e06f1a8 [PATCH 1/2] ocfs2: add flock lock type
This adds a new dlmglue lock type which is intended to back flock()
requests.

Since these locks are driven from userspace, usage rules are much more
liberal than the typical Ocfs2 internal cluster lock. As a result, we can't
make use of most dlmglue features - lock caching and lock level
optimizations in particular. Additionally, userspace is free to deadlock
itself, so we have to deal with that in the same way as the rest of the
kernel - by allowing a signal to abort a lock request.

In order to keep ocfs2_cluster_lock() complexity down, ocfs2_file_lock()
does it's own dlm coordination. We still use the same helper functions
though, so duplicated code is kept to a minimum.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:43 -08:00
Sunil Mushran 2fbe8d1ebe ocfs2: Local alloc window size changeable via mount option
Local alloc is a performance optimization in ocfs2 in which a node
takes a window of bits from the global bitmap and then uses that for
all small local allocations. This window size is fixed to 8MB currently.
This patch allows users to specify the window size in MB including
disabling it by passing in 0. If the number specified is too large,
the fs will use the default value of 8MB.

mount -o localalloc=X /dev/sdX /mntpoint

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:43 -08:00
Mark Fasheh d147b3d630 ocfs2: Support commit= mount option
Mostly taken from ext3. This allows the user to set the jbd commit interval,
in seconds. The default of 5 seconds stays the same, but now users can
easily increase the commit interval. Typically, this would be increased in
order to benefit performance at the expense of data-safety.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:42 -08:00
Mark Fasheh 0957f00796 ocfs2: Add missing permission checks
Check that an online resize is being driven by a user with permission to
change system resource limits.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:19 -08:00
Tao Ma 7909f2bf83 [PATCH 2/2] ocfs2: Implement group add for online resize
This patch adds the ability for a userspace program to request that a
properly formatted cluster group be added to the main allocation bitmap for
an Ocfs2 file system. The request is made via an ioctl, OCFS2_IOC_GROUP_ADD.
On a high level, this is similar to ext3, but we use a different ioctl as
the structure which has to be passed through is different.

During an online resize, tunefs.ocfs2 will format any new cluster groups
which must be added to complete the resize, and call OCFS2_IOC_GROUP_ADD on
each one. Kernel verifies that the core cluster group information is valid
and then does the work of linking it into the global allocation bitmap.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:04:24 -08:00
Tao Ma d659072f73 [PATCH 1/2] ocfs2: Add group extend for online resize
This patch adds the ability for a userspace program to request an extend of
last cluster group on an Ocfs2 file system. The request is made via ioctl,
OCFS2_IOC_GROUP_EXTEND. This is derived from EXT3_IOC_GROUP_EXTEND, but is
obviously Ocfs2 specific.

tunefs.ocfs2 would call this for an online-resize operation if the last
cluster group isn't full.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 14:53:35 -08:00
Tao Ma e9d578a8f2 ocfs2: Initalize bitmap_cpg of ocfs2_super to be the maximum.
This value is initialized from global_bitmap->id2.i_chain.cl_cpg. If there
is only 1 group, it will be equal to the total clusters in the volume. So
as for online resize, it should change for all the nodes in the cluster.
It isn't easy and there is no corresponding lock for it.

bitmap_cpg is only used in 2 areas:
1. Check whether the suballoc is too large for us to allocate from the global
   bitmap, so it is little used. And now the suballoc size is 2048, it rarely
   meet this situation and the check is almost useless.
2. Calculate which group a cluster belongs to. We use it during truncate to
   figure out which cluster group an extent belongs too. But we should be OK
   if we increase it though as the cluster group calculated shouldn't change
   and we only ever have a small bitmap_cpg on file systems with a single
   cluster group.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 14:48:54 -08:00
Mark Fasheh 628a24f5bd ocfs2: Readpages support
Add ->readpages support to Ocfs2. This is rather trivial - all it required
is a small update to ocfs2_get_block (for mapping full extents via b_size)
and an ocfs2_readpages() function which partially mirrors ocfs2_readpage().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 14:48:12 -08:00
Mark Fasheh e63aecb651 ocfs2: Rename ocfs2_meta_[un]lock
Call this the "inode_lock" now, since it covers both data and meta data.
This patch makes no functional changes.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 14:46:01 -08:00
Mark Fasheh c934a92d05 ocfs2: Remove data locks
The meta lock now covers both meta data and data, so this just removes the
now-redundant data lock.

Combining locks saves us a round of lock mastery per inode and one less lock
to ping between nodes during read/write.

We don't lose much - since meta locks were always held before a data lock
(and at the same level) ordered writeout mode (the default) ensured that
flushing for the meta data lock also pushed out data anyways.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 14:45:57 -08:00
Mark Fasheh f1f540688e ocfs2: Add data downconvert worker to inode lock
In order to extend inode lock coverage to inode data, we use the same data
downconvert worker with only a small modification to only do work for
regular files.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 14:45:54 -08:00
Mark Fasheh 34d024f843 ocfs2: Remove mount/unmount votes
The node maps that are set/unset by these votes are no longer relevant, thus
we can remove the mount and umount votes. Since those are the last two
remaining votes, we can also remove the entire vote infrastructure.

The vote thread has been renamed to the downconvert thread, and the small
amount of functionality related to managing it has been moved into
fs/ocfs2/dlmglue.c. All references to votes have been removed or updated.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 14:45:34 -08:00
Mark Fasheh 6f7b056ea9 ocfs2: Remove fs dependency on ocfs2_heartbeat module
Now that the dlm exposes domain information to us, we don't need generic
node up / node down callbacks. And since the DLM is only telling us when a
node goes down unexpectedly, we no longer need to optimize away node down
callbacks via the umount map.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 14:36:40 -08:00
Mark Fasheh 6561168cb4 ocfs2_dlm: Call node eviction callbacks from heartbeat handler
With this, a dlm client can take advantage of the group protocol in the dlm
to get full notification whenever a node within the dlm domain leaves
unexpectedly.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 14:36:40 -08:00
Greg Kroah-Hartman c60b717879 kset: convert ocfs2 to use kset_create
Dynamically create the kset instead of declaring it statically.

Also use the new kobj_attribute which cleans up this file a _lot_.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:23 -08:00
Greg Kroah-Hartman 3514faca19 kobject: remove struct kobj_type from struct kset
We don't need a "default" ktype for a kset.  We should set this
explicitly every time for each kset.  This change is needed so that we
can make ksets dynamic, and cleans up one of the odd, undocumented
assumption that the kset/kobject/ktype model has.

This patch is based on a lot of help from Kay Sievers.

Nasty bug in the block code was found by Dave Young
<hidave.darkstar@gmail.com>

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:10 -08:00
Mark Fasheh e8aed3450c ocfs2: Re-journal buffers after transaction extend
ocfs2_extend_trans() might call journal_restart() which will commit dirty
buffers and then restart the transaction. This means that any buffers which
still need changes should be passed to journal_access() again. Some paths
during extend weren't doing this right.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-12-17 10:51:23 -08:00
Mark Fasheh 0879c584ff ocfs2: Allow for debugging of transaction extends
The nastiest cases of transaction extends are also the rarest. We can expose
them more quickly at the expense of performance by going straight to the
journal_restart() in ocfs2_extend_trans(). Wrap things in OCFS2_DEBUG_FS so
that we only do this when "expensive debugging" is turned on.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-12-17 10:51:14 -08:00
Mark Fasheh 92295d8054 ocfs2: Don't panic when truncating an empty extent
This BUG_ON() was unintentionally left in after the sparse file support was
written.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-12-17 10:51:04 -08:00
Mark Fasheh a86370fbb6 ocfs2: fix exit-while-locked bug in ocfs2_queue_orphans()
We're holding the cluster lock when a failure might happen in
ocfs2_dir_foreach() so it needs to be released.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-12-17 10:49:43 -08:00
Al Viro 97bd7919e2 remove nonsense force-casts from ocfs2
endianness annotations in networking code had been in place for quite a
while; in particular, sin_port and s_addr are annotated as big-endian.

Code in ocfs2 had __force casts added apparently to shut the sparse
warnings up; of course, these days they only serve to *produce* warnings
for no reason whatsoever...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05 09:25:20 -08:00
Mark Fasheh b1967d0edd ocfs2: reverse inline-data truncate args
ocfs2_truncate() and ocfs2_remove_inode_range() had reversed their "set
i_size" arguments to ocfs2_truncate_inline(). Fix things so that truncate
sets i_size, and punching a hole ignores it.

This exposed a problem where punching a hole in an inline-data file wasn't
updating the page cache, so fix that too.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:03 -08:00
Mark Fasheh 0d8a4e0cd6 ocfs2: Fix comparison in ocfs2_size_fits_inline_data()
This was causing us to prematurely push out inline data by one byte.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:03 -08:00
Mark Fasheh bccb9dad89 ocfs2: Remove bug statement in ocfs2_dentry_iput()
The existing bug statement didn't take into account unhashed dentries which
might not have a cluster lock on them. This could happen if a node exporting
the file system via NFS is rebooted, re-exported to nfs clients and then
unmounted. It's fine in this case to not have a dentry cluster lock.

Just remove the bug statement and replace it with an error print, which
does the proper checks. Though we want to know if something has happened
which might have prevented a cluster lock from being created, it's
definitely not necessary to panic the machine for this.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:02 -08:00
Jan Kara 5a58c3ef22 [PATCH] ocfs2: Remove expensive bitmap scanning
Enable expensive bitmap scanning only if DEBUG option is enabled.
The bitmap scanning quite loads the CPU and on my machine the write
throughput of dd if=/dev/zero of=/ocfs2/file bs=1M count=500 conv=sync
improves from 37 MB/s to 45.4 MB/s in local mode...

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:02 -08:00
Mark Fasheh a46043e08f ocfs2: log valid inode # on bad inode
If the inode block isn't valid then we don't want to print the value from
that, instead print the block number which was passed in (which should
always be correct). Also, turn this into a debug print for now - folks who
hit an actual problem always have other logs indicating what the source is.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:02 -08:00
Mark Fasheh ef9f86ceb6 ocfs2: Filter -ENOSPC in mlog_errno()
It's almost never worth printing in that situation and we keep forgetting to
manually filter it out.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:01 -08:00
Joe Perches 2759236f84 [PATCH] fs/ocfs2: Add missing "space"
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:01 -08:00
Mark Fasheh e001e796e4 ocfs2: Reset journal parameters after s_mount_opt update
Right now we're just setting them from the existing parameters, not the
new ones that a remount specified.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:01 -08:00
Trond Myklebust 91cf45f02a [NET]: Add the helper kernel_sock_shutdown()
...and fix a couple of bugs in the NBD, CIFS and OCFS2 socket handlers.

Looking at the sock->op->shutdown() handlers, it looks as if all of them
take a SHUT_RD/SHUT_WR/SHUT_RDWR argument instead of the
RCV_SHUTDOWN/SEND_SHUTDOWN arguments.
Add a helper, and then define the SHUT_* enum to ensure that kernel users
of shutdown() don't get confused.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Mark Fasheh <mark.fasheh@oracle.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-12 18:10:39 -08:00
Srinivas Eeda e325a88f17 ocfs2: fix rename vs unlink race
If another node unlinks the destination while ocfs2_rename() is waiting on a
cluster lock, ocfs2_rename() simply logs an error and continues. This causes
a crash because the renaming node is now trying to delete a non-existent
inode. The correct solution is to return -ENOENT.

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-06 15:35:40 -08:00
Jan Kara bc7e97cbdd [PATCH] Fix possibly too long write in o2hb_setup_one_bio()
We should subtract start of our IO from PAGE_CACHE_SIZE to get the right
length of the write we want to perform.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-06 15:35:35 -08:00
Mark Fasheh 4e9563fd55 ocfs2: fix write() performance regression
On file systems which don't support sparse files, Ocfs2_map_page_blocks()
was reading blocks on appending writes. This caused write performance to
suffer dramatically. Fix this by detecting an appending write on a nonsparse
fs and skipping the read.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-06 15:35:29 -08:00
Mark Fasheh 9ea2d32f40 ocfs2: Commit journal on sync writes
We're missing a meta data commit for extending sync writes. In thoery, write
could return with the meta data required to read the data uncommitted to
disk. Fix that by detecting an allocating write and forcing a journal commit
in the sync case.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-06 15:32:00 -08:00
Mark Fasheh 9f70968af3 ocfs2: Re-order iput in ocfs2_drop_dentry_lock
Do this to avoid a theoretical (I haven't seen this in practice) race where
the downconvert thread might drop the dentry lock, allowing a remote unlink
to proceed before dropping the inode locks. This could bounce access to the
orphan dir between nodes.

There doesn't seem to be a need to do the same in ocfs2_dentry_iput() as
that's never called for the last ref drop from the downconvert thread.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-06 15:31:52 -08:00
Mark Fasheh 019d1b2247 ocfs2: Create locks at initially requested level
If we have not yet created a cluster lock, ocfs2_cluster_lock() will
first create it at NLMODE, and then convert the lock to either PRMODE or
EXMODE (whichever is requested).

Change ocfs2_cluster_lock() to just create the lock at the initially
requested level. ocfs2_locking_ast() handles this case fine, so the only
update required was in setup of locking state. This should reduce the number
of network messages required for a new lock by one, providing an incremental
performance enhancement.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-06 15:31:45 -08:00
Roel Kluin 3cf0c507dd [PATCH] Fix priority mistakes in fs/ocfs2/{alloc.c, dlmglue.c}
Fixes priority mistakes similar to '!x & y'

Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-06 15:31:39 -08:00
Adrian Bunk 0af4bd3887 [2.6 patch] make ocfs2_find_entry_el() static
ocfs2_find_entry_el() can become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-06 15:31:06 -08:00
Christoph Hellwig 3965516440 exportfs: make struct export_operations const
Now that nfsd has stopped writing to the find_exported_dentry member we an
mark the export_operations const

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: <linux-ext4@vger.kernel.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: David Chinner <dgc@sgi.com>
Cc: Timothy Shimmin <tes@sgi.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Chris Mason <mason@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22 08:13:21 -07:00
Christoph Hellwig 644f9ab3b0 ocfs2: new export ops
OCFS2 has it's own 64bit-firendly filehandle format so we can't use the
generic helpers here.  I'll add a struct for the types later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22 08:13:20 -07:00
Robert P. J. Day 3a4fa0a25d Fix misspellings of "system", "controller", "interrupt" and "necessary".
Fix the various misspellings of "system", controller", "interrupt" and
"[un]necessary".

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-19 23:10:43 +02:00
Pavel Emelyanov ba25f9dcc4 Use helpers to obtain task pid in printks
The task_struct->pid member is going to be deprecated, so start
using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in
the kernel.

The first thing to start with is the pid, printed to dmesg - in
this case we may safely use task_pid_nr(). Besides, printks produce
more (much more) than a half of all the explicit pid usage.

[akpm@linux-foundation.org: git-drm went and changed lots of stuff]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:43 -07:00
Mathieu Desnoyers 2b47c3611d Fix f_version type: should be u64 instead of unsigned long
Fix f_version type: should be u64 instead of long

There is a type inconsistency between struct inode i_version and struct file
f_version.

fs.h:

struct inode
  u64                     i_version;

and

struct file
  unsigned long           f_version;

Users do:

fs/ext3/dir.c:

if (filp->f_version != inode->i_version) {

So why isn't f_version a u64 ? It becomes a problem if versions gets
higher than 2^32 and we are on an architecture where longs are 32 bits.

This patch changes the f_version type to u64, and updates the users accordingly.

It applies to 2.6.23-rc2-mm2.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Martin Bligh <mbligh@google.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: <linux-ext4@vger.kernel.org>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:53 -07:00
Christoph Lameter 4ba9b9d0ba Slab API: remove useless ctor parameter and reorder parameters
Slab constructors currently have a flags parameter that is never used.  And
the order of the arguments is opposite to other slab functions.  The object
pointer is placed before the kmem_cache pointer.

Convert

        ctor(void *object, struct kmem_cache *s, unsigned long flags)

to

        ctor(struct kmem_cache *s, void *object)

throughout the kernel

[akpm@linux-foundation.org: coupla fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:45 -07:00
Peter Zijlstra e0bf68ddec mm: bdi init hooks
provide BDI constructor/destructor hooks

[akpm@linux-foundation.org: compile fix]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:45 -07:00
Nick Piggin b6af1bcd87 ocfs2: convert to new aops
Plug ocfs2 into the ->write_begin and ->write_end aops.

A bunch of custom code is now gone - the iovec iteration stuff during write
and the ocfs2 splice write actor.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:58 -07:00
Linus Torvalds efefc6eb38 Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (75 commits)
  PM: merge device power-management source files
  sysfs: add copyrights
  kobject: update the copyrights
  kset: add some kerneldoc to help describe what these strange things are
  Driver core: rename ktype_edd and ktype_efivar
  Driver core: rename ktype_driver
  Driver core: rename ktype_device
  Driver core: rename ktype_class
  driver core: remove subsystem_init()
  sysfs: move sysfs file poll implementation to sysfs_open_dirent
  sysfs: implement sysfs_open_dirent
  sysfs: move sysfs_dirent->s_children into sysfs_dirent->s_dir
  sysfs: make sysfs_root a regular directory dirent
  sysfs: open code sysfs_attach_dentry()
  sysfs: make s_elem an anonymous union
  sysfs: make bin attr open get active reference of parent too
  sysfs: kill unnecessary NULL pointer check in sysfs_release()
  sysfs: kill unnecessary sysfs_get() in open paths
  sysfs: reposition sysfs_dirent->s_mode.
  sysfs: kill sysfs_update_file()
  ...
2007-10-12 15:49:37 -07:00
Greg Kroah-Hartman 34980ca8fa Drivers: clean up direct setting of the name of a kset
A kset should not have its name set directly, so dynamically set the
name at runtime.

This is needed to remove the static array in the kobject structure which
will be changed in a future patch.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:02 -07:00
Mark Fasheh e7b3401960 ocfs2: Optionally return filldir errors
Modify ocfs2_dir_foreach_blk() to optionally return any error from the
filldir callback. This way ocfs2_dirforeach() can terminate early, as
opposed to always passing through the entire directory. This fixes a bug
introduced during a previous code refactor where ocfs2_empty_dir() would
loop infinitely.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:41 -07:00
Mark Fasheh 5b6a3a2b4a ocfs2: Write support for directories with inline data
Create all new directories with OCFS2_INLINE_DATA_FL and the inline data
bytes formatted as an empty directory. Inode size field reflects the actual
amount of inline data available, which makes searching for dirent space
very similar to the regular directory search.

Inline-data directories are automatically pushed out to extents on any
insert request which is too large for the available space.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:41 -07:00
Mark Fasheh 23193e513d ocfs2: Read support for directories with inline data
This splits out extent based directory read support and implements
inline-data versions of those functions. All knowledge of inline-data versus
extent based directories is internalized. For lookups the code uses
ocfs2_find_entry_id(), full dir iterations make use of
ocfs2_dir_foreach_blk_id().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:40 -07:00
Mark Fasheh 1afc32b952 ocfs2: Write support for inline data
This fixes up write, truncate, mmap, and RESVSP/UNRESVP to understand inline
inode data.

For the most part, the changes to the core write code can be relied on to do
the heavy lifting. Any code calling ocfs2_write_begin (including shared
writeable mmap) can count on it doing the right thing with respect to
growing inline data to an extent tree.

Size reducing truncates, including UNRESVP can simply zero that portion of
the inode block being removed. Size increasing truncatesm, including RESVP
have to be a little bit smarter and grow the inode to an extent tree if
necessary.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:40 -07:00
Mark Fasheh 6798d35a31 ocfs2: Read support for inline data
This hooks up ocfs2_readpage() to populate a page with data from an inode
block. Direct IO reads from inline data are modified to fall back to
buffered I/O. Appropriate checks are also placed in the extent map code to
avoid reading an extent list when inline data might be stored.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:39 -07:00
Mark Fasheh 15b1e36bdb ocfs2: Structure updates for inline data
Add the disk, network and memory structures needed to support data in inode.

Struct ocfs2_inline_data is defined and embedded in ocfs2_dinode for storing
inline data.

A new inode field, i_dyn_features, is added to facilitate tracking of
dynamic inode state. Since it will be used often, we want to mirror it on
ocfs2_inode_info, and transfer it via the meta data lvb.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:39 -07:00
Mark Fasheh 8553cf4f36 ocfs2: Cleanup dirent size check
The check to see if a new dirent would fit in an old one is pretty ugly, and
it's done at least twice. Clean things up by putting this in it's own
easier-to-read function.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:39 -07:00
Mark Fasheh 38760e2432 ocfs2: Rename cleanups
ocfs2_rename() does direct manipulation of the dirent it's gotten back from
a directory search. Wrap this manipulation inside of a function so that we
can transparently change directory update behavior in the future. As an
added bonus, this gets rid of an ugly macro.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:38 -07:00
Mark Fasheh be94d11704 ocfs2: Provide convenience function for ino lookup
A couple paths which needed to just match a parent dir + name pair to an
inode number were a bit messy because they had to deal with
ocfs2_find_files_on_disk() which returns a larger number of values. Provide
a convenience function, ocfs2_lookup_ino_from_name() which internalizes all
the extra accounting.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:38 -07:00
Mark Fasheh 0bfbbf62a8 ocfs2: Implement ocfs2_empty_dir() as a caller of ocfs2_dir_foreach()
We can preserve the behavior of ocfs2_empty_dir(), while getting rid of the
open coded directory walk by just providing a smart filldir callback. This
also automatically gets to use the dir readahead code, though in this case
any advantage is minor at best.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:37 -07:00
Mark Fasheh 5eae5b96fc ocfs2: Remove open coded readdir()
ocfs2_queue_orphans() has an open coded readdir loop which can easily just
use a directory accessor function.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:37 -07:00
Mark Fasheh 7e8536797d ocfs2: Pass raw u64 to filldir
filldir_t can take this, so don't turn de->inode into a 32 bit value. Right
now this doesn't make a difference since no ocfs2 inodes overflow that, but
it could be a nasty surprise later on if some kernel code is calling
ocfs2_dir_foreach_blk() and expecting real inode numbers back...

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:37 -07:00
Mark Fasheh b8bc5f4fde ocfs2: Abstract out core dir listing functionality
Put this in it's own function so that the functionality can be overridden.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:36 -07:00
Mark Fasheh 316f4b9f98 ocfs2: Move directory manipulation code into dir.c
The code for adding, removing, deleting directory entries was splattered all
over namei.c. I'd rather have this all centralized, so that it's easier to
make changes for inline dir data, and eventually indexed directories.

None of the code in any of the functions was changed. I only removed the
static keyword from some prototypes so that they could be exported.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:36 -07:00
Mark Fasheh 1d410a6e33 ocfs2: Small refactor of truncate zeroing code
We'll want to reuse most of this when pushing inline data back out to an
extent. Keeping this part as a seperate patch helps to keep the upcoming
changes for write support uncluttered.

The core portion of ocfs2_zero_cluster_pages() responsible for making sure a
page is mapped and properly dirtied is abstracted out into it's own
function, ocfs2_map_and_dirty_page(). Actual functionality doesn't change,
though zeroing becomes optional.

We also turn part of ocfs2_free_write_ctxt() into  a common function for
unlocking and freeing a page array. This operation is very common (and
uniform) for Ocfs2 cluster sizes greater than page size, so it makes sense
to keep the code in one place.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:35 -07:00
Mark Fasheh 65ed39d6ca ocfs2: move nonsparse hole-filling into ocfs2_write_begin()
By doing this, we can remove any higher level logic which has to have
knowledge of btree functionality - any callers of ocfs2_write_begin() can
now expect it to do anything necessary to prepare the inode for new data.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:35 -07:00
Mark Fasheh 92e91ce2a3 ocfs2: Sync ocfs2_fs.h with ocfs2-tools
ocfs2-tools added some on-disk fields and flags which are used by
tunefs.ocfs2.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:34 -07:00
Denis Cheng bddb8eb37f [PATCH] fs/ocfs2/: removed unneeded initial value and function's return value
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:34 -07:00
Sunil Mushran d550071c03 ocfs2: Implement show_options()
Implement sops->show_options() so as to allow /proc/mounts to show the mount
options.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:33 -07:00
Mark Fasheh 19b613d410 ocfs2: Clear slot map when umounting a local volume
This is technically harmless (recovery will clean it out later), but leaves
a bogus entry in the slot_map which really shouldn't be there.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:33 -07:00
Mark Fasheh 015452b15f ocfs2: Remove unused structure field
c_used_tail_recs in struct ocfs2_merge_ctxt is only ever set, so we can
remove it.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:32 -07:00
Tao Mao 518d7269f3 ocfs2: remove unused variable
delete_tail_recs in ocfs2_try_to_merge_extent() was only ever set, remove
it.

Signed-off-by: Tao Mao <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:32 -07:00
Tao Mao c77534f6fb ocfs2: remove mostly unused field from insert structure
ocfs2_insert_type->ins_free_records was only used in one place, and was set
incorrectly in most places. We can free up some memory and lose some code by
removing this.

* Small warning fixup contributed by Andrew Mortom <akpm@linux-foundation.org>

Signed-off-by: Tao Mao <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:32 -07:00
Al Viro 782e3b3b38 Fix up more bio fallout
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-12 00:29:50 -07:00
NeilBrown 6712ecf8f6 Drop 'size' argument from bio_endio and bi_end_io
As bi_end_io is only called once when the reqeust is complete,
the 'size' argument is now redundant.  Remove it.

Now there is no need for bio_endio to subtract the size completed
from bi_size.  So don't do that either.

While we are at it, change bi_end_io to return void.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-10 09:25:57 +02:00
Sunil Mushran bda0233b89 ocfs2: Unlock mutex in local alloc failure case
The fs was not unlocking the local alloc inode mutex in the code path in
which it failed to find a window of free bits in the global bitmap.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-03 11:14:45 -07:00
Sunil Mushran 813d974c53 ocfs2: Pack vote message and response structures
The ocfs2_vote_msg and ocfs2_response_msg structs needed to be
packed to ensure similar sizeofs in 32-bit and 64-bit arches. Without this,
we had inadvertantly broken 32/64 bit cross mounts.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-20 15:06:10 -07:00
Mark Fasheh 5c26a7b70f ocfs2: Don't double set write parameters
The target page offsets were being incorrectly set a second time in
ocfs2_prepare_page_for_write(), which was causing problems on a 16k page
size kernel. Additionally, ocfs2_write_failure() was incorrectly using those
parameters instead of the parameters for the individual page being cleaned
up.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-20 15:06:10 -07:00
Mark Fasheh db56246c69 ocfs2: Fix pos/len passed to ocfs2_write_cluster
This was broken for file systems whose cluster size is greater than page
size. Pos needs to be incremented as we loop through the descriptors, and
len needs to be capped to the size of a single cluster.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-20 15:06:09 -07:00
Mark Fasheh 415cb80037 ocfs2: Allow smaller allocations during large writes
The ocfs2 write code loops through a page much like the block code, except
that ocfs2 allocation units can be any size, including larger than page
size. Typically it's equal to or larger than page size - most kernels run 4k
pages, the minimum ocfs2 allocation (cluster) size.

Some changes introduced during 2.6.23 changed the way writes to pages are
handled, and inadvertantly broke support for > 4k page size. Instead of just
writing one cluster at a time, we now handle the whole page in one pass.

This means that multiple (small) seperate allocations might happen in the
same pass. The allocation code howver typically optimizes by getting the
maximum which was reserved. This triggered a BUG_ON in the extend code where
it'd ask for a single bit (for one part of a > 4k page) and get back more
than it asked for.

Fix this by providing a variant of the high level allocation function which
allows the caller to specify a maximum. The traditional function remains and
just calls the new one with a maximum determined from the initial
reservation.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-20 15:06:09 -07:00
Mark Fasheh e535e2efd2 ocfs2: Fix calculation of i_blocks during truncate
We were setting i_blocks too early - before truncating any allocation.
Correct things to set i_blocks after the allocation change.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-11 11:39:46 -07:00
tao.ma@oracle.com 30b8548f2c [PATCH] ocfs2: Fix a wrong cluster calculation.
In ocfs2_alloc_write_write_ctxt, the written clusters length is calculated
by the byte length only. This may cause some problems if we start to write
at some position in the end of one cluster and last to a second cluster
while the "len" is smaller than a cluster size. In that case, we have to
write 2 clusters actually.
So we have to take the start position into consideration also.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-11 11:39:05 -07:00
Tiger Yang c0123adef6 [PATCH] ocfs2: fix mount option parsing
For some mount option types, ocfs2_parse_options() will try to access
sb->s_fs_info to get at the ocfs2 private superblock. Unfortunately, that
hasn't been allocated yet and will cause a kernel crash.

Fix this by storing options in a struct which can then get pushed into the
ocfs2_super once it's been allocated later. If we need more options which
store to the ocfs2_super in the future, we can just fields to this struct.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-11 11:38:48 -07:00
Mark Fasheh e0dceaf0a4 ocfs2: set non-default s_time_gran during mount
We need to manually set this to '1' during mount, otherwise inode_setattr()
will chop off the nanosecond portion of our timestamps.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09 17:27:58 -07:00
Sunil Mushran ce17204ae6 ocfs2: Retry sendpage() if it returns EAGAIN
Instead of treating EAGAIN, returned from sendpage(), as an error, this
patch retries the operation.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09 17:27:38 -07:00
Sunil Mushran 480214d71f ocfs2: Fix rename/extend race
If one process is extending a file while another is renaming it, there
exists a window when rename could flush the old inode's stale i_size to
disk. This patch recognizes the fact that rename is only updating the old
inode's ctime, so it ensures only that value is flushed to disk.

Signed-off-by: Sunil Mushran <sunil.musran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09 17:27:10 -07:00
Adrian Bunk 6a18380e7d [2.6 patch] ocfs2_insert_extent(): remove dead code
This patch removes some now dead code.

Spotted by the Coverity checker.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09 17:26:03 -07:00
Mark Fasheh 5a25403175 ocfs2: Fix max offset calculations
ocfs2_max_file_offset() was over-estimating the largest file size for
several cases. This wasn't really a problem before, but now that we support
sparse files, it needs to be more accurate.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09 17:25:49 -07:00
Mark Fasheh ce76fd30ce ocfs2: check ia_size limits in setattr
We have to manually check the requested truncate size as the check in
vmtruncate() comes too late for Ocfs2.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09 17:25:38 -07:00
Mark Fasheh 7c08d70c69 ocfs2: Fix some casting errors related to file writes
ocfs2_align_clusters_to_page_index() needs to cast the clusters shift to
pgoff_t and ocfs2_file_buffered_write() needs loff_t when calculating
destination start for memcpy.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09 17:25:27 -07:00
Mark Fasheh a00cce356b ocfs2: use s_maxbytes directly in ocfs2_change_file_space()
There's no need to recalculate things via ocfs2_max_file_offset() as we've
already done that to fill s_maxbytes, so use that instead. We can also
un-export ocfs2_max_file_offset() then.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09 17:25:07 -07:00
Mark Fasheh c11e9fafb3 ocfs2: Restrict inode changes in ocfs2_update_inode_atime()
ocfs2_update_inode_atime() calls ocfs2_mark_inode_dirty() to push changes
from the struct inode into the ocfs2 disk inode. The problem is,
ocfs2_mark_inode_dirty() might change other fields, depending on what
happened to the struct inode. Since we don't always have locking to
serialize changes to other fields (like i_size, etc), just fix things up to
only touch the atime field.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09 17:23:50 -07:00
Jens Axboe 3836df6b52 ocfs2: bad kunmap_atomic()
kunmap_atomic() takes the virtual address, not the mapped page as
argument.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-24 16:02:55 -07:00
Nick Piggin 1833633803 fix some conversion overflows
Fix page index to offset conversion overflows in buffer layer, ecryptfs,
and ocfs2.

It would be nice to convert the whole tree to page_offset, but for now
just fix the bugs.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-20 08:44:19 -07:00
Paul Mundt 20c2df83d2 mm: Remove slab destructors from kmem_cache_create().
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-07-20 10:11:58 +09:00
Linus Torvalds f745bb1c73 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2: ->fallocate() support
2007-07-19 14:16:44 -07:00
Nick Piggin d0217ac04c mm: fault feedback #1
Change ->fault prototype.  We now return an int, which contains
VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte.
 FAULT_RET_ code tells the VM whether a page was found, whether it has been
locked, and potentially other things.  This is not quite the way he wanted
it yet, but that's changed in the next patch (which requires changes to
arch code).

This means we no longer set VM_CAN_INVALIDATE in the vma in order to say
that a page is locked which requires filemap_nopage to go away (because we
can no longer remain backward compatible without that flag), but we were
going to do that anyway.

struct fault_data is renamed to struct vm_fault as Linus asked. address
is now a void __user * that we should firmly encourage drivers not to use
without really good reason.

The page is now returned via a page pointer in the vm_fault struct.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:41 -07:00
Nick Piggin 54cb8821de mm: merge populate and nopage into fault (fixes nonlinear)
Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
the virtual address -> file offset differently from linear mappings.

->populate is a layering violation because the filesystem/pagecache code
should need to know anything about the virtual memory mapping.  The hitch here
is that the ->nopage handler didn't pass down enough information (ie.  pgoff).
 But it is more logical to pass pgoff rather than have the ->nopage function
calculate it itself anyway (because that's a similar layering violation).

Having the populate handler install the pte itself is likewise a nasty thing
to be doing.

This patch introduces a new fault handler that replaces ->nopage and
->populate and (later) ->nopfn.  Most of the old mechanism is still in place
so there is a lot of duplication and nice cleanups that can be removed if
everyone switches over.

The rationale for doing this in the first place is that nonlinear mappings are
subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
to duplicate the synchronisation logic rather than just consolidate the two.

After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
pagecache.  Seems like a fringe functionality anyway.

NOPAGE_REFAULT is removed.  This should be implemented with ->fault, and no
users have hit mainline yet.

[akpm@linux-foundation.org: cleanup]
[randy.dunlap@oracle.com: doc. fixes for readahead]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:41 -07:00
Nick Piggin d00806b183 mm: fix fault vs invalidate race for linear mappings
Fix the race between invalidate_inode_pages and do_no_page.

Andrea Arcangeli identified a subtle race between invalidation of pages from
pagecache with userspace mappings, and do_no_page.

The issue is that invalidation has to shoot down all mappings to the page,
before it can be discarded from the pagecache.  Between shooting down ptes to
a particular page, and actually dropping the struct page from the pagecache,
do_no_page from any process might fault on that page and establish a new
mapping to the page just before it gets discarded from the pagecache.

The most common case where such invalidation is used is in file truncation.
This case was catered for by doing a sort of open-coded seqlock between the
file's i_size, and its truncate_count.

Truncation will decrease i_size, then increment truncate_count before
unmapping userspace pages; do_no_page will read truncate_count, then find the
page if it is within i_size, and then check truncate_count under the page
table lock and back out and retry if it had subsequently been changed (ptl
will serialise against unmapping, and ensure a potentially updated
truncate_count is actually visible).

Complexity and documentation issues aside, the locking protocol fails in the
case where we would like to invalidate pagecache inside i_size.  do_no_page
can come in anytime and filemap_nopage is not aware of the invalidation in
progress (as it is when it is outside i_size).  The end result is that
dangling (->mapping == NULL) pages that appear to be from a particular file
may be mapped into userspace with nonsense data.  Valid mappings to the same
place will see a different page.

Andrea implemented two working fixes, one using a real seqlock, another using
a page->flags bit.  He also proposed using the page lock in do_no_page, but
that was initially considered too heavyweight.  However, it is not a global or
per-file lock, and the page cacheline is modified in do_no_page to increment
_count and _mapcount anyway, so a further modification should not be a large
performance hit.  Scalability is not an issue.

This patch implements this latter approach.  ->nopage implementations return
with the page locked if it is possible for their underlying file to be
invalidated (in that case, they must set a special vm_flags bit to indicate
so).  do_no_page only unlocks the page after setting up the mapping
completely.  invalidation is excluded because it holds the page lock during
invalidation of each page (and ensures that the page is not mapped while
holding the lock).

This also allows significant simplifications in do_no_page, because we have
the page locked in the right place in the pagecache from the start.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:41 -07:00
Mark Fasheh 385820a38d ocfs2: ->fallocate() support
Plug ocfs2 into the ->fallocate() callback. This just re-uses the existing
preallocation code.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-19 00:23:55 -07:00
Jeremy Fitzhardinge 86313c488a usermodehelper: Tidy up waiting
Rather than using a tri-state integer for the wait flag in
call_usermodehelper_exec, define a proper enum, and use that.  I've
preserved the integer values so that any callers I've missed should
still work OK.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: David Howells <dhowells@redhat.com>
2007-07-18 08:47:40 -07:00
Linus Torvalds b8c638acac Merge branch 'uninit-var' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6
* 'uninit-var' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6:
  arch/i386/* fs/* ipc/*: mark variables with uninitialized_var()
  drivers/*: mark variables with uninitialized_var()
2007-07-17 15:19:06 -07:00
Jeff Garzik 8e1c091ccc arch/i386/* fs/* ipc/*: mark variables with uninitialized_var()
Mark variables with uninitialized_var() if such a warning appears,
and analysis proves that the var is initialized properly on all paths
it is used.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-17 16:23:19 -04:00
Satyam Sharma 3bd858ab1c Introduce is_owner_or_cap() to wrap CAP_FOWNER use with fsuid check
Introduce is_owner_or_cap() macro in fs.h, and convert over relevant
users to it. This is done because we want to avoid bugs in the future
where we check for only effective fsuid of the current task against a
file's owning uid, without simultaneously checking for CAP_FOWNER as
well, thus violating its semantics.
[ XFS uses special macros and structures, and in general looked ...
untouchable, so we leave it alone -- but it has been looked over. ]

The (current->fsuid != inode->i_uid) check in generic_permission() and
exec_permission_lite() is left alone, because those operations are
covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH. Similarly operations
falling under the purview of CAP_CHOWN and CAP_LEASE are also left alone.

Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in>
Cc: Al Viro <viro@ftp.linux.org.uk>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 12:00:03 -07:00
Christoph Hellwig a569425512 knfsd: exportfs: add exportfs.h header
currently the export_operation structure and helpers related to it are in
fs.h.  fs.h is already far too large and there are very few places needing the
export bits, so split them off into a separate header.

[akpm@linux-foundation.org: fix cifs build]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:06 -07:00
Linus Torvalds add096909d Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (32 commits)
  [PATCH] ocfs2: zero_user_page conversion
  ocfs2: Support xfs style space reservation ioctls
  ocfs2: support for removing file regions
  ocfs2: update truncate handling of partial clusters
  ocfs2: btree support for removal of arbirtrary extents
  ocfs2: Support creation of unwritten extents
  ocfs2: support writing of unwritten extents
  ocfs2: small cleanup of ocfs2_write_begin_nolock()
  ocfs2: btree changes for unwritten extents
  ocfs2: abstract btree growing calls
  ocfs2: use all extent block suballocators
  ocfs2: plug truncate into cached dealloc routines
  ocfs2: simplify deallocation locking
  ocfs2: harden buffer check during mapping of page blocks
  ocfs2: shared writeable mmap
  ocfs2: factor out write aops into nolock variants
  ocfs2: rework ocfs2_buffered_write_cluster()
  ocfs2: take ip_alloc_sem during entire truncate
  ocfs2: Add "preferred slot" mount option
  [KJ PATCH] Replacing memset(<addr>,0,PAGE_SIZE) with clear_page() in fs/ocfs2/dlm/dlmrecovery.c
  ...
2007-07-16 10:52:55 -07:00
Tejun Heo 7b595756ec sysfs: kill unnecessary attribute->owner
sysfs is now completely out of driver/module lifetime game.  After
deletion, a sysfs node doesn't access anything outside sysfs proper,
so there's no reason to hold onto the attribute owners.  Note that
often the wrong modules were accounted for as owners leading to
accessing removed modules.

This patch kills now unnecessary attribute->owner.  Note that with
this change, userland holding a sysfs node does not prevent the
backing module from being unloaded.

For more info regarding lifetime rule cleanup, please read the
following message.

  http://article.gmane.org/gmane.linux.kernel/510293

(tweaked by Greg to not delete the field just yet, to make it easier to
merge things properly.)

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:06 -07:00
Eric Sandeen 54c57dc3b6 [PATCH] ocfs2: zero_user_page conversion
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:32:10 -07:00
Mark Fasheh b25801038d ocfs2: Support xfs style space reservation ioctls
We re-use the RESVSP/UNRESVSP ioctls from xfs which allow the user to
allocate and deallocate regions to a file without zeroing data or changing
i_size.

Though renamed, the structure passed in from user is identical to struct
xfs_flock64. The three fields that are actually used right now are l_whence,
l_start and l_len.

This should get ocfs2 immediate compatibility with userspace software using
the pre-existing xfs ioctls.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:32:09 -07:00
Mark Fasheh 063c4561f5 ocfs2: support for removing file regions
Provide an internal interface for the removal of arbitrary file regions.

ocfs2_remove_inode_range() takes a byte range within a file and will remove
existing extents within that range. Partial clusters will be zeroed so that
any read from within the region will return zeros.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:32:08 -07:00
Mark Fasheh 35edec1d52 ocfs2: update truncate handling of partial clusters
The partial cluster zeroing code used during truncate usually assumes that
the rightmost byte in the range to be zeroed lies on a cluster boundary.
This makes sense for truncate, but punching holes might require zeroing on
non-aligned rightmost boundaries.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:32:07 -07:00
Mark Fasheh d0c7d7082e ocfs2: btree support for removal of arbirtrary extents
Add code to the btree paths to support the removal of arbitrary regions
within an existing extent. With proper higher level support this can be used
to "punch holes" in a file. Truncate (a special case of hole punching) could
also be converted to use these methods.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:32:05 -07:00
Mark Fasheh 2ae99a6037 ocfs2: Support creation of unwritten extents
This can now be trivially supported with re-use of our existing extend code.

ocfs2_allocate_unwritten_extents() takes a start offset and a byte length
and iterates over the inode, adding extents (marked as unwritten) until len
is reached. Existing extents are skipped over.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:32:04 -07:00
Mark Fasheh b27b7cbcf1 ocfs2: support writing of unwritten extents
Update the write code to detect when the user is asking to write to an
unwritten extent. Like writing to a hole, we must zero the region between
the write and the cluster boundaries. Most of the existing cluster zeroing
logic can be re-used with some additional checks for the unwritten flag on
extent records.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:32:03 -07:00
Mark Fasheh 0d172baa55 ocfs2: small cleanup of ocfs2_write_begin_nolock()
We can easily seperate out the write descriptor setup and manipulation
into helper functions.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:32:01 -07:00
Mark Fasheh 328d5752e1 ocfs2: btree changes for unwritten extents
Writes to a region marked as unwritten might result in a record split or
merge. We can support splits by making minor changes to the existing insert
code. Merges require left rotations which mostly re-use right rotation
support functions.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:32:00 -07:00
Mark Fasheh c3afcbb344 ocfs2: abstract btree growing calls
The top level calls and logic for growing a tree can easily be abstracted
out of ocfs2_insert_extent() into a seperate function - ocfs2_grow_tree().

This allows future code to easily grow btrees when needed.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:31:58 -07:00
Mark Fasheh 1f6697d072 ocfs2: use all extent block suballocators
Now that we have a method to deallocate blocks from them, each node should
allocate extent blocks from their local suballocator file.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:31:56 -07:00
Mark Fasheh 59a5e416d1 ocfs2: plug truncate into cached dealloc routines
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:31:55 -07:00
Mark Fasheh 2b604351bc ocfs2: simplify deallocation locking
Deallocation of suballocator blocks, most notably extent blocks, might
involve multiple suballocator inodes.

The locking for this can get extremely complicated, especially when the
suballocator inodes to delete from aren't known until deep within an
unrelated codepath.

Implement a simple scheme for recording the blocks to be unlinked so that
the actual deallocation can be done in a context which won't deadlock.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:31:54 -07:00
Mark Fasheh bce997682f ocfs2: harden buffer check during mapping of page blocks
We don't want to submit buffer_new blocks for read i/o. This actually won't
happen right now because those requests during an allocating write are all nicely
aligned. It's probably a good idea to provide an explicit check though.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:31:52 -07:00
Mark Fasheh 7307de8051 ocfs2: shared writeable mmap
Implement cluster consistent shared writeable mappings using the
->page_mkwrite() callback.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:31:51 -07:00
Mark Fasheh 607d44aa3f ocfs2: factor out write aops into nolock variants
ocfs2_mkwrite() will want this so that it can add some mmap specific checks
before asking for a write.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:31:49 -07:00
Mark Fasheh 3a307ffc27 ocfs2: rework ocfs2_buffered_write_cluster()
Use some ideas from the new-aops patch series and turn
ocfs2_buffered_write_cluster() into a 2 stage operation with the caller
copying data in between. The code now understands multiple cluster writes as
a result of having to deal with a full page write for greater than 4k pages.

This sets us up to easily call into the write path during ->page_mkwrite().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:31:46 -07:00
Mark Fasheh 2e89b2e48e ocfs2: take ip_alloc_sem during entire truncate
Use of the alloc sem during truncate was too narrow - we want to protect
the i_size change and page truncation against mmap now.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:19:57 -07:00
Sunil Mushran baf4661a82 ocfs2: Add "preferred slot" mount option
ocfs2 will attempt to assign the node the slot# provided in the mount
option. Failure to assign the preferred slot is not an error. This small
feature can be useful for automated testing.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:19:54 -07:00
Shani Moideen 5fb0f7f010 [KJ PATCH] Replacing memset(<addr>,0,PAGE_SIZE) with clear_page() in fs/ocfs2/dlm/dlmrecovery.c
Replacing memset(<addr>,0,PAGE_SIZE) with clear_page() in
fs/ocfs2/dlm/dlmrecovery.c

Signed-off-by: Shani Moideen <shani.moideen@wipro.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:19:52 -07:00
Christoph Hellwig 800deef3f6 [PATCH] ocfs2: use list_for_each_entry where benefical
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:19:49 -07:00
Joel Becker e6df3a663a ocfs2: Wake up a starting region if it gets killed in the background.
Tell o2cb_region_dev_write() to wake up if rmdir(2) happens on the
heartbeat region while it is starting up.  Then o2hb_region_dev_write()
can check to see if it is alive and act accordingly.  This prevents a hang
(not being woken) and a crash (if it's woken by a signal).

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:19:46 -07:00
Joel Becker 16c6a4f24d ocfs2: live heartbeat depends on the local node configuration
Removing the local node configuration out from underneath a running
heartbeat is "bad".  Provide an API in the ocfs2 nodemanager to request
a configfs dependancy on the local node, then use it in heartbeat.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:19:43 -07:00
Joel Becker 14829422be ocfs2: Depend on configfs heartbeat items.
ocfs2 mounts require a heartbeat region.  Use the new configfs_depend_item()
facility to actually depend on them so they can't go away from under us.

First, teach cluster/nodemanager.c to depend an item on the o2cb subsystem.
Then teach o2hb_register_callbacks to take a UUID and depend on the
appropriate region.  Finally, teach all users of o2hb to pass a UUID or
NULL if they don't require a pin.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:19:40 -07:00
Joel Becker e6bd07aee7 configfs: Convert subsystem semaphore to mutex
Convert the su_sem member of struct configfs_subsystem to a struct
mutex, as that's what it is. Also convert all the users and update
Documentation/configfs.txt and Documentation/configfs_example.c
accordingly.

[ Conflict in fs/dlm/config.c with commit
  3168b0780d manually resolved. --Mark ]

Inspired-by: Satyam Sharma <ssatyam@cse.iitk.ac.in>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:10:56 -07:00
Jens Axboe cac36bb06e pipe: change the ->pin() operation to ->confirm()
The name 'pin' was badly chosen, it doesn't pin a pipe buffer
in the most commonly used sense in the kernel. So change the
name to 'confirm', after debating this issue with Hugh
Dickins a bit.

A good return from ->confirm() means that the buffer is really
there, and that the contents are good.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:15 +02:00
Jens Axboe d6b29d7cee splice: divorce the splice structure/function definitions from the pipe header
We need to move even more stuff into the header so that folks can use
the splice_to_pipe() implementation instead of open-coding a lot of
pipe knowledge (see relay implementation), so move to our own header
file finally.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:14 +02:00
Jens Axboe 5ffc4ef45b sendfile: remove .sendfile from filesystems that use generic_file_sendfile()
They can use generic_file_splice_read() instead. Since sys_sendfile() now
prefers that, there should be no change in behaviour.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:13 +02:00
Jens Axboe 6a14b90bb6 vmsplice: add vmsplice-to-user support
A bit of a cheat, it actually just copies the data to userspace. But
this makes the interface nice and symmetric and enables people to build
on splice, with room for future improvement in performance.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:12 +02:00
Jens Axboe c66ab6fa70 splice: abstract out actor data
For direct splicing (or private splicing), the output may not be a file.
So abstract out the handling into a specified actor function and put
the data in the splice_desc structure earlier, so we can build on top
of that.

This is the first step in better splice handling for drivers, and also
for implementing vmsplice _to_ user memory.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:12 +02:00
Mark Fasheh eeb47d1234 ocfs2: Fix invalid assertion during write on 64k pages
The write path code intends to bug if a math error (or unhandled case)
results in a write outside of the current cluster boundaries. The actual
BUG_ON() statements however are incorrect, leading to a crash on kernels
with 64k page size. Fix those by checking against the right variables.

Also, move the assertions higher up within the functions so that they trip
*before* the code starts to mark buffers.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-06-06 16:42:03 -07:00
Tiger Yang 59be7dc97b ocfs2: Fix masklog breakage
Some of the sysfs changes inadvertantly broke the simple runtime debug log
filtering employed in ocfs2. Fix this by properly exporting the masklog
category filter names.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-06-06 16:41:08 -07:00
Christoph Hellwig d9b08b9efe [PATCH] ocfs2: use generic_segment_checks
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-25 11:06:37 -07:00
Mark Fasheh 8fccfc829a ocfs2: fix inode leak
We weren't cleaning up our inode reference on error in
ocfs2_reserve_local_alloc_bits(). Add a check for error return and iput() if
need be. Move the code to set the alloc context inode info to the end of the
function so we don't have any possibility of passing back a bad pointer.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-25 11:00:46 -07:00
Nate Diller 5c3c6bb770 [PATCH] ocfs2: use zero_user_page
Use zero_user_page() instead of open-coding it.

Signed-off-by: Nate Diller <nate.diller@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-25 11:00:39 -07:00
Mark Fasheh 1024c902ab ocfs2: unmap_mapping_range() in ocfs2_truncate()
We weren't calling this before, but since ocfs2 handles the entire truncate
operation, we should.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-25 11:00:31 -07:00
Mark Fasheh e9dfc0b2bc ocfs2: trylock in ocfs2_readpage()
Similarly to the page lock / cluster lock inversion in ocfs2_readpage, we
can deadlock on ip_alloc_sem. We can down_read_trylock() instead and just
return AOP_TRUNCATED_PAGE if the operation fails.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-25 11:00:23 -07:00
Christoph Lameter a35afb830f Remove SLAB_CTOR_CONSTRUCTOR
SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@ucw.cz>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-17 05:23:04 -07:00
Randy Dunlap c4a7f5eb5f ocfs2: kobject/kset foobar
Fix gcc warning and Oops that it causes:

fs/ocfs2/cluster/masklog.c:161: warning: assignment from incompatible pointer type
[ 2776.204120] OCFS2 Node Manager 1.3.3
[ 2776.211729] BUG: spinlock bad magic on CPU#0, modprobe/4424
[ 2776.214269]  lock: ffff810021c8fe18, .magic: ffffffff, .owner: /6394416, .owner_cpu: 0
[ 2776.217864] [ 2776.217865] Call Trace:
[ 2776.219662]  [<ffffffff803426c8>] spin_bug+0x9e/0xe9
[ 2776.221921]  [<ffffffff803427bf>] _raw_spin_lock+0x23/0xf9
[ 2776.224417]  [<ffffffff8051acf4>] _spin_lock+0x9/0xb
[ 2776.226676]  [<ffffffff8033c3b1>] kobject_shadow_add+0x98/0x1ac
[ 2776.229367]  [<ffffffff8033c4d0>] kobject_add+0xb/0xd
[ 2776.231665]  [<ffffffff8033c4df>] kset_add+0xd/0xf
[ 2776.233845]  [<ffffffff8033c5a6>] kset_register+0x23/0x28
[ 2776.236309]  [<ffffffff8808ccb7>] :ocfs2_nodemanager:mlog_sys_init+0x68/0x6d
[ 2776.239518]  [<ffffffff8808ccee>] :ocfs2_nodemanager:o2cb_sys_init+0x32/0x4a
[ 2776.242726]  [<ffffffff880b80a6>] :ocfs2_nodemanager:init_o2nm+0xa6/0xd5
[ 2776.245772]  [<ffffffff8025266c>] sys_init_module+0x1471/0x15d2
[ 2776.248465]  [<ffffffff8033f250>] simple_strtoull+0x0/0xdc
[ 2776.250959]  [<ffffffff8020948e>] system_call+0x7e/0x83

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-10 09:26:52 -07:00
Randy Dunlap e63340ae6b header cleaning: don't include smp_lock.h when not used
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:07 -07:00
Christoph Lameter 50953fe9e0 slab allocators: Remove SLAB_DEBUG_INITIAL flag
I have never seen a use of SLAB_DEBUG_INITIAL.  It is only supported by
SLAB.

I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again?  The callback is
performed before each freeing of an object.

I would think that it is much easier to check the object state manually
before the free.  That also places the check near the code object
manipulation of the object.

Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on.  If there would be code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG otherwise it would just be dead code.  But there is no such code
in the kernel.  I think SLUB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand and there are easier ways to accomplish the
same effect (i.e.  add debug code before kfree).

There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
clear in fs inode caches.  Remove the pointless checks (they would even be
pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.

This is the last slab flag that SLUB did not support.  Remove the check for
unimplemented flags from SLUB.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:57 -07:00
Nick Piggin 6fe6900e1e mm: make read_cache_page synchronous
Ensure pages are uptodate after returning from read_cache_page, which allows
us to cut out most of the filesystem-internal PageUptodate calls.

I didn't have a great look down the call chains, but this appears to fixes 7
possible use-before uptodate in hfs, 2 in hfsplus, 1 in jfs, a few in
ecryptfs, 1 in jffs2, and a possible cleared data overwritten with readpage in
block2mtd.  All depending on whether the filler is async and/or can return
with a !uptodate page.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:51 -07:00
Linus Torvalds fa24aa561a Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2: Force use of GFP_NOFS in ocfs2_write()
  ocfs2: fix sparse warnings in fs/ocfs2/cluster
  ocfs2: fix sparse warnings in fs/ocfs2/dlm
  ocfs2: fix sparse warnings in fs/ocfs2
  [PATCH] Copy i_flags to ocfs2 inode flags on write
  [PATCH] ocfs2: use __set_current_state()
  ocfs2: Wrap access of directory allocations with ip_alloc_sem.
  [PATCH] fs/ocfs2/: make 3 functions static
  ocfs2: Implement compat_ioctl()
2007-05-04 20:44:54 -07:00
Greg Kroah-Hartman 823bccfc40 remove "struct subsystem" as it is no longer needed
We need to work on cleaning up the relationship between kobjects, ksets and
ktypes.  The removal of 'struct subsystem' is the first step of this,
especially as it is not really needed at all.

Thanks to Kay for fixing the bugs in this patch.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 18:57:59 -07:00
Mark Fasheh 9315f130e1 ocfs2: Force use of GFP_NOFS in ocfs2_write()
We can otherwise recurse into the file system.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02 15:08:34 -07:00
Mark Fasheh 5fdf1e6771 ocfs2: fix sparse warnings in fs/ocfs2/cluster
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02 15:08:23 -07:00
Mark Fasheh a7d25539fd ocfs2: fix sparse warnings in fs/ocfs2/dlm
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02 15:08:15 -07:00
Mark Fasheh 1ca1a111b1 ocfs2: fix sparse warnings in fs/ocfs2
None of these are actually harmful, but the noise makes looking for real
problems difficult.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02 15:08:08 -07:00
Jan Kara 6e4b0d5692 [PATCH] Copy i_flags to ocfs2 inode flags on write
Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into
ocfs2-specific ip_attr. Hence, when someone sets these flags via a different
interface than ioctl, they are stored correctly.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02 15:07:58 -07:00
Milind Arun Choudhary 5c2c9d383e [PATCH] ocfs2: use __set_current_state()
use __set_current_state(TASK_*) instead of current->state = TASK_*, in
fs/ocfs2

Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02 15:07:50 -07:00
Joel Becker ee19a77956 ocfs2: Wrap access of directory allocations with ip_alloc_sem.
OCFS2_I(inode)->ip_alloc_sem is a read-write semaphore protecting
local concurrent access of ocfs2 inodes.  However, ocfs2 directories were
not taking the semaphore while they accessed or modified the allocation
tree.

ocfs2_extend_dir() needs to take the semaphore in a write mode when it
adds to the allocation.  All other directory users get there via
ocfs2_bread(), which takes the semaphore in read mode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02 15:07:42 -07:00
Adrian Bunk 6cb129f567 [PATCH] fs/ocfs2/: make 3 functions static
This patch makes the following needlessly global functions static:
- aops.c: ocfs2_write_data_page()
- dlmglue.c: ocfs2_dump_meta_lvb_info()
- file.c: ocfs2_set_inode_size()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02 15:07:27 -07:00
Mark Fasheh 586d232b19 ocfs2: Implement compat_ioctl()
We need this to support 32 bit system calls on 64 bit kernels.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02 15:07:16 -07:00
Mark Fasheh 8341897882 ocfs2: Cache extent records
The extent map code was ripped out earlier because of an inability to deal
with holes. This patch adds back a simpler caching scheme requiring far less
code.

Our old extent map caching was designed back when meta data block caching in
Ocfs2 didn't work very well, resulting in many disk reads. These days our
metadata caching is much better, resulting in no un-necessary disk reads. As
a result, extent caching doesn't have to be as fancy, nor does it have to
cache as many extents. Keeping the last 3 extents seen should be sufficient
to give us a small performance boost on some streaming workloads.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:10:40 -07:00
Mark Fasheh 7cdfc3a1c3 ocfs2: Remember rw lock level during direct io
Cluster locking might have been redone because a direct write won't
complete, so this needs to be reflected in the iocb.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:07:45 -07:00
Mark Fasheh 8110b073a9 ocfs2: Fix up i_blocks calculation to know about holes
Older file systems which didn't support holes did a dumb calculation of
i_blocks based on i_size. This is no longer accurate, so fix things up to
take actual allocation into account.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:07:40 -07:00
Mark Fasheh 4f902c3772 ocfs2: Fix extent lookup to return true size of holes
Initially, we had wired things to return a size '1' of holes. Cook up a
small amount of code to find the next extent and calculate the number of
clusters between the virtual offset and the next allocated extent.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:02:45 -07:00
Mark Fasheh 49cb8d2d49 ocfs2: Read from an unwritten extent returns zeros
Return an optional extent flags field from our lookup functions and wire up
callers to treat unwritten regions as holes for the purpose of returning
zeros to the user.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:02:41 -07:00
Mark Fasheh e48edee2d8 ocfs2: make room for unwritten extents flag
Due to the size of our group bitmaps, we'll never have a leaf node extent
record with more than 16 bits worth of clusters. Split e_clusters up so that
leaf nodes can get a flags field where we can mark unwritten extents.
Interior nodes whose length references all the child nodes beneath it can't
split their e_clusters field, so we use a union to preserve sizing there.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:02:37 -07:00
Mark Fasheh 6af67d8205 ocfs2: Use own splice write actor
We need to fill holes during a splice write. Provide our own splice write
actor which can call ocfs2_file_buffered_write() with a splice-specific
callback.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:02:34 -07:00
Mark Fasheh fa41045fcb ocfs2: Use do_sync_mapping_range() in ocfs2_zero_tail_for_truncate()
Do this instead of filemap_fdatawrite() - this way we sync only the
range between i_size and the cluster boundary.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:02:30 -07:00
Mark Fasheh 60b11392f1 ocfs2: zero tail of sparse files on truncate
Since we don't zero on extend anymore, truncate needs to be fixed up to zero
the part of a file between i_size and and end of it's cluster. Otherwise a
subsequent extend could expose bad data.

This introduced a new helper, which can be used in ocfs2_write().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:02:20 -07:00
Mark Fasheh 25baf2da14 ocfs2: Teach ocfs2_get_block() about holes
ocfs2_get_block() didn't understand sparse files, fix that. Also remove some
code that isn't really useful anymore. We can fix up
ocfs2_direct_IO_get_blocks() at the same time.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:02:16 -07:00
Mark Fasheh 5069120b72 ocfs2: remove ocfs2_prepare_write() and ocfs2_commit_write()
These are no longer used, and can't handle file systems with sparse file
allocation.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:02:12 -07:00
Mark Fasheh 9517bac6cc ocfs2: teach ocfs2_file_aio_write() about sparse files
Unfortunately, ocfs2 can no longer make use of generic_file_aio_write_nlock()
because allocating writes will require zeroing of pages adjacent to the I/O
for cluster sizes greater than page size.

Implement a custom file write here, which can order page locks for zeroing.
This also has the advantage that cluster locks can easily be ordered outside
of the page locks.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:02:08 -07:00
Mark Fasheh 89488984ac ocfs2: Turn off shared writeable mmap for local files systems with holes.
This will be turned back on once we can do allocation in ->page_mkwrite().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:02:01 -07:00
Mark Fasheh abf8b15694 ocfs2: abstract out allocation locking
Right now, file allocation for ocfs2 is done within ocfs2_extend_file(),
which is either called from ->setattr() (for an i_size change), or at the
top of ocfs2_file_aio_write().

Inodes on file systems with sparse file support will want to do their
allocation during the actual write call.

In either case the cluster locking decisions are the same. We abstract out
that code into a new function, ocfs2_lock_allocators() which will be used by
a later patch to enable writing to sparse files.

This also provides a nice cleanup of ocfs2_extend_allocation().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:01:58 -07:00