Commit Graph

2731 Commits (454e2398be9b9fa30433fccc548db34d19aa9958)

Author SHA1 Message Date
Jens Axboe 7afa6fd037 [PATCH] vmsplice: allow user to pass in gift pages
If SPLICE_F_GIFT is set, the user is basically giving this pages away to
the kernel. That means we can steal them for eg page cache uses instead
of copying it.

The data must be properly page aligned and also a multiple of the page size
in length.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-01 20:02:33 +02:00
Jens Axboe f6762b7ad8 [PATCH] pipe: enable atomic copying of pipe data to/from user space
The pipe ->map() method uses kmap() to virtually map the pages, which
is both slow and has known scalability issues on SMP. This patch enables
atomic copying of pipe pages, by pre-faulting data and using kmap_atomic()
instead.

lmbench bw_pipe and lat_pipe measurements agree this is a Good Thing. Here
are results from that on a UP machine with highmem (1.5GiB of RAM), running
first a UP kernel, SMP kernel, and SMP kernel patched.

Vanilla-UP:
Pipe bandwidth: 1622.28 MB/sec
Pipe bandwidth: 1610.59 MB/sec
Pipe bandwidth: 1608.30 MB/sec
Pipe latency: 7.3275 microseconds
Pipe latency: 7.2995 microseconds
Pipe latency: 7.3097 microseconds

Vanilla-SMP:
Pipe bandwidth: 1382.19 MB/sec
Pipe bandwidth: 1317.27 MB/sec
Pipe bandwidth: 1355.61 MB/sec
Pipe latency: 9.6402 microseconds
Pipe latency: 9.6696 microseconds
Pipe latency: 9.6153 microseconds

Patched-SMP:
Pipe bandwidth: 1578.70 MB/sec
Pipe bandwidth: 1579.95 MB/sec
Pipe bandwidth: 1578.63 MB/sec
Pipe latency: 9.1654 microseconds
Pipe latency: 9.2266 microseconds
Pipe latency: 9.1527 microseconds

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-01 20:02:05 +02:00
Jens Axboe e27dedd84c [PATCH] splice: call handle_ra_miss() on failure to lookup page
Notify the readahead logic of the missing page. Suggested by
Oleg Nesterov.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-01 19:59:54 +02:00
Jens Axboe 7f9c51f0d9 [PATCH] Add ->splice_read/splice_write to def_blk_fops
It can use the generic handlers.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-01 19:59:32 +02:00
Jens Axboe f84d751994 [PATCH] pipe: introduce ->pin() buffer operation
The ->map() function is really expensive on highmem machines right now,
since it has to use the slower kmap() instead of kmap_atomic(). Splice
rarely needs to access the virtual address of a page, so it's a waste
of time doing it.

Introduce ->pin() to take over the responsibility of making sure the
page data is valid. ->map() is then reduced to just kmap(). That way we
can also share a most of the pipe buffer ops between pipe.c and splice.c

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-01 19:59:03 +02:00
Jens Axboe 0568b409c7 [PATCH] splice: fix bugs in pipe_to_file()
Found by Oleg Nesterov <oleg@tv-sign.ru>, fixed by me.

- Only allow full pages to go to the page cache.
- Check page != buf->page instead of using PIPE_BUF_FLAG_STOLEN.
- Remember to clear 'stolen' if add_to_page_cache() fails.

And as a cleanup on that:

- Make the bottom fall-through logic a little less convoluted. Also make
  the steal path hold an extra reference to the page, so we don't have
  to differentiate between stolen and non-stolen at the end.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-01 19:50:48 +02:00
Jens Axboe 46e678c96b [PATCH] splice: fix bugs with stealing regular pipe pages
- Check that page has suitable count for stealing in the regular pipes.
- pipe_to_file() assumes that the page is locked on succesful steal, so
  do that in the pipe steal hook
- Missing unlock_page() in add_to_page_cache() failure.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-30 16:36:32 +02:00
Andreas Schwab 2833c28aa0 [PATCH] powerpc: Wire up *at syscalls
Wire up *at syscalls.

This patch has been tested on ppc64 (using glibc's testsuite, both 32bit
and 64bit), and compile-tested for ppc32 (I have currently no ppc32 system
available, but I expect no problems).

Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-28 21:04:59 +10:00
Jens Axboe eb20796bf6 [PATCH] splice: make the read-side do batched page lookups
Use the new find_get_pages_contig() to potentially look up the entire
splice range in one single call. This speeds up generic_file_splice_read()
quite a bit.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-27 11:05:22 +02:00
Jens Axboe eb645a24de [PATCH] splice: switch to using page_cache_readahead()
Avoids doing useless work, when the file is fully cached.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-27 08:59:48 +02:00
James Morris e7edf9cded [PATCH] LSM: add missing hook to do_compat_readv_writev()
This patch addresses a flaw in LSM, where there is no mediation of readv()
and writev() in for 32-bit compatible apps using a 64-bit kernel.

This bug was discovered and fixed initially in the native readv/writev
code [1], but was not fixed in the compat code.  Thanks to Al for spotting
this one.

  [1] http://lwn.net/Articles/154282/

Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-26 07:52:21 -07:00
Al Viro a090d9132c [PATCH] protect ext3 ioctl modifying append_only, immutable, etc. with i_mutex
All modifications of ->i_flags in inodes that might be visible to
somebody else must be under ->i_mutex.  That patch fixes ext3 ioctl()
setting S_APPEND and friends.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-26 07:52:21 -07:00
Al Viro de0bb97aff [PATCH] forgotten ->b_data in memcpy() call in ext3/resize.c (oopsable)
sbi->s_group_desc is an array of pointers to buffer_head.  memcpy() of
buffer size from address of buffer_head is a bad idea - it will generate
junk in any case, may oops if buffer_head is close to the end of slab
page and next page is not mapped and isn't what was intended there.
IOW, ->b_data is missing in that call.  Fortunately, result doesn't go
into the primary on-disk data structures, so only backup ones get crap
written to them; that had allowed this bug to remain unnoticed until
now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-26 07:52:21 -07:00
Linus Torvalds 7b97ebfb93 Merge branch 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block
* 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
  [PATCH] splice: add ->splice_write support for /dev/null
  [PATCH] splice: rearrange moving to/from pipe helpers
  [PATCH] Add support for the sys_vmsplice syscall
  [PATCH] splice: fix offset problems
  [PATCH] splice: fix min() warning
2006-04-26 07:47:55 -07:00
Jens Axboe 00522fb41a [PATCH] splice: rearrange moving to/from pipe helpers
We need these for people writing their own ->splice_read/write hooks.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-26 14:39:29 +02:00
Jens Axboe 912d35f867 [PATCH] Add support for the sys_vmsplice syscall
sys_splice() moves data to/from pipes with a file input/output. sys_vmsplice()
moves data to a pipe, with the input being a user address range instead.

This uses an approach suggested by Linus, where we can hold partial ranges
inside the pages[] map. Hopefully this will be useful for network
receive support as well.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-26 10:59:21 +02:00
Miklos Szeredi 8aa09a50b5 [fuse] fix race between checking and setting file->private_data
BKL does not protect against races if the task may sleep between
checking and setting a value.  So move checking of file->private_data
near to setting it in fuse_fill_super().

Found by Al Viro.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
2006-04-26 10:49:16 +02:00
Miklos Szeredi 6dbbcb1205 [fuse] fix deadlock between fuse_put_super() and request_end(), try #2
A deadlock was possible, when the last reference to the superblock was
held due to a background request containing a file reference.

Releasing the file would release the vfsmount which in turn would
release the superblock.  Since sbput_sem is held during the fput() and
fuse_put_super() tries to acquire this same semaphore, a deadlock
results.

The solution is to move the fput() outside the region protected by
sbput_sem.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
2006-04-26 10:49:06 +02:00
Miklos Szeredi 5a5fb1ea74 Revert "[fuse] fix deadlock between fuse_put_super() and request_end()"
This reverts 73ce8355c2 commit.

It was wrong, because it didn't take into account the requirement,
that iput() for background requests must be performed synchronously
with ->put_super(), otherwise active inodes may remain after unmount.

The right solution is to keep the sbput_sem and perform iput() within
the locked region, but move fput() outside sbput_sem.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
2006-04-26 10:48:55 +02:00
Jens Axboe 016b661e2f [PATCH] splice: fix offset problems
Make the move_from_pipe() actors return number of bytes processed, then
move_from_pipe() can decide more cleverly when to move on to the next
buffer.

This fixes problems with pipe offset and differing file offset.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-26 10:33:34 +02:00
Andrew Morton ba5f5d90c4 [PATCH] splice: fix min() warning
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-26 10:33:34 +02:00
Steve French 301dc3e6f6 [CIFS] Fix compile error when CONFIG_CIFS_EXPERIMENTAL is undefined
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-04-24 16:24:54 +00:00
Linus Torvalds 41bc3982b9 Merge master.kernel.org:/pub/scm/linux/kernel/git/sfrench/cifs-2.6-stable
* master.kernel.org:/pub/scm/linux/kernel/git/sfrench/cifs-2.6-stable:
  [CIFS] Fix typo in previous
  [CIFS] Readdir fixes to allow search to start at arbitrary position
  [CIFS] Use the kthread_ API instead of opencoding lots of hairy code for kernel
  [CIFS] Don't allow a backslash in a path component
  [CIFS] [CIFS] Do not take rename sem on most path based calls (during
2006-04-23 09:38:09 -07:00
Steve French b66ac3ea21 [CIFS] Fix typo in previous
Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-04-23 01:54:50 +00:00
Jan Kara b9251b823b [PATCH] Fix reiserfs deadlock
reiserfs_cache_default_acl() should return whether we successfully found
the acl or not.  We have to return correct value even if reiserfs_get_acl()
returns error code and not just 0.  Otherwise callers such as
reiserfs_mkdir() can unnecessarily lock the xattrs and later functions such
as reiserfs_new_inode() fail to notice that we have already taken the lock
and try to take it again with obvious consequences.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-22 09:19:53 -07:00
Steve French 60808233f3 [CIFS] Readdir fixes to allow search to start at arbitrary position
in directory

Also includes first part of fix to compensate for servers which forget
to return . and .. as well as updates to changelog and cifs readme.

Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-04-22 15:53:05 +00:00
Steve French 45af7a0f2e [CIFS] Use the kthread_ API instead of opencoding lots of hairy code for kernel
thread creation and teardown.

It does not move the cifsd thread handling to kthread due to problems
found in testing with wakeup of threads blocked in the socket peek api,
but the other cifs kernel threads now use kthread.
Also cleanup cifs_init to properly unwind when thread creation fails.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-04-21 22:52:25 +00:00
Steve French 296034f7de [CIFS] Don't allow a backslash in a path component
Unless Posix paths have been negotiated, the backslash, "\", is not a valid
character in a path component.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French  <sfrench@us.ibm.com>
2006-04-21 18:18:37 +00:00
Steve French 0bd4fa977f [CIFS] [CIFS] Do not take rename sem on most path based calls (during
building of full path) to avoid hang rename/readdir hang

Reported by Alan Tyson

Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-04-21 18:17:42 +00:00
David Woodhouse 21f1d5fc59 [RBTREE] Update JFFS2 to use rb_parent() accessor macro.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-04-21 13:17:57 +01:00
David Woodhouse c569882b2e [RBTREE] Update eventpoll.c to use rb_parent() accessor macro.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-04-21 13:17:24 +01:00
David Woodhouse 52b5108ca7 [RBTREE] Update ext3 to use rb_parent() accessor macro.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-04-21 13:15:57 +01:00
Jens Axboe 82aa5d6183 [PATCH] splice: fix smaller sized splice reads
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-20 13:05:48 +02:00
Linus Torvalds 949b211235 Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6:
  SUNRPC: Dead code in net/sunrpc/auth_gss/auth_gss.c
  NFS: remove needless check in nfs_opendir()
  NFS: nfs_show_stats; for_each_possible_cpu(), not NR_CPUS
  NFS: make 2 functions static
  NFS,SUNRPC: Fix compiler warnings if CONFIG_PROC_FS & CONFIG_SYSCTL are unset
  NFS: fix PROC_FS=n compile error
  VFS: Fix another open intent Oops
  RPCSEC_GSS: fix leak in krb5 code caused by superfluous kmalloc
2006-04-19 10:46:59 -07:00
Carsten Otte 7451c4f0ee NFS: remove needless check in nfs_opendir()
Local variable res was initialized to 0 - no check needed here.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-04-19 13:06:37 -04:00
John Hawkes b9d9506d94 NFS: nfs_show_stats; for_each_possible_cpu(), not NR_CPUS
Convert a for-loop that explicitly references "NR_CPUS" into the
potentially more efficient for_each_possible_cpu() construct.

Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-04-19 13:06:20 -04:00
Adrian Bunk ec535ce154 NFS: make 2 functions static
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-04-19 12:43:47 -04:00
Trond Myklebust e99170ff3b NFS,SUNRPC: Fix compiler warnings if CONFIG_PROC_FS & CONFIG_SYSCTL are unset
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-04-19 12:43:47 -04:00
Trond Myklebust 95cf959b24 VFS: Fix another open intent Oops
If the call to nfs_intent_set_file() fails to open a file in
nfs4_proc_create(), we should return an error.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-04-19 12:43:46 -04:00
Linus Torvalds 0efd9323f3 Merge branch 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block
* 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
  [PATCH] splice: fixup writeout path after ->map changes
  [PATCH] splice: offset fixes
  [PATCH] tee: link_pipe() must be careful when dropping one of the pipe locks
  [PATCH] splice: cleanup the SPLICE_F_NONBLOCK handling
  [PATCH] splice: close i_size truncate races on read
2006-04-19 09:25:52 -07:00
Dipankar Sarma ca99c1da08 [PATCH] Fix file lookup without ref
There are places in the kernel where we look up files in fd tables and
access the file structure without holding refereces to the file.  So, we
need special care to avoid the race between looking up files in the fd
table and tearing down of the file in another CPU.  Otherwise, one might
see a NULL f_dentry or such torn down version of the file.  This patch
fixes those special places where such a race may happen.

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-19 09:13:51 -07:00
Arthur Othieno dda27d1a55 [PATCH] hugetlbfs: add Kconfig help text
In kernel bugzilla #6248 (http://bugzilla.kernel.org/show_bug.cgi?id=6248),
Adrian Bunk <bunk@stusta.de> notes that CONFIG_HUGETLBFS is missing Kconfig
help text.

Signed-off-by: Arthur Othieno <apgo@patchbomb.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-19 09:13:50 -07:00
Eric W. Biederman 5e85d4abe3 [PATCH] task: Make task list manipulations RCU safe
While we can currently walk through thread groups, process groups, and
sessions with just the rcu_read_lock, this opens the door to walking the
entire task list.

We already have all of the other RCU guarantees so there is no cost in
doing this, this should be enough so that proc can stop taking the
tasklist lock during readdir.

prev_task was killed because it has no users, and using it will miss new
tasks when doing an rcu traversal.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-19 09:13:49 -07:00
Jens Axboe 9e0267c26e [PATCH] splice: fixup writeout path after ->map changes
Since ->map() no longer locks the page, we need to adjust the handling
of those pages (and stealing) a little. This now passes full regressions
again.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-19 15:57:31 +02:00
Jens Axboe a4514ebd8e [PATCH] splice: offset fixes
- We need to adjust *ppos for writes as well.
- Copy back modified offset value if one was passed in, similar to
  what sendfile does.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-19 15:57:05 +02:00
Jens Axboe 2a27250e6c [PATCH] tee: link_pipe() must be careful when dropping one of the pipe locks
We need to ensure that we only drop a lock that is ordered last, to avoid
ABBA deadlocks with competing processes.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-19 15:56:40 +02:00
Jens Axboe c4f895cbe1 [PATCH] splice: cleanup the SPLICE_F_NONBLOCK handling
- generic_file_splice_read() more readable and correct
- Don't bail on page allocation with NONBLOCK set, just don't allow
  direct blocking on IO (eg lock_page).

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-19 15:56:12 +02:00
Jens Axboe 91ad66ef44 [PATCH] splice: close i_size truncate races on read
We need to check i_size after doing a blocking readpage.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-19 15:55:10 +02:00
Linus Torvalds 385910f2b2 x86: be careful about tailcall breakage for sys_open[at] too
Came up through a quick grep for other cases similar to the ftruncate()
one in commit 0a489cb3b6.

Also, add a comment, so that people who read the code understand why we
do what looks like a no-op.

(Again, this won't actually matter to any sane user, since libc will
save and restore the register gcc stomps on, but it's still wrong to
stomp on it)

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-18 13:22:59 -07:00
Linus Torvalds 0a489cb3b6 x86: don't allow tail-calls in sys_ftruncate[64]()
Gcc thinks it owns the incoming argument stack, but that's not true for
"asmlinkage" functions, and it corrupts the caller-set-up argument stack
when it pushes the third argument onto the stack.  Which can result in
%ebx getting corrupted in user space.

Now, normally nobody sane would ever notice, since libc will save and
restore %ebx anyway over the system call, but it's still wrong.

I'd much rather have "asmlinkage" tell gcc directly that it doesn't own
the stack, but no such attribute exists, so we're stuck with our hacky
manual "prevent_tail_call()" macro once more (we've had the same issue
before with sys_waitpid() and sys_wait4()).

Thanks to Hans-Werner Hilse <hilse@sub.uni-goettingen.de> for reporting
the issue and testing the fix.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-18 13:02:48 -07:00
Richard Purdie 373d5e7183 JFFS2: Return an error for long filenames
Return an error if a name is too long for JFFS2 rather than
corrupting data.

Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
2006-04-18 02:05:46 +01:00
Ananiev, Leonid I 75616cf985 [PATCH] ext3: Fix missed mutex unlock
Missed unlock_super()call is added in error condition code path.

Signed-off-by: Leonid Ananiev <leonid.i.ananiev@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-17 14:24:57 -07:00
Stephen Rothwell 2436f039d2 [PATCH] Fix block device symlink name
As noted further on the this file, some block devices have a / in their
name, so fix the "block:..." symlink name the same as the /sys/block name.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-17 14:24:57 -07:00
David Woodhouse 94171db1d2 Merge with git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git 2006-04-17 15:35:18 +01:00
David Woodhouse d96fb997c6 [JFFS2] Fix race in post-mount node checking
For a while now, we've postponed CRC-checking of data nodes to be done
by the GC thread, instead of being done while the user is waiting for
mount to finish. The GC thread would iterate through all the inodes on
the system and check each of their data nodes. It would skip over inodes
which had already been used or were already being read in by
read_inode(), because their data nodes would have been examined anyway.

However, we could sometimes reach the end of the for-each-inode loop and
still have some unchecked space left, if an inode we'd skipped was
_still_ in the process of being read. This fixes that race by actually
waiting for read_inode() to finish rather than just moving on.

Thanks to Ladislav Michl for coming up with a reproducible test case and
helping to track it down.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-04-17 00:19:48 +01:00
Kay Sievers d4d7e5dffc [PATCH] BLOCK: delay all uevents until partition table is scanned
[BLOCK] delay all uevents until partition table is scanned

Here we delay the annoucement of all block device events until the
disk's partition table is scanned and all partition devices are already
created and sysfs is populated.

We have a bunch of old bugs for removable storage handling where we
probe successfully for a filesystem on the raw disk, but at the
same time the kernel recognizes a partition table and creates partition
devices.
Currently there is no sane way to tell if partitions will show up or not
at the time the disk device is announced to userspace. With the delayed
events we can simply skip any probe for a filesystem on the raw disk when
we find already present partitions.

Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-14 11:41:24 -07:00
NeilBrown 4508a7a734 [PATCH] sysfs: Allow sysfs attribute files to be pollable
It works like this:
  Open the file
  Read all the contents.
  Call poll requesting POLLERR or POLLPRI (so select/exceptfds works)
  When poll returns,
     close the file and go to top of loop.
   or lseek to start of file and go back to the 'read'.

Events are signaled by an object manager calling
   sysfs_notify(kobj, dir, attr);

If the dir is non-NULL, it is used to find a subdirectory which
contains the attribute (presumably created by sysfs_create_group).

This has a cost of one int  per attribute, one wait_queuehead per kobject,
one int per open file.

The name "sysfs_notify" may be confused with the inotify
functionality.  Maybe it would be nice to support inotify for sysfs
attributes as well?

This patch also uses sysfs_notify to allow /sys/block/md*/md/sync_action
to be pollable

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-14 11:41:24 -07:00
Linus Torvalds 9a7e9f1c60 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/mszeredi/fuse:
  [fuse] Direct I/O  should not use fuse_reset_request
  [fuse] Don't init request twice
  [fuse] Fix accounting the number of waiting requests
  [fuse] fix deadlock between fuse_put_super() and request_end()
2006-04-14 09:11:34 -07:00
Linus Torvalds 9ca686626c Merge branch 'tee' of git://brick.kernel.dk/data/git/linux-2.6-block
* 'tee' of git://brick.kernel.dk/data/git/linux-2.6-block:
  [PATCH] splice: add support for sys_tee()
  [PATCH] splice: pass offset around for ->splice_read() and ->splice_write()
2006-04-14 09:02:07 -07:00
Eric W. Biederman c06511d12d [PATCH] de_thread: Don't change our parents and ptrace flags.
This is two distinct changes.
 - Not changing our real parents.
 - Not changing our ptrace parents.

Not changing our real parents is trivially correct because both tasks
have the same real parents as they are part of a thread group.  Now that
we demote the leader to a thread there is no longer any reason to change
it's parentage.

Not changing our ptrace parents is a user visible change if someone
looks hard enough.  I don't think user space applications will care or
even notice.

In the practical and I think common case a debugger will have attached
to all of the threads using the same ptrace flags.  From my quick skim
of strace and gdb that appears to be the case.  Which if true means
debuggers will not notice a change.

Before this point we have already generated a ptrace event in do_exit
that reports the leaders pid has died so de_thread is visible to a
debugger.  Which means attempting to hide this case by copying flags
around appears excessive.

By not doing anything it avoids all of the weird locking issues between
de_thread and ptrace attach, and removes one case from consideration for
fixing the ptrace locking.

This only addresses Oleg's first concern with ptrace_attach, that of the
problems caused by reparenting.  Oleg's second concern is essentially a
race between ptrace_attach and release_task that causes an oops when we
get to force_sig_specific.  There is nothing special about de_thread
with respect to that race.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-14 08:49:19 -07:00
Randy Dunlap fb6a82c94a [PATCH] jffs2: fix printk warnings
Fix printk format warnings in jffs2.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-04-11 20:12:10 -04:00
Miklos Szeredi 56cf34ff07 [fuse] Direct I/O should not use fuse_reset_request
It's cleaner to allocate a new request, otherwise the uid/gid/pid
fields of the request won't be filled in.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
2006-04-11 21:16:51 +02:00
Miklos Szeredi 4858cae4f0 [fuse] Don't init request twice
Request is already initialized in fuse_request_alloc() so no need to
do it again in fuse_get_req().

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
2006-04-11 21:16:38 +02:00
Miklos Szeredi 9bc5dddad1 [fuse] Fix accounting the number of waiting requests
Properly accounting the number of waiting requests was forgotten in
"clean up request accounting" patch.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
2006-04-11 21:16:09 +02:00
Miklos Szeredi 73ce8355c2 [fuse] fix deadlock between fuse_put_super() and request_end()
A deadlock was possible, when the last reference to the superblock was
held due to a background request containing a file reference.

Releasing the file would release the vfsmount which in turn would
release the superblock.  Since sbput_sem is held during the fput() and
fuse_put_super() tries to acquire this same semaphore, a deadlock
results.

The chosen soltuion is to get rid of sbput_sem, and instead use the
spinlock to ensure the referenced inodes/file are released only once.
Since the actual release may sleep, defer these outside the locked
region, but using local variables instead of the structure members.

This is a much more rubust solution.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
2006-04-11 21:14:26 +02:00
Jens Axboe 70524490ee [PATCH] splice: add support for sys_tee()
Basically an in-kernel implementation of tee, which uses splice and the
pipe buffers as an intelligent way to pass data around by reference.

Where the user space tee consumes the input and produces a stdout and
file output, this syscall merely duplicates the data inside a pipe to
another pipe. No data is copied, the output just grabs a reference to the
input pipe data.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 15:51:17 +02:00
Jens Axboe cbb7e577e7 [PATCH] splice: pass offset around for ->splice_read() and ->splice_write()
We need not use ->f_pos as the offset for the file input/output. If the
user passed an offset pointer in through sys_splice(), just use that and
leave ->f_pos alone.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 15:47:07 +02:00
Linus Torvalds 88dd9c16ce Merge branch 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block
* 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
  [PATCH] vfs: add splice_write and splice_read to documentation
  [PATCH] Remove sys_ prefix of new syscalls from __NR_sys_*
  [PATCH] splice: warning fix
  [PATCH] another round of fs/pipe.c cleanups
  [PATCH] splice: comment styles
  [PATCH] splice: add Ingo as addition copyright holder
  [PATCH] splice: unlikely() optimizations
  [PATCH] splice: speedups and optimizations
  [PATCH] pipe.c/fifo.c code cleanups
  [PATCH] get rid of the PIPE_*() macros
  [PATCH] splice: speedup __generic_file_splice_read
  [PATCH] splice: add direct fd <-> fd splicing support
  [PATCH] splice: add optional input and output offsets
  [PATCH] introduce a "kernel-internal pipe object" abstraction
  [PATCH] splice: be smarter about calling do_page_cache_readahead()
  [PATCH] splice: optimize the splice buffer mapping
  [PATCH] splice: cleanup __generic_file_splice_read()
  [PATCH] splice: only call wake_up_interruptible() when we really have to
  [PATCH] splice: potential !page dereference
  [PATCH] splice: mark the io page as accessed
2006-04-11 06:34:02 -07:00
NeilBrown 358dd55aa3 [PATCH] knfsd: nfsd4: grant delegations more frequently
Keep unused openowners around for at least one lease period, to avoid the need
for as many open confirmations and to allow handing out more delegations.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:53 -07:00
NeilBrown ef0f3390eb [PATCH] knfsd: nfsd4: limit number of delegations handed out.
It's very easy for the server to DOS itself by just giving out too many
delegations.

For now we just solve the problem with a dumb hard limit.  Eventually we'll
want a smarter policy.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:53 -07:00
NeilBrown 4e2fd495b5 [PATCH] knfsd: nfsd4: add missing rpciod_down()
We should be shutting down rpciod for the callback channel when we shut down
the server.

Also note that we do rpciod_up() and create the callback client *before*
setting cb_set--the cb_set only determines whether the initial null was
succesful.  So cb_set is not a reliable determiner of whether we need to clean
up, only cb_client is.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:53 -07:00
NeilBrown 541e0e0981 [PATCH] knfsd: nfsd4: nfsd4_probe_callback cleanup
Some obvious cleanup.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:53 -07:00
NeilBrown 5e8d5c2948 [PATCH] knfsd: nfsd4: fix laundromat shutdown race
We need to make sure the laundromat work doesn't reschedule itself just when
we try to cancel it.  Also, we shouldn't be waiting for it to finish running
while holding the state lock, as that's a potential deadlock.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:52 -07:00
NeilBrown bb6e8a9f40 [PATCH] knfsd: nfsd4: fix corruption on readdir encoding with 64k pages
Fix corruption on readdir encoding with 64k pages.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:52 -07:00
NeilBrown 6ed6decccf [PATCH] knfsd: nfsd4: fix corruption of returned data when using 64k pages
In v4 we grab an extra page just for the padding of returned data.  The
formula that the rpc server uses to allocate pages for the response doesn't
take into account this extra page.

Instead of adjusting those formulae, we adopt the same solution as v2 and v3,
and put the "tail" data in the same page as the "head" data.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:52 -07:00
NeilBrown f0e2993e9e [PATCH] knfsd: nfsd4: remove nfsd_setuser from putrootfh
Since nfsd_setuser() is already called from any operation that uses the
current filehandle (because it's called from fh_verify), there's no reason to
call it from putrootfh.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:52 -07:00
NeilBrown 54cceebb67 [PATCH] knfsd: nfsd: nfsd_setuser doesn't really need to modify rqstp->rq_cred.
In addition to setting the processes filesystem id's, nfsd_setuser also
modifies the value of the rq_cred which stores the id's that originally came
from the rpc call, for example to reflect root squashing.

There's no real reason to do that--the only case where rqstp->rq_cred is
actually used later on is in the NFSv4 SETCLIENTID/SETCLIENTID_CONFIRM
operations, and there the results are the opposite of what we want--those two
operations don't deal with the filesystem at all, they only record the
credentials used with the rpc call for later reference (so that we may require
the same credentials be used on later operations), and the credentials
shouldn't vary just because there was or wasn't a previous operation in the
compound that referred to some export

This fixes a bug which caused mounts from Solaris clients to fail.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:52 -07:00
NeilBrown cd15654963 [PATCH] knfsd: nfsd: oops exporting nonexistent directory
Export a directory that does not exist:
	exportfs -orw,fsid=0,insecure,no_subtree_check client:/home/NFS4

Try to mount from client with nfs4. Mount hangs (I'm not sure why -
that's another issue).

While client is hung, back on server

	mkdir /home/NFS4

The server panics in dput.  I traced the problem back to svc_export_parse()
calling path_release() even though path_lookup() failed (it happens to fill in
the nameidata structure with a negative dentry - so the test after out:
succeeds).

After patching, an recreating the problem, the client mount still takes some
time before finally exiting with a message "couldn't read superblock".

Here is a simple patch to resolve this issue:

Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:52 -07:00
NeilBrown b5872b0dcc [PATCH] knfsd: nfsd4: fix acl xattr length return
We should be using the length from the second vfs_getxattr, in case it
changed.  (Note: there's still a small race here; we could end up returning
-ENOMEM if the length increased between the first and second call.  I don't
know whether it's worth spending a lot of effort to fix that.)

This makes XFS ACLs usable on NFS exports, which they currently aren't, since
XFS appears to be returning a too-large value for vfs_getxattr() when it's
passed a NULL buffer.  So there's probably an XFS bug here too, though since
getxattr with a NULL buffer is usually used to decide how much memory to
allocate, it may be a fairly harmless bug in most cases.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:51 -07:00
NeilBrown b905b7b0a0 [PATCH] knfsd: nfsd4: better nfs4acl errors
We're returning -1 in a few places in the NFSv4<->POSIX acl translation code
where we could return a reasonable error.

Also allows some minor simplification elsewhere.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:51 -07:00
NeilBrown 249920527f [PATCH] knfsd: nfsd4: Wrong error handling in nfs4acl
this fixes coverity id #3.  Coverity detected dead code, since the == -1
comparison only returns 0 or 1 to error.  Therefore the if ( error < 0 )
statement was always false.  Seems that this was an if( error = nfs4...  )
statement some time ago, which got broken during cleanup.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:51 -07:00
Adrian Bunk e465a77f94 [PATCH] fs/nfsd/nfs4state.c: make a struct static
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Marc Eshel <eshel@almaden.ibm.com>
Cc: Andy Adamson <andros@citi.umich.edu>
Cc: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:51 -07:00
NeilBrown d5b9026a67 [PATCH] knfsd: locks: flag NFSv4-owned locks
Use the fl_lmops field to identify which locks are ours, instead of trying to
look them up in our private hash.  This is safer and more efficient.

Earlier versions of this patch used a lock flag instead, but Trond pointed out
that adding a new flag for each lock manager wasn't going to scale well, and
suggested this approach instead; a separate patch converts lockd to using
fl_lmops in the same way.

In the NFSv4 case this looks like a bit of a hack, since the NFSv4 server
isn't currently actually defining a lock_manager_operations struct, so we end
up defining one *just* to serve as a cookie to identify our locks.

But it works, and we actually do expect to start using the
lock_manager_operations at some point anyway.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:51 -07:00
NeilBrown 7775f4c85d [PATCH] knfsd: Correct reserved reply space for read requests.
NFSd makes sure there is enough space to hold the maximum possible reply
before accepting a request.  The units for this maximum is (4byte) words.
However in three places, particularly for read request, the number given is
a number of bytes.

This means too much space is reserved which is slightly wasteful.

This is the sort of patch that could uncover a deeper bug, and it is not
critical, so it would be best for it to spend a while in -mm before going
in to mainline.

(akpm: target 2.6.17-rc2, 2.6.16.3 (approx))

Discovered-by: "Eivind  Sarto" <ivan@kasenna.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:51 -07:00
Miklos Szeredi 08a53cdce6 [PATCH] fuse: account background requests
The previous patch removed limiting the number of outstanding requests.  This
patch adds a much simpler limiting, that is also compatible with file locking
operations.

A task may have at most one synchronous request allocated.  So these requests
need not be otherwise limited.

However the number of background requests (release, forget, asynchronous
reads, interrupted requests) can grow indefinitely.  This can be used by a
malicous user to cause FUSE to allocate arbitrary amounts of unswappable
kernel memory, denying service.

For this reason add a limit for the number of background requests, and block
allocations of new requests until the number goes bellow the limit.

Also use this mechanism to block all requests until the INIT reply is
received.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:49 -07:00
Miklos Szeredi ce1d5a491f [PATCH] fuse: clean up request accounting
FUSE allocated most requests from a fixed size pool filled at mount time.
However in some cases (release/forget) non-pool requests were used.  File
locking operations aren't well served by the request pool, since they may
block indefinetly thus exhausting the pool.

This patch removes the request pool and always allocates requests on demand.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:49 -07:00
Miklos Szeredi a87046d822 [PATCH] fuse: consolidate device errors
Return consistent error values for the case when the opened device file has no
mount associated yet.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:48 -07:00
Miklos Szeredi d713311464 [PATCH] fuse: use a per-mount spinlock
Remove the global spinlock in favor of a per-mount one.

This patch is basically find & replace.  The difficult part has already been
done by the previous patch.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:48 -07:00
Miklos Szeredi 0720b31597 [PATCH] fuse: simplify locking
This is in preparation for removing the global spinlock in favor of a
per-mount one.

The only critical part is the interaction between fuse_dev_release() and
fuse_fill_super(): fuse_dev_release() must see the assignment to
file->private_data, otherwise it will leak the reference to fuse_conn.

This is ensured by the fput() operation, which will synchronize the assignment
with other CPU's that may do a final fput() soon after this.

Also redundant locking is removed from fuse_fill_super(), where exclusion is
already ensured by the BKL held for this function by the VFS.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:48 -07:00
Jeff Dike e5ac1d1e70 [PATCH] fuse: add O_NONBLOCK support to FUSE device
I don't like duplicating the connected and list_empty tests in fuse_dev_readv,
but this seemed cleaner than adding the f_flags test to request_wait.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:48 -07:00
Jeff Dike 385a17bfc3 [PATCH] fuse: add O_ASYNC support to FUSE device
This adds asynchronous notification to FUSE - a FUSE server can request
O_ASYNC on a /dev/fuse file descriptor and receive SIGIO when there is input
available.

One subtlety - fuse_dev_fasync, which is called when O_ASYNC is requested,
does no locking, unlink the other methods.  I think it's unnecessary, as the
fuse_conn.fasync list is manipulated only by fasync_helper and kill_fasync,
which provide their own locking.  It would also be wrong to use the fuse_lock,
as it's a spin lock and fasync_helper can sleep.  My one concern with this is
the fuse_conn going away underneath fuse_dev_fasync - sys_fcntl takes a
reference on the file struct, so this seems not to be a problem.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:48 -07:00
Miklos Szeredi 7025d9ad10 [PATCH] fuse: fix fuse_dev_poll() return value
fuse_dev_poll() returned an error value instead of a poll mask.  Luckily (or
unluckily) -ENODEV does contain the POLLERR bit.

There's also a race if filesystem is unmounted between fuse_get_conn() and
spin_lock(), in which case this event will be missed by poll().

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:47 -07:00
Miklos Szeredi d3406ffa4a [PATCH] fuse: fix oops in fuse_send_readpages()
During heavy parallel filesystem activity it was possible to Oops the kernel.
The reason is that read_cache_pages() could skip pages which have already been
inserted into the cache by another task.  Occasionally this may result in zero
pages actually being sent, while fuse_send_readpages() relies on at least one
page being in the request.

So check this corner case and just free the request instead of trying to send
it.

Reported and tested by Konstantin Isakov.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:47 -07:00
Ananiev, Leonid I 389ed39b97 [PATCH] ext3: Fix missed mutex unlock
Missed unlock_super()call is added in error condition code path.

Signed-off-by: Leonid Ananiev <leonid.i.ananiev@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:46 -07:00
Arnd Bergmann 091e881d0e [PATCH] inotify: check for NULL inode in inotify_d_instantiate
The spufs file system creates files in a directory before instantiating the
directory itself, which causes a NULL pointer access in
inotify_d_instantiate since c32ccd87bf.

I'd like to keep this behavior since it means that the user will not have
access to files in the directory before I know that I succeed in creating
everything in it.  This patch adds a simple check for the inode to keep
that working.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:45 -07:00
Vivek Goyal 68250ba5df [PATCH] kdump: enable CONFIG_PROC_VMCORE by default
Everybody seems to be using /proc/vmcore as a method to access the kernel
crash dump.  Hence probably it makes sense to enable CONFIG_PROC_VMCORE by
default if CONFIG_CRASH_DUMP is selected.  This makes kdump configuration
further easier for a user.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:45 -07:00
Roland McGrath f5e902817f [PATCH] process accounting: take original leader's start_time in non-leader exec
The only record we have of the real-time age of a process, regardless of
execs it's done, is start_time.  When a non-leader thread exec, the
original start_time of the process is lost.  Things looking at the
real-time age of the process are fooled, for example the process accounting
record when the process finally dies.  This change makes the oldest
start_time stick around with the process after a non-leader exec.  This way
the association between PID and start_time is kept constant, which seems
correct to me.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:42 -07:00
Davide Libenzi 2395140ee2 [PATCH] uniform POLLRDHUP handling between epoll and poll/select
As reported by Michael Kerrisk, POLLRDHUP handling was not consistent
between epoll and poll/select, since in epoll it was unmaskeable.  This
patch brings uniformity in POLLRDHUP handling.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:42 -07:00
Vivek Goyal 80e8ff6341 [PATCH] kdump proc vmcore size oveflow fix
A couple of /proc/vmcore data structures overflow with 32bit systems having
memory more than 4G.  This patch fixes those.

Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:42 -07:00
Mitchell Blank Jr b04eb6aa08 [PATCH] select: don't overflow if (SELECT_STACK_ALLOC % sizeof(long) != 0)
If SELECT_STACK_ALLOC is not a multiple of sizeof(long) then stack_fds[]
would be shorter than SELECT_STACK_ALLOC bytes and could overflow later in
the function.  Fixed by simply rearranging the test later to work on
sizeof(stack_fds) Currently SELECT_STACK_ALLOC is 256 so this doesn't
happen, but it's nasty to have things like this hidden in the code.  What
if later someone decides to change SELECT_STACK_ALLOC to 300?

Signed-off-by: Mitchell Blank Jr <mitch@sfgoth.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:41 -07:00
Eric Van Hensbergen 00fbc6dfe7 [PATCH] 9p: handle sget() failure
Handle a failing sget() in v9fs_get_sb().

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:41 -07:00
Herbert Poetzl f6422f17d3 [PATCH] vfs: propagate mnt_flags into do_loopback/vfsmount
The mnt_flags are propagated into do_loopback(), so that they can be stored
with the vfsmount

Signed-off-by: Herbert Poetzl <herbert@13thfloor.at>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:41 -07:00
Andrew Morton 5246d05031 [PATCH] sync_file_range(): use unsigned for flags
Ulrich suggested that the `flags' arg to sync_file_range() become unsigned.

Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:40 -07:00
Jeff Dike 7b04d7170e [PATCH] Add GFP_NOWAIT
Introduce GFP_NOWAIT, as an alias for GFP_ATOMIC & ~__GFP_HIGH.

This also changes XFS, which is the only in-tree user of this idiom that I
could find.  The XFS piece is compile-tested only.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Acked-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:35 -07:00
Andrew Morton 29ff2db551 [PATCH] select() warning fixes
fs/select.c: In function `core_sys_select':
fs/select.c:339: warning: assignment from incompatible pointer type
fs/select.c:376: warning: comparison of distinct pointer types lacks a cast

By using a void* we can remove lots of casts rather than adding more.

Cc: Jes Sorensen <jes@trained-monkey.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:30 -07:00
Ingo Molnar 341b446bc5 [PATCH] another round of fs/pipe.c cleanups
make pipe.c a bit more readable and hackable.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 13:57:45 +02:00
Ingo Molnar 73d62d83ec [PATCH] splice: comment styles
- capitalize consistently
 - end sentences in one way or another
 - update comment text to match the implementation

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 13:57:21 +02:00
Jens Axboe c2058e0611 [PATCH] splice: add Ingo as addition copyright holder
The comment is also somewhat out of date, correct that as well.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 13:56:34 +02:00
Jens Axboe 49570e9b29 [PATCH] splice: unlikely() optimizations
Also corrects a few comments. Patch mainly from Ingo, changes by me.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 13:56:09 +02:00
Jens Axboe 6f767b0425 [PATCH] splice: speedups and optimizations
- Kill the local variables that cache ->nrbufs, they just take up space.

- Only set do_wakeup for a real pipe. This is a big win for direct splicing.

- Kill i_mutex lock around ->f_pos update, regular io paths don't do this
  either.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 13:53:56 +02:00
Ingo Molnar 923f4f2394 [PATCH] pipe.c/fifo.c code cleanups
more code cleanups after the macro conversion:

 - standardize on 'struct pipe_inode_info *pipe' variable names
 - introduce 'pipe' temporaries to reduce mass inode->i_pipe dereferencing

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 13:53:33 +02:00
Ingo Molnar 9aeedfc471 [PATCH] get rid of the PIPE_*() macros
get rid of the PIPE_*() macros. Scripted transformation.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 13:53:10 +02:00
Jens Axboe 7480a90435 [PATCH] splice: speedup __generic_file_splice_read
Using find_get_page() is a lot faster than find_or_create_page(). This
gets splice a lot closer to sendfile() for fd -> socket transfers.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 13:52:47 +02:00
Jens Axboe b92ce55893 [PATCH] splice: add direct fd <-> fd splicing support
It's more efficient for sendfile() emulation. Basically we cache an
internal private pipe and just use that as the intermediate area for
pages. Direct splicing is not available from sys_splice(), it is only
meant to be used for sendfile() emulation.

Additional patch from Ingo Molnar to avoid the PIPE_BUFFERS loop at
exit for the normal fast path.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 13:52:07 +02:00
Nathan Scott 019ff2d57b [XFS] Fix a problem in aligning inode allocations to stripe unit
boundaries.

SGI-PV: 951862
SGI-Modid: xfs-linux-melb:xfs-kern:25726a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-04-11 15:45:05 +10:00
Nathan Scott 8c0b5113a5 [XFS] Fix utime(2) in the case that no times parameter was passed in.
SGI-PV: 949858
SGI-Modid: xfs-linux-melb:xfs-kern:25717a

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-04-11 15:12:45 +10:00
David Chinner 58829e490e [XFS] Fix an inode use-after-free durin an unpin. When reclaiming inodes
that have been unlinked, we may need to execute transactions during
reclaim. By the time the transaction has hit the disk, the linux inode and
xfs vnode may already have been freed so we can't reference them safely.
Use the known xfs inode state to determine if it is safe to reference the
vnode and linux inode during the unpin operation.

SGI-PV: 946321
SGI-Modid: xfs-linux-melb:xfs-kern:25687a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-04-11 15:11:20 +10:00
David Chinner 1fc5d959d8 [XFS] Fix inode reclaim scalability regression. When a filesystem has
millions of inodes cached and has sparse cluster population, removing
inodes from the cluster hash consumes excessive amounts of CPU time.
Reduce the CPU cost by making removal O(1) via use of a double linked list
for the hash chains.

SGI-PV: 951551
SGI-Modid: xfs-linux-melb:xfs-kern:25683a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-04-11 15:11:12 +10:00
Nathan Scott 8272145c05 [XFS] Fix a writepage regression where we accidentally stopped honouring
nonblock mode with the new IO path code (since 2.6.16).

SGI-PV: 951662
SGI-Modid: xfs-linux-melb:xfs-kern:25676a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-04-11 15:10:55 +10:00
Nathan Scott e50bd16fe4 [XFS] Fix superblock validation regression for the zero imaxpct case.
Thanks to kjamieson for noticing.

SGI-PV: 951661
SGI-Modid: xfs-linux-melb:xfs-kern:25675a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-04-11 15:10:45 +10:00
Linus Torvalds e38d557896 Merge branch 'upstream-linus' of git://oss.oracle.com/home/sourcebo/git/ocfs2
* 'upstream-linus' of git://oss.oracle.com/home/sourcebo/git/ocfs2:
  [PATCH] CONFIGFS_FS must depend on SYSFS
  [PATCH] Bogus NULL pointer check in fs/configfs/dir.c
  ocfs2: Better I/O error handling in heartbeat
  ocfs2: test and set teardown flag early in user_dlm_destroy_lock()
  ocfs2: Handle the DLM_CANCELGRANT case in user_unlock_ast()
  ocfs2: catch an invalid ast case in dlmfs
  ocfs2: remove an overly aggressive BUG() in dlmfs
  ocfs2: multi node truncate fix
2006-04-10 16:44:09 -07:00
Eric W. Biederman de12a7878c [PATCH] de_thread: Don't confuse users do_each_thread.
Oleg Nesterov spotted two interesting bugs with the current de_thread
code.  The simplest is a long standing double decrement of
__get_cpu_var(process_counts) in __unhash_process.  Caused by
two processes exiting when only one was created.

The other is that since we no longer detach from the thread_group list
it is possible for do_each_thread when run under the tasklist_lock to
see the same task_struct twice.  Once on the task list as a
thread_group_leader, and once on the thread list of another
thread.

The double appearance in do_each_thread can cause a double increment
of mm_core_waiters in zap_threads resulting in problems later on in
coredump_wait.

To remedy those two problems this patch takes the simple approach
of changing the old thread group leader into a child thread.
The only routine in release_task that cares is __unhash_process,
and it can be trivially seen that we handle cleaning up a
thread group leader properly.

Since de_thread doesn't change the pid of the exiting leader process
and instead shares it with the new leader process.  I change
thread_group_leader to recognize group leadership based on the
group_leader field and not based on pids.  This should also be
slightly cheaper then the existing thread_group_leader macro.

I performed a quick audit and I couldn't see any user of
thread_group_leader that cared about the difference.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-10 16:36:50 -07:00
Adrian Bunk 65714b9184 [PATCH] CONFIGFS_FS must depend on SYSFS
This patch fixes the a compile error with CONFIG_SYSFS=n

Configfs is creating, as a matter of policy, the /sys/kernel/config
mountpoint.  This means it requires CONFIG_SYSFS.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-04-10 11:17:21 -07:00
Eric Sesterhenn cbca692c24 [PATCH] Bogus NULL pointer check in fs/configfs/dir.c
We check the "group" pointer after we dereference it.  This check is
bogus, as it cannot be NULL coming in.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-04-10 11:16:17 -07:00
Ingo Molnar 529565dcb1 [PATCH] splice: add optional input and output offsets
add optional input and output offsets to sys_splice(), for seekable file
descriptors:

 asmlinkage long sys_splice(int fd_in, loff_t __user *off_in,
                            int fd_out, loff_t __user *off_out,
                            size_t len, unsigned int flags);

semantics are straightforward: f_pos will be updated with the offset
provided by user-space, before the splice transfer is about to begin.
Providing a NULL offset pointer means the existing f_pos will be used
(and updated in situ).  Providing an offset for a pipe results in
-ESPIPE. Providing an invalid offset pointer results in -EFAULT.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-10 15:18:58 +02:00
Ingo Molnar 3a326a2ce8 [PATCH] introduce a "kernel-internal pipe object" abstraction
separate out the 'internal pipe object' abstraction, and make it
usable to splice. This cleans up and fixes several aspects of the
internal splice APIs and the pipe code:

 - pipes: the allocation and freeing of pipe_inode_info is now more symmetric
   and more streamlined with existing kernel practices.

 - splice: small micro-optimization: less pointer dereferencing in splice
   methods

Signed-off-by: Ingo Molnar <mingo@elte.hu>

Update XFS for the ->splice_read/->splice_write changes.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-10 15:18:35 +02:00
Jens Axboe 0b749ce380 [PATCH] splice: be smarter about calling do_page_cache_readahead()
We don't want to call into the read-ahead logic unless we are at the
start of a page, _or_ we have multiple pages to read.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-10 09:05:04 +02:00
Jens Axboe 49d0b21be2 [PATCH] splice: optimize the splice buffer mapping
We don't really need to lock down the pages, just make sure they
are uptodate.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-10 09:04:41 +02:00
Jens Axboe 16c523ddab [PATCH] splice: cleanup __generic_file_splice_read()
The whole shadow/pages logic got overly complex, and this simpler
approach is actually faster in testing.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-10 09:03:58 +02:00
Jens Axboe c0bd1f650b [PATCH] splice: only call wake_up_interruptible() when we really have to
__wake_up_common() is pretty heavy in the kernel profiles, this brings
it down to a more acceptable level.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-10 09:03:32 +02:00
Dave Jones 9aefe431f5 [PATCH] splice: potential !page dereference
We can get to out: with a NULL page, which we probably
don't want to be calling page_cache_release() on.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-10 09:02:40 +02:00
Jens Axboe c7f21e4f5a [PATCH] splice: mark the io page as accessed
We should do that, since we do the LRU manipulation ourselves now. Suggested
by Nick Piggin.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-10 09:01:01 +02:00
Mark Fasheh a9e2ae3917 ocfs2: Better I/O error handling in heartbeat
Propagate errors received in o2hb_bio_end_io() back to the heartbeat thread
so it can skip re-arming the timer.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-04-07 18:03:09 -07:00
Mark Fasheh 2cd9888590 ocfs2: test and set teardown flag early in user_dlm_destroy_lock()
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-04-07 17:39:43 -07:00
Mark Fasheh f43e6918c0 ocfs2: Handle the DLM_CANCELGRANT case in user_unlock_ast()
Remove the code which attempted to catch it via dlmunlock() return status -
this never happens there.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-04-07 17:37:52 -07:00
Mark Fasheh cc6eb72595 ocfs2: catch an invalid ast case in dlmfs
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-04-07 17:36:16 -07:00
Mark Fasheh 1f7bc828e3 ocfs2: remove an overly aggressive BUG() in dlmfs
Don't BUG() user_dlm_unblock_lock() on the absence of the USER_LOCK_BLOCKED
flag - this turns out to be a valid case. Make some of the related BUG()
statements print more useful information.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-04-07 17:27:43 -07:00
Mark Fasheh ab0920ce7e ocfs2: multi node truncate fix
Fix ocfs2_truncate_file() so that it forces a truncate_inode_pages() on all
interested nodes in all cases of a truncate(), not just allocation change.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-04-07 16:47:24 -07:00
Linus Torvalds d69636157a Merge branch 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block
* 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
  [PATCH] splice: fix page stealing LRU handling.
  [PATCH] splice: page stealing needs to wait_on_page_writeback()
  [PATCH] splice: export generic_splice_sendpage
  [PATCH] splice: add a SPLICE_F_MORE flag
  [PATCH] splice: add comments documenting more of the code
  [PATCH] splice: improve writeback and clean up page stealing
  [PATCH] splice: fix shadow[] filling logic
2006-04-02 14:22:06 -07:00
Jens Axboe 3e7ee3e7b3 [PATCH] splice: fix page stealing LRU handling.
Originally from Nick Piggin, just adapted to the newer branch.

You can't check PageLRU without holding zone->lru_lock.  The page
release code can get away with it only because the page refcount is 0 at
that point. Also, you can't reliably remove pages from the LRU unless
the refcount is 0. Ever.

Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-02 23:11:04 +02:00
Jens Axboe ad8d6f0a78 [PATCH] splice: page stealing needs to wait_on_page_writeback()
Thanks to Andrew for the good explanation of why this is so. akpm writes:

If a page is under writeback and we remove it from pagecache, it's still
going to get written to disk.  But the VFS no longer knows about that page,
nor that this page is about to modify disk blocks.

So there might be scenarios in which those
blocks-which-are-about-to-be-written-to get reused for something else.
When writeback completes, it'll scribble on those blocks.

This won't happen in ext2/ext3-style filesystems in normal mode because the
page has buffers and try_to_release_page() will fail.

But ext2 in nobh mode doesn't attach buffers at all - it just sticks the
page in a BIO, finds some new blocks, points the BIO at those blocks and
lets it rip.

While that write IO's in flight, someone could truncate the file.  Truncate
won't block on the writeout because the page isn't in pagecache any more.
So truncate will the free the blocks from the file under the page's feet.
Then something else can reallocate those blocks.  Then write data to them.

Now, the original write completes, corrupting the filesystem.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-02 23:10:32 +02:00
Jens Axboe 059a8f3734 [PATCH] splice: export generic_splice_sendpage
Forgot that one, thanks Jeff. Also move the other EXPORT_SYMBOL
to right below the functions.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-02 23:06:05 +02:00
Jens Axboe b2b39fa478 [PATCH] splice: add a SPLICE_F_MORE flag
This lets userspace indicate whether more data will be coming in a
subsequent splice call.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-02 23:05:41 +02:00
Jens Axboe 83f9135bdd [PATCH] splice: add comments documenting more of the code
Hopefully this will make Andrew a little more happy.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-02 23:05:09 +02:00
Jens Axboe 4f6f0bd2ff [PATCH] splice: improve writeback and clean up page stealing
By cleaning up the writeback logic (killing write_one_page() and the manual
set_page_dirty()), we can get rid of ->stolen inside the pipe_buffer and
just keep it local in pipe_to_file().

This also adds dirty page balancing logic and O_SYNC handling.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-02 23:04:46 +02:00
Jens Axboe 53cd9ae886 [PATCH] splice: fix shadow[] filling logic
Clear the entire range, and don't increment pidx or we keep filling
the same position again and again.

Thanks to KAMEZAWA Hiroyuki.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-02 23:04:21 +02:00
Linus Torvalds a2308b7f08 Merge git://oss.sgi.com:8090/oss/git/xfs-2.6
* git://oss.sgi.com:8090/oss/git/xfs-2.6:
  [XFS] Provide XFS support for the splice syscall.
  [XFS] Reenable write barriers by default.
  [XFS] Make project quota enforcement return an error code consistent with
  [XFS] Implement the silent parameter to fill_super, previously ignored.
  [XFS] Cleanup comment to remove reference to obsoleted function
2006-04-02 13:11:25 -07:00
Greg Kroah-Hartman 6e0dd741a8 [PATCH] sysfs: zero terminate sysfs write buffers
No one should be writing a PAGE_SIZE worth of data to a normal sysfs
file, so properly terminate the buffer.

Thanks to Al Viro for pointing out my supidity here.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-02 13:03:31 -07:00
Linus Torvalds 63589ed078 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (48 commits)
  Documentation: fix minor kernel-doc warnings
  BUG_ON() Conversion in drivers/net/
  BUG_ON() Conversion in drivers/s390/net/lcs.c
  BUG_ON() Conversion in mm/slab.c
  BUG_ON() Conversion in mm/highmem.c
  BUG_ON() Conversion in kernel/signal.c
  BUG_ON() Conversion in kernel/signal.c
  BUG_ON() Conversion in kernel/ptrace.c
  BUG_ON() Conversion in ipc/shm.c
  BUG_ON() Conversion in fs/freevxfs/
  BUG_ON() Conversion in fs/udf/
  BUG_ON() Conversion in fs/sysv/
  BUG_ON() Conversion in fs/inode.c
  BUG_ON() Conversion in fs/fcntl.c
  BUG_ON() Conversion in fs/dquot.c
  BUG_ON() Conversion in md/raid10.c
  BUG_ON() Conversion in md/raid6main.c
  BUG_ON() Conversion in md/raid5.c
  Fix minor documentation typo
  BFP->BPF in Documentation/networking/tuntap.txt
  ...
2006-04-02 12:58:45 -07:00
Linus Torvalds 29e350944f splice: add SPLICE_F_NONBLOCK flag
It doesn't make the splice itself necessarily nonblocking (because the
actual file descriptors that are spliced from/to may block unless they
have the O_NONBLOCK flag set), but it makes the splice pipe operations
nonblocking.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-02 12:46:35 -07:00
Martin Waitz a580290c3e Documentation: fix minor kernel-doc warnings
This patch updates the comments to match the actual code.

Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-04-02 13:59:55 +02:00
Eric Sesterhenn 7ec7073809 BUG_ON() Conversion in fs/freevxfs/
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-04-02 13:41:02 +02:00
Eric Sesterhenn 2c2111c2bd BUG_ON() Conversion in fs/udf/
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-04-02 13:40:13 +02:00
Eric Sesterhenn d6735bfcc9 BUG_ON() Conversion in fs/sysv/
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-04-02 13:39:21 +02:00
Eric Sesterhenn b7542f8c7e BUG_ON() Conversion in fs/inode.c
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-04-02 13:38:18 +02:00
Eric Sesterhenn f6298aab2e BUG_ON() Conversion in fs/fcntl.c
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-04-02 13:37:19 +02:00
Eric Sesterhenn 8abf6a4707 BUG_ON() Conversion in fs/dquot.c
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-04-02 13:36:13 +02:00
Adrian Bunk 733f896927 Merge with git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git 2006-04-02 10:37:38 +02:00
Linus Torvalds 547a77ae62 Merge master.kernel.org:/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [CIFS] Fix typo in earlier cifs_unlink change and protect one
  [CIFS] Incorrect signature sent on SMB Read
  [CIFS] Fix unlink oops when indirectly called in rename error path
  [CIFS] Fix two remaining coverity scan tool warnings.
  [CIFS] Set correct lock type on new posix unlock call
  [CIFS] Upate cifs change log
  [CIFS] Fix slow oplock break response when mounts to different
  [CIFS] Workaround various server bugs found in testing at connectathon
  [CIFS] Allow fallback for setting file size to Procom SMB server when
  [CIFS] Make POSIX CIFS Extensions SetFSInfo match exactly what we want
  [CIFS] Move noisy debug message (triggerred by some older servers) from
  [CIFS] Use correct pid on new cifs posix byte range lock call
  [CIFS] Add posix (advisory) byte range locking support to cifs client
  [CIFS] CIFS readdir perf optimizations part 1
  [CIFS] Free small buffers earlier so we exceed the cifs
  [CIFS] Fix large (ie over 64K for MaxCIFSBufSize) buffer case for wrapping
  [CIFS] Convert remaining places in fs/cifs from
  [CIFS] SessionSetup cleanup part 2
  [CIFS] fix compile error (typo) and warning in cifssmb.c
  [CIFS] Cleanup NTLMSSP session setup handling
2006-03-31 21:27:53 -08:00
Eric Sesterhenn 99cee0cd75 BUG_ON() Conversion in fs/sysfs/
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-04-01 01:18:38 +02:00
Eric Sesterhenn 5df0d31241 BUG_ON() Conversion in fs/smbfs/
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-04-01 01:16:26 +02:00
Eric Sesterhenn 4b4d1cc733 BUG_ON() Conversion in fs/jffs2/
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-04-01 01:15:35 +02:00
Eric Sesterhenn 0bf3ba538a BUG_ON() Conversion in fs/hfsplus/
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-04-01 01:14:43 +02:00
Eric Sesterhenn 7dddb12c63 BUG_ON() Conversion in fs/exec.c
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-04-01 01:13:38 +02:00
Eric Sesterhenn d4569d2e69 BUG_ON() Conversion in fs/direct-io.c
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-04-01 01:10:13 +02:00
Steve French 06bcfedd05 [CIFS] Fix typo in earlier cifs_unlink change and protect one
extra path.

Since cifs_unlink can also be called from rename path and there
was one report of oops am making the extra check for null inode.

Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-03-31 22:43:50 +00:00
Steve French e9917a000f [CIFS] Incorrect signature sent on SMB Read
Fixes Samba bug 3621 and kernel.org bug 6147

For servers which require SMB/CIFS packet signing, we were sending the
wrong signature (all zeros) on SMB Read request.  The new cifs routine
to do signatures across an iovec was not complete - and SMB Read, unlike
the new SMBWrite2, did not fall back to the older routine (ie use
SendReceive vs. the more efficient SendReceive2 ie used the older
cifs_sign_smb vs. the disabled  cifs_sign_smb2) for calculating signatures.

This finishes up cifs_sign_smb2/cifs_calc_signature2 so that the callers
of SendReceive2 can get SMB/CIFS packet signatures.

Now that cifs_sign_smb2 is supported, we could start using it in
the write path but this smaller fix does not include the change
to use SMBWrite2 when signatures are required (which when enabled
will make more Writes more efficient and alloc less memory).
Currently Write2 is only used when signatures are not
required at the moment but after more testing we will enable
that as well).

Thanks to James Slepicka and Sam Flory for initial investigation.

Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-03-31 21:22:00 +00:00
Jes Sorensen 30c14e40ed [PATCH] avoid unaligned access when accessing poll stack
Commit 70674f95c0a2ea694d5c39f4e514f538a09be36f:

  [PATCH] Optimize select/poll by putting small data sets on the stack

resulted in the poll stack being 4-byte aligned on 64-bit architectures,
causing misaligned accesses to elements in the array.

This patch fixes it by declaring the stack in terms of 'long' instead
of 'char'.

Force alignment of poll and select stacks to long to avoid unaligned
access on 64 bit architectures.

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31 12:30:48 -08:00
Adrian Bunk a244e1698a [PATCH] fs/namei.c: make lookup_hash() static
As announced, lookup_hash() can now become static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31 12:19:01 -08:00
Eric W. Biederman 3e7e241f8c [PATCH] dcache: Add helper d_hash_and_lookup
It is very common to hash a dentry and then to call lookup.  If we take fs
specific hash functions into account the full hash logic can get ugly.
Further full_name_hash as an inline function is almost 100 bytes on x86 so
having a non-inline choice in some cases can measurably decrease code size.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31 12:19:00 -08:00
Herbert Poetzl e4e5d3fc80 [PATCH] cleanup in proc_check_chroot()
proc_check_chroot() does the check in a very unintuitive way (keeping a
copy of the argument, then modifying the argument), and has uncommented
sideeffects.

Signed-off-by: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31 12:18:59 -08:00
Trond Myklebust 993dfa8776 [PATCH] fs/locks.c: Fix sys_flock() race
sys_flock() currently has a race which can result in a double free in the
multi-thread case.

Thread 1			Thread 2

sys_flock(file, LOCK_EX)
				sys_flock(file, LOCK_UN)

If Thread 2 removes the lock from inode->i_lock before Thread 1 tests for
list_empty(&lock->fl_link) at the end of sys_flock, then both threads will
end up calling locks_free_lock for the same lock.

Fix is to make flock_lock_file() do the same as posix_lock_file(), namely
to make a copy of the request, so that the caller can always free the lock.

This also has the side-effect of fixing up a reference problem in the
lockd handling of flock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31 12:18:56 -08:00
Amy Griffis 7a2bd3f7ef [PATCH] inotify: IN_DELETE events missing
IN_DELETE events are no longer generated for the removal of a file from a
watched directory.

This seems to be a result of clearing DCACHE_INOTIFY_PARENT_WATCHED in
d_delete() directly before calling fsnotify_nameremove().

Assuming the flag doesn't need to be cleared before dentry_iput(), this
should do the trick.

Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Acked-by: Robert Love <rml@novell.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31 12:18:55 -08:00
OGAWA Hirofumi 094e320d76 [PATCH] fat: kill reserved names
Since these names on old MSDOS is used as device, so, current fat driver
doesn't allow a user to create those names.  But many OSes and even Windows
can create those names actually, now.

This patch removes the reserved name check.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31 12:18:55 -08:00
Andrew Morton f79e2abb9b [PATCH] sys_sync_file_range()
Remove the recently-added LINUX_FADV_ASYNC_WRITE and LINUX_FADV_WRITE_WAIT
fadvise() additions, do it in a new sys_sync_file_range() syscall instead.
Reasons:

- It's more flexible.  Things which would require two or three syscalls with
  fadvise() can be done in a single syscall.

- Using fadvise() in this manner is something not covered by POSIX.

The patch wires up the syscall for x86.

The sycall is implemented in the new fs/sync.c.  The intention is that we can
move sys_fsync(), sys_fdatasync() and perhaps sys_sync() into there later.

Documentation for the syscall is in fs/sync.c.

A test app (sync_file_range.c) is in
http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.

The available-to-GPL-modules do_sync_file_range() is for knfsd: "A COMMIT can
say NFS_DATA_SYNC or NFS_FILE_SYNC.  I can skip the ->fsync call for
NFS_DATA_SYNC which is hopefully the more common."

Note: the `async' writeout mode SYNC_FILE_RANGE_WRITE will turn synchronous if
the queue is congested.  This is trivial to fix: add a new flag bit, set
wbc->nonblocking.  But I'm not sure that we want to expose implementation
details down to that level.

Note: it's notable that we can sync an fd which wasn't opened for writing.
Same with fsync() and fdatasync()).

Note: the code takes some care to handle attempts to sync file contents
outside the 16TB offset on 32-bit machines.  It makes such attempts appear to
succeed, for best 32-bit/64-bit compatibility.  Perhaps it should make such
requests fail...

Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31 12:18:54 -08:00
Joe Korty 68eef3b479 [PATCH] Simplify proc/devices and fix early termination regression
Make baby-simple the code for /proc/devices.  Based on the proven design
for /proc/interrupts.

This also fixes the early-termination regression 2.6.16 introduced, as
demonstrated by:

    # dd if=/proc/devices bs=1
    Character devices:
      1 mem
    27+0 records in
    27+0 records out

This should also work (but is untested) when /proc/devices >4096 bytes,
which I believe is what the original 2.6.16 rewrite fixed.

[akpm@osdl.org: cleanups, simplifications]
Signed-off-by: Joe Korty <joe.korty@ccur.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31 12:18:53 -08:00
Miklos Szeredi 5ce29646eb [PATCH] locks: don't panic
Don't panic!  Just BUG_ON().

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31 12:18:52 -08:00
Al Viro 347f217cec [PATCH] uml: __user annotations
__user annotations (hppfs)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31 12:18:51 -08:00
Steve French f1682e94a3 Merge with /pub/scm/linux/kernel/git/torvalds/linux-2.6.git
Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-03-31 15:43:35 +00:00
Jeff Garzik a0f0678025 [PATCH] splice exports
Woe be unto he who builds their filesystems as modules.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
[ Obscure quote from the infamous geek bible? ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-30 22:16:24 -08:00
Steve French 6910ab30a2 [CIFS] Fix unlink oops when indirectly called in rename error path
under heavy stress.

Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-03-31 03:37:08 +00:00
Steve French d62e54abca Merge with /pub/scm/linux/kernel/git/torvalds/linux-2.6.git
Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-03-31 03:35:56 +00:00
Nathan Scott 1b895840ce [XFS] Provide XFS support for the splice syscall.
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-31 13:08:59 +10:00
Nathan Scott 3bbcc8e397 [XFS] Reenable write barriers by default.
SGI-PV: 912426
SGI-Modid: xfs-linux-melb:xfs-kern:25634a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-31 13:04:56 +10:00
Nathan Scott 9a2a7de268 [XFS] Make project quota enforcement return an error code consistent with
its use.

SGI-PV: 951300
SGI-Modid: xfs-linux-melb:xfs-kern:25633a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-31 13:04:49 +10:00
Nathan Scott 764d1f89a5 [XFS] Implement the silent parameter to fill_super, previously ignored.
SGI-PV: 951299
SGI-Modid: xfs-linux-melb:xfs-kern:25632a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-31 13:04:17 +10:00
Mandy Kirkconnell 4b4fa25ced [XFS] Cleanup comment to remove reference to obsoleted function
xfs_bmap_do_search_extents().

SGI-PV: 951415
SGI-Modid: xfs-linux-melb:xfs-kern:208491a

Signed-off-by: Mandy Kirkconnell <alkirkco@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-31 13:03:58 +10:00
Jens Axboe 5abc97aa25 [PATCH] splice: add support for SPLICE_F_MOVE flag
This enables the caller to migrate pages from one address space page
cache to another.  In buzz word marketing, you can do zero-copy file
copies!

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-30 12:28:18 -08:00
Jens Axboe 5274f052e7 [PATCH] Introduce sys_splice() system call
This adds support for the sys_splice system call. Using a pipe as a
transport, it can connect to files or sockets (latter as output only).

From the splice.c comments:

   "splice": joining two ropes together by interweaving their strands.

   This is the "extended pipe" functionality, where a pipe is used as
   an arbitrary in-memory buffer. Think of a pipe as a small kernel
   buffer that you can use to transfer data from one end to the other.

   The traditional unix read/write is extended with a "splice()" operation
   that transfers data buffers to or from a pipe buffer.

   Named by Larry McVoy, original implementation from Linus, extended by
   Jens to support splicing to files and fixing the initial implementation
   bugs.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-30 12:28:18 -08:00
Linus Torvalds 76babde121 Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (67 commits)
  [PATCH] powerpc: Remove oprofile spinlock backtrace code
  [PATCH] powerpc: Add oprofile calltrace support to all powerpc cpus
  [PATCH] powerpc: Add oprofile calltrace support
  [PATCH] for_each_possible_cpu: ppc
  [PATCH] for_each_possible_cpu: powerpc
  [PATCH] lock PTE before updating it in 440/BookE page fault handler
  [PATCH] powerpc: Kill _machine and hard-coded platform numbers
  ppc: Fix compile error in arch/ppc/lib/strcase.c
  [PATCH] git-powerpc: WARN was a dumb idea
  [PATCH] powerpc: a couple of trivial compile warning fixes
  powerpc: remove OCP references
  powerpc: Make uImage default build output for MPC8540 ADS
  powerpc: move math-emu over to arch/powerpc
  powerpc: use memparse() for mem= command line parsing
  ppc: fix strncasecmp prototype
  [PATCH] powerpc: make ISA floppies work again
  [PATCH] powerpc: Fix some initcall return values
  [PATCH] powerpc: Workaround for pSeries RTAS bug
  [PATCH] spufs: fix __init/__exit annotations
  [PATCH] powerpc: add hvc backend for rtas
  ...
2006-03-29 11:28:30 -08:00
Linus Torvalds e71ac6032e Merge git://oss.sgi.com:8090/oss/git/xfs-2.6
* git://oss.sgi.com:8090/oss/git/xfs-2.6:
  [XFS] Cleanup in XFS after recent get_block_t interface tweaks.
  [XFS] Remove unused/obsoleted function: xfs_bmap_do_search_extents()
  [XFS] A change to inode chunk allocation to try allocating the new chunk
  Fixes a regression from the recent "remove ->get_blocks() support"
  [XFS] Fix compiler warning and small code inconsistencies in compat
  [XFS] We really suck at spulling.  Thanks to Chris Pascoe for fixing all
2006-03-29 08:55:36 -08:00
Oleg Nesterov aa1757f90b [PATCH] convert sighand_cache to use SLAB_DESTROY_BY_RCU
This patch borrows a clever Hugh's 'struct anon_vma' trick.

Without tasklist_lock held we can't trust task->sighand until we locked it
and re-checked that it is still the same.

But this means we don't need to defer 'kmem_cache_free(sighand)'.  We can
return the memory to slab immediately, all we need is to be sure that
sighand->siglock can't dissapear inside rcu protected section.

To do so we need to initialize ->siglock inside ctor function,
SLAB_DESTROY_BY_RCU does the rest.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:42 -08:00
Oleg Nesterov 8fafabd86f [PATCH] remove add_parent()'s parent argument
add_parent(p, parent) is always called with parent == p->parent, and it makes
no sense to do it differently.  This patch removes this argument.

No changes in affected .o files.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:41 -08:00
Eric W. Biederman d73d65293e [PATCH] pidhash: kill switch_exec_pids
switch_exec_pids is only called from de_thread by way of exec, and it is
only called when we are exec'ing from a non thread group leader.

Currently switch_exec_pids gives the leader the pid of the thread and
unhashes and rehashes all of the process groups.  The leader is already in
the EXIT_DEAD state so no one cares about it's pids.  The only concern for
the leader is that __unhash_process called from release_task will function
correctly.  If we don't touch the leader at all we know that
__unhash_process will work fine so there is no need to touch the leader.

For the task becomming the thread group leader, we just need to give it the
pid of the old thread group leader, add it to the task list, and attach it
to the session and the process group of the thread group.

Currently de_thread is also adding the task to the task list which is just
silly.

Currently the only leader of __detach_pid besides detach_pid is
switch_exec_pids because of the ugly extra work that was being
performed.

So this patch removes switch_exec_pids because it is doing too much, it is
creating an unnecessary special case in pid.c, duing work duplicated in
de_thread, and generally obscuring what it is going on.

The necessary work is added to de_thread, and it seems to be a little
clearer there what is going on.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:40 -08:00
Oleg Nesterov 1434261c07 [PATCH] simplify exec from init's subthread
I think it is enough to take tasklist_lock for reading while changing
child_reaper:

	Reparenting needs write_lock(tasklist_lock)

	Only one thread in a thread group can do exec()

	sighand->siglock garantees that get_signal_to_deliver()
	will not see a stale value of child_reaper.

This means that we can change child_reaper earlier, without calling
zap_other_threads() twice.

"child_reaper = current" is a NOOP when init does exec from main thread, we
don't care.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:40 -08:00
Eric W. Biederman fef23e7fbb [PATCH] exec: allow init to exec from any thread.
After looking at the problem of init calling exec some more I figured out
an easy way to make the code work.

The actual symptom without out this patch is that all threads will die
except pid == 1, and the thread calling exec.  The thread calling exec will
wait forever for pid == 1 to die.

Since pid == 1 does not install a handler for SIGKILL it will never die.

This modifies the tests for init from current->pid == 1 to the equivalent
current == child_reaper.  And then it causes exec in the ugly case to
modify child_reaper.

The only weird symptom is that you wind up with an init process that
doesn't have the oldest start time on the box.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:40 -08:00
Paul Mackerras bac30d1a78 Merge ../linux-2.6 2006-03-29 13:24:50 +11:00
Nathan Scott c25366680b [XFS] Cleanup in XFS after recent get_block_t interface tweaks.
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-29 10:44:40 +10:00
Mandy Kirkconnell 0b7e56a450 [XFS] Remove unused/obsoleted function: xfs_bmap_do_search_extents()
SGI-PV: 951415
SGI-Modid: xfs-linux-melb:xfs-kern:208490a

Signed-off-by: Mandy Kirkconnell <alkirkco@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-29 09:53:03 +10:00
Glen Overby 3ccb8b5f65 [XFS] A change to inode chunk allocation to try allocating the new chunk
contiguous with the most recently allocated chunk.  On a striped
filesystem, this will fill a stripe unit with inodes before allocating new
inodes in another stripe unit.

SGI-PV: 951416
SGI-Modid: xfs-linux-melb:xfs-kern:208488a

Signed-off-by: Glen Overby <overby@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-29 09:52:28 +10:00
Nathan Scott 3c674e7423 Fixes a regression from the recent "remove ->get_blocks() support"
change.  inode->i_blkbits should be used when making a get_block_t
request of a filesystem instead of dio->blkbits, as that does not
indicate the filesystem block size all the time (depends on request
alignment - see start of __blockdev_direct_IO).

Signed-off-by: Nathan Scott <nathans@sgi.com>
Acked-by: Badari Pulavarty <pbadari@us.ibm.com>
2006-03-29 09:26:15 +10:00
Nathan Scott e0edd5962b [XFS] Fix compiler warning and small code inconsistencies in compat
ioctl32 land.

SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:25590a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-29 08:55:47 +10:00
Nathan Scott c41564b5af [XFS] We really suck at spulling. Thanks to Chris Pascoe for fixing all
these typos.

SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:25539a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-03-29 08:55:14 +10:00
Alexey Dobriyan 7f927fcc2f [PATCH] Typo fixes
Fix a lot of typos.  Eyeballed by jmc@ in OpenBSD.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:08 -08:00
Arjan van de Ven 4b6f5d20b0 [PATCH] Make most file operations structs in fs/ const
This is a conversion to make the various file_operations structs in fs/
const.  Basically a regexp job, with a few manual fixups

The goal is both to increase correctness (harder to accidentally write to
shared datastructures) and reducing the false sharing of cachelines with
things that get dirty in .data (while .rodata is nicely read only and thus
cache clean)

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:06 -08:00
Arjan van de Ven 99ac48f54a [PATCH] mark f_ops const in the inode
Mark the f_ops members of inodes as const, as well as fix the
ripple-through this causes by places that copy this f_ops and then "do
stuff" with it.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:05 -08:00
KAMEZAWA Hiroyuki 0a94502277 [PATCH] for_each_possible_cpu: fixes for generic part
replaces for_each_cpu with for_each_possible_cpu().

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:05 -08:00
Vadim Lobanov 68c3431ae2 [PATCH] Fold select_bits_alloc/free into caller code.
Remove an unnecessary level of indirection in allocating and freeing select
bits, as per the select_bits_alloc() and select_bits_free() functions.
Both select.c and compat.c are updated.

Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:04 -08:00
Eric Dumazet e4a1f129f9 [PATCH] use fget_light() in select/poll
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:04 -08:00
Andi Kleen 70674f95c0 [PATCH] Optimize select/poll by putting small data sets on the stack
Optimize select and poll by a using stack space for small fd sets

This brings back an old optimization from Linux 2.0.  Using the stack is
faster than kmalloc.  On a Intel P4 system it speeds up a select of a
single pty fd by about 13% (~4000 cycles -> ~3500)

It also saves memory because a daemon hanging in select or poll will
usually save one or two less pages.  This can add up - e.g.  if you have 10
daemons blocking in poll/select you save 40KB of memory.

I did a patch for this long ago, but it was never applied.  This version is
a reimplementation of the old patch that tries to be less intrusive.  I
only did the minimal changes needed for the stack allocation.

The cut off point before external memory is allocated is currently at
832bytes.  The system calls always allocate this much memory on the stack.

These 832 bytes are divided into 256 bytes frontend data (for the select
bitmaps of the pollfds) and the rest of the space for the wait queues used
by the low level drivers.  There are some extreme cases where this won't
work out for select and it falls back to allocating memory too early -
especially with very sparse large select bitmaps - but the majority of
processes who only have a small number of file descriptors should be ok.
[TBD: 832/256 might not be the best split for select or poll]

I suspect more optimizations might be possible, but they would be more
complicated.  One way would be to cache the select/poll context over
multiple system calls because typically the input values should be similar.
 Problem is when to flush the file descriptors out though.

Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:04 -08:00
Adrian Bunk f5b95ff010 [PATCH] autofs4: proper prototype for autofs4_dentry_release()
Add a proper prototype for autofs4_dentry_release() to autofs_i.h.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:03 -08:00
Adrian Bunk a28af471b8 [PATCH] fs/fat/: proper prototypes for two functions
Add proper prototypes for fat_cache_init() and fat_cache_destroy() in
msdos_fs.h.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:03 -08:00
Benjamin Herrenschmidt e8222502ee [PATCH] powerpc: Kill _machine and hard-coded platform numbers
This removes statically assigned platform numbers and reworks the
powerpc platform probe code to use a better mechanism.  With this,
board support files can simply declare a new machine type with a
macro, and implement a probe() function that uses the flattened
device-tree to detect if they apply for a given machine.

We now have a machine_is() macro that replaces the comparisons of
_machine with the various PLATFORM_* constants.  This commit also
changes various drivers to use the new macro instead of looking at
_machine.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28 23:15:54 +11:00
Michael Ellerman 5149fa47ec [PATCH] powerpc: Cope with duplicate node & property names in /proc/device-tree
Various dodgy firmware might give us nodes and/or properties in the device
tree with conflicting names. That's generally ok, except for when we export
the device tree via /proc, so check when we're creating the proc device tree
and munge names accordingly.

Tested on a faked device tree with kexec, would be good if someone with
actual bogus firmware could try it, but just for completeness.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28 16:45:23 +11:00
Jun'ichi Nomura b4cf1b72ee [PATCH] dm/md dependency tree in sysfs: convert bd_sem to bd_mutex
Convert bd_sem to bd_mutex

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:00 -08:00
Jun'ichi Nomura 641dc636b0 [PATCH] dm/md dependency tree in sysfs: bd_claim_by_kobject
Adding bd_claim_by_kobject() function which takes kobject as additional
signature of holder device and creates sysfs symlinks between holder device
and claimed device.  bd_release_from_kobject() is a counterpart of
bd_claim_by_kobject.

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:00 -08:00
Andrew Morton 100873687d [PATCH] dm-md-dependency-tree-in-sysfs-holders-slaves-subdirectory-tidy
Remove all the CONFIG_SYSFS stuff.  That's supposed to all be implemented up
in header files.

Yes, the CONFIG_SYSFS=n data structures will be a little larger than
necessary, but that's a tradeoff we can decide to make.

Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:59 -08:00
Jun'ichi Nomura 6a4d44c1f1 [PATCH] dm/md dependency tree in sysfs: holders/slaves subdirectory
Creating "slaves" and "holders" directories in /sys/block/<disk> and
creating "holders" directory under /sys/block/<disk>/<partition>

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:59 -08:00
KAMEZAWA Hiroyuki ec936fc563 [PATCH] for_each_online_pgdat: renaming for_each_pgdat
Replace for_each_pgdat() with for_each_online_pgdat().

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:48 -08:00
Adrian Bunk 74cae61ab4 [PATCH] fs/nfsd/export.c,net/sunrpc/cache.c: make needlessly global code static
We can now make some code static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:43 -08:00
NeilBrown baab935ff3 [PATCH] knfsd: Convert sunrpc_cache to use krefs
.. it makes some of the code nicer.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:43 -08:00
NeilBrown f9ecc921b5 [PATCH] knfsd: Use new cache code for name/id lookup caches
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:42 -08:00
NeilBrown 8d270f7f4c [PATCH] knfsd: Use new cache_lookup for svc_expkey cache
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:42 -08:00
NeilBrown 4f7774c3a0 [PATCH] knfsd: Use new cache_lookup for svc_export
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:42 -08:00
NeilBrown 7d317f2c9f [PATCH] knfsd: Get rid of 'inplace' sunrpc caches
These were an unnecessary wart.  Also only have one 'DefineSimpleCache..'
instead of two.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:41 -08:00
NeilBrown eab7e2e647 [PATCH] knfsd: Break the hard linkage from svc_expkey to svc_export
Current svc_expkey holds a pointer to the svc_export structure, so updates to
that structure have to be in-place, which is a wart on the whole cache
infrastruct.  So we break that linkage and just do a second lookup.

If this became a performance issue, it would be possible to put a direct link
back in which was only used conditionally.  i.e.  when an object is replaced
in the cache, we set a flag in the old object.  When dereferencing the link
from svc_expkey, if the flag is set, we drop the reference and do a fresh
lookup.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:41 -08:00
NeilBrown efc36aa560 [PATCH] knfsd: Change the store of auth_domains to not be a 'cache'
The 'auth_domain's are simply handles on internal data structures.  They do
not cache information from user-space, and forcing them into the mold of a
'cache' misrepresents their true nature and causes confusion.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:41 -08:00
Ian Kent 3e7b191980 [PATCH] autofs4: atomic var underflow
Fix accidental underflow of the atomic counter.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:41 -08:00
Ian Kent 871f94344c [PATCH] autofs4: follow_link missing functionality
This functionality is also need for operation of autofs v5 direct mounts.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:41 -08:00
Dave Jones 3370c74b2e [PATCH] Remove redundant check from autofs4_put_super
We have to have a valid sbi here, or we'd have oopsed already.  (There's a
dereference of sbi->catatonic a few lines above)

Coverity #740

Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:41 -08:00
Ian Kent 44d53eb041 [PATCH] autofs4: change AUTOFS_TYP_* AUTOFS_TYPE_*
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:41 -08:00
Ian Kent 5c0a32fc2c [PATCH] autofs4: add new packet type for v5 communications
This patch define a new autofs packet for autofs v5 and updates the waitq.c
functions to handle the additional packet type.

Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:40 -08:00
Ian Kent 3a15e2ab5d [PATCH] autofs4: add v5 expire logic
This patch adds expire logic for autofs direct mounts.

Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:40 -08:00
Ian Kent 34ca959cfc [PATCH] autofs4: add v5 follow_link mount trigger method
This patch adds a follow_link inode method for the root of an autofs direct
mount trigger.  It also adds the corresponding mount options and updates the
show_mount method.

Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:40 -08:00
Ian Kent 051d381259 [PATCH] autofs4: nameidata needs to be up to date for follow_link
In order to be able to trigger a mount using the follow_link inode method the
nameidata struct that is passed in needs to have the vfsmount of the autofs
trigger not its parent.

During a path walk if an autofs trigger is mounted on a dentry, when the
follow_link method is called, the nameidata struct contains the vfsmount and
mountpoint dentry of the parent mount while the dentry that is passed in is
the root of the autofs trigger mount.  I believe it is impossible to get the
vfsmount of the trigger mount, within the follow_link method, when only the
parent vfsmount and the root dentry of the trigger mount are known.

This patch updates the nameidata struct on entry to __do_follow_link if it
detects that it is out of date.  It moves the path_to_nameidata to above
__do_follow_link to facilitate calling it from there.  The dput_path is moved
as well as that seemed sensible.  No changes are made to these two functions.

Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:40 -08:00
Ian Kent e3474a8eb3 [PATCH] autofs4: change may_umount* functions to boolean
Change the functions may_umount and may_umount_tree to boolean functions to
aid code readability.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:40 -08:00
Ian Kent 90a59c7cf5 [PATCH] autofs4: rename simple_empty_nolock function
Rename the function simple_empty_nolock to __simple_empty in line with kernel
naming conventions.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:40 -08:00
Ian Kent e77fbddf77 [PATCH] autofs4: white space cleanup for waitq.c
Whitespace and formating changes to waitq code.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:40 -08:00
Ian Kent d7c4a5f108 [PATCH] autofs4: add a show mount options for proc filesystem
Add show_options method to display autofs4 mount options in the proc
filesystem.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:40 -08:00
Ian Kent 862b110f01 [PATCH] autofs4: remove update_atime unused function
Remove the update of i_atime from autofs4 in favour of having VFS update it.
i_atime is never used for expire in autofs4.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:39 -08:00
Ian Kent e0a7aae940 [PATCH] autofs4: expire mounts that hold no (extra) references only
Alter the expire semantics that define how "busyness" is determined.
Currently a last_used counter is updated on every revalidate from processes
other than the mount owner process group.

This patch changes that so that an expire candidate is busy only if it has a
reference count greater than the expected minimum, such as when there is an
open file or working directory in use.

This method is the only way that busyness can be established for direct mounts
within the new implementation.  For consistency the expire semantic is made
the same for all mounts.

A side effect of the patch is that mounts which remain mounted unessessarily
in the presence of some GUI programs that scan the filesystem should now
expire.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:39 -08:00
Ian Kent 1aff3c8b05 [PATCH] autofs4: fix false negative return from expire
Fix the case where an expire returns busy on a tree mount when it is in fact
not busy.  This case was overlooked when the patch to prevent the expiring
away of "scaffolding" directories for tree mounts was applied.

The problem arises when a tree of mounts is a member of a map with other keys.
 The current logic will not expire the tree if any other mount in the map is
busy.  The solution is to maintain a "minimum" use count for each autofs
dentry and compare this to the actual dentry usage count during expire.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:39 -08:00
Ian Kent 1ce12bad85 [PATCH] autofs4: simplify expire tree traversal
Simplify the expire tree traversal code by using a function from namespace.c
to calculate the next entry in the top down tree traversals carried out during
the expire operation.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:39 -08:00
Ian Kent 1f5f2c3059 [PATCH] autofs4: expire code readability cleanup
Change the names of the boolean functions autofs4_check_mount and
autofs4_check_tree to autofs4_mount_busy and autofs4_tree_busy respectively
and alters their return codes to suit in order to aid code readabilty.

A couple of white space cleanups are included as well.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:39 -08:00
Ian Kent 2d753e62b8 [PATCH] autofs4: can't mount due to mount point dir not empty
Addresse a problem where stale dentrys stop mounts from happening.

When a mount point directory is pre-created and a non-existent entry within it
is requested a dentry ends up being created within the mount point directory
which stops future mounts.  The problem is solved by ignoring negative,
unhashed dentrys in the mount point d_subdirs list.

Additionally the apparent cacheing of -ENOENT returns from requests is
removed.  The test on d_time is a tautology and d_time is not initialised and
has an unexpected value.  In short it doesn't do what it's meant to.

The cacheing of failed requests to the daemon is important and will be
followed up later.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:39 -08:00
Ian Kent f360ce3be4 [PATCH] autofs4: use libfs routines for readdir
Change readdir routines to use the cursor based routines in libfs.c.  This
removes reliance on old readdir code from 2.4 and should improve efficiency of
readdir in autofs4.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:39 -08:00
Ian Kent 718c604a28 [PATCH] autofs4: lookup white space cleanup
Whitespace and formating changes to lookup code.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:39 -08:00
Jeff Dike 86c79cbcee [PATCH] uml: fix hostfs stack corruption
Noted by Oleg Drokin:
We initialized an extra slot of struct kstatfs.spare, sometimes
causing stack corruption.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Oleg Drokin <green@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:39 -08:00
Linus Torvalds 9ae21d1bb3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
  drivers/char/ftape/lowlevel/fdc-io.c: Correct a comment
  Kconfig help: MTD_JEDECPROBE already supports Intel
  Remove ugly debugging stuff
  do_mounts.c: Minor ROOT_DEV comment cleanup
  BUG_ON() Conversion in drivers/s390/block/dasd_devmap.c
  BUG_ON() Conversion in mm/mempool.c
  BUG_ON() Conversion in mm/memory.c
  BUG_ON() Conversion in kernel/fork.c
  BUG_ON() Conversion in ipc/sem.c
  BUG_ON() Conversion in fs/ext2/
  BUG_ON() Conversion in fs/hfs/
  BUG_ON() Conversion in fs/dcache.c
  BUG_ON() Conversion in fs/buffer.c
  BUG_ON() Conversion in input/serio/hp_sdc_mlc.c
  BUG_ON() Conversion in md/dm-table.c
  BUG_ON() Conversion in md/dm-path-selector.c
  BUG_ON() Conversion in drivers/isdn
  BUG_ON() Conversion in drivers/char
  BUG_ON() Conversion in drivers/mtd/
2006-03-26 09:41:18 -08:00
Artem B. Bityuckiy 3be7d29fb9 Remove ugly debugging stuff
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-26 18:58:31 +02:00