linux

q3k/linux

Author	SHA1	Message	Date
Eric Sandeen	16317ec2e5	ecryptfs: redo dget,mntget on dentry_open failure Thanks to Jeff Moyer for pointing this out. If the RDWR dentry_open() in ecryptfs_init_persistent_file fails, it will do a dput/mntput. Need to re-take references if we retry as RDONLY. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Mike Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-23 12:54:37 -08:00
Eric Sandeen	c8161f64cc	ecryptfs: fix unlocking in error paths Thanks to Josef Bacik for finding these. A couple of ecryptfs error paths don't properly unlock things they locked. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: Josef Bacik <jbacik@redhat.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-23 12:54:37 -08:00
Jan Kara	c525460e27	Don't send quota messages repeatedly when hardlimit reached We should send quota message to netlink only once when hardlimit is reached. Otherwise user could easily make the system busy by trying to exceed the hardlimit (and also the messages could be anoying if you cannot stop writing just now). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-23 12:54:36 -08:00
Jan Kara	22dd483721	Fix computation of SKB size for quota messages Fix computation of size of skb needed for quota message. We should use netlink provided functions and not just an ad-hoc number. Also don't print the return value from nla_put_foo() as it is always -1. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-23 12:54:36 -08:00
Eric Sandeen	b88629060b	ecryptfs: fix string overflow on long cipher names Passing a cipher name > 32 chars on mount results in an overflow when the cipher name is printed, because the last character in the struct ecryptfs_key_tfm's cipher_name string was never zeroed. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-23 12:54:36 -08:00
Linus Torvalds	2b5baad165	Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6 * 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: [XFS] Initialise current offset in xfs_file_readdir correctly [XFS] Fix mknod regression	2007-12-20 17:02:22 -08:00
Lachlan McIlroy	4743e0ec12	[XFS] Initialise current offset in xfs_file_readdir correctly After reading the directory contents into the temporary buffer, we grab each dirent and pass it to filldir witht eh current offset of the dirent. The current offset was not being set for the first dirent in the temporary buffer, which coul dresult in bad offsets being set in the f_pos field result in looping and duplicate entries being returned from readdir. SGI-PV: 974905 SGI-Modid: xfs-linux-melb:xfs-kern:30282a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2007-12-21 11:40:05 +11:00
Christoph Hellwig	bad60fdd14	[XFS] Fix mknod regression This was broken by my '[XFS] simplify xfs_create/mknod/symlink prototype', which assigned the re-shuffled ondisk dev_t back to the rdev variable in xfs_vn_mknod. Because of that i_rdev is set to the ondisk dev_t instead of the linux dev_t later down the function. Fortunately the fix for it is trivial: we can just remove the assignment because xfs_revalidate_inode has done the proper job before unlocking the inode. SGI-PV: 974873 SGI-Modid: xfs-linux-melb:xfs-kern:30273a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2007-12-21 11:39:58 +11:00
Ivan Kokshaysky	3c378158d4	mm: fix exit_mmap BUG() on a.out binary exit The problem was introduced by commit "mm: variable length argument support" (`b6a2fea393`) as it didn't update fs/binfmt_aout.c like other binfmt's. I noticed that on alpha when accidentally launched old OSF/1 Acrobat Reader binary. Obviously, other architectures are affected as well. Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Ollie Wild <aaw@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hugh@veritas.com> Cc: Adrian Bunk <bunk@stusta.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-20 07:49:53 -08:00
Lachlan McIlroy	041388b54e	[XFS] Put the correct offset in dirent d_off The recent filldir regression fix was not putting the correct d_off in each dirent. This was resulting in incorrect cookies being passed to dmapi ioctls and the wrong offset appearing in the dirents. readdir was unaffected as the filp->f_pos was being updated with the correct offset and this was being written into the last dirent in each buffer. Fix the XFS code to do the right thing. SGI-PV: 973746 SGI-Modid: xfs-linux-melb:xfs-kern:30240a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2007-12-18 17:16:23 +11:00
Lachlan McIlroy	c734c79bc3	[XFS] Don't wait for pending I/Os when purging blocks beyond eof. On last close of a file we purge blocks beyond eof. The same code is used when we truncate the file size down. In this case we need to wait for any pending I/Os for dirty pages beyond the new eof. For the last close case we are not changing the file size and therefore do not need to wait for any I/Os to complete. This fixes a performance bottleneck where writes into the page cache and cache flushes can become mutually exclusive. SGI-PV: 964002 SGI-Modid: xfs-linux-melb:xfs-kern:30220a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Peter Leckie <pleckie@sgi.com>	2007-12-18 17:16:17 +11:00
Jan Kara	087ee8d5be	Fix compilation warning in dquot.c Fix compilation warning about discarded const. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-17 19:28:17 -08:00
Eric Sandeen	7a3f595cc8	ecryptfs: fix fsx data corruption problems ecryptfs in 2.6.24-rc3 wasn't surviving fsx for me at all, dying after 4 ops. Generally, encountering problems with stale data and improperly zeroed pages. An extending truncate + write for example would expose stale data. With the changes below I got to a million ops and beyond with all mmap ops disabled - mmap still needs work. (A version of this patch on a RHEL5 kernel ran for over 110 million fsx ops) I added a few comments as well, to the best of my understanding as I read through the code. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-17 19:28:17 -08:00
Eric Sandeen	7c9e70efbf	ecryptfs: set s_blocksize from lower fs in sb eCryptfs wasn't setting s_blocksize in it's superblock; just pick it up from the lower FS. Having an s_blocksize of 0 made things like "filefrag" which call FIGETBSZ unhappy. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Mike Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-17 19:28:17 -08:00
Andries E. Brouwer	b47b6f38e5	ext3, ext4: avoid divide by zero As it turns out, the kernel divides by EXT3_INODES_PER_GROUP(s) when mounting an ext3 filesystem. If that number is zero, a crash follows. Below a patch. This crash was reported by Joeri de Ruiter, Carst Tankink and Pim Vullers. Cc: <linux-ext4@vger.kernel.org> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-17 19:28:16 -08:00
Uwe Kleine-König	9e2de407be	fs/Kconfig: grammar fix This was introduced in 4af8e944c22d8af92a7548354a9567250cc1a782 Signed-off-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-17 19:28:16 -08:00
Eric Sandeen	459e216429	ecryptfs: initialize new auth_tokens before teardown ecryptfs_destroy_mount_crypt_stat() checks whether each auth_tok->global_auth_tok_key is nonzero and if so puts that key. However, in some early mount error paths nothing has initialized the pointer, and we try to key_put() garbage. Running the bad cipher tests in the testsuite exposes this, and it's happy with the following change. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-17 19:28:15 -08:00
Linus Torvalds	2cc3a8f6ac	Merge git://git.linux-nfs.org/pub/linux/nfs-2.6 * git://git.linux-nfs.org/pub/linux/nfs-2.6: MAINTAINERS: update the NFS CLIENT entry NFS: Fix an Oops in NFS unmount Revert "NFS: Ensure we return zero if applications attempt to write zero bytes" SUNRPC xprtrdma: fix XDR tail buf marshalling for all ops NFSv2/v3: Fix a memory leak when using -onolock NFS: Fix NFS mountpoint crossing...	2007-12-17 13:36:17 -08:00
Mark Fasheh	e8aed3450c	ocfs2: Re-journal buffers after transaction extend ocfs2_extend_trans() might call journal_restart() which will commit dirty buffers and then restart the transaction. This means that any buffers which still need changes should be passed to journal_access() again. Some paths during extend weren't doing this right. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-12-17 10:51:23 -08:00
Mark Fasheh	0879c584ff	ocfs2: Allow for debugging of transaction extends The nastiest cases of transaction extends are also the rarest. We can expose them more quickly at the expense of performance by going straight to the journal_restart() in ocfs2_extend_trans(). Wrap things in OCFS2_DEBUG_FS so that we only do this when "expensive debugging" is turned on. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-12-17 10:51:14 -08:00
Mark Fasheh	92295d8054	ocfs2: Don't panic when truncating an empty extent This BUG_ON() was unintentionally left in after the sparse file support was written. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-12-17 10:51:04 -08:00
Mark Fasheh	a86370fbb6	ocfs2: fix exit-while-locked bug in ocfs2_queue_orphans() We're holding the cluster lock when a failure might happen in ocfs2_dir_foreach() so it needs to be released. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-12-17 10:49:43 -08:00
Trond Myklebust	a10db50a4a	NFS: Fix an Oops in NFS unmount Ensure that the dummy 'root dentry' is invisible to d_find_alias(). If not, then it may be spliced into the tree if a parent directory from the same filesystem gets mounted at a later time. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-12-12 11:12:15 -05:00
Trond Myklebust	a5576cfa5c	Revert "NFS: Ensure we return zero if applications attempt to write zero bytes" This reverts commit `b9148c6b80`. On Wed, 12 Dec 2007 10:57:30 -0500, Chuck Lever wrote > commit `b9148c6b` should be reverted. It was recently forward-ported > from some years-old patches, and is clearly not needed now. > > On Dec 11, 2007, at 5:21 PM, Adrian Bunk wrote: > >> This code became dead after commit >> `b9148c6b80` >> (which BTW doesn't seem to have changed any behaviour) and can >> therefore >> be removed. >> >> Spotted by the Coverity checker. >> >> Signed-off-by: Adrian Bunk <bunk@kernel.org> >> >> --- >> --- linux-2.6/fs/nfs/direct.c.old 2007-12-02 21:54:53.000000000 +0100 >> +++ linux-2.6/fs/nfs/direct.c 2007-12-02 21:55:10.000000000 +0100 >> @@ -897,15 +897,12 @@ ssize_t nfs_file_direct_write(struct kio >> if (!count) >> goto out; /* return 0 */ >> >> retval = -EINVAL; >> if ((ssize_t) count < 0) >> goto out; >> - retval = 0; >> - if (!count) >> - goto out; >> >> retval = nfs_sync_mapping(mapping); >> if (retval) >> goto out; >> >> retval = nfs_direct_write(iocb, iov, nr_segs, pos, count); >> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-12-12 11:08:33 -05:00
Trond Myklebust	5cef338b30	NFSv2/v3: Fix a memory leak when using -onolock Neil Brown said: > Hi Trond, > > We found that a machine which made moderately heavy use of > 'automount' was leaking some nfs data structures - particularly the > 4K allocated by rpc_alloc_iostats. > It turns out that this only happens with filesystems with -onolock > set. > The problem is that if NFS_MOUNT_NONLM is set, nfs_start_lockd doesn't > set server->destroy, so when the filesystem is unmounted, the > ->client_acl is not shutdown, and so several resources are still > held. Multiple mount/umount cycles will slowly eat away memory > several pages at a time. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: NeilBrown <neilb@suse.de>	2007-12-11 22:01:56 -05:00
Trond Myklebust	4584f520e1	NFS: Fix NFS mountpoint crossing... The check that was added to nfs_xdev_get_sb() to work around broken servers, works fine for NFSv2, but causes mountpoint crossing on NFSv3 to always return ESTALE. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-12-11 19:01:45 -05:00
Eric W. Biederman	3790ee4bd8	proc: remove/Fix proc generic d_revalidate Ultimately to implement /proc perfectly we need an implementation of d_revalidate because files and directories can be removed behind the back of the VFS, and d_revalidate is the only way we can let the VFS know that this has happened. Unfortunately the linux VFS can not cope with anything in the path to a mount point going away. So a proper d_revalidate method that calls d_drop also needs to call have_submounts which is moderately expensive, so you really don't want a d_revalidate method that unconditionally calls it, but instead only calls it when the backing object has really gone away. proc generic entries only disappear on module_unload (when not counting the fledgling network namespace) so it is quite rare that we actually encounter that case and has not actually caused us real world trouble yet. So until we get a proper test for keeping dentries in the dcache fix the current d_revalidate method by completely removing it. This returns us to the current status quo. So with CONFIG_NETNS=n things should look as they have always looked. For CONFIG_NETNS=y things work most of the time but there are a few rare corner cases that don't behave properly. As the network namespace is barely present in 2.6.24 this should not be a problem. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "Denis V. Lunev" <den@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-10 19:43:55 -08:00
Linus Torvalds	41f81e88e0	Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6 * 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: [XFS] Fix xfs_ichgtime()s broken usage of I_SYNC [XFS] Make xfsbufd threads freezable [XFS] revert to double-buffering readdir [XFS] Fix broken inode cluster setup. [XFS] Clear XBF_READ_AHEAD flag on I/O completion. [XFS] Fixed a few bugs in xfs_buf_associate_memory() [XFS] 971064 Various fixups for xfs_bulkstat(). [XFS] Fix dbflush panic in xfs_qm_sync.	2007-12-10 10:18:27 -08:00
David Chinner	cf10e82bdc	[XFS] Fix xfs_ichgtime()s broken usage of I_SYNC The recent I_LOCK->I_SYNC changes mistakenly changed xfs_ichgtime to look at I_SYNC instead of I_LOCK. This was incorrect and prevents newly created inodes from moving to the dirty list. Change this to the correct check which is for I_NEW, not I_LOCK or I_SYNC so that behaviour is correct. SGI-PV: 974225 SGI-Modid: xfs-linux-melb:xfs-kern:30204a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2007-12-10 13:47:56 +11:00
Rafael J. Wysocki	978c7b2ff4	[XFS] Make xfsbufd threads freezable Fix breakage caused by commit `8314418629` that did not introduce the necessary call to set_freezable() in xfs/linux-2.6/xfs_buf.c . SGI-PV: 974224 SGI-Modid: xfs-linux-melb:xfs-kern:30203a Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2007-12-10 13:47:36 +11:00
Christoph Hellwig	e89bc612d6	[XFS] revert to double-buffering readdir The current readdir implementation deadlocks on a btree buffers locks because nfsd calls back into ->lookup from the filldir callback. The only short-term fix for this is to revert to the old inefficient double-buffering scheme. SGI-PV: 973377 SGI-Modid: xfs-linux-melb:xfs-kern:30201a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2007-12-10 13:47:15 +11:00
David Chinner	a7430847fc	[XFS] Fix broken inode cluster setup. The radix tree based inode caches did away with the inode cluster hashes, replacing them with a bunch of masking and gang lookups on the radix tree. This masking got broken when moving the code to per-ag radix trees and indexing by agino # rather than straight inode number. The result is clustered inode writeback does not cluster and things can go extremely slowly when there are lots of inodes to write. Fix it up by comparing the agino # of the inode we just looked up to the index of the cluster we are looking for. Tested-by: Torsten Kaiser <just.for.lkml@googlemail.com> SGI-PV: 972915 SGI-Modid: xfs-linux-melb:xfs-kern:30033a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2007-12-10 13:46:59 +11:00
Lachlan McIlroy	77be55a5a1	[XFS] Clear XBF_READ_AHEAD flag on I/O completion. SGI-PV: 972554 SGI-Modid: xfs-linux-melb:xfs-kern:30128a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2007-12-10 13:46:45 +11:00
Lachlan McIlroy	d1afb678ce	[XFS] Fixed a few bugs in xfs_buf_associate_memory() - calculation of 'page_count' was incorrect as it did not consider the offset of 'mem' into the first page. The logic to bump 'page_count' didn't work if 'len' was <= PAGE_CACHE_SIZE (ie offset = 3k, len = 2k). - setting b_buffer_length to 'len' is incorrect if 'offset' is > 0. Set it to the total length of the buffer. - I suspect that passing a non-aligned address into mem_to_page() for the first page may have been causing issues - don't know but just tidy up that code anyway. SGI-PV: 971596 SGI-Modid: xfs-linux-melb:xfs-kern:30143a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2007-12-10 13:46:20 +11:00
Lachlan McIlroy	cd57e594ad	[XFS] 971064 Various fixups for xfs_bulkstat(). - sanity check for NULL user buffer in xfs_ioc_bulkstat[_compat]() - remove the special case for XFS_IOC_FSBULKSTAT with count == 1. This special case causes bulkstat to fail because the special case uses xfs_bulkstat_single() instead of xfs_bulkstat() and the two functions have different semantics. xfs_bulkstat() will return the next inode after the one supplied while skipping internal inodes (ie quota inodes). xfs_bulkstate_single() will only lookup the inode supplied and return an error if it is an internal inode. - in xfs_bulkstat(), need to initialise 'lastino' to the inode supplied so in cases were we return without examining any inodes the scan wont restart back at zero. - sanity check for valid *ubcountp values. Cannot sanity check for valid ubuffer here because some users of xfs_bulkstat() don't supply a buffer. - checks against 'ubleft' (the space left in the user's buffer) should be against 'statstruct_size' which is the supplied minimum object size. The mixture of checks against statstruct_size and 0 was one of the reasons we were skipping inodes. - if the formatter function returns BULKSTAT_RV_NOTHING and an error and the error is not ENOENT or EINVAL then we need to abort the scan. ENOENT is for inodes that are no longer valid and we just skip them. EINVAL is returned if we try to lookup an internal inode so we skip them too. For a DMF scan if the inode and DMF attribute cannot fit into the space left in the user's buffer it would return ERANGE. We didn't handle this error and skipped the inode. We would continue to skip inodes until one fitted into the user's buffer or we completed the scan. - put back the recalculation of agino (that got removed with the last fix) at the end of the while loop. This is because the code at the start of the loop expects agino to be the last inode examined if it is non-zero. - if we found some inodes but then encountered an error, return success this time and the error next time. If the formatter aborted with ENOMEM we will now return this error but only if we couldn't read any inodes. Previously if we encountered ENOMEM without reading any inodes we returned a zero count and no error which falsely indicated the scan was complete. SGI-PV: 973431 SGI-Modid: xfs-linux-melb:xfs-kern:30089a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com>	2007-12-10 13:44:11 +11:00
Donald Douwsma	d757762bf2	[XFS] Fix dbflush panic in xfs_qm_sync. The recent behaviour layer removal dropped the check for quotas that have been requested at mount time but have subsequently been turned off. This results in a panic when accessing m_quotainfo which has been freed. This patch adds the check originally made by xfs_qm_syncall() to xfs_qm_sync(). SGI-PV: 969769 SGI-Modid: xfs-linux-melb:xfs-kern:29908a Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2007-12-10 13:40:10 +11:00
Len Brown	f7a5274d7d	Pull suspend-2.6.24 into release branch	2007-12-06 16:26:52 -05:00
Al Viro	97bd7919e2	remove nonsense force-casts from ocfs2 endianness annotations in networking code had been in place for quite a while; in particular, sin_port and s_addr are annotated as big-endian. Code in ocfs2 had __force casts added apparently to shut the sparse warnings up; of course, these days they only serve to produce warnings for no reason whatsoever... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-05 09:25:20 -08:00
Al Viro	7e46aa5c8c	regression: bfs endianness bug BFS_FILEBLOCKS() expects struct bfs_inode * (on-disk data, with little- endian fields), not struct bfs_inode_info * (in-core stuff, with host- endian ones). It's a macro and fields with the right names are present in bfs_inode_info, so it compiles, but on big-endian host it gives bogus results. Introduced in commit `f433dc5634` ("Fixes to the BFS filesystem driver"). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-05 09:25:20 -08:00
Al Viro	9b5e6857b3	regression: cifs endianness bug access_flags_to_mode() gets on-the-wire data (little-endian) and treats it as host-endian. Introduced in commit `e01b640013` ("[CIFS] enable get mode from ACL when cifsacl mount option specified") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-05 09:25:19 -08:00
Alexey Dobriyan	5a622f2d0f	proc: fix proc_dir_entry refcounting Creating PDEs with refcount 0 and "deleted" flag has problems (see below). Switch to usual scheme: * PDE is created with refcount 1 * every de_get does +1 * every de_put() and remove_proc_entry() do -1 * once refcount reaches 0, PDE is freed. This elegantly fixes at least two following races (both observed) without introducing new locks, without abusing old locks, without spreading lock_kernel(): 1) PDE leak remove_proc_entry de_put ----------------- ------ [refcnt = 1] if (atomic_read(&de->count) == 0) if (atomic_dec_and_test(&de->count)) if (de->deleted) /* also not taken! / free_proc_entry(de); else de->deleted = 1; [refcount=0, deleted=1] 2) use after free remove_proc_entry de_put ----------------- ------ [refcnt = 1] if (atomic_dec_and_test(&de->count)) if (atomic_read(&de->count) == 0) free_proc_entry(de); / boom! / if (de->deleted) free_proc_entry(de); BUG: unable to handle kernel paging request at virtual address 6b6b6b6b printing eip: c10acdda pdpt = 00000000338f8001 *pde = 0000000000000000 Oops: 0000 [#1] PREEMPT SMP Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c0863403f109a43d7000b4646da4818220d501f #4) EIP: 0060:[<c10acdda>] EFLAGS: 00210097 CPU: 1 EIP is at strnlen+0x6/0x18 EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000) Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400 c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400 f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34 Call Trace: [<c10ac4f0>] vsnprintf+0x2ad/0x49b [<c10ac779>] vscnprintf+0x14/0x1f [<c1018e6b>] vprintk+0xc5/0x2f9 [<c10379f1>] handle_fasteoi_irq+0x0/0xab [<c1004f44>] do_IRQ+0x9f/0xb7 [<c117db3b>] preempt_schedule_irq+0x3f/0x5b [<c100264e>] need_resched+0x1f/0x21 [<c10190ba>] printk+0x1b/0x1f [<c107c8ad>] de_put+0x3d/0x50 [<c107c8f8>] proc_delete_inode+0x38/0x41 [<c107c8c0>] proc_delete_inode+0x0/0x41 [<c1066298>] generic_delete_inode+0x5e/0xc6 [<c1065aa9>] iput+0x60/0x62 [<c1063c8e>] d_kill+0x2d/0x46 [<c1063fa9>] dput+0xdc/0xe4 [<c10571a1>] __fput+0xb0/0xcd [<c1054e49>] filp_close+0x48/0x4f [<c1055ee9>] sys_close+0x67/0xa5 [<c10026b6>] sysenter_past_esp+0x5f/0x85 ======================= Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9 EIP: [<c10acdda>] strnlen+0x6/0x18 SS:ESP 0068:f380be44 Also, remove broken usage of ->deleted from reiserfs: if sget() succeeds, module is already pinned and remove_proc_entry() can't happen => nobody can mark PDE deleted. Dummy proc root in netns code is not marked with refcount 1. AFAICS, we never get it, it's just for proper /proc/net removal. I double checked CLONE_NETNS continues to work. Patch survives many hours of modprobe/rmmod/cat loops without new bugs which can be attributed to refcounting. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-05 09:21:20 -08:00
Jan Kara	d4beaf4ab5	jbd: Fix assertion failure in fs/jbd/checkpoint.c Before we start committing a transaction, we call __journal_clean_checkpoint_list() to cleanup transaction's written-back buffers. If this call happens to remove all of them (and there were already some buffers), __journal_remove_checkpoint() will decide to free the transaction because it isn't (yet) a committing transaction and soon we fail some assertion - the transaction really isn't ready to be freed :). We change the check in __journal_remove_checkpoint() to free only a transaction in T_FINISHED state. The locking there is subtle though (as everywhere in JBD ;(). We use j_list_lock to protect the check and a subsequent call to __journal_drop_transaction() and do the same in the end of journal_commit_transaction() which is the only place where a transaction can get to T_FINISHED state. Probably I'm too paranoid here and such locking is not really necessary - checkpoint lists are processed only from log_do_checkpoint() where a transaction must be already committed to be processed or from __journal_clean_checkpoint_list() where kjournald itself calls it and thus transaction cannot change state either. Better be safe if something changes in future... Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-05 09:21:20 -08:00
Evgeniy Dushistov	0c664f9742	ufs: fix nexstep dir block size This patch fixes regression, introduced since 2.6.16. NextStep variant of UFS as OpenStep uses directory block size equals to 1024. Without this change, ufs_check_page fails in many cases. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Cc: Dave Bailey <dsbailey@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-05 09:21:18 -08:00
Jeff Moyer	e00ba3dae0	aio: only account I/O wait time in read_events if there are active requests On 2.6.24, top started showing 100% iowait on one CPU when a UML instance was running (but completely idle). The UML code sits in io_getevents waiting for an event to be submitted and completed. Fix this by checking ctx->reqs_active before scheduling to determine whether or not we are waiting for I/O. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Jeff Dike <jdike@addtoit.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-05 09:21:18 -08:00
Rafael J. Wysocki	e136e769d4	Freezer: Fix JFFS2 garbage collector freezing issue (rev. 2) Fix breakage caused by commit `d5d8c5976d` "freezer: do not send signals to kernel threads" in jffs2_garbage_collect_thread() that assumed it would be sent signals by the freezer. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Pete MacKay <armlinux@architechnical.net> Signed-off-by: Len Brown <len.brown@intel.com>	2007-12-04 01:35:41 -05:00
Linus Torvalds	8002cedc1a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6: (27 commits) [INET]: Fix inet_diag dead-lock regression [NETNS]: Fix /proc/net breakage [TEXTSEARCH]: Do not allow zero length patterns in the textsearch infrastructure [NETFILTER]: fix forgotten module release in xt_CONNMARK and xt_CONNSECMARK [NETFILTER]: xt_TCPMSS: remove network triggerable WARN_ON [DECNET]: dn_nl_deladdr() almost always returns no error [IPV6]: Restore IPv6 when MTU is big enough [RXRPC]: Add missing select on CRYPTO mac80211: rate limit wep decrypt failed messages rfkill: fix double-mutex-locking mac80211: drop unencrypted frames if encryption is expected mac80211: Fix behavior of ieee80211_open and ieee80211_close ieee80211: fix unaligned access in ieee80211_copy_snap mac80211: free ifsta->extra_ie and clear IEEE80211_STA_PRIVACY_INVOKED SCTP: Fix build issues with SCTP AUTH. SCTP: Fix chunk acceptance when no authenticated chunks were listed. SCTP: Fix the supported extensions paramter SCTP: Fix SCTP-AUTH to correctly add HMACS paramter. SCTP: Fix the number of HB transmissions. [TCP] illinois: Incorrect beta usage ...	2007-12-03 08:15:36 -08:00
Eric W. Biederman	2b1e300a9d	[NETNS]: Fix /proc/net breakage Well I clearly goofed when I added the initial network namespace support for /proc/net. Currently things work but there are odd details visible to user space, even when we have a single network namespace. Since we do not cache proc_dir_entry dentries at the moment we can just modify ->lookup to return a different directory inode depending on the network namespace of the process looking at /proc/net, replacing the current technique of using a magic and fragile follow_link method. To accomplish that this patch: - introduces a shadow_proc method to allow different dentries to be returned from proc_lookup. - Removes the old /proc/net follow_link magic - Fixes a weakness in our not caching of proc generic dentries. As shadow_proc uses a task struct to decided which dentry to return we can go back later and fix the proc generic caching without modifying any code that uses the shadow_proc method. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2007-12-02 00:33:17 +11:00
Heiko Carstens	81257def2a	tty: add the new termios2 ioctls to the compatible list. Make them depend on TCGETS2. If that one is implemented the rest should be there as well. Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-29 09:24:55 -08:00
Miklos Szeredi	08b633070a	fuse: fix attribute caching after rename Invalidate attributes on rename, since some filesystems may update st_ctime. Reported by Szabolcs Szakacsits Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-29 09:24:54 -08:00
John Muir	fbee36b92a	fuse: fix uninitialized field in fuse_inode I found problems accessing (executing) previously existing files, until I did chmod on them (or setattr). If the fi->attr_version is not initialized, then it could be larger than fc->attr_version until a setattr is executed, and as a result the inode attributes would never be set. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-29 09:24:54 -08:00
Miklos Szeredi	d0186b25e6	fuse: fix FUSE_FILE_OPS sending FUSE_FILE_OPS is meant to signal that the kernel will send the open file to to the userspace filesystem for operations on open files, so that sillyrenaming unlinked files becomes unnecessary. However this needs VFS changes, which won't make it into 2.6.24. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-29 09:24:54 -08:00
Miklos Szeredi	a6643094e7	fuse: pass open flags to read and write Some open flags (O_APPEND, O_DIRECT) can be changed with fcntl(F_SETFL, ...) after open, but fuse currently only sends the flags to userspace in open. To make it possible to correcly handle changing flags, send the current value to userspace in each read and write. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-29 09:24:54 -08:00
Miklos Szeredi	7dca9fd39f	fuse: cleanup: add fuse_get_attr_version() Extract repeated code into helper function, as suggested by Akpm. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-29 09:24:54 -08:00
Miklos Szeredi	bcb4be809d	fuse: fix reading past EOF Currently reading a fuse file will stop at cached i_size and return EOF, even though the file might have grown since the attributes were last updated. So detect if trying to read past EOF, and refresh the attributes before continuing with the read. Thanks to mpb for the report. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-29 09:24:54 -08:00
Tobias Poschwatta	6454d1f903	fix up ext2_fs.h for userspace after reservations backport In commit `a686cd898b`: "Val's cross-port of the ext3 reservations code into ext2." include/linux/ext2_fs.h got a new function whose return value is only defined if __KERNEL__ is defined. Putting #ifdef __KERNEL__ around the function seems to help, patch below. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-29 09:24:53 -08:00
Eric W. Biederman	19fd4bb2a0	proc: remove races from proc_id_readdir() Oleg noticed that the call of task_pid_nr_ns() in proc_pid_readdir is racy with respect to tasks exiting. After a bit of examination it also appears that the call itself is completely unnecessary. So to fix the problem this patch modifies next_tgid() to return both a tgid and the task struct in question. A structure is introduced to return these values because it is slightly cleaner and easier to optimize, and the resulting code is a little shorter. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-29 09:24:52 -08:00
Alexey Dobriyan	c2319540cd	proc: fix NULL ->i_fop oops proc_kill_inodes() can clear ->i_fop in the middle of vfs_readdir resulting in NULL dereference during "file->f_op->readdir(file, buf, filler)". The solution is to remove proc_kill_inodes() completely: a) we don't have tricky modules implementing their tricky readdir hooks which could keeping this revoke from hell. b) In a situation when module is gone but PDE still alive, standard readdir will return only "." and "..", because pde->next was cleared by remove_proc_entry(). c) the race proc_kill_inode() destined to prevent is not completely fixed, just race window made smaller, because vfs_readdir() is run without sb_lock held and without file_list_lock held. Effectively, ->i_fop is cleared at random moment, which can't fix properly anything. BUG: unable to handle kernel NULL pointer dereference at virtual address 00000018 printing eip: c1061205 pdpt = 0000000005b22001 pde = 0000000000000000 Oops: 0000 [#1] PREEMPT SMP Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw sr_mod k8temp cdrom hwmon amd_rng Pid: 2033, comm: find Not tainted (2.6.24-rc1-b1d08ac064268d0ae2281e98bf5e82627e0f0c56 #2) EIP: 0060:[<c1061205>] EFLAGS: 00010246 CPU: 0 EIP is at vfs_readdir+0x47/0x74 EAX: c6b6a780 EBX: 00000000 ECX: c1061040 EDX: c5decf94 ESI: c6b6a780 EDI: fffffffe EBP: c9797c54 ESP: c5decf78 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process find (pid: 2033, ti=c5dec000 task=c64bba90 task.ti=c5dec000) Stack: c5decf94 c1061040 fffffff7 0805ffbc 00000000 c6b6a780 c1061295 0805ffbc 00000000 00000400 00000000 00000004 0805ffbc 4588eff4 c5dec000 c10026ba 00000004 0805ffbc 00000400 0805ffbc 4588eff4 bfdc6c70 000000dc 0000007b Call Trace: [<c1061040>] filldir64+0x0/0xc5 [<c1061295>] sys_getdents64+0x63/0xa5 [<c10026ba>] sysenter_past_esp+0x5f/0x85 ======================= Code: 49 83 78 18 00 74 43 8d 6b 74 bf fe ff ff ff 89 e8 e8 b8 c0 12 00 f6 83 2c 01 00 00 10 75 22 8b 5e 10 8b 4c 24 04 89 f0 8b 14 24 <ff> 53 18 f6 46 1a 04 89 c7 75 0b 8b 56 0c 8b 46 08 e8 c8 66 00 EIP: [<c1061205>] vfs_readdir+0x47/0x74 SS:ESP 0068:c5decf78 hch: "Nice, getting rid of this is a very good step formwards. Unfortunately we have another copy of this junk in security/selinux/selinuxfs.c:sel_remove_entries() which would need the same treatment." Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-29 09:24:52 -08:00
Linus Torvalds	cae2f9c46d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: sysfs: fix off-by-one error in fill_read_buffer() kobject: two typo fixes UIO: add UIO documentation target to DocBook Makefile UIO: fix up the UIO documentation create /sys/.../power when CONFIG_PM is set allow LEGACY_PTYS to be set to 0	2007-11-28 15:59:50 -08:00
Miao Xie	8118a859dc	sysfs: fix off-by-one error in fill_read_buffer() I found that there is a off-by-one problem in the following code. Version: 2.6.24-rc2 File: fs/sysfs/file.c:118-122 Function: fill_read_buffer -------------------------------------------------------------------- count = ops->show(kobj, attr_sd->s_attr.attr, buffer->page); sysfs_put_active_two(attr_sd); BUG_ON(count > (ssize_t)PAGE_SIZE); -------------------------------------------------------------------- Because according to the specification of the sysfs and the implement of the show methods, the show methods return the number of bytes which would be generated for the given input, excluding the trailing null.So if the return value of the show methods equals PAGE_SIZE - 1, the buffer is full in fact. And if the return value equals PAGE_SIZE, the resulting string was already truncated,or buffer overflow occurred. This patch fixes an off-by-one error in fill_read_buffer. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Tejun Heo <teheo@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-11-28 13:53:53 -08:00
Ingo Molnar	c46f739dd3	vfs: coredumping fix fix: http://bugzilla.kernel.org/show_bug.cgi?id=3043 only allow coredumping to the same uid that the coredumping task runs under. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Alan Cox <alan@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Al Viro <viro@ftp.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-28 10:58:01 -08:00
Mark Fasheh	b1967d0edd	ocfs2: reverse inline-data truncate args ocfs2_truncate() and ocfs2_remove_inode_range() had reversed their "set i_size" arguments to ocfs2_truncate_inline(). Fix things so that truncate sets i_size, and punching a hole ignores it. This exposed a problem where punching a hole in an inline-data file wasn't updating the page cache, so fix that too. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-11-27 16:47:03 -08:00
Mark Fasheh	0d8a4e0cd6	ocfs2: Fix comparison in ocfs2_size_fits_inline_data() This was causing us to prematurely push out inline data by one byte. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-11-27 16:47:03 -08:00
Mark Fasheh	bccb9dad89	ocfs2: Remove bug statement in ocfs2_dentry_iput() The existing bug statement didn't take into account unhashed dentries which might not have a cluster lock on them. This could happen if a node exporting the file system via NFS is rebooted, re-exported to nfs clients and then unmounted. It's fine in this case to not have a dentry cluster lock. Just remove the bug statement and replace it with an error print, which does the proper checks. Though we want to know if something has happened which might have prevented a cluster lock from being created, it's definitely not necessary to panic the machine for this. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-11-27 16:47:02 -08:00
Jan Kara	5a58c3ef22	[PATCH] ocfs2: Remove expensive bitmap scanning Enable expensive bitmap scanning only if DEBUG option is enabled. The bitmap scanning quite loads the CPU and on my machine the write throughput of dd if=/dev/zero of=/ocfs2/file bs=1M count=500 conv=sync improves from 37 MB/s to 45.4 MB/s in local mode... Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-11-27 16:47:02 -08:00
Mark Fasheh	a46043e08f	ocfs2: log valid inode # on bad inode If the inode block isn't valid then we don't want to print the value from that, instead print the block number which was passed in (which should always be correct). Also, turn this into a debug print for now - folks who hit an actual problem always have other logs indicating what the source is. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-11-27 16:47:02 -08:00
Mark Fasheh	ef9f86ceb6	ocfs2: Filter -ENOSPC in mlog_errno() It's almost never worth printing in that situation and we keep forgetting to manually filter it out. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-11-27 16:47:01 -08:00
Joe Perches	2759236f84	[PATCH] fs/ocfs2: Add missing "space" Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-11-27 16:47:01 -08:00
Mark Fasheh	e001e796e4	ocfs2: Reset journal parameters after s_mount_opt update Right now we're just setting them from the existing parameters, not the new ones that a remount specified. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-11-27 16:47:01 -08:00
Linus Torvalds	423eaf8f00	Merge git://git.linux-nfs.org/pub/linux/nfs-2.6 * git://git.linux-nfs.org/pub/linux/nfs-2.6: NFS: Clean up new multi-segment direct I/O changes NFS: Ensure we return zero if applications attempt to write zero bytes NFS: Support multiple segment iovecs in the NFS direct I/O path NFS: Introduce iovec I/O helpers to fs/nfs/direct.c SUNRPC: Add missing "space" to net/sunrpc/auth_gss.c SUNRPC: make sunrpc/xprtsock.c:xs_setup_{udp,tcp}() static NFS: fs/nfs/dir.c should #include "internal.h" NFS: make nfs_wb_page_priority() static NFS: mount failure causes bad page state SUNRPC: remove NFS/RDMA client's binary sysctls kernel BUG at fs/nfs/namespace.c:108! - can be triggered by bad server sunrpc: rpc_pipe_poll may miss available data in some cases sunrpc: return error if unsupported enctype or cksumtype is encountered sunrpc: gss_pipe_downcall(), don't assume all errors are transient NFS: Fix the ustat() regression	2007-11-26 19:42:59 -08:00
Linus Torvalds	0685ab4fb8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched: bump version of kernel/sched_debug.c sched: fix minimum granularity tunings sched: fix RLIMIT_CPU comment sched: fix kernel/acct.c comment sched: fix prev_stime calculation sched: don't forget to unlock uids_mutex on error paths	2007-11-26 19:42:08 -08:00
Chuck Lever	02fe494619	NFS: Clean up new multi-segment direct I/O changes Simplify calling sequence of nfs_direct_{read,write}_schedule(), and rename them to reflect their new role. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-11-26 16:32:40 -05:00
Chuck Lever	b9148c6b80	NFS: Ensure we return zero if applications attempt to write zero bytes A zero byte count direct write request should be a successful no-op, not an error. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-11-26 16:32:38 -05:00
Chuck Lever	c216fd708e	NFS: Support multiple segment iovecs in the NFS direct I/O path Allow applications to perform asynchronous scatter-gather direct I/O to NFS files. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-11-26 16:32:36 -05:00
Chuck Lever	19f737879c	NFS: Introduce iovec I/O helpers to fs/nfs/direct.c Add helpers that iterate over multi-segment iovecs. These will be used to support multi-segment scatter/gather direct I/O in a later patch. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-11-26 16:32:35 -05:00
Adrian Bunk	4c30d56edc	NFS: fs/nfs/dir.c should #include "internal.h" Every file should include the headers containing the prototypes for its global functions (in this case nfs_access_cache_shrinker()). Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-11-26 16:24:49 -05:00
Adrian Bunk	5334eb13d4	NFS: make nfs_wb_page_priority() static nfs_wb_page_priority() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-11-26 16:24:48 -05:00
Russell King	f16c960332	NFS: mount failure causes bad page state While testing a kernel based upon `ecd744eec3` (with wrong boot arguments), I got the following bad page state entry while NFS was trying to mount it's rootfs: IP-Config: Complete: device=eth0, addr=192.168.1.101, mask=255.255.255.0, gw=255.255.255.255, host=192.168.1.101, domain=, nis-domain=(none), bootserver=192.168.1.100, rootserver=192.168.1.100, rootpath= Looking up port of RPC 100003/2 on 192.168.1.100 rpcbind: server 192.168.1.100 not responding, timed out Root-NFS: Unable to get nfsd port number from server, using default Looking up port of RPC 100005/1 on 192.168.1.100 rpcbind: server 192.168.1.100 not responding, timed out Root-NFS: Unable to get mountd port number from server, using default mount: server 192.168.1.100 not responding, timed out Root-NFS: Server returned error -5 while mounting /nfs/rootfs/ VFS: Unable to mount root fs via NFS, trying floppy. Bad page state in process 'swapper' page:c02b1260 flags:0x00000400 mapping:00000000 mapcount:0 count:0 Trying to fix it up, but a reboot is needed Backtrace: [<c0023e34>] (dump_stack+0x0/0x14) from [<c0062570>] (bad_page+0x70/0xac) [<c0062500>] (bad_page+0x0/0xac) from [<c0064914>] (free_hot_cold_page+0x80/0x178) [<c0064894>] (free_hot_cold_page+0x0/0x178) from [<c0064a74>] (free_hot_page+0x14/0x18) [<c0064a60>] (free_hot_page+0x0/0x18) from [<c0067078>] (put_page+0xf8/0x154) [<c0066f80>] (put_page+0x0/0x154) from [<c007dbc8>] (kfree+0xc8/0xd0) [<c007db00>] (kfree+0x0/0xd0) from [<c00cbb54>] (nfs_get_sb+0x230/0x710) [<c00cb924>] (nfs_get_sb+0x0/0x710) from [<c0084334>] (vfs_kern_mount+0x58/0xac)[<c00842dc>] (vfs_kern_mount+0x0/0xac) from [<c00843c0>] (do_kern_mount+0x38/0xf4) [<c0084388>] (do_kern_mount+0x0/0xf4) from [<c0099c7c>] (do_mount+0x1e8/0x614) ... This seems to be caused by use of an uninitialised structure due to NULL options being passed to nfs_validate_mount_data(). Ensure that the parsed mount data is always initialised. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> (Trond: added fix for the same bug in nfs4_validate_mount_data()). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-11-26 16:24:22 -05:00
Ingo Molnar	08e4570a4a	sched: fix prev_stime calculation Srivatsa Vaddagiri noticed occasionally incorrect CPU usage values in top and tracked it down to stime going below 0 in task_stime(). Negative values are possible there due to the sampled nature of stime/utime. Fix suggested by Balbir Singh. Signed-off-by: Ingo Molnar <mingo@elte.hu> Tested-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>	2007-11-26 21:21:49 +01:00
Steve French	2b83457bde	[CIFS] Fix check after use error in ACL code Spotted by the coverity scanner. CC: Adrian Bunk <bunk@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-11-25 10:01:00 +00:00
Steve French	058250a0d5	Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6	2007-11-25 09:53:27 +00:00
Jeff Layton	cea218054a	[CIFS] Fix potential data corruption when writing out cached dirty pages Fix RedHat bug 329431 The idea here is separate "conscious" from "unconscious" flushes. Conscious flushes are those due to a fsync() or close(). Unconscious ones are flushes that occur as a side effect of some other operation or due to memory pressure. Currently, when an error occurs during an unconscious flush (ENOSPC or EIO), we toss out the page and don't preserve that error to report to the user when a conscious flush occurs. If after the unconscious flush, there are no more dirty pages for the inode, the conscious flush will simply return success even though there were previous errors when writing out pages. This can lead to data corruption. The easiest way to reproduce this is to mount up a CIFS share that's very close to being full or where the user is very close to quota. mv a file to the share that's slightly larger than the quota allows. The writes will all succeed (since they go to pagecache). The mv will do a setattr to set the new file's attributes. This calls filemap_write_and_wait, which will return an error since all of the pages can't be written out. Then later, when the flush and release ops occur, there are no more dirty pages in pagecache for the file and those operations return 0. mv then assumes that the file was written out correctly and deletes the original. CIFS already has a write_behind_rc variable where it stores the results from earlier flushes, but that value is only reported in cifs_close. Since the VFS ignores the return value from the release operation, this isn't helpful. We should be reporting this error during the flush operation. This patch does the following: 1) changes cifs_fsync to use filemap_write_and_wait and cifs_flush and also sync to check its return code. If it returns successful, they then check the value of write_behind_rc to see if an earlier flush had reported any errors. If so, they return that error and clear write_behind_rc. 2) sets write_behind_rc in a few other places where pages are written out as a side effect of other operations and the code waits on them. 3) changes cifs_setattr to only call filemap_write_and_wait for ATTR_SIZE changes. 4) makes cifs_writepages accurately distinguish between EIO and ENOSPC errors when writing out pages. Some simple testing indicates that the patch works as expected and that it fixes the reproduceable known problem. Acked-by: Dave Kleikamp <shaggy@austin.rr.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-11-20 23:19:03 +00:00
Petr Tesarik	2a97468024	[CIFS] Fix spurious reconnect on 2nd peek from read of SMB length When retrying kernel_recvmsg() because of a short read, check returned length against the remaining length, not against total length. This avoids unneeded session reconnects which would otherwise occur when kernel_recvmsg() finally returns zero when asked to read zero bytes. Signed-off-by: Petr Tesarik <ptesarik@suse.cz> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-11-20 02:24:08 +00:00
Neil Brown	4c1fe2f78a	kernel BUG at fs/nfs/namespace.c:108! - can be triggered by bad server Hi Trond, I have discovered that the BUG_ON in nfs_follow_mountpoint: BUG_ON(IS_ROOT(dentry)); can be triggered by a misbehaving server. What happens is the client does a lookup and discoveres that the named directory has a different fsid, so it initiates a mount. It then performs a GETATTR on the mounted directory and gets a different fsid again (due to a bug in the NFS server). This causes nfs_follow_mountpoint to be called on the newly mounted root, which triggers the BUG_ON. To duplicate this, have a directory which contains some mountpoints, and export that directory with the "crossmnt" flag using nfs-utils 1.1.1 (or 1.1.0 I think) The GETATTR on the root of the mounted filesystem will return the information for the top exportpoint, while a lookup will return the correct information. This difference causes the NFS client to BUG. I think the best way to fix this is to trap this possibility early, so just before completing the mount in the NFS client, check that it isn't going to use nfs_mountpoint_inode_operations. As long as i_op will never change once set (is that true?), this should be adequately safe. The following patch shows a possible approach, and it works for me. i.e. when the NFS server is misbehaving, I get ESTALE on those mountpoints, while when the NFS server is working correctly, I get correct behaviour on the client. NeilBrown Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-11-17 13:08:48 -05:00
Trond Myklebust	b09b9417d0	NFS: Fix the ustat() regression Since 2.6.18, the superblock sb->s_root has been a dummy dentry with a dummy inode. This breaks ustat(), which actually uses sb->s_root in a vfstat() call. Fix this by making the s_root a dummy alias to the directory inode that was used when creating the superblock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-11-17 13:08:44 -05:00
Steve French	f7a44eadd5	[CIFS] remove build warning CC: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-11-17 00:01:51 +00:00
Steve French	2442421b17	[CIFS] Have CIFS_SessSetup build correct SPNEGO SessionSetup request Have CIFS_SessSetup call cifs_get_spnego_key when Kerberos is negotiated. Use the info in the key payload to build a session setup request packet. Also clean up how the request buffer in the function is freed on error. With appropriate user space helper (in samba/source/client). Kerberos support (secure session establishment can be done now via Kerberos, previously users would have to use NTLMv2 instead for more secure session setup). Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-11-16 23:37:35 +00:00
Steve French	8840dee9dc	[CIFS] minor checkpatch cleanup Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-11-16 23:05:52 +00:00
Jeff Layton	d6c2e4d02b	[CIFS] have cifs_get_spnego_key get the hostname from TCP_Server_Info Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-11-16 22:23:17 +00:00
Jeff Layton	c359cf3c61	[CIFS] add hostname field to TCP_Server_Info struct ...and populate it with the hostname portion of the UNC string. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-11-16 22:22:06 +00:00
Jeff Layton	70fe7dc055	[CIFS] clean up error handling in cifs_mount Move all of the kfree's sprinkled in the middle of the function to the end, and have the code set rc and just goto there on error. Also zero out the password string before freeing it. Looks like this should also fix a potential memory leak of the prepath string if an error occurs near the end of the function. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-11-16 22:21:07 +00:00
Steve French	68bf728a22	[CIFS] add ver= prefix to upcall format version Acked-by: Jeff Layton <jlayton@redhat.com> Acked-by: Igor Mammedov <niallan@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-11-16 18:32:52 +00:00
Jan Kara	7c06a8dc64	Fix 64KB blocksize in ext3 directories With 64KB blocksize, a directory entry can have size 64KB which does not fit into 16 bits we have for entry lenght. So we store 0xffff instead and convert value when read from / written to disk. The patch also converts some places to use ext3_next_entry() when we are changing them anyway. [akpm@linux-foundation.org: coding-style cleanups] Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-14 18:45:43 -08:00
Jeff Layton	dbaf4c024a	smbfs: fix debug builds Fix some warnings with SMBFS_DEBUG_* builds. This patch makes it so that builds with -Werror don't fail. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-14 18:45:43 -08:00
Arjan van de Ven	cb51f973bc	mark sys_open/sys_read exports unused sys_open / sys_read were used in the early 1.2 days to load firmware from disk inside drivers. Since 2.0 or so this was deprecated behavior, but several drivers still were using this. Since a few years we have a request_firmware() API that implements this in a nice, consistent way. Only some old ISA sound drivers (pre-ALSA) still straggled along for some time.... however with commit `c2b1239a9f` the last user is now gone. This is a good thing, since using sys_open / sys_read etc for firmware is a very buggy to dangerous thing to do; these operations put an fd in the process file descriptor table.... which then can be tampered with from other threads for example. For those who don't want the firmware loader, filp_open()/vfs_read are the better APIs to use, without this security issue. The patch below marks sys_open and sys_read unused now that they're really not used anymore, and for deletion in the 2.6.25 timeframe. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-14 18:45:42 -08:00
Eric W. Biederman	9fcc2d15b1	proc: simplify and correct proc_flush_task Currently we special case when we have only the initial pid namespace. Unfortunately in doing so the copied case for the other namespaces was broken so we don't properly flush the thread directories :( So this patch removes the unnecessary special case (removing a usage of proc_mnt) and corrects the flushing of the thread directories. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-14 18:45:42 -08:00
Adrian Bunk	8744969a81	fuse_file_alloc(): fix NULL dereferences Fix obvious NULL dereferences spotted by the Coverity checker. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-14 18:45:42 -08:00
Fengguang Wu	c06a018fa5	reiserfs: don't drop PG_dirty when releasing sub-page-sized dirty file This is not a new problem in 2.6.23-git17. 2.6.22/2.6.23 is buggy in the same way. Reiserfs could accumulate dirty sub-page-size files until umount time. They cannot be synced to disk by pdflush routines or explicit `sync' commands. Only `umount' can do the trick. The direct cause is: the dirty page's PG_dirty is wrongly _cleared_. Call trace: [<ffffffff8027e920>] cancel_dirty_page+0xd0/0xf0 [<ffffffff8816d470>] :reiserfs:reiserfs_cut_from_item+0x660/0x710 [<ffffffff8816d791>] :reiserfs:reiserfs_do_truncate+0x271/0x530 [<ffffffff8815872d>] :reiserfs:reiserfs_truncate_file+0xfd/0x3b0 [<ffffffff8815d3d0>] :reiserfs:reiserfs_file_release+0x1e0/0x340 [<ffffffff802a187c>] __fput+0xcc/0x1b0 [<ffffffff802a1ba6>] fput+0x16/0x20 [<ffffffff8029e676>] filp_close+0x56/0x90 [<ffffffff8029fe0d>] sys_close+0xad/0x110 [<ffffffff8020c41e>] system_call+0x7e/0x83 Fix the bug by removing the cancel_dirty_page() call. Tests show that it causes no bad behaviors on various write sizes. === for the patient === Here are more detailed demonstrations of the problem. 1) the page has both PG_dirty(D)/PAGECACHE_TAG_DIRTY(d) after being written to; and then only PAGECACHE_TAG_DIRTY(d) remains after the file is closed. ------------------------------ screen 0 ------------------------------ [T0] root /home/wfg# cat > /test/tiny [T1] hi [T2] root /home/wfg# ------------------------------ screen 1 ------------------------------ [T1] root /home/wfg# echo /test/tiny > /proc/filecache [T1] root /home/wfg# cat /proc/filecache # file /test/tiny # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback # idx len state refcnt 0 1 ___UD__Bd_ 2 [T2] root /home/wfg# cat /proc/filecache # file /test/tiny # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback # idx len state refcnt 0 1 ___U___Bd_ 2 2) note the non-zero 'cancelled_write_bytes' after /tmp/hi is copied. ------------------------------ screen 0 ------------------------------ [T0] root /home/wfg# echo hi > /tmp/hi [T1] root /home/wfg# cp /tmp/hi /dev/stdin /test [T2] hi [T3] root /home/wfg# ------------------------------ screen 1 ------------------------------ [T1] root /proc/4397# cd /proc/`pidof cp` [T1] root /proc/4713# cat io rchar: 8396 wchar: 3 syscr: 20 syscw: 1 read_bytes: 0 write_bytes: 20480 cancelled_write_bytes: 4096 [T2] root /proc/4713# cat io rchar: 8399 wchar: 6 syscr: 21 syscw: 2 read_bytes: 0 write_bytes: 24576 cancelled_write_bytes: 4096 //Question: the 'write_bytes' is a bit more than expected ;-) Tested-by: Maxim Levitsky <maximlevitsky@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Reviewed-by: Chris Mason <chris.mason@oracle.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-14 18:45:41 -08:00
Dmitri Vorobiev	f433dc5634	Fixes to the BFS filesystem driver I found a few bugs in the BFS driver. Detailed description of the bugs as well as the steps to reproduce the errors are given in the kernel bugzilla. Please follow these links for more information: http://bugzilla.kernel.org/show_bug.cgi?id=9363 http://bugzilla.kernel.org/show_bug.cgi?id=9364 http://bugzilla.kernel.org/show_bug.cgi?id=9365 http://bugzilla.kernel.org/show_bug.cgi?id=9366 This patch fixes the bugs described above. Besides, the patch introduces coding style changes to make the BFS driver conform to the requirements specified for Linux kernel code. Finally, I made a few cosmetic changes such as removal of trivial debug output. Also, the patch removes the fields `si_lf_ioff' and `si_lf_sblk' of the in-core superblock structure. These fields are initialized but never actually used. If you are wondering why I need BFS, here is the answer: I am using this driver in the context of Linux kernel classes I am teaching in the Moscow State University and in the International Institute of Information Technology in Pune, India. Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com> Cc: Tigran Aivazian <tigran@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-14 18:45:40 -08:00
Adam Litke	9a119c056d	hugetlb: allow bulk updating in hugetlb_*_quota() Add a second parameter 'delta' to hugetlb_get_quota and hugetlb_put_quota to allow bulk updating of the sbinfo->free_blocks counter. This will be used by the next patch in the series. Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: Ken Chen <kenchen@google.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: David Gibson <hermes@gibson.dropbear.id.au> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-14 18:45:40 -08:00
Adam Litke	c79fb75e5a	hugetlb: fix quota management for private mappings The hugetlbfs quota management system was never taught to handle MAP_PRIVATE mappings when that support was added. Currently, quota is debited at page instantiation and credited at file truncation. This approach works correctly for shared pages but is incomplete for private pages. In addition to hugetlb_no_page(), private pages can be instantiated by hugetlb_cow(); but this function does not respect quotas. Private huge pages are treated very much like normal, anonymous pages. They are not "backed" by the hugetlbfs file and are not stored in the mapping's radix tree. This means that private pages are invisible to truncate_hugepages() so that function will not credit the quota. This patch (based on a prototype provided by Ken Chen) moves quota crediting for all pages into free_huge_page(). page->private is used to store a pointer to the mapping to which this page belongs. This is used to credit quota on the appropriate hugetlbfs instance. Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: Ken Chen <kenchen@google.com> Cc: Ken Chen <kenchen@google.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: David Gibson <hermes@gibson.dropbear.id.au> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-14 18:45:40 -08:00

1 2 3 4 5 ...

7284 commits