linux

Commit Graph

Author	SHA1	Message	Date
Jason Gunthorpe	44a743f687	xfs: Fix error return for fallocate() on XFS Noticed that through glibc fallocate would return 28 rather than -1 and errno = 28 for ENOSPC. The xfs routines uses XFS_ERROR format positive return error codes while the syscalls use negative return codes. Fixup the two cases in xfs_vn_fallocate syscall to convert to negative. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:23 -06:00
Christoph Hellwig	30ac0683dd	xfs: cleanup dmapi macros in the umount path Stop the flag saving as we never mangle those in the unmount path, and hide all the weird arguents to the dmapi code inside the XFS_SEND_PREUNMOUNT / XFS_SEND_UNMOUNT macros. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:23 -06:00
Christoph Hellwig	0c3dc2b02a	xfs: remove incorrect sparse annotation for xfs_iget_cache_miss xfs_iget_cache_miss does not get called with the pag_ici_lock held, so the __releases annotation is incorrect. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:23 -06:00
Christoph Hellwig	b8f82a4a6f	xfs: kill the STATIC_INLINE macro Remove our own STATIC_INLINE macro. For small function inside implementation files just use STATIC and let gcc inline it, and for those in headers do the normal static inline - they are all small enough to be inlined for debug builds, too. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:22 -06:00
Christoph Hellwig	5683f53e36	xfs: uninline xfs_get_extsz_hint This function is too large to efficiently be inlined. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:22 -06:00
Christoph Hellwig	e82fa0c7ca	xfs: rename xfs_attr_fetch to xfs_attr_get_int Using a totally different name for the low-level get operation does not fit the _int convention used in the rest of the attr code, so rename it. While we're at it also fix the prototype to use the normal convention and mark it static as it's never used outside of xfs_attr.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:22 -06:00
Christoph Hellwig	6ad112bfb5	xfs: simplify xfs_buf_get / xfs_buf_read interfaces Currently the low-level buffer cache interfaces are highly confusing as we have a _flags variant of each that does actually respect the flags, and one without _flags which has a flags argument that gets ignored and overriden with a default set. Given that very few places use the default arguments get rid of the duplication and convert all callers to pass the flags explicitly. Also remove the now confusing _flags postfix. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:21 -06:00
Christoph Hellwig	c355c656fe	xfs: remove IO_ISAIO We set the IO_ISAIO flag for all read/write I/O since early Linux 2.6.x. Remove it as it has lost it's purpose long ago. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:21 -06:00
Andy Poling	fc5bc4c85c	xfs: Wrapped journal record corruption on read at recovery Summary of problem: If a journal record wraps at the physical end of the journal, it has to be read in two parts in xlog_do_recovery_pass(): a read at the physical end and a read at the physical beginning. If xlog_bread() has to re-align the first read, the second read request does not take that re-alignment into account. If the first read was re-aligned, the second read over-writes the end of the data from the first read, effectively corrupting it. This can happen either when reading the record header or reading the record data. The first sanity check in xlog_recover_process_data() is to check for a valid clientid, so that is the error reported. Summary of fix: If there was a first read at the physical end, XFS_BUF_PTR() returns where the data was requested to begin. Conversely, because it is the result of xlog_align(), offset indicates where the requested data for the first read actually begins - whether or not xlog_bread() has re-aligned it. Using offset as the base for the calculation of where to place the second read data ensures that it will be correctly placed immediately following the data from the first read instead of sometimes over-writing the end of it. The attached patch has resolved the reported problem of occasional inability to recover the journal (reporting "bad clientid"). Signed-off-by: Andy Poling <andy@realbig.com> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:21 -06:00
Christoph Hellwig	5ec4fabb02	xfs: cleanup data end I/O handlers Currently we have different end I/O handlers for read vs the different types of write I/O. But they are all very similar so we could just use one with a few conditionals and reduce code size a lot. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:20 -06:00
Christoph Hellwig	06342cf8ad	xfs: use WRITE_SYNC_PLUG for synchronous writeout The VM and I/O schedulers now expect us to use WRITE_SYNC_PLUG for synchronous writeout. Right now I can't see any changes in performance numbers with this, but we're getting some beating for not using it, and the knowledge definitely could help the block code to make better decisions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:20 -06:00
Christoph Hellwig	033da48fda	xfs: reset the i_iolock lock class in the reclaim path The iolock is used for protecting reads, writes and block truncates against each other. We have two classes of callers, the first one is induced by a file operation and requires a reference to the inode be held and not dropped after the operation is done: - xfs_vm_vmap, xfs_vn_fallocate, xfs_read, xfs_write, xfs_splice_read, xfs_splice_write and xfs_setattr are all implementations of VFS methods that require a live inode - xfs_getbmap and xfs_swap_extents are ioctl subcommand for which the same is true - xfs_truncate_file is only called on quota inodes just returned from xfs_iget - xfs_sync_inode_data does the lock just after an igrab() - xfs_filestream_associate and xfs_filestream_new_ag take the iolock on the parent inode of an inode which by VFS rules must be referenced And we have various calls to truncate blocks past EOF or the whole file when dropping the last reference to an inode. Unfortunately lockdep complains when we do memory allocations that can recurse into the filesystem in the first class because the second class happens to take the same lock. To avoid this re-init the iolock in the beginning of xfs_fs_clear_inode to get a new lock class. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:20 -06:00
Christoph Hellwig	80641dc66a	xfs: I/O completion handlers must use NOFS allocations When completing I/O requests we must not allow the memory allocator to recurse into the filesystem, as we might deadlock on waiting for the I/O completion otherwise. The only thing currently allocating normal GFP_KERNEL memory is the allocation of the transaction structure for the unwritten extent conversion. Add a memflags argument to _xfs_trans_alloc to allow controlling the allocator behaviour. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Thomas Neumann <tneumann@users.sourceforge.net> Tested-by: Thomas Neumann <tneumann@users.sourceforge.net> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:20 -06:00
Christoph Hellwig	c56c9631cb	xfs: fix mmap_sem/iolock inversion in xfs_free_eofblocks When xfs_free_eofblocks is called from ->release the VM might already hold the mmap_sem, but in the write path we take the iolock before taking the mmap_sem in the generic write code. Switch xfs_free_eofblocks to only trylock the iolock if called from ->release and skip trimming the prellocated blocks in that case. We'll still free them later on the final iput. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:19 -06:00
Christoph Hellwig	848ce8f731	xfs: simplify inode teardown Currently the reclaim code for the case where we don't reclaim the final reclaim is overly complicated. We know that the inode is clean but instead of just directly reclaiming the clean inode we go through the whole process of marking the inode reclaimable just to directly reclaim it from the calling context. Besides being overly complicated this introduces a race where iget could recycle an inode between marked reclaimable and actually being reclaimed leading to panics. This patch gets rid of the existing reclaim path, and replaces it with a simple call to xfs_ireclaim if the inode was clean. While we're at it we also use the slightly more lax xfs_inode_clean check we'd use later to determine if we need to flush the inode here. Finally get rid of xfs_reclaim function and place the remaining small bits of reclaim code directly into xfs_fs_destroy_inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Patrick Schreurs <patrick@news-service.com> Reported-by: Tommy van Leeuwen <tommy@news-service.com> Tested-by: Patrick Schreurs <patrick@news-service.com> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-11 15:11:19 -06:00
Linus Torvalds	4e2ccdb040	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (49 commits) nilfs2: separate wait function from nilfs_segctor_write nilfs2: add iterator for segment buffers nilfs2: hide nilfs_write_info struct in segment buffer code nilfs2: relocate io status variables to segment buffer nilfs2: do not return io error for bio allocation failure nilfs2: use list_splice_tail or list_splice_tail_init nilfs2: replace mark_inode_dirty as nilfs_mark_inode_dirty nilfs2: delete mark_inode_dirty in nilfs_delete_entry nilfs2: delete mark_inode_dirty in nilfs_commit_chunk nilfs2: change return type of nilfs_commit_chunk nilfs2: split nilfs_unlink as nilfs_do_unlink and nilfs_unlink nilfs2: delete redundant mark_inode_dirty nilfs2: expand inode_inc_link_count and inode_dec_link_count nilfs2: delete mark_inode_dirty from nilfs_set_link nilfs2: delete mark_inode_dirty in nilfs_new_inode nilfs2: add norecovery mount option nilfs2: add helper to get if volume is in a valid state nilfs2: move recovery completion into load_nilfs function nilfs2: apply readahead for recovery on mount nilfs2: clean up get/put function of a segment usage ...	2009-12-11 11:49:18 -08:00
Eric W. Biederman	e16acb503b	sysfs: sysfs_setattr remove unnecessary permission check. inode_change_ok already clears the SGID bit when necessary so there is no reason for sysfs_setattr to carry code to do the same, and it is good to kill the extra copy because when I moved the code last in certain corner cases the code will look at the wrong gid. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:54 -08:00
Eric W. Biederman	ca1bab3819	sysfs: Factor out sysfs_rename from sysfs_rename_dir and sysfs_move_dir These two functions do 90% of the same work and it doesn't significantly obfuscate the function to allow both the parent dir and the name to change at the same time. So merge them together to simplify maintenance, and increase testing. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:54 -08:00
Eric W. Biederman	832b6af198	sysfs: Propagate renames to the vfs on demand By teaching sysfs_revalidate to hide a dentry for a sysfs_dirent if the sysfs_dirent has been renamed, and by teaching sysfs_lookup to return the original dentry if the sysfs dirent has been renamed. I can show the results of renames correctly without having to update the dcache during the directory rename. This massively simplifies the rename logic allowing a lot of weird sysfs special cases to be removed along with a lot of now unnecesary helper code. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:54 -08:00
Eric W. Biederman	a16bbc3430	sysfs: Gut sysfs_addrm_start and sysfs_addrm_finish With lazy inode updates and dentry operations bringing everything into sync on demand there is no longer any need to immediately update the vfs or grab i_mutex to protect those updates as we make changes to sysfs. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:54 -08:00
Eric W. Biederman	06fc0d66f7	sysfs: In sysfs_chmod_file lazily propagate the mode change. Now that sysfs_getattr and sysfs_permission refresh the vfs inode there is no need to immediatly push the mode change into the vfs cache. Reducing the amount of work needed and simplifying the locking. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:54 -08:00
Eric W. Biederman	e61ab4ae48	sysfs: Implement sysfs_getattr & sysfs_permission With the implementation of sysfs_getattr and sysfs_permission sysfs becomes able to lazily propogate inode attribute changes from the sysfs_dirents to the vfs inodes. This paves the way for deleting significant chunks of now unnecessary code. While doing this we did not reference sysfs_setattr from sysfs_symlink_inode_operations so I added along with sysfs_getattr and sysfs_permission. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:54 -08:00
Eric W. Biederman	c099aacd48	sysfs: Nicely indent sysfs_symlink_inode_operations Lining up the functions in sysfs_symlink_inode_operations follows the pattern in the rest of sysfs and makes things slightly more readable. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:53 -08:00
Eric W. Biederman	6b0bfe9383	sysfs: Update s_iattr on link and unlink. Currently sysfs updates the timestamps on the vfs directory inode when we create or remove a directory entry but doesn't update the cached copy on the sysfs_dirent, fix that oversight. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:53 -08:00
Eric W. Biederman	35df63c46c	sysfs: Fix locking and factor out sysfs_sd_setattr Cleanly separate the work that is specific to setting the attributes of a sysfs_dirent from what is needed to update the attributes of a vfs inode. Additionally grab the sysfs_mutex to keep any nasties from surprising us when updating the sysfs_dirent. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:53 -08:00
Eric W. Biederman	4be3df28be	sysfs: Simplify iattr time assignments The granularity of sysfs time when we keep it is 1 ns. Which when passed to timestamp_trunc results in a nop. So remove the unnecessary function call making sysfs_setattr slightly easier to read. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:53 -08:00
Eric W. Biederman	4c6974f51a	sysfs: Simplify sysfs_chmod_file semantics Currently every caller of sysfs_chmod_file happens at either file creation time to set a non-default mode or in response to a specific user requested space change in policy. Making timestamps of when the chmod happens and notification of a file changing mode uninteresting. Remove the unnecessary time stamp and filesystem change notification, and removes the last of the explicit inotify and donitfy support from sysfs. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:53 -08:00
Eric W. Biederman	e8f077c883	sysfs: Use dentry_ops instead of directly playing with the dcache Calling d_drop unconditionally when a sysfs_dirent is deleted has the potential to leak mounts, so instead implement dentry delete and revalidate operations that cause sysfs dentries to be removed at the appropriate time. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:53 -08:00
Eric W. Biederman	28a027cfc0	sysfs: Rename sysfs_d_iput to sysfs_dentry_iput Using dentry instead of d in the function name is what several other filesystems are doing and it seems to be a more readable convention. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:53 -08:00
Eric W. Biederman	f44d3e7857	sysfs: Update sysfs_setxattr so it updates secdata under the sysfs_mutex The sysfs_mutex is required to ensure updates are and will remain atomic with respect to other inode iattr updates, that do not happen through the filesystem. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:53 -08:00
Mathieu Desnoyers	d3a3b0adad	debugfs: fix create mutex racy fops and private data Setting fops and private data outside of the mutex at debugfs file creation introduces a race where the files can be opened with the wrong file operations and private data. It is easy to trigger with a process waiting on file creation notification. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:53 -08:00
Stefan Richter	f38506c49d	sysfs: mark a locally-only used function static Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Acked-by: David P. Quigley <dpquigl@tycho.nsa.gov> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:51 -08:00
Arnd Bergmann	637e8a60a7	usbdevfs: move compat_ioctl handling to devio.c Half the compat_ioctl handling is in devio.c, the other half is in fs/compat_ioctl.c. This moves everything into one place for consistency. As a positive side-effect, push down the BKL into the ioctl methods. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Oliver Neukum <oliver@neukum.org> Cc: Alon Bar-Lev <alon.barlev@gmail.com> Cc: David Vrabel <david.vrabel@csr.com> Cc: linux-usb@vger.kernel.org	2009-12-10 22:55:37 +01:00
Arnd Bergmann	3695669cd4	lp: move compat_ioctl handling into lp.c Handling for LPSETTIMEOUT can easily be done in lp_ioctl, which is the only user. As a positive side-effect, push the BKL into the ioctl methods. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-10 22:55:36 +01:00
Arnd Bergmann	43c6e7b97f	compat_ioctl: pass compat pointer directly to handlers Instead of having each handler call compat_ptr, we can now convert the pointer once and pass that to each handler. This saves a little bit of both source and object code size. Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2009-12-10 22:52:12 +01:00
Arnd Bergmann	661f627da9	compat_ioctl: simplify lookup table The compat_ioctl table now only contains entries for COMPATIBLE_IOCTL, so we only need to know if a number is listed in it or now. As an optimization, we hash the table entries with a reversible transformation to get a more uniform distribution over it, sort the table at startup and then guess the position in the table when an ioctl number gets called to do a linear search from there. With the current set of ioctl numbers and the chosen transformation function, we need an average of four steps to find if a number is in the set, all of the accesses within one or two cache lines. This at least as good as the previous hash table approach but saves 8.5 kb of kernel memory. Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2009-12-10 22:52:11 +01:00
Arnd Bergmann	789f0f8911	compat_ioctl: simplify calling of handlers The compat_ioctl array now contains only entries for ioctl numbers that do not require a separate handler. By special-casing the ULONG_IOCTL case in the do_ioctl_trans function, we can kill the final use of a function pointer in the array. text data bss dec hex filename 7539 13352 2080 22971 59bb before/fs/compat_ioctl.o 7910 8552 2080 18542 486e after/fs/compat_ioctl.o Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2009-12-10 22:52:10 +01:00
Arnd Bergmann	5a07ea0b97	compat_ioctl: inline all conversion handlers This makes all ioctl conversion handlers called from a single switch statement, leaving only COMPATIBLE_IOCTL and ULONG_IOCTL statements in the table. This is somewhat more space efficient and also lets us simplify the handling of the lookup table significantly. before: text data bss dec hex filename 7619 14024 2080 23723 5cab obj/fs/compat_ioctl.o after: 7567 13352 2080 22999 59d7 obj/fs/compat_ioctl.o Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2009-12-10 22:52:09 +01:00
Arnd Bergmann	348c4b9078	compat_ioctl: Remove BKL We have always called ioctl conversion handlers under the big kernel lock, although that is generally not necessary. In particular it is not needed for conversion of data structures and for calling sys_ioctl or do_vfs_ioctl, which will get the BKL again if needed. Handlers doing more than those two have been moved out, so we can kill off the BKL from compat_sys_ioctl. This may significantly improve latencies with 32 bit applications, and it avoids a common scenario where a thread acquires the BKL twice. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2009-12-10 22:52:08 +01:00
Arnd Bergmann	fb07a5f857	compat_ioctl: remove all VT ioctl handling The VT driver now handles all of these ioctls directly, so we can remove the handlers from common code. These are the only handlers that require the BKL because they directly perform the ioctl action rather than just converting the data structures. Once they are gone, we can remove the BKL from the remaining ioctl conversion handlers. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2009-12-10 22:52:08 +01:00
Linus Torvalds	02412f49f6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: always use GFP_NOFS	2009-12-10 09:33:59 -08:00
Linus Torvalds	4515c3069d	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (47 commits) ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem) ext4: Do not override ext2 or ext3 if built they are built as modules jbd2: Export jbd2_log_start_commit to fix ext4 build ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXT ext4: Wait for proper transaction commit on fsync ext4: fix incorrect block reservation on quota transfer. ext4: quota macros cleanup ext4: ext4_get_reserved_space() must return bytes instead of blocks ext4: remove blocks from inode prealloc list on failure ext4: wait for log to commit when umounting ext4: Avoid data / filesystem corruption when write fails to copy data ext4: Use ext4 file system driver for ext2/ext3 file system mounts ext4: Return the PTR_ERR of the correct pointer in setup_new_group_blocks() jbd2: Add ENOMEM checking in and for jbd2_journal_write_metadata_buffer() ext4: remove unused parameter wbc from __ext4_journalled_writepage() ext4: remove encountered_congestion trace ext4: move_extent_per_page() cleanup ext4: initialize moved_len before calling ext4_move_extents() ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT ext4: use ext4_data_block_valid() in ext4_free_blocks() ...	2009-12-10 09:33:29 -08:00
Linus Torvalds	a5eba3f66f	Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd * 'for-linus' of git://git.open-osd.org/linux-open-osd: exofs: Multi-device mirror support exofs: Move all operations to an io_engine exofs: move osd.c to ios.c exofs: statfs blocks is sectors not FS blocks exofs: Prints on mount and unmout exofs: refactor exofs_i_info initialization into common helper exofs: dbg-print less exofs: More sane debug print trivial: some small fixes in exofs documentation	2009-12-10 09:32:24 -08:00
Linus Torvalds	fc1495bf99	Merge git://git.infradead.org/ubifs-2.6 * git://git.infradead.org/ubifs-2.6: UBIFS: fix return code in check_leaf UBI: flush wl before clearing update marker MAINTAINERS: change e-mail of Artem Bityutskiy UBIFS: remove manual O_SYNC handling UBIFS: support mounting of UBI volume character devices UBI: Add ubi_open_volume_path	2009-12-10 09:31:45 -08:00
Trond Myklebust	190f38e5ce	NFS: Fix nfs_migrate_page() The call to migrate_page() will cause the page->private field to be cleared. Also fix up the locking around the page->private transfer, so that we ensure that calls to nfs_page_find_request() don't end up racing. Finally, fix up a double free bug: nfs_unlock_request() already calls nfs_release_request() for us... Reported-by: Wu Fengguang <fengguang.wu@intel.com> Tested-by: Andi Kleen <andi@firstfloor.org> Cc: stable@kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-10 09:05:55 -05:00
Roel Kluin	8e0eb4011b	ext3: PTR_ERR return of wrong pointer in setup_new_group_blocks() Return the PTR_ERR of the correct pointer. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:55 +01:00
Jan Kara	68eb3db083	ext3: Fix data / filesystem corruption when write fails to copy data When ext3_write_begin fails after allocating some blocks or generic_perform_write fails to copy data to write, we truncate blocks already instantiated beyond i_size. Although these blocks were never inside i_size, we have to truncate pagecache of these blocks so that corresponding buffers get unmapped. Otherwise subsequent __block_prepare_write (called because we are retrying the write) will find the buffers mapped, not call ->get_block, and thus the page will be backed by already freed blocks leading to filesystem and data corruption. Reported-by: James Y Knight <foom@fuhm.net> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:55 +01:00
Jan Kara	5a20bdfcdc	ext4: Support for 64-bit quota format Add support for new 64-bit quota format. It is enough to add proper mount options handling. The rest is done by the generic code. Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:54 +01:00
Jan Kara	1aeec43432	ext3: Support for vfsv1 quota format We just have to add proper mount options handling. The rest is handled by the generic quota code. CC: linux-ext4@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:54 +01:00
Jan Kara	498c60153e	quota: Implement quota format with 64-bit space and inode limits So far the maximum quota space limit was 4TB. Apparently this isn't enough for Lustre guys anymore. So implement new quota format which raises block limits to 2^64 bytes. Also store number of inodes and inode limits in 64-bit variables as 2^32 files isn't that insanely high anymore. The first version of the patch has been developed by Andrew Perepechko <Andrew.Perepechko@Sun.COM>. CC: Andrew.Perepechko@Sun.COM Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:54 +01:00
Jan Kara	3067393005	quota: Move definition of QFMT_OCFS2 to linux/quota.h Move definition of this constant to linux/quota.h so that it cannot clash with other format IDs. CC: Joel Becker <joel.becker@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:53 +01:00
Jérémy Cochoy	92e128884b	ext2: fix comment in ext2_find_entry about return values Signed-off-by: Jérémy Cochoy <jeremy.cochoy@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:53 +01:00
Alexey Fisher	4cf46b67eb	ext3: Unify log messages in ext3 Make messages produced by ext3 more unified. It should be easy to parse. dmesg before patch: [ 4893.684892] reservations ON [ 4893.684896] xip option not supported [ 4893.684964] EXT3-fs warning: maximal mount count reached, running e2fsck is recommended dmesg after patch: [ 873.300792] EXT3-fs (loop0): using internal journaln [ 873.300796] EXT3-fs (loop0): mounted filesystem with writeback data mode [ 924.163657] EXT3-fs (loop0): error: can't find ext3 filesystem on dev loop0. [ 723.755642] EXT3-fs (loop0): error: bad blocksize 8192 [ 357.874687] EXT3-fs (loop0): error: no journal found. mounting ext3 over ext2? [ 873.300764] EXT3-fs (loop0): warning: maximal mount count reached, running e2fsck is recommended [ 924.163657] EXT3-fs (loop0): error: can't find ext3 filesystem on dev loop0. Signed-off-by: Alexey Fisher <bug-track@fisher-privat.net> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:53 +01:00
Stephen Hemminger	2074abfeb8	ext2: clear uptodate flag on super block I/O error This fixes a WARN backtrace in mark_buffer_dirty() that occurs during unmount when a USB or floppy device is removed. I reported this a kernel regression, but looks like it might have been there for longer than that. The super block update from a previous operation has marked the buffer as in error, and the flag has to be cleared before doing the update. (Similar code already exists in ext4). Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:53 +01:00
Alexey Fisher	2314b07cb4	ext2: Unify log messages in ext2 make messages produced by ext2 more unified. It should be easy to parse. dmesg before patch: [ 4893.684892] reservations ON [ 4893.684896] xip option not supported [ 4893.684961] EXT2-fs warning: mounting ext3 filesystem as ext2 [ 4893.684964] EXT2-fs warning: maximal mount count reached, running e2fsck is recommended [ 4893.684990] EXT II FS: 0.5b, 95/08/09, bs=1024, fs=1024, gc=2, bpg=8192, ipg=1280, mo=80010] dmesg after patch: [ 4893.684892] EXT2-fs (loop0): reservations ON [ 4893.684896] EXT2-fs (loop0): xip option not supported [ 4893.684961] EXT2-fs (loop0): warning: mounting ext3 filesystem as ext2 [ 4893.684964] EXT2-fs (loop0): warning: maximal mount count reached, running e2fsck is recommended [ 4893.684990] EXT2-fs (loop0): 0.5b, 95/08/09, bs=1024, fs=1024, gc=2, bpg=8192, ipg=1280, mo=80010] Signed-off-by: Alexey Fisher <bug-track@fisher-privat.net> Reviewed-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:52 +01:00
Eric Sandeen	dee1d3b627	ext3: make "norecovery" an alias for "noload" Users on the list recently complained about differences across filesystems w.r.t. how to mount without a journal replay. In the discussion it was noted that xfs's "norecovery" option is perhaps more descriptively accurate than "noload," so let's make that an alias for ext3. Also show this status in /proc/mounts Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:52 +01:00
Eric Sandeen	b918397542	ext3: Don't update the superblock in ext3_statfs() commit `a71ce8c6c9` updated ext3_statfs() to update the on-disk superblock counters, but modified this buffer directly without any journaling of the change. This is one of the accesses that was causing the crc errors in journal replay as seen in kernel.org bugzilla #14354. The modifications were originally to keep the sb "more" in sync, so that a readonly fsck of the device didn't flag this as an error (as often), but apparently e2fsprogs deals with this differently now, anyway. Based on Ted's patch for ext4, which was in turn based on my work on that bug and another preliminary patch... Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:52 +01:00
Eric Sandeen	d965736b8c	ext3: journal all modifications in ext3_xattr_set_handle ext3_xattr_set_handle() was zeroing out an inode outside of journaling constraints; this is one of the accesses that was causing the crc errors in journal replay as seen in kernel.org bugzilla #14354. Although ext3 doesn't have the crc issue, modifications out of journal control are a Bad Thing. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:52 +01:00
Jan Kara	c56818d7dc	quota: Fix WARN_ON in lookup_one_len We should hold i_mutex when looking up quota files for journaled quotas, otherwise a WARN_ON in lookup_one_len triggers. The fact that we didn't hold i_mutex previously probably could not lead to a real bug since the filesystem is just being mounted / remounted read-write and thus the root directory cannot change anyway but it's definitely cleaner with i_mutex. Reported-by: Bastien ROUCARIES <roucaries.bastien@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:51 +01:00
Alexey Dobriyan	1472da5fdc	const: struct quota_format_ops Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:51 +01:00
Christoph Hellwig	5ced58f735	ubifs: remove manual O_SYNC handling generic_file_aio_write already calls into ->fsync to handle O_SYNC/O_DSYNC. Remove the duplicate call to ubifs_sync_wbufs_by_inode which is already covered by ubifs_fsync. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:51 +01:00
Christoph Hellwig	027cf316af	afs: remove manual O_SYNC handling generic_file_aio_write already calls into ->fsync to handle O_SYNC/O_DSYNC. Remove the duplicate manual invocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:50 +01:00
Christoph Hellwig	94004ed726	kill wait_on_page_writeback_range All callers really want the more logical filemap_fdatawait_range interface, so convert them to use it and merge wait_on_page_writeback_range into filemap_fdatawait_range. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:50 +01:00
Christoph Hellwig	6b2f3d1f76	vfs: Implement proper O_SYNC semantics While Linux provided an O_SYNC flag basically since day 1, it took until Linux 2.4.0-test12pre2 to actually get it implemented for filesystems, since that day we had generic_osync_around with only minor changes and the great "For now, when the user asks for O_SYNC, we'll actually give O_DSYNC" comment. This patch intends to actually give us real O_SYNC semantics in addition to the O_DSYNC semantics. After Jan's O_SYNC patches which are required before this patch it's actually surprisingly simple, we just need to figure out when to set the datasync flag to vfs_fsync_range and when not. This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's numerical value to keep binary compatibility, and adds a new real O_SYNC flag. To guarantee backwards compatiblity it is defined as expanding to both the O_DSYNC and the new additional binary flag (__O_SYNC) to make sure we are backwards-compatible when compiled against the new headers. This also means that all places that don't care about the differences can just check O_DSYNC and get the right behaviour for O_SYNC, too - only places that actuall care need to check __O_SYNC in addition. Drivers and network filesystems have been updated in a fail safe way to always do the full sync magic if O_DSYNC is set. The few places setting O_SYNC for lower layers are kept that way for now to stay failsafe. We enforce that O_DSYNC is set when __O_SYNC is set early in the open path to make sure we always get these sane options. Note that parisc really screwed up their headers as they already define a O_DSYNC that has always been a no-op. We try to repair it by using it for the new O_DSYNC and redefinining O_SYNC to send both the traditional O_SYNC numerical value _and_ the O_DSYNC one. Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:50 +01:00
Jan Kara	59bc055211	zisofs: Implement reading of compressed files when PAGE_CACHE_SIZE > compress block size Also split and cleanup zisofs_readpage() when we are changing it anyway. Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:49 +01:00
Boaz Harrosh	04dc1e88ad	exofs: Multi-device mirror support This patch changes on-disk format, it is accompanied with a parallel patch to mkfs.exofs that enables multi-device capabilities. After this patch, old exofs will refuse to mount a new formatted FS and new exofs will refuse an old format. This is done by moving the magic field offset inside the FSCB. A new FSCB version field was added. In the future, exofs will refuse to mount unmatched FSCB version. To up-grade or down-grade an exofs one must use mkfs.exofs --upgrade option before mounting. Introduced, a new object that contains a device-table. This object contains the default data-map and a linear array of devices information, which identifies the devices used in the filesystem. This object is only written to offline by mkfs.exofs. This is why it is kept separate from the FSCB, since the later is written to while mounted. Same partition number, same object number is used on all devices only the device varies. * define the new format, then load the device table on mount time make sure every thing is supported. * Change I/O engine to now support Mirror IO, .i.e write same data to multiple devices, read from a random device to spread the read-load from multiple clients (TODO: stripe read) Implementation notes: A few points introduced in previous patch should be mentioned here: * Special care was made so absolutlly all operation that have any chance of failing are done before any osd-request is executed. This is to minimize the need for a data consistency recovery, to only real IO errors. * Each IO state has a kref. It starts at 1, any osd-request executed will increment the kref, finally when all are executed the first ref is dropped. At IO-done, each request completion decrements the kref, the last one to return executes the internal _last_io() routine. _last_io() will call the registered io_state_done. On sync mode a caller does not supply a done method, indicating a synchronous request, the caller is put to sleep and a special io_state_done is registered that will awaken the caller. Though also in sync mode all operations are executed in parallel. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-12-10 09:59:23 +02:00
Boaz Harrosh	06886a5a3d	exofs: Move all operations to an io_engine In anticipation for multi-device operations, we separate osd operations into an abstract I/O API. Currently only one device is used but later when adding more devices, we will drive all devices in parallel according to a "data_map" that describes how data is arranged on multiple devices. The file system level operates, like before, as if there is one object (inode-number) and an i_size. The io engine will split this to the same object-number but on multiple device. At first we introduce Mirror (raid 1) layout. But at the final outcome we intend to fully implement the pNFS-Objects data-map, including raid 0,4,5,6 over mirrored devices, over multiple device-groups. And more. See: http://tools.ietf.org/html/draft-ietf-nfsv4-pnfs-obj-12 * Define an io_state based API for accessing osd storage devices in an abstract way. Usage: First a caller allocates an io state with: exofs_get_io_state(struct exofs_sb_info sbi, struct exofs_io_state* ios); Then calles one of: exofs_sbi_create(struct exofs_io_state ios); exofs_sbi_remove(struct exofs_io_state ios); exofs_sbi_write(struct exofs_io_state ios); exofs_sbi_read(struct exofs_io_state ios); exofs_oi_truncate(struct exofs_i_info oi, u64 new_len); And when done exofs_put_io_state(struct exofs_io_state ios); * Convert all source files to use this new API * Convert from bio_alloc to bio_kmalloc * In io engine we make use of the now fixed osd_req_decode_sense There are no functional changes or on disk additions after this patch. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-12-10 09:59:22 +02:00
Boaz Harrosh	8ce9bdd1fb	exofs: move osd.c to ios.c If I do a "git mv" together with a massive code change and commit in one patch, git looses the rename and records a delete/new instead. This is bad because I want a rename recorded so later rebased/cherry-picked patches to the old name will work. Also the --follow is lost. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-12-10 09:59:21 +02:00
Boaz Harrosh	cae012d853	exofs: statfs blocks is sectors not FS blocks Even though exofs has a 4k block size, statfs blocks is in sectors (512 bytes). Also if target returns 0 for capacity then make it ULLONG_MAX. df does not like zero-size filesystems Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-12-10 09:59:21 +02:00
Boaz Harrosh	19fe294f2e	exofs: Prints on mount and unmout It is important to print in the logs when a filesystem was mounted and eventually unmounted. Print the osd-device's osd_name and pid the FS was mounted/unmounted on. TODO: How to also print the namespace path the filesystem was mounted on? Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-12-10 09:59:20 +02:00
Boaz Harrosh	9cfdc7aa9f	exofs: refactor exofs_i_info initialization into common helper There are two places that initialize inodes: exofs_iget() and exofs_new_inode() As more members of exofs_i_info that need initialization are added this code will grow. (soon) Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-12-10 09:59:19 +02:00
Boaz Harrosh	fe33cc1ee1	exofs: dbg-print less Iner-loops printing is converted to EXOFS_DBG2 which is #defined to nothing. It is now almost bareable to just leave debug-on. Every operation is printed once, with most relevant info (I hope). Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-12-10 09:59:18 +02:00
Boaz Harrosh	58311c43df	exofs: More sane debug print debug prints should be somewhat useful without actually reading the source code Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2009-12-10 09:59:17 +02:00
Linus Torvalds	4ef58d4e2a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits) tree-wide: fix misspelling of "definition" in comments reiserfs: fix misspelling of "journaled" doc: Fix a typo in slub.txt. inotify: remove superfluous return code check hdlc: spelling fix in find_pvc() comment doc: fix regulator docs cut-and-pasteism mtd: Fix comment in Kconfig doc: Fix IRQ chip docs tree-wide: fix assorted typos all over the place drivers/ata/libata-sff.c: comment spelling fixes fix typos/grammos in Documentation/edac.txt sysctl: add missing comments fs/debugfs/inode.c: fix comment typos sgivwfb: Make use of ARRAY_SIZE. sky2: fix sky2_link_down copy/paste comment error tree-wide: fix typos "couter" -> "counter" tree-wide: fix typos "offest" -> "offset" fix kerneldoc for set_irq_msi() spidev: fix double "of of" in comment comment typo fix: sybsystem -> subsystem ...	2009-12-09 19:43:33 -08:00
Theodore Ts'o	fab3a549e2	ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem) Fix the following potential circular locking dependency between mm->mmap_sem and ei->i_data_sem: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-04115-gec044c5 #37 ------------------------------------------------------- ureadahead/1855 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<ffffffff81107224>] might_fault+0x5c/0xac but task is already holding lock: (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&ei->i_data_sem){++++..}: [<ffffffff81099bfa>] __lock_acquire+0xb67/0xd0f [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81516633>] down_read+0x51/0x84 [<ffffffff811a2414>] ext4_get_blocks+0x50/0x2a5 [<ffffffff811a3453>] ext4_get_block+0xab/0xef [<ffffffff81154f39>] do_mpage_readpage+0x198/0x48d [<ffffffff81155360>] mpage_readpages+0xd0/0x114 [<ffffffff811a104b>] ext4_readpages+0x1d/0x1f [<ffffffff810f8644>] __do_page_cache_readahead+0x12f/0x1bc [<ffffffff810f86f2>] ra_submit+0x21/0x25 [<ffffffff810f0cfd>] filemap_fault+0x19f/0x32c [<ffffffff81107b97>] __do_fault+0x55/0x3a2 [<ffffffff81109db0>] handle_mm_fault+0x327/0x734 [<ffffffff8151aaa9>] do_page_fault+0x292/0x2aa [<ffffffff81518205>] page_fault+0x25/0x30 [<ffffffff812a34d8>] clear_user+0x38/0x3c [<ffffffff81167e16>] padzero+0x20/0x31 [<ffffffff81168b47>] load_elf_binary+0x8bc/0x17ed [<ffffffff81130e95>] search_binary_handler+0xc2/0x259 [<ffffffff81166d64>] load_script+0x1b8/0x1cc [<ffffffff81130e95>] search_binary_handler+0xc2/0x259 [<ffffffff8113255f>] do_execve+0x1ce/0x2cf [<ffffffff81027494>] sys_execve+0x43/0x5a [<ffffffff8102918a>] stub_execve+0x6a/0xc0 -> #0 (&mm->mmap_sem){++++++}: [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81107251>] might_fault+0x89/0xac [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157 [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1 [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159 [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6 [<ffffffff811392ca>] sys_ioctl+0x56/0x79 [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b other info that might help us debug this: 1 lock held by ureadahead/1855: #0: (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159 stack backtrace: Pid: 1855, comm: ureadahead Not tainted 2.6.32-04115-gec044c5 #37 Call Trace: [<ffffffff81098c70>] print_circular_bug+0xa8/0xb7 [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f [<ffffffff8102f229>] ? sched_clock+0x9/0xd [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff81107251>] might_fault+0x89/0xac [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff81124b44>] ? __kmalloc+0x13b/0x18c [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157 [<ffffffff811bca0b>] ? ext4_ext_fiemap_cb+0x0/0x157 [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1 [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159 [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6 [<ffffffff8129f6d0>] ? __up_read+0x8d/0x95 [<ffffffff81517fb5>] ? retint_swapgs+0x13/0x1b [<ffffffff811392ca>] sys_ioctl+0x56/0x79 [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-09 21:30:02 -05:00
Theodore Ts'o	a214238d3b	ext4: Do not override ext2 or ext3 if built they are built as modules The CONFIG_EXT4_USE_FOR_EXT23 option must not try to take over the ext2 or ext3 file systems if the those file system drivers are configured to be built as mdoules. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-09 21:09:58 -05:00
Theodore Ts'o	3b799d15f2	jbd2: Export jbd2_log_start_commit to fix ext4 build This fixes: ERROR: "jbd2_log_start_commit" [fs/ext4/ext4.ko] undefined! Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-09 20:42:53 -05:00
Linus Torvalds	a9280fed38	Merge branch 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing * 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing: (31 commits) kill-the-bkl/reiserfs: turn GFP_ATOMIC flag to GFP_NOFS in reiserfs_get_block() kill-the-bkl/reiserfs: drop the fs race watchdog from _get_block_create_0() kill-the-bkl/reiserfs: definitely drop the bkl from reiserfs_ioctl() kill-the-bkl/reiserfs: always lock the ioctl path kill-the-bkl/reiserfs: fix reiserfs lock to cpu_add_remove_lock dependency kill-the-bkl/reiserfs: Fix induced mm->mmap_sem to sysfs_mutex dependency kill-the-bkl/reiserfs: panic in case of lock imbalance kill-the-bkl/reiserfs: fix recursive reiserfs write lock in reiserfs_commit_write() kill-the-bkl/reiserfs: fix recursive reiserfs lock in reiserfs_mkdir() kill-the-bkl/reiserfs: fix "reiserfs lock" / "inode mutex" lock inversion dependency kill-the-bkl/reiserfs: move the concurrent tree accesses checks per superblock kill-the-bkl/reiserfs: acquire the inode mutex safely kill-the-bkl/reiserfs: unlock only when needed in search_by_key kill-the-bkl/reiserfs: use mutex_lock in reiserfs_mutex_lock_safe kill-the-bkl/reiserfs: factorize the locking in reiserfs_write_end() kill-the-bkl/reiserfs: reduce number of contentions in search_by_key() kill-the-bkl/reiserfs: don't hold the write recursively in reiserfs_lookup() kill-the-bkl/reiserfs: lock only once on reiserfs_get_block() kill-the-bkl/reiserfs: conditionaly release the write lock on fs_changed() kill-the-BKL/reiserfs: add reiserfs_cond_resched() ...	2009-12-09 07:58:15 -08:00
Benjamin Herrenschmidt	bcd6acd51f	Merge commit 'origin/master' into next Conflicts: include/linux/kvm.h	2009-12-09 17:14:38 +11:00
Ricardo Labiaga	7cab89b275	nfs41: Invoke RECLAIM_COMPLETE on all new client ids The NFSv4.1 spec indicates RECLAIM_COMPLETE is to be issued whenever a client establishes a new client id, not only after detecting the server has rebooted. Set the NFS4CLNT_RECLAIM_REBOOT bit after every new client id has been established. This enables us to issue RECLAIM_COMPLETE during the wrap up of the NFS4CLNT_RECLAIM_REBOOT state. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-08 14:35:28 -05:00
Linus Torvalds	6035ccd8e9	Merge branch 'for-2.6.33' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.33' of git://git.kernel.dk/linux-2.6-block: (113 commits) cfq-iosched: Do not access cfqq after freeing it block: include linux/err.h to use ERR_PTR cfq-iosched: use call_rcu() instead of doing grace period stall on queue exit blkio: Allow CFQ group IO scheduling even when CFQ is a module blkio: Implement dynamic io controlling policy registration blkio: Export some symbols from blkio as its user CFQ can be a module block: Fix io_context leak after failure of clone with CLONE_IO block: Fix io_context leak after clone with CLONE_IO cfq-iosched: make nonrot check logic consistent io controller: quick fix for blk-cgroup and modular CFQ cfq-iosched: move IO controller declerations to a header file cfq-iosched: fix compile problem with !CONFIG_CGROUP blkio: Documentation blkio: Wait on sync-noidle queue even if rq_noidle = 1 blkio: Implement group_isolation tunable blkio: Determine async workload length based on total number of queues blkio: Wait for cfq queue to get backlogged if group is empty blkio: Propagate cgroup weight updation to cfq groups blkio: Drop the reference to queue once the task changes cgroup blkio: Provide some isolation between groups ...	2009-12-08 08:19:16 -08:00
Linus Torvalds	d7fc02c7ba	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits) mac80211: fix reorder buffer release iwmc3200wifi: Enable wimax core through module parameter iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter iwmc3200wifi: Coex table command does not expect a response iwmc3200wifi: Update wiwi priority table iwlwifi: driver version track kernel version iwlwifi: indicate uCode type when fail dump error/event log iwl3945: remove duplicated event logging code b43: fix two warnings ipw2100: fix rebooting hang with driver loaded cfg80211: indent regulatory messages with spaces iwmc3200wifi: fix NULL pointer dereference in pmkid update mac80211: Fix TX status reporting for injected data frames ath9k: enable 2GHz band only if the device supports it airo: Fix integer overflow warning rt2x00: Fix padding bug on L2PAD devices. WE: Fix set events not propagated b43legacy: avoid PPC fault during resume b43: avoid PPC fault during resume tcp: fix a timewait refcnt race ... Fix up conflicts due to sysctl cleanups (dead sysctl_check code and CTL_UNNUMBERED removed) in kernel/sysctl_check.c net/ipv4/sysctl_net_ipv4.c net/ipv6/addrconf.c net/sctp/sysctl.c	2009-12-08 07:55:01 -08:00
Linus Torvalds	1557d33007	Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits) security/tomoyo: Remove now unnecessary handling of security_sysctl. security/tomoyo: Add a special case to handle accesses through the internal proc mount. sysctl: Drop & in front of every proc_handler. sysctl: Remove CTL_NONE and CTL_UNNUMBERED sysctl: kill dead ctl_handler definitions. sysctl: Remove the last of the generic binary sysctl support sysctl net: Remove unused binary sysctl code sysctl security/tomoyo: Don't look at ctl_name sysctl arm: Remove binary sysctl support sysctl x86: Remove dead binary sysctl support sysctl sh: Remove dead binary sysctl support sysctl powerpc: Remove dead binary sysctl support sysctl ia64: Remove dead binary sysctl support sysctl s390: Remove dead sysctl binary support sysctl frv: Remove dead binary sysctl support sysctl mips/lasat: Remove dead binary sysctl support sysctl drivers: Remove dead binary sysctl support sysctl crypto: Remove dead binary sysctl support sysctl security/keys: Remove dead binary sysctl support sysctl kernel: Remove binary sysctl logic ...	2009-12-08 07:38:50 -08:00
Trond Myklebust	88069f77e1	NFSv41: Fix a potential state leakage when restarting nfs4_close_prepare Currently, if the call to nfs4_setup_sequence() in nfs4_close_prepare fails, any later retries will fail to launch an RPC call, due to the fact that the &state->flags will have been cleared. Ditto if nfs4_close_done() triggers a call to the NFSv4.1 version of nfs_restart_rpc(). We therefore move the actual clearing of the state->flags to nfs4_close_done(), when we know that the RPC call was successful. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-08 08:33:16 -05:00
Roel Kluin	b38882f5c0	UBIFS: fix return code in check_leaf Return the PTR_ERR of the correct pointer. This fixes the debugging code. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2009-12-08 14:24:00 +02:00
Jiri Kosina	d014d04386	Merge branch 'for-next' into for-linus Conflicts: kernel/irq/chip.c	2009-12-07 18:36:35 +01:00
Ricardo Labiaga	74e7bb73a3	nfs41: Handle NFSv4.1 session errors in the delegation recall code Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-07 09:48:30 -05:00
Ricardo Labiaga	7970886118	nfs41: Retry delegation return if it failed with session error Update nfs4_delegreturn_done() to retry the operation after setting the NFS4CLNT_SESSION_SETUP bit to indicate the need to reset the session. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-07 09:23:21 -05:00
Ricardo Labiaga	bcfa49f6f9	nfs41: Handle session errors during delegation return Add session error handling to nfs4_open_delegation_recall() Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-07 09:22:29 -05:00
Ricardo Labiaga	f455848a11	nfs41: Mark stateids in need of reclaim if state manager gets stale clientid The state manager was not marking the stateids as needing to be reclaimed after reestablishing the clientid. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-07 09:16:09 -05:00
Trond Myklebust	0110ee152b	NFS: Fix up the declaration of nfs4_restart_rpc when NFSv4 not configured Also rename it: it is used in generic code, and so should not have a 'nfs4' prefix. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-12-07 09:00:24 -05:00
Akira Fujita	4a58579b9e	ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXT This patch fixes three problems in the handling of the EXT4_IOC_MOVE_EXT ioctl: 1. In current EXT4_IOC_MOVE_EXT, there are read access mode checks for original and donor files, but they allow the illegal write access to donor file, since donor file is overwritten by original file data. To fix this problem, change access mode checks of original (r->r/w) and donor (r->w) files. 2. Disallow the use of donor files that have a setuid or setgid bits. 3. Call mnt_want_write() and mnt_drop_write() before and after ext4_move_extents() calling to get write access to a mount. Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-06 23:38:31 -05:00
Jan Kara	b436b9bef8	ext4: Wait for proper transaction commit on fsync We cannot rely on buffer dirty bits during fsync because pdflush can come before fsync is called and clear dirty bits without forcing a transaction commit. What we do is that we track which transaction has last changed the inode and which transaction last changed allocation and force it to disk on fsync. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-08 23:51:10 -05:00
Dmitry Monakhov	194074acac	ext4: fix incorrect block reservation on quota transfer. Inside ->setattr() call both ATTR_UID and ATTR_GID may be valid This means that we may end-up with transferring all quotas. Add we have to reserve QUOTA_DEL_BLOCKS for all quotas, as we do in case of QUOTA_INIT_BLOCKS. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Reviewed-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-08 22:42:28 -05:00
Dmitry Monakhov	5aca07eb7d	ext4: quota macros cleanup Currently all quota block reservation macros contains hard-coded "2" aka MAXQUOTAS value. This is no good because in some places it is not obvious to understand what does this digit represent. Let's introduce new macro with self descriptive name. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-08 22:42:15 -05:00
Dmitry Monakhov	8aa6790f87	ext4: ext4_get_reserved_space() must return bytes instead of blocks Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-08 22:41:52 -05:00
Curt Wohlgemuth	b844167edc	ext4: remove blocks from inode prealloc list on failure This fixes a leak of blocks in an inode prealloc list if device failures cause ext4_mb_mark_diskspace_used() to fail. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-08 22:18:25 -05:00
Josef Bacik	d4edac314e	ext4: wait for log to commit when umounting There is a potential race when a transaction is committing right when the file system is being umounting. This could reduce in a race because EXT4_SB(sb)->s_group_info could be freed in ext4_put_super before the commit code calls a callback so the mballoc code can release freed blocks in the transaction, resulting in a panic trying to access the freed s_group_info. The fix is to wait for the transaction to finish committing before we shutdown the multiblock allocator. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-08 21:48:58 -05:00
Jan Kara	b9a4207d5e	ext4: Avoid data / filesystem corruption when write fails to copy data When ext4_write_begin fails after allocating some blocks or generic_perform_write fails to copy data to write, we truncate blocks already instantiated beyond i_size. Although these blocks were never inside i_size, we have to truncate the pagecache of these blocks so that corresponding buffers get unmapped. Otherwise subsequent __block_prepare_write (called because we are retrying the write) will find the buffers mapped, not call ->get_block, and thus the page will be backed by already freed blocks leading to filesystem and data corruption. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-08 21:24:33 -05:00
Theodore Ts'o	24b584240a	ext4: Use ext4 file system driver for ext2/ext3 file system mounts Add a new config option, CONFIG_EXT4_USE_FOR_EXT23 which if enabled, will cause ext4 to be used for either ext2 or ext3 file system mounts when ext2 or ext3 is not enabled in the configuration. This allows minimalist kernel fanatics to drop to file system drivers from their compiled kernel with out losing functionality. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-07 14:08:51 -05:00

1 2 3 4 5 ...

16192 Commits (d1da96aada79fd1d29ae4e3157120d1ce1e77594)