linux

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	2d56d3c43c	Merge branch 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux * 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux: gfs2: nfs lock support for gfs2 lockd: add code to handle deferred lock requests lockd: always preallocate block in nlmsvc_lock() lockd: handle test_lock deferrals lockd: pass cookie in nlmsvc_testlock lockd: handle fl_grant callbacks lockd: save lock state on deferral locks: add fl_grant callback for asynchronous lock return nfsd4: Convert NFSv4 to new lock interface locks: add lock cancel command locks: allow {vfs,posix}_lock_file to return conflicting lock locks: factor out generic/filesystem switch from setlock code locks: factor out generic/filesystem switch from test_lock locks: give posix_test_lock same interface as ->lock locks: make ->lock release private data before returning in GETLK case locks: create posix-to-flock helper functions locks: trivial removal of unnecessary parentheses	2007-05-07 12:34:24 -07:00
Linus Torvalds	5cefcab3db	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (34 commits) [GFS2] Uncomment sprintf_symbol calling code [DLM] lowcomms style [GFS2] printk warning fixes [GFS2] Patch to fix mmap of stuffed files [GFS2] use lib/parser for parsing mount options [DLM] Lowcomms nodeid range & initialisation fixes [DLM] Fix dlm_lowcoms_stop hang [DLM] fix mode munging [GFS2] lockdump improvements [GFS2] Patch to detect corrupt number of dir entries in leaf and/or inode blocks [GFS2] bz 236008: Kernel gpf doing cat /debugfs/gfs2/xxx (lock dump) [DLM] fs/dlm/ast.c should #include "ast.h" [DLM] Consolidate transport protocols [DLM] Remove redundant assignment [GFS2] Fix bz 234168 (ignoring rgrp flags) [DLM] change lkid format [DLM] interface for purge (2/2) [DLM] add orphan purging code (1/2) [DLM] split create_message function [GFS2] Set drop_count to 0 (off) by default ...	2007-05-07 12:26:27 -07:00
Bryan Wu	1394f03221	blackfin architecture This adds support for the Analog Devices Blackfin processor architecture, and currently supports the BF533, BF532, BF531, BF537, BF536, BF534, and BF561 (Dual Core) devices, with a variety of development platforms including those avaliable from Analog Devices (BF533-EZKit, BF533-STAMP, BF537-STAMP, BF561-EZKIT), and Bluetechnix! Tinyboards. The Blackfin architecture was jointly developed by Intel and Analog Devices Inc. (ADI) as the Micro Signal Architecture (MSA) core and introduced it in December of 2000. Since then ADI has put this core into its Blackfin processor family of devices. The Blackfin core has the advantages of a clean, orthogonal,RISC-like microprocessor instruction set. It combines a dual-MAC (Multiply/Accumulate), state-of-the-art signal processing engine and single-instruction, multiple-data (SIMD) multimedia capabilities into a single instruction-set architecture. The Blackfin architecture, including the instruction set, is described by the ADSP-BF53x/BF56x Blackfin Processor Programming Reference http://blackfin.uclinux.org/gf/download/frsrelease/29/2549/Blackfin_PRM.pdf The Blackfin processor is already supported by major releases of gcc, and there are binary and source rpms/tarballs for many architectures at: http://blackfin.uclinux.org/gf/project/toolchain/frs There is complete documentation, including "getting started" guides available at: http://docs.blackfin.uclinux.org/ which provides links to the sources and patches you will need in order to set up a cross-compiling environment for bfin-linux-uclibc This patch, as well as the other patches (toolchain, distribution, uClibc) are actively supported by Analog Devices Inc, at: http://blackfin.uclinux.org/ We have tested this on LTP, and our test plan (including pass/fails) can be found at: http://docs.blackfin.uclinux.org/doku.php?id=testing_the_linux_kernel [m.kozlowski@tuxland.pl: balance parenthesis in blackfin header files] Signed-off-by: Bryan Wu <bryan.wu@analog.com> Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: Aubrey Li <aubrey.li@analog.com> Signed-off-by: Jie Zhang <jie.zhang@analog.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:58 -07:00
Akinobu Mita	5bc98594d5	hugetlbfs: add NULL check in hugetlb_zero_setup() If hugetlbfs module_init() fails, hugetlbfs_vfsmount is not initialized and shmget() with SHM_HUGETLB flag will cause NULL pointer dereference. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: William Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:57 -07:00
Christoph Lameter	50953fe9e0	slab allocators: Remove SLAB_DEBUG_INITIAL flag I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by SLAB. I think its purpose was to have a callback after an object has been freed to verify that the state is the constructor state again? The callback is performed before each freeing of an object. I would think that it is much easier to check the object state manually before the free. That also places the check near the code object manipulation of the object. Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was compiled with SLAB debugging on. If there would be code in a constructor handling SLAB_DEBUG_INITIAL then it would have to be conditional on SLAB_DEBUG otherwise it would just be dead code. But there is no such code in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real use of, difficult to understand and there are easier ways to accomplish the same effect (i.e. add debug code before kfree). There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be clear in fs inode caches. Remove the pointless checks (they would even be pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors. This is the last slab flag that SLUB did not support. Remove the check for unimplemented flags from SLUB. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:57 -07:00
Benjamin Herrenschmidt	036e08568c	get_unmapped_area handles MAP_FIXED in hugetlbfs Generic hugetlb_get_unmapped_area() now handles MAP_FIXED by just calling prepare_hugepage_range() Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: William Irwin <bill.irwin@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Matthew Wilcox <willy@debian.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Adam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:57 -07:00
Christoph Lameter	0a31bd5f2b	KMEM_CACHE(): simplify slab cache creation This patch provides a new macro KMEM_CACHE(<struct>, <flags>) to simplify slab creation. KMEM_CACHE creates a slab with the name of the struct, with the size of the struct and with the alignment of the struct. Additional slab flags may be specified if necessary. Example struct test_slab { int a,b,c; struct list_head; } __cacheline_aligned_in_smp; test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC) will create a new slab named "test_slab" of the size sizeof(struct test_slab) and aligned to the alignment of test slab. If it fails then we panic. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:55 -07:00
Peter Zijlstra	96018fdacb	mm: optimize acorn partition truncate invalidate_bdev() is superfluous when truncate_inode_pages() is also called. do call invalidate_bh_lrus() though, to avoid stale pointers. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:55 -07:00
Peter Zijlstra	f9a14399ae	mm: optimize kill_bdev() Remove duplicate work in kill_bdev(). It currently invalidates and then truncates the bdev's mapping. invalidate_mapping_pages() will opportunistically remove pages from the mapping. And truncate_inode_pages() will forcefully remove all pages. The only thing truncate doesn't do is flush the bh lrus. So do that explicitly. This avoids (very unlikely) but possible invalid lookup results if the same bdev is quickly re-issued. It also will prevent extreme kernel latencies which are observed when blockdevs which have a large amount of pagecache are unmounted, by avoiding invalidate_mapping_pages() on that path. invalidate_mapping_pages() has no cond_resched (it can be called under spinlock), whereas truncate_inode_pages() has one. [akpm@linux-foundation.org: restore nrpages==0 optimisation] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:55 -07:00
Peter Zijlstra	f98393a64c	mm: remove destroy_dirty_buffers from invalidate_bdev() Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't been used in 6 years (so akpm says). find * -name \.[ch] \| xargs grep -l invalidate_bdev \| while read file; do quilt add $file; sed -ie 's/invalidate_bdev($[^,]$,[^)]*)/invalidate_bdev(\1)/g' $file; done Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:55 -07:00
Christoph Lameter	d85f33855c	Make page->private usable in compound pages If we add a new flag so that we can distinguish between the first page and the tail pages then we can avoid to use page->private in the first page. page->private == page for the first page, so there is no real information in there. Freeing up page->private makes the use of compound pages more transparent. They become more usable like real pages. Right now we have to be careful f.e. if we are going beyond PAGE_SIZE allocations in the slab on i386 because we can then no longer use the private field. This is one of the issues that cause us not to support debugging for page size slabs in SLAB. Having page->private available for SLUB would allow more meta information in the page struct. I can probably avoid the 16 bit ints that I have in there right now. Also if page->private is available then a compound page may be equipped with buffer heads. This may free up the way for filesystems to support larger blocks than page size. We add PageTail as an alias of PageReclaim. Compound pages cannot currently be reclaimed. Because of the alias one needs to check PageCompound first. The RFC for the this approach was discussed at http://marc.info/?t=117574302800001&r=1&w=2 [nacc@us.ibm.com: fix hugetlbfs] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:53 -07:00
David Rientjes	b813e931b4	smaps: add clear_refs file to clear reference Adds /proc/pid/clear_refs. When any non-zero number is written to this file, pte_mkold() and ClearPageReferenced() is called for each pte and its corresponding page, respectively, in that task's VMAs. This file is only writable by the user who owns the task. It is now possible to measure _approximately_ how much memory a task is using by clearing the reference bits with echo 1 > /proc/pid/clear_refs and checking the reference count for each VMA from the /proc/pid/smaps output at a measured time interval. For example, to observe the approximate change in memory footprint for a task, write a script that clears the references (echo 1 > /proc/pid/clear_refs), sleeps, and then greps for Pgs_Referenced and extracts the size in kB. Add the sizes for each VMA together for the total referenced footprint. Moments later, repeat the process and observe the difference. For example, using an efficient Mozilla: accumulated time referenced memory ---------------- ----------------- 0 s 408 kB 1 s 408 kB 2 s 556 kB 3 s 1028 kB 4 s 872 kB 5 s 1956 kB 6 s 416 kB 7 s 1560 kB 8 s 2336 kB 9 s 1044 kB 10 s 416 kB This is a valuable tool to get an approximate measurement of the memory footprint for a task. Cc: Hugh Dickins <hugh@veritas.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> [akpm@linux-foundation.org: build fixes] [mpm@selenic.com: rename for_each_pmd] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:52 -07:00
David Rientjes	f79f177c25	smaps: add pages referenced count to smaps Adds an additional unsigned long field to struct mem_size_stats called 'referenced'. For each pte walked in the smaps code, this field is incremented by PAGE_SIZE if it has pte-reference bits. An additional line was added to the /proc/pid/smaps output for each VMA to indicate how many pages within it are currently marked as referenced or accessed. Cc: Hugh Dickins <hugh@veritas.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:52 -07:00
David Rientjes	826fad1b93	smaps: extract pmd walker from smaps code Extracts the pmd walker from smaps-specific code in fs/proc/task_mmu.c. The new struct pmd_walker includes the struct vm_area_struct of the memory to walk over. Iteration begins at the vma->vm_start and completes at vma->vm_end. A pointer to another data structure may be stored in the private field such as struct mem_size_stats, which acts as the smaps accumulator. For each pmd in the VMA, the action function is called with a pointer to its struct vm_area_struct, a pointer to the pmd_t, its start and end addresses, and the private field. The interface for walking pmd's in a VMA for fs/proc/task_mmu.c is now: void for_each_pmd(struct vm_area_struct vma, void (action)(struct vm_area_struct vma, pmd_t pmd, unsigned long addr, unsigned long end, void private), void private); Since the pmd walker is now extracted from the smaps code, smaps_one_pmd() is invoked for each pmd in the VMA. Its behavior and efficiency is identical to the existing implementation. Cc: Hugh Dickins <hugh@veritas.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:52 -07:00
Adrian Bunk	ac267728f1	mm/slab.c: proper prototypes Add proper prototypes in include/linux/slab.h. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:52 -07:00
Nick Piggin	3d67f2d7c0	fs: buffer don't PageUptodate without page locked __block_write_full_page is calling SetPageUptodate without the page locked. This is unusual, but not incorrect, as PG_writeback is still set. However the next patch will require that SetPageUptodate always be called with the page locked. Simply don't bother setting the page uptodate in this case (it is unusual that the write path does such a thing anyway). Instead just leave it to the read side to bring the page uptodate when it notices that all buffers are uptodate. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:51 -07:00
Nick Piggin	6fe6900e1e	mm: make read_cache_page synchronous Ensure pages are uptodate after returning from read_cache_page, which allows us to cut out most of the filesystem-internal PageUptodate calls. I didn't have a great look down the call chains, but this appears to fixes 7 possible use-before uptodate in hfs, 2 in hfsplus, 1 in jfs, a few in ecryptfs, 1 in jffs2, and a possible cleared data overwritten with readpage in block2mtd. All depending on whether the filler is async and/or can return with a !uptodate page. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:51 -07:00
Adrian Bunk	d2ba27e800	proper prototype for hugetlb_get_unmapped_area() Add a proper prototype for hugetlb_get_unmapped_area() in include/linux/hugetlb.h. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: William Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:51 -07:00
David Woodhouse	fcf3cafb3e	[JFFS2] Remove another bogus optimisation in jffs2_add_tn_to_tree() We attempted to insert new nodes into the tree by just using rb_replace_node to let them replace an earlier node which they completely overlapped. However, that could place the new node into the wrong place in the tree, since its start could be node only before the start of the victim, but before the node _before_ the victim in the tree (if that previous node actually ends _after_ the new node, thus isn't entirely overlapped and wasn't itself chosen to be the victim). Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-05-07 13:16:13 +01:00
Marc Eshel	586759f03e	gfs2: nfs lock support for gfs2 Add NFS lock support to GFS2. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-06 20:38:50 -04:00
Marc Eshel	1a8322b2b0	lockd: add code to handle deferred lock requests Rewrite nlmsvc_lock() to use the asynchronous interface. As with testlock, we answer nlm requests in nlmsvc_lock by first looking up the block and then using the results we find in the block if B_QUEUED is set, and calling vfs_lock_file() otherwise. If this a new lock request and we get -EINPROGRESS return on a non-blocking request then we defer the request. Also modify nlmsvc_unlock() to call the filesystem method if appropriate. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-05-06 20:38:50 -04:00
Marc Eshel	f812048020	lockd: always preallocate block in nlmsvc_lock() Normally we could skip ever having to allocate a block in the case where the client asks for a non-blocking lock, or asks for a blocking lock that succeeds immediately. However we're going to want to always look up a block first in order to check whether we're revisiting a deferred lock call, and to be prepared to handle the case where the filesystem returns -EINPROGRESS--in that case we want to make sure the lock we've given the filesystem is the one embedded in the block that we'll use to track the deferred request. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-05-06 20:38:50 -04:00
Marc Eshel	5ea0d75037	lockd: handle test_lock deferrals Rewrite nlmsvc_testlock() to use the new asynchronous interface: instead of immediately doing a posix_test_lock(), we first look for a matching block. If the subsequent test_lock returns anything other than -EINPROGRESS, we then remove the block we've found and return the results. If it returns -EINPROGRESS, then we defer the lock request. In the case where the block we find in the first step has B_QUEUED set, we bypass the vfs_test_lock entirely, instead using the block to decide how to respond: with nlm_lck_denied if B_TIMED_OUT is set. with nlm_granted if B_GOT_CALLBACK is set. by dropping if neither B_TIMED_OUT nor B_GOT_CALLBACK is set Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-05-06 20:38:50 -04:00
Marc Eshel	85f3f1b3f7	lockd: pass cookie in nlmsvc_testlock Change NLM internal interface to pass more information for test lock; we need this to make sure the cookie information is pushed down to the place where we do request deferral, which is handled for testlock by the following patch. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-05-06 20:38:50 -04:00
Marc Eshel	0e4ac9d935	lockd: handle fl_grant callbacks Add code to handle file system callback when the lock is finally granted. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-05-06 20:38:50 -04:00
Marc Eshel	2b36f412ab	lockd: save lock state on deferral We need to keep some state for a pending asynchronous lock request, so this patch adds that state to struct nlm_block. This also adds a function which defers the request, by calling rqstp->rq_chandle.defer and storing the resulting deferred request in a nlm_block structure which we insert into lockd's global block list. That new function isn't called yet, so it's dead code until a later patch. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-05-06 20:38:50 -04:00
Marc Eshel	2beb6614f5	locks: add fl_grant callback for asynchronous lock return Acquiring a lock on a cluster filesystem may require communication with remote hosts, and to avoid blocking lockd or nfsd threads during such communication, we allow the results to be returned asynchronously. When a ->lock() call needs to block, the file system will return -EINPROGRESS, and then later return the results with a call to the routine in the fl_grant field of the lock_manager_operations struct. This differs from the case when ->lock returns -EAGAIN to a blocking lock request; in that case, the filesystem calls fl_notify when the lock is granted, and the caller retries the original lock. So while fl_notify is merely a hint to the caller that it should retry, fl_grant actually communicates the final result of the lock operation (with the lock already acquired in the succesful case). Therefore fl_grant takes a lock, a status and, for the test lock case, a conflicting lock. We also allow fl_grant to return an error to the filesystem, to handle the case where the fl_grant requests arrives after the lock manager has already given up waiting for it. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-05-06 20:38:49 -04:00
Marc Eshel	fd85b8170d	nfsd4: Convert NFSv4 to new lock interface Convert NFSv4 to the new lock interface. We don't define any callback for now, so we're not taking advantage of the asynchronous feature--that's less critical for the multi-threaded nfsd then it is for the single-threaded lockd. But this does allow a cluster filesystems to export cluster-coherent locking to NFS. Note that it's cluster filesystems that are the issue--of the filesystems that define lock methods (nfs, cifs, etc.), most are not exportable by nfsd. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-05-06 20:38:49 -04:00
Marc Eshel	9b9d2ab415	locks: add lock cancel command Lock managers need to be able to cancel pending lock requests. In the case where the exported filesystem manages its own locks, it's not sufficient just to call posix_unblock_lock(); we need to let the filesystem know what's happening too. We do this by adding a new fcntl lock command: FL_CANCELLK. Some day this might also be made available to userspace applications that could benefit from an asynchronous locking api. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-05-06 20:38:28 -04:00
Marc Eshel	150b393456	locks: allow {vfs,posix}_lock_file to return conflicting lock The nfsv4 protocol's lock operation, in the case of a conflict, returns information about the conflicting lock. It's unclear how clients can use this, so for now we're not going so far as to add a filesystem method that can return a conflicting lock, but we may as well return something in the local case when it's easy to. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-05-06 19:23:24 -04:00
Marc Eshel	7723ec9777	locks: factor out generic/filesystem switch from setlock code Factor out the code that switches between generic and filesystem-specific lock methods; eventually we want to call this from lock managers (lockd and nfsd) too; currently they only call the generic methods. This patch does that for all the setlk code. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-05-06 18:08:49 -04:00
J. Bruce Fields	3ee17abd14	locks: factor out generic/filesystem switch from test_lock Factor out the code that switches between generic and filesystem-specific lock methods; eventually we want to call this from lock managers (lockd and nfsd) too; currently they only call the generic methods. This patch does that for test_lock. Note that this hasn't been necessary until recently, because the few filesystems that define ->lock() (nfs, cifs...) aren't exportable via NFS. However GFS (and, in the future, other cluster filesystems) need to implement their own locking to get cluster-coherent locking, and also want to be able to export locking to NFS (lockd and NFSv4). So we accomplish this by factoring out code such as this and exporting it for the use of lockd and nfsd. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-05-06 18:06:44 -04:00
Marc Eshel	9d6a8c5c21	locks: give posix_test_lock same interface as ->lock posix_test_lock() and ->lock() do the same job but have gratuitously different interfaces. Modify posix_test_lock() so the two agree, simplifying some code in the process. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-05-06 17:39:00 -04:00
J. Bruce Fields	70cc6487a4	locks: make ->lock release private data before returning in GETLK case The file_lock argument to ->lock is used to return the conflicting lock when found. There's no reason for the filesystem to return any private information with this conflicting lock, but nfsv4 is. Fix nfsv4 client, and modify locks.c to stop calling fl_release_private for it in this case. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Cc: "Trond Myklebust" <Trond.Myklebust@netapp.com>"	2007-05-06 17:38:19 -04:00
David Woodhouse	96dd8d25d1	[JFFS2] Remove broken insert_point optimisation in jffs2_add_tn_to_tree() The original code would remember, during the first pass over the tree, a suitable place to start the insertion from when we eventually come to add a new node. The optimisation was broken, and we sometimes ended up inserting a new node in the wrong place because we started the insertion from the wrong point. Just ditch the optimisation and start the insertion from the root of the tree, for now. I'll try it again when I'm feeling cleverer. Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-05-06 14:41:40 +01:00
Linus Torvalds	b7405e1643	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] Fix typo in cifs readme from previous commit [CIFS] Make sec=none force an anonymous mount [CIFS] Change semaphore to mutex for cifs lock_sem [CIFS] Fix oops in reset_cifs_unix_caps on reconnect [CIFS] UID/GID override on CIFS mounts to Samba [CIFS] prefixpath mounts to servers supporting posix paths used wrong slash [CIFS] Update cifs version to 1.49 [CIFS] Replace kmalloc/memset combination with kzalloc [CIFS] Add IPv6 support [CIFS] New CIFS POSIX mkdir performance improvement (part 2) [CIFS] New CIFS POSIX mkdir performance improvement [CIFS] Add write perm for usr to file on windows should remove r/o dos attr [CIFS] Remove unnecessary parm to cifs_reopen_file [CIFS] Switch cifsd to kthread_run from kernel_thread [CIFS] Remove unnecessary checks	2007-05-05 15:30:53 -07:00
Steve French	0ec54aa8af	[CIFS] Fix typo in cifs readme from previous commit Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-05-05 22:08:06 +00:00
Linus Torvalds	ea62ccd00f	Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6 * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (231 commits) [PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall [PATCH] i386: type may be unused [PATCH] i386: Some additional chipset register values validation. [PATCH] i386: Add missing !X86_PAE dependincy to the 2G/2G split. [PATCH] x86-64: Don't exclude asm-offsets.c in Documentation/dontdiff [PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu [PATCH] i386: white space fixes in i387.h [PATCH] i386: Drop noisy e820 debugging printks [PATCH] x86-64: Fix allnoconfig error in genapic_flat.c [PATCH] x86-64: Shut up warnings for vfat compat ioctls on other file systems [PATCH] x86-64: Share identical video.S between i386 and x86-64 [PATCH] x86-64: Remove CONFIG_REORDER [PATCH] x86-64: Print type and size correctly for unknown compat ioctls [PATCH] i386: Remove copy_*_user BUG_ONs for (size < 0) [PATCH] i386: Little cleanups in smpboot.c [PATCH] x86-64: Don't enable NUMA for a single node in K8 NUMA scanning [PATCH] x86: Use RDTSCP for synchronous get_cycles if possible [PATCH] i386: Add X86_FEATURE_RDTSCP [PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386 [PATCH] i386: Implement alternative_io for i386 ... Fix up trivial conflict in include/linux/highmem.h manually. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-05 14:55:20 -07:00
Dave Kleikamp	05ec9e26be	JFS: Fix race waking up jfsIO kernel thread It's possible for a journal I/O request to be added to the log_redrive queue and the jfsIO thread to be awakened after the thread releases log_redrive_lock but before it sets its state to TASK_INTERRUPTIBLE. The jfsIO thread should set the state before giving up the spinlock, so the waking thread will really wake it. Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>	2007-05-05 14:24:05 -05:00
David Woodhouse	1123e2a859	[JFFS2] Remember to calculate overlap on nodes which replace older nodes This fixes a problem Artem found with the integck test tool -- we weren't correctly keeping track of the 'overlap' flag in some cases, which led to the nodes being played back in an incorrect order and file corruption. Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-05-05 16:29:34 +01:00
David Woodhouse	3fddb6c985	[JFFS2] Don't advance c->wbuf_ofs to next eraseblock after wbuf flush After flushing the last page of an eraseblock, don't leave the wbuf 'offset' field pointing at the start of the next physical eraseblock. This was causing a BUG() on NOR-ECC (Sibley) flash, where we start writing a little further in, after the cleanmarker. Debugged by Alexander Belyakov <abelyako@googlemail.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-05-05 09:52:49 +01:00
Linus Torvalds	fa24aa561a	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: ocfs2: Force use of GFP_NOFS in ocfs2_write() ocfs2: fix sparse warnings in fs/ocfs2/cluster ocfs2: fix sparse warnings in fs/ocfs2/dlm ocfs2: fix sparse warnings in fs/ocfs2 [PATCH] Copy i_flags to ocfs2 inode flags on write [PATCH] ocfs2: use __set_current_state() ocfs2: Wrap access of directory allocations with ip_alloc_sem. [PATCH] fs/ocfs2/: make 3 functions static ocfs2: Implement compat_ioctl()	2007-05-04 20:44:54 -07:00
Jeff Layton	8426c39c12	[CIFS] Make sec=none force an anonymous mount We had a customer report that attempting to make CIFS mount with a null username (i.e. doing an anonymous mount) doesn't work. Looking through the code, it looks like CIFS expects a NULL username from userspace in order to trigger an anonymous mount. The mount.cifs code doesn't seem to ever pass a null username to the kernel, however. It looks also like the kernel can take a sec=none option, but it only seems to look at it if the username is already NULL. This seems redundant and effectively makes sec=none useless. The following patch makes sec=none force an anonymous mount. Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-05-05 03:27:49 +00:00
Linus Torvalds	4d4700707c	Merge git://git.linux-nfs.org/pub/linux/nfs-2.6 * git://git.linux-nfs.org/pub/linux/nfs-2.6: (28 commits) NFS: Fix a compile glitch on 64-bit systems NFS: Clean up nfs_create_request comments spkm3: initialize hash spkm3: remove bad kfree, unnecessary export spkm3: fix spkm3's use of hmac NFS4: invalidate cached acl on setacl NFS: Fix directory caching problem - with test case and patch. NFS: Set meaningful value for fattr->time_start in readdirplus results. NFS: Added support to turn off the NFSv3 READDIRPLUS RPC. SUNRPC: RPC client should retry with different versions of rpcbind SUNRPC: remove old portmapper NFS: switch NFSROOT to use new rpcbind client SUNRPC: switch the RPC server to use the new rpcbind registration API SUNRPC: switch socket-based RPC transports to use rpcbind SUNRPC: introduce rpcbind: replacement for in-kernel portmapper SUNRPC: Eliminate side effects from rpc_malloc SUNRPC: RPC buffer size estimates are too large NLM: Shrink the maximum request size of NLM4 requests NFS: Use pgoff_t in structures and functions that pass page cache offsets NFS: Clean up nfs_sync_mapping_wait() ...	2007-05-04 19:55:11 -07:00
Linus Torvalds	7e20ef030d	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (49 commits) [SCTP]: Set assoc_id correctly during INIT collision. [SCTP]: Re-order SCTP initializations to avoid race with sctp_rcv() [SCTP]: Fix the SO_REUSEADDR handling to be similar to TCP. [SCTP]: Verify all destination ports in sctp_connectx. [XFRM] SPD info TLV aggregation [XFRM] SAD info TLV aggregationx [AF_RXRPC]: Sort out MTU handling. [AF_IUCV/IUCV] : Add missing section annotations [AF_IUCV]: Implementation of a skb backlog queue [NETLINK]: Remove bogus BUG_ON [IPV6]: Some cleanups in include/net/ipv6.h [TCP]: zero out rx_opt in tcp_disconnect() [BNX2]: Fix TSO problem with small MSS. [NET]: Rework dev_base via list_head (v3) [TCP] Highspeed: Limited slow-start is nowadays in tcp_slow_start [BNX2]: Update version and reldate. [BNX2]: Print bus information for PCIE devices. [BNX2]: Add 1-shot MSI handler for 5709. [BNX2]: Restructure PHY event handling. [BNX2]: Add indirect spinlock. ...	2007-05-04 19:36:58 -07:00
Trond Myklebust	84dde76c4a	NFS: Fix a compile glitch on 64-bit systems fs/nfs/pagelist.c:226: error: conflicting types for 'nfs_pageio_init' include/linux/nfs_page.h:80: error: previous declaration of 'nfs_pageio_init' was here Thanks to Andrew for spotting this... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-04 14:44:06 -04:00
Pavel Emelianov	7562f876cd	[NET]: Rework dev_base via list_head (v3) Cleanup of dev_base list use, with the aim to simplify making device list per-namespace. In almost every occasion, use of dev_base variable and dev->next pointer could be easily replaced by for_each_netdev loop. A few most complicated places were converted to using first_netdev()/next_netdev(). Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 15:13:45 -07:00
David Howells	ec9c948546	[AFS]: Adjust the new netdevice scanning code Adjust the new netdevice scanning code provided by Patrick McHardy: (1) Restore the function banner comments that were dropped. (2) Rather than using an array size of 6 in some places and an array size of ETH_ALEN in others, pass a pointer instead and pass the array size through so that we can actually check it. (3) Do the buffer fill count check before checking the for_primary_ifa condition again. This permits us to skip that check should maxbufs be reached before we run out of interfaces. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:29:41 -07:00
Patrick McHardy	dc1f6bff6a	[AFS]: Replace rtnetlink client by direct dev_base walking Replace the large and complicated rtnetlink client by two simple functions for getting the MAC address for the first ethernet device and building a list of IPv4 addresses. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:28:49 -07:00
Patrick McHardy	5b35fad9d4	[AFS]: Fix memory leak in SRXAFSCB_GetCapabilities The interface array is not freed on exit. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:27:39 -07:00
David Howells	fbb3fcba72	[AFS]: Fix use of __exit functions from __init path Fix use of __exit functions from __init path. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:12:46 -07:00
David Howells	80c72fe415	[AFS/AF_RXRPC]: Miscellaneous fixes. Make miscellaneous fixes to AFS and AF_RXRPC: () Make AF_RXRPC select KEYS rather than RXKAD or AFS_FS in Kconfig. () Don't use FS_BINARY_MOUNTDATA. () Remove a done 'TODO' item in a comemnt on afs_get_sb(). () Don't pass a void * as the page pointer argument of kmap_atomic() as this breaks on m68k. Patch from Geert Uytterhoeven <geert@linux-m68k.org>. () Use match_() functions rather than doing my own parsing. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:11:29 -07:00
Roland Dreier	796e5661f6	[CIFS] Change semaphore to mutex for cifs lock_sem Originally at http://lkml.org/lkml/2006/9/2/86 The recent change to "allow Windows blocking locks to be cancelled via a CANCEL_LOCK call" introduced a new semaphore in struct cifsFileInfo, lock_sem. However, semaphores used as mutexes are deprecated these days, and there's no reason to add a new one to the kernel. Therefore, convert lock_sem to a struct mutex (and also fix one indentation glitch on one of the lines changed anyway). Signed-off-by: Roland Dreier <roland@digitalvampire.org> Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-05-03 04:33:45 +00:00
Steve French	0b2365f826	[CIFS] Fix oops in reset_cifs_unix_caps on reconnect Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-05-03 04:30:13 +00:00
Greg Kroah-Hartman	823bccfc40	remove "struct subsystem" as it is no longer needed We need to work on cleaning up the relationship between kobjects, ksets and ktypes. The removal of 'struct subsystem' is the first step of this, especially as it is not really needed at all. Thanks to Kay for fixing the bugs in this patch. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 18:57:59 -07:00
Randy Dunlap	2609e7b9be	sysfs: printk format warning Fix sysfs printk format warning: fs/sysfs/bin.c:62: warning: format '%d' expects type 'int', but argument 4 has type 'size_t' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 18:57:59 -07:00
Mark Fasheh	9315f130e1	ocfs2: Force use of GFP_NOFS in ocfs2_write() We can otherwise recurse into the file system. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-05-02 15:08:34 -07:00
Mark Fasheh	5fdf1e6771	ocfs2: fix sparse warnings in fs/ocfs2/cluster Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-05-02 15:08:23 -07:00
Mark Fasheh	a7d25539fd	ocfs2: fix sparse warnings in fs/ocfs2/dlm Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-05-02 15:08:15 -07:00
Mark Fasheh	1ca1a111b1	ocfs2: fix sparse warnings in fs/ocfs2 None of these are actually harmful, but the noise makes looking for real problems difficult. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-05-02 15:08:08 -07:00
Jan Kara	6e4b0d5692	[PATCH] Copy i_flags to ocfs2 inode flags on write Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into ocfs2-specific ip_attr. Hence, when someone sets these flags via a different interface than ioctl, they are stored correctly. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-05-02 15:07:58 -07:00
Milind Arun Choudhary	5c2c9d383e	[PATCH] ocfs2: use __set_current_state() use __set_current_state(TASK_) instead of current->state = TASK_, in fs/ocfs2 Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-05-02 15:07:50 -07:00
Joel Becker	ee19a77956	ocfs2: Wrap access of directory allocations with ip_alloc_sem. OCFS2_I(inode)->ip_alloc_sem is a read-write semaphore protecting local concurrent access of ocfs2 inodes. However, ocfs2 directories were not taking the semaphore while they accessed or modified the allocation tree. ocfs2_extend_dir() needs to take the semaphore in a write mode when it adds to the allocation. All other directory users get there via ocfs2_bread(), which takes the semaphore in read mode. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-05-02 15:07:42 -07:00
Adrian Bunk	6cb129f567	[PATCH] fs/ocfs2/: make 3 functions static This patch makes the following needlessly global functions static: - aops.c: ocfs2_write_data_page() - dlmglue.c: ocfs2_dump_meta_lvb_info() - file.c: ocfs2_set_inode_size() Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-05-02 15:07:27 -07:00
Mark Fasheh	586d232b19	ocfs2: Implement compat_ioctl() We need this to support 32 bit system calls on 64 bit kernels. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-05-02 15:07:16 -07:00
Andi Kleen	2724b6db66	[PATCH] x86-64: Shut up warnings for vfat compat ioctls on other file systems vfat implements compat handlers for these ioctls, but when they were executed on other file systems the kernel would still complain about an unknown compat ioctl. Just declare them as compatible and let them be rejected when not needed by the normal path. This makes wine runs a lot quieter Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:21 +02:00
Andi Kleen	a106009bdf	[PATCH] x86-64: Print type and size correctly for unknown compat ioctls Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:21 +02:00
Andi Kleen	9d016dd43b	[PATCH] x86-64: Shut up 32bit emulation for SIOCGIFCOUNT The kernel doesn't implement it, but some programs like java use it anyways. Shut the code up. Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:20 +02:00
Andi Kleen	421f028100	[PATCH] x86-64: Define IGNORE_IOCTL() macro for compat_ioctls Define a new IGNORE_IOCTL() to let a compat ioctl not be warned about even when it is not implemented. This is the same as COMPATIBLE_IOCTL internally, but better self documentng. Valid reasons to use this: - It is implemented with ->compat_ioctl on some device, but programs call it on others too. - The ioctl is not implemented in the native kernel, but programs call it commonly anyways. Most other reasons are not valid. Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:20 +02:00
Ian Campbell	79e030114a	[PATCH] i386: Allow i386 crash kernels to handle x86_64 dumps The specific case I am encountering is kdump under Xen with a 64 bit hypervisor and 32 bit kernel/userspace. The dump created is 64 bit due to the hypervisor but the dump kernel is 32 bit for maximum compatibility. It's possibly less likely to be useful in a purely native scenario but I see no reason to disallow it. [akpm@linux-foundation.org: build fix] Signed-off-by: Ian Campbell <ian.campbell@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Vivek Goyal <vgoyal@in.ibm.com> Cc: Horms <horms@verge.net.au> Cc: Magnus Damm <magnus.damm@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:09 +02:00
Jason Uhlenkott	a19b89cad5	NFS: Clean up nfs_create_request comments Remove some stale comments about hard limits which went away in 2.5. Signed-off-by: Jason Uhlenkott <juhlenko@akamai.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-02 07:37:29 -07:00
J. Bruce Fields	08efa202eb	NFS4: invalidate cached acl on setacl The ACL that the server sets may not be exactly the one we set--for example, it may silently turn off bits that it does not support. So we should remove any cached ACL so that any subsequent request for the ACL will go to the server. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-02 07:36:09 -07:00
David Woodhouse	7c96b7a146	[JFFS2] Remove dead file histo_mips.h Its contents were subsumed into compr_rubin.c in a previous commit, but I forgot to git-rm it. Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-05-02 08:36:21 +01:00
Steven Whitehouse	37fde8ca6c	[GFS2] Uncomment sprintf_symbol calling code Now that the patch from -mm has gone upstream, we can uncomment the code in GFS2 which uses sprintf_symbol. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Robert Peterson <rpeterso@redhat.com>	2007-05-01 09:51:39 +01:00
David Teigland	617e82e10c	[DLM] lowcomms style Replace some printk with log_print, and fix some simple cases of lines over 80. Also, return -ENOTCONN if lowcomms_start fails due to no local IP address being available. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:51 +01:00
akpm@linux-foundation.org	f391a4ead6	[GFS2] printk warning fixes alpha: fs/gfs2/dir.c: In function 'gfs2_dir_read_leaf': fs/gfs2/dir.c:1322: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'sector_t' fs/gfs2/dir.c: In function 'gfs2_dir_read': fs/gfs2/dir.c:1455: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type '__u64' Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:48 +01:00
Steven Whitehouse	bf126aee6d	[GFS2] Patch to fix mmap of stuffed files If a stuffed file is mmaped and a page fault is generated at some offset above the initial page, we need to create a zero page to hang the buffer heads off before we can unstuff the file. This is a fix for bz #236087 Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:46 +01:00
Josef Bacik	476c006be0	[GFS2] use lib/parser for parsing mount options This patch converts the mount option parsing to use the kernels lib/parser stuff like all of the other filesystems. I tested this and it works well. Thank you, Signed-off-by: Josef Bacik <jwhiter@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:43 +01:00
Patrick Caulfield	30d3a2373f	[DLM] Lowcomms nodeid range & initialisation fixes Fix a few range & initialization bugs in lowcomms. - max_nodeid is really the highest nodeid encountered, so all loops must include it in their iterations. - clean dlm_local_count & connection_idr so we can do a clean restart. - Remove a spurious BUG_ON Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:41 +01:00
Josef Bacik	2439fe5072	[DLM] Fix dlm_lowcoms_stop hang When you attempt to release a lockspace in DLM, it will hang trying to down a semaphore that has already been downed. The attached patch fixes the problem. Signed-off-by: Josef Bacik <jwhiter@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Patrick Caulfield <pcaulfie@redhat.com>	2007-05-01 09:11:38 +01:00
David Teigland	7d3c1feb80	[DLM] fix mode munging There are flags to enable two specialized features in the dlm: 1. CONVDEADLK causes the dlm to resolve conversion deadlocks internally by changing the granted mode of locks to NL. 2. ALTPR/ALTCW cause the dlm to change the requested mode of locks to PR or CW to grant them if the normal requested mode can't be granted. GFS direct i/o exercises both of these features, especially when mixed with buffered i/o. The dlm has problems with them. The first problem is on the master node. If it demotes a lock as a part of converting it, the actual step of converting the lock isn't being done after the demotion, the lock is just left sitting on the granted queue with a granted mode of NL. I think the mistaken assumption was that the call to grant_pending_locks() would grant it, but that function naturally doesn't look at locks on the granted queue. The second problem is on the process node. If the master either demotes or gives an altmode, the munging of the gr/rq modes is never done in the process copy of the lock, leaving the master/process copies out of sync. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:36 +01:00
Robert Peterson	5f8820960c	[GFS2] lockdump improvements The patch below consists of the following changes (in code order): 1. I fixed a minor compiler warning regarding the printing of a kernel symbol address. 2. I implemented a suggestion from Dave Teigland that moves the debugfs information for gfs2 into a subdirectory so we can easily expand our use of debugfs in the future. The current code keeps the glock information in: /debug/gfs2/<fs> With the patch, the new code keeps the glock information in: /debug/gfs2/<fs>/glock That will allow us to create more debugfs files in the future. 3. This fixes a bug whereby a failed mount attempt causes the debugfs file to not be deleted. Failed mount attempts should always clean up after themselves, including deleting the debugfs file and/or directory. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:33 +01:00
Steven Whitehouse	bdd19a22f8	[GFS2] Patch to detect corrupt number of dir entries in leaf and/or inode blocks This patch detects when the number of entries in a leaf block or inode block (in the case of stuffed directories) is corrupt and informs the user. It prevents us from running off the end of the array thats been allocated for the sorting in this case, Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:30 +01:00
Robert Peterson	7a0079d9e3	[GFS2] bz 236008: Kernel gpf doing cat /debugfs/gfs2/xxx (lock dump) This is for Bugzilla Bug 236008: Kernel gpf doing cat /debugfs/gfs2/xxx (lock dump) seen at the "gfs2 summit". This also fixes the bug that caused garbage to be printed by the "initialized at" field. I apologize for the kludge, but that code will all be ripped out anyway when the official sprint_symbol function becomes available in the Linux kernel. I also changed some formatting so that spaces are replaced by proper tabs. Signed-off-by: Robert Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:28 +01:00
Adrian Bunk	8fa1de386f	[DLM] fs/dlm/ast.c should #include "ast.h" Every file should include the headers containing the prototypes for it's global functions. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:25 +01:00
Patrick Caulfield	6ed7257b46	[DLM] Consolidate transport protocols This patch consolidates the TCP & SCTP protocols for the DLM into a single file and makes it switchable at run-time (well, at least before the DLM actually starts up!) For RHEL5 this patch requires Neil Horman's patch that expands the in-kernel socket API but that has already been twice ACKed so it should be OK. The patch adds a new lowcomms.c file that replaces the existing lowcomms-sctp.c & lowcomms-tcp.c files. Signed-off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:23 +01:00
Patrick Caulfield	fc7c44f03d	[DLM] Remove redundant assignment This patch removes a redundant (and incorrect) assignment from compat_output Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:20 +01:00
Steven Whitehouse	a43a49066d	[GFS2] Fix bz 234168 (ignoring rgrp flags) Ths following patch makes GFS2 use the rgrp flags properly. Although there are also separate flags for both data and metadata as well, I've not implemented these as there seems little use for them. On the otherhand, the "noalloc" flag is generally useful for future changes we might which to make, so this ensures that we interpret it correctly. In addition I fixed the comment above the function which was incorrect. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:17 +01:00
David Teigland	ce03f12b37	[DLM] change lkid format A lock id is a uint32 and is used as an opaque reference to the lock. For userland apps, the lkid is passed up, through libdlm, as the return value from a write() on the dlm device. This created a problem when the high bit was 1, making the lkid look like an error. This is fixed by changing how the lkid is composed. The low 16 bits identified the hash bucket for the lock and the high 16 bits were a per-bucket counter (which eventually hit 0x8000 causing the problem). These are simply swapped around; the number of hash table buckets is far below 0x8000, making all lkid's positive when viewed as signed. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:15 +01:00
David Teigland	72c2be776b	[DLM] interface for purge (2/2) Add code to accept purge commands from userland. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:12 +01:00
David Teigland	8499137d4e	[DLM] add orphan purging code (1/2) Add code for purging orphan locks. A process can also purge all of its own non-orphan locks by passing a pid of zero. Code already exists for processes to create persistent locks that become orphans when the process exits, but the complimentary capability for another process to then purge these orphans has been missing. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:10 +01:00
David Teigland	7e4dac3359	[DLM] split create_message function This splits the current create_message() function into two parts so that later patches can call the new lower-level _create_message() function when they don't have an rsb struct. No functional change in this patch. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:07 +01:00
Steven Whitehouse	f01963f264	[GFS2] Set drop_count to 0 (off) by default This sets the drop_count to 0 by default which is a better default for most people. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:05 +01:00
David Teigland	b9af8a788a	[GFS2] use log_error before LM_OUT_ERROR We always want to see the details of the error returned to gfs, but log_debug is often turned off, so use log_error (printk). Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:02 +01:00
David Teigland	ef0c2bb05f	[DLM] overlapping cancel and unlock Full cancel and force-unlock support. In the past, cancel and force-unlock wouldn't work if there was another operation in progress on the lock. Now, both cancel and unlock-force can overlap an operation on a lock, meaning there may be 2 or 3 operations in progress on a lock in parallel. This support is important not only because cancel and force-unlock are explicit operations that an app can use, but both are used implicitly when a process exits while holding locks. Summary of changes: - add-to and remove-from waiters functions were rewritten to handle situations with more than one remote operation outstanding on a lock - validate_unlock_args detects when an overlapping cancel/unlock-force can be sent and when it needs to be delayed until a request/lookup reply is received - processing request/lookup replies detects when cancel/unlock-force occured during the op, and carries out the delayed cancel/unlock-force - manipulation of the "waiters" (remote operation) state of a lock moved under the standard rsb mutex that protects all the other lock state - the two recovery routines related to locks on the waiters list changed according to the way lkb's are now locked before accessing waiters state - waiters recovery detects when lkb's being recovered have overlapping cancel/unlock-force, and may not recover such locks - revert_lock (cancel) returns a value to distinguish cases where it did nothing vs cases where it actually did a cancel; the cancel completion ast should only be done when cancel did something - orphaned locks put on new list so they can be found later for purging - cancel must be called on a lock when making it an orphan - flag user locks (ENDOFLIFE) at the end of their useful life (to the application) so we can return an error for any further cancel/unlock-force - we weren't setting COMP/BAST ast flags if one was already set, so we'd lose either a completion or blocking ast - clear an unread bast on a lock that's become unlocked Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:00 +01:00
Patrick Caulfield	0320672702	[DLM] fix coverity-spotted stupidity Replacement patch to remove redundant code rather than moving it around. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:57 +01:00
Robert Peterson	04b933f27b	[GFS2] Red Hat bz 228540: owner references In Testing the previously posted and accepted patch for https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=228540 I uncovered some gfs2 badness. It turns out that the current gfs2 code saves off a process pointer when glocks is taken in both the glock and glock holder structures. Those structures will persist in memory long after the process has ended; pointers to poisoned memory. This problem isn't caused by the 228540 fix; the new capability introduced by the fix just uncovered the problem. I wrote this patch that avoids saving process pointers and instead saves off the process pid. Rather than referencing the bad pointers, it now does process lookups. There is special code that makes the output nicer for printing holder information for processes that have ended. This patch also adds a stub for the new "sprint_symbol" function that exists in Andrew Morton's -mm patch set, but won't go into the base kernel until 2.6.22, since it adds functionality but doesn't fix a bug. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:55 +01:00
Benjamin Marzinski	172e045a7f	[GFS2] flush the log if a transaction can't allocate space This is a fix for bz #208514. When GFS2 frees up space, the freed blocks aren't available for reuse until the resource group is successfully written to the ondisk journal. So in rare cases, GFS2 operations will fail, saying that the filesystem is out of space, when in reality, you are just waiting for a log flush. For instance, on a 1Gig filesystem, if I continually write 10 Mb to a file, and then truncate it, after a hundred interations, the write will fail with -ENOSPC, even though the filesystem is just 1% full. The attached patch calls a log flush in these cases. I tested this patch fairly heavily to check if there were any locking issues that I missed, and it seems to work just fine. Also, this patch only does the log flush if get_local_rgrp makes a complete loop of resource groups without skipping any do to locking issues. The code would be slightly simpler if it just always did the log flush after the first failed pass, and you could only ever have to go through the loop twice, instead of up to three times. However, I guessed that failing to find a rg simply do to locking issues would be common enough to skip the log flush in that case, but I'm not certain that this is the right way to go. Either way, I don't suppose this code will be hit all that often. Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:52 +01:00
Benjamin Marzinski	6883562588	[GFS2] Fix log entry list corruption When glock_lo_add and rg_lo_add attempt to add an element to the log, they check to see if has already been added before locking the log. If another process adds that element to the log in this window between the check and locking the log, the element will be added to the list twice. This causes the log element list to become corrupted in such a way that the log element can never be successfully removed from the list. This patch pulls the list_empty() check inside the log lock, to remove this window. Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:50 +01:00
Steven Whitehouse	f35ac346bc	[GFS2] Speed up lock_dlm's locking (move sprintf) The following patch speeds up lock_dlm's locking by moving the sprintf out from the lock acquisition path and into the lock creation path. This reduces the amount of CPU time used in acquiring locks by a fair amount. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: David Teigland <teigland@redhat.com>	2007-05-01 09:10:47 +01:00
Patrick Caulfield	254da030df	[DLM] Don't delete misc device if lockspace removal fails Currently if the lockspace removal fails the misc device associated with a lockspace is left deleted. After that there is no way to access the orphaned lockspace from userland. This patch recreates the misc device if th dlm_release_lockspace fails. I believe this is better than attempting to remove the lockspace first because that leaves an unattached device lying around. The potential gap in which there is no access to the lockspace between removing the misc device and recreating it is acceptable ... after all the application is trying to remove it, and only new users of the lockspace will be affected. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:44 +01:00
Steven Whitehouse	420d2a1028	[GFS2] Fix a bug on i386 due to evaluation order Since gcc didn't evaluate the last two terms of the expression in glock.c:1881 as a constant expression, it resulted in an error on i386 due to the lack of a 64bit divide instruction. This adds some brackets to fix the problem. This was reported by Andrew Morton. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org>	2007-05-01 09:10:42 +01:00
Steven Whitehouse	3b8249f617	[GFS2] Fix bz 224480 and cleanup glock demotion code This patch prevents the printing of a warning message in cases where the fs is functioning normally by handing off responsibility for unlinked, but still open inodes, to another node for eventual deallocation. Also, there is now an improved system for ensuring that such requests to other nodes do not get lost. The callback on the iopen lock is only ever called when i_nlink == 0 and when a node is unable to deallocate it due to it still being in use on another node. When a node receives the callback therefore, it knows that i_nlink must be zero, so we mark it as such (in gfs2_drop_inode) in order that it will then attempt deallocation of the inode itself. As an additional benefit, queuing a demote request no longer requires a memory allocation. This simplifies the code for dealing with gfs2_holders as it removes one special case. There are two new fields in struct gfs2_glock. gl_demote_state is the state which the remote node has requested and gl_demote_time is the time when the request came in. Both fields are only valid when the GLF_DEMOTE flag is set in gl_flags. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:39 +01:00
Josef Whiter	1de9139092	[GFS2] Fix bz 231380, unlock page before dequeing glocks in gfs2_commit_write If we are writing a file, and in the middle of writing the file another node attempts to get a shared lock on that file (by doing a du for example) the process doing the writing will hang waiting on lock_page. The reason for this is because when we have waiters on a exclusive glock, we will go through and flush out all dirty pages associated with that inode and release the lock. The problem is that when we flush the dirty pages, we could hit a page that we have locked durring the generic_file_buffered_write part of this operation. This patch unlocks the page before we go to dequeue the lock and locks it immediatly afterwards, since generic_file_buffered_write needs the page locked when the commit_write is completed. This patch resolves the problem, however if somebody sees a better way to do this please don't hesistate to yell. Signed-off-by: Josef Whiter <jwhiter@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:37 +01:00
Patrick Caulfield	89adc934f3	[DLM] Fix uninitialised variable in receiving The length of the second element of the kvec array was not initialised before being added to the first one. This could cause invalid lengths to be passed to kernel_recvmsg Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:34 +01:00
Josef Whiter	5c7342d894	[GFS2] fix bz 231369, gfs2 will oops if you specify an invalid mount option If you specify an invalid mount option when trying to mount a gfs2 filesystem, gfs2 will oops. The attached patch resolves this problem. Signed-off-by: Josef Whiter <jwhiter@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:32 +01:00
Robert Peterson	7c52b166c5	[GFS2] Add gfs2_tool lockdump support to gfs2 (bz 228540) The attached patch resolves bz 228540. This adds the capability for gfs2 to dump gfs2 locks through the debugfs file system. This used to exist in gfs1 as "gfs_tool lockdump" but it's missing from gfs2 because all the ioctls were stripped out. Please see the bugzilla for more history about the fix. This patch is also attached to the bugzilla record. The patch is against Steve Whitehouse's latest nmw git tree kernel (2.6.21-rc1) and has been tested on system trin-10. Signed-off-by: Robert Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:10:29 +01:00
Neil Brown	83672d392f	NFS: Fix directory caching problem - with test case and patch. Try running this script in an NFS mounted directory (Client relatively recent - 2.6.18 has the problem as does 2.6.20). ------------------------------------------------------ #!/bin/bash # # This script will produce the following errormessage from tar: # # tar: newdir/innerdir/innerfile: file changed as we read it # create dirs rm -rf nfstest mkdir -p nfstest/dir/innerdir # create files (should not be empty) echo "Hello World!" >nfstest/dir/file echo "Hello World!" >nfstest/dir/innerdir/innerfile # problem only happens if we sleep before chmod sleep 1 # change file modes chmod -R a+r nfstest # rename dir mv nfstest/dir nfstest/newdir # tar it tar -cf nfstest/nfstest.tar -C nfstest newdir # restore old dir name mv nfstest/newdir nfstest/dir -------------------------------------------------------- What happens: The 'chmod -R' does a readdir_plus in each directory and the results get cached in the page cache. It then updates the ctime on each file by one second. When this happens, the post-op attributes are used to update the ctime stored on the client to match the value in the kernel. The 'mv' calls shrink_dcache_parent on the directory tree which flushes all the dentries (so a new lookup will be required) but doesn't flush the inodes or pagecache. The 'tar' does a readdir on each directory, but (in the case of 'innerdir' at least) satisfies it from the pagecache and uses the READDIRPLUS data to update all the inodes. In the case of 'innerdir/innerfile', the ctime is out of date. 'tar' then calls 'lstat' on innerdir/innerfile getting an old ctime. It then opens the file (triggering a GETATTR), reads the content, and then calls fstat to see if anything has changed. It finds that ctime has changed and so complains. The problem seems to be that the cache readdirplus info is kept around for too long. My patch below discards pagecache data for directories when dentry_iput is called on them. This effectively removes the symptom which convinces me that I correctly understand the problem. However I'm not convinced that is a proper solution, as there could easily be other races that trigger the same problem without being affected by this 'fix'. One possibility would be to require that readdirplus pagecache data be only used once to instantiate an inode. Somehow it should then be invalidated so that if the dentry subsequently disappears, it will cause a new request to the server to fill in the stat data. Another possibility is to compare the cache_change_attribute on the inode with something similar for the readdirplus info and reject the info from readdirplus if it is too old. I haven't tried to implement these and would value other opinions before I do. Thanks, NeilBrown Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:19 -07:00
Neil Brown	1f4eab7e7c	NFS: Set meaningful value for fattr->time_start in readdirplus results. Don't use uninitialsed value for fattr->time_start in readdirplus results. The 'fattr' structure filled in by nfs3_decode_direct does not get a value for ->time_start set. Thus if an entry is for an inode that we already have in cache, when nfs_readdir_lookup calls nfs_fhget, it will call nfs_refresh_inode and may update the inode with out-of-date information. Directories are read a page at a time, so each page could have a different timestamp that "should" be used to set the time_start for the fattr for info in that page. However storing the timestamp per page is awkward. (We could stick in the first 4 bytes and only read 4092 bytes, but that is a bigger code change than I am interested it). This patch ignores the readdir_plus attributes if a readdir finds the information already in cache, and otherwise sets ->time_start to the time the readdir request was sent to the server. It might be nice to store - in the directory inode - the time stamp for the earliest readdir request that is still in the page cache, so that we don't ignore attribute data that we don't have to. This patch doesn't do that. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:18 -07:00
Steve Dickson	74dd34e6e8	NFS: Added support to turn off the NFSv3 READDIRPLUS RPC. READDIRPLUS can be a performance hindrance when the client is working with large directories. In addition, some servers still have bugs in their implementations (e.g. Tru64 returns wrong values for the fsid). Add a mount flag to enable users to turn it off at mount time following the implementation in Apple's NFS client. Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:16 -07:00
Chuck Lever	00a6e7bbf9	SUNRPC: RPC client should retry with different versions of rpcbind Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:16 -07:00
Chuck Lever	df8b172a88	NFS: switch NFSROOT to use new rpcbind client It is arguable whether NFSROOT will support IPv6, and thus whether rpcb_getport_external needs to support rpcbind versions greater than 2. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:14 -07:00
Chuck Lever	2bea90d43a	SUNRPC: RPC buffer size estimates are too large The RPC buffer size estimation logic in net/sunrpc/clnt.c always significantly overestimates the requirements for the buffer size. A little instrumentation demonstrated that in fact rpc_malloc was never allocating the buffer from the mempool, but almost always called kmalloc. To compute the size of the RPC buffer more precisely, split p_bufsiz into two fields; one for the argument size, and one for the result size. Then, compute the sum of the exact call and reply header sizes, and split the RPC buffer precisely between the two. That should keep almost all RPC buffers within the 2KiB buffer mempool limit. And, we can finally be rid of RPC_SLACK_SPACE! Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:10 -07:00
Chuck Lever	511d2e8855	NLM: Shrink the maximum request size of NLM4 requests NLM version 4 requests estimate the call and reply header sizes rather conservatively, using the very maximum size allowed in the protocol even though Linux always uses only a small fraction of the allowable space. Reduce the size of caller and lock arguments to conserve RPC buffer space while XDR encoding NLM4 arguments. Add compile-time checks to ensure the hostname string won't overflow NLM protocol maximums. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:09 -07:00
Trond Myklebust	ca52fec152	NFS: Use pgoff_t in structures and functions that pass page cache offsets Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:09 -07:00
Trond Myklebust	724c439c20	NFS: Clean up nfs_sync_mapping_wait() It has no business touching wbc->pages_skipped. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:08 -07:00
Trond Myklebust	8d5658c949	NFS: Fix a buffer overflow in the allocation of struct nfs_read/writedata Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:07 -07:00
Trond Myklebust	c63c7b0513	NFS: Fix a race when doing NFS write coalescing Currently we do write coalescing in a very inefficient manner: one pass in generic_writepages() in order to lock the pages for writing, then one pass in nfs_flush_mapping() and/or nfs_sync_mapping_wait() in order to gather the locked pages for coalescing into RPC requests of size "wsize". In fact, it turns out there is actually a deadlock possible here since we only start I/O on the second pass. If the user signals the process while we're in nfs_sync_mapping_wait(), for instance, then we may exit before starting I/O on all the requests that have been queued up. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:06 -07:00
Trond Myklebust	8b09bee308	NFS: Cleanup for nfs_readpages() Do the coalescing of read requests into block sized requests at start of I/O as we scan through the pages instead of going through a second pass. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:05 -07:00
Trond Myklebust	bcb71bba7e	NFS: Another cleanup of the read/write request coalescing code Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:04 -07:00
Trond Myklebust	d8a5ad75cc	NFS: Cleanup the coalescing code Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:04 -07:00
Trond Myklebust	91e59c368c	NFS: Don't wait for congestion in nfs_update_request() It is redundant, and will interfere with the call to balance_dirty_pages_ratelimited_nr in generic_file_write(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:03 -07:00
Amnon Aaronsohn	1a0ba9ae48	NFS: statfs error-handling fix The nfs statfs function returns a success code on error, and fills the output buffer with invalid values. The attached patch makes it return a correct error code instead. Signed-off-by: Amnon Aaronsohn <amnonaar@gmail.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> (Modified patch to reinstate the dprintk())	2007-04-30 22:17:02 -07:00
Trond Myklebust	d585158b60	NFS: Fix nfs_set_page_dirty() Be more careful about testing page->mapping. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:02 -07:00
Jeff Mahoney	1173a729fc	reiserfs: suppress lockdep warning We're getting lockdep warnings due to a post-2.6.21-rc7 bugfix. The xattr_sem can never be taken in the manner described. Internal inodes are protected by I_PRIVATE. Add the appropriate annotation. Cc: <stable@kernel.org> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-30 16:40:40 -07:00
Steve French	4523cc3044	[CIFS] UID/GID override on CIFS mounts to Samba When CIFS Unix Extensions are negotiated we get the Unix uid and gid owners of the file from the server (on the Unix Query Path Info levels), but if the server's uids don't match the client uid's users were having to disable the Unix Extensions (which turned off features they still wanted). The changeset patch allows users to override uid and/or gid for file/directory owner with a default uid and/or gid specified at mount (as is often done when mounting from Linux cifs client to Windows server). This changeset also displays the uid and gid used by default in /proc/mounts (if applicable). Also cleans up code by adding some of the missing spaces after "if" keywords per-kernel style guidelines (as suggested by Randy Dunlap when he reviewed the patch). Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-04-30 20:13:06 +00:00
Linus Torvalds	cd9bb7e736	Merge branch 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block * 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block: [PATCH] elevator: elv_list_lock does not need irq disabling [BLOCK] Don't pin lots of memory in mempools cfq-iosched: speedup cic rb lookup ll_rw_blk: add io_context private pointer cfq-iosched: get rid of cfqq hash cfq-iosched: tighten queue request overlap condition cfq-iosched: improve sync vs async workloads cfq-iosched: never allow an async queue idling cfq-iosched: get rid of ->dispatch_slice cfq-iosched: don't pass unused preemption variable around cfq-iosched: get rid of ->cur_rr and ->cfq_list cfq-iosched: slice offset should take ioprio into account [PATCH] cfq-iosched: style cleanups and comments cfq-iosched: sort IDLE queues into the rbtree cfq-iosched: sort RT queues into the rbtree [PATCH] cfq-iosched: speed up rbtree handling cfq-iosched: rework the whole round-robin list concept cfq-iosched: minor updates cfq-iosched: development update cfq-iosched: improve preemption for cooperating tasks	2007-04-30 08:12:39 -07:00
Jens Axboe	5972511b77	[BLOCK] Don't pin lots of memory in mempools Currently we scale the mempool sizes depending on memory installed in the machine, except for the bio pool itself which sits at a fixed 256 entry pre-allocation. There's really no point in "optimizing" this OOM path, we just need enough preallocated to make progress. A single unit is enough, lets scale it down to 2 just to be on the safe side. This patch saves ~150kb of pinned kernel memory on a 32-bit box. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:08:17 +02:00
Paul Mackerras	49e1900d4c	Merge branch 'linux-2.6' into for-2.6.22	2007-04-30 12:38:01 +10:00
Linus Torvalds	42fae7fb1c	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [NET]: Fix networking compilation errors [AF_RXRPC/AFS]: Arch-specific fixes. [AFS]: Fix VLocation record update wakeup [NET]: Revert sk_buff walker cleanups.	2007-04-27 16:20:37 -07:00
Linus Torvalds	f00546363f	Merge git://git.infradead.org/mtd-2.6 * git://git.infradead.org/mtd-2.6: (46 commits) [MTD] [MAPS] drivers/mtd/maps/ck804xrom.c: convert pci_module_init() [MTD] [NAND] CM-x270 MTD driver [MTD] [NAND] Wrong calculation of page number in nand_block_bad() [MTD] [MAPS] fix plat-ram printk format [JFFS2] Fix compr_rubin.c build after include file elimination. [JFFS2] Handle inodes with only a single metadata node with non-zero isize [JFFS2] Tidy up licensing/copyright boilerplate. [MTD] [OneNAND] Exit loop only when column start with 0 [MTD] [OneNAND] Fix access the past of the real oobfree array [MTD] [OneNAND] Update Samsung OneNAND official URL [JFFS2] Better fix for all-zero node headers [JFFS2] Improve read_inode memory usage, v2. [JFFS2] Improve failure mode if inode checking leaves unchecked space. [JFFS2] Fix cross-endian build. [MTD] Finish conversion mtd_blkdevs to use the kthread API [JFFS2] Obsolete dirent nodes immediately on unlink, where possible. Use menuconfig objects: MTD [MTD] mtd_blkdevs: Convert to use the kthread API [MTD] Fix fwh_lock locking [JFFS2] Speed up mount for directly-mapped NOR flash ...	2007-04-27 15:34:57 -07:00
David Howells	b1bdb691c3	[AF_RXRPC/AFS]: Arch-specific fixes. Fixes for various arch compilation problems: () Missing module exports. () Variable name collision when rxkad and af_rxrpc both built in (rxrpc_debug). () Large constant representation problem (AFS_UUID_TO_UNIX_TIME). () Configuration dependencies. (*) printk() format warnings. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-27 15:28:45 -07:00
David Howells	47051a2152	[AFS]: Fix VLocation record update wakeup Fix the wakeup transitions after a VLocation record update completes one way or another. This builds on Dave Miller's partial fix. Also move wakeups outside the spinlocked sections. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-27 15:26:30 -07:00
Linus Torvalds	d868772fff	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (46 commits) dev_dbg: check dev_dbg() arguments drivers/base/attribute_container.c: use mutex instead of binary semaphore mod_sysfs_setup() doesn't return errno when kobject_add_dir() failure occurs s2ram: add arch irq disable/enable hooks define platform wakeup hook, use in pci_enable_wake() security: prevent permission checking of file removal via sysfs_remove_group() device_schedule_callback() needs a module reference s390: cio: Delay uevents for subchannels sysfs: bin.c printk fix Driver core: use mutex instead of semaphore in DMA pool handler driver core: bus_add_driver should return an error if no bus debugfs: Add debugfs_create_u64() the overdue removal of the mount/umount uevents kobject: Comment and warning fixes to kobject.c Driver core: warn when userspace writes to the uevent file in a non-supported way Driver core: make uevent-environment available in uevent-file kobject core: remove rwsem from struct subsystem qeth: Remove usage of subsys.rwsem PHY: remove rwsem use from phy core IEEE1394: remove rwsem use from ieee1394 core ...	2007-04-27 12:58:54 -07:00
David Woodhouse	d1da4e50e5	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/mtd/Kconfig Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-27 19:16:19 +01:00
James Morris	057f6c019f	security: prevent permission checking of file removal via sysfs_remove_group() Prevent permission checking from being performed when the kernel wants to unconditionally remove a sysfs group, by introducing an kernel-only variant of lookup_one_len(), lookup_one_len_kern(). Additionally, as sysfs_remove_group() does not check the return value of the lookup before using it, a BUG_ON has been added to pinpoint the cause of any problems potentially caused by this (and as a form of annotation). Signed-off-by: James Morris <jmorris@namei.org> Cc: Nagendra Singh Tomar <nagendra_tomar@adaptec.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-04-27 10:57:33 -07:00
Alan Stern	523ded71de	device_schedule_callback() needs a module reference This patch (as896b) fixes an oversight in the design of device_schedule_callback(). It is necessary to acquire a reference to the module owning the callback routine, to prevent the module from being unloaded before the callback can run. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Cc: Satyam Sharma <satyam.sharma@gmail.com> Cc: Neil Brown <neilb@suse.de> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-04-27 10:57:32 -07:00
Andrew Morton	45cd8d8e1e	sysfs: bin.c printk fix fs/sysfs/bin.c: In function 'read': fs/sysfs/bin.c:77: warning: format '%zd' expects type 'signed size_t', but argument 4 has type 'int' Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-04-27 10:57:32 -07:00
Michael Ellerman	8447891fe8	debugfs: Add debugfs_create_u64() I went to use this the other day, only to find it didn't exist. It's a straight copy of the debugfs u32 code, then s/u32/u64/. A quick test shows it seems to be working. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-04-27 10:57:31 -07:00
Adrian Bunk	3106d46f51	the overdue removal of the mount/umount uevents This patch contains the overdue removal of the mount/umount uevents. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-04-27 10:57:31 -07:00
Linus Torvalds	b928ed5618	Merge branch 'for-linus' of git://git.infradead.org/ubi-2.6 * 'for-linus' of git://git.infradead.org/ubi-2.6: UBI: remove unused variable UBI: add me to MAINTAINERS JFFS2: add UBI support UBI: Unsorted Block Images	2007-04-27 10:42:35 -07:00
Linus Torvalds	ea6db58f3e	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (27 commits) ocfs2: Cache extent records ocfs2: Remember rw lock level during direct io ocfs2: Fix up i_blocks calculation to know about holes ocfs2: Fix extent lookup to return true size of holes ocfs2: Read from an unwritten extent returns zeros ocfs2: make room for unwritten extents flag ocfs2: Use own splice write actor ocfs2: Use do_sync_mapping_range() in ocfs2_zero_tail_for_truncate() [PATCH] Turn do_sync_file_range() into do_sync_mapping_range() ocfs2: zero tail of sparse files on truncate ocfs2: Teach ocfs2_get_block() about holes ocfs2: remove ocfs2_prepare_write() and ocfs2_commit_write() ocfs2: teach ocfs2_file_aio_write() about sparse files ocfs2: Turn off shared writeable mmap for local files systems with holes. ocfs2: abstract out allocation locking ocfs2: teach extend/truncate about sparse files ocfs2: temporarily remove extent map caching ocfs2: sparse b-tree support ocfs2: small cleanup of ocfs2_request_delete() ocfs2: remove unused code ...	2007-04-27 10:29:56 -07:00
Artem Bityutskiy	0029da3bf4	JFFS2: add UBI support This patch make JFFS2 able to work with UBI volumes via the emulated MTD devices which are directly mapped to these volumes. Signed-off-by: Artem Bityutskiy <dedekind@infradead.org>	2007-04-27 14:24:08 +03:00
David S. Miller	39bf094930	[AFS]: Eliminate cmpxchg() usage in vlocation code. cmpxchg() is not available on every processor so can't be used in generic code. Replace with spinlock protection on the ->state changes, wakeups, and wait loops. Add what appears to be a missing wakeup on transition to AFS_VL_VALID state in afs_vlocation_updater(). Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 20:39:14 -07:00
David S. Miller	ba3e0e1acc	[AFS]: Fix u64 printing in debug logging. Need 'unsigned long long' casts to quiet warnings on 64-bit platforms when using %ll on a u64. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 16:06:22 -07:00
David Howells	260a980317	[AFS]: Add "directory write" support. Add support for the create, link, symlink, unlink, mkdir, rmdir and rename VFS operations to the in-kernel AFS filesystem. Also: (1) Fix dentry and inode revalidation. d_revalidate should only look at state of the dentry. Revalidation of the contents of an inode pointed to by a dentry is now separate. (2) Fix afs_lookup() to hash negative dentries as well as positive ones. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 15:59:35 -07:00
David Howells	c35eccb1f6	[AFS]: Implement the CB.InitCallBackState3 operation. Implement the CB.InitCallBackState3 operation for the fileserver to call. This reduces the amount of network traffic because if this op is aborted, the fileserver will then attempt an CB.InitCallBackState operation. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 15:58:49 -07:00
David Howells	b908fe6b2d	[AFS]: Add support for the CB.GetCapabilities operation. Add support for the CB.GetCapabilities operation with which the fileserver can ask the client for the following information: (1) The list of network interfaces it has available as IPv4 address + netmask plus the MTUs. (2) The client's UUID. (3) The extended capabilities of the client, for which the only current one is unified error mapping (abort code interpretation). To support this, the patch adds the following routines to AFS: (1) A function to iterate through all the network interfaces using RTNETLINK to extract IPv4 addresses and MTUs. (2) A function to iterate through all the network interfaces using RTNETLINK to pull out the MAC address of the lowest index interface to use in UUID construction. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 15:58:17 -07:00
David Howells	00d3b7a453	[AFS]: Add security support. Add security support to the AFS filesystem. Kerberos IV tickets are added as RxRPC keys are added to the session keyring with the klog program. open() and other VFS operations then find this ticket with request_key() and either use it immediately (eg: mkdir, unlink) or attach it to a file descriptor (open). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 15:57:07 -07:00
David Howells	436058a49e	[AFS]: Handle multiple mounts of an AFS superblock correctly. Handle multiple mounts of an AFS superblock correctly, checking to see whether the superblock is already initialised after calling sget() rather than just unconditionally stamping all over it. Also delete the "silent" parameter to afs_fill_super() as it's not used and can, in any case, be obtained from sb->s_flags. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 15:56:24 -07:00
David Howells	63b6be55e8	[AF_RXRPC]: Delete the old RxRPC code. Delete the old RxRPC code as it's now no longer used. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 15:55:48 -07:00
David Howells	08e0e7c82e	[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC. Make the in-kernel AFS filesystem use AF_RXRPC instead of the old RxRPC code. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 15:55:03 -07:00
David Howells	ec26815ad8	[AFS]: Clean up the AFS sources Clean up the AFS sources. Also remove references to AFS keys. RxRPC keys are used instead. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 15:49:28 -07:00
Mark Fasheh	8341897882	ocfs2: Cache extent records The extent map code was ripped out earlier because of an inability to deal with holes. This patch adds back a simpler caching scheme requiring far less code. Our old extent map caching was designed back when meta data block caching in Ocfs2 didn't work very well, resulting in many disk reads. These days our metadata caching is much better, resulting in no un-necessary disk reads. As a result, extent caching doesn't have to be as fancy, nor does it have to cache as many extents. Keeping the last 3 extents seen should be sufficient to give us a small performance boost on some streaming workloads. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:10:40 -07:00
Mark Fasheh	7cdfc3a1c3	ocfs2: Remember rw lock level during direct io Cluster locking might have been redone because a direct write won't complete, so this needs to be reflected in the iocb. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:07:45 -07:00
Mark Fasheh	8110b073a9	ocfs2: Fix up i_blocks calculation to know about holes Older file systems which didn't support holes did a dumb calculation of i_blocks based on i_size. This is no longer accurate, so fix things up to take actual allocation into account. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:07:40 -07:00
Mark Fasheh	4f902c3772	ocfs2: Fix extent lookup to return true size of holes Initially, we had wired things to return a size '1' of holes. Cook up a small amount of code to find the next extent and calculate the number of clusters between the virtual offset and the next allocated extent. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:45 -07:00
Mark Fasheh	49cb8d2d49	ocfs2: Read from an unwritten extent returns zeros Return an optional extent flags field from our lookup functions and wire up callers to treat unwritten regions as holes for the purpose of returning zeros to the user. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:41 -07:00
Mark Fasheh	e48edee2d8	ocfs2: make room for unwritten extents flag Due to the size of our group bitmaps, we'll never have a leaf node extent record with more than 16 bits worth of clusters. Split e_clusters up so that leaf nodes can get a flags field where we can mark unwritten extents. Interior nodes whose length references all the child nodes beneath it can't split their e_clusters field, so we use a union to preserve sizing there. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:37 -07:00
Mark Fasheh	6af67d8205	ocfs2: Use own splice write actor We need to fill holes during a splice write. Provide our own splice write actor which can call ocfs2_file_buffered_write() with a splice-specific callback. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:34 -07:00
Mark Fasheh	fa41045fcb	ocfs2: Use do_sync_mapping_range() in ocfs2_zero_tail_for_truncate() Do this instead of filemap_fdatawrite() - this way we sync only the range between i_size and the cluster boundary. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:30 -07:00
Mark Fasheh	5b04aa3a64	[PATCH] Turn do_sync_file_range() into do_sync_mapping_range() do_sync_file_range() accepts a file * from which it takes an address_space to sync. Abstract out the bulk of the function into do_sync_mapping_range() which takes the address_space directly. This way callers who want to sync an address_space directly can take advantage of the functionality provided. do_sync_file_range() is preserved as a small wrapper around do_sync_mapping_range(). Ocfs2 in particular would like to use this to initiate a sync of a specific inode range during truncate, where a file * may not be available. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-04-26 15:02:26 -07:00
Mark Fasheh	60b11392f1	ocfs2: zero tail of sparse files on truncate Since we don't zero on extend anymore, truncate needs to be fixed up to zero the part of a file between i_size and and end of it's cluster. Otherwise a subsequent extend could expose bad data. This introduced a new helper, which can be used in ocfs2_write(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:20 -07:00
Mark Fasheh	25baf2da14	ocfs2: Teach ocfs2_get_block() about holes ocfs2_get_block() didn't understand sparse files, fix that. Also remove some code that isn't really useful anymore. We can fix up ocfs2_direct_IO_get_blocks() at the same time. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:16 -07:00
Mark Fasheh	5069120b72	ocfs2: remove ocfs2_prepare_write() and ocfs2_commit_write() These are no longer used, and can't handle file systems with sparse file allocation. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:12 -07:00
Mark Fasheh	9517bac6cc	ocfs2: teach ocfs2_file_aio_write() about sparse files Unfortunately, ocfs2 can no longer make use of generic_file_aio_write_nlock() because allocating writes will require zeroing of pages adjacent to the I/O for cluster sizes greater than page size. Implement a custom file write here, which can order page locks for zeroing. This also has the advantage that cluster locks can easily be ordered outside of the page locks. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:08 -07:00
Mark Fasheh	89488984ac	ocfs2: Turn off shared writeable mmap for local files systems with holes. This will be turned back on once we can do allocation in ->page_mkwrite(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:02:01 -07:00
Mark Fasheh	abf8b15694	ocfs2: abstract out allocation locking Right now, file allocation for ocfs2 is done within ocfs2_extend_file(), which is either called from ->setattr() (for an i_size change), or at the top of ocfs2_file_aio_write(). Inodes on file systems with sparse file support will want to do their allocation during the actual write call. In either case the cluster locking decisions are the same. We abstract out that code into a new function, ocfs2_lock_allocators() which will be used by a later patch to enable writing to sparse files. This also provides a nice cleanup of ocfs2_extend_allocation(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:01:58 -07:00
Mark Fasheh	3a0782d09c	ocfs2: teach extend/truncate about sparse files For ocfs2_truncate_file(), we eliminate the "simple" truncate case which no longer exists since i_size is not tied to i_clusters. In ocfs2_extend_file(), we skip the allocation / page zeroing code for file systems which understand sparse files. The core truncate code is changed to do a bottom up tree traversal. This gets abstracted out into it's own function. To make things more readable, most of the special case handling for in-inode extents from ocfs2_do_truncate() is also removed. Though write support for sparse files comes in a later patch, we at least update ocfs2_prepare_inode_for_write() to skip allocation for sparse files. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:01:56 -07:00
Mark Fasheh	363041a5f7	ocfs2: temporarily remove extent map caching The code in extent_map.c is not prepared to deal with a subtree being rotated between lookups. This can happen when filling holes in sparse files. Instead of a lengthy patch to update the code (which would likely lose the benefit of caching subtree roots), we remove most of the algorithms and implement a simple path based lookup. A less ambitious extent caching scheme will be added in a later patch. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 15:01:31 -07:00
Mark Fasheh	dcd0538ff4	ocfs2: sparse b-tree support Introduce tree rotations into the b-tree code. This will allow ocfs2 to support sparse files. Much of the added code is designed to be generic (in the ocfs2 sense) so that it can later be re-used to implement large extended attributes. This patch only adds the rotation code and does minimal updates to callers of the extent api. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 14:44:03 -07:00
Mark Fasheh	6f16bf655c	ocfs2: small cleanup of ocfs2_request_delete() There are two checks in there (one for inode newness, one for other mounted nodes) which are unnecessary, so remove them. The DLM will allow the trylock in either case without any messaging overhead. Removing these makes ocfs2_request_delete() a one liner function, so just move the trylock out one level into ocfs2_query_inode_wipe(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 14:40:55 -07:00
Tiger Yang	68e2b740c4	ocfs2: remove unused code Remove node messaging code that becomes unused with the delete inode vote removal. [Removed even more cruft which I spotted during review --Mark] Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 14:40:16 -07:00
Tiger Yang	500086300e	ocfs2: Remove delete inode vote Ocfs2 currently does cluster-wide node messaging to check the open state of an inode during delete. This patch removes that mechanism in favor of an inode cluster lock which is taken at shared read when an inode is first read and dropped in clear_inode(). This allows a deleting node to test the liveness of an inode by attempting to take an exclusive lock. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 14:39:48 -07:00
Mark Fasheh	a9f5f70739	ocfs2: filter more error prints We don't want to print anything at all in ocfs2_lookup() when getting an error from ocfs2_iget() - it could be something as innocuous as a signal being detected in the dlm. ocfs2_permission() should filter on -ENOENT which ocfs2_meta_lock() can return if the inode was deleted on another node. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 13:39:08 -07:00
Sunil Mushran	bebe6f120b	ocfs2: Replace panic() with emergency_restart() when fencing We have noticed panic() hanging leading us to a situation in which the node, while otherwise dead, is still disk heartbeating. This leads to a hung cluster as the other nodes are waiting for this node to stop disk heartbeating. This situation is only resolved by power resetting the box. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 13:39:02 -07:00
Sunil Mushran	5d262cc7dd	ocfs2: Silence compiler warnings Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 13:38:55 -07:00
Mark Fasheh	be9e986b82	ocfs2: Local mounts should skip inode updates We don't want the extent map and uptodate cache destruction in ocfs2_meta_lock_update() on a local mount, so skip that. This fixes several bugs with uptodate being cleared on buffers and extent maps being corrupted. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 13:35:21 -07:00
Sunil Mushran	0d01af6e5d	ocfs2_dlm: Call cond_resched_lock() once per hash bucket scan In dlm_migrate_all_locks(), we currently call cond_resched_lock() after processing each lockres in a hash bucket. Move it outside the loop so as to call it only after the entire hash bucket has been processed. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 13:33:11 -07:00
Srinivas Eeda	756a1501dd	ocfs2_dlm: fix race in dlm_remaster_locks There is a possibility that dlm_remaster_locks could overwride node->state with DLM_RECO_NODE_DATA_REQUESTED after dlm_reco_data_done_handler sets the node->state to DLM_RECO_NODE_DATA_DONE. This could lead to recovery getting stuck and requires a cluster reboot. Synchronize with dlm_reco_state_lock spinlock. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-04-26 13:33:02 -07:00
Steve French	984acfe1cf	[CIFS] prefixpath mounts to servers supporting posix paths used wrong slash Acked-by: Alexander Bokovoy <abokovoy@ru.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-04-26 16:42:50 +00:00
Steve French	deb0420c6f	[CIFS] Update cifs version to 1.49 Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-04-26 14:35:54 +00:00
Milind Arun Choudhary	3cbb1c8e1a	JFS: use __set_current_state() use __set_current_state(TASK_) instead of current->state = TASK_ Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>	2007-04-26 07:30:29 -05:00
David Woodhouse	ef2e58ea6b	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6	2007-04-26 09:31:28 +01:00
Andrew Morton	f6449f4ece	[JFFS2] Fix compr_rubin.c build after include file elimination. It seems to be silly season lately. (Oops, test builds are more useful if the file in question is actually configured on. dwmw2). Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-26 07:27:04 +01:00
Patrick McHardy	af65bdfce9	[NETLINK]: Switch cb_lock spinlock to mutex and allow to override it Switch cb_lock to mutex and allow netlink kernel users to override it with a subsystem specific mutex for consistent locking in dump callbacks. All netlink_dump_start users have been audited not to rely on any side-effects of the previously used spinlock. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:03 -07:00
Arnaldo Carvalho de Melo	b529ccf279	[NETLINK]: Introduce nlmsg_hdr() helper For the common "(struct nlmsghdr *)skb->data" sequence, so that we reduce the number of direct accesses to skb->data and for consistency with all the other cast skb member helpers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:34 -07:00
Eric Dumazet	ae40eb1ef3	[NET]: Introduce SIOCGSTAMPNS ioctl to get timestamps with nanosec resolution Now network timestamps use ktime_t infrastructure, we can add a new ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'. User programs can thus access to nanosecond resolution. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:04 -07:00
David Woodhouse	61c4b23770	[JFFS2] Handle inodes with only a single metadata node with non-zero isize This should never happen unless there's corruption on the medium and the actual data nodes go missing. But the failure mode (an oops when we assume the fragtree isn't empty and go looking for its last node) isn't useful. Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-25 17:04:23 +01:00
Dave Kleikamp	3e2221c73c	Copy i_flags to jfs inode flags on write This mirrors Jan Kara's patches for ext3. This patch makes sure that changes made to inode->i_flags are reflected on disk for jfs. It also moves a call of jfs_set_inode_flags() to be more consistent with where jfs_get_inode_flags() is called. Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>	2007-04-25 09:36:20 -05:00
David Woodhouse	c00c310eac	[JFFS2] Tidy up licensing/copyright boilerplate. In particular, remove the bit in the LICENCE file about contacting Red Hat for alternative arrangements. Their errant IS department broke that arrangement a long time ago -- the policy of collecting copyright assignments from contributors came to an end when the plug was pulled on the servers hosting the project, without notice or reason. We do still dual-license it for use with eCos, with the GPL+exception licence approved by the FSF as being GPL-compatible. It's just that nobody has the right to license it differently. Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-25 14:16:47 +01:00
vignesh	eaa33a9ac0	[CIFS] Replace kmalloc/memset combination with kzalloc Signed-off-by: Vignesh Babu <vignesh.babu@wipro.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-04-25 12:13:48 +00:00
Steve French	5858ae44e2	[CIFS] Add IPv6 support IPv6 support was started a few years ago in the cifs client, but lacked a kernel helper function for parsing the ascii form of the ipv6 address. Now that that is added (and now IPv6 is the default that some OS use now) it was fairly easy to finish the cifs ipv6 support. This requires that CIFS_EXPERIMENTAL be enabled and (at least until the mount.cifs module is modified to use a new ipv6 friendly call instead of gethostbyname) and the ipv6 address be passed on the mount as "ip=" mount option. Thanks Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-04-25 11:59:10 +00:00
Steve French	cbac3cba66	[CIFS] New CIFS POSIX mkdir performance improvement (part 2) Fix incorrect parsing of return data Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-04-25 11:46:06 +00:00
Joakim Tjernlund	0dec4c8bc6	[JFFS2] Better fix for all-zero node headers No need to check for all-zero header since the header cannot be zero due to other checks. Replace the all-zero header check in readinode.c with a check for the magic word. Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-25 04:13:06 +01:00
David Woodhouse	df8e96f391	[JFFS2] Improve read_inode memory usage, v2. We originally used to read every node and allocate a jffs2_tmp_dnode_info structure for each, before processing them in (reverse) version order and discarding the ones which are obsoleted by later nodes. With huge logfiles, this behaviour caused memory problems. For example, a file involved in OLPC trac #1292 has 1822391 nodes, and would cause the XO machine to run out of memory during the first stage of read_inode(). Instead of just inserting nodes into a tree in version order as we find them, we now put them into a tree in order of their offset within the file, which allows us to immediately discard nodes which are completely obsoleted. We don't use a full tree with 'fragments' pointing to the real data structure, as we do in the normal fragtree. We sort only on the start address, and add an 'overlapped' flag to the tmp_dnode_info to indicate that the node in question is (partially) overlapped by another. When the scan is complete, we start at the end of the file, adding each node to a real fragtree as before. Where the node is non-overlapped, we just add it (it doesn't matter that it's not the latest version; there is no overlap). When the node at the end of the tree _is_ overlapped, we sort it and all its overlapping nodes into version order and then add them to the fragtree in that order. This 'early discard' reduces the peak allocation of tmp_dnode_info structures from 1.8M to a mere 62872 (3.5%) in the degenerate case referenced above. This version of the patch also correctly rememembers the highest node version# seen for an inode when it's scanned. Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-25 03:23:42 +01:00
Jeff Mahoney	9b7f375505	reiserfs: fix xattr root locking/refcount bug The listxattr() and getxattr() operations are only protected by a read lock. As a result, if either of these operations run in parallel, a race condition exists where the xattr_root will end up being cached twice, which results in the leaking of a reference and a BUG() on umount. This patch refactors get_xa_root(), __get_xa_root(), and create_xa_root(), into one get_xa_root() function that takes the appropriate locking around the entire critical section. Reported, diagnosed and tested by Andrea Righi <a.righi@cineca.it> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Andrea Righi <a.righi@cineca.it> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Edward Shishkin <edward@namesys.com> Cc: Alex Zarochentsev <zam@namesys.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-24 08:23:09 -07:00
Latchesar Ionkov	c959df9f01	v9fs: don't use primary fid when removing file v9fs_insert uses v9fs_fid_lookup (which also locks the fid) to get the primary fid associated with the dentry and destroys the v9fs_fid struct after removing the file. If another process called v9fs_fid_lookup on the same dentry, it may wait undefinitely for the fid's lock (as the struct is freed). This patch changes v9fs_remove to use a cloned fid, so the primary fid is not locked and freed. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Cc: Eric Van Hensbergen <ericvh@hera.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-24 08:23:08 -07:00
Steve French	2dd29d3133	[CIFS] New CIFS POSIX mkdir performance improvement Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-04-23 22:07:35 +00:00
David Woodhouse	44b998e1eb	[JFFS2] Improve failure mode if inode checking leaves unchecked space. We should never find the unchecked size is non-zero after we've finished checking all inodes. If it happens, used to BUG(), leaving the alloc_sem held and deadlocking. Instead, just return -ENOSPC after complaining. The GC thread will die, but read-only operation should be able to continue and the file system should be unmountable. Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-04-23 12:11:46 +01:00

... 2 3 4 5 6 ...

5481 Commits (6d18c9220965b437287c3a7e803725c24992ceac)