Commit graph

8831 commits

Author SHA1 Message Date
Denis V. Lunev
21ac295b42 afs: use non-racy method for proc entries creation
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
be setup before gluing PDE to main tree.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:20 -07:00
Denis V. Lunev
34b37235c6 nfs: use proc_create to setup de->proc_fops
Use proc_create() to make sure that ->proc_fops be setup before gluing PDE to
main tree.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:20 -07:00
Denis V. Lunev
9ef2db2630 nfsd: use proc_create to setup de->proc_fops
Use proc_create() to make sure that ->proc_fops be setup before gluing PDE to
main tree.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:20 -07:00
Denis V. Lunev
59b7435149 proc: introduce proc_create_data to setup de->data
This set of patches fixes an proc ->open'less usage due to ->proc_fops flip in
the most part of the kernel code.  The original OOPS is described in the
commit 2d3a4e3666:

    Typical PDE creation code looks like:

    	pde = create_proc_entry("foo", 0, NULL);
    	if (pde)
    		pde->proc_fops = &foo_proc_fops;

    Notice that PDE is first created, only then ->proc_fops is set up to
    final value. This is a problem because right after creation
    a) PDE is fully visible in /proc , and
    b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's
       possible to ->read without ->open (see one class of oopses below).

    The fix is new API called proc_create() which makes sure ->proc_fops are
    set up before gluing PDE to main tree. Typical new code looks like:

    	pde = proc_create("foo", 0, NULL, &foo_proc_fops);
    	if (!pde)
    		return -ENOMEM;

    Fix most networking users for a start.

    In the long run, create_proc_entry() for regular files will go.

In addition to this, proc_create_data is introduced to fix reading from
proc without PDE->data. The race is basically the same as above.

create_proc_entries is replaced in the entire kernel code as new method
is also simply better.

This patch:

The problem is the same as for de->proc_fops.  Right now PDE becomes visible
without data set.  So, the entry could be looked up without data.  This, in
most cases, will simply OOPS.

proc_create_data call is created to address this issue.  proc_create now
becomes a wrapper around it.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Chris Mason <chris.mason@oracle.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Karsten Keil <kkeil@suse.de>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Pierre Peiffer <peifferp@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:20 -07:00
Alexey Dobriyan
b640a89ddd proc: convert /proc/tty/ldiscs to seq_file interface
Note: THIS_MODULE and header addition aren't technically needed because
      this code is not modular, but let's keep it anyway because people
      can copy this code into modular code.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:20 -07:00
Alexey Dobriyan
8731f14d37 proc: remove ->get_info infrastructure
Now that last dozen or so users of ->get_info were removed, ditch it too.
Everyone sane shouldd have switched to seq_file interface long ago.

P.S.: Co-existing 3 interfaces (->get_info/->read_proc/->proc_fops) for proc
      is long-standing crap, BTW, thus
      a) put ->read_proc/->write_proc/read_proc_entry() users on death row,
      b) new such users should be rejected,
      c) everyone is encouraged to convert his favourite ->read_proc user or
         I'll do it, lazy bastards.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:19 -07:00
Alexey Dobriyan
c74c120a21 proc: remove proc_root from drivers
Remove proc_root export.  Creation and removal works well if parent PDE is
supplied as NULL -- it worked always that way.

So, one useless export removed and consistency added, some drivers created
PDEs with &proc_root as parent but removed them as NULL and so on.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:18 -07:00
Alexey Dobriyan
928b4d8c89 proc: remove proc_root_driver
Use creation by full path: "driver/foo".

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:18 -07:00
Alexey Dobriyan
36a5aeb878 proc: remove proc_root_fs
Use creation by full path instead: "fs/foo".

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:18 -07:00
Alexey Dobriyan
9c37066d88 proc: remove proc_bus
Remove proc_bus export and variable itself. Using pathnames works fine
and is slightly more understandable and greppable.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:18 -07:00
Alexey Dobriyan
5e971dce0b proc: drop several "PDE valid/invalid" checks
proc-misc code is noticeably full of "if (de)" checks when PDE passed is
always valid.  Remove them.

Addition of such check in proc_lookup_de() is for failed lookup case.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:18 -07:00
Alexey Dobriyan
7cee4e00e0 proc: less special case in xlate code
If valid "parent" is passed to proc_create/remove_proc_entry(), then name of
PDE should consist of only one path component, otherwise creation or or
removal will fail.  However, if NULL is passed as parent then create/remove
accept full path as a argument.  This is arbitrary restriction -- all
infrastructure is in place.

So, patch allows the following to succeed:

	create_proc_entry("foo/bar", 0, pde_baz);
	remove_proc_entry("baz/foo/bar", &proc_root);

Also makes the following to behave identically:

	create_proc_entry("foo/bar", 0, NULL);
	create_proc_entry("foo/bar", 0, &proc_root);

Discrepancy noticed by Den Lunev (IIRC).

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:17 -07:00
Alexey Dobriyan
f649d6d326 proc: simplify locking in remove_proc_entry()
proc_subdir_lock protects only modifying and walking through PDE lists, so
after we've found PDE to remove and actually removed it from lists, there is
no need to hold proc_subdir_lock for the rest of operation.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:17 -07:00
Roland McGrath
638fa202cd procfs: mem permission cleanup
This cleans up the permission checks done for /proc/PID/mem i/o calls.  It
puts all the logic in a new function, check_mem_permission().

The old code repeated the (!MAY_PTRACE(task) || !ptrace_may_attach(task))
magical expression multiple times.  The new function does all that work in one
place, with clear comments.

The old code called security_ptrace() twice on successful checks, once in
MAY_PTRACE() and once in __ptrace_may_attach().  Now it's only called once,
and only if all other checks have succeeded.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:17 -07:00
Alexey Dobriyan
0d5c9f5f59 proc: switch to proc_create()
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:17 -07:00
Matt Helsley
925d1c401f procfs task exe symlink
The kernel implements readlink of /proc/pid/exe by getting the file from
the first executable VMA.  Then the path to the file is reconstructed and
reported as the result.

Because of the VMA walk the code is slightly different on nommu systems.
This patch avoids separate /proc/pid/exe code on nommu systems.  Instead of
walking the VMAs to find the first executable file-backed VMA we store a
reference to the exec'd file in the mm_struct.

That reference would prevent the filesystem holding the executable file
from being unmounted even after unmapping the VMAs.  So we track the number
of VM_EXECUTABLE VMAs and drop the new reference when the last one is
unmapped.  This avoids pinning the mounted filesystem.

[akpm@linux-foundation.org: improve comments]
[yamamoto@valinux.co.jp: fix dup_mmap]
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: David Howells <dhowells@redhat.com>
Cc:"Eric W. Biederman" <ebiederm@xmission.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:17 -07:00
Alexey Dobriyan
e93b4ea20a proc: print more information when removing non-empty directories
This usually saves one recompile to insert similar printk like below. :)

Sample nastygram:

remove_proc_entry: removing non-empty directory '/proc/foo', leaking at least 'bar'
------------[ cut here ]------------
WARNING: at fs/proc/generic.c:776 remove_proc_entry+0x18a/0x200()
Modules linked in: foo(-) container fan battery dock sbs ac sbshc backlight ipv6 loop af_packet amd_rng sr_mod i2c_amd8111 i2c_amd756 cdrom i2c_core button thermal processor
Pid: 3034, comm: rmmod Tainted: G   M     2.6.25-rc1 #5

Call Trace:
 [<ffffffff80231974>] warn_on_slowpath+0x64/0x90
 [<ffffffff80232a6e>] printk+0x4e/0x60
 [<ffffffff802d6c8a>] remove_proc_entry+0x18a/0x200
 [<ffffffff8045cd88>] mutex_lock_nested+0x1c8/0x2d0
 [<ffffffff8025f0f0>] __try_stop_module+0x0/0x40
 [<ffffffff8025effd>] sys_delete_module+0x14d/0x200
 [<ffffffff8045df3d>] lockdep_sys_exit_thunk+0x35/0x67
 [<ffffffff8031c307>] __up_read+0x27/0xa0
 [<ffffffff8045decc>] trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff8020b6ab>] system_call_after_swapgs+0x7b/0x80

---[ end trace 10ef850597e89c54 ]---

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:17 -07:00
WANG Cong
4220b7fe89 elf: fix shadowed variables in fs/binfmt_elf.c
Fix these sparse warings:
fs/binfmt_elf.c:1749:29: warning: symbol 'tmp' shadows an earlier one
fs/binfmt_elf.c:1734:28: originally declared here
fs/binfmt_elf.c:2009:26: warning: symbol 'vma' shadows an earlier one
fs/binfmt_elf.c:1892:24: originally declared here

[akpm@linux-foundation.org: chose better variable name]
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:16 -07:00
Cyrill Gorcunov
6970c8eff8 BINFMT: fill_elf_header cleanup - use straight memset first
This patch does simplify fill_elf_header function by setting
to zero the whole elf header first. So we fillup the fields
we really need only.

before:
   text    data     bss     dec     hex filename
  11735      80       0   11815    2e27 fs/binfmt_elf.o

after:
   text    data     bss     dec     hex filename
  11710      80       0   11790    2e0e fs/binfmt_elf.o

viola, 25 bytes of text is freed

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:16 -07:00
Balbir Singh
cf475ad28a cgroups: add an owner to the mm_struct
Remove the mem_cgroup member from mm_struct and instead adds an owner.

This approach was suggested by Paul Menage.  The advantage of this approach
is that, once the mm->owner is known, using the subsystem id, the cgroup
can be determined.  It also allows several control groups that are
virtually grouped by mm_struct, to exist independent of the memory
controller i.e., without adding mem_cgroup's for each controller, to
mm_struct.

A new config option CONFIG_MM_OWNER is added and the memory resource
controller selects this config option.

This patch also adds cgroup callbacks to notify subsystems when mm->owner
changes.  The mm_cgroup_changed callback is called with the task_lock() of
the new task held and is called just prior to changing the mm->owner.

I am indebted to Paul Menage for the several reviews of this patchset and
helping me make it lighter and simpler.

This patch was tested on a powerpc box, it was compiled with both the
MM_OWNER config turned on and off.

After the thread group leader exits, it's moved to init_css_state by
cgroup_exit(), thus all future charges from runnings threads would be
redirected to the init_css_set's subsystem.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: David Rientjes <rientjes@google.com>,
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Reviewed-by: Paul Menage <menage@google.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:10 -07:00
Serge E. Hallyn
08ce5f16ee cgroups: implement device whitelist
Implement a cgroup to track and enforce open and mknod restrictions on device
files.  A device cgroup associates a device access whitelist with each cgroup.
 A whitelist entry has 4 fields.  'type' is a (all), c (char), or b (block).
'all' means it applies to all types and all major and minor numbers.  Major
and minor are either an integer or * for all.  Access is a composition of r
(read), w (write), and m (mknod).

The root device cgroup starts with rwm to 'all'.  A child devcg gets a copy of
the parent.  Admins can then remove devices from the whitelist or add new
entries.  A child cgroup can never receive a device access which is denied its
parent.  However when a device access is removed from a parent it will not
also be removed from the child(ren).

An entry is added using devices.allow, and removed using
devices.deny.  For instance

	echo 'c 1:3 mr' > /cgroups/1/devices.allow

allows cgroup 1 to read and mknod the device usually known as
/dev/null.  Doing

	echo a > /cgroups/1/devices.deny

will remove the default 'a *:* mrw' entry.

CAP_SYS_ADMIN is needed to change permissions or move another task to a new
cgroup.  A cgroup may not be granted more permissions than the cgroup's parent
has.  Any task can move itself between cgroups.  This won't be sufficient, but
we can decide the best way to adequately restrict movement later.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix may-be-used-uninitialized warning]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Acked-by: James Morris <jmorris@namei.org>
Looks-good-to: Pavel Emelyanov <xemul@openvz.org>
Cc: Daniel Hokka Zakrisson <daniel@hozac.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:09 -07:00
Michael Halcrow
2f9b12a31f eCryptfs: protect crypt_stat->flags in ecryptfs_open()
Make sure crypt_stat->flags is protected with a lock in ecryptfs_open().

Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:07 -07:00
Michael Halcrow
6a3fd92e73 eCryptfs: make key module subsystem respect namespaces
Make eCryptfs key module subsystem respect namespaces.

Since I will be removing the netlink interface in a future patch, I just made
changes to the netlink.c code so that it will not break the build.  With my
recent patches, the kernel module currently defaults to the device handle
interface rather than the netlink interface.

[akpm@linux-foundation.org: export free_user_ns()]
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:07 -07:00
Michael Halcrow
f66e883eb6 eCryptfs: integrate eCryptfs device handle into the module.
Update the versioning information.  Make the message types generic.  Add an
outgoing message queue to the daemon struct.  Make the functions to parse
and write the packet lengths available to the rest of the module.  Add
functions to create and destroy the daemon structs.  Clean up some of the
comments and make the code a little more consistent with itself.

[akpm@linux-foundation.org: printk fixes]
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:07 -07:00
Michael Halcrow
8bf2debd5f eCryptfs: introduce device handle for userspace daemon communications
A regular device file was my real preference from the get-go, but I went with
netlink at the time because I thought it would be less complex for managing
send queues (i.e., just do a unicast and move on).  It turns out that we do
not really get that much complexity reduction with netlink, and netlink is
more heavyweight than a device handle.

In addition, the netlink interface to eCryptfs has been broken since 2.6.24.
I am assuming this is a bug in how eCryptfs uses netlink, since the other
in-kernel users of netlink do not seem to be having any problems.  I have had
one report of a user successfully using eCryptfs with netlink on 2.6.24, but
for my own systems, when starting the userspace daemon, the initial helo
message sent to the eCryptfs kernel module results in an oops right off the
bat.  I spent some time looking at it, but I have not yet found the cause.
The netlink interface breaking gave me the motivation to just finish my patch
to migrate to a regular device handle.  If I cannot find out soon why the
netlink interface in eCryptfs broke, I am likely to just send a patch to
disable it in 2.6.24 and 2.6.25.  I would like the device handle to be the
preferred means of communicating with the userspace daemon from 2.6.26 on
forward.

This patch:

Functions to facilitate reading and writing to the eCryptfs miscellaneous
device handle.  This will replace the netlink interface as the preferred
mechanism for communicating with the userspace eCryptfs daemon.

Each user has his own daemon, which registers itself by opening the eCryptfs
device handle.  Only one daemon per euid may be registered at any given time.
The eCryptfs module sends a message to a daemon by adding its message to the
daemon's outgoing message queue.  The daemon reads the device handle to get
the oldest message off the queue.

Incoming messages from the userspace daemon are immediately handled.  If the
message is a response, then the corresponding process that is blocked waiting
for the response is awakened.

Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:07 -07:00
Miklos Szeredi
9c3580aa52 ecryptfs: add missing lock around notify_change
Callers of notify_change() need to hold i_mutex.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:07 -07:00
Harvey Harrison
18d1dbf1d4 ecryptfs: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:06 -07:00
Adrian Bunk
05db67a4f2 remove ecryptfs_header_cache_0
Remove the no longer used ecryptfs_header_cache_0.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:06 -07:00
OGAWA Hirofumi
762873c251 vfs: fix unconditional write_super() call in file_fsync()
We need to check ->s_dirt before calling write_super().  It became the cause
of an unneeded write.

This bug was noticed by Sudhanshu Saxena.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:06 -07:00
David Howells
8f0cfa52a1 xattr: add missing consts to function arguments
Add missing consts to xattr function arguments.

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:06 -07:00
Jan Blunck
7ec02ef159 vfs: remove lives_below_in_same_fs()
Remove lives_below_in_same_fs() since is_subdir() from fs/dcache.c is
providing the same functionality.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:06 -07:00
Matthias Kaehlcke
c5c8be3ce5 fs/inode.c: use hlist_for_each_entry()
fs/inode.c: use hlist_for_each_entry() in find_inode() and find_inode_fast()

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Matthias Kaehlcke <matthias@kaehlcke.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:06 -07:00
Jan Kara
af065b8a19 vfs: skip inodes without pages to free in drop_pagecache_sb()
Many inodes have no pagecache, so we can avoid lots of lock-takings.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:05 -07:00
Jan Kara
eccb95cee4 vfs: fix lock inversion in drop_pagecache_sb()
Fix longstanding lock inversion in drop_pagecache_sb by dropping inode_lock
before calling __invalidate_mapping_pages().  We just have to make sure inode
won't go away from under us by keeping reference to it and putting the
reference only after we have safely resumed the scan of the inode list.  A bit
tricky but not too bad...

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: David Chinner <dgc@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:05 -07:00
David Howells
e1d2c8b69a fdpic: check that the size returned by kernel_read() is what we asked for
Check that the size of the read returned by kernel_read() is what we asked
for.  If it isn't, then reject the binary as being a badly formatted.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:05 -07:00
S.Caglar Onur
2e50b6ccda fs/binfmt_aout.c: use printk_ratelimit()
Use printk_ratelimit() instead of jiffies based arithmetic, suggested by Geert
Uytterhoeven

Signed-off-by: S.Caglar Onur <caglar@pardus.org.tr>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:04 -07:00
Pavel Emelyanov
3a2e7f47d7 binfmt_misc.c: avoid potential kernel stack overflow
This can be triggered with root help only, but...

Register the ":text:E::txt::/root/cat.txt:' rule in binfmt_misc (by root) and
try launching the cat.txt file (by anyone) :) The result is - the endless
recursion in the load_misc_binary -> open_exec -> load_misc_binary chain and
stack overflow.

There's a similar problem with binfmt_script, and there's a sh_bang memner on
linux_binprm structure to handle this, but simply raising this in binfmt_misc
may break some setups when the interpreter of some misc binaries is a script.

So the proposal is to turn sh_bang into a bit, add a new one (the misc_bang)
and raise it in load_misc_binary.  After this, even if we set up the misc ->
script -> misc loop for binfmts one of them will step on its own bang and
exit.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:04 -07:00
Andrew Morton
a2d416dcc9 codafs: fix build warning
powerpc:

fs/coda/coda_linux.c: In function 'coda_iattr_to_vattr':
fs/coda/coda_linux.c:137: warning: large integer implicitly truncated to unsigned type

Cc: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:04 -07:00
Tetsuo Handa
175a06ae30 exec: remove argv_len from struct linux_binprm
I noticed that 2.6.24.2 calculates bprm->argv_len at do_execve().  But it
doesn't update bprm->argv_len after "remove_arg_zero() +
copy_strings_kernel()" at load_script() etc.

audit_bprm() is called from search_binary_handler() and
search_binary_handler() is called from load_script() etc.  Thus, I think the
condition check

  if (bprm->argv_len > (audit_argv_kb << 10))
          return -E2BIG;

in audit_bprm() might return wrong result when strlen(removed_arg) !=
strlen(spliced_args).  Why not update bprm->argv_len at load_script() etc.  ?

By the way, 2.6.25-rc3 seems to not doing the condition check.  Is the field
bprm->argv_len no longer needed?

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Ollie Wild <aaw@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:03 -07:00
Julia Lawall
8d4b69002e fs/affs/file.c: use BUG_ON
if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
side-effects to allow a definition of BUG_ON that drops the code completely.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@ disable unlikely @ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (unlikely(E)) { BUG(); }
+ BUG_ON(E);
)

@@ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (E) { BUG(); }
+ BUG_ON(E);
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:02 -07:00
Jim Meyering
cd6fda3608 hfsplus: handle match_strdup failure
fs/hfsplus/options.c (hfsplus_parse_options): Handle match_strdup failure.

Signed-off-by: Jim Meyering <meyering@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:02 -07:00
Jim Meyering
3fbe5c3100 hfs: handle match_strdup failure
fs/hfs/super.c (parse_options): Handle match_strdup failure, twice.

Signed-off-by: Jim Meyering <meyering@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:01 -07:00
Jim Meyering
6db27dd9d2 affs: handle match_strdup failure
fs/affs/super.c (parse_options): Remove useless initialization.  Handle
match_strdup failure.

Signed-off-by: Jim Meyering <meyering@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:01 -07:00
Jiri Olsa
61d64576a2 fs: remove unused fops from struct char_device_struct
struct char_device_struct::fops is no longer used: remove it.

Signed-off-by: Jiri Olsa <olsajiri@gmail.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:01 -07:00
Harvey Harrison
f249fdd8c1 autofs4: fix sparse warning in root.c
fs/autofs4/root.c:536:23: warning: symbol 'ino' shadows an earlier one
fs/autofs4/root.c:510:22: originally declared here

There is no need to redeclare, we are at the end of the loop and in
the next iteration of the loop, ino will be reset.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:01 -07:00
Adrian Bunk
3202e1811f make BINFMT_FLAT a bool
I have not yet seen anyone saying he has a reasonable use case for using
BINFMT_FLAT modular on his embedded device.

Considering that fs/binfmt_flat.c even lacks a MODULE_LICENSE() I really doubt
there is any, and this patch therefore makes BINFMT_FLAT a bool.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Bryan Wu <cooloney.lkml@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:01 -07:00
Adrian Bunk
f1e3af72c1 make fs/buffer.c:cont_expand_zero() static
cont_expand_zero() can become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:01 -07:00
Adrian Bunk
946a57b526 remove generic_commit_write()
Remove the obsolete and no longer used generic_commit_write().

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:01 -07:00
Adrian Bunk
45cc2b96f2 fs/timerfd.c should #include <linux/syscalls.h>
Every file should include the headers containing the prototypes for its global
functions (in this case for sys_timerfd_*()).

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:01 -07:00
Adrian Bunk
d5470b596a fs/aio.c: make 3 functions static
Make the following needlessly global functions static:

- __put_ioctx()
- lookup_ioctx()
- io_submit_one()

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:00 -07:00
Adrian Bunk
07d45da616 fs/drop_caches.c: make 2 functions static
Make the following needlessly global functions static:

- drop_pagecache()
- drop_slab()

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:00 -07:00
Adrian Bunk
f11b00f3bd fs/fs-writeback.c: make 2 functions static
Make the following needlessly global functions static:

- writeback_acquire()
- writeback_release()

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:00 -07:00
Adrian Bunk
67cde59537 make vfs_ioctl() static
Make the needlessly global vfs_ioctl() static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:00 -07:00
Adrian Bunk
6b09ae6692 make __put_super() static
Make the needlessly global __put_super() static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:00 -07:00
Adrian Bunk
8b1919a1e8 fs/freevxfs/: proper externs
Move the extern declarations of several structs to vxfs_extern.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:00 -07:00
Adrian Bunk
4b0a8da7a7 fs/hfsplus/: proper externs
Add proper extern declarations for two structs in fs/hfsplus/hfsplus_fs.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:00 -07:00
Adrian Bunk
4488c59c94 fs/ramfs/ extern cleanup
- internal.h shouldn't duplicate the extern declaration for
  ramfs_file_operations already in include/linux/ramfs.h
- file-mmu.c needs two #include's for seeing the extern declarations
  of it's global struct's

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:00 -07:00
Harvey Harrison
63e3453e54 befs: fix sparse warning in linuxvfs.c
Use link as the variable name to avoid shadowing the arg.

fs/befs/linuxvfs.c:492:8: warning: symbol 'p' shadows an earlier one
fs/befs/linuxvfs.c:488:77: originally declared here

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: "Sergey S. Kostyliov" <rathamahata@php4.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:05:59 -07:00
Harvey Harrison
9fe76c763f coda: add static to functions in dir.c
coda_unlink, coda_rmdir, coda_readdir can all be static, the forward
declarations already were.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:05:59 -07:00
Harvey Harrison
e5949050f2 adfs: work around bogus sparse warning
fs/adfs/dir_f.c:126:4: warning: do-while statement is not a compound statement

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:05:59 -07:00
Davide Libenzi
cdac75e6f2 epoll: avoid kmemcheck warning
Epoll calls rb_set_parent(n, n) to initialize the rb-tree node, but
rb_set_parent() accesses node's pointer in its code.  This creates a
warning in kmemcheck (reported by Vegard Nossum) about an uninitialized
memory access.  The warning is harmless since the following rb-tree node
insert is going to overwrite the node data.  In any case I think it's
better to not have that happening at all, and fix it by simplifying the
code to get rid of a few lines that became superfluous after the previous
epoll changes.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:05:59 -07:00
FUJITA Tomonori
68154e90c9 block: add dma alignment and padding support to blk_rq_map_kern
This patch adds bio_copy_kern similar to
bio_copy_user. blk_rq_map_kern uses bio_copy_kern instead of
bio_map_kern if necessary.

bio_copy_kern uses temporary pages and the bi_end_io callback frees
these pages. bio_copy_kern saves the original kernel buffer at
bio->bi_private it doesn't use something like struct bio_map_data to
store the information about the caller.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-29 09:50:34 +02:00
Tom Zanussi
c3270e577c relay: fix splice problem
Splice isn't always incrementing the ppos correctly, which broke
relay splice.

Signed-off-by: Tom Zanussi <zanussi@comcast.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-29 09:48:15 +02:00
Olof Johansson
9f354858b8 fatfs: fix build warning with 64k PAGE_SIZE
Annoying gcc warning:

fs/fat/inode.c: In function 'fat_fill_super':
fs/fat/inode.c:1222: warning: comparison is always false due to limited range of data type

Change it to compare with 4K instead of PAGE_CACHE_SIZE, as suggested
by OGAWA-san.

[FAT spec says: logical_sector_size should be 512, 1024, 2048 4096]
So, at least for now, we limit it to 4096.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:47 -07:00
Frank Seidel
0607fd0258 fat: detect media without partition table correctly
I received a complaint that some FAT formated medias (e.g.  sd memory cards)
trigger a "unknown partition table" message even though there is no partition
table and they work correctly, while in general (when e.g.  formated with
mkdosfs or even Windows Vista) this message is not shown.

Currently this seems only to happen when the medias get formatted with Windows
XP (and possibly Win 2000).  Then the boot indicator byte contains garbage
(part of text message) and so do the other parts checked by msdos_paritition
which then later triggers this message.

References: novell bug #364365

Most fat formatted media without partition table contains zeros in the boot
indication and the other tested bytes and so falls through the checks in
msdos_partition, leading it to return with 1 (all is fine).

But some (e.g.  WinXP formatted) fat fomated medias don't use boot_ind and so
the check fails and causes a "unkown partition table" warning eventhough there
is none and everything would be fine.

This additional check directly verifies if there is a fat formatted medium
without a partition table.

Signed-off-by: Frank Seidel <fseidel@suse.de>
Cc: Andreas Dilger <adilger@sun.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:47 -07:00
Andrew Morton
73f20e58b1 FAT_VALID_MEDIA(): remove pointless test
The on-disk media specification field in FAT is only 8-bits, so testing for
<=0xff is pointless, and can generate a "comparison is always true due to
limited range of data type" warning.

While we're there, convert FAT_VALID_MEDIA() into a C function - the present
implementation is buggy: it generates either one or two references to its
argument.

Cc: Frank Seidel <fseidel@suse.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:47 -07:00
OGAWA Hirofumi
c7a6c4edc7 fat: use __getname()
__getname() is faster than __get_free_page(). Use it.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:47 -07:00
Keith Mok
f22032ba8d vfat: bug fix for vfat cannot handle filename with 255
This patch fix the problem that the buffer allocated for convert of unicode to
utf8 in fat/dir.c is too small.

And cannot handle filename with 255 asian characters when mounted with utf8
options.

Also it fix the filename length limitation checking in vfat/namei.c that the
filename length should be checked against the number of converted unicode
characters.

Not the length before NLS/UTF8 converted.

Signed-off-by: Keith Mok <ek9852@gmail.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:47 -07:00
OGAWA Hirofumi
061e97469f Add balance_dirty_pages_ratelimited() to cont_expand_zero()
On the systems, ftruncate() which expand size for FAT became the cause
of OOM.  The cont_expand_zero() filled all memory with dirty pages,
and since disk is very slow, limit of page scanning was exceeded, then
it triggered OOM.

This adds balance_dirty_pages_ratelimited() to avoid filling memory
with dirty pages.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:47 -07:00
OGAWA Hirofumi
e69be4c9c4 fat: Remove fat_clusters_flush()
This removes unneeded fat_clusters_flush().

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:47 -07:00
OGAWA Hirofumi
606e423e43 fat: Update free_clusters even if it is untrusted
Currently, free_clusters is not updated until it is trusted, because
Windows doesn't update it correctly.

But if user is using FAT driver of Linux, it updates free_clusters
correctly.  Instead, this updates it even if it's untrusted, so if
free_clustes is correct, now keep correct value.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:47 -07:00
OGAWA Hirofumi
1ae43f826b fat: Add allow_utime option
Normally utime(2) checks current process is owner of the file, or it
has CAP_FOWNER capability.  But FAT filesystem doesn't have uid/gid as
on disk info, so normal check is too unflexible.

With this option you can relax it.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:47 -07:00
OGAWA Hirofumi
e97e8de388 fat: fat_setattr() fix
Fix fat_setattr() on the case of showexec option. If user specified
showexec option, inode->i_mode may not have S_IXUGO. This just use
inode->i_mode to fix it.

And with this patch, we don't allow chmod() on memory inode, it's just
bad behaviour. IOW, we allow changing S_IWUGO only which can be stored
to disk.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:47 -07:00
OGAWA Hirofumi
1278fdd34b fat: fat_notify_change() and check_mode() cleanup
- Rename fat_notify_change() to fat_setattr()
- check_mode() cleanup
- Change layout of code

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:47 -07:00
OGAWA Hirofumi
3754a54447 fat: kill is_bad_inode() check
FAT doesn't need to check bad inode anymore.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:46 -07:00
Jan Kara
d5dee5c395 reiserfs: unpack tails on quota files
Quota files cannot have tails because quota_write and quota_read functions do
not support them.  So far when quota files did have tail, we just refused to
turn quotas on it.  Sadly this check has been wrong and so there are now
plenty installations where quota files don't have NOTAIL flag set and so now
after fixing the check, they suddently fail to turn quotas on.  Since it's
easy to unpack the tail from kernel, do this from reiserfs_quota_on() which
solves the problem and is generally nicer to users anyway.

Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: <urhausen@urifabi.net>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:46 -07:00
Jan Kara
a2fe594fa3 reiserfs: fix hang on umount with quotas when journal is aborted
Call dquot_drop() from reiserfs_dquot_drop() even if we fail to start a
transaction.  Otherwise we never get to dropping references to quota
structures from the inode and umount will hang indefinitely.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:46 -07:00
Harvey Harrison
fbe5498b3d reiserfs: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:46 -07:00
Harvey Harrison
8acc570fab reiserfs: fix more sparse warnings in do_balan.c
fs/reiserfs/do_balan.c:1467:10: warning: symbol 'ret_val' shadows an earlier one
fs/reiserfs/do_balan.c:275:6: originally declared here
fs/reiserfs/do_balan.c:1471:23: warning: symbol 'ih' shadows an earlier one
fs/reiserfs/do_balan.c:249:67: originally declared here

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:46 -07:00
Harvey Harrison
e13601bc6a reiserfs: fix sparse warning in journal.c
fs/reiserfs/journal.c:4319:2: warning: returning void-valued expression

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:46 -07:00
Marcin Slusarz
9e902df6be reiserfs: le*_add_cpu conversion
replace all:
little_endian_variable = cpu_to_leX(leX_to_cpu(little_endian_variable) +
					expression_in_cpu_byteorder);
with:
	leX_add_cpu(&little_endian_variable, expression_in_cpu_byteorder);
generated with semantic patch

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:46 -07:00
Harvey Harrison
78e917d59c udf: fix sparse warning in namei.c
Let's use bsize instead.
fs/udf/namei.c:960:12: warning: symbol 'elen' shadows an earlier one
fs/udf/namei.c:937:15: originally declared here

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:46 -07:00
Harvey Harrison
36a53ddf85 ufs: replace __inline with inline
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:45 -07:00
Marcin Slusarz
0045edaaf9 ufs: remove unused fs64_add and fs64_sub
remove fs64_add and fs64_sub - they probably weren't ever used because
their prototypes used u32 instead of __fs64

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:45 -07:00
Harvey Harrison
9746077a71 ufs: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:45 -07:00
Marcin Slusarz
3c5afae2ba ufs: [bl]e*_add_cpu conversion
replace all:
big/little_endian_variable = cpu_to_[bl]eX([bl]eX_to_cpu(big/little_endian_variable) +
					expression_in_cpu_byteorder);
with:
	[bl]eX_add_cpu(&big/little_endian_variable, expression_in_cpu_byteorder);
generated with semantic patch

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:45 -07:00
Harvey Harrison
08fc99bfc3 jbd: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:45 -07:00
Harvey Harrison
e05b6b524b ext3: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:45 -07:00
Jan Kara
fa1ff1e02f ext3: fix mount messages when quota disabled
When quota is disabled, we should not print 'journaled quota not supported'
when user tried to mount non-journaled quota.  Also fix typo in the message.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:45 -07:00
Aneesh Kumar K.V
2588ef83f7 ext3: retry block allocation if new blocks are allocated from system zone
If the block allocator gets blocks out of system zone ext3 calls ext3_error.
But if the file system is mounted with errors=continue retry block allocation.
 We need to mark the system zone blocks as in use to make sure retry don't
pick them again

System zone is the block range mapping block bitmap, inode bitmap and inode
table.

[akpm@linux-foundation.org: fix typo in comment]
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:45 -07:00
Jan Kara
07c9938a4e ext3: fix hang on umount with quotas when journal is aborted
Call dquot_drop() from ext3_dquot_drop() even if we fail to start a
transaction.  Otherwise we never get to dropping references to quota
structures from the inode and umount will hang indefinitely.  Thanks to
Payphone LIOU for spotting the problem.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Payphone LIOU <lioupayphone@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:45 -07:00
Jan Kara
0b23076988 ext3: fix update of mtime and ctime on rename
Make ext3 update mtime and ctime of the directory into which we move file even
if the directory entry already exists.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:44 -07:00
Josef Bacik
5b9a499d77 jbd: fix possible journal overflow issues
There are several cases where the running transaction can get buffers added to
its BJ_Metadata list which it never dirtied, which makes its t_nr_buffers
counter end up larger than its t_outstanding_credits counter.

This will cause issues when starting new transactions as while we are logging
buffers we decrement t_outstanding_buffers, so when t_outstanding_buffers goes
negative, we will report that we need less space in the journal than we
actually need, so transactions will be started even though there may not be
enough room for them.  In the worst case scenario (which admittedly is almost
impossible to reproduce) this will result in the journal running out of space.

The fix is to only
refile buffers from the committing transaction to the running transactions
BJ_Modified list when b_modified is set on that journal, which is the only way
to be sure if the running transaction has modified that buffer.

This patch also fixes an accounting error in journal_forget, it is possible
that we can call journal_forget on a buffer without having modified it, only
gotten write access to it, so instead of freeing a credit, we only do so if
the buffer was modified.  The assert will help catch if this problem occurs.
Without these two patches I could hit this assert within minutes of running
postmark, with them this issue no longer arises.  Thank you,

Signed-off-by: Josef Bacik <jbacik@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Acked-by: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:44 -07:00
Josef Bacik
5bc833feaa jbd: fix the way the b_modified flag is cleared
Currently at the start of a journal commit we loop through all of the buffers
on the committing transaction and clear the b_modified flag (the flag that is
set when a transaction modifies the buffer) under the j_list_lock.

The problem is that everywhere else this flag is modified only under the jbd
lock buffer flag, so it will race with a running transaction who could
potentially set it, and have it unset by the committing transaction.

This is also a big waste, you can have several thousands of buffers that you
are clearing the modified flag on when you may not need to.  This patch
removes this code and instead clears the b_modified flag upon entering
do_get_write_access/journal_get_create_access, so if that transaction does
indeed use the buffer then it will be accounted for properly, and if it does
not then we know we didn't use it.

That will be important for the next patch in this series.  Tested thoroughly
by myself using postmark/iozone/bonnie++.

Signed-off-by: Josef Bacik <jbacik@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Acked-by: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:44 -07:00
Julia Lawall
269b261916 fs/ext3: use BUG_ON
if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
side-effects to allow a definition of BUG_ON that drops the code completely.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@ disable unlikely @ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (unlikely(E)) { BUG(); }
+ BUG_ON(E);
)

@@ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (E) { BUG(); }
+ BUG_ON(E);
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:44 -07:00
Akinobu Mita
33575f8ffe ext3: check ext3_journal_get_write_access() errors
Check ext3_journal_get_write_access() errors.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:44 -07:00
Akinobu Mita
e0e369a7dd ext3: use ext3_get_group_desc()
Use ext3_get_group_desc()

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:44 -07:00
Akinobu Mita
22a5daf537 ext3: add missing ext3_journal_stop()
Add missing ext3_journal_stop() in error handling.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:44 -07:00
Akinobu Mita
1eaafeae4b ext3: use ext3_group_first_block_no()
Use ext3_group_first_block_no()

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:44 -07:00
Adrian Bunk
15633005e0 make ext3_xattr_list() static
Make the needlessly global ext3_xattr_list() static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:44 -07:00