linux

q3k/linux

Author	SHA1	Message	Date
Oleg Nesterov	a2a8474c3f	exec: do not sleep in TASK_TRACED under ->cred_guard_mutex Tom Horsley reports that his debugger hangs when it tries to read /proc/pid_of_tracee/maps, this happens since "mm_for_maps: take ->cred_guard_mutex to fix the race with exec" 04b836cbf19e885f8366bccb2e4b0474346c02d commit in 2.6.31. But the root of the problem lies in the fact that do_execve() path calls tracehook_report_exec() which can stop if the tracer sets PT_TRACE_EXEC. The tracee must not sleep in TASK_TRACED holding this mutex. Even if we remove ->cred_guard_mutex from mm_for_maps() and proc_pid_attr_write(), another task doing PTRACE_ATTACH should not hang until it is killed or the tracee resumes. With this patch do_execve() does not use ->cred_guard_mutex directly and we do not hold it throughout, instead: - introduce prepare_bprm_creds() helper, it locks the mutex and calls prepare_exec_creds() to initialize bprm->cred. - install_exec_creds() drops the mutex after commit_creds(), and thus before tracehook_report_exec()->ptrace_stop(). or, if exec fails, free_bprm() drops this mutex when bprm->cred != NULL which indicates install_exec_creds() was not called. Reported-by: Tom Horsley <tom.horsley@att.net> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-05 11:30:42 -07:00
Mimi Zohar	6777d773a4	kernel_read: redefine offset type vfs_read() offset is defined as loff_t, but kernel_read() offset is only defined as unsigned long. Redefine kernel_read() offset as loff_t. Cc: stable@kernel.org Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-08-24 14:58:23 +10:00
Oleg Nesterov	793285fcaf	cred_guard_mutex: do not return -EINTR to user-space do_execve() and ptrace_attach() return -EINTR if mutex_lock_interruptible(->cred_guard_mutex) fails. This is not right, change the code to return ERESTARTNOINTR. Perhaps we should also change proc_pid_attr_write(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: David Howells <dhowells@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-06 13:57:04 -07:00
Linus Torvalds	8a1ca8cedd	Merge branch 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (574 commits) perf_counter: Turn off by default perf_counter: Add counter->id to the throttle event perf_counter: Better align code perf_counter: Rename L2 to LL cache perf_counter: Standardize event names perf_counter: Rename enums perf_counter tools: Clean up u64 usage perf_counter: Rename perf_counter_limit sysctl perf_counter: More paranoia settings perf_counter: powerpc: Implement generalized cache events for POWER processors perf_counters: powerpc: Add support for POWER7 processors perf_counter: Accurate period data perf_counter: Introduce struct for sample data perf_counter tools: Normalize data using per sample period data perf_counter: Annotate exit ctx recursion perf_counter tools: Propagate signals properly perf_counter tools: Small frequency related fixes perf_counter: More aggressive frequency adjustment perf_counter/x86: Fix the model number of Intel Core2 processors perf_counter, x86: Correct some event and umask values for Intel processors ...	2009-06-11 14:01:07 -07:00
James Morris	2c9e703c61	Merge branch 'master' into next Conflicts: fs/exec.c Removed IMA changes (the IMA checks are now performed via may_open()). Signed-off-by: James Morris <jmorris@namei.org>	2009-05-22 18:40:59 +10:00
Mimi Zohar	b9fc745db8	integrity: path_check update - Add support in ima_path_check() for integrity checking without incrementing the counts. (Required for nfsd.) - rename and export opencount_get to ima_counts_get - replace ima_shm_check calls with ima_counts_get - export ima_path_check Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-05-22 09:43:41 +10:00
Ingo Molnar	dc3f81b129	Merge commit 'v2.6.30-rc6' into perfcounters/core Merge reason: this branch was on an -rc4 base, merge it up to -rc6 to get the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-18 07:37:49 +02:00
David Howells	5e751e992f	CRED: Rename cred_exec_mutex to reflect that it's a guard against ptrace Rename cred_exec_mutex to reflect that it's a guard against foreign intervention on a process's credential state, such as is made by ptrace(). The attachment of a debugger to a process affects execve()'s calculation of the new credential state - _and_ also setprocattr()'s calculation of that state. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-05-11 08:15:36 +10:00
Al Viro	6e8341a11e	Switch open_exec() and sys_uselib() to do_open_filp() ... and make path_lookup_open() static Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-05-09 10:49:42 -04:00
Al Viro	a44ddbb6d8	Make open_exec() and sys_uselib() use may_open(), instead of duplicating its parts Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-05-09 10:49:42 -04:00
Ivan Kokshaysky	74641f584d	alpha: binfmt_aout fix This fixes the problem introduced by commit `3bfacef412` (get rid of special-casing the /sbin/loader on alpha): osf/1 ecoff binary segfaults when binfmt_aout built as module. That happens because aout binary handler gets on the top of the binfmt list due to late registration, and kernel attempts to execute the binary without preparatory work that must be done by binfmt_loader. Fixed by changing the registration order of the default binfmt handlers using list_add_tail() and introducing insert_binfmt() function which places new handler on the top of the binfmt list. This might be generally useful for installing arch-specific frontends for default handlers or just for overriding them. Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Richard Henderson <rth@twiddle.net Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-05-02 15:36:10 -07:00
Ingo Molnar	e7fd5d4b3d	Merge branch 'linus' into perfcounters/core Merge reason: This brach was on -rc1, refresh it to almost-rc4 to pick up the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-29 14:47:05 +02:00
Oleg Nesterov	437f7fdb60	check_unsafe_exec: s/lock_task_sighand/rcu_read_lock/ write_lock(&current->fs->lock) guarantees we can't wrongly miss LSM_UNSAFE_SHARE, this is what we care about. Use rcu_read_lock() instead of ->siglock to iterate over the sub-threads. We must see all CLONE_THREAD\|CLONE_FS threads which didn't pass exit_fs(), it takes fs->lock too. With or without this patch we can miss the freshly cloned thread and set LSM_UNSAFE_SHARE, we don't care. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> [ Fixed lock/unlock typo - Hugh ] Acked-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-24 07:39:45 -07:00
Oleg Nesterov	8c652f96d3	do_execve() must not clear fs->in_exec if it was set by another thread If do_execve() fails after check_unsafe_exec(), it clears fs->in_exec unconditionally. This is wrong if we race with our sub-thread which also does do_execve: Two threads T1 and T2 and another process P, all share the same ->fs. T1 starts do_execve(BAD_FILE). It calls check_unsafe_exec(), since ->fs is shared, we set LSM_UNSAFE but not ->in_exec. P exits and decrements fs->users. T2 starts do_execve(), calls check_unsafe_exec(), now ->fs is not shared, we set fs->in_exec. T1 continues, open_exec(BAD_FILE) fails, we clear ->in_exec and return to the user-space. T1 does clone(CLONE_FS /* without CLONE_THREAD */). T2 continues without LSM_UNSAFE_SHARE while ->fs is shared with another process. Change check_unsafe_exec() to return res = 1 if we set ->in_exec, and change do_execve() to clear ->in_exec depending on res. When do_execve() suceeds, it is safe to clear ->in_exec unconditionally. It can be set only if we don't share ->fs with another process, and since we already killed all sub-threads either ->in_exec == 0 or we are the only user of this ->fs. Also, we do not need fs->lock to clear fs->in_exec. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-24 07:39:45 -07:00
Peter Zijlstra	8d1b2d9361	perf_counter: track task-comm data Similar to the mmap data stream, add one that tracks the task COMM field, so that the userspace reporting knows what to call a task. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090408130409.127422406@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-08 19:05:47 +02:00
Ingo Molnar	f541ae326f	Merge branch 'linus' into perfcounters/core-v2 Merge reason: we have gathered quite a few conflicts, need to merge upstream Conflicts: arch/powerpc/kernel/Makefile arch/x86/ia32/ia32entry.S arch/x86/include/asm/hardirq.h arch/x86/include/asm/unistd_32.h arch/x86/include/asm/unistd_64.h arch/x86/kernel/cpu/common.c arch/x86/kernel/irq.c arch/x86/kernel/syscall_table_32.S arch/x86/mm/iomap_32.c include/linux/sched.h kernel/Makefile Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-06 09:02:57 +02:00
Al Viro	5ad4e53bd5	Get rid of indirect include of fs_struct.h Don't pull it in sched.h; very few files actually need it and those can include directly. sched.h itself only needs forward declaration of struct fs_struct; Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-31 23:00:27 -04:00
Al Viro	f1191b50ec	check_unsafe_exec() doesn't care about signal handlers sharing ... since we'll unshare sighand anyway Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-31 23:00:26 -04:00
Al Viro	498052bba5	New locking/refcounting for fs_struct * all changes of current->fs are done under task_lock and write_lock of old fs->lock * refcount is not atomic anymore (same protection) * its decrements are done when removing reference from current; at the same time we decide whether to free it. * put_fs_struct() is gone * new field - ->in_exec. Set by check_unsafe_exec() if we are trying to do execve() and only subthreads share fs_struct. Cleared when finishing exec (success and failure alike). Makes CLONE_FS fail with -EAGAIN if set. * check_unsafe_exec() may fail with -EAGAIN if another execve() from subthread is in progress. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-31 23:00:26 -04:00
Hugh Dickins	e426b64c41	fix setuid sometimes doesn't Joe Malicki reports that setuid sometimes doesn't: very rarely, a setuid root program does not get root euid; and, by the way, they have a health check running lsof every few minutes. Right, check_unsafe_exec() notes whether the files_struct is being shared by more threads than will get killed by the exec, and if so sets LSM_UNSAFE_SHARE to make bprm_set_creds() careful about euid. But /proc/<pid>/fd and /proc/<pid>/fdinfo lookups make transient use of get_files_struct(), which also raises that sharing count. There's a rather simple fix for this: exec's check on files->count has been redundant ever since 2.6.1 made it unshare_files() (except while compat_do_execve() omitted to do so) - just remove that check. [Note to -stable: this patch will not apply before 2.6.29: earlier releases should just remove the files->count line from unsafe_exec().] Reported-by: Joe Malicki <jmalicki@metacarta.com> Narrowed-down-by: Michael Itz <mitz@metacarta.com> Tested-by: Joe Malicki <jmalicki@metacarta.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-28 17:30:00 -07:00
James Morris	703a3cd728	Merge branch 'master' into next	2009-03-24 10:52:46 +11:00
Kentaro Takeda	f9ce1f1cda	Add in_execve flag into task_struct. This patch allows LSM modules to determine whether current process is in an execve operation or not so that they can behave differently while an execve operation is in progress. This patch is needed by TOMOYO. Please see another patch titled "LSM adapter functions." for backgrounds. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-12 15:15:03 +11:00
Ingo Molnar	95fd4845ed	Merge commit 'v2.6.29-rc4' into perfcounters/core Conflicts: arch/x86/kernel/setup_percpu.c arch/x86/mm/fault.c drivers/acpi/processor_idle.c kernel/irq/handle.c	2009-02-11 09:22:04 +01:00
David Howells	0bf2f3aec5	CRED: Fix SUID exec regression The patch: commit `a6f76f23d2` CRED: Make execve() take advantage of copy-on-write credentials moved the place in which the 'safeness' of a SUID/SGID exec was performed to before de_thread() was called. This means that LSM_UNSAFE_SHARE is now calculated incorrectly. This flag is set if any of the usage counts for fs_struct, files_struct and sighand_struct are greater than 1 at the time the determination is made. All of which are true for threads created by the pthread library. However, since we wish to make the security calculation before irrevocably damaging the process so that we can return it an error code in the case where we decide we want to reject the exec request on this basis, we have to make the determination before calling de_thread(). So, instead, we count up the number of threads (CLONE_THREAD) that are sharing our fs_struct (CLONE_FS), files_struct (CLONE_FILES) and sighand_structs (CLONE_SIGHAND/CLONE_THREAD) with us. These will be killed by de_thread() and so can be discounted by check_unsafe_exec(). We do have to be careful because CLONE_THREAD does not imply FS or FILES. We _assume_ that there will be no extra references to these structs held by the threads we're going to kill. This can be tested with the attached pair of programs. Build the two programs using the Makefile supplied, and run ./test1 as a non-root user. If successful, you should see something like: [dhowells@andromeda tmp]$ ./test1 --TEST1-- uid=4043, euid=4043 suid=4043 exec ./test2 --TEST2-- uid=4043, euid=0 suid=0 SUCCESS - Correct effective user ID and if unsuccessful, something like: [dhowells@andromeda tmp]$ ./test1 --TEST1-- uid=4043, euid=4043 suid=4043 exec ./test2 --TEST2-- uid=4043, euid=4043 suid=4043 ERROR - Incorrect effective user ID! The non-root user ID you see will depend on the user you run as. [test1.c] #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <pthread.h> static void thread_func(void arg) { while (1) {} } int main(int argc, char argv) { pthread_t tid; uid_t uid, euid, suid; printf("--TEST1--\n"); getresuid(&uid, &euid, &suid); printf("uid=%d, euid=%d suid=%d\n", uid, euid, suid); if (pthread_create(&tid, NULL, thread_func, NULL) < 0) { perror("pthread_create"); exit(1); } printf("exec ./test2\n"); execlp("./test2", "test2", NULL); perror("./test2"); _exit(1); } [test2.c] #include <stdio.h> #include <stdlib.h> #include <unistd.h> int main(int argc, char argv) { uid_t uid, euid, suid; getresuid(&uid, &euid, &suid); printf("--TEST2--\n"); printf("uid=%d, euid=%d suid=%d\n", uid, euid, suid); if (euid != 0) { fprintf(stderr, "ERROR - Incorrect effective user ID!\n"); exit(1); } printf("SUCCESS - Correct effective user ID\n"); exit(0); } [Makefile] CFLAGS = -D_GNU_SOURCE -Wall -Werror -Wunused all: test1 test2 test1: test1.c gcc $(CFLAGS) -o test1 test1.c -lpthread test2: test2.c gcc $(CFLAGS) -o test2 test2.c sudo chown root.root test2 sudo chmod +s test2 Reported-by: David Smith <dsmith@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: David Smith <dsmith@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-07 08:46:18 +11:00
James Morris	cb5629b10d	Merge branch 'master' into next Conflicts: fs/namei.c Manually merged per: diff --cc fs/namei.c index 734f2b5,bbc15c2..0000000 --- a/fs/namei.c +++ b/fs/namei.c @@@ -860,9 -848,8 +849,10 @@@ static int __link_path_walk(const char nd->flags \|= LOOKUP_CONTINUE; err = exec_permission_lite(inode); if (err == -EAGAIN) - err = vfs_permission(nd, MAY_EXEC); + err = inode_permission(nd->path.dentry->d_inode, + MAY_EXEC); + if (!err) + err = ima_path_check(&nd->path, MAY_EXEC); if (err) break; @@@ -1525,14 -1506,9 +1509,14 @@@ int may_open(struct path path, int acc flag &= ~O_TRUNC; } - error = vfs_permission(nd, acc_mode); + error = inode_permission(inode, acc_mode); if (error) return error; + - error = ima_path_check(&nd->path, ++ error = ima_path_check(path, + acc_mode & (MAY_READ \| MAY_WRITE \| MAY_EXEC)); + if (error) + return error; / * An append-only file must be opened in append mode for writing. */ Signed-off-by: James Morris <jmorris@namei.org>	2009-02-06 11:01:45 +11:00
Mimi Zohar	6146f0d5e4	integrity: IMA hooks This patch replaces the generic integrity hooks, for which IMA registered itself, with IMA integrity hooks in the appropriate places directly in the fs directory. Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-06 09:05:30 +11:00
Ingo Molnar	77835492ed	Merge commit 'v2.6.29-rc2' into perfcounters/core Conflicts: include/linux/syscalls.h	2009-01-21 16:37:27 +01:00
Heiko Carstens	1e7bfb2134	[CVE-2009-0029] System call wrappers part 27 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2009-01-14 14:15:29 +01:00
Ingo Molnar	506c10f26c	Merge commit 'v2.6.29-rc1' into perfcounters/core Conflicts: include/linux/kernel_stat.h	2009-01-11 02:42:53 +01:00
WANG Cong	8cd3ac3aca	fs/exec.c: make do_coredump() void No one cares do_coredump()'s return value, and also it seems that it is also not necessary. So make it void. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: WANG Cong <wangcong@zeuux.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-06 15:59:29 -08:00
Tetsuo Handa	350eaf791b	do_coredump(): check return from argv_split() do_coredump() accesses helper_argv[0] without checking helper_argv != NULL. This can happen if page allocation failed. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-06 15:59:14 -08:00
Luiz Fernando N. Capitulino	eaccbfa564	fs/exec.c:__bprm_mm_init(): clean up error handling Untangle the error unwinding in this function, saving a test of local variable `vma'. Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-06 15:59:11 -08:00
Eric Paris	6110e3abbf	sys_execve and sys_uselib do not call into fsnotify sys_execve and sys_uselib do not call into fsnotify so inotify does not get open events for these types of syscalls. This patch simply makes the requisite fsnotify calls. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-05 11:54:28 -05:00
Al Viro	3bfacef412	get rid of special-casing the /sbin/loader on alpha ... just make it a binfmt handler like #! one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-03 11:45:54 -08:00
Christoph Hellwig	cb23beb551	kill vfs_permission With all the nameidata removal there's no point anymore for this helper. Of the three callers left two will go away with the next lookup series anyway. Also add proper kerneldoc to inode_permission as this is the main permission check routine now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-12-31 18:07:41 -05:00
Linus Torvalds	bb758e9637	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hrtimers: fix warning in kernel/hrtimer.c x86: make sure we really have an hpet mapping before using it x86: enable HPET on Fujitsu u9200 linux/timex.h: cleanup for userspace posix-timers: simplify de_thread()->exit_itimers() path posix-timers: check ->it_signal instead of ->it_pid to validate the timer posix-timers: use "struct pid" instead of "struct task_struct" nohz: suppress needless timer reprogramming clocksource, acpi_pm.c: put acpi_pm_read_slow() under CONFIG_PCI nohz: no softirq pending warnings for offline cpus hrtimer: removing all ur callback modes, fix hrtimer: removing all ur callback modes, fix hotplug hrtimer: removing all ur callback modes x86: correct link to HPET timer specification rtc-cmos: export second NVRAM bank Fixed up conflicts in sound/drivers/pcsp/pcsp.c and sound/core/hrtimer.c manually.	2008-12-30 16:16:21 -08:00
Ingo Molnar	e1df957670	Merge branch 'linus' into perfcounters/core Conflicts: fs/exec.c include/linux/init_task.h Simple context conflicts.	2008-12-29 09:45:15 +01:00
James Morris	cbacc2c7f0	Merge branch 'next' into for-linus	2008-12-25 11:40:09 +11:00
Ingo Molnar	f65cb45cba	perfcounters: flush on setuid exec Pavel Machek pointed out that performance counters should be flushed when crossing protection domains on setuid execution. Reported-by: Pavel Machek <pavel@suse.cz> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-16 14:00:15 +01:00
Oleg Nesterov	8187926bda	posix-timers: simplify de_thread()->exit_itimers() path Impact: simplify code de_thread() postpones release_task(leader) until after exit_itimers(). This was needed because !SIGEV_THREAD_ID timers could use ->group_leader without get_task_struct(). With the recent changes we can release the leader earlier and simplify the code. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-12-12 17:01:03 +01:00
Roland McGrath	85f334666a	tracehook: exec double-reporting fix The patch `6341c39` "tracehook: exec" introduced a small regression in 2.6.27 regarding binfmt_misc exec event reporting. Since the reporting is now done in the common search_binary_handler() function, an exec of a misc binary will result in two (or possibly multiple) exec events being reported, instead of just a single one, because the misc handler contains a recursive call to search_binary_handler. To add to the confusion, if PTRACE_O_TRACEEXEC is not active, the multiple SIGTRAP signals will in fact cause only a single ptrace intercept, as the signals are not queued. However, if PTRACE_O_TRACEEXEC is on, the debugger will actually see multiple ptrace intercepts (PTRACE_EVENT_EXEC). The test program included below demonstrates the problem. This change fixes the bug by calling tracehook_report_exec() only in the outermost search_binary_handler() call (bprm->recursion_depth == 0). The additional change to restore bprm->recursion_depth after each binfmt load_binary call is actually superfluous for this bug, since we test the value saved on entry to search_binary_handler(). But it keeps the use of of the depth count to its most obvious expected meaning. Depending on what binfmt handlers do in certain cases, there could have been false-positive tests for recursion limits before this change. /* Test program using PTRACE_O_TRACEEXEC. This forks and exec's the first argument with the rest of the arguments, while ptrace'ing. It expects to see one PTRACE_EVENT_EXEC stop and then a successful exit, with no other signals or events in between. Test for kernel doing two PTRACE_EVENT_EXEC stops for a binfmt_misc exec: $ gcc -g traceexec.c -o traceexec $ sudo sh -c 'echo :test:M::foobar::/bin/cat: > /proc/sys/fs/binfmt_misc/register' $ echo 'foobar test' > ./foobar $ chmod +x ./foobar $ ./traceexec ./foobar; echo $? ==> good <== foobar test 0 $ ==> bad <== foobar test unexpected status 0x4057f != 0 3 $ / #include <stdio.h> #include <sys/types.h> #include <sys/wait.h> #include <sys/ptrace.h> #include <unistd.h> #include <signal.h> #include <stdlib.h> static void wait_for (pid_t child, int expect) { int status; pid_t p = wait (&status); if (p != child) { perror ("wait"); exit (2); } if (status != expect) { fprintf (stderr, "unexpected status %#x != %#x\n", status, expect); exit (3); } } int main (int argc, char argv) { pid_t child = fork (); if (child < 0) { perror ("fork"); return 127; } else if (child == 0) { ptrace (PTRACE_TRACEME); raise (SIGUSR1); execv (argv[1], &argv[1]); perror ("execve"); _exit (127); } wait_for (child, W_STOPCODE (SIGUSR1)); if (ptrace (PTRACE_SETOPTIONS, child, 0L, (void ) (long) PTRACE_O_TRACEEXEC) != 0) { perror ("PTRACE_SETOPTIONS"); return 4; } if (ptrace (PTRACE_CONT, child, 0L, 0L) != 0) { perror ("PTRACE_CONT"); return 5; } wait_for (child, W_STOPCODE (SIGTRAP \| (PTRACE_EVENT_EXEC << 8))); if (ptrace (PTRACE_CONT, child, 0L, 0L) != 0) { perror ("PTRACE_CONT"); return 6; } wait_for (child, W_EXITCODE (0, 0)); return 0; } Reported-by: Arnd Bergmann <arnd@arndb.de> CC: Ulrich Weigand <ulrich.weigand@de.ibm.com> Signed-off-by: Roland McGrath <roland@redhat.com>	2008-12-09 19:36:38 -08:00
David Howells	a6f76f23d2	CRED: Make execve() take advantage of copy-on-write credentials Make execve() take advantage of copy-on-write credentials, allowing it to set up the credentials in advance, and then commit the whole lot after the point of no return. This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). The credential bits from struct linux_binprm are, for the most part, replaced with a single credentials pointer (bprm->cred). This means that all the creds can be calculated in advance and then applied at the point of no return with no possibility of failure. I would like to replace bprm->cap_effective with: cap_isclear(bprm->cap_effective) but this seems impossible due to special behaviour for processes of pid 1 (they always retain their parent's capability masks where normally they'd be changed - see cap_bprm_set_creds()). The following sequence of events now happens: (a) At the start of do_execve, the current task's cred_exec_mutex is locked to prevent PTRACE_ATTACH from obsoleting the calculation of creds that we make. (a) prepare_exec_creds() is then called to make a copy of the current task's credentials and prepare it. This copy is then assigned to bprm->cred. This renders security_bprm_alloc() and security_bprm_free() unnecessary, and so they've been removed. (b) The determination of unsafe execution is now performed immediately after (a) rather than later on in the code. The result is stored in bprm->unsafe for future reference. (c) prepare_binprm() is called, possibly multiple times. (i) This applies the result of set[ug]id binaries to the new creds attached to bprm->cred. Personality bit clearance is recorded, but now deferred on the basis that the exec procedure may yet fail. (ii) This then calls the new security_bprm_set_creds(). This should calculate the new LSM and capability credentials into bprm->cred. This folds together security_bprm_set() and parts of security_bprm_apply_creds() (these two have been removed). Anything that might fail must be done at this point. (iii) bprm->cred_prepared is set to 1. bprm->cred_prepared is 0 on the first pass of the security calculations, and 1 on all subsequent passes. This allows SELinux in (ii) to base its calculations only on the initial script and not on the interpreter. (d) flush_old_exec() is called to commit the task to execution. This performs the following steps with regard to credentials: (i) Clear pdeath_signal and set dumpable on certain circumstances that may not be covered by commit_creds(). (ii) Clear any bits in current->personality that were deferred from (c.i). (e) install_exec_creds() [compute_creds() as was] is called to install the new credentials. This performs the following steps with regard to credentials: (i) Calls security_bprm_committing_creds() to apply any security requirements, such as flushing unauthorised files in SELinux, that must be done before the credentials are changed. This is made up of bits of security_bprm_apply_creds() and security_bprm_post_apply_creds(), both of which have been removed. This function is not allowed to fail; anything that might fail must have been done in (c.ii). (ii) Calls commit_creds() to apply the new credentials in a single assignment (more or less). Possibly pdeath_signal and dumpable should be part of struct creds. (iii) Unlocks the task's cred_replace_mutex, thus allowing PTRACE_ATTACH to take place. (iv) Clears The bprm->cred pointer as the credentials it was holding are now immutable. (v) Calls security_bprm_committed_creds() to apply any security alterations that must be done after the creds have been changed. SELinux uses this to flush signals and signal handlers. (f) If an error occurs before (d.i), bprm_free() will call abort_creds() to destroy the proposed new credentials and will then unlock cred_replace_mutex. No changes to the credentials will have been made. (2) LSM interface. A number of functions have been changed, added or removed: () security_bprm_alloc(), ->bprm_alloc_security() () security_bprm_free(), ->bprm_free_security() Removed in favour of preparing new credentials and modifying those. () security_bprm_apply_creds(), ->bprm_apply_creds() () security_bprm_post_apply_creds(), ->bprm_post_apply_creds() Removed; split between security_bprm_set_creds(), security_bprm_committing_creds() and security_bprm_committed_creds(). () security_bprm_set(), ->bprm_set_security() Removed; folded into security_bprm_set_creds(). () security_bprm_set_creds(), ->bprm_set_creds() New. The new credentials in bprm->creds should be checked and set up as appropriate. bprm->cred_prepared is 0 on the first call, 1 on the second and subsequent calls. () security_bprm_committing_creds(), ->bprm_committing_creds() (*) security_bprm_committed_creds(), ->bprm_committed_creds() New. Apply the security effects of the new credentials. This includes closing unauthorised files in SELinux. This function may not fail. When the former is called, the creds haven't yet been applied to the process; when the latter is called, they have. The former may access bprm->cred, the latter may not. (3) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) The bprm_security_struct struct has been removed in favour of using the credentials-under-construction approach. (c) flush_unauthorized_files() now takes a cred pointer and passes it on to inode_has_perm(), file_has_perm() and dentry_open(). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:24 +11:00
David Howells	d84f4f992c	CRED: Inaugurate COW credentials Inaugurate copy-on-write credentials management. This uses RCU to manage the credentials pointer in the task_struct with respect to accesses by other tasks. A process may only modify its own credentials, and so does not need locking to access or modify its own credentials. A mutex (cred_replace_mutex) is added to the task_struct to control the effect of PTRACE_ATTACHED on credential calculations, particularly with respect to execve(). With this patch, the contents of an active credentials struct may not be changed directly; rather a new set of credentials must be prepared, modified and committed using something like the following sequence of events: struct cred new = prepare_creds(); int ret = blah(new); if (ret < 0) { abort_creds(new); return ret; } return commit_creds(new); There are some exceptions to this rule: the keyrings pointed to by the active credentials may be instantiated - keyrings violate the COW rule as managing COW keyrings is tricky, given that it is possible for a task to directly alter the keys in a keyring in use by another task. To help enforce this, various pointers to sets of credentials, such as those in the task_struct, are declared const. The purpose of this is compile-time discouragement of altering credentials through those pointers. Once a set of credentials has been made public through one of these pointers, it may not be modified, except under special circumstances: (1) Its reference count may incremented and decremented. (2) The keyrings to which it points may be modified, but not replaced. The only safe way to modify anything else is to create a replacement and commit using the functions described in Documentation/credentials.txt (which will be added by a later patch). This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). This now prepares and commits credentials in various places in the security code rather than altering the current creds directly. (2) Temporary credential overrides. do_coredump() and sys_faccessat() now prepare their own credentials and temporarily override the ones currently on the acting thread, whilst preventing interference from other threads by holding cred_replace_mutex on the thread being dumped. This will be replaced in a future patch by something that hands down the credentials directly to the functions being called, rather than altering the task's objective credentials. (3) LSM interface. A number of functions have been changed, added or removed: () security_capset_check(), ->capset_check() () security_capset_set(), ->capset_set() Removed in favour of security_capset(). () security_capset(), ->capset() New. This is passed a pointer to the new creds, a pointer to the old creds and the proposed capability sets. It should fill in the new creds or return an error. All pointers, barring the pointer to the new creds, are now const. () security_bprm_apply_creds(), ->bprm_apply_creds() Changed; now returns a value, which will cause the process to be killed if it's an error. () security_task_alloc(), ->task_alloc_security() Removed in favour of security_prepare_creds(). () security_cred_free(), ->cred_free() New. Free security data attached to cred->security. () security_prepare_creds(), ->cred_prepare() New. Duplicate any security data attached to cred->security. () security_commit_creds(), ->cred_commit() New. Apply any security effects for the upcoming installation of new security by commit_creds(). () security_task_post_setuid(), ->task_post_setuid() Removed in favour of security_task_fix_setuid(). () security_task_fix_setuid(), ->task_fix_setuid() Fix up the proposed new credentials for setuid(). This is used by cap_set_fix_setuid() to implicitly adjust capabilities in line with setuid() changes. Changes are made to the new credentials, rather than the task itself as in security_task_post_setuid(). () security_task_reparent_to_init(), ->task_reparent_to_init() Removed. Instead the task being reparented to init is referred directly to init's credentials. NOTE! This results in the loss of some state: SELinux's osid no longer records the sid of the thread that forked it. () security_key_alloc(), ->key_alloc() () security_key_permission(), ->key_permission() Changed. These now take cred pointers rather than task pointers to refer to the security context. (4) sys_capset(). This has been simplified and uses less locking. The LSM functions it calls have been merged. (5) reparent_to_kthreadd(). This gives the current thread the same credentials as init by simply using commit_thread() to point that way. (6) __sigqueue_alloc() and switch_uid() __sigqueue_alloc() can't stop the target task from changing its creds beneath it, so this function gets a reference to the currently applicable user_struct which it then passes into the sigqueue struct it returns if successful. switch_uid() is now called from commit_creds(), and possibly should be folded into that. commit_creds() should take care of protecting __sigqueue_alloc(). (7) [sg]et[ug]id() and co and [sg]et_current_groups. The set functions now all use prepare_creds(), commit_creds() and abort_creds() to build and check a new set of credentials before applying it. security_task_set[ug]id() is called inside the prepared section. This guarantees that nothing else will affect the creds until we've finished. The calling of set_dumpable() has been moved into commit_creds(). Much of the functionality of set_user() has been moved into commit_creds(). The get functions all simply access the data directly. (8) security_task_prctl() and cap_task_prctl(). security_task_prctl() has been modified to return -ENOSYS if it doesn't want to handle a function, or otherwise return the return value directly rather than through an argument. Additionally, cap_task_prctl() now prepares a new set of credentials, even if it doesn't end up using it. (9) Keyrings. A number of changes have been made to the keyrings code: (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have all been dropped and built in to the credentials functions directly. They may want separating out again later. (b) key_alloc() and search_process_keyrings() now take a cred pointer rather than a task pointer to specify the security context. (c) copy_creds() gives a new thread within the same thread group a new thread keyring if its parent had one, otherwise it discards the thread keyring. (d) The authorisation key now points directly to the credentials to extend the search into rather pointing to the task that carries them. (e) Installing thread, process or session keyrings causes a new set of credentials to be created, even though it's not strictly necessary for process or session keyrings (they're shared). (10) Usermode helper. The usermode helper code now carries a cred struct pointer in its subprocess_info struct instead of a new session keyring pointer. This set of credentials is derived from init_cred and installed on the new process after it has been cloned. call_usermodehelper_setup() allocates the new credentials and call_usermodehelper_freeinfo() discards them if they haven't been used. A special cred function (prepare_usermodeinfo_creds()) is provided specifically for call_usermodehelper_setup() to call. call_usermodehelper_setkeys() adjusts the credentials to sport the supplied keyring as the new session keyring. (11) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) selinux_setprocattr() no longer does its check for whether the current ptracer can access processes with the new SID inside the lock that covers getting the ptracer's SID. Whilst this lock ensures that the check is done with the ptracer pinned, the result is only valid until the lock is released, so there's no point doing it inside the lock. (12) is_single_threaded(). This function has been extracted from selinux_setprocattr() and put into a file of its own in the lib/ directory as join_session_keyring() now wants to use it too. The code in SELinux just checked to see whether a task shared mm_structs with other tasks (CLONE_VM), but that isn't good enough. We really want to know if they're part of the same thread group (CLONE_THREAD). (13) nfsd. The NFS server daemon now has to use the COW credentials to set the credentials it is going to use. It really needs to pass the credentials down to the functions it calls, but it can't do that until other patches in this series have been applied. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:23 +11:00
David Howells	86a264abe5	CRED: Wrap current->cred and a few other accessors Wrap current->cred and a few other accessors to hide their actual implementation. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:18 +11:00
David Howells	b6dff3ec5e	CRED: Separate task security context from task_struct Separate the task security context from task_struct. At this point, the security data is temporarily embedded in the task_struct with two pointers pointing to it. Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in entry.S via asm-offsets. With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:16 +11:00
David Howells	da9592edeb	CRED: Wrap task credential accesses in the filesystem subsystem Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(\|e\|s\|fs)[ug]id to current_(\|e\|s\|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:05 +11:00
Oleg Nesterov	6409324b38	coredump: format_corename: don't append .%pid if multi-threaded If the coredumping is multi-threaded, format_corename() appends .%pid to the corename. This was needed before the proper multi-thread core dump support, now all the threads in the mm go into a single unified core file. Remove this special case, it is not even documented and we have "%p" and core_uses_pid. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Roland McGrath <roland@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: La Monte Yarroll <piggy@laurelnetworks.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:39 -07:00
Linus Torvalds	c8d8a2321f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: module: remove CONFIG_KMOD in comment after #endif remove CONFIG_KMOD from fs remove CONFIG_KMOD from drivers Manually fix conflict due to include cleanups in drivers/md/md.c	2008-10-16 12:38:34 -07:00
Oleg Nesterov	07edbde508	pid_ns: de_thread: kill the now unneeded ->child_reaper change de_thread() checks if the old leader was the ->child_reaper, this is not possible any longer. With the previous patch ->group_leader itself will change ->child_reaper on exit. Henceforth find_new_reaper() is the only function (apart from initialization) which plays with ->child_reaper. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-16 11:21:47 -07:00
Kirill A. Shutemov	53112488be	alpha: introduce field 'taso' into struct linux_binprm This change is Alpha-specific. It adds field 'taso' into struct linux_binprm to remember if the application is TASO. Previously, field sh_bang was used for this purpose. Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-16 11:21:38 -07:00

1 2 3 4

200 commits