linux/fs
Daisuke HATAYAMA 05f47fda9f coredump: unify dump_seek() implementations for each binfmt_*.c
The current ELF dumper can produce broken corefiles if program headers
exceed 65535.  In particular, the program in 64-bit environment often
demands more than 65535 mmaps.  If you google max_map_count, then you can
find many users facing this problem.

Solaris has already dealt with this issue, and other OSes have also
adopted the same method as in Solaris.  Currently, Sun's document and AMD
64 ABI include the description for the extension, where they call the
extension Extended Numbering.  See Reference for further information.

I believe that linux kernel should adopt the same way as they did, so I've
written this patch.

I am also preparing for patches of GDB and binutils.

How to fix
==========

In new dumping process, there are two cases according to weather or
not the number of program headers is equal to or more than 65535.

 - if less than 65535, the produced corefile format is exactly the same
   as the ordinary one.

 - if equal to or more than 65535, then e_phnum field is set to newly
   introduced constant PN_XNUM(0xffff) and the actual number of program
   headers is set to sh_info field of the section header at index 0.

Compatibility Concern
=====================

 * As already mentioned in Summary, Sun and AMD64 has already adopted
   this.  See Reference.

 * There are four combinations according to whether kernel and userland
   tools are respectively modified or not.  The next table summarizes
   shortly for each combination.

                  ---------------------------------------------
                     Original Kernel    |   Modified Kernel
                  ---------------------------------------------
    	            < 65535  | >= 65535 | < 65535  | >= 65535
  -------------------------------------------------------------
   Original Tools |    OK    |  broken  |   OK     | broken (#)
  -------------------------------------------------------------
   Modified Tools |    OK    |  broken  |   OK     |    OK
  -------------------------------------------------------------

  Note that there is no case that `OK' changes to `broken'.

  (#) Although this case remains broken, O-M behaves better than
  O-O. That is, while in O-O case e_phnum field would be extremely
  small due to integer overflow, in O-M case it is guaranteed to be at
  least 65535 by being set to PN_XNUM(0xFFFF), much closer to the
  actual correct value than the O-O case.

Test Program
============

Here is a test program mkmmaps.c that is useful to produce the
corefile with many mmaps. To use this, please take the following
steps:

$ ulimit -c unlimited
$ sysctl vm.max_map_count=70000 # default 65530 is too small
$ sysctl fs.file-max=70000
$ mkmmaps 65535

Then, the program will abort and a corefile will be generated.

If failed, there are two cases according to the error message
displayed.

 * ``out of memory'' means vm.max_map_count is still smaller

 * ``too many open files'' means fs.file-max is still smaller

So, please change it to a larger value, and then retry it.

mkmmaps.c
==
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
int main(int argc, char **argv)
{
	int maps_num;
	if (argc < 2) {
		fprintf(stderr, "mkmmaps [number of maps to be created]\n");
		exit(1);
	}
	if (sscanf(argv[1], "%d", &maps_num) == EOF) {
		perror("sscanf");
		exit(2);
	}
	if (maps_num < 0) {
		fprintf(stderr, "%d is invalid\n", maps_num);
		exit(3);
	}
	for (; maps_num > 0; --maps_num) {
		if (MAP_FAILED == mmap((void *)NULL, (size_t) 1, PROT_READ,
					MAP_SHARED | MAP_ANONYMOUS, (int) -1,
					(off_t) NULL)) {
			perror("mmap");
			exit(4);
		}
	}
	abort();
	{
		char buffer[128];
		sprintf(buffer, "wc -l /proc/%u/maps", getpid());
		system(buffer);
	}
	return 0;
}

Tested on i386, ia64 and um/sys-i386.
Built on sh4 (which covers fs/binfmt_elf_fdpic.c)

References
==========

 - Sun microsystems: Linker and Libraries.
   Part No: 817-1984-17, September 2008.
   URL: http://docs.sun.com/app/docs/doc/817-1984

 - System V ABI AMD64 Architecture Processor Supplement
   Draft Version 0.99., May 11, 2009.
   URL: http://www.x86-64.org/

This patch:

There are three different definitions for dump_seek() functions in
binfmt_aout.c, binfmt_elf.c and binfmt_elf_fdpic.c, respectively.  The
only for binfmt_elf.c.

My next patch will move dump_seek() into a header file in order to share
the same implementations for dump_write() and dump_seek().  As the first
step, this patch unify these three definitions for dump_seek() by applying
the past commits that have been applied only for binfmt_elf.c.

Specifically, the modification made here is part of the following commits:

  * d025c9db7f
  * 7f14daa19e

This patch does not change a shape of corefiles.

Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:45 -08:00
..
9p fs/9p: Add hardlink support to .u extension 2010-03-05 15:04:42 -06:00
adfs pass writeback_control to ->write_inode 2010-03-05 13:25:52 -05:00
affs pass writeback_control to ->write_inode 2010-03-05 13:25:52 -05:00
afs make sure data is on disk before calling ->write_inode 2010-03-05 13:25:10 -05:00
autofs trivial: remove unnecessary semicolons 2009-09-21 15:14:58 +02:00
autofs4 Use kill_litter_super() in autofs4 ->kill_sb() 2010-03-03 14:07:54 -05:00
befs befs: fix leak 2010-02-07 03:06:21 -05:00
bfs pass writeback_control to ->write_inode 2010-03-05 13:25:52 -05:00
btrfs pass writeback_control to ->write_inode 2010-03-05 13:25:52 -05:00
cachefiles CacheFiles: Fix a race in cachefiles_delete_object() vs rename 2010-02-20 10:06:35 -05:00
cifs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2010-03-04 08:15:33 -08:00
coda sysctl: Drop & in front of every proc_handler. 2009-11-18 08:37:40 -08:00
configfs Fix configfs leak 2010-01-14 09:05:42 -05:00
cramfs
debugfs Lose the new_name argument of fsnotify_move() 2010-02-08 14:38:36 -05:00
devpts devpts_get_tty() should validate inode 2009-12-11 15:18:05 -08:00
dlm dlm: use bastmode in debugfs output 2010-02-26 12:15:54 -06:00
ecryptfs ecryptfs: use after free 2010-01-19 22:36:06 -06:00
efs
exofs pass writeback_control to ->write_inode 2010-03-05 13:25:52 -05:00
exportfs nfs: new subdir Documentation/filesystems/nfs 2009-10-27 19:34:04 -04:00
ext2 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 2010-03-05 13:20:53 -08:00
ext3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 2010-03-05 13:20:53 -08:00
ext4 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 2010-03-05 13:20:53 -08:00
fat pass writeback_control to ->write_inode 2010-03-05 13:25:52 -05:00
freevxfs
fscache FS-Cache: Avoid maybe-used-uninitialised warning on variable 2009-12-16 07:20:13 -08:00
fuse Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse 2010-03-03 08:08:21 -08:00
gfs2 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 2010-03-05 13:20:53 -08:00
hfs pass writeback_control to ->write_inode 2010-03-05 13:25:52 -05:00
hfsplus pass writeback_control to ->write_inode 2010-03-05 13:25:52 -05:00
hostfs
hpfs Don't mess with generic_permission() under ->d_lock in hpfs 2010-03-03 14:07:58 -05:00
hppfs hppfs can use existing proc_mnt, no need for do_kern_mount() in there 2010-03-03 14:08:00 -05:00
hugetlbfs Untangling ima mess, part 1: alloc_file() 2009-12-16 12:16:47 -05:00
isofs Merge branch 'for-2.6.33' of git://linux-nfs.org/~bfields/linux 2009-12-16 10:43:34 -08:00
jbd jbd: Delay discarding buffers in journal_unmap_buffer 2010-03-05 00:20:26 +01:00
jbd2 jbd2: clean up an assertion in jbd2_journal_commit_transaction() 2010-02-24 12:11:20 -05:00
jffs2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2009-12-16 12:04:02 -08:00
jfs Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 2010-03-05 13:20:53 -08:00
lockd Merge branch 'for-2.6.33' of git://linux-nfs.org/~bfields/linux 2009-12-16 10:43:34 -08:00
minix pass writeback_control to ->write_inode 2010-03-05 13:25:52 -05:00
ncpfs tree-wide: fix assorted typos all over the place 2009-12-04 15:39:55 +01:00
nfs Merge branch 'writeback-for-2.6.34' into nfs-for-2.6.34 2010-03-05 15:46:18 -05:00
nfs_common
nfsd vfs: take f_lock on modifying f_mode after open time 2010-03-06 11:26:25 -08:00
nilfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2010-03-04 08:15:33 -08:00
nls Merge git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6 2009-09-30 09:31:14 -07:00
notify switch inotify_user to anon_inode 2010-02-19 03:35:12 -05:00
ntfs pass writeback_control to ->write_inode 2010-03-05 13:25:52 -05:00
ocfs2 bitops: rename for_each_bit() to for_each_set_bit() 2010-03-06 11:26:23 -08:00
omfs pass writeback_control to ->write_inode 2010-03-05 13:25:52 -05:00
openpromfs
partitions block: Stop using byte offsets 2010-01-11 14:30:09 +01:00
proc proc: warn on non-existing proc entries 2010-03-06 11:26:45 -08:00
qnx4 qnx4: use hweight8 2009-12-16 07:20:18 -08:00
quota quota: stop using QUOTA_OK / NO_QUOTA 2010-03-05 00:20:31 +01:00
ramfs nommu: fix shared mmap after truncate shrinkage problems 2010-01-16 12:15:40 -08:00
reiserfs Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 2010-03-05 13:20:53 -08:00
romfs fix leak in romfs_fill_super() 2010-01-26 22:22:26 -05:00
smbfs fs: Make unload_nls() NULL pointer safe 2009-09-24 07:47:42 -04:00
squashfs Squashfs: get rid of obsolete definition in header file 2010-03-05 15:35:35 +00:00
sysfs sysfs: sysfs_sd_setattr set iattrs unconditionally 2010-02-16 15:42:42 -08:00
sysv pass writeback_control to ->write_inode 2010-03-05 13:25:52 -05:00
ubifs lib: build list_sort() only if needed 2010-03-06 11:26:35 -08:00
udf Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 2010-03-05 13:20:53 -08:00
ufs Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 2010-03-05 13:20:53 -08:00
xfs Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 2010-03-05 13:20:53 -08:00
Kconfig Revert "task_struct: make journal_info conditional" 2009-12-17 13:23:24 -08:00
Kconfig.binfmt
Makefile
aio.c aio: remove unused field 2009-12-16 07:20:13 -08:00
anon_inodes.c Sanitize f_flags helpers 2009-12-22 12:27:34 -05:00
attr.c fs: use rlimit helpers 2010-03-06 11:26:29 -08:00
bad_inode.c
binfmt_aout.c coredump: unify dump_seek() implementations for each binfmt_*.c 2010-03-06 11:26:45 -08:00
binfmt_elf.c Split 'flush_old_exec' into two functions 2010-01-29 08:22:01 -08:00
binfmt_elf_fdpic.c coredump: unify dump_seek() implementations for each binfmt_*.c 2010-03-06 11:26:45 -08:00
binfmt_em86.c
binfmt_flat.c fs: use rlimit helpers 2010-03-06 11:26:29 -08:00
binfmt_misc.c
binfmt_script.c
binfmt_som.c Split 'flush_old_exec' into two functions 2010-01-29 08:22:01 -08:00
bio-integrity.c block: fix bugs in bio-integrity mempool usage 2010-01-30 20:28:19 +01:00
bio.c Revert "blkdev: fix merge_bvec_fn return value checks" 2010-03-02 19:17:34 +01:00
block_dev.c freeze_bdev: don't deactivate successfully frozen MS_RDONLY sb 2010-02-07 03:06:21 -05:00
buffer.c Merge branch 'writeback' of git://git.kernel.dk/linux-2.6-block 2009-09-25 09:27:30 -07:00
char_dev.c fs/char_dev.c: remove useless loop 2009-09-24 07:21:03 -07:00
compat.c compat.c: Remove dependence on nfsd private headers 2009-12-14 18:12:10 -05:00
compat_binfmt_elf.c
compat_ioctl.c fs/compat_ioctl.c: suppress two warnings 2010-03-06 11:26:35 -08:00
dcache.c fix race in d_splice_alias() 2010-03-03 14:13:08 -05:00
dcookies.c
direct-io.c dio: fix use-after-free 2009-12-17 04:52:13 -05:00
drop_caches.c sysctl: remove "struct file *" argument of ->proc_handler 2009-09-24 07:21:04 -07:00
eventfd.c eventfd - allow atomic read and waitqueue remove 2010-01-25 12:26:38 -02:00
eventpoll.c anonfd: Allow making anon files read-only 2009-12-22 12:27:34 -05:00
exec.c exec: create initial stack independent of PAGE_SIZE 2010-03-06 11:26:33 -08:00
fcntl.c fs: use rlimit helpers 2010-03-06 11:26:29 -08:00
fifo.c
file.c fs: use rlimit helpers 2010-03-06 11:26:29 -08:00
file_table.c vfs: take f_lock on modifying f_mode after open time 2010-03-06 11:26:25 -08:00
filesystems.c
fs-writeback.c pass writeback_control to ->write_inode 2010-03-05 13:25:52 -05:00
fs_struct.c
generic_acl.c make generic_acl slightly more generic 2009-12-16 12:16:49 -05:00
inode.c dquot: move dquot initialization responsibility into the filesystem 2010-03-05 00:20:30 +01:00
internal.h Take vfsmount_lock to fs/internal.h 2010-03-03 14:07:59 -05:00
ioctl.c __generic_block_fiemap(): fix for files bigger than 4GB 2009-11-12 07:26:01 -08:00
ioprio.c
libfs.c libfs: Unexport and kill simple_prepare_write 2010-03-03 13:00:17 -05:00
locks.c Switch may_open() and break_lease() to passing O_... 2010-03-03 13:00:21 -05:00
mbcache.c
mpage.c
namei.c Fix a dumb typo - use of & instead of && 2010-03-06 10:54:48 -08:00
namespace.c vfs: add NOFOLLOW flag to umount(2) 2010-03-03 14:08:00 -05:00
nfsctl.c Switch may_open() and break_lease() to passing O_... 2010-03-03 13:00:21 -05:00
no-block.c
open.c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 2010-03-05 13:20:53 -08:00
pipe.c fs: no games with DCACHE_UNHASHED 2009-12-17 10:51:40 -05:00
pnode.c Kill CL_PROPAGATION, sanitize fs/pnode.c:get_source() 2010-03-03 13:00:22 -05:00
pnode.h VFS: Clean up shared mount flag propagation 2010-03-03 14:07:55 -05:00
posix_acl.c
read_write.c sendfile(): check f_op.splice_write() rather than f_op.sendpage() 2009-11-04 09:09:52 +01:00
read_write.h
readdir.c
select.c fs: use rlimit helpers 2010-03-06 11:26:29 -08:00
seq_file.c seq_file: add RCU versions of new hlist/list iterators (v3) 2010-02-22 15:45:54 -08:00
signalfd.c anonfd: Allow making anon files read-only 2009-12-22 12:27:34 -05:00
splice.c sendfile(): check f_op.splice_write() rather than f_op.sendpage() 2009-11-04 09:09:52 +01:00
stack.c VFS/fsstack: handle 32-bit smp + preempt + large files in fsstack_copy_inode_size 2009-12-17 10:58:17 -05:00
stat.c Add unlocked version of inode_add_bytes() function 2009-12-23 13:33:54 +01:00
super.c Mirror MS_KERNMOUNT in ->mnt_flags 2010-03-03 14:08:00 -05:00
sync.c quota: move code from sync_quota_sb into vfs_quota_sync 2010-03-05 00:20:24 +01:00
timerfd.c anonfd: Allow making anon files read-only 2009-12-22 12:27:34 -05:00
utimes.c
xattr.c sanitize xattr handler prototypes 2009-12-16 12:16:49 -05:00
xattr_acl.c VFS: Use GFP_NOFS in posix_acl_from_xattr() 2009-12-03 11:48:07 +00:00