linux

q3k/linux

Author	SHA1	Message	Date
Linus Torvalds	2240a7bb47	tytso-for-linus-20111214 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iQIcBAABCAAGBQJO6PXBAAoJENNvdpvBGATwpuEP/2RCxmdWYZ8/6Z6pmTh3hHN5 fx6HckTdvLQOvbQs72wzVW0JKyc25QmW2mQc5z3MjSymjf/RbEKihPUITRNbHrTD T2sP/lWu09AKLioEg4ucAKn/A7Do3UDIkXTszvVVP/t2psVPzLeJ1njQKra14Nyz o0+gSlnwuGx9WaxfR+7MYNs2ikdSkXIeYsiFAOY4YOxwwC99J/lZ0YaNkbI7UBtC yu2XLIvPboa5JZXANq2G3VhVIETMmOyRTCC76OAXjqkdp9nLFWDG0ydqQh0vVZwL xQGOmAj+l3BNTE0QmMni1w7A0SBU3N6xBA5HN6Y49RlbsMYG27aN54Fy5K2R41I3 QXVhBL53VD6b0KaITcoz7jIGIy6qk9Wx+2WcCYtQBSIjL2YwlaJq0PL07+vRamex sqHGDejcNY87i6AV0DP6SNuCFCi9xFYoAoMi9Wu5E9+T+Vck0okFzW/luk/FvsSP YA5Dh+vISyBeCnWQvcnBmsUQyf8d9MaNnejZ48ath+GiiMfY8USAZ29RAG4VuRtS 9DAyTTIBA73dKpnvEV9u4i8Lwd8hRVMOnPyOO785NwEXk3Ng08pPSSbMklW6UfCY 4nr5UNB13ZPbXx4uoAvATMpCpYxMaLEdxmeMvgXpkekl0hHBzpVDey1Vu9fb/a5n dQpo6WWG9HIJ23hOGAGR =n3Lm -----END PGP SIGNATURE----- Merge tag 'tytso-for-linus-20111214' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * tag 'tytso-for-linus-20111214' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: handle EOF correctly in ext4_bio_write_page() ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized ext4: correctly handle pages w/o buffers in ext4_discard_partial_buffers() ext4: avoid potential hang in mpage_submit_io() when blocksize < pagesize ext4: avoid hangs in ext4_da_should_update_i_disksize() ext4: display the correct mount option in /proc/mounts for [no]init_itable ext4: Fix crash due to getting bogus eh_depth value on big-endian systems ext4: fix ext4_end_io_dio() racing against fsync() .. using the new signed tag merge of git that now verifies the gpg signature automatically. Yay. The branchname was just 'dev', which is prettier. I'll tell Ted to use nicer tag names for future cases.	2011-12-14 18:25:58 -08:00
Linus Torvalds	30aaca4582	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: llseek fix race fuse: fix llseek bug fuse: fix fuse_retrieve	2011-12-14 18:23:35 -08:00
Linus Torvalds	ddb360778a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs/ncpfs: fix error paths and goto statements in ncp_fill_super() configfs: register_filesystem() called too early fuse: register_filesystem() called too early ubifs: too early register_filesystem() ... and the same kind of leak for mqueue procfs: fix a vfsmount longterm reference leak	2011-12-14 18:22:55 -08:00
Djalal Harouni	759c361eb9	fs/ncpfs: fix error paths and goto statements in ncp_fill_super() The label 'out_bdi' should be followed by bdi_destroy() instead of fput() which should be after the 'out_fput' label. If bdi_setup_and_register() fails then jump to the 'out_fput' label instead of the 'out_bdi' one. If fget(data.info_fd) fails then jump to the previously fixed 'out_bdi' label to call bdi_destroy() otherwise the bdi object will not be destroyed. Compile tested only. Signed-off-by: Djalal Harouni <tixxdz@opendz.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-12-14 00:45:33 -05:00
Yongqiang Yang	5a0dc7365c	ext4: handle EOF correctly in ext4_bio_write_page() We need to zero out part of a page which beyond EOF before setting uptodate, otherwise, mapread or write will see non-zero data beyond EOF. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2011-12-13 22:29:12 -05:00
Yongqiang Yang	5b5ffa49d4	ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized If a file is fallocated on a hole, map->m_lblk + map->m_len may be greater than ee_block + ee_len. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2011-12-13 22:13:42 -05:00
Yongqiang Yang	093e6e3666	ext4: correctly handle pages w/o buffers in ext4_discard_partial_buffers() If a page has been read into memory and never been written, it has no buffers, but we should handle the page in truncate or punch hole. VFS code of writing operations has handled holes correctly, so this patch removes the code handling holes in writing operations. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2011-12-13 22:05:05 -05:00
Yongqiang Yang	13a79a4741	ext4: avoid potential hang in mpage_submit_io() when blocksize < pagesize If there is an unwritten but clean buffer in a page and there is a dirty buffer after the buffer, then mpage_submit_io does not write the dirty buffer out. As a result, da_writepages loops forever. This patch fixes the problem by checking dirty flag. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2011-12-13 21:51:55 -05:00
Andrea Arcangeli	ea51d132db	ext4: avoid hangs in ext4_da_should_update_i_disksize() If the pte mapping in generic_perform_write() is unmapped between iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the "copied" parameter to ->end_write can be zero. ext4 couldn't cope with it with delayed allocations enabled. This skips the i_disksize enlargement logic if copied is zero and no new data was appeneded to the inode. gdb> bt #0 0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\ 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467 #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 #2 0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\ ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440 #3 generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\ os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482 #4 0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\ xffff88001e26be40) at mm/filemap.c:2600 #5 0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\ zed out>, pos=<value optimized out>) at mm/filemap.c:2632 #6 0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\ t fs/ext4/file.c:136 #7 0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \ ppos=0xffff88001e26bf48) at fs/read_write.c:406 #8 0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\ 000, pos=0xffff88001e26bf48) at fs/read_write.c:435 #9 0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\ 4000) at fs/read_write.c:487 #10 <signal handler called> #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ () #12 0x0000000000000000 in ?? () gdb> print offset $22 = 0xffffffffffffffff gdb> print idx $23 = 0xffffffff gdb> print inode->i_blkbits $24 = 0xc gdb> up #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 2512 if (ext4_da_should_update_i_disksize(page, end)) { gdb> print start $25 = 0x0 gdb> print end $26 = 0xffffffffffffffff gdb> print pos $27 = 0x108000 gdb> print new_i_size $28 = 0x108000 gdb> print ((struct ext4_inode_info )((char )inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize $29 = 0xd9000 gdb> down 2467 for (i = 0; i < idx; i++) gdb> print i $30 = 0xd44acbee This is 100% reproducible with some autonuma development code tuned in a very aggressive manner (not normal way even for knumad) which does "exotic" changes to the ptes. It wouldn't normally trigger but I don't see why it can't happen normally if the page is added to swap cache in between the two faults leading to "copied" being zero (which then hangs in ext4). So it should be fixed. Especially possible with lumpy reclaim (albeit disabled if compaction is enabled) as that would ignore the young bits in the ptes. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2011-12-13 21:41:15 -05:00
Linus Torvalds	653f42f6b6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: add missing spin_unlock at ceph_mdsc_build_path() ceph: fix SEEK_CUR, SEEK_SET regression crush: fix mapping calculation when force argument doesn't exist ceph: use i_ceph_lock instead of i_lock rbd: remove buggy rollback functionality rbd: return an error when an invalid header is read ceph: fix rasize reporting by ceph_show_options	2011-12-13 14:59:42 -08:00
Linus Torvalds	4dde6dedad	Merge branch 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux * 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: set max_pause to lowest value on zero bdi_dirty writeback: permit through good bdi even when global dirty exceeded writeback: comment on the bdi dirty threshold fs: Make write(2) interruptible by a fatal signal writeback: Fix issue on make htmldocs	2011-12-13 14:58:56 -08:00
Yehuda Sadeh	9d5a09e659	ceph: add missing spin_unlock at ceph_mdsc_build_path() one of the paths was missing spin_unlock Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2011-12-13 11:59:53 -08:00
Al Viro	7c6455e368	configfs: register_filesystem() called too early Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-12-13 12:35:15 -05:00
Al Viro	988f032567	fuse: register_filesystem() called too early same story as with ubifs Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-12-13 12:35:14 -05:00
Al Viro	5cc361e3b8	ubifs: too early register_filesystem() doing that before you are ready to handle mount() is a Bad Idea(tm)... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-12-13 12:35:13 -05:00
Sage Weil	6a82c47aa8	ceph: fix SEEK_CUR, SEEK_SET regression Commit `06222e491e` got the if wrong so that it always evaluates as true. This is semantically harmless, but makes SEEK_CUR and SEEK_SET needlessly query the server. Rewrite the if to explicitly enumerate the cases we DO need a valid i_size to make this code less fragile. Reported-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-12-13 09:19:26 -08:00
Miklos Szeredi	73104b6e37	fuse: llseek fix race Fix race between lseek(fd, 0, SEEK_CUR) and read/write. This was fixed in generic code by commit `5b6f1eb97d` (vfs: lseek(fd, 0, SEEK_CUR) race condition). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2011-12-13 11:40:59 +01:00
Roel Kluin	b48c6af208	fuse: fix llseek bug The test in fuse_file_llseek() "not SEEK_CUR or not SEEK_SET" always evaluates to true. This was introduced in 3.1 by commit `06222e49` (fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek) and changed the behavior of SEEK_CUR and SEEK_SET to always retrieve the file attributes. This is a performance regression. Fix the test so that it makes sense. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@vger.kernel.org CC: Josef Bacik <josef@redhat.com> CC: Al Viro <viro@zeniv.linux.org.uk>	2011-12-13 10:37:00 +01:00
Miklos Szeredi	48706d0a91	fuse: fix fuse_retrieve Fix two bugs in fuse_retrieve(): - retrieving more than one page would yield repeated instances of the first page - if more than FUSE_MAX_PAGES_PER_REQ pages were requested than the request page array would overflow fuse_retrieve() was added in 2.6.36 and these bugs had been there since the beginning. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@vger.kernel.org	2011-12-13 10:36:59 +01:00
Theodore Ts'o	fc6cb1cda5	ext4: display the correct mount option in /proc/mounts for [no]init_itable /proc/mounts was showing the mount option [no]init_inode_table when the correct mount option that will be accepted by parse_options() is [no]init_itable. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2011-12-12 22:06:18 -05:00
Paul Mackerras	b4611abfa9	ext4: Fix crash due to getting bogus eh_depth value on big-endian systems Commit `1939dd84b3` ("ext4: cleanup ext4_ext_grow_indepth code") added a reference to ext4_extent_header.eh_depth, but forget to pass the value read through le16_to_cpu. The result is a crash on big-endian machines, such as this crash on a POWER7 server: attempt to access beyond end of device sda8: rw=0, want=776392648163376, limit=168558560 Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6bcb Faulting instruction address: 0xc0000000001f5f38 cpu 0x14: Vector: 300 (Data Access) at [c000001bd1aaecf0] pc: c0000000001f5f38: .__brelse+0x18/0x60 lr: c0000000002e07a4: .ext4_ext_drop_refs+0x44/0x80 sp: c000001bd1aaef70 msr: 9000000000009032 dar: 6b6b6b6b6b6b6bcb dsisr: 40000000 current = 0xc000001bd15b8010 paca = 0xc00000000ffe4600 pid = 19911, comm = flush-8:0 enter ? for help [c000001bd1aaeff0] c0000000002e07a4 .ext4_ext_drop_refs+0x44/0x80 [c000001bd1aaf090] c0000000002e0c58 .ext4_ext_find_extent+0x408/0x4c0 [c000001bd1aaf180] c0000000002e145c .ext4_ext_insert_extent+0x2bc/0x14c0 [c000001bd1aaf2c0] c0000000002e3fb8 .ext4_ext_map_blocks+0x628/0x1710 [c000001bd1aaf420] c0000000002b2974 .ext4_map_blocks+0x224/0x310 [c000001bd1aaf4d0] c0000000002b7f2c .mpage_da_map_and_submit+0xbc/0x490 [c000001bd1aaf5a0] c0000000002b8688 .write_cache_pages_da+0x2c8/0x430 [c000001bd1aaf720] c0000000002b8b28 .ext4_da_writepages+0x338/0x670 [c000001bd1aaf8d0] c000000000157280 .do_writepages+0x40/0x90 [c000001bd1aaf940] c0000000001ea830 .writeback_single_inode+0xe0/0x530 [c000001bd1aafa00] c0000000001eb680 .writeback_sb_inodes+0x210/0x300 [c000001bd1aafb20] c0000000001ebc84 .__writeback_inodes_wb+0xd4/0x140 [c000001bd1aafbe0] c0000000001ebfec .wb_writeback+0x2fc/0x3e0 [c000001bd1aafce0] c0000000001ed770 .wb_do_writeback+0x2f0/0x300 [c000001bd1aafdf0] c0000000001ed848 .bdi_writeback_thread+0xc8/0x340 [c000001bd1aafed0] c0000000000c5494 .kthread+0xb4/0xc0 [c000001bd1aaff90] c000000000021f48 .kernel_thread+0x54/0x70 This is due to getting ext_depth(inode) == 0x101 and therefore running off the end of the path array in ext4_ext_drop_refs into following unallocated structures. This fixes it by adding the necessary le16_to_cpu. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-12-12 11:00:56 -05:00
Theodore Ts'o	b5a7e97039	ext4: fix ext4_end_io_dio() racing against fsync() We need to make sure iocb->private is cleared before we put the io_end structure on i_completed_io_list. Otherwise fsync() could potentially run on another CPU and free the iocb structure out from under us. Reported-by: Kent Overstreet <koverstreet@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2011-12-12 10:53:02 -05:00
Linus Torvalds	8def5f51b0	Merge git://git.samba.org/sfrench/cifs-2.6 * git://git.samba.org/sfrench/cifs-2.6: cifs: check for NULL last_entry before calling cifs_save_resume_key cifs: attempt to freeze while looping on a receive attempt cifs: Fix sparse warning when calling cifs_strtoUCS CIFS: Add descriptions to the brlock cache functions	2011-12-09 14:45:44 -08:00
Michal Hocko	2a95ea6c0d	procfs: do not overflow get_{idle,iowait}_time for nohz Since commit `a25cac5198` ("proc: Consider NO_HZ when printing idle and iowait times") we are reporting idle/io_wait time also while a CPU is tickless. We rely on get_{idle,iowait}_time functions to retrieve proper data. These functions, however, use usecs_to_cputime to translate micro seconds time to cputime64_t. This is just an alias to usecs_to_jiffies which reduces the data type from u64 to unsigned int and also checks whether the given parameter overflows jiffies_to_usecs(MAX_JIFFY_OFFSET) and returns MAX_JIFFY_OFFSET in that case. When we overflow depends on CONFIG_HZ but especially for CONFIG_HZ_300 it is quite low (1431649781) so we are getting MAX_JIFFY_OFFSET for >3000s! until we overflow unsigned int. Just for reference CONFIG_HZ_100 has an overflow window around 20s, CONFIG_HZ_250 ~8s and CONFIG_HZ_1000 ~2s. This results in a bug when people saw [h]top going mad reporting 100% CPU usage even though there was basically no CPU load. The reason was simply that /proc/stat stopped reporting idle/io_wait changes (and reported MAX_JIFFY_OFFSET) and so the only change happening was for user system time. Let's use nsecs_to_jiffies64 instead which doesn't reduce the precision to 32b type and it is much more appropriate for cumulative time values (unlike usecs_to_jiffies which intended for timeout calculations). Signed-off-by: Michal Hocko <mhocko@suse.cz> Tested-by: Artem S. Tashkinov <t.artem@mailcity.com> Cc: Dave Jones <davej@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-12-09 07:50:29 -08:00
Claudio Scordino	b53fc7c297	fs/proc/meminfo.c: fix compilation error Fix the error message "directives may not be used inside a macro argument" which appears when the kernel is compiled for the cris architecture. Signed-off-by: Claudio Scordino <claudio@evidence.eu.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-12-09 07:50:28 -08:00
Al Viro	905ad269c5	procfs: fix a vfsmount longterm reference leak kern_mount() doesn't pair with plain mntput()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-12-09 00:40:19 -05:00
Jeff Layton	7023676f9e	cifs: check for NULL last_entry before calling cifs_save_resume_key Prior to commit `eaf35b1`, cifs_save_resume_key had some NULL pointer checks at the top. It turns out that at least one of those NULL pointer checks is needed after all. When the LastNameOffset in a FIND reply appears to be beyond the end of the buffer, CIFSFindFirst and CIFSFindNext will set srch_inf.last_entry to NULL. Since `eaf35b1`, the code will now oops in this situation. Fix this by having the callers check for a NULL last entry pointer before calling cifs_save_resume_key. No change is needed for the call site in cifs_readdir as it's not reachable with a NULL current_entry pointer. This should fix: https://bugzilla.redhat.com/show_bug.cgi?id=750247 Cc: stable@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Reported-by: Adam G. Metzler <adamgmetzler@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2011-12-08 22:04:47 -06:00
Jeff Layton	95edcff497	cifs: attempt to freeze while looping on a receive attempt In the recent overhaul of the demultiplex thread receive path, I neglected to ensure that we attempt to freeze on each pass through the receive loop. Reported-and-Tested-by: Woody Suwalski <terraluna977@gmail.com> Reported-and-Tested-by: Adam Williamson <awilliam@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2011-12-08 22:04:47 -06:00
Steve French	59edb63ad0	cifs: Fix sparse warning when calling cifs_strtoUCS Fix sparse endian check warning while calling cifs_strtoUCS CHECK fs/cifs/smbencrypt.c fs/cifs/smbencrypt.c:216:37: warning: incorrect type in argument 1 (different base types) fs/cifs/smbencrypt.c:216:37: expected restricted __le16 [usertype] <noident> fs/cifs/smbencrypt.c:216:37: got unsigned short <noident> Signed-off-by: Steve French <smfrench@gmail.com> Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com	2011-12-08 22:04:47 -06:00
Pavel Shilovsky	9a5101c896	CIFS: Add descriptions to the brlock cache functions Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-12-08 22:04:47 -06:00
Linus Torvalds	fb38f9b8fe	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: drop spin lock when memory alloc fails Btrfs: check if the to-be-added device is writable Btrfs: try cluster but don't advance in search list Btrfs: try to allocate from cluster even at LOOP_NO_EMPTY_SIZE	2011-12-08 13:18:59 -08:00
Liu Bo	1cf4ffdb32	Btrfs: drop spin lock when memory alloc fails Drop spin lock in convert_extent_bit() when memory alloc fails, otherwise, it will be a deadlock. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-12-08 08:55:47 -05:00
Li Zefan	a5d1633361	Btrfs: check if the to-be-added device is writable If we call ioctl(BTRFS_IOC_ADD_DEV) directly, we'll succeed in adding a readonly device to a btrfs filesystem, and btrfs will write to that device, emitting kernel errors: [ 3109.833692] lost page write due to I/O error on loop2 [ 3109.833720] lost page write due to I/O error on loop2 ... Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-12-08 08:55:46 -05:00
Alexandre Oliva	274bd4fb3e	Btrfs: try cluster but don't advance in search list When we find an existing cluster, we switch to its block group as the current block group, possibly skipping multiple blocks in the process. Furthermore, under heavy contention, multiple threads may fail to allocate from a cluster and then release just-created clusters just to proceed to create new ones in a different block group. This patch tries to allocate from an existing cluster regardless of its block group, and doesn't switch to that group, instead proceeding to try to allocate a cluster from the group it was iterating before the attempt. Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-12-08 08:55:40 -05:00
Alexandre Oliva	062c05c46b	Btrfs: try to allocate from cluster even at LOOP_NO_EMPTY_SIZE If we reach LOOP_NO_EMPTY_SIZE, we won't even try to use a cluster that others might have set up. Odds are that there won't be one, but if someone else succeeded in setting it up, we might as well use it, even if we don't try to set up a cluster again. Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-12-07 19:50:42 -05:00
Linus Torvalds	a694ad94bc	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: fix the logspace waiting algorithm xfs: fix nfs export of 64-bit inodes numbers on 32-bit kernels xfs: fix allocation length overflow in xfs_bmapi_write()	2011-12-07 16:13:54 -08:00
Sage Weil	be655596b3	ceph: use i_ceph_lock instead of i_lock We have been using i_lock to protect all kinds of data structures in the ceph_inode_info struct, including lists of inodes that we need to iterate over while avoiding races with inode destruction. That requires grabbing a reference to the inode with the list lock protected, but igrab() now takes i_lock to check the inode flags. Changing the list lock ordering would be a painful process. However, using a ceph-specific i_ceph_lock in the ceph inode instead of i_lock is a simple mechanical change and avoids the ordering constraints imposed by igrab(). Reported-by: Amon Ott <a.ott@m-privacy.de> Signed-off-by: Sage Weil <sage@newdream.net>	2011-12-07 10:46:44 -08:00
Linus Torvalds	3172f8fe1c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix apparmor dereferencing potentially freed dentry, sanitize __d_path() API	2011-12-07 08:14:42 -08:00
Al Viro	02125a8264	fix apparmor dereferencing potentially freed dentry, sanitize __d_path() API __d_path() API is asking for trouble and in case of apparmor d_namespace_path() getting just that. The root cause is that when __d_path() misses the root it had been told to look for, it stores the location of the most remote ancestor in root. Without grabbing references. Sure, at the moment of call it had been pinned down by what we have in path. And if we raced with umount -l, we could have very well stopped at vfsmount/dentry that got freed as soon as prepend_path() dropped vfsmount_lock. It is safe to compare these pointers with pre-existing (and known to be still alive) vfsmount and dentry, as long as all we are asking is "is it the same address?". Dereferencing is not safe and apparmor ended up stepping into that. d_namespace_path() really wants to examine the place where we stopped, even if it's not connected to our namespace. As the result, it looked at ->d_sb->s_magic of a dentry that might've been already freed by that point. All other callers had been careful enough to avoid that, but it's really a bad interface - it invites that kind of trouble. The fix is fairly straightforward, even though it's bigger than I'd like: * prepend_path() root argument becomes const. * __d_path() is never called with NULL/NULL root. It was a kludge to start with. Instead, we have an explicit function - d_absolute_root(). Same as __d_path(), except that it doesn't get root passed and stops where it stops. apparmor and tomoyo are using it. * __d_path() returns NULL on path outside of root. The main caller is show_mountinfo() and that's precisely what we pass root for - to skip those outside chroot jail. Those who don't want that can (and do) use d_path(). * __d_path() root argument becomes const. Everyone agrees, I hope. * apparmor does NOT try to use __d_path() or any of its variants when it sees that path->mnt is an internal vfsmount. In that case it's definitely not mounted anywhere and dentry_path() is exactly what we want there. Handling of sysctl()-triggered weirdness is moved to that place. * if apparmor is asked to do pathname relative to chroot jail and __d_path() tells it we it's not in that jail, the sucker just calls d_absolute_path() instead. That's the other remaining caller of __d_path(), BTW. * seq_path_root() does _NOT_ return -ENAMETOOLONG (it's stupid anyway - the normal seq_file logics will take care of growing the buffer and redoing the call of ->show() just fine). However, if it gets path not reachable from root, it returns SEQ_SKIP. The only caller adjusted (i.e. stopped ignoring the return value as it used to do). Reviewed-by: John Johansen <john.johansen@canonical.com> ACKed-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org	2011-12-06 23:57:18 -05:00
Christoph Hellwig	9f9c19ec1a	xfs: fix the logspace waiting algorithm Apply the scheme used in log_regrant_write_log_space to wake up any other threads waiting for log space before the newly added one to log_regrant_write_log_space as well, and factor the code into readable helpers. For each of the queues we have add two helpers: - one to try to wake up all waiting threads. This helper will also be usable by xfs_log_move_tail once we remove the current opportunistic wakeups in it. - one to sleep on t_wait until enough log space is available, loosely modelled after Linux waitqueues. And use them to reimplement the guts of log_regrant_write_log_space and log_regrant_write_log_space. These two function now use one and the same algorithm for waiting on log space instead of subtly different ones before, with an option to completely unify them in the near future. Also move the filesystem shutdown handling to the common caller given that we had to touch it anyway. Based on hard debugging and an earlier patch from Chandra Seetharaman <sekharan@us.ibm.com>. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandra Seetharaman <sekharan@us.ibm.com> Tested-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2011-12-06 14:19:47 -06:00
Christoph Hellwig	c29f7d457a	xfs: fix nfs export of 64-bit inodes numbers on 32-bit kernels The i_ino field in the VFS inode is of type unsigned long and thus can't hold the full 64-bit inode number on 32-bit kernels. We have the full inode number in the XFS inode, so use that one for nfs exports. Note that I've also switched the 32-bit file handles types to it, just to make the code more consistent and copy & paste errors less likely to happen. Reported-by: Guoquan Yang <ygq51@hotmail.com> Reported-by: Hank Peng <pengxihan@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>	2011-12-06 10:46:23 -06:00
Dave Chinner	a99ebf43f4	xfs: fix allocation length overflow in xfs_bmapi_write() When testing the new xfstests --large-fs option that does very large file preallocations, this assert was tripped deep in xfs_alloc_vextent(): XFS: Assertion failed: args->minlen <= args->maxlen, file: fs/xfs/xfs_alloc.c, line: 2239 The allocation was trying to allocate a zero length extent because the lower 32 bits of the allocation length was zero. The remaining length of the allocation to be done was an exact multiple of 2^32 - the first case I saw was at 496TB remaining to be allocated. This turns out to be an overflow when converting the allocation length (a 64 bit quantity) into the extent length to allocate (a 32 bit quantity), and it requires the length to be allocated an exact multiple of 2^32 blocks to trip the assert. Fix it by limiting the extent lenth to allocate to MAXEXTLEN. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2011-12-02 16:24:02 -06:00
Linus Torvalds	ffb8fb5469	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: fix attr2 vs large data fork assert xfs: force buffer writeback before blocking on the ilock in inode reclaim xfs: validate acl count	2011-12-02 10:38:20 -08:00
Sage Weil	2151937d7c	ceph: fix rasize reporting by ceph_show_options Fix typo. Reported-by: mowang da <whooya.xxl@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-12-02 09:27:54 -08:00
Linus Torvalds	0a4ebed781	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (31 commits) ocfs2: avoid unaligned access to dqc_bitmap ocfs2: Use filemap_write_and_wait() instead of write_inode_now() ocfs2: honor O_(D)SYNC flag in fallocate ocfs2: Add a missing journal credit in ocfs2_link_credits() -v2 ocfs2: send correct UUID to cleancache initialization ocfs2: Commit transactions in error cases -v2 ocfs2: make direntry invalid when deleting it fs/ocfs2/dlm/dlmlock.c: free kmem_cache_zalloc'd data using kmem_cache_free ocfs2: Avoid livelock in ocfs2_readpage() ocfs2: serialize unaligned aio ocfs2: Implement llseek() ocfs2: Fix ocfs2_page_mkwrite() ocfs2: Add comment about orphan scanning ocfs2: Clean up messages in the fs ocfs2/cluster: Cluster up now includes network connections too ocfs2/cluster: Add new function o2net_fill_node_map() ocfs2/cluster: Fix output in file elapsed_time_in_ms ocfs2/dlm: dlmlock_remote() needs to account for remastery ocfs2/dlm: Take inflight reference count for remotely mastered resources too ocfs2/dlm: Cleanup dlm_wait_for_node_death() and dlm_wait_for_node_recovery() ...	2011-12-01 14:55:34 -08:00
Akinobu Mita	939255798a	ocfs2: avoid unaligned access to dqc_bitmap The dqc_bitmap field of struct ocfs2_local_disk_chunk is 32-bit aligned, but not 64-bit aligned. The dqc_bitmap is accessed by ocfs2_set_bit(), ocfs2_clear_bit(), ocfs2_test_bit(), or ocfs2_find_next_zero_bit(). These are wrapper macros for ext2__bit() which need to take an unsigned long aligned address (though some architectures are able to handle unaligned address correctly) So some 64bit architectures may not be able to access the dqc_bitmap correctly. This avoids such unaligned access by using another wrapper functions for ext2__bit(). The code is taken from fs/ext4/mballoc.c which also need to handle unaligned bitmap access. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-12-01 14:39:32 -08:00
Linus Torvalds	b930c26416	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix meta data raid-repair merge problem Btrfs: skip allocation attempt from empty cluster Btrfs: skip block groups without enough space for a cluster Btrfs: start search for new cluster at the beginning Btrfs: reset cluster's max_size when creating bitmap Btrfs: initialize new bitmaps' list Btrfs: fix oops when calling statfs on readonly device Btrfs: Don't error on resizing FS to same size Btrfs: fix deadlock on metadata reservation when evicting a inode Fix URL of btrfs-progs git repository in docs btrfs scrub: handle -ENOMEM from init_ipath()	2011-12-01 08:28:53 -08:00
Jan Schmidt	f4a8e6563e	Btrfs: fix meta data raid-repair merge problem Commit `4a54c8c16` introduced raid-repair, killing the individual readpage_io_failed_hook entries from inode.c and disk-io.c. Commit `4bb31e92` introduced new readahead code, adding a readpage_io_failed_hook to disk-io.c. The raid-repair commit had logic to disable raid-repair, if readpage_io_failed_hook is set. Thus, the readahead commit effectively disabled raid-repair for meta data. This commit changes the logic to always attempt raid-repair when needed and call the readpage_io_failed_hook in case raid-repair fails. This is much more straight forward and should have been like that from the beginning. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Reported-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-12-01 09:30:36 -05:00
Alexandre Oliva	be064d1139	Btrfs: skip allocation attempt from empty cluster If we don't have a cluster, don't bother trying to allocate from it, jumping right away to the attempt to allocate a new cluster. Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-11-30 13:43:00 -05:00
Alexandre Oliva	425d83156c	Btrfs: skip block groups without enough space for a cluster We test whether a block group has enough free space to hold the requested block, but when we're doing clustered allocation, we can save some cycles by testing whether it has enough room for the cluster upfront, otherwise we end up attempting to set up a cluster and failing. Only in the NO_EMPTY_SIZE loop do we attempt an unclustered allocation, and by then we'll have zeroed the cluster size, so this patch won't stop us from using the block group as a last resort. Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-11-30 13:43:00 -05:00

1 2 3 4 5 ...

24938 commits