linux

q3k/linux

Author	SHA1	Message	Date
Ryusuke Konishi	d501d73689	nilfs2: separate function for creating new btree node block Adds a separate routine for creating a btree node block. This is a preparation to reduce the depth of function calls during submitting btree node buffer. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:49 +09:00
Ryusuke Konishi	b34a65069c	nilfs2: avoid readahead on metadata file for create mode This turns off readhead action of metadata file if nilfs_mdt_get_block function was called with a create flag. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:49 +09:00
Ryusuke Konishi	ef7d4757a5	nilfs2: simplify nilfs_sufile_get_ncleansegs function Previously, this function took an status code to return possible error codes. The ("nilfs2: add local variable to cache the number of clean segments") patch removed the possibility to return errors. So, this simplifies the function definition to make it directly return the number of clean segments. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:49 +09:00
Ryusuke Konishi	aa474a2201	nilfs2: add local variable to cache the number of clean segments This makes it possible for sufile to get the number of clean segments faster. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:49 +09:00
Ryusuke Konishi	7b16c8a211	nilfs2: unfold nilfs_sufile_block_get_header function This unfolds the nilfs_sufile_block_get_header() function for simplicity. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:48 +09:00
Ryusuke Konishi	fd66c0d5c3	nilfs2: hide nilfs_mdt_clear calls in nilfs_mdt_destroy This will hide a function call of nilfs_mdt_clear() in nilfs_mdt_destroy(). This ensures nilfs_mdt_destroy() to do cleanup jobs included in nilfs_mdt_clear(). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:48 +09:00
Ryusuke Konishi	3961f0e277	nilfs2: eliminate inlines to directly read/write inode of metadata files Removes two inline functions: nilfs_mdt_read_inode_direct() and nilfs_mdt_write_inode_direct(). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:48 +09:00
Ryusuke Konishi	8707df3847	nilfs2: separate read method of meta data files on super root block Will displace nilfs_mdt_read_inode_direct function with an individual read method: nilfs_dat_read, nilfs_sufile_read, nilfs_cpfile_read. This provides the opportunity to initialize local variables of each metadata file after reading the inode. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:48 +09:00
Ryusuke Konishi	79739565e1	nilfs2: separate constructor of metadata files This will displace nilfs_mdt_new() constructor with individual metadata file constructors like nilfs_dat_new(), new_sufile_new(), nilfs_cpfile_new(), and nilfs_ifile_new(). This makes it possible for each metadata file to have own intialization code. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:48 +09:00
Ryusuke Konishi	5731e191f2	nilfs2: add size option of private object to metadata file allocator This adds an optional "object size" argument to nilfs_mdt_new_common() function; the argument specifies the size of private object attached to a newly allocated metadata file inode. This will afford space to keep local variables for meta data files. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:48 +09:00
Ryusuke Konishi	9cb4e0d2b9	nilfs2: move out mark_inode_dirty calls from bmap routines Previously, nilfs_bmap_add_blocks() and nilfs_bmap_sub_blocks() called mark_inode_dirty() after they changed the number of data blocks. This moves these calls outside bmap outermost functions like nilfs_bmap_insert() or nilfs_bmap_truncate(). This will mitigate overhead for truncate or delete operation since they repeatedly remove set of blocks. Nearly 10 percent improvement was observed for removal of a large file: # dd if=/dev/zero of=/test/aaa bs=1M count=512 # time rm /test/aaa real 2.968s -> 2.705s Further optimization may be possible by eliminating these mark_inode_dirty() uses though I avoid mixing separate changes here. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:47 +09:00
Ryusuke Konishi	09bf4aae0a	nilfs2: stop marking metadata inode dirty within btree operations Since metadata file routines mark the inode dirty after they successfully changed bmap objects, nilfs_mdt_mark_dirty() calls in nilfs_bmap_add_blocks() and nilfs_bmap_sub_blocks() are redundant. This removes these overlapping calls from the bmap routines. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:47 +09:00
Ryusuke Konishi	30db4e6c3d	nilfs2: remove buffer locking from btree code lock_buffer() and unlock_buffer() uses in btree.c are eliminable because btree functions gain buffer heads through nilfs_btnode_get(), which never returns an on-the-fly buffer. Although nilfs_clear_dirty_page() and nilfs_copy_back_pages() in nilfs_commit_gcdat_inode() juggle btree node buffers of DAT, this is safe because these operations are protected by a log writer lock or the metadata file semaphore of DAT. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:47 +09:00
Ryusuke Konishi	a49762fd11	nilfs2: remove buffer locking in nilfs_mark_inode_dirty This lock is eliminable because inodes on the buffer can be updated independently. Although a log writer also fills in bmap data on the on-disk inodes, this update is exclusively done by a log writer lock. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:47 +09:00
Jiro SEKIBA	e2073e7857	nilfs2: cleanup unused match_bool function match_bool function is not used anymore. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:47 +09:00
Jiro SEKIBA	91f1953bf3	nilfs2: Using nobarrier option instead of barrier=off Since most of fs using nofoobar style option, modified barrier=off option as nobarrier. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:47 +09:00
Jiro SEKIBA	6600b9dd8e	nilfs2: move definition of struct nilfs_btree_node This is a trivial patch to expose struct nilfs_fs_btree_node. The struct should be exposed outside of kernel, for it is disk format. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:46 +09:00
Ryusuke Konishi	9b945d537d	nilfs2: get rid of BUG_ON use in btree lookup routines The current btree lookup routines make a kernel oops when detected inconsistency in btree blocks. These routines should instead return a proper error code because the inconsistency usually comes from corruption of on-disk metadata. This fixes the issue by converting BUG_ON calls to proper error handlings. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:46 +09:00
Jiro SEKIBA	18dafac1a4	nilfs2: deleted inconsistent comment in nilfs_load_inode_block() The comment says, "Caller of this function MUST lock s_inode_lock", however just above the comment, it locks s_inode_lock in the function. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-15 17:17:46 +09:00
Ryusuke Konishi	c1ea985c71	nilfs2: fix lock order reversal in chcp operation Will fix the following lock order reversal lockdep detected: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-rc6 #7 ------------------------------------------------------- chcp/30157 is trying to acquire lock: (&nilfs->ns_mount_mutex){+.+.+.}, at: [<fed7cfcc>] nilfs_cpfile_change_cpmode+0x46/0x752 [nilfs2] but task is already holding lock: (&nilfs->ns_segctor_sem){++++.+}, at: [<fed7ca32>] nilfs_transaction_begin+0xba/0x110 [nilfs2] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&nilfs->ns_segctor_sem){++++.+}: [<c105799c>] __lock_acquire+0x109c/0x139d [<c1057d26>] lock_acquire+0x89/0xa0 [<c14151e2>] down_read+0x31/0x45 [<fed6d77b>] nilfs_attach_checkpoint+0x8f/0x16b [nilfs2] [<fed6e393>] nilfs_get_sb+0x3e7/0x653 [nilfs2] [<c10c0ccb>] vfs_kern_mount+0x8b/0x124 [<c10c0db2>] do_kern_mount+0x37/0xc3 [<c10d7517>] do_mount+0x64d/0x69d [<c10d75cd>] sys_mount+0x66/0x95 [<c1002a14>] sysenter_do_call+0x12/0x32 -> #1 (&type->s_umount_key#31/1){+.+.+.}: [<c105799c>] __lock_acquire+0x109c/0x139d [<c1057d26>] lock_acquire+0x89/0xa0 [<c104c0f3>] down_write_nested+0x34/0x52 [<c10c08fe>] sget+0x22e/0x389 [<fed6e133>] nilfs_get_sb+0x187/0x653 [nilfs2] [<c10c0ccb>] vfs_kern_mount+0x8b/0x124 [<c10c0db2>] do_kern_mount+0x37/0xc3 [<c10d7517>] do_mount+0x64d/0x69d [<c10d75cd>] sys_mount+0x66/0x95 [<c1002a14>] sysenter_do_call+0x12/0x32 -> #0 (&nilfs->ns_mount_mutex){+.+.+.}: [<c1057727>] __lock_acquire+0xe27/0x139d [<c1057d26>] lock_acquire+0x89/0xa0 [<c1414d63>] mutex_lock_nested+0x41/0x23e [<fed7cfcc>] nilfs_cpfile_change_cpmode+0x46/0x752 [nilfs2] [<fed801b2>] nilfs_ioctl+0x11a/0x7da [nilfs2] [<c10cca12>] vfs_ioctl+0x27/0x6e [<c10ccf93>] do_vfs_ioctl+0x491/0x4db [<c10cd022>] sys_ioctl+0x45/0x5f [<c1002a14>] sysenter_do_call+0x12/0x32 Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-13 10:33:24 +09:00
Ryusuke Konishi	c083234f15	nilfs2: fix missing cleanup of gc cache on error cases This fixes an -rc1 regression brought by the commit: `1cf58fa840` ("nilfs2: shorten freeze period due to GC in write operation v3"). Although the patch moved out a function call of nilfs_ioctl_move_blocks() to nilfs_ioctl_clean_segments() from nilfs_ioctl_prepare_clean_segments(), it didn't move corresponding cleanup job needed for the error case. This will move the missing cleanup job to the destination function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Jiro SEKIBA <jir@unicus.jp>	2009-11-08 19:04:25 +09:00
Ryusuke Konishi	5399dd1fc8	nilfs2: fix kernel oops in error case of nilfs_ioctl_move_blocks This fixes a kernel oops reported by Markus Trippelsdorf in the email titled "[NILFS users] kernel Oops while running nilfs_cleanerd". The oops was caused by a bug of error path in nilfs_ioctl_move_blocks() function, which was inlined in nilfs_ioctl_clean_segments(). nilfs_ioctl_move_blocks checks duplication of blocks which will be moved in garbage collection. But, the check should have be done within nilfs_ioctl_move_inode_block() to prevent list corruption among buffers storing the target blocks. To fix the kernel oops, this moves forward the duplication check before the list insertion. I also tested this for stable trees [2.6.30, 2.6.31]. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable <stable@kernel.org>	2009-11-08 19:01:35 +09:00
Ryusuke Konishi	05b4358ad5	nilfs2: add zero-fill for new btree node buffers Adds missing initialization of newly allocated b-tree node buffers. This avoids garbage data to be mixed in b-tree node blocks. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-03 12:32:03 +09:00
Ryusuke Konishi	aeda7f6343	nilfs2: fix irregular checkpoint creation due to data flush When nilfs flushes out dirty data to reduce memory pressure, creation of checkpoints is wrongly postponed. This bug causes irregular checkpoint creation especially in small footprint systems. To correct this issue, a timer for the checkpoint creation has to be continued if a log writer does not create a checkpoint. This will do the correction. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-03 12:32:03 +09:00
Ryusuke Konishi	b1e19e5601	nilfs2: fix dirty page accounting leak causing hang at write Bruno Prémont and Dunphy, Bill noticed me that NILFS will certainly hang on ARM-based targets. I found this was caused by an underflow of dirty pages counter. A b-tree cache routine was marking page dirty without adjusting page account information. This fixes the dirty page accounting leak and resolves the hang on arm-based targets. Reported-by: Bruno Prémont <bonbons@linux-vserver.org> Reported-by: Dunphy, Bill <WDunphy@tandbergdata.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Bruno Prémont <bonbons@linux-vserver.org> Cc: stable <stable@kernel.org>	2009-11-03 12:31:36 +09:00
Alexey Dobriyan	828c09509b	const: constify remaining file_operations [akpm@linux-foundation.org: fix KVM] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-01 16:11:11 -07:00
Ryusuke Konishi	3cc811bffd	nilfs2: fix missing initialization of i_dir_start_lookup member The i_dir_start_lookup field in nilfs_inode_info objects should be cleared when the objects are allocated, but the the initialization was missing in case of reading from disk. This adds the initialization. Since the variable just gives a start page on directory lookups, the bug was nonfatal until now. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-29 20:32:13 +09:00
Ryusuke Konishi	1f28fcd925	nilfs2: fix missing zero-fill initialization of btree node cache This will fix file system corruption which infrequently happens after mount. The problem was reported from users with the title "[NILFS users] Fail to mount NILFS." (Message-ID: <200908211918.34720.yuri@itinteg.net>), and so forth. I've also experienced the corruption multiple times on kernel 2.6.30 and 2.6.31. The problem turned out to be caused due to discordance between mapping->nrpages of a btree node cache and the actual number of pages hung on the cache; if the mapping->nrpages becomes zero even as it has pages, truncate_inode_pages() returns without doing anything. Usually this is harmless except it may cause page leak, but garbage collection fairly infrequently sees a stale page remained in the btree node cache of DAT (i.e. disk address translation file of nilfs), and induces the corruption. I identified a missing initialization in btree node caches was the root cause. This corrects the bug. I've tested this for kernel 2.6.30 and 2.6.31. Reported-by: Yuri Chislov <yuri@itinteg.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable <stable@kernel.org>	2009-09-29 20:12:56 +09:00
Alexey Dobriyan	f0f37e2f77	const: mark struct vm_struct_operations * mark struct vm_area_struct::vm_ops as const * mark vm_ops in AGP code But leave TTM code alone, something is fishy there with global vm_ops being used. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-27 11:39:25 -07:00
Alexey Dobriyan	6e1d5dcc2b	const: mark remaining inode_operations as const Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:24 -07:00
Alexey Dobriyan	7f09410bbc	const: mark remaining address_space_operations const Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:24 -07:00
Alexey Dobriyan	ac4cfdd6d1	const: mark remaining export_operations const Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:24 -07:00
Alexey Dobriyan	b87221de6a	const: mark remaining super_operations const Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:24 -07:00
Jens Axboe	2c96ce9f20	fs: remove bdev->bd_inode_backing_dev_info It has been unused since it was introduced in: commit 520808bf20e90fdbdb320264ba7dd5cf9d47dcac Author: Andrew Morton <akpm@osdl.org> Date: Fri May 21 00:46:17 2004 -0700 [PATCH] block device layer: separate backing_dev_info infrastructure So lets just kill it. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-16 15:16:18 +02:00
Ryusuke Konishi	41f4db0f48	fs/Kconfig: move nilfs2 outside misc filesystems Some people asked me questions like the following: On Wed, 15 Jul 2009 13:11:21 +0200, Leon Woestenberg wrote: > just wondering, any reasons why NILFS2 is one of the miscellaneous > filesystems and, for example, btrfs, is not in Kconfig? Actually, nilfs is NOT a filesystem came from other operating systems, but a filesystem created purely for Linux. Nor is it a flash filesystem but that for generic block devices. So, this moves nilfs outside the misc category as I responded in LKML "Re: Why does NILFS2 hide under Miscellaneous filesystems?" (Message-Id: <20090716.002526.93465395.ryusuke@osrg.net>). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:16 +09:00
Ryusuke Konishi	0f3fe33b39	nilfs2: convert nilfs_bmap_lookup to an inline function The nilfs_bmap_lookup() is now a wrapper function of nilfs_bmap_lookup_at_level(). This moves the nilfs_bmap_lookup() to a header file converting it to an inline function and gives an opportunity for optimization. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:16 +09:00
Ryusuke Konishi	2e0c2c7392	nilfs2: allow btree code to directly call dat operations The current btree code is written so that btree functions call dat operations via wrapper functions in bmap.c when they allocate, free, or modify virtual block addresses. This abstraction requires additional function calls and causes frequent call of nilfs_bmap_get_dat() function since it is used in the every wrapper function. This removes the wrapper functions and makes them available from btree.c and direct.c, which will increase the opportunity of compiler optimization. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:16 +09:00
Ryusuke Konishi	bd8169efae	nilfs2: add update functions of virtual block address to dat This is a preparation for the successive cleanup ("nilfs2: allow btree to directly call dat operations"). This adds functions bundling a few operations to change an entry of virtual block address on the dat file. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:15 +09:00
Ryusuke Konishi	7a102b0923	nilfs2: remove individual gfp constants for each metadata file This gets rid of NILFS_CPFILE_GFP, NILFS_SUFILE_GFP, NILFS_DAT_GFP, and NILFS_IFILE_GFP. All of these constants refer to NILFS_MDT_GFP, and can be removed. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:15 +09:00
Ryusuke Konishi	3218929dbd	nilfs2: stop zero-fill of btree path just before free it The btree path object is cleared just before it is freed. This will remove the code doing the unnecessary clear operation. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:15 +09:00
Ryusuke Konishi	6d28f7ea43	nilfs2: remove unused btree argument from btree functions Even though many btree functions take a btree object as their first argument, most of them are not used in their functions. This sticky use of the btree argument is hurting code readability and giving the possibility of inefficient code generation. So, this removes the unnecessary btree arguments. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:15 +09:00
Ryusuke Konishi	9ead986373	nilfs2: remove nilfs_dat_abort_start and nilfs_dat_abort_free These functions are not called from any functions. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:15 +09:00
Jiro SEKIBA	1cf58fa840	nilfs2: shorten freeze period due to GC in write operation v3 This is a re-revised patch to shorten freeze period. This version include a fix of the bug Konishi-san mentioned last time. When GC is runnning, GC moves live block to difference segments. Copying live blocks into memory is done in a transaction, however it is not necessarily to be in the transaction. This patch will get the nilfs_ioctl_move_blocks() out from transaction lock and put it before the transaction. I ran sysbench fileio test against nilfs partition. I copied some DVD/CD images and created snapshot to create live blocks before starting the benchmark. Followings are summary of rc8 and rc8 w/ the patch of per-request statistics, which is min/max and avg. I ran each test three times and bellow is average of those numers. According to this benchmark result, average time is slightly degrated. However, worstcase (max) result is significantly improved. This can address a few seconds write freeze. - random write per-request performance of rc8 min 0.843ms max 680.406ms avg 3.050ms - random write per-request performance of rc8 w/ this patch min 0.843ms -> 100.00% max 380.490ms -> 55.90% avg 3.233ms -> 106.00% - sequential write per-request performance of rc8 min 0.736ms max 774.343ms avg 2.883ms - sequential write per-request performance of rc8 w/ this patch min 0.720ms -> 97.80% max 644.280ms-> 83.20% avg 3.130ms -> 108.50% -----8<-----8<-----nilfs_cleanerd.conf-----8<-----8<----- protection_period 150 selection_policy timestamp # timestamp in ascend order nsegments_per_clean 2 cleaning_interval 2 retry_interval 60 use_mmap log_priority info -----8<-----8<-----nilfs_cleanerd.conf-----8<-----8<----- Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:15 +09:00
Zhu Yanhai	43be0ec038	nilfs2: add more check routines in mount process nilfs2: Add more safeguard routines and protections in mount process, which also makes nilfs2 report consistency error messages when checkpoint number is invalid. Signed-off-by: Zhu Yanhai <zhu.yanhai@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:14 +09:00
Zhang Qiang	a4f0b9c5b4	nilfs2: An unassigned variable is assigned to a never used structure member nilfs2: In procedure 'nilfs_get_sb()', when a nilfs filesysttem is mounted for the first time, local variable 'nilfs->ns_last_cno' is used before loading the latest checkpoint number from disk (in 'nilfs_fill_super'). 'nilfs->ns_last_cno' is assigned to 'sd.cno', but 'sd.cno' has never been used in the procedure. Signed-off-by: Zhang Qiang <zhangqiang.buaa@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:14 +09:00
Ryusuke Konishi	c1b353f04a	nilfs2: use GFP_NOIO for bio_alloc instead of GFP_NOWAIT Alberto Bertogli advised me about bio_alloc() use in nilfs: On Sat, 13 Jun 2009 22:52:40 -0300, Alberto Bertogli wrote: > By the way, those bio_alloc()s are using GFP_NOWAIT but it looks > like they could use at least GFP_NOIO or GFP_NOFS, since the caller > can (and sometimes do) sleep. The only caller is nilfs_submit_bh(), > which calls nilfs_submit_seg_bio() which can sleep calling > wait_for_completion(). This takes in the comment and replaces the use of GFP_NOWAIT flag with GFP_NOIO. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:14 +09:00
Jiro SEKIBA	1dfa27105a	nilfs2: stop using periodic write_super callback This removes nilfs_write_super and commit super block in nilfs internal thread, instead of periodic write_super callback. VFS layer calls ->write_super callback periodically. However, it looks like that calling back is ommited when disk I/O is busy. And when cleanerd (nilfs GC) is runnig, disk I/O tend to be busy thus nilfs superblock is not synchronized as nilfs designed. To avoid it, syncing superblock by nilfs thread instead of pdflush. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:14 +09:00
Jiro SEKIBA	79efdd9411	nilfs2: clean up nilfs_write_super Separate conditions that check if syncing super block and alternative super block are required as inline functions to reuse the conditions. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:14 +09:00
Jiro SEKIBA	6233caa9d5	nilfs2: fix disorder of nilfs_write_super in nilfs_sync_fs This fixes disorder of nilfs_write_super in nilfs_sync_fs. Commiting super block must be the end of the function so that every changes are reflected. ->sync_fs() is not called frequently so this makes nilfs_sync_fs call nilfs_commit_super instead of nilfs_write_super. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:14 +09:00
Jiro SEKIBA	ec5d66abdb	nilfs2: remove redundant super block commit This removes redundant super block commit. nilfs_write_super will call nilfs_commit_super to store super block into block device. However, nilfs_put_super will call nilfs_commit_super right after calling nilfs_write_super. So calling nilfs_write_super in nilfs_put_super would be redundant. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:13 +09:00
Jiro SEKIBA	b58a285ba4	nilfs2: implement nilfs_show_options to display mount options in /proc/mounts This is a patch to display mount options in procfs. Mount options will show up in the /proc/mounts as other fs does. ... /dev/sda6 /mnt nilfs2 ro,relatime,barrier=off,cp=3,order=strict 0 0 ... Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:13 +09:00
Ryusuke Konishi	1435110467	nilfs2: always lookup disk block address before reading metadata block The current metadata file code skips disk address lookup for its data block if the buffer has a mapped flag. This has a potential risk to cause read request to be performed against the stale block address that GC moved, and it may lead to meta data corruption. The mapped flag is safe if the buffer has an uptodate flag, otherwise it may prevent necessary update of disk address in the next read. This will avoid the potential problem by ensuring disk address lookup before reading metadata block even for buffers with the mapped flag. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:13 +09:00
Ryusuke Konishi	027d6404eb	nilfs2: use semaphore to protect pointer to a writable FS-instance will get rid of nilfs_get_writer() and nilfs_put_writer() pair used to retain a writable FS-instance for a period. The pair functions were making up some kind of recursive lock with a mutex, but they became overkill since the commit `201913ed74`. Furthermore, they caused the following lockdep warning because the mutex can be released by a task which didn't lock it: ===================================== [ BUG: bad unlock balance detected! ] ------------------------------------- kswapd0/422 is trying to release lock (&nilfs->ns_writer_mutex) at: [<c1359ff5>] mutex_unlock+0x8/0xa but there are no more locks to release! other info that might help us debug this: no locks held by kswapd0/422. stack backtrace: Pid: 422, comm: kswapd0 Not tainted 2.6.31-rc4-nilfs #51 Call Trace: [<c1358f97>] ? printk+0xf/0x18 [<c104fea7>] print_unlock_inbalance_bug+0xcc/0xd7 [<c11578de>] ? prop_put_global+0x3/0x35 [<c1050195>] lock_release+0xed/0x1dc [<c1359ff5>] ? mutex_unlock+0x8/0xa [<c1359f83>] __mutex_unlock_slowpath+0xaf/0x119 [<c1359ff5>] mutex_unlock+0x8/0xa [<d1284add>] nilfs_mdt_write_page+0xd8/0xe1 [nilfs2] [<c1092653>] shrink_page_list+0x379/0x68d [<c109171b>] ? isolate_pages_global+0xb4/0x18c [<c1092bd2>] shrink_list+0x26b/0x54b [<c10930be>] shrink_zone+0x20c/0x2a2 [<c10936b7>] kswapd+0x407/0x591 [<c1091667>] ? isolate_pages_global+0x0/0x18c [<c1040603>] ? autoremove_wake_function+0x0/0x33 [<c10932b0>] ? kswapd+0x0/0x591 [<c104033b>] kthread+0x69/0x6e [<c10402d2>] ? kthread+0x0/0x6e [<c1003e33>] kernel_thread_helper+0x7/0x1a This patch uses a reader/writer semaphore instead of the own lock and kills this warning. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:13 +09:00
Heiko Carstens	b5696e5e0d	nilfs2: fix format string compile warning (ino_t) Unlike on most other architectures ino_t is an unsigned int on s390. So add an explicit cast to avoid this compile warning: fs/nilfs2/recovery.c: In function 'recover_dsync_blocks': fs/nilfs2/recovery.c:555: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'ino_t' Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:13 +09:00
Ryusuke Konishi	1b2f5a641b	nilfs2: fix ignored error code in __nilfs_read_inode() The __nilfs_read_inode function is ignoring the error code returned from nilfs_read_inode_common(), and wrongly delivers a success code (zero) when it escapes from the function in erroneous cases. This adds the missing error handling. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:12 +09:00
Ryusuke Konishi	b1f1b8ce0a	nilfs2: fix preempt count underflow in nilfs_btnode_prepare_change_key This will fix the following preempt count underflow reported from users with the title "[NILFS users] segctord problem" (Message-ID: <949415.6494.qm@web58808.mail.re1.yahoo.com> and Message-ID: <debc30fc0908270825v747c1734xa59126623cfd5b05@mail.gmail.com>): WARNING: at kernel/sched.c:4890 sub_preempt_count+0x95/0xa0() Hardware name: HP Compaq 6530b (KR980UT#ABC) Modules linked in: bridge stp llc bnep rfcomm l2cap xfs exportfs nilfs2 cowloop loop vboxnetadp vboxnetflt vboxdrv btusb bluetooth uvcvideo videodev v4l1_compat v4l2_compat_ioctl32 arc4 snd_hda_codec_analog ecb iwlagn iwlcore rfkill lib80211 mac80211 snd_hda_intel snd_hda_codec ehci_hcd uhci_hcd usbcore snd_hwdep snd_pcm tg3 cfg80211 psmouse snd_timer joydev libphy ohci1394 snd_page_alloc hp_accel lis3lv02d ieee1394 led_class i915 drm i2c_algo_bit video backlight output i2c_core dm_crypt dm_mod Pid: 4197, comm: segctord Not tainted 2.6.30-gentoo-r4-64 #7 Call Trace: [<ffffffff8023fa05>] ? sub_preempt_count+0x95/0xa0 [<ffffffff802470f8>] warn_slowpath_common+0x78/0xd0 [<ffffffff8024715f>] warn_slowpath_null+0xf/0x20 [<ffffffff8023fa05>] sub_preempt_count+0x95/0xa0 [<ffffffffa04ce4db>] nilfs_btnode_prepare_change_key+0x11b/0x190 [nilfs2] [<ffffffffa04d01ad>] nilfs_btree_assign_p+0x19d/0x1e0 [nilfs2] [<ffffffffa04d10ad>] nilfs_btree_assign+0xbd/0x130 [nilfs2] [<ffffffffa04cead7>] nilfs_bmap_assign+0x47/0x70 [nilfs2] [<ffffffffa04d9bc6>] nilfs_segctor_do_construct+0x956/0x20f0 [nilfs2] [<ffffffff805ac8e2>] ? _spin_unlock_irqrestore+0x12/0x40 [<ffffffff803c06e0>] ? __up_write+0xe0/0x150 [<ffffffff80262959>] ? up_write+0x9/0x10 [<ffffffffa04ce9f3>] ? nilfs_bmap_test_and_clear_dirty+0x43/0x60 [nilfs2] [<ffffffffa04cd627>] ? nilfs_mdt_fetch_dirty+0x27/0x60 [nilfs2] [<ffffffffa04db5fc>] nilfs_segctor_construct+0x8c/0xd0 [nilfs2] [<ffffffffa04dc3dc>] nilfs_segctor_thread+0x15c/0x3a0 [nilfs2] [<ffffffffa04dbe20>] ? nilfs_construction_timeout+0x0/0x10 [nilfs2] [<ffffffff80252633>] ? add_timer+0x13/0x20 [<ffffffff802370da>] ? __wake_up_common+0x5a/0x90 [<ffffffff8025e960>] ? autoremove_wake_function+0x0/0x40 [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2] [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2] [<ffffffff8025e556>] kthread+0x56/0x90 [<ffffffff8020cdea>] child_rip+0xa/0x20 [<ffffffff8025e500>] ? kthread+0x0/0x90 [<ffffffff8020cde0>] ? child_rip+0x0/0x20 This problem was caused due to a missing radix_tree_preload() call in the retry path of nilfs_btnode_prepare_change_key() function. Reported-by: Eric A <eric225125@yahoo.com> Reported-by: Jerome Poulin <jeromepoulin@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Jerome Poulin <jeromepoulin@gmail.com> Cc: stable@kernel.org	2009-08-31 12:03:06 +09:00
Ryusuke Konishi	a924586036	nilfs2: fix oopses with doubly mounted snapshots will fix kernel oopses like the following: # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test1 # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test2 # umount /test1 # umount /test2 BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1069 in_atomic(): 0, irqs_disabled(): 1, pid: 3886, name: umount.nilfs2 1 lock held by umount.nilfs2/3886: #0: (&type->s_umount_key#31){+.+...}, at: [<c10b398a>] deactivate_super+0x52/0x6c irq event stamp: 1219 hardirqs last enabled at (1219): [<c135c774>] __mutex_unlock_slowpath+0xf8/0x119 hardirqs last disabled at (1218): [<c135c6d5>] __mutex_unlock_slowpath+0x59/0x119 softirqs last enabled at (1214): [<c1033316>] __do_softirq+0x1a5/0x1ad softirqs last disabled at (1205): [<c1033354>] do_softirq+0x36/0x5a Pid: 3886, comm: umount.nilfs2 Not tainted 2.6.31-rc6 #55 Call Trace: [<c1023549>] __might_sleep+0x107/0x10e [<c13603c0>] do_page_fault+0x246/0x397 [<c136017a>] ? do_page_fault+0x0/0x397 [<c135e753>] error_code+0x6b/0x70 [<c136017a>] ? do_page_fault+0x0/0x397 [<c104f805>] ? __lock_acquire+0x91/0x12fd [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd [<c1050b2b>] lock_acquire+0xba/0xdd [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2] [<c135d4fe>] down_write+0x2a/0x46 [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2] [<d0d17d3f>] nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2] [<c104ea2c>] ? mark_held_locks+0x43/0x5b [<c104ecb1>] ? trace_hardirqs_on_caller+0x10b/0x133 [<c104ece4>] ? trace_hardirqs_on+0xb/0xd [<d0d09ac1>] nilfs_put_super+0x2f/0xca [nilfs2] [<c10b3352>] generic_shutdown_super+0x49/0xb8 [<c10b33de>] kill_block_super+0x1d/0x31 [<c10e6599>] ? vfs_quota_off+0x0/0x12 [<c10b398f>] deactivate_super+0x57/0x6c [<c10c4bc3>] mntput_no_expire+0x8c/0xb4 [<c10c5094>] sys_umount+0x27f/0x2a4 [<c10c50c6>] sys_oldumount+0xd/0xf [<c10031a4>] sysenter_do_call+0x12/0x38 ... This turns out to be a bug brought by an -rc1 patch ("nilfs2: simplify remaining sget() use"). In the patch, a new "put resource" function, nilfs_put_sbinfo() was introduced to delay freeing nilfs_sb_info struct. But the nilfs_put_sbinfo() mistakenly used atomic_dec_and_test() function to check the reference count, and it caused the nilfs_sb_info was freed when user mounted a snapshot twice. This bug also suggests there was unseen memory leak in usual mount /umount operations for nilfs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-08-19 02:10:13 +09:00
Zhang Qiang	1154ecbd2f	nilfs2: missing a read lock for segment writer in nilfs_attach_checkpoint() 'ns_cno' of structure 'the_nilfs' must be protected from segment writer, in other words, the caller of nilfs_get_checkpoint should hold read lock for nilfs->ns_segctor_sem. This patch adds the lock/unlock operations in nilfs_attach_checkpoint() when calling nilfs_cpfile_get_checkpoint(). Signed-off-by: Zhang Qiang <zhangqiang.buaa@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-08-18 17:32:27 +09:00
Ryusuke Konishi	01a261e09a	nilfs2: fix missing unlock in error path of nilfs_mdt_write_page This adds a missing unlock of nilfs->ns_writer_mutex in nilfs_mdt_write_page() function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-08-02 22:24:15 +09:00
Ryusuke Konishi	a97778457f	nilfs2: fix oops due to inconsistent state in page with discrete b-tree nodes Andrea Gelmini gave me a report that a kernel oops hit on a nilfs filesystem with a 1KB block size when doing rsync. This turned out to be caused by an inconsistency of dirty state between a page and its buffers storing b-tree node blocks. If the page had multiple buffers split over multiple logs, and if the logs were written at a time, a dirty flag remained in the page even every dirty flag in the buffers was cleared. This will fix the failure by dropping the dirty flag properly for pages with the discrete multiple b-tree nodes. Reported-by: Andrea Gelmini <andrea.gelmini@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Andrea Gelmini <andrea.gelmini@gmail.com> Cc: stable@kernel.org	2009-08-01 22:48:32 +09:00
Ryusuke Konishi	4fed598a49	fs/Kconfig: move nilfs2 out fs/Kconfig file was split into individual fs/*/Kconfig files before nilfs was merged. I've found the current config entry of nilfs is tainting the work. Sorry, I didn't notice. This fixes the violation. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Alexey Dobriyan <adobriyan@gmail.com>	2009-07-14 12:34:17 +09:00
Alexey Dobriyan	405f55712d	headers: smp_lock.h redux * Remove smp_lock.h from files which don't need it (including some headers!) * Add smp_lock.h to files which do need it * Make smp_lock.h include conditional in hardirq.h It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT This will make hardirq.h inclusion cheaper for every PREEMPT=n config (which includes allmodconfig/allyesconfig, BTW) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-12 12:22:34 -07:00
Jiro SEKIBA	d9a0a345ab	nilfs2: fix disorder in cp count on error during deleting checkpoints This fixes a bug that checkpoint count gets wrong on errors when deleting a series of checkpoints. The count error is persistent since the checkpoint count is stored on disk. Some userland programs refer to the count via ioctl, and this bugfix is needed to prevent malfunction of such programs. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable@kernel.org	2009-07-05 10:44:20 +09:00
Ryusuke Konishi	ff54de363a	nilfs2: fix lockdep warning between regular file and inode file This will fix the following false positive of recursive locking which lockdep has detected: ============================================= [ INFO: possible recursive locking detected ] 2.6.30-nilfs #42 --------------------------------------------- nilfs_cleanerd/10607 is trying to acquire lock: (&bmap->b_sem){++++-.}, at: [<e0d025b7>] nilfs_bmap_lookup_at_level+0x1a/0x74 [nilfs2] but task is already holding lock: (&bmap->b_sem){++++-.}, at: [<e0d024e0>] nilfs_bmap_truncate+0x19/0x6a [nilfs2] other info that might help us debug this: 2 locks held by nilfs_cleanerd/10607: #0: (&nilfs->ns_segctor_sem){++++.+}, at: [<e0d0d75a>] nilfs_transaction_begin+0xb6/0x10c [nilfs2] #1: (&bmap->b_sem){++++-.}, at: [<e0d024e0>] nilfs_bmap_truncate+0x19/0x6a [nilfs2] Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-07-05 10:44:20 +09:00
Ryusuke Konishi	4a52df7797	nilfs2: fix incorrect KERN_CRIT messages in case of write failures In case of write-failure retries, the following KERN_CRIT level messages are mistakenly output by nilfs_dat_commit_start() function: nilfs_dat_commit_start: vbn = 408463, start = 12506, end = 18446744073709551615, pbn = 530210 nilfs_dat_commit_start: vbn = 408515, start = 12506, end = 18446744073709551615, pbn = 530211 nilfs_dat_commit_start: vbn = 408464, start = 12506, end = 18446744073709551615, pbn = 530212 ... This suppresses these messages. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable@kernel.org	2009-07-05 10:44:20 +09:00
Ryusuke Konishi	8227b29722	nilfs2: fix hang problem of log writer which occurs after write failures Leandro Lucarella gave me a report that nilfs gets stuck after its write function fails. The problem turned out to be caused by bugs which leave writeback flag on pages. This fixes the problem by ensuring to clear the writeback flag in error path. Reported-by: Leandro Lucarella <llucax@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable@kernel.org	2009-07-05 10:44:20 +09:00
Ryusuke Konishi	0cfae3d879	nilfs2: remove unlikely directive causing mis-conversion of error code The following error code handling in nilfs_segctor_write() function wrongly converted negative error codes to a truth value (i.e. 1): err = unlikely(err) ? : res; which originaly meant to be err = err ? : res; This mis-conversion caused that write or sync functions receive the unexpected error code. This fixes the bug by removing the unlikely directive. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable@kernel.org	2009-07-05 10:44:19 +09:00
Al Viro	d441b1c293	switch nilfs2 to inode->i_acl Actually, get rid of private analog, since nothing in there is using ACLs at all so far. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:17:05 -04:00
Linus Torvalds	9c7cb99a82	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (22 commits) nilfs2: support contiguous lookup of blocks nilfs2: add sync_page method to page caches of meta data nilfs2: use device's backing_dev_info for btree node caches nilfs2: return EBUSY against delete request on snapshot nilfs2: modify list of unsupported features in caveats nilfs2: enable sync_page method nilfs2: set bio unplug flag for the last bio in segment nilfs2: allow future expansion of metadata read out via get info ioctl NILFS2: Pagecache usage optimization on NILFS2 nilfs2: remove nilfs_btree_operations from btree mapping nilfs2: remove nilfs_direct_operations from direct mapping nilfs2: remove bmap pointer operations nilfs2: remove useless b_low and b_high fields from nilfs_bmap struct nilfs2: remove pointless NULL check of bpop_commit_alloc_ptr function nilfs2: move get block functions in bmap.c into btree codes nilfs2: remove nilfs_bmap_delete_block nilfs2: remove nilfs_bmap_put_block nilfs2: remove header file for segment list operations nilfs2: eliminate removal list of segments nilfs2: add sufile function that can modify multiple segment usages ...	2009-06-15 09:13:49 -07:00
Ryusuke Konishi	aa7dfb8954	nilfs2: get rid of bd_mount_sem use from nilfs This will remove every bd_mount_sem use in nilfs. The intended exclusion control was replaced by the previous patch ("nilfs2: correct exclusion control in nilfs_remount function") for nilfs_remount(), and this patch will replace remains with a new mutex that this inserts in nilfs object. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:18 -04:00
Ryusuke Konishi	e59399d010	nilfs2: correct exclusion control in nilfs_remount function nilfs_remount() changes mount state of a superblock instance. Even though nilfs accesses other superblock instances during mount or remount, the mount state was not properly protected in nilfs_remount(). Moreover, nilfs_remount() has a lock order reversal problem; nilfs_get_sb() holds: 1. bdev->bd_mount_sem 2. sb->s_umount (sget acquires) and nilfs_remount() holds: 1. sb->s_umount (locked by the caller in vfs) 2. bdev->bd_mount_sem To avoid these problems, this patch divides a semaphore protecting super block instances from nilfs->ns_sem, and applies it to the mount state protection in nilfs_remount(). With this change, bd_mount_sem use is removed from nilfs_remount() and the lock order reversal will be resolved. And the new rw-semaphore, nilfs->ns_super_sem will properly protect the mount state except the modification from nilfs_error function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:18 -04:00
Ryusuke Konishi	6dd4740662	nilfs2: simplify remaining sget() use This simplifies the test function passed on the remaining sget() callsite in nilfs. Instead of checking mount type (i.e. ro-mount/rw-mount/snapshot mount) in the test function passed to sget(), this patch first looks up the nilfs_sb_info struct which the given mount type matches, and then acquires the super block instance holding the nilfs_sb_info. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:18 -04:00
Ryusuke Konishi	3f82ff5516	nilfs2: get rid of sget use for checking if current mount is present This stops using sget() for checking if an r/w-mount or an r/o-mount exists on the device. This elimination uses a back pointer to the current mount added to nilfs object. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:17 -04:00
Ryusuke Konishi	33c8e57c86	nilfs2: get rid of sget use for acquiring nilfs object This will change the way to obtain nilfs object in nilfs_get_sb() function. Previously, a preliminary sget() call was performed, and the nilfs object was acquired from a super block instance found by the sget() call. This patch, instead, instroduces a new dedicated function find_or_create_nilfs(); as the name implies, the function finds an existent nilfs object from a global list or creates a new one if no object is found on the device. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:17 -04:00
Ryusuke Konishi	81fc20bd0e	nilfs2: remove meaningless EBUSY case from nilfs_get_sb function The following EBUSY case in nilfs_get_sb() is meaningless. Indeed, this error code is never returned to the caller. if (!s->s_root) { ... } else if (!(s->s_flags & MS_RDONLY)) { err = -EBUSY; } This simply removes the else case. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:17 -04:00
Christoph Hellwig	d731e06323	nilfs2: call nilfs2_write_super from nilfs2_sync_fs The call to ->write_super from __sync_filesystem will go away, so make sure nilfs2 performs the same actions from inside ->sync_fs. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:17 -04:00
Alessio Igor Bogani	337eb00a2c	Push BKL down into ->remount_fs() [xfs, btrfs, capifs, shmem don't need BKL, exempt] Signed-off-by: Alessio Igor Bogani <abogani@texware.it> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:11 -04:00
Christoph Hellwig	6cfd014842	push BKL down into ->put_super Move BKL into ->put_super from the only caller. A couple of filesystems had trivial enough ->put_super (only kfree and NULLing of s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs, hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most of them probably don't need it, but I'd rather sort that out individually. Preferably after all the other BKL pushdowns in that area. [AV: original used to move lock_super() down as well; these changes are removed since we don't do lock_super() at all in generic_shutdown_super() now] [AV: fuse, btrfs and xfs are known to need no damn BKL, exempt] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:07 -04:00
Christoph Hellwig	8c85e12512	remove ->write_super call in generic_shutdown_super We just did a full fs writeout using sync_filesystem before, and if that's not enough for the filesystem it can perform it's own writeout in ->put_super, which many filesystems already do. Move a call to foofs_write_super into every foofs_put_super for now to guarantee identical behaviour until it's cleaned up by the individual filesystem maintainers. Exceptions: - affs already has identical copy & pasted code at the beginning of affs_put_super so no need to do it twice. - xfs does the right thing without it and I have changes pending for the xfs tree touching this are so I don't really need conflicts here.. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:06 -04:00
Linus Torvalds	c9059598ea	Merge branch 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits) block: add request clone interface (v2) floppy: fix hibernation ramdisk: remove long-deprecated "ramdisk=" boot-time parameter fs/bio.c: add missing __user annotation block: prevent possible io_context->refcount overflow Add serial number support for virtio_blk, V4a block: Add missing bounce_pfn stacking and fix comments Revert "block: Fix bounce limit setting in DM" cciss: decode unit attention in SCSI error handling code cciss: Remove no longer needed sendcmd reject processing code cciss: change SCSI error handling routines to work with interrupts enabled. cciss: separate error processing and command retrying code in sendcmd_withirq_core() cciss: factor out fix target status processing code from sendcmd functions cciss: simplify interface of sendcmd() and sendcmd_withirq() cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code cciss: Use schedule_timeout_uninterruptible in SCSI error handling code block: needs to set the residual length of a bidi request Revert "block: implement blkdev_readpages" block: Fix bounce limit setting in DM Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt ... Manually fix conflicts with tracing updates in: block/blk-sysfs.c drivers/ide/ide-atapi.c drivers/ide/ide-cd.c drivers/ide/ide-floppy.c drivers/ide/ide-tape.c include/trace/events/block.h kernel/trace/blktrace.c	2009-06-11 11:10:35 -07:00
Ryusuke Konishi	c3a7abf06c	nilfs2: support contiguous lookup of blocks Although get_block() callback function can return extent of contiguous blocks with bh->b_size, nilfs_get_block() function did not support this feature. This adds contiguous lookup feature to the block mapping codes of nilfs, and allows the nilfs_get_blocks() function to return the extent information by applying the feature. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:12 +09:00
Ryusuke Konishi	fa032744ad	nilfs2: add sync_page method to page caches of meta data This applies block_sync_page() function to the sync_page method of page caches for meta data files, gc page caches, and btree node buffers. This is a companion patch of ("nilfs2: enable sync_page mothod") which applied the function for data pages. This allows lock_page() for those meta data to unplug pending bio requests. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:12 +09:00
Ryusuke Konishi	a53b4751ae	nilfs2: use device's backing_dev_info for btree node caches Previously, default_backing_dev_info was used for the mapping of btree node caches. This uses device dependent backing_dev_info to allow detailed control of the device for the btree node pages. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:12 +09:00
Ryusuke Konishi	30c25be71f	nilfs2: return EBUSY against delete request on snapshot This helps userland programs like the rmcp command to distinguish error codes returned against a checkpoint removal request. Previously -EPERM was returned, and not discriminable from real permission errors. This also allows removal of the latest checkpoint because the deletion leads to create a new checkpoint, and thus it's harmless for the filesystem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:12 +09:00
Ryusuke Konishi	e85dc1d529	nilfs2: enable sync_page method This adds a missing sync_page method which unplugs bio requests when waiting for page locks. This will improve read performance of nilfs. Here is a measurement result using dd command. Without this patch: # mount -t nilfs2 /dev/sde1 /test # dd if=/test/aaa of=/dev/null bs=512k 1024+0 records in 1024+0 records out 536870912 bytes (537 MB) copied, 6.00688 seconds, 89.4 MB/s With this patch: # mount -t nilfs2 /dev/sde1 /test # dd if=/test/aaa of=/dev/null bs=512k 1024+0 records in 1024+0 records out 536870912 bytes (537 MB) copied, 3.54998 seconds, 151 MB/s Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:11 +09:00
Ryusuke Konishi	30bda0b8ae	nilfs2: set bio unplug flag for the last bio in segment This sets BIO_RW_UNPLUG flag on the last bio of each segment during write. The last bio should be unplugged immediately because the caller waits for the completion after the submission. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:11 +09:00
Ryusuke Konishi	003ff182fd	nilfs2: allow future expansion of metadata read out via get info ioctl Nilfs has some ioctl commands to read out metadata from meta data files: - NILFS_IOCTL_GET_CPINFO for checkpoint file, - NILFS_IOCTL_GET_SUINFO for segment usage file, and - NILFS_IOCTL_GET_VINFO for Disk Address Transalation (DAT) file, respectively. Every routine on these metadata files is implemented so that it allows future expansion of on-disk format. But, the above ioctl commands do not support expansion even though nilfs_argv structure can handle arbitrary size for data exchanged via ioctl. This allows future expansion of the following structures which give basic format of the "get information" ioctls: - struct nilfs_cpinfo - struct nilfs_suinfo - struct nilfs_vinfo So, this introduces forward compatility of such ioctl commands. In this patch, a sanity check in nilfs_ioctl_get_info() function is changed to accept larger data structure [1], and metadata read routines are rewritten so that they become compatible for larger structures; the routines will just ignore the remaining fields which the current version of nilfs doesn't know. [1] The ioctl function already has another upper limit (PAGE_SIZE against a structure, which appears in nilfs_ioctl_wrap_copy function), and this will not cause security problem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:11 +09:00
Hisashi Hifumi	258ef67e24	NILFS2: Pagecache usage optimization on NILFS2 Hi, I introduced "is_partially_uptodate" aops for NILFS2. A page can have multiple buffers and even if a page is not uptodate, some buffers can be uptodate on pagesize != blocksize environment. This aops checks that all buffers which correspond to a part of a file that we want to read are uptodate. If so, we do not have to issue actual read IO to HDD even if a page is not uptodate because the portion we want to read are uptodate. "block_is_partially_uptodate" function is already used by ext2/3/4. With the following patch random read/write mixed workloads or random read after random write workloads can be optimized and we can get performance improvement. I did a performance test using the sysbench. 1 --file-block-size=8K --file-total-size=2G --file-test-mode=rndrw --file-fsync-freq=0 --fil e-rw-ratio=1 run -2.6.30-rc5 Test execution summary: total time: 151.2907s total number of events: 200000 total time taken by event execution: 2409.8387 per-request statistics: min: 0.0000s avg: 0.0120s max: 0.9306s approx. 95 percentile: 0.0439s Threads fairness: events (avg/stddev): 12500.0000/238.52 execution time (avg/stddev): 150.6149/0.01 -2.6.30-rc5-patched Test execution summary: total time: 140.8828s total number of events: 200000 total time taken by event execution: 2240.8577 per-request statistics: min: 0.0000s avg: 0.0112s max: 0.8750s approx. 95 percentile: 0.0418s Threads fairness: events (avg/stddev): 12500.0000/218.43 execution time (avg/stddev): 140.0536/0.01 arch: ia64 pagesize: 16k Thanks. Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:11 +09:00
Ryusuke Konishi	7cde31d7d6	nilfs2: remove nilfs_btree_operations from btree mapping will remove indirect function calls using nilfs_btree_operations table. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:11 +09:00
Ryusuke Konishi	355c6b6103	nilfs2: remove nilfs_direct_operations from direct mapping will remove indirect function calls using nilfs_direct_operations table. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:11 +09:00
Ryusuke Konishi	d4b961576d	nilfs2: remove bmap pointer operations Previously, the bmap codes of nilfs used three types of function tables. The abuse of indirect function calls decreased source readability and suffered many indirect jumps which would confuse branch prediction of processors. This eliminates one type of the function tables, nilfs_bmap_ptr_operations, which was used to dispatch low level pointer operations of the nilfs bmap. This adds a new integer variable "b_ptr_type" to nilfs_bmap struct, and uses the value to select the pointer operations. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:10 +09:00
Ryusuke Konishi	3033342a0b	nilfs2: remove useless b_low and b_high fields from nilfs_bmap struct This will cut off 16 bytes from the nilfs_bmap struct which is embedded in the on-memory inode of nilfs. The b_high field was never used, and the b_low field stores a constant value which can be determined by whether the inode uses btree for block mapping or not. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:10 +09:00
Ryusuke Konishi	e473c1f265	nilfs2: remove pointless NULL check of bpop_commit_alloc_ptr function This indirect function is set to NULL only for gc cache inodes, but the gc cache inodes never call this function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:10 +09:00
Ryusuke Konishi	f198dbb9cf	nilfs2: move get block functions in bmap.c into btree codes Two get block function for btree nodes, nilfs_bmap_get_block() and nilfs_bmap_get_new_block(), are called only from the btree codes. This relocation will increase opportunities of compiler optimization. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:10 +09:00
Ryusuke Konishi	9f098900ad	nilfs2: remove nilfs_bmap_delete_block nilfs_bmap_delete_block() is a wrapper function calling nilfs_btnode_delete(). This removes it for simplicity. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:10 +09:00
Ryusuke Konishi	087d01b425	nilfs2: remove nilfs_bmap_put_block nilfs_bmap_put_block() is a wrapper function calling brelse(). This eliminates the wrapper for simplicity. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:10 +09:00
Ryusuke Konishi	654137dd46	nilfs2: remove header file for segment list operations This will eliminate obsolete list operations of nilfs_segment_entry structure which has been used to handle mutiple segment numbers. The patch ("nilfs2: remove list of freeing segments") removed use of the structure from the segment constructor code, and this patch simplifies the remaining code by integrating it into recovery.c. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:09 +09:00
Ryusuke Konishi	071cb4b819	nilfs2: eliminate removal list of segments This will clean up the removal list of segments and the related functions from segment.c and ioctl.c, which have hurt code readability. This elimination is applied by using nilfs_sufile_updatev() previously introduced in the patch ("nilfs2: add sufile function that can modify multiple segment usages"). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:09 +09:00
Ryusuke Konishi	dda54f4b87	nilfs2: add sufile function that can modify multiple segment usages This is a preparation for the later cleanup patch ("nilfs2: remove list of freeing segments"). This adds nilfs_sufile_updatev() to sufile, which can modify multiple segment usages at a time. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:09 +09:00
Ryusuke Konishi	d97a51a7e3	nilfs2: unify bmap operations starting use of indirect block address This simplifies some low level functions of bmap. Three bmap pointer operations, nilfs_bmap_start_v(), nilfs_bmap_commit_v(), and nilfs_bmap_abort_v(), are unified into one nilfs_bmap_start_v() function. And the related indirect function calls are replaced with it. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:09 +09:00
Ryusuke Konishi	6582207064	nilfs2: remove nilfs_dat_prepare_free function This function is unused. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:09 +09:00
Ryusuke Konishi	62013ab5d5	nilfs2: fix bh leak in nilfs_cpfile_delete_checkpoints function The nilfs_cpfile_delete_checkpoints() wrongly skips brelse() for the header block of checkpoint file in case of errors. This fixes the leak bug. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-05-30 22:07:50 +09:00
Martin K. Petersen	e1defc4ff0	block: Do away with the notion of hardsect_size Until now we have had a 1:1 mapping between storage device physical block size and the logical block sized used when addressing the device. With SATA 4KB drives coming out that will no longer be the case. The sector size will be 4KB but the logical block size will remain 512-bytes. Hence we need to distinguish between the physical block size and the logical ditto. This patch renames hardsect_size to logical_block_size. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-22 23:22:54 +02:00
Ryusuke Konishi	d504685363	nilfs2: fix memory leak in nilfs_ioctl_clean_segments This fixes a new memory leak problem in garbage collection. The problem was brought by the bugfix patch ("nilfs2: fix lock order reversal in nilfs_clean_segments ioctl"). Thanks to Kentaro Suzuki for finding this problem. Reported-by: Kentaro Suzuki <k_suzuki@ms.sylc.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-05-22 20:49:04 +09:00
Ryusuke Konishi	83aca8f480	nilfs2: check size of array structured data exchanged via ioctls Although some ioctls of nilfs2 exchange data in the form of indirectly referenced array, some of them lack size check on the array elements. This inserts the missing checks and rejects requests if data of ioctl does not have a valid format. We usually don't have to check size of structures that we associated with ioctl commands because the size is tested implicitly for identifying ioctl command; the checks this patch adds are for the cases where the implicit check is not applied. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-05-12 01:48:54 +09:00
Ryusuke Konishi	4f6b828837	nilfs2: fix lock order reversal in nilfs_clean_segments ioctl This is a companion patch to ("nilfs2: fix possible circular locking for get information ioctls"). This corrects lock order reversal between mm->mmap_sem and nilfs->ns_segctor_sem in nilfs_clean_segments() which was detected by lockdep check: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.30-rc3-nilfs-00003-g360bdc1 #7 ------------------------------------------------------- mmap/5294 is trying to acquire lock: (&nilfs->ns_segctor_sem){++++.+}, at: [<d0d0e846>] nilfs_transaction_begin+0xb6/0x10c [nilfs2] but task is already holding lock: (&mm->mmap_sem){++++++}, at: [<c043700a>] do_page_fault+0x1d8/0x30a which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&mm->mmap_sem){++++++}: [<c01470a5>] __lock_acquire+0x1066/0x13b0 [<c01474a9>] lock_acquire+0xba/0xdd [<c01836bc>] might_fault+0x68/0x88 [<c023c61d>] copy_from_user+0x2a/0x111 [<d0d120d0>] nilfs_ioctl_prepare_clean_segments+0x1d/0xf1 [nilfs2] [<d0d0e2aa>] nilfs_clean_segments+0x6d/0x1b9 [nilfs2] [<d0d11f68>] nilfs_ioctl+0x2ad/0x318 [nilfs2] [<c01a3be7>] vfs_ioctl+0x22/0x69 [<c01a408e>] do_vfs_ioctl+0x460/0x499 [<c01a4107>] sys_ioctl+0x40/0x5a [<c01031a4>] sysenter_do_call+0x12/0x38 [<ffffffff>] 0xffffffff -> #0 (&nilfs->ns_segctor_sem){++++.+}: [<c0146e0b>] __lock_acquire+0xdcc/0x13b0 [<c01474a9>] lock_acquire+0xba/0xdd [<c0433f1d>] down_read+0x2a/0x3e [<d0d0e846>] nilfs_transaction_begin+0xb6/0x10c [nilfs2] [<d0cfe0e5>] nilfs_page_mkwrite+0xe7/0x154 [nilfs2] [<c0183b0b>] __do_fault+0x165/0x376 [<c01855cd>] handle_mm_fault+0x287/0x5d1 [<c043712d>] do_page_fault+0x2fb/0x30a [<c0435462>] error_code+0x72/0x78 [<ffffffff>] 0xffffffff where nilfs_clean_segments() holds: nilfs->ns_segctor_sem -> copy_from_user() --> page fault -> mm->mmap_sem And, page fault path may hold: page fault -> mm->mmap_sem --> nilfs_page_mkwrite() -> nilfs->ns_segctor_sem Even though nilfs_clean_segments() does not perform write access on given user pages, it may cause deadlock because nilfs->ns_segctor_sem is shared per device and mm->mmap_sem can be shared with other tasks. To avoid this problem, this patch moves all calls of copy_from_user() outside the nilfs->ns_segctor_sem lock in the ioctl. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-05-11 14:54:41 +09:00
Ryusuke Konishi	47eb6b9c8f	nilfs2: fix possible circular locking for get information ioctls This is one of two patches which are to correct possible circular locking between mm->mmap_sem and nilfs->ns_segctor_sem. The problem was detected by lockdep check as follows: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.30-rc3-nilfs-00002-g3552613 #6 ------------------------------------------------------- mmap/5418 is trying to acquire lock: (&nilfs->ns_segctor_sem){++++.+}, at: [<d0d0e852>] nilfs_transaction_begin+0xb6/0x10c [nilfs2] but task is already holding lock: (&mm->mmap_sem){++++++}, at: [<c043700a>] do_page_fault+0x1d8/0x30a which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&mm->mmap_sem){++++++}: [<c01470a5>] __lock_acquire+0x1066/0x13b0 [<c01474a9>] lock_acquire+0xba/0xdd [<c01836bc>] might_fault+0x68/0x88 [<c023c730>] copy_to_user+0x2c/0xfc [<d0d11b4f>] nilfs_ioctl_wrap_copy+0x103/0x160 [nilfs2] [<d0d11fa9>] nilfs_ioctl+0x30a/0x3b0 [nilfs2] [<c01a3be7>] vfs_ioctl+0x22/0x69 [<c01a408e>] do_vfs_ioctl+0x460/0x499 [<c01a4107>] sys_ioctl+0x40/0x5a [<c01031a4>] sysenter_do_call+0x12/0x38 [<ffffffff>] 0xffffffff -> #0 (&nilfs->ns_segctor_sem){++++.+}: [<c0146e0b>] __lock_acquire+0xdcc/0x13b0 [<c01474a9>] lock_acquire+0xba/0xdd [<c0433f1d>] down_read+0x2a/0x3e [<d0d0e852>] nilfs_transaction_begin+0xb6/0x10c [nilfs2] [<d0cfe0e5>] nilfs_page_mkwrite+0xe7/0x154 [nilfs2] [<c0183b0b>] __do_fault+0x165/0x376 [<c01855cd>] handle_mm_fault+0x287/0x5d1 [<c043712d>] do_page_fault+0x2fb/0x30a [<c0435462>] error_code+0x72/0x78 [<ffffffff>] 0xffffffff other info that might help us debug this: 1 lock held by mmap/5418: #0: (&mm->mmap_sem){++++++}, at: [<c043700a>] do_page_fault+0x1d8/0x30a stack backtrace: Pid: 5418, comm: mmap Not tainted 2.6.30-rc3-nilfs-00002-g3552613 #6 Call Trace: [<c0432145>] ? printk+0xf/0x12 [<c0145c48>] print_circular_bug_tail+0xaa/0xb5 [<c0146e0b>] __lock_acquire+0xdcc/0x13b0 [<d0d10149>] ? nilfs_sufile_get_stat+0x1e/0x105 [nilfs2] [<c013b59a>] ? up_read+0x16/0x2c [<d0d10225>] ? nilfs_sufile_get_stat+0xfa/0x105 [nilfs2] [<c01474a9>] lock_acquire+0xba/0xdd [<d0d0e852>] ? nilfs_transaction_begin+0xb6/0x10c [nilfs2] [<c0433f1d>] down_read+0x2a/0x3e [<d0d0e852>] ? nilfs_transaction_begin+0xb6/0x10c [nilfs2] [<d0d0e852>] nilfs_transaction_begin+0xb6/0x10c [nilfs2] [<d0cfe0e5>] nilfs_page_mkwrite+0xe7/0x154 [nilfs2] [<c0183b0b>] __do_fault+0x165/0x376 [<c01855cd>] handle_mm_fault+0x287/0x5d1 [<c043700a>] ? do_page_fault+0x1d8/0x30a [<c013b54f>] ? down_read_trylock+0x39/0x43 [<c043712d>] do_page_fault+0x2fb/0x30a [<c0436e32>] ? do_page_fault+0x0/0x30a [<c0435462>] error_code+0x72/0x78 [<c0436e32>] ? do_page_fault+0x0/0x30a This makes the lock granularity of nilfs->ns_segctor_sem finer than that of the mmap semaphore for ioctl commands except nilfs_clean_segments(). The successive patch ("nilfs2: fix lock order reversal in nilfs_clean_segments ioctl") is required to fully resolve the problem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-05-11 12:57:46 +09:00
Ryusuke Konishi	843382370e	nilfs2: ensure to clear dirty state when deleting metadata file block This would fix the following failure during GC: nilfs_cpfile_delete_checkpoints: cannot delete block NILFS: GC failed during preparation: cannot delete checkpoints: err=-2 The problem was caused by a break in state consistency between page cache and btree; the above block was removed from the btree but the page buffering the block was remaining in the page cache in dirty state. This resolves the inconsistency by ensuring to clear dirty state of the page buffering the deleted block. Reported-by: David Arendt <admin@prnet.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-05-10 17:04:42 +09:00
Ryusuke Konishi	201913ed74	nilfs2: fix circular locking dependency of writer mutex This fixes the following circular locking dependency problem: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.30-rc3 #5 ------------------------------------------------------- segctord/3895 is trying to acquire lock: (&nilfs->ns_writer_mutex){+.+...}, at: [<d0d02172>] nilfs_mdt_get_block+0x89/0x20f [nilfs2] but task is already holding lock: (&bmap->b_sem){++++..}, at: [<d0d02d99>] nilfs_bmap_propagate+0x14/0x2e [nilfs2] which lock already depends on the new lock. The bugfix is done by replacing call sites of nilfs_get_writer() which are never called from read-only context with direct dereferencing of pointer to a writable FS-instance. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-05-09 13:36:57 +09:00
Ryusuke Konishi	85c2a74fab	nilfs2: fix possible recovery failure due to block creation without writer Some function calls in nilfs_prepare_segment_for_recovery() may fail because they can create blocks on meta data files without configuring a writable FS-instance. Concretely, nilfs_mdt_create_block() routine of meta data files will fail in that case. This fixes the problem by temporarily attaching a writable FS-instace during the function is called. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-05-09 13:36:56 +09:00
Ryusuke Konishi	c85399c2da	nilfs2: fix possible mismatch of sufile counters on recovery On-disk counters ndirtysegs and ncleansegs of sufile, can go wrong after roll-forward recovery because nilfs_prepare_segment_for_recovery() function marks segments dirty without adjusting value of these counters. This fixes the problem by adding a function to sufile which does the operation adjusting the counters, and by letting the recovery function use it. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:52 +09:00
Ryusuke Konishi	a703018f7b	nilfs2: segment usage file cleanups This will simplify sufile.c by sharing common code which repeatedly appears in routines updating a segment usage entry; a wrapper function nilfs_sufile_update() is introduced for the purpose, and counter modifications are integrated to a new function nilfs_sufile_mod_counter(). This is a preparation for the successive bugfix patch ("nilfs2: fix possible mismatch of sufile counters on recovery"). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:51 +09:00
Ryusuke Konishi	88072faf9a	nilfs2: fix wrong accounting and duplicate brelse in nilfs_sufile_set_error The nilfs_sufile_set_error() function wrongly adjusts the number of dirty segments instead of the number of clean segments. In addition, the function calls brelse() twice for the same buffer head. This fixes these bugs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:51 +09:00
Ryusuke Konishi	3efb55b496	nilfs2: simplify handling of active state of segments fix This fixes a bug of ("nilfs2: simplify handling of active state of segments") patch. The patch did not take account that a base index is increased in nilfs_sufile_get_suinfo() function if requested entries go across block boundary on sufile. Due to this bug, the active flag sometimes appears on wrong segments and has induced malfunction of garbage collection. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:51 +09:00
Ryusuke Konishi	e7a7402c0d	nilfs2: remove module version A MODULE_VERSION() macro has been used in out-of-tree nilfs modules, but it's needless and not updated in tree. So, this removes it along with the version declaration. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:50 +09:00
Ryusuke Konishi	c2698e50e3	nilfs2: fix lockdep recursive locking warning on meta data files This fixes the following false detection of lockdep against nilfs meta data files: ============================================= [ INFO: possible recursive locking detected ] 2.6.29 #26 --------------------------------------------- mount.nilfs2/4185 is trying to acquire lock: (&mi->mi_sem){----}, at: [<d0c7925b>] nilfs_sufile_get_stat+0x1e/0x105 [nilfs2] but task is already holding lock: (&mi->mi_sem){----}, at: [<d0c72026>] nilfs_count_free_blocks+0x48/0x84 [nilfs2] Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:50 +09:00
Ryusuke Konishi	bcb48891b0	nilfs2: fix lockdep recursive locking warning on bmap The bmap semaphore of DAT file can be held while a bmap of other files is locked. This has caused the following false detection of lockdep check: mount.nilfs2/4667 is trying to acquire lock: (&bmap->b_sem){..--}, at: [<d0c6c4b4>] nilfs_bmap_lookup_at_level+0x1a/0x74 [nilfs2] but task is already holding lock: (&bmap->b_sem){..--}, at: [<d0c6c4b4>] nilfs_bmap_lookup_at_level+0x1a/0x74 [nilfs2] This will fix the false detection by distinguishing semaphores of the DAT and other files. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:49 +09:00
Ryusuke Konishi	c306af23e1	nilfs2: return f_fsid for statfs2 This follows the change of Coly Li's series ("fs: return f_fsid for statfs(2)"), and make nilfs2 return f_fsid info for statfs(2). Acked-by: Coly Li <coly.li@suse.de> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:49 +09:00
Ryusuke Konishi	612392307c	nilfs2: support nanosecond timestamp After a review of user's feedback for finding out other compatibility issues, I found nilfs improperly initializes timestamps in inode; CURRENT_TIME was used there instead of CURRENT_TIME_SEC even though nilfs didn't have nanosecond timestamps on disk. A few users gave us the report that the tar program sometimes failed to expand symbolic links on nilfs, and it turned out to be the cause. Instead of applying the above displacement, I've decided to support nanosecond timestamps on this occation. Fortunetaly, a needless 64-bit field was in the nilfs_inode struct, and I found it's available for this purpose without impact for the users. So, this will do the enhancement and resolve the tar problem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:20 -07:00
Ryusuke Konishi	e339ad31f5	nilfs2: introduce secondary super block The former versions didn't have extra super blocks. This improves the weak point by introducing another super block at unused region in tail of the partition. This doesn't break disk format compatibility; older versions just ingore the secondary super block, and new versions just recover it if it doesn't exist. The partition created by an old mkfs may not have unused region, but in that case, the secondary super block will not be added. This doesn't make more redundant copies of the super block; it is a future work. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:20 -07:00
Ryusuke Konishi	cece552074	nilfs2: simplify handling of active state of segments will reduce some lines of segment constructor. Previously, the state was complexly controlled through a list of segments in order to keep consistency in meta data of usage state of segments. Instead, this presents ``calculated'' active flags to userland cleaner program and stop maintaining its real flag on disk. Only by this fake flag, the cleaner cannot exactly know if each segment is reclaimable or not. However, the recent extension of nilfs_sustat ioctl struct (nilfs2-extend-nilfs_sustat-ioctl-struct.patch) can prevent the cleaner from reclaiming in-use segment wrongly. So, now I can apply this for simplification. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:20 -07:00
Ryusuke Konishi	c96fa464a5	nilfs2: mark minor flag for checkpoint created by internal operation Nilfs creates checkpoints even for garbage collection or metadata updates such as checkpoint mode change. So, user often sees checkpoints created only by such internal operations. This is inconvenient in some situations. For example, application that monitors checkpoints and changes them to snapshots, will fall into an infinite loop because it cannot distinguish internally created checkpoints. This patch solves this sort of problem by adding a flag to checkpoint for identification. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:19 -07:00
Ryusuke Konishi	458c5b0822	nilfs2: clean up sketch file The sketch file is a file to mark checkpoints with user data. It was experimentally introduced in the original implementation, and now obsolete. The file was handled differently with regular files; the file size got truncated when a checkpoint was created. This stops the special treatment and will treat it as a regular file. Most users are not affected because mkfs.nilfs2 no longer makes this file. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:19 -07:00
Ryusuke Konishi	e626874685	nilfs2: super block operations fix endian bug This adds a missing endian conversion of checksum field in the super block. This fixes compatibility issue on big endian machines which will come to surface after supporting recovery of super block. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:19 -07:00
Ryusuke Konishi	1f5abe7e7d	nilfs2: replace BUG_ON and BUG calls triggerable from ioctl Pekka Enberg advised me: > It would be nice if BUG(), BUG_ON(), and panic() calls would be > converted to proper error handling using WARN_ON() calls. The BUG() > call in nilfs_cpfile_delete_checkpoints(), for example, looks to be > triggerable from user-space via the ioctl() system call. This will follow the comment and keep them to a minimum. Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:19 -07:00
Ryusuke Konishi	2c2e52fc4f	nilfs2: extend nilfs_sustat ioctl struct This adds a new argument to the nilfs_sustat structure. The extended field allows to delete volatile active state of segments, which was needed to protect freshly-created segments from garbage collection but has confused code dealing with segments. This extension alleviates the mess and gives room for further simplifications. The volatile active flag is not persistent, so it's eliminable on this occasion without affecting compatibility other than the ioctl change. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:19 -07:00
Ryusuke Konishi	7a9461939a	nilfs2: use unlocked_ioctl Pekka Enberg suggested converting ->ioctl operations to use ->unlocked_ioctl to avoid BKL. The conversion was verified to be safe, so I will take it on this occasion. Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:19 -07:00
Ryusuke Konishi	8082d36aed	nilfs2: remove compat ioctl code This removes compat code from the nilfs ioctls and applies the same function for both .ioctl and .compat_ioctl file operations. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:18 -07:00
Ryusuke Konishi	dc498d09be	nilfs2: use fixed sized types for ioctl structures Nilfs ioctl had structures not having fixed sized types such as: struct nilfs_argv { void *v_base; size_t v_nmembs; size_t v_size; int v_index; int v_flags; }; Further, some of them are wrongly aligned: e.g. struct nilfs_cpmode { __u64 cm_cno; int cm_mode; }; The size of wrongly aligned structures varies depending on architectures, and it breaks the identity of ioctl commands, which leads to arch dependent errors. Previously, these are compensated by using compat_ioctl. This fixes these problems and allows removal of compat ioctl. Since this will change sizes of those structures, binary compatibility for the past utilities will once break; new utilities have to be used instead. However, it would be helpful to avoid platform dependent problems in the long term. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:18 -07:00
Ryusuke Konishi	1088dcf4c3	nilfs2: remove timedwait ioctl command This removes NILFS_IOCTL_TIMEDWAIT command from ioctl interface along with the related flags and wait queue. The command is terrible because it just sleeps in the ioctl. I prefer to avoid this by devising means of event polling in userland program. By reconsidering the userland GC daemon, I found this is possible without changing behaviour of the daemon and sacrificing efficiency. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:18 -07:00
Ryusuke Konishi	76068c4ff1	nilfs2: fix buggy behavior seen in enumerating checkpoints This will fix the weird behavior of lscp command in listing continuously created checkpoints; the output of lscp is rewinded regularly for the recent nilfs. As a result of debugging, a defect was found in nilfs_cpfile_do_get_cpinfo() function. Though the function can be repeatedly called to enumerate checkpoints and it can skip invalid checkpoint entries, the index value was not carried between successive calls. The bug has long been present, and came to surface after applying a bugfix nilfs2-fix-problems-of-memory-allocation-in-ioctl.patch, which increased frequency of calling the function. The similar bugfix was already applied for ``snapshots'' by nilfs2-fix-gc-failure-on-volumes-keeping-numerous-snapshots.patch. This fixes the problem by making the index argument bidirectional on the function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:18 -07:00
Pekka Enberg	8acfbf0939	nilfs2: clean up indirect function calling conventions This cleans up the strange indirect function calling convention used in nilfs to follow the normal kernel coding style. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:17 -07:00
Ryusuke Konishi	7fa10d2001	nilfs2: fix improper return values of nilfs_get_cpinfo ioctl A few tool developers gave me requests for fixing inconvenient return value of nilfs_get_cpinfo() ioctl; if the requested mode is NILFS_SNAPSHOT and the specified start entry is not a snapshot, the ioctl unnaturally returns one as the number of acquired snapshot item. In addition, the ioctl function returns an ENOENT error for checkpoints within blocks deleted by garbage collection. These behaviors require corrections for programs which enumerate snapshots. This resolves the inconvenience by changing the return values to zero for the above cases. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:17 -07:00
Ryusuke Konishi	b028fcfc4c	nilfs2: fix gc failure on volumes keeping numerous snapshots This resolves the following failure of nilfs2 cleaner daemon: nilfs_cleanerd[20670]: cannot clean segments: No such file or directory nilfs_cleanerd[20670]: shutdown When creating thousands of snapshots, the cleaner daemon had rarely died as above due to an error returned from the kernel code. After applying the recent patch which fixed memory allocation problems in ioctl (Message-Id: <20081215.155840.105124170.ryusuke@osrg.net>), the problem gets more frequent. It turned out to be a bug of nilfs_ioctl_wrap_copy function and one of its callback routines to read out information of snapshots; if the nilfs_ioctl_wrap_copy function divided a large read request into multiple requests, the second and later requests have failed since a restart position on snapshot meta data was not properly set forward. It's a deficiency of the callback interface that cannot pass the restart position among multiple requests. This patch fixes the issue by allowing nilfs_ioctl_wrap_copy and snapshot read functions to exchange a position argument. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:17 -07:00
Ryusuke Konishi	047180f2d7	nilfs2: insert explanations in gcinode file The file gcinode.c gives buffer cache functions for on-disk blocks moved in garbage collection. Joern Engel has suggested inserting its explanations in the source file (Message-ID: <20080917144146.GD8750@logfs.org> and <20080917224953.GB14644@logfs.org>). This follows the comment. Cc: Joern Engel <joern@logfs.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:17 -07:00
Ryusuke Konishi	47420c7998	nilfs2: avoid double error caused by nilfs_transaction_end Pekka Enberg pointed out that double error handlings found after nilfs_transaction_end() can be avoided by separating abort operation: OK, I don't understand this. The only way nilfs_transaction_end() can fail is if we have NILFS_TI_SYNC set and we fail to construct the segment. But why do we want to construct a segment if we don't commit? I guess what I'm asking is why don't we have a separate nilfs_transaction_abort() function that can't fail for the erroneous case to avoid this double error value tracking thing? This does the separation and renames nilfs_transaction_end() to nilfs_transaction_commit() for clarification. Since, some calls of these functions were used just for exclusion control against the segment constructor, they are replaced with semaphore operations. Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:17 -07:00
Ryusuke Konishi	a2e7d2df82	nilfs2: cleanup nilfs_clear_inode This will remove the following unnecessary locks and cleanup code in nilfs_clear_inode(): - unnecessary protection using nilfs_transaction_begin() and nilfs_transaction_end(). - cleanup code of i_dirty list field which is never chained when this function is called. - spinlock used when releasing i_bh field. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:17 -07:00
Ryusuke Konishi	3358b4aaa8	nilfs2: fix problems of memory allocation in ioctl This is another patch for fixing the following problems of a memory copy function in nilfs2 ioctl: (1) It tries to allocate 128KB size of memory even for small objects. (2) Though the function repeatedly tries large memory allocations while reducing the size, GFP_NOWAIT flag is not specified. This increases the possibility of system memory shortage. (3) During the retries of (2), verbose warnings are printed because _GFP_NOWARN flag is not used for the kmalloc calls. The first patch was still doing large allocations by kmalloc which are repeatedly tried while reducing the size. Andi Kleen told me that using copy_from_user for large memory is not good from the viewpoint of preempt latency: On Fri, 12 Dec 2008 21:24:11 +0100, Andi Kleen <andi@firstfloor.org> wrote: > > In the current interface, each data item is copied twice: one is to > > the allocated memory from user space (via copy_from_user), and another > > For such large copies it is better to use multiple smaller (e.g. 4K) > copy user, that gives better real time preempt latencies. Each cfu has a > cond_resched(), but only one, not multiple times in the inner loop. He also advised me that: On Sun, 14 Dec 2008 16:13:27 +0100, Andi Kleen <andi@firstfloor.org> wrote: > Better would be if you could go to PAGE_SIZE. order 0 allocations > are typically the fastest / least likely to stall. > > Also in this case it's a good idea to use __get_free_pages() > directly, kmalloc tends to be become less efficient at larger > sizes. For the function in question, the size of buffer memory can be reduced since the buffer is repeatedly used for a number of small objects. On the other hand, it may incur large preempt latencies for larger buffer because a copy_from_user (and a copy_to_user) was applied only once each cycle. With that, this revision uses the order 0 allocations with __get_free_pages() to fix the original problems. Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:16 -07:00
Ryusuke Konishi	0c4fb87764	nilfs2: update makefile and Kconfig This adds a Makefile for the nilfs2 file system, and updates the makefile and Kconfig file in the file system directory. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:16 -07:00
Koji Sato	7942b919f7	nilfs2: ioctl operations This adds userland interface implemented with ioctl. Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:16 -07:00
Ryusuke Konishi	a3d93f709e	nilfs2: block cache for garbage collection This adds the cache of on-disk blocks to be moved in garbage collection. The disk blocks are held with dummy inodes (called gcinodes), and this file provides lookup function of the dummy inodes, and their buffer read function. Signed-off-by: Seiji Kihara <kihara.seiji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Yoshiji Amagai <amagai.yoshiji@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:16 -07:00
Ryusuke Konishi	84ef1ecfde	nilfs2: another dat for garbage collection NILFS2 uses another DAT inode during garbage collection to ensure atomicity and consistency of the DAT in the transient state. This twin inode is called GCDAT. This adds functions to initialize the GCDAT and to switch page caches and B-tree node caches between these two inodes. Signed-off-by: Seiji Kihara <kihara.seiji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Yoshiji Amagai <amagai.yoshiji@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:16 -07:00
Ryusuke Konishi	0f3e1c7f23	nilfs2: recovery functions This adds recovery function on mount. Usually the recovery is achieved by just finding the latest super root. When logs without checkpoints were appended for data sync operations after the latest super root, the recovery function will perform roll forwarding and reconstruct new log(s) with a super root. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:16 -07:00
Ryusuke Konishi	f30bf3e40f	nilfs2: fix missed-sync issue for do_sync_mapping_range() Chris Mason pointed out that there is a missed sync issue in nilfs_writepages(): On Wed, 17 Dec 2008 21:52:55 -0500, Chris Mason wrote: > It looks like nilfs_writepage ignores WB_SYNC_NONE, which is used by > do_sync_mapping_range(). where WB_SYNC_NONE in do_sync_mapping_range() was replaced with WB_SYNC_ALL by Nick's patch (commit: `ee53a891f4`). This fixes the problem by letting nilfs_writepages() write out the log of file data within the range if sync_mode is WB_SYNC_ALL. This involves removal of nilfs_file_aio_write() which was previously needed to ensure O_SYNC sync writes. Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:15 -07:00
Ryusuke Konishi	9ff05123e3	nilfs2: segment constructor This adds the segment constructor (also called log writer). The segment constructor collects dirty buffers for every dirty inode, makes summaries of the buffers, assigns disk block addresses to the buffers, and then submits BIOs for the buffers. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:15 -07:00
Ryusuke Konishi	64b5a32e0b	nilfs2: segment buffer This adds the segment buffer which is used to constuct logs. [akpm@linux-foundation.org: BIO_RW_SYNC got removed] Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:15 -07:00
Ryusuke Konishi	783f61843e	nilfs2: super block operations This adds super block operations for the nilfs2 file system. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:15 -07:00
Ryusuke Konishi	8a9d2191e9	nilfs2: operations for the_nilfs core object This adds functions on the_nilfs object, which keeps shared resources and states among a read/write mount and snapshots mounts going individually. the_nilfs is allocated per block device; it is created when user first mount a snapshot or a read/write mount on the device, then it is reused for successive mounts. It will be freed when all mount instances on the device are detached. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:15 -07:00
Ryusuke Konishi	d25006523d	nilfs2: pathname operations This adds pathname operations, most of which comes from the ext2 file system. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:15 -07:00
Yoshiji Amagai	2ba466d74e	nilfs2: directory entry operations This adds directory handling functions, most of which comes from the ext2 file system. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Yoshiji Amagai <amagai.yoshiji@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:15 -07:00
Ryusuke Konishi	f183ff4f05	nilfs2: file operations This adds primitives for regular file handling. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:14 -07:00
Ryusuke Konishi	05fe58fdc1	nilfs2: inode operations This adds inode level operations of the nilfs2 file system. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:14 -07:00
Koji Sato	6c98cd4ecb	nilfs2: segment usage file This adds a meta data file which stores the allocation state of segments. [konishi.ryusuke@lab.ntt.co.jp: fix wrong counting of checkpoints and dirty segments] Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:14 -07:00
Koji Sato	2961980972	nilfs2: checkpoint file This adds a meta data file which holds checkpoint entries in its data blocks. Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:14 -07:00
Ryusuke Konishi	43bfb45ed4	nilfs2: inode map file This adds a meta data file which stores on-disk inodes in its data blocks. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Yoshiji Amagai <amagai.yoshiji@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:14 -07:00
Koji Sato	a17564f58b	nilfs2: disk address translator This adds the disk address translation file (DAT) whose primary function is to convert virtual disk block numbers to actual disk block numbers. The virtual block numbers of NILFS are associated with checkpoint generation numbers, and this file also provides functions to manage the lifetime information of each virtual block number. Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:14 -07:00
Ryusuke Konishi	5442680fd2	nilfs2: persistent object allocator This adds common functions to allocate or deallocate entries with bitmaps on a meta data file. This feature is used by the DAT and ifile. Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Yoshiji Amagai <amagai.yoshiji@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:13 -07:00
Ryusuke Konishi	5eb563f5f2	nilfs2: meta data file This adds the meta data file, which serves common buffer functions to the DAT, sufile, cpfile, ifile, and so forth. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:13 -07:00
Ryusuke Konishi	0bd49f9446	nilfs2: buffer and page operations This adds common routines for buffer/page operations used in B-tree node caches, meta data files, or segment constructor (log writer). NILFS uses copy functions for buffers and pages due to the following reasons: 1) Relocation required for COW Since NILFS changes address of on-disk blocks, moving buffers in page cache is needed for the buffers which are not addressed by a file offset. If buffer size is smaller than page size, this involves partial copy of pages. 2) Freezing mmapped pages NILFS calculates checksums for each log to ensure its validity. If page data changes after the checksum calculation, this validity check will not work correctly. To avoid this failure for mmaped pages, NILFS freezes their data by copying. 3) Copy-on-write for DAT pages NILFS makes clones of DAT page caches in a copy-on-write manner during GC processes, and this ensures atomicity and consistency of the DAT in the transient state. In addition, NILFS uses two obsolete functions, nilfs_mark_buffer_dirty() and nilfs_clear_page_dirty() respectively. * nilfs_mark_buffer_dirty() was required to avoid NULL pointer dereference faults: Since the page cache of B-tree node pages or data page cache of pseudo inodes does not have a valid mapping->host, calling mark_buffer_dirty() for their buffers causes the fault; it calls __mark_inode_dirty(NULL) through __set_page_dirty(). * nilfs_clear_page_dirty() was needed in the two cases: 1) For B-tree node pages and data pages of the dat/gcdat, NILFS2 clears page dirty flags when it copies back pages from the cloned cache (gcdat->{i_mapping,i_btnode_cache}) to its original cache (dat->{i_mapping,i_btnode_cache}). 2) Some B-tree operations like insertion or deletion may dispose buffers in dirty state, and this needs to cancel the dirty state of their pages. clear_page_dirty_for_io() caused faults because it does not clear the dirty tag on the page cache. Signed-off-by: Seiji Kihara <kihara.seiji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:13 -07:00
Ryusuke Konishi	a60be987d4	nilfs2: B-tree node cache This adds routines for B-tree node buffers. Signed-off-by: Seiji Kihara <kihara.seiji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:13 -07:00
Koji Sato	36a580eb48	nilfs2: direct block mapping This adds block mappings using direct pointers which are stored in the i_bmap array of inode. Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:13 -07:00
Koji Sato	17c76b0104	nilfs2: B-tree based block mapping This adds declarations and functions of NILFS2 B-tree. Two variants are integrated in the NILFS2 B-tree. The B-tree for the most files points to the child nodes or data blocks with virtual block addresses, whereas the B-tree of the DAT uses actual block addresses. Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:13 -07:00
Koji Sato	bdb265eae0	nilfs2: integrated block mapping This adds structures and operations for the block mapping (bmap for short). NILFS2 uses direct mappings for short files or B-tree based mappings for longer files. Every on-disk data block is held with inodes and managed through this block mapping. The nilfs_bmap structure and a set of functions here provide this capability to the NILFS2 inode. [penberg@cs.helsinki.fi: remove a bunch of bmap wrapper macros] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:13 -07:00
Ryusuke Konishi	65b4643d3b	nilfs2: add inode and other major structures This adds the following common structures of the NILFS2 file system. * nilfs_inode_info structure: gives on-memory inode. * nilfs_sb_info structure: keeps per-mount state and a special inode for the ifile. This structure is attached to the super_block structure. * the_nilfs structure: keeps shared state and locks among a read/write mount and snapshot mounts. This keeps special inodes for the sufile, cpfile, dat, and another dat inode used during GC (gcdat). This also has a hash table of dummy inodes to cache disk blocks during GC (gcinodes). * nilfs_transaction_info structure: keeps per task state while nilfs is writing logs or doing indivisible inode or namespace operations. This structure is used to identify context during log making and store nest level of the lock which ensures atomicity of file system operations. Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:12 -07:00

... 2 3 4 5 6 ...

314 commits