linux

Commit Graph

Author	SHA1	Message	Date
Benny Halevy	508dc6e110	nfsd41: free_session/free_client must be called under the client_lock The session client is manipulated under the client_lock hence both free_session and nfsd4_del_conns must be called under this lock. This patch adds a BUG_ON that checks this condition in the respective functions and implements the missing locks. nfsd4_{get,put}_session helpers were moved to the C file that uses them so to prevent use from external files and an unlocked version of nfsd4_put_session is provided for external use from nfs4xdr.c Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-03-06 18:13:35 -05:00
Benny Halevy	e27f49c33b	nfsd41: refactor nfsd4_deleg_xgrade_none_ext logic out of nfsd4_process_open2 Handle the case where the nfsv4.1 client asked to upgrade or downgrade its delegations and the server returns no delegation. In this case, op_delegate_type is set to NFS4_OPEN_DELEGATE_NONE_EXT and op_why_no_deleg is set respectively to WND4_NOT_SUPP_{UP,DOWN}GRADE. Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-03-06 18:13:35 -05:00
Benny Halevy	4aa8913cb0	nfsd41: refactor nfs4_open_deleg_none_ext logic out of nfs4_open_delegation When a 4.1 client asks for a delegation and the server returns none op_delegate_type is set to NFS4_OPEN_DELEGATE_NONE_EXT and op_why_no_deleg is set to either WND4_CONTENTION or WND4_RESOURCE. Or, if the client sent a NFS4_SHARE_WANT_CANCEL (which it is not supposed to ever do until our server supports delegations signaling), op_why_no_deleg is set to WND4_CANCELLED. Note that for WND4_CONTENTION and WND4_RESOURCE, the xdr layer is hard coded at this time to encode boolean FALSE for ond_server_will_push_deleg / ond_server_will_signal_avail. Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-03-06 18:13:34 -05:00
J. Bruce Fields	a8ae08ebf1	nfsd4: fix recovery-entry leak nfsd startup failure Another leak on error Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-03-06 18:13:32 -05:00
Jeff Layton	a6d6b7811c	nfsd4: fix recovery-dir leak on nfsd startup failure The current code never calls nfsd4_shutdown_recdir if nfs4_state_start returns an error. Also, it's better to go ahead and consolidate these functions since one is just a trivial wrapper around the other. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-03-06 18:13:25 -05:00
J. Bruce Fields	393d8ed80f	nfsd4: purge stable client records with insufficient state To escape having your stable storage record purged at the end of the grace period, it's not sufficient to simply have performed a setclientid_confirm; you also need to meet the same requirements as someone creating a new record: either you should have done an open or open reclaim (in the 4.0 case) or a reclaim_complete (in the 4.1 case). Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-03-06 18:13:24 -05:00
J. Bruce Fields	1255a8f36c	nfsd4: don't set cl_firststate on first reclaim in 4.1 case We set cl_firststate when we first decide that a client will be permitted to reclaim state on next boot. This happens: - for new 4.0 clients, when they confirm their first open - for returning 4.0 clients, when they reclaim their first open - for 4.1+ clients, when they perform reclaim_complete We also use cl_firststate to decide whether a reclaim_complete has already been performed, in the 4.1+ case. We were setting it on 4.1 open reclaims, which caused spurious COMPLETE_ALREADY errors on RECLAIM_COMPLETE from an nfs4.1 client with anything to reclaim. Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-03-06 18:13:23 -05:00
Benny Halevy	d24433cdc9	nfsd41: implement NFS4_SHARE_WANT_NO_DELEG, NFS4_OPEN_DELEGATE_NONE_EXT, why_no_deleg Respect client request for not getting a delegation in NFSv4.1 Appropriately return delegation "type" NFS4_OPEN_DELEGATE_NONE_EXT and WND4_NOT_WANTED reason. [nfsd41: add missing break when encoding op_why_no_deleg] Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-02-17 18:38:53 -05:00
Bryan Schumaker	03cfb42025	NFSD: Clean up the test_stateid function When I initially wrote it, I didn't understand how lists worked so I wrote something that didn't use them. I think making a list of stateids to test is a more straightforward implementation, especially compared to decoding stateids while simultaneously encoding a reply to the client. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-02-17 18:38:52 -05:00
NeilBrown	de5b8e8e04	lockd: fix arg parsing for grace_period and timeout. If you try to set grace_period or timeout via a module parameter to lockd, and do this on a big-endian machine where sizeof(int) != sizeof(unsigned long), it won't work. The number given will be effectively shifted right by the difference in those two sizes. So cast kp->arg properly to get the correct result. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-02-17 18:38:51 -05:00
Benny Halevy	2c8bd7e0d1	nfsd41: split out share_access want and signal flags while decoding Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-02-17 18:38:42 -05:00
Benny Halevy	00b5f95a26	nfsd41: share_access_to_flags should consider only nfs4.x share_access flags Currently, it will not correctly ignore any nfsv4.1 signal flags if the client sends them. Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-02-17 11:50:36 -05:00
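The masking that commit 00b5f95a26 describes can be sketched roughly as below. This is only an illustration, not the nfsd code: every name is hypothetical, the want and signal bit ranges are assumed from the OPEN4_SHARE_ACCESS_* constants in RFC 5661, and the low-nibble access mask is an assumption as well.

```c
#include <stdint.h>

/* Hypothetical names. Bit layout assumed from RFC 5661's
 * OPEN4_SHARE_ACCESS_* constants; the low-nibble mask is an assumption. */
#define SHARE_ACCESS_READ   0x00000001u
#define SHARE_ACCESS_WRITE  0x00000002u
#define SHARE_ACCESS_BOTH   0x00000003u
#define SHARE_ACCESS_MASK   0x0000000Fu  /* access mode bits only */
#define SHARE_WANT_MASK     0x0000FF00u  /* 4.1 "want" delegation hints */
#define SHARE_SIGNAL_MASK   0x00030000u  /* 4.1 signal flags 0x10000 | 0x20000 */

/* Keep only the access mode, dropping any 4.1 want/signal bits. */
uint32_t share_access_only(uint32_t share_access)
{
    return share_access & SHARE_ACCESS_MASK;
}

/* Extract the 4.1 "want" hint so it can be handled separately. */
uint32_t share_want_only(uint32_t share_access)
{
    return share_access & SHARE_WANT_MASK;
}
```

With this split, a request carrying both access bits plus a want hint and a signal flag still maps to plain read/write access.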
Tigran Mkrtchyan	37c593c573	nfsd41: use current stateid by value Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-02-15 11:20:45 -05:00
Tigran Mkrtchyan	9428fe1abb	nfsd41: consume current stateid on DELEGRETURN and OPENDOWNGRADE Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-02-15 11:20:44 -05:00
Tigran Mkrtchyan	1e97b5190d	nfsd41: handle current stateid in SETATTR and FREE_STATEID Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-02-15 11:20:43 -05:00
Tigran Mkrtchyan	d14710532f	nfsd41: mark LOOKUP, LOOKUPP and CREATE to invalidate current stateid Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-02-15 11:20:42 -05:00
Tigran Mkrtchyan	8307111476	nfsd41: save and restore current stateid with current fh Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-02-15 11:20:41 -05:00
Tigran Mkrtchyan	80e01cc1e2	nfsd41: mark PUTFH, PUTPUBFH and PUTROOTFH to clear current stateid Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-02-15 11:20:41 -05:00
Tigran Mkrtchyan	30813e2773	nfsd41: consume current stateid on read and write Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-02-15 11:20:40 -05:00
Tigran Mkrtchyan	62cd4a591c	nfsd41: handle current stateid on lock and locku Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-02-15 11:20:39 -05:00
Tigran Mkrtchyan	8b70484c67	nfsd41: handle current stateid in open and close Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-02-15 11:20:38 -05:00
Tigran Mkrtchyan	19ff0f288c	nfsd4: initialize current stateid at compile time Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-02-15 11:20:29 -05:00
J. Bruce Fields	bf5c43c8f1	nfsd4: check for uninitialized slot This fixes an oops when a buggy client tries to use an initial seqid of 0 on a new slot, which we may misinterpret as a replay. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-02-14 17:01:58 -05:00
J. Bruce Fields	73e79482b4	nfsd4: rearrange struct nfsd4_slot Combine two booleans into a single flag field, move the smaller fields to the end. (In practice this doesn't make the struct any smaller. But we'll be adding another flag here soon.) Remove some debugging code that doesn't look useful, while we're in the neighborhood. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-02-14 17:01:29 -05:00
J. Bruce Fields	f6d82485e9	nfsd4: fix sessions slotid wraparound logic From RFC 5661 2.10.6.1: "If the previous sequence ID was 0xFFFFFFFF, then the next request for the slot MUST have the sequence ID set to zero." While we're there, delete some redundant comments. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-02-13 16:15:18 -05:00
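The wraparound rule quoted from RFC 5661 in commit f6d82485e9 falls out of unsigned 32-bit arithmetic. The sketch below is a minimal illustration under that assumption; the function name is hypothetical and this is not the nfsd implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when cur is the sequence ID that must follow
 * prev on a slot. Unsigned arithmetic wraps, so 0xFFFFFFFF is followed by 0. */
bool seqid_is_next(uint32_t prev, uint32_t cur)
{
    return cur == (uint32_t)(prev + 1u);
}
```

Relying on the modular behavior of uint32_t avoids a special case for the maximum value.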
J. Bruce Fields	508f922756	nfsd: fix default iosize calculation on 32bit The rpc buffers will be allocated out of low memory, so we should really only be taking that into account. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-02-03 15:33:17 -05:00
J. Bruce Fields	87b0fc7deb	nfsd: cleanup setting of default max_block_size Move calculation of the default into a helper function. Get rid of an unused variable "err" while we're there. Thanks to Mi Jinlong for catching an arithmetic error in a previous version. Cc: Mi Jinlong <mijinlong@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-02-03 15:32:41 -05:00
Dan Carpenter	3476964dba	nfsd: remove some unneeded checks We check for zero length strings in the caller now, so these aren't needed. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-02-03 14:26:42 -05:00
Linus Torvalds	d3712b9dfc	Pull request from git://github.com/prasad-joshi/logfs_upstream.git There are few important bug fixes for LogFS Shortlog: Joern Engel (5): logfs: Prevent memory corruption logfs: remove useless BUG_ON logfs: Free areas before calling generic_shutdown_super() logfs: Grow inode in delete path Logfs: Allow NULL block_isbad() methods Prasad Joshi (5): logfs: update page reference count for pined pages logfs: take write mutex lock during fsync and sync logfs: set superblock shutdown flag after generic sb shutdown logfs: Propagate page parameter to __logfs_write_inode MAINTAINERS: Add Prasad Joshi in LogFS maintiners Diffstat: MAINTAINERS | 1 + fs/logfs/dev_mtd.c | 26 +++++++++++------------- fs/logfs/dir.c | 2 +- fs/logfs/file.c | 2 + fs/logfs/gc.c | 2 +- fs/logfs/inode.c | 4 ++- fs/logfs/journal.c | 1 - fs/logfs/logfs.h | 5 +++- fs/logfs/readwrite.c | 51 +++++++++++++++++++++++++++++++++---------------- fs/logfs/segment.c | 51 ++++++++++++++++++++++++++++++++++++++----------- fs/logfs/super.c | 3 +- 11 files changed, 99 insertions(+), 49 deletions(-) Merge tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream There are few important bug fixes for LogFS * tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream: Logfs: Allow NULL block_isbad() methods logfs: Grow inode in delete path logfs: Free areas before calling generic_shutdown_super() logfs: remove useless BUG_ON MAINTAINERS: Add Prasad Joshi in LogFS maintiners logfs: Propagate page parameter to __logfs_write_inode logfs: set superblock shutdown flag after generic sb shutdown logfs: take write mutex lock during fsync and sync logfs: Prevent memory corruption logfs: update page reference count for pined pages Fix up conflict in fs/logfs/dev_mtd.c due to semantic change in what "mtd->block_isbad" means in commit f2933e86ad93: "Logfs: Allow NULL block_isbad() methods" clashing with the abstraction changes in the commits 7086c19d0742: "mtd: introduce mtd_block_isbad interface" and d58b27ed58a3: "logfs: do not use 'mtd->block_isbad' directly". This resolution takes the semantics from commit f2933e86ad, and just makes mtd_block_isbad() return zero (false) if the 'block_isbad' function is NULL. But that also means that now "mtd_can_have_bb()" always returns 0. Now, "mtd_block_markbad()" will obviously return an error if the low-level driver doesn't support bad blocks, so this is somewhat non-symmetric, but it actually makes sense if a NULL "block_isbad" function is considered to mean "I assume that all my blocks are always good".	2012-01-31 09:23:59 -08:00
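The conflict resolution in merge d3712b9dfc treats a missing block_isbad hook as "all blocks are good". A minimal sketch of that idea follows; struct flash_dev, flash_block_isbad and always_bad are hypothetical stand-ins for illustration, not the MTD API.

```c
#include <stddef.h>

/* Hypothetical stand-in for a flash driver; only the one hook is modeled. */
struct flash_dev {
    int (*block_isbad)(struct flash_dev *dev, long long ofs);
};

/* Wrapper: a NULL hook means the driver has no notion of bad blocks,
 * so report the block as good (0) instead of failing. */
int flash_block_isbad(struct flash_dev *dev, long long ofs)
{
    if (dev->block_isbad == NULL)
        return 0;
    return dev->block_isbad(dev, ofs);
}

/* Example hook for a driver that reports every block as bad. */
int always_bad(struct flash_dev *dev, long long ofs)
{
    (void)dev;
    (void)ofs;
    return 1;
}
```

Callers then never need to test the function pointer themselves, which is the point of routing access through a wrapper.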
Linus Torvalds	0a96265754	Merge tag 'driver-core-3.3-rc1-bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Here are some patches for the 3.3-rc1 tree. It contains the removal of the sysdev code, now that all users of it are gone, as well as some sysfs bugfixes that have been reported by users. There are also some documentation updates here as well. * tag 'driver-core-3.3-rc1-bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: sysfs: Complain bitterly about attempts to remove files from nonexistent directories. stable: update documentation to ask for kernel version base/core.c:fix typo in comment in function device_add Documentation: devres: add allocation functions to list of supported calls Documentation update for the driver model core kernel-doc: fix new warnings in driver-core kernel-doc: fix new warnings in debugfs kernel-doc: fix new warnings in device.h driver core: remove drivers/base/sys.c and include/linux/sysdev.h	2012-01-28 18:20:48 -08:00
Linus Torvalds	67d2433ee7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix reservations in btrfs_page_mkwrite Btrfs: advance window_start if we're using a bitmap btrfs: mask out gfp flags in releasepage Btrfs: fix enospc error caused by wrong checks of the chunk Btrfs: do not defrag a file partially Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c Btrfs: use cluster->window_start when allocating from a cluster bitmap Btrfs: Check for NULL page in extent_range_uptodate btrfs: Fix busyloops in transaction waiting code Btrfs: make sure a bitmap has enough bytes Btrfs: fix uninit warning in backref.c	2012-01-28 17:00:19 -08:00
Joern Engel	f2933e86ad	Logfs: Allow NULL block_isbad() methods Not all mtd drivers define block_isbad(). Let's assume no bad blocks instead of refusing to mount. Signed-off-by: Joern Engel <joern@logfs.org>	2012-01-28 11:43:40 +05:30
Joern Engel	bbe0138712	logfs: Grow inode in delete path Can be necessary if an inode gets deleted (through -ENOSPC) before being written. Might be better to move this into logfs_write_rec(), but for now go with the stupid&safe patch. Signed-off-by: Joern Engel <joern@logfs.org>	2012-01-28 11:43:07 +05:30
Joern Engel	1bcceaff8c	logfs: Free areas before calling generic_shutdown_super() Or hit an assertion in map_invalidatepage() instead. Signed-off-by: Joern Engel <joern@logfs.org>	2012-01-28 11:42:39 +05:30
Joern Engel	6c69494f6b	logfs: remove useless BUG_ON It prevents write sizes >4k. Signed-off-by: Joern Engel <joern@logfs.org>	2012-01-28 11:41:56 +05:30
Prasad Joshi	0bd90387ed	logfs: Propagate page parameter to __logfs_write_inode During GC LogFS has to rewrite each valid block to a separate segment. Rewrite operation reads data from an old segment and writes it to a newly allocated segment. Since every write operation changes data block pointers maintained in inode, inode should also be rewritten. In GC path to avoid AB-BA deadlock LogFS marks a page with PG_pre_locked in addition to locking the page (PG_locked). The page lock is ignored iff the page is pre-locked. LogFS uses a special file called segment file. The segment file maintains an 8-byte entry for every segment. It keeps track of erase count, level etc. for every segment. Bad things happen when a segment belonging to the segment file is GCed ------------[ cut here ]------------ kernel BUG at /home/prasad/logfs/readwrite.c:297! invalid opcode: 0000 [#1] SMP Modules linked in: logfs joydev usbhid hid psmouse e1000 i2c_piix4 serio_raw [last unloaded: logfs] Pid: 20161, comm: mount Not tainted 3.1.0-rc3+ #3 innotek GmbH VirtualBox EIP: 0060:[<f809132a>] EFLAGS: 00010292 CPU: 0 EIP is at logfs_lock_write_page+0x6a/0x70 [logfs] EAX: 00000027 EBX: f73f5b20 ECX: c16007c8 EDX: 00000094 ESI: 00000000 EDI: e59be6e4 EBP: c7337b28 ESP: c7337b18 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process mount (pid: 20161, ti=c7336000 task=eb323f70 task.ti=c7336000) Stack: f8099a3d c7337b24 f73f5b20 00001002 c7337b50 f8091f6d f8099a4d f80994e4 00000003 00000000 c7337b68 00000000 c67e4400 00001000 c7337b80 f80935e5 00000000 00000000 00000000 00000000 e1fcf000 0000000f e59be618 c70bf900 Call Trace: [<f8091f6d>] logfs_get_write_page.clone.16+0xdd/0x100 [logfs] [<f80935e5>] logfs_mod_segment_entry+0x55/0x110 [logfs] [<f809460d>] logfs_get_segment_entry+0x1d/0x20 [logfs] [<f8091060>] ? logfs_cleanup_journal+0x50/0x50 [logfs] [<f809521b>] ostore_get_erase_count+0x1b/0x40 [logfs] [<f80965b8>] logfs_open_area+0xc8/0x150 [logfs] [<c141a7ec>] ? kmemleak_alloc+0x2c/0x60 [<f809668e>] __logfs_segment_write.clone.16+0x4e/0x1b0 [logfs] [<c10dd563>] ? mempool_kmalloc+0x13/0x20 [<c10dd563>] ? mempool_kmalloc+0x13/0x20 [<f809696f>] logfs_segment_write+0x17f/0x1d0 [logfs] [<f8092e8c>] logfs_write_i0+0x11c/0x180 [logfs] [<f8092f35>] logfs_write_direct+0x45/0x90 [logfs] [<f80934cd>] __logfs_write_buf+0xbd/0xf0 [logfs] [<c102900e>] ? kmap_atomic_prot+0x4e/0xe0 [<f809424b>] logfs_write_buf+0x3b/0x60 [logfs] [<f80947a9>] __logfs_write_inode+0xa9/0x110 [logfs] [<f8094cb0>] logfs_rewrite_block+0xc0/0x110 [logfs] [<f8095300>] ? get_mapping_page+0x10/0x60 [logfs] [<f8095aa0>] ? logfs_load_object_aliases+0x2e0/0x2f0 [logfs] [<f808e57d>] logfs_gc_segment+0x2ad/0x310 [logfs] [<f808e62a>] __logfs_gc_once+0x4a/0x80 [logfs] [<f808ed43>] logfs_gc_pass+0x683/0x6a0 [logfs] [<f8097a89>] logfs_mount+0x5a9/0x680 [logfs] [<c1126b21>] mount_fs+0x21/0xd0 [<c10f6f6f>] ? __alloc_percpu+0xf/0x20 [<c113da41>] ? alloc_vfsmnt+0xb1/0x130 [<c113db4b>] vfs_kern_mount+0x4b/0xa0 [<c113e06e>] do_kern_mount+0x3e/0xe0 [<c113f60d>] do_mount+0x34d/0x670 [<c10f2749>] ? strndup_user+0x49/0x70 [<c113fcab>] sys_mount+0x6b/0xa0 [<c142d87c>] syscall_call+0x7/0xb Code: f8 e8 8b 93 39 c9 8b 45 f8 3e 0f ba 28 00 19 d2 85 d2 74 ca eb d0 0f 0b 8d 45 fc 89 44 24 04 c7 04 24 3d 9a 09 f8 e8 09 92 39 c9 <0f> 0b 8d 74 26 00 55 89 e5 3e 8d 74 26 00 8b 10 80 e6 01 74 09 EIP: [<f809132a>] logfs_lock_write_page+0x6a/0x70 [logfs] SS:ESP 0068:c7337b18 ---[ end trace 96e67d5b3aa3d6ca ]--- The patch passes locked page to __logfs_write_inode. It calls function logfs_get_wblocks() to pre-lock the page. This ensures any further attempts to lock the page are ignored (esp from get_erase_count). Acked-by: Joern Engel <joern@logfs.org> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>	2012-01-28 11:38:25 +05:30
Prasad Joshi	ecfd890991	logfs: set superblock shutdown flag after generic sb shutdown While unmounting the file system LogFS calls generic_shutdown_super. The function does file system independent superblock shutdown. However, it might result in a call to file system specific inode eviction. LogFS marks FS shutting down by setting bit LOGFS_SB_FLAG_SHUTDOWN in super->s_flags. Since inode eviction might call truncate on inode, following BUG is observed when file system is unmounted: ------------[ cut here ]------------ kernel BUG at /home/prasad/logfs/segment.c:362! invalid opcode: 0000 [#1] PREEMPT SMP CPU 3 Modules linked in: logfs binfmt_misc ppdev virtio_blk parport_pc lp parport psmouse floppy virtio_pci serio_raw virtio_ring virtio Pid: 1933, comm: umount Not tainted 3.0.0+ #4 Bochs Bochs RIP: 0010:[<ffffffffa008c841>] [<ffffffffa008c841>] logfs_segment_write+0x211/0x230 [logfs] RSP: 0018:ffff880062d7b9e8 EFLAGS: 00010202 RAX: 000000000000000e RBX: ffff88006eca9000 RCX: 0000000000000000 RDX: ffff88006fd87c40 RSI: ffffea00014ff468 RDI: ffff88007b68e000 RBP: ffff880062d7ba48 R08: 8000000020451430 R09: 0000000000000000 R10: dead000000100100 R11: 0000000000000000 R12: ffff88006fd87c40 R13: ffffea00014ff468 R14: ffff88005ad0a460 R15: 0000000000000000 FS: 00007f25d50ea760(0000) GS:ffff88007fd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000d05e48 CR3: 0000000062c72000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process umount (pid: 1933, threadinfo ffff880062d7a000, task ffff880070b44500) Stack: ffff880062d7ba38 ffff88005ad0a508 0000000000001000 0000000000000000 8000000020451430 ffffea00014ff468 ffff880062d7ba48 ffff88005ad0a460 ffff880062d7bad8 ffffea00014ff468 ffff88006fd87c40 0000000000000000 Call Trace: [<ffffffffa0088fee>] logfs_write_i0+0x12e/0x190 [logfs] [<ffffffffa0089360>] __logfs_write_rec+0x140/0x220 [logfs] [<ffffffffa0089312>] __logfs_write_rec+0xf2/0x220 [logfs] [<ffffffffa00894a4>] logfs_write_rec+0x64/0xd0 [logfs] [<ffffffffa0089616>] __logfs_write_buf+0x106/0x110 [logfs] [<ffffffffa008a19e>] logfs_write_buf+0x4e/0x80 [logfs] [<ffffffffa008a6b8>] __logfs_write_inode+0x98/0x110 [logfs] [<ffffffffa008a7c4>] logfs_truncate+0x54/0x290 [logfs] [<ffffffffa008abfc>] logfs_evict_inode+0xdc/0x190 [logfs] [<ffffffff8115eef5>] evict+0x85/0x170 [<ffffffff8115f126>] iput+0xe6/0x1b0 [<ffffffff8115b4a8>] shrink_dcache_for_umount_subtree+0x218/0x280 [<ffffffff8115ce91>] shrink_dcache_for_umount+0x51/0x90 [<ffffffff8114796c>] generic_shutdown_super+0x2c/0x100 [<ffffffffa008cc47>] logfs_kill_sb+0x57/0xf0 [logfs] [<ffffffff81147de5>] deactivate_locked_super+0x45/0x70 [<ffffffff811487ea>] deactivate_super+0x4a/0x70 [<ffffffff81163934>] mntput_no_expire+0xa4/0xf0 [<ffffffff8116469f>] sys_umount+0x6f/0x380 [<ffffffff814dd46b>] system_call_fastpath+0x16/0x1b Code: 55 c8 49 8d b6 a8 00 00 00 45 89 f9 45 89 e8 4c 89 e1 4c 89 55 b8 c7 04 24 00 00 00 00 e8 68 fc ff ff 4c 8b 55 b8 e9 3c ff ff ff <0f> 0b 0f 0b c7 45 c0 00 00 00 00 e9 44 fe ff ff 66 66 66 66 66 RIP [<ffffffffa008c841>] logfs_segment_write+0x211/0x230 [logfs] RSP <ffff880062d7b9e8> ---[ end trace fe6b040cea952290 ]--- Therefore, move super->s_flags setting after the fs-independent work has been finished. Reviewed-by: Joern Engel <joern@logfs.org> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>	2012-01-28 11:37:47 +05:30
Prasad Joshi	13ced29cb2	logfs: take write mutex lock during fsync and sync LogFS uses super->s_write_mutex while writing data to disk. Taking the same mutex lock in sync and fsync code path solves the following BUG: ------------[ cut here ]------------ kernel BUG at /home/prasad/logfs/dev_bdev.c:134! Pid: 2387, comm: flush-253:16 Not tainted 3.0.0+ #4 Bochs Bochs RIP: 0010:[<ffffffffa007deed>] [<ffffffffa007deed>] bdev_writeseg+0x25d/0x270 [logfs] Call Trace: [<ffffffffa007c381>] logfs_open_area+0x91/0x150 [logfs] [<ffffffff8128dcb2>] ? find_level.clone.9+0x62/0x100 [<ffffffffa007c49c>] __logfs_segment_write.clone.20+0x5c/0x190 [logfs] [<ffffffff810ef005>] ? mempool_kmalloc+0x15/0x20 [<ffffffff810ef383>] ? mempool_alloc+0x53/0x130 [<ffffffffa007c7a4>] logfs_segment_write+0x1d4/0x230 [logfs] [<ffffffffa0078f8e>] logfs_write_i0+0x12e/0x190 [logfs] [<ffffffffa0079300>] __logfs_write_rec+0x140/0x220 [logfs] [<ffffffffa0079444>] logfs_write_rec+0x64/0xd0 [logfs] [<ffffffffa00795b6>] __logfs_write_buf+0x106/0x110 [logfs] [<ffffffffa007a13e>] logfs_write_buf+0x4e/0x80 [logfs] [<ffffffffa0073e33>] __logfs_writepage+0x23/0x80 [logfs] [<ffffffffa007410c>] logfs_writepage+0xdc/0x110 [logfs] [<ffffffff810f5ba7>] __writepage+0x17/0x40 [<ffffffff810f6208>] write_cache_pages+0x208/0x4f0 [<ffffffff810f5b90>] ? set_page_dirty+0x70/0x70 [<ffffffff810f653a>] generic_writepages+0x4a/0x70 [<ffffffff810f75d1>] do_writepages+0x21/0x40 [<ffffffff8116b9d1>] writeback_single_inode+0x101/0x250 [<ffffffff8116bdbd>] writeback_sb_inodes+0xed/0x1c0 [<ffffffff8116c5fb>] writeback_inodes_wb+0x7b/0x1e0 [<ffffffff8116cc23>] wb_writeback+0x4c3/0x530 [<ffffffff814d984d>] ? sub_preempt_count+0x9d/0xd0 [<ffffffff8116cd6b>] wb_do_writeback+0xdb/0x290 [<ffffffff814d984d>] ? sub_preempt_count+0x9d/0xd0 [<ffffffff814d6208>] ? _raw_spin_unlock_irqrestore+0x18/0x40 [<ffffffff8105aa5a>] ? del_timer+0x8a/0x120 [<ffffffff8116cfac>] bdi_writeback_thread+0x8c/0x2e0 [<ffffffff8116cf20>] ? wb_do_writeback+0x290/0x290 [<ffffffff8106d2e6>] kthread+0x96/0xa0 [<ffffffff814de514>] kernel_thread_helper+0x4/0x10 [<ffffffff8106d250>] ? kthread_worker_fn+0x190/0x190 [<ffffffff814de510>] ? gs_change+0xb/0xb RIP [<ffffffffa007deed>] bdev_writeseg+0x25d/0x270 [logfs] ---[ end trace 0211ad60a57657c4 ]--- Reviewed-by: Joern Engel <joern@logfs.org> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>	2012-01-28 11:36:06 +05:30
Joern Engel	934eed395d	logfs: Prevent memory corruption This is a bad one. I wonder whether we were so far protected by no_free_segments(sb) usually being smaller than LOGFS_NO_AREAS. Found by Dan Carpenter <dan.carpenter@oracle.com> using smatch. Signed-off-by: Joern Engel <joern@logfs.org> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>	2012-01-28 11:24:21 +05:30
Prasad Joshi	96150606e2	logfs: update page reference count for pined pages LogFS sets PG_private flag to indicate a pinned page. We assumed that marking a page as private is enough to ensure its existence. But instead it is necessary to hold a reference count to the page. The change resolves the following BUG BUG: Bad page state in process flush-253:16 pfn:6a6d0 page flags: 0x100000000000808(uptodate|private) Suggested-and-Acked-by: Joern Engel <joern@logfs.org> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>	2012-01-28 11:23:10 +05:30
Chris Mason	9998eb7034	Btrfs: fix reservations in btrfs_page_mkwrite Josef fixed btrfs_page_mkwrite to properly release reserved extents if there was an error. But if we fail to get a reservation and we fail to dirty the inode (for ENOSPC reasons), we'll end up trying to release a reservation we never had. This makes sure we only release if we were able to reserve. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-27 10:44:44 -05:00
Josef Bacik	9b23062840	Btrfs: advance window_start if we're using a bitmap If we span a long area in a bitmap we could end up taking a lot of time searching to the next free area if we're searching from the original window_start, so advance window_start in order to make sure we don't do any superficial searching. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-26 15:01:12 -05:00
David Sterba	0c4e538bcc	btrfs: mask out gfp flags in releasepage btree_releasepage is a callback and can be passed unknown gfp flags and then they may end up in kmem_cache_alloc called from alloc_extent_state, slab allocator will BUG_ON when there is HIGHMEM or DMA32 flag set. This may happen when btrfs is mounted from a loop device, which masks out __GFP_IO flag. The check in try_release_extent_state 3399 if ((mask & GFP_NOFS) == GFP_NOFS) 3400 mask = GFP_NOFS; will not work and passes unfiltered flags further resulting in crash at mm/slab.c:2963 [<000000000024ae4c>] cache_alloc_refill+0x3b4/0x5c8 [<000000000024c810>] kmem_cache_alloc+0x204/0x294 [<00000000001fd3c2>] mempool_alloc+0x52/0x170 [<000003c000ced0b0>] alloc_extent_state+0x40/0xd4 [btrfs] [<000003c000cee5ae>] __clear_extent_bit+0x38a/0x4cc [btrfs] [<000003c000cee78c>] try_release_extent_state+0x9c/0xd4 [btrfs] [<000003c000cc4c66>] btree_releasepage+0x7e/0xd0 [btrfs] [<0000000000210d84>] shrink_page_list+0x6a0/0x724 [<0000000000211394>] shrink_inactive_list+0x230/0x578 [<0000000000211bb8>] shrink_list+0x6c/0x120 [<0000000000211e4e>] shrink_zone+0x1e2/0x228 [<0000000000211f24>] shrink_zones+0x90/0x254 [<0000000000213410>] do_try_to_free_pages+0xac/0x420 [<0000000000213ae0>] try_to_free_pages+0x13c/0x1b0 [<0000000000204e6c>] __alloc_pages_nodemask+0x5b4/0x9a8 [<00000000001fb04a>] grab_cache_page_write_begin+0x7e/0xe8 Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-26 15:01:12 -05:00
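The idea in commit 0c4e538bcc, clearing flag bits that a downstream allocator rejects before passing the mask on, can be sketched as below. The flag names and bit values are hypothetical and chosen only for illustration; they are not the kernel's GFP constants, and the commit's actual change may differ.

```c
/* Hypothetical flag bits for illustration; not the kernel's real GFP values. */
#define FLAG_IO       0x01u
#define FLAG_FS       0x02u
#define FLAG_HIGHMEM  0x10u
#define FLAG_DMA32    0x20u

/* Bits the downstream allocator is assumed to reject. */
#define FLAGS_ALLOCATOR_REJECTS (FLAG_HIGHMEM | FLAG_DMA32)

/* A callback may receive arbitrary flags, so strip the rejected bits
 * and leave the rest of the mask untouched. */
unsigned int sanitize_alloc_flags(unsigned int mask)
{
    return mask & ~FLAGS_ALLOCATOR_REJECTS;
}
```

Masking unconditionally is simpler than testing for one known-good combination, which is the kind of check the commit message says does not work when an unexpected flag is missing.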
Miao Xie	9e622d6bea	Btrfs: fix enospc error caused by wrong checks of the chunk When we did sysbench test for inline files, enospc error happened easily though there was lots of free disk space which could be allocated for new chunks. Reproduce steps: # mkfs.btrfs -b $((2 * 1024 * 1024 * 1024)) <test partition> # mount <test partition> /mnt # ulimit -n 102400 # cd /mnt # sysbench --num-threads=1 --test=fileio --file-num=81920 \ > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \ > --file-test-mode=seqwr prepare # sysbench --num-threads=1 --test=fileio --file-num=81920 \ > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \ > --file-test-mode=seqwr run <soon later, BUG_ON() was triggered by enospc error> The reason for this bug is: Now, we can reserve space which is larger than the free space in the chunks if we have enough free disk space which can be used for new chunks. In this case, the space allocator should allocate a new chunk by force if there is no free space in the free space cache. But there are two wrong checks which break this operation. One is if (ret == -ENOSPC && num_bytes > min_alloc_size) in btrfs_reserve_extent(), it is wrong, we should try to allocate a new chunk even if we fail to allocate free space by minimum allocable size. The other is if (space_info->force_alloc) force = space_info->force_alloc; in do_chunk_alloc(). It makes the allocator ignore CHUNK_ALLOC_FORCE if someone sets ->force_alloc to CHUNK_ALLOC_LIMITED, and makes the enospc error happen. Fix these two wrong checks. Especially the second one, we fix it by changing the value of CHUNK_ALLOC_LIMITED and CHUNK_ALLOC_FORCE, and make CHUNK_ALLOC_FORCE greater than CHUNK_ALLOC_LIMITED since CHUNK_ALLOC_FORCE has higher priority. And if the value which is passed in by the caller is greater than ->force_alloc, use the passed value. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-26 15:01:12 -05:00
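The second fix in commit 9e622d6bea orders the force levels so that the stronger request wins. The sketch below is an illustration under assumptions: the numeric values, the NO_FORCE member and the function name are hypothetical, and only the ordering FORCE > LIMITED comes from the commit message.

```c
/* Assumed ordering: a higher value means a higher priority.
 * The concrete numbers and the NO_FORCE member are hypothetical. */
enum chunk_alloc_level {
    CHUNK_ALLOC_NO_FORCE = 0,
    CHUNK_ALLOC_LIMITED  = 1,
    CHUNK_ALLOC_FORCE    = 2
};

/* Pick the stronger of the caller's request and the pending
 * ->force_alloc value, so a pending LIMITED cannot downgrade FORCE. */
int effective_force(int requested, int pending_force_alloc)
{
    if (requested > pending_force_alloc)
        return requested;
    return pending_force_alloc;
}
```

Encoding priority in the numeric order turns the decision into a plain maximum, instead of letting the pending value overwrite the caller's request.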
Liu Bo	7ec31b548a	Btrfs: do not defrag a file partially xfstests 218 complains that btrfs defrags a file partially: After: 1 Write backwards sync, but contiguous - should defrag to 1 extent Before: 10 -After: 1 +After: 2 To fix this, we need to set max_to_defrag count properly. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-26 15:01:12 -05:00
Stefan Behrens	0b485143d8	Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c There have been 4 warnings on 32-bit build, they are herewith fixed. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-26 15:01:11 -05:00
Josef Bacik	0b4a9d248f	Btrfs: use cluster->window_start when allocating from a cluster bitmap We specifically set window_start in the cluster struct to indicate where the cluster starts in a bitmap, but we've been using min_start to indicate where we're searching from. This is usually the start of the blockgroup, so essentially means we're constantly searching from the start of any bitmap we find, which completely negates all the trouble we go to in order to setup a cluster. So start using window_start to make sure we actually use the area we found. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-26 15:01:11 -05:00
Mitch Harder	8bedd51b61	Btrfs: Check for NULL page in extent_range_uptodate A user has encountered a NULL pointer kernel oops in btrfs when encountering media errors. The problem has been identified as an unhandled NULL pointer returned from find_get_page(). This modification simply checks for a NULL page, and returns with an error if found (the extent_range_uptodate() function returns 1 on errors). After testing this patch, the user reported that the error with the NULL pointer oops was solved. However, there is still a remaining problem with a thread becoming stuck in wait_on_page_locked(page) in the read_extent_buffer_pages(...) function in extent_io.c for (i = start_i; i < num_pages; i++) { page = extent_buffer_page(eb, i); wait_on_page_locked(page); if (!PageUptodate(page)) ret = -EIO; } This patch leaves the issue with the locked page yet to be resolved. Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-26 15:01:11 -05:00
Jan Kara	6dd70ce4eb	btrfs: Fix busyloops in transaction waiting code wait_log_commit() and wait_for_writer() were using slightly different conditions for deciding whether they should call schedule() and whether they should continue in the wait loop. Thus it could happen that we busylooped when the first condition was not true while the second one was. That is burning CPU cycles needlessly and is deadly on UP machines... Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-26 15:01:11 -05:00
Josef Bacik	357b9784b7	Btrfs: make sure a bitmap has enough bytes We have only been checking for min_bytes available in bitmap entries, but we won't successfully setup a bitmap cluster unless it has at least bytes in the bitmap, so in the common case min_bytes is 4k and we want something like 2MB, so if there are a bunch of bitmap entries with less than 2mb's in them, we'll search all them anyway, which is suboptimal. Fix this check. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-26 15:01:11 -05:00


25641 Commits (508dc6e110c6dbdc0bbe84298ccfe22de7538486)