linux

Commit Graph

Author	SHA1	Message	Date
Sunil Mushran	29576f8bb5	ocfs2/dlm: Link all lockres' to a tracking list This patch links all the lockres' to a tracking list in dlm_ctxt. We will use this in an upcoming patch that will walk the entire list and to dump the lockres states to a debugfs file. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:08 -07:00
Sunil Mushran	724bdca9b8	ocfs2/dlm: Create slabcaches for lock and lockres This patch makes the o2dlm allocate memory for lockres, lockname and lock structures from slabcaches rather than kmalloc. This allows us to not only make these allocs more efficient but also allows us to track the memory being consumed by these structures. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:08 -07:00
Sunil Mushran	12eb0035d6	ocfs2/dlm: Rename slabcache dlm_mle_cache to o2dlm_mle This patch renames dlm_mle_slabcache to prevent namespace clashes with fs/dlm. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:08 -07:00
Joel Becker	9341d22942	ocfs2: Allow selection of cluster plug-ins. ocfs2 now supports plug-ins for the classic O2CB stack as well as userspace cluster stacks in conjunction with fs/dlm. This allows zero, one, or both of the plug-ins to be selected in Kconfig. For local mounts (non-clustered), neither plug-in is needed. Both plugins can be loaded at one time, the runtime will select the one needed for the cluster systme in use. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:07 -07:00
Joel Becker	b92eccdd28	ocfs2: Add kbuild for ocfs2_stack_user.ko Add ocfs2_stack_user.ko to the Makefile so that it builds. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:07 -07:00
Joel Becker	8f318311fa	ocfs2: Change mlog_bug_on to BUG_ON in ocfs2_lockid.h The masklog code is in the o2cb stack, but ocfs2_lockid.h now needs to be included by the user stack. The BUG() in ocfs2_lock_type_string() does not need masklog support, so change it to a regular BUG_ON(). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:07 -07:00
David Teigland	cf4d8d75d8	ocfs2: add fsdlm to stackglue Add code to use fs/dlm. [ Modified to be part of the stack_user module -- Joel ] Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:07 -07:00
Joel Becker	d4b95eef4d	ocfs2: Add the 'set version' message to the ocfs2_control device. The "SETV" message sets the filesystem locking protocol version as negotiated by the client. The client negotiates based on the maximum version advertised in /sys/fs/ocfs2/max_locking_protocol. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:07 -07:00
Joel Becker	3cfd4ab6b6	ocfs2: Add the local node id to the handshake. This is the second part of the ocfs2_control handshake. After negotiating the ocfs2_control protocol, the daemon tells the filesystem what the local node id is via the SETN message. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:06 -07:00
Joel Becker	de870ef022	ocfs2: Introduce the DOWN message to ocfs2_control When the control daemon sees a node go down, it sends a DOWN message through the ocfs2_control device. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:06 -07:00
Joel Becker	462c7e6a25	ocfs2: Start the ocfs2_control handshake. When a control daemon opens the ocfs2_control device, it must perform a handshake to tell the filesystem it is something capable of monitoring cluster status. Only after the handshake is complete will the filesystem allow mounts. This is the first part of the handshake. The daemon reads all supported ocfs2_control protocols, then writes in the protocol it will use. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:06 -07:00
Joel Becker	6427a72755	ocfs2: Add the ocfs2_control misc device. The ocfs2_control misc device is how a userspace control daemon (controld) talks to the filesystem. Introduce the bare-bones filesystem ops. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:06 -07:00
Joel Becker	8adf0536c9	ocfs2: Add the user stack module. Add a skeleton for the stack_user module. It's just the barebones module code. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:06 -07:00
Joel Becker	9c6c877c04	ocfs2: Add the 'cluster_stack' sysfs file. Userspace can now query and specify the cluster stack in use via the /sys/fs/ocfs2/cluster_stack file. By default, it is 'o2cb', which is the classic stack. Thus, old tools that do not know how to modify this file will work just fine. The stack cannot be modified if there is a live filesystem. ocfs2_cluster_connect() now takes the expected cluster stack as an argument. This way, the filesystem and the stack glue ensure they are speaking to the same backend. If the stack is 'o2cb', the o2cb stack plugin is used. For any other value, the fsdlm stack plugin is selected. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:05 -07:00
Joel Becker	b61817e116	ocfs2: Add the USERSPACE_STACK incompat bit. The filesystem gains the USERSPACE_STACK incomat bit and the s_cluster_info field on the superblock. When a userspace stack is in use, the name of the stack is stored on-disk for mount-time verification. The "cluster_stack" option is added to mount(2) processing. The mount process needs to pass the matching stack name. If the passed name and the on-disk name do not match, the mount is failed. When using the classic o2cb stack, the incompat bit is not set and no mount option is used other than the usual heartbeat=local. Thus, the filesystem is compatible with older tools. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:05 -07:00
Joel Becker	74ae4e104d	ocfs2: Create stack glue sysfs files. Introduce a set of sysfs files that describe the current stack glue state. The files live under /sys/fs/ocfs2. The locking_protocol file displays the version of ocfs2's locking code. The loaded_cluster_plugins file displays all of the currently loaded stack plugins. When filesystems are mounted, the active_cluster_plugin file will display the plugin in use. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:05 -07:00
Joel Becker	286eaa95c5	ocfs2: Break out stackglue into modules. We define the ocfs2_stack_plugin structure to represent a stack driver. The o2cb stack code is split into stack_o2cb.c. This becomes the ocfs2_stack_o2cb.ko module. The stackglue generic functions are similarly split into the ocfs2_stackglue.ko module. This module now provides an interface to register drivers. The ocfs2_stack_o2cb driver registers itself. As part of this interface, ocfs2_stackglue can load drivers on demand. This is accomplished in ocfs2_cluster_connect(). ocfs2_cluster_disconnect() is now notified when a _hangup() is pending. If a hangup is pending, it will not release the driver module and will let _hangup() do that. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2008-04-18 08:56:05 -07:00
Joel Becker	e3dad42bf9	ocfs2: Create ocfs2_stack_operations and split out the o2cb stack. Define the ocfs2_stack_operations structure. Build o2cb_stack_ops from all of the o2cb-specific stack functions. Change the generic stack glue functions to call the stack_ops instead of the o2cb functions directly. The o2cb functions are moved to stack_o2cb.c. The headers are cleaned up to where only needed headers are included. In this code, stackglue.c and stack_o2cb.c refer to some shared extern variables. When they become modules, that will change. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:05 -07:00
Joel Becker	553aa7e408	ocfs2: Split o2cb code from generic stack functions. Split off the o2cb-specific funtionality from the generic stack glue calls. This is a precurser to wrapping the o2cb functionality in an operations vector. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:05 -07:00
Joel Becker	63e0c48ae6	ocfs2: Clean up stackglue initialization The stack glue initialization function needs a better name so that it can be used cleanly when stackglue becomes a module. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:05 -07:00
Joel Becker	cf0acdcd64	ocfs2: Abstract out a debugging function for underlying dlms. dlmglue.c was still referencing a raw o2dlm lksb in one instance. Let's create a generic ocfs2_dlm_dump_lksb() function. This allows underlying DLMs to print whatever they want about their lock. We then move the o2dlm dump into stackglue.c where it belongs. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:04 -07:00
David Teigland	1693a5c011	ocfs2: handle async EAGAIN from NOQUEUE request When using fsdlm, -EAGAIN is returned in the async callback for NOQUEUE requests. Fix up dlmglue to expect this. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:04 -07:00
Joel Becker	de551246e7	ocfs2: Remove CANCELGRANT from the view of dlmglue. o2dlm has the non-standard behavior of providing a cancel callback (unlock_ast) even when the cancel has failed (the locking operation succeeded without canceling). This is called CANCELGRANT after the status code sent to the callback. fs/dlm does not provide this callback, so dlmglue must be changed to live without it. o2dlm_unlock_ast_wrapper() in stackglue now ignores CANCELGRANT calls. Because dlmglue no longer sees CANCELGRANT, ocfs2_unlock_ast() no longer needs to check for it. ocfs2_locking_ast() must catch that a cancel was tried and clear the cancel state. Making these changes opens up a locking race. dlmglue uses the the OCFS2_LOCK_BUSY flag to ensure only one thread is calling the dlm at any one time. But dlmglue must unlock the lockres before calling into the dlm. In the small window of time between unlocking the lockres and calling the dlm, the downconvert thread can try to cancel the lock. The downconvert thread is checking the OCFS2_LOCK_BUSY flag - it doesn't know that ocfs2_dlm_lock() has not yet been called. Because ocfs2_dlm_lock() has not yet been called, the cancel operation will just be a no-op. There's nothing to cancel. With CANCELGRANT, dlmglue uses the CANCELGRANT callback to clear up the cancel state. When it comes around again, it will retry the cancel. Eventually, the first thread will have called into ocfs2_dlm_lock(), and either the lock or the cancel will succeed. The downconvert thread can then do its downconvert. Without CANCELGRANT, there is nothing to clean up the cancellation state. The downconvert thread does not know to retry its operations. More importantly, the original lock may be blocking on the other node that is trying to cancel us. With neither able to make progress, the ast is never called and the cancellation state is never cleaned up that way. dlmglue is deadlocked. The OCFS2_LOCK_PENDING flag is introduced to remedy this window. It is set at the same time OCFS2_LOCK_BUSY is. Thus, the downconvert thread can check whether the lock is cancelable. If not, it just loops around to try again. Once ocfs2_dlm_lock() is called, the thread then clears OCFS2_LOCK_PENDING and wakes the downconvert thread. Now, if the downconvert thread finds the lock BUSY, it can safely try to cancel it. Whether the cancel works or not, the state will be properly set and the lock processing can continue. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:04 -07:00
Mark Fasheh	0abd6d1803	ocfs2: Fill node number during cluster stack init It doesn't make sense to query for a node number before connecting to the cluster stack. This should be safe to do because node_num is only just printed, and we're actually only moving the setting of node num a small amount further in the mount process. [ Disconnect when node query fails -- Joel ] Reviewed-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:04 -07:00
Joel Becker	6953b4c008	ocfs2: Move o2hb functionality into the stack glue. The last bit of classic stack used directly in ocfs2 code is o2hb. Specifically, the check for heartbeat during mount and the call to ocfs2_hb_ctl during unmount. We create an extra API, ocfs2_cluster_hangup(), to encapsulate the call to ocfs2_hb_ctl. Other stacks will just leave hangup() empty. The check for heartbeat is moved into ocfs2_cluster_connect(). It will be matched by a similar check for other stacks. With this change, only stackglue.c includes cluster/ headers. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:04 -07:00
Joel Becker	19fdb624dc	ocfs2: Abstract out node number queries. ocfs2 asks the cluster stack for the local node's node number for two reasons; to fill the slot map and to print it. While the slot map isn't necessary for userspace cluster stacks, the printing is very nice for debugging. Thus we add ocfs2_cluster_this_node() as a generic API to get this value. It is anticipated that the slot map will not be used under a userspace cluster stack, so validity checks of the node num only need to exist in the slot map code. Otherwise, it just gets used and printed as an opaque value. [ Fixed up some "int" versus "unsigned int" issues and made osb->node_num truly opaque. --Mark ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:04 -07:00
Joel Becker	4670c46ded	ocfs2: Introduce the new ocfs2_cluster_connect/disconnect() API. This step introduces a cluster stack agnostic API for initializing and exiting. fs/ocfs2/dlmglue.c no longer uses o2cb/o2dlm knowledge to connect to the stack. It is all handled in stackglue.c. heartbeat.c no longer needs to know how it gets called. ocfs2_do_node_down() is now a clean recovery trigger. The big gotcha is the ordering of initializations and de-initializations done underneath ocfs2_cluster_connect(). ocfs2_dlm_init() used to do all o2dlm initialization in one block. Thus, the o2dlm functionality of ocfs2_cluster_connect() is very straightforward. ocfs2_dlm_shutdown(), however, did a few things between de-registration of the eviction callback and actually shutting down the domain. Now de-registration and shutdown of the domain are wrapped within the single ocfs2_cluster_disconnect() call. I've checked the code paths to make sure we can safely tear down things in ocfs2_dlm_shutdown() before calling ocfs2_cluster_disconnect(). The filesystem has already set itself to ignore the callback. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:04 -07:00
Joel Becker	8f2c9c1b16	ocfs2: Create the lock status block union. Wrap the lock status block (lksb) in a union. Later we will add a union element for the fs/dlm lksb. Create accessors for the status and lvb fields. Other than a debugging function, dlmglue.c does not directly reference the o2dlm locking path anymore. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:04 -07:00
Joel Becker	7431cd7e8d	ocfs2: Use -errno instead of dlm_status for ocfs2_dlm_lock/unlock() API. Change the ocfs2_dlm_lock/unlock() functions to return -errno values. This is the first step towards elminiating dlm_status in fs/ocfs2/dlmglue.c. The change also passes -errno values to ->unlock_ast(). [ Fix a return code in dlmglue.c and change the error translation table into an array of ints. --Mark ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:03 -07:00
Joel Becker	bd3e76105d	ocfs2: Use global DLM_ constants in generic code. The ocfs2 generic code should use the values in <linux/dlmconstants.h>. stackglue.c will convert them to o2dlm values. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:03 -07:00
Joel Becker	24ef1815e5	ocfs2: Separate out dlm lock functions. This is the first in a series of patches to isolate ocfs2 from the underlying cluster stack. Here we wrap the dlm locking functions with ocfs2-specific calls. Because ocfs2 always uses the same dlm lock status callbacks, we can eliminate the callbacks from the filesystem visible functions. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:03 -07:00
Joel Becker	386a2ef857	ocfs2: New slot map format The old slot map had a few limitations: - It was limited to one block, so the maximum slot count was 255. - Each slot was signed 16bits, limiting node numbers to INT16_MAX. - An empty slot was marked by the magic 0xFFFF (-1). The new slot map format provides 32bit node numbers (UINT32_MAX), a separate space to mark a slot in use, and extra room to grow. The slot map is now bounded by i_size, not a block. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:03 -07:00
Joel Becker	fb86b1f071	ocfs2: Define the contents of the slot_map file. The slot map file is merely an array of __le16. Wrap it in a structure for cleaner reference. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:03 -07:00
Joel Becker	fc881fa0d5	ocfs2: De-magic the in-memory slot map. The in-memory slot map uses the same magic as the on-disk one. There is a special value to mark a slot as invalid. It relies on the size of certain types and so on. Write a new in-memory map that keeps validity as a separate field. Outside of the I/O functions, OCFS2_INVALID_SLOT now means what it is supposed to. It also is no longer tied to the type size. This also means that only the I/O functions refer to 16bit quantities. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:03 -07:00
Joel Becker	1c8d9a6a33	ocfs2: slot_map I/O based on max_slots. The slot map code assumed a slot_map file has one block allocated. This changes the code to I/O as many blocks as will cover max_slots. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:02 -07:00
Joel Becker	553abd046a	ocfs2: Change the recovery map to an array of node numbers. The old recovery map was a bitmap of node numbers. This was sufficient for the maximum node number of 254. Going forward, we want node numbers to be UINT32. Thus, we need a new recovery map. Note that we can't keep track of slots here. We must write down the node number to recovery before we get the locks needed to convert a node number into a slot number. The recovery map is now an array of unsigned ints, max_slots in size. It moves to journal.c with the rest of recovery. Because it needs to be initialized, we move all of recovery initialization into a new function, ocfs2_recovery_init(). This actually cleans up ocfs2_initialize_super() a little as well. Following on, recovery cleaup becomes part of ocfs2_recovery_exit(). A number of node map functions are rendered obsolete and are removed. Finally, waiting on recovery is wrapped in a function rather than naked checks on the recovery_event. This is a cleanup from Mark. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:02 -07:00
Joel Becker	d85b20e4b3	ocfs2: Make ocfs2_slot_info private. Just use osb_lock around the ocfs2_slot_info data. This allows us to take the ocfs2_slot_info structure private in slot_info.c. All access is now via accessors. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:02 -07:00
Mark Fasheh	8e8a4603b5	ocfs2: Move slot map access into slot_map.c journal.c and dlmglue.c would refresh the slot map by hand. Instead, have the update and clear functions do the work inside slot_map.c. The eventual result is to make ocfs2_slot_info defined privately in slot_map.c Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-04-18 08:56:02 -07:00
Linus Torvalds	253ba4e79e	Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6 * 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: (87 commits) [XFS] Fix merge failure [XFS] The forward declarations for the xfs_ioctl() helpers and the [XFS] Update XFS documentation for noikeep/ikeep. [XFS] Update XFS Documentation for ikeep and ihashsize [XFS] Remove unused HAVE_SPLICE macro. [XFS] Remove CONFIG_XFS_SECURITY. [XFS] xfs_bmap_compute_maxlevels should be based on di_forkoff [XFS] Always use di_forkoff when checking for attr space. [XFS] Ensure the inode is joined in xfs_itruncate_finish [XFS] Remove periodic logging of in-core superblock counters. [XFS] fix logic error in xfs_alloc_ag_vextent_near() [XFS] Don't error out on good I/Os. [XFS] Catch log unmount failures. [XFS] Sanitise xfs_log_force error checking. [XFS] Check for errors when changing buffer pointers. [XFS] Don't allow silent errors in xfs_inactive(). [XFS] Catch errors from xfs_imap(). [XFS] xfs_bulkstat_one_dinode() never returns an error. [XFS] xfs_iflush_fork() never returns an error. [XFS] Catch unwritten extent conversion errors. ...	2008-04-18 08:39:39 -07:00
Linus Torvalds	4adeaaf51e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6: jfs: replace __inline with inline jfs: le*_add_cpu conversion	2008-04-18 08:37:19 -07:00
Roel Kluin	62be1f7167	[GFS2] fix assertion in log_refund() since unsigned, unused >= 0 is always true. Signed-off-by: Roel Kluin <12o3l@tiscali.nl> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-04-18 08:36:09 +01:00
David S. Miller	1e42198609	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6	2008-04-17 23:56:30 -07:00
Lachlan McIlroy	65e67f5165	[XFS] Fix merge failure	2008-04-18 12:59:45 +10:00
Lachlan McIlroy	3b2816be27	[XFS] The forward declarations for the xfs_ioctl() helpers and the associated comment about gcc behavior really aren't needed; all of these functions are marked STATIC which includes noinline, and the stack usage won't be a problem. This effectively just removes the forward declarations and moves xfs_ioctl() back to the end of the file. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30534a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:43:35 +10:00
Donald Douwsma	e687330b5e	[XFS] Remove unused HAVE_SPLICE macro. HAVE_SPLICE was part of the infrastructure for building 2.4 and 2.6 kernels out of the same tree. Now we don't build 2.4 kernels this SGI-PV: 971046 SGI-Modid: xfs-linux-melb:xfs-kern:30878a Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:04:29 +10:00
Eric Sandeen	f7d3c34788	[XFS] Remove CONFIG_XFS_SECURITY. There is no point to the CONFIG_XFS_SECURITY option; it disables the ability to set security attributes at runtime, but it does not actually slim down or remove any code for runtime. Just remove it and always allow security attributes to be set. SGI-PV: 980310 SGI-Modid: xfs-linux-melb:xfs-kern:30877a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:04:19 +10:00
Tim Shimmin	6d1337b29b	[XFS] xfs_bmap_compute_maxlevels should be based on di_forkoff Fix up xfs_bmap_compute_maxlevels() to account for the case when we go from using attr2 to using attr1. In that case attr1 will no longer necessarily be at m_attr_offset>>3, but could be at a different value for di_forkoff. Therefore, we return the worst case scenario using MINDBTPTRS and MINABTPTRS, as this function is used for determining the maximum log space. SGI-PV: 979606 SGI-Modid: xfs-linux-melb:xfs-kern:30862a Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:04:08 +10:00
Eric Sandeen	cb49dbb130	[XFS] Always use di_forkoff when checking for attr space. In the case where we mount a filesystem which was previously using the attr2 format as attr1, returning the default mp->m_attroffset instead of the per-inode di_forkoff for inline attribute fit calculations, may result in corruption, if for example, the data fork is already taking more space than the default fork offset and we try to add an extended attribute. Fix tested by xfstests/186. SGI-PV: 979606 SGI-Modid: xfs-linux-melb:xfs-kern:30861a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:03:40 +10:00
David Chinner	f6485057c5	[XFS] Ensure the inode is joined in xfs_itruncate_finish On success, we still need to join the inode to the current transaction in xfs_itruncate_finish(). Fixes regression from error handling changes. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30845a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:03:26 +10:00
David Chinner	7e20694d91	[XFS] Remove periodic logging of in-core superblock counters. xfssyncd triggers the logging of superblock counters every 30s if the filesystem is made with lazy-count=1. This will prevent disks from idling and spinning down as there will be a log write every 30s. With the way counter recovery works for lazy-count=1, this code is unnecessary and provides no real benefit, so just remove it. SGI-PV: 980145 SGI-Modid: xfs-linux-melb:xfs-kern:30840a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Barry Naujok <bnaujok@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:03:12 +10:00
David Chinner	e6430037e9	[XFS] fix logic error in xfs_alloc_ag_vextent_near() Fix a logic error in xfs_alloc_ag_vextent_near(). This is a regression introduced by the error handling changes. SGI-PV: 890084 SGI-Modid: xfs-linux-melb:xfs-kern:30838a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Barry Naujok <bnaujok@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:03:02 +10:00
David Chinner	d4055947bd	[XFS] Don't error out on good I/Os. xfsbdstrat() made all I/Os error out, good or bad. Fix it. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30836a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:02:41 +10:00
David Chinner	1bb7d6b5a8	[XFS] Catch log unmount failures. Unmounting the log can fail. unlikely, but it can. Catch all the error conditions an make sure it's propagated upwards. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30833a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:02:30 +10:00
David Chinner	b911ca0472	[XFS] Sanitise xfs_log_force error checking. xfs_log_force() is declared to return an error, but we almost never check it. We don't need to check it in most cases; if there's a log I/O error then we'll be shutting down the filesystem anyway and that means we'll catch the error somewhere else. However, on certain calls we should be returning an error - sync transactions, fsync, sync writes, etc. so this isn't a pure black and white distinction. Hence make xfs_log_force() a void function that issues a warning to the syslog on error, and call _xfs_log_force() in all the places where we actually care about the error status returned. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30832a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:02:20 +10:00
David Chinner	234f56aca2	[XFS] Check for errors when changing buffer pointers. xfs_buf_associate_memory() can fail, but the return is never checked. Propagate the error through XFS_BUF_SET_PTR() so that failures are detected. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30831a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:02:10 +10:00
David Chinner	78e9da77f1	[XFS] Don't allow silent errors in xfs_inactive(). xfs_inactive() fails to report errors when committing the inactive transaction. Hence we can get silent failures either finishing off the truncation or committing the transaction. Even if we get errors, we need to continue, so simply warn loudly to the system if we get errors here. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30830a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:01:58 +10:00
David Chinner	64bfe1bfae	[XFS] Catch errors from xfs_imap(). Catch errors from xfs_imap() in log recovery when we might be trying to map an invalid inode number due to a corrupted log. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30829a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:01:39 +10:00
David Chinner	7b07339048	[XFS] xfs_bulkstat_one_dinode() never returns an error. Mark it void. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30828a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:01:27 +10:00
David Chinner	e4ac967b11	[XFS] xfs_iflush_fork() never returns an error. xfs_iflush_fork() never returns an error. Mark it void and clean up the code calling it that checks for errors. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30827a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:01:11 +10:00
David Chinner	cc88466f3f	[XFS] Catch unwritten extent conversion errors. On unwritten I/O completion, we fail to propagate an error when converting the extent to a written extent. This means that the I/O silently fails. propagate the error onto the ioend so that the inode is marked with an error appropriately. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30826a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:00:58 +10:00
David Chinner	958d4ec606	[XFS] xfs_bdwrite() does not return errors. xfs_bdwrite() cannot return an error; it only queues buffers to the delayed write list and as such never encounters anything that can fail. Mark it void. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30825a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:00:46 +10:00
David Chinner	db7a19f2c8	[XFS] Ensure xfs_bawrite() errors are checked. xfs_bawrite() can return immediate error status on async writes. Unlike xfsbdstrat() we don't ever check the error on the buffer after the call, so we currently do not catch errors at all here. Ensure we catch and propagate or warn to the syslog about up-front async write errors. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30824a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:00:35 +10:00
David Chinner	d64e31a2f5	[XFS] Ensure errors from xfs_bdstrat() are correctly checked. xfsbdstrat() is declared to return an error. That is never checked because the error is propagated by the xfs_buf_t that is passed through the function. Mark xfsbdstrat() as returning void and comment the prototype on the methods needed for error checking. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30823a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:00:24 +10:00
Barry Naujok	556b8b166c	[XFS] remove bhv_vname_t and xfs_rename code SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30804a Signed-off-by: Barry Naujok <bnaujok@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:00:12 +10:00
David Chinner	7c9ef85c56	[XFS] Catch errors returned from xfs_bmap_last_offset(). xfs_bmap_last_offset() can fail and return an error. xfs_iomap_write_allocate() fails to detect and propagate the error. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30802a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:59:45 +10:00
David Chinner	fc6149d8d9	[XFS] Check for xfs_free_extent() failing. xfs_free_extent() can fail, but log recovery never bothers to check if it successfully free the extent it was supposed to. This could lead to silent corruption during log recovery. Abort log recovery if we fail to free an extent. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30801a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:59:23 +10:00
David Chinner	d87dd6360d	[XFS] Warn if errors come from block_truncate_page(). block_truncate_page() can return errors that we currently ignore and silently discard. We should not ever get errors reported here - an error indicates a bug somewhere else. Hence catch the error and issue a stack dump to the syslog because we cannot propagate the error any further up the call chain. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30800a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:59:12 +10:00
David Chinner	c2b1cba683	[XFS] xfs_bmap_adjacent() never returns an error. Mark it void. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30798a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:58:46 +10:00
David Chinner	12375c8237	[XFS] Make xfs_alloc_compute_aligned() void. xfs_alloc_compute_aligned() returns a value based on a comparison of the computed extent length and the minimum length allowed. This is only used by some callers - the other four return parameters are used more often. Hence move the comparison to the code that actually needs to do it and make xfs_alloc_compute_aligned() a void function. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30797a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:58:36 +10:00
David Chinner	f4586e4061	[XFS] Clean up xfs_alloc_search_busy() return values. xfs_alloc_search_busy() returns an index into the busy array if the extent was found in the array. This is never checked, and the xfs_alloc_search_busy() does a log force to prevent reuse of the extent before the free transaction hits the disk. Hence the return value is useless. Declare the function void and remove the slot number from the tracing as well. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30796a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:58:27 +10:00
David Chinner	e5720eec05	[XFS] Propagate errors from xfs_trans_commit(). xfs_trans_commit() can return errors when there are problems in the transaction subsystem. They are indicative that the entire transaction may be incomplete, and hence the error should be propagated as there is a good possibility that there is something fatally wrong in the filesystem. Catch and propagate or warn about commit errors in the places where they are currently ignored. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30795a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:58:17 +10:00
David Chinner	3c1e2bbe5b	[XFS] Propagate xfs_trans_reserve() errors. xfs_trans_reserve() reports errors that should not be ignored. For example, a shutdown filesystem will report errors through xfs_trans_reserve() to prevent further changes from being attempted on a damaged filesystem. Catch and propagate all error conditions from xfs_trans_reserve(). SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30794a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:58:08 +10:00
David Chinner	5ca1f261a0	[XFS] Catch errors from xfs_acl_vremove(). Removing an ACL can return an error. Propagate it. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30793a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:57:57 +10:00
David Chinner	0c92829967	[XFS] Catch errors from xfs_acl_setmode(). Propagate the error status from xfs_acl_setmode() so that callers know if the ACl was set correctly or not. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30792a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:57:46 +10:00
David Chinner	88ab020853	[XFS] Propagate quota file truncation errors. Truncating the quota files can silently fail. Ensure that truncation errors are propagated to the callers. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30791a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:57:36 +10:00
David Chinner	cb6edc26c3	[XFS] Catch errors when turning off quotas. When turning off quota, we need to write various transactions to the log to ensure that they are cleanly removed in the case of a crash. We need to check that the transactions hit the disk correctly. If we fail to write the final quota off transaction, we are corrupt in memory and so the only option is to shut the filesystem down at this point. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30790a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:57:26 +10:00
David Chinner	31d5577b35	[XFS] Catch errors resetting quota flags. Warn to the syslog if we fail to reset the quota flags in the superblock when a quota check fails. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30789a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:57:16 +10:00
David Chinner	53aa7915d6	[XFS] Clean up quotamount error handling. xfs_qm_mount_quotas() returns an error status that is ignored. If we fail to mount quotas, we continue with quota's turned off, which is all handled inside xfs_qm_mount_quotas(). Mark it as void to indicate that errors need not be returned to the callers. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30788a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:57:05 +10:00
David Chinner	3c56836f92	[XFS] Check for dquot flush errors xfs_qm_dqflush() can fail, but the return is not checked anywhere. Hence we never know if we've failed to flush a dquot to disk. Propagate the error and warn to the syslog if a flush ever fails. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30787a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:56:55 +10:00
David Chinner	4b8879df8c	[XFS] Propagate xfs_qm_dqflush_all() errors. xfs_qm_dqflush_all() can return flush errors. Ensure they are propagated into the quotacheck code to determine if the quotacheck succeeded or not. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30786a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:54:56 +10:00
David Chinner	5b1397385b	[XFS] xfs_qm_reset_dqcounts() does not return errors. Declare it void. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30785a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:54:06 +10:00
David Chinner	714082bc12	[XFS] Report errors from xfs_reserve_blocks(). xfs_reserve_blocks() can fail in interesting ways. In neither case is it a fatal error, but the result can lead to sub-optimal behaviour. Warn to the syslog if the call fails but otherwise continue. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30784a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:53:51 +10:00
David Chinner	36fbe6e6bd	[XFS] xfs_icsb_counter_disabled() never returns an error. Mark it void. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30782a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:52:43 +10:00
David Chinner	a414047fc9	[XFS] Remove useless whitespace in function prototypes Makes it simpler to annotate function prototypes with __must_check via sed scripts. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30781a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:51:58 +10:00
David Chinner	3c85c36cc2	[XFS] xfs_quiesce_fs() never returns an error. Mark it void. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30780a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:51:46 +10:00
Christoph Hellwig	b6ddc4e6fe	[XFS] Don't validate symlink target component length This target component validation is not POSIX conformant and it is not done by any other Linux filesystem so remove it from XFS. SGI-PV: 980080 SGI-Modid: xfs-linux-melb:xfs-kern:30776a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:51:36 +10:00
Harvey Harrison	34a622b2e1	[XFS] replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30775a Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:51:26 +10:00
Harvey Harrison	0225da1f35	[XFS] Replace __inline with inline Remove the remaining uses of __inline in the XFS code base. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30774a Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:51:15 +10:00
David Chinner	6b1d1a732f	[XFS] Fix lock inversion in forced shutdown. Recent changes to xlog_state_release_iclog() placed the grant_lock inside the icloglock. forced unmount of the log does this the opposite way around, but does not depend on the order for correct working. Fix the inversion by changing the order locks are gained in xfs_log_force_umount(). SGI-PV: 979661 SGI-Modid: xfs-linux-melb:xfs-kern:30773a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:51:04 +10:00
David Chinner	4679b2d36d	[XFS] Reorganise xlog_t for better cacheline isolation of contention To reduce contention on the log in large CPU count, separate out different parts of the xlog_t structure onto different cachelines. Move each lock onto a different cacheline along with all the members that are accessed/modified while that lock is held. Also, move the debugging code into debug code. SGI-PV: 978729 SGI-Modid: xfs-linux-melb:xfs-kern:30772a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:50:53 +10:00
David Chinner	eb01c9cd87	[XFS] Remove the xlog_ticket allocator The ticket allocator is just a simple slab implementation internal to the log. It requires the icloglock to be held when manipulating it and this contributes to contention on that lock. Just kill the entire allocator and use a memory zone instead. While there, allow us to gracefully fail allocation with ENOMEM. SGI-PV: 978729 SGI-Modid: xfs-linux-melb:xfs-kern:30771a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:50:39 +10:00
David Chinner	114d23aae5	[XFS] Per iclog callback chain lock Rather than use the icloglock for protecting the iclog completion callback chain, use a new per-iclog lock so that walking the callback chain doesn't require holding a global lock. This reduces contention on the icloglock during transaction commit and log I/O completion by reducing the number of times we need to hold the global icloglock during these operations. SGI-PV: 978729 SGI-Modid: xfs-linux-melb:xfs-kern:30770a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:50:22 +10:00
Lachlan McIlroy	2abdb8c881	[XFS] Prevent xfs_bmap_check_leaf_extents() referencing unmapped memory. While investigating the extent corruption bug I ran into this bug in debug only code. xfs_bmap_check_leaf_extents() loops through the leaf blocks of the extent btree checking that every extent is entirely before the next extent. It also compares the last extent in the previous block to the first extent in the current block when the previous block has been released and potentially unmapped. So take a copy of the last extent instead of a pointer. Also move the last extent check out of the loop because we only need to do it once. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30718a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-04-18 11:49:51 +10:00
Christoph Hellwig	433550990e	[XFS] remove most calls to VN_RELE Most VN_RELE calls either directly contain a XFS_ITOV or have the corresponding xfs_inode already in scope. Use the IRELE helper instead of VN_RELE to clarify the code. With a little more work we can kill VN_RELE altogether and define IRELE in terms of iput directly. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30710a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:49:08 +10:00
Lachlan McIlroy	df26cfe849	[XFS] split xfs_ioc_xattr The three subcases of xfs_ioc_xattr don't share any semantics and almost no code, so split it into three separate helpers. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30709a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:44:03 +10:00
Christoph Hellwig	f3dcc13f6f	[XFS] cleanup root inode handling in xfs_fs_fill_super - rename rootvp to root for clarify - remove useless vn_to_inode call - check is_bad_inode before calling d_alloc_root - use iput instead of VN_RELE in the error case SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30708a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:42:36 +10:00
David Chinner	59a33f9f77	[XFS] Ensure a btree insert returns a valid cursor. When writing into preallocated regions there is a case where XFS can oops or hang doing the unwritten extent conversion on I/O completion. It turns out that the problem is related to the btree cursor being invalid. When we do an insert into the tree, we may need to split blocks in the tree. When we only split at the leaf level (i.e. level 0), everything works just fine. However, if we have a multi-level split in the btreee, the cursor passed to the insert function is no longer valid once the insert is complete. The leaf level split is handled correctly because all the operations at level 0 are done using the original cursor, hence it is updated correctly. However, when we need to update the next level up the tree, we don't use that cursor - we use a cloned cursor that points to the index in the next level up where we need to do the insert. Hence if we need to split a second level, the changes to the tree are reflected in the cloned cursor and not the original cursor. This clone-and-move-up-a-level-on-split behaviour recurses all the way to the top of the tree. The complexity here is that these cloned cursors do not point to the original index that was inserted - they point to the newly allocated block (the right block) and the original cursor pointer to that level may still point to the left block. Hence, without deep examination of the cloned cursor and buffers, we cannot update the original cursor with the new path from the cloned cursor. In these cases the original cursor could be pointing to the wrong block(s) and hence a subsequent modification to the tree using that cursor will lead to corruption of the tree. The crash case occurs when the tree changes height - we insert a new level in the tree, and the cursor does not have a buffer in it's path for that level. Hence any attempt to walk back up the cursor to the root block will result in a null pointer dereference. To make matters even more complex, the BMAP BT is rooted in an inode, so we can have a change of height in the btree without a root split. That is, if the root block in the inode is full when we split a leaf node, we cannot fit the pointer to the new block in the root, so we allocate a new block, migrate all the ptrs out of the inode into the new block and point the inode root block at the newly allocated block. This changes the height of the tree without a root split having occurred and hence invalidates the path in the original cursor. The patch below prevents xfs_bmbt_insert() from returning with an invalid cursor by detecting the cases that invalidate the original cursor and refresh it by do a lookup into the btree for the original index we were inserting at. Note that the INOBT, AGFBNO and AGFCNT btree implementations also have this bug, but the cursor is currently always destroyed or revalidated after an insert for those trees. Hence this patch only address the problem in the BMBT code. SGI-PV: 979339 SGI-Modid: xfs-linux-melb:xfs-kern:30701a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:42:21 +10:00
David Chinner	75de2a91c9	[XFS] Account for inode cluster alignment in all allocations At ENOSPC, we can get a filesystem shutdown due to a cancelling a dirty transaction in xfs_mkdir or xfs_create. This is due to the initial allocation attempt not taking into account inode alignment and hence we can prepare the AGF freelist for allocation when it's not actually possible to do an allocation. This results in inode allocation returning ENOSPC with a dirty transaction, and hence we shut down the filesystem. Because the first allocation is an exact allocation attempt, we must tell the allocator that the alignment does not affect the allocation attempt. i.e. we will accept any extent alignment as long as the extent starts at the block we want. Unfortunately, this means that if the longest free extent is less than the length + alignment necessary for fallback allocation attempts but is long enough to attempt a non-aligned allocation, we will modify the free list. If we then have the exact allocation fail, all other allocation attempts will also fail due to the alignment constraint being taken into account. Hence the initial attempt needs to set the "alignment slop" field so that alignment, while not required, must be taken into account when determining if there is enough space left in the AG to do the allocation. That means if the exact allocation fails, we will not dirty the freelist if there is not enough space available fo a subsequent allocation to succeed. Hence we get an ENOSPC error back to userspace without shutting down the filesystem. SGI-PV: 978886 SGI-Modid: xfs-linux-melb:xfs-kern:30699a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:42:09 +10:00
Josef 'Jeff' Sipek	535f6b3735	[XFS] Replace custom AIL linked-list code with struct list_head Replace the xfs_ail_entry_t with a struct list_head and clean the surrounding code up. Also fixes a livelock in xfs_trans_first_push_ail() by terminating the loop at the head of the list correctly. SGI-PV: 978682 SGI-Modid: xfs-linux-melb:xfs-kern:30636a Signed-off-by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:41:57 +10:00
Christoph Hellwig	a45c796867	[XFS] Remove superflous xfs_readsb call in xfs_mountfs. When xfs_mountfs is called by xfs_mount xfs_readsb was called 35 lines above unconditionally, so there is no need to try to read the superblock if it's not present. If any other port doesn't have the superblock read at this point it should just call it directly from it's xfs_mount equivalent. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30603a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:41:46 +10:00

1 2 3 4 5 ...

8421 Commits (2e4b7fcd926006531935a4c79a5e9349fe51125b)