linux

q3k/linux

Author	SHA1	Message	Date
Alex Elder	e124a82f3c	rbd: protect the rbd_dev_list with a spinlock The rbd_dev_list is just a simple list of all the current rbd_devices. Using the ctl_mutex as a concurrency guard is overkill. Instead, use a spinlock for that specific purpose. This also reduces the window that the ctl_mutex needs to be held in rbd_add(). Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	1ddbe94eda	rbd: rework calculation of new rbd id's In order to select a new unique identifier for an added rbd device, the list of all existing ones is searched and a value one greater than the highest id is used. The list search can be avoided by using an atomic variable that keeps track of the current highest id. Using a get/put model for id's we can limit the boundless growth of id numbers a bit by arranging to reuse the current highest id once it gets released. Add these calls to "put" the id when an rbd is getting removed. Note that this changes the pattern of device id's used--new values will never be below the highest one seen so far (even if there exists an unused lower one). I assert this is OK because the key property of an rbd id is its uniqueness, not its magnitude. Regardless, a follow-on patch will restore the old way of doing things, I just think this commit just makes the incremental change to atomics a little easier to understand. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	b7f23c361b	rbd: encapsulate new rbd id selection Move the loop that finds a new unique rbd id to use into its own helper function. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Josh Durgin	cc9d734c3d	rbd: use a single value of snap_name to mean no snap There's already a constant for this anyway. Since rbd_header_set_snap() is only used to set the rbd device snap_name field, just do that within that function rather than having it take the snap_name as an argument. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net> v2: Changed interface rbd_header_set_snap() so it explicitly updates the snap_name in the rbd_device. Also added a BUILD_BUG_ON() to verify the size of the snap_name field is sufficient for SNAP_HEAD_NAME.	2012-03-22 10:47:47 -05:00
Alex Elder	1dbb439913	rbd: do not duplicate ceph_client pointer in rbd_device The rbd_device structure maintains a duplicate copy of the ceph_client pointer maintained in its rbd_client structure. There appears to be no good reason for this, and its presence presents a risk of them getting out of synch or otherwise misused. So kill it off, and use the rbd_client copy only. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	ee57741c52	rbd: make ceph_parse_options() return a pointer ceph_parse_options() takes the address of a pointer as an argument and uses it to return the address of an allocated structure if successful. With this interface is not evident at call sites that the pointer is always initialized. Change the interface to return the address instead (or a pointer-coded error code) to make the validity of the returned pointer obvious. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	2107978668	rbd: a few small cleanups Some minor cleanups in "drivers/block/rbd.c: - Use the more meaningful "RBD_MAX_OBJ_NAME_LEN" in place if "96" in the definition of RBD_MAX_MD_NAME_LEN. - Use DEFINE_SPINLOCK() to define and initialize node_lock. - Drop a needless (char *) cast in parse_rbd_opts_token(). - Make a few minor formatting changes. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:46 -05:00
Linus Torvalds	6c073a7ee2	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: fix safety of rbd_put_client() rbd: fix a memory leak in rbd_get_client() ceph: create a new session lock to avoid lock inversion ceph: fix length validation in parse_reply_info() ceph: initialize client debugfs outside of monc->mutex ceph: change "ceph.layout" xattr to be "ceph.file.layout"	2012-02-02 15:47:33 -08:00
Alex Elder	d23a4b3fd6	rbd: fix safety of rbd_put_client() The rbd_client structure uses a kref to arrange for cleaning up and freeing an instance when its last reference is dropped. The cleanup routine is rbd_client_release(), and one of the things it does is delete the rbd_client from rbd_client_list. It acquires node_lock to do so, but the way it is done is still not safe. The problem is that when attempting to reuse an existing rbd_client, the structure found might already be in the process of getting destroyed and cleaned up. Here's the scenario, with "CLIENT" representing an existing rbd_client that's involved in the race: Thread on CPU A \| Thread on CPU B --------------- \| --------------- rbd_put_client(CLIENT) \| rbd_get_client() kref_put() \| (acquires node_lock) kref->refcount becomes 0 \| __rbd_client_find() returns CLIENT calls rbd_client_release() \| kref_get(&CLIENT->kref); \| (releases node_lock) (acquires node_lock) \| deletes CLIENT from list \| ...and starts using CLIENT... (releases node_lock) \| and frees CLIENT \| <-- but CLIENT gets freed here Fix this by having rbd_put_client() acquire node_lock. The result could still be improved, but at least it avoids this problem. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-02 12:56:59 -08:00
Alex Elder	97bb59a03d	rbd: fix a memory leak in rbd_get_client() If an existing rbd client is found to be suitable for use in rbd_get_client(), the rbd_options structure is not being freed as it should. Fix that. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-02 12:49:27 -08:00
Alex Elder	0e805a1d85	rbd: initialize snap_rwsem in rbd_add() New rbd device structures get initialized in rbd_add(). Many of the fields rely on being initially zero-filled. However we lockdep was noticing that the rw_semaphore embedded in the header field was not getting properly initialized. Fix that. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-01-12 11:00:50 -08:00
Josh Durgin	51703306b3	rbd: remove buggy rollback functionality This doesn't interact with resizing well, since it doesn't set the size of the device to the size at the snapshot. It's also an expensive operation to be synchronous. Rollback can still be done with the userspace rbd tool. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>	2011-12-07 10:46:19 -08:00
Josh Durgin	81e759fbf7	rbd: return an error when an invalid header is read This protects against opening future rbd images that have incompatible format changes. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>	2011-12-07 10:46:10 -08:00
Linus Torvalds	97d2eb13a0	Merge branch 'for-linus' of git://ceph.newdream.net/git/ceph-client * 'for-linus' of git://ceph.newdream.net/git/ceph-client: libceph: fix double-free of page vector ceph: fix 32-bit ino numbers libceph: force resend of osd requests if we skip an osdmap ceph: use kernel DNS resolver ceph: fix ceph_monc_init memory leak ceph: let the set_layout ioctl set single traits Revert "ceph: don't truncate dirty pages in invalidate work thread" ceph: replace leading spaces with tabs libceph: warn on msg allocation failures libceph: don't complain on msgpool alloc failures libceph: always preallocate mon connection libceph: create messenger with client ceph: document ioctls ceph: implement (optional) max read size ceph: rename rsize -> rasize ceph: make readpages fully async	2011-10-28 16:42:18 -07:00
Sage Weil	6ab00d465a	libceph: create messenger with client This simplifies the init/shutdown paths, and makes client->msgr available during the rest of the setup process. Signed-off-by: Sage Weil <sage@newdream.net>	2011-10-25 16:10:15 -07:00
Jiri Kosina	e060c38434	Merge branch 'master' into for-next Fast-forward merge with Linus to be able to merge patches based on more recent version of the tree.	2011-09-15 15:08:18 +02:00
Justin P. Mattock	699324871f	treewide: remove extra semicolons from various parts of the kernel This is a resend from the original, changing the title from PATCH to RFC(since this is a review for commit, and I should have put that the first go around). and also removing some of the commit's with ia64 and bash since it is significant. let me know if I might have missed anything etc.. Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-09-15 14:50:49 +02:00
Josh Durgin	029bcbd8b0	rbd: set blk_queue request sizes to object size This improves performance since more requests can be merged. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>	2011-07-26 11:29:35 -07:00
Yehuda Sadeh	79e3057c4c	rbd: cancel watch request when releasing the device We were missing this cleanup, so when a device was released the osd didn't clean up its watchers list, so following notifications could be slow as osd needed to timeout on the client. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2011-07-26 11:29:04 -07:00
Sage Weil	9db4b3e327	rbd: handle online resize of underlying rbd image If we get a notification that the image header has changed, check for a change in the image size. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-24 11:52:08 -07:00
Sage Weil	aedfec59ee	rbd: use snprintf for disk->disk_name Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-24 11:52:03 -07:00
Sage Weil	916d4d6727	rbd: cleanup: make kfree match kmalloc Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-24 11:52:01 -07:00
Sage Weil	13143d2d1c	rbd: warn on update_snaps failure on notify Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-19 11:25:05 -07:00
Yehuda Sadeh	1fec70932d	rbd: fix split bio handling The rbd driver currently splits bios when they span an object boundary. However, the blk_end_request expects the completions to roll up the results in block device order, and the split rbd/ceph ops can complete in any order. This patch adds a struct rbd_req_coll to track completion of split requests and ensures that the results are passed back up to the block layer in order. This fixes errors where the file system gets completion of a read operation that spans an object boundary before the data has actually arrived. The bug is easily reproduced with iozone with a working set larger than available RAM. Reported-by: Fyodor Ustinov <ufm@ufm.su> Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-13 13:52:57 -07:00
Sage Weil	11f770027b	rbd: fix leak of ops struct The ops vector must be freed by the rbd_do_request caller. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-12 20:59:14 -07:00
Sage Weil	4ad12621e4	libceph: fix ceph_osdc_alloc_request error checks ceph_osdc_alloc_request returns NULL on failure. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-03 09:28:13 -07:00
Yehuda Sadeh	59c2be1e4d	rbd: use watch/notify for changes in rbd header Send notifications when we change the rbd header (e.g. create a snapshot) and wait for such notifications. This allows synchronizing the snapshot creation between different rbd clients/rools. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-22 11:33:56 -07:00
Yehuda Sadeh	766fc43973	rbd: fix cleanup when trying to mount inexistent image Previously we didn't clean up the sysfs entry that was just created. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-12 15:15:18 -08:00
Yehuda Sadeh	dfc5606dc5	rbd: replace the rbd sysfs interface The new interface creates directories per mapped image and under each it creates a subdir per available snapshot. This allows keeping a cleaner interface within the sysfs guidelines. The ABI documentation was updated too. Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-01 15:53:22 -08:00
Dan Carpenter	85b5aaa624	rbd: passing wrong variable to bvec_kunmap_irq() We should be passing "buf" here insead of "bv". This is tricky because it's not the same as kmap() and kunmap(). GCC does warn about it if you compile on i386 with CONFIG_HIGHMEM. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-20 15:38:25 -07:00
Dan Carpenter	b8d0638a98	rbd: null vs ERR_PTR ceph_alloc_page_vector() returns ERR_PTR(-ENOMEM) on errors. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-20 15:38:24 -07:00
Yehuda Sadeh	f4cf3deef4	block: rbd: removing unnecessary test rbd_get_segment() can't return a negative value, we don't need to check the return output. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2010-10-20 15:38:20 -07:00
Vasiliy Kulikov	28f259b7cd	block: rbd: fixed may leaks rbd_client_create() doesn't free rbdc, this leads to many leaks. seg_len in rbd_do_op() is unsigned, so (seg_len < 0) makes no sense. Also if fixed check fails then seg_name is leaked. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2010-10-20 15:38:19 -07:00
Yehuda Sadeh	602adf4002	rbd: introduce rados block device (rbd), based on libceph The rados block device (rbd), based on osdblk, creates a block device that is backed by objects stored in the Ceph distributed object storage cluster. Each device consists of a single metadata object and data striped over many data objects. The rbd driver supports read-only snapshots. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-20 15:38:13 -07:00

34 commits