Commit Graph

76 Commits (30b29537bcba070b3df8d7d24c1975676a1a6a4f)

Author SHA1 Message Date
Dan Magenheimer e8b4553457 zcache: Set SWIZ_BITS to 8 to reduce tmem bucket lock contention.
SWIZ_BITS > 8 results in a much larger number of "tmem_obj"
allocations, likely one per page-placed-in-frontswap.  The
tmem_obj is not huge (roughly 100 bytes), but it is large
enough to add a not-insignificant memory overhead to zcache.

Setting SWIZ_BITS to 8 yields roughly the same lock contention
without the wasted space.

The effect of SWIZ_BITS can be thought of as "2^SWIZ_BITS is
the number of unique oids that can be generated" (this concept is
limited to frontswap's use of tmem).
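A userspace sketch of the oid arithmetic described above. It assumes the swizzle forms the oid as (type << SWIZ_BITS) | (offset & mask) and uses the remaining high offset bits as the index. The helper names (swiz_oid, swiz_index, count_unique_oids) are hypothetical. The sketch only shows that the number of distinct oids, and hence tmem_obj allocations, is capped at 2^SWIZ_BITS per swap type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers modeled on the description above: the low
 * swiz_bits bits of the swap offset pick the object id, and the rest
 * become the page index within that object. */
static uint64_t swiz_oid(unsigned type, uint32_t offset, unsigned swiz_bits)
{
	uint32_t mask = (1u << swiz_bits) - 1;

	return ((uint64_t)type << swiz_bits) | (offset & mask);
}

static uint32_t swiz_index(uint32_t offset, unsigned swiz_bits)
{
	return offset >> swiz_bits;
}

/* Count distinct oids (one tmem_obj each) for swap type 0, offsets 0..n-1. */
static unsigned count_unique_oids(unsigned swiz_bits, uint32_t n)
{
	static unsigned char seen[1u << 16];
	unsigned unique = 0;

	assert(swiz_bits <= 16);
	memset(seen, 0, sizeof(seen));
	for (uint32_t off = 0; off < n; off++) {
		uint64_t oid = swiz_oid(0, off, swiz_bits);

		if (!seen[oid]) {
			seen[oid] = 1;
			unique++;
		}
	}
	return unique;
}
```

For example, count_unique_oids(4, 100000) gives 16 and count_unique_oids(8, 100000) gives 256. With 16 bits and only 1000 pages, every page gets its own object, which is the per-page tmem_obj overhead described above.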

Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-02-08 14:14:12 -08:00
Dan Magenheimer 9256a4789b zcache: fix deadlock condition
I discovered this deadlock condition a while ago while working on RAMster
but it affects zcache as well.  The list spinlock must be
locked prior to the page spinlock and released after.  As
a result, the page copy must also be done while the locks are held.
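A userspace sketch of the lock ordering the fix requires. It uses a toy spinlock built on C11 atomic_flag in place of kernel spinlocks. The message does not name the exact zcache locks, so the structure and names (toy_store, list_lock, page_lock) are hypothetical. The point is the order: list lock first, then page lock, copy while both are held, and release in reverse.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

/* Toy spinlock on C11 atomic_flag (userspace stand-in for spinlock_t). */
typedef struct { atomic_flag held; } toy_spinlock;

static void toy_lock(toy_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin until the holder clears the flag */
}

static void toy_unlock(toy_spinlock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical: list_lock guards the list, page_lock guards one page. */
struct toy_store {
	toy_spinlock list_lock;
	toy_spinlock page_lock;
	char data[64];
};

static void toy_store_init(struct toy_store *s, const char *src)
{
	atomic_flag_clear(&s->list_lock.held);
	atomic_flag_clear(&s->page_lock.held);
	strncpy(s->data, src, sizeof(s->data) - 1);
	s->data[sizeof(s->data) - 1] = '\0';
}

/* List lock first, then page lock; copy under both; release in reverse. */
static void toy_get_page(struct toy_store *s, char *dst, size_t len)
{
	assert(len <= sizeof(s->data));
	toy_lock(&s->list_lock);
	toy_lock(&s->page_lock);
	memcpy(dst, s->data, len);
	toy_unlock(&s->page_lock);
	toy_unlock(&s->list_lock);
}
```

Because every path that needs both locks takes them in the same order, no two paths can each hold one lock while waiting for the other.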

Applies to 3.2.  Konrad, please push (via GregKH?)...
this is definitely a bug fix, so it need not wait for
a -rc0 window.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-02-08 14:14:12 -08:00
Dan Magenheimer 91c6cc9b5c mm: zcache/tmem/cleancache: s/flush/invalidate/
Complete the renaming from "flush" to "invalidate" across
both tmem frontends (cleancache and frontswap) and both tmem backends
(Xen and zcache), as required by akpm.

This change is completely cosmetic.

[v10: no change]
[v9: akpm@linux-foundation.org: change "flush" to "invalidate", part 3]
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Jan Beulich <JBeulich@novell.com>
Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Rik Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
[v11: Remove the frontswap part]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-01-23 16:06:37 -05:00
Bernhard Heinloth ebadb73043 Staging: zcache: Fix calls to obsolete function
Replace the obsolete function strict_strtol() with kstrtol(), as suggested by the checkpatch script.
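To illustrate the kstrtol() contract (0 on success, a negative errno on failure, and the value returned through a pointer), here is a hypothetical userspace analogue built on the C library's strtol(). parse_long_strict is not a kernel function. Its edge-case behavior, such as accepting exactly one trailing newline and rejecting leading whitespace, is an assumption meant to approximate kstrtol() rather than reproduce it exactly.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical kstrtol()-style parser: returns 0 and stores the value on
 * success, -EINVAL on malformed input, -ERANGE on overflow.  One trailing
 * newline is accepted, since sysfs writes usually end with '\n'. */
static int parse_long_strict(const char *s, unsigned int base, long *res)
{
	char *end;
	long val;

	if (s[0] == '\0' || isspace((unsigned char)s[0]))
		return -EINVAL;
	errno = 0;
	val = strtol(s, &end, (int)base);
	if (end == s)
		return -EINVAL;		/* no digits at all */
	if (errno == ERANGE)
		return -ERANGE;		/* value does not fit in a long */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */
	*res = val;
	return 0;
}
```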

Signed-off-by: Bernhard Heinloth <bernhard@heinloth.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-26 18:13:55 -08:00
Greg Kroah-Hartman 43a3beb6da Merge branch 'staging-next' into Linux 3.1
This was done to resolve a conflict in the
drivers/staging/comedi/drivers/ni_labpc.c file between a build
bugfix in Linus's tree and a "better" bugfix in the
staging-next tree that resolved the issue more completely.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-25 09:18:11 +02:00
Seth Jennings 00bf256011 staging: zcache: remove zcache_direct_reclaim_lock
zcache_do_preload() currently does a spin_trylock() on the
zcache_direct_reclaim_lock. Holding this lock is intended to prevent
shrink_zcache_memory() from evicting zbud pages as a result
of a preload.

However, it also prevents two threads from
executing zcache_do_preload() at the same time.  The first
thread will obtain the lock and the second thread's spin_trylock()
will fail (an aborted preload) causing the page to be either lost
(cleancache) or pushed out to the swap device (frontswap). It
also doesn't ensure that the call to shrink_zcache_memory() is
on the same thread as the call to zcache_do_preload().
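This failure mode can be modeled in userspace with a trylock on a C11 atomic_flag standing in for zcache_direct_reclaim_lock. In the usage below, the calls run sequentially on one thread to imitate two threads racing into the preload path. The function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the removed zcache_direct_reclaim_lock. */
static atomic_flag reclaim_lock = ATOMIC_FLAG_INIT;

static bool toy_trylock(atomic_flag *f)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set_explicit(f, memory_order_acquire);
}

/* Hypothetical model of the old preload entry: -1 means the preload was
 * aborted, so the page is lost (cleancache) or sent to swap (frontswap). */
static int old_style_preload_begin(void)
{
	return toy_trylock(&reclaim_lock) ? 0 : -1;
}

static void old_style_preload_end(void)
{
	atomic_flag_clear_explicit(&reclaim_lock, memory_order_release);
}
```

While the first caller is between begin and end, a second caller's begin returns -1 even though nothing is actually being reclaimed.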

Additionally, there is no need for this mechanism because all
zcache_do_preload() calls that come down from cleancache already
have PF_MEMALLOC set in the process flags which prevents
direct reclaim in the memory manager. If the zcache_do_preload()
call is done from the frontswap path, we _want_ reclaim to be
done (which it isn't right now).

This patch removes the zcache_direct_reclaim_lock and related
statistics in zcache.

Based on v3.1-rc8

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Reviewed-by: Dave Hansen <dave@linux.vnet.ibm.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-17 15:24:11 -07:00
Seth Jennings 3d65c85f91 staging: zcache: reduce tmem bucket lock contention
tmem uses hash buckets, each with its own rbtree and lock, to
quickly look up tmem objects.  tmem has TMEM_HASH_BUCKETS (256)
buckets per pool.  However, because of the way the tmem_oid is
generated for frontswap pages, only 16 unique tmem_oids are being
generated, resulting in only 16 of the 256 buckets being used.
This causes high lock contention on the per-bucket locks.

This patch changes SWIZ_BITS to include more bits of the offset.
The result is that all 256 hash buckets are potentially used, resulting
in a 95% drop in hash bucket lock contention.
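A small sketch of why 16 oids can occupy at most 16 of the 256 buckets. The bucket function here is a plain modulo, which is an assumption standing in for tmem's real oid hash. Whatever hash is used, the number of buckets touched cannot exceed the number of distinct oids.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_BUCKETS 256 /* TMEM_HASH_BUCKETS per pool, per the text */

/* Assumption: simple modulo in place of tmem's actual oid hash. */
static unsigned toy_bucket(uint64_t oid)
{
	return (unsigned)(oid % NR_BUCKETS);
}

/* Buckets touched by swap type 0 pages at offsets 0..nr_offsets-1. */
static unsigned buckets_used(unsigned swiz_bits, uint32_t nr_offsets)
{
	unsigned char used[NR_BUCKETS];
	unsigned count = 0;
	uint32_t mask;

	assert(swiz_bits < 32);
	mask = (1u << swiz_bits) - 1;
	memset(used, 0, sizeof(used));
	for (uint32_t off = 0; off < nr_offsets; off++) {
		unsigned b = toy_bucket(off & mask); /* oid for type 0 */

		if (!used[b]) {
			used[b] = 1;
			count++;
		}
	}
	return count;
}
```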

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-12 09:29:03 -06:00
Seth Jennings 8550be08cb staging: zcache: fix crash on cpu remove
In the case that a cpu is taken offline before zcache_do_preload() is
ever called on the cpu, the per-cpu zcache_preloads structure will
be uninitialized.  In the CPU_DEAD case for zcache_cpu_notifier(),
kp->obj is not checked before calling kmem_cache_free() on it.
If it is NULL, a crash results.

This patch ensures that both kp->obj and kp->page are not NULL before
calling the respective free functions. In practice, just checking
one or the other should be sufficient since they are assigned together
in zcache_do_preload(), but I check both for safety.
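A userspace sketch of the CPU_DEAD cleanup after the fix. The structure mirrors the two per-cpu fields named above (obj and page), but toy_preloads, toy_release and toy_cpu_dead are hypothetical. toy_release asserts on NULL to stand in for the crash the message describes when kmem_cache_free() is handed a NULL obj. Userspace free(NULL) itself would be harmless.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical mirror of the per-cpu preload slots. */
struct toy_preloads {
	void *obj;
	void *page;
};

static int nr_frees; /* frees performed, for illustration only */

/* Stand-in for kmem_cache_free()/free_page(): must not get NULL here. */
static void toy_release(void *p)
{
	assert(p != NULL);
	free(p);
	nr_frees++;
}

/* CPU_DEAD handling: check each pointer before freeing it, since a cpu
 * that never ran the preload path has both fields still NULL. */
static void toy_cpu_dead(struct toy_preloads *kp)
{
	if (kp->obj) {
		toy_release(kp->obj);
		kp->obj = NULL;
	}
	if (kp->page) {
		toy_release(kp->page);
		kp->page = NULL;
	}
}
```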

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-11 10:02:49 -06:00
Seth Jennings 80976804f5 staging: zcache: fix cleancache crash
After commit c5f5c4db39 ("staging: zcache: fix crash on high memory
swap") cleancache crashes on the first successful get.  This was caused
by a remaining virt_to_page() call in zcache_pampd_get_data_and_free()
that only gets run in the cleancache path.

This patch replaces the virt_to_page() call with a struct page cast, as
was done for the other instances in c5f5c4db39.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Tested-By: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-20 14:17:13 -07:00
Greg Kroah-Hartman 6eafa4604c Merge 3.1-rc4 into staging-next
This resolves a conflict with:
	drivers/staging/brcm80211/brcmsmac/types.h

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-29 08:47:46 -07:00
Dan Carpenter 1dcab0875b Staging: zcache: signedness bug in tmem_get()
"ret" needs to be signed for the error handling to work properly.

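A minimal illustration of this class of bug. It uses hypothetical function names rather than the actual tmem_get() code. A negative errno stored in an unsigned variable becomes a large positive value, so a "ret < 0" check never fires.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical lookup returning a negative errno on failure. */
static int toy_lookup(int present)
{
	return present ? 0 : -ENOENT;
}

/* Buggy pattern: unsigned ret makes the "< 0" test always false. */
static int buggy_get(int present)
{
	unsigned int ret = toy_lookup(present);

	if (ret < 0) /* never true; compilers may warn here */
		return -1;
	return 0;
}

/* Fixed pattern: signed ret, so the error path is reachable. */
static int fixed_get(int present)
{
	int ret = toy_lookup(present);

	if (ret < 0)
		return -1;
	return 0;
}
```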
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-23 14:52:20 -07:00
Seth Jennings c5f5c4db39 staging: zcache: fix crash on high memory swap
zcache_put_page() was modified to pass page_address(page) instead of the
actual page structure. In combination with the function signature changes
to tmem_put() and zcache_pampd_create(), zcache_pampd_create() tries to
(re)derive the page structure from the virtual address.  However, if the
original page is a high memory page (or any unmapped page), this
virt_to_page() fails because the page_address() in zcache_put_page()
returned NULL.

This patch changes zcache_put_page() and zcache_get_page() to pass
the page structure instead of the page's virtual address, which
may or may not exist.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-23 14:52:20 -07:00
Seth Jennings 0428fec32c staging: zcache: fix typos
The patch fixes two typos in zcache-main.c

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-23 14:49:33 -07:00
Seth Jennings dbe82eb117 staging: zcache: fix possible sleep under lock
zcache_new_pool() calls kmalloc() with GFP_KERNEL which has
__GFP_WAIT set.  However, zcache_new_pool() gets called on
a stack that holds the swap_lock spinlock, leading to a
possible sleep-with-lock situation. The lock is obtained
in enable_swap_info().

The patch replaces GFP_KERNEL with GFP_ATOMIC.

v2: replace with GFP_ATOMIC, not GFP_IOFS

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-23 14:49:33 -07:00
Nitin Gupta d8c778fdf2 zcache: Fix build error when sysfs is not defined
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08 12:05:35 -07:00
Thadeu Lima de Souza Cascardo 3ca15c4486 zcache: Use div_u64 for 64-bit division
xv_get_total_size_bytes returns a u64 value that is used in a division.
This causes build failures on 32-bit architectures, as reported by Randy
Dunlap.
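In the kernel, using the plain / operator on a u64 on a 32-bit architecture makes the compiler emit a call to a libgcc helper (such as __udivdi3) that the kernel does not provide. That is why div_u64() from linux/math64.h is used instead. As a rough userspace illustration, the sketch below performs 64-by-32 division with restoring shift-and-subtract. It is not how the kernel implements div_u64(), whose internals are architecture-specific.

```c
#include <assert.h>
#include <stdint.h>

/* Restoring binary long division of a 64-bit dividend by a 32-bit
 * divisor, using only shifts, compares and subtraction. */
static uint64_t div_u64_rem_sketch(uint64_t dividend, uint32_t divisor,
				   uint32_t *remainder)
{
	uint64_t quotient = 0;
	uint64_t rem = 0; /* always < divisor, so it fits in 33 bits */

	assert(divisor != 0);
	for (int bit = 63; bit >= 0; bit--) {
		rem = (rem << 1) | ((dividend >> bit) & 1);
		if (rem >= divisor) {
			rem -= divisor;
			quotient |= (uint64_t)1 << bit;
		}
	}
	*remainder = (uint32_t)rem;
	return quotient;
}

static uint64_t div_u64_sketch(uint64_t dividend, uint32_t divisor)
{
	uint32_t unused;

	return div_u64_rem_sketch(dividend, divisor, &unused);
}
```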

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08 12:05:34 -07:00
Thadeu Lima de Souza Cascardo 12623f07b9 staging: zcache: include module.h for MODULE_LICENSE
The upcoming cleanup of module.h usage requires module.h to be included
explicitly wherever it was previously pulled in indirectly. Otherwise,
building zcache will fail.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-03 07:25:49 -07:00
Thadeu Lima de Souza Cascardo fd6b68bbac staging: zcache: module is GPL
This avoids tainting the kernel as if a proprietary module were loaded.
The kernel will still be tainted because this is a staging driver.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-02 16:06:18 -07:00
Thadeu Lima de Souza Cascardo bf0c0259c7 staging: fix zcache building
zcache is only building tmem.c and not building zcache.c. To keep the
module name, zcache.c must be renamed if symbols from tmem.c are to
remain unexported.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-02 16:06:18 -07:00
Dan Magenheimer 966b9016a1 staging: zcache: support multiple clients, prep for KVM and RAMster
This is version 3 of an update to zcache, incorporating feedback from the list.
This patch adds support to the in-kernel transcendent memory ("tmem") code
and the zcache driver for multiple clients, which will be needed for both
RAMster and KVM support.  It also adds additional tmem callbacks to support
RAMster and corresponding no-op stubs in the zcache driver.  In v2, I've
also taken the liberty of adding some additional sysfs variables to
both surface information and allow policy control.  Those experimenting
with zcache should find them useful.  V3 clarifies some array
declarations and the code that walks those arrays.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>

[v3: error27@gmail.com: fix array bounds/walking]
[v2: konrad.wilk@oracle.com: fix bools, add check for NULL, fix a comment]
[v2: sjenning@linux.vnet.ibm.com: add info/tunables for poor compression]
[v2: marcusklemm@googlemail.com: add tunable for max persistent pages]
Acked-by: Dan Carpenter <error27@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: linux-mm@kvack.org
Cc: kvm@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-07-08 14:18:53 -07:00
Ying Han 1495f230fa vmscan: change shrinker API by passing shrink_control struct
Change each shrinker's API by consolidating the existing parameters into
a shrink_control struct.  This will allow further features to be added
without touching every shrinker implementation.
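A userspace mirror of the consolidated callback shape. The two fields (gfp_mask, nr_to_scan) are assumed to match what the kernel's struct shrink_control carried around this change. The toy_ names, the object count, and the "nr_to_scan == 0 means query" convention are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>

typedef unsigned int toy_gfp_t; /* stand-in for gfp_t */

/* Parameters bundled into one struct, as the commit describes. */
struct toy_shrink_control {
	toy_gfp_t gfp_mask;
	unsigned long nr_to_scan;
};

struct toy_shrinker {
	int (*shrink)(struct toy_shrinker *s, struct toy_shrink_control *sc);
};

static unsigned long cached_objects = 100; /* hypothetical reclaimable objects */

/* nr_to_scan == 0 queries the count; otherwise reclaim up to nr_to_scan.
 * Returns the number of objects remaining. */
static int toy_shrink(struct toy_shrinker *s, struct toy_shrink_control *sc)
{
	(void)s;
	if (sc->nr_to_scan) {
		unsigned long n = sc->nr_to_scan < cached_objects ?
				  sc->nr_to_scan : cached_objects;
		cached_objects -= n;
	}
	return (int)cached_objects;
}
```

Adding a new field to the control struct later changes no callback signature. Shrinkers that do not need the field simply ignore it.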

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: fix warning]
[kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
[akpm@linux-foundation.org: fix xfs warning]
[akpm@linux-foundation.org: update gfs2]
Signed-off-by: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:26 -07:00
Nitin Gupta 3c8bb7aab9 staging: Allow sharing xvmalloc for zram and zcache
Both zram and zcache use the xvmalloc allocator. If xvmalloc
is compiled separately for each of them, we get a linker
error when both are selected as "built-in". We can also
get a linker error about missing xvmalloc symbols if zram
is not built.

So, we now compile xvmalloc separately and export its symbols,
which are then used by both zram and zcache.

Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-23 14:02:56 -08:00
Vasiliy Kulikov 69648bed53 staging: zcache: fix memory leak
obj is not freed if __get_free_page() fails.
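A userspace sketch of the usual fix for this kind of leak: unwind earlier allocations with a goto when a later one fails. The allocation helpers, the live-allocation counter and the fault-injection switch are all hypothetical, used only to make the leak observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_allocs;     /* outstanding allocations, for illustration */
static int fail_page_alloc; /* hypothetical fault-injection switch */

static void *toy_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void toy_free(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

static void *toy_get_free_page(void)
{
	return fail_page_alloc ? NULL : toy_alloc(4096);
}

/* Allocate obj, then page; if the page allocation fails, free obj
 * before returning so it is not leaked. */
static int toy_preload(void **objp, void **pagep)
{
	void *obj = toy_alloc(64);
	void *page;

	if (!obj)
		return -ENOMEM;
	page = toy_get_free_page();
	if (!page)
		goto free_obj;
	*objp = obj;
	*pagep = page;
	return 0;

free_obj:
	toy_free(obj);
	return -ENOMEM;
}
```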

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-18 13:24:53 -08:00
Dan Magenheimer 6630889735 staging: zcache: misc build/config
[PATCH V2 3/3] drivers/staging: zcache: misc build/config

Makefiles and Kconfigs to build zcache in drivers/staging

There is a dependency on xvmalloc.* which in 2.6.37 resides
in drivers/staging/zram.  Should this move or disappear,
some Makefile/Kconfig changes will be required.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-09 15:08:22 -08:00
Dan Magenheimer 9cc06bf88d staging: zcache: host services and PAM services
[PATCH V2 2/3] drivers/staging: zcache: host services and PAM services

Zcache provides host services (memory allocation) for tmem,
a "shim" to interface cleancache and frontswap to tmem, and
two different page-addressable memory implementations using
lzo1x compression.  The first, "compression buddies" ("zbud")
compresses pairs of pages and supplies a shrinker interface
that allows entire pages to be reclaimed.  The second is
a shim to xvMalloc which is more space-efficient but
less receptive to page reclamation.  The first is used
for ephemeral pools and the second for persistent pools.
All ephemeral pools share the same memory; that is, even
pages from different pools can share the same page.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-09 15:07:13 -08:00
Dan Magenheimer daa6afa6d9 staging: zcache: in-kernel tmem code
[PATCH V2 1/3] drivers/staging: zcache: in-kernel tmem code

Transcendent memory ("tmem") is a clean API/ABI that provides
for an efficient address translation and a set of highly
concurrent access methods to copy data between a page-oriented
data source (e.g. cleancache or frontswap) and a page-addressable
memory ("PAM") data store.  Of critical importance, the PAM data
store is of unknown (and possibly varying) size, so any individual
access may succeed or fail as defined by the API/ABI.

Tmem exports a basic set of access methods (e.g. put, get,
flush, flush object, new pool, and destroy pool) which are
normally called from a "host" (e.g. zcache).

To be functional, two sets of "ops" must be registered by the
host, one to provide "host services" (memory allocation) and
one to provide page-addressable memory ("PAM") hooks.

Tmem supports one or more "clients", each of which can provide
a set of "pools" to partition pages.  Each pool contains
a set of "objects"; each object holds pointers to some number
of PAM page descriptors ("pampd"), indexed by an "index" number.
This triple <pool id, object id, index> is sometimes referred
to as a "handle".  Tmem's primary function is to essentially
provide address translation of handles into pampds and move
data appropriately.

As an example, for cleancache, a pool maps to a filesystem,
an object maps to a file, and the index is the page offset
into the file.  And in this patch, zcache is the host and
each PAM descriptor points to a compressed page of data.
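A toy model of the handle-to-pampd translation described above, keyed on the <pool id, object id, index> triple. Per the text, real tmem finds objects through per-pool hash buckets of rbtrees, and each object holds its pampds by index. This sketch instead uses a single flat open-addressing table, and every name in it is hypothetical. For cleancache, pool_id would identify a filesystem, object_id a file (for example, an inode number, which is an assumption), and index the page offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handle triple: <pool id, object id, index>. */
struct toy_handle {
	uint32_t pool_id;
	uint64_t object_id;
	uint32_t index;
};

#define TOY_SLOTS 64

struct toy_entry {
	int used;
	struct toy_handle h;
	void *pampd; /* opaque PAM descriptor, e.g. a compressed page */
};

static struct toy_entry toy_table[TOY_SLOTS];

static int same_handle(const struct toy_handle *a, const struct toy_handle *b)
{
	return a->pool_id == b->pool_id && a->object_id == b->object_id &&
	       a->index == b->index;
}

static size_t slot_of(const struct toy_handle *h)
{
	uint64_t x = h->object_id * 31u + h->index * 7u + h->pool_id;

	return (size_t)(x % TOY_SLOTS);
}

/* put: record handle -> pampd; may fail when the store is full. */
static int toy_put(const struct toy_handle *h, void *pampd)
{
	size_t s = slot_of(h);

	for (size_t i = 0; i < TOY_SLOTS; i++, s = (s + 1) % TOY_SLOTS) {
		if (!toy_table[s].used || same_handle(&toy_table[s].h, h)) {
			toy_table[s].used = 1;
			toy_table[s].h = *h;
			toy_table[s].pampd = pampd;
			return 0;
		}
	}
	return -1;
}

/* get: translate a handle to its pampd, or NULL if not present. */
static void *toy_get(const struct toy_handle *h)
{
	size_t s = slot_of(h);

	for (size_t i = 0; i < TOY_SLOTS && toy_table[s].used;
	     i++, s = (s + 1) % TOY_SLOTS)
		if (same_handle(&toy_table[s].h, h))
			return toy_table[s].pampd;
	return NULL;
}
```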

Tmem supports two kinds of pages: "ephemeral" and "persistent".
Ephemeral pages may be asynchronously reclaimed "bottoms up"
so the data structures and concurrency model must allow for
this.  For example, each pampd must retain sufficient information
about its containing object to invalidate tmem's handle-to-pampd
translation so that, on reclaim, all tmem data structures can be
made consistent.
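A sketch of "bottoms up" reclaim. Each pampd keeps a back-pointer to its containing object and its index, so reclaim can start from the pampd alone and remove the translation. The structures are hypothetical, and the locking that real concurrent reclaim would need is deliberately omitted.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_INDEXES 8

struct toy_obj;

/* Hypothetical pampd that remembers where it is referenced from. */
struct toy_pampd {
	struct toy_obj *obj;  /* back-pointer to containing object */
	unsigned index;       /* slot within that object */
	char compressed[32];  /* stand-in for compressed page data */
};

struct toy_obj {
	struct toy_pampd *slots[TOY_INDEXES];
	unsigned nr_pampds;
};

static void toy_obj_add(struct toy_obj *obj, unsigned index,
			struct toy_pampd *pd)
{
	assert(index < TOY_INDEXES && obj->slots[index] == NULL);
	pd->obj = obj;
	pd->index = index;
	obj->slots[index] = pd;
	obj->nr_pampds++;
}

/* Reclaim starting from the pampd only: clear the object's slot so
 * later lookups miss instead of reaching reclaimed data. */
static void toy_reclaim(struct toy_pampd *pd)
{
	struct toy_obj *obj = pd->obj;

	obj->slots[pd->index] = NULL;
	obj->nr_pampds--;
	pd->obj = NULL;
}
```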

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-09 15:06:52 -08:00