If a kthread happens to use get_user_pages() on an mm (as KSM does),
there's a chance that it will end up trying to read in a swap page, then
oops in grab_swap_token() because the kthread has no mm: GUP passes down
the right mm, so grab_swap_token() ought to be using it.
We have not identified a stronger case than KSM's daemon (not yet in
mainline), but the issue must have come up before, since RHEL has included
a fix for this for years (though a different fix, they just back out of
grab_swap_token if current->mm is unset: which is what we first proposed,
but using the right mm here seems more correct).
Reported-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Following bug was uncovered by compiling with '-W' flag:
CC mm/thrash.o
mm/thrash.c: In function âgrab_swap_tokenâ:
mm/thrash.c:52: warning: comparison of unsigned expression < 0 is always false
Variable token_priority is unsigned, so decrementing first and then
checking the result does not work; fixed by reversing the test, patch
attached (compile tested only).
I am not sure if likely() makes much sense in this new situation, but
I'll let somebody else to make a decision on that.
Signed-off-by: Mika Kukkonen <mikukkon@iki.fi>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch makes the needlessly global "global_faults" static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The new swap token patches replace the current token traversal algo. The old
algo had a crude timeout parameter that was used to handover the token from
one task to another. This algo, transfers the token to the tasks that are in
need of the token. The urgency for the token is based on the number of times
a task is required to swap-in pages. Accordingly, the priority of a task is
incremented if it has been badly affected due to swap-outs. To ensure that
the token doesnt bounce around rapidly, the token holders are given a priority
boost. The priority of tasks is also decremented, if their rate of swap-in's
keeps reducing. This way, the condition to check whether to pre-empt the swap
token, is a matter of comparing two task's priority fields.
[akpm@osdl.org: cleanups]
Signed-off-by: Ashwin Chaugule <ashwin.chaugule@celunite.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some users (hi Zwane) have seen a problem when running a workload that
eats nearly all of physical memory - th system does an OOM kill, even
when there is still a lot of swap free.
The problem appears to be a very big task that is holding the swap
token, and the VM has a very hard time finding any other page in the
system that is swappable.
Instead of ignoring the swap token when sc->priority reaches 0, we could
simply take the swap token away from the memory hog and make sure we
don't give it back to the memory hog for a few seconds.
This patch resolves the problem Zwane ran into.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It turns out that the original swap token implementation, by Song Jiang, only
enforced the swap token while the task holding the token is handling a page
fault. This patch approximates that, without adding an additional flag to the
mm_struct, by checking whether the mm->mmap_sem is held for reading, like the
page fault code does.
This patch has the effect of automatically, and gradually, disabling the
enforcement of the swap token when there is little or no paging going on, and
"turning up" the intensity of the swap token code the more the task holding
the token is thrashing.
Thanks to Song Jiang for pointing out this aspect of the token based thrashing
control concept.
The new code shows a slight degradation over the old swap token code, but
still a big win over running without the swap token.
2.6.12+ swap token disabled
$ for i in `seq 10` ; do /usr/bin/time ./qsbench -n 30000000 -p 3 ; done
101.74user 23.13system 8:26.91elapsed 24%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (38597major+430315minor)pagefaults 0swaps
101.98user 24.91system 8:03.06elapsed 26%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (33939major+430457minor)pagefaults 0swaps
101.93user 22.12system 7:34.90elapsed 27%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (33166major+421267minor)pagefaults 0swaps
101.82user 22.38system 8:31.40elapsed 24%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (39338major+433262minor)pagefaults 0swaps
2.6.12+ swap token enabled, timeout 300 seconds
$ for i in `seq 4` ; do /usr/bin/time ./qsbench -n 30000000 -p 3 ; done
102.58user 16.08system 3:41.44elapsed 53%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (19707major+285786minor)pagefaults 0swaps
102.07user 19.56system 4:00.64elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (19012major+299259minor)pagefaults 0swaps
102.64user 18.25system 4:07.31elapsed 48%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (21990major+304831minor)pagefaults 0swaps
101.39user 19.41system 5:15.81elapsed 38%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (24850major+323321minor)pagefaults 0swaps
2.6.12+ with new swap token code, timeout 300 seconds
$ for i in `seq 4` ; do /usr/bin/time ./qsbench -n 30000000 -p 3 ; done
101.87user 24.66system 5:53.20elapsed 35%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (26848major+363497minor)pagefaults 0swaps
102.83user 19.95system 4:17.25elapsed 47%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (19946major+305722minor)pagefaults 0swaps
102.09user 19.46system 5:12.57elapsed 38%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (25461major+334994minor)pagefaults 0swaps
101.67user 20.61system 4:52.97elapsed 41%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (22190major+329508minor)pagefaults 0swaps
Signed-off-by: Rik Van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!