8b8edefa2f
Make fscache object state transition callbacks use workqueue instead of slow-work. New dedicated unbound CPU workqueue fscache_object_wq is created. get/put callbacks are renamed and modified to take @object and called directly from the enqueue wrapper and the work function. While at it, make all open coded instances of get/put to use fscache_get/put_object(). * Unbound workqueue is used. * work_busy() output is printed instead of slow-work flags in object debugging outputs. They mean basically the same thing bit-for-bit. * sysctl fscache.object_max_active added to control concurrency. The default value is nr_cpus clamped between 4 and WQ_UNBOUND_MAX_ACTIVE. * slow_work_sleep_till_thread_needed() is replaced with fscache private implementation fscache_object_sleep_till_congested() which waits on fscache_object_wq congestion. * debugfs support is dropped for now. Tracing API based debug facility is planned to be added. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David Howells <dhowells@redhat.com>
443 lines
19 KiB
Text
443 lines
19 KiB
Text
==========================
|
|
General Filesystem Caching
|
|
==========================
|
|
|
|
========
|
|
OVERVIEW
|
|
========
|
|
|
|
This facility is a general purpose cache for network filesystems, though it
|
|
could be used for caching other things such as ISO9660 filesystems too.
|
|
|
|
FS-Cache mediates between cache backends (such as CacheFS) and network
|
|
filesystems:
|
|
|
|
+---------+
|
|
| | +--------------+
|
|
| NFS |--+ | |
|
|
| | | +-->| CacheFS |
|
|
+---------+ | +----------+ | | /dev/hda5 |
|
|
| | | | +--------------+
|
|
+---------+ +-->| | |
|
|
| | | |--+
|
|
| AFS |----->| FS-Cache |
|
|
| | | |--+
|
|
+---------+ +-->| | |
|
|
| | | | +--------------+
|
|
+---------+ | +----------+ | | |
|
|
| | | +-->| CacheFiles |
|
|
| ISOFS |--+ | /var/cache |
|
|
| | +--------------+
|
|
+---------+
|
|
|
|
Or to look at it another way, FS-Cache is a module that provides a caching
|
|
facility to a network filesystem such that the cache is transparent to the
|
|
user:
|
|
|
|
+---------+
|
|
| |
|
|
| Server |
|
|
| |
|
|
+---------+
|
|
| NETWORK
|
|
~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
|
| +----------+
|
|
V | |
|
|
+---------+ | |
|
|
| | | |
|
|
| NFS |----->| FS-Cache |
|
|
| | | |--+
|
|
+---------+ | | | +--------------+ +--------------+
|
|
| | | | | | | |
|
|
V +----------+ +-->| CacheFiles |-->| Ext3 |
|
|
+---------+ | /var/cache | | /dev/sda6 |
|
|
| | +--------------+ +--------------+
|
|
| VFS | ^ ^
|
|
| | | |
|
|
+---------+ +--------------+ |
|
|
| KERNEL SPACE | |
|
|
~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|~~~~~~|~~~~
|
|
| USER SPACE | |
|
|
V | |
|
|
+---------+ +--------------+
|
|
| | | |
|
|
| Process | | cachefilesd |
|
|
| | | |
|
|
+---------+ +--------------+
|
|
|
|
|
|
FS-Cache does not follow the idea of completely loading every netfs file
|
|
opened in its entirety into a cache before permitting it to be accessed and
|
|
then serving the pages out of that cache rather than the netfs inode because:
|
|
|
|
(1) It must be practical to operate without a cache.
|
|
|
|
(2) The size of any accessible file must not be limited to the size of the
|
|
cache.
|
|
|
|
(3) The combined size of all opened files (this includes mapped libraries)
|
|
must not be limited to the size of the cache.
|
|
|
|
(4) The user should not be forced to download an entire file just to do a
|
|
one-off access of a small portion of it (such as might be done with the
|
|
"file" program).
|
|
|
|
It instead serves the cache out in PAGE_SIZE chunks as and when requested by
|
|
the netfs('s) using it.
|
|
|
|
|
|
FS-Cache provides the following facilities:
|
|
|
|
(1) More than one cache can be used at once. Caches can be selected
|
|
explicitly by use of tags.
|
|
|
|
(2) Caches can be added / removed at any time.
|
|
|
|
(3) The netfs is provided with an interface that allows either party to
|
|
withdraw caching facilities from a file (required for (2)).
|
|
|
|
(4) The interface to the netfs returns as few errors as possible, preferring
|
|
rather to let the netfs remain oblivious.
|
|
|
|
(5) Cookies are used to represent indices, files and other objects to the
|
|
netfs. The simplest cookie is just a NULL pointer - indicating nothing
|
|
cached there.
|
|
|
|
(6) The netfs is allowed to propose - dynamically - any index hierarchy it
|
|
desires, though it must be aware that the index search function is
|
|
recursive, stack space is limited, and indices can only be children of
|
|
indices.
|
|
|
|
(7) Data I/O is done direct to and from the netfs's pages. The netfs
|
|
indicates that page A is at index B of the data-file represented by cookie
|
|
C, and that it should be read or written. The cache backend may or may
|
|
not start I/O on that page, but if it does, a netfs callback will be
|
|
invoked to indicate completion. The I/O may be either synchronous or
|
|
asynchronous.
|
|
|
|
(8) Cookies can be "retired" upon release. At this point FS-Cache will mark
|
|
them as obsolete and the index hierarchy rooted at that point will get
|
|
recycled.
|
|
|
|
(9) The netfs provides a "match" function for index searches. In addition to
|
|
saying whether a match was made or not, this can also specify that an
|
|
entry should be updated or deleted.
|
|
|
|
(10) As much as possible is done asynchronously.
|
|
|
|
|
|
FS-Cache maintains a virtual indexing tree in which all indices, files, objects
|
|
and pages are kept. Bits of this tree may actually reside in one or more
|
|
caches.
|
|
|
|
FSDEF
|
|
|
|
|
+------------------------------------+
|
|
| |
|
|
NFS AFS
|
|
| |
|
|
+--------------------------+ +-----------+
|
|
| | | |
|
|
homedir mirror afs.org redhat.com
|
|
| | |
|
|
+------------+ +---------------+ +----------+
|
|
| | | | | |
|
|
00001 00002 00007 00125 vol00001 vol00002
|
|
| | | | |
|
|
+---+---+ +-----+ +---+ +------+------+ +-----+----+
|
|
| | | | | | | | | | | | |
|
|
PG0 PG1 PG2 PG0 XATTR PG0 PG1 DIRENT DIRENT DIRENT R/W R/O Bak
|
|
| |
|
|
PG0 +-------+
|
|
| |
|
|
00001 00003
|
|
|
|
|
+---+---+
|
|
| | |
|
|
PG0 PG1 PG2
|
|
|
|
In the example above, you can see two netfs's being backed: NFS and AFS. These
|
|
have different index hierarchies:
|
|
|
|
(*) The NFS primary index contains per-server indices. Each server index is
|
|
indexed by NFS file handles to get data file objects. Each data file
|
|
objects can have an array of pages, but may also have further child
|
|
objects, such as extended attributes and directory entries. Extended
|
|
attribute objects themselves have page-array contents.
|
|
|
|
(*) The AFS primary index contains per-cell indices. Each cell index contains
|
|
per-logical-volume indices. Each of volume index contains up to three
|
|
indices for the read-write, read-only and backup mirrors of those volumes.
|
|
Each of these contains vnode data file objects, each of which contains an
|
|
array of pages.
|
|
|
|
The very top index is the FS-Cache master index in which individual netfs's
|
|
have entries.
|
|
|
|
Any index object may reside in more than one cache, provided it only has index
|
|
children. Any index with non-index object children will be assumed to only
|
|
reside in one cache.
|
|
|
|
|
|
The netfs API to FS-Cache can be found in:
|
|
|
|
Documentation/filesystems/caching/netfs-api.txt
|
|
|
|
The cache backend API to FS-Cache can be found in:
|
|
|
|
Documentation/filesystems/caching/backend-api.txt
|
|
|
|
A description of the internal representations and object state machine can be
|
|
found in:
|
|
|
|
Documentation/filesystems/caching/object.txt
|
|
|
|
|
|
=======================
|
|
STATISTICAL INFORMATION
|
|
=======================
|
|
|
|
If FS-Cache is compiled with the following options enabled:
|
|
|
|
CONFIG_FSCACHE_STATS=y
|
|
CONFIG_FSCACHE_HISTOGRAM=y
|
|
|
|
then it will gather certain statistics and display them through a number of
|
|
proc files.
|
|
|
|
(*) /proc/fs/fscache/stats
|
|
|
|
This shows counts of a number of events that can happen in FS-Cache:
|
|
|
|
CLASS EVENT MEANING
|
|
======= ======= =======================================================
|
|
Cookies idx=N Number of index cookies allocated
|
|
dat=N Number of data storage cookies allocated
|
|
spc=N Number of special cookies allocated
|
|
Objects alc=N Number of objects allocated
|
|
nal=N Number of object allocation failures
|
|
avl=N Number of objects that reached the available state
|
|
ded=N Number of objects that reached the dead state
|
|
ChkAux non=N Number of objects that didn't have a coherency check
|
|
ok=N Number of objects that passed a coherency check
|
|
upd=N Number of objects that needed a coherency data update
|
|
obs=N Number of objects that were declared obsolete
|
|
Pages mrk=N Number of pages marked as being cached
|
|
unc=N Number of uncache page requests seen
|
|
Acquire n=N Number of acquire cookie requests seen
|
|
nul=N Number of acq reqs given a NULL parent
|
|
noc=N Number of acq reqs rejected due to no cache available
|
|
ok=N Number of acq reqs succeeded
|
|
nbf=N Number of acq reqs rejected due to error
|
|
oom=N Number of acq reqs failed on ENOMEM
|
|
Lookups n=N Number of lookup calls made on cache backends
|
|
neg=N Number of negative lookups made
|
|
pos=N Number of positive lookups made
|
|
crt=N Number of objects created by lookup
|
|
tmo=N Number of lookups timed out and requeued
|
|
Updates n=N Number of update cookie requests seen
|
|
nul=N Number of upd reqs given a NULL parent
|
|
run=N Number of upd reqs granted CPU time
|
|
Relinqs n=N Number of relinquish cookie requests seen
|
|
nul=N Number of rlq reqs given a NULL parent
|
|
wcr=N Number of rlq reqs waited on completion of creation
|
|
AttrChg n=N Number of attribute changed requests seen
|
|
ok=N Number of attr changed requests queued
|
|
nbf=N Number of attr changed rejected -ENOBUFS
|
|
oom=N Number of attr changed failed -ENOMEM
|
|
run=N Number of attr changed ops given CPU time
|
|
Allocs n=N Number of allocation requests seen
|
|
ok=N Number of successful alloc reqs
|
|
wt=N Number of alloc reqs that waited on lookup completion
|
|
nbf=N Number of alloc reqs rejected -ENOBUFS
|
|
int=N Number of alloc reqs aborted -ERESTARTSYS
|
|
ops=N Number of alloc reqs submitted
|
|
owt=N Number of alloc reqs waited for CPU time
|
|
abt=N Number of alloc reqs aborted due to object death
|
|
Retrvls n=N Number of retrieval (read) requests seen
|
|
ok=N Number of successful retr reqs
|
|
wt=N Number of retr reqs that waited on lookup completion
|
|
nod=N Number of retr reqs returned -ENODATA
|
|
nbf=N Number of retr reqs rejected -ENOBUFS
|
|
int=N Number of retr reqs aborted -ERESTARTSYS
|
|
oom=N Number of retr reqs failed -ENOMEM
|
|
ops=N Number of retr reqs submitted
|
|
owt=N Number of retr reqs waited for CPU time
|
|
abt=N Number of retr reqs aborted due to object death
|
|
Stores n=N Number of storage (write) requests seen
|
|
ok=N Number of successful store reqs
|
|
agn=N Number of store reqs on a page already pending storage
|
|
nbf=N Number of store reqs rejected -ENOBUFS
|
|
oom=N Number of store reqs failed -ENOMEM
|
|
ops=N Number of store reqs submitted
|
|
run=N Number of store reqs granted CPU time
|
|
pgs=N Number of pages given store req processing time
|
|
rxd=N Number of store reqs deleted from tracking tree
|
|
olm=N Number of store reqs over store limit
|
|
VmScan nos=N Number of release reqs against pages with no pending store
|
|
gon=N Number of release reqs against pages stored by time lock granted
|
|
bsy=N Number of release reqs ignored due to in-progress store
|
|
can=N Number of page stores cancelled due to release req
|
|
Ops pend=N Number of times async ops added to pending queues
|
|
run=N Number of times async ops given CPU time
|
|
enq=N Number of times async ops queued for processing
|
|
can=N Number of async ops cancelled
|
|
rej=N Number of async ops rejected due to object lookup/create failure
|
|
dfr=N Number of async ops queued for deferred release
|
|
rel=N Number of async ops released
|
|
gc=N Number of deferred-release async ops garbage collected
|
|
CacheOp alo=N Number of in-progress alloc_object() cache ops
|
|
luo=N Number of in-progress lookup_object() cache ops
|
|
luc=N Number of in-progress lookup_complete() cache ops
|
|
gro=N Number of in-progress grab_object() cache ops
|
|
upo=N Number of in-progress update_object() cache ops
|
|
dro=N Number of in-progress drop_object() cache ops
|
|
pto=N Number of in-progress put_object() cache ops
|
|
syn=N Number of in-progress sync_cache() cache ops
|
|
atc=N Number of in-progress attr_changed() cache ops
|
|
rap=N Number of in-progress read_or_alloc_page() cache ops
|
|
ras=N Number of in-progress read_or_alloc_pages() cache ops
|
|
alp=N Number of in-progress allocate_page() cache ops
|
|
als=N Number of in-progress allocate_pages() cache ops
|
|
wrp=N Number of in-progress write_page() cache ops
|
|
ucp=N Number of in-progress uncache_page() cache ops
|
|
dsp=N Number of in-progress dissociate_pages() cache ops
|
|
|
|
|
|
(*) /proc/fs/fscache/histogram
|
|
|
|
cat /proc/fs/fscache/histogram
|
|
JIFS SECS OBJ INST OP RUNS OBJ RUNS RETRV DLY RETRIEVLS
|
|
===== ===== ========= ========= ========= ========= =========
|
|
|
|
This shows the breakdown of the number of times each amount of time
|
|
between 0 jiffies and HZ-1 jiffies a variety of tasks took to run. The
|
|
columns are as follows:
|
|
|
|
COLUMN TIME MEASUREMENT
|
|
======= =======================================================
|
|
OBJ INST Length of time to instantiate an object
|
|
OP RUNS Length of time a call to process an operation took
|
|
OBJ RUNS Length of time a call to process an object event took
|
|
RETRV DLY Time between an requesting a read and lookup completing
|
|
RETRIEVLS Time between beginning and end of a retrieval
|
|
|
|
Each row shows the number of events that took a particular range of times.
|
|
Each step is 1 jiffy in size. The JIFS column indicates the particular
|
|
jiffy range covered, and the SECS field the equivalent number of seconds.
|
|
|
|
|
|
===========
|
|
OBJECT LIST
|
|
===========
|
|
|
|
If CONFIG_FSCACHE_OBJECT_LIST is enabled, the FS-Cache facility will maintain a
|
|
list of all the objects currently allocated and allow them to be viewed
|
|
through:
|
|
|
|
/proc/fs/fscache/objects
|
|
|
|
This will look something like:
|
|
|
|
[root@andromeda ~]# head /proc/fs/fscache/objects
|
|
OBJECT PARENT STAT CHLDN OPS OOP IPR EX READS EM EV F S | NETFS_COOKIE_DEF TY FL NETFS_DATA OBJECT_KEY, AUX_DATA
|
|
======== ======== ==== ===== === === === == ===== == == = = | ================ == == ================ ================
|
|
17e4b 2 ACTV 0 0 0 0 0 0 7b 4 0 0 | NFS.fh DT 0 ffff88001dd82820 010006017edcf8bbc93b43298fdfbe71e50b57b13a172c0117f38472, e567634700000000000000000000000063f2404a000000000000000000000000c9030000000000000000000063f2404a
|
|
1693a 2 ACTV 0 0 0 0 0 0 7b 4 0 0 | NFS.fh DT 0 ffff88002db23380 010006017edcf8bbc93b43298fdfbe71e50b57b1e0162c01a2df0ea6, 420ebc4a000000000000000000000000420ebc4a0000000000000000000000000e1801000000000000000000420ebc4a
|
|
|
|
where the first set of columns before the '|' describe the object:
|
|
|
|
COLUMN DESCRIPTION
|
|
======= ===============================================================
|
|
OBJECT Object debugging ID (appears as OBJ%x in some debug messages)
|
|
PARENT Debugging ID of parent object
|
|
STAT Object state
|
|
CHLDN Number of child objects of this object
|
|
OPS Number of outstanding operations on this object
|
|
OOP Number of outstanding child object management operations
|
|
IPR
|
|
EX Number of outstanding exclusive operations
|
|
READS Number of outstanding read operations
|
|
EM Object's event mask
|
|
EV Events raised on this object
|
|
F Object flags
|
|
S Object work item busy state mask (1:pending 2:running)
|
|
|
|
and the second set of columns describe the object's cookie, if present:
|
|
|
|
COLUMN DESCRIPTION
|
|
=============== =======================================================
|
|
NETFS_COOKIE_DEF Name of netfs cookie definition
|
|
TY Cookie type (IX - index, DT - data, hex - special)
|
|
FL Cookie flags
|
|
NETFS_DATA Netfs private data stored in the cookie
|
|
OBJECT_KEY Object key } 1 column, with separating comma
|
|
AUX_DATA Object aux data } presence may be configured
|
|
|
|
The data shown may be filtered by attaching the a key to an appropriate keyring
|
|
before viewing the file. Something like:
|
|
|
|
keyctl add user fscache:objlist <restrictions> @s
|
|
|
|
where <restrictions> are a selection of the following letters:
|
|
|
|
K Show hexdump of object key (don't show if not given)
|
|
A Show hexdump of object aux data (don't show if not given)
|
|
|
|
and the following paired letters:
|
|
|
|
C Show objects that have a cookie
|
|
c Show objects that don't have a cookie
|
|
B Show objects that are busy
|
|
b Show objects that aren't busy
|
|
W Show objects that have pending writes
|
|
w Show objects that don't have pending writes
|
|
R Show objects that have outstanding reads
|
|
r Show objects that don't have outstanding reads
|
|
S Show objects that have work queued
|
|
s Show objects that don't have work queued
|
|
|
|
If neither side of a letter pair is given, then both are implied. For example:
|
|
|
|
keyctl add user fscache:objlist KB @s
|
|
|
|
shows objects that are busy, and lists their object keys, but does not dump
|
|
their auxiliary data. It also implies "CcWwRrSs", but as 'B' is given, 'b' is
|
|
not implied.
|
|
|
|
By default all objects and all fields will be shown.
|
|
|
|
|
|
=========
|
|
DEBUGGING
|
|
=========
|
|
|
|
If CONFIG_FSCACHE_DEBUG is enabled, the FS-Cache facility can have runtime
|
|
debugging enabled by adjusting the value in:
|
|
|
|
/sys/module/fscache/parameters/debug
|
|
|
|
This is a bitmask of debugging streams to enable:
|
|
|
|
BIT VALUE STREAM POINT
|
|
======= ======= =============================== =======================
|
|
0 1 Cache management Function entry trace
|
|
1 2 Function exit trace
|
|
2 4 General
|
|
3 8 Cookie management Function entry trace
|
|
4 16 Function exit trace
|
|
5 32 General
|
|
6 64 Page handling Function entry trace
|
|
7 128 Function exit trace
|
|
8 256 General
|
|
9 512 Operation management Function entry trace
|
|
10 1024 Function exit trace
|
|
11 2048 General
|
|
|
|
The appropriate set of values should be OR'd together and the result written to
|
|
the control file. For example:
|
|
|
|
echo $((1|8|64)) >/sys/module/fscache/parameters/debug
|
|
|
|
will turn on all function entry debugging.
|