drm/i915 wants to read/write more than one page in its fastpath
and hence needs to prefault more than PAGE_SIZE bytes.
Add new functions in filemap.h to make that possible.
Also kill a copy&pasted spurious space in both functions while at it.
v2: As suggested by Andrew Morton, add a multipage parameter to both
functions to avoid the additional branch for the pagemap.c hotpath.
My gcc 4.6 here seems to dtrt and indeed reap these branches where not
needed.
v3: Because I couldn't find a way around adding a uaddr += PAGE_SIZE to
the filemap.c hotpaths (that the compiler couldn't remove again),
let's go with separate new functions for the multipage use-case.
v4: Adjust comment to CodingStyle and fix spelling.
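To illustrate the multipage prefault idea, here is a minimal user-space
sketch, not the kernel helper itself. The name
sketch_prefault_multipage, the fixed 4096-byte page size and the
returned touch count are assumptions made for this example. It touches
one byte per page in the range, plus the last byte when an unaligned
buffer ends in a page the loop did not reach.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size, for illustration only. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/*
 * Hypothetical stand-in for a multipage prefault helper: touch one byte
 * in every page covered by [buf, buf + len) so that each page is
 * resident before a fast path that must not fault. Returns the number
 * of bytes touched.
 */
static size_t sketch_prefault_multipage(volatile char *buf, size_t len)
{
	uintptr_t start = (uintptr_t)buf;
	uintptr_t page_mask = ~(uintptr_t)(SKETCH_PAGE_SIZE - 1);
	size_t off, last = 0, touched = 0;

	assert(buf != NULL);
	if (len == 0)
		return 0;

	/* This is the extra "+= PAGE_SIZE" step the single-page helpers lack. */
	for (off = 0; off < len; off += SKETCH_PAGE_SIZE) {
		buf[off] = buf[off];	/* write back the same value */
		last = off;
		touched++;
	}

	/* An unaligned range can end in a page the loop never reached. */
	if (((start + last) & page_mask) != ((start + len - 1) & page_mask)) {
		buf[len - 1] = buf[len - 1];
		touched++;
	}

	return touched;
}
```

Keeping this loop in a separate function leaves the single-page
helpers free of the extra increment.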
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
While moving things around, these two functions slowly grew out of any
sane bounds. So extract a few lines that do the copying and
clflushing. Also add a few comments to explain what's going on.
v2: Again do s/needs_clflush/needs_clflush_after/ in the write paths
as suggested by Chris Wilson.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It's around 20% faster.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It's too expensive to move it around just for that pwrite, especially
when we're thrashing on the mappable gtt part like crazy.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In micro-benchmarking of the usual pwrite use-pattern of alternating
pwrites with gtt domain reads from the gpu, this yields around 30%
improvement of pwrite throughput across all buffer sizes. The trick is
that we can avoid clflushing cachelines that we will overwrite completely
anyway.
Furthermore for partial pwrites it gives a proportional speedup on top
of the 30% because we only clflush back the part of the buffer
we're actually writing.
v2: Simplify the clflush-before-write logic, as suggested by Chris
Wilson.
v3: Finishing touches suggested by Chris Wilson:
- add comment to needs_clflush_before and only set this if the bo is
uncached.
- s/needs_clflush/needs_clflush_after/ in the write paths to clearly
differentiate it from needs_clflush_before.
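The clflush-before-write decision can be sketched as a small predicate.
This is an illustrative user-space version; the function name and its
parameters are assumptions, not the driver's code. A write needs a
flush beforehand only if it starts or ends in the middle of a
cacheline, because fully overwritten cachelines carry no stale data
worth preserving.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true when a write of `length` bytes at
 * `offset` covers at least one cacheline only partially.
 * `cacheline_size` must be a power of two.
 */
static bool sketch_partial_cacheline_write(size_t offset, size_t length,
					   size_t cacheline_size)
{
	assert(cacheline_size != 0 &&
	       (cacheline_size & (cacheline_size - 1)) == 0);

	/* Any low bit set in offset or length means a ragged edge. */
	return ((offset | length) & (cacheline_size - 1)) != 0;
}
```

With 64-byte cachelines, a write of 128 bytes at offset 64 needs no
flush beforehand, while a write of 32 bytes at offset 0 does.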
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The pagemap.h prefault helpers do the prefaulting by simply writing
some data into every page. Hence we should not prefault when we're not
yet committed to actually writing data to userspace. The problem is
now that
- we can't prefault while holding dev->struct_mutex for we could
deadlock with our own pagefault handler
- we need to grab dev->struct_mutex before copying to sync up with any
outstanding gpu writes.
Therefore only prefault when we're dropping the lock the first time in
the pread slowpath - at that point we're committed to the write, don't
wait on the gpu anymore and hence won't return early (with e.g.
-EINTR).
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the proper prefault, it's extremely unlikely that we fall back
to the gtt slowpath.
So just kill it and use the shmem_pwrite path as fallback.
To further clean up the code, move the preparatory gem calls into the
respective pwrite functions. This way the gtt_fast->shmem fallback
is much more obvious.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This speeds up pwrite and pread from ~120 µs to ~100 µs for
reading/writing 1mb on my snb (if the backing storage pages
are already pinned, of course).
v2: Chris Wilson pointed out a glaring page reference bug - I've
unconditionally dropped the reference. With that fixed (and the
associated reduction of dirt in dmesg) it's now even a notch faster.
v3: Unconditionally grab a page reference when dropping
dev->struct_mutex to simplify the code-flow.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
~120 µs instead of ~210 µs to write 1mb on my snb. I like this.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
No longer needed.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is obviously gonna slow down pread. But for a half-way realistic
micro-benchmark, it doesn't matter: Non-broken userspace reads back
data from the gpu once before the gpu again dirties it.
So all this ranged clflush tracking is just a waste of time.
No pread performance change (neglecting the dumb benchmark of
constantly reading the same data) measured.
As an added bonus, this avoids clflush on read on coherent objects.
Which means that partial preads on snb are now roughly 4x as fast.
This will be useful for e.g. the libva encoder - when I finally get
around to fixing that up.
v2: Properly sync with the gpu on LLC machines.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Useful when the page is already mapped to copy data in/out.
For -stable because the next patch (fixing phys obj pwrite) needs this
little helper function.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the previous rewrite, they've become essentially identical.
v2: Simplify the page_do_bit17_swizzling logic as suggested by Chris
Wilson.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the previous rewrite, they've become essentially identical.
v2: Simplify the page_do_bit17_swizzling logic as suggested by Chris
Wilson.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We try to avoid writing the relocations through the uncached GTT, if the
buffer is currently in the CPU write domain and so will be flushed out to
main memory afterwards anyway. Also on SandyBridge we can safely write
to the pages in cacheable memory, so long as the buffer is LLC mapped.
In either of these cases, we therefore do not need to force the
reallocation of the buffer into the mappable region of the GTT, reducing
the aperture pressure.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We've lost our guard page somewhere in the gtt rewrite, this patch
here will restore it.
Exercised by i-g-t/tests/gem_cs_prefetch.
v2: Subtract the guard page from the range we're supposed to manage
with gem. Suggested by Chris Wilson to increase the odds of old ums +
gem userspace not blowing up. To compensate for the loss of a page,
don't subtract the guard page in the modeset init code any longer.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44748
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
So don't call it like that.
Also rip out a confusing comment and instead explain what's really
going on.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... because this is what it actually does now that we have the global
gtt vs. ppgtt split.
Also move it to the other global gtt functions in i915_gem_gtt.c
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This reverts commit d4b74bf078 which
reverted the original fix fb8b5a39b6.
We have at least 3 different bug reports that this fixes things and no
indication what is exactly wrong with this. So try again.
To make matters slightly more fun, the commit itself was cc: stable
whereas the revert has not been.
According to Peter Clifton he discussed this with Zhao Yakui and this
seems to be in contradiction of the GM45 PRM, but rumours have it that
this is how the BIOS does it ... let's see.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Tested-by: Peter Clifton <Peter.Clifton@clifton-electronics.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=16236
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=25913
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=14792
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Originally the code tried to allocate a large enough array to perform
the copy using vmalloc. Performance wasn't great, and throughput was
improved by processing each individual relocation entry separately.
This too is not as efficient as one would desire. A compromise would be
to allocate a single page, or to allocate a few entries on the stack,
and process the copy in batches. The latter gives simpler code and more
consistent performance due to a lack of heuristic.
x11perf -copywinwin10:
            n450/pnv   i3-330m   i5-2520m (cpu)
before:       249000    785000    1280000 (80%)
page:         264000    896000    1280000 (65%)
on-stack:     264000    902000    1280000 (67%)
v2: Use 512-bytes of stack for batching rather than allocate a page.
v3: Tidy the code slightly with more descriptive variable names
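The on-stack batching described above can be sketched in user space as
follows. The struct layout, the names and the summing "work" are
hypothetical, and memcpy stands in for the copy from userspace. Only
the 512-byte stack budget is taken from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte relocation entry, for illustration only. */
struct sketch_reloc {
	uint64_t offset;
	uint64_t delta;
};

#define SKETCH_STACK_BYTES 512
#define SKETCH_BATCH (SKETCH_STACK_BYTES / sizeof(struct sketch_reloc))

/*
 * Copy entries through a fixed-size on-stack buffer and process them
 * batch by batch. The "processing" here just sums the fields so the
 * result can be checked.
 */
static uint64_t sketch_process_relocs(const struct sketch_reloc *src,
				      size_t count)
{
	struct sketch_reloc batch[SKETCH_BATCH];
	uint64_t sum = 0;

	assert(src != NULL || count == 0);

	while (count > 0) {
		size_t n = count < SKETCH_BATCH ? count : SKETCH_BATCH;
		size_t i;

		/* One bulk copy per batch instead of one per entry. */
		memcpy(batch, src, n * sizeof(batch[0]));

		for (i = 0; i < n; i++)
			sum += batch[i].offset + batch[i].delta;

		src += n;
		count -= n;
	}

	return sum;
}
```

The batch size is fixed at compile time, so there is no allocation and
no sizing heuristic to tune.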
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the recent set of gmbus fixes, this seems to work on my i855gm.
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Again, Valleyview moves these around, so make the mmio base more
explicit to consolidate the base address computations to one
HAS_PCH_SPLIT check.
v2: Fix up the PCH_SPLIT braino ... it actually works that way round.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With valleyview we'll have these at yet another address, so keeping
track of this with an ever-growing list of registers will get ugly.
This way intel_sdvo.c is fully independent of the base address of the
output ports display register blocks.
While at it, do 2 closely related cleanups:
- use SDVO_NAME some more
- change the sdvo_reg variables to uint32_t like other registers.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
They were all over the place, order them by position and add a few.
v2: add gen indications to the new bits (Ben)
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It's only used by the main read/write functions, so we can keep it with
them.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If we discard a buffer due to memory pressure, also release its allotted
mmap address space. As it may be some time before userspace wakes up
and notices that it has buffers to purge from its cache, we may waste
valuable address space on unusable objects for a period of time.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47738
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Add a new module option lvds_channel to specify the LVDS channel mode
explicitly instead of probing the LVDS register value set by BIOS.
This will be helpful when VBT is broken or incompatible with the
current code.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=42842
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Currently i915 driver checks [PCH_]LVDS register bits to decide
whether to set up the dual-link or the single-link mode. This relies
implicitly on that BIOS initializes the register properly at boot.
However, BIOS doesn't initialize it always. When the machine is
booted with the closed lid, BIOS skips the LVDS reg initialization.
This ends up in blank output on a machine with a dual-link LVDS when
you open the lid after the boot.
This patch adds a workaround for that problem by checking the initial
LVDS register value in VBT.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37742
Tested-By: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Introduced in commits c1cd90ed and d27b1e0e
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: s/fix/shut up/ in the commit msg and add a comment to the
BUG_ON.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Introduced in commit 8461d226 and 8c59967c
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: s/fix/shut up/ in the commit msg.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On Sandybridge a few MI read/write commands only work when ppgtt is
enabled. Userspace therefore needs to be able to check whether ppgtt
is enabled. For added hilarity, you need to reset the "use global GTT"
bit on snb when ppgtt is enabled, otherwise it won't work. Despite
what bspec says about automatically using ppgtt ...
Luckily PIPE_CONTROL (the only write cmd current userspace uses) is
not affected by all this, as tested by tests/gem_pipe_control_store_loop.
Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that everything is in place, only bind to the global gtt
when actually required. Patch split-up suggested by Chris Wilson.
Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
PIPE_CONTROL on snb needs global gtt mappings in place to workaround a
hw gotcha. No other commands need such a workaround. Luckily we can
detect PIPE_CONTROL commands easily because they have a write_domain
= I915_GEM_DOMAIN_INSTRUCTION (and nothing else has that).
v2: Binding the target of such a reloc into the global gtt actually
works instead of binding the source, which is rather pointless ...
v3: Kill a superfluous has_global_gtt_mapping assignment noticed by
Chris Wilson.
Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
And track the existence of such a binding similar to the aliasing
ppgtt case. Speeds up binding/unbinding in the common case where we
only need a ppgtt binding (which is accessed in a cpu coherent fashion
by the gpu) and no global gtt binding (which needs uc writes for the
ptes).
This patch just puts the required tracking in place.
v2: Check that global gtt mappings exist in the error_state capture
code (with Chris Wilson's llc reloc patches batchbuffers are no longer
relocated as mappable in all situations, so this matters). Suggested
by Chris Wilson.
v3: Adapted to Chris' latest llc-reloc patches.
v4: Fix a bug in the i915 error state capture code noticed by Chris
Wilson.
Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Note that there's a functional change buried in this patch wrt the ilk
dmar workaround: We now only idle the gpu while tearing down the dmar
mappings, not while clearing the gtt. Keeping the current semantics
would have made for some really ugly code and afaik the issue is only
with the dmar unmapping that needs a fully idle gpu.
Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Use a more current logging style. Ensure that appropriate
logging messages are prefixed with "i915: ".
Convert printks to pr_<level>. Align arguments.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Mark the Acer Aspire 5734Z as a machine that requires the module to
invert the panel backlight brightness value after reading from and prior
to writing to the PCI configuration space.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
A machine may need to invert the panel backlight brightness value. This
patch adds the infrastructure for a quirk to do so.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Following the documentation of the Legacy Backlight Brightness (LBB)
Register in the configuration space of some Intel PCI graphics adapters,
setting the LBB register with the value 0x0 causes the backlight to be
turned off, and 0xFF causes the backlight to be set to 100% intensity
(http://download.intel.com/embedded/processors/Whitepaper/324567.pdf).
The Acer Aspire 5734Z, however, turns the backlight off at 0xFF and sets
it to maximum intensity at 0. In consequence, the screen of this system
becomes dark at an early boot stage which makes it unusable. The same
inversion applies to the BLC_PWM_CTL I915 register. This problem was
introduced in kernel version 2.6.38 when the PCI device of this system
was first supported by the i915 KMS module.
This patch adds a parameter to the i915 module to enable inversion of
the brightness variable (i915.invert_brightness).
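The inversion itself amounts to mirroring the value within its range. A
minimal sketch, assuming a hypothetical helper name and a boolean that
stands in for the module parameter:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a brightness value in [0, max] to the value
 * that should be written to (or was read from) the hardware. With
 * `invert` set, 0 and max swap places.
 */
static uint32_t sketch_compute_brightness(uint32_t val, uint32_t max,
					  bool invert)
{
	assert(val <= max);

	return invert ? max - val : val;
}
```

Because the mapping is its own inverse, the same helper can be applied
after reading the register and before writing it.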
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I have seen a number of "blt ring initialization failed" messages
where the ctl or start registers are not the correct value. Upon further
inspection, if the code just waited a little bit, it would read the
correct value. Adding the wait_for to these reads should eliminate the
issue.
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Some newer BIOSes are shipping with all MTRRs already populated. These
BIOSes are all on machines with sufficiently new CPUs that the
referenced errata doesn't apply anyway, so just don't try to claim the
MTRR.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41648
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
No functional change here, just clarifying code flow.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
According to the PRM (Vol3P2), the PCH FDI receiver ISR read for bit lock
should be retried at least once. This patch retries the read 5 times
with a small delay in between reads. I've had reports of display corruption
on resume with "FDI train 1 fail!", so I'm hoping that adding this retry
will mitigate the issue.
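A bounded retry loop of this kind can be sketched as below. The
callback-based register read, the optional delay hook, the fake
register and the 0x100 bit value are all assumptions made so the
example runs in user space. Only the idea of retrying the read a fixed
number of times with a short pause comes from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*sketch_read_fn)(void *ctx);

/*
 * Read a register up to `max_tries` times and report whether `mask`
 * was ever seen set. `delay` may be NULL; otherwise it is called after
 * each unsuccessful read.
 */
static bool sketch_poll_bit(sketch_read_fn read_reg, void *ctx,
			    uint32_t mask, int max_tries,
			    void (*delay)(void))
{
	int i;

	assert(read_reg != NULL);

	for (i = 0; i < max_tries; i++) {
		if (read_reg(ctx) & mask)
			return true;
		if (delay)
			delay();
	}

	return false;
}

/* Fake register for demonstration: the bit appears on the Nth read. */
struct sketch_fake_reg {
	int reads;
	int locks_after;
};

static uint32_t sketch_fake_read(void *ctx)
{
	struct sketch_fake_reg *reg = ctx;

	reg->reads++;
	return reg->reads >= reg->locks_after ? 0x100u : 0u;
}
```

With max_tries set to 5, a bit that appears on the third read is
reported as locked, while one that needs nine reads is reported as a
failure after five attempts.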
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
They're not really errors (well actually I don't know; I don't
understand _DSM and _MUX well enough to say, but I do know they spam
people's logs and seem to be harmless).
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: The _DSM error got removed in another patch already]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44250
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This change enables the use of displays where the vbt table just
contains inappropriate values, but either the vesa defaults or
the video=... modes do something sensible with the attached display.
The problem happens with an embedded board that contains vbt bios
tables that do not match the attached display. Using this change and
the appropriate kernel boot command line they are able to use an
otherwise completely unusable secondary display on that embedded
board.
Reviewed-by: Paul Menzel <paulepanter@users.sourceforge.net>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the rework to merge the bit-banging fallback into the gmbus
i2c adapter we've gotten rid of the deadlock possibility that
originally led to the disabling of this code.
This reverts the revert
commit 826c7e4147
Author: Jean Delvare <khali@linux-fr.org>
Date: Sat Jun 4 19:34:56 2011 +0000
Revert "drm/i915: Enable GMBUS for post-gen2 chipsets"
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=35572
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This way we can simplify the setup and teardown a bit.
Because we don't actually allocate anything anymore for the force_bit
case, we can now convert that into a boolean.
Also AND the functionality supported by the bit-banging together with
what gmbus can do, so that this doesn't randomly change any more.
v2: Chris Wilson noticed that I've mixed up && and & ...
v3: Clarify an if block as suggested by Eugeni Dodonov.
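As a sketch of the functionality masking mentioned above, the
capability bitmasks of the two transfer methods are combined with a
bitwise AND. The flag names and values here are made up for
illustration and are not the i2c core's constants.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability flags, for illustration only. */
#define SKETCH_FUNC_PLAIN_I2C	0x1u
#define SKETCH_FUNC_10BIT_ADDR	0x2u
#define SKETCH_FUNC_BLOCK_READ	0x4u

/*
 * Advertise only what both transfer methods support, so the reported
 * capabilities stay the same whichever one is in use.
 */
static uint32_t sketch_common_functionality(uint32_t gmbus_func,
					    uint32_t bitbang_func)
{
	/* Bitwise &, not logical &&, which would collapse the mask to 0 or 1. */
	return gmbus_func & bitbang_func;
}
```

Using && here would compile but yield only 0 or 1, which is the kind of
mix-up the v2 note refers to.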
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... and directly call the newly exported i2c bit-banging functions.
The code is still pretty convoluted because we only set up the gpio
i2c stuff when actually falling back, resulting in more complexity
than necessary. This will be fixed up in the next patch.
v2: Use exported i2c_bit_algo vtable instead of exported functions.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When we set up the gpio fallback, we always have a 1:1 relationship
with an intel_gmbus. Exploit that to store all gpio related data in
there, too. This is a preparation step to merge the two i2c adapters
controlling the same bus into one.
Just mundane code-munging in this patch.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>