Optimizing DXVK apps

One of the recent happenings in the world of Linux graphics is the rise of DXVK.  For those who don't know, DXVK is a translation layer which translates D3D11 and D3D10 API calls to Vulkan.  It's intended to be used together with Wine to allow more Windows game titles to run directly on Linux without modification.  Wine already has a D3D10/11 to OpenGL translator, but DXVK generally has better performance and compatibility than what is built into core Wine.

For Linux gamers, this has meant a wealth of new titles to play on their favorite operating system.  For driver developers, it means more workloads which have different shaders and API usage patterns.  This means more bugs and more opportunities for performance optimization.  While a lot of stuff works fine and performs very well out-of-the-box, we've gotten a handful of new GPU hangs and other issues reported.  Much of the work I've done over the course of the last three months or so has been focused on fixing games running under DXVK or improving their performance.

Because bug fixing is boring, let's talk about making games faster!

Skyrim Special Edition

One of the first titles I tested on DXVK (the third, if I recall correctly) was The Elder Scrolls V: Skyrim Special Edition.  When I first fired the game up, there were two immediately obvious problems: everything was green (this turned out to be a DXVK bug) and it was a slide-show.  I don't recall the details exactly but it may have been in the seconds-per-frame range.  While Skyrim may have once been considered graphically intensive, that was a long time ago and I knew we could do better.

The first thing I did to try and narrow down the problem was to use RenderDoc to capture a frame of the game so I could inspect it draw-by-draw.  Even though RenderDoc doesn't have actual performance counter support yet, it does use timestamps to tell you how long each draw takes.  I was quickly able to identify a particular draw call that was dominating the frame render time even though it was just rendering a quad with some shading.

With a bit more work, I was able to isolate the offending shader and look at the assembly.  It was an ambient occlusion shader containing a couple of large constant arrays which it used as look-up tables for part of the calculation.  Due to their size, the arrays were taking considerable shader resources and causing a large amount of spilling in the shader.  Also, since they were accessed indirectly, we were generating large if-ladders for accessing them.
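To make the term "if-ladder" concrete, here is a small CPU-side model in C.  It is not driver code: the 8-entry table and the names `lut`, `lookup_load` and `lookup_if_ladder` are made up for illustration, and the real arrays were much larger.  When the elements of an array live in individual registers rather than in memory, a dynamic index can't be a single load and has to be lowered to roughly one compare per element:

```c
#include <assert.h>

/* Hypothetical 8-entry look-up table standing in for a shader's constant
 * array.  The size and values are assumptions for illustration only. */
#define LUT_SIZE 8u

static const float lut[LUT_SIZE] = {
    0.0f, 0.125f, 0.25f, 0.375f, 0.5f, 0.625f, 0.75f, 0.875f
};

/* Memory-backed access: a dynamic index is a single indexed load. */
static float lookup_load(unsigned i)
{
    unsigned idx = i % LUT_SIZE;
    assert(idx < LUT_SIZE);
    return lut[idx];
}

/* Register-backed access: every element is a separate value, so a dynamic
 * index becomes a chain of compares (an "if-ladder"). */
static float lookup_if_ladder(unsigned i)
{
    unsigned idx = i % LUT_SIZE;

    if (idx == 0)      return 0.0f;
    else if (idx == 1) return 0.125f;
    else if (idx == 2) return 0.25f;
    else if (idx == 3) return 0.375f;
    else if (idx == 4) return 0.5f;
    else if (idx == 5) return 0.625f;
    else if (idx == 6) return 0.75f;
    else               return 0.875f;
}
```

Both functions return the same value for every index; the ladder just needs a number of compares that grows with the size of the array, on top of the registers needed to hold every element at once.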

Isn't this a fairly obvious thing we should be optimizing?  Yes, and we have been in OpenGL.  Unfortunately, the optimization pass for this lives at the GLSL IR level and not in NIR so the SPIR-V path can't take advantage of it.  Using more-or-less the same idea as the GLSL IR pass, I wrote a NIR pass which pulls large constant arrays out into a blob of constant data associated with the shader which we then turn into a UBO in the Vulkan driver.  The optimization successfully got rid of all of the spilling in that and similar shaders, reduced the time required for that draw by 99.6% (no joke!), and brought the framerate from slide-show to nicely playable and roughly in line with the performance of the same game under native D3D11.
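The bookkeeping behind that kind of pass can be sketched in plain C.  This is not the actual NIR pass: the `const_array` struct, the `pack_large_constants` function, the 16-byte alignment and the size threshold are all assumptions made up for illustration.  The idea is simply to copy every sufficiently large constant array into one blob and remember its offset, so that accesses can later be rewritten as loads from that blob:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOB_ALIGN 16u               /* assumed alignment inside the blob */
#define NOT_PACKED ((size_t)-1)      /* marks arrays left in the shader   */

/* Hypothetical description of one constant array found in a shader. */
struct const_array {
    const void *data;    /* the constant initializer          */
    size_t      size;    /* size of the initializer in bytes  */
    size_t      offset;  /* out: offset in the blob, if moved */
};

static size_t align_up(size_t value, size_t alignment)
{
    return (value + alignment - 1) / alignment * alignment;
}

/* Copies every array of at least `threshold` bytes into `blob` and records
 * where it landed.  Returns the number of blob bytes used. */
static size_t pack_large_constants(struct const_array *arrays, size_t count,
                                   size_t threshold,
                                   unsigned char *blob, size_t blob_capacity)
{
    size_t used = 0;

    for (size_t i = 0; i < count; i++) {
        arrays[i].offset = NOT_PACKED;
        if (arrays[i].size < threshold)
            continue;                       /* small arrays stay inline */

        size_t start = align_up(used, BLOB_ALIGN);
        assert(start + arrays[i].size <= blob_capacity);

        memset(blob + used, 0, start - used);   /* zero the padding */
        memcpy(blob + start, arrays[i].data, arrays[i].size);
        arrays[i].offset = start;
        used = start + arrays[i].size;
    }
    return used;
}
```

In this model the returned blob is what a driver would expose to the shader as a read-only buffer, and each recorded offset is what an indirect array access would be rebased on.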

This all goes to show that sometimes the difference between garbage performance and good performance is just that one tiny thing you were missing all along.

Batman: Arkham City

Some time later, a user was complaining on the DXVK issue tracker about GPU hangs with Batman: Arkham City on Intel.  How I fixed the hangs is a very boring story but, while I was looking at GPU error states trying to figure out the hangs, I noticed that the tessellation shaders were spilling like mad.  (As it turns out, that had nothing to do with the hangs and our spilling was working perfectly.)

Why were they spilling so badly?  The problem turned out to be the shadow variables that DXVK creates for inputs.  There are very good reasons why it creates these shadows, which have to do with differences between the D3D shader interface and Vulkan.  However, our compiler was having difficulty eliminating them, so we were storing 4K of temporary data, which blew out the register file and made us start spilling like mad.  The pattern in DXVK looks like this:

    layout(location=0) in vec3 v0[3];
    layout(location=1) in vec2 v1[3];
    layout(location=0) out vec4 oVertex[3][32];

    vec4 shader_in[3][32];

    void hs_main () {
        oVertex[gl_InvocationID][0].xyz = shader_in[gl_InvocationID][0].xyz;
        oVertex[gl_InvocationID][1].xy = shader_in[gl_InvocationID][1].xy;
        // Do some other stuff
    }

    void main () {
        // Copy every input into the shadow array before running the real shader
        shader_in[0][0].xyz = v0[0];
        shader_in[1][0].xyz = v0[1];
        shader_in[2][0].xyz = v0[2];
        shader_in[0][1].xy = v1[0];
        shader_in[1][1].xy = v1[1];
        shader_in[2][1].xy = v1[2];

        hs_main();
    }

To clean this up, I wrote a series of four optimizations which chew through the above mess and turn it into, effectively, this:

    layout(location=0) in vec3 v0[3];
    layout(location=1) in vec2 v1[3];
    layout(location=0) out vec4 oVertex[3][32];

    void main () {
        oVertex[gl_InvocationID][0].xyz = v0[gl_InvocationID].xyz;
        oVertex[gl_InvocationID][1].xy = v1[gl_InvocationID].xy;
        // Do some other stuff
    }

Not only are the temporary arrays gone but the array access with an index of gl_InvocationID is now on an input variable directly and not on a temporary.  It's much easier for our hardware to do an indirect access on a vertex input than on a temporary so, again, we dropped the if-ladders and almost all of the spilling.
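As a sanity check on why this rewrite is legal, here is a CPU-side model of the two shader shapes in C.  It is only a sketch: the `inputs` and `vec4` structs and the `run_with_shadow` and `run_direct` functions are hypothetical stand-ins for the GLSL above, and it says nothing about how the actual compiler passes work.  It only demonstrates that reading through the shadow array and reading the input directly produce the same outputs for every invocation:

```c
#include <assert.h>

#define NUM_VERTS 3
#define NUM_SLOTS 32

struct vec4 { float x, y, z, w; };

/* Hypothetical per-vertex inputs, standing in for v0 (vec3) and v1 (vec2). */
struct inputs {
    float v0[NUM_VERTS][3];
    float v1[NUM_VERTS][2];
};

/* "Before": copy all inputs into a large temporary, then index the
 * temporary with the invocation ID. */
static void run_with_shadow(const struct inputs *in, unsigned invocation,
                            struct vec4 out[2])
{
    struct vec4 shader_in[NUM_VERTS][NUM_SLOTS] = {{{0}}};

    assert(invocation < NUM_VERTS);
    for (unsigned v = 0; v < NUM_VERTS; v++) {
        shader_in[v][0].x = in->v0[v][0];
        shader_in[v][0].y = in->v0[v][1];
        shader_in[v][0].z = in->v0[v][2];
        shader_in[v][1].x = in->v1[v][0];
        shader_in[v][1].y = in->v1[v][1];
    }

    out[0].x = shader_in[invocation][0].x;
    out[0].y = shader_in[invocation][0].y;
    out[0].z = shader_in[invocation][0].z;
    out[1].x = shader_in[invocation][1].x;
    out[1].y = shader_in[invocation][1].y;
}

/* "After": index the inputs directly; no temporary array at all. */
static void run_direct(const struct inputs *in, unsigned invocation,
                       struct vec4 out[2])
{
    assert(invocation < NUM_VERTS);
    out[0].x = in->v0[invocation][0];
    out[0].y = in->v0[invocation][1];
    out[0].z = in->v0[invocation][2];
    out[1].x = in->v1[invocation][0];
    out[1].y = in->v1[invocation][1];
}
```

The first version needs storage for the whole `shader_in` array just to read five floats back out of it, which is the part that was overflowing the register file.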

The improvement to Batman: Arkham City wasn't nearly as dramatic as with Skyrim but it was still around a 15% FPS increase in the game's built-in benchmark.

Conclusion

So what's the moral of the story?  It's not that bad shaders or spilling are the root of all performance problems.  (I could just as easily tell you stories of badly placed HiZ resolves.)  It's that sometimes big performance problems are caused by small things (which doesn't mean they're easy to find!).  Also, that we (the developers on the Intel Mesa team) care about Linux gamers and are hard at work trying to make our open-source Vulkan and OpenGL drivers the best they can be.

Comments

  1. Very interesting read, but I have one fear. I'm not an expert in this domain but...

    One of the selling points of Vulkan over DirectX 11/OpenGL, so I've heard, was that the user does more work so the graphics driver doesn't need a complicated high-level shader compiler and hideous black-box voodoo optimizations. Efficiency and optimization are the responsibility of the user, not the graphics driver.

    From the look of it, DXVK blindly converts DirectX shader assembly and its semantics to Vulkan equivalents and passes the result to the Vulkan graphics driver, which is obviously inefficient. So the Vulkan graphics driver is forced to optimize away the obvious inefficiency. Thanks to the graphics driver's work, the user (DXVK) doesn't need to fix its inefficiency.

    If this cycle continues, Vulkan graphics drivers will end up as the very thing they tried to depart from, that is, DirectX/OpenGL: complicated optimizing compilers and black-box optimizations completely opaque to the user.

    At this rate, in less than a decade we will have yet another API, a Vulkan 2.0, which claims to be closer to the hardware than Vulkan.

    1. Yes, we generally have a philosophy of "don't be stupid" in Vulkan. However, it is still a cross-platform and cross-vendor API so application developers are not expected to optimize for specific platforms or drivers. Rather the intention is that they write the best Vulkan application possible and we translate that to hardware as efficiently as we can.

      In the world of shaders and compilers, the idea of "best for performance" is really hard to define. Sure, DXVK can do the same transformations we do in our driver and the two described above would probably be fairly generically applicable. However, even there, I could imagine a hardware/software stack where DXVK doing the transformation described above for Skyrim could actually hurt performance.

    2. This is not the fault of Vulkan but the original API, D3D.

      Because D3D applications need voodoo optimizations to work properly, they also need voodoo when translated with DXVK.

      Nothing interesting to see here.

      The promise of Vulkan remains unchanged and valid.

    3. FWIW, DXVK does the Large Constant Array -> UBO optimization itself these days, since some applications are crazy and hardcode 8kB of data inside the shader, but it can be useful for native apps as well.

      The shader input issue would be *much* harder to solve on the DXVK side.

    4. That explains why I can no longer reproduce the performance issue by turning off the optimization. I was very confused by that a couple weeks ago.

  2. Is there anything the end-user can do to help? If I sent you RenderDoc frame captures of games that work but have graphical issues, would that help, or is it better to just wait for the issue to solve itself in due time? Or games that crash outright in certain scenarios?

    I'd love to help in any way I can if I can actually be a help.

    1. Where you should file it depends on the type of issue you're seeing. If you're seeing a rendering corruption on Intel that goes away if you run the same app with DXVK on AMD or Nvidia, then go ahead and file a bug against the Intel Vulkan driver on bugs.freedesktop.org. Including a RenderDoc capture on such a bug report would definitely help. If you don't have other hardware or if you see the same corruption on AMD or Nvidia, please file the bug against DXVK first.

      If, on the other hand, you're seeing GPU or system hangs, that's very likely an Intel driver problem (kernel or Mesa) and you should file the bug directly on bugs.freedesktop.org.
