
April 12th, 2013 - Editor In Full Swing


Exciting times for me.  I’ve separated all the parts of my project into about 12-13 different libraries and am in the process of going crazy with perfecting the encapsulation so that there are no bizarre entanglements anymore.

All this work means I can now move much faster on the editor, since it integrates all the world rendering from the client.  I’m already moving objects around, and soon I’ll work on painting alphas and extruding terrain.

April 10th, 2013 - Preparations for Portability/Release(?)

At some point during the past month, I tried to bring my complete project to work to see how it would fare in a different environment.  I had been very sloppy with my encapsulation and dependencies, and had hard-coded a lot of things.  It was a nightmare and took me about 2 hours to do enough local hacks to get even one area (the viewer app) working.

This was a wake-up call:  If I was ever going to release the project, I would need to get it into a releasable state.  So the past few days I spent a ton of time removing the hard-coded things and making the project build cleanly.  My current choice is to put all dependencies in the source tree and have people build them from the solution rather than requiring anything to be installed by a user/designer beforehand.  The only requirement at the moment for your environment is to have an environment variable defined for the root of the tree - this is used by the apps themselves and by some helpers.

I am trying to determine how I will end up releasing my ongoing work here.  One possibility is to just release ALL source for everything.  This gives people the most flexibility to do what they want; however, I have nightmares of random people making forks and starting their ‘own’ version.  Another is to release the libraries as libraries/headers, and then the source for the server and client only.  That is relatively little code, and most of it would be custom per project anyway.

Whichever way I go, I will want some interested parties to help beta (more like alpha) test the progress.  I am not looking for other contributors to the code-base, but will welcome ideas/suggestions from the community.

If you’re interested in being a sort of alpha-tester, please let me know via http://modcraft.superparanoid.de/ - send me a private message (I’m relaxok on there).

Note - I am likely to rename the project soon.  I’d like to give it a snappy name, but may settle for something completely generic, like MMOSDK.  We shall see.

In other news - I have done a lot more portal-based culling and am now 90% done with portal support.  More soon!

March 14th, 2013 - Instancing is a go

Well, I’ve finally implemented instancing.  In D3D11 the best way to do it is to create a structured buffer sized for roughly the largest number of instances of a single model you expect to draw in a scene.  Then you use SV_InstanceID in your vertex shader to index into the buffer and grab the world matrix from it.  That ID is incremented automatically for each instance when you call DrawIndexedInstanced with a given instance count.

I take the sorted opaque and transparent object lists that I would’ve rendered one by one in their entirety before, and pass them to a new function.  That function retains a list of models that have been drawn already this frame, and ignores ones that have already been drawn when iterating through.  For each new model that hasn’t been drawn, it uses the first appearance of that model to fill in element 0 in the structured buffer.  For the other times that model appears in the list, instead of rendering it, it just continues to fill the structured buffer with locations.  Then the original first appearance is drawn, but instead of 1 instance you draw the number of appearances of that object in the list.  
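
The CPU side of that batching pass can be sketched roughly as follows.  This is a minimal sketch rather than the engine's actual code: RenderItem, InstanceBatch and buildInstanceBatches are hypothetical names, the world matrix is reduced to a plain array of 16 floats, and the D3D11 work (filling the structured buffer and calling DrawIndexedInstanced once per batch) is only indicated in a comment.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using Matrix4 = std::array<float, 16>;  // stand-in for a world matrix

struct RenderItem {          // hypothetical: one entry in a sorted render list
    std::uint32_t modelId;   // identifies the shared model data
    Matrix4 world;           // per-object world transform
};

struct InstanceBatch {             // hypothetical: one instanced draw call
    std::uint32_t modelId;
    std::vector<Matrix4> worlds;   // contents destined for the structured buffer
};

// Walks the list once. The first appearance of a model opens a batch
// (element 0 of its buffer); later appearances only append a world matrix.
std::vector<InstanceBatch> buildInstanceBatches(const std::vector<RenderItem>& items) {
    std::vector<InstanceBatch> batches;
    std::unordered_map<std::uint32_t, std::size_t> batchOfModel;  // modelId -> batch index
    for (const RenderItem& item : items) {
        auto it = batchOfModel.find(item.modelId);
        if (it == batchOfModel.end()) {
            it = batchOfModel.emplace(item.modelId, batches.size()).first;
            batches.push_back(InstanceBatch{item.modelId, {}});
        }
        batches[it->second].worlds.push_back(item.world);
    }
    // A renderer would now, for each batch, copy `worlds` into the structured
    // buffer and issue one DrawIndexedInstanced call whose instance count is
    // worlds.size(); the vertex shader picks its matrix via SV_InstanceID.
    return batches;
}
```

Batches come out in order of each model's first appearance, so the front-to-back or back-to-front order of the incoming list is only preserved per model, not per object.  That is consistent with the transparency ordering problems described below.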

You can see how much API usage it saves you in the above screenshot.  That’s a lot of stuff to not have to manually draw.  This offloads quite a bit of CPU, especially if you have a long draw distance.  I had scenes go from 45 to 90 fps just by enabling instancing.

Right now, there are some issues.  Models with transparency aren’t great for instancing because of depth ordering issues.  Supposedly there are ways to solve all that, but so far none of the conventional wisdom is working.  It’s a bit subtle, but you can see different meshes drawing in different orders when you’re walking around.  That can be fixed by only instancing opaques, but you lose a lot of savings that way.  So it’s still a work-in-progress.  Instancing also somewhat broke my nice clean fade-ins/fade-outs when objects go in and out of the far clip plane.

February 24, 2013 - Portal Rendering (I’m Back!)

Yes, I have returned.  Real life intervened for quite a while, but I’m working on the game engine once again.

I’ve accomplished two major things the past few days:

1) I completed the terrain rendering optimization mentioned in the last post.  I now use a simple box filter to create my own mipmaps so that I can manually create the data required for a Texture2DArray resource.  This let me completely get rid of all the ugly branching my pixel shader was doing in order to address different textures.  Terrain rendering is very fast now.  The next optimization is to have lower-res skin versions for farther-away tiles.

2) Portals!  Yes, I have finally started to implement them.  You can see in the above video the portal rectangles in white, and in wireframe mode you can see how internal objects disappear when the camera is not viewing the portal from the proper side.  The test uses the dot product of the portal’s normal with the difference between the eye origin and the center point of the portal face.  This vastly improves rendering speeds anywhere there are buildings and such.  Towns and cities are much speedier now.  The next step is fully implementing the portal chain, which should make dungeons quite speedy as well.  I’m only using portal 0 for now since in most buildings that’s the front door.
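
The portal facing test from item 2 can be sketched like this.  It is a minimal sketch under stated assumptions: Vec3 and portalFacesEye are hypothetical names, and the portal normal is assumed to point toward the side from which the interior should be visible.  If the stored normals point the other way, the comparison flips.

```cpp
struct Vec3 { float x, y, z; };

inline Vec3 subtract(const Vec3& a, const Vec3& b) {
    return Vec3{a.x - b.x, a.y - b.y, a.z - b.z};
}

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// True when the eye lies on the side of the portal that its normal points to.
// Objects behind the portal would only be drawn when this returns true.
// Assumption: normals point outward, toward valid viewers.
bool portalFacesEye(const Vec3& eye, const Vec3& portalCenter, const Vec3& portalNormal) {
    const Vec3 centerToEye = subtract(eye, portalCenter);
    return dot(centerToEye, portalNormal) > 0.0f;
}
```

Only the sign of the dot product matters here, so neither vector needs to be normalized.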

In some other areas:

- I’ve laid the groundwork for the world builder tool by implementing picking of all world objects (things that will be placed/rotated/etc in an editor).  Picking is actually trickier than I anticipated once you want to do something beyond bounding boxes.  Ideally I will have to get BSP leaf ray intersection done, otherwise picking objects that are close together gets quite difficult.

- I started supporting String CRCs so that costly operations like comparing full resource path names as strings can become very fast integer comparisons.  This is in progress.
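
As an illustration of the String CRC idea, here is a minimal sketch that hashes a resource path down to a 32-bit value.  The post doesn't say which CRC variant is used, so the standard CRC-32 (reflected polynomial 0xEDB88320, the one used by zlib and PNG) is an assumption, and stringCrc32 is a hypothetical name.

```cpp
#include <cstdint>
#include <string>

// Bitwise CRC-32 over the bytes of `text` (assumed variant: zlib/PNG CRC-32).
// A table-driven version would be faster; this one favors brevity.
std::uint32_t stringCrc32(const std::string& text) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char byte : text) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
            // mask is all ones when the low bit is set, otherwise zero
            const std::uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}
```

The CRC would be computed once when a resource name is loaded or registered, after which lookups compare two integers.  Two different paths can in principle collide, so keeping the original string around (at least in debug builds) to detect that is a reasonable precaution.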

October 2nd, 2012 - Major Optimization Fun

A few weeks ago I became quite disappointed when comparing my rendering speed to that of the real WoW client, which is one of the most high-performing game clients in recent history.

I’ve made a ton of changes, using new profiling tools like Intel’s VTune, and the newest NSight for Visual Studio.

Some of the optimizations in recent weeks are below.  A few were silly and obvious, but some were only found when really drilling down.

— Completely re-tool cbuffer usage into frame specific, object specific, and mesh specific structures.  This especially helps reduce lighting related Maps/Unmaps into the shader.  I reduced Map/Unmap copies by about 75%. 

— Stop using string (shader name)->pointer maps for shader-specific d3d11 buffers and such.  I converted my whole ShaderSet system to be enum based arrays.

— Don’t use sqrt in Pythagorean distance calculations unless you actually need to present the actual value.  Comparisons don’t need it, since comparing squared distances gives the same ordering.  This brought the render lists object sorting (see below) from about 5% of the rendering thread’s CPU time to 0.5%.  Surprise kids: sqrt isn’t fast.

— Toy with shader compile options for speed.

— Locally cache a bunch of variables that were accessed by accessor a lot and not optimized out in compilation.

— Separate opaque and transparent render lists, resulting in far fewer blend state changes (also in preparation for instancing of opaques)

— Sort opaque list front to back and transparent list back to front.  The former takes advantage of early-z culling in the pipeline, which is always recommended, but I had never done it because of worries about my transparent objects.

— Don’t bother clearing stencil buffer - not using it.

— Calculate world-light lighting values once every second or so instead of every frame.  That was definitely overkill. 

— Turn some copy-constructor-calling assigns into references to avoid lots of copying (specifically transparency timelines and the hefty bone timelines)

— Get rid of my ‘extra stuff’ Map/Unmap buffer and move the stuff into one of the other buffers.  Still not sure why I didn’t do this before.

— Only calculate bounding box corners once for passive (stationary) objects.  Massive speedup.  Bounding box calculation was 9% of the rendering thread’s CPU time before, now negligible.  Just a terrible oversight on my part before.

— Draw terrain early in the frame so it can be worked on while doing all the object-specific culling and such on the CPU side.  This is a technique mentioned in several of the more hardcore optimization sites (like Matt Fisher who wrote GPUView while interning(!) at Microsoft).

— Start reducing branching in my terrain’s pixel shader.  This will be one of the biggest speedups of all (going from about 60 fps in an average scene to 100+) if my tests so far prove correct.  I have quite a ways to go though.  Unfortunately, an array of textures can’t be indexed with a variable in a pixel shader, only with a constant integer.  To get around this and do true texture array access (sampling via x,y,z rather than just trying to access textures[n]), I will no longer be able to use D3DX11 SRV loading from graphic files and will have to roll my own texture creation and mipmapping.  To get started with this I’m beginning to use stb_image, if you’ve heard of that.  It’s nice for lightweight, no-frills loading of png files (for example) into raw RGBA memory, which is convenient to point D3D at.
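
Three of the items above (no sqrt in comparisons, separate opaque and transparent lists, and the two sort orders) fit together naturally.  A minimal sketch, assuming hypothetical Drawable, RenderLists and buildRenderLists names and a single position per object:

```cpp
#include <algorithm>
#include <vector>

struct Vec3 { float x, y, z; };

struct Drawable {        // hypothetical: one visible object
    Vec3 position;
    bool transparent;
};

struct RenderLists {
    std::vector<Drawable> opaque;       // sorted front to back
    std::vector<Drawable> transparent;  // sorted back to front
};

// Squared distance is enough for ordering, so no sqrt is needed.
inline float distanceSquared(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

RenderLists buildRenderLists(const std::vector<Drawable>& visible, const Vec3& eye) {
    RenderLists lists;
    for (const Drawable& d : visible) {
        (d.transparent ? lists.transparent : lists.opaque).push_back(d);
    }
    // Nearest first, so early-z can reject hidden pixels.
    std::sort(lists.opaque.begin(), lists.opaque.end(),
              [&eye](const Drawable& a, const Drawable& b) {
                  return distanceSquared(a.position, eye) < distanceSquared(b.position, eye);
              });
    // Farthest first, so blending composites in the right order.
    std::sort(lists.transparent.begin(), lists.transparent.end(),
              [&eye](const Drawable& a, const Drawable& b) {
                  return distanceSquared(a.position, eye) > distanceSquared(b.position, eye);
              });
    return lists;
}
```

This recomputes the squared distance inside the comparator for simplicity.  Computing it once per object before sorting would trim the work further.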

What’s the result of all this?  Well, I don’t have a good test framework and scene set up for exact testing numbers of such things.  But all told, I think frame rates are up anywhere from 50-80% compared to before the optimizations.

More updates soon.

September 5th, 2012 - BSP Tree

Well, I finally bit the bullet and incorporated the BSP Tree data from WMO models.  So my format now has support for a BSP tree.  This is pretty much needed for collision detection that has decent performance.  I’m not totally clear on how it works mathematically yet, but above you can see me rendering Theramore Isle using collision vertices only.  Each BSP leaf mesh is a random color so you can see how granular the BSP structure is.  It goes way way beyond the meshes the models normally use.

The next step is to raycast with the planes and leaf polys, from a point on the character bounding boxes to test collision.  I have not done this before, but I think it’s going to be quite a math lesson.
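
For reference, the basic building block of that raycast is a ray-plane intersection.  The following is a generic textbook sketch, not code from the project: Plane, Ray and intersectRayPlane are hypothetical names, and the plane is assumed to be stored as a normal n and offset d with points p satisfying dot(n, p) + d = 0, which may not match how WMO BSP nodes store their planes.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

struct Plane { Vec3 normal; float d; };        // dot(normal, p) + d == 0 on the plane
struct Ray   { Vec3 origin; Vec3 direction; }; // points are origin + t * direction

// Returns true and writes the ray parameter to tOut when the ray hits the
// plane at t >= 0. Returns false for parallel rays and hits behind the origin.
bool intersectRayPlane(const Ray& ray, const Plane& plane, float& tOut) {
    const float denom = dot(plane.normal, ray.direction);
    if (std::fabs(denom) < 1e-6f) {
        return false;  // ray runs (nearly) parallel to the plane
    }
    const float t = -(dot(plane.normal, ray.origin) + plane.d) / denom;
    if (t < 0.0f) {
        return false;  // intersection lies behind the ray origin
    }
    tOut = t;
    return true;
}
```

A full collision test would still need to check whether the hit point falls inside one of the leaf's polygons.  This only covers the plane step.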

September 3rd, 2012 - Baked Shadows v2.0

If you’ll recall, months ago I had baked shadows in terrain working; however, the result was quite ugly - blocky and such.  There is actually more data there than I first thought.  The iffy documentation on the ADT format makes it seem like there’s only ‘on and off’ in the shadow data, so I assumed better-looking shadows were created by multisampling textures.  Here I am basically shading the data range from 0.8 to 1.0 (as a multiplier).  There is some hint that a different ‘shadow color’ from a light database may be used, but I think it looks fine as is.  Note the smoothness compared to the old method.

Someday, I will incorporate real shadows for everything, but I’d prefer not to slow rendering down for that right now (the same reason Blizzard didn’t add it to the client way back when).

August 27th, 2012 - Fades

Obviously, you can’t draw every object in every map tile, so typically you implement a draw distance for objects in addition to a scene draw distance.  One thing that has been really bothering me with the draw distance is the instant on/off clipping of objects.   So today, I also implemented fade-ins and fade-outs for all objects going in and out of view.  In the above video, I’m demonstrating the fades, and messing with the draw distance to make it a bit more obvious.  This gives you a much smoother experience in the client, similar to the WoW client.  
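
One simple way to get this kind of fade is to derive an alpha value from how close an object is to the draw distance.  The post doesn't say whether the fades here are distance-based or time-based, so treat this as an illustrative sketch only; fadeAlpha and fadeBand are hypothetical names.

```cpp
#include <algorithm>

// Opacity for an object at `distance` from the camera.
// Fully opaque up to (drawDistance - fadeBand), then a linear ramp down to
// fully transparent at drawDistance. fadeBand must be greater than zero.
float fadeAlpha(float distance, float drawDistance, float fadeBand) {
    const float t = (drawDistance - distance) / fadeBand;
    return std::min(1.0f, std::max(0.0f, t));
}
```

With a draw distance of 100 and a fade band of 20, an object at distance 90 is drawn at half opacity, and anything at 100 or beyond is fully faded out and can be skipped.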

August 27th, 2012 - Spawning and Leak-Hunting

Once again, I have time to work on this.  Expect lots of updates now.  

It’s useful to be able to spawn creatures, for all sorts of reasons - so I finally implemented it.  Again, note that this is not a local client hack - it actually spawns the mob into the world on the server.

In other news, I’ve found many of the COM memory leaks.  Though I am still leaking memory, it’s no longer COM objects and the leaks are quite reduced.  This is exciting news.

jbb99 asked:

I just stumbled across this website, and just wanted to say I'm very impressed! I just started learning direct3d11 myself and realize just how much work this is. Great work. (And I know that's not an "Ask me something" but I wasn't sure how else to say this!)

Thanks for writing.  It was indeed a lot of work.  In fact, it would have been overwhelming to look at the big picture too much.  The key was to break it down into a series of small manageable steps.  I also highly recommend the book “Practical Rendering and Computation with Direct3D 11.”  The project would have been absolutely impossible for me without it.
