The Graphic Guy Squall: Deferred MSAA in Unity

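Preface

I had done this once before while practicing on my own D3D12 engine. This time I wanted to see whether I could do the same by replacing Unity's GBuffer.

Deferred MSAA is feasible, but the solution is not a conventional one, and because of Unity's architecture some problems remain after implementation that have to be solved (or ignored). ~~It isn't the first time something that works in my own low-level engine turns out not to work in Unity.~~

As everyone knows, the GBuffer is first bound to the pipeline as MRT and then written to several RTs in one go.

Each RT stores different content (diffuse, specular, normal, and so on), and a final composition pass completes the frame. None of the RTs Unity generates have multisampling enabled, however.

So we have to dig into the pipeline and modify it ourselves!

Minimum requirement: D3D11 / Shader Model 4.0 or above (D3D10.1 is actually 4.0 or above already, but I won't bother with something as in-between as D3D10).
It also requires a graphics native plugin.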

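Modifying the GBuffer

"Modify" may not be quite the right word, because Unity gives you no access to the GBuffer on the CPU side. You have to create your own targets and attach them: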

CreateMapAndColorBuffer("Custom diffuse", 0, RenderTextureFormat.ARGB32, 0, msaaFactor, ref diffuseRT);
CreateMapAndColorBuffer("Custom specular", 0, RenderTextureFormat.ARGB32, 1, msaaFactor, ref specularRT);

CreateMapAndColorBuffer("Custom normal", 0, RenderTextureFormat.ARGB2101010, 2, msaaFactor, ref normalRT);

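Only the first three are shown for illustration. CreateMapAndColorBuffer creates a RenderTexture according to msaaFactor with bindMS set to true, and inside it uses GetNativeTexturePtr to hand the RT resource to the D3D11 plugin, which creates the RenderTargetView.

The reason for all this trouble is to hack our multisample RTs into the pipeline at CameraEvent.BeforeGBuffer. With the ordinary SetRenderTarget method, Unity simply ignores you.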


// _colorBuffer is the pointer obtained from GetNativeTexturePtr
gBufferColor[_index] = (ID3D11Texture2D*)_colorBuffer;
if (gBufferColor[_index] == nullptr)
{
    return false;
}

D3D11_TEXTURE2D_DESC texDesc;
gBufferColor[_index]->GetDesc(&texDesc);

D3D11_RENDER_TARGET_VIEW_DESC rtvDesc;
ZeroMemory(&rtvDesc, sizeof(rtvDesc));

// resource format -> format a view can be created with
rtvDesc.Format = ConvertTypelessFormat(texDesc.Format);
rtvDesc.ViewDimension = (_msaaFactor > 1) ? D3D11_RTV_DIMENSION_TEXTURE2DMS : D3D11_RTV_DIMENSION_TEXTURE2D;
rtvDesc.Texture2D.MipSlice = 0;

HRESULT rtvResult = m_Device->CreateRenderTargetView(gBufferColor[_index], &rtvDesc, &gBufferColorView[_index]);

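The above is the native-side code that creates the RenderTargetView. Once the pointer from GetNativeTexturePtr arrives, the view can be created directly. Note that ViewDimension must be set to D3D11_RTV_DIMENSION_TEXTURE2DMS when MSAA is enabled.

Also, if the scene has baked lightmaps and uses shadow mask, there is one more RenderTarget for light data. I didn't bake lightmaps for this test, so I have left that part out for now.

Modifying the depth buffer

If the GBuffer is multisampled, the depth buffer of course has to be as well.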


CreateMapAndColorBuffer("Custom depth", 32, RenderTextureFormat.Depth, -1, msaaFactor, ref depthRT);

A depth texture is created the same way. This time its pointer comes from GetNativeDepthBufferPtr and is passed to the D3D11 plugin to create a DepthStencilView.


gBufferDepth = (ID3D11Texture2D*)_depthBuffer;
if (gBufferDepth == nullptr)
{
    return false;
}

D3D11_TEXTURE2D_DESC texDesc;
gBufferDepth->GetDesc(&texDesc);

D3D11_DEPTH_STENCIL_VIEW_DESC dsvDesc;
ZeroMemory(&dsvDesc, sizeof(dsvDesc));

dsvDesc.Format = ConvertTypelessFormat(texDesc.Format);
dsvDesc.ViewDimension = (_msaaFactor > 1) ? D3D11_DSV_DIMENSION_TEXTURE2DMS : D3D11_DSV_DIMENSION_TEXTURE2D;
dsvDesc.Texture2D.MipSlice = 0;

HRESULT dsvResult = m_Device->CreateDepthStencilView(gBufferDepth, &dsvDesc, &gBufferDepthView);

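This is much like creating the GBuffer views, just with a DSV instead of an RTV (and when you later need a resource for shader access, that would be an SRV).

ConvertTypelessFormat is just a small helper that converts the resource format into one that an RTV, DSV, and so on can use.

For example, the resource format of a 32-bit depth texture is R32G8X24_TYPELESS (an enum value), and the helper returns D32_FLOAT_S8X24_UINT for creating the DepthStencilView.

In this example, a 32-bit depth texture is actually created as a 64-bit resource underneath: D32 means 32 bits are used for depth, S8 means 8 bits are used for the stencil buffer, and the remaining 24 bits are unused.

Formats must be matched strictly: a view can only be created with a format from the same group as the resource. For example, an R32G32B32A32 resource cannot be viewed as R32G32. See Microsoft's DXGI_FORMAT documentation for details.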
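To make the helper less abstract, here is a minimal sketch of what such a mapping could look like. It is not the actual plugin code: the `Format` enum is a stand-in declared here so the sketch compiles without the Windows SDK (a real helper would take and return `DXGI_FORMAT`), and every row other than R32G8X24_TYPELESS to D32_FLOAT_S8X24_UINT is my assumption about which formats Unity's render textures end up with.

```cpp
// Stand-in for the few DXGI_FORMAT values used here (assumption: the real
// helper works on DXGI_FORMAT directly).
enum class Format {
    Unknown,
    R8G8B8A8_TYPELESS, R8G8B8A8_UNORM,
    R10G10B10A2_TYPELESS, R10G10B10A2_UNORM,
    R32G8X24_TYPELESS, D32_FLOAT_S8X24_UINT,
    R24G8_TYPELESS, D24_UNORM_S8_UINT,
};

// Map a typeless resource format to a typed format from the same group.
// Formats that are not in the table pass through unchanged.
Format ConvertTypelessFormat(Format resourceFormat) {
    switch (resourceFormat) {
        case Format::R8G8B8A8_TYPELESS:    return Format::R8G8B8A8_UNORM;       // assumed
        case Format::R10G10B10A2_TYPELESS: return Format::R10G10B10A2_UNORM;    // assumed
        case Format::R32G8X24_TYPELESS:    return Format::D32_FLOAT_S8X24_UINT; // from the text
        case Format::R24G8_TYPELESS:       return Format::D24_UNORM_S8_UINT;    // assumed
        default:                           return resourceFormat;
    }
}
```

Passing unknown formats through unchanged keeps the helper harmless for resources that are already typed; whether CreateRenderTargetView then accepts them is still up to the same-group rule above.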

Hijacking and replacing the GBuffer before rendering

As mentioned earlier, the GBuffer we created has to be hacked in at CameraEvent.BeforeGBuffer.


void RenderAPI_D3D11::SetGBufferTarget()
{
    if (m_Device == nullptr)
    {
        return;
    }

    ID3D11DeviceContext *immediateContext = nullptr;
    m_Device->GetImmediateContext(&immediateContext);
    if (immediateContext == nullptr)
    {
        return;
    }

    // set gbuffer target
    FLOAT clearColor[4] = { 0,0,0,-1 };
    for (int i = 0; i < 4; i++)
    {
        immediateContext->ClearRenderTargetView(gBufferColorView[i], clearColor);
    }

    // get unity's depth buffer
    immediateContext->OMGetRenderTargets(0, NULL, &screenDepthView);

    // replace om binding with custom targets
    immediateContext->ClearDepthStencilView(gBufferDepthView, D3D11_CLEAR_DEPTH | D3D11_CLEAR_STENCIL, 0.0f, 0);
    immediateContext->OMSetRenderTargets(4, gBufferColorView, gBufferDepthView);

    immediateContext->Release();
}

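This runs in the native layer and is triggered by calling IssuePluginEvent. I set the alpha channel of the clear color to -1 for a purpose that comes up later. The line that fetches Unity's depth view is only there to handle the case where msaaFactor is set to 1.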
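For readers who have not written a native plugin, the path from IssuePluginEvent to a function like SetGBufferTarget is roughly a callback plus an event ID. The sketch below shows only that dispatch pattern under a few assumptions: `RenderingEvent` is a local stand-in for Unity's `UnityRenderingEvent` type, the export and calling-convention macros from the plugin headers are omitted, the event IDs are hypothetical, and the handlers are stubs that record which one ran instead of touching D3D11.

```cpp
// Local stand-in for Unity's UnityRenderingEvent callback type, so the
// sketch compiles without the plugin API headers.
typedef void (*RenderingEvent)(int eventId);

// Hypothetical event IDs; the C# side has to pass the same numbers.
enum PluginEvent { kSetGBufferTarget = 0, kRestoreTarget = 1 };

// Records which handler ran last (stands in for the real D3D11 work).
int g_lastHandled = -1;

void SetGBufferTargetStub() { g_lastHandled = kSetGBufferTarget; }
void RestoreTargetStub()    { g_lastHandled = kRestoreTarget; }

// Invoked on the render thread when the command buffer reaches the
// IssuePluginEvent command.
void OnRenderEvent(int eventId) {
    switch (eventId) {
        case kSetGBufferTarget: SetGBufferTargetStub(); break;
        case kRestoreTarget:    RestoreTargetStub();    break;
        default: break;  // unknown IDs are ignored
    }
}

// The C# side fetches this pointer once and passes it to IssuePluginEvent.
extern "C" RenderingEvent GetRenderEventFunc() { return OnRenderEvent; }
```

On the C# side the pointer returned by GetRenderEventFunc would be handed to the command buffer attached at CameraEvent.BeforeGBuffer, together with the matching event ID.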

Resolving the GBuffer

Next comes what to do once drawing has finished. CPU-side code first:

copyGBuffer.SetGlobalTexture(texName[texIdx], emissionRT);
copyGBuffer.Blit(null, BuiltinRenderTextureType.CameraTarget, resolveAA);

copyGBuffer.SetGlobalTexture(texName[texIdx], depthRT);
copyGBuffer.Blit(null, BuiltinRenderTextureType.CameraTarget, resolveAADepth);

for (int i = 0; i < msaaFactor; i++)
{
    copyGBuffer.SetGlobalFloat("_TransferAAIndex", i);

    copyGBuffer.SetRenderTarget(diffuseAry, 0, CubemapFace.Unknown, i);
    copyGBuffer.SetGlobalTexture("_MsaaTex", diffuseRT);
    copyGBuffer.Blit(null, BuiltinRenderTextureType.CurrentActive, transferAA);

    copyGBuffer.SetRenderTarget(specularAry, 0, CubemapFace.Unknown, i);
    copyGBuffer.SetGlobalTexture("_MsaaTex", specularRT);
    copyGBuffer.Blit(null, BuiltinRenderTextureType.CurrentActive, transferAA);

    copyGBuffer.SetRenderTarget(normalAry, 0, CubemapFace.Unknown, i);
    copyGBuffer.SetGlobalTexture("_MsaaTex", normalRT);
    copyGBuffer.Blit(null, BuiltinRenderTextureType.CurrentActive, transferAA);
}

copyGBuffer.SetGlobalTexture("_GBuffer0", diffuseAry);
copyGBuffer.SetGlobalTexture("_GBuffer1", specularAry);
copyGBuffer.SetGlobalTexture("_GBuffer2", normalAry);

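First, the two blits at the top. Unity's official documentation states that with HDR rendering (which is usually enabled), no separate emission buffer is created and the camera target is used directly.

So two custom shaders, ResolveAA and ResolveAADepth, blit our multisample emission and depth to CameraTarget.

The contents of the for loop are a bit more involved. For diffuse, specular, and normal, the three buffers used for lighting, a separate texture array is created for each, with as many slices as there are MSAA samples.

The multisample RT is bound to the TransferAA shader, and a blit then transfers each subsample into the texture array.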

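Why go to all this trouble?

Good question. I wanted to read from a Texture2DMS directly inside Unity's Internal-DeferredShading, but Unity does not let you bind it that way.

So instead, an extra pass first reads the values out of the Texture2DMS and stores them in a texture array.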
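As a CPU-side picture of what that extra pass does, the sketch below copies sample s of every pixel into slice s of an array. The `[pixel][sample]` and `[slice][pixel]` layouts and all names here are illustrative assumptions, not how the GPU stores these resources; the inner loop stands in for the full-screen blit and the outer loop for the `for` over msaaFactor in the C# code.

```cpp
#include <cstddef>
#include <vector>

struct Texel { float r, g, b, a; };

// Multisampled image as [pixel][sample]; texture array as [slice][pixel].
using MsaaImage    = std::vector<std::vector<Texel>>;
using TextureArray = std::vector<std::vector<Texel>>;

// One transfer pass per sample index: slice s receives sample s of each pixel.
// Every pixel must hold at least sampleCount samples (at() throws otherwise).
TextureArray TransferSamples(const MsaaImage& msaa, std::size_t sampleCount) {
    TextureArray slices(sampleCount, std::vector<Texel>(msaa.size()));
    for (std::size_t s = 0; s < sampleCount; ++s) {      // loop over msaaFactor
        for (std::size_t p = 0; p < msaa.size(); ++p) {  // the full-screen blit
            slices[s][p] = msaa[p].at(s);                // Load(xy, sampleIndex)
        }
    }
    return slices;
}
```

The lighting pass can then index the array by slice, which is an ordinary Texture2DArray lookup rather than a Texture2DMS one.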

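Can't the d, s, and n buffers be handled with ResolveAA as well?

ResolveAA simply averages the values, but averaging normals before lighting produces odd bright spots along edges, which is why most engines resolve the GBuffer as part of the lighting pass.

So accept your fate and handle it in the lighting pass!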
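A small numeric experiment shows where the bright spots come from. This is a toy model, not Unity's BRDF: `Shade` is a made-up tight lobe (a cosine raised to the 16th power), and the two edge samples have normals 90 degrees apart with the lobe direction exactly between them. Lighting each sample and then averaging gives a dark result; averaging the normals first lands right on the lobe peak.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

Vec3 Normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / len, v.y / len, v.z / len };
}

float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Toy nonlinear lighting term: a tight lobe around the direction `dir`.
float Shade(Vec3 normal, Vec3 dir) {
    return std::pow(std::max(Dot(normal, dir), 0.0f), 16.0f);
}

// Resolve first: average the normals, then light once.
// (Normalizing the sum gives the same direction as normalizing the average.)
float ShadeAveragedNormal(const std::vector<Vec3>& normals, Vec3 dir) {
    Vec3 sum = { 0.0f, 0.0f, 0.0f };
    for (Vec3 n : normals) { sum.x += n.x; sum.y += n.y; sum.z += n.z; }
    return Shade(Normalize(sum), dir);
}

// Light first: shade every sample, then average the results.
float ShadePerSample(const std::vector<Vec3>& normals, Vec3 dir) {
    float sum = 0.0f;
    for (Vec3 n : normals) sum += Shade(n, dir);
    return sum / static_cast<float>(normals.size());
}
```

With normals (0,0,1) and (1,0,0) and the lobe along the normalized (1,0,1), the per-sample result stays below 0.01 while the averaged-normal result is close to 1, which is exactly the kind of highlight that should not be there.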

Shaders


ResolveAA (4x as an example):

Texture2DMS<float4, 4> _MsaaTex_4X;

float4 Resolve4X(v2f i)
{
    float4 col = 0;
    float4 skyColor = tex2D(_SkyTextureForResolve, i.uv);

    [unroll]
    for(uint a = 0; a < 4; a++)
    {
        float4 data = _MsaaTex_4X.Load(i.vertex.xy, a);
        data = lerp(data, skyColor, data.a < 0);
        col += data;
    }

    col /= _MsaaFactor;
    return col;
}

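First, the incoming texture must be declared as Texture2DMS so that Load can be used for the resolve.

After fetching, the samples are simply summed and averaged.

In between, though, the shader looks up skyColor, and this is where the -1 set in the native layer pays off. The GBuffer is always cleared to black (so that empty areas are not lit), and if the black at the edges is left alone, averaging produces an obvious dark outline. So when a sample has no data (alpha of -1), skyColor (or the background color) is averaged in instead.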
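The effect of the sentinel is easy to check on the CPU. The sketch below mirrors the shader's per-pixel logic under the same convention (negative alpha means the sample was never written); the `Color` struct and function name are mine, not part of the project.

```cpp
#include <vector>

struct Color { float r, g, b, a; };

// Average the samples of one pixel. A sample whose alpha is still the
// negative clear value was never covered by geometry, so the sky color is
// used in its place. `samples` must not be empty.
Color ResolveWithSky(const std::vector<Color>& samples, Color sky) {
    Color sum = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (const Color& s : samples) {
        Color c = (s.a < 0.0f) ? sky : s;
        sum.r += c.r; sum.g += c.g; sum.b += c.b; sum.a += c.a;
    }
    float n = static_cast<float>(samples.size());
    return { sum.r / n, sum.g / n, sum.b / n, sum.a / n };
}
```

For an edge pixel with two red samples and two untouched ones against a blue sky, this gives an even red and blue mix; without the substitution the same pixel would come out as half-intensity red, which is the dark outline described above.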

Generating _SkyTextureForResolve is simple: just clear a texture with GL.Clear (or GL.ClearWithSkybox) according to the camera settings.


void OnPreCull()
{
    Graphics.SetRenderTarget(skyTexture);

    if (attachedCam.clearFlags == CameraClearFlags.Skybox)
    {
        GL.ClearWithSkybox(false, attachedCam);
    }
    else
    {
        GL.Clear(false, true, attachedCam.backgroundColor);
    }

    Graphics.SetRenderTarget(null);
}

ResolveAADepth (4x as an example):

Stencil
{
    Ref 192
    Comp always
    Pass replace
}

Texture2DMS<float, 4> _MsaaTex_4X;

float Resolve4X(v2f i)
{
    float col = 1;
    float baseCol = _MsaaTex_4X.Load(i.vertex.xy, 0).r;

    // col tracks the min over the samples, baseCol the max
    [unroll]
    for (uint a = 0; a < 4; a++)
    {
        float depth = _MsaaTex_4X.Load(i.vertex.xy, a).r;
        col = min(depth, col);
        baseCol = max(depth, baseCol);
    }

    col = lerp(col, baseCol, col == 0.0f);
    return col;
}

float frag(v2f i, out float oDepth : SV_Depth) : SV_Target
{
    float col = 1;

    [branch]
    if (_MsaaFactor == 2)
        col = Resolve2X(i);
    else if (_MsaaFactor == 4)
        col = Resolve4X(i);
    else if (_MsaaFactor == 8)
        col = Resolve8X(i);

    oDepth = col;
    return col;
}

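Much the same as before, except that the fragment shader must declare an SV_Depth output in order to write a depth value (ZWrite On!).

The resolve no longer averages, because averaging depth makes little sense.

Typically min or max is used, depending on the purpose.

Here the result starts at 1 (the nearest depth under reverse-Z) and the minimum over all MSAA samples is selected (the farthest depth under reverse-Z). If the selected value is 0, at least one sample has no object in it (an edge against the background), and the shader falls back to baseCol, the largest value among the samples.

I don't select the maximum across the board because, where depth changes drastically, the original value would be replaced by a nearer depth, and objects that shouldn't occlude anything would start occluding others.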
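The same rule written out on the CPU, as a sketch that assumes the reverse-Z convention described above (1 at the near plane, 0 at the far plane and for cleared samples). The function name and the use of a `std::vector` for one pixel's samples are mine.

```cpp
#include <algorithm>
#include <vector>

// Resolve the depth samples of one pixel (reverse-Z: 1 = near, 0 = far/empty).
// `samples` must not be empty.
float ResolveDepth(const std::vector<float>& samples) {
    float farthest = 1.0f;        // running min, like `col` in the shader
    float nearest  = samples[0];  // running max, like `baseCol`
    for (float d : samples) {
        farthest = std::min(farthest, d);
        nearest  = std::max(nearest, d);
    }
    // An empty sample drags the min down to 0; fall back to the max then.
    return (farthest == 0.0f) ? nearest : farthest;
}
```

A fully covered pixel therefore keeps its farthest sample, an edge pixel with some empty samples keeps the depth of the geometry that touches it, and a pixel with no geometry at all stays at 0.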

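Why force a write to the stencil buffer?

Sharp-eyed readers will have noticed the part that writes to the stencil buffer.

Unity uses stencil as a reference for light processing in deferred shading.

Without a stencil value, the image receives no lighting at all. (Unity writes 192 by default.)

Stencil cannot be copied: CopyResource() cannot copy from a multisample resource to a non-multisample one, and ResolveSubresource() does not apply to the D32_S8 format either.

SV_StencilRef can output stencil from a shader, but it is a D3D12 / D3D11.3-and-later feature.

Where it is available, we could create an SRV for the stencil, pass it to the shader, and output it through SV_StencilRef.

Unfortunately this is plain D3D11, so a constant written through the Stencil block is the only way to output stencil. I write a fixed value (255, or 192 as in the block above) so that lighting is always computed.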

TransferAA:

Texture2DMS<float4> _MsaaTex;
uint _TransferAAIndex;

float4 frag (v2f i) : SV_Target
{
    return _MsaaTex.Load(i.vertex.xy, _TransferAAIndex);
}

The simplest part: just output the sample at the selected index to the texture array.

Internal-DeferredShading:

[branch]
if (_MsaaFactor > 1)
{
    for (uint a = 0; a < _MsaaFactor; a++)
    {
        gbuffer0 = _GBuffer0.Load(uint4(uv * _ScreenParams.xy, a, 0));
        gbuffer1 = _GBuffer1.Load(uint4(uv * _ScreenParams.xy, a, 0));
        gbuffer2 = _GBuffer2.Load(uint4(uv * _ScreenParams.xy, a, 0));
        UnityStandardData data = UnityStandardDataFromGbuffer(gbuffer0, gbuffer1, gbuffer2);

        float3 eyeVec = normalize(wpos - _WorldSpaceCameraPos);
        half oneMinusReflectivity = 1 - SpecularStrength(data.specularColor.rgb);

        UnityIndirect ind;
        UNITY_INITIALIZE_OUTPUT(UnityIndirect, ind);
        ind.diffuse = 0;
        ind.specular = 0;

        col += UNITY_BRDF_PBS(data.diffuseColor, data.specularColor, oneMinusReflectivity, data.smoothness, data.normalWorld, -eyeVec, light, ind);
    }
    col /= _MsaaFactor;
}
else
{
    UnityStandardData data = UnityStandardDataFromGbuffer(gbuffer0, gbuffer1, gbuffer2);

    float3 eyeVec = normalize(wpos - _WorldSpaceCameraPos);
    half oneMinusReflectivity = 1 - SpecularStrength(data.specularColor.rgb);

    UnityIndirect ind;
    UNITY_INITIALIZE_OUTPUT(UnityIndirect, ind);
    ind.diffuse = 0;
    ind.specular = 0;

    col = UNITY_BRDF_PBS(data.diffuseColor, data.specularColor, oneMinusReflectivity, data.smoothness, data.normalWorld, -eyeVec, light, ind);
}

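It simply changes behavior according to the MSAA parameter: without MSAA it runs as usual; with MSAA, values are fetched from the texture arrays we passed in and lighting is computed per sample, then averaged.

Results

First, a simple scene: just a floor, a sphere, and a cube.

Zoomed in, MSAA is clearly applied, and the result is nearly identical to forward-rendering AA.

Next, Unity's Viking Village demo scene (with post-processing turned off for now):

The results are quite good.

Performance-wise, 8x costs roughly 1 ms on a 1070 Ti. For everyday use, 2x or 4x performs well enough.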

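Drawbacks

- Deferred Reflection was not modified here, so reflections are missing with this approach. The shader under GraphicsSettings.DeferredReflection therefore has to be set to No Support; reflection color is then still computed, just not through DeferredReflection. If you insist on DeferredReflection, the GBuffer has to be passed in there as well and the shader modified.

- The lights' Culling Mask stops working (only Everything is useful). As mentioned, Unity uses the stencil buffer for light culling, but resolving from the multisample stencil buffer to Unity's is essentially impossible: writing stencil requires a reference value, which can only be set as a constant and cannot be taken from another stencil buffer. Only CopyResource copies it completely, and that cannot be used in the multisample case either.

- GPU VRAM usage increases, because in addition to the built-in GBuffer I create extra multisample targets. Transparent objects still cannot be anti-aliased either (they are drawn with forward rendering after the GBuffer has been composed).

The simple fix is to improve transparent objects with post-process AA. The complex fix is to draw transparent objects into another multisample texture and then resolve it, but that requires writing a shader that sensibly merges the transparent buffer with the main image (manual blending). The simple fix is the more common choice.

If you can live with these issues, the results are actually quite good.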
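To put a rough number on the VRAM drawback, here is a back-of-the-envelope estimate. It rests on several assumptions of mine: memory scales as texels times bytes per texel times sample count (drivers may add padding or compression metadata on top), the three lighting buffers are 32-bit, the depth resource is 64-bit as described earlier, the texture arrays use the same formats as their sources, and the emission target is left out because its format depends on the HDR setting.

```cpp
#include <cstdint>

// Lower-bound size of one render target: texels x bytes per texel x samples.
std::uint64_t RenderTargetBytes(std::uint32_t width, std::uint32_t height,
                                std::uint32_t bytesPerTexel, std::uint32_t samples) {
    return static_cast<std::uint64_t>(width) * height * bytesPerTexel * samples;
}

// Extra memory for this setup: multisample diffuse/specular/normal and depth,
// plus the three per-sample texture arrays that feed the lighting pass.
std::uint64_t ExtraDeferredMsaaBytes(std::uint32_t width, std::uint32_t height,
                                     std::uint32_t samples) {
    const std::uint32_t kColor32 = 4;  // ARGB32 / ARGB2101010 (assumed 32-bit)
    const std::uint32_t kDepth64 = 8;  // R32G8X24
    std::uint64_t msaaColor = 3 * RenderTargetBytes(width, height, kColor32, samples);
    std::uint64_t msaaDepth = RenderTargetBytes(width, height, kDepth64, samples);
    std::uint64_t arrays    = 3 * RenderTargetBytes(width, height, kColor32, samples);
    return msaaColor + msaaDepth + arrays;
}
```

At 1920x1080 with 4x this comes to 265,420,800 bytes, roughly 253 MiB, on top of Unity's own GBuffer, so the cost is far from negligible on smaller GPUs.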

References

Full code [GitHub]

Antialiased Deferred Rendering [NVIDIA]

DXGI_FORMAT

RenderDoc (unrelated to this post, but an extremely handy tool that can dissect how the draw calls behind a game's frame are issued)

Wednesday, May 8, 2019
