STEAM GROUP
Steam Remote Play homestream
Founded
November 7, 2013
Explanation: NvFBC, NvIFR, NvENC
Hello, I have a few questions.

I would like to know the difference between the NvFBC and NvIFR capture methods. Is it correct that NvFBC captures the whole desktop, whereas NvIFR copies the frame back buffer? How does the "Game polled" software encoder work? And why is NvENC particularly good? Isn't it just an H.264 encoder? Thank you :)
Last edited by george; Jan 19, 2016 @ 8:49am
Kaldaien Jan 16, 2016 @ 5:26pm 
I have not licensed the appropriate NvAPI nonsense to have access to the documentation for this stuff, so this is all to the best of my understanding. Someone from VALVE can probably step in and correct anything that I have not properly explained.


Your general understanding is correct. The NV* paths are low-latency capture paths that grab finished frames as soon as they become available and, if all is working right, pass the data off to the GPU's on-board video encoder.


NVIDIA driver-level capture paths

  • NVFBC

    Captures the framebuffer (front buffer) without any involvement from OpenGL or Direct3D.

    Effectively a direct copy of the framebuffer irrespective of which application(s) drew it.

    It generally only works sensibly in fullscreen mode. If you render in windowed mode and use NVFBC, it is going to capture the entire screen including your desktop and other unrelated windows.


  • NVIFR

    Slightly more complicated and less performant than NVFBC, this can capture a single application.

    In my experience this used to be how Steam would stream windowed-mode applications. I have not seen this capture path in a very long time and I am glad because performance was awful whenever it was used.
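A toy sketch of the difference between the two capture styles described above, assuming the screen is just a desktop image with one application's render target pasted on top. The function names and the tiny pixel grids are made up for illustration; this is not how the NVIDIA APIs are actually called.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of pixel values; a stand-in for real image data


def compose(desktop: Frame, app_target: Frame, origin: Tuple[int, int]) -> Frame:
    """Paste the application's render target onto the desktop at (x, y)."""
    screen = [row[:] for row in desktop]
    ox, oy = origin
    for dy, row in enumerate(app_target):
        for dx, pixel in enumerate(row):
            screen[oy + dy][ox + dx] = pixel
    return screen


def capture_front_buffer(screen: Frame) -> Frame:
    """NVFBC-style (assumed model): copy the whole screen, whoever drew it."""
    return [row[:] for row in screen]


def capture_application(app_target: Frame) -> Frame:
    """NVIFR-style (assumed model): copy only one application's render target."""
    return [row[:] for row in app_target]


# 0 = desktop pixels, 7 = pixels drawn by the game in a window
desktop = [[0] * 6 for _ in range(4)]
game = [[7, 7, 7], [7, 7, 7]]
screen = compose(desktop, game, origin=(1, 1))

full_capture = capture_front_buffer(screen)  # includes desktop pixels around the window
app_capture = capture_application(game)      # game pixels only
```

In windowed mode the front-buffer copy still contains the surrounding desktop pixels, which is why that path only makes sense for fullscreen games.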


Vendor-agnostic capture paths

  • Game-polled

    To the best of my knowledge, works by initiating a buffer copy of the backbuffer immediately before the game presents a finished frame to Direct3D / OpenGL.

    This incurs quite a bit of latency because it goes through Direct3D / OpenGL rather than a lower-level driver feature like the NV* stuff. I would imagine it is always at least 1 frame late in order to prevent CPU/GPU synchronization from killing framerate. It also has issues with multiple overlays.

    Lots of software likes to do stuff when a game presents its final rendered image. This is a very busy time, with overclocking software drawing OSDs to the screen, ReShade and GeDoSaTo doing post-processing, and so forth. Depending on the order in which these third-party pieces of software do their extra processing, you may or may not see their output in the game-polled path.


    Game-polled is NOT how the video is encoded; it is only when/how the frame is captured.

    I have seen frames captured using "game-polled" later encoded using NVENC, and likewise I have also seen NVFBC-captured frames encoded on the CPU using libx264.
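The "at least 1 frame late" guess above can be sketched as a deferred readback: each present queues a copy of the current frame, and the CPU only reads a copy that was queued on an earlier present, so it never has to wait for the GPU. The class name, the `delay` parameter and the 60 fps figure are assumptions for illustration, not anything taken from Steam's code.

```python
from collections import deque
from typing import Deque, Optional


class DeferredReadback:
    """Toy model: queue a copy on every present, hand it out `delay` presents later."""

    def __init__(self, delay: int = 1) -> None:
        self.delay = delay
        self.pending: Deque[int] = deque()

    def on_present(self, frame_id: int) -> Optional[int]:
        # Queue a copy of the frame that is about to be presented.
        self.pending.append(frame_id)
        # Only an older copy is assumed finished, so only that one is read back.
        if len(self.pending) > self.delay:
            return self.pending.popleft()
        return None  # nothing old enough yet


capture = DeferredReadback(delay=1)
delivered = [capture.on_present(frame) for frame in range(5)]
# delivered → [None, 0, 1, 2, 3]

# At an assumed 60 fps, one frame of delay costs roughly 16.7 ms.
added_latency_ms = capture.delay * 1000 / 60
```

The encoder always receives the previous frame, which is the price paid for not stalling the renderer.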


Video Encoding Paths

  • NVENC

    Ideal because it is GPU-side encoding.

    Newer NVIDIA GPUs can actually do real-time H.264 and H.265 encoding without offloading anything to the CPU, though Steam only exposes H.264.

    If the entire capture and encode path is NV***, that means the video portion will be done on the GPU and the amount of data being copied between GPU and CPU will be kept to a minimum. This reduces latency and PCIe bus traffic.

    NVENC also has an interesting feature where I have seen multi-GPU systems offload video encoding to a separate GPU from the one doing the rendering. So even if you do not have SLI working in a game, the driver may be splitting the workload intelligently and increasing encoded video quality without putting undue stress on the GPU handling rendering.
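To put a rough number on the bus traffic point above, here is a back-of-the-envelope comparison of raw frames versus an encoded stream. The 4 bytes per pixel, 60 fps and 30 Mbit/s values are assumptions picked for illustration, not measurements from Steam.

```python
def raw_stream_bytes_per_sec(width: int, height: int, bytes_per_pixel: int, fps: int) -> int:
    """Bytes per second needed to move uncompressed frames off the GPU."""
    return width * height * bytes_per_pixel * fps


# Assumed: 4K frames, 4 bytes per pixel, 60 frames per second
raw = raw_stream_bytes_per_sec(3840, 2160, 4, 60)  # 1,990,656,000 bytes/s

# Assumed: the encoded stream runs at 30 Mbit/s
encoded = 30_000_000 / 8                            # 3,750,000 bytes/s

ratio = raw / encoded                               # roughly 530x less data to copy
```

Under these assumptions, encoding on the GPU means only a few megabytes per second cross the bus instead of about 2 GB/s of raw frames.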


In my experience, NVENC quality is not as good as that of the software H.264 encoder that Steam uses, but its latency and payload size are excellent, and it is the only way I have ever been able to stream 4K video at framerates higher than 30.


Everything done GPU-side using NVFBC is the holy grail of in-home streaming, but it is a pretty rare thing to see actually happen anymore.
Last edited by Kaldaien; Jan 16, 2016 @ 5:46pm
76561198254197380 Jan 30, 2016 @ 7:27am 
Thank you very much. Great answer!
fatSki Feb 1, 2016 @ 12:12am 
This should be pinned, really good explanation.
lovely8 Feb 28, 2016 @ 2:08pm 
*claps* +100
Thanks Kaldaien!
gordan Jun 12, 2016 @ 1:23pm 
Originally posted by Kaldaien:
NVENC also has an interesting feature where I have seen multi-GPU systems offload video encoding to a separate GPU from the one doing the rendering. So even if you do not have SLI working in a game, the driver may be splitting the work-load intelligently and increasing encoded video quality without putting undue stress on the GPU handling rendering.

I am reasonably sure that this reduces rather than increases performance and efficiency, unless you are pairing an old high-end GPU (e.g. a GTX 580, which doesn't have NVENC hardware) with a new low-end Kepler/Maxwell GPU (e.g. a GT630, which does have NVENC hardware). The reason is that the NVENC hardware is physically separate from the rendering hardware, so NVENC encoding takes no performance at all away from the rendering going on in the shader processors. Both have access to the same GPU memory, however, which allows the NVENC engine to grab the rendered frame directly from the frame buffer and process it efficiently.

Therefore, rendering on one GPU and then passing raw FB contents over PCIe bus to the 2nd GPU for NVENC encoding is quite inefficient. But it still may be theoretically preferable if the primary GPU is pre-Kepler and doesn't have NVENC on-board. In practice, I doubt the benefits of this would outweigh the added complexity.
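A rough estimate of what that cross-GPU copy costs per frame, assuming PCIe 3.0 at its theoretical peak (8 GT/s per lane, 128b/130b encoding) and an uncompressed 4K frame at 4 bytes per pixel. Real transfers are slower than the theoretical peak, and the lane counts are assumptions, so treat these as lower bounds.

```python
def pcie3_bytes_per_sec(lanes: int) -> float:
    """Theoretical PCIe 3.0 throughput: 8 GT/s per lane, 128b/130b encoding."""
    return lanes * 8e9 * (128 / 130) / 8


def transfer_ms(frame_bytes: int, lanes: int) -> float:
    """Best-case time to move one frame across the bus, in milliseconds."""
    return frame_bytes / pcie3_bytes_per_sec(lanes) * 1000


frame_bytes = 3840 * 2160 * 4          # one uncompressed 4K frame (assumed format)
ms_x16 = transfer_ms(frame_bytes, 16)  # about 2.1 ms on a x16 link
ms_x8 = transfer_ms(frame_bytes, 8)    # about 4.2 ms on a x8 link
frame_budget_ms = 1000 / 60            # about 16.7 ms per frame at 60 fps
```

Even in the best case the copy takes a noticeable slice of a 60 fps frame budget, whereas an encoder on the rendering GPU reads the frame from local memory.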
Kaldaien Jun 12, 2016 @ 3:27pm 
The driver does this automatically on my 3-way GTX 980 system, so NVIDIA obviously thinks there's a reason to offload encoding to a separate GPU. Probably for thermal / power target reasons.
gordan Jun 13, 2016 @ 12:58pm 
How can you tell which of the GPUs is doing the NVENC encoding?
Kaldaien Jun 13, 2016 @ 2:42pm 
I have a custom OSD I wrote using NvAPI (NVIDIA GPUs) and ADL (AMD GPUs) to monitor my hardware during development. It's included in all of my mods.

Here's an example where the driver is throttling my GPUs back because they are at a power limit; it has decided to use GPU2 for video encoding, GPU0 for rendering and GPU1's pretty much doing nothing.

http://steamcommunity.com/sharedfiles/filedetails/?id=577432726

You can see PCIe bus load, video encoder load, memory bandwidth usage and a host of other things. NV broke the bus load measurement in recent drivers unfortunately :-\
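For anyone without that OSD, a similar per-GPU check can be sketched with NVML through the `pynvml` bindings. This swaps NVML in for the NvAPI approach described above, so it is not the OSD's actual code. It assumes the `nvidia-ml-py` package and an NVIDIA driver are installed and that the GPUs support the encoder utilization query. The sample readings at the bottom are made up.

```python
from typing import Dict, Optional


def busiest_encoder(utilization_by_gpu: Dict[int, int]) -> Optional[int]:
    """Return the GPU index with the highest encoder load, or None if all are idle."""
    if not utilization_by_gpu:
        return None
    gpu, load = max(utilization_by_gpu.items(), key=lambda item: item[1])
    return gpu if load > 0 else None


def read_encoder_utilization() -> Dict[int, int]:
    """Ask NVML for each GPU's video encoder utilization in percent."""
    import pynvml  # imported lazily so busiest_encoder() works without it

    pynvml.nvmlInit()
    try:
        readings = {}
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            load, _sampling_period_us = pynvml.nvmlDeviceGetEncoderUtilization(handle)
            readings[index] = load
        return readings
    finally:
        pynvml.nvmlShutdown()


# Hypothetical readings for a 3-GPU system where GPU2 is doing the encoding
sample = {0: 0, 1: 0, 2: 37}
encoding_gpu = busiest_encoder(sample)
```

On a real system you would call `busiest_encoder(read_encoder_utilization())` while a stream is running.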
Last edited by Kaldaien; Jun 13, 2016 @ 2:45pm
Markov Dec 27, 2019 @ 11:19pm 
Which driver are you referring to?
Date Posted: Jan 15, 2016 @ 12:30pm
Posts: 9