Deep Desktop

Duraexim Jun 24, 2024 @ 8:31pm
Direct ML or Nvidia Cuda?
Hi
I'm loving the app, it's amazing!!

But I have some questions

Which one should I choose? Direct ML or Nvidia Cuda?

I have an Nvidia RTX 3080 (10GB) and a Ryzen 5 5600X, 32GB RAM
I'm choosing Nvidia CUDA since I have an Nvidia card, but I want to know whether there is any difference between them in performance or in how the depth estimation works.

Also, there is an option to choose the device, "CUDA0" or "CPU". Which should I choose?

I would also like more information about the three encoder options, "vits", "vitb" and "vitl". And finally, what does Frame Offset do? The options range from 0 to 20.

Thanks in advance :steamhappy:
Showing 1-2 of 2 comments
LaurentDupin  [developer] Jun 24, 2024 @ 11:10pm 
The TL;DR is that in your case Nvidia CUDA is the best option. Here is why:

Direct ML is Microsoft's solution for running "AI" workloads on any graphics card (depth estimation technically falls into that AI category they love to tell us about), so it is compatible with any (conditions may still apply) Nvidia, AMD or Intel GPU, iGPU (I would not expect much performance, but it is possible) or NPU (the new thing the latest Qualcomm and AMD CPUs are getting). It is not as performant, but at least it should work in every case. For comparison, my AMD 6700 XT using Direct ML performs about as well as my 1050 Ti using Nvidia CUDA, when it should be capable of delivering roughly 5 times the performance. On an Nvidia GPU the performance hit is not as bad, because I assume Direct ML translates to CUDA when it can, but expect around 50% less performance nonetheless. In terms of the depth output itself, there should be no difference, and I have not seen one yet.
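The fallback order described above can be sketched like this (illustrative Python only; the function name and vendor strings are made up for this example, not Deep Desktop's actual API):

```python
# Illustrative sketch of the backend preference described above.
# Not Deep Desktop's real code; names are assumptions.
def pick_backend(gpu_vendor: str) -> str:
    """Preferred depth-estimation backend for a given GPU vendor."""
    if gpu_vendor == "nvidia":
        return "CUDA"       # vendor-native path, fastest on Nvidia cards
    if gpu_vendor in ("amd", "intel"):
        return "DirectML"   # vendor-agnostic path, works on any GPU/iGPU/NPU
    return "CPU"            # universal but slow fallback
```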

Now, for the "CUDA0" and "CPU" options: "CUDA0" is your RTX 3080 and "CPU" is your 5600X. The "CPU" option is a fallback if you have nothing else; it will be slow but it will work (it might be OK on a $5000+ server CPU with tons of cores, but that is certainly not worth it).

The encoder options are actually different models: "vits" (the "s" is for small) is the smallest in memory, runs the fastest, but is the least precise. Conversely, "vitl" is the largest and most precise, but runs much slower (around 20x). Since the resolution has to be turned way down for the whole thing to run in real time (the depth resolution is roughly 300x180 pixels with the default options), you won't see much difference between them, but if you increase the depth resolution you may start to see actual differences in the depth estimation.
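To make the speed trade-off concrete, here is a hypothetical back-of-the-envelope estimate. Only the ~20x factor for "vitl" comes from the reply above; the "vitb" factor and the 5 ms baseline are assumptions for illustration:

```python
# Hypothetical relative cost per frame, scaled from a measured "vits" baseline.
# Only the ~20x "vitl" factor comes from the post; "vitb" is an assumption.
RELATIVE_COST = {"vits": 1.0, "vitb": 6.0, "vitl": 20.0}

def estimated_frame_ms(encoder: str, vits_baseline_ms: float) -> float:
    """Estimated depth-estimation time per frame, in milliseconds."""
    return RELATIVE_COST[encoder] * vits_baseline_ms

# With an assumed 5 ms "vits" baseline, "vitl" lands around 100 ms per frame,
# well past a 60 Hz desktop's ~16.7 ms budget.
```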

Finally, Frame Offset lets you delay the desktop capture so it lines up with the depth estimation. Say it takes 10ms to capture your desktop and 50ms to compute depth from that capture: the depth estimation will then be 40ms behind, so offsetting the capture can reduce the depth ghosting you may see. It is mostly useful if performance is very poor; leave it at 0 unless you see the 3D effect lagging behind the actual image, since it adds lag to the desktop capture.
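Working through those numbers: capture 10 ms, depth 50 ms, so depth lags by 40 ms; at 60 FPS (~16.7 ms per frame) that is about 2 frames of offset. The helper below is illustrative only, not Deep Desktop's internals:

```python
# Worked version of the example above. Illustrative only; the function name
# and the 60 FPS default are assumptions, not Deep Desktop's actual logic.
def suggested_frame_offset(capture_ms: float, depth_ms: float,
                           fps: float = 60.0) -> int:
    """Whole frames of capture delay needed to realign capture and depth."""
    lag_ms = max(0.0, depth_ms - capture_ms)   # how far depth trails capture
    return round(lag_ms * fps / 1000.0)        # convert ms of lag to frames

# suggested_frame_offset(10, 50) -> 40 ms lag / ~16.7 ms per frame -> 2
```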

Hopefully that answers all your questions :)
Duraexim Jun 25, 2024 @ 5:26pm 
Thanks for the answer, it was very helpful :D
Regards!