AutoDepth Image Viewer

Miraz Jun 17, 2024 @ 10:56pm
A Few Questions On Proper Usage
What is the proper way to give videos depth without borders and the background flickering like crazy? Is there a way to improve the accuracy, too? Does it use Depth Anything v2 for video, or was that a different model called Zero? If it's a different model, can we swap it for Anything v2? And is there any way to get it to run in real time with video processing, even on a high-end PC, or can we only pre-process first? I assume real-time processing would have lower accuracy, though.

Suggestion about video depth: please create an in-game UI for this rather than a separately launched application, and please let us customize it or move it to the side so we can easily check progress while looking primarily at what is in front of us.

I feel like Depth Anything v2 is a bit less accurate until I reset all settings to default. It keeps jumping to far-away, skewed results every time I swap an image or video, and every single time I have to click the circular arrow to reset to defaults, which usually looks best. Any fix for this? The defaults aren't always completely accurate either, and for some photos it feels less accurate than the prior version you just updated from, requiring manual adjustment.

Mouse movement within the app is horribly slow, as if there is some kind of very low mouse sensitivity, even on a high-DPI mouse. Joysticks are better, but without code to ignore slight jittery movements, the buttons often aren't very responsive unless you hold the VR controller nearly perfectly still.

Every time you open a folder it gets added to the playlist, even though it is a sub-folder of one already on that list... This isn't ideal.

So far I am extremely impressed, especially by its use on anime, which I was absolutely not expecting and just randomly tested. This could be game-changing for anime over the next couple of years, whether through your product or similar tech. Even the anime surprise aside, I did not expect this to be as good as it is in general.
Showing 1-6 of 6 comments
Bolt  [developer] Jun 18, 2024 @ 3:30pm 
This is a lot, but I'll try to touch on everything. And I appreciate the interest and feedback.

Firstly, I want to say that the video stuff is cool, but in the monocular depth generation field right now, video isn't really a solved problem; temporal consistency is a big challenge. In other words, some flickering is expected because the technology isn't really there yet. But I try to update things as improvements in the research are made.
Though it should be noted that I am not an AI researcher, so I'm not the one who made the depth generation tool itself, just everything around it that brings it to an easy-to-use form with (hopefully) many useful and fun extra features.

As for some tips: content matters a lot for video. Decently close shots of the subject with a mostly stable camera will work best. And of course you can adjust the interpolation to trade flickering for laggier depth.
The bigger models will also be more stable.
I'd be interested in seeing what your results are like and what kinds of things are giving you trouble, like what exactly you mean by "flickering like crazy".
In my experience most things I try work well enough to be enjoyable without really thinking much about the flickering.
You can message me on discord if you wanna share and discuss stuff further.
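As a rough illustration of that interpolation tradeoff, temporal smoothing of depth maps can be sketched as an exponential moving average; this is a hypothetical sketch of the idea, not the app's actual interpolation code:

```python
import numpy as np

def smooth_depth(frames, alpha=0.3):
    """Blend each new depth map with a running average.

    Lower alpha suppresses flicker more but makes the depth lag
    behind the video; higher alpha is more responsive but flickery.
    (Illustrative only; the app's real interpolation may differ.)
    """
    smoothed, ema = [], None
    for depth in frames:
        depth = depth.astype(np.float32)
        ema = depth if ema is None else alpha * depth + (1 - alpha) * ema
        smoothed.append(ema.copy())
    return smoothed

# A static scene where one frame's depth spikes (flicker):
frames = [np.full((2, 2), 1.0), np.full((2, 2), 5.0), np.full((2, 2), 1.0)]
out = smooth_depth(frames)  # the spike frame gets pulled toward the average
```

With `alpha=0.3`, the spiked frame comes out around 2.2 instead of 5.0, so the flicker is damped at the cost of the smoothed depth trailing real changes.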

I understand what you mean about the video tool, but it exists as a separate application to more easily allow batch processing videos in the background. It was also created before I added the realtime video depth feature to the app itself.
Honestly, while the realtime video isn't quite the same quality and usually can't generate without skipping frames, it works well enough, and not having to preprocess things is so much more convenient that I don't even use the preprocessor anymore.
I think I'd rather just focus on that than add more preprocessing ability to the program itself.
If you didn't know about the realtime feature, it can be enabled from the toolbox menu. You'll also need to enable the "allow animated formats" option in the settings.

I already discussed the issues with depth adjustment in the other thread so I won't reiterate that here. But it should be fixed soon.

The mouse thing seems really odd though. Was it always like that, or is this new?
It's supposed to be just directly based on the mouse position on the window, which would just be given from Unity so I have no clue how that could happen.
When it works, the mouse is my preferred method of interacting as the controllers are always just cumbersome and I know the implementation is somewhat half-assed.
I might try to improve it slightly by adding your suggestion of ignoring movement when the trigger starts being pressed, similar to what the SteamVR mouse cursor does.
But as for the mouse slowness, more info would be useful; maybe you could share a recording of the weird behaviour and any other details of your setup.
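A jitter filter along those lines could look something like this sketch: freeze the cursor for a few frames once the trigger goes down, so the press lands where the user aimed. The function and its parameters are purely illustrative, not the app's input handling:

```python
def stabilize_clicks(samples, hold_frames=3):
    """Hold the cursor position for a few frames after the trigger
    is pressed, so slight controller jitter during the press doesn't
    drag the click off its target. `samples` is a list of
    (position, trigger_down) pairs; returns the filtered positions.
    (Hypothetical sketch, similar in spirit to the SteamVR cursor.)
    """
    out = []
    frozen_pos, freeze_left, prev_trigger = None, 0, False
    for pos, trigger in samples:
        if trigger and not prev_trigger:   # trigger just went down
            frozen_pos, freeze_left = pos, hold_frames
        if freeze_left > 0:
            out.append(frozen_pos)         # hold the press position
            freeze_left -= 1
        else:
            out.append(pos)                # normal tracking
        prev_trigger = trigger
    return out

# Jitter during a press: the cursor drifts from (1,1) to (9,9)
# while the trigger is held, but the click stays pinned at (1,1).
samples = [((0, 0), False), ((1, 1), True), ((5, 5), True),
           ((9, 9), True), ((9, 9), False)]
filtered = stabilize_clicks(samples)
```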

I'm not really sure what you mean about opening a folder getting added to the playlist; can you elaborate on the issue, and maybe I can improve it.

And yeah, I was kinda surprised seeing how well DA v2 handles animated styles compared to anything I've seen before. It kinda just works now, which is cool.

Lotta typing, so little time to exist in this world.
Existentialism aside, I hope it was helpful.
Again thank you for your understanding and feedback.
It really makes things less stressful when people are understanding of issues and not just screaming for things to be fixed.
Admittedly, updates should be tested more before being made public, but that's difficult without a group of people to test things. I also get caught up in the excitement of a new model and just want to get it out there for people to use and appreciate, which is pretty dangerous considering that's the most likely thing to cause issues.
Anyway I'm just rambling now, so I will leave it here and try and fix the most pressing issue of the depth scale being off with default settings.
Bolt  [developer] Jun 19, 2024 @ 3:39pm 
I'm just going to respond to your other post here to keep the threads clean.

But it's definitely supposed to use the GPU; no idea why it wouldn't be.
That would explain a lot about the performance issues with videos you mentioned.
How did you notice it was using the CPU? The best way to tell is when the depth gen starts: it'll say "device: [device]".
There's really no reason it should ever fall back, though. Even if CUDA is unavailable for some reason, it should use DirectML, which would show something like "device: privateuse:0".
So something would have to be going very wrong for it to error out of both of those and use the CPU only.
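The fallback order described here (CUDA, then DirectML, then CPU) can be sketched as a small selection helper. The helper and the device strings are illustrative of the order, not the app's actual code:

```python
def pick_device(cuda_available, directml_available):
    """Pick a depth-generation device in the fallback order described:
    CUDA first, then DirectML (which the app reports under a
    'privateuse' device name), and CPU only if both are unavailable.
    (Hypothetical sketch; availability flags stand in for real probes.)
    """
    if cuda_available:
        return "cuda"
    if directml_available:
        return "privateuse:0"  # DirectML shows up under this name
    return "cpu"               # last resort; should basically never happen

# On a machine without CUDA but with DirectML:
device = pick_device(False, True)
```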

Anyway, there are a lot of factors at play here.
A big part of it is just that video player support in Unity is pretty bad, so things like seeking don't always behave well. Support for various codecs is also lacking.
Some .webm files will work and others won't. I can try adding the other formats you mentioned to the list of files to detect, but some might not be supported by the Unity player.
GIFs can't do realtime depth because they have to be rendered in their own unique way, and I don't think it's worth spending resources to make them work with the video depth system.
Keeping things synchronized with the depth frames is also a challenge, and the seeking issues can make it even worse.

Another thing worth mentioning with realtime videos, assuming the GPU is being used, is that rendering the app itself also uses the GPU and can take a decent amount of resources, meaning the depth gen has to compete with the app's rendering. Having spare GPU compute is helpful here, but with a 4090 I wouldn't expect it to be much of an issue, unless certain settings are cranked up, like substeps and Depth+. The larger the image is on screen, the more expensive it is to render as well.
All that being said, the depth gen for videos currently can't make full use of the GPU, as there are bottlenecks preventing it from going as fast as it could. I hope to improve this somewhat, though I'm not sure how much can be done.

But yeah idk, stuff is tough.
If there's really an issue with GPU acceleration on the depth side we'll have to investigate that one deeper.
Last edited by Bolt; Jun 19, 2024 @ 4:03pm
Miraz Jun 19, 2024 @ 10:36pm 
Originally posted by Bolt:
I'm just going to respond to your other post here to keep the threads clean.

But it's definitely supposed to use the GPU; no idea why it wouldn't be.
That would explain a lot about the performance issues with videos you mentioned.
How did you notice it was using the CPU?
To be clear, in case the photo algorithm or the realtime function behave differently (which I've not verified yet), this pertains to the pre-processing, since I decided to let it process several videos in the background while I busied myself.

I noticed because, while the pre-processor was running, my RTX 4090 was at a mere 9% GPU usage. Meanwhile, my Ryzen 3950X was at 25% (thread support appeared limited, with a few nearly capped threads and the rest silent) instead of virtually idle, and it remained that way each time I checked over a period.

As for the realtime lag I noticed with the 4K video, I'll test that one specifically later when I'm more free, in case the realtime algorithm functions differently from that Zero pre-processor in utilization.

Originally posted by Bolt:
The best way to tell is when the depth gen starts it'll say "device: [device]"
There's really no reason it should ever fall back, though. Even if CUDA is unavailable for some reason, it should use DirectML, which would show something like "device: privateuse:0".
So something would have to be going very wrong for it to error out of both of those and use the CPU only.

Anyway, there's a lot of factors at play here.
A big part of it is just that the video player support in Unity is pretty bad, so things like seeking don't behave that well sometimes.
In case it helps, I only noticed seeking issues when using realtime. I didn't notice them on already pre-processed videos without realtime (which I've tested more, and with more videos, so it isn't sample bias unless it's pure luck).

Originally posted by Bolt:
As well as support for various codecs is lacking.
Some .webm files will work and others won't. I can try adding the other formats you mentioned to the list of files to detect, but some might not be supported by the Unity player.
Fair enough. Here's hoping, and if there are any you can't add, it might be worth mentioning them on the store page so we know which ones to convert.

Originally posted by Bolt:
GIFs can't do realtime depth because they have to be rendered in their own unique way, and I don't think it's worth spending resources to make them work with the video depth system. Keeping things synchronized with the depth frames is also a challenge, and the seeking issues can make it even worse.
Understood.

Originally posted by Bolt:
Another thing worth mentioning with realtime videos, assuming the GPU is being used, is that rendering the app itself also uses the GPU and can take a decent amount of resources, meaning the depth gen has to compete with the app's rendering. Having spare GPU compute is helpful here, but with a 4090 I wouldn't expect it to be much of an issue, unless certain settings are cranked up, like substeps and Depth+. The larger the image is on screen, the more expensive it is to render as well.
Depth+ was on, though substeps was still at 8, IIRC. I'll consider comparing without it later to see if it makes a difference.

Originally posted by Bolt:
All that being said, the depth gen for videos currently can't make full use of the GPU, as there are bottlenecks preventing it from going as fast as it could. I hope to improve this somewhat, though I'm not sure how much can be done.

But yeah idk, stuff is tough.
If there's really an issue with GPU acceleration on the depth side we'll have to investigate that one deeper.
Thanks for the response. It's good to see an active dev who listens and takes feedback seriously.
Last edited by Miraz; Jun 19, 2024 @ 10:37pm
Bolt  [developer] Jun 20, 2024 @ 8:28pm 
Hmm, I don't think I added a way to check directly which device is used in preprocessing.
If I had to guess, it probably is using the GPU, but your GPU is just so beefy that it's bottlenecked by the CPU-side processing lmao.
All the frame post-processing and video stitching happens CPU-side and can only go so fast; it's definitely possible the GPU is just spitting out a frame and then waiting around for the next one. It's a difficult process to optimize, with a lot going on, and I didn't think it was worth trying too hard since it can just churn away in the background.
But I think if it was using the CPU, it would be using way more of your cores and be pretty slow.

As for the realtime stuff, I meant to mention that framerate currently also matters a lot, since the setup can't generate frames that fast. I've implemented a frameskip system which, annoyingly, has to buffer frames, and if it runs out of frames to play, it increases the frameskip amount and tries again. That leads to stuttery playback at first, especially if the video is 60+ fps and it has to do this four or so times.
With a 4090 though I really doubt the rendering settings like 8 substeps would be holding you back.
I barely even run into that on my 4070.
For reference, I get probably a little over 15 depth frames per second generating depth for realtime video using the normal setting at around 420 resolution. Lowering the setting further doesn't really help, because the bottleneck is in frame preparation, not generation.
It'd be nice to be able to do like 30fps.
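The escalate-on-underrun frameskip behaviour described above can be sketched roughly like this; the function and numbers are hypothetical, just illustrating how each retry (a stutter) bumps the skip until the generator keeps up:

```python
def settle_frameskip(video_fps, depth_fps, max_tries=10):
    """Increase the frameskip until the depth generator can keep up.

    Starting with no skipping, each buffer underrun (the generator
    can't supply depth frames as fast as the skipped video stream
    needs them) bumps the skip amount and we try again; each retry
    shows up as a stutter. Returns (final_skip, stutters).
    (Illustrative sketch, not the app's actual frameskip code.)
    """
    skip = 0
    for tries in range(max_tries):
        effective_fps = video_fps / (skip + 1)  # frames needing depth/sec
        if depth_fps >= effective_fps:
            return skip, tries  # settled after `tries` stutters
        skip += 1
    return skip, max_tries

# A 60 fps video with a generator doing ~15 depth frames/sec
# has to escalate several times before playback stabilizes:
skip, stutters = settle_frameskip(60, 15)
```

This matches the behaviour described: a high-fps video forces several rounds of buffering and re-skipping before things settle, which is why the start looks stuttery.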

Not sure if 4K would matter much. It might make Unity chug more and put a bit of strain on the GPU on that end, but for a high-end PC I don't think it'd matter, and the depth gen scales everything down before working on it so it doesn't implode.
But it's possible something somewhere doesn't like it, I guess. I tested a few things and it didn't seem too bad.
I've never had it crash, though, which is odd.
Mostly I just get the occasional janky depth desync when seeking, and sometimes the play button gets stuck or something, but nothing that pausing and unpausing doesn't fix.

Also worth noting about the dumb Unity video player: it uses the Windows HEVC Video Extensions. Annoyingly, and rather bafflingly, Microsoft tries to sell this for a dollar on the Microsoft Store, which is one of the pettiest things I've ever seen a corporation do, especially considering you can google the extension and download it for free very easily. Surely it can't be a money maker, and it garners so much ill will that I simply cannot comprehend it, but I digress.
But with that installed, if you don't already have it, you should get support for a couple more codecs, like .mp4 files that use HEVC/H.265.
Also, I added .avi and .mov to the list, but .mkv didn't work with the Unity player so I left it off.

I will say videos still aren't the main focus; I just think they're cool, so I built tools around them and let others use them in the program if they want.
Also, I can't really work on this project a ton, as while the sales are a nice bonus, they don't really pay the rent 😅
At least not in this increasingly messed up country.
I try to implement new depth models when it makes sense, prioritize that, and fix pressing issues and the like. But v2 is such an improvement in quality for the same performance that it's fairly exciting, so I've been trying to put a bit more free time in and want to improve a few things.
I guess the point is I don't really have the resources to overhaul the video system or do many things like that.
At the very least I try to respond to everything.
I was never known to be a concise individual though.

Anyway, again I appreciate your understanding and information.
Last edited by Bolt; Jun 20, 2024 @ 9:42pm
PizzaSlice Jun 30, 2024 @ 11:15pm 
Just want to add, since this is like a general thread: with realtime videos, it helps during playback seeking to pause and play so the depth restarts and realigns. Can this be automated? When seeking directly without pausing, it seems to still be playing the old depth from before the seek.

It could be related to my GPU (2060 Super), since it's a bit slower and doesn't respond to seeking well. But doing the pause-and-play thing after seeking definitely helps and restarts the generation for me.

The video stuff was definitely a selling point for me. I didn't even know it could do video until I saw the Reddit thread mentioning that capability. This stuff is really promising, way better than standard side-by-side 2D content when it works.
Bolt  [developer] Jul 1, 2024 @ 12:44am 
Trust me, I've tried a billion things to get it to behave, but the problem is mostly the jankiness of Unity's video player when it comes to seeking.
It might help if I add some kind of delay before playback resumes after seeking, but that's really a bandaid solution that probably won't always work.
I find the easiest way is to hold the seek in the same spot for a couple of seconds before releasing, though that's a lot easier with the mouse.
I do plan to try and improve some stuff for realtime video a bit so maybe I'll try out a few more things.