Firstly I want to say that the video stuff is cool but in terms of the monocular depth generation field right now, video isn't really a solved problem as temporal consistency is a big challenge. In other words, some flickering is expected because the technology isn't really there yet. But I try to update things as improvements in the research are made.
Though it should be noted that I am not an AI researcher, so I'm not the one who made the depth generation tool itself. I just built everything around it, bringing it to an easy-to-use form with (hopefully) many useful and fun extra features.
As for some tips, content matters a lot for video: decently close shots of the subject with a mostly stable camera will work best. And of course you can adjust the interpolation to trade flickering for laggier depth.
Also, the bigger models will be more stable.
I'd be interested in seeing what your results are like and what kind of things are giving you trouble, like what exactly you mean by 'flickering like crazy'.
In my experience most things I try work well enough to be enjoyable without really thinking much about the flickering.
You can message me on discord if you wanna share and discuss stuff further.
I understand what you mean with the video tool, but it exists the way it does to more easily allow batch processing videos in the background. It was also created before I added the realtime video depth feature to the app itself.
Honestly, while the realtime video isn't quite the same quality and usually can't generate without skipping frames, it works well enough, and not having to preprocess things is so much more convenient that I don't even use the preprocessor anymore.
I think I'd rather just focus on that rather than adding in more preprocessing ability to the program itself.
If you didn't know about the realtime feature, it can be enabled from the toolbox menu. You'll also need to enable the "allow animated formats" option in the settings.
I already discussed the issues with depth adjustment in the other thread so I won't reiterate that here. But it should be fixed soon.
The mouse thing seems really odd though. Was it always like that, or is this new?
It's supposed to be just directly based on the mouse position on the window, which would just be given from Unity so I have no clue how that could happen.
When it works, the mouse is my preferred method of interacting as the controllers are always just cumbersome and I know the implementation is somewhat half-assed.
I might try to improve it slightly by adding the suggestion of ignoring movement when the trigger starts being pressed, similar to what the SteamVR mouse cursor does.
But as for the mouse issue, more info would be useful; maybe you could share a recording of the weird behaviour and any other details of your setup.
I'm not really sure what you mean by opening a folder getting added to the playlist; can you elaborate on the issue here? Maybe I can improve it.
And yeah, I was kinda surprised seeing how well DA V2 handles animated styles compared to anything I've seen before. It kinda just #works now, which is cool.
Lotta typing, so little time to exist in this world.
Existentialism aside, I hope it was helpful.
Again thank you for your understanding and feedback.
It really makes things less stressful when people are understanding of issues and not just screaming for things to be fixed.
Admittedly, updates should be tested more before being made public, but that's difficult without a group of people to test things. It's also easy to get caught up in the excitement of a new model and just want to get it out there for people to use and appreciate, but that's pretty dangerous considering it's the most likely thing to cause issues.
Anyway I'm just rambling now, so I will leave it here and try and fix the most pressing issue of the depth scale being off with default settings.
But it's definitely supposed to use the GPU. No idea why it wouldn't be.
That would explain a lot about the issues of performance and stuff with videos you mentioned.
How did you notice that it was using the CPU? The best way to tell is that when the depth gen starts, it'll say "device: [device]".
There's really no reason it should ever do that though; even if CUDA is unavailable for some reason, it should fall back to DirectML, which would show something like "device: privateuse:0".
So I think something would have to be going very wrong for it to error out of both of those and use the CPU only.
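The fallback order described there (CUDA, then DirectML, then CPU as a last resort) can be sketched roughly like this. This is purely an illustrative sketch, not the app's actual code; the function name and flags are hypothetical, and the real check would query the installed torch backends.

```python
# Illustrative sketch of a CUDA -> DirectML -> CPU fallback.
# The real app would query torch / torch-directml for availability;
# here the availability flags are passed in directly for clarity.

def pick_device(cuda_available: bool, directml_available: bool) -> str:
    """Return the device string the depth generator would report."""
    if cuda_available:
        return "cuda"
    if directml_available:
        # DirectML devices surface under the "privateuse" backend name
        return "privateuse:0"
    # Reaching here means both GPU paths failed, which should be rare
    return "cpu"

print(f"device: {pick_device(True, True)}")    # device: cuda
print(f"device: {pick_device(False, True)}")   # device: privateuse:0
print(f"device: {pick_device(False, False)}")  # device: cpu
```

The point being: CPU should only ever appear if both GPU paths error out, which is why seeing it would indicate something deeper is wrong.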
Anyway, there's a lot of factors at play here.
A big part of it is just that video player support in Unity is pretty bad, so things like seeking don't always behave well. Support for various codecs is also lacking.
Some webm files will work and others won't. I can try adding the other formats you mentioned to the list of files to detect, but some might not have support in the Unity player.
GIFs can't do realtime depth because they have to be rendered in their own unique way, and I don't think it's worth spending resources to make them work with the video depth system.
Keeping things synchronized with the depth frames is also just a challenge and due to the issues with seeking it can be made even worse.
Another thing worth mentioning with realtime videos, assuming the GPU is being used: rendering the app itself also uses the GPU and can take a decent amount of resources, meaning the depth gen has to compete with the rendering of the app. Having spare GPU compute helps here, but with a 4090 I wouldn't expect it to be much of an issue, unless certain settings are cranked up, like substeps and use of depth+. The larger the image is on screen, the more expensive it is to render as well.
All that being said, the depth gen for videos currently can't make full use of the GPU, as there are bottlenecks preventing it from going as fast as it could. I hope to improve this somewhat, though I'm not sure how much can be done.
But yeah idk, stuff is tough.
If there's really an issue with GPU acceleration on the depth side we'll have to investigate that one deeper.
I noticed because while the pre-processor was running, my RTX 4090 was at a mere 9% GPU usage. Meanwhile, my Ryzen 3950X was at 25% (it appeared thread-limited, with a few nearly capped threads and the rest silent) instead of virtually idle, and it stayed that way each time I checked over a period.
For the realtime lag I noticed with the 4K video, I'll test that one specifically later when I'm more free, in case the realtime algorithm functions differently from that Zero pre-processor in utilization.
In case it helps, I only noticed the seeking behavior when using realtime. I didn't notice it when playing already pre-processed videos without realtime (which I've tested more than realtime, and with more videos, so it isn't a bias unless it's just pure luck).
Fair enough. Here's to hoping, and if there are any you can't add, it might be worth mentioning them on the store page so we know which ones to convert.
Understood.
Depth+ was on, though substeps was still at 8 iirc. Will consider comparing without and see if it makes a difference later.
Thanks for the response. It is good to see an active dev that listens and takes actual feedback seriously.
If I had to guess, it probably is using the gpu, but your gpu is just so beefy that it's way more bottlenecked by the cpu processing lmao.
All the frame post processing and video stitching happens cpu side and can only go so fast, and it's definitely possible that the gpu is just spitting out a frame and then waiting around for the next one to process. It's a difficult process to optimize with a lot going on and I didn't think it was worth trying too hard on it as it can just churn away in the background.
But I think if it was using cpu, it would be using way more of your cores and be pretty slow.
As for the realtime stuff, I meant to mention that framerate currently also matters a lot, since the setup can't generate frames that fast. I've implemented a frameskip system which, annoyingly, has to buffer frames, then increase the frameskip amount if it runs out of frames to play, and try again. This leads to stuttery playback at first, especially if the video is 60+ fps and it has to do this like 4 times.
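The escalation behaviour described there can be sketched as a simple loop: keep doubling the skip until the generator can keep up with the effective playback rate. This is a hypothetical simplification of the idea, not the app's implementation; the function name and the doubling step are assumptions.

```python
# Sketch of an adaptive frameskip: if the generator can't produce depth
# frames as fast as playback consumes them, raise the skip and retry.
# Each retry corresponds to a rebuffer, which is what causes the
# stuttery start described above.

def settle_frameskip(video_fps: float, gen_fps: float, start_skip: int = 1) -> int:
    """Return the frameskip at which depth generation keeps up."""
    skip = start_skip
    # With a skip of N, only every Nth frame needs a depth frame,
    # so the required generation rate is video_fps / skip.
    while gen_fps < video_fps / skip:
        skip *= 2  # each escalation is one stuttery rebuffer pass
    return skip

# e.g. a 60 fps video with ~15 depth frames/sec settles at a skip of 4,
# after two escalations (and two visible stutters)
print(settle_frameskip(60, 15))  # 4
```

This also shows why high-fps videos are worse: a 60 fps source needs more escalation passes than a 24 fps one before the skip settles.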
With a 4090 though I really doubt the rendering settings like 8 substeps would be holding you back.
I barely even run into that on my 4070.
For reference, I get probably a little over 15 depth frames a second for realtime video using the normal setting at like 420 resolution. Lowering the setting any more than that doesn't really help, because the bottleneck is in frame preparation, not generation.
It'd be nice to be able to do like 30fps.
Not sure if 4K would matter much; it might make Unity chug more and put a bit of strain on the GPU on that end, but for a high-end PC I don't think it'd matter, and the depth gen scales everything down before it works on it so it doesn't implode.
But it's possible something somewhere doesn't like it, I guess. Though I tested a few things and it didn't seem too bad.
Never had it crash though, which is odd.
Mostly I just get the janky occasional depth desync when seeking, and sometimes the play button gets stuck or something but nothing that pausing and unpausing doesn't fix.
Also worth noting about the dumb Unity video player: it uses the Windows HEVC Video Extensions thing. Annoyingly, and rather bafflingly, Microsoft tries to sell this for a dollar on the Microsoft Store, which is one of the pettiest things I've ever seen a corporation do, especially considering you can google the extension and download it for free very easily. Surely it can't be a money maker, and it garners so much ill will that I simply cannot comprehend it, but I digress.
But installing that, if you don't already have it, should add support for a couple more codecs, like some .mp4 files that use HEVC/H.265.
I also added .avi and .mov to the list, but .mkv didn't work with the Unity player so I left it off.
I will say videos still aren't the main focus, I just think it's cool so I built tools around it and allow others to use them for the program if they want.
And also that I can't really work on this project a ton as while the sales are a nice bonus, they don't really pay the rent 😅
At least not in this increasingly messed up country.
I try to implement new depth models when it makes sense, prioritizing that along with fixing pressing issues and stuff like that. But V2 is such an improvement in quality for the same performance that it's fairly exciting, so I've been trying to put a bit more free time into it and want to improve a few things.
I guess the point is I don't really have the resources to do an overhaul of the video system or many things like that.
At the very least I try to respond to everything.
I was never known to be a concise individual though.
Anyway, again I appreciate your understanding and information.
It could be related to my gpu (2060 super) since it's a bit slower and doesn't respond to seeking well. But doing the pause and play thing after seeking definitely helps and restarts the generation for me.
The video stuff was definitely a selling point for me. I didn't even know it could do video until I saw the reddit thread mentioning video capability. This stuff is really promising, way better than the standard side-by-side 2D content when it works.
It might help if I add some kind of delay before playback resumes after seeking, but it's really a bandaid solution that probably won't always work.
I find that the easiest way is to hold it down when seeking in the same spot for a couple seconds before releasing, though this is a lot easier when using the mouse.
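The delay-after-seek bandaid could be sketched as a small debounce: only restart depth generation once no new seek event has arrived for a short settle window. Everything here is a hypothetical illustration of the idea, not the app's code; the class name and the 0.5 s window are assumptions.

```python
# Debounce sketch: holding the seek in one spot for a moment (as the
# tip above suggests doing manually) is exactly what this automates.

class SeekDebouncer:
    def __init__(self, settle_time: float = 0.5):
        self.settle_time = settle_time  # seconds of idle scrub before resuming
        self.last_seek = None

    def on_seek(self, now: float):
        """Record the timestamp of the latest seek/scrub event."""
        self.last_seek = now

    def can_resume(self, now: float) -> bool:
        """True once the scrub has been idle long enough to restart depth gen."""
        if self.last_seek is None:
            return True
        return (now - self.last_seek) >= self.settle_time

d = SeekDebouncer(settle_time=0.5)
d.on_seek(10.0)
print(d.can_resume(10.2))  # False: user is still scrubbing
print(d.can_resume(10.6))  # True: settled, safe to resume playback
```

It "probably won't always work" because a fixed window is a guess: too short and rapid scrubs still desync, too long and every seek feels laggy.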
I do plan to try and improve some stuff for realtime video a bit so maybe I'll try out a few more things.