So that's basically all the things we did. But I think the huge advantage comes from the sub-second response. And that lesson hit me so hard.
And when we designed the OS, sub-second was always our goal. But if you look at reality right now — and I'm sure a lot of people have posted demos on Twitter and Discord — when I demonstrate playing a song, that's on par with sub-second latency. In reality, though, web search, looking up up-to-date information, and some of the vision features — like what GPT-4 vision provides, and our own vision features — sometimes take two seconds, sometimes more than that.
Vision, the first time I showed it, probably took about 15 seconds. We have another version now that takes about seven seconds. Those are things I'm not happy with.
But I think latency shouldn't be judged feature by feature. Latency should be a universal bar for natural-language interaction with any device. And I remember I read a paper — I can probably find it later and post it on my Twitter.
There's research on the human brain's ability to understand and process natural language, speed-wise. Different languages can differ quite a lot here. I think Japanese, along with a couple of other languages, is actually processed faster because of how the language is structured.
But I think 500 milliseconds should be the golden bar. I did a couple of tests internally, and 500 milliseconds for a natural-language voice response is the golden bar.
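A bar like this is easy to instrument. Below is a minimal sketch of a latency probe that times an end-to-end response and checks it against a 500 ms budget; the pipeline function here is a hypothetical stand-in, not the speaker's actual stack.

```python
import time

# The "golden bar" discussed above: 500 ms for a voice response.
LATENCY_BUDGET_MS = 500

def measure_latency_ms(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed wall-clock milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

def within_budget(elapsed_ms, budget_ms=LATENCY_BUDGET_MS):
    """True if the measured latency meets the budget."""
    return elapsed_ms <= budget_ms

# Hypothetical stand-in for a full voice-response pipeline.
def fake_voice_response():
    time.sleep(0.05)  # simulate 50 ms of processing
    return "ok"

if __name__ == "__main__":
    result, ms = measure_latency_ms(fake_voice_response)
    print(f"response={result!r} latency={ms:.0f} ms "
          f"within_budget={within_budget(ms)}")
```

In practice you would run this probe per feature (music playback, web search, vision) and track each one against the same universal budget rather than setting a separate bar per feature.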