Why can't I use ASIO and VST plugins in teleconference apps like Zoom?

First of all, well hello! :laughing:

So as some of you may know, I have been using Cubase for 20+ years, and have been on the forums for about 20, though less in recent years than from 1998-2015. Now I find myself on these Zoom calls with coworkers, and the sound is better than WebEx, but it's still kinda terrible.

These apps deploy sloppy noise gating, built in “echo cancellation” (which seems essentially to be a feedback suppressor–non-optional half-duplex ducker mixed with bad-sounding downsampling) and then of course, none of these boring business apps implement the ASIO standard, so even if you have a good card that can give you 1.5ms or 3ms of latency, you can't even use it in these kinds of apps. And of course, you can't use VST compatible plugins in them either.

Part of the problem is that I just want to torture my work colleagues with insert channels packed with all the UAD, Waves, VST native and other plugins Iā€™ve bought over the years.

But also my wife is a singer who loves doing duets, and she is trying to do remote karaoke duets with friends on Zoom, and due to BAD default WDM audio drivers as well as inevitable network latencies through VOIP servers, the lag is just bad there as well.

Why is there no P2P VOIP, or is there? If 2 people in the same city have Gigabit internet, I would think they should be able to get < 12-24ms latency, when you add the innate DSP latency of an ASIO-capable teleconference, and then maybe 6-12ms latency across the cloud.
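For anyone who wants to sanity-check those numbers, here is a rough back-of-the-envelope sketch in Python. It assumes the latency of one audio buffer is simply buffer size divided by sample rate, counts one capture buffer on the sending side and one playback buffer on the receiving side, and adds whatever network figure you supply. The function names and the 128-sample / 44.1 kHz / 12 ms values are illustrative assumptions, not measurements, and the sketch ignores codec delay, jitter buffers and converter latency, so real-world results would be higher.

```python
def buffer_latency_ms(buffer_frames: int, sample_rate_hz: int) -> float:
    """Time it takes to fill (or drain) one audio buffer, in milliseconds."""
    return 1000.0 * buffer_frames / sample_rate_hz


def one_way_budget_ms(buffer_frames: int, sample_rate_hz: int, network_ms: float) -> float:
    """Very rough one-way latency: capture buffer + network transit + playback buffer.

    Assumption: both ends use the same buffer size and sample rate, and
    nothing else (codec, jitter buffer, converters) adds delay.
    """
    per_buffer = buffer_latency_ms(buffer_frames, sample_rate_hz)
    return 2 * per_buffer + network_ms


if __name__ == "__main__":
    # 64 and 128 frames at 44.1 kHz land close to the 1.5 ms / 3 ms figures above.
    for frames in (64, 128):
        print(frames, "frames:", round(buffer_latency_ms(frames, 44100), 2), "ms")
    # Hypothetical 12 ms network transit with 128-frame buffers at each end.
    print("one-way budget:", round(one_way_budget_ms(128, 44100, 12.0), 1), "ms")
```

With those assumed inputs the total stays inside the 12-24ms range, which is why the driver side looks like the more fixable half of the problem.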

It might be overkill but can something like VST Connect be used for fun stuff like karaoke? Sorry pretty OOTL on audio products lately. Not much music, tons of day job.

Hey br0d!

I can't help you with your questions but I can commiserate. I'm on MS Teams a lot and it's a similar deal. Not sure you can just monkey with the audio because it's often packaged with video.

On a completely unrelated note - I still have and listen to your album with the lions on the front. Great synth work. What are you up to these days? Have you released any more music?

For a while, my path took me to remote recording. I was recording 20+ events a year in addition to doing my day job. However, this has come to a screeching halt with the COVID-19 virus shutting everything down. Sucks, as this happened just as I was about to do a nice jazz big band recording that I really wanted to do. Oh well.

It's good to see you on this forum. Take care.

Tom

I'm certainly no Cubase expert, but having been a VoIP and video over IP engineer for the last 20 years I think I can help explain the core issues behind why Zoom etc. can't deliver what you expect. Leaving the question of codec selection and configuration aside (which would determine things like audio bandwidth, latency and error correction), the main issue you face is your local firewall and the NAT function built into your Internet router.
For a standard SIP p2p call to work, both ends must be able to send media streams directly to each other. To do this, each endpoint opens dynamically assigned network ports on its own IP address and tells the other end of the call where to send the media (in other words, each end only controls its receiving ports). The ports that are assigned can be anything between 1024 and 65535. If both endpoints are on the same network this is fine, because there are no network restrictions between the two devices (firewall, NAT etc.), so they can talk freely to each other. However, firewalls and NAT both break this connectivity because they are not privy to the conversation between the endpoints, so they have no knowledge of how the call is set up and what dynamic ports are in use.
Home router firewalls typically allow all outbound traffic, as the threat is perceived to be from the outside in, not the inside out. Therefore the endpoint has no problem sending the media stream to the far end. However, when that traffic arrives at the other end's firewall it is treated as unsolicited traffic and is discarded. This means that both clients can send easily enough but not receive.
The way services like Zoom get around this is to have the client register with the service on the internet (when you log in). Then, when a call is set up, the client establishes twice the number of required outbound UDP ports, and the server on the other end sends the received media back down the additional ports. For this to work, both endpoints have to be registered with the cloud service, so there is no way to avoid routing the media streams through the intermediate service.
A very long-winded explanation, but hopefully you get the idea.
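To make the "dynamically assigned ports" part concrete, here is a minimal Python sketch using only the standard `socket` module. It opens two UDP sockets on the loopback interface, lets the OS pick the ports, and has one side send a fake media packet to the other. The function name and payload are made up for illustration, and the "signalling" step is just a local variable hand-off rather than real SIP. It works here only because there is no firewall or NAT between the two sockets, which is exactly the same-network case described above.

```python
import socket


def loopback_media_demo():
    """Send one fake media packet between two UDP sockets on 127.0.0.1."""
    a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the OS to assign a free port dynamically.
        a.bind(("127.0.0.1", 0))
        b.bind(("127.0.0.1", 0))
        b.settimeout(2.0)

        a_addr = a.getsockname()
        b_addr = b.getsockname()

        # Stand-in for signalling: A has been told where B wants to receive.
        a.sendto(b"audio frame from A", b_addr)

        # B only learns A's sending address when the packet arrives.
        payload, sender = b.recvfrom(2048)
        return payload, sender, a_addr, b_addr
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    payload, sender, a_addr, b_addr = loopback_media_demo()
    print("A bound to", a_addr, "and B bound to", b_addr)
    print("B received", payload, "from", sender)
```

Run it a few times and the port numbers change on each run, which is the part a firewall in the middle cannot predict.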

Hey Tom! Good to hear from you again. It's been years!

I haven't been doing much music because my younger years were frankly so saturated with nonstop recording (in addition to IT work) that it's nice to have a normalish life for a change. Walking around with my wife, watching Netflix, etc.

The whole remote recording and performing thing is very interesting because Steinberg and the Cubase community had been working on that problem since at least 2010 as far as I can remember, and with COVID that gets thrust into the spotlight because so many musicians, directors, performers etc. are out of work (probably suffering mental health challenges in addition to economic ones), and suddenly everyone needs to solve the same problem urgently.

I have gigabit FIOS here at home and my latency to the Zoom cloud, for instance, is <4ms. That's a lower latency than I typically use to record on my RME Multiface, around 6ms or 12ms, while still getting usable timing results. We live in a time now where if enough people are on broadband, low-latency synchronous recording environments SHOULD be relatively high quality, but I think the Zooms of the world need to partner with some of these pro audio companies.

The most important thing is that Future Sound of London receive the credit for all this for their ISDN concerts in the 1990s! :laughing:

I kinda get it. I used to work on firewalls for a living. I have not kept up with network engineering in some years and am far from a top tier expert. But the thing that perplexes me about that limitation is that the firewall should be able to split the traffic into a control connection (which becomes effectively authentication for the data flow) and a data connection.

A firewall maintains a TCP state table of allowed connections, and as long as the control port is in state, the bidirectional data stream ought to be fine, unless the firewall software is just badly written and is just introducing latency in processing the actual flow data, which is probably the case for most SOHO devices.
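As a toy illustration of the state-table idea (not how any particular firewall product is implemented), here is a short Python sketch. The class name, method names and addresses are all made up. Outbound packets create an entry, and inbound packets are allowed only if they match the reverse of an existing entry. Real devices also expire entries on a timer and, behind NAT, rewrite addresses, neither of which is modelled here.

```python
class ToyStatefulFirewall:
    """Allows inbound packets only for flows that an inside host started."""

    def __init__(self):
        # Each entry is (inside_addr, outside_addr); an addr is (ip, port).
        self._flows = set()

    def outbound(self, inside, outside):
        """Inside host sends to outside host: always allowed, and remembered."""
        self._flows.add((inside, outside))
        return True

    def inbound(self, outside, inside):
        """Outside host sends to inside host: allowed only if the flow is known."""
        return (inside, outside) in self._flows


if __name__ == "__main__":
    fw = ToyStatefulFirewall()
    client = ("192.168.1.10", 50000)   # hypothetical inside endpoint
    server = ("203.0.113.5", 40000)    # hypothetical outside endpoint

    print(fw.inbound(server, client))  # unsolicited, so it is dropped
    fw.outbound(client, server)        # client opens the flow first
    print(fw.inbound(server, client))  # now the reply is in state
```

This is also why the cloud-relay approach works without any firewall configuration: every media flow is opened from the inside first, so the return traffic is already in state.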

The lack of ASIO availability in software that calls itself “AV” software is cumbersome, given where we are now. Not just for the interoperability with VST FX, but for latencies. Also of course some people just have like, 40Mbps ISP connections, and the >72ms latency is just inherent to transport there.

I see the “default driver” problem in the massively deployed, business critical apps like Zoom as a more easily resolvable one than the firewall or bandwidth issues, because business people had acclimated too much to crappy sounding conferences during a time period where conferencing was only a minority of their time. Now, it is pretty much all their comms.