Month: October 2022

Servers at capacity? What does that mean?

If you’ve been following our video system updates, you’re aware of a problematic server company causing a lot of our buffering issues. This morning we pulled the plug on that server company entirely, which has greatly increased the quality of service for everyone. The downside? Pulling the plug reduced our video edge outbound capacity by 10Gbps. But what point is there in having that extra 10Gbps of capacity if it’s absolute garbage and making everyone mad, including me?

That’s where the “Uh oh! Servers are at capacity” message comes in. Since we have reduced video edge capacity right now, this message will show up more frequently. It’s there to ensure the video edge capacity we do have isn’t maxed out. Without that safety measure, outbound bandwidth would max out and cause buffering and crashing. We’ve had this system for many years, long before we offered VIP packages. Until VIP came along, you had no option to watch a stream on our platform if our servers were at capacity. But now, because VIP is on its own video system, it lets you bypass the “servers at capacity” message entirely. Or you can wear out the F5 button on your keyboard.
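For the curious, the idea behind that safety measure can be sketched in a few lines of Python. This is only an illustration, not our actual code: the class name, the 10% headroom, and the 5 Mbps per-viewer bitrate are all made-up assumptions chosen to keep the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass
class EdgeCapacityGate:
    """Hypothetical admission check for a pool of edge servers."""
    capacity_mbps: int       # total outbound capacity of the pool (assumed value)
    headroom_pct: int = 10   # share held back so links never reach 100% (assumed)
    committed_mbps: int = 0  # bandwidth already promised to current viewers

    def safe_limit_mbps(self) -> int:
        # Integer math keeps the limit exact (no floating-point drift).
        return self.capacity_mbps * (100 - self.headroom_pct) // 100

    def try_admit(self, stream_mbps: int) -> bool:
        """Admit a viewer only if their stream still fits under the safe limit."""
        if self.committed_mbps + stream_mbps > self.safe_limit_mbps():
            return False  # this is where the "at capacity" message would show
        self.committed_mbps += stream_mbps
        return True

    def release(self, stream_mbps: int) -> None:
        """Free the bandwidth when a viewer leaves."""
        self.committed_mbps = max(0, self.committed_mbps - stream_mbps)


if __name__ == "__main__":
    gate = EdgeCapacityGate(capacity_mbps=10_000)  # 10Gbps, hypothetical pool
    admitted = sum(gate.try_admit(5) for _ in range(2000))
    print(f"admitted {admitted} of 2000 viewers")
```

With these made-up numbers, viewers past the safe limit get turned away instead of dragging the stream down for everyone already watching, which is the whole point of the message.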

So why don’t we replace the servers we pulled offline or add capacity? The primary reason is the cost of quality. We’re prioritizing quality of service over quantity, and capacity that holds up to a high quality of service is far more expensive to add. As of right now, the plan is to add an additional 5Gbps of capacity before the end of this month.

Thank you all for your patience and thank you all for being such an amazing community! And a huge shout out to our VIP members! You guys are incredible!

-Mark

Progress on video system issues

The last few weeks have been frustrating for everyone involved. But we have our new servers in place, and our video ingest system is now rock solid in terms of stability, both software and bandwidth wise.

The video edge system has been a different story. We removed our servers from Chicago and Dallas because of how problematic those datacenter locations have become: blatantly lying to us, overselling bandwidth despite selling us dedicated 10Gbps bandwidth, and outright incompetence in messing up routing so frequently.

Our new primary edge server is located in Virginia, in the same building as our primary ingest server, but with a different company. We had a couple of hiccups with congested routes on the edge server within the first 24 hours, but they mysteriously resolved after we verbally ripped into the company in question. Keep in mind, this is the same company we had our Chicago and Dallas locations through. Ingest is through a totally different company, and, surprisingly, we’ve had zero issues and incredible support from them when we’ve needed it.

Once the bandwidth and routing congestion issues were resolved, another weak point of our edge system showed itself: the edge software. This weak point doesn’t show up during local testing or simulated load tests. But once the software is put in production, despite the same hardware, OS version, and such, it becomes a completely different story.

This issue with the video edge software has plagued our platform for years, often going weeks without showing itself and then suddenly appearing. Our first thought was a DDoS attack. While we did experience multiple DDoS attacks over the last few weeks, extensive monitoring and coordination with the datacenter ruled out any type of attack as the cause. Rather, it’s being caused by an obscure memory leak that only happens under the absolute perfect circumstances, typically during high traffic hours. Adding RAM band-aids the issue, but there comes a point where no amount of additional RAM is a feasible resolution for the problem.
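To show why more RAM only buys time, here is a rough back-of-the-envelope sketch in Python. The baseline usage and leak rate below are invented purely for illustration; they are not measurements from our servers.

```python
def hours_until_exhausted(ram_gb: float, baseline_gb: float,
                          leak_gb_per_hour: float) -> float:
    """Hours of uptime before a steady leak eats all free memory."""
    if leak_gb_per_hour <= 0:
        return float("inf")  # no leak, no deadline
    free_gb = max(0.0, ram_gb - baseline_gb)
    return free_gb / leak_gb_per_hour


if __name__ == "__main__":
    # Hypothetical figures: 16 GB normal usage, leaking 4 GB/hour at peak.
    for ram_gb in (64, 128, 256):
        hours = hours_until_exhausted(ram_gb, baseline_gb=16, leak_gb_per_hour=4)
        print(f"{ram_gb} GB of RAM lasts about {hours:.0f} hours under the leak")
```

Under those assumed numbers, each doubling of RAM only pushes the failure a bit further out. The leak still wins eventually, so the real fix has to be in the software.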

So what are the next steps? At the moment I am gutting the video player for desktop viewers, which is where most of our video traffic comes from. Basically everything under the hood of the video player is being updated and rewritten. This will help resolve the playback errors that sometimes happen, where the stream randomly stops and the loading message appears as it tries to get the stream back. It will also help with load time. The look of it won’t change. This video player update will be finished within the next 12 hours, barring any critical issues that may arise.

And what about the edge server software? That’s ultimately the issue, right? Yes it is. I’m prototyping and testing new edge software that will not only be more reliable but even reduce latency. We do not use industry-standard HLS or LL-HLS for stream delivery. Those are not ideal solutions when you’re trying to keep latency very low and scale it without investing millions of dollars in infrastructure. That kind of spending is the secret to Twitch (Amazon) and YouTube Live (Google) achieving low latency through HLS and DASH. They can afford to dump hundreds of millions into hardware to absorb the very high resource cost of low latency via HLS and DASH.

If you’ve read this far, thank you! We’re just as frustrated as you are about the video issues of the last few weeks, many of which were completely out of our hands. But rest assured we’re not stopping until they’re completely resolved and, by the looks of it, you have a better streaming experience than before.

-Mark