Recent Chat Issues and TwitchPlaysPokemon
UPDATE: I buried the lede in the original post, so here it is up front. We did find and fix a “fundamental issue” with our Redis servers, as noted here. During the week, we also updated some networking configurations for our chat servers, which should have smoothed out service in general.
TwitchPlaysPokemon is a bona fide phenomenon. So far we’ve seen millions of unique viewers and over 100k peak concurrent viewers. It has captured the attention of the gaming community and even made its way into the mainstream press.
The unique nature of the TwitchPlaysPokemon experiment and the huge chat participation in it have put enormous (and unforeseen) stress on our chat system. We’re always working on improving the quality of service (QoS) of our chat system, and this has been a wonderful learning experience for us.
Our first adjustment on Sunday was to move the channel off of our general chat servers onto a dedicated event chat server, which we typically use for large events like The International and League Championship Series (LCS). This helped, but there were some fundamental issues with our chat infrastructure that required a review.
One of our long-time engineers, Mike Ossareh, wrote about this in a couple of posts on his personal blog. They’re a very candid read on the fundamentals of our chat system.
Why does “Twitch Plays Pokemon” Work?: This provides a general overview of our Live Chat product.
Chat Scalability Improvements: This details the work done yesterday [Feb 19, 2014] to improve chat performance.
This quote from the second post quite succinctly sums up the chat issues we’ve faced:
“When a phenomenon like TPP comes along which increases load on the system many fold, it gives us a great opportunity to discover and fix new issues and issues that only raise their head under super high load.”
We LOVE TwitchPlaysPokemon and we want to see where this grand experiment leads us all. While it plays itself out, we’ll continue to hammer away at our chat system and make sure it holds up under the load.