COVID-19: Weekly health check of ISPs, cloud providers and conferencing services

ThousandEyes, which tracks internet and cloud traffic, is providing Network World with weekly updates on the performance of three categories of service provider: ISPs, cloud providers and UCaaS providers.


As COVID-19 continues to spread, forcing employees to work from home, ISPs, cloud providers and conferencing services, also known as unified communications as a service (UCaaS) providers, are experiencing increased traffic.

ThousandEyes is monitoring how these increases affect outages and the performance challenges these providers face. It will provide Network World with a roundup of noteworthy events of the week in the delivery of these services, and Network World will publish a summary here. Stop back next week for another update.

Update May 25

ISP outages jumped by more than a third in the U.S. during the week ending May 24, while outages among all three categories of provider registered a small increase.

Total outages among all categories rose from 263 to 280 globally, and from 86 to 115 in the U.S.

ISP outages rose from 223 to 225 worldwide, with most of the increase due to outages in the U.S., which jumped from 80 to 109.

Public cloud outages overall rose from 24 to 35, with U.S. outages only ticking up from one to two.

Collaboration app network outages dropped from four to five compared to the week before, with a drop in U.S. outages from five to one accounting for the improvement.

There were two noteworthy outages during the week:

  • Just after 3 a.m. EDT on May 20, Google suffered an outage in the East Coast part of its network that affected users accessing sites such as Uber and Shopify, which are hosted by the public cloud provider. The outage lasted nine minutes and was located in the New York City metro area; because it occurred during off-peak hours, its impact on users was likely minimal.
  • About 8 a.m. EDT on May 22, Hurricane Electric suffered an outage that lasted more than an hour and affected several countries; the worst of it lasted 44 minutes. The outage was observed at Hurricane Electric nodes in multiple locations worldwide and affected users reaching sites including Microsoft, Amazon, Workday and Credit Suisse.

Update May 18

Total outages globally jumped 22% from the week of May 4-10 to the week of May 11-17, rising from 216 to 263.

ISP outages worldwide grew from 183 to 223, while those outages in the U.S. moved up from 74 to 80.

network on April 20 affected its infrastructure in the U.K., France, Germany, and India. Around 11 a.m. local U.K. time, traffic attempting to reach services such as Amazon, ServiceNow, and Oracle Cloud began terminating in its network, affecting local users, users in the U.S. and elsewhere. The outage lasted about 20 minutes, and affected more than 80 network interfaces across multiple regions and cities.

Another far-reaching outage occurred the next day in the U.S., when at least one issue in Southern California affected enterprises and consumer users up and down the West Coast, and as far away as Raleigh, NC. It affected the Level 3 portion of CenturyLink’s network; Level 3 is a transit provider CenturyLink acquired in 2017. Merrill Lynch reported a disruption to its business as a result of the outage, during which its brokers were intermittently unable to access their workstations. The incident began hitting enterprises and their users around 10 a.m. ET, and most of the disruption was resolved by around 11:30 a.m. ET.

Update April 20

Total outages spiked 59% during the week of April 13-19, fueled by one prolonged outage that had a significant effect on multiple ISPs.

That one outage affected TeliaNet, Level 3, AT&T and other ISPs on April 13. TeliaNet was the most affected of the group, though it’s not clear whether it was the cause of the outage, ThousandEyes says. During the downtime, at least one application provider withdrew its routes through the TeliaNet network until the next day.

Had that one outage not occurred, the total number of outages for the week would have been in the low 200s, which ThousandEyes says is in the normal range. As it turned out, total outages rose 59% from 177 to 282.

ISP outages jumped from 141 to 243 week over week, up 72% worldwide, and from 56 to 98 (75%) in the U.S.

Public cloud outages dropped from the week before, from 19 to 14 (down 26%) worldwide, and held steady at six in the U.S.

Global application-provider networks saw a slight increase in outages worldwide, from 9 to 11 (up 22%), but outages in the U.S. dropped from 9 to 4 (down 56%).
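The week-over-week percentages in these updates follow the usual formula, (new - old) / old × 100. A short Python check of this update’s figures; `pct_change` is a hypothetical helper name used only for illustration:

```python
def pct_change(old: int, new: int) -> float:
    """Week-over-week percentage change from `old` to `new`."""
    if old == 0:
        # A change from zero (e.g. U.S. cloud outages going from 0 to 1) has no percentage.
        raise ValueError("percentage change is undefined when the baseline is zero")
    return (new - old) / old * 100


# Outage counts reported for the week of April 13-19: (previous week, this week)
april_20 = {
    "total, worldwide": (177, 282),
    "ISP, worldwide": (141, 243),
    "ISP, U.S.": (56, 98),
    "public cloud, worldwide": (19, 14),
    "app providers, worldwide": (9, 11),
    "app providers, U.S.": (9, 4),
}

for label, (old, new) in april_20.items():
    # Rounded to whole percentages, signed so increases and drops are explicit.
    print(f"{label}: {old} -> {new} ({pct_change(old, new):+.0f}%)")
```

Rounding to whole percentages matches how most figures in these updates are reported.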

In other notable events, ThousandEyes said several banks appeared to effectively suffer denial-of-service conditions when customers apparently flooded their sites to find out whether they’d received their pandemic-related stimulus checks. Content-delivery networks serving the banks had no network issues yet were unable to return Web content for many banking sites, “likely due to bank origin servers unable to handle the high volume of requests,” ThousandEyes says.
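Why a healthy CDN can still fail to return pages comes down to which requests it can answer on its own. The Python sketch below is a toy model under stated assumptions, not a description of any bank’s setup: cache hits are always served by the CDN, cache misses go to an origin with a fixed capacity, and misses beyond that capacity go unanswered. The traffic and capacity numbers, and the `served_fraction` name, are hypothetical.

```python
def served_fraction(requests_per_s: float, cache_hit_ratio: float,
                    origin_capacity_per_s: float) -> float:
    """Toy model of a CDN in front of an origin server.

    Cache hits are answered by the CDN; cache misses are forwarded to the
    origin, which can answer at most origin_capacity_per_s requests per
    second. Misses beyond that capacity go unanswered.
    """
    if requests_per_s <= 0:
        raise ValueError("requests_per_s must be positive")
    hits = requests_per_s * cache_hit_ratio
    misses = requests_per_s - hits
    answered_misses = min(misses, origin_capacity_per_s)
    return (hits + answered_misses) / requests_per_s


# Hypothetical numbers. Per-customer pages (such as a payment-status lookup)
# generally can't be cached, so every such request is a miss that reaches the origin.
ORIGIN_CAPACITY = 5_000  # requests per second the origin can handle

print(f"normal day: {served_fraction(2_000, 0.0, ORIGIN_CAPACITY):.0%} served")
print(f"surge: {served_fraction(20_000, 0.0, ORIGIN_CAPACITY):.0%} served")
print(f"surge, 90% cacheable: {served_fraction(20_000, 0.9, ORIGIN_CAPACITY):.0%} served")
```

In the uncacheable surge case only a quarter of requests get a response even though the CDN itself has no problem, which is consistent with the pattern ThousandEyes describes.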

Update April 13

During the week of April 6-12, service outages for ISPs, cloud providers and conferencing services dropped overall, from 298 to 177 globally (down 41%, a six-week low), and from 129 to 72 (down 44%) in the U.S.

Globally, ISP outages were down from 229 to 141 (38%), and in the U.S. were down from 100 to 56 (44%).

Cloud provider outages were also down overall from 25 to 19 (24%), ThousandEyes says, but jumped up from one to six (500%) in the U.S., which saw the highest rate of increase in seven weeks. Even so, the U.S. total was relatively low. “Again, cloud providers are doing quite well,” ThousandEyes says.

Conferencing services recovered from a spike the week before, and all nine of their outages were in the U.S. Globally, outages dropped from 29 to nine (down 69%), and in the U.S. from 25 to nine (down 64%).

Update April 6

ISP outages fell 9.13% worldwide during the week of March 30 compared with the week before, from 252 to 229, and 16.7% in the U.S., from 120 to 100. Public cloud outages rose worldwide from 22 to 25, and the U.S. saw one outage, up from zero the previous week.

Outages for collaboration apps rose dramatically, increasing more than 260% globally and more than 500% in the U.S. over the week before. The actual numbers were an increase from eight to 29 worldwide, and up from 4 to 25 in the U.S.

ISP Cogent Communications suffered what ThousandEyes called a significant outage on April 1, from 12:30 p.m. to 12:35 p.m. Pacific time, that affected users’ ability to connect to sites and services such as Office 365. Because Cogent peers with other providers, those providers’ customers may have experienced disruptions to some services as well.

Yelp and some applications and sites hosted by AWS and Cloudflare were unreachable between 12:35 and 12:40 p.m. Pacific time on April 1, when Russian ISP Rostelecom leaked illegitimate IP address prefixes to its ISP peers, including Level 3. Such leaks can lead to incorrect or suboptimal routing, according to ThousandEyes.

In this case, the leak improperly inserted Rostelecom into the network path between users and the affected providers. Level 3 propagated those improperly advertised routes to its peers, setting off a chain of events that led to massive traffic loss during the outage.
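Route leaks can pull traffic off its normal path in several ways. One common mechanism is longest-prefix matching: routers forward a packet along the most specific prefix that covers its destination, so a leaked announcement that is more specific than the legitimate one wins. The Python sketch below assumes that is what happened (the update doesn’t say whether the leaked prefixes were more specific), and it uses documentation address ranges and placeholder names such as “LeakingISP” rather than the real announcements.

```python
import ipaddress

# Toy forwarding table at a network downstream of Level 3: prefix -> AS path.
# Prefixes are documentation ranges; "LeakingISP" and "ContentProvider" are
# hypothetical placeholders, not the real April 1 announcements.
table = {
    ipaddress.ip_network("203.0.113.0/24"): ["Level3", "ContentProvider"],
}


def best_route(table, destination):
    """Return (prefix, as_path) for the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    prefix = max(matches, key=lambda net: net.prefixlen)
    return prefix, table[prefix]


print(best_route(table, "203.0.113.10"))   # legitimate path via the /24

# The leak: a more-specific /25 announced via LeakingISP and passed on by
# Level 3 to its peers without filtering.
table[ipaddress.ip_network("203.0.113.0/25")] = ["Level3", "LeakingISP", "ContentProvider"]

print(best_route(table, "203.0.113.10"))   # now diverted through LeakingISP
print(best_route(table, "203.0.113.200"))  # outside the /25, still the legitimate path
```

Filtering announcements from peers against what they are expected to originate is the usual defense, since it keeps a leak from being propagated one hop further, as happened when Level 3 passed the routes on.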

Update March 31

Looking at data over the past six weeks, ThousandEyes finds that combined worldwide service outages among ISPs, public cloud providers, conferencing services and edge networks (content-delivery networks, DNS and security as a service) have risen 42%.

Cloud-provider performance has been largely unaffected; in fact, several weeks last year saw far more cloud outages.

Week of March 23

Between the week of March 16 and March 23, the outages suffered by ISPs worldwide went down from 230 to 203, nearly 12% lower. In the U.S., the number of outages rose from 100 to 107, up 7%.

Public cloud outages were down both worldwide and in the U.S. Worldwide they dropped from 21 to 15 (down 29%), and in the U.S. from six to zero. Although a router failure in Atlanta disrupted Google traffic, the disruption did not meet ThousandEyes’ definition of an outage, and it wasn’t related to COVID-19.

Collaboration applications also showed a decline in outages from the week before, dropping from 15 to six worldwide, and down from seven to three in the U.S., reductions of 60% and 57%, respectively.

ThousandEyes highlighted what it considered significant outages:

  • “Cogent Communications suffered yet another significant outage this week — its fifth major outage this month. The outage occurred within parts of Cogent’s network in Northern California and Oregon and impacted users connecting to sites and services in those regions, including projectbaseline.com, the website of Verily’s much-publicized COVID-19 testing program.”
  • “For approximately 20 minutes on March 25th, ThousandEyes observed that some users located on the East Coast may not have been able to reach Google services due to 100% traffic loss. A short time later, Google’s SVP of Engineering [said] that the incident was due to a router failure in Atlanta, Georgia. US users outside of the Northeast were also impacted intermittently, although they would have experienced the incident as site errors when trying to reach some Google sites, such as google.com. The HTTP server errors seen during this period are consistent with an inability to reach the backend systems necessary to correctly load various services. Any traffic traversing the affected region — connecting from Google’s front-end servers to backend services — may have been impacted and seen the resulting server errors.”
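The point about HTTP server errors follows from how a front end behaves when users can reach it but it can’t reach its backends: it still answers, just with an error page instead of content. A minimal Python sketch, assuming a front end that returns 502 Bad Gateway when a backend call fails (one common choice; the quote says only “HTTP server errors”). `BackendUnreachable`, `handle_request` and the two backend functions are hypothetical names for illustration, not Google’s implementation.

```python
from http import HTTPStatus


class BackendUnreachable(Exception):
    """Hypothetical error raised when the path to backend systems is down."""


def handle_request(path, call_backend):
    """Toy front-end handler: return the backend's content, or a 502 on failure."""
    try:
        body = call_backend(path)
    except BackendUnreachable:
        # The front end itself is reachable, so the user sees an HTTP error
        # rather than a connection timeout.
        return HTTPStatus.BAD_GATEWAY, "Bad Gateway"
    return HTTPStatus.OK, body


def healthy_backend(path):
    return f"content for {path}"


def failed_backend(path):
    # Simulates front-end-to-backend traffic crossing the affected region being dropped.
    raise BackendUnreachable(path)


for backend in (healthy_backend, failed_backend):
    status, body = handle_request("/search", backend)
    print(int(status), body)
```

This is why users outside the region with total traffic loss saw site errors rather than timeouts: their connection to the front end worked, but the front end’s own requests to backend services did not.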