You might recall that after the nationwide outage in July, Rogers was told by the CRTC to serve up an explanation of what happened and what they’re going to do to make sure it doesn’t happen again. Rogers did that, though some details were redacted, which makes it look like Rogers has something to hide. Regardless, that document wasn’t good enough for the CRTC, based on this:
This letter is in relation to the national service outage experienced by Rogers Communications Canada Inc. (Rogers) that began on 8 July 2022 and is in reply to Rogers’ response dated 22 July 2022 to the Commission’s request for information (the Response). The Response provided valued information for the Commission to understand the cause of the outage, impact of the outage and immediate steps to mitigate the impact.
Since the Response, Rogers has made public statements about measures to reduce the likelihood of similar events as well as address consumers’ concerns. To better understand the impact of the event, mitigating measures as well as these recent announcements, further information is required for the Commission to assess this situation.
Given the seriousness of this event, and while the Commission considers the complaints and calls for a public inquiry before it, Rogers is to provide, by 15 August 2022, comprehensive answers, including rationale and any supporting information, to the questions included in the attachment.
This letter and any subsequent correspondence will be placed on the public record. Should Rogers designate any information in its response as confidential pursuant to section 39 of the Telecom Act, an abridged version of the response must be provided for the public record. Note that, in accordance with its normal practices, the Commission may disclose or require the disclosure of information designated as confidential if its disclosure is in the public interest, i.e., where any specific direct harm likely to result from disclosure does not outweigh the public interest in disclosure.
First off, you’ll note that the request is for Rogers to provide this information by August 15th. But Rogers has asked for more time. Specifically, until August 22nd.
Regardless, this is what the CRTC is looking for answers on:
- A cost breakdown of the $250 million that Rogers is saying that it is going to spend to make its network more resilient
- Implementation timelines and information on how separating networks will improve resilience
- An explanation of why Rogers is spending $10 billion on artificial intelligence, further testing, and oversight, and why that’s going to make a difference
- Details on the direct economic losses of the outage, and confirmation of whether residential and small business customers receive the credit the company promised
Read from a cynical point of view, it sounds like the CRTC believes that Rogers is just putting stuff out there that they have no intention of doing in hopes of making the blowback from their outage go away, and the CRTC wants proof that they will actually follow through. It also sounds like the CRTC is for once doing its job. Which means that when Rogers puts out their response next week, it will not only be worth reading, but it will likely be a means to hold the troubled telco’s feet to the fire.
The CRTC Puts Out An Executive Summary About The July 2022 Rogers Outage
Posted in Commentary with tags CRTC, Rogers on July 6, 2024 by itnerd
A reader tipped me off to the posting of this executive summary written by a third party named Xona Partners Inc. on behalf of the CRTC in relation to the major Rogers outage that happened in July of 2022. I encourage you to read it at your leisure. But I want to draw your attention to two items. The first is this:
Root cause of the network failure. The July 2022 outage is attributed to an error in configuring the distribution routers within the Rogers IP network. Rogers staff removed the Access Control List policy filter from the configuration of the distribution routers. This consequently resulted in a flood of IP routing information into the core network routers, which triggered the outage. The core network routers allow Rogers wireline and wireless customers to access services such as voice and data. The flood of IP routing data from the distribution routers into the core routers exceeded their capacity to process the information. The core routers crashed within minutes from the time the policy filter was removed from the distribution routers configuration. When the core network routers crashed, user traffic could no longer be routed to the appropriate destination. Consequently, services such as mobile, home phone, Internet, business wireline connectivity, and 9-1-1 calling ceased functioning.
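To make that failure mode concrete, here is a minimal sketch in Python of what the report describes: a distribution router that normally filters what it passes to the core, and a core router that falls over when it gets more routes than it can handle. Everything in it is made up for illustration. The route counts, the capacity limit, and names like `CoreRouter` and `advertise` are my assumptions, not Rogers’ actual configuration or numbers.

```python
from dataclasses import dataclass, field


@dataclass
class CoreRouter:
    capacity: int                      # hypothetical limit on routes it can hold
    routes: set = field(default_factory=set)
    crashed: bool = False

    def receive(self, prefixes):
        """Accept advertised routes; crash if the table exceeds capacity."""
        if self.crashed:
            return
        self.routes.update(prefixes)
        if len(self.routes) > self.capacity:
            self.crashed = True
            self.routes.clear()        # a crashed router routes nothing


def advertise(prefixes, policy_filter=None):
    """Distribution router: pass on only what the filter allows.

    With no filter at all, every prefix goes through to the core.
    """
    if policy_filter is None:
        return list(prefixes)
    return [p for p in prefixes if policy_filter(p)]


# Made-up routing data: 5000 prefixes, of which only 100 belong in the core.
all_prefixes = [f"10.{i // 256}.{i % 256}.0/24" for i in range(5000)]
allowed = set(all_prefixes[:100])

core = CoreRouter(capacity=1000)

# Filter in place: the core sees a small, manageable set of routes.
core.receive(advertise(all_prefixes, policy_filter=lambda p: p in allowed))
with_filter = (len(core.routes), core.crashed)

# Filter removed: everything floods into the core and it goes down.
core.receive(advertise(all_prefixes))
without_filter = (len(core.routes), core.crashed)
```

In this toy version the core holds 100 routes while the filter is in place and crashes the moment the filter is gone, which is the same shape of failure the report lays out, just at a vastly smaller scale.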
But there’s more. This also got my attention:
Deficiency in the change management process. The configuration error, which led to the removal of the policy filter from the configuration of the distribution routers, is the result of a change management oversight by Rogers staff. Rogers staff deleted the policy filter that prevented IP route flooding in an effort to clean up the configuration files of the distribution routers. The change management process, which includes audits of change parameters, failed to flag the erroneous configuration change.
It’s pretty bad that a top-tier telco like Rogers had a change management process that was suspect. If I were still a customer of Rogers, I’d be rethinking whether I should be doing business with them. Though I have to say that this report also says that Rogers is making improvements in this area.
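For what it’s worth, the kind of audit that should have caught this doesn’t have to be complicated. Here is a rough sketch in Python of a pre-change check that compares the old and new configuration and flags any removed line that looks like a protective filter. The config snippet, the `PROTECTED_MARKERS` list, and the function name are all hypothetical. This is not how Rogers or any router vendor actually does it, just an illustration of the idea.

```python
# Hypothetical keywords marking lines that should never vanish without sign-off.
PROTECTED_MARKERS = ("policy-filter", "access-list")


def removed_protected_lines(old_config: str, new_config: str) -> list[str]:
    """Return protected lines present in the old config but missing from the new one."""
    new_lines = {line.strip() for line in new_config.splitlines()}
    return [
        line.strip()
        for line in old_config.splitlines()
        if line.strip() not in new_lines
        and any(marker in line for marker in PROTECTED_MARKERS)
    ]


# Made-up "before" and "after" configs for a cleanup change.
old = """\
interface core-uplink
 description to-core
 policy-filter LIMIT-ROUTES out
"""
new = """\
interface core-uplink
 description to-core
"""

flagged = removed_protected_lines(old, new)
if flagged:
    print("Change needs review, protected lines removed:", flagged)
```

A check like this only works if someone keeps the list of protected items up to date, which is a process question as much as a technical one.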
There are a couple of other items that I want to draw your attention to, starting with this:
Limited communication among Rogers staff. Rogers staff relied on the company’s own mobile and Internet services for connectivity to communicate among themselves. When both the wireless and wireline networks failed, Rogers staff, especially critical incident management staff, were not able to communicate effectively during the early hours of the outage. Rogers had to send Subscriber Identity Module (SIM) cards from other mobile network operators to its remote sites to enable its staff with wireless connectivity to communicate with each other. The absence of sufficient alternative means of communication slowed the Rogers response to the July 2022 outage.
This is a problem. Again, this report indicates that this has been addressed. But it’s pretty bad that Rogers assumed that nothing would ever happen to their network and, as a result, didn’t come up with a plan to have another option for key staff to communicate.
The second item that I want to draw your attention to is this:
Separate IP core for the wireless and wireline networks. Following the outage, Rogers announced it had decided to separate the IP core network for its wireless and wireline networks. This decision entails deploying a new IP core for the wireless network, while the existing IP core would remain to serve the wireline network. Therefore, if one IP core network were affected by an outage, the other IP core network would remain unaffected and operational.
Rogers has not yet finalized the implementation of the IP core network separation, which remains a work in progress. When implemented, separate IP core networks for the wireless and wireline networks will help to contain a failure to its respective access network and, therefore, avoid the type of catastrophic network failure experienced in the July 2022 outage, where both wireless and wireline services were unavailable due to the outage in the common core IP network. IP core network separation would improve the overall resiliency of the Rogers wireless and wireline networks.
Rogers would do well to give customers and non-customers exact timelines as to when this will get done. I say that because simply saying you’re going to do something without saying when you’re going to do it is meaningless. More on this in a bit.
One thing to keep in mind is that the CRTC has put this out there to keep Rogers honest. Specifically:
Today, the CRTC published the executive summary of the expert report completed by Xona Partners Inc. (Xona) on Rogers’ July 2022 outage.
Based on Xona’s findings, the measures taken by Rogers have addressed the cause of the outage. Xona also made additional recommendations to Rogers to further enhance the reliability and resilience of their network, and Rogers has confirmed the implementation of all measures.
In order to prevent future outages, Rogers must report to the Commission on: 1) whether the measures continue to effectively address reliability issues; and 2) progress made to separate the wireline and wireless core networks. The report must be provided by 4 July 2025.
We’ll see a year from now if Rogers is truly serious about making sure that their infrastructure is actually reliable for all Canadians.