The CRTC Puts Out An Executive Summary About The July 2022 Rogers Outage
Posted in Commentary with tags CRTC, Rogers on July 6, 2024 by itnerd
A reader tipped me off to the posting of this executive summary written by a third party named Xona Partners Inc. on behalf of the CRTC in relation to the major Rogers outage that happened in July of 2022. I encourage you to read it at your leisure. But I want to draw your attention to two items. The first is this:
Root cause of the network failure. The July 2022 outage is attributed to an error in configuring the distribution routers within the Rogers IP network. Rogers staff removed the Access Control List policy filter from the configuration of the distribution routers. This consequently resulted in a flood of IP routing information into the core network routers, which triggered the outage. The core network routers allow Rogers wireline and wireless customers to access services such as voice and data. The flood of IP routing data from the distribution routers into the core routers exceeded their capacity to process the information. The core routers crashed within minutes from the time the policy filter was removed from the distribution routers configuration. When the core network routers crashed, user traffic could no longer be routed to the appropriate destination. Consequently, services such as mobile, home phone, Internet, business wireline connectivity, and 9-1-1 calling ceased functioning.
But there’s more. This also got my attention:
Deficiency in the change management process. The configuration error, which led to the removal of the policy filter from the configuration of the distribution routers, is the result of a change management oversight by Rogers staff. Rogers staff deleted the policy filter that prevented IP route flooding in an effort to clean up the configuration files of the distribution routers. The change management process, which includes audits of change parameters, failed to flag the erroneous configuration change.
It’s pretty bad that a top-tier telco like Rogers had a suspect change management process. If I were still a Rogers customer, I’d be rethinking whether I should be doing business with them. Though I have to say that this report also says that Rogers is making improvements in this area.
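To make the change management point concrete, here is a rough sketch of the kind of automated check a change audit could run before a configuration push. To be clear, this is my own illustration and not anything from the report: the config lines, the `PROTECTED_MARKERS` list, and the `audit_change` function are all made up for the example, and it assumes a simple line-based config format. It says nothing about how Rogers or its router vendor actually handles this.

```python
from typing import List

# Hypothetical markers for config lines that should never be removed
# without an explicit sign-off (assumed names, not real vendor syntax).
PROTECTED_MARKERS = ("route-filter", "access-list")


def audit_change(old_config: str, new_config: str) -> List[str]:
    """Return protected lines present in old_config but missing from new_config."""
    new_lines = {line.strip() for line in new_config.splitlines()}
    flagged = []
    for line in old_config.splitlines():
        stripped = line.strip()
        if not stripped or stripped in new_lines:
            continue  # blank line, or the line survives the change
        if any(marker in stripped for marker in PROTECTED_MARKERS):
            flagged.append(stripped)
    return flagged


# Made-up "before" and "after" configs for a distribution router.
OLD = """\
interface distribution-1
  route-filter CORE-IMPORT in
  description uplink to core
"""

NEW = """\
interface distribution-1
  description uplink to core
"""

if __name__ == "__main__":
    for removed in audit_change(OLD, NEW):
        print("BLOCKED: change removes protected line:", removed)
```

The idea is simply that a "cleanup" that deletes a line matching a protected marker gets stopped for a human to review instead of going straight to the routers.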
There are a couple of other items that I want to draw your attention to, starting with this:
Limited communication among Rogers staff. Rogers staff relied on the company’s own mobile and Internet services for connectivity to communicate among themselves. When both the wireless and wireline networks failed, Rogers staff, especially critical incident management staff, were not able to communicate effectively during the early hours of the outage. Rogers had to send Subscriber Identity Module (SIM) cards from other mobile network operators to its remote sites to enable its staff with wireless connectivity to communicate with each other. The absence of sufficient alternative means of communication slowed the Rogers response to the July 2022 outage.
This is a problem. Again, the report indicates that this has been addressed. But it’s pretty bad that Rogers assumed that nothing would ever happen to its network and, as a result, never came up with a backup way for key staff to communicate.
The second item that I want to draw your attention to is this:
Separate IP core for the wireless and wireline networks. Following the outage, Rogers announced it had decided to separate the IP core network for its wireless and wireline networks. This decision entails deploying a new IP core for the wireless network, while the existing IP core would remain to serve the wireline network. Therefore, if one IP core network were affected by an outage, the other IP core network would remain unaffected and operational.
Rogers has not yet finalized the implementation of the IP core network separation, which remains a work in progress. When implemented, separate IP core networks for the wireless and wireline networks will help to contain a failure to its respective access network and, therefore, avoid the type of catastrophic network failure experienced in the July 2022 outage, where both wireless and wireline services were unavailable due to the outage in the common core IP network. IP core network separation would improve the overall resiliency of the Rogers wireless and wireline networks.
Rogers would do well to give customers and non-customers exact timelines as to when this will get done. I say that because simply saying you’re going to do something without saying when you’re going to do it is meaningless. More on this in a bit.
One thing to keep in mind is that the CRTC has put this out there to keep Rogers honest. Specifically:
Today, the CRTC published the executive summary of the expert report completed by Xona Partners Inc. (Xona) on Rogers’ July 2022 outage.
Based on Xona’s findings, the measures taken by Rogers have addressed the cause of the outage. Xona also made additional recommendations to Rogers to further enhance the reliability and resilience of their network, and Rogers has confirmed the implementation of all measures.
In order to prevent future outages, Rogers must report to the Commission on: 1) whether the measures continue to effectively address reliability issues; and 2) progress made to separate the wireline and wireless core networks. The report must be provided by 4 July 2025.
We’ll see a year from now if Rogers is truly serious about making sure that their infrastructure is actually reliable for all Canadians.