Archive for Cisco

Top Internet Outages of 2025 Studied By Cisco ThousandEyes

Posted in Commentary with tags on January 30, 2026 by itnerd

The folks at Cisco ThousandEyes have put out a study on the Top Internet Outages of 2025. It highlights the top outages and what happened as well as what to expect going forward. It’s an interesting piece and is worth your time to read.

You can find it here: https://www.thousandeyes.com/blog/the-top-internet-outages-of-2025-analyses-and-takeaways

CISA Issues Alert Regarding Cisco Firewall Zero-Days

Posted in Commentary with tags on September 29, 2025 by itnerd

Late last week, the Cybersecurity and Infrastructure Security Agency issued an emergency directive in response to a widespread campaign that involves exploiting zero-day vulnerabilities in Cisco firewall devices – giving threat actors access to the devices and enabling them to execute malicious code and malware.

Here is some commentary on the significance of these vulnerabilities and insights for security leaders from cybercrime expert and VP of Cyber Risk for HITRUST, Tom Kellermann.

“The exploitation of Cisco firewalls underscores the dangerous nature of island hopping through security vendors’ vulnerabilities. This systemic attack to U.S. government agencies represents a clear and present danger to national security. Cybersecurity vendors must ramp up their own security postures in 2025 and the private sector must expand third party risk management to include cybersecurity vendors in order to mitigate future widespread attacks by China.”

Once again, it is time to patch all the things, because this is one of those “today problems” that seem to be multiplying like rabbits. That’s not a good place to be for those of us on the side of keeping users and organizations safe.

NTT DATA and Cisco Partner to Power Networking for the AI Era

Posted in Commentary with tags on September 3, 2025 by itnerd

NTT DATA and Cisco today unveiled a new co-sponsored IDC InfoBrief, Wired for Intelligence: A CIO Guide to Enterprise Networking for AI. The study shares strategic guidance for organizations seeking to accelerate transformation by modernizing their network infrastructure.

As organizations integrate AI into more applications, from manufacturing and healthcare to financial services, the demand for high-speed, low-latency, and secure networks is surging. Legacy infrastructure is no longer sufficient to support the scale and complexity of AI workloads. NTT DATA and Cisco are responding to this shift by helping clients evolve from outdated architectures to intelligent, adaptive infrastructure that can power AI-driven innovation.

The Critical Foundation Empowering AI-Driven Growth

The study highlights that network modernization is at the heart of AI success. More than 78% of companies say that networking capabilities are either important or very important when selecting providers for GenAI infrastructure — underscoring the need for networks that can handle and secure ever-scaling AI workloads while running complex AI training, inference, and storage clusters with ease. At the same time, modernization also infuses AI into network operations through AI-driven configuration, anomaly detection, self-healing, and intelligent monitoring to accelerate issue resolution and elevate user experience. Already, industries like manufacturing, healthcare, and financial services are leveraging AI in networking to improve operational efficiency, ensure secure connectivity and reduce costs.

NTT DATA is Enabling Network Modernization Through Intelligent Services

NTT DATA’s comprehensive suite of intelligent services helps clients modernize their digital infrastructure and build secure networks. These services span the full lifecycle from advisory to sourcing, professional services, support and managed services to enable organizations to modernize and unlock the full potential of AI. With many companies undergoing hardware refresh cycles due to the emergence of AI, NTT DATA’s services are designed to meet this critical moment:

  • Advisory: Strategic guidance to align network modernization with AI goals.
  • Strategic Technology Sourcing: Recommending and procuring the right technology to make the network AI-ready.
  • Professional Services: Architecting and deploying scalable, secure and high-performance networks.
  • Software-Defined Infrastructure Services: Driving business outcomes through adoption of automation and AI agents into infrastructure operations and license optimization.
  • Adoption Services: Maximizing value from infrastructure investments through greater adoption of software, continuous improvement and change management.
  • Managed Network Services: End-to-end network management to ensure seamless data flow from edge to cloud, minimizing latency and enhancing application responsiveness.

NTT DATA recently launched AI-powered Software Defined Infrastructure (SDI) services for Cisco products to deliver intelligent automation and real-time insights to optimize infrastructure, reduce costs, and drive business outcomes.

Guest Post: In live sports streaming, some minutes matter more than others

Posted in Commentary with tags on August 25, 2025 by itnerd

By: Sofie Feeney, Regional Leader for Northern Europe at Cisco ThousandEyes

A data-driven approach to optimizing live sports streaming

Broadcasters have long been awake to the issues of a break in programming or transmission.

Dead air – when silence is mistakenly broadcast instead of regular content – continues to cause maximum discomfort for traditional TV and radio broadcasters, not least because in those crucial seconds of nothingness, people have the chance (and propensity) to switch, either elsewhere or off.

In online streaming, the equivalent experience is glitches – in either network or backend services – that manifest as streams that pixelate, break up, excessively buffer or stop working altogether.

How important those lost seconds or minutes are to a stream depends a lot on the nature of the event. In live sports, an untimely glitch can be the difference between seeing a world record being made, and not.

Visibility into ephemeral connections

Within a live sports broadcast, not all minutes are equal. Proportionally, a minute in the context of the Olympics 100-meter dash carries more weight than a minute in a 90-minute football game.

In the dash, a lost minute could mean missing out on the color commentary preamble as well as the 10-second race in its entirety; in a 90-minute game, the best case scenario is the loss of a comparatively uneventful passage of play.

The exception to that is when a lost minute of the 90-minute game contains a clutch play: where a crucial score is made or a controversial penalty is awarded. Then, that minute is just as important to the broadcast as the one that contains the 100-meter dash final.

The challenge for a streaming provider is that it’s impossible to know ahead of time, with any certainty, which minutes of a live broadcast will be the most crucial, so there’s a need to treat every minute as critical.

One thing that can help streamers – and the service providers that carry streams to customers – is to become more data-driven in their approach, using visibility to understand the ephemeral nature of the connection between the broadcast site and end user audience at any point in time.

This understanding helps them make more informed calls that can optimize the streaming experience, such as performing dynamic resource allocation and routing of streams, based on how the live event plays out.

Predicting the Internet path

Top sports streaming providers are increasingly tapping into software agents at different points in the content delivery chain to understand how the stream looks as it makes its way to the consumer.

These software agents can run at the live site, where microwave or satellite links are used to relay content back to a central transmission coordination center; in the data center and cloud, tracing the path content takes as it is sent to a content delivery network (CDN) for onward distribution; and inside consumers’ homes, right up to the point the content reaches the end user’s modem or smart TV.

At all of these different points (hops) in the digital delivery chain, latency and delay can be measured, providing an indication of how the ultimate streaming experience is landing, and whether a performance bottleneck exists that needs to be investigated further.
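As a rough illustration of the hop-by-hop measurement described above, the sketch below takes cumulative latency readings at each point in the delivery chain and reports the segment that adds the most delay. The hop names and millisecond values are invented for illustration; they are not ThousandEyes data, and this is not any vendor's API.

```python
from typing import List, Tuple


def find_bottleneck(path: List[Tuple[str, float]]) -> Tuple[str, str, float]:
    """Return (from_hop, to_hop, added_ms) for the segment adding the most latency.

    `path` is an ordered list of (hop_name, cumulative_latency_ms) measured
    from the source. Hop names are hypothetical labels.
    """
    if len(path) < 2:
        raise ValueError("need at least two hops to compare")
    worst = ("", "", float("-inf"))
    # Walk consecutive pairs of hops and keep the largest latency increase.
    for (prev_name, prev_ms), (name, ms) in zip(path, path[1:]):
        added = ms - prev_ms
        if added > worst[2]:
            worst = (prev_name, name, added)
    return worst


# Hypothetical measurements along a live-stream delivery chain.
sample_path = [
    ("live-site", 0.0),
    ("coordination-center", 12.0),
    ("cloud-origin", 20.0),
    ("cdn-edge", 95.0),
    ("home-gateway", 110.0),
]

bottleneck = find_bottleneck(sample_path)
```

With these made-up numbers, the segment between the cloud origin and the CDN edge stands out, which is the kind of signal that would prompt a closer look at that part of the chain.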

Visibility and measurement are particularly important wherever content moves off private network links and onto the public Internet. The nature of the Internet and of the underlying network infrastructure means that the available paths for traffic are constantly changing. Every time a live stream happens, it is likely to encounter a different set of ambient conditions and take a slightly different path to reach the end user.

The predictability of that path depends on how much intelligence the sport or live streaming provider has about it. The greater the visibility, the more predictable the path to the end user is, since the provider can make conscious choices about which network providers they partner with, based on a solid understanding of how each routes or re-routes traffic in a variety of circumstances. It also makes identification of a fault domain easier, in the event a performance bottleneck is identified that requires remediation while the stream is happening.

The best-placed live sports streaming providers are able to validate underlying network conditions before they go live with a broadcast. By setting up tests that show how a stream would perform for different users in different geographic locations, they can be best positioned to understand what is happening ahead of time. They also have a reference point that they can track performance against for the duration of the streaming event.

Guest Post: When AI ambitions are dictated by cloud matters

Posted in Commentary with tags on August 19, 2025 by itnerd

By Mike Hicks, Principal Solutions Analyst, Cisco ThousandEyes

Cloud operations today are for the most part mature. Enterprises have a comfort level with cloud: it has a defined role in an operational sense, and there’s enough support available – through a combination of architectural best practice, community, knowledge, visibility and automation – to optimally run most digital applications and workloads in public, private or hybrid cloud environments.

Moreover, cloud technology has become key to widespread access to AI. In years past, only a select few private companies would have had access to the high-performance compute capacity required to run generative AI workloads. Cloud is proving to be the great leveller, making this level of compute accessible – and the AI services that use it available – to all who wish to use it.

But it’s coming at a cost. Not necessarily a financial one, although it’s a factor in decision-making. The bigger cost is to cloud optimisation approaches. Put simply: widespread and intensive AI adoption is starting to push organisations outside of their comfort zones when it comes to cloud configurations. Targeted action is required to get comfortable with cloud again.

Understanding AI characteristics

To understand why established norms in cloud operations are being tested, one must first understand the nature of the AI workloads that cloud is now being asked to drive.

AI workloads are powerful, both in the sense of the value that they can bring to enterprises and the amount of compute resources required to run them at scale.

This will only increase as Agentic AI becomes the dominant type of AI encountered in enterprise environments. Agentic AI signifies a tighter integration of AI technology into business processes, with autonomous or semi-autonomous software agents handling key processes or parts of those processes to meet specific goals. These systems can make rapid decisions, manage complex tasks, and adapt to changing conditions, assuming underlying systems are performant, but we’ll get to that.

What enterprises need to know is that Agentic AI is more interactive than other forms of AI – “talking” constantly to source systems, data repositories, external tools, databases, and APIs, which makes it a more latency-sensitive evolution of artificial intelligence technology. A cloud or connectivity disruption or failure could lead to an agent-led process either failing to kick off or failing to achieve what it is intended to do.

The main thing to understand about AI workloads is they have different characteristics to the workloads used to define cloud operational parameters today. That means past decisions to make a digital application or workload perform optimally in the cloud are not always cross-applicable to AI. Today’s cloud setups are not designed to meet a very different set of requirements, nor were they intended to.

For enterprises, it’s clear that the same effort that went into optimising cloud setups for a digital context must now be repeated to optimise cloud setups for AI.

The onus is on enterprises to understand and capture the characteristics of their different AI workloads, such that supporting cloud infrastructure can be architected and configured to meet evolving performance needs.

What this will look like in the cloud

For most enterprises, the reality is that AI and the source systems it taps into run in multiple clouds, in multiple data centres, and across a complex network of owned and unowned connectivity links.

Not all AI services will be available in a local region or zone, and that may be an overriding factor in an enterprise’s choice of AI model.

From an operational excellence perspective, enterprises need to determine where the infrastructure underpinning an AI service and the users of that service are based, to understand whether a cloud environment can support those requirements or if changes need to be made.

This includes understanding the extent of the AI’s exposure to “common” infrastructure, such as having a large amount of traffic being funnelled over a single fiber link, or through a single aggregation point such as a point-of-presence in a high-density data center that has a high concentration of AI service providers present. Such concentration risk and single points of failure may exceed internal risk tolerances, given the increasingly critical role that AI plays.

Enterprises need to understand how every provider or part of their AI service delivery chain operates. How does a provider prioritise traffic at certain transit or hand-off points? Do they perform their own load balancing? How will this impact AI service delivery? The answers to these questions may give enterprises cause to re-architect their cloud setups to diversify traffic routes and improve redundancy options.

Performance efficiency will be impacted by these decisions. A roundtrip response time of 50ms might be acceptable for a basic generative AI application, such as a user asking a question and expecting a contextual response. But for a busy Agentic AI system, if every query response takes 50ms, that will quickly add up. Users may experience excessive transaction times, timeouts or other congestion and latency-related issues as a result.
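To put rough numbers on how 50ms "adds up", here is a back-of-the-envelope sketch. It assumes the agent's calls happen one after another rather than in parallel, and the figure of 40 calls per task is purely an assumption for illustration, not a measured value.

```python
def total_wait_ms(round_trip_ms: float, sequential_calls: int) -> float:
    """Total network wait when each call must finish before the next starts."""
    return round_trip_ms * sequential_calls


# One user prompt, one response: a single round trip.
single_prompt_ms = total_wait_ms(50, 1)

# A hypothetical agent task that makes 40 sequential tool, data and API calls.
agent_task_ms = total_wait_ms(50, 40)
```

Under those assumptions the single prompt waits 50ms on the network, while the agent task spends two full seconds on round trips alone, before any processing time is counted.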

Enterprises can improve performance efficiency by proactively identifying optimisation opportunities for traffic and cloud resource usage.

Starlink Outage Analysis: July 24, 2025

Posted in Commentary with tags on August 4, 2025 by itnerd

By The Cisco ThousandEyes Team

Summary: On July 24, Starlink experienced a 2.5-hour outage that impacted users globally. This analysis explores our observations, which indicate a centralized control plane failure of Starlink’s Low Earth Orbit (LEO) service. Our findings also provide important lessons for enterprises incorporating LEO satellite connectivity into their network architecture.

ThousandEyes actively monitors the reachability and performance of thousands of services and networks across the global Internet, which we use to analyze outages and other incidents. The following analysis is based on our extensive monitoring, as well as ThousandEyes’ global outage detection service, Internet Insights. See how the outage unfolded in this analysis.

Outage Analysis

On July 24, Starlink experienced a 2.5-hour global outage starting around 19:13 UTC. Users worldwide lost Internet access as their terminals failed to connect to a satellite, and instead began a cycle of reconnection attempts. The outage affected users across the globe within every geo where service is available, including North America, Europe, and Australia. To understand what caused such a widespread disruption, we examined the evidence from ThousandEyes’ extensive Internet monitoring data set.

What Happened During the Starlink Outage?

ThousandEyes monitoring observed unusual failure patterns during the Starlink outage, which suggested a system-wide issue rather than failures originating out of specific locations within Starlink’s network.

Starlink’s network of satellites is referred to as a constellation. This constellation is centrally coordinated and controlled via a software-defined control plane. User terminals are constantly handed off between clusters of satellites within this constellation, a process managed by the control plane, which is also responsible for real-time traffic engineering, seamless handoffs, and load balancing across thousands of moving nodes. When this control plane fails, it can systemically compromise the constellation’s ability to route traffic, creating a single point of failure with global impact—which appears to be what happened during the July 24 outage.

The failure of Starlink’s control plane during the incident manifested in a number of ways:

  • Inability to Connect to a Satellite: Globally, a large number of terminals were unable to establish any connection at all, remaining in a continuous state of searching for connection, indicating a failure to associate with a satellite.

Figure 1. End-user terminals searching for satellites during the outage

  • Backbone Routing Failures: Other terminals successfully connected to Starlink’s ground station infrastructure via satellite, but traffic could not route beyond it, suggesting issues within Starlink’s backbone network, a critical component of its service. This pattern indicated that while the physical satellite link may have been active at certain points, the network’s ability to forward traffic within its own network was compromised.

Figure 2. Multiple connection failure patterns observed in different regions

  • End-to-end Traffic Instability: At certain points during the incident, some user terminals appeared to successfully connect to a satellite and forward traffic to a destination via Starlink’s network. However, these paths were highly unstable, exhibiting significant packet loss before failing completely. This suggested the data plane (the infrastructure that forwards user traffic) was sporadically functional but unreliable.

Figure 3. The connection was briefly established, but all other endpoints didn’t connect
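The three failure patterns above can be expressed as a simple classification over per-terminal probe results. This is only a sketch: the `ProbeResult` fields, the labels and the 5% loss threshold are assumptions made for illustration, not how ThousandEyes records or classifies its data.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    satellite_link_up: bool        # terminal associated with a satellite
    reached_ground_station: bool   # traffic got as far as ground infrastructure
    reached_destination: bool      # traffic got all the way to the target
    packet_loss_pct: float         # loss seen on the end-to-end path


def classify(probe: ProbeResult, loss_threshold_pct: float = 5.0) -> str:
    """Map one probe result to one of the failure patterns described above."""
    if not probe.satellite_link_up:
        return "no-satellite-connection"
    if probe.reached_ground_station and not probe.reached_destination:
        return "backbone-routing-failure"
    if probe.reached_destination and probe.packet_loss_pct >= loss_threshold_pct:
        return "unstable-end-to-end"
    if probe.reached_destination:
        return "healthy"
    # Link is up but traffic never reached the ground station: not one of
    # the three patterns, so leave it unclassified.
    return "unclassified"
```

Seeing all three labels at once, across every region, is what points away from a local fault and toward something system-wide.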

Explore this outage further in the ThousandEyes platform (no login required).

Deducing the Cause: Hardware vs. Control Plane

The key to diagnosing the Starlink issue lies in comparing the outage characteristics against the known operational principles of a LEO constellation.

A satellite hardware failure, for instance, would be governed by the constellation’s physical mechanics. Because each Starlink satellite completes an orbit every 91-95 minutes, such a failure would manifest as a rolling disruption where service would degrade and recover across regions in cycles matching this orbital period—as faulty satellites passed overhead and were replaced by functional ones.

Instead, ThousandEyes observed:

  • A simultaneous global failure affecting all regions at once
  • A service breakdown lasting 2.5 hours, far exceeding the orbital period
  • No correlation between the outage timing and predictable satellite orbital paths

This evidence effectively rules out distributed hardware issues. Rather, the observed pattern—a sudden, global, and largely persistent failure of the network to direct traffic—is a typical signature of control plane-related disruptions. Control plane issues can trigger a variety of traffic behaviors, including erratic failure patterns and disruptions in different parts of a network.
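The orbital-period reasoning can be checked with a few lines of arithmetic. The sketch below uses Kepler's third law for a circular orbit; the 550 km altitude is an assumed, commonly cited figure for a Starlink shell rather than something stated in this analysis, and the second function is a deliberately crude heuristic, not ThousandEyes' method.

```python
import math

EARTH_RADIUS_KM = 6371.0        # mean Earth radius
EARTH_MU_KM3_S2 = 398600.4418   # Earth's standard gravitational parameter


def orbital_period_minutes(altitude_km: float) -> float:
    """Period of a circular orbit at the given altitude (Kepler's third law)."""
    semi_major_axis_km = EARTH_RADIUS_KM + altitude_km
    seconds = 2 * math.pi * math.sqrt(semi_major_axis_km ** 3 / EARTH_MU_KM3_S2)
    return seconds / 60


def consistent_with_rolling_satellite_fault(
    simultaneous_everywhere: bool,
    continuous_outage_minutes: float,
    period_minutes: float,
) -> bool:
    """Crude check: a faulty satellite should cause regional, cyclical
    disruption shorter than one orbit, not one continuous global outage."""
    return (not simultaneous_everywhere) and continuous_outage_minutes < period_minutes


period = orbital_period_minutes(550)  # assumed altitude
july_24 = consistent_with_rolling_satellite_fault(
    simultaneous_everywhere=True,
    continuous_outage_minutes=150,  # the 2.5-hour outage
    period_minutes=period,
)
```

At the assumed altitude the period comes out at roughly 95 minutes, so a continuous 150-minute outage seen everywhere at once does not fit the rolling pattern a hardware fault would produce.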

The recovery pattern also pointed to control plane issues. As service was restored, ThousandEyes observed a staggered, non-uniform recovery where routing paths were re-established intermittently, and terminals reconnected gradually, not following any clear regional pattern as a hardware issue might have. This behavior is consistent with a complex control plane re-initializing and re-establishing stable routing states across a dynamic, moving topology of thousands of satellites.

Figure 4. Staggered path recovery was observed as connectivity was restored

Insights From Official Statements

ThousandEyes’ findings align with Starlink’s own public statements. The Vice President of Engineering reported that “the outage was due to failure of key internal software services that operate the core network.” Subsequently, an industry report noted that Starlink owner SpaceX informed resellers that the issue stemmed from an “upgrade procedure” involving software rollout to Starlink’s “ground-based compute clusters,” which host the constellation’s control plane.

What Can NetOps Teams Learn From the Starlink Outage?

Starlink’s inter-constellation communication is a closed system that, unlike autonomous networks on the ground, is not designed to directly connect or interoperate with other providers. Traffic must ultimately be routed through Starlink’s IP network on the ground, before it can be handed to another network, such as a service provider or app/cloud provider. In effect, the constellation can become untethered from the Internet. This architecture has implications for enterprises seeking to incorporate Low Earth Orbit (LEO) satellite connectivity into their network architecture, highlighting important considerations for connectivity planning and risk management. Network IT operators should keep the following considerations in mind:

  • Design for Transport Diversity: Any network is subject to failure, even at global scale. True resilience requires transport diversity: combining satellite with fiber, cellular, or other connectivity types that are not subject to the same control plane.

  • Plan for Service-specific Failures: Traditional business continuity plans often focus on site-specific disasters (e.g., a fire or power outage), where geographic diversity offers protection. This incident highlights the need to plan for service-specific global failures, where all your locations could be impacted by a single incident. An updated plan should identify critical service dependencies and establish clear procedures for operating in a degraded or offline state.
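One way to act on the transport diversity point is to audit a link inventory for sites where every connection depends on the same control plane. The sketch below is hypothetical throughout: the site names, link names and control-plane labels are invented, and a real audit would need to establish those dependencies from provider documentation.

```python
from typing import Dict, List


def sites_without_transport_diversity(
    site_links: Dict[str, Dict[str, str]]
) -> List[str]:
    """Return sites whose links all depend on a single control-plane domain.

    `site_links` maps site name -> {link name: control-plane domain}.
    """
    return sorted(
        site
        for site, links in site_links.items()
        if len(set(links.values())) < 2
    )


# Hypothetical inventory for illustration only.
inventory = {
    "remote-mine": {"leo-primary": "leo-provider", "leo-backup": "leo-provider"},
    "branch-office": {"fiber": "isp-a", "leo-backup": "leo-provider"},
    "field-kit": {"leo-mini": "leo-provider", "cellular": "mobile-carrier"},
}

at_risk = sites_without_transport_diversity(inventory)
```

In this made-up inventory, the remote mine has two links but both ride on the same provider, so a single control plane failure would take the whole site offline.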

Guest Post: How Satellites Can Strengthen Your Digital Resilience

Posted in Commentary with tags on July 29, 2025 by itnerd

By Mike Hicks for Cisco ThousandEyes 

Summary

Explore the different types of satellite technology, their strengths and weaknesses, and the factors that must be considered when it comes to digital resilience.

We talk a lot about cloud computing, but there’s a connectivity layer way above the clouds that should be part of the conversation on achieving digital resilience: satellite connectivity.

Both Geostationary (GEO) and Low Earth Orbit (LEO) satellites can be used to deliver high-speed connectivity to remote areas that are underserved by fixed-line networks. But it’s not only people in rural areas who can benefit from satellite connections—they can also provide a very useful alternate option for customers of fixed-line services.

Whether as your primary connection or a backup circuit, there’s a lot to consider when it comes to satellite connectivity, so here we’re going to explore the different types of satellite technology, their strengths and weaknesses in different use cases, and the factors that must be considered when it comes to achieving digital resilience.

Geostationary Satellites

Let’s start with Geostationary satellites.

Geostationary (GEO) satellites orbit at the same rotational speed as Earth. This means that a GEO satellite completes a circular orbit around the Earth in 24 hours. As a result, the satellite’s position and coverage area remain fixed relative to a specific location or observer on the Earth’s surface. They do move occasionally, either because they’ve drifted slightly out of position and need to be moved back, or because they’re being shifted to a new location. The satellites have fuel on board to drive them when need be, but by and large they cover a specific footprint.

Geostationary satellites orbit at a huge distance from the Earth, approximately 22,000 miles (or 35,000 km) above the equator. That has both advantages and downsides. On the plus side, hovering at such height means each satellite can cover an enormous area on the ground; a single satellite can cover as much as a third of the Earth’s surface.

However, that level of altitude comes at the expense of responsiveness. Latency times can stretch from several hundred milliseconds to 1 second. That’s not a disaster for day-to-day web surfing, but it’s a huge problem for real-time applications such as video conferencing. The bandwidth of geostationary services, particularly on the uplink, is often also restricted to the tens of megabits per second, well below the gigabits per second you can achieve on fiber connections.

LEO Satellites

Geostationary satellites have in many ways been superseded by Low Earth Orbit (LEO) satellites, of which a well-known example is Starlink.

LEO satellites orbit at a much lower altitude, typically in the range of 310-745 miles (or 500-1,200 km). That means latency is greatly reduced, sometimes as low as 50 ms. That’s not quite fiber levels of latency, but it’s not a million miles off either. Bandwidth is considerably greater with LEO than geostationary too, with download speeds stretching into the hundreds of megabits per second.
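A quick physics check shows why altitude matters so much. The sketch below computes only the speed-of-light floor on round-trip time, assuming the satellite is directly overhead and the signal crosses the altitude four times (terminal to satellite to ground station, and back). The altitudes of 35,786 km for GEO and 550 km for LEO are assumed round figures, and real latency is always higher once slant angles, processing and terrestrial routing are added.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458


def min_round_trip_ms(altitude_km: float) -> float:
    """Speed-of-light lower bound on RTT via a satellite directly overhead.

    A request goes terminal -> satellite -> ground station, and the reply
    retraces the path, so the altitude is crossed four times.
    """
    return 4 * altitude_km / SPEED_OF_LIGHT_KM_S * 1000


geo_floor_ms = min_round_trip_ms(35_786)  # assumed GEO altitude
leo_floor_ms = min_round_trip_ms(550)     # assumed LEO altitude
```

The GEO floor lands in the high hundreds of milliseconds before anything else is counted, while the LEO floor is under 10 ms, which suggests that most of a 50 ms LEO round trip comes from factors other than the distance to the satellite.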

In contrast to Geostationary satellites, LEO satellites are always in motion. If you visit this map, you can see a mesmerizing real-time map of this enormous mesh of 6,000+ satellites traveling around the Earth.

However, this constant motion can create connectivity challenges. Just like your cell phone connects to different cell towers as you drive down a highway, a satellite receiver must also switch from one orbiting satellite to another as they pass over your location on Earth. We can see from ThousandEyes data—and even in the Starlink app—that at times, this can lead to brief disconnections, causing the receiver to momentarily lose connection with a passing satellite overhead.

How Are GEO and LEO Similar?

Beyond the characteristics we’ve already discussed, GEO and LEO satellite services share several connectivity traits.

For example, both are currently reliant on ground stations, which exchange signals with the satellites and connect them to the wider Internet. The satellites then relay this signal from the ground station to receivers in people’s homes or businesses, providing a two-way link.

The availability of ground stations is, therefore, a crucial component in service performance. The closer a user is to a ground station, the faster their data will reach the Internet backbone, because it has less distance to travel. For those living in remote areas, far away from the closest ground station, this can greatly increase latency or decrease bandwidth.

Additionally, both GEO and LEO satellites are susceptible to atmospheric conditions and weather. Factors like fog, heavy clouds, and lightning can disrupt the signal, and even snow accumulation on receiver equipment can negatively affect performance. For instance, to mitigate possible weather impacts, Starlink’s receiver, commonly known as “Dishy,” is equipped with heating elements to melt snow. A clear line of sight from the dish to the satellite is also essential; obstructions such as tree branches swaying in the wind can cause signal disruptions.

Building Digital Resilience

While satellite connectivity has some disadvantages when compared to fixed-line fiber, it also possesses unique strengths. This is why it is increasingly becoming a key component of business resilience planning.

There is not enough satellite bandwidth for communication provider networks to fully rely on satellite as a complete fallback option to serve their customers, but individual businesses and consumers can. In Perth, Australia, I use a fixed-line Internet connection complemented by Starlink. I have router equipment that combines the bandwidth from both connections, allowing me to match network characteristics to application requirements.

This exemplifies an important factor to consider when choosing among various types of Internet connectivity: understanding the characteristics of your applications and aligning them with the network’s capabilities. For instance, the Starlink connection offers significantly higher downstream bandwidth compared to my fixed-line connection, making it more suitable for activities like streaming video or general web browsing. In these scenarios, latency or occasional connection drops are not critical issues as most streaming services buffer a few minutes of video in advance to accommodate potential network interruptions.

When it comes to video conferencing or recording The Internet Report podcast, however, I usually opt for a fixed-line connection. For real-time applications like these, where I’m engaging in conversations with people around the world, it’s essential to minimize latency and ensure consistent service. Inconsistent lag or signal drops can lead to a degraded experience (not to mention an awkward conversation!). Therefore, I prioritize reducing latency and improving consistency, even if it means sacrificing some bandwidth, by using the network that best fits the requirements of these applications.
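The idea of matching applications to links can be sketched as a small selection rule: real-time traffic goes to the link with the lowest latency and jitter, while bulk or buffered traffic goes to the link with the most downstream bandwidth. The link figures below are invented for illustration and are not measurements of my own connections.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Link:
    name: str
    latency_ms: float
    jitter_ms: float
    downlink_mbps: float


def pick_link(links: List[Link], realtime: bool) -> Link:
    """Choose a link based on what the application cares about most."""
    if realtime:
        # Calls and recordings: minimize delay and its variability.
        return min(links, key=lambda link: link.latency_ms + link.jitter_ms)
    # Streaming and browsing: buffering hides latency, so favor bandwidth.
    return max(links, key=lambda link: link.downlink_mbps)


# Hypothetical figures for a fixed-line and a LEO satellite connection.
links = [
    Link("fixed-line", latency_ms=12.0, jitter_ms=2.0, downlink_mbps=50.0),
    Link("leo-satellite", latency_ms=45.0, jitter_ms=15.0, downlink_mbps=200.0),
]

video_call_link = pick_link(links, realtime=True)
streaming_link = pick_link(links, realtime=False)
```

A router that combines both connections can apply a rule like this per application, which is a simplified version of the choice described above.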

This comes back to a favorite theme of mine: having a holistic understanding of your overall service delivery chain. Resilience is about keeping the lights on, making sure you always have sufficient connectivity to meet your demands. Whether you opt for fixed-line as your primary connection and satellite as your fallback, or vice versa, will depend on your individual requirements.

One important requirement to consider is the need for reliable connectivity while on the move. Initially, Starlink focused on providing satellite connectivity to specific fixed locations. However, it now offers Starlink Mini, which allows you to take a portable unit with you when you travel. Additionally, special equipment is available for use on boats, where other coverage options might be limited. As a result, given these on-the-move use cases, satellite connectivity may be used in conjunction with 4G or 5G services, instead of relying solely on fixed-line connections. This creates an entirely different set of characteristics to compare.

Improving Traffic Flow

Although satellite connectivity presents challenges, particularly the risk of service interruptions, various traffic management measures are being developed to mitigate these risks and enhance resilience.

One such measure is the TCP BBR (Bottleneck Bandwidth and Round-trip propagation time) congestion control algorithm created by Google, which is utilized for services like YouTube and Google Cloud. Unlike traditional algorithms such as CUBIC, which rely on packet loss to detect congestion, BBR assesses the available bandwidth between the sender and receiver and optimizes the data transmission rate accordingly. It continuously monitors the round-trip time and adjusts the data rates to adapt to changing network conditions.

This approach is especially beneficial for high-latency connections like satellite Internet, as BBR aims to maximize throughput without significantly reducing transmission speeds when packet loss occurs, as older algorithms tended to do.

Google asserts that BBR helps maintain shorter network queues, which can reduce round-trip time by a third. This improvement positively affects response times in latency-sensitive applications, such as chat and gaming, which, as mentioned earlier, are not ideally suited for satellite links.
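To make the idea of keeping queues short a little more concrete, here is a minimal sketch of the bandwidth-delay product (BDP), the amount of in-flight data that fills a path without building a queue. BBR's published design aims to keep in-flight data close to this value. The function name and the example link figures (100 Mbps, 50 ms) are assumptions chosen for illustration, not measurements.

```python
def bdp_bytes(bottleneck_mbps: float, min_rtt_ms: float) -> float:
    """Bandwidth-delay product in bytes for a given bottleneck rate and minimum RTT."""
    bits_per_second = bottleneck_mbps * 1_000_000
    rtt_seconds = min_rtt_ms / 1000
    # Bits in flight during one round trip, converted to bytes.
    return bits_per_second * rtt_seconds / 8


# Hypothetical link: 100 Mbps bottleneck with a 50 ms minimum round-trip time.
print(f"{bdp_bytes(100, 50):,.0f} bytes in flight fills the path")
```

Data sent beyond the BDP generally sits in a buffer somewhere along the path, which adds delay without adding throughput.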

Space-age Resilience

It’s astounding that satellites traveling at 17,000 mph can enhance the reliability of your Internet connection, but it’s true.

Whether it’s as a backup link to a fixed-line connection or even as your primary connection in an area with limited access to fiber, satellite connectivity is now an affordable, high-speed alternative. Your individual needs and application characteristics must be considered carefully, but space really could provide that extra layer of connectivity resilience you’ve been looking for.

Guest Post: Internet & Cloud Research – The Factors Determining LEO Internet Performance

Posted in Commentary with tags on July 25, 2025 by itnerd

By Mike Hicks & Kemal Sanjta for Cisco ThousandEyes 

Summary

Dive into how LEO Internet through Starlink works, which factors determine the download speed and latency of an individual connection, and the difference that various congestion avoidance algorithms can have on the service’s performance.

Low Earth Orbit (LEO) Internet is a transformative technology that offers a cost-effective method for providing widespread coverage without requiring extensive ground infrastructure. This is particularly beneficial for sparsely populated areas where fixed-line broadband is often impractical or prohibitively expensive.

LEO satellite technology has the potential for low latency and high throughput, making it a viable option for various applications, including Earth observation and research. Consequently, customer interest has surged, leading to a competitive market with multiple companies providing similar services.

In this research, we use Starlink as a case study to examine factors influencing performance, such as throughput, latency, and how different congestion avoidance algorithms affect service quality. Our findings will demonstrate that not all Starlink connections perform uniformly.

How Starlink Works

Starlink is a massive and growing fleet of satellites traveling in low earth orbit, operated by SpaceX. At the time of writing, there are well over 6,000 Starlink satellites deployed, providing a mesh of coverage that spans more than 100 countries and several continents.

The satellites are deployed at altitudes ranging from 310-745 miles (500-1,200 km). This altitude is significantly lower than the geostationary satellites that preceded LEO satellites, which orbit at approximately 21,750 miles (35,000 km) above the Earth. This closer proximity to Earth means LEO technology can offer lower latency and faster speeds than geostationary Internet.
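As a rough illustration of why altitude matters, the sketch below estimates the minimum propagation delay for a simple "bent pipe" round trip (dish to satellite to ground station, and back). It assumes the satellite is directly overhead for both the dish and the ground station, that signals travel at the speed of light in a vacuum, and it ignores processing, queuing, and the terrestrial leg. The function names are hypothetical, and the altitudes are the ones quoted above.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum


def one_way_delay_ms(distance_km: float) -> float:
    """Propagation delay in milliseconds over a straight-line distance."""
    return distance_km / SPEED_OF_LIGHT_KM_S * 1000


def bent_pipe_rtt_ms(altitude_km: float) -> float:
    """Lower bound on RTT: four legs (up, down, up, down) at the given altitude."""
    return 4 * one_way_delay_ms(altitude_km)


for label, altitude_km in [("LEO, low", 500), ("LEO, high", 1_200), ("GEO", 35_000)]:
    print(f"{label}: at least {bent_pipe_rtt_ms(altitude_km):.1f} ms round trip")
```

Real-world latency will be higher than these floors, but the gap between the LEO and geostationary figures shows where the latency advantage comes from.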

Starlink customers connect to a network of satellites using their Starlink-supplied dish. Starlink offers Internet service for both residential and business customers, available as fixed or mobile options. 

The customer’s dish both sends data to and receives data from the satellites flying overhead within various frequency bands. Satellites connect with the rest of the Internet using Starlink’s network of ground stations.

Starlink has around 150 active ground stations, but these aren’t uniformly distributed across the planet. In some countries, such as the United Kingdom, there are several ground stations. In others, such as parts of Scandinavia, there are currently none. The significance of this will be discussed shortly.

The ground stations connect the satellite data via fiber to the company’s Points of Presence (POPs)—of which Starlink has many across the globe—and from there to the rest of the Internet.

The Ground Station and POP Impact

To understand the impact of ground stations and POPs on performance, we conducted thousands of throughput tests in locations worldwide, aiming to identify patterns in the performance of LEO Internet as provided by Starlink.

The first thing to note is that our speed tests revealed that Starlink consistently delivers on—or outperforms—its stated speeds in all of the locations that we tested. We tested on the residential fixed plan, with estimated download speeds of 25-100 Mbps, uploads of 5-10 Mbps, and latency of 25-60 ms. The average download speeds were in triple digits in almost all of the locations we tested, with some regions comfortably exceeding 250 Mbps. 

However, we did notice significant variations in speeds and latency, and some of that can probably be attributed to the proximity of ground stations and POPs. As we noted earlier, some countries have multiple ground stations, while others have none. Where there is no ground station nearby, the wireless signal between satellite and ground station has to travel further, which increases latency. We noted earlier that parts of Scandinavia have no ground stations, so it’s no great shock to see Stockholm as the test destination with the highest latency in Europe, albeit still within Starlink’s estimated bounds.

It’s also worth noting that the proximity of ground stations and POPs could become less relevant as time goes on. Why? Because the newer Starlink satellites are fitted with laser links called Inter-Satellite Links (ISL) that allow Starlink’s satellites to communicate directly with one another, rather than having to send data back and forth to the ground. This means that data can be relayed across the satellite network before reaching a ground station, allowing the service to operate in areas where ground stations aren’t available, such as in the polar regions.

There are also other potential reasons for the large discrepancies between regions that we saw in our tests. Obstructions in the dish’s line of sight to the satellite (such as tree branches swinging in the wind) can cause lower-than-expected performance, as we saw at our test location in Germany, for example. The Starlink app, though, highlights such obstructions, as shown in Figure 1.

Figure 1. Starlink application indicating the location of obstruction

Suboptimal peering strategies could also explain some of the variation, as could performance throttling when a particular satellite link or ground station is under heavy load. Satellite connectivity is also inherently a lossy technology; in other words, it typically suffers from much higher packet loss than fiber connections. This lossy characteristic leads us to the next part of our research.

Switching Congestion Algorithms

To minimize the impact of packet loss on performance, congestion algorithms such as CUBIC and BBR can play a critical role. CUBIC was designed to manage the effects of packet loss in high-speed, long-distance networks, whereas BBR (Bottleneck Bandwidth and Round-trip propagation time), developed by Google, is an algorithm designed to further optimize network utilization and throughput by continuously probing for available bandwidth. BBR adapts to increases in latency by gradually lowering the sending rate. This is in contrast to the CUBIC algorithm, which reduces the delivery speed when it detects packet loss.
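To illustrate how sensitive loss-based congestion control can be to packet loss, here is a small sketch of the classic Mathis et al. approximation for Reno-style TCP, where the throughput ceiling scales with MSS / (RTT × √loss). Note the assumption: this is not CUBIC's actual response function, which is designed to scale better on long, fast paths, so treat it only as an indication of the trend. The MSS, RTT, and loss values are hypothetical.

```python
import math


def mathis_ceiling_mbps(mss_bytes: int, rtt_ms: float, loss_rate: float) -> float:
    """Approximate throughput ceiling (Mbps) for Reno-style loss-based TCP."""
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    constant = math.sqrt(3 / 2)  # constant from the Mathis et al. model
    bits_per_second = (mss_bytes * 8) / (rtt_ms / 1000) * constant / math.sqrt(loss_rate)
    return bits_per_second / 1_000_000


# Hypothetical path: 1,460-byte segments and a 60 ms round-trip time.
for loss in (0.0001, 0.001, 0.01):
    print(f"loss {loss:.2%}: about {mathis_ceiling_mbps(1460, 60, loss):.1f} Mbps")
```

In this model, a hundredfold increase in loss cuts the ceiling tenfold, which hints at why an algorithm that does not treat every loss as congestion can behave so differently on a lossy satellite link.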

In our study on performance, we therefore conducted initial tests using the default congestion algorithm CUBIC, and then switched to BBR to compare results. Given that we controlled the environment end to end, we were able to enable BBR both on the client side (controlling egress traffic) and on the server side (controlling the client’s ingress traffic) to understand the benefits of using BBR in both directions.
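For readers who want to experiment, the following is a minimal sketch of one way to request a congestion control algorithm per socket on Linux, using the `TCP_CONGESTION` socket option exposed by Python's `socket` module. It is an illustration under stated assumptions, not the tooling used in this research: the helper names are hypothetical, the option is Linux-only, and BBR must be available in the running kernel. Because congestion control governs the sender, it needs to be set on whichever end is transmitting the data you care about.

```python
import socket
from pathlib import Path

# Linux exposes the algorithms currently available to the kernel here.
AVAILABLE_PATH = Path("/proc/sys/net/ipv4/tcp_available_congestion_control")


def parse_available(text: str) -> list[str]:
    """Split the whitespace-separated list of algorithm names."""
    return text.split()


def pick_algorithm(available: list[str], preferred=("bbr", "cubic")):
    """Return the first preferred algorithm that is available, or None."""
    for name in preferred:
        if name in available:
            return name
    return None


def set_congestion_control(sock: socket.socket, name: str) -> bool:
    """Try to set the algorithm on one socket; return False if unsupported."""
    option = getattr(socket, "TCP_CONGESTION", None)  # absent on non-Linux platforms
    if option is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, option, name.encode())
    except OSError:
        return False
    return True
```

On a Linux host, one might read `AVAILABLE_PATH`, pass the result through `parse_available` and `pick_algorithm`, and then call `set_congestion_control` on a socket before connecting.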

Our tests spanned multiple locations globally, targeting dedicated servers at major points where we had Starlink dishes deployed. In the United States, we deployed dedicated, non-throttled servers in US East (Virginia), US Central (Iowa), and US West (Oregon). In Europe, we had dedicated servers in EU West (London, U.K.) and EU Central (Frankfurt, Germany). Lastly, in Australia, we deployed our testing server in AU East (Sydney). 

The results when we switched to BBR were startling. The download throughput between our Georgetown, Texas, agent and the U.S. West Coast data center, for example, improved almost ten-fold. Between Weinstadt, Germany, with its partially obstructed link to the satellite, and the EU Central data center, the download throughput increased by a staggering 18.4 times with BBR enabled.

We saw improved performance on the uplink too, with anywhere between a 1.2-fold and 3.4-fold improvement in upload speeds when BBR was activated.

CUBIC and BBR Throughput Differences

The results listed below are based on sustained throughput measurements as part of separately testing ingress and egress traffic. We are showing results that were obtained over 7,200 data points and thus represent a good indication of what to expect throughput-wise over longer time periods and for larger data transfers. 

Results for the United States

As shown in Table 1, with the default congestion algorithm, CUBIC, Selkirk, NY achieved the highest download speed of 40.102 Mbps, despite having the highest latency of 82.662 ms. North Bend, WA recorded the highest upload speed at 6.773 Mbps with the lowest latency of 56.772 ms. In contrast, Georgetown, TX had the poorest performance, with download speeds of 10.860 Mbps and upload speeds of 4.902 Mbps.

After switching to the BBR congestion algorithm, all locations demonstrated significant improvements. Notably, Georgetown’s download speed increased dramatically from 10.860 Mbps to 106.668 Mbps, representing a remarkable 9.8-fold improvement. Additionally, Selkirk experienced the most substantial increase in upload speed, rising from 5.631 Mbps to 19.404 Mbps, which reflects a 3.4-fold increase.

Table 1. Throughput differences between CUBIC and BBR when testing with a server hosted in US West
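The fold improvements quoted for the US West tests are simply the BBR throughput divided by the CUBIC throughput. A short sketch, using only the figures stated in the text above (the function and variable names are illustrative):

```python
def fold_improvement(cubic_mbps: float, bbr_mbps: float) -> float:
    """Ratio of BBR throughput to CUBIC throughput."""
    return bbr_mbps / cubic_mbps


# (CUBIC Mbps, BBR Mbps) as quoted for the US West server tests.
us_west = {
    "Georgetown, TX download": (10.860, 106.668),
    "Selkirk, NY upload": (5.631, 19.404),
}

for label, (cubic, bbr) in us_west.items():
    print(f"{label}: {fold_improvement(cubic, bbr):.1f}x")
```

The same calculation applies to the figures quoted for the other regions.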

As shown in Table 2, when testing against the dedicated, non-throttled server in US Central, Selkirk, NY, recorded the highest download speed at 36.177 Mbps, along with an upload speed of 6.801 Mbps and the lowest latency at 50.664 ms. In contrast, Georgetown, TX, was among the poorest performers, delivering the lowest download speed at 17.049 Mbps. Additionally, San Francisco, CA, registered the lowest upload speed of 4.509 Mbps.

Switching from the CUBIC to the BBR congestion control algorithm resulted in significant improvements. The agent in North Bend, WA, experienced a remarkable 7.7-fold increase in download speeds, rising from 17.458 Mbps to 133.741 Mbps. Furthermore, North Bend, WA, also witnessed the largest enhancement in upload speeds, improving 3.3-fold from 4.651 Mbps to 15.736 Mbps.

Table 2. Throughput differences when testing to US Central

Testing with a server located in US East showed that Selkirk had the highest download speed at 74.247 Mbps and the highest upload speed at 11.449 Mbps, along with the lowest latency of 32.210 ms. This emphasizes the importance of being close to the POP to which the dish is assigned. In contrast, North Bend, WA performed the worst, recording the lowest download speed at 12.436 Mbps and the lowest upload speed at 3.983 Mbps, along with the highest latency of 115.788 ms. The results for North Bend are to be expected, given the geographical characteristics of the dish’s deployment and the testing server’s location.

Table 3. Throughput differences when testing to US East

Results for Europe

Testing the EU West region while using CUBIC as the congestion avoidance algorithm revealed that Weinstadt, DE achieved the highest download speed at 39.434 Mbps, while Jaen, ES recorded the highest upload speed at 8.840 Mbps. Epe, NL had the lowest download speed at 16.454 Mbps, and Weinstadt recorded the lowest upload speed at 6.353 Mbps. Interestingly, Weinstadt exhibited both the highest download and the lowest upload speeds. We attribute these discrepancies to the fact that the testing agent faced physical obstructions to the clear sky during the tests.

Switching to the BBR algorithm resulted in improved speed values across all locations, with the most significant improvement observed in Epe, NL, which experienced a 17.2-fold increase in download speeds—from 16.454 Mbps to 283.013 Mbps. Despite the obstructions, Weinstadt, DE saw a 2.5-fold increase in upload speeds, rising from 6.353 Mbps to 16.369 Mbps.

Table 4. Throughput results when testing to EU West

As shown in Table 5, the testing conducted against the EU Central server revealed that Epe, NL achieved the best results for both download (76.010 Mbps) and upload (10.975 Mbps) speeds. In contrast, Weinstadt, DE, despite having the lowest latency (27.251 ms) to the testing server, performed the worst, with a download speed of only 6.336 Mbps and an upload speed of 4.820 Mbps. This poor performance can be attributed to the physical obstruction that hindered the dish’s view of the sky.

After switching to BBR, Weinstadt, DE saw a significant improvement in its performance. Download speeds increased dramatically from 6.336 Mbps to 117.049 Mbps, marking an impressive 18.4-fold increase. Upload speeds also improved substantially, rising from 4.820 Mbps to 14.123 Mbps, a 2.9-fold increase. What makes these results even more remarkable is that the agent was still physically obstructed during this assessment, further underscoring the advantages of BBR over CUBIC.

Table 5. Throughput results when testing to EU Central

Results for Australia

When testing against the AU East server with CUBIC, Brookvale recorded the highest download speed at 61.367 Mbps and the highest upload speed at 9.862 Mbps, along with the lowest latency of 27.642 ms. In contrast, Perth experienced the highest latency at 88.038 ms. Erskineville had the lowest download speed at 33.199 Mbps, while Perth also had the lowest upload speed at 5.972 Mbps. This data further illustrates that physical proximity to the assigned POP significantly impacts performance.

Switching to BBR resulted in substantial improvements across all locations, with a notable highlight being Erskineville’s download speed increase of 7.9-fold, improving from 33.199 Mbps to 264.460 Mbps. For uploads, Perth experienced the largest increase of 2.1-fold, rising from 5.972 Mbps using CUBIC to 12.988 Mbps with BBR.

While the results after switching to BBR are significant, there are a couple of important points to consider before we all rush to switch to BBR on our LEO satellite connections. The speed tests we conducted were based on raw throughput, not application data. While BBR can provide higher throughput, it can also create issues such as bufferbloat and higher retransmission rates, especially in lossy network environments such as satellite connections.

By switching to BBR, you might actually be pushing the problem of retransmissions back to the application server, because it’s effectively saying: “I have a gap in my data, so you need to send that through again,” whereas CUBIC would likely slow down the rate of transmission to maximize the chances of getting all the data you need in the first place.

Therefore, until we can leverage real application data to perform tests on LEO connectivity over Starlink, it’s a little premature to suggest that switching to BBR is the performance panacea that it may first appear to be.
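One simple way to think about the gap between raw throughput and what an application actually receives is goodput: the share of transmitted data that is not a retransmission. The sketch below uses a deliberately simple model with invented numbers (they are not measurements from this research), and the function name is hypothetical.

```python
def goodput_mbps(throughput_mbps: float, retransmit_fraction: float) -> float:
    """Useful throughput after discounting the share of retransmitted data."""
    if not 0 <= retransmit_fraction < 1:
        raise ValueError("retransmit_fraction must be in [0, 1)")
    return throughput_mbps * (1 - retransmit_fraction)


# Invented scenarios for illustration only.
scenarios = {
    "higher raw throughput, more retransmissions": (100.0, 0.15),
    "lower raw throughput, few retransmissions": (40.0, 0.01),
}

for label, (raw, retransmits) in scenarios.items():
    print(f"{label}: {goodput_mbps(raw, retransmits):.1f} Mbps of useful data")
```

Whether the retransmission overhead outweighs the raw throughput gain depends on the real retransmission rates and on how the application reacts to them, which is exactly what application-level testing would need to establish.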

The Next Step

The ability to demonstrate increased throughput with BBR indicates that satellite links possess characteristics well-suited for BBR’s hybrid approach, which combines bandwidth efficiency with control over latency caused by buffering. This underscores BBR’s potential to optimize LEO satellite communications and highlights its adaptability to distinct network conditions while effectively managing latency.

The next step for our research is to answer questions that revolve around how different applications react to varying amounts and spikes of packet loss. What would the impact be of switching to BBR when using LEO Internet? How would it affect application performance? And even if it did offer improved performance, would the associated costs of retransmission make it prohibitive to implement?

LEO Internet is a fascinating technology with its own unique characteristics. As with everything we test, you have to consider the full service delivery chain to truly understand its implications.

AI could double the strain or solve it says Cisco

Posted in Commentary with tags on June 30, 2025 by itnerd

With companies increasingly pouring funds into AI, recent research from Cisco points to a major infrastructure shift across enterprise networks — AI could double the strain or solve it.

Here’s a snip from the press release that is tied to the report:

As AI assistants, agents, and data-driven workloads reshape how work gets done, they’re creating faster, more dynamic, more latency-sensitive, and more complex network traffic.

Combined with the ubiquity of connected devices, 24/7 uptime demands, and intensifying security threats, these shifts are driving infrastructure to adapt and evolve. The result: IT leaders are changing how they think about the network: what it is, what it enables, and how it protects the organization. The network they build today will decide the business they become tomorrow.

You can view the report here and it is worth your time to read if your responsibility covers this.

Cisco Has Apparently Had A Data Breach

Posted in Commentary with tags on February 11, 2025 by itnerd

Cybersecurity News is reporting that Cisco has suffered a data breach linked to the Kraken ransomware group with sensitive credentials from its internal network and domain infrastructure leaked online. Additional details here:

https://cyberpress.org/cisco-data-breach-2/

Jim Routh, Chief Trust Officer at cybersecurity company Saviynt, commented:

“I had an opportunity to speak to other CISOs about this incident last week. The prevailing viewpoint is that this is a highly sophisticated ransomware-as-a-service attack that took many months of diligent work by the threat actor using sophisticated tools. One of the tools used, Mimikatz, is designed to extract credentials from Microsoft Active Directory and is only accessible by privileged users or threat actors using credentials from those with privilege. Mimikatz is both an exploit on Microsoft Windows that extracts passwords stored in memory and software that performs that exploit. It was created by French programmer Benjamin Delpy and is French slang for “cute cats” (Wikipedia).

“The most effective set of controls to manage this risk is within privilege user monitoring (PAM) with an added component for continuous validation. The continuous validation, in this case, is measuring the deviation in established on-line patterns for privileged users (while using the privilege) and revoking the privilege automatically in milliseconds when the deviation score of the patterns triggers it. This approach (continuous validation) on top of PAM is not widely used today and there are only a few commercial products developing this capability.” 

This incident shows that even the big guys can get pwned if they don’t have proper controls in place. Which illustrates why you need to do everything possible, no matter how difficult, to keep yourself from getting pwned.