Is BGP Update Storm a Sign of Trouble: Observing the Internet Control and Data Planes During Internet Worms

I recall doing some analysis when SQLSlammer hit, looking over the routing data I had available to me. It was a pretty disruptive 36 hours or so, honestly, and across the changes in the network I saw an average of three BGP messages for every network that was affected. This apparently mapped to a BGP withdrawal, an announcement, and then a path update. What's interesting in this work is that they line up such BGP data with reachability measurements, which I did not have access to at the time.
There are considerable reasons to wish to understand the relationship between the Internet’s control and data planes in times of stress. For example, the much publicized Internet worms (Code Red, Nimda and SQL Slammer) caused BGP storms, but there has been comparatively little study of whether the storms impacted network performance. In this paper, we study these worm events and see whether the BGP storms observed during the worms actually corresponded to problems in the Internet’s data plane. By processing and analyzing two datasets from RIPE, we have found that while BGP update storms occurred in all three worms, the performance of the data plane degraded during the Slammer worm but did not during Code Red and Nimda. No direct correlation should be drawn between the degradation of the Internet data plane and the occurrence of a BGP update storm; it may not be a sign of trouble but a sign of the Internet control plane doing its job.
Source: Is BGP Update Storm a Sign of Trouble: Observing the Internet Control and Data Planes During Internet Worms, by Matthew Roughan, Jun Li, Randy Bush, Zhuoqing Mao and Timothy Griffin, Proceedings of SPECTS 2006.

September 15, 2006 in Nimda, papers, routing, SQLSlammer

Early Detection of BGP Instabilities Resulting from Internet Worm Attacks

This is an interesting proposal, but I'm not sure that routing disruptions are the right place to detect the spread of a worm. After all, the preceding days' posts showed how large the routing disruptions can be, but there's always some BGP disruption going on. What's more, only a small number of worms have truly impacted BGP routing tables.

The increasing incidences of worm attacks in the Internet and the resulting instabilities in the global routing properties of the Border Gateway Protocol (BGP) routers pose a serious threat to the connectivity and the ability of the Internet to deliver data correctly. In this paper we propose a mechanism to detect/predict the onset of such instabilities which can then enable the timely execution of preventive strategies in order to minimize the damage caused by the worm. Our technique is based on online statistical methods relying on sequential change-point and persistence filter based detection algorithms. Our technique is validated using a year's worth of real traces collected from BGP routers in the Internet that we use to detect/predict the global routing instabilities corresponding to the Code Red II, Nimda and SQL Slammer worms.

Source: Early Detection of BGP Instabilities Resulting from Internet Worm Attacks, S. Deshpande, M. Thottan, B. Sikdar.
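The abstract doesn't spell out the authors' algorithms, so the following is only a rough sketch of the general idea: a one-sided CUSUM change-point test on per-interval BGP update counts, followed by a simple persistence filter that only raises an alarm once the statistic has stayed above threshold for several consecutive intervals. The function name, every parameter value, and the clipping step are my own assumptions for illustration, not anything taken from the paper.

```python
from statistics import mean, pstdev


def detect_onset(counts, baseline_len=20, drift=0.5, threshold=5.0,
                 persistence=3, clip=3.0):
    """Return the index at which a sustained upward shift is confirmed, or None.

    counts: BGP update counts per measurement interval (hypothetical input).
    The first baseline_len intervals are assumed to be free of worm activity.
    """
    baseline = counts[:baseline_len]
    mu = mean(baseline)
    sigma = pstdev(baseline) or 1.0  # fall back to 1.0 on a perfectly flat baseline

    cusum = 0.0  # one-sided CUSUM statistic, never allowed below zero
    run = 0      # consecutive intervals with the statistic above threshold
    for i in range(baseline_len, len(counts)):
        # Standardize, and clip so a single outlier adds a bounded amount
        # (the clipping is an assumption made for this sketch).
        z = min((counts[i] - mu) / sigma, clip)
        cusum = max(0.0, cusum + z - drift)
        # Persistence filter: require several consecutive exceedances.
        run = run + 1 if cusum > threshold else 0
        if run >= persistence:
            return i
    return None


if __name__ == "__main__":
    quiet = [100, 102, 98, 101, 99] * 4
    print(detect_onset(quiet + [100, 500, 99, 101, 100]))  # one isolated spike
    print(detect_onset(quiet + [400] * 10))                # sustained surge
```

With these made-up numbers the isolated spike is ignored and the sustained surge is flagged a few intervals after it begins. That delay is the price of the persistence filter, and it is exactly the trade-off that makes me wonder how "early" such detection can be given the constant background of BGP churn.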

September 30, 2005 in Code Red, detection, Nimda, papers, routing, SQLSlammer

Observation and Analysis of BGP Behavior Under Stress

Continuing with the theme of a worm outbreak's effect on routing, here is a NANOG presentation on the effect of Code Red and Nimda on routing in September 2001.

Despite BGP's critical importance as the de-facto Internet inter-domain routing protocol, there is little understanding of how BGP actually performs under stressful conditions when dependable routing is most needed. In this paper, we examine BGP's behavior during one stressful period, the Code Red/Nimda attack on September 18, 2001.

The attack was correlated with a 30-fold increase in BGP update messages at a monitoring point that peers with a number of Internet service providers. Our examination of BGP's behavior during the event concludes that BGP exhibited no significant abnormality, and that over 40% of the observed updates can be attributed to the monitoring artifact in current BGP measurement settings.

Our analysis, however, does reveal several weak points in both the protocol and its implementation, such as BGP's sensitivity to transport session reliability, its inability to avoid the global propagation of small local changes, and certain implementation features whose otherwise benign effects are only amplified under stressful conditions. We also identify areas for improvement in the current network measurement and monitoring effort.

Source: Abstract: Observation and Analysis of BGP Behavior Under Stress, Lan Wang, Xiaoliang Zhao, Dan Pei, Randy Bush, Daniel Massey, Allison Mankin, Felix Wu, Lixia Zhang.

September 29, 2005 in Code Red, Nimda, routing, slides

Routing Instabilities and Worms

Several studies have been performed on global routing instabilities caused by worm outbreaks.

Instability can be induced from without as well as within. In July 2001 the Internet worm known as Code Red 2 (CRv2) spread across the globe. It exploited holes in the Microsoft IIS server to gain access and port itself to a machine. Once on-board it randomly chose IP addresses looking for other computers running IIS with the same access vulnerability, and continued the propagation. The intensity of the attack could be monitored as a function of time by sniffing network traffic in a given domain for scans (i.e. a particular type of connection request) that were characteristic of the worm. Another worm, Nimda, struck in September 2001. It propagated in a similar fashion, using a larger arsenal of vulnerabilities and a faster means of spreading across the Internet. Temporal correlation between these two worm attacks and routing instability was noticed by researchers at Renesys Corporation[2]. Figure 1 illustrates an example that plots the number of route withdrawals per 30 second time interval observed at one router, against the rate of "probes" generated by Nimda in the same time period, observed in one network. It turns out that this spike in withdrawals occurs in a wave, across routers situated all over the globe. Recall that a router withdraws a route to an AS only when it cannot reach that AS. The global wave shows that somehow the traffic induced by the worm caused a wave of distress among the routers.

Source: Challenges In Using Simulation to Explain Global Routing Instabilities, David M. Nicol.
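The comparison described in that excerpt (withdrawals per 30 second interval against worm probe rate over the same period) can be sketched roughly as follows. This is not the Renesys or Nicol tooling, just a minimal illustration. It assumes both event streams are available as integer Unix-style timestamps in seconds, and the function names and sample data are hypothetical.

```python
from math import sqrt


def bin_counts(timestamps, start, end, width=30):
    """Count events per fixed-width interval over [start, end)."""
    n_bins = (end - start + width - 1) // width
    counts = [0] * n_bins
    for t in timestamps:
        if start <= t < end:
            counts[(t - start) // width] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    # Made-up timestamps (seconds) purely for illustration.
    withdrawals = [1, 5, 31, 32, 33, 61, 62, 63, 64, 65, 66, 95]
    probes = [0, 10, 40, 70, 71, 72, 73, 100]
    w = bin_counts(withdrawals, 0, 120)
    p = bin_counts(probes, 0, 120)
    print(w, p, round(pearson(w, p), 3))
```

A high coefficient only says the two series rise and fall together in time. As the excerpt itself hints with "somehow", it does not explain the mechanism.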

Additional measurements have been made for specific worm outbreaks:

This paper examines the surge in BGP updates that coincide with events such as the recent Internet worm attacks. Although the Internet routing infrastructure was not a direct target of the January 2003 Slammer worm attack, the worm attack coincided in time with a large increase in the number of BGP routing update messages observed globally. Our analysis shows that the current global routing protocol BGP allows local connectivity dynamics to propagate globally. As a result, any small number of edge networks can potentially cause wide-scale routing overload. For example, two small edge ASes, which announced less than 0.25% of BGP routing table entries, contributed over 6% of total update messages during the worm attack as observed at the major monitoring points. Although BGP route flap damping has been proposed to eliminate such undesirable global consequences of edge instability, our analysis shows that damping has not been fully deployed even within the Internet core. Our simulation further reveals that partial deployment of BGP damping not only has limited effect but may also worsen the routing performance under certain topological conditions. The results show that it remains a research challenge to design a routing protocol that can prevent local dynamics from triggering global messages in order to scale well in a large, dynamic environment.

Source: Analysis of BGP Update Surge during Slammer Worm Attack, Mohit Lad, Xiaoliang Zhao, Beichuan Zhang, Dan Massey, and Lixia Zhang.
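To put the abstract's two percentages side by side, here is a small back-of-the-envelope calculation. The helper name is hypothetical. The inputs are the bounds quoted above ("less than 0.25%" of table entries, "over 6%" of updates), so the result is a lower bound on how over-represented those two ASes were.

```python
def overrepresentation(update_share, prefix_share):
    """Ratio of an AS group's share of updates to its share of table entries."""
    if prefix_share <= 0:
        raise ValueError("prefix_share must be positive")
    return update_share / prefix_share


if __name__ == "__main__":
    # Bounds taken from the abstract: >6% of updates from <0.25% of entries.
    factor = overrepresentation(0.06, 0.0025)
    print(f"at least {factor:.0f} times their proportional share of updates")
```

In other words, those two edge ASes generated updates at a rate of at least about 24 times what their share of the routing table would suggest.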

We speculate that, although most of the traffic in the Internet continued to flow normally through the small fraction of links that make up the global backbones, most of the links at the Internet edge had serious performance problems during the worms' probing and propagation phases. A complete list of reasons still needs to be documented, but we suspect i) congestion-induced failures of BGP sessions due to timeouts; ii) flow-diversity induced failures of BGP sessions due to router CPU overloads; iii) proactive disconnection of certain networks; and iv) failures of other equipment at the Internet edge such as DSL routers and other devices.

Source: Global Routing Instabilities during Code Red II and Nimda Worm Propagation, James Cowie, Andy Ogielski, BJ Premore and Yougu Yuan.

Given the number of worm outbreaks in the past four years, it is rare for global routing instability to ensue. However, when it does happen, several thousand BGP routes are dropped and up to 36 hours pass before stability is fully restored. An interesting phenomenon.

December 10, 2004 in Nimda, papers, routing

fewer worms through fewer bugs?

we all know this story that's seeing renewed life in the context of the worm problem:

"Most software development processes used today do not incorporate effective tests, checks or safeguards to detect those software coding defects that result in product vulnerabilities."

source: Will code-check tools make for worm-proof software?, CNet News, May 26, 2004.

while this is true, and many devastating worms have been carried through buffer overflows, this isn't going to be the panacea some hope it will be. yes, detect and fix bugs. i'm all for static analysis (and dynamic analysis) of code to spot problems. the state of the art is improving, and tools are getting better every year. they should be used.

fix bugs and you fix security problems in the process (OpenBSD is a stunning example of this approach). static analysis tools will help there. i'm certainly not saying this shouldn't be done.

but they won't lead to "worm proof software." not all worms spread through buffer overflows. sure, Sapphire, Code Red, Sasser, Blaster, and Nimda all used buffer overflows as exploit vectors in their propagation. however, plenty of effective worms continue to be found which don't use such methods. the phatbot/agobot toolkit, for example, uses guessable or empty passwords as one of its attack methods. Nimda used a Unicode misinterpretation bug as one of its techniques (to get to a command shell). etc etc etc ... this doesn't even count mail-based worms, where all you have to do is get someone to click on an attachment and they're executing arbitrary code in a remote context.

static analysis for buzzword compliant bugs, such as overflows, will stop some worms. but it's not even the low hanging fruit of bugs. until static analysis tools can identify basic logic or default configuration errors, the worm problem won't go away.

May 26, 2004 in Nimda, tools