Thursday, September 3, 2009

Understanding BGP Misconfiguration

Summary

This paper presents a quantitative study of BGP misconfiguration. The authors spent 3 weeks analyzing routing table advertisements to identify potential misconfigurations. After the 3 weeks, they used the final routing tables to poll a random host within each AS to see whether it was reachable. If it wasn't reachable, they emailed the network operators to get more information about the misconfiguration.

Their study focused on 2 types of misconfiguration: origin and export. An origin misconfiguration is when an AS accidentally introduces a route into the global BGP tables. To find these, they looked at short-lived routes, on the assumption that an operator would quickly correct a configuration error. An export misconfiguration is when an AS accidentally exports to a peer a route that its own policy says should not be exported. To find these, they used Gao's algorithm to infer the relationships between ASes and then looked for short-lived paths that violated the "valley free" property (a path should not go down from a provider to a customer and then back up to another provider or across to a peer).
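To make the "valley free" test concrete, here is a minimal sketch in Python. It is not the authors' code: the relationship labels, the `is_valley_free` function, and the AS numbers are all made up for illustration, and it assumes the relationships have already been inferred (for example by Gao's algorithm) and that the path is listed in the order the announcement travels, origin first.

```python
from typing import Dict, List, Tuple

# Hypothetical labels for a link (a, b), seen from a's side.
C2P = "customer-to-provider"   # announcement moves uphill
P2P = "peer-to-peer"           # announcement moves sideways
P2C = "provider-to-customer"   # announcement moves downhill

Relationships = Dict[Tuple[int, int], str]


def is_valley_free(path: List[int], rel: Relationships) -> bool:
    """Return True if the path (origin AS first) is valley free.

    A valley-free path is zero or more uphill links, then at most one
    peer link, then zero or more downhill links.
    """
    past_peak = False  # True once a peer or downhill link has been crossed
    for a, b in zip(path, path[1:]):
        link = rel.get((a, b))
        if link is None:
            raise KeyError(f"no inferred relationship for link {a}->{b}")
        if link == C2P:
            if past_peak:
                return False  # climbing again after the peak: a valley
        elif link == P2P:
            if past_peak:
                return False  # a second peer link, or a peer link after descending
            past_peak = True
        else:  # P2C
            past_peak = True
    return True


# Made-up topology: AS 4 is a customer of both AS 3 and AS 5.
rel = {(1, 2): C2P, (2, 3): P2P, (3, 4): P2C, (4, 5): C2P}
print(is_valley_free([1, 2, 3, 4], rel))     # up, across, down
print(is_valley_free([1, 2, 3, 4, 5], rel))  # AS 4 leaks a provider route to another provider
```

The second path is the kind of export the study would flag: a customer passing a route learned from one provider on to another provider.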

Once they had the potential misconfigurations, they needed to confirm them. They emailed the network operators they could reach to ask whether each incident was in fact a misconfiguration. They backed this up by trying to reach random hosts in each AS to see if the AS was still live.

In their results, they found that at least 72% of new routes seen in a day resulted from an origin misconfiguration. Out of those, only 4% resulted in a loss of connectivity. In terms of their duration, over half lasted less than 10 minutes and 80% were corrected within the hour.
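As a rough illustration of how duration figures like these could be computed, here is a small Python sketch. The incident data and the `fraction_shorter_than` helper are invented for the example; the paper's actual data and tooling are not reproduced here.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# (announced_at, withdrawn_at) for each suspected incident.
Incident = Tuple[datetime, datetime]


def fraction_shorter_than(incidents: List[Incident], limit: timedelta) -> float:
    """Fraction of incidents that lasted less than `limit`."""
    if not incidents:
        return 0.0
    short = sum(1 for start, end in incidents if end - start < limit)
    return short / len(incidents)


# Made-up durations in minutes, for illustration only.
t0 = datetime(2009, 9, 3, 12, 0)
incidents = [(t0, t0 + timedelta(minutes=m))
             for m in (2, 4, 5, 7, 8, 9, 30, 45, 120, 600)]

print(fraction_shorter_than(incidents, timedelta(minutes=10)))  # 0.6
print(fraction_shorter_than(incidents, timedelta(hours=1)))     # 0.8
```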

Then, the authors explain the various causes of origin misconfiguration (initialization bugs, old configuration, hijacks, forgotten filter, etc.) and export misconfiguration (prefix based configuration, bad ACL).

Finally, the authors describe several ways to mitigate the problem of misconfiguration. The first is improving the human interface design by using principles such as safe defaults and a large edit distance between the wrong and correct versions of a command. The second is to give operators a high-level language for configuring routers in place of the error-prone low-level one. The third is to do more consistency checks on the data directly in the routers. The last is to extend the protocol, as S-BGP does.
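To show what a "safe default" might look like, here is a minimal Python sketch of an export filter that announces nothing unless a prefix has been explicitly allowed, so a forgotten filter leaks no routes. This is my own illustration, not something from the paper or from any router's configuration language: `export_filter` and the prefixes (taken from the documentation address ranges) are hypothetical.

```python
import ipaddress
from typing import Iterable, List


def export_filter(candidates: Iterable[str], allowed: Iterable[str]) -> List[str]:
    """Return only the candidate prefixes that are explicitly allowed.

    Deny-by-default: with an empty allow list, nothing is exported.
    """
    allowed_nets = {ipaddress.ip_network(p) for p in allowed}
    return [p for p in candidates if ipaddress.ip_network(p) in allowed_nets]


candidates = ["198.51.100.0/24", "203.0.113.0/24"]

# The operator listed one prefix: only that one goes out.
print(export_filter(candidates, ["198.51.100.0/24"]))  # ['198.51.100.0/24']

# The operator forgot the list entirely: nothing goes out.
print(export_filter(candidates, []))                   # []
```

The design choice is that an omission fails closed. The opposite default (export everything unless told otherwise) would turn the same omission into a leak.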

Criticisms & Questions

One of their statistics is that about 75% of all new prefix advertisements were the result of misconfigurations. However, they say that only 1 in 25 of these affects connectivity. I'm curious about what happens to the remaining advertisements that don't affect connectivity. Are they simply ignored by the router and never inserted in the BGP table, or are they added and just don't affect connectivity? I was fairly surprised that such a high fraction of advertisements are the result of misconfigurations.

When talking about the causes of the misconfigurations, the authors focused on slips and mistakes, neither of which is malicious. However, I would like to know whether any of the misconfigurations were deliberate attacks and, if so, how many. It seems like an attacker or a shady operator could intentionally cause problems this way.

Feedback

I enjoyed reading the paper and the authors did a good job answering many of the concerns I had as I read the paper.

1 comment:

  1. Because of route filtering, and because most of the misconfigurations occur in edge networks, the misconfigured routes don't propagate very far in the network.
