Tuesday, November 24, 2009

BotGraph: Large Scale Spamming Botnet Detection

Summary

This paper proposes BotGraph, a system that detects and maps botnet spamming attacks against major Web email providers. BotGraph includes an efficient implementation for constructing and analyzing large graphs.

BotGraph has 2 main components: aggressive sign-up detection and stealthy bot-user detection. First, BotGraph detects aggressive account signup by looking for a sudden increase in sign-ups from a single IP address. BotGraph uses this information to limit the number of email accounts the botnet can control.

The second component identifies the stealthy bot-user accounts controlled by the botnet. To do this, BotGraph builds a user-user graph, where every node is a user and every edge has a weight representing how similar the 2 users are. BotGraph uses the logic that all the accounts in a bot-user group are driven by the same spamming toolkit and so must have similar behaviors. Therefore, with an appropriate similarity metric, a bot-user group appears as a connected component in the user-user graph.

The similarity metric that BotGraph uses is the number of IP addresses from which both users have logged in. BotGraph relies on the following 2 properties of botnets:
  • The sharing of one IP address: the ratio of bot-user accounts to bots is about 50:1, so many bot-user accounts log in through the same bot, resulting in a shared IP address
  • The sharing of multiple IP addresses: botnets have a high churn rate, so bots are added to and removed from the botnet often. To maximize account utilization, the spammer reassigns accounts to different bots, resulting in each account being used from multiple IP addresses.
The authors implemented their graph creation algorithm using Hotmail logs from two one-month periods in 2007 and 2008. The logs contained a signup log and a login log. After constructing the user-user graph, they found 26 million botnet-created user accounts with a false positive rate of 0.44%.
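To make the graph construction above concrete, here is a minimal sketch of the user-user graph and connected-component extraction. The record format, the edge-weight threshold, and the brute-force pairwise loop are illustrative assumptions; the paper's actual implementation is a distributed and far more efficient computation.

```python
from collections import defaultdict
from itertools import combinations

def botnet_groups(logins, min_shared_ips=2):
    """logins: iterable of (user, ip) login records (hypothetical format).
    Returns candidate bot-user groups: connected components of the
    user-user graph whose edge weight (shared login IPs) meets the threshold."""
    ips_by_user = defaultdict(set)
    for user, ip in logins:
        ips_by_user[user].add(ip)

    # Build user-user edges weighted by the number of shared login IPs.
    graph = defaultdict(set)
    users = list(ips_by_user)
    for u, v in combinations(users, 2):
        if len(ips_by_user[u] & ips_by_user[v]) >= min_shared_ips:
            graph[u].add(v)
            graph[v].add(u)

    # Extract connected components as candidate bot-user groups.
    seen, groups = set(), []
    for u in users:
        if u in seen:
            continue
        stack, component = [u], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        if len(component) > 1:
            groups.append(component)
    return groups
```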

Criticisms & Questions

This was an interesting idea that they proposed and seems to work based on the assumptions they are making. However, given that spammers are constantly evolving their technique, the basic assumptions they make about bots sharing IP addresses might not be valid in the future.

Monday, November 23, 2009

Not-a-Bot: Improving Service Availability in the Face of Botnet Attacks

Summary

This paper proposes Not-a-Bot (NAB), a system that identifies and verifies human-generated and bot-generated traffic to prevent email spam, DDoS attacks and click-fraud. NAB is composed of 2 main components: the attester, which lives on the client machine, and the verifier, which lives on the server that processes the requests. Requests are attested on the client machine as human-generated and then sent to the server, which verifies the attestation and uses application-specific policies to deal with the requests.

NAB has 4 main requirements:
  • attestations must be generated automatically in response to human activity
  • attestations must not be transferable from the client where they were generated to attest traffic from another client
  • NAB must benefit its users without hurting those who do not use it
  • NAB must preserve existing privacy and anonymity of applications
The attester's main job is to decide whether to grant an attestation when an application requests one, and to generate it if so. The attester uses information from keyboard and mouse activity along with timing to determine if the action was human-generated. The attestation contains a signature over the entire request content to ensure that the attestation is tied to the content it was generated for.

Attestations always satisfy the following 2 properties:
  1. Non-transferability: an attestation generated on one client cannot be reused or forged to attest traffic from another client
  2. Binding to the content of the request
NAB also includes a verifier that runs on the server that processes the requests. The verifier checks that the attestation is valid. Once it is, it uses application-specific policies to deal with the request. Examples include aggressively filtering unattested email to reduce spam, prioritizing attested requests in the face of DDoS, and only accepting clicks with valid attestations to prevent click-fraud.

NAB uses nonces in the attestation to provide the following 2 security guarantees:
  • attestations can't be double-spent
  • a bot can't steal key clicks and accumulate attestations beyond a fixed time window
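As a rough illustration of how an attestation can bind a nonce and the request content, here is a minimal sketch. It substitutes an HMAC over a hash of the request for the TPM-backed signing the paper relies on, and the activity window, field names, and shared-key verification are simplifying assumptions, not NAB's actual design.

```python
import hashlib
import hmac
import os
import time

ATTESTER_KEY = os.urandom(32)   # stand-in for a TPM-protected attester key
ACTIVITY_WINDOW = 1.0           # attest only shortly after keyboard/mouse input

def attest(request_bytes, last_input_time):
    """Issue an attestation only if human input was seen recently."""
    if time.time() - last_input_time > ACTIVITY_WINDOW:
        return None  # no recent human activity: refuse to attest
    nonce = os.urandom(16)
    content_hash = hashlib.sha256(request_bytes).digest()
    tag = hmac.new(ATTESTER_KEY, nonce + content_hash, hashlib.sha256).digest()
    return {"nonce": nonce, "tag": tag}

def verify(request_bytes, attestation, seen_nonces):
    """Verifier side: check binding to the content and reject reused nonces."""
    if attestation is None or attestation["nonce"] in seen_nonces:
        return False  # missing attestation or double-spent nonce
    expected = hmac.new(
        ATTESTER_KEY,
        attestation["nonce"] + hashlib.sha256(request_bytes).digest(),
        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, attestation["tag"]):
        return False  # content was modified after attestation
    seen_nonces.add(attestation["nonce"])
    return True
```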

Using traces of keyboard and mouse activity from 328 users, along with traces of spam, DDoS, and click-fraud activity, they evaluated NAB. They found that it reduced the volume of spam by 92%, deprioritized 89% of bot-originated web activity and detected bot-originated click-fraud activity with 87% accuracy.


Criticism & Questions

I think this paper proposed an interesting idea, and if deployed widely enough could be useful. One practical concern is that although the authors mention that latency overhead is negligible, if many applications start to require more and more attestations, the latency overhead could rise high enough to be significant.

I liked that they used real traces of users' keyboard and mouse activity. Their current heuristic for guessing if the action is human-generated or not is fairly simple and could be improved using more information on human-computer interaction patterns.

In addition, one of the key requirements of NAB is to benefit users while not harming nonusers. However, if any of the verifiers use the policy that attested traffic is given higher priority (such as in the DDoS example), then nonusers' requests will be given lower priority, which is essentially harming their user experience.

Thursday, November 19, 2009

Skilled in the Art of Being Idle: Reducing Energy Waste in Networked Systems

Summary

This paper explores methods to reduce the power usage of idle systems as effectively as possible. The problem with putting systems into low-power sleep modes when idle is that doing so causes the system to lose network connectivity and prevents it from running tasks scheduled for periods of low utilization.

After collecting and analyzing data from 250 enterprise users, they were able to find out the value of proxying, which applications would benefit from proxying and how comprehensive it would need to be in order to be effective.  

Their studies showed that machines are idle about 50% of the time, and this results in 60 TWh/year of wasted electricity, which amounts to 6 billion dollars. When comparing idle-time traffic between home and office environments, they found that the average number of packets seen at home is 1 and at the office is 3. Based on these observations, they found that any solution that wakes the host to handle all incoming network traffic would not be efficient.

Based on the above data, they propose a proxy architecture that supports different idle-time behaviors.  When designing the proxy, they classified the traffic along 2 dimensions, the first being the need to proxy the traffic with the following options:
  • don't wake protocols: generate sustained and periodic traffic and should be dealt with without waking the host
  • don't ignore protocols: require attention to ensure correct operation
  • policy-dependent traffic: how it's handled depends on tradeoff between sophistication of functionality and energy savings.
and the second being the complexity required to proxy the traffic with the following options:
  • ignorable (drop): can be safely ignored
  • handled via mechanical response: easy to construct required response
  • require specialized processing: usually requires state maintenance
The authors designed the following 4 proxies:
  • proxy_1: ignores all traffic listed as ignorable & wakes machine to handle all other incoming traffic
  • proxy_2: ignores all traffic listed as ignorable, issues responses for protocol traffic to be handled with mechanical responses & wakes the machine for all other incoming traffic
  • proxy_3: behaves like proxy_2 for ignorable and mechanical-response traffic, but wakes the machine only for incoming traffic belonging to telnet, ssh, VNC, SMB, or NetBIOS & drops any other incoming traffic.
  • proxy_4: identical to proxy_3 but also wakes the machine for regular network backups, anti-virus software updates, FTP traffic and Intel-specific updates.
They find that proxy_1 is inadequate for home and office and that proxy_3 achieves the most sleep in all scenarios.
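A minimal sketch of the kind of idle-time dispatch these proxies perform is below; the protocol names and classification tables are illustrative placeholders rather than the paper's exact rules.

```python
# Possible proxy actions on an incoming packet while the host sleeps.
DROP, RESPOND, WAKE = "drop", "respond", "wake"

# Illustrative classification tables; the paper derives these from trace analysis.
IGNORABLE = {"SSDP", "IGMP", "ARP-other-host"}
MECHANICAL = {"ARP-for-me", "ICMP-echo", "NetBIOS-name-query"}
WAKE_SERVICES = {"telnet", "ssh", "VNC", "SMB", "NetBIOS-session"}

def proxy_1(protocol):
    """Roughly proxy_1: ignore ignorable traffic, wake for everything else."""
    return DROP if protocol in IGNORABLE else WAKE

def proxy_3(protocol):
    """Roughly proxy_3: ignore ignorable traffic, answer mechanical-response
    protocols on the host's behalf, wake only for a service whitelist,
    and drop everything else."""
    if protocol in IGNORABLE:
        return DROP
    if protocol in MECHANICAL:
        return RESPOND
    if protocol in WAKE_SERVICES:
        return WAKE
    return DROP
```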

Criticism & Questions

I think this was a very interesting paper.  The methodology seemed sound and their evaluation  thorough, especially their classification of idle-time traffic.  I will be curious to see how their work goes forward and if it ever gets implemented in real-world machines.

Cutting the Electric Bill for Internet Scale Systems

Summary

This paper describes how electricity prices vary with time and geographic location, and how data centers can take advantage of this information to save in electricity costs.

They rely on 2 key observations:
  1. electricity prices vary
  2. large distributed systems already incorporate request routing and replication
Using price information, the goal is to have a system route client requests to clusters such that total electricity cost is minimized. This also depends on the energy elasticity: the extent to which a cluster's electricity costs depend on the load on that cluster.
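A minimal sketch of that routing idea, assuming a per-cluster price feed and a client-to-cluster latency table (both placeholders), is to send each request to the cheapest cluster within a latency budget:

```python
def cheapest_cluster(client, clusters, prices, latency, max_latency_ms=100):
    """clusters: list of cluster names.
    prices: dict cluster -> current electricity price ($/MWh).
    latency: dict (client, cluster) -> round-trip time in ms.
    Returns the eligible cluster with the lowest current price."""
    eligible = [c for c in clusters if latency[(client, c)] <= max_latency_ms]
    if not eligible:
        # No cluster meets the latency budget: fall back to the closest one.
        return min(clusters, key=lambda c: latency[(client, c)])
    return min(eligible, key=lambda c: prices[c])
```

The savings then hinge on energy elasticity: rerouting only helps if the cluster that loses the load actually draws correspondingly less power.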

By simulating 29 different locations in the US with historical electricity prices, the authors get results that suggest huge cost savings for data centers, including:
  • existing systems can see energy cost savings of at least 2%
  • savings rapidly increase with energy elasticity
  • allowing client-server distances to increase leads to increased savings

Criticism & Questions

This was a very interesting paper to read. I liked how the paper was rooted in a practical problem and offered a very practical solution. This paper was recently published, but I would be interested in knowing if any data centers have tried implementing this solution and what kinds of results they're getting.

Scalable Application Layer Multicast

Summary

This paper proposes a new scalable application-layer multicast protocol, designed specifically for low-bandwidth, data streaming applications.

They used 3 dimensions with which to evaluate application-layer multicast protocols:
  • quality of the data delivery path
  • robustness of the overlay
  • control overhead
The NICE protocol arranges the end hosts into a hierarchy, which defines the multicast overlay data paths. The hierarchical structure is necessary for scalability and for localizing the effect of member failures.

The hosts are assigned to different layers in the hierarchy and then split into clusters with other hosts close to them. The host at the center of each cluster (in terms of network distance to the other members) is designated as the cluster leader.
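A minimal sketch of one way to pick such a leader, assuming pairwise latency estimates between cluster members are available, is to choose the member whose worst-case distance to the rest of the cluster is smallest:

```python
def cluster_leader(members, dist):
    """members: list of hosts in one cluster.
    dist: dict (a, b) -> symmetric latency estimate between two members.
    Returns the member minimizing its maximum distance to the others."""
    def eccentricity(m):
        return max(dist[(m, other)] for other in members if other != m)
    return min(members, key=eccentricity)
```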

The authors performed several simulations of their protocol and the Narada application-layer multicast protocol. Their results showed that NICE had lower link stress, improved end-to-end latencies and similar failure recovery properties, all with an order of magnitude lower control traffic.

They also deployed their protocol in a wide-area testbed and found that nodes maintained low-latency paths and had a max packet loss rate of 1% or less. And, the average control overhead was less than 1 Kbps.

Criticisms & Questions

This was also another simple, but clever idea. It would be interesting to see a direct comparison of this protocol to SRM. I did like that this paper did both simulations and an implementation and that they evaluated the results using multiple dimensions.

A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing

Summary

This paper describes Scalable Reliable Multicast (SRM), an application-level reliable multicast framework. SRM draws on principles found in ALF, LWS, group delivery model of IP multicast protocol, and core design principles of TCP/IP: best-effort delivery, reliability on an end-to-end basis, and adaptive parameters.

In regular TCP, the sender is responsible for ensuring reliable data delivery. This requires receiving ACKs and NACKs from receivers and maintaining reception state. With multicast, this could cause the ACK implosion effect at the sender. In addition, the IP multicast model imposes a level of indirection between the sender and receivers, making it very difficult for the sender to maintain state for all the receivers.

In the wb framework, receivers are responsible for reliability. Once a sender multicasts a packet to all the receivers, a receiver that detects it is missing data waits for a random time determined by its distance from the sender and then multicasts a repair request to the group. Any host that has the packet can then multicast it to the group, and all hosts missing the packet will hear the request and cancel their own repair request timers. This reduces the number of repair requests and thus avoids a request and response implosion.
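A minimal sketch of the randomized request-suppression timer is below; the constants and the way distance is expressed are illustrative, but the idea matches the description above: the request is scheduled at a random delay proportional to the receiver's distance from the source and cancelled if someone else's request is overheard first.

```python
import random

def repair_request_delay(distance_to_source, c1=2.0, c2=2.0):
    """Delay before multicasting a repair request, drawn uniformly from
    [c1*d, (c1+c2)*d], where d is the distance to the original source."""
    d = distance_to_source
    return random.uniform(c1 * d, (c1 + c2) * d)

class RepairState:
    def __init__(self):
        self.pending = {}  # sequence number -> time the request should fire

    def on_loss_detected(self, seqno, now, distance_to_source):
        self.pending[seqno] = now + repair_request_delay(distance_to_source)

    def on_request_overheard(self, seqno):
        # Another receiver already asked for this packet: suppress our request.
        self.pending.pop(seqno, None)
```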

The authors simulate this framework on several different node topologies, including chain and star. They found that the request/repair scheme works well for bounded-degree trees where every node participates in the multicast, but works less well for a sparse session. It also works less well when the degree bound is very high.

A suggested optimization is local recovery, which means setting a TTL on a repair request packet so that only the neighborhood affected by the loss sees it.

Criticisms & Questions

The idea proposed in this paper was simple, but very clever.  It made use of multicast properties to reduce the number of packets in the network, and could make use of even more properties to optimize it.  I think the simulations had good results, but I would like to see SRM implemented in a real-world environment and see how it performs.

Thursday, November 12, 2009

X-Trace: A Pervasive Network Tracing Framework

Summary

This paper describes the design and implementation of X-Trace, a tracing framework that gives users a comprehensive view of service behavior. The motivation for X-Trace was the lack of a single diagnostic tool across all layers and applications, making it hard for the user to understand the interactions between the different services and protocols and therefore hard to find where the problem originated.

A user will start X-Trace when starting an application task. X-Trace tags the request with a task identifier, which is propagated to all layers and protocols so that X-Trace can determine all the operations connected to that task; together these operations form a task tree.

The network path generally traverses multiple administrative domains (ADs). Each AD may not wish to share internal information. X-Trace accommodates this situation by providing a clean separation between the user and the receiver of the trace information. The portion of the trace that is relevant to each party is sent to that party if it has X-Trace turned on, and each party can do with the data what it wishes. Even if not all parties turn on X-Trace, it still provides horizontal slices of the task tree.

There are 3 main design principles used by X-Trace:
1. Trace request should be sent in-band: metadata should be added to the same datapath we want to trace
2. Collected data should be sent out-of-band: allows data to be sent even when failures occur in the path
3. The entity requesting data is decoupled from the entity receiving the data: this allows each AD to control its own information.
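A minimal sketch of principles 1 and 2, with field names and the report transport invented for illustration: the task identifier travels in-band with the request, while each layer emits an out-of-band report that lets a collector rebuild the task tree.

```python
import uuid

class XTraceMetadata:
    def __init__(self, task_id=None, parent_op=None):
        self.task_id = task_id or uuid.uuid4().hex   # same ID for the whole task
        self.parent_op = parent_op                    # edge in the task tree

def handle_operation(metadata, report_sink, layer_name):
    """Called at each layer/protocol hop that sees the request."""
    op_id = uuid.uuid4().hex
    # Out-of-band report: survives even if the in-band path fails later on.
    report_sink.append({
        "task": metadata.task_id,
        "op": op_id,
        "parent": metadata.parent_op,
        "layer": layer_name,
    })
    # In-band propagation: same task ID, this operation as the new parent.
    return XTraceMetadata(metadata.task_id, parent_op=op_id)
```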

Critique & Questions

I think X-Trace is an interesting concept. I could see it being very useful for network operators and users as well. Of course, the usefulness increases with the number of ADs on the path that are using it. I'm curious about whether this was ever implemented in all layers in the real-world network.

NetFPGA: A Tool for Network Research and Education

Summary

This paper describes NetFPGA, a project that is designed to give students practical experience with networking concepts at the physical and link layers. Students can use NetFPGA to build real networking hardware, and because of its low cost, many boards can be connected together to build interesting networking systems.

So far, there have been 2 versions of NetFPGA. Version 1 includes 3 FPGAs, one of which is the Control FPGA, preprogrammed to connect the Ethernet controller to the other 2 User FPGAs. The students built a simple logic analyzer in order to be able to see the signals in a waveform viewer.

There were several limitations to Version 1:
  • Board format was awkward, requiring self-assembly
  • Too slow for research projects
  • No on-board CPU
  • Windows not a possible development platform
This led to Version 2 of the NetFPGA. Version 2 switched to the PCI format, which was beneficial because most users were already familiar with this format and it had greater bandwidth. It also became possible to use ModelSim on Windows XP for development.

Future work includes further testing to analyze configuration and hardware problems in addition to improving the logic analyzer.

RCP and Intrusion Detection at ICSI are 2 projects currently making use of NetFPGA.

Critique & Questions

This was a quick read and a good introduction to NetFPGAs.

Tuesday, November 10, 2009

Internet Indirection Infrastructure

Summary

This paper describes an overlay network called Internet Indirection Infrastructure (i3) that is designed to easily provide services like multicast, anycast and mobility. i3 adds a level of indirection on top of the current network, separating the act of sending from the act of receiving. Rather than sending to one or more IP addresses, a sender sends to a logical identifier. Receivers indicate which identifiers they are interested in receiving by way of triggers.

i3 was designed to be versatile, allowing users to create services like mobility, anycast and service composition on it. It also decentralized the responsibility for creating efficient trees by delegating that task to the end-hosts.

i3 uses rendezvous-based communication. Each packet is represented by an (id, data) pair, where id is an m-bit identifier and data is the IP packet payload. Receivers insert triggers of the form (id, addr), which state that all packets with identifier id should be sent to addr. When a receiver would like to receive data, it puts a trigger into the network; then, when a packet with that id arrives, it is forwarded to the receiver.
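A minimal sketch of that rendezvous step, using exact-match identifiers for simplicity (i3 actually matches on a longest prefix of the m-bit identifier, and triggers are spread across servers by a DHT):

```python
from collections import defaultdict

class I3Node:
    def __init__(self):
        self.triggers = defaultdict(set)   # identifier -> receiver addresses

    def insert_trigger(self, ident, addr):
        self.triggers[ident].add(addr)

    def remove_trigger(self, ident, addr):
        self.triggers[ident].discard(addr)

    def send(self, ident, data, deliver):
        """deliver(addr, data) forwards the payload to a receiver address.
        Multicast falls out naturally when several triggers share an id."""
        for addr in self.triggers.get(ident, ()):
            deliver(addr, data)
```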

The authors implemented and evaluated i3 using the Chord lookup system. Initial evaluations suggest that i3 is highly flexible and can support many sophisticated applications.

Critique & Questions

Interesting paper. The rendezvous-style communication they used is fairly straightforward. Although from a practical implementation standpoint, there do seem to be many challenges to work through before that can take place. I liked that they acknowledged if/how i3 could be implemented for real.

A Policy-aware Switching Layer for Data Centers

Summary

This paper proposes PLayer, a policy-aware layer-2 for data centers. PLayer is made up of interconnected policy-aware switches (pswitches). Middleboxes can then be plugged into the pswitches, moving them off the network path. The pswitches are responsible for deciding which middleboxes each packet needs to go through and forwarding the packet accordingly.

The main principles of PLayer are:
1) separating policy from reachability - the middleboxes traversed are determined explicitly by policy and not implicitly by network path selection mechanisms.
2) taking middleboxes off the physical network path - middleboxes are removed from choke points

PLayer aims to provide the following properties:
1) correctness
2) flexibility
3) efficiency

When forwarding packets, the PLayer uses centralized policy and middlebox controllers to set up and maintain the rule tables at the pswitches.
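A minimal sketch of what a pswitch's forwarding decision might look like, with the policy table, traffic classes, and middlebox names all invented for illustration: given a frame's traffic class and the hop it arrived from, return the next middlebox in its sequence, or the final destination once the sequence is done.

```python
# Illustrative policy: ordered middlebox sequence per traffic class.
POLICY = {
    "web": ["firewall", "load_balancer"],
    "other": ["firewall"],
}

def next_hop(traffic_class, previous_hop, destination):
    """Return the next middlebox the frame must traverse, or the destination
    once the whole sequence for its class has been completed."""
    sequence = POLICY.get(traffic_class, [])
    if previous_hop not in sequence:
        # Frame just entered the PLayer: send it to the first middlebox.
        return sequence[0] if sequence else destination
    idx = sequence.index(previous_hop)
    return sequence[idx + 1] if idx + 1 < len(sequence) else destination
```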

Critique & Questions

I thought this was a very well written and well structured paper. They did a good job at explaining the current problems in data centers. I also liked that they went into detail on how PLayer could be incorporated into the current infrastructure with minimal changes.

Wednesday, November 4, 2009

DNS Performance and the Effectiveness of Caching

Summary

This paper presents an analysis of traces of DNS and TCP traffic from MIT and KAIST. The paper explores the performance as perceived by the client and the effectiveness of caching.

Their data comes from 3 traces: 2 from MIT and one from KAIST. The data includes outgoing DNS queries and incoming responses, plus outgoing TCP connection start and end packets. Most of the answered A lookups were followed by a TCP connection; those that weren't were excluded from the analysis. Also, any TCP connections that were not preceded by DNS queries were excluded.

Client-Perceived Performance:

Latency was found to be adversely affected by the number of referrals that take place in a DNS query. 70% of the referrals were found to have cache hits for an NS record in the local DNS cache, showing that caching NS records is really beneficial as it reduces the load on the root servers.

They found that there was a large number of unanswered lookups. They break unanswered lookups into 3 categories: zero referrals, non-zero referrals, and loops. They suggest that the large number of unanswered queries is due to the persistent retransmission strategy used by name servers and the referral loops that exist between name servers. They found that over 99.9% of answered queries were answered in at most 2 retransmissions, while over 90% required no retransmissions. Therefore, they conclude that DNS name servers are too persistent in their retry strategy.

10-42% of lookups result in a negative answer. The largest cause is inverse lookups for IP addresses with no inverse mappings. They also found that negative caching may not be very effective because the names that elicit negative answers follow a heavy-tailed distribution. 15-27% of lookups to root name servers resulted in negative responses, most probably caused by incorrectly implemented or configured resolvers.

Effectiveness of Caching:
- How useful is it to share DNS caching among several client machines?
- What is the impact of the value of TTL on caching effectiveness?

Previous work shows that web object popularity follows a Zipf-like distribution. NS records were found to have much higher TTLs than A records, resulting in many fewer DNS requests to the root or gTLD servers. If the NS records weren't cached, the load on the root and gTLD servers would have increased by a factor of 5, making NS record caching critical to DNS scalability.

They found that more popular domain names have shorter TTLs, supporting the idea that DNS-based load-balancing is useful only for popular sites.

They determine that sharing a cache across many clients would not be very helpful. Sites common across the clients are the most popular ones, resulting in some performance gain there. However, most sites are accessed by only one client. Even if a site is accessed multiple times, it is usually the same client accessing it successively, and the client's local cache is adequate for that.

They find that increasing TTL values increases hit rates, but the effect is only noticeable for TTL values less than 1000 seconds. This is because most cache hits are due to multiple successive accesses of the same server by the same client. This suggests that low TTL values on A records wouldn't harm performance much.
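A minimal sketch of the kind of trace-driven simulation behind these numbers, assuming a trace of (timestamp, name) lookups from one client and a single TTL applied to every record:

```python
def cache_hit_rate(trace, ttl_seconds):
    """trace: iterable of (timestamp, name) lookups in time order.
    Returns the fraction of lookups answered from a TTL-bounded cache."""
    expires = {}          # name -> time at which the cached record expires
    hits = total = 0
    for t, name in trace:
        total += 1
        if t < expires.get(name, 0):
            hits += 1
        else:
            expires[name] = t + ttl_seconds   # miss: fetch and cache the record
    return hits / total if total else 0.0
```

Sweeping ttl_seconds over a real trace reproduces the shape they report: the hit rate climbs quickly for small TTLs and flattens out well before very large values.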

Criticism & Questions

Another interesting paper. I think it did a good job at evaluating many of the features that were explained in the first paper. Their traces seemed like a good representation of the real-world environment. Most of their conclusions were logical based on the data they presented.

Most of the graphs were helpful in understanding their data, but some were really hard to make out. One thing that was especially confusing was that Table II came before Table I.

Development of the Domain Name System

Summary

This paper presents the original design, implementation, surprises, successes and shortcomings of the Domain Name System.

Before DNS, the HOSTS.TXT system was used to look up the mapping from domain names to IP addresses. This system was essentially a single file, centrally located at the SRI Network Information Center and distributed to all hosts. As the network grew larger and the number of hosts started to track the number of workstations rather than the number of organizations, this was no longer a feasible solution. Existing systems at the time were IEN116 and the XEROX system. Neither fit their requirements, so they set out to design a new system.

The specification for DNS was that it should be a hierarchical name space with typed data at the nodes. Following are some of the requirements/assumptions for the DNS:
  • it should provide at least all the info in the HOSTS.TXT file
  • the database should be maintainable in a distributed manner
  • there should be no obvious size limits for names, name components, data associated with a name, etc.
  • should interoperate across the DARPA internet and other environments
  • provide tolerable performance
DNS has 2 major active components: name servers and resolvers. Name servers store information and answer queries. Resolvers interface with the client programs and are responsible for finding the name server that has the information that the client wants.

The name space is a variable-depth tree where each node has an associated label. There is no standard printing rule for the name format, and the design decouples the structure of the tree from any semantics.

There are 2 mechanisms for transferring data from source to destination: zones and caching. A zone is a contiguous section of the tree, usually controlled by an organization, which is responsible for maintaining the zone's data and providing redundancy for it. DNS resolvers cache responses for the TTL specified by the zone.

The basic algorithm used by the DNS resolver is to search "downward" from domains that it can access already.

While building this system, they came across the following surprises:
  • refinements of semantics
  • performance of the network was much worse than the original design had expected, but lookups that required no network access did really well.
  • the need for negative caching
Some successes of the DNS system:
  • variable depth hierarchy
  • organizational structure of names
  • datagram access
  • additional section processing: responding server can anticipate next query and include that information in the response
  • caching
  • mail address cooperation
Some shortcomings of the DNS system:
  • type and class growth
  • easy upgrade of applications: hard to convert existing apps to use DNS
  • distribution of control vs. distribution of expertise/responsibility

Critique & Questions

Very interesting paper to read. It was nice to see the thought process around DNS. It is interesting to note that this paper deems caching as a success of DNS while the other paper we read shows that caching of A records is not very effective. This can probably be attributed to all the changes in the internet in the past 20 years. I wonder if the distribution of domain names when this paper was written (1988) would be Zipf-like, since there would presumably be fewer domain names. It's also important to note the lack of any authenticity and integrity checking of the data, which poses a security concern, something that was never considered in the original design.

Tuesday, November 3, 2009

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

Summary

This paper describes Chord, a lookup protocol for peer-to-peer applications. Chord implements a distributed hash table using consistent hashing, which provides load balancing and low overhead when a node joins or leaves the network. Chord is also scalable: in an N-node network, every node is only required to know about O(log N) other nodes, a lookup requires O(log N) messages, and a node join or leave requires O(log^2 N) messages.

In the Chord protocol, every node and key is given an identifier using a base hash function. These identifiers determine which keys should reside on which node. After arranging all the nodes in order by their identifier in a clock-wise circle, a key is assigned to the first node whose identifier is equal to or follows the key's identifier. This is called the key's successor. Every node is also aware of its successor node.

When a node joins or leaves the network, only an O(1/N) fraction of the keys needs to be moved to a new location. When a node n joins the network, it is assigned some of the keys previously assigned to n's successor. And, when a node n leaves the network, its keys are reassigned to its successor.

The basic idea of the algorithm, given that each node is only aware of its successor, is that when a node cannot resolve a lookup query itself, it will pass the query to its successor, which will then pass it to its successor until a node can resolve the lookup query. This will ensure correctness but will be inefficient. To improve its performance, each node can maintain additional routing information, allowing the node to pass each lookup query further than its successor and closer to the destination.
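A minimal sketch of the basic successor-walk lookup on the identifier ring (without the finger-table optimization) is below; the identifier width and the representation of the ring as a successor map are simplifying assumptions.

```python
import hashlib

M = 16                       # identifier bits (illustrative; Chord uses 160)
RING = 2 ** M

def chord_id(key):
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % RING

def in_interval(x, a, b):
    """True if x lies in the half-open ring interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def find_successor(start_node, key_id, successor_of):
    """successor_of: dict node_id -> that node's successor on the ring.
    Walks the ring one successor at a time; finger tables reduce this
    O(N) walk to O(log N) hops."""
    node = start_node
    while not in_interval(key_id, node, successor_of[node]):
        node = successor_of[node]
    return successor_of[node]
```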

The authors simulated and implemented the Chord protocol, and both confirmed that lookup latency grows only slowly (logarithmically) with the number of nodes in the network.

Criticism & Questions

I enjoyed reading this paper. I think they did a good job of first explaining the overview of the algorithm and then going into details.

I did have one question about the algorithm. They explain that successor(k) is responsible for key k and what happens to keys when a node enters or leaves the network. However, I'm unsure of how the key-value entries are initially entered into each node's lookup table. Maybe the keys are assigned manually to whatever nodes exist when the application is set up?

Monday, November 2, 2009

Looking Up Data in P2P Systems

Summary

This paper explores peer to peer systems and looks at the Lookup Problem. The Lookup Problem can be stated as how to find a data item X given that it can be stored on a dynamic set of nodes.

There are 2 broad categories of solutions to this problem. The first is structured solutions. This includes maintaining a central database with filename-location mappings and a DNS-style hierarchical approach. Both of these have resilience problems since they store the data in a few critical nodes. The second category is symmetric schemes, in which every node is equally important to every other node. This category includes continually broadcasting queries to neighbors until some node finds the item in its database, which doesn't scale very well, and using "superpeers" in a hierarchical structure, which again has resilience problems.

Most of the latest P2P lookup algorithms combine both of these categories to get structured symmetric algorithms which provide guarantees as well as resilience. All of these algorithms use a DHT. The general idea of the algorithm is that the lookup query is forwarded node to node, each time getting "closer" to the destination. "Closeness" can be measured in several ways: numeric differences between IDs, number of common prefix bits, bitwise XOR of the IDs.
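A minimal sketch of those three closeness measures, treating identifiers as fixed-width integers (the width is an arbitrary choice here):

```python
ID_BITS = 32   # illustrative identifier width

def numeric_distance(a, b):
    """Chord-style: clockwise numeric difference on the identifier ring."""
    return (b - a) % (2 ** ID_BITS)

def common_prefix_bits(a, b):
    """Pastry/Tapestry-style: length of the shared high-order bit prefix."""
    for i in range(ID_BITS - 1, -1, -1):
        if (a >> i) & 1 != (b >> i) & 1:
            return ID_BITS - 1 - i
    return ID_BITS

def xor_distance(a, b):
    """Kademlia-style: bitwise XOR of the IDs, interpreted as an integer."""
    return a ^ b
```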

Chord uses a skiplist-like routing, Pastry, Tapestry and Kademlia use tree-like routing and CAN uses d-dimensional routing. The authors conclude the paper with the following open questions: distance function, operation costs, fault tolerance and concurrent changes, proximity routing, malicious nodes, and indexing and keyword search.

Criticisms & Questions

I liked this paper. I think it did a good job at explaining the different lookup algorithms and the benefits and tradeoffs of each.

One interesting point they made about malicious nodes was that the worst a malicious node could do is "convincingly deny that data exists". This essentially becomes a DOS attack. Even if it is the worst case, isn't that still a cause for worry?

Thursday, October 29, 2009

Active Network Vision and Reality: Lessons from a Capsule-Based System

Summary

The authors of this paper aim to understand more about Active Networks. They do so by designing, implementing and evaluating the ANTS active network toolkit, which provides a flexible network layer and addresses some of the performance and security concerns raised by code injected into the network. They also point to 3 areas that they have not yet resolved: the capsule model of programming, the accessibility of that model to all users, and the applications that can be constructed using that model.

Criticism & Questions

I think this is an interesting contrast to RON. Both are trying to solve the general problem of application-aware routing, but in different ways. While RON is implemented at the application layer, active networks are at the IP layer. Although I think active networks are a great idea in theory, they will be a pain to implement correctly. And, since they allow users to inject code into the network, without proper isolation, it could lead to a security nightmare.

Resilient Overlay Networks

Summary

A Resilient Overlay Network (RON) is an architecture at the application-layer that allows distributed Internet applications to detect outages and recover quickly (on the order of seconds rather than minutes as it is in BGP). The RON nodes sit on top of the network, constantly monitoring it. This network information is used when sending packets to decide whether to send by the normal path through the Internet or to send to another RON node.

RON was evaluated based on 2 deployments. It was able to detect all the outages that took place and reroute packets, all in less than 20 seconds on average. RON was also able to double TCP throughput in 5% of samples and reduce the loss probability by 0.05 in 5% of samples.

RON nodes aggressively probe and monitor the paths connecting them. They exchange information about the quality of each path using a routing protocol and build a forwarding table with per-path information, including latency, loss rate and throughput. The authors argue that a RON should be limited in size, between 2 and 50 nodes, in order to reduce the bandwidth overhead resulting from the aggressive probing.

RON has 3 major goals:
  1. allow nodes to communicate with each other, even in the face of outages.
  2. integrate routing with distributed applications more closely than before, allowing for application-specific metrics to be used.
  3. provide a framework for the implementation of expressive routing policies, which allow paths to be chosen based on criteria such as forwarding rate controls or notions of acceptable use.
The RON architecture consists of many RON clients distributed over the network. The client interacts with RON using the conduit API in order to send and receive data. The first node that sees the packet is called the entry node; it classifies the packet by the type of path it should use, determines a path for the packet, adds a RON header and forwards the packet. All other RON nodes simply forward to the next RON node. The final RON node is called the exit node, and it delivers the packet to the application.

The entry node has a lot of control over the flow of the packet. The packet is tagged with a flow ID, and subsequent nodes attempt to keep the packet on the same path as the rest of the flow.

When building up its routing table, RON keeps track of (i) latency, (ii) packet loss and (iii) throughput. It uses a link-state routing protocol. The latency-minimizer forwarding table uses an exponential weighted moving average of round-trip latency.
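For reference, the exponentially weighted moving average is just the standard update below; the weight is a placeholder, and real RON combines this latency estimate with loss-rate and throughput metrics.

```python
def update_latency_estimate(current_avg, new_sample_ms, alpha=0.1):
    """Standard EWMA: each new probe moves the estimate by a fraction alpha."""
    if current_avg is None:
        return new_sample_ms
    return (1 - alpha) * current_avg + alpha * new_sample_ms
```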

RON was implemented and evaluated. It was found to detect and recover from errors in 18 seconds. It also improves throughput, loss rates and latency.

Criticisms & Questions

I think this was an interesting paper. It was well organized and presented the protocol well. Their evaluation also seemed sound given that they did 2 implementations and RON was found to help in all cases.

I think RON is a good way to allow applications to give information to the transport protocol without breaking the end-to-end argument. It solves many of the problems with application-specific needs.

Tuesday, October 27, 2009

On Inferring Autonomous System Relationships in the Internet

Summary

This paper proposes a graph representation of the AS relationships in the Internet at the time it was written. The authors characterize the relationships into customer-provider, peering and sibling relationships. They came up with heuristic algorithms that look at the BGP routing tables and infer the augmented AS graph.

They test the graph generated by their algorithm on AT&T's internal information and find that over 99% of their graph is accurate. They also test sibling-to-sibling relationships using the WHOIS lookup service and find that over half of their graph is accurate.

Their annotated AS graph consists of nodes connected by one of 3 kinds of edges: provider-to-customer, peer-to-peer and sibling-to-sibling.

They explain the basic 3-phase heuristic algorithm they use. The first phase parses the routing tables and calculates the degree of each AS. The second phase goes through every routing table entry, identifies the top provider (the highest-degree AS on the path), and records the consecutive AS pairs before and after the top provider. Finally, in the third phase, it assigns each of those consecutive AS pairs a transit (provider-customer) relationship.
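A minimal sketch of that heuristic applied to a single AS path is below, assuming per-AS degrees have already been computed in the first phase; the sibling and peering refinements from the later algorithms are omitted.

```python
def annotate_path(as_path, degree):
    """as_path: list of AS numbers as they appear in one BGP route.
    degree: dict AS -> its degree in the AS graph (from phase one).
    Returns (as1, as2, relationship) annotations for consecutive pairs."""
    top = max(range(len(as_path)), key=lambda i: degree[as_path[i]])
    edges = []
    for i in range(len(as_path) - 1):
        a, b = as_path[i], as_path[i + 1]
        if i < top:
            edges.append((a, b, "customer-to-provider"))
        else:
            edges.append((a, b, "provider-to-customer"))
    return edges
```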

The basic algorithm assumes that there are no misconfiguration errors. They also propose a refined algorithm that accounts for any misconfiguration errors. Furthermore, they propose another algorithm that infers peering relationships in addition to the provider-customer and sibling relationships that the previous 2 infer.

Finally, they implement all 3 algorithms and using AT&T's internal data and the WHOIS service, they determine that their algorithms are fairly accurate.

Criticisms & Questions

I think this was an interesting paper. I think they did a decent job of explaining the algorithms and their experimental results. I would, however, have liked them to expand more on the motivation for their work. They spend a lot of time talking about how BGP works and what each of the different relationships are. However, they only briefly mention several use cases for the augmented graph representation they present. And, with those few examples, I find it hard to understand why any ISP would need to know the entire representation if they wanted to find out information about a specific ISP. If it's just one ISP, can't they just ask them?

On Power-Law Relationships of the Internet Topology

Summary

This paper characterizes the structure of the Internet between Nov. 1997 and Dec. 1998. The authors find that power-laws fit the topology really well. The power-laws that they derive could be used to understand more characteristics of the network, such as average neighborhood size, and can be used to help design protocols.

During the period they studied, the Internet grew by 45%. Unless this time was a transition phase for the Internet, the authors believe the power-laws they came up with should continue to hold in the future.

Some characteristics of the Internet during the time of their study are:
  • Nov. 1997: 3015 nodes, 5156 edges, 3.42 avg. outdegree
  • April 1998: 3530 nodes, 6432 edges, 3.65 avg outdegree
  • Dec. 1998: 4389 nodes, 8256 edges, 3.76 avg outdegree
They propose 3 power-laws:
  • Power-Law 1 (rank exponent): The outdegree, dv, of a node v, is proportional to the rank of the node, rv, to the power of a constant, R.
  • Power-Law 2 (outdegree exponent): The frequency, fd, of an outdegree, d, is proportional to the outdegree to the power of a constant, O
  • Power-Law 3 (eigen exponent): The eigenvalues, lambda_i, of a graph are proportional to the order, i, to the power of a constant.
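As a rough illustration of how such an exponent is obtained, here is a minimal sketch of estimating the rank exponent by fitting a line to log(outdegree) versus log(rank) with ordinary least squares; in practice one would use numpy or a statistics package.

```python
import math

def rank_exponent(outdegrees):
    """Least-squares slope of log(outdegree) vs. log(rank); the slope is the
    estimated rank exponent R."""
    ranked = sorted((d for d in outdegrees if d > 0), reverse=True)
    xs = [math.log(rank + 1) for rank in range(len(ranked))]
    ys = [math.log(d) for d in ranked]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```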
Finally, they suggest several practical applications of these power-laws, including assessing the realism of synthetic graphs, analyzing the worst-case behavior of protocols, and answering "what if" questions about different scenarios.

Criticisms & Questions

This was an interesting paper, especially since it was one of the first to characterize the topology of the Internet. Seeing as this was a decade ago, I am very curious to find out if these power-laws still hold. Since the way the internet is used has changed in the last 10 years, I would be really surprised if they were still true.

The authors mention several practical applications for the power-laws. I wonder if anyone has used these power-laws to inform any future research.

Tuesday, October 20, 2009

Interactive WiFi Connectivity for Moving Vehicles

Summary

This paper presents an evaluation of 6 different handoff protocols for WiFi in moving vehicles. It also proposes ViFi, a protocol that exploits basestation diversity to minimize disruptions and support interactive applications for mobile clients.

For the evaluation, they test these protocols in 2-month-long experiments on 2 platforms. One is VanLAN, which consists of 11 basestations and 2 vehicles used for shuttle service around the Microsoft campus in Redmond. The other consists of vehicles in Amherst. One of these vehicles logs all the beacons heard from all the surrounding BSes.

They test the following 2 applications: VoIP and short TCP transfers.

When evaluating the different handoff protocols, they use 2 metrics:
1. aggregate performance: considers total number of packets delivered and the total time or distance over which the vehicle is connected.
2. periods of uninterrupted connectivity: measures contiguous time intervals when the performance of the application is above a certain threshold.

The 6 handoff policies evaluated are:
1. RSSI
2. BRR
3. Sticky
4. History
5. BestBS
6. AllBSes

When testing performance related to BS density, they found that all protocols' performance increases with density but that the relative performance of the protocols stays the same throughout. AllBSes is found to have the best performance, while History, RSSI and BRR are found to perform similarly in most cases.

When looking at uninterrupted connectivity results, they found that AllBSes had the fewest regions of inadequate connectivity (because it uses multiple BSes), while BestBS had a few more and BRR had the most.

Then, the authors explain the design and implementation of ViFi. ViFi, motivated partially by AllBSes, leverages BS diversity to reduce disruptions. ViFi designates one of the BSes as the anchor, which is responsible for connecting to the internet. The other BSes are designated as auxiliary.

The authors evaluate ViFi and find that the link-layer performance of ViFi is close to ideal. ViFi has a two-fold application performance improvement over current handoff methods.

Criticism & Questions

I liked reading this paper. It did a great job of evaluating the current protocols and then building off one of those protocols to come up with ViFi. They also had a solid implementation and evaluation of ViFi.

What I am most curious about is the practical application of WiFi connectivity in moving vehicles. They motivate their work by mentioning the growing need and opportunity. I would have liked them to expand more on that and possible current use cases of this technology.

In addition, I would like to know more about the topology of cities that are covered by WiFi. When I think about the possible uses for interactive WiFi, I imagine it would be most useful on longer trips, which generally means crossing multiple cities. If WiFi is unavailable in many of the cities along the path, that would limit the practical application of this technology.

White Space Networking with Wi-Fi like Connectivity

Summary

This paper proposes the first network prototype demonstrating the feasibility of WiFi-like networking over UHF white spaces.

The authors contribute a new spectrum assignment algorithm that determines variable bandwidth during communication, a new AP discovery mechanism using a new technique called SIFT, and a novel method for seamlessly handling unexpected disconnections.

There are several properties of the UHF white space spectrum that pose problems to networking. First is the spatial spectrum variation, which implies that the AP must get information from clients before deciding the channel to operate on. The second is spectrum fragmentation, which results in several problems related to choosing a channel. And, finally temporal variation, which is due to interference from wireless mics.

The authors built the KNOWS hardware platform to address some of the problems with white space networking. The 2 main features it adds are variable channel widths and Signal Inspection before Fourier Transform (SIFT). Variable channel widths allow the channel bandwidth to be greater than 5 MHz. SIFT addresses the problems that data transmissions are intermittent and that they can be sent over multiple channel widths.

They describe the details of the spectrum assignment algorithm which takes an adaptive approach to choosing a channel and periodically re-evaluating this choice based on feedback from the clients.

Finally, they perform an evaluation of WhiteFi, using both simulations and experiments on the prototype. They found SIFT to be pretty accurate, with a worst-case loss of 2%. When testing SIFT with low signal strengths, they found that it performs well until it hits a threshold of 96 dB. They evaluated L-SIFT and J-SIFT for discovering APs and found that L-SIFT outperforms J-SIFT for narrow white spaces, while J-SIFT is much more efficient for wider white spaces, and both outperform the baseline algorithm. They also tried this in more realistic settings and found that J-SIFT still outperformed the baseline by 34%. They found that it took about 4 seconds to reconnect following a disconnection. Simulations of the spectrum assignment algorithm indicate that it yields a reasonably accurate prediction.

Criticism & Questions

I think this was an interesting paper. I was not too familiar with white space networking, so this was a good introduction, especially with all the background information they added about problems.

They explained the setup and results of their evaluations very clearly for the most part. However, when they were talking about handling disconnections, they didn't spend much time on that, only briefly mentioning that there was a lag of at most 4 seconds before reconnection. Without any baseline value to compare that to, I'm unsure of how much of a performance gain that is.

Thursday, October 15, 2009

XORs in The Air: Practical Wireless Network Coding

Summary

This paper proposes COPE, a new architecture for wireless mesh networks. It uses packet coding in order to reduce the number of required transmissions. Like ExOR, it takes advantage of the broadcast nature of the wireless channel.

The packet coding algorithm never delays a packet in order to code it. It gives preference to XORing packets of similar length. It never XORs packets headed to the same nexthop. It tries to limit reordering of packets from the same flow. Finally, it ensures that each neighbor that gets a coded packet has a high probability of being able to decode it.
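A minimal sketch of that decoding-probability rule is below: a set of packets is XORed together only if every intended nexthop already holds all the other packets in the set. The data structures are simplified placeholders, and payloads are assumed to be padded to equal length as COPE does.

```python
from functools import reduce

def can_code_together(packets, overheard):
    """packets: list of (nexthop, payload) pairs, one native packet per nexthop.
    overheard: dict nexthop -> set of payloads that neighbor already has.
    Codable only if each nexthop has every payload in the set except its own."""
    payloads = [p for _, p in packets]
    for nexthop, own in packets:
        others = [p for p in payloads if p is not own]
        if not all(p in overheard[nexthop] for p in others):
            return False
    return True

def encode(packets):
    """XOR equal-length payloads into a single coded transmission."""
    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))
    return reduce(xor, (p for _, p in packets))
```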

They use "coding gain" as the performance metric for COPE. Coding game is the ratio of the number of transmissions required by the current non-coding approach, to the minimum number of transmissions used b COPE to deliver the same set of packets.

COPE was implemented on a 20-node wireless testbed over 3 topologies. It was found to have about a 1.33 coding gain. The cross topology over TCP was found to be a little lower than expected which they attributed to header overhead, imperfect overhearing and an asymmetry in throughputs of the 4 flows.

Finally, they mention fairness and claim that when packets are coded, increasing fairness will increase the overall throughput of the network.

Criticism & Questions

I thought this paper was interesting, definitely another interesting way to make use of the broadcast nature of wireless. I'm not sure if COPE could be used together with ExOR, maybe if ExOR were modified to accommodate multiple flows at once. But, if not, I would really like to see a side-by-side comparison of the two to determine which has greater performance.

ExOR: Opportunistic Multi-Hop Routing for Wireless Networks

Summary

This paper proposes ExOR, an integrated routing and MAC technique that uses cooperative diversity to increase throughput of large unicast transfers in multi-hop wireless networks.

ExOR takes advantage of broadcast transmission to send data to multiple nodes at once. Of the nodes that heard the broadcast, the one furthest from the source (closest to the destination) broadcasts it again, and so on until it reaches the destination. This results in higher throughput because ExOR makes use of transmissions that reach unexpectedly far or fall short. It also increases network capacity since it needs fewer retransmissions than traditional routing.

Data is sent in batches, and each intermediate node buffers the packets it receives from a batch. ExOR schedules when each node sends its fragments so that only one node sends at a time, with the node closest to the destination going first. Once 90% of the packets have been delivered, the remaining 10% are sent using traditional routing.
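A minimal sketch of how a batch gets divided up among forwarders, assuming the priority order and per-node reception sets are known (in ExOR they are carried in the batch maps):

```python
def packets_to_forward(priority_order, received):
    """priority_order: forwarder list, highest priority (closest to the
    destination) first.
    received: dict node -> set of packet ids that node heard from the batch.
    Returns dict node -> packet ids that node is responsible for forwarding,
    i.e. those not already covered by a higher-priority node."""
    covered = set()
    responsibility = {}
    for node in priority_order:
        responsibility[node] = received[node] - covered
        covered |= received[node]
    return responsibility
```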

ExOR was implemented on Roofnet. It was found to outperform traditional routing by a factor of 2. It was also found to have the most performance gains on the longest routes.

Criticism & Questions

I think cooperative diversity routing is a very interesting idea. I think it makes sense intuitively why it would have good performance. I am however concerned about its incompatibility with regular TCP, a hindrance to its use in most practical settings. In addition, when doing the experiment, the authors spend about 10 minutes initially just setting up the nodes and then every 20 minutes, stop the experiment to update their link loss measurements. I'm not sure how practical that is and how it would affect overall throughput.

Thursday, October 8, 2009

A Performance Comparison of Multi-Hop Wireless Ad Hoc Network Routing Protocols

Summary

This paper presents the results of a performance comparison between 4 multi-hop wireless ad hoc network routing protocols.  This paper also proposed and implemented changes to ns so that it could be used to study multi-hop ad hoc networks.

The 4 protocols that were tested were:
  • Destination-Sequenced Distance Vector (DSDV)
  • Temporally-Ordered Routing Algorithm (TORA)
  • Dynamic Source Routing (DSR)
  • Ad Hoc On-Demand Distance Vector (AODV)
The changes that were made to ns included a model of the physical and data link layers. The link layer implements the MAC protocol. There was also an implementation of ARP to resolve IP addresses to link-layer addresses. Finally, each protocol has a packet buffer with a maximum size of 50 packets.

The authors ran a simulation of 50 nodes on ns to see how each protocol would react to changes in the network topology.  The simulation was run using 3 different CBR sources in order to approximate 3 different data sending rates.

They chose the following 3 metrics to evaluate the simulation results:
  • packet delivery rate
  • routing overhead
  • path optimality
When evaluating packet delivery ratio, DSR and AODV-LL were found to have the best performance at all pause times. TORA performs a little worse, and while DSDV-SQ performs better than TORA when the pause time is over 300 seconds, it fails to converge at pause times below 300 seconds.

When evaluating routing overhead, DSR was found to have the lowest number of overhead packets while AODV-LL was found to have a slightly higher number of overhead packets.  However, when looking at the number of overhead bytes, AODV-LL actually has the smallest number of total overhead bytes.

When evaluating path optimality, DSR and DSDV-SQ were found to use routes close to the optimal, while AODV-LL and TORA used routes that were sometimes 4 hops longer than optimal.

Criticism & Questions

I think this paper was interesting to read.  The simulation results do suggest many differences between the 4 protocols and would be informative to those choosing between them.  However, since these are purely simulation results, I wonder how accurate they are.  One of the features of ad-hoc networks is their unpredictability, and without doing a real deployment, I'm not sure how accurately you can test the various protocols.

The authors didn't include a future works section, but I would really like to see the results of a real deployment as the next step.

A High-Throughput Path Metric for Multi-Hop Wireless Routing

Summary

This paper proposes a new metric to use when finding the highest-throughput path in multi-hop wireless routing. This new metric, called Expected Transmission Count (ETX), uses link loss ratios, asymmetry in loss ratios and interference between successive links in a path to find the path that requires the lowest number of transmissions, including retransmissions, to successfully deliver a packet to its destination.

The typical metric currently used in multi-hop wireless routing is minimum hop-count.  The authors first determined the performance of minimum-hop-count routing using a 29-node wireless testbed.  They used both the DSDV and DSR protocols.  They found that minimum-hop-count works well when the shortest route is also the fastest one.  However, when it has a choice among a number of multi-hop routes, it usually picks a much slower than optimal path.

Next, the authors explain, implement and evaluate ETX.  The ETX metric measures the number of transmissions required to send a packet to its destination. 

ETX = 1 / (df * dr) 

where df = forward delivery ratio and dr = reverse delivery ratio.
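A minimal sketch of turning probe counts into the metric: each node estimates df and dr as the fraction of the last window of broadcast probes heard in each direction, and a route's metric is the sum of its links' ETX values (the window size here is an arbitrary choice).

```python
def link_etx(forward_probes_heard, reverse_probes_heard, window=10):
    """Delivery ratios estimated from the last `window` broadcast probes."""
    df = forward_probes_heard / window
    dr = reverse_probes_heard / window
    if df == 0 or dr == 0:
        return float("inf")        # link currently unusable
    return 1.0 / (df * dr)

def path_etx(links):
    """links: list of (forward_heard, reverse_heard) counts, one per hop."""
    return sum(link_etx(f, r) for f, r in links)
```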

When testing ETX with DSDV, ETX was found to perform better than minimum hop-count when packets are sent over links with high loss ratios and the best path is multi-hop. Some part of ETX's improvement is due to avoiding extremely asymmetric links, which it does even better than the handshaking scheme designed with the same goal.

ETX was tested with DSR with its link-layer feedback turned on and off.  When it was turned off, ETX showed a significant improvement in initial route selection.  However, when feedback was turned on, ETX showed only a slight improvement to some pairs of nodes.

One drawback of ETX is that it uses a fixed packet size in its calculations. However, it was shown that packet size has a significant effect on delivery ratio. Because of this, ETX tends to underestimate the delivery ratio of ACK packets, resulting in an overestimation of the number of required transmissions.

Criticism & Questions

I enjoyed reading this paper.  I think they had very sound methodology, especially in testing minimum-hop-count so they could have a fair comparison.  Since there are at least 2 more ad-hoc multi-hop routing protocols (which we read about in the next paper), I would like to see the effect that ETX would have on those protocols as well.  One complaint about the paper is that the graphs were really hard to read and really hard to differentiate between the various lines.

Tuesday, October 6, 2009

Architecture and Evaluation of an Unplanned 802.11b Mesh Network

Summary

This paper evaluated the architecture and performance of Roofnet, an unplanned 802.11b Mesh Network. It explored the effect of node density on connectivity and throughput, the links that the routing protocol uses, performance of a highly connected mesh and comparison to a single-hop network using the same nodes as Roofnet.

Roofnet is primarily characterized by the following design decisions:
  • unconstrained node placement
  • omni-directional antennas
  • multi-hop routing
  • optimization of routing for throughput in a slowly-changing network
Roofnet is organized such that a few nodes act as gateways to the internet. Each node needs a Roofnet address as well as an IP address, both of which it obtains automatically. Roofnet's routing protocol, Srcr, uses Dijkstra's algorithm to find the highest-throughput route between any pair of nodes, choosing routes with an "estimated transmission time" (ETT) link metric. Finally, Roofnet has its own algorithm to choose the optimal transmit bit-rate.
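A minimal sketch of that route selection, assuming per-link ETT values have already been estimated, is an ordinary Dijkstra shortest-path computation with ETT as the edge weight:

```python
import heapq

def best_route(graph, src, dst):
    """graph: dict node -> list of (neighbor, ett_seconds) links.
    Returns (total_ett, path) for the minimum-ETT route, or (inf, []) if none."""
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist.get(node, float("inf")):
            continue
        for neighbor, ett in graph.get(node, []):
            nd = d + ett
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor], prev[neighbor] = nd, node
                heapq.heappush(heap, (nd, neighbor))
    return float("inf"), []
```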

Then, the authors use 4 sets of measurements on Roofnet to evaluate many characteristics of Roofnet.

The authors conclude that Roofnet works. Throughput and latency are comparable to those of a DSL link. The average throughput was measured to be 627 kbits/second. Roofnet is shown to be robust, as it does not depend on a small number of nodes. It is also shown that multi-hop forwarding performs better than single-hop forwarding.

Criticism & Questions

This paper was very well organized and easy to follow. Their evaluation considered many characteristics and they made sound arguments when analyzing their data. The charts and graphs were very helpful in understanding their points.

Feedback

I would vote to keep this paper in the syllabus.

Modeling Wireless Links for Transport Protocols

Summary

This paper puts forth a model of wireless links for evaluating transport protocol performance. The authors argue that such a model is especially useful when deciding whether link-layer protocols should be aware of the transport layer and act on that knowledge, or whether transport protocols should instead be redesigned to perform better over current wireless links.

This paper considers 3 main classes of wireless links: wireless LANs, wide-area cellular links and satellite links. The authors go into detail on the essential aspects of the model: types of wireless links in each class, topologies most common in each class and the traffic patterns in each class.

The performance metrics used for this experiment are throughput, delay, fairness, dynamics and goodput.

Then, the authors explain that current models are inadequate because they are either unrealistic, realistic but explore only a small part of the parameter space, overly realistic or lacking reproducibility.

The authors then choose several link characteristics and for each, explain what the current state is and how to model it:
  • error losses and corruption: not a big concern because of FEC and link-layer retransmissions. Can be modeled by dropping packets with a per-packet, per-bit or time-based loss probability (a small sketch of this and the next item appears after the list).
  • delay variation: delay spikes can cause spurious timeouts and hence retransmissions. Can be modeled by suspending data transmission on the link.
  • packet reordering: reordering is not widely enabled in practice. Can be modeled by swapping packets or delaying one packet for a given time.
  • on-demand resource allocation
  • bandwidth variation
  • asymmetry in bandwidth and latency
  • queue management
  • effects of mobility
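
As promised above, here is a minimal sketch of the first two modeling techniques (per-packet loss probability and delay spikes). The function names and the drop probability are illustrative choices of mine, not values from the paper.

```python
import random

def lossy_link(packets, drop_prob=0.01, seed=None):
    """Per-packet loss model: each packet is independently dropped
    with probability drop_prob (an illustrative value)."""
    rng = random.Random(seed)
    return [p for p in packets if rng.random() >= drop_prob]

def delay_spike(packet_times, spike_start, spike_len):
    """Delay-spike model: the link stops serving packets for spike_len
    seconds starting at spike_start; anything that would have been
    delivered during the outage is held until the spike ends."""
    out = []
    for t in packet_times:
        if spike_start <= t < spike_start + spike_len:
            out.append(spike_start + spike_len)  # held until link resumes
        else:
            out.append(t)
    return out
```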
Criticism & Questions

I enjoyed this paper. It was well organized and easy to follow. I especially liked how they organized the various link characteristics to consider by explaining what the current state of each was as well as how to model that characteristic.

I am very curious to know if this model caught on or is being used by any networking researchers in the present time.

Thursday, October 1, 2009

A Comparison of Mechanisms for Improving TCP Performance over Wireless Links

Summary

This paper presents an evaluation of several schemes to improve the performance of TCP over lossy links, such as wireless. The authors classify the schemes into the following 3 categories:

1. end-to-end protocols: have the sender detect and handle losses using techniques such as SACKs and ELN
2. link-layer protocols: hide link-layer losses from the transport layer and handle them in the link-layer instead using techniques such as local retransmissions and forward error correction.
3. split-connection protocols: terminate the TCP connection at the base station so that the sender is shielded from the wireless link at the end. A different protocol that deals with losses is used between the base station and the receiver.

After evaluating the various schemes, they got the following results: the enhanced link-layer scheme that is aware of TCP and uses SACKs works much better than a simple link-layer retransmission scheme. Among the end-to-end protocols, selective acknowledgements were found to be better than partial acknowledgements or ELN, but not better than the enhanced link-layer scheme. The split-connection protocol is worse than the two schemes above, showing that a split connection is not necessary for optimal performance.

Overall, the link-layer scheme that was TCP aware and uses SACKs was found to be the best scheme.

Criticism & Questions

I would be very interested in finding out which schemes, if any, are currently used in lossy networks. They touch on this a little, but I would like to learn more about the practicality of each of the schemes and have that be one of the criteria considered when choosing the optimal scheme.

Feedback

I enjoyed reading this paper. It was great to learn about the various schemes available to improve performance in lossy links. I think the authors did a great job at explaining each of the schemes and why one was better than the other. I vote to keep this paper in the syllabus.

Wednesday, September 30, 2009

MACAW: A Media Access Protocol for Wireless LAN's

Summary

This paper presents a new protocol for wireless LANs: MACAW. MACAW builds upon the MACA media access protocol by adding 2 new messages, Data-Sending (DS) and ACK, resulting in an RTS-CTS-DS-DATA-ACK message exchange. MACAW also uses a different backoff algorithm.

The authors base their proposal on 4 observations about MACA: 1) contention occurs at the receiver, not the sender, 2) congestion is not homogeneous, 3) all stations must work together to learn about congestion levels, and 4) the protocol should propagate information about contention periods to all devices so they can respond accordingly.

To make the use of bandwidth more fair, the authors propose a new backoff algorithm. In this algorithm, each station shares its current backoff counter value in the packets it sends, so that all stations converge to the same backoff counter, resulting in fair allocation. Using multiple streams instead of a single stream is another way to achieve fair allocation.
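
Here is a small sketch of the backoff sharing, as I understand it. If I recall correctly, the paper pairs the copying with a gentler multiplicative-increase, linear-decrease adjustment instead of binary exponential backoff; the constants and class name below are illustrative, not taken from the paper.

```python
class BackoffState:
    """Sketch of MACAW-style backoff sharing (constants illustrative)."""

    MIN_BO, MAX_BO = 2, 64

    def __init__(self):
        self.backoff = self.MIN_BO

    def on_overheard_packet(self, advertised_backoff):
        # Copy the backoff value carried in other stations' packet
        # headers so every station converges to the same counter.
        self.backoff = min(max(advertised_backoff, self.MIN_BO), self.MAX_BO)

    def on_collision(self):
        # Gentle multiplicative increase instead of doubling.
        self.backoff = min(int(self.backoff * 1.5), self.MAX_BO)

    def on_success(self):
        # Linear decrease after a successful exchange.
        self.backoff = max(self.backoff - 1, self.MIN_BO)
```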

The authors add an ACK message, which allows the receiver to tell the sender that it has received all the data. To address the "exposed terminal" problem, they introduce the DS message, which tells all nearby stations that a data transmission is about to occur. RRTS (Request-for-Request-to-Send) messages are also used to give all streams fair access to the medium.

When evaluating MACAW, the authors found that it has approximately 8% more overhead than MACA. However, it also achieves 37% higher throughput than MACA, which more than makes up for the extra overhead.

Criticism & Questions

Although it isn't highly critical to the protocol, I would be interested in knowing how they plan to change the packet header to accommodate the backoff counter value required for their backoff algorithm.

In addition, their backoff algorithm requires each station to copy the backoff counter it sees in other stations' packets. I'm curious about how this could be misused. It doesn't seem like there's any way to prevent a station from advertising a false value for other stations to use so that it can gain an unfair advantage.

Feedback

I really enjoyed reading this paper. I think it was very well organized and easy to follow. I vote to keep it in the syllabus.

Tuesday, September 29, 2009

Understanding TCP Incast Throughput Collapse in Datacenter Networks

Summary

This paper also explored the Incast problem. The authors aim to understand the dynamics of Incast, propose an analytical model for Incast symptoms and evaluate different solutions to Incast.

The authors put forth some criteria for a solution to Incast: (1) the solution must be general rather than tailored to a particular environment, (2) there should be an analytical model that can explain the observed Incast symptoms, and (3) modifications to TCP should help mitigate the problem.

The authors first replicated the Incast problem in their environment. Then they started by decreasing the minimum RTO timer value. They tried out 2 suggestions made by the authors of the previous paper: use high-resolution timers and turn off delayed ACKs where possible. After trying various combinations of the two, they found that a low-resolution RTO timer of 1 ms with delayed ACKs turned on was optimal.

Next, they propose an analytical model that accounts for the initial goodput decrease and the "recovery" portion, but it still misses the second-order goodput decrease and doesn't predict the curves for small minimum RTO values.

Criticism & Questions

I enjoyed reading this paper. It was laid out well and easy to follow. I especially liked the graphs that were included - they were very helpful in understanding the results. One comment on the graphs is that some of the graphs were a little hard to read, mostly because the lines were very close and the symbols were hard to differentiate.

Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication

Summary

This paper explores the Incast Problem which is generally seen in TCP workloads in datacenter networks. The incast problem occurs when a client simultaneously requests data from many servers and can't continue until it receives all the data. All the servers respond at the same time, filling the client's inbound link, reducing overall throughput drastically.

This paper focuses on using high-resolution timers to enable more granular TCP timeouts. The authors test their solution in simulations and real-world experiments. Finally, they show that this solution is safe for wide area networks.

While explaining the background of the incast problem, the authors outline 3 conditions that generally lead to TCP collapse: (1) high-bandwidth, low-latency networks with small switch buffers, (2) clients that issue barrier-synchronized requests in parallel and do not issue the next request until all responses to the previous one have returned, and (3) servers that return a small amount of data per request.

For current-day datacenters, the authors find that reducing the minimum RTO to 1 ms avoids throughput degradation with up to 16 servers. For next-generation datacenters, they suggest also adding a randomized component to the RTO.
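
The summary doesn't capture the exact formula, so the sketch below only shows the shape of the idea: a fine-grained timeout derived from smoothed RTT estimates plus a random component so that servers which timed out together don't retransmit in lockstep. The function name, constants and the uniform range are assumptions of mine.

```python
import random

def randomized_rto(srtt_us, rttvar_us, min_rto_us=1000):
    """Illustrative fine-grained RTO with a random component.

    srtt_us / rttvar_us: smoothed RTT and RTT variation in microseconds.
    min_rto_us: floor on the timeout (1 ms here, matching the value the
    summary cites for current datacenters).
    """
    base = max(srtt_us + 4 * rttvar_us, min_rto_us)
    # The random factor spreads out retransmissions from servers that
    # all timed out at once, so they don't collide again simultaneously.
    return base * (1.0 + random.uniform(0.0, 0.5))
```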

Finally, they investigate how well their solution would work in the wide area network. They conclude that reducing RTO to 200 microseconds doesn't affect TCP performance in the wide area network.

Criticism & Questions

I think this paper did a good job of explaining the incast problem and especially the preconditions required to go into TCP incast collapse. I think they also did a good job exploring how changes in RTO affects TCP performance. I would have liked to see more information on what else could be changed to fix the incast problem. This paper makes it seem like the timer was the biggest problem, but I'm unsure if that's the case, especially after reading the second paper.

Tuesday, September 22, 2009

VL2: A Scalable and Flexible Data Center Network

Summary

This paper presents a network architecture that supports dynamic resource allocation across many servers. They aim to achieve uniform high capacity, performance isolation and layer-2 semantics. They achieve these goals by using flat addressing, load balancing and address resolution at the end-system.

The authors do an initial analysis of current data center traffic. Then they built VL2 and ran experiments to evaluate how it performs on the 3 criteria listed above. They find that VL2 does achieve their goals of uniform high capacity, load-balancing fairness, isolation and fast convergence following link failures.

Criticism & Questions

I liked how this paper was structured. It was very easy to follow their thought process as they did their study. I also liked that they studied current traffic patterns in datacenters. It was good to have a baseline to compare with when reading this paper.

PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric

Summary

This paper discusses layer 2 protocols that would be optimal for data centers. The goals they try to achieve are i) VMs should be easy to migrate without changing IP addresses, ii) no administrator configuration should be required, iii) every node should be able to communicate with every other node, iv) there should be no forwarding loops, and v) the network should recover from faults quickly.

PortLand uses a centralized fabric manager that maintains soft state about the network. Nodes use a pseudo MAC address instead of their actual MAC address. The fabric manager is also able to reduce the overhead that results from broadcasting ARPs.

The authors compare PortLand to 2 other architectures with similar goals: TRILL and SEATTLE. Finally, they run experiments on their testbed to evaluate PortLand on the following criteria: convergence time with increasing faults, TCP convergence, multicast convergence, scalability and VM migration.

Criticism & Questions

The authors do several experiments to evaluate the criteria given above and present those results quantitatively. However, it would be helpful to know how those numbers compare to current practice (e.g., they say that after VM migration, communication resumes after 32 seconds - how does this compare to VM migration today?)

Thursday, September 17, 2009

Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises

Summary

This paper puts forth a new network architecture, SEATTLE, that combines the simplicity of Ethernet with the scalability of IP. SEATTLE achieves this with 3 main features. First, it uses a one-hop, network-layer DHT, which doesn't require hosts to flood the network to disseminate host locations; instead, a global switch-level view stores the location of each host. Second, it uses traffic-driven location resolution and caching, which helps packets travel the shortest path to their destination. Finally, it uses a scalable, prompt cache-update protocol that uses unicast to inform other switches when a hash table entry is invalid, leading to faster reaction times.
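
To see why the DHT lookup takes only one hop, here is a minimal sketch under my own assumptions: every switch knows the full set of switches, so the resolver for a host is found by a consistent-hash lookup with no flooding. The class, switch names and MAC address below are hypothetical.

```python
import hashlib
from bisect import bisect_right

def _h(value: str) -> int:
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

class OneHopDHT:
    """Sketch of SEATTLE-style location resolution: with a global
    switch-level view, the resolver switch for a host is one hash away."""

    def __init__(self, switch_ids):
        self.ring = sorted((_h(s), s) for s in switch_ids)

    def resolver_for(self, host_mac: str) -> str:
        """Return the switch responsible for storing host_mac's location."""
        key = _h(host_mac)
        points = [p for p, _ in self.ring]
        idx = bisect_right(points, key) % len(self.ring)
        return self.ring[idx][1]

# Usage: an ingress switch asks the resolver for the host's current
# attachment point, then caches the answer for later packets.
dht = OneHopDHT(["sw-a", "sw-b", "sw-c", "sw-d"])
print(dht.resolver_for("00:11:22:33:44:55"))
```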

The authors run several simulations using real traces supplemented with synthetic data. They use 4 topologies of various sizes. They find that SEATTLE uses much smaller tables, has 1000 times less control overhead, a smaller stretch than ROFL and reacts faster to network changes and host mobility. SEATTLE is also tested in a real deployment and the results were found to be comparable to those in the simulation.

Criticism & Questions

I think this paper was well-written and conveyed SEATTLE's features well. I think this network architecture has real potential and I would be interested in seeing how it performs in a more wide-scale deployment. As the authors mention, it would also be interesting to see how this architecture could be improved. One thing I would have liked to see is more information on the issues that would come up when implementing SEATTLE in practice.

Detailed Diagnosis in Enterprise Networks

Summary

The paper describes a new diagnostic system that gives operators more detail on what went wrong in an enterprise network. The goals of the system are to obtain diagnostic information without application-specific knowledge and to use inference to see which entities in the network are impacting each other.

They analyze several example problems and classify them along 3 dimensions: what was impacted, the symptom and the identified cause. Their inference model uses history to gauge the current impact. They use a strategy similar to probabilistic inference (i.e., estimating Prob(D = Dnow | S = Snow)) and make the assumption that a high conditional probability is indicative of causality.

The system, NetMedic, is composed of 3 main functions: capture current component state, generate dependency graph, and diagnose using the component state and dependency graph.

They evaluate NetMedic in many different situations and on many different faults. NetMedic performed well. It identified the correct component 80% of the time and the number is only slightly lower for simultaneously occurring faults.

Criticism & Questions

I enjoyed this paper. The authors laid out their assumptions up front, suggesting how those could affect their results. They did evaluate NetMedic in many different situations, but unfortunately they weren't able to test it on real faults and had to make do with faults they injected. I would be interested in seeing how it would perform when met with real faults happening in real time (i.e., many simultaneous and overlapping faults).

Tuesday, September 15, 2009

Congestion Control for High Bandwidth-Delay Product Networks

Summary

This paper addresses the problems resulting from increase in per-flow bandwidth and high latency links, and proposes a solution to this problem.

As more high-bandwidth links and high-latency satellite links get added to the internet, TCP becomes more and more inefficient, no matter what queueing scheme is used. The authors put forth a new protocol called Explicit Control Protocol (XCP), which they found to outperform TCP both in high bandwidth-delay product environments and in conventional ones. The authors also argue that XCP achieves fair bandwidth allocation, short queues, high utilization and near-zero packet drops. One major feature of XCP is that the controllers for fairness and efficiency are decoupled, allowing each controller to optimize for its own objective.

XCP does not use a binary feedback mechanism to signal congestion. Instead, it adds a feedback field to the packet header, which holds a number indicating how much the sender should increase or decrease its sending rate, based on the bottleneck link along the path. XCP routers compute the feedback so as to optimize efficiency and min-max fairness.
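
As best I recall, the efficiency controller computes an aggregate feedback from the spare bandwidth and the persistent queue, and the fairness controller then divides it among flows as per-packet feedback. The sketch below shows only that first step; the constants are the ones I remember from the paper, so treat them as illustrative.

```python
def xcp_aggregate_feedback(avg_rtt, spare_bw_bytes, persistent_queue_bytes,
                           alpha=0.4, beta=0.226):
    """Aggregate feedback an XCP-like router wants to hand out per control
    interval: positive means senders may speed up, negative means slow down.

    phi = alpha * avg_rtt * spare_bandwidth - beta * persistent_queue
    The fairness controller would then split phi into per-packet feedback
    written into the congestion header.
    """
    return alpha * avg_rtt * spare_bw_bytes - beta * persistent_queue_bytes

# Example: 10 ms average RTT, 1 MB/s of spare capacity, 20 KB standing queue.
print(xcp_aggregate_feedback(0.010, 1_000_000, 20_000))
```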

The authors run several simulations to test XCP's performance against TCP Reno paired with the following queueing schemes: random early discard, random early marking, adaptive virtual queue, and core-stateless fair queueing. Their simulations confirm that XCP has high utilization and fairness, short queues and near-zero packet drops. Across all the queueing schemes, XCP always performed at least as well as TCP.

Criticism and Questions

I thought this paper was interesting. It's one of the few papers that tries to rebuild a protocol from scratch, which definitely has its own problems. Generally, those problems surround the practicality of the protocol. I liked how the authors stated their assumptions upfront, but also addressed the practicality of deploying a protocol using a new packet format, and provided suggestions on how to do so. One thing I am still wondering about is what the impact would be on the sources to use and understand the XCP packet format.

Random Early Detection Gateways for Congestion Avoidance

Summary

This paper explains how Random Early Detection (RED) can be used in gateways for congestion avoidance. RED is based on the idea that the gateway is the best place to detect congestion: it has the best view of the queue and can distinguish propagation delay from persistent queueing delay.

RED uses the average queue size to detect congestion just as it begins to appear. The RED gateway then notifies the sender of this congestion by marking a packet (either by dropping it or by setting a bit in its header). The gateway chooses which packet to mark at random, with a probability computed from the average queue size. With this strategy, the probability that a given sender is asked to reduce its window is roughly proportional to that sender's share of the bandwidth. Additional features of RED are that it keeps the average queue size low, accommodates occasional bursts of traffic and avoids global synchronization.
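
Here is a minimal sketch of that marking decision: an exponentially weighted moving average of the queue size, with a marking probability that grows linearly between two thresholds. The parameter values are illustrative, not the paper's recommended settings.

```python
import random

class RedQueue:
    """Sketch of RED's marking decision (parameters illustrative)."""

    def __init__(self, w_q=0.002, min_th=5, max_th=15, max_p=0.02):
        self.w_q, self.min_th, self.max_th, self.max_p = w_q, min_th, max_th, max_p
        self.avg = 0.0
        self.count = 0  # packets since the last mark

    def on_arrival(self, queue_len):
        """Return True if this arriving packet should be marked/dropped."""
        # Exponentially weighted moving average of the queue size.
        self.avg = (1 - self.w_q) * self.avg + self.w_q * queue_len
        if self.avg < self.min_th:
            self.count = 0
            return False
        if self.avg >= self.max_th:
            self.count = 0
            return True
        # Marking probability grows linearly between the thresholds, spread
        # out over arrivals so marks are roughly evenly spaced.
        p_b = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        self.count += 1
        p_a = p_b / max(1 - self.count * p_b, 1e-9)
        if random.random() < p_a:
            self.count = 0
            return True
        return False
```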

When evaluating RED gateways, the authors used 7 criteria, including congestion avoidance, appropriate time scales, no global synchronization, simplicity, fairness and suitability for a range of environments. They explained how RED meets each of those criteria.

Criticism and Questions

The authors mention that fairness is not inherently built into the RED protocol: it cannot ensure that all users get the same amount of bandwidth. They mention that it is possible to add mechanisms that enforce fairness. I think it would be interesting to see the result of adding in those mechanisms. Fairness has been one of the major goals of the network and would help determine whether a protocol could be used in practice. Therefore, it would be nice to see RED's performance once the fairness mechanisms were added in.

Feedback

I enjoyed reading this paper. It laid out the basics of the RED protocol clearly and had some great simulations. I would say keep this paper in the syllabus as it shows another approach to congestion avoidance.

Wednesday, September 9, 2009

Core-Stateless Fair Queueing: Achieving Approximately Fair Bandwidth Allocations in High Speed Networks

Summary

This paper explores an alternative fair queueing algorithm that requires much less complexity in the routers. The authors are attempting to get the fairness that results from FQ and DRR without storing per-flow state in the core routers. They propose a distributed algorithm in which the edge routers label packets with an estimate of their flow's arrival rate and the core routers simply use FIFO queueing along with the labels to decide, probabilistically, which packets to drop. This means that per-flow work is confined to the edge routers. The authors describe in detail the algorithms used to estimate the flow arrival rate, the link fair rate and the label values, as well as an additional parameter to support flows with different weights. They use performance bounds to show that ill-behaved users can only exploit the system for short amounts of time.
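
To make the core routers' part concrete, here is a minimal sketch of the drop decision as I understand it: a packet whose flow is labeled above the link's fair share is dropped with probability 1 - fair_rate/label_rate, which thins the flow down to the fair share. Function name and example values are mine.

```python
import random

def csfq_core_forward(label_rate, fair_rate):
    """Sketch of a CSFQ-style core router's per-packet decision.

    label_rate: the flow's arrival rate written into the packet by the edge.
    fair_rate:  the core link's current estimate of the fair share.
    Returns True to forward the packet, False to drop it.
    """
    if label_rate <= fair_rate:
        return True  # flow is within its fair share
    drop_prob = 1.0 - fair_rate / label_rate
    return random.random() >= drop_prob

# A flow labeled at twice the fair rate gets roughly half its packets through.
kept = sum(csfq_core_forward(2.0, 1.0) for _ in range(10_000))
print(kept)  # ~5000
```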

They run several simulations with the CSFQ, comparing it to several other algorithms. Two of the algorithms, FIFO and RED, are used as the baseline with no fairness attempted. The other two algorithms, FRED and DRR, represent two approaches to fairness. DRR is the optimal algorithm for fairness. The results of the simulations are that CSFQ achieves a reasonable amount of fairness. CSFQ is much better than FIFO and RED. It is equivalent to FRED in many cases but has the advantage of requiring less complexity. It's generally worse than DRR.

The authors finish up with an explanation of the unfriendly flow problem and two approaches to solve it as well as how to punish ill-behaved hosts.

Criticism & Questions

A large part of the CSFQ algorithm involves the edge routers labeling the packets, however they don't explain where in the packet header they plan to put the label.

When looking at the weighted CSFQ, they say that the algorithm can't accommodate flows with different weights within an island. It would be good to know how often this occurs and whether or not it's a practical assumption (especially because they talk about an entire ISP being one island).

When explaining the unfriendly flow problem, they mention that the percentage of noncooperative hosts on the network is unknown. Many congestion control and flow control algorithms are based on the assumption that there are many noncooperative hosts in the network - it would be very beneficial to be able to quantify this.

Analysis and Simulation of a Fair Queueing Algorithm

Summary

This paper talks about a fair queueing algorithm that is based on Nagle's fair queueing (FQ) algorithm. The authors set forth 3 quantities that they want to be able to measure and optimize: bandwidth, promptness and buffer space. The main requirement they want to meet is fair bandwidth and buffer space allocation. They use the max-min criterion to define fairness. They also define what a user could be and decide that a user will be a source-destination pair.

They describe the algorithm in detail; it emulates a bit-by-bit round-robin scheduler. They claim the algorithm asymptotically approaches the fair bandwidth allocation of bit-by-bit round-robin. They also describe an addition to the algorithm that gives users who use less than their share of the bandwidth more promptness. And, when the queue becomes full, the last packet of the user occupying the most buffer space is dropped, which essentially penalizes ill-behaved hosts.
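
Here is a small sketch of the bookkeeping behind that emulation as I understand it: each arriving packet gets a finish number F = max(F of the flow's previous packet, current round number) + packet size, and packets are transmitted in order of F. The round-number update below is a deliberate simplification, and the promptness term is omitted, so treat this as illustrative only.

```python
import heapq

class FairQueue:
    """Sketch of finish-number fair queueing (simplified round tracking)."""

    def __init__(self):
        self.round = 0.0          # crude stand-in for the round number R(t)
        self.last_finish = {}     # flow -> finish number of its last packet
        self.heap = []            # (finish_number, seq, flow, size)
        self.seq = 0

    def enqueue(self, flow, size):
        start = max(self.last_finish.get(flow, 0.0), self.round)
        finish = start + size
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, self.seq, flow, size))
        self.seq += 1

    def dequeue(self):
        finish, _, flow, size = heapq.heappop(self.heap)
        # Advance the emulated round to the finish number of the packet
        # being transmitted (a simplification of the real update).
        self.round = finish
        return flow, size
```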

Finally, the authors run several simulations to compare FQ to FCFS using various flow control algorithms. When running the simulations, they looked at 4 performance criteria: total throughput, average RTT of the packets, number of packet retransmissions and number of dropped packets. The results showed that FQ works best when paired with the JK flow control algorithm, giving end-hosts an incentive to implement more intelligent flow control algorithms.

Criticism & Questions

When choosing their definition of user, they mention the drawback that comes from an ill-behaved host opening up multiple connections with different destinations in order to congest the network. However, they didn't try this scenario out in the simulations - that would have been interesting to see.

They also put forth several alternative definitions to user and mention that the definition can be easily changed in their algorithm. Their algorithm would be more compelling if they could run their simulations with all the types of users to show that it works equally well for all of them or how it works better or worse for some definitions of user.

Tuesday, September 8, 2009

Congestion Avoidance and Control

Summary

This paper provides some simple algorithms for congestion avoidance and control, along with examples of what happens with and without them. Specifically, the authors put forth 7 new algorithms, including RTT variance estimation, exponential retransmit timer backoff, slow-start, a more aggressive receiver ACK policy and dynamic window sizing on congestion. They determined that the main cause of congestion is a violation of the "conservation of packets" principle, which can happen when (i) the connection fails to get to equilibrium, (ii) a sender injects a new packet before an old one has exited and (iii) equilibrium can't be reached because of resource limits along the path.

To solve the first problem, they propose the slow-start algorithm, which increases the congestion window by 1 each time an ACK for new data arrives. Once the first problem is solved, the second problem is usually caused by an incorrect implementation of the retransmit timer. The authors say that failing to estimate the variation in RTT is the most common flaw and provide suggestions on how to improve the estimator. They also state that exponential backoff is the best scheme for backing off after a retransmit. Finally, for the third problem, they suggest treating a retransmit as a signal that the network is congested. Once the sender knows this, it should use the additive-increase/multiplicative-decrease algorithm to adjust its congestion window size.
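
A small sketch of the RTT-variance idea helps me see why it matters: the timeout tracks both the smoothed RTT and its mean deviation, and backs off exponentially after a retransmit. The gains below (1/8 and 1/4) are the commonly cited values; treat the class and constants as illustrative rather than the paper's exact code.

```python
class RetransmitTimer:
    """Sketch of an RTT-variance-based retransmit timer."""

    def __init__(self, initial_rtt=1.0):
        self.srtt = initial_rtt
        self.rttvar = initial_rtt / 2
        self.rto = self.srtt + 4 * self.rttvar

    def on_rtt_sample(self, measured_rtt):
        err = measured_rtt - self.srtt
        self.srtt += 0.125 * err                        # smoothed RTT
        self.rttvar += 0.25 * (abs(err) - self.rttvar)  # mean deviation
        self.rto = self.srtt + 4 * self.rttvar

    def on_timeout(self):
        # Exponential backoff after a retransmission, as the paper argues.
        self.rto *= 2
```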

Finally, the authors give information about their future work. It involves using the gateways to ensure fair sharing of the network capacity. They argue that the gateway should 'self-protect' against misbehaving hosts. They finish off by mentioning work on being able to detect congestion early because it can be fixed faster the earlier it is detected.

Criticisms & Questions

I liked that they talked about how to deal with uncooperative hosts and suggested gateways as a possible place to automatically moderate them. I would like to know more about this mechanism and if it's been implemented, whether or not it works in the real world.

I also liked that they point out that congestion avoidance and slow-start are 2 different algorithms with 2 different objectives. I was a little confused about that myself and was glad to have that cleared up.

Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks

Summary

The authors of this paper aim to find the "best" congestion avoidance algorithm. The criteria they use to measure an algorithm are efficiency, fairness, convergence time and size of oscillations. Congestion avoidance algorithms aim to keep the load oscillating around the "knee". They examine a family of "increase/decrease" algorithms that use a binary feedback scheme: the sender is told only whether the network is overloaded or not. If the sender learns that the network is overloaded, it should decrease the rate at which it sends packets, and increase it otherwise. The 4 main combinations the authors test are 1) additive increase, additive decrease, 2) additive increase, multiplicative decrease, 3) multiplicative increase, additive decrease and 4) multiplicative increase, multiplicative decrease. The authors then lay out a mechanism to determine if and when a control converges to efficiency. Then they determine which of the 4 policies minimizes convergence time and oscillation. They conclude that additive increase with multiplicative decrease is the optimal policy when considering efficiency and fairness.
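
To convince myself of the conclusion, here is a tiny sketch of the additive-increase/multiplicative-decrease rule under the binary feedback assumption. The increase/decrease constants and the "capacity" are illustrative values of mine; the point is just that two unequal senders drift toward equal shares.

```python
def aimd_step(windows, overloaded, a=1.0, b=0.5):
    """One round of additive-increase/multiplicative-decrease for senders
    that all receive the same binary feedback (constants illustrative)."""
    if overloaded:
        return [b * w for w in windows]
    return [w + a for w in windows]

# Two unequal senders converge toward equal shares while oscillating
# around a hypothetical capacity of 20 "packets".
windows, capacity = [2.0, 14.0], 20.0
for _ in range(40):
    windows = aimd_step(windows, overloaded=sum(windows) > capacity)
print(windows)  # the two values end up close to each other
```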

Criticisms & Questions

One assumption that the authors make is that "all users receive the same feedback and react to it". I would like to know if this assumption holds in the real world. I'm curious to find out if senders are always required to follow congestion control feedback or if there are ways to bypass them. And, if it is possible to bypass them, are there mechanisms in the congestion control protocol that would be able to control these uncooperative senders?

Feedback

I really liked this paper. It was easy to read and very clear in laying out their plan and their logic.

Thursday, September 3, 2009

Understanding BGP Misconfiguration

Summary

This paper does a quantitative study and analysis of BGP misconfiguration errors. The authors spent 3 weeks analyzing routing table advertisements to track potential misconfiguration errors. After the 3 weeks, they used the final routing tables to poll a random host within each AS to see if it was reachable. If it wasn't reachable, they emailed the network operators to get more information about the misconfiguration.

Their study focused on 2 types of misconfigurations: origin and export. An origin misconfiguration is when an AS accidentally introduces a route into the global BGP tables. To test this, they looked at short-lived routes because they assumed that an operator would quickly correct configuration errors. An export misconfiguration is when an AS accidentally exports a route to a peer that violates the AS's policy. They use Gao's algorithm to infer the relationships between the ASes. Along with Gao's algorithm, they looked for short-lived paths that violated the "valley free" property.
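
As I understand the "valley free" property, a valid path is zero or more customer-to-provider hops, at most one peer-to-peer hop, then zero or more provider-to-customer hops; anything else suggests an export misconfiguration. Here is a small sketch of that check under my own encoding of the relationships (not the authors' code).

```python
def is_valley_free(relationships):
    """Check the valley-free property for an AS path.

    relationships: per-hop relationships walking from the origin AS toward
    the observer, each one of "c2p" (customer to provider), "p2p" (peer),
    or "p2c" (provider to customer).
    """
    phase = 0  # 0 = climbing (c2p), 1 = after the peer link, 2 = descending
    for rel in relationships:
        if rel == "c2p":
            if phase != 0:
                return False
        elif rel == "p2p":
            if phase != 0:
                return False
            phase = 1
        elif rel == "p2c":
            phase = 2
        else:
            raise ValueError(f"unknown relationship: {rel}")
    return True

print(is_valley_free(["c2p", "p2p", "p2c"]))  # True: climb, peer, descend
print(is_valley_free(["p2c", "c2p"]))         # False: a valley
```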

Once they had the potential misconfigurations, they needed to confirm. They emailed the network operators that they could reach to ask if the incident was in fact a misconfiguration. They further confirmed that by trying to reach random hosts in each AS to see if the AS was still live.

In their results, they found that at least 72% of new routes seen in a day resulted from an origin misconfiguration. Out of those, only 4% resulted in a loss of connectivity. In terms of their duration, over half lasted less than 10 minutes and 80% were corrected within the hour.

Then, the authors explain the various causes of origin misconfiguration (initialization bugs, old configuration, hijacks, forgotten filter, etc.) and export misconfiguration (prefix based configuration, bad ACL).

Finally, the authors describe several solutions to mitigate the problem of misconfiguration. The first is improving the human interface design by using principles such as safe defaults and large edit distance between the wrong and correct version. The second is to provide the operators with a high level language to configure the routers rather than the error-prone low-level language. The third is to do more consistency checks on the data directly in the routers. The last is to extend the protocol like S-BGP does.

Criticisms & Questions

One of their statistics is that about 75% of all new prefix advertisements were the results of misconfigurations. However, they say that only 1 in 25 affects connectivity. I'm curious about what happens with the remaining misconfigured routes that don't affect connectivity. Are they simply ignored by the router and never inserted into the BGP table, or are they added in and just don't happen to affect connectivity? I was fairly surprised that such a high fraction of new advertisements are the result of misconfigurations.

When talking about the causes of the misconfigurations, the authors focused on slips and mistakes, both of which are not malicious. However, I would like to know if and how many of the misconfigurations were caused by someone using it as an attack. It seems like an attacker or a shady operator could intentionally cause problems with these misconfigurations.

Feedback

I enjoyed reading the paper and the authors did a good job answering many of the concerns I had as I read the paper.

Interdomain Internet Routing

Summary

This paper focuses on BGP (Border Gateway Protocol) - both the technical details of the protocol as well as the policy that controls the flow of traffic.

BGP is the interdomain routing protocol, used by routers at the boundaries of ISP's to share routing information. What actually controls the flow of traffic are the policies put in place by AS's. An AS is owned by a commercial entity and determines its policies based largely on financial reasons as well as its customers' needs. Each AS uses its own Internal Gateway Protocol.

There are 2 types of inter-AS relationships: transit and peering. A transit relationship usually involves a financial settlement from one AS (the customer) to the other (the provider). A peering relationship, on the other hand, does not include a financial settlement. Instead, they are usually between business competitors and involve reciprocal agreements allowing each AS to access a subset of the other's routing tables.

An AS must decide which routes it will export to its neighboring AS's. Exporting a route means that it agrees to carry all traffic destined for that prefix. Because an ISP agrees to provide its customers access to anywhere on the web, it prioritizes exporting routes to its customers. In addition, it will export its customers' routes to the other AS's. An AS will also export some peering routes.

When deciding which routes to import, an AS will always prefer routes learned from a customer over those from a peer over those from a provider. These import preferences are typically expressed by setting the LOCAL PREF attribute on routes as they are learned.
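
Here is a minimal sketch of that preference ordering. The LOCAL PREF values, prefix and AS numbers are illustrative (operators choose their own numbers), and the tie-breaker shown is only the first part of real best-path selection.

```python
# Illustrative LOCAL PREF values encoding "customer > peer > provider";
# the exact numbers are operator-chosen, not part of BGP itself.
LOCAL_PREF_BY_RELATIONSHIP = {"customer": 200, "peer": 100, "provider": 50}

def import_route(route, learned_from):
    """Tag an imported route with a LOCAL PREF based on who advertised it."""
    route = dict(route)
    route["local_pref"] = LOCAL_PREF_BY_RELATIONSHIP[learned_from]
    return route

def best_route(candidates):
    """Abbreviated best-path choice: highest LOCAL PREF wins, then the
    shorter AS path breaks ties."""
    return max(candidates,
               key=lambda r: (r["local_pref"], -len(r["as_path"])))

r1 = import_route({"prefix": "192.0.2.0/24", "as_path": [64500, 64501]}, "provider")
r2 = import_route({"prefix": "192.0.2.0/24", "as_path": [64502]}, "customer")
print(best_route([r1, r2])["as_path"])  # the customer-learned route wins
```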

BGP itself is a simple protocol, but is designed so that it can also implement policy. BGP was designed to meet 3 goals: scalability, policy and cooperation under competitive circumstances. A BGP router initially shares the subset of routes it wants to share with another BGP router. After that, it simply sends "update" and "withdraw" messages.

There are 2 types of BGP sessions: iBGP and eBGP. eBGP sessions are used between BGP routers in different ASes, while iBGP sessions are between routers within the same AS. iBGP is used to share externally learned routes within the AS.

Then the paper provides the details for the various attributes in a BGP route announcement. It also goes into some of the security problems with BGP, giving a real-world example involving Pakistan Telecom and Youtube.

Criticisms & Questions

One thing that was unclear to me throughout the paper was the relationship between an AS and an ISP. I couldn't figure out if an ISP was made up several ASes, if an AS can cover several ISP's, if an ISP is an AS, etc. I would have liked it if the paper had made that point clear when it introduced ASes.

The paper went into detail about the "full-mesh" implementation of iBGP. It also offered route reflectors and confederation of BGP routers as 2 alternative implementations. I would like to know which of those implementations is most commonly used in practice? It also wasn't clear to me if the iBGP implementation is something that each AS chooses and so would have to be independent of any other AS's iBGP.

The problem with the lack of origin authentication really surprised me. It seems to me that "hijacking" routes is not very hard to do. And, if that is the case, why is it that more attackers are not taking advantage of it (possibly to launch a DOS attack)? And, are there any other measures being taken to protect against this vulnerability aside from the poorly-maintained registry?

Feedback

I would say definitely keep this paper in the syllabus. It helped me to understand a lot more about the reasoning behind the inter-AS policies. It's also nice to see the clear contrast between what's technically possible and ideal and what actually happens in practice.

Tuesday, September 1, 2009

The Design Philosophy of the DARPA Internet Protocols

Summary

This paper describes the reasoning that went into creating the major internet protocols. Many people know the "what" of the protocols; this paper explains the "why". The 2 main protocols explored in this paper are IP and TCP.

The author explains that the fundamental goal of the internet was to connect the many smaller networks that existed at the time. Using several assumptions and their knowledge of existing networks, the designers came up with the following design: a set of interconnected networks using packet-switched communication, tied together with store-and-forward gateways.

The author lists the 7 second level goals of the internet in order of importance, and stresses that this list strongly reflects its use at the time, which was in a military context. This explains why survivability was so high on the list and accountability so low.

The author goes on to explain each goal in depth. Survivability made it necessary for state to be stored somewhere in the network; the designers decided to take the "fate-sharing" approach and stored the state in the hosts. IP and TCP, which were originally designed to be combined, were split in order to be able to support many different types of services. This led to a datagram being the basic building block. The internet is successfully able to support many different network technologies because it makes a very minimal set of assumptions about what the internet will provide. Then, the paper goes over the remaining goals and explains how some of the main problems we have with the internet today are due to these goals being so low on the priority list and therefore not well implemented.

The paper emphasizes that a datagram is not a service in itself, but rather was designed to be a building block. It also includes a discussion of TCP and some of the issues caused by counting bytes rather than packets. The paper finishes with a proposal for a new building block for the "next generation of architecture", called flows, which would better address the goals of resource management and accountability.

Criticisms & Questions

I thought it was interesting how the author mentions "the next generation of architecture". The way he talks about it, it almost seems like there was a plan to redesign and rebuild the internet from scratch. It would be really interesting to know if at that time people did believe it was possible to switch over to a new, redesigned internet.

It would also be really interesting to know what the metrics were when evaluating the various goals, particularly the second level goals. Most of these goals are very hard to measure so I would imagine that knowing whether the goal was achieved or to what extent it was achieved would be hard to determine.

Feedback

I really enjoyed reading this paper. It was great to get more insight into what the designers considerations were when designing the internet. It definitely teaches one to look at it from their perspective before criticizing the design of the Internet. I would agree to keep this paper in the syllabus.