Wednesday, November 4, 2009

DNS Performance and the Effectiveness of Caching

Summary

This paper presents an analysis of traces of DNS and associated TCP traffic collected at MIT and KAIST, exploring DNS performance as perceived by clients and the effectiveness of caching.

Their data comes from three traces: two from MIT and one from KAIST. Each trace includes outgoing DNS queries, incoming responses, and outgoing TCP connection start and end packets. Most answered A lookups were followed by a TCP connection; those that weren't were excluded from the analysis, as were TCP connections that were not preceded by a DNS query.
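A minimal sketch of that filtering step might look like the following; the record layout and the 60-second pairing window are my assumptions for illustration, not details taken from the paper.

```python
from collections import defaultdict

# Assumed record formats (not the paper's actual trace schema):
# dns_answers: (timestamp, client_ip, resolved_ip) for each answered A lookup
# tcp_syns:    (timestamp, client_ip, dest_ip) for each outgoing TCP SYN

PAIR_WINDOW = 60.0  # seconds; assumed value, the paper's window may differ

def pair_lookups_with_connections(dns_answers, tcp_syns):
    """Keep only answered A lookups that are soon followed by a TCP
    connection from the same client to one of the resolved addresses."""
    syns = defaultdict(list)
    for ts, client, dest in tcp_syns:
        syns[(client, dest)].append(ts)
    for ts_list in syns.values():
        ts_list.sort()

    paired = []
    for ts, client, addr in dns_answers:
        # The lookup is kept if any SYN to the resolved address falls
        # within PAIR_WINDOW seconds after the answer arrived.
        if any(ts <= s <= ts + PAIR_WINDOW for s in syns.get((client, addr), [])):
            paired.append((ts, client, addr))
    return paired
```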

Client-Perceived Performance:

Latency was found to be adversely affected by the number of referrals that take place during a DNS lookup. 70% of the referrals were found to have cache hits for an NS record in the local DNS cache, showing that caching NS records is really beneficial, as it reduces the load on the root servers.
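A toy model of iterative resolution makes the mechanism concrete: once the NS record for a zone is cached, later lookups under that zone skip the referrals from the root and gTLD servers entirely. Everything here (the delegation table, the names) is invented for illustration.

```python
# Invented delegation table: zone -> its authoritative name server.
DELEGATIONS = {".": "a.root-servers.net.",
               "com.": "a.gtld-servers.net.",
               "example.com.": "ns.example.com."}

def zone_suffixes(name):
    """Enclosing zones of a name, shallowest first: ".", "com.", ..."""
    labels = name.rstrip(".").split(".")
    return ["."] + [".".join(labels[i:]) + "." for i in range(len(labels) - 1, -1, -1)]

def resolve(name, ns_cache):
    """Return the number of referrals needed, starting from the deepest
    zone whose NS record is already in the local cache."""
    chain = [z for z in zone_suffixes(name) if z in DELEGATIONS]
    start = max((i for i, z in enumerate(chain) if z in ns_cache), default=0)
    for z in chain:
        ns_cache.add(z)  # cache every NS record learned along the way
    return len(chain) - 1 - start

cache = set()
print(resolve("www.example.com", cache))   # 2 referrals: root -> gTLD -> example.com
print(resolve("mail.example.com", cache))  # 0 referrals: example.com NS is cached
```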

They found a large number of unanswered lookups, which they break into three categories: zero referrals, nonzero referrals, and loops. They suggest that the large volume of unanswered-query traffic is due to the persistent retransmission strategy used by name servers and to the referral loops that exist between name servers. Over 99.9% of answered lookups succeeded within at most 2 retransmissions, and over 90% required none at all. Therefore, they conclude that DNS name servers are far too persistent in their retry strategy.
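Their numbers argue for a bounded retry policy. Here is a sketch of what that might look like; the cap, timeout, and backoff values are my assumptions, not the paper's recommendation.

```python
import socket

def query_with_bounded_retries(sock, query, server, max_retries=2, timeout=2.0):
    """Send a DNS query over UDP, retrying with exponential backoff, then
    give up. Since over 99.9% of answered lookups in the traces needed at
    most 2 retransmissions, a small cap sheds most of the wasted traffic."""
    for attempt in range(max_retries + 1):
        sock.sendto(query, server)
        sock.settimeout(timeout)
        try:
            response, _ = sock.recvfrom(512)
            return response
        except socket.timeout:
            timeout *= 2  # back off before the next attempt
    return None  # unanswered: stop instead of retrying persistently
```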

10-42% of lookups result in a negative answer. The largest cause is inverse lookups for IP addresses with no inverse mappings. They also found that negative caching may not be very effective: the names that elicit negative answers follow a heavy-tailed distribution, so most of them are looked up only once and a cached negative answer is rarely reused. 15-27% of lookups reaching the root name servers resulted in negative responses, most probably caused by incorrectly implemented or configured resolvers.
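A quick way to see why the heavy tail matters: even ignoring TTL expiry, a cached negative answer only helps when the same name fails again, and in a heavy-tailed workload most failing names are seen exactly once. The workload below is made up to illustrate the point.

```python
from collections import Counter

def negative_hit_rate_bound(failed_names):
    """Optimistic bound on negative caching: the first failed lookup of
    each name is an unavoidable miss; every repeat could at best be a hit
    (this ignores TTL expiry, so real hit rates would be lower)."""
    counts = Counter(failed_names)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(failed_names)

# Heavy-tailed toy workload: a couple of names repeat, most appear once.
trace = ["bad-a.example"] * 5 + ["bad-b.example"] * 3 \
        + [f"once-{i}.example" for i in range(92)]
print(f"{negative_hit_rate_bound(trace):.0%}")  # 6%: at most 6 of 100 could hit
```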

Effectiveness of Caching:
- How useful is it to share DNS caching among several client machines?
- What is the impact of the value of TTL on caching effectiveness?

Previous work shows that web object popularity follows a Zipf-like distribution. NS records were found to have much higher TTLs than A records, resulting in many fewer DNS requests to the root and gTLD servers. If NS records weren't cached, the load on the root and gTLD servers would have increased by a factor of 5, making NS-record caching critical to DNS scalability.
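A toy simulation sketches the mechanism behind that factor. All the parameters here (Zipf weights, TTL values, workload size) are invented, so the printed ratio won't match the paper's 5x, only the direction of the effect.

```python
import random

def gtld_queries(lookups, ns_ttl):
    """Count lookups that miss the local NS cache and so must query a
    gTLD server; a cached NS record covers a domain until it expires."""
    expires = {}
    misses = 0
    for t, domain in lookups:
        if t >= expires.get(domain, 0.0):  # NS record absent or expired
            misses += 1
            expires[domain] = t + ns_ttl
    return misses

# Toy workload: Zipf-like popularity over 10,000 .com domains, one day.
random.seed(0)
domains = [f"dom{i}.com" for i in range(10_000)]
weights = [1.0 / (rank + 1) for rank in range(len(domains))]
names = random.choices(domains, weights, k=200_000)
times = sorted(random.uniform(0, 86_400) for _ in range(200_000))
lookups = list(zip(times, names))

no_cache = gtld_queries(lookups, ns_ttl=0)          # every lookup hits the gTLD
cached = gtld_queries(lookups, ns_ttl=2 * 86_400)   # long NS TTL, as in the traces
print(f"gTLD load without NS caching: {no_cache / cached:.1f}x higher")
```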

They found that more popular domain names have shorter TTLs, supporting the idea that DNS-based load-balancing is useful only for popular sites.

They determine that sharing a cache across many clients would not be very helpful. The sites common across clients are the most popular ones, so there is some performance gain there, but most sites are accessed by only one client. Even when a site is accessed multiple times, it is usually the same client accessing it in quick succession, and that client's local cache is adequate.

They find that increasing TTL values increases hit rates, but the improvement is only noticeable for TTL values below roughly 1000 seconds. This is because most cache hits come from multiple successive accesses of the same server by the same client, which fall within even a short TTL. This suggests that the low TTL values used on A records wouldn't harm performance much.
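Both of the last two findings can be illustrated with one toy cache simulation: hit rate as a function of the TTL, and as a function of how many clients share a cache. The workload below is fabricated to mimic the bursty per-client access pattern the paper describes, so the printed numbers are only directional.

```python
import random

def hit_rate(trace, ttl, clients_per_cache=1):
    """Cache hit rate when clients are pooled into groups that share one
    cache (clients_per_cache=1 means a private cache per client). A hit
    does not refresh the entry; expiry is set when the record is fetched."""
    expires = {}
    hits = 0
    for t, client, name in sorted(trace):
        key = (client // clients_per_cache, name)
        if t < expires.get(key, 0.0):
            hits += 1
        else:
            expires[key] = t + ttl  # miss: fetch the record, cache until t + ttl
    return hits / len(trace)

# Fabricated workload: each client hits a site a few times in a short burst,
# the pattern the paper says accounts for most cache hits.
random.seed(1)
trace = []
for _ in range(20_000):
    t0, client = random.uniform(0, 86_400), random.randrange(100)
    name = f"site{random.randrange(5_000)}.com"
    for k in range(random.randrange(1, 6)):        # burst of 1-5 accesses
        trace.append((t0 + 30 * k, client, name))  # ~30 s apart

for ttl in (60, 300, 1_000, 10_000, 86_400):
    # gains flatten once the TTL covers a burst; longer TTLs add little
    print(f"TTL {ttl:>6}: {hit_rate(trace, ttl):.3f}")
for group in (1, 25, 100):
    # at a realistic TTL, pooling clients barely moves the hit rate
    print(f"{group:>3} clients/cache: {hit_rate(trace, 1_000, group):.3f}")
```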

Criticism & Questions

Another interesting paper. I think it did a good job of evaluating many of the features that were explained in the first paper. Their traces seemed like a good representation of a real-world environment, and most of their conclusions followed logically from the data they presented.

Most of the graphs were helpful in understanding their data, but some were really hard to make out. One thing that was especially confusing was that Table II came before Table I.
