Monday, November 2, 2009

Looking Up Data in P2P Systems

Summary

This paper explores peer to peer systems and looks at the Lookup Problem. The Lookup Problem can be stated as how to find a data item X given that it can be stored on a dynamic set of nodes.

There are 2 broad categories for solutions to this problem. The first are structured solutions. This includes maintaining a central database with filename-location mappings and a DNS-style hierarchical approach. Both of these have resilience problems since they store the data in a few critical nodes. The second category is symmetric schemes, in which every node is equally important to every other node. This category includes continually broadcasting to neighbors until a node has it in its database, which doesn't scale very well and using "superpeers" in a hierarchical structure, which again has resilience problems.

Most of the latest P2P lookup algorithms combine both of these categories to get structured symmetric algorithms which provide guarantees as well as resilience. All of these algorithms use a DHT. The general idea of the algorithm is that the lookup query is forwarded node to node, each time getting "closer" to the destination. "Closeness" can be measured in several ways: numeric differences between IDs, number of common prefix bits, bitwise XOR of the IDs.

Chord uses a skiplist-like routing, Pastry, Tapestry and Kademlia use tree-like routing and CAN uses d-dimensional routing. The authors conclude the paper with the following open questions: distance function, operation costs, fault tolerance and concurrent changes, proximity routing, malicious nodes, and indexing and keyword search.

Criticisms & Questions

I liked this paper. I think it did a good job at explaining the different lookup algorithms and the benefits and tradeoffs of each.

One interesting point they made about malicious nodes was that the worst a malicious node could do is "convincingly deny that data exists". This essentially becomes a DOS attack. Even if it is the worst case, isn't that still a cause for worry?

No comments:

Post a Comment