Thursday, November 19, 2009

A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing

Summary

This paper describes Scalable Reliable Multicast (SRM), an application-level reliable multicast framework. SRM draws on the principles of Application Level Framing (ALF) and Light-Weight Sessions (LWS), the group delivery model of IP multicast, and the core design principles of TCP/IP: best-effort delivery, end-to-end reliability, and adaptive parameters.

In a sender-based approach like regular TCP, the sender is responsible for ensuring reliable data delivery. This requires receiving ACKs or NACKs from receivers and maintaining reception state for each of them. With multicast, every receiver would report back to the same sender, which could cause an ACK implosion at the sender. In addition, the IP multicast model imposes a level of indirection between the sender and the receivers, making it very difficult for the sender to maintain state for all the receivers.

In the wb (whiteboard) framework, receivers are responsible for reliability. After a sender multicasts a packet to the group, a receiver that detects it is missing data waits for a random time, scaled by its distance to the sender, and then multicasts a repair request to the group. Because the request goes to the whole group, any host that has the packet can multicast the repair, and all hosts missing the packet will hear it and cancel their own repair request timers. This reduces the number of repair requests and repairs, and thus avoids a request and response implosion.
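The suppression effect can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's simulator: it assumes every receiver detects the loss at time 0, draws each request timer uniformly from [c1*d, (c1+c2)*d] where d is the receiver's delay to the sender, and simply drops a suppressed request instead of backing off and rescheduling it. The function names and the default values of `c1` and `c2` are hypothetical.

```python
import random

def request_delay(distance, c1=2.0, c2=2.0, rng=random):
    """Draw a request timer uniformly from [c1*d, (c1+c2)*d] (assumed rule)."""
    return rng.uniform(c1 * distance, (c1 + c2) * distance)

def simulate_requests(dist_to_sender, dist_between, c1=2.0, c2=2.0, seed=0):
    """Return the hosts that actually multicast a repair request.

    dist_to_sender: {host: one-way delay from host to the original sender}
    dist_between:   function (a, b) -> one-way delay between two receivers
    """
    rng = random.Random(seed)
    timers = {h: request_delay(d, c1, c2, rng) for h, d in dist_to_sender.items()}
    requesters = []
    # Visit hosts in the order their timers expire.
    for host in sorted(timers, key=timers.get):
        # A host is suppressed if an earlier request arrives before its timer fires.
        heard = any(timers[r] + dist_between(r, host) <= timers[host]
                    for r in requesters)
        if not heard:
            requesters.append(host)
    return requesters

# Chain: receivers 1, 2, 3 sit at increasing distance from the sender.
chain = {1: 1.0, 2: 2.0, 3: 3.0}
print(simulate_requests(chain, lambda a, b: abs(a - b), c1=2.0, c2=0.0))

# Star: every receiver is equally far from the sender and 2 units from each other.
star = {"a": 1.0, "b": 1.0, "c": 1.0}
print(simulate_requests(star, lambda a, b: 2.0, c1=2.0, c2=0.0))
```

With `c2=0` the timers are deterministic. In the chain, the closest receiver's request reaches the others before their timers fire, so only one request is sent. In the star, all timers expire together and every receiver sends a request, which is why the random component matters there.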

The authors simulate this framework on several different node topologies, including chains and stars. They found that the request/repair scheme worked well for bounded-degree trees where every node participates in the multicast session, but less well for sparse sessions. It also works less well when the degree bound is very high.

A suggested optimization is local recovery, which means setting a TTL on a repair request packet so that only the neighborhood affected by the loss sees it.
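A rough way to picture the effect of a TTL-scoped request is to count which hosts lie within a given number of hops of the requester. The sketch below assumes a simplified model in which the TTL is a plain hop count on an undirected graph; `hosts_reached` and the example topology are hypothetical and not taken from the paper.

```python
from collections import deque

def hosts_reached(adjacency, origin, ttl):
    """Return the hosts within `ttl` hops of `origin`, using breadth-first search."""
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if hops[node] == ttl:
            continue  # TTL exhausted: the packet is not forwarded any further
        for neighbor in adjacency[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return set(hops) - {origin}

# A five-node chain: a - b - c - d - e
chain = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
print(hosts_reached(chain, "c", 1))  # only the immediate neighbors of c
print(hosts_reached(chain, "c", 2))  # the whole chain except c itself
```

A small TTL keeps the request inside the neighborhood of the loss, at the risk of not reaching any host that still holds the packet.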

Criticisms & Questions

The idea proposed in this paper is simple but very clever. It uses the properties of multicast to reduce the number of packets in the network, and could exploit even more of them for further optimization. I think the simulations had good results, but I would like to see SRM implemented in a real-world environment to see how it performs.
