Tuesday, September 29, 2009

Understanding TCP Incast Throughput Collapse in Datacenter Networks

Summary

This paper also explored the Incast problem. The authors aim to understand the dynamics of Incast, propose an analytical model for Incast symptoms and evaluate different solutions to Incast.

The authors put forth three criteria for a solution to Incast: (1) it must be general rather than tied to any particular environment, (2) it should be backed by an analytical model that explains the observed Incast symptoms, and (3) it should include modifications to TCP that mitigate the problem.

The authors first replicated the Incast problem in their own environment. They then began by decreasing the minimum RTO (retransmission timeout) timer value. They tried two suggestions made by the authors of the previous paper: use high-resolution timers and turn off delayed ACKs where possible. After trying various combinations of the two, they found that a low-resolution RTO timer of 1 ms with delayed ACKs turned on was optimal.
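To get a feel for why the minimum RTO matters so much here, a rough back-of-the-envelope sketch follows. It is not the paper's model: it simply assumes a sender that loses a packet sits idle for one minimum RTO before retransmitting, and the block size (1 MB) and link speed (1 Gbps) are made-up illustrative values. The 200 ms figure is the common Linux default minimum RTO; the 1 ms figure is the value discussed above.

```python
def goodput_mbps(block_bytes, link_mbps, timeouts, min_rto_s):
    """Goodput of one block transfer that stalls for `timeouts` RTO periods.

    Toy assumption: each timeout adds exactly one minimum RTO of idle time
    on top of the ideal transfer time at line rate.
    """
    bits = block_bytes * 8
    transfer_s = bits / (link_mbps * 1e6)         # ideal time at line rate
    total_s = transfer_s + timeouts * min_rto_s   # plus idle time waiting on RTO
    return bits / total_s / 1e6


if __name__ == "__main__":
    block = 1_000_000   # hypothetical 1 MB block
    link = 1000         # hypothetical 1 Gbps link
    for rto in (0.200, 0.001):
        g = goodput_mbps(block, link, timeouts=1, min_rto_s=rto)
        print(f"min RTO {rto * 1000:.0f} ms -> {g:.1f} Mbps")
```

Under these assumptions a single timeout at 200 ms dominates the 8 ms transfer time, while a 1 ms timeout barely registers, which is the intuition behind reducing the minimum RTO.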

Next, they propose an analytical model that accounts for the shape of the initial goodput collapse and the "recovery" portion, but it still misses the second-order goodput decrease and does not predict the curves for small minimum RTO values.

Criticism & Questions

I enjoyed reading this paper. It was laid out well and easy to follow. I especially liked the graphs that were included - they were very helpful in understanding the results. My one criticism is that some of the graphs were a little hard to read, mostly because the lines were very close together and the symbols were hard to tell apart.
