Analysis of Transmission Control Protocol Incast over Large-scale HPC Clusters
Abstract
The lifecycle of large-scale applications executing on High-Performance Computing (HPC) clusters involves heavy use of the Transmission Control Protocol (TCP) to orchestrate job completion across multiple compute resources. Because HPC clusters rely on extensive local area network communication to distribute jobs over compute and data nodes, the core network fabric of the cluster carries heavy workloads of TCP sessions, which cause an above-average number of packet drop events. The result is poor TCP throughput, which reduces the overall performance indices of the cluster. In this article, we analyze TCP behavior at nominal, average, and heavy transmission load in a cluster environment to assess various alternatives for solving the problem. We also analyze the cumulative queuing behavior of multiple TCP sessions at the contention switch and use a fine-grained configuration at the network fabric to improve TCP performance. The simulation results show that the smaller set of data flows suffers a significant throughput collapse. The performance of the tested TCP variants indicates that the congestion control mechanism of these protocols plays a significant role in performance degradation, and that a scalable solution is needed to improve TCP performance indices. Different versions of TCP are employed for an HPC compute cluster and data storage to address the TCP Incast problem, and simple solutions are presented. We observe that none of the classical or newer TCP variants performs consistently under heavy fan-in workload, but that a better queue management system at the network fabric greatly simplifies the problem and improves cluster performance.
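The fan-in effect described in the abstract can be illustrated with a deliberately simplified sketch. It is not the simulation used in this article: the function name `drop_fraction` and all parameter values are hypothetical, and it models only a single finite switch queue fed by synchronized senders, with no TCP retransmission, timeout, or congestion control.

```python
def drop_fraction(num_senders, window, rounds, buffer_packets, drain_per_round):
    """Toy fan-in model (an assumption, not the article's simulator).

    Each round, every sender offers `window` packets to one switch output
    queue that holds at most `buffer_packets` and forwards up to
    `drain_per_round` packets per round. Packets that do not fit are dropped
    and never retransmitted.
    """
    queue = 0
    offered = 0
    dropped = 0
    for _ in range(rounds):
        arrivals = num_senders * window
        offered += arrivals
        # Accept only what fits in the remaining buffer space.
        accepted = min(arrivals, buffer_packets - queue)
        dropped += arrivals - accepted
        queue += accepted
        # The output port forwards a fixed number of packets per round.
        queue -= min(queue, drain_per_round)
    return dropped / offered


if __name__ == "__main__":
    # Hypothetical values chosen only to show the trend.
    for senders in (4, 8, 32):
        loss = drop_fraction(senders, window=8, rounds=10,
                             buffer_packets=64, drain_per_round=32)
        print(f"{senders} senders: drop fraction {loss:.4f}")
```

Under these assumed values, the drop fraction is zero while the aggregate offered load matches the drain rate and rises as more senders converge on the same port, which is the qualitative behavior the abstract attributes to heavy fan-in workloads.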