Predictive and distributed routing balancing, an application-aware approach

CN Castillo and D Lugones and D Franco and E Luque and M Collier, 2013 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, 18, 179-188 (2013).

DOI: 10.1016/j.procs.2013.05.181

The interconnection design in computing clusters and data centers is expected to change significantly in the near future to sustain the increasing communication demand at controlled capital and operational cost. In particular, a shift from typical and expensive full-bisection bandwidth interconnects (which safely cover the worst communication cases) to application-oriented designs (which may provide cost-efficient data movement at larger system scales) is being pursued in academic research and industry initiatives. Information about the communication dynamics of applications (e.g., repetitiveness, computing and communication phases, traffic pattern and bandwidth) allows network resources to be managed and provisioned efficiently at reduced cost. This paper presents an Application-Aware Predictive and Distributed Routing Balancing technique (PR-DRB), a new method that controls network inefficiencies based on the communication patterns of applications and speculative routing. PR-DRB monitors increases in communication latency and then dynamically redistributes the network traffic over multiple paths (path expansion) to deal with load imbalances. Additionally, PR-DRB stores the number of paths used to balance the traffic (the solution) and links it to the application pattern that caused the imbalance (the problem). This information allows PR-DRB to respond to similar situations in repetitive patterns, quickly converging to a stable solution. Evaluation results show latency and completion time reductions of up to 37% for experiments conducted on 64 nodes executing the NAS benchmarks and the LAMMPS application.
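
The monitor, expand, and remember cycle described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `PathExpansionSketch`, the fixed `latency_threshold`, the `max_paths` cap, the one-path-per-observation expansion step, and the use of a set of (source, destination) pairs as the pattern signature are all assumptions made here for illustration. The abstract does not specify any of these details.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Flow = Tuple[int, int]  # hypothetical pattern element: (source node, destination node)


class PathExpansionSketch:
    """Toy model of latency-triggered path expansion with pattern memory."""

    def __init__(self, latency_threshold: float, max_paths: int = 4) -> None:
        self.latency_threshold = latency_threshold  # assumed fixed trigger level
        self.max_paths = max_paths                  # assumed cap on alternative paths
        self.saved: Dict[FrozenSet[Flow], int] = {}   # pattern -> remembered solution
        self.active: Dict[FrozenSet[Flow], int] = {}  # pattern -> paths in use now

    def observe(self, flows: Iterable[Flow], latency: float) -> int:
        """Report a latency sample for a pattern; return the number of paths to use."""
        key = frozenset(flows)
        if key not in self.active:
            # A pattern seen before starts from its remembered solution;
            # an unknown pattern starts on a single path.
            self.active[key] = self.saved.get(key, 1)
        paths = self.active[key]
        if latency > self.latency_threshold and paths < self.max_paths:
            # Latency is too high: open one more path (path expansion).
            paths += 1
            self.active[key] = paths
        elif latency <= self.latency_threshold:
            # Latency is acceptable: link this path count to the pattern.
            self.saved[key] = paths
        return paths

    def end_phase(self, flows: Iterable[Flow]) -> None:
        """Forget the active state when a communication phase ends."""
        self.active.pop(frozenset(flows), None)
```

In this sketch, a pattern that needed three paths the first time it appeared starts directly at three paths when it recurs after `end_phase`, which is the quick convergence on repetitive patterns that the abstract describes.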

