Abstract.
Recent years have seen rapid growth in large-scale analytics applications across a wide range of domains, from healthcare infrastructures to traffic management. The high volume of data that needs to be processed has stimulated the development of special-purpose frameworks that handle the data deluge by parallelizing data processing across multiple computing nodes. These frameworks differ significantly in the policies they follow to decompose their workloads into tasks and in the way they exploit the available computing resources. As a result, the resource utilization and execution time of an application vary significantly depending on the framework in which it is implemented. Therefore, determining the appropriate framework for executing a big data application is not trivial. In this work we propose Orion, a novel resource negotiator for cloud infrastructures that support multiple big data frameworks such as Apache Spark, Apache Flink and TensorFlow. More specifically, given an application, Orion determines the most appropriate framework to assign it to. Additionally, Orion reserves the required resources so that the application is able to meet its performance requirements. Our negotiator exploits state-of-the-art prediction techniques to estimate the application's execution time when it is assigned to a specific framework with varying configuration parameters and processing resources. Finally, our detailed experimental evaluation, using practical big data workloads on our local cluster, illustrates that our approach outperforms its competitors.
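The selection step described in the abstract can be pictured with a small sketch. This is not the paper's algorithm: the `Candidate` type, the `pick_candidate` rule (fewest cores that still meet a deadline) and the `toy_predict` cost model with its constants are all hypothetical, chosen only to show how a predicted execution time could drive the choice of framework and resources.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical (framework, resources) pair a negotiator might consider."""
    framework: str  # e.g. "spark" or "flink"
    cores: int      # processing resources to reserve


def pick_candidate(
    candidates: Iterable[Candidate],
    predict_seconds: Callable[[Candidate], float],
    deadline_seconds: float,
) -> Optional[Candidate]:
    """Return the cheapest candidate whose predicted time meets the deadline.

    Assumed rule: fewest cores wins, ties broken by shorter predicted time.
    Returns None when no candidate is predicted to meet the deadline.
    """
    best: Optional[Candidate] = None
    best_key = None
    for cand in candidates:
        predicted = predict_seconds(cand)
        if predicted > deadline_seconds:
            continue  # predicted to miss the performance requirement
        key = (cand.cores, predicted)
        if best_key is None or key < best_key:
            best, best_key = cand, key
    return best


def toy_predict(cand: Candidate) -> float:
    """Made-up cost model: parallelizable work plus a fixed startup overhead."""
    work = {"spark": 1200.0, "flink": 900.0}[cand.framework]
    overhead = {"spark": 30.0, "flink": 60.0}[cand.framework]
    return work / cand.cores + overhead


candidates = [Candidate(f, c) for f in ("spark", "flink") for c in (4, 8, 16)]
choice = pick_candidate(candidates, toy_predict, deadline_seconds=200.0)
print(choice)
```

In a real system the toy predictor would be replaced by a learned model of execution time, and the selection rule could weigh cost, utilization or other objectives differently.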
BibTeX Entry.
@inproceedings{zacheilas2018orion,
title={{ORiON}: Online {ResOurce} Negotiator for Multiple Big Data Analytics Frameworks},
author={Zacheilas, Nikos and Chalvantzis, Nikolaos and Konstantinou, Ioannis and Kalogeraki, Vana and Koziris, Nectarios},
booktitle={2018 IEEE International Conference on Autonomic Computing (ICAC)},
pages={11--20},
year={2018},
organization={IEEE}
}