Optimizing N-Tier Application Scalability in the Cloud: A Study of Soft Resource Allocation
We analyse the performance of an incentive scheme for two-hop DTNs in which a backlogged source proposes a fixed reward to the relays to deliver a message. The source proposes only one message at a time. For a given message, only the first relay to deliver it receives the corresponding reward, thereby inducing competition between the relays. The relays seek to maximize the expected reward for each message, whereas the objective of the source is to satisfy a given constraint on the probability of message delivery. We show that the optimal policy of a relay is of threshold type: it accepts a message only up to a first threshold and then keeps the message until it either meets the destination or reaches a second threshold. Formulas for computing the thresholds, as well as the probability of message delivery, are derived for a backlogged source.
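The two-threshold relay policy can be illustrated with a minimal Monte Carlo sketch. All names and parameters below are hypothetical, and exponential relay-to-destination inter-meeting times are assumed for simplicity; this is not the paper's derivation, only an illustration of the policy's structure.

```python
import random

def delivery_probability(tau1, tau2, rate=1.0, trials=10000, seed=42):
    """Estimate the probability that a single relay delivers the message
    under a two-threshold policy (illustrative sketch, hypothetical names).

    The relay accepts the message before the first threshold tau1 and then
    keeps it until it meets the destination or the second threshold tau2
    elapses; inter-meeting times are assumed exponential with rate `rate`.
    """
    rng = random.Random(seed)
    delivered = 0
    for _ in range(trials):
        meet = rng.expovariate(rate)   # time until relay meets destination
        if meet <= tau2 - tau1:        # delivery must occur in the holding window
            delivered += 1
    return delivered / trials
```

Under these assumptions the estimate converges to the closed form 1 - exp(-rate * (tau2 - tau1)), so the simulation mainly serves as a sanity check on threshold choices.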
Big Data systems (e.g., MapReduce, Hadoop, Spark) rely increasingly on speculative execution to mask slow tasks, also known as stragglers, because a job's execution time is dominated by the slowest task instance. Big Data systems typically identify stragglers and speculatively run copies of those tasks with the expectation that a copy may complete faster, shortening job execution times. There is a rich body of recent results on straggler mitigation in MapReduce. However, the majority of these do not consider the problem of accurately detecting stragglers. Instead, they adopt a particular straggler detection approach and then study its effectiveness in terms of performance, e.g., reduction in job completion time, or efficiency, e.g., high resource utilization. In this paper, we consider a complete framework for straggler detection and mitigation. We start with a set of metrics that can be used to characterize and detect stragglers, including Precision, Recall, Detection Latency, Undetected Time and Fake Positive. We then develop an architectural model by which these metrics can be linked to measures of performance, including execution time and system energy overheads. We further conduct a series of experiments to demonstrate which metrics and approaches are more effective in detecting stragglers and are also predictive of effectiveness in terms of performance and energy efficiency. For example, our results indicate that the default Hadoop straggler detector could be made more effective. In certain cases, Precision is low and only 55% of those detected are actual stragglers, and the Recall, i.e., the percentage of actual stragglers detected, is also relatively low at 56%. For the same case, the hierarchical approach (i.e., a green-driven detector based on the default one) achieves a Precision of 99% and a Recall of 29%.
Further, these increases in Precision translate into lower execution time and energy consumption, and thus higher performance and energy efficiency: compared to the default Hadoop mechanism, energy consumption is reduced by almost 31%. These results demonstrate how our framework can offer useful insights and be applied in practical settings to characterize and design new straggler detection mechanisms for MapReduce systems.
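The Precision and Recall figures quoted above follow the standard definitions over sets of detected versus actual stragglers. A minimal helper makes the computation concrete (illustrative code, not the paper's framework; task identifiers are arbitrary):

```python
def detection_metrics(detected, actual):
    """Precision and Recall for a straggler detector (illustrative helper).

    `detected` is the set of task ids the detector flagged as stragglers;
    `actual` is the set of tasks that truly were stragglers.
    """
    detected, actual = set(detected), set(actual)
    tp = len(detected & actual)  # true stragglers that were flagged
    precision = tp / len(detected) if detected else 1.0  # flagged that were real
    recall = tp / len(actual) if actual else 1.0         # real ones that were flagged
    return precision, recall
```

For instance, a detector that flags four tasks of which only two are real stragglers, while two real stragglers go unflagged, scores Precision 0.5 and Recall 0.5, mirroring the kind of low-Precision case reported for the default Hadoop detector.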
Garbage collection (GC) algorithms for flash-based solid state drives (SSDs) have a profound impact on their performance, and many studies have focused on assessing the so-called write amplification of various GC algorithms. In this paper we consider the family of d-choices GC algorithms and study (a) the extent to which these algorithms induce unequal wear and (b) the manner in which they affect the lifetime of the drive. For this purpose we introduce two performance measures: the PE fairness and the SSD endurance. We study the impact of the d-choices GC algorithm on both measures under different workloads (uniform, synthetic and trace-based) when combined with two different write modes. Numerical results show that the less complex of the two write modes, which does not require any form of hot/cold data identification, may give rise to a better SSD endurance. Further, the d-choices GC algorithm is often shown to strike a good balance between garbage collection and wear leveling for small d values (e.g., d=10), yielding high endurance.
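The core of the d-choices policy is a simple victim-selection rule: sample d blocks uniformly at random and erase the one holding the fewest valid pages. The sketch below shows just that rule, stripped of all FTL bookkeeping (block ids and the `valid_counts` map are hypothetical):

```python
import random

def d_choices_victim(valid_counts, d, rng=random):
    """Select a GC victim block under the d-choices policy
    (illustrative sketch; a real FTL tracks far more state).

    `valid_counts` maps block id -> number of valid pages in that block.
    Samples d blocks uniformly at random and returns the one with the
    fewest valid pages, i.e., the cheapest block to garbage-collect.
    """
    candidates = rng.sample(list(valid_counts), d)
    return min(candidates, key=valid_counts.__getitem__)
```

Small d keeps selection cheap and nearly random (which spreads wear), while large d approaches the greedy policy (which minimizes write amplification but wears out cold blocks unevenly); the abstract's observation that small d such as d=10 balances the two fits this trade-off.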
Queueing systems with redundancy have received considerable attention recently. The idea of redundancy is to reduce latency by replicating each incoming job a number of times and assigning these replicas to a set of randomly selected servers. As soon as one replica completes service, the remaining replicas are cancelled. Most prior work on queueing systems with redundancy assumes that the job durations of the different replicas are i.i.d., which yields insights that can be misleading for computer system design. In this paper we develop a differential equation, using the cavity method, to assess the workload and response time distribution in a large homogeneous system with redundancy without the need to rely on this independence assumption. More specifically, we assume that the duration of each replica of a single job is identical across the servers and follows a general service time distribution. Simulation results suggest that the differential equation yields exact results as the system size tends to infinity and can be used to study the stability of the system. We also compare our system to the one with i.i.d. replicas and show the similarity in the analysis used for independent and identical replicas, respectively.
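The identical-replica model can be illustrated with a toy FCFS workload simulation. Because every replica of a job has the same size, the replica placed on the least-loaded of the d sampled servers finishes first; for simplicity the sketch cancels the losing replicas immediately (they add no work), which leaves each job's response time, minimum workload plus service time, unchanged. All parameter names and defaults are illustrative assumptions, not the paper's model or code:

```python
import random

def mean_response_identical_replicas(n_servers=50, d=2, lam=0.4,
                                     n_jobs=20000, seed=1):
    """Toy simulation of redundancy-d with *identical* replicas.

    Each job samples d servers; its replica size S (exponential here,
    though the model above allows general distributions) is the same on
    all of them, so the least-loaded sampled server wins and the other
    replicas are cancelled. Returns the empirical mean response time.
    """
    rng = random.Random(seed)
    work = [0.0] * n_servers          # remaining workload per server
    total = 0.0
    for _ in range(n_jobs):
        dt = rng.expovariate(lam * n_servers)     # inter-arrival time
        work = [max(0.0, w - dt) for w in work]   # servers drain at rate 1
        servers = rng.sample(range(n_servers), d)
        s = rng.expovariate(1.0)                  # identical replica size
        winner = min(servers, key=lambda i: work[i])
        total += work[winner] + s                 # this job's response time
        work[winner] += s
    return total / n_jobs
```

Raising d lowers the minimum workload a job sees on arrival, so mean response time decreases toward the bare service time, a qualitative effect the differential-equation analysis captures exactly in the large-system limit.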