ACM Transactions on

Modeling and Performance Evaluation of Computing Systems (TOMPECS)

Latest Articles

On the Endurance of the d-Choices Garbage Collection Algorithm for Flash-Based SSDs

Garbage collection (GC) algorithms for flash-based solid-state drives (SSDs) have a profound impact on its performance and many studies have focused... (more)



ACM Transactions on Modeling and Performance Evaluation of Computing Systems (ToMPECS) is a new ACM journal that publishes refereed articles on all aspects of the modeling, analysis, and performance evaluation of computing and communication systems.

The target areas for the application of these performance evaluation methodologies are broad, and include traditional areas such as computer networks, computer systems, storage systems, telecommunication networks, and Web-based systems, as well as new areas such as data centers, green computing/communications, energy grid networks, and on-line social networks.

Issues of the journal will be published on a quarterly basis, appearing both in print form and in the ACM Digital Library. 

Forthcoming Articles
Packet Clustering Introduced by Routers: Modeling, Analysis and Experiments

In this paper, we investigate routers inherent variation on packet processing time and its effect on interpacket delay and packet clustering.We propose a simple pipeline model incorporating the inherent variation, and two metrics, one to measure packet clustering and one to quantify inherent variation. To isolate the effect of the inherent variation, we begin our analysis with no cross traffic and step through setups where the input streams have different data rate, packet size and go through different number of hops. We show that a homogeneous input stream with a sufficiently large interpacket gap will emerge at the routers output with interpacket delays that are negative correlated with adjacent values and have symmetrical distributions. We show that for smaller interpacket gaps, the change in packet clustering is smaller. It is also shown that the degree of packet clustering could in fact decrease for a clustered input. We generalize our results by adding cross traffic. All the results predicted by the model are validated with experiments with real routers. We also discussed several factors that can affect the inherent variation as well as some potential applications of this study.

Scheduling for Optimal File Transfer Delay using Chunked Random Linear Network Coding Broadcast

We study the broadcast transmission of a single file to an arbitrary number of receivers using Random Linear Network Coding (RLNC) in a network with unreliable channels. Due to the increased computational complexity of the decoding process (especially for large files) we apply chunked RLNC (i.e. RLNC is applied within non-overlapping subsets of the file). In our work we show the optimality of the Least Received (LR) batch scheduling policy with regards to the expected file transfer completion time. The $LR$ policy strives to keep the receiver queues balanced. This is done by transmitting packets (corresponding to \textit{encoded} batches) that are needed by the receivers with the shortest queues of successfully received packets. Furthermore, we provide formulas for the expected time for the file transmission to all receivers using the LR batch scheduling policy and the minimum achievable coding window size in the case of a pre-defined delay constraint. Moreover, we evaluate through simulations a modification of the LR policy in a more realistic system setting with reduced feedback from the receivers. Finally, we provide an initial analysis and further modifications to the $LR$ policy for time-correlated channels and asymmetric channels.

Generalization of LRU Cache Replacement Policy with Applications to Video Streaming

Caching plays a crucial role in networking systems to reduce the load on the network and has become an ubiquitous functionality available at each router. One of the commonly used mechanisms, Least Recently Used (LRU), works well for identical file sizes. However, for asymmetric file sizes, the performance deteriorates. This paper proposes an adaptation to LRU strategy, called gLRU, where the file is sub-divided into equal-sized chunks. In this strategy, a chunk of the newly requested file is added in the cache, and a chunk of the least-recently-used file is removed from the cache. Even though approximate analysis for the hit rate has been studied for LRU, the analysis does not extend to gLRU since the metric of interest is no longer the hit rate as the cache has partial files. This paper provides a novel approximation analysis for this policy where the cache may have partial file contents. The approximation approach is validated by simulations. Further, gLRU outperforms LRU strategy for Zipf file popularity distribution and censored Pareto file size distribution for the file download times. Video streaming applications can further use the partial cache contents to help the stall durations significantly, and the numerical results indicate significant improvements (29\%) in stall durations using the gLRU strategy as compared to the LRU strategy.

A New Framework for Evaluating Straggler Detection Mechanisms in MapReduce

Big Data systems (e.g., MapReduce, Hadoop, Spark) rely increasingly on speculative execution to mask slow tasks, also known as stragglers, because a job's execution time is dominated by the slowest task instance. Big Data systems typically identify stragglers and speculatively run copies of those tasks with the expectation a copy may complete faster to shorten job execution times. There is a rich body of recent results on straggler mitigation in MapReduce. However, the majority of these do not consider the problem of accurately detecting stragglers. Instead, they adopt a particular straggler detection approach and then study its effectiveness in terms of performance, e.g., reduction in job completion time, or efficiency, e.g., high resource utilization. In this paper, we consider a complete framework for straggler detection and mitigation. We start with a set of metrics that can be used to characterize and detect stragglers including Precision, Recall, Detection Latency, Undetected Time and Fake Positive. We then develop an architectural model by which these metrics can be linked to measures of performance including execution time and system energy overheads. We further conduct a series of experiments to demonstrate which metrics and approaches are more effective in detecting stragglers and are also predictive of effectiveness in terms performance and energy efficiencies. For example, our results indicate that the default Hadoop straggler detector could be made more effective. In certain cases, Precision is low and only 55% of those detected are actual stragglers and the Recall, i.e., percent of actual detected stragglers, is also relatively low at 56%. For the same case, the hierarchical approach (i.e., a green-driven detector based on the default one) achieves a Precision of 99% and a Recall of 29%. Further, these increases in Precision can be translated to achieve lower execution time and energy consumption, and thus higher performance and energy efficiency; compared to the default Hadoop mechanism, the energy consumption is reduced by almost 31%. These results demonstrate how our framework can offer useful insights and be applied in practical settings to characterize and design new straggler detection mechanisms for MapReduce systems.

Investigating Characteristics of Internet Paths

Interactive and multimedia applications depend on the stability of end-to-end paths for predictable performance and good quality of service. On the other hand, network providers depend on multiple paths to ensure fault tolerance and use load balancing between these paths to enhance the overall network throughput. In this study, we analyze path dynamics for both end-to-end paths and path segments within service providers' networks using two months of measurement data from the RIPE Atlas platform, which collects path traces between a fixed set of source and destination pairs every 15 minutes. We observe that 77.6% of the end-to-end routes have at least two alternative Autonomous System (AS) paths with some end-to-end routes going through hundreds of different AS paths during the two months of analysis. While AS level paths are often stable for a day, there is considerable changes in the routing of the trace packets over the ASes for longer durations of a month or longer. Analyzing end-to-end paths for routing anomalies, we observe that 18% of ASes contain routing loops within their network indicating mis-configuration of routers. Some of the ASes had over 100 routers causing loops in the path traces through their networks. We observe much higher rate of anomalies in the AS level, with 44.9% of path traces containing an AS loop, while 16.4% of end-to-end traces contained a router level loop. Additionally, we discovered that few of the ASes bounce-back packets where some traces through their network traverse routers in both forward and backward directions. Determining path segments belonging to each AS, we further explore ingress to egress paths of ASes in addition to the source to destination paths within the AS. Analyzing trace segments between ingresses and egresses of an AS, we realized more than half of the ASes have the same router level path between any ingress-egress pair for the two months, but others implement load balancing. These results are different from earlier studies that indicated high level of path dynamism. Our results indicate that the end-to-end path dynamism is due to the BGP level rather than the router level within ASes.

All ACM Journals | See Full Journal Index

enter search term and/or author name