ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS)

Latest Articles

Efficiency and Optimality of Largest Deficit First Prioritization: Dynamic User Prioritization for Soft Real-Time Applications

An increasing number of real-time applications with compute and/or communication deadlines are being supported on a shared infrastructure. Such...

CloudHeat: An Efficient Online Market Mechanism for Datacenter Heat Harvesting

Datacenters are major energy consumers and dissipate an enormous amount of waste heat. Simple outdoor discharging of datacenter heat is energy-consuming and environmentally unfriendly. By harvesting datacenter waste heat and selling to the district heating system (DHS), both energy cost compensation and environment protection can be achieved. To...

GPSonflow: Geographic Positioning of Storage for Optimal Nice Flow

This article evaluates the maximum data flow from a sender to a receiver via the internet when all transmissions are scheduled for early morning hours. The significance of early morning hours is that internet congestion is low while users sleep. When the sender and receiver lie in proximal time zones, a direct transmission from sender to receiver...

Searching for a Single Community in a Graph

In standard graph clustering/community detection, one is interested in partitioning the graph into more densely connected subsets of nodes. In contrast, the search problem of this article aims to only find the nodes in a single such community, the target, out of the many communities that may exist. To do so, we are given suitable side information...

System and Architecture Level Characterization of Big Data Applications on Big and Little Core Server Architectures

The rapid growth in data yields challenges to process data efficiently using current...



ACM Transactions on Modeling and Performance Evaluation of Computing Systems (ToMPECS) is a new ACM journal that publishes refereed articles on all aspects of the modeling, analysis, and performance evaluation of computing and communication systems.

The target areas for the application of these performance evaluation methodologies are broad, and include traditional areas such as computer networks, computer systems, storage systems, telecommunication networks, and Web-based systems, as well as new areas such as data centers, green computing/communications, energy grid networks, and on-line social networks.

Issues of the journal will be published on a quarterly basis, appearing both in print form and in the ACM Digital Library. The first issue will likely be released in late 2015 or early 2016.

Forthcoming Articles

Ensuring Persistent Content in Opportunistic Networks via Stochastic Stability Analysis

The emerging device-to-device communication solutions and the abundance of mobile applications and services make opportunistic networking not only a feasible solution, but also an important component of future wireless networks. Specifically, the distribution of locally relevant content could be based on the community of mobile users visiting an area, if long term content survival can be ensured this way. In this paper we establish the conditions of content survival in such opportunistic networks, considering the user mobility patterns, as well as the time users keep forwarding the content, as the controllable system parameter. We demonstrate that a tractable epidemic model adequately characterizes the content spreading process, and derive the necessary user contribution to ensure content survival. We show that the required contribution from the users depends significantly on the size of the population, that users need to redistribute content only in a short period within their stay, and that they can decrease their contribution significantly in crowded areas. Hence, with the appropriate control of the system parameters, opportunistic content sharing can be both reliable and sustainable.

Scale-out vs Scale-up: A Study of ARM-based SoCs on Server-class workloads

ARM 64-bit processing has generated enthusiasm for developing ARM-based servers targeted at both data centers and supercomputers. In addition to the server-class components and hardware advancements, the ARM software environment has grown substantially over the past decade. Major development ecosystems and libraries have been ported and optimized to run on ARM, making ARM suitable for server-class workloads. There are two trends in the ARM SoCs available on the market: mobile-class ARM SoCs rely on heterogeneous integration of a mix of CPU cores, GPGPU streaming multiprocessors (SMs), and other accelerators, whereas server-class SoCs instead rely on integrating a larger number of CPU cores with no GPGPU support and a number of IO accelerators. For scaling the number of processing cores, there are two different paradigms: mobile-class SoCs use a scale-out architecture in the form of a cluster of simpler systems connected by a network, whereas server-class ARM SoCs use the scale-up solution and leverage symmetric multiprocessing (SMP) to pack a large number of cores on the chip. In this work, we present the ScaleSoC cluster, a scale-out solution based on mobile-class ARM SoCs. ScaleSoC leverages fast network connectivity and GPGPU acceleration to improve performance and energy efficiency compared to previous ARM clusters. We consider a wide range of modern server-class parallel workloads, including latency-sensitive transactional workloads, MPI-based CPU and GPGPU-accelerated scientific applications, and emerging artificial intelligence workloads. We study in depth the performance and energy efficiency of ScaleSoC compared to server-class ARM SoCs and discrete GPGPUs for each type of server-class workload. We quantify the network overhead on the performance of ScaleSoC and show that packing a large number of ARM cores on a single chip does not necessarily guarantee better performance, because shared resources such as the last-level cache become the performance bottleneck. We characterize the GPGPU-accelerated workloads and demonstrate that, for applications that can leverage the better CPU-GPGPU balance of the ScaleSoC cluster, both performance and energy efficiency improve compared to discrete GPGPUs. We also analyze the scalability and performance limitations of the proposed ScaleSoC cluster.

Quantifying Cloud Performance and Dependability: Taxonomy, Metric Design, and Emerging Challenges

In the past decade, cloud computing has emerged from a pursuit of a service-driven information and communication technology (ICT) into a significant fraction of the ICT market. Responding to the growth of the market, many alternative cloud services and their underlying systems are currently vying for the attention of cloud users and providers. Thus, benchmarking them is needed, to enable cloud users to make an informed choice, and to enable system DevOps to tune, design, and evaluate their systems. This requires focusing on old and new system properties, possibly leading to the re-design of classic benchmarking metrics, such as expressing performance as throughput and latency (response time), and the design of new, cloud-specific metrics. Addressing this requirement, in this work we focus on four system properties: (i) elasticity of the cloud service, to accommodate large variations in the amount of service requested, (ii) performance isolation between the tenants of shared cloud systems, (iii) availability of cloud services and systems, and (iv) the operational risk of running a production system in a cloud environment. Focusing on key metrics, for each of these properties we review the state of the art, then select or propose new metrics together with measurement approaches. We see the presented metrics as a foundation for upcoming, industry-standard cloud benchmarks.

Mean Field Games in Nudge Systems for Societal Networks

We consider the general problem of resource sharing in societal networks, consisting of interconnected communication, transportation, energy and other networks important to the functioning of society. Participants in such networks need to make decisions daily, on both the quantity of resources to use and the periods of usage. With this in mind, we discuss the problem of incentivizing users to behave in such a way that society as a whole benefits. In order to perceive societal level impact, such incentives may take the form of rewarding users with lottery tickets based on good behavior, and periodically conducting a lottery to translate these tickets into real rewards. We will pose the user decision problem as a mean field game (MFG), and the incentives question as one of trying to select a good mean field equilibrium (MFE). In such a framework, each agent (a participant in the societal network) takes a decision based on an assumed distribution of actions of his/her competitors, and the incentives provided by the social planner. The system is said to be at MFE if the agent's action is a sample drawn from the assumed distribution. We will show the existence of such an MFE under different settings, and also illustrate how to choose an attractive equilibrium, using demand-response in energy networks as an example.

EnergyQARE: QoS-Aware Data Center Participation in Smart Grid Regulation Service Reserve Provision

Power market operators have recently introduced smart grid demand response (DR), in which electricity consumers regulate their power usage following market requirements. DR helps stabilize the grid and enables integrating a larger amount of intermittent renewables. Data centers provide unique opportunities for DR participation due to their flexibility in both workload servicing and power consumption. While prior studies have focused on data center participation in legacy DR programs such as dynamic energy pricing and peak shaving, this paper studies data centers in emerging DR programs, i.e., demand side capacity reserves. Among different types of capacity reserves, regulation service reserves (RSRs) are especially attractive due to their relatively higher value. This paper proposes EnergyQARE, the Energy and Quality-of-Service (QoS) Aware RSR Enabler, which enables data center RSR provision in real-life scenarios. EnergyQARE not only incorporates RSR market constraints, but also adaptively makes decisions based on workload QoS feedback and provides QoS guarantees. It modulates data center power through server power management techniques and server provisioning, handles a heterogeneous set of jobs, and considers transition time delay and energy loss of servers in order to reflect real-life scenarios. In addition, EnergyQARE searches for a near-optimal energy and reserve bidding market strategy in RSR provision. Simulated numerical results demonstrate that in a general data center scenario, EnergyQARE provides close to 50% of data center average power consumption as reserves to the market, and saves up to 44% in data center electricity cost, while still meeting workload QoS constraints. Case studies in this paper show that the percentages of savings are not sensitive to workload types or the size of the data center, although they depend strongly on data center utilization and parameters of server power states.

On the Convergence of the TTL Approximation for an LRU Cache under Independent Stationary Request Processes

In this paper we focus on the LRU cache where requests for distinct contents are described by independent stationary and ergodic processes. We extend a TTL-based approximation of the cache hit probability first proposed by Fagin (1977) for the independence reference model to this more general workload model. We show that under very general conditions this approximation is exact as the cache size and the number of contents go to infinity. Moreover, we establish this not only for the aggregate cache hit probability as in Fagin (1977) but also for every individual content. Last, we obtain a rate of convergence.
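
For readers unfamiliar with the TTL approximation, the following is a minimal sketch of its commonly stated form under the independence reference model, assuming Poisson requests with rate rate_i for content i. In that setting the per-content hit probability is 1 - exp(-rate_i * T), where the characteristic time T is chosen so that the expected number of cached contents equals the cache size. The function names and the bisection solver are illustrative choices, not taken from the article, and the sketch does not cover the more general stationary and ergodic request processes the article studies.

```python
import math

def characteristic_time(rates, cache_size, tol=1e-12):
    """Solve sum_i (1 - exp(-rate_i * T)) = cache_size for T by bisection.

    Assumes Poisson requests (independence reference model) with strictly
    positive rates and 0 < cache_size < number of contents.
    """
    if any(r <= 0 for r in rates):
        raise ValueError("all request rates must be positive")
    if not 0 < cache_size < len(rates):
        raise ValueError("cache_size must lie strictly between 0 and len(rates)")

    def expected_occupancy(t):
        return sum(1.0 - math.exp(-r * t) for r in rates)

    lo, hi = 0.0, 1.0
    while expected_occupancy(hi) < cache_size:  # grow bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if expected_occupancy(mid) < cache_size:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ttl_hit_probabilities(rates, cache_size):
    """Return (T, per-content hit probabilities, aggregate hit probability)."""
    t = characteristic_time(rates, cache_size)
    per_content = [1.0 - math.exp(-r * t) for r in rates]
    # The aggregate hit probability weights each content by its share of requests.
    aggregate = sum(r * h for r, h in zip(rates, per_content)) / sum(rates)
    return t, per_content, aggregate

if __name__ == "__main__":
    zipf_rates = [1.0 / (i + 1) for i in range(100)]  # hypothetical Zipf-like popularity
    t, per_content, aggregate = ttl_hit_probabilities(zipf_rates, cache_size=10)
    print(f"characteristic time: {t:.4f}, aggregate hit probability: {aggregate:.4f}")
```

By construction the per-content hit probabilities sum to the cache size, which is the fixed-point condition that defines T; more popular contents receive higher hit probabilities.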

QMLE: a Methodology for Statistical Inference of Service Demands from Queueing Data

Estimating the demands placed by services on physical resources is an essential step in defining performance models. For example, scalability analysis relies on these parameters to predict queueing delays under increasing loads. In this paper, we investigate maximum likelihood (ML) estimators for demands at load-independent and load-dependent resources in systems with parallelism constraints. We define a likelihood function based on state measurements and derive necessary conditions for its maximization. We then obtain novel estimators that accurately and inexpensively estimate service demands using only aggregate state data. With our approach, confidence intervals can be rigorously derived, explicitly taking into account both topology and concurrency levels of the services. Our estimators and their confidence intervals are validated against simulations and real system measurements for two multi-tier applications, showing high accuracy even in the presence of load-dependent resources.

Efficient Traffic Load-Balancing via Incremental Expansion of Routing Choices

Routing policies play a major role in the performance of communication networks. Backpressure-based adaptive routing algorithms where traffic is load-balanced along different routing paths on a per-packet basis have been studied extensively in the literature. Although backpressure-based algorithms have been shown to be network-wide throughput optimal, they typically have poor delay performance under light or moderate loads because packets may be sent over unnecessarily long routes. Further, backpressure-based algorithms have required every node to compute differential backlogs for every destination queue with the corresponding destination queue at every adjacent node, which is expensive given the large number of possible pairwise differential backlogs between neighbor nodes. In this paper, we propose new backpressure-based adaptive routing algorithms that only use shortest-path routes to destinations when they are sufficient to accommodate the given traffic load, but the proposed algorithms will incrementally expand routing choices as needed to accommodate increasing traffic loads. We show analytically by means of fluid analysis that the proposed algorithms retain network-wide throughput optimality, and we show empirically by means of simulations that our proposed algorithms provide substantial improvements in delay performance. Our evaluations further show that in practice, our approach dramatically reduces the number of pairwise differential backlogs that have to be computed and the amount of corresponding backlog information that has to be exchanged because routing choices are only incrementally expanded as needed.
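
As background for the differential backlogs mentioned in this abstract, here is a minimal sketch of the classic per-node backpressure decision: compare the node's queue for each destination with the corresponding queue at each neighbor, and forward along the pair with the largest positive difference. The function name, the data layout, and the optional `allowed` argument (a hypothetical way to restrict the next hops considered per destination) are assumptions made for illustration; this is not the algorithm proposed in the article.

```python
def best_backpressure_choice(backlog, node, neighbors, allowed=None):
    """Pick the (neighbor, destination) pair with the largest positive differential backlog.

    backlog:   dict mapping node -> {destination: queue length}
    neighbors: iterable of nodes adjacent to `node`
    allowed:   optional dict mapping destination -> set of permitted next hops
               (hypothetical restriction, e.g. to a subset of routes)
    Returns (neighbor, destination, differential) or None if no differential is positive.
    """
    best = None
    for dest, own_queue in backlog[node].items():
        if dest == node:
            continue  # packets addressed to this node are delivered, not forwarded
        for nbr in neighbors:
            if allowed is not None and nbr not in allowed.get(dest, ()):
                continue
            # A missing queue counts as empty (for example, at the destination itself).
            diff = own_queue - backlog.get(nbr, {}).get(dest, 0)
            if diff > 0 and (best is None or diff > best[2]):
                best = (nbr, dest, diff)
    return best

if __name__ == "__main__":
    # Hypothetical three-node example: node "a" with neighbors "b" and "c".
    backlog = {
        "a": {"d": 5, "e": 2},
        "b": {"d": 1, "e": 0},
        "c": {"d": 4, "e": 3},
    }
    print(best_backpressure_choice(backlog, "a", ["b", "c"]))
    print(best_backpressure_choice(backlog, "a", ["b", "c"],
                                   allowed={"d": {"c"}, "e": {"b"}}))
```

The nested loop makes the cost visible: without restrictions, every destination queue is compared against every neighbor, which is the pairwise computation the abstract describes as expensive. Restricting the permitted next hops shrinks the set of differentials that must be evaluated.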
