Indexed in:
Abstract:
Trace analysis plays an important role in datacenter performance optimization. However, because of errors in trace collection tasks and their low execution priority, modern datacenter traces suffer from a serious missing-data problem. Previous work recovers missing trace data with statistical imputation methods. Such methods, however, either fill missing entries with fixed values or require users to specify the relationship model among trace attributes; neither approach is feasible or accurate given two characteristics of missing data in datacenter traces: data sparsity and complex correlations among trace attributes. To address this, we focus on a trace released by Alibaba and propose a tensor-based trace data recovery model that enables efficient and accurate recovery of large-scale, sparse datacenter traces. The model consists of two main phases. First, data discretization and attribute selection are combined to select the trace attributes that correlate strongly with the attribute containing missing values. Then, a tensor is constructed and its missing values are recovered with a CANDECOMP/PARAFAC (CP) decomposition-based tensor completion method. Experimental results show that our model achieves higher accuracy than six statistical or machine-learning-based methods.
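The abstract does not spell out the completion algorithm. As a rough illustration of what CP decomposition-based tensor completion means, the following is a minimal sketch, assuming a dense 3-way NumPy tensor with a boolean observation mask and an EM-style alternating least squares (ALS) fitting scheme. The function names `cp_complete` and `cp_reconstruct`, the rank, the ridge term, and the iteration count are hypothetical choices, not the authors' implementation, and the attribute-selection phase is not shown.

```python
import numpy as np


def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CP factors: sum over r of a_r (outer) b_r (outer) c_r."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def cp_complete(X, mask, rank=2, n_iters=500, ridge=1e-6, seed=0):
    """Fill the missing entries of a 3-way tensor with a rank-`rank` CP model.

    Illustrative sketch (assumption): EM-style alternating least squares,
    not the method from the paper.

    X    : float array of shape (I, J, K); entries where mask is False are ignored.
    mask : bool array of the same shape, True where a value was observed.
    Returns the completed tensor and the factor matrices (A, B, C).
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    reg = ridge * np.eye(rank)  # small ridge term keeps each solve well posed

    # Initial guess: missing cells take the mean of the observed values.
    filled = np.where(mask, X, X[mask].mean())

    for _ in range(n_iters):
        # ALS sweep: solve for each factor with the other two held fixed.
        # The einsum is the (transposed) MTTKRP; the Hadamard product of the
        # other factors' Gram matrices is the normal-equation matrix.
        A = np.linalg.solve((B.T @ B) * (C.T @ C) + reg,
                            np.einsum("ijk,jr,kr->ri", filled, B, C)).T
        B = np.linalg.solve((A.T @ A) * (C.T @ C) + reg,
                            np.einsum("ijk,ir,kr->rj", filled, A, C)).T
        C = np.linalg.solve((A.T @ A) * (B.T @ B) + reg,
                            np.einsum("ijk,ir,jr->rk", filled, A, B)).T
        # Re-impute: observed cells keep their values, missing cells take
        # the current low-rank reconstruction.
        filled = np.where(mask, X, cp_reconstruct(A, B, C))

    return filled, (A, B, C)


if __name__ == "__main__":
    # Synthetic example: a rank-2 tensor (e.g. machine x time x attribute)
    # with roughly 20% of its cells hidden.
    rng = np.random.default_rng(1)
    truth = cp_reconstruct(*(rng.standard_normal((n, 2)) for n in (10, 8, 6)))
    observed = rng.random(truth.shape) > 0.2
    X = np.where(observed, truth, np.nan)
    completed, _ = cp_complete(X, observed, rank=2)
    rmse = np.sqrt(np.mean((completed[~observed] - truth[~observed]) ** 2))
    print(f"RMSE on hidden cells: {rmse:.4f}")
```

Alternating the least-squares factor updates with re-imputation of the missing cells is one of the simplest ways to fit a CP model to incomplete data; formulations that weight the loss so that only observed entries are fitted, or that use gradient-based optimization, are common alternatives for very sparse tensors.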
Keywords:
Corresponding author: