Cluster Analysis to Preprocess the Building Power Usage Data Without Domain Knowledge

Jongwoo Choi1, Il‑Woo Lee, Suk‑Won Cha#


This paper aims to demonstrate the advantage of applying cluster analysis as a data preprocessing step. The daily power usage of an office building over one year is analyzed. A density-based clustering algorithm is applied to find outliers in the data. The calendar day of each record is mapped onto a circular time domain to account for the seasonality of the power data. Optimal parameters for data normalization and clustering are found by iterative search procedures. The analysis identified many possible outliers even without detailed domain knowledge about the data. Subsequent studies, such as modeling or statistical analyses, can take advantage of the outlier-free data produced by this preprocessing.
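
The two core ideas of the abstract, the circular mapping of the calendar day and density-based outlier detection, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `circular_features` and `dbscan_noise`, the synthetic usage series, and the hand-picked values of `usage_scale`, `eps`, and `min_pts` are all assumptions made for illustration; the paper finds its normalization and clustering parameters by iterative search. The noise rule follows the standard DBSCAN definition: a point is noise if it is not a core point and has no core point within distance `eps`.

```python
import math


def circular_features(day_of_year, usage, days_in_year=365, usage_scale=1.0):
    """Map a (day, usage) record to 3-D: the day on a unit circle plus scaled usage.

    Placing the day on a circle makes 31 December and 1 January neighbors,
    which a linear day index would not do.
    """
    angle = 2.0 * math.pi * (day_of_year - 1) / days_in_year
    return (math.cos(angle), math.sin(angle), usage / usage_scale)


def dbscan_noise(points, eps, min_pts):
    """Return the indices of points that DBSCAN would label as noise."""
    n = len(points)
    # Brute-force eps-neighborhoods; each neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in neighbors]
    # Noise: neither a core point nor within eps of any core point.
    return [
        i for i in range(n)
        if not core[i] and not any(core[j] for j in neighbors[i])
    ]


# Synthetic data (hypothetical, not the paper's data): a smooth seasonal curve.
days = list(range(1, 366))
usage = [100.0 + 20.0 * math.cos(2.0 * math.pi * (d - 1) / 365) for d in days]
usage[199] = 300.0  # inject one anomalous reading on day 200

# usage_scale, eps, and min_pts are chosen by hand here (assumed values).
points = [circular_features(d, u, usage_scale=100.0) for d, u in zip(days, usage)]
noise_days = [days[i] for i in dbscan_noise(points, eps=0.1, min_pts=5)]
print(noise_days)
```

The choice of `usage_scale` controls how strongly a usage deviation counts relative to a gap in calendar time, which is why the normalization and the clustering parameters have to be tuned together.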