Blind label ratio estimation | ||
| Journal of Statistical Modelling: Theory and Applications | ||
| Volume 6, Issue 1, Farvardin 2025, Pages 105-114 | ||
| Article type: Original Scientific Paper | ||
| Digital Object Identifier (DOI): 10.22034/jsmta.2026.22764.1173 | ||
| Authors | ||
| Javad Kazemitabar* 1; Soheila Rahimi2; Amir Rezaei-Ghadim2 | ||
| 1Department of Electrical and Computer Engineering, Babol Noshirvani University of Technology, Tehran, Iran | ||
| 2Son Corporate Group, Tehran, Iran | ||
| Abstract | ||
| Many anomaly detection algorithms require knowledge of the ratio of the two labels to operate. In real life, however, we may not have access to this value. As a result, we often run anomaly detection packages with default values that may differ significantly from the actual value. Experiments on multiple datasets show that determining this ratio correctly, or at least obtaining a close estimate, can make a significant difference in the final performance of the anomaly detection algorithm. In this paper, we address the problem of estimating this ratio using both theoretical and heuristic techniques. In the theoretical method, we maximize the mutual information between features and labels to find the exact ratio. In the heuristic method, we sweep the [0,1] range in steps of 0.01 to search for the ratio. On each iteration, we run the anomaly detection algorithm with the ratio for that iteration and record the correlation coefficient between the features and the labels generated by the algorithm. After the 100th iteration, we declare the ratio that yields the maximum correlation coefficient to be our estimate of the label ratio. Our experiments on multiple datasets and several anomaly detection algorithms show that maximizing the correlation coefficient leads to the best results. | ||
| Keywords | ||
| Anomaly detection; Correlation coefficient; Mutual information; One-class support vector machine; Spectral ranking of anomalies | ||
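The heuristic sweep described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_detector` is a hypothetical stand-in for a real anomaly detector (the paper mentions one-class SVM and spectral ranking of anomalies), and scoring each ratio by the mean absolute Pearson correlation across features is an assumption, since the abstract does not say how the correlation between multi-dimensional features and labels is aggregated. All function names are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def toy_detector(rows, ratio):
    """Hypothetical stand-in detector: flag the `ratio` fraction of rows
    that lie farthest from the centroid (1 = anomaly, 0 = normal)."""
    n, d = len(rows), len(rows[0])
    center = [sum(r[j] for r in rows) / n for j in range(d)]
    scores = [math.dist(r, center) for r in rows]
    k = round(ratio * n)
    ranked = sorted(range(n), key=scores.__getitem__, reverse=True)
    flagged = set(ranked[:k])
    return [1 if i in flagged else 0 for i in range(n)]


def estimate_ratio(rows, detector=toy_detector, steps=100):
    """Sweep ratio = 1/steps, 2/steps, ..., 1 and keep the ratio whose
    generated labels correlate most strongly with the features."""
    d = len(rows[0])
    best_ratio, best_corr = None, -1.0
    for k in range(1, steps + 1):
        ratio = k / steps
        labels = detector(rows, ratio)
        # Assumption: aggregate by mean absolute correlation over features.
        corr = sum(
            abs(pearson([r[j] for r in rows], labels)) for j in range(d)
        ) / d
        if corr > best_corr:
            best_ratio, best_corr = ratio, corr
    return best_ratio, best_corr
```

Calling `estimate_ratio(rows)` on a list of equal-length feature rows returns the selected ratio together with its correlation score. A real detector can be swapped in through the `detector` argument, provided it accepts `(rows, ratio)` and returns one 0/1 label per row.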
