A Novel Approach for Data Alignment in Infrared and RGB Image Fusion Algorithms | ||
| Theoretical and Applied Research in Machine Intelligence (پژوهش های نظری و کاربردی هوش ماشینی) | ||
| Article 3, Volume 3, Issue 1, Shahrivar 1404 (August/September 2025), pages 35-50 | ||
| Article type: Research article | ||
| Digital Object Identifier (DOI): 10.22034/abmir.2025.22795.1101 | ||
| Authors | ||
| Raziyeh Razavi 1; Reza Rohani Sarvestani* 2 | ||
| 1 MSc, Department of Computer Engineering, Faculty of Technology and Engineering, Shahrekord University, Shahrekord, Iran | ||
| 2 Assistant Professor, Department of Computer Engineering, Faculty of Technology and Engineering, Shahrekord University, Shahrekord, Iran | ||
| Abstract | ||
| In recent years, human action recognition has become a key topic in computer vision. A central challenge in this area is extracting features that are effective enough to improve recognition accuracy. Infrared and RGB video data are commonly used for this purpose, yet neither modality alone gives a complete representation of the scene, so combining them can yield more accurate features. Information fusion techniques are an effective way to achieve this. However, most human action recognition datasets are not standardized for fusion, and their infrared and RGB data are not properly aligned with each other. In this study, the NTU RGB+D dataset is used, and a method is proposed for aligning and cropping its video data so that infrared and RGB video frames can be fused. The method relies on inverse-problem techniques and the body joint coordinates available in the dataset. The proposed approach is evaluated with the entropy (EN), mutual information (MI), structural similarity (SSIM), and multi-scale structural similarity (MS-SSIM) metrics. The obtained values of EN (7.17) and MI (13.1) indicate maximum information transfer and overlap, while the SSIM (0.78) and MS-SSIM (0.84) values indicate that structure is preserved and that the fused data are of high quality. These results support the effectiveness of the proposed method in improving video data fusion. | ||
| Keywords | ||
| Image Fusion; Human Action Recognition; Data Alignment | ||
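The abstract reports entropy (EN) and mutual information (MI) scores without giving their definitions on this page. The sketch below shows the standard Shannon forms of both metrics for flat sequences of pixel intensities. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and summing the MI of each source image with the fused image in `fusion_mi` is a common convention in the fusion literature that the paper may or may not follow.

```python
from collections import Counter
from math import log2


def entropy(pixels):
    """Shannon entropy, in bits, of a flat sequence of pixel intensities."""
    counts = Counter(pixels)
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(a, b):
    """Mutual information, in bits, between two equally sized pixel sequences."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    n = len(a)
    joint = Counter(zip(a, b))       # joint histogram of intensity pairs
    pa, pb = Counter(a), Counter(b)  # marginal histograms
    # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
    return sum(
        (c / n) * log2(c * n / (pa[x] * pb[y]))
        for (x, y), c in joint.items()
    )


def fusion_mi(infrared, visible, fused):
    """Assumed convention: MI(infrared, fused) + MI(visible, fused)."""
    return mutual_information(infrared, fused) + mutual_information(visible, fused)
```

For 8-bit images the entropy is bounded above by 8 bits, which gives a scale for reading an EN score. Both functions expect frames that are already aligned and cropped to the same size, which is the precondition the proposed alignment method is meant to provide.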
