Introducing a Reinforcement Learning-Based Method for Determining the Appropriate Time and Number of Shares to Buy

Derhami, Vali; Darezereshki, Fatemeh

doi:10.22034/abmir.2024.21357.1050

Theoretical and Applied Research in Machine Intelligence (پژوهش های نظری و کاربردی هوش ماشینی)
Article 6, Volume 2, Issue 1, Shahrivar 1403 (August-September 2024), pages 92-103
Article type: Research article
Digital Object Identifier (DOI): 10.22034/abmir.2024.21357.1050
Authors
Vali Derhami* ¹; Fatemeh Darezereshki²
¹ Yazd University, Faculty of Computer Engineering
² Yazd University, Faculty of Computer Engineering
Abstract
Price volatility and market uncertainty make determining an optimal stock-purchasing strategy a complex process. Because the conditions of a trade never repeat, learning has to be interactive. Reinforcement learning is an interactive learning method that can tune a system's parameters using only a scalar performance signal. In this paper, the system state is defined as the time step, the total number of shares purchased up to the current step, the standard deviation of the stock price from the first step to the current step, and the change in price relative to the previous step. With this state definition and a suitably defined reinforcement signal, Q-learning, one of the best-known reinforcement learning algorithms, is used to approximate the state-action value function. The stock market is modeled using existing mathematical relations, and the proposed method is applied to it. The strategy obtained from the proposed model is compared with a mean-reversion strategy across 5000 simulated markets. The results show that, compared with the mean-reversion strategy, the proposed model not only achieves a lower average cost but is also considerably more reliable.
Keywords
Stock market; optimizing stock execution costs; reinforcement learning; Q-learning
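For orientation, the standard tabular Q-learning update can be written against the four state components named in the abstract. The symbols below ($n_t$, $\sigma_t$, $\Delta p_t$, $\alpha$, $\gamma$, $r_{t+1}$) are notation assumed here for illustration. The abstract does not give the paper's own notation, reward definition, or hyperparameters.

```latex
% Assumed notation: p_t is the price at step t, a_t the number of shares bought at step t.
\begin{align}
  s_t &= \left(t,\; n_t,\; \sigma_t,\; \Delta p_t\right), \qquad
  n_t = \sum_{k=1}^{t-1} a_k, \quad
  \sigma_t = \operatorname{std}(p_1,\dots,p_t), \quad
  \Delta p_t = p_t - p_{t-1}, \\
  Q(s_t, a_t) &\leftarrow Q(s_t, a_t)
    + \alpha \left[\, r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right].
\end{align}
```

Here $\alpha$ is the learning rate, $\gamma$ the discount factor, and $r_{t+1}$ the reinforcement signal received after buying $a_t$ shares in state $s_t$.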
Article title [English]
A Reinforcement Learning Approach to Determine When and How Many Stocks to Buy in Stock Trading
Authors [English]
Vali Derhami* ¹; Fatemeh Darezereshki²

¹ Yazd University
² Yazd University
Abstract [English]
Due to the volatility and uncertainty inherent in the stock market, devising an optimal trading strategy is a complex endeavor. Because trading circumstances do not repeat, learning through interaction becomes imperative. Reinforcement learning is an interactive learning approach capable of adjusting system parameters based solely on a scalar performance signal. This paper defines the state of the system by the time step, the total number of shares purchased so far, the standard deviation of stock prices from the first step to the current step, and the difference between the current price and the price at the previous step. With a suitably defined reinforcement signal, the paper employs Q-learning, one of the most popular reinforcement learning algorithms, to approximate the state-action value function. The stock market is simulated using a set of equations, and the proposed method is applied to it. Performance is evaluated by comparing the proposed model against a mean-reversion trading strategy across 5000 simulated markets. The experimental results show that the trading strategy derived from the Q-learning model not only yields a lower average cost but is also more reliable than the mean-reversion strategy.
Keywords [English]
Stock Market, Optimizing Execution Costs of Shares, Reinforcement Learning, Q-learning
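The setup described in the abstract can be illustrated with a minimal tabular Q-learning sketch in Python. This is not the authors' implementation. The price simulator (a simple mean-reverting process), the reward (saving relative to the opening price), the discretization of the standard deviation and price change, and every constant (`T`, `TARGET`, `MAX_BUY`, `ALPHA`, `GAMMA`, `EPSILON`) are assumptions chosen only to make the example runnable. Only the four state components come from the abstract.

```python
import random
from collections import defaultdict
from statistics import pstdev

# All constants below are illustrative assumptions, not values from the paper.
T = 10          # decision steps per episode
TARGET = 5      # shares that must be held by the end of the episode
MAX_BUY = 2     # most shares that may be bought in one step (before the last)
ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.1


def simulate_prices(rng, mean=100.0, kappa=0.3, sigma=2.0):
    """Hypothetical mean-reverting price path of length T (not the paper's model)."""
    prices = [mean]
    for _ in range(T - 1):
        last = prices[-1]
        prices.append(last + kappa * (mean - last) + rng.gauss(0.0, sigma))
    return prices


def make_state(t, bought, prices):
    """(step, shares bought so far, bucketed std of prices so far, sign of last move)."""
    std_bucket = min(int(pstdev(prices[: t + 1])), 4)
    change = 0.0 if t == 0 else prices[t] - prices[t - 1]
    return (t, bought, std_bucket, (change > 0) - (change < 0))


def legal_actions(t, bought):
    remaining = TARGET - bought
    if t == T - 1:
        return [remaining]  # the order must be completed at the final step
    return list(range(min(MAX_BUY, remaining) + 1))


def run_episode(Q, rng, learn=True):
    """Play one simulated market; return (total cost paid, shares bought)."""
    prices = simulate_prices(rng)
    bought, cost = 0, 0.0
    state = make_state(0, bought, prices)
    for t in range(T):
        actions = legal_actions(t, bought)
        if learn and rng.random() < EPSILON:
            action = rng.choice(actions)                        # explore
        else:
            action = max(actions, key=lambda a: Q[(state, a)])  # exploit
        # Assumed reward: saving relative to the opening price.
        reward = -(prices[t] - prices[0]) * action
        bought += action
        cost += prices[t] * action
        target = reward
        next_state = None
        if t < T - 1:
            next_state = make_state(t + 1, bought, prices)
            target += GAMMA * max(Q[(next_state, a)]
                                  for a in legal_actions(t + 1, bought))
        if learn:
            # Standard Q-learning update toward the bootstrapped target.
            Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = next_state
    return cost, bought


def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # unseen state-action pairs start at 0
    for _ in range(episodes):
        run_episode(Q, rng, learn=True)
    return Q


if __name__ == "__main__":
    Q = train()
    cost, bought = run_episode(Q, random.Random(123), learn=False)
    print(f"bought {bought} shares for a total cost of {cost:.2f}")
```

Because the total number of shares is fixed, rewarding the saving relative to the opening price ranks strategies the same way as rewarding negative cost, while keeping the Q-values near zero. Comparing the greedy policy with a baseline such as mean reversion would mean running both over the same set of simulated markets, as the abstract describes for 5000 markets.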
