Financial Market Performance Forecasting

Proposal

Analyzing stock market trends to identify future price movements and assess market volatility.

Author

Affiliation

Team Name - Nicholas Tyler, John Moran, Zuleima Cota

College of Information Science, University of Arizona

Goal

Our goal for this project is to assess the financial markets and investigate how various economic conditions affect their performance. We would also like to predict market volatility and future price movements from indicators such as unemployment rates. The insights gained from our analysis may help investors, and anyone looking to better understand stock market performance, see which factors drive market stability and fluctuation so that they can devise informed investment plans.

Dataset

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8597 entries, 0 to 8596
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   dt            8597 non-null   object 
 1   vix           8597 non-null   float64
 2   sp500         8597 non-null   float64
 3   sp500_volume  8597 non-null   float64
 4   djia          8597 non-null   float64
 5   djia_volume   8597 non-null   float64
 6   hsi           8597 non-null   float64
 7   ads           8597 non-null   float64
 8   us3m          8597 non-null   float64
 9   joblessness   8597 non-null   int64  
 10  epu           8597 non-null   float64
 11  GPRD          8597 non-null   float64
 12  prev_day      8597 non-null   float64
dtypes: float64(11), int64(1), object(1)
memory usage: 873.3+ KB

	Year	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec
0	1990	5.4	5.3	5.2	5.4	5.4	5.2	5.5	5.7	5.9	5.9	6.2	6.3
1	1991	6.4	6.6	6.8	6.7	6.9	6.9	6.8	6.9	6.9	7.0	7.0	7.3
2	1992	7.3	7.4	7.4	7.4	7.6	7.8	7.7	7.6	7.6	7.3	7.4	7.4
3	1993	7.3	7.1	7.0	7.1	7.1	7.0	6.9	6.8	6.7	6.8	6.6	6.5
4	1994	6.6	6.6	6.5	6.4	6.1	6.1	6.1	6.0	5.9	5.8	5.6	5.5

The Daily Stocks Data dataset (stocks_data.csv) comes from Kaggle and captures financial market performance from 1990 to 2024. It includes trading volume, macroeconomic indicators, a volatility index, and uncertainty metrics. The dataset is compiled from a variety of historical records, including the Chicago Board Options Exchange, Yahoo Finance and historical finance datasets, the Bureau of Economic Analysis, the Federal Reserve, the Economic Policy Uncertainty Index, and the Global Policy Uncertainty Database.

We ultimately chose this dataset because we wanted to analyze the stock market. Furthermore, this particular dataset had very reliable sources, was in a clean format, and had all the necessary information that we were looking for, so we knew that this was a trustworthy dataset that would set us up well for our analysis.

The dataset has the following columns:
1. dt: Observation Date
2. vix: Volatility Index (VIX)
3. sp500: S&P 500 Index Value
4. sp500_volume: Daily trading volume for the S&P 500.
5. djia: Dow Jones Industrial Average (DJIA)
6. djia_volume: Daily trading volume for the DJIA.
7. hsi: Hang Seng Index
8. ads: Aruoba-Diebold-Scotti (ADS) Business Conditions Index
9. us3m: U.S. Treasury 3-month bond yield
10. joblessness: U.S. unemployment rate, reported as quartiles.
11. epu: Economic Policy Uncertainty Index
12. GPRD: Geopolitical Risk Index (Daily)
13. prev_day: Previous day’s S&P 500 closing value

The unemployment dataset (SeriesReport.csv) comes from the Bureau of Labor Statistics and provides the monthly unemployment rate for the United States from 1990 to 2024. The only unemployment information in the Daily Stocks Data dataset is the quartile that each date falls into, so we felt it was necessary to obtain the actual unemployment rate for the period in which the stock market results occurred. Although it is a simple dataset, it comes from a reliable source and provides the additional information we need for our analysis.
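One way the monthly rates could be attached to the daily rows is sketched below. This is only an illustration under assumptions: the helper names (unemployment_to_long, add_monthly_unemployment), the unemployment_rate column name, and the three sample dates and vix values are hypothetical; the 1990 and 1991 rates are copied from the table preview above.

```python
import pandas as pd

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
MONTH_NUMBER = {name: i + 1 for i, name in enumerate(MONTHS)}


def unemployment_to_long(wide: pd.DataFrame) -> pd.DataFrame:
    """Reshape a Year x Jan..Dec table into one row per (year, month)."""
    long = wide.melt(id_vars="Year", var_name="month_name",
                     value_name="unemployment_rate")
    long["month"] = long["month_name"].map(MONTH_NUMBER)
    long = long.rename(columns={"Year": "year"})
    return (long[["year", "month", "unemployment_rate"]]
            .sort_values(["year", "month"])
            .reset_index(drop=True))


def add_monthly_unemployment(daily: pd.DataFrame, long: pd.DataFrame) -> pd.DataFrame:
    """Left-join the monthly rate onto every daily row via year and month."""
    out = daily.copy()
    out["dt"] = pd.to_datetime(out["dt"])
    out["year"] = out["dt"].dt.year
    out["month"] = out["dt"].dt.month
    return out.merge(long, on=["year", "month"], how="left")


# First two rows of the unemployment table shown in the preview.
wide_rates = pd.DataFrame(
    [[1990, 5.4, 5.3, 5.2, 5.4, 5.4, 5.2, 5.5, 5.7, 5.9, 5.9, 6.2, 6.3],
     [1991, 6.4, 6.6, 6.8, 6.7, 6.9, 6.9, 6.8, 6.9, 6.9, 7.0, 7.0, 7.3]],
    columns=["Year"] + MONTHS,
)

# Hypothetical daily rows; the dates and vix values are made up for illustration.
daily_sample = pd.DataFrame({
    "dt": ["1990-01-02", "1990-02-01", "1991-12-31"],
    "vix": [17.2, 24.9, 19.3],
})

long_rates = unemployment_to_long(wide_rates)
merged = add_monthly_unemployment(daily_sample, long_rates)
print(merged[["dt", "vix", "unemployment_rate"]])
```

Joining on year and month repeats each monthly rate across all trading days in that month, which keeps the daily granularity of the stock data intact.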

Questions

Can we predict the Volatility Index within the financial markets based on various economic factors (e.g., unemployment rates, U.S. Treasury 3-Month Bond Yield)?
What insights can historical stock market data offer to forecast future price movements and assess market volatility?

Analysis plan

For predicting the Volatility Index, we plan to analyze how volatility is affected by various economic variables. The variables that we will use in our models include Unemployment Rate, U.S. Treasury 3-Month Bond Yield Rate, Economic Policy Uncertainty Index, and Geopolitical Risk Index. We chose these variables for the following reasons:

Unemployment Rate: We chose to include this variable because we wanted to analyze how joblessness affects uncertainty in the financial markets. When more people are without a job, we anticipate that this will cause concern among investors, potentially raising the Volatility Index.
U.S. Treasury 3-Month Bond Yield Rate: We chose to include the U.S. Treasury 3-Month Bond Yield Rate because we wanted to analyze how yields on these short-term government-issued securities affect the Volatility Index.
Economic Policy Uncertainty Index: We wanted to investigate how uncertainty about economic policy affects performance within the financial markets. We expect this index to correlate positively with the Volatility Index: if people are uncertain about how policies will affect the economy, investors are likely to become concerned, which raises the Volatility Index.
Geopolitical Risk Index: As with the Economic Policy Uncertainty Index, we wanted to investigate how uncertainty related to geopolitical events affects the financial markets. We expect the Volatility Index to rise as the Geopolitical Risk Index rises, because uncertainty around geopolitical events is likely to carry over into the financial markets.

By comparing several machine learning models on this dataset, we hope to identify the one that most accurately predicts how volatility responds as various economic factors fluctuate. During the model implementation phase, we will compare the performance of ridge regression, lasso regression, random forest, and gradient boosting models. To evaluate our models, we will use adjusted R-squared and mean squared error.
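The comparison described above could look roughly like the following sketch using scikit-learn. It is not our final pipeline: the data is synthetic (four random columns standing in for the unemployment rate, us3m, epu, and GPRD), the hyperparameters are placeholder values, and the adjusted_r2 helper is our own illustration of the standard formula.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Synthetic stand-in data (assumption): 4 predictors and a noisy linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = (20 + 3 * X[:, 0] - 2 * X[:, 1] + 1.5 * X[:, 2] + 0.5 * X[:, 3]
     + rng.normal(scale=1.0, size=400))

# shuffle=False keeps the time order, so the test set is the most recent block.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False)

# Placeholder hyperparameters; linear models get standardized inputs.
models = {
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    r2 = r2_score(y_test, pred)
    results[name] = {
        "mse": mean_squared_error(y_test, pred),
        "adj_r2": adjusted_r2(r2, n_samples=len(y_test), n_features=X.shape[1]),
    }

for name, scores in results.items():
    print(f"{name:18s} MSE={scores['mse']:.3f}  adj R2={scores['adj_r2']:.3f}")
```

Splitting without shuffling is a deliberate choice for time-ordered data: a random split would let the models see observations from after the dates they are asked to predict.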

For forecasting future price movements, we plan to analyze long-term trends, focusing on monthly and yearly patterns derived from the date variable. By examining data aggregated over these longer intervals, we can identify consistent patterns that show how long-term market movements influence future price forecasts and volatility levels. We will accomplish this through time-series analysis of historical stock market data using an ARIMA model. To evaluate our model, we will use AIC and root mean squared error.

Weekly plan

Week of 11/3 - Data retrieval and EDA
- For this week, we will load the datasets and begin exploratory analysis. Tasks include running descriptive statistics on variables, exploring their distributions, checking for normality, and analyzing correlations between variables.
- Nicholas will be responsible for performing these tasks, while John and Zuleima will be responsible for reviewing this code and information following completion of the EDA.
Week of 11/10 - Pre-Processing
- For this week, we will perform pre-processing steps, including handling missing values, standardizing variables, handling skew, dealing with highly correlated variables, performing PCA, and so on.
- John will be responsible for handling missing values and standardizing variables, Zuleima will be responsible for handling skew within the data and dealing with highly correlated variables, and Nicholas will be responsible for performing PCA.
Week of 11/17 - Model Implementation
- For this week, we will begin training our models and searching for the best fitting model.
- Nicholas and John will be responsible for training the model for question #1 (predicting volatility within the financial markets), whereas Zuleima will be responsible for training the model for question #2 (predicting future price movements and assessing market volatility).
Week of 11/24 - Finalize Model Implementation
- This week, we will finalize our model implementation.
- John and Zuleima will be responsible for finalizing model implementation on question #1, whereas Nicholas will be responsible for finalizing model implementation on question #2.
Week of 12/1 - Model Evaluation
- This week, we will evaluate our models and assess model performance.
- Nicholas and Zuleima will evaluate the model for question #1, whereas John will evaluate the model for question #2.
Week of 12/8 - Result Interpretation
- This week, we will interpret the results from our models.
- Zuleima will be responsible for interpreting the model for question #1, whereas Nicholas and John will be responsible for interpreting the model for question #2.
- Halfway through the week, we will switch to ensure that everyone on our team gets the opportunity to interpret the results from each model.
- While evaluating and interpreting the results from our models, we will each work on compiling our analysis within the report to ensure that everyone contributes their feedback on each of the models.
Week of 12/15 - Finalize report/presentation and submit
- This week, we will finalize our report and presentation.