The following is a summary of Nick Menzies’ MBA Major Research Project, conducted at the Ted Rogers School of Management at Ryerson University and supervised by Dr. Anatoliy Gruzd at the Social Media Lab. As part of this work, Nick also developed source code, shared on GitHub, that collects data from Twitter and Klout as well as information about stocks.
When trying to pick stocks to invest in, there are many different sources of information that an investor can look at. While typical sources, such as financial information or comments from company executives, can give a very good picture of the current state of a company, any additional information that helps predict the future stock price can help investors when they attempt to maximize their returns. While most information is priced into a stock once it becomes public, social media communications may give a sense of the direction a company is headed before it is priced in.
A number of previously published papers have indicated that social media can help predict future stock prices. Luo et al. (2013) showed that, for a select group of software and hardware firms, social media factors did assist in predicting future stock prices. Bollen et al. (2011) showed that social media sentiment analysis could assist in predicting the Dow Jones Industrial Average. To build on these previous studies, my research looks at the entire Fortune 1000, as opposed to any specific industry or sector, and uses basic social media factors, such as the number of Twitter Friends, as opposed to more complex factors such as sentiment.
To determine whether information from social media can in fact help predict stock prices, I created a program in C# that pulls data on companies in the 2015 Fortune 1000. The program, which can be accessed at https://github.com/9nick9/MRPCode, uses publicly available application programming interfaces (APIs) to gather stock market data, each company’s Twitter profile information (e.g., the number of followers) and an aggregated social media engagement score calculated by Klout, a social media influence scoring platform. The data was collected automatically every weeknight after the North American stock markets closed. In total, the resulting data set contained information about 680 of the Fortune 1000 companies.
Next, I used a statistical technique for comparing time series data, the Granger causality test, to determine whether past social media data combined with past stock market data can predict future stock market data better than past stock market data alone.
The specific data used for the Granger causality test was the changes in social media data and the changes in stock market data, with time gaps of 1 to 4 days. In total there were 2720 cases (680 companies x 4 different time windows: 1, 2, 3, and 4 days). When the company engagement factors (e.g., a company’s Klout score) were used, only 101 of these cases (3.7%) showed Granger causality holding at an acceptable significance level; these are the cases where those factors can be used to better predict stock prices. When the user engagement factors (e.g., the number of a company’s Twitter followers) were used, even fewer cases (52 cases, 1.9%) showed Granger causality holding at a significant level.
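As a rough illustration of the data preparation described above (not the C# program from the study), the sketch below turns two daily series into day-over-day changes, pairs each price change with the preceding changes for a given window, and re-derives the case percentages quoted in the text. The function names and sample values are hypothetical, and treating the window as the number of prior days included is an assumption.

```python
def first_differences(series):
    """Day-over-day changes: series[t] - series[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def lagged_rows(target, predictor, lag):
    """Pair each target change with the previous `lag` changes of both series."""
    rows = []
    for t in range(lag, len(target)):
        own_past = target[t - lag:t]        # past price changes
        other_past = predictor[t - lag:t]   # past social media changes
        rows.append((target[t], own_past, other_past))
    return rows


# Made-up illustrative values, not data from the study.
prices = [50.0, 50.5, 50.2, 51.0, 51.4, 51.1]
followers = [1000, 1010, 1025, 1030, 1050, 1065]

price_changes = first_differences(prices)
follower_changes = first_differences(followers)
rows = lagged_rows(price_changes, follower_changes, lag=2)

# Case counts reported in the text: 680 companies x 4 time windows.
total_cases = 680 * 4
company_share = 101 / total_cases  # company engagement factors
user_share = 52 / total_cases      # user engagement factors
print(total_cases, round(company_share * 100, 1), round(user_share * 100, 1))
```

Each row produced by `lagged_rows` is one observation for a regression of the current price change on its own past and on past social media changes.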
Two of the companies that did have cases where Granger causality held were Wal-Mart and Constellation Brands. Figure 1 shows the adjusted R² for these companies and the resulting F-value. For Granger causality to hold, the ‘company’ and ‘users’ R² must be greater than the ‘price’ R², and the F-value must be larger than the necessary cut-off point, which is determined by the number of degrees of freedom in the data and the F-distribution curve. In the chart, Granger causality can be seen when the price R² bar is shorter than the other bars in the grouping and the F-value is above the bottom of the chart. The only time the F-value does not meet the significance target is with the user activity for Constellation Brands. That is also the only time the price R² is greater than the user activity R², which means that Granger causality does not hold in that situation.
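The comparison described above can be sketched in a few lines of Python. This is a minimal textbook version of a Granger-style F-test, not the analysis code used in the study: it fits a price-only regression and a regression that also includes lagged social media values, then compares their adjusted R² and computes the F-value. All function names and the synthetic data are assumptions made for illustration, and the final comparison of the F-value against a cut-off from the F-distribution is left to a statistics table or library.

```python
import random


def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(X, y):
    """Residual sum of squares of an ordinary least squares fit of y on X."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(X, y))


def adjusted_r2(rss_value, y, n_params):
    """Adjusted R-squared; n_params counts the intercept."""
    n = len(y)
    mean = sum(y) / n
    tss = sum((v - mean) ** 2 for v in y)
    return 1 - (rss_value / (n - n_params)) / (tss / (n - 1))


def granger_f(price, social, lag):
    """Compare a price-only model with one that adds lagged social data."""
    times = range(lag, len(price))
    y = price[lag:]
    restricted = [[1.0] + price[t - lag:t] for t in times]
    unrestricted = [row + social[t - lag:t] for row, t in zip(restricted, times)]
    rss_r, rss_u = rss(restricted, y), rss(unrestricted, y)
    df_num = lag                       # number of added social media terms
    df_den = len(y) - (2 * lag + 1)    # observations minus fitted parameters
    return {
        "f_value": ((rss_r - rss_u) / df_num) / (rss_u / df_den),
        "df_num": df_num,
        "df_den": df_den,
        "r2_price_only": adjusted_r2(rss_r, y, lag + 1),
        "r2_with_social": adjusted_r2(rss_u, y, 2 * lag + 1),
    }


# Synthetic series in which yesterday's social change drives today's price change.
random.seed(0)
social = [random.gauss(0, 1) for _ in range(60)]
price = [0.0] + [0.8 * social[t - 1] + random.gauss(0, 0.1) for t in range(1, 60)]

result = granger_f(price, social, lag=1)
print(result)
```

In this constructed example the social series is built to lead the price series, so the model with social data fits far better than the price-only model. Real data, as the results above show, rarely behaves this cleanly.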
The results show that social media factors may assist in predicting stock market data in only a small percentage of cases. I have yet to determine why social media factors only seem to help predict future stock prices for some companies, but there are a few possibilities. One is that, for companies whose customer base is more active on social media, that activity has a larger influence on the stock price. Another is that only companies with a technologically savvy portion of their customer base will see the increased predictability. While other research has had more success relating social media data to stock prices on a small scale in specific industries, or using sentiment analysis techniques, the current work did not confirm that stock market data can be consistently predicted on a broad scale using simple social media factors.
This study was limited because the data collection took place over a 1-month period. Additionally, the study only looked at large publicly traded companies, and it may be the case that social media has more predictive ability before companies are large enough to appear on the Fortune 1000. Future research should determine which social media factors are best at predicting future company value. Determining how long a lag period is needed between actions on social media platforms and the corresponding changes in stock price would also allow for more efficient analyses in the future.
Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2(1), 1–8. http://doi.org/10.1016/j.jocs.2010.12.007
Luo, X., Zhang, J., & Duan, W. (2013). Social Media and Firm Equity Value. Information Systems Research, 24(1), 146–163.
Photo credit: Anthony Quintano, “A Twitter Banner Draped Over The New York Stock Exchange For Twitter’s IPO.” Retrieved from https://www.flickr.com/photos/quintanomedia/10779578936