The following is the R code to perform the dividends and SP500 analysis. Since Berkshire Hathaway Inc. is recorded as “BRK.B” in the function of “tq_index”, whereas Tiingo recorded as “BRK-B”. We need to make some changes in order to make sure the R code to correctly pull data from Riingo or Yahoo.
library(tidyverse) library(tidyquant) library(riingo) sp_500_ticker <- tq_index("SP500") %>% arrange(symbol) %>% pull(symbol) sp_500_ticker<-replace(sp_500_ticker, sp_500_ticker=="BF.B", "BF-B") sp_500_ticker<-replace(sp_500_ticker, sp_500_ticker=="BRK.B", "BRK-B") sp_500_ticker_final<-sp_500_ticker
Since we have got the tickers, we can try to get the data from Yahoo Finance. Note that, you need to make sure all the tickers are alphabetically sorted, as these free services may have hourly-limitation. If that is true, an alphabetically sorted list can help you know the pick-up point.
sp_500_2019_2020_Yahoo <- sp_500_ticker_final %>% tq_get(get = "stock.prices", from = "2015-01-01", to = "2020-04-16") %>% arrange(symbol) saveRDS(sp_500_2019_2020_Yahoo, file="sp_500_2019_2020_Yahoo.rds") sp_500_2015_2020_Yahoo_working<-readRDS("sp_500_2019_2020_Yahoo.rds") check_1<-sp_500_2015_2020_Yahoo_working[is.na(sp_500_2015_2020_Yahoo_working$adjusted),] check_1
## # A tibble: 406 x 8 ## symbol date open high low close volume adjusted ## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 HWM 2019-10-18 NA NA NA NA NA NA ## 2 HWM 2019-10-21 NA NA NA NA NA NA ## 3 HWM 2019-10-22 NA NA NA NA NA NA ## 4 HWM 2019-10-23 NA NA NA NA NA NA ## 5 HWM 2019-10-24 NA NA NA NA NA NA ## 6 HWM 2019-10-25 NA NA NA NA NA NA ## 7 HWM 2019-10-28 NA NA NA NA NA NA ## 8 HWM 2019-10-29 NA NA NA NA NA NA ## 9 HWM 2019-10-30 NA NA NA NA NA NA ## 10 HWM 2019-10-31 NA NA NA NA NA NA ## # ... with 396 more rows
table(check_1$symbol)
## ## HWM TT UA ## 113 288 5
Since we run into problem with “HWM”,”TT”, and “UA”, as it seems that there are some NA data from Yahoo finance. We might need to find some data from Riingo. In addition, we want to check whether there are missing dates for some tickers.
length(unique(as.Date(sp_500_2015_2020_Yahoo_working$date)))
## [1] 1330
There are 1330 unique dates. Thus, for tickers that have less than 1330 observations, we should correct them or remove them.
less_than1330<-sp_500_2015_2020_Yahoo_working %>% group_by(symbol) %>% filter (length(unique(as.Date(date)))<1330) table(less_than1330$symbol)
## ## CARR CTVA DOW FOX FOXA FTV HPE HWM IR KHC LW OTIS PYPL UA WRK ## 19 225 271 276 277 952 1130 134 736 1204 861 19 1204 1206 1211
Combining with NA ones, I created an additional list to get the data from Diingo. I excluded CARR, CTVA, DOW,FTV, HPE, IR, LW, OTIS, and WRK, given that their earlist starting time point is after 2015-01-01.
additional_list_tickers<-c("FOX","FOXA","KHC","PYPL","HWM","TT","UA") sp_500_2015_2020_Riingo<- additional_list_tickers%>% riingo_prices(start_date = "2015-01-01", end_date = "2020-4-15") %>% arrange(ticker) %>% mutate(date = ymd(date)) saveRDS(sp_500_2015_2020_Riingo, file="sp_500_2015_2020_Riingo.rds") sp_500_2015_2020_Riingo<-readRDS("sp_500_2015_2020_Riingo.rds") counts_2<-table(sp_500_2015_2020_Riingo$ticker) counts_2
## ## FOX FOXA HWM KHC PYPL TT UA ## 276 277 1330 1204 1194 1330 1013
Thus, in the end, I will only copy “HWM” and “TT” to combine with data from Yahoo, since “FOX”,”FOXA”,”KHC”,”PYPL”, and “UA” are also imcomplete in Tiingo.
target <- c("TT","HWM") sp_500_2015_2020_Riingo_TT_HWM<-sp_500_2015_2020_Riingo %>% select("ticker","date","open","high","low","close","volume","adjClose") %>% filter(ticker %in% target) colnames(sp_500_2015_2020_Riingo_TT_HWM)<-c("symbol","date","open","high","low","close","volume","adjusted") head(sp_500_2015_2020_Riingo_TT_HWM)
## # A tibble: 6 x 8 ## symbol date open high low close volume adjusted ## <chr> <date> <dbl> <dbl> <dbl> <dbl> <int> <dbl> ## 1 HWM 2015-01-02 15.9 16.0 15.7 15.9 10419290 34.0 ## 2 HWM 2015-01-05 15.6 15.6 14.9 15.0 21690561 32.0 ## 3 HWM 2015-01-06 15.0 15.4 14.9 15.1 19374678 32.2 ## 4 HWM 2015-01-07 15.3 15.6 15.2 15.5 15946440 33.1 ## 5 HWM 2015-01-08 15.7 16.0 15.6 15.9 19633147 34.0 ## 6 HWM 2015-01-09 16 16.2 15.8 16.1 16672646 34.5
Remove all the imcomplete tickers from the Yahoo data.
target_2<-c("CARR","CTVA","DOW","FOX","FOXA", "FTV","HPE","HWM", "IR", "KHC", "LW", "OTIS","PYPL","UA","WRK","TT") sp_500_2015_2020_Yahoo_working_deleted_updated<-sp_500_2015_2020_Yahoo_working %>% filter(!symbol %in% target_2)
Combine the data from Yahoo and Riingo.
sp_500_2015_2020_final<-rbind(sp_500_2015_2020_Yahoo_working_deleted_updated, sp_500_2015_2020_Riingo_TT_HWM) sp_500_2015_2020_final<-sp_500_2015_2020_final %>% arrange(symbol)
Next, we want to separate all the stocks into Aristocrats versus non-Aristocrats.
data<-read.csv("Dividend_Aristocrats.csv") DA_tickers<-data %>% pull(Ticker) DA_tickers<-as.character(DA_tickers) DA_tickers<-replace(DA_tickers, DA_tickers=="BF.B", "BF-B") sp_500_2015_2020_final_Aristocrats_OR_NOT<-sp_500_2015_2020_final %>% mutate(aristocrat = ifelse((sp_500_2015_2020_final$symbol %in% DA_tickers),1,0)) sp_500_2015_2020_final_Aristocrats_OR_NOT[,c("symbol","aristocrat")]
## # A tibble: 653,030 x 2 ## symbol aristocrat ## <chr> <dbl> ## 1 A 0 ## 2 A 0 ## 3 A 0 ## 4 A 0 ## 5 A 0 ## 6 A 0 ## 7 A 0 ## 8 A 0 ## 9 A 0 ## 10 A 0 ## # ... with 653,020 more rows
sp_500_aggregate<-aggregate(sp_500_2015_2020_final_Aristocrats_OR_NOT[, "adjusted"],by= list(sp_500_2015_2020_final_Aristocrats_OR_NOT$date, sp_500_2015_2020_final_Aristocrats_OR_NOT$aristocrat), mean) colnames(sp_500_aggregate)<-c("date", "aristocrat", "adjusted")
we can then plot the data using ggplot.
sp_500_aggregate%>% ggplot(aes(x = date, y = adjusted,group=aristocrat, color = as.factor(aristocrat))) +geom_line()+ scale_x_date(breaks = scales::pretty_breaks(n = 5)) + labs(x = "", y = "Adjusted Price") +theme_classic()
sp_500_aggregate%>% ggplot(aes(x = date, y = adjusted,group=aristocrat, color=as.factor(aristocrat))) + geom_line()+ scale_x_date(limit=c(as.Date("2020-01-01"),as.Date("2020-04-15")))+ labs(x = "Year of 2020", y = "Adjusted Price") +theme_classic()
# The list is from https://www.dividend.com/investor-resources/sp-500-companies-that-dont-pay-dividends/ Non_Dividend<-read.csv("non_dividends.csv") Non_Dividend_tickers<-Non_Dividend %>% pull(ï..Ticker) Non_Dividend_tickers<-as.character(Non_Dividend_tickers) Non_Dividend_tickers<-replace(Non_Dividend_tickers, Non_Dividend_tickers=="BF.B", "BF-B")
sp_500_2015_2020_final_Aristocrats_OR_NOT_dividend_ornot<-sp_500_2015_2020_final_Aristocrats_OR_NOT %>% mutate(Dividends = ifelse((sp_500_2015_2020_final_Aristocrats_OR_NOT$symbol %in% Non_Dividend_tickers),0,1))
sp_500_2015_2020_final_Aristocrats_OR_NOT_dividend_ornot_3_groups<-sp_500_2015_2020_final_Aristocrats_OR_NOT_dividend_ornot%>% mutate(threegroups = ifelse(Dividends==0,0,(ifelse(aristocrat==1,2,1))))
sp_500_aggregate_3<-aggregate(sp_500_2015_2020_final_Aristocrats_OR_NOT_dividend_ornot_3_groups[, "adjusted"], by= list(sp_500_2015_2020_final_Aristocrats_OR_NOT_dividend_ornot_3_groups$date, sp_500_2015_2020_final_Aristocrats_OR_NOT_dividend_ornot_3_groups$threegroups), mean)
colnames(sp_500_aggregate_3)<-c("date","threegroups","adjusted") sp_500_aggregate_3%>% ggplot(aes(x = date, y = adjusted,group=as.factor(threegroups), color=as.factor(threegroups))) + geom_line()+ scale_x_date(breaks = scales::pretty_breaks(n = 5))+ labs(x = "", y = "Adjusted Price") +theme_classic()
sp_500_aggregate_3%>% ggplot(aes(x = date, y = adjusted,group=as.factor(threegroups), color=as.factor(threegroups))) + geom_line()+ scale_x_date(limit=c(as.Date("2020-01-01"),as.Date("2020-04-15")))+ labs(x = "Year of 2020", y = "Adjusted Price") +theme_classic()
sp_500_aggregate$base_2015<-ifelse(sp_500_aggregate$aristocrat==0,75.21828,74.84421) sp_500_aggregate<-sp_500_aggregate %>% mutate(percentage_2015=(adjusted-base_2015)/base_2015)
sp_500_aggregate%>% ggplot(aes(x = date, y = percentage_2015,group=aristocrat, color = as.factor(aristocrat))) +geom_line()+ scale_y_continuous(labels = scales::percent, breaks = scales::pretty_breaks(n = 10))+ scale_x_date(breaks = scales::pretty_breaks(n = 5)) + labs(x = "", y = "Return") +theme_classic()
sp_500_aggregate_3 %>% group_by(threegroups) %>%slice(1)
## # A tibble: 3 x 3 ## # Groups: threegroups [3] ## date threegroups adjusted ## <date> <dbl> <dbl> ## 1 2015-01-02 0 123. ## 2 2015-01-02 1 66.8 ## 3 2015-01-02 2 74.8
sp_500_aggregate_3$base_2015<-ifelse(sp_500_aggregate_3$threegroups==0,122.66204,ifelse(sp_500_aggregate_3$threegroups==1,66.80719,74.84421)) sp_500_aggregate_3<-sp_500_aggregate_3 %>% mutate(percentage_2015=(adjusted-base_2015)/base_2015)
sp_500_aggregate_3%>% ggplot(aes(x = date, y = percentage_2015,group=as.factor(threegroups), color=as.factor(threegroups))) + geom_line()+ scale_y_continuous(labels = scales::percent, breaks = scales::pretty_breaks(n = 10))+ scale_x_date(breaks = scales::pretty_breaks(n = 6))+ scale_colour_discrete(name="SP500 Stock Categories", breaks=c("0", "1", "2"), labels=c("No Dividends", "Dividend Non-Aristocrats", "Dividend Aristocrats"))+theme_classic()+ labs(x = "", y = "Return")+ggtitle("SP500 Return Based on Adjusted Close Price (from Jan. 2015)")+theme(legend.position = c(0.2, 0.7))
sp_500_aggregate_3%>% ggplot(aes(x = date, y = percentage_2015,group=as.factor(threegroups), color=as.factor(threegroups))) + geom_line()+ scale_y_continuous(labels = scales::percent, breaks = scales::pretty_breaks(n = 10))+ scale_x_date(limit=c(as.Date("2020-01-01"),as.Date("2020-04-15")),breaks = scales::pretty_breaks(n = 10))+scale_colour_discrete(name="SP500 Stock Categories", breaks=c("0", "1", "2"), labels=c("No Dividends", "Dividend Non-Aristocrats", "Dividend Aristocrats"))+theme_classic()+ labs(x = "Year of 2020", y = "Return")+ggtitle("SP500 Return Based on Adjusted Close Price (from Jan. 2015)")+theme(legend.position = c(0.2, 0.2))
sp_500_aggregate_3_2020<-sp_500_aggregate_3 %>% filter(date > "2019-12-31") sp_500_aggregate_3_2020 %>% group_by(threegroups)%>% slice(1)
## # A tibble: 3 x 5 ## # Groups: threegroups [3] ## date threegroups adjusted base_2015 percentage_2015 ## <date> <dbl> <dbl> <dbl> <dbl> ## 1 2020-01-02 0 258. 123. 1.11 ## 2 2020-01-02 1 124. 66.8 0.857 ## 3 2020-01-02 2 131. 74.8 0.756
sp_500_aggregate_3_2020$base_2020<-ifelse(sp_500_aggregate_3_2020$threegroups==0,258.2466,ifelse(sp_500_aggregate_3_2020$threegroups==1,124.0812,131.4314)) sp_500_aggregate_3_2020<-sp_500_aggregate_3_2020 %>% mutate(percentage_2020=(adjusted-base_2020)/base_2020)
sp_500_aggregate_3_2020%>% ggplot(aes(x = date, y = percentage_2020,group=as.factor(threegroups), color=as.factor(threegroups))) + geom_line()+ scale_y_continuous(labels = scales::percent, breaks = scales::pretty_breaks(n = 10))+ scale_x_date(breaks = scales::pretty_breaks(n = 10))+ scale_colour_discrete(name="SP500 Stock Categories", breaks=c("0", "1", "2"), labels=c("No Dividends", "Dividend Non-Aristocrats", "Dividend Aristocrats"))+theme_classic()+ labs(x = "Year of 2020", y = "Return")+ggtitle("SP500 Return Based on Adjusted Close Price (from Jan. 2020)")+theme(legend.position = c(0.2, 0.2))
Note: R Code and datasets are available on Github.