In my opinion, plotting data is the most important part of any data analysis. Plots are both the essential starting point of an analysis and the end point of the project delivery. We can make or break an argument with powerful graphics. In energy and commodities trading analysis and risk management, information-rich, clean graphics are not only a great discovery tool; they can also work as a powerful communication medium for “decision-engineering” and influencing.

I am a big fan of the ggplot2 package in R. I use ggplot2 graphs standalone, in R Markdown reports, and in Shiny interactive apps. In my line of work, I am usually interested in time-series data to better understand market dynamics. While ggplot2 has offered me a great balance between simplicity and scalability, I am always on the lookout for new tools.

In this post I explore two JavaScript-based visualization packages for typical use in energy market analysis: (1) plotly and (2) dygraphs. Both are built on the htmlwidgets framework and are fairly easy to use.

plotly

First I am going to look at Plot.ly, a very interesting product. As of November 17, 2015, it is fully open source under the MIT license. I looked at it in the past but hesitated to send data to a public cloud. From personal experience, the new approach is great because it allows people like me to develop applications behind the corporate firewall and demonstrate the tool's value proposition to my employer before purchasing a full version. The alternative involved trying to make a business case without a working prototype and sending the purchase request into corporate IT’s black hole. As many of us are painfully aware, this is where good products and ideas go to die.

To start, I want to replicate the ggplot chart from my previous post, Quandl for Energy.

First, as usual, I load the required packages.

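library(devtools)
library(Quandl)
library(dplyr)
library(ggplot2)  # ggplot(), geom_point(), stat_smooth()
library(plotly)   # ggplotly(), plot_ly()
library(xts)      # xts objects used in the dygraphs section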

Then I pull the Natural Gas front-month contract (CHRIS/CME_NG1) from Quandl.

mydata = Quandl("CHRIS/CME_NG1",trim_start="1983-03-30", trim_end="2015-10-20")
head(mydata)

Clean up the data using dplyr.

mydata <- mydata %>%
  dplyr::mutate(Date = as.Date(Date, format = "%Y-%m-%d")) %>%
  dplyr::arrange(Date) %>%                       # oldest observation first
  dplyr::mutate(Year = lubridate::year(Date)) %>%
  dplyr::select(Date, Settle, Year, Volume) %>%
  dplyr::filter(Date > as.Date("2012-01-01", "%Y-%m-%d"))   # keep data after 2012-01-01

Here is a simple plot using ggplot, as in the previous post.

ggplot(data=mydata, aes(x=Settle, y=Volume, color=factor(Year)))+
geom_point(size=0.5)+ggtitle("")+stat_smooth()

Now I plot the same chart using plotly. The result is almost identical, but with the added ability to extract more information by hovering the mouse over the points. This is especially useful when we want to look at some of the outliers in the chart.

p<-ggplot(data=mydata, aes(x=Settle, y=Volume, color=factor(Year)))+
geom_point(size=0.5)+ggtitle("")+stat_smooth()
(gg <- ggplotly(p))

So far so good. I am very impressed! The nice thing about this tool is that I can reuse my ggplot2 code with almost no changes.

The second test is how well I can plot time-series data, add a simple annotation and run a fit through the data. The result below is for Natural Gas front-month settles.

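# plotly >= 4.0 maps data columns with formulas (~Date), not bare column names
p <- plot_ly(mydata, x = ~Date, y = ~Settle, type = "scatter", mode = "lines")
p %>%
  # overlay a LOESS fit of the settles
  add_lines(y = ~fitted(loess(Settle ~ as.numeric(Date)))) %>%
  layout(title = "Front Contract NG Prices",
         showlegend = FALSE) %>%
  # keep only the highest settle and annotate that point
  dplyr::filter(Settle == max(Settle)) %>%
  add_annotations(text = "Winter 2014 Cold Weather", showarrow = TRUE)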
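The two ingredients of this chart, a LOESS smoother and the row holding the maximum settle, can be checked in base R before handing them to plotly. This is a minimal sketch on a synthetic price series (not Quandl data) using the same `loess(Settle ~ as.numeric(Date))` formula; the `span = 0.75` value is simply loess's default, not a tuned choice.

```r
# Synthetic daily "settles": one slow cycle plus noise (illustrative only)
set.seed(42)
dates  <- seq(as.Date("2012-01-03"), by = "day", length.out = 500)
settle <- 3 + 0.8 * sin(seq_len(500) / 80) + rnorm(500, sd = 0.1)
df <- data.frame(Date = dates, Settle = settle)

# loess() needs a numeric predictor, hence as.numeric(Date)
fit <- loess(Settle ~ as.numeric(Date), data = df, span = 0.75)
df$Smooth <- fitted(fit)

# The annotation point is just the row with the highest settle
peak <- df[which.max(df$Settle), ]
print(peak)
```

The smoothed column has one value per observation, so it can be drawn as a second line on the same x axis.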

The third test was to see how well I can plot my data on a geographic map. Andrew A's great post, Making Maps in R, inspired me to give this a try.

My goal is to layer Natural Gas basis pricing points on the map and draw lines representing the pipeline segments between them. I can then layer more information through the size and color of the circles and lines.

The setup for this example is a little more involved, and I am sure there is a more elegant way of doing it.

First, I get the approximate latitude/longitude of some of the compressor stations on the GTN pipeline, which takes Alberta gas to California, and then clean up the data a little.

getGPS <- function(iaddress) {
  # Build the Google Geocoding API request URL for an address.
  # Note: Google now requires an API key; append "&key=<YOUR_KEY>" to the URL.
  getUrl <- function(address) {
    root <- "https://maps.googleapis.com/maps/api/geocode/json?"
    URLencode(paste0(root, "address=", address))
  }
  dat <- RJSONIO::fromJSON(getUrl(iaddress))
  loc <- dat$results[[1]]$geometry$location
  # return numeric coordinates plus the formatted address
  list(lat = as.numeric(loc["lat"]),
       lng = as.numeric(loc["lng"]),
       address = dat$results[[1]]$formatted_address)
}
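Addresses such as "PG&E City Gate" contain characters that are meaningful inside a URL, so it is safer to encode only the address value rather than the whole request string. The helper below is a hypothetical sketch, not part of the original code; it assumes Google's current `maps.googleapis.com` endpoint and a `key` query parameter.

```r
# Hypothetical helper: build a Geocoding API request URL.
# Assumes the current Google endpoint and that an API key is required.
build_geocode_url <- function(address, key) {
  root <- "https://maps.googleapis.com/maps/api/geocode/json"
  # reserved = TRUE also escapes "," and "&" inside the address value
  paste0(root, "?address=", URLencode(address, reserved = TRUE),
         "&key=", URLencode(key, reserved = TRUE))
}

build_geocode_url("Malin, OR", "ABC123")
```

Encoding the whole URL instead would leave the "&" in "PG&E" untouched, and the API would read it as the start of a new query parameter.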
 
Pricing <- data.frame(code    = c("GTN1", "GTN2", "GTN3", "GTN4"),
                      address = c("Calgary, AB", "Stanfield, OR", "Malin, OR", "San Jose, CA"),
                      IFERC   = c("AECO", "Stanfield", "Malin", "PG&E City Gate"),
                      Price   = c(2.5, 2.6, 2.7, 2.8),
                      stringsAsFactors = FALSE)
# Geocode each address once and store the coordinates as numeric columns
gps <- lapply(Pricing$address, getGPS)
Pricing$lat <- vapply(gps, function(g) as.numeric(g$lat), numeric(1))
Pricing$lon <- vapply(gps, function(g) as.numeric(g$lng), numeric(1))

# Pipeline segments: each point connects to the next one downstream
tmp1 <- Pricing[1:3, ] %>% dplyr::select(code, lat, lon) %>%
  dplyr::rename(start_lat = lat, start_lon = lon)
tmp2 <- Pricing[-1, ] %>% dplyr::select(lat, lon) %>%
  dplyr::rename(end_lat = lat, end_lon = lon)
Pipes <- cbind(tmp1, tmp2)
Pipes$id <- seq_len(nrow(Pipes))
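The `Pricing[1:3, ]` / `Pricing[-1, ]` pairing hard-codes four points. Here is a sketch of the same idea that works for any number of points along a pipeline. The function name `make_segments` and the placeholder coordinates are hypothetical and for illustration only.

```r
# Hypothetical helper: pair each point with the next one along the pipeline
make_segments <- function(points) {
  n <- nrow(points)
  data.frame(
    id        = seq_len(n - 1),
    from      = points$code[-n],
    to        = points$code[-1],
    start_lat = points$lat[-n], start_lon = points$lon[-n],
    end_lat   = points$lat[-1], end_lon   = points$lon[-1],
    stringsAsFactors = FALSE
  )
}

# Placeholder coordinates, not real station locations
pts <- data.frame(code = c("GTN1", "GTN2", "GTN3", "GTN4"),
                  lat = c(4, 3, 2, 1), lon = c(-1, -2, -3, -4),
                  stringsAsFactors = FALSE)
make_segments(pts)
```

Dropping the last row gives the segment starts and dropping the first gives the ends, so adding a fifth pricing point requires no code changes.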

Now I can plot the pipeline with my points on it. Size and color let me layer data onto the objects very quickly; they could represent price, liquidity or the risk of a trading position, for example. I can color a pipe segment to show whether it is in the money, and use line width to show the pipe's capacity.

geo <- list(
  scope = 'north america',
  projection = list(type = 'azimuthal equal area'),
  showland = TRUE,
  landcolor = toRGB("gray95"),
  countrycolor = toRGB("gray80")
)

plot_ly(Pricing, lon = lon, lat = lat, text = paste0("Loc:",IFERC," Price: $",Price), type = 'scattergeo',
        locationmode = 'USA-states', marker = list(size = c(20,6,7,10), color = c('blue','red','blue','blue')),
        inherit = FALSE) %>%
    add_trace(lon = list(start_lon, end_lon), lat = list(start_lat, end_lat),
            group = id, data = Pipes,
            mode = 'lines', line = list(width = c(5,2,2), color = c('red','green','green')),
            type = 'scattergeo', locationmode = 'USA-states') %>%
    layout(title = 'Natural Gas Pipeline',
         geo = geo, showlegend = FALSE, height=1000)

It is not as elegant as Andrew A's leaflet example, but it gets the job done very quickly.
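In the map, the marker sizes and line colors are hard-coded vectors. They could instead be derived from the data, as the paragraph before the map suggests. This is a hypothetical sketch: the size range (6 to 20) is arbitrary. "In the money" is taken to mean that the downstream-minus-upstream price spread exceeds an assumed variable transport `cost`, which is not a value from this post.

```r
# Hypothetical helper: linearly rescale prices into a marker-size range
price_to_size <- function(price, min_size = 6, max_size = 20) {
  rng <- range(price)
  if (diff(rng) == 0) return(rep(min_size, length(price)))
  min_size + (price - rng[1]) / diff(rng) * (max_size - min_size)
}

# Hypothetical helper: green if the spread across a segment beats the cost
segment_color <- function(start_price, end_price, cost = 0) {
  ifelse(end_price - start_price > cost, "green", "red")
}

prices <- c(2.5, 2.6, 2.7, 2.8)   # the illustrative prices from the Pricing table
price_to_size(prices)
segment_color(prices[-4], prices[-1], cost = 0.05)
```

The resulting vectors could be passed as `marker = list(size = ...)` and line colors instead of the hand-typed values.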

dygraphs

Now I am going to look at dygraphs for time-series data. The feature that grabbed my attention is that it accepts xts objects directly. I still think xts is the easiest format for time-series data, so I am going to test this out.

Staying with the Natural Gas example and looking at weekly EIA storage numbers, I want to plot the data and forecast working gas in storage over the next year. Nothing fancy: a simple seasonal (Holt-Winters) forecast with upper and lower prediction bounds.
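Before running this on the EIA data, the forecasting step can be tried on a synthetic weekly series with a 52-week cycle. This is a minimal sketch: the series and the placeholder last-observation date are made up. `HoltWinters` and `predict(..., prediction.interval = TRUE)` are the same base R calls used in the code that follows.

```r
# Synthetic weekly storage-like series with an annual (52-week) cycle
set.seed(1)
n <- 52 * 4
bcf <- 2500 + 1000 * sin(2 * pi * seq_len(n) / 52) + rnorm(n, sd = 50)
storage.ts <- ts(bcf, frequency = 52)

# Additive Holt-Winters (level, trend, seasonal), parameters fitted by optim
hw <- HoltWinters(storage.ts)
pred <- predict(hw, n.ahead = 52, prediction.interval = TRUE, level = 0.95)

# Re-date the forecast: one week apart after the last observation
last_date <- as.Date("2015-12-25")   # placeholder last observation date
future <- last_date + 7 * seq_len(52)
head(data.frame(Date = future, pred))
```

The `ts` object only carries a frequency, not calendar dates, which is why the forecast is re-dated from the last observed date before being turned back into xts.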

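GasStorage = Quandl("FLYINGSQRL/WEEKLY_NAT_GAS_UNDERGROUND_STORAGE_LOWER_48_STATES",trim_start="1983-03-30", trim_end="2015-12-30")
names(GasStorage) <- c("Date", "BCF")
GasStorage <- xts(GasStorage$BCF, order.by = GasStorage$Date)   # xts sorts by date

# Weekly series with a 52-week seasonal cycle; `start` only labels the ts,
# the forecast is re-dated from the last xts date below
GasStorage.ts <- ts(as.numeric(GasStorage), frequency = 52, start = c(2013, 9))
hw <- HoltWinters(GasStorage.ts)
# 52-week forecast with columns fit, upr, lwr (95% interval by default)
predicted <- predict(hw, n.ahead = 52, prediction.interval = TRUE)

# Weekly dates following the last observation
d <- last(index(GasStorage)) + lubridate::days(1:52 * 7)
p <- xts(data.frame(predicted), order.by = d)

all <- cbind(GasStorage, p)
names(all) <- c("HS", "fit", "upr", "lwr")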

dygraphs::dygraph(all, "Lower 48 - Natural Gas Storage") %>%
  dygraphs::dySeries("HS", label = "Actual") %>%
  dygraphs::dySeries(c("lwr", "fit", "upr"), label = "Predicted") %>%
  dygraphs::dyRangeSelector(dateWindow = c("2014-01-01", "2016-10-01"))

Overall, I really liked this function. It is very easy to use, and the chaining makes it simple to layer features onto the chart.

The second example is a classic risk management chart: daily price returns, with the ±1 standard deviation band shaded and limit lines at ±5%. The result is a very decent-looking chart. Again, the mouse-over feature is what makes the chart so useful: I don’t need to look at the raw data to find the dates of the outlier price moves.

mydata = Quandl("CHRIS/CME_NG1",trim_start="1983-03-30", trim_end="2015-10-20")
mydata <- mydata %>%
  dplyr::mutate(Date = as.Date(Date, format = "%Y-%m-%d")) %>%
  dplyr::arrange(Date) %>%                       # oldest observation first
  dplyr::mutate(Year = lubridate::year(Date)) %>%
  dplyr::select(Date, Settle, Year, Volume) %>%
  dplyr::filter(Date > as.Date("2012-01-01", "%Y-%m-%d"))   # keep data after 2012-01-01

# Daily log returns (TTR::ROC defaults to type = "continuous")
ret <- TTR::ROC(mydata$Settle)
ret <- xts(ret, order.by = mydata$Date)
names(ret) <- "NG1"
mn  <- mean(ret, na.rm = TRUE)
std <- sd(ret, na.rm = TRUE)

dygraphs::dygraph(ret, main = "Natural Gas Realized Vols", ylab = "Daily log return") %>%
  dygraphs::dySeries("NG1", label = "NG1", strokeWidth = 1, strokePattern = "dashed") %>%
  # shade the mean +/- 1 standard deviation band
  dygraphs::dyShading(from = mn - std, to = mn + std, axis = "y") %>%
  # +/-5% limit lines
  dygraphs::dyLimit(-0.05, color = "red") %>%
  dygraphs::dyLimit(0.05, color = "red")
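The quantities behind this chart are easy to reproduce by hand, which helps when checking which days breach the limits. This is a minimal sketch on a handful of made-up settles. It assumes continuous (log) returns, matching `TTR::ROC`'s default, and the same ±5% limit used above.

```r
# Illustrative settles (not market data) on consecutive days
settle <- c(2.00, 2.02, 2.20, 2.18, 1.90)
dates  <- seq(as.Date("2014-01-02"), by = "day", length.out = 5)

# Log returns; the leading NA mirrors ROC's output for the first observation
ret <- c(NA, diff(log(settle)))

mn  <- mean(ret, na.rm = TRUE)
std <- sd(ret, na.rm = TRUE)
band <- c(lower = mn - std, upper = mn + std)   # the shaded region

# Dates where the daily move breaches the +/-5% limit lines
limit <- 0.05
breaches <- dates[which(abs(ret) > limit)]
print(band)
print(breaches)
```

`which()` silently skips the leading NA, so only real price moves are reported.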

All in all, I am very happy with both the plotly and dygraphs packages. Going forward, I plan to use them extensively in my Shiny apps.