Unleashing WebSockets to better understand Bitcoin

Back when Bitcoin was all the rage, I promised myself never to jump on the bandwagon. I gave in after a few days and bought a hundred dollars’ worth of Ethereum. That doubled in a month and I rewarded my borderline gambling with Trappist beer. Can’t say it was an honest day’s work. But it felt good.

Photo of me after selling my Cryptocurrencies in January 2018

Recently, a colleague mentioned WebSockets, a protocol that lets a server push ‘alerts’ of incoming data to a client in real time, as a way to analyze cryptocurrencies as they trade. This piqued my curiosity.

Introduction

WebSockets are a truly fascinating way to ingest data. Traditionally, a server sends data only in response to a client’s request. With WebSockets, an initial request/response handshake opens a persistent channel between the client and server, and either side can then send messages whenever it likes. In our example, the server sends a continuous stream of data until the channel is explicitly closed.
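To make that initial request/response a little more concrete: in the WebSocket protocol (RFC 6455), the client sends a random Sec-WebSocket-Key header with its upgrade request, and the server shows that it understood by replying with a Sec-WebSocket-Accept value derived from that key. A library such as Lomond takes care of this for you. The sketch below only illustrates that one step, and the function name accept_key is my own invention, not part of any library.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(client_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# Sample key taken from the example handshake in RFC 6455
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Once both sides agree on this value, the HTTP connection is upgraded and stays open for messages in either direction.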

Configuring a Websocket

Having little prior experience with Python, I relied on many tutorials from more gifted individuals. This post, written by Will McGugan, was especially useful as it explained concisely how to configure a WebSocket with the Lomond library. I adapted his code to add a counter that would close the connection after successfully ‘capturing’ a certain number of records. For new users like myself, I’d also recommend skimming through the documentation for Lomond to better understand the architecture of WebSockets at a high level.

Since evolving to become Coinbase Pro, GDAX has re-engineered itself to be much more user-friendly

Our data comes from GDAX, a Bitcoin exchange. GDAX was the first licensed U.S. Bitcoin exchange. Today, it is part of Coinbase’s suite of offerings, though more tailored towards professional traders.

1. The first step was to import all the necessary libraries.
#import libraries
from lomond import WebSocket
from datetime import datetime, timedelta
from plotly.subplots import make_subplots
import pandas as pd
import plotly.graph_objects as go

2. Then I set up a WebSocket to GDAX.

#create a websocket to gdax.com
websocket = WebSocket('wss://ws-feed.gdax.com')
counter = 0
price = []
time = []
lastSize = []

#when you iterate over the websocket instance, it generates a stream of event objects
for event in websocket:
    #iterating over the websocket automatically calls the connect method
    #ready indicates successful connection
    if event.name == "ready":
        #send the subscription request, encoded as JSON
        websocket.send_json(
            type='subscribe',
            product_ids=['BTC-USD'],
            channels=['ticker']
        )
    #when data arrives from the server
    elif event.name == "text":
        counter = counter + 1
        if counter >= 100:
            #after 100 messages, initiate websocket close handshake
            print(datetime.now().timestamp())
            websocket.close()
        elif counter >= 10:
            #the first 9 messages are skipped; after that, record each ticker update
            eventDict = event.json
            a = eventDict['price']
            b = datetime.strptime(eventDict['time'],"%Y-%m-%dT%H:%M:%S.%fZ")
            c = float(eventDict['last_size'])
            price.append(a)
            time.append(b)
            lastSize.append(c)
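The loop above decides which messages to keep by counting them. An alternative, sketched here under the assumption that ticker updates are tagged with a type field equal to 'ticker', is to filter on the message contents instead. The helper parse_ticker and the sample message are hypothetical and only mirror the fields used above (price, time, last_size); I have not checked them against the exchange's documentation.

```python
from datetime import datetime

def parse_ticker(message):
    """Return (time, price, last_size) for a ticker message, or None otherwise."""
    # Assumption: ticker updates carry type == 'ticker'; anything else
    # (such as a subscription acknowledgement) is ignored.
    if message.get("type") != "ticker":
        return None
    # Skip messages that lack the trade fields rather than raise a KeyError
    if not all(key in message for key in ("time", "price", "last_size")):
        return None
    when = datetime.strptime(message["time"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return when, float(message["price"]), float(message["last_size"])

# Hypothetical sample message, shaped like the fields used in the loop above
sample = {
    "type": "ticker",
    "product_id": "BTC-USD",
    "price": "6500.25",
    "last_size": "0.015",
    "time": "2018-10-01T12:00:03.250000Z",
}
print(parse_ticker(sample))
print(parse_ticker({"type": "subscriptions"}))  # None
```

Inside the loop, you would call parse_ticker(event.json) and append the result only when it is not None.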

Ingesting Data

Each message from the server arrives as JSON, which Lomond exposes as a Python dictionary. It is up to us to capture the data, transform it and store it in a dataframe. The trick was converting the received time string into a datetime and using that as our dataframe’s index.

#pair each price with its trade size
vol = list(zip(price,lastSize))
#store data in a temporary dataframe
df = pd.DataFrame(vol, index=time,columns=['PRICE','LAST_SIZE'])
#convert index to datetime
df = df.set_index(pd.to_datetime(df.index))
#convert price to numeric type
df['PRICE'] = pd.to_numeric(df['PRICE'])

A strength of pandas is resample(), which groups a time series into fixed-frequency bins. This especially came in handy later when charting.


#use built-in function to create data_ohlc
binSize = '10s'
#sum the traded size within each bin
vol_ohlc = df.resample(binSize).agg({'LAST_SIZE': 'sum'})
#open, high, low and close price within each bin
data_ohlc =  df['PRICE'].resample(binSize).ohlc()
data_ohlc['LAST_SIZE_SUM'] = vol_ohlc['LAST_SIZE']
data_ohlc['TIMESTAMP'] = data_ohlc.index
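If it helps to see what the resampling step computes, here is a rough plain-Python sketch of the same idea: floor each tick’s time to the start of its 10-second bin, then track the first, highest, lowest and last price plus the summed size per bin. This is not how pandas implements resample(); the function ohlc_bins and the sample ticks are made up for illustration.

```python
from datetime import datetime, timedelta

def ohlc_bins(ticks, seconds=10):
    """Group (time, price, size) ticks into fixed-width bins.

    Returns a dict mapping each bin's start time to
    [open, high, low, close, volume]. Ticks must be in time order.
    """
    bins = {}
    for when, price, size in ticks:
        # Floor the timestamp to the start of its bin, counting from midnight
        midnight = when.replace(hour=0, minute=0, second=0, microsecond=0)
        elapsed = int((when - midnight).total_seconds())
        start = midnight + timedelta(seconds=elapsed - elapsed % seconds)
        if start not in bins:
            # First tick in the bin sets open, high, low and close at once
            bins[start] = [price, price, price, price, size]
        else:
            candle = bins[start]
            candle[1] = max(candle[1], price)  # high
            candle[2] = min(candle[2], price)  # low
            candle[3] = price                  # close is the latest price
            candle[4] += size                  # running volume
    return bins

# Made-up ticks for illustration
base = datetime(2018, 10, 1, 12, 0, 0)
ticks = [
    (base + timedelta(seconds=1), 100.0, 1.0),
    (base + timedelta(seconds=4), 103.0, 2.0),
    (base + timedelta(seconds=9), 99.0, 1.0),
    (base + timedelta(seconds=12), 101.0, 4.0),
]
for start, candle in ohlc_bins(ticks).items():
    print(start, candle)
```

One difference worth knowing: pandas also emits rows for empty bins between the first and last tick, whereas this sketch only creates a bin when a tick lands in it.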

Visualizing in Real-Time

One of my favorite charts for visualizing financial data is the candlestick plot. A good candlestick chart shows the opening, highest, lowest and closing price within each time interval. The interval could be as large as days or as granular as seconds.

Our visualizations were created with Plotly. Luckily, the library has built-in support for candlestick charts. Install plotly and import plotly.graph_objects to use these features.

def buildGraph(ohlc):
    #use plotly to build a candlestick figure from the OHLC dataframe
    fig = go.Figure(data = [go.Candlestick(x=ohlc.index,open=ohlc['open'],high=ohlc['high'],low=ohlc['low'],close=ohlc['close'])])
    fig.update_layout(title = "Candlestick chart of Bitcoin Price (USD)")
    return fig

#build the graph from the resampled data and display it
fig1 = buildGraph(data_ohlc)
fig1.show()

In the example, I set the bin size at 10 seconds. An advantage of WebSockets is the incredible level of granularity I can extract information at. If I wanted to, I could set the bin size at milliseconds – although, at that point, market noise would overwhelm any patterns in the data.

I set the bin size to 10 seconds – pretty incredible if you consider it

Conclusions

Overall, I am deeply impressed by how easy it was to configure WebSockets and retrieve high-quality data. I am especially grateful for the number of tutorials and blog posts available. I am eager to see where I can take this. I’d love to build a trading bot and toss it some chump change every once in a while. But at the end of the day, it’s all about the learning.

References

https://www.willmcgugan.com/blog/tech/post/stream-btc-prices-over-websockets-with-python-and-lomond/

https://www.clarkejduggan.com/outlet.html (for the featured image)

https://www.datacamp.com/community/news/converting-tick-by-tick-data-to-ohlc-data-using-pandas-resample-yx9hbxo919e
