Sky Blog Behavioral Data


GitHub Archive Visualizer

Overview

Today I'm releasing code and a video for the first Sky demo app. Sky is built to aggregate user actions and state over time, so I wanted a data set and a visualization that fellow developers could relate to. And so was born: the GitHub Archive Visualizer!

The visualizer uses Sky to step through aggregate paths of user actions, so you can see what users do immediately after an action such as Create Repository. The Sankey diagram shows the total number of users who followed each path. It's also interactive, so you can drill down as many levels as you want.
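To make the idea of "aggregate paths" concrete, here is a minimal Python sketch of the kind of count the Sankey diagram displays. It is not Sky's implementation or query language. The event tuple layout, the action labels, and the function name `count_paths_after` are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict

def count_paths_after(events, anchor, depth=1):
    """Count distinct users per path of `depth` actions that follow `anchor`.

    `events` is an iterable of (user, timestamp, action) tuples. This layout
    is hypothetical and is not how Sky stores or queries its data.
    """
    # Group each user's actions in timestamp order.
    actions_by_user = defaultdict(list)
    for user, _ts, action in sorted(events, key=lambda e: (e[0], e[1])):
        actions_by_user[user].append(action)

    counts = Counter()
    for actions in actions_by_user.values():
        user_paths = set()  # each user counts at most once per distinct path
        for i, action in enumerate(actions):
            if action == anchor:
                path = tuple(actions[i:i + depth + 1])
                if len(path) == depth + 1:  # skip paths cut short by end of data
                    user_paths.add(path)
        counts.update(user_paths)
    return counts

# Made-up sample data; the action labels are illustrative only.
sample_events = [
    ("alice", 1, "CreateRepository"), ("alice", 2, "Push"), ("alice", 3, "Watch"),
    ("bob", 1, "CreateRepository"), ("bob", 2, "Push"),
    ("carol", 1, "CreateRepository"), ("carol", 2, "Fork"),
    ("dave", 1, "Push"),
]

for path, users in count_paths_after(sample_events, "CreateRepository").most_common():
    print(" -> ".join(path), users)
# CreateRepository -> Push 2
# CreateRepository -> Fork 1
```

Raising `depth` corresponds to drilling down another level in the diagram: with `depth=2`, only alice has a complete three-step path in the sample data.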

It's important to note that all the results are computed in real time. There's no pre-aggregation happening: every click in the UI recomputes the result from the entire dataset. In the demo, Sky maxes out at about 60 million events per second. That's fast enough to crunch through the monthly page views of StackOverflow in about a second and a half. On my laptop. On a single thread.
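As a quick check on those numbers, the arithmetic can be sketched in a few lines of Python. The only inputs are the two figures quoted above (about 60 million events per second, about 1.5 seconds). The constant and function names are made up for illustration.

```python
# Throughput quoted for the demo: roughly 60 million events per second.
EVENTS_PER_SECOND = 60_000_000

def events_scanned(seconds, rate=EVENTS_PER_SECOND):
    """Return how many events a full scan covers in `seconds` at `rate`."""
    return int(rate * seconds)

# A second and a half at the quoted rate.
print(events_scanned(1.5))  # 90000000
```

In other words, the comparison implies a dataset on the order of 90 million events scanned per query.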

Video Walkthrough

Sky is built to import a lot of events per second but to only allow a handful of large queries at a time. That's typically how analytics are used within organizations. The downside of this is that putting a public demo up and having tens of thousands of people querying tens of millions of events each is just not feasible.

So I created a short walkthrough of the demo app that you can view below.

The video is six and a half minutes long, but the first half is a short history of Sky and some information on the dataset. If you don't have time to spare, you can skip to 3:20 to see the visualizer in action.

Installing the Demo

If the video doesn't satisfy your need for data crunching, you can install the app locally on your machine. The sky-d3-demo project on GitHub has details on how to get up and running. If you have any trouble getting it installed, feel free to open an issue on GitHub.

I'll be adding more demos in the future. If you have any suggestions for behavioral analytics you'd like to see (like dynamic cohort analysis or predictive behavioral analytics), add a comment below.

Special Thanks

The demo wouldn't be possible without these awesome tools and services:

