First things first: the rebalancing app is live, and it can be found here.
Now that we’ve wrapped up the script that generates station recommendations for rebalancing, we want to visualize this information to make it more digestible. Unfortunately (or maybe fortunately…), R is not a great language for dealing with browsers. Python, however, offers a variety of lightweight web frameworks that make this problem far less challenging.
While R was a great tool for building a prototype, a rewrite in Python is hugely beneficial, especially with the ultimate goal of displaying the results as a web app. A good amount of the logic from the R script carries over to Python without much trouble, but I still had to find libraries to replace kmeans (from the stats package in R) and igraph. There is a Python build of igraph, but it requires a C compiler, which we want to avoid for easier deployment later.
Ultimately, I went with networkx to handle the graph data and cluster to handle the kmeans clustering, since both are pure Python implementations.
Working in Python had additional advantages, the most obvious being how simple it is to parse the input JSON data. Implementing the kmeans clustering also turned out to be fairly simple.
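The original snippet is not reproduced here, so what follows is only a minimal sketch of the idea. It assumes a made-up JSON payload (the `stations` key and the field names are hypothetical) and uses a small hand-written k-means loop as a stand-in, rather than the API of the cluster package.

```python
import json
import math
import random

# Hypothetical payload: the key and field names are assumptions for illustration.
RAW = """
{"stations": [
  {"id": 1, "latitude": 40.700, "longitude": -74.010, "availableBikes": 0},
  {"id": 2, "latitude": 40.701, "longitude": -74.011, "availableBikes": 0},
  {"id": 3, "latitude": 40.702, "longitude": -74.009, "availableBikes": 0},
  {"id": 4, "latitude": 40.760, "longitude": -73.980, "availableBikes": 0},
  {"id": 5, "latitude": 40.761, "longitude": -73.981, "availableBikes": 0},
  {"id": 6, "latitude": 40.730, "longitude": -74.000, "availableBikes": 12}
]}
"""

# Parsing the feed is a single call; keep only the empty stations.
empty = [s for s in json.loads(RAW)["stations"] if s["availableBikes"] == 0]
points = [(s["latitude"], s["longitude"]) for s in empty]


def assign(points, centroids):
    """Put every point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm: assign points, recompute means, repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = assign(points, centroids)
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]  # keep the old centroid if a cluster empties
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids, assign(points, centroids)


centroids, clusters = kmeans(points, k=2)
```

With data this well separated, the two downtown-style points and the three others end up in different clusters regardless of the random starting centroids. Treating latitude and longitude as flat coordinates is a simplification that is only reasonable over a city-sized area.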
networkx was also fairly straightforward. nx.degree_centrality(G) returns a dictionary-like structure, whose items() can be called as expected. This makes it fairly easy to join the degree centrality measure from
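As a rough illustration of that kind of join, the sketch below computes degree centrality by hand and merges it onto station records through `items()`. The station records and edges are invented for the example, and the helper is a stand-in written to mirror the normalization networkx documents for `degree_centrality` (degree divided by n - 1), not a call into networkx itself.

```python
from collections import defaultdict

# Hypothetical station records and edges (pairs of neighbouring station ids).
stations = {
    72: {"name": "Station A"},
    79: {"name": "Station B"},
    82: {"name": "Station C"},
    83: {"name": "Station D"},
}
edges = [(72, 79), (72, 82), (72, 83), (79, 82)]


def degree_centrality(nodes, edges):
    """Degree of each node divided by (n - 1); assumes at least two nodes."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    scale = 1.0 / (len(nodes) - 1)
    return {node: degree[node] * scale for node in nodes}


centrality = degree_centrality(stations, edges)

# Because the result is a dict, items() gives (station_id, score) pairs
# that can be merged straight onto the station records.
joined = {
    station_id: {**stations[station_id], "centrality": score}
    for station_id, score in centrality.items()
}

# Most connected stations first.
ranked = sorted(joined.items(), key=lambda kv: kv[1]["centrality"], reverse=True)
```

With a real graph, the same dictionary comprehension should work unchanged on the value returned by `nx.degree_centrality(G)`.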
The biggest win for Python, though, was in throwing out the nearest stations. While that was very complex (and pretty hard to read) in R, it’s much more straightforward in Python.
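The exact rule used in the app is not shown here, so the following is a sketch under one plausible reading: walk the candidates from best to worst and drop any station that sits too close to one already kept. The 300 m threshold, the `pos` field, and the function names are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6371000


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def drop_nearby(candidates, min_distance_m=300):
    """Keep a candidate only if it is far enough from every station already kept.

    Assumes `candidates` is already sorted with the best recommendation first.
    """
    kept = []
    for station in candidates:
        if all(haversine_m(station["pos"], k["pos"]) >= min_distance_m for k in kept):
            kept.append(station)
    return kept


# Hypothetical candidates: stations 1 and 2 are roughly 55 m apart.
candidates = [
    {"id": 1, "pos": (40.7000, -74.0100)},
    {"id": 2, "pos": (40.7005, -74.0100)},
    {"id": 3, "pos": (40.7600, -73.9800)},
]
kept_ids = [s["id"] for s in drop_nearby(candidates)]
```

Station 2 is discarded because it is within the threshold of station 1, which ranks higher, while station 3 is several kilometers away and survives.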
Because the current iteration doesn’t store any data or write to a database, I decided to deploy the app on Heroku, an awesome service that offers some free deployment.
Additionally, I didn’t need a lot of the advanced features that come bundled with Django, so I opted to use Flask. Flask is very lightweight and handles my needs perfectly for this use case. You can find the finished product here.
A central component of this visualization was going to be the map. I decided to make the map fullscreen and layer information on top of it. Mapbox and mapbox.js (an extension of the already great library Leaflet) were good choices for me in this case. Mapbox produces beautiful maps.
Another big advantage of rewriting the script in Python was that it became trivial to look at both empty and full stations, so I decided to display both.
Overall, the relative simplicity of the design showcases the beautiful Mapbox map and lets the information show through fairly clearly. It is interesting to see how the clusters vary with the time of day. Plugging this into a predictive system such as the one I’ve mentioned before would be really interesting.
I think the project was successful. When looking at the official Citibike station map, you can see the emptiness patterns. Comparing this against our map shows the same patterns. Instead of a sea of empty stations, however, we get a small central cluster of stations.