Welcome to the web blog
Bit pusher at Spotify. Previously Interactive News at the New York Times, U.S. Digital Service, and Code for America.
First things first: the rebalancing app is live, and it can be found here.
Now that we’ve wrapped up the script that generates station recommendations for rebalancing, we want to visualize this information to make it more digestible. Unfortunately (or maybe fortunately…), R is not a great language for dealing with browsers. Python, however, offers a variety of lightweight web frameworks that make this problem far less challenging.
While R was a great tool for building a prototype, a rewrite in Python is hugely beneficial, especially with the ultimate goal of displaying the results as a web app. A good amount of the logic from the R script carries over to Python without much trouble, but I still had to find libraries to replace kmeans (from the stats package in R) and igraph. There is a Python build of igraph, but it requires a C compiler, which we want to avoid for easier deployment later.
Ultimately, I went with networkx to handle the graph data and cluster to handle the kmeans clustering, since both have pure Python implementations.
Working in Python had additional advantages: the most obvious is how simple it is to parse the input JSON data. Implementing the kmeans clusters also turned out to be fairly simple.
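To give a feel for how little code this step needs, here is a minimal sketch of parsing station JSON and clustering the empty stations. It is not the app's actual code: the JSON field names (stations, lat, lon, bikes) are made up for illustration, and the kmeans function is a small standard-library stand-in for the cluster package rather than that package's API. It also treats latitude and longitude as flat coordinates, which is a rough approximation that is reasonable at city scale.

```python
import json
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm over (x, y) tuples; returns k lists of points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input points
    clusters = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; leave it alone if the cluster is empty.
        updated = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated
    return clusters


# Hypothetical input: the real feed's structure and field names may differ.
raw = """{"stations": [
  {"id": 1, "lat": 40.700, "lon": -74.010, "bikes": 0},
  {"id": 2, "lat": 40.702, "lon": -74.008, "bikes": 0},
  {"id": 3, "lat": 40.760, "lon": -73.970, "bikes": 0},
  {"id": 4, "lat": 40.762, "lon": -73.968, "bikes": 0},
  {"id": 5, "lat": 40.730, "lon": -73.990, "bikes": 12}
]}"""

stations = json.loads(raw)["stations"]
empty = [(s["lat"], s["lon"]) for s in stations if s["bikes"] == 0]
clusters = kmeans(empty, k=2)
```

Each entry in clusters is a list of coordinates for one group of empty stations, which is the shape the rest of the pipeline needs.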
Working with networkx was also fairly straightforward.
Calling nx.degree_centrality(G) returns a dictionary-like structure, whose items() can be called as expected. This makes it fairly easy to join the degree centrality measure from degcent.
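The join described above can be sketched without any dependencies. The function below is a hand-rolled stand-in that, for a simple undirected graph, uses the same normalization as nx.degree_centrality (a node's degree divided by n - 1). The edge list, the station records, and their field names are all hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree centrality for a simple undirected graph given as (a, b) edge pairs."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops in this sketch
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {}
    # Normalize by the maximum possible degree, n - 1.
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


# Hypothetical graph: station ids connected when they are considered neighbors.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
degcent = degree_centrality(edges)

# Hypothetical station records keyed by the same ids.
stations = [
    {"id": 1, "name": "A"},
    {"id": 2, "name": "B"},
    {"id": 3, "name": "C"},
    {"id": 4, "name": "D"},
]

# Because degcent is a plain dict, the join is a single lookup per station.
joined = [dict(s, degcent=degcent.get(s["id"], 0.0)) for s in stations]

# items() works as expected, for example to rank stations by centrality.
ranked = sorted(degcent.items(), key=lambda item: item[1], reverse=True)
```

With networkx installed, degcent would instead come from nx.degree_centrality(G), and the join and ranking lines would stay the same.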
The biggest win for Python, though, was in throwing out the nearest stations. While that was very complex (and pretty hard to read) in R, it's much more straightforward in Python.
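As an illustration of why this reads well in Python, here is one plausible way to throw out nearby stations. It is a guess at the idea, not the app's actual logic: it assumes the candidates arrive sorted by priority, and it keeps a station only if it is at least min_km away from every station already kept. The pos field, the 0.5 km threshold, and the function names are hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def drop_nearby(stations, min_km=0.5):
    """Greedy filter: keep a station only if no already-kept station is within min_km."""
    kept = []
    for s in stations:  # assumed to be sorted from highest to lowest priority
        if all(haversine_km(s["pos"], k["pos"]) >= min_km for k in kept):
            kept.append(s)
    return kept


# Hypothetical candidates: stations 1 and 2 are roughly 110 m apart.
candidates = [
    {"id": 1, "pos": (40.700, -74.000)},
    {"id": 2, "pos": (40.701, -74.000)},
    {"id": 3, "pos": (40.750, -74.000)},
]
recommended = drop_nearby(candidates)
```

The whole filter is a loop and a single all() expression, which is the kind of readability gain described above.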
Because the current iteration of this doesn't store any data or do any database writing, I decided to deploy the app with Heroku, an awesome service that offers some free deployment.
Additionally, I didn’t need a lot of the advanced features that come bundled in with Django, so I opted to use Flask. Flask is very lightweight and handles my needs perfectly for this use case. You can find the finished product here.
A central component to this visualization was going to be the map. I decided to make the map fullscreen, and layer information on top of it. Mapbox and mapbox.js (which is an extension of the already great library Leaflet) were good choices for me in this case. Mapbox produces beautiful maps.
Another big advantage of rewriting the script in Python was that it made it trivial to look at both empty and full stations, so I decided to display both.
Overall, the relative simplicity of the design showcases the beautiful Mapbox map and lets the information show through fairly clearly. One interesting thing is noticing the way the clusters vary based on time of day. Plugging this system into a predictive system such as the one I’ve mentioned before would be really interesting.
I think the project was successful. When looking at the official Citibike station map, you can see the emptiness patterns. Comparing this against our map shows the same patterns. Instead of a sea of empty stations, however, we get a small central cluster of stations.
Feel free to poke around the app and the code (a gist of the R script is here), and let me know if you find issues.