Welcome to the web blog
Bit pusher at Spotify. Previously Interactive News at the New York Times, U.S. Digital Service, and Code for America.
At the end of the second part of the Citibike rebalancing problem, we had visualized a network for a cluster of Citibike stations and left off by noting that the final steps involved extracting the information from the graph data and packaging it up for delivery.
The first step in doing this is to extract the information from the network graph. The following function both gets the graph data from an input dataframe and returns the necessary information from that graph.
The information that we need is stored in the vertices. In order to access the vertices from igraph, we simply call V(g), where g is our graph object. When you access the vertices, the information is returned to you as a vector of R characters. Note that this is accounted for above by casting this vector to a matrix.
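As a minimal sketch of that vertex access, assuming the igraph package is installed: the edge list below is hypothetical stand-in data, not the network from part two, and it reads the names through V(g)$name, which is one way to get the character vector described above.

```r
library(igraph)

# Hypothetical edge list standing in for the station network from part two.
edges <- data.frame(
  from = c("260", "260", "337"),
  to   = c("337", "360", "360"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edges, directed = FALSE)

# V(g) is the vertex sequence; the vertex names are a character vector.
station_ids <- V(g)$name

# Cast to a one-column matrix so it can be combined with other columns later.
id_matrix <- matrix(station_ids, ncol = 1, dimnames = list(NULL, "id"))
```

Because the IDs come back as characters, they would need as.integer() before being matched against numeric station IDs.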
The last thing that we want to do is to only return stations that aren’t close to each other. Note that this implementation is definitely the most hack-y part of the project so far.
First, we use the citibike app endpoint (note that this is a different endpoint from the original API). From this, we can get each station’s ID and the IDs of its closest five neighbors:
Note how awful parsing nested JSON is in R.
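To give a feel for the shape of the problem, here is a rough sketch in base R. It assumes the JSON has already been parsed into nested lists; the field names (results, id, nearbyStations) and the values are hypothetical placeholders rather than the endpoint's confirmed schema.

```r
# Hypothetical parsed response: a list of stations, each carrying its own id
# and a nested list of nearby stations. Field names are assumptions.
parsed <- list(
  results = list(
    list(id = 260, nearbyStations = list(list(id = 337), list(id = 360))),
    list(id = 337, nearbyStations = list(list(id = 260), list(id = 360)))
  )
)

# For each station, pull the ids out of the nested neighbor entries.
neighbors <- lapply(parsed$results, function(station) {
  vapply(station$nearbyStations, function(n) as.integer(n$id), integer(1))
})

# Name each element by its station id so lookups can be done by id.
names(neighbors) <- vapply(
  parsed$results,
  function(station) as.character(station$id),
  character(1)
)
```

With this shape, neighbors[["260"]] gives the neighbor IDs for station 260 as an integer vector.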
Once we have this information, we can go ahead and make our recommendations. In order to do this, we are going to use a while loop to get the top four results for one particular cluster. Even though I am making use of control flows in R in this script (for loops, while loops and if/else statements), typically they are to be avoided in favor of the apply family of functions. If you are reading this and have a more clever way of doing what happens below, please let me know.
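The selection step could look roughly like the sketch below. This is not the original script: pick_spread_out, its arguments, and the sample data are all hypothetical, and it assumes the candidates are already sorted by priority and that the neighbor lookup is a list named by station ID.

```r
# Walk the ranked candidates and keep a station only if it is not a listed
# neighbor of one already picked (checked in both directions).
# All names here are hypothetical, for illustration only.
pick_spread_out <- function(candidates, neighbors, n = 4) {
  picked <- character(0)
  i <- 1
  while (length(picked) < n && i <= length(candidates)) {
    id <- candidates[i]
    near_candidate <- as.character(neighbors[[id]])
    near_picked <- as.character(unlist(neighbors[picked]))
    too_close <- any(picked %in% near_candidate) || id %in% near_picked
    if (!too_close) {
      picked <- c(picked, id)
    }
    i <- i + 1
  }
  picked
}

# Hypothetical inputs: ranked station ids for one cluster and their neighbors.
candidates <- c("260", "337", "224", "317", "410")
neighbors <- list("260" = c(337L, 360L), "224" = c(317L))
result <- pick_spread_out(candidates, neighbors)
```

The loop stops early when it has four stations and otherwise returns however many non-adjacent stations the cluster can supply.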
With this, we can finally get ready to return the final recommendations. I’ve cleaned up the scripts from parts one and two and turned those pieces into function calls.
Looking at these recommendations, we can see a priority list of stations to be rebalanced, culled from a list of over one hundred mostly empty stations:
      id  name                          available  total
 1   260  Broad St & Bridge St                 32     35
 2   337  Old Slip & Front St                  37     37
 3   360  William St & Pine St                 38     39
 4   224  Spruce St & Nassau St                25     31
 5   317  E 6 St & Avenue B                    22     27
 6   410  Suffolk St & Stanton St              30     35
 7   428  E 3 St & 1 Ave                       27     31
 8   504  1 Ave & E 15 St                      43     45
 9  2017  E 43 St & 2 Ave                      37     39
10   228  E 48 St & 3 Ave                      54     55
11   501  FDR Drive & E 35 St                  35     43
12   456  E 53 St & Madison Ave                31     35
13   120  Lexington Ave & Classon Ave          17     19
14   270  Adelphi St & Myrtle Ave              20     23
15   372  Franklin Ave & Myrtle Ave            25     27
16   396  Lefferts Pl & Franklin Ave           23     25
Now that this part of the system is complete, there are still a few more things that can be done to make it even better:
I think that these features would all be positive steps forward in building this tool out. I’m going to start working on deploying the output onto a map first because the visual will be, I think, easier to understand than the list of stations.