As the number of jurisdictions regulating the ride-hailing industry grows, a natural question for investors is what effect, if any, this will have on the use of ride-hailing apps.
Almost surely, new rules and fees will raise the price of rides, so this comes down to an elasticity calculation – how sensitive is ride-hailing demand to changes in price? Adding to the relevance of this question is the limited ability of the biggest ride-hailing companies to continue to absorb losses now that they are public.
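The elasticity calculation can be made concrete. One common way to measure it from two observed price/quantity points is the midpoint (arc) elasticity: the percentage change in rides divided by the percentage change in price, with each change measured against the average of the two points. The sketch below uses that convention. The function name and the fare and ride figures are invented for illustration and are not drawn from the TLC data.

```python
def arc_elasticity(q0: float, q1: float, p0: float, p1: float) -> float:
    """Midpoint (arc) price elasticity of demand.

    Percentage changes are taken relative to the average of the two
    observations, so swapping "before" and "after" gives the same result.
    """
    if p0 == p1:
        raise ValueError("price must change to measure elasticity")
    pct_change_q = (q1 - q0) / ((q0 + q1) / 2)
    pct_change_p = (p1 - p0) / ((p0 + p1) / 2)
    return pct_change_q / pct_change_p


# Hypothetical scenario: a new fee lifts the average fare from $20 to $22
# and weekly rides fall from 1,000,000 to 950,000.
e = arc_elasticity(q0=1_000_000, q1=950_000, p0=20.0, p1=22.0)
print(f"arc elasticity: {e:.3f}")  # → arc elasticity: -0.538
```

A value between 0 and -1, as in this made-up case, means rides fall proportionally less than fares rise (inelastic demand); a value below -1 would indicate elastic demand.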
To provide insight into these issues, our Investment Sciences team downloaded data on approximately 2.4 billion rides taken from 2010 to 2019 across all major providers of taxi services from the NYC Taxi and Limousine Commission (TLC). The TLC is the agency responsible for licensing and regulating New York City's medallion (yellow) taxis, street hail livery (green) taxis, for-hire vehicles, commuter vans and paratransit (wheelchair-accessible) vehicles.
The TLC collects trip record information for each taxi and for-hire vehicle trip completed by TLC-licensed drivers and vehicles. Broadly speaking, the available data cover pick-up location and, more recently, drop-off location; ride duration; fare (yellow and green taxis only); and whether the ride was shared (for-hire vehicles only, distinguishing pooled from individual rides).
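As a rough illustration of the record structure just described, the sketch below models a single trip with optional fields for the attributes that only some vehicle types report, and derives ride duration from the pick-up and drop-off timestamps. The `Trip` class, its field names and the `pooled_share` helper are hypothetical; they are not the TLC's column names or schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Trip:
    """One trip record with the broad fields described above.

    Field names are hypothetical; the TLC files use their own column names,
    which differ across the yellow, green and for-hire vehicle datasets.
    """
    pickup_time: datetime
    dropoff_time: datetime
    pickup_zone: Optional[int]       # taxi zone ID; None if not recorded
    dropoff_zone: Optional[int]      # None if not recorded
    fare: Optional[float] = None     # recorded for yellow and green taxis only
    shared: Optional[bool] = None    # recorded for for-hire vehicles only

    @property
    def duration_minutes(self) -> float:
        """Ride duration derived from the pick-up and drop-off timestamps."""
        return (self.dropoff_time - self.pickup_time).total_seconds() / 60


def pooled_share(trips: list[Trip]) -> float:
    """Fraction of trips flagged as shared, among trips that carry the flag."""
    flags = [t.shared for t in trips if t.shared is not None]
    if not flags:
        return float("nan")
    return sum(flags) / len(flags)


# Made-up records; the zone IDs are arbitrary examples.
trips = [
    Trip(datetime(2019, 3, 1, 8, 0), datetime(2019, 3, 1, 8, 25), 10, 20, shared=True),
    Trip(datetime(2019, 3, 1, 9, 0), datetime(2019, 3, 1, 9, 10), 30, 40, shared=False),
    Trip(datetime(2019, 3, 1, 9, 5), datetime(2019, 3, 1, 9, 20), 50, 60, fare=12.5),
]
print(trips[0].duration_minutes)  # → 25.0
print(pooled_share(trips))        # → 0.5 (the taxi trip carries no shared flag)
```

Keeping `fare` and `shared` optional, rather than filling them with zeros, prevents taxi trips from being counted as non-shared rides and for-hire trips from being counted as free ones.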
To understand the drivers of ridership across different parts of the city, we merged the TLC data with demographic data from the US Census, the IRS and New York City Open Data. This required mapping pick-up and drop-off locations into the Neighborhood Tabulation Areas (NTAs) used by New York City. NTAs are built from the census tracts defined by the US Census Bureau and can in turn be mapped to zip codes.
The result is a comprehensive ride-hailing dataset that can be used to answer important questions about how the introduction of app-based ride hailing has changed transportation in New York City.
Some key notes about working with the data:
- Some observations prior to 2018 are missing values for pick-up and drop-off locations. To avoid any bias from these gaps, we use observations from 2018 onward for most of our analysis.
- Pick-up and drop-off locations are given as latitude/longitude coordinates prior to 2015, but in later data are coded to one of 263 taxi zones.
- To add information about the populations of these taxi zones, we mapped them as closely as possible to NTAs, which are aggregations of census tracts that the New York City government uses to publish population figures on its Open Data portal. We also mapped NTAs and taxi zones to zip code boundaries to join other data, such as income data from the IRS.
- We started by aggregating ride volumes at the NTA level (we use neighbourhood and NTA interchangeably in this text). Because the boundaries of NTAs and taxi zones do not match exactly, for each NTA we summed all the rides in the taxi zones that lie entirely within it. We then split the rides in each remaining taxi zone among all the NTAs that overlap it, in proportion to the area of overlap.
- Some taxi zones do not correspond to population centers and therefore lack meaningful population features. These include JFK and LaGuardia airports, parks, cemeteries and Rikers Island; rides that originated or concluded outside New York City likewise have no population features.
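The preparation steps in these notes (keeping trips from 2018 onward, setting aside zones without a resident population, and splitting each taxi zone's rides across the NTAs it overlaps by area) can be sketched as follows. This is a minimal illustration under several assumptions: trips are reduced to hypothetical `(year, pick-up zone)` pairs, only pick-ups are counted, the zone-to-NTA overlap areas have already been computed (for example, by intersecting the taxi zone and NTA boundary files in a GIS tool), and the zone IDs in `NON_POPULATION_ZONES` are placeholders. Dropping those zones' rides from the neighbourhood totals is likewise an assumption; the notes above say only that they lack population features.

```python
from collections import defaultdict
from typing import Optional

# Placeholder IDs standing in for zones without a resident population
# (airports, parks, cemeteries, Rikers Island, outside NYC). Not real TLC IDs.
NON_POPULATION_ZONES = {900, 901}


def rides_per_zone(
    trips: list[tuple[int, Optional[int]]], min_year: int = 2018
) -> dict[int, int]:
    """Count pick-ups per taxi zone from (year, zone) pairs, keeping trips from
    min_year onward and skipping missing or non-population zones."""
    counts: dict[int, int] = defaultdict(int)
    for year, zone in trips:
        if year < min_year or zone is None or zone in NON_POPULATION_ZONES:
            continue
        counts[zone] += 1
    return dict(counts)


def apportion_to_ntas(
    rides_by_zone: dict[int, int],
    overlap_area: dict[tuple[int, str], float],
) -> dict[str, float]:
    """Split each zone's rides across the NTAs it overlaps, in proportion to
    the overlap area. A zone lying entirely inside one NTA has a single
    overlap entry, so all of its rides go to that NTA."""
    # Total overlapping area per zone, used as the denominator so that
    # every ride in a zone is assigned to some NTA.
    zone_area: dict[int, float] = defaultdict(float)
    for (zone, _nta), area in overlap_area.items():
        zone_area[zone] += area

    rides_by_nta: dict[str, float] = defaultdict(float)
    for (zone, nta), area in overlap_area.items():
        if zone_area[zone] > 0:
            rides_by_nta[nta] += rides_by_zone.get(zone, 0) * area / zone_area[zone]
    return dict(rides_by_nta)


# Made-up inputs: zone 10 lies inside NTA-A; zone 20 is split 1:3 by area.
trips = [(2017, 10), (2018, 10), (2018, 10), (2019, 20), (2019, 20),
         (2019, 20), (2019, 20), (2018, 900), (2019, None)]
overlap = {(10, "NTA-A"): 5.0, (20, "NTA-A"): 1.0, (20, "NTA-B"): 3.0}

rides = rides_per_zone(trips)             # → {10: 2, 20: 4}
print(apportion_to_ntas(rides, overlap))  # → {'NTA-A': 3.0, 'NTA-B': 3.0}
```

Normalizing by each zone's total overlap area, rather than by its nominal area, conserves ride counts even when a zone extends slightly beyond the NTA boundaries; apportioned totals can be fractional as a result.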