Hadoop.

One year of taxi data is about 50 GB. All the visualizations and analytics on this website rely on a Hadoop platform for data aggregation and preparation: AWS EMR, Hadoop, Pig and Python.
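
As a rough illustration of this kind of aggregation, the sketch below mimics a map step and a reduce step in plain Python. The record layout (`pickup_datetime,fare_amount,tip_amount`), the function names and the sample lines are assumptions made for this example; they are not the site's actual Pig or Python jobs.

```python
from collections import defaultdict

def map_record(line):
    """Map step: turn one CSV line into (month, (trip_count, tip_amount))."""
    # Assumed layout (hypothetical): pickup_datetime,fare_amount,tip_amount
    pickup, _fare, tip = line.strip().split(",")
    return pickup[:7], (1, float(tip))  # "YYYY-MM" is the grouping key

def reduce_records(pairs):
    """Reduce step: sum trip counts and tip amounts per key."""
    totals = defaultdict(lambda: [0, 0.0])
    for key, (count, tip) in pairs:
        totals[key][0] += count
        totals[key][1] += tip
    return {key: tuple(value) for key, value in totals.items()}

# Made-up sample records, for illustration only
lines = [
    "2013-01-05 22:00:00,10.0,1.5",
    "2013-01-07 17:30:00,12.0,2.0",
    "2013-02-01 08:15:00,8.0,1.0",
]
monthly = reduce_records(map_record(line) for line in lines)
```

On a cluster the map and reduce steps would run in parallel over file splits; here they run in a single process to keep the idea visible.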

Analytics.

K-means clustering of the drop-off areas, based on features derived from Census data, reveals several interesting findings.
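
For readers unfamiliar with the method, here is a minimal K-means sketch in plain Python. The two features per area and their values are invented for illustration (the page does not list the Census features used), and the centroids are seeded from the first `k` points only to keep the example deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm: assign points to the nearest centroid, then re-average."""
    centroids = [list(p) for p in points[:k]]  # deterministic seed, for the example only
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments

# Hypothetical, already-scaled features per drop-off area
areas = [(0.10, 0.20), (0.15, 0.25), (0.90, 0.80), (0.85, 0.90)]
centroids, assignments = kmeans(areas, k=2)
```

Features on very different scales would normally be standardized first, since K-means relies on Euclidean distance.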

Maps.

An R-tree algorithm was used in MapReduce to spatially index the pickup and drop-off lat/lon by zip code.
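
The core idea behind an R-tree can be sketched with a tiny static, two-level tree of bounding rectangles. This is not a full R-tree (no insertion or node splitting) and not the implementation used on this site; the region labels and boxes are made up, and real zip codes are polygons, so a box hit would only be a candidate for an exact test.

```python
def contains(box, lon, lat):
    """True if the point lies inside the (min_lon, min_lat, max_lon, max_lat) box."""
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

def union(boxes):
    """Smallest rectangle that encloses all the given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def build_index(regions, node_size=2):
    """Group regions (sorted by min longitude) into nodes, each with its own bounding box."""
    ordered = sorted(regions, key=lambda r: r[1][0])
    nodes = []
    for i in range(0, len(ordered), node_size):
        group = ordered[i:i + node_size]
        nodes.append((union([box for _, box in group]), group))
    return nodes

def lookup(index, lon, lat):
    """Return the first region whose box contains the point, or None."""
    for node_box, group in index:
        if not contains(node_box, lon, lat):
            continue  # prune: skip every region under this node at once
        for label, box in group:
            if contains(box, lon, lat):
                return label
    return None

# Hypothetical regions with invented coordinates
regions = [
    ("ZIP_A", (0.0, 0.0, 1.0, 1.0)),
    ("ZIP_B", (1.5, 0.0, 2.5, 1.0)),
    ("ZIP_C", (5.0, 5.0, 6.0, 6.0)),
    ("ZIP_D", (6.5, 5.0, 7.5, 6.0)),
]
index = build_index(regions)
```

A call such as `lookup(index, 5.5, 5.5)` only inspects the regions under the node whose rectangle contains the point, which is what makes the index cheaper than a full scan.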

The most common tip amounts?

The most common tip amounts are between $1 and $3.
The most frequently occurring tip amount is $1, followed by $2 and $1.50.
People tend to tip in whole and half dollars.
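
A count like this can be sketched with a frequency table. The tip values below are invented for illustration; amounts are converted to cents so that equal tips compare exactly.

```python
from collections import Counter

# Made-up tip amounts in dollars, for illustration only
tips = [1.00, 2.00, 1.00, 1.50, 2.00, 1.00, 2.35, 1.50]

# Work in integer cents to avoid floating-point comparison issues
cents = [round(t * 100) for t in tips]
frequency = Counter(cents)
top = frequency.most_common(3)

# Share of tips that are a whole or half dollar (multiples of 50 cents)
half_dollar_share = sum(1 for c in cents if c % 50 == 0) / len(cents)
```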

20-22% is the magic number.

The most common tip percentage falls in the 20-22% range.
It outnumbers the second most common range, 22-24%, by almost three times.
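
The binning behind such a chart might look like the sketch below. It assumes the tip percentage is computed against the fare amount and uses 2-point-wide buckets; the page does not state the exact denominator, and the sample trips are invented.

```python
from collections import Counter

def tip_bucket(fare, tip, width=2):
    """Label the bucket (e.g. '20-22%') that a trip's tip percentage falls into."""
    pct = 100.0 * tip / fare          # assumption: percentage of the fare amount
    lower = int(pct // width) * width
    return f"{lower}-{lower + width}%"

# Made-up (fare, tip) pairs in dollars
trips = [(10.0, 2.0), (15.0, 3.2), (10.0, 2.3), (8.0, 1.0)]

# Skip zero fares to avoid dividing by zero
histogram = Counter(tip_bucket(fare, tip) for fare, tip in trips if fare > 0)
```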

How speed matters.

The average trip speed is between 0 and 10 MPH.
Riders tip around 20%.
The tip decreases as the speed increases, until the speed reaches 38 MPH.
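
One way to derive a speed-versus-tip relationship is to compute each trip's average speed from distance and duration, then average the tip percentage per speed bucket. The field order, the 10 MPH bucket width and the sample trips are assumptions for this sketch.

```python
from collections import defaultdict

def speed_mph(distance_miles, duration_seconds):
    """Average trip speed in miles per hour, or None for an unusable duration."""
    if duration_seconds <= 0:
        return None
    return distance_miles * 3600 / duration_seconds

# Made-up trips: (distance in miles, duration in seconds, fare, tip)
trips = [
    (1.0, 600, 8.0, 1.6),
    (2.0, 900, 10.0, 2.0),
    (10.0, 1200, 30.0, 4.5),
]

sums = defaultdict(lambda: [0.0, 0])  # bucket -> [sum of tip %, trip count]
for distance, seconds, fare, tip in trips:
    speed = speed_mph(distance, seconds)
    if speed is None or fare <= 0:
        continue
    bucket = int(speed // 10) * 10    # 0 means 0-10 MPH, 30 means 30-40 MPH
    sums[bucket][0] += 100.0 * tip / fare
    sums[bucket][1] += 1

avg_tip_pct = {bucket: total / n for bucket, (total, n) in sums.items()}
```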

Fluctuations in higher fare amount.

Overall trend: the greater the fare amount, the smaller the tip percentage.
Fare > $50: the tip percentage fluctuates.
Fare > $50: the tip percentage is low at fares ending in 0 or 5.

Weekdays VS Weekends.

On workdays
People tend to tip the most during off-work hours (4-7pm).
People tend to tip the least during to-work hours (6-9am).
On weekends
Tip percentage does not fluctuate as much as on workdays.
It is slightly higher at night (8pm-5am) and in the morning (8-11am) than during the rest of the day.
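
A grouping like this can be sketched by splitting each pickup timestamp into a day type and an hour. The timestamp format and the sample values are assumptions; public holidays are not treated specially here.

```python
from collections import defaultdict
from datetime import datetime

def day_type_and_hour(pickup):
    """Classify a pickup timestamp as ('workday' or 'weekend', hour of day)."""
    ts = datetime.strptime(pickup, "%Y-%m-%d %H:%M:%S")
    day_type = "weekend" if ts.weekday() >= 5 else "workday"  # 5 = Sat, 6 = Sun
    return day_type, ts.hour

# Made-up (pickup time, tip percentage) samples
samples = [
    ("2013-01-07 17:30:00", 21.0),  # a Monday
    ("2013-01-07 17:45:00", 23.0),
    ("2013-01-07 07:10:00", 18.0),
    ("2013-01-05 22:00:00", 20.0),  # a Saturday
]

sums = defaultdict(lambda: [0.0, 0])  # (day type, hour) -> [sum of tip %, count]
for pickup, tip_pct in samples:
    key = day_type_and_hour(pickup)
    sums[key][0] += tip_pct
    sums[key][1] += 1

averages = {key: total / n for key, (total, n) in sums.items()}
```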

Tip percentage by month.

Average tip percentage peaks in December, January and August.
Average tip percentage is lower in spring and fall.
Hypothesis: perhaps people tip more when the weather is harsh!

Trip count per month.

The trip count peaked in March, April, October and November.
The trip count bottomed out in August.
Hypothesis: visitors came to the City in spring and fall; New Yorkers left town in the summer.
