Latest posts

Processing NYC Taxi Data Part 3: Labeling and Aggregating

In this multi-part series, I will process five years of taxi trips from the NYC TLC dataset. This part describes how to cluster and aggregate the dataset using PySpark.

Processing NYC Taxi Data Part 2: Geofiltering

In this multi-part series, I will process five years of taxi trips from the NYC TLC dataset. This part describes how to use Spark to select only trips that happen within Manhattan.

Processing NYC Taxi Data Part 1: Downloading

In this multi-part series, I will process five years of taxi trips from the NYC TLC dataset. This part describes how to move the data to S3.

Approximations to social clusters.

Tribes as an approximate function for social clusters, and some implications.

Testing New York's Taxi Dataset, Google's BigQuery and GeoPandas

I use Google's BigQuery to fetch and aggregate NYC's taxi dataset, then I use GeoPandas to filter it by geographic location.

Plotting the Task of Postulating as an Independent Candidate in Juarez

Using Folium and GeoPandas to plot the daunting task of postulating as an independent candidate in Ciudad Juarez, Chih., Mexico.

Extracting Shapefiles from a Zip Virtual File Server in Fiona.

Using Fiona to extract shapefiles in memory, and then using Folium to plot them.

Hosted IPython Notebooks

A service that allows researchers to easily host, run, and share Jupyter notebooks.

A Friendly Network of Small Investors.

Thoughts on a platform for identifying potential investment partners within your group and extended group of friends.

On the Lack of Benchmarking

Thoughts on the opportunity of providing users with feedback based on peers.