til: A simple ETL task in Airflow using PostgresHook

This week at work I needed to build a small ETL (Extract, Transform, Load) process to move some data from PostgreSQL database A (a primary relational database used by our application to serve customer traffic) to PostgreSQL database B (a back-of-house instance used to perform metering and other usage analytics). We already use Apache Airflow to orchestrate the metering tasks, data sync and Stripe API interactions, so building this process in Airflow was my first choice.
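As a rough illustration of the shape such a task can take, here is a minimal sketch that reads rows through one hook and writes them through another. The connection IDs (`app_db`, `metering_db`), the table and column names, and the helper names `copy_rows` and `sync_usage_events` are all hypothetical; only `get_records` and `insert_rows` are existing `PostgresHook` methods.

```python
def copy_rows(source_hook, target_hook, select_sql, target_table,
              target_fields, transform=None):
    """Extract rows via source_hook, optionally transform, load via target_hook.

    Both hooks only need get_records() / insert_rows(), which PostgresHook
    provides. Returns the number of rows read from the source.
    """
    # Extract: fetch every row returned by the query.
    rows = source_hook.get_records(select_sql)

    # Transform: apply an optional per-row function.
    if transform is not None:
        rows = [transform(row) for row in rows]

    # Load: skip the insert entirely when there is nothing to write.
    if rows:
        target_hook.insert_rows(
            table=target_table, rows=rows, target_fields=target_fields
        )
    return len(rows)


def sync_usage_events():
    """Hypothetical task callable; all names below are assumptions."""
    # Imported here so copy_rows can be exercised without Airflow installed.
    from airflow.providers.postgres.hooks.postgres import PostgresHook

    source = PostgresHook(postgres_conn_id="app_db")       # database A
    target = PostgresHook(postgres_conn_id="metering_db")  # database B
    return copy_rows(
        source,
        target,
        select_sql="SELECT id, customer_id, created_at FROM events",
        target_table="usage_events",
        target_fields=["id", "customer_id", "created_at"],
    )
```

A callable like `sync_usage_events` could then be wired into a DAG as an ordinary Python task. This version loads the whole result set into memory, which is only reasonable for small transfers.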

Reading mailing list archives with Python: Noisechain Pt. 1

Inspired by the Twitter account “Shit Noisebridge says”, I recently set about scripting together something that trains a Markov chain on the complete archive of the noisebridge-discuss@ mailing list, to create a rival account, “Shit noisebridge probably says”. It’s not yet complete, but one useful thing to have fallen out of this project already is a script that makes it easy to download Mailman list archives in their entirety by passing the name of a list.
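For readers unfamiliar with the technique, training a word-level Markov chain on a body of text can be sketched roughly as follows. This is not the project's actual script; the function names `train_chain` and `generate` and the whitespace tokenisation are assumptions made for illustration.

```python
import random
from collections import defaultdict


def train_chain(text, order=1):
    """Map each run of `order` consecutive words to the words that follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Walk the chain from a random starting state, emitting up to `length` words."""
    if not chain:
        return ""
    rng = random.Random(seed)
    # Sort the states so a given seed always picks the same starting point.
    state = rng.choice(sorted(chain))
    order = len(state)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state was only seen at the end of the text
        out.append(rng.choice(followers))
        state = tuple(out[-order:])
    return " ".join(out)
```

Because followers are stored with repeats, words that follow a state more often in the source text are proportionally more likely to be chosen during generation.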