til: Using htmltest to find broken links in your blog

While working on my blog I discovered that some of the image and link sources in different posts had broken without my noticing. Outside of “hugo says OK” I don’t have any other validation running on my blog, so I decided to look for something that could lint the rendered output for dead links and other issues. I tried a number of different tools but ended up using htmltest. It runs after hugo renders the static output of my blog into a public/ directory in the project root.

til: Addressing S3 URIs with Pandas using s3fs

Another short “today I learned” post from the analytics mines. If you have ever written any form of data munging or analytics task, you have almost certainly encountered Python, Pandas, and AWS S3 in some combination. These jobs usually follow this structure:
1. Download the files from S3.
2. Deserialize them into Python objects and create Pandas dataframes.
3. Perform calculations over these dataframes.
Normally #1 and #2 are wasted, repetitive work left to the reader, but there is a better way.
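A minimal sketch of the idea behind the S3 post, assuming pandas 1.2 or later with fsspec installed (s3fs is the package that adds the s3:// protocol): pandas hands any fsspec-style URL to fsspec, so downloading and deserializing collapse into a single read_csv call. To keep it runnable without AWS credentials, it uses fsspec’s in-memory filesystem as a stand-in for S3; the bucket, key, and column names are hypothetical.

```python
import pandas as pd


def load_events(uri: str, **storage_options) -> pd.DataFrame:
    # Download + deserialize in one call: pandas passes fsspec-style URLs
    # ("s3://...", "memory://...") to fsspec, which reads the bytes from the
    # matching filesystem. storage_options is forwarded to that filesystem,
    # e.g. {"anon": True} for a public bucket when s3fs is used.
    return pd.read_csv(uri, storage_options=storage_options or None)


def total_by_user(df: pd.DataFrame) -> pd.Series:
    # The calculation step: the part of the job we actually care about.
    return df.groupby("user")["amount"].sum()


if __name__ == "__main__":
    # With s3fs and AWS credentials this would be something like
    # "s3://my-bucket/events.csv" (hypothetical bucket and key).
    # fsspec's in-memory filesystem stands in for S3 so the sketch runs offline.
    uri = "memory://events.csv"
    pd.DataFrame({"user": ["a", "b", "a"], "amount": [1, 2, 3]}).to_csv(uri, index=False)
    print(total_by_user(load_events(uri)))
```

Swapping the memory:// URI for a real s3:// one is the only change needed once s3fs is installed and credentials are configured.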

til: About the /.well-known/change-password URI

I attended BSidesSF this year for the first time in a while and saw Aalaa Kamal Satti and Yuru Shao of Pinterest speak about their work on password security for both Pinterest’s consumer and business users. During their talk they described implementing support for the /.well-known/change-password URI, which allows websites to integrate with the password managers built into most modern browsers. These password managers have had features like checking for compromised credentials via HaveIBeenPwned for a while, but prior to the …
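A minimal sketch of the server side of the change-password URI, written as a plain WSGI app (an illustration, not Pinterest’s implementation): the W3C “A Well-Known URL for Changing Passwords” draft has the site answer /.well-known/change-password with a redirect to its real change-password page, which is where a browser’s password manager then sends the user. The /settings/password path is hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical location of the site's real change-password form.
CHANGE_PASSWORD_PAGE = "/settings/password"

StartResponse = Callable[[str, List[Tuple[str, str]]], object]


def app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
    path = environ.get("PATH_INFO", "")
    if path == "/.well-known/change-password":
        # Temporary redirect, so the real page can move later without
        # clients caching the old location.
        start_response("302 Found", [("Location", CHANGE_PASSWORD_PAGE)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

Locally, it can be served with the standard library via `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.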

til: A simple ETL task in Airflow using PostgresHook

This week at work I needed to build a small ETL (Extract, Transform, Load) process to move some data from PostgreSQL database A (the primary relational database our application uses to serve customer traffic) to PostgreSQL database B (a back-of-house instance used for metering and other usage analytics). We already use Apache Airflow to orchestrate the metering tasks, data sync, and Stripe API interactions, so building this process in Airflow was my first choice.
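A minimal sketch of the shape such a task can take, assuming the apache-airflow-providers-postgres package: `PostgresHook` inherits `get_records` and `insert_rows` from Airflow’s `DbApiHook`. The connection ids, tables, columns, and the transform itself are all hypothetical, and the Airflow import is deferred so the extract/transform/load logic can be exercised with stand-in hooks.

```python
from typing import Any, List, Sequence, Tuple

Row = Tuple[Any, ...]

# Hypothetical query against database A (the application database).
EXTRACT_SQL = "SELECT account_id, plan, seats FROM subscriptions"
TARGET_TABLE = "subscription_snapshots"  # hypothetical table in database B
TARGET_FIELDS = ["account_id", "plan", "seats"]


def transform(rows: Sequence[Row]) -> List[Row]:
    # Hypothetical cleanup: drop rows without an account, normalise plan names.
    return [
        (account_id, plan.lower(), seats)
        for account_id, plan, seats in rows
        if account_id is not None
    ]


def copy_subscriptions(source: Any, target: Any) -> int:
    # source/target: anything exposing DbApiHook's get_records/insert_rows,
    # e.g. two PostgresHook instances.
    rows = source.get_records(EXTRACT_SQL)  # Extract
    cleaned = transform(rows)  # Transform
    target.insert_rows(TARGET_TABLE, cleaned, target_fields=TARGET_FIELDS)  # Load
    return len(cleaned)


def etl_task() -> int:
    # Callable for a PythonOperator (or @task). Imported here so the
    # functions above don't require Airflow to be installed.
    from airflow.providers.postgres.hooks.postgres import PostgresHook

    source = PostgresHook(postgres_conn_id="app_db")  # hypothetical conn id
    target = PostgresHook(postgres_conn_id="metering_db")  # hypothetical conn id
    return copy_subscriptions(source, target)
```

`get_records` pulls the whole result set into memory, which is fine for a small table; a larger copy would want to read and insert in batches.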