til: Addressing S3 URIs with Pandas using s3fs
Another short “today I learned” post from the analytics mines. If you have ever written data-munging or analytics jobs, you have almost certainly run into Python, Pandas, and AWS S3 in some combination.
These jobs usually follow the same structure:
1. Download the files from S3.
2. Deserialize them into Python objects and create Pandas dataframes.
3. Perform calculations over these dataframes.
Steps 1 and 2 are normally repetitive boilerplate that you end up rewriting for every job, but there is a better way.
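A minimal sketch of that better way, assuming `s3fs` is installed alongside pandas and AWS credentials are available through the usual environment variables or config files. When given an `s3://bucket/key` URI, pandas delegates the I/O to s3fs, so steps 1 and 2 collapse into a single `read_csv` call. The helper names `load_frame` and `summarize` and the `amount` column are hypothetical, for illustration only.

```python
import pandas as pd


def load_frame(uri: str) -> pd.DataFrame:
    """Read a CSV into a DataFrame from a local path or an s3:// URI.

    For "s3://bucket/key" URIs pandas hands the download to s3fs,
    which must be installed; local paths need no extra packages.
    """
    # Steps 1 and 2: fetch and deserialize in one call.
    return pd.read_csv(uri)


def summarize(df: pd.DataFrame, column: str) -> float:
    """Step 3: a hypothetical example calculation over the loaded frame."""
    return float(df[column].sum())
```

Usage would look like `summarize(load_frame("s3://example-bucket/path/to/data.csv"), "amount")`, where the bucket, key, and column are placeholders. Because the same function accepts local paths, it can be exercised against a file on disk before pointing it at S3.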