Pray listen, stranger, to a tale of noble deeds and bitter struggle, of fffffttt nope it’s about me doing computershit for around three weeks. But I am proud of this graphic and interactive map on oil spills, and it has a sort of Homeric-level story behind it. Well, only if the first thousand stanzas of the Odyssey were Odysseus trying to untangle rigging while yelling “Does anyone know how to sail?”
I pitched a bigass graphic timed with the 25th anniversary of the Exxon Valdez spill, which was both an ecological catastrophe and a watershed moment for petro-tragedy regulation. Recap: a tired third mate under the command of a drunk captain plowed an oil tanker into a reef, around 1/5 of its crude poured into a remote Alaskan sound, and a lot of wildlife death followed. Thousands of people hosed rocks and scrubbed birds, Exxon paid far out the butt, and the Oil Pollution Act of 1990 firmed up liability law for spills. Twenty-five years later, there’s still residual crud stuck to beaches and fewer orcas than there should be. So yeah, heavy hydrocarbons tend to linger. Try and keep them out of the biosphere.
I wanted a map showing all oil spills in U.S. waters since the Valdez. Step zero: find a dataset. I alighted on the National Response Center query tool, which is the pollution incident reporting clearinghouse run by the U.S. Coast Guard. It is, naturally, pretty dated-looking and also uses Flash. A classic tale of news dev woe. I grabbed the yearly summary data files and found they were inexplicably wrapped in self-extracting .exes, so I took a round trip to my Windows laptop. Out came two dozen Excel files with ~12 tabs each, which I then ran through some Excel macros to split into 231 discrete CSVs covering the years 1990 to 2012. Phew.
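For the record, the workbook-to-CSV chore can also be scripted instead of macro’d. Here’s a sketch of how I might do it today in Python: it assumes the third-party openpyxl library is available (imported lazily, so the filename helper works without it), and the `NRC1994`-style names are made up for illustration, not the real files.

```python
import csv
from pathlib import Path

def split_workbook(xlsx_path, out_dir):
    """Write every worksheet in an .xlsx file out as its own CSV.
    Requires openpyxl (a third-party package), imported lazily so
    the pure-string helper below still works without it."""
    from openpyxl import load_workbook
    wb = load_workbook(xlsx_path, read_only=True)
    for sheet in wb.worksheets:
        out = Path(out_dir) / csv_name(Path(xlsx_path).stem, sheet.title)
        with open(out, "w", newline="") as f:
            csv.writer(f).writerows(
                [cell.value for cell in row] for row in sheet.iter_rows())

def csv_name(workbook_stem, sheet_title):
    """Build a filesystem-safe CSV filename from workbook + sheet names."""
    safe = "".join(c if c.isalnum() else "_" for c in sheet_title)
    return f"{workbook_stem}_{safe}.csv"

print(csv_name("NRC1994", "CALLS"))  # NRC1994_CALLS.csv
```

Two dozen workbooks times ~12 tabs lands you right around those 231 CSVs, with no Excel macro debugging required.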
From those I wanted to sift the incident date, the vector (ship, refinery, etc.), the stuff spilled (petroleum only), the spill amount and the location. Of course there was no documentation to help me sort the useful field names from the PTF_FLAG_Ys that made up the bulk of the column headers, but with a little bit of OpenRefine and a lot of emails, I figured I could likely sort that out.
So now I had data from 1990-2012 in a sorta-convenient format, but I had to use a Python script to grab 2013-2014 pollution incidents from the query tool.
Or I would have, had I not been caught between the Scylla of website maintenance and the Charybdis of pokey FOIA approvals: the query tool went down, so I wasn’t getting any more data. Not too awful a fate, since the editors okayed using data up to 2012.
But then I took a closer look at the data and noticed how locations were stored. Addresses. Cities. States. Oh lord, geocoding. I’d need to geocode an apocalyptic number of incidents, sure to overwhelm whatever API I could throw at it. A pall fell over the project. Mostly because I’m bad at thinking things through.
NICAR 2014 interrupted my panic, the news-Calypso to break up my hideous journey. I especially enjoyed David Fallis’ talk on tracking guns and the four “intro to SQL” classes I had to camp out for (note to NICAR 2015: overflow rooms for the hands-on seshes would help). SQL, SQL…yeah…this gives me an idear.
As with most situations involving me and editorial graphics, fortune intervened to cut the knot. Hallelujah: there was another Coast Guard database that I could use. And it had distinct advantages:
- It tracks chemical and oil spills specifically. Incidents of that sort are included in the National Response Center data set, but it’s more of a giant pollution clearinghouse and thus cluttered with wacky stuff. Exhibit A: someone called the government to tell them a mercury thermometer broke.
- Instead of locating the incidents by address, they have latitude and longitude columns! No desperate shitfights with geocoding confidence levels! Just beautiful decimals! Deliverance! Praise to William Harrison!
Now I had more data, in a better format, with better documentation. By pure luck. Not a good strategy, but one can’t argue with results. Anyway, I knew I had to face the database music this time: Excel would choke and die on the 20MB TSVs I had to deal with. Again, lucky me, I’m blessed enough to have Toph Tucker at my elbow, a true Phaeacian who could be relied on to provide deep sighs and plenty of “you forgot the semicolon again”s.
I loaded my tab-delimited text files into a MySQL database using Sequel Pro (great product) and got to querying. Being a neophyte, this naturally took a while, so we’ll skim over my troubles here. With Toph’s help I was able to format queries to retrieve oil spills that a) actually involved petroleum products, b) involved them in some non-picayune quantity, and c) had lat/longs associated with them.
After all sorts of false starts and late nights, I ended up with a spreadsheet that included all oil spills from ships, facilities, pipelines and random dumpage, of petroleum products, of 1,000 gallons or more, from 1973 through May 2013. That’s a winnowing of ~600,000 rows down to ~4,300.
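In spirit, the winnowing looked something like the query below, with Python’s built-in sqlite3 standing in for MySQL and toy rows standing in for the real table. The column names (`material`, `gallons`, `lat`, `lon`) are invented for the sketch; the Coast Guard’s actual headers are crankier.

```python
import sqlite3

# Toy stand-in for the Coast Guard spill table (hypothetical column names).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE incidents
                (material TEXT, gallons REAL, lat REAL, lon REAL)""")
conn.executemany("INSERT INTO incidents VALUES (?,?,?,?)", [
    ("CRUDE OIL", 250000, 61.04, -146.65),  # big petroleum spill: keep
    ("CRUDE OIL", 50,     29.50,  -89.90),  # under 1,000 gallons: drop
    ("MERCURY",   2000,   40.70,  -74.00),  # not petroleum: drop
    ("DIESEL",    5000,   None,   None),    # no coordinates: drop
])

rows = conn.execute("""
    SELECT material, gallons, lat, lon FROM incidents
    WHERE material IN ('CRUDE OIL', 'DIESEL')    -- petroleum only
      AND gallons >= 1000                        -- non-picayune amounts
      AND lat IS NOT NULL AND lon IS NOT NULL    -- actually mappable
""").fetchall()
print(rows)  # only the first row survives
```

Run the same three WHERE clauses over ~600,000 rows and you get the ~4,300 that belong on a map.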
Time to get plotting, right? Not so fast: while most of the rows had decimal lat/longs, some had this N25456 E055186 nonsense. C’mon son. Some Excel magic remedied that:
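If you’d rather skip the Excel magic, the cleanup can be sketched in Python too. Big caveat: I’m guessing that `N25456` packs degrees plus decimal minutes (25°45.6′), which is an assumption about the format, not gospel. Verify against known locations before trusting it.

```python
def dm_to_decimal(token):
    """Convert a packed hemisphere/degrees/decimal-minutes string like
    'N25456' (assumed to mean 25 deg 45.6 min N) to signed decimal
    degrees. The packing is a guess at the format, not documented fact."""
    hemi, digits = token[0], token[1:]
    minutes = int(digits[-3:]) / 10.0   # last three digits: tenths of minutes
    degrees = int(digits[:-3])          # everything before that: whole degrees
    value = degrees + minutes / 60.0
    return -value if hemi in "SW" else value

print(round(dm_to_decimal("N25456"), 4))  # 25.76
```

Southern and western hemispheres come out negative, which is what CartoDB and friends expect.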
And I had my final output, looking damn fine if I do say so myself.
CartoDB was used for the public-facing interactive map, but also proved invaluable at this stage because it’s so easy to prototype with. I could upload my master spreadsheet and within seconds see where the clusters were, what the data looked like as scaled circles, if any lat/longs ended up in the middle of Greenland, etc. Anything that minimizes the time spent futzing in MAPublisher is good in my book, a regular Aeolian wind bag (the Odyssey? Eh? Cute huh?).
The rest was relatively easy window dressing: major oil ports and pipelines came from the EIA, major commercial waterways came from the U.S. Army Corps of Engineers. The usual MAPublisher nightmares ensued, but I was on familiar territory for once:
The map’s foundation was poured, so it was time to sift arcane spill remediation PDFs and conduct some interviews. My mother always told me “journalists make phone calls” so I rang up Dr. Jeff Short and the nice folks at the Exxon Valdez Oil Spill Trustee Council.
After the callouts were written and the bones of the map were set, it was time to recruit the best art directors on the continent: Cynthia Hoffman and Chris Nosenzo. To demonstrate their talent I placed the first draft next to the final.
From assey to classy in only 30 drafts. I lost the geographic literalism that I began with, which bummed me out since I get real sweaty whenever I get to go all National Geographic carto on a project, but in the end the reader won out. Cindy and Chris tuned up the colors, fixed my type hierarchy, got some cuter illos, and a two-page spread was born. Honey I’m home clean your room Telemachus har har har har.
→→→The final product, loud and proud←←←
Next came the interactive map, which thanks to CartoDB, was mostly finished! CartoCSS made it trivial to match the magazine’s color and line weight styles, but the combination of tiles and vector symbol scaling did not sit well with CartoDB:
No matter what I did, the Deepwater Horizon circle was too damn big to render properly. And to get the jump on any pedants, yes I remembered to do the square root thing. Anyway Cindy had the idea of saying “screw it” to the circle scaling, which looks better anyway:
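For anyone unfamiliar with “the square root thing”: scaled circles should encode value in their area, not their radius, so radius goes as the square root of the spill volume. A toy sketch (the reference radius and units are made up, not Businessweek’s actual style values):

```python
import math

def radius(volume_gal, ref_volume=1000.0, ref_radius=4.0):
    """Scale a circle's radius so its AREA is proportional to spill
    volume. A 1,000-gallon spill (the dataset's floor) gets a 4-unit
    radius; both reference values are arbitrary for this sketch."""
    return ref_radius * math.sqrt(volume_gal / ref_volume)

# Deepwater Horizon (~210 million gallons) vs. the 1,000-gallon floor:
print(radius(1000))               # 4.0
print(round(radius(210_000_000)))  # 1833
```

Even with correct area scaling, a spill five orders of magnitude bigger than the floor yields a circle hundreds of times wider, which is exactly why the Deepwater Horizon blob ate the tiles.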
ＯＮＥ ＭＯＲＥ ＴＨＩＮＧ
Oh so you think you could have done better? Is that right, punk?? Well be my guest, make a better map! Seriously please do, Businessweek is hiring an editorial cartographer. Come map for us and spare my poor editors and art directors the goofery detailed above.