1. Process post: how to make a giant oil spill map the hard way

    image

    Pray listen, stranger, to a tale of noble deeds and bitter struggle, of fffffttt nope it’s about me doing computershit for around three weeks. But I am proud of this graphic and interactive map on oil spills and it has a sort of Homeric-level story behind it. But only if the first thousand stanzas of the Odyssey were him trying to untangle rigging while yelling “Does anyone know how to sail?”

    I pitched a bigass graphic timed with the 25th anniversary of the Exxon Valdez spill, which was both an ecological catastrophe and a watershed moment for petro-tragedy regulation. Recap: a tired third mate under the command of a drunk captain plowed an oil tanker into a reef, around 1/5 of its crude poured into a remote Alaskan sound, and a lot of wildlife death followed. Thousands of people hosed rocks and scrubbed birds, Exxon paid far out the butt, and the Oil Pollution Act of 1990 firmed up liability law for spills. 25 years later and there’s residual crud stuck to beaches and fewer Orcas than there should be. So yeah, heavy hydrocarbons tend to linger. Try and keep them out of the biosphere.

    I wanted a map showing all oil spills in U.S. waters since the Valdez. Step zero: find a dataset. I alighted on the National Response Center query tool, which is the pollution incident reporting clearinghouse run by the U.S. Coast Guard. It is, naturally, pretty dated-looking and also uses flash. A classic tale of news dev woe. I grabbed the yearly summary data files, found they were inexplicably wrapped in self-extracting .exes, took a round trip to my windows laptop, out came two dozen excel files with ~12 tabs each, which I then ran through some excel macros to split into 231 discrete CSVs covering the years 1990 to 2012. Phew.

    From those I wanted to sift the incident date, the vector (ship, refinery, etc.), the stuff spilled (petroleum only), the spill amount and the location. Of course there was no documentation to help me sort the useful field names from the PTF_FLAG_Ys that made up the bulk of the column headers, but with a little bit of OpenRefine and a lot of emails, I figured I could likely sort that out.

    image

    So now I had data from 1990-2012 in a sorta-convenient format, but I had to use a python script to grab 2013-2014 pollution incidents from the query tool.

    image

    At least I would have done that but for being caught between the Scylla of website maintenance and the Charybdis of pokey FOIA approvals: the query tool went down, so I was not getting any more data. Wasn’t too awful a fate since the editors okayed using data up to 2012.

    But then I took a closer look at the data and noticed how locations were stored. Addresses. Cities. States. Oh lord, geocoding. I’d need to geocode an apocalyptic number of incidents, sure to overwhelm whatever API I could throw at it. A pall fell over the project. Mostly because I’m bad at thinking things through.

    image

    image

    NICAR 2014 interrupted my panic, the news-Calypso to break up my hideous journey. I especially enjoyed David Fallis’ talk on tracking guns and the four “intro to SQL” classes I had to camp out for (note to NICAR 2015: overflow rooms for the hands-on seshes would help). SQL, SQL…yeah…this gives me an idear.

    As with most situations involving me and editorial graphics, fortune intervened to cut the knot. Hallelujah: there was another Coast Guard database that I could use. And it had distinct advantages:

    1. It tracks chemical and oil spills specifically. Incidents of that sort are included in the National Response Center data set, but it’s more of a giant pollution clearinghouse and thus cluttered with wacky stuff. Exhibit A: someone called the government to tell them a mercury thermometer broke.
      image

    2. Instead of locating the incidents by address, they have latitude and longitude columns! No desperate shitfights with geocoding confidence levels! Just beautiful decimals! Deliverance! Praise to William Harrison!

    Now I had more data, in a better format, with better documentation. By pure luck. Not a good strategy but one can’t argue with results. Anyway I knew I had to face the database music this time: Excel would choke and die on the 20mb TSVs I had to deal with. Again, lucky me, I’m blessed enough to have Toph Tucker at my elbow, a true Phaeacian who could be relied on to provide deep sighs and plenty of “you forgot the semicolon again”s.

    I loaded my tab-delimited text files into a MySQL database using SequelPro (great product) and got to querying. Being a neophyte, this naturally took a while, so we’ll skim over my troubles here. With Toph’s help I was able to format queries to retrieve oil spills that a) actually involved petroleum products b) involved them in some non-picayune quantity and c) had lat/longs associated with them.

    image

    After all sorts of false starts and late nights, I ended up with a spreadsheet that included all oil spills from ships, facilities, pipelines and random dumpage, of petroleum products, of 1,000 gallons or more, from 1973 through May 2013. That’s a winnowing of ~600,000 rows down to ~4,300.

    Time to get plotting, right? Not so fast: while most of the rows had decimal lat/longs, some had this N25456 E055186 nonsense. C’mon son. Some Excel magic remedied that:

    image

    And I had my final output, looking damn fine if I do say so myself.

    image

    CartoDB was used for the public-facing interactive map, but also proved invaluable at this stage because it’s so easy to prototype with. I could upload my master spreadsheet and within seconds see where the clusters were, what the data looked like as scaled circles, if any lat/longs ended up in the middle of Greenland, etc. Anything that minimizes the time spent futzing in MAPublisher is good in my book, a regular Aeolian wind bag (the Odyssey? Eh? Cute huh?).

    The rest was relatively easy window dressing: major oil ports and pipelines came from the EIA, major commercial waterways came from the U.S. Army Corps of Engineers. The usual MAPublisher nightmares ensued, but I was on familiar territory for once:

    image

    image

    The map’s foundation was poured, so it was time to sift arcane spill remediation PDFs and conduct some interviews. My mother always told me “journalists make phone calls” so I rang up Dr. Jeff Short and the nice folks at the Exxon Valdez Oil Spill Trustee Council.

    After the callouts were written and bones of the map were set, it was time to recruit the best art directors on the continent: Cynthia Hoffman and Chris Nosenzo. To demonstrate their talent I placed the first draft next to the final.

    image

    From assey to classy in only 30 drafts. I lost the geographic literalism that I began with, which bummed me out since I get real sweaty whenever I get to go all National Geographic carto on a project, but in the end the reader won out. Cindy and Chris tuned up the colors, fixed my type hierarchy, got some cuter illos, and a two-page spread was born. Honey I’m home clean your room Telemachus har har har har.

    The final product, loud and proud

    image

    Next came the interactive map, which thanks to CartoDB, was mostly finished! CartoCSS made it trivial to match the magazine’s color and line weight styles, but the combination of tiles and vector symbol scaling did not sit well with CartoDB:

    image

    No matter what I did, the Deepwater Horizon circle was too damn big to render properly. And to get the jump on any pedants, yes I remembered to do the square root thing. Anyway Cindy had the idea of saying “screw it” to the circle scaling, which looks better anyway:

    Et voila. As a beautiful coda, Toph made a mobile-optimized teaser. I couldn’t begin to tell you how he did it. Go ask him.

    image

    Until the next graphics nightmare,
    Evan Applegate • on twitter • on tumblr

    ONE  MORE  THING

    Oh so you think you could have done better? Is that right, punk?? Well be my guest, make a better map! Seriously please do, Businessweek is hiring an editorial cartographer. Come map for us and spare my poor editors and art directors the goofery detailed above.

    image

     
  1. lesfemmesduex likes this
  2. sunnyorshine likes this
  3. niniwrites likes this
  4. americas-worst-companies reblogged this from bizweekgraphics
  5. insecuredave reblogged this from soupsoup
  6. adam-turney likes this
  7. misterhappiness reblogged this from bizweekgraphics
  8. temporary-problems reblogged this from soupsoup
  9. 7pep7 reblogged this from bizweekgraphics
  10. 7pep7 likes this
  11. collapsingcontainers likes this
  12. jolierose likes this
  13. wonderkiller likes this
  14. gohomewrong likes this
  15. piecesofjen likes this
  16. qwerty6996 likes this
  17. mohamednazmi likes this
  18. florintin likes this
  19. moonfant likes this
  20. fivis likes this
  21. yanoz likes this
  22. missiri88 likes this
  23. publikenemy-10 reblogged this from soupsoup
  24. stochasticjack likes this
  25. ipicktheround likes this
  26. beatmolecular likes this
  27. mellionz likes this
  28. missivesfromtroy likes this
  29. welterreader likes this
  30. damekmz likes this
  31. popculturebrain likes this
  32. warmachine101 likes this
  33. lanceturner likes this
  34. raisonneuse likes this
  35. asuddenblog likes this
  36. gcologist likes this
  37. vanbean likes this
  38. oldgirlie likes this
  39. alternrg likes this
  40. meggy1508 reblogged this from soupsoup and added:
    lol
  41. macon2 likes this
  42. chlc likes this
  43. patrickstjohn likes this
  44. goodchillhunting likes this
  45. rubyamaterasu likes this
  46. bookishbenz likes this
  47. vrabac likes this
  48. coldgettindumb likes this
  49. rivereverywhere likes this
  50. justacommonview reblogged this from soupsoup