Instructor: Amanda Hickman
Let's touch base about where folks have needed help to get unstuck. There's no way you can learn everything you might need to know and store it in your head. So the best way we can guide you is to get you used to articulating the question you want to ask, and trying to ask it and then addressing the errors that inevitably surface.
At this point everyone should have met with one of us to talk through what your pitch / reporting plan needs. If you haven't, you should.
The next update to your data journalism project is due Saturday April 7 at 8 pm.
Updates should be cumulative: your submission should include everything we need to evaluate your work. At this point you should be making progress both in analyzing your data and understanding the underlying story or potential stories. With this update, you should clearly and concisely:
Describe the analysis you have conducted so far. Frame your description in terms of the questions you have asked of the data, and the answers you have obtained. Include the data you are using, a description of how you obtained it, and any code you have used for your analysis.
In clear sentences, describe the main conclusions you have drawn so far from your analysis.
Describe the further questions you wish to ask of your data.
Describe any obstacles you face. Important: you should be seeking help from your instructors when you get stuck! Do not wait until this assignment deadline for that.
Outline the additional reporting, beyond data analysis, that would be required to turn this project into a story.
We do want to see the code you have used in your analysis, but please do not show us every last SELECT/ALTER/UPDATE query that you used. Clean your work up so that someone can replicate it without having to replicate the casting about that we all do as we find our way through this stuff. This isn't just about making busy work. These final scripts are your resource to refer back to when you inevitably find yourself facing a problem you've solved before and trying to remember the solution. They also become your resource when you shelve a story and come back to it after two months working on some other breaking project: you want to be able to get yourself back up to speed on the work you've already done.
When I've taught Postgres before I've taught it inside of Carto, which does a good job of forcing things into shape for you. I haven't tried to teach Desktop Postgres, mostly because it is just fussier about the SQL it will accept. A few of you hit real walls around that with cases in your column names and I apologize for not anticipating that.
The tradeoff is what we're going to start to get into this week and next, which is that we have a lot of much deeper functionality available to us in Postgres, including PostGIS.
Don't create a lot of new tables. If you need to refer back to the results of a particular query often, you can use CREATE VIEW, but creating tables will add a lot of overhead and you almost never need it.
Cartography and GIS aren't the same thing. We're talking about very basic maps as visualizations here. More on that
"Geocoding" refers to the process of identifying an individual latitude/longitude pair for an address or other location description. To actually plot a location on a map, you need the location's latitude and longitude.
219 West 40th Street means nothing without coordinates.
Geocoding is actually challenging because there aren't good, free resources for doing batch jobs, where many addresses are geocoded at once. My Geocoding Tip Sheet includes some helpful resources, but many city data sources actually include coordinates, so double check that, first.
If you're committed to mapping points, you may need my help geocoding them.
No student has ever pitched me a compelling map that features lines rather than shapes or points. I did a project that drew out flight maps showing how far from home every prisoner incarcerated in Florence, CO is, but I pitched that, so it doesn't count. To draw that map I had to take a crash course in rendering lines. If you're excited about doing something like this, great! But you're going to need to install R and walk through Nathan Yau's tutorial before you do anything else.
Zipcodes, council districts, police precincts -- these are all polygons. Most of your maps will be in polygons. These polygons are defined in (usually) one of two specialized file formats -- a "Shapefile" or a "KML" file. The syntax of the file types varies, but they contain basically the same information -- the polygon called "Bronx CB 04" is defined by this series of lat/lon pairs. My Shapefiles Tip Sheet has some excellent resources for finding shapefiles though a lot of the resources there are New York City specific.
Often (usually) your data won't include a shapefile. If you have High School graduation rates by school districts, and you want to map those, you need to find a shapefile that describes the outline of each school district, and then you need to combine that shapefile with your data, by identifying a column that the two tables have in common.
We don't deal with projections much but they matter. And if you have inconsistent projections you might end up with a map where the city of San Francisco is floating about 10 miles NE of where it belongs. I had to ask for help to resolve that.
Most of the time you're going to be in WGS84. But your takeaway here should not be that you need to remember what projection to use, or that you need to internalize when to use one projection over another. Your takeaway should just be that projects are a thing and if you're finding things positioned super weirdly, look into the possibility that you've got a projection problem.
The Wikipedia article on web mercator is pretty good if you're dying to understand how this all fits together, but EPSG is an obsolete acronym for European Petroleum Survey Group a scientific research group with ties to the petroleum industry. They compiled a comprehensive database of projections and coordinate systems.
What is a shapefile? There are a few different widely used formats for storing geographic information. ESRI makes ArcGIS which is popular and expensive. Their shapefile format is almost universal. Google Maps uses it's own KML format.
I can't say enough about the importance of learning how to ask for help. If you look at my gis.stackexchange.com profile you can see where I got stuck and then unstuck, starting back in 2013. There's a community there that is very good about thoroughly explaining what you're dealing with.
Let's actually do some mapping.
In 2011, the BLS published a map of fatal workplace injuries by state.
What do we think of this map?
So we'll recreate it:
I already combined the Census 2011 population estimates with the BLS workplace fatality data. We're actually going to use Postgres to do the first piece of this -- we could do it just as easily in a spreadsheet but we need the SQL practice.
(state character varying(20) NOT NULL,fatalities integer NOT NULL,population integer NOT NULL);
Do you remember how to load data into a table?
And how are we going to normalize this?
ADD COLUMN fatality_rate float;
And then try out a query to fill that rate column:
SELECT *,((fatalities::float/population)*100000) AS rateFROM bls_fatalities_2011
And then actually do it:
UPDATE bls_fatalities_2011 SET fatality_rate = ((fatalities::float/population)*100000);
::float is recasting fatalities, which are stored as an integer, into a decimal number. Well, because.
And then output it:
COPY bls_fatalities_2011 TO '~/Desktop/bls_normalized.csv' DELIMITER ',' CSV HEADER;
To actually map this, we need some states. Who keeps track of US State boundaries? The Census. You want "Cartographic Boundary Shapefiles" > "States". The state boundaries don't actually change, so it doesn't matter which year.
For our purposes 1:20,000,000 is plenty of resolution. Download the shapefile, but don't unzip or extract it. Do use
Layer > Add Layer > Add Vector
You should be able to load the zip file in as a layer.
Why does it look all squished? Once upon a time I asked about that, too. The answer is kind of cool. If we use the toggle on the bottom right to switch to "EPSG 54004" we get something that looks a little more familiar.
You may also need to enable "on the fly" CRS transformation, down in the bottom right.
If you want any context for where you are in the world, you need a basemap. The "tile map scale plugin" does a nice job of automatically zooming you to an available tile layer, which the other base map plugins don't do.
So go ahead and download the plugin.
Plugins > Manage and Install Plugins ... and search for "Tile Map Scale"
That will give you a tiny pulldown on the map that lets you add a base layer so you can see where you are in the world. Use Mapnik -- it doesn't require an API key.
Add your CSV to QGIS (it has no geometry.)
Look at the Attribute table for the Shapefile. How are we going to join this?
And then style it.
Use equal intervals. I found a bug once which is another reason to ask for help. They had no idea.
Make compound labels with the
|| operator. Eg.
NAME || '\n' || "bls_fatalities_2011_Workplace Fatalities 2011"
This series is highly recommended: https://www.youtube.com/watch?v=Pf9cYvaCYWA&index=3&list=PL7HotvlLKHCs9nD1fFUjSOsZrsnctyV2R
We probably won't get to this. And this week all we're going to do is talk it through. We're not going to tackle it.
In 2006, California voters passed Prop 83, which requires registered sex offenders to live at least 2000 feet from any school or playground. In 2015, the state supreme court said the blanket restriction was too broad and the law could only be applied to offenders whose crimes involved children.
Sex offender restrictions buffered around schools to show how much of a city is off limits.
+ location of every public school in the county+ location of every private school in the county+ location of every playground in the county
Alameda publishes at least the schools: https://data.acgov.org/Education/Alameda-County-Schools/yza6-6jwu
Then we're going to need a way to calculate circles around those points. I want you to take five minutes to think of a Google search that might let you get at that. Put your search terms in the Etherpad.
More applications of this? If you aren't following SB 827 you should be. So how would you map the impact of that bill? https://transitrichhousing.org/ tried.
Load it. Label it.
Spend some time in your shapefile and describe, in words, a join that you can use to populate it with data. What column's can you use in that join?
The Data of the Week is in a Google Spreadsheet. Who is up after the break?.