This course is for students who want to make finding and reporting stories from data part of their toolkit. It will be useful for anyone interested in investigative journalism, which nowadays is often heavily data-driven, or those keen to use data to provide context and ground-truth for regular beat reporting. You should be comfortable with numbers, and thinking critically and quantitatively. You must be prepared to work with some simple code (in R and SQL), and to get your hands dirty with real-life, messy data!
We will meet in 108/Lower NG on Thursdays from 6pm - 9pm. Your instructors, Peter Aldhous and Amanda Hickman, will maintain office hours over Skype. You are encouraged to arrange appointments to discuss your work.
Categorical and continuous variables; basic operations for interviewing a dataset; sampling and margins of error; plotting and summarizing distributions; choosing bins for your data; basic newsroom math. (Peter Aldhous)
Through a tour of some great examples of data journalism, we’ll get inspiration for our work in this course. (Peter Aldhous)
Understanding spreadsheet functions and pivot tables is the foundation for the rest of your exploration of data. We’ll learn how to troubleshoot and use spreadsheets. (Amanda Hickman)
Where and how to find data online. Tips and tricks for downloading unruly data, including browser extensions to extract data from web tables and download from multiple files en masse. (Peter Aldhous)
Introduction to R, RStudio and the tidyverse packages for data journalism. (Peter Aldhous)
We’ll use tidyverse packages to explore data on opioid prescription under Medicare in California, and related datasets on the doctors involved. (Peter Aldhous)
Data is as full of lies as people, but somehow we are inclined to believe numbers in ways we wouldn’t believe sentences. We’ll look at ways that numbers lie to us and people lie to each other with numbers. (Amanda Hickman)
We’ll work with PostgreSQL, using SQL to ask questions of data. Note that this class will meet on a Tuesday. (Amanda Hickman)
QGIS is a desktop geographic information system (or GIS) application that we’ll use to view, edit, and analyze geographic data. (Amanda Hickman)
PostGIS adds support for geographic queries and objects to Postgres — it is a powerful tool for geographic analysis, and it plugs right into QGIS. We’ll explore more advanced queries that take advantage of PostGIS’s power. We’ll also discuss the tool or strategy you’d like to tackle in week 12. (Amanda Hickman)
We’ll keep working on pulling together GIS and databases so everyone has a good handle on the tools we’re using. (Amanda Hickman)
After a few weeks working with SQL and maps, we’ll get reacquainted with R. In groups, you will answer questions from data using dplyr code. (Peter Aldhous)
Another practice session to reinforce skills covered previously, working in groups to write PostGIS queries. (Amanda Hickman)
Some more useful tricks in R, including loading data from multiple files, making new columns in your data, and pulling data from the US Census API. Plus: when you need to step beyond dplyr and do a statistical analysis. (Peter Aldhous)
Sarah Cohen: Numbers in the Newsroom: Using Math and Statistics in News
Philip Meyer: Precision Journalism: A Reporter’s Introduction to Social Science Methods
Unexcused absence from two classes will drop you one letter grade; a third unexcused absence will result in an F. Excused absences will be permitted only in extraordinary circumstances. Regardless of the reason for an absence, students will be responsible for any assignments due and for learning material covered in class.
Class participation, weekly assignments: 90%
Attendance: 10%
Students must turn off the ringers on their cell phones before class begins. Students may not check e-mail, social media sites or other websites during lecture portions of class or while working on class exercises.
The high academic standard at the University of California, Berkeley, is reflected in each degree that is awarded. As a result, it is up to every student to maintain this high standard by ensuring that all academic work reflects his/her own ideas or properly attributes the ideas to the original sources.
These are some basic expectations of students with regard to academic integrity:
Any work submitted should be your own individual thoughts, and should not have been submitted for credit in another course unless you have prior written permission to re-use it in this course from this instructor.
All assignments must use “proper attribution,” meaning that you have identified the original source of words or ideas that you reproduce or use in your assignment. This includes drafts and homework assignments!
If you are unclear about expectations, ask your instructor.
If you need disability-related accommodations in this class, if you have emergency medical information you wish to share with the instructor, or if you need special arrangements in case the building must be evacuated, please inform the instructors as soon as possible by seeing one of us after class or making an appointment to visit during office hours. If you are not currently listed with DSP (Disabled Students’ Program) but believe that you could benefit from their support, we encourage you to apply online.