Better Spreadsheets
Data4All
0.1 Reminder
This workshop adheres to the DaSL Learning Community Participation Guidelines:
Please be respectful of your fellow learners and help each other learn.
Remember, it’s dangerous to learn alone! So partner up with someone, it’s fun to learn together.
0.2 Introduction
Spreadsheets, for all of their mundane rectangularness, have been the subject of angst and controversy for decades. - Kara Woo and Karl Broman
There are better ways of formatting spreadsheets that we can learn to make our spreadsheets more useful to others. We can do this in ways that still make spreadsheets useful to us.
0.3 A Note About Excel and Spreadsheets
Excel is super useful and super flexible. As a Data Scientist, I will never look down on Excel users.
These tips will make collaborating with data scientists easier, but the goal is to understand how your data gets transformed and loaded with various tools.
If you only implement some of these tips, that’s great. Do what’s easiest for you.
Our goal is reproducible research - and you can do your part.
0.4 This is Workshop is Not About Excel Tips
This workshop is about formatting, not necessarily tips and tricks for Excel.
0.5 Learning Objectives
- Explain and utilize tidy principles to effectively organize your data
- Format your data to effecitvely utilize it in analyses
- Collaborate with data scientists more effectively with formatting tips
- Explain data cleaning tools in R that can help
0.6 Motivation
- Retraction Watch: Genomics has a Spreadsheet Problem - Thankfully, this has been fixed in recent versions of Excel
- Formula error inflated myocarditis statistics
- Officials Say They’ve Fixed a ‘Sorting’ Breakdown in Virginia Booze Raffle
Murrell (2013) contrasted data that are formatted for humans to view by eye with data that are formatted for a computer. He provided an extended example of computer code to extract data from a set of files with complex arrangements. It is important that data analysts be able to work with such complex data files. But if the initial arrangement of the data files is planned with the computer in mind, the later analysis process is simplified.
0.7 Data Integrity
- What We Know About Spreadsheet Errors
- Don’t go back and try to retrospectively implement these tips. Try and adopt a few of them starting now.
- Don’t be embarrassed or ashamed - just resolve to do better
0.8 Overview of Today
We’ll be looking at the tips from Data Organization in Spreadsheets.
I think if you work with spreadsheets in any way, you should take a look at it.
0.8.1 Starting Out Tips
- No empty cells
- Choose Good Names for Things
- Write Dates as YYYY-MM-DD
- Be Consistent
- Put just one thing in a cell
0.8.2 Intermediate Tips
- Make it a Rectangle
- Create a Data Dictionary
- Do not use font color or highlighting as data
0.8.3 Advanced Tips
- Make Backups
- No Calculations in the Raw Data files
- Use Data Validation to Avoid Errors
- Save the Data in Plain Text Files
0.8.4 Cleaning Data in Excel
0.8.5 Spreadsheet Examples
All of these examples are from the reading, with the exception of the fertility data.