5 Resources
5.1 Data Organization in Spreadsheets
- This is the paper we discussed. It is free to read, and I think it’s a must-read if you do any work with spreadsheets.
5.2 Tidy Data
Some say that tidy data is a bit of a cult. But I think the principles are sound and helpful ways to think about organizing data.
- Tidy Data (from the tidyr package) - this is a vignette from the {tidyr} package explaining the virtues of tidy data.
- Tidy data for efficiency, reproducibility and collaboration - this blog from Openscapes is a nice, gentle introduction to tidy data.
- How to Share Data for Collaboration - more guidelines from Jeff Leek (DaSL overlord) and Shannon Ellis on how to share data with statisticians.
5.3 R Packages
5.4 Python Resources
- Python for Data Science: Tidy Data - this chapter outlines some of the Python tools for cleaning data: .melt(), .wide_to_long(), .stack()/.unstack(), .pivot() and others.
- Tidy Data with Pandas - lesson from Library Carpentry which also discusses Python methods for tidying data.
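
As a quick taste of the reshaping tools these resources cover, here is a minimal sketch using pandas. The table, its column names (site, year, count) and its values are made up for illustration and do not come from either resource. It reshapes a wide table into tidy (long) form with .melt() and back again with .pivot().

```python
import pandas as pd

# Hypothetical wide table: one row per site, one column per year.
wide = pd.DataFrame({
    "site": ["A", "B"],
    "2021": [10, 20],
    "2022": [11, 21],
})

# .melt(): wide -> long, giving one row per (site, year) observation.
long_df = wide.melt(id_vars="site", var_name="year", value_name="count")

# .pivot(): long -> wide again, with one column per year.
back = long_df.pivot(index="site", columns="year", values="count").reset_index()
back.columns.name = None  # drop the leftover "year" label on the column index

print(long_df)
print(back)
```

The long form is usually the easier one to filter, group and plot, while the wide form is often what people type into a spreadsheet, so it helps to be comfortable moving in both directions.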