Data sources and processing procedures

Wenlong

2018-04-02

Introduction to data sources and availability

The data used in this package were originally compiled and processed by the United States Geological Survey (USGS). The fertilizer data cover both farm and nonfarm application from 1945 through 2012. USGS used commercial fertilizer sales data for each state or county, as reported by the Association of American Plant Food Control Officials (AAPFCO). State estimates were then allocated to the county level, using fertilizer expenditure from the Census of Agriculture as county weights for farm fertilizer and effective population density as county weights for nonfarm fertilizer. The data sources and further information are available in Table 1.

| Dataset name | Temporal coverage | Source | Website | Comments |
|---|---|---|---|---|
| Fertilizer data before 1985 | 1945 - 1985 | USGS | Link | Only has farm data. |
| Fertilizer data after 1986 | 1986 - 2012 | USGS | Link | Published in 2017. |
| County background data | 2010 | US Census | Link | Assume descriptors of counties do not change. |
| Manure data before 1997 | 1982 - 1997 | USGS | Link | Manure data for farms, every five years. |
| Manure data in 2002 | 2002 | USGS | Link | Published in 2013. |
| Manure data in 2007 and 2012 | 2007 & 2012 | USGS | Link | Published in 2017. |
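
The county allocation described in this section can be illustrated with a minimal sketch. It assumes a simple proportional split, where each county receives the state total multiplied by its share of the summed weights; the exact USGS procedure may differ in detail. The function name `allocate_to_counties`, the county names, and all numbers are hypothetical and are not taken from the package.

```r
# Minimal sketch of weight-based allocation (assumed proportional split).
# `allocate_to_counties` is a hypothetical helper, not part of this package.
allocate_to_counties <- function(state_total, weights) {
  # Weights must be non-negative and must not all be zero.
  stopifnot(all(weights >= 0), sum(weights) > 0)
  state_total * weights / sum(weights)
}

# Illustrative numbers only: a state total and county fertilizer expenditures.
state_total_kg <- 1000
county_expenditure <- c(county_a = 50, county_b = 30, county_c = 20)

county_kg <- allocate_to_counties(state_total_kg, county_expenditure)
print(county_kg)
```

By construction the county values sum back to the state total, so the allocation redistributes the state estimate without changing it. For nonfarm fertilizer, the same function would be called with effective population density as the weights.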

Data cleaning and processing

Because the county-level fertilizer data were processed at different times and by different researchers, their format is inconsistent. To save users the time and effort of working with a complicated dataset, the author cleaned the data into tidy data, following Hadley Wickham's three rules: each variable has its own column, each observation has its own row, and each value has its own cell.

Fig. 1 shows the rules visually.

Fig. 1 Following three rules makes a dataset tidy: variables are in columns, observations are in rows, and values are in cells.

(The description of tidy data was adapted from R for Data Science.)
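
As an illustration of these rules, the sketch below reshapes a small wide table (one column per year) into a tidy one (one row per county and year). It uses base R only, so it runs without extra packages. The column names `fips`, `y1987`, `y1992`, and `nitrogen_kg`, and all values, are made up for the example and are not the package's actual schema.

```r
# Hypothetical wide table: one row per county, one column per year.
wide <- data.frame(
  fips  = c("01001", "01003"),
  y1987 = c(10, 20),
  y1992 = c(12, 24),
  stringsAsFactors = FALSE
)

# Every column except the county identifier holds values for one year.
year_cols <- setdiff(names(wide), "fips")

# Tidy form: each variable is a column, each county-year is a row.
# unlist() stacks the year columns one after another, so the county codes
# repeat once per year and each year repeats once per county.
tidy <- data.frame(
  fips        = rep(wide$fips, times = length(year_cols)),
  year        = rep(as.integer(sub("^y", "", year_cols)), each = nrow(wide)),
  nitrogen_kg = unlist(wide[year_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
print(tidy)
```

The same reshaping can be done with `tidyr::pivot_longer()`; base R is used here only to keep the sketch self-contained.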

Import libraries and data

Data cleaning

Clean data before 1982

Clean data after 1987

Clean manure data before 1997

Clean manure data after 1997

Save data as RData with compression
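
A minimal sketch of this step, assuming the cleaned data frame is written with base R's `save()` and `xz` compression. The object name `example_data`, its columns, and the output path are placeholders; the package's actual objects and file names are not shown here.

```r
# Placeholder data frame standing in for a cleaned dataset.
example_data <- data.frame(
  fips        = c("01001", "01003"),
  year        = c(2012L, 2012L),
  nitrogen_kg = c(10, 20)
)

# Write to a temporary location; a package would typically use data/ instead.
out_file <- file.path(tempdir(), "example_data.rda")

# compress = "xz" usually gives the smallest files, at the cost of slower saving.
save(example_data, file = out_file, compress = "xz")

# Reload into a separate environment to check that the round trip is lossless.
loaded <- new.env()
load(out_file, envir = loaded)
stopifnot(identical(loaded$example_data, example_data))
```

Existing `.rda` files can also be recompressed after the fact with `tools::resaveRdaFiles()`.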

Future development plan

Planned features for the dataset include: