Output of usdarnass

Robert Dinterman

2019-06-20

Introduction

Each successful call of the nass_data() command will return a data.frame object with 39 variables, although a handful of these variables will have the same value for each observation in the data.frame due to the nature of setting parameters for a query. The resulting data.frame is of the long variety with the Value variable as the numerical variable of interest for the query.

Official USDA Objects

The official documentation for each of these variables from the USDA are as follows:

I learn best through examples, so I’ll cover a few different levels of analysis and subtleties related to the data.

County Level Example

We can set a query where we return all data at the county level in Ohio related to rent, which is equivalent to https://quickstats.nass.usda.gov/ setting “Geographic Level” to COUNTY, “State” equal to Ohio, and “Commodity” equal to RENT :

library("usdarnass")
library("dplyr") # Helpful package

ohio_rent <- nass_data(commodity_desc = "RENT", agg_level_desc = "COUNTY",
                       state_name = "OHIO")
glimpse(ohio_rent)
#> Observations: 2,006
#> Variables: 39
#> $ source_desc           <chr> "CENSUS", "CENSUS", "CENSUS", "CENSUS", "C…
#> $ sector_desc           <chr> "ECONOMICS", "ECONOMICS", "ECONOMICS", "EC…
#> $ group_desc            <chr> "EXPENSES", "EXPENSES", "EXPENSES", "EXPEN…
#> $ commodity_desc        <chr> "RENT", "RENT", "RENT", "RENT", "RENT", "R…
#> $ class_desc            <chr> "CASH, LAND & BUILDINGS", "CASH, LAND & BU…
#> $ prodn_practice_desc   <chr> "ALL PRODUCTION PRACTICES", "ALL PRODUCTIO…
#> $ util_practice_desc    <chr> "ALL UTILIZATION PRACTICES", "ALL UTILIZAT…
#> $ statisticcat_desc     <chr> "EXPENSE", "EXPENSE", "EXPENSE", "EXPENSE"…
#> $ unit_desc             <chr> "$", "$", "$", "$", "$", "$", "$", "$", "$…
#> $ short_desc            <chr> "RENT, CASH, LAND & BUILDINGS - EXPENSE, M…
#> $ domain_desc           <chr> "TOTAL", "TOTAL", "TOTAL", "TOTAL", "TOTAL…
#> $ domaincat_desc        <chr> "NOT SPECIFIED", "NOT SPECIFIED", "NOT SPE…
#> $ agg_level_desc        <chr> "COUNTY", "COUNTY", "COUNTY", "COUNTY", "C…
#> $ state_ansi            <chr> "39", "39", "39", "39", "39", "39", "39", …
#> $ state_fips_code       <chr> "39", "39", "39", "39", "39", "39", "39", …
#> $ state_alpha           <chr> "OH", "OH", "OH", "OH", "OH", "OH", "OH", …
#> $ state_name            <chr> "OHIO", "OHIO", "OHIO", "OHIO", "OHIO", "O…
#> $ asd_code              <chr> "10", "10", "10", "10", "10", "10", "10", …
#> $ asd_desc              <chr> "NORTHWEST", "NORTHWEST", "NORTHWEST", "NO…
#> $ county_ansi           <chr> "003", "003", "003", "003", "003", "039", …
#> $ county_code           <chr> "003", "003", "003", "003", "003", "039", …
#> $ county_name           <chr> "ALLEN", "ALLEN", "ALLEN", "ALLEN", "ALLEN…
#> $ region_desc           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ zip_5                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ watershed_code        <chr> "00000000", "00000000", "00000000", "00000…
#> $ watershed_desc        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ congr_district_code   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ country_code          <chr> "9000", "9000", "9000", "9000", "9000", "9…
#> $ country_name          <chr> "UNITED STATES", "UNITED STATES", "UNITED …
#> $ location_desc         <chr> "OHIO, NORTHWEST, ALLEN", "OHIO, NORTHWEST…
#> $ year                  <chr> "2017", "2012", "2007", "2002", "1997", "2…
#> $ freq_desc             <chr> "ANNUAL", "ANNUAL", "ANNUAL", "ANNUAL", "A…
#> $ begin_code            <chr> "00", "00", "00", "00", "00", "00", "00", …
#> $ end_code              <chr> "00", "00", "00", "00", "00", "00", "00", …
#> $ reference_period_desc <chr> "YEAR", "YEAR", "YEAR", "YEAR", "YEAR", "Y…
#> $ week_ending           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ load_time             <chr> "2018-02-01 00:00:00", "2012-12-31 00:00:0…
#> $ Value                 <chr> "10,693,000", "11,067,000", "7,001,000", "…
#> $ `CV (%)`              <chr> "18.4", "9.6", NA, NA, NA, "18.4", "9.6", …

The agg_level_desc, commodity_desc, and state_name variables are all the same because the query parameters were set on those values. It turns out a few other variables will be identical for the whole data.frame because they do not vary based on county level observations in Ohio: country_code, state_alpha, state_ansi, and state_fips_code.

There are a fair amount of other variables which are all the same for the entire data.frame, but these variables are not the same because of the regional aggregation variables but because we have subset the data by the commodity_desc as “RENT”.

The various other parameters that we could have set in this query of interest are from the short_desc parameter, which we could use the nass_param() function to view the options for this data item:

nass_param("short_desc", commodity_desc = "RENT", agg_level_desc = "COUNTY", state_name = "OHIO")
#> [1] "RENT, CASH, CROPLAND, IRRIGATED - EXPENSE, MEASURED IN $ / ACRE"    
#> [2] "RENT, CASH, CROPLAND, NON-IRRIGATED - EXPENSE, MEASURED IN $ / ACRE"
#> [3] "RENT, CASH, LAND & BUILDINGS - EXPENSE, MEASURED IN $"              
#> [4] "RENT, CASH, LAND & BUILDINGS - OPERATIONS WITH EXPENSE"             
#> [5] "RENT, CASH, PASTURELAND - EXPENSE, MEASURED IN $ / ACRE"            
#> [6] "RENT, PER HEAD OR ANIMAL UNIT MONTH - OPERATIONS WITH EXPENSE"

Alternatively, in the previously returned query with the nass_data() function, we could view the frequency of the different short_desc variables to get to the same outcome but with the additional benefit of knowing the number of observations:

table(ohio_rent$short_desc)
#> 
#>     RENT, CASH, CROPLAND, IRRIGATED - EXPENSE, MEASURED IN $ / ACRE 
#>                                                                  19 
#> RENT, CASH, CROPLAND, NON-IRRIGATED - EXPENSE, MEASURED IN $ / ACRE 
#>                                                                 719 
#>               RENT, CASH, LAND & BUILDINGS - EXPENSE, MEASURED IN $ 
#>                                                                 440 
#>              RENT, CASH, LAND & BUILDINGS - OPERATIONS WITH EXPENSE 
#>                                                                 440 
#>             RENT, CASH, PASTURELAND - EXPENSE, MEASURED IN $ / ACRE 
#>                                                                 311 
#>       RENT, PER HEAD OR ANIMAL UNIT MONTH - OPERATIONS WITH EXPENSE 
#>                                                                  77

Dominant form of rent in Ohio

In Ohio, the dominant form of cash rent is for non-irrigated cropland and most counties in Ohio are surveyed in the state and have a usable value for cash rent in a year. But this is not exactly right, so we can see this by subsetting our original query for only non-irrigated cropland and look at the number of counties in each year’s observation.

The cash rent values begin in 2008 with a small subset of counties in Ohio and then cover the vast majority of the state in 2009 onward. As it turns out, the entire state is surveyed but some counties do not have enough observations to have a statistically relevant sample and are thus combined at the agricultural reporting district level. This can be see with a listing of all of the counties in Ohio with rent data available:

The “OTHER (COMBINED) COUNTIES” value has by far and away the most observations over this time and, if one knows all of the counties in Ohio, there are a few missing counties that we need to input their values for. Each of the “OTHER (COMBINED) COUNTIES” values is for a specific agricultural reporting district:

There are nine reporting districts in Ohio and eight of the nine have observations which combine counties for an observation – which implies some counties are missing official observations for the statistics in question. It is not necessarily the case that the counties are completely missing but they are suppressed. One method for correcting for these missing values is to replace the missing counties with the “OTHER (COMBINED) COUNTIES” category for average rent.

In order to do this, we first need a full set of all of the counties in Ohio along with their corresponding agricultural district number. There are many ways to accomplish this, but I will go about this by leveraging the 2012 Agricultural Census data which is in Quick Stats and contains a category for the number of farms in each Ohio county. The number of farms is not helpful for our data concerns but it does serve as a way to return a data.frame with 88 observations in Ohio with each uniquely corresponding to a county in Ohio.

At this point, we want to make sure that each year for our non_irrigated rent data has all 88 counties with an NA value if it does not exist. Then, we want to impute the “OTHER (COMBINED) COUNTIES” category for all of the NAs. To do this we will create the backbone of our desired observations from the farms and then fully merge this with the current data from Quick Stats. The resulting data.frame will have missing “Value” observations for the counties which do not have observations and we will impute the value from “OTHER (COMBINED) COUNTIES”. And last of all, with these data the only important aspect of these observations is that we have a county identifier, locational aspects, and the year in question. Much of the rest of the variables in the dataset are not of importance to us so we will have only keep the remaining important variables and convert them to numeric.