The goal of STICr (pronounced “sticker”) is to provide a standardized set of functions for working with data from Stream Temperature, Intermittency, and Conductivity (STIC) loggers, first described in Chapin et al. (2014). STICs and other intermittency sensors are becoming more popular, but their raw data output is not in a form that allows for convenient analysis. In this vignette, we will walk through some typical processing steps, starting from the raw STIC model output and ending with classified and visualized STIC data.
STICs are created from modified HOBO Pendant loggers (see instructions for how to do this here),
and as a result the raw data downloaded from the STIC logger comes in a
proprietary .hobo format. Before using STICr, you must use the HOBOware
software to export the STIC output data as a CSV file. STICr can
then use tidy_hobo_data()
to get this into an R-friendly
tidy format. We will also use the convert_utc
argument to
ensure that the data are converted from local time to UTC.
# use tidy_hobo_data to load and tidy your raw HOBO data
df_tidy <-
tidy_hobo_data(
infile = "https://samzipper.com/data/raw_hobo_data.csv",
outfile = FALSE, convert_utc = TRUE
)
head(df_tidy)
#> datetime condUncal tempC
#> 1 2021-07-16 22:00:00 88178.4 27.764
#> 2 2021-07-16 22:15:00 77156.1 28.655
#> 3 2021-07-16 22:30:00 74400.5 28.060
#> 4 2021-07-16 22:45:00 74400.5 27.764
#> 5 2021-07-16 23:00:00 74400.5 27.862
#> 6 2021-07-16 23:15:00 71644.9 27.370
The output of tidy_hobo_data()
is a data frame with 3
columns:
datetime
= the date and time of the STIC measurement
(in this case, converted to UTC).condUncal
= the STIC conductivity reading, which is an
uncalibrated, sensor-specific relative conductivity measurement.tempC
= the STIC temperature reading in degrees
celsius.Since the raw STIC data is in a sensor-specific relative conductivity
value, it can often be useful to calibrate STIC loggers. This involves
taking STIC readings in the lab with the sensor immersed in standards
with known specific conductivity - detailed instructions for how to do this can
be found here.. STICr can then use the
get_calibration()
and apply_calibration()
functions to take these lab standard measurements and apply them to the
STIC sensor output data.
# inspect the example calibration standard data provided with the package
data(calibration_standard_data)
head(calibration_standard_data)
#> # A tibble: 4 × 3
#> sensor standard condUncal
#> <dbl> <dbl> <dbl>
#> 1 20946471 100 12400.
#> 2 20946471 250 23422.
#> 3 20946471 500 46845.
#> 4 20946471 1000 104712.
The calibration standard data has three columns:
sensor
= an identifier or serial number for the STIC
sensor, since the calibration differs for each sensor.standard
= the specific conductivity for the lab
standard data used for calibration.condUncal
= the uncalibrated conductivity data reading
by the STIC sensor when immersed in the standard.Using this calibration standard data, we can then create a
calibration curve using the get_calibration()
function and
apply it to our data using the apply_calibration()
function. Currently, get_calibration()
only allows for a
linear calibration curve.
# get calibration
lm_calibration <- get_calibration(calibration_standard_data)
summary(lm_calibration)
#>
#> Call:
#> lm(formula = standard ~ condUncal, data = calibration_data)
#>
#> Residuals:
#> 1 2 3 4
#> -33.43 11.27 37.50 -15.34
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.496e+01 3.136e+01 0.477 0.6803
#> condUncal 9.554e-03 5.328e-04 17.932 0.0031 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 37.99 on 2 degrees of freedom
#> Multiple R-squared: 0.9938, Adjusted R-squared: 0.9907
#> F-statistic: 321.5 on 1 and 2 DF, p-value: 0.003096
This sensor has a very strong calibration, with an R^2 > 0.99. We
can now apply this to the tidied STIC data we loaded earlier. We will
use the outside_range_flag
argument to flag any data points
that are outside the range of the standards used to develop the
calibration curve, as these should be treated with caution.
# apply calibration
df_calibrated <- apply_calibration(
stic_data = df_tidy,
calibration = lm_calibration,
outside_std_range_flag = T
)
head(df_calibrated)
#> datetime condUncal tempC SpC outside_std_range
#> 1 2021-07-16 22:00:00 88178.4 27.764 857.3845
#> 2 2021-07-16 22:15:00 77156.1 28.655 752.0820
#> 3 2021-07-16 22:30:00 74400.5 28.060 725.7561
#> 4 2021-07-16 22:45:00 74400.5 27.764 725.7561
#> 5 2021-07-16 23:00:00 74400.5 27.862 725.7561
#> 6 2021-07-16 23:15:00 71644.9 27.370 699.4302
We now have two additional columns in our STIC data frame:
SpC
= the calibrated specific conductivity value at
each timestep.outside_range
= a flag indicating whether the STIC data
was within the range of the standards used for calibration (column is
empty) or outside the range of the standards (column has an
O
).Many people use STIC sensors to monitor when a stream is wet or dry,
based on the principle that the conductivity of water is much higher
than that of air. Therefore, high values of condUncal
and/or SpC
can be classified as wet conditions, and low
values can be classified as dry conditions. In STICr, this can be done
using the classify_wetdry()
function, but this requires
determining a suitable threshold to use for differentiating wet and dry
conditions from the STIC data. It is typically useful to plot the data
to determine this threshold.
# plot SpC as a timeseries and histogram
plot(df_calibrated$datetime, df_calibrated$SpC,
xlab = "Datetime", ylab = "SpC",
main = "Specific Conductivity Timeseries"
)
hist(df_calibrated$SpC,
xlab = "Specific Conductivity", breaks = seq(0, 1025, 25),
main = "Specific Conductivity Distribution"
)
It can be unclear when exactly the sensor is wet or dry, particularly if the sensor has been buried by deposited sediments (which have an intermediate conductivity between water and air). In this case, there is a clear abundance of points with SpC < 100, which we will use for our classification threshold.
# classify data
df_classified <- classify_wetdry(
stic_data = df_calibrated,
classify_var = "SpC",
threshold = 100,
method = "absolute"
)
head(df_classified)
#> datetime condUncal tempC SpC outside_std_range wetdry
#> 1 2021-07-16 22:00:00 88178.4 27.764 857.3845 wet
#> 2 2021-07-16 22:15:00 77156.1 28.655 752.0820 wet
#> 3 2021-07-16 22:30:00 74400.5 28.060 725.7561 wet
#> 4 2021-07-16 22:45:00 74400.5 27.764 725.7561 wet
#> 5 2021-07-16 23:00:00 74400.5 27.862 725.7561 wet
#> 6 2021-07-16 23:15:00 71644.9 27.370 699.4302 wet
We now have a new column, wetdry
, which reads “wet” when
SpC
exceeds the threshold and “dry” when SpC
is less than the threshold. We can plot and visualize the classified
data.
STICr has built-in a built-in QAQC function,
qaqc_stic_data()
, to deal with some common data issues we
have experienced. This function requires classified STIC data, such as
that output by classify_stic_data()
, and produces a QAQC
column or columns which could include the following data flags:
C
= calibrated SpC value was negative and corrected to
0, flagged if spc_neg_correction = T
.D
= point was identified as a deviation or anomaly
based on a moving window, flagged if inspect_deviation = T
.
If this flag is calculated, an anomaly_size
and
window_size
need to be set.apply_calibration()
.If concatenate_flags = T
, these flags will be combined
into a single column named QAQC
; if not, they will be
separate columns. A blank value in this column means no QAQC flags were
identified at that timestep.
# apply qaqc function
df_qaqc <-
qaqc_stic_data(
stic_data = df_classified,
spc_neg_correction = T,
inspect_deviation = T,
deviation_size = 4,
window_size = 96
)
head(df_qaqc)
#> datetime condUncal tempC SpC wetdry QAQC
#> 1 2021-07-16 22:00:00 88178.4 27.764 857.3845 wet
#> 2 2021-07-16 22:15:00 77156.1 28.655 752.0820 wet
#> 3 2021-07-16 22:30:00 74400.5 28.060 725.7561 wet
#> 4 2021-07-16 22:45:00 74400.5 27.764 725.7561 wet
#> 5 2021-07-16 23:00:00 74400.5 27.862 725.7561 wet
#> 6 2021-07-16 23:15:00 71644.9 27.370 699.4302 wet
table(df_qaqc$QAQC)
#>
#> DO O
#> 916 1 83
We now have a single column, QAQC
with all QAQC flags.
We can see that there was 1 “D” flag for a potential deviation anomaly
(a dry reading surrounding by wet readings) and 83 “O” points indicating
the calibrated SpC was outside the range of the standards used for
calibration. Using table
, we can inspect the total number
of data flags in our classified dataset.
Comparing STIC data to field observations is useful to help determine the appropriate threshold that should be used for wet/dry classification and to assess confidence in the STIC data. The more field observations the better, and it is best if they span a range of wet and dry conditions at each site.
# load and inspect sample field observation data
head(field_obs)
#> datetime wetdry SpC
#> 1 2021-07-16 18:03:00 wet 612.1672
#> 2 2021-07-19 15:01:00 wet 589.4157
#> 3 2021-07-21 02:44:00 dry 599.6622
#> 4 2021-07-23 13:55:00 wet 916.8215
#> 5 2021-07-25 16:27:00 wet 631.9857
# use validate_stic_data to compile closest-in-time STIC reading for each field observation
stic_validation <-
validate_stic_data(
stic_data = classified_df,
field_observations = field_obs,
max_time_diff = 30,
join_cols = NULL,
get_SpC = TRUE,
get_QAQC = FALSE
)
# we can now compare the field observations and classified STIC data in the table
head(stic_validation)
#> datetime wetdry_obs SpC_obs condUncal_STIC wetdry_STIC SpC_STIC
#> 1 2021-07-16 18:03:00 wet 612.1672 74400.5 wet 725.7561
#> 2 2021-07-19 15:01:00 wet 589.4157 88178.4 wet 857.3845
#> 3 2021-07-21 02:44:00 dry 599.6622 85422.8 wet 831.0587
#> 4 2021-07-23 13:55:00 wet 916.8215 88178.4 wet 857.3845
#> 5 2021-07-25 16:27:00 wet 631.9857 77156.1 wet 752.0820
#> timediff_min
#> 1 3
#> 2 1
#> 3 -1
#> 4 -5
#> 5 -3
# calculate percent classification accuracy
sum(stic_validation$wetdry_obs == stic_validation$wetdry_STIC) / length(stic_validation$wetdry_STIC)
#> [1] 0.8
# compare SpC
plot(stic_validation$SpC_obs, stic_validation$SpC_STIC,
xlab = "Observed SpC", ylab = "STIC SpC"
)
In this example, we are seeing pretty good classification accuracy, but fairly poor SpC agreement.