Preparing toxEval Data

07 February, 2024

Introduction

What kind of data can be used in a toxEval analysis? The package was designed with concentration measurements from water samples as the primary use case. Other concentration measurements may also be usable, but it is up to the researcher to determine whether special considerations apply in those circumstances. For instance, one toxEval analysis was done on the concentration of chemicals measured in eagle plasma.

For all cases within toxEval, a “sample” is considered a unique site/date combination. There are times when this definition does not fit the data collection well (passive samplers, groundwater samples at separate depths, etc.). The user will need to come up with strategies to work within this limitation. For example, samples from a single site at different depths could be given site suffixes (site_a_3m, site_a_6m, etc.). Passive samplers could use the start or end time of the deployment as the sample date.
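The depth-suffix strategy can be sketched as a small helper. This is an illustration in Python, not part of toxEval; the function name `depth_site_id` and the `_<depth>m` pattern are assumptions modeled on the site_a_3m example above.

```python
def depth_site_id(site_id: str, depth_m: float) -> str:
    """Append a depth suffix so each site/depth pair gets its own SiteID.

    Hypothetical helper: toxEval only sees the resulting SiteID string.
    """
    # The "g" format drops a trailing ".0", so 3 and 3.0 both become "3".
    depth_label = f"{depth_m:g}"
    return f"{site_id}_{depth_label}m"


print(depth_site_id("site_a", 3))  # site_a_3m
print(depth_site_id("site_a", 6))  # site_a_6m
```

Whatever pattern is chosen, the same suffixed identifiers would need to be used consistently wherever a SiteID column appears.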

Preparing the data

Input data for toxEval should be prepared in a Microsoft™ Excel file using specifically named sheets (also known as tabs). There are three mandatory sheets (Data, Chemicals, Sites) and two optional sheets (Exclude, Benchmarks). The sheets should appear as follows (although the order is not important):

Each sheet has mandatory columns. The order of the columns is not important, but their names are. Additional columns can be included but will be ignored. The top row of each sheet must contain the column names (headers), and the data must begin in the second row. That means no title or comment rows may precede the header.
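These layout rules can be checked before loading a file. The sketch below is in Python and is not part of toxEval; it assumes the header row of each sheet has already been read into a list, and that names must match exactly, including case and spaces. The sheet and column names are the ones listed in this document. The function names are hypothetical.

```python
# Required columns per sheet, as listed in this document.
MANDATORY_SHEETS = {
    "Data": ["CAS", "SiteID", "Value", "Sample Date"],
    "Chemicals": ["CAS", "Class"],
    "Sites": ["SiteID", "Short Name", "dec_lon", "dec_lat"],
}
OPTIONAL_SHEETS = {
    "Exclude": ["CAS", "endPoint"],
    "Benchmarks": ["CAS", "Chemical", "endPoint", "Value", "groupCol"],
}


def missing_columns(sheet, header, required):
    """List required columns absent from a header row (order is ignored)."""
    present = set(header)
    return [f"{sheet}: missing column {name!r}"
            for name in required if name not in present]


def check_workbook(headers_by_sheet):
    """Return a list of layout problems; an empty list means none were found.

    headers_by_sheet maps each sheet name to the names in its first row.
    Extra sheets and extra columns are ignored.
    """
    problems = []
    for sheet, required in MANDATORY_SHEETS.items():
        if sheet not in headers_by_sheet:
            problems.append(f"missing mandatory sheet: {sheet}")
        else:
            problems += missing_columns(sheet, headers_by_sheet[sheet], required)
    for sheet, required in OPTIONAL_SHEETS.items():
        if sheet in headers_by_sheet:  # optional sheets are checked only if present
            problems += missing_columns(sheet, headers_by_sheet[sheet], required)
    return problems


# Illustrative headers: column order differs, one extra column, one missing.
workbook = {
    "Data": ["SiteID", "Sample Date", "CAS", "Value", "comment"],
    "Chemicals": ["CAS", "Class"],
    "Sites": ["SiteID", "Short Name", "dec_lat"],
}
for problem in check_workbook(workbook):
    print(problem)
```

Here the extra "comment" column passes silently, while the missing "dec_lon" column on the Sites sheet is reported.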

Data

The “Data” sheet is used to define the measured concentrations to be evaluated in toxEval. Four columns are required in this sheet: “CAS”, “SiteID”, “Value”, and “Sample Date”. The columns can be in any order, but the first row of the sheet must be the header (column names).

Note: Additional columns may be useful to organize the data. These additional columns will be ignored by toxEval and will not influence a toxEval analysis. For example, many data sets include detection level or censoring information. In this version of toxEval, that information is not used. Censored data can be entered in the Value column as the detection level, half the detection level, 0, or according to some other strategy; the choice is currently up to the researcher. This is a topic that could be re-evaluated in future versions of toxEval.
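The substitution strategies mentioned in the note can be made explicit before the Value column is filled in. The following Python sketch is not part of toxEval; the function name `censored_value` and the strategy labels are assumptions chosen for illustration.

```python
def censored_value(detection_level: float, strategy: str = "half") -> float:
    """Pick the number to enter in the Value column for a censored result.

    Hypothetical helper; toxEval itself only sees the final Value.
    """
    if strategy == "full":   # use the detection level itself
        return detection_level
    if strategy == "half":   # use half the detection level
        return detection_level / 2
    if strategy == "zero":   # treat non-detects as 0
        return 0.0
    raise ValueError(f"unknown strategy: {strategy!r}")


# A non-detect reported with a detection level of 0.5 (illustrative number).
for strategy in ("full", "half", "zero"):
    print(strategy, censored_value(0.5, strategy))
```

Keeping the choice in one place makes it easier to report which strategy was used and to rerun the analysis with a different one.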

The first several rows of a minimal example would look like this:

Chemicals

The “Chemicals” sheet is used to define the unique chemicals included in the “Data” sheet (so, 1 row per unique chemical). Two columns are required in this sheet: “CAS” and “Class”. The columns can be in any order, but the first row of the sheet must be the header (column names). If you need chemical names that do not match the “tox_chemical” list provided in the package, include a third column, “Chemical”, containing the chemical name to use for plots and tables.
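The "one row per unique chemical" rule lends itself to a quick consistency check between the two sheets. The Python sketch below is not part of toxEval; it assumes the CAS columns of the “Data” and “Chemicals” sheets have been read into plain lists, and the function name `check_chemicals` is hypothetical. The CAS numbers are only illustrative.

```python
from collections import Counter


def check_chemicals(data_cas, chemicals_cas):
    """Compare the CAS column of the Data sheet with the Chemicals sheet.

    Returns (duplicated, undefined):
      duplicated - CAS values listed more than once on the Chemicals sheet
      undefined  - CAS values used on the Data sheet but not defined
    """
    counts = Counter(chemicals_cas)
    duplicated = sorted(cas for cas, n in counts.items() if n > 1)
    undefined = sorted(set(data_cas) - set(chemicals_cas))
    return duplicated, undefined


# Illustrative CAS columns.
data_cas = ["1912-24-9", "58-08-2", "1912-24-9", "80-05-7"]
chemicals_cas = ["1912-24-9", "58-08-2", "58-08-2"]

duplicated, undefined = check_chemicals(data_cas, chemicals_cas)
print("duplicated on Chemicals sheet:", duplicated)
print("missing from Chemicals sheet:", undefined)
```

Repeated CAS values on the “Data” sheet are expected, since the same chemical is measured in many samples; only repeats on the “Chemicals” sheet are flagged.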

Note: Additional columns may be useful to organize the data. These additional columns will be ignored by toxEval and will not influence a toxEval analysis.

Sites

The “Sites” sheet is used to define site information for locations where samples were collected. Four columns are required in this sheet: “SiteID”, “Short Name”, “dec_lon”, and “dec_lat”. The columns can be in any order, but the first row of the sheet must be the header (column names).

Note: Additional columns may be useful to organize the data. These additional columns will be ignored by toxEval and will not influence a toxEval analysis.

Exclude

At times, it may be appropriate to exclude endpoints, chemicals, or specific endpoint:chemical combinations from a data analysis due to lack of relevance to the study objective or low confidence in specific portions of the data. The “Exclude” sheet is used for this purpose.

The “Exclude” sheet is optional, but if used, two columns are required: “CAS” and “endPoint”. They can be in any order, but the first row of the sheet should be the header (column names).
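The three kinds of exclusion (a chemical, an endpoint, or a specific combination) can be pictured as a simple matching rule. The Python sketch below is an illustration, not toxEval's implementation. It assumes that a blank “CAS” cell means "any chemical" and a blank “endPoint” cell means "any endpoint"; that convention is an assumption here and should be confirmed against the package documentation. The endpoint names are placeholders.

```python
def is_excluded(cas, endpoint, exclude_rows):
    """Return True if a chemical/endpoint pair matches any Exclude row.

    exclude_rows is a list of (CAS, endPoint) tuples; None stands for a
    blank cell, treated here as a wildcard (an assumption, see above).
    """
    for rule_cas, rule_endpoint in exclude_rows:
        if rule_cas is None and rule_endpoint is None:
            continue  # a fully blank row excludes nothing
        cas_matches = rule_cas is None or rule_cas == cas
        endpoint_matches = rule_endpoint is None or rule_endpoint == endpoint
        if cas_matches and endpoint_matches:
            return True
    return False


# Illustrative rows with placeholder endpoint names.
exclude_rows = [
    ("80-05-7", None),          # exclude one chemical for every endpoint
    (None, "endpoint_x"),       # exclude one endpoint for every chemical
    ("58-08-2", "endpoint_y"),  # exclude one specific combination
]

print(is_excluded("80-05-7", "endpoint_z", exclude_rows))
print(is_excluded("58-08-2", "endpoint_z", exclude_rows))
```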

Why would you choose to exclude a chemical/endpoint combination? Sometimes a dose-response curve from ToxCast does not trigger any automated flags, yet looks suspect on inspection. The easiest way to view the dose-response curves is through the CompTox dashboard. The function endpoint_hits_DT includes an option to provide a direct link to the dose-response curves when the category is “Chemical”. This is handy for quick checks on the endpoint/chemical combinations that produce the highest EARs. If those combinations have dose-response curves that seem suspect, consider adding them to the “Exclude” sheet, or at least try to get more information on that endpoint/assay.

Note: Additional columns may be useful to organize the data. These additional columns will be ignored by toxEval and will not influence a toxEval analysis.

Benchmarks

The user may provide a set of concentration benchmarks to be used in place of the ToxCast database. For example, there may be a need to perform a similar toxEval analysis using EPA aquatic life benchmarks, comparing measured concentrations against established toxicity thresholds. The “Benchmarks” sheet is used for this purpose. For more information, see here.

The “Benchmarks” sheet is optional, but if used, five columns are required: “CAS”, “Chemical”, “endPoint”, “Value”, and “groupCol”. They can be in any order, but the first row of the sheet should be the header (column names).
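One way to picture the comparison of measured concentrations against benchmarks is as a ratio of the two values, matched on “CAS”. The Python sketch below illustrates that idea under stated assumptions and is not toxEval's internal calculation: it assumes the measurement and the benchmark are in the same units, and every number, endpoint name, and group name in it is made up for illustration.

```python
def benchmark_ratios(measurements, benchmarks):
    """Pair each measurement with every benchmark for the same CAS.

    Returns one record per measurement/benchmark pair, with
    ratio = measured Value / benchmark Value (same units assumed).
    Measurements with no matching benchmark produce no records.
    """
    by_cas = {}
    for bench in benchmarks:
        by_cas.setdefault(bench["CAS"], []).append(bench)

    results = []
    for meas in measurements:
        for bench in by_cas.get(meas["CAS"], []):
            results.append({
                "SiteID": meas["SiteID"],
                "CAS": meas["CAS"],
                "endPoint": bench["endPoint"],
                "groupCol": bench["groupCol"],
                "ratio": meas["Value"] / bench["Value"],
            })
    return results


# Made-up rows using the column names required on each sheet.
measurements = [
    {"CAS": "1912-24-9", "SiteID": "site_a", "Value": 0.5},
    {"CAS": "58-08-2", "SiteID": "site_a", "Value": 1.0},
]
benchmarks = [
    {"CAS": "1912-24-9", "Chemical": "Atrazine", "endPoint": "benchmark_a",
     "Value": 2.0, "groupCol": "group_a"},
]

for row in benchmark_ratios(measurements, benchmarks):
    print(row)
```

In this example the second measurement has no benchmark and therefore drops out, which is worth keeping in mind when a benchmark set covers only part of the chemical list.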

Note: Additional columns may be useful to organize the data. These additional columns will be ignored by toxEval and will not influence a toxEval analysis.

Disclaimer

This software has been approved for release by the U.S. Geological Survey (USGS). Although the software has been subjected to rigorous review, the USGS reserves the right to update the software as needed pursuant to further analysis and review. No warranty, expressed or implied, is made by the USGS or the U.S. Government as to the functionality of the software and related material nor shall the fact of release constitute any such warranty. Furthermore, the software is released on condition that neither the USGS nor the U.S. Government shall be held liable for any damages resulting from its authorized or unauthorized use.

Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.