---
title: package overview
subtitle: "suwo: access nature media repositories"
pagetitle: suwo package overview
author:
- <a href="https://marce10.github.io/">Marcelo Araya-Salas</a>, Jorge Elizondo & <a href="https://ecophysics.org/">Alejandro Rico-Guevara</a>
date:  "2026-04-13"
output:
  rmarkdown::html_document:
    self_contained: yes
    toc: true
    toc_depth: 3
    toc_float:
      collapsed: false
      smooth_scroll: true
vignette: >
   %\VignetteIndexEntry{1. Package overview}
   %\usepackage[utf8]{inputenc}
   %\VignetteEncoding{UTF-8}
   %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
params:
  EVAL: !r identical(Sys.getenv("NOT_CRAN"), "true")
---

&nbsp;






::: {.alert .alert-info}

The [suwo](https://docs.ropensci.org/suwo/) package aims to simplify the
retrieval of nature media (mostly photos, audio files and videos) across
multiple online biodiversity databases. This vignette provides an overview of
the package’s core querying functions, the searching and downloading of media
files, and the compilation of metadata from various sources. For detailed
information on each function, please refer to the
[function reference](https://docs.ropensci.org/suwo/reference/index.html)
or use the help files within R (e.g., `?query_gbif`).

:::

::: {.alert .alert-warning}

**Intended use and responsible practices**

This package is designed exclusively for non-commercial, scientific purposes, including research, education, and conservation. **Commercial use of data or media retrieved through this package is the user’s responsibility and is allowed only when the applicable license of the source database explicitly permits such use, or when explicit, separate permission has been obtained directly from the original source platforms or rights holders**. Users must comply with the specific terms of service and data-use policies of each source database, which may require attribution and may further restrict commercial application. The package developers assume no liability for misuse of the retrieved data or for violations of third-party terms of service.

:::

# Installation

Installing from CRAN:


``` r
# Install from CRAN
install.packages("suwo")

# Load package
library(suwo)
```

Install the latest development version from the rOpenSci R-universe:


``` r
install.packages("suwo", repos = c(
  'https://ropensci.r-universe.dev',
  'https://cloud.r-project.org'
))

#load package
library(suwo)
```

# Basic workflow for obtaining nature media files

Obtaining nature media using [suwo](https://docs.ropensci.org/suwo/) follows a basic
sequence. The following diagram illustrates this workflow and the main functions
involved:

<center><img src="workflow_diagram.png" alt="Flowchart of the suwo workflow for obtaining nature media files. Step 1, 'Get metadata', includes multiple boxes representing queries to different repositories, such as query_wikiaves() and query_xenocanto(), plus additional possible query_() calls. Arrows from all these queries converge into Step 2, 'Combine metadata', using merge_metadata(). The process then moves to Step 3, 'Remove duplicates', using find_duplicates() and remove_duplicates(). Next is Step 4, 'Download media files', using download_media(). Finally, Step 5, 'Update metadata', using update_metadata(), loops back toward the earlier steps, indicating that metadata can be updated after downloading and re-enter the workflow." width="100%"></center>

</br>
Here is a description of each step:

Obtain metadata:

1. Queries for a species are submitted through one of the available
query functions (`query_repo_name()`), which connect to five online
repositories (Xeno-Canto, iNaturalist, GBIF, Macaulay Library and WikiAves). The output of each query is a data frame containing metadata
associated with the media files (e.g., species name, date and location; see below).

Curate metadata:

1. If multiple repositories are queried, the resulting metadata data frames can
be merged into a single data frame using the
[merge_metadata()](https://docs.ropensci.org/suwo/reference/merge_metadata.html) function.

1. Users can check for duplicate records in their datasets using the [find_duplicates()](https://docs.ropensci.org/suwo/reference/find_duplicates.html) function. Candidate duplicates are identified based on matching species name, country, date, user name, and
geographic coordinates. Users can then review the candidate duplicates and
decide which records to keep, which can be done with [remove_duplicates()](https://docs.ropensci.org/suwo/reference/remove_duplicates.html).

1. Download the media files associated with the metadata using the [download_media()](https://docs.ropensci.org/suwo/reference/download_media.html) function.

1. Users can update their datasets with new records using the [update_metadata()](https://docs.ropensci.org/suwo/reference/update_metadata.html) function.
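
As a rough illustration of the combine and duplicate-detection steps, the following base R sketch stacks two toy metadata tables and flags rows that agree on the matching fields listed above. It does not call suwo: the table values, the object names (`repo_a`, `repo_b`, `combined`) and the exact-match rule are assumptions made for this example, and `find_duplicates()` may apply different matching logic.

```r
# Toy metadata tables that mimic some basic suwo fields (all values invented)
repo_a <- data.frame(
  repository = "Repo A",
  key = c("a1", "a2"),
  species = "Harpia harpyja",
  date = c("2020-10-20", "2016-06-20"),
  user_name = c("User One", "User Two"),
  country = "Brazil",
  latitude = c(-9.87, -15.41),
  longitude = c(-56.08, -39.49),
  stringsAsFactors = FALSE
)

repo_b <- data.frame(
  repository = "Repo B",
  key = c("b1", "b2"),
  species = "Harpia harpyja",
  date = c("2020-10-20", "2013-03-20"),
  user_name = c("User One", "User Three"),
  country = "Brazil",
  latitude = c(-9.87, -9.90),
  longitude = c(-56.08, -56.10),
  stringsAsFactors = FALSE
)

# Combine: stacking works because both tables share the same column names
combined <- rbind(repo_a, repo_b)

# Flag every row that matches another row on all comparison fields
match_cols <- c("species", "country", "date", "user_name",
                "latitude", "longitude")
combined$candidate_duplicate <-
  duplicated(combined[, match_cols]) |
  duplicated(combined[, match_cols], fromLast = TRUE)

combined[, c("repository", "key", "candidate_duplicate")]
```

In this toy case the first record of each table is flagged, since both share species, country, date, user name and coordinates. Flagged rows are only candidates, so they should still be reviewed before any record is dropped.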


# Obtaining metadata: the query functions

The following table summarizes the available [suwo](https://docs.ropensci.org/suwo/)
query functions and the types of metadata they retrieve:


<table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; ">
<caption>Table 1: Summary of query functions and the associated repositories.</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Function </th>
   <th style="text-align:left;"> Repository </th>
   <th style="text-align:left;"> URL link </th>
   <th style="text-align:left;"> File types </th>
   <th style="text-align:left;"> Requires API key </th>
   <th style="text-align:left;"> Taxonomic level </th>
   <th style="text-align:left;"> Geographic coverage </th>
   <th style="text-align:left;"> Taxonomic coverage </th>
   <th style="text-align:left;"> Other features </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> <a href="https://docs.ropensci.org/suwo/reference/query_gbif.html" style="     " target="_blank">query_gbif</a> </td>
   <td style="text-align:left;"> GBIF </td>
   <td style="text-align:left;"> <a href="https://www.gbif.org/" style="     " target="_blank">https://www.gbif.org/</a> </td>
   <td style="text-align:left;"> image, sound, video, interactive resource </td>
   <td style="text-align:left;"> No </td>
   <td style="text-align:left;"> Species </td>
   <td style="text-align:left;"> Global </td>
   <td style="text-align:left;"> All life </td>
   <td style="text-align:left;"> Specify query by database </td>
  </tr>
  <tr>
   <td style="text-align:left;"> <a href="https://docs.ropensci.org/suwo/reference/query_inaturalist.html" style="     " target="_blank">query_inaturalist</a> </td>
   <td style="text-align:left;"> iNaturalist </td>
   <td style="text-align:left;"> <a href="https://www.inaturalist.org/" style="     " target="_blank">https://www.inaturalist.org/</a> </td>
   <td style="text-align:left;"> image, sound </td>
   <td style="text-align:left;"> No </td>
   <td style="text-align:left;"> Species </td>
   <td style="text-align:left;"> Global </td>
   <td style="text-align:left;"> All life </td>
   <td style="text-align:left;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> <a href="https://docs.ropensci.org/suwo/reference/query_macaulay.html" style="     " target="_blank">query_macaulay</a> </td>
   <td style="text-align:left;"> Macaulay Library </td>
   <td style="text-align:left;"> <a href="https://www.macaulaylibrary.org/" style="     " target="_blank">https://www.macaulaylibrary.org/</a> </td>
   <td style="text-align:left;"> image, sound, video </td>
   <td style="text-align:left;"> No </td>
   <td style="text-align:left;"> Species </td>
   <td style="text-align:left;"> Global </td>
   <td style="text-align:left;"> Mostly birds but also other vertebrates and
  invertebrates </td>
   <td style="text-align:left;"> Interactive </td>
  </tr>
  <tr>
   <td style="text-align:left;"> <a href="https://docs.ropensci.org/suwo/reference/query_wikiaves.html" style="     " target="_blank">query_wikiaves</a> </td>
   <td style="text-align:left;"> WikiAves </td>
   <td style="text-align:left;"> <a href="https://www.wikiaves.com.br/" style="     " target="_blank">https://www.wikiaves.com.br/</a> </td>
   <td style="text-align:left;"> image, sound </td>
   <td style="text-align:left;"> No </td>
   <td style="text-align:left;"> Species </td>
   <td style="text-align:left;"> Brazil </td>
   <td style="text-align:left;"> Birds </td>
   <td style="text-align:left;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> <a href="https://docs.ropensci.org/suwo/reference/query_xenocanto.html" style="     " target="_blank">query_xenocanto</a> </td>
   <td style="text-align:left;"> Xeno-Canto </td>
   <td style="text-align:left;"> <a href="https://www.xeno-canto.org/" style="     " target="_blank">https://www.xeno-canto.org/</a> </td>
   <td style="text-align:left;"> sound </td>
   <td style="text-align:left;"> Yes </td>
   <td style="text-align:left;"> Species, subspecies, genus, family, group </td>
   <td style="text-align:left;"> Global </td>
   <td style="text-align:left;"> Birds, frogs, non-marine mammals and grasshoppers </td>
   <td style="text-align:left;"> Specify query by taxonomy, geographic range and dates </td>
  </tr>
</tbody>
</table>

These are some example queries:

1. Images of Sarapiqui Heliconia (_Heliconia sarapiquensis_) from iNaturalist (we print the first 4 rows of each output data frame):


``` r
# Load suwo package
library(suwo)

h_sarapiquensis <- query_inaturalist(species = "Heliconia sarapiquensis",
                                     format = "image")
```

```
✔ Obtaining metadata (29 matching records found) 😀
```

``` r
head(h_sarapiquensis, 4)
```

<div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:300px; overflow-x: scroll; width:100%; "><table class="table table-striped" style="font-size: 14px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> repository </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> format </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> key </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> species </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> date </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> time </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> user_name </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> country </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> locality </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> latitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> longitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> file_url </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> file_extension </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> observation_url </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> iNaturalist </td>
   <td style="text-align:center;"> image </td>
   <td style="text-align:center;"> 330280680 </td>
   <td style="text-align:center;"> Heliconia sarapiquensis </td>
   <td style="text-align:center;"> 2025-12-08 </td>
   <td style="text-align:center;"> 13:47 </td>
   <td style="text-align:center;"> Carlos g Velazco-Macias </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> 10.159645,-83.9378766667 </td>
   <td style="text-align:center;"> 10.15964 </td>
   <td style="text-align:center;"> -83.93788 </td>
   <td style="text-align:center;"> https://inaturalist-open-data.s3.amazonaws.com/photos/598874322/original.jpg </td>
   <td style="text-align:center;"> jpeg </td>
   <td style="text-align:center;"> https://www.inaturalist.org/observations/330280680 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> iNaturalist </td>
   <td style="text-align:center;"> image </td>
   <td style="text-align:center;"> 330280680 </td>
   <td style="text-align:center;"> Heliconia sarapiquensis </td>
   <td style="text-align:center;"> 2025-12-08 </td>
   <td style="text-align:center;"> 13:47 </td>
   <td style="text-align:center;"> Carlos g Velazco-Macias </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> 10.159645,-83.9378766667 </td>
   <td style="text-align:center;"> 10.15964 </td>
   <td style="text-align:center;"> -83.93788 </td>
   <td style="text-align:center;"> https://inaturalist-open-data.s3.amazonaws.com/photos/598874346/original.jpg </td>
   <td style="text-align:center;"> jpeg </td>
   <td style="text-align:center;"> https://www.inaturalist.org/observations/330280680 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> iNaturalist </td>
   <td style="text-align:center;"> image </td>
   <td style="text-align:center;"> 330280680 </td>
   <td style="text-align:center;"> Heliconia sarapiquensis </td>
   <td style="text-align:center;"> 2025-12-08 </td>
   <td style="text-align:center;"> 13:47 </td>
   <td style="text-align:center;"> Carlos g Velazco-Macias </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> 10.159645,-83.9378766667 </td>
   <td style="text-align:center;"> 10.15964 </td>
   <td style="text-align:center;"> -83.93788 </td>
   <td style="text-align:center;"> https://inaturalist-open-data.s3.amazonaws.com/photos/598874381/original.jpg </td>
   <td style="text-align:center;"> jpeg </td>
   <td style="text-align:center;"> https://www.inaturalist.org/observations/330280680 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> iNaturalist </td>
   <td style="text-align:center;"> image </td>
   <td style="text-align:center;"> 263417773 </td>
   <td style="text-align:center;"> Heliconia sarapiquensis </td>
   <td style="text-align:center;"> 2025-02-28 </td>
   <td style="text-align:center;"> 14:23 </td>
   <td style="text-align:center;"> Original Madness </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> 10.163116739,-83.9389050007 </td>
   <td style="text-align:center;"> 10.16312 </td>
   <td style="text-align:center;"> -83.93891 </td>
   <td style="text-align:center;"> https://inaturalist-open-data.s3.amazonaws.com/photos/473219810/original.jpeg </td>
   <td style="text-align:center;"> jpeg </td>
   <td style="text-align:center;"> https://www.inaturalist.org/observations/263417773 </td>
  </tr>
</tbody>
</table></div>

</br>

2. Harpy eagle (_Harpia harpyja_) audio recordings from WikiAves:


``` r
h_harpyja <- query_wikiaves(species = "Harpia harpyja", format = "sound")
```

```
✔ Obtaining metadata (78 matching records found) 🥇
```

``` r
head(h_harpyja, 4)
```

<div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:300px; overflow-x: scroll; width:100%; "><table class="table table-striped" style="font-size: 14px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> repository </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> format </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> key </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> species </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> date </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> time </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> user_name </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> country </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> locality </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> latitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> longitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> file_url </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> file_extension </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> observation_url </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> WikiAves </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 25867 </td>
   <td style="text-align:center;"> Harpia harpyja </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> Gustavo Pedersoli </td>
   <td style="text-align:center;"> Brazil </td>
   <td style="text-align:center;"> Alta Floresta/MT </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> https://s3.amazonaws.com/media.wikiaves.com.br/recordings/52/25867_a73f0e8da2179e82af223ff27f74a912.mp3 </td>
   <td style="text-align:center;"> mp3 </td>
   <td style="text-align:center;"> https://www.wikiaves.com.br/25867 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> WikiAves </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 2701424 </td>
   <td style="text-align:center;"> Harpia harpyja </td>
   <td style="text-align:center;"> 2020-10-20 </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> Bruno Lima </td>
   <td style="text-align:center;"> Brazil </td>
   <td style="text-align:center;"> Itanhaém/SP </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> https://s3.amazonaws.com/media.wikiaves.com.br/recordings/1072/2701424_e0d533b952b64d6297c4aff21362474b.mp3 </td>
   <td style="text-align:center;"> mp3 </td>
   <td style="text-align:center;"> https://www.wikiaves.com.br/2701424 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> WikiAves </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 878999 </td>
   <td style="text-align:center;"> Harpia harpyja </td>
   <td style="text-align:center;"> 2013-03-20 </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> Thiago Silveira </td>
   <td style="text-align:center;"> Brazil </td>
   <td style="text-align:center;"> Alta Floresta/MT </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> https://s3.amazonaws.com/media.wikiaves.com.br/recordings/878/878999_c1f8f4ba81fd597548752e92f1cdba50.mp3 </td>
   <td style="text-align:center;"> mp3 </td>
   <td style="text-align:center;"> https://www.wikiaves.com.br/878999 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> WikiAves </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 3027120 </td>
   <td style="text-align:center;"> Harpia harpyja </td>
   <td style="text-align:center;"> 2016-06-20 </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> Ciro Albano </td>
   <td style="text-align:center;"> Brazil </td>
   <td style="text-align:center;"> Camacan/BA </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> https://s3.amazonaws.com/media.wikiaves.com.br/recordings/7203/3027120_5148ce0fed5fe99aba7c65b2f045686a.mp3 </td>
   <td style="text-align:center;"> mp3 </td>
   <td style="text-align:center;"> https://www.wikiaves.com.br/3027120 </td>
  </tr>
</tbody>
</table></div>


</br>

3. Common raccoon (_Procyon lotor_) videos from GBIF:


``` r
p_lotor <- query_gbif(species = "Procyon lotor", format = "video")
```

```
✔ Obtaining metadata (13 matching records found) 😸
```

``` r
head(p_lotor, 4)
```

<div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:300px; overflow-x: scroll; width:100%; "><table class="table table-striped" style="font-size: 14px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> repository </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> format </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> key </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> species </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> date </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> time </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> user_name </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> country </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> locality </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> latitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> longitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> file_url </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> file_extension </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> observation_url </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> GBIF </td>
   <td style="text-align:center;"> video </td>
   <td style="text-align:center;"> 3501153129 </td>
   <td style="text-align:center;"> Procyon lotor </td>
   <td style="text-align:center;"> 2015-07-21 </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> Luxembourg </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> 49.7733 </td>
   <td style="text-align:center;"> 5.94092 </td>
   <td style="text-align:center;"> https://archimg.mnhn.lu/Observations/Taxons/Biomonitoring/063_094_S2_K2_20150721_063004AM.mp4 </td>
   <td style="text-align:center;"> m4a </td>
   <td style="text-align:center;"> https://www.gbif.org/occurrence/3501153129 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> GBIF </td>
   <td style="text-align:center;"> video </td>
   <td style="text-align:center;"> 3501153135 </td>
   <td style="text-align:center;"> Procyon lotor </td>
   <td style="text-align:center;"> 2015-07-04 </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> Luxembourg </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> 49.7733 </td>
   <td style="text-align:center;"> 5.94092 </td>
   <td style="text-align:center;"> https://archimg.mnhn.lu/Observations/Taxons/Biomonitoring/063_094_S2_K1_20150704_072418AM.mp4 </td>
   <td style="text-align:center;"> m4a </td>
   <td style="text-align:center;"> https://www.gbif.org/occurrence/3501153135 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> GBIF </td>
   <td style="text-align:center;"> video </td>
   <td style="text-align:center;"> 3501153159 </td>
   <td style="text-align:center;"> Procyon lotor </td>
   <td style="text-align:center;"> 2015-07-04 </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> Luxembourg </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> 49.7733 </td>
   <td style="text-align:center;"> 5.94092 </td>
   <td style="text-align:center;"> https://archimg.mnhn.lu/Observations/Taxons/Biomonitoring/063_094_S2_K1_20150704_072402AM.mp4 </td>
   <td style="text-align:center;"> m4a </td>
   <td style="text-align:center;"> https://www.gbif.org/occurrence/3501153159 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> GBIF </td>
   <td style="text-align:center;"> video </td>
   <td style="text-align:center;"> 3501153162 </td>
   <td style="text-align:center;"> Procyon lotor </td>
   <td style="text-align:center;"> 2015-07-04 </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> Luxembourg </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> 49.7733 </td>
   <td style="text-align:center;"> 5.94092 </td>
   <td style="text-align:center;"> https://archimg.mnhn.lu/Observations/Taxons/Biomonitoring/063_094_S2_K1_20150704_072346AM.mp4 </td>
   <td style="text-align:center;"> m4a </td>
   <td style="text-align:center;"> https://www.gbif.org/occurrence/3501153162 </td>
  </tr>
</tbody>
</table></div>

</br>

---

By default, all query functions return the 14 basic metadata fields associated with the media files. Here is the definition of each field:

 - **repository**: Name of the repository
 - **format**: Type of media file (e.g., sound, image, video)
 - **key**: Unique identifier of the media file in the repository
 - **species**: Species name associated with the media file (note that the taxonomic authority may vary among repositories)
 - **date***: Date when the media file was recorded/photographed (in YYYY-MM-DD format or YYYY if only year is available)
 - **time***: Time when the media file was recorded/photographed (in HH:MM format)
 - **user_name***: Name of the user who uploaded the media file
 - **country***: Country where the media file was recorded/photographed
 - **locality***: Locality where the media file was recorded/photographed
 - **latitude***: Latitude of the location where the media file was recorded/photographed (in decimal degrees)
 - **longitude***: Longitude of the location where the media file was recorded/photographed (in decimal degrees)
 - **file_url**: URL link to the media file (used to download media files)
 - **file_extension**: Extension of the media file (e.g., mp3, jpeg)
 - **observation_url**: URL link to the original observation page in the repository (used to check the original metadata and media file)

_* Can contain missing values (NAs)_
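
Because `date` can hold either a full date or only a year, and several fields can be missing, it can help to check these columns before analysis. The following base R sketch uses an invented toy table (the object names `dates`, `toy` and `na_counts` are assumptions for this example, not suwo objects) to extract the year from both date formats and to count missing values per field.

```r
# Toy dates in the two formats described above, plus a missing value
dates <- c("2020-10-20", "2013", NA)

# The year is the first four characters in either format
years <- as.integer(substr(dates, 1, 4))

# Only full dates parse as Date; year-only and missing entries become NA
full_dates <- as.Date(dates, format = "%Y-%m-%d")

# Count missing values per field in a toy metadata table
toy <- data.frame(
  date = dates,
  time = c("13:47", NA, NA),
  country = c("Brazil", "Brazil", NA),
  stringsAsFactors = FALSE
)
na_counts <- colSums(is.na(toy))
na_counts
```

The same `colSums(is.na(...))` call can be applied to the data frame returned by any query function to see which of the starred fields are incomplete.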

Users can also download all available metadata by setting the argument `all_data = TRUE`. These are the additional metadata fields, on top of the basic fields, that are retrieved by each query function:

<table class="table table-striped table-hover table-condensed table-responsive" style="width: auto !important; ">
<caption>Table 2: Additional metadata per query function.</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Function </th>
   <th style="text-align:left;"> Additional data </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> <a href="https://docs.ropensci.org/suwo/reference/query_gbif.html" style="     " target="_blank">query_gbif</a> </td>
   <td style="text-align:left;"> datasetkey, publishingorgkey, installationkey, hostingorganizationkey, publishingcountry, protocol, lastcrawled, lastparsed, crawlid, basisofrecord, occurrencestatus, taxonkey, kingdom_code, phylum_code, class_code, order_code, family_key, genus_code, species_code, acceptedtaxonkey, scientificnameauthorship, acceptedscientificname, kingdom, phylum, order, family, genus, genericname, specific_epithet, taxonrank, taxonomicstatus, iucnredlistcategory, continent, year, month, day, startdayofyear, enddayofyear, lastinterpreted, license, organismquantity, organismquantitytype, issequenced, isincluster, datasetname, recordist, identifiedby, samplingprotocol, geodeticdatum, class, countrycode, gbifregion, publishedbygbifregion, recordnumber, identifier, habitat, institutionid, verbatimeventdate, dynamicproperties, verbatimcoordinatesystem, eventremarks, gbifid, collectioncode, occurrenceid, institutioncode, identificationqualifier, media_type, page, state_province, comments </td>
  </tr>
  <tr>
   <td style="text-align:left;"> <a href="https://docs.ropensci.org/suwo/reference/query_inaturalist.html" style="     " target="_blank">query_inaturalist</a> </td>
   <td style="text-align:left;"> quality_grade, taxon_geoprivacy, uuid, cached_votes_total, identifications_most_agree, species_guess, identifications_most_disagree, positional_accuracy, comments_count, site_id, created_time_zone, license_code, observed_time_zone, public_positional_accuracy, oauth_application_id, created_at, description, time_zone_offset, observed_on, observed_on_string, updated_at, captive, faves_count, num_identification_agreements, identification_disagreements_count, map_scale, uri, community_taxon_id, owners_identification_from_vision, identifications_count, obscured, num_identification_disagreements, geoprivacy, spam, mappable, identifications_some_agree, place_guess, id, license_code_1, attribution, hidden </td>
  </tr>
  <tr>
   <td style="text-align:left;"> <a href="https://docs.ropensci.org/suwo/reference/query_macaulay.html" style="     " target="_blank">query_macaulay</a> </td>
   <td style="text-align:left;"> common_name, background_species, caption, year, month, day, country_state_county, state_province, county, age_sex, behavior, playback, captive, collected, specimen_id, home_archive_catalog_number, recorder, microphone, accessory, partner_institution, ebird_checklist_id, unconfirmed, air_temp__c_, water_temp__c_, media_notes, observation_details, parent_species, species_code, taxon_category, taxonomic_sort, recordist_2, average_community_rating, number_of_ratings, asset_tags, original_image_height, original_image_width </td>
  </tr>
  <tr>
   <td style="text-align:left;"> <a href="https://docs.ropensci.org/suwo/reference/query_wikiaves.html" style="     " target="_blank">query_wikiaves</a> </td>
   <td style="text-align:left;"> user_id, species_code, common_name, repository_id, verified, locality_id, number_of_comments, likes, visualizations, duration </td>
  </tr>
  <tr>
   <td style="text-align:left;"> <a href="https://docs.ropensci.org/suwo/reference/query_xenocanto.html" style="     " target="_blank">query_xenocanto</a> </td>
   <td style="text-align:left;"> genus, specific_epithet, subspecies, taxonomic_group, english_name, altitude, vocalization_type, sex, stage, method, url, uploaded_file, license, quality, length, upload_date, other_species, comments, animal_seen, playback_used, temp, regnr, auto, recorder, microphone, sampling_rate, sonogram_small, sonogram_med, sonogram_large, sonogram_full, oscillogram_small, oscillogram_med, oscillogram_large, sonogram </td>
  </tr>
</tbody>
</table>

<div class="alert alert-warning">

**Obtaining raw data**

By default, the package standardizes the information in the basic fields (detailed above) to facilitate the compilation of metadata from multiple repositories. However, in some cases this may result in loss of information. For instance, some repositories allow users to provide "morning" as a valid time value, which is converted into `NA` by [suwo](https://docs.ropensci.org/suwo/). In such cases, users can retrieve the original data by setting `raw_data = TRUE` in the query functions and/or in the global options (`options(raw_data = TRUE)`). Note that subsequent data manipulation functions (e.g., [merge_metadata()](https://docs.ropensci.org/suwo/reference/merge_metadata.html), [find_duplicates()](https://docs.ropensci.org/suwo/reference/find_duplicates.html), etc.) will not work on raw data because the basic fields are not standardized.

</div>
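
To make the information loss concrete, here is a minimal sketch of what standardizing a time field can look like. The helper `standardize_time()` is hypothetical and written only for this illustration; it is not a suwo function and does not reproduce the package's internal rules.

```r
# Hypothetical helper (not part of suwo): keep values already in HH:MM form
# and replace anything else with NA
standardize_time <- function(x) {
  is_hhmm <- grepl("^([01][0-9]|2[0-3]):[0-5][0-9]$", x)
  ifelse(is_hhmm, x, NA_character_)
}

# Toy raw values, including a free-text entry like the one mentioned above
raw_times <- c("13:47", "morning", "07:05", NA)

standardize_time(raw_times)
```

In this sketch the free-text value "morning" is lost after standardization, which is the kind of case where keeping the raw data may be preferable.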

The code above exemplifies the most common use of the query functions, which also applies to the function [query_gbif()](https://docs.ropensci.org/suwo/reference/query_gbif.html). The following sections provide more details on the two query functions that require special considerations: [query_macaulay()](https://docs.ropensci.org/suwo/reference/query_macaulay.html) and [query_xenocanto()](https://docs.ropensci.org/suwo/reference/query_xenocanto.html).

## query_macaulay()

### Interactive retrieval of metadata

[query_macaulay()](https://docs.ropensci.org/suwo/reference/query_macaulay.html) is the only interactive function. This means that when users run a query, the function opens a browser window at the [Macaulay Library's search page](https://search.macaulaylibrary.org/catalog), where users must download a .csv file with the metadata. Here is an example of a query for stripe-throated hermit (_Phaethornis striigularis_) videos:


``` r
p_striigularis <- query_macaulay(species = "Phaethornis striigularis",
                                 format = "video")
```

```
ℹ A browser will open the macaulay library website. Save the .csv file ('export' button) to this directory: 
/home/m/Dropbox/R_package_testing/suwo/vignettes/ 
```

```
ℹ (R is monitoring for new CSV files. Press ESC to stop the function)
```

```
ℹ File:  
ML__2026-04-13T20-24_stther2_video.csv 
```

```
✔ 27 matching records found 🎉
```

Users must click on the "Export" button to save the .csv file with the metadata:

<center><img src="ml_browser.jpeg" alt="Screen shot of the Macaulay library search site showing the first result of a query for Stripe-throated hermit videos" width="100%"></center>

<br>


Note that for bird species the species name must be valid according to the Macaulay Library taxonomy (which follows the Clements checklist). For non-bird species users must use the argument `taxon_code`. The species taxon code can be found by running a search at the [Macaulay Library's search page](https://search.macaulaylibrary.org/catalog) and checking the URL of the species page. For instance, the taxon code for jaguar (_Panthera onca_) is "t-11032765":

<center><img src="ml_taxon_code.png" alt="Screen shot of the Macaulay library search site showing the first result of a query for jaguar videos, highlighting the taxon code in the URL address" width="100%"></center>

Once you have the taxon code, you can run the query as follows:

``` r
jaguar <- query_macaulay(taxon_code = "t-11032765",
                         format = "video")
```

```
ℹ A browser will open the macaulay library website. Save the .csv file ('export' button) to this directory: 
/home/m/Dropbox/R_package_testing/suwo/vignettes/ 
```

```
ℹ (R is monitoring for new CSV files. Press ESC to stop the function)
```

```
ℹ File:  
ML__2026-04-13T20-24_t-11032765_video.csv 
```

```
✔ 12 matching records found 🌈
```

Here are some tips for using this function properly:

* Valid bird species names can be checked at `suwo:::ml_taxon_code$SCI_NAME`
* The exported .csv file must be saved in the directory specified by the argument `path` of the function (default is the current working directory)
* The function will not proceed until the file is saved (press ESC to stop the function)
* Do not overwrite files: if a file is saved over a pre-existing file (i.e. same file name), the function will not detect it
* Users must log in to their Macaulay Library/eBird account in order to access large batches of observations
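
The first tip can be turned into a quick check before running a query. The following is a minimal sketch, not part of [suwo](https://docs.ropensci.org/suwo/): `valid_names` is a small hypothetical stand-in so that the code runs without the package (in practice it would be replaced by `suwo:::ml_taxon_code$SCI_NAME`), and the helper name `is_valid_ml_name()` is made up for illustration:

``` r
# hypothetical stand-in for suwo:::ml_taxon_code$SCI_NAME
valid_names <- c("Phaethornis striigularis", "Calypte costae",
                 "Turdus rufiventris")

# illustrative helper (not part of suwo): TRUE when the name is in the list
is_valid_ml_name <- function(species, names = valid_names) {
  # trim stray whitespace before the exact, case-sensitive comparison
  trimws(species) %in% names
}

is_valid_ml_name("Phaethornis striigularis")
is_valid_ml_name("Phaethornis strigularis") # misspelled epithet
```

The comparison is exact, so a misspelled name returns `FALSE`.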

After saving the file, the function will read the file and return a data frame with the metadata. Here we print the first 4 rows of the output data frame:


``` r
head(p_striigularis, 4)
```

<div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:300px; overflow-x: scroll; width:100%; "><table class="table table-striped" style="font-size: 14px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> repository </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> format </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> key </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> species </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> date </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> time </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> user_name </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> country </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> locality </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> latitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> longitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> file_url </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> file_extension </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> observation_url </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> Macaulay Library </td>
   <td style="text-align:center;"> video </td>
   <td style="text-align:center;"> 654011852 </td>
   <td style="text-align:center;"> Phaethornis striigularis </td>
   <td style="text-align:center;"> 2026-03-15 </td>
   <td style="text-align:center;"> 06:56 </td>
   <td style="text-align:center;"> Bret Whitney </td>
   <td style="text-align:center;"> Mexico </td>
   <td style="text-align:center;"> Camino La Guadalupe--La Reforma </td>
   <td style="text-align:center;"> 17.8472300 </td>
   <td style="text-align:center;"> -96.03763 </td>
   <td style="text-align:center;"> https://cdn.download.ams.birds.cornell.edu/api/v1/asset/654011852/ </td>
   <td style="text-align:center;"> mp4 </td>
   <td style="text-align:center;"> https://macaulaylibrary.org/asset/654011852 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> Macaulay Library </td>
   <td style="text-align:center;"> video </td>
   <td style="text-align:center;"> 652949939 </td>
   <td style="text-align:center;"> Phaethornis striigularis </td>
   <td style="text-align:center;"> 2026-03-15 </td>
   <td style="text-align:center;"> 08:17 </td>
   <td style="text-align:center;"> Jonathan Ávalos </td>
   <td style="text-align:center;"> Costa Rica </td>
   <td style="text-align:center;"> PN Carara-Entrada principal [senderos Quebrada Bonita y Universal] </td>
   <td style="text-align:center;"> 9.7808812 </td>
   <td style="text-align:center;"> -84.60610 </td>
   <td style="text-align:center;"> https://cdn.download.ams.birds.cornell.edu/api/v1/asset/652949939/ </td>
   <td style="text-align:center;"> mp4 </td>
   <td style="text-align:center;"> https://macaulaylibrary.org/asset/652949939 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> Macaulay Library </td>
   <td style="text-align:center;"> video </td>
   <td style="text-align:center;"> 652590842 </td>
   <td style="text-align:center;"> Phaethornis striigularis </td>
   <td style="text-align:center;"> 2026-02-12 </td>
   <td style="text-align:center;"> 05:57 </td>
   <td style="text-align:center;"> Matthew Kumjian </td>
   <td style="text-align:center;"> Costa Rica </td>
   <td style="text-align:center;"> Iguana Lodge (Osa) </td>
   <td style="text-align:center;"> 8.5105006 </td>
   <td style="text-align:center;"> -83.29248 </td>
   <td style="text-align:center;"> https://cdn.download.ams.birds.cornell.edu/api/v1/asset/652590842/ </td>
   <td style="text-align:center;"> mp4 </td>
   <td style="text-align:center;"> https://macaulaylibrary.org/asset/652590842 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> Macaulay Library </td>
   <td style="text-align:center;"> video </td>
   <td style="text-align:center;"> 647905148 </td>
   <td style="text-align:center;"> Phaethornis striigularis </td>
   <td style="text-align:center;"> 2021-08-28 </td>
   <td style="text-align:center;"> 06:21 </td>
   <td style="text-align:center;"> Edison🦉 Ocaña </td>
   <td style="text-align:center;"> Ecuador </td>
   <td style="text-align:center;"> Finca Blanca Margarita - Chicao Chocolate </td>
   <td style="text-align:center;"> 0.1575163 </td>
   <td style="text-align:center;"> -79.22496 </td>
   <td style="text-align:center;"> https://cdn.download.ams.birds.cornell.edu/api/v1/asset/647905148/ </td>
   <td style="text-align:center;"> mp4 </td>
   <td style="text-align:center;"> https://macaulaylibrary.org/asset/647905148 </td>
  </tr>
</tbody>
</table></div>

### Bypassing record limit

Even when users are logged in, a query can return a maximum of 10000 records. This limit can be bypassed by using the argument `dates` to split the search into a sequence of shorter date ranges. The rationale is that by splitting the search into date ranges, users can download multiple .csv files, which are then combined by the function into a single metadata data frame. Note that users must download the .csv file for each date range. The following code looks for photos of Costa's hummingbird (_Calypte costae_). As Macaulay Library hosts more than 30000 Costa's hummingbird records, we need to split the query into multiple date ranges:


``` r
# query with more than 10000 results, split into date ranges
cal_cos <- query_macaulay(
  species = "Calypte costae",
  format = "image",
  dates = c(1976, 2020, 2022, 2024, 2025, 2026)
)
```

```
ℹ A browser will open the macaulay library website. Save the .csv file ('export' button) to this directory: 
/home/m/Dropbox/R_package_testing/suwo/vignettes/ 
```

```
ℹ (R is monitoring for new CSV files. Press ESC to stop the function)
```

```
• Query 1 of 5 (1976-2019):
```

```
ℹ File:  
ML__2026-04-13T20-25_coshum_photo.csv 
```

```
• Query 2 of 5 (2020-2021):
```

```
ℹ File:  
ML__2026-04-13T20-25_coshum_photo2.csv 
```

```
• Query 3 of 5 (2022-2023):
```

```
ℹ File:  
ML__2026-04-13T20-28_coshum_photo.csv 
```

```
• Query 4 of 5 (2024):
```

```
ℹ File:  
ML__2026-04-13T20-28_coshum_photo2.csv 
```

```
• Query 5 of 5 (2025-2026):
```

```
ℹ File:  
ML__2026-04-13T20-28_coshum_photo3.csv 
```

```
✔ 39566 matching records found 🌈
```

Users can check on the Macaulay Library website how many records are available for their species of interest (see image below) and then decide how to split the search into date ranges so that each sub-query has fewer than 10000 records.

<center><img src="ml_num_recs.jpeg" alt="Screen shot of the Macaulay library search site showing the first result of a query for Costa's hummingbird and highlighting how to check the number of records for that query" width="100%"></center>
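
To plan the breakpoints, it can help to preview which ranges a given `dates` vector implies. The sketch below reproduces the range labels printed in the messages above. It is inferred from that output only and is not the package's internal code; the helper name `date_range_labels()` is hypothetical:

``` r
# illustrative helper (not part of suwo): label the sub-queries implied by
# a vector of year breakpoints
date_range_labels <- function(dates) {
  dates <- sort(unique(dates))
  n <- length(dates)
  stopifnot(n >= 2)
  starts <- dates[-n]
  # each range ends the year before the next breakpoint,
  # except the last one, which includes the final year
  ends <- c(dates[-c(1, n)] - 1, dates[n])
  ifelse(starts == ends, as.character(starts), paste(starts, ends, sep = "-"))
}

date_range_labels(c(1976, 2020, 2022, 2024, 2025, 2026))
# "1976-2019" "2020-2021" "2022-2023" "2024" "2025-2026"
```

If one of the ranges still holds 10000 records or more on the website, an extra breakpoint can be added inside it.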

[query_macaulay()](https://docs.ropensci.org/suwo/reference/query_macaulay.html) can also read metadata previously downloaded from the [Macaulay Library website](https://www.macaulaylibrary.org/). To do this, users must provide 1) the name of the .csv file(s) to the argument `files` and 2) the path of the directory where they were saved to the argument `path`.
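
As a rough sketch of that workflow, the file names can be collected with base R before being passed to the function. The temporary folder and the empty placeholder files below are assumptions made only so the example runs on its own. The final call is left commented out because it requires the package and real exports, and it may need further arguments that are not shown here:

``` r
# temporary directory standing in for the folder where exports were saved
csv_dir <- file.path(tempdir(), "ml_exports")
dir.create(csv_dir, showWarnings = FALSE)

# create two empty placeholder files that mimic the export file names
invisible(file.create(file.path(
  csv_dir,
  c("ML__2026-04-13T20-25_coshum_photo.csv",
    "ML__2026-04-13T20-28_coshum_photo.csv")
)))

# collect the names of all Macaulay Library exports in that folder
csv_files <- list.files(path = csv_dir, pattern = "^ML__.*\\.csv$")
csv_files

# the call would then look like this (not run)
# cal_cos <- query_macaulay(files = csv_files, path = csv_dir)
```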

## query_xenocanto()

### API key

[Xeno-Canto](https://www.xeno-canto.org/) requires users to obtain a free API key to use [their API v3](https://xeno-canto.org/admin.php/explore/api). Users can get an API key by creating an account at [Xeno-Canto's registration page](https://xeno-canto.org/auth/register). Once they have the key, they can set it as an environment variable in their R session using `Sys.setenv(xc_api_key = "YOUR_API_KEY_HERE")` and [query_xenocanto()](https://docs.ropensci.org/suwo/reference/query_xenocanto.html) will use it. Here is an example of a query for Spix's disc-winged bat (_Thyroptera tricolor_) audio recordings:


``` r
# set your Xeno-Canto key as an environment variable (run it in the console)
# Sys.setenv(xc_api_key = "YOUR_API_KEY_HERE")

# query Xeno-Canto
t_tricolor <- query_xenocanto(species = "Thyroptera tricolor")
```

```
ℹ Obtaining metadata: 
```

```
✔ 6 matching sound files found 😀
```

``` r
# we remove urls to avoid CRAN issues
head(t_tricolor[, grep("url", names(t_tricolor), invert = TRUE)], 4)
```

<div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:300px; overflow-x: scroll; width:100%; "><table class="table table-striped" style="font-size: 14px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> repository </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> format </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> key </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> species </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> date </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> time </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> user_name </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> country </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> locality </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> latitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> longitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> file_extension </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> Xeno-Canto </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 879621 </td>
   <td style="text-align:center;"> Thyroptera tricolor </td>
   <td style="text-align:center;"> 2023-07-15 </td>
   <td style="text-align:center;"> 12:30 </td>
   <td style="text-align:center;"> José Tinajero </td>
   <td style="text-align:center;"> Costa Rica </td>
   <td style="text-align:center;"> Hacienda Baru, Dominical, Costa Rica </td>
   <td style="text-align:center;"> 9.2635 </td>
   <td style="text-align:center;"> -83.8768 </td>
   <td style="text-align:center;"> wav </td>
  </tr>
  <tr>
   <td style="text-align:center;"> Xeno-Canto </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 820604 </td>
   <td style="text-align:center;"> Thyroptera tricolor </td>
   <td style="text-align:center;"> 2013-01-10 </td>
   <td style="text-align:center;"> 19:00 </td>
   <td style="text-align:center;"> Sébastien J. Puechmaille </td>
   <td style="text-align:center;"> Costa Rica </td>
   <td style="text-align:center;"> Pavo, Provincia de Puntarenas </td>
   <td style="text-align:center;"> 8.4815 </td>
   <td style="text-align:center;"> -83.5945 </td>
   <td style="text-align:center;"> wav </td>
  </tr>
  <tr>
   <td style="text-align:center;"> Xeno-Canto </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 820603 </td>
   <td style="text-align:center;"> Thyroptera tricolor </td>
   <td style="text-align:center;"> 2013-01-10 </td>
   <td style="text-align:center;"> 19:00 </td>
   <td style="text-align:center;"> Sébastien J. Puechmaille </td>
   <td style="text-align:center;"> Costa Rica </td>
   <td style="text-align:center;"> Pavo, Provincia de Puntarenas </td>
   <td style="text-align:center;"> 8.4815 </td>
   <td style="text-align:center;"> -83.5945 </td>
   <td style="text-align:center;"> wav </td>
  </tr>
  <tr>
   <td style="text-align:center;"> Xeno-Canto </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 821928 </td>
   <td style="text-align:center;"> Thyroptera tricolor </td>
   <td style="text-align:center;"> 2013-01-10 </td>
   <td style="text-align:center;"> 19:00 </td>
   <td style="text-align:center;"> Daniel j buckley </td>
   <td style="text-align:center;"> Costa Rica </td>
   <td style="text-align:center;"> Pavo, Provincia de Puntarenas </td>
   <td style="text-align:center;"> 8.4815 </td>
   <td style="text-align:center;"> -83.5945 </td>
   <td style="text-align:center;"> wav </td>
  </tr>
</tbody>
</table></div>
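
Before querying, it can be useful to confirm that the key is visible to the current R session. This is a minimal sketch using base R only. The helper name `has_xc_key()` is hypothetical, and the check only tests that the variable is non-empty, not that the key is valid:

``` r
# illustrative helper (not part of suwo): TRUE when the key is set and non-empty
has_xc_key <- function() {
  nzchar(Sys.getenv("xc_api_key"))
}

if (!has_xc_key()) {
  message(
    "xc_api_key is not set; run Sys.setenv(xc_api_key = \"YOUR_API_KEY_HERE\")"
  )
}
```

Variables set with `Sys.setenv()` last only for the current session. Adding the key to the user's `.Renviron` file is a common way to make it available in every session.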


### Special queries

[query_xenocanto()](https://docs.ropensci.org/suwo/reference/query_xenocanto.html) allows users to perform special queries by specifying additional query tags. Users can search by taxonomy (taxonomic group, family, genus, subspecies), geography (country, location, geographic coordinates), date, sound type (e.g. female song, calls) and recording properties (quality, length, sampling rate) ([see list of available tags here](https://xeno-canto.org/admin.php/explore/api#examples)). Here is an example of a query for audio recordings of pale-striped poison frog (_Ameerega hahneli_, `sp:"Ameerega hahneli"`) from French Guiana (`cnt:"French Guiana"`) with the highest recording quality (`q:"A"`):


``` r
# assuming you already set your API key as in previous code block
a_hahneli <- query_xenocanto(
  species = 'sp:"Ameerega hahneli" cnt:"French Guiana" q:"A"')
```

```
ℹ Obtaining metadata: 
```

```
✔ 3 matching sound files found 🌈
```

``` r
# we remove urls to avoid CRAN issues
head(a_hahneli[, grep("url", names(a_hahneli), invert = TRUE)], 4)
```

<div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:300px; overflow-x: scroll; width:100%; "><table class="table table-striped" style="font-size: 14px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> repository </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> format </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> key </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> species </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> date </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> time </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> user_name </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> country </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> locality </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> latitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> longitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> file_extension </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> Xeno-Canto </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 928987 </td>
   <td style="text-align:center;"> Ameerega hahneli </td>
   <td style="text-align:center;"> 2024-05-14 </td>
   <td style="text-align:center;"> 16:00 </td>
   <td style="text-align:center;"> Augustin Bussac </td>
   <td style="text-align:center;"> French Guiana </td>
   <td style="text-align:center;"> Sentier Gros-Arbre </td>
   <td style="text-align:center;"> 3.6132 </td>
   <td style="text-align:center;"> -53.2169 </td>
   <td style="text-align:center;"> mp3 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> Xeno-Canto </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 928972 </td>
   <td style="text-align:center;"> Ameerega hahneli </td>
   <td style="text-align:center;"> 2024-04-24 </td>
   <td style="text-align:center;"> 17:00 </td>
   <td style="text-align:center;"> Augustin Bussac </td>
   <td style="text-align:center;"> French Guiana </td>
   <td style="text-align:center;"> Camp Bonaventure </td>
   <td style="text-align:center;"> 4.3226 </td>
   <td style="text-align:center;"> -52.3387 </td>
   <td style="text-align:center;"> mp3 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> Xeno-Canto </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 928971 </td>
   <td style="text-align:center;"> Ameerega hahneli </td>
   <td style="text-align:center;"> 2023-11-26 </td>
   <td style="text-align:center;"> 13:00 </td>
   <td style="text-align:center;"> Augustin Bussac </td>
   <td style="text-align:center;"> French Guiana </td>
   <td style="text-align:center;"> Guyane Natural Regional Park (near  Roura), Arrondissement of Cayenne </td>
   <td style="text-align:center;"> 4.5423 </td>
   <td style="text-align:center;"> -52.4432 </td>
   <td style="text-align:center;"> mp3 </td>
  </tr>
</tbody>
</table></div>
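
Typing nested quotes by hand is error-prone, so the query string can also be assembled programmatically. The following sketch uses a hypothetical helper, `xc_query()`, which is not part of [suwo](https://docs.ropensci.org/suwo/). It assumes every tag follows the `tag:"value"` pattern used in the example above:

``` r
# illustrative helper (not part of suwo): build a tag-based query string
xc_query <- function(...) {
  tags <- list(...)
  stopifnot(length(tags) > 0, !is.null(names(tags)), all(nzchar(names(tags))))
  # format each tag as name:"value" and join them with spaces
  paste(sprintf('%s:"%s"', names(tags), unlist(tags)), collapse = " ")
}

xc_query(sp = "Ameerega hahneli", cnt = "French Guiana", q = "A")
```

The resulting string is the same as the one passed to the `species` argument in the query above.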

# Update metadata

The [update_metadata()](https://docs.ropensci.org/suwo/reference/update_metadata.html) function allows users to update a previous query to add new information from the corresponding repository of the original search. This function takes as input a data frame previously obtained from any query function (i.e. `query_reponame()`) and returns a data frame similar to the input with new data appended.

To showcase the function, we first query metadata of Eisentraut's Bow-winged Grasshopper images from iNaturalist. Let's assume that the initial query was done a while ago and we want to update it to include any new records that might have been added since then. The following code removes all observations recorded after 2024-12-31 to simulate an old query:


``` r
# initial query
c_eisentrauti <- query_inaturalist(species = "Chorthippus eisentrauti")
```

```
✔ Obtaining metadata (113 matching records found) 😸
```

``` r
head(c_eisentrauti, 3)
```

<div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:300px; overflow-x: scroll; width:100%; "><table class="table table-striped" style="font-size: 14px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> repository </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> format </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> key </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> species </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> date </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> time </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> user_name </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> country </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> locality </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> latitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> longitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> file_url </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> file_extension </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> observation_url </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> iNaturalist </td>
   <td style="text-align:center;"> image </td>
   <td style="text-align:center;"> 335245347 </td>
   <td style="text-align:center;"> Chorthippus eisentrauti </td>
   <td style="text-align:center;"> 2019-11-16 </td>
   <td style="text-align:center;"> 12:48 </td>
   <td style="text-align:center;"> Eliot Stein-Deffarges J. </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> 43.967696755,7.6218244195 </td>
   <td style="text-align:center;"> 43.96770 </td>
   <td style="text-align:center;"> 7.621824 </td>
   <td style="text-align:center;"> https://inaturalist-open-data.s3.amazonaws.com/photos/608983424/original.jpg </td>
   <td style="text-align:center;"> jpeg </td>
   <td style="text-align:center;"> https://www.inaturalist.org/observations/335245347 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> iNaturalist </td>
   <td style="text-align:center;"> image </td>
   <td style="text-align:center;"> 335245344 </td>
   <td style="text-align:center;"> Chorthippus eisentrauti </td>
   <td style="text-align:center;"> 2019-11-16 </td>
   <td style="text-align:center;"> 12:36 </td>
   <td style="text-align:center;"> Eliot Stein-Deffarges J. </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> 43.967696755,7.6218244195 </td>
   <td style="text-align:center;"> 43.96770 </td>
   <td style="text-align:center;"> 7.621824 </td>
   <td style="text-align:center;"> https://inaturalist-open-data.s3.amazonaws.com/photos/608982971/original.jpg </td>
   <td style="text-align:center;"> jpeg </td>
   <td style="text-align:center;"> https://www.inaturalist.org/observations/335245344 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> iNaturalist </td>
   <td style="text-align:center;"> image </td>
   <td style="text-align:center;"> 334597801 </td>
   <td style="text-align:center;"> Chorthippus eisentrauti </td>
   <td style="text-align:center;"> 2026-01-11 </td>
   <td style="text-align:center;"> 12:01 </td>
   <td style="text-align:center;"> Eliot Stein-Deffarges J. </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> 44.0786166389,7.6128199722 </td>
   <td style="text-align:center;"> 44.07862 </td>
   <td style="text-align:center;"> 7.612820 </td>
   <td style="text-align:center;"> https://inaturalist-open-data.s3.amazonaws.com/photos/607665238/original.jpg </td>
   <td style="text-align:center;"> jpeg </td>
   <td style="text-align:center;"> https://www.inaturalist.org/observations/334597801 </td>
  </tr>
</tbody>
</table></div>

``` r
# exclude new observations (simulate old data)
old_c_eisentrauti <- c_eisentrauti[
  c_eisentrauti$date <= "2024-12-31" | is.na(c_eisentrauti$date),
]

# update "old" data
upd_c_eisentrauti <- update_metadata(metadata = old_c_eisentrauti)
```

```
✔ Obtaining metadata (113 matching records found) 😸
```

```
✔ 95 new entries found 🥇
```

``` r
# compare number of records
nrow(c_eisentrauti) == nrow(upd_c_eisentrauti)
```

```
[1] TRUE
```
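
To see which records were appended by an update, the `key` column can be compared between the old and the updated data frames. This is a minimal sketch with toy data frames standing in for real query results. It assumes that `key` uniquely identifies a record within a repository:

``` r
# toy stand-ins for an old query and its updated version
old_md <- data.frame(key = c("101", "102"),
                     date = c("2019-11-16", "2023-05-02"))
upd_md <- data.frame(key = c("101", "102", "103"),
                     date = c("2019-11-16", "2023-05-02", "2026-01-11"))

# rows whose key is absent from the old data are the new entries
new_entries <- upd_md[!upd_md$key %in% old_md$key, ]
new_entries
```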

# Combine metadata from multiple repositories

The [merge_metadata()](https://docs.ropensci.org/suwo/reference/merge_metadata.html) function allows users to combine metadata data frames obtained from multiple query functions into a single data frame. The function will match the basic columns of all data frames. Data from additional columns (for instance when using `all_data = TRUE` in the query) will only be combined if the column names from different repositories match. The function will return a data frame that includes a new column called `source` indicating the name of the original metadata data frame:


``` r
truf_xc <- query_xenocanto(species = "Turdus rufiventris")
```

```
ℹ Obtaining metadata: 
```

```
✔ 486 matching sound files found 🎊
```

``` r
truf_gbf <- query_gbif(species = "Turdus rufiventris", format = "sound")
```

```
✔ Obtaining metadata (745 matching records found) 🥇
```

```
! 2 observations do not have a download link and were removed from the results (inlcuded as an attribute called 'excluded_results'). 
```

``` r
truf_ml <- query_macaulay(species = "Turdus rufiventris",
                          format = "sound")
```

```
ℹ A browser will open the macaulay library website. Save the .csv file ('export' button) to this directory: 
/home/m/Dropbox/R_package_testing/suwo/vignettes/ 
```

```
ℹ (R is monitoring for new CSV files. Press ESC to stop the function)
```

```
ℹ File:  
ML__2026-04-13T20-31_rubthr1_audio.csv 
```

```
✔ 1390 matching records found 🥇
```

``` r
# merge metadata
merged_metadata <- merge_metadata(truf_xc, truf_gbf, truf_ml)

# we remove urls to avoid CRAN issues
head(merged_metadata[, grep("url", names(merged_metadata), invert = TRUE)], 4)
```

<div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:300px; overflow-x: scroll; width:100%; "><table class="table table-striped" style="font-size: 14px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> repository </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> format </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> key </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> species </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> date </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> time </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> user_name </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> country </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> locality </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> latitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> longitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> file_extension </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> source </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> Xeno-Canto </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 1096135 </td>
   <td style="text-align:center;"> Turdus rufiventris </td>
   <td style="text-align:center;"> 2025-11-09 </td>
   <td style="text-align:center;"> 11:50 </td>
   <td style="text-align:center;"> Franco Vushurovich </td>
   <td style="text-align:center;"> Argentina </td>
   <td style="text-align:center;"> Victoria, Entre Ríos </td>
   <td style="text-align:center;"> -32.8606 </td>
   <td style="text-align:center;"> -60.6486 </td>
   <td style="text-align:center;"> wav </td>
   <td style="text-align:center;"> truf_xc </td>
  </tr>
  <tr>
   <td style="text-align:center;"> Xeno-Canto </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 1080158 </td>
   <td style="text-align:center;"> Turdus rufiventris </td>
   <td style="text-align:center;"> 2025-12-30 </td>
   <td style="text-align:center;"> 17:27 </td>
   <td style="text-align:center;"> Jayrson Araujo De Oliveira </td>
   <td style="text-align:center;"> Brazil </td>
   <td style="text-align:center;"> Reserva do Setor Sítio de Recreio Caraíbas-Goiânia, Goiás </td>
   <td style="text-align:center;"> -16.5631 </td>
   <td style="text-align:center;"> -49.2850 </td>
   <td style="text-align:center;"> mp3 </td>
   <td style="text-align:center;"> truf_xc </td>
  </tr>
  <tr>
   <td style="text-align:center;"> Xeno-Canto </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 1071699 </td>
   <td style="text-align:center;"> Turdus rufiventris </td>
   <td style="text-align:center;"> 2025-10-26 </td>
   <td style="text-align:center;"> 11:20 </td>
   <td style="text-align:center;"> Franco Vushurovich </td>
   <td style="text-align:center;"> Argentina </td>
   <td style="text-align:center;"> Victoria, Entre Ríos </td>
   <td style="text-align:center;"> -32.8606 </td>
   <td style="text-align:center;"> -60.6486 </td>
   <td style="text-align:center;"> wav </td>
   <td style="text-align:center;"> truf_xc </td>
  </tr>
  <tr>
   <td style="text-align:center;"> Xeno-Canto </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 1070609 </td>
   <td style="text-align:center;"> Turdus rufiventris </td>
   <td style="text-align:center;"> 2025-12-29 </td>
   <td style="text-align:center;"> 05:59 </td>
   <td style="text-align:center;"> Jayrson Araujo De Oliveira </td>
   <td style="text-align:center;"> Brazil </td>
   <td style="text-align:center;"> Reserva do Setor Sítio de Recreio Caraíbas-Goiânia, Goiás </td>
   <td style="text-align:center;"> -16.5631 </td>
   <td style="text-align:center;"> -49.2850 </td>
   <td style="text-align:center;"> mp3 </td>
   <td style="text-align:center;"> truf_xc </td>
  </tr>
</tbody>
</table></div>

Note that in such a multi-repository query, all query functions use the same search term (i.e. species name) and media format (e.g., sound, image, video). To facilitate this, users can set the global options `species` and `format` so they do not need to specify them in each query function:


``` r
# query at multiple repositories setting global options
options(species = "Turdus rufiventris", format = "sound")
truf_xc <- query_xenocanto() # assuming you already set your API key
truf_gbf <- query_gbif()
truf_ml <- query_macaulay()

# merge metadata
merged_metadata <- merge_metadata(truf_xc, truf_gbf, truf_ml)

# we remove urls to avoid CRAN issues
head(merged_metadata[, grep("url", names(merged_metadata), invert = TRUE)], 4)
```
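Options set with `options()` persist for the rest of the R session, so they also affect any later query that omits these arguments. The following base R sketch shows one way to restore the previous state once the multi-repository query is done. It relies only on the standard behavior of `options()`; the placeholder comment stands in for the query calls shown above.

``` r
# options() invisibly returns the previous values of the options it changes
old_opts <- options(species = "Turdus rufiventris", format = "sound")

# ... run the query functions here ...
getOption("species")

# restore the values that were in place before the call above
options(old_opts)
```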

# Find and remove duplicated records

When compiling data from multiple repositories, duplicated media records are a common issue, particularly for sound recordings. These duplicates arise both from data sharing between repositories (e.g. between Xeno-Canto and GBIF) and from users uploading the same file to multiple platforms. To help users efficiently identify these duplicate records, [suwo](https://docs.ropensci.org/suwo/) provides the [find_duplicates()](https://docs.ropensci.org/suwo/reference/find_duplicates.html) function. Duplicates are identified based on matching species name, country, date, user name, and locality. The function uses a fuzzy matching approach to account for minor variations in the data (e.g., typos, different location formats, etc.). The output is a data frame with the candidate duplicate records, allowing users to review and decide which records to keep.

In this example we look for possible duplicates in the merged metadata data frame from the previous section:


``` r
# find duplicates
dups_merged_metadata <- find_duplicates(merged_metadata)
```

```
ℹ 668 potential duplicates found 
```

``` r
# look at the first 6 rows
head(dups_merged_metadata)
```

<div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:300px; overflow-x: scroll; width:100%; "><table class="table table-striped" style="font-size: 14px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> repository </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> format </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> key </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> species </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> date </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> time </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> user_name </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> country </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> locality </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> latitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> longitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> file_url </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> file_extension </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> observation_url </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> source </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> duplicate_group </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> Xeno-Canto </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 1048627 </td>
   <td style="text-align:center;"> Turdus rufiventris </td>
   <td style="text-align:center;"> 2025-10-03 </td>
   <td style="text-align:center;"> 12:03 </td>
   <td style="text-align:center;"> Franco Vushurovich </td>
   <td style="text-align:center;"> Argentina </td>
   <td style="text-align:center;"> Victoria, Entre Ríos </td>
   <td style="text-align:center;"> -32.86060 </td>
   <td style="text-align:center;"> -60.64860 </td>
   <td style="text-align:center;"> https://xeno-canto.org/1048627/download </td>
   <td style="text-align:center;"> wav </td>
   <td style="text-align:center;"> https://xeno-canto.org/1048627 </td>
   <td style="text-align:center;"> truf_xc </td>
   <td style="text-align:center;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> GBIF </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 5995375753 </td>
   <td style="text-align:center;"> Turdus rufiventris </td>
   <td style="text-align:center;"> 2025-10-03 </td>
   <td style="text-align:center;"> 12:03 </td>
   <td style="text-align:center;"> Franco Vushurovich </td>
   <td style="text-align:center;"> Argentina </td>
   <td style="text-align:center;"> Victoria, Entre Ríos </td>
   <td style="text-align:center;"> -32.86060 </td>
   <td style="text-align:center;"> -60.64860 </td>
   <td style="text-align:center;"> https://xeno-canto.org/sounds/uploaded/VLDFGFKOWN/XC1048627-ZorzalColorado3deOctubre2025IslaUNION.wav </td>
   <td style="text-align:center;"> wav </td>
   <td style="text-align:center;"> https://www.gbif.org/occurrence/5995375753 </td>
   <td style="text-align:center;"> truf_gbf </td>
   <td style="text-align:center;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> Xeno-Canto </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 398272 </td>
   <td style="text-align:center;"> Turdus rufiventris </td>
   <td style="text-align:center;"> 2016-10-26 </td>
   <td style="text-align:center;"> 20:30 </td>
   <td style="text-align:center;"> Federico Ferrer </td>
   <td style="text-align:center;"> Argentina </td>
   <td style="text-align:center;"> Villa Carlos Paz, Córdoba </td>
   <td style="text-align:center;"> -31.42940 </td>
   <td style="text-align:center;"> -64.48850 </td>
   <td style="text-align:center;"> https://xeno-canto.org/398272/download </td>
   <td style="text-align:center;"> mp3 </td>
   <td style="text-align:center;"> https://xeno-canto.org/398272 </td>
   <td style="text-align:center;"> truf_xc </td>
   <td style="text-align:center;"> 2 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> GBIF </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 2243790806 </td>
   <td style="text-align:center;"> Turdus rufiventris </td>
   <td style="text-align:center;"> 2016-10-26 </td>
   <td style="text-align:center;"> 20:30 </td>
   <td style="text-align:center;"> Federico Ferrer </td>
   <td style="text-align:center;"> Argentina </td>
   <td style="text-align:center;"> Villa Carlos Paz, Córdoba </td>
   <td style="text-align:center;"> -31.42940 </td>
   <td style="text-align:center;"> -64.48850 </td>
   <td style="text-align:center;"> https://xeno-canto.org/sounds/uploaded/SPSOKYZMRX/XC398272-Zorzal%20Colorado.mp3 </td>
   <td style="text-align:center;"> mp3 </td>
   <td style="text-align:center;"> https://www.gbif.org/occurrence/2243790806 </td>
   <td style="text-align:center;"> truf_gbf </td>
   <td style="text-align:center;"> 2 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> Macaulay Library </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 644453560 </td>
   <td style="text-align:center;"> Turdus rufiventris </td>
   <td style="text-align:center;"> 2025-10-28 </td>
   <td style="text-align:center;"> 03:51 </td>
   <td style="text-align:center;"> Fernanda Fernandex </td>
   <td style="text-align:center;"> Brazil </td>
   <td style="text-align:center;"> Brasília--Grande Colorado/Condomínio Vivendas Bela Vista </td>
   <td style="text-align:center;"> -15.65401 </td>
   <td style="text-align:center;"> -47.86088 </td>
   <td style="text-align:center;"> https://cdn.download.ams.birds.cornell.edu/api/v1/asset/644453560/ </td>
   <td style="text-align:center;"> mp3 </td>
   <td style="text-align:center;"> https://macaulaylibrary.org/asset/644453560 </td>
   <td style="text-align:center;"> truf_ml </td>
   <td style="text-align:center;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> Macaulay Library </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 644453551 </td>
   <td style="text-align:center;"> Turdus rufiventris </td>
   <td style="text-align:center;"> 2025-10-28 </td>
   <td style="text-align:center;"> 03:51 </td>
   <td style="text-align:center;"> Fernanda Fernandex </td>
   <td style="text-align:center;"> Brazil </td>
   <td style="text-align:center;"> Brasília--Grande Colorado/Condomínio Vivendas Bela Vista </td>
   <td style="text-align:center;"> -15.65401 </td>
   <td style="text-align:center;"> -47.86088 </td>
   <td style="text-align:center;"> https://cdn.download.ams.birds.cornell.edu/api/v1/asset/644453551/ </td>
   <td style="text-align:center;"> mp3 </td>
   <td style="text-align:center;"> https://macaulaylibrary.org/asset/644453551 </td>
   <td style="text-align:center;"> truf_ml </td>
   <td style="text-align:center;"> 3 </td>
  </tr>
</tbody>
</table></div>



Note that the [find_duplicates()](https://docs.ropensci.org/suwo/reference/find_duplicates.html) function adds a new column called "duplicate_group" to the output data frame. This column assigns a unique identifier to each group of potential duplicates, allowing users to easily identify and review them. For instance, in the example above, the records in duplicate group 90 belong to the same user and were recorded on the same date, at the same time and in the same country:


``` r
subset(dups_merged_metadata, duplicate_group == 90)
```

<div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:300px; overflow-x: scroll; width:100%; "><table class="table table-striped" style="font-size: 14px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> repository </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> format </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> key </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> species </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> date </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> time </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> user_name </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> country </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> locality </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> latitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> longitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> file_url </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> file_extension </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> observation_url </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> source </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> duplicate_group </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> Xeno-Canto </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 273100 </td>
   <td style="text-align:center;"> Turdus rufiventris </td>
   <td style="text-align:center;"> 2013-10-19 </td>
   <td style="text-align:center;"> 18:00 </td>
   <td style="text-align:center;"> Peter Boesman </td>
   <td style="text-align:center;"> Argentina </td>
   <td style="text-align:center;"> Calilegua NP, Jujuy </td>
   <td style="text-align:center;"> -23.74195 </td>
   <td style="text-align:center;"> -64.85777 </td>
   <td style="text-align:center;"> https://xeno-canto.org/273100/download </td>
   <td style="text-align:center;"> mp3 </td>
   <td style="text-align:center;"> https://xeno-canto.org/273100 </td>
   <td style="text-align:center;"> truf_xc </td>
   <td style="text-align:center;"> 90 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> Xeno-Canto </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 273098 </td>
   <td style="text-align:center;"> Turdus rufiventris </td>
   <td style="text-align:center;"> 2013-10-19 </td>
   <td style="text-align:center;"> 18:00 </td>
   <td style="text-align:center;"> Peter Boesman </td>
   <td style="text-align:center;"> Argentina </td>
   <td style="text-align:center;"> Calilegua NP, Jujuy </td>
   <td style="text-align:center;"> -23.74195 </td>
   <td style="text-align:center;"> -64.85777 </td>
   <td style="text-align:center;"> https://xeno-canto.org/273098/download </td>
   <td style="text-align:center;"> mp3 </td>
   <td style="text-align:center;"> https://xeno-canto.org/273098 </td>
   <td style="text-align:center;"> truf_xc </td>
   <td style="text-align:center;"> 90 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> GBIF </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 2243678570 </td>
   <td style="text-align:center;"> Turdus rufiventris </td>
   <td style="text-align:center;"> 2013-10-19 </td>
   <td style="text-align:center;"> 18:00 </td>
   <td style="text-align:center;"> Peter Boesman </td>
   <td style="text-align:center;"> Argentina </td>
   <td style="text-align:center;"> Calilegua NP, Jujuy </td>
   <td style="text-align:center;"> -23.74195 </td>
   <td style="text-align:center;"> -64.85777 </td>
   <td style="text-align:center;"> https://xeno-canto.org/sounds/uploaded/OOECIWCSWV/XC273098-Rufous-bellied%20Thrush%20QQ%20call%20A%201.mp3 </td>
   <td style="text-align:center;"> mp3 </td>
   <td style="text-align:center;"> https://www.gbif.org/occurrence/2243678570 </td>
   <td style="text-align:center;"> truf_gbf </td>
   <td style="text-align:center;"> 90 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> GBIF </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 2243680322 </td>
   <td style="text-align:center;"> Turdus rufiventris </td>
   <td style="text-align:center;"> 2013-10-19 </td>
   <td style="text-align:center;"> 18:00 </td>
   <td style="text-align:center;"> Peter Boesman </td>
   <td style="text-align:center;"> Argentina </td>
   <td style="text-align:center;"> Calilegua NP, Jujuy </td>
   <td style="text-align:center;"> -23.74195 </td>
   <td style="text-align:center;"> -64.85777 </td>
   <td style="text-align:center;"> https://xeno-canto.org/sounds/uploaded/OOECIWCSWV/XC273100-Rufous-bellied%20Thrush%20QQQ%20call%20A.mp3 </td>
   <td style="text-align:center;"> mp3 </td>
   <td style="text-align:center;"> https://www.gbif.org/occurrence/2243680322 </td>
   <td style="text-align:center;"> truf_gbf </td>
   <td style="text-align:center;"> 90 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> Macaulay Library </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 301276 </td>
   <td style="text-align:center;"> Turdus rufiventris </td>
   <td style="text-align:center;"> 2013-10-19 </td>
   <td style="text-align:center;"> 18:00 </td>
   <td style="text-align:center;"> Peter Boesman </td>
   <td style="text-align:center;"> Argentina </td>
   <td style="text-align:center;"> Calilegua NP </td>
   <td style="text-align:center;"> -23.74200 </td>
   <td style="text-align:center;"> -64.85780 </td>
   <td style="text-align:center;"> https://cdn.download.ams.birds.cornell.edu/api/v1/asset/301276/ </td>
   <td style="text-align:center;"> mp3 </td>
   <td style="text-align:center;"> https://macaulaylibrary.org/asset/301276 </td>
   <td style="text-align:center;"> truf_ml </td>
   <td style="text-align:center;"> 90 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> Macaulay Library </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 301275 </td>
   <td style="text-align:center;"> Turdus rufiventris </td>
   <td style="text-align:center;"> 2013-10-19 </td>
   <td style="text-align:center;"> 18:00 </td>
   <td style="text-align:center;"> Peter Boesman </td>
   <td style="text-align:center;"> Argentina </td>
   <td style="text-align:center;"> Calilegua NP </td>
   <td style="text-align:center;"> -23.74200 </td>
   <td style="text-align:center;"> -64.85780 </td>
   <td style="text-align:center;"> https://cdn.download.ams.birds.cornell.edu/api/v1/asset/301275/ </td>
   <td style="text-align:center;"> mp3 </td>
   <td style="text-align:center;"> https://macaulaylibrary.org/asset/301275 </td>
   <td style="text-align:center;"> truf_ml </td>
   <td style="text-align:center;"> 90 </td>
  </tr>
</tbody>
</table></div>

In this case all the observations seem to refer to the same media file. Therefore only one copy is needed. Also note that the locality is not exactly the same for these records, but the fuzzy matching approach used by [find_duplicates()](https://docs.ropensci.org/suwo/reference/find_duplicates.html) was able to identify them as potential duplicates. By default, the criteria are set to `country > 0.8 & locality > 0.5 & user_name > 0.8 & time == 1 & date == 1`, which means that two entries will be considered duplicates if they have a country similarity greater than 0.8, a locality similarity greater than 0.5, a user_name similarity greater than 0.8, and exact matches for time and date (similarities range from 0 to 1). These values have been found to work well in most cases. Nonetheless, users can adjust the sensitivity based on their specific needs using the argument `criteria`.
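To give an intuition for how similarity thresholds behave, here is a toy sketch in base R. It is not the internal implementation of `find_duplicates()`: the helper `string_similarity()` is hypothetical and assumes a similarity defined as one minus the edit distance divided by the length of the longer string. It is applied to the locality, user name and country values of two records from group 90 above.

``` r
# Toy illustration only: this is NOT how find_duplicates() is implemented.
# Assumed similarity: 1 - (edit distance / length of the longer string)
string_similarity <- function(a, b) {
  d <- utils::adist(a, b)[1, 1]
  1 - d / max(nchar(a), nchar(b))
}

# values taken from a Xeno-Canto and a Macaulay Library record in group 90
loc_sim <- string_similarity("Calilegua NP, Jujuy", "Calilegua NP")
user_sim <- string_similarity("Peter Boesman", "Peter Boesman")
country_sim <- string_similarity("Argentina", "Argentina")

# locality similarity is about 0.63: below 1, but above the 0.5 threshold
loc_sim

# thresholds analogous to the default criteria
country_sim > 0.8 & loc_sim > 0.5 & user_sim > 0.8
```

Under this assumed measure, raising the locality threshold to, say, 0.7 would no longer flag these two records as candidate duplicates, which illustrates the trade-off involved in tuning `criteria`.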

Once users have reviewed the candidate duplicates, they can apply the [remove_duplicates()](https://docs.ropensci.org/suwo/reference/remove_duplicates.html) function to eliminate unwanted duplicates from their metadata data frames. This function takes as input the data frame returned by [find_duplicates()](https://docs.ropensci.org/suwo/reference/find_duplicates.html):


``` r
# remove duplicates
dedup_metadata <- remove_duplicates(dups_merged_metadata)
```

```
ℹ 304 duplicates removed 
```

The output is a data frame similar to the input but without the specified duplicate records:


``` r
# look at the first 4 rows of the deduplicated metadata
# we remove urls to avoid CRAN issues
head(dedup_metadata[, grep("url", names(dedup_metadata), invert = TRUE)], 4)
```

<div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:300px; overflow-x: scroll; width:100%; "><table class="table table-striped" style="font-size: 14px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> repository </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> format </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> key </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> species </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> date </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> time </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> user_name </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> country </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> locality </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> latitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> longitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> file_extension </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> source </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> duplicate_group </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> Xeno-Canto </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 1048627 </td>
   <td style="text-align:center;"> Turdus rufiventris </td>
   <td style="text-align:center;"> 2025-10-03 </td>
   <td style="text-align:center;"> 12:03 </td>
   <td style="text-align:center;"> Franco Vushurovich </td>
   <td style="text-align:center;"> Argentina </td>
   <td style="text-align:center;"> Victoria, Entre Ríos </td>
   <td style="text-align:center;"> -32.8606 </td>
   <td style="text-align:center;"> -60.6486 </td>
   <td style="text-align:center;"> wav </td>
   <td style="text-align:center;"> truf_xc </td>
   <td style="text-align:center;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> GBIF </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 5995375753 </td>
   <td style="text-align:center;"> Turdus rufiventris </td>
   <td style="text-align:center;"> 2025-10-03 </td>
   <td style="text-align:center;"> 12:03 </td>
   <td style="text-align:center;"> Franco Vushurovich </td>
   <td style="text-align:center;"> Argentina </td>
   <td style="text-align:center;"> Victoria, Entre Ríos </td>
   <td style="text-align:center;"> -32.8606 </td>
   <td style="text-align:center;"> -60.6486 </td>
   <td style="text-align:center;"> wav </td>
   <td style="text-align:center;"> truf_gbf </td>
   <td style="text-align:center;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> Xeno-Canto </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 398272 </td>
   <td style="text-align:center;"> Turdus rufiventris </td>
   <td style="text-align:center;"> 2016-10-26 </td>
   <td style="text-align:center;"> 20:30 </td>
   <td style="text-align:center;"> Federico Ferrer </td>
   <td style="text-align:center;"> Argentina </td>
   <td style="text-align:center;"> Villa Carlos Paz, Córdoba </td>
   <td style="text-align:center;"> -31.4294 </td>
   <td style="text-align:center;"> -64.4885 </td>
   <td style="text-align:center;"> mp3 </td>
   <td style="text-align:center;"> truf_xc </td>
   <td style="text-align:center;"> 2 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> GBIF </td>
   <td style="text-align:center;"> sound </td>
   <td style="text-align:center;"> 2243790806 </td>
   <td style="text-align:center;"> Turdus rufiventris </td>
   <td style="text-align:center;"> 2016-10-26 </td>
   <td style="text-align:center;"> 20:30 </td>
   <td style="text-align:center;"> Federico Ferrer </td>
   <td style="text-align:center;"> Argentina </td>
   <td style="text-align:center;"> Villa Carlos Paz, Córdoba </td>
   <td style="text-align:center;"> -31.4294 </td>
   <td style="text-align:center;"> -64.4885 </td>
   <td style="text-align:center;"> mp3 </td>
   <td style="text-align:center;"> truf_gbf </td>
   <td style="text-align:center;"> 2 </td>
  </tr>
</tbody>
</table></div>

When duplicates are found, one observation from each group of duplicates is retained in the output data frame. However, if multiple observations from the same repository are labeled as duplicates, by default (`same_repo = FALSE`) all of them are retained in the output data frame. This is useful because observations from the same repository are often not true duplicates (e.g. different recordings uploaded to Xeno-Canto with the same date, time and location by the same user), but rather have not been documented with enough precision to be told apart. This behavior can be modified. If `same_repo = TRUE`, only one of the duplicated observations from the same repository will be retained in the output data frame (and all others are excluded). The function will give priority to repositories in which media downloading is more straightforward (i.e. Xeno-Canto, GBIF), but this can be modified with the argument `repo_priority`.
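The idea of keeping a single record per group according to a repository priority can be sketched in base R as follows. This is only an illustration of the selection logic on toy data, not the code used by `remove_duplicates()`; the data frame `dups_toy` and the `priority` vector are hypothetical, with keys borrowed from the tables above.

``` r
# toy data mimicking a few columns of the find_duplicates() output
dups_toy <- data.frame(
  repository = c("GBIF", "Xeno-Canto", "Macaulay Library",
                 "GBIF", "Macaulay Library"),
  key = c("2243678570", "273098", "301276", "5995375753", "644453560"),
  duplicate_group = c(90, 90, 90, 1, 3),
  stringsAsFactors = FALSE
)

# hypothetical priority order: earlier repositories are preferred
priority <- c("Xeno-Canto", "GBIF", "Macaulay Library")

# sort by group and priority rank, then keep the first record of each group
repo_rank <- match(dups_toy$repository, priority)
ordered <- dups_toy[order(dups_toy$duplicate_group, repo_rank), ]
kept <- ordered[!duplicated(ordered$duplicate_group), ]
kept
```

Here the Xeno-Canto record is kept for group 90, while groups 1 and 3 keep their only record.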


# Download media files

The last step of the workflow is to download the media files associated with the metadata. This can be done using the [download_media()](https://docs.ropensci.org/suwo/reference/download_media.html) function, which takes as input a metadata data frame (obtained from any query function or any of the other metadata managing functions) and downloads the media files to a specified directory. For this example we will download images of the Zambian slender Caesar (_Amanita zambiana_), a mushroom, from a query on GBIF:


``` r
# query GBIF for Amanita zambiana images
a_zam <- query_gbif(species = "Amanita zambiana", format = "image")
```

```
✔ Obtaining metadata (7 matching records found) 🎉
```

``` r
# create folder for images
out_folder <- file.path(tempdir(), "amanita_zambiana")
dir.create(out_folder)

# download media files to a temporary directory
azam_files <- download_media(metadata = a_zam, path = out_folder)
```

```
Downloading media files:
```

```
✔ All files were downloaded successfully 🎊
```

The output of the function is a data frame similar to the input metadata but with two additional columns: 'downloaded_file_name', with the name of each downloaded file, and 'download_status', with the result of the download attempt ('saved', 'failed', 'already there (not downloaded)' or 'overwritten').
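The status column makes it easy to check for problems after a large download. The sketch below uses a small hypothetical data frame (`dl_toy`) in place of real `download_media()` output; the assumption that a failed download has a missing file name is made only for illustration.

``` r
# hypothetical stand-in for the output of download_media()
dl_toy <- data.frame(
  key = c("4430877067", "5104283819"),
  downloaded_file_name = c("Amanita_zambiana-GBIF4430877067-1.jpeg", NA),
  download_status = c("saved", "failed"),
  stringsAsFactors = FALSE
)

# count download attempts by outcome
table(dl_toy$download_status)

# keep the records whose download failed, e.g. to retry them later
failed <- dl_toy[dl_toy$download_status == "failed", ]
failed$key
```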

Here we print the first 4 rows of the output data frame:


``` r
head(azam_files, 4)
```

<div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:300px; overflow-x: scroll; width:100%; "><table class="table table-striped" style="font-size: 14px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> repository </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> format </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> key </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> species </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> date </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> time </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> user_name </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> country </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> locality </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> latitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> longitude </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> file_url </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> file_extension </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> observation_url </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> downloaded_file_name </th>
   <th style="text-align:center;position: sticky; top:0; background-color: #FFFFFF;"> download_status </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> GBIF </td>
   <td style="text-align:center;"> image </td>
   <td style="text-align:center;"> 4430877067 </td>
   <td style="text-align:center;"> Amanita zambiana </td>
   <td style="text-align:center;"> 2023-01-25 </td>
   <td style="text-align:center;"> 10:57 </td>
   <td style="text-align:center;"> Allanweideman </td>
   <td style="text-align:center;"> Mozambique </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> -21.28456 </td>
   <td style="text-align:center;"> 34.61868 </td>
   <td style="text-align:center;"> https://inaturalist-open-data.s3.amazonaws.com/photos/253482452/original.jpg </td>
   <td style="text-align:center;"> jpeg </td>
   <td style="text-align:center;"> https://www.gbif.org/occurrence/4430877067 </td>
   <td style="text-align:center;"> Amanita_zambiana-GBIF4430877067-1.jpeg </td>
   <td style="text-align:center;"> saved </td>
  </tr>
  <tr>
   <td style="text-align:center;"> GBIF </td>
   <td style="text-align:center;"> image </td>
   <td style="text-align:center;"> 4430877067 </td>
   <td style="text-align:center;"> Amanita zambiana </td>
   <td style="text-align:center;"> 2023-01-25 </td>
   <td style="text-align:center;"> 10:57 </td>
   <td style="text-align:center;"> Allanweideman </td>
   <td style="text-align:center;"> Mozambique </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> -21.28456 </td>
   <td style="text-align:center;"> 34.61868 </td>
   <td style="text-align:center;"> https://inaturalist-open-data.s3.amazonaws.com/photos/253482473/original.jpg </td>
   <td style="text-align:center;"> jpeg </td>
   <td style="text-align:center;"> https://www.gbif.org/occurrence/4430877067 </td>
   <td style="text-align:center;"> Amanita_zambiana-GBIF4430877067-2.jpeg </td>
   <td style="text-align:center;"> saved </td>
  </tr>
  <tr>
   <td style="text-align:center;"> GBIF </td>
   <td style="text-align:center;"> image </td>
   <td style="text-align:center;"> 4430877067 </td>
   <td style="text-align:center;"> Amanita zambiana </td>
   <td style="text-align:center;"> 2023-01-25 </td>
   <td style="text-align:center;"> 10:57 </td>
   <td style="text-align:center;"> Allanweideman </td>
   <td style="text-align:center;"> Mozambique </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> -21.28456 </td>
   <td style="text-align:center;"> 34.61868 </td>
   <td style="text-align:center;"> https://inaturalist-open-data.s3.amazonaws.com/photos/253484256/original.jpg </td>
   <td style="text-align:center;"> jpeg </td>
   <td style="text-align:center;"> https://www.gbif.org/occurrence/4430877067 </td>
   <td style="text-align:center;"> Amanita_zambiana-GBIF4430877067-3.jpeg </td>
   <td style="text-align:center;"> saved </td>
  </tr>
  <tr>
   <td style="text-align:center;"> GBIF </td>
   <td style="text-align:center;"> image </td>
   <td style="text-align:center;"> 5104283819 </td>
   <td style="text-align:center;"> Amanita zambiana </td>
   <td style="text-align:center;"> 2023-03-31 </td>
   <td style="text-align:center;"> 13:41 </td>
   <td style="text-align:center;"> Nick Helme </td>
   <td style="text-align:center;"> Zambia </td>
   <td style="text-align:center;"> NA </td>
   <td style="text-align:center;"> -12.44276 </td>
   <td style="text-align:center;"> 31.28535 </td>
   <td style="text-align:center;"> https://inaturalist-open-data.s3.amazonaws.com/photos/268158445/original.jpeg </td>
   <td style="text-align:center;"> jpeg </td>
   <td style="text-align:center;"> https://www.gbif.org/occurrence/5104283819 </td>
   <td style="text-align:center;"> Amanita_zambiana-GBIF5104283819.jpeg </td>
   <td style="text-align:center;"> saved </td>
  </tr>
</tbody>
</table></div>

... and check that the files were saved in the path supplied:

``` r
fs::dir_tree(path = out_folder)
```


```
/tmp/Rtmpp5V9Oh/amanita_zambiana
├── Amanita_zambiana-GBIF3759537817-1.jpeg
├── Amanita_zambiana-GBIF3759537817-2.jpeg
├── Amanita_zambiana-GBIF4430877067-1.jpeg
├── Amanita_zambiana-GBIF4430877067-2.jpeg
├── Amanita_zambiana-GBIF4430877067-3.jpeg
├── Amanita_zambiana-GBIF5069132689-1.jpeg
├── Amanita_zambiana-GBIF5069132689-2.jpeg
├── Amanita_zambiana-GBIF5069132691.jpeg
├── Amanita_zambiana-GBIF5069132696-1.jpeg
├── Amanita_zambiana-GBIF5069132696-2.jpeg
├── Amanita_zambiana-GBIF5069132732.jpeg
└── Amanita_zambiana-GBIF5104283819.jpeg
```

Note that the names of the downloaded files include the species name, an abbreviation of the repository name, and the unique record key. If more than one media file is associated with a record, a sequential number is appended to the file name.
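The naming pattern can be illustrated with a minimal base R sketch. The helper `build_file_names()` below is hypothetical (it is not part of suwo and is not necessarily how `download_media()` builds names internally); it only reproduces the pattern seen in the file listing above, using two of the record keys from that listing:

``` r
# Hypothetical helper (NOT a suwo function): rebuilds file names following the
# pattern species-REPOSITORYkey[-n].extension
build_file_names <- function(species, repo, key, ext) {
  base <- paste0(gsub(" ", "_", species), "-", repo, key)
  # number of files per record and position of each file within its record
  n_per_record <- ave(seq_along(key), key, FUN = length)
  idx <- ave(seq_along(key), key, FUN = seq_along)
  # add the sequential suffix only when a record has more than one file
  suffix <- ifelse(n_per_record > 1, paste0("-", idx), "")
  paste0(base, suffix, ".", ext)
}

# one record with three files and one record with a single file
keys <- c("4430877067", "4430877067", "4430877067", "5104283819")

file_names <- build_file_names(
  species = "Amanita zambiana",
  repo = "GBIF",
  key = keys,
  ext = "jpeg"
)

print(file_names)
```

Because the record key is embedded in each name, a file can be traced back to its row in the metadata even after it has been moved or copied elsewhere.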


This is a multipanel plot of 6 of the downloaded images (for illustration purposes only):




``` r
# create a 6-panel plot of the downloaded images
opar <- par(mfrow = c(2, 3), mar = c(1, 1, 2, 1))

for (i in 1:6) {
  img <- jpeg::readJPEG(file.path(out_folder, azam_files$downloaded_file_name[i]))
  plot(
    1:2,
    type = 'n',
    axes = FALSE
  )
  graphics::rasterImage(img, 1, 1, 2, 2)
  title(main = paste(
    azam_files$country[i],
    azam_files$date[i],
    sep = "\n"
  ))
}

# reset par
par(opar)
```

<center><img src="amanitas.jpeg" alt="Example images obtained from a query of Amanita zambiana" width="100%"></center>




Users can also save the downloaded files into sub-directories with the argument `folder_by`. This argument takes the name of a character or factor column in the metadata data frame; the values in that column are used to create sub-directories within the main download directory (supplied with the argument `path`). For instance, the following code searches for and downloads images of the longspined porcupinefish (_Diodon holocanthus_) from GBIF and saves them into sub-directories by country (for simplicity, only 6 images are downloaded):


``` r
# query GBIF for longspined porcupinefish images
d_holocanthus <- query_gbif(species = "Diodon holocanthus", format = "image")
```

```
✔ Obtaining metadata (4031 matching records found) 🥳
```

```
! 1 observation does not have a download link and was removed from the results (inlcuded as an attribute called 'excluded_results'). 
```

``` r
# keep only JPEG records (for simplicity for this vignette)
d_holocanthus <- d_holocanthus[d_holocanthus$file_extension == "jpeg", ]

# select 6 random JPEG records
set.seed(666)
d_holocanthus <- d_holocanthus[sample(seq_len(nrow(d_holocanthus)), 6), ]

# create folder for images
out_folder <- file.path(tempdir(), "diodon_holocanthus")
dir.create(out_folder)

# download media files creating sub-directories by country
dhol_files <- download_media(metadata = d_holocanthus,
                             path = out_folder,
                             folder_by = "country")
```

```
Downloading media files:
```

```
✔ All files were downloaded successfully 🥳
```


``` r
fs::dir_tree(path = out_folder)
```


```
/tmp/Rtmpp5V9Oh/diodon_holocanthus
├── Ecuador
│   └── Diodon_holocanthus-GBIF2563608698.jpeg
├── Mexico
│   ├── Diodon_holocanthus-GBIF5077046236.jpeg
│   ├── Diodon_holocanthus-GBIF5903724188.jpeg
│   └── Diodon_holocanthus-GBIF5935933490.jpeg
├── Panama
│   └── Diodon_holocanthus-GBIF3499339312.jpeg
└── Philippines
    └── Diodon_holocanthus-GBIF5134168116.jpeg
```

In such cases, the `downloaded_file_name` column includes the sub-directory name:

``` r
dhol_files$downloaded_file_name
```

```
[1] "Mexico/Diodon_holocanthus-GBIF5903724188.jpeg"     
[2] "Ecuador/Diodon_holocanthus-GBIF2563608698.jpeg"    
[3] "Panama/Diodon_holocanthus-GBIF3499339312.jpeg"     
[4] "Philippines/Diodon_holocanthus-GBIF5134168116.jpeg"
[5] "Mexico/Diodon_holocanthus-GBIF5077046236.jpeg"     
[6] "Mexico/Diodon_holocanthus-GBIF5935933490.jpeg"     
```
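Since these values are paths relative to the download directory, they can be split or extended with base R path helpers. The following is a minimal sketch that hard-codes the values printed above; the object names (`rel_paths`, `out_dir`, `folders`, `file_names`, `full_paths`) are chosen here only for illustration:

``` r
# relative paths as printed above from the 'downloaded_file_name' column
rel_paths <- c(
  "Mexico/Diodon_holocanthus-GBIF5903724188.jpeg",
  "Ecuador/Diodon_holocanthus-GBIF2563608698.jpeg",
  "Panama/Diodon_holocanthus-GBIF3499339312.jpeg",
  "Philippines/Diodon_holocanthus-GBIF5134168116.jpeg",
  "Mexico/Diodon_holocanthus-GBIF5077046236.jpeg",
  "Mexico/Diodon_holocanthus-GBIF5935933490.jpeg"
)

# sub-directory (here, the country) and bare file name of each file
folders <- dirname(rel_paths)
file_names <- basename(rel_paths)

# number of files stored in each sub-directory
print(table(folders))

# full paths are obtained by prepending the download directory
# (assumed here to be the same temporary folder used in the example above)
out_dir <- file.path(tempdir(), "diodon_holocanthus")
full_paths <- file.path(out_dir, rel_paths)
```

The same `file.path()` call is what makes the plotting code in this vignette work unchanged whether or not `folder_by` is used.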

This is a multipanel plot of the downloaded images (just for fun):



``` r
# create a 6-panel plot of the downloaded images
opar <- par(mfrow = c(2, 3), mar = c(1, 1, 2, 1))

for (i in 1:6) {
  img <- jpeg::readJPEG(file.path(out_folder, dhol_files$downloaded_file_name[i]))
  plot(
    1:2,
    type = 'n',
    axes = FALSE
  )
  graphics::rasterImage(img, 1, 1, 2, 2)
  title(main = paste(
    substr(dhol_files$country[i], start = 1, stop = 14),
    dhol_files$date[i],
    sep = "\n"
  ))
}

# reset par
par(opar)
```

<center><img src="porcupinefish.jpeg" alt="Example images obtained from a query of porcupinefish" width="100%"></center>



<!-- add packages used, system details and versions  -->

## Session information {.unnumbered .unlisted}

<details>
  <summary>Click to see</summary>

```
R version 4.5.2 (2025-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=es_CR.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=es_CR.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=es_CR.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=es_CR.UTF-8 LC_IDENTIFICATION=C       

time zone: America/Costa_Rica
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] suwo_0.2.0 knitr_1.51

loaded via a namespace (and not attached):
 [1] xfun_0.57              httr2_1.2.2            lattice_0.22-9         vctrs_0.7.2           
 [5] tools_4.5.2            generics_0.1.4         curl_7.0.0             parallel_4.5.2        
 [9] proxy_0.4-29           RSQLite_2.4.6          blob_1.3.0             Matrix_1.7-4          
[13] data.table_1.18.2.1    checkmate_2.3.4        RColorBrewer_1.1-3     lifecycle_1.0.5       
[17] compiler_4.5.2         farver_2.1.2           stringr_1.6.0          textshaping_1.0.4     
[21] getPass_0.2-4          codetools_0.2-20       htmltools_0.5.9        class_7.3-23          
[25] evd_2.3-7.1            yaml_2.3.12            prodlim_2026.03.11     crayon_1.5.3          
[29] pillar_1.11.1          MASS_7.3-65            rsconnect_1.3.4        cachem_1.1.0          
[33] rpart_4.1.24           parallelly_1.46.1      lava_1.9.0             digest_0.6.39         
[37] stringi_1.8.7          future_1.70.0          listenv_0.10.1         splines_4.5.2         
[41] fastmap_1.2.0          grid_4.5.2             cli_3.6.5              magrittr_2.0.5        
[45] survival_3.8-6         future.apply_1.20.2    e1071_1.7-17           scales_1.4.0          
[49] backports_1.5.1        rappdirs_0.3.4         bit64_4.6.0-1          lubridate_1.9.5       
[53] timechange_0.4.0       rmarkdown_2.31         globals_0.19.1         bit_4.6.0             
[57] nnet_7.3-20            kableExtra_1.4.0       memoise_2.0.1          evaluate_1.0.5        
[61] ff_4.5.2               viridisLite_0.4.3      rlang_1.2.0            Rcpp_1.1.1            
[65] xtable_1.8-8           glue_1.8.0             DBI_1.3.0              RecordLinkage_0.4-12.6
[69] xml2_1.5.2             ipred_0.9-15           svglite_2.2.2          rstudioapi_0.17.1     
[73] jsonlite_2.0.0         R6_2.6.1               fs_2.0.1               systemfonts_1.3.1     
```
</details>
