The capesR package was developed to facilitate access to and manipulation of data from the Catalog of Theses and Dissertations of the Brazilian Coordination for the Improvement of Higher Education Personnel (CAPES). This catalog includes information on theses and dissertations defended at higher education institutions in Brazil, with variables such as:
This package automates the process of obtaining and organizing this data, making it easily accessible for analysis and reporting.
The original CAPES data is available at dadosabertos.capes.gov.br.
The data used in this package is hosted on the Open Science Framework (OSF).
The download_capes_data function allows you to download CAPES data files hosted on OSF. You can specify the desired years, and the corresponding files will be saved locally.
Download data using the temporary directory (function default):
library(capesR)
library(dplyr)
# Download data for 1987 and 1990
capes_files <- download_capes_data(c(1987, 1990))
# View the list of downloaded files
capes_files %>% glimpse()
In this case, the data will not persist for future uses.
It is recommended to define a persistent directory to store the downloaded data instead of using the default temporary directory (tempdir()). This will allow you to reuse the data in the future.
# Define the directory to store the data (adjust to a writable location on your system)
data_directory <- "/capes_data"
# Download data for 1987 and 1990 using a persistent directory
capes_files <- download_capes_data(
  c(1987, 1990),
  destination = data_directory
)
When using a persistent directory, the data will be downloaded only once. On later calls, the function identifies which files already exist in the directory and returns their paths.
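The reuse behaviour described here can be pictured with a small base R sketch. It is only an illustration of the idea, not the package's implementation: the helpers local_path, fetch_file and download_if_missing are hypothetical, the file naming scheme is assumed, and the "download" just writes a placeholder file.

```r
# Hypothetical helper: maps years to local file paths.
# The naming scheme "capes_<year>.parquet" is an assumption for illustration.
local_path <- function(years, directory) {
  file.path(directory, paste0("capes_", years, ".parquet"))
}

# Hypothetical stand-in for a real download: writes a placeholder file.
fetch_file <- function(path) {
  writeLines("placeholder", path)
  invisible(path)
}

# Fetch only the files that are not yet present, then return every path.
download_if_missing <- function(years, directory) {
  dir.create(directory, recursive = TRUE, showWarnings = FALSE)
  paths <- local_path(years, directory)
  to_fetch <- paths[!file.exists(paths)]
  for (path in to_fetch) fetch_file(path)
  message(length(to_fetch), " file(s) fetched, ",
          length(paths) - length(to_fetch), " reused")
  paths
}

# Start from an empty demo directory inside tempdir()
cache_dir <- file.path(tempdir(), "capes_cache_demo")
unlink(cache_dir, recursive = TRUE)

# First call fetches both files
first_call <- download_if_missing(c(1987, 1990), cache_dir)
# Second call reuses 1987 and 1990 and fetches only 1995
second_call <- download_if_missing(c(1987, 1990, 1995), cache_dir)
```

Because the check is made per file, asking for an extra year later only adds the missing file and leaves the existing ones untouched.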
Use the read_capes_data function to read and combine the downloaded files. It takes a list of file paths, either the one returned by the download_capes_data function or one created manually.
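For the manual case, a list of paths can be assembled with base R. The sketch below uses placeholder files in a temporary directory; the file names and the .parquet extension are assumptions for illustration, and the final call is left commented out because it presumes that read_capes_data accepts the paths as its first argument.

```r
# Stand-in directory with placeholder files; the file names are hypothetical.
data_directory <- file.path(tempdir(), "capes_manual_demo")
dir.create(data_directory, showWarnings = FALSE)
file.create(file.path(
  data_directory,
  c("capes_1987.parquet", "capes_1990.parquet", "notes.txt")
))

# Build the list by hand: keep only the data files, with full paths.
manual_files <- list.files(
  data_directory,
  pattern = "\\.parquet$",
  full.names = TRUE
)

# The result could then be handed to the reader (assumed call shape):
# capes_data <- read_capes_data(manual_files)
```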
Exact filters are applied before the data is read, which improves performance, and the text filter is optimized to accelerate the search.
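To see why filtering early helps, here is a minimal base R sketch of the general idea. It is not how capesR works internally: the small CSV files stand in for the real data, the column names ano_base and uf are assumed, and the named-list shape of the filters is chosen for illustration only.

```r
# Stand-in files: one small CSV per year (real CAPES files differ in format and size).
demo_dir <- file.path(tempdir(), "capes_filter_demo")
dir.create(demo_dir, showWarnings = FALSE)
years <- c(1987, 1990, 1995)
for (year in years) {
  write.csv(
    data.frame(ano_base = year, uf = c("PE", "SP", "CE")),
    file.path(demo_dir, paste0("capes_", year, ".csv")),
    row.names = FALSE
  )
}
files <- data.frame(
  year = years,
  path = file.path(demo_dir, paste0("capes_", years, ".csv")),
  stringsAsFactors = FALSE
)

# Exact filters as a named list: column name -> allowed values (assumed shape).
filters <- list(ano_base = c(1987, 1990), uf = c("PE", "CE"))

# Step 1: use the year filter to choose files without opening any of them.
selected <- files$path[files$year %in% filters$ano_base]

# Step 2: read only the selected files and filter each one as it is read,
# so unneeded rows are dropped before the pieces are combined.
read_filtered <- function(path) {
  chunk <- read.csv(path, stringsAsFactors = FALSE)
  chunk[chunk$uf %in% filters$uf, , drop = FALSE]
}
filtered_data <- do.call(rbind, lapply(selected, read_filtered))
```

In this sketch the 1995 file is never opened, and the rows for other states never reach the combined table, which is where the saving comes from.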
The package also provides a set of synthetic data, capes_synthetic_df, containing aggregated information from the CAPES Catalog of Theses and Dissertations. This synthetic dataset facilitates quick analyses and prototyping without requiring full downloads and processing.
The synthetic data includes the following columns:
The synthetic data is available directly in the package and can be loaded with:
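A minimal sketch, assuming capesR is installed and that the dataset is exposed through R's standard data() mechanism for package datasets. The guard only keeps the snippet harmless when the package is missing.

```r
# Assumes capesR is installed; otherwise the block is skipped.
if (requireNamespace("capesR", quietly = TRUE)) {
  # data() is base R's loader for datasets shipped with a package
  data("capes_synthetic_df", package = "capesR")

  # List the available columns and preview a few rows
  print(names(capes_synthetic_df))
  print(head(capes_synthetic_df))
}
```

Printing the column names is a quick way to check which variables the synthetic data contains before starting an analysis.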