nanoparquet is a reader and writer for a common subset of Parquet files.
Some Parquet logical types are not supported: FLOAT16, INTERVAL, UNKNOWN.

Install the R package from CRAN:
install.packages("nanoparquet")
Call read_parquet() to read a Parquet file:

df <- nanoparquet::read_parquet("example.parquet")
To see the columns of a Parquet file and how their types are mapped to R types by read_parquet(), call parquet_column_types() first:

nanoparquet::parquet_column_types("example.parquet")
Folders of similar-structured Parquet files (e.g. produced by Spark) can be read like this:
df <- data.table::rbindlist(lapply(
  Sys.glob("some-folder/part-*.parquet"),
  nanoparquet::read_parquet
))
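If data.table is not installed, the same folder can be combined with base R. The sketch below is not part of nanoparquet: the helper read_parquet_folder() is hypothetical, and it assumes every file has the same column names, because rbind() matches data frame columns by name and fails when they differ.

```r
# Hypothetical helper: read every file matching a glob pattern and stack the
# results with base R. Assumes all files share the same column names.
read_parquet_folder <- function(pattern) {
  files <- Sys.glob(pattern)
  parts <- lapply(files, nanoparquet::read_parquet)
  # Convert each part to a plain data.frame so that rbind() uses the base
  # data frame method regardless of the extra class added by read_parquet().
  parts <- lapply(parts, as.data.frame)
  do.call(rbind, parts)
}

# Usage: split mtcars into two part files in a temporary folder, then read
# them back as one data frame.
folder <- file.path(tempdir(), "some-folder")
dir.create(folder, showWarnings = FALSE)
nanoparquet::write_parquet(mtcars[1:16, ], file.path(folder, "part-0.parquet"))
nanoparquet::write_parquet(mtcars[17:32, ], file.path(folder, "part-1.parquet"))
df <- read_parquet_folder(file.path(folder, "part-*.parquet"))
```

data.table::rbindlist() is usually faster on many files; the base R version trades speed for having no extra dependency.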
Call write_parquet() to write a data frame to a Parquet file:

nanoparquet::write_parquet(mtcars, "mtcars.parquet")
To see how the columns of the data frame will be mapped to Parquet types by write_parquet(), call parquet_column_types() first:

nanoparquet::parquet_column_types(mtcars)
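A write followed by a read can be checked with a short round trip. This is a hedged sketch using a temporary file. It compares column values and column names separately, because read_parquet() adds an extra class to the result (see the nanoparquet.class option) and R row names, such as the car names in mtcars, are not assumed to be preserved.

```r
# Write mtcars to a temporary Parquet file and read it back.
path <- tempfile(fileext = ".parquet")
nanoparquet::write_parquet(mtcars, path)
df <- nanoparquet::read_parquet(path)

# Compare column values only: check.attributes = FALSE ignores attributes
# such as class, names and row names, which may differ after the round trip.
same_values <- isTRUE(all.equal(as.data.frame(df), mtcars, check.attributes = FALSE))

# Column names are checked on their own.
same_names <- identical(names(df), names(mtcars))
print(c(values = same_values, names = same_names))
```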
Call parquet_info(), parquet_column_types(), parquet_schema() or parquet_metadata() to see various kinds of metadata from a Parquet file:

parquet_info() shows a basic summary of the file.
parquet_column_types() shows the leaf columns; these are the ones that read_parquet() reads into R.
parquet_schema() shows all columns, including non-leaf columns.
parquet_metadata() shows the most complete metadata information: file metadata, the schema, the row groups and the column chunks of the file.

nanoparquet::parquet_info("mtcars.parquet")
nanoparquet::parquet_column_types("mtcars.parquet")
nanoparquet::parquet_schema("mtcars.parquet")
nanoparquet::parquet_metadata("mtcars.parquet")
If you find a file that should be supported but isn't, please open an issue in the nanoparquet issue tracker with a link to the file.
See also ?parquet_options().
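The options listed here are ordinary R options, set with options(). As a minimal sketch, assuming otherwise default settings, a factor column written by write_parquet() should come back as a factor, while turning off nanoparquet.use_arrow_metadata for the read is expected to lose that information, since the Arrow metadata is what marks the column as a factor. The column names id and size are made up for illustration.

```r
# Write a data frame with a factor column; by default write_parquet() adds
# Arrow metadata that records the column's class.
df <- data.frame(id = 1:3, size = factor(c("small", "large", "small")))
path <- tempfile(fileext = ".parquet")
nanoparquet::write_parquet(df, path)

# Default: read_parquet() uses the Arrow metadata to restore the factor.
with_meta <- nanoparquet::read_parquet(path)
print(is.factor(with_meta$size))

# Temporarily tell read_parquet() to ignore Arrow metadata, then restore
# the previous option value.
old <- options(nanoparquet.use_arrow_metadata = FALSE)
without_meta <- nanoparquet::read_parquet(path)
options(old)
print(is.factor(without_meta$size))
```

Saving the value returned by options() and passing it back restores the previous setting, so the change does not leak into later reads.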
nanoparquet.class: extra class to add to data frames returned by read_parquet(). If it is not defined, the default is "tbl", which changes how the data frame is printed if the pillar package is loaded.
nanoparquet.use_arrow_metadata: unless this is set to FALSE, read_parquet() will make use of Arrow metadata in the Parquet file. Currently this is used to detect factor columns.
nanoparquet.write_arrow_metadata: unless this is set to FALSE, write_parquet() will add Arrow metadata to the Parquet file. This helps preserve the classes of columns, e.g. factors will be read back as factors, both by nanoparquet and Arrow.

License: MIT