Version Note: Up-to-date with v0.3.0

This article briefly introduces the usage of `dplyr::select` and how it
is applied in this package. In the first section, I will briefly describe
the `dplyr::select` syntax for R beginners. If you are already familiar
with the `dplyr::select` syntax, you can skip to the next section, where
I describe how to apply the syntax in this package.

`dplyr::select` (abbreviated as `select` hereafter) is an extremely
powerful function in R. It allows you to subset columns with a set of
syntax that is also known as the `select` syntax / semantics in the R
community. A side note here: with the introduction of the `dplyr::across`
function, the `select` syntax can also be applied to `dplyr::mutate` and
`dplyr::filter`, which makes these two already powerful functions even
more powerful (for `filter`, newer versions of `dplyr` recommend the
related `if_any()` and `if_all()` helpers instead of `across`). I will
first introduce the usage of `:`, `c()`, and `-`. Then, I will discuss how
to use `everything`, `starts_with`, `ends_with`, `contains`, and `where`.
This is not an exhaustive list of the `select` syntax, but these are the
most relevant ones. If you want to learn more, I encourage you to check
the vignettes of `dplyr` or search online. There are many articles that
discuss this in detail.

I am going to use the `iris` dataset for the demonstration. Let’s take a
quick peek at the dataset.
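The preview below has six rows, which is what `head()` prints by default, so a call along these lines is assumed (`iris` ships with base R, so nothing needs to be installed):

```r
# Show the first six rows of the built-in iris data frame
preview <- head(iris)
preview
```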

```
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
```

If you want to select the first 3 columns, you can use `:` to do that.
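A minimal sketch of the calls, assuming `dplyr` is attached and that only the first row is printed with `head(1)`. The two calls select the same columns, once by position and once by name, which would explain the two identical outputs below:

```r
library(dplyr)

# Select columns 1 through 3 by position
by_position <- iris %>% select(1:3) %>% head(1)
by_position

# Select the same range by column name
by_name <- iris %>% select(Sepal.Length:Petal.Length) %>% head(1)
by_name
```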

```
Sepal.Length Sepal.Width Petal.Length
1 5.1 3.5 1.4
```

```
Sepal.Length Sepal.Width Petal.Length
1 5.1 3.5 1.4
```

Next, if you want to combine selections, you can use `c()`. For example,
say you want the 1st, 3rd, and 4th columns. You can do it like this:
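As a sketch (again assuming `dplyr` is attached and `head(1)` is used for printing), the combination can be written with positions or with names. The exact calls are an assumption, chosen to reproduce the columns shown below:

```r
library(dplyr)

# Combine a single position with a range of positions
by_position <- iris %>% select(c(1, 3:4)) %>% head(1)
by_position

# The same selection written with column names
by_name <- iris %>% select(c(Sepal.Length, Petal.Length, Petal.Width)) %>% head(1)
by_name
```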

```
Sepal.Length Petal.Length Petal.Width
1 5.1 1.4 0.2
```

```
Sepal.Length Petal.Length Petal.Width
1 5.1 1.4 0.2
```

Finally, if you want to drop a column from the selection, you can use
`-`. For example, if you want to select all columns except the 3rd
column, you can do it like this:
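A minimal sketch, assuming `dplyr` is attached. The 3rd column of `iris` is `Petal.Length`, so it can be dropped either by position or by name:

```r
library(dplyr)

# Drop the 3rd column by position
by_position <- iris %>% select(-3) %>% head(1)
by_position

# Drop the same column by name
by_name <- iris %>% select(-Petal.Length) %>% head(1)
by_name
```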

```
Sepal.Length Sepal.Width Petal.Width Species
1 5.1 3.5 0.2 setosa
```

```
Sepal.Length Sepal.Width Petal.Width Species
1 5.1 3.5 0.2 setosa
```

Ok. Now you understand the basic usage. Let’s get to something a little
bit more advanced. First, let’s talk about my favorite, which is
`everything`. As the name implies, it selects all the variables in the
data frame. It is usually used in combination with `c()` if you are using
it in the `select` function. However, it is very powerful in other use
cases like the one in this package. For example, if you want to fit a
linear regression with all the variables, you can use `everything` (a more
detailed discussion is presented in the next section).
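A sketch of both uses, assuming `dplyr` is attached. The second call combines `everything()` with `-` inside `c()`; dropping `Sepal.Width` is an assumption chosen to match the second output below:

```r
library(dplyr)

# everything() on its own keeps every column
all_columns <- iris %>% select(everything()) %>% head(1)
all_columns

# Combined with c() and -, it means "every column except this one"
all_but_one <- iris %>% select(c(everything(), -Sepal.Width)) %>% head(1)
all_but_one
```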

```
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
```

```
Sepal.Length Petal.Length Petal.Width Species
1 5.1 1.4 0.2 setosa
```

Next, we can talk about `starts_with`. `starts_with` selects all columns
whose names start with a specified string. For example, if we want to
select all columns that start with Sepal, we can do something like this:
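A minimal sketch, assuming `dplyr` is attached. Note that the prefix is passed as a quoted string:

```r
library(dplyr)

# Keep only the columns whose names begin with "Sepal"
sepal_columns <- iris %>% select(starts_with("Sepal")) %>% head(1)
sepal_columns
```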

```
Sepal.Length Sepal.Width
1 5.1 3.5
```

Similar to `starts_with`, `ends_with` selects all columns whose names end
with a specified string. For example, we want to select all columns that
end with Width.
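A minimal sketch, assuming `dplyr` is attached:

```r
library(dplyr)

# Keep only the columns whose names end with "Width"
width_columns <- iris %>% select(ends_with("Width")) %>% head(1)
width_columns
```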

```
Sepal.Width Petal.Width
1 3.5 0.2
```

Next, we are going to talk about `contains`. As the name implies, it
selects all columns whose names contain a specified string.
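A sketch of three calls, assuming `dplyr` is attached. The search strings (`"Sepal"`, `"Width"`, and `"."`) are assumptions chosen to reproduce the three outputs below; `contains` matches the string literally, so `"."` here means an actual dot, not a wildcard:

```r
library(dplyr)

# Names containing "Sepal"
with_sepal <- iris %>% select(contains("Sepal")) %>% head(1)
with_sepal

# Names containing "Width"
with_width <- iris %>% select(contains("Width")) %>% head(1)
with_width

# Names containing a literal dot (every column except Species)
with_dot <- iris %>% select(contains(".")) %>% head(1)
with_dot
```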

```
Sepal.Length Sepal.Width
1 5.1 3.5
```

```
Sepal.Width Petal.Width
1 3.5 0.2
```

```
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.1 3.5 1.4 0.2
```

Finally, we are going to conclude this section with `where`. `where` is
not used alone. It is usually paired with a function that returns `TRUE`
or `FALSE`. I think the most common use case for this package is pairing
it with `is.numeric`. `where(is.numeric)` will select all numeric
variables. A little tip: you need to pass `is.numeric` instead of
`is.numeric()`. I will not go into the details of why because that is
outside the scope of this article. It requires a slightly more advanced
understanding of how functions work in R. If you have that, you wouldn’t
be reading this article anyway.
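A minimal sketch, assuming `dplyr` is attached. The function is passed by name, without parentheses, and `where` applies it to each column in turn:

```r
library(dplyr)

# Keep only the columns for which is.numeric() returns TRUE
numeric_columns <- iris %>% select(where(is.numeric)) %>% head(1)
numeric_columns
```

In `iris`, `Species` is a factor, so it is the only column left out.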

```
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.1 3.5 1.4 0.2
```

First, I will demonstrate the usage with linear regression. I will first create a data frame. You don’t need to know anything about how this data frame is created. Just know that it has 1 DV / outcome / response variable (i.e., y) and 5 IVs / predictor variables (i.e., x1 to x5).

```
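# Fix the random seed so the simulated data are reproducible
set.seed(1)
# Simulate 100 observations: one response (y) and five predictors (x1 to x5)
test_data = data.frame(
  y  = rnorm(n = 100, mean = 2,   sd = 3),
  x1 = rnorm(n = 100, mean = 1.5, sd = 4),
  x2 = rnorm(n = 100, mean = 1.7, sd = 4),
  x3 = rnorm(n = 100, mean = 1.5, sd = 4),
  x4 = rnorm(n = 100, mean = 2,   sd = 4),
  x5 = rnorm(n = 100, mean = 1.5, sd = 4)
)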
```

Ok, let’s fit that linear regression now.

```
# Without this package:
model1 = lm(data = test_data, formula = y ~ x1 + x2 + x3 + x4 + x5)
# With this package:
model2 = lm_model(data = test_data,
                  response_variable = y,
                  predictor_variable = c(everything(), -y))
```

```
Fitting Model with lm:
Formula = y ~ x1 + x2 + x3 + x4 + x5
```

This is already a step up from the basic `lm()` function. We can still
make it even simpler by just passing `everything()`. The function is
designed to remove the response variable from the predictor variables (if
selected) automatically. The following `model3` is the same as `model2`.
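A sketch of what the `model3` call presumably looks like, reusing the argument names from `model2`. It assumes the package that provides `lm_model()` is already attached (its `library()` call is left out on purpose) and recreates `test_data` so the block can be run on its own:

```r
library(dplyr) # provides everything()

# Recreate the simulated data from the earlier example
set.seed(1)
test_data <- data.frame(
  y  = rnorm(n = 100, mean = 2,   sd = 3),
  x1 = rnorm(n = 100, mean = 1.5, sd = 4),
  x2 = rnorm(n = 100, mean = 1.7, sd = 4),
  x3 = rnorm(n = 100, mean = 1.5, sd = 4),
  x4 = rnorm(n = 100, mean = 2,   sd = 4),
  x5 = rnorm(n = 100, mean = 1.5, sd = 4)
)

# Assumption: lm_model() comes from this package, which must be attached first.
# everything() also selects y; per the text above, the function removes the
# response variable from the predictors automatically.
model3 <- lm_model(data = test_data,
                   response_variable = y,
                   predictor_variable = everything())
```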

```
Fitting Model with lm:
Formula = y ~ x1 + x2 + x3 + x4 + x5
```

The same logic applies to all other functions in this package. Arguments
that support `dplyr::select` syntax will end with “support dplyr::select
syntax” in the description of the argument. That’s it for this brief
introduction. If you want to learn more about this package, I encourage
you to check out this article or use `vignette('quick-introduction')` if
you are in RStudio.