## Introduction

Dichotomising a continuous outcome is considered bad statistical
practice, but some applications require predicted probabilities rather
than predicted values. To obtain predicted values, we recommend modelling
the original continuous outcome with *linear regression*. To
obtain predicted probabilities, we recommend not modelling the artificial
binary outcome with *logistic regression*, but modelling the
original continuous outcome and the artificial binary outcome with
*combined regression*.
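
The contrast between the two standard approaches can be sketched in base R on low-dimensional toy data. This is only an illustration under stated assumptions: the data, the variable names, the Gaussian-residual conversion of linear predictions into probabilities, and the equal weighting are all hypothetical choices, and the sketch does not show how `cornet` itself combines the two models.

```r
# Toy data (hypothetical; not the package's example)
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rnorm(n, mean = x)          # original continuous outcome
cutoff <- 0
z <- as.integer(y > cutoff)      # artificial binary outcome

# Linear regression on the continuous outcome: predicted values
fit_lin <- lm(y ~ x)
y_hat <- fitted(fit_lin)

# Logistic regression on the dichotomised outcome: predicted probabilities
fit_log <- glm(z ~ x, family = binomial)
p_log <- fitted(fit_log)

# Probabilities from the linear fit, ASSUMING Gaussian residuals:
# P(y > cutoff) = pnorm((y_hat - cutoff) / sigma)
sigma_hat <- summary(fit_lin)$sigma
p_lin <- pnorm((y_hat - cutoff) / sigma_hat)

# Equal-weight average of both probabilities (the weight 0.5 is arbitrary)
p_comb <- 0.5 * p_lin + 0.5 * p_log
summary(p_comb)
```

The linear fit uses the full information in `y`, whereas the logistic fit only sees on which side of the cutoff each sample falls.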

## Installation

Install the current release from CRAN:

`install.packages("cornet")`

Or install the development version from GitHub:

```
#install.packages("devtools")
devtools::install_github("rauschenberger/cornet")
```

Then load and attach the package:

`library(cornet)`

## Example

We simulate data for \(n\) samples and \(p\) features, in a
high-dimensional setting (\(p \gg n\)). The matrix \(\boldsymbol{X}\)
with \(n\) rows and \(p\) columns represents the features, and the
vector \(\boldsymbol{y}\) of length \(n\) represents the continuous
outcome.

```
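set.seed(1)                              # reproducible simulation
n <- 100; p <- 500                       # samples and features (p >> n)
X <- matrix(rnorm(n*p),nrow=n,ncol=p)    # feature matrix
beta <- rbinom(n=p,size=1,prob=0.05)     # sparse effects (about 5% non-zero)
y <- rnorm(n=n,mean=X%*%beta)            # continuous outcome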
```

We use the function `cornet` for modelling the original continuous
outcome and the artificial binary outcome. The argument `cutoff` splits
the samples into two groups: those with an outcome less than or equal to
the cutoff, and those with an outcome greater than the cutoff.

```
model <- cornet(y=y,cutoff=0,X=X)
model
```
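
The grouping implied by the cutoff can be inspected directly in base R. The following sketch regenerates the simulated data and dichotomises the outcome at zero; the names `cutoff` and `z` are illustrative and not part of the package's interface.

```r
# Regenerate the simulated data from the example
set.seed(1)
n <- 100; p <- 500
X <- matrix(rnorm(n*p),nrow=n,ncol=p)
beta <- rbinom(n=p,size=1,prob=0.05)
y <- rnorm(n=n,mean=X%*%beta)

# Dichotomise: 0 if y <= cutoff, 1 if y > cutoff
cutoff <- 0
z <- as.integer(y > cutoff)

# Number of samples in each group
table(z)
```

A quick look at the group sizes helps to spot cutoffs that leave one group nearly empty.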

The function `coef` returns the estimated coefficients. The first column
is for the linear model (beta), and the second column is for the
logistic model (gamma). The first row includes the estimated intercepts,
and the other rows include the estimated slopes.
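
A minimal usage sketch, assuming the package is installed and that the returned object has the layout described above (one row for the intercepts plus one row per feature, and two columns). The name `coefs` is chosen here for illustration.

```r
library(cornet)  # assumes the package is installed

# Simulated data and model as in the example
set.seed(1)
n <- 100; p <- 500
X <- matrix(rnorm(n*p),nrow=n,ncol=p)
beta <- rbinom(n=p,size=1,prob=0.05)
y <- rnorm(n=n,mean=X%*%beta)
model <- cornet(y=y,cutoff=0,X=X)

# Extract the coefficients of both models
coefs <- coef(model)
dim(coefs)    # rows: intercept and slopes; columns: linear and logistic model
head(coefs)   # first row holds the intercepts
```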

The function `predict` returns fitted values for training data, or
predicted values for testing data. The argument `newx` specifies the
feature matrix. The output is a matrix with one column for each model.

`pred <- predict(model,newx=X)`
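
To predict for testing data, the model can be fitted on one subset of the samples and applied to another. The sketch below assumes the package is installed; the 80/20 split and the names `train`, `test` and `pred_test` are arbitrary illustrative choices.

```r
library(cornet)  # assumes the package is installed

# Simulated data as in the example
set.seed(1)
n <- 100; p <- 500
X <- matrix(rnorm(n*p),nrow=n,ncol=p)
beta <- rbinom(n=p,size=1,prob=0.05)
y <- rnorm(n=n,mean=X%*%beta)

# Arbitrary split into training and testing samples
train <- 1:80
test <- 81:100

# Fit on the training samples, predict for the testing samples
model_train <- cornet(y=y[train],cutoff=0,X=X[train,])
pred_test <- predict(model_train,newx=X[test,])

dim(pred_test)        # one row per testing sample, one column per model
colnames(pred_test)   # inspect which column belongs to which model
```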

The function `cv.cornet` measures the predictive performance of combined
regression by nested cross-validation, in comparison with logistic
regression.

`cv.cornet(y=y,cutoff=0,X=X)`

Here we observe that combined regression outperforms logistic
regression (lower logistic deviance), and that logistic regression is
only slightly better than the intercept-only model.
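
To make the comparison criterion concrete, the following base R sketch computes a logistic deviance for a vector of predicted probabilities. The helper `logistic_deviance` is hypothetical, and its scaling (minus twice the mean log-likelihood) is one common convention that may differ from the one used inside the package.

```r
# Hypothetical helper: logistic deviance of predicted probabilities
# z: binary outcome (0/1); prob: predicted probabilities of z = 1
logistic_deviance <- function(z, prob, eps = 1e-6) {
  prob <- pmin(pmax(prob, eps), 1 - eps)   # guard against log(0)
  -2 * mean(z * log(prob) + (1 - z) * log(1 - prob))
}

z <- c(0, 0, 1, 1)

# Uninformative predictions (always 0.5) give 2*log(2)
dev_null <- logistic_deviance(z, rep(0.5, 4))

# Informative predictions give a lower deviance
dev_good <- logistic_deviance(z, c(0.1, 0.2, 0.8, 0.9))

c(null = dev_null, good = dev_good)
```

Lower values indicate predicted probabilities that agree better with the observed groups, which is the sense in which one model outperforms another here.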