---
title: "Introduction to nmfkc"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to nmfkc}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

## Introduction

Welcome to the `nmfkc` package!
This vignette provides a beginner-friendly introduction to the core function, `nmfkc()`.

**Non-negative Matrix Factorization (NMF)** is a technique that decomposes a large data matrix $Y$ into two smaller matrices, $X$ and $B$:
$$Y \approx X B$$
The key feature of NMF is that all elements must be **non-negative** ($\ge 0$). This makes the results intuitive, as the original data can be understood as an additive combination of parts.
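
As a small illustration of the "additive combination of parts" idea, the base-R sketch below multiplies two made-up non-negative matrices. The names `X_parts`, `B_parts`, and `Y_parts` and all of their values are hypothetical; nothing here is produced by `nmfkc()`.

```{r additive-parts}
# Hypothetical non-negative factors (values invented for illustration)
X_parts <- matrix(c(2, 0,
                    1, 1,
                    0, 3), nrow = 3, byrow = TRUE)    # 3 observations x 2 parts
B_parts <- matrix(c(1, 2, 0,
                    0, 1, 4), nrow = 2, byrow = TRUE) # 2 parts x 3 variables

# The product of non-negative matrices is itself non-negative
Y_parts <- X_parts %*% B_parts

# Row 2 of Y_parts equals 1 * (part 1) + 1 * (part 2): parts are only ever added
Y_parts[2, ]
X_parts[2, 1] * B_parts[1, ] + X_parts[2, 2] * B_parts[2, ]
```

Because no entry is negative, one part can never cancel another, which is what makes the factors readable as building blocks.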

In this guide, we will cover:

1.  **Basic NMF**: Extracting latent topics using a Movie Ratings example.
2.  **Interpretation**: Understanding what the decomposed matrices represent.
3.  **Visualization**: Checking convergence and comparing the fitted matrix with the data.
4.  **Missing Values**: How to handle and predict missing data (e.g., for recommendations).

-----

## 1. Basic Usage: Analyzing Movie Ratings

To understand NMF, let's imagine a scenario with **5 Users** rating **4 Movies** on a scale of 1 to 5.

First, load the package.

```{r load-package}
library(nmfkc)
```

### Creating the Data

We create a rating matrix `Y`.
The dataset contains two hidden genres: "Action" (Movies 1 & 2) and "Romance" (Movies 3 & 4).

```{r create-data}
# Rows: Users (U1-U5), Cols: Movies (M1-M4)
# U1, U2, U3 prefer Action movies.
# U4, U5 prefer Romance movies.
Y <- matrix(
  c(5, 4, 1, 1,
    4, 5, 1, 2,
    5, 5, 2, 2,
    1, 2, 5, 4,
    1, 1, 4, 5),
  nrow = 5, byrow = TRUE
)

# Assign names for better interpretation
rownames(Y) <- paste0("User", 1:5)
colnames(Y) <- c("Action1", "Action2", "Romance1", "Romance2")

# Check the data
print(Y)
```

### Running NMF

We use the `nmfkc()` function to decompose this matrix.
Since we assume there are 2 genres (Action and Romance), we set **rank = 2**.

```{r run-nmfkc}
# Run NMF with rank = 2
res <- nmfkc(Y, rank = 2, seed = 123)
```
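
To build intuition for what a rank-2 factorization involves, here is a toy solver written in base R. It uses the classical Lee-Seung multiplicative updates for squared error. This is a sketch under stated assumptions: `toy_nmf()` is a hypothetical helper, and whether `nmfkc()` uses the same updates internally is not stated in this vignette.

```{r toy-multiplicative-updates}
# Toy NMF solver using Lee-Seung multiplicative updates for squared error.
# Illustration only: toy_nmf() is a hypothetical helper, not part of nmfkc.
toy_nmf <- function(Y, rank, n_iter = 200, seed = 1) {
  set.seed(seed)
  eps <- 1e-10  # guards against division by zero
  X <- matrix(runif(nrow(Y) * rank), nrow(Y), rank)
  B <- matrix(runif(rank * ncol(Y)), rank, ncol(Y))
  loss <- numeric(n_iter)
  for (i in seq_len(n_iter)) {
    # Each update multiplies by a non-negative ratio, so X and B stay non-negative
    B <- B * (t(X) %*% Y) / (t(X) %*% X %*% B + eps)
    X <- X * (Y %*% t(B)) / (X %*% B %*% t(B) + eps)
    loss[i] <- sum((Y - X %*% B)^2)
  }
  list(X = X, B = B, loss = loss)
}

# Same ratings as the matrix Y defined earlier, repeated so the chunk stands alone
Y_demo <- matrix(c(5, 4, 1, 1,
                   4, 5, 1, 2,
                   5, 5, 2, 2,
                   1, 2, 5, 4,
                   1, 1, 4, 5), nrow = 5, byrow = TRUE)

toy <- toy_nmf(Y_demo, rank = 2)
dim(toy$X)       # users x rank
dim(toy$B)       # rank x movies
range(toy$loss)  # smallest and largest squared error seen during the iterations
```

The sketch returns factors with the same shapes as `res$X` and `res$B`, but its numbers need not match those from `nmfkc()`.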

### Interpretation

NMF decomposes $Y$ into $X$ (Basis) and $B$ (Coefficient).
*(Note: The order of the bases may vary with the random seed. In this example, with `seed = 123`, Basis 1 corresponds to Action and Basis 2 to Romance.)*

#### 1. Basis Matrix X: User Preferences

The matrix $X$ shows **how much each user likes each genre (basis)**.

```{r interpret-X}
# Each column represents a latent factor (Basis)
res$X
```

  * **Basis1**: High values for **User1, User2, and User3** (Action fans).
  * **Basis2**: High values for **User4 and User5** (Romance fans).

#### 2. Coefficient Matrix B: Movie Genres

The matrix $B$ shows **which genre (basis) each movie belongs to**.

```{r interpret-B}
# Each row represents a latent factor
res$B
```

  * **Basis1**: High weights on **Action1 and Action2**.
  * **Basis2**: High weights on **Romance1 and Romance2**.

As you can see, NMF recovered the hidden structure ("Action" vs. "Romance") and the user preferences without being told which movies belong to which genre.
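
Reading off the largest weight per row, as done by eye above, can also be scripted. The sketch below is one possible way to do it in base R; `dominant_basis()` is a hypothetical helper (not an `nmfkc` function) and the weights in `W_toy` are invented for illustration.

```{r dominant-basis}
# dominant_basis() is a hypothetical helper, not an nmfkc function.
# For each row of a non-negative weight matrix, return the name of the
# column that holds the largest weight.
dominant_basis <- function(W) {
  labels <- colnames(W)
  if (is.null(labels)) labels <- paste0("Basis", seq_len(ncol(W)))
  out <- labels[max.col(W, ties.method = "first")]
  names(out) <- rownames(W)
  out
}

# Invented weights, shaped like a small users x bases matrix
W_toy <- matrix(c(0.9, 0.1,
                  0.7, 0.3,
                  0.2, 0.8),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("User", 1:3), c("Basis1", "Basis2")))
dominant_basis(W_toy)
```

The same helper could be applied to `res$X` to label users, or to `t(res$B)` to label movies. Because the bases may be on different scales, treat such labels as a rough summary rather than a hard classification.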

-----

## 2. Visualization

`nmfkc` provides tools to visually diagnose your model.

### Convergence Plot

Use the `plot()` function to check whether the error decreased and leveled off over the iterations.

```{r plot-convergence}
plot(res, main = "Convergence Plot")
```

### Visualizing the Reconstruction

The `nmfkc.residual.plot()` function allows you to compare the **Original Matrix ($Y$)**, the **Fitted Matrix ($XB$)**, and the **Residuals ($E$)** side-by-side.

```{r plot-residual, fig.width=9, fig.height=4}
# Visualize Original vs Fitted vs Residuals
nmfkc.residual.plot(Y, res)
```

The middle plot (Fitted Matrix) successfully captures the block structure of the original data.
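
The residual matrix is simply $E = Y - XB$, so the visual check can be complemented with a numeric one. The following is a minimal sketch in base R; `fit_summary()` is a hypothetical helper (not an `nmfkc` function) and the 2 x 2 matrices are invented for illustration.

```{r residual-summary}
# fit_summary() is a hypothetical helper, not an nmfkc function.
# It summarizes the residual matrix E = Y - fitted, skipping any NA entries.
fit_summary <- function(Y, fitted) {
  E <- Y - fitted
  c(rmse    = sqrt(mean(E^2, na.rm = TRUE)),  # root mean squared residual
    max_abs = max(abs(E), na.rm = TRUE))      # largest single residual
}

# Invented 2 x 2 example: every entry is off by exactly 0.5
Y_small <- matrix(c(5, 1,
                    1, 5), nrow = 2, byrow = TRUE)
fitted_small <- matrix(c(4.5, 1.5,
                         1.5, 4.5), nrow = 2, byrow = TRUE)
fit_summary(Y_small, fitted_small)
```

In this vignette the same call could be written as `fit_summary(Y, res$XB)`, assuming `res$XB` holds the fitted matrix as it does in the imputation section.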

-----

## 3. Handling Missing Values (Imputation)

A powerful feature of `nmfkc` is its robustness to **Missing Values (`NA`)**. This is useful for tasks like **Recommendation Systems**, where you want to predict how a user would rate a movie they haven't seen yet.

### Creating Data with Missing Values

Let's assume **User1** has not seen **Action1** yet. We set this value to `NA`.

```{r create-na}
Y_missing <- Y
Y_missing["User1", "Action1"] <- NA # Introduce missing value
print(Y_missing)
```

### Running NMF with NAs

Simply pass the matrix with `NA`s to `nmfkc()`. The algorithm automatically handles them by ignoring the missing entries during optimization.

```{r run-na}
res_na <- nmfkc(Y_missing, rank = 2, seed = 123)
```
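
One common way to "ignore" missing entries is to give them zero weight in the squared-error loss, so that only observed ratings drive the updates. The base-R sketch below does this with weighted multiplicative updates. It is an assumption-laden illustration: `toy_masked_nmf()` is a hypothetical helper, and the exact scheme used inside `nmfkc()` may differ.

```{r toy-masked-nmf}
# toy_masked_nmf() is a hypothetical helper, not part of nmfkc.
# Missing entries get weight 0, so they never contribute to the updates.
toy_masked_nmf <- function(Y, rank, n_iter = 500, seed = 1) {
  set.seed(seed)
  eps <- 1e-10              # guards against division by zero
  M <- !is.na(Y)            # TRUE where a rating was observed
  Y0 <- Y
  Y0[!M] <- 0               # placeholder value; these cells are masked out
  X <- matrix(runif(nrow(Y) * rank), nrow(Y), rank)
  B <- matrix(runif(rank * ncol(Y)), rank, ncol(Y))
  for (i in seq_len(n_iter)) {
    B <- B * (t(X) %*% Y0) / (t(X) %*% (M * (X %*% B)) + eps)
    X <- X * (Y0 %*% t(B)) / ((M * (X %*% B)) %*% t(B) + eps)
  }
  fitted <- X %*% B
  list(fitted = fitted, loss = sum((Y - fitted)[M]^2))  # loss on observed cells only
}

# Same ratings as Y_missing above, repeated so the chunk stands alone
Y_na_demo <- matrix(c(NA, 4, 1, 1,
                       4, 5, 1, 2,
                       5, 5, 2, 2,
                       1, 2, 5, 4,
                       1, 1, 4, 5), nrow = 5, byrow = TRUE)

fit <- toy_masked_nmf(Y_na_demo, rank = 2)

# Fill only the missing cells; observed ratings are left untouched
Y_filled <- Y_na_demo
Y_filled[is.na(Y_na_demo)] <- fit$fitted[is.na(Y_na_demo)]
Y_filled[1, 1]  # the toy solver's estimate for the held-out entry
```

The estimate from this toy solver need not equal the one from `nmfkc()`; the point is only that the fitted matrix is defined at every cell, including those that were never observed.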

### Predicting the Unknown Rating

The fitted model ($XB$) provides an estimate for the missing entry.

```{r impute-na}
# Extract the predicted value from the fitted matrix XB
predicted_rating <- res_na$XB["User1", "Action1"]
actual_rating <- Y["User1", "Action1"] # The original hidden value (5)

cat(paste0("Actual Rating:    ", actual_rating, "\n"))
cat(paste0("Predicted Rating: ", round(predicted_rating, 2), "\n"))
```

Because User1 rated the other Action movie highly, the model predicted a **reasonably high rating (3.62)** for the missing Action movie. This is closer to the actual rating (5) than to the ratings of 1 that User1 gave both Romance movies.

## Summary

With the `nmfkc` package, you can easily:

1.  **Decompose** complex data into interpretable parts ($X$ and $B$).
2.  **Handle missing values** robustly for imputation and prediction.
3.  **Visualize** the results to verify the fit.

For more advanced topics, such as Time Series Analysis or Covariate-assisted NMF, please refer to the other vignettes (`Topic Modeling` and `Time Series Analysis`).

