---
title: "Introduction to vivaglint"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to vivaglint}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Overview

**vivaglint** is an R package built for HR analysts who work with Microsoft Viva Glint survey exports. It handles the repetitive data wrangling that Glint's native UI doesn't support — multi-cycle trend analysis, manager roll-ups, demographic segmentation, attrition risk scoring, and comment search — so you can spend more time interpreting results and less time reshaping data.

## Key Capabilities

- **Data Import & Validation**: Reads Glint CSV exports and validates the structure automatically
- **Summary Statistics**: Mean, SD, Glint Score (0–100), response counts, favorability percentages
- **Multi-Cycle Trending**: Compare engagement scores across survey waves
- **Manager Roll-Ups**: Aggregate results by direct or full reporting tree
- **Demographic Segmentation**: Slice results by any employee attribute (department, gender, tenure, etc.)
- **Attrition Risk Analysis**: Link survey scores to actual turnover outcomes
- **Comment Search**: Full-text search across all comment columns at once

## Getting Started

### Installation

```{r eval=FALSE}
# install.packages("devtools")  # if devtools is not already installed
devtools::install_github("microsoft/vivaglint")
library(vivaglint)
```

### Basic Workflow

#### 1. Import Your Glint Export

Export your survey from Viva Glint as a CSV and load it with `read_glint_survey()`:

```{r eval=FALSE}
survey_path <- system.file("extdata", "survey_export.csv", package = "vivaglint")
survey <- read_glint_survey(survey_path, emp_id_col = "EMP ID")
```

If you prefer to pull data directly from the Viva Glint API, configure
credentials once and use `read_glint_survey_api()`:

```{r eval=FALSE}
glint_setup(
  tenant_id = "your-tenant-id",
  client_id = "your-client-id",
  client_secret = "your-client-secret",
  experience_name = "your-experience-name"
)
survey <- read_glint_survey_api(
  survey_uuid = "your-survey-uuid",
  cycle_id = "your-cycle-id",
  emp_id_col = "EMP ID"
)
```

This returns a `glint_survey` object containing:

- **data**: The full survey response table
- **metadata**: Question names, column mappings, and respondent counts

#### 2. Get a Question-Level Summary

Calculate metrics across all survey questions in one call:

```{r eval=FALSE}
survey_summary <- summarize_survey(survey, scale_points = 5)
```

Each row in the output represents one question and includes:

- `mean` — Average response on the raw scale
- `sd` — Standard deviation
- `glint_score` — Score transformed to 0–100, matching what appears in the Viva Glint UI
- `n_responses` — Number of employees who answered the question
- `n_skips` — Number of employees who skipped the question
- `n_total` — Total respondents
- `pct_favorable` — Percentage of favorable responses
- `pct_neutral` — Percentage of neutral responses
- `pct_unfavorable` — Percentage of unfavorable responses

**About Glint Score**: The Glint Score is calculated as `round(((mean - 1) / (scale_points - 1)) * 100)`, placing every question on a common 0–100 scale regardless of the original scale format.
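As a quick sanity check, the transformation can be reproduced in base R. The helper below and its inputs are purely illustrative, not part of the package:

```r
# Hypothetical helper reproducing the documented formula
glint_score <- function(mean, scale_points) {
  round(((mean - 1) / (scale_points - 1)) * 100)
}

glint_score(4.2, scale_points = 5)  # 80
glint_score(1,   scale_points = 5)  # 0   (scale minimum)
glint_score(5,   scale_points = 5)  # 100 (scale maximum)
```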

**About Favorability**: The package uses Viva Glint's standard favorability thresholds for each scale type. On a 5-point scale, for example, 4–5 is favorable, 3 is neutral, and 1–2 is unfavorable.
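The bucketing can be sketched in base R for a 5-point item. The responses below are hypothetical, and the cut points are the 5-point defaults described above; other scale types use their own thresholds:

```r
responses <- c(5, 4, 4, 3, 2, 1)  # hypothetical answers to one item

pct_favorable   <- mean(responses >= 4) * 100  # 4-5
pct_neutral     <- mean(responses == 3) * 100  # 3
pct_unfavorable <- mean(responses <= 2) * 100  # 1-2

c(favorable = pct_favorable, neutral = pct_neutral, unfavorable = pct_unfavorable)
# roughly 50 / 16.7 / 33.3
```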

#### 3. Focus on Specific Questions

If you're presenting to a leadership team or investigating a particular theme, filter to just the questions you care about:

```{r eval=FALSE}
# Engagement-specific questions
engagement_qs <- c(
  "I would recommend my team as a great place to work",
  "My work is meaningful"
)
engagement_summary <- summarize_survey(survey,
                                       scale_points = 5,
                                       questions = engagement_qs)
```

#### 4. Explore Response Distributions

See exactly how many employees chose each response value — useful when a mean alone doesn't tell the full story:

```{r eval=FALSE}
distributions <- get_response_dist(survey, scale_points = 5)
```

The output adds columns like `count_1`, `count_2`, `pct_1`, `pct_2`, etc. for each response value.
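Conceptually, those columns are what you would get by tabulating each question column; a base-R sketch for a single hypothetical item:

```r
item <- c(5, 5, 4, 3, 1, NA)  # one question's responses; NA = skipped

counts <- table(factor(item, levels = 1:5))  # count_1 .. count_5
pcts   <- prop.table(counts) * 100           # pct_1 .. pct_5 (of answered)

counts
#> 1 2 3 4 5
#> 1 0 1 1 2
```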

---

## Multi-Cycle Trend Analysis

One of the most common HR reporting tasks is tracking whether engagement improved since the last survey. `compare_cycles()` takes multiple survey objects and aligns them by question:

```{r eval=FALSE}
# Each cycle would normally come from its own export; the bundled sample
# file is reused here purely for illustration
survey_path <- system.file("extdata", "survey_export.csv", package = "vivaglint")
survey_q1 <- read_glint_survey(survey_path, emp_id_col = "EMP ID")
survey_q2 <- read_glint_survey(survey_path, emp_id_col = "EMP ID")
survey_q3 <- read_glint_survey(survey_path, emp_id_col = "EMP ID")

trends <- compare_cycles(
  survey_q1, survey_q2, survey_q3,
  scale_points = 5,
  cycle_names = c("Q1 FY25", "Q2 FY25", "Q3 FY25")
)
```

The output includes all metrics from `summarize_survey()` for each cycle, plus:

- `change_from_previous` — Point change in mean score vs. prior cycle
- `pct_change_from_previous` — Percentage change vs. prior cycle

This is the foundation for executive trend slides: which items are improving, holding steady, or declining quarter over quarter.
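The two change columns are straightforward arithmetic on successive cycle means; with hypothetical values:

```r
mean_q1 <- 3.80  # hypothetical cycle means for one item
mean_q2 <- 3.99

change_from_previous     <- mean_q2 - mean_q1                    # 0.19 points
pct_change_from_previous <- (mean_q2 - mean_q1) / mean_q1 * 100  # 5%
```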

---

## Manager-Level Analysis

HR business partners often need to understand which managers' teams are scoring below average, or identify pockets of high engagement to learn from.

### Roll Up to Manager Level

```{r eval=FALSE}
# Direct reports only
manager_summary <- aggregate_by_manager(survey, scale_points = 5)

# Full org tree (includes indirect reports)
manager_full <- aggregate_by_manager(survey, scale_points = 5, full_tree = TRUE)
```

Each row represents one manager × question combination and includes:

- `manager_id`, `manager_name`
- `team_size`
- All standard metrics: `mean`, `sd`, `glint_score`, `n_responses`, `n_skips`, `n_total`, `pct_favorable`, `pct_neutral`, `pct_unfavorable`

You can filter the result to identify managers with low favorability on a specific question:

```{r eval=FALSE}
library(dplyr)

# Managers where fewer than 50% of their team is favorable on a key item
low_engagement_managers <- manager_summary %>%
  filter(question == "I would recommend my team as a great place to work",
         pct_favorable < 50) %>%
  arrange(pct_favorable)
```

---

## Demographic Analysis

### Segment by Employee Attributes

`analyze_by_attributes()` lets you break survey results down by any combination of employee attributes — department, gender, tenure group, location, job level, etc.

```{r eval=FALSE}
attr_path <- system.file("extdata", "employee_attributes.csv", package = "vivaglint")
demo_results <- analyze_by_attributes(
  survey,
  attribute_file = attr_path,
  scale_points = 5,
  attribute_cols = c("Department", "Gender", "Tenure Group"),
  min_group_size = 10  # Suppress groups below this size for privacy
)
```

This is useful for identifying which employee populations have systematically lower scores and on which questions.

### Pre-Joining Attributes for Multiple Analyses

If you plan to run several analyses against the same attribute file, join it once and reuse the enriched survey object:

```{r eval=FALSE}
# Join once
attr_path <- system.file("extdata", "employee_attributes.csv", package = "vivaglint")
survey_enriched <- join_attributes(survey, attr_path)

# Reuse for multiple analyses — no need to re-read the file
dept_results <- analyze_by_attributes(
  survey_enriched,
  scale_points = 5,
  attribute_cols = "Department"
)

gender_results <- analyze_by_attributes(
  survey_enriched,
  scale_points = 5,
  attribute_cols = "Gender"
)

# Filter to a subpopulation before analyzing
na_only <- survey_enriched
na_only$data <- dplyr::filter(survey_enriched$data, Region == "North America")
na_results <- analyze_by_attributes(na_only, scale_points = 5,
                                    attribute_cols = "Department")
```

Attributes are stored in `survey$metadata$attribute_cols` after joining, so they are excluded from question detection in downstream analyses.

---

## Attrition Risk Analysis

Linking engagement scores to actual turnover is a high-value analysis for HR leaders. `analyze_attrition()` computes attrition rates by favorability group — showing whether employees who scored low on engagement were more likely to leave.

```{r eval=FALSE}
# Basic attrition analysis (90, 180, and 365 days post-survey)
attrition_path <- system.file("extdata", "attrition.csv", package = "vivaglint")
attrition <- analyze_attrition(
  survey,
  attrition_file = attrition_path,
  emp_id_col = "EMP ID",
  term_date_col = "Termination Date",
  scale_points = 5
)
```

The output shows, for each question and time period, the attrition rates for favorable vs. unfavorable responders along with a risk ratio — making it straightforward to identify which survey items are the strongest leading indicators of turnover.
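The risk ratio itself is just the unfavorable group's attrition rate divided by the favorable group's. With hypothetical rates:

```r
# Hypothetical 180-day attrition rates for one question
rate_unfavorable <- 0.18  # 18% of unfavorable responders left
rate_favorable   <- 0.06  # 6% of favorable responders left

risk_ratio <- rate_unfavorable / rate_favorable
# 3: unfavorable responders left at three times the favorable rate
```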

### Segment Attrition by Demographics

Combine attrition analysis with employee attributes to answer questions like "Are unfavorable responders in Engineering leaving at higher rates than those in Sales?":

```{r eval=FALSE}
attr_path <- system.file("extdata", "employee_attributes.csv", package = "vivaglint")
attrition_path <- system.file("extdata", "attrition.csv", package = "vivaglint")
survey_enriched <- join_attributes(survey, attr_path)

attrition_by_dept <- analyze_attrition(
  survey_enriched,
  attrition_file = attrition_path,
  emp_id_col = "EMP ID",
  term_date_col = "Termination Date",
  scale_points = 5,
  attribute_cols = c("Department", "Job Level"),
  min_group_size = 10
)
```

---

## Correlation and Factor Analysis

### Understand Which Items Move Together

Correlation analysis is useful for identifying clusters of related questions — often as a first step before building a composite score or validating that a set of items measures a single construct:

```{r eval=FALSE}
# Pearson correlations in long format (default)
correlations <- get_correlations(survey)

# Spearman correlations (more robust for ordinal scale data)
correlations_spearman <- get_correlations(survey, method = "spearman")

# Correlation matrix
cor_matrix <- get_correlations(survey, format = "matrix")
```

Supported methods: `"pearson"` (default), `"spearman"`, `"kendall"`
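These are the same method names base R's `cor()` accepts, which makes it easy to see why Spearman can suit ordinal scale data: it depends only on rank order. A quick base-R illustration with made-up vectors:

```r
x <- 1:5
y <- c(1, 2, 4, 8, 16)  # monotone but strongly non-linear in x

cor(x, y)                       # Pearson, about 0.93: penalizes the curvature
cor(x, y, method = "spearman")  # 1: rank order is perfectly preserved
```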

### Factor Analysis

Factor analysis identifies the latent constructs underlying a set of survey items. This can validate whether your "manager effectiveness" items truly cluster together, or reveal unexpected groupings:

```{r eval=FALSE}
# Requires the psych package
factors <- extract_survey_factors(survey, n_factors = 3, rotation = "oblimin")

# Consolidated summary: item, factor assignment, loading, label, communality
print(factors$factor_summary)

# Filter to items with strong factor loadings only
strong_loaders <- dplyr::filter(factors$factor_summary, loading_label == "Strong")

# Access the raw psych object for advanced use
factors$fa_object
```

Loading labels: **Strong** (≥ 0.75), **Medium** (0.60–0.74), **Weak** (< 0.60)
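The labeling is simple threshold bucketing; a base-R sketch of the same rule, using hypothetical loadings:

```r
loadings <- c(0.81, 0.65, 0.42)  # hypothetical absolute factor loadings

cut(loadings,
    breaks = c(-Inf, 0.60, 0.75, Inf),
    labels = c("Weak", "Medium", "Strong"),
    right  = FALSE)  # left-closed: 0.75 is Strong, 0.60 is Medium
#> [1] Strong Medium Weak
```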

---

## Working with Comments

### Full-Text Comment Search

Search across all comment columns at once — useful for surfacing themes around specific topics like "flexibility", "burnout", or a manager's name:

```{r eval=FALSE}
# Fuzzy search (default) — tolerates minor spelling differences
flexibility_comments <- search_comments(survey, "flexibility")

# Exact, case-sensitive match
exact_results <- search_comments(survey, "work from home", exact = TRUE)

# Broaden fuzzy tolerance to catch more spelling variation
results <- search_comments(survey, "colaboration", max_distance = 0.3)
```

Each result row includes:

- `question` — Which question the comment was attached to
- `response` — The numeric score the employee gave
- `comment` — The comment text
- `topics` — Topic tags assigned by Glint
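If you want to prototype this kind of fuzzy matching outside the package, base R's `agrepl()` offers comparable approximate matching, with `max.distance` playing a role similar to `max_distance` above. This illustrates the concept only; it is not necessarily how `search_comments()` is implemented:

```r
comments <- c("better colaboration between teams please",  # note the typo
              "the cafeteria coffee is great")

# max.distance as a fraction of the pattern length
agrepl("collaboration", comments, max.distance = 0.3, ignore.case = TRUE)
#> [1]  TRUE FALSE
```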

### Convert to Long Format for NLP

To route comments into a text analysis or NLP pipeline, reshape the survey to long format and filter to rows with comments:

```{r eval=FALSE}
# All responses in long format
long_all <- pivot_long(survey, data_type = "all")

# Comments only
long_comments <- pivot_long(survey, data_type = "comments")

# Both as separate tibbles
both <- pivot_long(survey, data_type = "both")
comments_df <- both$comments
```

---

## Separating Quantitative and Qualitative Data

For workflows that route numeric scores and open-text comments to different pipelines (e.g., numeric data to a statistical model, comments to an LLM), use `split_survey_data()`:

```{r eval=FALSE}
parts <- split_survey_data(survey)

# Numeric scores only — standard respondent columns + one score column per question
quantitative <- parts$quantitative

# Comments only — EMP ID + all _COMMENT, _COMMENT_TOPICS, _SENSITIVE_COMMENT_FLAG columns
qualitative <- parts$qualitative

# Pass numeric data directly to vivaglint functions
quant_summary <- summarize_survey(parts$quantitative, scale_points = 5,
                                  emp_id_col = "EMP ID")

# Rejoin at any time using EMP ID
full_data <- dplyr::left_join(parts$quantitative, parts$qualitative, by = "EMP ID")
```

---

## Privacy and Data Handling

### Minimum Group Sizes

Use `min_group_size` wherever it's available to suppress results for groups that are too small to protect individual anonymity:

```{r eval=FALSE}
# Default is 5; consider 10 or higher for sensitive analyses
attr_path <- system.file("extdata", "employee_attributes.csv", package = "vivaglint")
demo_results <- analyze_by_attributes(
  survey,
  attribute_file = attr_path,
  scale_points = 5,
  attribute_cols = c("Department", "Gender"),
  min_group_size = 10
)
```

### Local Processing

This package processes all data locally within your R environment. No employee data is transmitted to any external service, including Microsoft. Always follow your organization's data handling and privacy policies when working with employee survey data.

---

## Additional Resources

- **Function Documentation**: `?read_glint_survey`, `?summarize_survey`, `?analyze_by_attributes`, etc.
- **GitHub**: https://github.com/microsoft/vivaglint
- **Issues**: https://github.com/microsoft/vivaglint/issues
