This vignette will describe the decision rules used in the original
method of Song
(2013) and the High CMC method of Tong
et al. (2015). For illustrative purposes, we will consider a
comparison between a known match and known non-match pair of cartridge
cases from the study performed by Fadul et
al. (2011). The raw cartridge case scans can be downloaded from the
NIST
Ballistics Toolmark Research Database. The scans were preprocessed
using functions available in the cmcR package and are
not discussed here. Refer to the fadul_examples.R
script
available on the cmcR
GitHub page for how these scans were preprocessed. We will also not
discuss how similarity features are extracted from two processed scans.
Refer to the documentation of the comparison_allTogether
function on the cmcR website for information regarding
this procedure.
library(cmcR)
library(dplyr)
library(ggplot2)
library(purrr)
library(tidyr)
library(gridExtra)
We will consider comparisons between three cartridge case scans. Fadul 1-1 and Fadul 1-2 are known matches (i.e., were fired from the same firearm) while Fadul 2-1 is a non-match. The comparisons considered are Fadul 1-1 vs. Fadul 1-2 and Fadul 1-1 vs. Fadul 2-1.
data("fadul1.1_processed")
data("fadul1.2_processed")
# Download a non-matching cartridge case to Fadul 1-1 and Fadul 1-2
fadul2.1_raw <- x3ptools::read_x3p("https://tsapps.nist.gov/NRBTD/Studies/CartridgeMeasurement/DownloadMeasurement/8ae0b86d-210a-41fd-ad75-8212f9522f96")

# Apply the same preprocessing pipeline to the downloaded scan
fadul2.1_processed <- fadul2.1_raw %>%
  preProcess_crop(region = "exterior",
                  radiusOffset = -30) %>%
  preProcess_crop(region = "interior",
                  radiusOffset = 200) %>%
  preProcess_removeTrend(statistic = "quantile",
                         tau = .5,
                         method = "fn") %>%
  preProcess_gaussFilter() %>%
  x3ptools::sample_x3p()
The three processed cartridge cases are shown below.
The cell-based comparison procedure implemented in the
comparison_allTogether
function returns a data frame/tibble
of similarity features between two cartridge case scans. For each cell
in the “reference” scan (Fadul 1-1 in this example), the similarity
features include the translation (x, y) and rotation theta values
required to align the reference cell in the target scan, together with
the CCF at those (x, y) values (the (x, y, CCF) feature set).
The fundamental assumption underlying all CMC decision rules is that
truly matching cartridge case pairs should have similarity features that
are consistent across the cell/region pairs. In particular, a plurality
of cell/region pairs should “vote” for similar
(x, y, theta)
alignment values. In contrast, the
cell/region pairs of truly non-matching cartridge cases should have
seemingly random (x, y, theta)
votes. The two decision
rules implemented in the cmcR package can be understood
as two different systems by which cells vote for
(x, y, theta)
values that they “believe” to be the true
alignment values for the overall cartridge case scans.
An actual implementation of the original method of Song (2013) is described in Song et al. (2014). The decision rule that Song et al. (2014) describe is based on
a virtual reference with three reference registration parameters \theta_{\text{ref}}, x_{\text{ref}} and y_{\text{ref}} generated by the median values of the collective \theta, and x-, y-translation values of all cell pairs.
That is, a consensus is determined by finding the median registration phase values across the cell/region pairs for a particular cartridge case pair comparison. Then, the registration values of each cell/region pair are compared to the consensus to determine whether they lie within a specified distance of it. This consensus assessment introduces threshold parameters T_x, T_y, T_{\theta}, T_{\text{CCF}}.
Let x_i, y_i, \theta_i denote the translation and rotation parameters that produce the highest CCF for the alignment of cell/region pair i, and let CCF_{\max,i} denote that highest CCF value. Also let x_{\text{ref}}, y_{\text{ref}}, \theta_{\text{ref}} be the medians of these alignment values across all cell/region pairs in a particular cartridge case comparison (these are the “virtual reference” values). A cell/region pair i is declared a match if all of the following conditions hold:
|x_i - x_{\text{ref}}| \leq T_x
|y_i - y_{\text{ref}}| \leq T_y
|\theta_i - \theta_{\text{ref}}| \leq T_{\theta}
CCF_{\max,i} \geq T_{\text{CCF}}
With respect to the voting system analogy, we might interpret this decision rule as a single-choice voting system similar to the system used in U.S. presidential elections. That is, every cell is allowed to submit one vote corresponding to the registration phase with the highest CCF value, CCF_{\max}. A vote is discarded if its associated CCF_{\max} is below the T_{\text{CCF}} threshold. A consensus is determined by counting the number of remaining votes that are close to the reference values x_{\text{ref}}, y_{\text{ref}}, \theta_{\text{ref}}, where “close” is defined by the T_x, T_y, T_{\theta} thresholds.
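The single-vote rule can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the cmcR implementation: the data frame `cells`, its column names, and all of its values are made up, and each row is assumed to already hold the top-CCF registration for one cell/region pair.

```r
# Hypothetical toy features: one row per cell/region pair (values are invented)
cells <- data.frame(
  cellIndex = 1:6,
  x     = c(2, -3, 5, 0, 60, -45),
  y     = c(-1, 4, 2, -2, 35, 50),
  theta = c(-24, -24, -21, -27, 12, -24),
  ccf   = c(0.82, 0.74, 0.66, 0.41, 0.58, 0.35)
)

# Thresholds matching the values quoted in this vignette
T_x <- 20
T_y <- 20
T_theta <- 6
T_ccf <- 0.5

# "Virtual reference": medians of the top votes over all cell/region pairs
x_ref     <- median(cells$x)
y_ref     <- median(cells$y)
theta_ref <- median(cells$theta)

# A cell is congruent only if all four conditions hold
cells$congruent <- abs(cells$x - x_ref) <= T_x &
  abs(cells$y - y_ref) <= T_y &
  abs(cells$theta - theta_ref) <= T_theta &
  cells$ccf >= T_ccf

cmc_count <- sum(cells$congruent)
```

In this toy data, cells 5 and 6 vote far from the median translation and cell 4 falls below the CCF threshold, so only the first three cells are counted.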
The plot below shows the values of x_i, y_i, \theta_i, and CCF_{\max,i} for each cell/region pair between Fadul 1-1 and Fadul 1-2 as well as Fadul 1-1 and Fadul 2-1. These values are shown as blue/red bars. The purple bands indicate the ranges of values that allow a cell/region pair to be declared “congruent”: within T_x = T_y = 20 and T_{\theta} = 6 of x_{\text{ref}}, y_{\text{ref}}, and \theta_{\text{ref}}, respectively, and above T_{\text{CCF}} = .5. As we might expect, a larger proportion of x_i, y_i, \theta_i, and CCF_{\max,i} values are within these acceptable ranges for the comparison between Fadul 1-1 and Fadul 1-2 than for the comparison between Fadul 1-1 and Fadul 2-1. This indicates that there is a clearer “consensus” about the true alignment values for the matching cartridge case pair than for the non-matching pair.
The first step in the High CMC method is to count the CMCs under the original method of Song (2013) in both comparison “directions,” meaning each scan plays the role of both the “reference” and the “target” scan. After these CMCs are counted, Tong et al. (2015) propose using the minimum of the two CMC counts as an initial CMC count prior to applying the High CMC decision rule. The figure below shows the behavior of the x_i, y_i, \theta_i, and CCF_{\max,i} values in each direction via a parallel-coordinates plot, which is useful for visualizing multi-dimensional data sets. Each connected path represents a single cell/region pair. The purple regions again represent the acceptable regions that are sufficiently “close” to the reference values (or above .5 in the case of the CCF). Paths that pass only through purple regions are deemed congruent under the decision rule of the original method of Song (2013) and are colored blue. We can see that 19 cells are deemed congruent for the comparison in which Fadul 1-1 is treated as the reference while 18 are considered congruent in the other direction. As such, the initial CMC count used for the High CMC method would be 18.
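As a rough sketch of this first step, the original-method count can be wrapped in a helper and applied once per direction, keeping the smaller count. The helper `count_cmc_original`, the data frames `forward` and `backward`, and every value in them are hypothetical and chosen only for illustration; they are not part of cmcR and do not reproduce the 19 and 18 counts reported above.

```r
# Hypothetical helper: CMC count under the original (single-vote) rule
count_cmc_original <- function(features, T_x = 20, T_y = 20,
                               T_theta = 6, T_ccf = 0.5) {
  congruent <- abs(features$x - median(features$x)) <= T_x &
    abs(features$y - median(features$y)) <= T_y &
    abs(features$theta - median(features$theta)) <= T_theta &
    features$ccf >= T_ccf
  sum(congruent)
}

# Invented top votes for each comparison direction (one row per cell)
forward <- data.frame(
  x = c(1, 3, -2, 40), y = c(0, 2, -1, -30),
  theta = c(-24, -24, -27, 9), ccf = c(0.80, 0.70, 0.60, 0.55)
)
backward <- data.frame(
  x = c(-1, -4, 2, -38), y = c(1, -2, 3, 33),
  theta = c(24, 21, 24, 3), ccf = c(0.75, 0.65, 0.45, 0.30)
)

cmc_forward  <- count_cmc_original(forward)
cmc_backward <- count_cmc_original(backward)

# The initial CMC count is the smaller of the two directional counts
initial_cmc <- min(cmc_forward, cmc_backward)
```

Taking the minimum is the conservative choice: a pair is credited only with the number of congruent cells that both directions can support.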
By considering only the “top vote” of each cell as is done in the
decision rule of the original method of Song
(2013), information is lost regarding other registration phases for
which a cell might also rank highly. As Tong
et al. (2015) observe:
some of the valid cell pairs may be mistakenly excluded from the CMC count because by chance their correlation yields a higher CCF value at a rotation angle outside the threshold range T_\theta.
The High CMC method lifts the single-choice restriction by allowing each cell to cast a vote for the translation phase at every \theta value for which it has a sufficiently large associated CCF_{\max} value. Under this system, each vote represents the translation phase that the cell considers to be the true translation phase of the overall scans conditional on a particular \theta value. In this way, the High CMC method might be viewed as an approval voting system in which an individual may cast a vote for every candidate of which they approve. For each \theta value, the number of translation phase votes that are close to the \theta-specific reference values x_{\text{ref},\theta}, y_{\text{ref},\theta} is counted (where “close” is now defined based only on the T_x, T_y thresholds). This yields what Tong et al. (2015) refer to as a “CMC-\theta” distribution representing, as they consider it, the number of “congruent cells” per \theta value. Thus, there may be more than one \theta value for which a single cell/region pair is considered congruent. While this is seemingly contradictory (as there should be only one “true” \theta alignment value), Tong et al. (2015) justify their method by the empirical observation:
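A CMC-\theta distribution of this kind can be sketched in base R by grouping the votes by rotation and applying the translation and CCF thresholds within each group. The data frame `votes`, its column names, and its values are assumptions made for illustration (only three rotations and four cells are used), and the code is not taken from cmcR.

```r
# Hypothetical toy data: each cell contributes one row per theta value
votes <- data.frame(
  cellIndex = rep(1:4, times = 3),
  theta = rep(c(-27, -24, 30), each = 4),
  x   = c(1, 2, 30, -40,    0, 1, -2, 3,    50, -35, 10, -60),
  y   = c(0, -1, 25, 45,    2, 1, 0, -1,    -40, 20, 55, -5),
  ccf = c(0.70, 0.60, 0.55, 0.30,
          0.80, 0.75, 0.70, 0.65,
          0.40, 0.55, 0.30, 0.60)
)

T_x <- 20
T_y <- 20
T_ccf <- 0.5

# For each theta: take theta-specific medians as the reference translation,
# then count votes that are close to it and have a large enough CCF
by_theta <- split(votes, votes$theta)
cmc_theta <- sapply(by_theta, function(d) {
  x_ref <- median(d$x)
  y_ref <- median(d$y)
  sum(abs(d$x - x_ref) <= T_x & abs(d$y - y_ref) <= T_y & d$ccf >= T_ccf)
})

cmc_theta
```

Here the count peaks at \theta = -24, where all four toy cells agree on the translation, and drops to zero at \theta = 30, where the votes are scattered.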
[i]f two images are truly matching, the CMC-\theta distribution of matching image pairs should have a prominent peak located near the initial phase angle \Theta_0, while non-matching image pairs may have a relatively flat and random CMC-\theta distribution pattern.
The assumption underlying the High CMC method is that, if the cartridge case pair is indeed a match, the number of cells classified as congruent should be larger near the true \theta value (the “initial phase angle \Theta_0”, as Tong et al. (2015) call it) than at other \theta values. These phenomena are illustrated in Figures \ref{fig:kmCMCPerTheta} and \ref{fig:knmCMCPerTheta}. Figure \ref{fig:kmCMCPerTheta} shows the CMC counts per rotation value in both directions for the known match pair Fadul 1-1 and Fadul 1-2 from Fadul et al. (2011). We can clearly see a CMC mode around \theta = -24 degrees in one direction and \theta = 21 degrees in the other, which is to be expected for a known match pair. Figure \ref{fig:knmCMCPerTheta}, on the other hand, shows the CMC counts for the known non-match pair Fadul 1-1 and Fadul 2-1; in this comparison, no such CMC count mode is achieved.
An example of the CMC-\theta
distribution for the comparison between Fadul 1-1 and Fadul 1-2 is shown
below. We can see that, conditional on \theta = -24 degrees, more cells tend to
have similar x, y
values than conditional on \theta = 30. The CCF values are also
larger. This indicates that \theta =
-24 is likely closer to the “true” rotation than \theta = 30 or elsewhere. The two
darker-shaded bars represent the \theta values that have a “High CMC”
count as described above. Because these \theta values are adjacent rather than
being far from each other, there is evidence that the “true” \theta value is approximately \theta = -24 or -27 degrees. We say that this comparison
direction would “pass” the High CMC criteria because the \theta values with high CMC counts are
adjacent.
Based on this observation, Tong et al. (2015) outline the following procedure for the High CMC method:
1. Conduct both forward and backward correlations at each rotation and record the registration based on CCF_{\max}, x, and y for each cell at each rotation. These data will be used in the next two steps separately.
2. At every rotation angle, each cell in the reference image finds a registration position in the compared image with a maximum CCF value. By selecting the registration with the maximum CCF value for each cell, the two CMC numbers determined by the four thresholds can be obtained based on the original algorithm. The lower CMC number is used as the initial result.
3. Build CMC-\theta distributions using the data generated in step 1, by counting the number of cells that have congruent positions at each individual rotation angle. Calculate the angular range of “high CMCs” using both the forward and backward CMC-\theta distributions, as illustrated in Figs. 2 and 3.
4. If the angular range of the “high CMCs” is within the range T_\theta, identify the CMCs for each rotation angle in this range and combine them to give the number of CMCs for this comparison in place of the original CMC number. In this step, if the range is narrower than T_\theta, the nearby angles are included to make the range equal to T_\theta; CMCs with same index in each rotation are only counted once.
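The last step, in which a cell that is congruent at several rotations is counted only once, amounts to taking a union of cell indices. The sketch below assumes a hypothetical list `congruent_by_theta` holding the indices of congruent cells at each rotation in a T_\theta-wide window; the indices are invented.

```r
# Hypothetical congruent cell indices at each rotation in the high-CMC window
congruent_by_theta <- list(
  "-27" = c(1, 2, 5, 7),
  "-24" = c(1, 2, 3, 5, 8),
  "-21" = c(2, 3, 9)
)

# Combine across rotations, counting each cell index only once
high_cmc_cells <- sort(unique(unlist(congruent_by_theta, use.names = FALSE)))
high_cmc_count <- length(high_cmc_cells)
```

Although the three rotations contribute 12 congruent votes in total, only 7 distinct cells are involved, so the combined count in this toy case is 7.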
Tong et al. (2015) introduce an additional criterion to identify a mode in the CMC-\theta distribution. Let \{\text{CMC}_{\theta} : \theta \in \Theta \} denote the CMC-\theta distribution, where \Theta is the set of rotation values considered for the comparison. Define \text{CMC}_{\max} \equiv \max \{\text{CMC}_{\theta} : \theta \in \Theta\}. Tong et al. (2015) define a “high” CMC threshold as \text{CMC}_{\text{high}} \equiv \text{CMC}_{\max} - \tau for some constant \tau (they choose \tau = 1). Now let \Theta_{\text{high}} \equiv \{\theta \in \Theta : \text{CMC}_{\theta} \geq \text{CMC}_{\text{high}}\}. That is, \Theta_{\text{high}} consists of the \theta values with “high” CMC counts. They propose calculating the range R = \max \Theta_{\text{high}} - \min \Theta_{\text{high}}. If R \leq T_{\theta}, then there is evidence that a single mode exists in the CMC-\theta distribution (and thus that the cartridge case pair is a match). Otherwise, no such mode exists (by their definition) and the cartridge case pair is likely not a match. The horizontal dashed lines in Figures \ref{fig:kmCMCPerTheta} and \ref{fig:knmCMCPerTheta} represent the \text{CMC}_{\text{high}} thresholds. The \theta \in \Theta_{\text{high}} are represented by blue bars. For the matching pair shown in Figure \ref{fig:kmCMCPerTheta}, the range of \Theta_{\text{high}} is less than the threshold T_{\theta} = 6 degrees, so this pair would “pass” the High CMC criterion. In contrast, the range of \Theta_{\text{high}} is larger than T_{\theta} = 6 degrees for the non-match pair shown in Figure \ref{fig:knmCMCPerTheta}. Thus, the non-match pair would “fail” the High CMC criterion.
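The range check on \Theta_{\text{high}} is short enough to sketch directly. The function `passes_high_cmc` and the two CMC-\theta vectors below are hypothetical: the counts are invented to mimic a peaked and a flat distribution over a 3-degree rotation grid and are not the values from the Fadul comparisons.

```r
# Hypothetical helper implementing the range check described above
passes_high_cmc <- function(cmc_theta, thetas, tau = 1, T_theta = 6) {
  cmc_high   <- max(cmc_theta) - tau            # "high" CMC threshold
  theta_high <- thetas[cmc_theta >= cmc_high]   # rotations with high counts
  R          <- max(theta_high) - min(theta_high)
  list(theta_high = theta_high, R = R, pass = R <= T_theta)
}

# Assumed rotation grid: -30 to 30 degrees in steps of 3
thetas <- seq(-30, 30, by = 3)

# Invented CMC-theta counts: a prominent peak versus a flat pattern
peaked <- c(1, 17, 18, 3, 2, 1, 0, 1, 2, 1, 0, 1, 0, 2, 1, 0, 1, 1, 0, 2, 1)
flat   <- c(2, 1, 3, 2, 1, 2, 3, 1, 2, 2, 1, 3, 2, 1, 2, 1, 3, 2, 1, 2, 2)

peaked_result <- passes_high_cmc(peaked, thetas)
flat_result   <- passes_high_cmc(flat, thetas)
```

For the peaked vector, the high counts sit at two adjacent rotations, so R is 3 degrees and the check passes. For the flat vector, counts within \tau of the maximum are spread across the whole grid, so R far exceeds T_\theta and the check fails.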
The “prominent peak” empirical observation upon which the High CMC method is based does seem to hold for many known match (KM) and known non-match (KNM) pairs in our experience. However, we have observed that the behavior of the CMC-\theta distribution depends heavily on the preprocessing procedures used and the thresholds set. In particular, the CMC-\theta distributions for some KNM pairs exhibit the prominent peak behavior for a wide range of threshold values, making them difficult to distinguish from KM pairs.