This vignette explains how proxyC computes the similarity and distance measures.
$$
\vec{x} = [x_1, x_2, \dots, x_n] \\
\vec{y} = [y_1, y_2, \dots, y_n]
$$

The length of the vector is $n = ||\vec{x}||$, while $|\vec{x}|$ denotes the absolute values of its elements.
Operations on vectors are element-wise:
$$
\vec{z} = \vec{x}\vec{y} \\
n = ||\vec{x}|| = ||\vec{y}|| = ||\vec{z}||
$$
Summation of the elements of vectors is written using sigma without specifying the range:
$$
\sum{\vec{x}} = \sum_{i=1}^{n}{x_i}
$$
When the elements of a vector are compared with a value inside a pair of square brackets, the summation counts the number of elements that are equal (or unequal) to that value:
$$
\sum{[\vec{x} = 1]} = \sum_{i=1}^{n}{[x_i = 1]}
$$
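The notation above can be spelled out on a small example. This is only an illustrative sketch in plain Python, not proxyC code; the vectors `x` and `y` are made-up values.

```python
# Hypothetical example vectors; any two equal-length numeric lists would do.
x = [1, 0, 2, 1]
y = [1, 3, 0, 1]

n = len(x)                              # n = ||x||, the number of elements
z = [xi * yi for xi, yi in zip(x, y)]   # z = x y, the element-wise product
total = sum(x)                          # sum of all elements of x
ones = sum(1 for xi in x if xi == 1)    # sum of [x = 1]: how many elements equal 1

print(n, z, total, ones)
```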
Similarity measures are available in `proxyC::simil()`.
Cosine similarity:

$$
simil = \frac{\sum{\vec{x}\vec{y}}}{\sqrt{\sum{\vec{x}^2}} \sqrt{\sum{\vec{y}^2}}}
$$
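As a rough check of the cosine formula, the sketch below evaluates it directly in plain Python. The function name `cosine_simil` is hypothetical and the code makes no claim about how proxyC implements the measure internally.

```python
import math

def cosine_simil(x, y):
    """Sum of element-wise products divided by the product of the two norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

print(cosine_simil([3, 4], [4, 3]))   # 24 / (5 * 5)
print(cosine_simil([1, 0], [0, 1]))   # orthogonal vectors
```

Vectors that point in the same direction score 1 regardless of their magnitude, which is why the measure is often used on count data of very different totals.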
Pearson correlation coefficient:

$$
simil = \frac{Cov(\vec{x}, \vec{y})}{\sqrt{Var(\vec{x})} \sqrt{Var(\vec{y})}}
$$
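Jaccard similarity. The values of $x$ and $y$ are Boolean for “jaccard”.

$$
e = \sum{\vec{x}\vec{y}} \\
w = \text{user-provided weight} \\
simil = \frac{e}{\sum{\vec{x}^w} + \sum{\vec{y}^w} - e}
$$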
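The Jaccard formula can be traced by hand with a few lines of Python. This is a sketch under assumptions: the function name `jaccard_simil` is hypothetical, and the default `w=1` is chosen here only so that Boolean input reduces to the familiar intersection-over-union ratio.

```python
def jaccard_simil(x, y, w=1):
    """e / (sum(x^w) + sum(y^w) - e), with e the sum of element-wise products."""
    e = sum(a * b for a, b in zip(x, y))
    return e / (sum(a ** w for a in x) + sum(b ** w for b in y) - e)

# Boolean vectors: 2 shared features out of 3 distinct features in total.
print(jaccard_simil([1, 1, 0, 1], [1, 0, 0, 1]))

# Non-Boolean vectors with weight w = 2: 4 / (5 + 6 - 4).
print(jaccard_simil([1, 2, 0], [2, 1, 1], w=2))
```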
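Fuzzy Jaccard similarity. The values must satisfy $0 \le x \le 1.0$ and $0 \le y \le 1.0$.

$$
simil = \frac{\sum{\min(\vec{x}, \vec{y})}}{\sum{\max(\vec{x}, \vec{y})}}
$$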
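A minimal sketch of the min/max ratio, including the range check stated above. The function name `fuzzy_jaccard_simil` is hypothetical, and raising `ValueError` on out-of-range input is a choice made for this example only.

```python
def fuzzy_jaccard_simil(x, y):
    """Sum of element-wise minima divided by sum of element-wise maxima."""
    if any(not 0.0 <= v <= 1.0 for v in [*x, *y]):
        raise ValueError("all values must lie between 0 and 1")
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return num / den

# Minima sum to 0.75 and maxima to 1.75, so the result is 3/7.
print(fuzzy_jaccard_simil([0.5, 1.0, 0.0], [0.25, 0.5, 0.25]))
```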
Dice similarity. The values of $x$ and $y$ are Boolean for “dice”.

$$
e = \sum{\vec{x}\vec{y}} \\
w = \text{user-provided weight} \\
simil = \frac{2e}{\sum{\vec{x}^w} + \sum{\vec{y}^w}}
$$
Hamann similarity:

$$
e = \sum{\vec{x}\vec{y}} \\
n = ||\vec{x}|| = ||\vec{y}|| \\
u = n - e \\
simil = \frac{e - u}{e + u}
$$
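Faith similarity:

$$
t = \sum{[\vec{x} = 1][\vec{y} = 1]} \\
f = \sum{[\vec{x} = 0][\vec{y} = 0]} \\
n = ||\vec{x}|| = ||\vec{y}|| \\
simil = \frac{t + 0.5 f}{n}
$$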
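The Hamann and Faith definitions can be followed term by term on a pair of Boolean vectors. The sketch below uses hypothetical function names and simply transcribes the quantities $e$, $u$, $t$, $f$ and $n$ as defined above.

```python
def hamann_simil(x, y):
    """(e - u) / (e + u) with e = sum(x * y) and u = n - e."""
    e = sum(a * b for a, b in zip(x, y))
    u = len(x) - e
    return (e - u) / (e + u)

def faith_simil(x, y):
    """(t + 0.5 * f) / n with t joint presences and f joint absences."""
    t = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    f = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    return (t + 0.5 * f) / len(x)

x = [1, 1, 0, 0]
y = [1, 0, 0, 1]
print(hamann_simil(x, y))  # e = 1, u = 3
print(faith_simil(x, y))   # t = 1, f = 1, n = 4
```

In the Faith measure a joint absence counts half as much as a joint presence, so two all-zero vectors score 0.5 rather than 1.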
Simple matching:

$$
simil = \sum{[\vec{x} = \vec{y}]}
$$
Distance measures are available in `proxyC::dist()`. Smoothing of the vectors can be performed when `method` is “chisquared”, “kullback”, “jeffreys” or “jensen”: the value of `smooth` is added to each element of $\vec{x}$ and $\vec{y}$.
Manhattan distance:

$$
dist = \sum{|\vec{x} - \vec{y}|}
$$
Canberra distance:

$$
dist = \sum{\frac{|\vec{x} - \vec{y}|}{|\vec{x}| + |\vec{y}|}}
$$
Euclidean distance:

$$
dist = \sqrt{\sum{(\vec{x} - \vec{y})^2}}
$$
Minkowski distance:

$$
p = \text{user-provided parameter} \\
dist = \left( \sum{|\vec{x} - \vec{y}|^p} \right)^{\frac{1}{p}}
$$
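The role of the parameter $p$ is easiest to see numerically. This is a minimal sketch, assuming a hypothetical function name `minkowski_dist` and a default of `p=2` picked for the example.

```python
def minkowski_dist(x, y, p=2):
    """(sum(|x - y|^p))^(1/p) for a user-chosen exponent p."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

x = [1, 2, 3]
y = [4, 6, 3]
print(minkowski_dist(x, y, p=1))  # 3 + 4 + 0: the sum of absolute differences
print(minkowski_dist(x, y, p=2))  # sqrt(9 + 16 + 0)
```

With `p=1` the expression collapses to the plain sum of absolute differences, and with `p=2` it is the square root of the summed squared differences. Larger values of `p` give progressively more weight to the largest single difference.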
Hamming distance:

$$
dist = \sum{[\vec{x} \neq \vec{y}]}
$$
Chebyshev (maximum) distance:

$$
dist = \max{|\vec{x} - \vec{y}|}
$$
Chi-squared distance:

$$
O_{ij} = \text{augmented matrix from } \vec{x} \text{ and } \vec{y} \\
E_{ij} = \text{matrix of expected count for } O_{ij} \\
dist = \sum{\frac{(O_{ij} - E_{ij})^2}{E_{ij}}}
$$
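The chi-squared definition is compact, so a worked sketch may help. It rests on two assumptions that the text does not spell out: that the augmented matrix $O$ is the two-row table formed by stacking $\vec{x}$ on $\vec{y}$, and that the expected counts follow the usual contingency-table rule (row total times column total, divided by the grand total). The function name `chisquared_dist` is hypothetical.

```python
def chisquared_dist(x, y, smooth=0.0):
    """Chi-squared statistic of the 2 x n table whose rows are x and y."""
    # Assumption: O stacks the (optionally smoothed) vectors as two rows.
    rows = [[v + smooth for v in x], [v + smooth for v in y]]
    row_tot = [sum(r) for r in rows]
    col_tot = [a + b for a, b in zip(*rows)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            # Expected count under independence; zero if a column is all zero,
            # which is one reason to use a positive `smooth`.
            expected = row_tot[i] * col_tot[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

print(chisquared_dist([10, 20], [20, 10]))  # every expected count is 15
print(chisquared_dist([1, 2], [2, 4]))      # proportional vectors
```

Proportional vectors produce a distance of zero because every observed count then equals its expected count.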
Kullback-Leibler divergence:

$$
\vec{p} = \frac{\vec{x}}{\sum{\vec{x}}} \\
\vec{q} = \frac{\vec{y}}{\sum{\vec{y}}} \\
dist = \sum{\vec{q} \log_2{\frac{\vec{q}}{\vec{p}}}}
$$
Jeffreys divergence:

$$
\vec{p} = \frac{\vec{x}}{\sum{\vec{x}}} \\
\vec{q} = \frac{\vec{y}}{\sum{\vec{y}}} \\
dist = \sum{\vec{q} \log_2{\frac{\vec{q}}{\vec{p}}}} + \sum{\vec{p} \log_2{\frac{\vec{p}}{\vec{q}}}}
$$
Jensen-Shannon divergence:

$$
\vec{p} = \frac{\vec{x}}{\sum{\vec{x}}} \\
\vec{q} = \frac{\vec{y}}{\sum{\vec{y}}} \\
\vec{m} = \frac{1}{2} (\vec{p} + \vec{q}) \\
dist = \frac{1}{2} \sum{\vec{q} \log_2{\frac{\vec{q}}{\vec{m}}}} + \frac{1}{2} \sum{\vec{p} \log_2{\frac{\vec{p}}{\vec{m}}}}
$$
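The three divergences share the same normalisation step, which the sketch below factors out. All function names are hypothetical. The code assumes every element is strictly positive after `smooth` has been added, because the logarithm and the ratio are undefined otherwise, and it is not meant to mirror proxyC's internals.

```python
import math

def _normalize(v, smooth=0.0):
    """Add `smooth` to each element, then scale so the elements sum to 1."""
    v = [a + smooth for a in v]
    s = sum(v)
    return [a / s for a in v]

def kullback_dist(x, y, smooth=0.0):
    """sum(q * log2(q / p)); requires strictly positive p and q."""
    p, q = _normalize(x, smooth), _normalize(y, smooth)
    return sum(qi * math.log2(qi / pi) for pi, qi in zip(p, q))

def jeffreys_dist(x, y, smooth=0.0):
    """Sum of the divergence taken in both directions."""
    return kullback_dist(x, y, smooth) + kullback_dist(y, x, smooth)

def jensen_dist(x, y, smooth=0.0):
    """Average divergence of p and q from their midpoint m."""
    p, q = _normalize(x, smooth), _normalize(y, smooth)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return (0.5 * sum(qi * math.log2(qi / mi) for qi, mi in zip(q, m))
            + 0.5 * sum(pi * math.log2(pi / mi) for pi, mi in zip(p, m)))

print(kullback_dist([1, 1], [1, 3]))
print(jeffreys_dist([1, 1], [1, 3]))
print(jensen_dist([1, 1], [1, 3]))
# A zero count makes the ratio undefined, so smoothing is needed here.
print(kullback_dist([0, 2], [1, 1], smooth=1.0))
```

The first measure is asymmetric in its arguments, while the other two return the same value when `x` and `y` are swapped.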