FAQ

Frequently Asked Questions

Java error: Could not find the main class: pfpack:JPathfinder.Program will exit

This error can result from an out-of-date Java installation on your machine. Try updating Java. Another possibility is that the download was corrupted; in that case, download JPathfinder.jar again.

Similarity, relatedness, dissimilarity, and distance

Data come in various forms. Of primary importance is the directionality of the data values. With similarity or relatedness, higher values mean more similar or more related pairs of items, and "similarity" or "similarities" should appear on the second line of a proximity data file. With dissimilarity or distance, lower values mean more related pairs of items or a smaller distance between them, and "distance" or "dissimilarity" should appear on the second line of a proximity data file.

JPathfinder algorithms use distances (or dissimilarities) rather than similarities, so similarity data are inverted using the equation:

dis(i,j) = max - sim(i,j) + min
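
As a rough illustration of the inversion, here is a minimal Python sketch. It assumes that max and min are the largest and smallest similarity values in the data set, which the equation does not state explicitly. The function name invert_similarities is hypothetical and is not part of JPathfinder.

```python
def invert_similarities(sims):
    """Convert similarities to distances with dis = max - sim + min.

    Assumption: max and min are taken over the values passed in.
    """
    hi = max(sims)
    lo = min(sims)
    return [hi - s + lo for s in sims]


print(invert_similarities([1, 5, 9]))  # [9, 5, 1]
```

Under this assumption the resulting distances span the same range as the original similarities, with the order reversed.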

What is the Coherence measure and when is it appropriate?

The measure is based on a kind of transitivity assumption: if two concepts have similar relationships with other concepts, then the two concepts should be similar to one another. Transitivity does not necessarily hold for all sets of concepts, but failure of transitivity is the exception while transitivity is the rule. The Coherence measure computes an indirect measure of similarity for each pair by correlating the distances from each item in the pair to all of the other concepts. For example, with 8 concepts (ABCDEFGH), for the pair AB we would correlate the distances in the first two columns of the following table. For the pair CF we would correlate the distances in the last two columns.

AB Pair		CF Pair
AC	BC	CA	FA
AD	BD	CB	FB
AE	BE	CD	FD
AF	BF	CE	FE
AG	BG	CG	FG
AH	BH	CH	FH

These correlations give the indirect similarity of each pair: for AB, the extent to which A and B have similar relationships with the other concepts, and likewise for the pair CF. If we do this for all pairs, we can construct a half-matrix of indirect similarities. Coherence is then the negative of the correlation between these indirect measures and the original distances for each pair. A more consistent set of distance data will yield a higher Coherence. For example, in one data set from a study of physics expertise, Coherence increased monotonically with level of expertise. Extremely low Coherence (less than .15) may indicate that the rater did not take the task seriously or did not understand the concepts very well.

The negative of the correlation of indirect similarities and distances is used because the indirect similarities have an opposite direction to the distance data itself, and greater Coherence should reflect greater consistency.
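
The steps described above can be sketched in Python as follows. This is an illustrative sketch, not JPathfinder's own code: it assumes a complete, symmetric distance matrix with no missing values, uses the Pearson correlation, and raises an error when a correlation is undefined (constant data). The names pearson and coherence are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / sqrt(sxx * syy)


def coherence(dist):
    """Coherence of a complete, symmetric distance matrix (list of lists)."""
    n = len(dist)
    indirect = []  # indirect similarity of each pair
    direct = []    # original distance of each pair
    for i, j in combinations(range(n), 2):
        # Distances from i and from j to every other concept.
        others = [k for k in range(n) if k != i and k != j]
        indirect.append(pearson([dist[i][k] for k in others],
                                [dist[j][k] for k in others]))
        direct.append(dist[i][j])
    # Negated because indirect similarities run opposite to distances.
    return -pearson(indirect, direct)


# Made-up example: 8 concepts placed on a line in two tight clusters.
positions = [0, 1, 2, 3, 10, 11, 12, 13]
dist = [[abs(a - b) for b in positions] for a in positions]
print(round(coherence(dist), 2))
```

Because the example distances are highly consistent (items in the same cluster relate to all other items in nearly the same way), the resulting Coherence is high.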

Correlation of Proximity Data

The correlation between proximity data files is computed by correlating the corresponding proximity data entries in the two files. Because missing (or infinite, out-of-range) data in either member of a pair of proximities leads to dropping the pair from the correlation, the measure may be unreliable for data with a large proportion of missing proximities. The correlations are performed on the distance data generated from the proximity data files or from averages of distances from multiple files.
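
A minimal Python sketch of this pairwise dropping is shown here. It assumes that missing entries are represented as None and out-of-range entries as infinity or NaN; that representation and the function name correlate_proximities are assumptions made for illustration, not JPathfinder's file format or API.

```python
from math import isfinite, sqrt


def correlate_proximities(a, b):
    """Pearson correlation over entries that are usable in both lists.

    Returns (r, n), where n is the number of pairs actually used.
    Assumption: missing values are None; out-of-range values are inf or NaN.
    """
    pairs = [(x, y) for x, y in zip(a, b)
             if x is not None and y is not None
             and isfinite(x) and isfinite(y)]
    n = len(pairs)
    if n < 2:
        raise ValueError("too few usable pairs to correlate")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / sqrt(sxx * syy), n


file1 = [1, 2, 3, None, 5]
file2 = [2, 4, 6, 8, float("inf")]
r, used = correlate_proximities(file1, file2)
print(r, used)  # 1.0 3
```

Returning the number of pairs used alongside the correlation makes it easy to see when a large share of the proximities was dropped.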

Network Similarity: C, S, C-E[C], S-E[S], and P(C_or_more)

Given two networks on the same nodes:
C is the number of links in common between the two networks.
   L1 is the number of links in network 1.
   L2 is the number of links in network 2.
S is the similarity of the two networks: S = C / (L1 + L2 - C).
S ranges from 0 (no links in common) to 1 (all links in common).
   E[X] means the expected value of X by chance alone.
C-E[C] is the amount by which C exceeds its chance expected value.
S-E[S] is the amount by which S exceeds its chance expected value.
P(C_or_more) is the probability of C or more links in common by chance,
   which is the same as the probability of S or greater by chance.
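
The definitions of C and S can be illustrated with a short Python sketch. It assumes undirected links given as pairs of node names, and it returns S = 0 when both networks are empty; both choices, and the function name network_similarity, are assumptions made for this example.

```python
def network_similarity(links1, links2):
    """Return (C, S) for two networks given as iterables of node pairs.

    Assumption: links are undirected, so ("A", "B") equals ("B", "A").
    """
    set1 = {frozenset(link) for link in links1}
    set2 = {frozenset(link) for link in links2}
    c = len(set1 & set2)               # links in common
    union = len(set1) + len(set2) - c  # L1 + L2 - C
    s = c / union if union else 0.0
    return c, s


net1 = [("A", "B"), ("B", "C"), ("C", "D")]
net2 = [("B", "A"), ("C", "D"), ("A", "D")]
print(network_similarity(net1, net2))  # (2, 0.5)
```

Here the networks share the links A-B and C-D, so C = 2 and S = 2 / (3 + 3 - 2) = 0.5.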

What is the difference between network similarity and "Closeness" that I have read about?

Goldsmith and his colleagues originally developed a measure of network similarity that they called C (for Closeness). This measure involved computing the similarity of the neighborhood for each node and then averaging this measure across nodes. Unfortunately, the probability distribution for this measure has not been discovered, so the similarity measure used in the Pathfinder software was developed instead. This measure is the ratio of the number of links shared by the two networks to the number of links found in either network (the cardinality of the intersection of the two link sets over the cardinality of their union). This measure follows the hypergeometric probability distribution, and the Pathfinder software computes values from this distribution to provide information about the similarity expected by chance.
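
One way such chance values might be computed from the hypergeometric distribution is sketched below in Python. The exact parameterization used by the Pathfinder software is not given here, so this sketch assumes undirected networks with n(n-1)/2 possible links and treats the L2 links of the second network as a random draw, without replacement, from those possible links. The function name chance_values is hypothetical.

```python
from math import comb


def chance_values(n_nodes, l1, l2, c):
    """Return (E[C], E[S], P(C_or_more)) under a hypergeometric model.

    Assumptions: undirected links, n(n-1)/2 possible links, and L1 and L2
    held fixed while the links of network 2 are placed at random.
    """
    total = n_nodes * (n_nodes - 1) // 2
    denom = comb(total, l2)
    lo = max(0, l1 + l2 - total)  # fewest shared links possible
    hi = min(l1, l2)              # most shared links possible
    probs = {k: comb(l1, k) * comb(total - l1, l2 - k) / denom
             for k in range(lo, hi + 1)}
    expected_c = sum(k * p for k, p in probs.items())
    expected_s = sum((k / (l1 + l2 - k) if l1 + l2 - k else 0.0) * p
                     for k, p in probs.items())
    p_c_or_more = sum(p for k, p in probs.items() if k >= c)
    return expected_c, expected_s, p_c_or_more


# 4 nodes (6 possible links), 3 links in each network, 3 links in common.
e_c, e_s, p = chance_values(4, 3, 3, 3)
print(e_c, e_s, p)
```

With these inputs, only 1 of the 20 equally likely placements of network 2's links matches all 3 links of network 1, so P(C_or_more) is 0.05 and the chance expected value of C is 1.5.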

Pathfinder Networks