Document Similarity

Part of an evaluation framework for assessing and comparing graph embedding techniques.

Datasets used as gold standard

| Dataset | Structure | Size |
|---|---|---|
| LP50 | doc1 doc2 avg | 50 docs |

Model

The algorithm takes two documents doc1 and doc2 as its input and calculates their similarity as follows:

The similarity_function can be customized by the user.
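As an illustration of what a user-supplied similarity_function might look like, the sketch below defines two interchangeable candidates over embedding vectors. The signature (two sequences of floats in, one float out) and the helper `score_pair` are assumptions made for this example, not the framework's documented interface.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
# Assumed shape of a pluggable similarity function: (vector, vector) -> score.
SimilarityFunction = Callable[[Vector, Vector], float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def negative_euclidean(u: Vector, v: Vector) -> float:
    """Alternative choice: closer vectors get a higher (less negative) score."""
    return -math.dist(u, v)


def score_pair(u: Vector, v: Vector,
               similarity_function: SimilarityFunction = cosine_similarity) -> float:
    """Hypothetical helper showing how the function can be swapped by the caller."""
    return similarity_function(u, v)
```

Passing the function as a parameter keeps the scoring logic independent of the particular measure, so the measure can be changed without touching the rest of the pipeline.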

The Document Similarity task ignores missing entities: the similarity is computed only on entities that occur both in the gold standard dataset and in the input file.
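The filtering described above can be sketched as an intersection between the entities required by the gold standard and those present in the input file. The function name `split_known_entities` and the entity identifiers are hypothetical; the framework's actual data structures may differ.

```python
from typing import Iterable, List, Tuple


def split_known_entities(gold_entities: Iterable[str],
                         input_entities: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (kept, missing): gold standard entities that do or do not appear
    in the input file. Only the kept ones would take part in the similarity."""
    available = set(input_entities)
    kept: List[str] = []
    missing: List[str] = []
    for entity in gold_entities:
        (kept if entity in available else missing).append(entity)
    return kept, missing


# Hypothetical identifiers, used only to show the behaviour.
gold = ["entity_a", "entity_b", "entity_c"]
embedded = ["entity_a", "entity_c", "entity_d"]
kept, missing = split_known_entities(gold, embedded)
```

Returning the missing entities alongside the kept ones makes it easy to report how much of the gold standard was actually covered by the input.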

Output of the evaluation

| Metric | Range | Interpretation |
|---|---|---|
| Pearson correlation coefficient (P_cor) | [-1,1] | Extreme values: correlation, Values close to 0: no correlation |
| Spearman correlation coefficient (S_cor) | [-1,1] | Extreme values: correlation, Values close to 0: no correlation |
| Harmonic mean of P_cor and S_cor | [-1,1] | Extreme values: correlation, Values close to 0: no correlation |
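To make the three metrics concrete, here is a minimal pure-Python sketch that computes them from a list of gold standard scores and a list of predicted scores. The function names are illustrative, and the harmonic mean uses the textbook formula 2·P·S / (P + S); how the framework itself treats negative inputs or a zero sum is not stated in this document, so returning NaN in that case is an assumption.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def harmonic_mean(p_cor: float, s_cor: float) -> float:
    """2*P*S / (P + S); NaN when P + S == 0 (assumed handling)."""
    if p_cor + s_cor == 0:
        return float("nan")
    return 2 * p_cor * s_cor / (p_cor + s_cor)


# Made-up scores, for illustration only.
gold = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [0.1, 0.2, 0.4, 0.5, 0.9]
p_cor = pearson(gold, predicted)
s_cor = spearman(gold, predicted)
combined = harmonic_mean(p_cor, s_cor)
```

Because the predicted scores in this example preserve the order of the gold scores exactly, the Spearman coefficient is 1 while the Pearson coefficient is slightly lower, which shows how the two metrics capture different aspects of agreement.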