It is a framework for evaluating and comparing graph embedding techniques.

Dataset | Structure | Size |
---|---|---|
LP50 | doc1 doc2 avg | 50 docs |

The algorithm takes two documents *doc1* and *doc2* as its input and calculates their similarity as follows:

- For each document, the related set of entities is retrieved. The output of this step is the pair of sets *E1* and *E2*, respectively.
- For each pair of entities (i.e. for the cross product of the sets), the similarity score is computed.
- Only the maximum values are kept for determining the document similarity: for each entity in *E1*, the maximum similarity to an entity in *E2* is kept, and vice versa.
- The similarity score between the two documents is calculated by averaging all these maximum similarities.

The *similarity_function* can be customized by the user.

The Document Similarity task ignores any *missing entities* and computes the similarity only on entities that occur in both the gold standard dataset and the input file.
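
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the framework's actual code: the names `document_similarity`, `cosine_similarity` and the `vectors` dictionary are hypothetical, cosine similarity is only an assumed default for the *similarity_function*, and returning 0.0 when a document has no known entities is likewise an assumption.

```python
import math
from typing import Callable, Dict, Iterable, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Assumed default similarity function (hypothetical choice)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def document_similarity(
    entities1: Iterable[str],
    entities2: Iterable[str],
    vectors: Dict[str, Vector],
    similarity_function: Callable[[Vector, Vector], float] = cosine_similarity,
) -> float:
    # Missing entities (no vector in the input file) are simply skipped.
    e1 = [vectors[e] for e in entities1 if e in vectors]
    e2 = [vectors[e] for e in entities2 if e in vectors]
    if not e1 or not e2:
        return 0.0  # assumption: nothing to compare, so no similarity

    # For each entity in E1 keep the best match in E2, and vice versa.
    max_1 = [max(similarity_function(u, v) for v in e2) for u in e1]
    max_2 = [max(similarity_function(v, u) for u in e1) for v in e2]

    # Average over all the retained maximum similarities.
    all_max = max_1 + max_2
    return sum(all_max) / len(all_max)
```

Any callable that takes two vectors and returns a number can be passed as `similarity_function`, which mirrors the customization point described above.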

Metric | Range | Interpretation |
---|---|---|
Pearson correlation coefficient (P_cor) | [-1,1] | Extreme values: correlation, Values close to 0: no correlation |
Spearman correlation coefficient (S_cor) | [-1,1] | Extreme values: correlation, Values close to 0: no correlation |
Harmonic mean of P_cor and S_cor | [-1,1] | Extreme values: correlation, Values close to 0: no correlation |
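
As a rough illustration of how the three metrics relate, the sketch below computes them for a list of gold standard scores and a list of predicted scores using only the standard library. The function names are hypothetical, and the handling of degenerate cases (returning 0.0 for constant input or when P_cor + S_cor is zero) is an assumption, not documented behaviour of the framework.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0.0 or std_y == 0.0:
        return 0.0  # assumption: treat constant input as uncorrelated
    return cov / (std_x * std_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def harmonic_mean(p_cor: float, s_cor: float) -> float:
    """Harmonic mean of the two coefficients."""
    if p_cor + s_cor == 0.0:
        return 0.0  # assumption: avoid division by zero
    return 2 * p_cor * s_cor / (p_cor + s_cor)
```

For example, with gold scores `[1, 2, 3, 4, 5]` and predictions `[1, 4, 9, 16, 25]`, the ranking is identical, so S_cor reaches its maximum, while P_cor is slightly lower because the relationship is not linear.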