PartitionRepresentation

Bases: Representation

A partition of the indices into different representation modules.

Each index is mapped to an index in exactly one of the base representations. This is useful, e.g., when one base representation cannot provide vectors for all indices and another representation serves as a backup.
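Conceptually, a lookup is a two-step routing: read the (base_id, local_id) pair for a global index, then fetch row local_id from base base_id. The following is a hypothetical, loop-based sketch of that semantics. It uses plain Python lists in place of real representations, and partition_lookup is an illustrative name, not a PyKEEN function. PyKEEN itself works on tensors and is presumably vectorized.

```python
# Hypothetical sketch of the partition lookup semantics (not PyKEEN code).
# Plain lists stand in for base representations: base[local_id] is a vector.
from typing import List, Sequence, Tuple

Vector = List[float]


def partition_lookup(
    assignment: Sequence[Tuple[int, int]],
    bases: Sequence[Sequence[Vector]],
    indices: Sequence[int],
) -> List[Vector]:
    """Return the vector of each global index by routing it to its base."""
    result = []
    for index in indices:
        base_id, local_id = assignment[index]  # which base, which row in it
        result.append(bases[base_id][local_id])
    return result


# base 0 holds 2 rows, base 1 holds 3 rows; row values mirror the global ids
label_base = [[1.0, 1.0], [4.0, 4.0]]
trainable_base = [[0.0, 0.0], [2.0, 2.0], [3.0, 3.0]]
assignment = [(1, 0), (0, 0), (1, 1), (1, 2), (0, 1)]
print(partition_lookup(assignment, [label_base, trainable_base], [0, 1, 4]))
# [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
```

The row values were chosen to equal the global index they end up serving, so it is easy to check that every index is routed to the right base and row.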

Consider the following example: we only have textual information for two entities. For these, we want to use features computed from the text, which should not be trained. For the remaining entities, we want to use directly trainable embeddings.

"""The partition representation."""

import torch

from pykeen.nn import Embedding, PartitionRepresentation, init
from pykeen.pipeline import pipeline
from pykeen.triples.generation import generate_triples_factory

num_entities = 5

# create embedding from label encodings
labels = {1: "a first description", 4: "a second description"}
label_initializer = init.LabelBasedInitializer(labels=list(labels.values()))
label_repr = label_initializer.as_embedding()
shape = label_repr.shape

# create a simple embedding matrix for all remaining ones
non_label_repr = Embedding(max_id=num_entities - len(labels), shape=shape)

# compose partition representation with a hard-coded assignment
assignment = torch.as_tensor([(1, 0), (0, 0), (1, 1), (1, 2), (0, 1)])
entity_repr = PartitionRepresentation(assignment=assignment, bases=[label_repr, non_label_repr])

# For brevity, we use randomly generated triples factories here instead of actual data
training = generate_triples_factory(num_entities=num_entities, num_relations=5, num_triples=31)
testing = generate_triples_factory(num_entities=num_entities, num_relations=5, num_triples=17)

# we can use this to train a model
pipeline(
    interaction="distmult",
    dimensions={"d": shape[0]},
    model_kwargs=dict(entity_representations=entity_repr),
    training=training,
    testing=testing,
)
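In the example above, the hard-coded assignment must be kept consistent with the keys of labels by hand. A small helper can derive it instead. The following is a sketch; assignment_from_labels is a hypothetical name (not part of PyKEEN). Like the example, it assumes that base 0 is the label-based representation, whose rows follow the insertion order of labels, and that base 1 is the trainable embedding for all other entities.

```python
# Hypothetical helper (not part of PyKEEN): derive the (base_id, local_id)
# assignment from the `labels` mapping instead of hard-coding it.
from typing import Dict, List, Tuple


def assignment_from_labels(num_entities: int, labels: Dict[int, str]) -> List[Tuple[int, int]]:
    # base 0: label-based rows, in the same order as list(labels.values())
    label_local = {entity_id: local_id for local_id, entity_id in enumerate(labels)}
    assignment = []
    next_free = 0  # next unused row of base 1 (the trainable embedding)
    for entity_id in range(num_entities):
        if entity_id in label_local:
            assignment.append((0, label_local[entity_id]))
        else:
            assignment.append((1, next_free))
            next_free += 1
    return assignment


labels = {1: "a first description", 4: "a second description"}
print(assignment_from_labels(5, labels))
# [(1, 0), (0, 0), (1, 1), (1, 2), (0, 1)]
```

The result matches the hard-coded assignment of the example and can be passed to torch.as_tensor in place of the literal list.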

Note

For this simple but frequently occurring case, we provide a more convenient, specialized BackfillRepresentation.

Initialize the representation.

Warning

The base representations must all have the same shape.
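The requirement on the bases' shapes can be checked before any lookup happens. The following is a hypothetical sketch of such a check (common_shape is an illustrative name, not a PyKEEN function). It assumes each base's shape is available as a tuple, e.g. from its .shape attribute.

```python
# Hypothetical sketch of the shape check from the warning (not PyKEEN code).
from typing import Sequence, Tuple


def common_shape(base_shapes: Sequence[Tuple[int, ...]]) -> Tuple[int, ...]:
    """Return the shape shared by all bases, or raise if they differ."""
    if not base_shapes:
        raise ValueError("At least one base representation is required.")
    shapes = set(base_shapes)
    if len(shapes) != 1:
        raise ValueError(f"Base representations have inconsistent shapes: {sorted(shapes)}")
    return shapes.pop()


print(common_shape([(32,), (32,)]))  # (32,)
```

Calling common_shape([(32,), (16,)]) raises a ValueError, because a single partitioned representation cannot return vectors of different shapes.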

Parameters:

assignment (Tensor) – shape: (max_id, 2). The assignment, as tuples (base_id, local_id), where base_id is the index of the base representation and local_id is the index used to look up the vector in that base representation.
shape (tuple[int, ...]) – the shape of an individual representation. If provided, it must match the bases’ shape.
bases (OneOrSequence[HintOrType[Representation]]) – the base representations, or hints thereof.
bases_kwargs (OneOrSequence[OptionalKwargs]) – keyword-based parameters to instantiate the base representations.
kwargs – additional keyword-based parameters passed to Representation.__init__(). Must not contain max_id or shape, which are inferred from the assignment and the base representations.
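The parameter list implies that max_id follows from the number of rows of assignment and that shape follows from the bases. The following hypothetical sketch shows how the arguments forwarded to Representation.__init__() could be assembled under these assumptions. infer_init_kwargs is an illustrative name, and PyKEEN's actual checks may differ.

```python
# Hypothetical sketch (not PyKEEN code): assemble the keyword arguments for
# Representation.__init__() from the assignment and the bases' shape.
from typing import Any, Dict, Optional, Sequence, Tuple


def infer_init_kwargs(
    assignment: Sequence[Tuple[int, int]],
    base_shape: Tuple[int, ...],
    shape: Optional[Tuple[int, ...]] = None,
    **kwargs: Any,
) -> Dict[str, Any]:
    if "max_id" in kwargs:
        raise ValueError("max_id is inferred from the assignment and must not be passed")
    if shape is not None and tuple(shape) != tuple(base_shape):
        raise ValueError(f"shape={shape} does not match the bases' shape {base_shape}")
    # one assignment row per global index, so max_id is the number of rows
    return dict(kwargs, max_id=len(assignment), shape=tuple(base_shape))


print(infer_init_kwargs([(1, 0), (0, 0), (1, 1)], (8,), shape=(8,)))
# {'max_id': 3, 'shape': (8,)}
```

Because shape is an explicit parameter here, it can never end up in kwargs; only max_id needs an explicit guard.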

Raises:

ValueError – if any of the inputs is invalid
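The page does not list the individual checks. The following is a hypothetical sketch of assignment checks that would reasonably lead to such a ValueError. validate_assignment is an illustrative name, and base_max_ids[b] is assumed to be the number of rows of base b.

```python
# Hypothetical sketch of assignment validation (not PyKEEN code).
from typing import Sequence, Tuple


def validate_assignment(
    assignment: Sequence[Tuple[int, ...]],
    base_max_ids: Sequence[int],
) -> None:
    for index, pair in enumerate(assignment):
        if len(pair) != 2:
            raise ValueError(f"assignment[{index}] must be a (base_id, local_id) pair, got {pair}")
        base_id, local_id = pair
        # the base index must refer to an existing base representation
        if not 0 <= base_id < len(base_max_ids):
            raise ValueError(f"assignment[{index}] refers to unknown base {base_id}")
        # the local index must be a valid row of that base
        if not 0 <= local_id < base_max_ids[base_id]:
            raise ValueError(
                f"assignment[{index}]: local id {local_id} is out of range "
                f"for base {base_id} with {base_max_ids[base_id]} rows"
            )


# the assignment from the example: base 0 has 2 rows, base 1 has 3 rows
validate_assignment([(1, 0), (0, 0), (1, 1), (1, 2), (0, 1)], [2, 3])
```

With the same bases, an entry such as (0, 2) would be rejected, since the label-based base only has rows 0 and 1.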

Note

The parameter pair (bases, bases_kwargs) is resolved via pykeen.nn.representation_resolver.

An explanation of resolvers and how to use them is given in https://class-resolver.readthedocs.io/en/latest/.
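The following toy sketch illustrates the general idea behind a (hints, kwargs) pair. Each hint is a name, a class, or a ready-made instance. A single hint or a single kwargs dict is broadcast to the length of the other side. Everything here (ToyEmbedding, REGISTRY, make_many) is hypothetical and only loosely modeled on the resolver idea. It is not the class_resolver or PyKEEN implementation, whose exact broadcasting rules may differ.

```python
# Toy resolver for illustration only: NOT class_resolver or PyKEEN code.
from collections.abc import Mapping
from typing import Any, Dict, List


class ToyEmbedding:
    """Stand-in for a base representation with max_id rows of shape (dim,)."""

    def __init__(self, max_id: int, shape: int) -> None:
        self.max_id = max_id
        self.shape = (shape,)


# hypothetical registry mapping normalized names to classes
REGISTRY: Dict[str, type] = {"embedding": ToyEmbedding}


def make_many(hints: Any, kwargs: Any = None) -> List[Any]:
    """Turn one-or-many hints plus one-or-many kwargs into instances."""
    hint_list = list(hints) if isinstance(hints, (list, tuple)) else [hints]
    kwargs_list = [kwargs] if kwargs is None or isinstance(kwargs, Mapping) else list(kwargs)
    # broadcast a single hint (or a single kwargs dict) to match the other side
    if len(hint_list) == 1 and len(kwargs_list) > 1:
        hint_list = hint_list * len(kwargs_list)
    elif len(kwargs_list) == 1 and len(hint_list) > 1:
        kwargs_list = kwargs_list * len(hint_list)
    if len(hint_list) != len(kwargs_list):
        raise ValueError("hints and kwargs have incompatible lengths")
    instances = []
    for hint, kw in zip(hint_list, kwargs_list):
        if isinstance(hint, str):
            hint = REGISTRY[hint.lower()]  # name -> class
        # classes are instantiated; ready-made instances are used as-is
        instances.append(hint(**(kw or {})) if isinstance(hint, type) else hint)
    return instances


bases = make_many("embedding", [dict(max_id=2, shape=8), dict(max_id=3, shape=8)])
print([(b.max_id, b.shape) for b in bases])  # [(2, (8,)), (3, (8,))]
```

With this kind of broadcasting, one hint can produce several bases that differ only in their keyword arguments, such as two embeddings with different numbers of rows.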