PartitionRepresentation
- class PartitionRepresentation(assignment: Tensor, shape: int | Sequence[int] | None = None, bases: str | Representation | type[Representation] | None | Sequence[str | Representation | type[Representation] | None] = None, bases_kwargs: Mapping[str, Any] | None | Sequence[Mapping[str, Any] | None] = None, **kwargs)
Bases: Representation
A partition of the indices into different representation modules.
Each index is assigned to an index in exactly one of the base representations. This representation is useful, e.g., when one of the base representations cannot provide a vector for every index and another representation is used as a fallback.
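To make the lookup semantics concrete, here is a minimal sketch in plain PyTorch. It assumes each base is a dense tensor of shape (base_max_id, dim) rather than a pykeen Representation, and the function partition_lookup is a hypothetical illustration, not taken from pykeen's source. Each global index is routed to its (base_id, local_id) pair, and the vector is gathered from the corresponding base.

```python
import torch


def partition_lookup(assignment: torch.Tensor, bases: list[torch.Tensor], indices: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch: look up vectors for global indices via a (base_id, local_id) assignment."""
    # split the selected rows of the assignment into base ids and local ids
    base_ids, local_ids = assignment[indices].unbind(dim=-1)
    # assumes all bases share the same vector shape (bases[0].shape[1:])
    out = torch.empty(indices.shape + bases[0].shape[1:], dtype=bases[0].dtype)
    for base_id, base in enumerate(bases):
        mask = base_ids == base_id
        # gather the rows owned by this base using their local ids
        out[mask] = base[local_ids[mask]]
    return out


# global index i -> (base_id, local_id); base 0 holds indices 1 and 4, base 1 the rest
assignment = torch.as_tensor([(1, 0), (0, 0), (1, 1), (1, 2), (0, 1)])
label_vectors = torch.tensor([[10.0, 10.0], [40.0, 40.0]])
trainable_vectors = torch.tensor([[0.0, 0.0], [2.0, 2.0], [3.0, 3.0]])
result = partition_lookup(assignment, [label_vectors, trainable_vectors], torch.arange(5))
print(result.tolist())  # [[0.0, 0.0], [10.0, 10.0], [2.0, 2.0], [3.0, 3.0], [40.0, 40.0]]
```

The loop over bases keeps the sketch readable; it issues one masked gather per base, so its cost grows with the number of bases rather than the number of indices.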
Consider the following example: We only have textual information for two entities. We want to use textual features computed from them, which should not be trained. For the remaining entities we want to use directly trainable embeddings.
```python
"""The partition representation."""

import torch

from pykeen.nn import Embedding, PartitionRepresentation, init
from pykeen.pipeline import pipeline
from pykeen.triples.generation import generate_triples_factory

num_entities = 5

# create embedding from label encodings
labels = {1: "a first description", 4: "a second description"}
label_initializer = init.LabelBasedInitializer(labels=list(labels.values()))
label_repr = label_initializer.as_embedding()
shape = label_repr.shape

# create a simple embedding matrix for all remaining ones
non_label_repr = Embedding(max_id=num_entities - len(labels), shape=shape)

# compose partition representation with a hard-coded assignment:
# entity i -> (base_id, local_id); base 0 = label_repr (entities 1, 4),
# base 1 = non_label_repr (entities 0, 2, 3)
assignment = torch.as_tensor([(1, 0), (0, 0), (1, 1), (1, 2), (0, 1)])
entity_repr = PartitionRepresentation(assignment=assignment, bases=[label_repr, non_label_repr])

# For brevity, we use randomly generated triples factories here instead of the actual data
training = generate_triples_factory(num_entities=num_entities, num_relations=5, num_triples=31)
testing = generate_triples_factory(num_entities=num_entities, num_relations=5, num_triples=17)

# we can use this to train a model
pipeline(
    interaction="distmult",
    dimensions={"d": shape[0]},
    model_kwargs=dict(entity_representations=entity_repr),
    training=training,
    testing=testing,
)
```
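Instead of hard-coding the assignment, it can be derived from the set of indices that have a dedicated vector. The helper below, backfill_assignment, is hypothetical (it is not part of pykeen); it assumes base 0 holds the listed indices in the given order and base 1 holds all remaining indices in ascending order, which is the layout used in the example above.

```python
def backfill_assignment(max_id: int, base_ids: list[int]) -> list[tuple[int, int]]:
    """Hypothetical helper: map each global index to a (base_id, local_id) pair.

    Indices listed in ``base_ids`` go to base 0 (local ids follow the list order);
    all other indices go to base 1 (local ids follow ascending global order).
    """
    position = {index: local_id for local_id, index in enumerate(base_ids)}
    assignment = []
    next_backfill_id = 0
    for index in range(max_id):
        if index in position:
            assignment.append((0, position[index]))
        else:
            assignment.append((1, next_backfill_id))
            next_backfill_id += 1
    return assignment


print(backfill_assignment(5, [1, 4]))  # [(1, 0), (0, 0), (1, 1), (1, 2), (0, 1)]
```

With the variables from the example, torch.as_tensor(backfill_assignment(num_entities, list(labels))) reproduces the hard-coded tensor, because the dictionary keys [1, 4] are in the same order as the label texts passed to the initializer.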
Note
For this simple but frequently occurring case, we provide a more convenient, specialized BackfillRepresentation.
Initialize the representation.
Warning
The base representations must all have the same shape.
- Parameters:
assignment (Tensor) – shape: (max_id, 2). The assignment, as tuples (base_id, local_id), where base_id is the index of the base representation and local_id is the index used to look up the vector within that base representation.
shape (tuple[int, ...]) – the shape of an individual representation. If provided, must match the bases’ shape
bases (OneOrSequence[HintOrType[Representation]]) – the base representations, or hints thereof.
bases_kwargs (OneOrSequence[OptionalKwargs]) – keyword-based parameters to instantiate the base representations
kwargs – additional keyword-based parameters passed to Representation.__init__(). May not contain max_id or shape: max_id is inferred from the assignment, and shape from the base representations.
- Raises:
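ValueError – if any of the inputs is invalid, e.g., if the base representations have inconsistent shapes or the assignment refers to a non-existent base representation.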
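The constraints stated above (matching base shapes, an optional shape that must agree with them, and an assignment that only references existing bases and valid local indices) can be illustrated with a small validator. This is a hypothetical sketch in plain Python, not pykeen's actual validation code; it takes the bases' shapes and max_ids directly instead of Representation objects.

```python
from __future__ import annotations

from collections.abc import Sequence


def check_partition(
    assignment: Sequence[tuple[int, int]],
    base_shapes: Sequence[tuple[int, ...]],
    base_max_ids: Sequence[int],
    shape: tuple[int, ...] | None = None,
) -> tuple[int, tuple[int, ...]]:
    """Hypothetical validator; returns the inferred (max_id, shape) or raises ValueError."""
    if not base_shapes:
        raise ValueError("Must provide at least one base representation")
    # the bases' shapes have to agree with each other ...
    if len(set(base_shapes)) != 1:
        raise ValueError(f"Inconsistent base shapes: {list(base_shapes)}")
    base_shape = base_shapes[0]
    # ... and with an explicitly provided shape
    if shape is not None and tuple(shape) != base_shape:
        raise ValueError(f"shape={shape} does not match the bases' shape {base_shape}")
    for index, (base_id, local_id) in enumerate(assignment):
        if not 0 <= base_id < len(base_shapes):
            raise ValueError(f"Index {index} refers to unknown base {base_id}")
        if not 0 <= local_id < base_max_ids[base_id]:
            raise ValueError(f"Index {index}: local id {local_id} is out of range for base {base_id}")
    # max_id comes from the assignment, shape from the bases
    return len(assignment), base_shape
```

For the assignment from the example, check_partition([(1, 0), (0, 0), (1, 1), (1, 2), (0, 1)], [(8,), (8,)], [2, 3]) returns (5, (8,)), while shrinking the second base to two entries makes the (1, 2) pair fail the range check.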
Note
The parameter pair (bases, bases_kwargs) is used for pykeen.nn.representation_resolver.
An explanation of resolvers and how to use them is given in https://class-resolver.readthedocs.io/en/latest/.