Use evaluation loop #1543


Draft: mberr wants to merge 20 commits into master

Conversation

mberr (Member) commented May 11, 2025

Update the code to use EvaluationLoop instead of Evaluator.evaluate.

EvaluationLoop uses a PyTorch dataset and dataloader to improve the overlap of communication and computation. In the filtered setting, EvaluationLoop precomputes the filter positions once and re-uses them in subsequent evaluations.

Also implement support for slicing.
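
In rough strokes, the intended usage looks like the following sketch (the import path and the exact constructor/`evaluate()` signatures are assumptions; the keyword arguments mirror the diff excerpt below):

```python
# sketch only: the LCWAEvaluationLoop name and import path are assumptions
# based on this PR; the keyword arguments are the ones visible in the diff
from pykeen.evaluation import LCWAEvaluationLoop

loop = LCWAEvaluationLoop(
    model=model_instance,                 # a trained PyKEEN model
    triples_factory=evaluation_factory,   # factory holding the evaluation triples
    evaluator=evaluator_instance,         # e.g., a RankBasedEvaluator
)
# the loop wraps the triples in a PyTorch Dataset/DataLoader, so batch
# preparation (incl. cached filter positions) can overlap with model scoring
results = loop.evaluate(batch_size=None)  # None: determine a maximal batch size automatically
```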

    model=model_instance,
    triples_factory=evaluation_factory,
    evaluator=evaluator_instance,
    # TODO: mode support?
cthoyt (Member) commented on the snippet above:

which mode is already considered the "default"?

maybe just make this an argument, and raise a NotImplementedError if it's set to anything else (for now)

mberr (Member, Author):

None is the default (which corresponds to the transductive setting).

cthoyt (Member):

on second look, the mode is passed through to the evaluator instance, so the way to support this is during construction of that instance

"""
self.model = model
self.evaluator = evaluator
self.dataset = dataset
if mode is not None:
raise NotImplementedError(f"non-transductive evaluation mode is not yet implemented: {mode}")
mberr (Member, Author):

The evaluation loop supports it. The only issue is that we don't expose the mode through the pipeline. However, this may be acceptable since adding support for inductive models to the pipeline could make it more difficult to determine which parameter combinations to provide.
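
For illustration, direct (non-pipeline) use could then look like this sketch (the mode literals follow PyKEEN's InductiveMode convention of "training"/"validation"/"testing"; all instance names are placeholders):

```python
# sketch: hand the mode to the loop directly instead of going through the
# pipeline; mode=None (the default) corresponds to the transductive setting
loop = LCWAEvaluationLoop(
    model=inductive_model,              # placeholder: an inductive model
    triples_factory=inference_factory,  # placeholder: inference triples
    evaluator=evaluator_instance,
    mode="testing",                     # assumed InductiveMode literal
)
results = loop.evaluate()
```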

cthoyt (Member) commented May 14, 2025:

thanks! I noticed I misunderstood this and updated it after the first commit.

There was definitely some discussion somewhere about how to document exposing the inductive mode via the pipeline (#1503 (comment), moved to its own issue in #1544).

I wouldn't block this PR on solving that problem.

@@ -169,12 +175,16 @@ def evaluate(
batch_size = determine_maximum_batch_size(
    batch_size=batch_size, device=self.model.device, maximum_batch_size=len(self.dataset)
)
# set upper limit for slice size
# TODO: if we knew the targets here, we could guess this better
slice_size = max(self.model.num_entities, self.model.num_relations)
cthoyt (Member) commented on the hunk above:

there are parts of the pipeline that actually try to pass slice_size into this function (I am doing an experiment with TypedDicts locally, which would make this PR much messier, but I found this)

so maybe slice_size should be an Optional[int], where this gets calculated if it's not passed directly (or does this blow up AMO?)
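
A sketch of what that could look like (signatures abridged; the fallback mirrors the hunk above, nothing here is final):

```python
from typing import Optional

def evaluate(self, batch_size: Optional[int] = None, slice_size: Optional[int] = None):
    """Evaluate, determining batch and slice sizes when not given."""
    batch_size = determine_maximum_batch_size(
        batch_size=batch_size, device=self.model.device, maximum_batch_size=len(self.dataset)
    )
    if slice_size is None:
        # upper limit as in the hunk above; with known targets, this could
        # be tightened to the relevant count only
        slice_size = max(self.model.num_entities, self.model.num_relations)
    ...
```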

mberr (Member, Author):

a7d9231

> (or does this blow up AMO?)

The function that is wrapped by AMO needs to be passed an int.
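
In other words, the Optional has to be resolved before the AMO-wrapped function is invoked; a sketch of that pattern (the inner method name is hypothetical):

```python
from typing import Optional

def evaluate(self, slice_size: Optional[int] = None):
    if slice_size is None:
        # resolve the Optional here: the AMO-wrapped function must always
        # receive a concrete int, which AMO can then shrink on OOM
        slice_size = max(self.model.num_entities, self.model.num_relations)
    return self._evaluate(slice_size=slice_size)  # hypothetical AMO-wrapped inner method
```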

cthoyt mentioned this pull request on May 16, 2025.