Description
When attempting to load the model state into TrainEvalModel, a RuntimeError occurs due to a size mismatch for the parameter eval_classifier.class_means. Specifically, the checkpoint contains a parameter with shape torch.Size([8, 512]), while the current model expects a parameter with shape torch.Size([7, 512]).
The issue arises because the number of classes in the current model does not match the number of classes in the checkpoint.
Error:
avalanche\training\losses.py", line 61, in after_training_exp
    self.old_model.load_state_dict(strategy.model.state_dict())
RuntimeError: Error(s) in loading state_dict for TrainEvalModel:
    size mismatch for eval_classifier.class_means: copying a param with shape torch.Size([8, 512]) from checkpoint, the shape in current model is torch.Size([7, 512]).
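The same kind of failure can be reproduced outside Avalanche. The sketch below is only an illustration: ToyEvalClassifier is a hypothetical stand-in, and it assumes class_means behaves like a registered tensor with one 512-dimensional row per class seen so far. It is not the Avalanche implementation.

```python
import torch
from torch import nn


class ToyEvalClassifier(nn.Module):
    """Hypothetical stand-in: one 512-d mean vector per class seen so far."""

    def __init__(self, num_seen_classes: int, feature_dim: int = 512):
        super().__init__()
        # Registered as a buffer so that it is included in state_dict().
        self.register_buffer(
            "class_means", torch.zeros(num_seen_classes, feature_dim)
        )


# Snapshot created when only 7 classes had been observed.
old_model = ToyEvalClassifier(num_seen_classes=7)
# Current model after an 8th class has been observed.
current_model = ToyEvalClassifier(num_seen_classes=8)

try:
    # Same call pattern as in after_training_exp.
    old_model.load_state_dict(current_model.state_dict())
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)
```

The shapes in the printed message follow the same pattern as the traceback: the incoming state has 8 rows while the existing copy still has 7.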
Current Setup:
My dataset has 8 classes of 224x224 RGB images.
The model uses a ResNet18 backbone with a fully connected layer for classification.
The iCaRL strategy is used for continual learning, with eval_classifier being part of the TrainEvalModel.
The issue occurs because eval_classifier.class_means is dynamically sized based on the number of classes observed.
resnet_model.txt
TrainEvalModel.txt
Code
self.model = lib.torchvision.models.resnet18(weights=lib.torchvision.models.ResNet18_Weights.DEFAULT)
self.model.fc = lib.nn.Linear(self.model.fc.in_features, 8)
self.model.to(const.DEVICE)
self.optimizer = lib.optim.Adam(self.model.parameters(), lr=0.001)
self.criterion = lib.nn.CrossEntropyLoss()
self.strategy = lib.ICaRL(
    feature_extractor=lib.torch.nn.Sequential(
        *list(self.model.children())[:-1],
        lib.torch.nn.Flatten(start_dim=1)
    ),
    classifier=self.model.fc,
    optimizer=self.optimizer,
    memory_size=2000,
    buffer_transform=None,
    fixed_memory=True,
    train_mb_size=train_mb_size,
    eval_mb_size=eval_mb_size,
    train_epochs=train_epochs,
    device=const.DEVICE,
    evaluator=self.evaluator,
    plugins=[lib.ICaRLLossPlugin()]
)
The problem starts in avalanche\training\losses.py", line 61:
def after_training_exp(self, strategy, **kwargs):
    if self.old_model is None:
        old_model = copy.deepcopy(strategy.model)
        self.old_model = old_model.to(strategy.device)
    self.old_model.load_state_dict(strategy.model.state_dict())
    self.old_classes += np.unique(strategy.experience.dataset.targets).tolist()
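One possible workaround is to take a fresh deep copy of the model after every experience instead of loading the new state into the old copy, so the snapshot always has the current tensor shapes. The sketch below uses a hypothetical ToyModel and a hypothetical helper snapshot_model. Whether this is an acceptable change for ICaRLLossPlugin is an assumption and has not been confirmed against Avalanche.

```python
import copy

import torch
from torch import nn


class ToyModel(nn.Module):
    """Hypothetical model whose class_means buffer grows with new classes."""

    def __init__(self, feature_dim: int = 512):
        super().__init__()
        self.register_buffer("class_means", torch.zeros(0, feature_dim))

    def observe_classes(self, total_seen: int) -> None:
        # Replace the buffer with one row per class seen so far.
        feature_dim = self.class_means.shape[1]
        self.class_means = torch.zeros(total_seen, feature_dim)


def snapshot_model(model: nn.Module, device: str = "cpu") -> nn.Module:
    """Return an independent copy of `model` with its current tensor shapes."""
    return copy.deepcopy(model).to(device)


model = ToyModel()

model.observe_classes(7)
old_model = snapshot_model(model)  # after the first experience: 7 rows

model.observe_classes(8)
old_model = snapshot_model(model)  # after the second experience: 8 rows

print(old_model.class_means.shape)
```

The trade-off is an extra full copy of the model per experience, in exchange for never calling load_state_dict on a copy whose shapes may be stale.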
Additional context:
Python version: 3.11.5
PyTorch version: 2.6.0+cu118
Avalanche version: 0.6.0
Operating system: Windows 11