ICaRL + Runtime Error #1692

Open
@francescobarioni

Description

When attempting to load the model state into TrainEvalModel, a RuntimeError occurs due to a size mismatch for the parameter eval_classifier.class_means. Specifically, the checkpoint contains a parameter with shape torch.Size([8, 512]), while the current model expects a parameter with shape torch.Size([7, 512]).

The issue arises because the number of classes in the current model does not match the number of classes in the checkpoint.
Error:

    avalanche\training\losses.py", line 61, in after_training_exp
        self.old_model.load_state_dict(strategy.model.state_dict())
    RuntimeError: Error(s) in loading state_dict for TrainEvalModel:
        size mismatch for eval_classifier.class_means: copying a param with shape torch.Size([8, 512]) from checkpoint, the shape in current model is torch.Size([7, 512]).
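To see which entries disagree before calling load_state_dict, the two state dicts can be compared key by key. The following is a minimal sketch, not part of Avalanche: `shape_mismatches` and `FakeTensor` are hypothetical helpers, and the `classifier.weight` key is made up for illustration. It only relies on each value exposing a `.shape`, so it runs without PyTorch here.

```python
def shape_mismatches(current, checkpoint):
    """Return (key, current_shape, checkpoint_shape) for shared keys whose shapes differ."""
    mismatches = []
    for key in sorted(current.keys() & checkpoint.keys()):
        cur_shape = tuple(current[key].shape)
        ckpt_shape = tuple(checkpoint[key].shape)
        if cur_shape != ckpt_shape:
            mismatches.append((key, cur_shape, ckpt_shape))
    return mismatches


class FakeTensor:
    """Hypothetical stand-in that exposes only .shape, like a tensor would."""

    def __init__(self, *shape):
        self.shape = shape


# Shapes taken from the error message; "classifier.weight" is an invented extra key.
current = {
    "eval_classifier.class_means": FakeTensor(7, 512),
    "classifier.weight": FakeTensor(8, 512),
}
checkpoint = {
    "eval_classifier.class_means": FakeTensor(8, 512),
    "classifier.weight": FakeTensor(8, 512),
}

report = shape_mismatches(current, checkpoint)
print(report)  # [('eval_classifier.class_means', (7, 512), (8, 512))]
```

With real models, the same function should work on `self.old_model.state_dict()` and `strategy.model.state_dict()`, since tensors also expose `.shape`.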

Current Setup:
My dataset has 8 classes of 224x224 RGB images.
The model uses a ResNet18 backbone with a fully connected layer for classification.
The iCaRL strategy is used for continual learning, with eval_classifier being part of the TrainEvalModel.
The issue occurs because eval_classifier.class_means is dynamically sized based on the number of classes observed.
resnet_model.txt
TrainEvalModel.txt
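The effect of a dynamically sized class_means can be illustrated with a toy nearest-class-mean computation. This is a pure-Python sketch under my own assumptions, not Avalanche's implementation: the `class_means` function is hypothetical, it keeps one row per class seen, and it uses 2-dimensional features instead of 512 to stay short.

```python
from collections import defaultdict


def class_means(features, labels):
    """Return one mean feature vector per class seen, ordered by class id."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return [[s / counts[c] for s in sums[c]] for c in sorted(sums)]


# After seeing classes 0..6 there are 7 rows; once class 7 appears there are 8.
first = class_means([[float(c), 1.0] for c in range(7)], list(range(7)))
second = class_means([[float(c), 1.0] for c in range(8)], list(range(8)))
print(len(first), len(second))  # 7 8
```

A snapshot taken after the first call would hold 7 rows, while the later state holds 8, which is the same 7-versus-8 disagreement reported in the error.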

Code

        # ResNet18 backbone with the final layer replaced by an 8-class linear head
        self.model = lib.torchvision.models.resnet18(weights=lib.torchvision.models.ResNet18_Weights.DEFAULT)
        self.model.fc = lib.nn.Linear(self.model.fc.in_features, 8)
        self.model.to(const.DEVICE)

        self.optimizer = lib.optim.Adam(self.model.parameters(), lr=0.001)
        self.criterion = lib.nn.CrossEntropyLoss()

        self.strategy = lib.ICaRL(
            # Every ResNet18 layer except fc, flattened to (batch, 512) features
            feature_extractor=lib.torch.nn.Sequential(
                *list(self.model.children())[:-1],
                lib.torch.nn.Flatten(start_dim=1),
            ),
            classifier=self.model.fc,
            optimizer=self.optimizer,
            memory_size=2000,
            buffer_transform=None,
            fixed_memory=True,
            train_mb_size=train_mb_size,
            eval_mb_size=eval_mb_size,
            train_epochs=train_epochs,
            device=const.DEVICE,
            evaluator=self.evaluator,
            plugins=[lib.ICaRLLossPlugin()],
        )

The problem starts in avalanche\training\losses.py, line 61:

    def after_training_exp(self, strategy, **kwargs):
        if self.old_model is None:
            old_model = copy.deepcopy(strategy.model)
            self.old_model = old_model.to(strategy.device)

        self.old_model.load_state_dict(strategy.model.state_dict())

        self.old_classes += np.unique(strategy.experience.dataset.targets).tolist()
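The pattern above (copy once, then load_state_dict on every later experience) can be reproduced outside Avalanche with a toy module whose buffer grows. This is a minimal sketch, assuming the buffer is resized between the snapshot and the load: `ToyNCM` and `grow_to` are hypothetical names, not Avalanche classes. It also shows one possible workaround, taking a fresh deepcopy instead of loading into the stale copy; I have not confirmed this is the right fix for the plugin.

```python
import copy

import torch
from torch import nn


class ToyNCM(nn.Module):
    """Hypothetical stand-in for a classifier whose buffer grows with the classes seen."""

    def __init__(self, num_classes, feature_size=512):
        super().__init__()
        self.register_buffer("class_means", torch.zeros(num_classes, feature_size))

    def grow_to(self, num_classes):
        # Replace the buffer with a larger one, keeping the existing rows.
        old = self.class_means
        new = torch.zeros(num_classes, old.shape[1])
        new[: old.shape[0]] = old
        self.class_means = new


model = ToyNCM(7)
old_model = copy.deepcopy(model)  # snapshot taken while there are 7 rows
model.grow_to(8)                  # a new class appears later

try:
    old_model.load_state_dict(model.state_dict())
    size_mismatch = False
except RuntimeError as err:
    # Same kind of failure as in the report: 8 rows in the source, 7 in the target.
    size_mismatch = "size mismatch" in str(err)

# Possible workaround (an assumption, not a confirmed fix): re-copy the model
# so the snapshot always has the current buffer shape.
old_model = copy.deepcopy(model)
old_model.load_state_dict(model.state_dict())
print(size_mismatch, tuple(old_model.class_means.shape))
```

The trade-off of re-copying is an extra deepcopy per experience, in exchange for never depending on the snapshot and the live model having identical shapes.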

Additional context:
Python version: 3.11.5
PyTorch version: 2.6.0+cu118
Avalanche version: 0.6.0
Operating system: Windows 11

Labels: bug