[WIP] Fit api with sacred config #1331


Merged: 3 commits merged into dmlc:estimator on Jun 16, 2020

Conversation

zhreshold
Member

An alternative implementation related to #1323, which in my opinion is the better of the two:

  • Easier to reuse common configurations (ingredients)
  • Much simpler config file and command-line integration during the experimenting phase
  • Prettier config with comments preserved!

Example

import os
import tempfile

from sacred import Experiment

# coco_detection and train_hyperparams are sacred Ingredient instances
# defined in a shared module, so common configs can be reused across estimators
ex = Experiment('center_net_default', ingredients=[coco_detection, train_hyperparams])

@coco_detection.config
def update_coco_detection():
    data_shape = (512, 512)  # override coco config for center_net

@coco_detection.capture
def load_dataset(root, train_splits, valid_splits, valid_skip_empty, data_shape, cleanup):
    train_dataset = COCODetection(root=os.path.join(root, 'coco'), splits=train_splits)
    val_dataset = COCODetection(root=os.path.join(root, 'coco'),
                                splits=valid_splits, skip_empty=valid_skip_empty)
    val_metric = COCODetectionMetric(val_dataset,
                                     tempfile.NamedTemporaryFile('w', delete=False).name,
                                     cleanup=cleanup,
                                     data_shape=data_shape,
                                     post_affine=get_post_transform)
    return train_dataset, val_dataset, val_metric

@ex.config
def model():
    base_network = 'dla34_deconv_dcnv2'

@set_default(ex)
class CenterNetEstimator(BaseEstimator):
    def __init__(self, config, logger=None, logdir=None):
        super(CenterNetEstimator, self).__init__(config, logger, logdir)
        # print(self._cfg)

    def _fit(self):
        pass

@ex.automain
def main(_config, _log):
    # main is the commandline entry for user w/o coding
    c = CenterNetEstimator(_config, _log)
    c.fit()

One time configuration

First of all, a util function that returns the default arguments for this particular estimator is provided, so that you don't have to repetitively write the docs for each argument. There is, however, no good solution yet for injecting comments into the docstring or yaml file.

Automatic docstring generation

A decorator for injecting the default arguments into the BaseEstimator, together with a number of helper functions, is provided. The __doc__ is generated by reading the default values from the config.
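A minimal sketch of the docstring-injection idea, assuming a plain dict of defaults stands in for the sacred config (the actual set_default decorator in this PR reads from the Experiment itself):

```python
def set_default(defaults):
    """Return a class decorator that appends `defaults` to the class __doc__."""
    def decorator(cls):
        lines = ["", "Default configurations:", "-" * 10]
        for key, value in defaults.items():
            lines.append("  %s = %r" % (key, value))
        cls.__doc__ = (cls.__doc__ or "") + "\n".join(lines)
        return cls
    return decorator

@set_default({"base_network": "dla34_deconv_dcnv2", "batch_size": 32})
class CenterNetEstimator:
    """CenterNet Estimator."""

print(CenterNetEstimator.__doc__)
```

This is how help(estimator) can stay in sync with the config without hand-written parameter docs.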

For example, call help(estimator) of above example will produce:

class CenterNetEstimator(gluoncv.pipelines.base.BaseEstimator)
 |  CenterNetEstimator(config=None, reporter=None, logdir=None)
 |
 |  CenterNet Estimator.
 |
Parameters
----------
config : str, dict
  Config used to override default configurations.
  If `str`, assume config file (.yml, .yaml) is used.
logger : logger, default is `None`.
  If not `None`, will use default logging object.
logdir : str, default is None.
  Directory for saving logs. If `None`, current working directory is used.

Default configurations:
----------
  base_network = 'dla34_deconv_dcnv2'
  coco_detection:
    cleanup = True
    data_shape = [512, 512]          # override coco config for center_net
    root = '/Users/zhiz/.mxnet/datasets'
    train_splits = 'instances_train2017'
    valid_skip_empty = False
    valid_splits = 'instances_val2017'
  train_hyperparams:
    batch_size = 32
    epochs = 3
    gpus = [0, 1, 2, 3]              # gpu individual ids, not necessarily consecutive
    num_workers = 16                 # cpu workers, the larger the more processes used
    pretrained_base = True           # use pre-trained weights from ImageNet
    resume = ''
    start_epoch = 0
    transfer_from = None

Parse configurations from yaml file or command line

The automain entry makes it easy to accept a config file or command-line overrides:

python -m gluoncv.pipelines.estimator.center_net print_config with config.yaml

which produces

Configuration (modified, added, typechanged, doc):
  base_network = 'dla34_deconv_dcnv2'
  config_filename = 'config1.yaml'
  seed = 927480057                   # the random seed for this experiment
  coco_detection:
    cleanup = True
    data_shape = [512, 512]          # override coco config for center_net
    root = '/Users/zhiz/.mxnet/datasets'
    train_splits = 'instances_train2017'
    valid_skip_empty = False
    valid_splits = 'instances_val2017'
  train_hyperparams:
    batch_size = 32
    epochs = 3
    gpus = [0, 1, 2, 3]              # gpu individual ids, not necessarily consecutive
    num_workers = 16                 # cpu workers, the larger the more processes used
    pretrained_base = True           # use pre-trained weights from ImageNet
    resume = ''
    start_epoch = 0
    transfer_from = None

or with custom overrides

python -m gluoncv.pipelines.estimator.center_net print_config with 'train_hyperparams.batch_size=128'
Configuration (modified, added, typechanged, doc):
  base_network = 'dla34_deconv_dcnv2'
  config_filename = 'config1.yaml'
  seed = 927480057                   # the random seed for this experiment
  coco_detection:
    cleanup = True
    data_shape = [512, 512]          # override coco config for center_net
    root = '/Users/zhiz/.mxnet/datasets'
    train_splits = 'instances_train2017'
    valid_skip_empty = False
    valid_splits = 'instances_val2017'
  train_hyperparams:
    batch_size = 128
    epochs = 3
    gpus = [0, 1, 2, 3]              # gpu individual ids, not necessarily consecutive
    num_workers = 16                 # cpu workers, the larger the more processes used
    pretrained_base = True           # use pre-trained weights from ImageNet
    resume = ''
    start_epoch = 0
    transfer_from = None
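The dotted-override semantics above can be illustrated with a simplified parser; this is only a sketch of what sacred does internally (its real parser also handles named configs, config files, and more):

```python
import ast

def apply_override(config, override):
    """Apply a sacred-style 'a.b=value' override to a nested config dict."""
    dotted, _, raw = override.partition("=")
    keys = dotted.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})  # walk/create the nested sections
    try:
        node[keys[-1]] = ast.literal_eval(raw)  # numbers, lists, quoted strings
    except (ValueError, SyntaxError):
        node[keys[-1]] = raw                    # fall back to a bare string

config = {"train_hyperparams": {"batch_size": 32, "epochs": 3}}
apply_override(config, "train_hyperparams.batch_size=128")
print(config["train_hyperparams"]["batch_size"])  # 128
```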

Finalize config and dump to yaml file

To ensure that we don't modify the config by accident, we can call finalize_config; this will also save the yaml file to logdir for reference.
The generated 'xxx_module.xxxEstimator-06-02-2020-23-12-03.yml' file looks like this:

base_network: dla34_deconv_dcnv2
coco_detection:
  cleanup: true
  data_shape:
  - 1024
  - 512
  root: /Users/zhiz/.mxnet/datasets
  train_splits: instances_train2017
  valid_skip_empty: false
  valid_splits: instances_val2017
seed: 772107811
train_hyperparams:
  batch_size: 32
  epochs: 3
  gpus:
  - 0
  - 1
  - 2
  - 3
  - 4
  num_workers: 32
  pretrained_base: true
  resume: ''
  start_epoch: 0
  transfer_from: null
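The freezing half of finalize_config can be sketched with a read-only mapping; this shows the "no accidental modification" idea only, and omits the yaml dump to logdir:

```python
from types import MappingProxyType

def finalize_config(config):
    """Return a read-only view of a nested config dict."""
    frozen = {}
    for key, value in config.items():
        # recursively freeze nested sections so no level is writable
        frozen[key] = finalize_config(value) if isinstance(value, dict) else value
    return MappingProxyType(frozen)

cfg = finalize_config({"base_network": "dla34_deconv_dcnv2",
                       "train_hyperparams": {"batch_size": 32}})
print(cfg["train_hyperparams"]["batch_size"])  # 32
# cfg["base_network"] = "resnet50"  # would raise TypeError: item assignment not supported
```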

@zhreshold
Member Author

@Jerryzcn @chongruo Please leave your comments on the pros and cons of these two solutions; I personally prefer this over #1323.


class BaseEstimator:
    def __init__(self, config, logger=None, logdir=None):
        self._ex.add_config(config)
Contributor

can we add some comment regarding set_default?

@Jerryzcn (Contributor) left a comment

we could have some comment regarding BaseEstimator's set_default()

@chongruo
Contributor

chongruo commented Jun 4, 2020

I wonder if it supports multilevel inheritance?

When users need to change some hyper-parameters, are they allowed to create a new yaml file? Thanks.

@zhreshold
Member Author

@chongruo It supports multilevel inheritance and multiple ways to override the configs. A new yaml file is allowed, with partial or full config overriding.
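The partial-override behavior can be sketched as a recursive dict merge; this only illustrates the semantics (sacred performs the merge internally when you pass `with config.yaml`), with the partial yaml shown here as an already-loaded dict:

```python
def merge_config(defaults, overrides):
    """Recursively merge a partial override dict onto the defaults."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # merge nested sections
        else:
            merged[key] = value                             # leaf values replace defaults
    return merged

defaults = {"base_network": "dla34_deconv_dcnv2",
            "train_hyperparams": {"batch_size": 32, "epochs": 3}}
partial = {"train_hyperparams": {"batch_size": 128}}  # a partial-override yaml, loaded
cfg = merge_config(defaults, partial)
print(cfg["train_hyperparams"])  # {'batch_size': 128, 'epochs': 3}
```

Keys not mentioned in the new yaml keep their default values, so a user's override file can be as small as a single entry.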

@zhreshold
Member Author

The doc is available here: https://sacred.readthedocs.io/en/stable/configuration.html

self._ex.add_config(config)
r = self._ex.run('_get_config', options={'--loglevel': 50})
self._cfg = r.config
self._log = logger if logger is not None else logging.getLogger(__name__)
Contributor

logger?

Member Author

instead of logging.info you can always use self._log.info

@zhreshold zhreshold changed the base branch from master to estimator June 16, 2020 21:29
@zhreshold zhreshold merged commit c76ab24 into dmlc:estimator Jun 16, 2020
zhreshold added a commit that referenced this pull request Nov 2, 2020
* [WIP] Fit api with sacred config (#1331)

* config using sacred

* update

* update base

* allow attribute access, add warning to config modification (#1348)

* Faster R-CNN estimator (#1338)

* move rcnn forward backward task to model zoo

* revert #1249

* fix

* fix

* docstring

* fix style

* add docs

* faster rcnn estimator

* refactor

* move dataset to init

* lint

* merge

* disable sacred config for now

* logger fix

* fix fit

* update full centernet example (#1349)

* Autogluon Integration (#1355)

* move rcnn forward backward task to model zoo

* revert #1249

* fix

* fix

* docstring

* fix style

* add docs

* faster rcnn estimator

* refactor

* move dataset to init

* lint

* merge

* disable sacred config for now

* logger fix

* fix fit

* autogluon integration

* fix small bug. training working

* lint

* sacred config for faster rcnn (#1358)

* move rcnn forward backward task to model zoo

* revert #1249

* fix

* fix

* docstring

* fix style

* add docs

* faster rcnn estimator

* refactor

* move dataset to init

* lint

* merge

* disable sacred config for now

* logger fix

* fix fit

* autogluon integration

* fix small bug. training working

* lint

* sacred config for faster rcnn

* Finish centernet fit estimator (#1359)

* add voc detection pipeline

* update

* fix errors

* Add docs for Faster R-CNN config (#1361)

* move rcnn forward backward task to model zoo

* revert #1249

* fix

* fix

* docstring

* fix style

* add docs

* faster rcnn estimator

* refactor

* move dataset to init

* lint

* merge

* disable sacred config for now

* logger fix

* fix fit

* autogluon integration

* fix small bug. training working

* lint

* sacred config for faster rcnn

* add config docs

* Estimator rcnn (#1366)

* move rcnn forward backward task to model zoo

* revert #1249

* fix

* fix

* docstring

* fix style

* add docs

* faster rcnn estimator

* refactor

* move dataset to init

* lint

* merge

* disable sacred config for now

* logger fix

* fix fit

* autogluon integration

* fix small bug. training working

* lint

* sacred config for faster rcnn

* add config docs

* move all logging into base estimator logdir

* raise key error when config is not found in freezed config (#1367)

* auto object detection refactor (#1371)

* auto object detection refactor

* change logdir and common config

* centernet config change

* autogluon

* fix auto object detection task

* add mask_rcnn estimator (#1398)

* fix typo (#1378)

Co-authored-by: Ubuntu <ubuntu@ip-172-31-0-100.us-west-2.compute.internal>

* add ssd estimator (#1394)

* add ssd estimator

* modify ssd estimator

* provide options to customize network structure

* minor changes

* minor changes

* add auto detection using ssd

* add auto detection using ssd

* add custom model for ssd

* minor changes

* add tiny dataset for testing; fix errors in training auto detector

* [WIP] Add estimator(yolo) (#1380)

* yolo

* yolo

* add yolo

* [chore] update name, add fit script

Co-authored-by: Ubuntu <ubuntu@ip-172-31-0-100.us-west-2.compute.internal>

* auto register args (#1419)

* Auto detector (#1425)

* auto detector

* auto detector

* auto detector

* auto detector

* auto detector

* auto detector

Co-authored-by: Joshua Z. Zhang <cheungchih@gmail.com>

* update auto detection (#1436)

* auto detector

* auto detector

* auto detector

* auto detector

* auto detector

* auto detector

* update auto detection

* add a framework for automatic suggestion of hyperparameter search space

* 1) change auto_resume default to False; 2) change input config to estimator is a pure dict

* [fix] bugs for test_auto_detection (#1438)

Co-authored-by: Ubuntu <ubuntu@ip-172-31-31-97.us-west-2.compute.internal>

* add cls (#1381)

* update auto suggest (#1443)

* update auto suggest

* update auto suggest

* fix yolo errors (#1445)

* remove dependencies on AutoGluon non-core functions (#1452)

* remove dependency to autogluon non-core functions

* fix errors on importing estimators

* [Lint] fix pylint for estimator branch (#1451)

* fix pylint

* update mxnet build

* remove py2

* remove py2.yml

* fix jenkinsfile

* fix post_nms in rcnn

* fix

* fix doc build

* fix lint con't

* add sacred

* no tutorial yet

* estimator prototype for save/load (#1458)

* prototype for save/load

* add type check

* handle ctx

* fix

* collect

* fix classmethod self

* fix

* pickle only init args

* cast to numpy to avoid ctypes

* fix get data

* base estimator

* [WIP] Add detailed logging information for auto estimator (#1470)

* add detailed logging information for auto estimator

* fix lint error

* [WIP] Estimator data (#1471)

* dataframe for object detection

* fix pack and unpack for bboxes

* update

* refactor fit

* fix

* update pickle behavior

* update

* fix __all__

* Dataset as class property not module

* fix centernet, add image classification dataset

* fix

* fix

* fix logger not inited before init_network

* reuse weights from known classes

* add predict

* fix index

* format returned prediction

* fix id to int

* improve predict with pd.dataframe

* add numpy

* reset index

* clean up

* update image classification dataset

* dataset improvements

* valid url checker

* setup.py improve

* fix

* fix import utils

* add display to object detection

* fix

* change fit functions

* add coco import

* fix lint

* fix lint

* fix lint

* fix

* Estimator con't improvements (#1484)

* allow ssd/faster-rcnn to take in train/val dataset

* update

* fix

* update ssd

* fix ctx

* fix ctx

* fix self.datasets

* fix self.epoch

* remove async_net

* fix predict

* debug predict

* fix predict scores

* filter out invalid predictions

* fix faster_rcnn

* fix

* fix

* fix deepcopy

* fix fpn anchor generator

* fix ctx

* fix frcnn predict

* fix

* fix skipping logic

* fix yolo3

* fix import

* fix rename yoloestimator

* fix import

* fix yolo3 train

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix ctx

* fix trainer

* fix num_class < 5 for topk

* fix unpickable batch_fn

* fix print

* add predict

* fix cls predict

* fix cls predict

* fix cls predict

* fix cls predict

* improve auto fit

* improve auto fit

* fix

* fix

* fix

* fix

* fix

* debug

* fix

* fix

* fix

* fix

* fix

* fix

* fix reporter pickle

* change epochs to smaller

* update image cls search space

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* replace sacred with autocfg

* fix

* fix tuple type

* fix

* fix

* fix

* clean up

* remove sacred

* fix import

* fix import

* add types

* fix

* fix

* defaults for object detection

* fix

* fix

* update image classification

* change lr

* update

* Fix pylint

* Fix pylint

* fit summary

* pprint summary

* fix

* update

* fix single trial

* fix sample_config

* fix sample_config

* fix sample_config

* fix lint

* fix lint

* adjust batch size

* fix

* stacktrace

* fix

* fix traceback

* fix traceback

* fix train evaluation

* default networks

* default networks

* improves

* fix

* fix lint

Co-authored-by: tmwangcas <tmwang428@outlook.com>

* update script to master

* add unittests for auto

* update conda

* pin autogluon

* fix test

* fix

* fix ssd/yolo

* fix

* update defaults

* fix kv_store being overwriten

* fix rcnn batch size

Co-authored-by: Jerry Zhang <zhangz6@cs.washington.edu>
Co-authored-by: Tianming Wang <tmwang428@outlook.com>
Co-authored-by: Chongruo Wu <chongruo@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-0-100.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-31-97.us-west-2.compute.internal>