Description
I've been playing with open_spiel's R-NaD algorithm implementation in Python and noticed some strange behavior: each time R-NaD calls state.observation_tensor(), a new state is created, observation.set_from(new_state) is called on it, and only then is observation.set_from(state) called on the original state. It also looks like the new state is not a clone of the original one.
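For reference, the call in question can also be triggered by hand, outside of R-NaD. A minimal sketch (the short name "my_game" is a placeholder for the name my game is registered under):

import pyspiel

# "my_game" is a placeholder for my game's registered short_name.
game = pyspiel.load_game("my_game")
state = game.new_initial_state()

# Step past an initial chance node, if any, before asking for the tensor.
while state.is_chance_node():
    state.apply_action(state.chance_outcomes()[0][0])

obs = state.observation_tensor(0)  # player 0's observation tensor
print(len(obs))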
My game implementation is in Python. Here is an excerpt from its configuration:
pyspiel.GameType(
    dynamics=pyspiel.GameType.Dynamics.SEQUENTIAL,
    chance_mode=pyspiel.GameType.ChanceMode.EXPLICIT_STOCHASTIC,
    information=pyspiel.GameType.Information.IMPERFECT_INFORMATION,
    utility=pyspiel.GameType.Utility.ZERO_SUM,
    reward_model=pyspiel.GameType.RewardModel.TERMINAL,
    max_num_players=_NUM_PLAYERS,
    min_num_players=_NUM_PLAYERS,
    provides_information_state_string=False,
    provides_information_state_tensor=False,
    provides_observation_string=False,
    provides_observation_tensor=True,
    provides_factored_observation_string=False,
    parameter_specification={},
)
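To watch which states reach the observer, temporary logging can be added to set_from. Below is a stripped-down sketch following the usual PyObserver shape; the class name and tensor size are placeholders, not my actual implementation, and I assume the observer is returned from the game's make_py_observer() in the usual way:

import numpy as np

_OBS_TENSOR_SIZE = 16  # placeholder; the real game computes this from its parameters

class MyGameObserver:
    """Observer conforming to the PyObserver interface (name is a placeholder)."""

    def __init__(self, iig_obs_type, params):
        del iig_obs_type, params  # simplified for the sketch
        self.tensor = np.zeros(_OBS_TENSOR_SIZE, np.float32)
        self.dict = {"observation": self.tensor}

    def set_from(self, state, player):
        # Temporary logging: id() and history_str() make it easy to tell
        # apart the state observation_tensor() was called on and any other
        # state that gets passed in here.
        print("set_from: player=%d id=%x history=%r"
              % (player, id(state), state.history_str()))
        # ... the real implementation fills self.tensor from `state` here ...

    def string_from(self, state, player):
        del state, player
        return ""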
There is a good chance I'm doing something wrong; on the other hand, I could not find anything in my code related to the described behavior. I also tried to find the actual code of State::ObservationTensor(), but I guess the implementation of that virtual method lives in pyspiel.State, which, to my embarrassment, I was not able to find.
Please advise.