Big bug in PPO2 #35

In `dist = Normal(mu, sigma)`, `sigma` must be positive, but the `actor_net` output is unconstrained and can be negative. The log-density contains a `log(sigma)` term, so `action_log_prob = dist.log_prob(action)` can be `nan`.

Try:
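```
import torch
from torch.distributions import Normal

# Fall back to the CPU when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.tensor([1.0], device=device)   # mu
# sigma is negative on purpose; depending on the PyTorch version this
# either produces nan below or is rejected with an error.
b = torch.tensor([-1.0], device=device)
dist = Normal(a, b)
action = dist.sample()
action_log_prob = dist.log_prob(action)

print(action.cpu().numpy())
print(action_log_prob.item())
```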
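One possible fix, sketched here without any framework so the arithmetic is visible, is to pass the raw network output through softplus before using it as `sigma`. The helper names `softplus` and `normal_log_prob`, the sample values, and the `1e-5` floor are assumptions for illustration, not code from the repository. In PyTorch the same idea would roughly correspond to applying `torch.nn.functional.softplus` to the sigma head of the actor network.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); non-negative for every x."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def normal_log_prob(action: float, mu: float, sigma: float) -> float:
    """Log-density of N(mu, sigma^2); only defined for sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (
        -((action - mu) ** 2) / (2 * sigma ** 2)
        - math.log(sigma)
        - 0.5 * math.log(2 * math.pi)
    )


raw_sigma = -1.0  # hypothetical raw (unconstrained) network output
# softplus can underflow to 0.0 for very negative inputs, so add a small
# floor (1e-5 is an arbitrary choice for this sketch).
sigma = softplus(raw_sigma) + 1e-5
log_prob = normal_log_prob(0.5, 1.0, sigma)

print(sigma, log_prob)
```

Other common choices are to have the network output `log(sigma)` and exponentiate it, or to clamp `sigma` to a minimum value; which one suits this PPO implementation is a design decision for the maintainers.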
