Is your feature request related to a problem? Please describe.
When running evaluations on the evaluation set, the client should be able to specify how many times each sample is repeated. For example, with a 100-sample eval set, they may want to run over it 3 times (bringing the effective eval size to 300) with different random seeds to gauge the variance in outputs, since in production they may not use greedy sampling.
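To make the request concrete, here is a minimal sketch of how repetitions could expand an eval set into individual runs. The names `EvalRun` and `expand_eval_set` are hypothetical, and using one seed per repetition (`base_seed + r`) is an assumption for illustration, not an existing API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EvalRun:
    """One generation to perform: which sample, which pass, which seed."""
    sample_id: int
    repetition: int
    seed: int


def expand_eval_set(num_samples: int, repetitions: int, base_seed: int = 0) -> List[EvalRun]:
    """Repeat every sample `repetitions` times, one distinct seed per pass.

    Hypothetical helper: the seed scheme (base_seed + repetition index)
    is an assumption chosen so that each pass is reproducible.
    """
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    return [
        EvalRun(sample_id=i, repetition=r, seed=base_seed + r)
        for r in range(repetitions)
        for i in range(num_samples)
    ]


# A 100-sample eval set repeated 3 times yields 300 runs.
runs = expand_eval_set(num_samples=100, repetitions=3)
```

Grouping results by `sample_id` afterwards would let the client see how much the score for a single sample varies across seeds.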
Describe the solution you'd like
A clean UI where the eval client is greeted by a "Greedy" vs. "Sampling" toggle. The Greedy option explains that each sample will produce the same generation on every run; the Sampling option explains that generations can differ from run to run. Sampling then exposes sub-settings, to start with perhaps top-p and temperature, pre-populated with the defaults that the service (Fireworks, for example) currently deploys.
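The toggle described above could map onto request parameters roughly as follows. This is a sketch under stated assumptions: `DecodingSettings` and `ASSUMED_SERVICE_DEFAULTS` are hypothetical names, the default values are placeholders rather than any provider's real defaults, and treating temperature 0 as greedy decoding is a common convention that should be confirmed for the target service.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Placeholder values only (assumption): a real UI would fetch the
# defaults the inference service is actually deploying.
ASSUMED_SERVICE_DEFAULTS: Dict[str, float] = {"temperature": 1.0, "top_p": 1.0}


@dataclass
class DecodingSettings:
    """State behind the hypothetical Greedy / Sampling toggle."""
    greedy: bool = True
    temperature: Optional[float] = None  # only used when greedy is False
    top_p: Optional[float] = None        # only used when greedy is False

    def to_params(self) -> Dict[str, float]:
        """Translate the toggle state into generation parameters."""
        if self.greedy:
            # Assumption: the service treats temperature 0 as greedy decoding.
            return {"temperature": 0.0}
        defaults = ASSUMED_SERVICE_DEFAULTS
        return {
            "temperature": defaults["temperature"] if self.temperature is None else self.temperature,
            "top_p": defaults["top_p"] if self.top_p is None else self.top_p,
        }


greedy_params = DecodingSettings().to_params()
sampling_params = DecodingSettings(greedy=False, temperature=0.7).to_params()
```

Leaving `temperature` and `top_p` unset until the user overrides them keeps the pre-populated values tied to whatever the service reports as its defaults.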
Describe alternatives you've considered
At the very least, a Greedy vs. non-Greedy switch. Clients may not know that they have variance at inference time, and they need to know whether their evals are representative.