We should add docs on how to use synthetic data generation from code. The functionality is all there, but a sample would help.