Training data pipeline for global brands (canary-1b) #14049
PardisTaghavi started this conversation in General
Replies: 2 comments
-
With canary-1b, it's more reliant on the dataset.
-
Hello Canary Team,
Thank you for all the great work on this model. I have a general question regarding its training and capabilities.
My question is about the model's out-of-the-box ability to recognize high-value proper nouns, specifically major brand names (e.g., Google, Meta, Amazon, Microsoft).
I'm trying to understand if this capability is an intentional part of the training process or more of an emergent property of the large, general dataset used. Specifically:
Was the training dataset intentionally curated or augmented to include a robust representation of major global brand names, or does the model simply rely on the general dataset?
Thank you