Internationally known for its popular content-sharing mobile apps, a Global Fortune 500 company wanted to improve its foundation models for the North American market.
The technology company had several projects planned around caption writing and prompt-and-response evaluation, but it lacked the local context and English-language proficiency needed to achieve the high model accuracy it required.
The tech leader chose to partner with Sama to evaluate caption generation by their foundation model, which they planned to use for video search and personalization in addition to video editing and creation.
Combining partial automation with humans-in-the-loop (HITL), Sama’s in-house team of data experts reviewed image captions for factual errors and hallucinations, flagging any inconsistencies and rewriting captions as needed to create additional training data sets to fine-tune the model.
To expand their foothold in the North American market, it was especially important for the company’s large language model (LLM) to master the nuances of the English language in order to create engaging in-platform content and authentic virtual agent conversations. With that goal in mind, the Sama team evaluated a series of prompts and responses generated by the LLM, looking for errors and unexpected leaps in logic.
“When it comes to conversational AI solutions, the biggest challenge with fine-tuning is subjectivity,” said Annepeace Alwala, Sama Vice President, Global Service Delivery. “For projects like this, we work closely with the client to define what’s impactful for the model’s performance: is it the response itself? Or the way the prompt is described? And how do they want to define what is objective and subjective?”
Drawing on its domain expertise, the Sama team carefully evaluated each turn for tone, context, and chain of thought, rewriting prompts for clarity and intent and including a rationale for each preferred response.
Our dedicated team helped the tech leader scale up caption and prompt evaluation, achieving a 95% acceptance rate with the client.
Sama’s combination of automation and HITL improved the performance of the client’s generative AI model in several ways, including:
Example: Retraining the model when it identified an object that wasn’t present in the video frame, which increased caption accuracy.
Example: Rewriting sentences with poor syntax or an inappropriate tone.
Example: Reframing responses to reflect North American colloquialisms and local context.
Because of these significant model improvements, the client was also able to reach validation more quickly, which led to a better customer experience and a subsequent project engagement.