As Large Language Models (LLMs) become increasingly sophisticated, adversaries are finding new ways to exploit them for harmful content creation and distribution. However, the converse is also true: LLMs enable the guardians of the internet to fight abuse creatively, efficiently, and effectively at scale.
This paper presents a case study of a successful end-to-end deployment of "LLM as a Rater" within Google Discover's content moderation ecosystem, yielding $xM in annual savings.
We outline the significant challenges faced in existing content review within Google Discover, including high incoming volumes, long-tailed languages, dynamically changing narratives, and operational bottlenecks. To tackle these issues, we deployed an LLM as a Rater solution. This model automatically annotates URLs with safe labels and dequeues them (30%+ of incoming URLs), significantly reducing the volume of content requiring human review and freeing reviewer capacity (a 30% jump in reviewed volumes was observed). LLMs offer scalability, ease of setup, and the ability to understand complex, nuanced narratives; however, challenges still persist.
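To make the triage flow concrete, the following is a minimal sketch of an "LLM as a Rater" dequeue loop. The `rate_with_llm` wrapper, the label names, and the confidence threshold are illustrative assumptions for exposition, not the deployed system's interface.

```python
from dataclasses import dataclass

# Illustrative threshold; in practice this would be tuned against
# human-labeled data to control the auto-dequeue error rate.
SAFE_CONFIDENCE_THRESHOLD = 0.9


@dataclass
class Rating:
    label: str        # e.g. "SAFE" or "NEEDS_REVIEW"
    confidence: float  # model's confidence in the label, in [0, 1]


def rate_with_llm(url: str, content: str) -> Rating:
    """Hypothetical wrapper around an LLM classification call.

    A real implementation would prompt an LLM with the page content
    and the moderation policy, then parse a structured verdict from
    the response. Here it is a placeholder that defers to humans.
    """
    return Rating(label="NEEDS_REVIEW", confidence=0.0)


def triage(queue: list[tuple[str, str]]) -> tuple[list[str], list[str]]:
    """Split a review queue into auto-dequeued URLs and human-review URLs."""
    dequeued, human_review = [], []
    for url, content in queue:
        rating = rate_with_llm(url, content)
        # Only high-confidence SAFE labels are dequeued; uncertain or
        # unsafe ratings stay in the queue for human raters.
        if rating.label == "SAFE" and rating.confidence >= SAFE_CONFIDENCE_THRESHOLD:
            dequeued.append(url)
        else:
            human_review.append(url)
    return dequeued, human_review
```

The key design choice sketched here is that the LLM only removes work (high-confidence safe items); anything ambiguous or unsafe falls through to human review, which bounds the cost of model errors.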
LLMs can occasionally hallucinate, requiring human-in-the-loop oversight to ensure accuracy, and initial deployment necessitates specialized resources and investment. This case study demonstrates the potential of LLMs to revolutionize content moderation. We believe this approach can be broadly applied across industries to effectively address user feedback and trust workflows, optimize resource allocation (the content moderation market is currently valued at around 10B USD, with an expected growth rate of 9.6%), and improve user experiences on digital platforms.