import Markdown from 'react-markdown';

import Authors from '../authors';
import type { BlogPostConfig } from './types';

const markdown = `
> *Terminology note: In this blog post, we’ll use the terms “foundation models” and “LLMs” interchangeably. Foundation models are usually large, pre-trained deep learning models that generalize to a variety of tasks, and LLMs are a type of foundation model focused on language (as opposed to other modalities like visual content). Colloquially, though, “LLM” and “Generative AI” are often used to describe foundation models in general, so when we say “LLM” in this post, we generally mean foundation models more broadly.*

New Large Language Models (LLMs) have caused seismic technological shifts in the last two years, allowing us to reimagine how we interact with products, generate content, and consume information. Companies in every industry are asking the same question: “How can we use LLMs to improve our products?”

Trust & Safety teams are no exception, and for good reason - LLMs and other foundation models present an incredibly promising new approach to detecting harmful content and promoting safety online. But in our conversations with hundreds of Trust & Safety teams, we consistently find that the biggest hurdle to adopting LLMs is figuring out exactly *how* to use them.

- *Should you use popular hosted LLMs like GPT-4 (from OpenAI) or Claude (from Anthropic)?*
- *Should you use open-source LLMs like Llama 3 (from Meta) or Mistral (from Mistral AI) and host them yourself?*
- *Should you fine-tune one of the open-source models?*
- *Should you build your own LLM?*
- *How do you compare these options, and in what ways do they differ?*
- *How do you get a solution that takes full advantage of the power of LLMs without sacrificing critical business needs like cost, availability, consistency, customizability, and control?*

Let’s get to the bottom of these questions!

## Why Use LLMs At All?

Many industries are adopting AI for the first time, and the benefits are obvious. In Trust & Safety, however, we’ve been using AI in the form of traditional “classifiers” for years. A classifier takes a piece of content and produces a score indicating how likely that content is to violate a given policy.

But even after years of using classifiers to automate content moderation, we’re still far from perfecting it, so clearly these classifiers haven’t been sufficient. What’s their big limitation, and how could LLMs address it?

Traditional classifiers are built by first hand-labeling a set of content, and then feeding that labeled content into an algorithm to produce a model. The algorithm’s job is to *understand the underlying qualities of the labeled content*, recognize what makes some content harmful and other content not harmful, and then apply those patterns to new content in the future.

The problem is that *a lot* of labeled data is required to train models that can truly understand complex properties of text and visual content. And in Trust & Safety in particular, we need classifiers that can make extremely complex, nuanced decisions (governing speech at scale isn’t easy!), so we need *tons* of labeled data to give the models even a halfway decent understanding of how to enforce a content policy.

One of the key innovations of LLMs is that they’re trained on a vast share of the publicly available text on the internet, which means they have a very sophisticated understanding of language relative to the previous generation of classifiers. The consequence is that we no longer need vast amounts of labeled data in order to train good classifiers - we can now use an LLM’s intrinsic understanding of language to create systems that are far better at content moderation than non-LLM-based approaches.

## The Two Ways To Use LLMs

There are essentially two high-level ways to use LLMs for content moderation:

**Approach 1: Prompting**

You can directly ask an LLM whether a piece of content violates a particular policy. For example, you can ask GPT-4: “Here is our hate speech policy: … Does the following content violate that policy? Here is the content: …” This approach only works with *generative* LLMs, that is, LLMs that generate text (like GPT-4).

**Approach 2: LLM-Powered Classifiers**

You can use an LLM-powered solution (like Cove) that combines the power of LLMs with the benefits of classic ML models (i.e. “classifiers”). Classifiers offer some nice benefits, including low cost, low latency, and high consistency (most classifiers are deterministic, so they will always produce the same score for the same piece of content). But the classifiers of the past couldn’t understand language to the degree that LLMs can. An LLM-powered classifier gives you the best of both worlds: **the power of an LLM and the latency and cost benefits of a classifier**.

Depending on your goals and constraints, either approach might be better for you, so it might be worth experimenting with both.

In Part 2, we’ll walk through some tradeoffs to help you understand how to think about the decision.
`;

const ModeratingWithLLMsBlogPost = {
  slug: 'moderating-with-llms',
  title: 'Part 1: How to use LLMs for Content Moderation',
  date: new Date('2024-11-25T00:00:00-07:00'),
  author: Authors['michael'],
  coverPhoto: {
    url: '/graphics/moderating-with-llms.webp',
    attribution: 'Image: Dana Davis',
  },
  content: <Markdown>{markdown}</Markdown>,
  label: 'Moderating with LLMs',
  description:
    'Explore how companies can harness LLMs to revolutionize content moderation.',
} satisfies BlogPostConfig;

export default ModeratingWithLLMsBlogPost;
