Prompting, realized, and unrealized bias in generative AI

Andrew Marble
marble.onl
andrew@willows.ai
August 20, 2023

With increasing interest in generative AI, there has been an understandable focus on responsible and accountable use of this technology, including addressing bias in system output as well as in the underlying training data. There are many examples of foundation models in their “raw” form reflecting back the composition of their training sets, and this has been construed as bias. With generative text and image models, the introduction of prompting (providing input context and instructions) can decouple dataset bias from how the models actually perform as part of a system. With bigger, smarter models, data bias becomes less important to how the system actually performs when properly configured. The ability to demonstrate contrived or “unrealized” bias in a model does not automatically mean the model will produce biased output as part of a deployed system. I discuss this in the context of a newly introduced code of practice for generative AI models.

Last week (August 16, 2023), Industry Canada published some information on “potential elements” of a voluntary “code of practice” for generative AI [1]. Overall the guidance makes sense (other than the watermarking requirement), is relatively apolitical (compare it to what you might see in a “responsible” AI license), and is a good starting point for anyone wanting to use ML in an application where accountability is required. What follows is the list of potential points that has been published for comment; I’ve added numbers so I can refer to the ones I want to focus on here. The main points are that those building generative models are asked to:

  1. identify the ways that the system may attract malicious use

  2. identify the ways that the system may attract harmful inappropriate use

  3. assess and curate datasets to avoid low-quality data and non-representative datasets/biases.

  4. implement measures to assess and mitigate risk of biased output

  5. provide a reliable and freely available method to detect content generated by the AI system

  6. provide a meaningful explanation of the process used to develop the system

  7. ensure that systems that could be mistaken for humans are clearly and prominently identified as AI systems

  8. provide human oversight in the deployment and operations of their system

  9. implement mechanisms to allow adverse impacts to be identified and reported

  10. use a wide variety of testing methods to measure performance and identify vulnerabilities

  11. employ appropriate cybersecurity measures to prevent or identify adversarial attacks

  12. ensure that multiple lines of defence are in place to secure the safety of their system

  13. develop policies, procedures, and training

Short rant

I first want to digress briefly to touch on the “watermarking” requirement (5), which asks developers to provide a way of detecting that content is AI generated. A similar suggestion was recently released by another policy group [2]. Although watermarking methods exist, they are trivially bypassed, and are reminiscent of the failed anti-copying measures for media that have been subject to an arms race since the invention of the VCR. Additionally, even if it were possible, it’s pointless, and (to use another analogy) the idea that people need to know whether something is AI generated is the same kind of panic we’ve seen with the introduction of other media: the frenzy caused by War of the Worlds being read on the radio in 1938 [3], or the (apparently apocryphal) story of movie theatre audiences in 1896 jumping out of the way when shown a train barrelling towards them on screen [4]. The concern seems to be more about discomfort with new technology than any concrete harm that could result. Explicitly knowing that something came from generative AI (as opposed to, say, Photoshop) is not going to be anyone’s concern as the technology becomes widespread, and wasting time on a game of technological leapfrog to alternately build and defeat marking methods serves nobody.

That all said, I wholeheartedly agree with point 7, that AI systems should be clearly labeled as such. It’s basic decency to tell someone they are talking to a computer, e.g. when receiving customer “support” (read: diversion), and to give them the option of talking to a real person. And in general I think voluntary labeling of anything substantially created by AI is appropriate and responsible. It’s the requirement for detection, which implies an adversarial case where output hasn’t been freely labeled, that is a pointless battle.

Bias

With that out of the way, the points I really want to talk about are 3 and 4, which concern bias. The code of practice includes the following preamble for those points:

Due to the broad datasets on which they are trained and the scale at which they are deployed, generative AI systems can have important adverse impacts on societal fairness and equity through, for example, perpetuation of biases and harmful stereotypes. It will be essential to ensure that models are trained on appropriate and representative data, and provide relevant, accurate, and unbiased outputs.

Words like “bias” and “societal fairness and equity” are imprecise and not that useful in regulation. In fact, they remind me of Justice Potter Stewart’s famous “I know it when I see it” line about obscenity [5], but that’s a topic for another time. I’m going to use the term bias anyway, in a general sense, to mean “skewed in a way we don’t want”. The code talks about “appropriate and representative data”, with point 3 asking that datasets be curated so they are not low quality and are representative. “Representative” sounds wishy-washy as well, but to me it means that the training data reflects what you’ll see in production, which can be precisely defined and measured. Training data and bias are important and should be part of the framework. But I believe there need to be some caveats, and room for judgement to be exercised about how important they are in a particular application.

When bias doesn’t matter

You shouldn’t use AI as a decision maker when bias could be a concern. It stereotypes by definition, so if that’s not acceptable to you, it’s better to look at other options. AI is really good at “feature extraction” – providing intermediate output that can inform a decision – and that’s the preferred use. Accordingly, good uses of AI are hardened against aspects of the training data, such as bias, that could lead to an inappropriate decision being made. As models get bigger and AI centers increasingly on “foundation models”, any bias or imbalance in the training data can be made less important. Just as people can be exposed to imbalanced or biased views but still act appropriately, the bigger models are not “corrupted” by bias in the same way a simple one might be.

Generative AI is not the first kind of AI to have sparked concerns about bias in datasets. ImageNet, the big collection of online images used to train the breakthrough deep learning models of the 2010s, notably includes images of people labeled according to various offensive stereotypes. That became a problem when ImageNet was used to train an online classifier that captioned user-submitted selfies [6]. But such a classifier (arguably an inappropriate use covered by point 2) is not what ImageNet is for; it’s a contrived example, not a use case. The main use of datasets like ImageNet, just like more modern ones (LAION, etc.), is to pre-train models so that they learn generic image features and can then be fine-tuned for specific applications. The point of the dataset is to provide a task that forces a model to learn rules distinguishing a broad set of images, so the model is pre-conditioned to be trained for a real downstream application. Training a model on ImageNet gives us something that’s good at extracting image features with predictive value, and that’s how it should be used. Subsequent fine-tuning for an end use is typically done on a smaller, better quality dataset, to which it makes sense to apply the guidance in the code of practice.
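To make the pre-train-then-fine-tune pattern concrete, here is a minimal sketch in Python using torchvision, assuming a ResNet-50 backbone and a hypothetical four-class downstream task; the model choice, class count, and freezing strategy are placeholders, not a prescription.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet-50 pre-trained on ImageNet: the weights encode generic image
# features learned from the broad (and imperfect) pre-training dataset.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Discard the original 1000-class ImageNet head and attach a new one for the
# downstream task (e.g. a hypothetical four-category defect classifier).
num_classes = 4  # placeholder for the real application
backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)

# Optionally freeze the feature extractor so only the new head is trained
# on the smaller, curated, application-specific dataset.
for name, param in backbone.named_parameters():
    if not name.startswith("fc."):
        param.requires_grad = False

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
# ... standard training loop over the curated downstream dataset goes here ...
```

The point of the sketch is that the pre-trained weights act as a feature extractor; the behaviour that matters is set by the downstream data and head, which is where the code of practice guidance on curation naturally applies.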

With modern generative AI, a similar principle holds, though the specifics of the training data can matter even less because of prompting. In the emerging use pattern for LLMs, the model is provided with a “prompt”, often a large passage of text, followed by an instruction to execute based on that text. The model itself is there for language and reasoning ability, while the prompt provides the specific context for the task. In ChatGPT, the model can also be queried directly about facts it has memorized, but it can be constrained to the facts provided in a prompt. Say we are using an LLM to help with resume screening, a task where fair treatment of candidates is important. We can prompt with the resume text and the questions we want to ask. I tried this, pasting my resume into ChatGPT with a preamble asking it to base the answers only on the resume [7], and asking the following questions.

Q: Does this person have experience working at a start-up?
A: Yes, this person has experience working at a start-up. According to the resume, they worked at Element AI, which is described as a technology start-up commercializing machine learning research.

Q: Does this person have software project management experience?
A: Yes, this person has software project management experience. According to the resume, while working at Element AI, they managed the lifecycle (sales, scoping, delivery) of "Research as a Service" (RaaS) projects in advanced manufacturing (computer vision). They led a team of software developers, business strategists, research scientists, and machine learning engineers. Additionally, as the owner of Willows AI, they were responsible for the administration of the company, as well as technical product development, delivery, and business development, which implies a role in managing software projects as part of the product development process.

Q: Does this person have public speaking experience?
A: Yes, this person has public speaking experience. According to the resume, while working at KPMG in Manager, Advisory Services, they presented findings to senior stakeholders (C-level equivalent). Additionally, as an Assistant Professor at Carleton University, they developed and taught senior undergraduate courses, which would have involved lecturing and presenting to students. These roles suggest that this person is experienced in presenting and communicating information to others in a formal setting.

Q: Does this person have experience with database technologies?
A: Based on the information provided in the resume, there is no explicit mention of experience with database technologies. The technologies section lists various programming languages, tools, and cloud services, but there are no specific database technologies listed.

Q: Does this person have experience with agile methodology?
A: Based on the information provided in the resume, there is no explicit mention of experience with agile methodology. The resume outlines various roles and responsibilities that involve project management and team leadership, but it does not specifically state that the person has experience working with agile methodology.

Q: Is this person bilingual (french/english)?
A: Based on the information provided in the resume, this person speaks some French, as indicated in the "Personal" section where it is mentioned that they speak some French and beginner Japanese. However, the resume does not explicitly state whether this person is fluent in French or if they are bilingual in French and English. The resume is written in English, suggesting proficiency in that language.

The questions are job-screening criteria that might be relevant for a particular position. The answers are factual, based on the resume provided and the model’s understanding of language, not on any memorized facts or preconceptions from the training data. The model has been used for what it’s good at (manipulating language) and we’ve avoided what it could be bad at (reflecting stereotypes in the training data). Compare this to applications where AI is used to opaquely derive an overall “applicant score” and thus expose job seekers to the full biases of the model. Of course, the rest of the code of practice is still relevant: if we were building an actual resume screening system, it would still have to be tested, secured, overseen, etc. Additionally, the answers are verbose, and you can imagine another model or better prompts getting to the point more quickly.
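For illustration, here is a minimal sketch of this prompting pattern using the OpenAI Python client; the model name, file name, question list, and exact constraint wording are assumptions rather than a reproduction of the chat session linked above.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical file containing the candidate's resume text.
RESUME_TEXT = open("resume.txt").read()

# The system message constrains the model to the provided text, so answers
# come from the resume itself rather than memorized facts or preconceptions.
SYSTEM = (
    "You are a resume-screening assistant. Answer questions using only the "
    "resume provided. If the resume does not contain the answer, say so."
)

questions = [
    "Does this person have experience working at a start-up?",
    "Does this person have experience with agile methodology?",
]

for q in questions:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": f"Resume:\n{RESUME_TEXT}\n\nQuestion: {q}"},
        ],
    )
    print(q, "->", response.choices[0].message.content)
```

The design choice is the important part: the model supplies language understanding, while the prompt supplies all the facts the answer is allowed to draw on.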

Using language models in this way, as a form of “natural language automation” rather than as a source of facts or opinions, can mitigate or eliminate issues with the training data being reflected in system output.

Amplifying Bias

At the other end of the spectrum of bias sensitivity in generative AI are text-to-image models. When prompted with a short phrase, the model is essentially stuck making up some stereotype of what such a picture might look like, and that is going to starkly reflect what’s in the training data, because the model has to rely entirely on its preconceptions. This has led to clearly inappropriate stereotyping, in particular when applied to images of people [8]. There are times when trying to carefully balance the training data makes sense, but a more robust solution is to remove the dependence on the data as much as possible.

For generative AI, this can again be done through suitable prompting. In the pictures below, I prompted DALL-E to make pictures of ducks in outer space. I first prompted generically for a duck, and predictably the pictures were all of standard mallard-type ducks like the one on the left. The model has to make something up, and presumably mallard ducks are the most common in the training data, so that’s what it shows. If I wanted another kind of duck, rather than rely on the balance of the training data, I could change the prompt to ask for it. On the right, I got a Muscovy duck when I put that in the prompt.

[Images: “A photo of a duck in outer space” (left); “A photo of a muscovy duck in outer space” (right)]

As with the resume example, prompting can be a powerful tool to decouple bias in the training data from the system output. If I were building a space-duck generator app (VCs: I’m open to raising a seed round) I could either ask the user to pick the kind of duck they wanted, or append a random duck type to the prompt if I were concerned about generating a good variety; a sketch of that approach follows below. This lets me use the power of the foundation model – being able to combine stuff in an image – without worrying about its biases.
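Here is a rough sketch of that prompt-level mitigation, assuming the OpenAI images endpoint; the duck list and helper function are illustrative, not part of any real app.

```python
import random
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Instead of relying on whichever duck happens to dominate the training data,
# the application controls variety at the prompt level.
DUCK_TYPES = ["mallard", "muscovy", "mandarin", "wood", "pekin", "teal"]  # illustrative list

def space_duck_prompt(duck_type=None):
    """Build the prompt from a user's choice, or pick a random duck type."""
    duck = duck_type or random.choice(DUCK_TYPES)
    return f"A photo of a {duck} duck in outer space"

prompt = space_duck_prompt()
result = client.images.generate(model="dall-e-3", prompt=prompt, n=1, size="1024x1024")
print(prompt, "->", result.data[0].url)
```

The same foundation model is used either way; the prompt template, not the balance of the training data, determines the variety the user sees.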

Realized vs. Unrealized bias

I recently wrote that even though AI foundation models can generate copyrighted content, they don’t, by their existence alone, violate copyright [9]. It would be possible to use a foundation model exclusively to violate copyright, or to fine-tune a model that only violates copyright, but that’s not the default. I see a similar analogy in concerns about bias. A text-to-image foundation model trained on LAION (roughly all the public images on the internet) is going to reflect all the biases in those images, and one could use it to generate biased output. But its existence shouldn’t be a concern. Its primary function is to have built a deep understanding of the structure of language and the composition of images that can be applied in specific contexts, and it is a necessary, if sometimes dirty, intermediate product. It’s when it gets used in an application where the bias matters concretely that we should be concerned and potentially want safeguards. I’ve used text-to-image models mostly to generate pictures of defective metal parts for manufacturing research, and I don’t care what societal biases are included (other than the ones about pictures of metal cases, which skew towards cartoony desktop backgrounds). But I’m glad the model has been trained on all of LAION, because the large, loosely curated dataset provides the scale that allows the model to build a general relationship between words and pictures.

Conclusion

In practice, pre-existing bias in the training data, even for foundation models, is something that should at least be considered, and assessing its impact makes sense as part of responsible system development. However, there are many cases where bias, even if demonstrable in a contrived example, does not impact system performance, and striving for a balanced dataset would be a waste of effort. The best applications of generative AI are those that have been made specifically robust to whatever preconceptions are necessarily going to be in the foundation model. And suitable prompting can be an important component of ensuring an unbiased, accountable system.

These comments are specific to generative AI, as is the code of practice that’s been published. In a more traditional machine learning system like a classifier, the model is by definition stereotyping and its decisions are going to be tightly coupled to patterns in the training data, both causal and spurious. Specific training data can be much more important here, as can the role of the model in the final system decision.


  1. https://ised-isde.canada.ca/site/ised/en/consultation-development-canadian-code-practice-generative-artificial-intelligence-systems/canadian-guardrails-generative-ai-code-practice

  2. https://gpai.ai/projects/responsible-ai/social-media-governance/Social%20Media%20Governance%20Project%20-%20July%202023.pdf

  3. https://www.smithsonianmag.com/history/infamous-war-worlds-radio-broadcast-was-magnificent-fluke-180955180/

  4. https://www.atlasobscura.com/articles/did-a-silent-film-about-a-train-really-cause-audiences-to-stampede

  5. https://mtsu.edu/first-amendment/article/392/jacobellis-v-ohio

  6. https://hyperallergic.com/518822/600000-images-removed-from-ai-database-after-art-project-exposes-racist-bias/

  7. https://chat.openai.com/share/143d004f-524b-4b41-83db-732e6e0c6032

  8. https://www.technologyreview.com/2022/12/12/1064751/the-viral-ai-avatar-app-lensa-undressed-me-without-my-consent/

  9. http://marble.onl/posts/general_technology_doesnt_violate_copyright.html