The Few-Shot Prompt Booklet

Submitter: Alan M. Knowles, Wright State U

——————————————————

The experiment:

This semester-long project was assigned in an upper-level professional/technical writing (PTW) course. The goal was to encourage students to explore the utility of LLMs through serial encounters with the technology (Boyle, 2018). Students developed weekly few-shot training prompts for OpenAI’s GPT API (the “Playground”). Unlike the standard ChatGPT interface, the full API allows users to provide sample inputs and outputs that train the AI on what they want it to do. In the “Chat” mode of the API, sample inputs go in the “User” box and sample outputs in the “Assistant” box. Put simply, this teaches the AI: if given X (sample input), generate Y (sample output).
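The same few-shot structure can also be expressed programmatically. The sketch below is illustrative only, using the OpenAI Python library with a placeholder model name and invented sample pairs (not anything students actually submitted); it simply shows how the Playground’s User/Assistant pairs map onto API messages.

```python
# Minimal sketch of a few-shot "Chat" prompt via the OpenAI Python library.
# Assumptions: openai>=1.0 is installed, OPENAI_API_KEY is set, and the model
# name below is a placeholder; the sample pairs are invented for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [
    # Optional instruction framing the task
    {"role": "system", "content": "Rewrite technical sentences in plain language."},
    # Few-shot pairs: each "user" message is a sample input (X),
    # each "assistant" message is the desired output (Y).
    {"role": "user", "content": "The application terminated due to an unhandled exception."},
    {"role": "assistant", "content": "The program crashed because of an error it did not expect."},
    {"role": "user", "content": "Authentication requires multi-factor verification."},
    {"role": "assistant", "content": "You need a second step, like a phone code, to log in."},
    # The new input the model should transform the same way
    {"role": "user", "content": "Latency increased due to network congestion."},
]

response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)
```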

Students submitted weekly few-shot training prompts, each of which included the following:

A few-shot prompt: The input/output samples students developed. As a rule of thumb, prompts that use shorter inputs or generate longer outputs should include more samples than those with longer inputs/outputs.

Results: Students tested the performance of their few-shot prompts on 2 sample inputs. For each sample input, students shared the first 3 generated outputs (see the sketch that follows this list).

Evaluation: Students used their generated outputs (results) to evaluate the performance of their few-shot prompts. This included notes about how the prompt samples could be revised to improve performance.
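As a rough illustration of that testing routine, the API can return several completions per input in one call. Again, the model name, few-shot pair, and test inputs below are placeholders rather than actual student work.

```python
# Sketch of the weekly testing routine: 2 sample inputs, first 3 outputs each.
# Assumptions: openai>=1.0, OPENAI_API_KEY set, placeholder model and content.
from openai import OpenAI

client = OpenAI()

# A tiny stand-in for a student's few-shot input/output samples
few_shot_messages = [
    {"role": "user", "content": "Authentication requires multi-factor verification."},
    {"role": "assistant", "content": "You need a second step, like a phone code, to log in."},
]

test_inputs = [
    "The server returned a 503 error under heavy load.",
    "Data is encrypted at rest and in transit.",
]

for text in test_inputs:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=few_shot_messages + [{"role": "user", "content": text}],
        n=3,  # request the first 3 generated outputs for this input
    )
    print(f"Input: {text}")
    for i, choice in enumerate(response.choices, start=1):
        print(f"  Output {i}: {choice.message.content}")
```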

Students received peer feedback on every prompt and included revised versions of 5 prompts in a “Few-Shot Prompt Booklet” at the end of the semester.

Results:

This project was very successful. For the first few prompts, I assigned students a task and provided the 2 inputs for testing. For example, students developed a prompt to rewrite abstracts from PTW publications for a non-SME audience of their choosing. These early prompts were instructive because some performed notably better than others, leading students to a few preliminary conclusions about what makes few-shot prompts successful: higher consistency across training samples (in structure, style, etc.) yields better performance, and prompts that “transform” existing text perform better than those asked to generate entirely novel text.

For later submissions, students developed prompts for PTW use cases of their choosing. Some were promising, e.g., a prompt that generated standard English summaries of legalese from paragraphs of digital User Agreements. Others fell flat, often because they were too aspirational, prompting the AI to complete tasks it was unable to do reliably.

I attribute the success of this project to explicitly inviting students to include unsuccessful prompts in their final booklets. Because of this, students were ambitious with some prompts, which led to instructive failures, and they were honest and insightful in their prompt evaluations, noting interesting shortcomings of the technology.

Students liked that the project invited failure, that it was scaffolded throughout the semester, and that it left them with a substantial item for their PTW portfolios.

Relevant resources:

Contact:

  • Email: knowles[DOT]alanm[AT]gmail[DOT]com
  • Twitter: @AlanMKnowles
