Exploring the Utility of LLMs through Prompt (Reverse) Engineering
Submitter: Alan M. Knowles, Wright State U
——————————————————
The experiment:
This activity was used in a Business Writing course. It had two major goals: to explore the risks of relying on large language models (LLMs) for information, and to determine whether users can limit the generation of inaccurate content by retaining the bulk of the intellectual labor in their LLM prompts.
Students were working on a project concerning the delivery of bad news in business communications. At this stage, all students were analyzing the same bad news message (an example from 2015). Before class, they each submitted brief, preliminary evaluations of the message to a discussion board. In class, I queried ChatGPT with the following prompt: “Write an analysis of [company name]’s 2015 video apology for [brief description of event]. Focus the analysis on the company’s adherence to principles of bad news delivery.”
Following a class discussion of the generated analysis, students were placed into groups and asked to reverse-engineer (i.e., revert to outline form) an instructor-provided sample analysis of the bad news message. Students then wrote new prompts that included their bullet point outlines of the sample analysis. A typical prompt looked like this: “Transform the following outline into an analysis of [company name]’s 2015 video apology for [brief description of event]. [pasted bullet point outline]”
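For instructors who want to script the comparison, the two prompt styles above can be sketched as simple templates. This is a minimal illustration only; the function names and placeholder values are my own, not anything used in class:

```python
def build_bare_prompt(company, event):
    """First-stage prompt: the LLM supplies all of the analytical content."""
    return (
        f"Write an analysis of {company}'s 2015 video apology for {event}. "
        "Focus the analysis on the company's adherence to principles "
        "of bad news delivery."
    )

def build_outline_prompt(company, event, outline_points):
    """Second-stage prompt: the student's reverse-engineered outline
    carries the intellectual labor, and the LLM only drafts prose."""
    outline = "\n".join(f"- {point}" for point in outline_points)
    return (
        f"Transform the following outline into an analysis of {company}'s "
        f"2015 video apology for {event}.\n{outline}"
    )
```

Either string could then be pasted into a chat interface (or sent through an LLM API) so that the only variable between the two runs is how much user guidance the prompt contains.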
Each group shared the analysis generated by its prompt, and we then discussed how strongly the degree of user guidance in a prompt affects the quality of the generated text.
Results:
The activity was largely successful, but it could be improved. The first part went smoothly: students discovered major risks involved with relying on LLMs for information. For instance, students said the inaccuracies were so inconsistent and subtle that they would not have noticed them if they had not analyzed the video so recently. They also reported that the impressive results overshadowed the flaws: some said they were so impressed that ChatGPT generated an accurate list of bad news delivery principles that they let their guards down and overlooked major issues mentioned by others in discussion.
In the second part of the activity, students determined that: (1) their later, outline-based prompts generated higher quality, more accurate analyses; (2) none of the generated analyses were as good as the reverse-engineered sample; and (3) the amount of detail provided in the outlines strongly affected the accuracy of the generated texts. This held even between sections within a single generated analysis: sections backed by more detailed outline points came out more accurate.
However, the reverse-engineering process took too long, so some groups generated analyses from incomplete outlines. When I teach this again, I will have students bring the reverse-engineered outlines to class, and focus the final discussion on how the quality of their outlines affects the generated text. I would also ask students to consider whether including additional rhetorical instructions in their prompts (e.g., tone, target audience) improves generated text.
Contact:
- Email: knowles[DOT]alanm[AT]gmail[DOT]com
- Twitter: https://twitter.com/alanmknowles