ChatGPT is unpredictable in text analysis and extraction. Can this be fixed?
Yes. ChatGPT is excellent at text extraction and analysis, and you can tune it to produce predictable results.
✔️ Read prompt engineering guide
If you haven't done so already, the next 30 minutes could save you hours over the coming days.
Don't waste your time on the entire guide, though. Just focus on:
- Introduction - all sections
- Zero-shot prompting
- Few-shot prompting (or multi-shot)
✔️ Use ChatGPT API
Use ChatGPT via API calls instead of the WebUI. This lets you set a custom system prompt and temperature, and provide precise user/assistant messages.
✔️ Set temperature parameter to 0
Temperature controls creativity (the probability of picking a less likely token), and we don't want that in text analysis.
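The two points above can be sketched as a single request builder. This is a minimal sketch, assuming the official `openai` Python package; the model name, the extraction task and the prompt wording are placeholders, not recommendations.

```python
def build_extraction_request(text: str) -> dict:
    """Build deterministic chat-completion arguments for an extraction task."""
    return {
        "model": "gpt-3.5-turbo",  # placeholder model name
        "temperature": 0,          # no creativity in extraction
        "messages": [
            # custom system prompt pinning down the task and output format
            {"role": "system",
             "content": "You extract company names from text. "
                        "Reply with the name only, or NONE."},
            {"role": "user", "content": text},
        ],
    }

# To actually call the API (requires `pip install openai` and OPENAI_API_KEY):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(
#     **build_extraction_request("Acme Corp announced record revenue."))
# print(response.choices[0].message.content)
```

Keeping the request construction in one place also makes it easy to benchmark prompt changes later.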
✔️ Provide multi-shot examples
ChatGPT was fine-tuned on user-assistant interactions, so set up multi-shot prompts by giving it examples of the interactions you expect. Use the user and assistant message roles in the ChatGPT API for that. This makes a huge difference!
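In practice this means interleaving example user/assistant message pairs before the real input. A sketch, with an invented invoice-extraction task and made-up examples:

```python
# Each pair is (example input, the exact answer we want the model to imitate).
FEW_SHOT_EXAMPLES = [
    ("Invoice #1042 from Globex, total $250.",
     '{"vendor": "Globex", "total": 250}'),
    ("Lunch receipt, no vendor listed, $12.",
     '{"vendor": null, "total": 12}'),
]

def build_messages(text: str) -> list:
    """Interleave user/assistant example pairs so the model imitates the pattern."""
    messages = [{"role": "system",
                 "content": "Extract vendor and total as JSON."}]
    for user_text, assistant_json in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_json})
    # the real input goes last, after all the examples
    messages.append({"role": "user", "content": text})
    return messages
```

Because the examples are just data, adding a newly discovered failure case to the prompt is a one-line change.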
✔️ Stick to English
Yes, ChatGPT can speak many languages. However, it is generally weaker in any language other than English.
This is because:
- it was primarily trained on English datasets
- it is smarter, faster and cheaper in English, because text tokenisers are based on English vocabularies. In 100 tokens you can fit ~70 English words, but only ~20 words in most other languages.
If you get bad results in non-English language, try these options:
- write prompts in English
- translate your data to English before asking ChatGPT, and translate the response back to your language afterwards
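The translate-in/translate-out option can be wrapped in a small helper. A sketch under stated assumptions: `translate` and `ask_chatgpt` are hypothetical stand-ins for whatever translation service and LLM call you actually use.

```python
def analyse_non_english(text: str, source_lang: str,
                        translate, ask_chatgpt) -> str:
    """Translate to English, run the English prompt, translate the answer back."""
    english_text = translate(text, source=source_lang, target="en")
    english_answer = ask_chatgpt(english_text)  # English prompt + English data
    return translate(english_answer, source="en", target=source_lang)
```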
✔️ Work with datasets
When crafting prompts, don't measure their quality on a single test case at a time. Create a dataset of 10-100 items, run the prompt against it and measure overall accuracy.
If you spot a repeating error, extract an example and add it to the list of examples within your prompt: "Here is how you should act in this case." This helps a lot!
By gathering even more data you'll spot cases where the model doesn't work well enough. Group these cases by error type, extract a representative sample (by the way, ChatGPT is good at that) and add yet another multi-shot example.
And at every single step - benchmark and measure!
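A minimal benchmarking harness for this loop might look like the following. `run_prompt` is a hypothetical function that sends one item through your prompt and returns the model's answer; the failures it collects are exactly the candidates for new multi-shot examples.

```python
def benchmark(dataset, run_prompt):
    """Return (accuracy, failures) over a list of (input, expected) pairs."""
    correct = 0
    failures = []
    for text, expected in dataset:
        answer = run_prompt(text)
        if answer == expected:
            correct += 1
        else:
            # keep the mismatch so it can be triaged and promoted
            # into the prompt's example list
            failures.append((text, expected, answer))
    return correct / len(dataset), failures
```

Run it before and after every prompt change, and keep the change only if accuracy improves.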
✔️ Ask for the confidence ratings
One thing that helps me a lot in building stable LLM-driven pipelines: I ask for an answer AND a confidence rating. The rating usually goes from 1 to 5, with 5 being the most confident.
While running pipelines, I can quickly filter results without re-running expensive and slow prompts. Sometimes only the results rated 5 are good enough; other times I can accept ratings further down the scale.
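Filtering then becomes cheap post-processing. A sketch, assuming an invented `<answer> | <rating>` response format — use whatever structured format your prompt actually asks for:

```python
def parse_rated_answer(raw: str):
    """Split '<answer> | <rating>' into (answer, rating)."""
    answer, _, rating = raw.rpartition("|")
    return answer.strip(), int(rating.strip())

def accept(results, min_confidence: int):
    """Keep answers whose self-reported rating meets the threshold."""
    kept = []
    for raw in results:
        answer, rating = parse_rated_answer(raw)
        if rating >= min_confidence:
            kept.append(answer)
    return kept
```

Because the raw responses are stored, lowering the threshold later costs nothing — no prompts are re-run.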
✔️ Use LLM prompts to merge and double-check results
Let's say you are extracting a single piece of information from multiple conflicting sources. How do you arrive at a single value to present to a human?
You can gather these results into a single prompt and ask ChatGPT to merge these results into a single value, with a confidence rating of its own.
You can even let the extraction script be creative, while the double-checking script stays pedantic, with great attention to detail.
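A merge prompt for this step can be generated from the conflicting candidates. The wording, field name and response format below are invented for illustration:

```python
def build_merge_prompt(field: str, candidates) -> str:
    """Ask the model to reconcile conflicting extractions into one value."""
    lines = "\n".join(f"- {value}" for value in candidates)
    return (
        f"Several extractions of '{field}' disagree:\n{lines}\n"
        "Pick the single most plausible value. Be pedantic and pay "
        "attention to detail. Reply as: <value> | <confidence 1-5>."
    )
```

The response can then be parsed with the same confidence-rating filter used elsewhere in the pipeline.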
There are multiple ways to stabilise data extraction and processing pipelines with ChatGPT.
No matter what you do, please benchmark your prompts and any changes to them!
Published: April 29, 2023.
Next post in Ship with ChatGPT story: My team has no experience with ML/GPT. How do we proceed?
🤗 Check out my newsletter! It is about building products with ChatGPT and LLMs: latest news, technical insights and my journey.