Tips and Best Practices to get the most out of Bedrock Prompt Engineering

Want to harness the full potential of your large language model (LLM)? Then spend 30 minutes (or less) learning the tips and tricks required to craft highly effective prompts. From understanding the fundamental components of a well-structured prompt to mastering advanced techniques like role-playing, XML tags, and few-shot examples, you’ll gain the essential knowledge and skills needed to become a Prompt Engineering pro. Whether you’re just getting started or looking to take your prompts to the next level, this guide will equip you with practical tips and real-world examples to create prompts that improve accuracy, relevance, and output quality. Get ready to take your Generative AI game to the next level – Let’s Get Nerdy! #PromptEngineering #GenerativeAI #LLMBestPractices #Anthropic #AmazonBedrock

What is Amazon Bedrock

We recently wrote a blog post “Introduction into Amazon Bedrock Agents and Knowledge Bases” that dives a little deeper into the subject. For the sake of this blog post, let’s define Amazon Bedrock as a fully managed service and marketplace that makes high-performing foundation models (FMs) available for your use through a unified API. With Amazon Bedrock you can choose from a diverse range of foundation models from leading AI start-ups and Amazon to find the model best suited to your use case. Without the need to manage any infrastructure, you can quickly start experimenting with Generative AI by simply calling one of Amazon Bedrock’s APIs along with the relevant ‘Prompt’. In this blog, we will dissect the ‘Prompt’ and provide a few examples and best practices to get the best possible results from your models.

What is Prompt Engineering

Prompt engineering is the process by which we communicate with our large language model (LLM). More specifically, it refers to the practice of crafting input prompts using phrases, sentences, punctuation, and separator characters to instruct the LLM on how to operate. Below we will explain some of the general guidelines for Amazon Bedrock by referencing examples and templates from the Amazon Bedrock user guide on the AWS website.

Let’s take a look at the following good example of prompt construction:

The following is text from a restaurant review: 

“I finally got to check out Alessandro’s Brilliant Pizza and it is now one of my favorite restaurants in Seattle. The dining room has a beautiful view over the Puget Sound but it was surprisingly not crowded. I ordered the fried Castelvetrano olives, a spicy Neapolitan-style pizza and a gnocchi dish. The olives were absolutely decadent, and the pizza came with a smoked mozzarella, which was delicious. The gnocchi was fresh and wonderful. The waitstaff were attentive, and overall the experience was lovely. I hope to return soon.” 

Summarize the above restaurant review in one sentence. 

When we construct a prompt, we want to ensure it includes the following components:

  • Operational Context – This should be the first part of your prompt and detail how we want the model to operate, with relevant information about the reference data we plan to interpret. In the example above, this corresponds to the first line.
  • Reference data – This should ideally be sandwiched in the middle of our prompt (the review text in the example above) and will typically be its largest portion. It’s worth noting that all models have different limits when it comes to the size of the input (measured in tokens). Here is a quick breakdown of the maximum token window for some of the most popular models currently (as of March 2024) available in Amazon Bedrock:
    • Anthropic Claude V2.1 – 200k tokens
    • Cohere Command – 4000 tokens
    • Meta Llama 2 70B – 4096 tokens
    • Amazon Titan Text G1 Express – 8k tokens
  • Processing Instructions – Finally, we want to provide a set of concise instructions on what we want the LLM to do and how we want the output to be structured (the final sentence in the example above). Note how the instructions are simple and clear and that we specify the form of the output (“… in one sentence”). It is generally recommended to place the instruction at the end of your prompt.
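The three components above can be sketched as a small helper. This is a minimal illustration (the function name and strings are my own, not from the Bedrock docs) that assembles a prompt in the recommended order: context first, reference data in the middle, instructions last.

```python
def build_prompt(context: str, reference: str, instructions: str) -> str:
    """Assemble a prompt: operational context, then reference data, then instructions."""
    return f"{context}\n\n{reference}\n\n{instructions}"

prompt = build_prompt(
    "The following is text from a restaurant review:",
    '"I finally got to check out Alessandro\'s Brilliant Pizza ..."',
    "Summarize the above restaurant review in one sentence.",
)
```

Keeping the three parts as separate arguments also makes it easy to swap in new reference data while reusing the same context and instructions as a template.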

Tokens

Tokens are the basic unit of measure that governs the payload size an LLM can process. Each LLM may represent a token differently (image vs. text) depending on its processing logic, but in the context of text, think of a token as a small collection of characters (roughly 4–6). This is an important parameter to pay attention to and relatively easy to estimate: simply divide the number of characters (including whitespace) by 6 and you get a reasonably close approximation of the token count needed to process your request. It’s worth calling out that each LLM will have its own token limit, so make sure to check this as part of your model evaluation process. As an example, at the time this blog was published the Amazon Titan Lite model had a token limit of 4k and Titan Express a limit of 8k.
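The divide-by-six heuristic above is easy to turn into code. A rough sketch (note that real tokenizers vary by model, and English text often averages closer to 4 characters per token, so treat this as a ballpark rather than an exact count):

```python
def estimate_tokens(text: str, chars_per_token: int = 6) -> int:
    """Rough token estimate: character count (including whitespace)
    divided by an assumed characters-per-token ratio."""
    return max(1, len(text) // chars_per_token)
```

Running a prompt through a function like this before calling the model is a cheap way to catch payloads that would blow past a 4k or 8k token limit.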

Inference parameters

Inference parameters are ways to control how the LLMs on Amazon Bedrock respond to your prompts. Here is a quick overview of inference parameters that are available on Amazon Bedrock:

  • Temperature – Value between 0 and 1. Think of it as how creative you want the model to be. The lower the temperature you set (closer to 0), the more deterministic the responses will be. Conversely, creativity and variety in the responses can be achieved by setting the temperature closer to 1. Generally, it is advised to keep it close (or equal) to 0 if the workload requires deterministic responses.
  • Maximum generation length/maximum new tokens – As the name implies, this parameter limits the number of tokens that the LLM generates for any prompt. It’s helpful to specify this number as some tasks, such as sentiment classification, don’t need a long answer, and you don’t want to generate more tokens than you actually need, resulting in overpaying for the response payload.
  • Top-p – If you set Top-p below 1.0, the model considers only the most probable options and ignores less probable ones. The result is more stable and repetitive completions. Generally, I have found the best range for the top-p parameter to be 0.8–1.0.
  • End token/end sequence – The token that the LLM uses to indicate the end of the output. LLMs stop generating new tokens after encountering the end token. Usually this doesn’t need to be set by users.

Additionally, there are model-specific inference parameters such as presence penalty, count penalty, etc. for AI21 Labs Jurassic models or additional Top-k inference parameters for Anthropic Claude models.
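To see how these parameters fit together in practice, here is a hedged sketch of a request body for an Anthropic Claude text-completion call on Bedrock (the body field names follow Anthropic's Claude request schema on Bedrock; the helper function and its defaults are my own illustration):

```python
import json

def claude_body(prompt: str, temperature: float = 0.0, top_p: float = 0.9,
                max_tokens: int = 200) -> str:
    """Package a prompt with the inference parameters discussed above."""
    return json.dumps({
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "temperature": temperature,          # 0 = deterministic, 1 = most varied
        "top_p": top_p,                      # nucleus-sampling cutoff
        "max_tokens_to_sample": max_tokens,  # cap on generated tokens
        "stop_sequences": ["\n\nHuman:"],    # end token/end sequence
    })

# Sending it requires AWS credentials, e.g.:
# import boto3
# client = boto3.client("bedrock-runtime")
# resp = client.invoke_model(modelId="anthropic.claude-v2:1",
#                            body=claude_body("Summarize this review ..."))
```

Keeping the body construction in one place makes it easy to experiment with temperature and top-p per workload while holding the rest of the request constant.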

With the recent addition of the Claude 3 models on Amazon Bedrock, let’s take a look at some best-practice advice for their usage by following Anthropic’s guide:

Using the System Prompt

Looking at Anthropic’s documentation, a system prompt is described as follows:

A system prompt is a way to provide context, instructions, and guidelines to Claude before presenting it with a question or task. By using a system prompt, you can set the stage for the conversation, specifying Claude’s role, personality, tone, or any other relevant information that will help it better understand and respond to the user’s input.

System prompts can include:

  • Task instructions and objectives
  • Personality traits, roles, and tone guidelines
  • Contextual information for the user input
  • Creativity constraints and style guidance
  • External knowledge, data, or reference material
  • Rules, guidelines, and guardrails
  • Output verification standards and requirements

The general rule here is to only specify things that will improve the output of your LLM response. One often used and very effective technique for improving the performance and tailoring the outputs of Claude is to assign it a specific role through the system prompt. By giving Claude a defined role to play, you can prime it to respond in a more accurate and contextually appropriate way for the task at hand.

When to Use Role Prompts:

– For highly technical tasks like solving complex math problems or writing code, assigning a relevant role like “math tutor” or “senior software engineer” can significantly boost Claude’s performance.

– To achieve a desired communication style, voice, and tone, you can cast Claude in roles like “kindergarten teacher,” “motivational coach,” or “literary critic.”

– To potentially enhance Claude’s baseline capabilities even for general tasks, role prompting is a low-cost way to try boosting its performance.

Tips for Role Prompting:

– Be as specific as possible with the role details to give Claude clear context

– Experiment with different roles and prompt variations

– Don’t be afraid to get creative with roleplaying scenarios

Examples of Effective Role Prompts:

Applying the system prompt is simply a matter of defining your rules at the beginning of the prompt. It does not require any special technique, although some users prefer to include a system: label before the user input to help keep their prompt templates structured.

The following are a few examples of using the Role prompt:

Logic Puzzle: “You are a master logic bot designed to solve even the most complex logic puzzles…”

Explaining Concepts: “You are a kindergarten teacher. Explain to your students why the sky is blue in a simple, understandable way…”

Writing Code: “You are a senior Python engineer. Write clean, well-documented code for a function that…”
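With the Claude 3 models, the role can be supplied through a dedicated system field rather than pasted at the top of the prompt. A sketch of the request body (the field names follow Anthropic's bedrock-2023-05-31 Messages schema; the helper function itself is illustrative):

```python
import json

def role_prompt_body(role: str, user_input: str, max_tokens: int = 512) -> str:
    """Build a Claude 3 Messages API body with the role set as the system prompt."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "system": role,  # e.g. "You are a senior Python engineer. Write clean code..."
        "messages": [{"role": "user", "content": user_input}],
    })
```

Because the role lives in its own field, you can swap “kindergarten teacher” for “senior Python engineer” without touching the user message at all.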

Using XML Tags

XML tags are another powerful way to interpolate data into prompts and keep things organised. By wrapping key parts of your prompt (such as instructions, examples, or input data) in XML tags, you can help Claude better understand the context and generate more accurate outputs. Claude is particularly good at interpreting XML tags, and you will find this to be a nice shortcut for controlling response outputs and minimising regex logic (more on this in another blog).

Rules for using XML Tags

  • XML tags are angle-bracket tags like <tag></tag>
  • They come in pairs and should always be represented as such in prompts
  • Opening and closing XML tags should share exactly the same name

I’ve included a sample prompt that contains XML tags in the next section, but if you’re looking for more samples then take a look at the Anthropic docs.

The Power of Examples

Another popular technique for prompt engineers is to provide your LLM with examples of the desired output or task being performed. This is often referred to as ‘Few-Shot’ prompting. Providing relevant examples essentially demonstrates the type of response you want, allowing the LLM to identify patterns and generalize to new inputs more effectively.

Benefits of Using Examples:

– Improved accuracy by clearly illustrating expectations

– Increased consistency by providing a template to follow

– Enhanced performance on complex or structured tasks

Guidelines for Crafting Examples:

– Ensure examples are highly relevant to the actual use case

– Include a diversity of examples covering different scenarios

– Use formatting tags like <example> to clearly designate examples

– Aim for 3-5 examples to start, adding more as needed

Using Prompt Examples:

You are a Natural Language Processing engine and your job is to analyse text to determine language. Analyse the text provided in the <text></text> tags and provide an assessment of the language used. The output should be contained in <response></response> tags and should strictly follow the examples provided in the <example></example> tags.

<example>

Input: The quick brown fox jumps over the lazy dog.

Output: The sentence uses all letters in the English alphabet.

</example>

<example>

Input: Pack my box with five dozen liquor jugs.

Output: The sentence uses all letters in the English alphabet.

</example>

<text>

Jack and Jill ran up the hill to fetch a pail of water.

</text>

By including targeted examples like these in the prompt, you provide Claude with a strong signal for the desired output structure and rules to follow.
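A few-shot prompt like the one above can be assembled programmatically from instruction text, example input/output pairs, and the text to analyse. A small sketch (the function name is mine, not an established API):

```python
def few_shot_prompt(instructions, examples, text):
    """Assemble instructions, <example> blocks, and the <text> block into one prompt."""
    blocks = [f"<example>\nInput: {i}\nOutput: {o}\n</example>" for i, o in examples]
    return instructions + "\n\n" + "\n\n".join(blocks) + f"\n\n<text>\n{text}\n</text>"

prompt = few_shot_prompt(
    "Analyse the text in the <text></text> tags, following the examples.",
    [("The quick brown fox jumps over the lazy dog.",
      "The sentence uses all letters in the English alphabet."),
     ("Pack my box with five dozen liquor jugs.",
      "The sentence uses all letters in the English alphabet.")],
    "Jack and Jill ran up the hill to fetch a pail of water.",
)
```

Keeping the examples in a plain list makes it easy to start with 3-5 and add more as needed without rewriting the prompt.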

Final thoughts

I encourage you to go to the Amazon Bedrock console and experiment with tweaking the parameters and prompts in the “Playground” section, where you can work in a chat, text, or image generation environment. You can also see a lot of examples under the “Getting started” section in the console.


Stay tuned as we will explore some Media & Entertainment specific examples of prompt engineering soon!

I work for AWS, but all of the opinions, ideas and solution implementations in my blogs are purely my own.
