Amazon Bedrock is a fully managed service that provides access to Foundation Models (FMs) from third-party providers and Amazon. With Bedrock, you can choose from a variety of models to find the one that best suits your needs.
Typical usage scenarios include the following:

1. InvokeModel vs. Converse
There are two primary ways to call models in the Amazon Bedrock Runtime API: InvokeModel and the Converse API (which comprises the Converse and ConverseStream operations).

InvokeModel
- Characteristics: Single request-response pattern; stateless; suitable for all supported models.
- Use Cases:
- Single queries or tasks, such as text generation, question answering, or summarization.
- Tasks that do not require context memory.
- Batch processing or parallel processing of multiple independent requests.
- Simple API integration.
- Example Uses:
- Generating product descriptions.
- Answering standalone questions.
- Text classification or sentiment analysis.
- Code generation or debugging.
Converse API
- Characteristics: Multi-turn conversational format; the conversation history is passed on each call in a unified, model-agnostic message schema; supports streaming responses via ConverseStream; not available for every model (Anthropic's Claude is among those supported; check the Bedrock documentation for the current list).
- Use Cases:
- Applications requiring continuous conversation.
- Scenarios where the model needs to remember previous interactions.
- Applications needing real-time, streaming responses.
- Complex, multi-step tasks.
- Example Uses:
- Customer service chatbots.
- Interactive teaching assistants.
- Multi-turn question-answering systems.
- Collaborative writing or programming assistants.
Converse vs. ConverseStream
Converse and ConverseStream are two different API operations; the main difference between them is how responses are delivered.
- Converse API:
- A synchronous call that returns the complete response at once.
- Suitable for short conversations or scenarios that do not require real-time feedback.
- ConverseStream API:
- A streaming call that returns partial responses as events while they are generated.
- Ideal for long conversations or interactive scenarios requiring immediate feedback.
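As a minimal sketch (assuming boto3 credentials are configured and that the Claude model ID below, which is only a placeholder, is enabled in your account), streaming with ConverseStream looks like this:

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-west-2")

response = client.converse_stream(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder: any Converse-capable model
    messages=[{"role": "user", "content": [{"text": "Explain Amazon S3 in one paragraph."}]}],
)

# The response body is an event stream; text arrives in contentBlockDelta events.
for event in response["stream"]:
    if "contentBlockDelta" in event:
        print(event["contentBlockDelta"]["delta"]["text"], end="", flush=True)
print()
```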
Recommendations
- Use InvokeModel when: you need to handle independent, unrelated requests and your application does not need to maintain conversational state.
- Use Converse when: you are building a conversational application, need the model to see previous turns (which you pass as message history), and want to receive streaming, real-time responses.
Note that while InvokeModel does not directly support multi-turn dialogues, you can simulate this functionality by managing the conversation history at the application level and including the relevant context in each API call. This method, while flexible, may be less efficient than using the dedicated Converse API.
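For comparison, here is a minimal sketch of a multi-turn exchange with the Converse API via boto3. The model ID and region are placeholder assumptions; note that the application still supplies the full message history on every call:

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-west-2")
model_id = "anthropic.claude-3-haiku-20240307-v1:0"  # placeholder model ID

# The application keeps the conversation history and resends it each turn.
messages = [{"role": "user", "content": [{"text": "Name three AWS regions."}]}]

response = client.converse(
    modelId=model_id,
    messages=messages,
    inferenceConfig={"maxTokens": 256, "temperature": 0.5},
)
assistant_message = response["output"]["message"]
print(assistant_message["content"][0]["text"])

# Append the assistant's reply and the next user turn, then call again.
messages.append(assistant_message)
messages.append({"role": "user", "content": [{"text": "Which of those is in Europe?"}]})
response = client.converse(modelId=model_id, messages=messages)
print(response["output"]["message"]["content"][0]["text"])
```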
CloudTrail Logging
Both InvokeModel and Converse API calls are recorded in AWS CloudTrail.

However, the logs do not record the user’s input or the prompt itself; they only record whether the call was successful and the reason for any failures.
When asking a question in the Bedrock Playground, the ConverseStream API is called.

The payload includes the history of questions and answers, and the response is an event stream.

This event is also visible in CloudTrail logs.
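As an illustration, you can look these events up programmatically. A minimal sketch, assuming CloudTrail read permissions and that the Playground call was made in the same region:

```python
import boto3

cloudtrail = boto3.client("cloudtrail", region_name="us-west-2")

# Find recent ConverseStream calls recorded by CloudTrail.
events = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "ConverseStream"}],
    MaxResults=5,
)
for event in events["Events"]:
    # The event record contains metadata only; no prompt text is logged.
    print(event["EventName"], event["EventTime"])
```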

2. Text Generation I: Basic Generation
Checking Model Parameters
Different models may require different parameter names. You can find these on the Bedrock Provider page by selecting a provider and then a specific model.

For example, the Titan text models take a topP parameter, whereas an Anthropic model such as Claude 3 Haiku uses top_p.
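To make this concrete, here is a sketch of the two request bodies side by side (field names follow each provider's published request schema; the values are arbitrary):

```python
# Amazon Titan text models: camelCase fields nested under textGenerationConfig
titan_body = {
    "inputText": "Hello",
    "textGenerationConfig": {"maxTokenCount": 512, "temperature": 0.7, "topP": 0.9},
}

# Anthropic Claude models (Messages API schema): snake_case fields at the top level
anthropic_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.9,
    "messages": [{"role": "user", "content": "Hello"}],
}
```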

Example: Text Generation with Titan
The following Python code demonstrates how to generate text using the amazon.titan-text-express-v1 model.

```python
import boto3
import json


def generate_text_with_bedrock(prompt, model_id="amazon.titan-text-express-v1"):
    """
    Generates text using the Amazon Bedrock Titan Text Express model.

    :param prompt: The user's input prompt.
    :param model_id: The model ID to use; defaults to Titan Text Express.
    :return: The generated text, or None if the call fails.
    """
    # Initialize the Bedrock Runtime client
    client = boto3.client("bedrock-runtime", region_name="us-west-2")

    # Define the generation payload
    payload = {
        "inputText": prompt,
        "textGenerationConfig": {
            "maxTokenCount": 500,  # Maximum number of tokens to generate
            "temperature": 0.7,    # Controls the randomness of the text
            "topP": 0.9,           # Controls the diversity of the text
            "stopSequences": []    # Defines stop sequences
        }
    }

    try:
        # Invoke the Bedrock service
        response = client.invoke_model(
            modelId=model_id,
            body=json.dumps(payload),
            contentType="application/json"
        )
        # Parse the JSON response and extract the generated text
        result = json.loads(response["body"].read())
        return result["results"][0]["outputText"]
    except Exception as e:
        print("Error invoking Bedrock model:", str(e))
        return None


# Example usage
prompt = "Write a short story about a brave knight and a dragon."
generated_text = generate_text_with_bedrock(prompt)
if generated_text:
    print("Generated Text:")
    print(generated_text)
```
3. Text Generation II: Zero-shot vs. Context-aware
Large Language Models (LLMs) can be used for text generation tasks like creating emails, stories, and social media posts. However, the generated text can sometimes be generic or contain hallucinations.
There are two primary modes for text generation:
- Zero-shot: The user provides an input request without any context or examples.
- Context-aware: The user provides the LLM with contextual information along with the prompt.
Zero-shot Generation
In this example, we will generate an email response to a customer using only a zero-shot prompt.
Using the Boto3 SDK

The invoke_model API in the Boto3 SDK for Amazon Bedrock accepts parameters such as modelId, accept, contentType, and a body containing the prompt and configuration.

```python
import json
import boto3
import botocore

boto3_bedrock = boto3.client('bedrock-runtime')

# Create the prompt
prompt_data = """
Command: Write an email from Bob, Customer Service Manager, to the customer "John Doe"
who provided negative feedback on the service provided by our customer support
engineer"""

# The body for Amazon Titan includes the inputText and textGenerationConfig
body = json.dumps({
    "inputText": prompt_data,
    "textGenerationConfig": {
        "topP": 0.95,
        "temperature": 0.1
    }
})

modelId = 'amazon.titan-text-express-v1'
accept = 'application/json'
contentType = 'application/json'

try:
    response = boto3_bedrock.invoke_model(
        body=body,
        modelId=modelId,
        accept=accept,
        contentType=contentType
    )
    response_body = json.loads(response.get('body').read())
    outputText = response_body.get('results')[0].get('outputText')
    print(outputText)
except botocore.exceptions.ClientError as error:
    if error.response['Error']['Code'] == 'AccessDeniedException':
        # Print the message on a red background so the error stands out
        print(f"\x1b[41m{error.response['Error']['Message']}\x1b[0m\n")
    else:
        raise error
```
Using LangChain with Amazon Bedrock
LangChain is a framework for developing applications powered by language models. It abstracts the Bedrock API, making it easier to build use cases by simply passing a prompt.

```python
import boto3
from langchain.llms.bedrock import Bedrock

boto3_bedrock = boto3.client('bedrock-runtime')

# Inference parameters for the Anthropic Claude v2 model
inference_modifier = {
    "max_tokens_to_sample": 4096,
    "temperature": 0.5,
    "top_k": 250,
    "top_p": 1,
    "stop_sequences": ["\n\nHuman"],
}

textgen_llm = Bedrock(
    model_id="anthropic.claude-v2",
    client=boto3_bedrock,
    model_kwargs=inference_modifier,
)

# LangChain abstracts the API call
response = textgen_llm("""
Human: Write an email from Bob, Customer Service Manager,
to the customer "John Doe" that provided negative feedback on the service
provided by our customer support engineer.

Assistant:""")

print(response)
```

Creating a LangChain PromptTemplate
A PromptTemplate allows you to pass different input variables at runtime, which is useful for generating content dynamically.

```python
from langchain.prompts import PromptTemplate

# Create a prompt template with multiple input variables
multi_var_prompt = PromptTemplate(
    input_variables=["customerServiceManager", "customerName", "feedbackFromCustomer"],
    template="""
Human: Create an apology email from the Service Manager {customerServiceManager} to {customerName} in response to the following feedback that was received from the customer:
<customer_feedback>
{feedbackFromCustomer}
</customer_feedback>

Assistant:"""
)

# Pass values to the input variables
prompt = multi_var_prompt.format(
    customerServiceManager="Bob",
    customerName="John Doe",
    feedbackFromCustomer="""Hello Bob,
I am very disappointed with the recent experience I had when I called your customer support.
I was expecting an immediate call back but it took three days for us to get a call back.
The first suggestion to fix the problem was incorrect. Ultimately the problem was fixed after three days.
We are very unhappy with the response provided and may consider taking our business elsewhere.
"""
)

response = textgen_llm(prompt)

# Drop the first line of the response (a preamble) to keep only the email body
email = response[response.index('\n')+1:]
print(email)
```
4. Text Summarization I: Basic Summarization
Text summarization is a Natural Language Processing (NLP) technique for extracting the most relevant information from a document and presenting it concisely. This is achieved by sending a prompt to a model with an instruction to summarize a given text.

Key challenges include managing documents that exceed token limits and obtaining high-quality summaries.
Example: Text Summarization with Prompts
Here we will send a small amount of text data to the Amazon Bedrock API with an instruction to summarize it. We will use both Amazon Titan and Anthropic Claude models.

```python
# Using Amazon Titan
import json
import boto3
import botocore

boto3_bedrock = boto3.client('bedrock-runtime')

prompt = """
Please provide a summary of the following text. Do not add any information that is not mentioned in the text below.
<text>
AWS took all of that feedback from customers, and today we are excited to announce Amazon Bedrock,
a new service that makes FMs from AI21 Labs, Anthropic, Stability AI, and Amazon accessible via an API.
Bedrock is the easiest way for customers to build and scale generative AI-based applications using FMs,
democratizing access for all builders. Bedrock will offer the ability to access a range of powerful FMs
for text and images—including Amazon's Titan FMs, which consist of two new LLMs we're also announcing
today—through a scalable, reliable, and secure AWS managed service. With Bedrock's serverless experience,
customers can easily find the right model for what they're trying to get done, get started quickly, privately
customize FMs with their own data, and easily integrate and deploy them into their applications using the AWS
tools and capabilities they are familiar with, without having to manage any infrastructure (including integrations
with Amazon SageMaker ML features like Experiments to test different models and Pipelines to manage their FMs at scale).
</text>
"""

body = json.dumps({
    "inputText": prompt,
    "textGenerationConfig": {
        "maxTokenCount": 1024,
        "stopSequences": [],
        "temperature": 0,
        "topP": 1
    },
})

modelId = 'amazon.titan-tg1-large'
accept = 'application/json'
contentType = 'application/json'

try:
    response = boto3_bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
    response_body = json.loads(response.get('body').read())
    print(response_body.get('results')[0].get('outputText'))
except botocore.exceptions.ClientError as error:
    # ... error handling ...
    raise error
```
5. Text Summarization II: Using LangChain for Large Documents
When dealing with large documents, we face challenges like exceeding the model’s context length, model hallucinations, and out-of-memory errors. To address this, we can split the document into smaller chunks and process them sequentially.
LangChain supports several methods for this, such as map_reduce, where each chunk is summarized individually and the individual summaries are then combined and summarized again.

This example uses the map_reduce method to summarize a large text file.

```python
import boto3
from langchain.llms.bedrock import Bedrock
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from io import StringIO
import sys
import textwrap

boto3_bedrock = boto3.client('bedrock-runtime')

# Initialize the LLM
modelId = "amazon.titan-tg1-large"
llm = Bedrock(
    model_id=modelId,
    model_kwargs={
        "maxTokenCount": 4096,
        "temperature": 0,
        "topP": 1,
    },
    client=boto3_bedrock,
)

# Load the document
with open("2022-letter.txt", "r") as file:
    letter = file.read()

# Split the document into chunks
text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n"], chunk_size=4000, chunk_overlap=100
)
docs = text_splitter.create_documents([letter])

# Load the map_reduce summarization chain
summary_chain = load_summarize_chain(llm=llm, chain_type="map_reduce")
```
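With the chain constructed, the final step is to run it over the chunked documents. A minimal sketch, assuming the run() invocation style of this LangChain version (textwrap, imported above, is used only to wrap the output for readability):

```python
# Summarize each chunk, then combine and summarize the chunk summaries
output = summary_chain.run(docs)
print(textwrap.fill(output, width=100))
```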