Introducing Stable Diffusion 3.5 Large in Amazon SageMaker JumpStart

November 14, 2024

by Tom Yemington Amazon AWS

We are excited to announce the availability of Stability AI’s latest and most advanced text-to-image model, Stable Diffusion 3.5 Large, in Amazon SageMaker JumpStart. This new cutting-edge image generation model, which was trained on Amazon SageMaker HyperPod, empowers AWS customers to generate high-quality images from text descriptions with unprecedented ease, flexibility, and creative potential. By adding Stable Diffusion 3.5 Large to SageMaker JumpStart, we’re taking another significant step towards democratizing access to advanced AI technologies and enabling businesses of all sizes to harness the power of generative AI.

In this post, we provide an implementation guide for subscribing to Stable Diffusion 3.5 Large in SageMaker JumpStart, deploying the model in Amazon SageMaker Studio, and generating images using text-to-image prompts.

Stable Diffusion 3.5 Large capabilities and use cases

At 8.1 billion parameters, with superior quality and prompt adherence, Stable Diffusion 3.5 Large is the most powerful model in the Stable Diffusion family. The model excels at creating diverse, high-quality images across a wide range of styles, making it an excellent tool for media, gaming, advertising, ecommerce, corporate training, retail, and education. For ideation, Stable Diffusion 3.5 Large can accelerate storyboarding, concept art creation, and rapid prototyping of visual effects. For production, you can quickly generate high-quality 1-megapixel images for campaigns, social media posts, and advertisements, saving time and resources while maintaining creative control.

Stable Diffusion 3.5 Large offers users nearly endless creative possibilities, including:

Enhanced creativity and photorealism – You can generate exceptional visuals with highly detailed 3D imagery that include fine details like lighting and textures.
Exceptional multi-subject proficiency – It offers unrivaled capabilities in generating images with multiple subjects, which is ideal for creating complex scenes.
Increased efficiency – Fast, accurate, and quality content production streamlines operations, saving time and money. Despite its power and complexity, Stable Diffusion 3.5 Large is optimized for efficiency, providing accessibility and ease of use across a broad audience.

Solution overview

With SageMaker JumpStart, you can choose from a broad selection of publicly available foundation models (FMs). ML practitioners can deploy FMs to dedicated SageMaker instances from a network-isolated environment and customize models using Amazon SageMaker for model training and deployment. You can now discover and deploy the Stable Diffusion 3.5 Large model with a few clicks in SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and MLOps controls with SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your virtual private cloud (VPC) controls, helping provide data security.

The Stable Diffusion 3.5 Large model is available today in the following AWS Regions: US East (N. Virginia, Ohio), US West (Oregon), Asia Pacific (Osaka, Hong Kong), China (Beijing), Middle East (Bahrain), Africa (Cape Town), and Europe (Milan, Stockholm).

SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all machine learning (ML) development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.

Prerequisites

Make sure that your AWS Identity and Access Management (IAM) role has AmazonSageMakerFullAccess. To successfully deploy the model, confirm that your IAM role has the following three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used:

aws-marketplace:ViewSubscriptions
aws-marketplace:Unsubscribe
aws-marketplace:Subscribe
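
As a rough illustration, the three AWS Marketplace permissions above could be expressed as a single IAM policy statement. The following sketch builds such a policy document in Python. The function name and the `Resource` value of `*` are assumptions made for illustration and are not taken from this post; check them against your organization's IAM conventions.

```python
import json


def marketplace_subscription_policy() -> dict:
    """Build an IAM policy document granting the three Marketplace actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "aws-marketplace:ViewSubscriptions",
                    "aws-marketplace:Unsubscribe",
                    "aws-marketplace:Subscribe",
                ],
                # Assumption: the actions are granted on all resources.
                "Resource": "*",
            }
        ],
    }


# Print the policy as JSON so it can be reviewed or pasted into IAM.
print(json.dumps(marketplace_subscription_policy(), indent=2))
```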

Subscribe to the Stable Diffusion 3.5 Large model package

You can access SageMaker JumpStart through the SageMaker Studio Home page by selecting JumpStart in the Prebuilt and automated solutions section. The JumpStart landing page allows you to explore various resources including solutions, models, and notebooks. You can search for a particular provider. In the following screenshot, we are looking at all the models by Stability AI on SageMaker JumpStart.

Each model is presented with a model card containing key information such as the model name, fine-tuning capability, provider, and a brief description. To find the Stable Diffusion 3.5 Large model, you can either browse the Foundation Model: Image Generation carousel or use the search function. Select Stable Diffusion 3.5 Large.

To subscribe to Stable Diffusion 3.5 Large, follow these steps:

Open the model listing page in AWS Marketplace using the link available from the example notebook in SageMaker JumpStart.
On the listing, choose Continue to subscribe.
On the Subscribe to this software page, review and choose Accept Offer if you and your organization accept the EULA, pricing, and support terms.
Choose Continue to configuration to start configuring your model.
Choose a supported Region, and you will see the model package Amazon Resource Name (ARN) that you need to specify when creating an endpoint.

Note: If you don’t have the necessary permissions to view or subscribe to the model, reach out to your AWS administrator or procurement point of contact. Many enterprises may limit AWS Marketplace permissions to control the actions that someone can take in the AWS Marketplace Management Portal.

Deploy the model in SageMaker Studio

Now you’re prepared to follow the notebook example from Stability AI’s GitHub repository to create a deployable ModelPackage (with the model package ARN from AWS Marketplace) and then create an endpoint.

For Stable Diffusion 3.5 Large, you’ll need to deploy on an Amazon Elastic Compute Cloud (Amazon EC2) ml.p5.48xlarge instance.
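
The notebook remains the reference for deployment. As a minimal sketch of what creating an endpoint from a Marketplace model package can look like with the low-level boto3 SageMaker client, consider the following. The helper names, the single shared resource name, and the `AllTraffic` variant name are hypothetical, and enabling network isolation is an assumption based on the network-isolated deployment described earlier. `sm_client` is assumed to be `boto3.client("sagemaker")`.

```python
def build_deployment_requests(model_package_arn, role_arn, name,
                              instance_type="ml.p5.48xlarge"):
    """Build keyword arguments for the three boto3 SageMaker calls."""
    return {
        "model": {
            "ModelName": name,
            "ExecutionRoleArn": role_arn,
            # The model is defined by the subscribed model package ARN.
            "PrimaryContainer": {"ModelPackageName": model_package_arn},
            # Assumption: the package is deployed with network isolation.
            "EnableNetworkIsolation": True,
        },
        "endpoint_config": {
            "EndpointConfigName": name,
            "ProductionVariants": [
                {
                    "VariantName": "AllTraffic",
                    "ModelName": name,
                    "InitialInstanceCount": 1,
                    "InstanceType": instance_type,
                }
            ],
        },
        "endpoint": {"EndpointName": name, "EndpointConfigName": name},
    }


def deploy(sm_client, requests):
    """Create the model, endpoint configuration, and endpoint in order."""
    sm_client.create_model(**requests["model"])
    sm_client.create_endpoint_config(**requests["endpoint_config"])
    sm_client.create_endpoint(**requests["endpoint"])
    return requests["endpoint"]["EndpointName"]
```

Separating request construction from the API calls makes it possible to review the configuration before any billable resource is created.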

Generate images with a text prompt

Refer to the Stable Diffusion 3.5 Large documentation for more details. From the example notebook, the code to generate an image is as follows:

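import base64
import io
import json

import boto3
from IPython.display import display
from PIL import Image

# endpoint_name is the name of the endpoint created in the previous section.
sm_runtime = boto3.client("sagemaker-runtime")

params = {
    "prompt": "Photography, pink rose flowers in the twilight, glowing, tile houses in the background.",
    "seed": 101,
    "aspect_ratio": "21:9",
    "output_format": "jpeg",
}

payload = json.dumps(params).encode("utf-8")

response = sm_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Accept="application/json",
    Body=payload,
)

out = json.loads(response["Body"].read().decode("utf-8"))
try:
    # The generated image is returned as a base64-encoded string.
    base64_string = out["body"]["images"][0]
    image_data = base64.b64decode(base64_string)
    image = Image.open(io.BytesIO(image_data))
    display(image)
except (KeyError, IndexError, TypeError, ValueError, OSError):
    # Print the raw response so that any error message is visible.
    print(out)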
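
If you want to keep the generated image rather than only display it, a small helper can decode the response and write it to disk. This is a sketch that assumes the response shape used in the notebook code above (a base64 string at `out["body"]["images"][0]`); the function name and the output path are hypothetical.

```python
import base64
from pathlib import Path


def save_first_image(out: dict, path: str) -> int:
    """Decode the first base64 image in the response and write it to path.

    Returns the number of bytes written.
    """
    image_bytes = base64.b64decode(out["body"]["images"][0])
    return Path(path).write_bytes(image_bytes)
```

For example, `save_first_image(out, "rose.jpeg")` would store the image from the request above under a file name of your choosing.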

The following are examples of images generated from different prompts.

Prompt:

Photography, pink rose flowers in the twilight, glowing, tile houses in the background.

Prompt:

The word “AWS x Stability” in a thick, blocky script surrounded by roots and vines against a solid white background. The scene is lit by flat light, creating a reflective scene with a minimal color palette. Quilling style.

Prompt:

Expressionist painting, side profile of a silhouette of a student seated at a desk, absorbed in reading a book. Her thoughts artistically connect to the stars and the vast universe, symbolizing the expansion of knowledge and a boundless mind.

Prompt:

High-energy street scene in a neon-lit Tokyo alley at night, where steam rises from food carts, and colorful neon signs illuminate the rain-slicked pavement.

Prompt:

3D animation scene of an adventurer traveling the world with his pet dog.

Clean up

When you’ve finished working, you can delete the endpoint to release the EC2 instances associated with it and stop billing.

Get your list of SageMaker endpoints using the AWS Command Line Interface (AWS CLI) as follows:

!aws sagemaker list-endpoints

Then delete the endpoints:

deployed_model.sagemaker_session.delete_endpoint(endpoint_name)
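
Deleting an endpoint through the low-level API does not remove its endpoint configuration or the underlying model. As a sketch, assuming `sm_client` is `boto3.client("sagemaker")` and using a hypothetical helper name, the following looks up and removes all three kinds of resources:

```python
def delete_endpoint_resources(sm_client, endpoint_name: str) -> list:
    """Delete an endpoint, its endpoint configuration, and its models.

    Returns the names of the deleted resources in deletion order.
    """
    # Look up the configuration and models before deleting anything.
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [v["ModelName"] for v in config["ProductionVariants"]]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)
    return [endpoint_name, config_name, *model_names]
```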

Conclusion

In this post, we walked through subscribing to Stable Diffusion 3.5 Large in SageMaker JumpStart, deploying the model in SageMaker Studio, and generating a variety of images with Stability AI’s latest text-to-image model.

Start creating amazing images today with Stable Diffusion 3.5 Large on SageMaker JumpStart. To learn more about SageMaker JumpStart, see SageMaker JumpStart pretrained models, Amazon SageMaker JumpStart Foundation Models, and Getting started with Amazon SageMaker JumpStart.

If you’d like to explore advanced prompt engineering techniques that can enhance the performance of text-to-image models from Stability AI and facilitate the creation of compelling imagery, see Understanding prompt engineering: Unlock the creative potential of Stability AI models on AWS.

About the Authors

Tom Yemington is a Senior GenAI Models Specialist focused on helping model providers and customers scale generative AI solutions in AWS. Tom is a Certified Information Systems Security Professional (CISSP). Outside of work, you can find Tom racing vintage cars or teaching people how to race as an instructor at track-day events.

Isha Dua is a Senior Solutions Architect based in the San Francisco Bay Area, working with generative AI model providers and helping customers optimize their generative AI workloads on AWS. She helps enterprise customers grow by understanding their goals and challenges, and guides them on how they can architect their applications in a cloud-native manner while ensuring resilience and scalability. She’s passionate about machine learning technologies and environmental sustainability.

Boshi Huang is a Senior Applied Scientist in Generative AI at Amazon Web Services, where he collaborates with customers to develop and implement generative AI solutions. Boshi’s research focuses on advancing the field of generative AI through automatic prompt engineering, adversarial attack and defense mechanisms, inference acceleration, and developing methods for responsible and reliable visual content generation.

Improve governance of models with Amazon SageMaker unified Model Cards and Model Registry

November 13, 2024

by Ram Vittal Amazon AWS

You can now register machine learning (ML) models in Amazon SageMaker Model Registry with Amazon SageMaker Model Cards, making it straightforward to manage governance information for specific model versions directly in SageMaker Model Registry in just a few clicks.

Model cards are an essential component for registered ML models, providing a standardized way to document and communicate key model metadata, including intended use, performance, risks, and business information. This transparency is particularly important for registered models, which are often deployed in high-stakes or regulated industries, such as financial services and healthcare. By including detailed model cards, organizations can establish the responsible development of their ML systems, enabling better-informed decisions by the governance team.

When solving a business problem with an ML model, customers want to refine their approach and register multiple versions of the model in SageMaker Model Registry to find the best candidate model. To effectively operationalize and govern these various model versions, customers want the ability to clearly associate model cards with a particular model version. Previously, the lack of a unified user experience made this difficult for customers, who needed a more streamlined way to register and govern their models.

Because SageMaker Model Cards and SageMaker Model Registry were built on separate APIs, it was challenging to associate the model information and gain a comprehensive view of the model development lifecycle. Integrating model information and then sharing it across different stages became increasingly difficult. This required custom integration efforts, along with complex AWS Identity and Access Management (IAM) policy management, further complicating the model governance process.

With the unification of SageMaker Model Cards and SageMaker Model Registry, architects, data scientists, ML engineers, or platform engineers (depending on the organization’s hierarchy) can now seamlessly register ML model versions early in the development lifecycle, including essential business details and technical metadata. This unification allows you to review and govern models across your lifecycle from a single place in SageMaker Model Registry. By consolidating model governance workflows in SageMaker Model Registry, you can improve transparency and streamline the deployment of models to production environments upon governance officers’ approval.

In this post, we discuss a new feature that supports the integration of model cards with the model registry. We discuss the solution architecture and best practices for managing model cards with a registered model version, and walk through how to set up, operationalize, and govern your models using the integration in the model registry.

Solution overview

In this section, we discuss the solution to address the aforementioned challenges with model governance. First, we introduce the unified model governance solution architecture for addressing the model governance challenges for an end-to-end ML lifecycle in a scalable, well-architected environment. Then we dive deep into the details of the unified model registry and discuss how it helps with governance and deployment workflows.

Unified model governance architecture

ML governance enforces the ethical, legal, and efficient use of ML systems by addressing concerns like bias, transparency, explainability, and accountability. It helps organizations comply with regulations, manage risks, and maintain operational efficiency through robust model lifecycles and data quality management. Ultimately, ML governance builds stakeholder trust and aligns ML initiatives with strategic business goals, maximizing their value and impact. ML governance starts when you want to solve a business use case or problem with ML and is part of every step of your ML lifecycle, from use case inception through model building, training, evaluation, and deployment to monitoring of your production ML system.

Let’s delve into the architecture details of how you can use a unified model registry along with other AWS services to govern your ML use case and models throughout the entire ML lifecycle.

SageMaker Model Registry catalogs your models along with their versions and associated metadata and metrics for training and evaluation. It also maintains audit and inference metadata to help drive governance and deployment workflows.

The following are key concepts used in the model registry:

Model package group – A model package group, or model group, represents an ML model that solves a business problem (for this example, we use the model CustomerChurn). This model group contains all the model versions associated with that ML model.
Model package version – A model package version or model version is a registered model version that includes the model artifacts and inference code for the model.
Registered model – This is the model group that is registered in SageMaker Model Registry.
Deployable model – This is the model version that is deployable to an inference endpoint.
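
To make these concepts concrete, the following sketch creates a model package group and lists the registered versions in it using the low-level boto3 SageMaker client. The helper names are hypothetical, `sm_client` is assumed to be `boto3.client("sagemaker")`, and pagination of the listing is omitted for brevity.

```python
def create_model_group(sm_client, group_name: str, description: str) -> None:
    """Create a model package group, such as CustomerChurn."""
    sm_client.create_model_package_group(
        ModelPackageGroupName=group_name,
        ModelPackageGroupDescription=description,
    )


def list_model_versions(sm_client, group_name: str) -> list:
    """Return (version, approval status) pairs, newest first.

    Only the first page of results is read in this sketch.
    """
    response = sm_client.list_model_packages(
        ModelPackageGroupName=group_name,
        SortBy="CreationTime",
        SortOrder="Descending",
    )
    return [
        (package["ModelPackageVersion"], package["ModelApprovalStatus"])
        for package in response["ModelPackageSummaryList"]
    ]
```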

Additionally, this solution uses Amazon DataZone. The integration of SageMaker and Amazon DataZone enables collaboration between ML builders and data engineers for building ML use cases. ML builders can request access to data published by data engineers. Upon receiving approval, ML builders can then consume the accessed data to engineer features, create models, and publish features and models to the Amazon DataZone catalog for sharing across the enterprise. As part of the SageMaker Model Cards and SageMaker Model Registry unification, ML builders can now share technical and business information about their models, including training and evaluation details, as well as business metadata such as model risk, for ML use cases.

The following diagram depicts the architecture for unified governance across your ML lifecycle.

There are several steps for implementing secure and scalable end-to-end governance for your ML lifecycle:

Define your ML use case metadata (name, description, risk, and so on) for the business problem you’re trying to solve (for example, automate a loan application process).
Set up and invoke your use case approval workflow for building the ML model (for example, fraud detection) for the use case.
Create an ML project to create a model for the ML use case.
Create a SageMaker model package group to start building the model. Associate the model to the ML project and record qualitative information about the model, such as purpose, assumptions, and owner.
Prepare the data to build your model training pipeline.
Evaluate your training data for data quality, including feature importance and bias, and update the model package version with relevant evaluation metrics.
Train your ML model with the prepared data and register the candidate model package version with training metrics.
Evaluate your trained model for model bias and model drift, and update the model package version with relevant evaluation metrics.
Validate that the candidate model experimentation results meet your model governance criteria based on your use case risk profile and compliance requirements.
After you receive the governance team’s approval on the candidate model, record the approval on the model package version and invoke an automated test deployment pipeline to deploy the model to a test environment.
Run model validation tests in a test environment and make sure the model integrates and works with upstream and downstream dependencies similar to a production environment.
After you validate the model in the test environment and make sure the model complies with use case requirements, approve the model for production deployment.
After you deploy the model to the production environment, continuously monitor model performance metrics (such as quality and bias) to make sure the model stays in compliance and meets your business use case key performance indicators (KPIs).
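
The validation and approval steps above can be partly automated. The following sketch compares evaluation metrics against minimum thresholds and builds the keyword arguments for the boto3 `update_model_package` call. The threshold-based gate, the metric names, and the helper name are assumptions made for illustration; in this sketch a passing model is only marked as pending, so the governance team still makes the final decision.

```python
def governance_gate_request(model_package_arn: str, metrics: dict,
                            thresholds: dict) -> dict:
    """Build update_model_package arguments from a threshold check."""
    # A metric fails if it is missing or below its minimum threshold.
    failures = sorted(
        name for name, minimum in thresholds.items()
        if metrics.get(name, float("-inf")) < minimum
    )
    if failures:
        status = "Rejected"
        description = "Below threshold: " + ", ".join(failures)
    else:
        status = "PendingManualApproval"
        description = "Automated checks passed; awaiting governance review."
    return {
        "ModelPackageArn": model_package_arn,
        "ModelApprovalStatus": status,
        "ApprovalDescription": description,
    }
```

You could then record the outcome with `sm_client.update_model_package(**request)`, where `sm_client` is assumed to be `boto3.client("sagemaker")`.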

Architecture tools, components, and environments

You need to set up several components and environments for orchestrating the solution workflow:

AI governance tooling – This tooling should be hosted in an isolated environment (a separate AWS account) where your key AI/ML governance stakeholders can set up and operate approval workflows for governing AI/ML use cases across your organization, lines of business, and teams.
Data governance – This tooling should be hosted in an isolated environment to centralize data governance functions such as setting up data access policies and governing data access for AI/ML use cases across your organization, lines of business, and teams.
ML shared services – ML shared services components should be hosted in an isolated environment to centralize model governance functions such as accountability through workflows and approvals, transparency through centralized model metadata, and reproducibility through centralized model lineage for AI/ML use cases across your organization, lines of business, and teams.
ML development – This phase of the ML lifecycle should be hosted in an isolated environment for model experimentation and building the candidate model. Several activities are performed in this phase, such as creating the model, data preparation, model training, evaluation, and model registration.
ML pre-production – This phase of the ML lifecycle should be hosted in an isolated environment for integrating and testing the candidate model with the ML system and validating that the results comply with the model and use case requirements. The candidate model that was built in the ML development phase is deployed to an endpoint for integration testing and validation.
ML production – This phase of the ML lifecycle should be hosted in an isolated environment for deploying the model to a production endpoint for shadow testing and A/B testing, and for gradually rolling out the model for operations in a production environment.

Integrate a model version in the model registry with model cards

In this section, we provide API implementation details for testing this in your own environment. We walk through an example notebook to demonstrate how you can use this unification during the model development data science lifecycle.

We have two example notebooks in the GitHub repository: AbaloneExample and DirectMarketing.

Complete the following steps in the above Abalone example notebook:

Install or update the necessary packages and libraries.
Import the necessary libraries and instantiate the necessary variables, such as the SageMaker client and Amazon Simple Storage Service (Amazon S3) buckets.
Create an Amazon DataZone domain and a project within the domain.

You can use an existing project if you already have one. This step is optional; we reference the Amazon DataZone project ID when creating the SageMaker model package. For overall governance across your data and model lifecycle, this helps correlate the business unit or domain, the data, and the corresponding model.

The following screenshot shows the Amazon DataZone welcome page for a test domain.

In Amazon DataZone, projects enable a group of users to collaborate on various business use cases that involve creating assets in project inventories and thereby making them discoverable by all project members, and then publishing, discovering, subscribing to, and consuming assets in the Amazon DataZone catalog. Project members consume assets from the Amazon DataZone catalog and produce new assets using one or more analytical workflows. Project members can be owners or contributors.

You can gather the project ID on the project details page, as shown in the following screenshot.

In the notebook, we refer to the project ID as follows:

project_id = "5rn1teh0tv85rb"

Prepare a SageMaker model package group.

A model group contains a group of versioned models. We refer to the Amazon DataZone project ID when we create the model package group, as shown in the following screenshot. It’s mapped to the custom_details field.

Update the details for the model card, including the intended use and owner:

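from sagemaker.model_card import (
    AdditionalInformation,
    BusinessDetails,
    IntendedUses,
    ModelCard,
    ModelCardStatusEnum,
    ModelOverview,
    RiskRatingEnum,
)

model_overview = ModelOverview(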
    #model_description="This is an example model used for a Python SDK demo of unified Amazon SageMaker Model Registry and Model Cards.",
    #problem_type="Binary Classification",
    #algorithm_type="Logistic Regression",
    model_creator="DEMO-Model-Registry-ModelCard-Unification",
    #model_owner="datascienceteam",
)
intended_uses = IntendedUses(
    purpose_of_model="Test model card.",
    intended_uses="Not used except this test.",
    factors_affecting_model_efficiency="No.",
    risk_rating=RiskRatingEnum.LOW,
    explanations_for_risk_rating="Just an example.",
)
business_details = BusinessDetails(
    business_problem="The business problem that your model is used to solve.",
    business_stakeholders="The stakeholders who have the interest in the business that your model is used for.",
    line_of_business="Services that the business is offering.",
)
additional_information = AdditionalInformation(
    ethical_considerations="Your model ethical consideration.",
    caveats_and_recommendations="Your model's caveats and recommendations.",
    custom_details={"custom details1": "details value"},
)
my_card = ModelCard(
    name="mr-mc-unification",
    status=ModelCardStatusEnum.DRAFT,
    model_overview=model_overview,
    intended_uses=intended_uses,
    business_details=business_details,
    additional_information=additional_information,
    sagemaker_session=sagemaker_session,
)

This data is used to update the created model package. The SageMaker model package helps create a deployable model that you can use to get real-time inferences by creating a hosted endpoint or to run batch transform jobs.

The model card information shown as model_card=my_card in the following code snippet can be passed to the pipeline during the model register step:

register_args = model.register(
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.t2.medium", "ml.m5.large"],
    transform_instances=["ml.m5.large"],
    model_package_group_name=model_package_group_name,
    approval_status=model_approval_status,
    model_metrics=model_metrics,
    drift_check_baselines=drift_check_baselines,
    model_card=my_card
)

step_register = ModelStep(name="RegisterAbaloneModel", step_args=register_args)

Alternatively, you can pass it as follows:

step_register = RegisterModel(
    name="MarketingRegisterModel",
    estimator=xgb_train,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.t2.medium", "ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name=model_package_group_name,
    approval_status=model_approval_status,
    model_metrics=model_metrics,
    model_card=my_card
)

The notebook will invoke a run of the SageMaker pipeline (which can also be invoked from an event or from the pipelines UI), which includes preprocessing, training, and evaluation.

After the pipeline is complete, you can navigate to Amazon SageMaker Studio, where you can see a model package on the Models page.

You can view the details like business details, intended use, and more on the Overview tab under Audit, as shown in the following screenshots.

The Amazon DataZone project ID is captured in the Documentation section.

You can view performance metrics under Train as well.

Evaluation details like model quality, bias pre-training, bias post-training, and explainability can be reviewed on the Evaluate tab.

Optionally, you can view the model card details from the model package itself.

Additionally, you can update the audit details of the model by choosing Edit in the top right corner. Once you are done with your changes, choose Save to keep the changes in the model card.

Also, you can update the model’s deploy status.

You can track the different statuses and activity as well.

Lineage

ML lineage is crucial for tracking the origin, evolution, and dependencies of data, models, and code used in ML workflows, providing transparency and traceability. It helps with reproducibility and debugging, making it straightforward to understand and address issues.

Model lineage tracking captures and retains information about the stages of an ML workflow, from data preparation and training to model registration and deployment. You can view the lineage details of a registered model version in SageMaker Model Registry using SageMaker ML lineage tracking, as shown in the following screenshot. ML model lineage tracks the metadata associated with your model training and deployment workflows, including training jobs, datasets used, pipelines, endpoints, and the actual models. You can also use the graph node to view more details, such as dataset and images used in that step.
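
Lineage can also be queried programmatically. The following sketch uses the boto3 `query_lineage` call to walk upstream from a starting entity and counts the returned vertices by lineage type. The helper name is hypothetical, `sm_client` is assumed to be `boto3.client("sagemaker")`, and `start_arn` is assumed to be the ARN of a lineage entity associated with your registered model version.

```python
def lineage_summary(sm_client, start_arn: str) -> dict:
    """Count upstream lineage vertices by type for a starting entity."""
    response = sm_client.query_lineage(
        StartArns=[start_arn],
        Direction="Ascendants",  # follow the lineage graph upstream
        IncludeEdges=True,
    )
    counts = {}
    for vertex in response.get("Vertices", []):
        lineage_type = vertex.get("LineageType", "Unknown")
        counts[lineage_type] = counts.get(lineage_type, 0) + 1
    return counts
```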

Clean up

If you created resources while using the notebook in this post, follow the instructions in the notebook to clean up those resources.

Conclusion

In this post, we discussed a solution to use a unified model registry with other AWS services to govern your ML use case and models throughout the entire ML lifecycle in your organization. We walked through an end-to-end architecture for developing an AI use case embedding governance controls, from use case inception to model building, model validation, and model deployment in production. We demonstrated through code how to register a model and update it with governance, technical, and business metadata in SageMaker Model Registry.

We encourage you to try out this solution and share your feedback in the comments section.

About the authors

Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 3 decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure and scalable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he rides his motorcycle and walks with his 3-year-old Sheepadoodle.

Neelam Koshiya is a Principal Solutions Architect (GenAI specialist) at AWS. With a background in software engineering, she moved organically into an architecture role. Her current focus is to help enterprise customers with their ML/GenAI journeys for strategic business outcomes. Her area of depth is machine learning. In her spare time, she enjoys reading and being outdoors.

Siamak Nariman is a Senior Product Manager at AWS. He is focused on AI/ML technology, ML model management, and ML governance to improve overall organizational efficiency and productivity. He has extensive experience automating processes and deploying various technologies.

Saumitra Vikaram is a Senior Software Engineer at AWS. He is focused on AI/ML technology, ML model management, ML governance, and MLOps to improve overall organizational efficiency and productivity.

Transcribe, translate, and summarize live streams in your browser with AWS AI and generative AI services

November 13, 2024

by Luca Guida Amazon AWS

Live streaming has been gaining immense popularity in recent years, attracting an ever-growing number of viewers and content creators across various platforms. From gaming and entertainment to education and corporate events, live streams have become a powerful medium for real-time engagement and content consumption. However, as the reach of live streams expands globally, language barriers and accessibility challenges have emerged, limiting the ability of viewers to fully comprehend and participate in these immersive experiences.

Recognizing this need, we have developed a Chrome extension that harnesses the power of AWS AI and generative AI services, including Amazon Bedrock, an AWS managed service to build and scale generative AI applications with foundation models (FMs). This extension aims to revolutionize the live streaming experience by providing real-time transcription, translation, and summarization capabilities directly within your browser.

With this extension, viewers can seamlessly transcribe live streams into text, enabling them to follow along with the content even in noisy environments or when listening to audio is not feasible. Moreover, the extension’s translation capabilities open up live streams to a global audience, breaking down language barriers and fostering more inclusive participation. By offering real-time translations into multiple languages, viewers from around the world can engage with live content as if it were delivered in their first language.

In addition, the extension’s capabilities extend beyond mere transcription and translation. Using the advanced natural language processing and summarization capabilities of FMs available through Amazon Bedrock, the extension can generate concise summaries of the content being transcribed in real time. This innovative feature empowers viewers to catch up with what is being presented, making it simpler to grasp key points and highlights, even if they have missed portions of the live stream or find it challenging to follow complex discussions.

In this post, we explore the approach behind building this powerful extension and provide step-by-step instructions to deploy and use it in your browser.

Solution overview

The solution is powered by two AWS AI services, Amazon Transcribe and Amazon Translate, along with Amazon Bedrock, a fully managed service that allows you to build generative AI applications. The solution also uses Amazon Cognito user pools and identity pools for managing authentication and authorization of users, Amazon API Gateway REST APIs, AWS Lambda functions, and an Amazon Simple Storage Service (Amazon S3) bucket.

After deploying the solution, you can access the following features:

Live transcription and translation – The Chrome extension transcribes and translates audio streams for you in real time using Amazon Transcribe, an automatic speech recognition service. This feature also integrates with Amazon Transcribe automatic language identification for streaming transcriptions—with a minimum of 3 seconds of audio, the service can automatically detect the dominant language and generate a transcript without you having to specify the spoken language.
Summarization – The Chrome extension uses FMs such as Anthropic’s Claude 3 models on Amazon Bedrock to summarize content being transcribed, so you can grasp key ideas of your live stream by reading the summary.

Live transcription is available in the more than 50 languages currently supported by Amazon Transcribe streaming (including Chinese, English, French, German, Hindi, Italian, Japanese, Korean, Brazilian Portuguese, Spanish, and Thai), while translation is available in the more than 75 languages currently supported by Amazon Translate.

The following diagram illustrates the architecture of the application.

The solution workflow includes the following steps:

A Chrome browser is used to access the desired live streamed content, and the extension is activated and displayed as a side panel. The extension delivers a web application implemented using the AWS SDK for JavaScript and the AWS Amplify JavaScript library.
The user signs in by entering a user name and a password. Authentication is performed against the Amazon Cognito user pool. After a successful login, the Amazon Cognito identity pool is used to provide the user with the temporary AWS credentials required to access application features. For more details about the authentication and authorization flows, refer to Accessing AWS services using an identity pool after sign-in.
The extension interacts with Amazon Transcribe (StartStreamTranscription operation), Amazon Translate (TranslateText operation), and Amazon Bedrock (InvokeModel operation). Interactions with Amazon Bedrock are handled by a Lambda function, which implements the application logic underlying an API made available using API Gateway.
The user is provided with the transcription, translation, and summary of the content playing inside the browser tab. The summary is stored inside an S3 bucket, which can be emptied using the extension’s Clean Up feature.
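
The summarization call made by the Lambda function can be sketched as follows. This is a minimal illustration, not the code shipped in the repository: the function names, the prompt wording, and the max_tokens value are assumptions. The request body follows the Anthropic Messages format that Claude 3 models on Amazon Bedrock accept through the InvokeModel operation.

```python
import json
from typing import Union


def build_summary_request(transcript: str, language: str, max_tokens: int = 512) -> str:
    """Build an InvokeModel body in the Anthropic Messages format (prompt text is illustrative)."""
    prompt = (
        f"Summarize the following live stream transcript in {language}. "
        f"Keep the summary concise.\n\n{transcript}"
    )
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
    })


def extract_summary(response_body: Union[str, bytes]) -> str:
    """Join the text blocks of a Claude Messages response."""
    payload = json.loads(response_body)
    return "".join(b["text"] for b in payload.get("content", []) if b.get("type") == "text")


def summarize(transcript: str, language: str, model_id: str, region: str) -> str:
    """Call Amazon Bedrock; needs boto3 and credentials allowed to call InvokeModel."""
    import boto3  # imported lazily so the helpers above work without the AWS SDK

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(
        modelId=model_id,
        body=build_summary_request(transcript, language),
    )
    return extract_summary(response["body"].read())
```

Keeping request building and response parsing separate from the network call makes both easy to test without AWS credentials.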

In the following sections, we walk through how to deploy the Chrome extension and the underlying backend resources and set up the extension, then we demonstrate using the extension in a sample use case.

Prerequisites

For this walkthrough, you should have the following prerequisites:

A computer with Google Chrome installed on it
An AWS account
Access to one or more Amazon Bedrock models (for more information, see Access Amazon Bedrock foundation models)
An AWS Identity and Access Management (IAM) user with the AdministratorAccess policy granted (for production, we recommend restricting access as needed)
The AWS Command Line Interface (AWS CLI) installed and configured to use with your AWS account
The AWS CDK CLI installed
Node.js and npm installed

Deploy the backend

The first step consists of deploying an AWS Cloud Development Kit (AWS CDK) application that automatically provisions and configures the required AWS resources, including:

An Amazon Cognito user pool and identity pool that allow user authentication
An S3 bucket, where transcription summaries are stored
Lambda functions that interact with Amazon Bedrock to perform content summarization
IAM roles that are associated with the identity pool and have permissions required to access AWS services

Complete the following steps to deploy the AWS CDK application:

Using a command line interface (Linux shell, macOS Terminal, Windows command prompt or PowerShell), clone the GitHub repository to a local directory, then open the directory:

git clone https://github.com/aws-samples/aws-transcribe-translate-summarize-live-streams-in-browser.git
cd aws-transcribe-translate-summarize-live-streams-in-browser

Open the cdk/bin/config.json file and populate the following configuration variables:

{
    "prefix": "aaa123",
    "aws_region": "us-west-2",
    "bedrock_region": "us-west-2",
    "bucket_name": "summarization-test",
    "bedrock_model_id": "anthropic.claude-3-sonnet-20240229-v1:0"
}

The template launches in the us-east-2 AWS Region by default. To launch the solution in a different Region, change the aws_region parameter accordingly. Make sure to select a Region in which all the AWS services in scope (Amazon Transcribe, Amazon Translate, Amazon Bedrock, Amazon Cognito, API Gateway, Lambda, Amazon S3) are available.

The Region used for bedrock_region can be different from aws_region because you might have access to Amazon Bedrock models in a Region different from the Region where you want to deploy the project.

By default, the project uses Anthropic’s Claude 3 Sonnet as a summarization model; however, you can use a different model by changing the bedrock_model_id in the configuration file. For the complete list of model IDs, see Amazon Bedrock model IDs. When selecting a model for your deployment, don’t forget to check that the desired model is available in your preferred Region; for more details about model availability, see Model support by AWS Region.
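
Before deploying, it can help to confirm that the configuration file is complete. The following sketch assumes only the five keys shown in the sample config.json above; the check_config and load_config helpers are hypothetical and are not part of the repository.

```python
import json

# Keys taken from the sample cdk/bin/config.json shown above.
REQUIRED_KEYS = ("prefix", "aws_region", "bedrock_region", "bucket_name", "bedrock_model_id")


def check_config(config: dict) -> list:
    """Return a list of problems; an empty list means every expected key has a value."""
    return [f"missing or empty: {key}" for key in REQUIRED_KEYS if not config.get(key)]


def load_config(path: str = "cdk/bin/config.json") -> dict:
    """Read the configuration file from the cloned repository."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, calling check_config(load_config()) from the repository root returns an empty list when all five values are populated.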

If you have never used the AWS CDK on this account and Region combination, you will need to run the following command to bootstrap the AWS CDK on the target account and Region (otherwise, you can skip this step):

npx cdk bootstrap aws://{targetAccountId}/{targetRegion}

Navigate to the cdk sub-directory, install dependencies, and deploy the stack by running the following commands:

cd cdk
npm i
npx cdk deploy

Confirm the deployment of the listed resources by entering y.

Wait for AWS CloudFormation to finish the stack creation.

You need to use the CloudFormation stack outputs to connect the frontend to the backend. After the deployment is complete, you have two options.

The preferred option is to use the provided postdeploy.sh script, which automatically copies the AWS CDK configuration parameters to a configuration file. Run the following command while still in the cdk folder:

./scripts/postdeploy.sh

Alternatively, you can copy the configuration manually:

Open the AWS CloudFormation console in the same Region where you deployed the resources.
Find the stack named AwsStreamAnalysisStack.
On the Outputs tab, take note of the output values to complete the next steps.
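
If you prefer to read the outputs programmatically instead of using the console, the following sketch shows one way to do it with Boto3. It is an illustration, not the logic of postdeploy.sh: the helper names are hypothetical, and the stack name and Region are whatever you deployed with.

```python
def outputs_to_dict(stack_description: dict) -> dict:
    """Flatten the Outputs list of a DescribeStacks stack entry into {key: value}."""
    return {
        output["OutputKey"]: output["OutputValue"]
        for output in stack_description.get("Outputs", [])
    }


def fetch_stack_outputs(stack_name: str, region: str) -> dict:
    """Read stack outputs from CloudFormation; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so outputs_to_dict works without the AWS SDK

    client = boto3.client("cloudformation", region_name=region)
    stacks = client.describe_stacks(StackName=stack_name)["Stacks"]
    return outputs_to_dict(stacks[0])
```

For example, fetch_stack_outputs("AwsStreamAnalysisStack", "us-west-2") returns a dictionary of the stack's output keys and values.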

Set up the extension

Complete the following steps to get the extension ready for transcribing, translating, and summarizing live streams:

Open the src/config.js file. Based on how you chose to collect the CloudFormation stack outputs, follow the appropriate step:
1. If you used the provided automation, check whether the values inside the src/config.js file have been automatically updated with the corresponding values.
2. If you copied the configuration manually, populate the src/config.js file with the values you noted. Use the following format:

const config = {
    "aws_project_region": "{aws_region}", // The same you have used as aws_region in cdk/bin/config.json
    "aws_cognito_identity_pool_id": "{CognitoIdentityPoolId}", // From CloudFormation outputs
    "aws_user_pools_id": "{CognitoUserPoolId}", // From CloudFormation outputs
    "aws_user_pools_web_client_id": "{CognitoUserPoolClientId}", // From CloudFormation outputs
    "bucket_s3": "{BucketS3Name}", // From CloudFormation outputs
    "bedrock_region": "{bedrock_region}", // The same you have used as bedrock_region in cdk/bin/config.json
    "api_gateway_id": "{APIGatewayId}" // From CloudFormation outputs
};

Take note of the CognitoUserPoolId, which will be needed in a later step to create a new user.

In the command line interface, move back to the aws-transcribe-translate-summarize-live-streams-in-browser directory with a command similar to the following:

cd ~/aws-transcribe-translate-summarize-live-streams-in-browser

Install dependencies and build the package by running the following commands:

npm i
npm run build

Open your Chrome browser and navigate to chrome://extensions/.

Make sure that developer mode is enabled by toggling the icon on the top right corner of the page.

Choose Load unpacked and upload the build directory, which can be found inside the local project folder aws-transcribe-translate-summarize-live-streams-in-browser.
Grant permissions to your browser to record your screen and audio:
1. Identify the newly added Transcribe, translate and summarize live streams (powered by AWS) extension.
2. Choose Details and then Site Settings.
3. In the Microphone section, choose Allow.
Create a new Amazon Cognito user:
1. On the Amazon Cognito console, choose User pools in the navigation pane.
2. Choose the user pool with the CognitoUserPoolId value noted from the CloudFormation stack outputs.
3. On the Users tab, choose Create user and configure this user’s verification and sign-in options.

See a walkthrough of Steps 4-6 in the animated image below. For additional details, refer to Creating a new user in the AWS Management Console.

Use the extension

Now that the extension is set up, you can interact with it by completing these steps:

On the browser tab, choose the Extensions icon.
Choose (right-click) the Transcribe, translate and summarize live streams (powered by AWS) extension and choose Open side panel.
Log in using the credentials created in the Amazon Cognito user pool from the previous step.
Close the side panel.

You’re now ready to experiment with the extension.

Open a new tab in the browser, navigate to a website featuring an audio/video stream, and open the extension (choose the Extensions icon, then choose the option menu (three dots) next to AWS transcribe, translate, and summarize, and choose Open side panel).
Use the Settings pane to update the settings of the application:
- Mic in use – The Mic not in use setting records only the audio of the browser tab, which suits a live video stream. The Mic in use setting also records your microphone, which suits a real-time meeting.
- Transcription language – This is the language of the live stream to be recorded (set to auto to allow automatic identification of the language).
- Translation language – This is the language in which the live stream will be translated and the summary will be printed. After you choose the translation language and start the recording, you can’t change your choice for the ongoing live stream. To change the translation language for the transcript and summary, you will have to record it from scratch.
Choose Start recording to start recording, and start exploring the Transcription and Translation tabs.

Content on the Translation tab will appear with a few seconds of delay compared to what you see on the Transcription tab. When transcribing speech in real time, Amazon Transcribe incrementally returns a stream of partial results until it generates the final transcription for a speech segment. This Chrome extension has been implemented to translate text only after a final transcription result is returned.
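
The "translate only final results" behavior can be sketched as follows. The extension itself is written in JavaScript, so this Python version is only an illustration of the idea: the helper names are hypothetical, while the event fields (Transcript, Results, IsPartial, Alternatives) follow the Amazon Transcribe streaming response structure.

```python
def final_transcripts(transcript_event: dict) -> list:
    """Return the text of finalized segments, skipping partial results."""
    results = transcript_event.get("Transcript", {}).get("Results", [])
    finals = []
    for result in results:
        if result.get("IsPartial", True):
            continue  # partial results may still change, so they are not translated
        alternatives = result.get("Alternatives", [])
        if alternatives:
            finals.append(alternatives[0]["Transcript"])
    return finals


def translate_finals(transcript_event: dict, target_language: str, region: str) -> list:
    """Translate each finalized segment with Amazon Translate; needs boto3 and credentials."""
    import boto3  # imported lazily so final_transcripts works without the AWS SDK

    client = boto3.client("translate", region_name=region)
    return [
        client.translate_text(
            Text=text, SourceLanguageCode="auto", TargetLanguageCode=target_language
        )["TranslatedText"]
        for text in final_transcripts(transcript_event)
    ]
```

Waiting for final results avoids translating the same segment repeatedly as it is refined, at the cost of the short delay described above.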

Expand the Summary section and choose Get summary to generate a summary. The operation will take a few seconds.
Choose Stop recording to stop recording.
Choose Clear all conversations in the Clean Up section to delete the summary of the live stream from the S3 bucket.

See the extension in action in the video below.

Troubleshooting

If you receive the error “Extension has not been invoked for the current page (see activeTab permission). Chrome pages cannot be captured.”, check the following:

Make sure you’re using the extension on the tab where you first opened the side panel. If you want to use it on a different tab, stop the extension, close the side panel, and choose the extension icon again to run it.
Make sure you have given permissions for audio recording in the web browser.

If you can’t get the summary of the live stream, make sure you have stopped the recording and then request the summary. You can’t change the language of the transcript and summary after the recording has started, so remember to choose it appropriately before you start the recording.

Clean up

When you’re done with your tests, to avoid incurring future charges, delete the resources created during this walkthrough by deleting the CloudFormation stack:

On the AWS CloudFormation console, choose Stacks in the navigation pane.
Choose the stack AwsStreamAnalysisStack.
Take note of the CognitoUserPoolId and CognitoIdentityPoolId values among the CloudFormation stack outputs, which will be needed in the following step.
Choose Delete stack and confirm deletion when prompted.

Because the Amazon Cognito resources won’t be automatically deleted, delete them manually:

On the Amazon Cognito console, locate the CognitoUserPoolId and CognitoIdentityPoolId values previously retrieved in the CloudFormation stack outputs.
Select both resources and choose Delete.

Conclusion

In this post, we showed you how to deploy a code sample that uses AWS AI and generative AI services to access features such as live transcription, translation and summarization. You can follow the steps we provided to start experimenting with the browser extension.

To learn more about how to build and scale generative AI applications, refer to Transform your business with generative AI.

About the Authors

Luca Guida is a Senior Solutions Architect at AWS; he is based in Milan and he supports independent software vendors in their cloud journey. With an academic background in computer science and engineering, he started developing his AI/ML passion at university; as a member of the natural language processing and generative AI community within AWS, Luca helps customers be successful while adopting AI/ML services.

Chiara Relandini is an Associate Solutions Architect at AWS. She collaborates with customers from diverse sectors, including digital native businesses and independent software vendors. After focusing on ML during her studies, Chiara supports customers in using generative AI and ML technologies effectively, helping them extract maximum value from these powerful tools.

Arian Rezai Tabrizi is an Associate Solutions Architect based in Milan. She supports enterprises across various industries, including retail, fashion, and manufacturing, on their cloud journey. Drawing from her background in data science, Arian assists customers in effectively using generative AI and other AI technologies.

Accelerate your financial statement analysis with Amazon Bedrock and generative AI

November 13, 2024

by Jason Dalba Amazon AWS

The financial and banking industry can significantly enhance investment research by integrating generative AI into daily tasks like financial statement analysis. By taking advantage of advanced natural language processing (NLP) capabilities and data analysis techniques, you can streamline common tasks like these in the financial industry:

Automating data extraction – The manual data extraction process to analyze financial statements can be time-consuming and prone to human errors. Generative AI models can automate finding and extracting financial data from documents like 10-Ks, balance sheets, and income statements. Foundation models (FMs) are trained to identify and extract relevant information like expenses, revenue, and liabilities.
Trend analysis and forecasting – Identifying trends and forecasting requires domain expertise and advanced mathematics. This limits the ability for individuals to run one-time reporting, while creating dependencies within an organization on a small subset of employees. Generative AI applications can analyze financial data and identify trends and patterns while forecasting future financial performance, all without manual intervention from an analyst. Removing the manual analysis step and allowing the generative AI model to build a report analyzing trends in the financial statement can increase the organization’s agility to make quick market decisions.
Financial reporting statements – Writing detailed financial analysis reports manually can be time-consuming and resource intensive. Dedicated resources to generate financial statements can create bottlenecks within the organization, requiring specialized roles to handle the translation of financial data into a consumable narrative. FMs can summarize financial statements, highlighting key metrics found through trend analysis and providing insights. An automated report writing process not only provides consistency and speed, but minimizes resource constraints in the financial reporting process.

Amazon Bedrock is a fully managed service that makes FMs from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. Amazon Bedrock offers a serverless experience, so you can get started quickly, privately customize FMs with your own data, and quickly integrate and deploy them into your applications using AWS tools without having to manage infrastructure.

In this post, we demonstrate how to deploy a generative AI application that can accelerate your financial statement analysis on AWS.

Solution overview

Building a generative AI application with Amazon Bedrock to analyze financial statements involves a series of steps, from setting up the environment to deploying the model and integrating it into your application.

The following diagram illustrates an example solution architecture using AWS services.

The workflow consists of the following steps:

The user interfaces with a web or mobile application, where they upload financial documents.
Amazon API Gateway manages and routes the incoming request from the UI.
An AWS Lambda function is invoked when new documents are added to the Amazon Simple Storage Service (Amazon S3) bucket.
Amazon Bedrock analyzes the documents stored in Amazon S3. The analysis results are returned to the S3 bucket through a Lambda function and stored there.
Amazon DynamoDB provides a fast, scalable way to store and retrieve metadata and analysis results to display to users.
Amazon Simple Notification Service (Amazon SNS) sends notifications about the status of document processing to the application user.
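
As an illustration of the upload-triggered step in this workflow, the following sketch shows how a Lambda function might read the bucket and key of each new document from an Amazon S3 event notification. The function name is hypothetical; the event fields follow the standard S3 notification structure, in which object keys are URL-encoded.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event: dict) -> list:
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record.get("s3", {})
        bucket = s3_info.get("bucket", {}).get("name")
        key = s3_info.get("object", {}).get("key")
        if bucket and key:
            # Keys arrive URL-encoded (for example, spaces become "+").
            objects.append((bucket, unquote_plus(key)))
    return objects
```

A handler can loop over the returned pairs and pass each document to the analysis step.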

In the following sections, we discuss the key considerations in each step to build and deploy a generative AI application.

Prepare the data

Gather the financial statements you want to analyze. These can be balance sheets, income statements, cash flow statements, and so on. Make sure the data is clean and in a consistent format. You might need to preprocess the data to remove noise and standardize the format. Preprocessing the data will transform the raw data into a state that can be efficiently used for model training. This is often necessary due to messiness and inconsistencies in real-world data. The outcome is to have consistent data for the model to ingest. The two most common types of data preprocessing are normalization and standardization.

Normalization rescales the numerical columns of a dataset to a common scale, typically the range 0–1. Putting features on the same scale keeps columns with large magnitudes from dominating columns with small ones. Because the minimum and maximum values define the range, outliers can compress the remaining values, so you might need to handle them before scaling. When dealing with a significant amount of data, normalizing the dataset can improve the performance of a machine learning model in environments where feature distribution is unclear.

Standardization rescales the values of a dataset so that they have a mean of 0 and a standard deviation of 1, matching the characteristics of a standard normal distribution. Standardization is beneficial when feature distribution is consistent and values on a scale aren’t constrained within a particular range.
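
The two techniques can be compared on a small example. The following sketch uses min-max scaling for normalization and z-scores for standardization; the function names and the sample revenue figures are illustrative only.

```python
from statistics import mean, pstdev


def min_max_normalize(values: list) -> list:
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]  # a constant column carries no scale information
    return [(value - lowest) / (highest - lowest) for value in values]


def standardize(values: list) -> list:
    """Rescale values to zero mean and unit (population) standard deviation."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return [0.0 for _ in values]
    return [(value - center) / spread for value in values]


quarterly_revenue = [100.0, 150.0, 200.0]  # made-up figures for illustration
print(min_max_normalize(quarterly_revenue))  # [0.0, 0.5, 1.0]
print(standardize(quarterly_revenue))
```

Normalized values always fall between 0 and 1, whereas standardized values are centered on 0 and can be negative or greater than 1.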

Choose your model

Amazon Bedrock gives you the power of choice by providing a flexible and scalable environment that allows you to access and use multiple FMs from leading AI model providers. This flexibility enables you to select the most appropriate models for your specific use cases, whether you’re working on tasks like NLP, text generation, image generation, or other AI-driven applications.

Deploy the model

If you don’t already have access to Amazon Bedrock FMs, you’ll need to request access through the Amazon Bedrock console. Then you can use the Amazon Bedrock console to deploy the chosen model. Configure the deployment settings according to your application’s requirements.

Develop the backend application

Create a backend service to interact with the deployed model. This service will handle requests from the frontend, send data to the model, and process the model’s responses. You can use Lambda, API Gateway, or other preferred REST API endpoints.

Use the Amazon Bedrock API to send financial statements to the model and receive the analysis results.

The following is an example of the backend code.
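
A minimal sketch of such a backend call, assuming the Amazon Bedrock Converse API through Boto3. The function names, prompt layout, default Region, and inference settings are assumptions made for illustration; the model ID is whichever model you have been granted access to.

```python
def build_messages(statement_text: str, question: str) -> list:
    """Wrap the question and the statement text in a single user message."""
    prompt = (
        f"{question}\n\n"
        f"<financial_statement>\n{statement_text}\n</financial_statement>"
    )
    return [{"role": "user", "content": [{"text": prompt}]}]


def extract_text(converse_response: dict) -> str:
    """Join the text blocks returned in a Converse API response."""
    blocks = converse_response["output"]["message"]["content"]
    return "".join(block.get("text", "") for block in blocks)


def analyze_statement(statement_text: str, question: str, model_id: str,
                      region: str = "us-east-1") -> str:
    """Send the statement to a model on Amazon Bedrock; needs boto3 and credentials."""
    import boto3  # imported lazily so the helpers above work without the AWS SDK

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.converse(
        modelId=model_id,
        messages=build_messages(statement_text, question),
        inferenceConfig={"maxTokens": 1024, "temperature": 0.0},
    )
    return extract_text(response)
```

A Lambda handler behind API Gateway could call analyze_statement with the uploaded document text and return the result to the frontend.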

Develop the frontend UI

Create a frontend interface for users to upload financial statements and view analysis results. This can be a web or mobile application. Make sure the frontend can send financial statement data to the backend service and display the analysis results.

Conclusion

In this post, we discussed the benefits of building a generative AI application powered by Amazon Bedrock to accelerate the analysis of financial documents. Stakeholders will be able to use AWS services to deploy and manage LLMs that help improve the efficiency of pulling insights from common documents like 10-Ks, balance sheets, and income statements.

For more information on working with generative AI on AWS, visit the AWS Skill Builder generative AI training modules.

For instructions on building frontend applications and full-stack applications powered by Amazon Bedrock, refer to Front-End Web & Mobile on AWS and Create a Fullstack, Sample Web App powered by Amazon Bedrock.

About the Author

Jason D’Alba is an AWS Solutions Architect leader focused on enterprise applications, helping customers architect highly available and scalable data and AI solutions.

Multilingual content processing using Amazon Bedrock and Amazon A2I

November 13, 2024

by Marin Mestrovic Amazon AWS

The market size for multilingual content extraction and the gathering of relevant insights from unstructured documents (such as images, forms, and receipts) for information processing is rapidly increasing. The global intelligent document processing (IDP) market size was valued at $1,285 million in 2022 and is projected to reach $7,874 million by 2028 (source).

Let’s consider that you’re a multinational company that receives invoices, contracts, or other documents from various regions worldwide, in languages such as Arabic, Chinese, Russian, or Hindi. These languages might not be supported out of the box by existing document extraction software.

Anthropic’s Claude models, deployed on Amazon Bedrock, can help overcome these language limitations. These large language models (LLMs) are trained on a vast amount of data from various domains and languages. They possess remarkable capabilities in understanding and generating human-like text in multiple languages. Handling complex and sensitive documents requires accuracy, consistency, and compliance, often necessitating human oversight. Amazon Augmented AI (Amazon A2I) simplifies the creation of workflows for human review, managing the heavy lifting associated with developing these systems or overseeing a large reviewer workforce. By combining Amazon A2I and Anthropic’s Claude on Amazon Bedrock, you can build a robust multilingual document processing pipeline with improved accuracy and quality of extracted information.

To demonstrate this multilingual and validated content extraction solution, we will use Amazon Bedrock generative AI, serverless orchestration managed by AWS Step Functions, and augmented human intelligence powered by Amazon A2I.

Solution overview

This post outlines a custom multilingual document extraction and content assessment framework using a combination of Anthropic’s Claude 3 on Amazon Bedrock and Amazon A2I to incorporate human-in-the-loop capabilities. The key steps of the framework are as follows:

Store documents of different languages
Invoke a processing flow that extracts data from the document according to a given schema
Pass extracted content to human reviewers to validate the information
Convert validated content into an Excel format and store in a storage layer for use

This framework can be further expanded by parsing the content to a knowledge base, indexing the information extracted from the documents, and creating a knowledge discovery tool (Q&A assistant) to allow users to query information and extract relevant insights.

Document processing stages

Our reference solution uses a highly resilient pipeline, as shown in the following diagram, to coordinate the various document processing stages.

The document processing stages are:

Acquisition – The first stage of the pipeline acquires input documents from Amazon Simple Storage Service (Amazon S3). In this stage, we store initial document information in an Amazon DynamoDB table after receiving an Amazon S3 event notification. We use this table to track the progression of this document across the entire pipeline.
Extraction – A document schema definition is used to formulate the prompt and documents are embedded into the prompt and sent to Amazon Bedrock for extraction. Results are stored as JSON in a folder in Amazon S3.
Custom business rules – Custom business rules are applied to the reshaped output containing information about tables in the document. Custom rules might include table format detection (such as detecting that a table contains invoice transactions) or column validation (such as verifying that a product code column only contains valid codes).
Reshaping – JSON extracted in the previous step is reshaped in the format supported by Amazon A2I and prepared for augmentation.
Augmentation – Human annotators use Amazon A2I to review the document and augment it with any information that was missed.
Cataloging – Documents that pass human review are cataloged into an Excel workbook so your business teams can consume them.
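
The two kinds of custom business rules mentioned above can be sketched as follows. The product code format, the column names, and the function names are hypothetical and chosen only to illustrate the idea; real rules depend on your documents.

```python
import re

# Hypothetical product code format: three uppercase letters, a hyphen, four digits.
PRODUCT_CODE_PATTERN = re.compile(r"[A-Z]{3}-\d{4}")

# Hypothetical set of columns that marks a table as invoice transactions.
INVOICE_COLUMNS = {"product_id", "quantity", "unit_price"}


def looks_like_invoice_table(column_names: list) -> bool:
    """Table format detection: true when all expected invoice columns are present."""
    return INVOICE_COLUMNS.issubset(column_names)


def invalid_product_code_rows(line_items: list) -> list:
    """Column validation: return the indexes of rows whose product code is malformed."""
    return [
        index
        for index, item in enumerate(line_items)
        if not PRODUCT_CODE_PATTERN.fullmatch(str(item.get("product_id", "")))
    ]
```

Rows flagged by a rule can be highlighted for the human reviewers in the augmentation stage.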

A custom UI built with ReactJS is provided to human reviewers to intuitively and efficiently review and correct issues in the documents.

Extraction with a multi-modal language model

The architecture uses a multi-modal LLM to perform extraction of data from various multilingual documents. We specifically used the Rhubarb Python framework to extract JSON schema-based data from the documents. Rhubarb is a lightweight Python framework built from the ground up to enable document understanding tasks using multi-modal LLMs. It uses Amazon Bedrock through the Boto3 API to use Anthropic’s Claude V3 multi-modal language models, but makes it straightforward to use file formats that are otherwise not supported by Anthropic’s Claude models. As of writing, Anthropic’s Claude V3 models can only support image formats (JPEG, PNG, and GIF). This means that when dealing with documents in PDF or TIF format, the document must be converted to a compatible image format. This process is taken care of by the Rhubarb framework internally, making our code simpler.

Additionally, Rhubarb comes with built-in system prompts that ground the model responses to be in a defined format using the JSON schema. A predefined JSON schema can be provided to the Rhubarb API, which makes sure the LLM generates data in that specific format. Internally, Rhubarb also does re-prompting and introspection to rephrase the user prompt in order to increase the chances of successful data extraction by the model. We used the following JSON schema for the purposes of extracting data from our documents:

{
    "type": "object",
    "properties": {
        "invoice_number": {
            "type": "string",
            "description": "The unique identifier for the invoice"
        },
        "issue_date": {
            "type": "string",
            "description": "The date the invoice was issued"
        },
        "due_date": {
            "type": "string",
            "description": "The date the payment for the invoice is due"
        },
        "issuer": {
            "type": "object",
            "properties": {
                "name": {
                    "type": "string",
                    "description": "The name of the company or entity issuing the invoice"
                },
                "address": {
                    "type": "string",
                    "description": "The address of the issuing company or entity"
                },
                "identifier": {
                    "type": "string",
                    "description": "The identifier of the issuing company or entity"
                }
            },
            "required": [
                "name",
                "address",
                "identifier"
            ]
        },
        "recipient": {
            "type": "object",
            "properties": {
                "name": {
                    "type": "string",
                    "description": "The name of the company or entity receiving the invoice"
                },
                "address": {
                    "type": "string",
                    "description": "The address of the receiving company or entity"
                },
                "identifier": {
                    "type": "string",
                    "description": "The identifier of the receiving company or entity"
                }
            },
            "required": [
                "name",
                "address",
                "identifier"
            ]
        },
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "product_id": {
                        "type": "string",
                        "description": "The identifier for the product or service"
                    },
                    "description": {
                        "type": "string",
                        "description": "A description of the product or service"
                    },
                    "quantity": {
                        "type": "number",
                        "description": "The quantity of the product or service"
                    },
                    "unit_price": {
                        "type": "number",
                        "description": "The price per unit of the product or service"
                    },
                    "discount": {
                        "type": "number",
                        "description": "The discount applied to the unit price"
                    },
                    "discounted_price": {
                        "type": "number",
                        "description": "The price per unit after discount"
                    },
                    "tax_rate": {
                        "type": "number",
                        "description": "The tax rate applied to the unit price"
                    },
                    "total_price": {
                        "type": "number",
                        "description": "The total price for the line item (quantity * unit_price)"
                    }
                },
                "required": [
                    "product_id",
                    "description",
                    "quantity",
                    "unit_price",
                    "discount",
                    "discounted_price",
                    "tax_rate",
                    "total_price"
                ]
            }
        },
        "totals": {
            "type": "object",
            "properties": {
                "subtotal": {
                    "type": "number",
                    "description": "The total of all line item prices before taxes and fees"
                },
                "discount": {
                    "type": "number",
                    "description": "The total discount applied"
                },
                "tax": {
                    "type": "number",
                    "description": "The amount of tax applied to the subtotal"
                },
                "total": {
                    "type": "number",
                    "description": "The total amount due for the invoice after taxes and fees"
                }
            },
            "required": [
                "subtotal",
                "discount",
                "tax",
                "total"
            ]
        }
    },
    "required": [
        "invoice_number",
        "issue_date",
        "due_date",
        "issuer",
        "recipient",
        "line_items",
        "totals"
    ]
}
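Because the schema marks most fields as required, it can be useful to check an extraction result against it before sending the document on for review. The following is a minimal sketch of such a check in plain Python. The helper name find_missing_fields and the toy schema and output are assumptions made for illustration and are not part of Rhubarb; a full JSON Schema validator would cover many more rules than this.

```python
# Minimal sketch: list required fields that are missing from an extracted
# document by following "required" and "properties" in a JSON-schema-style dict.
# The helper name `find_missing_fields` is hypothetical, not part of Rhubarb.

def find_missing_fields(schema, data, path=""):
    """Return dotted paths of required fields that are absent from `data`."""
    missing = []
    schema_type = schema.get("type")

    if schema_type == "object" and isinstance(data, dict):
        properties = schema.get("properties", {})
        for name in schema.get("required", []):
            field_path = f"{path}.{name}" if path else name
            if name not in data:
                missing.append(field_path)
            elif name in properties:
                # Recurse into nested objects and arrays
                missing.extend(
                    find_missing_fields(properties[name], data[name], field_path)
                )
    elif schema_type == "array" and isinstance(data, list):
        item_schema = schema.get("items", {})
        for index, item in enumerate(data):
            missing.extend(
                find_missing_fields(item_schema, item, f"{path}[{index}]")
            )

    return missing


# Toy schema and extraction output that mirror the shape of the invoice schema
toy_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "totals": {
            "type": "object",
            "properties": {
                "subtotal": {"type": "number"},
                "total": {"type": "number"},
            },
            "required": ["subtotal", "total"],
        },
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "product_id": {"type": "string"},
                    "quantity": {"type": "number"},
                },
                "required": ["product_id", "quantity"],
            },
        },
    },
    "required": ["invoice_number", "totals", "line_items"],
}

toy_output = {
    "invoice_number": "INV-001",
    "totals": {"subtotal": 100.0},
    "line_items": [{"product_id": "A1", "quantity": 2}, {"product_id": "B2"}],
}

print(find_missing_fields(toy_schema, toy_output))
```

An empty list means every required field is present. Any reported path, such as a missing total, could be used to flag the document for closer human review.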

Rhubarb supports a number of other features, including document classification, summarization, page-wise extraction, Q&A, streaming chat and summaries, named entity recognition, and more. Visit the Rhubarb documentation to learn more about using it for various document understanding tasks.

Prerequisites

This solution uses Amazon SageMaker labeling workforces to manage workers and distribute tasks. As a prerequisite, create a private workforce. For instructions, see Create an Amazon Cognito Workforce Using the Labeling Workforces Page. Create two worker teams, called primary and quality, and assign yourself to both teams.

After you add yourself to the teams and confirm your email, note the worker portal URL. To find the URL, open the AWS Management Console for SageMaker and choose Ground Truth and then Labeling workforces in the navigation pane. On the Private tab, you can find the URL for the labeling portal. This URL is also automatically emailed to the work team members as they are onboarded.

Next, install the AWS Cloud Development Kit (AWS CDK) toolkit with the following code:

npm install -g aws-cdk

Disclaimer: When installing global packages like the AWS CDK using npm, some systems, especially macOS and Linux, might require elevated permissions. If you encounter a permissions error when running npm install -g aws-cdk, you can adjust the global npm directory to avoid using sudo by following the instructions in this documentation.

Lastly, install Docker based on your operating system:

Mac – Install Docker Desktop on Mac
Windows – Install Docker Desktop on Windows

Deploy the application to the AWS Cloud

This reference solution is available on GitHub, and you can deploy it with the AWS CDK. For instructions on deploying the cloud application, see the README file in the GitHub repo.

Deploying this application to your AWS account will create various S3 buckets for document storage, AWS Lambda functions for integration with AWS machine learning (ML) services and business logic, AWS Identity and Access Management (IAM) policies, an Amazon Simple Queue Service (Amazon SQS) queue, a data processing pipeline using a Step Functions state machine, and an Amazon A2I based human review workflow.

Complete the following steps:

Clone the GitHub repo.

To clone the repository, you can use either the HTTPS or SSH method depending on your environment and authentication setup:

Using HTTPS:

git clone https://github.com/aws-samples/multilingual-content-processing-with-amazon-bedrock.git

This option is generally accessible for most users who have their Git configuration set up for HTTPS.

Using SSH:

git clone git@github.com:aws-samples/multilingual-content-processing-with-amazon-bedrock.git

Make sure you have your SSH keys properly configured and added to your GitHub account to use this method.

Navigate to the root directory of the repository.

cd multilingual-content-processing-with-amazon-bedrock

Create a virtual environment.

python3 -m venv .venv

Enter the virtual environment.

source .venv/bin/activate

Install dependencies in the virtual environment.

pip install -r requirements.txt

Bootstrap the AWS CDK (you only need to do this one time per account setup).

cdk bootstrap

Edit the cdk.json file to add the name of the work team you created earlier. Make sure the work team name matches a team in the same AWS Region and account.

edit cdk.json

Deploy the application.

cdk deploy --all

After you run cdk deploy --all, the AWS CloudFormation template provisions the necessary AWS resources.

Test the document processing pipeline

When the application is up and running, you’re ready to upload documents for processing and review. For this post, we use the following sample document for testing the pipeline. You can use the AWS Command Line Interface (AWS CLI) to upload the document, which will automatically invoke the pipeline.

Upload the document schema.

aws s3 cp ./data/invoice_schema.json s3://mcp-store-document-<ACCOUNT-ID>/schema/

Upload the documents.

aws s3 cp ./data/croatianinvoice.pdf s3://mcp-store-document-<ACCOUNT-ID>/acquire/

The status of the document processing is tracked in a DynamoDB table. You can check the status on the DynamoDB console or by using the following query.

aws dynamodb query \
    --table-name mcp-table-pipeline \
    --key-condition-expression "DocumentID = :documentID" \
    --expression-attribute-values '{":documentID":{"S":"croatianinvoice.pdf"}}' \
    --output text

When the document reaches the Augment#Running stage, the extraction and business rule applications are complete, indicating that the document is ready for human review.
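If you prefer to check the status from Python, the same query can be sketched with the DynamoDB client from Boto3. This is a minimal sketch: the helper name query_document_items is hypothetical, the table name and key name are copied from the CLI command, and the function returns the raw items because the name of the attribute that holds the stage is not shown in this post.

```python
# Minimal sketch: run the same DynamoDB query from Python.
# `query_document_items` is a hypothetical helper; the table and key names
# are taken from the CLI command in this post.

def query_document_items(dynamodb_client, document_id, table_name="mcp-table-pipeline"):
    """Return the raw DynamoDB items stored for one document ID."""
    response = dynamodb_client.query(
        TableName=table_name,
        KeyConditionExpression="DocumentID = :documentID",
        ExpressionAttributeValues={":documentID": {"S": document_id}},
    )
    return response.get("Items", [])


# Example usage (requires boto3, AWS credentials, and the deployed stack):
#   import boto3
#   items = query_document_items(boto3.client("dynamodb"), "croatianinvoice.pdf")
#   for item in items:
#       print(item)
```

Passing the client in as an argument keeps the helper easy to test with a stub and lets you reuse one client across repeated status checks.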

Navigate to the portal URL that you retrieved earlier and log in to view all tasks pending human review.
Choose Start working to examine the submitted document.

The interface will display the original document on the left and the extracted content on the right.

When you complete your review and annotations, choose Submit.

The results will be stored as an Excel file in the mcp-store-document-<ACCOUNT-ID> S3 bucket in the /catalog folder.

The /catalog folder in your S3 bucket might take a few minutes to be created after you submit the job. If you don’t see the folder immediately, wait a few minutes and refresh your S3 bucket. This delay is normal because the folder is generated when the job is complete and the results are saved.

By following these steps, you can efficiently process, review, and store documents using a fully automated AWS Cloud-based pipeline.

Clean up

To avoid ongoing charges, clean up the entire AWS CDK environment by using the cdk destroy command. Additionally, it’s recommended to manually inspect the Lambda functions, Amazon S3 resources, and Step Functions workflow to confirm that they are properly stopped and deleted. This step is essential to avoid incurring any additional costs associated with running the AWS CDK application.

Furthermore, delete the output data created in the S3 buckets while running the Step Functions workflow, and then delete the S3 buckets themselves. You must empty the S3 buckets before you can delete them.

Conclusion

In this post, we demonstrated an end-to-end approach for multilingual document ingestion and content extraction, using Amazon Bedrock and Amazon A2I to incorporate human-in-the-loop capabilities. This comprehensive solution enables organizations to efficiently process documents in multiple languages and extract relevant insights, while benefiting from the combined power of AWS AI/ML services and human validation.

Don’t let language barriers or validation challenges hold you back. Try this solution to unlock the full potential of your data, and reach out to your AWS contact if you need further assistance. We encourage you to experiment with editing the prompts and model versions to generate outputs that align more closely with your requirements.

For further information about Amazon Bedrock, check out the Amazon Bedrock workshop. To learn more about Step Functions, see Building machine learning workflows with Amazon SageMaker Processing jobs and AWS Step Functions.

About the Authors

Marin Mestrovic is a Partner Solutions Architect at Amazon Web Services, specializing in supporting partner solutions. In his role, he collaborates with leading Global System Integrators (GSIs) and independent software vendors (ISVs) to help design and build cost-efficient, scalable, industry-specific solutions. With his expertise in AWS capabilities, Marin empowers partners to develop innovative solutions that drive business growth for their clients.

Shikhar Kwatra is a Sr. Partner Solutions Architect at Amazon Web Services, working with leading Global System Integrators. He has earned the title of one of the Youngest Indian Master Inventors with over 500 patents in the AI/ML and IoT domains. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and supports the GSI partners in building strategic industry solutions on AWS.

Dilin Joy is a Senior Partner Solutions Architect at Amazon Web Services. In his role, he works with leading independent software vendors (ISVs) and Global System Integrators (GSIs) to provide architectural guidance and support in building strategic industry solutions on the AWS platform. His expertise and collaborative approach help these partners develop innovative cloud-based solutions that drive business success for their clients.

Anjan Biswas is a Senior AI Services Solutions Architect who focuses on computer vision, NLP, and generative AI. Anjan is part of the worldwide AI services specialist team and works with customers to help them understand and develop solutions to business problems with AWS AI Services and generative AI.

Build a reverse image search engine with Amazon Titan Multimodal Embeddings in Amazon Bedrock and AWS managed services

November 13, 2024

by Nathan Pogue Amazon AWS

In ecommerce, visual search technology revolutionizes how customers find products by enabling them to search for products using images instead of text. Shoppers often have a clear visual idea of what they want but struggle to describe it in words, leading to inefficient and broad text-based search results. For example, searching for a specific red leather handbag with a gold chain using text alone can be cumbersome and imprecise, often yielding results that don’t directly match the user’s intent. By using images, visual search can directly match physical attributes, providing better results quickly and enhancing the overall shopping experience.

A reverse image search engine enables users to upload an image to find related information instead of using text-based queries. It works by analyzing the visual content to find similar images in its database. Companies such as Amazon use this technology to allow users to use a photo or other image to search for similar products on their ecommerce websites. Other companies use it to identify objects, faces, and landmarks to discover the original source of an image. Beyond ecommerce, reverse image search engines are invaluable to law enforcement for identifying illegal items for sale and identifying suspects, to publishers for validating the authenticity of visual content, and to healthcare professionals for assisting in medical image analysis. They also help tackle challenges such as misinformation, copyright infringement, and counterfeit products.

In the context of generative AI, significant progress has been made in developing multimodal embedding models that can embed various data modalities—such as text, image, video, and audio data—into a shared vector space. By mapping image pixels to vector embeddings, these models can analyze and compare visual attributes such as color, shape, and size, enabling users to find similar images with specific attributes, leading to more precise and relevant search results.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. The Amazon Bedrock single API access, regardless of the models you choose, gives you the flexibility to use different FMs and upgrade to the latest model versions with minimal code changes.

Exclusive to Amazon Bedrock, the Amazon Titan family of models incorporates 25 years of experience innovating with AI and machine learning at Amazon. Amazon Titan FMs provide customers with a breadth of high-performing image, multimodal, and text model choices, through a fully managed API. With Amazon Titan Multimodal Embeddings, you can power more accurate and contextually relevant multimodal search, recommendation, and personalization experiences for users.

In this post, you will learn how to extract key objects from image queries using Amazon Rekognition and build a reverse image search engine using Amazon Titan Multimodal Embeddings from Amazon Bedrock in combination with Amazon OpenSearch Serverless.

Solution overview

The solution outlines how to build a reverse image search engine to retrieve similar images based on input image queries. This post shows how to use Amazon Titan Multimodal Embeddings to embed images, store these embeddings in an OpenSearch Serverless vector index, and use Amazon Rekognition to extract key objects from images for querying the index.

The following diagram illustrates the solution architecture:

The steps of the solution include:

Upload data to Amazon S3: Store the product images in Amazon Simple Storage Service (Amazon S3).
Generate embeddings: Use Amazon Titan Multimodal Embeddings to generate embeddings for the stored images.
Store embeddings: Ingest the generated embeddings into an OpenSearch Serverless vector index, which serves as the vector database for the solution.
Image analysis: Use Amazon Rekognition to analyze the product images and extract labels and bounding boxes for these images. These extracted objects will then be saved as separate images, which can be used for the query.
Convert search query to an embedding: Convert the user’s image search query into an embedding using Amazon Titan Multimodal Embeddings.
Run similarity search: Perform a similarity search on the vector database to find product images that closely match the search query embedding.
Display results: Display the top K similar results to the user.

Prerequisites

To implement the proposed solution, make sure that you have the following:

An AWS account and a working knowledge of FMs, Amazon Bedrock, Amazon SageMaker, Amazon OpenSearch Service, Amazon S3, and AWS Identity and Access Management (IAM).
The AWS Command Line Interface (AWS CLI) installed on your machine to upload the dataset to Amazon S3. Alternatively, you could directly upload the dataset to an S3 bucket by using the AWS Management Console.
Amazon Titan Multimodal Embeddings model access in Amazon Bedrock. Verify its status on the Model access page of the Amazon Bedrock console. If enabled, its status will display as Access granted. If the model isn’t enabled, you can gain model access by choosing Manage model access, selecting Amazon Titan Multimodal Embeddings G1, and choosing Request model access. The model will then be available for use.

An Amazon SageMaker Studio domain. If you haven’t set up a SageMaker Studio domain, see this Amazon SageMaker blog post for instructions on setting up SageMaker Studio for individual users.
An Amazon OpenSearch Serverless collection. You can create a vector search collection by following the steps in Create a collection with public network access and data access granted to the Amazon SageMaker Notebook execution role principal.
The GitHub repo cloned to the Amazon SageMaker Studio instance. To clone the repo onto your SageMaker Studio instance, choose the Git icon on the left sidebar and enter https://github.com/aws-samples/reverse-image-search-engine.git
After it has cloned, you can navigate to the reverse-image-search-engine.ipynb notebook file and run the cells. This post highlights the important code segments; however, the full code can be found in the notebook.
The necessary permissions attached to the Amazon SageMaker notebook execution role to grant read and write access to the Amazon OpenSearch Serverless collection. For more information on managing credentials securely, see the AWS Boto3 documentation. Make sure that full access is granted to the SageMaker execution role by applying the following IAM policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "aoss:*",
            "Resource": "*"
        }
    ]
}

Upload the dataset to Amazon S3

In this solution, we will use the Shoe Dataset from Kaggle.com, which contains a collection of approximately 1,800 shoe images. The dataset is primarily used for image classification use cases and contains images of shoes from six main categories—boots, sneakers, flip flops, loafers, sandals, and soccer shoes—with 249 JPEG images for each shoe type. For this tutorial, you will concentrate on the loafers folder found in the training category folder.

To upload the dataset

Download the dataset: Go to the Shoe Dataset page on Kaggle.com and download the dataset file (350.79MB) that contains the images.
Extract the specific folder: Extract the downloaded file and navigate to the loafers category within the training folder.
Create an Amazon S3 bucket: Sign in to the Amazon S3 console, choose Create bucket, and follow the prompts to create a new S3 bucket.
Upload images to the Amazon S3 bucket using the AWS CLI: Open your terminal or command prompt and run the following command to upload the images from the loafers folder to the S3 bucket:
aws s3 cp </path/to/local/folder> s3://<your-bucket-name>/ --recursive

Replace </path/to/local/folder> with the path to the loafers category folder from the training folder on your local machine. Replace <your-bucket-name> with the name of your S3 bucket. For example:
aws s3 cp /Users/username/Documents/training/loafers s3://footwear-dataset/ --recursive

Confirm the upload: Go back to the S3 console, open your bucket, and verify that the images have been successfully uploaded to the bucket.

Create image embeddings

Vector embeddings represent information—such as text or images—as a list of numbers, with each number capturing specific features. For example, in a sentence, some numbers might represent the presence of certain words or topics, while in an image or video, they might represent colors, shapes, or patterns. This numerical representation, or vector, is placed in a multidimensional space called the embedding space, where distances between vectors indicate similarities between the represented information. The closer vectors are to one another in this space, the more similar the information they represent is. The following figure is an example of an image and part of its associated vector.

To convert images to vectors, you can use Amazon Titan Multimodal Embeddings to generate image embeddings, which can be accessed through Amazon Bedrock. By default, the model generates vector embeddings with 1,024 dimensions; however, you can choose a smaller dimension size to optimize for speed and performance.
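To illustrate how a smaller dimension size might be requested, the following minimal sketch builds a request body that includes an embeddingConfig block. The helper name build_embedding_request is hypothetical, and the list of accepted sizes reflects the Amazon Bedrock documentation for Titan Multimodal Embeddings G1 as an assumption; confirm it against the current documentation before relying on it.

```python
import json

# Output sizes accepted by Titan Multimodal Embeddings G1 according to the
# Amazon Bedrock documentation; treat this list as an assumption and verify it.
SUPPORTED_LENGTHS = (256, 384, 1024)


def build_embedding_request(base64_image, output_length=1024):
    """Build the JSON request body for an image embedding of a given size."""
    if output_length not in SUPPORTED_LENGTHS:
        raise ValueError(f"output_length must be one of {SUPPORTED_LENGTHS}")
    return json.dumps({
        "inputImage": base64_image,
        "embeddingConfig": {"outputEmbeddingLength": output_length},
    })


# The placeholder string stands in for a real base64-encoded image
print(build_embedding_request("<BASE64_IMAGE>", output_length=384))
```

If you choose a smaller size, the dimension of the vector field in your index must match it, and every stored image and every query image must be embedded with the same setting.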

To create image embeddings:

The following code segment shows how to create a function that will be used to generate embeddings for the dataset of shoe images stored in the S3 bucket.

# Import required libraries
import base64
import io
import json

import boto3
import pandas as pd
from PIL import Image

# Constants, change to your S3 bucket name and selected AWS region
BUCKET_NAME = "<YOUR_AMAZON_S3_BUCKET_NAME>"
BEDROCK_MODEL_ID = "amazon.titan-embed-image-v1"
REGION = "<YOUR_SELECTED_AWS_REGION>"
# Define max width and height for resizing to accommodate Bedrock limits
MAX_WIDTH = 1024  
MAX_HEIGHT = 1024  

# Initialize AWS clients
s3 = boto3.client('s3')
bedrock_client = boto3.client(
    "bedrock-runtime", 
    REGION, 
    endpoint_url=f"https://bedrock-runtime.{REGION}.amazonaws.com"
)

# Function to resize image
def resize_image(image_data):
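    image = Image.open(io.BytesIO(image_data))

    # JPEG cannot store alpha or palette data, so convert to RGB if needed
    if image.mode != "RGB":
        image = image.convert("RGB")

    # Shrink image in place while maintaining aspect ratio
    image.thumbnail((MAX_WIDTH, MAX_HEIGHT))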

    # Save resized image to bytes buffer
    buffer = io.BytesIO()
    image.save(buffer, format="JPEG")
    buffer.seek(0)

    return buffer.read()

# Function to create embedding from input image
def create_image_embedding(image):
    image_input = {}

    if image is not None:
        image_input["inputImage"] = image
    else:
        raise ValueError("Image input is required")

    image_body = json.dumps(image_input)

    # Invoke Amazon Bedrock with encoded image body
    bedrock_response = bedrock_client.invoke_model(
        body=image_body,
        modelId=BEDROCK_MODEL_ID,
        accept="application/json",
        contentType="application/json"
    )

    # Retrieve body in JSON response
    final_response = json.loads(bedrock_response.get("body").read())

    embedding_error = final_response.get("message")

    if embedding_error is not None:
        print (f"Error creating embeddings: {embedding_error}")

    # Return embedding value
    return final_response.get("embedding")

Because you will be performing a search for similar images stored in the S3 bucket, you will also have to store the image file name as metadata for its embedding. Also, because the model expects a base64 encoded image as input, you will have to create an encoded version of the image for the embedding function. You can use the following code to fulfill both requirements.

# Retrieve images stored in S3 bucket 
response = s3.list_objects_v2(Bucket=BUCKET_NAME)
contents = response.get('Contents', [])

# Define arrays to hold embeddings and image file key names
image_embeddings = []
image_file_names = []

# Loop through S3 bucket to encode each image, generate its embedding, and append to array
for obj in contents:
    image_data = s3.get_object(Bucket=BUCKET_NAME, Key=obj['Key'])['Body'].read()

    # Resize the image to meet model requirements
    resized_image = resize_image(image_data)

    # Create base64 encoded image for Titan Multimodal Embeddings model input
    base64_encoded_image = base64.b64encode(resized_image).decode('utf-8')

    # Generate the embedding for the resized image
    image_embedding = create_image_embedding(image=base64_encoded_image)
    image_embeddings.append(image_embedding)
    image_file_names.append(obj["Key"])
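A single list_objects_v2 call returns at most 1,000 keys, and a bucket can also contain objects that are not images. For larger or mixed buckets, a paginator can be used instead, as in the following minimal sketch. The helper name iter_image_keys and the extension filter are assumptions for illustration and do not appear in the notebook.

```python
# File extensions treated as images; adjust to match your dataset
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def iter_image_keys(s3_client, bucket_name, prefix=""):
    """Yield every image key in the bucket, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        for obj in page.get("Contents", []):
            # Skip objects that do not look like images
            if obj["Key"].lower().endswith(IMAGE_EXTENSIONS):
                yield obj["Key"]


# Example usage with the client and constant defined earlier:
#   for key in iter_image_keys(s3, BUCKET_NAME):
#       image_data = s3.get_object(Bucket=BUCKET_NAME, Key=key)["Body"].read()
```

For the loafers folder used in this post, the single call is sufficient; the paginator only matters once the bucket grows past one page of results.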

After generating embeddings for each image stored in the S3 bucket, you can view the resulting list of embeddings by running the following code.

# Add and list embeddings with associated image file key to dataframe object
final_embeddings_dataset = pd.DataFrame({'image_key': image_file_names, 'image_embedding': image_embeddings})
final_embeddings_dataset.head()

image_key	image_embedding
image1.jpeg	[0.00961759, 0.0016261627, -0.0024508594, -0.0…
image10.jpeg	[0.008917685, -0.0013863152, -0.014576114, 0.0…
image100.jpeg	[0.006402869, 0.012893448, -0.0053941975, -0.0…
image101.jpg	[0.06542923, 0.021960363, -0.030726435, -0.000…
image102.jpeg	[0.0134112835, -0.010299515, -0.0044046864, -0…

Upload embeddings to Amazon OpenSearch Serverless

Now that you have created embeddings for your images, you need to store these vectors so they can be searched and retrieved efficiently. To do so, you can use a vector database.

A vector database is a type of database designed to store and retrieve vector embeddings. Each data point in the database is associated with a vector that encapsulates its attributes or features. This makes it particularly useful for tasks such as similarity search, where the goal is to find objects that are the most similar to a given query object. To search against the database, you can use a vector search, which is performed using the k-nearest neighbors (k-NN) algorithm. When you perform a search, the algorithm computes a similarity score between the query vector and the vectors of stored objects using methods such as cosine similarity or Euclidean distance. This enables the database to retrieve the closest objects that are most similar to the query object in terms of their features or attributes. Vector databases often use specialized vector search engines, such as nmslib or faiss, which are optimized for efficient storage, retrieval, and similarity calculation of vectors.
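To make the idea concrete, the following minimal sketch runs an exact k-NN search over a few toy vectors in plain Python. The file names and three-dimensional vectors are made up for illustration. Engines such as nmslib and faiss typically use approximate indexes to avoid comparing the query against every stored vector, so this shows the concept rather than how a vector database is implemented.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two vectors; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, vectors, k):
    """Exact k-NN: rank every stored vector by Euclidean distance to the query."""
    ranked = sorted(
        vectors.items(),
        key=lambda item: euclidean_distance(query, item[1]),
    )
    return [name for name, _ in ranked[:k]]


# Toy embeddings; real Titan embeddings have up to 1,024 dimensions
toy_vectors = {
    "loafer_a.jpg": [0.9, 0.1, 0.0],
    "loafer_b.jpg": [0.8, 0.2, 0.1],
    "boot_a.jpg": [0.1, 0.9, 0.3],
}
query_vector = [0.9, 0.1, 0.05]

print(nearest_neighbors(query_vector, toy_vectors, k=2))
```

The two loafer vectors lie much closer to the query than the boot vector, so they are returned first.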

In this post, you will use OpenSearch Serverless as the vector database for the image embeddings. OpenSearch Serverless is a serverless option for OpenSearch Service, a powerful storage option built for distributed search and analytics use cases. With Amazon OpenSearch Serverless, you don’t need to provision, configure, and tune the instance clusters that store and index your data.

To upload embeddings:

If you have set up your Amazon OpenSearch Serverless collection, the next step is to create a vector index. In the Amazon OpenSearch Service console, choose Serverless Collections, then select your collection.
Choose Create vector index.

Next, create a vector field by entering a name, selecting an engine, and adding the dimensions and search configuration.
1. Vector field name: Enter a name, such as vector.
2. Engine: Select nmslib.
3. Dimensions: Enter 1024.
4. Distance metric: Select Euclidean.
5. Choose Confirm.

To tag each embedding with the image file name, you must also add a mapping field under Metadata management.
1. Mapping field: Enter image_file.
2. Data type: Select String.
3. Filterable: Select True.
4. Choose Create to create the index.

Now that the vector index has been created, you can ingest the embeddings. To do so, run the following code segment to connect to your Amazon OpenSearch Serverless collection.

# Import required libraries to connect to the Amazon OpenSearch Serverless collection
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

# Initialize endpoint name constant
HOST = "<YOUR_HOST_ENDPOINT_NAME>" # For example, abcdefghi.us-east-1.aoss.amazonaws.com (without https://)

# Initialize and authenticate with the OpenSearch client
credentials = boto3.Session().get_credentials()
auth = AWS4Auth(credentials.access_key, credentials.secret_key, REGION, 'aoss', session_token=credentials.token)
client = OpenSearch(
    hosts=[{'host': HOST, 'port': 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    pool_maxsize=300
)

After connecting, you can ingest your embeddings and the associated image key for each vector as shown in the following code.

# Import required library to iterate through dataset
import tqdm.notebook as tq

INDEX_NAME = "<YOUR_VECTOR_INDEX_NAME>"
VECTOR_NAME = "<YOUR_VECTOR_FIELD_NAME>"
VECTOR_MAPPING = "<YOUR_MAPPING_FIELD_NAME>"

# Ingest embeddings into the vector index with the associated vector and text mapping fields
for idx, record in tq.tqdm(final_embeddings_dataset.iterrows(), total=len(final_embeddings_dataset)):
    body = {
        VECTOR_NAME: record['image_embedding'],
        VECTOR_MAPPING: record['image_key']
    }
    response = client.index(index=INDEX_NAME, body=body)

Use Amazon Rekognition to extract key objects

Now that the embeddings have been created, use Amazon Rekognition to extract objects of interest from your search query. Amazon Rekognition analyzes images to identify objects, people, text, and scenes by detecting labels and generating bounding boxes. In this use case, Amazon Rekognition will be used to detect shoe labels in query images.

To view the bounding boxes around your respective images, run the following code. If you want to apply this to your own sample images, make sure to specify the labels you want to identify. Upon completion of the bounding box and label generation, the extracted objects will be saved in your local directory in the SageMaker Notebook environment.

# Import required libraries to draw bounding box on image
from PIL import Image, ImageDraw, ImageFont

# Function to draw bounding boxes and extract labeled objects
def process_image(image_path, boxes, labels):
    # Load the image
    image = Image.open(image_path)
    
    # Convert RGBA to RGB if necessary
    if image.mode == 'RGBA':
        image = image.convert('RGB')
    
    draw = ImageDraw.Draw(image)
    
    # Font for the label
    try:
        font = ImageFont.truetype("arial.ttf", 15)
    except IOError:
        font = ImageFont.load_default()

    # Counter for unique filenames
    crop_count = 1 
    
    # Draw bounding boxes around specific label of interest (ex. shoe) and extract labeled objects
    for box, label in zip(boxes, labels):
    
        # Change to specific label you are looking to extract
        if label != "Shoe":
            continue
        
        # Box coordinates
        left = int(image.width * box['Left'])
        top = int(image.height * box['Top'])
        right = left + int(image.width * box['Width'])
        bottom = top + int(image.height * box['Height'])
            
        # Crop the image to the bounding box
        cropped_image = image.crop((left, top, right, bottom))
    
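        # Draw the bounding box and label on the full image. This happens after
        # cropping, so the outline does not appear in this crop.
        draw.rectangle([left, top, right, bottom], outline="red", width=3)
        draw.text((left, top), label, fill="red", font=font)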
    
        # File name for the output
        file_name = f"extract_{crop_count}.jpg"
        # Save extracted object image locally
        cropped_image.save(file_name)
        print(f"Saved extracted object image: {file_name}")
        crop_count += 1
    
    # Save or display the image with bounding boxes
    image.show()
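The process_image function expects a list of bounding boxes and a matching list of label names. One way to produce them is to flatten the response of the Amazon Rekognition DetectLabels API, as in the following minimal sketch. The helper name extract_boxes_and_labels, the confidence threshold, and the sample response are assumptions for illustration; the notebook in the repository may structure this step differently.

```python
def extract_boxes_and_labels(rekognition_response, min_confidence=80.0):
    """Flatten a DetectLabels response into parallel lists of boxes and label names."""
    boxes, labels = [], []
    for label in rekognition_response.get("Labels", []):
        # Only labels with detected instances carry bounding boxes
        for instance in label.get("Instances", []):
            if instance.get("Confidence", 0.0) >= min_confidence:
                boxes.append(instance["BoundingBox"])
                labels.append(label["Name"])
    return boxes, labels


# Hand-written sample in the shape of a DetectLabels response (not real output)
sample_response = {
    "Labels": [
        {
            "Name": "Shoe",
            "Confidence": 99.1,
            "Instances": [
                {"BoundingBox": {"Width": 0.4, "Height": 0.3, "Left": 0.1, "Top": 0.5},
                 "Confidence": 98.7},
                {"BoundingBox": {"Width": 0.2, "Height": 0.2, "Left": 0.6, "Top": 0.6},
                 "Confidence": 55.0},
            ],
        },
        {"Name": "Footwear", "Confidence": 99.1, "Instances": []},
    ]
}

boxes, labels = extract_boxes_and_labels(sample_response)
print(labels, boxes)

# Example usage against the real service (requires boto3 and AWS credentials):
#   rekognition = boto3.client("rekognition")
#   with open("query.jpg", "rb") as f:
#       response = rekognition.detect_labels(Image={"Bytes": f.read()})
#   boxes, labels = extract_boxes_and_labels(response)
#   process_image("query.jpg", boxes, labels)
```

The bounding box values are ratios of the image width and height, which is why process_image multiplies them by the image dimensions before cropping.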

The following image shows the outputted image with the respective labels within the bounding boxes:

Embed object image

Now that the object of interest within the image has been extracted, you need to generate an embedding for it so that it can be searched against the stored vectors in the Amazon OpenSearch Serverless index. To do so, find the best extracted image in the local directory created when the images were downloaded. Ensure the image is unobstructed, high-quality, and effectively encapsulates the features that you’re searching for. After you have identified the best image, paste its file name as shown in the following code.

# Open the extracted object image file in binary mode
# Paste your extracted image from the local download directory in the notebook below
with open("<YOUR_LOCAL_EXTRACTED_IMAGE (ex. extract_1.jpg)>", "rb") as image_file:
    base64_encoded_image = base64.b64encode(image_file.read()).decode('utf-8')

# Embed the extracted object image
object_embedding = create_image_embedding(image=base64_encoded_image)

# Print the first few numbers of the embedding followed by ...
print(f"Image embedding: {object_embedding[:5]} ...")

Perform a reverse image search

With the embedding of the extracted object, you can now search the Amazon OpenSearch Serverless vector index for the closest matching images using the k-NN algorithm. When you created the vector index earlier, you configured it to measure similarity with the Euclidean distance metric and the nmslib engine. With this configuration, you can define the number of results to retrieve from the index and invoke the OpenSearch client with a search request as shown in the following code.

# Define number of images to search and retrieve
K_SEARCHES = 3

# Define search configuration body for K-NN 
body = {
        "size": K_SEARCHES,
        "_source": {
            "exclude": [VECTOR_NAME],
        },
        "query": {
            "knn": {
                "vectors": {
                    "vector": object_embedding,
                    "k": K_SEARCHES,
                }
            }
        },
        "fields": [VECTOR_MAPPING],
    }

# Invoke OpenSearch to search through index with K-NN configurations
knn_response = client.search(index=INDEX_NAME, body=body)
result = []
scores_tracked = set()  # Set to keep track of already retrieved images and their scores

# Loop through response to print the closest matching results
for hit in knn_response["hits"]["hits"]:
    id_ = hit["_id"]
    score = hit["_score"]
    item_id_ = hit["_source"][VECTOR_MAPPING]

    # Check if score has already been tracked, if not, add it to final result
    if score not in scores_tracked:
        final_item = [item_id_, score]
        result.append(final_item)
        scores_tracked.add(score)  # Log score as tracked already

# Print Top K closest matches
print(f"Top {K_SEARCHES} closest embeddings and associated scores: {result}")
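
The loop above skips any hit whose score was already seen, which would also drop two different images that happen to receive the same score. The following is a minimal sketch of an alternative that deduplicates on the stored item identifier instead. The names `dedupe_hits` and `item_field` are hypothetical, and the sample hits are hand-written dictionaries shaped like the search response, not real search output.

```python
def dedupe_hits(hits, item_field):
    """Return [item_id, score] pairs, keeping the first hit for each item_id."""
    seen = set()
    result = []
    for hit in hits:
        item_id = hit["_source"][item_field]
        if item_id in seen:
            continue  # Same image already returned; skip the duplicate
        seen.add(item_id)
        result.append([item_id, hit["_score"]])
    return result


# Hand-written response fragment for illustration only
sample_hits = [
    {"_id": "1", "_score": 0.64, "_source": {"image_name": "a.jpeg"}},
    {"_id": "2", "_score": 0.64, "_source": {"image_name": "b.jpeg"}},
    {"_id": "3", "_score": 0.61, "_source": {"image_name": "a.jpeg"}},
]
unique_hits = dedupe_hits(sample_hits, "image_name")
print(unique_hits)
```

With this variant, two distinct images with identical scores are both kept, and a repeated image is reported only once.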

Because the preceding search retrieves the file names that are associated with the closest matching vectors, the next step is to fetch each specific image to display the results. This can be accomplished by downloading the specific image from the S3 bucket to a local directory in the notebook, then displaying each one sequentially. Note that if your images are stored within a subdirectory in the bucket, you might need to add the appropriate prefix to the bucket path as shown in the following code.

import os

# Function to display image
def display_image(image_path):
    image = Image.open(image_path)
    image.show()
    
# List of image file names from the K-NN search
image_files = result

# Create a local directory to store downloaded images
download_dir = 'RESULTS'

# Create directory if not exists
os.makedirs(download_dir, exist_ok=True)

# Download and display each image that matches image query
for file_name in image_files:
    print("File Name: " + file_name[0])
    print("Score: " + str(file_name[1]))
    local_path = os.path.join(download_dir, file_name[0])
    # Ensure to add in the necessary prefix before the file name if files are in subdirectories in the bucket
    # ex. s3.download_file(BUCKET_NAME, "training/loafers/"+file_name[0], local_path)
    s3.download_file(BUCKET_NAME, file_name[0], local_path)
    # Open downloaded image and display it
    display_image(local_path)
    print()

The following images show the results for the closest matching products in the S3 bucket related to the extracted object image query:

First match:
File Name: image17.jpeg
Score: 0.64478767

Second match:
File Name: image209.jpeg
Score: 0.64304984

Third match:
File Name: image175.jpeg
Score: 0.63810235
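
To help interpret these scores, the following sketch assumes the index uses the `l2` space type, for which the OpenSearch k-NN documentation defines the score as 1 / (1 + d), where d is the squared Euclidean distance. The functions `l2_score`, `score_to_distance`, and `top_k` are hypothetical helpers, the toy vectors are made up, and the search is exact (brute force), whereas the index itself performs approximate search.

```python
import math


def l2_score(query, candidate):
    """Score for the l2 space: 1 / (1 + squared Euclidean distance)."""
    squared = sum((q - c) ** 2 for q, c in zip(query, candidate))
    return 1.0 / (1.0 + squared)


def score_to_distance(score):
    """Invert an l2-space score back to a Euclidean distance."""
    return math.sqrt(1.0 / score - 1.0)


def top_k(query, vectors, k):
    """Exact k nearest neighbors, highest score first."""
    scored = [(name, l2_score(query, vec)) for name, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up two-dimensional vectors for illustration only
toy_vectors = {
    "near.jpeg": [1.0, 0.0],
    "mid.jpeg": [2.0, 0.0],
    "far.jpeg": [5.0, 5.0],
}
matches = top_k([0.0, 0.0], toy_vectors, k=2)
print(matches)
```

Under that assumption, a score of 0.64478767 would correspond to a Euclidean distance of roughly 0.74 between the query embedding and the first match, and a score of 1.0 would indicate identical vectors.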

Clean up

To avoid incurring future charges, delete the resources used in this solution.

Delete the Amazon OpenSearch Collection vector index.
Delete the Amazon OpenSearch Serverless collection.
Delete the Amazon SageMaker resources.
Empty and delete the Amazon S3 bucket.

Conclusion

By combining the power of Amazon Rekognition for object detection and extraction, Amazon Titan Multimodal Embeddings for generating vector representations, and Amazon OpenSearch Serverless for efficient vector indexing and search capabilities, you successfully created a robust reverse image search engine. This solution enhances product recommendations by providing precise and relevant results based on visual queries, thereby significantly improving the user experience for ecommerce solutions.

About the Authors

Nathan Pogue is a Solutions Architect on the Canadian Public Sector Healthcare and Life Sciences team at AWS. Based in Toronto, he focuses on empowering his customers to expand their understanding of AWS and utilize the cloud for innovative use cases. He is particularly passionate about AI/ML and enjoys building proof-of-concept solutions for his customers.

Waleed Malik is a Solutions Architect with the Canadian Public Sector EdTech team at AWS. He holds six AWS certifications, including the Machine Learning Specialty Certification. Waleed is passionate about helping customers deepen their knowledge of AWS by translating their business challenges into technical solutions.

Generative AI for agriculture: How Agmatix is improving agriculture with Amazon Bedrock

November 12, 2024

by Etzik Bega Amazon AWS

This post is co-written with Etzik Bega from Agmatix. Agmatix is an Agtech company pioneering data-driven solutions for the agriculture industry that harnesses advanced AI technologies, including generative AI, to expedite R&D processes, enhance crop yields, and advance sustainable agriculture. Focused on addressing the challenge of agricultural data standardization, Agmatix has developed proprietary patented technology to harmonize and standardize data, facilitating informed decision-making in agriculture. Its suite of data-driven tools enables the management of agronomic field trials, the creation of digital crop nutrient prescriptions, and the promotion of sustainable agricultural practices. Widely embraced by agronomists, scientists, and R&D teams in crop input manufacturing and contract-based research organizations, Agmatix’s field trial and analysis solutions are at the forefront of agricultural innovation.

This post describes how Agmatix uses Amazon Bedrock and other AWS services to enhance the research process and the development of higher-yielding seeds and sustainable molecules for global agriculture.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. With Amazon Bedrock, you can experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that run tasks using your enterprise systems and data sources.

Through this innovative approach, Agmatix streamlines operations, accelerates the introduction of higher-yielding seeds, and fosters the development of new and sustainable molecules used in crop protection, including pesticides, herbicides, fungicides, and biologicals.

Innovation in field trial R&D is complex

Innovation continues to be a major driver for increasing yields and the security of our global food supply. Discoveries and improvements across seed genetics, site-specific fertilizers, and molecule development for crop protection products have coincided with innovations in generative AI, Internet of Things (IoT) and integrated research and development trial data, and high-performance computing analytical services.

Holistically, these systems have enabled dramatic reductions in time to market for new genetics and molecules, enabling growers with new and more effective products. Historical and current R&D on crop varieties and agricultural chemicals is essential to improving agricultural yields, but the process of bringing a new crop input to farms is expensive and complex. A key stage in this process is field trials. After new inputs are developed in labs, field trials are conducted to test the effectiveness of new crop varieties and agricultural chemicals in real-world conditions.

There are various technologies that help operationalize and optimize the process of field trials, including data management and analytics, IoT, remote sensing, robotics, machine learning (ML), and now generative AI.

Led by agricultural technology innovators, generative AI is the latest AI technology that helps agronomists and researchers have open-ended human-like interactions with computing applications to assist with a variety of tasks and automate historically manual processes. Applications of generative AI in agriculture include yield prediction, improving precision agriculture recommendations, educating and training agronomy staff, and enabling users to query vast datasets using natural language.

Current challenges in analyzing field trial data

Agronomic field trials are complex and create vast amounts of data. Most companies are unable to make use of their field trial data because they rely on manual processes and disparate systems. Agmatix’s trial management and agronomic data analysis infrastructure can collect, manage, and analyze agricultural field trial data. Agronomists use this service to accelerate innovation and turn research and experimentation data into meaningful, actionable intelligence.

Agronomists upload or enter field trial data, create and manage tasks for monitoring field trials, and analyze and visualize trial data to generate insights. The time-consuming, undifferentiated task of cleaning, standardizing, harmonizing, and processing the data is automated and handled by Agmatix’s intelligent service.

Without the use of generative AI, the ability to build an analytical dashboard to analyze trial data and gain meaningful insights from field trials is complex and time-consuming. The following are two common challenges:

Each trial may contain hundreds of different parameters, and it’s challenging for an agronomist to understand which parameters and data points are meaningful to the specific problems they want to investigate.
There is a wide range of analytical visualization tools and charts (such as ANOVA One-Way, Regression, Boxplots, and Maps) available to choose from. However, selecting the most appropriate visualization technique that facilitates understanding of patterns and identification of anomalies within the data can be a challenging task.

Moreover, after the analytical dashboard is created, it can be complex to draw conclusions and establish connections between the different data points. For example, do the results of the trial support the hypothesis of the trial? Is there a connection between the fertilizer applied and the weight of the grain produced? Which external factors have the biggest impact on the efficacy of the product trial?

AWS generative AI services provide a solution

In addition to other AWS services, Agmatix uses Amazon Bedrock to solve these challenges. Amazon Bedrock is a fully managed, serverless generative AI offering from AWS that provides a range of high-performance FMs to support generative AI use cases.

Through the integration of Agmatix’s landscape with Amazon Bedrock, Agmatix has developed a specialized generative AI assistant called Leafy, which gives agronomists and R&D staff a significantly improved user experience.

Instead of spending hours evaluating data points for investigation, selecting the right visualization tools, and creating multiple dashboards for analyzing R&D and trial information, agronomists can write their questions in natural language and get Leafy to provide the relevant dashboards and insights immediately (see the following screenshot for an example of Leafy in action). This helps improve productivity and user experience.

The first step in developing and deploying generative AI use cases is having a well-defined data strategy. Agmatix’s technology architecture is built on AWS. Their data pipeline (as shown in the following architecture diagram) consists of ingestion, storage, ETL (extract, transform, and load), and a data governance layer. Multi-source data is initially received and stored in an Amazon Simple Storage Service (Amazon S3) data lake. AWS Glue accesses data from Amazon S3 to perform data quality checks and important transformations. AWS Lambda is then used to further enrich the data. The transformed data acts as the input to AI/ML services. The generated insights are accessed by users through Agmatix’s interface.

Focusing on generative AI, let’s first understand the fundamentals of the generative AI chatbot application:

Prompt – The input question or task including contextual information provided by the user
Data – The data required to answer the question in the prompt
Agent – The agent that performs the orchestration of tasks

In the case of Agmatix, when the agronomist asks Leafy a question, Agmatix’s Insights solution sends a request to Anthropic Claude on Amazon Bedrock through an API:

Prompt – The prompt sent to Anthropic Claude consists of tasks and data. The task is the question submitted by the user.
Data – The data in the prompt includes two types of data:
- Context data instructions to the model; for example, a list of the types of widgets available for visualization.
- The data from the specific field trial.

The following diagram illustrates the generative AI workflow.

The workflow consists of the following steps:

The user submits the question to Agmatix’s AI assistant, Leafy.
The application reads the field trial data, business rules, and other required data from the data lake.
The agent inside the Insights application collects questions and tasks and the relevant data, and sends it as a prompt to the FM through Amazon Bedrock.
The generative AI model’s response is sent back to the Insights application.
The response is displayed to the user through the widgets visualizing the trial data and the answer to the user’s specific question, as shown in the following screenshot.

The data used in the prompt engineering (trial result and rules) is stored in plain text and sent to the model as is. Prompt engineering plays a central part in this generative AI solution. For more information, refer to the Anthropic Claude prompt engineering guide.
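
As an illustration of how a plain-text prompt of this kind might be assembled, the following sketch combines the task, the context instructions (available widget types), and the trial data into one string. This is not Agmatix’s implementation: the function name `build_assistant_prompt`, the section wording, the column names, and the trial values are all hypothetical.

```python
def build_assistant_prompt(question, widget_types, trial_rows):
    """Combine context instructions, trial data, and the task into one text prompt."""
    if not trial_rows:
        raise ValueError("trial_rows must contain at least one row")
    columns = list(trial_rows[0].keys())
    widget_lines = [f"- {name}" for name in widget_types]
    # Trial data is passed as plain comma-separated text, one row per line
    data_lines = [",".join(str(row[col]) for col in columns) for row in trial_rows]
    sections = [
        "Context: the following widget types are available for visualization:",
        *widget_lines,
        "",
        "Field trial data (plain text):",
        ",".join(columns),
        *data_lines,
        "",
        f"Task: {question}",
    ]
    return "\n".join(sections)


# Made-up trial rows for illustration only
example_prompt = build_assistant_prompt(
    question="Is there a connection between the fertilizer applied and the weight of the grain produced?",
    widget_types=["ANOVA One-Way", "Regression", "Boxplot", "Map"],
    trial_rows=[
        {"plot": "P1", "fertilizer_kg_ha": 80, "grain_weight_kg": 410},
        {"plot": "P2", "fertilizer_kg_ha": 120, "grain_weight_kg": 455},
    ],
)
print(example_prompt)
```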

Overall, by using Amazon Bedrock on AWS, Agmatix’s data-driven field trials service observed over 20% improved efficiency, more than 25% improvement in data integrity, and a three-fold increase in potential analysis throughput.

This is how generative AI technology is helping improve the overall experience and productivity of agronomists so they can focus on solving complex challenges and tasks that require human knowledge and intervention.

A real-life example of this solution can be seen within the largest open nutrient database for crop nutrition, powered by the Agmatix infrastructure, where researchers can tap into insights gleaned from thousands of field trials. In this practical scenario, users benefit from guided question prompts and responses facilitated by generative AI. This advanced data processing enhances users’ grasp of evolving trends in crop nutrient uptake and removal, simplifying the creation of decision support systems.

Conclusion

Seed, chemical, and fertilizer manufacturers need innovative, smart agricultural solutions to advance the next generation of genetics and molecules. Ron Baruchi, President and CEO of Agmatix, highlights the beneficial synergy between humans and technology:

“AI complements, rather than replaces, human expertise. By integrating Amazon Bedrock’s generative AI into our infrastructure, we provide our customers with self-service analytical tools that simplify complex and time-consuming tasks.”

This integration equips agronomists and researchers with advanced AI capabilities for data processing and analysis, enabling them to concentrate on strategic decision-making and creative problem-solving.

Field trial management has long needed a fresh dose of technology infusion. With Agmatix’s AI-enabled agriculture service, powered by AWS, input manufacturers can reduce the time and cost associated with field trials, while improving the overall productivity and experience of agronomists and growers. By delivering growers the most successful seeds, crop protection products, and fertilizers, their farming operations can thrive. This approach not only maximizes the efficiency of these essential crop inputs but also minimizes natural resource usage, resulting in a more sustainable and healthier planet for all.

Resources

Check out the following resources to learn more about AWS and Amazon Bedrock:

Visit the AWS Generative AI community to find deep-dive technical content and discover how our builder communities are using Amazon Bedrock in their solutions
Learn more about generative AI on AWS
Learn more about Amazon Bedrock
Learn more about how customers are achieving success with Amazon Bedrock
Learn more about AWS Solutions for Agriculture

About the Authors

Etzik Bega is the Chief Architect of Agmatix, where he has revolutionized the company’s data lake architecture using cutting-edge GenAI technology. With over 25 years of experience in cybersecurity, system architecture, and communications, Etzik has recently focused on helping organizations move to the public cloud securely and efficiently.

Menachem Melamed is a Senior Solutions Architect at AWS, specializing in Big Data analytics and AI. With a deep background in software development and cloud architecture, he empowers organizations to build innovative solutions using modern cloud technologies.

Prerana Sharma is Manager of Solutions Architects at AWS, specializing in Manufacturing. With a wide experience of working in the Digital Farming space, Prerana helps customers solve business problems by experimenting and innovating with emerging technologies on AWS.

Generate financial industry-specific insights using generative AI and in-context fine-tuning

November 12, 2024

by Randy DeFauw Amazon AWS

In this blog post, we demonstrate prompt engineering techniques to generate accurate and relevant analysis of tabular data using industry-specific language. This is done by providing large language models (LLMs) with in-context sample data, including features and labels, in the prompt. The results are similar to those from fine-tuning LLMs, without the complexity of fine-tuning. We used a method called Generative Tabular Learning (GTL), based on the whitepaper From Supervised to Generative: A Novel Paradigm for Tabular Deep Learning with Large Language Models, and demonstrate the advantages of GTL using fully managed JupyterLab notebooks in Amazon SageMaker to interact with Meta Llama models hosted in Amazon SageMaker or Amazon Bedrock. You can check out additional reference notebooks on aws-samples for how to use Meta’s Llama models hosted on Amazon Bedrock.

Prerequisites

The following section describes the prerequisites needed for this demonstration. You can implement these steps either from the AWS Management Console or using the latest version of the AWS Command Line Interface (AWS CLI).

Access to LLMs such as Meta’s Llama models hosted on Amazon SageMaker or Amazon Bedrock
An Amazon SageMaker domain configured with JupyterLab notebooks and the necessary Python libraries and packages to interact with the LLMs
Sample tabular datasets from the financial industry formatted as structured data (we are using exchange-traded funds data from Kaggle), available for querying using a SQL engine like Amazon Athena
Knowledge of generative AI prompt engineering techniques to provide LLMs with relevant context and sample data
Ability to evaluate and compare LLM-generated outputs for accuracy and relevance to the analysis task
Understanding of financial industry data and knowledge of staging and querying this data in a structured tabular format consumable by LLMs
Knowledge of the industry domain that the data belongs to in order to determine appropriate features and labels for sample data prompts

Financial industry data

In the financial industry, data can take the form of tables in PDF files or structured data in a database. The following is an example of a financial information dataset for exchange-traded funds (ETFs) from Kaggle in a structured tabular format that we used to test our solution.

A user can ask a business- or industry-related question for ETFs.

NOTE: Since we used an SQL query engine to query the dataset for this demonstration, the prompts and generated outputs mention SQL below.

# Business question 
question = "Please provide a list of about 100 ETFs or ETNs names with exposure to US markets" 

# Generate a prompt to get the LLM to provide an SQL query 
SQL_SYS_PROMPT = PromptTemplate.from_template(tmp_sql_sys_prompt).format(
     question=question, 
     table_schema=table_schema_etf 
) 

results = get_llm_sql_analysis( 
     question=question, 
     sql_sys_prompt=SQL_SYS_PROMPT, 
     qna_sys_prompt=QNA_SYS_PROMPT 
)

After the data is retrieved from the dataset, it’s sent to the LLM hosted in Amazon Bedrock (refer to the list of supported models in Amazon Bedrock) for analysis, and the LLM generates a natural language response to the user’s question or query.

The question in the preceding example doesn’t require complex analysis of the data returned from the ETF dataset. The LLM’s response, based on its analysis of the data, uses satisfactory industry- and business-relevant language:

LLM SQL Analysis: 
After analyzing the provided SQL query results, I can conclude that the list of ETFs/ETNs does not primarily focus on US markets. Instead, it appears to be a comprehensive list of bond ETFs/ETNs with a global scope, covering various regions, currencies, and bond types.

Here are some key observations:

1. **Global coverage**: The list includes ETFs/ETNs tracking bond markets in Europe (e.g., Eurozone, UK), the US, and globally diversified indices.
2. **Bond types**: The list covers a range of bond types, including corporate bonds, government bonds, high-yield bonds, and green bonds.
3. **Currency exposure**: ETFs/ETNs are denominated in various currencies, such as EUR, USD, and GBP, with some offering hedged exposure to mitigate currency risks.
4. **ESG and SRI focus**: A significant portion of the list consists of ETFs/ETNs with an Environmental, Social, and Governance (ESG) or Socially Responsible Investing (SRI) focus, which suggests a emphasis on sustainable investing.

To answer the original question, I can identify a subset of ETFs/ETNs from the list that have exposure to US markets:

**US-focused ETFs/ETNs:**

1. xxxx USD Corporate Bond 0-3yr ESG UCITS ETF USD (Dist)
2. xxxx USD Corporate Bond ESG 0-3yr UCITS ETF EUR Hedged (Acc)
3. xxxx ESG USD High Yield (DR) UCITS ETF - Dist
4. xxxx USD High Yield Corporate Bond ESG UCITS ETF USD (Acc)
5. xxxx USD High Yield Corporate Bond ESG UCITS ETF USD (Dist)
6. xxxx Index US Corporate SRI UCITS ETF DR (C)
7. xxxx Index US Corporate SRI UCITS ETF DR Hedged EUR (D)
8. xxxx USD Corporate Bond ESG UCITS ETF (Acc)
9. xxxx USD Corporate Bond ESG UCITS ETF (Dist)
10. xxxx ESG USD High Yield Corporate Bond UCITS ETF 1C
11. xxxx ETF (LU) xxxx xxxx US Liquid Corporates Sustainable UCITS ETF (USD) A-dis
12. xxxx USD Corporate Green Bond UCITS ETF 2C Acc USD

Please note that this subset is not exhaustive, and there may be other ETFs/ETNs in the original list that have some exposure to US markets. Additionally, investors should carefully evaluate the investment objectives, risks, and characteristics of each ETF/ETN before making any investment decisions.

NOTE: Output ETF names do not represent the actual data in the dataset used in this demonstration.

NOTE: Outputs generated by LLMs are non-deterministic and may vary in your testing.
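
The flow used above (have the LLM write a SQL query, run it, then have the LLM analyze the returned rows) can be sketched as follows. This is not the post’s `get_llm_sql_analysis` helper, whose implementation isn’t shown. The function `llm_sql_analysis_sketch`, its injected `generate` and `run_query` callables, and the assumption that the analysis prompt is a string template with `{query_results}` and `{question}` placeholders are all hypothetical, and the stand-in functions return canned values so the sketch runs offline.

```python
def llm_sql_analysis_sketch(question, sql_sys_prompt, qna_sys_prompt,
                            generate, run_query):
    """Two LLM calls around one SQL query: write SQL, run it, analyze the rows."""
    sql = generate(sql_sys_prompt).strip()   # Step 1: LLM produces the SQL text
    rows = run_query(sql)                    # Step 2: SQL engine returns rows
    analysis_prompt = qna_sys_prompt.format(query_results=rows, question=question)
    analysis = generate(analysis_prompt)     # Step 3: LLM analyzes the rows
    return {"sql": sql, "rows": rows, "analysis": analysis}


def fake_generate(prompt):
    # Stand-in for a model call; returns canned text instead of invoking an LLM
    if prompt.startswith("Write SQL"):
        return "SELECT name FROM etftable LIMIT 2;\n"
    return "Analysis of " + prompt


def fake_run_query(sql):
    # Stand-in for a SQL engine; returns made-up rows
    return [("Fund A",), ("Fund B",)]


demo = llm_sql_analysis_sketch(
    question="List two funds",
    sql_sys_prompt="Write SQL for: List two funds",
    qna_sys_prompt="Given the following SQL query results: {query_results}\n"
                   "And the original question: {question}",
    generate=fake_generate,
    run_query=fake_run_query,
)
print(demo["sql"])
```

Injecting the model call and the query runner keeps the orchestration testable without access to a hosted model or a database.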

What would the LLM’s response or data analysis be when the user’s questions in industry-specific natural language get more complex? To answer questions that require more complex analysis of the data with industry-specific context, the model needs more information than its pre-trained knowledge alone provides.

Solution overview

We encourage you to think about this question before starting: Can enhancing the context provided to the LLM in the prompt, along with the user’s natural language question, produce better outputs before you try fine-tuning the LLM, which requires setting up MLOps processes and environments, collecting and preparing relevant and accurate labeled datasets, and more?

We propose an intermediate GTL framework using the Meta Llama model on Amazon Bedrock. The proposed framework is not meant to replace the fine-tuning option. The following diagram illustrates this framework of GTL for LLMs.

GTL is a type of few-shot prompting technique where we provide the following information about the data retrieved from the structured dataset as part of the prompt to the LLM:

A personality for the LLM to use when generating the data analysis (which provides hints to the model to use industry-specific data it has already been pre-trained with)
Data features and descriptions
Data labels and descriptions
A small sample dataset containing features
A sample analysis as an example

The following is an example GTL prompt:

instructions = [
    {
        "role": "user",
        "content": """Given the following SQL query results: {query_results}

And the original question: {question}

You are an expert in Exchange-Traded Funds or ETFs and Exchange-Traded Notes or ETNs .
Based on the features of the funds or notes, please predict how expensive the funds are for investors.
I will supply multiple instances with features and the corresponding label for reference.
Please refer to the table below for detailed descriptions of the features and label:
— feature description —
Features:
isin: International Securities Identification Number
wkn: Wertpapierkennnummer or German securities identification number
name: ETF Name
fundprovider: Financial Company providing the ETF
legalstructure: Exchange Traded Fund (ETF) or Exchange Traded Notes (ETN)
totalexpenseratio: An expense ratio is the cost of owning an ETF or ETN, the management fee paid to the fund company for the benefit of owning the fund, 
paid annually and measured as a percent of your investment in the fund. 0.30 percent means you’ll pay $30 per year for every $10,000 you have invested in the fund.
— label description —
Expensive: Whether the fund is expensive for investors or not. 0 means not expensive, 1 means expensive.
— data —
|isin|wkn|name|fundprovider|legalstructure|totalexpenseratio|Expensive|
|GB00BNRRxxxx |A3xxxx|xxxx Physical Staked Cardano|xxxx|ETN|0.0|0|
|BGPLWIG0xxxx|A2xxxx|xxxx Poland WIGxxx UCITS ETF|xxxx|ETF|0.0138|0|
|CH044568xxxx|A2Txxxx|xxxx Crypto Basket Index ETP|xxxx|ETN|0.025|1|
|CH1114873xxxx|A3Gxxxx|xxxx Solana ETP|xxxx|ETN|0.025|1|
|GB00BNRRxxxx|A3xxxx|xxxx Physical Staked Algorand|xxxx|ETN|0.0|<MASK>|
Please use the supplied data to predict the <MASK>. Fund is expensive[1] or not[0]?
Answer: 0
Please provide an analysis and interpretation of the results to answer the original {question}.
"""
    }
]
QNA_SYS_PROMPT = format_instructions(instructions)

In our GTL prompt, we highlight only the subset of columns, out of the 129 possible columns in the ETF dataset we chose, that is needed to produce accurate, industry-relevant analysis of the data. We also include examples of how to interpret the data.
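
A prompt with this layout could also be assembled programmatically from the selected columns and a few labeled rows. The following is a minimal sketch under stated assumptions: `build_gtl_prompt` and its parameters are hypothetical names, the section separators are arbitrary, and the sample rows are made up rather than taken from the dataset.

```python
MASK = "<MASK>"


def build_gtl_prompt(persona, task, features, label_name, label_description,
                     columns, labeled_rows, masked_row):
    """Assemble a few-shot GTL prompt: persona, descriptions, sample rows, masked row."""
    feature_lines = [f"{name}: {text}" for name, text in features.items()]
    header = "|" + "|".join(list(columns) + [label_name]) + "|"

    def table_line(row, label):
        # Missing values become empty cells, as in the pipe-delimited sample data
        cells = [str(row.get(col, "")) for col in columns]
        return "|" + "|".join(cells + [str(label)]) + "|"

    data_lines = [table_line(row, label) for row, label in labeled_rows]
    data_lines.append(table_line(masked_row, MASK))  # Row the model must label

    parts = [
        persona,
        task,
        "-- feature description --",
        *feature_lines,
        "-- label description --",
        f"{label_name}: {label_description}",
        "-- data --",
        header,
        *data_lines,
        f"Please use the supplied data to predict the {MASK}.",
    ]
    return "\n".join(parts)


# Made-up rows for illustration only
gtl_prompt = build_gtl_prompt(
    persona="You are an expert in Exchange-Traded Funds (ETFs) and Exchange-Traded Notes (ETNs).",
    task="Based on the features of the funds or notes, please predict how expensive the funds are for investors.",
    features={
        "name": "ETF Name",
        "legalstructure": "Exchange Traded Fund (ETF) or Exchange Traded Notes (ETN)",
        "totalexpenseratio": "Annual cost of owning the fund",
    },
    label_name="Expensive",
    label_description="0 means not expensive, 1 means expensive.",
    columns=["name", "legalstructure", "totalexpenseratio"],
    labeled_rows=[
        ({"name": "Example Fund A", "legalstructure": "ETF", "totalexpenseratio": 0.0138}, 0),
        ({"name": "Example Note B", "legalstructure": "ETN", "totalexpenseratio": 0.025}, 1),
    ],
    masked_row={"name": "Example Note C", "legalstructure": "ETN", "totalexpenseratio": 0.0},
)
print(gtl_prompt)
```

Keeping the column list as an explicit argument makes it easy to restrict the prompt to the few columns that matter for a given question.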

Use case examples

Let’s look at a few sample prompts with generated analysis. The following question requires complex, industry knowledge-based analysis of data from multiple columns in the ETF database. In this example, the user is trying to find ETFs (funds) that provide higher dividends and lower volatility in value, which are characteristics that investors look for in funds they want to invest in.

# Business question 
question = "Name the least risk funds that yields higher dividends and isn't volatile" 

# Generate a prompt to get the LLM to provide an SQL query 
SQL_SYS_PROMPT = PromptTemplate.from_template(tmp_sql_sys_prompt).format(
     question=question, 
     table_schema=table_schema_etf 
) 

results = get_llm_sql_analysis( 
      question=question, 
      sql_sys_prompt=SQL_SYS_PROMPT, 
      qna_sys_prompt=QNA_SYS_PROMPT 
)

The following is our response without the GTL prompt:

LLM SQL Query: 

SELECT 
    name, 
    yeardividendyield, 
    yearvolatilitycur, 
    fiveyearreturncur 
FROM 
    etftable 
WHERE 
    yeardividendyield > 3 
    AND yearvolatilitycur < 10 
    AND fiveyearreturncur > 5 
ORDER BY 
    yearvolatilitycur ASC 
LIMIT 10;

LLM SQL Analysis: 
Based on the provided data, I will analyze the features and predict the GoodInvestment label for each fund. Then, I will identify 
the least risk funds that yield higher dividends and aren't volatile.

**Analysis**

From the data, I observe the following:

1. **Low total expense ratio**: Funds with lower totalexpenseratio are more desirable, as they charge lower management fees. 
A lower expense ratio indicates that the fund is cheaper to own.
2. **High five-year return**: Funds with higher fiveyearreturncur indicate better performance over the past 5 years.
3. **Non-zero yearly dividend yield**: Funds with non-zero yeardividendyield provide a regular income stream to investors.

**Prediction of GoodInvestment label**

Based on the analysis, I predict the GoodInvestment label for each fund as follows:

|isin|GoodInvestment|
| --- | --- |
|LU17815xxxx|1|
|LU17815xxxx7|1|
|IE00B5Bxxxx|0|
|LU14078xxxx|1|
|IE00B4xxxx|1|
|LU17815xxxx|1|

**Least risk funds that yield higher dividends and aren't volatile**

To identify the least risk funds, I will consider the following criteria:

1. **Low volatility**: Funds with lower fiveyearreturncur volatility are less risky.
2. **Higher dividend yield**: Funds with higher yeardividendyield provide a regular income stream to investors.
3. **Low expense ratio**: Funds with lower totalexpenseratio are cheaper to own.

Based on these criteria, I recommend the following funds:

1. **xxxxUK NT (DR) UCITS ETF - Dist (LU17815xxxx)**: This fund has a low expense ratio (0.04%), a relatively high dividend yield (0.0379), 
and a moderate five-year return (0.2296).
2. **xxxxUS Equity (DR) UCITS ETF - Dist (LU17815xxxx)**: This fund has a low expense ratio (0.04%), a relatively high dividend yield (0.0196), 
and a high five-year return (0.7337).
3. **xxxxEURO STOXX 50 UCITS ETF EUR (IE00B4Kxxxx)**: This fund has a moderate expense ratio (0.05%), a relatively high dividend yield (0.0297), 
and a moderate five-year return (0.3927).

These funds offer a balance between dividend yield, low expense ratio, and moderate volatility, making them suitable for investors seeking a 
relatively stable investment with a regular income stream.

We see that the data retrieval only uses the yearly volatility and not the 3-year and 5-year volatility information. Also, some of the funds don’t have volatility data in the dataset (no values for 1-year, 3-year, or 5-year volatility).

The following is a modified question requesting additional column considerations for 3-year and 5-year data.

# Business question 
question = "Name the least risk funds that yields higher dividends and isn't volatile based on five year, three year and one year volatility data" 

# Generate a prompt to get the LLM to provide an SQL query 

SQL_SYS_PROMPT = PromptTemplate.from_template(tmp_sql_sys_prompt).format( 
     question=question, 
     table_schema=table_schema_etf 
) 

results = get_llm_sql_analysis( 
     question=question, 
     sql_sys_prompt=SQL_SYS_PROMPT, 
     qna_sys_prompt=QNA_SYS_PROMPT 
)

We use the following GTL prompt with labels to interpret 1-year, 3-year, and 5-year data or lack of data:

instructions = [
    {
        "role": "user",
        "content": """Given the following SQL query results: {query_results}

And the original question: {question}

You are an expert in Exchange-Traded Funds or ETFs and Exchange-Traded Notes or ETNs .
Based on the features of the funds or notes, please predict best funds for investors to invest in.
I will supply multiple instances with features and the corresponding label for reference.
Please refer to the table below for detailed descriptions of the features and label:
— feature description —
Features:
isin: International Securities Identification Number
wkn: Wertpapierkennnummer or German securities identification number
name: ETF Name
fundprovider: Financial Company providing the ETF
legalstructure: Exchange Traded Fund (ETF) or Exchange Traded Notes (ETN)
yeardividendyield: Yearly Dividend yield as a percentage of total investment
fiveyearreturncur: Returns over past 5 year period as a percentage of investment
totalexpenseratio: An expense ratio is the cost of owning an ETF or ETN, the management fee paid to the fund company for the benefit of owning the fund, 
paid annually and measured as a percent of your investment in the fund. 0.30 percent means you’ll pay $30 per year for every $10,000 you have invested in the fund.
— label description —
volatile: The fund has low fiveyearvolatilitycur, threeyearvolatilitycur, yearvolatilitycur. 0 means not volatile, 1 means volatile, 2 means cannot be determined.
— data —
|isin|name|fiveyearvolatilitycur|threeyearvolatilitycur|yearvolatilitycur|Risk|
|LU033504xxxx|xxxx II EUR Overnight Rate Swap UCITS ETF 1D|8.0E-4|9.0E-4|0.0011|0|
|FR001051xxxx|Lyxor Euro Overnight Return UCITS ETF - Acc|8.0E-4|9.0E-4|0.0011|0|
|LU0290358xxxx|xxxxII EUR Overnight Rate Swap UCITS ETF 1C|9.0E-4|0.001|0.0013|0|
|BGCROEX03xxxx|xxxx UCITS ETF|0.675|0.8606|1.4787|1|
|IE000RN0xxxx|xxxxDisruptive Technology Real Estate UCITS ETF Acc||||2|
|GB00BNRRxxxx|xxxxPhysical Staked xxxx||||2|
|FR001075xxxx|xxxx ETF xxxx0-6 Months Euro Investment Grade UCITS ETF EUR (C)|0.0014|0.0016|0.0021|<MASK>|
Please use the supplied data to predict the <MASK>. Fund is volatile[1] or not[0] or cannot-be-determined[2]?
Answer: 1
Please provide an analysis and interpretation of the results to answer the original {question}.
"""
    }
]

We see that with additional prompting the model uses all of the volatility columns in the dataset (1-year, 3-year, and 5-year) and provides output suggestions for when data is present or missing in the volatility columns.
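The labeled rows in the data section of a GTL prompt like the preceding one could be generated from query results instead of being typed by hand. The following is a minimal sketch, not the code used in this post: the helper names (label_volatility, build_gtl_rows), the fund identifiers, and the 0.1 cut-off are all assumptions for illustration, and a dataset specialist would choose the real threshold.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed cut-off for "volatile"; illustrative only, not taken from the post.
VOLATILITY_THRESHOLD = 0.1

Fund = Tuple[str, str, Sequence[Optional[float]]]  # (isin, name, [5y, 3y, 1y])


def label_volatility(values: Sequence[Optional[float]],
                     threshold: float = VOLATILITY_THRESHOLD) -> int:
    """Return 0 (not volatile), 1 (volatile), or 2 (cannot be determined)."""
    if any(v is None for v in values):
        return 2  # assumption: any missing volatility column means "undetermined"
    return 1 if any(v >= threshold for v in values) else 0


def _cell(value: Optional[float]) -> str:
    """Render a missing value as an empty cell, as in the prompt's data table."""
    return "" if value is None else str(value)


def build_gtl_rows(examples: List[Fund], target: Fund) -> str:
    """Build the pipe-delimited data section: labeled examples plus one <MASK> row."""
    lines = ["|isin|name|fiveyearvolatilitycur|threeyearvolatilitycur|yearvolatilitycur|Risk|"]
    for isin, name, vols in examples:
        cells = [isin, name, *map(_cell, vols), str(label_volatility(vols))]
        lines.append("|" + "|".join(cells) + "|")
    isin, name, vols = target
    lines.append("|" + "|".join([isin, name, *map(_cell, vols), "<MASK>"]) + "|")
    return "\n".join(lines)


# Hypothetical funds; the identifiers are placeholders, not real ISINs.
examples = [
    ("FUND-A", "Overnight rate fund", [0.0008, 0.0009, 0.0011]),
    ("FUND-B", "High volatility fund", [0.675, 0.8606, 1.4787]),
    ("FUND-C", "Fund without history", [None, None, None]),
]
target = ("FUND-D", "Short-term bond fund", [0.0014, 0.0016, 0.0021])
gtl_table = build_gtl_rows(examples, target)
print(gtl_table)
```

The resulting string can be placed in the data section of the prompt, with the final row left for the model to label.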

The following is our response with GTL prompts:

LLM SQL Query: 

SELECT 
  name, 
  fundprovider, 
  yeardividendyield, 
  fiveyearvolatilitycur, 
  threeyearvolatilitycur, 
  yearvolatilitycur
FROM 
  etftable
WHERE 
  yeardividendyield > 0 
  AND fiveyearvolatilitycur < (SELECT AVG(fiveyearvolatilitycur) FROM etftable) 
  AND threeyearvolatilitycur < (SELECT AVG(threeyearvolatilitycur) FROM etftable) 
  AND yearvolatilitycur < (SELECT AVG(yearvolatilitycur) FROM etftable)
ORDER BY 
  yeardividendyield DESC, 
  fiveyearvolatilitycur, 
  threeyearvolatilitycur, 
  yearvolatilitycur
LIMIT 10;

LLM SQL Analysis: 
Based on the provided data, I will analyze and interpret the results to answer the original question.

**Least Risk Funds with Higher Dividend Yields and Low Volatility**

To identify the least risk funds, I will consider the following criteria:

1. **Low Volatility**: Funds with low five-year, three-year, and one-year volatility (less than 0.1).
2. **Higher Dividend Yields**: Funds with a higher yearly dividend yield (greater than 0.07).
3. **Low Expense Ratio**: Funds with a low total expense ratio (less than 0.1).

After analyzing the data, I have identified the following funds that meet the above criteria:

1. **xxxx Global Select Dividend 100 Swap UCITS ETF 1D**:
	* Yearly Dividend Yield: 0.0938
	* Five-year Volatility: 0.1096
	* Three-year Volatility: 0.1125
	* One-year Volatility: 0.1236
	* Expense Ratio: Not available in the data
2. **xxxx Emerging Markets Equity Income UCITS ETF**:
	* Yearly Dividend Yield: 0.0797
	* Five-year Volatility: 0.1532
	* Three-year Volatility: 0.1471
	* One-year Volatility: 0.1335
	* Expense Ratio: Not available in the data
3. **xxxx FTSE EM High Dividend Low Volatility UCITS ETF**:
	* Yearly Dividend Yield: 0.0787
	* Five-year Volatility: 0.1566
	* Three-year Volatility: 0.1621
	* One-year Volatility: 0.1439
	* Expense Ratio: Not available in the data

These funds have relatively low volatility and higher dividend yields compared to the other funds in the dataset. However, please note that the 
expense ratio is not available in the data, which is an important factor to consider when evaluating the overall cost of investing in an ETF.

**Additional Considerations**

When evaluating these funds, investors should also consider other factors such as:

1. **Fund Provider**: The reputation and experience of the fund provider can impact the fund's performance and risk profile.
2. **Legal Structure**: The legal structure of the fund (ETF or ETN) can affect its tax efficiency, liquidity, and regulatory environment.
3. **Investment Strategy**: The fund's investment strategy and asset allocation can influence its risk profile and potential returns.

In conclusion, the three funds mentioned above appear to be the least risk funds with higher dividend yields and low volatility based 
on the provided data. However, investors should conduct further research and consider additional factors before making an investment decision.

As we can see, the data retrieval is more accurate. Additionally, the generated analysis has considered all of the volatility information in the dataset (1-year, 3-year, and 5-year) and accounted for present or missing data for volatility.
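One detail of the generated query is worth checking: in standard SQL, a comparison against NULL never evaluates to true, so funds with missing volatility values are dropped by the WHERE clause rather than reported as undetermined. The following sketch reproduces this behavior with Python's built-in sqlite3 module and a made-up four-row table. The fund names and numbers are hypothetical, and the query engine used in the post may differ in other respects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE etftable ("
    "name TEXT, yeardividendyield REAL, fiveyearvolatilitycur REAL, "
    "threeyearvolatilitycur REAL, yearvolatilitycur REAL)"
)
conn.executemany(
    "INSERT INTO etftable VALUES (?, ?, ?, ?, ?)",
    [
        ("fund_a", 0.03, 0.0008, 0.0009, 0.0011),  # low volatility
        ("fund_b", 0.05, 0.675, 0.8606, 1.4787),   # high volatility
        ("fund_c", 0.04, None, None, None),        # no volatility data
        ("fund_d", 0.02, 0.10, 0.11, 0.12),        # moderate volatility
    ],
)

# Same shape as the LLM-generated query; AVG() ignores NULL values.
QUERY = """
SELECT name
FROM etftable
WHERE yeardividendyield > 0
  AND fiveyearvolatilitycur < (SELECT AVG(fiveyearvolatilitycur) FROM etftable)
  AND threeyearvolatilitycur < (SELECT AVG(threeyearvolatilitycur) FROM etftable)
  AND yearvolatilitycur < (SELECT AVG(yearvolatilitycur) FROM etftable)
ORDER BY yeardividendyield DESC, fiveyearvolatilitycur,
         threeyearvolatilitycur, yearvolatilitycur
LIMIT 10
"""
selected = [row[0] for row in conn.execute(QUERY)]

# Funds the query silently skips because a volatility column is NULL.
missing = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM etftable "
        "WHERE fiveyearvolatilitycur IS NULL "
        "OR threeyearvolatilitycur IS NULL "
        "OR yearvolatilitycur IS NULL"
    )
]
print("selected:", selected)
print("missing volatility data:", missing)
```

Running a separate IS NULL query like the second one is one way to surface funds whose volatility cannot be determined, so the analysis step can mention them explicitly.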

Based on this outcome, we recommend building a curated library of GTL prompts paired with the questions that users most commonly ask about each dataset. The prompts should be created by dataset specialists who have a deep understanding of the dataset from an industry perspective and can provide the right context to the LLMs. Organizations can use such a prompt library to build interactive applications that allow regular business users, who may not have deep knowledge of the underlying datasets, to interact with and gain insights from these datasets using natural language questions.

Conclusion

As newer and larger LLMs are released, they get better at generating an analysis of structured datasets using industry-specific language. However, there is room for improvement in the analysis of data from structured datasets. One option is to fine-tune the LLM to improve the relevance and language of the generated data analysis using specific business language. Fine-tuning requires additional effort and cost (collecting relevant data, labeling the data, and procuring, provisioning, and maintaining the fine-tuning compute environment).

In this post, we showcased a method with few-shot prompting using Meta Llama models available through Amazon Bedrock that can improve industry- or business-specific analysis of the data with just prompt engineering. (For certain use cases, fine-tuning may be required. Refer to Amazon Bedrock pricing for estimated costs with or without using fine-tuned models.)

Try this solution with your own industry-specific use cases and datasets, and let us know your feedback and questions in the comments.

NOTE: Blog authors are not providing any financial or investment advice in this blog post, nor are they recommending this dataset or ETFs mentioned in this dataset.

About the Authors

Randy DeFauw is a Senior Principal Solutions Architect at AWS. He holds an MSEE from the University of Michigan, where he worked on computer vision for autonomous vehicles. He also holds an MBA from Colorado State University. Randy has held a variety of positions in the technology space, ranging from software engineering to product management. He entered the Big Data space in 2013 and continues to explore that area. He is actively working on projects in the ML space and has presented at numerous conferences including Strata and GlueCon.

Arghya Banerjee is a Sr. Solutions Architect at AWS in the San Francisco Bay Area focused on helping customers adopt and use AWS Cloud. He is focused on Big Data, Data Lakes, Streaming and batch Analytics services and generative AI technologies.

Ravi Ganesh is a Sr. Solution Architect at AWS in the Austin, Texas area, focused on helping customers address their business problems through adoption of the cloud. He is focused on analytics, resiliency, security, and generative AI technologies.

Varun Mehta is a Sr. Solutions Architect at AWS. He is passionate about helping customers build enterprise-scale Well-Architected solutions on the AWS Cloud. He works with strategic customers who are using AI/ML to solve complex business problems. Outside of work, he loves to spend time with his wife and kids.

Deliver personalized marketing with Amazon Bedrock Agents

November 12, 2024

by Ray Wang Amazon AWS

Creative content plays a crucial role in marketing, and personalized creative content in particular significantly boosts marketing performance. Generating personalized content can present a significant challenge for marketers because it requires considerable time and resources. This challenge stems from the need for multiple versions of creative content across various channels, such as paid media (ads) and owned media, including electronic direct mail (EDM), social media posts, app notifications, and SMS. Scaling this process can be challenging, especially for small and medium-sized businesses.

Generative AI now empowers marketers to efficiently create personalized content, even with limited resources. By using machine learning (ML) models, you can pinpoint customer preferences for specific merchandise and tailor your marketing campaigns accordingly. This enables the crafting of compelling promotional text and striking visuals that effectively resonate with each customer segment, thereby driving engagement and increasing sales. Using Amazon Bedrock Agents to create your own marketing agent allows you to seamlessly accomplish list targeting and personalized material generation for specific marketing purposes.

In this post, we demonstrate a solution using Amazon Bedrock Agents, Amazon Bedrock Knowledge Bases, Amazon Bedrock Developer Experience, and Amazon Personalize that allows marketers to save time and deliver efficient personalized advertising using a generative AI-enhanced solution. Our solution is a marketing agent that shows how Amazon Personalize can effectively segment target customers based on relevant characteristics and behaviors. Additionally, by using Amazon Bedrock Agents and foundation models (FMs), our tool generates personalized creative content specifically tailored to each purpose. It customizes the tone, creative style, and individual preferences according to each customer’s specific prompt, providing highly customized and effective marketing communications.

Marketing agent overview

In the following diagram, we show the components that power our marketing agent.

The difference between an agent and a large language model (LLM) is that an agent comprises not only LLMs, but also includes planning skills, tool usage, and memory. This means that when you provide a natural language prompt, you receive user segment results along with creative content tailored to your specifications. For example, if you want to promote an oven through EDM, social media posts, or SMS, the marketing agent will use its tools to generate a customer list using a segmentation model trained on your data. Furthermore, it will generate creative content that uses your historical creative content as examples and incorporates detailed merchandise data from your database.

The marketing agent solution includes three tools:

Merchandise tool – Retrieve merchandise details from Amazon DynamoDB (item database) and deliver them to the creative content tool according to the customer’s prompt.
User segment tool – Retrieve a user list from Amazon Simple Storage Service (Amazon S3) that Amazon Personalize created for the merchandise you plan to promote. This process uses comprehensive user, merchandise (item), and interaction data.
Creative content tool – Generate the personalized creative content using an LLM based on the augmented prompt. The augmented prompt is formed by retrieving creative assets data from Amazon Bedrock Knowledge Bases (historical creative content), the merchandise database from DynamoDB, and the user database from DynamoDB, based on the customer’s input prompt.

This agent operates based on natural language prompts and your organization’s data. These managed agents serve as intelligent orchestrators, managing interactions between FMs, API integrations, user questions and instructions, and knowledge sources filled with your proprietary data. The agent skillfully coordinates and processes user inputs through various dynamic steps during its runtime.

Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. The single API access, regardless of the models you choose, gives you the flexibility to use different FMs and upgrade to the latest model versions with minimal code changes.

Amazon Bedrock agents plan and run multistep tasks using company systems and data sources—from answering customer questions about your product availability to taking their orders. With Amazon Bedrock, you can create an agent in just a few clicks by selecting an FM and providing it access to your enterprise systems, knowledge bases, and AWS Lambda functions to securely run your APIs. An agent analyzes the user request and automatically calls the necessary APIs and data sources to fulfill the request. Amazon Bedrock Agents enables you to do this securely and privately—you don’t have to engineer prompts, manage session context, or manually orchestrate tasks.

Amazon Bedrock Knowledge Bases is a fully managed capability that helps you implement the entire retrieval augmented generation (RAG) workflow, from ingestion to retrieval and prompt augmentation, without having to build custom integrations to data sources or manage data flows. Session context management is built in, so your app can readily support multi-turn conversations. You can use the Retrieve API to fetch relevant results for a user query from knowledge bases. You can also add knowledge bases to Amazon Bedrock Agents to provide contextual information to agents. The information retrieved from the knowledge base is provided with citations to improve transparency and minimize hallucinations.
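As a rough illustration of the Retrieve API mentioned above, the following sketch wraps a retrieve call and flattens the results. It assumes a boto3 bedrock-agent-runtime client is passed in; the function name retrieve_passages, the knowledge base ID, and the stand-in client used to run the example offline are hypothetical and not part of the solution's code.

```python
from typing import Any, Dict, List


def retrieve_passages(client: Any, knowledge_base_id: str, query: str,
                      top_k: int = 3) -> List[Dict[str, Any]]:
    """Fetch the top_k passages for a query and keep text, score, and location."""
    response = client.retrieve(
        knowledgeBaseId=knowledge_base_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    )
    return [
        {
            "text": result["content"]["text"],
            "score": result.get("score"),
            "location": result.get("location"),
        }
        for result in response.get("retrievalResults", [])
    ]


class FakeKnowledgeBaseClient:
    """Offline stand-in that mimics the shape of a retrieve response."""

    def retrieve(self, **kwargs: Any) -> Dict[str, Any]:
        self.last_request = kwargs
        return {
            "retrievalResults": [
                {
                    "content": {"text": "Past campaign copy for scarves."},
                    "score": 0.91,
                    "location": {"type": "S3"},
                }
            ]
        }


fake_client = FakeKnowledgeBaseClient()
passages = retrieve_passages(fake_client, "KB-PLACEHOLDER", "scarf promotion", top_k=1)
print(passages[0]["text"])
```

With real credentials, the stand-in would be replaced by boto3.client("bedrock-agent-runtime") and the ID of your own knowledge base.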

Amazon Personalize is a fully managed ML service that uses your data to generate recommendations for your users and enables developers to quickly implement a customized personalization engine, without requiring ML expertise. It accelerates your digital transformation with ML, making it effortless to integrate personalized recommendations into existing websites, applications, email marketing systems, and more.

Solution overview

Amazon Bedrock Agents is our key component for developing our marketing agent. It enables you to build and configure autonomous agents in your application. Agents orchestrate interactions between FMs, data sources, software applications, and user conversations, and automatically call APIs to perform actions and invoke knowledge bases to supplement information for these actions. You can add actions for it to carry out and define how to handle them by writing Lambda functions in a programming language of your choice. For more details, refer to Automate tasks in your application using conversational agents.

We implement the marketing agent through Amazon Bedrock Agents, and use the following key features:

Foundation model – The agent invokes an FM to interpret user input, generate subsequent prompts in its orchestration process, and generate creative content based on the customer’s requirement.
Instructions – Instructions tell the agent what it’s designed to do and how to do it.
Action groups – Action groups are interfaces that an agent uses to interact with the different underlying components such as APIs (such as Amazon Personalize batch inference result on Amazon S3) and databases (such as user or merchandise databases). An agent uses action groups to carry out actions, such as making an API call to another tool.
Knowledge base – The knowledge base is a link to an existing data source, consisting of the customer’s historical creative content, which allows the agent to query for extra context for the prompts.
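To make the action group idea more concrete, the following is a minimal sketch of a Lambda handler that an action group defined with an API schema could call. It is not the handler from the GitHub repo: the in-memory SEGMENTS dictionary stands in for the Amazon Personalize export on Amazon S3, and the action group name, API path, and parameter name itemId are assumptions for illustration.

```python
import json
from typing import Any, Dict

# Hypothetical stand-in for the batch segment results stored on Amazon S3.
SEGMENTS = {
    "e1669081-8ffc-4dec-97a6-e9176d7f6651": ["3184", "4223", "4301"],
}


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Look up the user segment for an item and reply in the agent response format."""
    # The agent passes parameters as a list of {"name", "type", "value"} entries.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    item_id = params.get("itemId")
    body = {"itemId": item_id, "usersList": SEGMENTS.get(item_id, [])}

    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event.get("actionGroup"),
            "apiPath": event.get("apiPath"),
            "httpMethod": event.get("httpMethod"),
            "httpStatusCode": 200,
            "responseBody": {"application/json": {"body": json.dumps(body)}},
        },
    }


# Example event, shaped like a call from an agent (values are illustrative).
sample_event = {
    "actionGroup": "segment-targeted-customer-list",
    "apiPath": "/segments",
    "httpMethod": "GET",
    "parameters": [
        {"name": "itemId", "type": "string",
         "value": "e1669081-8ffc-4dec-97a6-e9176d7f6651"}
    ],
}
result = lambda_handler(sample_event, None)
print(result["response"]["responseBody"]["application/json"]["body"])
```

The handler echoes the action group, API path, and HTTP method back to the agent so the response can be matched to the request.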

For details about supported models, refer to Supported foundation models in Amazon Bedrock, Supported regions and models for Amazon Bedrock Agents, and Supported regions and models for Amazon Bedrock Knowledge Bases.

The following diagram illustrates the solution workflow.

There are two associated action groups:

Segment targeted customer list – Useful for segmenting a customer list for specific merchandise that you aim to promote
Generate personalized creative content – Useful for generating creative content tailored to specific purposes, such as diverse customer preferences, varying customer types, and different marketing channels

We use two types of datasets in this solution:

Structured customer data – We use customer data, merchandise (item) data, and interaction data to train the segmentation model using Amazon Personalize
Unstructured data – We use historical creative content and merchandise (item) data as augmented prompts to make sure that the creative content generated by the LLM aligns with your brand’s style and marketing guidelines

When the marketing agent receives a prompt from a business user, it follows a number of steps as part of its orchestration:

Outline the steps for the task by using an LLM within Amazon Bedrock according to the specifications provided in the prompt.
Follow chain-of-thought reasoning and instructions, and complete the steps using appropriate action groups. As part of the process, depending on the prompt, the agent will search and identify relevant context for RAG.
Pass the results with the prompt to an LLM within Amazon Bedrock.
Augment the prompt with the results of the tool execution or knowledge base search and send it to the LLM.
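The orchestration steps above can be pictured with a small, self-contained sketch. This is only a simplified mental model: the real orchestration is managed by Amazon Bedrock Agents and interleaves reasoning with tool calls, and every name here (run_marketing_agent, the stub planner, tools, and LLM) is hypothetical.

```python
from typing import Callable, Dict, List


def run_marketing_agent(prompt: str,
                        plan: Callable[[str], List[str]],
                        tools: Dict[str, Callable[[str], str]],
                        llm: Callable[[str], str]) -> str:
    """Plan the steps, run each tool, augment the prompt, and call the LLM."""
    steps = plan(prompt)                                   # outline the steps
    observations = [f"{name}: {tools[name](prompt)}" for name in steps]  # run tools
    augmented_prompt = prompt + "\n\nContext:\n" + "\n".join(observations)
    return llm(augmented_prompt)                           # generate the final answer


# Stubs standing in for the FM planner, the action groups, and the FM itself.
def fake_plan(prompt: str) -> List[str]:
    return ["merchandise", "user_segment"]


fake_tools = {
    "merchandise": lambda prompt: "Sans pareil scarf, $124.99",
    "user_segment": lambda prompt: "3 users in segment",
}


def fake_llm(augmented_prompt: str) -> str:
    return "DRAFT based on -> " + augmented_prompt


draft = run_marketing_agent("Promote the scarf", fake_plan, fake_tools, fake_llm)
print(draft)
```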

The following diagram illustrates the technical architecture and key steps.

Amazon Bedrock Agents allows you to set up the entire process, including getting the user segmentation list from Amazon Personalize and generating the personalized promotional content with Anthropic’s Claude 3 on Amazon Bedrock. There are three steps: data preparation, agent development, and agent testing. You can find the sample code and the AWS Cloud Development Kit (AWS CDK) stack in the GitHub repo.

Prepare the data

Complete the following steps to prepare your data:

Store your creative content on Amazon S3. Ingest your data by generating embeddings with an FM and storing them in a supported vector store like Amazon OpenSearch Service.
Use Amazon Bedrock Knowledge Bases by specifying an S3 bucket that contains your exported creative content data. For instructions, refer to Retrieve data and generate AI responses with knowledge bases.
1. Use OpenSearch Service as the vector database.
2. Complete the knowledge base configuration and synchronize data from Amazon S3 to OpenSearch Service so the vector database data remains up to date.
Initiate an Amazon Personalize job with the USER_SEGMENTATION recipe to create user segmentations and export the results to Amazon S3. For more information, see Getting user segments.
1. Upload your user dataset, interactions dataset, and item dataset into Amazon S3 for model training and create a batch segment job to get your user segment list. This allows you to map item IDs to a list of users interested in these items.
2. The batch segment job output will be a JSON file stored on Amazon S3 that looks like the following example:

{"input": {"itemId": "e1669081-8ffc-4dec-97a6-e9176d7f6651"}, "output": {"usersList": ["3184","4223","4301",...]}, "error": null}
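The following sketch shows one way to prepare the batch segment job input and to turn output lines like the preceding example into an item-to-users mapping. It assumes one JSON object per line, with the input lines mirroring the "input" field shown above. The function names and the second sample record are hypothetical.

```python
import json
from typing import Dict, Iterable, List


def build_batch_input(item_ids: Iterable[str]) -> str:
    """Create one {"itemId": ...} JSON object per line for the job input file."""
    return "\n".join(json.dumps({"itemId": item_id}) for item_id in item_ids)


def parse_batch_output(text: str) -> Dict[str, List[str]]:
    """Map each item ID to its user list, skipping blank lines and failed records."""
    segments: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("error"):
            continue  # the job reported an error for this input line
        segments[record["input"]["itemId"]] = record["output"]["usersList"]
    return segments


sample_output = "\n".join([
    '{"input": {"itemId": "e1669081-8ffc-4dec-97a6-e9176d7f6651"}, '
    '"output": {"usersList": ["3184", "4223", "4301"]}, "error": null}',
    # Hypothetical failed record to show the error branch.
    '{"input": {"itemId": "item-with-error"}, "output": null, "error": "ITEM_NOT_FOUND"}',
])
segments = parse_batch_output(sample_output)
job_input = build_batch_input(["e1669081-8ffc-4dec-97a6-e9176d7f6651"])
print(segments)
print(job_input)
```

A mapping like this is what the user segment tool needs to answer which users to target for a given item.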

Build the agent

In this solution, you need a marketing agent, a creative content knowledge base, and three tools (the merchandise tool to get detailed merchandise information, the user segment tool to get the target audience list, and the creative content tool to generate the creative content by the LLM) tailored to automate the various tasks associated with delivering personalized creative content efficiently. Complete the following steps to build your agent:

Clone the repository to your local machine or AWS environment, set up a virtual environment and activate it, download the related data, and install the required Python packages using the following code:

git clone https://github.com/aws-samples/amazon-bedrock-samples.git

cd ./amazon-bedrock-samples/agents-for-bedrock/use-case-examples/marketing-agent

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements-dev.txt
wget https://aws-blogs-artifacts-public.s3.amazonaws.com/artifacts/ML-16145/data.zip
unzip data.zip
wget https://code.retaildemostore.retail.aws.dev/images.tar.gz
mkdir -p data/image

tar xf images.tar.gz -C data/image

Deploy using the following code:

cdk deploy

Locate the agent name in the output of the preceding command. In the following screenshot, for example, the Amazon Bedrock agent name is bedrock-agent-markegent-2e1ece3c.

To dive deeper into the solution and code for the complete steps, see the GitHub repo.

Test the agent

You can test the agent on the Amazon Bedrock console. For instructions, see Test and troubleshoot agent behavior.

The following screenshot shows an example of testing the agent.

We used the following input message:

I would like to write a social media post to promote the Sans pareil scarf (e1669081-8ffc-4dec-97a6-e9176d7f6651). Please assist me in crafting the marketing content.

The following is the agent output:

Elevate Your Style with Timeless Elegance 💎 Introducing the Sans Pareil Scarf - a true masterpiece of fashion designed for the sophisticated, style-conscious woman. Crafted with impeccable artistry from the finest materials, this $124.99 luxury accessory exudes an air of refinement that instantly elevates any outfit. Its rich colors and intricate patterns complement both casual and formal ensembles, adding an undeniable touch of elegance. Whether dressing for the office, a night out, or a special occasion, the Sans Pareil Scarf is the ultimate statement piece for the modern woman seeking timeless style. From the fashion-forward trendsetter to the classically chic professional, this exquisite scarf transcends age while remaining a treasured investment that will never go out of style. Indulge in enduring quality and sophistication with the Sans Pareil Scarf. Elevate your style and turn heads wherever you go. #LuxuryFashion #SansPareilScarf #TimelessElegance
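Outside the console, the same prompt could be sent to the agent programmatically. The following sketch assumes a boto3 bedrock-agent-runtime client is passed in and collects the streamed completion chunks into one string. The function name ask_agent, the placeholder IDs, and the stand-in client used to run the example offline are hypothetical.

```python
import uuid
from typing import Any, Dict, Optional


def ask_agent(client: Any, agent_id: str, agent_alias_id: str, prompt: str,
              session_id: Optional[str] = None) -> str:
    """Invoke the agent and join the streamed text chunks into one reply."""
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=session_id or str(uuid.uuid4()),
        inputText=prompt,
    )
    parts = []
    for event in response["completion"]:
        chunk = event.get("chunk")
        if chunk:  # other event types, such as traces, are ignored here
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)


class FakeAgentClient:
    """Offline stand-in that mimics the shape of an invoke_agent response."""

    def invoke_agent(self, **kwargs: Any) -> Dict[str, Any]:
        self.last_request = kwargs
        return {
            "completion": [
                {"trace": {}},
                {"chunk": {"bytes": b"Elevate Your Style "}},
                {"chunk": {"bytes": b"with Timeless Elegance"}},
            ]
        }


fake_agent = FakeAgentClient()
reply = ask_agent(
    fake_agent, "AGENT-ID-PLACEHOLDER", "ALIAS-ID-PLACEHOLDER",
    "I would like to write a social media post to promote the Sans pareil scarf.",
)
print(reply)
```

With real credentials, the stand-in would be replaced by boto3.client("bedrock-agent-runtime") and the agent and alias IDs from your deployment.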

Clean up

In this post, we demonstrated interactions using Lambda, Amazon Bedrock Agents, and Amazon Bedrock Knowledge Bases. To avoid incurring additional costs, delete these resources in the following order:

Delete the Amazon Bedrock agent.
Delete the Amazon Bedrock knowledge base and its associated resources.
Delete the Lambda function and its related resources.

Summary

In this post, we discussed the use case of targeted marketing as an example to demonstrate the efficient delivery of personalized marketing creative content and target audience lists through a generative AI agent. The next step might involve developing a reinforcement learning-based agent to iterate on the performance of the agent.

Our customer, Chunghwa Telecom (CHT), a leading telecom company in Taiwan, followed this solution to implement a generative AI-enhanced marketing technology tool to enhance their business through Amazon Bedrock. The marketing agent enabled CHT to initiate tailored campaigns promptly, leading to the realization of personalized marketing strategies and a 24-fold increase in their clickthrough rate.

To use our marketing agent to enhance your marketing tasks, refer to the GitHub repo.

About the Authors

Ray Wang is a Senior Solutions Architect at AWS. With 10 years of experience in the IT industry, Ray is dedicated to building modern solutions on the cloud, especially in NoSQL, big data, machine learning, and Generative AI. As a hungry go-getter, he earned all 12 AWS certifications to make his technical expertise not only deep but also wide. He loves to read and watch sci-fi movies in his spare time.

Paul Lu is a Senior Solution Architect at Amazon Web Services (AWS). He specializes in serverless and modern application development, helping customers design high-performing, scalable cloud solutions. With extensive experience, he is passionate about driving innovation and delivering exceptional results.

Fine-tune Meta Llama 3.2 text generation models for generative AI inference using Amazon SageMaker JumpStart

November 11, 2024

by Pavan Kumar Rao Navule Amazon AWS

Generative AI models have seen tremendous growth, offering cutting-edge solutions for text generation, summarization, code generation, and question answering. Despite their versatility, these models often struggle when applied to niche or domain-specific tasks because their pre-training is typically based on large, generalized datasets. To address these gaps and maximize their utility in specialized scenarios, fine-tuning with domain-specific data is essential to boost accuracy and relevance.

Meta’s newly launched Llama 3.2 series sets a new benchmark in generative AI with its advanced multimodal capabilities and optimized performance across diverse hardware platforms. The collection spans lightweight models like Llama-3.2-1B and Llama-3.2-3B, which support up to 128,000 tokens of context and are tailored for edge devices. These models are ideal for on-device applications such as real-time summarization, instruction following, and multilingual text generation. On the other end of the spectrum, the larger Llama-3.2-11B and Llama-3.2-90B models offer powerful vision-enabled capabilities for tasks such as image understanding, document analysis, and visual grounding. This allows for sophisticated use cases like generating captions for images, interpreting complex graphs, and reasoning over visual data. For instance, the Meta Llama 3.2 models can analyze sales data presented in a graph to provide actionable insights or locate specific objects on a map using natural language instructions.

In this post, we demonstrate how to fine-tune Meta’s latest Llama 3.2 text generation models, Llama 3.2 1B and 3B, using Amazon SageMaker JumpStart for domain-specific applications. By using the pre-built solutions available in SageMaker JumpStart and the customizable Meta Llama 3.2 models, you can unlock the models’ enhanced reasoning, code generation, and instruction-following capabilities to tailor them for your unique use cases. Whether you’re working in finance, healthcare, or any other specialized field, fine-tuning these models will allow you to bridge the gap between general AI capabilities and domain-specific expertise.

Solution overview

SageMaker JumpStart is a robust feature within the SageMaker machine learning (ML) environment, offering practitioners a comprehensive hub of publicly available and proprietary foundation models (FMs). This managed service accelerates the ML development process by providing access to a growing list of cutting-edge models from leading model hubs and providers. You can quickly evaluate, compare, and select FMs based on predefined quality and responsibility metrics for tasks such as article summarization and image generation.

SageMaker JumpStart allows for full customization of pre-trained models to suit specific use cases using your own data. Deployment to production environments is streamlined through the user interface or SDK, enabling rapid integration into applications. The platform also supports organizational collaboration by allowing the sharing of artifacts, including models and notebooks, to expedite model building and deployment. Administrators can manage the visibility of models within the organization, enhancing governance and security.

Furthermore, SageMaker JumpStart enables practitioners to deploy models to dedicated SageMaker instances within a network-isolated environment, maintaining compliance and data protection. By using the robust training and deployment capabilities available in SageMaker, you can customize and scale models to meet diverse ML requirements efficiently.

Prerequisites

To try out this solution using SageMaker JumpStart, you’ll need the following prerequisites:

An AWS account that will contain all of your AWS resources.
An AWS Identity and Access Management (IAM) role to access SageMaker. To learn more about how IAM works with SageMaker, refer to Identity and Access Management for Amazon SageMaker.
Access to Amazon SageMaker Studio or a SageMaker notebook instance, or an interactive development environment (IDE) such as PyCharm or Visual Studio Code. We recommend using SageMaker Studio for straightforward deployment and inference.

Fine-tune Meta Llama 3.2 text generation models

In this section, we demonstrate how to fine-tune Meta Llama 3.2 text generation models. We first look at fine-tuning using the SageMaker Studio UI without having to write any code. We then cover how to fine-tune the model using the SageMaker Python SDK.

No-code fine-tuning using the SageMaker Studio UI

SageMaker JumpStart provides access to publicly available and proprietary FMs from third-party and proprietary providers. Data scientists and developers can quickly prototype and experiment with various ML use cases, accelerating the development and deployment of ML applications. It helps reduce the time and effort required to build ML models from scratch, allowing teams to focus on fine-tuning and customizing the models for their specific use cases. These models are released under different licenses designated by their respective sources. It’s essential to review and adhere to the applicable license terms before downloading or using these models to make sure they’re suitable for your intended use case.

You can access the Meta Llama 3.2 FMs through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we cover how to discover these models in SageMaker Studio.

SageMaker Studio is an IDE that offers a web-based visual interface for performing the ML development steps, from data preparation to model building, training, and deployment. For instructions on getting started and setting up SageMaker Studio, refer to Amazon SageMaker Studio.

In SageMaker Studio, access SageMaker JumpStart by choosing JumpStart in the navigation pane.
You’re presented with the list of public models offered by SageMaker, where you can explore other models from other providers.

To start using the Meta Llama 3.2 models, under Providers, choose Meta.
You’re presented with a list of the models available.

Choose the Meta Llama 3.2 1B Instruct model.
Here you can view the model details, as well as train, deploy, optimize, and evaluate the model.

For this demonstration, we choose Train.

On this page, you can point to the Amazon Simple Storage Service (Amazon S3) bucket containing the training and validation datasets for fine-tuning.

In addition, you can configure the deployment configuration, hyperparameters, and security settings for fine-tuning.
Choose Submit to start the training job on a SageMaker ML instance.

Accept the Llama 3.2 Community License Agreement to initiate the fine-tuning process.

Deploy the model

After the model is fine-tuned, you can deploy it using the model page on SageMaker JumpStart. The option to deploy the fine-tuned model will appear when fine-tuning is finished, as shown in the following screenshot.

You can also deploy the model from this view. You can configure endpoint settings such as the instance type, number of instances, and endpoint name. You will need to accept the End User License Agreement (EULA) before you can deploy the model.

Fine-tune using the SageMaker Python SDK

You can also fine-tune Meta Llama 3.2 models using the SageMaker Python SDK. A sample notebook with the full instructions can be found on GitHub. The following code example demonstrates how to fine-tune the Meta Llama 3.2 1B model:

from sagemaker.jumpstart.estimator import JumpStartEstimator

# To fine-tune the Llama 3.2 3B model available on JumpStart, change model_id to `meta-textgeneration-llama-3-2-3b`.
model_id = "meta-textgeneration-llama-3-2-1b"
accept_eula = "true"

# Placeholder: replace with the S3 URI of your own training dataset.
train_data_location = "s3://<your-bucket>/<your-training-data-prefix>/"

estimator = JumpStartEstimator(
    model_id=model_id, environment={"accept_eula": accept_eula}
)

# Instruction tuning is disabled by default; set instruction_tuned="True" to train on an instruction tuning dataset.
estimator.set_hyperparameters(instruction_tuned="True", epoch="5", max_input_length="1024")
estimator.fit({"training": train_data_location})

The code sets up a SageMaker JumpStart estimator for fine-tuning the Meta Llama 3.2 large language model (LLM) on a custom training dataset. It configures the estimator with the desired model ID, accepts the EULA, enables instruction tuning by setting instruction_tuned="True", sets the number of training epochs, and initiates the fine-tuning process.

When the fine-tuning job is complete, you can deploy the fine-tuned model directly from the estimator, as shown in the following code. As part of the deploy settings, you can define the instance type you want to deploy the model on. For the full list of deployment parameters, refer to the deploy parameters in the SageMaker SDK documentation.

finetuned_predictor = estimator.deploy(instance_type='ml.g5.xlarge')

After the endpoint is up and running, you can perform an inference request against it using the predictor object as follows:

prompt = "Your prompt goes here"
payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 256},
    }
response = finetuned_predictor.predict(payload)
response.get('generated_text')

For the full list of predictor parameters, refer to the predictor object in the SageMaker SDK documentation.
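The shape of the response can differ between model versions and serving containers, so a small defensive helper can be useful when reading the generated text. The following is a minimal sketch, assuming the response is either a dictionary with a generated_text key (as in the preceding code) or a list of such dictionaries; the helper name extract_generated_text is hypothetical and not part of the SageMaker SDK.

```python
from typing import Any, Optional


def extract_generated_text(response: Any) -> Optional[str]:
    """Return the generated text from a dict or a list-of-dicts response."""
    # Assumption: some endpoints return a list with one item per input prompt.
    if isinstance(response, list):
        response = response[0] if response else None
    if isinstance(response, dict):
        return response.get("generated_text")
    return None


# Stand-in responses for illustration; no endpoint is called here.
print(extract_generated_text({"generated_text": "Hello"}))
print(extract_generated_text([{"generated_text": "Hello"}]))
```

With a live endpoint, you would pass the value returned by finetuned_predictor.predict(payload) to the helper instead of the stand-in dictionaries.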

Fine-tuning technique

Language models such as Meta Llama are more than 10 GB or even 100 GB in size. Fine-tuning such large models requires instances with significantly higher CUDA memory. Furthermore, training these models can be very slow due to their size. Therefore, for efficient fine-tuning, we use the following optimizations:

Low-Rank Adaptation (LoRA) – This is a type of parameter-efficient fine-tuning (PEFT) for large models. In this method, we freeze the whole model and add only a small set of trainable parameters or layers to it. For instance, instead of training all 3 billion parameters for Meta Llama 3.2 3B, we can fine-tune less than 1% of the parameters. This significantly reduces the memory requirement because gradients, optimizer states, and other training-related information need to be stored for only that 1% of the parameters. It also reduces both training time and cost. For more details on this method, refer to LoRA: Low-Rank Adaptation of Large Language Models.
Int8 quantization – Even with optimizations such as LoRA, models like Meta Llama 70B require significant computational resources for training. To reduce the memory footprint during training, we can employ Int8 quantization. Quantization typically reduces the precision of the floating-point data types. Although this decreases the memory required to store model weights, it can degrade performance due to loss of information. Int8 quantization uses only a quarter of the bits of full-precision (32-bit) training, yet it doesn’t incur significant degradation in performance. Instead of simply dropping bits, Int8 quantization rounds the data from one type to another, preserving the essential information while optimizing memory usage. To learn about Int8 quantization, refer to LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale.
Fully Sharded Data Parallel (FSDP) – This is a type of data parallel training algorithm that shards the model’s parameters across data parallel workers and can optionally offload part of the training computation to the CPUs. Although the parameters are sharded across different GPUs, computation of each microbatch is local to the GPU worker. It shards parameters more uniformly and achieves optimized performance through communication and computation overlapping during training.
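To make the first two optimizations more concrete, the following sketch works through the arithmetic in plain Python. It is an illustration under stated assumptions, not the code that SageMaker JumpStart runs: the 2048 x 2048 layer shape is assumed for the example, and absmax_quantize is a simplified absmax scheme that omits the outlier handling described in the LLM.int8() paper.

```python
def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    """Parameters added by LoRA for one d_out x d_in weight matrix.

    LoRA learns B (d_out x r) and A (r x d_in); the original weight stays frozen.
    """
    return r * (d_out + d_in)


def absmax_quantize(values, bits=8):
    """Simplified absmax quantization: scale to the signed integer range, then round."""
    qmax = 2 ** (bits - 1) - 1  # 127 for int8
    scale = max(abs(v) for v in values) / qmax
    if scale == 0:
        scale = 1.0  # all-zero input: any scale works
    quantized = [round(v / scale) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate floating-point values."""
    return [q * scale for q in quantized]


# Assumed layer shape for illustration (not read from a model config).
d = 2048
full = d * d
added = lora_trainable_params(d, d, r=8)
print(f"LoRA adds {added} trainable parameters next to {full} frozen ones ({added / full:.2%})")

# Rounding, not truncation: each value moves to the nearest representable level.
weights = [0.5, -1.27, 0.003, 1.0]
q, scale = absmax_quantize(weights)
print(q, [round(x, 3) for x in dequantize(q, scale)])
```

The rank r=8 matches the default lora_r value; a larger rank adds proportionally more trainable parameters.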

The following table lists the JumpStart model IDs, default instance types, and supported instance types for fine-tuning the two Meta Llama 3.2 models.

Model	JumpStart Model IDs	Default Instance Type	Supported Instance Types for Fine-Tuning
Meta Llama 3.2 1B	meta-textgeneration-llama-3-2-1b, meta-textgeneration-llama-3-2-1b-instruct	ml.g5.2xlarge	ml.g5.2xlarge, ml.g5.4xlarge, ml.g5.8xlarge, ml.g5.12xlarge, ml.p3dn.24xlarge, ml.g4dn.12xlarge, ml.p5.48xlarge
Meta Llama 3.2 3B	meta-textgeneration-llama-3-2-3b, meta-textgeneration-llama-3-2-3b-instruct	ml.g5.12xlarge	ml.g5.12xlarge, ml.g5.24xlarge, ml.g5.48xlarge, ml.p3dn.24xlarge, ml.g4dn.12xlarge, ml.p5.48xlarge

Other instance types may also work for fine-tuning. When using p3 instances, training will be done with 32-bit precision because bfloat16 is not supported on these instances. Therefore, the training job would consume double the amount of CUDA memory when training on p3 instances compared to g5 instances.
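The effect of precision on memory can be estimated with simple arithmetic. The following sketch counts only the memory needed to hold the model weights; gradients, optimizer states, and activations add to this and are not modeled here. The parameter counts are the nominal values from the model names, which is an assumption.

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory needed to hold the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Nominal parameter counts taken from the model names (assumption).
for name, n in [("Meta Llama 3.2 1B", 1e9), ("Meta Llama 3.2 3B", 3e9)]:
    for dtype in BYTES_PER_PARAM:
        print(f"{name} in {dtype}: {weight_memory_gib(n, dtype):.2f} GiB")
```

The float32 figure is exactly twice the bfloat16 figure, which is the doubling described for p3 instances.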

Training dataset format

SageMaker JumpStart currently supports datasets in both domain adaptation format and instruction tuning format. In this section, we specify an example dataset in both formats. For more details, refer to the Dataset formatting section in the appendix.

Domain adaptation format

You can fine-tune the Meta Llama 3.2 text generation model on domain-specific datasets, enabling it to generate relevant text and tackle various natural language processing (NLP) tasks within a particular domain using few-shot prompting. This fine-tuning process involves providing the model with a dataset specific to the target domain. The dataset can be in various formats, such as CSV, JSON, or TXT files. For example, if you want to fine-tune the model for the domain of financial reports and filings, you could provide it with a text file containing SEC filings from a company like Amazon. The following is an excerpt from such a filing:

This report includes estimates, projections, statements relating to our
business plans, objectives, and expected operating results that are “forward-
looking statements” within the meaning of the Private Securities Litigation
Reform Act of 1995, Section 27A of the Securities Act of 1933, and Section 21E
of the Securities Exchange Act of 1934. Forward-looking statements may appear
throughout this report, including the following sections: “Business” (Part I,
Item 1 of this Form 10-K), “Risk Factors” (Part I, Item 1A of this Form 10-K),
and “Management’s Discussion and Analysis of Financial Condition and Results
of Operations” (Part II, Item 7 of this Form 10-K). These forward-looking
statements generally are identified by the words “believe,” “project,”
“expect,” “anticipate,” “estimate,” “intend,” “strategy,” “future,”
“opportunity,” “plan,” “may,” “should,” “will,” “would,” “will be,” “will
continue,” “will likely result,” and similar expressions.

Instruction tuning format

In instruction fine-tuning, the model is fine-tuned for a set of NLP tasks described using instructions. This helps improve the model’s performance for unseen tasks with zero-shot prompts. In instruction tuning dataset format, you specify the template.json file describing the input and the output formats and the train.jsonl file with the training data item in each line.

The template.json file always has the following JSON format:

{
  "prompt": "<<Prompt goes here along with question or context or instruction>>",
  "completion": "<<completion goes here depending on the activity, for ex: answer for Q&A or summary for Summarization task>>"
}

For instance, the following table shows the template.json and train.jsonl files for the Dolly and Dialogsum datasets.

Dataset	Use Case	template.json	train.jsonl
Dolly	Question Answering	{ "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n", "completion": " {response}" }	{ "instruction": "Who painted the Two Monkeys", "context": "Two Monkeys or Two Chained Monkeys is a 1562 painting by Dutch and Flemish Renaissance artist Pieter Bruegel the Elder. The work is now in the Gemäldegalerie (Painting Gallery) of the Berlin State Museums.", "response": "The two Monkeys or Two Chained Monkeys is a 1562 painting by Dutch and Flemish Renaissance artist Pieter Bruegel the Elder. The work is now in the Gemaeldegalerie (Painting Gallery) of the Berlin State Museums." }
Dialogsum	Text Summarization	{ "prompt": "Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.\n\n### Instruction:\n{dialogue}\n\n", "completion": " {summary}" }	{ "dialogue": "#Person1#: Where do these flower vases come from? \n#Person2#: They are made a town nearby. The flower vases are made of porcelain and covered with tiny bamboo sticks. \n#Person1#: Are they breakable? \n#Person2#: No. They are not only ornmamental, but also useful. \n#Person1#: No wonder it’s so expensive. ", "summary": "#Person2# explains the flower vases’ materials and advantages and #Person1# understands why they’re expensive." }
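The relationship between the two files can be sketched in a few lines of Python. This is an illustration only: it assumes that the placeholders in template.json are filled from the matching keys of each train.jsonl record, and it uses str.format to show the idea, which may differ from what the JumpStart training script does internally. The record is abbreviated from the Dolly example.

```python
import json

template = {
    "prompt": (
        "Below is an instruction that describes a task, paired with an input "
        "that provides further context. Write a response that appropriately "
        "completes the request.\n\n### Instruction:\n{instruction}\n\n"
        "### Input:\n{context}\n\n"
    ),
    "completion": " {response}",
}

# Abbreviated record for illustration.
records = [
    {
        "instruction": "Who painted the Two Monkeys",
        "context": "Two Monkeys is a 1562 painting by Pieter Bruegel the Elder.",
        "response": "Pieter Bruegel the Elder.",
    }
]

# template.json holds one JSON object; train.jsonl holds one JSON object per line.
template_json = json.dumps(template, indent=2)
train_jsonl = "\n".join(json.dumps(r) for r in records)

# How a template and a record could combine into one training example.
example = records[0]
prompt_text = template["prompt"].format(**example)
completion_text = template["completion"].format(**example)
print(prompt_text + completion_text)
```

Writing template_json to template.json and train_jsonl to train.jsonl gives the two files in the layout described above.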

Supported hyperparameters for training

The fine-tuning process for Meta Llama 3.2 models allows you to customize various hyperparameters, each of which can influence factors such as memory consumption, training speed, and the performance of the fine-tuned model. At the time of writing this post, the following are the default hyperparameter values. For the most up-to-date information, refer to the SageMaker Studio console, because these values may be subject to change.

int8_quantization – If True, the model is loaded with 8-bit precision for training. Default for Meta Llama 3.2 1B and Meta Llama 3.2 3B is False.
enable_fsdp – If True, training uses FSDP. Default for Meta Llama 3.2 1B and Meta Llama 3.2 3B is True.
epoch – The number of passes that the fine-tuning algorithm takes through the training dataset. Must be an integer greater than 1. Default is 5.
learning_rate – The rate at which the model weights are updated after working through each batch of training examples. Must be a positive float greater than 0. Default is 0.0001.
lora_r – LoRA R dimension. Must be a positive integer. Default is 8.
lora_alpha – LoRA Alpha. Must be a positive integer. Default is 32.
target_modules – Target modules for LoRA fine-tuning. You can specify a subset of ['q_proj','v_proj','k_proj','o_proj','gate_proj','up_proj','down_proj'] modules as a string separated by a comma without any spaces. Default is q_proj,v_proj.
lora_dropout – LoRA dropout. Must be a positive float between 0–1. Default is 0.05.
instruction_tuned – Whether to instruction-train the model or not. At most, one of instruction_tuned and chat_dataset can be True. Must be True or False. Default is False.
chat_dataset – If True, dataset is assumed to be in chat format. At most, one of instruction_tuned and chat_dataset can be True. Default is False.
add_input_output_demarcation_key – For an instruction tuned dataset, if this is True, a demarcation key ("### Response:\n") is added between the prompt and completion before training. Default is True.
per_device_train_batch_size – The batch size per GPU core/CPU for training. Default is 4.
per_device_eval_batch_size – The batch size per GPU core/CPU for evaluation. Default is 1.
max_train_samples – For debugging purposes or quicker training, truncate the number of training examples to this value. Value -1 means using all of the training samples. Must be a positive integer or -1. Default is -1.
max_val_samples – For debugging purposes or quicker training, truncate the number of validation examples to this value. Value -1 means using all of the validation samples. Must be a positive integer or -1. Default is -1.
seed – Random seed that will be set at the beginning of training. Default is 10.
max_input_length – Maximum total input sequence length after tokenization. Sequences longer than this will be truncated. If -1, max_input_length is set to the minimum of 1024 and the maximum model length defined by the tokenizer. If set to a positive value, max_input_length is set to the minimum of the provided value and the model_max_length defined by the tokenizer. Must be a positive integer or -1. Default is -1.
validation_split_ratio – If the validation channel is None, the ratio of the train-validation split from the training data. Must be between 0–1. Default is 0.2.
train_data_split_seed – If validation data is not present, this fixes the random splitting of the input training data to training and validation data used by the algorithm. Must be an integer. Default is 0.
preprocessing_num_workers – The number of processes to use for preprocessing. If None, the main process is used for preprocessing. Default is None.
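Several of the constraints in this list can be checked locally before a training job is submitted. The following is a minimal sketch of such a check; check_hyperparameters is a hypothetical helper, not part of the SageMaker SDK, and it covers only a few of the rules listed above.

```python
ALLOWED_TARGET_MODULES = {
    "q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj",
}


def check_hyperparameters(hp: dict) -> list:
    """Return a list of problems found in a hyperparameter dict (empty if none)."""
    problems = []

    def is_true(key):
        return str(hp.get(key, "False")).lower() == "true"

    # At most one of instruction_tuned and chat_dataset can be True.
    if is_true("instruction_tuned") and is_true("chat_dataset"):
        problems.append("instruction_tuned and chat_dataset cannot both be True")

    # target_modules is a comma-separated string without spaces.
    modules = str(hp.get("target_modules", "q_proj,v_proj"))
    if " " in modules:
        problems.append("target_modules must not contain spaces")
    names = modules.replace(" ", "").split(",")
    unknown = [m for m in names if m not in ALLOWED_TARGET_MODULES]
    if unknown:
        problems.append(f"unknown target_modules: {unknown}")

    # lora_dropout and validation_split_ratio are described as lying between 0 and 1.
    for key, default in (("lora_dropout", 0.05), ("validation_split_ratio", 0.2)):
        if not 0 <= float(hp.get(key, default)) <= 1:
            problems.append(f"{key} must be between 0 and 1")

    return problems


print(check_hyperparameters({"instruction_tuned": "True", "epoch": "5"}))
print(check_hyperparameters({"instruction_tuned": "True", "chat_dataset": "True",
                             "target_modules": "q_proj, v_proj"}))
```

Values are compared as strings because the earlier example passes hyperparameters such as instruction_tuned="True" as strings.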

Instance types and compatible hyperparameters

The memory requirement during fine-tuning may vary based on several factors:

Model type – The 1B model has the smallest GPU memory requirement and the 3B model has a higher memory requirement
Max input length – A higher value of input length leads to processing more tokens at a time and as such requires more CUDA memory
Batch size – A larger batch size requires larger CUDA memory and therefore requires larger instance types
Int8 quantization – If using Int8 quantization, the model is loaded into low precision mode and therefore requires less CUDA memory

To help you get started, we provide a set of combinations of different instance types, hyperparameters, and model types that can be successfully fine-tuned. You can select a configuration as per your requirements and availability of instance types. We fine-tune both models on a variety of settings with three epochs on a subset of the Dolly dataset with summarization examples.

The results for fine-tuning the models are shown in the appendix at the end of this post. As we can see from these results, fine-tuning improves summarization compared to non-fine-tuned models.

Meta Llama 3.2 1B fine-tuning with various hyperparameters

The following table summarizes the different hyperparameters for fine-tuning Meta Llama 3.2 1B.

Instance Type	Max Input Length	Per Device Training Batch Size	Int8 Quantization	Enable FSDP	Time Taken (Minutes)
ml.g5.2xlarge	1024	4	FALSE	TRUE	11.3
ml.g5.2xlarge	1024	8	FALSE	TRUE	11.12
ml.g5.2xlarge	1024	4	FALSE	FALSE	14.55
ml.g5.2xlarge	2048	4	FALSE	TRUE	10.95
ml.g5.2xlarge	1024	4	TRUE	FALSE	17.82
ml.g5.2xlarge	2048	4	TRUE	FALSE	17.4
ml.g5.2xlarge	1024	8	TRUE	FALSE	16.97
ml.g5.4xlarge	1024	8	FALSE	TRUE	11.28
ml.g5.4xlarge	1024	4	FALSE	TRUE	11.48
ml.g5.4xlarge	2048	4	FALSE	TRUE	11.27
ml.g5.4xlarge	1024	4	FALSE	FALSE	14.8
ml.g5.4xlarge	1024	4	TRUE	FALSE	17.38
ml.g5.4xlarge	1024	8	TRUE	FALSE	16.63
ml.g5.4xlarge	2048	4	TRUE	FALSE	16.8
ml.g5.8xlarge	1024	4	FALSE	TRUE	11.12
ml.g5.8xlarge	2048	4	FALSE	TRUE	10.87
ml.g5.8xlarge	1024	8	FALSE	TRUE	10.88
ml.g5.8xlarge	1024	4	FALSE	FALSE	14.47
ml.g5.8xlarge	1024	4	TRUE	FALSE	17.82
ml.g5.8xlarge	1024	8	TRUE	FALSE	17.13
ml.g5.8xlarge	2048	4	TRUE	FALSE	17.13
ml.g5.12xlarge	2048	4	FALSE	FALSE	14.72
ml.g5.12xlarge	1024	4	FALSE	TRUE	10.45
ml.g5.12xlarge	1024	8	TRUE	FALSE	17.23
ml.g5.12xlarge	1024	8	FALSE	FALSE	14.03
ml.g5.12xlarge	1024	4	FALSE	FALSE	14.22
ml.g5.12xlarge	1024	4	TRUE	FALSE	18.07
ml.g5.12xlarge	2048	4	TRUE	FALSE	18.15
ml.g5.12xlarge	2048	4	FALSE	TRUE	8.45
ml.g5.12xlarge	1024	8	FALSE	TRUE	8.87
ml.g4dn.12xlarge	1024	8	FALSE	TRUE	21.15
ml.g4dn.12xlarge	1024	4	TRUE	FALSE	35.12
ml.g4dn.12xlarge	1024	4	FALSE	TRUE	22.42
ml.g4dn.12xlarge	1024	4	FALSE	FALSE	34.62
ml.g4dn.12xlarge	2048	4	FALSE	TRUE	23.25

Meta Llama 3.2 3B fine-tuning with various hyperparameters

The following table summarizes the different hyperparameters for fine-tuning Meta Llama 3.2 3B.

Instance Type	Max Input Length	Per Device Training Batch Size	Int8 Quantization	Enable FSDP	Time Taken (Minutes)
ml.g5.12xlarge	1024	8	TRUE	FALSE	29.18
ml.g5.12xlarge	2048	4	TRUE	FALSE	29.8
ml.g5.12xlarge	1024	4	FALSE	FALSE	26.2
ml.g5.12xlarge	1024	8	FALSE	TRUE	12.88
ml.g5.12xlarge	2048	4	FALSE	TRUE	11.8
ml.g5.12xlarge	1024	4	FALSE	TRUE	14.98
ml.g5.12xlarge	1024	4	TRUE	FALSE	30.05
ml.g5.12xlarge	1024	4	TRUE	FALSE	29.87
ml.g5.24xlarge	1024	4	FALSE	FALSE	25.97
ml.g5.24xlarge	1024	4	FALSE	TRUE	14.65
ml.g5.24xlarge	1024	4	TRUE	FALSE	29.32
ml.g5.24xlarge	2048	4	TRUE	FALSE	29.77
ml.g5.24xlarge	1024	8	TRUE	FALSE	28.78
ml.g5.24xlarge	2048	4	FALSE	TRUE	11.62
ml.g5.24xlarge	1024	8	FALSE	TRUE	12.38
ml.g5.48xlarge	1024	8	FALSE	TRUE	14.25
ml.g5.48xlarge	1024	4	FALSE	FALSE	26.2
ml.g5.48xlarge	2048	4	FALSE	TRUE	13.32
ml.g5.48xlarge	1024	4	FALSE	TRUE	16.73
ml.g5.48xlarge	1024	4	TRUE	FALSE	30.3
ml.g5.48xlarge	2048	4	FALSE	FALSE	28.7
ml.g5.48xlarge	1024	8	FALSE	FALSE	25.6
ml.g5.48xlarge	1024	8	TRUE	FALSE	29.33
ml.g5.48xlarge	2048	4	TRUE	FALSE	30.63

Recommendations on instance types and hyperparameters

When fine-tuning for the model’s accuracy, keep in mind the following:

Larger models such as 3B provide better performance than 1B
Performance without Int8 quantization is better than performance with Int8 quantization

Note the following training time and CUDA memory requirements:

Setting int8_quantization=True decreases the memory requirement.
The combination of per_device_train_batch_size, int8_quantization, and enable_fsdp settings affects the training times. When using a larger batch size with FSDP enabled, the training times are faster compared to using a larger batch size without FSDP.
Decreasing per_device_train_batch_size and max_input_length reduces the memory requirement and therefore can be run on smaller instances. However, setting very low values may increase the training time.
If you’re not using Int8 quantization (int8_quantization=False), use FSDP (enable_fsdp=True) for faster and efficient training.
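These observations can be checked against the measurements reported earlier in this post. The following sketch groups the ml.g5.12xlarge rows of the Meta Llama 3.2 3B table by their int8_quantization and enable_fsdp settings and averages the training times; the rows are copied from that table, and the grouping itself is only an illustration.

```python
from collections import defaultdict
from statistics import mean

# (max_input_length, batch_size, int8_quantization, enable_fsdp, minutes)
# Rows copied from the Meta Llama 3.2 3B table for ml.g5.12xlarge.
rows = [
    (1024, 8, True, False, 29.18),
    (2048, 4, True, False, 29.8),
    (1024, 4, False, False, 26.2),
    (1024, 8, False, True, 12.88),
    (2048, 4, False, True, 11.8),
    (1024, 4, False, True, 14.98),
    (1024, 4, True, False, 30.05),
    (1024, 4, True, False, 29.87),
]

by_setting = defaultdict(list)
for _, _, int8, fsdp, minutes in rows:
    by_setting[(int8, fsdp)].append(minutes)

average_minutes = {k: round(mean(v), 2) for k, v in by_setting.items()}
for (int8, fsdp), avg in sorted(average_minutes.items(), key=lambda kv: kv[1]):
    print(f"int8_quantization={int8}, enable_fsdp={fsdp}: {avg} min on average")
```

On these rows, the runs with FSDP enabled and Int8 quantization disabled have the shortest average training time.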

When choosing the instance type, consider the following:

At the time of writing this post, the G5 instances provided the most efficient training among the supported instance types. However, because AWS regularly updates and introduces new instance types, we recommend that you validate the recommended instance type for Meta Llama 3.2 fine-tuning in the SageMaker documentation or SageMaker console before proceeding.
Training time largely depends on the number of GPUs and the CUDA memory available. Training times on instances with the same number of GPUs (for example, ml.g5.2xlarge and ml.g5.4xlarge) are therefore roughly the same, so you can use the more cost-effective instance for training (ml.g5.2xlarge).

To learn about the cost of training per instance, refer to Amazon EC2 G5 Instances.

If your dataset is in instruction tuning format, where each sample consists of an instruction (input) and the desired model response (completion), and these input+completion sequences are short (for example, 50–100 words), using a high value for max_input_length can lead to poor performance. This is because the model may struggle to focus on the relevant information when dealing with a large number of padding tokens, and it can also lead to inefficient use of computational resources. The default value of -1 corresponds to a max_input_length of 1024 for Meta Llama models. We recommend setting max_input_length to a smaller value (for example, 200–400) when working with datasets containing shorter input+completion sequences to mitigate these issues and potentially improve the model’s performance and efficiency.
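The rule for resolving max_input_length, and the amount of padding it can imply, can be written out as follows. This is a sketch under assumptions: resolve_max_input_length restates the rule from the hyperparameter list, the tokenizer limit of 131072 is an assumed value, and padding_fraction assumes every sequence is padded up to the effective maximum length.

```python
def resolve_max_input_length(max_input_length: int, model_max_length: int) -> int:
    """Effective sequence length following the rule described for max_input_length."""
    if max_input_length == -1:
        return min(1024, model_max_length)
    if max_input_length > 0:
        return min(max_input_length, model_max_length)
    raise ValueError("max_input_length must be a positive integer or -1")


def padding_fraction(sequence_tokens: int, max_input_length: int) -> float:
    """Share of a padded sequence that is padding (assumes padding to max length)."""
    used = min(sequence_tokens, max_input_length)
    return 1 - used / max_input_length


# Assumed tokenizer limit for illustration.
effective = resolve_max_input_length(-1, model_max_length=131072)
print(effective)
print(f"{padding_fraction(100, effective):.0%} padding at the default length")
print(f"{padding_fraction(100, 256):.0%} padding with max_input_length=256")
```

For a 100-token sample, lowering max_input_length from the default to 256 removes most of the padding in this model of the problem.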

Lastly, due to the high demand of the G5 instances, you may experience unavailability of these instances in your AWS Region with the error “CapacityError: Unable to provision requested ML compute capacity. Please retry using a different ML instance type.” If you experience this error, retry the training job or try a different Region.
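One way to handle this error programmatically is to retry and then fall back to another supported instance type, as the error message suggests. The following sketch shows the control flow only. launch_with_fallback, fake_launch, and FakeCapacityError are hypothetical names introduced for this example; in practice you would pass your own function that creates the estimator and calls fit, together with the exception class your SDK version raises for capacity errors.

```python
import time
from typing import Callable, Iterable, Type


def launch_with_fallback(
    launch: Callable[[str], object],
    instance_types: Iterable[str],
    capacity_error: Type[Exception],
    retries_per_type: int = 2,
    wait_seconds: float = 0.0,
):
    """Try each instance type in order, retrying when a capacity error is raised."""
    last_error = None
    for instance_type in instance_types:
        for _ in range(retries_per_type):
            try:
                return instance_type, launch(instance_type)
            except capacity_error as err:
                last_error = err
                time.sleep(wait_seconds)
    raise RuntimeError("No capacity on any candidate instance type") from last_error


class FakeCapacityError(Exception):
    """Stand-in for the SDK's capacity exception (assumption for this demo)."""


def fake_launch(instance_type):
    # Simulates one instance type being unavailable; no AWS call is made.
    if instance_type == "ml.g5.2xlarge":
        raise FakeCapacityError("Unable to provision requested ML compute capacity.")
    return f"job-on-{instance_type}"


print(launch_with_fallback(fake_launch, ["ml.g5.2xlarge", "ml.g5.4xlarge"], FakeCapacityError))
```

Other exceptions are not caught, so configuration errors still surface immediately instead of being retried.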

Issues when fine-tuning large models

In this section, we discuss two issues when fine-tuning very large models.

Disable output compression

By default, the output of a training job is a trained model that is compressed in a .tar.gz format before it’s uploaded to Amazon S3. However, for large models like the 70B model, this compression step can be time-consuming, taking more than 4 hours. To mitigate this delay, it’s recommended to use the disable_output_compression feature supported by the SageMaker training environment. When disable_output_compression is set to True, the model is uploaded without any compression, which can significantly reduce the time taken for large model artifacts to be uploaded to Amazon S3. The uncompressed model can then be used directly for deployment or further processing. The following code shows how to pass this parameter into the SageMaker JumpStart estimator:

estimator = JumpStartEstimator(
    model_id=model_id,
    environment={"accept_eula": "true"},
    disable_output_compression=True,
)

SageMaker Studio kernel timeout issue

The SageMaker Studio kernel is only used to initiate the training job, and its status doesn’t affect the ongoing training process. After the training job starts, the compute resources allocated for the job will continue running the training process, regardless of whether the SageMaker Studio kernel remains active or times out. If the kernel times out during the lengthy training process, you can still deploy the endpoint after training is complete using the training job name with the following code:

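from sagemaker.jumpstart.estimator import JumpStartEstimator

# Replace the placeholder with the name of your training job.
training_job_name = "<INSERT_TRAINING_JOB_NAME>"
# Use the same JumpStart model ID that the training job was started with.
model_id = "meta-textgeneration-llama-3-2-1b"

attached_estimator = JumpStartEstimator.attach(training_job_name, model_id)
attached_estimator.logs()
predictor = attached_estimator.deploy()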

To find the training job name, navigate to the SageMaker console and under Training in the navigation pane, choose Training jobs. Identify the training job name and substitute it in the preceding code.

Clean up

To prevent incurring unnecessary charges, it’s recommended to clean up the deployed resources when you’re done using them. You can remove the deployed model with the following code:

predictor.delete_predictor()

Conclusion

As generative AI models continue to evolve, their effectiveness hinges on the ability to adapt and specialize for domain-specific applications. Meta’s Llama 3.2 series, with its innovative multimodal features and flexible deployment options, provides a powerful foundation for building tailored AI solutions. By fine-tuning these models using SageMaker JumpStart, organizations can transform generalized capabilities into highly specialized tools, enhancing precision and delivering meaningful results for complex, real-world problems. Whether you’re aiming to improve document analysis, automate visual interpretation, or generate domain-specific content, Meta Llama 3.2 models, fine-tuned to your needs, can bridge the gap between broad AI functionalities and targeted expertise, driving impactful outcomes in your field.

In this post, we discussed fine-tuning Meta Llama 3.2 text generation models using SageMaker JumpStart. We showed that you can use the SageMaker JumpStart console in SageMaker Studio or the SageMaker Python SDK to fine-tune and deploy these models. We also discussed the fine-tuning technique, instance types, and supported hyperparameters. In addition, we outlined recommendations for optimized training based on various tests we carried out.

As shown in the results of fine-tuning the models over two datasets, fine-tuning improves summarization compared to non-fine-tuned models.

As a next step, you can try fine-tuning these models on your own dataset using the code provided in the GitHub repository to test and benchmark the results for your use cases.

About the Authors

Pavan Kumar Rao Navule is a Solutions Architect at Amazon Web Services, where he works with ISVs in India to help them innovate on the AWS platform. He is specialized in architecting AI/ML and generative AI services at AWS. Pavan is a published author for the book “Getting Started with V Programming.” In his free time, Pavan enjoys listening to the great magical voices of Sia and Rihanna.

Jin Tan Ruan is a Prototyping Developer at AWS, part of the AWSI Strategic Prototyping and Customer Engineering (PACE) team, where he focuses on NLP and generative AI. With nine AWS certifications and a robust background in software development, Jin uses his expertise to help AWS strategic customers bring their AI/ML and generative AI projects to life. He holds a Master’s degree in Machine Learning and Software Engineering from Syracuse University. Outside of work, Jin is an avid gamer and a fan of horror films. You can find Jin on LinkedIn to learn more!

Appendix

In this section, we present the results for fine-tuning the Meta Llama 3.2 1B and 3B text generation models on different datasets. This section also covers the dataset formatting for domain adaptation and instruction fine-tuning techniques.

Results for fine-tuning the Meta Llama 3.2 1B text generation model on the Dolly dataset

Inputs	Ground Truth	Response from non-finetuned model	Response from fine-tuned model
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: Write down some points on Yugoslav Cup using given paragraph as a base. ### Input: The Yugoslav Cup was a tournament for which clubs from all tiers of the football pyramid were eligible to enter. In addition, amateur teams put together by individual Yugoslav People’s Army garrisons and various factories and industrial plants were also encouraged to enter, which meant that each cup edition could have several thousands of teams in its preliminary stages. These teams would play through a number of qualifying rounds before reaching the first round proper, in which they would be paired with top-flight teams. ### Response:	1. Clubs from all levels of the football pyramid were eligible to participate in the Yugoslav Cup. 2. Additionally, amateur teams assembled by numerous enterprises and industrial plants as well as individual Yugoslav People’s Army garrisons were encouraged to compete, which meant that each cup edition may include thousands of teams in its preliminary rounds. 3. Prior to the first round proper, where they would be partnered with top-flight teams, these teams would compete in a number of qualification rounds.	The Yugoslav Cup was a tournament for which clubs from all tiers of the football pyramid were eligible to enter. In addition, amateur teams put together by individual Yugoslav People’s Army garrisons and various factories and industrial plants were also encouraged to enter, which meant that each cup edition could have several thousands of teams in its preliminary stages. These teams would play through a number of qualifying rounds before reaching the first round proper, in which they would be paired with top-flight teams. ### Input: The	1. The Yugoslav Cup was a tournament for which clubs from all tiers of the football pyramid were eligible to enter. 
In addition, amateur teams put together by individual Yugoslav People’s Army garrisons and various factories and industrial plants were also encouraged to enter, which meant that each cup edition could have several thousands of teams in its preliminary stages. These teams would play through a number of qualifying rounds before reaching the first round proper, in which they would be paired with top-flight teams.
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: What role did Danbury, Connecticut play in the American Revolution? ### Input: During the Revolutionary War, Danbury was an important military supply depot for the Continental Army. Sybil Ludington, 16-year-old daughter of American Colonel Henry Ludington, is said to have made a 40-mile ride in the early hours of the night on April 26, 1777, to warn the people of Danbury and her father’s forces in Putnam County, New York, of the approach of British regulars, helping them muster in defense; these accounts, originating from the Ludington family, are questioned by modern scholars. During the following day on April 26, 1777, the British, under Major General William Tryon, burned and sacked Danbury, but fatalities were limited due to Ludington’s warning. The central motto on the seal of the City of Danbury is Restituimus, (Latin for “We have restored”), a reference to the destruction caused by the Loyalist army troops. The American General David Wooster was mortally wounded at the Battle of Ridgefield by the British forces which had raided Danbury, but at the beginning of the battle, the Americans succeeded in driving the British forces down to Long Island Sound. Wooster is buried in Danbury’s Wooster Cemetery; the private Wooster School in Danbury also was named in his honor. ### Response:	Danbury, Connecticut served as an important military depot for the Continental Army during the revolutionary war. In April 26, 1777 the British Army attacked, burned and sacked the city, but the American troop were able to drive the British forces back to the Long Island sound.	Danbury was an important military supply depot for the Continental Army. 
Sybil Ludington, 16-year-old daughter of American Colonel Henry Ludington, is said to have made a 40-mile ride in the early hours of the night on April 26, 1777, to warn the people of Danbury and her father’s forces in Putnam County, New York, of the approach of British regulars, helping them muster in defense; these accounts, originating from the Ludington family,	During the Revolutionary War, Danbury was an important military supply depot for the Continental Army. Sybil Ludington, 16-year-old daughter of American Colonel Henry Ludington, is said to have made a 40-mile ride in the early hours of the night on April 26, 1777, to warn the people of Danbury and her father’s forces in Putnam County, New York, of the approach of British regulars, helping them muster in defense; these accounts, originating from
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: Give me a bulleted list of the main responses to “Bad Blood” listed in this text, including whether they were positive or negative. ### Input: “Bad Blood” is a song by American singer-songwriter Taylor Swift, taken from her fifth studio album 1989 (2014). Swift wrote the song with its producers Max Martin and Shellback. The album track is a pop song with stomping drums. A hip hop remix of “Bad Blood”, featuring American rapper Kendrick Lamar and additional production by Ilya, was released as the fourth single from 1989 on May 17, 2015, by Big Machine and Republic Records. The lyrics are about feelings of betrayal by a close friend. Upon the album’s release, critics expressed mixed opinions about “Bad Blood”, with some complimenting Swift’s defiant attitude and dubbing it an album highlight, while others criticized its production and lyrics. The remixed single was praised for the reworked instrumentation and Lamar’s verses, which others considered to be out of place on a pop song. Media outlets speculated that American singer Katy Perry is the subject of the song. The remix received a Grammy nomination for Best Pop Duo/Group Performance. “Bad Blood” was supported by a high-budget music video directed by Joseph Kahn and produced by Swift. It featured an ensemble cast consisting of many singers, actresses and fashion models, which received wide media coverage. Critics praised the video for its cinematic and futuristic visuals inspired by neo-noir styles. It won a Grammy Award for Best Music Video, and two MTV Video Music Awards for the Video of the Year and Best Collaboration. Commercially, “Bad Blood” reached number one in Australia, Canada, New Zealand, and Scotland, as well as the United States, where it topped the Billboard Hot 100, Adult Top 40 and Mainstream Top 40 charts. 
It has been certified triple platinum in Australia and Canada, and 6× platinum in the US. ### Response:	* positive about Taylor Swift’s defiant attitude * negative about the song’s production and lyrics * positive about the instrumentation and verses * positive about the accompanying music video’s visuals * positive in terms of topping several charts	1. “Bad Blood” is a song by American singer-songwriter Taylor Swift, taken from her fifth studio album 1989 (2014). Swift wrote the song with its producers Max Martin and Shellback. The album track is a pop song with stomping drums. A hip hop remix of “Bad Blood”, featuring American rapper Kendrick Lamar and additional production by Ilya, was released as the fourth single from 1989 on May 17, 2015, by Big Machine and Republic	1. The song is a pop song with stomping drums. 2. A hip hop remix of “Bad Blood”, featuring American rapper Kendrick Lamar and additional production by Ilya, was released as the fourth single from 1989 on May 17, 2015, by Big Machine and Republic Records. 3. The lyrics are about feelings of betrayal by a close friend. 4. Upon the album’s release, critics expressed mixed opinions about “Bad Blood”, with some complimenting Swift’s
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: What are the highlights of the Apollo 11 mission? ### Input: Apollo 11 (July 16–24, 1969) was the American spaceflight that first landed humans on the Moon. Commander Neil Armstrong and lunar module pilot Buzz Aldrin landed the Apollo Lunar Module Eagle on July 20, 1969, at 20:17 UTC, and Armstrong became the first person to step onto the Moon’s surface six hours and 39 minutes later, on July 21 at 02:56 UTC. Aldrin joined him 19 minutes later, and they spent about two and a quarter hours together exploring the site they had named Tranquility Base upon landing. Armstrong and Aldrin collected 47.5 pounds (21.5 kg) of lunar material to bring back to Earth as pilot Michael Collins flew the Command Module Columbia in lunar orbit, and were on the Moon’s surface for 21 hours, 36 minutes before lifting off to rejoin Columbia. ### Response:	Apollo 11 was the culmination of a multi year space race instigated by the Cold War, leading to the United States landing the first humans on the Moon. The mission had 3 personnel, required multiple years of preparation, and consisted of 8 key phases; launch, flight, lunar orbit, lunar descent, landing, lunar surface operations, lunar ascent, return, splashdown, and Quarantine. The Apollo 11 mission remains the most famous mission ever in the NASA program. The 3 astronauts were Michael Collins, Buzz Aldrin, and Neil Armstrong.	The Apollo 11 mission was a historic event that changed the course of human history. It was the first time humans had landed on the Moon, and it was the first time humans had walked on the Moon. The mission was a success, and it was a major step forward in the exploration of space. 
The mission was also a major step forward in the exploration of the Moon, and it was a major step forward in the exploration of the Moon, and it was a major step forward in the exploration	The Apollo 11 mission was the first manned mission to land humans on the Moon. The mission was led by Neil Armstrong and Buzz Aldrin, who landed the Apollo Lunar Module Eagle on July 20, 1969, at 20:17 UTC, and Armstrong became the first person to step onto the Moon’s surface six hours and 39 minutes later, on July 21 at 02:56 UTC. Aldrin joined him 19 minutes later, and they spent about two and a
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: Extract the key elements from the paragraph about Apple and list them in bullet points ### Input: Apple Inc. is an American multinational technology company headquartered in Cupertino, California. Apple is the largest technology company by revenue, totaling US$394.3 billion in 2022. As of March 2023, Apple is the world’s biggest company by market capitalization. As of June 2022, Apple is the fourth-largest personal computer vendor by unit sales and second-largest mobile phone manufacturer. It is one of the Big Five American information technology companies, alongside Alphabet (known for Google), Amazon, Meta (known for Facebook), and Microsoft. Apple was founded as Apple Computer Company on April 1, 1976, by Steve Wozniak, Steve Jobs and Ronald Wayne to develop and sell Wozniak’s Apple I personal computer. It was incorporated by Jobs and Wozniak as Apple Computer, Inc. in 1977. The company’s second computer, the Apple II, became a best seller and one of the first mass-produced microcomputers. Apple went public in 1980 to instant financial success. The company developed computers featuring innovative graphical user interfaces, including the 1984 original Macintosh, announced that year in a critically acclaimed advertisement. By 1985, the high cost of its products, and power struggles between executives, caused problems. Wozniak stepped back from Apple amicably and pursued other ventures, while Jobs resigned bitterly and founded NeXT, taking some Apple employees with him. As the market for personal computers expanded and evolved throughout the 1990s, Apple lost considerable market share to the lower-priced duopoly of the Microsoft Windows operating system on Intel-powered PC clones (also known as “Wintel”). 
In 1997, weeks away from bankruptcy, the company bought NeXT to resolve Apple’s unsuccessful operating system strategy and entice Jobs back to the company. Over the next decade, Jobs guided Apple back to profitability through a number of tactics including introducing the iMac, iPod, iPhone and iPad to critical acclaim, launching the “Think different” campaign and other memorable advertising campaigns, opening the Apple Store retail chain, and acquiring numerous companies to broaden the company’s product portfolio. When Jobs resigned in 2011 for health reasons, and died two months later, he was succeeded as CEO by Tim Cook. Apple became the first publicly traded U.S. company to be valued at over $1 trillion in August 2018, then $2 trillion in August 2020, and $3 trillion in January 2022. As of January 2023, it was valued at around $2.2 trillion. The company receives criticism regarding the labor practices of its contractors, its environmental practices, and its business ethics, including anti-competitive practices and materials sourcing. Nevertheless, the company has a large following and enjoys a high level of brand loyalty. It is ranked as one of the world’s most valuable brands. ### Response:	1. Apple is an American multination technology company with revenues of over $394.3 billion as of 2022. 2. Apple is the 4th largest personal computer vendor by unit sales and the second-largest mobile phone manufacturer. 3. Apple was founded by Steve Jobs, Steve Wozniak, and Rold Wayne on April 1, 1976. 4. Steve Jobs guided Apple to profitability through a number of tactics including introducing the iMac, iPod, iPhone, and Ipad when it was facing losses. 5. Apple was the first publicly traded US company to be valued at over $1 trillion in August 2018.	1. Apple Inc. is an American multinational technology company headquartered in Cupertino, California. Apple is the largest technology company by revenue, totaling US$394.3 billion in 2022. 
As of March 2023, Apple is the world’s biggest company by market capitalization. As of June 2022, Apple is the fourth-largest personal computer vendor by unit sales and second-largest mobile phone manufacturer. It is one of the Big Five American information technology companies, alongside Alphabet (known for Google	1. Apple Inc. is an American multinational technology company headquartered in Cupertino, California. 2. Apple is the largest technology company by revenue, totaling US$394.3 billion in 2022. 3. As of March 2023, Apple is the world’s biggest company by market capitalization. 4. As of June 2022, Apple is the fourth-largest personal computer vendor by unit sales and second-largest mobile phone manufacturer. 5. It is one of the Big Five American information

Results for fine-tuning the Meta Llama 3.2 1B text generation model on the Dialogsum dataset

Inputs	Ground Truth	Response from non-finetuned model	Response from fine-tuned model
Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation. ### Instruction: #Person1#: Wow, Amy, look at that number. #Person2#: What’s so great about it? #Person1#: What do you mean? It’s a scrim. #Person2#: Not to me, I wouldn’t be caught dead in that. #Person1#: I love that kind of stuff. #Person2#: Yeah, you seem to like those styles. #Person1#: Sure do. I get a kick out of looking punk. #Person2#: This one looks really expensive though. #Person1#: I don’t care how much it costs. I’m rolling in it. #Person2#: Lucky you! I can’t buy anything else. I’m flat broke. #Person1#: I got a replace my party outfit, I trashed it last week. #Person2#: Why did you drag me into this? I’ve spent a week salary. #Person1#: Are you on the level? You’re the one who called me, remember? #Person2#: Just joking, T. I’ve got some wonderful clothes. Let’s go and have a close look at your outfit. ### Response:	#Person1# trashed the party outfit so #Person1# needs a replacement. Amy jokes that she is broke. They then go to see the outfit.	#Person1#: I’m not sure I like it. It’s a little too much. #Person2#: I think it’s great. It’s a little bit of everything. #Person1#: I’m not sure I like it. It’s a little too much. #Person2#: I think it’s great. It’s a little bit of everything. #Person1#: I’m not sure I like it. It’s a little too much. #Person2#: I think it’s great. It’s a little bit of everything. #Person1#: I’m not sure I like it. It’s a little too much. #Person2#: I think it’s great. It’s a little bit of everything. #Person1#: I’m not sure I like it. It’s a little too much. #Person2#: I think it’s great. It’s a little bit of everything. #Person1#: I’m not sure I like it. It’s a little too much. #Person2#: I think it’s great. It’s a little bit of everything. #Person1#: I’m not sure I like it. It’s a little too much. #Person2#: I think it’s great. It’s a little bit of everything. 
#Person1#: I’m not sure I like it. It’s a little too much. #Person2#: I think it’s great. It’s a little bit of everything. #Person1#:	#Person1# and Amy are talking about the number of Amy’s outfit. #Person1# loves that kind of stuff and Amy is flat broke.
Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation. ### Instruction: #Person1#: So who does he think he is, anyway, I can’t believe the way that my boss has been ordering everyone around lately, I mean, it’s now like he is the CEO, or anything. #Person2#: Um. . . Actually I am guessing you didn’t get the MEMO. Your boss was selected to fill the vacancy in the CEO slot. He actually is the CEO now, or will be, as soon as the official press releases and hoopla is done with. #Person1#: Yikes! you are joking, right? I can’t believe it. No wonder he is so stressed and bossy lately. I thought he was just being ostentatious by giving orders to people and all the different departments. What big shoes to fill! #Person2#: No kidding! When the last CEO left, we were worried about the future of the company. #Person1#: What does a CEO do anyway? I know the CEO makes the most money in the company, but what does he actually do? #Person2#: He is responsible to the board of directors for everything that happens in the company. He or she must give leadership to all company officers, as CEO is also responsible for providing the guidance of philosophy of the company, and acting as official representative, or face of the company. #Person1#: Must be one smart guy. ### Response:	#Person1# complains about the style of CEO. #Person2# says that the boss is already a CEO candidate. #Person2# answers #Person1#’s question about the CEO’s function in a company.	#Person1#: I think you are right. I mean, I can’t believe that my boss is the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. 
I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the	#Person1# thinks #Person1#’s boss is being bossy lately. #Person2# tells #Person1# #Person2#’s boss is the CEO now. #Person1# thinks it’s a big shoes to fill.
Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation. ### Instruction: #Person1#: What does a housing go for in south mountain district? #Person2#: It depends on the type of the house you’re looking for. #Person1#: We’re looking for a three-bedroom home. #Person2#: What’s your general price range? #Person1#: Under 300, 000 RIB, we suppose. #Person2#: Umm, I think I would be hard to find something in that range in this area. You know the environment in the south mountain district is the nicest. #Person1#: What about a two-bedroom house? #Person2#: A two-bedroom house in this area is generally around 250, 000 RIB. #Person1#: I see. #Person2#: I recommend you buy a two-bedroom house. It is very suitable for a couple, and some two-bedroom houses are large enough for a one-child family. #Person1#: Okay, I will take it into consideration. ### Response:	#Person1# wants a three-bedroom home with a price under 300,000 RIB. #Person2# suggests buying a two-bedroom house for a one-child family with around 250, 000 RIB.	#Person1#: What does a housing go for in south mountain district? #Person2#: It depends on the type of the house you’re looking for. #Person1#: We’re looking for a three-bedroom home. #Person2#: What’s your general price range? #Person1#: Under 300, 000 RIB, we suppose. #Person2#: Umm, I think I would be hard to find something in that range in this area. You know the environment in the south mountain district is the nicest. #Person1#: What about a two-bedroom house? #Person2#: A two-bedroom house in this area is generally around 250, 000 RIB. #Person1#: I see. #Person2#: I recommend you buy a two-bedroom house. It is very suitable for a couple, and some two-bedroom houses are large enough for a one-child family. #Person1#: Okay, I will take it into consideration.	#Person1# and #Person2# are looking for a three-bedroom home. 
#Person2# recommends #Person1# to buy a two-bedroom house.
Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation. ### Instruction: #Person1#: Mr. Sharp, thank you for giving me this interview. So, my first question is, are you happy with your life? #Person2#: Am I happy? You may know you are the first person to have asked me this question directly. The others dared not because they were afraid I might say yes. They wanted me to say no so that they could make up stories about me. #Person1#: What kind of stories? #Person2#: Brian Sharpe, author of over 100 historical novels, lives in a lonely house 30 miles away from Sydney. But is he happy? #Person1#: Is that why you have refused many interviews? #Person2#: I’m tired of being told that I must be unhappy because I’m rich and successful. I’m not unhappy. #Person1#: Understood. And I’m wondering if you follow a similar pattern of writing. #Person2#: I know what you mean. Do I write similar stories? I say no. #Person1#: So, how do you create your heroes? #Person2#: My heroes live? They think for themselves. All their stories are drawn from life, my life, my friends’ lives, including my friends in many books. #Person1#: Sounds good to me. ### Response:	#Person1# interviews Mr. Sharp. Mr. Sharp tells #Person1# he’s happy and he doesn’t write similar stories. His stories are drawn from life.	#Person1#: I’m glad you’re happy. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. 
I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe	Mr. Sharp is happy with his life. He tells #Person1# he doesn’t follow a similar pattern of writing. He thinks for himself and all his stories are drawn from life.
Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation. ### Instruction: #Person1#: What’s the weather forecast for today? #Person2#: Cloudy in the morning, overcast in the afternoon. #Person1#: Any rain? #Person2#: No, but I wouldn’t be surprised if it rained tomorrow. #Person1#: I wouldn’t either. It’s been looking like rain for several days. ### Response:	#Person2# tells #Person1# the weather forecast for today.	#Person1#: It’s been looking like rain for several days. #Person2#: I wouldn’t be surprised if it rained tomorrow. #Person1#: I wouldn’t either. It’s been looking like rain for several days. #Person2#: It’s been looking like rain for several days. #Person1#: It’s been looking like rain for several days. #Person2#: I wouldn’t be surprised if it rained tomorrow. #Person1#: I wouldn’t either. It’s been looking like rain for several days. #Person2#: It’s been looking like rain for several days. #Person1#: It’s been looking like rain for several days. #Person2#: I wouldn’t be surprised if it rained tomorrow. #Person1#: I wouldn’t either. It’s been looking like rain for several days. #Person2#: It’s been looking like rain for several days. #Person1#: It’s been looking like rain for several days. #Person2#: I wouldn’t be surprised if it rained tomorrow. #Person1#: I wouldn’t either. It’s been looking like rain for several days. #Person2#: It’s been looking like rain for several days. #Person1#: It’s been looking like rain for several days. #Person2#: I wouldn’t be surprised if it rained tomorrow. #Person1#: I wouldn’t either. It’s been looking like rain for several days. #Person2#:	#Person1# and #Person2# talk about the weather forecast.

Results for fine-tuning the Meta Llama 3.2 3B text generation model on the Dolly dataset

Inputs	Ground Truth	Response from non-finetuned model	Response from fine-tuned model
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: Highlight relevant facts related to Heidelberg ### Input: Heidelberg (German: [ˈhaɪdl̩bɛʁk] (listen); Palatine German: Heidlberg) is a city in the German state of Baden-Württemberg, situated on the river Neckar in south-west Germany. As of the 2016 census, its population was 159,914, of which roughly a quarter consisted of students. Located about 78 km (48 mi) south of Frankfurt, Heidelberg is the fifth-largest city in Baden-Württemberg. Heidelberg is part of the densely populated Rhine-Neckar Metropolitan Region. Heidelberg University, founded in 1386, is Germany’s oldest and one of Europe’s most reputable universities. Heidelberg is a scientific hub in Germany and home to several internationally renowned research facilities adjacent to its university, including the European Molecular Biology Laboratory and four Max Planck Institutes. The city has also been a hub for the arts, especially literature, throughout the centuries, and it was designated a “City of Literature” by the UNESCO Creative Cities Network. Heidelberg was a seat of government of the former Electorate of the Palatinate and is a popular tourist destination due to its romantic cityscape, including Heidelberg Castle, the Philosophers’ Walk, and the Baroque old town. ### Response:	Heidelberg has the oldest University in Germany. The city is a scientific hub and home to several internationally renowned research facilities. It was designated a “City of Literature” by the UNESCO Creative Cities Network.	Heidelberg is a city in the German state of Baden-Württemberg, situated on the river Neckar in south-west Germany. As of the 2016 census, its population was 159,914, of which roughly a quarter consisted of students. Located about 78 km (48 mi) south of Frankfurt, Heidelberg is the fifth-largest city in Baden-Württemberg. 
Heidelberg is part of the densely populated Rhine-Neckar Metropolitan Region.	Heidelberg is a city in the German state of Baden-Württemberg, situated on the river Neckar in south-west Germany. As of the 2016 census, its population was 159,914, of which roughly a quarter consisted of students. Located about 78 km (48 mi) south of Frankfurt, Heidelberg is the fifth-largest city in Baden-Württemberg. Heidelberg is part of the densely populated Rhine-Neckar Metropolitan Region.
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: What is the Indianapolis 500? ### Input: The Indianapolis 500, formally known as the Indianapolis 500-Mile Race, and commonly called the Indy 500, is an annual automobile race held at Indianapolis Motor Speedway (IMS) in Speedway, Indiana, United States, an enclave suburb of Indianapolis. The event is traditionally held over Memorial Day weekend, usually the last weekend of May. It is contested as part of the IndyCar Series, the top level of American open-wheel car racing, a formula colloquially known as “Indy car racing”. The track itself is nicknamed the “Brickyard”, as the racing surface was paved in brick in the fall of 1909. One yard of brick remains exposed at the start/finish line. The event, billed as The Greatest Spectacle in Racing, is considered part of the Triple Crown of Motorsport along with the 24 Hours of Le Mans and the Monaco Grand Prix, with which it typically shares a date. ### Response:	The Indianapolis 500, formally known as the Indianapolis 500-Mile Race, and commonly called the Indy 500, is an annual automobile race held at Indianapolis Motor Speedway (IMS) in Speedway, Indiana, United States, an enclave suburb of Indianapolis. The event is traditionally held over Memorial Day weekend, usually the last weekend of May. It is contested as part of the IndyCar Series, the top level of American open-wheel car racing, a formula colloquially known as “Indy car racing”. The track itself is nicknamed the “Brickyard”, as the racing surface was paved in brick in the fall of 1909. One yard of brick remains exposed at the start/finish line. The event, billed as The Greatest Spectacle in Racing, is considered part of the Triple Crown of Motorsport along with the 24 Hours of Le Mans and the Monaco Grand Prix, with which it typically shares a date. 
The inaugural race was held in 1911 and was won by Ray Harroun. The event celebrated its 100th anniversary in 2011, and the 100th running was held in 2016. The event was put on hiatus twice, from 1917 to 1918 due to World War I and from 1942 to 1945 due to World War II. In two different periods, the race was part of FIA World Championships; between 1925 and 1928, the World Manufacturers’ Championship and between 1950 and 1960, the World Drivers’ Championship. Marcus Ericsson is the current champion. The most successful drivers are A. J. Foyt, Al Unser Sr., Rick Mears and Hélio Castroneves, each of whom has won the race four times. The active driver with the most victories is Hélio Castroneves. Rick Mears holds the record for most career pole positions with six. The most successful car owner is Roger Penske, owner of the Indianapolis Motor Speedway and Team Penske, which has 18 total wins and 18 poles. Penske also has five wins at the IndyCar Grand Prix, held on the combined road course. The event is steeped in tradition, in pre-race ceremonies, post-race celebrations, and race procedure. The most noteworthy and most popular traditions are the 33-car field lining up three-wide for the start, the annual singing of the chorus of “Back Home Again in Indiana,” and the victory lane bottle of milk. Also unique is that qualifying requires the driver to complete four, rather than one, timed laps. Qualifying has a separate weekend. The official attendance is not disclosed by Speedway management, but the permanent seating capacity is upwards of 250,000, and infield patrons raise the race-day attendance to approximately 300,000.	The Indianapolis 500 is an annual automobile race held at Indianapolis Motor Speedway (IMS) in Speedway, Indiana, United States, an enclave suburb of Indianapolis. The event is traditionally held over Memorial Day weekend, usually the last weekend of May. 
It is contested as part of the IndyCar Series, the top level of American open-wheel car racing, a formula colloquially known as “Indy car racing”. The track itself is nicknamed the “Brickyard”, as the racing surface was paved in	The Indianapolis 500 is an annual automobile race held at Indianapolis Motor Speedway (IMS) in Speedway, Indiana, United States, an enclave suburb of Indianapolis. The event is traditionally held over Memorial Day weekend, usually the last weekend of May. It is contested as part of the IndyCar Series, the top level of American open-wheel car racing, a formula colloquially known as “Indy car racing”. The track itself is nicknamed the “Brickyard”, as the racing surface was paved in
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: What are some interesting facts about Nelson Mandel Boulevard described on this paragraph? ### Input: Nelson Mandela Boulevard (Persian: بلوار نلسون ماندلا) (old name: Jordan Street and Africa Boulevard) still known as Jordan is an affluent and upper-class district in northern Tehran, and some people draw similarities between Jordan District as being much alike Kensington District in London, UK as the area is a mixture of residential and commercial locale, filled with the homes and businesses of many politicians, diplomats, expatriates, and artists. Before the Iranian Revolution in 1979 it was called Jordan street, named after the American presbyterian missionary Samuel M. Jordan and used to be one of the most popular avenue in Tehran, Iran. Renamed Nelson Mandela Boulevard in recent years, it is amongst the most famous streets in north Tehran after Valiasr Street which is the longest conventional street in the Middle East, spanning a south–north axis. It is also famous for being one of the liveliest streets of Tehran, experiencing regular traffic jams even at 2:00A.M during Summer. ### Response:	It holds a mixture of residential and commercial locale from upper-class citizens, like politicians, diplomats, expatriates and artists. It used to be called Jordan street before the Iranian revolution. It’s amongst the most famous streets of Tehran, experiencing regular traffic jams even at 2:00 AM during Summer.	Nelson Mandela Boulevard is a street in Tehran, Iran. It is one of the most famous streets in north Tehran after Valiasr Street which is the longest conventional street in the Middle East, spanning a south–north axis. It is also famous for being one of the liveliest streets of Tehran, experiencing regular traffic jams even at 2:00A.M during Summer. 
The street is named after the South African anti-apartheid revolutionary and former President of South Africa, Nelson Mandela.	1. Nelson Mandela Boulevard is a street in northern Tehran, Iran. 2. It was named after the American presbyterian missionary Samuel M. Jordan. 3. It was renamed Nelson Mandela Boulevard in recent years. 4. It is amongst the most famous streets in north Tehran. 5. It is also famous for being one of the liveliest streets of Tehran.
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: What are the Wimbledon Championships? ### Input: The Wimbledon Championships, commonly known simply as Wimbledon, is the oldest tennis tournament in the world and is widely regarded as the most prestigious. It has been held at the All England Club in Wimbledon, London, since 1877 and is played on outdoor grass courts, with retractable roofs over the two main courts since 2019. ### Response:	The Wimbledon Championships, commonly known simply as Wimbledon, is the oldest tennis tournament in the world and is widely regarded as the most prestigious. It has been held at the All England Club in Wimbledon, London, since 1877 and is played on outdoor grass courts, with retractable roofs over the two main courts since 2019. Wimbledon is one of the four Grand Slam tennis tournaments, the others being the Australian Open, the French Open, and the US Open. Wimbledon is the only major still played on grass, the traditional tennis playing surface. Also, it is the only Grand Slam that retains a night-time curfew, though matches can now continue until 11.00 pm under the lights. The tournament traditionally takes place over two weeks in late June and early July, starting on the last Monday in June and culminating with the Ladies’ and Gentlemen’s Singles Finals, scheduled for the Saturday and Sunday at the end of the second week. Five major events are held each year, with additional junior and invitational competitions also taking place. In 2009, Wimbledon’s Centre Court was fitted with a retractable roof to lessen the loss of playing time due to rain. A roof was operational over No. 1 Court from 2019, when a number of other improvements were made, including adding cushioned seating, a table and 10 independently operable cameras per court to capture the games. 
Wimbledon traditions include a strict all-white dress code for competitors, and royal patronage. Strawberries and cream are traditionally consumed at the tournament. Unlike other tournaments, advertising is minimal and low key from official suppliers such as Slazenger and Rolex. The relationship with Slazenger is the world’s longest-running sporting sponsorship, providing balls for the tournament since 1902. Due to the COVID-19 pandemic, 2020 Wimbledon was cancelled, the first cancellation of the tournament since World War II. The rescheduled 134th edition was staged from 28 June 2021 to 11 July 2021, following from the 2020 cancellation. The 135th edition was played between 27 June 2022 and 10 July 2022, and regularly scheduled play occurred on the middle Sunday for the first time. It marks the centenary of the inaugural championships staged at the Centre Court. The ATP, ITF, and WTA did not award ranking points for the 2022 tournament, due to controversy over the tournament excluding players representing Russia and Belarus. The 2023 Wimbledon Championships will be the 136th staging and will run from 3 July 2023 to 16 July 2023 and it will be the first event of King Charles III since the death of the former patron, Queen Elizabeth II on 8 September 2022.	The Wimbledon Championships, commonly known simply as Wimbledon, is the oldest tennis tournament in the world and is widely regarded as the most prestigious. It has been held at the All England Club in Wimbledon, London, since 1877 and is played on outdoor grass courts, with retractable roofs over the two main courts since 2019. The tournament consists of five men’s singles events and five women’s singles events. The men’s singles is the most prestigious of the five men’s singles events, and	The Wimbledon Championships, commonly known simply as Wimbledon, is the oldest tennis tournament in the world and is widely regarded as the most prestigious. 
It has been held at the All England Club in Wimbledon, London, since 1877 and is played on outdoor grass courts, with retractable roofs over the two main courts since 2019.
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: What are the different type of coats of dogs? ### Input: The coats of domestic dogs are of two varieties: “double” being familiar with dogs (as well as wolves) originating from colder climates, made up of a coarse guard hair and a soft down hair, or “single”, with the topcoat only. Breeds may have an occasional “blaze”, stripe, or “star” of white fur on their chest or underside. Premature graying can occur in dogs from as early as one year of age; this is associated with impulsive behaviors, anxiety behaviors, fear of noise, and fear of unfamiliar people or animals. ### Response:	The coat of a dog varies depending on its breed. Typically a dog could have either a “double” coat or a “single” coat. “Double” coats are originated from breeds that live in colder climates. This type of coat consist of a coarse guard hair and a soft down hair. Dogs with “single” coat only has the topcoat.	The coats of domestic dogs are of two varieties: “double” being familiar with dogs (as well as wolves) originating from colder climates, made up of a coarse guard hair and a soft down hair, or “single”, with the topcoat only. Breeds may have an occasional “blaze”, stripe, or “star” of white fur on their chest or underside. Premature graying can occur in dogs from as early as one year of age; this is associated with impulsive behaviors	There are two types of coats of dogs: double and single. Double coats are made up of a coarse guard hair and a soft down hair. Single coats only have a topcoat.

Results for fine-tuning the Meta Llama 3.2 3B text generation model on the Dialogsum dataset

Inputs	Ground Truth	Response from non-finetuned model	Response from fine-tuned model
Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation. ### Instruction: #Person1#: Farmington? Would you take a look at these catalogue ads that Roger did? I don’t think we can use them. #Person2#: Hmm. I see what you mean. Have you spoken with Roger? He usually does better work than this. #Person1#: I know, Mr. Farmington. But, I’m not sure that I should bring this up with him. I think it would be better if you spoke to him. #Person2#: All right. Leave these ads with me. I’ll speak to him this morning. ### Response:	#Person1# and Farmington feel dissatisfied with the ads that Roger did. Farmington will speak to Roger.	The conversation between Person1 and Person2 revolves around a set of catalogue ads that Person1 has found. Person1 expresses their concerns about the quality of the ads, suggesting that they may not be suitable for use. Person2 agrees with Person1’s concerns and suggests that they should speak with Roger, the person responsible for creating the ads. Person2 offers to take a look at the ads and speak with Roger in the morning, indicating that they will address the issue. The conversation is focused on finding a solution to the problem with the catalogue ads.	#Person1# thinks the catalogue ads are not good. Mr. Farmington will speak to Roger.
Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation. ### Instruction: #Person1#: May I help you? #Person2#: My daughter. She is missing. I don’t know where she is. #Person1#: What dose she look like? #Person2#: She has blond hair and blue eyes. #Person1#: What was she wearing? #Person2#: She has a yellow dress on and red sneakers. #Person1#: When did you last see her? #Person2#: I just saw her down the street. I don’t know where she is. I don’t know. . . #Person1#: How long has it been? #Person2#: Oh, it’s been uh. . . fifteen minutes. ### Response:	#Person2# comes to #Person1# to report that #Person2#’s daughter is missing. #Person1# asks about her appearance and whereabouts.	The conversation between Person1 and Person2 revolves around a missing child. Person2 reports that their daughter is missing, and Person1 offers to help. Person2 describes the child’s appearance, including her blond hair, blue eyes, yellow dress, and red sneakers. Person1 asks about the last time Person2 saw the child and when it was. Person2 mentions that they saw the child down the street, but are unsure of her current location. Person1 presses for more information, asking how long it has been since Person2 last saw the child. Person2 estimates that it has been 15 minutes. The conversation is a search for information and a possible lead to locate the missing child.	#Person2# tells #Person1# that #Person2#’s daughter is missing.
Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation. ### Instruction: #Person1#: Hey, Ray, what are you doing right now? #Person2#: Not much. Joann. Do you want to hang out? #Person1#: Yes, I do. I’m at home myself right now with nothing much to do. #Person2#: Me, too. What would you like to do? #Person1#: Well, we could go to a movie. Have you see Shrek 2? #Person2#: I have, actually. How about the movie, Million Dollar Baby with Clint Eastwood? #Person1#: Hmm. . . I’Ve seen that, too. We could go for a walk in the park. #Person2#: We could, but it looks like it’s going to rain soon. #Person1#: I guess that’s out. Why don’t we go shopping? #Person2#: I’d really rather not. I’ll be too tempted to buy something that I can’t afford! #Person1#: Ok. I guess we’d be not do that, either, then. What do you want to do? #Person2#: I don’t know. Do you want to go bowling at the new bowling alley on 1st street? #Person1#: I would like to, but I injured my wrist last weekend. #Person2#: Let’s see. Do you want to go to a bar and sing some karaoke? #Person1#: That sounds interesting, but I can’t drink alcohol with the medicine I’m taking. #Person2#: Doesn’t sound like that’s be fun, then. Do you want to come here? #Person1#: I don’t have a car. #Person2#: Goodness, we’re not having much luck here, are we? #Person1#: No. do you have a car? #Person2#: No, but I could take the bus. #Person1#: Oh, good. Why don’t you take the bus to mine and we could play Mah Tiang? #Person2#: That sounds great! Have you eaten? #Person1#: Not yet. I’ll cook dinner and have it ready by the time you get here. #Person2#: Perfect. Are you prepared to give me all your money? #Person1#: Huh? We are going to play for money, aren’t we? #Person2#: Oh, right. I suppose so. As long as you promise to lose! #Person1#: Hey. . . as long as I’m not alone on a Saturday night, I don’t really care what I do! 
#Person2#: See you soon. #Person1#: Bye! ### Response:	Ray and Joann have nothing much to do at home and decide to go out. They discuss some choices, including watching a movie, going shopping, and going to a bar. But finally, Ray decides to go to Joann’s home. They will have dinner and play Mah Tiang together.	Ray and Joann were discussing what to do on a Saturday night. They initially considered going to a movie, but decided against it because of the rain. They also ruled out going shopping because Joann was worried about overspending. Ray suggested bowling, but Joann’s injured wrist made that impossible. They then considered going to a bar to sing karaoke, but Ray couldn’t drink due to his medication. Joann suggested coming to Ray’s place, but he didn’t have a car. They decided to play Mahjong instead, and Joann asked if Ray had eaten. Finally, they agreed to play a game of Mahjong for money, with Joann hoping to win. The conversation ended with Ray joking that he didn’t care what he did as long as he wasn’t alone on a Saturday night.	#Person1# and Ray are looking for something to do. They try to decide on a movie, a walk, shopping, bowling, karaoke, and Mah Tiang. Finally, they decide to play Mah Tiang.
Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation. ### Instruction: #Person1#: The weather in Beijing is just beautiful! #Person2#: Autumn is the best season in Beijing, you know. #Person1#: Yes, I can see. The summer heat is over, and the winter cold is still far away. #Person2#: What is the weather like in Washington at this time of this year? #Person1#: Autumn is also very nice in Washington. #Person2#: Is it very cold in winter there? #Person1#: No, it isn’t. But sometimes it snows a lot in winter. #Person2#: Does it rain a lot in summer there? #Person1#: Oh yes! Sometimes it rains very heavily. How about here in Beijing? #Person2#: There’s a lot of rain in summer, but it seldom rains in autumn. #Person1#: I feel it’s rather dry here. #Person2#: That’s true. Many people from the south don’t like the weather here. #Person1#: Are you from the south? #Person2#: No, I’m from the north. I was born in Beijing, actually. #Person1#: I see. so this is your hometown? ### Response:	#Person2# describes the weather of summer, autumn, and winter in Beijing and #Person1# talks about how the weather is like in Washington.	Person1 and Person2 are discussing the weather in Beijing and Washington. They both agree that autumn is a great season in both cities. Person1 mentions that the summer heat is over and the winter cold is still far away, which suggests that the weather in Beijing is mild in autumn. Person2 asks about the weather in Washington, and Person1 responds that autumn is also nice there. They discuss the weather in Washington, with Person2 asking about the cold in winter and Person1 saying it’s not very cold, but sometimes it snows. Person2 then asks about the rain in summer, and Person1 says it rains heavily. Person2 mentions that Beijing has a lot of rain in summer, but not in autumn, which Person1 agrees with. 
Person1 mentions that Beijing is dry in autumn, and Person2 says that many people from the south don’t like the weather there. Person1 asks if Person2 is from the south, and Person2 responds that they are from the north, born in Beijing. Person1 then asks if Beijing is Person2’s hometown, and Person2 confirms that it is. Note: The response is a summary of the conversation, not a rephrased version of the original text. It aims to capture the main points and ideas discussed between the two people. Here is the response in a more formal tone: The conversation between Person1 and Person2 revolves around the weather in Beijing and Washington. They both appreciate the beauty of autumn in both	#Person1# and #Person2# talk about the weather in Beijing and Washington. #Person2# is from the north and was born in Beijing.
Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation. ### Instruction: #Person1#: What are you doing this weekend? #Person2#: My brother-in-law is having a small get-together at his house and he invited me. #Person1#: Is it a family thing or just friends? #Person2#: A bit of both. Some cousins, aunts and uncles will be there, but also some friends from the neighborhood. #Person1#: Is your great uncle Rick going to be there? He is really funny. #Person2#: Yeah he is going to be there with his step-son and his ex-wife. #Person1#: You mean your sister? #Person2#: No, Rick is actually my great uncle, so he is my grandmother’s brother. #Person1#: You lost me. #Person2#: I’ll explain later, let’s go. ### Response:	#Person2#’s brother-in-law invites #Person2# to attend a small get-together with #Person2#’s relatives and friends. #Person1# is confused about #Person2#’s family relationship.	Person1 and Person2 are discussing a weekend gathering at Person2’s brother-in-law’s house. The event is a mix of family and friends, with some cousins, aunts, uncles, and neighbors attending. Person1 asks if Person2’s great uncle Rick, who is known for being funny, will be there. Person2 confirms that Rick will be attending, along with his step-son and ex-wife, who is Person2’s sister. Person1 seems confused about the family relationship, and Person2 agrees to explain later. The conversation is light-hearted and casual, with a focus on the upcoming social gathering.	#Person2# will attend a small get-together at #Person2#’s brother-in-law’s house. #Person1# asks #Person2# about the people who will be there.

Dataset formatting

We currently offer two types of fine-tuning: instruction fine-tuning and domain adaptation fine-tuning. You can switch between the two training methods by setting the instruction_tuned parameter to True or False.

Domain adaptation format

The text generation model can be fine-tuned on any domain-specific dataset to incorporate domain-specific knowledge and language patterns. After fine-tuning on the domain-specific dataset, the model is expected to generate more relevant and accurate text within that domain. Although few-shot prompting can also guide the model towards domain-specific generation, the fine-tuning process plays a crucial role in adapting the model’s understanding and generation capabilities to the target domain. The combination of fine-tuning on domain data and effective prompting techniques can enable the model to perform various NLP tasks within that specific domain more effectively.

For input to the model, use a training directory and an optional validation directory. Each directory contains a CSV, JSON, or TXT file. For CSV and JSON files, the training or validation data is read from the column named text, or from the first column if no column named text is found. The train directory and the validation directory (if provided) should each contain exactly one file.
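The column-selection rule described above can be sketched in a few lines of Python. This is only an illustration of the stated behavior, not the JumpStart training script itself: the helper name select_text_column is hypothetical, and the sketch assumes the CSV file has a header row.

```python
import csv
import io


def select_text_column(csv_source):
    """Return the training texts from a CSV: the `text` column if present, else the first column."""
    reader = csv.reader(csv_source)
    header = next(reader)  # assumption: the first row is a header
    # Use the column named "text" when it exists; otherwise fall back to column 0.
    index = header.index("text") if "text" in header else 0
    return [row[index] for row in reader if row]


# Illustrative in-memory CSV; real data would come from the single file in the train directory.
sample = io.StringIO("id,text\n1,Revenue grew in 2021.\n2,Risk factors include competition.\n")
texts = select_text_column(sample)
print(texts)
```

With a header of body,id instead, the same helper would return the values of the first column, body.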

The output is a trained model that can be deployed for inference.

The following is an example of a TXT file for fine-tuning the text generation model. The TXT file contains SEC filings of Amazon from 2021–2022:

This report includes estimates, projections, statements relating to our business plans, objectives, 
and expected operating results that are “forward-looking statements” within the meaning of the Private
 Securities Litigation Reform Act of 1995, Section 27A of the Securities Act of 1933, and Section 21E 
of the Securities Exchange Act of 1934. Forward-looking statements may appear throughout this report,
 including the following sections: “Business” (Part I, Item 1 of this Form 10-K), “Risk Factors” 
(Part I, Item 1A of this Form 10-K), and “Management’s Discussion and Analysis of Financial Condition
 and Results of Operations” (Part II, Item 7 of this Form 10-K). These forward-looking statements 
generally are identified by the words “believe,” “project,” “expect,” “anticipate,” “estimate,” 
“intend,” “strategy,” “future,” “opportunity,” “plan,” “may,” “should,” “will,” “would,” 
“will be,” “will continue,” “will likely result,” and similar expressions. Forward-looking 
statements are based on current expectations and assumptions that are subject to 
risks and uncertainties that may cause actual results to differ materially. 
We describe risks and uncertainties that could cause actual results and 
events to differ materially in “Risk Factors,” “Management’s Discussion and 
Analysis of Financial Condition and Results of Operations,” and “Quantitative 
and Qualitative Disclosures about Market Risk” (Part II, Item 7A of this Form 10-K). 
Readers are cautioned not to place undue reliance on forward-looking statements, 
which speak only as of the date they are made. We undertake no obligation 
to update or revise publicly any forward-looking statements, whether because 
of new information, future events, or otherwise. GENERAL Embracing Our Future ...

Instruction fine-tuning

The text generation model can be instruction-tuned on any text data, provided that the data is in the expected format. The instruction-tuned model can then be deployed for inference. By default, instruction tuning is set to false. Therefore, to use an instruction tuning dataset, set instruction_tuned="True".

For input, you can use a training and optional validation directory. The training and validation directories should contain one or multiple JSON lines (.jsonl) formatted files. In particular, the train directory can also contain an optional *.json file describing the input and output formats.
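As a rough pre-flight check, the directory layout described above can be verified locally before uploading the data. The following is a minimal sketch: the helper name check_train_dir and the file names are hypothetical, and treating more than one .json file as an error is an assumption, not a documented rule.

```python
import tempfile
from pathlib import Path


def check_train_dir(train_dir):
    """Return (jsonl_file_names, template_name_or_None) for a local training directory."""
    path = Path(train_dir)
    jsonl_files = sorted(p.name for p in path.glob("*.jsonl"))
    json_files = sorted(p.name for p in path.glob("*.json"))  # does not match *.jsonl
    if not jsonl_files:
        raise ValueError("expected at least one .jsonl file")
    if len(json_files) > 1:
        # Assumption: only a single optional template file is expected.
        raise ValueError("expected at most one .json template file")
    return jsonl_files, (json_files[0] if json_files else None)


# Build a throwaway directory that mimics the expected layout.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "train.jsonl").write_text(
        '{"instruction": "hi", "context": "", "response": "hello"}\n'
    )
    (Path(tmp) / "template.json").write_text(
        '{"prompt": "{instruction}", "completion": "{response}"}'
    )
    layout = check_train_dir(tmp)
print(layout)
```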

The best model is selected according to the validation loss, calculated at the end of each epoch. If a validation set is not given, an (adjustable) percentage of the training data is automatically split and used for validation.
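To make the automatic split concrete, the following sketch holds out a fraction of the training samples for validation. It is an illustration only: the function name, the 0.2 default ratio, and the seed are assumptions chosen for the example, and the actual split performed by the training job may differ.

```python
import random


def split_train_validation(samples, validation_ratio=0.2, seed=0):
    """Shuffle a copy of the samples and hold out a fraction for validation."""
    if not 0 < validation_ratio < 1:
        raise ValueError("validation_ratio must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_val = max(1, int(len(shuffled) * validation_ratio))
    # The first n_val shuffled samples become the validation set.
    return shuffled[n_val:], shuffled[:n_val]


train, val = split_train_validation(range(10), validation_ratio=0.2)
print(len(train), len(val))
```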

The training data must be formatted in a JSON lines (.jsonl) format, where each line is a dictionary representing a single data sample. All training data must be in a single folder; however, it can be saved in multiple .jsonl files. The .jsonl file extension is mandatory. The training folder can also contain a template.json file describing the input and output formats. If no template file is given, the following template will be used:

{
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": "{response}"
}

In this case, the data in the JSON lines entries must include the instruction, context, and response fields, which fill the placeholders in the template. If a custom template is provided, it must also use prompt and completion keys to define the input and output templates. The following is a sample custom template:

{
    "prompt": "question: {question} context: {context}",
    "completion": "{answer}"
}

Here, the data in the JSON lines entries must include the question, context, and answer fields.
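The relationship between a template and a JSON lines entry can be sketched as follows. The example assumes that placeholders are filled by plain name substitution, shown here with Python's str.format; the render helper and the sample record are hypothetical, and the training job's own substitution logic may differ in details such as brace escaping.

```python
import json

# The sample custom template from above.
template = {
    "prompt": "question: {question} context: {context}",
    "completion": "{answer}",
}

# One line of a .jsonl training file; its keys match the template placeholders.
jsonl_line = json.dumps({
    "question": "Where is Wimbledon held?",
    "context": "Wimbledon has been held at the All England Club in Wimbledon, London, since 1877.",
    "answer": "At the All England Club in Wimbledon, London.",
})


def render(template, record):
    """Fill each template string with the fields of one data sample."""
    return {key: text.format(**record) for key, text in template.items()}


rendered = render(template, json.loads(jsonl_line))
print(rendered["prompt"])
print(rendered["completion"])
```

A record that lacks one of the placeholder fields raises a KeyError in this sketch, which is a convenient way to catch malformed entries before training.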

The output is a trained model that can be deployed for inference.

We provide a subset of Amazon's SEC filings data, downloaded from the publicly available EDGAR database. For instructions on accessing the data, refer to Accessing EDGAR Data.

License: Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0)