Best practices and design patterns for building machine learning workflows with Amazon SageMaker Pipelines

Amazon SageMaker Pipelines is a fully managed AWS service for building and orchestrating machine learning (ML) workflows. SageMaker Pipelines offers ML application developers the ability to orchestrate different steps of the ML workflow, including data loading, data transformation, training, tuning, and deployment. You can use SageMaker Pipelines to orchestrate ML jobs in SageMaker, and its integration with the larger AWS ecosystem also allows you to use resources like AWS Lambda functions, Amazon EMR jobs, and more. This enables you to build a customized and reproducible pipeline for specific requirements in your ML workflows.

In this post, we provide some best practices to maximize the value of SageMaker Pipelines and make the development experience seamless. We also discuss some common design scenarios and patterns when building SageMaker Pipelines and provide examples for addressing them.

Best practices for SageMaker Pipelines

In this section, we discuss some best practices that can be followed while designing workflows using SageMaker Pipelines. Adopting them can improve the development process and streamline the operational management of SageMaker Pipelines.

Use Pipeline Session for lazy loading of the pipeline

Pipeline Session enables lazy initialization of pipeline resources (the jobs are not started until pipeline runtime). The PipelineSession context inherits the SageMaker Session and implements convenient methods for interacting with other SageMaker entities and resources, such as training jobs, endpoints, input datasets in Amazon Simple Storage Service (Amazon S3), and so on. When defining SageMaker Pipelines, you should use PipelineSession over the regular SageMaker Session:

import sagemaker
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.sklearn.processing import SKLearnProcessor

role = sagemaker.get_execution_role()
# PipelineSession defers job creation until the pipeline runs
pipeline_session = PipelineSession()
sklearn_processor = SKLearnProcessor(
    framework_version="0.20.0",
    instance_type="ml.m5.xlarge",
    instance_count=1,
    base_job_name="sklearn-abalone-process",
    role=role,
    sagemaker_session=pipeline_session,
)

Run pipelines in local mode for cost-effective and quick iterations during development

You can run a pipeline in local mode using the LocalPipelineSession context. In this mode, the pipeline and jobs are run locally using resources on the local machine, instead of SageMaker managed resources. Local mode provides a cost-effective way to iterate on the pipeline code with a smaller subset of data. After the pipeline is tested locally, it can be scaled to run using the PipelineSession context.

import sagemaker
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.pipeline_context import LocalPipelineSession

# LocalPipelineSession runs the pipeline and its jobs on the local machine
local_pipeline_session = LocalPipelineSession()
role = sagemaker.get_execution_role()
sklearn_processor = SKLearnProcessor(
    framework_version="0.20.0",
    instance_type="ml.m5.xlarge",
    instance_count=1,
    base_job_name="sklearn-abalone-process",
    role=role,
    sagemaker_session=local_pipeline_session,
)

Manage a SageMaker pipeline through versioning

Versioning of artifacts and pipeline definitions is a common requirement in the development lifecycle. You can create multiple versions of the pipeline by naming pipeline objects with a unique prefix or suffix, the most common being a timestamp, as shown in the following code:

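import time

from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import PipelineSession

# A UTC timestamp suffix gives each pipeline version a unique name.
# A hyphen is used as the separator to keep the name to letters, digits, and hyphens.
current_time = time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
pipeline_name = "pipeline-" + current_time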
pipeline_session = PipelineSession()
pipeline = Pipeline(
    name=pipeline_name,
    steps=[step_process, step_train, step_eval, step_cond],
    sagemaker_session=pipeline_session,
)
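The naming scheme above can be wrapped in a small helper. The following is a minimal sketch using only the standard library; versioned_pipeline_name is a hypothetical helper (not part of the SageMaker SDK), and the character check reflects the assumption that names should contain only letters, digits, and hyphens.

```python
import re
import time

def versioned_pipeline_name(prefix, now=None):
    """Return the prefix followed by a UTC timestamp suffix."""
    # time.gmtime(None) uses the current time; pass seconds since the epoch to pin it
    stamp = time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime(now))
    name = f"{prefix}-{stamp}"
    # Assumption: keep names to letters, digits, and hyphens
    if not re.fullmatch(r"[A-Za-z0-9](-*[A-Za-z0-9])*", name):
        raise ValueError(f"unexpected characters in pipeline name: {name}")
    return name

# A fixed epoch value (0 = 1970-01-01 00:00:00 UTC) makes the result reproducible
print(versioned_pipeline_name("abalone", now=0))
```

Passing the returned string as the name argument of Pipeline keeps the versioning logic in one place.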

Organize and track SageMaker pipeline runs by integrating with SageMaker Experiments

SageMaker Pipelines can be easily integrated with SageMaker Experiments for organizing and tracking pipeline runs. This is achieved by specifying PipelineExperimentConfig at the time of creating a pipeline object. With this configuration object, you can specify an experiment name and a trial name. The run details of a SageMaker pipeline get organized under the specified experiment and trial. If you don’t explicitly specify an experiment name, the pipeline name is used as the experiment name. Similarly, if you don’t explicitly specify a trial name, the pipeline run ID is used as the trial or run group name. See the following code:

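from sagemaker.workflow.execution_variables import ExecutionVariables
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_experiment_config import PipelineExperimentConfig

Pipeline(
    name="MyPipeline",
    parameters=[...],
    pipeline_experiment_config=PipelineExperimentConfig(
        experiment_name=ExecutionVariables.PIPELINE_NAME,
        trial_name=ExecutionVariables.PIPELINE_EXECUTION_ID,
    ),
    steps=[...],
)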

Securely run SageMaker pipelines within a private VPC

To secure the ML workloads, it’s a best practice to deploy the jobs orchestrated by SageMaker Pipelines in a secure network configuration that uses a private VPC, private subnets, and security groups. To enforce the usage of this secure environment, you can attach the following AWS Identity and Access Management (IAM) policy to the SageMaker execution role (the role assumed by the pipeline during its run). You can also add a policy that requires the jobs orchestrated by SageMaker Pipelines to run in network isolation mode.

# IAM policy to enforce execution within a private VPC
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "sagemaker:CreateProcessingJob",
                "sagemaker:CreateTrainingJob",
                "sagemaker:CreateModel"
            ],
            "Resource": "*",
            "Condition": {
                "Null": {
                    "sagemaker:VpcSubnets": "true"
                }
            }
        }
    ]
}

# IAM policy to enforce execution in network isolation mode
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "sagemaker:Create*"
            ],
            "Resource": "*",
            "Condition": {
                "StringNotEqualsIfExists": {
                    "sagemaker:NetworkIsolation": "true"
                }
            }
        }
    ]
}

For an example of pipeline implementation with these security controls in place, refer to Orchestrating Jobs, Model Registration, and Continuous Deployment with Amazon SageMaker in a secure environment.
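If you want both controls in a single policy, the two Deny statements can be combined into one policy document. The following sketch only assembles the JSON shown above with the standard library; deny_statement is a hypothetical helper, and attaching the resulting document to the execution role is left out.

```python
import json

def deny_statement(actions, condition):
    """Build one IAM Deny statement that applies to all resources."""
    return {
        "Effect": "Deny",
        "Action": actions,
        "Resource": "*",
        "Condition": condition,
    }

policy = {
    "Version": "2012-10-17",
    "Statement": [
        # Deny job and model creation when no VPC subnets are supplied
        deny_statement(
            [
                "sagemaker:CreateProcessingJob",
                "sagemaker:CreateTrainingJob",
                "sagemaker:CreateModel",
            ],
            {"Null": {"sagemaker:VpcSubnets": "true"}},
        ),
        # Deny creation when network isolation is not enabled
        deny_statement(
            ["sagemaker:Create*"],
            {"StringNotEqualsIfExists": {"sagemaker:NetworkIsolation": "true"}},
        ),
    ],
}

policy_json = json.dumps(policy, indent=4)
print(policy_json)
```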

Monitor the cost of pipeline runs using tags

Using SageMaker Pipelines is itself free; you pay only for the compute and storage resources you spin up as part of the individual pipeline steps, such as processing, training, and batch inference. To aggregate the costs per pipeline run, you can include tags in every pipeline step that creates a resource. These tags can then be referenced in AWS Cost Explorer to filter and aggregate the total pipeline run cost, as shown in the following example:

sklearn_processor = SKLearnProcessor(
    framework_version="0.20.0",
    instance_type="ml.m5.xlarge",
    instance_count=1,
    base_job_name="sklearn-abalone-process",
    role=role,
    tags=[{'Key': 'pipeline-cost-tag', 'Value': '<<tag_parameter>>'}]
)

step_process = ProcessingStep(
    name="AbaloneProcess",
    processor=sklearn_processor,
    ...
)

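From AWS Cost Explorer, you can now get the cost filtered by the tag:

import boto3

client = boto3.client('ce')  # Cost Explorer client

response = client.get_cost_and_usage(
    TimePeriod={
        'Start': '2023-07-01',
        'End': '2023-07-15'
    },
    Metrics=['BLENDED_COST', 'USAGE_QUANTITY', 'UNBLENDED_COST'],
    Granularity='MONTHLY',
    # A filter that combines a dimension and a tag is wrapped in 'And'
    Filter={
        'And': [
            {
                'Dimensions': {
                    'Key': 'USAGE_TYPE',
                    'Values': ['SageMaker:Pipeline']
                }
            },
            {
                # The tag key and value set on the pipeline step resources
                'Tags': {
                    'Key': 'keyName',
                    'Values': ['keyValue']
                }
            }
        ]
    }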
)
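Once the response is available, the per-period amounts can be added up to get one figure per tag. The following is a minimal sketch that works on a plain dictionary; total_cost is a hypothetical helper, and the sample response (including the "UnblendedCost" metric key and the amounts) is made up for illustration, so check the keys in your actual response.

```python
from decimal import Decimal

def total_cost(response, metric="UnblendedCost"):
    """Sum one metric over all time periods of a get_cost_and_usage-style response."""
    total = Decimal("0")
    for period in response.get("ResultsByTime", []):
        # Amounts are returned as strings, so parse them as decimals
        amount = period.get("Total", {}).get(metric, {}).get("Amount", "0")
        total += Decimal(amount)
    return total

# Hypothetical response, trimmed to the fields the helper reads
sample_response = {
    "ResultsByTime": [
        {"Total": {"UnblendedCost": {"Amount": "12.50", "Unit": "USD"}}},
        {"Total": {"UnblendedCost": {"Amount": "7.25", "Unit": "USD"}}},
    ]
}
print(total_cost(sample_response))
```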

Design patterns for some common scenarios

In this section, we discuss design patterns for some common use cases with SageMaker Pipelines.

Run a lightweight Python function using a Lambda step

Python functions are omnipresent in ML workflows; they are used in preprocessing, postprocessing, evaluation, and more. Lambda is a serverless compute service that lets you run code without provisioning or managing servers. Lambda supports several languages, including Python, so you can use it to run custom Python code as part of your pipeline. A Lambda step enables you to run Lambda functions as part of your SageMaker pipeline. Start with the following code:

%%writefile lambdafunc.py

import json

def lambda_handler(event, context):
    str1 = event["str1"]
    str2 = event["str2"]
    str3 = str1 + str2
    return {
        "str3": str3
    }

Create the Lambda function using the SageMaker Python SDK’s Lambda helper:

from sagemaker.lambda_helper import Lambda

def create_lambda(function_name, script, handler):
    response = Lambda(
        function_name=function_name,
        execution_role_arn=role,
        script= script,
        handler=handler,
        timeout=600,
        memory_size=10240,
    ).upsert()

    function_arn = response['FunctionArn']
    return function_arn

fn_arn = create_lambda("func", "lambdafunc.py", handler="lambdafunc.lambda_handler")

Call the Lambda step:

from sagemaker.lambda_helper import Lambda
from sagemaker.workflow.lambda_step import (
    LambdaStep,
    LambdaOutput,
    LambdaOutputTypeEnum
)

str3 = LambdaOutput(output_name="str3", output_type=LambdaOutputTypeEnum.String)

# Lambda Step
step_lambda1 = LambdaStep(
    name="LambdaStep1",
    lambda_func=Lambda(
        function_arn=fn_arn
    ),
    inputs={
        "str1": "Hello",
        "str2": " World"
    },
    outputs=[str3],
)
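Before wiring the function into a pipeline, it can help to call the handler directly with the same inputs the Lambda step will send. The following sketch repeats the handler logic from lambdafunc.py so that it is self-contained; passing None as the context is an assumption that holds only because this handler does not use it.

```python
def lambda_handler(event, context):
    # Same logic as lambdafunc.py: concatenate the two input strings
    return {"str3": event["str1"] + event["str2"]}

# The event mirrors the inputs dictionary passed to the LambdaStep
result = lambda_handler({"str1": "Hello", "str2": " World"}, None)
print(result)
```

The key in the returned dictionary ("str3") has to match the output_name declared in the LambdaOutput.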

Pass data between steps

Input data for a pipeline step is either an accessible data location or data generated by one of the previous steps in the pipeline. You can provide this information as a ProcessingInput parameter. Let’s look at a few scenarios of how you can use ProcessingInput.

Scenario 1: Pass the output (primitive data types) of a Lambda step to a processing step

Primitive data types refer to scalar data types like string, integer, Boolean, and float.

The following code snippet defines a Lambda function that returns a dictionary of variables with primitive data types. Your Lambda function code will return a JSON of key-value pairs when invoked from the Lambda step within the SageMaker pipeline.

def handler(event, context):
    ...
    return {
        "output1": "string_value",
        "output2": 1,
        "output3": True,
        "output4": 2.0,
    }

In the pipeline definition, you can then define SageMaker pipeline parameters that are of a specific data type and set the variable to the output of the Lambda function:

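import sagemaker
from sagemaker.lambda_helper import Lambda
from sagemaker.workflow.lambda_step import (
    LambdaStep,
    LambdaOutput,
    LambdaOutputTypeEnum
)
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import ProcessingStep
from sagemaker.sklearn.processing import SKLearnProcessor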

role = sagemaker.get_execution_role()
pipeline_session = PipelineSession()

# 1. Define the output params of the Lambda Step

str_outputParam = LambdaOutput(output_name="output1", output_type=LambdaOutputTypeEnum.String)
int_outputParam = LambdaOutput(output_name="output2", output_type=LambdaOutputTypeEnum.Integer)
bool_outputParam = LambdaOutput(output_name="output3", output_type=LambdaOutputTypeEnum.Boolean)
float_outputParam = LambdaOutput(output_name="output4", output_type=LambdaOutputTypeEnum.Float)

# 2. Lambda step that invokes the Lambda function and returns the outputs

step_lambda = LambdaStep(
    name="MyLambdaStep",
    lambda_func=Lambda(
        function_arn="arn:aws:lambda:us-west-2:123456789012:function:sagemaker_test_lambda",
        session=PipelineSession(),
        ),
    inputs={"arg1": "foo", "arg2": "foo1"},
    outputs=[
        str_outputParam, int_outputParam, bool_outputParam, float_outputParam
        ],
)

# 3. Extract the output of the Lambda

str_outputParam = step_lambda.properties.Outputs["output1"]

# 4. Use it in a subsequent step. For ex. Processing step

sklearn_processor = SKLearnProcessor(
    framework_version="0.23-1",
    instance_type="ml.m5.xlarge",
    instance_count=1,
    sagemaker_session=pipeline_session,
    role=role
)

processor_args = sklearn_processor.run(
    code="code/preprocess.py", #python script to run
    arguments=["--input-args", str_outputParam]
)

step_process = ProcessingStep(
    name="processstep1",
    step_args=processor_args,
)
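On the receiving side, the processing script reads the value as an ordinary command-line argument. The following is a minimal sketch of what the argument handling in code/preprocess.py could look like; the script contents are not shown in this post, so parse_args and the sample value are assumptions.

```python
import argparse

def parse_args(argv=None):
    """Parse the arguments that the processing step passes to the script."""
    parser = argparse.ArgumentParser()
    # "--input-args" matches the flag used in the arguments list of the processor run
    parser.add_argument("--input-args", type=str, required=True)
    return parser.parse_args(argv)

# At runtime the pipeline substitutes the Lambda output for the placeholder;
# a literal value is used here so the sketch runs on its own
args = parse_args(["--input-args", "string_value"])
print(args.input_args)
```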

Scenario 2: Pass the output (non-primitive data types) of a Lambda step to a processing step

Non-primitive data types refer to non-scalar data types (for example, NamedTuple). You may have a scenario when you have to return a non-primitive data type from a Lambda function. To do this, you have to convert your non-primitive data type to a string:

# Lambda function code returning a non primitive data type

from collections import namedtuple

def lambda_handler(event, context):
    Outputs = namedtuple("Outputs", "sample_output")
    named_tuple = Outputs(
                    [
                        {'output1': 1, 'output2': 2},
                        {'output3': 'foo', 'output4': 'foo1'}
                    ]
                )
    return {
        "named_tuple_string": str(named_tuple)
    }

# Pipeline step that uses the Lambda output as a "Parameter Input"

output_ref = step_lambda.properties.Outputs["named_tuple_string"]

Then you can use this string as an input to a subsequent step in the pipeline. To use the named tuple in the code, use eval() to parse the Python expression in the string:

# Decipher the string in your processing logic code

import argparse
from collections import namedtuple

Outputs = namedtuple("Outputs", "sample_output")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--named_tuple_string", type=str, required=True)
    args = parser.parse_args()
    #use eval to obtain the named tuple from the string
    named_tuple = eval(args.named_tuple_string)
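Because eval() runs whatever expression the string contains, JSON is an alternative worth considering when the tuple fields hold JSON-serializable values. The following sketch swaps in json.dumps and json.loads for str() and eval(); it is a different technique from the code above, and to_json_string and from_json_string are hypothetical helper names.

```python
import json
from collections import namedtuple

Outputs = namedtuple("Outputs", "sample_output")

def to_json_string(named_tuple):
    """Serialize a namedtuple to a JSON string (Lambda side)."""
    return json.dumps(named_tuple._asdict())

def from_json_string(payload):
    """Rebuild the namedtuple from the JSON string (processing side)."""
    return Outputs(**json.loads(payload))

original = Outputs(
    [
        {"output1": 1, "output2": 2},
        {"output3": "foo", "output4": "foo1"},
    ]
)
restored = from_json_string(to_json_string(original))
print(restored == original)
```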

Scenario 3: Pass the output of a step through a property file

You can also store the output of a processing step in a property JSON file for downstream consumption in a ConditionStep or another ProcessingStep. You can use the JsonGet function to query a property file. See the following code:

# 1. Define a Processor with a ProcessingOutput
sklearn_processor = SKLearnProcessor(
    framework_version="0.23-1",
    instance_type="ml.m5.xlarge",
    instance_count=1,
    base_job_name="sklearn-abalone-preprocess",
    sagemaker_session=session,
    role=sagemaker.get_execution_role(),
)

step_args = sklearn_processor.run(
    outputs=[
        ProcessingOutput(
            output_name="hyperparam",
            source="/opt/ml/processing/evaluation",
        ),
    ],
    code="./local/preprocess.py",
    arguments=["--input-data", "s3://my-input"],
)

# 2. Define a PropertyFile whose output_name matches the one used in the processor
hyperparam_report = PropertyFile(
    name="AbaloneHyperparamReport",
    output_name="hyperparam",
    path="hyperparam.json",
)

Let’s assume the property file’s contents were the following:

{
    "hyperparam": {
        "eta": {
            "value": 0.6
        }
    }
}

In this case, it can be queried for a specific value and used in subsequent steps using the JsonGet function:

# 3. Query the property file
eta = JsonGet(
    step_name=step_process.name,
    property_file=hyperparam_report,
    json_path="hyperparam.eta.value",
)
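To make the query concrete, the following pure-Python sketch resolves the same dotted path against the example file contents. It only illustrates which value the path selects; resolve_json_path is a hypothetical helper and not how JsonGet is implemented, because JsonGet is resolved by the pipeline at runtime.

```python
import json

def resolve_json_path(document, json_path):
    """Follow a dot-separated path of keys through nested dictionaries."""
    value = document
    for key in json_path.split("."):
        value = value[key]
    return value

# Same contents as the example property file
property_file_contents = json.loads(
    '{"hyperparam": {"eta": {"value": 0.6}}}'
)
print(resolve_json_path(property_file_contents, "hyperparam.eta.value"))
```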

Parameterize a variable in pipeline definition

Parameterizing variables so that they can be used at runtime is often desirable—for example, to construct an S3 URI. You can parameterize a string such that it is evaluated at runtime using the Join function. The following code snippet shows how to define the variable using the Join function and use that to set the output location in a processing step:

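from sagemaker.workflow.execution_variables import ExecutionVariables
from sagemaker.workflow.functions import Join
from sagemaker.workflow.parameters import ParameterString

# define the variable to store the S3 URI
s3_location = Join(
    on="/",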
    values=[
        "s3:/",
        ParameterString(
            name="MyBucket", 
            default_value=""
        ),
        "training",
        ExecutionVariables.PIPELINE_EXECUTION_ID
    ]
)

# define the processing step
sklearn_processor = SKLearnProcessor(
    framework_version="1.2-1",
    instance_type="ml.m5.xlarge",
    instance_count=processing_instance_count,
    base_job_name=f"{base_job_prefix}/sklearn-abalone-preprocess",
    sagemaker_session=pipeline_session,
    role=role,
)

# use the s3uri as the output location in processing step
processor_run_args = sklearn_processor.run(
    outputs=[
        ProcessingOutput(
            output_name="train",
            source="/opt/ml/processing/train",
            destination=s3_location,
        ),
    ],
    code="code/preprocess.py"
)

step_process = ProcessingStep(
    name="PreprocessingJob",
    step_args=processor_run_args,
)
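The single slash in "s3:/" is deliberate: joining on "/" supplies the second one. The following sketch mimics the string that Join produces once the values are resolved at runtime; resolve_join is a hypothetical stand-in, and the bucket name and execution ID are made-up values.

```python
def resolve_join(on, values):
    """Mimic the runtime result of Join: concatenate the resolved values with `on`."""
    return on.join(str(v) for v in values)

# Hypothetical runtime values for the MyBucket parameter and the execution ID
bucket = "my-bucket"
execution_id = "abc123"

s3_uri = resolve_join("/", ["s3:/", bucket, "training", execution_id])
print(s3_uri)
```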

Run parallel code over an iterable

Some ML workflows run code in parallel for-loops over a static set of items (an iterable). It can either be the same code that gets run on different data or a different piece of code that needs to be run for each item. For example, if you have a very large number of rows in a file and want to speed up the processing time, you can rely on the former pattern. If you want to perform different transformations on specific sub-groups in the data, you might have to run a different piece of code for every sub-group in the data. The following two scenarios illustrate how you can design SageMaker pipelines for this purpose.

Scenario 1: Implement a processing logic on different portions of data

You can run a processing job with multiple instances (by setting instance_count to a value greater than 1). The input data from Amazon S3 is then made available on the processing instances: by default every instance receives a full copy, and setting s3_data_distribution_type="ShardedByS3Key" on the ProcessingInput splits the S3 objects across the instances instead. You can then use a script (process.py) to work on a specific portion of the data based on the instance number and the corresponding element in the list of items. The programming logic in process.py can be written such that a different module or piece of code gets run depending on the list of items that it processes. The following example defines a processor that can be used in a ProcessingStep:

sklearn_processor = FrameworkProcessor(
    estimator_cls=sagemaker.sklearn.estimator.SKLearn,
    framework_version="0.23-1",
    instance_type='ml.m5.4xlarge',
    instance_count=4, #number of parallel executions / instances
    base_job_name="parallel-step",
    sagemaker_session=session,
    role=role,
)

step_args = sklearn_processor.run(
    code='process.py',
    arguments=[
        "--items",
        list_of_items,  # the list of items, serialized to a string (arguments must be strings)
    ],
    inputs=[
        ProcessingInput(
            source="s3://sagemaker-us-east-1-xxxxxxxxxxxx/abalone/abalone-dataset.csv",
            destination="/opt/ml/processing/input",
        )
    ],
)
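Inside process.py, each instance needs to work out which items belong to it. The following is a minimal sketch of one way to do that with a round-robin split; items_for_host and load_resource_config are hypothetical helpers, the host names are made up, and the resource configuration path and its current_host and hosts keys are assumptions to verify against your container.

```python
import json

def load_resource_config(path="/opt/ml/config/resourceconfig.json"):
    """Read the current host and the host list (path and keys are assumptions)."""
    with open(path) as f:
        config = json.load(f)
    return config["current_host"], config["hosts"]

def items_for_host(items, current_host, hosts):
    """Assign items round-robin so that each instance gets a disjoint subset."""
    index = sorted(hosts).index(current_host)
    return items[index::len(hosts)]

# Hypothetical host names and items for a 4-instance job
hosts = ["algo-1", "algo-2", "algo-3", "algo-4"]
items = ["a", "b", "c", "d", "e", "f"]
print(items_for_host(items, "algo-2", hosts))
```

Every item is handled by exactly one instance, and instances that receive no items simply have nothing to do.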

Scenario 2: Run a sequence of steps

When you have a sequence of steps that need to be run in parallel, you can define each sequence as an independent SageMaker pipeline. The run of these SageMaker pipelines can then be triggered from a Lambda function that is part of a LambdaStep in the parent pipeline. The following piece of code illustrates the scenario where two different SageMaker pipeline runs are triggered:

import boto3

def lambda_handler(event, context):
    items = [1, 2]
    # SageMaker client
    sm_client = boto3.client("sagemaker")

    # Names of the pipelines that need to be triggered.
    # If there are multiple, you can fetch available pipelines using the boto3 API
    # and trigger the appropriate one based on your logic.
    pipeline_names = ['child-pipeline-1', 'child-pipeline-2']

    # Trigger one child pipeline per item
    for pipeline_name, s in zip(pipeline_names, items):
        response_ppl = sm_client.start_pipeline_execution(
            PipelineName=pipeline_name,
            PipelineExecutionDisplayName=pipeline_name + '-item-%d' % s,
        )
    return
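The triggering logic can be factored out so that it is testable without AWS access. In the following sketch, trigger_child_pipelines takes the client as an argument, and FakeSageMakerClient is a hypothetical stand-in that records calls; pairing each pipeline with one item is an assumption about the intended behavior.

```python
def trigger_child_pipelines(sm_client, pipeline_names, items):
    """Start one run per (pipeline, item) pair and return the run ARNs."""
    arns = []
    for pipeline_name, item in zip(pipeline_names, items):
        response = sm_client.start_pipeline_execution(
            PipelineName=pipeline_name,
            PipelineExecutionDisplayName=f"{pipeline_name}-item-{item}",
        )
        arns.append(response["PipelineExecutionArn"])
    return arns

class FakeSageMakerClient:
    """Stand-in for the SageMaker client so the logic can be exercised offline."""

    def __init__(self):
        self.calls = []

    def start_pipeline_execution(self, **kwargs):
        # Record the request and return a made-up ARN
        self.calls.append(kwargs)
        return {"PipelineExecutionArn": "arn:fake:" + kwargs["PipelineExecutionDisplayName"]}

fake = FakeSageMakerClient()
arns = trigger_child_pipelines(fake, ["child-pipeline-1", "child-pipeline-2"], [1, 2])
print(arns)
```

In the Lambda function itself, you would pass a real SageMaker client in place of the fake one.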

Conclusion

In this post, we discussed some best practices for the efficient use and maintenance of SageMaker pipelines. We also provided certain patterns that you can adopt while designing workflows with SageMaker Pipelines, whether you are authoring new pipelines or are migrating ML workflows from other orchestration tools. To get started with SageMaker Pipelines for ML workflow orchestration, refer to the code samples on GitHub and Amazon SageMaker Model Building Pipelines.


About the Authors

Pinak Panigrahi works with customers to build machine learning driven solutions to solve strategic business problems on AWS. When not occupied with machine learning, he can be found taking a hike, reading a book or watching sports.

Meenakshisundaram Thandavarayan works for AWS as an AI/ML Specialist. He has a passion to design, create, and promote human-centered data and analytics experiences. Meena focuses on developing sustainable systems that deliver measurable, competitive advantages for strategic customers of AWS. Meena is a connector and design thinker, and strives to drive business to new ways of working through innovation, incubation, and democratization.

Incorporating chemists’ insight with AI models for single-step retrosynthesis prediction

Retrosynthesis analysis is a critical task in organic chemistry and central to many important industries. It primarily involves decomposing a target molecule into commercially available molecules step by step. Since synthesis strategies can be quite diverse and strategic, retrosynthesis planning with expert knowledge has long been considered an “art.”

Recently, machine learning-based approaches have achieved promising results on this task, particularly in single-step retrosynthesis prediction. In retrosynthesis, a molecule can be represented as either a 2D graph or a 1D SMILES (simplified molecular-input line-entry system) sequence. SMILES is a notation system used to represent chemical structures using plain text, which consists of a sequence of characters to describe the arrangement of atoms, bonds, and rings within a molecule. SMILES can be considered a traversal on the corresponding molecular graph, as shown in Figure 1.

Figure 1: An example of molecular graph and SMILES string

Given the representations of molecules, most machine learning-based approaches employ encoder-decoder frameworks, where the encoder part encodes the molecular (the target product) sequence or graph as high dimensional vectors, and the decoder takes the output from the encoder and generates the output sequence (the predicted reactant) token-by-token autoregressively. 

Casting retrosynthesis analysis as a sequence decoding problem enables the use of deep neural architectures that are well-developed in machine translation or graph neural networks. While AI has made significant strides in predicting reactants, it’s crucial to acknowledge the expertise of human chemists. In real-world route scouting tasks, synthetic chemists rely on their professional experience and abstract understanding of underlying mechanisms. They often start with molecular substructures or fragments that are chemically similar to target molecules, providing clues for a series of chemical reactions that may yield the target product.

Our paper, Single-step retrosynthesis prediction by leveraging commonly preserved substructures, proposes a novel approach that leverages commonly preserved substructures in organic synthesis. This approach incorporates chemists’ insight in retrosynthesis, bringing the AI model closer to the way human experts think.

Substructure extraction and modeling

In the context of organic chemistry, “substructures” refer to molecular fragments or smaller building blocks that are chemically similar or preserved within target molecules. These substructures serve as essential components for understanding the assembly of complex molecules and play a significant role in retrosynthesis analysis. 

Based on this concept, our framework consists of three main modules:

  1. Reaction Retrieval: This module retrieves similar reactions, given a product molecule as a query. It uses a learnable cross-lingual memory retriever to align reactants and products in high-dimensional vector space.
  2. Substructure Extraction: We extract the common substructures from the product molecule and the top cross-aligned candidates, based on molecular fingerprints. These substructures provide a reaction-level, fragment-to-fragment mapping between reactants and products.
  3. Substructure-level Sequence-to-Sequence Learning: We convert the original token-level sequence to a substructure-level sequence. The new input sequence includes the SMILES strings of the substructures followed by the SMILES strings of other fragments with virtual number labels. The output sequences are the fragments with virtual numbers. The virtual numbers are used to indicate the bond breaking/connecting site.
Figure 2: Method overview, with virtual number labeled atoms and substructures highlighted in green.

Unlike most existing work, our model only needs to predict the fragments connected to the substructure, thereby simplifying the prediction task, with the substructure part remaining unchanged. 

In the example shown in Figure 2, the substructure “COC(=O)Cc1cc2ccc(F)cc2[2cH]c1C.C[1SH](=O)=O” remains unchanged, and the model only needs to predict the fragment “[2BH]2OC(C)(C)C(C)(C)O2.[1cH]1ccc(Br)nc1”. The substructure SMILES and the predicted fragment SMILES are then combined to form the complete reactant SMILES.

Retrosynthesis prediction

We analyzed our method using the USPTO full dataset and compared it to other notable works in the field. In almost every scenario, our method achieved comparable or better top-1 accuracy compared to previously tested methods. On the subset of data where substructures were successfully extracted, model performance significantly improved compared to the overall result.

The improvement in our method can be attributed to two main factors:

  1. Our method managed to successfully extract substructures from 82.2% of all products on the USPTO full test dataset, demonstrating the general applicability of this approach. 
  2. We only needed to generate fragments connected to virtually labeled atoms in the substructures, which shortened the string representations of molecules and significantly lowered the number of atoms to be predicted.
Figure 3: Product molecule specific substructures. These reactants all contain phthalimide, with substructures highlighted in green.

A key aspect of our method for one-step retrosynthesis is the extraction of product-specific substructures. By doing so, we can better capture subtle structural changes from reactants to products that are unique to each reaction. Take phthalimide, a common heterocyclic substructure, as an example. We analyzed four exemplary reactions where the reactants contain phthalimide (see Figure 3). The extracted substructures vary among different reaction types, demonstrating the product-specific nature of the substructures.

In reaction (a) and reaction (b), phthalimide is not considered part of the substructure because it takes part in the reaction. However, in reaction (c) and reaction (d), the substructures are different, yet they both contain phthalimide. These results show that substructures are indeed product-specific, which aligns with our expectations.

Incorporating human insights into decision-making 

In addition, leveraging commonly preserved substructures offers another benefit: providing users with valuable insights for decision-making in retrosynthesis planning. When compared to existing methods, our approach can help human experts assess potential pathways and eliminate infeasible reactions using their chemistry knowledge. 

For each input product molecule, we extract multiple substructures from retrieved reactions (see details in our paper), and in some cases, not all substructures are correct. As such, we can group predictions by substructures. As shown in Figure 4, the predicted groups of reactants and reactions offer valuable information to experts. For instance, they can refine predictions by comparing reactions associated with retrieved candidates, making our predictions more explainable and trustworthy compared to existing “black-box” models.

Figure 4: Substructures and predictions grouped by substructures. The retrieved candidate reactants (#2, #3 and #4) indicate that the substructures extracted from the retrieved reactant #1 are likely incorrect, because the triple bond is likely a reaction site. The extracted substructures are highlighted in green.

We hope that our work will spark interest in this fast-growing and highly interdisciplinary area of retrosynthesis prediction and other related topics. By pushing the boundaries of what’s possible in chemistry and machine learning, we can continue to make strides in understanding complex chemical reactions and designing more efficient retrosynthetic strategies.

The post Incorporating chemists’ insight with AI models for single-step retrosynthesis prediction appeared first on Microsoft Research.

Attention, Please: Focus Entertainment Brings Game Pass Titles to GeForce NOW

GeForce NOW brings expanded support for PC Game Pass to members this week. Members can stream eight more games from Microsoft’s subscription service, including four titles from hit publisher Focus Entertainment.

Play A Plague Tale: Requiem, Atomic Heart and more from the GeForce NOW library at up to 4K resolution and 120 frames per second with a GeForce NOW Ultimate membership.

Plus, time’s almost up to take on the Ultimate KovaaK’s Challenge. Get on the leaderboard today — the challenge ends on Thursday, Sept. 21.

Laser-Focused 

Four games from Focus Entertainment’s PC Game Pass catalog join GeForce NOW this week. Members signed up with Microsoft’s subscription service can now stream titles like A Plague Tale: Requiem, Atomic Heart and more titles at stunning quality across their devices — without additional purchases.

A Plague Tale Requiem on GeForce NOW
The ultimate test of love and survival on the ultimate cloud gaming service.

Embark on a heartrending journey into a brutal, breathtaking world in the critically acclaimed A Plague Tale: Requiem or explore an alternate history of the 1950s Soviet Union in Atomic Heart. Go off road in SnowRunner or salvage among the stars in Hardspace: Shipbreaker. Members can even bring the squad together for military battles in Insurgency: Sandstorm. There’s something for everyone.

Experience it all with a PC Game Pass subscription, best paired with a GeForce NOW Ultimate membership, which provides up to 4K streaming or up to 240 fps for the ultimate cloud gaming experience.

Endless Adventures

SYNCED on GeForce NOW
Venture into the collapsed world for intense PvE and PvP combats in SYNCED on GeForce NOW.

A new week, a new batch of games. Catch the 16 new games supported in the cloud this week:

  • Chants of Sennaar (New release on Steam, Sept. 5)
  • SYNCED (New release on Steam, Sept. 7)
  • Void Crew (New release on Steam, Sept. 7)
  • Deceive Inc. (Steam)
  • A Plague Tale: Requiem (Xbox)
  • Airborne Kingdom (Epic Games Store)
  • Atomic Heart (Xbox)
  • Call of the Wild: The Angler (Xbox)
  • Danganronpa V3: Killing Harmony (Xbox)
  • Death in the Water (Steam)
  • Hardspace: Shipbreaker (Xbox)
  • Insurgency: Sandstorm (Xbox)
  • Monster Sanctuary (Xbox)
  • Saints Row (Steam)
  • Shadowrun: Hong Kong – Extended Edition (Xbox)
  • SnowRunner (Xbox)
  • War for the Overworld (Steam)

What are you planning to play this weekend? Let us know on Twitter or in the comments below.


TSMixer: An all-MLP architecture for time series forecasting


Time series forecasting is critical to various real-world applications, from demand forecasting to pandemic spread prediction. In multivariate time series forecasting (forecasting multiple variables at the same time), one can split existing methods into two categories: univariate models and multivariate models. Univariate models focus on intra-series temporal patterns, such as trends and seasonality, within a time series with a single variable. Examples of such trends and seasonal patterns might be the way mortgage rates increase due to inflation, and how traffic peaks during rush hour. In addition to intra-series patterns, multivariate models process inter-series features, known as cross-variate information, which is especially useful when one series is a leading indicator of another series. For example, a rise in body weight may cause an increase in blood pressure, and increasing the price of a product may lead to a decrease in sales. Multivariate models have recently become popular solutions for multivariate forecasting as practitioners believe their capability of handling cross-variate information may lead to better performance.

In recent years, deep learning Transformer-based architectures have become a popular choice for multivariate forecasting models due to their superior performance on sequence tasks. Surprisingly, however, advanced multivariate models perform worse than simple univariate linear models on commonly used long-term forecasting benchmarks, such as Electricity Transformer Temperature (ETT), Electricity, Traffic, and Weather. These results raise two questions:

  • Does cross-variate information benefit time series forecasting?
  • When cross-variate information is not beneficial, can multivariate models still perform as well as univariate models?

In “TSMixer: An All-MLP Architecture for Time Series Forecasting”, we analyze the advantages of univariate linear models and reveal their effectiveness. Insights from this analysis lead us to develop Time-Series Mixer (TSMixer), an advanced multivariate model that leverages linear model characteristics and performs well on long-term forecasting benchmarks. To the best of our knowledge, TSMixer is the first multivariate model that performs as well as state-of-the-art univariate models on long-term forecasting benchmarks, where we show that cross-variate information is less beneficial. To demonstrate the importance of cross-variate information, we evaluate a more challenging real-world application, M5. Finally, empirical results show that TSMixer outperforms state-of-the-art models, such as PatchTST, FEDformer, Autoformer, DeepAR and TFT.

TSMixer architecture

A key difference between linear models and Transformers is how they capture temporal patterns. On one hand, linear models apply fixed and time-step-dependent weights to capture static temporal patterns, and are unable to process cross-variate information. On the other hand, Transformers use attention mechanisms that apply dynamic and data-dependent weights at each time step, capturing dynamic temporal patterns and enabling them to process cross-variate information.

In our analysis, we show that under common assumptions of temporal patterns, linear models have naïve solutions that perfectly recover the time series or bound the error, which makes them well suited to learning static temporal patterns of univariate time series. In contrast, it is non-trivial to find similar solutions for attention mechanisms, as the weights applied to each time step are dynamic. Consequently, we develop a new architecture by replacing Transformer attention layers with linear layers. The resulting TSMixer model, which is similar to the computer vision MLP-Mixer method, alternates between applications of the multi-layer perceptron along the time dimension and along the feature dimension, which we call time-mixing and feature-mixing, respectively. The TSMixer architecture efficiently captures both temporal patterns and cross-variate information, as shown in the figure below. The residual designs ensure that TSMixer retains the capacity of temporal linear models while still being able to exploit cross-variate information.

Transformer block and TSMixer block architectures. TSMixer replaces the multi-head attention layer with time-mixing, a linear model applied on the time dimension.

Comparison between data-dependent (attention mechanisms) and time-step-dependent (linear models). This is an example of forecasting the next time step by learning the weights of the previous three time steps.
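The alternation between time-mixing and feature-mixing described above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation: it omits normalization and dropout, uses a single linear layer for time-mixing and a two-layer MLP for feature-mixing, and the function names, parameter names, and shapes are all hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def time_mixing(x, w_time, b_time):
    """Mix information across time steps; x has shape (seq_len, n_features)."""
    # The same (seq_len x seq_len) weights are applied to every feature column,
    # so the weights depend on the time step but not on the data.
    mixed = relu(w_time @ x + b_time[:, None])
    return x + mixed  # residual connection


def feature_mixing(x, w_in, b_in, w_out, b_out):
    """Mix information across features; weights are shared by all time steps."""
    hidden = relu(x @ w_in + b_in)       # (seq_len, hidden)
    return x + (hidden @ w_out + b_out)  # residual connection


def mixer_block(x, p):
    """One block: time-mixing followed by feature-mixing."""
    x = time_mixing(x, p["w_time"], p["b_time"])
    return feature_mixing(x, p["w_in"], p["b_in"], p["w_out"], p["b_out"])


def init_params(seq_len, n_features, hidden, seed=0):
    """Small random weights; the sizes and scale are arbitrary choices."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "w_time": rng.normal(0, scale, (seq_len, seq_len)),
        "b_time": np.zeros(seq_len),
        "w_in": rng.normal(0, scale, (n_features, hidden)),
        "b_in": np.zeros(hidden),
        "w_out": rng.normal(0, scale, (hidden, n_features)),
        "b_out": np.zeros(n_features),
    }


x = np.random.default_rng(1).normal(size=(8, 3))  # 8 time steps, 3 variables
y = mixer_block(x, init_params(seq_len=8, n_features=3, hidden=16))
print(y.shape)  # same shape as the input
```

With all weights set to zero, the block returns its input unchanged, which is the role the residual connections play in this sketch.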

Evaluation on long-term forecasting benchmarks

We evaluate TSMixer using seven popular long-term forecasting datasets (ETTm1, ETTm2, ETTh1, ETTh2, Electricity, Traffic, and Weather), where recent research has shown that univariate linear models outperform advanced multivariate models by large margins. We compare TSMixer with state-of-the-art multivariate models (TFT, FEDformer, Autoformer, Informer), and univariate models, including linear models and PatchTST. The figure below shows the average improvement of mean squared error (MSE) by TSMixer compared with others. The average is calculated across datasets and multiple forecasting horizons. We demonstrate that TSMixer significantly outperforms other multivariate models and performs on par with state-of-the-art univariate models. These results show that multivariate models are capable of performing as well as univariate models.

The average MSE improvement of TSMixer compared with other baselines. The red bars show multivariate methods and the blue bars show univariate methods. TSMixer achieves significant improvement over other multivariate models and achieves comparable results to univariate models.

Ablation study

We performed an ablation study to compare TSMixer with TMix-Only, a TSMixer variant that consists of time-mixing layers only. The results show that TMix-Only performs almost the same as TSMixer, which means the additional feature-mixing layers do not improve performance and confirms that cross-variate information is less beneficial on popular benchmarks. The results validate the superior univariate model performance shown in previous research. However, existing long-term forecasting benchmarks do not represent well the need for cross-variate information in some real-world applications, where time series may be intermittent or sparse and temporal patterns alone may not be sufficient for forecasting. Therefore, it may be inappropriate to evaluate multivariate forecasting models solely on these benchmarks.

Evaluation on M5: Effectiveness of cross-variate information

To further demonstrate the benefit of multivariate models, we evaluate TSMixer on the challenging M5 benchmark, a large-scale retail dataset containing crucial cross-variate interactions. M5 contains information on 30,490 products collected over 5 years. Each product description includes time series data, like daily sales, sell price, and promotional event information, and static (non-time-series) features, such as store location and product category. The goal is to forecast the daily sales of each product for the next 28 days, evaluated using the weighted root mean square scaled error (WRMSSE) from the M5 competition. The complicated nature of retail makes it more challenging to forecast solely using univariate models that focus on temporal patterns, so multivariate models with cross-variate information and even auxiliary features become essential.
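To make the metric concrete, here is a minimal sketch of how a weighted root mean square scaled error can be computed. It rests on simplifying assumptions: the per-series weights are passed in directly, and the hierarchical aggregation levels and other details of the official M5 scoring are not modeled. The function names are hypothetical.

```python
import math


def rmsse(train, actual, forecast):
    """Root mean squared scaled error for a single series.

    The forecast error is scaled by the mean squared one-step difference
    of the training history, so series of different magnitude are comparable.
    """
    scale = sum(
        (train[t] - train[t - 1]) ** 2 for t in range(1, len(train))
    ) / (len(train) - 1)
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    return math.sqrt(mse / scale)


def wrmsse(series, weights):
    """Weighted sum of per-series RMSSE values.

    `series` is a list of (train, actual, forecast) tuples and `weights`
    holds one weight per series (assumed here to sum to 1).
    """
    return sum(w * rmsse(*s) for s, w in zip(series, weights))


history = [1, 2, 3, 4]
score = wrmsse(
    [(history, [5, 6], [5, 8]), (history, [5, 6], [5, 6])],
    weights=[0.5, 0.5],
)
print(score)
```

A perfect forecast contributes zero to the total, so the second series in the usage example only dilutes the error of the first.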

First, we compare TSMixer to other methods considering only the historical data, such as daily sales and historical sell prices. The results show that multivariate models significantly outperform univariate models, indicating the usefulness of cross-variate information. Among all compared methods, TSMixer leverages the cross-variate information most effectively and achieves the best performance.

Additionally, to leverage more information, such as static features (e.g., store location, product category) and future time series (e.g., a promotional event scheduled in coming days) provided in M5, we propose a principled design to extend TSMixer. The extended TSMixer aligns different types of features into the same length, and then applies multiple mixing layers to the concatenated features to make predictions. The extended TSMixer architecture outperforms models popular in industrial applications, including DeepAR and TFT, showcasing its strong potential for real-world impact.

The architecture of the extended TSMixer. In the first stage (align stage), it aligns the different types of features into the same length before concatenating them. In the second stage (mixing stage) it applies multiple mixing layers conditioned with static features.

The WRMSSE on M5. The first three methods (blue) are univariate models. The middle three methods (orange) are multivariate models that consider only historical features. The last three methods (red) are multivariate models that consider historical, future, and static features.

Conclusion

We present TSMixer, an advanced multivariate model that leverages linear model characteristics and performs as well as state-of-the-art univariate models on long-term forecasting benchmarks. TSMixer creates new possibilities for the development of time series forecasting architectures by providing insights into the importance of cross-variate and auxiliary information in real-world scenarios. The empirical results highlight the need to consider more realistic benchmarks for multivariate forecasting models in future research. We hope that this work will inspire further exploration in the field of time series forecasting, and lead to the development of more powerful and effective models that can be applied to real-world applications.

Acknowledgements

This research was conducted by Si-An Chen, Chun-Liang Li, Nate Yoder, Sercan O. Arik, and Tomas Pfister.


Build a secure enterprise application with Generative AI and RAG using Amazon SageMaker JumpStart


Generative AI is a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music. It’s powered by large language models (LLMs) that are pre-trained on vast amounts of data and commonly referred to as foundation models (FMs).

With the advent of these LLMs or FMs, customers can easily build generative AI-based applications for advertising, knowledge management, and customer support. These applications can give customers enhanced insights and improve organizational efficiency through easier information retrieval and the automation of time-consuming tasks.

With generative AI on AWS, you can reinvent your applications, create entirely new customer experiences, and improve overall productivity.

In this post, we build a secure enterprise application using AWS Amplify that invokes Amazon SageMaker JumpStart foundation models deployed as Amazon SageMaker endpoints, together with Amazon OpenSearch Service, to show how to implement text-to-text generation, text-to-image generation, and Retrieval Augmented Generation (RAG). You can use this post as a reference to build secure enterprise applications in the Generative AI domain using AWS services.

Solution overview

This solution uses SageMaker JumpStart models to deploy text-to-text, text-to-image, and text embeddings models as SageMaker endpoints. These SageMaker endpoints are consumed in the Amplify React application through Amazon API Gateway and AWS Lambda functions. To protect the application and APIs from inadvertent access, Amazon Cognito is integrated into Amplify React, API Gateway, and Lambda functions. SageMaker endpoints and Lambda are deployed in a private VPC, so the communication from API Gateway to Lambda functions is protected using API Gateway VPC links. The following workflow diagram illustrates this solution.

The workflow includes the following steps:

  1. Initial Setup: SageMaker JumpStart FMs are deployed as SageMaker endpoints, with three endpoints created from SageMaker JumpStart models. The text-to-image model is a Stability AI Stable Diffusion foundation model that will be used for generating images. The text-to-text model used for generating text and deployed in the solution is a Hugging Face Flan T5 XL model. The text-embeddings model, which will be used for generating embeddings to be indexed in Amazon OpenSearch Service or for searching the context for the incoming question, is a Hugging Face GPT-J 6B FP16 embeddings model. Alternative LLMs can be deployed based on the use case and model performance benchmarks. For more information about foundation models, see Getting started with Amazon SageMaker JumpStart.
  2. You access the React application from your computer. The React app has three pages: a page that takes image prompts and displays the image generated; a page that takes text prompts and displays the generated text; and a page that takes a question, finds the context matching the question, and displays the answer generated by the text-to-text model.
  3. The React app, built using Amplify libraries, is hosted on Amplify and served to the user at the Amplify host URL. Amplify provides the hosting environment for the React application. The Amplify CLI is used to bootstrap the Amplify hosting environment and deploy the code into it.
  4. If you have not been authenticated, you will be authenticated against Amazon Cognito using the Amplify React UI library.
  5. When you provide an input and submit the form, the request is processed via API Gateway.
  6. Lambda functions sanitize the user input and invoke the respective SageMaker endpoints. Lambda functions also construct the prompts from the sanitized user input in the respective format expected by the LLM. These Lambda functions also reformat the output from the LLMs and send the response back to the user.
  7. SageMaker endpoints are deployed for text-to-text (Flan T5 XXL), text-to-embeddings (GPTJ-6B), and text-to-image models (Stability AI). Three separate endpoints using the recommended default SageMaker instance types are deployed.
  8. Embeddings for documents are generated using the text-to-embeddings model and these embeddings are indexed into OpenSearch Service. A k-Nearest Neighbor (k-NN) index is enabled to allow searching of embeddings from the OpenSearch Service.
  9. An AWS Fargate job takes documents and segments them into smaller packages, invokes the text-to-embeddings LLM model, and indexes the returned embeddings into OpenSearch Service for searching context as described previously.
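The Lambda-side logic in steps 6 and 8 (sanitizing input, searching for context, and building the prompt) can be sketched as plain Python helpers. This is an illustration under assumptions, not the code from the sample repository: the function names, the length limit, the prompt wording, and the `embedding` field name are hypothetical. The query body follows the general shape of an OpenSearch k-NN query.

```python
import re

MAX_QUESTION_CHARS = 500  # hypothetical limit, not taken from the post


def sanitize(user_input: str) -> str:
    """Replace control characters, collapse whitespace, and truncate."""
    cleaned = re.sub(r"[\x00-\x1f\x7f]", " ", user_input)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned[:MAX_QUESTION_CHARS]


def build_knn_query(embedding, k=3, field="embedding"):
    """Request body for a k-NN search over an indexed vector field.

    `field` must match the vector field name used when the index was
    created; "embedding" is an assumed name.
    """
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(embedding), "k": k}}},
    }


def build_rag_prompt(question: str, contexts) -> str:
    """Combine retrieved passages and the question into a single prompt."""
    context_block = "\n".join(contexts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\n\nAnswer:"
    )


question = sanitize("  What\tis\n a  tort?\x00 ")
query_body = build_knn_query([0.1, 0.2, 0.3], k=2)
prompt = build_rag_prompt(question, ["A tort is a civil wrong."])
print(prompt)
```

Keeping these helpers free of network calls makes them easy to unit test; the endpoint invocation and the OpenSearch request would wrap around them.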

Dataset overview

The dataset used for this solution is pile-of-law within the Hugging Face repository. This dataset is a large corpus of legal and administrative data. For this example, we use train.cc_casebooks.jsonl.xz within this repository. This is a collection of education casebooks curated in a JSONL format as required by the LLMs.

Prerequisites

Before getting started, make sure you have the following prerequisites:

Implement the solution

An AWS CDK project that includes all the architectural components has been made available in this AWS Samples GitHub repository. To implement this solution, do the following:

  1. Clone the GitHub repository to your computer.
  2. Go to the root folder.
  3. Initialize the Python virtual environment.
  4. Install the required dependencies specified in the requirements.txt file.
  5. Initialize AWS CDK in the project folder.
  6. Bootstrap AWS CDK in the project folder.
  7. Using the AWS CDK deploy command, deploy the stacks.
  8. Go to the Amplify folder within the project folder.
  9. Initialize Amplify and accept the defaults provided by the CLI.
  10. Add Amplify hosting.
  11. Publish the Amplify front end from within the Amplify folder and note the domain name provided at the end of run.
  12. On the Amazon Cognito console, add a user to the Amazon Cognito instance that was provisioned with the deployment.
  13. Go to the domain name from step 11 and provide the Amazon Cognito login details to access the application.

Trigger an OpenSearch indexing job

The AWS CDK project deployed a Lambda function named GenAIServiceTxt2EmbeddingsOSIndexingLambda. Navigate to this function on the Lambda console.

Run a test with an empty payload, as shown in the following screenshot.

This Lambda function triggers a Fargate task on Amazon Elastic Container Service (Amazon ECS) running within the VPC. This Fargate task segments the included JSONL file and creates an embeddings index. Each segment’s embedding is the result of invoking the text-to-embeddings LLM endpoint deployed as part of the AWS CDK project.
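The segmentation step can be illustrated with a short sketch. It assumes each JSONL record carries its content under a `text` key and splits on whitespace into overlapping word windows. The window size, overlap, and function names are hypothetical choices, and the actual Fargate task may segment differently.

```python
import json


def segment_text(text, max_words=200, overlap=20):
    """Split text into word windows of at most max_words, overlapping by `overlap`."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    segments = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        segments.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return segments


def iter_jsonl_segments(lines, text_key="text", **kwargs):
    """Yield one dict per segment from an iterable of JSONL lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for i, segment in enumerate(segment_text(record.get(text_key, ""), **kwargs)):
            yield {"segment_id": i, "text": segment}


sample = [json.dumps({"text": " ".join(f"w{i}" for i in range(10))})]
segments = list(iter_jsonl_segments(sample, max_words=4, overlap=1))
print(len(segments))
```

Each yielded segment would then be sent to the text-to-embeddings endpoint and the returned vector indexed alongside the segment text.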

Clean up

To avoid future charges, delete the SageMaker endpoints and stop all Lambda functions. Also, delete the output data in Amazon S3 you created while running the application workflow. You must delete the data in the S3 buckets before you can delete the buckets.

Conclusion

In this post, we demonstrated an end-to-end approach to create a secure enterprise application using Generative AI and RAG. This approach can be used in building secure and scalable Generative AI applications on AWS. We encourage you to deploy the AWS CDK app into your account and build the Generative AI solution.

Additional resources

For more information about Generative AI applications on AWS, refer to the following:


About the Authors

Jay Pillai is a Principal Solutions Architect at Amazon Web Services. As an Information Technology Leader, Jay specializes in artificial intelligence, data integration, business intelligence, and user interface domains. He holds 23 years of extensive experience working with several clients across real estate, financial services, insurance, payments, and market research business domains.

Shikhar Kwatra is an AI/ML Specialist Solutions Architect at Amazon Web Services, working with a leading Global System Integrator. He has earned the title of one of the Youngest Indian Master Inventors with over 500 patents in the AI/ML and IoT domains. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and supports the GSI partner in building strategic industry solutions on AWS. Shikhar enjoys playing guitar, composing music, and practicing mindfulness in his spare time.

Karthik Sonti leads a global team of solution architects focused on conceptualizing, building and launching horizontal, functional and vertical solutions with Accenture to help our joint customers transform their business in a differentiated manner on AWS.


Intelligently search Adobe Experience Manager content using Amazon Kendra


Amazon Kendra is an intelligent search service powered by machine learning (ML). With Amazon Kendra, you can easily aggregate content from a variety of content repositories into an index that lets you quickly search all your enterprise data and find the most accurate answer. Adobe Experience Manager (AEM) is a content management system that’s used for creating website or mobile app content. Many organizations use Adobe Experience Manager (On-Premise) or Adobe Experience Manager (Cloud Service) as their content management platform. Enterprise users need to be able to search for accurate answers easily and securely across content from multiple data sources in the enterprise, including AEM, from content such as assets and pages.

Amazon Kendra customers can now use the Amazon Kendra AEM connector to index pages and assets from AEM. Amazon Kendra supports AEM as a Cloud Service author instances and AEM On-Premise author and publish instances. You can index AEM content and filter the types of content you want to index with the Amazon Kendra AEM On-Premise or Cloud Service connector, and search your data from AEM with Amazon Kendra intelligent search.

This post shows you how to configure the Amazon Kendra AEM connector to index your content and search your AEM assets and pages. The connector also ingests the access control list (ACL) information for each document. The ACL information is used to show search results filtered by what a user has access to.

Solution overview

In our solution, we configure AEM as a data source for an Amazon Kendra search index using the Amazon Kendra AEM connector. Based on the configuration, when the data source is synchronized, the connector crawls and indexes all the content from AEM that was created on or before a specific date. The connector also indexes the access control list (ACL) information for each document. When access control or user context filtering is enabled, the search results of a query made by a user include results only from those documents that the user is authorized to read.

The Amazon Kendra AEM connector can integrate with AWS IAM Identity Center (Successor to AWS Single Sign-On). You first must enable IAM Identity Center and create an organization to sync users and groups from your active directory. The connector will use the user name and group lookup for the user context of the search queries.

Prerequisites

To try out the Amazon Kendra connector for AEM using this post as a reference, you need the following:

Set up OAuth2.0

If you are using AEM On-Premise, set up OAuth 2.0 and generate an SSL certificate to complete the configuration of the Amazon Kendra AEM connector.

The Adobe Granite OAuth 2.0 server implementation (com.adobe.granite.oauth.server) provides the support for OAuth 2.0 server functionalities in AEM.

Enable the OAuth Server authentication handler

By default, AEM won’t enable the OAuth Server authentication handler. To enable it, complete the following steps:

  1. Start the AEM local instance and go to http://localhost:<port>/system/console/configMgr/com.adobe.granite.oauth.server.auth.impl.OAuth2ServerAuthenticationHandler
  2. Change the jaas.ranking.name value to 1100 in the Adobe Granite OAuth Server Authentication Handler section and save the configuration.

The OAuth Server authentication handler is now enabled.

Register the OAuth client

Every external application that requires OAuth authentication must be registered as an OAuth client in AEM. To register the OAuth client, complete the following steps:

  1. On the AEM start page, choose Security and OAuth client.
  2. Enter a name and redirect URI.
  3. Choose Save.

After a successful authorization of an application, the OAuth server will redirect you back to the application with an authorization code to the configured redirect URL.

  4. Copy the client ID and client secret and keep them safe.

The Granite OAuth Server supports the following grant types:

  • Authorization code
  • Refresh token
  • JWT bearer token

For this post, we use OAuth2.0 with the JWT grant type.

The JWT bearer token is mainly used for server-to-server integration. This will help us enable the server-to-server integration without the resource owner interaction; for example, to retrieve or upload files without user interaction.

Generate the JWT token

Complete the following steps to generate the JWT token:

  1. Navigate to localhost and the OAuth client.
  2. Choose Download Private Key.
  3. Choose Download.

Generate the public certificate

Now generate the public certificate from the downloaded private key store. Run the following command and enter the private key password when prompted.

Use the openssl command to extract the public certificate:

openssl pkcs12 -in store.p12 -out store.crt.pem -clcerts -nokeys

Extract the private key:

openssl pkcs12 -in store.p12 -passin pass:notasecret -nocerts -nodes -out store.private.key.txt

Make sure to install openssl and add it to the environment path beforehand.

When you use the private key to configure the Amazon Kendra data source, do not copy the “-----BEGIN PRIVATE KEY-----” and “-----END PRIVATE KEY-----” marker lines. Additionally, remove any whitespace from the private key.
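The cleanup described above can be automated with a small helper. This is a minimal sketch with a hypothetical function name; it assumes the key file is PEM text and keeps only the base64 body between the BEGIN and END markers, which also drops any attribute lines that openssl may write before the key.

```python
import re

# Matches a PEM private key block, with or without a prefix such as "RSA ".
_PEM_KEY = re.compile(
    r"-----BEGIN ([A-Z ]*)PRIVATE KEY-----(.*?)-----END \1PRIVATE KEY-----",
    re.DOTALL,
)


def normalize_private_key(pem_text: str) -> str:
    """Return the key body without marker lines and without any whitespace."""
    match = _PEM_KEY.search(pem_text)
    if match is None:
        raise ValueError("no PEM private key block found")
    return re.sub(r"\s+", "", match.group(2))


example = (
    "Bag Attributes\n"
    "    friendlyName: example\n"
    "-----BEGIN PRIVATE KEY-----\n"
    "QUJD\n"
    "REVG\n"
    "-----END PRIVATE KEY-----\n"
)
print(normalize_private_key(example))
```

In practice you would read store.private.key.txt, pass its contents to the helper, and paste the result into the secret.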

Use the generated ClientId, ClientSecret, and private key to configure the Amazon Kendra AEM data source.

For OAuth client registration, navigate to http://localhost:<port>/libs/granite/oauth/content/clients.html.

Set up SSL

Complete the following steps to set up SSL:

  1. Create the key:
openssl genrsa -aes256 -out <keyFileName>.key 4096
  2. Create a certificate signing request (CSR):
openssl req -sha256 -new -key <keyFileName>.key -out <keyFileName>.csr -subj '/CN=<keyFileName>'
  3. Self-sign the certificate:
openssl x509 -req -days 365 -in <keyFileName>.csr -signkey <keyFileName>.key -out <keyFileName>.crt
  4. Convert the private key to DER format:
openssl pkcs8 -topk8 -inform PEM -outform DER -in <keyFileName>.key -out <keyFileName>.der -nocrypt

Four files will be generated with file names starting with <keyFileName>. We use <keyFileName>.crt and <keyFileName>.der in later steps.

  1. Next, log in to AEM at http://localhost:<port>/aem/start.html.
  2. Choose Tools, Security, and SSL Configuration.
  3. In the Store Credentials section, enter the key store and trust store password.

  1. In the Keys and Certificate section, specify the .der file for Private Key and the .crt file for Certificate.

  1. In the next section, enter the domain (localhost), and leave the port as is.
  2. Choose Done.

AEM will open in the specified new port. For example, https://localhost:8443.

  1. Log in to AEM using HTTPS, use the padlock icon in the browser to view and export the certificate, and name it privateKey.crt.

Now, let’s import the certificate into the keystore path using keytool.

  1. Open a terminal and go to the folder location where privateKey.crt is present and run the following command:
keytool -import -trustcacerts -keystore <JAVA_HOME>/lib/security/cacerts -storepass changeit -noprompt -alias yourAliasName -file privateKey.crt

Be sure to open ports 8443 and 80 in your firewall settings.

  1. Add the certificate privateKey.crt to an Amazon Simple Storage Service (Amazon S3) bucket.

Configure the data source using the Amazon Kendra connector for AEM

You can use an existing index or create a new index to index documents from AEM using the AEM connector. Then complete the following steps. For more information, refer to the Amazon Kendra Developer Guide.

  1. On the Amazon Kendra console, open your index and choose Data sources in the navigation pane.
  2. Choose Add data source.
  3. Under Adobe Experience Manager, choose Add connector.

  1. In the Specify data source details section, enter a name and optionally a description, then choose Next.

  1. In the Define access and security section, select either the AEM On-Premise or AEM as a Cloud Service source type and enter the AEM host URL. You can find the URL in your AEM settings.

If using AEM On-Premise, enter the host URL of the AEM On-Premise server. Then choose Browse S3 and choose the S3 bucket with the SSL certificate.

If using AEM as a Cloud Service, you can use the author URL https://author-xxxxxx-xxxxxxx.adobeaemcloud.com.

  1. Under Authentication, you have two options, Basic authentication and OAuth 2.0 authentication.

If you select Basic authentication, for AWS Secrets Manager secret, choose Create and add a new secret. Then enter a name for the secret, the AEM site user name, and password. The user must have admin permission or be an admin user.

If you select OAuth 2.0 authentication, for AWS Secrets Manager secret, choose Create and add a new secret. Enter a name for the secret, client ID, client secret, and private key. If you use AEM as a Cloud Service, enter a name for the secret, client ID, client secret, private key, organization ID, technical account ID, and Adobe Identity Management System (IMS) host.

  1. Choose Save or Add Secret.
  2. In the Configure VPC and security group section, you can optionally choose to use a VPC. If so, you must add subnets and VPC security groups.
  3. In the Identity crawler section, choose to crawl identity information on users and groups with access to certain documents and store this in the Amazon Kendra principal or identity store.

This is useful for filtering search results based on the user or their group access to documents.

  1. In the IAM section, create a new IAM role or choose an existing IAM role to access repository credentials and index content.
  2. Choose Next.

  1. In the Configure sync settings section, provide information about your sync scope.

You can include the files to be crawled using inclusion patterns or exclude them using exclusion patterns. When you provide a pattern in the Include patterns section, only documents matching that pattern will be crawled. When you provide a pattern in the Exclude patterns section, documents matching that pattern will not be crawled.
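The effect of inclusion and exclusion patterns can be illustrated with a short sketch. It assumes the patterns are regular expressions matched against the document path and that an exclusion match wins when both kinds of pattern match. The function name and sample paths are hypothetical, and this only models the filtering semantics, not the connector itself.

```python
import re


def should_crawl(path, include_patterns=(), exclude_patterns=()):
    """Decide whether a document path passes the include/exclude filters."""
    # Assumption: a document matching any exclusion pattern is skipped,
    # even if it also matches an inclusion pattern.
    if any(re.search(p, path) for p in exclude_patterns):
        return False
    # With inclusion patterns present, only matching documents are crawled.
    if include_patterns:
        return any(re.search(p, path) for p in include_patterns)
    # With no inclusion patterns, everything not excluded is crawled.
    return True


include = [r"\.pdf$"]
exclude = [r"/drafts/"]
for candidate in (
    "/content/dam/reports/q1.pdf",
    "/content/dam/drafts/q2.pdf",
    "/content/site/en/home.html",
):
    print(candidate, should_crawl(candidate, include, exclude))
```

Testing patterns this way against a few representative paths before starting a sync can save a full crawl with the wrong scope.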

  1. If you use AEM On-Premise and the time zone of your server is different than the time zone of the Amazon Kendra AEM connector or index, you can specify the server time zone to align with the AEM connector or index in the Timezone ID section.

The default time zone for AEM On-Premise is the time zone of the Amazon Kendra AEM connector or index. The default time zone for AEM as a Cloud Service is Greenwich Mean Time.

  1. Choose the Sync mode (for this post, select Full sync).

With the Full sync option, every time the sync runs, Amazon Kendra will crawl all documents and ingest each document even if ingested earlier. The full refresh enables you to reset your Amazon Kendra index without the need to delete and create a new data source. If you choose New or modified content sync or New, modified, or deleted content sync, every time the sync job runs, it will process only objects added, modified, or deleted since the last crawl. Incremental crawls can help reduce runtime and cost when used with datasets that append new objects to existing data sources on a regular basis.

  1. For Sync run schedule, choose Run on demand.
  2. Choose Next.

  1. In the Set field mappings section, you can optionally select from the Amazon Kendra generated default data source fields you want to map to your index. To add custom data source fields, choose Add Field to create an index field name to map to and the field data type. Specify the AEM field name, index field name, and data type.

  1. Choose Next.

  1. Review your settings and choose Add data source.

  1. After the data source is added, choose Data sources in the navigation pane, select the newly added data source, and choose Sync now to start data source synchronization with the Amazon Kendra index.

The duration of the sync process depends on the amount of data to be crawled.

Now let’s enable access control for the Amazon Kendra index.

  1. In the navigation pane, choose your index.
  2. On the User access control tab, choose Edit settings.

  1. Change the settings to look like the following screenshot.
  2. Choose Next.

  1. Choose Update.

Wait a few minutes for the index to be updated with the changes. Now let’s see how you can perform intelligent search with Amazon Kendra.

Perform intelligent search with Amazon Kendra

Before you try searching on the Amazon Kendra console or using the API, make sure that the data source sync is complete. To check, view the data sources and verify if the last sync was successful.
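If you prefer to check the sync status programmatically, the following is a minimal sketch built around the response shape of the Amazon Kendra ListDataSourceSyncJobs API. The helper names (last_sync_succeeded, fetch_sync_history) and the sample response are illustrative assumptions, not part of the original walkthrough; the call to AWS is wrapped in its own function so the status check can be tried offline.

```python
from datetime import datetime


def last_sync_succeeded(response):
    """Return True if the most recent sync job in the response succeeded."""
    history = response.get("History", [])
    if not history:
        return False  # no sync job has run yet
    latest = max(history, key=lambda job: job["StartTime"])
    return latest["Status"] == "SUCCEEDED"


def fetch_sync_history(index_id, data_source_id):
    """Call Amazon Kendra. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper above works without AWS access

    kendra = boto3.client("kendra")
    return kendra.list_data_source_sync_jobs(Id=data_source_id, IndexId=index_id)


# Offline example using a hand-written response of the same shape
sample_response = {
    "History": [
        {"ExecutionId": "run-1", "StartTime": datetime(2023, 9, 1), "Status": "FAILED"},
        {"ExecutionId": "run-2", "StartTime": datetime(2023, 9, 2), "Status": "SUCCEEDED"},
    ]
}
print(last_sync_succeeded(sample_response))
```

With real resources, you would pass the result of fetch_sync_history (using your own index ID and data source ID) to last_sync_succeeded.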

Now we’re ready to search our index.

  1. On the Amazon Kendra console, navigate to the index and choose Search indexed content in the navigation pane.
  2. Let’s query the index using “What was the impact of Siberian heat wave?” without providing an access token.

Based on our access control settings in the index, a valid access token is needed to access content the user is allowed to see; therefore, when we use this search query without setting any user name or group, no results are returned.

  1. Next, choose Apply Token and set the user name or user email ID (for example, user-dev@company.com) that has access to AEM content.

While crawling the AEM data source, the connector sets the user email ID as the principal. If the user’s email ID is not available, the user name is set as the principal instead.

The following screenshot shows an example with the user email ID user-dev-2@amazon.com set as principal.

The following example uses user name user-dev-2 set as principal.

  1. Now, let’s try to search the same content with the token of user user-dev@amazon.com, who is not authorized to view this specific document that appeared in the preceding query results.

This confirms that documents ingested by the Amazon Kendra connector for AEM honor the ACLs set by and within AEM, and that these same ACLs are enforced on the search results based on the applied token.
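The same behavior can be explored through the Amazon Kendra Query API, which accepts a UserContext alongside the query text. The following is a rough sketch, assuming the user email ID is passed as UserId; the helper names (build_query_request, run_query) and the <INDEX_ID> placeholder are hypothetical and used only for illustration.

```python
def build_query_request(index_id, query_text, user_id=None, groups=None):
    """Assemble keyword arguments for the Kendra Query API."""
    request = {"IndexId": index_id, "QueryText": query_text}
    user_context = {}
    if user_id:
        user_context["UserId"] = user_id
    if groups:
        user_context["Groups"] = list(groups)
    if user_context:
        # Without a user context, an index with access control enabled
        # only returns content that is not restricted.
        request["UserContext"] = user_context
    return request


def run_query(request):
    """Send the request to Amazon Kendra. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works offline

    return boto3.client("kendra").query(**request)


question = "What was the impact of Siberian heat wave?"
anonymous_request = build_query_request("<INDEX_ID>", question)
user_request = build_query_request("<INDEX_ID>", question, user_id="user-dev@company.com")
print(anonymous_request)
print(user_request)
```

Comparing the results of run_query for the two requests mirrors the console test: the first carries no principal, and the second is filtered by the ACLs attached to that user.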

Clean up

To avoid incurring future costs, clean up the resources you created as part of this solution. If you created a new Amazon Kendra index while testing this solution, delete it. If you only added a new data source using the Amazon Kendra connector for AEM, delete that data source.

Conclusion

With the Amazon Kendra Adobe Experience Manager connector, your organization can search pages and assets securely using intelligent search powered by Amazon Kendra.

To learn more about the Amazon Kendra connector for AEM, refer to Adobe Experience Manager.

For more information on other Amazon Kendra built-in connectors to popular data sources, refer to Amazon Kendra native connectors.


About the Authors

Praveen Edem is a Senior Solutions Architect at Amazon Web Services. He works with major financial services customers, architecting and modernizing their critical large-scale applications while adopting AWS services. He specializes in serverless and container-based workloads. He has over 20 years of IT experience in application development and software architecture.

Manjula Nagineni is a Senior Solutions Architect with AWS based in New York. She works with major financial service institutions, architecting and modernizing their large-scale applications while adopting AWS Cloud services. She is passionate about designing big data workloads cloud-natively. She has over 20 years of IT experience in software development, analytics, and architecture across multiple domains such as finance, manufacturing, and telecom.

Omkar Phadtare is a Software Development Engineer at Amazon Web Services, with a deep-rooted passion for cloud computing. Leveraging his technical expertise and strong understanding of the domain, he designs, develops, and implements cutting-edge, highly scalable, and resilient cloud-based solutions for a diverse range of modern businesses and organizations.

Vijai Gandikota is a Senior Product Manager for Amazon Kendra at Amazon Web Services, responsible for launching Amazon Kendra connectors, Principal Store, Search Analytics Dashboard, and other features of Amazon Kendra. He has over 20 years of experience in designing, developing, and launching products in AI and analytics.

Fine-tune Llama 2 for text generation on Amazon SageMaker JumpStart

Today, we are excited to announce the capability to fine-tune Llama 2 models by Meta using Amazon SageMaker JumpStart. The Llama 2 family of large language models (LLMs) is a collection of pre-trained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Fine-tuned LLMs, called Llama-2-chat, are optimized for dialogue use cases. You can easily try out these models and use them with SageMaker JumpStart, which is a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML. Now you can also fine-tune the 7 billion, 13 billion, and 70 billion parameter Llama 2 text generation models on SageMaker JumpStart using the Amazon SageMaker Studio UI with a few clicks or using the SageMaker Python SDK.

Generative AI foundation models have been the focus of most of the ML and artificial intelligence research and use cases for over a year now. These foundation models perform very well with generative tasks, such as text generation, summarization, question answering, image and video generation, and more, because of their large size and also because they are trained on several large datasets and hundreds of tasks. Despite the great generalization capabilities of these models, there are often use cases that have very specific domain data (such as healthcare or financial services), because of which these models may not be able to provide good results for these use cases. This results in a need for further fine-tuning of these generative AI models over the use case-specific and domain-specific data.

In this post, we walk through how to fine-tune Llama 2 pre-trained text generation models via SageMaker JumpStart.

What is Llama 2

Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. Llama 2 is intended for commercial and research use in English. It comes in a range of parameter sizes—7 billion, 13 billion, and 70 billion—as well as pre-trained and fine-tuned variations. According to Meta, the tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety. Llama 2 was pre-trained on 2 trillion tokens of data from publicly available sources. The tuned models are intended for assistant-like chat, whereas pre-trained models can be adapted for a variety of natural language generation tasks. Regardless of which version of the model a developer uses, the responsible use guide from Meta can assist in guiding additional fine-tuning that may be necessary to customize and optimize the models with appropriate safety mitigations.

Currently, Llama 2 is available in the following regions:

  • Deploy pre-trained model available: "us-west-2", "us-east-1", "us-east-2", "eu-west-1", "ap-southeast-1", "ap-southeast-2"
  • Fine-tune and deploy the fine-tuned model: "us-east-1", "us-west-2", "eu-west-1"

What is SageMaker JumpStart

With SageMaker JumpStart, ML practitioners can choose from a broad selection of publicly available foundation models. ML practitioners can deploy foundation models to dedicated Amazon SageMaker instances from a network isolated environment and customize models using SageMaker for model training and deployment. You can now discover and deploy Llama 2 with a few clicks in SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and MLOps controls with SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your VPC controls, helping ensure data security. In addition, you can fine-tune Llama 2 7B, 13B, and 70B pre-trained text generation models via SageMaker JumpStart.

Fine-tune Llama 2 models

You can fine-tune the models using either the SageMaker Studio UI or SageMaker Python SDK. We discuss both methods in this section.

No-code fine-tuning via the SageMaker Studio UI

In SageMaker Studio, you can access Llama 2 models via SageMaker JumpStart under Models, notebooks, and solutions, as shown in the following screenshot.

If you don’t see Llama 2 models, update your SageMaker Studio version by shutting down and restarting. For more information about version updates, refer to Shut down and Update Studio Apps.

You can also find the other four model variants by choosing Explore all Text Generation Models or searching for llama in the search box.

On this page, you can point to the Amazon Simple Storage Service (Amazon S3) bucket containing the training and validation datasets for fine-tuning. In addition, you can configure the deployment configuration, hyperparameters, and security settings for fine-tuning. You can then choose Train to start the training job on a SageMaker ML instance. The preceding screenshot shows the fine-tuning page for the Llama 2 7B model; however, you can fine-tune the 13B and 70B Llama 2 text generation models similarly using their respective model pages. To use Llama 2 models, you need to accept the End User License Agreement (EULA). It will show up when you choose Train, as shown in the following screenshot. Choose I have read and accept EULA and AUP to start the fine-tuning job.

Deploy the model

After the model is fine-tuned, you can deploy it using the model page on SageMaker JumpStart. The option to deploy the fine-tuned model will appear when fine-tuning is finished, as shown in the following screenshot.

Fine-tune via the SageMaker Python SDK

You can also fine-tune Llama 2 models using the SageMaker Python SDK. The following sample code fine-tunes the Llama 2 7B model on your dataset:

from sagemaker.jumpstart.estimator import JumpStartEstimator

# To fine-tune the 13B/70B model, change model_id to
# `meta-textgeneration-llama-2-13b`/`meta-textgeneration-llama-2-70b`.
model_id = "meta-textgeneration-llama-2-7b"

# S3 URI of your training data (placeholder; replace with your own location).
train_data_location = "s3://<your-bucket>/<your-prefix>/"

# Setting accept_eula to "true" accepts the Llama 2 End User License Agreement.
estimator = JumpStartEstimator(
    model_id=model_id, environment={"accept_eula": "true"}
)
# By default, instruction tuning is set to false. To train on an instruction
# tuning dataset, set instruction_tuned to "True".
estimator.set_hyperparameters(instruction_tuned="True", epoch="5")
estimator.fit({"training": train_data_location})

You can deploy the fine-tuned model directly from the estimator:

finetuned_predictor = estimator.deploy()

You can also find the code in Fine-tune LLaMA 2 models on SageMaker JumpStart. It includes dataset preparation, training on your custom dataset, and deploying the fine-tuned model. It demonstrates fine-tuning on a subset of the Dolly dataset with examples from the summarization task. The following is an example input with responses from the fine-tuned and non-fine-tuned models, along with the ground truth response:

Input to the model:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhen did Felix Luna die?\n\n### Input:\nFélix César Luna (30 September 1925 – 5 November 2009) was an Argentine writer, lyricist and historian.\n\n\n\n### Response:\n

Ground Truth response:

Felix Luna died on November 5th, 2009

Response from the non fine-tuned model:

Félix César Luna (30 September 1925 – 5 November 2009) was an Argentine\n\nWhen did Luna die?\n\n\n### Explanation:\nWe answer the question with the input's date of birth and the date of death.\n\n\n### Solution: 1102\n

Response from the fine-tuned model:

Félix Luna died on November 5th, 2009.\n\n

For performance benchmarking of different models on the Dolly and Dialogsum datasets, refer to the Performance benchmarking section in the appendix at the end of this post.

Fine-tuning technique

Language models such as Llama are more than 10 GB or even 100 GB in size. Fine-tuning such large models requires instances with significantly high CUDA memory. Furthermore, training these models can be very slow due to the size of the model. Therefore, for efficient fine-tuning, we use the following optimizations:

  • Low-Rank Adaptation (LoRA) – This is a type of parameter efficient fine-tuning (PEFT) for efficient fine-tuning of large models. With this method, we freeze the whole model and only add a small set of adjustable parameters or layers into the model. For instance, instead of training all 7 billion parameters for Llama 2 7B, we can fine-tune less than 1% of the parameters. This significantly reduces the memory requirement because we only need to store gradients, optimizer states, and other training-related information for less than 1% of the parameters. Furthermore, this reduces the training time as well as the cost. For more details on this method, refer to LoRA: Low-Rank Adaptation of Large Language Models.
  • Int8 quantization – Even with optimizations such as LoRA, models such as Llama 2 70B are still too big to train. To decrease the memory footprint during training, we can use Int8 quantization. Quantization typically reduces the precision of the floating point data types. Although this decreases the memory required to store model weights, it degrades the performance due to loss of information. Int8 quantization uses only a quarter of the precision (8 bits rather than 32) but doesn’t incur degradation of performance because it doesn’t simply drop the bits; it rounds the data from one type to another. To learn about Int8 quantization, refer to LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale.
  • Fully Sharded Data Parallel (FSDP) – This is a type of data-parallel training algorithm that shards the model’s parameters across data parallel workers and can optionally offload part of the training computation to the CPUs. Although the parameters are sharded across different GPUs, computation of each microbatch is local to the GPU worker. It shards parameters more uniformly and achieves optimized performance via communication and computation overlapping during training.
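To make the first two optimizations more concrete, the following is a simplified sketch in plain Python. It counts the trainable parameters that LoRA adds for a single weight matrix, and it shows absmax rounding to 8-bit integers. The 4096 x 4096 matrix size and the sample weights are illustrative assumptions, and the rounding shown here is a basic scheme rather than the full LLM.int8() method, which also handles outlier values separately.

```python
def lora_trainable_fraction(d_out, d_in, rank):
    """Ratio of LoRA's trainable parameters to the frozen matrix's parameters.

    LoRA keeps the original d_out x d_in matrix frozen and learns two small
    matrices, B (d_out x rank) and A (rank x d_in), whose product is added to it.
    """
    frozen = d_out * d_in
    trainable = rank * (d_out + d_in)
    return trainable / frozen


def absmax_quantize(values):
    """Map floats to integers in [-127, 127] by scaling with the largest magnitude."""
    largest = max(abs(v) for v in values)
    scale = largest / 127 if largest else 1.0
    return [round(v / scale) for v in values], scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Illustrative 4096 x 4096 projection with rank 8
fraction = lora_trainable_fraction(4096, 4096, 8)
print(f"LoRA trains {fraction:.2%} as many parameters as the frozen matrix holds")

weights = [0.82, -1.27, 0.003, 0.41]
codes, scale = absmax_quantize(weights)
print(codes)
print([round(w, 3) for w in dequantize(codes, scale)])
```

The round trip shows why rounding loses little information: each recovered value differs from the original by at most half of the scale step.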

The following table compares different methods with the three Llama 2 models.

Model | Default Instance Type | Supported Instance Types with Default Configuration | Default Setting | LORA + FSDP | LORA + No FSDP | Int8 Quantization + LORA + No FSDP
Llama 2 7B | ml.g5.12xlarge | ml.g5.12xlarge, ml.g5.24xlarge, ml.g5.48xlarge | LORA + FSDP | Yes | Yes | Yes
Llama 2 13B | ml.g5.12xlarge | ml.g5.24xlarge, ml.g5.48xlarge | LORA + FSDP | Yes | Yes | Yes
Llama 2 70B | ml.g5.48xlarge | ml.g5.48xlarge | INT8 + LORA + NO FSDP | No | No | Yes

Note that fine-tuning of Llama models is based on scripts provided by the following GitHub repo.

Training dataset format

SageMaker JumpStart currently supports datasets in both domain adaptation format and instruction tuning format. In this section, we specify an example dataset in both formats. For more details, refer to the Dataset formatting section in the appendix.

Domain adaptation format

The text generation Llama 2 model can be fine-tuned on any domain-specific dataset. After it’s fine-tuned on the domain-specific dataset, the model is expected to generate domain-specific text and solve various NLP tasks in that specific domain with few-shot prompting. With this dataset, input consists of a CSV, JSON, or TXT file. For instance, input data may be SEC filings of Amazon as a text file:

This report includes estimates, projections, statements relating to our
business plans, objectives, and expected operating results that are “forward-
looking statements” within the meaning of the Private Securities Litigation
Reform Act of 1995, Section 27A of the Securities Act of 1933, and Section 21E
of the Securities Exchange Act of 1934. Forward-looking statements may appear
throughout this report, including the following sections: “Business” (Part I,
Item 1 of this Form 10-K), “Risk Factors” (Part I, Item 1A of this Form 10-K),
and “Management’s Discussion and Analysis of Financial Condition and Results
of Operations” (Part II, Item 7 of this Form 10-K). These forward-looking
statements generally are identified by the words “believe,” “project,”
“expect,” “anticipate,” “estimate,” “intend,” “strategy,” “future,”
“opportunity,” “plan,” “may,” “should,” “will,” “would,” “will be,” “will
continue,” “will likely result,” and similar expressions.

Instruction tuning format

In instruction fine-tuning, the model is fine-tuned for a set of natural language processing (NLP) tasks described using instructions. This helps improve the model’s performance for unseen tasks with zero-shot prompts. In instruction tuning dataset format, you specify the template.json file describing the input and the output formats. For instance, each line in the file train.jsonl looks like the following:

{"instruction": "What is a dispersive prism?", 
"context": "In optics, a dispersive prism is an optical prism that is used to disperse light, that is, to separate light into its spectral components (the colors of the rainbow). Different wavelengths (colors) of light will be deflected by the prism at different angles. This is a result of the prism material's index of refraction varying with wavelength (dispersion). Generally, longer wavelengths (red) undergo a smaller deviation than shorter wavelengths (blue). The dispersion of white light into colors by a prism led Sir Isaac Newton to conclude that white light consisted of a mixture of different colors.", 
"response": "A dispersive prism is an optical prism that disperses the light's different wavelengths at different angles. When white light is shined through a dispersive prism it will separate into the different colors of the rainbow."}

The additional file template.json looks like the following:

{
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": " {response}"
}
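To illustrate how the two files fit together, the following sketch fills the template placeholders with the fields of one train.jsonl record. It assumes the placeholders are substituted in the style of Python's str.format, which is a guess about the training script rather than a documented behavior; the record text is abbreviated, and render_example is a hypothetical helper.

```python
import json

# Same structure as template.json, with the line breaks written as \n escapes
template = {
    "prompt": (
        "Below is an instruction that describes a task, paired with an input "
        "that provides further context. Write a response that appropriately "
        "completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n"
    ),
    "completion": " {response}",
}

# One record from train.jsonl (context and response abbreviated for the example)
record = {
    "instruction": "What is a dispersive prism?",
    "context": "In optics, a dispersive prism is an optical prism that is used to disperse light.",
    "response": "A dispersive prism is an optical prism that disperses the light's different wavelengths at different angles.",
}


def render_example(template, record):
    """Fill the template placeholders with the fields of one record."""
    prompt = template["prompt"].format(**record)
    completion = template["completion"].format(**record)
    return prompt, completion


prompt, completion = render_example(template, record)
print(prompt + completion)

# Each record is stored as a single line of JSON in train.jsonl
print(json.dumps(record))
```

Keeping the template separate from the records means the same dataset can be reused with a different prompt wording by changing only template.json.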

Supported hyperparameters for training

Llama 2 fine-tuning supports a number of hyperparameters, each of which can impact the memory requirement, training speed, and performance of the fine-tuned model:

  • epoch – The number of passes that the fine-tuning algorithm takes through the training dataset. Must be an integer greater than 1. Default is 5.
  • learning_rate – The rate at which the model weights are updated after working through each batch of training examples. Must be a float greater than 0. Default is 1e-4.
  • instruction_tuned – Whether to instruction-train the model or not. Must be 'True' or 'False'. Default is 'False'.
  • per_device_train_batch_size – The batch size per GPU core/CPU for training. Must be a positive integer. Default is 4.
  • per_device_eval_batch_size – The batch size per GPU core/CPU for evaluation. Must be a positive integer. Default is 1.
  • max_train_samples – For debugging purposes or quicker training, truncate the number of training examples to this value. Value -1 means using all of the training samples. Must be a positive integer or -1. Default is -1.
  • max_val_samples – For debugging purposes or quicker training, truncate the number of validation examples to this value. Value -1 means using all of the validation samples. Must be a positive integer or -1. Default is -1.
  • max_input_length – Maximum total input sequence length after tokenization. Sequences longer than this will be truncated. If -1, max_input_length is set to the minimum of 1024 and the maximum model length defined by the tokenizer. If set to a positive value, max_input_length is set to the minimum of the provided value and the model_max_length defined by the tokenizer. Must be a positive integer or -1. Default is -1.
  • validation_split_ratio – If the validation channel is not provided, the ratio of the train-validation split from the training data. Must be between 0 and 1. Default is 0.2.
  • train_data_split_seed – If validation data is not present, this fixes the random splitting of the input training data to training and validation data used by the algorithm. Must be an integer. Default is 0.
  • preprocessing_num_workers – The number of processes to use for preprocessing. If None, the main process is used for preprocessing. Default is None.
  • lora_r – Lora R. Must be a positive integer. Default is 8.
  • lora_alpha – Lora Alpha. Must be a positive integer. Default is 32.
  • lora_dropout – Lora Dropout. Must be a positive float between 0 and 1. Default is 0.05.
  • int8_quantization – If True, the model is loaded with 8-bit precision for training. Default for 7B and 13B is False. Default for 70B is True.
  • enable_fsdp – If True, training uses FSDP. Default for 7B and 13B is True. Default for 70B is False. Note that int8_quantization is not supported with FSDP.
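The defaults and the one stated constraint (int8_quantization is not supported with FSDP) can be captured in a small helper before calling set_hyperparameters. The following is a sketch only: build_hyperparameters is a hypothetical function, it covers a subset of the hyperparameters listed above, and passing values as strings mirrors the SDK sample earlier in this post rather than a documented requirement.

```python
def default_parallelism(model_size):
    """Defaults from the list above: 70B uses Int8 without FSDP, 7B and 13B use FSDP."""
    if model_size == "70b":
        return {"int8_quantization": "True", "enable_fsdp": "False"}
    if model_size in ("7b", "13b"):
        return {"int8_quantization": "False", "enable_fsdp": "True"}
    raise ValueError(f"Unknown model size: {model_size}")


def build_hyperparameters(model_size, **overrides):
    """Merge the listed defaults with overrides and reject Int8 combined with FSDP."""
    params = {
        "epoch": "5",
        "learning_rate": "1e-4",
        "instruction_tuned": "False",
        "per_device_train_batch_size": "4",
        "max_input_length": "-1",
        "lora_r": "8",
        "lora_alpha": "32",
        "lora_dropout": "0.05",
    }
    params.update(default_parallelism(model_size))
    params.update({name: str(value) for name, value in overrides.items()})
    if params["int8_quantization"] == "True" and params["enable_fsdp"] == "True":
        raise ValueError("int8_quantization is not supported with FSDP")
    return params


# Int8 on the 7B model requires turning FSDP off explicitly
params_7b_int8 = build_hyperparameters(
    "7b", int8_quantization=True, enable_fsdp=False, max_input_length=1024
)
print(params_7b_int8)
```

The resulting dictionary could then be passed as estimator.set_hyperparameters(**params_7b_int8).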

Instance types and compatible hyperparameters

The memory requirement during fine-tuning may vary based on several factors:

  • Model type – The 7B model has the least GPU memory requirement and 70B has the largest memory requirement
  • Max input length – A higher value of input length leads to processing more tokens at a time and as such requires more CUDA memory
  • Batch size – A larger batch size requires larger CUDA memory and therefore requires larger instance types
  • Int8 quantization – If using Int8 quantization, the model is loaded into low precision and therefore requires less CUDA memory

To help you get started, we provide a set of combinations of different instance types, hyperparameters, and model types that can be successfully fine-tuned. You can select a configuration as per your requirements and availability of instance types. We fine-tune all three models on a variety of settings with three epochs on a subset of the Dolly dataset with summarization examples.

7B model

The following table summarizes the fine-tuning options on the 7B model.

Instance Type | Max Input Len | Per Device Batch Size | Int8 Quantization | Enable FSDP | Time Taken (mins)
ml.g4dn.12xlarge | 1024 | 8 | TRUE | FALSE | 166
ml.g4dn.12xlarge | 2048 | 2 | TRUE | FALSE | 178
ml.g4dn.12xlarge | 1024 | 4 | FALSE | TRUE | 120
ml.g4dn.12xlarge | 2048 | 2 | FALSE | TRUE | 143
ml.g5.2xlarge | 1024 | 4 | TRUE | FALSE | 61
ml.g5.2xlarge | 2048 | 2 | TRUE | FALSE | 68
ml.g5.2xlarge | 1024 | 4 | FALSE | TRUE | 43
ml.g5.2xlarge | 2048 | 2 | FALSE | TRUE | 49
ml.g5.4xlarge | 1024 | 4 | FALSE | TRUE | 39
ml.g5.4xlarge | 2048 | 2 | FALSE | TRUE | 50
ml.g5.12xlarge | 1024 | 16 | TRUE | FALSE | 57
ml.g5.12xlarge | 2048 | 4 | TRUE | FALSE | 64
ml.g5.12xlarge | 1024 | 4 | FALSE | TRUE | 26
ml.g5.12xlarge | 2048 | 4 | FALSE | TRUE | 23
ml.g5.48xlarge | 1024 | 16 | TRUE | FALSE | 59
ml.g5.48xlarge | 2048 | 4 | TRUE | FALSE | 67
ml.g5.48xlarge | 1024 | 8 | FALSE | TRUE | 22
ml.g5.48xlarge | 2048 | 4 | FALSE | TRUE | 21

13B model

The following table summarizes the fine-tuning options on the 13B model.

Instance Type | Max Input Len | Per Device Batch Size | Int8 Quantization | Enable FSDP | Time Taken (mins)
ml.g4dn.12xlarge | 1024 | 4 | TRUE | FALSE | 283
ml.g4dn.12xlarge | 2048 | 2 | TRUE | FALSE | 328
ml.g5.12xlarge | 1024 | 8 | TRUE | FALSE | 92
ml.g5.12xlarge | 2048 | 4 | TRUE | FALSE | 104
ml.g5.48xlarge | 1024 | 8 | TRUE | FALSE | 95
ml.g5.48xlarge | 2048 | 4 | TRUE | FALSE | 107
ml.g5.48xlarge | 1024 | 8 | FALSE | TRUE | 35
ml.g5.48xlarge | 2048 | 2 | FALSE | TRUE | 41

70B model

The following table summarizes the fine-tuning options on the 70B model.

Instance Type | Max Input Len | Per Device Batch Size | Int8 Quantization | Enable FSDP | Time Taken (mins)
ml.g5.48xlarge | 1024 | 4 | TRUE | FALSE | 396
ml.g5.48xlarge | 2048 | 1 | TRUE | FALSE | 454

Recommendations on instance types and hyperparameters

Regarding the accuracy of the fine-tuned model, keep in mind the following:

  • Larger models such as 70B provide better performance than 7B
  • Performance without Int8 quantization is better than performance with Int8 quantization

Note the following training time and CUDA memory requirements:

  • Setting int8_quantization=True decreases the memory requirement. Note that in the preceding tables, the runs with Int8 quantization (and FSDP disabled) took longer than the runs with FSDP enabled on the same instance type.
  • Decreasing per_device_train_batch_size and max_input_length reduces the memory requirement, so training can run on smaller instances. However, setting very low values may increase the training time.
  • If you’re not using Int8 quantization (int8_quantization=False), use FSDP (enable_fsdp=True) for faster and more efficient training.

When choosing the instance type, consider the following:

  • G5 instances provide the most efficient training among the instance types supported. Therefore, if you have G5 instances available, you should use them.
  • Training time largely depends on the number of GPUs and the CUDA memory available. Therefore, training time on instances with the same number of GPUs (for example, ml.g5.2xlarge and ml.g5.4xlarge) is roughly the same, and you can use the cheaper instance (ml.g5.2xlarge).
  • When using p3 instances, training will be done with 32-bit precision because bfloat16 is not supported on these instances. Therefore, the training job will consume double the amount of CUDA memory when training on p3 instances compared to g5 instances.

To learn about the cost of training per instance, refer to Amazon EC2 G5 Instances.

If the dataset is in instruction tuning format and the input plus completion sequences are short (such as 50–100 words), then a high value of max_input_length leads to very poor performance. The default value of this parameter is -1, which corresponds to a max_input_length of 2048 for Llama models. Therefore, if your dataset contains short samples, we recommend using a small value for max_input_length (such as 200–400).

Lastly, due to high demand for G5 instances, you may experience unavailability of these instances in your Region with the error “CapacityError: Unable to provision requested ML compute capacity. Please retry using a different ML instance type.” If you experience this error, retry the training job or try a different Region.
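One way to automate the retry is to loop over a list of candidate instance types and move on only when the failure looks like a capacity error. The following is a minimal sketch under that assumption: fit_with_fallback is a hypothetical helper, the capacity check simply looks for the text "CapacityError" in the exception message, and the training call is simulated so the example runs without AWS access.

```python
def fit_with_fallback(start_training, instance_types, is_capacity_error=None):
    """Try each instance type in order until training starts without a capacity error.

    start_training is any callable that takes an instance type and launches the job,
    for example by creating an estimator with that instance type and calling fit.
    """
    if is_capacity_error is None:
        is_capacity_error = lambda exc: "CapacityError" in str(exc)
    last_error = None
    for instance_type in instance_types:
        try:
            return instance_type, start_training(instance_type)
        except Exception as exc:
            if not is_capacity_error(exc):
                raise  # unrelated failures should surface immediately
            last_error = exc
    raise RuntimeError(f"No capacity for any of: {instance_types}") from last_error


# Simulated launcher: the first instance type has no capacity
def simulated_training(instance_type):
    if instance_type == "ml.g5.12xlarge":
        raise RuntimeError(
            "CapacityError: Unable to provision requested ML compute capacity."
        )
    return "training started"


used, result = fit_with_fallback(
    simulated_training, ["ml.g5.12xlarge", "ml.g5.24xlarge", "ml.g5.48xlarge"]
)
print(used, result)
```

When choosing the candidate list, keep to the instance types supported for the model size, as summarized in the tables earlier in this post.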

Issues when fine-tuning very large models

In this section, we discuss two issues when fine-tuning very large models.

Disable output compression

By default, the output of a training job is a trained model that is compressed in a .tar.gz format before it’s uploaded to Amazon S3. However, due to the large size of the model, this step can take a long time. For example, compressing and uploading the 70B model can take more than 4 hours. To avoid this issue, you can use the disable output compression feature supported by the SageMaker training platform. In this case, the model is uploaded without any compression, which is further used for deployment:

estimator = JumpStartEstimator(
    model_id=model_id,
    environment={"accept_eula": "true"},
    disable_output_compression=True,
)

SageMaker Studio kernel timeout issue

Due to the size of the Llama 70B model, the training job may take several hours and the SageMaker Studio kernel may die during the training phase. However, during this time, training is still running in SageMaker. If this happens, you can still deploy the endpoint using the training job name with the following code:

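from sagemaker.jumpstart.estimator import JumpStartEstimator

# Replace the placeholder with the name of your training job.
training_job_name = "<INSERT_TRAINING_JOB_NAME>"
# Use the same model_id that was used to start the training job.
model_id = "meta-textgeneration-llama-2-70b"

attached_estimator = JumpStartEstimator.attach(training_job_name, model_id)
attached_estimator.logs()
attached_estimator.deploy()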

To find the training job name, navigate to the SageMaker console and under Training in the navigation pane, choose Training jobs. Identify the training job name and substitute it in the preceding code.
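As an alternative to the console, the training job name can be looked up with the SageMaker ListTrainingJobs API. The following sketch assumes the job name contains a recognizable substring (the "llama" filter is a guess about your job naming, not a guarantee); latest_training_job_name and fetch_training_jobs are hypothetical helpers, and the sample response is hand-written so the selection logic can be run offline.

```python
from datetime import datetime


def latest_training_job_name(response, status=None):
    """Pick the most recently created job from a ListTrainingJobs response."""
    summaries = response.get("TrainingJobSummaries", [])
    if status is not None:
        summaries = [s for s in summaries if s["TrainingJobStatus"] == status]
    if not summaries:
        return None
    latest = max(summaries, key=lambda s: s["CreationTime"])
    return latest["TrainingJobName"]


def fetch_training_jobs(name_contains):
    """Call SageMaker. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper above works offline

    sagemaker_client = boto3.client("sagemaker")
    return sagemaker_client.list_training_jobs(
        NameContains=name_contains,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=10,
    )


# Offline example using a hand-written response of the same shape
sample_jobs = {
    "TrainingJobSummaries": [
        {
            "TrainingJobName": "example-llama-job-1",
            "CreationTime": datetime(2023, 9, 1, 8, 0),
            "TrainingJobStatus": "Completed",
        },
        {
            "TrainingJobName": "example-llama-job-2",
            "CreationTime": datetime(2023, 9, 2, 8, 0),
            "TrainingJobStatus": "InProgress",
        },
    ]
}
print(latest_training_job_name(sample_jobs))
print(latest_training_job_name(sample_jobs, status="Completed"))
```

With real resources, passing the result of fetch_training_jobs("llama") to latest_training_job_name gives a value to use as training_job_name when attaching the estimator.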

Conclusion

In this post, we discussed fine-tuning Meta’s Llama 2 models using SageMaker JumpStart. We showed that you can use the SageMaker JumpStart console in SageMaker Studio or the SageMaker Python SDK to fine-tune and deploy these models. We also discussed the fine-tuning technique, instance types, and supported hyperparameters. In addition, we outlined recommendations for optimized training based on various tests we carried out. The results for fine-tuning the three models over two datasets are shown in the appendix at the end of this post. As we can see from these results, fine-tuning improves summarization compared to non-fine-tuned models. As a next step, you can try fine-tuning these models on your own dataset using the code provided in the GitHub repository to test and benchmark the results for your use cases.

The authors would like to acknowledge the technical contributions of Christopher Whitten, Xin Huang, Kyle Ulrich, Sifei Li, Amy You, Adam Kozdrowicz, Evan Kravitz, Benjamin Crabtree, Haotian An, Manan Shah, Tony Cruz, Ernev Sharma, Jonathan Guinegagne, and June Won.


About the Authors

Dr. Vivek Madan is an Applied Scientist with the Amazon SageMaker JumpStart team. He got his PhD from University of Illinois at Urbana-Champaign and was a Post Doctoral Researcher at Georgia Tech. He is an active researcher in machine learning and algorithm design and has published papers in EMNLP, ICLR, COLT, FOCS, and SODA conferences.

Dr. Farooq Sabir is a Senior Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. He holds PhD and MS degrees in Electrical Engineering from the University of Texas at Austin and an MS in Computer Science from Georgia Institute of Technology. He has over 15 years of work experience and also likes to teach and mentor college students. At AWS, he helps customers formulate and solve their business problems in data science, machine learning, computer vision, artificial intelligence, numerical optimization, and related domains. Based in Dallas, Texas, he and his family love to travel and go on long road trips.

Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker JumpStart and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.


Appendix

This appendix provides additional information about performance benchmarking and dataset formatting.

Performance benchmarking

In this section, we provide results for fine-tuning the three Llama 2 models (7B, 13B, and 70B) on two different datasets: Dolly and Dialogsum. For the Dolly dataset, our task is to summarize a paragraph of text, whereas for Dialogsum, we are fine-tuning the model to summarize a discussion between two people. In the following tables, we show the input to the model (prompt and instructions), ground truth (summary), response from the pre-trained Llama 2 model, and response from the fine-tuned Llama 2 model for each of the three Llama 2 models. We show inference results for five data points. The following tables show that the summaries improve for both datasets when we fine-tune the models.

  • Results for fine-tuning the Llama 2 7B text generation model on the Dolly dataset:
Inputs Ground Truth Response from Non-Fine-Tuned Model Response from Fine-Tuned Model
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.nn### Instruction:nWhere is Allauchnn### Input:nAllauch (French pronunciation: ; Occitan: Alaug) is a French commune situated east of Marseille in the department of Bouches-du-Rhône in the Provence-Alpes-Côte d’Azur region of France.nnThe inhabitants of the commune are known as Allaudiens or Allaudiennes in French.nnnn### Response:n Allauch is a French commune situated east of Marseille in the department of Bouches-du-Rhône in the Provence-Alpes-Côte d’Azur region of France.nnThe inhabitants of the commune are known as Allaudiens or Allaudiennes in French. Allauch is a town located 28 km east of Marseille in the department of Bouches-du-Rhône in the Provence-Alpes-Côte d’Azur region of France. It is considered to be one of the most beautiful villages in France. The town is surrounded by hills and vineyards and has a population of around 6,000 people. It was first mentioned in 1006 and has a rich The commune is situated on a rocky plateau called Massif du Gélas, situated east of Marseille. It has a beautiful view of Marseille and its coastline. Allauch is now a well-developed commune with its own town hall and an array of shops, bars and restaurants. Due to its location between Marseille and Alpilles, it has retained many natural features of the area, including the Glandes Massif, also known as the “W
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is an egg laying mammal?\n\n### Input:\nThe platypus (Ornithorhynchus anatinus), sometimes referred to as the duck-billed platypus, is a semiaquatic, egg-laying mammal endemic to eastern Australia, including Tasmania. The platypus is the sole living representative or monotypic taxon of its family (Ornithorhynchidae) and genus (Ornithorhynchus), though a number of related species appear in the fossil record.\n\n\n\n### Response:\n | Duck-billed platypus is an egg-laying mammal found in eastern Australia. It is the sole living representative in its genus Ornithorhynchus. | \nI did quite a bit of research on your question. I read all the information that you provided in your second paragraph. I also read a bit about platypus, including [this page](https://phys.org/news/2019-04-australia-platypus-predator-species.html). I think I found some answers to your question. Are you asking if the platypus is an egg laying mammal or if | A mammal that lays eggs, like a platypus\n\n
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhen did Felix Luna die?\n\n### Input:\nFélix César Luna (30 September 1925 – 5 November 2009) was an Argentine writer, lyricist and historian.\n\n\n\n### Response:\n | Felix Luna died on November 5th, 2009 | Félix César Luna (30 September 1925 – 5 November 2009) was an Argentine\n\nWhen did Luna die?\n\n\n### Explanation:\nWe answer the question with the input’s date of birth and the date of death.\n\n\n### Solution: 1102\n | Félix Luna died on November 5th, 2009.\n\n
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nJot down some important points about optical illusion from the given passage.\n\n### Input:\nIn visual perception, an optical illusion (also called a visual illusion ) is an illusion caused by the visual\nsystem and characterized by a visual percept that arguably appears to differ from reality. Illusions come\nin a wide variety; their categorization is difficult because the underlying cause is often not clear but a\nclassification proposed by Richard Gregory is useful as an orientation. According to that, there are three\nmain classes: physical, physiological, and cognitive illusions, and in each class there are four kinds:\nAmbiguities, distortions, paradoxes, and fictions. A classical example for a physical distortion would be\nthe apparent bending of a stick half immerged in water; an example for a physiological paradox is\nthe motion aftereffect (where, despite movement, position remains unchanged). An example for a\nphysiological fiction is an afterimage. Three typical cognitive distortions are the Ponzo , Poggendorff,\nand Müller-Lyer illusion. Physical illusions are caused by the physical environment, e.g. by the optical\nproperties of water. Physiological illusions arise in the eye or the visual pathway, e.g. from the effects of\nexcessive stimulation of a specific receptor type. Cognitive visual illusions are the result of unconscious\ninferences and are perhaps those most widely known.\n\n\n\n### Response:\n | 1. Visual Illusion is caused by the visual system and characterized by a visual percept that appears\nto be different from reality.\n2. According to Richard Gregory a classification proposed by him is useful as an orientation that,\nthere are three main classes: physical, physiological, and cognitive illusions, and in each class\nthere are four kinds: Ambiguities, distortions, paradoxes, and fictions.\n3. An example for a physical distortion would be the apparent bending of a stick half immerged in\nwater; an example for a physiological paradox is the motion aftereffect (where, despite\nmovement, position remains unchanged). \n4. An example for a physiological fiction is an afterimage\n5. Three cognitive distortions are the Ponzo , Poggendorff, and Müller-Lyer illusion.\n6. Physical illusions are caused by the physical environment, e.g. by the optical properties of\nwater. Physiological illusions arise in the eye or the visual pathway.\n7. The most widely know illusion is visual illusions which are the result of unconscious inferences. | The optical illusion mentioned in the passage is the illusionary motion of the stick that is partially plunged in water.\n\n## 50.\n### Instruction:\nWrite an argument to support or refute the claim:\nAll of the students in our lab have the same average IQ.\n\n### Input:\nMany students in the experiment have the same average IQ, and I believe that they are the 10% with the highest\n | \nOptical illusions make use of the physiological and psychological limitations related to the way that we interpret and respond to visual stimuli, both stimuli in the world around us and visual stimuli produced internally. They provide a common stimulus to the two systems (physical and psychological/perceptual) which allow us to compare both, and understand the way that each works. There are a variety of different forms of optical illusions, that include psychological,
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nSummarize The Innovator’s Dilemma in a few sentence\n\n### Input:\nIt describes how large incumbent companies lose market share by listening to their customers and providing what appears to be the highest-value products, but new companies that serve low-value customers with poorly developed technology can improve that technology incrementally until it is good enough\n\n\n\n### Response:\n | The Innovator’s Dilemma: When New Technologies Cause Great Firms to Fail, first published in 1997, is the best-known work of the Harvard professor and businessman Clayton Christensen. It expands on the concept of disruptive technologies, a term he coined in a 1995 article “Disruptive Technologies: Catching the Wave”. It describes how large incumbent companies lose market share by listening to their customers and providing what appears to be the highest-value products, but new companies that serve low-value customers with poorly developed technology can improve that technology incrementally until it is good enough to quickly take market share from established business. Christensen recommends that large companies maintain small, nimble divisions that attempt to replicate this phenomenon internally to avoid being blindsided and overtaken by startup competitors. | Hey there,\n \nSorry for the late response!! I was on vacation this past week :)\n\nThe Innovator’s Dilemma by Clayton Christensen explains that “the innovator’s dilemma is that a company’s pursuit of sustained profits by introducing or improving its products to provide more value to existing customers will result in those same customers abandoning the company to buy innovations that offer at least some value | The Innovator’s Dilemma is a book that explains how large incumbent companies lose market share by listening to their customers and providing what appears to be the highest-value products, but new companies that serve low-value customers with poorly developed technology can improve that technology incrementally until it is good enough \n\n
  • Results for fine-tuning the Llama 2 13B text generation model on the Dolly dataset:
Inputs | Ground Truth | Response from Non-Fine-Tuned Model | Response from Fine-Tuned Model
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWho is the next coach at Fc Bayern München to take over from Julian Nagelsmann?\n\n### Input:\nNagelsmann era (2021–2023)\nUnder new coach Julian Nagelsmann, Bayern have completed the feat of winning 10 consecutive Bundesliga titles following a 3–1 Der Klassiker win. However, the team unexpectedly lost to Villarreal in the Champions League quarter-finals, going out at that stage for the second year in a row. On 24 March 2023, Nagelsmann was released by Bayern and replaced by Thomas Tuchel, who received a contract until 30 June 2025.\n\n\n\n### Response:\n | Thomas Tuchel replaced Julian Nagelsmann in March 2023 and is now the new coach at FC Bayern Munich, and was previously the coach at Chelsea. | \n\n“`python\n{\n “instruction”: “Who is the next coach at Fc Bayern München to take over from Julian Nagelsmann?”, \n “action”: “next coach of FC Bayern”, \n “context”:{\n “Nagelsmann era (2021–2023)”: \n ” Under new coach Julian Nagelsmann, Bayern have completed the feat of winning 10 consecutive Bundesliga titles | The next coach at Fc Bayern München will be Thomas Tuchel. Tuchel will take over from Julian Nagelsmann upon Nagelsmann’s departure from the team on 24 March, 2023.\n\n
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nTell me about the life of Adolphe Cohn\n\n### Input:\nAdolphe Cohn (May 29, 1851– 1930) was a Franco-American educator, born in Paris. He was graduated “bachelier ès lettres” from the University of Paris in 1868, and studied law, historical criticism, and philology at various institutions of higher learning in Paris, receiving the degrees of LL.B. in 1873. A pupil of the École des Chartes, his thesis was called Vues sur l’histoire de l’organisation judiciaire en France du IXe au XIIIe siècle considérée au point de vue des juridictions extraordinaires and he got the diploma of “archiviste paléographe” in 1874. At the commencement of the Franco-Prussian War in 1870, he enlisted and served in the French army throughout the struggle.\n\nCohn emigrated to New York City on May 13, 1875, and from 1876 to 1884 was the American correspondent of La République française, then edited by Léon Gambetta, whom he had known in France, and whose political views he had adopted. In March, 1882, Cohn was appointed tutor in French at Columbia College, and soon afterward made an instructor. By a popular vote of the French residents of New York he was chosen to deliver the funeral oration on Gambetta in 1883 at Tammany Hall, and in 1885 was called from Cambridge, Massachusetts, for a similar purpose, upon the death of Victor Hugo.\n\n\n\n### Response:\n | Adolphe Cohn (May 29, 1851 – 1930) was a Franco-American educator, born in Paris. He was graduated “bachelier es lettres” from the University of Paris in 1868, and studied law, historical criticism, and philology at various institutions of higher learning in Paris, receiving the degrees of LL.B. in 1873. \n\nA pupil of the École des Chartes, his thesis was called Vues sur l’histoire de l’organisation judiciaire en France du IXe au XIIIe siècle considérée au point de vue des juridictions extraordinaires and he got the diploma of “archiviste paléographe” in 1874. At the commencement of the Franco-Prussian War in 1870, he enlisted and served in the French army throughout the struggle.\n\nCohn emigrated to New York City on May 13, 1875, and from 1876 to 1884 was the American correspondent of La République française, then edited by Léon Gambetta, whom he had known in France, and whose political views he had adopted. In March, 1882, Cohn was appointed tutor in French at Columbia College, and soon afterward made an instructor. By a popular vote of the French residents of New York he was chosen to deliver the funeral oration on Gambetta in 1883 at Tammany Hall, and in 1885 was called from Cambridge, Massachusetts, for a similar purpose, upon the death of Victor Hugo. | \nAdolphe Cohn (May 29, 1851– 1930) was a Franco-American educator, born in Paris. He was graduated “bachelier ès lettres” from the University of Paris in 1868, and studied law, historical criticism, and philology at various institutions of higher learning in Paris, receiving the degrees of LL.B. in 1873. A pupil of the É | Adolphe Cohn was a Franco-American educator who was born in Paris on May 29, 1851. He attended the University of Paris and earned a law degree in 1873. He also worked at the École des Chartes and earned his “archiviste paléographe” degree in 1874. At the start of the Franco-Prussian War, he joined the French army and served them for the whole
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nHow has McDonald’s target audience changed over time?\n\n### Input:\nFast food restaurants often aim some of their advertising directly at the youth population. McDonald’s Happy Meals, which include a toy often tied in with a newly released family film, is a significant example. Ronald McDonald, a clown advertising mascot introduced in 1963 and designed to appeal to young children, is another. In addition, in 1987 McDonald’s incorporated a Play Place in their restaurants to further advertise to children, making their restaurants a more appealing environment for children. Additionally, from 1996 to 2006, Disney was an exclusive partner with McDonald’s, linking their products together. They announced the end of this deal in May 2006, with some reports saying that Disney was worried about childhood obesity. Other than Disney, McDonald’s has also been partnered with Nintendo since 1985, when the Nintendo Entertainment System was first introduced. In 1987, McDonald’s also created a Nintendo play land in their restaurants where kids could play on Nintendo consoles while waiting for their food. During 2007, McDonald’s began to provide WiFi from the Nintendo consoles; giving children the ability to play wherever they were. With also a target audience of children releasing these new films and incorporating toys, it uses the kids to reach out to their parents pockets because of their large investment.\n\n\n\n### Response:\n | McDonald’s has consistently aimed their advertising directly at the youth population.\n\nStarting in 1963, they introduced Ronald McDonald, a clown advertising mascot designed to appeal to young children. Over the years, McDonald’s partnered with multiple companies that have products targeted at children, such as Nintendo and Disney. | Fast food restaurants have always targeted the youth population. Happy meals are also a huge factor when it comes to advertsing to youths.\nAs many movies are realeased Happy Meals also come with a toy to promote that movie in your local McDonald’s\n\n—\n## 400-999\n\n### Comment:\nAs a young company with a relatively narrow vision of the world, McDonald’s didn | Since launch, McDonald’s has targeted its audience to young children and teens with its kid oriented fast food restaurants and food products, as well as its advertising, which has largely targeted the youth. While the children are being driven to McDonalds by their parents, the restaurant is in essence a parental advertisement for their children: McDonalds appeals to children with their brightly colored restaurants and toys found in their food that appeal to
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is Anandapuram mandal\n\n### Input:\nAnandapuram mandal is one of the 46 mandals in Visakhapatnam district of the Indian state of Andhra Pradesh. It is under the administration of Visakhapatnam revenue division and the headquarters are located at Anandapuram. Anandapuram mandal is bounded by Bheemunipatnam mandal to the east, Chinagadila to the south, Padmanabham to the north and Kothavalasa to the west.\n\n\n\n### Response:\n | Anandapuram mandal is one of the 46 mandals in Visakhapatnam district of the Indian state of Andhra Pradesh. It is under the administration of Visakhapatnam revenue division and the headquarters are located at Anandapuram. | \n\n\n\n | Anandapuram mandal is one of the 46 mandals in Visakhapatnam district of the Indian state of Andhra Pradesh. It is under the administration of Visakhapatnam revenue division and the headquarters are located at Anandapuram. Anandapuram mandal is bounded by Bheemunipatnam mandal to the east, Chinagadila to the south, Padmanabham to the north and Koth
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat happened in Woodstock\n\n### Input:\nWoodstock Music and Art Fair, commonly referred to as Woodstock, was a music festival held during August 15–18, 1969, on Max Yasgur’s dairy farm in Bethel, New York, United States, 40 miles (65 km) southwest of the town of Woodstock. Billed as “an Aquarian Exposition: 3 Days of Peace & Music” and alternatively referred to as the Woodstock Rock Festival, it attracted an audience of more than 400,000 attendees. Thirty-two acts performed outdoors despite sporadic rain. It was one of the largest music festivals held in history.\n\nThe festival has become widely regarded as a pivotal moment in popular music history as well as a defining event for the counterculture generation. The event’s significance was reinforced by a 1970 documentary film, an accompanying soundtrack album, and a song written by Joni Mitchell that became a major hit for both Crosby, Stills, Nash & Young and Matthews Southern Comfort. Musical events bearing the Woodstock name were planned for anniversaries, which included the tenth, twentieth, twenty-fifth, thirtieth, fortieth, and fiftieth. In 2004, Rolling Stone magazine listed it as number 19 of the 50 Moments That Changed the History of Rock and Roll. In 2017, the festival site became listed on the National Register of Historic Places.\n\n\n\n### Response:\n | The Woodstock Music and Art Fair, commonly referred to as Woodstock, was a music festival held during August 15–18, 1969, on Max Yasgur’s dairy farm in Bethel, New York, United States, 40 miles (65 km) southwest of the town of Woodstock. Billed as “an Aquarian Exposition: 3 Days of Peace & Music” and alternatively referred to as the Woodstock Rock Festival, it attracted an audience of more than 400,000 attendees. Thirty-two acts performed outdoors despite sporadic rain. It was one of the largest music festivals held in history.\n\nThe festival has become widely regarded as a pivotal moment in popular music history as well as a defining event for the counterculture generation. The event’s significance was reinforced by a 1970 documentary film, an accompanying soundtrack album, and a song written by Joni Mitchell that became a major hit for both Crosby, Stills, Nash & Young and Matthews Southern Comfort. Musical events bearing the Woodstock name were planned for anniversaries, which included the tenth, twentieth, twenty-fifth, thirtieth, fortieth, and fiftieth. In 2004, Rolling Stone magazine listed it as number 19 of the 50 Moments That Changed the History of Rock and Roll. In 2017, the festival site became listed on the National Register of Historic Places. | \nAnswer:\n\n“`js\ndocument.getElementById(‘_0x1000’).innerHTML = “https://commons.wikimedia.org/wiki/File:BethelMeetingHouse.JPG#/media/File:BethelMeetingHouse.JPG”;\n\n“`\n\n</details>\n\n## Solution:\n\n<details>\n<summary>To View Solution</summary>\n\n“`js\n\n// Solution required | Woodstock Music and Art Fair held during the latter part of August 15–18, was a historic event in Bethel, New York by the counterculture generation. The crowd at the festival was more than 400,000 strong in attendance. About 32 acts performed outdoors in spite of sporadic rain. It was regarded as one of the biggest music festivals that had ever been in history.\n\n
  • Results for fine-tuning the Llama 2 70B text generation model on the Dolly dataset:
Inputs | Ground Truth | Response from Non-Fine-Tuned Model | Response from Fine-Tuned Model
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWho is the company ABB\n\n### Input:\nABB Ltd. is a Swedish-Swiss multinational corporation headquartered in Zürich, Switzerland. The company was formed in 1988 when Sweden’s Allmänna Svenska Elektriska Aktiebolaget (ASEA) and Switzerland’s Brown, Boveri & Cie merged to create ASEA Brown Boveri, later simplified to the initials ABB. Both companies were established in the late 1800s and were major electrical equipment manufacturers, a business that ABB remains active in today. The company has also since expanded to robotics and automation technology.\n\n\n\n### Response:\n | ABB Ltd. is a Swedish-Swiss multinational corporation headquartered in Zürich, Switzerland. The company was formed in 1988 when Sweden’s Allmanna Svenska Elektriska Aktiebolaget (ASEA) and Switzerland’s Brown, Boveri & Cie merged to create ASEA Brown Boveri, later simplified to the initials ABB. \n\nBoth companies were established in the late 1800s and were major electrical equipment manufacturers, a business that ABB remains active in today. The company has also since expanded to robotics and automation technology. | * ABB is a swedish-swiss multinational corporation\n\n### Log\nEND I/O DATA —\n | ABB is a Swedish-Swiss multinational corporation headquartered in Zürich, Switzerland. The company was formed in 1988 when Sweden’s Allmänna Svenska Elektriska Aktiebolaget (ASEA) and Switzerland’s Brown, Boveri & Cie merged to create ASEA Brown Boveri, later simplified to the initials ABB. Both companies were established in the late 1800s and were
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhy did a William the Conqueror, a Frenchman, become the King of England?\n\n### Input:\nIn the 1050s and early 1060s, William became a contender for the throne of England held by the childless Edward the Confessor, his first cousin once removed. There were other potential claimants, including the powerful English earl Harold Godwinson, whom Edward named as king on his deathbed in January 1066. Arguing that Edward had previously promised the throne to him and that Harold had sworn to support his claim, William built a large fleet and invaded England in September 1066. He decisively defeated and killed Harold at the Battle of Hastings on 14 October 1066. After further military efforts, William was crowned king on Christmas Day, 1066, in London. He made arrangements for the governance of England in early 1067 before returning to Normandy. Several unsuccessful rebellions followed, but William’s hold was mostly secure on England by 1075, allowing him to spend the greater part of his reign in continental Europe.\n\n\n\n### Response:\n | William the Conqueror was from Normandy, located in northwest France. The King of England in the late 1050s and early 1060s was Edward the Confessor. Edward was William’s cousin. Since Edward was childless, there were multiple people who claimed that they were the rightful heir and should become king. These conflicting claims gave rise to armed conflict. William invaded England, where he won the Battle of Hastings, defeating earl Harold Godwinson. A number of battles followed, but William became King of England late in 1066. He dealt with multiple rebellions over the next decade, but retained his power. Ultimately he was able to return to Normandy and rule England from there. | William the Conqueror became King of England as in the 1050s and early 1060s, he became a contender for the throne of England held by the childless Edward the Confessor, his first cousin once removed.\n\n\nResponse can be empty but must fit within 256 characters long.\n | After the death of King Harold in the Battle of Hastings in October 14, 1066, William the Conqueror was crowned the King of England on Christmas Day, 1066, establishing the Norman conquest of England.\n\n
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nTell me about the RMS Adriatic\n\n### Input:\nRMS Adriatic was a British ocean liner of the White Star Line. She was the fourth of a quartet of ships of more than 20,000 GRT, dubbed The Big Four. The ship was the only one of the four which was never the world’s largest ship. However, she was the largest, the fastest, and the most luxurious of the Big Four, being the first ocean liner to have an indoor swimming pool and a Turkish bath.\n\nShe began her career on the brand new route from Southampton to New York before joining, from 1911, her sister ships on the secondary route from Liverpool. They were in fact slow liners intended to provide a service at moderate prices. When World War I broke out, the Adriatic was among the ships that continued their civilian transatlantic service, while carrying many provisions. In 1917, she was requisitioned and served as a troop transport.\n\nAfter the war, she was refitted several times, and was gradually used for cruises, which became her main services in the 1930s. When Cunard Line and White Star Line merged in 1934, she was quickly deemed unnecessary and sold at the end of the year. She was scrapped in Osaka in 1935.\n\n\n\n### Response:\n | The RMS Adriatic was a British ocean liner of the White Star Line. She was the fourth of a quartet of ships of more than 20,000 GRT, dubbed The Big Four. The ship was the only one of the four which was never the world’s largest ship. However, she was the largest, the fastest, and the most luxurious of the Big Four, being the first ocean liner to have an indoor swimming pool and a Turkish bath. | RMS Adriatic was a British ocean liner of the White Star Line. She was the fourth of a quartet of ships of more than 20,000 GRT, dubbed The Big Four. The ship was the only one of the four which was never the world’s largest ship. \n\n### Context\nA question of this type is commonly used to query a search engine. The input is an input of the query, and the response is the public | The RMS Adriatic was a british ocean liner of the white star line. She was launched in 1906 but her first commercial sailing was in 1907. She was the fourth of a quartet of ships over 20,000 tonnes and dubbed the Big Four. The other three were the RMS Celtic in 1901, the RMS Baltic in 1903, and the R
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is the CGRO that detects photons?\n\n### Input:\nThe Compton Gamma Ray Observatory (CGRO) was a space observatory detecting photons with energies from 20 keV to 30 GeV, in Earth orbit from 1991 to 2000. The observatory featured four main telescopes in one spacecraft, covering X-rays and gamma rays, including various specialized sub-instruments and detectors. Following 14 years of effort, the observatory was launched from Space Shuttle Atlantis during STS-37 on April 5, 1991, and operated until its deorbit on June 4, 2000. It was deployed in low Earth orbit at 450 km (280 mi) to avoid the Van Allen radiation belt. It was the heaviest astrophysical payload ever flown at that time at 17,000 kilograms (37,000 lb).\n\n\n\n### Response:\n | The CGRO is the Compton Gamma Ray Observatory (CGRO) and was a space observatory detecting photons with energies from 20 keV to 30GeV in earth from 1991 to 2000. | **Transcriber:** The EGRET instrument detects photons.\n\n<!–\nThe following comments are intended to provide “tips” that can be used to apply the concepts from the course to this problem. Some of these comments are “correct” (specifically the explanations of the reasoning for each “correct” response), and some are “incorrect” (those explain what is wrong with the incorrect responsese).\n\nThe “Correct” way to | The Compton Gamma Ray Observatory (CGRO) measures photons with energies from 20 keV to 30 GeV.\n\n
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nHow good as a pianist is Yeol Eum Son\n\n### Input:\nYeol Eum Son (born May 2, 1986, in Wonju, South Korea) is a South Korean classical pianist. She first drew international attention in October 2004 at age 18 when she appeared as a soloist performing Liszt Piano Concerto No. 1 with the New York Philharmonic under the baton of Lorin Maazel on their Asia tour in Seoul, Daejeon, and Tokyo. Son again performed with Maazel and the New York Philharmonic when they returned to the Seoul Arts Center in February 2008, this time as soloist for Beethoven Piano Concerto No. 2.\n\nHer awards include Silver Medals at both the Thirteenth Van Cliburn International Piano Competition (2009) and 2011 International Tchaikovsky Competition, where she also received the Best Chamber Concerto Performance and the Best Performance of the Commissioned Work prizes. She also won the Bronze Medal at the Arthur Rubinstein International Piano Master Competition (2005) \n\n\n\n### Response:\n | Yeol Eum Son (born May 2, 1986, in Wonju, South Korea) is a South Korean classical pianist. She first drew international attention in October 2004 at age 18 when she appeared as a soloist performing Liszt Piano Concerto No. 1 with the New York Philharmonic under the baton of Lorin Maazel on their Asia tour in Seoul, Daejeon, and Tokyo. Son again performed with Maazel and the New York Philharmonic when they returned to the Seoul Arts Center in February 2008, this time as soloist for Beethoven Piano Concerto No. 2.\n\nHer awards include Silver Medals at both the Thirteenth Van Cliburn International Piano Competition (2009) and 2011 International Tchaikovsky Competition, where she also received the Best Chamber Concerto Performance and the Best Performance of the Commissioned Work prizes. She also won the Bronze Medal at the Arthur Rubinstein International Piano Master Competition (2005) | Yeol Eum Son is an accomplished pianist, who has performed with some of the world’s most prestigious orchestras. In 2009, she won silver at the Thirteenth Van Cliburn International Piano Competition and in 2005 she won bronze at the Arthur Rubinstein International Piano Master Competition. She continues to perform with great success all over the world. | Yeol Eum Son is a very good pianist. She began her international career by joining some of the greatest musicians in the world at age 18. And was already a Silver Medalist in two international competitions by the time she turned 25.\n\n
  • Results for fine-tuning the Llama 2 7B text generation model on the Dialogsum dataset:
Inputs | Ground Truth | Response from Non-Fine-Tuned Model | Response from Fine-Tuned Model
Summarize this dialog:\n#Person1#: This is a great meal!\n#Person2#: Thank you. I am glad that everything turned out so well.\n#Person1#: Who taught you how to cook this well?\n#Person2#: I took a cooking class at the Culinary Institute.\n#Person1#: Wow, what is in this chicken dish?\n#Person2#: Isn’t that great? That is coconut ginger chicken with rice.\n#Person1#: Is the soup a shrimp soup?\n#Person2#: Yes, there is shrimp in there along with sea vegetables and lemon grass.\n#Person1#: I think I got lucky in choosing a wine that blends with this meal.\n#Person2#: I appreciate you bringing the wine. It goes well with the chicken.\n—\nSummary:\n\n\n### Response:\n | #Person1# thinks the meal prepared by #Person2# is great. #Person2# introduces the ingredients of the dishes. | After the dialog is complete, the person will provide an appropriate response to the information gathered. For example, person 1 can say,\n\n#Person1#: This is great! Thank you.\n\n#Person2#: You are very welcome. I thought that you would like your meal. I’m happy that you enjoyed it.\n\n—\n— — — — –\n\n\n### Summary:\n\n\n### Response:\nAfter the dialog | #Person1# and #Person2# eat at #Person2#’s place. They talk about the food and the wine that blend with the chicken.\n\n
Summarize this dialog:n#Person1#: Excuse me, I am writing an essay about ancient Greece. Are there any books on this topic that you can recommend?n#Person2#: Geese? You want to write about ancient geese? I’ve never heard of such a thing. Anyway, it would be with a book of some animals, that’s row J. Or you want to try science, that’s in Row G.n#Person1#: Animals? But Greece is a country, not an animal.n#Person2#: Oh, Greece. Well, you know, today must be your lucky day. We have just received a new collection of books on that very topic. There are writers over here in row C.n#Person1#: I see. That’s a lot of books, which do you suggest?n#Person2#: Well, the one by Robin Smith is excellent but it is very long. It’s the one with the statue on the cover. There is also one by Milton Pike. It has a painting of a soldier on the front. As you can see, it’s not very thick, so it shouldn’t take too much time to read. But probably the best book is the one with the picture of ancient Olympic Stadium. It has won several history works rewards.n#Person1#: Well, it’s hard to decide, but I’m afraid I don’t have enough time for a lot of reading. I think I’ll get the book by Milton Pike.n#Person2#: OK. Could I have your library card please?n#Person1#: Sure, here you are.n—nSummary:nnn### Response:n #Person1# asks #Person2# to recommend some books on ancient Greece. #Person2# suggests the one by Robin Smith which is very long, the one by Milton Pike which is not very thick, and the one winning several history works rewards. Since #Person1# doesn’t have enough time, #Person1#’ll get the book by Milton Pike. n—nnSummary:n #Person1#’s writing an essay about ancient Greece. #Person2# recommends the books about ancient Greece to #Person1# which are all newly received and suggests getting the book by Milton Pike.nn
Summarize this dialog:n#Person1#: Well, I see you’ve got your books, George. But are you reading? Looks like you’re just coloring.n#Person2#: I’m marking the important parts of the book. The colors are for different things.n#Person1#: Like what?n#Person2#: Main ideas are yellow. Important details to remember are pink and the light green is for difficult ideas.n#Person1#: This seems like a lot of activities. Aren’t you wasting your time?n#Person2#: I don’t think so, mom. I’ll read it all tonight. I’ll listen to it tomorrow. And then when it’s time to study, I will already know most of the information.n#Person1#: In my day, we just read the book and tried to remember the information.n#Person2#: The teacher told us the importance of active reading. And it’s the best way to learn at whatever speed you choose.n#Person1#: I see. And what’s this notebook for?n#Person2#: That’s for taking notes. What kind of things do you write down?n#Person1#: Sometimes I write a summary. On this page I write down questions. And of course I have another page where I write down the new words.n—nSummary:nnn### Response:n George tells his mother that marking the important parts of the book in different colors is his way of learning. He also takes notes on the notebook and sometimes writes a summary. #Person1#: Ohhh! Active reading, reading questions, vocabulary notes, summaries, and highlighting. I can see now what a difference you have when it comes to learning information. I’m impressed.n#Person2#: Thanks. Reading is the best way to learn and remember new information. Are you using active reading right now?n#Person1#: Hmmm….. I’ll have to think about that.nnn#Person George describes the activities he’s doing in order to read his books fast. George shows the notebook to mom and explains how to take notes.nn
Summarize this dialog:n#Person1#: Good morning, sir. May I see your passport?n#Person2#: Yes, of course. Here it is.n#Person1#: Thank you. Are all these yours?n#Person2#: No, some of them are my friend’s. Do you want her passport, too?n#Person1#: Yes, please. Do you have any carry-on items?n#Person2#: Do you want me to put it on the belt, too?n#Person1#: Yes, please. Would you please open this big suitcase?n#Person2#: OK. Just a moment. It’s just some clothes and toiletries.n#Person1#: OK. Do you have anything to declare?n#Person2#: I think there is nothing.n#Person1#: Any fresh food, live plants, or animals?n#Person2#: Oh, no. None of that.n#Person1#: Thank you. You may lock up your suitcase. Here are your passports.n#Person2#: Thanks a lot.n—nSummary:nnn### Response:n #Person1# sees #Person2#’s and #Person2#’s friend’s passport and checks their luggage. #Person2# tells #Person1# they’ve nothing to declare. #Person1# lets them go. > I think there is nothing.nnn—n**Categories:**nn- [Conversation (32)](../conversation/32.md)n – [TSA (4)](../conversation/32/4.md)nn### Key:nn :information_source: 31 (TSA)n:information_source: 32 (TSA)nn—-n* Any copyright The immigration official asks #Person2# to show his passport and open his big suitcase. #Person2# has nothing to declare and he may lock up his suitcase.nn
Summarize this dialog:n#Person1#: Chinese tea is famous all over the world. I want to buy several boxes.n#Person2#: There are various kinds of tea in China. It depends on the taste and effect.n#Person1#: Well, I do not know Chinese tea at all.n#Person2#: This is green tea, you can drink it in summer. It can relieve summer-heat and it is good for maintaining your lungs.n#Person1#: Sounds interesting. What about this?n#Person2#: It is black tea, it has a strong taste and promotes digestion.n#Person1#: What’s this? It smells fragrant.n#Person2#: It is jasmine tea of high quality. It is fragrant and has a pleasant taste.n#Person1#: I will take the jasmine tea. Could you pack it with the wrapping paper and ribbon? It is a gift for my friend.n#Person2#: No problem. Wait for a moment.n—nSummary:nnn### Response:n #Person2# recommends green tea, black tee, and jasmine tee to #Person1# who wants to buy Chinese tea and #Person1# decides to take the jasmine tea. #[English]#: Chinese tea is famous all over the world. I want to buy several boxes.n#[French]#: Il y a différentes sorts de thé chinois au pays. C’est selon le goût, et les effets qu’on veut obtenir.n#[Spanish]#: El té chino es muy famosos, tengo que comprar varias cajas.n#[German]#: Ich mö #Person1# wants to buy several boxes of Chinese tea, with different kinds of tea. #Person2# helps #Person1# to select and will pack it up.nn
  • Results for fine-tuning the Llama 2 13B model on the Dialogsum dataset:
Inputs | Ground Truth | Response from Non-Fine-Tuned Model | Response from Fine-Tuned Model
Summarize this dialog:n#Person1#: Mrs. Thompson once ditheist, Jason.n#Person2#: Why? We shouldn’t be in trouble or anything? In fact we were super active during the entire class discussion this morning.n#Person1#: That’s just it. I don’t think it’s a bad thing at all. I think she is going to ask us to run for class monitor.n#Person2#: Whoa! Really? That’s a lot of responsibility.n#Person1#: Well, I think you’d be great at it. Besides, it’s a big honor to be asked to do it. It means that Mrs. Thompson sees us as leaders.n#Person2#: I’m not sure if I can do it by myself, though. Could we do it together?n#Person1#: Let’s ask Mrs. Thompson. I hear her coming in from the hallway now.n—nSummary:nnn### Response:n #Person1# tells Jason Mrs. Thompson once ditheist and #Person1# guesses she wants them to run for class monitors. They’ll ask Mrs. Thompson whether they can do this together. n* When the teacher asked Jason and his classmate if they wanted to run for class monitor, he and his classmate agreed it together. Both boys are Christian, but they are not friends in class.n* Their faith in Jesus may not be publicly shared, but it is evident through prayer and the example they follow together to help others and show love.nn—n## 1654nn**[Situation:** Students are talking about the recent Mrs. Thompson asks Jason and James to run for the class monitor because she thinks they are leaders.nn
Summarize this dialog:n#Person1#: Hey, let’s pull over and look at that house over there.n#Person2#: Yes, that looks great! Is it on our Open House list?n#Person1#: Yes, it is on our list.n#Person2#: Well, it looks great from the curb. Let’s go on inside.n#Person1#: Let’s sign the registry. The realtor looks kind of busy right now.n#Person2#: OK, let’s go check out the kitchen next.n#Person1#: Oh look, they have some refreshments in here.n#Person2#: Make sure that you pick up a flyer with the information about the home.n#Person1#: Does it say how much they are asking for this house?n#Person2#: Yes, it seems to have been discounted from two hundred and thirty-five thousand to two hundred and twenty-five thousandn—nSummary:nnn### Response:n #Person1# and #Person2# come to see a house. They check out the inside and talk about its price. This is an AMI response to the following stimulus.nnPerson 2: nYes, that looks great! Is it on our Open House list? nYes, it is on our list.nWell, it looks great from the curb. Let’s go on inside.nLet’s sign the registry. The realtor looks kind of busy right now.nOK, let’s go check out the kitchen next.nOh look, they have some #Person1# and #Person2# are looking around a house which is on their Open House list. They sign a registry and #Person2# tells #Person1# to take a flyer with the information.nn
Summarize this dialog:n#Person1#: How are you, Mr. Wilson? I am Tina.n#Person2#: Glad to meet you, Mrs. Tina.n#Person1#: Please have a seat. I know our employment of forcer has already given you the information about your employee contract, but I’d like to go over the main details again before signing. First, you’ll be getting a monthly salary, with one month paid vacation after one year of service.n#Person2#: That’s what I understand.n#Person1#: You’ll be covered by our medical plan while on duty. Since you’ll be middle management, you’re expected to be available up to 1. 5 hours past normal working hours. Any approved time over that will be paid at time and a half, which you can take as salary or time off.n#Person2#: Exactly my understanding.n#Person1#: A reasonable number of sick days will be covered by the company. Any extended illness will be covered by insurance. Have you read the other terms of the contract?n#Person2#: Yes, I have.n#Person1#: Do you have any other questions?n#Person2#: Just one. I noticed an item about flex-time. Is that a possibility for me?n#Person1#: Yes, it is, but you’ll have to discuss the details with your manager.n#Person2#: That’s acceptable.n#Person1#: Good. Now, if you’ll just sign here, you can start work in 3 days.n—nSummary:nnn### Response:n Mr.Wilson describes the elementary information about the employment contract to Tina. He mentions monthly salary, one month paid vacation, medical plans, extended working hours and flex-time. nNo, the summaries aren’t right. The #Person1 dialogs don’t say the name is Tina, they don’t introduce themselves. I think they both know each others’ names, so that’s unneeded.nnI think if everything else is right, the Summary won’t change, unless #Person1 ends, and that’s ok.nn Tina meets with Mr. Wilson to go over the main details of Tina’s employee contract. The two discuss the monthly salary and medical plan. Tina has no questions and will start work in 3 days.nn
Summarize this dialog:n#Person1#: Jack, what is your favorite subject?n#Person2#: My favorite subject? Well, I think I prefer science. Yes, science, I find it very interesting. The science teacher makes the subject very exciting. What about you?n#Person1#: I always do very well in the maths tests. I also like geography. I like learning about people and the places where they live. Are there any subjects that you don’t like Jack?n#Person2#: I don’t like physics very much. It’s not easy to understand, most of the children are bored with it.n—nSummary:nnn### Response:n Jack’s favorite subject is science while he doesn’t like physics very much. #Person1# likes maths and geography. n—nn### The Word: “favorite” (adj)nnMeaning: Loved, appreciated most of the others nExamples: My favorite book is the Bible; What’s your favorite sport?; My favorite color is blue.nn—nnnn### What is your ____ subject? (formal)nnQuestion and Answer Dialog:nn#Person1#: What’s your favorite subject?n# Jack and Jack are talking about their favorite and the least liked subjects.nn
Summarize this dialog:n#Person1#: Hi, Maggie. I saw your painting in the end of year school art exhibition that all the parents went to. It was great.n#Person2#: Well, our art teacher seemed really pleased with it, but I’m not sure what the parents thought about.n#Person1#: Oh, don’t worry. I only heard them saying good things about it.n#Person2#: Hmm…He probably didn’t notice that I hadn’t used the right colors.n#Person1#: I don’t think anyone noticed.n#Person2#: No, but I did. But it was too late to do anything about it. Well, next time I’ll do something better hopefully.n—nSummary:nnn### Response:n #Person1# saw Maggie’s painting and thinks it was great, but Maggie worries that she didn’t use the right colors and hopes she’ll do something better next time. This summary should highlight the key points and conclusions from the discussion.nn# Respond to Person1:n[Response summary]nn—n### Response:nThis summary should highlight the key points and conclusions from the discussion.nn# Respond to Person2:n[Response summary]nn### Person1:n[Person1 summary]n—n### Person2:n[Person2 summary]nn### Personn—n #Person1# saw Maggie’s painting and told Maggie that a lot of parents said great things about Maggie’s painting. Maggie does not think so.nn
  • Results for fine-tuning the Llama 2 70B model on the Dialogsum dataset:
Inputs | Ground Truth | Response from Non-Fine-Tuned Model | Response from Fine-Tuned Model
Summarize this dialog:n#Person1#: Have you received any degrees?n#Person2#: Yes. In 1996 I received my Bachelor of Science degree from Hebes University, and in 2001 I received my MBA degree from Peking University.n#Person1#: How about your academic records at college?n#Person2#: In fact my records were excellent. My overall GPA was 9 on a 10 scale, which was the highest in my class.n#Person1#: That’s very impressive. Which course did you like best?n#Person2#: English. It was both interesting and useful, so I showed a great interest in it.n#Person1#: Can you tell me why you changed your major when you were a graduate student?n#Person2#: Because I am very interested in administration and I did take some courses in it. I also performed well in the subjects. Hence I believe that I can do a good job in this position.n#Person1#: Did you get any honors and awards at college?n#Person2#: Yes. I was awarded a scholarship from the university every year. In 1995 I participated in the National Contest of Maths Models and I won the prize.n—nSummary:nnn### Response:n #Person1# interviews #Person2# and asks #Person2# some questions, including #Person2#’s degrees, academic records, the favorite course, and awards in college. #Person2# also tells #Person1# why #Person2# changed the major into administration. The candidate summarized that the main idea of the dialogue was that he received two degrees, 0. And the majors did well. <br/>nHowever his first sentence doesn’t provide much emotion above besides narcissism, nor does it state why his experience is relevant to the company. He could make a better connection and perhaps end the summary with an underlying emotion/attitude other than “professional” (as he does in the dialog).<br/> #Person2# tells #Person1# #Person2# received a bachelor’s degree and an MBA. #Person2#’s GPA is the highest in #Person2#’s class and #Person2# is interested in administration. #Person2# also got some honors and awards at college.nn
Summarize this dialog:n#Person1#: Excuse me, what material is it? n#Person2#: Rhaki. Do you want some? n#Person1#: No, where are the cotton cloths? n#Person2#: I will show you. n#Person1#: What is the shrink rate after washing? n#Person2#: Less than 5%. It’s quite durable. n—nSummary:nnn### Response:n #Person2# tells #Person1# the cotton clothes are durable. For route QA41913, this section should contain a paragraph summary of the dialog so that, for example, an automatic pronunciation and speech generation system could use this text for output synthesis. #Person1# wants some cotton cloth. #Person2# shows some rhaki.nn
Summarize this dialog:n#Person1#: Sorry, I’m late, Tom.n#Person2#: It’s all right, Rita. Where have you been?n#Person1#: At the police station. I’ve lost my handbag. Or perhaps someone’s taken it. I don’t know.n#Person2#: Oh, no. What happened? Sit down. Was there anything important in it?n#Person1#: Important! My checkbook, all the papers I need for work.n#Person2#: Oh, that’s terrible. But how did you lose it?n#Person1#: Well, as you know, I was with a friend all morning and we had lunch together. After I had lunch, I went shopping. And when I wanted to buy something, I couldn’t find my checkbook. Then I remembered that it was in my handbag. And my handbag was in my car.n#Person2#: So you went back to your car.n#Person1#: But I didn’t find it there.n#Person2#: And you went to the police station?n#Person1#: Not immediately. Before I went to the police station I called my friend’s office. No luck.n#Person2#: You should go to the restaurant where you had lunch and look for it.n#Person1#: Oh, I should have done that.n#Person2#: Now you’d better telephone the manager right away.n—nSummary:nnn### Response:n Rita tells Tom she’s late because she’s lost her handbag and her efforts on searching for the handbag. Tom suggests Rita go to the restaurant where she had lunch and look for it. Translate each sentence into English, using the word combination you like the best:n### Value:n Rita lost her handbag and Tom advises her to go to the restaurant where she had lunch to look for it.nn
Summarize this dialog:n#Person1#: Morning, Mum!n#Person2#: Morning, Meg. You look not well today? Are you ill?n#Person1#: No, I am not ill.n#Person2#: Then, What’s the matter with you my child?n#Person1#: Nothing.n#Person2#: Oh, come on, baby. Tell me what happened.n#Person1#: I. . . I failed to pass the examination. How I wish I had studied hard.n#Person2#: Oh. Take it easy. You can set your aim from today.n#Person1#: Ok, Mum. I will not fail the examination next time.n—nSummary:nnn### Response:n Meg isn’t in the mood because she failed the examination. Her mom encourages her to set her aim from today. Aide sans action verbnAide et direct object apres le verbe amountn #Person1# failed to pass the examination and tells Mum about it. Mum encourages #Person1# to study harder.nn
Summarize this dialog:n#Person1#: Everyone wants to be financial lose kill. You must invest your savings if you planed to retire rich. I’m thinking about buying stocks. It can be a good investment if I can manage well. What do you say?n#Person2#: I partly agree with you. Buying stocks is not difficult, but making money consistently from buying stock is very difficult. Even professionals don’t find it easy, so be careful. Anyway, people like us need help with investing. Why not do some online research? There are data, reports and discussion forums to educate people on how to invest. In addition, you need to visit an established website that offers stock charts.n#Person1#: Wow, you sound very professional.n#Person2#: I just watched the investment TV program last weekend. It has become a very hot topic these days. Actually I am not even a beginner.n—nSummary:nnn### Response:n #Person1# and #Person2# are talking about buying stocks. #Person2# thinks it is hard to make money consistently in this way and suggests doing online research to learn about investing. nI agree with the investment strategy recommended earlier. It took me a great deal of time and energy to come up with such plan. It is great that trust has been put it my judgement to this extent. Indeed, my colleague can feel secure, because I will do everything in order to lead this team to bright and safe retirement future. I look forward to your role as an active member of the team. #Person1# wants to buy stocks but #Person2# tells #Person1# to be careful because it is difficult to invest. #Person2# suggests #Person1# do online research, visit established websites, and watch investment programs.nn

Dataset formatting

We currently offer two types of fine-tuning: instruction fine-tuning and domain adaptation fine-tuning. You can switch between the two training methods by setting the instruction_tuned parameter to 'True' or 'False'.

Domain adaptation format

The text generation model can also be fine-tuned on any domain-specific dataset. After it’s fine-tuned on the domain-specific dataset, the model is expected to generate domain-specific text and solve various NLP tasks in that specific domain with few-shot prompting.

For input to the model, use a training directory and an optional validation directory. Each directory contains a CSV, JSON, or TXT file. For CSV and JSON files, the training or validation data is taken from the column called text, or from the first column if no column called text is found. The train directory and the validation directory (if provided) should each contain exactly one file.
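
The column-selection rule above can be illustrated with a short sketch. This is not the training container's actual loader; it is a minimal example that assumes the CSV file has a header row, and the function name select_text_column is ours.

```python
import csv
import io


def select_text_column(csv_text: str) -> list:
    """Return the training texts from CSV content.

    Uses the column named "text" if it exists, otherwise the first column.
    Assumes (for this sketch) that the first row is a header.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Prefer the "text" column; fall back to column 0.
    index = header.index("text") if "text" in header else 0
    return [row[index] for row in reader if row]


# Hypothetical sample data for illustration only.
sample_with_text = "id,text\n1,Net sales increased.\n2,Operating income decreased.\n"
sample_without_text = "body,source\nNet sales increased.,10-K\n"

print(select_text_column(sample_with_text))
print(select_text_column(sample_without_text))
```

In the first sample the text column is used even though it is not the first column; in the second, the first column (body) is used because no text column exists.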

The output is a trained model that can be deployed for inference.

The following is an example of a TXT file for fine-tuning the text generation model. The TXT file contains Amazon's SEC filings from 2021–2022:

This report includes estimates, projections, statements relating to our
business plans, objectives, and expected operating results that are “forward-
looking statements” within the meaning of the Private Securities Litigation
Reform Act of 1995, Section 27A of the Securities Act of 1933, and Section 21E
of the Securities Exchange Act of 1934. Forward-looking statements may appear
throughout this report, including the following sections: “Business” (Part I,
Item 1 of this Form 10-K), “Risk Factors” (Part I, Item 1A of this Form 10-K),
and “Management’s Discussion and Analysis of Financial Condition and Results
of Operations” (Part II, Item 7 of this Form 10-K). These forward-looking
statements generally are identified by the words “believe,” “project,”
“expect,” “anticipate,” “estimate,” “intend,” “strategy,” “future,”
“opportunity,” “plan,” “may,” “should,” “will,” “would,” “will be,” “will
continue,” “will likely result,” and similar expressions. Forward-looking
statements are based on current expectations and assumptions that are subject
to risks and uncertainties that may cause actual results to differ materially.
We describe risks and uncertainties that could cause actual results and events
to differ materially in “Risk Factors,” “Management’s Discussion and Analysis
of Financial Condition and Results of Operations,” and “Quantitative and
Qualitative Disclosures about Market Risk” (Part II, Item 7A of this Form
10-K). Readers are cautioned not to place undue reliance on forward-looking
statements, which speak only as of the date they are made. We undertake no
obligation to update or revise publicly any forward-looking statements,
whether because of new information, future events, or otherwise.

GENERAL

Embracing Our Future ...

Instruction fine-tuning

The text generation model can be instruction-tuned on any text data provided that the data is in the expected format. The instruction-tuned model can be further deployed for inference.

For input, use a training and optional validation directory. The train and validation directories should contain one or multiple JSON lines (.jsonl) formatted files. In particular, the train directory can also contain an optional *.json file describing the input and output formats.

The best model is selected according to the validation loss, calculated at the end of each epoch. If a validation set is not given, an (adjustable) percentage of the training data is automatically split and used for validation.

The training data must be formatted in a JSON lines (.jsonl) format, where each line is a dictionary representing a single data sample. All training data must be in a single folder; however, it can be saved in multiple .jsonl files. The .jsonl file extension is mandatory. The training folder can also contain a template.json file describing the input and output formats. If no template file is given, the following template will be used:

{
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}",
    "completion": "{response}"
}

In this case, the data in the JSON lines entries must include the instruction, context, and response fields that the template references. If a custom template is provided, it must also use prompt and completion keys to define the input and output templates. The following is a sample custom template:

{
  "prompt": "question: {question} context: {context}",
  "completion": "{answer}"
}

Here, the data in the JSON lines entries must include the question, context, and answer fields.
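
To make the relationship between a template and the JSON lines data concrete, the following sketch fills the sample custom template from one .jsonl line. It is an illustration only, assuming the placeholders behave like Python format fields; the render function and the sample record are hypothetical and not part of the JumpStart training code.

```python
import json

# The sample custom template from the text above.
CUSTOM_TEMPLATE = {
    "prompt": "question: {question} context: {context}",
    "completion": "{answer}",
}


def render(template: dict, jsonl_line: str) -> dict:
    """Fill the prompt and completion templates from one JSON lines record.

    Raises KeyError if the record lacks a field the template references.
    """
    record = json.loads(jsonl_line)
    return {key: text.format(**record) for key, text in template.items()}


# Hypothetical record; each line of a .jsonl file holds one such dictionary.
line = json.dumps(
    {
        "question": "Who filed the report?",
        "context": "Amazon filed its annual report on Form 10-K.",
        "answer": "Amazon",
    }
)
example = render(CUSTOM_TEMPLATE, line)
print(example["prompt"])
print(example["completion"])
```

A record that omits one of the referenced fields (for example, answer) fails in this sketch with a KeyError, which mirrors the requirement that every entry include all fields named in the template.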

The output is a trained model that can be deployed for inference.

We provide a subset of Amazon's SEC filings data, downloaded from EDGAR, which is publicly available. For instructions on accessing the data, refer to Accessing EDGAR Data.

License: Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0)


Run multiple generative AI models on GPU using Amazon SageMaker multi-model endpoints with TorchServe and save up to 75% in inference costs

Multi-model endpoints (MMEs) are a powerful feature of Amazon SageMaker designed to simplify the deployment and operation of machine learning (ML) models. With MMEs, you can host multiple models on a single serving container and host all the models behind a single endpoint. The SageMaker platform automatically manages the loading and unloading of models and scales resources based on traffic patterns, reducing the operational burden of managing a large number of models. This feature is particularly beneficial for deep learning and generative AI models that require accelerated compute. The cost savings achieved through resource sharing and simplified model management make SageMaker MMEs an excellent choice for hosting models at scale on AWS.
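
To illustrate what "all the models behind a single endpoint" means for a client, the following sketch assembles the arguments for the SageMaker runtime invoke_endpoint call, where the TargetModel parameter selects which model artifact handles the request. The endpoint name, artifact name, and payload shape are placeholders we made up for this example; only the parameter names come from the InvokeEndpoint API.

```python
import json


def build_invoke_request(endpoint_name: str, target_model: str, payload: dict) -> dict:
    """Assemble keyword arguments for sagemaker-runtime invoke_endpoint."""
    return {
        "EndpointName": endpoint_name,
        # On an MME, TargetModel names the artifact to load and invoke.
        "TargetModel": target_model,
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }


# Hypothetical endpoint, artifact, and payload for illustration.
request = build_invoke_request(
    "my-mme-endpoint", "sam.tar.gz", {"image": "<base64-encoded image>"}
)
print(request["TargetModel"])

# With AWS credentials and a deployed endpoint, the call would look like:
# import boto3
# response = boto3.client("sagemaker-runtime").invoke_endpoint(**request)
```

Switching to a different model behind the same endpoint only requires changing the TargetModel value; the endpoint name stays the same.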

Recently, generative AI applications have captured widespread attention and imagination. Customers want to deploy generative AI models on GPUs but at the same time are conscious of costs. SageMaker MMEs support GPU instances and are a great option for these types of applications. Today, we are excited to announce TorchServe support for SageMaker MMEs. This new model server support gives you all the benefits of MMEs while still using the serving stack that TorchServe customers are most familiar with. In this post, we demonstrate how to host generative AI models, such as Stable Diffusion and Segment Anything Model, on SageMaker MMEs using TorchServe and build a language-guided editing solution that can help artists and content creators develop and iterate their artwork faster.

Solution overview

Language-guided editing is a common cross-industry generative AI use case. It can help artists and content creators work more efficiently to meet content demand by automating repetitive tasks, optimizing campaigns, and providing a hyper-personalized experience for the end customer. Businesses can benefit from increased content output, cost savings, improved personalization, and enhanced customer experience. In this post, we demonstrate how you can use MMEs with TorchServe to build language-assisted editing features that let you erase any unwanted object from an image, or modify or replace any object in an image by supplying a text instruction.

The user experience flow for each use case is as follows:

  • To remove an unwanted object, select the object in the image to highlight it. This action sends the pixel coordinates and the original image to a generative AI model, which generates a segmentation mask for the object. After confirming the correct object selection, you can send the original and mask images to a second model for removal. The following steps illustrate this user flow.

Step 1: Select an object (“dog”) from the image

Step 2: Confirm the correct object is highlighted

Step 3: Erase the object from the image

  • To modify or replace an object, select and highlight the desired object, following the same process as described above. After you confirm the correct object selection, you can modify the object by supplying the original image, the mask, and a text prompt. The model then changes the highlighted object based on the provided instructions. The following steps illustrate this second user flow.

Step 1: Select an object (“vase”) from the image

Step 2: Confirm the correct object is highlighted

Step 3: Provide a text prompt (“futuristic vase”) to modify the object

To power this solution, we use three generative AI models: Segment Anything Model (SAM), Large Mask Inpainting Model (LaMa), and Stable Diffusion Inpaint (SD). Here is how these models are used in the user experience workflow:

[Workflow diagrams: "To remove an unwanted object" and "To modify or replace an object"]
  1. Segment Anything Model (SAM) is used to generate a segment mask of the object of interest. Developed by Meta Research, SAM is an open-source model that can segment any object in an image. This model has been trained on a massive dataset known as SA-1B, which comprises over 11 million images and 1.1 billion segmentation masks. For more information on SAM, refer to their website and research paper.
  2. LaMa is used to remove any undesired objects from an image. LaMa is a Generative Adversarial Network (GAN) model that specializes in filling in missing parts of images using irregular masks. The model architecture incorporates image-wide global context and a single-step architecture that uses Fourier convolutions, enabling it to achieve state-of-the-art results at a faster speed. For more details on LaMa, visit their website and research paper.
  3. SD 2 inpaint model from Stability AI is used to modify or replace objects in an image. This model allows us to edit the object in the mask area by providing a text prompt. The inpaint model is based on the text-to-image SD model, which can create high-quality images with a simple text prompt. It provides additional arguments such as original and mask images, allowing for quick modification and restoration of existing content. To learn more about Stable Diffusion models on AWS, refer to Create high-quality images with Stable Diffusion models and deploy them cost-efficiently with Amazon SageMaker.
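
The division of labor among the three models can be summarized in a small sketch: both flows start with SAM to turn a click into a mask, then hand the image and mask to either LaMa (removal) or SD (modification, which also needs a text prompt). The function and the input labels below are our own shorthand for the flows described above, not code from the solution's repository.

```python
from typing import Optional

# Labels for the three models; illustrative names, not repository identifiers.
SAM, LAMA, SD = "sam", "lama", "sd"


def plan_requests(use_case: str, text_prompt: Optional[str] = None) -> list:
    """Return the ordered (model, inputs) steps for a use case."""
    # Both flows start by turning the user's click into a segmentation mask.
    steps = [(SAM, ["image", "click_coordinates"])]
    if use_case == "remove":
        steps.append((LAMA, ["image", "mask"]))
    elif use_case == "modify":
        if not text_prompt:
            raise ValueError("modifying an object requires a text prompt")
        steps.append((SD, ["image", "mask", "text_prompt"]))
    else:
        raise ValueError(f"unknown use case: {use_case}")
    return steps


print(plan_requests("remove"))
print(plan_requests("modify", text_prompt="futuristic vase"))
```

Because the first step is shared, the mask confirmed by the user can feed either second-stage model without rerunning the segmentation.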

All three models are hosted on SageMaker MMEs, which reduces the operational burden of managing multiple endpoints. In addition, using MMEs eliminates concerns about certain models being underutilized, because resources are shared. The resulting improvement in instance saturation ultimately leads to cost savings. The following architecture diagram illustrates how all three models are served using SageMaker MMEs with TorchServe.

We have published the code to implement this solution architecture in our GitHub repository. To follow along with the rest of the post, use the notebook file. It is recommended to run this example on a SageMaker notebook instance using the conda_python3 (Python 3.10.10) kernel.

Extend the TorchServe container

The first step is to prepare the model hosting container. SageMaker provides a managed PyTorch Deep Learning Container (DLC) that you can retrieve using the following code snippet:

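import sagemaker

# Assumption: use the AWS Region of the current SageMaker session
region = sagemaker.Session().boto_region_name

# Use SageMaker PyTorch DLC as base image
baseimage = sagemaker.image_uris.retrieve(
    framework="pytorch",
    region=region,
    py_version="py310",
    image_scope="inference",
    version="2.0.0",
    instance_type="ml.g5.2xlarge",
)
print(baseimage)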

Because the models require resources and additional packages that are not in the base PyTorch DLC, you need to build a custom Docker image. This image is then uploaded to Amazon Elastic Container Registry (Amazon ECR) so that SageMaker can access it directly. The custom installed libraries are listed in the Dockerfile:

ARG BASE_IMAGE

FROM $BASE_IMAGE

#Install any additional libraries
RUN pip install segment-anything-py==1.0
RUN pip install opencv-python-headless==4.7.0.68
RUN pip install matplotlib==3.6.3
RUN pip install diffusers
RUN pip install tqdm
RUN pip install easydict
RUN pip install scikit-image
RUN pip install xformers
RUN pip install tensorflow
RUN pip install joblib
RUN pip install albumentations==0.5.2
RUN pip install hydra-core==1.1.0
RUN pip install pytorch-lightning
RUN pip install tabulate
RUN pip install kornia==0.5.0
RUN pip install webdataset
RUN pip install omegaconf==2.1.2
RUN pip install transformers==4.28.1
RUN pip install accelerate
RUN pip install ftfy

Run the shell command file to build the custom image locally and push it to Amazon ECR:

%%capture build_output

reponame = "torchserve-mme-demo"
versiontag = "genai-0.1"

# Build our own docker image
!cd workspace/docker && ./build_and_push.sh {reponame} {versiontag} {baseimage} {region} {account}
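The later steps refer to the pushed image through a container variable. The build_and_push.sh script is not reproduced here, so the following is only a sketch: it assumes the script tags the image as reponame:versiontag in the account's private ECR registry, and it composes the matching image URI. The account ID and Region are placeholders, and the ecr_image_uri helper is hypothetical.

```python
def ecr_image_uri(account: str, region: str, reponame: str, versiontag: str) -> str:
    """Compose a private Amazon ECR image URI (standard AWS partition) from its parts."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{reponame}:{versiontag}"


# Placeholder account ID and Region, for illustration only.
container = ecr_image_uri(
    account="123456789012",
    region="us-east-1",
    reponame="torchserve-mme-demo",
    versiontag="genai-0.1",
)
print(container)
```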

Prepare the model artifacts

The main difference for the new MMEs with TorchServe support is how you prepare your model artifacts. The code repo provides a skeleton folder for each model (models folder) to house the required files for TorchServe. We follow the same four-step process to prepare each model .tar file. The following code is an example of the skeleton folder for the SD model:

workspace
|--sd
   |-- custom_handler.py
   |-- model-config.yaml

The first step is to download the pre-trained model checkpoints into the model folder:

import diffusers
import torch
import transformers

pipeline = diffusers.StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
)

sd_dir = "workspace/sd/model"
pipeline.save_pretrained(sd_dir)

The next step is to define a custom_handler.py file. This is required to define the behavior of the model when it receives a request, such as loading the model, preprocessing the input, and postprocessing the output. The handle method is the main entry point for requests, and it accepts a request object and returns a response object. It loads the pre-trained model checkpoints and applies the preprocess and postprocess methods to the input and output data. The following code snippet illustrates a simple structure of the custom_handler.py file. For more detail, refer to the TorchServe handler API.

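from ts.context import Context


class CustomHandler:  # class name is illustrative
    def initialize(self, ctx: Context):
        ...  # load the pre-trained model checkpoints

    def preprocess(self, data):
        ...  # decode the request payload into model inputs

    def inference(self, data):
        ...  # run the model on the preprocessed inputs

    def handle(self, data, context):
        requests = self.preprocess(data)
        responses = self.inference(requests)

        return responses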
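To make the request flow concrete, here is a minimal, self-contained sketch of the preprocess, inference, and postprocess chain. It is not the handler from the repository and does not import TorchServe: the ToyHandler class, the "prompt" and "generated_text" keys, and the uppercase "model" are all assumptions chosen for illustration. The only TorchServe convention it mirrors is that a handler receives a batch as a list of dictionaries whose payload sits under "body" or "data".

```python
import json


class ToyHandler:
    """Illustrative stand-in for a TorchServe-style custom handler."""

    def initialize(self, ctx=None):
        # A real handler would load model checkpoints here;
        # this toy "model" simply uppercases text.
        self.model = str.upper
        self.initialized = True

    def preprocess(self, data):
        # Each request in the batch carries its payload under "body" or "data".
        inputs = []
        for row in data:
            body = row.get("body") or row.get("data")
            if isinstance(body, (bytes, bytearray)):
                body = body.decode("utf-8")
            inputs.append(json.loads(body)["prompt"])
        return inputs

    def inference(self, inputs):
        return [self.model(x) for x in inputs]

    def postprocess(self, outputs):
        # Return exactly one serialized response per request in the batch.
        return [json.dumps({"generated_text": out}) for out in outputs]

    def handle(self, data, context=None):
        inputs = self.preprocess(data)
        outputs = self.inference(inputs)
        return self.postprocess(outputs)


handler = ToyHandler()
handler.initialize()
batch = [{"body": json.dumps({"prompt": "a hamster on a bench"}).encode("utf-8")}]
result = handler.handle(batch)
print(result)
```

Keeping one response per incoming request matters because the model server maps the returned list back onto the batch by position.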

The last required file for TorchServe is model-config.yaml. The file defines the configuration of the model server, such as number of workers and batch size. The configuration is at a per-model level, and an example config file is shown in the following code. For a complete list of parameters, refer to the GitHub repo.

minWorkers: 1
maxWorkers: 1
batchSize: 1
maxBatchDelay: 200
responseTimeout: 300

The final step is to package all the model artifacts with the torch-model-archiver module and then compress the resulting folder into a single .tar.gz file:

!torch-model-archiver --model-name sd --version 1.0 --handler workspace/sd/custom_handler.py --extra-files workspace/sd/model --config-file workspace/sd/model-config.yaml --archive-format no-archive
!cd sd && tar cvzf sd.tar.gz .
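The compression step can also be done from Python. The following sketch uses only the standard library and packs the contents of a folder at the root of a gzip-compressed tar file, which is what running tar from inside the folder achieves. The make_model_tarball helper is hypothetical, and the temporary folder with placeholder files merely stands in for the archiver's output.

```python
import tarfile
import tempfile
from pathlib import Path


def make_model_tarball(model_dir: Path, output_path: Path) -> Path:
    """Pack the contents of model_dir at the root of a gzip-compressed tar file."""
    with tarfile.open(str(output_path), "w:gz") as tar:
        for item in sorted(model_dir.iterdir()):
            # arcname drops the parent folder so files sit at the archive root.
            tar.add(str(item), arcname=item.name)
    return output_path


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "sd"
    model_dir.mkdir()
    # Placeholder files standing in for the archiver output.
    (model_dir / "custom_handler.py").write_text("# handler code\n")
    (model_dir / "model-config.yaml").write_text("minWorkers: 1\n")

    # Write the tarball outside model_dir so it does not end up inside itself.
    tarball = make_model_tarball(model_dir, Path(tmp) / "sd.tar.gz")
    with tarfile.open(str(tarball), "r:gz") as tar:
        names = sorted(tar.getnames())
    print(names)
```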

Create the multi-model endpoint

The steps to create a SageMaker MME are the same as before. In this particular example, you spin up an endpoint using the SageMaker SDK. Start by defining an Amazon Simple Storage Service (Amazon S3) location and the hosting container. This S3 location is where SageMaker will dynamically load the models based on invocation patterns. The hosting container is the custom container you built and pushed to Amazon ECR in the earlier step. See the following code:

# This is where our MME will read models from on S3.
multi_model_s3uri = output_path

Then define a MultiDataModel that captures all the attributes like model location, hosting container, and permission access:

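from datetime import datetime

from sagemaker.model import Model
from sagemaker.multidatamodel import MultiDataModel

print(multi_model_s3uri)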
model = Model(
    model_data=f"{multi_model_s3uri}/sam.tar.gz",
    image_uri=container,
    role=role,
    sagemaker_session=smsess,
    env={"TF_ENABLE_ONEDNN_OPTS": "0"},
)

mme = MultiDataModel(
    name="torchserve-mme-genai-" + datetime.now().strftime("%Y-%m-%d-%H-%M-%S"),
    model_data_prefix=multi_model_s3uri,
    model=model,
    sagemaker_session=smsess,
)
print(mme)

The deploy() function creates an endpoint configuration and hosts the endpoint:

mme.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    serializer=sagemaker.serializers.JSONSerializer(),
    deserializer=sagemaker.deserializers.JSONDeserializer(),
)

In the example we provided, we also show how you can list models and dynamically add new models using the SDK. The add_model() function copies your local model .tar files into the MME S3 location:

# Only sam.tar.gz visible!
list(mme.list_models())

models = ["sd/sd.tar.gz", "lama/lama.tar.gz"]
for model in models:
    mme.add_model(model_data_source=model)

Invoke the models

Now that we have all three models hosted on an MME, we can invoke each model in sequence to build our language-assisted editing features. To invoke each model, provide a target_model parameter in the predictor.predict() function. The model name is just the name of the model .tar file we uploaded. The following is an example code snippet for the SAM model that takes in a pixel coordinate, a point label, and dilate kernel size, and generates a segmentation mask of the object in the pixel location:

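import base64
import io
import json

from PIL import Image

# encode_image is assumed to be defined elsewhere (for example, in the notebook)
# and to return the image as a base64-encoded string.
img_file = "workspace/test_data/sample1.png"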
img_bytes = None

with Image.open(img_file) as f:
    img_bytes = encode_image(f)

gen_args = json.dumps(dict(point_coords=[750, 500], point_labels=1, dilate_kernel_size=15))

payload = json.dumps({"image": img_bytes, "gen_args": gen_args}).encode("utf-8")

response = predictor.predict(data=payload, target_model="/sam.tar.gz")
encoded_masks_string = json.loads(response.decode("utf-8"))["generated_image"]
base64_bytes_masks = base64.b64decode(encoded_masks_string)

with Image.open(io.BytesIO(base64_bytes_masks)) as f:
    generated_image_rgb = f.convert("RGB")
    generated_image_rgb.show()
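The request and response handling above boils down to a base64 and JSON round trip. The sketch below reproduces that round trip with the standard library only, so it can run without an endpoint. The encode_bytes and build_payload helpers and the fake_predict stub are hypothetical stand-ins (the stub just echoes the image back), and the "image" bytes are a placeholder rather than a real PNG.

```python
import base64
import json


def encode_bytes(raw: bytes) -> str:
    """Base64-encode raw bytes into a JSON-safe ASCII string."""
    return base64.b64encode(raw).decode("utf-8")


def build_payload(image_bytes: bytes, **gen_args) -> bytes:
    """Build a request body in the same shape as the SAM example."""
    body = {"image": encode_bytes(image_bytes), "gen_args": json.dumps(gen_args)}
    return json.dumps(body).encode("utf-8")


def fake_predict(data: bytes, target_model: str) -> bytes:
    """Hypothetical stand-in for predictor.predict(); echoes the image back."""
    request = json.loads(data.decode("utf-8"))
    return json.dumps({"generated_image": request["image"]}).encode("utf-8")


raw_image = b"\x89PNG placeholder bytes"  # not a real image
payload = build_payload(
    raw_image, point_coords=[750, 500], point_labels=1, dilate_kernel_size=15
)

response = fake_predict(data=payload, target_model="/sam.tar.gz")
decoded = base64.b64decode(json.loads(response.decode("utf-8"))["generated_image"])
print(decoded == raw_image)
```

Note that gen_args is itself a JSON string nested inside the outer JSON document, so the receiving side has to parse it a second time.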

To remove an unwanted object from an image, take the segmentation mask generated from SAM and feed that into the LaMa model with the original image. The following images show an example.

Sample image

Segmentation mask from SAM

Erase the dog using LaMa

To modify or replace any object in an image with a text prompt, take the segmentation mask from SAM and feed it into the SD model with the original image and text prompt, as shown in the following example.

Sample image

Segmentation mask from SAM

Replace using SD model with text prompt “a hamster on a bench”

Cost savings

The benefits of SageMaker MMEs increase based on the scale of model consolidation. The following table shows the GPU memory usage of the three models in this post. They are deployed on one g5.2xlarge instance by using one SageMaker MME.

Model | GPU Memory (MiB)
Segment Anything Model | 3,362
Stable Diffusion Inpaint | 3,910
LaMa | 852
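As a quick check on how much headroom this leaves, the following sketch adds up the three measurements and compares the total with the 24 GiB of GPU memory listed for ml.g5.2xlarge later in this post. It only sums the reported numbers and ignores any extra memory needed at inference time, so treat it as a rough estimate.

```python
GPU_MEMORY_MIB = {
    "Segment Anything Model": 3362,
    "Stable Diffusion Inpaint": 3910,
    "LaMa": 852,
}
GPU_CAPACITY_MIB = 24 * 1024  # 24 GiB, as listed for ml.g5.2xlarge in this post

total_mib = sum(GPU_MEMORY_MIB.values())
utilization = total_mib / GPU_CAPACITY_MIB
print(f"{total_mib} MiB of {GPU_CAPACITY_MIB} MiB ({utilization:.0%})")
```

With all three models loaded, roughly a third of the GPU memory is in use, which is why they fit comfortably on one instance.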

You can see cost savings when hosting the three models with one endpoint, and for use cases with hundreds or thousands of models, the savings are much greater.

For example, consider 100 Stable Diffusion models. Each of the models on its own could be served by an ml.g5.2xlarge endpoint (4 GiB memory), costing $1.52 per instance hour in the US East (N. Virginia) Region. Serving all 100 models, each on its own endpoint with two instances, would cost $218,880 per month. With a SageMaker MME, a single endpoint using ml.g5.2xlarge instances can host four models simultaneously, so 25 endpoints are enough. This reduces production inference costs by 75% to only $54,720 per month. The following table summarizes the differences between single-model and multi-model endpoints for this example. Given an endpoint configuration with sufficient memory for your target models, steady state invocation latency after all models have been loaded will be similar to that of a single-model endpoint.

Metric | Single-model endpoint | Multi-model endpoint
Total endpoint price per month | $218,880 | $54,720
Endpoint instance type | ml.g5.2xlarge | ml.g5.2xlarge
CPU Memory capacity (GiB) | 32 | 32
GPU Memory capacity (GiB) | 24 | 24
Endpoint price per hour | $1.52 | $1.52
Number of instances per endpoint | 2 | 2
Endpoints needed for 100 models | 100 | 25
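The monthly totals can be reproduced from the table values. The sketch below assumes a 30-day month (720 hours), which is the assumption that matches the published figures, and uses Decimal to keep the dollar amounts exact. The monthly_cost helper is introduced here for illustration only.

```python
from decimal import Decimal

HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month, which matches the figures in the table
PRICE_PER_INSTANCE_HOUR = Decimal("1.52")  # ml.g5.2xlarge, US East (N. Virginia)
INSTANCES_PER_ENDPOINT = 2
MODELS = 100
MODELS_PER_MME = 4


def monthly_cost(endpoints: int) -> Decimal:
    """Monthly cost in USD for the given number of endpoints."""
    return endpoints * INSTANCES_PER_ENDPOINT * PRICE_PER_INSTANCE_HOUR * HOURS_PER_MONTH


single_model = monthly_cost(endpoints=MODELS)                   # one endpoint per model
multi_model = monthly_cost(endpoints=MODELS // MODELS_PER_MME)  # four models per endpoint
savings = 1 - multi_model / single_model

print(single_model, multi_model, f"{savings:.0%}")
```

Because both configurations use the same instance type and instance count per endpoint, the saving depends only on how many models share an endpoint: four models per endpoint means a quarter of the endpoints and a quarter of the cost.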

Clean up

After you are done, please follow the instructions in the cleanup section of the notebook to delete the resources provisioned in this post to avoid unnecessary charges. Refer to Amazon SageMaker Pricing for details on the cost of the inference instances.

Conclusion

This post demonstrates the language-assisted editing capabilities made possible through the use of generative AI models hosted on SageMaker MMEs with TorchServe. The example we shared illustrates how we can use resource sharing and simplified model management with SageMaker MMEs while still utilizing TorchServe as our model serving stack. We utilized three deep learning foundation models: SAM, SD 2 Inpainting, and LaMa. These models enable us to build powerful capabilities, such as erasing any unwanted object from an image and modifying or replacing any object in an image by supplying a text instruction. These features can help artists and content creators work more efficiently and meet their content demands by automating repetitive tasks, optimizing campaigns, and providing a hyper-personalized experience. We invite you to explore the example provided in this post and build your own UI experience using TorchServe on a SageMaker MME.

To get started, see Supported algorithms, frameworks, and instances for multi-model endpoints using GPU backed instances.


About the authors

James Wu is a Senior AI/ML Specialist Solution Architect at AWS, helping customers design and build AI/ML solutions. James’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in marketing & advertising industries.

Li Ning is a senior software engineer at AWS with a specialization in building large-scale AI solutions. As a tech lead for TorchServe, a project jointly developed by AWS and Meta, her passion lies in leveraging PyTorch and AWS SageMaker to help customers embrace AI for the greater good. Outside of her professional endeavors, Li enjoys swimming, traveling, following the latest advancements in technology, and spending quality time with her family.

Ankith Gunapal is an AI Partner Engineer at Meta (PyTorch). He is passionate about model optimization and model serving, with experience ranging from RTL verification, embedded software, computer vision, to PyTorch. He holds a Master’s in Data Science and a Master’s in Telecommunications. Outside of work, Ankith is also an electronic dance music producer.

Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing machine learning. He focuses on core challenges related to deploying complex ML applications, multi-tenant ML models, cost optimizations, and making deployment of deep learning models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch and spending time with his family.

Subhash Talluri is a Lead AI/ML solutions architect of the Telecom Industry business unit at Amazon Web Services. He’s been leading development of innovative AI/ML solutions for Telecom customers and partners worldwide. He brings interdisciplinary expertise in engineering and computer science to help build scalable, secure, and compliant AI/ML solutions via cloud-optimized architectures on AWS.


A Powerful Legacy: Researcher’s Mom Fueled Passion for Nuclear Fusion


Editor’s note: This is part of a series profiling researchers advancing science with high performance computing. 

Before she entered high school, Ge Dong wanted to be a physicist like her mom, a professor at Shanghai Jiao Tong University.

“She said clean energy was really important for sustaining humanity, she talked about it a lot,” said Ge Dong (above at age two with her mom).


At 32, she’s following that dream at a startup that hopes to find — with the help of HPC and AI — a commercial path to nuclear fusion.

Pioneering AI in Physics

In 2014, her life’s work took her more than 7,000 miles from her Shanghai home to Princeton University’s prestigious plasma physics lab, where she earned a Ph.D.

Her doctoral thesis was built on advances by Princeton colleagues. They were the first to use AI to predict plasma disruptions that could cause a fusion reactor to fail.

Ge Dong’s work shed light on how the edges of plasma, hotter than the surface of the sun, behave inside a prototype fusion reactor, a donut-shaped enclosure called a tokamak.

Later, she spent more than a year working with her colleagues and NVIDIA experts to create a digital twin in NVIDIA Omniverse that shows how plasma circles inside a tokamak. Using AI, the effort slashed the costs of a simulation based on traditional number-crunching methods.

The results may help engineers build controls that keep superheated plasma safely inside tomorrow’s power plants, speeding the arrival of the clean energy source.

A Pivotal Conversation

During the Covid lockdown, Ge Dong returned to Shanghai to work from home. There, in 2021, a pivotal conversation with a friend, Zhou Yang, led to the decision to co-found Energy Singularity, a startup with an ambitious plan.

Yang said he wanted to build a tokamak. When she dismissed the multibillion-dollar idea, he gave a detailed breakdown of a plan that would cost far less.

The Energy Singularity team with their superconducting magnets.

Then he explained why he wanted to take an approach, popular among researchers, of using high-temperature superconducting magnets to control the plasma. Even though he studied a separate branch of physics, he could explain the rationale down to its fundamental equations.

After their talk, “I was so excited, I didn’t sleep the whole night,” she said of the bold plan.

A few months later, they joined three others to launch the company.

A Fresh Challenge for AI

Learning how to build and control the powerful but fragile magnets is the startup’s chief technical challenge. The team is turning to HPC and AI to find its way.

“It’s a whole new area of research that’s ripe for the kind of statistical analysis AI can accelerate to deliver the most effective and lowest cost approach,” she said.

The startup is already designing its prototype on an NVIDIA-accelerated server in its office.

“We’ve been using NVIDIA GPUs for all our research, they’re one of the most important tools in plasma physics these days,” she said.

The Next Generation

The work can be all-consuming. No one on the team has had time to check out the free gym in their building. And it’s been a while since Ge Dong had a good game of badminton, a favorite pastime.

But she remains upbeat. Within a decade, someone will show the way to harnessing nuclear fusion, and it could be her company, she said.

Ge Dong is sure her five-year-old daughter will see her intense passion for plasma physics. But when it comes time to choose a career, she may hear a different calling in a fusion-powered world.
