Extend Amazon SageMaker Pipelines to include custom steps using callback steps

Launched at AWS re:Invent 2020, Amazon SageMaker Pipelines is the first purpose-built, easy-to-use continuous integration and continuous delivery (CI/CD) service for machine learning (ML). With Pipelines, you can create, automate, and manage end-to-end ML workflows at scale.

You can extend your pipelines to include steps for tasks performed outside of Amazon SageMaker by taking advantage of custom callback steps. This feature lets you include tasks that are performed by other AWS services, third parties, or processes run outside AWS. Before the launch of this feature, steps within a pipeline were limited to the supported native SageMaker steps. With this new feature, you can use the new CallbackStep to generate a token and add a message to an Amazon Simple Queue Service (Amazon SQS) queue. The message on the SQS queue triggers a task outside of the currently supported native steps. When that task is complete, you can call the new SendPipelineExecutionStepSuccess API with the generated token to signal that the callback step and corresponding tasks are finished and the pipeline run can continue.
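
As a quick illustration of the completion side of this flow, the following minimal sketch (using boto3; the token and S3 values are placeholders) shows how a worker that finishes an external task reports back to the pipeline. The same calls appear in the task code later in this post:

import boto3

sagemaker = boto3.client('sagemaker')
callback_token = 'example-token'  # placeholder: arrives in the SQS message sent by CallbackStep

# On success, return any output parameters that downstream steps need
sagemaker.send_pipeline_execution_step_success(
    CallbackToken=callback_token,
    OutputParameters=[
        {'Name': 's3_data_out', 'Value': 's3://example-bucket/prepared-data/'}  # illustrative value
    ]
)

# On failure, report the reason so the pipeline run stops with a clear message
# sagemaker.send_pipeline_execution_step_failure(
#     CallbackToken=callback_token,
#     FailureReason='Spark job failed'
# )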

In this post, we demonstrate how to use CallbackStep to perform data preprocessing using AWS Glue. We use an Apache Spark job to prepare NYC taxi data for ML training. The raw data has one row per taxi trip, and shows information like the trip duration, number of passengers, and trip cost. To train an anomaly detection model, we want to transform the raw data into a count of the number of passengers that took taxi rides over 30-minute intervals.

Although we could run this specific Spark job in SageMaker Processing, we use AWS Glue for this post. In some cases, we may need capabilities that Amazon EMR or AWS Glue offer, like support for Hive queries or integration with the AWS Glue metadata catalog, so we demonstrate how to invoke AWS Glue from the pipeline.

Solution overview

The pipeline step that launches the AWS Glue job sends a message to an SQS queue. The message contains the callback token we need to send success or failure information back to the pipeline; reporting success with this token is what allows the pipeline to continue to the next step. When handling this message, we need a handler that can launch the AWS Glue job and reliably check for job status until the job completes. We have to keep in mind that a Spark job can easily take longer than 15 minutes (the maximum duration of a single AWS Lambda function invocation), and the Spark job itself could fail for a number of reasons. That last point is worth emphasizing: in most Apache Spark runtimes, the job code itself runs in transient containers under the control of a coordinator like Apache YARN. We can't add custom code to YARN, so we need something outside the job to check for completion.

We can accomplish this task several ways:

  • Have a Lambda function launch a container task that creates the AWS Glue job and polls for job completion, then sends the callback back to the pipeline
  • Have a Lambda function send a work notification to another SQS queue, with a separate Lambda function that picks up the message, checks for job status, and requeues the message if the job isn’t complete
  • Use AWS Glue job event notifications to respond to job status events sent by AWS Glue

For this post, we use the first technique because it's the simplest (but likely not the most efficient). To do so, we build out the solution as shown in the following diagram.

The solution is one example of how to use the new CallbackStep to extend your pipeline to steps outside SageMaker (such as AWS Glue). You can apply the same general steps and architectural guidance to extend pipelines to other custom processes or tasks. In our solution, the pipeline runs the following tasks:

  • Data preprocessing – This step (Step 1 in the preceding diagram) uses CallbackStep to send a generated token and defined input payload to the configured SQS queue (2). In this example, the input sent to the SQS queue is the Amazon Simple Storage Service (Amazon S3) locations of the input data and the step output training data.
    • The new message in the SQS queue triggers a Lambda function (3) that is responsible for running an AWS Fargate task with Amazon Elastic Container Service (Amazon ECS) (4).
    • The Fargate task runs using a container image that is configured to run a task. The task in this case is an AWS Glue job (5) used to transform your raw data into training data stored in Amazon S3 (6). This task is also responsible for sending a callback message that signals either the job's success or failure.
  • Model training – This step (7) runs when the previous callback step has completed successfully. It uses the generated training data to train a model using a SageMaker training job and the Random Cut Forest algorithm.
  • Package model – After the model is successfully trained, the model is packaged for deployment (8).
  • Deploy model – In this final step (9), the model is deployed using a batch transform job.

These pipeline steps are just examples; you can modify the pipeline to meet your use case, such as adding steps to register the model in the SageMaker Model Registry.
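
For example, if you want to add model registration, a rough sketch using the SageMaker Python SDK's RegisterModel step collection could look like the following; the model package group name is a placeholder, and rcf and step_train refer to the estimator and training step defined later in this post:

from sagemaker.workflow.step_collections import RegisterModel

step_register = RegisterModel(
    name="RegisterTaxiModel",
    estimator=rcf,  # the Random Cut Forest estimator defined in the training step section
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.m5.large"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name="TaxiAnomalyModels",  # placeholder group name
    approval_status="PendingManualApproval",
)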

In the next sections, we discuss how to set up this solution.

Prerequisites

For the preceding pipeline, you need the prerequisites outlined in this section. The detailed setup of each of these prerequisites is available in the supporting notebook.

Notebook dependencies

To run the provided notebook, you need the following:

Pipeline dependencies

Your pipeline uses the following services:

  • SQS message queue – The callback step requires an SQS queue to trigger a task. For this, you need to create an SQS queue and ensure that AWS Identity and Access Management (IAM) permissions are in place that allow SageMaker to put a message in the queue and allow Lambda to poll the queue for new messages. See the following code:
import boto3

sqs_client = boto3.client('sqs')
queue_url = ''
queue_name = 'pipeline_callbacks_glue_prep'
try:
    response = sqs_client.create_queue(QueueName=queue_name)
    queue_url = response['QueueUrl']
except Exception as e:
    print(f"Failed to create queue: {e}")
  • Lambda function: The function is triggered by new messages put to the SQS queue. The function consumes these new messages and starts the ECS Fargate task. In this case, the Lambda execution IAM role needs permissions to pull messages from Amazon SQS, notify SageMaker of potential failures, and run the Amazon ECS task. For this solution, the function starts a task on ECS Fargate using the following code:
%%writefile queue_handler.py
import json
import boto3
import os
import traceback

ecs = boto3.client('ecs')
sagemaker = boto3.client('sagemaker')

def handler(event, context):   
    print(f"Got event: {json.dumps(event)}")
    
    cluster_arn = os.environ["cluster_arn"]
    task_arn = os.environ["task_arn"]
    task_subnets = os.environ["task_subnets"]
    task_sgs = os.environ["task_sgs"]
    glue_job_name = os.environ["glue_job_name"]
    print(f"Cluster ARN: {cluster_arn}")
    print(f"Task ARN: {task_arn}")
    print(f"Task Subnets: {task_subnets}")
    print(f"Task SG: {task_sgs}")
    print(f"Glue job name: {glue_job_name}")
    
    for record in event['Records']:
        payload = json.loads(record["body"])
        print(f"Processing record {payload}")
        
        token = payload["token"]
        print(f"Got token {token}")
        
        try:
            input_data_s3_uri = payload["arguments"]["input_location"]
            output_data_s3_uri = payload["arguments"]["output_location"]
            print(f"Got input_data_s3_uri {input_data_s3_uri}")
            print(f"Got output_data_s3_uri {output_data_s3_uri}")

            response = ecs.run_task(
                cluster = cluster_arn,
                count=1,
                launchType='FARGATE',
                taskDefinition=task_arn,
                networkConfiguration={
                    'awsvpcConfiguration': {
                        'subnets': task_subnets.split(','),
                        'securityGroups': task_sgs.split(','),
                        'assignPublicIp': 'ENABLED'
                    }
                },
                overrides={
                    'containerOverrides': [
                        {
                            'name': 'FargateTask',
                            'environment': [
                                {
                                    'name': 'inputLocation',
                                    'value': input_data_s3_uri
                                },
                                {
                                    'name': 'outputLocation',
                                    'value': output_data_s3_uri
                                },
                                {
                                    'name': 'token',
                                    'value': token
                                },
                                {
                                    'name': 'glue_job_name',
                                    'value': glue_job_name
                                }
                                
                            ]
                        }
                    ]
                }
            )
            if 'failures' in response and len(response['failures']) > 0:
                f = response['failures'][0]
                print(f"Failed to launch task for token {token}: {f['reason']}")
                sagemaker.send_pipeline_execution_step_failure(
                    CallbackToken=token,
                    FailureReason = f['reason']
                )
            else:
                print(f"Launched task {response['tasks'][0]['taskArn']}")
        except Exception as e:
            trc = traceback.format_exc()
            print(f"Error handling record: {str(e)}:m {trc}")
            sagemaker.send_step_failure(
                CallbackToken=token,
                FailureReason = e
            )
  • After we create the SQS queue and Lambda function, we need to configure the SQS queue as an event source for the function so that when new messages are placed in the queue, the function is automatically triggered:
lambda_client.create_event_source_mapping(
    EventSourceArn=f'arn:aws:sqs:{region}:{account}:{queue_name}',
    FunctionName='SMPipelineQueueHandler',
    Enabled=True,
    BatchSize=10
) 
  • Fargate cluster – Because we use Amazon ECS to run and monitor the status of the AWS Glue job, we need to ensure we have an ECS Fargate cluster running:
import boto3
ecs = boto3.client('ecs')
response = ecs.create_cluster(clusterName='FargateTaskRunner')
  • Fargate task: We also need to create a container image with the code (task.py) that starts the data preprocessing job on AWS Glue and reports the status back to the pipeline upon the success or failure of that task. The IAM role attached to the task must include permissions that allow the task to pull images from Amazon ECR, create logs in Amazon CloudWatch, start and monitor an AWS Glue job, and send the callback token when the task is complete. When we issue send_pipeline_execution_step_success back to the pipeline, we also indicate the output file with the prepared training data. We use the output parameter in the model training step in the pipeline. The following is the code for task.py:
import boto3
import os
import sys
import traceback
import time

if 'inputLocation' in os.environ:
    input_uri = os.environ['inputLocation']
else:
    print("inputLocation not found in environment")
    sys.exit(1)
if 'outputLocation' in os.environ:
    output_uri = os.environ['outputLocation']
else:
    print("outputLocation not found in environment")
    sys.exit(1)
if 'token' in os.environ:
    token = os.environ['token']
else:
    print("token not found in environment")
    sys.exit(1)
if 'glue_job_name' in os.environ:
    glue_job_name = os.environ['glue_job_name']
else:
    print("glue_job_name not found in environment")
    sys.exit(1)

print(f"Processing from {input_uri} to {output_uri} using callback token {token}")
sagemaker = boto3.client('sagemaker')
glue = boto3.client('glue')

poll_interval = 60

try:
    
    t1 = time.time()
    response = glue.start_job_run(
        JobName=glue_job_name,
        Arguments={
            '--output_uri': output_uri,
            '--input_uri': input_uri
        }
    )
    job_run_id = response['JobRunId']
    print(f"Starting job {job_run_id}")
    
    job_status = 'STARTING'
    job_error = ''
    while job_status in ['STARTING','RUNNING','STOPPING']:
        time.sleep(poll_interval)
        response = glue.get_job_run(
            JobName=glue_job_name,
            RunId=job_run_id,
            PredecessorsIncluded=False
        )
        job_status = response['JobRun']['JobRunState']
        if 'ErrorMessage' in response['JobRun']:
            job_error = response['JobRun']['ErrorMessage']
        print(f"Job is in state {job_status}")
        
    t2 = time.time()
    total_time = (t2 - t1) / 60.0
    if job_status == 'SUCCEEDED':
        print("Job succeeded")
        sagemaker.send_pipeline_execution_step_success(
            CallbackToken=token,
            OutputParameters=[
                {
                    'Name': 'minutes',
                    'Value': str(total_time)
                },
                {
                    'Name': 's3_data_out',
                    'Value': str(output_uri),
                } 
            ]
        )
    else:
        print(f"Job failed: {job_error}")
        sagemaker.send_pipeline_execution_step_failure(
            CallbackToken=token,
            FailureReason = job_error
        )
except Exception as e:
    trc = traceback.format_exc()
    print(f"Error running ETL job: {str(e)}:m {trc}")
    sagemaker.send_pipeline_execution_step_failure(
        CallbackToken=token,
        FailureReason = str(e)
    )
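
Registering the Fargate task definition itself isn't shown inline in this post, but a rough boto3 sketch could look like the following; the role ARNs, image URI, Region, and log group are placeholders (and the CloudWatch log group is assumed to already exist):

import boto3

ecs = boto3.client('ecs')

# Placeholders: replace with values from your environment
task_execution_role_arn = 'arn:aws:iam::111122223333:role/ecsTaskExecutionRole'
task_role_arn = 'arn:aws:iam::111122223333:role/GlueCallbackTaskRole'
container_image_uri = '111122223333.dkr.ecr.us-east-1.amazonaws.com/glue-callback-task:latest'

response = ecs.register_task_definition(
    family='FargateTaskRunner',
    networkMode='awsvpc',
    requiresCompatibilities=['FARGATE'],
    cpu='512',
    memory='1024',
    executionRoleArn=task_execution_role_arn,  # pulls the image and writes logs
    taskRoleArn=task_role_arn,                 # calls AWS Glue, SageMaker, and CloudWatch from task.py
    containerDefinitions=[
        {
            'name': 'FargateTask',  # must match the containerOverrides name used by the Lambda function
            'image': container_image_uri,
            'logConfiguration': {
                'logDriver': 'awslogs',
                'options': {
                    'awslogs-group': '/ecs/FargateTaskRunner',
                    'awslogs-region': 'us-east-1',
                    'awslogs-stream-prefix': 'task'
                }
            }
        }
    ]
)
task_arn = response['taskDefinition']['taskDefinitionArn']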
  • Data preprocessing code – The pipeline callback step does the actual data preprocessing using a PySpark job running in AWS Glue, so we need to create the code that is used to transform the data:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.sql.types import IntegerType
from pyspark.sql import functions as F

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'input_uri', 'output_uri'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

df = spark.read.format("csv").option("header", "true").load("{0}*.csv".format(args['input_uri']))
df = df.withColumn("Passengers", df["passenger_count"].cast(IntegerType()))
df = df.withColumn(
  'pickup_time',
  F.to_timestamp(
  F.unix_timestamp('tpep_pickup_datetime', 'yyyy-MM-dd HH:mm:ss').cast('timestamp')))
  
dfW = df.groupBy(F.window("pickup_time", "30 minutes")).agg(F.sum("Passengers").alias("passenger"))
dfOut = dfW.drop('window')
dfOut.repartition(1).write.option("timestampFormat", "yyyy-MM-dd HH:mm:ss").csv(args['output_uri'])

job.commit()
  • Data preprocessing job – We also need to configure the AWS Glue job that runs the preceding code when triggered by your Fargate task. The IAM role used must have permissions to read and write from the S3 bucket. See the following code:
glue = boto3.client('glue')
response = glue.create_job(
    Name='GlueDataPrepForPipeline',
    Description='Prepare data for SageMaker training',
    Role=glue_role_arn,
    ExecutionProperty={
        'MaxConcurrentRuns': 1
    },
    Command={
        'Name': 'glueetl',
        'ScriptLocation': glue_script_location,
    },
    MaxRetries=0,
    Timeout=60,
    MaxCapacity=10.0,
    GlueVersion='2.0'
)
glue_job_name = response['Name']

After these prerequisites are in place, including the necessary IAM permissions outlined in the example notebook, we’re ready to configure and run the pipeline.

Configure the pipeline

To build out the pipeline, the callback step that performs data preprocessing relies on the preceding prerequisites. We combine that step with native SageMaker steps for model training and deployment to create an end-to-end pipeline.

To configure the pipeline, complete the following steps:

  1. Initialize the pipeline parameters:
from sagemaker.workflow.parameters import (
    ParameterInteger,
    ParameterString,
)

input_data = ParameterString(
    name="InputData",
    default_value=f"s3://{default_bucket}/{taxi_prefix}/"
)
id_out = ParameterString(
    name="IdOut",
    default_value="taxiout"+ str(timestamp)
)
output_data = ParameterString(
    name="OutputData",
    default_value=f"s3://{default_bucket}/{taxi_prefix}_output/"
)
training_instance_count = ParameterInteger(
    name="TrainingInstanceCount",
    default_value=1
)
training_instance_type = ParameterString(
    name="TrainingInstanceType",
    default_value="ml.c5.xlarge"
)
  2. Configure the first step in the pipeline, which is CallbackStep.

This step uses the SQS queue created in the prerequisites, together with arguments used by the tasks in this step. These arguments include the Amazon S3 locations of the input (raw taxi data) and the output training data. The step also defines its outputs, which in this case include the callback output and the Amazon S3 location of the training data. These outputs become the inputs to the next step in the pipeline. See the following code:

from sagemaker.workflow.callback_step import CallbackStep,CallbackOutput,CallbackOutputTypeEnum

callback1_output=CallbackOutput(output_name="s3_data_out", output_type=CallbackOutputTypeEnum.String)

step_callback_data = CallbackStep(
                    name="GluePrepCallbackStep",
                    sqs_queue_url=queue_url,
                    inputs={
                        "input_location": f"s3://{default_bucket}/{taxi_prefix}/",
                        "output_location": f"s3://{default_bucket}/{taxi_prefix}_{id_out}/"
                    },
                    outputs=[
                        callback1_output
                    ],
                )
  3. We use TrainingStep to train a model using the Random Cut Forest algorithm.

We first need to configure an estimator, then we configure the actual pipeline step. This step takes the output of the previous step, the Amazon S3 location of the training data created by AWS Glue, as input to train the model. See the following code:

containers = {
    'us-west-2': '174872318107.dkr.ecr.us-west-2.amazonaws.com/randomcutforest:latest',
    'us-east-1': '382416733822.dkr.ecr.us-east-1.amazonaws.com/randomcutforest:latest',
    'us-east-2': '404615174143.dkr.ecr.us-east-2.amazonaws.com/randomcutforest:latest',
    'eu-west-1': '438346466558.dkr.ecr.eu-west-1.amazonaws.com/randomcutforest:latest'}
region_name = boto3.Session().region_name
container = containers[region_name]
model_prefix = 'model'

session = sagemaker.Session()

rcf = sagemaker.estimator.Estimator(
    container,
    sagemaker.get_execution_role(),
    output_path='s3://{}/{}/output'.format(default_bucket, model_prefix),
    instance_count=training_instance_count,
    instance_type=training_instance_type,
    sagemaker_session=session)

rcf.set_hyperparameters(
    num_samples_per_tree=200,
    num_trees=50,
    feature_dim=1)
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep

step_train = TrainingStep(
    name="TrainModel",
    estimator=rcf,
    inputs={
        "train": TrainingInput(
        # s3_data = output of the previous callback step
        s3_data=step_callback_data.properties.Outputs['s3_data_out'],
        content_type="text/csv;label_size=0",
        distribution='ShardedByS3Key'
        ),
    },
)
  4. We use CreateModelStep to package the model for SageMaker deployment:
from sagemaker.model import Model
from sagemaker import get_execution_role
role = get_execution_role()

image_uri = sagemaker.image_uris.retrieve("randomcutforest", region)

model = Model(
    image_uri=image_uri,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=sagemaker_session,
    role=role,
    )
from sagemaker.inputs import CreateModelInput
from sagemaker.workflow.steps import CreateModelStep

inputs = CreateModelInput(
    instance_type="ml.m5.large",
)

create_model = CreateModelStep(
    name="TaxiModel",
    model=model,
    inputs=inputs,
)
  5. We deploy the trained model using a SageMaker batch transform job using TransformStep.

This step loads the trained model and processes the prediction request data stored in Amazon S3, then outputs the results (anomaly scores in this case) to the specified Amazon S3 location. See the following code:

base_uri = step_callback_data.properties.Outputs['s3_data_out']
output_prefix = 'batch-out'

from sagemaker.transformer import Transformer

transformer = Transformer(
    model_name=create_model.properties.ModelName,
    instance_type="ml.m5.xlarge",
    assemble_with = "Line",
    accept = 'text/csv',
    instance_count=1,
    output_path=f"s3://{default_bucket}/{output_prefix}/",
)
from sagemaker.inputs import TransformInput
from sagemaker.workflow.steps import TransformStep

batch_data=step_callback_data.properties.Outputs['s3_data_out']

step_transform = TransformStep(
    name="TaxiTransform",
    transformer=transformer,
    inputs=TransformInput(data=batch_data,content_type="text/csv",split_type="Line",input_filter="$[0]",join_source='Input',output_filter='$[0,-1]')
)

Create and run the pipeline

You’re now ready to create and run the pipeline. To do this, complete the following steps:

  1. Define the pipeline including the parameters accepted and steps:
from sagemaker.workflow.pipeline import Pipeline

pipeline_name = f"GluePipeline-{id_out}"
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        input_data,
        training_instance_type,
        training_instance_count,
        id_out,
    ],
    steps=[step_callback_data, step_train,create_model,step_transform],
)
  2. Submit the pipeline definition to create the pipeline using the role that is used to create all the jobs defined in each step:
from sagemaker import get_execution_role
pipeline.upsert(role_arn = get_execution_role())
  3. Run the pipeline:
execution = pipeline.start()

You can monitor your pipeline using the SageMaker SDK, execution.list_steps(), or via the Studio console, as shown in the following screenshot.
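
For example, a minimal monitoring loop with the SageMaker Python SDK (step names will match those defined above) might look like this:

# Check the overall status of the execution
print(execution.describe()['PipelineExecutionStatus'])

# Optionally block until the execution finishes
execution.wait()

# Inspect each step and its outcome
for step in execution.list_steps():
    print(step['StepName'], step['StepStatus'])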

Use CallbackStep to integrate other tasks outside of SageMaker

You can follow the same pattern to integrate any long-running tasks or jobs with Pipelines. This may include running AWS Batch jobs, Amazon EMR job flows, or Amazon ECS or Fargate tasks.

You can also implement an email approval step for your models as part of your ML pipeline.
In this pattern, a CallbackStep runs after the model EvaluationStep and sends the user an email containing approve and reject links along with model metrics. The workflow progresses to the next step only after the user approves.

You can implement this pattern using a Lambda function and Amazon Simple Notification Service (Amazon SNS).
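
A minimal sketch of such a handler, assuming an SNS topic already subscribed to the approver's email address (the topic ARN and the evaluation_report input are hypothetical), could look like the following. A separate approval endpoint would then call send_pipeline_execution_step_success or send_pipeline_execution_step_failure with the token:

import json
import boto3

sns = boto3.client('sns')
APPROVAL_TOPIC_ARN = 'arn:aws:sns:us-east-1:111122223333:model-approvals'  # placeholder

def handler(event, context):
    # Triggered by the SQS queue that the approval CallbackStep writes to
    for record in event['Records']:
        payload = json.loads(record['body'])
        token = payload['token']
        metrics_uri = payload['arguments'].get('evaluation_report', 'n/a')  # hypothetical input name

        sns.publish(
            TopicArn=APPROVAL_TOPIC_ARN,
            Subject='Model approval requested',
            Message=f'Review the metrics at {metrics_uri}, then approve or reject using callback token {token}'
        )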

Conclusion

In this post, we showed you an example of how to use CallbackStep in Pipelines to extend your pipelines to integrate an AWS Glue job for data preprocessing. You can follow the same process to integrate any task or job outside of SageMaker. You can walk through the full solution explained in the example notebook.


About the Author

Shelbee Eigenbrode is a Principal AI and Machine Learning Specialist Solutions Architect at Amazon Web Services (AWS). She holds 6 AWS certifications and has been in technology for 23 years spanning multiple industries, technologies, and roles. She is currently focusing on combining her DevOps and ML background to deliver and manage ML workloads at scale. With over 35 patents granted across various technology domains, she has a passion for continuous innovation and using data to drive business outcomes. Shelbee co-founded the Denver chapter of Women in Big Data.


Sofian Hamiti is an AI/ML specialist Solutions Architect at AWS. He helps customers across industries accelerate their AI/ML journey by helping them build and operationalize end-to-end machine learning solutions.


Randy DeFauw is a principal solutions architect at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.


Payton Staub is a senior engineer with Amazon SageMaker. His current focus includes model building pipelines, experiment management, image management and other tools to help customers productionize and automate machine learning at scale.

Read More

Enhancing customer service experiences using Conversational AI: Power your contact center with Amazon Lex and Genesys Cloud

Customers expect personalized contact center experiences. They want easy access to customer support and quick resolution of their issues. Delighting callers with a quick and easy interaction remains central to the customer experience (CX) strategy for support organizations. Enterprises often deploy omni-channel contact centers so that they can provide simple mechanisms for their customers to access customer support. But even with these efforts, callers face long wait times, especially during peak hours, which can lead to lower CSAT scores. In addition, organizations have to manage support costs as their footprint expands. As the customer base grows, operational costs for managing a contact center can rapidly increase.

With Amazon Lex bots, you can use conversational AI capabilities to provide highly engaging and lifelike conversational experiences. Organizations can use Amazon Lex to automate customer service interactions and deliver faster responses to queries. As a result, customer issues are resolved in real time, reducing wait times and driving higher satisfaction. You can use Amazon Lex to handle the most common problems encountered by customers. Furthermore, complex issues that require human intervention can be seamlessly handed over from the Amazon Lex bot to a human agent. Augmenting your contact center operations with Amazon Lex bots provides an enhanced caller experience, while optimizing your operational costs with self-service automation. In addition, you can seamlessly scale your contact center operations on the AWS Cloud as your user base grows.

We’re excited to announce Amazon Lex V2 bot support on the Genesys Cloud platform. With this launch, you can build an Amazon Lex bot and set up your contact center in minutes.

About Amazon Lex V2 APIs and Genesys Cloud

Amazon Lex launched V2 APIs and a new console interface that make it easier to build, deploy, and manage conversational experiences. The Lex V2 console and API enhancements provide support for multiple languages in a single bot, enable simplified versioning, and include builder productivity tools. These features give you more control over the bot building and deployment processes.

Genesys Cloud (an omni-channel orchestration and customer relationship platform) provides a contact center platform in a public cloud model that enables quick and simple integration of AWS Contact Center Intelligence (AWS CCI) to transform the modern contact center from a cost center into a profit center. As part of AWS CCI, Genesys Cloud integrates with Amazon Lex, Amazon Polly (text to speech), and Amazon Kendra (intelligent search) to offer self-service conversational AI capabilities.

Key features

Genesys Cloud uses the continuous streaming capability with Amazon Lex V2 APIs to enable advanced IVR conversations. With this integration, you can now enable the following:

  • Interruptions (“barge-in”) – Callers can now interrupt the bot and answer a question before the prompt is completed
  • Wait and Continue – Callers can instruct the bot to wait if they need time for retrieving additional information during the call (such as a credit card number or booking ID)
  • DTMF support – Callers can provide information via speech or DTMF interchangeably
  • SSML support – You can configure prompts within the Amazon Lex bot using SSML tags, enabling greater control over speech generation from text
  • Configurable timeouts – You can configure how long to wait for the customer to finish speaking before Amazon Lex collects speech input from callers, such as answering a yes/no question, or providing a date or credit card number

Creating the bot

Let's create a banking bot as an example and integrate it with Genesys Cloud for IVR-based interactions. For a step-by-step process to build an Amazon Lex bot, refer to the banker bot workshop. You can also download the bot and import it using the Amazon Lex V2 console.

In addition to the intents presented in the workshop, we add a SpeakToAgent intent to enable handing over the conversation to a human agent based on user requests.

Enabling the integrations

The Amazon Lex V2 integration is available for installation via Genesys AppFoundry. You need an active subscription for premium applications to access the Integration page from the Genesys Cloud Admin dashboard. Genesys also offers a free trial for validation purposes.

1. Configure the IAM role

Because Amazon Lex invocations take place in your AWS environment, you configure an AWS Identity and Access Management (IAM) role with the proper permissions so that Genesys Cloud can assume the role and use your resources.

  1. Create an IAM role and select trusted entity to be Another AWS account.
  2. Enter the Genesys Cloud production ID 765628985471 in the Account ID field.
  3. Following AWS best practices, select Require external ID and enter your organization's ID to prevent the confused deputy problem and enhance integration security.
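
If you prefer to script this setup, a minimal boto3 sketch could look like the following; the role name and external ID are placeholders:

import json
import boto3

iam = boto3.client('iam')

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::765628985471:root"},  # Genesys Cloud production account
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": "YOUR_GENESYS_ORG_ID"}}  # placeholder
    }]
}

response = iam.create_role(
    RoleName='GenesysLexV2Role',  # placeholder name
    AssumeRolePolicyDocument=json.dumps(trust_policy)
)
print(response['Role']['Arn'])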

By default, IAM roles don’t have permission to create or modify AWS resources. For Genesys Cloud to successfully access Amazon Lex bots, a few permissions are required.

  1. Choose Create Policy and enter the following JSON blob into the policy editor.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "GenesysLexPolicy",
            "Effect": "Allow",
            "Action": [
                "lex:List*",
                "lex:Describe*",
                "lex:StartConversation",
                "lex:Recognize*",
                "lex:DeleteSession",
                "lex:GetSession",
                "lex:PutSession"
            ],
            "Resource": "*"
        }
    ]
}
  2. Attach the policy to the role created previously.
  3. Copy the role ARN and configure it within Genesys Cloud.
  4. Save and set the integration status to activate the bot.

2. Configure Amazon Polly

To use Amazon Lex for a voice bot, you set up the text to speech (TTS) capability. Genesys Cloud supports several TTS engines, including Amazon Polly. You can install and configure the Amazon Polly integration following the Genesys documentation. You can then select Amazon Polly as the engine and configure the voice you prefer. To keep the IVR voice consistent in the call flow, the Amazon Polly voice selected in Genesys Cloud should be the same voice configured in your Lex bot. For additional details, see a list of available voices and the associated characteristics.

3. Configure the Genesys Cloud Architect flow

Create an Inbound Call Flow in Architect to orchestrate your bot interaction. You add a Reusable Task and use the Call Lex V2 Bot action to bring in the Amazon Lex bot and design the various actions in the call flow.

The integration also allows Genesys Cloud to capture the preconfigured slots as Architect variables. These variables can be used outside of the bot for use cases such as applying business rules. For example, if a customer provides an account ID that matches the VIP customer segment, the call can be routed to the priority support queue when transferring to an agent.

4. Configure graceful escalation

When the automated solution can’t fulfill a customer’s request, the interaction should be escalated gracefully. This fallback process allows a human agent to take over the interaction for more complex tasks.

You can save key information from the prior exchange (such as intents, slots, and conversation transcripts) into a script to provide historical context to the agent so that conversations can be picked up seamlessly. This prevents customers from wasting valuable time to repeat the information provided previously.

In the following example, the call is transferred to an available Tier 1 support agent when a customer asks for more help or to be connected to an agent. You can also collect additional context from the customer and hand off to either another bot or human based on specialty.

5. Test the integrations

You can use the native soft phone in Genesys Cloud to make calls as you would with a desktop phone and validate the integration. Enter the bot’s name in the Enter Names and Numbers field and choose Call to follow the prompts.

Summary

Enterprises increasingly invest in automated solutions such as IVR and chatbots as a part of their customer service strategy in contact centers. Automation provides highly available support that handles common tasks without the presence of a live agent, while reducing operational cost.

With the adoption of the Amazon Lex V2 APIs, Genesys Cloud provides an overall improved user experience using the continuous streaming architecture, and enables a more natural customer-bot interaction.

This post outlines the key steps to enable the Amazon Lex V2 integration in your Genesys Cloud environment, and should give you a jump start to create and customize your own chatbot initiative. Check out the following resources for additional information:


About the Author

Anubhav Mishra is a Product Manager with AWS. He spends his time understanding customers and designing product experiences to address their business challenges.


Jessica Ho is a Solutions Architect at Amazon Web Services supporting ISV partners who build business applications on AWS. She is passionate about creating differentiated solutions that unlock customers for cloud adoption. Outside of work, she enjoys spoiling her garden into a mini jungle.

Read More

Application now open for the 2022 Facebook Fellowship program

Application for the 2022 Facebook Fellowship program is now open and closes on September 20, 2021. The program supports promising PhD students conducting research in areas related to computer science and engineering, from AR/VR to security and privacy. Head to the Facebook Fellowship page to see all the fellowships available for 2022 and to read descriptions from research teams. Those eligible can apply at the link below.
Each year, thousands of bright and passionate PhD students from all over the world apply to become a Facebook Fellow. The Fellowship program is highly competitive, with only a handful of applicants selected each cycle. To help prepare potential applicants, we've put together a list of resources. Check out the following blog posts for tips and advice from Facebook Fellowship alumni as well as application reviewers.

Resources for applicants

The six most common Fellowship questions, answered by Facebook Fellow Moses Namara

As a former Emerging Scholar, 2020 Fellow Moses Namara knows the Fellowship program like the back of his hand. In this Q&A, Moses offers advice about writing a research statement, navigating the application process, being a Facebook Fellow, and knowing whether you’re qualified to apply.

Read Q&A

Fellowship 101: Facebook Fellow Daricia Wilkinson outlines the basics for PhDs

In this Q&A, 2019 Fellow Daricia Wilkinson breaks down the basics for PhD students looking to get their research funded. Inspired by Wilkinson’s Medium post about how to make a successful PhD fellowship application, this Q&A outlines the most common questions Wilkinson receives about fellowships, research statements, and the application process.

Read Q&A

Five tips for a successful Facebook Fellowship application from the people who review them

Last year, we connected with some reviewers to discuss what they look for in an application and what advice they would give to prospective applicants. Drawing from their experience reading hundreds of research statements, CVs, and letters of recommendation, they came up with five tips for a successful application.

Read blog

Applying twice: How Facebook Fellow David Pujol adjusted his application for success

It’s pretty common for PhD students to apply for the Fellowship program the following year if they didn’t get selected the first time they applied. In this Q&A, 2020 Fellow David Pujol tells us about his first approach to his Fellowship application, what changed the second time, what he spent the most time on in his applications, and more.

Read Q&A

Fellow spotlights, career advice, and more

We frequently reach out to Facebook Fellowship alumni to highlight them on our blog. Browse the Fellowship section of our blog to read more about the bright and talented PhD students that we see in the program.

Browse blogs

Applications for the 2022 Facebook Fellowship program close on September 20, 2021. Apply and learn more about eligibility criteria, application requirements, available fellowships, and more on the Facebook Fellowship page.

The post Application now open for the 2022 Facebook Fellowship program appeared first on Facebook Research.

Read More

Find Your Groove: Add NVIDIA AI Essentials Series to Your Summer Playlist

If AI, data science, graphics or robotics is your jam, stream the NVIDIA AI Essentials Learning Series this summer.

These intro-level courses provide foundational knowledge to students and early-career developers looking to broaden their areas of expertise. The free series includes over a dozen sessions — each less than an hour long — on topics including conversational AI, ray tracing and robotics fundamentals.

For a deeper dive, register for daylong hands-on courses from the NVIDIA Deep Learning Institute and earn a certificate of competency to boost your resume. To date, DLI has trained more than 300,000 developers through an extensive catalog of self-paced and instructor-led courses and workshops.

Besides picking up new skills to boost your career journey, there are summer sweepstakes at stake: For every 30 minutes a participant watches the learning series, they earn an entry to win a DLI book and free registration codes for self-paced courses.

For those who register for DLI courses on deep learning, accelerated data science and AI on the NVIDIA Jetson Nano, we’re upping the game. Upon successful completion of one or more of these courses, participants will be entered to win an NVIDIA GeForce RTX 3090 GPU in addition to three DLI registration codes and the book.

Here’s a taste of what students, educators and early-career technologists will find in the learning series.

Deep Learning Demystified

This talk covers what deep learning is, what it’s good for and why it’s such a powerful technology. Will Ramey, senior director and global head of developer programs at NVIDIA, talks through the different types of neural networks and explains how they’re trained, optimized and deployed in industries like healthcare and energy. He also discusses some of the challenges organizations face when adopting deep learning.

Ray Tracing in One Weekend

Ray tracing brings stunning, realistic visuals to the movie and gaming industries — but how does it work? In this session, NVIDIA researcher Pete Shirley guides viewers through the process of writing code to generate a ray-traced image of a 3D scene. As he goes, Shirley explains key concepts that form the foundation of ray tracing.

Conversational AI Demystified 

Building a conversational AI model requires developers to achieve two key features: high accuracy and low latency. This session provides an overview of conversational AI models for automatic speech recognition, natural language processing and text-to-speech. Meriem Bendris, a solution architect at NVIDIA, shares how to train and fine-tune these models using NVIDIA NeMo, the Transfer Learning Toolkit and the NVIDIA Riva application framework.

Jetson Nano: From Zero to Hero in 20 Minutes

Learn how to get started from scratch with the NVIDIA Jetson Nano 2GB developer kit — even if you’re not a developer. This crash course, led by NVIDIA’s Asier Arranz Jiminez, walks through the process from installation to inference with an AI model that detects vehicles, roads and pedestrians.

To get started, visit the NVIDIA AI Essentials Learning Series homepage. Thousands more talks are available to stream free through NVIDIA On-Demand.

The post Find Your Groove: Add NVIDIA AI Essentials Series to Your Summer Playlist appeared first on The Official NVIDIA Blog.

Read More

What’s New in PyTorch Profiler 1.9?

PyTorch Profiler v1.9 has been released! The goal of this new release (previous PyTorch Profiler release) is to provide you with new state-of-the-art tools to help diagnose and fix machine learning performance issues regardless of whether you are working on one or numerous machines. The objective is to target the execution steps that are the most costly in time and/or memory, and visualize the work load distribution between GPUs and CPUs.

Here is a summary of the five major features being released:

  1. Distributed Training View: This helps you understand how much time and memory is consumed in your distributed training job. Many issues can occur when you split a training job's load across worker nodes to run in parallel, because the process can be a black box. The overall goal is to speed up model training. This distributed training view will help you diagnose and debug issues within individual nodes.
  2. Memory View: This view allows you to understand your memory usage better. This tool will help you avoid the famously pesky Out of Memory error by showing active memory allocations at various points of your program run.
  3. GPU Utilization Visualization: This tool helps you make sure that your GPU is being fully utilized.
  4. Cloud Storage Support: Tensorboard plugin can now read profiling data from Azure Blob Storage, Amazon S3, and Google Cloud Platform.
  5. Jump to Source Code: This feature allows you to visualize stack tracing information and jump directly into the source code. This helps you quickly optimize and iterate on your code based on your profiling results.

Getting Started with PyTorch Profiling Tool

PyTorch includes a profiling functionality called "PyTorch Profiler". The PyTorch Profiler tutorial can be found here.

To instrument your PyTorch code for profiling, you must:

$ pip install torch-tb-profiler

import torch.profiler as profiler
with profiler.profile(XXXX)

Comments:

• For CUDA and CPU profiling, see below:

with torch.profiler.profile(
    activities=[
        torch.profiler.ProfilerActivity.CPU,
        torch.profiler.ProfilerActivity.CUDA],

• with profiler.record_function("$NAME"): allows you to tag a block of code with a name (like a decorator associating a label with that block)

• The profile_memory=True parameter passed to profiler.profile allows you to profile the CPU and GPU memory footprint
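
Putting these pieces together, the following is a minimal sketch (not from the original tutorial) that profiles a few forward passes of a torchvision ResNet-18 and writes a TensorBoard trace; the model, step count, and log directory are illustrative, and torchvision is assumed to be installed:

import torch
import torch.profiler
import torchvision.models as models

model = models.resnet18()
inputs = torch.randn(4, 3, 224, 224)

with torch.profiler.profile(
    activities=[torch.profiler.ProfilerActivity.CPU,
                torch.profiler.ProfilerActivity.CUDA],
    schedule=torch.profiler.schedule(wait=1, warmup=1, active=3),
    on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/resnet18'),
    record_shapes=True,
    profile_memory=True,
    with_stack=True
) as prof:
    for step in range(6):
        with torch.profiler.record_function("forward_pass"):
            model(inputs)
        prof.step()  # tells the profiler a step boundary was reached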

Visualizing PyTorch Model Performance using PyTorch Profiler

Distributed Training

Recent advances in deep learning argue for the value of large datasets and large models, which requires you to scale out model training to more computational resources. Distributed Data Parallel (DDP) and NVIDIA Collective Communications Library (NCCL) are the widely adopted paradigms in PyTorch for accelerating your deep learning training.

In this release of PyTorch Profiler, DDP with NCCL backend is now supported.
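
As a rough sketch of what profiling a DDP training loop with the NCCL backend can look like (assuming a single-node launch with torchrun, which sets the rank and rendezvous environment variables; the model and data are placeholders):

import os
import torch
import torch.distributed as dist
import torch.nn as nn
import torch.profiler
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")  # torchrun supplies RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(nn.Linear(128, 10).cuda(local_rank), device_ids=[local_rank])
    data = torch.randn(32, 128, device=f"cuda:{local_rank}")

    with torch.profiler.profile(
        activities=[torch.profiler.ProfilerActivity.CPU,
                    torch.profiler.ProfilerActivity.CUDA],
        on_trace_ready=torch.profiler.tensorboard_trace_handler(f"./log/rank{dist.get_rank()}")
    ) as prof:
        for _ in range(5):
            model(data).sum().backward()  # backward triggers the NCCL all-reduce of gradients
            prof.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()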

Computation/Communication Overview

In the Computation/Communication overview under the Distributed training view, you can observe each worker's computation-to-communication ratio and the load balance across worker nodes, measured at the chosen granularity.

Scenario 1:

If the computation and overlapping time of one worker is much larger than the others, this may suggest an issue in the workload balance or worker being a straggler. Computation is the sum of kernel time on GPU minus the overlapping time. The overlapping time is the time saved by interleaving communications during computation. The more overlapping time represents better parallelism between computation and communication. Ideally the computation and communication completely overlap with each other. Communication is the total communication time minus the overlapping time. The example image below displays how this scenario appears on Tensorboard.

Figure: A straggler example

Scenario 2:

If there is a small batch size (i.e., less computation on each worker) or the data to be transferred is large, the computation-to-communication ratio may also be small, and appear in the profiler as low GPU utilization and long waiting times. This computation/communication view will allow you to diagnose your code to reduce communication by adopting gradient accumulation, or to decrease the communication proportion by increasing batch size. DDP communication time depends on model size. Batch size has no relationship with model size. So increasing batch size could make computation time longer and make the computation-to-communication ratio bigger.
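
As an illustration of the gradient accumulation option mentioned above, a minimal sketch (the model is assumed to be wrapped in DistributedDataParallel; the loader, loss, and accumulation factor are placeholders) looks like this:

import contextlib

def train_with_grad_accumulation(model, optimizer, criterion, train_loader, accumulation_steps=4):
    optimizer.zero_grad()
    for i, (inputs, targets) in enumerate(train_loader):
        last_micro_batch = (i + 1) % accumulation_steps == 0
        # Skip the gradient all-reduce on intermediate micro-batches to cut communication
        sync_ctx = contextlib.nullcontext() if last_micro_batch else model.no_sync()
        with sync_ctx:
            loss = criterion(model(inputs), targets)
            (loss / accumulation_steps).backward()
        if last_micro_batch:
            optimizer.step()      # gradients were all-reduced only on the final micro-batch
            optimizer.zero_grad()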

Synchronizing/Communication Overview

In the Synchronizing/Communication view, you can observe the efficiency of communication. This is done by taking the step time minus computation and communication time. Synchronizing time is part of the total communication time for waiting and synchronizing with other workers. The Synchronizing/Communication view includes initialization, data loader, CPU computation, and so on. From this view you can draw insights such as what ratio of the total communication time is actually used for exchanging data and how much is idle time spent waiting for data from other workers.

For example, if there is an inefficient workload balance or straggler issue, you’ll be able to identify it in this Synchronizing/Communication view. This view will show several workers’ waiting time being longer than others.

The table view above allows you to see the detailed statistics of all communication ops in each node. This allows you to see what operation types are being called, how many times each op is called, what is the size of the data being transferred by each op, and so on.

Memory View:

This memory view tool helps you understand the hardware resource consumption of the operators in your model. Understanding the time and memory consumption at the operator level allows you to resolve performance bottlenecks and, in turn, allows your model to execute faster. Given limited GPU memory size, optimizing the memory usage can:

  1. Allow bigger model which can potentially generalize better on end level tasks.
  2. Allow bigger batch size. Bigger batch sizes increase the training speed.

The profiler records all the memory allocation during the profiler interval. Selecting the “Device” will allow you to see each operator’s memory usage on the GPU side or host side. You must enable profile_memory=True to generate the below memory data as shown here.

with torch.profiler.profile(
    profile_memory=True  # this will take 1 – 2 minutes to complete
)

Important Definitions:

• “Size Increase” displays the sum of all allocation bytes minus all the memory release bytes.

• “Allocation Size” shows the sum of all allocation bytes without considering the memory release.

• “Self” means the allocated memory is not from any child operators, but from the operator itself.

GPU Metric on Timeline:

This feature will help you debug performance issues when one or more GPU are underutilized. Ideally, your program should have high GPU utilization (aiming for 100% GPU utilization), minimal CPU to GPU communication, and no overhead.

Overview:
The overview page highlights the results of three important GPU usage metrics at different levels (i.e., GPU Utilization, Est. SM Efficiency, and Est. Achieved Occupancy). Essentially, each GPU has a number of streaming multiprocessors (SMs), each with a number of warps that can execute many threads concurrently; the exact counts depend on the GPU. At a high level, this GPU Metric on Timeline tool allows you to see the whole stack, which is useful.

If the GPU utilization result is low, this suggests a potential bottleneck is present in your model. Common reasons:

• Insufficient parallelism in kernels (i.e., low batch size)

• Small kernels called in a loop, meaning the launch overheads are not amortized

• CPU or I/O bottlenecks that leave the GPU without enough work to keep busy

The performance recommendation section of the overview page is where you'll find potential suggestions on how to increase GPU utilization. In this example, GPU utilization is low, so the performance recommendation was to increase batch size. Increasing the batch size from 4 to 32, as per the recommendation, increased GPU utilization by 60.68%.

GPU Utilization: the percentage of the step interval time in the profiler during which a GPU engine was executing a workload. The higher the utilization percentage, the better. The drawback of using GPU utilization alone to diagnose performance bottlenecks is that it is too high-level and coarse: it won't be able to tell you how many streaming multiprocessors are in use. Note that while this metric is useful for detecting periods of idleness, a high value does not indicate efficient use of the GPU, only that it is doing anything at all. For instance, a kernel with a single thread running continuously will get a GPU Utilization of 100%.

Estimated Stream Multiprocessor Efficiency (Est. SM Efficiency) is a finer-grained metric: it indicates what percentage of SMs are in use at any point in the trace. This metric reports the percentage of time where there is at least one active warp on an SM, including warps that are stalled (NVIDIA doc). Est. SM Efficiency also has its limitations. For instance, a kernel with only one thread per block can't fully use each SM. SM Efficiency does not tell us how busy each SM is, only that it is doing anything at all, which can include stalling while waiting on the result of a memory load. To keep an SM busy, it is necessary to have a sufficient number of ready warps that can be run whenever a stall occurs.

Estimated Achieved Occupancy (Est. Achieved Occupancy) goes a layer deeper than Est. SM Efficiency and GPU Utilization for diagnosing performance issues. Estimated Achieved Occupancy indicates how many warps can be active at once per SM. Having a sufficient number of active warps is usually key to achieving good throughput. Unlike GPU Utilization and SM Efficiency, it is not a goal to make this value as high as possible. As a rule of thumb, good throughput gains can be had by improving this metric to 15% and above. But at some point you will hit diminishing returns: if the value is already at 30%, for example, further gains will be uncertain. This metric reports the average values of all warp schedulers for the kernel execution period (NVIDIA doc). The larger the Est. Achieved Occupancy value, the better.

Overview details: Resnet50_batchsize4

Overview details: Resnet50_batchsize32

Kernel View
The kernel view has “Blocks per SM” and “Est. Achieved Occupancy” columns, which are great tools for comparing model runs.

Mean Blocks per SM:
Blocks per SM = blocks of this kernel / SM number of this GPU. If this number is less than 1, it indicates the GPU multiprocessors are not fully utilized. “Mean Blocks per SM” is the weighted average of all runs of this kernel name, using each run's duration as the weight.

Mean Est. Achieved Occupancy:
Est. Achieved Occupancy is defined as above in the overview. “Mean Est. Achieved Occupancy” is the weighted average of all runs of this kernel name, using each run's duration as the weight.

Trace View
This trace view displays a timeline that shows the duration of operators in your model and which system executed the operation. This view can help you identify whether the high consumption and long execution is because of input or model training. Currently, this trace view shows GPU Utilization and Est. SM Efficiency on a timeline.

GPU utilization is calculated independently and divided into multiple 10 millisecond buckets. The buckets' GPU utilization values are drawn alongside the timeline between 0 – 100%. In the above example, the GPU utilization during thread 28022's busy time in “ProfilerStep5” is higher than during the following “Optimizer.step”. You can zoom in there to investigate why that is.

From above, we can see that the former's kernels are longer than the latter's. The latter's kernels are too short in execution, which results in lower GPU utilization.

Est. SM Efficiency: Each kernel has a calculated Est. SM Efficiency between 0 – 100%. For example, the below kernel has only 64 blocks, while this GPU has 80 SMs. Then its “Est. SM Efficiency” is 64/80, which is 0.8.

Cloud Storage Support

After running pip install tensorboard, install the plugin with the extra that matches your cloud provider so profiling data can be read from it:

pip install torch-tb-profiler[blob]   # Azure Blob Storage
pip install torch-tb-profiler[gs]     # Google Cloud Storage
pip install torch-tb-profiler[s3]     # Amazon S3

For more information, please refer to this README.

Jump to Source Code:

One of the great benefits of having both TensorBoard and the PyTorch Profiler being integrated directly in Visual Studio Code (VS Code) is the ability to directly jump to the source code (file and line) from the profiler stack traces. VS Code Python Extension now supports TensorBoard Integration.

Jump to source is ONLY available when TensorBoard is launched within VS Code. Stack tracing will appear on the plugin UI if the profiling was run with with_stack=True. When you click on a stack trace from the PyTorch Profiler, VS Code automatically opens the corresponding file side by side and jumps directly to the line of code of interest for you to debug. This allows you to quickly make actionable optimizations and changes to your code based on the profiling results and suggestions.

GIF: Jump to Source using the Visual Studio Code plugin UI

For how to optimize batch size performance, check out the step-by-step tutorial here. PyTorch Profiler is also integrated with PyTorch Lightning; you can simply launch your Lightning training jobs with the --trainer.profiler=pytorch flag to generate the traces. Check out an example here.

What’s Next for the PyTorch Profiler?

You just saw how PyTorch Profiler can help optimize a model. You can now try the Profiler by pip install torch-tb-profiler to optimize your PyTorch model.

Look out for an advanced version of this tutorial in the future. If you want tailored enterprise-grade support for this, check out PyTorch Enterprise on Azure. We are also thrilled to continue to bring state-of-the-art tools to PyTorch users to improve ML performance. We'd love to hear from you. Feel free to open an issue here.

For new and exciting features coming up with PyTorch Profiler, follow @PyTorch on Twitter and check us out on pytorch.org.

Read More

Simplify and automate anomaly detection in streaming data with Amazon Lookout for Metrics

Do you want to monitor your business metrics and detect anomalies in your existing streaming data pipelines? Amazon Lookout for Metrics is a service that uses machine learning (ML) to detect anomalies in your time series data. The service goes beyond simple anomaly detection. It allows developers to set up autonomous monitoring for important metrics to detect anomalies and identify their root cause in a matter of few clicks, using the same technology used by Amazon internally to detect anomalies in its metrics—all with no ML experience required. However, one limitation you may face if you have an existing Amazon Kinesis Data Streams data pipeline is not being able to directly run anomaly detection on your data streams using Lookout for Metrics. As of this writing, Lookout for Metrics doesn’t have native integration with Kinesis Data Streams to ingest streaming data and run anomaly detection on it.

In this post, we show you how to solve this problem by using an AWS Glue Spark streaming extract, transform, and load (ETL) script to ingest and organize streaming data in Amazon Simple Storage Service (Amazon S3) and using a Lookout for Metrics live detector to detect anomalies. If you have an existing Kinesis Data Streams pipeline that ingests ecommerce data, for example, you can use the solution to detect anomalies such as unexpected dips in revenue, high rates of abandoned shopping carts, increases in new user signups, and many more.

Included in this post is a sample streaming data generator to help you get started quickly. The included GitHub repo provides step-by-step deployment instructions, and uses the AWS Cloud Development Kit (AWS CDK) to simplify and automate the deployment.

Lookout for Metrics allows users to set up anomaly detectors in both continuous and backtest modes. Backtesting allows you to detect anomalies on historical data. This feature is helpful when you want to try out the service on past data or validate against known anomalies that occurred in the past. For this post, we use continuous mode, where you can detect anomalies on live data as they occur. In continuous mode, the detector monitors an input S3 bucket for continuous data and runs anomaly detection on new data at specified time intervals. For the live detector to consume continuous time series data from Amazon S3 correctly, it needs to know where to look for data for the current time interval; therefore, it requires continuous input data in S3 buckets organized by time interval.

Overview of solution

The solution architecture consists of the following components:

  • Streaming data generator – To help you get started quickly, we provide Python code that generates sample time series data and writes to a Kinesis data stream at a specified time interval. The provided code generates sample data for an ecommerce schema (platform, marketplace, event_time, views, revenue); a minimal sketch of such a generator follows this list. You can also use your own data stream and data, but you must update the downstream processes in the architecture to process your schema.
  • Kinesis Data Streams to Lookout for Metrics – The AWS Glue Spark streaming ETL code is the core component of the solution. It contains logic to do the following:
    • Ingest streaming data
    • Micro-batch data by time interval
    • Filter dimensions and metrics columns
    • Deliver filtered data to Amazon S3 grouped by timestamp
  • Lookout for Metrics continuous detector – The AWS Glue streaming ETL code writes time series data as CSV files to the S3 bucket, with objects organized by time interval. The Lookout for Metrics continuous detector monitors the S3 bucket for live data and runs anomaly detection at the specified time interval (for example, every 5 minutes). You can view the detected anomalies on the Lookout for Metrics dashboard.
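
As a rough illustration of what the streaming data generator does (the stream name, helper name, and field values below are placeholders, not the repo’s actual code), a sample record for the ecommerce schema could be written to the Kinesis data stream with boto3 as follows:

import json
import random
from datetime import datetime, timezone

import boto3

kinesis = boto3.client("kinesis")

def put_sample_record(stream_name="ecommerce-demo-stream"):
    # Build one sample record for the ecommerce schema used in this post
    record = {
        "platform": random.choice(["pc_web", "mobile_web", "mobile_app"]),
        "marketplace": random.choice(["us", "uk", "de", "fr", "jp"]),
        "event_time": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S"),
        "views": random.randint(0, 100),
        "revenue": round(random.uniform(0.0, 500.0), 2),
    }
    # Write the record to the stream as a JSON blob
    kinesis.put_record(
        StreamName=stream_name,
        Data=json.dumps(record),
        PartitionKey=record["marketplace"],
    )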

The following diagram illustrates the solution architecture.

AWS Glue Spark streaming ETL script

The main component of the solution is the AWS Glue serverless streaming ETL script. The script contains the logic to ingest the streaming data and write the output, grouped by time interval, to an S3 bucket. This makes it possible for Lookout for Metrics to use streaming data from Kinesis Data Streams to detect anomalies. In this section, we walk through the Spark streaming ETL script used by AWS Glue.

The AWS Glue Spark streaming ETL script performs the following steps:

  1. Read from the AWS Glue table that uses Kinesis Data Streams as the data source.

The following screenshot shows the AWS Glue table created for the ecommerce data schema.

  2. Ingest the streaming data from the AWS Glue table (table_name parameter) batched by time window (stream_batch_time parameter) and create a data frame for each micro-batch using create_data_frame.from_catalog(), as shown in the following code:
# Create a Spark data frame for each micro-batch read from the Glue catalog table backed by the Kinesis stream
data_frame_datasource0 = glueContext.create_data_frame.from_catalog(
    stream_batch_time=BATCH_WIN_SIZE, database=glue_dbname, table_name=glue_tablename,
    transformation_ctx="datasource0",
    additional_options={"startingPosition": "TRIM_HORIZON", "inferSchema": "false"})
  3. Perform the following processing steps for each batch of data (data frame) ingested:
    a. Select only the required columns and convert the data frame to the AWS Glue native DynamicFrame.
datasource0 = DynamicFrame.fromDF(data_frame, glueContext, "from_data_frame").select_fields(['marketplace','event_time', 'views'])

As shown in the preceding example code, we select only the columns marketplace, event_time, and views to write to the output CSV files in Amazon S3. Lookout for Metrics uses these columns for running anomaly detection. In this example, marketplace is the optional dimension column used for grouping anomalies, views is the metric to be monitored for anomalies, and event_time is the timestamp for the time series data.

    b. Populate the time interval in each streaming record ingested:
datasource1 = Map.apply(frame=datasource0, f=populateTimeInterval)

In the preceding code, we provide the custom function populateTimeInterval, which determines the appropriate time interval for the given data point based on its event_time timestamp column.
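
The function body isn’t reproduced in this post, but a minimal sketch of what it could look like follows, assuming a 5-minute frequency and the intvl_date/intvl_hhmm partition keys used in the write step below (the exact field formats are assumptions):

from datetime import datetime

# Hypothetical sketch of populateTimeInterval: floors each record's event_time
# to the start of its time interval and adds the partition-key fields
# (intvl_date, intvl_hhmm) used later when writing to Amazon S3.
FREQUENCY_MINS = 5  # assumed anomaly detection frequency

def populateTimeInterval(rec):
    event_time = datetime.strptime(rec["event_time"], "%Y-%m-%d %H:%M:%S")
    floored = (event_time.minute // FREQUENCY_MINS) * FREQUENCY_MINS
    interval_start = event_time.replace(minute=floored, second=0, microsecond=0)
    rec["intvl_date"] = interval_start.strftime("%Y-%m-%d")  # e.g. 2021-05-24
    rec["intvl_hhmm"] = interval_start.strftime("%H%M")      # e.g. 1915
    return rec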

The following table includes example time intervals determined by the function for a 5-minute frequency.

Input timestamp       | Start of time interval determined by populateTimeInterval
2021-05-24 19:18:28   | 2021-05-24 19:15
2021-05-24 19:21:15   | 2021-05-24 19:20

The following table includes example time intervals determined by the function for a 10-minute frequency.

Input timestamp       | Start of time interval determined by populateTimeInterval
2021-05-24 19:18:28   | 2021-05-24 19:10
2021-05-24 19:21:15   | 2021-05-24 19:20

    c. The write_dynamic_frame() function uses the time interval (as determined in the previous step) as the partition key to write output CSV files to the appropriate S3 prefix structure:
# Write each micro-batch to Amazon S3 as CSV, partitioned by the derived time-interval keys
datasink1 = glueContext.write_dynamic_frame.from_options(
    frame=datasource1, connection_type="s3",
    connection_options={"path": path_datasink1, "partitionKeys": ["intvl_date", "intvl_hhmm"]},
    format_options={"quoteChar": -1, "timestamp.formats": "yyyy-MM-dd HH:mm:ss"},
    format=src_format, transformation_ctx="datasink1")

For example, the following screenshot shows that the ETL script writes output to the S3 folder structure organized by 5-minute time intervals.
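
Based on those partition keys, the resulting Hive-style layout in Amazon S3 looks roughly like the following (the bucket name, prefix, and file names are illustrative placeholders):

s3://<output-bucket>/<prefix>/intvl_date=2021-05-24/intvl_hhmm=1915/<file>.csv
s3://<output-bucket>/<prefix>/intvl_date=2021-05-24/intvl_hhmm=1920/<file>.csv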

For additional details on partitions for ETL outputs, see Managing Partitions for ETL Output in AWS Glue.

You can set up a live detector using Amazon S3 as a continuous data source to start detecting anomalies in streaming data. For detailed instructions, see the GitHub repo.

Prerequisites

You need the following to deploy the solution:

  • An AWS account with permissions to deploy the solution using AWS CDK
  • A workstation or development environment with the following installed and configured:
    • npm
    • TypeScript
    • AWS CDK
    • AWS account credentials

You can find detailed instructions in the “Getting Started” section of the GitHub repo.

Deploy the solution

Follow the step-by-step instructions in the GitHub repo to deploy the solution components. AWS CDK templates are provided for each of the solution components, organized in their own directory structure within the GitHub repo. The templates deploy the following resources:

  • Data generator – The Lambda function, Amazon EventBridge rule, and Kinesis data stream
  • Connector for Lookout for Metrics – The AWS Glue Spark streaming ETL job and S3 bucket
  • Lookout for Metrics continuous detector – Our continuous detector

Clean up

To avoid incurring future charges, delete the resources by deleting the stacks deployed by the AWS CDK.

Conclusion

In this post, we showed how you can detect anomalies in streaming data sources using a Lookout for Metrics continuous detector. The solution used serverless streaming ETL with AWS Glue to prepare the data for Lookout for Metrics anomaly detection. The reference implementation used an ecommerce sample data schema (platform, marketplace, event_time, views, revenue) to demonstrate and test the solution.

You can easily extend the provided data generator code and ETL script to process your own schema and sample data. Additionally, you can adjust the solution parameters such as anomaly detection frequency to match your use case. With minor changes, you can replace the sample data generator with an existing Kinesis Data Streams streaming data source.

To learn more about Amazon Lookout for Metrics, see Introducing Amazon Lookout for Metrics: An anomaly detection service to proactively monitor the health of your business and the Lookout for Metrics Developer Guide. For additional information about streaming ETL jobs with AWS Glue, see Crafting serverless streaming ETL jobs with AWS Glue and Adding Streaming ETL Jobs in AWS Glue.


About the Author

Babu Srinivasan is a Sr. Solutions Architect at AWS, with over 24 years of experience in IT and the last 6 years focused on the AWS Cloud. He is passionate about AI/ML. Outside of work, he enjoys woodworking and entertains friends and family (sometimes strangers) with sleight of hand card magic.

Read More

Google at ACL 2021

Posted by Catherine Armato, Program Manager

This week, the 59th annual meeting of the Association for Computational Linguistics (ACL), a premier conference covering a broad spectrum of research areas that are concerned with computational approaches to natural language, is taking place online.

As a leader in natural language processing and understanding, and a Diamond Level sponsor of ACL 2021, Google will showcase the latest research in the field with over 35 publications, and the organization of and participation in a variety of workshops and tutorials.

If you’re registered for ACL 2021, we hope that you’ll visit the Google virtual booth in Gather Town to learn more about the projects and opportunities at Google that go into solving interesting problems for billions of people. You can also learn more about Google’s participation on the ACL 2021 Expo page, and see a full list of Google publications below (Google affiliations in bold).

Organizing Committee
Senior Area Chairs include: Dan Roth, Emily Pitler, Jimmy Lin, Ming-Wei Chang, Sebastian Ruder, Slav Petrov
Area Chairs include: Ankur P. Parikh, Artem Sokolov, Bhuwan Dhingra, Cicero Nogueira dos Santos, Colin Cherry, Dani Yogatama, David Mimno, Hideto Kazawa, Ian Tenney, Jasmijn Bastings, Jun Suzuki, Katja Filippova, Kyle Gorman, Lu Wang, Manaal Faruqui, Natalie Schluter, Peter Liu, Radu Soricut, Sebastian Gehrmann, Shashi Narayan, Tal Linzen, Vinodkumar Prabhakaran, Waleed Ammar

Publications
Parameter-Efficient Multi-task Fine-Tuning for Transformers via Shared Hypernetworks
Rabeeh Karimi Mahabadi*, Sebastian Ruder, Mostafa Dehghani, James Henderson

TicketTalk: Toward Human-Level Performance with End-to-End, Transaction-Based Dialog Systems
Bill Byrne, Karthik Krishnamoorthi, Saravanan Ganesh, Mihir Sanjay Kale

Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features
Hannah Rashkin, David Reitter, Gaurav Singh Tomar, Dipanjan Das

Compositional Generalization and Natural Language Variation: Can a Semantic Parsing Approach Handle Both?
Peter Shaw, Ming-Wei Chang, Panupong Pasupat, Kristina Toutanova

Exploiting Language Relatedness for Low Web-Resource Language Model Adaptation: An Indic Languages Study
Yash Khemchandani, Sarvesh Mehtani, Vaidehi Patil, Abhijeet Awasthi, Partha Talukdar, Sunita Sarawagi

Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models
Matthew Finlayson, Aaron Mueller, Sebastian Gehrmann, Stuart Shieber, Tal Linzen*, Yonatan Belinkov

Modeling Fine-Grained Entity Types with Box Embeddings
Yasumasa Onoe, Michael Boratko, Andrew McCallum, Greg Durrett

TextSETTR: Few-Shot Text Style Extraction and Tunable Targeted Restyling
Parker Riley*, Noah Constant, Mandy Guo, Girish Kumar*, David Uthus, Zarana Parekh

Which Linguist Invented the Lightbulb? Presupposition Verification for Question-Answering
Najoung Kim*, Ellie Pavlick, Burcu Karagol Ayan, Deepak Ramachandran

H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences
Zhenhai Zhu, Radu Soricut

Are Pretrained Convolutions Better than Pretrained Transformers?
Yi Tay, Mostafa Dehghani, Jai Gupta, Dara Bahri, Vamsi Aribandi, Zhen Qin, Donald Metzler

Benchmarking Scalable Methods for Streaming Cross Document Entity Coreference
Robert L Logan IV, Andrew McCallum, Sameer Singh, Dan Bikel

PhotoChat: A Human-Human Dialogue Dataset With Photo Sharing Behavior For Joint Image-Text Modeling
Xiaoxue Zang, Lijuan Liu, Maria Wang, Yang Song*, Hao Zhang, Jindong Chen

Focus Attention: Promoting Faithfulness and Diversity in Summarization
Rahul Aralikatte*, Shashi Narayan, Joshua Maynez, Sascha Rothe, Ryan McDonald*

A Cognitive Regularizer for Language Modeling
Jason Wei, Clara Meister, Ryan Cotterell

Language Model Augmented Relevance Score
Ruibo Liu, Jason Wei, Soroush Vosoughi

Cross-Replication Reliability – An Empirical Approach to Interpreting Inter-rater Reliability
Ka Wong, Praveen Paritosh, Lora Aroyo

TIMEDIAL: Temporal Commonsense Reasoning in Dialog
Lianhui Qin*, Aditya Gupta, Shyam Upadhyay, Luheng He, Yejin Choi, Manaal Faruqui

StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling
Yikang Shen*, Yi Tay, Che Zheng, Dara Bahri, Donald Metzler, Aaron Courville

MOLEMAN: Mention-Only Linking of Entities with a Mention Annotation Network
Nicholas FitzGerald, Jan A. Botha, Daniel Gillick, Daniel M. Bikel, Tom Kwiatkowski, Andrew McCallum

Neural Retrieval for Question Answering with Cross-Attention Supervised Data Augmentation
Yinfei Yang, Ning Jin, Kuo Lin, Mandy Guo, Daniel Cer

ROPE: Reading Order Equivariant Positional Encoding for Graph-Based Document Information Extraction
Chen-Yu Lee, Chun-Liang Li, Chu Wang*, Renshen Wang, Yasuhisa Fujii, Siyang Qin, Ashok Popat, Tomas Pfister

Measuring and Improving BERT’s Mathematical Abilities by Predicting the Order of Reasoning
Piotr Piekos, Henryk Michalewski, Mateusz Malinowski

Improving Compositional Generalization in Classification Tasks via Structure Annotations
Juyong Kim, Pradeep Ravikumar, Joshua Ainslie, Santiago Ontañón

A Simple Recipe for Multilingual Grammatical Error Correction
Sascha Rothe, Jonathan Mallinson, Eric Malmi, Sebastian Krause, Aliaksei Severyn

nmT5 – Is Parallel Data Still Relevant for Pre-training Massively Multilingual Language Models?
Mihir Kale, Aditya Siddhant, Noah Constant, Melvin Johnson, Rami Al-Rfou, Linting Xue

QA-Driven Zero-Shot Slot Filling with Weak Supervision Pretraining
Xinya Du*, Luheng He, Qi Li, Dian Yu*, Panupong Pasupat, Yuan Zhang

AgreeSum: Agreement-Oriented Multi-Document Summarization
Richard Yuanzhe Pang*, Adam D. Lelkes, Vinh Q. Tran, Cong Yu

Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering
Aditya Gupta, Jiacheng Xu*, Shyam Upadhyay, Diyi Yang, Manaal Faruqui

Training ELECTRA Augmented with Multi-word Selection
Jiaming Shen*, Jialu Liu, Tianqi Liu, Cong Yu, Jiawei Han

A Survey of Data Augmentation Approaches for NLP
Steven Y. Feng, Varun Gangal, Jason Wei, Sarath Chandar, Soroush Vosoughi, Teruko Mitamura, Eduard Hovy

RealFormer: Transformer Likes Residual Attention
Ruining He, Anirudh Ravula, Bhargav Kanagal, Joshua Ainslie

Scaling Within Document Coreference to Long Texts
Raghuveer Thirukovalluru, Nicholas Monath, Kumar Shridhar, Manzil Zaheer, Mrinmaya Sachan, Andrew McCallum

MergeDistill: Merging Language Models using Pre-trained Distillation
Simran Khanuja, Melvin Johnson, Partha Talukdar

DoT: An Efficient Double Transformer for NLP tasks with Tables
Syrine Krichene, Thomas Müller*, Julian Martin Eisenschlos

How Reliable are Model Diagnostics?
Vamsi Aribandi, Yi Tay, Donald Metzler

Workshops
Interactive Learning for Natural Language Processing
Organizers include: Filip Radlinski
Invited Panelist: Julia Kreutzer

6th Workshop on Representation Learning for NLP (RepL4NLP-2021)
Organizers include: Chris Dyer, Laura Rimell

Third Workshop on Gender Bias for Natural Language Processing
Organizers include: Kellie Webster

Benchmarking: Past, Present and Future
Invited Speaker: Eunsol Choi

SemEval-2021, 15th International Workshop on Semantic Evaluation
Organizers include: Natalie Schluter

Workshop on Online Abuse and Harms
Organizers include: Vinodkumar Prabhakaran

GEM: Natural Language Generation, Evaluation, and Metrics
Organizers include: Sebastian Gehrmann

Workshop on Natural Language Processing for Programming
Invited Speaker: Charles Sutton

WPT 2021: The 17th International Conference on Parsing Technologies
Organizers include: Weiwei Sun

Tutorial
Recognizing Multimodal Entailment
Instructors include: Cesar Ilharco, Vaiva Imbrasaite, Ricardo Marino, Jannis Bulian, Chen Sun, Afsaneh Shirazi, Lucas Smaira, Cordelia Schmid


*  Work conducted while at Google. 

Read More

Display systems research: Reverse passthrough VR

As AR and VR devices become a bigger part of how we work and play, how do we maintain seamless social connection between real and virtual worlds? In other words, how do we maintain “social co-presence” in shared spaces among people who may or may not be involved in the same AR/VR experience?

This year at SIGGRAPH, Facebook Reality Labs (FRL) Research will present a new concept for social co-presence with virtual reality headsets: reverse passthrough VR, led by research scientist Nathan Matsuda. Put simply, reverse passthrough is an experimental VR research demo that allows the eyes of someone wearing a headset to be seen by the outside world. This is in contrast to what Quest headsets can do today with Passthrough+ and the experimental Passthrough API, which use externally facing cameras to help users easily see their external surroundings while they’re wearing the headset.

Over the years, we’ve made strides in enabling Passthrough features for Oculus for consumers and developers to explore. In fact, the idea for this experimental reverse passthrough research occurred to Matsuda after he spent a day in the office wearing a Quest headset with Passthrough, thinking through how to make mixed reality environments more seamless for social and professional settings. Wearing the headset with Passthrough, he could see his colleagues and the room around him just fine. But his colleagues couldn’t see him without an external display. Every time he attempted to speak to someone, they remarked how strange it was that he wasn’t able to make eye contact. So Matsuda posed the question: What if you could see his eyes — would that add something to the social dynamic?

When Matsuda first demonstrated reverse passthrough for FRL Chief Scientist Michael Abrash in 2019, Abrash was unconvinced about the utility of this work. In the demo, Matsuda wore a custom-built Rift S headset with a 3D display mounted to the front. On the screen, a floating 3D image of Matsuda’s face, crudely rendered from a game engine, re-created his eye gaze using signals from a pair of eye-tracking cameras inside the headset.


Left: Research scientist Nathan Matsuda wears an early reverse passthrough prototype with 2D outward-facing displays. Right: The first fully functional reverse passthrough demo using 3D light field displays.

“My first reaction was that it was kind of a goofy idea, a novelty at best,” said Abrash. “But I don’t tell researchers what to do, because you don’t get innovation without freedom to try new things, and that’s a good thing, because now it’s clearly a unique idea with genuine promise.”

Nearly two years after the initial demo, the 3D display technology and research prototype have evolved significantly, featuring purpose-built optics, electronics, software, and a range of supporting technologies to capture and depict more realistic 3D faces. This progress is promising, but this research is clearly still experimental: Tethered by many cables, it’s far from a standalone headset, and the eye and facial renderings are not yet completely lifelike. However, it is a research prototype designed in the spirit of FRL Research’s core ethos to run with far-flung concepts that may seem a bit outlandish. While this work is nowhere near a product roadmap, it does offer a glimpse into how reverse passthrough could be used in collaborative spaces of the future — both real and virtual.

Left: A VR headset with the external display disabled, representing the current state of the art. No gaze cues are visible through the opaque headset enclosure. Middle: A VR headset with outward-facing 2D displays, as proposed in prior academic works[1][2][3][4]. Some gaze cues are visible, but the incorrect perspective limits the viewer’s ability to discern gaze direction. Right: Our recent prototype uses 3D reverse passthrough displays, showing correct perspective for multiple external viewers.

Reverse passthrough

The essential component in a reverse passthrough headset is the externally facing 3D display. You could simply put a 2D display on the front of the headset and show a flat projection of the user’s face on it, but the offset from the user’s actual face to the front of the headset makes for a visually jarring, unnatural effect that breaks any hope of reading correct eye contact. As the research prototype evolved, it was clear that a 3D display was a better direction, as it would allow the user’s eyes and face to appear at the correct position in space on the front of the headset. This depiction helps maintain alignment as external viewers move in relation to the 3D display.

There are several established ways to display 3D images. For this research, we used a microlens-array light field display because it’s thin, simple to construct, and based on existing consumer LCD technology. These displays use a tiny grid of lenses that send light from different LCD pixels out in different directions, with the effect that an observer sees a different image when looking at the display from different directions. The perspective of the images shift naturally so that any number of people in the room can look at the light field display and see the correct perspective for their location.

As with any early stage research prototype, this hardware still carries significant limitations: First, the viewing angle can’t be too severe, and second, the prototype can only show objects in sharp focus that are within a few centimeters of the physical screen surface. Conversations take place face-to-face, which naturally limits reverse passthrough viewing angles. And the wearer’s face is only a few centimeters from the physical screen surface, so the technology works well for this case — and will work even better if VR headsets continue to shrink in size, using methods such as holographic optics.

Building the research prototype

FRL researchers used a Rift S for early explorations of reverse passthrough. As the concept evolved, the team began iterating on Half Dome 2 to build the research prototype presented this year at SIGGRAPH. Stripping down the headset to the bare display pod, mechanical research engineer Joel Hegland provided a roughly 50-millimeter-thick VR headset to serve as a base for the latest reverse passthrough demo. Then, optical scientist Brian Wheelwright designed a microlens array to be fitted in front.

The resulting headset contains two display pods that are mirror images of each other. They contain an LCD panel and lens for the base VR display. A ring of infrared LEDs illuminates the part of the face covered by the pod. A mirror that is reflective only for infrared light sits between the lens and screen, so that a pair of infrared cameras can view the eye from nearly head-on. Doing all this in the invisible infrared band keeps the eye imaging system from distracting the user from the VR display itself. Then the front of the pod has another LCD with the microlens array.


Left: A cutaway view of one of the prototype display pods. Right: The prototype display pod with driver electronics, prior to installation in the full headset prototype.

Imaging eyes and faces in 3D

Producing the interleaved 3D images to show on the light field display presented a significant challenge in itself. For this research prototype, Matsuda and team opted to use a stereo camera pair to produce a surface model of the face, then projected the views of the eye onto that surface. While the resulting projected eyes and face are not lifelike, this is just a short-term solution to pave the way for future development.

FRL’s Codec Avatars research points toward the next generation of this imaging. Codec Avatars are realistic representations of the human face, expressions, voice, and body that, via deep learning, can be driven from a compact set of measurements taken inside a VR headset in real time. These virtual avatars should be much more effective for reverse passthrough, allowing for a unified system of facial representation that works whether the viewer is local or remote.

Shown below, a short video depicts a Codec Avatar from our Pittsburgh lab running on the prototype reverse passthrough headset. These images, and their motion over time, appear much more lifelike than those captured using the current stereo camera method, indicating the sort of improvements that such a system could provide while working in tandem with remote telepresence systems.

The reverse passthrough prototype displaying a high-fidelity Codec Avatar facial reconstruction.

A path toward social co-presence in VR

Totally immersive VR and AR glasses with a display are fundamentally different technologies that will likely end up serving different users in different scenarios in the long term. There will be situations where people will need the true transparent optics of AR glasses, and others where people will prefer the image quality and immersion of VR. Facebook Reality Labs Research, under Michael Abrash’s direction, has cast a wide net when probing new technical concepts in order to move the ball forward across both of these display architectures. Fully exploring this space will ensure that the lab has a grasp on the full range of possibilities — and limitations — for future AR/VR devices, and eventually put those findings into practice in a way that supports human-computer interaction for the most people in the most places.

Reverse passthrough is representative of this sort of work — an example of how ideas from around the lab are pushing the utility of VR headsets forward. Later this year, we’ll give a more holistic update on our display systems research and show how all this work — from varifocal, holographic optics, eye tracking, and distortion correction to reverse passthrough — is coming together to help us pass what we call the Visual Turing Test in VR.

Ultimately, these innovations and more will come together to create VR headsets that are compact, light, and all-day wearable; that mix high-quality virtual images with high-quality real-world images; and that let you be socially present with anyone in the world, whether they’re on the other side of the planet or standing next to you. Making that happen is our goal at Facebook Reality Labs Research.


[1] Liwei Chan and Kouta Minamizawa. 2017. FrontFace: Facilitating Communication between HMD Users and Outsiders Using Front-Facing-Screen HMDs

[2] Kana Misawa and Jun Rekimoto. 2015. ChameleonMask: Embodied Physical and Social Telepresence using Human Surrogates

[3] Christian Mai, Lukas Rambold, and Mohamed Khamis. 2017. TransparentHMD: Revealing the HMD User’s Face to Bystanders

[4] Jan Gugenheimer, Christian Mai, Mark McGill, Julie Williamson, Frank Steinicke, and Ken Perlin. 2019. Challenges Using Head-Mounted Displays in Shared and Social Spaces


Read More