Unlocking efficiency: Harnessing the power of Selective Execution in Amazon SageMaker Pipelines

MLOps is a key discipline that often oversees the path to productionizing machine learning (ML) models. It’s natural to focus on a single model that you want to train and deploy. However, in reality, you’ll likely work with dozens or even hundreds of models, and the process may involve multiple complex steps. Therefore, it’s important to have the infrastructure in place to track, train, deploy, and monitor models with varying complexities at scale. This is where MLOps tooling comes in. MLOps tooling helps you repeatably and reliably build and simplify these processes into a workflow that is tailored for ML.

Amazon SageMaker Pipelines, a feature of Amazon SageMaker, is a purpose-built workflow orchestration service for ML that helps you automate end-to-end ML workflows at scale. It simplifies the development and maintenance of ML models by providing a centralized platform to orchestrate tasks such as data preparation, model training, tuning and validation. SageMaker Pipelines can help you streamline workflow management, accelerate experimentation and retrain models more easily.

In this post, we spotlight an exciting new feature of SageMaker Pipelines known as Selective Execution. This new feature empowers you to selectively run specific portions of your ML workflow, resulting in significant time and compute resource savings by limiting the run to pipeline steps in scope and eliminating the need to run steps out of scope. Furthermore, we explore various use cases where the advantages of utilizing Selective Execution become evident, further solidifying its value proposition.

Solution overview

SageMaker Pipelines continues to innovate its developer experience with the release of Selective Execution. ML builders now have the ability to choose specific steps to run within a pipeline, eliminating the need to rerun the entire pipeline. This feature enables you to rerun specific sections of the pipeline while modifying the runtime parameters associated with the selected steps.

It’s important to note that the selected steps may rely on the results of non-selected steps. In such cases, the outputs of these non-selected steps are reused from a reference run of the current pipeline version. This means that the reference run must have already completed. The default reference run is the latest run of the current pipeline version, but you can also choose to use a different run of the current pipeline version as a reference.

The overall state of the reference run must be Succeeded, Failed, or Stopped. It cannot be Running when Selective Execution attempts to use its outputs. When using Selective Execution, you can choose any number of steps to run, as long as they form a contiguous portion of the pipeline.

The following diagram illustrates the pipeline behavior with a full run.

The following diagram illustrates the pipeline behavior using Selective Execution.

In the following sections, we show how to use Selective Execution for various scenarios, including complex workflows in pipeline directed acyclic graphs (DAGs).

Prerequisites

To start experimenting with Selective Execution, you first need to set up the following components of your SageMaker environment:

  • SageMaker Python SDK – Ensure that you have an updated SageMaker Python SDK installed in your Python environment. You can run the following command from your notebook or terminal to install or upgrade the SageMaker Python SDK to version 2.162.0 or higher: python3 -m pip install "sagemaker>=2.162.0" or pip3 install "sagemaker>=2.162.0" (the quotes prevent your shell from interpreting >= as a redirect).
  • Access to SageMaker Studio (optional) – Amazon SageMaker Studio can be helpful for visualizing pipeline runs and interacting with preexisting pipeline ARNs visually. If you don’t have access to SageMaker Studio or are using on-demand notebooks or other IDEs, you can still follow this post and interact with your pipeline ARNs using the Python SDK.
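
You can confirm that your environment picked up a recent enough SDK by printing the installed version (Selective Execution requires 2.162.0 or later):

import sagemaker

# Selective Execution requires SageMaker Python SDK 2.162.0 or later
print(sagemaker.__version__)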

The sample code for a full end-to-end walkthrough is available in the GitHub repo.

Setup

With the sagemaker>=2.162.0 Python SDK, we introduced the SelectiveExecutionConfig class as part of the sagemaker.workflow.selective_execution_config module. The Selective Execution feature relies on a previous pipeline execution ARN whose status is Succeeded, Failed, or Stopped. The following code snippet demonstrates how to import the SelectiveExecutionConfig class, retrieve the reference pipeline ARN, and gather the associated pipeline steps and runtime parameters governing the pipeline run:

import boto3
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.selective_execution_config import SelectiveExecutionConfig


sm_client = boto3.client('sagemaker')
# reference the name of your sample pipeline 
pipeline_name = "AbalonePipeline"
# filter for previous success pipeline execution arns
pipeline_executions = [_exec
    for _exec in Pipeline(name=pipeline_name).list_executions()['PipelineExecutionSummaries'] 
    if _exec['PipelineExecutionStatus'] == "Succeeded"
]
# get the last successful execution
latest_pipeline_arn = pipeline_executions[0]['PipelineExecutionArn']
print(latest_pipeline_arn)
>>> arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/x62pbar3gs6h

# list all steps of your sample pipeline
execution_steps = sm_client.list_pipeline_execution_steps(
    PipelineExecutionArn=latest_pipeline_arn
)['PipelineExecutionSteps']
print(execution_steps)
>>> 
[{'StepName': 'Abalone-Preprocess',
  'StartTime': datetime.datetime(2023, 6, 27, 4, 41, 30, 519000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2023, 6, 27, 4, 41, 30, 986000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'AttemptCount': 0,
  'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-east-1:123123123123:processing-job/pipelines-fvsmu7m7ki3q-Abalone-Preprocess-d68CecvHLU'}},
  'SelectiveExecutionResult': {'SourcePipelineExecutionArn': 'arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/ksm2mjwut6oz'}},
 {'StepName': 'Abalone-Train',
  'StartTime': datetime.datetime(2023, 6, 27, 4, 41, 31, 320000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2023, 6, 27, 4, 43, 58, 224000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'AttemptCount': 0,
  'Metadata': {'TrainingJob': {'Arn': 'arn:aws:sagemaker:us-east-1:123123123123:training-job/pipelines-x62pbar3gs6h-Abalone-Train-PKhAc1Q6lx'}}},
 {'StepName': 'Abalone-Evaluate',
  'StartTime': datetime.datetime(2023, 6, 27, 4, 43, 59, 40000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2023, 6, 27, 4, 57, 43, 76000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'AttemptCount': 0,
  'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-east-1:123123123123:processing-job/pipelines-x62pbar3gs6h-Abalone-Evaluate-vmkZDKDwhk'}}},
 {'StepName': 'Abalone-MSECheck',
  'StartTime': datetime.datetime(2023, 6, 27, 4, 57, 43, 821000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2023, 6, 27, 4, 57, 44, 124000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'AttemptCount': 0,
  'Metadata': {'Condition': {'Outcome': 'True'}}}]

# list all configurable pipeline parameters 
# params can be altered during selective execution
parameters = sm_client.list_pipeline_parameters_for_execution(
    PipelineExecutionArn=latest_pipeline_arn
)['PipelineParameters']
print(parameters)
>>> 
[{'Name': 'XGBNumRounds', 'Value': '120'},
 {'Name': 'XGBSubSample', 'Value': '0.9'},
 {'Name': 'XGBGamma', 'Value': '2'},
 {'Name': 'TrainingInstanceCount', 'Value': '1'},
 {'Name': 'XGBMinChildWeight', 'Value': '4'},
 {'Name': 'XGBETA', 'Value': '0.25'},
 {'Name': 'ApprovalStatus', 'Value': 'PendingManualApproval'},
 {'Name': 'ProcessingInstanceCount', 'Value': '1'},
 {'Name': 'ProcessingInstanceType', 'Value': 'ml.t3.medium'},
 {'Name': 'MseThreshold', 'Value': '6'},
 {'Name': 'ModelPath',
  'Value': 's3://sagemaker-us-east-1-123123123123/Abalone/models/'},
 {'Name': 'XGBMaxDepth', 'Value': '12'},
 {'Name': 'TrainingInstanceType', 'Value': 'ml.c5.xlarge'},
 {'Name': 'InputData',
  'Value': 's3://sagemaker-us-east-1-123123123123/sample-dataset/abalone/abalone.csv'}]

Use cases

In this section, we present a few scenarios where Selective Execution can potentially save time and resources. We use a typical pipeline flow, which includes steps such as data extraction, training, evaluation, model registration and deployment, as a reference to demonstrate the advantages of Selective Execution.

SageMaker Pipelines allows you to define runtime parameters for your pipeline run using pipeline parameters. When a new run is triggered, it typically runs the entire pipeline from start to finish. However, if step caching is enabled, SageMaker Pipelines will attempt to find a previous run of the current pipeline step with the same attribute values. If a match is found, SageMaker Pipelines will use the outputs from the previous run instead of recomputing the step. Note that even with step caching enabled, SageMaker Pipelines will still run the entire workflow to the end by default.
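
For comparison, step caching is opted into per step through a cache configuration. The following is a minimal sketch that assumes a hypothetical training step and estimator rather than the full Abalone pipeline definition:

from sagemaker.workflow.steps import CacheConfig, TrainingStep

# cache step results for 30 days (expire_after uses an ISO 8601 duration)
cache_config = CacheConfig(enable_caching=True, expire_after="P30D")

train_step = TrainingStep(
    name="Abalone-Train",
    estimator=xgb_estimator,   # hypothetical estimator defined elsewhere
    inputs=train_inputs,       # hypothetical training inputs
    cache_config=cache_config,
)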

With the release of the Selective Execution feature, you can now rerun an entire pipeline workflow or selectively run a subset of steps using a prior pipeline ARN. This can be done even without step caching enabled. The following use cases illustrate the various ways you can use Selective Execution.

Use case 1: Run a single step

Data scientists often focus on the training stage of an MLOps pipeline and don’t want to worry about the preprocessing or deployment steps. Selective Execution allows data scientists to focus on just the training step and modify training parameters or hyperparameters on the fly to improve the model. This can save time and reduce cost because compute resources are only utilized for running user-selected pipeline steps. See the following code:

# select a reference pipeline arn and subset step to execute
selective_execution_config = SelectiveExecutionConfig(
    source_pipeline_execution_arn="arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/9e3ljoql7s0n",
    selected_steps=["Abalone-Train"]
)

# start execution of pipeline subset
select_execution = pipeline.start(
    selective_execution_config=selective_execution_config,
    parameters={
        "XGBNumRounds": 120,
        "XGBSubSample": 0.9,
        "XGBGamma": 2,
        "XGBMinChildWeight": 4,
        "XGBETA": 0.25,
        "XGBMaxDepth": 12
    }
)

The following figures illustrate the pipeline with one step in process and then complete.
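
After the selective run starts, you can monitor it from the execution handle returned by pipeline.start. The following is a small sketch; wait blocks until the run reaches a terminal state:

# block until the selective run finishes, then inspect its steps
select_execution.wait()
print(select_execution.list_steps())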

Use case 2: Run multiple contiguous pipeline steps

Continuing with the previous use case, a data scientist wants to train a new model and evaluate its performance against a golden test dataset. This evaluation is crucial to ensure that the model meets rigorous guidelines for user acceptance testing (UAT) or production deployment. However, the data scientist doesn’t want to run the entire pipeline workflow or deploy the model. They can use Selective Execution to focus solely on the training and evaluation steps, saving time and resources while still getting the validation results they need:

# select a reference pipeline arn and subset step to execute
selective_execution_config = SelectiveExecutionConfig(
    source_pipeline_execution_arn="arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/9e3ljoql7s0n",
    selected_steps=["Abalone-Train", "Abalone-Evaluate"]
)

# start execution of pipeline subset
select_execution = pipeline.start(
    selective_execution_config=selective_execution_config,
    parameters={
        "ProcessingInstanceType": "ml.t3.medium",
        "XGBNumRounds": 120,
        "XGBSubSample": 0.9,
        "XGBGamma": 2,
        "XGBMinChildWeight": 4,
        "XGBETA": 0.25,
        "XGBMaxDepth": 12
    }
)
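
Because the Abalone-Preprocess step is not selected, its output is reused from the reference run. One way to confirm this (a small sketch reusing the sm_client from the setup section) is to list the steps of the new execution and look for the SelectiveExecutionResult metadata shown earlier:

# check which steps were reused from the reference run
execution_arn = select_execution.describe()['PipelineExecutionArn']
steps = sm_client.list_pipeline_execution_steps(
    PipelineExecutionArn=execution_arn
)['PipelineExecutionSteps']

for step in steps:
    reused_from = step.get('SelectiveExecutionResult', {}).get('SourcePipelineExecutionArn')
    if reused_from:
        print(f"{step['StepName']}: output reused from {reused_from}")
    else:
        print(f"{step['StepName']}: ran in this execution")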

Use case 3: Update and rerun failed pipeline steps

You can use Selective Execution to rerun failed steps within a pipeline or resume the run of a pipeline from a failed step onwards. This can be useful for troubleshooting and debugging failed steps because it allows developers to focus on the specific issues that need to be addressed. This can lead to more efficient problem-solving and faster iteration times. The following example illustrates how you can choose to rerun just the failed step of a pipeline.

# select a previously failed pipeline arn
selective_execution_config = SelectiveExecutionConfig(
    source_pipeline_execution_arn="arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/fvsmu7m7ki3q",
    selected_steps=["Abalone-Evaluate"]
)

# start execution of failed pipeline subset
select_execution = pipeline.start(
    selective_execution_config=selective_execution_config
)

Alternatively, a data scientist can resume a pipeline from a failed step to the end of the workflow by specifying the failed step and all the steps that follow it in the SelectiveExecutionConfig.
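
For the sample pipeline, that configuration might look like the following sketch (step names are taken from the earlier step listing; include every downstream step in your own DAG):

# resume from the failed evaluation step through the end of the workflow
selective_execution_config = SelectiveExecutionConfig(
    source_pipeline_execution_arn="arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/fvsmu7m7ki3q",
    selected_steps=["Abalone-Evaluate", "Abalone-MSECheck"]
)

# start execution of the resumed pipeline subset
select_execution = pipeline.start(
    selective_execution_config=selective_execution_config
)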

Use case 4: Pipeline coverage

In some pipelines, certain branches are less frequently run than others. For example, there might be a branch that only runs when a specific condition fails. It’s important to test these branches thoroughly to ensure that they work as expected when a failure does occur. By testing these less frequently run branches, developers can verify that their pipeline is robust and that error-handling mechanisms effectively maintain the desired workflow and produce reliable results. The following configuration selects the training, evaluation, MSE check, and failure notification steps so that this conditional branch can be exercised:

selective_execution_config = SelectiveExecutionConfig(
    source_pipeline_execution_arn="arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/9e3ljoql7s0n",
    selected_steps=["Abalone-Train", "Abalone-Evaluate", "Abalone-MSECheck", "Abalone-FailNotify"]
)
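
As in the previous use cases, the selected branch is then started against the same reference execution:

# start execution of the selected branch
select_execution = pipeline.start(
    selective_execution_config=selective_execution_config
)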

Conclusion

In this post, we discussed the Selective Execution feature of SageMaker Pipelines, which empowers you to selectively run specific steps of your ML workflows. This capability leads to significant time and computational resource savings. We provided some sample code in the GitHub repo that demonstrates how to use Selective Execution and presented various scenarios where it can be advantageous for users. If you would like to learn more about Selective Execution, refer to our Developer Guide and API Reference Guide.

To explore the available steps within the SageMaker Pipelines workflow in more detail, refer to Amazon SageMaker Model Building Pipeline and SageMaker Workflows. Additionally, you can find more examples showcasing different use cases and implementation approaches using SageMaker Pipelines in the AWS SageMaker Examples GitHub repository. These resources can further enhance your understanding and help you take advantage of the full potential of SageMaker Pipelines and Selective Execution in your current and future ML projects.


About the Authors

Pranav Murthy is an AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes. In his free time, he enjoys playing chess and traveling.

Akhil Numarsu is a Sr.Product Manager-Technical focused on helping teams accelerate ML outcomes through efficient tools and services in the cloud. He enjoys playing Table Tennis and is a sports fan.

Nishant Krishnamoorthy is a Sr. Software Development Engineer with Amazon Stores. He holds a masters degree in Computer Science and currently focuses on accelerating ML Adoption in different orgs within Amazon by building and operationalizing ML solutions on SageMaker.


Train self-supervised vision transformers on overhead imagery with Amazon SageMaker

This is a guest blog post co-written with Ben Veasey, Jeremy Anderson, Jordan Knight, and June Li from Travelers.

Satellite and aerial images provide insight into a wide range of problems, including precision agriculture, insurance risk assessment, urban development, and disaster response. Training machine learning (ML) models to interpret this data, however, is bottlenecked by costly and time-consuming human annotation efforts. One way to overcome this challenge is through self-supervised learning (SSL). By training on large amounts of unlabeled image data, self-supervised models learn image representations that can be transferred to downstream tasks, such as image classification or segmentation. This approach produces image representations that generalize well to unseen data and reduces the amount of labeled data required to build performant downstream models.

In this post, we demonstrate how to train self-supervised vision transformers on overhead imagery using Amazon SageMaker. Travelers collaborated with the Amazon Machine Learning Solutions Lab (now known as the Generative AI Innovation Center) to develop this framework to support and enhance aerial imagery model use cases. Our solution is based on the DINO algorithm and uses the SageMaker distributed data parallel library (SMDDP) to split the data over multiple GPU instances. When pre-training is complete, the DINO image representations can be transferred to a variety of downstream tasks. This initiative led to improved model performance within the Travelers Data & Analytics space.

Overview of solution

The two-step process for pre-training vision transformers and transferring them to supervised downstream tasks is shown in the following diagram.

In the following sections, we provide a walkthrough of the solution using satellite images from the BigEarthNet-S2 dataset. We build on the code provided in the DINO repository.

Prerequisites

Before getting started, you need access to a SageMaker notebook instance and an Amazon Simple Storage Service (Amazon S3) bucket.

Prepare the BigEarthNet-S2 dataset

BigEarthNet-S2 is a benchmark archive that contains 590,325 multispectral images collected by the Sentinel-2 satellite. The images document the land cover, or physical surface features, of ten European countries between June 2017 and May 2018. The types of land cover in each image, such as pastures or forests, are annotated according to 19 labels. The following are a few example RGB images and their labels.

The first step in our workflow is to prepare the BigEarthNet-S2 dataset for DINO training and evaluation. We start by downloading the dataset from the terminal of our SageMaker notebook instance:

wget https://bigearth.net/downloads/BigEarthNet-S2-v1.0.tar.gz
tar -xvf BigEarthNet-S2-v1.0.tar.gz

The dataset has a size of about 109 GB. Each image is stored in its own folder and contains 12 spectral channels. Three bands with 60m spatial resolution (60-meter pixel height/width) are designed to identify aerosols (B01), water vapor (B09), and clouds (B10). Six bands with 20m spatial resolution are used to identify vegetation (B05, B06, B07, B8A) and distinguish between snow, ice, and clouds (B11, B12). Four bands with 10m spatial resolution capture visible and near-infrared light (B02, B03, B04, B08). Additionally, each folder contains a JSON file with the image metadata. A detailed description of the data is provided in the BigEarthNet Guide.

To perform statistical analyses of the data and load images during DINO training, we process the individual metadata files into a common geopandas Parquet file. This can be done using the BigEarthNet Common and the BigEarthNet GDF Builder helper packages:

python -m bigearthnet_gdf_builder.builder build-recommended-s2-parquet BigEarthNet-v1.0/

The resulting metadata file contains the recommended image set, which excludes 71,042 images that are fully covered by seasonal snow, clouds, and cloud shadows. It also contains information on the acquisition date, location, land cover, and train, validation, and test split for each image.
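
Before training, you can get a quick sense of the splits and label distribution by inspecting the Parquet file with pandas. This is a small sketch; the column names match those used by the dataset class later in this post:

import pandas as pd

metadata = pd.read_parquet("final_ben_s2.parquet")

# number of images per train/validation/test split in the recommended set
print(metadata["original_split"].value_counts())

# most common land cover labels
print(metadata["new_labels"].explode().value_counts().head(10))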

We store the BigEarthNet-S2 images and metadata file in an S3 bucket. Because we use true color images during DINO training, we only upload the red (B04), green (B03), and blue (B02) bands:

aws s3 cp final_ben_s2.parquet s3://bigearthnet-s2-dataset/metadata/
aws s3 cp BigEarthNet-v1.0/ s3://bigearthnet-s2-dataset/data_rgb/ \
    --recursive \
    --exclude "*" \
    --include "*_B02.tif" \
    --include "*_B03.tif" \
    --include "*_B04.tif"

The dataset is approximately 48 GB in size and has the following structure:

bigearthnet-s2-dataset/                                    Amazon S3 bucket
├── metadata/
│ └── final_ben_s2.parquet 
└── data_rgb/
  ├── S2A_MSIL2A_20170613T101031_0_45/
  │ └── S2A_MSIL2A_20170613T101031_0_45_B02.tif            Blue channel
  │ └── S2A_MSIL2A_20170613T101031_0_45_B03.tif            Green channel
  │ └── S2A_MSIL2A_20170613T101031_0_45_B04.tif            Red channel

Train DINO models with SageMaker

Now that our dataset has been uploaded to Amazon S3, we can move on to training DINO models on BigEarthNet-S2. As shown in the following figure, the DINO algorithm passes different global and local crops of an input image to student and teacher networks. The student network is taught to match the output of the teacher network by minimizing the cross-entropy loss. The student and teacher weights are connected by an exponential moving average (EMA).

We make two modifications to the original DINO code. First, we create a custom PyTorch dataset class to load the BigEarthNet-S2 images. The code was initially written to process ImageNet data and expects images to be stored by class. BigEarthNet-S2, however, is a multi-label dataset where each image resides in its own subfolder. Our dataset class loads each image using the file path stored in the metadata:

import os

import numpy as np
import pandas as pd
import rasterio
from PIL import Image
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils
 
OPTICAL_MAX_VALUE = 2000

LAND_COVER_LABELS = [
    "Urban fabric",
    "Industrial or commercial units",
    "Arable land",
    "Permanent crops",
    "Pastures",
    "Complex cultivation patterns",
    "Land principally occupied by agriculture, with significant areas of natural vegetation",
    "Agro-forestry areas",
    "Broad-leaved forest",
    "Coniferous forest",
    "Mixed forest",
    "Natural grassland and sparsely vegetated areas",
    "Moors, heathland and sclerophyllous vegetation",
    "Transitional woodland, shrub",
    "Beaches, dunes, sands",
    "Inland wetlands",
    "Coastal wetlands",
    "Inland waters",
    "Marine waters",
]
 
class BigEarthNetDataset(Dataset):
    """
    PyTorch dataset class that loads the BigEarthNet-S2 images from a metadata file.

    Args:
        metadata_file: path to metadata file
        data_dir: directory where BigEarthNet-S2 data is located
        split: train, validation, or test split
        transform: transformations applied to the input image
    """
    def __init__(self, metadata_file, data_dir, split="train", transform=None):
        # image file paths from metadata
        metadata = pd.read_parquet(metadata_file)
        self.metadata_split = metadata[metadata["original_split"] == split]
        self.data_dir = data_dir
        self.patch_names = self.metadata_split["name"].tolist()

        # one-hot-encode land cover labels
        multiclass_labels = self.metadata_split.new_labels.tolist()
        self.labels = self.get_multi_onehot_labels(multiclass_labels)

        # transforms
        self.transform = transform

    def __len__(self):
        """Return length of dataset."""
        return len(self.metadata_split)

    def __getitem__(self, index):
        """Returns the image and label for a given index."""
        patch_name = self.patch_names[index]
        file_path = os.path.join(self.data_dir, patch_name)

        # generate RGB image
        r_channel = rasterio.open(os.path.join(file_path, patch_name + "_B04.tif")).read(1)
        g_channel = rasterio.open(os.path.join(file_path, patch_name + "_B03.tif")).read(1)
        b_channel = rasterio.open(os.path.join(file_path, patch_name + "_B02.tif")).read(1)

        image = np.stack([r_channel, g_channel, b_channel], axis=2)
        image = image / OPTICAL_MAX_VALUE * 255
        image = np.clip(image, 0, 255).astype(np.uint8)

        # apply image transforms
        image = Image.fromarray(image, mode="RGB")
        if self.transform is not None:
            image = self.transform(image)

        # load label
        label = self.labels[index]

        return image, label

    def get_multi_onehot_labels(self, multiclass_labels):
        """Convert BEN-19 labels to one-hot encoded vector."""
        targets = torch.zeros([len(multiclass_labels), len(LAND_COVER_LABELS)])
        for index, img_labels in enumerate(multiclass_labels):
            for label in img_labels:
                index_hot = LAND_COVER_LABELS.index(label)
                targets[index, index_hot] = 1.
        return targets

This dataset class is called in main_dino.py during training. Although the code includes a function to one-hot encode the land cover labels, these labels are not used by the DINO algorithm.
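
As a quick sanity check, the dataset class can be exercised with a standard PyTorch DataLoader. The following sketch assumes the metadata file and RGB images have been downloaded locally to the paths shown earlier; the transform mirrors a typical ImageNet-style preprocessing rather than the exact DINO augmentations:

from torch.utils.data import DataLoader
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

dataset = BigEarthNetDataset(
    metadata_file="final_ben_s2.parquet",   # local copy of the metadata file
    data_dir="BigEarthNet-v1.0",            # local copy of the RGB images
    split="train",
    transform=transform,
)
loader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=4)

images, labels = next(iter(loader))
print(images.shape, labels.shape)  # e.g. torch.Size([8, 3, 224, 224]) torch.Size([8, 19])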

The second change we make to the DINO code is to add support for SMDDP. We add the following code to the init_distributed_mode function in the util.py file:

def init_distributed_mode(args):
    # json, os, and torch.distributed (as dist) are assumed to be imported at the top of util.py
    if json.loads(
        os.environ.get('SM_FRAMEWORK_PARAMS', '{}')
    ).get('sagemaker_distributed_dataparallel_enabled', False):
        # launch training with SMDDP
        dist.init_process_group(backend='smddp')
        args.world_size = dist.get_world_size()
        args.gpu = int(os.environ['LOCAL_RANK'])

With these adjustments, we are ready to train DINO models on BigEarthNet-S2 using SageMaker. To train on multiple GPUs or instances, we create a SageMaker PyTorch Estimator that ingests the DINO training script, the image and metadata file paths, and the training hyperparameters:

import time
from sagemaker.pytorch import PyTorch

# output bucket where final model artifacts are uploaded 
DINO_OUTPUT_BUCKET = 'dino-models'

# paths on training instance  
sm_metadata_path = '/opt/ml/input/data/metadata'              
sm_data_path = '/opt/ml/input/data/train'                     
sm_output_path = '/opt/ml/output/data'                        
sm_checkpoint_path = '/opt/ml/checkpoints'                

# training job name
dino_base_job_name = f'dino-model-{int(time.time())}'

# create SageMaker Estimator
estimator = PyTorch(
    base_job_name=dino_base_job_name,
    source_dir='path/to/aerial_featurizer',
    entry_point='main_dino.py',
    role=role,
    framework_version="1.12",
    py_version="py38",
    instance_count=1,
    instance_type="ml.p3.16xlarge",    
    distribution = {'smdistributed':{'dataparallel':{'enabled': True}}},        
    volume_size=100,
    sagemaker_session=sagemaker_session,
    hyperparameters = {
        # hyperparameters passed to entry point script
        'arch': 'vit_small',
        'patch_size': 16,
        'metadata_dir': sm_metadata_path,
        'data_dir': sm_data_path,
        'output_dir': sm_output_path,
        'checkpoint_dir': sm_checkpoint_path,
        'epochs': 100,
        'saveckp_freq': 20,
    },
    max_run=24*60*60,               
    checkpoint_local_path = sm_checkpoint_path,
    checkpoint_s3_uri = f's3://{DINO_OUTPUT_BUCKET}/checkpoints/{dino_base_job_name}',
    debugger_hook_config=False,                           
)

This code specifies that we will train a small vision transformer model (21 million parameters) with a patch size of 16 for 100 epochs. It is best practice to create a new checkpoint_s3_uri for each training job in order to reduce the initial data download time. Because we are using SMDDP, we must train on an ml.p3.16xlarge, ml.p3dn.24xlarge, or ml.p4d.24xlarge instance. This is because SMDDP is only enabled for the largest multi-GPU instances. To train on smaller instance types without SMDDP, you will need to remove the distribution and debugger_hook_config arguments from the estimator.

After we have created the SageMaker PyTorch Estimator, we launch the training job by calling the fit method. We specify the input training data using the Amazon S3 URIs for the BigEarthNet-S2 metadata and images:

# call fit to begin training
estimator.fit(
    inputs={
        'metadata': 's3://bigearthnet-s2-dataset/metadata/',
        'train': 's3://bigearthnet-s2-dataset/data_rgb/',
    },
    wait=False
)

SageMaker spins up the instance, copies the training script and dependencies, and begins DINO training. We can monitor the progress of the training job from our Jupyter notebook using the following commands:

# monitor training
training_job_name = estimator.latest_training_job.name 
attached_estimator = PyTorch.attach(training_job_name)
attached_estimator.logs()

We can also monitor instance metrics and view log files on the SageMaker console under Training jobs. In the following figures, we plot the GPU utilization and loss function for a DINO model trained on an ml.p3.16xlarge instance with a batch size of 128.

During training, the GPU utilization is 83% of the ml.p3.16xlarge capacity (8 NVIDIA Tesla V100 GPUs) and the VRAM usage is 85%. The loss function steadily decreases with each epoch, indicating that the outputs of the student and teacher networks are becoming more similar. In total, training takes about 11 hours.

Transfer learning to downstream tasks

Our trained DINO model can be transferred to downstream tasks like image classification or segmentation. In this section, we use the pre-trained DINO features to predict the land cover classes for images in the BigEarthNet-S2 dataset. As depicted in the following diagram, we train a multi-label linear classifier on top of frozen DINO features. In this example, the input image is associated with arable land and pasture land covers.

Most of the code for the linear classifier is already in place in the original DINO repository. We make a few adjustments for our specific task. As before, we use the custom BigEarthNet dataset to load images during training and evaluation. The labels for the images are one-hot encoded as 19-dimensional binary vectors. We use the binary cross-entropy for the loss function and compute the average precision to evaluate the performance of the model.
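
The following is a minimal sketch of the multi-label loss and metric described above; the function name is illustrative and the actual changes to eval_linear.py may differ:

import torch
import torch.nn as nn
from sklearn.metrics import average_precision_score

# multi-label binary cross-entropy over the 19 land cover classes
criterion = nn.BCEWithLogitsLoss()

def evaluate_batch(logits, targets):
    """Compute the loss and macro-averaged precision for one batch."""
    loss = criterion(logits, targets.float())
    ap = average_precision_score(
        targets.cpu().numpy(),
        torch.sigmoid(logits).detach().cpu().numpy(),
        average="macro",
    )
    return loss, ap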

To train the classifier, we create a SageMaker PyTorch Estimator that runs the training script, eval_linear.py. The training hyperparameters include the details of the DINO model architecture and the file path for the model checkpoint:

# output bucket where final model artifacts are uploaded 
CLASSIFIER_OUTPUT_BUCKET = 'land-cover-classification'

# DINO checkpoint name 
checkpoint = 'checkpoint.pth'

# paths on training instance  
sm_dino_path = f'/opt/ml/input/data/dino_checkpoint'          
sm_dino_checkpoint = f'{sm_dino_path}/{checkpoint}'           

# training job name
classifier_base_job_name = f'linear-classifier-{int(time.time())}'

# create Estimator 
estimator = PyTorch(
    base_job_name=classifier_base_job_name,
    source_dir='path/to/aerial_featurizer',
    entry_point = 'eval_linear.py',
    role=role,
    framework_version='1.12',
    py_version='py38',
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    sagemaker_session=sagemaker_session,
    hyperparameters = {
        # hyperparameters passed to entry point script
        'arch': 'vit_small',
        'pretrained_weights': sm_dino_checkpoint,
        'epochs': 50,
        'data_dir': sm_data_path,
        'metadata_dir': sm_metadata_path,
        'output_dir': sm_checkpoint_path,
        'num_labels': 19,
    },
    max_run=1*60*60, 
    checkpoint_local_path = sm_checkpoint_path,
    checkpoint_s3_uri = f's3://{CLASSIFIER_OUTPUT_BUCKET}/checkpoints/{classifier_base_job_name}',
)

We start the training job using the fit method, supplying the Amazon S3 locations of the BigEarthNet-S2 metadata and training images and the DINO model checkpoint:

# call fit to begin training
estimator.fit(
    inputs={
        'metadata': 's3://bigearthnet-s2-dataset/metadata/',
        'dataset': 's3://bigearthnet-s2-dataset/data_rgb/',
        'dino_checkpoint': f's3://bigearthnet-s2-dataset/dino-models/checkpoints/{dino_base_job_name}',
    },
    wait=False
)

When training is complete, we can perform inference on the BigEarthNet-S2 test set using SageMaker batch transform or SageMaker Processing. In the following table, we compare the average precision of the linear model on test set images using two different DINO image representations. The first model, ViT-S/16 (ImageNet), is the small vision transformer checkpoint included in the DINO repository that was pre-trained using front-facing images in the ImageNet dataset. The second model, ViT-S/16 (BigEarthNet-S2), is the model we produced by pre-training on overhead imagery.

Model Average precision
ViT-S/16 (ImageNet) 0.685
ViT-S/16 (BigEarthNet-S2) 0.732

We find that the DINO model pre-trained on BigEarthNet-S2 transfers better to the land cover classification task than the DINO model pre-trained on ImageNet, resulting in a 6.7% increase in the average precision.

Clean up

After completing DINO training and transfer learning, we can clean up our resources to avoid incurring charges. We stop or delete our notebook instance and remove any unwanted data or model artifacts from Amazon S3.

Conclusion

This post demonstrated how to train DINO models on overhead imagery using SageMaker. We used SageMaker PyTorch Estimators and SMDDP in order to generate representations of BigEarthNet-S2 images without the need for explicit labels. We then transferred the DINO features to a downstream image classification task, which involved predicting the land cover class of BigEarthNet-S2 images. For this task, pre-training on satellite imagery yielded a 6.7% increase in average precision relative to pre-training on ImageNet.

You can use this solution as a template for training DINO models on large-scale, unlabeled aerial and satellite imagery datasets. To learn more about DINO and building models on SageMaker, check out the following resources:


About the Authors

Ben Veasey is a Senior Associate Data Scientist at Travelers, working within the AI & Automation Accelerator team. With a deep understanding of innovative AI technologies, including computer vision, natural language processing, and generative AI, Ben is dedicated to accelerating the adoption of these technologies to optimize business processes and drive efficiency at Travelers.

Jeremy Anderson is a Director & Data Scientist at Travelers on the AI & Automation Accelerator team. He is interested in solving business problems with the latest AI and deep learning techniques including large language models, foundational imagery models, and generative AI. Prior to Travelers, Jeremy earned a PhD in Molecular Biophysics from the Johns Hopkins University and also studied evolutionary biochemistry. Outside of work you can find him running, woodworking, or rewilding his yard.

Jordan Knight is a Senior Data Scientist working for Travelers in the Business Insurance Analytics & Research Department. His passion is for solving challenging real-world computer vision problems and exploring new state-of-the-art methods to do so. He has a particular interest in the social impact of ML models and how we can continue to improve modeling processes to develop ML solutions that are equitable for all. Jordan graduated from MIT with a Master’s in Business Analytics. In his free time you can find him either rock climbing, hiking, or continuing to develop his somewhat rudimentary cooking skills.

June Li is a data scientist at Travelers’s Business Insurance’s Artificial Intelligence team, where she leads and coordinates work in the AI imagery portfolio. She is passionate about implementing innovative AI solutions that bring substantial value to the business partners and stakeholders. Her work has been integral in transforming complex business challenges into opportunities by leveraging cutting-edge AI technologies.

Sourav Bhabesh is a Senior Applied Scientist at the AWS Titan Labs, where he builds Foundational Model (FM) capabilities and features. His specialty is Natural Language Processing (NLP) and is passionate about deep learning. Outside of work he enjoys reading books and traveling.

Laura Kulowski is an Applied Scientist at Amazon’s Generative AI Innovation Center, where she works closely with customers to build generative AI solutions. In her free time, Laura enjoys exploring new places by bike.

Andrew Ang is a Sr. Machine Learning Engineer at AWS. In addition to helping customers build AI/ML solutions, he enjoys water sports, squash and watching travel & food vlogs.

Mehdi Noori is an Applied Science Manager at the Generative AI Innovation Center. With a passion for bridging technology and innovation, he assists AWS customers in unlocking the potential of generative AI, turning potential challenges into opportunities for rapid experimentation and innovation by focusing on scalable, measurable, and impactful uses of advanced AI technologies, and streamlining the path to production.


Research Focus: Week of August 14, 2023

Microsoft Research Focus 22 | Week of August 14, 2023

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

NEW RESEARCH

HyWay: Enabling Mingling in the Hybrid World

As remote work has grown in recent years, videoconferencing tools like Teams help support structured meetings with a scheduled time, a specific agenda, and a set of invitees. For unstructured interactions, like hallway conversations or water cooler chats, newer “spatial” tools such as Gather and SpatialChat arose. But these are confined to users in virtual-only settings.

Many organizations and events now offer a mix of in-person and remote attendance, or “hybrid” work. This creates a new challenge for remote workers or conference goers who want to stay visible to, and mingle with, their colleagues attending in person. Existing tools fall short either in not supporting unstructured interactions, or in not supporting hybrid settings, or both.

In a recent paper: HyWay: Enabling Mingling in the Hybrid World, researchers from Microsoft present a system to support informal interactions among physical and virtual participants. HyWay lets remote users see and hear, and be seen and heard by, in-person users using large displays placed in hallways or “physical zones,” with the ability to move between the zones using a map-based interface. In-person users, who aren’t tethered to a device or app, can simply walk from one zone to another.

The paper includes user survey findings from multiple deployments.


NEW RESEARCH

Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples

Relational tables, where each row corresponds to an entity and each column corresponds to an attribute, are the standard tables in relational databases. However, a survey of real spreadsheet-tables and web-tables shows that over 30% of tables “in the wild” do not conform to the relational standard. This means complex table-restructuring transformations are needed before these tables can be queried using SQL-based analytics tools. Unfortunately, the required transformations are non-trivial to program, creating a substantial pain point for technical and non-technical users alike, as evidenced by large numbers of forum questions in places like StackOverflow and Excel/Power BI/Tableau forums.

In a new paper: Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples, researchers from Microsoft present a system that can automatically synthesize pipelines with multi-step transformations (in Python or other languages). This system transforms non-relational tables into standard relational forms for downstream analytics, obviating the need for users to manually program transformations.

The research includes an extensive benchmark for this new task, compiled by collecting 244 real test cases from publicly available spreadsheets and online forums. The accompanying evaluation suggests that Auto-Tables can successfully synthesize transformations for over 70% of test cases at interactive speeds, without requiring any input from users, making this an effective tool for both technical and non-technical users to prepare data for analytics.


NEW RESEARCH

Learning to Retrieve In-Context Examples for Large Language Models

In-context learning is an emerging paradigm that allows large language models (LLMs) to perform tasks with few-shot examples, without requiring any updates to the model parameters. However, the effectiveness of in-context learning is heavily reliant on the quality of the selected examples.

In a new paper: Learning to Retrieve In-Context Examples for Large Language Models, researchers from Microsoft propose a novel framework to iteratively train dense retrievers that can identify high-quality in-context examples for LLMs. This framework initially trains a reward model based on LLM feedback to evaluate the quality of candidate examples, followed by knowledge distillation to train a bi-encoder-based dense retriever. Experiments on a suite of 30 tasks demonstrate that the framework significantly enhances in-context learning performance. The research also demonstrates the generalization ability of the framework to unseen tasks during training. An in-depth analysis reveals that the model improves performance by retrieving examples with similar patterns, and the gains are consistent across LLMs of varying sizes.


NEW RESEARCH

End-to-End Word-Level Pronunciation Assessment with MASK Pre-training

The Computer-Aided Pronunciation Training (CAPT) system is a powerful tool designed to help people improve their language skills by using advanced AI technologies. Pronunciation assessment is a major challenge in CAPT, especially at the word (phoneme)-level. To obtain word (phoneme)-level scores, current methods usually rely on aligning components to obtain acoustic features of each word (phoneme), which limits the performance of assessment to the accuracy of alignments.

To address this problem, a new paper from researchers at Microsoft: End-to-End Word-Level Pronunciation Assessment with MASK Pre-training, proposes a simple, yet effective method called Masked pre-training for Pronunciation Assessment (MPA). By incorporating a mask-predict strategy, MPA allows the model to train in an end-to-end manner, eliminating the problem of misalignment in word-level assessment. Furthermore, the researchers designed two evaluation strategies to enable the model to conduct assessments in both unsupervised and supervised settings. Experimental results on the SpeechOcean762 dataset demonstrate that MPA could achieve better performance than previous methods, without any explicit alignment. Despite this, MPA still has some limitations, such as requiring more inference time and reference text. Those limitations are expected to be addressed in future work.



How Thomson Reuters developed Open Arena, an enterprise-grade large language model playground, in under 6 weeks

This post is cowritten by Shirsha Ray Chaudhuri, Harpreet Singh Baath, Rashmi B Pawar, and Palvika Bansal from Thomson Reuters.

Thomson Reuters (TR), a global content and technology-driven company, has been using artificial intelligence (AI) and machine learning (ML) in its professional information products for decades. Thomson Reuters Labs, the company’s dedicated innovation team, has been integral to its pioneering work in AI and natural language processing (NLP). A key milestone was the launch of Westlaw Is Natural (WIN) in 1992. This technology was one of the first of its kind, using NLP for more efficient and natural legal research. Fast forward to 2023, and Thomson Reuters continues to define the future of professionals through rapid innovation, creative solutions, and powerful technology.

The introduction of generative AI provides another opportunity for Thomson Reuters to work with customers and once again advance how they do their work, helping professionals draw insights and automate workflows, enabling them to focus their time where it matters most. While Thomson Reuters pushes the boundaries of what generative AI and other technologies could do for the modern professional, how is it using the power of this technology for its own teams?

Thomson Reuters is highly focused on driving awareness and understanding of AI among colleagues in every team and every business area. Starting from foundational principles of what is AI and how does ML work, it’s delivering a rolling program of company-wide AI awareness sessions, including webinars, training materials, and panel discussions. During these sessions, ideas on how AI could be used started to surface as colleagues considered how to use tools that helped them use AI for their day-to-day tasks as well as serve their customers.

In this post, we discuss how Thomson Reuters Labs created Open Arena, Thomson Reuters’s enterprise-wide large language model (LLM) playground that was developed in collaboration with AWS. The original concept came out of an AI/ML Hackathon supported by Simone Zucchet (AWS Solutions Architect) and Tim Precious (AWS Account Manager) and was developed into production using AWS services in under 6 weeks with support from AWS. AWS-managed services such as AWS Lambda, Amazon DynamoDB, and Amazon SageMaker, as well as the pre-built Hugging Face Deep Learning Containers (DLCs), contributed to the pace of innovation. Open Arena has helped unlock company-wide experimentation with generative AI in a safe and controlled environment.

Diving deeper, Open Arena is a web-based playground that allows users to experiment with a growing set of tools enabled with LLMs. This provides non-programmatic access for Thomson Reuters employees who don’t have a background in coding but want to explore the art of the possible with generative AI at TR. Open Arena has been developed to get quick answers from several sets of corpora, such as for customer support agents, solutions to get quick answers from websites, solutions to summarize and verify points in a document, and much more. The capabilities of Open Arena continue to grow as the experiences from employees across Thomson Reuters spur new ideas and as new trends emerge in the field of generative AI. This is all facilitated by the modular serverless AWS architecture that underpins the solution.

Envisioning the Open Arena

Thomson Reuters’s objective was clear: to build a safe, secure, user-friendly platform—an “open arena”—as an enterprise-wide playground. Here, internal teams could not only explore and test the various LLMs developed in-house and those from the open-source community such as with the AWS and Hugging Face partnership, but also discover unique use cases by merging the capabilities of LLMs with Thomson Reuters’s extensive company data. This kind of platform would enhance the ability of teams to generate innovative solutions, improving the products and services that Thomson Reuters could offer its clients.

The envisioned Open Arena platform would serve the diverse teams within Thomson Reuters globally, providing them with a playground to freely interact with LLMs. The ability to have this interaction in a controlled environment would allow teams to uncover new applications and methodologies that might not have been apparent in a less direct engagement with these complex models.

Building the Open Arena

Building the Open Arena was a multi-faceted process. We aimed to harness the capabilities of AWS’s serverless and ML services to craft a solution that would seamlessly enable Thomson Reuters employees to experiment with the latest LLMs. We saw the potential of these services not only to provide scalability and manageability but also to ensure cost-effectiveness.

Solution overview

From creating a robust environment for model deployment and fine-tuning to ensuring meticulous data management and providing a seamless user experience, TR needed each aspect to integrate with several AWS services. Open Arena’s architecture was designed to be comprehensive yet intuitive, balancing complexity with ease of use. The following diagram illustrates this architecture.

SageMaker served as the backbone, facilitating model deployment as SageMaker endpoints and providing a robust environment for fine-tuning the models. We capitalized on the Hugging Face on SageMaker DLC offered by AWS to enhance our deployment process. In addition, we used the SageMaker Hugging Face Inference Toolkit and the Accelerate library to accelerate the inference process and effectively handle the demands of running complex and resource-intensive models. These comprehensive tools were instrumental in ensuring the fast and seamless deployment of our LLMs. Lambda functions, triggered by Amazon API Gateway, managed the APIs, ensuring meticulous preprocessing and postprocessing of the data.
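
To illustrate the hosting pattern, the following sketch deploys an open-source model as a SageMaker endpoint using the Hugging Face DLC. The model ID, framework versions, and instance type are placeholders, not Thomson Reuters’s actual configuration:

from sagemaker.huggingface import HuggingFaceModel

# illustrative deployment of an open-source LLM from the Hugging Face Hub
huggingface_model = HuggingFaceModel(
    role=role,                               # SageMaker execution role, assumed to be defined
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
    env={
        "HF_MODEL_ID": "google/flan-t5-xl",  # one of the models mentioned later in this post
        "HF_TASK": "text2text-generation",
    },
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

print(predictor.predict({"inputs": "Summarize: SageMaker hosts the model behind a REST endpoint."}))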

In our quest to deliver a seamless user experience, we adopted a secure API Gateway to connect the front end hosted in Amazon Simple Storage Service (Amazon S3) to the Lambda backend. We deployed the front end as a static site on an S3 bucket, ensuring user authentication with the help of Amazon CloudFront and our company’s single sign-on mechanism.

Open Arena has been designed to integrate seamlessly with multiple LLMs through REST APIs. This ensured that the platform was flexible enough to react and integrate quickly as new state-of-the art-models were developed and released in the fast-paced generative AI space. From its inception, Open Arena was architected to provide a safe and secure enterprise AI/ML playground, so Thomson Reuters employees can experiment with any state-of-the-art LLM as quickly as they are released. Using Hugging Face models on SageMaker allowed the team to fine-tune models in a secure environment because all data is encrypted and doesn’t leave the virtual private cloud (VPC), ensuring that data remains private and confidential.

DynamoDB, our chosen NoSQL database service, efficiently stored and managed a wide variety of data, including user queries, responses, response times, and user data. To streamline the development and deployment process, we employed AWS CodeBuild and AWS CodePipeline for continuous integration and continuous delivery (CI/CD). Monitoring the infrastructure and ensuring its optimal functioning was made possible with Amazon CloudWatch, which provided custom dashboards and comprehensive logging capabilities.

Model development and integration

The heart of Open Arena is its diverse assortment of LLMs, which comprise both open-source and in-house developed models. These models have been fine-tuned to provide responses following specific user prompts.

We have experimented with different LLMs for different use cases in Open Arena, including Flan-T5-XL, Open Assistant, MPT, Falcon, and Flan-T5-XL fine-tuned on available open-source datasets using parameter-efficient fine-tuning techniques. We used the bitsandbytes integration from Hugging Face to experiment with various quantization techniques. This allowed us to optimize our LLMs for enhanced performance and efficiency, paving the way for even greater innovation. When selecting a model as the backend for these use cases, we considered how the models perform on NLP tasks that are relevant to Thomson Reuters. Furthermore, we needed to consider engineering aspects, such as the following:

  • Increased efficiency when building applications with LLMs – Quickly integrating and deploying state-of-the-art LLMs into our applications and workloads that run on AWS, using familiar controls and integrations with the depth and breadth of AWS
  • Secure customization – Ensuring that all data used to fine-tune LLMs remains encrypted and does not leave the VPC
  • Flexibility – The ability to choose from a wide selection of AWS native and open-source LLMs to find the right model for our varied use cases
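
Returning to the bitsandbytes quantization experiments mentioned above, the following is a minimal sketch of loading one of these models in 8-bit precision with the Hugging Face integration; the model ID and prompt are illustrative:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/flan-t5-xl"   # illustrative choice from the models listed above

# 8-bit loading via the Hugging Face bitsandbytes integration
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

prompt = "Summarize: Open Arena lets employees experiment with LLMs in a safe, controlled environment."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))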

We’ve been asking questions such as: Is the higher cost of larger models justified by significant performance gains? Can these models handle long documents?

The following diagram illustrates our model architecture.

We have been evaluating these models on the preceding aspects on open-source legal datasets and Thomson Reuters internal datasets to assess them for specific use cases.

For content-based use cases (experiences that call for answers from specific corpus), we have a retrieval augmented generation (RAG) pipeline in place, which will fetch the most relevant content against the query. In such pipelines, documents are split into chunks and then embeddings are created and stored in OpenSearch. To get the best match documents or chunks, we use the retrieval/re-ranker approach based on bi-encoder and cross-encoder models. The retrieved best match is then passed as an input to the LLM along with the query to generate the best response.
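
To make the retrieve/re-rank pattern concrete, the following is a small, self-contained sketch of the two stages using sentence-transformers models. The model names, chunks, and prompt format are illustrative, and the production pipeline stores the chunk embeddings in OpenSearch rather than in memory:

from sentence_transformers import SentenceTransformer, CrossEncoder, util

# stage 1 model (bi-encoder) and stage 2 model (cross-encoder); names are illustrative
bi_encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

chunks = [
    "Open Arena is an enterprise LLM playground built on AWS serverless services.",
    "SageMaker endpoints host the fine-tuned LLMs behind REST APIs.",
    "DynamoDB stores user queries, responses, and response times.",
]
query = "Where are the Open Arena models hosted?"

# stage 1: bi-encoder retrieval (in production this is a k-NN query over OpenSearch)
chunk_embeddings = bi_encoder.encode(chunks, convert_to_tensor=True)
query_embedding = bi_encoder.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_embedding, chunk_embeddings, top_k=2)[0]

# stage 2: cross-encoder re-ranking of the retrieved candidates
candidates = [chunks[hit["corpus_id"]] for hit in hits]
scores = cross_encoder.predict([(query, candidate) for candidate in candidates])
best_chunk = candidates[scores.argmax()]

# the best match is passed to the LLM together with the query
prompt = f"Answer the question using the context.\nContext: {best_chunk}\nQuestion: {query}"
print(prompt)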

The integration of Thomson Reuters’s internal content with the LLM experience has been instrumental in enabling users to extract more relevant and insightful results from these models. More importantly, it led to sparking ideas amongst every team for possibilities of adopting AI-enabled solutions in their business workflows.

Open Arena tiles: Facilitating user interaction

Open Arena adopts a user-friendly interface, designed with pre-set enabling tiles for each experience, as shown in the following screenshot. These tiles serve as pre-set interactions that cater to the specific requirements of the users.

For instance, the Experiment with Open Source LLM tile opens a chat-like interaction channel with open-source LLMs.

The Ask your Document tile allows users to upload documents and ask specific questions related to the content from the LLMs. The Experiment with Summarization tile enables users to distil large volumes of text into concise summaries, as shown in the following screenshot.

These tiles simplify the user consumption of AI-enabled work solutions and the navigation process within the platform, igniting creativity and fostering the discovery of innovative use cases.

The impact of the Open Arena

The launch of the Open Arena marked a significant milestone in Thomson Reuters’s journey towards fostering a culture of innovation and collaboration. The platform’s success was undeniable, with its benefits becoming rapidly evident across the company.

The Open Arena’s intuitive, chat-based design required no significant technical knowledge, making it accessible to different teams and different job roles across the globe. This ease of use boosted engagement levels, encouraging more users to explore the platform and unveiling innovative use cases.

In under a month, the Open Arena catered to over 1,000 monthly internal users from TR’s global footprint, averaging an interaction time of 5 minutes per user. With a goal to foster internal TR LLM experimentation and crowdsource creation of LLM use cases, Open Arena’s launch led to an influx of new use cases, effectively harnessing the power of LLMs combined with Thomson Reuters’s vast data resources.

Here’s what some of our users had to say about the Open Arena:

“Open Arena gives employees from all parts of the company a chance to experiment with LLMs in a practical, hands-on way. It’s one thing to read about AI tools, and another to use them yourself. This platform turbo-charges our AI learning efforts across Thomson Reuters.”

– Abby Pinto, Talent Development Solutions Lead, People Function

“OA (Open Arena) has enabled me to experiment with tricky news translation problems for the German Language Service of Reuters that conventional translation software can’t handle, and to do so in a safe environment where I can use our actual stories without fear of data leaks. The team behind OA has been incredibly responsive to suggestions for new features, which is the sort of service you can only dream of with other software.”

– Scot W. Stevenson, Senior Breaking News Correspondent for the German Language Service, Berlin, Germany

“When I used Open Arena, I got the idea to build a similar interface for our teams of customer support agents. This playground helped us reimagine the possibilities with GenAI.”

– Marcel Batista, Gerente de Servicos, Operations Customer Service & Support

“Open Arena powered by AWS serverless services, Amazon SageMaker, and Hugging Face helped us to quickly expose cutting-edge LLMs and generative AI tooling to our colleagues, which helped drive enterprise-wide innovation.”

– Shirsha Ray Chaudhuri, Director, Research Engineering, Thomson Reuters Labs

On a broader scale, the introduction of the Open Arena had a profound impact on the company. It not only increased AI awareness among employees but also stimulated a spirit of innovation and collaboration. The platform brought teams together to explore, experiment, and generate ideas, fostering an environment where groundbreaking concepts could be turned into reality.

Furthermore, the Open Arena has had a positive influence on Thomson Reuters AI services and products. The platform has served as a sandbox for AI, allowing teams to identify and refine AI applications before incorporating them into our offerings. Consequently, this has accelerated the development and enhancement of Thomson Reuters AI services, providing customers with solutions that are ever evolving and at the forefront of technological advancement.

Conclusion

In the fast-paced world of AI, it is crucial to keep advancing, and Thomson Reuters is committed to doing just that. The team behind the Open Arena is constantly working to add features and enhance the platform’s capabilities using AWS services like Amazon Bedrock and Amazon SageMaker JumpStart, ensuring that it remains a valuable resource for our teams. As we move forward, we aim to keep pace with the rapidly evolving landscape of generative AI and LLMs, and AWS provides the services TR needs to do so.

In addition to the ongoing development of the Open Arena platform, we are actively working on productionizing the multitude of use cases generated by the platform. This will allow us to provide our customers with even more advanced and efficient AI solutions, tailored to their specific needs. Furthermore, we will continue to foster a culture of innovation and collaboration, enabling our teams to explore new ideas and applications for AI technology.

As we embark on this exciting journey, we are confident that the Open Arena will play a pivotal role in driving innovation and collaboration across Thomson Reuters. By staying at the forefront of AI advancements, we will ensure that our products and services continue to evolve and meet the ever-changing demands of our customers.


About the Authors

Shirsha Ray Chaudhuri (Director, Research Engineering) heads the ML Engineering team in Bangalore for Thomson Reuters Labs, where she is leading the development and deployment of well-architected solutions in AWS and other cloud platforms for ML projects that drive efficiency and value for AI-driven features in Thomson Reuters products, platforms, and business systems. She works with communities on AI for good, societal impact projects and in the tech for D&I space. She loves to network with people who are using AI and modern tech for building a better world that is more inclusive, more digital, and together a better tomorrow.

Harpreet Singh Baath is a Senior Cloud and DevOps Engineer at Thomson Reuters Labs, where he helps research engineers and scientists develop machine learning solutions on cloud platforms. With over 6 years of experience, Harpreet’s expertise spans across cloud architectures, automation, containerization, enabling DevOps practices, and cost optimization. He is passionate about efficiency and cost-effectiveness, ensuring that cloud resources are utilized optimally.

Rashmi B Pawar is a Machine Learning Engineer at Thomson Reuters. She possesses considerable experience in productionizing models, establishing inference, and creating training pipelines tailored for various machine learning applications. Furthermore, she has significant expertise in incorporating machine learning workflows into existing systems and products.

Palvika Bansal is an Associate Applied Research Scientist at Thomson Reuters. She has worked on projects across diverse sectors to solve business problems for customers using AI/ML. She is highly passionate about her work and enthusiastic about taking on new challenges. Outside of work, she enjoys traveling, cooking, and reading.

Simone Zucchet is a Senior Solutions Architect at AWS. With close to a decade’s experience as a Cloud Architect, Simone enjoys working on innovative projects that help transform the way organizations approach business problems. He helps support large enterprise customers at AWS and is part of the Machine Learning TFC. Outside of his professional life, he enjoys working on cars and photography.

Heiko Hotz is a Senior Solutions Architect for AI & Machine Learning with a special focus on natural language processing, large language models, and generative AI. Prior to this role, he was the Head of Data Science for Amazon’s EU Customer Service. Heiko helps our customers be successful in their AI/ML journey on AWS and has worked with organizations in many industries, including insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. In his spare time, Heiko travels as much as possible.

João Moura is an AI/ML Specialist Solutions Architect at AWS, based in Spain. He helps customers with deep learning model training and inference optimization, and more broadly building large-scale ML platforms on AWS. He is also an active proponent of ML-specialized hardware and low-code ML solutions.

Georgios Schinas is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in London and works closely with customers in the UK and Ireland. Georgios helps customers design and deploy machine learning applications in production on AWS, with a particular interest in MLOps practices and enabling customers to perform machine learning at scale. In his spare time, he enjoys traveling, cooking, and spending time with friends and family.

Read More

Replit CEO Amjad Masad on Empowering the Next Billion Software Creators

Replit aims to empower the next billion software creators.

In this week’s episode of NVIDIA’s AI Podcast, host Noah Kravitz dives into a conversation with Replit CEO Amjad Masad. Masad says the San Francisco-based maker of a software development platform, which came up through NVIDIA’s Inception program for startups, wants to bridge the gap between ideas and software, a task simplified by advances in generative AI.

“Replit is fundamentally about reducing the friction between an idea and a software product,” Masad said.

The company’s Ghostwriter coding AI has two main features: a code completion model and a chat model. These features not only make suggestions as users type their code, but also provide intelligent explanations of what a piece of code is doing, tracing dependencies and context. The model can even flag errors and offer solutions — like a full collaborator in a Google Docs for code.

The company is also developing “make me an app” functionality. This tool allows users to provide high-level instructions to an Artificial Developer Intelligence, which then builds, tests and iterates the requested software.

The aim is to make software creation accessible to all, even those with no coding experience. While this feature is still under development, Masad said the company plans to improve it over the next year, potentially having it ready for developers in the next six to eight months.

Going forward, Masad envisions a future where AI functions as a collaborator, able to conduct high-level tasks and even manage resources. “We’re entering a period where software is going to feel more alive,” Masad said. “And so I think computing is becoming more humane, more accessible, more exciting, more natural.”

You Might Also Like

Jules Anh Tuan Nguyen Explains How AI Lets Amputee Control Prosthetic Hand, Video Games
A postdoctoral researcher at the University of Minnesota discusses his efforts to allow amputees to control their prosthetic limb — right down to the finger motions — with their minds.

Overjet’s Wardah Inam on Bringing AI to Dentistry
Overjet, a member of NVIDIA Inception, is moving fast to bring AI to dentists’ offices. Dr. Wardah Inam, CEO of the company, discusses using AI to improve patient care.

Immunai CTO and Co-Founder Luis Voloch on Using Deep Learning to Develop New Drugs
Luis Voloch, co-founder and chief technology officer of Immunai, talks about tackling the challenges of the immune system with a machine learning and data science mindset.

Subscribe to the AI Podcast: Now Available on Amazon Music

In addition to Amazon Music, get the AI Podcast through iTunes, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better. Have a few minutes to spare? Fill out this listener survey.

Read More

Into the Omniverse: Reallusion Elevates Character Animation Workflows With Two-Way Live Sync and OpenUSD Support

Into the Omniverse: Reallusion Elevates Character Animation Workflows With Two-Way Live Sync and OpenUSD Support

Editor’s note: This post is part of Into the Omniverse, a series focused on how artists, developers and enterprises can transform their workflows using the latest advances in OpenUSD and NVIDIA Omniverse.

Whether animating a single 3D character or generating a group of them for industrial digitalization, creators and developers who use the popular Reallusion software can boost their workflows with the latest update to the iClone Omniverse Connector released this month.

The upgrade enables seamless collaboration and expands creative possibilities for creators using NVIDIA Omniverse, a development platform for connecting and building OpenUSD-based tools and applications.

New features include real-time synchronization of projects, as well as enhanced import functionality for the Universal Scene Description framework, known as OpenUSD, which makes work between iClone and Omniverse quicker, smoother and more efficient. The update also comes with bug fixes and improvements.

Animating 3D Characters Better, Together

Creators across the world are using Reallusion iClone, a real-time 3D animation software, to bring their characters to life.

Solomon Jagwe — a 3D artist, animator and award-winning film director — uses Reallusion iClone and Omniverse for his work, which often focuses on environmental themes.

Jagwe, who grew up in East Africa, recalls fond childhood memories drawing the creatures he’d see when he ventured into the countryside with his brother. Even now, much of his 3D work begins with a simple sketch using pen and paper.

The artist said he always strives to create art that makes a difference.

For example, Jagwe created Adventures of Nkoza and Nankya, a video series for educating people of all ages on Ugandan culture. He modeled the sets for the series in Autodesk 3ds Max and Autodesk Maya, animated in Reallusion iClone and composed in Omniverse.

“With the iClone Connector for Omniverse, I can easily render my iClone animations in Omniverse and take advantage of the iClone animation tools in combination with the Omniverse Audio2Face generative AI capabilities,” Jagwe said.

Jagwe’s entire creative pipeline is accelerated by USD, which acts as a common language between 3D applications and enables sharing full scenes across content-creation tools.

“OpenUSD makes it so much easier to transport all the textures and characters together in one place for animation,” Jagwe said. The artist added that he hopes his work inspires other indie filmmakers to bring their story ideas to life using iClone and Omniverse.

A scene from Jagwe’s educational series, “Adventures of Nkoza and Nankya.”

Another indie filmmaker, Benjamin Sokomba Dazhi, aka Benny Dee, has also mastered the art of animation. He’s landed roles as head animator for the film The Legend of Oronpoto as well as creator and director of the Cartoon Network Africa Dance Challenge.

Dazhi uses Omniverse with Reallusion’s iClone and Character Creator to supercharge his artistic workflow.

“The main challenges I faced when trying to meet deadlines were long render times and difficulties with software compatibility, but using an Omniverse Connector for Reallusion’s iClone app has been game-changing for my workflow,” he said.

A scene from one of Dazhi’s animated music videos.

Several other Omniverse community members recently joined a livestream to share their workflows using Reallusion and Omniverse. Watch the stream on demand.

Sync in Style With New Connector Updates

The updated Reallusion iClone Connector for Omniverse offers powerful integrations between the two platforms.

Users can now seamlessly synchronize their projects in real time thanks to new bidirectional live-sync capabilities. This means changes made in either iClone or Omniverse can be automatically reflected back to the other. Such bidirectional synchronization can be applied to animation-related changes for characters, such as skeletal and morph animation.

The iClone Connector also enables enhanced USD import capabilities. Users can now import static meshes, cameras and lights from Omniverse directly into iClone, with a filter that lets them import only the asset types they need.
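
For a sense of what this kind of type-based filtering looks like at the USD level, here is a small illustrative sketch using the open-source pxr (OpenUSD) Python API. It is not the iClone Connector’s implementation, and it assumes a USD build recent enough to expose UsdLux.LightAPI.

```python
# Illustrative sketch only, using the open-source pxr (OpenUSD) Python API;
# not the iClone Connector's implementation. It collects the asset types the
# import filter deals with (meshes, cameras, lights) from a USD stage.
from pxr import Usd, UsdGeom, UsdLux


def collect_importable_prims(usd_path: str):
    stage = Usd.Stage.Open(usd_path)
    meshes, cameras, lights = [], [], []
    for prim in stage.Traverse():
        if prim.IsA(UsdGeom.Mesh):
            meshes.append(prim.GetPath())
        elif prim.IsA(UsdGeom.Camera):
            cameras.append(prim.GetPath())
        elif prim.HasAPI(UsdLux.LightAPI):  # lights apply the LightAPI schema in recent USD builds
            lights.append(prim.GetPath())
    return meshes, cameras, lights
```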

See how designers can now preview Omniverse renders in real time while animating in iClone, enjoying seamless two-way USD data transfer.

Get Plugged Into the Omniverse 

Don’t miss our community livestream next week with Reallusion VP John Martin to learn all about the ways the iClone Omniverse Connector can advance your 3D character animation pipeline.

Watch NVIDIA founder and CEO Jensen Huang’s keynote address at SIGGRAPH on demand to learn about the latest breakthroughs in graphics, research, OpenUSD and AI.

Like Reallusion, learn how anyone can build their own Omniverse extension or Connector to enhance their 3D workflows and tools.

Share your Reallusion and Omniverse work as part of the latest community challenge, #StartToFinish. Use the hashtag to submit a screenshot of a project featuring both its beginning and ending stages for a chance to be featured on the @NVIDIAStudio and @NVIDIAOmniverse social channels.

Get started with NVIDIA Omniverse by downloading the standard license free, or learn how Omniverse Enterprise can connect your team.
Developers can check out these Omniverse resources to begin building on the platform.


Stay up to date on the platform by subscribing to the newsletter and following NVIDIA Omniverse on Instagram, LinkedIn, Medium, Threads and Twitter.

For more, check out our forums, Discord server, Twitch and YouTube channels.

Featured image courtesy of Reallusion.

Read More

Your phone camera can autofocus. Why can't your specs?

Your phone camera can autofocus. Why can't your specs?


IXI Adaptive Eyewear: The Finland-based company, with seed funding from the Alexa Fund, is developing eyeglasses that adjust to your vision on the fly.

Startup IXI is working on a breakthrough in vision correction.

How many pairs of eyeglasses do you own? If you need vision correction, you may have accumulated a few. There’s your everyday pair, a handful of cheap readers, the old pair you keep around just in case and the next ones you’ll need when your prescription changes.

The startup IXI wants to bring that number down to one. The Finland-based company, with seed funding from Amazon’s Alexa Fund, is developing eyeglasses that adjust to your vision on the fly, potentially eliminating the need to swap out lenses or peer through half of a bifocal.

“In some sense, the glasses that you’re wearing are more than 100 years old at the moment,” says Niko Eiden, IXI founder and CEO. “There’s absolutely no modern technology in them so far.” IXI’s inaugural product, IXI Adaptive Eyewear, combines software and optics for autofocus lenses.

The company has been conducting research and development on the concept since its 2021 debut and is now in the product prototyping phase. On a recent call with Amazon Science, Eiden showed a build of an IXI pair that looked unremarkable as glasses go; that’s by design.

“The experience that we are aiming for would be the same as any pair of eyewear,” he says.

From mixed reality to real-world view

Eiden and his cofounder Ville Miettinen, chief algorithm officer, have been working for the past several years on virtual and augmented reality tech for major companies in the field. Even as they worked on advanced headsets and sensing, they saw an opportunity in the specs so many people don for everyday tasks.

“The team has a compelling vision, pardon the pun, for advancing the state of the art in corrective eyewear in a way that moves computing to the background and makes it truly ambient,” Amazon Alexa Fund director Paul Bernard said when announcing the funding.

As part of the R&D phase, IXI interviewed potential users to understand the customer need and which features would make the most sense, Eiden says. Eiden himself is a pilot, and he wanted a way to ensure consistently good vision while switching views from outside the plane to instruments on the dashboard and ceiling. The desirability of that quick-switch focus without multifocals (something that will be familiar to any driver who has struggled with toggling between looking at the road and the dashboard nav system) was confirmed in the interviews. A mechanic mentioned flipping bifocals upside down to work under a car.

Having shown that the combination of tunable lenses and eye tracking could work and would solve a problem, the team began iterating on prototypes. “Now we are diving into the hardware problems: How do we produce this efficiently so that we don’t waste materials and it lasts long enough? What type of power requirements? How much real estate do we have to put the components in?” Eiden explains. “It’s an interesting journey, because the question and the problem change continuously.”

“The robustness requirements of real-life usage, such as different lighting conditions and temperature impacts, were impressive,” Eiden says of the research process. “Measuring visual quality also proved to be very difficult,” he adds. “The experienced visual quality differed vastly from person to person, and even more when compared to our lab measurements made with instruments or cameras.”

IXI’s founders say they saw an opportunity for improvement in the glasses many people wear for everyday tasks.

IXI’s technology combines a liquid crystal-based lens with eye tracking. Eye tracking is used to understand the distances to whatever the user is looking at, while liquid crystal technology changes the optical power of the glasses on the fly. A typical way to track eye movement would be to set up cameras that capture images of the eyes and send them to a computer vision algorithm that figures out where the pupil is positioned. That’s not feasible for a pair of glasses.
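
For context, a conventional camera-based pupil detector might look something like the following OpenCV sketch. It is illustrative only, not IXI’s approach, and the threshold value is an arbitrary guess; the point is that every frame has to be captured, filtered and searched, which is exactly the data and power burden Eiden describes next.

```python
# Illustrative only: the conventional camera-plus-computer-vision approach that
# IXI deliberately avoids. Every frame must be captured, filtered and searched
# for the dark, roughly circular pupil, which is far too much data and power
# for a self-contained pair of glasses. The threshold value is a rough guess.
import cv2


def find_pupil(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (7, 7), 0)
    # The pupil is usually the darkest region of an eye image; isolate it.
    _, mask = cv2.threshold(blurred, 40, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    (x, y), radius = cv2.minEnclosingCircle(largest)
    return (int(x), int(y)), int(radius)  # approximate pupil centre and radius
```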

“Just the sheer amount of data that we would have to process makes that impossible,” Eiden says. “So we had to find a completely new way to do the eye tracking, one that works without cameras, fits into the frames and uses next to no power.” Eiden notes their proprietary process relies on calibrating each pair of IXI glasses to the eyes of the user, which can be done with IXI’s mobile app.

The prototype pair he showed still relied on external hardware to wirelessly process the signals, which makes it much faster to iterate new versions. Behind him, engineers were using a control board resembling an audio mixer to find the correct “tune” for each specific lens, a process that will be fully automated later on at the factory.

“Of course, we need to miniaturize this,” he says, showing a driver component that changes the optical power of the lens with software. Everything about the IXI, from the lens driver components to the batteries, needs to be tiny to fit the eyeglass frame.

A product whose time has come?

Eiden says the wristwatch industry, combined with the boom in wearable fitness trackers, has paved the way for IXI. “There’s a lot of optimization needed, from a power perspective,” he says. “Five years ago, this type of product wouldn’t have been possible, because those components just didn’t exist yet.”

Still, the size and form factor of the glasses require specialized components. “We’ve had to build our own manufacturing equipment. We are using very sophisticated lasering processes for the manufacturing part,” he says.

Once the team has ironed out a successful prototype and manufacturing process, IXI will start on factory builds of the IXI Adaptive Eyewear. Eiden says the company is targeting a launch by the end of next year.

The promise of the technology goes beyond the convenience of being able to, say, ditch your multifocals, Eiden notes. Eventually, you might be able to have a prescription that takes into account the fact that your eyes get tired later in the day.

The whole model of how we think about eyeglasses and prescriptions is due for a change, Eiden says: “If it’s not us, it’s going to be somebody else. It just makes so much sense.”


Read More