Refit trained parameters on large datasets using Amazon SageMaker Data Wrangler

Amazon SageMaker Data Wrangler helps you understand, aggregate, transform, and prepare data for machine learning (ML) from a single visual interface. It contains over 300 built-in data transformations so you can quickly normalize, transform, and combine features without having to write any code.

Data science practitioners generate, observe, and process data to solve business problems, which requires transforming datasets and extracting features from them. Transforms such as ordinal encoding or one-hot encoding learn encodings on your dataset. These encoded outputs are referred to as trained parameters. As datasets change over time, it may be necessary to refit encodings on previously unseen data to keep the transformation flow relevant to your data.

We are excited to announce the refit trained parameter feature, which allows you to use previous trained parameters and refit them as desired. In this post, we demonstrate how to use this feature.

Overview of the Data Wrangler refit feature

Before we dive into the specifics of the refit trained parameter feature, let’s illustrate how it works with the following example.

Assume your customer dataset has a categorical feature for country represented as strings like Australia and Singapore. ML algorithms require numeric inputs; therefore, these categorical values have to be encoded to numeric values. Encoding categorical data is the process of creating a numerical representation for categories. For example, if your category country has values Australia and Singapore, you may encode this information into two vectors: [1, 0] to represent Australia and [0, 1] to represent Singapore. The transformation used here is one-hot encoding and the new encoded output reflects the trained parameters.

After training the model, over time your customers may increase and you have more distinct values in the country list. The new dataset could contain another category, India, which wasn’t part of the original dataset, which can affect the model accuracy. Therefore, it’s necessary to retrain your model with the new data that has been collected over time.

To overcome this problem, you need to refresh the encoding to include the new category and update the vector representation as per your latest dataset. In our example, the encoding should reflect the new category for the country, which is India. We commonly refer to this process of refreshing an encoding as a refit operation. After you perform the refit operation, you get the new encoding: Australia: [1, 0, 0], Singapore: [0, 1, 0], and India: [0, 0, 1]. Refitting the one-hot encoding and then retraining the model on the new dataset results in better quality predictions.
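
Data Wrangler performs this refit for you, but the mechanics are easy to see with a small, self-contained sketch. The following example uses scikit-learn’s OneHotEncoder purely as an illustration (it is not Data Wrangler’s implementation): the encoder is first fit on the original two countries, then refit on the enriched dataset that includes India.

import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Fit the encoder on the original sample, which only contains two countries.
original = np.array([["Australia"], ["Singapore"]])
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(original)

# A category the encoder never saw maps to an all-zero vector (effectively skipped).
print(encoder.transform([["India"]]).toarray())   # [[0. 0.]]

# "Refit": relearn the encoding on the enriched dataset that includes India.
enriched = np.array([["Australia"], ["Singapore"], ["India"]])
encoder.fit(enriched)
print(encoder.transform([["India"]]).toarray())   # [[0. 1. 0.]] (categories are sorted alphabetically)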

Data Wrangler’s refit trained parameter feature is useful in the following cases:

  • New data is added to the dataset – Retraining the ML model is necessary when the dataset is enriched with new data. To achieve optimal results, we need to refit the trained parameters on the new dataset.
  • Training on a full dataset after performing feature engineering on sample data – For a large dataset, a sample of the dataset is considered for learning trained parameters, which may not represent your entire dataset. We need to relearn the trained parameters on the complete dataset.

Many of the most common Data Wrangler transforms performed on a dataset, such as categorical encoding and text vectorization, benefit from the refit trained parameter option.

For more information about transformations in Data Wrangler, refer to Transform Data.

In this post, we show how to process these trained parameters on datasets using Data Wrangler. You can use Data Wrangler flows in production jobs to reprocess your data as it grows and changes.

Solution overview

For this post, we demonstrate how to use Data Wrangler’s refit trained parameter feature with a publicly available dataset on Kaggle: US Housing Data from Zillow, For-Sale Properties in the United States. The dataset contains home sale prices across various geographic distributions of homes.

The following diagram illustrates the high-level architecture of Data Wrangler using the refit trained parameter feature. We also show the effect on the data quality without the refit trained parameter and contrast the results at the end.

The workflow includes the following steps:

  1. Perform exploratory data analysis – Create a new flow on Data Wrangler to start the exploratory data analysis (EDA). Import business data to understand, clean, aggregate, transform, and prepare your data for training. Refer to Explore Amazon SageMaker Data Wrangler capabilities with sample datasets for more details on performing EDA with Data Wrangler.
  2. Create a data processing job – This step exports all the transformations that you made on the dataset as a flow file stored in the configured Amazon Simple Storage Service (Amazon S3) location. The data processing job with the flow file generated by Data Wrangler applies the transforms and trained parameters learned on your dataset. When the data processing job is complete, the output files are uploaded to the Amazon S3 location configured in the destination node. Note that the refit option is turned off by default. As an alternative to executing the processing job instantly, you can also schedule a processing job in a few clicks using Data Wrangler – Create Job to run at specific times.
  3. Create a data processing job with the refit trained parameter feature – Select the new refit trained parameter feature while creating the job to enforce relearning of your trained parameters on your full or reinforced dataset. As per the Amazon S3 location configuration for storing the flow file, the data processing job creates or updates the new flow file. If you configure the same Amazon S3 location as in Step 2, the data processing job updates the flow file generated in Step 2, which you can use to keep your flow relevant to your data. On completion of the processing job, the output files are uploaded to the S3 bucket configured in the destination node. You can use the updated flow on your entire dataset for a production workflow.

Prerequisites

Before getting started, upload the dataset to an S3 bucket, then import it into Data Wrangler. For instructions, refer to Import data from Amazon S3.

Let’s now walk through the steps mentioned in the architecture diagram.

Perform EDA in Data Wrangler

To try out the refit trained parameter feature, set up the following analysis and transformation in Data Wrangler. At the end of setting up EDA, Data Wrangler creates a flow file captured with trained parameters from the dataset.

  1. Create a new flow in Amazon SageMaker Data Wrangler for exploratory data analysis.
  2. Import the business data you uploaded to Amazon S3.
  3. You can preview the data and options for choosing the file type, delimiter, sampling, and so on. For this example, we use the First K sampling option provided by Data Wrangler to import the first 50,000 records from the dataset.
  4. Choose Import.

  1. After you check out the data type matching applied by Data Wrangler, add a new analysis.

  1. For Analysis type, choose Data Quality and Insights Report.
  2. Choose Create.

With the Data Quality and Insights Report, you get a brief summary of the dataset with general information such as missing values, invalid values, feature types, outlier counts, and more. In this example, we pick the features property_type and city and apply transformations to them to illustrate the refit trained parameter feature.

Let’s focus on the feature property_type from the dataset. In the report’s Feature Details section, you can see that property_type is a categorical feature with six unique values derived from the 50,000 records sampled by Data Wrangler. The complete dataset may have more categories for the feature property_type. For a feature with many unique values, you may prefer ordinal encoding. If the feature has only a few unique values, a one-hot encoding approach can be used. For this example, we opt for one-hot encoding on property_type.

Similarly, for the city feature, which is a text data type with a large number of unique values, let’s apply ordinal encoding to this feature.

  1. Navigate to the Data Wrangler flow, choose the plus sign, and choose Add transform.

  1. Choose the Encode categorical option for transforming categorical features.

From the Data Quality and Insights Report, feature property_type shows six unique categories: CONDO, LOT, MANUFACTURED, SINGLE_FAMILY, MULTI_FAMILY, and TOWNHOUSE.

  1. For Transform, choose One-hot encode.

After applying one-hot encoding on feature property_type, you can preview all six categories as separate features added as new columns. Note that 50,000 records were sampled from your dataset to generate this preview. While running a Data Wrangler processing job with this flow, these transformations are applied to your entire dataset.

  1. Add a new transform and choose Encode Categorical to apply a transform on the feature city, which has a larger number of unique categorical text values.
  2. To encode this feature into a numeric representation, choose Ordinal encode for Transform.

  1. Choose Preview on this transform.

You can see that the categorical feature city is mapped to ordinal values in the output column e_city.
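
For intuition, the following minimal sketch shows how ordinal encoding behaves, using scikit-learn’s OrdinalEncoder (an illustration only, not Data Wrangler’s implementation; requires scikit-learn 0.24 or later). Values seen during fitting get integer codes, and unseen values fall back to NaN, mirroring the invalid handling behavior discussed later in this post.

import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Fit on a small sample of city names; each distinct value gets an integer code.
cities = np.array([["Seattle"], ["Austin"], ["Denver"], ["Austin"]])
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=np.nan)
print(encoder.fit_transform(cities))    # Austin -> 0.0, Denver -> 1.0, Seattle -> 2.0

# A city that wasn't in the fitted sample becomes NaN instead of a valid code.
print(encoder.transform([["Boston"]]))  # [[nan]]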

  1. Add this step by choosing Update.

  1. You can set the destination to Amazon S3 to store the transformed dataset and generate the output as a CSV file.

Data Wrangler stores the workflow you defined in the user interface as a flow file and uploads it to the configured data processing job’s Amazon S3 location. This flow file is used when you create Data Wrangler processing jobs to apply the transforms on larger datasets, or to transform new reinforcement data to retrain the model.
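
If you want to inspect the flow file yourself, it is a JSON document that you can download from the configured S3 location. The snippet below is a rough sketch; the bucket, key, and node field names (node_id, type, operator) are illustrative assumptions, and the exact schema can vary by Data Wrangler version.

import json
import boto3

s3 = boto3.client("s3")

# Hypothetical location -- replace with the bucket and key Data Wrangler used for your flow.
obj = s3.get_object(Bucket="<your-bucket>", Key="data_wrangler_flows/housing.flow")
flow = json.loads(obj["Body"].read())

# List the transformation nodes captured in the flow (field names may differ by version).
for node in flow.get("nodes", []):
    print(node.get("node_id"), node.get("type"), node.get("operator"))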

Launch a Data Wrangler data processing job without refit enabled

Now you can see how the refit option uses trained parameters on new datasets. For this demonstration, we define two Data Wrangler processing jobs operating on the same data. The first processing job won’t enable refit; for the second processing job, we use refit. We compare the effects at the end.

  1. Choose Create job to initiate a data processing job with Data Wrangler.

  1. For Job name, enter a name.
  2. Under Trained parameters, do not select Refit.
  3. Choose Configure job.

  1. Configure the job parameters like instance types, volume size, and Amazon S3 location for storing the output flow file.
  2. Data Wrangler creates a flow file in the configured flow file S3 location. The flow’s transformations learn trained parameters; later, we use the refit option to relearn these parameters.
  3. Choose Create.

Wait for the data processing job to complete to see the transformed data in the S3 bucket configured in the destination node.

Launch a Data Wrangler data processing job with refit enabled

Let’s create another processing job, this time with the refit trained parameter feature enabled. This option enforces relearning of the trained parameters on the entire dataset. When this data processing job is complete, a flow file is created or updated in the configured Amazon S3 location.

  1. Choose Create job.

  1. For Job name, enter a name.
  2. For Trained parameters, select Refit.
  3. If you choose View all, you can review all the trained parameters.

  1. Choose Configure job.
  2. Enter the Amazon S3 flow file location.
  3. Choose Create.

Wait for the data processing job to complete.

Refer to the configured S3 bucket in the destination node to view the data generated by the data processing job running the defined transforms.

Export to Python code for running Data Wrangler processing jobs

As an alternative to starting the processing jobs using the Create job option in Data Wrangler, you can trigger the data processing jobs by exporting the Data Wrangler flow to a Jupyter notebook. Data Wrangler generates a Jupyter notebook with inputs, outputs, processing job configurations, and code for job status checks. You can change or update the parameters as per your data transformation requirements.

  1. Choose the plus sign next to the final Transform node.
  2. Choose Export to and Amazon S3 (Via Jupyter Notebook).

You can see a Jupyter notebook opened with inputs, outputs, processing job configurations, and code for job status checks.

  1. To enforce the refit trained parameters option via code, set the refit parameter to True.
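
The generated notebook exposes refit as a boolean flag. The variable name shown below (refit_trained_params) is an assumption based on recent notebook versions; verify it against the configuration cell in your generated code.

# In the generated notebook's configuration cell (the flag name may differ in your version):
refit_trained_params = True  # relearn trained parameters on the full dataset before applying the transforms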

Compare data processing job results

After the Data Wrangler processing jobs are complete, you can review their outputs in the Amazon S3 destination locations you configured.

To compare the transformation results, create two new Data Wrangler flows, one for each processing job output, and run the Data Quality and Insights Report on each.

  1. Create a new flow in Amazon SageMaker Data Wrangler.
  2. Import the output file of the data processing job without refit enabled from Amazon S3.
  3. Add a new analysis.
  4. For Analysis type, choose Data Quality and Insights Report.
  5. Choose Create.


Repeat the preceding steps to create a new Data Wrangler flow that analyzes the output of the data processing job with refit enabled.

Now let’s look at the outputs of the processing jobs for the feature property_type using the Data Quality and Insights Reports. Scroll to the Feature Details section of each report and find property_type.

The processing job with refit enabled has refit the trained parameters on the entire dataset and learned the new value APARTMENT, resulting in seven distinct values for property_type on the full dataset.

The normal processing job applied the trained parameters learned on the sample dataset, which include only six distinct values for the property_type feature. For rows with property_type APARTMENT, the invalid handling strategy Skip is applied: the data processing job doesn’t learn this new category, and the one-hot encoding skips the APARTMENT category present in the new data.

Let’s now focus on another feature, city. The refit trained parameter processing job has relearned all the values available for the city feature, considering the new data.

As shown in the Feature Summary section of the report, the new encoded feature column e_city has 100% valid values when the refit trained parameter feature is used.

In contrast, the normal processing job has 82.4% missing values in the new encoded feature column e_city. This is because only the trained parameters learned on the sample set are applied to the full dataset, and the data processing job performs no refitting.
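
You can also quantify this difference directly from the job outputs. The following sketch assumes two hypothetical output CSV locations (replace them with the S3 destinations you configured) and that pandas can read from S3 (the s3fs package is installed):

import pandas as pd

# Hypothetical output locations for the two processing jobs.
outputs = {
    "without refit": "s3://<your-bucket>/dw-output/no-refit/part-00000.csv",
    "with refit": "s3://<your-bucket>/dw-output/refit/part-00000.csv",
}

for label, uri in outputs.items():
    df = pd.read_csv(uri)
    pct_missing = df["e_city"].isna().mean() * 100
    print(f"{label}: {pct_missing:.1f}% missing values in e_city")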

The following histograms depict the ordinal encoded feature e_city. The first histogram is of the feature transformed with the refit option.

The next histogram is of the feature transformed without the refit option. The orange column shows the missing values (NaN) in the Data Quality and Insights Report. The new values that aren’t learned from the sample dataset are replaced with Not a Number (NaN), as configured by the invalid handling strategy in the Data Wrangler UI.

The data processing job with the refit trained parameter relearned the property_type and city features, considering the new values from the entire dataset. Without the refit trained parameter, the data processing job only uses the trained parameters previously learned on the sampled dataset. It then applies them to the new data, but the new values aren’t considered for encoding. This has implications for model accuracy.

Clean up

When you’re not using Data Wrangler, it’s important to shut down the instance on which it runs to avoid incurring additional fees.

To avoid losing work, save your data flow before shutting Data Wrangler down.

  1. To save your data flow in Amazon SageMaker Studio, choose File, then choose Save Data Wrangler Flow. Data Wrangler automatically saves your data flow every 60 seconds.
  2. To shut down the Data Wrangler instance, in Studio, choose Running Instances and Kernels.
  3. Under RUNNING APPS, choose the shutdown icon next to the sagemaker-data-wrangler-1.0 app.

  1. Choose Shut down all to confirm.

Data Wrangler runs on an ml.m5.4xlarge instance. This instance disappears from RUNNING INSTANCES when you shut down the Data Wrangler app.

After you shut down the Data Wrangler app, it has to restart the next time you open a Data Wrangler flow file. This can take a few minutes.

Conclusion

In this post, we provided an overview of the refit trained parameter feature in Data Wrangler. With this new feature, you can store the trained parameters in the Data Wrangler flow, and the data processing jobs use these trained parameters to apply the learned transformations on large datasets or reinforcement datasets. You can apply this option to transforms that vectorize text features, process numerical data, and handle outliers.

Preserving trained parameters throughout the data processing stage of the ML lifecycle simplifies and reduces the data processing steps, supports robust feature engineering, and supports model training and reinforcement training on new data.

We encourage you to try out this new feature for your data processing requirements.


About the authors

Hariharan Suresh is a Senior Solutions Architect at AWS. He is passionate about databases, machine learning, and designing innovative solutions. Prior to joining AWS, Hariharan was a product architect, core banking implementation specialist, and developer, and worked with BFSI organizations for over 11 years. Outside of technology, he enjoys paragliding and cycling.

Santosh Kulkarni is an Enterprise Solutions Architect at Amazon Web Services who works with sports customers in Australia. He is passionate about building large-scale distributed applications to solve business problems using his knowledge in AI/ML, big data, and software development.

Vishaal Kapoor is a Senior Applied Scientist with AWS AI. He is passionate about helping customers understand their data in Data Wrangler. In his spare time, he mountain bikes, snowboards, and spends time with his family.

Aniketh Manjunath is a Software Development Engineer at Amazon SageMaker. He helps support Amazon SageMaker Data Wrangler and is passionate about distributed machine learning systems. Outside of work, he enjoys hiking, watching movies, and playing cricket.

Read More

Run machine learning inference workloads on AWS Graviton-based instances with Amazon SageMaker

Today, we are launching Amazon SageMaker inference on AWS Graviton to enable you to take advantage of the price, performance, and efficiency benefits that come from Graviton chips.

Graviton-based instances are available for model inference in SageMaker. This post helps you migrate and deploy a machine learning (ML) inference workload from x86 to Graviton-based instances in SageMaker. We provide a step-by-step guide to deploy your SageMaker trained model to Graviton-based instances, cover best practices when working with Graviton, discuss the price-performance benefits, and demo how to deploy a TensorFlow model on a SageMaker Graviton instance.

Brief overview of Graviton

AWS Graviton is a family of processors designed by AWS that provide the best price-performance and are more energy efficient than their x86 counterparts. AWS Graviton 3 processors are the latest in the Graviton processor family and are optimized for ML workloads, including support for bfloat16, and twice the Single Instruction Multiple Data (SIMD) bandwidth. When these two features are combined, Graviton 3 can deliver up to three times better performance vs. Graviton 2 instances. Graviton 3 also uses up to 60% less energy for the same performance as comparable Amazon Elastic Compute Cloud (Amazon EC2) instances. This is a great feature if you want to reduce your carbon footprint and achieve your sustainability goals.

Solution overview

To deploy your models to Graviton instances, you either use AWS Deep Learning Containers or bring your own containers compatible with Arm v8.2 architecture.

The migration (or new deployment) of your models from x86-based instances to Graviton instances is simple because AWS provides containers to host models with PyTorch, TensorFlow, Scikit-learn, and XGBoost, and the models are architecture agnostic. If you want to bring your own libraries, you can also do so; just ensure that your container is built with an environment that supports the Arm64 architecture. For more information, see Building your own algorithm container.

You need to complete three steps to deploy your model:

  1. Create a SageMaker model: This will contain, among other parameters, the information about the model file location, the container that will be used for the deployment, and the location of the inference script. (If you have an existing model already deployed in an x86-based inference instance, you can skip this step.)
  2. Create an endpoint configuration: This will contain information about the type of instance you want for the endpoint (for example, ml.c7g.xlarge for Graviton3), the name of the model you created in step 1, and the number of instances per endpoint.
  3. Launch the endpoint with the endpoint configuration created in step 2.

Prerequisites

Before starting, consider the following prerequisites:

  1. Complete the prerequisites as listed in Prerequisites.
  2. Your model should be either a PyTorch, TensorFlow, XGBoost, or Scikit-learn based model. The following table summarizes the versions currently supported as of this writing. For the latest updates, refer to SageMaker Framework Containers (SM support only).
    Versions supported as of this writing:
      Python: 3.8
      TensorFlow: 2.9.1
      PyTorch: 1.12.1
      Scikit-learn: 1.0-1
      XGBoost: 1.3-1 to 1.5-1
  3. The inference script is stored in Amazon Simple Storage Service (Amazon S3).

In the following sections, we walk you through the deployment steps.

Create a SageMaker model

If you have an existing model already deployed in an x86-based inference instance, you can skip this step. Otherwise, complete the following steps to create a SageMaker model:

  1. Locate the model that you stored in an S3 bucket. Copy the URI.
    You use the model URI later in the MODEL_S3_LOCATION.
  2. Identify the framework version and Python version that was used during model training.
    You need to select a container from the list of available AWS Deep Learning Containers per your framework and Python version. For more information, refer to Introducing multi-architecture container images for Amazon ECR.
  3. Locate the inference Python script URI in the S3 bucket (the common file name is inference.py).
    The inference script URI is needed in the INFERENCE_SCRIPT_S3_LOCATION.
  4. With these variables, you can then call the SageMaker API with the following command:
    client = boto3.client("sagemaker")
    
    client.create_model(
        ModelName="Your model name",
        PrimaryContainer={
            "Image": <AWS_DEEP_LEARNING_CONTAINER_URI>,
            "ModelDataUrl": <MODEL_S3_LOCATION>,
            "Environment": {
            "SAGEMAKER_PROGRAM": "inference.py",
            "SAGEMAKER_SUBMIT_DIRECTORY": <INFERENCE_SCRIPT_S3_LOCATION>,
            "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
            "SAGEMAKER_REGION": <REGION>
            }
        },
        ExecutionRoleArn= <ARN for AmazonSageMaker-ExecutionRole>
    )

You can also create multi-architecture images, and use the same image but with different tags. You can indicate on which architecture your instance will be deployed. For more information, refer to Introducing multi-architecture container images for Amazon ECR.

Create an endpoint config

After you create the model, you have to create an endpoint configuration by running the following command (note the type of instance we’re using):

client.create_endpoint_config(
    EndpointConfigName= <Your endpoint config name>,
    ProductionVariants=[
        {
         "VariantName": "v0",
         "ModelName": "Your model name",
         "InitialInstanceCount": 1,
         "InstanceType": "ml.c7g.xlarge",
        },
    ]
)

The following screenshot shows the endpoint configuration details on the SageMaker console.

Launch the endpoint

With the endpoint config created in the previous step, you can deploy the endpoint:

client.create_endpoint(
    EndpointName = "<Your endpoint name>",
    EndpointConfigName = "<Your endpoint config name>"
    )

Wait until your model endpoint is deployed. Predictions can be requested in the same way you request predictions for your endpoints deployed in x86-based instances.

The following screenshot shows your endpoint on the SageMaker console.

What is supported

SageMaker provides performance-optimized Graviton deep learning containers for the TensorFlow and PyTorch frameworks. These containers support computer vision, natural language processing, recommendations, and generic deep and wide model-based inference use cases. In addition to deep learning containers, SageMaker also provides containers for classical ML frameworks such as XGBoost and Scikit-learn. The containers are binary compatible across c6g/m6g and c7g instances, so migrating an inference application from one generation to another is seamless.

C6g/m6g supports fp16 (half-precision float) and, for compatible models, provides equivalent or better performance compared to c5 instances. C7g substantially increases ML performance by doubling the SIMD width and supporting bfloat16 (bf16), making it the most cost-efficient platform for running your models.

Both c6g/m6g and c7g provide good performance for classical ML (for example, XGBoost) compared to other CPU instances in SageMaker. Bfloat16 support on c7g allows efficient deployment of bf16-trained or AMP (Automatic Mixed Precision) trained models. The Arm Compute Library (ACL) backend on Graviton provides bfloat16 kernels that can accelerate even the fp32 operators via fast math mode, without model quantization.

Recommended best practices

On Graviton instances, every vCPU is a physical core. There is no contention for the common CPU resources (unlike SMT), and workload performance scales linearly with every vCPU addition. Therefore, it’s recommended to use batch inference whenever the use case allows. This enables efficient use of the vCPUs by processing the batch in parallel on each physical core. If batch inference isn’t possible, choose the optimal instance size for a given payload to ensure that the OS thread scheduling overhead doesn’t outweigh the compute power that comes with the additional vCPUs.
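
As a rough sketch of what batch inference looks like against a SageMaker endpoint, the example below packs several samples into a single JSON request instead of invoking the endpoint once per sample. The endpoint name and input shape are placeholders for a TensorFlow Serving-based endpoint like the one deployed later in this post:

import json
import boto3
import numpy as np

runtime = boto3.client("sagemaker-runtime")

# Placeholders -- use your own endpoint name and input shape.
ENDPOINT_NAME = "<your-endpoint-name>"
batch = np.random.rand(8, 32, 32, 3)  # 8 samples sent in one request

response = runtime.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType="application/json",
    Body=json.dumps(batch.tolist()),
)
predictions = json.loads(response["Body"].read())  # one prediction per sample in the batch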

TensorFlow comes with Eigen kernels by default, and it’s recommended to switch to OneDNN with ACL to get the most optimized inference backend. The OneDNN backend and the bfloat-16 fast math mode can be enabled while launching the container service:

docker run -p 8501:8501 --name tfserving_resnet \
  --mount type=bind,source=/tmp/resnet,target=/models/resnet \
  -e MODEL_NAME=resnet -e TF_ENABLE_ONEDNN_OPTS=1 \
  -e DNNL_DEFAULT_FPMATH_MODE=BF16 \
  -t tfs:mkl_aarch64

The preceding serving command hosts a standard resnet50 model with two important configurations:

-e TF_ENABLE_ONEDNN_OPTS=1
-e DNNL_DEFAULT_FPMATH_MODE=BF16

These can be passed to the inference container in the following way:

client.create_model(
    ModelName="Your model name",
    PrimaryContainer={
    "Image": <AWS_DEEP_LEARNING_CONTAINER_URI>,
    "ModelDataUrl": <MODEL_S3_LOCATION>,
    "Environment": {
        "SAGEMAKER_PROGRAM": "inference.py",
        "SAGEMAKER_SUBMIT_DIRECTORY": "<INFERENCE_SCRIPT_S3_LOCATION>",
        "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
        "SAGEMAKER_REGION": <REGION>,
        "TF_ENABLE_ONEDNN_OPTS": "1",
        "DNNL_DEFAULT_FPMATH_MODE": "BF16"
         }
     },
     ExecutionRoleArn='ARN for AmazonSageMaker-ExecutionRole'
)

Deployment example

In this post, we show you how to deploy a TensorFlow model, trained in SageMaker, on a Graviton-powered SageMaker inference instance.

You can run the code sample either in a SageMaker notebook instance, an Amazon SageMaker Studio notebook, or a Jupyter notebook in local mode. You need to retrieve the SageMaker execution role if you use a Jupyter notebook in local mode.

The following example considers the CIFAR-10 dataset. You can follow the notebook example from the SageMaker examples GitHub repo to reproduce the model that is used in this post. We use the trained model and the cifar10_keras_main.py Python script for inference.

The model is stored in an S3 bucket: s3://aws-ml-blog/artifacts/run-ml-inference-on-graviton-based-instances-with-amazon-sagemaker/model.tar.gz

The cifar10_keras_main.py script, which can be used for the inference, is stored at: s3://aws-ml-blog/artifacts/run-ml-inference-on-graviton-based-instances-with-amazon-sagemaker/script/cifar10_keras_main.py

We use the us-east-1 Region and deploy the model on an ml.c7g.xlarge Graviton-based instance. Based on this, the URI of our AWS Deep Learning Container is 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference-graviton:2.9.1-cpu-py38-ubuntu20.04-sagemaker

  1. Set up with the following code:
    import sagemaker
    import boto3
    import datetime
    import json
    import gzip
    import os
    
    sagemaker_session = sagemaker.Session()
    bucket = sagemaker_session.default_bucket()
    role = sagemaker.get_execution_role()
    region = sagemaker_session.boto_region_name

  2. Download the dataset for endpoint testing:
    from keras.datasets import cifar10
    (x_train, y_train), (x_test, y_test) = cifar10.load_data()

  3. Create the model and endpoint config, and deploy the endpoint:
    timestamp = "{:%Y-%m-%d-%H-%M-%S}".format(datetime.datetime.now())
    
    client = boto3.client("sagemaker")
    
    MODEL_NAME = f"graviton-model-{timestamp}"
    ENDPOINT_NAME = f"graviton-endpoint-{timestamp}"
    ENDPOINT_CONFIG_NAME = f"graviton-endpoint-config-{timestamp}"
    
    # create sagemaker model
    create_model_response = client.create_model(
        ModelName=MODEL_NAME,
        PrimaryContainer={
        "Image":  "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference-graviton:2.9.1-cpu-py38-ubuntu20.04-sagemaker ",
        "ModelDataUrl":  "s3://aws-ml-blog/artifacts/run-ml-inference-on-graviton-based-instances-with-amazon-sagemaker/model.tar.gz",
        "Environment": {
            "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
            "SAGEMAKER_REGION": region
            }
        },
        ExecutionRoleArn=role
    )
    print ("create_model API response", create_model_response)

  4. Optionally, you can add your inference script to Environment in create_model if you didn’t originally add it as an artifact to your SageMaker model during training:
    "SAGEMAKER_PROGRAM": "inference.py",
    "SAGEMAKER_SUBMIT_DIRECTORY": <INFERENCE_SCRIPT_S3_LOCATION>,
    		
    # create sagemaker endpoint config
    create_endpoint_config_response = client.create_endpoint_config(
        EndpointConfigName=ENDPOINT_CONFIG_NAME,
        ProductionVariants=[
            {
             "VariantName": "v0",
             "ModelName": MODEL_NAME,
             "InitialInstanceCount": 1,
             "InstanceType": "ml.c7g.xlarge" 
            },
        ]
    )
    print ("ncreate_endpoint_config API response", create_endpoint_config_response)
    
    # create sagemaker endpoint
    create_endpoint_response = client.create_endpoint(
        EndpointName = ENDPOINT_NAME,
        EndpointConfigName = ENDPOINT_CONFIG_NAME,
    )
    print ("ncreate_endpoint API response", create_endpoint_response)   
    

    You have to wait a couple of minutes for the deployment to take place.

  5. Verify the endpoint status with the following code:
    describe_response = client.describe_endpoint(EndpointName=ENDPOINT_NAME)
    print(describe_response["EndpointStatus"])

    You can also check the AWS Management Console to see when your model is deployed.

  6. Set up the runtime environment to invoke the endpoints:
    runtime = boto3.Session().client(service_name="runtime.sagemaker")

    Now we prepare the payload to invoke the endpoint. We use the same type of images used for the training of the model. These were downloaded in previous steps.

  7. Cast the payload to tensors and set the correct format that the model is expecting. For this example, we only request one prediction.
    input_image = x_test[0].reshape(1,32,32,3)

  8. Invoke the endpoint with the JSON payload. The response body contains the model output as an array, which you can turn into probabilities by applying a softmax (see the sketch after this list):
    CONTENT_TYPE = 'application/json'
    ACCEPT = 'application/json'
    PAYLOAD = json.dumps(input_image.tolist())
    
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME, 
        ContentType=CONTENT_TYPE,
        Accept=ACCEPT,
        Body=PAYLOAD
    )
        
    print(response['Body'].read().decode())
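
The following is a minimal sketch of the softmax step mentioned in step 8. It assumes the TensorFlow Serving default response format ({"predictions": [[...]]}) and that you parse the response body only once (re-invoke the endpoint if you already consumed the body with the print above):

import json
import numpy as np

body = json.loads(response["Body"].read())      # parse the JSON response once
logits = np.array(body["predictions"][0])       # raw outputs for the single image we sent

# Numerically stable softmax turns the raw outputs into class probabilities.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print("predicted class:", int(probs.argmax()), "with probability", float(probs.max()))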

Clean up resources

The services involved in this solution incur costs. When you’re done using this solution, clean up the following resources:

client.delete_endpoint(EndpointName=ENDPOINT_NAME)
client.delete_endpoint_config(EndpointConfigName=ENDPOINT_CONFIG_NAME)
client.delete_model(ModelName=MODEL_NAME)

Price-performance comparison

Graviton-based instances offer the lowest price and the best price-performance when compared to x86-based instances. Similar to EC2 instances, the SageMaker inference endpoints with ml.c6g instances (Graviton 2) offer a 20% lower price compared to ml.c5, and the Graviton 3 ml.c7g instances are 15% cheaper than ml.c6 instances. For more information, refer to Amazon SageMaker Pricing.

Conclusion

In this post, we showcased the newly launched SageMaker capability to deploy models in Graviton-powered inference instances. We gave you guidance on best practices and briefly discussed the price-performance benefits of the new type of inference instances.

To learn more about Graviton, refer to AWS Graviton Processor. You can get started with AWS Graviton-based EC2 instances on the Amazon EC2 console and by referring to the AWS Graviton Technical Guide. You can deploy a SageMaker model endpoint for inference on Graviton with the sample code in this blog post.


About the authors

Victor Jaramillo, PhD, is a Senior Machine Learning Engineer in AWS Professional Services. Prior to AWS, he was a university professor and research scientist in predictive maintenance. In his free time, he enjoys riding his motorcycle and DIY motorcycle mechanics.

Zmnako Awrahman, PhD, is a Practice Manager, ML SME, and Machine Learning Technical Field Community (TFC) member at Amazon Web Services. He helps customers leverage the power of the cloud to extract value from their data with data analytics and machine learning.

Sunita Nadampalli is a Software Development Manager at AWS. She leads Graviton software performance optimizations for machine learning, HPC, and multimedia workloads. She is passionate about open-source development and delivering cost-effective software solutions with Arm SoCs.

Johna Liu is a Software Development Engineer in the Amazon SageMaker team. Her current work focuses on helping developers efficiently host machine learning models and improve inference performance. She is passionate about spatial data analysis and using AI to solve societal problems.

Alan Tan is a Senior Product Manager with SageMaker, leading efforts on large model inference. He’s passionate about applying machine learning to the area of analytics. Outside of work, he enjoys the outdoors.

Read More

Going Green: New Generation of NVIDIA-Powered Systems Show Way Forward

With the end of Moore’s law, traditional approaches to meet the insatiable demand for increased computing performance will require disproportionate increases in costs and power.

At the same time, the need to slow the effects of climate change will require more efficient data centers, which already consume more than 200 terawatt-hours of energy each year, or around 2% of the world’s energy usage.

Released today, the new Green500 list of the world’s most-efficient supercomputers demonstrates the energy efficiency of accelerated computing, which is already used in all of the top 30 systems on the list. Its impact on energy efficiency is staggering.

We estimate the TOP500 systems require more than 5 terawatt-hours of energy per year, or $750 million worth of energy, to operate.

But that could be slashed by more than 80% to just $150 million — saving 4 terawatt-hours of energy — if these systems were as efficient as the 30 greenest systems on the TOP500 list.

Conversely, with the same power budget as today’s TOP500 systems and the efficiency of the top 30 systems, these supercomputers could deliver 5x today’s performance.

And the efficiency gains highlighted by the latest Green500 systems are just the start. NVIDIA is racing to deliver continuous energy improvements across its CPUs, GPUs, software and systems portfolio.

Hopper’s Green500 Debut

NVIDIA technologies already power 23 of the top 30 systems on the latest Green500 list.

Among the highlights: the Flatiron Institute in New York City topped the Green500 list of most efficient supercomputers with an air-cooled ThinkSystem built by Lenovo featuring NVIDIA Hopper H100 GPUs.

The supercomputer, dubbed Henri, produces 65 billion double-precision, floating-point operations per watt, according to the Green500, and will be used to tackle problems in computational astrophysics, biology, mathematics, neuroscience and quantum physics.

The NVIDIA H100 Tensor Core GPU, based on the NVIDIA Hopper GPU architecture, has up to 6x more AI performance and up to 3x more HPC performance compared to the prior-generation A100 GPU. It’s designed to perform with incredible efficiency. Its second-generation Multi-Instance GPU technology can partition the GPU into smaller compute units, dramatically boosting the number of GPU clients available to data center users.

And the show floor at this year’s SC22 conference is packed with new systems featuring NVIDIA’s latest technologies from ASUS, Atos, Dell Technologies, GIGABYTE, Hewlett Packard Enterprise, Lenovo, QCT and Supermicro.

The fastest new computer on the TOP500 list, Leonardo, hosted and managed by the Cineca nonprofit consortium, and powered by nearly 14,000 NVIDIA A100 GPUs, took the No. 4 spot, while also being the 12th most energy-efficient system.

The latest TOP500 list boasts the highest number of NVIDIA technologies so far.

In total, NVIDIA technologies power 361 of the systems on the TOP500 list, including 90% of the new systems (see chart).

The Next-Generation Accelerated Data Center

NVIDIA is also developing new computing architectures to deliver even greater energy efficiency and performance to the accelerated data center.

The Grace CPU and Grace Hopper Superchips, announced earlier this year, will provide the next big boost in the energy efficiency of the NVIDIA accelerated computing platform. The Grace CPU Superchip delivers up to twice the performance per watt of a traditional CPU, thanks to the incredible efficiency of the Grace CPU and low-power LPDDR5X memory.

Assuming a 1-megawatt HPC data center with 20% of the power allocated for CPU partition and 80% toward the accelerated portion using Grace and Grace Hopper, data centers can get 1.8x more work done for the same power budget compared to a similarly partitioned x86-based data center.

DPUs Driving Additional Efficiency Gains

Along with Grace and Grace Hopper, NVIDIA networking technology is supercharging cloud-native supercomputing just as the increased usage of simulations is accelerating demand for supercomputing services.

Based on NVIDIA’s BlueField-3 DPU, the NVIDIA Quantum-2 InfiniBand platform delivers the extreme performance, broad accessibility and strong security needed by cloud computing providers and supercomputing centers.

The effort, described in a recent whitepaper, demonstrated how DPUs can be used to offload and accelerate networking, security, storage or other infrastructure functions and control-plane applications, reducing server power consumption up to 30%.

The amount of power savings increases as server load increases and can easily save $5 million in electricity costs for a large data center with 10,000 servers over the three-year lifespan of the servers, plus additional savings in cooling, power delivery, rack space and server capital costs.

Accelerated computing with DPUs for networking, security and storage jobs is one of the next big steps for making data centers more power efficient.

More With Less

Breakthroughs like these come as the scientific method is rapidly transforming into an approach driven by data analytics, AI and physics-based simulation, making more efficient computers key to the next generation of scientific breakthroughs.

By providing researchers with a multi-discipline, high-performance computing platform optimized for this new approach — and able to deliver both performance and efficiency — NVIDIA gives scientists an instrument to make critical discoveries that will benefit us all.

More Resources

The post Going Green: New Generation of NVIDIA-Powered Systems Show Way Forward appeared first on NVIDIA Blog.

Read More

Hear what Google’s first Responsible Innovation intern learned

In 2018, we launched Google’s AI Principles to ensure we’re building AI that not only solves important problems and helps people in their daily lives, but also AI that is ethical, fair and safe. At the same time, we launched a central Responsible Innovation team to ensure the rest of Google is held accountable to these AI Principles. As the team grows, we continue to incorporate the perspectives and ideas of people from around the world — and this spring we welcomed our first intern, Lieke Dom. Lieke is based in Amsterdam, recently got her Master’s in Digital Business & Innovation, and is completing her Master’s in Applied Ethics.

I sat down with Lieke to learn more about her experience so far, including how her educational career led her here and what she’s learned from the internship.

Can you tell me a bit about your background?

In undergrad, I studied Communication Science and had some exposure to subjects like ethics and philosophy of technology. Studying at a technical university triggered my interest in this field, so I started a Masters in Philosophy of Science, Technology & Society. While I felt the tools and methodologies that you learn in philosophy are important to technology and business, I realized I didn’t want to go into pure philosophy as my main profession.

Why is that?

I think of ethical decision making as a skill that’s essential to most — if not all — professions. In order for a company, or a society, to truly build ethical technology, everyone involved in the research and product development process has to be equipped with ethical and responsible problem solving skills.

How did this thinking shape your educational focus?

I wanted to think about ethical problems with an emphasis on how we can apply methodologies from ethics and philosophy to contemporary issues. So, I pivoted to a Digital Business & Innovation degree followed by a Masters in Applied Ethics, both of which I’m completing during my internship. By combining these programs, I learned a lot about the opportunities technology provides businesses and the challenges that arise as a result of technological innovation.

Both of those degrees seem really well suited for the field of Responsible Innovation — did you know this was the field you wanted to go into when you chose those degrees?

While I knew I wanted to go into a field that combined ethics and technology, I didn’t know that a team like the Responsible Innovation team existed for most of my academic career. I chose studies based on my interests, but I wasn’t sure what it could bring me in my further career. Then, during my first Masters, a friend of mine gave me a book by Barbara Sher called Refuse to Choose!, which highlights the power of combining seemingly distinct fields. Reading about other people who didn’t choose a specific course and instead studied what interests them made me realize that the most important thing is that your journey makes sense to you. Although my degrees felt pretty haphazard (to others), it made sense to me how these areas complement each other. However, I was unsure about how these would come together in a professional career. So I was excited to find out about Google’s Responsible Innovation initiatives and AI Principles and eventually find a role on this team.

Did your understanding of tech ethics change during your internship?

During my internship I got to sit in on some AI Principles Reviews, a process that assesses proposals for new AI research and application for alignment with our Principles. I’m also working on expanding our body of external case studies so that we can share our learnings with AI practitioners everywhere — my colleague Dr. Molly FitzMorris recently published our team’s first business school case study in partnership with the Berkeley Haas School of Business. I’ve enjoyed working on these case studies because they show how our Principles are operationalized across the whole company.

These experiences deepened my belief that ethical decision making is an important skill for everyone to have, from developers, to designers, and researchers beyond teams like Responsible Innovation. Being on this team has also reinforced that it’s essential to have people tasked with taking deep dives into what the ethical development of technologies like AI should look like, ensuring that other people put those ideas into practice. Ethics aren’t defined or static, so it’s important to have people who devote themselves completely to it.

Can you share any key learnings and takeaways from your internship?

Stay eager to learn, and always ask a lot of questions. Find what genuinely interests you, and don’t be afraid if that strays from traditional or linear career paths; even if those areas don’t seem directly related, interdisciplinary skills and thinking are incredibly valuable.

And if you’re interested in going into tech, don’t limit yourself to purely technical fields. These days, technology is interwoven into almost all aspects of our everyday lives. Understanding the human and cultural components of new technology is essential to understanding its broader impact — and ensuring that it is really serving everyone.

Read More

Speaking the Language of the Genome: Gordon Bell Finalist Applies Large Language Models to Predict New COVID Variants

A finalist for the Gordon Bell special prize for high performance computing-based COVID-19 research has taught large language models (LLMs) a new lingo — gene sequences — that can unlock insights in genomics, epidemiology and protein engineering.

Published in October, the groundbreaking work is a collaboration by more than two dozen academic and commercial researchers from Argonne National Laboratory, NVIDIA, the University of Chicago and others.

The research team trained an LLM to track genetic mutations and predict variants of concern in SARS-CoV-2, the virus behind COVID-19. While most LLMs applied to biology to date have been trained on datasets of small molecules or proteins, this project is one of the first models trained on raw nucleotide sequences — the smallest units of DNA and RNA.

“We hypothesized that moving from protein-level to gene-level data might help us build better models to understand COVID variants,” said Arvind Ramanathan, computational biologist at Argonne, who led the project. “By training our model to track the entire genome and all the changes that appear in its evolution, we can make better predictions about not just COVID, but any disease with enough genomic data.”

The Gordon Bell awards, regarded as the Nobel Prize of high performance computing, will be presented at this week’s SC22 conference by the Association for Computing Machinery, which represents around 100,000 computing experts worldwide. Since 2020, the group has awarded a special prize for outstanding research that advances the understanding of COVID with HPC.

Training LLMs on a Four-Letter Language

LLMs have long been trained on human languages, which usually comprise a couple dozen letters that can be arranged into tens of thousands of words, and joined together into longer sentences and paragraphs. The language of biology, on the other hand, has only four letters representing nucleotides — A, T, G and C in DNA, or A, U, G and C in RNA — arranged into different sequences as genes.

While fewer letters may seem like a simpler challenge for AI, language models for biology are actually far more complicated. That’s because the genome — made up of over 3 billion nucleotides in humans, and about 30,000 nucleotides in coronaviruses — is difficult to break down into distinct, meaningful units.

“When it comes to understanding the code of life, a major challenge is that the sequencing information in the genome is quite vast,” Ramanathan said. “The meaning of a nucleotide sequence can be affected by another sequence that’s much further away than the next sentence or paragraph would be in human text. It could reach over the equivalent of chapters in a book.”

NVIDIA collaborators on the project designed a hierarchical diffusion method that enabled the LLM to treat long strings of around 1,500 nucleotides as if they were sentences.

“Standard language models have trouble generating coherent long sequences and learning the underlying distribution of different variants,” said paper co-author Anima Anandkumar, senior director of AI research at NVIDIA and Bren professor in the computing + mathematical sciences department at Caltech. “We developed a diffusion model that operates at a higher level of detail that allows us to generate realistic variants and capture better statistics.”

Predicting COVID Variants of Concern

Using open-source data from the Bacterial and Viral Bioinformatics Resource Center, the team first pretrained its LLM on more than 110 million gene sequences from prokaryotes, which are single-celled organisms like bacteria. It then fine-tuned the model using 1.5 million high-quality genome sequences for the COVID virus.

By pretraining on a broader dataset, the researchers also ensured their model could generalize to other prediction tasks in future projects — making it one of the first whole-genome-scale models with this capability.

Once fine-tuned on COVID data, the LLM was able to distinguish between genome sequences of the virus’ variants. It was also able to generate its own nucleotide sequences, predicting potential mutations of the COVID genome that could help scientists anticipate future variants of concern.

Figure: Trained on a year’s worth of SARS-CoV-2 genome data, the model can infer the distinction between various viral strains. Each dot on the left corresponds to a sequenced SARS-CoV-2 viral strain, color-coded by variant. The figure on the right zooms into one particular strain of the virus, which captures evolutionary couplings across the viral proteins specific to this strain. Image courtesy of Argonne National Laboratory’s Bharat Kale, Max Zvyagin and Michael E. Papka.

“Most researchers have been tracking mutations in the spike protein of the COVID virus, specifically the domain that binds with human cells,” Ramanathan said. “But there are other proteins in the viral genome that go through frequent mutations and are important to understand.”

The model could also integrate with popular protein-structure-prediction models like AlphaFold and OpenFold, the paper stated, helping researchers simulate viral structure and study how genetic mutations impact a virus’ ability to infect its host. OpenFold is one of the pretrained language models included in the NVIDIA BioNeMo LLM service for developers applying LLMs to digital biology and chemistry applications.

Supercharging AI Training With GPU-Accelerated Supercomputers

The team developed its AI models on supercomputers powered by NVIDIA A100 Tensor Core GPUs — including Argonne’s Polaris, the U.S. Department of Energy’s Perlmutter, and NVIDIA’s in-house Selene system. By scaling up to these powerful systems, they achieved performance of more than 1,500 exaflops in training runs, creating the largest biological language models to date.

“We’re working with models today that have up to 25 billion parameters, and we expect this to significantly increase in the future,” said Ramanathan. “The model size, the genetic sequence lengths and the amount of training data needed means we really need the computational complexity provided by supercomputers with thousands of GPUs.”

The researchers estimate that training a version of their model with 2.5 billion parameters took over a month on around 4,000 GPUs. The team, which was already investigating LLMs for biology, spent about four months on the project before publicly releasing the paper and code. The GitHub page includes instructions for other researchers to run the model on Polaris and Perlmutter.

The NVIDIA BioNeMo framework, available in early access on the NVIDIA NGC hub for GPU-optimized software, supports researchers scaling large biomolecular language models across multiple GPUs. Part of the NVIDIA Clara Discovery collection of drug discovery tools, the framework will support chemistry, protein, DNA and RNA data formats.

Find NVIDIA at SC22.

Image at top represents COVID strains sequenced by the researchers’ LLM. Each dot is color-coded by COVID variant. Image courtesy of  Argonne National Laboratory’s Bharat Kale, Max Zvyagin and Michael E. Papka. 

The post Speaking the Language of the Genome: Gordon Bell Finalist Applies Large Language Models to Predict New COVID Variants appeared first on NVIDIA Blog.

Read More

Going the Distance: NVIDIA Platform Solves HPC Problems at the Edge

Collaboration among researchers, like the scientific community itself, spans the globe.

Universities and enterprises sharing work over long distances require a common language and secure pipeline to get every device — from microscopes and sensors to servers and campus networks — to see and understand the data each is transmitting. The increasing amount of data that needs to be stored, transmitted and analyzed only compounds the challenge.

To overcome this problem, NVIDIA has introduced a high performance computing platform that combines edge computing and AI to capture and consolidate streaming data from scientific edge instruments, and then allow the devices to talk to each other over long distances.

The platform consists of three major components. NVIDIA Holoscan is a software development kit that data scientists and domain experts can use to build GPU-accelerated pipelines for sensors that stream data. MetroX-3 is a new long-haul system that extends the connectivity of the NVIDIA Quantum-2 InfiniBand platform. And NVIDIA BlueField-3 DPUs provide secure and intelligent data migration.

Researchers can use the new NVIDIA platform for HPC edge computing to  securely communicate and collaborate on solving problems and bring their disparate devices and algorithms together to operate as one large supercomputer.

Holoscan for HPC at the Edge

Accelerated by GPU computing platforms — including NVIDIA IGX, HGX, DGX systems — NVIDIA Holoscan delivers the extreme performance required to process massive streams of data generated by the world’s scientific instruments.

NVIDIA Holoscan for HPC includes new APIs for C++ and Python that HPC researchers can use to build sensor data processing workflows that are flexible enough for non-image formats and scalable enough to translate raw data into real-time insights.

Holoscan also manages memory allocation to ensure zero-copy data exchanges, so developers can focus on the workflow logic and not worry about managing file and memory I/O.

The new features in Holoscan will be available to all the HPC developers next month. Sign up to be notified of early access to Holoscan 0.4 SDK.

MetroX-3 Goes the Distance

The NVIDIA MetroX-3 long-haul system, available next month, extends the latest cloud-native capabilities of the NVIDIA Quantum-2 InfiniBand platform from the edge to the HPC data center core. It enables GPUs at sites up to 25 miles (40 km) apart to securely share data over the InfiniBand network.

Taking advantage of native remote direct memory access, users can easily migrate data and compute jobs from one InfiniBand-connected mini-cluster to the main data center, or combine geographically dispersed compute clusters for higher overall performance and scalability.

Data center operators can efficiently provision, monitor and operate across all the InfiniBand-connected data center networks by using the NVIDIA Unified Fabric Manager to manage their MetroX-3 systems.

BlueField for Secure, Efficient HPC

NVIDIA BlueField data processing units offload, accelerate and isolate advanced networking, storage and security services to boost performance and efficiency for modern HPC.

During SC22, system software company Zettar is demonstrating its data migration and storage offload solution based on BlueField-3. Zettar software can consolidate data migration tasks to a data center footprint of 4U rack space, which today requires 13U with x86-based solutions.

Learn more about the new NVIDIA platform for HPC at the edge.

The post Going the Distance: NVIDIA Platform Solves HPC Problems at the Edge appeared first on NVIDIA Blog.

Read More

Supercomputing Superpowers: NVIDIA Brings Digital Twin Simulation to HPC Data Center Operators

Supercomputing Superpowers: NVIDIA Brings Digital Twin Simulation to HPC Data Center Operators

The technologies powering the world’s 7 million data centers are changing rapidly. The latest advances have allowed IT organizations to reduce costs even while dealing with exponential data growth.

Simulation and digital twins can help data center designers, builders and operators create highly efficient and performant facilities. But building a digital twin that can accurately represent all components of an AI supercomputing facility is a massive, complex undertaking.

The NVIDIA Omniverse simulation platform helps address this challenge by streamlining the process for collaborative virtual design. An Omniverse demo at SC22 showcased how the people behind data centers can use this open development platform to enhance the design and development of complex supercomputing facilities.

Omniverse, for the first time, lets data center operators aggregate real-time data inputs from their core third-party computer-aided design, simulation and monitoring applications so they can see and work with their complete datasets in real time.

The demo shows how Omniverse allows users to tap into the power of accelerated computing, simulation and operational digital twins connected to real-time monitoring and AI. This enables teams to streamline facility design, accelerate construction and deployment, and optimize ongoing operations.

The demo also highlighted NVIDIA Air, a data center simulation platform designed to work in conjunction with Omniverse to simulate the network — the central nervous system of the data center. With NVIDIA Air, teams can model the entire network stack, allowing them to automate and validate network hardware and software prior to bring-up.

Creating Digital Twins to Elevate Design and Simulation

To plan and construct one of NVIDIA’s latest AI supercomputers, designers and engineers collected multiple engineering CAD datasets from third-party industry tools such as Autodesk Revit, PTC Creo and Trimble SketchUp. This allowed them to view the Universal Scene Description-based model in full fidelity and to collaboratively iterate on the design in real time.

PATCH MANAGER is an enterprise software application for planning cabling, assets and physical layer point-to-point connectivity in network domains. With PATCH MANAGER connected to Omniverse, the complex topology of port-to-port connections, rack and node layouts, and cabling can be integrated directly into the live model. This enables data center engineers to see the full view of the model and its dependencies.

To predict airflow and heat transfer, engineers used Cadence 6SigmaDCX, computational fluid dynamics software. Engineers can also use AI surrogates trained with NVIDIA Modulus for “what-if” analysis in near-real time. This lets teams simulate changes to complex thermal and cooling systems and see the results instantly.
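To illustrate why surrogates make near-real-time what-if analysis possible, here is a toy sketch: it is not the NVIDIA Modulus API, just a simple response-surface surrogate fit with NumPy, and the stand-in "CFD" function and its coefficients are invented for the example.

    # Toy illustration of the surrogate idea: fit a cheap model to a handful of
    # expensive "CFD" runs, then answer what-if queries instantly. Not the
    # NVIDIA Modulus API; the stand-in physics below is invented.
    import numpy as np

    def cfd_peak_temp(load_kw, setpoint_c):
        """Stand-in for an expensive CFD run (normally minutes to hours per case)."""
        return 0.04 * load_kw + 0.9 * setpoint_c + 0.0005 * load_kw * setpoint_c

    # A small sampled design space, as if produced by a batch of CFD runs
    rng = np.random.default_rng(0)
    loads = rng.uniform(200.0, 800.0, size=50)      # IT load, kW
    setpoints = rng.uniform(16.0, 28.0, size=50)    # cooling setpoint, deg C
    temps = cfd_peak_temp(loads, setpoints)

    # Fit a response-surface surrogate by least squares on a few basis terms
    X = np.column_stack([np.ones_like(loads), loads, setpoints, loads * setpoints])
    coef, *_ = np.linalg.lstsq(X, temps, rcond=None)

    def surrogate_peak_temp(load_kw, setpoint_c):
        """Near-instant what-if query in place of a full CFD run."""
        x = np.array([1.0, load_kw, setpoint_c, load_kw * setpoint_c])
        return float(x @ coef)

    print(surrogate_peak_temp(600.0, 22.0))  # surrogate prediction for a new scenario
    print(cfd_peak_temp(600.0, 22.0))        # stand-in "ground truth" for comparison

Modulus trains neural network surrogates on physics and simulation data rather than a four-term least-squares fit, but the workflow is the same: pay for the expensive simulations once, then query the learned model interactively.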

And with NVIDIA Air, the exact network topology — including protocols, monitoring and automation — can be simulated and prevalidated.

Once construction of a data center is complete, its sensors, control system and telemetry can be connected to the digital twin inside Omniverse, enabling real-time monitoring of operations.
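As a rough sketch of how live readings can be attached to a USD-based twin, consider the following example. It is illustrative only: it uses the open-source pxr USD Python bindings (the usd-core package) rather than the Omniverse live-sync connectors and Nucleus service the demo relies on, and the prim paths, attribute names and values are invented.

    # Illustrative only: attach telemetry to prims in a USD scene using the
    # open-source pxr bindings (pip install usd-core). The Omniverse demo
    # streams this kind of data through Omniverse connectors and Nucleus.
    from pxr import Sdf, Usd, UsdGeom

    stage = Usd.Stage.CreateNew("datacenter_twin.usda")
    data_center = UsdGeom.Xform.Define(stage, "/DataCenter")
    rack = UsdGeom.Xform.Define(stage, "/DataCenter/Rack01").GetPrim()
    stage.SetDefaultPrim(data_center.GetPrim())

    # Custom attributes carry telemetry alongside the geometry
    temp_attr = rack.CreateAttribute("inletTempC", Sdf.ValueTypeNames.Float)
    power_attr = rack.CreateAttribute("powerDrawKw", Sdf.ValueTypeNames.Float)

    # Record readings as time samples as they arrive from the physical facility
    readings = [(22.1, 11.8), (22.4, 12.3), (23.0, 12.9)]
    for t, (temp_c, power_kw) in enumerate(readings):
        temp_attr.Set(temp_c, t)     # second argument is the USD time code
        power_attr.Set(power_kw, t)

    stage.Save()

A monitoring or AI service can then read the same attributes back from the twin to drive the kinds of alerts and optimization recommendations described below.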

With a perfectly synchronized digital twin, engineers can simulate common dangers such as power peaking or cooling system failures. Operators can benefit from AI-recommended changes that optimize for key priorities like boosting energy efficiency and reducing carbon footprint. The digital twin also allows them to test and validate software and component upgrades before deploying to the physical data center.

Catch up on the latest announcements by watching NVIDIA’s SC22 special address, and learn more about NVIDIA Omniverse.

The post Supercomputing Superpowers: NVIDIA Brings Digital Twin Simulation to HPC Data Center Operators appeared first on NVIDIA Blog.

Read More

NVIDIA and Dell Technologies Deliver AI and HPC Performance in Leaps and Bounds With Hopper, at SC22

NVIDIA and Dell Technologies Deliver AI and HPC Performance in Leaps and Bounds With Hopper, at SC22

Whether focused on tiny atoms or the immensity of outer space, supercomputing workloads benefit from the flexibility that the largest systems provide scientists and researchers.

To meet the needs of organizations with such large AI and high performance computing (HPC) workloads, Dell Technologies today unveiled the Dell PowerEdge XE9680 system — its first system with eight NVIDIA GPUs interconnected with NVIDIA NVLink — at SC22, an international supercomputing conference running through Friday.

The Dell PowerEdge XE9680 system is built on the NVIDIA HGX H100 architecture and packs eight NVIDIA H100 Tensor Core GPUs to serve the growing demand for large-scale AI and HPC workflows.

These include large language models for communications, chemistry and biology, as well as simulation and research in industries spanning aerospace, agriculture, climate, energy and manufacturing.

The XE9680 system is arriving alongside other new Dell servers announced today with NVIDIA Hopper architecture GPUs, including the Dell PowerEdge XE8640.

“Organizations working on advanced research and development need both speed and efficiency to accelerate discovery,” said Ian Buck, vice president of Hyperscale and High Performance Computing, NVIDIA. “Whether researchers are building more efficient rockets or investigating the behavior of molecules, Dell Technologies’ new PowerEdge systems provide the compute power and efficiency needed for massive AI and HPC workloads.”

“Dell Technologies and NVIDIA have been working together to serve customers for decades,” said Rajesh Pohani, vice president of portfolio and product management for PowerEdge, HPC and Core Compute at Dell Technologies. “As enterprise needs have grown, the forthcoming Dell PowerEdge servers with NVIDIA Hopper Tensor Core GPUs provide leaps in performance, scalability and security to accelerate the largest workloads.”

NVIDIA H100 to Turbocharge Dell Customer Data Centers

Fresh off setting world records in the MLPerf AI training benchmarks earlier this month, NVIDIA H100 is the world’s most advanced GPU. It’s packed with 80 billion transistors and features major advances to accelerate AI, HPC, memory bandwidth and interconnects at data center scale.

H100 is the engine of AI factories that organizations use to process and refine large datasets to produce intelligence and accelerate their AI-driven businesses. It features a dedicated Transformer Engine and fourth generation NVIDIA NVLink interconnect to accelerate exascale workloads.

Each system built on the NVIDIA HGX H100 platform features four or eight Hopper GPUs to deliver the highest AI performance with 3.5x more energy efficiency compared with the prior generation, saving development costs while accelerating discoveries.

Powerful Performance and Customer Options for AI, HPC Workloads

Dell systems power the work of leading organizations, and the forthcoming Hopper-based systems will broaden Dell’s portfolio of solutions for its customers around the world.

With its enhanced, air-cooled design and support for eight NVIDIA H100 GPUs with built-in NVLink connectivity, the PowerEdge XE9680 is purpose-built to deliver optimal performance, helping organizations modernize operations and infrastructure to drive their AI initiatives.

The PowerEdge XE8640, Dell’s new HGX H100 system with four Hopper GPUs, enables businesses to develop, train and deploy AI and machine learning models. A 4U rack system, the XE8640 delivers faster AI training performance and increased core capabilities with up to four PCIe Gen5 slots, NVIDIA Multi-Instance GPU (MIG) technology and NVIDIA GPUDirect Storage support.

Availability

The Dell PowerEdge XE9680 and XE8640 will be available from Dell starting in the first half of 2023.

Customers can now try NVIDIA H100 GPUs on Dell PowerEdge servers on NVIDIA LaunchPad, which provides free hands-on experiences and gives companies access to the latest hardware and NVIDIA AI software.

To take a first look at Dell’s new servers with NVIDIA H100 GPUs at SC22, visit Dell in booth 2443.

The post NVIDIA and Dell Technologies Deliver AI and HPC Performance in Leaps and Bounds With Hopper, at SC22 appeared first on NVIDIA Blog.

Read More

Amazon SageMaker Studio Lab continues to democratize ML with more scale and functionality

Amazon SageMaker Studio Lab continues to democratize ML with more scale and functionality

To make machine learning (ML) more accessible, Amazon launched Amazon SageMaker Studio Lab at AWS re:Invent 2021. Today, tens of thousands of customers use it every day to learn and experiment with ML for free. We made it simple to get started with just an email address, without the need for installs, setups, credit cards, or an AWS account.

SageMaker Studio Lab resonates with customers who want to learn in either an informal or formal setting, as indicated by a recent survey that suggests 49% of our current customer base is learning on their own, whereas 21% is taking a formal ML class. Higher learning institutions have started to adopt it, because it helps them teach ML fundamentals beyond the notebook, like environment and resource management, which are critical areas for successful ML projects. Enterprise partners like Hugging Face, Snowflake, and Roboflow are using SageMaker Studio Lab to showcase their own ML capabilities.

In this post, we discuss new features in SageMaker Studio Lab, and share some customer success stories.

New features in SageMaker Studio Lab

We have continued to develop new features and mechanisms to delight, protect, and enable our ML community. Here are the latest enhancements:

  • To safeguard CPU and GPU capacity from potential usage abuse, we launched 2-step verification, increasing the size of the community we can serve. Going forward, every customer will be required to link their account to a mobile phone number.
  • In October 2022, we rolled out automated account approvals, enabling you to get a SageMaker Studio Lab account in less than a day.
  • We tripled capacity for GPU and CPU, enabling most of our customers to get an instance when they need it.
  • A safe mode was introduced to help you move forward if your environment becomes unstable. Although this is rare, it typically happens when customers exceed their storage limits.
  • We’ve added support for the Jupyter-LSP (Language Server Protocol) extension, providing you with code completion functionality. Note that if you got your account before November 2022, you can get this functionality by following a few simple instructions (see the FAQ for details).

Customer success stories

We continue to be customer obsessed, offering important features to customers based on their feedback. Here are some highlights from key institutions and partners:

“SageMaker Studio Lab solves a real problem in the classroom in that it provides an industrial-strength hosted Jupyter solution with GPU that goes beyond just a hosted notebook alone. The ability to add packages, configure an environment, and open a terminal has opened up many new learning opportunities for students. Finally, fine-tuning Hugging Face models with powerful GPUs has been an amazing emerging workflow to present to students. LLMs (large language models) are the future of AI, and SageMaker Studio Lab has enabled me to teach the future of AI.”

—Noah Gift, Executive in Residence at Duke MIDS (Data Science)

“SageMaker Studio Lab has been used by my team since it was in beta because of its powerful experience for ML developers. It effortlessly integrates with Snowpark, Snowflake’s developer framework, to provide an easy-to-get-started notebook interface for Snowflake Python developers. I’ve used it for multiple demos with customers and partners, and the response has been overwhelmingly favorable.”

—Eda Johnson, Partner Industry Solutions Manager at Snowflake

“Roboflow empowers developers to build their own computer vision applications, no matter their skillset or experience. With SageMaker Studio Lab, our large community of computer vision developers can access our models and data in an environment that closely resembles a local JupyterLab, which is what they are most accustomed to. The persistent storage of SageMaker Studio Lab is a game changer, because you don’t need to start from the beginning for each user session. SageMaker Studio Lab has personally become my go-to notebook platform of choice.”

—Mark McQuade, Field Engineering at Roboflow

“RPI owns one of the most powerful supercomputers in the world, but it (AiMOS) has a steep learning curve. We needed a way for our students to get started effectively and frugally. SageMaker Studio Lab’s intuitive interface enabled our students to get started quickly, and provided powerful GPUs, enabling them to work with complex deep learning models for their capstone projects.”

—Mohammed J. Zaki, Professor of Computer Science at Rensselaer Polytechnic Institute

“I use SageMaker Studio Lab in basic machine learning and Python-related courses that are designed to give students a solid foundation in many cloud technologies. Studio Lab enables our students to get hands-on experience with real-world data science projects, without them having to get bogged down in setups or configurations. Unlike other vendors, it is a Linux machine for students, and students can do much more coding exercises indeed!”

—Cyrus Wong, Senior Lecturer, Higher Diploma in Cloud and Data Centre Administration at the Department of Information Technology, IVE (LWL)

“Students in Northwestern Engineering’s Master of Science in Artificial Intelligence (MSAI) program were given a quick tour of SageMaker Studio Lab before using it in a 5-hour hackathon to apply what they learned to a real-world situation. We expected the students to naturally hit some obstacles during the very short time period. Instead, the students exceeded our expectations by not only completing all the projects but also giving very good presentations in which they showcased fascinating solutions to important real-world problems.”

—Mohammed Alam, Deputy Director of the MSAI program at Northwestern University

Get started with SageMaker Studio Lab

SageMaker Studio Lab is a great entry point for anyone interested in learning more about ML and data science. Amazon continues to invest in this free service, as well as other training assets and scholarship programs, to make ML accessible to all.

Get started with SageMaker Studio Lab today!


About the author

Michele Monclova is a principal product manager at AWS on the SageMaker team. She is a native New Yorker and Silicon Valley veteran. She is passionate about innovations that improve our quality of life.

Read More