Power recommendation and search using an IMDb knowledge graph – Part 1

The IMDb and Box Office Mojo Movies/TV/OTT licensable data package provides a wide range of entertainment metadata, including over 1 billion user ratings; credits for more than 11 million cast and crew members; 9 million movie, TV, and entertainment titles; and global box office reporting data from more than 60 countries. Many AWS media and entertainment customers license IMDb data through AWS Data Exchange to improve content discovery and increase customer engagement and retention.

In this three-part series, we demonstrate how to transform and prepare IMDb data to power out-of-catalog search for your media and entertainment use cases. In this post, we discuss how to prepare IMDb data and load the data into Amazon Neptune for querying. In Part 2, we discuss how to use Amazon Neptune ML to train graph neural network (GNN) embeddings from the IMDb graph. In Part 3, we walk through a demo application for out-of-catalog search that is powered by the GNN embeddings.

Solution overview

In this series, we use the IMDb and Box Office Mojo Movies/TV/OTT licensed data package to show how you can build your own applications using graphs.

This licensable data package consists of JSON files with IMDb metadata for more than 9 million titles (including movies, TV and OTT shows, and video games) and credits for more than 11 million cast, crew, and entertainment professionals. IMDb’s metadata package also includes over 1 billion user ratings, as well as plots, genres, categorized keywords, posters, credits, and more.

IMDb delivers data through AWS Data Exchange, which makes it incredibly simple for you to access data to power your entertainment experiences and seamlessly integrate with other AWS services. IMDb licenses data to a wide range of media and entertainment customers, including pay TV, direct-to-consumer, and streaming operators, to improve content discovery and increase customer engagement and retention. Licensing customers also use IMDb data to enhance in-catalog and out-of-catalog title search and power relevant recommendations.

We use the following services as part of this solution:

  • AWS Data Exchange
  • AWS CloudFormation
  • Amazon Neptune
  • Amazon SageMaker
  • AWS Glue
  • Amazon Simple Storage Service (Amazon S3)

The following diagram depicts the workflow for Part 1 of this three-part series.

In this post, we walk through the following high-level steps:

  1. Provision Neptune resources with AWS CloudFormation.
  2. Access the IMDb data from AWS Data Exchange.
  3. Clone the GitHub repo.
  4. Process the data in Neptune Gremlin format.
  5. Load the data into a Neptune cluster.
  6. Query the data using Gremlin Query Language.

Prerequisites

The IMDb data used in this post requires an IMDb content license and paid subscription to the IMDb and Box Office Mojo Movies/TV/OTT licensing package in AWS Data Exchange. To inquire about a license and access sample data, visit developer.imdb.com.

Additionally, to follow along with this post, you should have an AWS account and familiarity with Neptune, the Gremlin query language, and SageMaker.

Provision Neptune resources with AWS CloudFormation

Now that you’ve seen the structure of the solution, you can deploy it into your account to run an example workflow.

You can launch the stack in AWS Region us-east-1 on the AWS CloudFormation console by choosing Launch Stack:

To launch the stack in a different Region, refer to Using the Neptune ML AWS CloudFormation template to get started quickly in a new DB cluster.

The following screenshot shows the stack parameters to provide.

Stack creation takes approximately 20 minutes. You can monitor the progress on the AWS CloudFormation console.

When stack creation is complete, you're ready to process the IMDb data. On the Outputs tab for the stack, note the values for NeptuneExportApiUri and NeptuneLoadFromS3IAMRoleArn. Then proceed to the following steps to gain access to the IMDb dataset.

Access the IMDb data

IMDb publishes its dataset once a day on AWS Data Exchange. To use the IMDb data, you first subscribe to the data in AWS Data Exchange, then you can export the data to Amazon Simple Storage Service (Amazon S3). Complete the following steps:

  1. On the AWS Data Exchange console, choose Browse catalog in the navigation pane.
  2. In the search field, enter IMDb.
  3. Subscribe to either IMDb and Box Office Mojo Movie/TV/OTT Data (SAMPLE) or IMDb and Box Office Mojo Movie/TV/OTT Data.
  4. Complete the steps in the following workshop to export the IMDb data from AWS Data Exchange to Amazon S3.
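If you prefer to script the export step instead of using the console, the AWS Data Exchange API can create an export job for a revision of the subscribed dataset. The following is a minimal boto3 sketch; the data set ID, revision ID, and bucket name are placeholders you would look up from your own subscription.

import boto3

dx = boto3.client("dataexchange")

# Export the assets of a subscribed revision to your S3 bucket.
# The IDs and bucket below are placeholders from your own subscription.
job = dx.create_job(
    Type="EXPORT_REVISIONS_TO_S3",
    Details={
        "ExportRevisionsToS3": {
            "DataSetId": "<imdb-data-set-id>",
            "RevisionDestinations": [
                {
                    "RevisionId": "<latest-revision-id>",
                    "Bucket": "<your-bucket>",
                    "KeyPattern": "imdb/${Asset.Name}",
                }
            ],
        }
    },
)
dx.start_job(JobId=job["Id"])
print(dx.get_job(JobId=job["Id"])["State"])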

Clone the GitHub repository

Complete the following steps:

  1. Open the SageMaker instance that you created from the CloudFormation template.
  2. Clone the GitHub repository.

Process IMDb data in Neptune Gremlin format

To load the data into Amazon Neptune, we first convert it to the Neptune Gremlin bulk load format. From the GitHub repository, we run the 1_process_imdb_data.py script, which creates the CSV files to load into Neptune. Upload the processed data to an S3 bucket and note the S3 URI location.
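For reference, the Neptune bulk loader expects Gremlin CSV files with system columns such as ~id and ~label for vertices, and ~id, ~from, ~to, and ~label for edges. The following minimal sketch illustrates that format; the property columns and sample values are illustrative assumptions, not the script's exact output.

import csv

# Vertex file: ~id and ~label are required; property columns carry a type suffix.
with open("movie_vertices.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["~id", "~label", "name:String", "rating:Double"])
    writer.writerow(["tt0120815", "movie", "Saving Private Ryan", 8.6])

# Edge file: ~from and ~to reference vertex IDs, and ~label is the edge type.
with open("genre_edges.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["~id", "~from", "~to", "~label"])
    writer.writerow(["e1", "tt0120815", "genre-war", "is-genre"])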

Note that for this post, we filter the dataset to include only movies. To process the full dataset, you need either an AWS Glue job or Amazon EMR.

To process the IMDb data using AWS Glue, complete the following steps:

  1. On the AWS Glue console, in the navigation pane, choose Jobs.
  2. On the Jobs page, choose Spark script editor.
  3. Under Options, choose Upload and edit existing script and upload the 1_process_imdb_data.py file.
  4. Choose Create.
  5. On the editor page, choose Job Details.
  6. On the Job Details page, add the following options:
    1. For Name, enter imdb-graph-processor.
    2. For Description, enter processing IMDb dataset and convert to Neptune Gremlin Format.
    3. For IAM role, use an existing AWS Glue role or create an IAM role for AWS Glue. Make sure you give permission to your Amazon S3 location for the raw data and output data path.
    4. For Worker type, choose G.2X.
    5. For Requested number of workers, enter 20.
  7. Expand Advanced properties.
  8. Under Job Parameters, choose Add new parameter and enter the following key-value pair:
    1. For the key, enter --output_bucket_path.
    2. For the value, enter the S3 path where you want to save the files. This path is also used to load the data into the Neptune cluster.
  9. To add another parameter, choose Add new parameter and enter the following key-value pair:
    1. For the key, enter --raw_data_path.
    2. For the value, enter the S3 path where the raw data is stored.
  10. Choose Save and then choose Run.

This job takes about 2.5 hours to complete.
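If you prefer to create and run the job programmatically rather than through the console, a boto3 sketch along these lines does the same thing; the role ARN, script location, S3 paths, and Glue version are placeholders or assumptions you should replace with your own values.

import boto3

glue = boto3.client("glue")

# Create the Glue job with the same settings chosen in the console above.
glue.create_job(
    Name="imdb-graph-processor",
    Role="<your-glue-role-arn>",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://<your-bucket>/scripts/1_process_imdb_data.py",
        "PythonVersion": "3",
    },
    DefaultArguments={
        "--output_bucket_path": "s3://<your-bucket>/processed/",
        "--raw_data_path": "s3://<your-bucket>/imdb-raw/",
    },
    GlueVersion="3.0",
    WorkerType="G.2X",
    NumberOfWorkers=20,
)

# Start a run and print its ID so you can track it on the Glue console.
run = glue.start_job_run(JobName="imdb-graph-processor")
print(run["JobRunId"])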

The following table provides details about the node types in the graph data model.

Description | Label
Principal cast members | Person
Long format movie | Movie
Genre of movies | Genre
Keyword descriptions of movies | Keyword
Shooting locations of movies | Place
Ratings for movies | rating
Awards event where movie received an award | awards

Similarly, the following table shows some of the edges included in the graph. There are 24 edge types in total.

Description | Label | From | To
Movies an actress has acted in | casted-by-actress | Movie | Person
Movies an actor has acted in | casted-by-actor | Movie | Person
Keywords in a movie by character | described-by-character-keyword | Movie | keyword
Genre of a movie | is-genre | Movie | Genre
Place where the movie was shot | Filmed-at | Movie | Place
Composer of a movie | Crewed-by-composer | Movie | Person
Award nomination | Nominated_for | Movie | Awards
Award winner | Has_won | Movie | Awards

Load the data into a Neptune cluster

In the repo, navigate to the graph_creation folder and run the 2_load.ipynb notebook. To load the data into Neptune, use the %load command in the notebook, and provide your AWS Identity and Access Management (IAM) role ARN and the Amazon S3 location of your processed data.

role = '<NeptuneLoadFromS3IAMRoleArn>'
%load -l {role} -s <s3_location> --store-to load_id

The following screenshot shows the output of the command.

Note that the data load takes about 1.5 hours to complete. To check the status of the load, use the following command:

%load_status {load_id['payload']['loadId']} --errors --details
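If you want to check the same status outside the notebook, the Neptune bulk loader also exposes a REST endpoint. The following is a minimal sketch; the cluster endpoint and load ID are placeholders, the call must be made from a host with network access to the cluster, and if IAM database authentication is enabled the request additionally needs to be SigV4-signed.

import requests

neptune_endpoint = "https://<your-neptune-endpoint>:8182"
load_id = "<load-id-returned-by-%load>"

# Same information as %load_status, retrieved via the loader REST API
resp = requests.get(
    f"{neptune_endpoint}/loader/{load_id}",
    params={"details": "true", "errors": "true"},
)
print(resp.json())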

When the load is complete, the status displays LOAD_COMPLETED, as shown in the following screenshot.

All the data is now loaded into graphs, and you can start querying the graph.

Fig: Sample knowledge graph representation of movies in the IMDb dataset. The movies “Saving Private Ryan” and “Bridge of Spies” share direct connections, such as a common actor and director, as well as indirect connections through movies like “The Catcher Was a Spy.”

Query the data using Gremlin

To access the graph in Neptune, we use the Gremlin query language. For more information, refer to Querying a Neptune Graph.

The graph contains a rich set of information that can be queried directly using Gremlin. In this section, we show a few examples of questions that you can answer with the graph data. In the repo, navigate to the graph_creation folder and run the 3_queries.ipynb notebook. The following sections go over the queries from the notebook.

Worldwide gross of movies that have been shot in New Zealand, with minimum 7.5 rating

The following query returns the worldwide gross of movies filmed in New Zealand, with a minimum rating of 7.5:

%%gremlin --store-to result

g.V().has('place', 'name', containing('New Zealand')).in().has('movie', 'rating', gt(7.5)).dedup().valueMap(['name', 'gross_worldwide', 'rating', 'studio','id'])

The following screenshot shows the query results.

Top 50 movies that belong to action and drama genres and have Oscar-winning actors

In the following example, we want to find the top 50 movies in two different genres (action and drama) with Oscar-winning actors. We can do this by running three different queries and merging the information using Pandas (see the sketch after the query results):

%%gremlin --store-to result_action
g.V().has('genre', 'name', 'Action').in().has('movie', 'rating', gt(8.5)).limit(50).valueMap(['name', 'year', 'poster'])

%%gremlin --store-to result_drama
g.V().has('genre', 'name', 'Drama').in().has('movie', 'rating', gt(8.5)).limit(50).valueMap(['name', 'year', 'poster'])

%%gremlin --store-to result_actors --silent
g.V().has('person', 'oscar_winner', true).in().has('movie', 'rating', gt(8.5)).limit(50).valueMap(['name', 'year', 'poster'])

The following screenshot shows our results.
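The following sketch shows one way to merge the three stored result sets with Pandas. It assumes each stored result is a list of valueMap dictionaries whose values are single-element lists (the usual Gremlin valueMap shape); adjust the unwrapping if your graph-notebook version stores results differently.

import pandas as pd

def to_df(result):
    # Unwrap single-element lists returned by valueMap into scalar columns
    return pd.DataFrame(
        [{k: v[0] if isinstance(v, list) else v for k, v in row.items()} for row in result]
    )

df_action = to_df(result_action)
df_drama = to_df(result_drama)
df_actors = to_df(result_actors)

# Keep movies that appear in both genre result sets and also have an Oscar-winning actor
top_movies = (
    df_action.merge(df_drama, on="name", suffixes=("_action", "_drama"))
             .merge(df_actors, on="name")
)
top_movies.head(50)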

Top movies that have common keywords “tattoo” and “assassin”

The following query returns movies with keywords “tattoo” and “assassin”:

%%gremlin --store-to result

g.V().has('keyword','name','assassin').in("described-by-plot-related-keyword").where(out("described-by-plot-related-keyword").has('keyword','name','tattoo')).dedup().limit(10).valueMap(['name', 'poster','year'])

The following screenshot shows our results.

Movies that have common actors

In the following query, we find movies that feature both Leonardo DiCaprio and Tom Hanks:

%%gremlin --store-to result

g.V().has('person', 'name', containing('Leonardo DiCaprio')).in().hasLabel('movie').out().has('person','name', 'Tom Hanks').path().by(valueMap('name', 'poster'))

We get the following results.

Conclusion

In this post, we showed you the power of the IMDb and Box Office Mojo Movies/TV/OTT dataset and how you can use it in various use cases by converting the data into a graph and querying it with Gremlin. In Part 2 of this series, we show you how to create graph neural network models on this data that can be used for downstream tasks.

For more information about Neptune and Gremlin, refer to Amazon Neptune Resources for additional blog posts and videos.


About the Authors

Gaurav Rele is a Data Scientist at the Amazon ML Solution Lab, where he works with AWS customers across different verticals to accelerate their use of machine learning and AWS Cloud services to solve their business challenges.

Matthew Rhodes is a Data Scientist I working in the Amazon ML Solutions Lab. He specializes in building Machine Learning pipelines that involve concepts such as Natural Language Processing and Computer Vision.

Divya Bhargavi is a Data Scientist and Media and Entertainment Vertical Lead at the Amazon ML Solutions Lab, where she solves high-value business problems for AWS customers using machine learning. She works on image/video understanding, knowledge graph recommendation systems, and predictive advertising use cases.

Karan Sindwani is a Data Scientist at Amazon ML Solutions Lab, where he builds and deploys deep learning models. He specializes in the area of computer vision. In his spare time, he enjoys hiking.

Soji Adeshina is an Applied Scientist at AWS, where he develops graph neural network-based models for machine learning tasks on graphs, with applications to fraud and abuse, knowledge graphs, recommender systems, and life sciences. In his spare time, he enjoys reading and cooking.

Vidya Sagar Ravipati is a Manager at the Amazon ML Solutions Lab, where he leverages his vast experience in large-scale distributed systems and his passion for machine learning to help AWS customers across different industry verticals accelerate their AI and cloud adoption.


Accelerate the investment process with AWS Low Code-No Code services

The last few years have seen a tremendous paradigm shift in how institutional asset managers source and integrate multiple data sources into their investment process. With frequent shifts in risk correlations, unexpected sources of volatility, and increasing competition from passive strategies, asset managers are employing a broader set of third-party data sources to gain a competitive edge and improve risk-adjusted returns. However, the process of extracting benefits from multiple data sources can be extremely challenging. Asset managers’ data engineering teams are overloaded with data acquisition and preprocessing, while data science teams are mining data for investment insights.

Third-party or alternative data refers to data used in the investment process, sourced outside of the traditional market data providers. Institutional investors are frequently augmenting their traditional data sources with third-party or alternative data to gain an edge in their investment process. Typically cited examples include, but are not limited to, satellite imaging, credit card data, and social media sentiment. Fund managers invest nearly $3 billion annually in external datasets, with yearly spend growing by 20–30 percent.

With the exponential growth of available third-party and alternative datasets, the ability to quickly analyze whether a new dataset adds new investment insights is a competitive differentiator in the investment management industry. AWS no-code low-code (LCNC) data and AI services enable nontechnical teams to perform the initial data screening, prioritize data onboarding, accelerate time-to-insights, and free valuable technical resources—creating an enduring competitive advantage.

In this blog post, we discuss how, as an institutional asset manager, you can leverage AWS LCNC data and AI services to scale the initial data analysis and prioritization process beyond technical teams and accelerate your decision-making. With AWS LCNC services, you are able to quickly subscribe to and evaluate diverse third-party datasets, preprocess data, and check their predictive power using machine learning (ML) models without writing a single piece of code.

Solution overview

Our use case is to analyze the stock price predictive power of an external dataset and identify its feature importance—which fields most impact the stock price performance. This serves as a first-pass test to identify which of the multiple fields in a dataset should be more closely evaluated using traditional quantitative methodologies to fit with your investment process. This type of first-pass test can be done quickly by analysts, saving time and letting you more quickly prioritize dataset onboarding. Also, while we are using stock price as our target example, other metrics such as profitability, valuation ratios, or trading volumes could also be used. All datasets used for this use case are published in AWS Data Exchange.

The following diagram explains the end-to-end architecture and the AWS LCNC services used to drive the decisions:

Our solution consists of the following steps:

  1. Data ingestion: AWS Data Exchange for subscribing to the published alternative datasets and downloading them to an Amazon Simple Storage Service (Amazon S3) bucket.
  2. Data engineering: AWS Glue DataBrew for data engineering and transformation of the data stored in Amazon S3.
  3. Machine learning: Amazon SageMaker Canvas for building a time series forecasting model for prediction and identifying the impact of data on the forecast.
  4. Business intelligence: Amazon QuickSight or Amazon SageMaker Canvas to review feature importance to the forecast for decision-making.

Data ingestion

AWS Data Exchange makes it easy to find, subscribe to, and use third-party data in the cloud. You can browse the AWS Data Exchange catalog, find data products that are relevant to your business, and subscribe to the data from the providers directly, with no further processing and no need for an ETL process. Note that many providers offer free initial subscriptions, which allow you to analyze their data without incurring upfront costs.

For this use case, search for and subscribe to the following datasets in AWS Data Exchange:

  • 20 Years of End-of-Day Stock Data for Top 10 US Companies by Market Cap published by Alpha Vantage. This free dataset contains 20 years of historical data for the top 10 US stocks by market capitalization as of September 5, 2020. The dataset contains the following 10 symbols—AAPL: Apple Inc.; AMZN: Amazon.com, Inc.; BRK-A: Berkshire Hathaway Inc. (Class A); FB: Facebook, Inc.; GOOG: Alphabet Inc.; JNJ: Johnson & Johnson; MA: Mastercard Incorporated; MSFT: Microsoft Corporation; V: Visa Inc.; and WMT: Walmart Inc.
  • Key data fields include:
    • Open: as-traded opening price for the day
    • High: as-traded high price for the day
    • Low: as-traded low price for the day
    • Close: as-traded close price for the day
    • Volume: trading volume for the day
    • Adjusted Close: split and dividend-adjusted closing price of the day
    • Split Ratio: ratio of new to old number of shares on the effective date
    • Dividend: cash dividend payout amount
  • S3 Short Interest and Securities Finance Data published by S3 Partners. This dataset contains the following fields:

Field | Description
Business Date | Effective date for the rate
Security IDs | Security identifiers, including SEDOL, ISIN, FIGI, ticker, and Bloomberg ID
Name | Security name
Offer Rate | Market composite financing fee paid for existing short positions
Bid Rate | Market composite lending fee earned for existing shares on loan by long holders
Last Rate | Market composite lending fee earned for incremental shares loaned on that date (spot rate)
Crowding | Momentum indicator measuring daily shorting and covering events relative to the market float
Short Interest | Real-time short interest expressed in number of shares
ShortInterestNotional | ShortInterest * Price (USD)
ShortInterestPct | Real-time short interest expressed as a percentage of equity float
S3Float | The number of tradable shares, including synthetic longs created by short selling
S3SIPctFloat | Real-time short interest projection divided by the S3 float
IndicativeAvailability | S3 projected available lendable quantity
Utilization | Real-time short interest divided by total lendable supply
DaystoCover10Day | Liquidity measure: short interest / 10-day average ADTV
DaystoCover30Day | Liquidity measure: short interest / 30-day average ADTV
DaystoCover90Day | Liquidity measure: short interest / 90-day average ADTV
Original SI | Point-in-time short interest

To get the data, first search for the dataset in AWS Data Exchange and subscribe to it:

Once the publisher of a dataset approves your subscription request, the dataset is available for you to export to your S3 bucket:

Select Add auto-export job destination, provide the details of the S3 bucket, and download the dataset:

Repeat the steps to get the Alpha Vantage dataset. Once completed, you will have both datasets in your S3 bucket.

Data engineering

Once the datasets are in your S3 bucket, you can use AWS Glue DataBrew to transform the data. AWS Glue DataBrew offers over 350 pre-built transformations to automate data preparation tasks (such as filtering anomalies, standardizing formats, and correcting invalid values) that would otherwise require days or weeks of writing hand-coded transformations.

To create a consolidated, curated dataset for forecasting in AWS Glue DataBrew, perform the following steps. For detailed information, please refer to this blog.

  1. Create the DataBrew datasets.
  2. Load DataBrew datasets into DataBrew projects.
  3. Build the DataBrew recipes.
  4. Run the DataBrew jobs.

Create the DataBrew datasets: In AWS Glue DataBrew, a dataset represents data that is read from your S3 bucket. We will create two DataBrew datasets: one for the end-of-day stock price data and one for the S3 short interest data. When you create your dataset, you enter the S3 connection details only once. From that point, DataBrew can access the underlying data for you.

Load the DataBrew datasets into DataBrew projects: In AWS Glue DataBrew, a project is the centerpiece of your data analysis and transformation efforts. A DataBrew project brings together the DataBrew datasets and enables you to develop a data transformation (DataBrew recipe). Here again, we will create two DataBrew projects, for end-of-day stock price and S3 short interest.

Build the DataBrew recipes: In DataBrew, a recipe is a set of data transformation steps. You can apply these steps to your dataset. For the use case, we will build two transformations. The first one will change the format of the end-of-day stock price timestamp column so that the dataset can be joined to the S3 short interest:

The second transformation curates the data, and its last step ensures we join the datasets into a single curated dataset. For more details on building data transformation recipes, refer to this blog.

Run the DataBrew jobs: After creating the DataBrew recipes, first run the end-of-day stock price DataBrew job, followed by the S3 short interest job. Refer to this blog to create a single consolidated dataset. Save the final curated dataset into an S3 bucket.
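You can also start and monitor the recipe jobs programmatically with boto3. In the sketch below, the job names are assumptions—use the names you gave your own DataBrew jobs.

import time
import boto3

databrew = boto3.client("databrew")

# Run the end-of-day stock price job first, then the short interest job.
for job_name in ["eod-stock-price-job", "s3-short-interest-job"]:
    run = databrew.start_job_run(Name=job_name)
    state = "STARTING"
    while state in ("STARTING", "RUNNING"):
        time.sleep(30)
        state = databrew.describe_job_run(Name=job_name, RunId=run["RunId"])["State"]
    print(job_name, state)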

The end-to-end data engineering workflow will look like this:

Machine learning

With the curated dataset created post-data engineering, you can use Amazon SageMaker Canvas to build your forecasting model and analyze the impact of features on the forecast. Amazon SageMaker Canvas provides business users with a visual point-and-click interface that allows them to build models and generate accurate ML predictions on their own—without requiring any ML experience or having to write a single line of code.

To build a time series forecasting model in Amazon SageMaker Canvas, follow these steps. For detailed information, refer to this blog:

  1. Select the curated dataset in SageMaker Canvas.
  2. Build the time series forecasting model.
  3. Analyze the results and feature importance.

Build the time series forecasting model: Once you have selected the dataset, select the target column to be predicted. In our case, this will be the close price of the stock ticker. SageMaker Canvas automatically detects that this is a time series forecasting problem.

You will have to configure the model as follows for time series forecasting. For the item ID, select the stock ticker column; remember, our dataset has prices for the top 10 stocks. For the timestamp, select the timestamp column. Finally, enter the number of days you want to forecast into the future (the forecast horizon).

Now you are ready to build the model. SageMaker Canvas provides two options to build the model: Quick Build and Standard Build. In our case, we will use “Standard Build”.

Standard Build takes approximately three hours to build the model and uses Amazon Forecast, a time series forecasting service based on ML, as the underlying forecasting engine. Forecast creates highly accurate forecasts through model ensembling of traditional and deep learning models without requiring ML experience.

Once the model is built, you can now review the model performance (prediction accuracy) and feature importance. As can be seen from the figure below, the model identifies Crowding and DaysToCover10Day as the two top features driving forecast values. This is in line with our market intuition, as crowding is a momentum indicator measuring daily shorting and covering events, and near-term short interest is a liquidity measure, indicating how investors are positioned in a stock. Both momentum and liquidity can drive price volatility.

This result indicates that these two features (or fields) have a close relationship with stock price movements and can be prioritized higher for onboarding and further analysis.

Business intelligence

In the context of time series forecasting, the notion of backtesting refers to the process of assessing the accuracy of a forecasting method using existing historical data. The process is typically iterative and repeated over multiple dates present in the historical data.

As we already discussed, SageMaker Canvas uses Amazon Forecast as the engine for time series forecasting. Forecast creates a backtest as part of the model building process. You can now view the predictor details by signing in to Amazon Forecast. For a deeper dive into model explainability, refer to this blog.

Amazon Forecast provides additional details on predictor metrics like weighted absolute percentage error (WAPE), root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute scaled error (MASE). You can export predictor quality scores from Amazon Forecast.

Amazon Forecast runs one backtest for the time series dataset provided. The backtest results are available for download using the Export backtest results button. Exported backtest results are downloaded to an S3 bucket.

We will now plot the backtest results in Amazon QuickSight. To visualize the backtest results in Amazon QuickSight, connect to the dataset in Amazon S3 from QuickSight and create a visualization.
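If you want a quick programmatic look at the exported backtest file before (or instead of) building the QuickSight visualization, a small pandas sketch like the one below works. The file and column names are assumptions—check the header of the CSV that Forecast exported to your bucket.

import pandas as pd
import matplotlib.pyplot as plt

# Backtest export downloaded locally from the S3 bucket (file name is a placeholder)
df = pd.read_csv("forecasted-values.csv")

# Compare actuals against the median (p50) forecast for a single ticker
ticker = df[df["item_id"] == "AAPL"].sort_values("timestamp")
plt.plot(pd.to_datetime(ticker["timestamp"]), ticker["target_value"], label="actual")
plt.plot(pd.to_datetime(ticker["timestamp"]), ticker["p50"], label="p50 forecast")
plt.legend()
plt.title("Backtest: actual vs. forecast")
plt.show()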

Clean up

The AWS services used in this solution are managed and serverless in nature. However, SageMaker Canvas is designed for long-running ML training sessions and stays active until you log out, so ensure you explicitly log out of SageMaker Canvas. Refer to the documentation for more details.

Conclusion

In this blog post, we discussed how, as an institutional asset manager, you can leverage AWS low-code no-code (LCNC) data and AI services to accelerate the evaluation of external datasets by offloading the initial dataset screening to nontechnical personnel. This first-pass analysis can be done quickly to help you decide which datasets should be prioritized for onboarding and further analysis.

We demonstrated step-by-step how a data analyst can acquire new third-party data through AWS Data Exchange, use AWS Glue DataBrew no-code ETL services to preprocess the data, and evaluate which features in a dataset have the most impact on the model’s forecast.

Once data is analysis-ready, an analyst uses SageMaker Canvas to build a predictive model, evaluate its fit, and identify significant features. In our example, the model’s MAPE (.05) and WAPE (.045) indicated a good fit and showed “Crowding” and “DaysToCover10Day” as the signals in the dataset with the largest impact on the forecast. This analysis quantified what data most influenced the model and could therefore be prioritized for further investigation and potential inclusion into your alpha signals or risk management process. And just as importantly, explainability scores indicate what data plays a relatively small role in determining the forecast and can therefore be a lower priority for further investigation.

To more quickly evaluate the ability of third-party financial data to support your investment process, review the Financial Services data sources available on AWS Data Exchange, and give DataBrew and Canvas a try today.


About the Authors

Boris Litvin is Principal Solution Architect, responsible for Financial Services industry innovation. He is a former Quant and FinTech founder, passionate about systematic investing.

Meenakshisundaram Thandavarayan is a Senior AI/ML specialist with AWS. He helps high-tech strategic accounts on their AI and ML journey. He is very passionate about data-driven AI.

Camillo Anania is a Senior Startup Solutions Architect with AWS based in the UK. He is a passionate technologist helping startups of any size build and grow.

Dan Sinnreich is a Sr. Product Manager with AWS, focused on empowering companies to make better decisions with ML. He formerly built portfolio analytics platforms and multi-asset class risk models for large institutional investors.


3D Artist Edward McEvenue Animates Holiday Cheer This Week ‘In the NVIDIA Studio’

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

3D artist Edward McEvenue shares his imaginative, holiday-themed short film The Great Candy Inquisition this week In the NVIDIA Studio. The artist, recently featured in our Meet the Omnivore series, is creating the film with Autodesk 3ds Max, Houdini, Adobe Substance 3D and Unreal Engine — as well as the NVIDIA Omniverse Create app.

In addition, NVIDIA artist Michael Johnson brings holiday cheer with more winter-themed artwork built in Omniverse Create.

Santa brought creative app upgrades and optimizations early, as video-editing app Filmora added NVIDIA AV1 dual encoder support with GeForce RTX 40 Series GPUs, slashing export times in half.

Technology company CORSAIR’s iCUE software release 4.31 enabled NVIDIA Broadcast integration, unlocking Noise Reduction and Room Echo cancellation features in systems powered by RTX 40 Series GPUs.

Get into the holiday mood with incredible wintery art in the latest “Studio Standouts,” featuring pieces from the #WinterArtChallenge.

There’s still time to enter by sharing winter-themed art on Instagram, Twitter or Facebook for a chance to be featured on NVIDIA Studio’s social media channels. Be sure to tag #WinterArtChallenge to join.

Hide Your Candy

The Great Candy Inquisition is a whimsical short film full of childlike wonder. Jealous that children often only want candy, the reindeer toys, nutcrackers and other animated characters in the film go on a sticky, sweet inquisition to remove candy from the toy kingdom. Will the gingerbread boy, whose gingerbread parents are sent to the “gulnog” for refusing to comply, be able to stop them?

Find out by watching the final video next year, being beautifully pieced together in NVIDIA Omniverse, a platform for building and operating metaverse applications, using the Omniverse Create app for large-scale world-building and scene composition.

Virtually all of McEvenue’s creative workflow is accelerated by his GeForce RTX 3080 Ti GPU. As the founder of EDSTUDIOS, McEvenue takes on freelance work for which it’s critical that he and his team complete tasks quickly and efficiently.

Modeling for The Great Candy Inquisition is being split between Houdini, which has an RTX-accelerated Karma XPU renderer that enables fast rendering of complex 3D models and simulations, and Autodesk 3ds Max, which uses RTX-accelerated AI denoising to unlock smooth, interactive rendering. 3D assets were sourced from Sketchfab and Turbosquid, using the built-in asset browser within Omniverse Create.

McEvenue then built textures and materials in Adobe Substance 3D Painter and Designer, which he baked (rather than gingerbread men or women) in seconds, thanks to RTX-accelerated light and ambient occlusion.

Animations in Unreal Engine 5 were quick and easy, McEvenue said. RTX-accelerated rendering guaranteed photorealistic detail, further enhanced by AI features in NVIDIA DLSS to upscale frames rendered at lower resolution while still retaining high-fidelity details.

At this juncture, McEvenue imported 3D elements into Omniverse Create to piece together stunning scenes.

 

Omniverse Create houses the advanced, multi-GPU-enabled, path-traced RTX Renderer capable of global illumination, reflections and refractions — all at the speed of light, powered by an RTX GPU. McEvenue tweaked and touched up scenes with no loss in the stunning level of detail. Omniverse Create includes access to NVIDIA vMaterials for even more realistic scenes and true-to-reality visualizations.

“The ability to progressively iterate on designs and see your work rendered in real time in the viewport, with full-fidelity lighting, materials and post-production effects like DOF, Bloom and atmospheric fog makes all the difference in finalizing artwork,” said McEvenue.

 

With The Great Candy Inquisition close to completion, the team applied final details in their preferred 3D apps by live-syncing Omniverse Connectors in Autodesk 3ds Max, Adobe Substance 3D Painter and Unreal Engine, simultaneously, despite working in several different physical locations. Working in such a cohesive virtual environment eliminated the need to download, reupload and redownload files.

EDSTUDIOS’ upcoming projects will be completed much quicker thanks to GeForce RTX GPUs, McEvenue said. “Real-time rendering is the future, and only possible with GPU-powered systems — and NVIDIA GPUs lead the pack,” the artist said.

3D artist Edward McEvenue.

Check out Edward McEvenue’s website for more inspirational artwork.

It’s Beginning to Look a Lot Like Omniverse

NVIDIA artist Michael Johnson is a big fan of the holiday season. Unable to resist the temptation to create winter-themed art in Omniverse Create, he decided to work on a piece for the #WinterArtChallenge, which runs through the end of the month and is open to creatives from around the globe. Johnson spent a week creating different assets and assembled the image.

Who wants hot cocoa?

A steaming mug of hot cocoa — studded with creamy marshmallows and emblazoned with “Happy Holidays, From Ours to Yours” — sets the scene. Scattered around the mug are squares of chocolate, gingerbread cookies, shimmering ornaments and a furry throw, all aglow from twinkling holiday lights.

“The holiday season tends to make me feel warm inside,” Johnson said. “Listening to music, decorating a tree with family and wearing cozy clothes while eating sweet treats — this is the feeling I wanted to give off with this piece of art.”

Like McEvenue, Johnson maneuvered his piece quickly, changing angles and lighting in the viewport with little to no delay, while incredibly realistic visuals populated the scene.

 

In the video below, Johnson manipulates ornaments, resizing assets and adding fine detail.

 

He then easily applied colors and textures with the Adobe Substance 3D Painter Connector.

 

Download Omniverse to build magnificent virtual worlds.

Creative App Updates Come Early This Holiday Season

Wondershare’s intuitive video-editing app, Filmora, with over 100 million users, has integrated NVIDIA AV1 dual encoders in the latest version 12 update, powered by GeForce RTX 40 Series GPUs. The dual encoders can work in tandem, dividing work automatically to double output and cut export times in half.

GeForce RTX 40 Series GPUs also unlock faster decoding with NVIDIA decoder (NVDEC) for smooth playback of high-resolution and high-dynamic-range videos, plus faster rendering of GPU-accelerated video effects.

Learn more about the Filmora update.

Filmora’s easy-to-access user interface.

A leader in high-performance gear and systems for gamers, content creators and PC enthusiasts, CORSAIR has released iCUE software now with support for the new GeForce RTX 40 Series GPUs.

NVIDIA Broadcast features are now accessible directly in CORSAIR’s iCUE software for GeForce RTX 40 Series GPUs.

iCUE Version 4.31 and later updates will integrate NVIDIA Broadcast technology to take advantage of AI-powered features. Noise Reduction and Room Echo cancellation eliminate keyboard typing, annoying microphone static, loud PC fans and more, ensuring content creators and creative professionals can find a quiet place to work with their systems powered by GeForce RTX 40 Series GPUs.

For the latest creative app updates, download the monthly NVIDIA Studio Driver.

Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter.

The post 3D Artist Edward McEvenue Animates Holiday Cheer This Week ‘In the NVIDIA Studio’ appeared first on NVIDIA Blog.


Automatically retrain neural networks with Renate

Today we announce the general availability of Renate, an open-source Python library for automatic model retraining. The library provides continual learning algorithms able to incrementally train a neural network as more data becomes available.

By open-sourcing Renate, we would like to create a venue where practitioners working on real-world machine learning systems and researchers interested in advancing the state of the art in automatic machine learning, continual learning, and lifelong learning come together. We believe that synergies between these two communities will generate new ideas in the machine learning research community and provide a tangible positive impact in real-world applications.

Model retraining and catastrophic forgetting

Training neural networks incrementally is not a simple task. In practice, data provided at different points in time is often sampled from different distributions. For example, in question-answering systems, the distribution of the topics in the questions can significantly vary over time. In classification systems, the addition of new categories may be required when the data is collected in different parts of the world. Fine-tuning the previously trained models with new data in these cases will lead to a phenomenon called “catastrophic forgetting.” There will be good performance on the most recent examples, but the quality of the predictions made for data collected in the past will degrade significantly. Moreover, the performance degradation will be even more severe when the retraining operation happens regularly (e.g., daily or weekly).

When storing a small chunk of data is possible, methods that reuse old data during retraining can partially alleviate the catastrophic forgetting problem. Several methods have been developed following this idea. Some of them store only the raw data, while more advanced ones also save additional metadata (e.g., the intermediate representation of the data points in memory). Storing a small amount of data (e.g., thousands of data points) and using it carefully leads to the superior performance displayed in the figure below.
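As a generic illustration of this idea (not Renate’s actual implementation), the sketch below keeps a small reservoir of past examples and mixes it into each retraining cycle alongside the new data.

import random
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

class ReplayBuffer:
    """Keep at most `capacity` (x, y) examples sampled uniformly from everything seen so far."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.examples = []
        self.seen = 0

    def add(self, dataset):
        # Reservoir sampling: every example seen so far has an equal chance of being kept
        for example in dataset:
            self.seen += 1
            if len(self.examples) < self.capacity:
                self.examples.append(example)
            else:
                idx = random.randrange(self.seen)
                if idx < self.capacity:
                    self.examples[idx] = example

    def as_dataset(self):
        # Assumes integer class labels; adapt for other target types
        xs, ys = zip(*self.examples)
        return TensorDataset(torch.stack(xs), torch.tensor(ys))

# At each retraining cycle, train on the new chunk plus the replayed memory:
# buffer.add(new_dataset)
# loader = DataLoader(ConcatDataset([new_dataset, buffer.as_dataset()]), batch_size=32, shuffle=True)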

Bring your own model and dataset

When training neural network models, it may be necessary to change the network structure, the data transformation, and other important details. While such code changes are limited, they can become a complex task when these models are part of a large software library. To avoid these inconveniences, Renate offers customers the ability to define their models and datasets in predefined Python functions as part of a configuration file. This has the advantage of keeping the customers’ code clearly separate from the rest of the library and allows customers without any knowledge of Renate’s internal structure to use the library effectively.

Moreover, all functions, including the model definition, are very flexible. In fact, the model definition function allows users to create neural networks from scratch following their own needs or to instantiate well-known models from open-source libraries like transformers or torchvision. It just requires adding the necessary dependencies to the requirements file.

A tutorial on how to write the configuration file is available at How to Write a Config File.

The benefit of hyperparameter optimization

As is often the case in machine learning, continual learning algorithms come with a number of hyperparameters. Their settings can make an important difference in the overall performance, and careful tuning can positively impact the predictive performance. When training a new model, Renate can enable hyperparameter optimization (HPO) using state-of-the-art algorithms like ASHA to exploit the ability to run multiple parallel jobs on Amazon SageMaker. An example of the outcomes is displayed in the figure below.

In order to enable HPO, the user will need to define the search space or use one of the default search spaces provided with the library. Refer to the example at Run a training job with HPO. Customers looking for quicker retuning can also leverage the results of their previous tuning jobs by selecting algorithms with transfer learning functionalities. In this way, optimizers will be informed about which hyperparameters performed well across different tuning jobs and will be able to focus on those, reducing the tuning time.
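For illustration, a search space in Syne Tune’s config-space format looks like the snippet below. The hyperparameter names are assumptions—use the keys expected by the Renate updater you chose, as shown in the library’s HPO example.

from syne_tune.config_space import loguniform, uniform

# Illustrative search space; fixed values can be mixed with search ranges
config_space = {
    "learning_rate": loguniform(1e-4, 1e-1),
    "momentum": uniform(0.0, 0.99),
    "batch_size": 32,
}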

Run it in the cloud

Renate allows users to quickly transition from training models on a local machine for experimentation to training large-scale neural networks using SageMaker. In fact, running training jobs on a local machine is rather unusual, especially when training large-scale models. At the same time, being able to verify details and test the code locally can be extremely useful. To address this need, Renate allows quick switching between the local machine and the SageMaker service just by changing a simple flag in the configuration file.

For example, when launching a tuning job, it is possible to run it locally with execute_tuning_job(..., backend='local') and quickly switch to SageMaker by changing the code as follows:

execute_tuning_job(
    ...,
    backend="sagemaker",
    role=get_execution_role(),       # requires importing the function from Syne Tune
    instance_type="ml.g4dn.2xlarge", # the desired instance type
    job_name="name_prefix_",         # a prefix to be used to identify the job
    ...
)

After running the script, it will be possible to see the job running from the SageMaker web interface:

It will also be possible to monitor the training job and read the logs in CloudWatch:

All of this without any additional code or effort.

A full example of running training jobs in the cloud is available at How to Run a Training Job.

Conclusion

In this post, we described the problems associated with retraining neural networks and the main benefits of the Renate library in the process. To learn more about the library, check out the GitHub repository, where you will find a high-level overview of the library and its algorithms, instructions for the installation, and examples that can help you to get you started.

We look forward to your contributions, feedback and discussing this further with everyone interested, and to seeing the library integrated into real-world retraining pipelines.


About the authors

Giovanni Zappella is a Sr. Applied Scientist working on Long-term science at AWS Sagemaker. He currently works on continual learning, model monitoring and AutoML. Before that he worked on applications of multi-armed bandits for large-scale recommendations systems at Amazon Music.

Martin Wistuba is an Applied Scientist in the Long-term science team at AWS Sagemaker. His research focuses on automatic machine learning.

 Lukas Balles is an Applied Scientist at AWS. He works on continual learning and topics relating to model monitoring.

Cedric Archambeau is a Principal Applied Scientist at AWS and Fellow of the European Lab for Learning and Intelligent Systems.


Create Amazon SageMaker models using the PyTorch Model Zoo

Deploying high-quality, trained machine learning (ML) models to perform either batch or real-time inference is a critical piece of bringing value to customers. However, the ML experimentation process can be tedious—there are a lot of approaches requiring a significant amount of time to implement. That’s why pre-trained ML models like the ones provided in the PyTorch Model Zoo are so helpful. Amazon SageMaker provides a unified interface to experiment with different ML models, and the PyTorch Model Zoo allows us to easily swap our models in a standardized manner.

This blog post demonstrates how to perform ML inference using an object detection model from the PyTorch Model Zoo within SageMaker. Pre-trained ML models from the PyTorch Model Zoo are ready-made and can easily be used as part of ML applications. Setting up these ML models as a SageMaker endpoint or SageMaker Batch Transform job for online or offline inference is easy with the steps outlined in this blog post. We will use a Faster R-CNN object detection model to predict bounding boxes for pre-defined object classes.

We walk through an end-to-end example, from loading the Faster R-CNN object detection model weights, to saving them to an Amazon Simple Storage Service (Amazon S3) bucket, and to writing an entrypoint file and understanding the key parameters in the PyTorchModel API. Finally, we will deploy the ML model, perform inference on it using SageMaker Batch Transform, and inspect the ML model output and learn how to interpret the results. This solution can be applied to any other pre-trained model on the PyTorch Model Zoo. For a list of available models, see the PyTorch Model Zoo documentation.

Solution overview

This blog post will walk through the following steps. For a full working version of all steps, see the create_pytorch_model_sagemaker.ipynb notebook.

  • Step 1: Setup
  • Step 2: Loading an ML model from PyTorch Model Zoo
  • Step 3: Save and upload ML model artifacts to Amazon S3
  • Step 4: Building ML model inference scripts
  • Step 5: Launching a SageMaker batch transform job
  • Step 6: Visualizing results

Architecture diagram

Directory structure

The code for this blog can be found in this GitHub repository. The codebase contains everything we need to build ML model artifacts, launch the transform job, and visualize results.

This is the workflow we use. All of the following steps will refer to modules in this structure.

sagemaker_pytorch_model_zoo --> root directory
    |- inference.py --> entry point file
    |- create_pytorch_model_sagemaker.ipynb --> walks through all steps in this blog post
    |- cars.jpg --> input image

The sagemaker_pytorch_model_zoo folder should contain inference.py as the entrypoint file, and create_pytorch_model_sagemaker.ipynb to load and save the model weights, create a SageMaker model object, and finally pass that into a SageMaker batch transform job. In order to bring your own ML models, change the paths in the Step 1: Setup section of the notebook and load a new model in the Step 2: Loading an ML Model from the PyTorch Model Zoo section. The rest of the steps below remain the same.

Step 1: Setup

IAM roles

SageMaker performs operations on infrastructure that is managed by SageMaker, and it can only perform actions permitted by the notebook’s accompanying IAM execution role. For more detailed documentation on creating IAM roles and managing IAM permissions, refer to the AWS SageMaker roles documentation. We can create a new role, or we can get the SageMaker (Studio) notebook’s default execution role by running the following lines of code:

import boto3
import sagemaker

session = sagemaker.Session()

# Set a default S3 bucket
default_bucket = session.default_bucket()

# Get the region
region = boto3.Session().region_name

# Get the SageMaker Execution Role
role_arn = sagemaker.get_execution_role()

The above code gets the SageMaker execution role for the notebook instance. This is the IAM role that we created for our SageMaker or SageMaker Studio notebook instance.

User configurable parameters

Here are all the configurable parameters needed for building and launching our SageMaker batch transform job:

import os

INSTANCE_TYPE = "ml.m5.xlarge"
INSTANCE_COUNT = 1
BUCKET = os.path.join("s3://", default_bucket)

DATA_PATH = os.path.join(BUCKET, "images")
IMAGE_NAME = "cars.jpg"
RANDOM_STRING_LENGTH = 16
MODEL_NAME = "FasterRCNNResnet50"

# Needs to be set to version 1.2 or higher to enable automatic PyTorch model repackaging
FRAMEWORK_VERSION = "1.2"
ENTRY_POINT_FILE_NAME = "inference.py"

SAGEMAKER_EXECUTION_ROLE_ARN = role_arn
MODEL_ARTIFACTS_FILE_NAME = os.path.join(BUCKET, "modelzoo/fasterrcnn_resnet50_fpn/model.tar.gz")
IMAGE_URI = sagemaker.image_uris.retrieve(
    framework="pytorch",
    region=region,
    version="1.9.1",
    py_version="py38",
    image_scope="inference",
    instance_type=INSTANCE_TYPE,
)

Step 2: Loading an ML model from the PyTorch Model Zoo

Next, we specify an object detection model from the PyTorch Model Zoo and save its ML model weights. Typically, we save a PyTorch model using the .pt or .pth file extensions. The code snippet below downloads a pre-trained Faster R-CNN ResNet50 ML model from the PyTorch Model Zoo:

import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

SageMaker batch transform requires as an input some model weights, so we will save the pre-trained ML model as model.pt. If we want to load a custom model, we could save the model weights from another PyTorch model as model.pt instead.

import torch

H = 1080
W = 1920
scripted_fn = torch.jit.script(model, torch.randn(1, 3, H, W))
scripted_fn.save("model.pt")

Step 3: Save and upload ML model artifacts to Amazon S3

Since we will be using SageMaker for ML inference, we need to upload the model weights to an S3 bucket. We can do this using the following commands, or by simply dragging and dropping the file directly into S3 through the console. The following commands first compress the model.pt file into a tarball and then copy the model weights from our local machine to the S3 bucket.

Note: To run the following commands, you need to have the AWS Command Line Interface (AWS CLI) installed.

tar -czvf model.tar.gz model.pt
aws s3 cp model.tar.gz $MODEL_ARTIFACTS_FILE_NAME

Next, we copy our input image over to S3. Below is the full S3 path for the image.

car_image_path = os.path.join(DATA_PATH, IMAGE_NAME)

We can copy over this image to S3 with another aws s3 cp command.

aws s3 cp cars.jpg $car_image_path

Step 4: Building ML model inference scripts

Now we will go over our entrypoint file, the inference.py module. We can deploy a PyTorch model trained outside of SageMaker using the PyTorchModel class. First, we instantiate the PyTorchModel object. Then we construct an inference.py entrypoint file to perform ML inference using SageMaker batch transform on sample data hosted in Amazon S3.

Understanding the PyTorchModel object

The PyTorchModel class within the SageMaker Python API allows us to perform ML inference using our downloaded model artifact.

To initiate the PyTorchModel class, we need to understand the following input parameters:

  • name:  Model name; we recommend using either the model name + date time, or a random string + date time for uniqueness.
  • model_data: The S3 URI of the packaged ML model artifact.
  • entry_point: A user-defined Python file to be used by the inference Docker image to define handlers for incoming requests. The code defines model loading, input preprocessing, prediction logic, and output post-processing.
  • framework_version: Needs to be set to version 1.2 or higher to enable automatic PyTorch model repackaging.
  • source_dir: The directory of the entry_point file.
  • role: An IAM role to make AWS service requests.
  • image_uri:  Use this Amazon ECR Docker container image as a base for the ML model compute environment.
  • sagemaker_session: The SageMaker session.
  • py_version: The Python version to be used

The following code snippet instantiates the PyTorchModel class to perform inference using the pre-trained PyTorch model:

model = PyTorchModel(
               name=RANDOM_STRING,
               model_data=MODEL_ARTIFACTS_FILE_NAME,
               entry_point=ENTRY_POINT_FILE_NAME,
               framework_version=FRAMEWORK_VERSION,
               role=SAGEMAKER_EXECUTION_ROLE_ARN,
               sagemaker_session=sagemaker_session,
               image_uri=IMAGE_URI,
        )

Understanding the entrypoint file (inference.py)

The entry_point parameter points to a Python file named inference.py. This entrypoint defines model loading, input preprocessing, prediction logic, and output post-processing. It supplements the ML model serving code in the prebuilt PyTorch SageMaker Deep Learning Container image.

The inference.py file contains the following functions. In our example, we implement the model_fn, input_fn, predict_fn, and output_fn functions to override the default PyTorch inference handler.

  1. model_fn: Takes in a directory containing static model checkpoints in the inference image. Opens and loads the model from a specified path and returns a PyTorch model.
  2. input_fn: Takes in the payload of the incoming request (request_body) and the content type of an incoming request (request_content_type) as input. Handles data decoding. This function needs to be adjusted for what input the model is expecting.
  3. predict_fn: Calls a model on data deserialized in input_fn. Performs prediction on the deserialized object with the loaded ML model.
  4. output_fn: Serializes the prediction result into the desired response content type. Converts predictions obtained from the predict_fn function to JSON, CSV, or NPY formats.
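The following is a minimal sketch of what such an entrypoint can look like for this model; the actual inference.py in the repository may differ in its preprocessing and output format. It assumes the TorchScript model saved as model.pt in Step 2 and image payloads sent with the application/x-image content type.

import io
import json
import os

import torch
from PIL import Image
from torchvision import transforms

def model_fn(model_dir):
    # Load the TorchScript model packaged in model.tar.gz
    model = torch.jit.load(os.path.join(model_dir, "model.pt"), map_location="cpu")
    model.eval()
    return model

def input_fn(request_body, request_content_type):
    # Decode raw image bytes into a list with a single CxHxW tensor,
    # which is the input format torchvision detection models expect
    if request_content_type == "application/x-image":
        image = Image.open(io.BytesIO(request_body)).convert("RGB")
        return [transforms.ToTensor()(image)]
    raise ValueError(f"Unsupported content type: {request_content_type}")

def predict_fn(input_object, model):
    # Scripted torchvision detection models return a (losses, detections) tuple
    with torch.no_grad():
        _, detections = model(input_object)
    return detections

def output_fn(prediction, response_content_type):
    # Convert tensors to plain lists so the result can be serialized as JSON
    result = [{key: value.tolist() for key, value in det.items()} for det in prediction]
    return json.dumps(result)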

Step 5: Launching a SageMaker batch transform job

For this example, we will obtain ML inference results through a SageMaker batch transform job. Batch transform jobs are most useful when we want to obtain inferences from datasets once, without the need for a persistent endpoint. We instantiate a sagemaker.transformer.Transformer object for creating and interacting with SageMaker batch transform jobs.

transformer = model.transformer(instance_type=INSTANCE_TYPE, 
                                instance_count=INSTANCE_COUNT
                                )
transformer.transform(data=DATA_PATH,
                      data_type="S3Prefix",
                      content_type="application/x-image",
                      wait=True
                      )

See the documentation for creating a batch transform job at CreateTransformJob.
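For reference, the same batch transform job can also be created directly through the low-level API; the job name, model name, and output path in this sketch are placeholders.

import boto3

sm = boto3.client("sagemaker")

sm.create_transform_job(
    TransformJobName="fasterrcnn-batch-transform",
    ModelName="<your-sagemaker-model-name>",   # the model created by PyTorchModel above
    TransformInput={
        "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": DATA_PATH}},
        "ContentType": "application/x-image",
    },
    TransformOutput={"S3OutputPath": "s3://<your-bucket>/batch-output/"},
    TransformResources={"InstanceType": INSTANCE_TYPE, "InstanceCount": INSTANCE_COUNT},
)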

Step 6: Visualizing results

Once the SageMaker batch transform job finishes, we can load the ML inference outputs from Amazon S3. For this, navigate to the AWS Management Console and search for Amazon SageMaker. On the left panel, under Inference, see Batch transform jobs.

After selecting Batch transform, see the webpage listing all SageMaker batch transform jobs. We can view the progress of our most recent job execution.

First, the job will have the status InProgress. Once it’s done, the status changes to Completed.

Once the status is marked as completed, we can click on the job to view the results. This webpage contains the job summary, including configurations of the job we just executed.

Under Output data configuration, we will see an S3 output path. This is where we will find our ML inference output.

Select the S3 output path and see an [image_name].[file_type].out file with our output data. Our output file will contain a list of mappings. Example output:

[
  {
    "boxes": [
      [
        214.32322692871094,
        192.18418884277344,
        830.3932495117188,
        521.6996459960938
      ],
      [
        235.6244354248047,
        301.3315734863281,
        253.6448516845703,
        312.3525695800781
      ],
      [
        183.92031860351562,
        291.7759704589844,
        207.28196716308594,
        312.1448669433594
      ],
    ],
    "labels": [
      3,
      3,
      9,
    ],
    "scores": [
      0.8823906183242798,
      0.7710548639297485,
      0.4969744384288788,
    ]
  }
]

In order to visualize these predictions, we first read the output path from our transformer object.
import boto3
from urllib.parse import urlparse

def get_output_from_s3(s3uri, file_name):
    # Parse the S3 URI into bucket and prefix, then read the output file
    parsed_url = urlparse(s3uri)
    bucket_name = parsed_url.netloc
    prefix = parsed_url.path[1:]
    s3 = boto3.resource('s3')
    obj = s3.Object(bucket_name, '{}/{}'.format(prefix, file_name))
    return obj.get()["Body"].read().decode('utf-8')

# Output path from Batch Transform job
output_path = transformer.output_path

# Get the output file from S3
predictions = get_output_from_s3(output_path, "cars.jpg.out")

Next, we process this output file and visualize our predictions. Below we specify our confidence threshold. We get the list of classes from the COCO dataset object mapping. During inference, the model requires only the input tensors and returns the post-processed predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as follows, where N is the number of detections:

  1. boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H, where W is the width of the image and H is the height of the image
  2. labels (Int64Tensor[N]): the predicted labels for each detection
  3. scores (Tensor[N]): the prediction scores for each detection

For more details on the output, refer to the PyTorch Faster R-CNN FPN Documentation.

The model output contains bounding boxes with their respective confidence scores. We can reduce the number of false positives displayed by removing bounding boxes for which the model is not confident. The following code snippets process the predictions in the output file and draw bounding boxes only for predictions whose score is above our confidence threshold. We set the probability threshold, CONF_THRESH, to 0.75 for this example.
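
The snippets below reference CONF_THRESH and CLASSES, which are defined elsewhere in the notebook. A minimal stand-in definition might look like the following; the class list shown is an illustrative subset of the full 91-category COCO label mapping.

# Confidence threshold and COCO label mapping assumed by the snippets below.
# CLASSES here is an illustrative subset; the notebook uses the full COCO category list.
CONF_THRESH = 0.75
CLASSES = ["__background__", "person", "bicycle", "car", "motorcycle",
           "airplane", "bus", "train", "truck", "boat"]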

import numpy as np
import torch
import torchvision
from torchvision.io import read_image
from torchvision.utils import draw_bounding_boxes


def process_batch_transform_output(predictions):
    # The output file contains a list with one prediction dict per input image
    predictions = eval(predictions)
    pred = predictions[0]
    boxes = np.array(pred["boxes"])
    labels = np.array(pred["labels"])
    scores = np.array(pred["scores"])

    # Keep only the detections that meet the confidence threshold
    scores_idx = scores >= CONF_THRESH
    boxes_meet = boxes[scores_idx, :]
    labels_meet = labels[scores_idx]
    scores_meet = scores[scores_idx]

    labels_str = [CLASSES[i] for i in labels_meet]

    # Return a list of tuples containing label, label index, score, and bounding box
    return list(zip(labels_str, labels_meet, scores_meet, boxes_meet))
    
    
def visualize_batch_transform_output(input_image, processed_predictions):
    # Read the input image from local disk
    img = read_image(input_image)
    for label, label_index, score, box in processed_predictions:
        label = label + ", score: " + str(round(score, 2))
        # Draw the bounding box with its label
        box = torch.tensor(box)
        box = box.unsqueeze(0)
        img = draw_bounding_boxes(img, box, width=5, labels=[label], font_size=16)

    # Convert the annotated tensor to a PIL image
    img = torchvision.transforms.ToPILImage()(img)

    # Display the output
    img.show()

# Process the predictions in the output file
processed_predictions = process_batch_transform_output(predictions)
visualize_batch_transform_output("car.jpg", processed_predictions)

Finally, we visualize these mappings to understand our output.

Note: if the image doesn’t display in your notebook, please locate it in the directory tree on the left-hand side of JupyterLab and open it from there.

Running the example code

For a full working example, clone the amazon-sagemaker-examples GitHub repository and run the cells in the create_pytorch_model_sagemaker.ipynb notebook.

Conclusion

In this blog post, we showcased an end-to-end example of performing ML inference with an object detection model from the PyTorch Model Zoo using SageMaker batch transform. We covered loading the Faster R-CNN object detection model weights, saving them to an S3 bucket, writing an entrypoint file, and understanding the key parameters in the PyTorchModel API. Finally, we deployed the model and performed ML model inference, visualized the model output, and learned how to interpret the results.


About the Authors

Dipika Khullar is an ML Engineer in the Amazon ML Solutions Lab. She helps customers integrate ML solutions to solve their business problems. Most recently, she has built training and inference pipelines for media customers and predictive models for marketing.

Marcelo Aberle is an ML Engineer in the AWS AI organization. He is leading MLOps efforts at the Amazon ML Solutions Lab, helping customers design and implement scalable ML systems. His mission is to guide customers on their enterprise ML journey and accelerate their ML path to production.

Ninad Kulkarni is an Applied Scientist in the Amazon ML Solutions Lab. He helps customers adopt ML and AI by building solutions to address their business problems. Most recently, he has built predictive models for sport, automotive, and media customers.

Yash Shah is a Science Manager in the Amazon ML Solutions Lab. He and his team of applied scientists and ML engineers work on a range of ML use cases across healthcare, sports, automotive, and manufacturing.

Read More

Research @ Microsoft 2022: A look back at a year of accelerating progress in AI

Research @ Microsoft 2022: A look back at a year of accelerating progress in AI

2022 Microsoft Research - Year in review graphic

2022 has seen remarkable progress in foundational technologies that have helped to advance human knowledge and create new possibilities to address some of society’s most challenging problems. Significant advances in AI have also enabled Microsoft to bring new capabilities to customers through our products and services, including GitHub Copilot, an AI pair programmer capable of turning natural language prompts into code, and a preview of Microsoft Designer, a graphic design app that supports the creation of social media posts, invitations, posters, and one-of-a-kind images.

These offerings provide an early glimpse of how new AI capabilities, such as large language models, can enable people to interact with machines in increasingly powerful ways. They build on a significant, long-term commitment to fundamental research in computing and across the sciences, and the research community at Microsoft plays an integral role in advancing the state of the art in AI, while working closely with engineering teams and other partners to transform that progress into tangible benefits.

In 2022, Microsoft Research established AI4Science, a global organization applying the latest advances in AI and machine learning toward fundamentally transforming science; added to and expanded the capabilities of the company’s family of foundation models; worked to make these models and technologies more adaptable, collaborative, and efficient; further developed approaches to ensure that AI is used responsibly and in alignment with human needs; and pursued different approaches to AI, such as causal machine learning and reinforcement learning.

We shared our advances across AI and many other disciplines during our second annual Microsoft Research Summit, where members of our research community gathered virtually with their counterparts across industry and academia to discuss how emerging technologies are being explored and deployed to bring the greatest possible benefits to humanity.  

Plenary sessions at the event focused on the transformational impact of deep learning on the way we practice science, research that empowers medical practitioners and reduces inequities in healthcare, and emerging foundations for planet-scale computing. Further tracks and sessions over three days provided deeper dives into the future of the cloud; efficient large-scale AI; amplifying human productivity and creativity; delivering precision healthcare; building user trust through privacy, identity, and responsible AI; and enabling a resilient and sustainable world.

  • Blog

    Microsoft Climate Research Initiative (MCRI) 

    In June, the Microsoft Climate Research Initiative (MCRI) announced its first phase of collaborations among multidisciplinary researchers working together to accelerate cutting-edge research and transformative innovation in climate science and technology.

  • Publication

    New Future of Work Report 2022 

    In May, researchers across Microsoft published the New Future of Work Report 2022, which summarizes important recent research developments related to hybrid work. It highlights themes that have emerged in the findings of the past year and resurfaces older research that has become newly relevant.

In this blog post, we look back at some of the key achievements and notable work in AI and highlight other advances across our diverse, multidisciplinary, and global organization.

Advancing AI foundations and accelerating progress

Over the past year, the research community at Microsoft made significant contributions to the rapidly evolving landscape of powerful large-scale AI models. Microsoft Research and the Microsoft Turing team unveiled a new Turing Universal Language Representation model capable of performing both English and multilingual understanding tasks. In computer vision, advancements for the Project Florence-VL (Florence-Vision and Language) team spanned still imagery and video: its GIT model was the first to surpass human performance on the image captioning benchmark TextCaps; LAVENDER showed strong performance in video question answering, text-to-video retrieval, and video captioning; and GLIP and GLIPv2 combined localization and vision-language understanding. The group also introduced NUWA-Infinity, a model capable of converting text, images, and video into high-resolution images or long-duration video. Meanwhile, the Visual Computing Group scaled up its Transformer-based general-purpose computer vision architecture, Swin Transformer, achieving applicability across more vision tasks than ever before.

Researchers from Microsoft Research Asia and the Microsoft Turing team also introduced BEiT-3, a general-purpose multimodal foundation model that achieves state-of-the-art transfer performance on both vision and vision-language tasks. In BEiT-3, researchers introduce Multiway Transformers for general-purpose modeling, where the modular architecture enables both deep fusion and modality-specific encoding. Based on the shared backbone, BEiT-3 performs masked “language” modeling on images (Imglish), texts (English), and image-text pairs (“parallel sentences”) in a unified manner. The code and pretrained models will be available on GitHub.

One of the most crucial accelerators of progress in AI is the ability to optimize training and inference for large-scale models. In 2022, the DeepSpeed team made a number of breakthroughs to improve mixture of experts (MoE) models, making them more efficient, faster, and less costly. Specifically, they were able to reduce training cost by 5x, reduce MoE parameter size by up to 3.7x, and reduce MoE inference latency by 7.3x while offering up to 4.5x faster and 9x cheaper inference for MoE models compared to quality-equivalent dense models.

Transforming scientific discovery and adding societal value

Our ability to comprehend and reason about the natural world has advanced over time, and the new AI4Science organization, announced in July, represents another turn in the evolution of scientific discovery. Machine learning is already being used in the natural sciences to model physical systems using observational data. AI4Science aims to dramatically accelerate our ability to model and predict natural phenomena by creating deep learning emulators that learn by using computational solutions to fundamental equations as training data.

This new paradigm can help scientists gain greater insight into natural phenomena, right down to their smallest components. Such molecular understanding and powerful computational tools can help accelerate the discovery of new materials to combat climate change, and new drugs to help support the prevention and treatment of disease.  

For instance, AI4Science’s Project Carbonix is working on globally accessible, at-scale solutions for decarbonizing the world economy, including reverse engineering materials that can pull carbon out of the environment and recycling carbon into materials. Collaborating on these efforts through the Microsoft Climate Research Initiative (MCRI) are domain experts from academia, industry, and government. Announced in June, MCRI is focused on areas such as carbon accounting, climate risk assessments, and decarbonization.

As part of the Generative Chemistry project, Microsoft researchers have been working with the global medicines company Novartis to develop and execute machine learning tools and human-in-the-loop approaches to enhance the entire drug discovery process. In April, they introduced MoLeR, a graph-based generative model for designing compounds that is more reflective of how chemists think about the process and is more efficient and practical than an earlier generative model the team developed. 

While AI4Science is focused on computational simulation, we have seen with projects like InnerEye that AI can have societal value in many other ways. In March, Microsoft acquired Nuance Communications Inc., further cementing the companies’ shared commitment to outcome-based AI across industries, particularly in healthcare. Tools like the integration of Microsoft Teams and Dragon Ambient eXperience (Nuance DAX) to help ease the administrative burden of physicians and support meaningful doctor-patient interactions are already making a difference.

Making AI more adaptable, collaborative, and efficient 

To help accelerate the capabilities of large-scale AI while building a landscape in which everyone can benefit from it, the research community at Microsoft aimed to drive progress in three areas: adaptability, collaboration, and efficiency.

To provide consistent value, AI systems must respond to changes in task and environment. Research in this area includes multi-task learning with task-aware routing of inputs, knowledge-infused decoding, model repurposing with data-centric ML, pruning, and cognitive science and brain-inspired AI. A good example of our work toward adaptability is GODEL, or Grounded Open Dialogue Language Model, which ushers in a new class of pretrained language models that enable chatbots to help with tasks and then engage in more general conversations.

Microsoft’s research into more collaborative AI includes AdaTest, which leverages human expertise alongside the generative power of large language models to help people more efficiently find and correct bugs in natural language processing models. Researchers have also explored expanding the use of AI in creative processes, including a project in which science fiction writer Gabrielle Loisel used OpenAI’s GPT-3 to co-author a novella and other stories.

To enable more people to make use of AI in an efficient and sustainable way, Microsoft researchers are pursuing several new architectures and training paradigms. This includes new modular architectures and novel techniques, such as DeepSpeed Compression, a composable library for extreme compression and zero-cost quantization, and Z-Code Mixture of Experts models, which boost translation efficiency and were deployed in Microsoft Translator in 2022.  

In December, researchers unveiled AutoDistil, a new technique that leverages knowledge distillation and neural architecture search to improve the balance between cost and performance when generating compressed models. They also introduced AdaMix, which improves the fine-tuning of large pretrained models for downstream tasks using mixture of adaptations modules for parameter-efficient model tuning. And vision-language model compression research on the lottery ticket hypothesis showed that pretrained language models can be significantly compressed without hurting their performance.

  • Blog

    Infusing AI into cloud computing systems 

    Cloud Intelligence/AIOps is a rapidly emerging technology trend and an interdisciplinary research direction across system, software engineering, and AI/ML communities. In this blog post from November, the researchers behind Microsoft’s AIOps work outline a research vision to make the cloud more autonomous, proactive, and manageable.

Building and deploying AI responsibly

Building AI that maximizes its benefit to humanity, and does so equitably, requires considering both the opportunities and risks that come with each new advancement in line with our guiding principles: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability.

Helping to put these principles into practice is Microsoft’s Responsible AI Standard, which the company made publicly available in June. The standard comprises tools and steps that AI practitioners can execute in their workflows today to help ensure that building AI responsibly is baked into every stage of development. These standards will evolve as the tools and resources to responsibly build AI evolve in response to the rapid pace of AI advancement, particularly pertaining to the growing size of AI models and the new challenges they bring.

With FedKD and InclusiveFL, researchers tackled some of the obstacles in applying federated learning, an ML method for protecting privacy, to model training. Two separate teams explored solutions for the harmful language that large generative models can reproduce—one presenting a unified framework for both detoxifying and debiasing models and another introducing methods for making content moderation tools more robust. Meanwhile, researchers sought to strengthen human-AI collaboration by giving users more insight into how models arrive at their outputs via explanations provided by the models themselves.

The responsible development of AI also means deploying technologies that operate the way they were designed to, and the way people expect them to. In a pair of blog posts, researchers draw on their respective experiences, one developing a technology to support social agency in children who are born blind and the other supporting mental health practitioners in guiding patient treatment, to stress the need for multiple measures of performance in determining the readiness of increasingly complex AI systems and for incorporating domain experts and user research throughout the development process.

Advancing AI for decision making

Building the next generation of AI requires continuous research into fundamental new AI innovations. Two significant areas of study in 2022 were causal ML and reinforcement learning.

Causal ML

Identifying causal effects is an integral part of scientific inquiry. It helps us understand everything from educational outcomes to the effects of social policies to risk factors for diseases. Questions of cause and effect are also critical for the design and data-driven evaluation of many technological systems we build today.  

This year, Microsoft Research continued its work on causal ML, which combines traditional machine learning with causal inference methods. To help data scientists better understand and deploy causal inference, Microsoft researchers built the DoWhy library, an end-to-end causal inference tool, in 2018. To broaden access to this critical knowledge base, DoWhy has now migrated to an independent open-source governance model in a new PyWhy GitHub organization. As part of this new collaborative model, Amazon Web Services is contributing new technology based on structural causal models.

At this year’s Conference on Neural Information Processing Systems (NeurIPS), researchers presented a suite of open-source causal tools and libraries that aims to simultaneously provide core causal AI functionality to practitioners and create a platform for research advances to be rapidly deployed. This includes ShowWhy, a no-code user interface suite that empowers domain experts to become decision scientists. We hope that our work accelerates use-inspired basic research for improvement of causal AI.

Reinforcement learning (RL)

Reinforcement learning is a powerful tool for learning which behaviors are likely to produce the best outcomes in a given scenario, typically through trial and error. But this powerful tool faces some challenges. Trial and error can consume enormous resources when applied to large datasets. And for many real-time applications, there’s no room to learn from mistakes.   

To address RL’s computational bottleneck, Microsoft researchers developed Path Predictive Elimination, a reinforcement learning method that is robust enough to remove noise from continuously changing environments. Also in 2022, a Microsoft team released MoCapAct, a library of pretrained simulated models to enable advanced research on artificial humanoid control at a fraction of the compute resources currently required.  

Researchers also developed a new method for using offline RL to augment human-designed strategies for making critical decisions. This team deployed game theory to design algorithms that can use existing data to learn policies that improve on current strategies.

Thank you for reading

2022 was an exciting year for research, and we look forward to the future breakthroughs our global research community will deliver. In the coming year, you can expect to hear more from us about our vision, and the impact we hope to achieve. We appreciate the opportunity to share our work with you, and we hope you will subscribe to the Microsoft Research Newsletter for the latest developments.

Writers and Editors
Elise Ballard
Kristina Dodge
Kate Forster
Chris Stetkiewicz
Larry West

Managing Editor
Amber Tingle

Project Manager
Amanda Melfi

Graphic Designer
Matt Sanderson

Editor in Chief
Matt Corwine

The post Research @ Microsoft 2022: A look back at a year of accelerating progress in AI appeared first on Microsoft Research.

Read More

Top 5 Edge AI Trends to Watch in 2023

Top 5 Edge AI Trends to Watch in 2023

With the state of the world in constant flux in 2022, some technology trends were put on hold while others were accelerated. Supply chain challenges, labor shortages and economic uncertainty had companies reevaluating their budgets for new technology.

For many organizations, AI is viewed as the solution to much of this uncertainty, bringing improved efficiency, differentiation, automation and reduced cost.

Until now, AI has operated almost exclusively in the cloud. But increasingly diverse streams of data are being generated around the clock from sensors at the edge. These require real-time inference, which is leading more AI deployments to move to edge computing.

For airports, stores, hospitals and more, AI brings advanced efficiency, automation and even cost reduction, which is why edge AI adoption accelerated last year.

In 2023, expect to see a similarly challenging environment, which will drive the following edge AI trends.

1. Focus on AI Use Cases With High ROI

Return on investment is always an important factor for technology purchases. But with companies looking for new ways to reduce cost and gain a competitive advantage, expect AI projects to become more common.

A few years ago, AI was often viewed as experimental, but, according to research from IBM, 35% of companies today report using AI in their business, and an additional 42% report they’re exploring AI. Edge AI use cases, in particular, can help increase efficiency and reduce cost, making them a compelling place to focus new investments.

For example, supermarkets and big box stores are investing heavily in AI at self-checkout machines to reduce loss from theft and human error. With solutions that can detect errors with 98% accuracy, companies can quickly see a return on investment in a matter of months.

AI industrial inspection also has an immediate return, helping augment human inspectors on factory lines. Bootstrapped with synthetic data, AI can detect defects at a much higher rate and address a variety of defects that simply cannot be captured manually, resulting in more products with fewer false negative or positive detections. 

2. Growth in Human and Machine Collaboration

Often seen as a far-off use case of edge AI, the use of intelligent machines and autonomous robots is on the rise. From automated distribution facilities to meet the demands of same-day deliveries, to robots monitoring grocery stores for spills and stock outs, to robot arms working alongside humans on a production line, these intelligent machines are becoming more common.

According to Gartner, the use of robotics and intelligent machines is expected to grow significantly by the end of the decade. “By 2030, 80% of humans will engage with smart robots on a daily basis, due to smart robot advancements in intelligence, social interactions and human augmentation capabilities, up from less than 10% today.” (Gartner, “Emerging Technologies: AI Roadmap for Smart Robots — Journey to a Super Intelligent Humanoid Robot”, G00761328, June 2022)

For this future to happen, one area of focus that needs attention in 2023 is aiding human and machine collaboration. Automated processes benefit from the strength and repeatable actions performed by robots, leaving humans to perform specialized and dexterous tasks that are more suited to our skills. Expect organizations to invest more in this human-machine collaboration in 2023 as a way to alleviate labor shortages and supply chain issues.

3. New AI Use Cases for Safety

Related to the trend of human and machine collaboration is that of AI functional safety. First seen in autonomous vehicles, AI-based functional safety is now being explored by more companies looking to add proactive and flexible safety measures to industrial environments.

Historically, functional safety has been applied in industrial environments in a binary way, with the primary role of the safety function to immediately stop the equipment from causing any harm or damage when an event is triggered. AI, on the other hand, works in combination with context awareness to predict an event happening. This allows AI to proactively send alerts regarding future potential safety events, preventing the events before they happen, which can drastically reduce safety incidents and related downtime in industrial environments.

New functional safety standards that define the use of AI in safety are expected to be released in 2023 and will open the door for early adoption in factories, warehouses, agricultural use cases and more. One of the first areas for AI safety adoption will focus on improved worker safety, including worker posture detection, falling object prevention and personal protection equipment detection. 

4. IT Focus on Cybersecurity at the Edge

Cyber attacks rose 50% in 2021 and haven’t slowed down since, making this a top focus for IT organizations. Edge computing, particularly when combined with AI use cases, can increase cybersecurity risk for many organizations by creating a wider attack surface outside of the traditional data center and its firewalls.

Edge AI in industries like manufacturing, energy, and transportation requires IT teams to expand their security footprint into environments traditionally managed by operational technology teams. Operational technology teams typically focus on operational efficiency as their main metric, relying on air-gapped systems with no network connectivity to the outside world. Edge AI use cases will start to break down these restrictions, requiring IT to enable cloud connectivity while still maintaining strict security standards.

With billions of devices and sensors around the world that will all be connected to the internet, IT organizations have to both protect edge devices from direct attack and consider network and cloud security. In 2023, expect to see AI applied to cybersecurity. Log data generated from IoT networks can now be fed through intelligent security models that can flag suspicious behavior and notify security teams to take action. 

5. Connecting Digital Twins to the Edge

The term digital twin refers to perfectly synchronized, physically accurate virtual representations of real-world assets, processes or environments. Last year, NVIDIA partnered with Siemens to enable industrial metaverse use cases, helping customers accelerate their adoption of industrial automation technologies. Leading companies spanning manufacturing, retail, consumer packaged goods and telco, such as BMW, Lowe’s, PepsiCo and Heavy.AI, have also begun building operational digital twins allowing them to simulate and optimize their production environments.

What connects digital twins to the physical world and edge computing is the explosion of IoT sensors and data that is driving both these trends. In 2023, we’ll see organizations increasingly connect live data from their physical environment into their virtual simulations. They’ll move away from historical data-based simulations toward a live, digital environment — a true digital twin.

By connecting live data from the physical world to their digital twins, organizations can gain real-time insight into their environment, allowing them to make faster and more informed decisions. While still early, expect to see massive growth in this space next year for ecosystem providers and in customer adoption.

The Year of Edge AI 

While the 2023 economic environment remains uncertain, edge AI will certainly be an area of investment for organizations looking to drive automation and efficiency. Many of the trends we saw take off last year continue to accelerate with the new focus on initiatives that help drive sales, reduce costs, grow customer satisfaction and enhance operational efficiency.

Visit NVIDIA’s Edge Computing Solutions page to learn more about edge AI and how we’re helping organizations implement it in their environments today.

The post Top 5 Edge AI Trends to Watch in 2023 appeared first on NVIDIA Blog.

Read More

New performance improvements in Amazon SageMaker model parallel library

New performance improvements in Amazon SageMaker model parallel library

Foundation models are large deep learning models trained on a vast quantity of data at scale. They can be further fine-tuned to perform a variety of downstream tasks and form the core backbone of enabling several AI applications. The most prominent category is large language models (LLMs), including auto-regressive models such as GPT variants trained to complete natural text. LLMs typically contain billions of parameters, so they rarely fit on a single accelerator and require model parallelism techniques. Another category is diffusion models, notably Stable Diffusion, which has pushed AI image generation to an unprecedented milestone where remarkable visuals can be generated from a simple text description. Diffusion models are typically much smaller than LLMs, but distributed training still plays a critical role in facilitating their development.

The SageMaker model parallel (SMP) library is a large-model training solution available on the Amazon SageMaker platform. It can be integrated with PyTorch models to easily apply a range of state-of-the-art large-model distributed training techniques to train at scale. Earlier this year, SMP launched sharded data parallelism, a distributed training technique powered by Amazon in-house MiCS technology under the hood. Sharded data parallelism shards model parameters, gradients, and optimizer states across data-parallel workers. MiCS performs a number of optimizations, including scale-aware partitioning, to provide near-linear scalability. In Train gigantic models with near-linear scaling using sharded data parallelism, we shared that sharded data parallelism in SMP achieved a 39.7% speedup compared to DeepSpeed ZeRO-3 on a 30B parameter GPT-2 model with sequence length 2048.

To help our customers further minimize training costs and accelerate time-to-market, we are thrilled to introduce two new performance improvements in SageMaker model parallel: SMDDP Collectives and FlashAttention. SMDDP Collectives, offered by the SageMaker distributed data parallel library, is the most performant collective library on AWS infrastructure for large model training. FlashAttention, introduced in Dao et al., re-implements the attention mechanism in an IO-aware manner, reducing the memory bandwidth requirement, improving attention speed, and shrinking the memory footprint. These two components collectively push our sharded data parallel technique to be 30.58% faster when training a 100B parameter GPT-NeoX model on 32 p4d.24xlarge instances. For customers who are already using sharded data parallelism on supported models, no code changes are necessary to benefit from the performance boost offered by these latest features. Stability AI, the inventor of the Stable Diffusion family of models that showed unparalleled image generation abilities, chose to use SMP to build foundation models. With SMP, Stability AI achieved 163 TFLOPs per GPU for a 13B-parameter GPT-NeoX on 32 p4d.24xlarge instances, a 58% speedup compared to DeepSpeed. You can learn more about Stability AI’s mission and partnership with AWS in the Stability AI CEO’s talk at AWS re:Invent 2022 or in this blog post.

“Our mission at Stability AI is to build the foundation to activate humanity’s potential through AI. To achieve this mission, we need to efficiently train open-source foundation models on hundreds of accelerated compute instances. We rely on SageMaker and its distributed training libraries to optimize performance and implement state-of-the-art strategies to shard models and data across our training cluster. These optimizations reduce our training costs, help us meet customer needs faster, and speed up the development of new models.”

— Emad Mostaque, Founder and CEO of Stability AI.

In this blog post, we’ll first present our latest performance improvements in the SageMaker model parallel library. Then, we’ll revisit how to train foundation models using sharded data parallelism. Finally, we’ll benchmark the performance of 13B, 50B, and 100B parameter auto-regressive models and wrap up with future work.

New performance improvements in SageMaker model parallel library

Starting from AWS Deep Learning Containers (DLC) PyTorch 1.12.1, SageMaker model parallel library v1.13 comes with the following two new components that are critical in improving training performance. They are currently available on ml.p4d.24xlarge instances with Elastic Fabric Adapter (EFA) enabled:

1. AWS-optimized AllGather from SMDDP Collectives

In sharded data parallel, since only a shard of the model state is present on a GPU, an AllGather collective is needed to gather the full set of parameters from across all GPUs in the sharding group during forward or backward pass computations. In the previous versions of SageMaker model parallel, we used NVIDIA Collective Communications Library (NCCL) for these collectives. However, NCCL is a general purpose collective communications library not designed for AWS infrastructure, which leads to sub-optimal performance even with EFA enabled.

Previously, we had developed the SMDDP Collectives library that provided an AWS-optimized implementation of the All-Reduce collective to speedup performance of pure data parallel training. To improve the performance of large model training with sharded data parallelism, we expanded the SMDDP Collectives library to include an optimized implementation of the AllGather collective. The key advantage of SMDDP Collectives AllGather is that it adopts an all-to-all-type communication pattern for inter-node communication, enabling our collective to have high-throughput and be less latency-sensitive. In addition, our AllGather collective offloads the communication-related processing to the CPU, thereby freeing up valuable GPU cycles for gradient computation, leading to significant performance improvement especially on large models.

2. FlashAttention

In modern transformer architecture, one of the largest sources of memory consumption is the activation footprint in the self-attention layer. This is because each attention head computes an SxS attention matrix for each input, where S is the sequence length, and this matrix goes through several operations, such as dropout, softmax, and matrix multiplication, with each intermediate output requiring memory space for use in back-propagation.

FlashAttention (Dao et al.) is a recent innovation from HazyResearch at Stanford that re-implements the self-attention mechanism in an I/O-aware manner. The main insight behind FlashAttention is that the self-attention mechanism is bottlenecked by memory bandwidth to and from GPU high bandwidth memory (HBM). This means that the self-attention layer can be computed in chunks across the sequence dimension, with each chunk going through the entire self-attention pipeline at a time. The intermediate results for a chunk are stored in the high-bandwidth SRAM, avoiding the expensive round-trip to the HBM for every iteration. Although a naive implementation would run into the issue of the cross-chunk dependency at the softmax layer, FlashAttention introduces a clever implementation that side-steps this dependency. Combined with re-computation in the backward pass, FlashAttention results in substantial memory savings and performance improvement (25% faster training for GPT-NeoX 13B over 16 p4d nodes), due to avoidance of the HBM round-trip and storage of SxS matrices. You can find visuals and more explanations in HazyResearch’s FlashAttention repository.
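
To illustrate the core idea (this is not the actual FlashAttention CUDA kernel), the following is a simplified sketch of single-head attention computed chunk by chunk over the keys with an online softmax, so the full SxS score matrix is never materialized. Function and variable names are illustrative.

import torch

def chunked_attention(q, k, v, chunk_size=128):
    # q: (Tq, d), k and v: (Tk, d); returns softmax(q @ k.T / sqrt(d)) @ v
    scale = q.shape[-1] ** -0.5
    out = torch.zeros_like(q)
    row_max = torch.full((q.shape[0], 1), float("-inf"), dtype=q.dtype, device=q.device)
    row_sum = torch.zeros(q.shape[0], 1, dtype=q.dtype, device=q.device)

    for start in range(0, k.shape[0], chunk_size):
        k_chunk = k[start:start + chunk_size]
        v_chunk = v[start:start + chunk_size]
        scores = (q @ k_chunk.T) * scale  # (Tq, chunk) block of the score matrix

        # Online softmax: update the running row-wise max and rescale prior partial results
        new_max = torch.maximum(row_max, scores.max(dim=-1, keepdim=True).values)
        correction = torch.exp(row_max - new_max)
        probs = torch.exp(scores - new_max)

        out = out * correction + probs @ v_chunk
        row_sum = row_sum * correction + probs.sum(dim=-1, keepdim=True)
        row_max = new_max

    return out / row_sum

The result matches standard attention up to floating-point error; FlashAttention additionally fuses these steps into a GPU kernel that keeps each chunk’s intermediates in SRAM instead of writing them back to HBM.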

Train foundation models at scale with SageMaker model parallel

To train foundation models with SMP powered by SMDDP Collectives, no additional changes are required in your sharded data parallel training jobs. If you’re new to sharded data parallelism, follow this complete tutorial notebook and blog post that will walk you through the entire process, from data processing, defining and submitting training jobs, to monitoring training logs. A ready-to-use training script for the GPT-2 model can be found at train_gpt_simple.py. For training a different model type, you can follow the API document to learn about how to apply SMP APIs.

We highlight the key hyperparameters in the PyTorch estimator of a sharded data parallel training job below. The hyperparameter ddp_dist_backend in smp_options now has a new option, "auto", as its default value. With "auto", SMP uses AWS-optimized AllGather for sharded data parallelism jobs and falls back to NCCL otherwise. You can refer to this document for supported configurations. If you want to run sharded data parallelism in SMP specifically with NCCL as the communication backend of choice, you can set "ddp_dist_backend" to "nccl" in smp_options.

import sagemaker
from sagemaker.pytorch import PyTorch

smp_options = {
    "enabled": True,
    "parameters": {
        "ddp": True,
        "ddp_dist_backend": "auto", #OR "nccl" to disable SMDDP Collectives
        # To enable sharded data parallelism.
        # Here we shard model states across 128 GPUs.
        "sharded_data_parallel_degree": 128,  
    }
}

smp_estimator = PyTorch(
    entry_point="train_gpt_simple.py",
    role=sagemaker.get_execution_role(),
    instance_type='ml.p4d.24xlarge',
    instance_count=32,
    distribution={
        "smdistributed": {"modelparallel": smp_options},
        ...
    },
    ...
)

smp_estimator.fit(inputs=data_channels)

With the latest SMP v1.13 release, the sharded data parallel training technique supports FlashAttention for popular models including BERT, RoBERTa, GPT-2, GPT-J, GPT-Neo, and GPT-NeoX out of the box. This is enabled by passing tensor_parallelism=True during model creation without setting tensor_parallel_degree. You can find an example in the same training script train_gpt_simple.py.

Benchmarking performance

We benchmarked sharded data parallelism in the SageMaker model parallel library on three different scales of models to understand how the two new features, FlashAttention and AWS-optimized AllGather, contribute to performance improvement. Placement group is not required to reproduce these benchmarks on SageMaker.

13B parameter GPT-NeoX

In this setting, we focus on understanding the performance gain contributed by FlashAttention, and we leave AWS-optimized AllGather out of the picture. Using FlashAttention saves substantial GPU memory, which helps us increase the batch size or reduce the sharding degree, thereby improving performance. As the results below show, we observed an average speedup of about 20.4% in SMP with FlashAttention for the 13B parameter GPT-NeoX model on various configurations across 16-64 p4d nodes. Memory usage during standard attention computation scales quadratically with sequence length, but FlashAttention’s memory usage is linear in sequence length. Hence FlashAttention becomes even more helpful as sequence length increases and makes it possible to use larger sequence lengths. Being memory-efficient without trading off model quality, FlashAttention has quickly gained traction in the large model training community in the past months, including integration with Hugging Face Diffusers and MosaicML.

All runs below use a 13B GPT-NeoX model with sequence length 2048, FP16, activation checkpointing, sharded_data_parallel_degree of 64, and gradient accumulation of 1. Throughput is reported in TFLOPs/GPU.

  • 16 p4d.24xlarge nodes, global batch size 1024: 130 without FlashAttention, 159 with FlashAttention (22.31% speedup)
  • 32 p4d.24xlarge nodes, global batch size 2048: 131 without FlashAttention, 157 with FlashAttention (19.85% speedup)
  • 64 p4d.24xlarge nodes, global batch size 4096: 131 without FlashAttention, 156 with FlashAttention (19.08% speedup)

50B parameter Bloom

Now, we look at how the AWS-optimized AllGather from SMDDP Collectives speeds up large model training with SMP. We benchmark a 50B-parameter Bloom model and compare the performance with and without the AWS-optimized AllGather collective. We observe that SMDDP Collectives speeds up model training by up to 40% across training jobs from 32 to 64 nodes. SMDDP Collectives helps achieve better performance through better utilization of the 400 Gbps network bandwidth available with p4d.24xlarge instances. This, coupled with the design choice to offload communication-related processing to the CPU, helps achieve good compute-to-network overlap, leading to optimized performance. Compute-to-network overlap becomes especially important for large models, since the amount of data communicated across nodes scales linearly with model size.

All runs below use a 50B Bloom model with sequence length 2048, BF16, activation checkpointing, sharded_data_parallel_degree of 128, and gradient accumulation of 1. Throughput is reported in TFLOPs/GPU.

  • 32 p4d.24xlarge nodes, global batch size 2048: 102 without AWS-optimized AllGather, 143 with AWS-optimized AllGather (40.20% speedup)
  • 64 p4d.24xlarge nodes, global batch size 4096: 101 without AWS-optimized AllGather, 140 with AWS-optimized AllGather (38.61% speedup)

100B parameter GPT-NeoX

Finally, we benchmark SMP with both of the latest features enabled. The results show that the new SMP v1.13 release is 30% faster than the previous version on a 100B-parameter GPT-NeoX model.

All runs below use a 100B GPT-NeoX model with sequence length 2048, FP16, activation checkpointing, offload_activations, and sharded_data_parallel_degree of 256. Without FlashAttention, the batch size is 4 with gradient accumulation of 2 steps; with FlashAttention, the batch size is 8 with no gradient accumulation. Throughput is reported in TFLOPs/GPU.

  • 32 p4d.24xlarge nodes, global batch size 2048: 121 without FlashAttention and AWS-optimized AllGather, 158 with both features (30.58% speedup)
  • 64 p4d.24xlarge nodes, global batch size 4096: 122 without FlashAttention and AWS-optimized AllGather, 158 with both features (29.51% speedup)

For future work, we’ll be working on supporting an AWS-optimized Reduce-Scatter in SMDDP Collectives. The Reduce-Scatter collective is critical in averaging and sharding gradients computed in the backward pass. We expect this to further speed up the SMP library in future releases.

Conclusion

In this post, we discuss the two latest performance improvements for sharded data parallel technique in SageMaker model parallel library. LLMs show great promise in improving the quality and re-usability of ML models. AWS teams are working closely with customers to keep reducing their training costs and time-to-market. You can find more SageMaker model parallel examples in Amazon SageMaker Examples GitHub repo or attend our next distributed training workshops. If you are interested in speeding up large model training, check out these features and let us know what you build!


About the authors

Arjun Balasubramanian is a Senior Software Engineer at AWS focused on building high-performance, hardware accelerated collective communication algorithms for distributed deep learning. He is broadly interested in systems for large-scale machine learning and networking. Outside of work, he enjoys traveling and playing various sports.

Zhaoqi Zhu is a Software Development Engineer at AWS, specializing in distributed deep learning systems and working on the SageMaker Distributed Data Parallel library. Outside of work, Zhaoqi is passionate about soccer and hopes to not receive any red card in the upcoming season.

Can Karakus is a Senior Applied Scientist at AWS, optimizing large-scale distributed deep learning on AWS. His research interests cover deep learning, distributed optimization, distributed systems, and information theory. Outside of work, he enjoys cycling, traveling, reading and learning.

Rahul Huilgol is a Senior Software Engineer at AWS. He works on distributed deep learning systems, towards making it easy and performant to train large deep learning models in the cloud. In his spare time, he enjoys photography, biking and gardening.

Suhit Kodgule is a Software Development Engineer with AWS Artificial Intelligence group working on deep learning frameworks. In his spare time, he enjoys hiking, traveling and cooking.

Fei Wu is a Software Engineer at AWS. He works on distributed training for large-scale deep learning models on cloud. Outside of work, he enjoys basketball, gaming and cooking.

Read More

Next generation Amazon SageMaker Experiments – Organize, track, and compare your machine learning trainings at scale

Next generation Amazon SageMaker Experiments – Organize, track, and compare your machine learning trainings at scale

Today, we’re happy to announce updates to our Amazon SageMaker Experiments capability of Amazon SageMaker that lets you organize, track, compare and evaluate machine learning (ML) experiments and model versions from any integrated development environment (IDE) using the SageMaker Python SDK or boto3, including local Jupyter Notebooks.

Machine learning (ML) is an iterative process. When solving a new use case, data scientists and ML engineers iterate through various parameters to find the best model configurations (aka hyperparameters) that can be used in production to solve the identified business challenge. Over time, after experimenting with multiple models and hyperparameters, it becomes difficult for ML teams to efficiently manage model runs to find the optimal one without a tool to keep track of the different experiments. Experiment tracking systems streamline the process of comparing different iterations and help simplify collaboration and communication in a team, thereby increasing productivity and saving time. This is achieved by organizing and managing ML experiments in an effortless way to draw conclusions from them, for example, finding the training run with the best accuracy.

To solve this challenge, SageMaker provides SageMaker Experiments, a fully integrated SageMaker capability. It provides the flexibility to log your model metrics, parameters, files, artifacts, plot charts from the different metrics, capture various metadata, search through them and support model reproducibility. Data scientists can quickly compare the performance and hyperparameters for model evaluation through visual charts and tables. They can also use SageMaker Experiments to download the created charts and share the model evaluation with their stakeholders.

With the new updates to SageMaker Experiments, it is now a part of the SageMaker SDK, simplifying the data scientist work and eliminating the need to install an extra library to manage multiple model executions. We are introducing the following new core concepts:

  • Experiment: A collection of runs that are grouped together. An experiment includes runs for multiple types that can be initiated from anywhere using the SageMaker Python SDK.
  • Run: Each execution step of a model training process. A run consists of all the inputs, parameters, configurations, and results for one iteration of model training. Custom parameters and metrics can be logged using the log_parameter, log_parameters, and log_metric functions. Custom input and output can be logged using the log_file function.

The concepts that are implemented as part of a Run class are made available from any IDE where the SageMaker Python SDK is installed. For SageMaker training, processing, and transform jobs, the SageMaker Experiment Run is automatically passed to the job if the job is invoked within a run context. You can recover the run object using load_run() from your job. Finally, with the new functionalities’ integration, data scientists can also automatically log a confusion matrix, precision and recall graphs, and a ROC curve for classification use cases using the run.log_confusion_matrix, run.log_precision_recall, and run.log_roc_curve functions, respectively.

In this blog post, we will provide examples of how to use the new SageMaker Experiments functionalities in a Jupyter notebook via the SageMaker SDK. We will demonstrate these capabilities using a PyTorch example that trains an MNIST handwritten digits classification model. The experiment will be organized as follows:

  1. Creating experiment’s runs and logging parameters: We will first create a new experiment, start a new run for this experiment, and log parameters to it.
  2. Logging model performance metrics: We will log model performance metrics and plot metric graphs.
  3. Comparing model runs: We will compare different model runs according to the model hyperparameters. We will discuss how to compare those runs and how to use SageMaker Experiments to select the best model.
  4. Running experiments from SageMaker jobs: We will also provide an example of how to automatically share your experiment’s context with a SageMaker processing, training or batch transform job. This allows you to automatically recover your run context with the load_run function inside your job.
  5. Integrating SageMaker Clarify reports: We will demonstrate how we can now integrate SageMaker Clarify bias and explainability reports to a single view with your trained model report.

Prerequisites

For this blog post, we will use Amazon SageMaker Studio to showcase how to log metrics from a Studio notebook using the updated SageMaker Experiments functionalities. To execute the commands presented in our example, you need the following prerequisites:

  • SageMaker Studio Domain
  • SageMaker Studio user profile with SageMaker full access
  • A SageMaker Studio notebook with at least an ml.t3.medium instance type

If you do not have a SageMaker Domain and user profile available, you can create one using this quick setup guide.

Logging parameters

For this exercise, we will use torchvision, a PyTorch package that provides popular datasets, model architectures, and common image transformations for computer vision. SageMaker Studio provides a set of Docker images for common data science use cases that are made available in Amazon ECR. For PyTorch, you have the option of selecting images optimized for CPU or GPU training. For this example, we will select the image PyTorch 1.12 Python 3.8 CPU Optimized and the Python 3 kernel. The examples described below will focus on the SageMaker Experiments functionalities and are not code complete.

Let’s download the data with the torchvision package and track the number of data samples for the train and test datasets as parameters with SageMaker Experiments. For this example, let’s assume train_set and test_set as already downloaded torchvision datasets.
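
For completeness, the datasets referenced above could be obtained with the standard torchvision API, for example as follows (the local "data" directory is an assumption):

from torchvision import datasets, transforms

# Download the MNIST train and test splits to a local "data" directory
transform = transforms.ToTensor()
train_set = datasets.MNIST("data", train=True, download=True, transform=transform)
test_set = datasets.MNIST("data", train=False, download=True, transform=transform)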

from sagemaker.session import Session
from sagemaker.experiments.run import Run
import os

# create an experiment and start a new run
experiment_name = "local-experiment-example"
run_name = "experiment-run"

with Run(experiment_name=experiment_name, sagemaker_session=Session(), run_name=run_name) as run:
    run.log_parameters({
        "num_train_samples": len(train_set.data),
        "num_test_samples": len(test_set.data)
    })
    for f in os.listdir(train_set.raw_folder):
        print("Logging", train_set.raw_folder+"/"+f)
        run.log_file(train_set.raw_folder+"/"+f, name=f, is_output=False)

In this example, we use run.log_parameters to log the number of train and test data samples and run.log_file to upload the raw datasets to Amazon S3 and log them as inputs to our experiment.

Training a model and logging model metrics

Now that we’ve downloaded our MNIST dataset, let’s train a CNN model to recognize the digits. While training the model, we want to load our existing experiment run, log new parameters to it, and track the model performance by logging model metrics.

We can use the load_run function to load our previous run and use it to log our model training.

from sagemaker.experiments.run import load_run

with load_run(experiment_name=experiment_name, run_name=run_name, sagemaker_session=Session()) as run:
    train_model(
        run=run,
        train_set=train_set,
        test_set=test_set,
        epochs=10,
        hidden_channels=5,
        optimizer="adam"
    )

We can then use run.log_parameter and run.log_parameters to log one or multiple model parameters to our run.

# log the parameters of your model
run.log_parameter("device", "cpu")
run.log_parameters({
    "data_dir": data_dir,
    "optimizer": optimizer,
    "epochs": epochs,
    "hidden_channels": hidden_channels
})

And we can use run.log_metric to log performance metrics to our experiment.

run.log_metric(name=metric_type+":loss", value=loss, step=epoch)
run.log_metric(name=metric_type+":accuracy", value=accuracy, step=epoch)

For classification models, you can also use run.log_confusion_matrix, run.log_precision_recall, and run.log_roc_curve to automatically plot the confusion matrix, precision-recall graph, and ROC curve of your model. Since our model solves a multiclass classification problem, let’s log only the confusion matrix for it.

# log confusion matrix
with torch.no_grad():
    for data, target in test_loader:
        data, target = data.to(device), target.to(device)
        output = model(data)
        pred = output.max(1, keepdim=True)[1] 
        run.log_confusion_matrix(target, pred, "Confusion-Matrix-Test-Data")

When looking at our run details, we can now see the generated metrics as shown in the screenshot below:

The run details page provides further information about the metrics.

And the new model parameters are tracked on the parameters overview page.

You can also analyze your model performance by class using the automatically plotted confusion matrix, which can also be downloaded and used for different reports. And you can plot extra graphs to analyze the performance of your model based on the logged metrics.

Comparing multiple model parameters

As a data scientist, you want to find the best possible model. That includes training a model multiple times with different hyperparameters and comparing the performance of the model with those hyperparameters. To do so, SageMaker Experiments allows us to create multiple runs in the same experiment. Let’s explore this concept by training our model with different num_hidden_channels and optimizers.

# define the list of parameters to train the model with
num_hidden_channel_param = [5, 10, 20]
optimizer_param = ["adam", "sgd"]
run_id = 0
# train the model using SageMaker Experiments to track the model parameters, 
# metrics and performance
sm_session = Session()
for i, num_hidden_channel in enumerate(num_hidden_channel_param):
    for k, optimizer in enumerate(optimizer_param):
        run_id += 1
        run_name = "experiment-run-"+str(run_id)
        print(run_name)
        print(f"Training model with: {num_hidden_channel} hidden channels and {optimizer} as optimizer")
        # Defining an experiment run for each model training run
        with Run(experiment_name=experiment_name, run_name=run_name, sagemaker_session=sm_session) as run:
            train_model(
                run=run, 
                train_set=train_set,
                test_set=test_set,
                epochs=10, 
                hidden_channels=num_hidden_channel,
                optimizer=optimizer
            )

We are now creating six new runs for our experiment, each logging its model parameters, metrics, and confusion matrix. We can then compare the runs to select the best-performing model for the problem. When analyzing the runs, we can plot the metric graphs of the different runs in a single chart to compare their performance across the training steps (or epochs).
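Besides the Experiments UI, one way to compare runs programmatically is the ExperimentAnalytics class from the SageMaker SDK, which exports the logged parameters and metric summaries of an experiment into a pandas DataFrame. A minimal sketch:

from sagemaker.analytics import ExperimentAnalytics
from sagemaker.session import Session

# export all runs of the experiment (parameters and metric summaries) as a DataFrame
analytics = ExperimentAnalytics(
    experiment_name=experiment_name,
    sagemaker_session=Session(),
)
df = analytics.dataframe()
print(df.columns)  # inspect the available parameter and metric columns, then filter and sort as needed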

Using SageMaker Experiments with SageMaker training, processing and batch transform jobs

In the example above, we used SageMaker Experiments to log model performance from a SageMaker Studio notebook where the model was trained locally. We can do the same to log model performance from SageMaker processing, training, and batch transform jobs. With the new automatic context-passing capabilities, we don't need to explicitly share the experiment configuration with the SageMaker job, because it is automatically captured.

The following example focuses on the SageMaker Experiments functionality and is not code complete.

from sagemaker.pytorch import PyTorch
from sagemaker.experiments.run import Run
from sagemaker.session import Session
from sagemaker import get_execution_role
role = get_execution_role()

# set new experiment configuration
exp_name = "training-job-experiment-example"
run_name = "experiment-run-example"

# Start training job with experiment setting
with Run(experiment_name=exp_name, run_name=run_name, sagemaker_session=Session()) as run:
    est = PyTorch(
        entry_point="<MODEL_ENTRY_POINT>",
        dependencies=["<MODEL_DEPENDENCIES>"],
        role=role,
        model_dir=False,
        framework_version="1.12",
        py_version="py38",
        instance_type='ml.c5.xlarge',
        instance_count=1,
        hyperparameters={
            "epochs": 10,
            "hidden_channels":5,
            "optimizer": "adam",
        },
        keep_alive_period_in_seconds=3600
    )
    
    est.fit()

In our model script file, we can get the run context using load_run(). In SageMaker processing and training jobs, we don't need to provide the experiment configuration when loading the run. For batch transform jobs, we need to provide experiment_name and run_name to load the experiment's configuration.

with load_run() as run:
    run.log_parameters({...})
    train_model(run, ...)
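For a batch transform job, the same pattern applies, but the run is loaded by name. A short sketch, reusing the exp_name and run_name variables defined earlier:

from sagemaker.experiments.run import load_run
from sagemaker.session import Session

# batch transform jobs require the experiment and run names to be passed explicitly
with load_run(experiment_name=exp_name, run_name=run_name, sagemaker_session=Session()) as run:
    run.log_parameter("job_type", "batch-transform")  # illustrative parameter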

In addition to the information we get when running SageMaker Experiments from a notebook script, the run from a SageMaker job will automatically populate the job parameters and outputs.

The new SageMaker Experiments SDK also ensures backward compatibility with the previous version, which used the concepts of trials and trial components. Any experiment created with the previous SageMaker Experiments version is automatically made available in the new UI for analysis.

Integrating SageMaker Clarify and model training reports

SageMaker Clarify helps improve our ML models by detecting potential bias and helping explain how these models make predictions. Clarify provides pre-built containers that run as SageMaker processing jobs after your model has been trained, using information about your data (data configuration), your model (model configuration), and the sensitive data columns that you want to analyze for possible bias (bias configuration). Until now, SageMaker Experiments displayed our model training and Clarify reports as individual trial components that were connected via a trial.
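To make these three configurations concrete, the following is a hedged sketch of how they, along with the Clarify processor used below, might be constructed for a tabular CSV dataset. The bucket, column names, and model name are illustrative placeholders, not values from the original example.

from sagemaker import clarify
from sagemaker.session import Session
from sagemaker import get_execution_role

sagemaker_session = Session()
clarify_processor = clarify.SageMakerClarifyProcessor(
    role=get_execution_role(),
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=sagemaker_session,
)

# data configuration: where the dataset lives and which column holds the label
bias_data_config = clarify.DataConfig(
    s3_data_input_path="s3://<BUCKET>/clarify/train.csv",  # placeholder path
    s3_output_path="s3://<BUCKET>/clarify/output",         # placeholder path
    label="income",                                        # illustrative label column
    headers=["age", "sex", "income"],                      # illustrative column names
    dataset_type="text/csv",
)

# bias configuration: the sensitive column (facet) to analyze
bias_config = clarify.BiasConfig(
    label_values_or_threshold=[1],
    facet_name="sex",
)

# model configuration: the trained model that Clarify should query for predictions
model_config = clarify.ModelConfig(
    model_name="<MODEL_NAME>",                             # placeholder model name
    instance_count=1,
    instance_type="ml.m5.xlarge",
    accept_type="text/csv",
)

predictions_config = clarify.ModelPredictedLabelConfig(probability_threshold=0.5)
shap_config = clarify.SHAPConfig(num_samples=100, agg_method="mean_abs")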

With the new SageMaker Experiments, we can also integrate SageMaker Clarify reports with our model training, giving us a single source of truth that helps us further understand our model. For an integrated report, all we need to do is use the same run name for our training and Clarify jobs. The following example demonstrates how we can integrate the reports using an XGBoost model to predict the income of adults across the United States. The model uses the UCI Adult dataset. For this exercise, we assume that the model was already trained and that we already calculated the data, model, and bias configurations.

with Run(
    experiment_name='clarify-experiment',
    run_name="joint-run",
    sagemaker_session=sagemaker_session,
) as run:
    xgb.fit({"train": train_input}, logs=False)
    clarify_processor.run_bias(
        data_config=bias_data_config,
        bias_config=bias_config,
        model_config=model_config,
        model_predicted_label_config=predictions_config,
        pre_training_methods="all",
        post_training_methods="all",
    )
    clarify_processor.run_explainability(
        data_config=explainability_data_config,
        model_config=model_config,
        explainability_config=shap_config,
    )

With this setup, we get a combined view that includes the model metrics, joint inputs and outputs, and the Clarify reports for model statistical bias and explainability.

Conclusion

In this post, we explored the new generation of SageMaker Experiments, an integrated part of the SageMaker SDK. We demonstrated how to log your ML workflows from anywhere with the new Run class. We presented the new Experiments UI that allows you to track your experiments and plot graphs for a single run metric, as well as to compare multiple runs with the new analysis capability. We provided examples of logging experiments from a SageMaker Studio notebook and from a SageMaker training job. Finally, we showed how to integrate model training and SageMaker Clarify reports in a unified view, allowing you to further understand your model.

We encourage you to try out the new Experiments functionalities and connect with the Machine Learning & AI community if you have any questions or feedback!


About the Authors

Maira Ladeira Tanke is a Machine Learning Specialist at AWS. With a background in Data Science, she has 9 years of experience architecting and building ML applications with customers across industries. As a technical lead, she helps customers accelerate their achievement of business value through emerging technologies and innovative solutions. In her free time, Maira enjoys traveling and spending time with her family someplace warm.

Mani Khanuja is an Artificial Intelligence and Machine Learning Specialist SA at Amazon Web Services (AWS). She helps customers solve their business challenges using machine learning on AWS. She spends most of her time diving deep and teaching customers about AI/ML projects related to computer vision, natural language processing, forecasting, ML at the edge, and more. She is passionate about ML at the edge, and has created her own lab with a self-driving kit and a prototype manufacturing production line, where she spends a lot of her free time.

Dewen Qi is a Software Development Engineer at AWS. She is currently helping build a collection of platform services and tools in Amazon SageMaker to help customers make their ML projects successful. She is also passionate about bringing the concept of MLOps to a broader audience. Outside of work, Dewen enjoys practicing the cello.

Abhishek Agarwal is a Senior Product Manager for Amazon SageMaker. He is passionate about working with customers and making machine learning more accessible. In his spare time, Abhishek enjoys painting, biking and learning about innovative technologies.

Dana Benson is a Software Engineer working in the Amazon SageMaker Experiments, Lineage, and Search team. Prior to joining AWS, Dana spent time enabling smart home functionality in Alexa and mobile ordering at Starbucks.
