IM AI: China Automaker SAIC Unveils EV Brand Powered by NVIDIA DRIVE Orin

There’s a new brand of automotive intelligence equipped with the brains — and the battery — to go the distance.

SAIC, the largest automaker in China, joined forces with e-tail giant Alibaba to unveil a new premium EV brand, dubbed IM, or “intelligence in motion.” The long-range electric vehicles will feature AI capabilities powered by the high-performance, energy-efficient NVIDIA DRIVE Orin compute platform.

The first two vehicles in the lineup — a flagship sedan and SUV — will have autonomous parking and other automated driving features, as well as a 93kWh battery that comes standard. SAIC will begin taking orders for the sedan at the Shanghai Auto Show in April, with the SUV following in 2022.

These models will have multiple NVIDIA Orin SoCs (systems-on-a-chip) at the core of a centralized computer system, achieving 500 to more than 1,000 TOPS of performance for automated and autonomous capabilities, in addition to in-cabin personalization that is continuously upgradable over the air for a truly software-defined experience.

By centralizing and unifying the compute architecture, IM vehicles will be able to receive advanced software features as they’re developed. Just like a mobile phone, which periodically gets software updates, these software-defined vehicles will do the same.

Premium Vehicles Inside and Out

Developing a top-of-the-line premium electric brand requires best-in-class in-vehicle compute.

Orin is the world’s highest-performance, most-advanced AV and robotics processor. This supercomputer on a chip is capable of delivering up to 254 trillion operations per second (TOPS) to handle the large number of applications and deep neural networks that run simultaneously in autonomous vehicles and robots, while meeting systematic safety standards such as ISO 26262 ASIL-D.

With two Orin SoCs at the center of IM vehicles, these compute platforms will be able to deliver more than 500 TOPS of performance to achieve the redundancy and diversity necessary for autonomous operation.

Like all modern computing devices, these intelligent vehicles will be supported by a large team of AI and software engineers, dedicated to improving the performance and capability of the car as technology advances.

Intelligence in Motion

The new IM vehicle lineup is upping the ante for intelligent and electric mobility.

These electric vehicles will come standard with a 93kWh battery; premium trims will offer a 115kWh battery.

 

The software-defined IM vehicles don’t just improve like mobile devices; they also work seamlessly with them. The brand integrates smartphone features for personalized driver and passenger experiences, creating a smart living space inside the car.

As a new, continuously upgradeable electric vehicle lineup, SAIC’s premium IM brand will drive further innovation in intelligent, personal transportation.


Using machine learning to predict vessel time of arrival with Amazon SageMaker

According to the International Chamber of Shipping, 90% of world commerce happens at sea. Vessels are transporting every possible kind of commodity, including raw materials and semi-finished and finished goods, making ocean transportation a key component of the global supply chain. Manufacturers, retailers, and the end consumer are reliant on hundreds of thousands of ships carrying freight across the globe, delivering their precious cargo at the port of discharge after navigating for days or weeks.

As soon as a vessel arrives at its port of call, off-loading operations begin. Bulk cargo, containers, and vehicles are discharged, depending on the kind of vessel. Complex landside operations are triggered by cargo off-loading, involving multiple actors. Terminal operators, trucking companies, railways, customs, and logistic service providers work together to make sure that goods are delivered according to a specific SLA to the consignee in the most efficient way.

Business problem

Shipping companies publicly advertise their vessels’ estimated time of arrival (ETA) in port, and downstream supply chain activities are planned accordingly. However, delays often occur, and the ETA might differ from the vessel’s actual time of arrival (ATA), for instance due to technical or weather-related issues. This impacts the entire supply chain, in many instances reducing productivity and increasing waste and inefficiencies.

Predicting the exact time a vessel arrives in a port and starts off-loading operations poses remarkable challenges. Today, most companies rely on experience and improvisation to estimate the ATA and cope with its fluctuations. Very few providers are leveraging machine learning (ML) techniques to scientifically predict ETA and help companies plan their supply chains better. In this post, we show how to use Amazon SageMaker, a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy ML models quickly, to predict the arrival time of vessels.

Study

Vessel ETA prediction is a very complex problem. It involves a huge number of variables and a lot of uncertainty. So when you decide to apply a technique like ML to a problem like this, it’s crucial to have a baseline (such as an expert user or a rule-based engine) against which to compare performance and understand whether your model is good enough.

This work is a study of the challenge of accurately predicting the vessel ETA. It’s not a complete solution, but it can be seen as a reference for you to implement your own sound and complete model, based on your data and expertise. The solution includes the following high-level steps:

  1. Reduce the problem to a single vessel voyage (when the vessel departs from one given port and gets to another).
  2. Explore a temporal dataset.
  3. Identify the spatiotemporal aspects in the checkpoints sent by each vessel.
  4. From a given checkpoint, predict the ETA in days for the vessel to reach the destination port (inside a given vessel voyage).

The following image shows multiple vessel voyages of the same vessel in different colors. A vessel’s shipping history is composed of multiple voyages.


Methodology

Vesseltracker, an AWS customer focused on maritime transportation intelligence, shared with us a sample of the historical data they collect from vessels (checkpoints) and ports (port calls) every day. The checkpoints contain the main characteristics of each vessel, plus their current geoposition, speed, direction, draught, and more. The port calls are the dates and times of each vessel’s arrival or departure.

Because we had to train an ML model to predict continuous values, we decided to experiment with regression algorithms such as XGBoost, Random Forest, and MLP. At the end of the experiments (including hyperparameter optimization), we opted for the Random Forest Regressor because it gave us the best performance.

To train a regressor, we had to transform the data and prepare one feature to be the label, in this case, the number of days (float) that the vessel takes to get to the destination port.

For feature engineering, it’s important to highlight the following steps:

  1. Identify each vessel voyage in the temporal dataset. Join with the port calls to mark the departure and the arrival checkpoints.
  2. Compute the accumulated time per checkpoint, working backward from the destination port to the departure port.
  3. Apply a geo-hashing mechanism to encode the GPS coordinates (latitude, longitude) and transform them into a useful feature.
  4. Compute the great-circle distance between each sequential pair of geopositions from checkpoints.
  5. Because the vessel’s speed changes over time, compute a new feature (called efficiency) that helps the model weigh the vessel’s displacement (speed and performance) before computing the remaining time.
  6. Use the historical data of all voyages and the great-circle distance between each checkpoint to create an in-memory graph of all the paths and distances between segments (checkpoints).

With this graph, you can compute the distance between the current position of the vessel and the destination port. This process resulted in a new feature called accum_dist, or accumulated distance. As the feature importance analysis later shows, this feature has a high linear correlation with the target, so it is highly important to the model.
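As a rough illustration of the great-circle distance feature (step 4 above), the haversine formula can be computed directly from each pair of consecutive checkpoint coordinates. This is a minimal sketch, not the code used in the study, and the latitude/longitude arguments are hypothetical column values:

import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in kilometers between two (lat, lon) points
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

# Example: distance between two consecutive checkpoints of the same voyage
print(haversine_km(13.45, 144.79, 26.21, 127.68))

For the geo-hashing step, an off-the-shelf encoder (for example, a geohash library) can turn each (latitude, longitude) pair into a short string that groups nearby positions into the same cell.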

Amazon SageMaker

We chose Amazon SageMaker to manage the entire pipeline of our study. SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy ML models quickly. SageMaker removes the heavy lifting from each step of the ML process to make it easier to develop high-quality models.

Traditional ML development is a complex, expensive, and iterative process made even harder because there are no integrated tools for the entire ML workflow. You need to stitch together tools and workflows, which is time consuming and error prone. SageMaker solves this challenge by providing all the components used for ML in a single toolset so models get to production faster with much less effort and at lower cost.

Dataset

The following tables show samples of the data we used to create the dataset. The port calls (shown in the first table) are expressed by a few attributes from the vessel, the port identification, and timestamps of the arrival and departure events of a given vessel.

   imo  arrival_date  departure_date  port
0    0    2019-01-07      2019-01-08  GUAM
1    0    2019-01-11      2019-01-12  NAHA
2    0    2019-01-12      2019-01-17  NAHA
Then we have the vessel checkpoints. A checkpoint is a message sent by each vessel at a regular interval (in this case, approximately once a day) that contains information about the vessel itself and its current status. By joining both tables, we enrich the checkpoints with information about the vessel departure and arrival, which is crucial to enclose all the other checkpoints sent between these two events. The following table is an example of vessel checkpoint data.

In the next table, both tables are joined and cleaned, with the geolocation encoded and the accumulated time and distance calculated. This view is for one particular vessel and shows all the checkpoints that belong to one given vessel voyage.
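A minimal sketch of how this join and voyage-window filtering can be expressed in pandas; the column names (imo, timestamp, arrival_date, departure_date) are illustrative, and the real Vesseltracker schema may differ:

import pandas as pd

port_calls = pd.read_csv('port_calls.csv', parse_dates=['arrival_date', 'departure_date'])
checkpoints = pd.read_csv('checkpoints.csv', parse_dates=['timestamp'])

# Pair each departure with the arrival at the next port for the same vessel
port_calls = port_calls.sort_values(['imo', 'arrival_date'])
port_calls['next_arrival'] = port_calls.groupby('imo')['arrival_date'].shift(-1)

# Enclose every checkpoint inside its voyage window
enriched = checkpoints.merge(port_calls, on='imo', how='inner')
enriched = enriched[
    (enriched['timestamp'] >= enriched['departure_date'])
    & (enriched['timestamp'] <= enriched['next_arrival'])
]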

Finally, we have the dataset used to train our model. The first column is the label, or the target value our model tries to regress. The rest of the columns are the features the Random Forest uses during training.

Model and results

After preparing the data, it’s time to train our model. We used a technique called k-fold cross validation to create six different combinations of training and validation data (approximately 80% and 20%, respectively) to explore the variation in the data as much as possible. With the native support for Scikit-learn on SageMaker, we only had to create a Python script with the training code and pass it to a SageMaker Estimator, prepared with the SageMaker Python SDK. See the following code:

from sklearn.ensemble import RandomForestRegressor

# Random Forest regressor; est and depth come from the hyperparameter optimization step
model = RandomForestRegressor(
        n_estimators=est, verbose=0, n_jobs=4, criterion='mse',
        max_leaf_nodes=1500, max_depth=depth, random_state=0
    )
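A minimal sketch of how such a script can be handed to SageMaker through the script-mode Scikit-learn estimator follows; the entry point name, framework version, and S3 path are assumptions, and parameter names vary slightly between SDK versions:

import sagemaker
from sagemaker.sklearn.estimator import SKLearn

session = sagemaker.Session()
role = sagemaker.get_execution_role()

estimator = SKLearn(
    entry_point='train.py',        # hypothetical script containing the Random Forest code above
    framework_version='0.23-1',
    instance_type='ml.m5.xlarge',
    instance_count=1,
    role=role,
    sagemaker_session=session,
)

# Launch one training job per cross-validation fold against the prepared data in S3
estimator.fit({'train': 's3://<your-bucket>/vessel-eta/train'})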

After training our model, we used a metric called R2 to evaluate the model performance. R2 measures the proportion of the variance in the dependent variable that is predictable from the independent variables. A poor model has a low R2 and a useful model has an R2 as close as possible to 1.0 (or 100%).
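For reference, scikit-learn computes this metric directly from true and predicted values; X_val, y_val, and the fitted model below are placeholders for one validation fold:

from sklearn.metrics import r2_score

# R2 = 1 - (residual sum of squares / total sum of squares)
r2 = r2_score(y_val, model.predict(X_val))
print(f'R2 on this validation fold: {r2:.4f}')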

In this case, we expected a model that predicted values with a high level of correlation with the testing data. With this combination of data preparation, algorithm selection, and hyperparameters optimization and cross validation, our model achieved an R2 score of 0.9473.

This result isn’t bad, but it doesn’t mean that we can’t improve the solution. We can reduce the model’s accumulated error by adding important features to the dataset. These features can help the model better understand all the low-level nuances and conditions from each checkpoint that can cause a delay. Some examples include weather conditions at the vessel’s geolocation, port conditions, accidents, extraordinary events, seasonality, and holidays in the port countries.

Then we have the feature importance (shown in the following graph). It’s a measure of how important each feature in the dataset is to the prediction. Each feature has a different importance, and we want to keep only the features that are impactful for the model in the dataset.


The graph shows that accumulated distance is the most important feature (which is expected, given the high correlation with the target), followed by efficiency (an artificial feature we created to weigh the impact of the vessel’s displacement over time). In third place is the destination port, with an importance close to that of the encoded geoposition.
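A plot like this can be reproduced from the trained Random Forest. The following is a minimal sketch, assuming the fitted model from earlier and a DataFrame X_train holding the training features (the names are illustrative):

import pandas as pd
import matplotlib.pyplot as plt

# Per-feature importance scores learned by the Random Forest
importances = pd.Series(model.feature_importances_, index=X_train.columns)
importances.sort_values().plot.barh(figsize=(8, 6))
plt.title('Feature importance')
plt.show()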

You can download the notebooks created for this experiment to see all the details of the implementation. Click on the links below to get them:

You will need Amazon SageMaker to run these notebooks, so create a SageMaker Studio domain in your AWS account, upload the notebooks to your new environment, and run your own experiments!

Summary

The use of ML in predicting vessel time of arrival can substantially increase the accuracy of landside operations planning and implementation compared to the traditional, manual estimation methodologies that are used widely across the industry. We’re working with shipping companies to improve the accuracy of our model, as well as to add other relevant features. If your company is interested in learning more about our model and how it can be consumed, please reach out to our Head of World Wide Technology for Transportation and Logistics, Michele Sancricca, at msancric@amazon.com.


About the Authors

Samir Araújo is an AI/ML Solutions Architect at AWS. He helps customers create AI/ML solutions that solve their business challenges using the AWS platform. He has worked on several AI/ML projects related to computer vision, natural language processing, forecasting, ML at the edge, and more. He likes playing with hardware and automation projects in his free time, and he has a particular interest in robotics.

 

 

Michele Sancricca is the AWS Worldwide Head of Technology for Transportation and Logistics. Previously, he worked as Head of Supply Chain Products for Amazon Global Mile and led the Digital Transformation Division of Mediterranean Shipping Company. A retired Lieutenant Commander, Michele spent 12 years in the Italian Navy as Telecommunication Officer and Commanding Officer.

 


Creating high-quality machine learning models for financial services using Amazon SageMaker Autopilot

Machine learning (ML) is used throughout the financial services industry to perform a wide variety of tasks, such as fraud detection, market surveillance, portfolio optimization, loan solvency prediction, direct marketing, and many others. This breadth of use cases has created a need for lines of business to quickly generate high-quality and performant models that can be produced with little to no code. This reduces the long cycles for taking use cases from concept to production and generates business value. In this post, we explore how to use Amazon SageMaker Autopilot for some common use cases in the financial services industry.

Autopilot automatically generates pipelines, trains and tunes the best ML models for classification or regression tasks on tabular data, while allowing you to maintain full control and visibility. Autopilot enables automatic creation of ML models without requiring any ML experience. Autopilot automatically analyzes the dataset, processes the data into features, and trains multiple optimized ML models.

Data scientists in financial services often work on tasks where the datasets are highly imbalanced (heavily skewed towards examples of one class). Examples of such tasks include credit card fraud (where a very small fraction of the transactions are actually fraudulent) or bankruptcy (only a few corporations file for bankruptcy). We demonstrate how Autopilot automatically handles class imbalance without requiring any additional inputs from the user.

Autopilot recently announced the ability to tune models using the Area Under the Curve (AUC) metric, specifically the area under the Receiver Operating Characteristic (ROC) curve, in addition to the F1 score (the default objective metric for binary classification tasks). In this post, we show how using AUC as the model evaluation metric for highly imbalanced data allows Autopilot to generate high-quality models.

Our first use case is to detect credit card fraud based on various anonymized attributes. The dataset is highly imbalanced, with over 99% of the transactions being non-fraudulent. Our second use case is to predict bankruptcy of Polish companies [2]. Here, bankruptcy is similarly a binary response variable (will go bankrupt = 1, will not go bankrupt = 0), with 96% of the companies not going bankrupt.

Prerequisites

To reproduce these steps in your own environment, you must complete the following prerequisites:

Credit card fraud detection

In fraud detection tasks, companies are interested in maintaining a very low false positive rate while correctly identifying fraudulent transactions to the greatest extent possible. A false positive can lead to a company canceling or placing a hold on a customer’s card over a legitimate transaction, which leads to a poor customer experience. As a result, accuracy is not the best metric to consider for this problem; better metrics are the AUC and the F1 score.

The following code shows data for a credit card fraud task:

import pandas as pd 
fraud_df = pd.read_csv('creditcard.csv') 
fraud_df.head(5)


Class 0 and class 1 correspond to No Fraud and Fraud, respectively. As we can see, other than Amount, the columns are anonymized. A key differentiator of Autopilot is its ability to process raw data directly, without the need for data processing on the part of data scientists. For example, Autopilot automatically converts categorical features into numerical values, handles missing values (as we show in the second example), and performs simple text preprocessing.

Using the AWS boto3 API or the AWS Command Line Interface (AWS CLI), we upload the data to Amazon S3 in CSV format:

import boto3
s3 = boto3.client('s3')
# file_name is the local CSV, bucket is the target S3 bucket, object_name is the S3 key to write to
s3.upload_file(file_name, bucket, object_name)

fraud_df = pd.read_csv('<your S3 file location>')

Now, we select all columns except Class as features and Class as target:

X = fraud_df[list(set(fraud_df.columns) - set(['Class']))]
y = fraud_df['Class']
print (y.value_counts())
0    284315
1       492

The binary label column Class is highly imbalanced, which is a typical occurrence in financial use cases. We can verify how well Autopilot handles this highly imbalanced data.
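The X_train, y_train, X_test, and y_test variables used in the next code block can be produced with a stratified split, so the rare fraud class shows up in both sets in the same proportion. A minimal sketch, assuming scikit-learn:

from sklearn.model_selection import train_test_split

# Stratify on the label so the ~0.17% fraud rate is preserved in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)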

In the following code, we demonstrate how to configure Autopilot in Jupyter notebooks. We have to provide the train and test files and set TargetAttributeName to Class, the target column (the column we predict):

auto_ml_job_name = 'automl-creditcard-fraud'
import boto3
sm = boto3.client('sagemaker')
import sagemaker  
session = sagemaker.Session()

prefix = 'sagemaker/' + auto_ml_job_name
bucket = session.default_bucket()
training_data = pd.DataFrame(X_train)
training_data['Class'] = list(y_train)
test_data = pd.DataFrame(X_test)

train_file = 'train_data.csv';
training_data.to_csv(train_file, index=False, header=True)
train_data_s3_path = session.upload_data(path=train_file, key_prefix=prefix + "/train")
print('Train data uploaded to: ' + train_data_s3_path)

test_file = 'test_data.csv';
test_data.to_csv(test_file, index=False, header=False)
test_data_s3_path = session.upload_data(path=test_file, key_prefix=prefix + "/test")
print('Test data uploaded to: ' + test_data_s3_path)
input_data_config = [{
      'DataSource': {
        'S3DataSource': {
          'S3DataType': 'S3Prefix',
          'S3Uri': 's3://{}/{}/train'.format(bucket,prefix)
        }
      },
      'TargetAttributeName': 'Class'
    }
  ]

Next, we create the Autopilot job. For this post, we set ProblemType='BinaryClassification' and job_objective='AUC'. If you don’t set these fields, Autopilot automatically determines the type of supervised learning problem by analyzing the data and uses the default metric for that problem type. The default metric for binary classification is F1. We explicitly set these parameters because we want to optimize AUC.

from sagemaker.automl.automl import AutoML
from time import gmtime, strftime, sleep
from sagemaker import get_execution_role

timestamp_suffix = strftime('%d-%H-%M-%S', gmtime())
base_job_name = 'automl-card-fraud' 

target_attribute_name = 'Class'
role = get_execution_role()
automl = AutoML(role=role,
                target_attribute_name=target_attribute_name,
                base_job_name=base_job_name,
                sagemaker_session=session,
                problem_type='BinaryClassification',
                job_objective={'MetricName': 'AUC'},
                max_candidates=100)                
 

For more information about the parameters for job configuration, see create-auto-ml-job.

After the Autopilot job is created, we call the fit() function to run it:

# fit() expects the S3 location of the training data uploaded earlier
automl.fit(train_data_s3_path, job_name=base_job_name, wait=False, logs=False)
describe_response = automl.describe_auto_ml_job()
print (describe_response)
job_run_status = describe_response['AutoMLJobStatus']
    
while job_run_status not in ('Failed', 'Completed', 'Stopped'):
    describe_response = automl.describe_auto_ml_job()
    job_run_status = describe_response['AutoMLJobStatus']
    print (job_run_status)
    sleep(30)
print ('completed')

When the job is complete, we can select the best candidate based on the AUC objective metric:

best_candidate = automl.describe_auto_ml_job()['BestCandidate']
best_candidate_name = best_candidate['CandidateName']
print("CandidateName: " + best_candidate_name)
print("FinalAutoMLJobObjectiveMetricName: " + best_candidate['FinalAutoMLJobObjectiveMetric']['MetricName'])
print("FinalAutoMLJobObjectiveMetricValue: " + str(best_candidate['FinalAutoMLJobObjectiveMetric']['Value']))
CandidateName: tuning-job-1-7e8f6c9dffe840a0bf-009-636d28c2
FinalAutoMLJobObjectiveMetricName: validation:auc
FinalAutoMLJobObjectiveMetricValue: 0.9890000224113464

We now create the Autopilot model object using the model artifacts from the Autopilot job in Amazon S3, and the inference container from the best candidate after running the tuning job. In addition to the predicted label, we’re interested in the probability of the prediction—we use this probability later to plot the AUC and precision and recall graphs.

model_name = 'automl-cardfraud-model-' + timestamp_suffix
inference_response_keys = ['predicted_label', 'probability']
model = automl.create_model(name=best_candidate_name,
                            candidate=best_candidate,
                            inference_response_keys=inference_response_keys)

After the model is created, we can generate inferences for the test set using the following code. During inference time, Autopilot orchestrates deployment of the inference pipeline, including feature engineering and the ML algorithm on the inference machine.

s3_transform_output_path = 's3://{}/{}/inference-results/'.format(bucket, prefix);
output_path = s3_transform_output_path + best_candidate['CandidateName'] +'/'
transformer=model.transformer(instance_count=1, 
                          instance_type='ml.m5.xlarge',
                          assemble_with='Line',
                          output_path=output_path)
transformer.transform(data=test_data_s3_path, split_type='Line', content_type='text/csv', wait=False)

# Name of the batch transform job started above (retrieved from the transformer object)
transform_job_name = transformer.latest_transform_job.job_name
describe_response = sm.describe_transform_job(TransformJobName=transform_job_name)
job_run_status = describe_response['TransformJobStatus']
print (job_run_status)

while job_run_status not in ('Failed', 'Completed', 'Stopped'):
    describe_response = sm.describe_transform_job(TransformJobName = transform_job_name)
    job_run_status = describe_response['TransformJobStatus']
    print (describe_response)
    sleep(30)
print ('transform job completed with status : ' + job_run_status)

Finally, we read the inference and predicted data into a dataframe:

import json
import io
from urllib.parse import urlparse

def get_csv_from_s3(s3uri, file_name):
    parsed_url = urlparse(s3uri)
    bucket_name = parsed_url.netloc
    prefix = parsed_url.path[1:].strip('/')
    s3 = boto3.resource('s3')
    obj = s3.Object(bucket_name, '{}/{}'.format(prefix, file_name))
    return obj.get()["Body"].read().decode('utf-8')    
pred_csv = get_csv_from_s3(transformer.output_path, '{}.out'.format(test_file))
data_auc=pd.read_csv(io.StringIO(pred_csv), header=None)
data_auc.columns= ['label', 'proba']

Model metrics

Common metrics to compare classifiers are the ROC curve and the precision-recall curve. The ROC curve is a plot of the true positive rate against the false positive rate for various thresholds. The higher the prediction quality of the classification model, the more the ROC curve is skewed toward the top left.

The precision-recall curve demonstrates the trade-off between precision and recall, with the best models having a precision-recall curve that is flat initially and drops steeply as the recall approaches 1. The higher the precision and recall, the more the curve is skewed towards the upper right.

To optimize for the F1 score, we simply repeat the steps from earlier, setting job_objective={'MetricName': 'F1'} and rerunning the Autopilot job. Because the steps are identical, we don’t repeat them in this section. Note that the F1 objective is the default for binary classification problems. The following code plots the ROC curve:

import matplotlib.pyplot as plt
from sklearn import metrics

colors = ['blue','green']
model_names = ['Objective : AUC','Objective : F1']
models = [data_auc,data_f1]
for i in range(0,len(models)):
    fpr, tpr, _ = metrics.roc_curve(y_test, models[i]['proba'])
    auc_score = metrics.auc(fpr, tpr)
    plt.plot(fpr, tpr, label=str('Auto Pilot {:.2f} '+ model_names[i]).format(auc_score), color=colors[i])

plt.xlim([-0.1,1.1])
plt.ylim([-0.1,1.1])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.legend(loc='lower right')
plt.title('ROC Curve')

The following plot shows the results.

In the preceding AUC ROC plot, Autopilot models provide high AUC when optimizing both objective metrics. We also didn’t select any specific model or tune any hyperparameters; Autopilot did all that heavy lifting for us.

Finally, we plot the precision-recall curves for the trained Autopilot model:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import f1_score, precision_score, recall_score

colors = ['blue','green']
model_names = ['Objective : AUC','Objective : F1']
models = [data_auc,data_f1]

print ('model ', 'F1 ', 'precision ', 'recall ')
for i in range(0,len(models)):
    precision, recall, _ = precision_recall_curve(y_test, models[i]['proba'])
    print (model_names[i], f1_score(y_test, np.array(models[i]['label'])), precision_score(y_test, models[i]['label']), recall_score(y_test, models[i]['label']))
    plt.plot(recall, precision, color=colors[i], label=model_names[i])

plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.legend(loc='upper right')
plt.show()

                    F1          precision      recall 
Objective : AUC 0.8164          0.872          0.7676
Objective : F1  0.7968          0.8947         0.7183

The following plot shows the results.

As we can see from the plot, Autopilot models provide good precision and recall, because the graph is heavily skewed toward the top-right corner.

Autopilot outputs

In addition to handling the heavy lifting of building and training the models, Autopilot provides visibility into the steps taken to build the models by generating two notebooks: CandidateDefinitionNotebook and DataExplorationNotebook.

You can use the candidate definition notebook to interactively step through the steps taken by Autopilot to arrive at the best candidate. You can also use this notebook to override various runtime parameters like parallelism, hardware used, algorithms explored, feature engineering scripts, hyperparameter tuning ranges, and more.

You can download the notebook from the following Amazon S3 location:

automl.describe_auto_ml_job()['AutoMLJobArtifacts']['CandidateDefinitionNotebookLocation']

The notebook also outlines the various feature engineering steps taken to build the models. The models are indexed by their model type and the feature engineering pipeline. For example, as shown in the Tuning Job Result Overview, the winning model corresponds to the pipeline dpp1-xgboost:

best_candidate_name = best_candidate['CandidateName']
print(best_candidate)
print(describe_response)

If we search the output for ModelDataUrl, we can see that Autopilot used dpp1-xgboost: 'ModelDataUrl': 's3://sagemaker-us-east-1-<ACCOUNT-NUM>/automl-card-fraud-7/tuning/automl-car-dpp1-xgb/tuning-job-1-7e8f6c9dffe840a0bf-009-636d28c2/output/model.tar.gz'.

dpp1-xgboost is a data transformation strategy that transforms numeric features using RobustImputer. It merges all the generated features and applies RobustPCA followed by RobustStandardScaler. The transformed data is used to tune an XGBoost model.
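As a rough scikit-learn analogue of that strategy (the real Autopilot pipeline uses its own RobustImputer, RobustPCA, and RobustStandardScaler transformers, so the components below are stand-ins rather than the generated code), the shape of the pipeline looks something like this:

from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.decomposition import PCA
from sklearn.preprocessing import RobustScaler
from xgboost import XGBClassifier

pipeline = Pipeline([
    ('impute', SimpleImputer(strategy='median')),  # stand-in for RobustImputer
    ('pca', PCA(n_components=0.95)),               # stand-in for RobustPCA
    ('scale', RobustScaler()),                     # stand-in for RobustStandardScaler
    ('xgb', XGBClassifier(objective='binary:logistic')),
])
pipeline.fit(X_train, y_train)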

From the candidate definition notebook, we can also see that Autopilot automatically applied up-weighting to the minority class using scale_pos_weight. This improves prediction quality for imbalanced datasets where the model doesn’t see many examples of the minority class during training. You can change the scale_pos_weight to a different value:

STATIC_HYPERPARAMETERS = {
    'xgboost': {
        'objective': 'binary:logistic',
        'scale_pos_weight': 568.6114285714285,
    },
}
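The value of scale_pos_weight is essentially the ratio of negative to positive examples seen during training; a quick sketch of how such a number can be derived (Autopilot’s exact value reflects its own internal train/validation split):

# Ratio of majority (non-fraud) to minority (fraud) examples in the training labels
neg = (y_train == 0).sum()
pos = (y_train == 1).sum()
scale_pos_weight = neg / pos
print(scale_pos_weight)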

The data exploration notebook generates a report that provides insights about the input dataset, such as the missing values or the data types for the different features:

automl.describe_auto_ml_job()['AutoMLJobArtifacts']['DataExplorationNotebookLocation'] 

Having described in detail the use of Autopilot to detect credit card fraud, we now briefly discuss a second task: predicting the bankruptcy of companies.

Predicting bankruptcy of Polish companies

For this post, we explore the various economic attributes in the Polish companies bankruptcy dataset. There are 64 features and a target attribute class. We rename the column class to bankrupt (not bankrupt = 0, bankrupt = 1) for clarity. As noted before, this dataset is also highly imbalanced, with 96% of the data in the non-bankrupt category.


We followed the same process for running and configuring Autopilot as in the credit card fraud use case. However, unlike the credit card fraud dataset, this dataset contains missing values. Because Autopilot automatically handles missing values, we simply pass the raw data to Autopilot.

We don’t repeat the code steps in this section; we merely show the ROC and precision-recall curves. Autopilot again yields high-quality models, as evidenced by the AUC, ROC, and precision-recall curves. For bankruptcy prediction, missing companies that will go bankrupt (false negatives) can lead to poor investment decisions, and incorrectly predicting that solvent companies will go bankrupt might lead to missed opportunities.

To boost model performance, Autopilot also automatically up-weights the minority class label, penalizing the model for mis-classifying the minority class during training. The following plot shows the precision-recall curve.

The following plot shows the ROC curve.

As we can see from these plots, for bankruptcy, the AUC objective is slightly better than F1. Autopilot can generate accurate predictions for a complex event like bankruptcy without any specialized manual feature-engineering steps.

Cleaning up

The Autopilot job creates many underlying artifacts, such as dataset splits, preprocessing scripts, and preprocessed data. To avoid incurring costs, delete these resources using the following code:

s3 = boto3.resource('s3')
bucket = s3.Bucket(bucket)

# Delete everything under the Autopilot job's output prefix
job_outputs_prefix = '{}/output/{}'.format(prefix, auto_ml_job_name)
bucket.objects.filter(Prefix=job_outputs_prefix).delete()

Conclusion

In this post, we demonstrated how to create ML models without any prior knowledge of algorithms using Autopilot. For imbalanced data, which is common in financial services use cases, we showed that using objective metrics such as AUC and F1, along with automatic minority class up-weighting, can lead to high-quality models. Autopilot provides the flexibility of AutoML with the control and detail of a do-it-yourself approach by unveiling the underlying metadata and the code used to preprocess the data and train the models. Importantly, Autopilot works on datasets of all sizes, ranging from a few MBs to hundreds of GBs, without you having to set up the underlying infrastructure. Finally, note that Amazon SageMaker Studio provides a UI for you to build, train, and deploy models using Autopilot with little to no code. For more information about tuning, training, and deploying Autopilot models, see Create a machine learning model automatically with Amazon SageMaker Autopilot.

References

[1] Yeh, I. C., & Lien, C. H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473-2480.

[2] Zieba, M., Tomczak, S. K., & Tomczak, J. M. (2016). Ensemble Boosted Trees with Synthetic Features Generation in Application to Bankruptcy Prediction. Expert Systems with Applications.


About the Authors

Sumon Samanta is a Senior Specialist Architect for Global Financial Services at AWS. Previously, he worked as a Quantitative Developer at several investment banks to develop pricing and risk systems.

 

 

 

Stefan Natu is a Sr. Machine Learning Specialist at Amazon Web Services. He is focused on helping financial services customers build end-to-end machine learning solutions on AWS. In his spare time, he enjoys reading machine learning blogs, playing the guitar, and exploring the food scene in New York City.

 

 

Ilya Epshteyn is a solutions architect with AWS. He helps customers to innovate on AWS by building highly available, scalable, and secure architectures. He enjoys spending time outdoors and building Lego creations with his kids.

 

 

 

Miroslav Miladinovic is a Software Development Manager at Amazon SageMaker.

 

 

 

 

Jean Baptiste Faddoul is an Applied Science Manager working on SageMaker Autopilot and Automatic Model Tuning

 

 

 

Yotam Elor is a Senior Applied Scientist at Amazon SageMaker. He works on SageMaker Autopilot, AWS’s AutoML solution.


How to train procedurally generated game-like environments at scale with Amazon SageMaker RL

Gym is a toolkit for developing and comparing reinforcement learning algorithms. Procgen Benchmark is a suite of 16 procedurally generated Gym environments designed to benchmark both sample efficiency and generalization in reinforcement learning. These environments are associated with the paper Leveraging Procedural Generation to Benchmark Reinforcement Learning. Compared to Gym Retro, these environments have the following benefits:

  • Faster – Gym Retro environments are already fast, but Procgen environments can run over four times faster.
  • Non-deterministic – Gym Retro environments are always the same, so you can memorize a sequence of actions that gets the highest reward. Procgen environments are randomized so this isn’t possible.
  • Customizable – If you install from source, you can perform experiments where you change the environments, or build your own environments. The environment-specific code for each environment is often less than 300 lines. This is almost impossible with Gym Retro.

This post demonstrates how to use the Amazon SageMaker reinforcement learning starter kit for the NeurIPS 2020 – Procgen competition hosted on AIcrowd. The competition was held from June to November 2020, and the results can be found here, but you can still try out the solution on your own. Our solution allows participants using AIcrowd’s existing neurips2020-procgen-starter-kit to get started with SageMaker seamlessly without making any algorithmic changes. It also helps you reduce the time and effort required to build your sample-efficient reinforcement learning solutions using homogeneous and heterogeneous scaling.

Finally, our solution utilizes Spot Instances to reduce cost. The cost savings with Spot GPU Instances are approximately 70% for GPU instances such as ml.p3.2x and ml.p3.8x when training with a popular state-of-the-art reinforcement learning algorithm, Proximal Policy Optimization, and a multi-layer convolutional neural network as the agent’s policy.

Architecture

As part of the solution, we use the following services:

SageMaker reinforcement learning uses Ray and RLlib, the same as in the starter kit. SageMaker supports distributed reinforcement learning in a single SageMaker ML instance with just a few lines of configuration by using the Ray RLlib library.

A typical SageMaker reinforcement learning job for an actor-critic algorithm uses GPU instances to learn a policy network and CPU instances to collect experiences for faster training at optimized costs. SageMaker allows you to achieve this by spinning up two jobs within the same Amazon VPC, and the communications between the instances are taken care of automatically.

Cost

You can contact AICrowd to get credits to use any AWS service.

You’re responsible for the cost of the AWS services used while running this solution, and should set up a budget alert when you’ve reached 90% of your allotted credits. For more information, see Amazon SageMaker Pricing.

As of September 1, 2020, SageMaker training costs (excluding notebook instances) are as follows:

  • c5.4xlarge – $0.952 per hour (16 vCPU)
  • g4dn.4xlarge – $1.686 per hour (1 GPU, 16 vCPU)
  • p3.2xlarge – $4.284 per hour (1 GPU, 8 vCPU)

Launching the solution

To launch the solution, complete the following steps:

  1. While signed in to your AWS account, choose the following link to create the AWS CloudFormation stack for the Region you want to run your notebook:

You’re redirected to the AWS CloudFormation console to create your stack.

  2. Acknowledge the use of the instance type for your SageMaker notebook and training instance.

Make sure that your AWS account has the limits for required instances. If you need to increase the limits for the instances you want to use, contact AWS Support.

  3. As the final parameter, provide the name of the S3 bucket for the solution.

The default name is neurips-2020. You should provide a unique name to make sure there are no conflicts with your existing S3 buckets. An S3 bucket name is globally unique, and the namespace is shared by all AWS accounts. This means that after a bucket is created, the name of that bucket can’t be used by another AWS account in any AWS Region until the bucket is deleted.

  4. Choose Create stack.

You can monitor the progress of your stack by choosing the Events tab or refreshing your screen. If you encounter any error during stack creation (such as confirming again that your S3 bucket name is unique), you can delete the stack and launch it again. When stack creation is complete, go to the SageMaker console. Your notebook should already be created and its status should read InService.

You’re now ready to start training!

  1. On the SageMaker console, choose Notebook instances.
  2. Locate the rl-procgen-neurips instance and choose Open Jupyter or Open JupyterLab.
  3. Choose the notebook 1_train.ipynb.

You can use the cells following the training to run evaluations, do rollouts, and visualize your outcome.

Configuring algorithm parameters and the agent’s neural network model

To configure your RLlib algorithm parameters, go to your notebook folder and open source/train-sagemaker-distributed-{}.py. A subset of algorithm parameters is provided for PPO, but for the full set of algorithm-specific parameters, see Proximal Policy Optimization (PPO). For the baselines provided in the starter kit, refer to the experiments{}.yaml files and copy additional parameters into the RLlib configuration parameters in source/train-sagemaker-distributed-{}.py.

To check whether your model is using the correct parameters, go to the S3 bucket and navigate to the JSON file with the parameters. For example, {Amazon SageMaker training job} >output>intermediate>training>{PPO_procgen_env_wrapper_}>param.json.
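You can also pull that file down programmatically instead of browsing the console. A minimal sketch with boto3, where the bucket name and key are placeholders you need to adapt to your own training job:

import json
import boto3

s3 = boto3.client('s3')
obj = s3.get_object(
    Bucket='<your-bucket>',
    Key='<training-job-name>/output/intermediate/training/<PPO_procgen_env_wrapper_...>/param.json',
)
params = json.loads(obj['Body'].read())
print(json.dumps(params, indent=2, default=str))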

To add a custom model, create a file inside the models/ directory and name it models/my_vision_network.py.

For a working implementation of how to add a custom model, see the GitHub repo. You can set the custom_model field in the experiment .yaml file to my_vision_network to use that model.

Make sure that the model is registered. If you get an error that your model isn’t registered, go to train-sagemaker.py or train-sagemaker-distributed.py and edit def register_algorithms_and_preprocessors(self) by adding the following code:

ModelCatalog.register_custom_model("impala_cnn_tf", ImpalaCNN)

Distributed training with multiple instances

SageMaker supports distributed reinforcement learning in a single SageMaker ML instance with just a few lines of configuration by using the Ray RLlib library.

In homogeneous scaling, you use multiple instances with the same type (typically CPU instances) for a single SageMaker job. A single CPU core is reserved for the driver, and you can use the remaining as rollout workers, which generate experiences through environmental simulations. The number of available CPU cores increases with multiple instances. Homogeneous scaling is beneficial when experience collection is the bottleneck of the training workflow; for example, when your environment is computationally heavy.

With more rollout workers, neural network updates can often become the bottleneck. In this case, you could use heterogeneous scaling, in which you use different instance types together. A typical choice is to use GPU instances to perform network optimization and CPU instances to collect experiences for faster training at optimized costs. SageMaker allows you to achieve this by spinning up two jobs within the same Amazon VPC, and the communications between the instances are taken care of automatically.

To run distributed training with multiple instances, use 2_train-homo-distributed-cpu.ipynb / 3_train-homo-distributed-gpu.ipynb for homogeneous scaling and train-hetero-distributed.ipynb for heterogeneous scaling. The configurable parameters for distributed training are stored in source/train-sagemaker-distributed.py. You don’t have to configure ray_num_cpus or ray_num_gpus.

Make sure you scale num_workers and train_batch_size to reflect the number of instances in the notebook. For example, if you set train_instance_count = 5 for a p3.2xlarge instance, the maximum number of workers is 39. See the following code:

"num_workers": 8*5 -1, # adjust based on total number of CPUs available in the cluster, e.g., p3.2xlarge has 8 CPUs and 1 CPU is reserved for resource allocation
  "num_gpus": 0.2, # adjust based on number of GPUs available in a single node, e.g., p3.2xlarge has 1 GPU
  "num_gpus_per_worker": 0.1, # adjust based on number of GPUs, e.g., p3.2x large (1 GPU - num_gpus) / num_workers = 0.1
  "rollout_fragment_length": 140,
  "train_batch_size": 64 * (8*5 -1),

To use a Spot Instance, you need to set the flag train_use_spot_instances = True in the final cell of train-homo-distributed.ipynb or train-hetero-distributed.ipynb. You can also use the MaxWaitTimeInSeconds parameter to control the total duration of your training job (actual training time plus waiting time).
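Under the hood, those flags correspond to SageMaker’s managed spot training parameters. The following is a minimal sketch of the idea using the Ray RLEstimator; the toolkit version, instance choice, and exact parameter names (which changed between SDK versions) are assumptions rather than the notebook’s literal code:

from sagemaker import get_execution_role
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework

role = get_execution_role()

estimator = RLEstimator(
    entry_point='train-sagemaker-distributed.py',  # training script from the starter kit
    source_dir='source',
    toolkit=RLToolkit.RAY,
    toolkit_version='0.8.5',                       # assumed Ray version
    framework=RLFramework.TENSORFLOW,
    role=role,
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    use_spot_instances=True,   # request Spot capacity for the training job
    max_wait=7200,             # total time budget: training time plus time spent waiting for Spot
    max_run=3600,
)
estimator.fit(wait=False)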

Summary

We compared our starter kit with three different GPU instances (ml.g4dn.4x, ml.p3.2x, and ml.p3.8x) using single and multiple instances. On all GPU instances, our Spot Instance training provided a 70% cost reduction. This means that you spend less than $1 to train the competition’s benchmark solution with the starter kit hyperparameters (8 million steps) using an ml.p3.2x instance.

Our starter kit allows you to run multiple instances of ml.p3.2x with 1 GPU versus a single instance of ml.p3.8x with 4 GPUs. We observed that running a single instance of ml.p3.8x with 4 GPUs is more cost-effective than running five instances of ml.p3.2x (=5 GPUs) due to communication overhead. The single instance training with ml.p3.8x converges in 20 minutes, helping you iterate faster to meet the competition deadline.

Finally, we observed that ml.g4dn.4x instances provide an additional 40% cost reduction on top of the 70% reduction from Spot Instances. However, training takes longer: 45 minutes with ml.p3.8x versus 70 minutes with ml.g4dn.4x.

To get started with this solution, sign in to your AWS account and choose the quick create to launch the CloudFormation stack for the Region you want to run your notebook.


About the Authors

Jonathan Chung is an Applied scientist in AWS. He works on applying deep learning to various applications including games and document analysis. He enjoys cooking and visiting historical cities around the world.

 

 

 

Anna Luo is an Applied Scientist in AWS. She works on utilizing reinforcement learning techniques for different domains including supply chain and recommender system. Her current personal goal is to master snowboarding.

 

 

 

Sahika Genc is a Principal Applied Scientist in the AWS AI team. Her current research focus is deep reinforcement learning (RL) for smart automation and robotics. Previously, she was a senior research scientist in the Artificial Intelligence and Learning Laboratory at the General Electric (GE) Global Research Center, where she led science teams on healthcare analytics for patient monitoring.

 


Glassdoor Ranks NVIDIA No. 2 in Latest Best Places to Work List

NVIDIA is the second-best place to work in the U.S. according to a ranking released today by Glassdoor.

The site’s Best Places to Work in 2021 list rates the 100 best U.S. companies with more than 1,000 employees, based on how their own employees rate career opportunities, company culture and senior management.

The survey’s top finisher was Bain & Company. Right behind NVIDIA on the list are In-N-Out Burger, HubSpot and McKinsey & Company.

“This year’s winning employers have proven, according to employees, that even during extraordinary times, they’ll rise to the challenge to support their people,” said Christian Sutherland-Wong, Glassdoor chief executive officer.

Among other recent recognitions of NVIDIA’s efforts to take care of our people and communities amid the pandemic:


Thought Gaming Was Big in 2020? 2021 Is Amped Up for More

Cooking on video calls with friends. Getting to the end of Netflix’s endless content well. Going 10 months without a haircut.

Over the past year, we all found different ways to keep ourselves occupied.

Gaming, however, is a longer-term trend that promises to continue remaking global culture for years to come.

Over 2.5 billion gamers are now engaged in playing, creating, sharing and connecting with one another.

Together we watched over 100 billion hours of gaming content on YouTube — twice as much as in 2018. That’s like 11 million people watching non-stop for a year.

And esports viewership is nearly half a billion people globally — up 100 million from 2018 — with another 150 million new viewers expected over the next three years.

Building on a long-term surge that has made gaming an integral part of all our lives over the past decade, market researcher Newzoo expects gaming revenue to rise 8 percent in 2021.

The Gaming Market Set New Records (Again)

And gaming revenue — projected by analysts at $175 billion last year — already towers over other global consumer entertainment markets such as music and radio, internet-based or “over the top” video of all kinds, and cinema.

There are now 2.6 billion people playing — over half the global online population.

The amount of time people played grew dramatically, with gameplay time up 26 percent in the U.S. over the past six months.

Participation on Discord — the leading real-time voice chat app for gamers — is up 2x in the past two years, with 140 million monthly active users who connect in over 4 billion minutes of conversation each day.

The Thrill of Victory

Today’s games are getting more realistic and immersive, from lifelike graphics to AI-based gameplay and realistic physics simulation.

And to play the latest titles, gamers want the latest hardware — whether a recently launched console or a PC with the latest graphics technology, NVIDIA GeForce RTX.

Two years ago, NVIDIA introduced a breakthrough in graphics — real-time ray tracing for the ultimate realism and AI-based DLSS (deep learning super sampling) to “magically” improve performance.

We named it RTX.

Together with Microsoft and top developers and game engines, we’re working to bring the visual realism of movies to fully interactive gaming.

The momentum is unstoppable. The latest consoles and the rest of the gaming ecosystem are now onboard: Ray tracing is the new standard. There are already over 30 titles with ray tracing and many more on the way. You can find a list of select titles here.

And some of the most popular games over the past few years — Minecraft, with over 200 million copies sold and 126 million people playing per month, and Fortnite, with over 350 million registered users — have introduced real-time ray tracing enhancements to their games, enabling much more realistic and immersive experiences.

New Hardware Enables Better Games

The hits keep on coming.

Cyberpunk 2077, released a few weeks ago, generated 8 million pre-orders (almost 5 million for PC), and sold 13 million copies in its first 10 days. It shattered the single-player record for number of concurrent Steam users within two hours of launch with over 1 million.

Cyberpunk 2077 is just one example of how production value of games continues to increase. Bigger worlds and cinematic graphics demand more of the GPU. A survey of GeForce gamers showed that 59 percent of GPU or PC upgrades are due to the low performance of a game they’re playing or requirements for one that they’re anticipating.

This increase in production value is evident in the latest sequels to these AAA games, where the GeForce GTX 1060, one of our most popular GPUs of all time, struggles to keep up.

For instance, gamers were playing Watch Dogs 2 (released in 2016) with high settings at 60 frames per second on a GTX 1060. But when they moved on to Watch Dogs: Legion, released in October, they only saw 24 frames per second on that system.

And turning on ray tracing to get the best visuals makes Watch Dogs: Legion virtually unplayable on a GTX 1060.

Fear not, however. The just-announced GeForce RTX 3060 based on the NVIDIA Ampere architecture brings the game back to life, at a price that makes RTX technology available to all PC gamers.

With tens of millions of GeForce GTX GPUs in use today, the upgrade opportunity is enormous. The performance gains and new features in GeForce RTX 30 Series GPUs make it a great time for all PC gamers to upgrade.

Competitive Gaming Takes Center Stage

Weekend basketball player? We already know you love the NBA. It’s no surprise, then, that hundreds of millions of video game players like to watch pros play their games, too.

There are professional leagues for top games like Counter-Strike, League of Legends and Dota 2. The stakes in these contests are serious. In 2019, the prize pool for the Dota 2 International tournament reached $34 million. That’s more than triple the $11 million prize purse for the U.S. Masters golf tournament.

When the stakes are high, quality gear matters. Response time is critical to be most competitive at esports games — typically shooters — like Counter-Strike or Fortnite. Every millisecond matters.

Esports pros and enthusiasts strive for the lowest latency — down to zero if they could get there. So NVIDIA invented Reflex to help get the latency of the system as low as possible.

NVIDIA Reflex optimizes the rendering pipeline across the CPU and GPU to reduce latency by up to 50 percent, giving gamers a precious advantage. A 20-millisecond advantage can mean the difference between winning and losing in the physical sports world and esports.

With Reflex and our Game Ready Drivers, over 100 million GeForce gamers are instantly more competitive. Fortnite, Apex Legends, Valorant and Call of Duty: Warzone were among the first to integrate Reflex technology. With this week’s announcements, seven of the top 10 competitive shooter games now support Reflex. And more are on the way.

It’s no wonder that 99 percent of esports pros play on GeForce GPUs.

You Can Take It With You

Laptops are the fastest growing gaming platform, increasing 7x in seven years. And the power of these laptops has inspired gamers to find new ways to play, whether in console mode connected to a big screen TV using a controller, or driving an ultra-wide monitor with a keyboard and mouse.

Laptop PCs were flying off shelves so fast last year, manufacturers couldn’t keep up with demand. Laptop PC sales were up 26 percent over 2019, the largest gain in many years. It would have been even higher if more supply were available.

Many laptop buyers are looking for their PCs to perform a wide variety of activities — from keeping up on social media and streaming videos to creating and editing videos and playing the latest games. They want as much performance as they can get without having to sacrifice portability.

To address that demand, we’ve just announced that the GeForce RTX 30 Series, powered by the NVIDIA Ampere architecture, will soon be available in laptops. These laptops can handle all the latest tasks and are available in slim designs using our third-generation Max-Q technologies.

RTX 30 Series laptops bring exceptional power to gamers and creators, with the best laptops for creators meeting the specifications of our NVIDIA Studio program.

And there are laptops available specifically for esports players that include 240Hz displays for fast response and low latency. Gamers can compete at the highest level on these devices.

This week we announced that our partners are introducing over 70 laptops — it’s our biggest GeForce laptop launch ever. These are the world’s fastest laptops that give gamers and creators a huge variety to choose the right device for their needs.

More to Come

The GeForce RTX 30 Series GPUs shipping in desktops and, very shortly, in laptops, coupled with ray tracing in games, will fuel the next round of gaming PCs and upgrades.

Over the past two decades, GPUs have revolutionized modern graphics again and again. Once the holy grail of computer graphics, ray tracing is now the standard.

We look forward to more revolutions to come: if the last 20 years of graphics and gaming were amazing, the next 20 will seem nothing short of science fiction.


Google Research: Looking Back at 2020, and Forward to 2021

Posted by Jeff Dean, Senior Fellow and SVP of Google Research and Health, on behalf of the entire Google Research community

When I joined Google over 20 years ago, we were just figuring out how to really start on the journey of making a high quality and comprehensive search service for information on the web, using lots of curiously wired computers. Fast forward to today, and while we’re taking on a much broader array of technical challenges, it’s still with the same overarching goal of organizing the world’s information and making it universally accessible and useful. In 2020, as the world has been reshaped by COVID-19, we saw the ways research-developed technologies could help billions of people better communicate, understand the world, and get things done. I’m proud of what we’ve accomplished, and excited about new possibilities on the horizon.

The goal of Google Research is to work on long-term, ambitious problems across a wide range of important topics — from predicting the spread of COVID-19, to designing algorithms, to learning to translate more and more languages automatically, to mitigating bias in ML models. In the spirit of our annual reviews for 2019, 2018, and more narrowly focused reviews of some work in 2017 and 2016, this post covers key Google Research highlights from this unusual year. This is a long post, but grouped into many different sections. Hopefully, there’s something interesting in here for everyone! For a more comprehensive look, please see our >750 research publications in 2020.

COVID-19 and Health
As the impact of COVID-19 took a tremendous toll on people’s lives, researchers and developers around the world rallied together to develop tools and technologies to help public health officials and policymakers understand and respond to the pandemic. Apple and Google partnered in 2020 to develop the Exposure Notifications System (ENS), a Bluetooth-enabled privacy-preserving technology that allows people to be notified if they have been exposed to others who have tested positive for COVID-19. ENS supplements traditional contact tracing efforts and has been deployed by public health authorities in more than 50 countries, states and regions to help curb the spread of infection.

In the early days of the pandemic, public health officials signalled their need for more comprehensive data to combat the virus’ rapid spread. Our Community Mobility Reports, which provide anonymized insights into movement trends, are helping researchers not only understand the impact of policies like stay-at-home directives and social distancing, but also conduct economic forecasting.

Community Mobility Reports: Navigate and download a report for regions of interest.

Our own researchers have also explored using this anonymized data to forecast COVID-19 spread using graph neural networks instead of traditional time series-based models.

Although the research community knew little about this disease and its secondary effects initially, we’re learning more every day. Our COVID-19 Search Trends symptoms dataset allows researchers to explore temporal and symptomatic associations, such as anosmia — the loss of smell that is sometimes a symptom of the virus. To further support the broader research community, we launched the Google Health Studies app to give the public ways to participate in research studies.

Our COVID-19 Search Trends are helping researchers study the link between the disease’s spread and symptom-related searches.

Teams across Google are contributing tools and resources to the broader scientific community, which is working to address the health and economic impacts of the virus.

A spatio-temporal graph for modelling COVID-19 Spread.

Accurate information is critical in dealing with public health threats. We collaborated with many product teams at Google to improve the quality of COVID-19 information in Google News and Search by supporting fact-checking efforts, as well as similar efforts in YouTube.

We helped multilingual communities get equal access to critical COVID-19 information by sponsoring localization of Nextstrain.org’s weekly Situation Reports and developing a COVID-19 open source parallel dataset in collaboration with Translators Without Borders.

Modelling a complex global event is particularly challenging and requires more comprehensive epidemiological datasets, the development of novel interpretable models, and agent-based simulators to inform the public health response. Machine learning techniques have also helped in other ways, from deploying natural language understanding to help researchers quickly navigate the mountains of COVID-19 scientific literature, to applying anonymization technology to protect privacy while making useful datasets available, to exploring whether public health can conduct faster screening with fewer tests via Bayesian group testing.

These are only a sample of the many pieces of work that happened across Google to help users and public health authorities respond to COVID-19. For more, see using technology to help take on COVID-19.

Research in Machine Learning for Medical Diagnostics
We continue to make headway helping clinicians harness the power of ML to deliver better care for more patients. This year we described notable advances in applying computer vision to aid doctors in the diagnosis and management of cancer, including helping to make sure that doctors don’t miss potentially cancerous polyps during colonoscopies, showing that an ML system can achieve substantially higher accuracy than pathologists in Gleason grading of prostate tissue, and enabling radiologists to achieve significant reductions in both false negative and false positive results when examining X-rays for signs of breast cancer.

To determine the aggressiveness of prostate cancers, pathologists examine a biopsy and assign it a Gleason grade. In published research, our system was able to grade with higher accuracy than a cohort of pathologists who have not had specialist training in prostate cancer. The first stage of the deep learning system assigns a Gleason grade to every region in a biopsy. In this biopsy, green indicates Gleason pattern 3, while yellow indicates Gleason pattern 4.

We’ve also been working on systems to help identify skin disease, help detect age-related macular degeneration (the leading cause of blindness in the U.S. and U.K., and the third-largest cause of blindness worldwide), and on potential novel non-invasive diagnostics (e.g., being able to detect signs of anemia from retinal images).

Our study examines how a deep learning model can quantify hemoglobin levels — a measure doctors use to detect anemia — from retinal images.

This year has also brought exciting demonstrations of how these same technologies can peer into the human genome. Google’s open-source tool, DeepVariant, identifies genomic variants in sequencing data using a convolutional neural network, and this year won the FDA Challenge for best accuracy in 3 out of 4 categories. Using this same tool, a study led by the Dana-Farber Cancer Institute improved diagnostic yield by 14% for genetic variants that lead to prostate cancer and melanoma in a cohort of 2,367 cancer patients.

Research doesn’t end at measurement of experimental accuracy. Ultimately, truly helping patients receive better care requires understanding how ML tools will affect people in the real world. This year we began work with Mayo Clinic to develop a machine learning system to assist in radiotherapy planning and to better understand how this technology could be deployed into clinical practice. With our partners in Thailand, we’ve used diabetic eye disease screening as a test case in how we can build systems with people at the center, and recognize the fundamental role of diversity, equity, and inclusion in building tools for a healthier world.

Weather, Environment and Climate Change
Machine learning can help us better understand the environment and make useful predictions to help people in both their everyday life as well as in disaster situations. For weather and precipitation forecasting, computationally intensive physics-based models like NOAA’s HRRR have long reigned supreme. We have been able to show, though, that ML-based forecasting systems can predict current precipitation with much better spatial resolution (“Is it raining in my local park in Seattle?” and not just “Is it raining in Seattle?”) and can produce short-term forecasts of up to eight hours that are considerably more accurate than HRRR, and can compute the forecast more quickly, yet with higher temporal and spatial resolution.

A visualization of predictions made over the course of roughly one day. Left: The 1-hour HRRR prediction made at the top of each hour, the limit to how often HRRR provides predictions. Center: The ground truth, i.e., what we are trying to predict. Right: The predictions made by our model. Our predictions are every 2 minutes (displayed here every 15 minutes) at roughly 10 times the spatial resolution made by HRRR. Notice that we capture the general motion and general shape of the storm.

We’ve also developed an improved technique called HydroNets, which uses a network of neural networks to model the actual river systems in the world to more accurately understand the interactions between upstream water levels and downstream inundation, resulting in more accurate water-level predictions and flood forecasting. Using these techniques, we’ve expanded our coverage of flood alerts by 20x in India and Bangladesh, helping to better protect more than 200 million people in 250,000 square kilometers.

An illustration of the HydroNets architecture.

Better analysis of satellite imagery data can also give Google users a better understanding of the impact and extent of wildfires (which caused devastating effects in California and Australia this year). We showed that automated analysis of satellite imagery can help with rapid assessment of damage after natural disasters even with limited prior satellite imagery. It can also aid urban tree-planting efforts by helping cities assess their current tree canopy coverage and where they should focus on planting new trees. We’ve also shown how machine learning techniques that leverage temporal context can help improve ecological and wildlife monitoring.

Based on this work, we’re excited to partner with NOAA on using AI and ML to amplify NOAA’s environmental monitoring, weather forecasting and climate research using Google Cloud’s infrastructure.

Accessibility
Machine learning continues to provide amazing opportunities for improving accessibility, because it can learn to transfer one kind of sensory input into others. As one example, we released Lookout, an Android application that can help visually impaired users by identifying packaged foods, both in a grocery store and in their kitchen cupboard at home. The machine learning system behind Lookout demonstrates that a powerful-but-compact machine learning model can accomplish this in real time on a phone for nearly 2 million products.

Similarly, people who communicate with sign language find it difficult to use video conferencing systems because even if they are signing, they are not detected as actively speaking by audio-based speaker detection systems. Developing Real-Time, Automatic Sign Language Detection for Video Conferencing presents a real-time sign language detection model and demonstrates how it can be used to provide video conferencing systems with a mechanism to identify the person signing as the active speaker.

We also enabled useful Android accessibility capabilities such as Voice Access and Sound Notifications for important household sounds.

Live Caption was expanded on Pixel phones to caption phone calls and video calls. This came out of the Live Relay research project, which enables deaf and hard of hearing people to make calls without assistance.

Applications of ML to Other Fields
Machine learning continues to prove vital in helping us make progress across many fields of science. In 2020, in collaboration with the FlyEM team at HHMI Janelia Research Campus, we released the drosophila hemibrain connectome, the largest synapse-resolution map of brain connectivity, reconstructed using large-scale machine learning models applied to high-resolution electron microscope imaging of brain tissue. This connectome information will aid neuroscientists in a wide variety of inquiries, helping us all better understand how brains function. Be sure to check out the very fly interactive 3-D UI!

The application of ML to problems in systems biology is also on the rise. Our Google Accelerated Science team, in collaboration with our colleagues at Calico, have been applying machine learning to yeast, to get a better understanding of how genes work together as a whole system. We’ve also been exploring how to use model-based reinforcement learning in order to design biological sequences like DNA or proteins that have desirable properties for medical or industrial uses. Model-based RL is used to improve sample efficiency. At each round of experimentation the policy is trained offline using a simulator fit on functional measurements from prior rounds. On various tasks like designing DNA transcription factor binding sites, designing antimicrobial proteins, and optimizing the energy of Ising models based on protein structures, we find that model-based RL is an attractive alternative to existing methods.

In partnership with X-Chem Pharmaceuticals and ZebiAI, we have also been developing ML techniques to do “virtual screening” of promising molecular compounds computationally. Previous work in this area has tended to focus on relatively small sets of related compounds, but in this work, we are trying to use DNA-encoded small molecule libraries in order to be able to generalize to find “hits” across a wide swath of chemical space, reducing the need for slower, physical-based lab work in order to progress from idea to working pharmaceutical.

We’ve also seen success applying machine learning to core computer science and computer systems problems, a growing trend that is spawning entire new conferences like MLSys. In Learning-based Memory Allocation for C++ Server Workloads, a neural network-based language model predicts context-sensitive per-allocation site object lifetime information, and then uses this to organize the heap so as to reduce fragmentation. It is able to reduce fragmentation by up to 78% while only using huge pages (which are better for TLB behavior). End-to-End, Transferable Deep RL for Graph Optimization described an end-to-end transferable deep reinforcement learning method for computational graph optimization that shows 33%-60% speedup on three graph optimization tasks compared to TensorFlow default optimization, with 15x faster convergence over prior computation graph optimization methods.

Overview of GO: An end-to-end graph policy network that combines graph embedding and sequential attention.

As described in Chip Design with Deep Reinforcement Learning, we have also been applying reinforcement learning to the problem of place-and-route in computer chip design. This is normally a very time-consuming, labor-intensive process, and is a major reason that going from an idea for a chip to actually having a fully designed and fabricated chip takes so long. Unlike prior methods, our approach has the ability to learn from past experience and improve over time. In particular, as we train over a greater number of chip blocks, our method becomes better at rapidly generating optimized placements for previously unseen chip blocks. The system is able to generate placements that usually outperform those of human chip design experts, and we have been using this system (running on TPUs) to do placement and layout for major portions of future generations of TPUs. Menger is a recent infrastructure we’ve built for large-scale distributed reinforcement learning that is yielding promising performance for difficult RL tasks such as chip design.

Macro placements of Ariane, an open-source RISC-V processor, as training progresses. On the left, the policy is being trained from scratch, and on the right, a pre-trained policy is being fine-tuned for this chip. Each rectangle represents an individual macro placement. Notice how the cavity that is occupied by non-macro logic cells that is discovered by the from-scratch policy is already present from the outset in the pre-trained policy’s placement.

Responsible AI
The Google AI Principles guide our development of advanced technologies. We continue to invest in responsible AI research and tools, update our recommended technical practices in this area, and share regular updates — including a 2020 blog post and report — on our progress in implementation.

To help better understand the behavior of language models, we developed the Language Interpretability Tool (LIT), a toolkit for better interpretability of language models, enabling interactive exploration and analysis of their decisions. We developed techniques for measuring gendered correlations in pre-trained language models and scalable techniques for reducing gender bias in Google Translate. We used the kernel trick to propose a simple method to estimate the influence of a training data example on an individual prediction. To help non-specialists interpret machine learning results, we extended the TCAV technique introduced in 2019 to now provide a complete and sufficient set of concepts. With the original TCAV work, we were able to say that ‘fur’ and ‘long ears’ are important concepts for ‘rabbit’ prediction. With this work, we can also say that these two concepts are enough to fully explain the prediction; you don’t need any other concepts. Concept bottleneck models are a technique to make models more interpretable by training them so that one of the layers is aligned with pre-defined expert concepts (e.g., “bone spurs present”, or “wing color”, as shown below) before making a final prediction for a task, so that we can not only interpret but also turn on/off these concepts on the fly.

Aligning predictions to pre-identified concepts can make models more interpretable, as described in Concept Bottleneck Models.
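
To make the concept bottleneck idea concrete, here is a minimal, hypothetical Keras sketch (not the published model): the network predicts a small set of named concepts first, and the final label is computed from those concept activations alone, so the concepts can be inspected or manually toggled at inference time. The layer sizes, concept count, and input shape are illustrative assumptions.

```python
import tensorflow as tf

def concept_bottleneck_model(num_concepts, num_classes, input_shape=(64, 64, 3)):
    """Sketch of a concept bottleneck: image -> concepts -> label."""
    inputs = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(32, 3, activation="relu")(inputs)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    # Intermediate layer aligned with human-interpretable concepts (e.g., "wing color").
    concepts = tf.keras.layers.Dense(num_concepts, activation="sigmoid", name="concepts")(x)
    # The task prediction sees only the concept activations.
    label = tf.keras.layers.Dense(num_classes, activation="softmax", name="label")(concepts)
    return tf.keras.Model(inputs, [concepts, label])

model = concept_bottleneck_model(num_concepts=12, num_classes=5)
model.compile(optimizer="adam",
              loss={"concepts": "binary_crossentropy",
                    "label": "sparse_categorical_crossentropy"})
```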

In collaboration with many other institutions, we also looked into memorization effects of language models, showing that training data extraction attacks are realistic threats on state-of-the-art large language models. This finding, along with a result that embedding models can leak information, can have significant privacy implications (especially for models trained on private data). In Thieves of Sesame Street: Model Extraction on BERT-based APIs, we demonstrated that attackers with only API access to a language model could create models whose outputs had very high correlation with the original model, even with relatively few API queries to the original model. Subsequent work demonstrated that attackers can extract smaller models with arbitrary accuracy. On the AI Principle of safety, we demonstrated that thirteen published defenses to adversarial examples can be circumvented by carefully designed adaptive attacks, even though the original evaluations attempted to use adaptive attacks. Our work focuses on laying out the methodology and the approach necessary to perform an adaptive attack, and thus will allow the community to make further progress in building more robust models.

Examining the way in which machine learning systems themselves are examined is also an important area of exploration. In collaboration with the Partnership on AI, we defined a framework for how to audit the use of machine learning in software product settings, drawing on lessons from the aerospace, medical devices, and finance industries and their best practices. In joint work with the University of Toronto and MIT, we identified several ethical concerns that can arise when auditing the performance of facial recognition systems. In joint work with the University of Washington, we identified some important considerations related to diversity and inclusion when choosing subsets for evaluating algorithmic fairness. As an initial step in making responsible AI work for the next billion users and to help understand if notions of fairness were consistent in different parts of the world, we analyzed and created a framework for algorithmic fairness in India, accounting for datasets, fairness optimizations, infrastructures, and ecosystems.

The Model Cards work that was introduced in collaboration with the University of Toronto in 2019 has been growing in influence. Indeed, many well-known models like OpenAI’s GPT-2 and GPT-3, many of Google’s MediaPipe models and various Google Cloud APIs have all adopted Model Cards as a way of giving users of a machine learning model more information about the model’s development and the observed behavior of the model under different conditions. To make this easier for others to adopt for their own machine learning models, we also introduced the Model Card Toolkit for easier model transparency reporting. In order to increase transparency in ML development practices, we demonstrated the applicability of a range of best practices throughout the dataset development lifecycle, including data requirements specification and data acceptance testing.

In collaboration with the U.S. National Science Foundation (NSF), we announced and helped to fund a National AI Research Institute for Human-AI Interaction and Collaboration. We also released the MinDiff framework, a new regularization technique available in the TF Model Remediation library for effectively and efficiently mitigating unfair biases when training ML models, along with ML-fairness gym for building simple simulations that explore potential long-run impacts of deploying machine learning-based decision systems in social environments.

In addition to developing frameworks for fairness, we developed approaches for identifying and improving the health and quality of experiences with Recommender Systems, including using reinforcement learning to introduce safer trajectories. We also continue to work on improving the reliability of our machine learning systems, where we’ve seen that approaches such as generating adversarial examples can improve robustness and that robustness approaches can improve fairness.

Differential privacy is a way to formally quantify privacy protections and requires a rethinking of the most basic algorithms to operate in a way that they do not leak information about any particular individual. In particular, differential privacy can help in addressing memorization effects and information leakage of the kinds mentioned above. In 2020 there were a number of exciting developments, from more efficient ways of computing private empirical risk minimizers to private clustering methods with tight approximation guarantees and private sketching algorithms. We also open sourced the differential privacy libraries that lie at the core of our internal tools, taking extra care to protect against leakage caused by the floating point representation of real numbers. These are the exact same tools that we use to produce differentially private COVID-19 mobility reports that have been a valuable source of anonymous data for researchers and policymakers.
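
As a rough illustration of the kind of guarantee differential privacy provides (and not the open-sourced libraries themselves), the following sketch releases a noisy count using the Laplace mechanism; the function name and epsilon value are illustrative assumptions.

```python
import numpy as np

def dp_count(values, threshold, epsilon, sensitivity=1.0):
    """Release a count of values above `threshold` with epsilon-differential privacy.

    Adding or removing any single individual changes the true count by at most
    `sensitivity`, so Laplace noise with scale sensitivity/epsilon hides that change.
    """
    true_count = float(np.sum(np.asarray(values) > threshold))
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Smaller epsilon -> more noise -> stronger privacy, lower accuracy.
print(dp_count(np.random.rand(10_000), threshold=0.5, epsilon=0.5))
```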

To help developers assess the privacy properties of their classification models we released an ML privacy testing library in TensorFlow. We hope this library will be the starting point of a robust privacy testing suite that can be used by any machine learning developer around the world.

Membership inference attack on models for CIFAR10. The x-axis is the test accuracy of the model, and y-axis is vulnerability score (lower means more private). Vulnerability grows while test accuracy remains the same — better generalization could prevent privacy leakage.
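
For intuition about what the figure measures, here is a toy loss-threshold membership inference attack, a simplification of the kind of test such a library automates (the function name and threshold choice are assumptions): if per-example losses on training data are systematically lower than on unseen data, an attacker can guess membership better than chance.

```python
import numpy as np

def loss_threshold_attack(train_losses, test_losses):
    """Guess 'member' when an example's loss is below the pooled median.

    Returns attack accuracy; values near 0.5 suggest little membership leakage,
    while values well above 0.5 indicate the model memorizes its training data.
    """
    threshold = np.median(np.concatenate([train_losses, test_losses]))
    correct = np.sum(train_losses < threshold) + np.sum(test_losses >= threshold)
    return correct / (len(train_losses) + len(test_losses))

# An overfit model: training losses are much smaller than test losses.
print(loss_threshold_attack(np.random.exponential(0.1, 1000),
                            np.random.exponential(1.0, 1000)))
```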

In addition to pushing the state of the art in developing private algorithms, I am excited about the advances we made in weaving privacy into the fabric of our products. One of the best examples is Chrome’s Privacy Sandbox, which changes the underpinnings of the advertising ecosystem and helps systematically protect individuals’ privacy. As part of the project, we proposed and evaluated a number of different APIs, including federated learning of cohorts (FLoC) for interest based targeting, and aggregate APIs for differentially private measurement.

Launched in 2017, federated learning is now a complete research field unto itself, with over 3000 publications on federated learning appearing in 2020 alone. Our cross-institutional Advances and Open Problems in Federated Learning survey paper published in 2019 has been cited 367 times in the past year, and an updated version will soon be published in the Foundations & Trends in Machine Learning series. In July, we hosted a Workshop on Federated Learning and Analytics, and made all research talks and a TensorFlow Federated tutorial publicly available.

The lifecycle of an FL-trained model and the various actors in a federated learning system.

We continue to push the state of the art in federated learning, including the development of new federated optimization algorithms such as adaptive learning algorithms, posterior averaging algorithms, and techniques for mimicking centralized algorithms in federated settings; substantial improvements in complementary cryptographic protocols; and more. We announced and deployed federated analytics, enabling data science over raw data that is stored locally on users’ devices. New uses of federated learning in Google products include contextual emoji suggestions in Gboard, and pioneering privacy-preserving medical research with Google Health Studies. Furthermore, in Privacy Amplification via Random Check-Ins we presented the first privacy accounting mechanism for Federated Learning.
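
The core federated averaging loop is simple to sketch. The toy below (plain NumPy, a linear model with squared loss, invented helper names) illustrates the pattern: raw data stays on each client, and the server only ever aggregates model weights.

```python
import numpy as np

def local_update(weights, x, y, lr=0.1, steps=5):
    """One client's local gradient steps on a linear model with squared loss."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2.0 * x.T @ (x @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_averaging(global_w, clients, rounds=20):
    """clients: list of (x, y) pairs that never leave the simulated devices."""
    for _ in range(rounds):
        updates = [local_update(global_w, x, y) for x, y in clients]
        sizes = np.array([len(y) for _, y in clients], dtype=float)
        # The server sees only weights, aggregated in proportion to client data size.
        global_w = np.average(updates, axis=0, weights=sizes)
    return global_w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(5):
    x = rng.normal(size=(50, 2))
    clients.append((x, x @ true_w + rng.normal(scale=0.1, size=50)))
print(federated_averaging(np.zeros(2), clients))  # approaches [2, -1]
```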

Security for our users is also an area of considerable interest for us. In 2020, we continued to improve protections for Gmail users, by deploying a new ML-based document scanner that provides protection against malicious documents, which increased malicious office document detection by 10% on a daily basis. Thanks to its ability to generalize, this tool has been very effective at blocking some adversarial malware campaigns that elude other detection mechanisms and increased our detection rate by 150% in some cases.

On the account protection side, we released a fully open-source security key firmware to help advance the state of the art in two-factor authentication, staying focused on security keys as the best way to protect accounts against phishing.

Natural Language Understanding
Better understanding of language is an area where we saw considerable progress this year. Much of the work in this space from Google and elsewhere now relies on Transformers, a particular style of neural network model originally developed for language problems (but with a growing body of evidence that they are also useful for images, videos, speech, protein folding, and a wide variety of other domains).

One area of excitement is in dialog systems that can chat with a user about something of interest, often encompassing multiple turns of interaction. While successful work in this area to date has involved creating systems that are specialized around particular topics (e.g., Duplex) these systems cannot carry on general conversations. In pursuit of the general research goal of creating systems capable of much more open-ended dialog, in 2020 we described Meena, a learned conversational agent that aspirationally can chat about anything. Meena achieves high scores on a dialog system metric called SSA, which measures both sensibility and specificity of responses. We’ve seen that as we scale up the model size of Meena, it is able to achieve lower perplexity and, as shown in the paper, lower perplexity correlates extremely closely with improved SSA.

A chat between Meena (left) and a person (right).
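
Since perplexity drives the SSA correlation discussed above, a quick worked example of the metric itself may help (a generic definition, not Meena-specific code): perplexity is the exponential of the average per-token negative log-likelihood, so a perplexity of 4 means the model is, on average, as uncertain as a uniform choice among four tokens.

```python
import numpy as np

def perplexity(token_log_probs):
    """exp(mean negative log-likelihood per token)."""
    return float(np.exp(-np.mean(token_log_probs)))

# A model that assigns probability 0.25 to each of eight observed tokens:
print(perplexity(np.log([0.25] * 8)))  # 4.0
```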

One well-known issue with generative language models and dialog systems is that when discussing factual data, the model’s capacity may not be large enough to remember every specific detail about a topic, so it generates language that is plausible but incorrect. (This is not unique to machines — people can commit these errors too.) To address this in dialog systems, we are exploring ways to augment a conversational agent by giving it access to external information sources (e.g., a large corpus of documents or a search engine API), and developing learning techniques to use this as an additional resource in order to generate language that is consistent with the retrieved text. Work in this area includes integrating retrieval into language representation models (and a key underlying technology for this to work well is something like ScaNN, an efficient vector similarity search, to efficiently match the desired information to information in the corpus of text). Once appropriate content is found, it can be better understood with approaches like using neural networks to find answers in tables and extracting structured data from templatic documents. Our work on PEGASUS, a state-of-the-art model for abstractive text summarization, can also help to create automatic summaries from any piece of text, a general technique useful in conversations, retrieval systems, and many other places.
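
The retrieval step underneath these systems reduces to vector similarity search. A brute-force version is easy to write (the sketch below; the embeddings and helper names are placeholders); ScaNN’s contribution is making the same kind of query approximate and fast over corpora with billions of vectors.

```python
import numpy as np

def retrieve(query_vec, doc_vecs, doc_texts, k=3):
    """Exact maximum inner product search over a small corpus of embeddings."""
    scores = doc_vecs @ query_vec          # one dot product per document
    top = np.argsort(-scores)[:k]
    return [(doc_texts[i], float(scores[i])) for i in top]

# Toy corpus: 1000 random 64-d "document" embeddings and a random "query".
rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 64))
texts = [f"doc-{i}" for i in range(1000)]
print(retrieve(rng.normal(size=64), docs, texts))
```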

Efficiency of NLP models has also been a significant focus for our work in 2020. Techniques like transfer learning and multi-task learning can dramatically help with making general NLP models usable for new tasks with modest amounts of computation. Work in this vein includes transfer learning explorations in T5, sparse activation of models (as in our GShard work mentioned below), and more efficient model pre-training with ELECTRA. Several threads of work also look to improve on the basic Transformer architecture, including Reformer, which uses locality-sensitive hashing and reversible computation to more efficiently support much larger attention windows, Performers, which use an approach for attention that scales linearly rather than quadratically (and discusses its use in the context of protein modeling), and ETC and BigBird, which utilize global and sparse random connections, to enable linear scaling for larger and structured sequences. We also explored techniques for creating very lightweight NLP models that are 100x smaller than a larger BERT model, but perform nearly as well for some tasks, making them very suitable for on-device NLP. In Encode, Tag and Realize, we also explored new approaches for generative text models that use edit operations rather than fully general text generation, which can have advantages in computation requirements for generation, more control over the generated text, and require less training data.
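
To see why approaches like Performers scale linearly, compare standard softmax attention with a kernelized variant. The sketch below uses a deliberately simple feature map (not the FAVOR+ random features used in Performers) purely to show how associativity removes the quadratic sequence-length term.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: materializes an n x n score matrix (quadratic in length)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def linear_attention(Q, K, V, feature_map=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Kernelized attention: phi(Q) @ (phi(K)^T V) costs O(n) in sequence length."""
    Qf, Kf = feature_map(Q), feature_map(K)
    kv = Kf.T @ V                    # (d, d_v): independent of sequence length
    z = Qf @ Kf.sum(axis=0)          # per-position normalizer
    return (Qf @ kv) / z[:, None]
```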

Language Translation
Effective language translation helps bring the world closer together by enabling us to all communicate, despite speaking different languages. To date, over a billion people around the world use Google Translate, and last year we added support for five new languages (Kinyarwanda, Odia, Tatar, Turkmen and Uyghur, collectively spoken by 75 million people). Translation quality continues to improve, showing an average +5 BLEU point gain across more than 100 languages from May 2019 to May 2020, through a wide variety of techniques like improved model architectures and training, better handling of noise in datasets, multilingual transfer and multi-task learning, and better use of monolingual data to improve low-resource languages (those without much written public content on the web), directly in line with our goals of improving ML fairness of machine learning systems to provide benefits to the greatest number of people possible.

We strongly believe that continued scaling of multilingual translation models will bring further quality improvements, especially to the billions of speakers of low-resource languages around the world. In GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, Google researchers showed that training sparsely-activated multilingual translation models of up to 600 billion parameters leads to major improvements in translation quality for 100 languages, as measured by BLEU score improvement over separate 400M-parameter monolingual baseline models for each language. Three trends stood out in this work, illustrated by Figure 6 in the paper, reproduced below (see the paper for complete discussion):

  • The BLEU score improvements from multilingual training are high for all languages but are even higher for low-resource languages (right hand side of graph is higher than the left) whose speakers represent billions of people in some of the world’s most marginalized communities. Each rectangle on the figure represents languages with 1B speakers.
  • The larger and deeper the model, the larger the BLEU score improvements were across all languages (the lines hardly ever cross).
  • Large, sparse models also show a ~10x to 100x improvement in computational efficiency for model training over training a large, dense model, while simultaneously matching or significantly exceeding the BLEU scores of the large, dense model (computational efficiency discussed in paper).
An illustration of the significant gains in translation quality across 100 languages for large, sparsely-activated language models described in GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding.

We’re actively working on bringing the benefits demonstrated in this GShard research work to Google Translate, as well as training single models that cover 1000 languages, including languages like Dhivehi and Sudanese Arabic (while sharing some challenges that needed solving along the way).

We also developed techniques to create language-agnostic representations of sentences for BERT models, which can help with developing better translation models. To more effectively evaluate translation quality, we introduced BLEURT, a new metric for evaluating language generation for tasks like translation that considers the semantics of the generated text, rather than just the amount of word overlap with ground-truth data.

Machine Learning Algorithms
We continue to develop new machine learning algorithms and approaches for training that enable systems to learn more quickly and from less supervised data. By replaying intermediate results during training of neural networks, we find that we can fill idle time on ML accelerators and therefore can train neural networks faster. By changing the connectivity of neurons dynamically during training, we can find better solutions compared with statically-connected neural networks. We also developed SimCLR, a new self-supervised and semi-supervised learning technique that simultaneously maximizes agreement between differently transformed views of the same image and minimizes agreement between transformed views of different images. This approach significantly improves on the best self-supervised learning techniques.

ImageNet top-1 accuracy of linear classifiers trained on representations learned with different self-supervised methods (pretrained on ImageNet). Gray cross indicates supervised ResNet-50.
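
A compact way to understand SimCLR’s objective is its contrastive loss over a batch of paired augmentations. The NumPy sketch below (a simplified, unbatched restatement, not the released implementation; the temperature value is illustrative) pulls two views of the same image together while pushing them away from every other example in the batch.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """Contrastive loss over a batch: z1[i] and z2[i] embed two views of image i."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / temperature
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)              # the matching view is the positive
        others = np.delete(sim[i], i)      # all other examples (denominator includes the positive)
        total += -sim[i, j] + np.log(np.sum(np.exp(others)))
    return total / (2 * n)

rng = np.random.default_rng(0)
a = rng.normal(size=(8, 16))
print(nt_xent_loss(a, a + 0.01 * rng.normal(size=a.shape)))  # small: views agree
```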

We also extended the idea of contrastive learning to the supervised regime, resulting in a loss function that significantly improves over cross-entropy for supervised classification problems.

Reinforcement Learning
Reinforcement learning (RL), which learns to make good long-term decisions from limited experience, has been an important focus area for us. An important challenge in RL is to learn to make decisions from few data points, and we’ve improved RL algorithm efficiency through learning from fixed datasets, learning from other agents, and improving exploration.

A major focus area this year has been around offline RL, which relies solely on fixed, previously collected datasets (for example, from previous experiments or human demonstrations), extending RL to applications that can’t collect training data on the fly. We’ve introduced a duality approach to RL, developed improved algorithms for off-policy evaluation, estimating confidence intervals, and offline policy optimization. In addition, we’re collaborating with the broader community to tackle these problems by releasing open-source benchmark datasets, such as the DQN Replay Dataset for Atari.

Offline RL on Atari games using the DQN Replay Dataset.

Another line of research improved sample efficiency by learning from other agents through apprenticeship learning. We developed methods to learn from informed agents, to match other agents’ distributions, or to learn from adversarial examples. To improve exploration in RL, we explored bonus-based exploration methods, including imitation techniques that mimic the structured exploration arising in agents with prior knowledge about their environment.

We’ve also made significant advances in the mathematical theory of reinforcement learning. One of our main areas of research was studying reinforcement learning as an optimization process. We found connections to the Frank-Wolfe algorithm, momentum methods, KL divergence regularization, operator theory, and convergence analysis; some of these insights led to an algorithm that achieves state-of-the-art performance in challenging RL benchmarks and discovery that polynomial transfer functions avoid convergence problems associated with softmax, both in RL and supervised learning. We’ve made some exciting progress on the topic of safe reinforcement learning, where one seeks to discover optimal control rules while respecting important experimental constraints. This includes a framework for safe policy optimization. We studied efficient RL-based algorithms for solving a class of problems known as mean field games, which model systems with a large number of decision-makers, from mobile networks to electric grids.

We’ve made breakthroughs toward generalization to new tasks and environments, an important challenge for scaling up RL to complex real-world problems. A 2020 focus area was population-based learning-to-learn methods, in which another RL or evolutionary agent trained a population of RL agents to create a curriculum of emergent complexity and to discover new state-of-the-art RL algorithms. Learning to estimate the importance of data points in the training set, and of parts of the visual input with selective attention, resulted in significantly more skillful RL agents.

Overview of our method and illustration of data processing flow in AttentionAgent. Top: Input transformation — A sliding window segments an input image into smaller patches, and then “flattens” them for future processing. Middle: Patch election — The modified self-attention module holds votes between patches to generate a patch importance vector. Bottom: Action generation — AttentionAgent picks the patches of the highest importance, extracts corresponding features and makes decisions based on them.

Further, we made progress in model-based RL by showing that learning predictive behavior models accelerates RL learning and enables decentralized cooperative multi-agent tasks in diverse teams, and by learning long-term behavior models. Observing that skills bring predictable changes in the environment, we discovered skills without supervision. Better representations stabilize RL learning, while hierarchical latent spaces and value-improvement paths yield better performance.

We shared open source tools for scaling up and productionizing RL. To expand the scope and problems tackled by users, we’ve introduced SEED, a massively parallel RL agent, released a library for measuring RL algorithm reliability, and a new version of TF-Agents that includes distributed RL, TPU support, and a full set of bandit algorithms. In addition, we performed a large empirical study of RL algorithms to improve hyperparameter selection and algorithm design.

Finally, in collaboration with Loon, we trained and deployed RL to more efficiently control stratospheric balloons, improving both power usage and their ability to navigate.

AutoML
Using learning algorithms to develop new machine learning techniques and solutions, or meta-learning, is a very active and exciting area of research. In much of our previous work in this area, we’ve created search spaces that look at how to find ways to combine sophisticated hand-designed components together in interesting ways. In AutoML-Zero: Evolving Code that Learns, we took a different approach, by giving an evolutionary algorithm a search space consisting of very primitive operations (like addition, subtraction, variable assignment, and matrix multiplication) in order to see if it was possible to evolve modern ML algorithms from scratch. The presence of useful learning algorithms in this space is incredibly sparse, so it is remarkable that the system was able to progressively evolve more and more sophisticated ML algorithms. As shown in the figure below, the system reinvents many of the most important ML discoveries over the past 30 years, such as linear models, gradient descent, rectified linear units, effective learning rate settings and weight initializations, and gradient normalization.
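
The flavor of this search is easy to convey with a toy: programs are short lists of primitive instructions over a few registers, and we mutate them toward lower error on a task. The sketch below (simple hill climbing on a tiny regression target, with invented register and op conventions) is far cruder than AutoML-Zero’s regularized evolution, but it shows the representation.

```python
import random
import numpy as np

OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "mul": lambda a, b: a * b}

def random_instruction(num_regs=4):
    """(op, destination register, source register, source register)."""
    return (random.choice(list(OPS)), random.randrange(1, num_regs),
            random.randrange(num_regs), random.randrange(num_regs))

def run(program, x, num_regs=4):
    regs = np.zeros(num_regs)
    regs[0] = x                                  # register 0 holds the input
    for op, d, a, b in program:
        regs[d] = OPS[op](regs[a], regs[b])
    return regs[-1]                              # last register is the output

def error(program, xs, ys):
    preds = np.array([run(program, x) for x in xs])
    return float(np.mean((preds - ys) ** 2))

xs = np.linspace(-1, 1, 20)
ys = xs * xs + xs                                # target function: x^2 + x
best = [random_instruction() for _ in range(4)]
for _ in range(5000):
    child = list(best)
    child[random.randrange(len(child))] = random_instruction()  # point mutation
    if error(child, xs, ys) < error(best, xs, ys):
        best = child
print(best, error(best, xs, ys))
```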

We also used meta-learning to discover a variety of new efficient architectures for object detection in both still images and videos. Last year’s work on EfficientNet for efficient image classification architectures showed significant accuracy improvements and computational cost reductions for image classification. In follow-on work this year, EfficientDet: Towards Scalable and Efficient Object Detection builds on top of the EfficientNet work to derive new efficient architectures for object detection and localization, showing remarkable improvements in highest absolute accuracy, as well as computational cost reductions of 13x-42x over previous approaches to achieve a given level of accuracy.

EfficientDet achieves state-of-the-art 52.2 mAP, up 1.5 points from the prior state of the art (not shown since it is at 3045B FLOPs) on COCO test-dev under the same setting. Under the same accuracy constraint, EfficientDet models are 4x-9x smaller and use 13x-42x less computation than previous detectors.

Our work on SpineNet describes a meta-learned architecture that can retain spatial information more effectively, allowing detection to be done at finer resolution. We also focused on learning effective architectures for a variety of video classification problems. AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures, AssembleNet++: Assembling Modality Representations via Attention Connections, and AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification demonstrate how to use evolutionary algorithms to create novel state-of-the-art video processing machine learning architectures.

This approach can also be used to develop effective model architectures for time series forecasting. Using AutoML for Time Series Forecasting describes a system that discovers new forecasting models via an automated search over a search space involving many interesting kinds of low-level building blocks; its effectiveness was demonstrated in the Kaggle M5 Forecasting Competition, where it generated an algorithm and system that placed 138th out of 5558 participants (top 2.5%). While many of the competitive forecasting models required months of manual effort to create, our AutoML solution found the model in a short time with only a moderate compute cost (500 CPUs for 2 hours) and no human intervention.

Better Understanding of ML Algorithms and Models
Deeper understanding of machine learning algorithms and models is crucial for designing and training more effective models, as well as understanding when models may fail. Last year, we focused on fundamental questions around representation power, optimization, model generalization, and label noise, among others. As mentioned earlier in this post, Transformer networks have had a huge impact on modeling language, speech and vision problems, but what is the class of functions represented by these models? Recently we showed that transformers are universal approximators for sequence-to-sequence functions. Furthermore, sparse transformers also remain universal approximators even when they use just a linear number of interactions among the tokens. We have been developing new optimization techniques based on layerwise adaptive learning rates to improve the convergence speed of transformers, e.g., Large batch optimization for deep learning (LAMB): Training BERT in 76 minutes.

As neural networks are made wider and deeper, they often train faster and generalize better. This is a core mystery in deep learning since classical learning theory suggests that large networks should overfit more. We are working to understand neural networks in this overparameterized regime. In the limit of infinite width, neural networks take on a surprisingly simple form, and are described by a Neural Network Gaussian Process (NNGP) or Neural Tangent Kernel (NTK). We studied this phenomenon theoretically and experimentally, and released Neural Tangents, an open-source software library written in JAX that allows researchers to build and train infinite-width neural networks.

Left: A schematic showing how deep neural networks induce simple input / output maps as they become infinitely wide. Right: As the width of a neural network increases, we see that the distribution of outputs over different random instantiations of the network becomes Gaussian.
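
For a one-hidden-layer ReLU network this infinite-width limit has a closed form (the degree-1 arc-cosine kernel). The sketch below computes that NNGP kernel directly in NumPy as an illustration; the overall scale depends on the weight-variance convention, and this is not the Neural Tangents library itself.

```python
import numpy as np

def relu_nngp_kernel(X1, X2):
    """NNGP kernel of an infinitely wide one-hidden-layer ReLU network
    (arc-cosine kernel of degree 1; scale depends on weight-variance conventions)."""
    n1 = np.linalg.norm(X1, axis=1)
    n2 = np.linalg.norm(X2, axis=1)
    cos = np.clip((X1 @ X2.T) / np.outer(n1, n2), -1.0, 1.0)
    theta = np.arccos(cos)
    return np.outer(n1, n2) * (np.sin(theta) + (np.pi - theta) * cos) / (2.0 * np.pi)

# Gaussian process inference with this kernel stands in for training the wide network.
X = np.random.randn(10, 3)
print(relu_nngp_kernel(X, X).shape)  # (10, 10) positive semi-definite kernel matrix
```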

As finite width networks are made larger, they also demonstrate peculiar double descent phenomena — where they generalize better, then worse, then better again with increasing width. We have shown that this phenomenon can be explained by a novel bias-variance decomposition, and further that it can sometimes manifest as triple descent.

Lastly, in real-world problems, one often needs to deal with significant label noise. For instance, in large scale learning scenarios, weakly labeled data is available in abundance with large label noise. We have developed new techniques for distilling effective supervision from severe label noise leading to state-of-the-art results. We have further analyzed the effects of training neural networks with random labels, and shown that it leads to alignment between network parameters and input data, enabling faster downstream training than initializing from scratch. We have also explored questions such as whether label smoothing or gradient clipping can mitigate label noise, leading to new insights for developing robust training techniques with noisy labels.

Algorithmic Foundations and Theory
2020 was a productive year for our work in algorithmic foundations and theory, with several impactful research publications and notable results. On the optimization front, our paper on edge-weighted online bipartite matching develops a new technique for online competitive algorithms and solves a thirty-year-old open problem for the edge-weighted variant, with applications in efficient online ad allocation. Along with this work in online allocation, we developed dual mirror descent techniques that generalize to a variety of models with additional diversity and fairness constraints, and published a sequence of papers on the topic of online optimization with ML advice in online scheduling, online learning and online linear optimization. Another research result gave the first improvement in 50 years on the classic problem of bipartite matching in dense graphs. Finally, another paper solves a long-standing open problem about chasing convex bodies online — using an algorithm from The Book, no less.

We also continued our work in scalable graph mining and graph-based learning and hosted the Graph Mining & Learning at Scale Workshop at NeurIPS’20, which covered work on scalable graph algorithms including graph clustering, graph embedding, causal inference, and graph neural networks. As part of the workshop, we showed how to solve several fundamental graph problems faster, both in theory and practice, by augmenting standard synchronous computation frameworks like MapReduce with a distributed hash-table similar to a BigTable. Our extensive empirical study validates the practical relevance of the AMPC model inspired by our use of distributed hash tables in massive parallel algorithms for hierarchical clustering and connected components, and our theoretical results show how to solve many of these problems in constant distributed rounds, greatly improving upon our previous results. We also achieved exponential speedup for computing PageRank and random walks. On the graph-based learning side, we presented Grale, our framework for designing graphs for use in machine learning. Furthermore, we presented our work on more scalable graph neural network models, where we show that PageRank can be used to greatly accelerate inference in GNNs.
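
PageRank itself is a short computation at small scale; the work above is about running it on enormous graphs in few distributed rounds. As a reference point, here is the classic dense power-iteration sketch (illustrative only, with simplistic handling of dangling nodes).

```python
import numpy as np

def pagerank(adj, damping=0.85, iters=50):
    """Power iteration on a dense adjacency matrix (adj[i, j] = 1 for an edge i -> j)."""
    n = adj.shape[0]
    out_degree = adj.sum(axis=1, keepdims=True)
    out_degree[out_degree == 0] = 1.0          # crude fix for dangling nodes
    transition = adj / out_degree
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        rank = (1.0 - damping) / n + damping * (transition.T @ rank)
    return rank

adj = np.array([[0, 1, 1],
                [0, 0, 1],
                [1, 0, 0]], dtype=float)
print(pagerank(adj))
```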

In market algorithms, an area at the intersection of computer science and economics, we continued our research in designing improved online marketplaces, such as measuring incentive properties of ad auctions, two-sided markets, and optimizing order statistics in ad selection. In the area of repeated auctions, we developed frameworks to make dynamic mechanisms robust against lack of forecasting or estimation errors of the current market and/or the future market, leading to provably tight low-regret dynamic mechanisms. Later, we characterized when it is possible to achieve the asymptotically optimal objective through geometry-based criteria. We also compared the equilibrium outcomes of a range of budget management strategies used in practice, showed their impact on the tradeoff between revenue and buyers’ utility, and shed light on their incentive properties. Additionally, we continued our research in learning optimal auction parameters, and settled the complexity of batch-learning with revenue loss. We designed the optimal regret and studied combinatorial optimization for contextual auction pricing, and developed a new active learning framework for auctions and improved the approximation for posted-price auctions. Finally, motivated by the importance of incentives in ad auctions, and in the hope of helping advertisers study the impact of incentives in auctions, we introduced a data-driven metric to quantify how much a mechanism deviates from incentive compatibility.

Machine Perception
Perceiving the world around us — understanding, modeling and acting on visual, auditory and multimodal input — continues to be a research area with tremendous potential to be beneficial in our everyday lives.

In 2020, deep learning powered new approaches that bring 3D computer vision and computer graphics closer together. CvxNet, deep implicit functions for 3D shapes, neural voxel rendering and CoReNet are a few examples of this direction. Furthermore, our research on representing scenes as neural radiance fields (aka NeRF, see also this blog post) is a good example of how Google Research’s academic collaborations stimulate rapid progress in the area of neural volume rendering.

In Learning to Factorize and Relight a City, a collaboration with UC Berkeley, we proposed a learning-based framework for disentangling outdoor scenes into temporally-varying illumination and permanent scene factors. This gives the ability to change lighting effects and scene geometry for any Street View panorama, or even turn it into a full-day timelapse video.

Our work on generative human shape and articulated pose models introduces a statistical, articulated 3D human shape modeling pipeline, within a fully trainable, modular, deep learning framework. Such models enable 3D human pose and shape reconstruction of people from a single photo to better understand the scene.

Overview of end-to-end statistical 3D articulated human shape model construction in GHUM & GHUML: Generative 3D Human Shape and Articulated Pose Models.

The growing area of media compression using neural networks continued to make strong progress in 2020, not only on learned image compression, but also in deep approaches to video compression, volume compression and nice results in deep distortion-agnostic image watermarking.

Samples of encoded and cover images for Distortion Agnostic Deep Watermarking. First row: Cover image with no embedded message. Second row: Encoded image from HiDDeN combined distortion model. Third row: Encoded images from our model. Fourth row: Normalized difference of the encoded image and cover image for the HiDDeN combined model. Fifth row: Normalized difference for our model.

Additional important themes in perceptual research included:

Engaging with the broader research community through open sourcing of solutions and datasets is another important aspect of furthering perceptual research. In 2020, we open sourced multiple new perceptual inference capabilities and solutions in MediaPipe, such as on-device face, hand and pose prediction, real-time body pose tracking, real-time iris tracking and depth estimation, and real-time 3D object detection.

We continued to make strides to improve experiences and promote helpfulness on mobile devices through ML-based solutions. Our ability to run sophisticated natural language processing on-device, enabling more natural conversational features, continues to improve. In 2020, we expanded Call Screen and launched Hold for Me to allow users to save time when performing mundane tasks, and we also launched language-based actions and language navigability of our Recorder app to aid productivity.

We have used Google’s Duplex technology to make calls to businesses and confirm things like temporary closures. This has enabled us to make 3 million updates to business information globally, which have been seen over 20 billion times on Maps and Search. We also used text-to-speech technology to make web pages easier to access by enabling Google Assistant to read them aloud, supporting 42 languages.

We also continued to make meaningful improvements to imaging applications. We made it easier to capture precious moments on Pixel with innovative controls and new ways to relight, edit, enhance and relive them again in Google Photos. For the Pixel camera, beginning with Pixel 4 and 4a, we added Live HDR+, which uses machine learning to approximate the vibrance and balanced exposure and appearance of HDR+ burst photography in real time in the viewfinder. We also created dual exposure controls, which allow the brightness of shadows and highlights in a scene to be adjusted independently — live in the viewfinder.

More recently, we introduced Portrait Light, a new post-capture feature for the Pixel Camera and Google Photos apps that adds a simulated directional light source to portraits. This feature is again one that is powered by machine learning, having been trained on 70 different people, photographed one light at a time, in our pretty cool 331-LED Light Stage computational illumination system.

In the past year, Google researchers were excited to contribute to many new (and timely) ways of using Google products. Here are a few examples

Robotics
In the area of robotics research, we’ve made tremendous progress in our ability to learn more and more complex, safe and robust robot behaviors with less and less data, using many of the RL techniques described earlier in the post.

Transporter Networks are a novel approach to learning how to represent robotic tasks as spatial displacements. Representing relations between objects and the robot end-effectors, as opposed to absolute positions in the environment, makes learning robust transformations of the workspace very efficient.

In Grounding Language in Play, we demonstrated how a robot can be taught to follow natural language instructions (in many languages!). This required a scalable approach to collecting paired data of natural language instructions and robot behaviors. One key insight is that this can be accomplished by asking robot operators to simply play with the robot, and label after-the-fact what instructions would have led to the robot accomplishing the same task.

We also explored doing away with robots altogether (by having humans use a camera-equipped grasping stick) for even more scalable data collection, and how to efficiently transfer visual representations across robotic tasks.

We investigated how to learn very agile strategies for robot locomotion, by taking inspiration from nature, using evolutionary meta-learning strategies, human demonstrations, and various approaches to training data-efficient controllers using deep reinforcement learning.

One increased emphasis this year has been on safety: how do we deploy safe delivery drones in the real world? How do we explore the world in a way that always allows the robot to recover from its mistakes? How do we certify the stability of learned behaviors? This is a critical area of research on which we expect to see increased focus in the future.

Quantum Computing
Our Quantum AI team continued its work to establish practical uses of quantum computing. We ran experimental algorithms on our Sycamore processors to simulate systems relevant to chemistry and physics. These simulations are approaching a scale at which they can no longer be performed on classical computers, making good on Feynman’s original idea of using quantum computers as an efficient means to simulate systems in which quantum effects are important. We published new quantum algorithms, for instance to perform precise processor calibration, to show an advantage for quantum machine learning, or to test quantum-enhanced optimization. We also worked on programming models to make it easier to express quantum algorithms. We released qsim, an efficient simulation tool to develop and test quantum algorithms with up to 40 qubits on Google Cloud.
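
Programming models like Cirq keep small experiments approachable. As a rough illustration (a generic three-qubit GHZ circuit on Cirq’s built-in simulator, not one of the chemistry or physics experiments described above; for larger circuits, qsim can serve as a faster backend):

```python
import cirq

# Prepare a three-qubit GHZ state and sample it.
qubits = cirq.LineQubit.range(3)
circuit = cirq.Circuit(
    cirq.H(qubits[0]),
    cirq.CNOT(qubits[0], qubits[1]),
    cirq.CNOT(qubits[1], qubits[2]),
    cirq.measure(*qubits, key="m"),
)
result = cirq.Simulator().run(circuit, repetitions=1000)
print(result.histogram(key="m"))  # expect roughly half 0 (|000>) and half 7 (|111>)
```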

We continued to follow our roadmap towards building a universal error-corrected quantum computer. Our next milestone is the demonstration that quantum error correction can work in practice. To achieve this, we will show that a larger grid of qubits can hold logical information exponentially longer than a smaller grid, even though individual components such as qubits, couplers or I/O devices have imperfections. We are also particularly excited that we now have our own cleanroom which should significantly increase the speed and quality of our processor fabrication.

Supporting the Broader Developer and Researcher Community
This year marked TensorFlow’s 5th birthday, passing 160M downloads. The TensorFlow community continued its impressive growth with new special interest groups, TensorFlow User Groups, TensorFlow Certificates, AI Service partners, and inspiring demos shared via #TFCommunitySpotlight. We significantly improved TF 2.x with seamless TPU support, better out-of-the-box performance (including best-in-class results on MLPerf 0.7), improved data preprocessing and distribution strategies, and a new NumPy API.
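
As a small illustration of what the NumPy API looks like in practice, here is a minimal sketch assuming the tf.experimental.numpy module that ships with recent TF 2.x releases; the arrays are arbitrary.

    import tensorflow.experimental.numpy as tnp

    # NumPy-style calls dispatch to TensorFlow ops, so the same code can run
    # on CPU, GPU or TPU and interoperate with regular tf.Tensor values.
    x = tnp.ones((3, 3))
    y = tnp.matmul(x, tnp.transpose(x))
    print(tnp.sum(y))  # -> 27.0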

We also added many more capabilities to the TensorFlow ecosystem to help developers and researchers in their workflows: Sounds of India demonstrated going from research to production in under 90 days, using TFX for training and TF.js for deployment in the browser. With Mesh TensorFlow, we pushed the boundaries of model parallelism to enable ultra-high-resolution image analysis. We open-sourced the new TF runtime, the TF Profiler for model performance debugging, and tools for Responsible AI, such as the Model Card Toolkit for model transparency and a privacy testing library. With TensorBoard.dev, we made it possible to easily host, track, and share your ML experiments for free.
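
A typical experiment-sharing flow, roughly sketched below: write summaries with the standard tf.summary API, then upload the log directory to TensorBoard.dev with the CLI. The log directory name and scalar values here are arbitrary placeholders.

    import tensorflow as tf

    # Write a few scalar summaries to a local log directory.
    writer = tf.summary.create_file_writer("./logs/demo")
    with writer.as_default():
        for step in range(100):
            tf.summary.scalar("loss", 1.0 / (step + 1), step=step)

    # Then share the experiment publicly (at the time of writing):
    #   tensorboard dev upload --logdir ./logs/demo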

In addition, we redoubled our investment in JAX, an open-source, research-focused ML system that has been actively developed over the past two years. Researchers at Google and beyond are now using JAX in a wide range of fields, including differential privacy, neural rendering, physics-informed networks, fast attention, molecular dynamics, tensor networks, neural tangent kernels, and neural ODEs. JAX accelerates research at DeepMind, powering a growing ecosystem of libraries and work on GANs, meta-gradients, reinforcement learning, and more. We also used JAX and the Flax neural network library to build record-setting MLPerf benchmark submissions, which we demonstrated live at NeurIPS on a large TPU Pod slice with a next-generation Cloud TPU user experience. Finally, we’re ensuring that JAX works seamlessly with TF ecosystem tooling, from TF.data for data preprocessing and TensorBoard for experiment visualization to the TF Profiler for performance debugging, with more to come in 2021.
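
For readers less familiar with JAX, the core idea is composable function transformations applied to ordinary NumPy-style Python code. A minimal sketch follows; the toy loss function and data shapes are arbitrary.

    import jax
    import jax.numpy as jnp

    # A toy least-squares loss written with NumPy-style primitives.
    def loss(w, x, y):
        return jnp.mean((jnp.dot(x, w) - y) ** 2)

    # grad() derives the gradient function; jit() compiles it with XLA so it
    # runs efficiently on CPU, GPU or TPU.
    grad_fn = jax.jit(jax.grad(loss))

    w = jnp.zeros(3)
    x = jnp.ones((8, 3))
    y = jnp.ones(8)
    print(grad_fn(w, x, y))  # gradient of the loss with respect to w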

Many recent research breakthroughs have been enabled by increased computing power, and we make more than 500 petaflops of Cloud TPU computing power available for free to researchers around the world via the TFRC program to help broaden access to the machine learning research frontier. More than 120 TFRC-supported papers have been published to date, many of which would not have been possible without the computing resources that the program provides. For example, TFRC researchers have recently developed simulations of wildfire spread, helped analyze COVID-19 content and vaccine sentiment changes on social media networks, and advanced our collective understanding of the lottery ticket hypothesis and neural network pruning. Members of the TFRC community have also published experiments with Persian poetry, won a Kaggle contest on fine-grained fashion image segmentation, and shared tutorials and open-source tools as starting points for others. In 2021, we will change the name of the TFRC program to the TPU Research Cloud program to be more inclusive now that Cloud TPUs support JAX and PyTorch in addition to TensorFlow.

Finally, this was a huge year for Colab. Usage doubled, and we launched productivity features to help people do their work more efficiently, including improved Drive integration and access to the Colab VM via the terminal. And we launched Colab Pro to enable users to access faster GPUs, longer runtimes and more memory.

Open Datasets and Dataset Search
Open datasets with clear and measurable goals are often very helpful in driving forward the field of machine learning. To help the research community find interesting datasets, we continue to index a wide variety of open datasets sourced from many different organizations with Google Dataset Search. We also think it’s important to create new datasets for the community to explore and to develop new techniques, while ensuring that we share open data responsibly. This year, in addition to open datasets to help address the COVID crisis, we released a number of open datasets across many different areas.

Research Community Interaction
We are proud to enthusiastically support and participate in the broader research community. In 2020, Google researchers presented over 500 papers at leading research conferences, while also serving on program committees, organizing workshops and tutorials, and taking part in numerous other activities aimed at collectively progressing the state of the art in the field. To learn more about our contributions to some of the larger research conferences this year, please see our blog posts for ICLR 2020, CVPR 2020, ACL 2020, ICML 2020, ECCV 2020 and NeurIPS 2020.

In 2020, we supported external research with $37M in funding, including $8.5M in COVID research, $8M in research inclusion and equity, and $2M in responsible AI research. In February, we announced the 2019 Google Faculty Research Award Recipients, funding research proposals from 150 faculty members throughout the world. Among this group, 27% self-identified as members of historically underrepresented groups within technology. We also announced a new Research Scholar Program to support early-career professors who are pursuing research in fields relevant to Google via unrestricted gifts. As we have for more than a decade, we selected a group of incredibly talented PhD student researchers to receive Google PhD Fellowships, which provide funding for graduate studies, mentorship as they pursue their research, and opportunities to interact with other Google PhD Fellows.

We are also expanding the ways that we support inclusion and bring new voices into the field of computer science. In 2020, we created a new Award for Inclusion Research program that supports academic research in computing and technology addressing the needs of underrepresented populations. In the inaugural set of awards, we selected 16 proposals for funding with 25 principal investigators, focused on topics around diversity and inclusion, algorithmic bias, education innovation, health tools, accessibility, gender bias, AI for social good, security, and social justice. We additionally partnered with the Computing Alliance of Hispanic-Serving Institutions (CAHSI) and the CMD-IT Diversifying Future Leadership in the Professoriate Alliance (FLIP) to create an award program for doctoral students from traditionally underrepresented backgrounds to support the last year of the completion of the dissertation requirements.

In 2019, Google’s CS Research Mentorship Program (CSRMP) helped provide mentoring to 37 undergraduate students to introduce them to conducting computer science research. Based on the success of the program in 2019/2020, we’re excited to greatly expand this program in 2020/2021 and will have hundreds of Google researchers mentoring hundreds of undergraduate students in order to encourage more people from underrepresented backgrounds to pursue computer science research careers. Finally, in October we provided exploreCSR awards to 50 institutions around the world for the 2020 academic year. These awards fund faculty to host workshops for undergraduates from underrepresented groups in order to encourage them to pursue CS research.

Looking Forward to 2021 and Beyond
I’m excited about what’s to come, from our technical work on next-generation AI models, to the very human work of growing our community of researchers.

We’ll keep ensuring our research is done responsibly and has a positive impact, using our AI Principles as a guiding framework and applying particular scrutiny to topics that can have broad societal impact. This post covers just a few of the many papers on responsible AI that Google published in the past year. While pursuing our research, we’ll focus on:

  • Promoting research integrity: We’ll make sure Google keeps conducting a wide range of research in an appropriate manner, and provides comprehensive, scientific views on a variety of challenging, interesting topics.
  • Responsible AI development: Tackling tough topics will remain core to our work, and Google will continue creating new ML algorithms to make machine learning more efficient and accessible, developing approaches to combat unfair bias in language models, devising new techniques for ensuring privacy in learning systems, and much more. And importantly, beyond looking at AI development with a suitably critical eye, we’re eager to see what techniques we and others in the community can develop to mitigate risks and make sure new technologies have equitable, positive impacts on society.
  • Advancing diversity, equity, and inclusion: We care deeply that the people who are building influential products and computing systems better reflect the people using these products all around the world. Our efforts here are both within Google Research, as well as within the wider research and academic communities — we’ll be calling upon the academic and industry partners we work with to advance these efforts together. On a personal level, I am deeply committed to improving representation in computer science, having spent hundreds of hours working towards these goals over the last few years, as well as supporting universities like Berkeley, CMU, Cornell, Georgia Tech, Howard, UW, and numerous other organizations that work to advance inclusiveness. This is important to me, to Google, and to the broader computer science community.

Finally, looking ahead to the year, I’m particularly enthusiastic about the possibilities of building more general-purpose machine learning models that can handle a variety of modalities and that can automatically learn to accomplish new tasks with very few training examples. Advances in this area will empower people with dramatically more capable products, bringing better translation, speech recognition, language understanding and creative tools to billions of people all around the world. This kind of exploration and impact is what keeps us excited about our work!

Acknowledgements
Thanks to Martin Abadi, Marc Bellemare, Elie Bursztein, Zhifeng Chen, Ed Chi, Charina Chou, Katherine Chou, Eli Collins, Greg Corrado, Corinna Cortes, Tiffany Deng, Tulsee Doshi, Robin Dua, Kemal El Moujahid, Aleksandra Faust, Orhan Firat, Jen Gennai, Till Hennig, Ben Hutchinson, Alex Ingerman, Tomáš Ižo, Matthew Johnson, Been Kim, Sanjiv Kumar, Yul Kwon, Steve Langdon, James Laudon, Quoc Le, Yossi Matias, Brendan McMahan, Aranyak Mehta, Vahab Mirrokni, Meg Mitchell, Hartmut Neven, Mohammad Norouzi, Timothy Novikoff, Michael Piatek, Florence Poirel, David Salesin, Nithya Sambasivan, Navin Sarma, Tom Small, Jascha Sohl-Dickstein, Zak Stone, Rahul Sukthankar, Mukund Sundararajan, Andreas Terzis, Sergei Vassilvitskii, Vincent Vanhoucke, and Leslie Yeh and others for helpful feedback and for drafting portions of this post, and to the entire Research and Health communities at Google for everyone’s contributions towards this work.

Read More

NVIDIA Introduces GeForce RTX 30 Series Laptops, RTX 3060 Graphics Cards, New RTX Games & Features in Special Event

Bringing more gaming capabilities to millions more gamers, NVIDIA on Tuesday announced that more than 70 new laptops will feature GeForce RTX 30 Series Laptop GPUs and unveiled the NVIDIA GeForce RTX 3060 graphics card for desktops.

All are powered by the award-winning NVIDIA Ampere GPU architecture, the second generation of RTX with enhanced Ray Tracing Cores, Tensor Cores, and new streaming multiprocessors.

The announcements were among the highlights of a streamed presentation from Jeff Fisher, senior vice president of NVIDIA’s GeForce business.

Amid the unprecedented challenges of 2020, “millions of people tuned into gaming — to play, create and connect with one another,” Fisher said. “More than ever, gaming has become an integral part of our lives.” Among the stats he cited:

  • Steam saw its number of concurrent users more than double from 2018
  • Discord, a messaging and social networking service most popular with gamers, has seen monthly active users triple to 140 million from two years ago
  • In 2020 alone, more than 100 billion hours of gaming content have been watched on YouTube
  • Also in 2020, viewership of esports reached half a billion people

Meanwhile, NVIDIA has been delivering a series of major gaming advancements, Fisher explained.

RTX ‘the New Standard’

Two years ago, NVIDIA introduced a breakthrough in graphics: real-time ray tracing and AI-based DLSS (deep learning super sampling), together called RTX, he said.

NVIDIA quickly partnered with Microsoft and top developers and game engines to bring the visual realism of movies to fully interactive gaming, Fisher said.

In fact, 36 games are now powered by RTX. They include the #1 Battle Royale game, the #1 RPG, the #1 MMO and the #1 best-selling game of all time – Minecraft.

Now, we’re announcing more games that support RTX technology, including DLSS, which is coming to both Call of Duty: Warzone and Square Enix’s new IP, Outriders. And Five Nights at Freddy’s: Security Breach and F.I.S.T.: Forged in Shadow Torch will be adding ray tracing and DLSS.

For more details, read our January 2021 RTX Games article.

“The momentum is unstoppable,” Fisher said. “As consoles and the rest of the ecosystem are now onboard — ray tracing is the new standard.”

Last year, NVIDIA launched its second generation of RTX, the GeForce RTX 30 Series GPUs. Based on the NVIDIA Ampere architecture, it represents “our biggest generational leap ever,” Fisher said.

NVIDIA built NVIDIA Reflex to deliver the lowest system latency for competitive gamers – from mouse click to display. Since Reflex’s launch in September, a dozen games have added support.

Fisher announced that Overwatch and Rainbow Six Siege are also adopting NVIDIA Reflex. Now, 7 of the top 10 competitive shooters support Reflex.

And over the past four months, NVIDIA has launched four NVIDIA Ampere architecture-powered graphics cards, from the ultimate BFGPU — the GeForce RTX 3090 priced at $1,499 — to the GeForce RTX 3060 Ti at $399.

“Ampere has been our fastest selling architecture ever, selling almost twice as much as our prior generation,” Fisher said.

GeForce RTX 3060: An NVIDIA Ampere GPU for Every Gamer

With gaming now a key part of global culture, the new GeForce RTX 3060 brings the power of the NVIDIA Ampere architecture to every gamer, Fisher said.

“The RTX 3060 offers twice the raster performance of the GTX 1060 and 10x the ray-tracing performance,” Fisher said, noting that the GTX 1060 is the world’s most popular GPU. “The RTX 3060 powers the latest games with RTX On at 60 frames per second.”

The RTX 3060 has 13 shader teraflops, 25 RT teraflops for ray tracing, and 101 tensor teraflops to power DLSS, an NVIDIA technology introduced in 2019 that uses AI to accelerate games. And it boasts 12 gigabytes of GDDR6 memory.

“With most of the installed base underpowered for the latest games, we’re bringing RTX to every gamer with the GeForce RTX 3060,” Fisher said.

The GeForce RTX 3060 starts at just $329 and will be available worldwide in late February.

“Amazing Gaming Doesn’t End at the Desktop”

NVIDIA also announced a new generation of NVIDIA Ampere architecture-powered laptop GPUs.

Laptops, Fisher explained, are the fastest-growing gaming platform. There are now 50 million gaming laptops, which powered over 14 billion gaming hours last year.

“Amazing gaming doesn’t end at the desktop,” Fisher said.

These high-performance machines also meet the needs of 45 million creators and everyone working and studying from home, Fisher said.

The new generation of NVIDIA Ampere architecture-powered laptops, with second-generation RTX and third-generation Max-Q technologies, deliver twice the power efficiency of previous generations.

Efficiency Is Paramount in Laptops

That’s why NVIDIA introduced Max-Q four years ago, Fisher explained.

Max-Q is a system design approach that delivers high performance in thin and light gaming laptops.

“It has fundamentally changed how laptops are built; every aspect — the CPU, GPU, software, PCB design, power delivery, thermals — is optimized for power and performance,” Fisher said.

NVIDIA’s third-gen Max-Q technologies use AI and new system optimizations to make high-performance gaming laptops faster and better than ever, he said.

Fisher introduced Dynamic Boost 2.0, which for the first time uses AI to shift power between the CPU, GPU and now, GPU memory.

“So your laptop is constantly optimizing for maximum performance,” Fisher said.

Fisher also introduced WhisperMode 2.0, which delivers a new level of acoustic control for gaming laptops.

Pick your desired acoustics and WhisperMode 2.0’s AI-powered algorithms manage the CPU, GPU, system temperature and fan speeds to “deliver great acoustics at the best possible performance,” Fisher explained.

Another new feature, Resizable BAR, uses the advanced capabilities of PCI Express to boost gaming performance.

Games use GPU memory for textures, shaders and geometry — constantly updating as the player moves through the world.

Today, only part of the GPU’s memory can be accessed at any one time by the CPU, requiring many memory updates, Fisher explained.

With Resizable BAR, the game can access the entire GPU memory, allowing for multiple updates at the same time, improving performance, Fisher said.

Resizable BAR will also be supported on GeForce RTX 30 Series graphics cards for desktops, starting with the GeForce RTX 3060. NVIDIA and GPU partners are readying VBIOS updates for existing GeForce RTX 30 series graphics cards starting in March.

Finally, NVIDIA DLSS offers a breakthrough for gaming laptops. It uses AI and RTX Tensor Cores to deliver up to 2x the performance in the same power envelope.

World’s Fastest Laptops for Gamers and Creators

Starting at $999, RTX 3060 laptops are “faster than anything on the market today,” Fisher said.

They’re 30 percent faster than the PlayStation 5 and deliver 90 frames per second on the latest games at 1080p with ultra settings, Fisher said.

Starting at $1,299, GeForce RTX 3070 laptops are “a 1440p gaming beast.”

With 1440p offering twice the pixels of 1080p, this new generation of laptops “provides the perfect mix of high-fidelity graphics and great performance.”

And starting at $1,999, GeForce RTX 3080 laptops will come with up to 16 gigabytes of GDDR6 memory.

They’re “the world’s fastest laptop for gamers and creators,” Fisher said, delivering hundreds of frames per second with RTX on.

As a result, laptop gamers will be able to play at 240 frames per second, across top titles like Overwatch, Rainbow Six Siege, Valorant and Fortnite, Fisher said.

Availability

Manufacturers worldwide, starting Jan. 26, will begin shipping over 70 different GeForce RTX gaming and creator laptops featuring GeForce RTX 3080 and GeForce RTX 3070 laptop GPUs, followed by GeForce RTX 3060 laptop GPUs on Feb. 2.

The GeForce RTX 3060 graphics card will be available in late February, starting at $329, as custom boards — including stock-clocked and factory-overclocked models — from top add-in card providers such as ASUS, Colorful, EVGA, Gainward, Galaxy, Gigabyte, Innovision 3D, MSI, Palit, PNY and Zotac.

Look for GeForce RTX 3060 GPUs at major retailers and etailers, as well as in gaming systems by major manufacturers and leading system builders worldwide.

“RTX is the new standard, and the momentum continues to grow,” Fisher said.

The post NVIDIA Introduces GeForce RTX 30 Series Laptops, RTX 3060 Graphics Cards, New RTX Games & Features in Special Event appeared first on The Official NVIDIA Blog.

Read More