Recommend top trending items to your users using the new Amazon Personalize recipe

Amazon Personalize is excited to announce the new Trending-Now recipe to help you recommend items gaining popularity at the fastest pace among your users.

Amazon Personalize is a fully managed machine learning (ML) service that makes it easy for developers to deliver personalized experiences to their users. It enables you to improve customer engagement by powering personalized product and content recommendations in websites, applications, and targeted marketing campaigns. You can get started without any prior ML experience, using APIs to easily build sophisticated personalization capabilities in a few clicks. All your data is encrypted to be private and secure, and is only used to create recommendations for your users.

User interests can change based on a variety of factors, such as external events or the interests of other users. It’s critical for websites and apps to tailor their recommendations to these changing interests to improve user engagement. With Trending-Now, you can surface items from your catalog that are rising in popularity at a faster rate than other catalog items, such as trending news, popular social content, or newly released movies, helping users discover items that are engaging their peers. Amazon Personalize also allows you to define the time periods over which trends are calculated depending on your unique business context, with options for every 30 minutes, 1 hour, 3 hours, or 1 day, based on the most recent interactions data from users.

In this post, we show how to use this new recipe to recommend top trending items to your users.

Solution overview

Trending-Now identifies the top trending items by calculating the increase in interactions that each item has over configurable intervals of time. The items with the highest rate of increase are considered trending items. The time is based on timestamp data in your interactions dataset. You can specify the time interval by providing a trend discovery frequency when you create your solution.
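
To build intuition for what “rate of increase” means here, the following is a conceptual pandas sketch, not Amazon Personalize’s actual algorithm: it counts each item’s interactions in the two most recent windows and ranks items by the increase.

import pandas as pd

def trending_items(interactions: pd.DataFrame, window_seconds: int, now: int, top_k: int = 10) -> pd.Series:
    # Interactions in the most recent window and in the window before it
    current = interactions[interactions.TIMESTAMP > now - window_seconds]
    previous = interactions[(interactions.TIMESTAMP > now - 2 * window_seconds)
                            & (interactions.TIMESTAMP <= now - window_seconds)]
    # Rank items by the increase in interaction counts between the two windows
    cur_counts = current.groupby("ITEM_ID").size()
    prev_counts = previous.groupby("ITEM_ID").size()
    increase = cur_counts.subtract(prev_counts, fill_value=0)
    return increase.sort_values(ascending=False).head(top_k)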

The Trending-Now recipe requires an interactions dataset, which contains a record of the individual user and item events (such as clicks, watches, or purchases) on your website or app along with the event timestamps. You can use the parameter Trend discovery frequency to define the time intervals over which trends are calculated and refreshed. For example, if you have a high traffic website with rapidly changing trends, you can specify 30 minutes as the trend discovery frequency. Every 30 minutes, Amazon Personalize looks at the interactions that have been ingested successfully and refreshes the trending items. This recipe also allows you to capture and surface any new content that has been introduced in the last 30 minutes and has seen a higher degree of interest from your user base than any preexisting catalog items. For any parameter values that are greater than 2 hours, Amazon Personalize automatically refreshes the trending item recommendations every 2 hours to account for new interactions and new items.

Datasets that have low traffic but use a 30-minute value can see poor recommendation accuracy due to sparse or missing interactions data. The Trending-Now recipe requires interaction data for at least the two most recent time periods (a time period here is one trend discovery frequency interval). If interaction data doesn’t exist for the last two time periods, Amazon Personalize replaces the trending items with popular items until the required minimum data is available.

The Trending-Now recipe is available for both custom dataset groups and video-on-demand domain dataset groups. In this post, we demonstrate how to tailor your recommendations to fast-changing trends in user interest with the new Trending-Now recipe for a media use case with a custom dataset group. The following diagram illustrates the solution workflow.

For example, in video-on-demand applications, you can use this feature to show what movies are trending in the last 1 hour by specifying 1 hour for your trend discovery frequency. For every 1 hour of data, Amazon Personalize identifies the items with the greatest rate of increase in interactions since the last evaluation. Available frequencies include 30 minutes, 1 hour, 3 hours, and 1 day.

Prerequisites

To use the Trending-Now recipe, you first need to set up Amazon Personalize resources on the Amazon Personalize console. Create your dataset group, import your data, train a solution version, and deploy a campaign. For full instructions, see Getting started.

For this post, we follow the console approach to deploy a campaign using the new Trending-Now recipe. Alternatively, you can build the entire solution using the SDK approach with the provided notebook. For both approaches, we use the MovieLens public dataset.

Prepare the dataset

Complete the following steps to prepare your dataset:

  1. Create a dataset group.
  2. Create an interactions dataset using the following schema:
    {
      "type": "record",
      "name": "Interactions",
      "namespace": "com.amazonaws.personalize.schema",
      "fields": [
        { "name": "USER_ID", "type": "string" },
        { "name": "ITEM_ID", "type": "string" },
        { "name": "TIMESTAMP", "type": "long" }
      ],
      "version": "1.0"
    }

  3. Import the interactions data to Amazon Personalize from Amazon Simple Storage Service (Amazon S3). (An SDK sketch of these three steps follows.)
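
If you prefer scripting to the console, the following boto3 sketch shows the equivalent calls for these three steps. The bucket, IAM role ARN, and resource names are placeholders, and every create call is asynchronous, so in practice you poll each resource until it becomes ACTIVE before moving on.

import json
import boto3

personalize = boto3.client("personalize")

# Step 1: create the dataset group (placeholder name)
dsg = personalize.create_dataset_group(name="trending-now-dataset-group")

# Step 2: register the interactions schema shown above
schema = personalize.create_schema(
    name="trending-now-interactions-schema",
    schema=json.dumps({
        "type": "record",
        "name": "Interactions",
        "namespace": "com.amazonaws.personalize.schema",
        "fields": [
            {"name": "USER_ID", "type": "string"},
            {"name": "ITEM_ID", "type": "string"},
            {"name": "TIMESTAMP", "type": "long"},
        ],
        "version": "1.0",
    }),
)

dataset = personalize.create_dataset(
    name="trending-now-interactions",
    datasetType="Interactions",
    datasetGroupArn=dsg["datasetGroupArn"],
    schemaArn=schema["schemaArn"],
)

# Step 3: import the curated CSV from S3 (wait for the dataset to be ACTIVE first)
import_job = personalize.create_dataset_import_job(
    jobName="trending-now-interactions-import",
    datasetArn=dataset["datasetArn"],
    dataSource={"dataLocation": "s3://YOUR_BUCKET/curated_interactions_training_data.csv"},
    roleArn="arn:aws:iam::111122223333:role/PersonalizeS3AccessRole",
)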

For the interactions data, we use ratings history from the movies review dataset, MovieLens.

Use the following Python code to curate the interactions dataset from the MovieLens public dataset.

import pandas as pd

# Download and extract the MovieLens 25M dataset
data_dir = "blog_data"
!mkdir -p $data_dir
!cd $data_dir && wget http://files.grouplens.org/datasets/movielens/ml-25m.zip
!cd $data_dir && unzip -o ml-25m.zip
dataset_dir = data_dir + "/ml-25m/"

# Keep only the columns Personalize needs and rename them to match the schema
interactions_df = pd.read_csv(dataset_dir + 'ratings.csv')
interactions_df.drop(columns=['rating'], inplace=True)
interactions_df = interactions_df.rename(columns={'userId': 'USER_ID', 'movieId': 'ITEM_ID', 'timestamp': 'TIMESTAMP'})
interactions_file = 'curated_interactions_training_data.csv'
interactions_df.to_csv(interactions_file, index=False)

The MovieLens dataset contains the user ID, movie ID, and rating for each interaction between a user and an item, along with the time the interaction took place (a timestamp, given as UNIX epoch time). The dataset also contains movie title information to map the movie ID to the actual title and genres. The following table is a sample of the dataset.

USER_ID ITEM_ID TIMESTAMP TITLE GENRES
116927 1101 1105210919 Top Gun (1986) Action|Romance
158267 719 974847063 Multiplicity (1996) Comedy
55098 186871 1526204585 Heal (2017) Documentary
159290 59315 1485663555 Iron Man (2008) Action|Adventure|Sci-Fi
108844 34319 1428229516 Island, The (2005) Action|Sci-Fi|Thriller
85390 2916 953264936 Total Recall (1990) Action|Adventure|Sci-Fi|Thriller
103930 18 839915700 Four Rooms (1995) Comedy
104176 1735 985295513 Great Expectations (1998) Drama|Romance
97523 1304 1158428003 Butch Cassidy and the Sundance Kid (1969) Action|Western
87619 6365 1066077797 Matrix Reloaded, The (2003) Action|Adventure|Sci-Fi|Thriller|IMAX

The curated dataset includes USER_ID, ITEM_ID (movie ID), and TIMESTAMP to train the Amazon Personalize model. These are the mandatory fields required to train a model with the Trending-Now recipe. The following table is a sample of the curated dataset.

USER_ID ITEM_ID TIMESTAMP
48953 529 841223587
23069 1748 1092352526
117521 26285 1231959564
18774 457 848840461
58018 179819 1515032190
9685 79132 1462582799
41304 6650 1516310539
152634 2560 1113843031
57332 3387 986506413
12857 6787 1356651687

Train a model

After the dataset import job is complete, you’re ready to train your model.

  1. On the Solutions tab, choose Create solution.
  2. Choose the new aws-trending-now recipe.
  3. In the Advanced configuration section, set Trend discovery frequency to 30 minutes.
  4. Choose Create solution to start training. (An SDK sketch follows.)
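
If you’re using the SDK instead, a minimal sketch (continuing the boto3 snippets from the dataset preparation) looks like the following. The solutionConfig key for the trend discovery frequency reflects our reading of the Trending-Now documentation; verify it against the current SDK reference before relying on it.

import boto3

personalize = boto3.client("personalize")

solution = personalize.create_solution(
    name="trending-now-solution",
    datasetGroupArn=dsg["datasetGroupArn"],
    recipeArn="arn:aws:personalize:::recipe/aws-trending-now",
    # Assumed parameter key for the trend discovery frequency; check the docs
    solutionConfig={"featureTransformationParameters": {"trend_discovery_frequency": "30 minutes"}},
)

# Training starts when a solution version is created
solution_version = personalize.create_solution_version(solutionArn=solution["solutionArn"])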

Create a campaign

In Amazon Personalize, you use a campaign to make recommendations for your users. In this step, you create a campaign using the solution you created in the previous step and get the Trending-Now recommendations:

  1. On the Campaigns tab, choose Create campaign.
  2. For Campaign name, enter a name.
  3. For Solution, choose the solution trending-now-solution.
  4. For Solution version ID, choose the solution version that uses the aws-trending-now recipe.
  5. For Minimum provisioned transactions per second, leave it at the default value.
  6. Choose Create campaign to start creating your campaign. (An SDK sketch follows.)
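
With the SDK (continuing the earlier sketch), campaign creation is a single call once the solution version is ACTIVE:

import boto3

personalize = boto3.client("personalize")

# The solution version must be ACTIVE before the campaign can be created
campaign = personalize.create_campaign(
    name="trending-now-campaign",
    solutionVersionArn=solution_version["solutionVersionArn"],
    minProvisionedTPS=1,  # default minimum throughput
)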

Get recommendations

After you create or update your campaign, you can get a recommended list of trending items, sorted from most to least trending. On the campaign (trending-now-campaign) Personalization API tab, choose Get recommendations.

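The console button wraps the GetRecommendations runtime API. Called programmatically, it looks like the following sketch; we pass a userId even though Trending-Now results aren’t personalized, which is our understanding of the API shape, so check the recipe documentation if in doubt.

import boto3

personalize_runtime = boto3.client("personalize-runtime")

response = personalize_runtime.get_recommendations(
    campaignArn=campaign["campaignArn"],
    userId="48953",  # any valid user ID from the dataset
    numResults=10,
)
trending_item_ids = [item["itemId"] for item in response["itemList"]]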

The following screenshot shows the campaign detail page with results from a GetRecommendations call that includes the recommended items and the recommendation ID.

The results from the GetRecommendations call include the IDs of recommended items. The following table is a sample after mapping the IDs to the actual movie titles for readability. The code to perform the mapping is provided in the attached notebook (a minimal version follows the table).

ITEM_ID TITLE
356 Forrest Gump (1994)
318 Shawshank Redemption, The (1994)
58559 Dark Knight, The (2008)
33794 Batman Begins (2005)
44191 V for Vendetta (2006)
48516 Departed, The (2006)
195159 Spider-Man: Into the Spider-Verse (2018)
122914 Avengers: Infinity War – Part II (2019)
91974 Underworld: Awakening (2012)
204698 Joker (2019)
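
As referenced above, a minimal version of the mapping, assuming the movies.csv file from the earlier ml-25m download and the trending_item_ids list from the GetRecommendations sketch:

import pandas as pd

movies_df = pd.read_csv(dataset_dir + 'movies.csv')
# movieId is numeric in movies.csv; recommended item IDs are strings
id_to_title = dict(zip(movies_df['movieId'].astype(str), movies_df['title']))

for item_id in trending_item_ids:
    print(item_id, id_to_title.get(item_id, 'unknown'))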

Get trending recommendations

After you create a solution version using the aws-trending-now recipe, Amazon Personalize will identify the top trending items by calculating the increase in interactions that each item has over configurable intervals of time. The items with the highest rate of increase are considered trending items. The time is based on timestamp data in your interactions dataset.

Now let’s provide the latest interactions to Amazon Personalize to calculate the trending items. We can provide the latest interactions using real-time ingestion by creating an event tracker or through a bulk data upload with a dataset import job in incremental mode. In the notebook, we have provided sample code to individually import the latest real-time interactions data into Amazon Personalize using the event tracker.
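
As a sketch of that real-time path, assuming an event tracker has already been created (the tracking ID comes from the CreateEventTracker response; the IDs below are placeholders):

import time
import boto3

personalize_events = boto3.client("personalize-events")

personalize_events.put_events(
    trackingId="YOUR_TRACKING_ID",  # from personalize.create_event_tracker(...)
    userId="20371",
    sessionId="session-1",
    eventList=[{
        "sentAt": int(time.time()),  # epoch timestamp of the interaction
        "eventType": "watch",        # eventType is required; a free-form label
        "itemId": "153",
    }],
)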

For this post, we provide the latest interactions as a bulk data upload with a dataset import job in incremental mode. Use the following Python code to generate dummy incremental interactions; we then upload the incremental data using a dataset import job.

import time

import pandas as pd

# Randomly selected USER_IDs for generating incremental interactions
users_list = ['20371','63409','54535','119138','58953','82982','19044','139171','98598','23822','112012','121380','2660','46948','5656','68919','152414','31234','88240','40395','49296','80280','150179','138474','124489','145218','141810','82607']
# Randomly selected ITEM_IDs for generating incremental interactions
items_list = ['153','2459','1792','3948','2363','260','61248','6539','2407','8961']

# Start the synthetic timestamps one hour in the past and step forward one second per event
time_epoch = int(time.time()) - 3600

rows = []
for _ in range(10):
    for user_id in users_list:
        for item_id in items_list:
            time_epoch += 1
            rows.append([user_id, item_id, time_epoch])

inc_df = pd.DataFrame(rows, columns=["USER_ID", "ITEM_ID", "TIMESTAMP"])
incremental_file = 'interactions_incremental_data.csv'
inc_df.to_csv(incremental_file, index=False)

We have synthetically generated these interactions by randomly selecting a few values for USER_ID and ITEM_ID, and generating interactions between those users and items with recent timestamps. The following table contains the randomly selected ITEM_ID values that are used for generating incremental interactions.

ITEM_ID TITLE
153 Batman Forever (1995)
260 Star Wars: Episode IV – A New Hope (1977)
1792 U.S. Marshals (1998)
2363 Godzilla (Gojira) (1954)
2407 Cocoon (1985)
2459 Texas Chainsaw Massacre, The (1974)
3948 Meet the Parents (2000)
6539 Pirates of the Caribbean: The Curse of the Bla…
8961 Incredibles, The (2004)
61248 Death Race (2008)

Upload the incremental interactions data by selecting Append to current dataset (or use incremental mode if using APIs), as shown in the following snapshot.

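With the API, the same upload is a dataset import job with importMode set to INCREMENTAL. A sketch, assuming the generated CSV was copied to S3 first and reusing the handles from the earlier snippets:

import boto3

personalize = boto3.client("personalize")

incremental_import = personalize.create_dataset_import_job(
    jobName="trending-now-incremental-import",
    datasetArn=dataset["datasetArn"],
    dataSource={"dataLocation": "s3://YOUR_BUCKET/interactions_incremental_data.csv"},
    roleArn="arn:aws:iam::111122223333:role/PersonalizeS3AccessRole",
    importMode="INCREMENTAL",  # append to the existing dataset instead of replacing it
)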

After the import job for the incremental interactions dataset is complete, wait for one trend discovery frequency interval (as configured earlier) for the new recommendations to be reflected.

Choose Get recommendations on the campaign API page to get the latest recommended list of items that are trending.

Now we see the latest list of recommended items. The following table contains the data after mapping the IDs to the actual movie titles for readability. The code to perform the mapping is provided in the attached notebook.

ITEM_ID TITLE
260 Star Wars: Episode IV – A New Hope (1977)
6539 Pirates of the Caribbean: The Curse of the Bla…
153 Batman Forever (1995)
3948 Meet the Parents (2000)
1792 U.S. Marshals (1998)
2459 Texas Chainsaw Massacre, The (1974)
2363 Godzilla (Gojira) (1954)
61248 Death Race (2008)
8961 Incredibles, The (2004)
2407 Cocoon (1985)

The preceding GetRecommendations call includes the IDs of recommended items. Now we see the ITEM_ID values recommended are from the incremental interactions dataset that we had provided to the Amazon Personalize model. This is not surprising because these are the only items that gained interactions in the most recent 30 minutes from our synthetic dataset.

You have now successfully trained a Trending-Now model to generate item recommendations that are becoming popular with your users and tailor the recommendations according to user interest. Going forward, you can adapt this code to create other recommenders.

You can also use filters along with the Trending-Now recipe to differentiate the trends between different types of content, like long vs. short videos, or apply promotional filters to explicitly recommend specific items based on rules that align with your business goals.

Clean up

Make sure you clean up any unused resources you created in your account while following the steps outlined in this post. You can delete filters, recommenders, datasets, and dataset groups via the AWS Management Console or using the Python SDK.
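
As a sketch with the Python SDK, reusing the handles from the earlier snippets (each delete is asynchronous, and dependent resources must finish deleting before the next call succeeds):

import boto3

personalize = boto3.client("personalize")

# Deletion order matters: campaign, then solution, dataset, schema, dataset group
personalize.delete_campaign(campaignArn=campaign["campaignArn"])
personalize.delete_solution(solutionArn=solution["solutionArn"])
personalize.delete_dataset(datasetArn=dataset["datasetArn"])
personalize.delete_schema(schemaArn=schema["schemaArn"])
personalize.delete_dataset_group(datasetGroupArn=dsg["datasetGroupArn"])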

Summary

The new aws-trending-now recipe from Amazon Personalize helps you identify the items that are rapidly becoming popular with your users and tailor your recommendations for the fast-changing trends in user interest.

For more information about Amazon Personalize, see the Amazon Personalize Developer Guide.


About the authors

Vamshi Krishna Enabothala is a Sr. Applied AI Specialist Architect at AWS. He works with customers from different sectors to accelerate high-impact data, analytics, and machine learning initiatives. He is passionate about recommendation systems, NLP, and computer vision areas in AI and ML. Outside of work, Vamshi is an RC enthusiast, building RC equipment (planes, cars, and drones), and also enjoys gardening.

Anchit Gupta is a Senior Product Manager for Amazon Personalize. She focuses on delivering products that make it easier to build machine learning solutions. In her spare time, she enjoys cooking, playing board/card games, and reading.

Abhishek Mangal is a Software Engineer for Amazon Personalize and works on architecting software systems to serve customers at scale. In his spare time, he likes to watch anime and believes ‘One Piece’ is the greatest piece of story-telling in recent history.

Read More

AI and the Future of Health

The emergence of increasingly capable large-scale AI models, such as the recently released GPT-4, is one of the most significant advances in computing in decades. These innovations are rapidly transforming every aspect of the value we get from technology, as demonstrated through Microsoft’s integration of GPT-4 into Bing, Edge, Microsoft 365, Power Platform, GitHub, and other offerings. More recently, Nuance has announced DAX Express, which uses a unique combination of conversational, ambient, and generative AI to automatically draft clinical notes after patient visits – helping to reduce care providers’ cognitive burdens and increase the joy of practicing medicine (whilst releasing time for care).

We are at an inflection point for the use of AI in healthcare – one of society’s most critical sectors. The significance of this moment is reflected in Peter Lee’s recent article in the New England Journal of Medicine on the potential future clinical applications of GPT-4. At Microsoft Research’s Health Futures organization, the multidisciplinary group dedicated to discovery in this space, we see this as the continuation of a journey, and a major milestone in the long process of innovating to help address the greatest challenges in healthcare.

In this blog, we will share some of our research team’s work to make healthcare more data-driven, predictive, and precise – ultimately, empowering every person on the planet to live a healthier future.

Enabling precision medicine and connected care

We are today at a unique moment in history where medicine, biology, and technology are converging on a large scale. This presents immense possibilities to revolutionize healthcare and the practice of medicine with the aid of trustworthy AI. While we embrace the potential of AI, we understand that the practice of medicine is an intricate balance of “art” and “science.” We recognize and honor the enduring physician-patient relationship, which is fundamental and timeless. Our diverse team comprises researchers, scientists, engineers, biotechnologists, designers, social scientists, strategists, healthcare experts, and medical professionals who collaborate globally and inclusively to reimagine and transform the lives of the patients and public we serve.

As we consider how technologies have shaped the practice of medicine over the centuries, from the individual to the ecosystem level, we are reminded that no technology exists in a vacuum. Our core understanding of biological systems is rapidly evolving, and with it, our understanding of what technologies are relevant and useful. Simultaneously, the use of technology across the health and life science industries, and the way healthcare is delivered, are also rapidly changing – reshaping our traditional healthcare delivery model from one of diagnosis and treatment, to one that prioritizes prevention and precise individualized care.

Recent advancements in machine learning and AI have fueled computational technologies that allow us to aggregate complex inputs from multiple data sources, with the potential to derive rich insights that rapidly expand our knowledge base and drive deeper discovery and faster innovation. At the same time, it remains an open question how to best use and regulate these technologies in real-world settings and at scale across healthcare and the life sciences. Nonetheless, we believe that we are on a path to delivering on the goal of precision medicine – a change in clinical practice which will be enabled by precision diagnostics, precision therapeutics, and connected care technologies.

To achieve this goal, we seek to collaborate with health and life sciences organizations with a similar appetite for transformation, complementary expertise, and a commitment to propel the change required. We are also engaged with the broader community in pursuing responsible and ethical use of AI in healthcare. Our diverse team has been successful in bridging the gap between the fields of medicine, biology and chemistry on one hand, and computing on the other. We act as “translators” between these fields, and through a process of ongoing collaboration and feedback, we have discovered new challenges and innovative solutions.

Below are some examples of our collaborative research approach:

Exploring diagnostic tools from new modalities

Multimodal foundation models for medicine: an example from radiology

The field of biomedicine involves a great deal of multimodal data, such as radiology images and text-based reports. Interpreting this data at scale is essential for improving care and accelerating research. Radiology reports often compare current and prior images to track changes in findings over time. This is crucial for decision making, but most AI models do not take into account this temporal structure. We are exploring a novel self-supervised framework that pre-trains vision-language models using pairs of reports and sequences of images. This includes handling missing or misaligned images and exploiting temporal information to learn more efficiently. Our approach, called BioViL-T, achieves state-of-the-art results on several downstream tasks, such as report generation, and interpreting disease progression by focusing on relevant image regions across time. BioViL-T is part of ongoing collaboration with our colleagues at Nuance to develop scalable and flexible AI solutions for radiology that can empower care providers and augment existing workflows.

Project InnerEye: Democratizing Medical Imaging AI

Project InnerEye is a research project that is exploring ways in which machine learning has the potential to assist clinicians in planning radiotherapy treatments so that they can spend more time with their patients. Project InnerEye has been working closely with the University of Cambridge and Cambridge University Hospitals NHS Foundation Trust to make progress on this problem through a deep research collaboration. To make our research as accessible as possible, we released the InnerEye Deep Learning Toolkit as open-source software. Cambridge University Hospitals NHS Foundation Trust and University Hospitals Birmingham NHS Trust led an NHS AI in Health and Care Award to evaluate how this technology could potentially save clinicians’ time, reduce the time between the scan and commencing treatment, and scale this to more NHS Trusts. Any clinical use of the InnerEye machine learning models remains subject to regulatory approval.

Immunomics: Decoding the Immune System to Diagnose Disease

The human immune system is an astonishing diagnostic engine, continuously adapting itself to detect any signal of disease in the body. Essentially, the state of the immune system tells a story about virtually everything affecting a person’s health. What if we could “read” this story? Our scientific understanding of human health would be fundamentally advanced. More importantly, this would provide a platform for a new generation of precise medical diagnostics and treatment options. We are partnering with Adaptive Biotechnologies to develop the machine learning and biotechnology tools that will allow us to realize this dream.

Fundamental advances towards new medicines and therapeutics

Protein Engineering

Several research groups are delving into the potential of machine learning to enhance our comprehension of proteins and their pivotal role in various biological processes. We are also using AI to design new proteins for therapeutics and industry. By applying machine learning to extract patterns from databases of sequences, structures, and properties, Microsoft hopes to train models that can make protein engineering by directed evolution more efficient, and directly generate proteins that will perform desired functions. The ability to generate computationally distinct yet viable protein structures holds tremendous promise for uncovering novel biological insights and developing targeted therapies for previously untreatable illnesses.

Investigating the Cancer Microenvironment through Ex Vivo Research

Microsoft is working on ways to identify specific characteristics of cancer cells and their surrounding microenvironments that might be targeted for treatment. By studying how cancer cells and their surroundings interact with each other, the team aims to create a more precise approach to cancer treatment that takes into account both genetic and non-genetic factors.

Accelerating biomedical research

Microsoft and the Broad Institute – combining their expertise in genomics, disease research, cloud computing and data analytics – are developing an open-source platform to accelerate biomedical research using scalable analytical tools. The platform is built on top of the Broad Institute’s Terra platform, providing a user-friendly interface for accessing and analyzing genomic data. Leveraging Microsoft’s Azure cloud computing services, the platform will enable secure storage and analysis of large datasets. Additionally, the platform will incorporate machine learning and other advanced analytical tools to help researchers gain insights into complex diseases and develop new treatments.

Advancing clinical interpretation and exploration through multimodal language models

In the quest for precision medicine and accelerating biomedical discovery, Microsoft is committed to advancing the state of the art in biomedical natural language processing (NLP). A crucial factor in future-facing, data-driven health systems is the accessibility and interpretability of multimodal health information. To meet this need, Microsoft has laid a solid foundation across multiple modalities in biomedical NLP building on our deep research assets in deep learning and biomedical machine reading.

One significant achievement is our development and application of large language models (LLMs) in biomedicine. Microsoft was among the first to create and assess the applicability of LLMs, such as PubMedBERT and BioGPT, which are highly effective in structuring biomedical data. However, to address the inherent limitations of LLMs, Microsoft is developing methods to teach them to fact-check themselves and provide fine-grained provenance. Additionally, Microsoft is exploring ways to facilitate efficient verification with humans in the loop.

Besides text, other modalities such as radiology images, digital pathology slides, and genomics contain valuable health information. Microsoft is developing multimodal learning and fusion methods that incorporate these modalities. These methods include predicting disease progression and drug response, with the ultimate goal of delivering safe and high-quality healthcare.

Observational data in biomedicine is often plagued by confounders, making it challenging to draw causal relationships. To overcome this obstacle, Microsoft is developing advanced causal methods that correct implicit biases and scale biomedical discovery. These methods will allow Microsoft to leverage real-world evidence and contribute to the creation of more effective healthcare delivery systems. For our end-to-end biomedical applications, we have made exciting progress in deep collaborations with Microsoft partners such as The Jackson Laboratory and Providence St. Joseph Health.

Empowering everyone to live a healthier future

Microsoft has pursued interdisciplinary research that enables people to reach the full potential of their health for many years, but we’ve never been more excited about the possibilities than we are today. The latest developments in AI have inspired us to accelerate our efforts across these and many other projects, and we look forward to even more innovation and collaboration in this new era.

The post AI and the Future of Health appeared first on Microsoft Research.

Read More

Bundesliga Match Fact Ball Recovery Time: Quantifying teams’ success in pressing opponents on AWS

In football, ball possession is a strong predictor for team success. It’s hard to control the game without having control over the ball. In the past three Bundesliga seasons, as well as in the current season (at the time of this writing), Bayern Munich has ranked first in the table and in ball possession percentage, followed by Dortmund, second in both. The active tactics and playing styles that facilitate high possession values through ball retention have been widely discussed. Terms like Tiki-Taka were established to describe a playing style that is characterized by a precise short passing game with long spells of possession for the attacking team. However, in order to arrive at high possession rates, teams also need to adapt their defense to quickly win back a ball lost to the opponent. Terms like high-press, middle-press, and low-press are often used to describe the amount of room a defending team allows their opponents when moving towards their goal before applying pressure on the ball.

The recent history of Bundesliga club FC Köln emphasizes the effect of different pressing styles on a team’s success. Since Steffen Baumgart took over as coach at FC Köln in 2021, the team has managed to lift themselves from the bottom and has established a steady position in the middle of the table. When analyzing the team statistics after the switch in coaches, one aspect stands out specifically: with 54 pressing situations per game, the team was ranked first in the league, winning the ball back in a third of those situations. This proved especially successful when attacking in the opponent’s half of the pitch. With an increased number of duels per match (+10% compared to the previous season), the Billy Goats managed to finish last season in a strong seventh place, securing a surprising spot in the UEFA Europa Conference League.

Our previous Bundesliga Match Fact (BMF) Pressure Handling sheds light on how successful different players and teams are in withstanding this pressure while retaining the ball. To facilitate the understanding of how active and successful a defending team applies pressure, we need to understand how long it takes them to win back a lost ball. Which Bundesliga teams are fastest in winning back lost possessions? How does a team’s ability to quickly regain possession develop over the course of a match? Are their recovery times diminished when playing stronger teams? And finally, are short recovery times a necessary ingredient to a winning formula?

Introducing the new Bundesliga Match Fact: Ball Recovery Time.

How it works

Ball Recovery Time (BRT) calculates the amount of time it takes for a team to regain possession of the ball. It indicates how hungry a team is to win the ball back, and is measured as the average ball recovery time in seconds.

Throughout a match, the positions of the players and the ball are tracked by cameras around the pitch and stored as coordinates in a positional data stream. This allows us to calculate which player has ball possession at any given moment in time. It’s no surprise that ball possession alternates between the two teams over the course of a match. However, less obvious are the times when ball possession is contested and can’t be directly assigned to either team. The timer for ball recovery starts counting from the moment the team loses possession until they regain it. Time when ball possession is not clear is included in the timer, incentivizing teams to favor clear and fast recoveries.

The following example shows a sequence of alternating ball possessions between team A and B. At some point, team A loses ball possession to team B, which starts the ball recovery time for team A. The ball recovery time is calculated until team A regains the ball.
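
As a rough conceptual sketch (not DFL’s production implementation), the timer logic described above can be expressed over a simplified possession stream:

def ball_recovery_times(possession_stream, team):
    # possession_stream: time-ordered (timestamp_seconds, owner) tuples,
    # where owner is 'A', 'B', or None while possession is contested.
    recovery_times = []
    lost_at = None
    for timestamp, owner in possession_stream:
        if owner == team:
            if lost_at is not None:
                recovery_times.append(timestamp - lost_at)
                lost_at = None
        elif owner is not None and lost_at is None:
            # Possession clearly passed to the opponent: start the timer.
            # Contested (None) frames in between stay on the clock.
            lost_at = timestamp
    return recovery_times

stream = [(0, 'A'), (5, 'B'), (9, None), (12, 'A'), (20, 'B'), (26, 'A')]
print(ball_recovery_times(stream, 'A'))  # [7, 6]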

As already mentioned, FC Cologne has been the league leader in the number of pressing situations since Steffen Baumgart took office. This style of play is also evident when you look at the ball recovery times for the first 24 match days in the 2022/23 season. Cologne achieved an incredible ball recovery time of 13.4 seconds, which is the fourth fastest in the league. On average, it took them only 1.4 seconds longer to recover a lost ball than the fastest team in the league, Bayern Munich, who got the ball back from their opponents after an average of 12 seconds.

Let’s look at certain games played by Cologne in the 2022/23 season. The following chart shows the ball recovery times of Cologne for various games. At least two games stand out in particular. On the first match day, they faced FC Schalke—also known as the Miners—and managed an exceptionally low BRT of 8.3 seconds. This was aided by a red card for Schalke in the first half when the game was still tied 0:0. Cologne’s quick recovery of the ball subsequently helped them prevail 3:1 against the Miners.

Also worth mentioning is the Cologne derby against Borussia Mönchengladbach on the ninth match day. In that game, Cologne took 21.6 seconds to recover the ball, which is around 60% slower than their season average of 13.4 seconds. A yellow-red card just before halftime certainly made it difficult for the Billy Goats to speed up recovering the ball from their local rival Borussia. At the same time, Borussia managed to win the ball back from Cologne on average after just 13.7 seconds, resulting in a convincing 5:2 win for Borussia over their perennial rivals from Cologne.

How it’s implemented

Positional data from an ongoing match, which is recorded at a sampling rate of 25 Hz, is utilized to determine the time taken to recover the ball. To ensure real-time updates of ball recovery times, we have implemented Amazon Managed Streaming for Apache Kafka (Amazon MSK) as a central solution for data streaming and messaging. This allows for seamless communication of positional data and various outputs of Bundesliga Match Facts between containers in real time.

The following diagram illustrates the end-to-end workflow for Ball Recovery Time.

The match-related data is collected and ingested using DFL’s DataHub. Metadata of the match is processed within the AWS Lambda function MetaDataIngestion, while positional data is ingested using the AWS Fargate container called MatchLink. Both the Lambda function and the Fargate container publish the data for further consumption in the relevant MSK topics. The core of the Ball Recovery Time BMF resides within a dedicated Fargate container called BMF BallRecoveryTime. This container operates throughout the corresponding match and obtains all necessary data continuously through Amazon MSK. Its logic responds instantly to positional changes and constantly computes the current ball recovery times.
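
To make that loop concrete, here is a deliberately simplified sketch using the kafka-python client. The topic names, message shape, and the compute stub are illustrative assumptions, not DFL’s actual interfaces.

import json
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "positional-data",  # hypothetical input topic
    bootstrap_servers="YOUR_MSK_BOOTSTRAP",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="YOUR_MSK_BOOTSTRAP",
    value_serializer=lambda m: json.dumps(m).encode("utf-8"),
)

def update_ball_recovery_time(frame):
    # Placeholder for the BMF logic: update possession state from one
    # 25 Hz positional frame and return a completed recovery time, or None.
    return None

for message in consumer:
    brt_update = update_ball_recovery_time(message.value)
    if brt_update is not None:
        producer.send("ball-recovery-time", brt_update)  # hypothetical output topic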

After the ball recovery times have been computed, they’re transmitted back to the DataHub for distribution to other consumers of Bundesliga Match Facts. Additionally, the ball recovery times are sent to a specific topic in the MSK cluster, where they can be accessed by other Bundesliga Match Facts. A Lambda function retrieves all recovery times from the relevant Kafka topic and stores them in an Amazon Aurora Serverless database. This data is then utilized to create interactive, near-real-time visualizations with Amazon QuickSight.

Summary

In this post, we demonstrated how the new Bundesliga Match Fact Ball Recovery Time makes it possible to quantify and objectively compare the speed of different Bundesliga teams in winning back a lost ball possession. This allows commentators and fans to understand how early and successful teams apply pressure to their opponents.

The new Bundesliga Match Fact is the result of an in-depth analysis by a team of football experts and data scientists from the Bundesliga and AWS. Noteworthy ball recovery times are shown in the live ticker of the respective matches in the official Bundesliga app and website. During live matches, ball recovery times are also provided to commentators through the data story finder and visually shown to fans at key moments during the broadcast.

We hope that you enjoy this brand-new Bundesliga Match Fact and that it provides you with new insights into the game. To learn more about the partnership between AWS and Bundesliga, visit Bundesliga on AWS!

We’re excited to learn what patterns you will uncover. Share your insights with us: @AWScloud on Twitter, with the hashtag #BundesligaMatchFacts.


About the Authors

Javier Poveda-Panter is a Senior Data Scientist for EMEA sports customers within the AWS Professional Services team. He enables customers in the area of spectator sports to innovate and capitalize on their data, delivering high-quality user and fan experiences through machine learning and data science. He follows his passion for a broad range of sports, music, and AI in his spare time.

Tareq Haschemi is a consultant within AWS Professional Services. His skills and areas of expertise include application development, data science, machine learning, and big data. He supports customers in developing data-driven applications within the cloud. Prior to joining AWS, he was also a consultant in various industries such as aviation and telecommunications. He is passionate about enabling customers on their data/AI journey to the cloud.

Jean-Michel Lourier is a Senior Data Scientist within AWS Professional Services. He leads teams implementing data driven applications side by side with AWS customers to generate business value out of their data. He’s passionate about diving into tech and learning about AI, machine learning, and their business applications. He is also an enthusiastic cyclist, taking long bike-packing trips.

Fotinos Kyriakides is an ML Engineer with AWS Professional Services. He focuses his efforts in the fields of machine learning, MLOps, and application development, in supporting customers to develop applications in the cloud that leverage and innovate on insights generated from data. In his spare time, he likes to run and explore nature.

Luuk Figdor is a Principal Sports Technology Advisor in the AWS Professional Services team. He works with players, clubs, leagues, and media companies such as the Bundesliga and Formula 1 to help them tell stories with data using machine learning. In his spare time, he likes to learn all about the mind and the intersection between psychology, economics, and AI.

Read More

Bundesliga Match Fact Keeper Efficiency: Comparing keepers’ performances objectively using machine learning on AWS

The Bundesliga is renowned for its exceptional goalkeepers, making it potentially the most prominent among Europe’s top five leagues in this regard. Apart from the widely recognized Manuel Neuer, the Bundesliga has produced remarkable goalkeepers who have excelled in other leagues, including the likes of Marc-André ter Stegen, who is a superstar at Barcelona. In view of such steep competition, people are split on the question of who the most remarkable keeper in the German top league is. As demonstrated by Yann Sommer’s stunning 19 saves (a Bundesliga record) against Bayern Munich last summer, which helped his former club Mönchengladbach earn a draw against the Bavarians, this league’s keepers are fiercely vying for the top spot.

We have witnessed time and time again that a keeper can make or break a win, yet it remains challenging to objectively quantify their effect on a team’s success. Who is the most efficient goalkeeper in the Bundesliga? Who prevents more goals than the average? How can we even compare keepers with different playing styles? It’s about time to shed some light on our guardians’ achievements. Enter the brand-new Bundesliga Match Fact: Keeper Efficiency.

When talking about the best of the best shot-stoppers in the Bundesliga, the list is long and rarely complete. In recent years, one name has been especially dominant: Kevin Trapp. For years, Trapp has been regarded as one of the finest goalies in the Bundesliga. Not only was he widely considered the top-rated goalkeeper in the league during the 2021/22 season, but he also held that title back in 2018/19 when Eintracht Frankfurt reached the Europa League semifinals. Similar to Yann Sommer, Trapp often delivered his best performances on nights when his team was up against the Bavarians.

Many football enthusiasts would argue that Yann Sommer is the best keeper in Germany’s top league, despite also being the smallest. Sommer is highly skilled with the ball at his feet and has demonstrated his ability to produce jaw-dropping saves that are on par with the world elite. Although Sommer can genuinely match any goalkeeper’s level on his best days, he hasn’t had those best days frequently enough in the past. Although he has improved his consistency over time, he still makes occasional errors that can frustrate fans. He has been Switzerland’s well-deserved #1 since 2016; time will tell whether he pushes Manuel Neuer off the throne in Munich.

And let’s not forget about Gregor Kobel. Since joining Borussia Dortmund, Kobel, who previously played for Hoffenheim, Augsburg, and VfB Stuttgart, has been a remarkable signing for the club. Although Jude Bellingham has possibly overtaken him as the team’s highest valued player, there is still a valid argument that Kobel is the most important player for Dortmund. At only 25 years old, Kobel is among the most promising young goalkeepers globally, with the ability to make quality saves while facing a significant number of shots in the Bundesliga. The pressure to perform at Dortmund is immense, second only to their fierce rivals Bayern Munich (at the time of this writing), and Kobel doesn’t have the same defensive protection as any Bayern keeper would. In 2022/23 so far, he has kept a clean sheet in almost every other match for Die Schwarzgelben, despite the team’s inconsistency and often poor midfield performance.

As these examples show, the ways in which keepers shine and compete are manifold. Therefore, it’s no surprise that determining the proficiency of goalkeepers in preventing the ball from entering the net is considered one of the most difficult tasks in football data analysis. Bundesliga and AWS have collaborated to perform an in-depth examination to study the quantification of achievements of Bundesliga’s keepers. The result is a machine learning (ML)-powered insight that allows fans to easily evaluate and compare the goalkeepers’ proficiencies. We’re excited to announce the new Bundesliga Match Fact: Keeper Efficiency.

How it works

The new Bundesliga Match Fact Keeper Efficiency allows fans to evaluate the proficiency of goalkeepers in terms of their ability to prevent shooters from scoring. Although tallying the total number of saves a goalkeeper makes during a match can be informative, it doesn’t account for variations in the difficulty of the shots faced. To avoid treating a routine catch of a 30-meter shot aimed directly at the goalkeeper as being equivalent to an exceptional save made from a shot taken from a distance of 5 meters, we assign each shot a value known as xSaves, which measures the probability that a shot will be saved by a keeper. In other words, a shot with an xSaves value of 0.9 would be saved 9 out of 10 times.

An ML model is trained through Amazon SageMaker, using data from four seasons of the first and second Bundesliga, encompassing all shots that landed on target (either resulting in a goal or being saved). Using derived characteristics of a shot, the model generates the probability that the shot will be successfully saved by the goalkeeper. Some of the factors considered by the model are: distance to goal, distance to goalkeeper, shot angle, number of players between the shot location and the goal, goalkeeper positioning, and predicted shot trajectory. We use an additional model to predict the trajectory of the shot from the initial few frames of the observed shot. With the predicted trajectory and the goalkeeper’s position, the xSaves model can evaluate the probability of the goalkeeper saving the ball.

Adding up the xSaves values of all shots a goalkeeper faced, saved and conceded alike, yields the number of saves that goalkeeper would be expected to make over a match or season. Comparing that against the actual number of saves yields the Keeper Efficiency. In other words, a positive Keeper Efficiency rating indicates that the goalkeeper has saved more shots than expected.
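
To make the arithmetic concrete, here is a tiny sketch of the aggregation step, reading the comparison above as a simple difference between actual and expected saves; the xSaves values below are invented.

```python
# Invented xSaves values for three shots faced by one goalkeeper.
shots_faced = [
    {"xsaves": 0.92, "saved": True},   # easy shot, saved
    {"xsaves": 0.55, "saved": True},   # roughly 50/50 shot, saved
    {"xsaves": 0.18, "saved": False},  # very hard shot, conceded
]

expected_saves = sum(s["xsaves"] for s in shots_faced)    # 1.65
actual_saves = sum(1 for s in shots_faced if s["saved"])  # 2
keeper_efficiency = actual_saves - expected_saves         # +0.35, above expectation
print(f"Keeper Efficiency: {keeper_efficiency:+.2f}")
```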

Keeper Efficiency in action

The following are a few examples that showcase Keeper Efficiency.

Example 1

Due to the large distance to the goal and the large number of defenders covering it at close range, the probability that the shot will result in a goal is low. Because the goalkeeper saved the shot, he receives a small increase in his Keeper Efficiency rating.

Example 2

In this example, the striker is much closer to the goal, with only one defender between him and the goalkeeper, resulting in a lower save probability.

Example 3

In this example, the speed of the ball is much higher and the ball is higher off the ground, resulting in a very low probability that the ball will be saved. The goal was conceded, and therefore the goalkeeper will see a small decrease in his Keeper Efficiency statistic.

What makes a good save

The preceding video shows a medium-difficulty shot with approximately a 50/50 chance of being saved, meaning that half the keepers in the league would save it and the other half would concede the goal. What makes this save remarkable is the goalkeeper’s positioning, instinct, and reflexes. The goalkeeper remains focused on the ball even when his vision is obstructed by the defenders, and he changes his positioning multiple times according to where he thinks the biggest opening lies. Looking at it frame by frame, as soon as the attacking player winds up to take the shot, the goalkeeper makes a short hop backwards to better position himself for the jump to save the shot. His timing is perfect: he lands precisely at the moment the striker kicks the ball. Had he landed too late, he would have been mid-air as the ball flew towards the goal, wasting precious time. With both feet planted on the grass, he makes a strong jump and manages to save the shot.

How Keeper Efficiency is implemented

This Bundesliga Match Fact consumes both event and positional data. Positional data is information gathered by cameras on the positions of the players and ball at any moment during the match (x-y coordinates), arriving at 25 Hz. Event data consists of hand-labelled event descriptions with useful attributes, such as shot on target. When a shot on target (a goal or a saved shot) event is received, the system queries the stored positional data and finds a sync frame—a frame during which the timing and position of the ball match the event. This frame is used to synchronize the event data with the positional data. Once synchronized, the subsequent frames tracking the ball’s trajectory are used to predict where the ball will enter the goal. Additionally, the goalkeeper’s position at the time of the shot is considered, as well as a number of other features such as the number of defenders between the ball and the goalposts and the speed of the ball. All this data is then passed to an ML model (XGBoost), deployed on Amazon SageMaker Serverless Inference, to generate a prediction of the probability of the shot being saved.
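
For illustration, here is a minimal sketch of that final step: requesting a save-probability prediction from a SageMaker Serverless Inference endpoint via the standard boto3 runtime client. The endpoint name, feature payload, and response schema are assumptions for the sake of the example.

```python
# Hypothetical sketch: query an xSaves model hosted on SageMaker
# Serverless Inference.
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

# Assumed feature payload derived from the synchronized event/positional data.
features = {
    "distance_to_goal": 18.4,
    "distance_to_keeper": 3.1,
    "shot_angle": 24.0,
    "defenders_in_path": 2,
    "ball_speed": 27.5,
    "predicted_entry_height": 1.6,
}

response = runtime.invoke_endpoint(
    EndpointName="xsaves-serverless",  # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps(features),
)
result = json.loads(response["Body"].read())  # assumed response schema
print(f"Predicted save probability: {result['save_probability']:.2f}")
```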

The BMF logic itself (except for the ML model) runs in an AWS Fargate container. For every xSaves prediction, it produces a message with the prediction as payload, which is then distributed by a central message broker running on Amazon Managed Streaming for Apache Kafka (Amazon MSK). The information is also stored in a data lake for future auditing and model improvements. The contents of the Kafka messages are then written via an AWS Lambda function to an Amazon Aurora Serverless database to be presented in an Amazon QuickSight dashboard. The following diagram illustrates this architecture.
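
As a rough sketch of the message-distribution step, the Fargate task could publish each prediction to an MSK topic like this; the broker address, topic name, and message schema are illustrative assumptions.

```python
# Hypothetical sketch: publish an xSaves prediction to Amazon MSK
# (requires the kafka-python package).
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["broker1.msk.example:9092"],  # hypothetical MSK broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

message = {
    "match_id": "match-0001234",   # hypothetical identifiers
    "event_id": "shot-000042",
    "keeper_id": "player-0001",
    "save_probability": 0.37,
    "outcome": "saved",
}

# Downstream, a Lambda function would consume this topic and write the
# payload to Aurora Serverless for the QuickSight dashboard.
producer.send("bmf-keeper-efficiency", value=message)  # hypothetical topic
producer.flush()
```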

Summary

The new Bundesliga Match Fact Keeper Efficiency measures the shot-stopping skills of Bundesliga goalkeepers, who are considered to be among the finest in the world. This gives fans and commentators the unique opportunity to understand quantitatively how much a goalkeeper’s performance has contributed to a team’s match result or seasonal achievements.

This Bundesliga Match Fact was developed by a team of Bundesliga and AWS experts. Noteworthy goalkeeper performances are pushed into the Bundesliga live ticker in the mobile app and on the webpage. Match commentators can observe exceptional Keeper Efficiency through the data story finder, and visuals are presented to fans as part of broadcast streams.

We hope that you enjoy this brand-new Bundesliga Match Fact and that it provides you with new insights into the game. To learn more about the partnership between AWS and Bundesliga, visit Bundesliga on AWS!

We’re excited to learn what patterns you will uncover. Share your insights with us: @AWScloud on Twitter, with the hashtag #BundesligaMatchFacts.


About the Authors

Javier Poveda-Panter is a Senior Data Scientist for EMEA sports customers within the AWS Professional Services team. He enables customers in the area of spectator sports to innovate and capitalize on their data, delivering high-quality user and fan experiences through machine learning and data science. He follows his passion for a broad range of sports, music, and AI in his spare time.

Tareq Haschemi is a consultant within AWS Professional Services. His skills and areas of expertise include application development, data science, machine learning, and big data. He supports customers in developing data-driven applications within the cloud. Prior to joining AWS, he was also a consultant in various industries such as aviation and telecommunications. He is passionate about enabling customers on their data/AI journey to the cloud.

Jean-Michel Lourier is a Senior Data Scientist within AWS Professional Services. He leads teams implementing data-driven applications side by side with AWS customers to generate business value out of their data. He’s passionate about diving into tech and learning about AI, machine learning, and their business applications. He is also an enthusiastic cyclist, taking long bike-packing trips.

Fotinos Kyriakides is an ML Engineer with AWS Professional Services. He focuses his efforts in the fields of machine learning, MLOps, and application development, in supporting customers to develop applications in the cloud that leverage and innovate on insights generated from data. In his spare time, he likes to run and explore nature.

Uwe Dick is a Data Scientist at Sportec Solutions AG. He works to enable Bundesliga clubs and media to optimize their performance using advanced stats and data—before, after, and during matches. In his spare time, he settles for less and just tries to last the full 90 minutes for his recreational football team.

Luuk Figdor is a Principal Sports Technology Advisor in the AWS Professional Services team. He works with players, clubs, leagues, and media companies such as the Bundesliga and Formula 1 to help them tell stories with data using machine learning. In his spare time, he likes to learn all about the mind and the intersection between psychology, economics, and AI.


Data-centric ML benchmarking: Announcing DataPerf’s 2023 challenges

Machine learning (ML) offers tremendous potential, from diagnosing cancer to engineering safe self-driving cars to amplifying human productivity. To realize this potential, however, organizations need ML solutions to be reliable with ML solution development that is predictable and tractable. The key to both is a deeper understanding of ML data — how to engineer training datasets that produce high quality models and test datasets that deliver accurate indicators of how close we are to solving the target problem.

The process of creating high quality datasets is complicated and error-prone, from the initial selection and cleaning of raw data, to labeling the data and splitting it into training and test sets. Some experts believe that the majority of the effort in designing an ML system is actually the sourcing and preparing of data. Each step can introduce issues and biases. Even many of the standard datasets we use today have been shown to have mislabeled data that can destabilize established ML benchmarks. Despite the fundamental importance of data to ML, it’s only now beginning to receive the same level of attention that models and learning algorithms have been enjoying for the past decade.

Towards this goal, we are introducing DataPerf, a set of new data-centric ML challenges to advance the state of the art in data selection, preparation, and acquisition technologies, designed and built through a broad collaboration across industry and academia. The initial version of DataPerf consists of four challenges focused on three common data-centric tasks across three application domains: vision, speech, and natural language processing (NLP). In this blog post, we outline dataset development bottlenecks confronting researchers and discuss the role of benchmarks and leaderboards in incentivizing researchers to address these challenges. We invite innovators in academia and industry who seek to measure and validate breakthroughs in data-centric ML to demonstrate the power of their algorithms and techniques to create and improve datasets through these benchmarks.

Data is the new bottleneck for ML

Data is the new code: it is the training data that determines the maximum possible quality of an ML solution. The model only determines the degree to which that maximum quality is realized; in a sense the model is a lossy compiler for the data. Though high-quality training datasets are vital to continued advancement in the field of ML, much of the data on which the field relies today is nearly a decade old (e.g., ImageNet or LibriSpeech) or scraped from the web with very limited filtering of content (e.g., LAION or The Pile).

Despite the importance of data, ML research to date has been dominated by a focus on models. Before modern deep neural networks (DNNs), there were no ML models sufficient to match human behavior for many simple tasks. This starting condition led to a model-centric paradigm in which (1) the training dataset and test dataset were “frozen” artifacts and the goal was to develop a better model, and (2) the test dataset was selected randomly from the same pool of data as the training set for statistical reasons. Unfortunately, freezing the datasets ignored the ability to improve training accuracy and efficiency with better data, and using test sets drawn from the same pool as training data conflated fitting that data well with actually solving the underlying problem.

Because we are now developing and deploying ML solutions for increasingly sophisticated tasks, we need to engineer test sets that fully capture real world problems and training sets that, in combination with advanced models, deliver effective solutions. We need to shift from today’s model-centric paradigm to a data-centric paradigm in which we recognize that for the majority of ML developers, creating high quality training and test data will be a bottleneck.

Shifting from today’s model-centric paradigm to a data-centric paradigm enabled by quality datasets and data-centric algorithms like those measured in DataPerf.

Enabling ML developers to create better training and test datasets will require a deeper understanding of ML data quality and the development of algorithms, tools, and methodologies for optimizing it. We can begin by recognizing common challenges in dataset creation and developing performance metrics for algorithms that address those challenges. For instance:

  • Data selection: Often, we have a larger pool of available data than we can label or train on effectively. How do we choose the most important data for training our models?
  • Data cleaning: Human labelers sometimes make mistakes. ML developers can’t afford to have experts check and correct all labels. How can we select the data that is most likely to be mislabeled for correction? (A naive baseline is sketched right after this list.)
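
As a rough illustration of what such a cleaning strategy can look like, the sketch below flags the training samples whose assigned label a model finds least probable. It is a generic confidence-based baseline under assumed inputs (a feature matrix X and integer-encoded labels y), not a DataPerf reference implementation.

```python
# Hypothetical sketch: rank samples by how improbable the model finds
# their assigned labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def suspect_labels(X, y, budget=100):
    """Return indices of the `budget` samples most likely to be mislabeled.

    Assumes y holds integer-encoded class labels (0..k-1).
    """
    # Out-of-fold probabilities, so no sample is scored by a model that
    # was trained on it.
    proba = cross_val_predict(
        LogisticRegression(max_iter=1000), X, y, cv=5, method="predict_proba"
    )
    # Confidence the model assigns to each sample's *given* label.
    label_confidence = proba[np.arange(len(y)), y]
    # The lowest-confidence samples are the best candidates for relabeling.
    return np.argsort(label_confidence)[:budget]
```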

We can also create incentives that reward good dataset engineering. We anticipate that high-quality training data, carefully selected and labeled, will become a valuable product in many industries, but we presently lack a way to assess the relative value of different datasets without actually training on them. How do we solve this problem and enable quality-driven “data acquisition”?

DataPerf: The first leaderboard for data

We believe good benchmarks and leaderboards can drive rapid progress in data-centric technology. ML benchmarks in academia have been essential to stimulating progress in the field. Consider the following graph which shows progress on popular ML benchmarks (MNIST, ImageNet, SQuAD, GLUE, Switchboard) over time:

Performance over time for popular benchmarks, normalized with initial performance at minus one and human performance at zero. (Source: Douwe et al., 2021; used with permission.)

Online leaderboards provide official validation of benchmark results and catalyze communities intent on optimizing those benchmarks. For instance, Kaggle has over 10 million registered users. The MLPerf official benchmark results have helped drive an over 16x improvement in training performance on key benchmarks.

DataPerf is the first community and platform to build leaderboards for data benchmarks, and we hope to have an analogous impact on research and development for data-centric ML. The initial version of DataPerf consists of leaderboards for four challenges focused on three data-centric tasks (data selection, cleaning, and acquisition) across three application domains (vision, speech and NLP):

  • Training data selection (Vision): Design a data selection strategy that chooses the best training set from a large candidate pool of weakly labeled training images. (A naive baseline is sketched after this list.)
  • Training data selection (Speech): Design a data selection strategy that chooses the best training set from a large candidate pool of automatically extracted clips of spoken words.
  • Training data cleaning (Vision): Design a data cleaning strategy that chooses samples to relabel from a “noisy” training set where some of the labels are incorrect.
  • Training dataset evaluation (NLP): Quality datasets can be expensive to construct, and are becoming valuable commodities. Design a data acquisition strategy that chooses which training dataset to “buy” based on limited information about the data.
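
To make the selection task concrete, here is a naive baseline for the vision challenge: fit a small model on a trusted seed set and keep the candidates whose weak labels the model agrees with most confidently. The inputs (precomputed image embeddings, trusted seed labels, and integer-encoded weak labels) are assumptions; this is not a DataPerf reference solution.

```python
# Hypothetical sketch: pick the k candidates whose weak labels a seed
# model agrees with most confidently.
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_training_set(seed_X, seed_y, candidate_X, candidate_weak_y, k=1000):
    """Return indices of the `k` candidates with the most plausible weak labels.

    Assumes labels are integer-encoded (0..k-1) and all classes appear
    in the seed set, so predict_proba columns line up with label values.
    """
    model = LogisticRegression(max_iter=1000).fit(seed_X, seed_y)
    proba = model.predict_proba(candidate_X)
    # Confidence the seed model assigns to each candidate's *weak* label.
    agreement = proba[np.arange(len(candidate_weak_y)), candidate_weak_y]
    return np.argsort(agreement)[-k:]  # indices of the k strongest candidates
```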

For each challenge, the DataPerf website provides design documents that define the problem, test model(s), quality target, rules and guidelines on how to run the code and submit. The live leaderboards are hosted on the Dynabench platform, which also provides an online evaluation framework and submission tracker. Dynabench is an open-source project, hosted by the MLCommons Association, focused on enabling data-centric leaderboards for both training and test data and data-centric algorithms.

How to get involved

We are part of a community of ML researchers, data scientists and engineers who strive to improve data quality. We invite innovators in academia and industry to measure and validate data-centric algorithms and techniques to create and improve datasets through the DataPerf benchmarks. The deadline for the first round of challenges is May 26th, 2023.

Acknowledgements

The DataPerf benchmarks were created over the last year by engineers and scientists from: Coactive.ai, Eidgenössische Technische Hochschule (ETH) Zurich, Google, Harvard University, Meta, MLCommons, and Stanford University. In addition, this would not have been possible without the support of DataPerf working group members from Carnegie Mellon University, Digital Prism Advisors, Factored, Hugging Face, Institute for Human and Machine Cognition, Landing.ai, San Diego Supercomputing Center, Thomson Reuters Lab, and TU Eindhoven.


How Adobe used Web ML with TensorFlow.js to enhance Photoshop for web

Guest post by Joseph Hsieh (Principal Scientist, Project Lead at Adobe), Devin Fernandez (Director of Product Management, Adobe), and Jason Mayes (Web ML Lead, Google)

Introduction

Photoshop Web is a browser-based version of the popular desktop image editing software, Adobe Photoshop. This online tool offers a wide range of features and capabilities for editing, enhancing, and manipulating images, all through a web browser.

In this post, we will explore how Adobe plans to bring advanced ML features from desktop to web, such as the Object Selection tool. We will also look at how web-based machine learning in JavaScript can improve the performance and user experience of Photoshop Web, and what we can expect in the future.

Challenge

Photoshop was recently made available on the web through WebAssembly, in our first attempt to port our tooling to the browser. However, to bring advanced ML features such as the Object Selection tool to Photoshop Web, it currently adopts a cloud inference solution for object selection tasks, which requires the user to be online and to send data to the cloud service that performs the machine learning task. This means the web app cannot run offline, user privacy is not preserved, and each call to the cloud adds latency and monetary cost, since we need to run those models on our own hardware.

Moving image of screenshot illustrating responsive UI in Object Selection in Adobe Photoshop

When it comes to the Object Selection tool, relying on cloud inference can sometimes result in suboptimal performance due to network latency. To provide a better user experience, Adobe Photoshop Web eliminates this latency by developing an on-device inference solution, resulting in faster predictions and a more responsive UI.

TensorFlow.js is an open-source machine learning library from Google aimed at JavaScript developers that can run client side in the browser. It’s the most mature option for web ML, with comprehensive operator support in its WebGL and WebAssembly backends, and in the future there will also be an option for a WebGPU backend for faster performance as new web standards evolve. Adobe has collaborated with Google to bring TensorFlow.js to Photoshop Web and enable advanced tasks such as object selection using ML running in the browser. The details of the collaboration are explained below.

When we first started to convert to a web solution, we noticed that there were synchronization issues between WebAssembly (which our core ported Photoshop code was running in) and TensorFlow.js (which runs the ML models in the browser). Essentially, we needed to load and run the TensorFlow.js models synchronously instead of asynchronously to work with our WebAssembly port of Photoshop. One potential 3rd party solution was ruled out due to its drawbacks, such as large code size overhead and unpredictable performance across devices. So, a new solution was required.

To tackle these challenges, Google and Adobe first collaborated to bring a proxying API to Emscripten, an LLVM-based compiler toolchain that compiles code written in C or C++ to WebAssembly so it can run in the browser and interact with JavaScript libraries. The proxying API resolves the issues the 3rd party solution suffered from and allows for seamless integration between Photoshop’s WebAssembly implementation and the TensorFlow.js ML models.

Next, once communication between WebAssembly and TensorFlow.js was possible, Adobe ported key ML models, such as the object selection model shown above, to the TensorFlow.js format. The TensorFlow.js team aided in model optimization by focusing on the common operations these models use, such as Conv2D, to ensure the converted models ran as fast as possible in the browser.

With both cloud and on-device solutions now a possibility, Photoshop Web can choose the optimal option for delivering the best user experience and deploy ML models accordingly. While on-device inference offers superior user interaction with low latency and privacy for frequently used tasks, not all ML models can run locally due to the limited memory per browser tab (currently around 4GB in Chrome). On the other hand, cloud inference can accommodate larger ML models for tasks where network latency may be acceptable, with the tradeoffs of less perceived privacy by the end user and the associated cost to host and execute such models on server side hardware.

Performance Improvement

The Google team has improved TensorFlow.js hardware execution performance across its various supported backends (WebGL, WASM, WebGPU), resulting in models seeing anywhere from 30% to 200% performance improvements (the larger models tend to see the biggest gains), enabling close to real-time performance right in the browser.

Looking Ahead

Photoshop Web’s Select Subject and Object Selection tools demonstrate how machine learning can help enhance user workflow and experience. As web-based machine learning technology continues to evolve and TensorFlow.js backend support and efficiency continue to make performance gains, Photoshop Web will be able to bring more advanced models to the edge on device in the browser, pushing the limits of what is possible and enabling even more advanced features to delight users.

Try it out

Try out Photoshop Web right now for yourself at https://photoshop.adobe.com and see the power of machine learning in the browser, bringing the best of Web ML (coming soon) and Cloud ML inference in action!

Adobe offerings and trademarks belong to Adobe Inc and are not associated with Google.


AI Frontiers: AI for health and the future of research with Peter Lee

Peter Lee wearing glasses and smiling at the camera with the Microsoft Research Podcast logo to the left

Episode 137 | March 30, 2023

Powerful new large-scale AI models like GPT-4 are showing dramatic improvements in reasoning, problem-solving, and language capabilities. This marks a phase change for artificial intelligence—and a signal of accelerating progress to come.

In this new Microsoft Research Podcast series, AI scientist and engineer Ashley Llorens hosts conversations with his collaborators and colleagues about what these new models—and the models that will come next—mean for our approach to creating, understanding, and deploying AI, its applications in areas such as health care and education, and its potential to benefit humanity.

The second episode features Peter Lee, head of Microsoft Research. Lee was among a group within Microsoft to have early access to GPT-4 for evaluation and experimentation. Here, he applies his philosophy of tackling research from what will be inevitably true at a future point in time to this current moment. He also explores the differences that may make integrating today’s AI advancements into health care more attainable, a topic he expands on in the soon-to-be-released book The AI Revolution in Medicine: GPT-4 and Beyond and the New England Journal of Medicine article “Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine.”

Transcript

[MUSIC PLAYS]

Ashley Llorens: I’m Ashley Llorens with Microsoft Research. I’ve spent the last 20 years working in AI and machine learning. But I’ve never felt more fortunate to work in the field than at this moment. Just this month, March 2023, OpenAI announced GPT-4, a powerful new large-scale AI model with dramatic improvements in reasoning, problem-solving, and much more. This model and the models that will come after it represent a phase change in the decades-long pursuit of artificial intelligence.

In this podcast series, I’ll share conversations with fellow researchers about our initial impressions of GPT-4, the nature of intelligence, and ultimately, how innovations like these can have the greatest benefit for humanity.


Today we’re sitting down with Peter Lee, head of Microsoft Research. Peter and a number of MSR colleagues, including myself, have had the privilege of working to evaluate and experiment with GPT-4 and support its integration into Microsoft products.

Peter has also deeply explored the potential application of GPT-4 in health care, where its powerful reasoning and language capabilities could make it a useful copilot for practitioners in patient interaction, managing paperwork, and many other tasks.

Welcome to AI Frontiers.

[MUSIC FADES]

I’m going to jump right in here, Peter. So you and I have known each other now for a few years. And one of the values I believe that you and I share is around societal impact and in particular creating spaces and opportunities where science and technology research can have the maximum benefit to society. In fact, this shared value is one of the reasons I found coming to Redmond to work with you an exciting prospect.

Now, in preparing for this episode, I listened again to your discussion with our colleague Kevin Scott on his podcast around the idea of research in context. And the world’s changed a little bit since then, and I just wonder how that thought of research in context kind of finds you in the current moment.

Peter Lee: It’s such an important question and, you know, research in context, I think the way I explained it before is about inevitable futures. You try to think about, you know, what will definitely be true about the world at some point in the future. It might be a future just one year from now or maybe 30 years from now. But if you think about that, you know what’s definitely going to be true about the world and then try to work backwards from there.

And I think the example I gave in that podcast with Kevin was, well, 10 years from now, we feel very confident as scientists that cancer will be a largely solved problem. But aging demographics on multiple continents, particularly North America but also Europe and Asia, is going to give huge rise to age-related neurological disease. And so knowing that, that’s a very different world than today, because today most of medical research funding is focused on cancer research, not on neurological disease.

And so what are the implications of that change? And what does that tell us about what kinds of research we should be doing? The research is still very future oriented. You’re looking ahead a decade or more, but it’s situated in the real world. Research in context. And so now if we think about inevitable futures, well, it’s looking increasingly inevitable that very general forms of artificial intelligence at or potentially beyond human intelligence are inevitable. And maybe very quickly, you know, like in much, much less than 10 years, maybe much less than five years.

And so what are the implications for research and the kinds of research questions and problems we should be thinking about and working on today? That just seems so much more disruptive, so much more profound, and so much more challenging for all of us than the cancer and neurological disease thing, as big as those are.

I was reflecting a little bit through my research career, and I realized I’ve lived through one aspect of this disruption five times before. The first time was when I was still an assistant professor in the late 1980s at Carnegie Mellon University, and, uh, Carnegie Mellon University, as well as several other top universities’, uh, computer science departments, had a lot of, of really fantastic research on 3D computer graphics.

It was really a big deal. And so ideas like ray tracing, radiosity, uh, silicon architectures for accelerating these things were being invented at universities, and there was a big academic conference called SIGGRAPH that would draw hundreds of professors and graduate students, uh, to present their results. And then by the early 1990s, startup companies started taking these research ideas and founding companies to try to make 3D computer graphics real. One notable company that got founded in 1993 was NVIDIA.

You know, over the course of the 1990s, this ended up being a triumph of fundamental computer science research, now to the point where today you literally feel naked and vulnerable if you don’t have a GPU in your pocket. Like if you leave your home, you know, without your mobile phone, uh, it feels bad.

And so what happened is there’s a triumph of computer science research, let’s say in this case in 3D computer graphics, that ultimately resulted in a fundamental infrastructure for life, at least in the developed world. In that transition, which is just a positive outcome of research, it also had some disruptive effect on research.

You know, in 1991, when Microsoft Research was founded, one of the founding research groups was a 3D computer graphics research group that was amongst, uh, the first three research groups for MSR. At Carnegie Mellon University and at Microsoft Research, we don’t have 3D computer graphics research anymore. There had to be a transition and a disruptive impact on researchers who had been building their careers on this. Even with the triumph of things, when you’re talking about the scale of infrastructure for human life, it moves out of the realm completely of—of fundamental research. And that’s happened with compiler design. That was my, uh, area of research. It’s happened with wireless networking; it’s happened with hypertext and, you know, hyperlinked document research, with operating systems research, and all of these things, you know, have become things that that you depend on all day, every day as you go about your life. And they all represent just majestic achievements of computer science research. We are now, I believe, right in the midst of that transition for large language models.

Llorens: I wonder if you see this particular transition, though, as qualitatively different in that those other technologies are ones that blend into the background. You take them for granted. You mentioned that I leave the home every day with a GPU in my pocket, but I don’t think of it that way. Then again, maybe I have some kind of personification of my phone that I’m not thinking of. But certainly, with language models, it’s a foreground effect. And I wonder if, if you see something different there.

Lee: You know, it’s such a good question, and I don’t know the answer to that, but I agree it feels different. I think in terms of the impact on research labs, on academia, on the researchers themselves who have been building careers in this space, the effects might not be that different. But for us, as the consumers and users of this technology, it certainly does feel different. There’s something about these large language models that seems more profound than, let’s say, the movement of pinch-to-zoom UX design, you know, out of academic research labs into, into our pockets. This might get into this big question about, I think, the hardwiring in our brains that when we interact with these large language models, even though we know consciously they aren’t, you know, sentient beings with feelings and emotions, our hardwiring forces us—we can’t resist feeling that way.

I think it’s a, it’s a deep sort of thing that we evolved, you know, in the same way that when we look at an optical illusion, we can be told rationally that it’s an optical illusion, but the hardwiring in our kind of visual perception, just no amount of willpower can overcome, to see past the optical illusion.

And similarly, I think there’s a similar hardwiring that, you know, we are drawn to anthropomorphize these systems, and that does seem to put it into the foreground, as you’ve—as you’ve put it. Yeah, I think for our human experience and our lives, it does seem like it’ll feel—your term is a good one—it’ll feel more in the foreground.

Llorens: Let’s pin some of these, uh, concepts because I think we’ll come back to them. I’d like to turn our attention now to the health aspect of your current endeavors and your path at Microsoft.

You’ve been eloquent about the many challenges around translating frontier AI technologies into the health system and into the health care space in general. In our interview, [LAUGHS] actually, um, when I came here to Redmond, you described the grueling work that would be needed there. I’d like to talk a little bit about those challenges in the context of the emergent capabilities that we’re seeing in GPT-4 and the wave of large-scale AI models that we’re seeing. What’s different about this wave of AI technologies relative to those systemic challenges in, in the health space?

Lee: Yeah, and I think to be really correct and precise about it, we don’t know that GPT-4 will be the difference maker. That still has to be proven. I think it really will, but it, it has to actually happen because we’ve been here before where there’s been so much optimism about how technology can really help health care and in advanced medicine. And we’ve just been disappointed over and over again. You know, I think that those challenges stem from maybe a little bit of overoptimism or what I call irrational exuberance. As techies, we look at some of the problems in health care and we think, oh, we can solve those. You know, we look at the challenges of reading radiological images and measuring tumor growth, or we look at, uh, the problem of, uh, ranking differential diagnosis options or therapeutic options, or we look at the problem of extracting billing codes out of an unstructured medical note. These are all problems that we think we know how to solve in computer science. And then in the medical community, they look at the technology industry and computer science research, and they’re dazzled by all of the snazzy, impressive-looking AI and machine learning and cloud computing that we have. And so there is this incredible optimism coming from both sides that ends up feeding into overoptimism because the actual challenges of integrating technology into the workflow of health care and medicine, of making sure that it’s safe and sort of getting that workflow altered to really harness the best of the technology capabilities that we have now, ends up being really, really difficult.

Furthermore, when we get into actual application of medicine, so that’s in diagnosis and in developing therapeutic pathways, they happen in a really fluid environment, which in a machine learning context involves a lot of confounding factors. And those confounding factors ended up being really important because medicine today is founded on precise understanding of causes and effects, of causal reasoning.

Our best tools right now in machine learning are essentially correlation machines. And as the old saying goes, correlation is not causation. And so if you take a classic example like does smoking cause cancer, it’s very important to take account of the confounding effects and know for certain that there’s a cause-and-effect relationship there. And so there’s always been those sorts of issues.

When we’re talking about GPT-4, I remember I was sitting next to Eric Horvitz the first time it got exposed to me. So Greg Brockman from OpenAI, who’s amazing, and actually his whole team at OpenAI is just spectacularly good. And, uh, Greg was giving a demonstration of an early version of GPT-4 that was codenamed Davinci 3 at the time, and he was showing, as part of the demo, the ability of the system to solve biology problems from the AP biology exam.

And it, you know, gets, I think, a score of 5, the maximum score of 5, on that exam. Of course, the AP exam is this multiple-choice exam, so it was making those multiple choices. But then Greg was able to ask the system to explain itself. How did you come up with that answer? And it would explain, in natural language, its answer. And what jumped out at me was in its explanation, it was using the word “because.”

“Well, I think the answer is C, because, you know, when you look at this aspect, uh, statement of the problem, this causes something else to happen, then that causes some other biological thing to happen, and therefore we can rule out answers A and B and E, and then because of this other factor, we can rule out answer D, and all the causes and effects line up.”

And so I turned immediately to Eric Horvitz, who was sitting next to me, and I said, “Eric, where is that cause-and-effect analysis coming from? This is just a large language model. This should be impossible.” And Eric just looked at me, and he just shook his head and he said, “I have no idea.” And it was just this mysterious thing.

And so that is just one of a hundred aspects of GPT-4 that we’ve been studying over the past now more than half year that seemed to overcome some of the things that have been blockers to the integration of machine intelligence in health care and medicine, like the ability to actually reason and explain its reasoning in these medical scenarios, in medical terms, and that plus its generality just seems to give us just a lot more optimism that this could finally be the very significant difference maker.

The other aspect is that we don’t have to focus squarely on that clinical application. We’ve discovered that, wow, this thing is really good at filling out forms and reducing paperwork burden. It knows how to apply for prior authorization for health care reimbursement. That’s part of the crushing kind of administrative and clerical burden that doctors are under right now.

This thing just seems to be great at that. And that doesn’t really impinge on life-or-death diagnostic or therapeutic decisions. But they happen in the back office. And those back-office functions, again, are bread and butter for Microsoft’s businesses. We know how to interact and sell and deploy technologies there, and so working with OpenAI, it seems like, again, there’s just a ton of reason why we think that it could really make a big difference.

Llorens: Every new technology has opportunities and risks associated with it. This new class of AI models and systems, you know, they’re fundamentally different because they’re not learning, uh, specialized function mapping. There were many open problems on even that kind of machine learning in various applications, and there still are, but instead, it’s—it’s got this general-purpose kind of quality to it. How do you see both the opportunities and the risks associated with this kind of general-purpose technology in the context of, of health care, for example?

Lee: Well, I—I think one thing that has made an unfortunate amount of social media and public media attention are those times when the system hallucinates or goes off the rails. So hallucination is actually a term which isn’t a very nice term. It really, for listeners who aren’t familiar with the idea, is the problem that GPT-4 and other similar systems can have sometimes where they, uh, make stuff up, fabricate, uh, information.

You know, over the many months now that we’ve been working on this, uh, we’ve witnessed the steady evolution of GPT-4, and it hallucinates less and less. But what we’ve also come to understand is that it seems that that tendency is also related to GPT-4’s ability to be creative, to make informed, educated guesses, to engage in intelligent speculation.

And if you think about the practice of medicine, in many situations, that’s what doctors and nurses are doing. And so there’s sort of a fine line here in the desire to make sure that this thing doesn’t make mistakes versus its ability to operate in problem-solving scenarios that—the way I would put it is—for the first time, we have an AI system where you can ask it questions that don’t have any known answer. It turns out that that’s incredibly useful. But now the question is—and the risk is—can you trust the answers that you get? One of the things that happens is GPT-4 has some limitations, particularly that can be exposed fairly easily in mathematics. It seems to be very good at, say, differential equations and calculus at a basic level, but I have found that it makes some strange and elementary errors in basic statistics.

There’s an example from my colleague at Harvard Medical School, Zak Kohane, uh, where he uses standard Pearson correlation kinds of math problems, and it seems to consistently forget to square a term and—and make a mistake. And then what is interesting is when you point out the mistake to GPT-4, its first impulse sometimes is to say, “Uh, no, I didn’t make a mistake; you made a mistake.” Now that tendency to kind of accuse the user of making the mistake, it doesn’t happen so much anymore as the system has improved, but we still in many medical scenarios where there’s this kind of problem-solving have gotten in the habit of having a second instance of GPT-4 look over the work of the first one because it seems to be less attached to its own answers that way and it spots errors very readily.

So that whole story is a long-winded way of saying that there are risks because we’re asking this AI system for the first time to tackle problems that require some speculation, require some guessing, and may not have precise answers. That’s what medicine is at core. Now the question is to what extent can we trust the thing, but also, what are the techniques for making sure that the answers are as good as possible. So one technique that we’ve fallen into the habit of is having a second instance. And, by the way, that second instance ends up really being useful for detecting errors made by the human doctor, as well, because that second instance doesn’t care whether the answers were produced by man or machine. And so that ends up being important. But now moving away from that, there are bigger questions that—as you and I have discussed a lot, Ashley, at work—pertain to this phrase responsible AI, uh, which has been a research area in computer science research. And that term, I think you and I have discussed, doesn’t feel apt anymore.

I don’t know if it should be called societal AI or something like that. And I know you have opinions about this. You know, it’s not just errors and correctness. It’s not just the possibility that these things might be goaded into saying something harmful or promoting misinformation, but there are bigger issues about regulation; about job displacements, perhaps at societal scale; about new digital divides; about haves and have-nots with respect to access to these things. And so there are now these bigger looming issues that pertain to the idea of risks of these things, and they affect medicine and health care directly, as well.

Llorens: Certainly, this matter of trust is multifaceted. You know, there’s trust at the level of institutions, and then there’s trust at the level of individual human beings that need to make decisions, tough decisions, you know—where, when, and if to use an AI technology in the context of a workflow. What do you see in terms of health care professionals making those kinds of decisions? Any barriers to adoption that you would see at the level of those kinds of independent decisions? And what’s the way forward there?

Lee: That’s the crucial question of today right now. There is a lot of discussion about to what extent and how should, for medical uses, how should GPT-4 and its ilk be regulated. Let’s just take the United States context, but there are similar discussions in the UK, Europe, Brazil, Asia, China, and so on.

In the United States, there’s a regulatory agency, the Food and Drug Administration, the FDA, and they actually have authority to regulate medical devices. And there’s a category of medical devices called SaMDs, software as a medical device, and the big discussion really over the past, I would say, four or five years has been how to regulate SaMDs that are based on machine learning, or AI. Steadily, there’s been, uh, more and more approval by the FDA of medical devices that use machine learning, and I think the FDA and the United States has been getting closer and closer to actually having a fairly, uh, solid framework for validating ML-based medical devices for clinical use. As far as we’ve been able to tell, those emerging frameworks don’t apply at all to GPT-4. The methods for doing the clinical validation do not make sense and don’t work for GPT-4.

And so a first question to ask is—even before you get to, should this thing be regulated?—is if you were to regulate it, how on earth would you do it. Uh, because it’s basically putting a doctor’s brain in a box. And so, Ashley, if I put a doctor—let’s take our colleague Jim Weinstein, you know, a great spine surgeon. If we put his brain in a box and I give it to you and ask you, “Please validate this thing,” how on earth do you think about that? What’s the framework for that? And so my conclusion in all of this—it’s possible that regulators will react and impose some rules, but I think it would be a mistake, because I think my fundamental conclusion of all this is that at least for the time being, the rules of application engagement have to apply to human beings, not to the machines.

Now the question is what should doctors and nurses and, you know, receptionists and insurance adjusters, and all of the people involved, you know, hospital administrators, what are their guidelines and what is and isn’t appropriate use of these things. And I think that those decisions are not a matter for the regulators, but that the medical community itself should take ownership of the development of those guidelines and those rules of engagement and encourage, and if necessary, find ways to impose—maybe through medical licensing and other certification—adherence to those things.

That’s where we’re at today. Someday in the future—and we would encourage and in fact we are actively encouraging universities to create research projects that would try to explore frameworks for clinical validation of a brain in a box, and if those research projects bear fruit, then they might end up informing and creating a foundation for regulators like the FDA to have a new form of medical device. I don’t know what you would call it, AI MD, maybe, where you could actually relieve some of the burden from human beings and instead have a version of some sense of a validated, certified brain in a box. But until we get there, you know, I think it’s—it’s really on human beings to kind of develop and monitor and enforce their own behavior.

Llorens: I think some of these questions around test and evaluation, around assurance, are at least as interesting as, [LAUGHS] you know—doing research in that space is going to be at least as interesting as—as creating the models themselves, for sure.

Lee: Yes. By the way, I want to take this opportunity just to commend Sam Altman and the OpenAI folks. I feel like, uh, you and I and other colleagues here at Microsoft Research, we’re in an extremely privileged position to get very early access, specifically to try to flesh out and get some early understanding of the implications for really critical areas of human development like health and medicine, education, and so on.

The instigator was really Sam Altman and crew at OpenAI. They saw the need for this, and they really engaged with us at Microsoft Research to kind of dive deep, and they gave us a lot of latitude to kind of explore deeply in as kind of honest and unvarnished a way as possible, and I think it’s important, and I’m hoping that as we share this with the world, that—that there can be an informed discussion and debate about things. I think it would be a mistake for, say, regulators or anyone to overreact at this point. This needs study. It needs debate. It needs kind of careful consideration, uh, just to understand what we’re dealing with here.

Llorens: Yeah, what a—what a privilege it’s been to be anywhere near the epicenter of these—of these advancements. Just briefly back to this idea of a brain in a box. One of the super interesting aspects of that is it’s not a human brain, right? So some of what we might intuitively think about when you say brain in the box doesn’t really apply, and it gets back to this notion of test and evaluation in that if I give a licensing exam, say, to the brain in the box and it passes it with flying colors, had that been a human, there would have been other things about the intelligence of that entity that are underlying assumptions that are not explicitly tested in that test that then those combined with the knowledge required for the certification makes you fit to do some job. It’s just interesting; there are ways in which the brain that we can currently conceive of as being an AI in that box underperforms human intelligence in some ways and overperforms it in others.

Lee: Right.

Llorens: Verifying and assuring that brain in that—that box I think is going to be just a really interesting challenge.

Lee: Yeah. Let me acknowledge that there are probably going to be a lot of listeners to this podcast who will really object to the idea of “brain in the box” because it crosses the line of kind of anthropomorphizing these systems. And I acknowledge that, that there’s probably a better way to talk about this than doing that. But I’m intentionally being overdramatic by using that phrase just to drive home the point, what a different beast this is when we’re talking about something like clinical validation. It’s not the kind of narrow AI—it’s not like a machine learning system that gives you a precise signature of a T-cell receptor repertoire. There’s a single right answer to those things. In fact, you can freeze the model weights in that machine learning system as we’ve done collaboratively with Adaptive Biotechnologies in order to get an FDA approval as a medical device, as an SaMD. There’s nothing that is—this is so much more stochastic. The model weights matter, but they’re not the fundamental thing.

There’s an alignment of a self-attention network that is in constant evolution. And you’re right, though, that it’s not a brain in some really very important ways. There’s no episodic memory. Uh, it’s not learning actively. And so it, I guess to your point, it is just, it’s a different thing. The big important thing I’m trying to say here is it’s also just different from all the previous machine learning systems that we’ve tried and successfully inserted into health care and medicine.

Llorens: And to your point, all the thinking around various kinds of societally important frameworks are trying to catch up to that previous generation and not yet even aimed really adequately, I think, at these new technologies. You know, as we start to wrap up here, maybe I’ll invoke Peter Lee, the head of Microsoft Research, again, [LAUGHS] kind of—kind of where we started. This is a watershed moment for AI and for computing research, uh, more broadly. And in that context, what do you see next for computing research?

Lee: Of course, AI is just looming so large and Microsoft Research is in a weird spot. You know, I had talked before about the early days of 3D computer graphics and the founding of NVIDIA and the decade-long kind of industrialization of 3D computer graphics, going from research to just, you know, pure infrastructure, technical infrastructure of life. And so with respect to AI, this flavor of AI, we’re sort of at the nexus of that. And Microsoft Research is in a really interesting position, because we are at once contributors to all of the research that is making what OpenAI is doing possible, along with, you know, great researchers and research labs around the world. We’re also then part of the company, Microsoft, that wants to make this with OpenAI a part of the infrastructure of everyday life for everybody. So we’re part of that transition. And so I think for that reason, Microsoft Research, uh, will be very focused on kind of major threads in AI; in fact, we’ve sort of identified five major AI threads.

One we’ve talked about, which is this sort of AI in society and the societal impact, which encompasses also responsible AI and so on. One that our colleague here at Microsoft Research Sébastien Bubeck has been advancing is this notion of the physics of AGI. There has always been a very important thread of theoretical computer science, uh, in machine learning. But what we’re finding is that that style of research is increasingly applicable to trying to understand the fundamental capabilities, limits, and trend lines for these large language models. And you don’t anymore get kind of hard mathematical theorems, but it’s still kind of mathematically oriented, just like physics of the cosmos and of the Big Bang and so on, so physics of AGI.

There’s a third aspect, which more is about the application level. And we’ve been, I think in some parts of Microsoft Research, calling that costar or copilot, you know, the idea of how is this thing a companion that amplifies what you’re trying to do every day in life? You know, how can that happen? What are the modes of interaction? And so on.

And then there is AI4Science. And, you know, we’ve made a big deal about this, and we still see just tremendous evidence, mounting evidence, that these large AI systems can give us new ways to make scientific discoveries in physics, in astronomy, in chemistry, biology, and the like. And that, you know, ends up being, you know, just really incredible.

And then there’s the core nuts and bolts, what we call model innovation. Just a little while ago, we released new model architectures, one called Kosmos, for doing multimodal kind of machine learning and classification and recognition interaction. Earlier, we did VALL-E, you know, which just based on a three-second sample of speech is able to ascertain your speech patterns and replicate speech. And those are kind of in the realm of model innovations, um, that will keep happening.

The long-term trajectory is that at some point, if Microsoft and other companies are successful, OpenAI and others, this will become a completely industrialized part of the infrastructure of our lives. And I think I would expect the research on large language models specifically to start to fade over the next decade. But then, whole new vistas will open up, and that’s on top of all the other things we do in cybersecurity, and in privacy and security, and the physical sciences, and on and on and on. For sure, it’s just a very, very special time in AI, especially along those five dimensions.

Llorens: It will be really interesting to see which aspects of the technology sink into the background and become part of the foundation and which ones remain up close and foregrounded and how those aspects change what it means to be human in some ways and maybe to be—to be intelligent, uh, in some ways. Fascinating discussion, Peter. Really appreciate the time today.

Lee: It was really great to have a chance to chat with you about things and always just great to spend time with you, Ashley.

Llorens: Likewise.

[MUSIC]


April Showers Bring 23 New GeForce NOW Games Including ‘Have a Nice Death’

It’s another rewarding GFN Thursday, with 23 new games for April on top of 11 joining the cloud this week and a new Marvel’s Midnight Suns reward now available first for GeForce NOW Premium members.

Newark RT 4080 SuperPOD GeForce NOW
There are dozens of us…dozens!

Newark, N.J., is next to complete its upgrade to RTX 4080 SuperPODs, making it the 12th region worldwide to bring new performance to Ultimate members.

GeForce NOW on SHIELD TV is being updated for a more consistent experience across Android and TV devices. Update 6.00 has begun rolling out to SHIELD TV owners this week.

Plus, work is underway to bring the initial batch of Xbox first-party games and features to GeForce NOW.

Teamwork Makes the Dream Work

GeForce NOW and Microsoft

Last month, we announced a partnership with Microsoft to bring Xbox Game Studios PC games to the GeForce NOW library, including titles from Bethesda, Mojang Studios and Activision, pending closure of Microsoft’s acquisition. It’s a shared commitment to giving gamers more choice and enabling PC gamers to play their favorite games anywhere.

Since then, the teams at both companies have been collaborating to deliver the best-in-class cloud gaming experience that PC gamers have come to expect: a seamless experience across any device, whether playing locally or in the cloud.

We’re making progress, and in future GFN Thursdays we’ll share updates as individual titles from Microsoft’s incredibly rich catalog of first-party PC games are onboarded. Stay tuned to our GFN Thursday updates for the latest.

Medieval Marvel

Fight among the legends in Captain Marvel’s Medieval Marvel suit.

Starting today, Premium GeForce NOW members can claim their marvel-ous new reward. Marvel’s Midnight Suns, the tactical role-playing game from the creators of XCOM, has been praised for its immersive gameplay and cutting-edge visuals, with support for DLSS 3 technology on top of RTX-powered ray tracing.

With the game’s first downloadable content, called The Good, The Bad, and The Undead, fans were thrilled to welcome Deadpool to the roster. This week, members can get their free reward to secure Captain Marvel’s Medieval Marvel suit.

Ultimate and Priority members can visit the GeForce NOW Rewards portal today and update their settings to start receiving special offers and in-game goodies. Better hurry, as this reward is available on a first-come, first-served basis only through Saturday, May 6.

April is FOOL of Games

Death Inc. opens a new branch in the cloud.

No joke, kick the weekend off right by streaming Have a Nice Death. Restore order in this darkly charming 2D action game from Gearbox Publishing while playing as an overworked Death whose employees at Death Inc. have run rampant as caretakers of souls. Hack and slash through numerous minions and bosses in each of the company’s departments, using unique weapons and spells.

This leads the 11 new games joining the cloud this week:

  • 9 Years of Shadows (New release on Steam)
  • Terra Nil (New release on Steam, March 28)
  • Gripper (New release on Steam, March 29)
  • Smalland: Survive the Wilds (New release on Steam, March 29)
  • DREDGE (New release on Steam, March 30)
  • Ravenbound (New release on Steam, March 30)
  • The Great War: Western Front (New release on Steam, March 30)
  • Troublemaker (New release on Steam, March 31)
  • Have a Nice Death (Steam)
  • Tower of Fantasy (Steam)
  • Tunche (Free on Epic Games Store)

Plus, look forward to the rest of April:

  • Meet Your Maker (New release on Steam, April 4)
  • Road 96: Mile 0 (New release on Steam, April 4)
  • TerraScape (New release on Steam, April 5)
  • Curse of the Sea Rats (New release on Steam, April 6)
  • Ravenswatch (New release on Steam, April 6)
  • Supplice (New release on Steam, April 6)
  • DE-EXIT – Eternal Matters (New release on Steam, April 14)
  • Survival: Fountain of Youth (New release on Steam, April 19)
  • Tin Hearts (New release on Steam, April 20)
  • Dead Island 2 (New release on Epic Games Store, April 21)
  • Afterimage (New release on Steam, April 25)
  • Roots of Pacha (New release on Steam, April 25)
  • Bramble: The Mountain King (New release on Steam, April 27)
  • 11-11 Memories Retold (Steam)
  • canVERSE (Steam)
  • Teardown (Steam)
  • Get Even (Steam)
  • Little Nightmares (Steam)
  • Little Nightmares II (Steam)
  • The Dark Pictures Anthology: Man of Medan (Steam)
  • The Dark Pictures Anthology: Little Hope (Steam)
  • The Dark Pictures Anthology: House of Ashes (Steam)
  • The Dark Pictures Anthology: The Devil in Me (Steam)

More March Madness

On top of the 19 games announced in March, nine extra ones joined the GeForce NOW library this month, including this week’s additions 9 Years of Shadows, Terra Nil, Gripper, Troublemaker, Have a Nice Death and Tunche.

System Shock didn’t make it in March due to a shift in its release date, nor did Chess Ultra due to a technical issue.

With so many titles streaming from the cloud, what game will you play next? Let us know in the comments below, on Twitter or on Facebook.
