Facebook launches new research award opportunity focused on digital privacy

Facebook is deeply committed to honoring people’s privacy in our products, policies, and services. Part of that commitment involves empowering the academic community to pursue research in this area in order to broaden our collective understanding of global privacy expectations and experiences. With this commitment in mind, Facebook launched the first of a series of privacy-related research award opportunities in November 2019, followed by a second opportunity in January 2020. As a continuation of this series, Facebook now invites the academic community to respond to the People’s Expectations and Experiences with Digital Privacy request for proposals (RFP).
“This research award opportunity seeks applications from across the social sciences and technical disciplines, and encourages collaboration between fields,” said Liz Keneski, Head of Privacy Research at Facebook. “We are keen to engage with academia in this cross-disciplinary space to inform the creation of world-class privacy experiences for our users.”

Disciplines include but are not limited to anthropology, communications, computer science, economics, engineering, human-computer interaction, human factors, political science, social psychology, and sociology. We are particularly interested in proposals that focus on the following two areas:

  1. Advancing understanding of users’ privacy attitudes, concerns, preferences, needs, behaviors, and outcomes
  2. Informing novel interventions for digital transparency and control that are meaningful for diverse populations, contexts, and data types

Applications for this RFP are now open. Deadline to apply is Wednesday, September 16, at 5:00 p.m. AOE. For more information about areas of interest, proposal requirements, eligibility, and more, visit the application page.

The post Facebook launches new research award opportunity focused on digital privacy appeared first on Facebook Research.


Shrinking deep learning’s carbon footprint

In June, OpenAI unveiled the largest language model in the world, a text-generating tool called GPT-3 that can write creative fiction, translate legalese into plain English, and answer obscure trivia questions. It’s the latest feat of intelligence achieved by deep learning, a machine learning method patterned after the way neurons in the brain process and store information.

But it came at a hefty price: at least $4.6 million and 355 years in computing time, assuming the model was trained on a standard neural network chip, or GPU. The model’s colossal size — 1,000 times larger than a typical language model — is the main factor in its high cost.

“You have to throw a lot more computation at something to get a little improvement in performance,” says Neil Thompson, an MIT researcher who has tracked deep learning’s unquenchable thirst for computing. “It’s unsustainable. We have to find more efficient ways to scale deep learning or develop other technologies.”

Some of the excitement over AI’s recent progress has shifted to alarm. In a study last year, researchers at the University of Massachusetts at Amherst estimated that training a large deep-learning model produces 626,000 pounds of planet-warming carbon dioxide, equal to the lifetime emissions of five cars. As models grow bigger, their demand for computing is outpacing improvements in hardware efficiency. Chips specialized for neural-network processing, like GPUs (graphics processing units) and TPUs (tensor processing units), have offset the demand for more computing, but not by enough. 

“We need to rethink the entire stack — from software to hardware,” says Aude Oliva, MIT director of the MIT-IBM Watson AI Lab and co-director of the MIT Quest for Intelligence. “Deep learning has made the recent AI revolution possible, but its growing cost in energy and carbon emissions is untenable.”

Computational limits have dogged neural networks from their earliest incarnation — the perceptron — in the 1950s. As computing power exploded, and the internet unleashed a tsunami of data, they evolved into powerful engines for pattern recognition and prediction. But each new milestone brought an explosion in cost, as data-hungry models demanded increased computation. GPT-3, for example, trained on half a trillion words and ballooned to 175 billion parameters — the numerical values, or weights, that tie the model together — making it 100 times bigger than its predecessor, itself just a year old.

In work posted on the pre-print server arXiv, Thompson and his colleagues show that the ability of deep learning models to surpass key benchmarks tracks their nearly exponential rise in computing power use. (Like others seeking to track AI’s carbon footprint, the team had to guess at many models’ energy consumption due to a lack of reporting requirements). At this rate, the researchers argue, deep nets will survive only if they, and the hardware they run on, become radically more efficient.

Toward leaner, greener algorithms

The human perceptual system is extremely efficient at using data. Researchers have borrowed this idea to make models for recognizing actions in video, and in real life, more compact. In a paper at the European Conference on Computer Vision (ECCV) in August, researchers at the MIT-IBM Watson AI Lab describe a method for unpacking a scene from a few glances, as humans do, by cherry-picking the most relevant data.

Take a video clip of someone making a sandwich. Under the method outlined in the paper, a policy network strategically picks frames of the knife slicing through roast beef, and meat being stacked on a slice of bread, to represent at high resolution. Less-relevant frames are skipped over or represented at lower resolution. A second model then uses the abbreviated CliffsNotes version of the movie to label it “making a sandwich.” The approach leads to faster video classification at half the computational cost of the next-best model, the researchers say.

“Humans don’t pay attention to every last detail — why should our models?” says the study’s senior author, Rogerio Feris, research manager at the MIT-IBM Watson AI Lab. “We can use machine learning to adaptively select the right data, at the right level of detail, to make deep learning models more efficient.”

In a complementary approach, researchers are using deep learning itself to design more economical models through an automated process known as neural architecture search. Song Han, an assistant professor at MIT, has used automated search to design models with fewer weights for language understanding and for scene recognition, where quickly picking out looming obstacles is acutely important in driving applications.

In a paper at ECCV, Han and his colleagues propose a model architecture for three-dimensional scene recognition that can spot safety-critical details like road signs, pedestrians, and cyclists with relatively less computation. They used an evolutionary-search algorithm to evaluate 1,000 architectures before settling on a model they say is three times faster and uses one-eighth the computation of the next-best method.

In another recent paper, they use evolutionary search within an augmented design space to find the most efficient architectures for machine translation on a specific device, be it a GPU, smartphone, or tiny Raspberry Pi. Separating the search and training process leads to huge reductions in computation, they say.
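To make the search strategy concrete, here is a toy sketch of evolutionary architecture search in Python. It is not the method from Han’s papers: the search space, mutation rule, and fitness function (a stand-in for training each candidate and weighing accuracy against computational cost) are all illustrative:

import random

# Toy search space: each candidate architecture is a few hyperparameters.
SEARCH_SPACE = {
    "depth": [2, 4, 8, 16],
    "width": [64, 128, 256, 512],
    "kernel": [3, 5, 7],
}

def random_arch():
    return {name: random.choice(options) for name, options in SEARCH_SPACE.items()}

def mutate(arch):
    # Change one hyperparameter at random.
    child = dict(arch)
    name = random.choice(list(SEARCH_SPACE))
    child[name] = random.choice(SEARCH_SPACE[name])
    return child

def fitness(arch):
    # Placeholder score: real NAS would train (or estimate) the candidate and
    # trade off validation accuracy against its computational cost.
    accuracy_proxy = 0.3 * arch["depth"] + 0.01 * arch["width"]
    cost_penalty = arch["depth"] * arch["width"] * arch["kernel"] ** 2 / 1e4
    return accuracy_proxy - cost_penalty

def evolve(generations=50, population_size=20, survivors=5):
    population = [random_arch() for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:survivors]
        # Refill the population with mutated copies of the best candidates.
        population = parents + [mutate(random.choice(parents))
                                for _ in range(population_size - survivors)]
    return max(population, key=fitness)

print(evolve())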

In a third approach, researchers are probing the essence of deep nets to see if it might be possible to train just a small part of even hyper-efficient networks like those above. Under their lottery ticket hypothesis, PhD student Jonathan Frankle and MIT Professor Michael Carbin proposed that within each model lies a tiny subnetwork that could have been trained in isolation with as few as one-tenth as many weights — what they call a “winning ticket.”

They showed that an algorithm could retroactively find these winning subnetworks in small image-classification models. Now, in a paper at the International Conference on Machine Learning (ICML), they show that the algorithm finds winning tickets in large models, too; the models just need to be rewound to an early, critical point in training when the order of the training data no longer influences the training outcome. 
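As a rough illustration of the recipe, here is a minimal PyTorch sketch of magnitude pruning with weight rewinding on a toy classifier. The data, model size, pruning fraction, and rewind point are all stand-ins, not the settings used by Frankle and Carbin:

import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data and model; the papers use real image classifiers.
X, y = torch.randn(512, 20), torch.randint(0, 2, (512,))
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

def train(net, mask=None, steps=200):
    opt = torch.optim.SGD(net.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(net(X), y).backward()
        opt.step()
        if mask is not None:  # keep pruned weights at zero
            with torch.no_grad():
                for p, m in zip(net.parameters(), mask):
                    p.mul_(m)
    return net

# 1. Save an early "rewind point" (here, simply the initialization).
rewind_state = copy.deepcopy(model.state_dict())

# 2. Train the full, over-parameterized network.
train(model)

# 3. Build a mask that keeps only the largest-magnitude 10% of each tensor.
mask = []
for p in model.parameters():
    k = max(1, int(0.1 * p.numel()))
    threshold = p.detach().abs().flatten().topk(k).values.min()
    mask.append((p.detach().abs() >= threshold).float())

# 4. Rewind the surviving weights to the early checkpoint and retrain only
#    that sparse "winning ticket" subnetwork.
model.load_state_dict(rewind_state)
with torch.no_grad():
    for p, m in zip(model.parameters(), mask):
        p.mul_(m)
train(model, mask=mask)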

In less than two years, the lottery ticket idea has been cited more than 400 times, including by Facebook researcher Ari Morcos, who has shown that winning tickets can be transferred from one vision task to another, and that winning tickets exist in language and reinforcement learning models, too.

“The standard explanation for why we need such large networks is that overparameterization aids the learning process,” says Morcos. “The lottery ticket hypothesis disproves that — it’s all about finding an appropriate starting point. The big downside, of course, is that, currently, finding these ‘winning’ starting points requires training the full overparameterized network anyway.”

Frankle says he’s hopeful that an efficient way to find winning tickets will be found. In the meantime, recycling those winning tickets, as Morcos suggests, could lead to big savings.

Hardware designed for efficient deep net algorithms

As deep nets push classical computers to the limit, researchers are pursuing alternatives, from optical computers that transmit and store data with photons instead of electrons, to quantum computers, which have the potential to increase computing power exponentially by representing data in multiple states at once.

Until a new paradigm emerges, researchers have focused on adapting the modern chip to the demands of deep learning. The trend began with the discovery that video-game graphical chips, or GPUs, could turbocharge deep-net training with their ability to perform massively parallelized matrix computations. GPUs are now one of the workhorses of modern AI, and have spawned new ideas for boosting deep net efficiency through specialized hardware. 

Much of this work hinges on finding ways to store and reuse data locally, across the chip’s processing cores, rather than waste time and energy shuttling data to and from a designated memory site. Processing data locally not only speeds up model training but improves inference, allowing deep learning applications to run more smoothly on smartphones and other mobile devices.

Vivienne Sze, a professor at MIT, has literally written the book on efficient deep nets. In collaboration with book co-author Joel Emer, an MIT professor and researcher at NVIDIA, Sze has designed a chip that’s flexible enough to process the widely-varying shapes of both large and small deep learning models. Called Eyeriss 2, the chip uses 10 times less energy than a mobile GPU.

Its versatility lies in its on-chip network, called a hierarchical mesh, that adaptively reuses data and adjusts to the bandwidth requirements of different deep learning models. After reading from memory, it reuses the data across as many processing elements as possible to minimize data transportation costs and maintain high throughput. 

“The goal is to translate small and sparse networks into energy savings and fast inference,” says Sze. “But the hardware should be flexible enough to also efficiently support large and dense deep neural networks.”

Other hardware innovators are focused on reproducing the brain’s energy efficiency. Former Go world champion Lee Sedol may have lost his title to a computer, but his performance was fueled by a mere 20 watts of power. AlphaGo, by contrast, burned an estimated megawatt of power, or 50,000 times more.

Inspired by the brain’s frugality, researchers are experimenting with replacing the binary, on-off switch of classical transistors with analog devices that mimic the way that synapses in the brain grow stronger and weaker during learning and forgetting.

An electrochemical device, developed at MIT and recently published in Nature Communications, is modeled after the way resistance between two neurons grows or subsides as calcium, magnesium or potassium ions flow across the synaptic membrane dividing them. The device uses the flow of protons — the smallest and fastest ion in solid state — into and out of a crystalline lattice of tungsten trioxide to tune its resistance along a continuum, in an analog fashion.

“Even though it is not yet optimized, it gets to the order of energy consumption per unit area per unit change in conductance that’s close to that in the brain,” says the study’s senior author, Bilge Yildiz, a professor at MIT.

Energy-efficient algorithms and hardware can shrink AI’s environmental impact. But there are other reasons to innovate, says Sze, listing them off: Efficiency will allow computing to move from data centers to edge devices like smartphones, making AI accessible to more people around the world; shifting computation from the cloud to personal devices reduces the flow, and potential leakage, of sensitive data; and processing data on the edge eliminates transmission costs, leading to faster inference with a shorter reaction time, which is key for interactive driving and augmented/virtual reality applications.

“For all of these reasons, we need to embrace efficient AI,” she says.


Announcing the winners of Facebook’s request for proposals on misinformation and polarization

Misinformation and polarization are fundamental challenges we face, not just as a company with the mission of bringing people together but also as members of societies dealing with layered challenges ranging from election interference to a global pandemic.

At the end of February, Facebook Research launched a request for proposals focusing on these dual challenges. Our goal is to support independent research that will contribute to the understanding of these phenomena and, in the long term, help us improve our policies, interventions, and tooling. We invited proposals that took any of a wide variety of research approaches to bring new perspectives into ongoing work on issues like health misinformation, affective polarization, digital literacy, and more.

We received over 1,000 proposals from 600 institutions and 77 countries around the world that covered an impressive range of disciplines and methodological approaches. The 25 awardees intend to investigate these issues across 42 countries: Argentina, Australia, Brazil, Canada, Chile, China, Colombia, Denmark, Egypt, Ethiopia, Germany, Ghana, Hungary, India, Indonesia, Israel, Italy, Kenya, Mexico, Myanmar, New Zealand, Nigeria, Pakistan, Philippines, Russia, Rwanda, Spain, South Africa, South Korea, Sudan, Taiwan, Tanzania, Thailand, Tunisia, Turkey, Uganda, Ukraine, United Arab Emirates, United Kingdom, United States, Vietnam, and Zimbabwe.

Proposals were evaluated by a selection committee comprising members of Facebook’s research and policy teams. The selection process was incredibly competitive, so we want to thank all the researchers who took the time to submit a proposal. Congratulations to the winners.

Research award winners

The names listed below are the principal investigators of each proposal.

Affective polarization and contentious politics: Women’s movement in Mexico
Marta Barbara Ochman, Instituto Tecnológico y de Estudios Superiores de Monterrey

Affective polarization: Causal drivers, online networks, and Interventions
Selim Erdem Aytaç, Koç University

Can third party fact-checkers on Facebook reduce affective polarization?
Fei Shen, City University of Hong Kong

Countering deepfake misinformation among low digital-literacy populations
Ayesha Ali, Lahore University of Management Sciences

Cross-cultural psychological motivations of online political hostility
Michael Bang Petersen, Aarhus University

Dangerous speech, social media and violence in Ethiopia
Mercy Fekadu Mulugeta, Addis Ababa University

Digital literacy and misinformation among smallholder farmers in Tanzania
Justin Kalisti Urassa, Sokoine University of Agriculture

Digital literacy, demographics and misinformation in Myanmar
Hilary Oliva Faxon, Phandeeyar

Digital literacy in East Africa: A three country comparative study
Meghan Sobel Cohen, Regis University

Digital literacy in Latin America: Developing measures for WhatsApp
Kevin Munger, Pennsylvania State University

Do online video recommendation algorithms increase affective polarization?
Brandon Stewart, Princeton University

Do users in India, Kenya, Ghana react differently to problematic content?
Godfred Bokpin, CUTS Accra

Examining how ingroup dissent on social media mitigates false polarization
Victoria A. Parker, Wilfrid Laurier University

Exploring harmful [mis]information via normalized online violent content
Joanne Lloyd, University of Wolverhampton

Indigenous women and LBGTQI+ people and violence on Facebook
Bronwyn Carlson, Macquarie University

Micro-Influencers as digital community health workers
Kathryn Cottingham, Dartmouth College

Political elites and the appeal of fake news in Brazil
Natália Salgado Bueno, Emory University

Political identity ownership
Shannon C. McGregor, University of North Carolina at Chapel Hill

Quantifying harms of misinformation during the U.S. presidential election
Erik C. Nisbet, Northwestern University

Quantifying persistent effects of misinformation via neural signals
Joseph W. Kable, University of Pennsylvania

STOP! Selective trust originates polarization
Sergio Splendore, Università degli Studi di Milano

The circulation of dangerous speech in the 2020 Brazilian elections
Lucas Calil Guimarães Silva, Fundação Getúlio Vargas

The contagion of misinformation
Heidi Larson, London School of Hygiene & Tropical Medicine

Unpacking trust and bias in social media news in developing countries
Denis Stukal, University of Sydney

When online speech meets offline harm: Internet shutdowns in Africa
Nicole Stremlau, University of Oxford

To view our currently open research awards and to subscribe to our email list, visit our Research Awards page.

The post Announcing the winners of Facebook’s request for proposals on misinformation and polarization appeared first on Facebook Research.


Build more effective conversations on Amazon Lex with confidence scores and increased accuracy

In the rush of our daily lives, we often have conversations that contain ambiguous or incomplete sentences. For example, when talking to a banking associate, a customer might say, “What’s my balance?” This request is ambiguous, and it is difficult to determine whether the customer wants the balance on her credit card or her checking account. Perhaps she only has a checking account with the bank. The agent can provide a good customer experience by looking up the account details and identifying that the customer is referring to the checking account. In the customer service domain, agents often have to resolve such uncertainty in interpreting the user’s intent by using the contextual data available to them. Bots face the same ambiguity and need to determine the correct intent by using the contextual data available about the customer.

Today, we’re launching natural language understanding improvements and confidence scores on Amazon Lex. We continuously improve the service based on customer feedback and advances in research. These improvements enable better intent classification accuracy. You can also achieve better detection of user inputs not included in the training data (out-of-domain utterances). In addition, we provide confidence score support to indicate the likelihood of a match with a certain intent. The confidence scores for the top five intents are surfaced as part of the response. This better equips you to handle ambiguous scenarios such as the one we described. In such cases, where two or more intents are matched with reasonably high confidence, intent classification confidence scores can help you determine when you need to use business logic to clarify the user’s intent. If the user only has a credit card, then you can trigger the intent to surface the balance on the credit card. Alternatively, if the user has both a credit card and a checking account, you can pose a clarification question such as “Is that for your credit card or checking account?” before proceeding with the query. You now have better insights to manage the conversation flow and create more effective conversations.

This post shows how you can use these improvements with confidence scores to trigger the best response based on business knowledge.

Building a Lex bot

This post uses the following conversations to model a bot.

If the customer has only one account with the bank:

User: What’s my balance?
Agent: Please enter your ATM card PIN to confirm
User: 5555
Agent: Your checking account balance is $1,234.00

As well as an alternate conversation path, where the customer has multiple types of accounts:

User: What’s my balance?
Agent: Sure. Is this for your checking account, or credit card?
User: My credit card
Agent: Please enter your card’s CVV number to confirm
User: 1212
Agent: Your credit card balance is $3,456.00

The first step is to build an Amazon Lex bot with intents to support transactions such as balance inquiry, funds transfer, and bill payment. The GetBalanceCreditCard, GetBalanceChecking, and GetBalanceSavings intents provide account balance information. The PayBills intent processes payments to payees, and TransferFunds enables the transfer of funds from one account to another. Lastly, you can use the OrderChecks intent to replenish checks. When a user makes a request that the Lex bot can’t process with any of these intents, the fallback intent is triggered to respond.

Deploying the sample Lex bot

To create the sample bot, perform the following steps. For this post, you create an Amazon Lex bot BankingBot, and an AWS Lambda function called BankingBot_Handler.

  1. Download the Amazon Lex bot definition and Lambda code.
  2. On the Lambda console, choose Create function.
  3. Enter the function name BankingBot_Handler.
  4. Choose the latest Python runtime (for example, Python 3.8).
  5. For Permissions, choose Create a new role with basic Lambda permissions.
  6. Choose Create function.
  7. When your new Lambda function is available, in the Function code section, choose Actions and Upload a .zip file.
  8. Choose the BankingBot.zip file that you downloaded.
  9. Choose Save.
  10. On the Amazon Lex console, choose Actions, Import.
  11. Choose the file BankingBot.zip that you downloaded and choose Import.
  12. Select the BankingBot bot on the Amazon Lex console.
  13. In the Fulfillment section, for each of the intents, including the fallback intent (BankingBotFallback), choose AWS Lambda function and choose the BankingBot_Handler function from the drop-down menu.
  14. When prompted to Add permission to Lambda function, choose OK.
  15. When all the intents are updated, choose Build.

At this point, you should have a working Lex bot.

Setting confidence score thresholds

You’re now ready to set an intent confidence score threshold. This setting controls when Amazon Lex will default to Fallback Intent based on the confidence scores of intents. To configure the settings, complete the following steps:

  1. On the Amazon Lex console, choose Settings, and then choose General.
  2. For us-east-1, us-west-2, ap-southeast-2, or eu-west-1, scroll down to Advanced options and select Yes to opt in to the accuracy improvements and enable the confidence score feature.

These improvements and confidence score support are enabled by default in other Regions.

  3. For Confidence score threshold, enter a number between 0 and 1. You can choose to leave it at the default value of 0.4.

  4. Choose Save and then choose Build.

When the bot is configured, Amazon Lex surfaces the confidence scores and alternative intents in the PostText and PostContent responses:

   "alternativeIntents": [ 
      { 
         "intentName": "string",
         "nluIntentConfidence": { 
            "score": number
         },
         "slots": { 
            "string": "string" 
         }
      }
   ]

Using a Lambda function and confidence scores to identify the user intent

When the user makes an ambiguous request such as “Can I get my account balance?” the Lambda function parses the list of intents that Amazon Lex returned. If multiple intents are returned, the function checks whether the top intents have similar scores as defined by an AMBIGUITY_RANGE value. For example, if one intent has a confidence score of 0.95 and another has a score of 0.65, the first intent is probably correct. However, if one intent has a score of 0.75 and another has a score of 0.72, you may be able to discriminate between the two intents using business knowledge in your application. In our use case, if the customer holds multiple accounts, the function is configured to respond with a clarification question such as, “Is this for your credit card or for your checking account?” But if the customer holds only a single account (for example, checking), the balance for that account is returned.

When you use confidence scores, Amazon Lex returns the most likely intent and up to four alternative intents with their associated scores in each response. If all the confidence scores are less than the threshold you defined, Amazon Lex includes the AMAZON.FallbackIntent intent, the AMAZON.KendraSearchIntent intent, or both. You can use the default threshold or you can set your own threshold value.
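If you call the bot programmatically, the same scores are available in the runtime response. The following is a minimal sketch using boto3 with the Amazon Lex (V1) runtime; the bot alias, user ID, and utterance are placeholder values:

import boto3

# Placeholder alias, user ID, and utterance; substitute your own values.
lex = boto3.client("lex-runtime")
response = lex.post_text(
    botName="BankingBot",
    botAlias="Prod",
    userId="test-user-1",
    inputText="What's my balance?",
)

# Top intent and its confidence score (present when the feature is enabled).
print(response.get("intentName"), response.get("nluIntentConfidence", {}).get("score"))

# Up to four alternative intents, each with its own score.
for alt in response.get("alternativeIntents", []):
    print(alt["intentName"], alt.get("nluIntentConfidence", {}).get("score"))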

The following code samples are from the Lambda code you downloaded when you deployed this sample bot. You can adapt it for use with any Amazon Lex bot.

The Lambda function’s dispatcher function forwards requests to handler functions, but for the GetBalanceCreditCard, GetBalanceChecking, and GetBalanceSavings intents, it forwards to determine_intent instead.
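A rough sketch of what that routing could look like is shown below. It leans on the HANDLERS table and determine_intent function defined in the sample code; the structure here is illustrative rather than a copy of the downloaded Lambda function:

# Intents whose ambiguity is resolved with confidence scores and account data.
BALANCE_INTENTS = {"GetBalanceCreditCard", "GetBalanceChecking", "GetBalanceSavings"}

def dispatch(intent_request):
    intent_name = intent_request["currentIntent"]["name"]
    if intent_name in BALANCE_INTENTS:
        # Ambiguous balance requests go through the confidence-score logic.
        return determine_intent(intent_request)
    # Everything else goes straight to its fulfillment handler.
    return HANDLERS[intent_name]["fulfillment"](intent_request)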

The determine_intent function inspects the top intent as reported by Amazon Lex, as well as any alternative intents. If an alternative intent is valid for the user (based on their accounts) and its score is within the AMBIGUITY_RANGE of the top intent’s score, it is added to a list of possible intents.

possible_intents = []
# start with the top intent (if it is valid for the user)
top_intent = intent_request["currentIntent"]
if top_intent["name"] in valid_intents:
    possible_intents.append(top_intent)

# add any alternative intents that are within the AMBIGUITY_RANGE
# if they are valid for the user
if intent_request.get("alternativeIntents", None):
    top_intent_score = top_intent["nluIntentConfidenceScore"]
    for alternative_intent in intent_request["alternativeIntents"]:
        alternative_intent_score = alternative_intent["nluIntentConfidenceScore"]
        if top_intent_score is None:
            top_intent_score = alternative_intent_score                            
        if alternative_intent["name"] in valid_intents:
            if abs(top_intent_score - alternative_intent_score) <= AMBIGUITY_RANGE:
                possible_intents.append(alternative_intent)

If there is only one possible intent for the user, it is fulfilled (after first eliciting any missing slots).

num_intents = len(possible_intents)
if num_intents == 1:
    # elicit any missing slots or fulfill the intent
    slots = possible_intents[0]["slots"]
    for slot_name, slot_value in slots.items():
        if slot_value is None:
            return elicit_slot(intent_request['sessionAttributes'],
                          possible_intents[0]["name"], slots, slot_name)
    # dispatch to the appropriate fulfillment method 
    return HANDLERS[possible_intents[0]['name']]['fulfillment'](intent_request)

If there are multiple possible intents, ask the user for clarification.

elif num_intents > 1:
    counter = 0
    response = ""
    while counter < num_intents:
        if counter == 0:
            response += "Sure. Is this for your " + \
                INTENT_TO_ACCOUNT_MAPPING[possible_intents[counter]["name"]]
        elif counter < num_intents - 1:
            response += ", " + \
                INTENT_TO_ACCOUNT_MAPPING[possible_intents[counter]["name"]]
        else:
            response += ", or " + \
                INTENT_TO_ACCOUNT_MAPPING[possible_intents[counter]["name"]] + "?"
        counter += 1
    return elicit_intent(form_message(response))

If there are no possible intents for the user, the fallback intent is triggered.

else:
    return fallback_handler(intent_request)

To test this, you can change the test user configuration in the code by changing the return value from the check_available_accounts function:

# This could be a DynamoDB table or other data store
USER_LIST = {
    "user_with_1": [AccountType.CHECKING],
    "user_with_2": [AccountType.CHECKING, AccountType.CREDIT_CARD],
    "user_with_3": [AccountType.CHECKING, AccountType.SAVINGS, AccountType.CREDIT_CARD]
}

def check_available_accounts(user_id: str):
    # change user ID to test different scenarios
    return USER_LIST.get("user_with_2")

You can see the Lex confidence scores in the Amazon Lex console, or in your Lambda function’s CloudWatch Logs log file.

Confidence scores can also be used to test different versions of your bot. For example, if you add new intents, utterances, or slot values, you can test the bot and inspect the confidence scores to see if your changes had the desired effect.

Conclusion

Although people aren’t always precise in their wording when they interact with a bot, we still want to provide them with a natural user experience. With natural language understanding improvements and confidence scores now available on Amazon Lex, you have additional information available to design a more intelligent conversation. You can couple the machine learning-based intent matching capabilities of Amazon Lex with your own business logic to zero in on your user’s intent. You can also use the confidence score threshold while testing during bot development, to determine if changes to the sample utterances for intents have the desired effect. These improvements enable you to design more effective conversation flows. For more information about incorporating these techniques into your bots, see Amazon Lex documentation.


About the Authors

Trevor Morse works as a Software Development Engineer at Amazon AI. He focuses on building and expanding the NLU capabilities of Lex. When not at a keyboard, he enjoys playing sports and spending time with family and friends.

Brian Yost is a Senior Consultant with the AWS Professional Services Conversational AI team. In his spare time, he enjoys mountain biking, home brewing, and tinkering with technology.

As a Product Manager on the Amazon Lex team, Harshal Pimpalkhute spends his time trying to get machines to engage (nicely) with humans.

MediaPipe Iris: Real-time Iris Tracking & Depth Estimation

Posted by Andrey Vakunov and Dmitry Lagun, Research Engineers, Google Research

A wide range of real-world applications, including computational photography (e.g., portrait mode and glint reflections) and augmented reality effects (e.g., virtual avatars), rely on estimating eye position by tracking the iris. Once accurate iris tracking is available, we show that it is possible to determine the metric distance from the camera to the user — without the use of a dedicated depth sensor. This, in turn, can improve a variety of use cases, ranging from computational photography to virtual try-on of properly sized glasses and hats, and usability enhancements that adapt the font size to the viewer’s distance.

Iris tracking is a challenging task to solve on mobile devices, due to limited computing resources, variable light conditions and the presence of occlusions, such as hair or people squinting. Often, sophisticated specialized hardware is employed, limiting the range of devices on which the solution could be applied.

FaceMesh can be adopted to drive virtual avatars (middle). By additionally employing iris tracking (right), the avatar’s liveliness is significantly enhanced.
An example of eye re-coloring enabled by MediaPipe Iris.

Today, we announce the release of MediaPipe Iris, a new machine learning model for accurate iris estimation. Building on our work on MediaPipe Face Mesh, this model is able to track landmarks involving the iris, pupil and the eye contours using a single RGB camera, in real-time, without the need for specialized hardware. Through use of iris landmarks, the model is also able to determine the metric distance between the subject and the camera with relative error less than 10%, without the use of a depth sensor. Note that iris tracking does not infer the location at which people are looking, nor does it provide any form of identity recognition. Thanks to the fact that this system is implemented in MediaPipe — an open source cross-platform framework for researchers and developers to build world-class ML solutions and applications — it can run on most modern mobile phones, desktops, laptops and even on the web.

Usability prototype for far-sighted individuals: observed font size remains constant independent of the device distance from the user.

An ML Pipeline for Iris Tracking
The first step in the pipeline leverages our previous work on 3D Face Meshes, which uses high-fidelity facial landmarks to generate a mesh of the approximate face geometry. From this mesh, we isolate the eye region in the original image for use in the iris tracking model. The problem is then divided into two parts: eye contour estimation and iris location. We designed a multi-task model consisting of a unified encoder with a separate component for each task, which allowed us to use task-specific training data.

Examples of iris (blue) and eyelid (red) tracking.

To train the model from the cropped eye region, we manually annotated ~50k images, representing a variety of illumination conditions and head poses from geographically diverse regions, as shown below.

Eye region annotated with eyelid (red) and iris (blue) contours.
Cropped eye regions form the input to the model, which predicts landmarks via separate components.

Depth-from-Iris: Depth Estimation from a Single Image
Our iris-tracking model is able to determine the metric distance of a subject to the camera with less than 10% error, without requiring any specialized hardware. This is done by relying on the fact that the horizontal iris diameter of the human eye remains roughly constant at 11.7±0.5 mm across a wide population [1, 2, 3, 4], along with some simple geometric arguments. For illustration, consider a pinhole camera model projecting onto a sensor of square pixels. The distance to a subject can be estimated from facial landmarks by using the focal length of the camera, which can be obtained using camera capture APIs or directly from the EXIF metadata of a captured image, along with other camera intrinsic parameters. Given the focal length, the distance from the subject to the camera is directly proportional to the physical size of the subject’s eye, as visualized below.
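As a small worked example of that geometry (the focal length and measured iris width below are made-up values, not MediaPipe outputs):

# Similar-triangles depth estimate from iris size, assuming a pinhole camera.
IRIS_DIAMETER_MM = 11.7  # horizontal iris diameter, roughly constant across people

def distance_from_iris_mm(focal_length_px, iris_diameter_px):
    # Distance from the camera to the eye, in millimeters.
    return focal_length_px * IRIS_DIAMETER_MM / iris_diameter_px

# Made-up example: a 1,400-pixel focal length and an iris spanning 36 pixels
# gives a distance of roughly 45.5 cm.
print(distance_from_iris_mm(1400, 36) / 10, "cm")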

The distance of the subject (d) can be computed from the focal length (f) and the size of the iris using similar triangles.
Left: MediaPipe Iris predicting metric distance in cm on a Pixel 2 from iris tracking alone, without the use of a depth sensor. Right: Ground-truth depth.

In order to quantify the accuracy of the method, we compared it to the depth sensor on an iPhone 11 by collecting front-facing, synchronized video and depth images on over 200 participants. We experimentally verified the error of the iPhone 11 depth sensor to be < 2% for distances up to 2 meters, using a laser ranging device. Our evaluation shows that our approach for depth estimation using iris size has a mean relative error of 4.3% and standard deviation of 2.4%. We tested our approach on participants with and without eyeglasses (not accounting for contact lenses on participants) and found that eyeglasses increase the mean relative error slightly to 4.8% (standard deviation 3.1%). We did not test this approach on participants with any eye diseases (like arcus senilis or pannus). Considering MediaPipe Iris requires no specialized hardware, these results suggest it may be possible to obtain metric depth from a single image on devices with a wide range of cost-points.

Histogram of estimation errors (left) and comparison of actual to estimated distance by iris (right).

Release of MediaPipe Iris
We are releasing the iris and depth estimation models as a cross-platform MediaPipe pipeline that can run on desktop, mobile and the web. As described in our recent Google Developer Blog post on MediaPipe on the web, we leverage WebAssembly and XNNPACK to run our Iris ML pipeline locally in the browser, without any data being sent to the cloud.

Using MediaPipe’s WASM stack, you can run the models locally in your browser! Left: Iris tracking. Right: Depth from Iris computed just from a photo with EXIF data. Iris tracking can be tried out here and iris depth measurements here.

Future Directions
We plan to extend our MediaPipe Iris model with even more stable tracking for lower error and deploy it for accessibility use cases. We strongly believe in sharing code that enables reproducible research, rapid experimentation, and development of new ideas in different areas. In our documentation and the accompanying Model Card, we detail the intended uses, limitations and model fairness to ensure that use of these models aligns with Google’s AI Principles. Note that any form of surveillance or identification is explicitly out of scope and not enabled by this technology. We hope that providing this iris perception functionality to the wider research and development community will result in the emergence of creative use cases, stimulating responsible new applications and new research avenues.

For more ML solutions from MediaPipe, please see our solutions page and Google Developer blog for the latest updates.

Acknowledgements
We would like to thank Artsiom Ablavatski, Andrei Tkachenka, Buck Bourdon, Ivan Grishchenko and Gregory Karpiak for support in model evaluation and data collection; Yury Kartynnik, Valentin Bazarevsky and Artsiom Ablavatski for developing the mesh technology; Aliaksandr Shyrokau and the annotation team for their diligence in data preparation; Vidhya Navalpakkam, Tomer Shekel and Kai Kohlhoff for their domain expertise; Fan Zhang, Esha Uboweja, Tyler Mullen, Michael Hays and Chuo-Ling Chang for help integrating the model into MediaPipe; and Matthias Grundmann, Florian Schroff and Ming Guang Yong for their continuous help in building this technology.


AI Goes Uptown: A Tour of Smart Cities Around the Globe 

There are as many ways to define a smart city as there are cities on the road to being smart.

From London and Singapore to Seat Pleasant, Maryland, they vary widely. Most share some common characteristics.

Every city wants to be smart about being a great place to live. So, many embrace broad initiatives for connecting their citizens to the latest 5G and fiber optic networks, expanding digital literacy and services.

Most agree that a big part of being smart means using technology to make their cities more self-aware, automated and efficient.

That’s why a smart city is typically a kind of municipal Internet of Things — a network of cameras and sensors that can see, hear and even smell. These sensors, especially video cameras, generate massive amounts of data that can serve many civic purposes like helping traffic flow smoothly.

Cities around the globe are turning to AI to sift through that data in real time for actionable insights. And, increasingly, smart cities build realistic 3D simulations of themselves, digital twins to test out ideas of what they might look like in the future.

“We define a smart city as a place applying advanced technology to improve the quality of life for people who live in it,” said Sokwoo Rhee, who’s worked on more than 200 smart city projects in 25 countries as an associate director for cyber-physical systems innovation in the U.S. National Institute of Standards and Technology.

U.S., London Issue Smart Cities Guidebooks

At NIST, Rhee oversees work on a guide for building smart cities. Eventually it will include reports on issues and case studies in more than two dozen areas from public safety to water management systems.

Across the pond, London describes its smart city efforts in a 60-page document that details many ambitious goals. Like smart cities from Dubai to San Jose in Silicon Valley, it’s a metro-sized work in progress.

An image from the Smart London guide.

“We are far from the ideal at the moment with a multitude of systems and a multitude of vendors making the smart city still somewhat complex and fragmented,” said Andrew Hudson-Smith, who is chair of digital urban systems at The Centre for Advanced Spatial Analysis at University College London and sits on a board that oversees London’s smart city efforts.

Living Labs for AI

In a way, smart cities are both kitchen sinks and living labs of technology.

They host everything from air-quality monitoring systems to repositories of data cleared for use in shared AI projects. The London Datastore, for example, already contains more than 700 publicly available datasets.

One market researcher tracks a basket of 13 broad areas that define a smart city from smart streetlights to connected garbage cans. A smart-parking vendor in Stockholm took into account 24 factors — including the number of Wi-Fi hotspots and electric-vehicle charging stations — in its 2019 ranking of the world’s 100 smartest cities. (Its top five were all in Scandinavia.)

“It’s hard to pin it down to a limited set of technologies because everything finds its way into smart cities,” said Dominique Bonte, a managing director at market watcher ABI Research. Among popular use cases, he called out demand-response systems as “a huge application for AI because handling fluctuating demand for electricity and other services is a complex problem.”

Sweden’s EasyPark lists 24 factors that define a smart city.

Because it’s broad, it’s also big. Market watchers at Navigant Research expect the global market for smart-city gear to grow from $97.4 billion in annual revenue in 2019 to $265.4 billion by 2028 at a compound annual growth rate of 11.8 percent.

It’s still early days. In a January 2019 survey of nearly 40 U.S. local and state government managers, more than 80 percent thought a municipal Internet of Things will have significant impact for their operations, but most were still in a planning phase and less than 10 percent had active projects.

Most smart cities are still under construction, according to a NIST survey.

“Smart cities mean many things to many people,” said Saurabh Jain, product manager of Metropolis, NVIDIA’s GPU software stack for vertical markets such as smart cities.

“Our focus is on building what we call the AI City with the real jobs that can be done today with deep learning, tapping into the massive video and sensor datasets cities generate,” he said.

For example, Verizon deployed video nodes using the NVIDIA Jetson TX1 on existing streetlights in Boston and Sacramento to analyze and improve traffic flow, enhance pedestrian safety and optimize parking.

“Rollout is happening fast across the globe and cities are expanding their lighting infrastructure to become a smart-city platform … helping to create efficiency savings and a new variety of citizen services,” said David Tucker, head of product management in the Smart Communities Group at Verizon in a 2018 article.

Smart Streetlights for Smart Cities

Streetlights will be an important part of the furniture of tomorrow’s smart city.

So far, only a few hundred are outfitted with various mixes of sensors and Wi-Fi and cellular base stations. The big wave is yet to come as the estimated 360 million posts around the world slowly upgrade to energy-saving LED lights.

A European take on a smart streetlight.

In a related effort, the city of Bellevue, Washington, tested a computer vision system from Microsoft Research to improve traffic safety and reduce congestion. Researchers at the University of Wollongong recently described similar work using NVIDIA Jetson TX2 modules to track the flow of vehicles and pedestrians in Liverpool, Australia.

Airports, retail stores and warehouses are already using smart cameras and AI to run operations more efficiently. They are defining a new class of edge computing networks that smart cities can leverage.

For example, Seattle-Tacoma International Airport (SEA) will roll out an AI system from startup Assaia that uses NVIDIA GPUs to speed the time to turn around flights.

“Video analytics is crucial in providing full visibility over turnaround activities as well as improving safety,” said an SEA manager in a May report.

Nashville, Zurich Explore the Simulated City

Some smart cities are building digital twins, 3D simulations that serve many purposes.

For example, both Zurich and Nashville will someday let citizens and city officials don goggles at virtual town halls to see simulated impacts of proposed developments.

“The more immersive and fun an experience, the more you increase engagement,” said Dominik Tarolli, director of smart cities at Esri, which is supplying simulation software that runs on NVIDIA GPUs for both cities.

Cities as far apart in geography and population as Singapore and Rennes, France, built digital twins using a service from Dassault Systèmes.

“We recently signed a partnership with Hong Kong and presented examples for a walkability study that required a 3D simulation of the city,” said Simon Huffeteau, a vice president working on smart cities for Dassault.

Europe Keeps an AI on Traffic

Many smart cities get started with traffic control. London uses digital signs to post speed limits that change to optimize traffic flow. It also uses license-plate recognition to charge tolls for entering a low-emission zone in the city center.

Cities in Belgium and France are considering similar systems.

“We think in the future cities will ban the most polluting vehicles to encourage people to use public transportation or buy electric vehicles,” said Bonte of ABI Research. “Singapore is testing autonomous shuttles on a 5.7-mile stretch of its streets,” he added.

Nearby, Jakarta uses a traffic-monitoring system from Nodeflux, a member of NVIDIA’s Inception program that nurtures AI startups. The software taps AI and the nearly 8,000 cameras already in place around Jakarta to recognize license plates of vehicles with unpaid taxes.

The system is one of more than 100 third-party applications that run on Metropolis, NVIDIA’s application framework for the Internet of Things.

Unsnarling Traffic in Israel and Kansas City

Traffic was the seminal app for a smart-city effort in Kansas City that started in 2015 with a $15 million smart streetcar. Today, residents can call up digital dashboards detailing current traffic conditions around town.

And in Israel, the city of Ashdod deployed AI software from viisights that helps the city understand patterns in a traffic monitoring system powered by NVIDIA Metropolis and keep citizens safe.

NVIDIA created the AI City Challenge to advance work on deep learning as a tool to unsnarl traffic. Now in its fourth year, it draws nearly 1,000 researchers competing in more than 300 teams that include members from multiple city and state traffic agencies.

The event spawned CityFlow, one of the world’s largest datasets for applying AI to traffic management. It consists of more than three hours of synchronized high-definition videos from 40 cameras at 10 intersections, creating 200,000 annotated bounding boxes around vehicles captured from different angles under various conditions.

Drones to the Rescue in Maryland

You don’t have to be a big city with lots of money to be smart. Seat Pleasant, Maryland, a Washington, D.C., suburb of less than 5,000 people, launched a digital hub for city services in August 2017.

Since then, it has installed intelligent lighting, connected waste cans, home health monitors and video analytics to save money, improve traffic safety and reduce crime. It’s also become the first U.S. city to use drones for public safety, including plans for life-saving delivery of emergency medicines.

The idea got its start when Mayor Eugene Grant, searching for ways to recover from the 2008 economic downturn, attended an event on innovation villages.

“Seat Pleasant would like to be a voice for small cities in America where 80 percent have less than 10,000 residents,” said Grant. “Look at these cities as test beds of innovation … living labs,” he added.

Mayor Grant of Seat Pleasant aims to set an example of how small towns can become smart cities.

Rhee of NIST agrees. “I’m seeing a lot of projects embracing a broadening set of emerging technologies, making smart cities like incubation programs for new businesses like air taxis and autonomous vehicles that can benefit citizens,” he said, noting that even rural communities will get into the act.

Simulating a New Generation of Smart Cities

When the work is done, go to the movies. Hollywood might provide a picture of the next horizon in the same way it sparked some of the current work.

Esri’s tools are used to simulate cities for movies as well as the real world.

Flicks including Blade Runner 2049, Cars, Guardians of the Galaxy and Zootopia used a program called City Engine from startup Procedural that enables a rule-based approach to constructing simulated cities.

Their work caught the eye of Esri, which acquired the company and bundled its program with its ArcGIS Urban planning tool, now a staple for hundreds of real cities worldwide.

“Games and movies make people expect more immersive experiences, and that requires more computing,” said Tarolli, a co-founder of Procedural and now Esri’s point person on smart cities.

The post AI Goes Uptown: A Tour of Smart Cities Around the Globe  appeared first on The Official NVIDIA Blog.


Deep Learning on Tap: NVIDIA Engineer Turns to AI, GPU to Invent New Brew

Some dream of code. Others dream of beer. NVIDIA’s Eric Boucher does both at once, and the result couldn’t be more aptly named.

Full Nerd #1 is a crisp, light-bodied blonde ale perfect for summertime quaffing.

Eric, an engineer in the GPU systems software kernel driver team, went to sleep one night in May wrestling with two problems.

One, he had to wring key information from the often cryptic logs for the systems he oversees to help his team respond to issues faster.

The other: the veteran home brewer wanted a way to brew new kinds of beer.

“I woke up in the morning and I knew just what to do,” Boucher said. “Basically I got both done on one night’s broken sleep.”

Both solutions involved putting deep learning to work on a NVIDIA TITAN V GPU. Such powerful gear tends to encourage this sort of parallel processing, it seems.

Eric, a native of France now based near Sacramento, Calif., began homebrewing two decades ago, inspired by a friend and mentor at Sun Microsystems. He took a break from it when his children were first born.

Now that they’re older, he’s begun brewing again in earnest, using gear in both his garage and backyard, turning to AI for new recipes this spring.

Of course, AI has been used in the past to help humans analyze beer flavors, and even create wild new craft beer names. Eric’s project, however, is more ambitious, because it’s relying on AI to create new beer recipes.

You’ve Got Ale — GPU Speeds New Brew Ideas

For training data, Eric started with the all-grain ale recipes from MoreBeer, a hub for brewing enthusiasts, where he usually shops for recipe kits and ingredients.

Eric focused on ales because they’re relatively easy and quick to brew, and encompass a broad range of different styles, from hearty Irish stout to tangy and refreshing Kölsch.

He used wget — an open source program that retrieves content from the web — to save four index pages of ale recipes.

Then, using a Python script, he filtered the downloaded HTML pages and downloaded the linked recipe PDFs. He then converted the PDFs to plain text and used another Python script to interpret the text and generate recipes in a standardized format.

He fed these 108 recipes — including one for Russian River Brewing’s legendary Pliny the Elder IPA — to textgenrnn, a recurrent neural network, a type of model that learns from a sequence of data to help guess what should come next.

And, because no one likes to wait for good beer, he ran it on an NVIDIA TITAN V GPU. Eric estimates it cuts the time to learn from the recipes database to seven minutes from one hour and 45 minutes using a CPU alone.
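For readers who want to try something similar, here is a minimal sketch of that kind of textgenrnn workflow. It assumes the scraped recipes have been flattened into a plain-text file named recipes.txt (a hypothetical filename), and the training settings are illustrative rather than Eric’s:

from textgenrnn import textgenrnn

# Train a fresh character-level model on the recipe corpus.
# recipes.txt is a hypothetical file with one recipe line per text line.
textgen = textgenrnn()
textgen.train_from_file("recipes.txt", new_model=True, num_epochs=20)

# Sample a handful of candidate recipes; lower temperatures stay closer to
# the training data, higher temperatures get more adventurous.
textgen.generate(5, temperature=0.8)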

After a little tuning, Eric generated 10 beer recipes. They ranged from dark stouts to yellowish ales, and in flavor from bitter to light.

To Eric’s surprise, most looked reasonable (though a few were “plain weird and impossible to brew” like a recipe that instructed him to wait 45 days with hops in the wort, or unfermented beer, before adding the yeast).

Speed of Light (Beer)

With the approaching hot California summer in mind, Eric selected a blonde ale.

He was particularly intrigued because the recipe suggested adding Warrior, Cascade, and Amarillo hops — the flowers of the herbaceous perennial Humulus lupulus that gives good beer a range of flavors, from bitter to citrusy — an “intriguing schedule.”

The result, Eric reports, was refreshing, “not too sweet, not too bitter,” with “a nice, fresh hops smell and a long, complex finish.”

He dubbed the result Full Nerd #1.

The AI-generated brew became the latest in a long line of brews with witty names Eric has produced, including a bourbon oak-infused beer named, appropriately enough, “The Groot Beer,” in honor of the tree-like creature from Marvel’s “Guardians of the Galaxy.”

Eric’s next AI brewing project: perhaps a dark stout, for winter, or a lager, a light, crisp beer that requires months of cold storage to mature.

For now, however, there’s plenty of good brew to drink. Perhaps too much. Eric usually shares his creations with his martial arts buddies. But with social distancing in place amidst the global COVID-19 pandemic, the five gallons, or forty pints, is more than the light drinker knows what to do with.

Eric, it seems, has found a problem deep learning can’t help him with. Bottoms up.

The post Deep Learning on Tap: NVIDIA Engineer Turns to AI, GPU to Invent New Brew appeared first on The Official NVIDIA Blog.
