Recap of TensorFlow at Google I/O 2021

Posted by the TensorFlow team


Thanks to everyone who joined our virtual I/O 2021 livestream! While we couldn’t meet in person, we hope we were able to make the event more accessible than ever. In this article, we’re recapping a few of the updates we shared during the keynote. You can watch the keynote below, and you can find recordings of every talk on the TensorFlow YouTube channel. Here’s a summary of a few announcements by product area (and there’s more in the videos, so be sure to check them out, too).

TensorFlow for Mobile and Web

The TensorFlow Lite runtime will be bundled with Google Play services

Let’s start with the announcement that the TensorFlow Lite runtime is going to be bundled with Google Play services, meaning you don’t need to distribute it with your app. This can greatly reduce your app’s bundle size. Now you can distribute your model without needing to worry about the runtime. You can sign up for an early access program today, and we expect a full rollout later this year.

You can now run TensorFlow Lite models on the web

All your TensorFlow Lite models can now be run directly in the browser with the new TFLite Web APIs that are unified with TensorFlow.js. This task-based API supports running all TFLite Task Library models for image classification, object detection, image segmentation, and many NLP problems. It also supports running arbitrary, custom TFLite models with easy, intuitive TensorFlow.js compatible APIs. With this option, you can unify your mobile and web ML development with a single stack.

A new On-Device Machine Learning site

We understand that the most effective developer path to reach Android, the web, and iOS isn’t always the most obvious. That’s why we created a new On-Device Machine Learning site to help you navigate your options, from turnkey to custom models, and from cross-platform mobile to in-browser. It includes pathways to take you from an idea to a deployed app, with all the steps in between.

Performance profiling

When it comes to performance, we’re also working on additional tooling for Android developers. TensorFlow Lite includes built-in support for Systrace, integrating seamlessly with Perfetto for Android 10.

And performance improvements aren’t limited to Android. For iOS developers, TensorFlow Lite comes with built-in support for signpost-based profiling. When you build your app with the trace option enabled, you can run the Xcode profiler to see the signpost events, letting you dive deeper and see all the way down to individual ops during execution.


TFX

TFX 1.0: Production ML at Enterprise-scale

Moving your ML models from prototype to production requires lots of infrastructure. Google created TFX because we needed a strong framework for our ML products and services, and then we open-sourced it so that others can use it too. It includes support for training models for mobile and web applications, as well as server-based applications.

After a successful beta with many partners, today we’re announcing TFX 1.0 — ready today for production ML at enterprise-scale. TFX includes all of the things an enterprise-ready framework needs, including enterprise-grade support, security patches, bug fixes, and guaranteed backward compatibility for the entire 1.X release cycle. It also includes strong support for running on Google Cloud and support for mobile, web, and NLP applications.

If you’re ready for production ML, TFX is ready for you. Visit the TFX site to learn more.

Responsible AI

We’re also sharing a number of new tools to help you keep Responsible AI top of mind in everything that you do when developing with ML.

Know Your Data

Know Your Data (KYD) is a new tool to help ML researchers and product teams understand rich datasets (images and text) with the goal of improving data and model quality, as well as surfacing and mitigating fairness and bias issues. Try the interactive demo at the link above to learn more.


People + AI Guidebook 2.0

As you create AI solutions, building with a people-centric approach is key to doing it responsibly, and we’re delighted to announce the People + AI Guidebook 2.0. This update is designed to help you put best practices and guidance for people-centric AI into practice, with many new resources including code, design patterns, and much more!

Also check out our Responsible AI Toolkit to help you integrate Responsible AI practices into your ML workflow using TensorFlow.

Decision forests in Keras

New support for random forests and gradient boosted trees

There’s more to ML than neural networks. Starting with TensorFlow 2.5, you can easily train powerful decision forest models (including favorites like random forests and gradient boosted trees) using familiar Keras APIs. There’s support for many state-of-the-art algorithms for training, serving and interpreting models for classification, regression and ranking tasks. And you can serve your decision forests using TF Serving, just like any other model trained with TensorFlow. Check out the tutorials here, and the video from this session.
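
To give a concrete sense of the workflow, here is a minimal sketch using the TensorFlow Decision Forests library that backs this feature. The dataset path and label column are hypothetical placeholders, not part of the announcement.

import pandas as pd
import tensorflow_decision_forests as tfdf

# Load a tabular dataset (the file path and "label" column are hypothetical).
train_df = pd.read_csv("train.csv")

# Convert the DataFrame into a TensorFlow dataset, naming the label column.
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="label")

# Train a gradient boosted trees model with the familiar Keras workflow.
model = tfdf.keras.GradientBoostedTreesModel()
model.fit(train_ds)

# Inspect and export like any other Keras model; the SavedModel can be served with TF Serving.
model.summary()
model.save("my_decision_forest_model")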

TensorFlow Lite for Microcontrollers

A new pre-flashed board, experiments, and a challenge

TensorFlow Lite for Microcontrollers is designed to help you run ML models on microcontrollers and other devices with only a few kilobytes of memory. You can now purchase pre-flashed Arduino boards that connect to your browser via Bluetooth. You can use these to try out new Experiments With Google that let you make gestures and even create your own classifiers and run custom TensorFlow models. If you’re interested in challenges, we’re also running a new TensorFlow Lite for Microcontrollers challenge; you can check it out here. And be sure to check out the TinyML workshop video in the next steps below.


Google Cloud

Vertex AI: A new managed ML platform on Google Cloud

An ML model is only valuable if you can actually put it into production. And as you know, it can be challenging to productionize efficiently and at scale. That’s why Google Cloud is releasing Vertex AI, a new managed machine learning platform to help you accelerate experimentation and deployment of AI models. Vertex AI has tools that span every stage of the developer workflow, from data labeling, to working with notebooks and models, to prediction tools and continuous monitoring – all unified into one UI. While many of these offerings may be familiar to you, what really distinguishes Vertex AI is the introduction of new MLOps features. You can now manage your models with confidence using our MLOps tools such as Vertex Pipelines and Vertex Feature Store, to remove the complexity of robust self-service model maintenance and repeatability.

TensorFlow Cloud: Transition from local model building to distributed training on the Cloud

TensorFlow Cloud provides APIs that ease the transition from local model building and debugging to distributed training and hyperparameter tuning on Google Cloud. From inside a Colab or Kaggle notebook or a local script file, you can send your model for tuning or training on Cloud directly, without needing to use the Cloud Console. We recently added a new site and new features; check it out if you’re interested in learning more.
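
As a rough illustration of what that transition looks like, here is a minimal sketch using the tensorflow_cloud package. The script name and requirements file are hypothetical, and the call shown relies on the package defaults rather than a tuned configuration.

import tensorflow_cloud as tfc

# Submit a local training script to run on Google Cloud.
# "train.py" and "requirements.txt" are hypothetical files in your project.
tfc.run(
    entry_point="train.py",
    requirements_txt="requirements.txt",
    stream_logs=True,  # stream training logs back to the notebook or terminal
)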

Community

A new TensorFlow Forum

We created a new TensorFlow Forum for you to ask questions and connect with the community. It’s a place for developers, contributors, and users to engage with each other and the TensorFlow team. Create your account and join the conversation at discuss.tensorflow.org.


Find all the talks here

This is just a small part of what was shared at Google I/O 2021. You can find all of the TensorFlow sessions in this playlist, with direct links to each of the sessions.

To learn more about TensorFlow, you can check out tensorflow.org, read other articles on the blog, follow us on social media, subscribe to our YouTube channel, or join a TensorFlow User Group near you.

Read More

What Is Explainable AI?

Banks use AI to determine whether to extend credit, and how much, to customers. Radiology departments deploy AI to help distinguish between healthy tissue and tumors. And HR teams employ it to work out which of hundreds of resumes should be sent on to recruiters.

These are just a few examples of how AI is being adopted across industries. And with so much at stake, businesses and governments adopting AI and machine learning are increasingly being pressed to lift the veil on how their AI models make decisions.

Charles Elkan, a managing director at Goldman Sachs, offers a sharp analogy for much of the current state of AI, in which organizations debate its trustworthiness and how to overcome objections to AI systems:

We don’t understand exactly how a bomb-sniffing dog does its job, but we place a lot of trust in the decisions they make.

To reach a better understanding of how AI models come to their decisions, organizations are turning to explainable AI.

What Is Explainable AI?

Explainable AI, or XAI, is a set of tools and techniques used by organizations to help people better understand why a model makes certain decisions and how it works. XAI is: 

  • A set of best practices: It takes advantage of some of the best procedures and rules that data scientists have been using for years to help others understand how a model is trained. Knowing how, and on what data, a model was trained helps us understand when it does and doesn’t make sense to use that model. It also shines a light on what sources of bias the model might have been exposed to.
  • A set of design principles: Researchers are increasingly focused on simplifying the building of AI systems to make them inherently easier to understand.
  • A set of tools: As the systems get easier to understand, the training models can be further refined by incorporating those learnings into them, and by offering those learnings to others to incorporate into their own models.

How Does Explainable AI Work?

While there’s still a great deal of debate over the standardization of XAI processes, a few key points resonate across industries implementing it:

  • Who do we have to explain the model to?
  • How accurate or precise an explanation do we need?
  • Do we need to explain the overall model or a particular decision?
Source: DARPA

Data scientists are focusing on all these questions, but explainability boils down to: What are we trying to explain?

Explaining the pedigree of the model:

  • How was the model trained?
  • What data was used?
  • How was the impact of any bias in the training data measured and mitigated?

These questions are the data science equivalent of explaining what school your surgeon went to — along with who their teachers were, what they studied and what grades they got. Getting this right is more about process and leaving a paper trail than it is about pure AI, but it’s critical to establishing trust in a model.

While explaining a model’s pedigree sounds fairly easy, it’s hard in practice, as many tools currently don’t support strong information-gathering. NVIDIA provides such information about its pretrained models. These are shared on the NGC catalog, a hub of GPU-optimized AI and high performance computing SDKs and models that quickly help businesses build their applications.

Explaining the overall model:

Sometimes called model interpretability, this is an active area of research. Most model explanations fall into one of two camps:

In a technique sometimes called “proxy modeling,” simpler, more easily comprehended models like decision trees can be used to approximately describe the more detailed AI model. These explanations give a “sense” of the model overall, but the tradeoff between approximation and simplicity of the proxy model is still more art than science.

Proxy modeling is always an approximation and, even if applied well, it can create opportunities for real-life decisions to be very different from what’s expected from the proxy models.
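
As a concrete illustration of proxy modeling, the sketch below fits a shallow decision tree to the predictions of a more complex model, so the tree approximates the model rather than the data. The dataset and model choices are hypothetical stand-ins, not a specific production setup.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical tabular data standing in for any real problem.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)

# The "detailed" model whose behavior we want to explain.
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Proxy modeling: fit a shallow, human-readable tree to the black box's
# predictions (not the original labels), so it approximates the model itself.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: how often the proxy agrees with the black box. This is the
# approximation-versus-simplicity tradeoff described above.
fidelity = surrogate.score(X, black_box.predict(X))
print(f"Surrogate fidelity: {fidelity:.2%}")
print(export_text(surrogate))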

The second approach is “design for interpretability.” This limits the design and training options of the AI network in ways that attempt to assemble the overall network out of smaller parts that we force to have simpler behavior. This can lead to models that are still powerful, but with behavior that’s much easier to explain.

This isn’t as easy as it sounds, however, and it sacrifices some level of efficiency and accuracy by removing components and structures from the data scientist’s toolbox. This approach may also require significantly more computational power.

Why XAI Explains Individual Decisions Best

The best understood area of XAI is individual decision-making: why a person didn’t get approved for a loan, for instance.

Techniques with names like LIME and SHAP offer very literal mathematical answers to this question — and the results of that math can be presented to data scientists, managers, regulators and consumers. For some data — images, audio and text — similar results can be visualized through the use of “attention” in the models — forcing the model itself to show its work.

In the case of the Shapley values used in SHAP, there are some mathematical proofs of the underlying techniques that are particularly attractive based on game theory work done in the 1950s. There is active research in using these explanations of individual decisions to explain the model as a whole, mostly focusing on clustering and forcing various smoothness constraints on the underlying math.
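
For readers who want to see what these individual-decision explanations look like in practice, here is a minimal sketch using the open-source SHAP library. The dataset and model are hypothetical examples, not tied to any system described in this post.

import shap
import xgboost
from sklearn.datasets import fetch_california_housing

# Hypothetical setup: any trained tree-based model works similarly.
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = xgboost.XGBRegressor(n_estimators=100).fit(X, y)

# Shapley-value-based attributions for individual predictions.
explainer = shap.Explainer(model)
shap_values = explainer(X.iloc[:100])

# Explain a single decision: which features pushed this prediction up or down.
shap.plots.waterfall(shap_values[0])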

The drawback to these techniques is that they’re somewhat computationally expensive. In addition, without significant effort during the training of the model, the results can be very sensitive to the input data values. Some also argue that because data scientists can only calculate approximate Shapley values, the attractive and provable features of these numbers are also only approximate — sharply reducing their value.

While healthy debate remains, it’s clear that by maintaining a proper model pedigree, adopting a model explainability method that provides clarity to senior leadership on the risks involved in the model, and monitoring actual outcomes with individual explanations, AI models can be built with clearly understood behaviors.

For a closer look at examples of XAI work, check out the talks presented by Wells Fargo and ScotiaBank at NVIDIA GTC21.

The post What Is Explainable AI? appeared first on The Official NVIDIA Blog.

Read More

Best practices in customer service automation

Chatbots, virtual assistants, and Interactive Voice Response (IVR) systems are key components of successful customer service strategies.

We had the pleasure of hearing from three AWS Contact Center Intelligence (AWS CCI) Partners as part of our Best Practices in Customer Service Automation webinar, who provided valuable insights and tips for building automated, customer-service solutions.

The panel included Brad Beumer of UIPath, Rebecca Owens of Genesys, and Pat Higbie of XAPP AI.

Why build a chatbot or IVR?

Customers expect great customer service. At the same time, enterprises struggle with the costs and resources necessary to provide high-quality, highly available, live-agent solutions. Automated solutions, like chatbots and IVR, enable enterprises to provide quality support, 24/7, while reducing costs and increasing customer satisfaction.

Although reducing costs is important, a big reason enterprises are implementing automated solutions is to provide a better overall user-experience. As Brad Beumer of UIPath points out, it is what customers are asking for. Customers want a 24/7/365 experience—especially for common tasks they can handle on their own without an agent.

Self-serve, automated solutions help take the pressure off live agents. As Rebecca Owens of Genesys mentions, self-service can help handle the upfront tasks, leaving the more complex tasks to the live agents, who are the contact centers’ most valuable assets.

The impact of COVID-19

COVID-19 has had a significant impact on the interest in chatbots. Shelter-in-place rules affected both the consumers’ ability to go into locations, and the live agents’ ability to work in the same contact center. The need for automated solutions skyrocketed. Genesys saw a large increase in call volumes—in some cases, nearly triple the volume.

Chatbots are not only helping consumers during COVID-19, but work-from-home agents as well. As Beumer mentions, automated solutions help offload more of the agents’ tasks and help them with compliance, security, and even VPN issues related to working from home.

COVID-19 resulted in more stress on existing chatbots too. As Pat Higbie of XAPP AI shares, existing chatbots were not set up to handle the additional use cases people wanted them to handle. These are opportunities to take advantage of AI, through tools like Amazon Lex or Amazon Kendra, for chatbots and natural language search, to enable users to get what they need and improve the customer experience.

Five best practices

Building automated solutions is an iterative process. Our panelists provided insights and best practices when facing common issues.

Getting started

Building conversational interfaces can be challenging because it is hard to know all the things a user may request, or even how they pose the request.

Our panelists see three basic use cases:

  • Task completion – Collecting user information to make an update, like an address change
  • Information requests – Providing information like delivery status or a bank balance
  • Efficient routing – Collecting information to route the user to the most appropriate agent

Our panelists recommend getting started with simpler use cases that have a high impact. As Beumer recommends, start with high-volume, low-complexity tasks like password resets or lost credit cards. Owens adds that starting with high-level Natural Language Understanding (NLU) menus to understand user intent and routing them to the right agent is a simple investment with a significant ROI. Afterwards, move to simple task automation and information requests, and then move into the more advanced use cases that were not possible before conversational AI. As Higbie puts it, start with a quick win, like informational chatbots, especially if you have not done this before. The level of complexity can go up quite dramatically, especially with transactional use cases.

As complexity increases, there are opportunities for more advanced use cases, like transactional or even proactive use cases. Owens mentioned an example of using AI to monitor activity on a website and proactively offering a chatbot when needed. For example, if you can predict the likelihood of an ecommerce user having an issue at checkout, a chatbot can proactively offer to help the user, to lead them through completion so the user does not abandon their cart.

Handling fallbacks gracefully

Fallbacks occur when the automated solution cannot understand the user or cannot handle the request. It is important to handle fallbacks gracefully.

In the past with contact centers, users were often routed to an agent when a fallback occurred. Now with AI, you can better understand the user’s intent and context, and either send them to another AI solution, or more efficiently transfer them to an agent, sending the full context so the user does not have to repeat themselves.

Fallbacks are an opportunity to educate users on what they can say and do—to help get users back on the “happy path.” For example, if the user asks for something the chatbot cannot do, have it respond with a list of what it can do. Predefined buttons, referred to as quick replies, can also help let a user know what the chatbot can do.
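
As a rough sketch of what a graceful fallback can look like in code, here is a minimal AWS Lambda handler for an Amazon Lex V2 bot's built-in fallback intent. The capability list and wording are hypothetical; only the response shape follows the Lex V2 Lambda format.

CAPABILITIES = (
    "I can help you reset a password, check an order status, "
    "or connect you with an agent."
)

def lambda_handler(event, context):
    # Name of the intent that triggered this handler (the fallback intent).
    intent_name = event["sessionState"]["intent"]["name"]

    # Close the conversation turn with a message that steers the user back to
    # supported tasks instead of a generic "I didn't understand."
    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": {"name": intent_name, "state": "Fulfilled"},
        },
        "messages": [
            {"contentType": "PlainText", "content": CAPABILITIES}
        ],
    }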

Supporting multimodal channels

Our panelists see enterprises building automated solutions across multiple channels, including multi-modal text and voice options on the web, IVR, social media, and email. Enterprises are building solutions where their customers are interacting. There are additional factors to consider when supporting multiple channels.

People ask questions differently across channels. As Higbie points out, users communicating via text tend to do so in “keyword style” with incomplete sentences, whereas in voice, they tend to ask the full question.

The way the chatbot responds across channels can be different as well. In text, the chatbot can provide a menu of options for the user to click. With voice, if there are more than three options, it can be difficult for the user to remember.

Regardless of the channel, it is important to understand the user’s intent. As Beumer mentions, if the intent can be understood, the right automation can be triggered.

It can be helpful to have a common interaction model for understanding across channels, but it is important to optimize the actual responses for each particular channel. As Higbie indicates, model management, dialog management, and content management are all needed to handle the complexities in conversational AI.

Keeping context in mind

Context is important—what is known about the user, where they are, or what they are doing can help improve the user experience.

Chatbots and IVRs can connect to backend CRMs to have additional information to personalize and tailor the experience. They can also pass along information gathered from a user to a live agent for more efficient handling so the user does not have to repeat themselves.

In the case of voice, knowing whether the user has been in recent contact can be helpful. While introductory prompts can be great for educating people, if the user makes contact again, it is better to use a tapered approach that reduces some of the default messaging in order to give a quicker opening response.

The context can also be used with proactive solutions that monitor user activity and prompt if help is needed.

Measuring success

Our panelists use a variety of metrics to measure success, such as call deflection rates, self-service containment rates, first response time, and customer satisfaction. The metrics can also be used to calculate operational cost savings by knowing the cost of live agents and the deflection rates.
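
As a back-of-the-envelope illustration of that cost-savings arithmetic, here is a small sketch. All figures are hypothetical placeholders, not numbers cited by the panel.

# All figures are hypothetical placeholders used only to show the arithmetic.
monthly_contacts = 50_000
deflection_rate = 0.30          # share of contacts fully handled by automation
cost_per_agent_contact = 5.00   # fully loaded cost of a live-agent interaction, in dollars
cost_per_bot_contact = 0.25     # infrastructure cost of an automated interaction, in dollars

deflected = monthly_contacts * deflection_rate
savings = deflected * (cost_per_agent_contact - cost_per_bot_contact)
print(f"Estimated monthly savings: ${savings:,.2f}")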

Customer satisfaction is very important—one of the goals of automated solutions is to provide a better user experience. One way UIPath does this is to look at Net Promoter Scores (NPS) before and after an automated solution is launched. Surveys can be used as well, via outbound calls after an interaction to gather customer feedback. With chatbots, you can immediately ask the user whether the response was helpful and take further action depending on the response.

Automated solutions like chatbots and IVRs need continuous optimization. It is difficult to anticipate all the things a user may ask, or how they may ask them. Monitoring the interactions to understand what users are asking for, how the automated solution is responding, and where it needs improvement is important. It is an iterative process.

What the future looks like

Our panelists shared their thoughts on the future of automated solutions.

Owens sees an increase in usage of automated solutions across all channels as chatbot technologies gain momentum and AI is able to handle even more tasks and complexity. Although customer service is heavily voice today, she is seeing a push to digital, and expects the trend to continue. One area of growth is in the expansion of language support in AI beyond English to support worldwide coverage.

Beumer envisions expansion of automated solutions across all channels, for a more consistent user experience. While automation will increase, it is important to keep making sure that when a chatbot hands off to a live agent, the handoff is seamless.

Higbie sees a lot of exciting opportunity for automated solutions, and believes we are only in the “first inning” of AI automation. Customers will ask for even more than what chatbots currently do, and they will get the responses instantly. Solutions will move more to the proactive side as well. He sees this as a bigger paradigm shift than either web or mobile. It is important to commit now and not be displaced. As he summarizes, enterprises need to get started, get a quick win, and then expand the sophistication of their AI initiatives.

As the underlying technologies continue to evolve, the opportunities for automated chatbots continue to grow. It is exciting to learn from our panelists and see where automated solutions are going in the future.

About AWS Contact Center Intelligence

AWS CCI solutions can quickly and easily add AI and ML to your existing contact center to improve customer satisfaction and reduce costs. AWS CCI covers three key areas of the contact center workflow: self-service automation, real-time analytics with agent assist, and post-call analytics. Each solution is created using a specific combination of AWS AI services, and is available through select AWS Partners. Join the next CCI Webinar, “Banking on Bots”, on May 25, 2021.


About the Author

Arte Merritt leads partnerships for Contact Center Intelligence and Conversational AI. He is a frequent author and speaker in the conversational AI space. He was the co-founder and CEO of the leading analytics platform for conversational interfaces, leading the company to 20,000 customers, 90B messages, and multiple acquisition offers. Previously he founded Motally, a mobile analytics platform he sold to Nokia. Arte has more than 20 years of experience in big data analytics. Arte is an MIT alum.

Read More

Implement live customer service chat with two-way translation, using Amazon Connect and Amazon Translate

Many businesses support customers across multiple countries and ethnic communities, and therefore need to provide customer service in a wide variety of local languages. It’s hard to consistently staff contact centers with agents with different language proficiencies. During periods of high call volumes, callers often must wait on hold for an agent who can speak their language.

What if these businesses could implement a system to act as a real-time translator, allowing customers and agents to easily communicate in different languages? With such a system, a customer could message a support agent in their native language, such as French, and the support agent could use their own native language, maybe Italian, to read and respond to the customer’s messages. Deliveroo, an online food delivery company based in England, has implemented a system that does exactly that!

Deliveroo provides food delivery in over 200 locations across Europe, the Middle East, and Asia, serving customers in dozens of languages. Previously, during periods of high demand (such as during televised sporting events, or bad weather) they would ask customers to wait for a native speaker to become available or ask their agents to copy/paste the chats into an online translation service. These approaches were far from ideal, so Deliveroo is now deploying a much better solution that uses Amazon Connect and Amazon Translate to implement scalable live agent chat with built-in automatic two-way translation.

In this post, we share an open-source version of this solution from one of Amazon’s partners, VoiceFoundry. We show you how to install and try the solution, and then how you can customize it to control translations of specific phrases. Finally, we share success stories from our customer, Deliveroo, and leave you with pointers for implementing a similar solution for your own business.

Set up an Amazon Connect test instance and live chat translation

Follow these tutorials to set up an Amazon Connect test instance and experiment with the chat contact feature.

If you have an Amazon Connect test instance and you already know how to use chat contacts, you can skip this step.

Now that you have Amazon Connect chat working, it’s time to install the sample live chat translation solution. My co-author, Dan from VoiceFoundry, has made it easy. Follow the instructions in the project GitHub repository Install Translate CCP Demo for Amazon Connect.

Test the solution

To test the solution, you simulate two roles—the agent and the customer.

  1. As the agent, sign in to your Amazon Connect instance dashboard.
  2. In a separate browser window, open the new web application using the URL created when you installed the solution.

The Amazon Connect Control Panel is displayed on the left, and the new chat translation panel is on the right.

  3. On the Control Panel title bar, change your status from Offline to Available.
  4. Acting as the customer, launch the test chat page from the Amazon Connect dashboard, or use the URL https://<yourConnectInstance>/connect/test-chat.

In a real-world application, you use a customer chat client widget on a website or mobile application. However, for this post, it’s convenient to use the test chat client.

  5. Open the customer test chat widget to initiate contact with the agent.

You hear a ring tone and see a visual indicator on the agent’s control panel as the agent is asked to accept your contact.

  6. As the agent, accept the incoming request to establish contact.

  7. As the customer, enter a message in Spanish into the customer test chat widget. For example, “Hola, necesito ayuda con mi pedido.”

Let’s assume that the agent can’t understand the incoming message in Spanish. Don’t worry—we can use our sample solution. The new web app chat translation panel displays the translation in English, along with the customer’s original message. Now you can understand the phrase “Hi, I need help with my order.”

  8. As the agent, enter a reply in English in the chat translation panel text box, for example “Hi, My name is Bob and I will be happy to help. What is your name and phone number?”

Your reply is automatically translated back to Spanish.

  9. As the customer, verify that you received a reply from the agent in Spanish.

Continue the conversation and observe how the customer can chat entirely in Spanish, and the agent entirely in English. Take a moment to consider how useful this can be.

When you’re done, as the agent, choose End chat and Close contact to end the chat session. As the customer, choose End chat.

Did you notice that the chat translation panel automatically identified the language the customer used—in this case Spanish? You can use any of the languages supported by Amazon Translate. Try the experiment again, this time using a different language for the customer. Have some fun with it—engage friends who are fluent in other languages and communicate with them in their native tongue.
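
Under the hood, automatic language detection and translation can be done with a single Amazon Translate API call. The snippet below is a standalone sketch of that call, not code taken from the sample application.

import boto3

translate = boto3.client("translate")

def translate_message(text, target_language="en"):
    # SourceLanguageCode="auto" lets Amazon Translate detect the customer's
    # language, which is how the panel knows an incoming chat is in Spanish.
    response = translate.translate_text(
        Text=text,
        SourceLanguageCode="auto",
        TargetLanguageCode=target_language,
    )
    return response["TranslatedText"], response["SourceLanguageCode"]

translated, detected = translate_message("Hola, necesito ayuda con mi pedido.")
print(detected, translated)  # e.g. "es", "Hi, I need help with my order."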

In the sample application, we have assumed that the agent always uses English. A production version of the application would allow the agent to choose their preferred language.

Multi-chat support

Amazon Connect supports up to five concurrent chat sessions per agent. Our sample application allows a single agent to support multiple customer chats in different languages concurrently.

In the following screenshot, agent Bob is now chatting with a new customer, this time in German!

Customize terminology

Let’s say you have a product called Moonlight and Roses. While discussing this product with your Spanish-speaking customer, you enter something like “I see that you ordered Moonlight and Roses on May 13, is that correct?”

Your customer sees the translation “Veo que ordenaste Luz de Luna y Rosas el 13 de mayo, ¿es correcto?”

This is a good literal translation—Luz de Luna y Rosas does mean Moonlight and Roses. But in this case, you want your English product name, Moonlight and Roses, to be translated to the Spanish product name, Moonlight y Roses.

This is where we can use the powerful custom terminology feature in Amazon Translate. Let’s try it. For instructions on updating your custom terminologies, see the GitHub repo.
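
For reference, here is a rough sketch of what the custom terminology step looks like with the Amazon Translate API directly; the GitHub repo remains the authoritative guide for the sample app. The terminology name and CSV contents are hypothetical.

import boto3

translate = boto3.client("translate")

# A tiny terminology file: the first column is the source language, the second the target.
csv_terms = b"en,es\nMoonlight and Roses,Moonlight y Roses\n"

translate.import_terminology(
    Name="product-names",
    MergeStrategy="OVERWRITE",
    TerminologyData={"File": csv_terms, "Format": "CSV"},
)

# Apply the terminology when translating, so the product name is preserved.
response = translate.translate_text(
    Text="I see that you ordered Moonlight and Roses on May 13, is that correct?",
    SourceLanguageCode="en",
    TargetLanguageCode="es",
    TerminologyNames=["product-names"],
)
print(response["TranslatedText"])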

Now we can validate the solution with another simulated chat between an agent and customer, as in the following screenshot.

Deliveroo use case

Amazon Translate helps Deliveroo’s customers, riders (delivery personnel), and food establishment owners talk to each other across language barriers to deliver hot and tasty food of your choice from your local neighborhood eateries quickly.

This helped support the food delivery industry especially during the COVID-19 pandemic, when going out to restaurants became a hazardous endeavor.

Amy Norris, Product Manager for Deliveroo Customer Care says, “Amazon Translate is fast, accurate, and customizable to ensure that food item names, restaurant names, addresses, and customer names are translated correctly to create trustful conversational connections in uncertain times. By using Amazon Translate, our customer service agents were able to increase their first call resolution to 83% and reduce the average call handling time for their customers by 20%.”

Clean up

When you have finished experimenting with this solution, you can clean up your resources by removing the sample live chat translation application and deleting your test Amazon Connect instance.

Conclusion

The combination of Amazon Connect and Amazon Translate enables a scalable, cost-effective solution for your customer support agents to communicate in real time with customers in their preferred languages. The sample application is provided as open source—you can use it as a starting point for your own solution. AWS Professional Services, VoiceFoundry, and other Amazon partners are here to help as well.

We’d love to hear from you. Let us know what you think in the comments section, or using the issues forum in the sample solution GitHub repository.


About the Authors

Bob Strahan is a Principal Solutions Architect in the AWS Language AI Services team.


Daniel Bloy is a practice leader for VoiceFoundry, an Amazon Connect specialty partner.

Read More

How Diversity Drives Innovation: Catch Up on Inclusion in AI with NVIDIA On-Demand

NVIDIA’s GPU Technology Conference is a hotbed for sharing groundbreaking innovations — making it the perfect forum for developers, students and professionals from underrepresented communities to discuss the challenges and opportunities surrounding AI.

Last month’s GTC brought together virtually tens of thousands of attendees from around the world, with more than 20,000 developers from emerging markets, hundreds of women speakers and a variety of session topics focused on diversity and inclusion in AI.

It saw a 6x increase in female attendees from last fall’s event, a 6x jump in Black attendees and a 5x boost in Hispanic and Latino attendees. Dozens signed up for hands-on training from the NVIDIA Deep Learning Institute and joined networking sessions hosted by NVIDIA community resource groups in collaboration with organizations like Black in AI and LatinX in AI.

More than 1,500 sessions from GTC 2021 are now available for free replay on NVIDIA On-Demand — including panel discussions on AI literacy and efforts to grow the participation of underrepresented groups in science and engineering.

Advocating for AI Literacy Among Youth

In a session called “Are You Smarter Than a Fifth Grader Who Knows AI?,” STEM advocates Justin Shaifer and Maynard Okereke (known as Mr. Fascinate and the Hip Hop M.D., respectively) led a conversation about initiatives to help young people understand AI.

Given the ubiquity of AI technologies, being surrounded by it “is essentially just how they live,” said Jim Gibbs, CEO of the Pittsburgh-based startup Meter Feeder. “They just don’t know any different.”

But school curriculums often don’t teach young people how AI technologies work, how they’re developed or about AI ethics. So it’s important to help the next generation of developers prepare “to take advantage of all the new opportunities that there are going to be for people who are familiar with machine learning and artificial intelligence,” he said.

Panelist Lisa Abel-Palmieri, CEO of the Boys & Girls Clubs of Western Pennsylvania, described how her organization’s STEM instructors audited graduate-level AI classes at Carnegie Mellon University to inform a K-12 curriculum for children from historically marginalized communities. NVIDIA recently announced a three-year AI education partnership with the organization to create an AI Pathways Toolkit that Boys & Girls Clubs nationwide can deliver to students, particularly those from underserved and underrepresented communities.

And Babak Mostaghimi, assistant superintendent of Georgia’s Gwinnett County Public Schools, shared how his team helps students realize how AI is relevant to their daily experiences.

“We started really getting kids to understand that AI is already part of your everyday life,” he said. “And when kids realize that, it’s like, wait a minute, let me start asking questions like: Why does the algorithm behind something cause a certain video to pop up and not others?”

Watch the full session replay on NVIDIA On-Demand.

Diverse Participation Brings Unique Perspectives

Another panel, “Diversity Driving AI Innovation,” was led by Brennon Marcano, CEO of the National GEM Consortium, a nonprofit focused on diversifying representation in science and engineering.

Researchers and scientists from Apple, Amazon Web Services and the University of Utah shared their experiences working in AI, and the value that the perspectives of underrepresented groups can provide in the field.

“Your system on the outside is only as good as the data going in on the side,” said Marcano. “So if the data is homogeneous and not diverse, then the output suffers from that.”

But diversity of datasets isn’t the only problem, said Nashlie Sephus, a tech evangelist at Amazon Web Services AI who focuses on fairness and identifying biases. Another essential consideration is making sure developer teams are diverse.

“Just by having someone on the team with a diverse experience, a diverse perspective and background — it goes a long way. Teams and companies are now starting to realize that,” she said.

The panel described how developers can mitigate algorithmic bias, improve diversity on their teams and find strategies to fairly compensate focus groups who provide feedback on products.

“Whenever you are trying to create something in software that will face the world, the only way you can be precisely coupled to that world is to invite the world into that process,” said Rogelio Cardona-Rivera, assistant professor at the University of Utah. “There’s no way you will be able to be as precise if you leave diversity off the table.”

Watch the discussion here.

Learn more about diversity and inclusion at GTC, and watch additional session replays on NVIDIA On-Demand. Find the GTC keynote address by NVIDIA CEO Jensen Huang here.

The post How Diversity Drives Innovation: Catch Up on Inclusion in AI with NVIDIA On-Demand appeared first on The Official NVIDIA Blog.

Read More

Reduce ML inference costs on Amazon SageMaker with hardware and software acceleration

Amazon SageMaker is a fully-managed service that enables data scientists and developers to build, train, and deploy machine learning (ML) models at 50% lower TCO than self-managed deployments on Elastic Compute Cloud (Amazon EC2). Elastic Inference is a capability of SageMaker that delivers 20% better performance for model inference than AWS Deep Learning Containers on EC2 by accelerating inference through model compilation, model server tuning, and underlying hardware and software acceleration technologies.

Inference is the process of making predictions using a trained ML model. For production ML applications, inference accounts for up to 90% of total compute costs. Hence, when deploying an ML model for inference, accelerating inference performance on low-cost instance types is an effective way to reduce overall compute costs while meeting performance requirements such as latency and throughput. For example, running ML models on GPU-based instances provides good inference performance; however, selecting the right instance size and optimizing GPU utilization is challenging because different ML models require different amounts of compute and memory resources.

Elastic Inference Accelerators (EIA) solve this problem by enabling you to attach the right amount of GPU-powered inference acceleration to any Amazon SageMaker ML instance. You can choose any CPU instance type that best suits your application’s overall compute and memory needs, and separately attach the right amount of GPU-powered inference acceleration needed to satisfy your performance requirements. This allows you to reduce inference costs by using compute resources more efficiently. Along with hardware acceleration, Elastic Inference offers software acceleration through SageMaker Neo, a capability of SageMaker that automatically compiles ML models for any ML framework and to any target hardware. With SageMaker Neo, you don’t need to set up third-party or framework-specific compiler software or tune the model manually for optimizing inference performance. With Elastic Inference, you can combine software and hardware acceleration to get the best inference performance on SageMaker.

This post demonstrates how you can use hardware and software-based inference acceleration to reduce costs and improve latency for pre-trained TensorFlow models on Amazon SageMaker. We show you how to compile a pre-trained TensorFlow ResNet-50 model using SageMaker Neo and how to deploy this model to a SageMaker Endpoint with Elastic Inference.

Setup

First, we need to ensure we have SageMaker Python SDK >= 2.32.1 and import the necessary Python packages. If you are using SageMaker Notebook Instances, select conda_tensorflow_p36 as your kernel. Note that you may have to restart your kernel after upgrading packages.

import numpy as np
import time
import json
import requests
import boto3
import os
import sagemaker

Next, we’ll get the IAM execution role and a few other SageMaker-specific variables from our notebook environment so that SageMaker can access resources in your AWS account. See the documentation for more information on how to set this up.

from sagemaker import get_execution_role
from sagemaker.session import Session

role = get_execution_role()
sess = Session()
region = sess.boto_region_name
bucket = sess.default_bucket()

Get pre-trained model for compilation

SageMaker Neo supports compiling TensorFlow/Keras, PyTorch, ONNX, and XGBoost models. However, only Neo-compiled TensorFlow models are supported on EIA as of this writing. TensorFlow models should be in SavedModel format or frozen graph format. Learn more here.

Import ResNet50 model from Keras

We will import ResNet50 model from Keras applications and create a model artifact model.tar.gz.

import tensorflow as tf
import tarfile

tf.keras.backend.set_image_data_format('channels_last')
pretrained_model = tf.keras.applications.resnet.ResNet50()
saved_model_dir = '1'
tf.saved_model.save(pretrained_model, saved_model_dir)

with tarfile.open('model.tar.gz', 'w:gz') as tar:
    tar.add(saved_model_dir)

Upload model artifact to S3

SageMaker Neo expects a path to the model artifact in Amazon S3, so we will upload the model artifact to an S3 bucket.

from sagemaker.utils import name_from_base

prefix = name_from_base('ResNet50')
input_model_path = sess.upload_data(path='model.tar.gz', bucket=bucket, key_prefix=prefix)
print('S3 path for input model: {}'.format(input_model_path))

Compile model for EI Accelerator using SageMaker Neo

Now the model is ready to be compiled by SageMaker Neo. Note that ml_eia2 needs to be set for target_instance_family field in order for the model to be optimized for EI accelerator deployment. If you want to compile your own model for EI accelerator, refer to Neo compilation API. In order to compile the model, you also need to provide the model input_shape and any optional compiler_options to your model. Note that 32-bit floating-point types (FP32) are the default precision mode for ML models. We include this here to be explicit versus compiling with lower precision models. Learn more about advantages of different precision types here.

from sagemaker.tensorflow import TensorFlowModel

# Create a TensorFlow SageMaker model
tensorflow_model = TensorFlowModel(model_data=input_model_path,
                                   role=role,
                                   framework_version='2.3')

# Compile the model for EI accelerator in SageMaker Neo
output_path = '/'.join(input_model_path.split('/')[:-1])
compilation_job_name = prefix + "-fp32"
compiled_model_fp32 = tensorflow_model.compile(target_instance_family='ml_eia2',
                                               input_shape={"input_1": [1, 224, 224, 3]},
                                               output_path=output_path,
                                               role=role,
                                               job_name=compilation_job_name,
                                               framework='tensorflow',
                                               compiler_options={"precision_mode": "fp32"})

Deploy compiled model to an Endpoint with EI Accelerator attached

Deploying a model to a SageMaker Endpoint uses the same deploy function whether or not a model is compiled using SageMaker Neo. The only change required for utilizing EI Accelerator is to provide an accelerator_type parameter, which determines the type of EI accelerator to be attached to your endpoint. All supported types of accelerators can be found here.

predictor_compiled_fp32 = compiled_model_fp32.deploy(initial_instance_count=1,
                                                     instance_type='ml.m5.xlarge',
                                                     accelerator_type='ml.eia2.large')

Benchmarking endpoints

Once the endpoint is created, we will benchmark to measure latency. The model expects input shape of 1 x 224 x 224 x 3, so we expand the dog image (224x224x3) with a batch size of 1 to be compatible with the model input. The benchmark first runs a series of 100 warmup inferences, and then runs 1000 inferences to make sure that we get an accurate estimate of latency ignoring startup times. Latency percentiles are reported from these 1000 inferences.

import numpy as np
import matplotlib.image as mpimg

data = mpimg.imread('dog.jpg')
data = np.expand_dims(data, axis=0)
print("Input data shape: {}".format(data.shape))

import time
import numpy as np


def benchmark_sm_endpoint(predictor, input_data):
    print('Doing warmup round of 100 inferences (not counted)')
    for i in range(100):
      output = predictor.predict(input_data)
    time.sleep(3)

    client_times = []
    print('Running 1000 inferences')
    for i in range(1000):
      client_start = time.time()
      output = predictor.predict(input_data)
      client_end = time.time()
      client_times.append((client_end - client_start)*1000)

    print('Client end-to-end latency percentiles:')
    client_avg = np.mean(client_times)
    client_p50 = np.percentile(client_times, 50)
    client_p90 = np.percentile(client_times, 90)
    client_p99 = np.percentile(client_times, 99)
    print('Avg | P50 | P90 | P99')
    print('{:.4f} | {:.4f} | {:.4f} | {:.4f}\n'.format(client_avg, client_p50, client_p90, client_p99))
    
benchmark_sm_endpoint(predictor_compiled_fp32, data)

From the benchmark above, the output will be similar to the following:

Doing warmup round of 100 inferences (not counted)
Running 1000 inferences
Client end-to-end latency percentiles:
Avg | P50 | P90 | P99
103.2129 | 124.4727 | 129.1123 | 133.2371

Compile and benchmark model with quantization

Quantization-based model optimizations represent model weights in lower precision (e.g., FP16), which increases throughput and offers lower latency. Using FP16 precision in particular provides faster performance than FP32 with effectively no drop (<0.1%) in model accuracy. When you enable FP16 precision, SageMaker Neo chooses kernels from both FP16 and FP32 precision. For the ResNet50 model in this post, we are able to compile the model along with FP16 quantization by setting the precision_mode under compiler_options.

# Create a TensorFlow SageMaker model
tensorflow_model = TensorFlowModel(model_data=input_model_path,
                                   role=role,
                                   framework_version='2.3')

# Compile the model for EI accelerator in SageMaker Neo
output_path = '/'.join(input_model_path.split('/')[:-1])
compilation_job_name = prefix + "-fp16"
compiled_model_fp16 = tensorflow_model.compile(target_instance_family='ml_eia2',
                                               input_shape={"input_1": [1, 224, 224, 3]},
                                               output_path=output_path,
                                               role=role,
                                               job_name=compilation_job_name,
                                               framework='tensorflow',
                                               compiler_options={"precision_mode": "fp16"})

# Deploy the compiled model to SM endpoint with EI attached
predictor_compiled_fp16 = compiled_model_fp16.deploy(initial_instance_count=1,
                                                     instance_type='ml.m5.xlarge',
                                                     accelerator_type='ml.eia2.large')

# Benchmark the SageMaker endpoint
benchmark_sm_endpoint(predictor_compiled_fp16, data)

Benchmark data for model compiled with FP16 will appear as follows:

Doing warmup round of 100 inferences (not counted)
Running 1000 inferences
Client end-to-end latency percentiles:
Avg | P50 | P90 | P99
91.8721 | 112.8929 | 117.7130 | 122.6844

Compare latency with unoptimized model on EIA

We can see that the model compiled with FP16 precision mode is faster than the model compiled with FP32. Now let’s get the latency for an uncompiled model as well.

# Create a TensorFlow SageMaker model
tensorflow_model = TensorFlowModel(model_data=input_model_path,
                                   role=role,
                                   framework_version='2.3')

# Deploy the uncompiled model to SM endpoint with EI attached
predictor_uncompiled = tensorflow_model.deploy(initial_instance_count=1,
                                           instance_type='ml.m5.xlarge',
                                           accelerator_type='ml.eia2.large')

# Benchmark the SageMaker endpoint
benchmark_sm_endpoint(predictor_uncompiled, data)

Benchmark data for uncompiled model will appear as follows:

Doing warmup round of 100 inferences (not counted)
Running 1000 inferences
Client end-to-end latency percentiles:
Avg | P50 | P90 | P99
117.1654 | 137.9665 | 143.5326 | 150.2070

Clean up endpoints

Having an endpoint running will incur some costs. Therefore, we delete the endpoints to release the resources after finishing this example.

sess.delete_endpoint(predictor_compiled_fp32.endpoint_name)
sess.delete_endpoint(predictor_compiled_fp16.endpoint_name)
sess.delete_endpoint(predictor_uncompiled.endpoint_name)

Performance comparison

To understand the performance improvement from model compilation and quantization, you can visualize the differences in percentile latency for models with different optimizations, as in the following plot. For our model, we find that adding model compilation improves latency by 13.5% compared to the unoptimized model. Adding quantization (FP16) to the compiled model results in a 27.5% improvement in latency compared to the unoptimized model.
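
The plot itself is not reproduced here; the sketch below rebuilds it from the client end-to-end latency numbers (in milliseconds) reported by the benchmarks above.

import matplotlib.pyplot as plt
import numpy as np

labels = ['Avg', 'P50', 'P90', 'P99']
uncompiled = [117.1654, 137.9665, 143.5326, 150.2070]
compiled_fp32 = [103.2129, 124.4727, 129.1123, 133.2371]
compiled_fp16 = [91.8721, 112.8929, 117.7130, 122.6844]

x = np.arange(len(labels))
width = 0.25

plt.bar(x - width, uncompiled, width, label='Uncompiled + EIA')
plt.bar(x, compiled_fp32, width, label='Neo FP32 + EIA')
plt.bar(x + width, compiled_fp16, width, label='Neo FP16 + EIA')
plt.xticks(x, labels)
plt.ylabel('Latency (ms)')
plt.legend()
plt.title('Client end-to-end latency by optimization')
plt.show()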

Summary

SageMaker Elastic Inference is an easy-to-use solution for adding model optimizations to improve inference performance on Amazon SageMaker. With Elastic Inference accelerators, you can get GPU inference acceleration and remain more cost-effective than standalone SageMaker GPU instances. With SageMaker Neo, software-based acceleration provided by model optimizations further improves performance (27.5%) over unoptimized models.

If you have any questions or comments, use the Amazon SageMaker Discussion Forums or send an email to amazon-ei-feedback@amazon.com.


About the Authors

Jiacheng Guo is a Software Engineer with AWS AI. He is passionate about building high performance deep learning systems with state-of-art techniques. In his spare time, he enjoys drifting on dirt track and playing with his Ragdoll cat.


Santosh Bhavani is a Senior Technical Product Manager with the Amazon SageMaker Elastic Inference team. He focuses on helping SageMaker customers accelerate model inference and deployment. In his spare time, he enjoys traveling, playing tennis, and drinking lots of Pu’er tea.

Read More

Finding any Cartier watch in under 3 seconds

Cartier is legendary in the world of luxury — a name that is synonymous with iconic jewelry and watches, timeless design, savoir-faire and exceptional customer service.

Maison Cartier’s collection dates back to the opening of Louis-François Cartier’s very first Paris workshop in 1847. And with over 174 years of history, the Maison’s catalog is extensive, with over a thousand wristwatches, some with only slight variations between them. Finding specific models, or comparing several models at once, could take some time for a sales associate working at one of Cartier’s 265 boutiques — hardly ideal for a brand with a reputation for high-end client service. 

In 2020, Cartier turned to Google Cloud to address this challenge. 

An impressive collection needs an app to match 

Cartier’s goal was to develop an app to help sales associates find any watch in its immense catalog quickly. The app would use an image to find detailed information about any watch the Maison had ever designed (starting with the past decade) and suggest similar-looking watches with possibly different characteristics, such as price. 

But creating this app presented some unique challenges for the Cartier team. Visual product search uses artificial intelligence (AI) technology like machine learning algorithms to identify an item (like a Cartier wristwatch) in a picture and return related products. But visual search technology needs to be “trained” with a huge amount of data to recognize a product correctly — in this case, images of the thousands of watches in Cartier’s collections. 

As a Maison that has always been driven by its exclusive design, Cartier had very few in-store product images available. The photos that did exist weren’t consistent, varying in backgrounds, lighting, quality and styling. This made it very challenging to create an app that could categorize images correctly. 

On top of that, Cartier has very high standards for its client service. For the stores to successfully adopt the app, the visual product search app would need to identify products accurately 90% of the time and ideally return results within five seconds. 

Redefining Cartier’s luxury customer experience with AI technology

Working together with Cartier’s team, we helped them build a visual product search system using Google Cloud AI Platform services, including AutoML Vision and Vision API.

The system can recognize a watch’s colors and materials and then use this information to figure out which collection the watch is from. It analyzes an image and comes back with a list of the three watches that look most similar, which sales associates can click on to get more information. The visual product search system identifies watches with 96.5% accuracy and can return results within three seconds.

Now, when customers are interested in a specific Cartier watch, the boutique team can take a picture of the desired model (or use any existing photo of it) and use the app to find its equivalent product page online. The app can also locate products that look similar in the catalog, displaying each item with its own image and a detailed description that customers can explore if the boutique team clicks on it. Sales associates can also send feedback about how relevant the recommendations were so that the Cartier team can continually improve the app. For a deeper understanding of the Cloud and AI technology powering this app, check out this blog post.
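
For a rough sense of what a visual product search query looks like with the Cloud Vision API (one of the building blocks named above), here is a hedged sketch. The project, product set, image file and category are hypothetical placeholders; Cartier's production system is not public.

from google.cloud import vision

client = vision.ImageAnnotatorClient()

# Photo taken by a sales associate (hypothetical file).
with open("watch_photo.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# Point the query at an indexed product set (hypothetical resource name).
product_search_params = vision.ProductSearchParams(
    product_set="projects/my-project/locations/europe-west1/productSets/watch-catalog",
    product_categories=["general-v1"],
)
image_context = vision.ImageContext(product_search_params=product_search_params)

response = client.product_search(image, image_context=image_context)

# Show the three most similar catalog items, mirroring the app's behavior.
for result in response.product_search_results.results[:3]:
    print(result.product.display_name, result.score)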

High-quality design and service never go out of style

Today, the visual product search app is used across all of the Maison’s global boutiques, helping sales associates find information about any of Cartier’s creations across its catalog. Instead of several minutes, associates can now answer customer questions in seconds. And over time, the Maison hopes to add other helpful features to the app. 

The success of this project shows it’s possible to embrace new technology and bring innovation while preserving the quality and services that have established Cartier as a force among luxury brands. With AI technology, the future is looking very bright. 

Read More

Fighting Fire with Insights: CAPE Analytics Uses Computer Vision to Put Geospatial Data and Risk Information in Hands of Property Insurance Companies

Every day, vast amounts of geospatial imagery are being collected, and yet, until recently, one of the biggest potential users of that trove — property insurers — had made surprisingly little use of it.

Now, CAPE Analytics, a computer vision startup and NVIDIA Inception member, seeks to turn that trove of geospatial imagery into better underwriting decisions, and is applying these insights to mitigate wildfire disasters.

Traditionally, the insurance industry could only rely on historic data for broad swaths of land, combined with an in-person visit. CAPE Analytics can use AI to produce detailed data on the vegetation density, roof material and proximity to surrounding structures. This provides a better way to calculate risk, as well as an avenue to help homeowners take actions to cut it.

“For the first time, insurers can quantify defensible space, the removal of flammable material such as vegetation from around a home, with granular analytics,” said Kevin van Leer, director of Customer Success at CAPE Analytics. “CAPE allows insurance carriers to identify the vulnerability of a specific home and make recommendations to the homeowner. For example, our recent study shows that cutting back vegetation in the 10 feet surrounding a home is the most impactful action a homeowner can take to reduce their wildfire risk. It’s also much easier to achieve in comparison to the frequently recommended 30-to-100-foot buffer.”

As fire seasons grow longer and deadlier each year, and wildfires are driven by hotter, drier, and faster winds, the risk area widens into newer areas, not found on older maps. This makes up-to-date insights especially crucial.

“What’s unique about this dataset is that it’s very recent, and it’s high resolution,” said Kavan Farzaneh, head of marketing at the Mountain View, Calif., based company. “Using AI, we can analyze it at scale.”

Insights from such analysis extend beyond weather risk to “blue sky,” or day-to-day risk, as well. Whether that means determining the condition of a roof, factoring in new solar panels or detecting the presence of a trampoline, CAPE’s software seeks to optimize the underwriting process by helping insurers make more informed decisions about what policies to write.

And given that the six-year-old company already boasts more than 40 insurance industry customers and is backed by investments from several large insurance carriers, including the Hartford, State Farm and CSAA, CAPE Analytics appears to be on to something.

Creating More Accurate Records

For some time, insurance companies have used aerial imagery for claims verification, such as reviewing storm damage. But CAPE Analytics is converting that imagery into structured data that underwriters can use to streamline their decision-making process. The company is essentially creating more up-to-date property records, which traditionally come from tax assessor offices and other public records sources.

“We zeroed in on property underwriting because there was a void in accuracy, and data tends to be old,” said Busy Cummings, chief revenue officer at CAPE Analytics. “By using AI to tap into this objective ‘source of truth,’ we can improve the accuracy of existing data sources.”

And that means more efficiency for underwriters, who can avoid unnecessary inspections altogether thanks to having access to more current and complete data.

CAPE Analytics obtains its datasets from multiple imagery partners. Human labelers tag some of the data, and the company has trained algorithms that can then identify elements of an aerial image, such as whether a roof is flat or has gables, whether additional structures have been added, or if trees and brush are overgrowing the structure.

The company started training its models on several NVIDIA GPU-powered servers. It has since transitioned the bulk of its training activities to Amazon Web Services P3 instances running NVIDIA V100 Tensor Core GPUs.

Inference runs on NVIDIA Triton Inference Server. CAPE Analytics relies on multiple Triton instances to run its models, with a load balancer distributing inference requests, allowing it to scale horizontally to meet shifting client demand. The company’s infrastructure makes it possible to do live inference on imagery, with geospatial data converted into actionable structured data in two seconds.
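
For a rough idea of how a client queries a Triton server over HTTP, here is a minimal sketch. The endpoint, model name, and tensor names are hypothetical; CAPE's production setup is described only at a high level above.

import numpy as np
import tritonclient.http as httpclient

# Connect to a (hypothetical) load-balanced Triton endpoint.
client = httpclient.InferenceServerClient(url="triton.example.internal:8000")

# A single normalized RGB aerial tile, batch of 1 (hypothetical shape).
tile = np.random.rand(1, 3, 1024, 1024).astype(np.float32)

inputs = [httpclient.InferInput("input__0", list(tile.shape), "FP32")]
inputs[0].set_data_from_numpy(tile)
outputs = [httpclient.InferRequestedOutput("output__0")]

# Run inference against a hypothetical property-attribute model.
result = client.infer(model_name="roof_condition", inputs=inputs, outputs=outputs)
scores = result.as_numpy("output__0")
print(scores.shape)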

In Pursuit of Scale

Thanks to its membership in NVIDIA Inception, the company has recently been experimenting with the NVIDIA DGX A100 AI system to train larger networks on larger datasets. Jason Erickson, director of platform engineering at CAPE Analytics, said the experience with the DGX A100 has shown “what we could potentially achieve if we had unlimited resources.”

“We’ve been very fortunate to be a part of NVIDIA’s Inception program since 2017, which has afforded us opportunities to test new NVIDIA offerings, including data science GPU and DGX A100 systems, while engaging with the wider NVIDIA community,” said Farzaneh.

CAPE Analytics has every motivation to pursue more scale. Cummings said it has spent the past year focused on expanding from insurance underwriting into the real estate and mortgage markets, where there is demand to integrate property condition data into the tools that determine home values. The company also just announced it’s powering a new automated valuation model based on geospatial data.

With so many potential markets to explore, CAPE Analytics has to keep pushing the envelope.

“Machine learning is such a fast-moving world. Every day there are new papers and new methods and new models,” said Farzaneh. “We’re just trying to stay on the bleeding edge.”

Learn more about NVIDIA’s work with the financial services industry.

Feature image credit: Paul Hanaoka on Unsplash.

The post Fighting Fire with Insights: CAPE Analytics Uses Computer Vision to Put Geospatial Data and Risk Information in Hands of Property Insurance Companies appeared first on The Official NVIDIA Blog.

Read More