Characterizing quantum advantage in machine learning by understanding the power of data

Posted by Hsin-Yuan Huang (Google/Caltech), Michael Broughton (Google), Jarrod R. McClean (Google), and Masoud Mohseni (Google).

Data drives machine learning. Both large-scale research and production ML depend on high-volume, high-quality sources of data, and it is often the case that more is better. The use of training data has enabled machine learning algorithms to achieve better performance than traditional algorithms at recognizing photos, understanding human languages, or even tasks that have been championed as premier applications for quantum computers, such as predicting properties of novel molecules for drug discovery and the design of better catalysts, batteries, or OLEDs. Without data, these machine learning algorithms would not be any more useful than traditional algorithms.

While existing machine learning models run on classical computers, quantum computers provide the potential to design quantum machine learning algorithms that achieve even better performance, especially for modelling quantum-mechanical systems like molecules, catalysts, or high-temperature superconductors. The quantum world allows a superposition of exponentially many states to evolve and interfere at the same time, something classical computers struggle to simulate, so one would expect quantum computers to have an advantage in machine learning problems that have a quantum origin. The hope is that such quantum advantage (the advantage of using quantum computers instead of classical computers) extends to machine learning problems in the classical domain, like computer vision or natural language processing.

Learning from data obtained in nature (e.g., physical experiments) can enable classical computers to solve some problems that are hard for classical computers without the data. However, an even larger class of problems can be solved using quantum computers. If we assume nature can be simulated using quantum computers, then quantum computers with data obtained in nature have no further computational power beyond quantum computers alone.

To understand the advantage of quantum computers in machine learning problems, the following big questions come to mind:

  1. If data comes from a quantum origin, such as from experiments developing new materials, are quantum models always better at predicting than classical models?
  2. Can we evaluate when there is a potential for a quantum computer to be helpful for modelling a given data set, from either a classical or quantum origin?
  3. Is it possible to find or construct a data set where quantum models have an advantage over their classical counterparts?

We addressed these questions and more in a recent paper by developing a mathematical framework to compare classical modelling approaches (neural networks, tree-based models, etc.) against quantum modelling approaches in order to understand potential advantages in making more accurate predictions. This framework is applicable both when the data comes from the classical world (MNIST, product reviews, etc.) and when the data comes from a quantum experiment (chemical reactions, quantum sensors, etc.).

Conventional wisdom might suggest that the use of data coming from quantum experiments that are hard to reproduce classically would imply the potential for a quantum advantage. However, we show that this is not always the case. It is perhaps no surprise to machine learning experts that with enough data, an arbitrary function can be learned. It turns out that this extends to learning functions that have a quantum origin, too. By taking data from physical experiments obtained in nature, such as experiments for exploring new catalysts, superconductors, or pharmaceuticals, classical ML models can achieve some degree of generalization beyond the training data. This allows classical algorithms with data to solve problems that would be hard to solve using classical algorithms without access to the data (rigorous justification of this claim is given in our paper).

We provide a method to quantitatively determine the number of samples required to make accurate predictions on datasets coming from a quantum origin. Perhaps surprisingly, sometimes there is no great difference between the number of samples needed by classical and quantum models. This method also provides a constructive approach to generate datasets that are hard to learn with some classical models.

An empirical demonstration of prediction advantage using quantum models compared to the best among a list of common classical models, under different amounts of training data N. The projected quantum kernel we introduce has a significant advantage over the best classical ML model tested.

We go on to use this framework to engineer datasets for which all attempted traditional machine learning methods fail, but quantum methods do not. An example of one such data set is presented in the above figure. These trends are examined empirically in the largest gate-based quantum machine learning simulations to date, made possible with TensorFlow Quantum, an open source library for quantum machine learning. Carrying out this work at large quantum system sizes required a considerable amount of computing power: the combination of quantum circuit simulation (~300 TeraFLOP/s) and analysis code (~800 TeraFLOP/s) written using TensorFlow and TensorFlow Quantum was able to easily reach throughputs as high as 1.1 PetaFLOP/s, a scale that is rarely seen in the crossover field of quantum computing and machine learning (though it is nothing new for classical ML, which has already hit the exaflop scale).

When preparing distributed workloads for quantum machine learning research in TensorFlow Quantum, the setup and infrastructure should feel very familiar to regular TensorFlow users. One way to handle distribution in TensorFlow is with the tf.distribute module, which allows users to configure device placement as well as machine configuration in multi-machine workloads. In this work, tf.distribute was used to distribute workloads across ~30 Google Cloud machines (some containing GPUs) managed by Kubernetes. The major stages of development were:

  1. Develop a functioning single-node prototype using TensorFlow and TensorFlow Quantum.
  2. Incorporate minimal code changes to use MultiWorkerMirroredStrategy in a manually configured two-node environment (a minimal sketch follows this list).
  3. Create a Docker image with the functioning code from step 2 and upload it to the Google Container Registry.
  4. Using Google Kubernetes Engine, launch a job following the ecosystem templates provided here.
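
As a rough illustration of step 2, the sketch below shows the minimal changes involved in wrapping a TensorFlow Quantum model in MultiWorkerMirroredStrategy. It is a toy, single-qubit example assuming TensorFlow 2.x, TensorFlow Quantum, and Cirq are installed; the worker addresses and the circuit are placeholders, not the code used in the paper.

```python
import json
import os

import cirq
import sympy
import tensorflow as tf
import tensorflow_quantum as tfq

# Each worker runs the same script with the same TF_CONFIG, changing only its own "index".
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["worker-0.example.com:2222", "worker-1.example.com:2222"]},
    "task": {"type": "worker", "index": 0},
})

strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    # A toy one-qubit parameterized circuit stands in for the real model circuit.
    qubit = cirq.GridQubit(0, 0)
    theta = sympy.Symbol("theta")
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(), dtype=tf.string),  # serialized input circuits
        tfq.layers.PQC(cirq.Circuit(cirq.rx(theta)(qubit)), cirq.Z(qubit)),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(0.01), loss="mse")

# model.fit(...) then proceeds exactly as in the single-node prototype.
```

Because the strategy only changes where variables live and how gradients are aggregated, the training loop itself stays the same as in the single-node prototype.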

TensorFlow Quantum has a tutorial showcasing a variation of step 1, where you can create a dataset (in simulation) that is out of reach for classical neural networks.

It is our hope that the framework and tools outlined here help to enable the TensorFlow community to explore precursors of datasets that require quantum computers to make accurate predictions. If you want to get started with quantum machine learning, check out the beginner tutorials on the TensorFlow Quantum website. If you are a quantum machine learning researcher, you can read our paper addressing the power of data in quantum machine learning for more detailed information. We are excited to see what other kinds of large scale QML experiments can be carried out by the community using TensorFlow Quantum. Only through expanding the community of researchers will machine learning with quantum computers reach its full potential.


Goodhart’s Law, Diversity and a Series of Seemingly Unrelated Toy Problems

Goodhart’s Law is an adage which states the following:

“When a measure becomes a target, it ceases to be a good measure.”

This is particularly pertinent in machine learning, where many of our greatest achievements come from optimizing a target in the form of a loss function. The most prominent way to do so is with stochastic gradient descent (SGD), which applies a simple rule, follow the gradient:

$$\theta_{t+1} = \theta_t - \alpha \nabla_\theta \mathcal{L}(\theta_t)$$

for some step size $\alpha$. Updates of this form have led to a series of breakthroughs from computer vision to reinforcement learning, and it is easy to see why SGD is so popular: 1) it is relatively cheap to compute using backprop, 2) it is guaranteed to locally reduce the loss at every step (for a sufficiently small step size), and finally 3) it has an amazing track record empirically.
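
As a minimal, self-contained illustration of that update rule (not taken from the original post), here is a short SGD loop on a toy quadratic loss:

```python
import numpy as np

def sgd_step(theta, grad_fn, alpha=0.1):
    """One SGD update: follow the negative gradient with step size alpha."""
    return theta - alpha * grad_fn(theta)

# Toy quadratic loss L(theta) = 0.5 * ||theta||^2, whose gradient is simply theta.
theta = np.array([1.0, -2.0])
for _ in range(100):
    theta = sgd_step(theta, grad_fn=lambda t: t)

print(theta)  # close to the minimizer at the origin
```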

AWS expands language support for Amazon Lex and Amazon Polly

At AWS, our mission is to enable developers and businesses with no prior machine learning (ML) expertise to easily build sophisticated, scalable, ML-powered applications with our AI services. Today, we’re excited to announce that Amazon Lex and Amazon Polly are expanding language support. You can build ML-powered applications that fit the language preferences of your users. These easy-to-use services allow you to add intelligence to your business processes, automate workstreams, reduce costs, and improve the user experience for your customers and employees in a variety of languages.

New and improved features

Amazon Lex is a service for building conversational interfaces into any application using voice and text. Amazon Lex now supports French, Spanish, Italian and Canadian French. With the addition of these new languages, you can build and expand your conversational experiences to better understand and engage your customer base in a variety of different languages and accents. Amazon Lex can be applied to a diverse set of use cases such as virtual agents, conversational IVR systems, self-service chatbots, or application bots. For a full list of languages, please go to Amazon Lex languages.

Amazon Polly, a service that turns text into lifelike speech, offers voices for all Amazon Lex languages. Our first Australian English voice, Olivia, is now generally available in Neural Text-to-Speech (NTTS). Olivia has a unique vocal personality, and her voice sounds expressive, natural, and easy to follow. You can now choose among three Australian English voices: Russell, Nicole, and Olivia. For a full list of Amazon Polly’s voices, please go to Amazon Polly voices.


“Growing demand for conversational experiences led us to launch Amazon Lex and Amazon Polly to enable businesses to connect with their customers more effectively,” shares Julien Simon, AWS AIML evangelist.

“Amazon Lex uses automatic speech recognition and natural language understanding to help organizations understand a customer’s intent, fluidly manage conversations and create highly engaging and lifelike interactions. We are delighted to advance the language capabilities of Lex and Polly. These launches allow our customers to take advantage of AI in the area of conversational interfaces and voice AI,” Simon says.

“Amazon Lex is a core AWS service that enables Accenture to deliver next-generation, omnichannel contact center solutions, such as our Advanced Customer Engagement (ACE+) platform, to a diverse set of customers. The addition of French, Italian, and Spanish to Amazon Lex will further enhance the accessibility of our global customer engagement solutions, while also vastly enriching and personalizing the overall experience for people whose primary language is not English. Now, we can quickly build interactive digital solutions based on Amazon’s deep learning expertise to deflect more calls, reduce contact center costs and drive a better customer experience in French, Italian, and Spanish-speaking markets. Amazon Lex can now improve customer satisfaction and localized brand awareness even more effectively,” says J.C. Novoa, Global Technical Evangelist – Advanced Customer Engagement (ACE+) for Accenture.

Another example is Clevy, a French start-up and AWS customer. François Falala-Sechet, the CTO of Clevy adds, “At Clevy, we have been utilizing Amazon Lex’s best-in-class natural language processing services to help bring customers a scalable low-code approach to designing, developing, deploying and maintaining rich conversational experiences with more powerful and more integrated chatbots. With the addition of Spanish, Italian and French in Amazon Lex, Clevy can now help our developers deliver chatbot experiences to a more diverse audience in our core European markets.”

Eudata helps customers implement effective contact and management systems. Andrea Grompone, the Head of Contact Center Delivery at Eudata says, “Ora Amazon Lex parla in italiano! (Now Amazon Lex speaks Italian!) We are excited about the new opportunities this opens for Eudata. Amazon Lex simplifies the process of creating automated dialog-based interactions to address challenges we see in the market. The addition of Italian allows us to build a customer experience that ensures both service speed and quality in our markets.”

Using the new features

To use the new Amazon Lex languages, simply choose the language when creating a new bot via the Amazon Lex console or AWS SDK. The following screenshot shows the console view.

To learn more, visit the Amazon Lex Development Guide.

You can use the new Olivia voice in the Amazon Polly console, the AWS Command Line Interface (AWS CLI), or the AWS SDK. The feature is available across all AWS Regions supporting NTTS. For the full list of available voices, see Voices in Amazon Polly, or log in to the Amazon Polly console to try it out for yourself.
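
For example, a minimal boto3 sketch (the sample text is illustrative) that synthesizes speech with the Olivia neural voice might look like the following:

```python
import boto3

polly = boto3.client("polly")

# Request Australian English speech from the new Olivia Neural Text-to-Speech voice.
response = polly.synthesize_speech(
    Text="G'day! Welcome to Amazon Polly.",
    VoiceId="Olivia",
    Engine="neural",       # Olivia is an NTTS voice
    OutputFormat="mp3",
)

with open("olivia.mp3", "wb") as audio_file:
    audio_file.write(response["AudioStream"].read())
```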

Summary

Use Amazon Lex and Amazon Polly to build more self-service bots, to voice-enable applications, and to create an integrated voice and text experience for your customers and employees in a variety of languages. Try them out for yourself!

 


About the Author

Esther Lee is a Product Manager for AWS Language AI Services. She is passionate about the intersection of technology and education. Out of the office, Esther enjoys long walks along the beach, dinners with friends and friendly rounds of Mahjong.


Join the Final Lap of the 2020 DeepRacer League at AWS re:Invent 2020

AWS DeepRacer is the fastest way to get rolling with machine learning (ML). It’s a fully autonomous 1/18th scale race car driven by reinforcement learning, a 3D racing simulator, and a global racing league. Throughout 2020, tens of thousands of developers honed their ML skills and competed in the League’s virtual circuit via the AWS DeepRacer console and 14 AWS Summit online events.

The AWS DeepRacer League’s 2020 season is nearing the final lap with the Championship at AWS re:Invent 2020. From November 10 through December 15, there are three ways to join in the racing fun: learn how to develop a competitive reinforcement learning model through our sessions, enter and compete in the racing action for a chance to win prizes, and watch to cheer on other developers as they race for the cup. More than 100 racers have already qualified for the Championship Cup, but there is still time to compete. Log in today and enter the Wildcard round for a chance to win the Championship Cup; the top 5 racers earn spots in the Knockout Rounds. The Knockout Rounds begin December 1, when racers compete all the way to the checkered flag and the Championship Cup. The Grand Prize winner will receive a choice of either 10,000 USD in AWS promotional credits plus a chance to win an expenses-paid trip to an F1 Grand Prix in the upcoming 2021 season, or a Coursera online machine learning degree scholarship with a value of up to 20,000 USD. See our AWS DeepRacer 2020 Championships Official Rules for more details.

Watch the latest episode of DRTV news to learn more about how the Championship at AWS re:Invent 2020 will work.

Congratulations to our 2020 AWS re:Invent Championship Finalists!

Thanks to the thousands of developers who competed in the 2020 AWS DeepRacer League. Below is the list of our Virtual and Summit Online Circuit winners who qualified for the Championship at AWS re:Invent 2020.

Last chance for the Championship: Enter the Wildcard

Are you yet to qualify for the Championship Cup this season? Are you brand new to the league and want to take a shot at the competition? Well, you have one last chance to qualify: the Wildcard. The open-play Wildcard race runs through November. This race is a traditional Virtual Circuit style time-trial race, taking place in the AWS DeepRacer console. Participants have until 11:59pm UTC November 30 (6:59pm EST, 3:59pm PST) to submit their fastest model. The top five competitors from the Wildcard race will advance to the Championship Cup knockout.

Don’t worry if you don’t advance to the next round. There are chances for developers of all skill levels to compete and win at AWS re:Invent, including AWS DeepRacer League open racing and special live virtual races. Visit our DeepRacer page for the complete race schedule and additional details.

Here’s an overview of how the Championships are organized and how many racers participate in each round from qualifying through to the Grand Prix Finale.

Round 1: Live Group Knockouts

On December 1, racers need to be ready for anything in the championships, no matter what roadblocks they may come across. In Round 1, competitors have the opportunity to participate in a brand-new live racing format on the console. Racers submit their best models and control maximum speed remotely from anywhere in the world, while their autonomous models attempt to navigate the track, complete with objects to avoid. They’ll have 3 minutes to try to achieve their single best lap to top the leaderboard. Racers will be split into eight groups based on their time zone, with start order determined by the warmup round (the fastest racers getting to go last in their group). The top four times in each group will advance to our bracket round. Tune in to AWS DeepRacer TV throughout AWS re:Invent to catch the championship action.

Round 2: Bracket Elimination

The top 32 remaining competitors will be placed into a single elimination bracket, where they face off against one another in a head-to-head format in a five-lap race. Head-to-head virtual matchups will proceed until eight racers remain. Results will be released on the AWS DeepRacer League page and in the console. 

Round 3: Grand Prix Finale

The final race will take place before the closing keynote on December 15 as an eight-person virtual Grand Prix. Similar to the F1 ProAm in May, our eight finalists will submit their models on the console and the AWS DeepRacer team will run the Grand Prix, where the eight racers simultaneously face off on the track in simulation to complete five laps. The first car to successfully complete five laps and cross the finish line will be crowned the 2020 AWS DeepRacer Champion and officially announced at the closing keynote.

More Options for your ML Journey

If you’re ready to get over the starting line on your ML journey, AWS DeepRacer re:Invent sessions are the best place to learn ML fast.  In 2020, we have not one, not two, but three levels of ML content for aspiring developers to go from zero to hero in no time! Register now for AWS re:Invent to learn more about session schedules when they become available.

  • Get rolling with Machine Learning on AWS DeepRacer (200L). Get hands-on with AWS DeepRacer, including exciting announcements and enhancements coming to the league in 2021. Learn about the basics of machine learning and reinforcement learning (a machine learning technique ideal for autonomous driving). In this session, you can build a reinforcement learning model and submit that model to the AWS DeepRacer League for a chance to win prizes and glory.
  • Shift your Machine Learning model into overdrive with AWS DeepRacer analysis tools (300L). Make your way from the middle of the pack to the top of the AWS DeepRacer podium! This session extends your machine learning skills by exploring how human analysis of reinforcement learning through logs will improve your performance through trend identification and optimization to better prepare for new racing divisions coming to the league in 2021.
  • Replicate AWS DeepRacer architecture to master the track with SageMaker Notebooks (400L). Complete the final lap on your machine learning journey by demystifying the underlying architecture of AWS DeepRacer using Amazon SageMaker, AWS RoboMaker, and Amazon Kinesis Video Streams. Dive into SageMaker notebooks to learn how others have applied the skills acquired through AWS DeepRacer to real-world use cases and how you can apply your reinforcement learning models to relevant use cases.

You can take all the courses live during re:Invent or learn at your own speed on-demand. It’s up to you.  Visit the DeepRacer page at AWS re:Invent to register and find out more on when sessions will be available.

As you can see, there are many opportunities to up-level your ML skills, join in the racing action and cheer on developers as they go for the Championship Cup. Watch this page for schedule and video updates all through AWS re:Invent 2020!

 


About the Author

Dan McCorriston is a Senior Product Marketing Manager for AWS Machine Learning. He is passionate about technology, collaborating with developers, and creating new methods of expanding technology education. Out of the office he likes to hike, cook and spend time with his family.


Measuring forecast model accuracy to optimize your business objectives with Amazon Forecast

We’re excited to announce that you can now measure the accuracy of your forecasting model to optimize the trade-offs between under-forecasting and over-forecasting costs, giving you flexibility in experimentation. Costs associated with under-forecasting and over-forecasting differ. Generally, over-forecasting leads to high inventory carrying costs and waste, whereas under-forecasting leads to stock-outs, unmet demand, and missed revenue opportunities. Amazon Forecast allows you to optimize these costs for your business objective by providing an average forecast as well as a distribution of forecasts that captures variability of demand from a minimum to maximum value. With this launch, Forecast now provides accuracy metrics for multiple distribution points when training a model, allowing you to quickly optimize for under-forecasting and over-forecasting without the need to manually calculate metrics.

Retailers rely on probabilistic forecasting to optimize their supply chain to balance the cost of under-forecasting, leading to stock-outs, with the cost of over-forecasting, leading to inventory carrying costs and waste. Depending on the product category, retailers may choose to generate forecasts at different distribution points. For example, grocery retailers choose to over-stock staples such as milk and eggs to meet variable demand. While these staple foods have relatively low carrying costs, stock-outs may not only lead to a lost sale, but also a complete cart abandonment. By maintaining high in-stock rates, retailers can improve customer satisfaction and customer loyalty. Conversely, retailers may choose to under-stock substitutable products with high carrying costs when the cost of markdowns and inventory disposal outweighs the occasional lost sale. The ability to forecast at different distribution points allows retailers to optimize these competing priorities when demand is variable.

Anaplan Inc., a cloud-native platform for orchestrating business performance, has integrated Forecast into its PlanIQ solution to bring predictive forecasting and agile scenario planning to enterprise customers. Evgy Kontorovich, Head of Product Management, says, “Each customer we work with has a unique operational and supply chain model that drives different business priorities. Some customers aim to reduce excess inventory by making more prudent forecasts, while others strive to improve in-stock availability rates to consistently meet customer demand. With forecast quantiles, planners can evaluate the accuracy of these models, evaluate model quality, and fine-tune them based on their business goals. The ability to assess the accuracy of forecast models at multiple custom quantile levels further empowers our customers to make highly informed decisions optimized for their business.”

Although Forecast already provided the ability to forecast across the entire distribution of variability to manage the trade-offs of under-stocking and over-stocking, accuracy metrics were only provided for the minimum, median, and maximum predicted demand, which gives an 80% confidence band centered on the median. To evaluate the accuracy metric at a particular distribution point of interest, you first had to create forecasts at that point, and then manually calculate the accuracy metrics on your own.

With today’s launch, you can assess the strengths of your forecasting models at any distribution point within Forecast without needing to generate forecasts and manually calculate metrics. This capability enables you to experiment faster, and more cost-effectively arrive at a distribution point for your business needs.

To use this new capability, you select forecast types (or distribution points) of interest when creating the predictor. Forecast splits the input data into training and testing datasets, trains and tests the model created, and generates accuracy metrics for these distribution points. You can continue to experiment to further optimize forecast types without needing to create forecasts at each step.

Understanding Forecast accuracy metrics

Forecast provides different model accuracy metrics for you to assess the strength of your forecasting models. We provide the weighted quantile loss (wQL) metric for each selected distribution point, as well as the weighted absolute percentage error (WAPE) and root mean square error (RMSE), both calculated at the mean forecast. For each metric, a lower value indicates a smaller error and therefore a more accurate model. All these accuracy metrics are non-negative.

Let’s take an example of a retail dataset as shown in the following table to understand these different accuracy metrics. In this dataset, three items are being forecasted for two days.

| Item ID | Remarks | Date | Actual Demand | Mean Forecast | P75 Forecast | P75 Error (P75 Forecast - Actual) | Absolute Error abs(Actual - Mean Forecast) | Squared Error (Actual - Mean Forecast)^2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Item1 | Item 1 is a popular item with a lot of product demand | Day 1 | 200 | 195 | 220 | 20 | 5 | 25 |
| Item1 | | Day 2 | 100 | 85 | 90 | -10 | 15 | 225 |
| Item2 | With low product demand, Item 2 is in the long tail of demand | Day 1 | 1 | 2 | 3 | 2 | 1 | 1 |
| Item2 | | Day 2 | 2 | 3 | 5 | 3 | 1 | 1 |
| Item3 | Item 3 is in the long tail of demand and observes a forecast with a large deviation from actual demand | Day 1 | 5 | 45 | 50 | 45 | 40 | 1600 |
| Item3 | | Day 2 | 5 | 35 | 40 | 35 | 30 | 900 |
| | | | Total Demand = 313 | | | Used in wQL[0.75] | Used in WAPE | Used in RMSE |

The following table summarizes the calculated accuracy metrics for the retail dataset use case.

| Metric | Value |
| --- | --- |
| wQL[0.75] | 0.21565 |
| WAPE | 0.29393 |
| RMSE | 21.4165 |

In the following sections, we explain how each metric was calculated and recommendations for the best use case for each metric.

Weighted quantile loss (wQL)

The wQL metric measures the accuracy of a model at specified distribution points called quantiles. This metric helps capture the bias inherent in each quantile. For a grocery retailer who prefers to over-stock staples such as milk, choosing a higher quantile such as 0.75 (P75) better captures spikes in demand and therefore is more informative than forecasting at the median quantile of 0.5 (P50).

In this example, more emphasis is given to over-forecasting than under-forecasting and suggests that a higher amount of stock is needed to satisfy customer demand with 75% probability of success. In other words, the actual demand is less than or equal to forecasted demand 75% of the time, allowing the grocer to maintain target in-stock rates with less safety stock.

We recommend using the wQL measure at different quantiles when the costs of under-forecasting and over-forecasting differ. If the difference in costs is negligible, you may consider forecasting at the median quantile of 0.5 (P50) or use the WAPE metric, which is evaluated using the mean forecast. The following figure illustrates the probability of meeting demand dependent on quantile.

For the retail dataset example, the P75 forecast indicates that we want to prioritize over-forecasting and penalize under-forecasting. To calculate the wQL[0.75], sum the positive values in the P75 Error column and multiply them by the smaller weight of 1 − 0.75 = 0.25, then sum the absolute values of the negative entries in the P75 Error column and multiply them by the larger weight of 0.75 to penalize under-forecasting; the weighted total is doubled and normalized by total demand. The wQL[0.75] is as follows:

wQL[0.75] = 2 × [0.25 × (20 + 2 + 3 + 45 + 35) + 0.75 × 10] / 313 = 67.5 / 313 ≈ 0.21565

Weighted absolute percentage error (WAPE)

The WAPE metric is the sum of the absolute error normalized by the total demand. The WAPE doesn’t penalize for under-forecasting or over-forecasting, and uses the average (mean) expected value of forecasts to calculate the differences. We recommend using the WAPE metric when the difference in costs of under-forecasting or over-forecasting is negligible, or if you want to evaluate the model accuracy at the mean forecast. For example, to predict the amount of cash needed at a given time in an ATM, a bank may choose to meet average demand because there is less concern of losing a customer or having excess cash in the ATM machine. In this example, you may choose to forecast at the mean and choose the WAPE as your metric to evaluate the model accuracy.

The normalization or weighting helps compare models from different datasets. For example, if the absolute error summed over the entire dataset is 5, it’s hard to interpret the quality of this metric without knowing the scale of total demand. A high total demand of 1000 results in a low WAPE (0.005) and a small total demand of 10 results in a high WAPE (0.5). The weighting in the WAPE and wQL allows these metrics to be compared across datasets with different scales.

The normalization or weighting also helps evaluate datasets that contain a mix of items with large and small demand. The WAPE metric emphasizes accuracy for items with larger demand. You can use WAPE for datasets where forecasting for a small number of SKUs drives the majority of sales. For example, a retailer may prefer to use the WAPE metric to put less emphasis on forecasting errors related to special edition items, and prioritize forecasting errors for standard items with most sales.

In our retail dataset use case, the WAPE is equal to the sum of the absolute error column divided by the sum of the actual demand column (total demand):

WAPE = (5 + 15 + 1 + 1 + 40 + 30) / 313 = 92 / 313 ≈ 0.29393

Because the sum of the total demand is mostly driven by Item1, the WAPE gives more importance to the accuracy of the popular Item1.

Many retail customers work with sparse datasets, where most of their SKUs are sold infrequently. For most of the historical data points, the demand is 0. For these datasets, it’s important to account for the scale of total demand, making wQL and WAPE better metrics than RMSE for evaluating sparse datasets. The RMSE metric doesn’t take the scale of total demand into account; because it only averages the squared errors over the total number of historical data points and SKUs, a sparse dataset can return a low RMSE value and give you a false sense of security that you have an accurate model.

Root mean square error (RMSE)

The RMSE metric is the square root of the sum of the squared errors (differences of the mean forecasted and actual values) divided by the product of the number of items and number of time points. The RMSE also doesn’t penalize for under-forecasting or over-forecasting, and can be used when the trade-offs between under-forecasting or over-forecasting are negligible or if you prefer to forecast at mean. Because the RMSE is proportional to the square of the errors, it’s sensitive to large deviations between the actual demand and forecasted values.

However, you should use RMSE with caution, because a few large deviations in forecasting errors can severely punish an otherwise accurate model. For example, if one item in a large dataset is severely under-forecasted or over-forecasted, the error in that item skews the entire RMSE metric drastically, and may make you reject an otherwise accurate model prematurely. For use cases where a few large deviations are not of importance, consider using wQL or WAPE.

In our retail dataset example, the RMSE is equal to the square root of the sum of the squared error column divided by the total number of points (3 items × 2 days = 6):

RMSE = sqrt((25 + 225 + 1 + 1 + 1600 + 900) / 6) = sqrt(2752 / 6) ≈ 21.4165

The RMSE gives more importance to the large deviation of forecasting error of Item3, resulting in a higher RMSE value.

We recommend using the RMSE metric when a few large incorrect predictions from a model on some items can be very costly to the business. For example, a manufacturer who is predicting machine failure may prefer to use the RMSE metric. Because operating machinery is critical, any large deviation from the actual and forecasted demand, even infrequent, needs to be over-emphasized when assessing model accuracy.
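
For readers who prefer code to arithmetic, the following NumPy sketch (not part of Amazon Forecast; it simply follows the definitions in this post) reproduces the three metrics for the example retail dataset above:

```python
import numpy as np

# Example data from the table above: actual demand, mean forecast, and P75 forecast
# for three items over two days.
actual = np.array([200, 100, 1, 2, 5, 5], dtype=float)
mean_fc = np.array([195, 85, 2, 3, 45, 35], dtype=float)
p75_fc = np.array([220, 90, 3, 5, 50, 40], dtype=float)

def wql(actual, forecast, tau):
    """Weighted quantile loss at quantile tau, normalized by total demand."""
    over = np.maximum(forecast - actual, 0.0)   # over-forecast, weighted by (1 - tau)
    under = np.maximum(actual - forecast, 0.0)  # under-forecast, weighted by tau
    return 2.0 * np.sum((1.0 - tau) * over + tau * under) / np.sum(np.abs(actual))

def wape(actual, forecast):
    """Sum of absolute errors normalized by total demand."""
    return np.sum(np.abs(actual - forecast)) / np.sum(np.abs(actual))

def rmse(actual, forecast):
    """Square root of the mean squared error over all item-time points."""
    return np.sqrt(np.mean((actual - forecast) ** 2))

print(wql(actual, p75_fc, tau=0.75))  # ~0.21565
print(wape(actual, mean_fc))          # ~0.29393
print(rmse(actual, mean_fc))          # ~21.4165
```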

The following table summarizes our discussion on selecting an accuracy metric dependent on your use case.

| Use case | wQL | WAPE | RMSE |
| --- | --- | --- | --- |
| Optimizing for under-forecasting or over-forecasting, which may have different implications | X | | |
| Prioritizing popular items or items with high demand over low-demand items | X | X | |
| Emphasizing business costs related to large deviations in forecast errors | | | X |
| Assessing sparse datasets with 0 demand for most items in the historical data points | X | X | |

Selecting custom distribution points for assessing model accuracy in Forecast

You can measure model accuracy in Forecast through the CreatePredictor API and GetAccuracyMetrics API or using the Forecast console. In this section, we walk through the steps to use the console.

  1. On the Forecast console, create a dataset group.

  2. Upload your dataset.

  3. In the navigation pane, choose Predictors.
  4. Choose Train predictor.

  5. Under Predictor Settings for Forecast types, you can enter up to five distribution points of your choosing. Mean is also accepted.

If unspecified, default quantiles of 0.1, 0.5, and 0.9 are used to train a predictor and calculate accuracy metrics.

  6. For Algorithm selection, select Automatic (AutoML).

AutoML optimizes a model for the specified quantiles.

  7. Optionally, for Number of backtest windows, you can select up to five to assess your model strength.

  8. After your predictor is trained, choose your predictor on the Predictors page to view details of the accuracy metrics.

On the predictor’s details page, you can view the overall model accuracy metrics and the metrics from each backtest window that you specified. Overall model accuracy metrics are calculated by averaging the individual backtest window metrics.

 

  9. Now that your model is trained, choose Forecasts in the navigation pane.
  10. Choose Create a forecast.
  11. Select your trained predictor to create a forecast.

Forecasts are generated for the distribution points used during training a predictor. You also have the option to specify different distribution points for generating forecasts.
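
The same flow maps onto the CreatePredictor and GetAccuracyMetrics APIs mentioned earlier. A minimal boto3 sketch is shown below; the predictor name, dataset group ARN, quantiles, and horizon are placeholders for illustration:

```python
import boto3

forecast = boto3.client("forecast")

create_response = forecast.create_predictor(
    PredictorName="my_predictor",
    ForecastHorizon=14,
    ForecastTypes=["0.30", "0.75", "mean"],              # up to five distribution points
    PerformAutoML=True,                                  # AutoML optimizes for these quantiles
    EvaluationParameters={"NumberOfBacktestWindows": 3}, # multiple backtest windows
    InputDataConfig={"DatasetGroupArn": "arn:aws:forecast:us-east-1:111122223333:dataset-group/my_dataset_group"},
    FeaturizationConfig={"ForecastFrequency": "D"},
)

# After the predictor finishes training, retrieve the accuracy metrics
# (including wQL at the selected quantiles) for each backtest window.
metrics = forecast.get_accuracy_metrics(PredictorArn=create_response["PredictorArn"])
print(metrics["PredictorEvaluationResults"])
```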

Tips and best practices

In this section, we share a few tips and best practices when using Forecast:

  • Before experimenting with Forecast, define your business problem related to the costs of under-forecasting or over-forecasting. Evaluate the trade-offs and decide whether you would rather over-forecast or under-forecast.
  • Experiment with multiple distribution points to optimize your forecast model to balance the costs associated with under-forecasting and over-forecasting.
  • If you’re comparing different models, use the wQL metric at the same quantile for comparison. The lower the value, the more accurate the forecasting model.
  • Forecast allows you to select up to five backtest windows. Forecast uses backtesting to tune predictors and produce accuracy metrics. To perform backtesting, Forecast automatically splits your time-series datasets into two sets: training and testing. The training set is used to train your model, and the testing set to evaluate the model’s predictive accuracy. We recommend choosing more than one backtest window to minimize selection bias that may make one window more or less accurate by chance. Assessing the overall model accuracy from multiple backtest windows provides a better measure of the strength of the model.

Conclusion

You can now measure the accuracy of your forecasting model at multiple distribution points of your choosing with Forecast. This gives you more flexibility to align Forecast with your business, because the impact of over-forecasting frequently differs from under-forecasting. You can use this capability in all Regions where Forecast is publicly available. For more information about Region availability, see Region Table. For more information about APIs specifying custom distribution points when creating a predictor and accessing accuracy metrics, see the documentation for the CreatePredictor API and GetAccuracyMetrics API. For more details on evaluating a predictor’s accuracy for multiple distribution points, see Evaluating Predictor Accuracy.

 


About the Authors

Namita Das is a Sr. Product Manager for Amazon Forecast. Her current focus is to democratize machine learning by building no-code/low-code ML services. On the side, she frequently advises startups and is raising a puppy named Imli.

 

 

 

Danielle Robinson is an Applied Scientist on the Amazon Forecast team. Her research is in time series forecasting and in particular how we can apply new neural network-based algorithms within Amazon Forecast. Her thesis research was focused on developing new, robust, and physically accurate numerical models for computational fluid dynamics. Her hobbies include cooking, swimming, and hiking.


Fast and Furio: Artist Accelerates Rendering on the Go with NVIDIA RTX Laptop

No traveling for work? No problem — at least not for character concept artist Furio Tedeschi.

Tedeschi has worked on films like Transformers and Bumblebee: The Movie, as well as games like Mass Effect: Andromeda from BioWare. Currently, he’s working on the upcoming Cyberpunk 2077 video game.

No stranger to working while traveling, Tedeschi often goes from one place to another for his projects. So he’s always been familiar with using a workstation that can keep up with his mobility.

But when the coronavirus pandemic hit, the travel stopped. Tedeschi had a workstation setup at home, so he adjusted quickly to remote work. However, he didn’t want to be chained to his desk, and he wanted a mobile device that could handle the complexity of his designs.

With the Lenovo ThinkPad P53 accelerated by the NVIDIA RTX 5000 GPU, Tedeschi gets the speed and performance he needs to handle massive scenes and render graphics in real time, no matter where he works from.

All images courtesy of Furio Tedeschi.

RTX-tra Boost for Remote Rendering Work

One of the biggest challenges Tedeschi faced was migrating projects, which often contained heavy scenes, from his desktop workstation to a mobile setup.

He used to reduce file sizes and renders because his previous laptop couldn’t run complex graphics workflows. But with the RTX laptop, Tedeschi has enough power to quickly migrate his scenes without slowing anything down. He no longer has to reduce scene sizes because the RTX 5000 can easily handle them at full quality across the creative applications he uses to work on scenes.

“The RTX laptop has made working on the move very comfortable for me, as it packs enough power to handle even some of the heavier scenes I use for concepts,” said Tedeschi. “Now I feel 100 percent comfortable knowing I can work on renders when I go back to traveling.”

With the RTX-powered Lenovo ThinkPad P53, he gets speed and performance that’s similar to his workstation at home. The laptop allows Tedeschi to see the models and scenes from every single view, all in real time.

When it comes to his artistic style, Tedeschi likes to focus on form and shapes, and he uses applications like ZBrush and KeyShot to create his designs. After using the RTX-powered mobile workstation, Tedeschi experienced massive rendering improvements with both applications.

With KeyShot specifically, the GPU rendering is “leaps faster and gives instant feedback through the render viewpoint,” according to Tedeschi.

The faster workflows, combined with the ability to see his concepts running in real time, allows Tedeschi to choose better angles and lighting and get closer to bringing his ideas to life in a final image. This results in reduced prep time, faster productions and even less stress.

With KeyShot, ZBrush and Photoshop all running at the same time, the laptop’s battery lasts up to two hours without being plugged in, so Tedeschi can easily get in the creative flow and work on designs from anywhere in his house without being distracted.

Learn more about Furio Tedeschi and RTX-powered mobile workstations.

The post Fast and Furio: Artist Accelerates Rendering on the Go with NVIDIA RTX Laptop appeared first on The Official NVIDIA Blog.


AWS Finance and Global Business Services builds an automated contract-processing platform using Amazon Textract and Amazon Comprehend

Processing incoming documents such as contracts and agreements is often an arduous task. The typical workflow for reviewing signed contracts involves loading, reading, and extracting contractual terms from agreements, which requires hours of manual effort and intensive labor.

At AWS Finance and Global Business Services (AWS FGBS), this process typically takes more than 150 employee hours per month. Often, multiple analysts manually input key contractual data into an Excel workbook in batches of a hundred contracts at a time.

Recently, one of the AWS FGBS teams responsible for analyzing contract agreements set out to implement an automated workflow to process incoming documents. The goal was to liberate specialized accounting resources from routine and tedious manual labor, so they had more free time to perform value-added financial analysis.

As a result, the team built a solution that consistently parses and stores important contractual data from an entire contract in under a minute with high fidelity and security. Now an automated process only requires a single analyst working 30 hours a month to maintain and run the platform. This is a 5x reduction in processing time and significantly improves operational productivity.

This application was made possible by two machine learning (ML)-powered AWS-managed services: Amazon Textract, which enables efficient document ingestion, and Amazon Comprehend, which provides downstream text processing that enables the extraction of key terms.

The following post presents an overview of this solution, a deep dive into the architecture, and a summary of the design choices made.

Contract workflow

The following diagram illustrates the architecture of the solution:

The AWS FGBS team, with help from the AWS Machine Learning (ML) Professional Services (ProServe) team, created an automated, durable, and scalable contract processing platform. Incoming contract data is stored in an Amazon Simple Storage Service (Amazon S3) data lake. The solution uses Amazon Textract to convert the contracts into text form and Amazon Comprehend for text analysis and term extraction. Critical terms and other metadata extracted from the contracts are stored in Amazon DynamoDB, a database designed to accept key-value and document data types. Accounting users can access the data via a custom web user interface hosted in Amazon CloudFront, where they can perform key user actions such as error checking, data validation, and custom term entry. They can also generate reports using a Tableau server hosted in Amazon Appstream, a fully-managed application streaming service. You can launch and host this end-to-end contract-processing platform in an AWS production environment using an AWS CloudFormation template.

Building an efficient and scalable document-ingestion engine

The contracts that the AWS FGBS team encounters often include high levels of sophistication, which has historically required human review to parse and extract relevant information. The format of these contracts has changed over time, varying in length and complexity. Adding to the challenge, the documents not only contain free text, but also tables and forms that hold important contextual information.

To meet these needs, the solution uses Amazon Textract as the document-ingestion engine. Amazon Textract is a powerful service built on top of a set of pre-trained ML computer vision models tuned to perform Optical Character Recognition (OCR) by detecting text and numbers from a rendering of a document, such as an image or PDF. Amazon Textract takes this further by recognizing tables and forms so that contextual information for each word is preserved. The team was interested in extracting important key terms from each contract, but not every contract contained the same set of terms. For example, many contracts hold a main table that has the name of a term on the left-hand side, and the value of the term on the right-hand side.  The solution can use the result from the Form Extraction feature of Amazon Textract to construct a key-value pair that links the name and the value of a contract term.

The following diagram illustrates the architecture for batch processing in Amazon Textract:

A pipeline processes incoming contracts using the asynchronous Amazon Textract APIs enabled by Amazon Simple Queue Service (Amazon SQS), a distributed message queuing service. The AWS Lambda function StartDocumentTextAnalysis initiates the Amazon Textract processing jobs, which are triggered when new files are deposited into the contract data lake in Amazon S3. Because contracts are loaded in batches, and the asynchronous API can accommodate PDF files natively without first converting to an image file format, the design choice was made to build an asynchronous process to improve scalability. The solution uses the DocumentAnalysis APIs instead of the DocumentTextDetection APIs so that recognition of tables and forms is enabled.

When a DocumentAnalysis job is complete, a JobID is returned and input into a second queue, where it is used to retrieve the completed output from Amazon Textract in the form of a JSON object. The response contains both the extracted text and document metadata within a large nested data structure. The GetDocumentTextAnalysis function handles the retrieval and storage of JSON outputs into an S3 bucket to await post-processing and term extraction.
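
To make the pipeline concrete, the sketch below shows the underlying Amazon Textract calls in boto3. It is a simplified stand-in for the StartDocumentTextAnalysis and GetDocumentTextAnalysis Lambda functions described above; the bucket, key, SNS topic, and role ARNs are placeholders.

```python
import boto3

textract = boto3.client("textract")

# Kick off an asynchronous analysis job for a contract PDF in the data lake.
start_response = textract.start_document_analysis(
    DocumentLocation={"S3Object": {"Bucket": "contract-data-lake", "Name": "contracts/contract-001.pdf"}},
    FeatureTypes=["TABLES", "FORMS"],  # enable table and form recognition
    NotificationChannel={
        "SNSTopicArn": "arn:aws:sns:us-east-1:111122223333:textract-job-status",
        "RoleArn": "arn:aws:iam::111122223333:role/TextractPublishToSNS",
    },
)
job_id = start_response["JobId"]

# Later (for example, in a Lambda triggered by the completion message),
# page through the results for that JobId.
blocks, next_token = [], None
while True:
    kwargs = {"JobId": job_id}
    if next_token:
        kwargs["NextToken"] = next_token
    result = textract.get_document_analysis(**kwargs)
    blocks.extend(result["Blocks"])
    next_token = result.get("NextToken")
    if not next_token:
        break
```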

Extracting standard and non-standard terms with Amazon Comprehend

The following diagram illustrates the architecture of key term extraction:

After the solution has deposited the processed contract output from Amazon Textract into an S3 bucket, a term extraction pipeline begins. The primary worker for this pipeline is the ExtractTermsFromTextractOutput function, which encodes the intelligence behind the term extraction.

There are a few key actions performed at this step:

  1. The contract is broken up into sections and basic attributes are extracted, such as the contract title and contract ID number.
  2. The standard contract terms are identified as key-value pairs. Using the table relationships uncovered in the Amazon Textract output, the solution can find the right term and value from tables containing key terms, while taking additional steps to convert the term value into the correct data format, such as parsing dates (a simplified sketch follows this list).
  3. There is a group of non-standard terms that can either be within a table or embedded within the free text in certain sections of the contract. The team used the Amazon Comprehend custom classification model to identify these sections of interest.
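
The following simplified sketch shows how key-value pairs can be read out of the Amazon Textract FORMS output referenced in step 2. The production ExtractTermsFromTextractOutput function handles many more cases (tables, date parsing, sectioning), so treat this as illustrative only.

```python
def get_text(block, blocks_by_id):
    """Concatenate the text of a block's child WORD blocks."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)

def extract_key_value_pairs(blocks):
    """Map Textract FORMS output (KEY_VALUE_SET blocks) to {term name: term value}."""
    blocks_by_id = {b["Id"]: b for b in blocks}
    pairs = {}
    for block in blocks:
        if block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get("EntityTypes", []):
            key_text = get_text(block, blocks_by_id)
            value_text = ""
            for rel in block.get("Relationships", []):
                if rel["Type"] == "VALUE":
                    for value_id in rel["Ids"]:
                        value_text = get_text(blocks_by_id[value_id], blocks_by_id)
            if key_text:
                pairs[key_text] = value_text
    return pairs

# terms = extract_key_value_pairs(blocks)  # e.g., blocks returned by get_document_analysis
```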

To produce the proper training data for the model, subject matter experts on the AWS FGBS team annotated a large historical set of contracts to identify examples of standard contract sections (that don’t contain any special terms) and non-standard contract sections (that contain special terms and language). An annotation platform hosted on an Amazon SageMaker instance displays individual contract sections that were split up previously using the ExtractTermsFromTextractOutput function so that a user could label the sections accordingly.

A final custom text classification model was trained to perform paragraph section classification, identifying non-standard sections with an F1 score over 85%, and was hosted using Amazon Comprehend. One key benefit of using Amazon Comprehend for text classification is the straightforward format requirements of the training data, because the service takes care of text preprocessing steps such as feature engineering and accounting for class imbalance. These are typical considerations that need to be addressed in custom text models.

After dividing contracts into sections, a subset of the paragraphs is passed to the custom classification model endpoint from the ExtractTermsFromTextractOutput function to detect the presence of non-standard terms. The final list of contract sections that were checked and sections that were flagged as non-standard is recorded in the DynamoDB table. This mechanism notifies an accountant to examine sections of interest that require human review to interpret non-standard contract language and pick out any terms that are worth recording.
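
Calling the hosted classifier from the term extraction function can be as simple as the sketch below; the endpoint ARN and the NON_STANDARD label name are hypothetical.

```python
import boto3

comprehend = boto3.client("comprehend")

# Hypothetical endpoint ARN for the custom classifier.
ENDPOINT_ARN = "arn:aws:comprehend:us-east-1:111122223333:document-classifier-endpoint/contract-sections"

def is_non_standard(section_text):
    """Return True when the custom classifier flags a contract section as non-standard."""
    response = comprehend.classify_document(Text=section_text, EndpointArn=ENDPOINT_ARN)
    top = max(response["Classes"], key=lambda c: c["Score"])
    return top["Name"] == "NON_STANDARD"
```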

After these three stages are complete (breaking down contract sections, extracting standard terms, and classifying non-standard sections using a custom model), a nested dictionary is created with key-value pairs of terms. Error checking is also built into this processing stage. For each contract, an error file is generated that specifies exactly which term extraction caused the error and the error type. This file is deposited into the S3 bucket where contract terms are stored so that users can trace back any failed extractions for a single contract to its individual term. A special error alert phrase is incorporated into the error logging to simplify searching through Amazon CloudWatch logs to locate points where extraction errors occurred. Ultimately, a JSON file containing up to 100 or more terms is generated for each contract and pushed into an intermediate S3 bucket.

The extracted data from each contract is sent to a DynamoDB database for storage and retrieval. DynamoDB serves as a durable data store for two reasons:

  • This key-value and document database offers more flexibility than a relational database because it can accept data structures such as large string values and nested key-value pairs without requiring a pre-defined schema, so new terms can easily be added to the database in the future as contracts evolve.
  • This fully managed database delivers single-digit millisecond performance at scale so the custom front end and reporting can be integrated on top of this service.

Additionally, DynamoDB supports continuous backups and point-in-time recovery, which enables data restoration in tables to any single second for the prior 35 days. This feature was crucial in earning trust from the team; it protects their critical data against accidental deletions or edits and provides business continuity of essential downstream tasks such as financial reporting. An important check to perform during data upload is the DynamoDB item-size limit, which is set at 400 KB. With longer contracts and long lists of extracted terms, some contract outputs exceed this limit, so a check is performed in the term extraction function to break up larger entries into multiple JSON objects.
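
A minimal sketch of that size check might look like the following; the exact splitting logic and attribute names used in the solution are not published, so this is purely illustrative.

```python
import json

DYNAMODB_ITEM_LIMIT = 400 * 1024  # DynamoDB caps a single item at 400 KB

def split_terms(contract_id, terms, limit=DYNAMODB_ITEM_LIMIT):
    """Split an extracted-terms dict into payloads that each fit within a DynamoDB item.

    Simplified sketch: the production check would also account for attribute-name
    overhead and other metadata stored alongside the terms.
    """
    chunks, current = [], {}
    for name, value in terms.items():
        candidate = {**current, name: value}
        if len(json.dumps(candidate).encode("utf-8")) > limit and current:
            chunks.append(current)
            current = {name: value}
        else:
            current = candidate
    if current:
        chunks.append(current)
    return [{"contract_id": contract_id, "part": i, "terms": c} for i, c in enumerate(chunks)]
```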

Secure data access and validation through a custom web UI

The following diagram illustrates the custom web user interface (UI) and reporting architecture:

To effectively interact with the extracted contract data, the team built a custom web UI. The web application, built using the Angular v8 framework, is hosted through Amazon S3 on Amazon CloudFront, a cloud content delivery network. When the Amazon CloudFront URL is accessed, the user is first authenticated and authorized against a user pool with strict permissions allowing them to view and interact with this UI. After validation, the session information is saved to make sure the user stays logged in for only a set amount of time.

On the landing page, the user can navigate to the three key user scenarios displayed as links on the navigation panel:

  • Search for Records – View and edit a record for a contract
  • View Record History – View the record edit history for a contract
  • Appstream Reports – Open the reporting dashboard hosted in Appstream

To view a certain record, the user is prompted to search for the file using the specific customer name or file name. In the former, the search returns the latest records for each contract with the same customer name. In the latter, the search only returns the latest record for a specific contract with that file name.

After the user identifies the right contract to examine or edit, the user can choose the file name and go to the Edit Record page. This displays the full list of terms for a contract and allows the user to edit, add to, or delete the information extracted from the contract. The latest record is retrieved with form fields for each term, which allows the user to choose the value to validate the data, and edit the data if errors are identified. The user can also add new fields by choosing Add New Field and entering a key-value pair for the custom term.

After updating the entry with edited or new fields, the user can choose the Update Item button. This triggers the data for the new record being passed from the frontend via Amazon API Gateway to the PostFromDynamoDB function using a POST method, generating a JSON file that is pushed to the S3 bucket holding all the extracted term data. This file triggers the same UpdateDynamoDB function that pushed the original term data to the DynamoDB table after the first Amazon Textract processing run.

After verification, the user can choose Delete Record to delete all versions of the contract consistently from DynamoDB and from the S3 buckets that store all the extracted contract data. This is initiated using the DELETE method, triggering the DeleteFromDynamoDB function that expunges the data. During contract data validation and editing, every update to a field that is pushed creates a new data record that automatically logs a timestamp and user identity, based on the login profile, to ensure fine-grained tracking of the edit history.

To take advantage of the edit tracking, the user can search for a contract and view the entire collection of records for a single contract, ordered by timestamp. This allows the user to compare edits made across time, including any custom fields that were added, and link those edits back to the editor identity.

To make sure the team could ultimately realize the business value from this solution, they added a reporting dashboard pipeline to the architecture. Amazon AppStream 2.0, a fully managed application streaming service, hosts an instance of the reporting application. AWS Glue crawlers build a schema for the data residing in Amazon S3, and Amazon Athena queries and loads the data into the Amazon AppStream instance and into the reporting application.

Given the sensitivity of this data for Amazon, the team implemented several security considerations so they could safely interact with the data without unauthorized third-party access. An Amazon Cognito authorizer secures the API Gateway. During user login, Amazon internal single sign-on with two-factor authentication is verified against an Amazon Cognito user pool that only contains relevant team members.

To view record history, the user can search for a contract and examine the entire collection of records for a single contract to review any erroneous or suspicious edits and trace back to the exact time of edit and identity of editor.

The team also took additional application-hardening steps to secure the architecture and front end. AWS roles and policies are narrowly scoped according to the principle of least privilege. For storage locations such as Amazon S3 and DynamoDB, access is locked down and protected with server-side encryption enabled to cover encryption at rest, all public access is blocked, and versioning and logging are enabled. To cover encryption in transit, VPC interface endpoints powered by AWS PrivateLink connect AWS services together so data is transmitted over private AWS endpoints rather than the public internet. In all the Lambda functions, temporary data handling best practices are adhered to by using the Python tempfile library to address time-of-check to time-of-use (TOCTOU) attacks, where attackers may pre-emptively place files at specified locations. Because Lambda is a serverless service, all data stored in a temp directory is deleted after invoking the Lambda function unless the data is pushed to another storage location, such as Amazon S3.
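
As an illustration of that tempfile pattern (the bucket and object names are placeholders), a Lambda handler might stage its working files like this:

```python
import tempfile

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Use an unpredictable, handler-private working directory instead of a fixed
    # path such as /tmp/contract.pdf, which reduces exposure to TOCTOU-style attacks.
    with tempfile.TemporaryDirectory() as workdir:
        local_path = f"{workdir}/contract.pdf"
        s3.download_file("contract-data-lake", "contracts/contract-001.pdf", local_path)
        # ... process the file, push results to S3 or DynamoDB ...
    # The directory and its contents are removed when the context manager exits.
    return {"status": "done"}
```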

Customer data privacy is a top concern for the AWS FGBS team and AWS service teams. Although incorporating additional data to further train ML models in AWS services like Amazon Textract and Amazon Comprehend is crucial to improving performance, you always retain the option to withhold your data. To prevent the contract data processed by Amazon Textract and Amazon Comprehend from being stored for model retraining, the AWS FGBS team requested a Customer Data Opt-Out by creating a customer support ticket.

To verify the integrity of the application given the sensitivity of the data it processes, both internal security reviews and penetration testing by external vendors were performed. This included static code review, dynamic fuzz testing, form validation, script injection and cross-site scripting (XSS) checks, and other items from the OWASP top 10 web application security risks. To address distributed denial of service (DDoS) attacks, throttling limits were set on API Gateway, and CloudWatch alarms were configured with a threshold on invocation counts.
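
The throttling and alarm configuration could look roughly like this; the API ID, stage, function name, and limits are placeholders.

import boto3

apigw = boto3.client("apigateway")
cloudwatch = boto3.client("cloudwatch")

# Apply default rate and burst limits to every method in a stage.
apigw.update_stage(
    restApiId="abc123",  # placeholder
    stageName="prod",
    patchOperations=[
        {"op": "replace", "path": "/*/*/throttling/rateLimit", "value": "50"},
        {"op": "replace", "path": "/*/*/throttling/burstLimit", "value": "100"},
    ],
)

# Alarm when a Lambda function is invoked more often than expected.
cloudwatch.put_metric_alarm(
    AlarmName="ContractAppInvocationSpike",
    Namespace="AWS/Lambda",
    MetricName="Invocations",
    Dimensions=[{"Name": "FunctionName", "Value": "PostFromDynamoDB"}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1000,
    ComparisonOperator="GreaterThanThreshold",
)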

After the security requirements were integrated into the application, the entire solution was codified into an AWS CloudFormation template so it could be launched into the team's production account. The CloudFormation template consists of nested stacks that delineate the key components of the application's infrastructure (front-end versus back-end services). A continuous deployment pipeline was set up using AWS CodeBuild to build and deploy any change to the application code or infrastructure. Development and production environments were created in two separate AWS accounts, with deployment into these environments managed by environment variables set in the respective accounts.
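
Launching the nested-stack solution into an account might then be a single CloudFormation call, sketched below with a placeholder stack name, template URL, and parameter.

import boto3

cloudformation = boto3.client("cloudformation")

cloudformation.create_stack(
    StackName="contract-processing-prod",  # placeholder
    TemplateURL="https://s3.amazonaws.com/example-templates/parent.yaml",  # placeholder parent template
    Parameters=[
        {"ParameterKey": "Environment", "ParameterValue": "production"},  # assumed parameter
    ],
    Capabilities=["CAPABILITY_NAMED_IAM"],
)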

Conclusion

The application is now live and being tested with hundreds of contracts every month. Most importantly, the business is beginning to realize the benefits of the time and effort saved by automating this routine, previously manual business process.

The AWS ProServe team is working with the AWS FGBS team on a second phase of the project to further enhance the solution. To improve the accuracy of term extraction, they're exploring pretrained and custom Amazon Comprehend named entity recognition (NER) models. They will implement a retraining pipeline so the platform's intelligence can improve over time with new data. They're also considering Amazon Augmented AI (Amazon A2I), a new human-review capability, to help review low-confidence Amazon Textract outputs and generate new training data for model improvement.

As AWS launches exciting new ML services, the AWS FGBS team hopes that solutions like this one will continue to replace legacy accounting practices as the team works toward its goal of modernizing business operations.

Get started today! Explore your use case with the services mentioned in this post and many others on the AWS Management Console.

About the Authors

Han Man is a Senior Data Scientist with AWS Professional Services. He has a PhD in engineering from Northwestern University and has several years of experience as a management consultant advising clients in manufacturing, financial services, and energy. Today he is passionately working with customers from a variety of industries to develop and implement machine learning & AI solutions on AWS. He enjoys following the NBA and playing basketball in his spare time.

AWS Finance and Global Business Services Team

Carly Huang is a Senior Financial Analyst at Amazon Accounting supporting AWS revenue. She holds a Bachelor of Business Administration from Simon Fraser University and is a Chartered Professional Accountant (CPA) with several years of experience working as an auditor focusing on technology and manufacturing clients. She is excited to find new ways to use AWS services to improve and simplify existing accounting processes. In her free time, she enjoys traveling and running.

Shonoy Agrawaal is a Senior Manager at Amazon Accounting supporting AWS revenue. He holds a Bachelor of Business Administration from the University of Washington and is a Certified Public Accountant with several years of experience working as an auditor focused on retail and financial services clients. Today, he supports the AWS business with accounting and financial reporting matters. In his free time, he enjoys traveling and spending time with family and friends.

AWS ProServe Team

Nithin Reddy Cheruku is a Sr. AI/ML architect with AWS Professional Services, helping customers digitally transform by leveraging emerging technologies. He likes to work on business and community problems and solve them in innovative ways. Outside of work, Nithin likes to play cricket and ping pong.

Huzaifa Zainuddin is a Cloud Infrastructure Architect with AWS Professional Services. He has several years of experience working in a variety of different technical roles. He currently works with customers to help design their infrastructure as well as deploy and scale applications on AWS. When he is not helping customers, he enjoys grilling, traveling, and playing the occasional video game.

Ananya Koduri is an Application Cloud Architect with the AWS Professional Services West Coast Applications Team. She holds a Master's degree in Computer Science and has been a consultant in the tech industry for five years, serving clients in the government, mining, and education sectors. In her current role, she works closely with clients to implement architectures on AWS. In her spare time, she enjoys long hikes and is a professional classical dancer.

Vivek Lakshmanan is a Data & Machine Learning Engineer at Amazon Web Services. He has a Master's degree in Software Engineering with a specialization in Data Science from San Jose State University. Vivek is excited about applying cutting-edge technologies and building AI/ML solutions for customers in the cloud. He is passionate about statistics, NLP, and model explainability in AI/ML. In his spare time, he enjoys playing cricket and taking unplanned road trips.

Read More

Meet Olivia: The first NTTS voice in Australian English for Amazon Polly

Meet Olivia: The first NTTS voice in Australian English for Amazon Polly

Amazon Polly is launching a new Australian English voice, Olivia. Amazon Polly turns text into lifelike speech, allowing you to build speech-enabled products. Building upon the existing Australian English Standard voices, Nicole and Russell, Olivia is the first Australian English voice in Amazon Polly powered by the Neural Text-to-Speech (NTTS) technology.

The NTTS voices in Amazon Polly are designed to create a great user experience that is engaging and similar to listening to a real person. While building Amazon Polly voices, our goal is not only to offer a natural, human-like sound, but also to give end-users a sense of personality.

As the first NTTS Australian English voice for Amazon Polly, Olivia reaches a new level of expressiveness that brings text to life. Olivia's sweet Australian accent is warm and welcoming, striking the right balance between casualness and professionalism. Olivia's bright personality and friendly tone provide an engaging user experience, which makes the voice suitable for a large variety of use cases such as voice bots, contact center IVRs, training videos, audiobooks, or long-form reading.

You can use Olivia via the Amazon Polly console, the AWS Command Line Interface (AWS CLI), or AWS SDK. The feature is available across all AWS Regions supporting NTTS. For more information, see What Is Amazon Polly? For the full list of available voices, see Voices in Amazon Polly, or log in to the Amazon Polly console to try it out for yourself! To get more control over the speech output, try SSML tags and tailor the voice to your needs.
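
For example, a minimal request with the AWS SDK for Python (Boto3) that synthesizes speech with Olivia using the neural engine:

import boto3

polly = boto3.client("polly")

response = polly.synthesize_speech(
    Text="G'day! My name is Olivia, the new Australian English voice.",
    VoiceId="Olivia",
    Engine="neural",
    OutputFormat="mp3",
)

# Save the returned audio stream to a local file.
with open("olivia_sample.mp3", "wb") as audio_file:
    audio_file.write(response["AudioStream"].read())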

About the Author

Sarah Schopper is a Language Engineer for English Text-to-Speech. At work, she delights customers with new voices for Amazon Polly. In her spare time, she enjoys playing board games and experimenting with new cooking ingredients.

Read More

The Machine Learning Behind Hum to Search

The Machine Learning Behind Hum to Search

Posted by Christian Frank, Google Research, Zürich

Melodies stuck in your head, often referred to as “earworms,” are a well-known and sometimes irritating phenomenon — once that earworm is there, it can be tough to get rid of it. Research has found that engaging with the original song, whether that’s listening to or singing it, will drive the earworm away. But what if you can’t quite recall the name of the song, and can only hum the melody?

Existing methods to match a hummed melody to its original polyphonic studio recording face several challenges. With lyrics, background vocals and instruments, the audio of a musical or studio recording can be quite different from a hummed tune. By mistake or design, when someone hums their interpretation of a song, often the pitch, key, tempo or rhythm may vary slightly or even significantly. That’s why so many existing approaches to query by humming match the hummed tune against a database of pre-existing melody-only or hummed versions of a song, instead of identifying the song directly. However, this type of approach often relies on a limited database that requires manual updates.

Launched in October, Hum to Search is a new fully machine-learned system within Google Search that allows a person to find a song using only a hummed rendition of it. In contrast to existing methods, this approach produces an embedding of a melody from a spectrogram of a song without generating an intermediate representation. This enables the model to match a hummed melody directly to the original (polyphonic) recordings without the need for a hummed or MIDI version of each track or for other complex hand-engineered logic to extract the melody. This approach greatly simplifies the database for Hum to Search, allowing it to constantly be refreshed with embeddings of original recordings from across the world — even the latest releases.

Background
Many existing music recognition systems convert an audio sample into a spectrogram before processing it, in order to find a good match. However, one challenge in recognizing a hummed melody is that a hummed tune often contains relatively little information, as illustrated by this hummed example of Bella Ciao. The difference between the hummed version and the same segment from the corresponding studio recording can be visualized using spectrograms, seen below:

Visualization of a hummed clip and a matching studio recording.

Given the image on the left, a model needs to locate the audio corresponding to the right-hand image from a collection of over 50M similar-looking images (corresponding to segments of studio recordings of other songs). To achieve this, the model has to learn to focus on the dominant melody, and ignore background vocals, instruments, and voice timbre, as well as differences stemming from background noise or room reverberations. To find by eye the dominant melody that might be used to match these two spectrograms, a person might look for similarities in the lines near the bottom of the above images.

Prior efforts to enable discovery of music, in particular in the context of recognizing recorded music being played in an environment such as a cafe or a club, demonstrated how machine learning might be applied to this problem. Now Playing, released to Pixel phones in 2017, uses an on-device deep neural network to recognize songs without the need for a server connection, and Sound Search further developed this technology to provide a server-based recognition service for faster and more accurate searching of over 100 million songs. The next challenge then was to leverage what was learned from these releases to recognize hummed or sung music from a similarly large library of songs.

Machine Learning Setup
The first step in developing Hum to Search was to modify the music-recognition models used in Now Playing and Sound Search to work with hummed recordings. In principle, many such retrieval systems (e.g., image recognition) work in a similar way. A neural network is trained with pairs of input (here pairs of hummed or sung audio with recorded audio) to produce embeddings for each input, which will later be used for matching to a hummed melody.

Training setup for the neural network

To enable humming recognition, the network should produce embeddings for which pairs of audio containing the same melody are close to each other, even if they have different instrumental accompaniment and singing voices. Pairs of audio containing different melodies should be far apart. In training, the network is provided such pairs of audio until it learns to produce embeddings with this property.

The trained model can then generate an embedding for a tune that is similar to the embedding of the song’s reference recording. Finding the correct song is then only a matter of searching for similar embeddings from a database of reference recordings computed from audio of popular music.
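
As a toy sketch of the retrieval step, assuming cosine similarity over the embeddings (the post does not specify the metric or index used in production):

import numpy as np

def find_best_matches(hum_embedding, reference_embeddings, song_ids, top_k=5):
    """Return the reference recordings whose embeddings are closest
    (by cosine similarity) to the embedding of the hummed query."""
    refs = reference_embeddings / np.linalg.norm(reference_embeddings, axis=1, keepdims=True)
    query = hum_embedding / np.linalg.norm(hum_embedding)
    similarities = refs @ query
    best = np.argsort(similarities)[::-1][:top_k]
    return [(song_ids[i], float(similarities[i])) for i in best]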

Training Data
Because training of the model required song pairs (recorded and sung), the first challenge was to obtain enough training data. Our initial dataset consisted of mostly sung music segments (very few of these contained humming). To make the model more robust, we augmented the audio during training, for example by varying the pitch or tempo of the sung input randomly. The resulting model worked well enough for people singing, but not for people humming or whistling.
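
A hedged sketch of that kind of augmentation, using librosa purely for illustration; the pitch and tempo ranges are made up, not the ones used in training.

import numpy as np
import librosa

def augment_sung_clip(audio, sample_rate):
    """Randomly vary the pitch and tempo of a sung training clip."""
    semitones = np.random.uniform(-2.0, 2.0)  # illustrative range
    rate = np.random.uniform(0.9, 1.1)        # illustrative range
    shifted = librosa.effects.pitch_shift(audio, sr=sample_rate, n_steps=semitones)
    return librosa.effects.time_stretch(shifted, rate=rate)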

To improve the model’s performance on hummed melodies we generated additional training data of simulated “hummed” melodies from the existing audio dataset using SPICE, a pitch extraction model developed by our wider team as part of the FreddieMeter project. SPICE extracts the pitch values from given audio, which we then use to generate a melody consisting of discrete audio tones. The very first version of this system transformed this original clip into these tones.

Generating hummed audio from sung audio
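
A crude stand-in for the simple tone generator described above: turning a sequence of per-frame pitch values into plain sine tones (the frame length and sample rate are arbitrary choices for illustration).

import numpy as np

def pitches_to_tones(pitch_hz, frame_seconds=0.032, sample_rate=16000):
    """Convert per-frame pitch values (Hz) into a melody of discrete sine tones.
    Unvoiced frames can be passed as 0 to produce silence."""
    frame_len = int(frame_seconds * sample_rate)
    t = np.arange(frame_len) / sample_rate
    frames = [
        np.sin(2 * np.pi * f0 * t) if f0 > 0 else np.zeros(frame_len)
        for f0 in pitch_hz
    ]
    return np.concatenate(frames)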

We later refined this approach by replacing the simple tone generator with a neural network that generates audio resembling an actual hummed or whistled tune. For example, the network generates this humming example or whistling example from the above sung clip.

As a final step, we created additional training pairs by mixing and matching the audio samples. For example, if we had similar clips from two different singers, we'd align those two clips with our preliminary models and could therefore show the model an additional pair of audio clips representing the same melody.

Machine Learning Improvements
When training the Hum to Search model, we started with a triplet loss function. This loss has been shown to perform well across a variety of classification tasks, such as image and recorded-music recognition. Given a pair of audio samples corresponding to the same melody (points R and P in the embedding space shown below), triplet loss ignores certain training examples derived from a different melody: those that are too 'easy', in that they are already far away from R and P (see point E), and those that are too hard, in that, given the model's current state of learning, the audio ends up too close to R even though our data says it represents a different melody (see point H). Ignoring these examples helps the model learn more effectively.

Example audio segments visualized as points in embedding space
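
For reference, the standard triplet loss on embedding vectors looks like the following numpy sketch; the margin value is illustrative.

import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Pull the anchor (R) toward the positive (P) and push it away from the
    negative, contributing nothing once the negative is farther away than
    the margin."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)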

We’ve found that we could improve the accuracy of the model by taking these additional training data (points H and E) into account, namely by formulating a general notion of model confidence across a batch of examples: How sure is the model that all the data it has seen can be classified correctly, or has it seen examples that do not fit its current understanding? Based on this notion of confidence, we added a loss that drives model confidence towards 100% across all areas of the embedding space, which led to improvements in our model’s precision and recall.

The above changes, in particular the variations, augmentations, and superpositions of the training data, enabled the neural network model deployed in Google Search to recognize sung or hummed melodies. The current system reaches a high level of accuracy on a song database that contains over half a million songs and that we are continually updating. This song corpus still has room to grow to include more of the world's many melodies.

Hum to Search in the Google App

To try the feature, open the latest version of the Google app, tap the mic icon, and say "what's this song?" or tap the "Search a song" button; then hum, sing, or whistle away! We hope that Hum to Search can help with that earworm of yours, or simply help you find and play back a song without having to type its name.

Acknowledgements
The work described here was authored by Alex Tudor, Duc Dung Nguyen, Matej Kastelic‎, Mihajlo Velimirović‎, Stefan Christoph, Mauricio Zuluaga, Christian Frank, Dominik Roblek, and Matt Sharifi. We would like to deeply thank Krishna Kumar, Satyajeet Salgar and Blaise Aguera y Arcas for their ongoing support, as well as all the Google teams we’ve collaborated with to build the full Hum to Search product.

We would also like to thank all our colleagues at Google who donated clips of themselves singing or humming and therefore laid a foundation for this work, as well as Nick Moukhine‎ for building the Google-internal singing donation app. Finally, special thanks to Meghan Danks and Krishna Kumar for their feedback on earlier versions of this post.

Read More