Expanding our ML-based flood forecasting

In 2018 we began our flood forecasting initiative to help combat the catastrophic damage from floods each year by equipping those in harm’s way with accurate and detailed alerts. This work is part of Google’s broader Crisis Response program, which provides people access to trusted information and resources in critical moments. For over a decade, our Crisis Response team has been partnering with frontline and emergency workers to develop technology and programs that help keep people safe, informed and out of harm’s way.

Expanding our forecasting reach

In the first three years, we expanded our program to cover much of India and Bangladesh, working in partnership with the Indian Central Water Commission and with the Bangladesh Water Development Board, covering an area with about 220 million people and sending out 40 million potentially life-saving alerts. In 2021, our operational systems were further expanded to cover an area with over 360 million people. Thanks to better flood prediction technology, we sent out over 115 million alerts, about triple the number we previously sent out.

Coverage areas of our current operational flood forecasting systems. In these areas, we use our models to help government alerts reach the right people. In some areas we have also increased lead time and spatial accuracy.

We’re hyper-focused on making alerts more local, accessible, actionable and accurate: the more information we can offer about upcoming floods, the better and more timely the decisions people can make. Most global flood alerts only provide information on how much a river will rise (e.g., 30 cm), which doesn’t always tell people what that would mean for them and their village. Our flood alerts display inundation maps, which show the extent and depth of flooding right on top of Google Maps, so people can visualize this critical information more easily. Our new manifold inundation model and advances across all models allow us to scale up significantly and provide this information to many more people (and we’ll share more about this technology in the near future).

Google Flood alerts

We recently launched the Google Flood Hub to make this flood data even more hyper-local. It allows you to zoom into our inundation maps where you can find information about the same flood, and focus on highly specific areas, such as a village. The Flood Hub provides the same depth and flood extent information in a more visual format that helps people to understand the current and forecasted flood situation in their area instantly. This site will be our primary resource for local, visual forecast information moving forward.

The Google Flood Hub user interface on a mobile device

We’ve also partnered with multiple local aid organizations such as the Federation of Red Cross and Red Crescent Societies, the Indian Red Cross Society (IRCS), the Bangladesh Red Crescent Society (BDRCS) and Yuganter to help get the alerts out even to people without smartphones or internet access. We worked closely with the organizations’ local teams, who traveled between villages to train locals. The training included deeper explanations of how to read the Google alerts and flood maps, as well as how to act and notify others once an alert is issued.

Our flood forecasting system is now live in all of India and Bangladesh, and we are working to expand these life-saving alerts to countries in South Asia and South America. And eventually, we want them to be available everywhere.

On-device training in TensorFlow Lite

Posted by the TensorFlow Lite team

TensorFlow Lite is Google’s machine learning framework to deploy machine learning models on multiple devices and surfaces such as mobile (iOS and Android), desktops and other edge devices. Recently, we added support to run TensorFlow Lite models in a browser as well. In order to build apps using TensorFlow Lite, you can either use an off-the-shelf model from TensorFlow Hub, or convert an existing TensorFlow model to a TensorFlow Lite model using the converter. Once the model is deployed in an app, you can run inference on the model based on input data.

TensorFlow Lite now supports training your models on-device, in addition to running inference. On-device training enables interesting personalization use cases where models can be fine-tuned based on user needs. For instance, you could deploy an image classification model and allow a user to fine-tune the model to recognize bird species using transfer learning, while allowing another user to retrain the same model to recognize fruits. This new feature is available in TensorFlow 2.7 and later and is currently available for Android apps. (iOS support will be added in the future.)

On-device training is also a necessary foundation for Federated Learning use cases to train global models on decentralized data. This blog post does not cover Federated Learning and instead focuses on helping you integrate on-device training in your Android apps.

Later in this article we will reference a Colab and Android sample app as we walk you through the end-to-end implementation path for on-device learning to fine-tune an image classification model.

Improvements over the earlier approach

In our 2019 blog post, we introduced on-device training concepts and an example of on-device training in TensorFlow Lite. However, there were several limitations. For example, it was not easy to customize the model structure and optimizers. You also had to deal with multiple physical TensorFlow Lite (.tflite) models instead of a single TensorFlow Lite model. Similarly, there was no easy way to store and update the training weights. Our latest TensorFlow Lite version streamlines this process by providing more convenient options for on-device training, as explained below.

How does it work?

In order to deploy a TensorFlow Lite model with on-device training built-in, here are the high level steps:

  • Build a TensorFlow model for training and inference
  • Convert the TensorFlow model to TensorFlow Lite format
  • Integrate the model in your Android app
  • Invoke model training in the app, similar to how you would invoke model inference

These steps are explained below.

Build a TensorFlow model for training and inference

The TensorFlow Lite model should not only support model inference, but also model training, which typically involves saving the model’s weights to the file system and restoring the weights from the file system. This is done to save the training weights after each training epoch, so that the next training epoch can use the weights from the previous one, instead of starting training from scratch.

Our suggested approach is to implement these tf.functions to represent training, inference, saving weights, and loading weights:

  • A train function that trains the model using training data. The train function below makes a prediction, calculates the loss (or error), and uses tf.GradientTape() to record operations for automatic differentiation and update the model’s parameters.
    # The `train` function takes a batch of input images and labels.
    @tf.function(input_signature=[
        tf.TensorSpec([None, IMG_SIZE, IMG_SIZE], tf.float32),
        tf.TensorSpec([None, 10], tf.float32),
    ])
    def train(self, x, y):
        with tf.GradientTape() as tape:
            prediction = self.model(x)
            # Assumes a Keras-style loss, which takes (y_true, y_pred).
            loss = self._LOSS_FN(y, prediction)
        gradients = tape.gradient(loss, self.model.trainable_variables)
        self._OPTIM.apply_gradients(
            zip(gradients, self.model.trainable_variables))
        # Return the loss and every gradient, keyed by name.
        result = {"loss": loss}
        for grad in gradients:
            result[grad.name] = grad
        return result
  • An infer or a predict function that invokes model inference. This is similar to how you currently use TensorFlow Lite for inference.
    @tf.function(input_signature=[tf.TensorSpec([None, IMG_SIZE, IMG_SIZE], tf.float32)])
    def predict(self, x):
        return {
            "output": self.model(x)
        }
  • A save/restore function that saves training weights (i.e., parameters used by the model) in Checkpoints format to the file system. The save function’s code is shown below.
    @tf.function(input_signature=[tf.TensorSpec(shape=[], dtype=tf.string)])
    def save(self, checkpoint_path):
        tensor_names = [weight.name for weight in self.model.weights]
        tensors_to_save = [weight.read_value() for weight in self.model.weights]
        tf.raw_ops.Save(
            filename=checkpoint_path, tensor_names=tensor_names,
            data=tensors_to_save, name='save')
        return {
            "checkpoint_path": checkpoint_path
        }
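The train, save, and restore cycle described above can be illustrated without TensorFlow. The following is a minimal sketch, assuming a two-parameter linear model and a JSON file as a stand-in for the checkpoint format. The names `train_step`, `save_weights`, `restore_weights`, and `run_epochs` are hypothetical and are not TensorFlow Lite APIs.

```python
import json
import os
import tempfile


def train_step(weights, xs, ys, lr=0.1):
    """One gradient-descent step for y = w * x + b with mean squared error."""
    n = len(xs)
    preds = [weights["w"] * x + weights["b"] for x in xs]
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
    grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
    grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
    new_weights = {"w": weights["w"] - lr * grad_w,
                   "b": weights["b"] - lr * grad_b}
    return new_weights, loss


def save_weights(weights, path):
    """Write the weights to disk (JSON here, purely for illustration)."""
    with open(path, "w") as f:
        json.dump(weights, f)


def restore_weights(path):
    """Read back weights written by save_weights."""
    with open(path) as f:
        return json.load(f)


def run_epochs(num_epochs, checkpoint_path, xs, ys):
    """Each epoch restores the previous weights (if any), trains, then saves."""
    losses = []
    for _ in range(num_epochs):
        if os.path.exists(checkpoint_path):
            weights = restore_weights(checkpoint_path)
        else:
            weights = {"w": 0.0, "b": 0.0}  # first epoch starts from scratch
        weights, loss = train_step(weights, xs, ys)
        save_weights(weights, checkpoint_path)
        losses.append(loss)
    return losses


if __name__ == "__main__":
    xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]  # samples of y = 2x + 1
    with tempfile.TemporaryDirectory() as d:
        losses = run_epochs(20, os.path.join(d, "ckpt.json"), xs, ys)
    print("first loss:", losses[0], "last loss:", losses[-1])
```

Because every epoch starts from the weights saved by the previous one, the loss keeps decreasing across epochs instead of resetting each time.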

Convert to TensorFlow Lite format

You may already be familiar with the workflow to convert your TensorFlow model to the TensorFlow Lite format. Some of the low level features for on-device training (e.g., variables to store the model parameters) are still experimental, and others (e.g., weight serialization) currently rely on TF Select operators, so you will need to set these flags during conversion. You can find an example of all the flags you need to set in the Colab.

import tensorflow as tf

# Convert the model
converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite ops.
    tf.lite.OpsSet.SELECT_TF_OPS  # enable TensorFlow ops.
]
converter.experimental_enable_resource_variables = True
tflite_model = converter.convert()

Integrate the model in your Android app

Once you have converted your model to the TensorFlow Lite format, you’re ready to integrate the model into your app! Refer to the Android app samples for more details.

Invoke model training and inference in app

On Android, TensorFlow Lite on-device training can be performed using either Java or C++ APIs. You can create an instance of the TensorFlow Lite Interpreter to load a model and drive model training tasks. We had previously defined multiple tf.functions: these functions can be invoked using TensorFlow Lite’s support for Signatures, which allow a single TensorFlow Lite model to support multiple ‘entry’ points. For example, we had defined a train function for on-device training, which is one of the model’s signatures. The train function can be invoked using TensorFlow Lite’s runSignature method by specifying the name of the signature (‘train’):

// Run training for a few steps.
float[] losses = new float[NUM_EPOCHS];
for (int epoch = 0; epoch < NUM_EPOCHS; ++epoch) {
    for (int batchIdx = 0; batchIdx < NUM_BATCHES; ++batchIdx) {
        Map<String, Object> inputs = new HashMap<>();
        inputs.put("x", trainImageBatches.get(batchIdx));
        inputs.put("y", trainLabelBatches.get(batchIdx));

        Map<String, Object> outputs = new HashMap<>();
        FloatBuffer loss = FloatBuffer.allocate(1);
        outputs.put("loss", loss);

        interpreter.runSignature(inputs, outputs, "train");

        // Record the last loss.
        if (batchIdx == NUM_BATCHES - 1) losses[epoch] = loss.get(0);
    }
}


Similarly, the following example shows how to invoke inference using the model’s ‘infer’ signature:

try (Interpreter anotherInterpreter = new Interpreter(modelBuffer)) {
    // Restore the weights from the checkpoint file.

    int NUM_TESTS = 10;
    // Direct buffers are allocated in bytes (4 per float), then viewed as floats.
    FloatBuffer testImages = ByteBuffer.allocateDirect(NUM_TESTS * 28 * 28 * 4)
            .order(ByteOrder.nativeOrder()).asFloatBuffer();
    FloatBuffer output = ByteBuffer.allocateDirect(NUM_TESTS * 10 * 4)
            .order(ByteOrder.nativeOrder()).asFloatBuffer();

    // Fill the test data.

    // Run the inference.
    Map<String, Object> inputs = new HashMap<>();
    inputs.put("x", testImages.rewind());
    Map<String, Object> outputs = new HashMap<>();
    outputs.put("output", output);
    anotherInterpreter.runSignature(inputs, outputs, "infer");
    output.rewind();

    // Process the result to get the final category values.
    int[] testLabels = new int[NUM_TESTS];
    for (int i = 0; i < NUM_TESTS; ++i) {
        // Track the index of the highest score for test image i.
        int index = 0;
        for (int j = 1; j < 10; ++j) {
            if (output.get(i * 10 + index) < output.get(i * 10 + j)) {
                index = j;
            }
        }
        testLabels[i] = index;
    }
}
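The post-processing step at the end of the inference snippet picks, for each test image, the index of the highest score in a flat output buffer. A minimal Python sketch of that same argmax-per-row logic is shown here for clarity. The function name `argmax_per_row` is hypothetical, and the flat list stands in for the `FloatBuffer`.

```python
def argmax_per_row(flat_scores, num_classes):
    """Return the index of the largest score in each row of a flattened matrix.

    flat_scores holds rows of num_classes scores laid out back to back,
    the same layout as the flat output buffer in the inference example.
    On ties, the lowest index wins.
    """
    if num_classes <= 0 or len(flat_scores) % num_classes != 0:
        raise ValueError(
            "flat_scores length must be a positive multiple of num_classes")
    labels = []
    for start in range(0, len(flat_scores), num_classes):
        row = flat_scores[start:start + num_classes]
        best = 0
        for j in range(1, num_classes):
            if row[j] > row[best]:
                best = j
        labels.append(best)
    return labels


if __name__ == "__main__":
    # Two samples, three classes each.
    print(argmax_per_row([0.1, 0.7, 0.2, 0.5, 0.3, 0.2], 3))
```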

And that’s it! You now have a TensorFlow Lite model that is able to use on-device training. We hope that this code walkthrough gives you a good idea of how to run on-device training in TensorFlow Lite, and we’re excited to see where you take it.

Practical considerations

In theory, you should be able to apply on-device training in TensorFlow Lite to any use case that TensorFlow supports. However, in reality there are a few practical considerations that you need to keep in mind before you deploy on-device training in your apps:

  • Use cases: The Colab example shows an example of on-device training for a vision use case. If you run into issues for specific models or use cases, please let us know on GitHub.
  • Performance: Depending on the use case, on-device training could take anywhere from a few seconds to much longer. If you run on-device training as part of a user-facing feature (e.g., your end user is interacting with the feature), you should measure the time taken for a wide range of possible training inputs in your app to limit the training time. If your use-case requires very long on-device training times, consider training a model using a desktop or the cloud first, then fine-tuning it on-device.
  • Battery usage: Just like model inference, invoking model training on device may result in a battery drain. If model training is part of a feature that is not user facing, we recommend following Android’s guidelines to implement background tasks.
  • Training from scratch vs. retraining: In theory, it should be possible to train a model from scratch on device using the above features. However, in reality, training from scratch involves an enormous amount of training data and may take several days even on servers with powerful processors. Consequently, for on-device applications, we recommend retraining on an already trained model (i.e., transfer learning) as shown in the Colab example.
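One way to act on the performance consideration above is to measure training time and stop once a fixed budget is spent. The following is a minimal sketch, assuming a caller-supplied `train_step` callable and an iterable of batches. `train_with_budget` and its parameters are hypothetical names, not part of TensorFlow Lite.

```python
import time


def train_with_budget(train_step, batches, budget_seconds,
                      clock=time.perf_counter):
    """Run train_step over batches until done or the time budget is spent.

    The budget is checked before each step, so a step that has already
    started is always allowed to finish.
    Returns (steps_completed, elapsed_seconds).
    """
    start = clock()
    steps = 0
    for batch in batches:
        if clock() - start >= budget_seconds:
            break
        train_step(batch)
        steps += 1
    return steps, clock() - start


if __name__ == "__main__":
    def slow_step(batch):
        time.sleep(0.01)  # stand-in for one real training step

    steps, elapsed = train_with_budget(slow_step, range(1000), 0.1)
    print(f"completed {steps} steps in {elapsed:.3f} s")
```

The `clock` parameter exists only so the timing logic can be exercised deterministically. In an app, the per-step times collected this way could inform how many batches to schedule per session.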

Roadmap

Future work includes (but is not limited to) on-device training support on iOS; performance improvements to leverage on-device accelerators (e.g., GPUs) for on-device training; reducing the binary size by implementing more training ops natively in TensorFlow Lite; higher-level API support (e.g., via the TensorFlow Lite Task Library) to abstract away the implementation details; and examples covering other on-device training use cases (e.g., NLP). Our long-term roadmap involves potentially providing on-device end-to-end Federated Learning solutions.

Next steps

Thank you for reading! We’re excited to see what you build using on-device learning. Once again, here are links to the sample app and Colab. If you have any feedback, please let us know on the TensorFlow Forum, or on GitHub.

Acknowledgements

This post reflects the significant contributions of many people in Google’s TensorFlow Lite team including Michelle Carney, Lawrence Chan, Jaesung Chung, Jared Duke, Terry Heo, Jared Lim, Yu-Cheng Ling, Thai Nguyen, Karim Nosseir, Arun Venkatesan, Haoliang Zhang, other TensorFlow Lite team members, and our collaborators in Google Research.

Enhanced Sleep Sensing in Nest Hub

Posted by Michael Dixon, Software Engineer and Reena Singhal Lee, Product Manager, Google Health

Earlier this year, we launched Contactless Sleep Sensing in Nest Hub, an opt-in feature that can help users better understand their sleep patterns and nighttime wellness. While some of the most critical sleep insights can be derived from a person’s overall schedule and duration of sleep, that alone does not tell the complete story. The human brain has special neurocircuitry to coordinate sleep cycles — transitions between deep, light, and rapid eye movement (REM) stages of sleep — vital not only for physical and emotional wellbeing, but also for optimal physical and cognitive performance. Combining such sleep staging information with disturbance events can help you better understand what’s happening while you’re sleeping.

Today we announced enhancements to Sleep Sensing that provide deeper sleep insights. While not intended for medical purposes¹, these enhancements allow better understanding of sleep through sleep stages and the separation of the user’s coughs and snores from other sounds in the room. Here we describe how we developed these novel technologies, through transfer learning techniques to estimate sleep stages and sensor fusion of radar and microphone signals to disambiguate the source of sleep disturbances.

To help people understand their sleep patterns, Nest Hub displays a hypnogram, plotting the user’s sleep stages over the course of a sleep session. Potential sound disturbances during sleep will now include “Other sounds” in the timeline to separate the user’s coughs and snores from other sound disturbances detected from sources in the room outside of the calibrated sleeping area.

Training and Evaluating the Sleep Staging Classification Model

Most people cycle through sleep stages 4-6 times a night, about every 80-120 minutes, sometimes with a brief awakening between cycles. Recognizing the value for users to understand their sleep stages, we have extended Nest Hub’s sleep-wake algorithms using Soli to distinguish between light, deep, and REM sleep. We employed a design that is generally similar to Nest Hub’s original sleep detection algorithm: sliding windows of raw radar samples are processed to produce spectrogram features, and these are continuously fed into a TensorFlow Lite model. The key difference is that this new model was trained to predict sleep stages rather than simple sleep-wake status, and thus required new data and a more sophisticated training process.
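The idea of turning sliding windows of samples into spectrogram features can be sketched in a few lines. This is an illustration only, not Nest Hub’s pipeline: the window size, hop, and function names are assumptions, a plain DFT is used instead of an FFT, and no window function is applied.

```python
import cmath
import math


def sliding_windows(samples, window_size, hop):
    """Yield successive windows of window_size samples, advancing by hop."""
    for start in range(0, len(samples) - window_size + 1, hop):
        yield samples[start:start + window_size]


def magnitude_spectrum(window):
    """Magnitudes of the first N//2 + 1 DFT bins of a real-valued window."""
    n = len(window)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(window)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, window_size=32, hop=16):
    """One magnitude spectrum per window, i.e. a list of time frames."""
    return [magnitude_spectrum(w)
            for w in sliding_windows(samples, window_size, hop)]


if __name__ == "__main__":
    # A sine wave that completes 4 cycles in every 32-sample window.
    signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(64)]
    frames = spectrogram(signal)
    peak_bins = [frame.index(max(frame)) for frame in frames]
    print(len(frames), "frames, peak bins:", peak_bins)
```

Each frame is a feature vector for one time window, and a model would consume these frames in sequence as new samples arrive.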

In order to assemble a rich and diverse dataset suitable for training high-performing ML models, we leveraged existing non-radar datasets and applied transfer learning techniques to train the model. The gold standard for identifying sleep stages is polysomnography (PSG), which employs an array of wearable sensors to monitor a number of body functions during sleep, such as brain activity, heartbeat, respiration, eye movement, and motion. These signals can then be interpreted by trained sleep technologists to determine sleep stages.

To develop our model, we used publicly available data from the Sleep Heart Health Study (SHHS) and Multi-ethnic Study of Atherosclerosis (MESA) studies with over 10,000 sessions of raw PSG sensor data with corresponding sleep staging ground-truth labels, from the National Sleep Research Resource. The thoracic respiratory inductance plethysmography (RIP) sensor data within these PSG datasets is collected through a strap worn around the patient’s chest to measure motion due to breathing. While this is a very different sensing modality from radar, both RIP and radar provide signals that can be used to characterize a participant’s breathing and movement. This similarity between the two domains makes it possible to leverage a plethysmography-based model and adapt it to work with radar.

To do so, we first computed spectrograms from the RIP time series signals and used these as features to train a convolutional neural network (CNN) to predict the ground-truth sleep stages. This model successfully learned to identify breathing and motion patterns in the RIP signal that could be used to distinguish between different sleep stages. This indicated to us that the same should also be possible when using radar-based signals.

To test the generality of this model, we substituted similar spectrogram features computed from Nest Hub’s Soli sensor and evaluated how well the model was able to generalize to a different sensing modality. As expected, the model trained to predict sleep stages from a plethysmograph sensor was much less accurate when given radar sensor data instead. However, the model still performed much better than chance, which demonstrated that it had learned features that were relevant across both domains.

To improve on this, we collected a smaller secondary dataset of radar sensor data with corresponding PSG-based ground-truth labels, and then used a portion of this dataset to fine-tune the weights of the initial model. This smaller amount of additional training data allowed the model to adapt the original features it had learned from plethysmography-based sleep staging and successfully generalize them to our domain. When evaluated on an unseen test set of new radar data, we found the fine-tuned model produced sleep staging results comparable to those of other consumer sleep trackers.

The custom ML model efficiently processes a continuous stream of 3D radar tensors (as shown in the spectrogram at the top of the figure) to automatically compute probabilities of each sleep stage — REM, light, and deep — or detect if the user is awake or restless.

More Intelligent Audio Sensing Through Audio Source Separation

Soli-based sleep tracking gives users a convenient and reliable way to see how much sleep they are getting and when sleep disruptions occur. However, to understand and improve their sleep, users also need to understand why their sleep may be disrupted. We’ve previously discussed how Nest Hub can help monitor coughing and snoring, frequent sources of sleep disturbances of which people are often unaware. To provide deeper insight into these disturbances, it is important to understand if the snores and coughs detected are your own.

The original algorithms on Nest Hub used an on-device, CNN-based detector to process Nest Hub’s microphone signal and detect coughing or snoring events, but this audio-only approach did not attempt to distinguish from where a sound originated. Combining audio sensing with Soli-based motion and breathing cues, we updated our algorithms to separate sleep disturbances from the user-specified sleeping area versus other sources in the room. For example, when the primary user is snoring, the snoring in the audio signal will correspond closely with the inhalations and exhalations detected by Nest Hub’s radar sensor. Conversely, when snoring is detected outside the calibrated sleeping area, the two signals will vary independently. When Nest Hub detects coughing or snoring but determines that there is insufficient correlation between the audio and motion features, it will exclude these events from the user’s coughing or snoring timeline and instead note them as “Other sounds” on Nest Hub’s display. The updated model continues to use entirely on-device audio processing with privacy-preserving analysis, with no raw audio data sent to Google’s servers. A user can then opt to save the outputs of the processing (sound occurrences, such as the number of coughs and snore minutes) in Google Fit, in order to view their night time wellness over time.
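The correlation test described above can be illustrated with a simple statistic. The sketch below uses the Pearson correlation between an audio loudness envelope and a breathing-motion signal, with a fixed threshold. Both the choice of Pearson correlation and the threshold of 0.5 are assumptions made for illustration; the post does not specify how Nest Hub measures correlation.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat signal carries no correlation evidence
    return cov / math.sqrt(var_a * var_b)


def attribute_sound(audio_envelope, breathing_signal, threshold=0.5):
    """Label a detected event by how closely the audio tracks breathing.

    The absolute value is used because the sign of the relationship depends
    on how the motion signal is defined (hypothetical convention).
    """
    r = pearson(audio_envelope, breathing_signal)
    return "user" if abs(r) >= threshold else "other sounds"


if __name__ == "__main__":
    breathing = [math.sin(2 * math.pi * t / 20) for t in range(200)]
    synced_snore = [0.8 * b + 0.1 for b in breathing]
    unrelated = [math.sin(2 * math.pi * t / 8) for t in range(200)]
    print(attribute_sound(synced_snore, breathing))
    print(attribute_sound(unrelated, breathing))
```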

Snoring sounds that are synchronized with the user’s breathing pattern (left) will be displayed in the user’s Nest Hub’s Snoring timeline. Snoring sounds that do not align with the user’s breathing pattern (right) will be displayed in Nest Hub’s “Other sounds” timeline.

Since Nest Hub with Sleep Sensing launched, researchers have expressed interest in investigational studies using Nest Hub’s digital quantification of nighttime cough. For example, a small feasibility study supported by the Cystic Fibrosis Foundation² is currently underway to evaluate the feasibility of measuring night time cough using Nest Hub in families of children with cystic fibrosis (CF), a rare inherited disease, which can result in a chronic cough due to mucus in the lungs. Researchers are exploring if quantifying cough at night could be a proxy for monitoring response to treatment.

Conclusion

Based on privacy-preserving radar and audio signals, these improved sleep staging and audio sensing features on Nest Hub provide deeper insights that we hope will help users translate their night time wellness into actionable improvements for their overall wellbeing.

Acknowledgements

This work involved collaborative efforts from a multidisciplinary team of software engineers, researchers, clinicians, and cross-functional contributors. Special thanks to Dr. Logan Schneider, a sleep neurologist whose clinical expertise and contributions were invaluable to continuously guide this research. In addition to the authors, key contributors to this research include Anupam Pathak, Jeffrey Yu, Arno Charton, Jian Cui, Sinan Hersek, Jonathan Hsu, Andi Janti, Linda Lei, Shao-Po Ma, Jo Schaeffer, Neil Smith, Siddhant Swaroop, Bhavana Koka, Dr. Jim Taylor, and the extended team. Thanks to Mark Malhotra and Shwetak Patel for their ongoing leadership, as well as the Nest, Fit, and Assistant teams we collaborated with to build and validate these enhancements to Sleep Sensing on Nest Hub.


¹ Not intended to diagnose, cure, mitigate, prevent or treat any disease or condition.
² Google did not have any role in study design, execution, or funding.

Google at EMNLP 2021

Posted by Catherine Armato, Google Research

This week, the annual conference on Empirical Methods in Natural Language Processing (EMNLP 2021) will be held both virtually and in Punta Cana, Dominican Republic. As a Diamond Level sponsor of EMNLP 2021, Google will contribute research on a diverse set of topics, including language interactions, causal inference, and question answering, and will also serve at various levels of the conference organization.

Below is a full list of Google’s involvement and publications being presented at EMNLP 2021. We congratulate these authors, and all other researchers who are presenting their work at the conference (Google affiliations presented in bold).

Organizing Committee
Ethics Committee includes: Nyalleng Moorosi

Publications
MATE: Multi-view Attention for Table Transformer Efficiency (see blog post)
Julian Martin Eisenschlos, Maharshi Gor*, Thomas Müller*, William W. Cohen

Residual Adapters for Parameter-Efficient ASR Adaptation to Atypical and Accented Speech (see blog post)
Katrin Tomanek, Vicky Zayats, Dirk Padfield, Kara Vaillancourt, Fadi Biadsy

Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach (see blog post)
Haoming Jiang*, Bo Dai, Mengjiao Yang, Tuo Zhao, Wei Wei

Case-Based Reasoning for Natural Language Queries Over Knowledge Bases
Rajarshi Das, Manzil Zaheer, Dung Thai, Ameya Godbole, Ethan Perez, Jay-Yoon Lee, Lizhen Tan, Lazaros Polymenakos, Andrew McCallum

XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation (see blog post)
Sebastian Ruder, Noah Constant, Jan Botha, Aditya Siddhant, Orhan Firat, Jinlan Fu, Pengfei Liu, Junjie Hu, Dan Garrette, Graham Neubig, Melvin Johnson

Building and Evaluating Open-Domain Dialogue Corpora with Clarifying Questions
Mohammad Aliannejadi, Julia Kiseleva, Aleksandr Chuklin, Jeffrey Dalton, Mikhail Burtsev

Fast WordPiece Tokenization
Xinying Song, Alex Salcianu, Yang Song*, Dave Dopson, Denny Zhou

Frequency Effects on Syntactic Rule Learning in Transformers
Jason Wei, Dan Garrette, Tal Linzen, Ellie Pavlick

Controllable Semantic Parsing via Retrieval Augmentation
Panupong Pasupat, Yuan Zhang, Kelvin Guu

Systematic Generalization on gSCAN: What is Nearly Solved and What is Next?
Linlu Qiu*, Hexiang Hu, Bowen Zhang, Peter Shaw, Fei Sha

Effective Sequence-to-Sequence Dialogue State Tracking
Jeffrey Zhao, Mahdis Mahdieh, Ye Zhang, Yuan Cao, Yonghui Wu

Learning Compact Metrics for MT
Amy Pu*, Hyung Won Chung*, Ankur P. Parikh, Sebastian Gehrmann, Thibault Sellam

Joint Passage Ranking for Diverse Multi-answer Retrieval
Sewon Min*, Kenton Lee, Ming-Wei Chang, Kristina Toutanova, Hannaneh Hajishirzi

Toward Deconfounding the Effect of Entity Demographics for Question Answering Accuracy
Maharshi Gor*, Kellie Webster, Jordan Boyd-Graber*

Good-Enough Example Extrapolation
Jason Wei

Q2: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering
Or Honovich*, Leshem Choshen, Roee Aharoni, Ella Neeman, Idan Szpektor, Omri Abend

The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester*, Rami Al-Rfou, Noah Constant

A Simple and Effective Method to Eliminate the Self Language Bias in Multilingual Representations
Ziyi Yang*, Yinfei Yang, Daniel Cer, Eric Darve

Universal Sentence Representation Learning with Conditional Masked Language Model
Ziyi Yang*, Yinfei Yang, Daniel Cer, Jax Law, Eric Darve

Scalable Font Reconstruction with Dual Latent Manifolds
Nikita Srivatsan, Si Wu, Jonathan T. Barron, Taylor Berg-Kirkpatrick

Structured Context and High-Coverage Grammar for Conversational Question Answering Over Knowledge Graphs
Pierre Marion, Paweł Krzysztof Nowak, Francesco Piccinno

Don’t Search for a Search Method — Simple Heuristics Suffice for Adversarial Text Attacks
Nathaniel Berger*, Stefan Riezler, Artem Sokolov, Sebastian Ebert

HintedBT: Augmenting Back-Translation with Quality and Transliteration Hints
Sahana Ramnath, Melvin Johnson, Abhirut Gupta, Aravindan Raghuveer

STraTA: Self-Training with Task Augmentation for Better Few-Shot Learning
Tu Vu*, Minh-Thang Luong, Quoc V. Le, Grady Simon, Mohit Iyyer

Do Transformer Modifications Transfer Across Implementations and Applications? (see blog post)
Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Thibault Fevry*, Michael Matena*, Karishma Malkan*, Noah Fiedel, Noam Shazeer, Zhenzhong Lan*, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel*

A Large-Scale Study of Machine Translation in Turkic Languages
Jamshidbek Mirzakhalov, Anoop Babu, Duygu Ataman, Sherzod Kariev, Francis Tyers, Otabek Abduraufov, Mammad Hajili, Sardana Ivanova, Abror Khaytbaev, Antonio Laverghetta Jr., Behzodbek Moydinboyev, Esra Onal, Shaxnoza Pulatova, Ahsan Wahab, Orhan Firat, Sriram Chellappan

ReasonBERT: Pre-trained to Reason with Distant Supervision
Xiang Deng, Yu Su, Alyssa Lees, You Wu, Cong Yu, Huan Sun

MasakhaNER: Named Entity Recognition for African Languages
David Ifeoluwa Adelani, Jade Abbott, Graham Neubig, Daniel D’souza, Julia Kreutzer, Constantine Lignos, Chester Palen-Michel, Happy Buzaaba, Shruti Rijhwani, Sebastian Ruder, Stephen Mayhew, Israel Abebe Azime, Shamsuddeen H. Muhammad, Chris Chinenye Emezue, Joyce Nakatumba-Nabende, Perez Ogayo, Anuoluwapo Aremu, Catherine Gitau, Derguene Mbaye, Jesujoba Alabi, Seid Muhie Yimam, Tajuddeen Rabiu Gwadabe, Ignatius Ezeani, Rubungo Andre Niyongabo, Jonathan Mukiibi, Verrah Otiende, Iroro Orife, Davis David, Samba Ngom, Tosin Adewumi, Paul Rayson, Mofetoluwa Adeyemi, Gerald Muriuki, Emmanuel Anebi, Chiamaka Chukwuneke, Nkiruka Odu, Eric Peter Wairagala, Samuel Oyerinde, Clemencia Siro, Tobius Saul Bateesa, Temilola Oloyede, Yvonne Wambui, Victor Akinode, Deborah Nabagereka, Maurice Katusiime, Ayodele Awokoya, Mouhamadane MBOUP, Dibora Gebreyohannes, Henok Tilaye, Kelechi Nwaike, Degaga Wolde, Abdoulaye Faye, Blessing Sibanda, Orevaoghene Ahia, Bonaventure F. P. Dossou, Kelechi Ogueji, Thierno Ibrahima DIOP, Abdoulaye Diallo, Adewale Akinfaderin, Tendai Marengereke, Salomey Osei

Multi-stage Training with Improved Negative Contrast for Neural Passage Retrieval
Jing Lu*, Gustavo Hernandez Abrego, Ji Ma, Jianmo Ni, Yinfei Yang

Controlling Machine Translation for Multiple Attributes with Additive Interventions
Andrea Schioppa, Artem Sokolov, David Vilar, Katja Filippova

A Simple and Effective Positional Encoding for Transformers
Pu-Chin Chen, Henry Tsai, Srinadh Bhojanapalli, Hyung Won Chung, Yin-Wen Chang, Chun-Sung Ferng

CrossVQA: Scalably Generating Benchmarks for Systematically Testing VQA Generalization
Arjun R. Akula*, Soravit Changpinyo, Boqing Gong, Piyush Sharma, Song-Chun Zhu, Radu Soricut

Can We Improve Model Robustness through Secondary Attribute Counterfactuals?
Ananth Balashankar, Xuezhi Wang, Ben Packer, Nithum Thain, Ed Chi, Alex Beutel

Multi-Vector Attention Models for Deep Re-ranking
Giulio Zhou*, Jacob Devlin

Diverse Distributions of Self-Supervised Tasks for Meta-Learning in NLP
Trapit Bansal, Karthick Gunasekaran, Tong Wang, Tsendsuren Munkhdalai, Andrew McCallum

Workshops
NLP for Conversational AI
Invited speakers include: Idan Szpektor
Organizers include: Abhinav Rastogi

Novel Ideas in Learning-to-Learn Through Interaction
Invited speakers include: Natasha Jaques

Evaluation & Comparison of NLP Systems
Invited speakers include: Sebastian Ruder

Causal Inference & NLP
Organizers include: Amir Feder, Jacob Eisenstein, Victor Veitch

Machine Reading for Question Answering
Invited speakers include: Jon Clark

Computational Approaches to Discourse
Organizers include: Annie Louis

New Frontiers in Summarization
Invited speakers include: Sebastian Gehrmann, Shashi Narayan

Multi-lingual Representation Learning
Invited speakers include: Melvin Johnson
Organizers include: Alexis Conneau, Orhan Firat, Sebastian Ruder

Widening in NLP
Invited speakers include: Jasmijn Bastings
Organizers include: Shaily Bhatt

Evaluations and Assessments of Neural Conversation Systems (EANCS)
Organizers include: Wei Wei, Bo Dai

BlackboxNLP
Invited speakers include: Sara Hooker
Organizers include: Jasmijn Bastings

Tutorials
Multi-Domain Multilingual Question Answering
Organizers include: Sebastian Ruder

Demos
LMdiff: A Visual Diff Tool to Compare Language Models
Hendrik Strobelt, Benjamin Hoover, Arvind Satyanarayan, Sebastian Gehrmann



*Work done while at Google.  


Improved On-Device ML on Pixel 6, with Neural Architecture Search

Posted by Suyog Gupta and Marie White, Software Engineers, Google Research

This fall Pixel 6 phones launched with Google Tensor, Google’s first mobile system-on-chip (SoC), bringing together various processing components (such as central/graphic/tensor processing units, image processors, etc.) onto a single chip, custom-built to deliver state-of-the-art innovations in machine learning (ML) to Pixel users. In fact, every aspect of Google Tensor was designed and optimized to run Google’s ML models, in alignment with our AI Principles. That starts with the custom-made TPU integrated in Google Tensor that allows us to fulfill our vision of what should be possible on a Pixel phone.

Today, we share the improvements in on-device machine learning made possible by designing the ML models for Google Tensor’s TPU. We use neural architecture search (NAS) to automate the process of designing ML models, incentivizing the search algorithms to discover models that achieve higher quality while meeting latency and power requirements. This automation also allows us to scale the development of models for various on-device tasks. We’re making these models publicly available through the TensorFlow model garden and TensorFlow Hub so that researchers and developers can bootstrap further use case development on Pixel 6. Moreover, we have applied the same techniques to build a highly energy-efficient face detection model that is foundational to many Pixel 6 camera features.

An illustration of NAS to find TPU-optimized models. Each column represents a stage in the neural network, with dots indicating different options, and each color representing a different type of building block. A path from inputs (e.g., an image) to outputs (e.g., per-pixel label predictions) through the matrix represents a candidate neural network. In each iteration of the search, a neural network is formed using the blocks chosen at every stage, and the search algorithm aims to find neural networks that jointly minimize TPU latency and/or energy and maximize accuracy.

Search Space Design for Vision Models
A key component of NAS is the design of the search space from which the candidate networks are sampled. We customize the search space to include neural network building blocks that run efficiently on the Google Tensor TPU.

One widely-used building block in neural networks for various on-device vision tasks is the Inverted Bottleneck (IBN). The IBN block has several variants, each with different tradeoffs, and is built using regular convolution and depthwise convolution layers. While IBNs with depthwise convolution have been conventionally used in mobile vision models due to their low computational complexity, fused-IBNs, wherein depthwise convolution is replaced by a regular convolution, have been shown to improve the accuracy and latency of image classification and object detection models on TPU.

However, fused-IBNs can have prohibitively high computational and memory requirements for neural network layer shapes that are typical in the later stages of vision models, limiting their use throughout the model and leaving the depthwise-IBN as the only alternative. To overcome this limitation, we introduce IBNs that use group convolutions to enhance the flexibility in model design. While regular convolution mixes information across all the features in the input, group convolution slices the features into smaller groups and performs regular convolution on features within that group, reducing the overall computational cost. These group convolution–based IBNs (GC-IBNs) come with a tradeoff: they may adversely impact model quality.
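To make the slicing concrete, here is a minimal sketch of a 1×1 group convolution applied to a single pixel's feature vector. The function name `group_pointwise_conv` and the weight layout are assumptions made for illustration; they are not taken from the models described in this post.

```python
def group_pointwise_conv(x, weights):
    """Apply a 1x1 group convolution to one pixel's feature vector.

    x: list of C_in input features.
    weights: one matrix per group (hypothetical layout); weights[g][o][i]
             maps input i of group g to output o of group g.
    Returns the output features and the multiply-accumulate (MAC) count.
    """
    groups = len(weights)
    if len(x) % groups:
        raise ValueError("input channels must divide evenly into groups")
    size = len(x) // groups
    out, macs = [], 0
    for g, matrix in enumerate(weights):
        chunk = x[g * size:(g + 1) * size]  # features of this group only
        for row in matrix:
            out.append(sum(w * v for w, v in zip(row, chunk)))
            macs += len(chunk)
    return out, macs


x = [1, 2, 3, 4]
# Regular convolution: a single group that mixes all four features.
regular = group_pointwise_conv(x, [[[1, 1, 1, 1]] * 4])
# Group convolution: two groups that each mix only two features.
grouped = group_pointwise_conv(x, [[[1, 1], [1, -1]], [[1, 1], [1, -1]]])
print(regular)  # ([10, 10, 10, 10], 16)
print(grouped)  # ([3, -1, 7, -1], 8)
```

With two groups the MAC count halves, but no output mixes features from different groups, which is one intuition for the possible quality impact.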

Inverted bottleneck (IBN) variants: (a) depthwise-IBN, depthwise convolution layer with filter size K×K sandwiched between two convolution layers with filter size 1×1; (b) fused-IBN, convolution and depthwise are fused into a convolution layer with filter size K×K; and (c) group convolution–based GC-IBN that replaces the K×K regular convolution in fused-IBN with group convolution. The number of groups (group count) is a tunable parameter during NAS.
Inclusion of GC-IBN as an option provides additional flexibility beyond other IBNs. Computational cost and latency of different IBN variants depend on the feature dimensions being processed (shown above for two example feature dimensions). We use NAS to determine the optimal choice of IBN variants.
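As a rough illustration of how computational cost varies across the three variants, the sketch below counts multiply-accumulate operations (MACs) for each block. The expansion factor of 4, the group count of 4, stride 1, and the two example feature dimensions are all assumptions chosen for illustration. MAC counts are only a proxy: as noted later in the post, latency on the TPU correlates less strongly with MACs than latency on a CPU does.

```python
def conv_macs(h, w, c_in, c_out, k, groups=1):
    """MACs of a KxK convolution (stride 1, 'same' padding) on an HxW map."""
    # Each output channel only reads c_in / groups input channels.
    return h * w * c_out * (c_in // groups) * k * k


def ibn_macs(variant, h, w, c, k=3, expansion=4, groups=4):
    """MACs of one inverted bottleneck block with c input and output channels."""
    e = c * expansion                    # expanded channel count (assumed)
    project = conv_macs(h, w, e, c, 1)   # final 1x1 projection
    if variant == "depthwise":
        expand = conv_macs(h, w, c, e, 1)
        depthwise = conv_macs(h, w, e, e, k, groups=e)
        return expand + depthwise + project
    if variant == "fused":
        return conv_macs(h, w, c, e, k) + project
    if variant == "group":
        return conv_macs(h, w, c, e, k, groups=groups) + project
    raise ValueError(f"unknown variant: {variant}")


# Two hypothetical feature dimensions: an early stage and a late stage.
for h, w, c in [(64, 64, 32), (8, 8, 256)]:
    counts = {v: ibn_macs(v, h, w, c) for v in ("depthwise", "fused", "group")}
    print((h, w, c), counts)
```

Under these assumptions the GC-IBN sits between the depthwise-IBN and the fused-IBN in MACs, which is the extra design point the search can exploit.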

Faster, More Accurate Image Classification
Which IBN variant to use at which stage of a deep neural network depends on the latency on the target hardware and the performance of the resulting neural network on the given task. We construct a search space that includes all of these different IBN variants and use NAS to discover neural networks for the image classification task that optimize the classification accuracy at a desired latency on TPU. The resulting MobileNetEdgeTPUV2 model family improves the accuracy at a given latency (or latency at a desired accuracy) compared to the existing on-device models when run on the TPU. MobileNetEdgeTPUV2 also outperforms its predecessor, MobileNetEdgeTPU, the image classification models designed for the previous generation of the TPU.
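The idea of choosing a block per stage under a latency target can be sketched as a toy search. This is not the search algorithm used for MobileNetEdgeTPUV2, which this post does not specify; plain random search stands in for it here. The per-stage latency table and the quality scores are invented numbers, whereas a real search would measure latency on the TPU and train each candidate.

```python
import random

STAGES = 4
BLOCKS = ("depthwise", "fused", "group")

# Hypothetical lookup tables, invented for illustration only.
LATENCY_MS = {
    "depthwise": [0.40, 0.35, 0.30, 0.30],
    "fused":     [0.25, 0.30, 0.60, 0.90],
    "group":     [0.30, 0.30, 0.40, 0.50],
}
QUALITY = {"depthwise": 1.0, "fused": 1.5, "group": 1.3}


def evaluate(arch):
    """Return (latency, quality) of a candidate: one block choice per stage."""
    latency = sum(LATENCY_MS[block][stage] for stage, block in enumerate(arch))
    quality = sum(QUALITY[block] for block in arch)
    return latency, quality


def random_search(latency_budget_ms, iterations=500, seed=0):
    """Keep the highest-quality sampled candidate that meets the budget."""
    rng = random.Random(seed)
    best = None
    for _ in range(iterations):
        arch = tuple(rng.choice(BLOCKS) for _ in range(STAGES))
        latency, quality = evaluate(arch)
        if latency > latency_budget_ms:
            continue  # violates the latency target
        if best is None or quality > best[2]:
            best = (arch, latency, quality)
    return best


print(random_search(latency_budget_ms=1.5))
```

In this toy table the fused block is cheap in early stages and expensive in late ones, so the search tends to mix variants across stages rather than pick a single block type throughout.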

Network architecture families visualized as connected dots at different latency targets. Compared with other mobile models, such as FBNet, MobileNetV3, and EfficientNets, MobileNetEdgeTPUV2 models achieve higher ImageNet top-1 accuracy at lower latency when running on Google Tensor’s TPU.

MobileNetEdgeTPUV2 models are built using blocks that also improve the latency/accuracy tradeoff on other compute elements in the Google Tensor SoC, such as the CPU. Unlike accelerators such as the TPU, CPUs show a stronger correlation between the number of multiply-and-accumulate operations in the neural network and latency. GC-IBNs tend to have fewer multiply-and-accumulate operations than fused-IBNs, which leads MobileNetEdgeTPUV2 to outperform other models even on Pixel 6 CPU.

MobileNetEdgeTPUV2 models achieve higher ImageNet top-1 accuracy at lower latency on Pixel 6 CPU, and outperform other CPU-optimized model architectures, such as MobileNetV3.

Improving On-Device Semantic Segmentation
Many vision models consist of two components, the base feature extractor for understanding general features of the image, and the head for understanding domain-specific features, such as semantic segmentation (the task of assigning labels, such as sky, car, etc., to each pixel in an image) and object detection (the task of detecting instances of objects, such as cats, doors, cars, etc., in an image). Image classification models are often used as feature extractors for these vision tasks. As shown below, the MobileNetEdgeTPUV2 classification model coupled with the DeepLabv3+ segmentation head improves the quality of on-device segmentation.

To further improve the segmentation model quality, we use the bidirectional feature pyramid network (BiFPN) as the segmentation head, which performs weighted fusion of different features extracted by the feature extractor. Using NAS we find the optimal configuration of blocks in both the feature extractor and the BiFPN head. The resulting models, named Autoseg-EdgeTPU, produce even higher-quality segmentation results, while also running faster.

The final layers of the segmentation model contribute significantly to the overall latency, mainly due to the operations involved in generating a high resolution segmentation map. To optimize the latency on TPU, we introduce an approximate method for generating the high resolution segmentation map that reduces the memory requirement and provides a nearly 1.5x speedup, without significantly impacting the segmentation quality.

Left: Comparing the performance, measured as mean intersection-over-union (mIOU), of different segmentation models on the ADE20K semantic segmentation dataset (top 31 classes). Right: Approximate feature upsampling (e.g., increasing resolution from 32×32 → 512×512). Argmax operation used to compute per-pixel labels is fused with the bilinear upsampling. Argmax performed on smaller resolution features reduces memory requirements and improves latency on TPU without a significant impact to quality.
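One way to read the approximate upsampling idea is sketched below: rather than bilinearly upsampling every class map to full resolution before the argmax, upsample only to an intermediate resolution, take the argmax there, and enlarge the resulting label map with nearest-neighbor copying. The intermediate factor `mid`, the helper names, and the nearest-neighbor final step are assumptions made for illustration, not the exact fused operation that runs on the TPU.

```python
import math


def bilinear_resize(img, out_h, out_w):
    """Bilinearly resize a 2D list (align-corners convention)."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for y in range(out_h):
        fy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(fy)
        y1, wy = min(y0 + 1, in_h - 1), fy - y0
        row = []
        for x in range(out_w):
            fx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(fx)
            x1, wx = min(x0 + 1, in_w - 1), fx - x0
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def argmax_labels(logits):
    """Per-pixel index of the highest-scoring class; logits[c][y][x]."""
    h, w = len(logits[0]), len(logits[0][0])
    return [[max(range(len(logits)), key=lambda c: logits[c][y][x])
             for x in range(w)] for y in range(h)]


def segment_exact(logits, factor):
    h, w = len(logits[0]) * factor, len(logits[0][0]) * factor
    return argmax_labels([bilinear_resize(c, h, w) for c in logits])


def segment_approx(logits, factor, mid):
    """Argmax at an intermediate resolution, then copy labels (mid divides factor)."""
    h, w = len(logits[0]) * mid, len(logits[0][0]) * mid
    small = argmax_labels([bilinear_resize(c, h, w) for c in logits])
    rep = factor // mid
    return [[small[y // rep][x // rep] for x in range(w * rep)]
            for y in range(h * rep)]


# Synthetic 3-class logits on a 4x4 grid, upsampled 8x.
logits = [[[math.sin(c + 0.7 * y) + math.cos(0.5 * x * (c + 1))
            for x in range(4)] for y in range(4)] for c in range(3)]
exact = segment_exact(logits, 8)
approx = segment_approx(logits, 8, mid=2)
same = sum(a == b for ra, rb in zip(exact, approx) for a, b in zip(ra, rb))
print("pixel agreement:", same / (32 * 32))
print("floats held before argmax:", 3 * 32 * 32, "vs", 3 * 8 * 8)
```

The memory saving comes from the argmax input shrinking by a factor of (factor / mid)², at the cost of blockier label boundaries.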

Higher-Quality, Low-Energy Object Detection
Classic object detection architectures allocate ~70% of the compute budget to the feature extractor and only ~30% to the detection head. For this task we incorporate the GC-IBN blocks into a search space we call “Spaghetti Search Space”1, which provides the flexibility to move more of the compute budget to the head. This search space also uses the non-trivial connection patterns seen in recent NAS works such as MnasFPN to merge different but related stages of the network to strengthen understanding.

We compare the models produced by NAS to MobileDet-EdgeTPU, a class of mobile detection models customized for the previous generation of TPU. MobileDets have been demonstrated to achieve state-of-the-art detection quality on a variety of mobile accelerators: DSPs, GPUs, and the previous TPU. Compared with MobileDets, the new family of SpaghettiNet-EdgeTPU detection models achieves +2.2% mAP (absolute) on COCO at the same latency and consumes less than 70% of the energy used by MobileDet-EdgeTPU to achieve similar accuracy.

Comparing the performance of different object detection models on the COCO dataset with the mAP metric (higher is better). SpaghettiNet-EdgeTPU achieves higher detection quality at lower latency and energy consumption compared to previous mobile models, such as MobileDets and MobileNetV2 with Feature Pyramid Network (FPN).

Inclusive, Energy-Efficient Face Detection
Face detection is a foundational technology in cameras that enables a suite of additional features, such as fixing the focus, exposure and white balance, and even removing blur from the face with the new Face Unblur feature. Such features must be designed responsibly, and Face Detection in the Pixel 6 was developed with our AI Principles top of mind.

Left: The original photo without improvements. Right: An unblurred face in a dynamic environment. This is the result of Face Unblur combined with a more accurate face detector running at a higher frames per second.

Since mobile cameras can be power-intensive, it was important for the face detection model to fit within a power budget. To optimize for energy efficiency, we used the Spaghetti Search Space with an algorithm to search for architectures that maximize accuracy at a given energy target. Compared with a heavily optimized baseline model, SpaghettiNet achieves the same accuracy at ~70% of the energy. The resulting face detection model, called FaceSSD, is more power-efficient and accurate. This improved model, combined with our auto-white balance and auto-exposure tuning improvements, is part of Real Tone on Pixel 6. These improvements help better reflect the beauty of all skin tones. Developers can utilize this model in their own apps through the Android Camera2 API.

Toward Datacenter-Quality Language Models on a Mobile Device
Deploying low-latency, high-quality language models on mobile devices benefits ML tasks like language understanding, speech recognition, and machine translation. MobileBERT, a derivative of BERT, is a natural language processing (NLP) model tuned for mobile CPUs.

However, due to the various architectural optimizations made to run these models efficiently on mobile CPUs, their quality is not as high as that of the large BERT models. Since MobileBERT on TPU runs significantly faster than on CPU, it presents an opportunity to improve the model architecture further and reduce the quality gap between MobileBERT and BERT. We extended the MobileBERT architecture and leveraged NAS to discover models that map well to the TPU. These new variants of MobileBERT, named MobileBERT-EdgeTPU, achieve up to 2x higher hardware utilization, allowing us to deploy larger and more accurate models on TPU at latencies comparable to the baseline MobileBERT.

MobileBERT-EdgeTPU models, when deployed on Google Tensor’s TPU, produce on-device quality comparable to the large BERT models typically deployed in data centers.

Performance on the question answering task (SQuAD v1.1). While the TPU in Pixel 6 provides a ~10x acceleration over CPU, further model customization for the TPU achieves on-device quality comparable to the large BERT models typically deployed in data centers.

Conclusion
In this post, we demonstrated how designing ML models for the target hardware expands the on-device ML capabilities of Pixel 6 and brings high-quality, ML-powered experiences to Pixel users. With NAS, we scaled the design of ML models to a variety of on-device tasks and built models that provide state-of-the-art quality on-device within the latency and power constraints of a mobile device. Researchers and ML developers can try out these models in their own use cases by accessing them through the TensorFlow model garden and TF Hub.

Acknowledgements
This work is made possible through a collaboration spanning several teams across Google. We’d like to acknowledge contributions from Rachit Agrawal, Berkin Akin, Andrey Ayupov, Aseem Bathla, Gabriel Bender, Po-Hsein Chu, Yicheng Fan, Max Gubin, Jaeyoun Kim, Quoc Le, Dongdong Li, Jing Li, Yun Long, Hanxiao Lu, Ravi Narayanaswami, Benjamin Panning, Anton Spiridonov, Anakin Tung, Zhuo Wang, Dong Hyuk Woo, Hao Xu, Jiayu Ye, Hongkun Yu, Ping Zhou, and Yanqi Zhuo. Finally, we’d like to thank Tom Small for creating illustrations for this blog post.



1 The resulting architectures tend to look like spaghetti because of the connection patterns formed between blocks.


Another Step Towards Breakeven Fusion

Posted by Ted Baltz, Senior Staff Software Engineer, Google Research

For more than 70 years, plasma physicists have dreamed of controlled “breakeven” fusion, where a system is capable of releasing more energy in a fusion reaction than it takes to initiate and sustain those reactions. The challenge is that the reactor must create a plasma at a temperature of tens of millions of degrees, which requires a highly complex, finely tuned system to confine and sustain. Further, creating and maintaining the plasma requires substantial amounts of energy, which, to date, have exceeded that released in the fusion reaction itself. Nevertheless, if a “breakeven” system could be achieved, it could provide ample zero-carbon electricity, the potential impact of which has driven interest by government laboratories, such as ITER and the National Ignition Facility, as well as several privately funded efforts.

Today we highlight two recently published papers arising from our collaboration with TAE Technologies1, which demonstrate exciting advancements in the field. In “Overview of C-2W: High-temperature, steady-state beam-driven field-reversed configuration plasmas,” published in Nuclear Fusion, we describe the experimental program implemented by TAE, which leverages our improved version of the Optometrist Algorithm for machine optimization. Due in part to this contribution, the current state-of-the-art reactor is able to achieve plasma lifetimes up to three times longer than its predecessor. In “Multi-instrument Bayesian reconstruction of plasma shape evolution in the C-2W experiment,” published in Physics of Plasmas, we detail new methods developed for analyzing indirect measurements of plasma to reconstruct its properties in detail. This work enabled us to better understand how instabilities in the plasma arise and to understand how to mitigate these perturbations in practice.

Optimizing the Next Generation Fusion Device
The C-2W “Norman” machine (named for TAE’s late co-founder Prof. Norman Rostoker) is a nearly complete rebuild of the C-2U machine that we described in 2017. For this updated version, the TAE team integrated new pressure vessels, new power supplies, a new vacuum system, along with other substantial upgrades.

Norman is incredibly complex, with over 1000 machine control parameters, and likewise, it captures extensive amounts of data for each run, including over 1000 measurements of conditions in the plasma alone. And while the measurements of each plasma experiment are extremely rich, there is no simple metric for “goodness”. Further complicating matters, it is not possible to rapidly iterate to improve performance, because only one experiment can be executed every eight minutes. For these reasons, tuning the system is quite difficult and relies on the expert intuition developed by the plasma physicists operating the system. To optimize the new reactor’s performance, we needed a control system capable of handling the tremendous complexity of the system while being able to quickly tune the control parameters in response to the extensive data generated in experiments.

To accomplish this, we further adapted the Optometrist Algorithm that we had developed for the C-2U system to leverage the expertise of the operators. In this algorithm, the physicists compare experiment pairs, and determine whether the trial better achieves the current goals of the experiment, according to their judgment, than the current reference experiment — e.g., achieving increased plasma size at a fixed temperature, increased temperature, etc. By updating the reference accordingly, machine performance improves over time. However, accounting for operator intuition during this process is critical, because the measure of improvement may not be immediately obvious. For example, under some situations, an experiment with much denser plasma that is a little bit colder may, in fact, be “better”, because it may lead to other improvements in subsequent experiments. We further modified the algorithm by fitting a logistic regression to the binary decisions of the expert to guide the trial experiments, making a classic exploration-exploitation tradeoff.
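The comparison loop can be sketched in a few lines. Everything concrete here is a stand-in: the "expert" is a hidden quadratic score rather than a physicist, the three settings are arbitrary, and the way the logistic model picks among candidate perturbations is one plausible reading of the description above, not TAE's or Google's implementation.

```python
import math
import random

TARGET = [0.3, -0.5, 0.8]  # hidden optimum, unknown to the algorithm


def hidden_quality(settings):
    """Stand-in for expert judgment; on the real machine this is a human call."""
    return -sum((s - t) ** 2 for s, t in zip(settings, TARGET))


def expert_prefers_trial(reference, trial):
    return hidden_quality(trial) > hidden_quality(reference)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


def optometrist(initial, rounds=300, step=0.1, explore=0.3, lr=0.5, seed=0):
    rng = random.Random(seed)
    reference = list(initial)
    weights = [0.0] * len(initial)  # logistic model of "expert accepts this change"
    for _ in range(rounds):
        candidates = [[rng.gauss(0.0, step) for _ in reference] for _ in range(8)]
        if rng.random() < explore:
            delta = candidates[0]  # explore: take a random perturbation
        else:
            # Exploit: take the change the model thinks is most likely accepted.
            delta = max(candidates, key=lambda d: dot(weights, d))
        trial = [r + d for r, d in zip(reference, delta)]
        accepted = expert_prefers_trial(reference, trial)
        # One gradient step of logistic regression on this binary decision.
        error = (1.0 if accepted else 0.0) - sigmoid(dot(weights, delta))
        weights = [w + lr * error * d for w, d in zip(weights, delta)]
        if accepted:
            reference = trial  # the preferred experiment becomes the new reference
    return reference


start = [0.0, 0.0, 0.0]
tuned = optometrist(start)
print(hidden_quality(start), "->", hidden_quality(tuned))
```

Because the reference only changes when the expert prefers the trial, the judged quality never gets worse. On the real machine each comparison costs one eight-minute experiment, so spending trials well matters far more than it does in this toy.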

Applying the Optometrist Algorithm to the magnetic field coils that form the plasma, we found a novel timing sequence that provides consistent starting conditions for long-lived plasmas, almost tripling the plasma lifetime when first applied. This was a marked improvement over the regime of net plasma heating first seen on the C-2U machine in 2015.

Plasma formation section of the Norman reactor. The outer coils operate for the duration of the experiments while the inner coils accelerate the plasma in less than 10 microseconds. (Photograph by Erik Lucero)

Bayesian Reconstruction of Plasma Conditions
In addition to optimizing the performance of the machine, we also sought to more thoroughly understand the behavior of the plasmas it is generating. This includes understanding the density profiles, separate electron and ion temperatures, and magnetic fields generated by the plasma. Because the plasma in a fusion generator reaches 30 million Kelvin, which would destroy most solid materials in moments, precise measurements of the plasma conditions are very difficult.

To address this, Norman has a set of indirect diagnostics, generating 5 GB of data per shot, that peer into the plasma without touching it. One of these is a two-story laser interferometer that measures the line-integrated electron density along 14 lines of sight through the plasma, with a sample rate of more than a megahertz. The resulting dataset of line-integrated densities can be used to extract the spatial density profile of the plasma, which is crucial to understanding the plasma behavior. In this case, the Norman reactor generates field-reversed configuration (FRC) plasmas that tend to be best confined when they are hollow (imagine a smoke ring elongated into a barrel shape). The challenge in this situation is that generating the spatial density profiles for such a plasma configuration is an inverse problem, i.e., it is more difficult to infer the shape of the plasma from the measurements (the “inverse” direction) than to predict the measurements from a known shape (the “forward” direction).
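The "forward" direction is easy to sketch: given a radial density profile, integrate it along each line of sight. The geometry below (parallel chords through a circular, axisymmetric cross-section of unit radius) and the Gaussian-shell profile are simplifying assumptions made for illustration; the real C-2W chord layout and profiles are not described in this post.

```python
import math


def hollow_density(r, r_peak=0.6, width=0.15):
    """Hypothetical hollow profile: density peaks on a shell at r_peak."""
    return math.exp(-((r - r_peak) / width) ** 2)


def line_integrated_density(density, impact, radius=1.0, steps=2000):
    """Integrate density(r) along a straight chord at distance `impact` from the axis."""
    if abs(impact) >= radius:
        return 0.0  # the chord misses the plasma entirely
    half = math.sqrt(radius ** 2 - impact ** 2)  # half the chord length
    ds = 2.0 * half / steps
    total = 0.0
    for i in range(steps):
        s = -half + (i + 0.5) * ds  # midpoint rule along the chord
        total += density(math.hypot(impact, s)) * ds
    return total


# 14 hypothetical lines of sight at increasing distance from the axis.
impacts = [i / 14 for i in range(14)]
measurements = [line_integrated_density(hollow_density, b) for b in impacts]
for b, m in zip(impacts, measurements):
    print(f"impact={b:.3f}  line-integrated density={m:.4f}")
```

Each measurement blends contributions from every radius the chord crosses, which is why going back from 14 such numbers to the profile is the hard direction.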

Schematic of C-2W confinement vessel showing measurement systems: interferometer lines of sight measuring electron density (magenta), neutral particle beam lines of sight measuring ion density (purple) and magnetic sensors (blue). These disparate measurements are combined in the Bayesian framework.

We developed a TensorFlow implementation of the Hamiltonian Monte Carlo (HMC) algorithm to address the problem of inferring the density profile of the plasma from multiple indirect measurements. Because the plasma is described by hundreds to thousands of variables and we want to reconstruct the state for thousands of frames, linked into “bursts” or short movies, for each plasma experiment, processing on CPUs is insufficient. For this reason, we optimized the HMC algorithm to be executed on GPUs. The Bayesian framework for this involves building “forward” models (i.e., predicting effects from causes) for several instruments, which can predict what the instrument would record, given some specified plasma conditions. We can then use HMC to calculate the probabilities of various possible plasma conditions. Understanding both density and temperature is crucial to the problem of breakeven fusion.
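For readers unfamiliar with HMC, the following is a minimal pure-Python sketch on a toy problem, far removed from the GPU-scale TensorFlow implementation described above. It assumes a linear forward model with two unknown densities (an inner and an outer shell) observed through three chords; the matrix `A`, the measurements `Y`, the noise level, and the prior width are all invented for illustration.

```python
import math
import random

A = [[2.0, 1.0], [0.0, 1.5], [1.0, 1.0]]  # hypothetical path lengths per shell
Y = [4.05, 2.90, 3.10]                    # hypothetical noisy measurements
SIGMA = 0.1                               # assumed measurement noise
PRIOR_STD = 10.0                          # assumed broad Gaussian prior


def residuals(x):
    """Measured minus predicted values under the linear forward model."""
    return [y - sum(a * xi for a, xi in zip(row, x)) for y, row in zip(Y, A)]


def log_posterior(x):
    r = residuals(x)
    return (-0.5 * sum(v * v for v in r) / SIGMA ** 2
            - 0.5 * sum(v * v for v in x) / PRIOR_STD ** 2)


def grad_log_posterior(x):
    r = residuals(x)
    return [sum(A[i][j] * r[i] for i in range(len(A))) / SIGMA ** 2
            - x[j] / PRIOR_STD ** 2 for j in range(len(x))]


def hmc(n_samples, step=0.02, n_leapfrog=20, warmup=500, seed=0):
    rng = random.Random(seed)
    x = [0.0, 0.0]
    samples = []
    for it in range(warmup + n_samples):
        p = [rng.gauss(0.0, 1.0) for _ in x]  # resample momentum
        h_start = -log_posterior(x) + 0.5 * sum(v * v for v in p)
        q, g = list(x), grad_log_posterior(x)
        for _ in range(n_leapfrog):  # leapfrog integration
            p = [pi + 0.5 * step * gi for pi, gi in zip(p, g)]
            q = [qi + step * pi for qi, pi in zip(q, p)]
            g = grad_log_posterior(q)
            p = [pi + 0.5 * step * gi for pi, gi in zip(p, g)]
        h_end = -log_posterior(q) + 0.5 * sum(v * v for v in p)
        # Metropolis correction for the integration error.
        if rng.random() < math.exp(min(0.0, h_start - h_end)):
            x = q
        if it >= warmup:
            samples.append(list(x))
    return samples


samples = hmc(2000)
mean = [sum(s[j] for s in samples) / len(samples) for j in range(2)]
print("posterior mean estimate:", mean)
```

Because this toy posterior is Gaussian, the sample mean can be checked against the least-squares solution, roughly (1.07, 1.95). The actual reconstruction involves nonlinear forward models for several instruments and far more variables, where no such closed form exists.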

High Frequency Plasma Perturbations
Reconstruction of the plasma conditions does more than just recover the plasma density profile; it also recovers the behavior of high frequency density perturbations in the plasma. TAE has done a large number of experiments to determine if Norman’s neutral particle beams and electrode currents can control these oscillations. In the second paper, we demonstrate the strong mitigating effects of the neutral beams, showing that when the neutral beams are turned off, fluctuations immediately begin growing. The reconstruction allows us to see how the radial density profile of the plasma evolves as the perturbations grow, an understanding of which is key to mitigating such perturbations, allowing long-lived stable plasmas. Following a long tradition of listening to plasma perturbations to better intuit their behavior (e.g., ionospheric “whistlers” have been captured by radio operators for over a century), we translate the perturbations to audio (slowed down 500x) in order to listen to them.

Movie showing spectrogram of magnetic oscillations, played as audio 500 times slower. Different colors indicate different shapes. There is a whistle as the plasma forms, as well as low drum sounds followed immediately by chirps when the plasma destabilizes and recovers. Headphones / earbuds recommended; may annoy pets and humans.
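Slowing a recording down by a fixed factor can be done by keeping every sample and declaring a proportionally lower playback rate. The sketch below assumes a 1 MHz capture rate and a synthetic 200 kHz oscillation; both numbers are illustrative and are not taken from the magnetic sensor data.

```python
import io
import math
import struct
import wave


def slowed_wav_bytes(samples, capture_rate_hz, slowdown):
    """Encode samples as 16-bit mono WAV that plays `slowdown` times slower."""
    playback_rate = int(round(capture_rate_hz / slowdown))
    peak = max(abs(s) for s in samples) or 1.0  # avoid dividing by zero
    pcm = [int(32767 * s / peak) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wf.setframerate(playback_rate)
        wf.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
    return buf.getvalue()


capture_rate = 1_000_000  # assumed samples per second
signal = [math.sin(2 * math.pi * 200_000 * n / capture_rate)
          for n in range(10_000)]  # 10 ms of a 200 kHz oscillation
wav_bytes = slowed_wav_bytes(signal, capture_rate, slowdown=500)
# Played back at 2,000 samples/s, the 200 kHz oscillation becomes a 400 Hz tone
# and the 10 ms recording lasts 5 seconds.
```

Writing `wav_bytes` to a `.wav` file yields something any audio player can open.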

The Future Looks Hot and Stable
With our assistance using machine optimization and data science, TAE achieved their major goals for Norman, which brings us a step closer to the goal of breakeven fusion. The machine maintains a stable plasma at 30 million Kelvin for 30 milliseconds, which is the extent of available power to its systems. They have completed a design for an even more powerful machine, which they hope will demonstrate the conditions necessary for breakeven fusion before the end of the decade. TAE has succeeded with two complete machine builds during our collaboration, and we are really excited to see the third.

Acknowledgments
We wish to thank Michael Dikovsky, Ian Langmore, Peter Norgaard, Scott Geraedts, Rob von Behren, Bill Heavlin, Anton Kast, Tom Madams, John Platt, Ross Koningstein, and Matt Trevithick for their contributions to this work. We thank the TensorFlow Probability team for considerable implementation assistance. Furthermore, we thank Jeff Dean for visiting TAE’s facility in Southern California and providing thoughtful suggestions. As always we are grateful to our colleagues at TAE Technologies for the opportunity to work on such a fascinating and important problem.



1 Google owns stock and warrants in TAE Technologies.
