Part 2: Fast, scalable and accurate NLP: Why TFX is a perfect match for deploying BERT

Guest author Hannes Hapke, Senior Data Scientist, SAP Concur Labs. Edited by Robert Crowe on behalf of the TFX team

Transformer models and the concepts of transfer learning in Natural Language Processing have opened up new opportunities around tasks like sentiment analysis, entity extraction, and question answering.

BERT models allow data scientists to stand on the shoulders of giants. Pre-trained on large corpora, these multi-purpose transformer models let data scientists apply transfer learning and achieve state-of-the-art results for their domain-specific problems.

In part one of our blog post, we discussed why current deployments of BERT models felt too complex and cumbersome and how the deployment can be simplified through libraries and extensions of the TensorFlow ecosystem. If you haven’t checked out the post, we recommend it as a primer for the implementation discussion in this blog post.

At SAP Concur Labs, we looked at simplifying our BERT deployments and we discovered that the TensorFlow ecosystem provides the perfect tools to achieve simple and concise Transformer deployments. In this blog post, we want to take you on a deep dive of our implementation and how we use components of the TensorFlow ecosystem to achieve scalable, efficient and fast BERT deployments.

Want to jump ahead to the code?

If you would like to jump to the complete example, check out the Colab notebook. It showcases the entire TensorFlow Extended (TFX) pipeline we used to produce a deployable BERT model with the preprocessing steps as part of the model graph. If you want to try out our demo deployment, check out our demo page at SAP ConcurLabs showcasing our sentiment classification project.
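
For orientation, here is a rough sketch of how the core TFX components of such a pipeline can be wired together. This is not a copy of the notebook code, and component signatures differ slightly between TFX releases, so treat the constructor arguments, module file names, and step counts below as assumptions rather than the exact pipeline definition.

from tfx.components import CsvExampleGen, StatisticsGen, SchemaGen, Transform, Trainer
from tfx.components.base import executor_spec
from tfx.components.trainer.executor import GenericExecutor
from tfx.proto import trainer_pb2
from tfx.utils.dsl_utils import external_input

# Hypothetical data path and module files; the Colab notebook uses its own.
example_gen = CsvExampleGen(input=external_input('data/'))
statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
schema_gen = SchemaGen(statistics=statistics_gen.outputs['statistics'])

transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file='bert_transform.py')  # defines the preprocessing_fn shown below

trainer = Trainer(
    module_file='bert_trainer.py',    # defines the run_fn shown below
    custom_executor_spec=executor_spec.ExecutorClassSpec(GenericExecutor),
    examples=transform.outputs['transformed_examples'],
    transform_graph=transform.outputs['transform_graph'],
    schema=schema_gen.outputs['schema'],
    train_args=trainer_pb2.TrainArgs(num_steps=1000),
    eval_args=trainer_pb2.EvalArgs(num_steps=100))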

Why use TensorFlow Transform for Preprocessing?

Before we answer this question, let’s take a quick look at how a BERT transformer works and how BERT is currently deployed.

What preprocessing does BERT require?

Transformers like BERT are initially trained with two main tasks in mind: masked language modeling and next sentence prediction (NSP). These tasks require an input data structure that goes beyond the raw input text. Besides the tokenized input text, the BERT model therefore requires a tensor input_type_ids to distinguish between different sentences. A second tensor, input_mask, notes the relevant tokens within the input_word_ids tensor. This is required because we will expand our input_word_ids tensors with pad tokens to reach the maximum sequence length. That way all input_word_ids tensors have the same length, but the transformer can still distinguish between relevant tokens (tokens from our input sentence) and irrelevant padding (filler tokens).

Figure 1: BERT tokenization
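
As a concrete, hand-made illustration of these three structures, assuming a maximum sequence length of 10 and the usual uncased BERT special token ids (101 for [CLS], 102 for [SEP], 0 for the pad token), a single short sentence might be encoded roughly like this:

# Illustrative values only; the real ids come from the BERT vocabulary.
input_word_ids = [101, 7592, 2088, 999, 102, 0, 0, 0, 0, 0]  # [CLS] hello world ! [SEP] + padding
input_mask     = [  1,    1,    1,   1,   1, 0, 0, 0, 0, 0]  # 1 = relevant token, 0 = padding
input_type_ids = [  0,    0,    0,   0,   0, 0, 0, 0, 0, 0]  # everything belongs to sentence A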

Currently, with most transformer model deployments, the tokenization and the conversion of the input text are handled either on the client side or on the server side, as part of a preprocessing step outside of the actual model prediction.

This brings a few complexities with it: if the preprocessing happens on the client side, then all clients need to be updated whenever the mapping between tokens and ids changes (e.g., when we want to add a new token). Most deployments with server-side preprocessing use a Flask-based web application to accept client requests for model predictions, tokenize and convert the input sentences, and then submit the resulting data structures to the deep learning model. Having to maintain two “systems” (one for the preprocessing and one for the actual model inference) is not just cumbersome and error-prone, it also makes it difficult to scale.

Figure 2: Current BERT deployments
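
A minimal sketch of this “two systems” pattern, to make the problem concrete: a small Flask app preprocesses each request and forwards the resulting tensors to a separate TensorFlow Serving instance. The tokenize_and_pad helper, the model name, and the serving URL are placeholders, not code from an actual deployment.

import json
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

def tokenize_and_pad(text):
    # Hypothetical helper: converts raw text into input_word_ids, input_mask,
    # and input_type_ids with a Python-side BERT tokenizer and the vocabulary.
    raise NotImplementedError

@app.route('/predict', methods=['POST'])
def predict():
    text = request.get_json()['text']
    input_word_ids, input_mask, input_type_ids = tokenize_and_pad(text)
    payload = {'instances': [{
        'input_word_ids': input_word_ids,
        'input_mask': input_mask,
        'input_type_ids': input_type_ids}]}
    # The second "system": the actual model server.
    response = requests.post('http://model-server:8501/v1/models/bert:predict',
                             data=json.dumps(payload))
    return jsonify(response.json())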

It would be great if we could get the best of both solutions: easy scalability and simple upgradeability. With TensorFlow Transform (TFT), we can achieve both requirements by building the preprocessing steps as a graph, exporting them together with the deep learning model, and ultimately only deploying one “system” (our combined deep learning model with the integrated preprocessing functionality). It’s worth pointing out that moving all of BERT into preprocessing is not an option when we want to fine-tune the tf.hub module of BERT for our domain-specific task.

Figure 3: BERT with TFX

Processing Natural Language with tf.text

In 2019, the TensorFlow team released a new tensor type: RaggedTensors, which allow storing arrays of different lengths in a single tensor. RaggedTensors are especially useful in NLP applications, e.g., when we tokenize a 1-D array of sentences into a 2-D RaggedTensor whose rows have different lengths.

Before tokenization:

[
“Clara is playing the piano.”
“Maria likes to play soccer.”
“Hi Tom!”
]

After the tokenization:

[
[[b'clara'], [b'is'], [b'playing'], [b'the'], [b'piano'], [b'.']],
[[b'maria'], [b'likes'], [b'to'], [b'play'], [b'soccer'], [b'.']],
[[b'hi'], [b'tom'], [b'!']]
]
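
The listing above shows the output of the BERT wordpiece tokenizer used later in this post. To see the ragged structure itself, a quick way to produce a RaggedTensor from a batch of sentences is a plain whitespace split; this sketch only illustrates the data type, not the BERT tokenization:

import tensorflow as tf

sentences = tf.constant([
    "Clara is playing the piano.",
    "Maria likes to play soccer.",
    "Hi Tom!",
])
tokens = tf.strings.split(sentences)  # returns a tf.RaggedTensor
print(tokens.shape)                   # (3, None) - the second dimension is ragged
print(tokens[2])                      # tf.Tensor([b'Hi' b'Tom!'], shape=(2,), dtype=string)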

As we will see in a bit, we use RaggedTensors in our preprocessing pipeline. In late October 2019, the TensorFlow team then released an update to the tf.text module which provides the wordpiece tokenization required to preprocess BERT model inputs.

import tensorflow as tf
import tensorflow_text as text

# bert_layer is the BERT hub.KerasLayer loaded from TFHub (see below).
vocab_file_path = bert_layer.resolved_object.vocab_file.asset_path.numpy()
do_lower_case = bert_layer.resolved_object.do_lower_case.numpy()

bert_tokenizer = text.BertTokenizer(
    vocab_lookup_table=vocab_file_path,
    token_out_type=tf.int64,
    lower_case=do_lower_case)

TFText provides a comprehensive tokenizer, BertTokenizer, specifically for the wordpiece tokenization required by the BERT model. The tokenizer returns the tokenization results either as strings (tf.string) or already converted to word ids (integers, tf.int64 in our case).

NOTE: The tf.text version needs to match the imported TensorFlow version. If you use TensorFlow 2.2.x, you will need to install TensorFlow Text version 2.2.x, not 2.1.x or 2.0.x.
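
A quick sanity check that the installed versions line up (not part of the pipeline code):

import tensorflow as tf
import tensorflow_text as text

# Both should report the same minor version, e.g. 2.2.x.
print(tf.__version__)
print(text.__version__)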

How can we preprocess text with TensorFlow Transform?

Earlier, we discussed that we need to convert any input text to our Transformer model into the required data structure of input_word_ids, input_mask, and input_type_ids. We can perform the conversion with TensorFlow Transform. Let’s have a closer look.

For our example model, we want to classify the sentiment of IMDB reviews using the BERT model.

‘This is the best movie I have ever seen ...’        -> 1
‘Probably the worst movie produced in 2019 ...’      -> 0
‘Tom Hanks’ performance turns this movie into ...’   -> ?

That means we’ll submit only one sentence with every prediction. In practice, all submitted tokens are therefore relevant for the prediction (noted by a vector of ones) and every token belongs to sentence A (noted by a vector of zeros). We won’t submit a sentence B in our classification case.

If you want to use a BERT model for other tasks, e.g., predicting the similarity of two sentences, entity extraction or question-answer tasks, you would have to adjust the preprocessing step.

Since we want to export the preprocessing steps as a graph, we need to use TensorFlow ops for all preprocessing steps exclusively. Due to this requirement, we can’t reuse functions of Python’s standard library which are implemented in CPython.

The BertTokenizer provided by TFText handles the preprocessing of the incoming raw text data. There is no need to lower-case your strings (if you use the uncased BERT model) or to remove unsupported characters yourself. The tokenizer requires a table of the supported tokens as input. The tokens can be provided as a TensorFlow LookupTable or simply as a file path to a vocabulary file. The BERT model from TFHub provides such a file, and we can determine the file path with:

import tensorflow_hub as hub

BERT_TFHUB_URL = "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/2"
bert_layer = hub.KerasLayer(handle=BERT_TFHUB_URL, trainable=True)
vocab_file_path = bert_layer.resolved_object.vocab_file.asset_path.numpy()

Similarly, we can determine if the loaded BERT model is case-sensitive or not.

do_lower_case = bert_layer.resolved_object.do_lower_case.numpy()

We can now pass the two arguments to our TFText BertTokenizer and specify the data type of our tokens. Since we are passing the tokens to the BERT model as token indices, we request them as int64 integers.

bert_tokenizer = text.BertTokenizer(
    vocab_lookup_table=vocab_file_path,
    token_out_type=tf.int64,
    lower_case=do_lower_case
)

After instantiating the BertTokenizer, we can perform the tokenization with its tokenize method.

 tokens = bert_tokenizer.tokenize(text)
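
Note that BertTokenizer.tokenize returns a RaggedTensor with an extra ragged dimension for the word pieces of each word (shape [batch, words, wordpieces]). Before the truncation and padding steps below, that extra dimension needs to be merged away; the following line is a sketch of such a step, not necessarily verbatim from our pipeline.

# Assumed flattening step: [batch, words, wordpieces] -> [batch, tokens]
tokens = tokens.merge_dims(1, 2)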

Once the sentence is tokenized into token ids, we need to prepend the start token (CLS) and append a separator token (SEP).

 CLS_ID = tf.constant(101, dtype=tf.int64)
SEP_ID = tf.constant(102, dtype=tf.int64)
start_tokens = tf.fill([tf.shape(text)[0], 1], CLS_ID)
end_tokens = tf.fill([tf.shape(text)[0], 1], SEP_ID)
tokens = tokens[:, :sequence_length - 2]
tokens = tf.concat([start_tokens, tokens, end_tokens], axis=1)

At this point, our token tensors are still ragged tensors with different lengths. TensorFlow Transform expects all tensors to have the same length, so we truncate the tensors to a maximum length (MAX_SEQ_LEN) and pad shorter tensors with a defined pad token.

PAD_ID = tf.constant(0, dtype=tf.int64)
tokens = tokens.to_tensor(default_value=PAD_ID)
padding = sequence_length - tf.shape(tokens)[1]
tokens = tf.pad(tokens,
                [[0, 0], [0, padding]],
                constant_values=PAD_ID)

This last step leaves us with token vectors of constant length and completes the major preprocessing work. Based on the token vectors, we can create the two additional required data structures, input_mask and input_type_ids.

For the input_mask, we want to mark all relevant tokens, basically all tokens except the pad tokens. Since the pad token has the value zero and all real token ids are greater than zero, we can define the input_mask with the following ops.

input_word_ids = tokenize_text(text)
input_mask = tf.cast(input_word_ids > 0, tf.int64)
input_mask = tf.reshape(input_mask, [-1, MAX_SEQ_LEN])

Determining the input_type_ids is even simpler in our case. Since we are only submitting one sentence, the type ids are all zero in our classification example.

input_type_ids = tf.zeros_like(input_mask)

To complete the preprocessing setup, we will wrap all steps in the preprocessing_fn function which is required by TensorFlow Transform.

def preprocessing_fn(inputs):

    def tokenize_text(text, sequence_length=MAX_SEQ_LEN):
        ...
        return tf.reshape(tokens, [-1, sequence_length])

    def preprocess_bert_input(text, segment_id=0):
        input_word_ids = tokenize_text(text)
        ...
        return (
            input_word_ids,
            input_mask,
            input_type_ids
        )

    ...

    input_word_ids, input_mask, input_type_ids = \
        preprocess_bert_input(_fill_in_missing(inputs['text']))

    return {
        'input_word_ids': input_word_ids,
        'input_mask': input_mask,
        'input_type_ids': input_type_ids,
        'label': inputs['label']
    }

Train the Classification Model

The latest updates of TFX allow the use of native Keras models. In the example code below, we define our classification model. The model takes advantage of the pretrained BERT model and KerasLayer provided by TFHub. To avoid any misalignment between the transform step and the model training, we are creating the input layers dynamically based on the feature specification provided by the transformation step.

feature_spec = tf_transform_output.transformed_feature_spec()
feature_spec.pop(_LABEL_KEY)

inputs = {
    key: tf.keras.layers.Input(
        shape=(max_seq_length,),
        name=key,
        dtype=tf.int32)
    for key in feature_spec.keys()}

We need to cast the variables since TensorFlow Transform can only output variables as one of the types: tf.string, tf.int64 or tf.float32 (tf.int64 in our case). However, the BERT model from TensorFlow Hub used in our Keras model above expects tf.int32 inputs. So, in order to align the two TensorFlow components, we need to cast the inputs in the input functions or in the model graph before passing them to the instantiated BERT layer.

 input_word_ids = tf.cast(inputs["input_word_ids"], dtype=tf.int32)
input_mask = tf.cast(inputs["input_mask"], dtype=tf.int32)
input_type_ids = tf.cast(inputs["input_type_ids"], dtype=tf.int32)

Once our inputs are converted to tf.int32, we can pass them to our BERT layer. The layer returns two data structures: a pooled output, which represents the context vector for the entire text, and a list of vectors providing a context-specific representation for each submitted token. Since we are only interested in classifying the entire text, we can ignore the second data structure.

bert_layer = load_bert_layer()
pooled_output, _ = bert_layer(
    [input_word_ids,
     input_mask,
     input_type_ids
    ]
)

Afterwards, we can assemble our classification model with tf.keras. In our example, we used the functional Keras API.

x = tf.keras.layers.Dense(256, activation='relu')(pooled_output)
dense = tf.keras.layers.Dense(64, activation='relu')(x)
pred = tf.keras.layers.Dense(1, activation='sigmoid')(dense)

model = tf.keras.Model(
    inputs=[inputs['input_word_ids'],
            inputs['input_mask'],
            inputs['input_type_ids']],
    outputs=pred
)
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

The Keras model can then be consumed by our run_fn function which is called by the TFX Trainer component. With the recent updates to TFX, the integration of Keras models was simplified. No “detour” with TensorFlow’s model_to_estimator function is required anymore. We can now define a generic run_fn function which executes the model training and exports the model after the completion of the training.

Here is an example of the setup of a run_fn function to work with the latest TFX version:

def run_fn(fn_args: TrainerFnArgs):
    tf_transform_output = tft.TFTransformOutput(fn_args.transform_output)
    train_dataset = _input_fn(
        fn_args.train_files, tf_transform_output, 32)
    eval_dataset = _input_fn(
        fn_args.eval_files, tf_transform_output, 32)

    mirrored_strategy = tf.distribute.MirroredStrategy()
    with mirrored_strategy.scope():
        model = get_model(tf_transform_output=tf_transform_output)

    model.fit(
        train_dataset,
        steps_per_epoch=fn_args.train_steps,
        validation_data=eval_dataset,
        validation_steps=fn_args.eval_steps)

    signatures = {
        'serving_default':
            _get_serve_tf_examples_fn(model, tf_transform_output
                                      ).get_concrete_function(
                tf.TensorSpec(
                    shape=[None],
                    dtype=tf.string,
                    name='examples')),
    }
    model.save(
        fn_args.serving_model_dir,
        save_format='tf',
        signatures=signatures)

It is worth taking special note of a few lines in the example Trainer function. With the latest release of TFX, we can now take advantage of the distribution strategies introduced in Keras last year within our TFX Trainer components.

mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    model = get_model(tf_transform_output=tf_transform_output)

It is most efficient to preprocess the data sets ahead of the model training, since this allows for faster training, especially when the trainer passes over the same data set multiple times. Therefore, TensorFlow Transform performs the preprocessing prior to the training and evaluation, and stores the preprocessed data as TFRecords:

{'input_mask': array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
'input_type_ids': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
'input_word_ids': array([ 101, 2023, 3319, 3397, 27594, 2545, 2005, 2216, 2040, ..., 2014, 102]),
'label': array([0], dtype=float32)}

At the same time, this allows us to generate a preprocessing graph which can then be applied at prediction time, after training. Because we reuse the same preprocessing graph, we avoid skew between the training and the serving preprocessing.

In our run_fn function we can then “wire up” the preprocessed training and evaluation data sets instead of the raw data sets to be used during the training:

tf_transform_output = tft.TFTransformOutput(fn_args.transform_output)
train_dataset = _input_fn(fn_args.train_files, tf_transform_output, 32)
eval_dataset = _input_fn(fn_args.eval_files, tf_transform_output, 32)
...
model.fit(
    train_dataset,
    validation_data=eval_dataset,
    ...)
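
The _input_fn helper is not shown in the snippets above. A minimal sketch of what it could look like, assuming the transformed examples are stored as gzip-compressed TFRecord files and that the label feature is named 'label' as in our preprocessing_fn:

def _input_fn(file_pattern, tf_transform_output, batch_size):
    # Feature spec of the already transformed (preprocessed) examples.
    transformed_feature_spec = (
        tf_transform_output.transformed_feature_spec().copy())

    # Read, parse, and batch the TFRecords written by the Transform component.
    dataset = tf.data.experimental.make_batched_features_dataset(
        file_pattern=file_pattern,
        batch_size=batch_size,
        features=transformed_feature_spec,
        reader=lambda filenames: tf.data.TFRecordDataset(
            filenames, compression_type='GZIP'),
        label_key='label')
    return dataset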

Once the training is completed, we can export our trained model together with its preprocessing steps.

Export the Model with its Preprocessing Graph

After model.fit() completes the model training, we call model.save() to export the model in the SavedModel format. In our model signature definition, we call the function _get_serve_tf_examples_fn(), which parses the serialized tf.Example records submitted to our TensorFlow Serving endpoint (in our case, the raw text strings to be classified) and then applies the transformations preserved in the TensorFlow Transform graph. The model prediction is then performed on the transformed features, which are the output of the model.tft_layer(parsed_features) call. In our case, these are the BERT token ids, mask ids, and type ids.

def _get_serve_tf_examples_fn(model, tf_transform_output):
    model.tft_layer = tf_transform_output.transform_features_layer()

    @tf.function
    def serve_tf_examples_fn(serialized_tf_examples):
        feature_spec = tf_transform_output.raw_feature_spec()
        feature_spec.pop(_LABEL_KEY)
        parsed_features = tf.io.parse_example(serialized_tf_examples, feature_spec)

        transformed_features = model.tft_layer(parsed_features)
        return model(transformed_features)

    return serve_tf_examples_fn

The _get_serve_tf_examples_fn() function is the important connection between the transformation graph generated by TensorFlow Transform and the trained tf.keras model. Because the prediction input is passed through model.tft_layer, the exported SavedModel is guaranteed to include the same preprocessing that was performed during training. The SavedModel is one graph, consisting of both the preprocessing and the model graphs.
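
A quick way to double-check that the exported SavedModel indeed expects raw serialized tf.Example strings (the path below is a placeholder for wherever model.save() wrote the model):

import tensorflow as tf

loaded = tf.saved_model.load('path/to/serving_model_dir')  # placeholder path
serving_fn = loaded.signatures['serving_default']
# Should show a single string input named 'examples'.
print(serving_fn.structured_input_signature)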

With the deployment of the BERT classification model through TensorFlow Serving, we can now submit raw strings to our model server (submitted as tf.Example records) and receive a prediction result without any preprocessing on the client side or a complicated model deployment with a preprocessing step.
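
As an illustration, a client request could look like the following sketch: we wrap the raw review text in a serialized tf.Example, base64-encode it, and post it to the TensorFlow Serving REST API. The feature name 'text', the model name, and the endpoint URL are assumptions that need to match your deployment.

import base64
import json
import requests
import tensorflow as tf

def build_example(text):
    # Wrap the raw review text in a serialized tf.Example with a 'text' feature.
    feature = {
        'text': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[text.encode('utf-8')]))
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()

serialized_example = build_example('This is the best movie I have ever seen ...')
payload = {
    'signature_name': 'serving_default',
    'instances': [
        {'examples': {'b64': base64.b64encode(serialized_example).decode('utf-8')}}
    ],
}
response = requests.post(
    'http://localhost:8501/v1/models/bert_sentiment:predict',  # hypothetical endpoint
    data=json.dumps(payload))
print(response.json())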

Future work

The presented work allows a simplified deployment of BERT models. The preprocessing steps shown in our demo project can easily be extended to handle more complicated preprocessing, e.g., for tasks like entity extraction or question answering. We are also investigating whether the prediction latency can be further reduced by reusing a quantized or distilled version of the pre-trained BERT model (e.g., ALBERT).

Thank you for reading our two-part blog post. Feel free to get in touch if you have questions or recommendations by email.

Further Reading

If you are interested in an overview of the TensorFlow libraries we used in this project, we recommend part one of this blog post.

In case you want to try out our demo deployment, check out our demo page at SAP ConcurLabs showcasing our sentiment classification project.

If you are interested in the inner workings of TensorFlow Extended (TFX) and TensorFlow Transform, check out this upcoming O’Reilly publication “Building Machine Learning Pipelines with TensorFlow” (pre-release available online).

For more information

To learn more about TFX check out the TFX website, join the TFX discussion group, dive into other posts in the TFX blog, watch our TFX playlist on YouTube, and subscribe to the TensorFlow channel.

Acknowledgments

This project wouldn’t have been possible without the tremendous support from Catherine Nelson, Richard Puckett, Jessica Park, Robert Reed, and the SAP Concur Labs team. Thanks also go out to Robert Crowe, Irene Giannoumis, Robby Neale, Konstantinos Katsiapis, Arno Eigenwillig, and the rest of the TensorFlow team for discussing implementation details and for the detailed review of this post. A special thanks to Varshaa Naganathan, Zohar Yahav, and Terry Huang from Google’s TensorFlow team for providing updates to the TensorFlow libraries to make this pipeline implementation possible. Big thanks also to Cole Howard from Talenpair for always enlightening discussions about Natural Language Processing.
Read More

Engineers put tens of thousands of artificial brain synapses on a single chip

MIT engineers have designed a “brain-on-a-chip,” smaller than a piece of confetti, that is made from tens of thousands of artificial brain synapses known as memristors — silicon-based components that mimic the information-transmitting synapses in the human brain.

The researchers borrowed from principles of metallurgy to fabricate each memristor from alloys of silver and copper, along with silicon. When they ran the chip through several visual tasks, the chip was able to “remember” stored images and reproduce them many times over, in versions that were crisper and cleaner compared with existing memristor designs made with unalloyed elements.

Their results, published today in the journal Nature Nanotechnology, demonstrate a promising new memristor design for neuromorphic devices — electronics that are based on a new type of circuit that processes information in a way that mimics the brain’s neural architecture. Such brain-inspired circuits could be built into small, portable devices, and would carry out complex computational tasks that only today’s supercomputers can handle.

“So far, artificial synapse networks exist as software. We’re trying to build real neural network hardware for portable artificial intelligence systems,” says Jeehwan Kim, associate professor of mechanical engineering at MIT. “Imagine connecting a neuromorphic device to a camera on your car, and having it recognize lights and objects and make a decision immediately, without having to connect to the internet. We hope to use energy-efficient memristors to do those tasks on-site, in real-time.”

Wandering ions

Memristors, or memory resistors, are an essential element in neuromorphic computing. In a neuromorphic device, a memristor would serve as the transistor in a circuit, though its workings would more closely resemble a brain synapse — the junction between two neurons. The synapse receives signals from one neuron, in the form of ions, and sends a corresponding signal to the next neuron.

A transistor in a conventional circuit transmits information by switching between one of only two values, 0 and 1, and doing so only when the signal it receives, in the form of an electric current, is of a particular strength. In contrast, a memristor would work along a gradient, much like a synapse in the brain. The signal it produces would vary depending on the strength of the signal that it receives. This would enable a single memristor to have many values, and therefore carry out a far wider range of operations than binary transistors.

Like a brain synapse, a memristor would also be able to “remember” the value associated with a given current strength, and produce the exact same signal the next time it receives a similar current. This could ensure that the answer to a complex equation, or the visual classification of an object, is reliable — a feat that normally involves multiple transistors and capacitors.

Ultimately, scientists envision that memristors would require far less chip real estate than conventional transistors, enabling powerful, portable computing devices that do not rely on supercomputers, or even connections to the Internet.

Existing memristor designs, however, are limited in their performance. A single memristor is made of a positive and negative electrode, separated by a “switching medium,” or space between the electrodes. When a voltage is applied to one electrode, ions from that electrode flow through the medium, forming a “conduction channel” to the other electrode. The received ions make up the electrical signal that the memristor transmits through the circuit. The size of the ion channel (and the signal that the memristor ultimately produces) should be proportional to the strength of the stimulating voltage.

Kim says that existing memristor designs work pretty well in cases where voltage stimulates a large conduction channel, or a heavy flow of ions from one electrode to the other. But these designs are less reliable when memristors need to generate subtler signals, via thinner conduction channels.

The thinner a conduction channel, and the lighter the flow of ions from one electrode to the other, the harder it is for individual ions to stay together. Instead, they tend to wander from the group, disbanding within the medium. As a result, it’s difficult for the receiving electrode to reliably capture the same number of ions, and therefore transmit the same signal, when stimulated with a certain low range of current.

Borrowing from metallurgy

Kim and his colleagues found a way around this limitation by borrowing a technique from metallurgy, the science of melding metals into alloys and studying their combined properties.

“Traditionally, metallurgists try to add different atoms into a bulk matrix to strengthen materials, and we thought, why not tweak the atomic interactions in our memristor, and add some alloying element to control the movement of ions in our medium,” Kim says.

Engineers typically use silver as the material for a memristor’s positive electrode. Kim’s team looked through the literature to find an element that they could combine with silver to effectively hold silver ions together, while allowing them to flow quickly through to the other electrode.

The team landed on copper as the ideal alloying element, as it is able to bind both with silver, and with silicon.

“It acts as a sort of bridge, and stabilizes the silver-silicon interface,” Kim says.

To make memristors using their new alloy, the group first fabricated a negative electrode out of silicon, then made a positive electrode by depositing a slight amount of copper, followed by a layer of silver. They sandwiched the two electrodes around an amorphous silicon medium. In this way, they patterned a millimeter-square silicon chip with tens of thousands of memristors.

As a first test of the chip, they recreated a gray-scale image of the Captain America shield. They equated each pixel in the image to a corresponding memristor in the chip. They then modulated the conductance of each memristor so that its strength corresponded to the shade of the matching pixel.

The chip produced the same crisp image of the shield, and was able to “remember” the image and reproduce it many times, compared with chips made of other materials.

The team also ran the chip through an image processing task, programming the memristors to alter an image, in this case of MIT’s Killian Court, in several specific ways, including sharpening and blurring the original image. Again, their design produced the reprogrammed images more reliably than existing memristor designs.

“We’re using artificial synapses to do real inference tests,” Kim says. “We would like to develop this technology further to have larger-scale arrays to do image recognition tasks. And some day, you might be able to carry around artificial brains to do these kinds of tasks, without connecting to supercomputers, the internet, or the cloud.”

This research was funded, in part, by the MIT Research Support Committee funds, the MIT-IBM Watson AI Lab, Samsung Global Research Laboratory, and the National Science Foundation.

Read More

Announcing the winners of the Measuring Economic Impact in the Digital Economy research awards

In January 2020, Facebook launched a request for research proposals that address the economic impact of digital technologies. For this RFP, we pledged $1 million in research award funding as part of Facebook’s continued goal of supporting independent research that helps us better understand and measure the impact of the digital economy. Today, we are announcing the winners of these research awards.

“The survival of small and medium-sized businesses hinges on their being able to operate and reach as many customers as possible during the COVID-19 pandemic,” says Arturo Gonzalez, Director of Advocacy and Research at Facebook. “Part of the solution is being online to the fullest extent possible.

“Facebook believes that research and the academic community play an important role in analyzing the impact of digital tools for small and medium-sized businesses. Our goal for this RFP was to foster further innovation in academia in this space, to accelerate the production of high-quality, independent, and data-driven research that, when all is said and done, advances the goal of helping small businesses.”

We were especially interested in proposals that addressed the economic consequences and implications of developing digital technologies for small businesses. We were additionally interested in research projects in the following topics, although we encouraged creative approaches beyond these areas:

  • A taxonomy of the digital economy, including subsectors within the digital economy
    • The demarcations between the analog and digital economies
    • Shortcomings (if any) of national accounts and other official statistics of GDP
  • Theoretical models of economic impact of the digital economy
    • How to obtain causal impacts and/or identify biases found in noncausal estimates
    • Incorporate the feedback effects due to the multisided nature of platforms
    • What needs to be solved for in order to estimate the model
  • Empirical application of theoretical models of causal economic impact
    • The biases (if any) present in reduced-form empirical model
    • Limitations of the model and results, including in scope (geographic, size of business), due to data availability or other issues

We received 57 proposals from 16 countries. Thank you to all the researchers who took the time to submit a proposal, and congratulations to the award recipients. For more information about areas of interest, eligibility, requirements, and more, visit the application page.

Research award winners

Digital economy and regional inequalities in the UK
Raquel Ortega-Artiles, Emmanouil Tranos, Giulia Occhini, Levi Wolf, Tasos Kitsos (University of Birmingham)

The expansion of the internet and economic growth worldwide
Paul Raschky, Klaus Ackermann, Simon Angus (Monash University)

GDP-B: A new well-being metric in the era of the digital economy
Erik Brynjolfsson, Avinash Collis, Jae Joon Lee (Massachusetts Institute of Technology)

The impact of online listings on small business performance
Abhishek Nagaraj, Gauri Subramani, Michael Luca (University of California, Berkeley)

Measuring the impact of the digital economy in Canada
Sarah Doyle, David Wolfe (Ryerson University)

Modeling and measuring the economic impacts of digital platform innovation
Johannes M. Bauer, Steven S. Wildman, Tiago Sousa Prado (Michigan State University)

Social sharing and the growth of small businesses: Measurement and strategy
Tianshu Sun (University of Southern California)

The post Announcing the winners of the Measuring Economic Impact in the Digital Economy research awards appeared first on Facebook Research.

Read More

If transistors can’t get smaller, then coders have to get smarter

In 1965, Intel co-founder Gordon Moore predicted that the number of transistors that could fit on a computer chip would grow exponentially — and they did, doubling about every two years. For half a century, Moore’s Law has endured: Computers have gotten smaller, faster, cheaper, and more efficient, enabling the rapid worldwide adoption of PCs, smartphones, high-speed internet, and more.

This miniaturization trend has led to silicon chips today that have almost unimaginably small circuitry. Transistors, the tiny switches that implement computer microprocessors, are so small that 1,000 of them laid end-to-end are no wider than a human hair. And for a long time, the smaller the transistors were, the faster they could switch. But today, we’re approaching the limit of how small transistors can get. As a result, over the past decade researchers have been scratching their heads to find other ways to improve performance so that the computer industry can continue to innovate.

While we wait for the maturation of new computing technologies like quantum, carbon nanotubes, or photonics (which may take a while), other approaches will be needed to get performance as Moore’s Law comes to an end. In a recent journal article published in Science, a team from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) identifies three key areas to prioritize to continue to deliver computing speed-ups: better software, new algorithms, and more streamlined hardware.

Senior author Charles E. Leiserson says that the performance benefits from miniaturization have been so great that, for decades, programmers have been able to prioritize making code-writing easier rather than making the code itself run faster. The inefficiency that this tendency introduces has been acceptable, because faster computer chips have always been able to pick up the slack.

“But nowadays, being able to make further advances in fields like machine learning, robotics, and virtual reality will require huge amounts of computational power that miniaturization can no longer provide,” says Leiserson, the Edwin Sibley Webster Professor in MIT’s Department of Electrical Engineering and Computer Science. “If we want to harness the full potential of these technologies, we must change our approach to computing.”

Leiserson co-wrote the paper, published this week, with Research Scientist Neil Thompson, Professor Daniel Sanchez, Adjunct Professor Butler Lampson, and research scientists Joel Emer, Bradley Kuszmaul, and Tao Schardl.

No more Moore

The authors make recommendations about three areas of computing: software, algorithms, and hardware architecture.

With software, they say that programmers’ previous prioritization of productivity over performance has led to problematic strategies like “reduction”: taking code that worked on problem A and using it to solve problem B. For example, if someone has to create a system to recognize yes-or-no voice commands, but doesn’t want to code a whole new custom program, they could take an existing program that recognizes a wide range of words and tweak it to respond only to yes-or-no answers.

While this approach reduces coding time, the inefficiencies it creates quickly compound: if a single reduction is 80 percent as efficient as a custom solution, and you then add 20 layers of reduction, the code will ultimately be 100 times less efficient than it could be.
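
The compounding is easy to verify with a rough back-of-the-envelope calculation (assuming each layer retains 80 percent of the efficiency of the layer below it):

# 20 layers of reduction, each 80 percent as efficient as the layer below it.
efficiency = 0.8 ** 20
print(efficiency)      # ~0.0115
print(1 / efficiency)  # ~87, i.e., on the order of 100 times less efficient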

“These are the kinds of strategies that programmers have to rethink as hardware improvements slow down,” says Thompson. “We can’t keep doing ‘business as usual’ if we want to continue to get the speed-ups we’ve grown accustomed to.”

Instead, the researchers recommend techniques like parallelizing code. Much existing software has been designed using ancient assumptions that processors can do only one operation at a time. But in recent years multicore technology has enabled complex tasks to be completed thousands of times faster and in a much more energy-efficient way.

“Since Moore’s Law will not be handing us improved performance on a silver platter, we will have to deliver performance the hard way,” says Moshe Vardi, a professor in computational engineering at Rice University. “This is a great opportunity for computing research, and the [MIT CSAIL] report provides a road map for such research.” 

As for algorithms, the team suggests a three-pronged approach that includes exploring new problem areas, addressing concerns about how algorithms scale, and tailoring them to better take advantage of modern hardware.

Lastly, in terms of hardware architecture, the team advocates that hardware be streamlined so that problems can be solved with fewer transistors and less silicon. Streamlining includes using simpler processors and creating hardware tailored to specific applications, the way a graphics processing unit is tailored for computer graphics.

“Hardware customized for particular domains can be much more efficient and use far fewer transistors, enabling applications to run tens to hundreds of times faster,” says Schardl. “More generally, hardware streamlining would further encourage parallel programming, creating additional chip area to be used for more circuitry that can operate in parallel.”

While these approaches may be the best path forward, the researchers say that it won’t always be an easy one. Organizations that use such techniques may not know the benefits of their efforts until after they’ve invested a lot of engineering time. Plus, the speed-ups won’t be as consistent as they were with Moore’s Law: they may be dramatic at first, and then require large amounts of effort for smaller improvements. 

Certain companies have already gotten the memo.

“For tech giants like Google and Amazon, the huge scale of their data centers means that even small improvements in software performance can result in large financial returns,” says Thompson.  “But while these firms may be leading the charge, many others will need to take these issues seriously if they want to stay competitive.”

Getting improvements in the areas identified by the team will also require building up the infrastructure and workforce that make them possible.  

“Performance growth will require new tools, programming languages, and hardware to facilitate more and better performance engineering,” says Leiserson. “It also means computer scientists being better educated about how we can make software, algorithms, and hardware work together, instead of putting them in different silos.”

This work was supported, in part, by the National Science Foundation.

Read More

Inside Uber ATG’s Data Mining Operation: Identifying Real Road Scenarios at Scale for Machine Learning

How did the pedestrian cross the road?

Contrary to popular belief, sometimes the answer isn’t as simple as “to get to the other side.” To bring safe, reliable self-driving vehicles (SDVs) to the streets at Uber Advanced Technologies Group (ATG) …

The post Inside Uber ATG’s Data Mining Operation: Identifying Real Road Scenarios at Scale for Machine Learning appeared first on Uber Engineering Blog.

Read More