Announcing the Patent Phrase Similarity Dataset

Patent documents typically use legal and highly technical language, with context-dependent terms that may have meanings quite different from colloquial usage, and that may even differ between documents. Searching the corpus of over one hundred million patent documents with traditional methods (e.g., keyword searching) can be tedious and can miss many relevant results because of the broad and non-standard language used. For example, a “soccer ball” may be described as a “spherical recreation device”, “inflatable sportsball” or “ball for ball game”. Additionally, some patent documents may deliberately obfuscate terms, so more powerful natural language processing (NLP) and semantic similarity understanding can give everyone the ability to do a thorough search.

The patent domain (and more general technical literature like scientific publications) poses unique challenges for NLP modeling due to its use of legal and technical terms. While there are multiple commonly used general-purpose semantic textual similarity (STS) benchmark datasets (e.g., STS-B, SICK, MRPC, PIT), to the best of our knowledge, there are currently no datasets focused on technical concepts found in patents and scientific publications (the somewhat related BioASQ challenge contains a biomedical question answering task). Moreover, with the continuing growth in size of the patent corpus (millions of new patents are issued worldwide every year), there is a need to develop more useful NLP models for this domain.

Today, we announce the release of the Patent Phrase Similarity dataset, a new human-rated contextual phrase-to-phrase semantic matching dataset, and the accompanying paper, presented at the SIGIR PatentSemTech Workshop, which focuses on technical terms from patents. The Patent Phrase Similarity dataset contains ~50,000 rated phrase pairs, each with a Cooperative Patent Classification (CPC) class as context. In addition to similarity scores that are typically included in other benchmark datasets, we include granular rating classes similar to WordNet, such as synonym, antonym, hypernym, hyponym, holonym, meronym, and domain related. This dataset (distributed under the Creative Commons Attribution 4.0 International license) was used by Kaggle and USPTO as the benchmark dataset in the U.S. Patent Phrase to Phrase Matching competition to draw more attention to the performance of machine learning models on technical text. Initial results show that models fine-tuned on this new dataset perform substantially better than general pre-trained models without fine-tuning.

The Patent Phrase Similarity Dataset
To better train the next generation of state-of-the-art models, we created the Patent Phrase Similarity dataset, which includes many examples to address the following problems: (1) phrase disambiguation, (2) adversarial keyword matching, and (3) hard negative keywords (i.e., keywords that are unrelated but received a high similarity score from other models). Some keywords and phrases can have multiple meanings (e.g., the phrase “mouse” may refer to an animal or a computer input device), so we disambiguate the phrases by including CPC classes with each pair of phrases. Also, many NLP models (e.g., bag of words models) will not do well on data with phrases that have matching keywords but are otherwise unrelated (adversarial keywords, e.g., “container section” → “kitchen container”, “offset table” → “table fan”). The Patent Phrase Similarity dataset is designed to include many such examples of unrelated phrases with matching keywords, enabling NLP models to improve their performance on them.

Each entry in the Patent Phrase Similarity dataset contains two phrases, an anchor and target, a context CPC class, a rating class, and a similarity score. The dataset contains 48,548 entries with 973 unique anchors, split into training (75%), validation (5%), and test (20%) sets. When splitting the data, all of the entries with the same anchor are kept together in the same set. There are 106 different context CPC classes and all of them are represented in the training set.

Anchor           Target              Context  Rating          Score
acid absorption  absorption of acid  B08      exact           1.0
acid absorption  acid immersion      B08      synonym         0.75
acid absorption  chemically soaked   B08      domain related  0.25
acid absorption  acid reflux         B08      not related     0.0
gasoline blend   petrol blend        C10      synonym         0.75
gasoline blend   fuel blend          C10      hypernym        0.5
gasoline blend   fruit blend         C10      not related     0.0
faucet assembly  water tap           A22      hyponym         0.5
faucet assembly  water supply        A22      holonym         0.25
faucet assembly  school assembly     A22      not related     0.0
A small sample of the dataset with anchor and target phrases, context CPC class (B08: Cleaning, C10: Petroleum, gas, fuel, lubricants, A22: Butchering, processing meat/poultry/fish), a rating class, and a similarity score.
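
As a quick orientation, the sketch below loads the dataset from a local CSV file with pandas and filters it by context class. The file name (train.csv) and exact column names are assumptions based on the fields described above, not a documented schema; check the released files for the actual layout.

import pandas as pd

# Assumed file and column names; the released files may differ slightly.
df = pd.read_csv("train.csv")  # columns assumed: anchor, target, context, rating, score
print(df.head())

# Keep only phrase pairs rated in the B08 (cleaning) context.
b08 = df[df["context"] == "B08"]
print(b08["score"].describe())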

Generating the Dataset
To generate the Patent Phrase Similarity data, we first process the ~140 million patent documents in the Google Patents corpus and automatically extract important English phrases, which are typically noun phrases (e.g., “fastener”, “lifting assembly”) and functional phrases (e.g., “food processing”, “ink printing”). Next, we filter and keep phrases that appear in at least 100 patents and randomly sample around 1,000 of these filtered phrases, which we call anchor phrases. For each anchor phrase, we find all of the matching patents and all of the CPC classes for those patents. We then randomly sample up to four matching CPC classes, which become the context CPC classes for the specific anchor phrase.
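
To make the filtering and sampling step concrete, here is a toy sketch of that logic. The in-memory data structure and the example phrases are made up for illustration (the real pipeline runs over the extracted patent corpus); only the thresholds mirror the text: at least 100 patents per phrase, ~1,000 anchors, and up to four CPC classes per anchor.

import random
from collections import defaultdict

# phrase -> set of (patent_id, cpc_class) occurrences, assumed to have been
# built during the phrase-extraction pass over the corpus (illustrative data).
occurrences = defaultdict(set)
occurrences["lifting assembly"] = {
    ("US%07d" % i, random.choice(["B66", "B65", "E04"])) for i in range(250)
}
occurrences["rare phrase"] = {("US0000001", "A01")}

# Keep phrases that appear in at least 100 distinct patents.
frequent = {
    phrase: occ for phrase, occ in occurrences.items()
    if len({patent_id for patent_id, _ in occ}) >= 100
}

# Sample anchor phrases, then up to four context CPC classes per anchor.
anchors = random.sample(sorted(frequent), k=min(1000, len(frequent)))
for anchor in anchors:
    cpc_classes = sorted({cpc for _, cpc in frequent[anchor]})
    contexts = random.sample(cpc_classes, k=min(4, len(cpc_classes)))
    print(anchor, contexts)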

We use two different methods for pre-generating target phrases: (1) partial matching and (2) a masked language model (MLM). For partial matching, we randomly select phrases from the entire corpus that partially match with the anchor phrase (e.g., “abatement” → “noise abatement”, “material formation” → “formation material”). For MLM, we select sentences from the patents that contain a given anchor phrase, mask out the anchor phrase, and use the Patent-BERT model to predict candidates for the masked portion of the text. Then, all of the phrases are cleaned up (lowercased, with punctuation and certain stopwords such as “and”, “or”, and “said” removed) and sent to expert raters for review. Each phrase pair is rated independently by two raters skilled in the technology area. Each rater also generates new target phrases with different ratings. Specifically, they are asked to generate some low-similarity and unrelated targets that partially match with the original anchor and/or some high-similarity targets. Finally, the raters meet to discuss their ratings and come up with final ratings.
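
The masked-language-model step can be approximated with an off-the-shelf fill-mask pipeline. The sketch below uses bert-base-uncased purely as a stand-in for Patent-BERT (not the model used to build the dataset), and it predicts single-token candidates only, whereas the actual pipeline masks whole phrases.

from transformers import pipeline

# Stand-in checkpoint; the dataset was built with Patent-BERT, not bert-base-uncased.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# A patent-style sentence with the anchor phrase masked out (illustrative example).
sentence = "The porous coating increases [MASK] of the treated substrate."
for candidate in fill_mask(sentence, top_k=5):
    print(candidate["token_str"], round(candidate["score"], 3))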

Dataset Evaluation
To evaluate model performance on it, the Patent Phrase Similarity dataset was used in the U.S. Patent Phrase to Phrase Matching Kaggle competition. The competition was very popular, drawing about 2,000 competitors from around the world. A variety of approaches were successfully used by the top scoring teams, including ensemble models of BERT variants and prompting (see the full discussion for more details). The table below shows the best results from the competition, as well as several off-the-shelf baselines from our paper. The Pearson correlation metric was used to measure the linear correlation between the predicted and true scores, a helpful target for downstream models that need to distinguish between different similarity ratings.

The baselines in the paper can be considered zero-shot in the sense that they use off-the-shelf models without any further fine-tuning on the new dataset (we use these models to embed the anchor and target phrases separately and compute the cosine similarity between them). The Kaggle competition results demonstrate that by using our training data, one can achieve significant improvements compared with existing NLP models. We have also estimated human performance on this task by comparing a single rater’s scores to the combined score of both raters. The results indicate that this is not a particularly easy task, even for human experts.

Model                      Training    Pearson correlation
word2vec                   Zero-shot   0.44
Patent-BERT                Zero-shot   0.53
Sentence-BERT              Zero-shot   0.60
Kaggle 1st place single    Fine-tuned  0.87
Kaggle 1st place ensemble  Fine-tuned  0.88
Human                                  0.93
Performance of popular models with no fine-tuning (zero-shot), models fine-tuned on the Patent Phrase Similarity dataset as part of the Kaggle competition, and single human performance.
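
For reference, the zero-shot baselines above can be reproduced in spirit with a few lines: embed the anchor and target phrases separately, score each pair by cosine similarity, and compute the Pearson correlation against the human scores. The sketch below uses the all-MiniLM-L6-v2 Sentence-Transformers checkpoint as an illustrative encoder, not one of the exact models from the paper, and a handful of pairs from the sample table.

from scipy.stats import pearsonr
from sentence_transformers import SentenceTransformer, util

# Illustrative encoder; the paper's baselines are word2vec, Patent-BERT and Sentence-BERT.
model = SentenceTransformer("all-MiniLM-L6-v2")

anchors = ["acid absorption", "acid absorption", "gasoline blend", "faucet assembly"]
targets = ["absorption of acid", "acid reflux", "fuel blend", "school assembly"]
true_scores = [1.0, 0.0, 0.5, 0.0]

anchor_emb = model.encode(anchors, convert_to_tensor=True)
target_emb = model.encode(targets, convert_to_tensor=True)
predicted = util.cos_sim(anchor_emb, target_emb).diagonal().tolist()

correlation, _ = pearsonr(predicted, true_scores)
print(round(correlation, 3))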

Conclusion and Future Work
We present the Patent Phrase Similarity dataset, which was used as the benchmark dataset in the U.S. Patent Phrase to Phrase Matching competition, and demonstrate that by using our training data, one can achieve significant improvements compared with existing NLP models.

Additional challenging machine learning benchmarks can be generated from the patent corpus, and patent data has made its way into many of today’s most-studied models. For example, the C4 text dataset used to train T5 contains many patent documents. The BigBird and LongT5 models also use patents via the BIGPATENT dataset. The availability, breadth and open usage terms of full text data (see Google Patents Public Datasets) makes patents a unique resource for the research community. Possibilities for future tasks include massively multi-label classification, summarization, information retrieval, image-text similarity, citation graph prediction, and translation. See the paper for more details.

Acknowledgements
This work was possible through a collaboration with Kaggle, Satsyil Corp., USPTO, and MaxVal. Thanks to contributors Ian Wetherbee from Google, Will Cukierski and Maggie Demkin from Kaggle. Thanks to Jerry Ma, Scott Beliveau, and Jamie Holcombe from USPTO and Suja Chittamahalingam from MaxVal for their contributions.

Read More

DALL·E: Introducing Outpainting

Extend creativity and tell a bigger story with DALL-E images of any size

Original outpainting by Emma Catnip

Today we’re introducing Outpainting, a new feature which helps users extend their creativity by continuing an image beyond its original borders — adding visual elements in the same style, or taking a story in new directions — simply by using a natural language description.



Original: Girl with a Pearl Earring by Johannes Vermeer
Outpainting: August Kamp

DALL·E’s Edit feature already enables changes within a generated or uploaded image — a capability known as Inpainting. Now, with Outpainting, users can extend the original image, creating large-scale images in any aspect ratio. Outpainting takes into account the image’s existing visual elements — including shadows, reflections, and textures — to maintain the context of the original image.

More than one million people are using DALL·E, the AI system that generates original images and artwork from a natural language description, as a creative tool today. Artists have already created remarkable images with the new Outpainting feature, and helped us better understand its capabilities in the process.

Gallery of original outpaintings by Tyna Eloundou, OpenAI, Sonia Levesque, Danielle Baskin, and Chad Nelson, and an outpainting by David Schnurr.

Outpainting is now available to all DALL·E users on desktop. To discover new realms of creativity, visit labs.openai.com or join the waitlist.


Featured artists: Emma Catnip, August Kamp, Sonia Levesque, Danielle Baskin, and Chad Nelson.


OpenAI

JAX on the Web with TensorFlow.js

Posted by  Andreas Steiner and Marc van Zee, Google Research, Brain Team

Introduction

In this blog post we demonstrate how to convert and run Python-based JAX functions and Flax machine learning models in the browser using TensorFlow.js. We have produced three examples of JAX-to-TensorFlow.js conversion, each with increasing complexity:

  1. A simple JAX function 
  2. An image classification Flax model trained on the MNIST dataset 
  3. A full image/text Vision Transformer (ViT) demo, which was used for the Google AI blog post Locked-Image Tuning: Adding Language Understanding to Image Models (a preview of the demo is shown in Figure 1 below)

For each example, there are Google Colab notebooks you can use to try the JAX-to-TensorFlow.js conversion yourself.

Figure 1. TensorFlow.js model matching user-provided text prompts to a precomputed image embedding (try it out yourself). See Part 3: LiT Demo below for implementation details.

Background: JAX and TensorFlow.js

JAX is a NumPy-like library developed by Google Research for high performance computing. It uses XLA to compile programs optimized for GPUs and TPUs. Flax is a popular neural network library built on top of JAX. Researchers have been using JAX/Flax to train very large models with billions of parameters (such as PaLM for language understanding and generation, or Imagen for image generation), making full use of modern hardware. If you’re new to JAX and Flax, start with this JAX 101 tutorial and this Flax Getting Started example.
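
As a minimal illustration of the NumPy-like API mentioned above, the snippet below defines a tiny function: jax.numpy mirrors NumPy, and jax.jit compiles the function with XLA.

import jax
import jax.numpy as jnp

@jax.jit  # compiled with XLA on first call
def scaled_sum(x):
    return jnp.sum(0.5 * x)

print(scaled_sum(jnp.arange(6.0)))  # 7.5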

TensorFlow started as a library for ML towards the end of 2015 and has since become a rich ecosystem that includes tools for productionizing ML pipelines (TFX), data visualization (TensorBoard), deploying ML models to edge devices (TensorFlow Lite), and running models in a web browser or on any device capable of executing JavaScript (TensorFlow.js). Models developed in JAX or Flax can tap into this rich ecosystem by first converting such a model to the TensorFlow SavedModel format, and then using the same tooling as if they had been developed in TensorFlow natively.

This is now made even easier for TensorFlow.js through the new Python API — tfjs.converters.convert_jax() — which allows users to convert a JAX model written in Python to a web format (.json) directly, so that the model can be used in the browser with TensorFlow.js.

To learn how to perform JAX-to-TensorFlow.js conversion, check out the three examples below.

Example 1: Converting a simple JAX function

In this introductory example, you’ll convert a few simple JAX functions using converters.convert_jax().

Internally, this function does the following:

  1. It converts the JAX function to the TensorFlow SavedModel format, which contains a complete TensorFlow program, including trained parameters (i.e., tf.Variables) and computation.
  2. Then, it constructs a TensorFlow.js model from that SavedModel (refer to Figure 2 for more details).

Figure 2. High-level visualization of the conversion steps inside jax_conversion.from_jax, which converts a JAX function to a TensorFlow.js model.
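
For the curious, the following is a rough, hand-rolled sketch of the first of these steps using jax2tf: convert the JAX function to a TensorFlow function, wrap the parameters in tf.Variables inside a tf.Module, and write a SavedModel. This is only an approximation of what convert_jax does internally (its exact implementation is not shown here); normally you just call convert_jax directly, as in the examples below.

import numpy as np
import tensorflow as tf
from jax.experimental import jax2tf

def prod(params, xs):
    return params["weight"] * xs

params = {"weight": np.array([0.5, 1.0], dtype=np.float32)}

# Wrap the parameters as tf.Variables so they are saved with the model.
module = tf.Module()
module.params = tf.nest.map_structure(tf.Variable, params)
module.f = tf.function(
    lambda xs: jax2tf.convert(prod)(module.params, xs),
    input_signature=[tf.TensorSpec((3, 2), tf.float32)],
    autograph=False,
)
tf.saved_model.save(module, "/tmp/prod_saved_model")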

To convert a Flax model to TensorFlow.js, you need a few things:

  • A function that runs the forward pass of the model.
  • The model parameters (this is usually a dict-like structure).
  • A specification of the shapes and dtypes of the inputs to the function.

The following example uses a single parameter weight and implements a function prod, which multiplies the input with the parameter (in a real example, params will contain all the weights of the modules used in the neural network):


def prod(params, xs):

  return params['weight'] * xs

Let’s call this function with some values and verify the output makes sense:

params = {'weight': np.array([0.5, 1])}

# This represents a batch of 3 inputs, each of length 2.

xs = np.arange(6).reshape((3, 2))

prod(params, xs)

This gives the following output, where each batch element is element-wise multiplied by [0.5, 1]:

[[0. 1.]

 [1. 3.]

 [2. 5.]]

Next, let’s convert this to TensorFlow.js using convert_jax and use the helper function get_tfjs_predict_fn (which can be found in the Colab), allowing us to verify that the outputs for the JAX function and the web model match. (Note: this helper function will only work in Colab, as it uses some tooling to run the web model using JavaScript.)

tfjs.converters.convert_jax(

    prod,

    params, 

    input_signatures=[tf.TensorSpec((3, 2), tf.float32)],

    model_dir=model_dir)


tfjs_predict_fn = get_tfjs_predict_fn(model_dir)

tfjs_predict_fn(xs)  # Same output as JAX.

Dynamic shapes are supported as usual in TensorFlow by passing the value None for the dynamic dimensions in input_signatures. Additionally, one should pass the argument polymorphic_shapes specifying names for dynamic dimensions. Note that polymorphism is a term coming from type theory, but here we use it to mean that the function works for multiple related shapes, e.g., for multiple batch sizes. This is necessary for shape checking in the JAX function (see Colab for more examples, and here for more documentation on this notation).

tfjs.converters.convert_jax(

    prod,

    params, 

    input_signatures=[tf.TensorSpec((None, 2), tf.float32)],

    polymorphic_shapes=['(b, 2)'],

    model_dir=model_dir)


tfjs_predict_fn = get_tfjs_predict_fn(model_dir)

tfjs_predict_fn(np.array([[1., 2.]]))  # Outputs: [[0.5, 2. ]]

Example 2: MNIST Model


Let’s use the same conversion code snippet from before, but this time we’ll use TensorFlow.js to run a real ML model. Flax provides a Colab example of an MNIST classifier that we’ll use as a starting point.

After cloning the repository, the model can be trained using:

train_ds, test_ds = train.get_datasets()

state = train.train_and_evaluate(config, workdir=f'./workdir')

This yields a state.apply_fn that can be used to compute logits for input images. Note that the function expects the first argument to be the model weights state.params. Given a batch of input images shaped [batch_size, 28, 28, 1], this will produce the logits for the probability distribution over the ten labels for every image (shaped [batch_size, 10]).

logits = state.apply_fn({'params': state.params}, imgs)

The MNIST model’s state.apply_fn() is then converted exactly the same way as in the previous section – after all, it’s a pure function that takes params and images as inputs and returns logits:

tfjs.converters.convert_jax(

    state.apply_fn,

    {'params': state.params},

    input_signatures=[tf.TensorSpec((1, 28, 28, 1), tf.float32)],

    model_dir=tfjs_model_dir,

)

On the JavaScript side, you load the model asynchronously, showing a simple progress update in the status text, making sure to give some feedback while the model weights are transferred:

tf.loadGraphModel(modelDir + '/model.json', {

    onProgress: p => status.innerText = `loading model: ${Math.round(p*100)}%`

})

A minimal UI is loaded from this snippet, and in its callback function you call the TensorFlow.js model and output the predictions. The function parameter img is a Uint8Array of length 28*28, which is first converted to a TensorFlow.js tf.tensor before the model outputs are computed and converted to probabilities via the tf.softmax() function. The output values are then read synchronously by calling .dataSync() and converted to JavaScript arrays before they’re displayed.

ui.onUpdate(img => {

  const imgs = tf.tensor(img).cast('float32').reshape([1, 28, 28, 1])

  const logits = model.predict(imgs)

  const preds = tf.softmax(logits)

  const { values, indices } = tf.topk(preds, 10)


  ui.showPreds([...values.dataSync()], [...indices.dataSync()])

})

The Colab then starts a webserver and tunnels the port so you can scan a QR code on a mobile phone and directly connect to the demo. Even though the training reports around 99.1% accuracy on the test set, you’ll see that the model can easily be fooled with digits that are easy to recognize for the human eye, but hard for a model that has only seen digits from the MNIST dataset (Figure 3).

Figure 3. Our model from the Colab with 99.1% accuracy on the MNIST test dataset is still surprisingly bad at recognizing hand-written digits. On the left, the model predicts all kinds of digits instead of “one”. On the right side, the “one” is drawn more like the data from the training set.

Example 3: LiT Demo

Writing a more realistic application with a TensorFlow.js model is a bit more involved. This section goes through the main steps that were used to create the demo app from the Google AI blog post Locked-Image Tuning: Adding Language Understanding to Image Models. Refer to that post for technical details on the implementation of the ML model. Also make sure to check out the final LiT Demo.

Adapting the model

Before starting to implement an ML demo, it’s a good moment to think carefully about the different options and their respective strengths and weaknesses.
At a high level, you have two options: running the ML model on server-side infrastructure, or running the ML model on the edge (i.e. on the visiting user’s device).
  • Running a model on a server has the advantage that it can use exactly the same framework / code that was used to develop the model. There are libraries like Streamlit or Gradio that make it very easy to quickly build interactive web apps around such centrally-hosted models. The servers running the model can be rather powerful, using lots of RAM and accelerators to run state-of-the-art ML models in near-real time, and such a website can be loaded even by the smallest mobile device.
  • Running the demo on-device puts a limit on the size of the model that you can use, but comes with convincing advantages:
    • No data is ever sent off the device, which is desirable both for privacy reasons and to bring down latency.
    • Free scaling: For instance, a normal webserver (such as one running on GitHub Pages) can serve hundreds or thousands of users simultaneously free of charge. And running a powerful model on server-side infrastructure at this scale would be very expensive (massive compute is not cheap).
The model you use for the demo consists of two parts: an image encoder, and a text encoder (see Figure 4).
You use a large model for computing image embeddings and a small model for text embeddings. To make the demo run faster and produce better results, the expensive image embeddings are pre-computed, so the TensorFlow.js model only needs to compute the text embeddings and then compare the image and text embeddings to compute similarities.
Figure 4. Image/text models like LiT (or CLIP) consist of two encoders that can be used separately to create vector representations of images and texts. Usually both image and text encoders are of similar size (LiT-B16B model, left image). For the demo, we precompute image embeddings using a large image encoder, and then run inference on the text on-device using a tiny text encoder (LiT-L16Ti model, right image).
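
To make the precompute-then-compare idea concrete, here is a small NumPy sketch of the run-time computation: the image embeddings are assumed to have been computed offline with the large image tower, and only the text embeddings are computed on-device before being compared against them. The shapes and temperature value are illustrative, not the demo’s actual numbers.

import numpy as np

rng = np.random.default_rng(0)
image_embeddings = rng.normal(size=(8, 768))  # precomputed offline for 8 demo images
text_embeddings = rng.normal(size=(3, 768))   # computed on-device for 3 prompts

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# One similarity score per (prompt, image) pair.
similarities = normalize(text_embeddings) @ normalize(image_embeddings).T  # shape (3, 8)

# Probabilities over the 3 prompts for the first image, as in computeProbabilities().
temperature = 10.0
logits = temperature * similarities[:, 0]
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs)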

For the demo, we now get those powerful ViT-Large image representations for free, because we can precompute them for all demo images. This allows us to build a compelling demo with a limited compute budget. In addition to the “tiny” text encoder, we have also prepared a “small” text encoder for the same image embeddings (LiT-L16S), which performs a bit better, but uses more bandwidth to download the model weights, and requires more GPU memory to run on-device. We have evaluated the different models with the code from this Colab:

Model                             Image encoder              Text encoder              Zeroshot performance
                                  Params          FLOPs      Params          FLOPs     CIFAR-100    ImageNet
LiT-B16B                          86M (344 MB)    36B        109M (436 MB)   2.7B      79.2%        71.7%
LiT-L16S (“small” text encoder)   303M (1.2 GB)   123B       28M (111 MB)    0.7B      75.8%        60.7%
LiT-L16Ti (“tiny” text encoder)   303M (1.2 GB)   123B       9M (36 MB)      0.2B      73.2%        53.4%

Note though that the “zeroshot performance” should only be taken as a proxy. In the end, the model performance needs to be good enough for the demo, and in this case our manual testing showed that even the tiny text transformer was able to compute similarities that were good enough for the demo. Next, we tested the performance of the tiny and small text encoders using this TensorFlow.js benchmark tool on different platforms (using the “custom model” option, and benchmarking 5×16 tokens on the WebGL backend):

LiT-L16Ti (“tiny” text encoder) benchmark
Platform                                           Load time   Warmup   Average/10   Peak memory
MacBook Pro (Intel i7 2.6GHz / Radeon Pro 5300M)   1.1s        0.15s    0.12s        33.9 MB
iPad Air (4th gen)                                 1.3s        0.6s     0.5s         33.9 MB
Samsung S21 G5 (cell phone)                        2.0s        1.3s     1.1s         33.9 MB

LiT-L16S (“small” text encoder) benchmark
Platform                                           Load time   Warmup   Average/10   Peak memory
MacBook Pro (Intel i7 2.6GHz / Radeon Pro 5300M)   3.9s        0.8s     0.8s         122 MB
iPad Air (4th gen)                                 2.7s        2.4s     2.5s         141 MB

Note that the Samsung S21 G5 results for the model with the “small” text encoder are missing from the tables above because the model did not fit into memory. In terms of performance, the model with the “tiny” text encoder produces results within approximately 0.1–1 seconds, which still feels quite responsive, even on the smallest platform tested.

The Lit-LiT web app 

Preparing the model for this application is a bit more complicated, because we need to convert not only the text transformer model weights, but also a matching tokenizer and the precomputed image embeddings. The Colab loads a LiT model and showcases how to use it, and then prepares the contents needed by the web app:

  1. The tiny/small text encoder converted to TensorFlow.js and the matching tokenizer vocabulary.
  2. Images in JPG format, as seen by the model (in particular, this means a fixed 224×224 pixel crop).
  3. Pre-computed image embeddings (since the converted model will only be able to compute embeddings for the texts).
  4. A selection of example prompts for every image. The embeddings of these prompts are also precomputed, so that precomputed answers can be shown if the prompts are not modified.

These files are prepared inside the data/ directory and then downloaded as a ZIP file. This file can then be uploaded to a web host, from where it is loaded by the web app (for example on GitHub Pages: vision_transformer/lit/data).

The code for the entire client-side application is available on GitHub: https://github.com/google-research/big_vision/tree/main/ui/lit_demo/

The application is built using Lit web components. The main index.html declares the demo application:

<lit-demo-app></lit-demo-app>

This web component is defined in lit-demo-app.ts in the src/components subdirectory, next to all the other web components (image carousel, model controls, etc.).

For the actual computation of image/text similarities, the component image-prompts.ts calls functions from the module src/lit_demo/compute.ts, which wraps all the TensorFlow.js specific code.

export class Model {

  /** Tokenizes text. */

  tokenize(texts: string[]): tf.Tensor { /* … */ }

  /** Computes text embeddings. */

  embed(tokens: tf.Tensor): tf.Tensor {

    return this.model!.execute({inputs: tokens}) as tf.Tensor;

  }

  /** Computes similarities texts / pre-computed image embeddings. */

  computeSimilarities(texts: string[], imgidxs: number[]) {

    const textEmbeddings = this.embed(this.tokenize(texts));

    const imageEmbeddingsTransposed = tf.transpose(

        tf.concat(imgidxs.map(idx => tf.slice(this.zimgs!, idx, 1))));

    return tf.matMul(textEmbeddings, imageEmbeddingsTransposed);

  }

  /** Applies softmax to `computeSimilarities()`. */

  computeProbabilities(texts: string[], imgidx: number): number[] {

    const sims = this.computeSimilarities(texts, [imgidx]);

    const row = tf.squeeze(tf.slice(tf.transpose(sims), 0, 1));

    return [...tf.softmax(tf.mul(this.def!.temperature, row)).dataSync()];

  }

}

The parent directory of the data/ directory exported by the Colab above is referenced via the baseUrl in the file src/lit/constants.ts. By default it refers to the models from the official demo. When replacing the baseUrl with a different server, make sure to enable cross-origin resource sharing (CORS).

In addition to the complete application, it’s also possible to export the functional parts without the UI as a single JavaScript file that can be linked statically. See the file playground.html as an example, and refer to the instructions in README.md for how to compile the entire application or the functional part before deploying the application.

<!-- Loads global symbol `lit`. -->

<script src="exports_bin.js"></script>

<script>

async function demo() {

  lit.setBaseUrl('https://google-research.github.io/vision_transformer/lit');

  const model = new lit.Model('tiny');

  await model.load();

  console.log(model.computeProbabilities(['a dog', 'a cat'], /*imgIdx=*/1));

}

demo();

</script>

Conclusion

In this article you learned how to convert JAX functions and Flax models into the TensorFlow.js format that can be executed in a browser or on devices capable of running JavaScript.

The first example demonstrated how to convert a JAX function to a TensorFlow.js model, which can then be loaded in Colab for verification, or run on any device with a modern web browser – this is exactly the same conversion that can be applied to more complex Flax models. The second example showed how to train an ML model in Colab and test it interactively on a mobile phone. The third example provided a full template for running an on-device ML model (check out the live demo). We hope that this application can serve you as a good starting point for your own client-side demos using JAX models with TensorFlow.js.

Read More

Fraunhofer Research Leads Way Into Future of Robotics

Joseph Fraunhofer was a 19th-century pioneer in optics who brought together scientific research with industrial applications. Fast forward to today and Germany’s Fraunhofer Society — Europe’s largest R&D organization — is setting its sights on the applied research of key technologies, from AI to cybersecurity to medicine.

Its Fraunhofer IML unit is aiming to push the boundaries of logistics and robotics. The German researchers are harnessing NVIDIA Isaac Sim to make advances in robot design through simulation.

Like many — including BMW, Amazon and Siemens — Fraunhofer IML relies on NVIDIA Omniverse, using it to make gains in applied research in logistics for fulfillment and manufacturing.

Fraunhofer’s newest innovation, dubbed O3dyn, uses NVIDIA simulation and robotics technologies to create an indoor-outdoor autonomous mobile robot (AMR).

Its goal is to enable the jump from automated guided vehicles to fast-moving AMRs that aren’t even yet available on the market.

This level of automation advancement promises a massive uptick in logistics acceleration.

“We’re looking at how we can go as fast and as safely as possible in logistics scenarios,” said Julian Eber, a robotics and AI researcher at Fraunhofer IML.

From MP3s to AMRs

Fraunhofer IML’s parent organization, based in Dortmund, near the country’s center, has more than 30,000 employees and is involved in hundreds of research projects. In the 1990s, it was responsible for the development of the MP3 file format, which led to the digital music revolution.

Seeking to send the automated guided vehicle along the same path as the compact disc, Fraunhofer in 2013 launched a breakthrough robot now widely used by BMW in its assembly plants and others.

This robot, known as the STR, is a workhorse for industrial manufacturing. It’s used for moving goods for the production lines. Fraunhofer IML’s AI work benefits the STR and other updates to this robotics platform, such as the O3dyn.

Fraunhofer IML is aiming to create AMRs that deliver a new state of the art. The O3dyn relies on the NVIDIA Jetson edge AI and robotics platform for a multitude of camera and sensor inputs to help navigate.

Advancing speed and agility, it’s capable of going up to 30 miles per hour and has AI-assisted wheels for omnidirectional movement, letting it maneuver in tight situations.

“The omnidirectional dynamics is very unique, and there’s nothing like this that we know of in the market,” said Sören Kerner, head of AI and autonomous systems at Fraunhofer IML.

Fraunhofer IML gave a sneak peek at its latest development on this pallet-moving robot at NVIDIA GTC.

Bridging Sim to Real

Using Isaac Sim, Fraunhofer IML’s latest research strives to develop and validate these AMRs in simulation by closing the sim-to-real gap. The researchers rely on Isaac Sim for virtual development of its highly dynamic autonomous mobile robot by exercising the robot in photorealistic, physically accurate 3D worlds.

This enables Fraunhofer to import into the virtual environment its robot’s more than 5,400 parts from computer-aided design software. It can then rig them with physically accurate specifications with Omniverse PhysX.

The result is that the virtual robot version can move as swiftly in simulation as the physical robot in the real world. Harnessing the virtual environment allows Fraunhofer to accelerate development, safely increase accuracy for real-world deployment and scale up faster.

Minimizing the sim-to-real gap makes simulation become a digital reality for robots. It’s a concept Fraunhofer refers to as simulation-based AI.

To make faster gains, Fraunhofer is releasing the AMR simulation model into open source so developers can make improvements.

“This is important for the future of logistics,” said Kerner. “We want to have as many people as possible work on the localization, navigation and AI of these kinds of dynamic robots in simulation.”

Learn more by watching Fraunhofer’s GTC session: “Towards a Digital Reality in Logistics Automation: Optimization of Sim-to-Real.”

Register for the upcoming GTC, running Sept. 19-22, and explore the robotics-related sessions.

 

The post Fraunhofer Research Leads Way Into Future of Robotics appeared first on NVIDIA Blog.

Read More

UN Economic Commission for Africa Engages NVIDIA to Boost Data Science in 10 Nations

NVIDIA is collaborating with the United Nations Economic Commission for Africa (UNECA) to equip governments and developer communities in 10 nations with data science training and technology to support more informed policymaking and accelerate how resources are allocated.

The initiative will empower the countries’ national statistical offices — agencies that handle population censuses data, economic policies, healthcare and more — by providing AI hardware, training for data scientists and ecosystem support.

Known as the United AI Alliance, the initiative is led by the UNECA, the Global Partnership for Sustainable Development Data (the Global Partnership), which facilitates data partnerships for public good, and NVIDIA. Future Tech, a Long Island, New York-based IT solution provider and member of the NVIDIA Partner Network, is the Alliance’s inaugural funding and global distribution partner.

“Population data is critical information for policy decisions, whether it’s for urban planning, climate action or monitoring the spread of COVID-19,” said Oliver Chinganya, Director of the African Centre for Statistics at UNECA. “Without a strong digital infrastructure, many of these nations struggled to collect and report data during the pandemic.”

Better public health data can help countries track real-time COVID infection rates, detect hotspots and target their response efforts. And beyond the pandemic, strengthening data systems will allow local experts to connect population statistics to agricultural data, climate trends and economic indicators.

Laying the Groundwork for Long-Term Benefits

Future Tech is covering the cost of procurement and overseeing the distribution and deployment of NVIDIA-Certified Systems and data science workstations powered by NVIDIA RTX and NVIDIA Quadro RTX GPUs for each country — starting with Ghana, Kenya, Rwanda, Senegal and Sierra Leone. Up next will be Guinea, Mali, Nigeria, Somalia and Togo.

“Public-sector institutions play a critical role in providing the data used for policymaking at all levels. But often they face huge gaps in infrastructure and expertise required to tap the benefits of the data revolution,” said Future Tech founder and CEO Bob Venero.

To further support the countries’ data science capabilities, NVIDIA is teaming up with local universities, research institutes and data science communities to build a pipeline of developers that can extract insights from census information and other data sources.

“This is the first time many of these countries will be digitizing their census efforts, which represents a potential goldmine of data,” said Keith Strier, VP of AI Nations at NVIDIA. “By connecting these efforts with the local developer ecosystem, we can help more organizations harness this for the benefit of society.”

NVIDIA is putting together a curriculum of free Deep Learning Institute courses — starting with fundamentals such as accelerated computing with CUDA Python and accelerated data science workflows — tailored to the needs of each country’s national statistical office. It’s also providing access to workshops and data science teaching kits for each of the nations.

This work extends the company’s support of AI and data science in Africa through the NVIDIA Inception startup program and the NVIDIA Emerging Chapters initiative, which bolsters developer communities in emerging markets with education and technical resources.

Using Data to Drive Environmental and Social Progress

Around the world, the pandemic has accelerated the transition to digitization. The United AI Alliance is supporting this transformation by working with grassroots groups at the core of AI development in Africa, with the goal of enabling data practitioners in every region to build meaningful solutions to local challenges.

Many of the continent’s developers are part of local technology communities, including groups like the Kenya-based AI Center of Excellence or nonprofit organization Data Science Africa. United AI Alliance is pairing many of these developers with governments to drive new data analysis projects.

“Many countries are still excluded from using big data, AI and digital technologies to improve the quality of information for making decisions,” said Claire Melamed, CEO of the Global Partnership. “Together we can change that and collaborate to support data-driven progress toward the Sustainable Development Goals.”

While the project’s initial focus is in Africa, the collaborators plan to roll out the same model in Southeast Asia and Latin America.

To learn more about this initiative, watch the replay session of “Democratizing AI in Emerging Markets through the United AI Alliance” from NVIDIA GTC and visit the United AI Alliance site.

Learn more about the NVIDIA Emerging Chapters, NVIDIA Developer and NVIDIA Inception programs, and register free for NVIDIA GTC, running online Sept. 19-22.

Main image shows (L to R) Jean Paul Ngom and Ibrahima Diop, of Senegal’s national statistical office, working with an NVIDIA GPU-powered mobile workstation.

The post UN Economic Commission for Africa Engages NVIDIA to Boost Data Science in 10 Nations appeared first on NVIDIA Blog.

Read More

OBS Studio to Release Software Update 28.0 With NVIDIA Broadcast Features ‘In the NVIDIA Studio’

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology accelerates creative workflows. 

In the NVIDIA Studio celebrates the Open Broadcaster Software (OBS) Studio’s 10th anniversary and its 28.0 software release. Plus, popular streamer WATCHHOLLIE shares how she uses OBS and a GeForce RTX 3080 GPU in a single-PC setup to elevate her livestreams.

The OBS release, available starting later today, offers livestreamers new features, including native integration of the AI-powered NVIDIA Broadcast virtual background and room echo removal effects, along with support for high-efficiency video coding (HEVC or H.265) and high-dynamic range (HDR).

NVIDIA also worked with Google to enable live streaming of HEVC and HDR content to YouTube using the NVIDIA Encoder (NVENC) — dedicated hardware on GeForce RTX GPUs that handles video encoding without taking resources away from the game.

OBS Bliss

OBS 28.0 has launched with a host of updates and improvements — including two NVIDIA Broadcast features exclusive to GeForce RTX streamers.

AI-powered virtual background enables streamers to remove or replace their backgrounds without the need for a physical green screen. Room echo removal eliminates unwanted echoes during streaming sessions. This can come in handy when using a desktop mic or when streaming in a room with a bit of echo. Both effects — as well as noise removal — are available within OBS as filters, giving users more flexibility to apply them per source.

HEVC support in OBS 28.0 improves video compression by 15%. The implementation, built specifically for hardware-based HEVC encoders, enables users to record content and stream to supported platforms like YouTube with better quality.

The update also enables recording and streaming in HDR, which offers a greater range of bright and dark colors on a display, adding a stunning vibrance and dramatic improvement in visual quality.

Gamers can now livestream with vibrant HDR colors in titles like Marvel’s “Guardians of the Galaxy.”

Previously, users had to turn HDR off, since 10- or 12-bit HDR content would look washed out if it was recorded at 8-bit. With this update, users can keep HDR on and decide whether they want to capture or stream in SDR, or full HDR.

As HDR displays become more popular, and with Windows 11’s new Auto HDR tool — which enables many games to be displayed in a virtual HDR mode — more users can benefit from this OBS update.

Turn on HDR in Windows 11 to simultaneously game and stream in amazing visual fidelity.

YouTube Live is one of the first platforms to support HEVC and HDR streaming. Streamers on the platform can give their audience higher quality streams, with only a few clicks.

Setup is quick and easy. Open this NVENC OBS guide and navigate to the “Recording and Streaming with HEVC and HDR” section to find directions.

Go Live With WATCHHOLLIE

When WATCHHOLLIE started her channel, she had to learn a whole new way of creating content. Now, she’s teaching the next generation of streamers and showcasing how NVIDIA broadcasting tech helps her workflow.

Trained as a video editor, WATCHHOLLIE experimented with a YouTube channel before discovering Twitch as a way to get back into gaming. During the pandemic, Hollie used streaming as a way to stay in touch with friends.

“I was trapped in an 800-square-foot apartment, and I wanted to hang out with friends,” she said. “So I started streaming every day.”

Her streams promote mental health awareness and inclusivity, establishing a safe place for members of the LGBTQ+ community like herself.

Hollie’s first streams were simple, using OBS on a Mac with an external capture card to share console gameplay.

“I thought, ‘I hope someone shows up,’ and few people did,” she said.

She kept at it between video-editing freelance work, asking friends to tune into her streams and provide advice.

To support her daily streaming schedule, Hollie built her first PC — initially with a GeForce RTX 2070 GPU and later upgrading to a GeForce RTX 3080 GPU. GeForce GPUs include NVENC as a discrete encoder that enables seamless gaming and streaming with maximum performance, even on a single-PC setup like WATCHHOLLIE’s. NVENC’s advanced GPU encoding adds higher video quality for recorded videos, as well.

Streamer WATCHHOLLIE.

Soon, Hollie’s channel reached Affiliate status, which opened up monetization opportunities. She began accepting fewer freelance-editing gigs, and even turned down a full-time job offer. “I asked my mom what I should do, and she said, ‘This is your chance, you should take it,’” WATCHHOLLIE said.

Since achieving Partner status, she’s worked on her channel full time. “I’m very happy, and I feel like I’m making a difference with how I stream,” she said.

WATCHHOLLIE’s now turned her attention to helping new streamers get started. She founded WatchUs, a diversity-focused team that teaches aspiring creators how to grow their business, develop brand partnerships and improve their streaming setup.

“With WatchUs, I can help people and guide them through something that I didn’t have or was too scared to ask about,” she said.

More than 20 streamers were selected to join the team from 200+ applicants. They receive mentorship from coaches from all walks of life, about every facet of streaming. The group focuses on “education, not ego,” in WATCHHOLLIE’s words, and the team plans to reopen applications soon.

When asked what it takes to be a successful streamer, Hollie didn’t hesitate to answer: “Stick to a schedule and play what you love. Don’t wait for people to talk to you to be entertaining — be the entertainment first, and then people will want to talk to you.”

Follow and subscribe to WATCHHOLLIE’s social media channels.

Step Into the NVIDIA Studio

Just as WATCHHOLLIE has grown in her creative journey, the NVIDIA Studio team wants to see all artists’ personal growth. Amaze, or be amazed, as creatives share old and new works for the #CreatorJourneyChallenge across social media. Many extraordinary pieces have been shared so far.

To get in on the fun, simply provide an older piece of artwork alongside a more recent one to highlight your growth as an artist. Follow and tag NVIDIA Studio on Instagram, Twitter or Facebook, and use the #CreatorsJourneyChallenge tag for a chance to be showcased on NVIDIA Studio social media channels.

Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the NVIDIA Studio newsletter.

The post OBS Studio to Release Software Update 28.0 With NVIDIA Broadcast Features ‘In the NVIDIA Studio’ appeared first on NVIDIA Blog.

Read More

Rendered.ai Founder and CEO Nathan Kundtz on Using AI to Build Better AI

Data is the fuel that makes artificial intelligence run.

Training machine learning and AI systems requires data. And the quality of datasets has a big impact on the systems’ results.

But compiling quality real-world data for AI and ML can be difficult and expensive.

That’s where synthetic data comes in.

The guest for this week’s AI Podcast episode, Nathan Kundtz, is founder and CEO of Rendered.ai, a platform as a service for creating synthetic data to train AI models. The company is also a member of NVIDIA Inception, a free, global program that nurtures cutting-edge startups.

Kundtz is a physicist by training, holds a Ph.D. from Duke University and previously founded Cometa, a hybrid satellite cellular network company.

Our host, Noah Kravitz, spoke to Kundtz about how AI can be used to generate the data needed to create better AI.

You Might Also Like

Artem Cherkasov and Olexandr Isayev on Democratizing Drug Discovery With NVIDIA GPUs

It may seem intuitive that AI and deep learning can speed up workflows — including novel drug discovery, a typically years-long and several-billion-dollar endeavor. However, there is a dearth of recent research reviewing how accelerated computing can impact the process. Professors Artem Cherkasov and Olexandr Isayev discuss how GPUs can help democratize drug discovery.

Lending a Helping Hand: Jules Anh Tuan Nguyen on Building a Neuroprosthetic

Is it possible to manipulate things with your mind? Possibly. University of Minnesota postdoctoral researcher Jules Anh Tuan Nguyen discusses allowing amputees to control their prosthetic limbs with their thoughts, using neural decoders and deep learning.

Wild Things: 3D Reconstructions of Endangered Species With NVIDIA’s Sifei Liu

Studying endangered species can be difficult, as they’re elusive, and the act of observing them can disrupt their lives. Sifei Liu, a senior research scientist at NVIDIA, discusses how scientists can avoid these pitfalls by studying AI-generated 3D representations of these endangered species.

Subscribe to the AI Podcast: Now Available on Amazon Music

You can now listen to the AI Podcast through Amazon Music.

Also get the AI Podcast through iTunes, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better: Have a few minutes to spare? Fill out our listener survey.

The post Rendered.ai Founder and CEO Nathan Kundtz on Using AI to Build Better AI appeared first on NVIDIA Blog.
