Cattle-ist for the Future: Plainsight Revolutionizes Livestock Management with AI

Computer vision and edge AI are looking beyond the pasture.

Plainsight, a San Francisco-based startup and NVIDIA Metropolis partner, is helping the meat processing industry improve its operations — from farms to forks. Pairing Plainsight’s vision AI platform with NVIDIA GPUs to develop video analytics applications, the company has built a system that performs precision livestock counting and health monitoring.

With animals such as cows looking so similar, frequently standing shoulder-to-shoulder and moving quickly, inaccurate livestock counts are common in the cattle industry, and often costly.

On average, a cow in the U.S. costs between $980 and $1,200, and facilities process anywhere from 1,000 to 5,000 cows per day. At this scale, even a small percentage of miscounts equates to hundreds of millions of dollars in financial losses nationally.

“By applying computer vision powered by edge AI and NVIDIA Metropolis, we’re able to automate what has traditionally been a very manual process and remove the uncertainty that comes with human counting,” said Carlos Anchia, CEO of Plainsight. “Organizations begin to optimize existing revenue streams when accuracy can be operationally efficient.”

Plainsight is working with JBS USA, one of the world’s largest food companies, to integrate vision AI into its operational processes. Vision AI-powered cattle counting was among the first solutions to be implemented.

At JBS, cows are counted by fixed-position cameras, connected via a secured private network to Plainsight’s vision AI edge application, which detects, tracks and counts the cows as they move past.

Plainsight’s models count livestock with over 99.5 percent accuracy — about two percentage points better than manual livestock counting by humans in the same conditions, according to the company.


For a vision AI solution to be widely adopted by an organization, its accuracy must exceed that of humans performing the same task. By monitoring and tracking each individual animal, the models simplify an otherwise complex process.

Highly robust and accurate computer vision models are only one part of the cattle counting solution. Through continued collaboration with JBS’s operations and innovation teams, Plainsight provided a path to the digital transformation needed for accurate accountability when receiving livestock at scale, helping ensure that payment for the livestock received is as accurate as possible.

Higher Accuracy with GPMoos

For JBS, the initial proof of value involved building models and deploying on an NVIDIA Jetson AGX Xavier Developer Kit.

After quickly achieving nearly 100 percent accuracy with their models, the teams moved into a full pilot phase. To help the models handle new and often challenging environmental conditions, Plainsight used its AI platform to quickly annotate, build and deploy model improvements in preparation for a nationwide rollout.

As a member partner of NVIDIA Metropolis, an application framework that brings visual data and AI together, Plainsight continues to develop and improve models and AI pipelines to enable a national rollout with the U.S. division of JBS.

There, Plainsight uses a technology stack built on the NVIDIA EGX platform, incorporating NVIDIA-Certified Systems with NVIDIA T4 GPUs. Plainsight’s application processes multiple video streams per GPU in real time to count and monitor livestock as they’re received.

“Innovation is fundamental to the JBS culture, and the application of AI technology to improve efficiencies for daily work routines is important,” said Frederico Scarin do Amaral, Senior Manager of Business Solutions at JBS USA. “Our partnership with Plainsight enhances the work of our team members and ensures greater accuracy of livestock count, improving our operations and efficiency, as well as allowing for continual improvements of animal welfare.”

Milking It

Inaccurate counting is only part of the problem the industry faces, however. Visual inspection of livestock is arduous and error-prone, causing late detection of diseases and increasing health risks to other animals.

The same symptoms humans can identify by looking at an animal, such as gait and abnormal behavior, can be approximated by computer vision models built, trained and managed through Plainsight’s vision AI platform.

The models identify symptoms of particular diseases based on gait and on anomalies in how livestock behave when exiting transport vehicles, standing in a pen or gathering in feeding areas.

“The cameras are an unblinking source of truth that can be very useful in identifying and alerting to problems otherwise gone unnoticed,” said Anchia. “The combination of vision AI, cameras and Plainsight’s AI platform can help enhance the vigilance of all participants in the cattle supply chain so they can focus more on their business operations and animal welfare improvements as opposed to error-prone manual counting.”

Legen-Dairy

In addition to a variety of other smart agriculture applications, Plainsight is using its vision AI platform to monitor and track cattle on the blockchain as digital assets.

The company is engaged in a first-of-its-kind co-innovation project with CattlePass, a platform that generates a secure and unique digital record of individual livestock, also known as a non-fungible token, or NFT.

Plainsight is applying its advanced vision AI models and platform to monitor cattle health. The suite of advanced technologies, including genomics, health and proof-of-life records, will provide 100 percent verifiable proof of ownership and transparency into a complete living record of the animal throughout its lifecycle.

Cattle ranchers will be able to store the NFTs in a private digital wallet while collecting and adding metadata such as feed, heartbeat and health records. This data can then be shared with permissioned viewers such as inspectors, buyers, vets and processors.

The data will remain with each animal throughout its life, through harvest, and will be accessible via a unique QR code printed on the beef packaging. This will allow visibility into the proof of life and quality of each animal, giving consumers unprecedented knowledge about the origin of their beef.


Read More

Scale your Amazon Kendra index

Amazon Kendra is a fully managed, intelligent search service powered by machine learning. Amazon Kendra reimagines enterprise search for your websites and applications so your employees and customers can easily find the content they’re looking for. Using keyword or natural language queries, employees and customers can find the right content even when it’s scattered across multiple locations and content repositories within your organization.

Although Amazon Kendra is designed for large-scale search applications with millions of documents and thousands of queries per second, you can also run smaller experiments to evaluate it. You can run a proof of concept, or simply run a smaller workload while still using the features that Amazon Kendra Enterprise Edition has to offer. On July 1, 2021, Amazon Kendra introduced new, smaller capacity units for smaller workloads. In addition, to promote experimentation, the price of Amazon Kendra Developer Edition was reduced by 55%.

Amazon Kendra Enterprise Edition capacity units

The base capacity for Amazon Kendra supports up to 100,000 documents and 8,000 searches per day, with adaptive bursting capability to better handle unpredictable query spikes. You can increase the query and the document capacity of your Amazon Kendra index through storage capacity units and query capacity units, and these can be updated independently from each other.

Storage capacity units offer scaling in increments of 100,000 documents (up to 30 GB of storage) each. For example, if you need to index 1 million documents, you need nine additional storage capacity units (the base Amazon Kendra Enterprise Edition index covers the first 100,000 documents, and the nine storage capacity units cover the remaining 900,000).

Query capacity units (QCUs) offer scaling in increments of 8,000 queries per day, with built-in adaptive bursting. For example, if you need 16,000 queries per day (an average of about 0.2 QPS), you need two query capacity units in total: the base unit included with your index plus one additional QCU.
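To make this arithmetic concrete, the following is a minimal Python sketch for computing the additional capacity units a workload needs. The constants mirror the figures above (100,000 documents per storage unit, 8,000 queries per day per QCU, one base unit of each included with the index); the 30 GB-per-unit storage cap is not modeled.

import math

DOCS_PER_STORAGE_UNIT = 100_000     # each unit also caps out at 30 GB of storage
QUERIES_PER_DAY_PER_QCU = 8_000

def additional_units(total_documents, queries_per_day):
    """Return (additional storage units, additional QCUs) beyond the base index capacity."""
    storage = max(0, math.ceil(total_documents / DOCS_PER_STORAGE_UNIT) - 1)   # base covers the first 100,000 docs
    query = max(0, math.ceil(queries_per_day / QUERIES_PER_DAY_PER_QCU) - 1)   # base covers the first 8,000 queries/day
    return storage, query

# 1 million documents and 16,000 queries per day -> (9, 1)
print(additional_units(1_000_000, 16_000))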

For more information about the maximum number of storage capacity units and query capacity units available for a single index, see Quotas for Amazon Kendra.

About capacity bursting

Amazon Kendra has a provisioned base capacity of one query capacity unit. You can use up to 8,000 queries per day with a minimum throughput of 0.1 queries per second (per query capacity unit).

An adaptive approach to handling unexpected traffic beyond the provisioned throughput is to use the built-in adaptive query bursting feature in Amazon Kendra. This allows you to apply unused query capacity to handle unexpected traffic. Amazon Kendra accumulates your unused queries at your provisioned queries per second rate, every second, up to the maximum number of queries you’ve provisioned for your Amazon Kendra index. These accumulated queries are automatically used to help handle unexpected traffic spikes above the currently allocated QPS capacity.
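Conceptually, this accrual behaves like a token bucket. The following toy model illustrates the behavior described above; it is a simplification for building intuition, not Amazon Kendra’s actual implementation.

class BurstBucket:
    """Toy model of adaptive query bursting: unused queries accrue every second
    at the provisioned QPS rate, up to a cap, and are spent on traffic spikes."""

    def __init__(self, provisioned_qps, max_accumulated):
        self.provisioned_qps = provisioned_qps
        self.max_accumulated = max_accumulated
        self.accumulated = 0.0

    def tick(self, queries_this_second):
        # Bank any unused provisioned capacity, up to the cap.
        unused = max(0.0, self.provisioned_qps - queries_this_second)
        self.accumulated = min(self.max_accumulated, self.accumulated + unused)
        # A spike above the provisioned rate draws from the bank.
        overflow = max(0.0, queries_this_second - self.provisioned_qps)
        served_from_burst = min(overflow, self.accumulated)
        self.accumulated -= served_from_burst
        # Anything left over would be throttled.
        return overflow - served_from_burst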

Optimal performance of adaptive query bursting can vary, depending on several factors such as your total index size, query complexity, accumulated unused queries, and overall load on your index. We recommend performing your own load tests to accurately measure bursting capacity.

Best practices

When dimensioning your Amazon Kendra index, you need to consider how many documents you’re indexing, how many queries you expect per day, how many queries per second you need to accommodate, and whether you have usage patterns that require additional capacity due to sustained usage. You may also experience short peaks where additional QPS is needed for brief periods of time.

It’s therefore good practice to observe your query usage patterns for a few weeks, especially when the patterns are not easily predictable. This allows you to define an optimal balance between using the built-in adaptive bursting capability for short, unsustained QPS peaks, and adding or removing capacity units to better handle longer, more sustained peaks and lows.

For information about visualizing and building a rough estimate of your usage patterns in Amazon Kendra, see Automatically scale Amazon Kendra query capacity units with Amazon EventBridge and AWS Lambda.

Amazon Kendra Enterprise Edition allows you to add document storage capacity in units of 100,000 documents (with a maximum of 30 GB of storage) each. You can add and remove storage capacity at any time, but you can’t reduce storage capacity below your used capacity (the number of documents ingested or storage space used). We recommend estimating how often documents are added to your data sources in order to determine when to increase the storage capacity of your Amazon Kendra index through storage capacity units. You can monitor the document count with Amazon CloudWatch or on the Amazon Kendra console.

Queries per second represent the number of concurrent queries your Amazon Kendra index receives at a given time. If you’re replacing a search solution with Amazon Kendra, you should be able to retrieve this information from query logs. If you exceed your provisioned and bursting capacity, your request may receive a 400 HTTP status code (client error) with the message ThrottlingException. For example, using the AWS SDK for Python (Boto3), you may receive an exception like the following:

ThrottlingException: An error occurred (ThrottlingException) when calling the Query operation (reached max retries: 4)

For cases like this, Boto3 includes a retries feature, which retries the call (in this case, the Amazon Kendra query) after an exception. If you aren’t using an AWS SDK, you may need to implement your own error handling mechanism, for example using exponential backoff.
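For example, you can raise the SDK’s retry budget through the botocore Config object, or wrap the call in your own exponential backoff. The following is a minimal sketch; the index ID and query text are placeholders.

import time

import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

# Let the SDK retry throttled calls more aggressively than the default.
kendra = boto3.client("kendra", config=Config(retries={"max_attempts": 10, "mode": "adaptive"}))

def query_with_backoff(index_id, query_text, max_attempts=5):
    """Manual exponential backoff, useful if you aren't relying on SDK retries."""
    for attempt in range(max_attempts):
        try:
            return kendra.query(IndexId=index_id, QueryText=query_text)
        except ClientError as error:
            if error.response["Error"]["Code"] != "ThrottlingException":
                raise
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
    raise RuntimeError("Query still throttled after retries")

response = query_with_backoff("<your-index-id>", "How do I scale my index?")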

You can monitor your Amazon Kendra index queries with CloudWatch metrics. For example, you could follow the metric IndexQueryCount, which represents the number of index queries per minute. If you want to use the IndexQueryCount metric, you should divide that number by 60 to obtain the average queries per second. Additionally, you can get a report of the queries per second on the Amazon Kendra console, as shown in the following screenshot.

The preceding graph shows three patterns:

  • Peaks of ~2.5 QPS during business hours, between 8 AM and 8 PM.
  • Sustained usage above ~0.5 QPS and below 1 QPS between 8 PM and 8 AM.
  • Less than 0.3 QPS on the weekend (Feb 7, 2021, was a Sunday and Feb 13, 2021, was a Saturday).
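To retrieve the same per-minute query counts programmatically rather than from the console graph, you can pull the IndexQueryCount metric with Boto3 and convert it to average QPS. This sketch assumes the AWS/Kendra CloudWatch namespace and a placeholder index ID.

from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Kendra",
    MetricName="IndexQueryCount",
    Dimensions=[{"Name": "IndexId", "Value": "<your-index-id>"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=60,            # one datapoint per minute
    Statistics=["Sum"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    qps = point["Sum"] / 60  # queries per minute -> average QPS
    print(point["Timestamp"], round(qps, 2))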

Taking these capacity requirements into account, you could define the additional capacity units for your Amazon Kendra index as follows:

  • For the high-usage period (8 AM to 8 PM, Monday through Friday), you add 24 QCUs (each query capacity unit provides capacity for at least 0.1 QPS), which, combined with the base Amazon Kendra Enterprise Edition query capacity (0.1 QPS), supports 2.5 queries per second.
  • For the second usage pattern (8 PM to 8 AM, Monday through Friday), you add four QCUs, which, combined with the base capacity (0.1 QPS), provides capacity for 0.5 QPS.
  • For the weekends, you add two QCUs, provisioning capacity for 0.3 QPS.

The following table summarizes this configuration.

Period                   Additional QCUs   Capacity (Without Bursting)
Mon – Fri, 8 AM – 8 PM   24                2.5 QPS
Mon – Fri, 8 PM – 8 AM   4                 0.5 QPS
Sat – Sun                2                 0.3 QPS

You can use this initial approach to define a baseline, then reevaluate it periodically to keep your Amazon Kendra resources right-sized.
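If you automate this schedule with Amazon EventBridge and AWS Lambda, as in the post linked above, the Lambda handler can call the UpdateIndex API. Here is a minimal sketch: the index ID is a placeholder, the QCU counts mirror the table above, and the storage unit count is assumed to stay at the base capacity.

import boto3

kendra = boto3.client("kendra")

INDEX_ID = "<your-index-id>"

# Additional QCUs per period, mirroring the table above.
SCHEDULE = {
    "weekday_day": 24,    # Mon-Fri, 8 AM - 8 PM -> 2.5 QPS
    "weekday_night": 4,   # Mon-Fri, 8 PM - 8 AM -> 0.5 QPS
    "weekend": 2,         # Sat-Sun -> 0.3 QPS
}

def handler(event, context):
    """Invoked by an EventBridge scheduled rule whose input names the period."""
    kendra.update_index(
        Id=INDEX_ID,
        CapacityUnits={
            "StorageCapacityUnits": 0,  # no additional storage units in this example
            "QueryCapacityUnits": SCHEDULE[event["period"]],
        },
    )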

It’s also important to keep in mind that query autocomplete capacity is tied to your query capacity. Query autocomplete capacity is calculated as five times the provisioned query capacity for an index, with a base capacity of 2.5 calls per second. This means that if your Amazon Kendra index query capacity is 0.5 QPS or below, you have 2.5 QPS for query autocomplete; above that, your query autocomplete capacity is five times your current index query capacity.
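Expressed as a one-line function, implementing the rule exactly as stated above:

def autocomplete_qps(index_query_qps):
    """Query autocomplete capacity: five times the provisioned query capacity,
    with a base (floor) of 2.5 calls per second."""
    return max(2.5, 5 * index_query_qps)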

Conclusion

In this post, you learned how to estimate capacity and scale your Amazon Kendra index.

Now it’s easier than ever to experience Amazon Kendra, with 750 hours of Free Tier and the new reduced price for Amazon Kendra Developer Edition. Get started, visit our workshop, or check out the AWS Machine Learning Blog.


About the Author

Dr. Andrew Kane is an AWS Principal Specialist Solutions Architect based out of London. He focuses on the AWS Language and Vision AI services, helping our customers architect multiple AI services into a single use-case driven solution. Before joining AWS at the beginning of 2015, Andrew spent two decades working in the fields of signal processing, financial payments systems, weapons tracking, and editorial and publishing systems. He is a keen karate enthusiast (just one belt away from Black Belt) and is also an avid home-brewer, using automated brewing hardware and other IoT sensors.


Tapodipta Ghosh is a Senior Architect. He leads the Content And Knowledge Engineering Machine Learning team that focuses on building models related to AWS Technical Content. He also helps our customers with AI/ML strategy and implementation using our AI Language services like Amazon Kendra.


Jean-Pierre Dodel leads product management for Amazon Kendra, a new ML-powered enterprise search service from AWS. He brings 15 years of Enterprise Search and ML solutions experience to the team, having worked at Autonomy, HP, and search startups for many years prior to joining Amazon four years ago. JP has led the Amazon Kendra team from its inception, defining vision, roadmaps, and delivering transformative semantic search capabilities to customers like Dow Jones, Liberty Mutual, 3M, and PwC.


Juan Bustos is an AI Services Specialist Solutions Architect at Amazon Web Services, based in Dallas, TX. Outside of work, he loves spending time writing and playing music as well as trying random restaurants with his family.

Read More

Archaeologist Digs Into Photogrammetry, Creates 3D Models With NVIDIA Technology

Archaeologist Daria Dabal is bringing the past to life, with an assist from NVIDIA technology.

Dabal works on various archaeological sites in the U.K., conducting field and post-excavation work. Over the last five years, photogrammetry — the use of photographs to create fully textured 3D models — has become increasingly popular in archaeology. Dabal has been expanding her skills in this area to create and render high-quality models of artifacts and sites.

With the help of her partner, Simon Kotowicz, Dabal is taking photogrammetry scans to the next level with a pair of NVIDIA graphics technologies.

Using NVIDIA RTX GPUs, Dabal can accelerate workflows to create and interact with 3D models, from recreating missing elements to adding animations or dropping them into VR. And with the NVIDIA Omniverse real-time collaboration and simulation platform, she can build stunning scenes around her projects using the platform’s library of existing assets.

All for the (Photo)Gram

Dabal quickly learned that good photogrammetry requires a good dataset. It’s important to take photos in the right pattern, so that every area of the site, monument or artifact is covered.

Once she has all the images she needs, Dabal builds the 3D models using Agisoft Metashape, a tool for photogrammetry pipelines. Dabal loads all her photos into the application, and the software turns that data into a point cloud, which is a rough collection of dots that represent the 3D model.
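Metashape also exposes a Python API for scripting this pipeline. The sketch below is illustrative only: it assumes a Metashape Professional license, hypothetical image file names, and the 1.x-era method names (buildDenseCloud, for example, was renamed buildPointCloud in later releases).

import Metashape  # requires Metashape Professional

doc = Metashape.Document()
chunk = doc.addChunk()

# Load the photo set captured around the site or artifact.
chunk.addPhotos(["IMG_0001.jpg", "IMG_0002.jpg", "IMG_0003.jpg"])

# Detect matching features and solve camera positions (the sparse point cloud).
chunk.matchPhotos()
chunk.alignCameras()

# Densify: compute depth maps, then build the dense point cloud.
chunk.buildDepthMaps()
chunk.buildDenseCloud()  # buildPointCloud() in Metashape 2.x

# Mesh and texture the result, then export the finished model.
chunk.buildModel()
chunk.buildTexture()
chunk.exportModel("site_model.obj")
doc.save("project.psx")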

Recently, Dabal and Kotowicz were approached by The Flow State XR, a company that delivers immersive experiences and applications. They were tasked with a new photogrammetry project that rocked their world: creating a 3D model of The Crobar, an iconic, heavy-metal bar located in the heart of Soho in London.

The Flow State XR sent images of The Crobar to the duo, who used the photos to model the bar from scratch, then created texture maps using Adobe Photoshop, Illustrator and Substance Painter. Dabal and Kotowicz are currently finishing the 3D model, but once the project is complete, The Flow State XR plans to use it as an interactive mobile app and a VR hangout for music fans.

The Crobar VR scene. Image courtesy of Dabal.

RTX Shapes 3D Modeling Workflows

Dabal uses an NVIDIA Quadro RTX 4000 GPU to significantly speed up her 3D modeling and rendering workflows. Processing a sparse cloud model with her older generation GPU would take two days, she said. With the upgraded RTX card, a similar point cloud takes only 10 hours.

With NVIDIA RTX, the millions of points on a model can be rotated and zoomed much more easily. The team can also use their 4K monitor to view the models, which they couldn’t do previously because it was difficult to navigate around the point clouds.

Dabal and Kotowicz have also experienced faster performance in creative apps like Autodesk 3ds Max. They can iterate quicker and see textures in graphics without needing to render as often.

“The NVIDIA RTX card has helped us achieve the model we need much faster,” said Dabal. “We’re spending less time in front of the workstation, and we’re getting to the rendering stage a lot quicker now.”

Textured 3D model of the Neath Abbey Ironworks. Image courtesy of Dabal.

Omniverse Makes Space for Sharing Assets, Building Scenes 

Dabal got the chance to use the advanced features of NVIDIA Omniverse when she entered the first “Create With Marbles” design competition. After exploring the platform, Dabal sees great potential in how it will transform traditional workflows and images in archaeology.

Dabal’s submission for the NVIDIA Omniverse “Create With Marbles” design competition.

Currently, there isn’t a tool that enables archaeologists to quickly upload assets in one place, and share or collaborate with others on the same projects.

With an open platform like Omniverse, archaeologists have a virtual space where they can easily upload photogrammetry artifacts and share assets with one another, no matter what location they’re working from. Or they could place models in Omniverse and create a stunning scene by adding extra elements, like trees or farm fields.

“Right now, most archaeological 3D models look fake, floating in their black backgrounds. It would take too much time to add the extras, but it would be so easy in Omniverse,” said Dabal. “When I was in Omniverse, I really enjoyed taking premade objects and moving them around to create a scene. It was super easy.”

Dabal says that if archaeologists had access to a library of extra assets, as well as all their own photogrammetry scans, “it would be game-changing.”

With Omniverse, archaeologists can share their projects with others around the world as well as simulate fire or weather conditions to bring their 3D models and sites to life.

Explore more of Dabal’s work, and learn more about NVIDIA RTX and NVIDIA Omniverse.

And join NVIDIA at SIGGRAPH, where we’ll showcase the technologies driving the future of graphics. We’ll announce the winner of the latest “Create With Marbles: Marvelous Machines” contest, and premiere a documentary highlighting how Omniverse was used to create the NVIDIA GTC 2021 keynote.


Read More

Ready for Prime Time: Plus to Deliver Autonomous Truck Systems Powered by NVIDIA DRIVE to Amazon

Your Amazon Prime delivery just got smarter.

Autonomous trucking company Plus recently signed a deal with Amazon to provide at least 1,000 self-driving systems to retrofit on the e-commerce giant’s delivery fleet. These systems are powered by NVIDIA DRIVE Xavier for high-performance, energy-efficient and centralized AI compute.

The agreement follows Plus’ announcement of going public via SPAC, or special purpose acquisition company.

Amazon — which leads the U.S. e-tail market, counting $386 billion in net revenue in 2020 — has been investing heavily in autonomous and electric vehicle technology. Last year, it acquired robotaxi company and NVIDIA DRIVE ecosystem member Zoox for $1.3 billion.

These deals signal the transition to autonomous systems in both personal and commercial transportation at a massive scale.

An A-Plus Platform

The current-generation PlusDrive autonomous trucking platform was developed for level 4 autonomous driving with a human driver still at the wheel, with the NVIDIA DRIVE Xavier system-on-a-chip at its core.

Xavier is the first-ever production, automotive-grade SoC for autonomous capabilities. It incorporates six different types of processors: CPU, GPU, deep learning accelerator, programmable vision accelerator, image signal processor and stereo/optical flow accelerator.

Architected for safety, Xavier incorporates the redundancy and diversity necessary for safe autonomous operation.

This high-performance compute enables the PlusDrive system to perform surround perception with an array of radar, lidar and camera sensors, running a variety of deep neural networks simultaneously and in real time.

Trucking Ahead

Plus’s deal with Amazon is just the beginning of the march toward widespread autonomous delivery.

The self-driving company has already announced plans to transition to the next generation of AI compute, NVIDIA DRIVE Orin, beginning in 2022. Plus has received more than 7,000 orders and pre-orders for this upcoming system.

Additionally, Amazon has been granted a warrant to buy a 20 percent stake in Plus after it spends more than $150 million, opening up the possibility for even deeper integration of the company’s technology with the e-retailer’s delivery fleet.

And with NVIDIA DRIVE at their core, these autonomous systems will be able to handle the AI processing necessary to deliver safe, efficient and continuously improving trucks at scale.


Read More

Improved Detection of Elusive Polyps via Machine Learning

Posted by Yossi Matias, Vice President and Ehud Rivlin, Research Scientist, Google Research

With the increasing ability to consistently and accurately process large amounts of data, particularly visual data, computer-aided diagnostic systems are more frequently being used to assist physicians in their work. This, in turn, can lead to meaningful improvements in health care. An example of where this could be especially useful is in the diagnosis and treatment of colorectal cancer (CRC), which is especially deadly and results in over 900K deaths per year, globally. CRC originates in small pre-cancerous lesions in the colon, called polyps, the identification and removal of which is very successful in preventing CRC-related deaths.

The standard procedure used by gastroenterologists (GIs) to detect and remove polyps is the colonoscopy, and about 19 million such procedures are performed annually in the US alone. During a colonoscopy, the gastroenterologist uses a camera-containing probe to check the intestine for pre-cancerous polyps and early signs of cancer, and removes tissue that looks worrisome. However, complicating factors, such as incomplete detection (in which the polyp appears within the field of view, but is missed by the GI, perhaps due to its size or shape) and incomplete exploration (in which the polyp does not appear in the camera’s field of view), can lead to a high fraction of missed polyps. In fact, studies suggest that 22%–28% of polyps are missed during colonoscopies, of which 20%–24% have the potential to become cancerous (adenomas).

Today, we are sharing progress made in using machine learning (ML) to help GIs fight colorectal cancer by making colonoscopies more effective. In “Detection of Elusive Polyps via a Large Scale AI System”, we present an ML model designed to combat the problem of incomplete detection by helping the GI detect polyps that are within the field of view. This work adds to our previously published work that maximizes the coverage of the colon during the colonoscopy by flagging for GI follow-up areas that may have been missed. Using clinical studies, we show that these systems significantly improve polyp detection rates.

Incomplete Exploration
To help the GI detect polyps that are outside the field of view, we previously developed an ML system that reduces the rate of incomplete exploration by estimating the fractions of covered and non-covered regions of a colon during a colonoscopy. This earlier work uses computer vision and geometry in a technique we call colonoscopy coverage deficiency via depth, to compute segment-by-segment coverage for the colon. It does so in two phases: first computing depth maps for each frame of the colonoscopy video, and then using these depth maps to compute the coverage in real time.

The ML system computes a depth image (middle) from a single RGB image (left). Then, based on the computation of depth images for a video sequence, it calculates local coverage (right), and detects where the coverage has been deficient and a second look is required (blue indicates observed segments, while red indicates uncovered ones). You can learn more about this work in our previous blog post.

This segment-by-segment work yields the ability to estimate what fraction of the current segment has been covered. The helpfulness of such functionality is clear: during the procedure itself, a physician may be alerted to segments with deficient coverage, and can immediately return to review these areas, potentially reducing the rates of missed polyps due to incomplete exploration.

Incomplete Detection
In our most recent paper, we look into the problem of incomplete detection. We describe an ML model that aids a GI in detecting polyps that are within the field of view, so as to reduce the rate of incomplete detection. We developed a system that is based on convolutional neural networks (CNN) with an architecture that combines temporal logic with a single frame detector, resulting in more accurate detection.

This new system has two principal advantages. The first is that it improves detection performance by reducing the number of false negative detections of elusive polyps, those that are particularly difficult for GIs to detect. The second is the system’s very low false positive rate, which makes it more likely to be adopted in the clinic.

Examples of the variety of polyps detected by the ML system.

We trained the system on 3600 procedures (86M video frames) and tested it on 1400 procedures (33M frames). All the videos and metadata were de-identified. The system detected 97% of the polyps (i.e., it yielded 97% sensitivity) at 4.6 false alarms per procedure, which is a substantial improvement over previously published results. Of the false alarms, follow-up review showed that some were, in fact, valid polyp detections, indicating that the system was able to detect polyps that were missed by the performing endoscopist and by those who annotated the data. The performance of the system on these elusive polyps suggests its generalizability in that the system has learned to detect examples that were initially missed by all who viewed the procedure.

We evaluated the system’s performance on polyps that are in the field of view for less than five seconds, which makes them more difficult for the GI to detect, and for which models typically have much lower sensitivity. In this case, the system attained a sensitivity about three times that achieved in the original procedure. When polyps were present in the field of view for less than two seconds, the difference was even more stark: the system exhibited a 4x improvement in sensitivity.

It is also interesting to note that the system is fairly insensitive to the choice of neural network architecture. We used two architectures: RetinaNet and LSTM-SSD. RetinaNet is a leading technique for object detection on static images (used for video by applying it to frames in a consecutive fashion). It is one of the top performers on a variety of benchmarks, given a fixed computational budget, and is known for balancing speed of computation with accuracy. LSTM-SSD is a true video object detection architecture, which can explicitly account for the temporal character of the video (e.g., temporal consistency of detections, ability to deal with blur and fast motion, etc.). It is known for being robust and very computationally lightweight and can therefore run on less expensive processors. Comparable results were also obtained on the much heavier Faster R-CNN architecture. The fact that results are similar across different architectures implies that one can choose the network meeting the available hardware specifications.
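To illustrate the first approach, applying a static-image detector to consecutive video frames, here is a hedged sketch using an off-the-shelf COCO-trained RetinaNet from torchvision. It is a generic stand-in, not the trained polyp detector from the paper, and the video file name is a placeholder.

import cv2
import torch
import torchvision

# Off-the-shelf RetinaNet trained on COCO, standing in for a task-specific detector.
model = torchvision.models.detection.retinanet_resnet50_fpn(weights="DEFAULT")
model.eval()

cap = cv2.VideoCapture("procedure.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # BGR uint8 frame -> RGB float tensor in [0, 1], as the detector expects.
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        detections = model([tensor])[0]
    keep = detections["scores"] > 0.5  # confidence threshold
    print(int(keep.sum()), "detections in this frame")
cap.release()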

Prospective Clinical Research Study
As part of the research reported in our detection paper we ran a clinical validation on 100 procedures in collaboration with Shaare Zedek Medical Center in Jerusalem, where our system was used in real time to help GIs. The system helped detect an average of one polyp per procedure that would have otherwise been missed by the GI performing the procedure, while not missing any of the polyps detected by the GIs, and with 3.8 false alarms per procedure. The feedback from the GIs was consistently positive.

We are encouraged by the potential helpfulness of this system for improving polyp detection, and we look forward to working together with the doctors in the procedure room to further validate this research.

Acknowledgements
The research was conducted by teams from Google Health and Google Research, Israel with support from Verily Life Sciences, and in collaboration with Shaare Zedek Medical Center. Verily is advancing this research via a newly established center in Israel, led by Ehud Rivlin. This research was conducted by Danny Veikherman, Tomer Golany, Dan M. Livovsky, Amit Aides, Valentin Dashinsky, Nadav Rabani, David Ben Shimol, Yochai Blau, Liran Katzir, Ilan Shimshoni, Yun Liu, Ori Segol, Eran Goldin, Greg Corrado, Jesse Lachter, Yossi Matias, Ehud Rivlin, and Daniel Freedman. Our appreciation also goes to several institutions and GIs who provided advice along the way and tested our system prototype. We would like to thank all of our team members and collaborators who worked on this project with us, including: Chen Barshai, Nia Stoykova, and many others.

Read More

August Arrivals: GFN Thursday Brings 34 Games to GeForce NOW This Month

It’s a new month for GFN Thursday, which means a new month full of games on GeForce NOW.

August brings a wealth of great new PC game launches to the cloud gaming service, including King’s Bounty II, Humankind and NARAKA: BLADEPOINT.

In total, 13 titles are available to stream this week. They’re just a portion of the 34 new games coming to the service this month.

Members will also get to stream upcoming content updates for popular free-to-play titles like Apex Legends and Tom Clancy’s Rainbow Six Siege as soon as they release.

Fit for a King

It’s time to save a kingdom. Members will be able to stream King’s Bounty II (Steam) when it releases for PC later this month on GeForce NOW.

King's Bounty II on GeForce NOW
Be a savior to a kingdom overshadowed by conspiracies, sabotage and necromancy in King’s Bounty II.

Darkness has descended over the world of Nostria in this exciting RPG sequel. Gamers will be able to play as one of three heroes, rescuing and building a personal army in a journey of leadership, survival and sacrifice. Fight for the future and outsmart enemies in turn-based combat. Every action has profound and lasting consequences in the fight to bring peace and order to the land.

Be the kingdom’s last hope and live out an adventure August 24. Preorder on Steam to get some exclusive bonuses.

More Fun for Free This Month

This month also comes with new content for some of the most popular free-to-play titles streaming on GeForce NOW. Members can look forward to experiencing the latest in Apex Legends and Tom Clancy’s Rainbow Six Siege.

Apex Legends: Emergence on GeForce NOW
Get ready, Legends. It’s a new season. “Apex Legends Emergence” has arrived.

“Apex Legends: Emergence,” the latest season of the wildly popular free-to-play game from Respawn and EA, launched on August 3 and brought in a new Legend, weapon and Battle Pass as well as some awesome map updates.

The newest legend, Seer, is here and ready to spot opportunities that others may miss. Players can also enjoy a new midrange weapon, the Rampage LMG, a slower but more powerful variation of the Spitfire. To top it all off, the newest map updates reveal a familiar landscape torn at the seams: the decimated World’s Edge is available now in Apex Legends and streaming on GeForce NOW.

The latest event in Tom Clancy’s Rainbow Six Siege features a new time-limited gameplay mode, a challenge to unlock a free Nomad Hive Mind set, and more. Year 6 Season 2 kicked off with a special “Rainbow Six Siege: Containment” event that sets players in the Consulate map overrun by the Chimera parasite in a new game mode called Nest Destruction. Members will be able to stream the Containment event from August 3 to August 24.

Here This Week

A Plague Tale: Innocence on GeForce NOW
Follow the grim tale of two siblings through some of the darkest hours in history in A Plague Tale: Innocence.

Starting off the month, members can look for the following titles available to stream this GFN Thursday:

August’s Newest Additions

NARAKA: BLADEPOINT on GeForce NOW
Experience a game where melee meets battle royale in NARAKA: BLADEPOINT.

This month is packed with more games coming to GeForce NOW over the course of August, including nine new titles:

More from July

Crowfall on GeForce NOW
Massive PVP sieges? Check. Raiding parties? Check. Playing as an epic Half-Giant Champion? Heck yes. Check out Crowfall, streaming on GeForce NOW.

On top of the 36 games that were announced and released in July, an extra 24 titles joined the GeForce NOW library over the month:

Finally, here’s a special question from our friends on the GeForce NOW Twitter feed:

Wanted: A strategic challenge.

Drop your favorite strategy games below. 👇

🌩 NVIDIA GeForce NOW (@NVIDIAGFN) August 4, 2021


Read More

Reimagine knowledge discovery using Amazon Kendra’s Web Crawler

When you deploy intelligent search in your organization, two important factors to consider are access to the latest and most comprehensive information, and a contextual discovery mechanism. Many companies are still struggling to make their internal documents searchable in a way that allows employees to get relevant information in a scalable, cost-effective manner. A 2018 International Data Corporation (IDC) study found that data professionals lose 50% of their time every week: 30% searching for, governing, and preparing data, plus 20% duplicating work. Amazon Kendra is purpose-built for addressing these challenges. Amazon Kendra is an intelligent search service that uses deep learning and reading comprehension to deliver more accurate search results.

The intelligent search capabilities of Amazon Kendra improve the search and discovery experience, but enterprises are still faced with the challenge of connecting troves of unstructured data and making that data accessible to search. Content is often unstructured and scattered across intranets and Wikis, making critical information hard to find and costing employees time and effort to track down the right answer.

Enterprises spend a lot of time and effort building complex extract, transform, and load (ETL) jobs that aggregate data sources. Amazon Kendra connectors allow you to quickly aggregate content as part of a single unified searchable index, without needing to copy or move data from an existing location to a new one. This reduces the time and effort typically associated with creating a new search solution.

With the recently launched Amazon Kendra web crawler, it’s now easier than ever to discover information stored within the vast amount of content spread across different websites and internal web portals. You can use the Amazon Kendra web crawler to quickly ingest and search content from your websites.

Sample use case

A common need is to reduce the complexity of searching across multiple data sources present in an organization. Most organizations have multiple departments, each having their own knowledge management and search systems. For example, the HR department may maintain a WordPress-based blog containing news and employee benefits-related articles, a Confluence site could contain internal knowledge bases maintained by engineering, sales may have sales plays stored on a custom content management system (CMS), and corporate office information could be stored in a Microsoft SharePoint Online site.

You can index all these types of webpages for search by using the Amazon Kendra web crawler. Specific connectors are also available to index documents directly from individual content data sources.

In this post, you learn how to ingest documents from a WordPress site using its sitemap with the Amazon Kendra web crawler.

Ingest documents with Amazon Kendra web crawler

For this post, I set up a WordPress site with information about AWS AI language services. To make the contents of the site searchable, I create a web crawler data source.

  1. On the Amazon Kendra console, choose Data sources in the navigation pane.

  2. Under WebCrawler, choose Add connector.

  3. For Data source name, enter a name for the data source.
  4. Add an optional description.

  5. Choose Next.

The web crawler allows you to define a series of source URLs or source sitemaps. WordPress generates a sitemap, which I use for this post.

  6. For Source, select Source sitemaps.
  7. For Source sitemaps, enter the sitemap URL.

  8. Add a web proxy or authentication if your host requires it.
  9. Create a new AWS Identity and Access Management (IAM) role.
  10. Choose Next.

  11. For this post, I set up the web crawler to crawl one page per second, so I modify the Maximum throttling value to 60.

The maximum value that’s allowed is 300.

For this post, I exclude a blog entry that contains 2021/06/28/this-post-is-to-be-skipped/ in the URL, as well as all content that has the term /feed/ in the URL. Keep in mind that the excluded content won’t be ingested into your Amazon Kendra index, so your users won’t be able to search across these documents.

  12. In the Additional configuration section, add these patterns to the Exclude patterns list.

  13. For Sync run schedule, choose Run on demand.
  14. Choose Next.

  15. Review the settings and choose Create.
  16. When the data source creation process is complete, choose Sync now.

When the sync job is complete, I can search the contents of my website.
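You can also create the same data source programmatically. Below is a minimal Boto3 sketch mirroring the console walkthrough above; the index ID, role ARN, and sitemap URL are placeholders, and the exclusion patterns are illustrative (check the web crawler documentation for the exact pattern syntax).

import boto3

kendra = boto3.client("kendra")

INDEX_ID = "<your-index-id>"

response = kendra.create_data_source(
    IndexId=INDEX_ID,
    Name="wordpress-web-crawler",
    Type="WEBCRAWLER",
    RoleArn="arn:aws:iam::<account-id>:role/<kendra-web-crawler-role>",
    Configuration={
        "WebCrawlerConfiguration": {
            "Urls": {
                "SiteMapsConfiguration": {
                    "SiteMaps": ["https://example.com/wp-sitemap.xml"]
                }
            },
            "MaxUrlsPerMinuteCrawlRate": 60,  # one page per second
            "UrlExclusionPatterns": [
                "*2021/06/28/this-post-is-to-be-skipped/*",
                "*/feed/*",
            ],
        }
    },
)

# Kick off an on-demand sync, as in the console walkthrough.
kendra.start_data_source_sync_job(Id=response["Id"], IndexId=INDEX_ID)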

Conclusion

In this post, you saw how to set up the Amazon Kendra web crawler and how easy it is to ingest your websites into your Amazon Kendra index. If you’re just getting started with Amazon Kendra, you can build an index, ingest your website, and take advantage of intelligent search to provide better results to your users. To learn more about Amazon Kendra, refer to the Amazon Kendra Essentials workshop and deep dive into the Amazon Kendra blog.


About the Authors

Tapodipta Ghosh is a Senior Architect. He leads the Content And Knowledge Engineering Machine Learning team that focuses on building models related to AWS Technical Content. He also helps our customers with AI/ML strategy and implementation using our AI Language services like Amazon Kendra.

Vijai Gandikota is a Senior Product Manager at Amazon Web Services for Amazon Kendra.


Juan Bustos is an AI Services Specialist Solutions Architect at Amazon Web Services, based in Dallas, TX. Outside of work, he loves spending time writing and playing music as well as trying random restaurants with his family.

Read More

Two New Datasets for Conversational NLP: TimeDial and Disfl-QA

Posted by Aditya Gupta, Software Engineer and Shyam Upadhyay, Research Scientist, Google Assistant

A key challenge in natural language processing (NLP) is building conversational agents that can understand and reason about different language phenomena that are unique to realistic speech. For example, because people do not always premeditate exactly what they are going to say, a natural conversation often includes interruptions to speech, called disfluencies. Such disfluencies can be simple (like interjections, repetitions, restarts, or corrections), which simply break the continuity of a sentence, or more complex semantic disfluencies, in which the underlying meaning of a phrase changes. In addition, understanding a conversation also often requires knowledge of temporal relationships, like whether an event precedes or follows another. However, conversational agents built on today’s NLP models often struggle when confronted with temporal relationships or with disfluencies, and progress on improving their performance has been slow. This is due, in part, to a lack of datasets that involve such interesting conversational and speech phenomena.

To stir interest in this direction within the research community, we are excited to introduce TimeDial, for temporal commonsense reasoning in dialog, and Disfl-QA, which focuses on contextual disfluencies. TimeDial presents a new multiple choice span filling task targeted at temporal understanding, with an annotated test set of over 1.1k dialogs. Disfl-QA is the first dataset containing contextual disfluencies in an information seeking setting, namely question answering over Wikipedia passages, with ~12k human annotated disfluent questions. These benchmark datasets are the first of their kind and show a significant gap between human performance and current state-of-the-art NLP models.

TimeDial
While people can effortlessly reason about everyday temporal concepts, such as duration, frequency, or relative ordering of events in a dialog, such tasks can be challenging for conversational agents. For example, current NLP models often make a poor selection when tasked with filling in a blank (as shown below) that assumes a basic level of world knowledge for reasoning, or that requires understanding explicit and implicit inter-dependencies between temporal concepts across conversational turns.

It is easy for a person to judge that “half past one” and “quarter to two” are more plausible options to fill in the blank than “half past three” and “half past nine”. However, performing such temporal reasoning in the context of a dialog is not trivial for NLP models, as it requires appealing to world knowledge (i.e., knowing that the participants are not yet late for the meeting) and understanding the temporal relationship between events (“half past one” is before “three o’clock”, while “half past three” is after it). Indeed, current state-of-the-art models like T5 and BERT end up picking the wrong answers — “half past three” (T5) and “half past nine” (BERT).

The TimeDial benchmark dataset (derived from the DailyDialog multi-turn dialog corpus) measures models’ temporal commonsense reasoning abilities within a dialog context. Each of the ~1.5k dialogs in the dataset is presented in a multiple choice setup, in which one temporal span is masked out and the model is asked to find all correct answers from a list of four options to fill in the blank.

In our experiments we found that while people can easily answer these multiple choice questions (at 97.8% accuracy), state-of-the-art pre-trained language models still struggle on this challenge set. We experiment across three different modeling paradigms: (i) classification over the provided 4 options using BERT, (ii) mask filling for the masked span in the dialog using BERT-MLM, (iii) generative methods using T5. We observe that all the models struggle on this challenge set, with the best variant only scoring 73%.

Model   2-best Accuracy
Human   97.8%
BERT – Classification   50.0%
BERT – Mask Filling   68.5%
T5 – Generation   73.0%
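To give a flavor of the mask filling paradigm, here is a small sketch using the Hugging Face transformers library to score single-token candidates for a masked temporal span. This is a generic illustration with an off-the-shelf model, not the paper’s exact setup; the dialog and candidate options are made up for this example.

from transformers import pipeline

# Generic BERT mask filling, a stand-in for the BERT-MLM baseline described above.
fill = pipeline("fill-mask", model="bert-base-uncased")

dialog = (
    "The meeting starts at three o'clock. "
    "It is half past [MASK], so we are not late yet."
)

# Restrict scoring to single-token candidate answers.
for result in fill(dialog, targets=["one", "three", "nine"]):
    print(f'{result["token_str"]:>6}  score={result["score"]:.4f}')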

Qualitative error analyses show that the pre-trained language models often rely on shallow, spurious features (particularly text matching), instead of truly doing reasoning over the context. It is likely that building NLP models capable of performing the kind of temporal commonsense reasoning needed for TimeDial requires rethinking how temporal objects are represented within general text representations.

Disfl-QA
As disfluency is inherently a speech phenomenon, it is most commonly found in text output from speech recognition systems. Understanding such disfluent text is key to building conversational agents that understand human speech. Unfortunately, research in the NLP and speech community has been impeded by the lack of curated datasets containing such disfluencies, and the datasets that are available, like Switchboard, are limited in scale and complexity. As a result, it’s difficult to stress test NLP models in the presence of disfluencies.

Disfluency   Example
Interjection   When is, uh, Easter this year?
Repetition   When is Eas- Easter this year?
Correction   When is Lent, I mean Easter, this year?
Restart   How much, no wait, when is Easter this year?
Different kinds of disfluencies: the reparandum (words intended to be corrected or ignored), the interregnum (optional discourse cues) and the repair (the corrected words).

Disfl-QA is the first dataset containing contextual disfluencies in an information seeking setting, namely question answering over Wikipedia passages from SQuAD. Disfl-QA is a targeted dataset for disfluencies, in which all questions (~12k) contain disfluencies, making for a much larger disfluent test set than prior datasets. Over 90% of the disfluencies in Disfl-QA are corrections or restarts, making it a much more difficult test set for disfluency correction. In addition, compared to earlier disfluency datasets, it contains a wider variety of semantic distractors, i.e., distractors that carry semantic meaning as opposed to simpler speech disfluencies. 

Passage: …The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse (“Norman” comes from “Norseman”) raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, …
Q1:   In what country is Normandy located?   France ✓
DQ1:   In what country is Norse found no wait Normandy not Norse?   Denmark X
Q2:   When were the Normans in Normandy?   10th and 11th centuries ✓
DQ2:   From which countries no tell me when were the Normans in Normandy?   Denmark, Iceland and Norway X
A passage and questions (Qi) from SQuAD dataset, along with their disfluent versions (DQi), consisting of semantic distractors (like “Norse” and “from which countries”) and predictions from a T5 model.

Here, the first question (Q1) is seeking an answer about the location of Normandy. In the disfluent version (DQ1) Norse is mentioned before the question is corrected. The presence of this correctional disfluency confuses the QA model, which tends to rely on shallow textual cues from the question for making predictions.

Disfl-QA also includes newer phenomena, such as coreference (expression referring to the same entity) between the reparandum and the repair.

SQuAD    Disfl-QA
Who does BSkyB have an operating license from?    Who removed [BSkyB’s] operating license, no scratch that, who do [they] have [their] operating license from?

Experiments show that the performance of existing state-of-the-art language model–based question answering systems degrades significantly when tested on Disfl-QA and heuristic disfluencies (presented in the paper) in a zero-shot setting.

Dataset   F1
SQuAD   89.59
Heuristics   65.27 (-24.32)
Disfl-QA   61.64 (-27.95)

We show that data augmentation methods partially recover the loss in performance and also demonstrate the efficacy of using human-annotated training data for fine-tuning. We argue that researchers need large-scale disfluency datasets in order for NLP models to be robust to disfluencies.

Conclusion
Understanding language phenomena that are unique to human speech, like disfluencies and temporal reasoning, among others, is a key ingredient for enabling more natural human–machine communication in the near future. With TimeDial and Disfl-QA, we aim to fill a major research gap by providing these datasets as testbeds for NLP models, in order to evaluate their robustness to ubiquitous phenomena across different tasks. It is our hope that the broader NLP community will devise generalized few-shot or zero-shot approaches to effectively handle these phenomena, without requiring task-specific human-annotated training datasets, constructed specifically for these challenges.

Acknowledgments
The TimeDial work has been a team effort involving Lianhui Qin, Luheng He, Yejin Choi, Manaal Faruqui and the authors. The Disfl-QA work has been a collaboration involving Jiacheng Xu, Diyi Yang, Manaal Faruqui.

Read More