Research Focus: Week of February 19, 2024

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

Vertically Autoscaling Monolithic Applications with CaaSPER: Scalable Container-as-a-Service Performance Enhanced Resizing Algorithm for the Cloud

Kubernetes is a prominent open-source platform for managing cloud applications, including stateful databases, which keep track of changes and transactions involving the underlying data. These monolithic applications often must rely on vertical resource scaling instead of horizontal scale-out, adjusting CPU cores to match load fluctuations. However, an analysis of database-as-a-service (DBaaS) offerings at Microsoft revealed that many customers consistently over-provision resources for peak workloads, neglecting opportunities to optimize their cloud resource consumption by scaling down. Existing vertical autoscaling tools lack the ability to minimize resource slack and respond promptly to throttling, leading to increased costs and impacting crucial metrics, such as throughput and availability.

In a recent paper: Vertically Autoscaling Monolithic Applications with CaaSPER: Scalable Container-as-a-Service Performance Enhanced Resizing Algorithm for the Cloud, researchers from Microsoft propose CaaSPER, a vertical autoscaling algorithm that blends reactive and proactive strategies to address this challenge. By dynamically adjusting CPU resources, CaaSPER minimizes resource slack, maintains optimal CPU utilization, and reduces throttling. Importantly, customers have the flexibility to prioritize either cost savings or high performance. Extensive testing demonstrates that CaaSPER effectively reduces throttling and keeps CPU utilization within target levels. CaaSPER is designed to be application-agnostic and platform-agnostic, with potential for extension to other applications and resources requiring vertical autoscaling.
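
The paper presents the full control algorithm; the sketch below only illustrates the general reactive-plus-proactive shape of a vertical CPU autoscaler, not CaaSPER itself. The class name, thresholds, and scaling policy are all illustrative assumptions.

```python
# Minimal sketch of a reactive + proactive vertical CPU autoscaler.
# All names and thresholds are illustrative, not CaaSPER's actual logic.
from collections import deque
from statistics import mean


class VerticalAutoscaler:
    def __init__(self, min_cores=1, max_cores=32, target_util=0.65,
                 throttle_limit=0.05, history=12):
        self.min_cores = min_cores            # floor for scale-down
        self.max_cores = max_cores            # ceiling for scale-up
        self.target_util = target_util        # desired CPU utilization
        self.throttle_limit = throttle_limit  # tolerated throttling ratio
        self.window = deque(maxlen=history)   # recent utilization samples

    def recommend(self, cores, utilization, throttle_ratio):
        """Return a new core count for the container."""
        self.window.append(utilization)

        # Reactive path: throttling means the workload is starved right now,
        # so scale up immediately rather than waiting for the trend.
        if throttle_ratio > self.throttle_limit:
            return min(self.max_cores, cores * 2)

        # Proactive path: steer average utilization toward the target,
        # reclaiming slack when the recent window runs consistently low.
        avg = mean(self.window)
        desired = max(1, round(cores * avg / self.target_util))
        return max(self.min_cores, min(self.max_cores, desired))


scaler = VerticalAutoscaler()
print(scaler.recommend(cores=8, utilization=0.30, throttle_ratio=0.0))   # scales down
print(scaler.recommend(cores=8, utilization=0.95, throttle_ratio=0.12))  # scales up
```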

Improved Scene Landmark Detection for Camera Localization

Camera localization is a fundamental component commonly used in computer vision, robotics, augmented reality, and virtual reality applications for estimating the precise 3D position and orientation of a camera-enabled device within a scene. Localization techniques that use image-based retrieval, visual feature matching, and 3D structure-based pose estimation are generally accurate, but they require high storage, are often slow, and are not privacy-preserving. Researchers from Microsoft and external colleagues recently proposed an alternate learning-based localization method based on scene landmark detection (SLD) to address these limitations. It involves training a convolutional neural network to detect a few predetermined, salient, scene-specific 3D points or landmarks and computing camera pose from the associated 2D–3D correspondences. Although SLD outperformed existing learning-based approaches, it was notably less accurate than 3D structure-based methods.
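
The pose-computation step that SLD relies on is standard: given 2D detections of known 3D landmarks plus the camera intrinsics, a perspective-n-point (PnP) solver recovers the camera pose. Below is a minimal sketch using OpenCV; the landmark coordinates, detections, and intrinsics are made up, and in SLD the 2D points would come from the landmark-detection network.

```python
# Minimal sketch: recover camera pose from 2D-3D correspondences with
# OpenCV's PnP solver. All numbers below are synthetic.
import numpy as np
import cv2

# Known 3D scene landmarks (world coordinates, meters).
pts_3d = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0],
                   [0.5, 0.5, 1], [0.2, 0.8, 0.5]], dtype=np.float64)

# Their detected 2D projections in the image (pixels).
pts_2d = np.array([[320, 240], [420, 242], [318, 140], [419, 139],
                   [371, 188], [339, 160]], dtype=np.float64)

# Pinhole intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500, 0, 320],
              [0, 500, 240],
              [0, 0, 1]], dtype=np.float64)

ok, rvec, tvec = cv2.solvePnP(pts_3d, pts_2d, K, distCoeffs=None)
R, _ = cv2.Rodrigues(rvec)   # rotation vector -> 3x3 rotation matrix
camera_center = -R.T @ tvec  # camera position in the world frame
print(ok, camera_center.ravel())
```

With noisy real-world detections, the RANSAC variant cv2.solvePnPRansac is the usual choice, since it discards outlier correspondences before refining the pose.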

In a recent paper: Improved Scene Landmark Detection for Camera Localization, researchers from Microsoft show that the accuracy gap was due to insufficient model capacity and noisy labels during training. To mitigate the capacity issue, they propose splitting the landmarks into subgroups and training a separate network for each subgroup. To generate better training labels, they propose using dense reconstructions to estimate accurate visibility of scene landmarks. Finally, they present a compact neural network architecture to improve memory efficiency. This approach is as accurate as state-of-the-art structure-based methods on the INDOOR-6 dataset, but it runs significantly faster and uses less storage.


ESUS: Aligning and Simplifying SUS for Enterprise Applications

Over many years, researchers have developed standard questionnaires to evaluate usability and present a single score that represents a product’s overall level of ease of use. These evaluations are highly valuable for researchers studying human-computer interaction (HCI) and user experience (UX). One of the most notable questionnaires is the System Usability Scale (SUS). However, since the SUS was introduced in 1986, products and services have undergone monumental advances in technology, while HCI and UX research practices have matured considerably. These changes extend to enterprise environments as well.

In a recent article: ESUS: Aligning and Simplifying SUS for Enterprise Applications, researchers from Microsoft present preliminary evidence showing the effectiveness of a new usability questionnaire with three advantages for enterprise applications over the original 10-item SUS questionnaire. The Enterprise System Usability Scale (ESUS) offers better measurement of usability for technical products and services; fewer questionnaire items; and alignment with enterprise environments. Results indicate that the ESUS strongly correlates with user satisfaction, similar to the SUS.
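
The article's ESUS items and scoring are not reproduced here, but for context, the original SUS reduces ten 1–5 ratings to a single 0–100 score. A minimal implementation of that classic scoring:

```python
# Classic SUS scoring (Brooke, 1986): ten items rated 1-5, odd items
# positively worded, even items negatively worded; the result is 0-100.
# ESUS uses fewer items and its own scoring, which is not shown here.

def sus_score(responses):
    """responses: list of ten ratings, each 1..5, in questionnaire order."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten ratings between 1 and 5")
    total = 0
    for i, r in enumerate(responses):
        # Items 1,3,5,7,9 (0-indexed even): contribution is rating - 1.
        # Items 2,4,6,8,10: contribution is 5 - rating.
        total += (r - 1) if i % 2 == 0 else (5 - r)
    return total * 2.5  # rescale the 0-40 sum to 0-100

print(sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]))  # 85.0
```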

Shining Brighter Together: Google’s Gemma Optimized to Run on NVIDIA GPUs

NVIDIA, in collaboration with Google, today launched optimizations across all NVIDIA AI platforms for Gemma, Google's new state-of-the-art lightweight open language models in 2 billion- and 7 billion-parameter sizes that can be run anywhere, reducing costs and speeding innovative work for domain-specific use cases.

Teams from the companies worked closely together to accelerate the performance of Gemma — built from the same research and technology used to create the Gemini models — with NVIDIA TensorRT-LLM, an open-source library for optimizing large language model inference, when running on NVIDIA GPUs in the data center, in the cloud and on PCs with NVIDIA RTX GPUs.

This allows developers to target the installed base of over 100 million NVIDIA RTX GPUs available in high-performance AI PCs globally.

Developers can also run Gemma on NVIDIA GPUs in the cloud, including on Google Cloud’s A3 instances based on the H100 Tensor Core GPU and soon, NVIDIA’s H200 Tensor Core GPUs — featuring 141GB of HBM3e memory at 4.8 terabytes per second — which Google will deploy this year.

Enterprise developers can additionally take advantage of NVIDIA’s rich ecosystem of tools — including NVIDIA AI Enterprise with the NeMo framework and TensorRT-LLM — to fine-tune Gemma and deploy the optimized model in their production application.

Learn more about how TensorRT-LLM is revving up inference for Gemma, along with additional information for developers. This includes several model checkpoints of Gemma and the FP8-quantized version of the model, all optimized with TensorRT-LLM.
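
For readers who want to try it, here is a minimal sketch of running Gemma through TensorRT-LLM's high-level Python API. This assumes a recent tensorrt_llm release that exposes the LLM entry point and can build an engine from a Hugging Face checkpoint; the model identifier and sampling settings are illustrative.

```python
# Sketch: running Gemma through TensorRT-LLM's high-level Python API.
# Assumes a recent tensorrt_llm release exposing the LLM entry point;
# the Hugging Face model id and sampling settings are illustrative.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="google/gemma-2b")  # builds or loads a TensorRT engine
params = SamplingParams(max_tokens=64, temperature=0.8, top_p=0.95)

for output in llm.generate(["What does TensorRT-LLM optimize?"], params):
    print(output.outputs[0].text)
```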

Experience Gemma 2B and Gemma 7B directly from your browser on the NVIDIA AI Playground.

Gemma Coming to Chat With RTX

Adding support for Gemma soon is Chat with RTX, an NVIDIA tech demo that uses retrieval-augmented generation and TensorRT-LLM software to give users generative AI capabilities on their local, RTX-powered Windows PCs.

Chat with RTX lets users personalize a chatbot with their own data by easily connecting local files on a PC to a large language model.

Since the model runs locally, it provides results fast, and user data stays on the device. Rather than relying on cloud-based LLM services, Chat with RTX lets users process sensitive data on a local PC without the need to share it with a third party or have an internet connection.
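
Under the hood, retrieval-augmented generation boils down to ranking local text chunks against a query and feeding the best matches to the model as context. Here is a minimal, self-contained sketch of that pattern, with a toy embedding standing in for a real embedding model and the generation step stubbed out; a real pipeline like Chat with RTX would use a proper embedding model and a local TensorRT-LLM engine.

```python
# Minimal sketch of the retrieval-augmented generation (RAG) pattern:
# retrieve the most relevant local text chunks, then pass them to the
# LLM as context. Embedding and generation are deliberately toy stubs.
import numpy as np


def embed(text):
    # Toy bag-of-characters embedding; stands in for a real embedding model.
    v = np.zeros(256)
    for ch in text.lower():
        v[ord(ch) % 256] += 1
    return v / (np.linalg.norm(v) or 1.0)


def retrieve(query, chunks, k=2):
    """Rank local document chunks by cosine similarity to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: -float(embed(c) @ q))
    return ranked[:k]


docs = ["Meeting notes: the demo ships in March.",
        "Recipe: two eggs, flour, and a pinch of salt.",
        "Travel log: flights to Berlin booked for May."]

question = "When does the demo ship?"
context = "\n".join(retrieve(question, docs))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # a local LLM would now generate the answer from this prompt
```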
