How Artificial Intelligence Protects Porsche’s IT Landscape




Porsche, iTUBS and comNET are running a joint research project to develop the first AI-supported IT monitoring tool that can automatically detect complex error cases. The tool will enable Porsche to monitor alerts across all its IT systems and services and to respond to them effectively.

Robust, powerful and healthy IT systems play an increasingly important role in today’s digital age. When IT systems do not function as they should, the impact on a company can be devastating and business operations can be seriously disrupted. IT monitoring, meaning the collection of measurement data and the observation of the IT environment, is an effective way to improve the health and resilience of IT systems.

At Porsche, IT monitoring is used to strengthen the company’s IT resilience and maximize the availability of IT services, from production to sales to databases. Currently, about 5,000 hosts (servers, VMs, etc.) are monitored, and the number is constantly increasing. At this volume it is difficult to keep track of everything, and troubleshooting becomes complicated and lengthy.

Automatic detection of complex error cases

To handle the increasingly complex tasks of IT monitoring, the research project is developing, for the first time, an AI-supported IT monitoring tool that can automatically detect error cases. Since the launch of the first prototype in February 2021, all important host and application parameters of Porsche IT services have been analyzed by an artificial intelligence based on neural networks.

But how can AI help make IT monitoring smarter? Errors in IT fall into two types: low-level and high-level. A low-level error involves only one metric (such as disk usage or CPU utilization) and can be detected with a threshold on that metric (disk usage above 95% or CPU utilization at 100%). We can monitor these errors effectively with tools like Checkmk. A high-level error, on the other hand, involves multiple metrics at once (e.g., “disk usage” and “network traffic”). Detecting it depends on how the metrics relate to one another and on how they change over time (“disk usage of remote database host increases rapidly without network traffic”). Our AI-powered solution is designed to address this second type of error.
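
To make the distinction concrete, here is a minimal Python sketch. It is not the project's code and does not use the Checkmk API. The `Sample` record, the function names and the growth and traffic limits are assumptions chosen for illustration; only the 95% disk and 100% CPU thresholds come from the text above. The second function is a hand-written rule for the one high-level example quoted above, whereas the project aims to detect such patterns with a neural network rather than coding them by hand.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One measurement of a host (hypothetical record layout)."""
    disk_usage_pct: float
    cpu_pct: float
    net_bytes_per_s: float


def low_level_alerts(sample, disk_limit=95.0, cpu_limit=100.0):
    """Low-level check: each metric is compared with its own threshold."""
    alerts = []
    if sample.disk_usage_pct > disk_limit:
        alerts.append("disk")
    if sample.cpu_pct >= cpu_limit:
        alerts.append("cpu")
    return alerts


def disk_growth_without_traffic(window, min_growth_pct=5.0,
                                max_net_bytes_per_s=1000.0):
    """High-level check over a time window (oldest sample first).

    Flags the case where disk usage rises by at least min_growth_pct
    percentage points while network traffic stays below
    max_net_bytes_per_s in every sample. Both limits are illustrative.
    """
    if len(window) < 2:
        return False
    growth = window[-1].disk_usage_pct - window[0].disk_usage_pct
    quiet = all(s.net_bytes_per_s <= max_net_bytes_per_s for s in window)
    return growth >= min_growth_pct and quiet


# A host whose disk fills up while the network is almost idle.
window = [
    Sample(disk_usage_pct=40.0, cpu_pct=10.0, net_bytes_per_s=100.0),
    Sample(disk_usage_pct=44.0, cpu_pct=12.0, net_bytes_per_s=50.0),
    Sample(disk_usage_pct=50.0, cpu_pct=11.0, net_bytes_per_s=80.0),
]
threshold_hits = [low_level_alerts(s) for s in window]
pattern_hit = disk_growth_without_traffic(window)
```

In this example no single sample crosses a threshold, so the low-level check stays silent, while the combined view over time flags the host.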

Monitoring for IT environments — our AI approach

The monitoring tool was developed jointly by Porsche, iTUBS (Innovationsgesellschaft Technische Universität Braunschweig mbH) and comNET (Gesellschaft für Kommunikation & Netzwerke mbH). The AI researchers at TU Braunschweig developed the AI architecture, the training processes and the deployment strategies. The IT specialists from comNET, with their expertise in monitoring and open source software, handled the project coordination and the Checkmk integration and developed the AI dashboard.

The AI-powered IT monitoring project has two important use cases. First, predictive maintenance aims to predict host errors or failures before they occur, which helps to avoid costly downtime and to schedule host maintenance. Second, anomaly detection alerts IT professionals to a host’s unusual behavior and can detect complex high-level anomalies, which is much more difficult than applying thresholds to host metrics. The AI dashboard visualizes the results of the methods that address these two use cases.
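
The idea behind anomaly detection, learning what normal looks like from data and scoring how far new behavior deviates from it, can be sketched in a few lines of Python. The project uses neural networks for this. The sketch below deliberately swaps in a much simpler stand-in, a least-squares line relating two metrics, and all numbers, units and names in it are made up for illustration.

```python
from statistics import mean, pstdev


def fit_linear(xs, ys):
    """Fit y = slope * x + intercept by least squares on normal data.

    Returns (slope, intercept, residual_std). The residual spread tells
    us how far normal points typically lie from the fitted line.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return slope, intercept, pstdev(residuals)


def anomaly_score(model, x, y):
    """Distance of (x, y) from the learned relationship, in residual stds."""
    slope, intercept, resid_std = model
    return abs(y - (slope * x + intercept)) / resid_std


# Made-up history of normal behavior for one host:
# network traffic (MB/s) and disk growth (MB/min) rise together.
traffic = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
growth = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
model = fit_linear(traffic, growth)

normal_score = anomaly_score(model, 3.0, 6.0)   # fits the learned pattern
unusual_score = anomaly_score(model, 0.1, 9.0)  # fast growth, almost no traffic
```

A point that follows the learned relationship gets a low score, while rapid disk growth with almost no traffic gets a high one, even though neither value would look alarming on its own. A neural model plays the same role for many metrics and for their behavior over time.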

Proof of Concept: Anomaly Dashboard

Our monitoring tool provides IT professionals with an AI-driven web-based monitoring dashboard. Hosts with many anomalies are visually highlighted for quick troubleshooting. In addition, the tool is based on a human-in-the-loop approach: the AI alerts IT professionals to potential anomalies, and the IT professionals help improve the AI by collecting data from anomalous or faulty events, enabling continuous improvement of the anomaly detection neural model. This leads to faster troubleshooting, less downtime, and a better understanding of the root causes of complex IT failure events.
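
The two dashboard ideas described here, highlighting hosts with many anomalies and feeding expert verdicts back as training data, might look roughly like this in Python. The event fields, host names and label values are hypothetical; the actual dashboard and its data model are not described in this article.

```python
from collections import Counter


def rank_hosts(events, top_n=3):
    """Return the top_n hosts with the most anomaly events, busiest first."""
    counts = Counter(event["host"] for event in events)
    return counts.most_common(top_n)


def collect_feedback(events, verdicts):
    """Attach expert verdicts to the events that have been reviewed.

    verdicts maps an event id to a label such as "fault" or "false_alarm".
    The labeled events can later serve as training data for the model.
    """
    return [
        {**event, "label": verdicts[event["id"]]}
        for event in events
        if event["id"] in verdicts
    ]


# Hypothetical anomaly events reported by the model.
events = [
    {"id": 1, "host": "db-01", "metric": "disk_usage"},
    {"id": 2, "host": "db-01", "metric": "network_traffic"},
    {"id": 3, "host": "web-07", "metric": "cpu"},
    {"id": 4, "host": "db-01", "metric": "disk_usage"},
]

highlighted = rank_hosts(events, top_n=1)
labeled = collect_feedback(events, {1: "fault", 3: "false_alarm"})
```

Here `db-01` would be highlighted with three anomalies, and the two reviewed events carry labels that a later training run could use.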

By using AI technology, Porsche IT can (i) improve the monitoring of IT systems, (ii) identify IT vulnerabilities and prevent failures, and (iii) thus increase Porsche’s overall IT resilience.
