
Intelligent Network O&M

Posted on Apr 8, 2024

What Is Intelligent Network O&M?

The intelligent network O&M solution can visualize various O&M data and quickly detect, locate, and rectify faults. Additionally, this solution provides intelligent capabilities such as comprehensive health evaluation and fault prediction to achieve proactive protection based on exception detection and risk prediction, ensuring 24/7 service continuity.

Why Is Intelligent Network O&M Needed?

Digitalization is under way across nearly every sector, accelerated by advances in software capabilities such as big data analytics and machine learning. A growing range of services and applications is moving to cloud platforms, and businesses now connect to the cloud every day. The emergence of software-defined networking (SDN) alongside cloud technology has simplified enterprise digital transformation by enabling computing and storage resources to be pooled. However, it has also made network structures far more complex, which creates significant challenges for network O&M. These include:

Difficult to Perceive Services

  • Traditional network operations rely heavily on alarms for maintenance, yet the volume of alarms is escalating, exceeding the capabilities of outdated manual-focused approaches. To mitigate this, less critical alarms may be filtered out, but this approach risks missing a comprehensive understanding of network integrity. With the advance of sophisticated technologies like SDN, network operators are tasked with managing both the underlying physical infrastructure and the virtual network layers. Alarm-based maintenance is no longer sufficient for these complex environments.

  • Traditional O&M is reactive: personnel respond to network issues only after they occur and cannot foresee or preempt faults.

Difficult to Locate Faults

  • Managing Networks on a Vast Scale: Within the realm of cloud computing, O&M teams are tasked with overseeing a combination of physical hardware and virtual machines (VMs), which equates to a management scope that is exponentially larger, often with the number of network elements (NEs) increasing by orders of magnitude. Compounding this complexity, the shift towards real-time analytics necessitates much more frequent data collection from devices, down to millisecond intervals as opposed to minutes, resulting in a data volume surge by as much as a factor of a thousand. To effectively preempt and resolve complications, the O&M infrastructure is also required to process and present an immense volume of device data, thereby further expanding the already vast data landscape.

  • Multiple Routing Paths: Ensuring robustness and ample bandwidth, networks are often configured to route traffic using load balancing techniques, utilizing hashing algorithms to direct the flow. Yet, as the node count within the network climbs, the potential traffic routes multiply significantly, rendering it challenging for network administrators to identify the specific path utilized by a given service's traffic. Traditional fault localization relies greatly on the expertise of O&M staff and tends to be a lengthy process.
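To illustrate why hash-based load balancing makes a service's path hard to trace by hand, here is a minimal sketch. It assumes a hypothetical fabric with four spine uplinks and uses SHA-256 as a stand-in hash. Real devices use their own vendor-specific hash functions and field selections, so the names and values below are illustrative only.

```python
import hashlib

def pick_path(flow, paths):
    """Pick one of several equal-cost paths by hashing the flow's 5-tuple.

    SHA-256 is only a stand-in here; real devices use vendor-specific hashes.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(paths)
    return paths[index]

def candidate_paths(fanout_per_stage):
    """Count possible paths when each load-balancing stage multiplies the choices."""
    total = 1
    for fanout in fanout_per_stage:
        total *= fanout
    return total

# Hypothetical uplinks and flows: (src_ip, dst_ip, protocol, src_port, dst_port).
spines = ["spine-1", "spine-2", "spine-3", "spine-4"]
flow_a = ("10.0.1.10", "10.0.2.20", "tcp", 49152, 443)
flow_b = ("10.0.1.10", "10.0.2.20", "tcp", 49153, 443)

# The same flow always maps to the same path, but a flow that differs only in
# its source port may take a different one.
print(pick_path(flow_a, spines))
print(pick_path(flow_b, spines))

# One stage with 4 choices versus three stages with 4, 8, and 4 choices.
print(candidate_paths([4]))
print(candidate_paths([4, 8, 4]))
```

Because the chosen path depends on every hashed field, an administrator cannot tell which path a service takes without knowing the exact hash inputs at each hop. The number of candidates also multiplies with each load-balancing stage.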

Slow Fault Rectification

  • The uninterrupted functionality of networks is critical in protecting corporate information and bolstering business achievements. Consequently, even minimal disruptions in network performance can lead to substantial financial detriment. Therefore, it is essential to address network issues promptly to prevent any adverse impact on enterprise operations.

  • In the context of financial services, there has been a shift from a centralized structure to a more intricate distributed deployment. This complexity often leads to delayed reactions from O&M staff, with the average time taken to identify faults extending to approximately 76 minutes, thereby posing challenges in maintaining uninterrupted service.

To address these challenges, the intelligent network O&M solution enables precise network O&M and continuously improves the manageability and service quality of the network.

What Are the Benefits of Intelligent Network O&M?

Thorough Assessment of Network Health for Real-Time Insight into Services and Connectivity

The network health assessment solution evaluates and monitors the network as a whole, giving O&M staff a comprehensive view of the network so that they can improve O&M efficiency and the user experience. Specifically, the solution comprises three components:

  • Abstraction and modeling at the network scale: Establish a multi-tiered appraisal framework and regularly gather data on the condition of network components, protocols, linkages, and services.

  • Holistic and intelligent assessment of network health: Build a network object model for every layer, and collect comprehensive network information covering historical records, performance metrics, configuration settings of network devices, and inter-host service traffic. Use analytical algorithms to evaluate the health of each layer, actively monitor anomalies in key metrics such as device status and network load, and predict potential issues in network throughput and traffic.

  • GUI-based real-time visualization: Present the collected information in real time using visual formats such as graphs, and regularly produce network health evaluation reports, supporting routine network health checks and preemptive problem-solving.
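The multi-layer evaluation and anomaly monitoring described above can be sketched roughly as follows. This is a minimal illustration, not the product's actual algorithm. The layer names, the integer weights, and the z-score rule are all assumptions chosen for the example.

```python
from statistics import mean, pstdev

# Hypothetical relative weights per layer (assumption for illustration).
LAYER_WEIGHTS = {"device": 3, "link": 2, "protocol": 2, "service": 3}

def network_health(layer_scores, weights=LAYER_WEIGHTS):
    """Combine per-layer scores (0-100) into one weighted network health score."""
    total_weight = sum(weights[layer] for layer in layer_scores)
    weighted_sum = sum(score * weights[layer] for layer, score in layer_scores.items())
    return weighted_sum / total_weight

def is_anomalous(history, latest, z_threshold=3.0):
    """Flag a sample that deviates from its history by more than z_threshold sigmas."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold

scores = {"device": 95, "link": 90, "protocol": 100, "service": 80}
print(network_health(scores))  # 90.5

cpu_history = [40, 42, 41, 39, 43, 40, 41, 42]  # hypothetical CPU usage samples (%)
print(is_anomalous(cpu_history, 90))  # True
print(is_anomalous(cpu_history, 42))  # False
```

A simple statistical rule like this only catches large deviations from recent history. The source describes more advanced analytical algorithms and trend prediction, which this sketch does not attempt to reproduce.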

Rapid Fault Detection for Smart Diagnostic Solutions

Swiftly identifying and correcting errors is challenging due to the network's vastness, intricate setups, and frequent configuration alterations. The reliance on the expertise of O&M staff for problem-solving makes the process exceedingly time-intensive.

The intelligent network O&M solution can quickly locate the root causes of faults.

  • Using in-situ Flow Information Telemetry (iFIT), end-to-end (E2E) analysis is performed on services with poor quality of experience (QoE) without interrupting them. Through iFIT, an intelligent network controller collects measurement data hop by hop across each network segment, enabling precise identification of the point of failure.

  • A strategic sequence of actionable diagnostic procedures is designed by leveraging various fault diagnostics, a wealth of incident data from operational networks, and the seasoned insights of Huawei's operations and maintenance experts. This approach is aimed at reducing the time required for detecting and isolating faults. For instance, service connectivity problems are addressed with automated workflows, which facilitate one-click, automated resolution processes.

  • Device-related ERSPAN streams and telemetry data are gathered and subjected to advanced big data analysis, allowing for the proactive identification of possible issues within network fabrics. By integrating artificial intelligence algorithms, this analysis is further refined to discern if problems are localized to specific network segments or applications. Such comprehensive diagnostics are instrumental in aiding users to fulfill their objective of proactive and intelligent operations and maintenance, aimed at early fault detection and swift, pinpoint fault isolation within minutes.

  • AI algorithms can be employed to deduce unidentified issues, thereby assisting operations and maintenance staff in thoroughly investigating the underlying reasons for these faults.
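The hop-by-hop localization idea behind iFIT can be sketched as below. This is a simplified illustration, not the iFIT protocol itself. It assumes the controller has already collected, for one measurement period, how many packets of a flow each node on the path observed. The node names, counts, and loss threshold are hypothetical.

```python
def locate_loss(hop_counters, loss_threshold=0.001):
    """Return path segments whose packet loss ratio exceeds loss_threshold.

    hop_counters is an ordered list of (node_name, packets_seen) along the
    service path for one measurement period.
    """
    faulty_segments = []
    for (up_node, up_count), (down_node, down_count) in zip(hop_counters, hop_counters[1:]):
        if up_count == 0:
            continue  # nothing reached the upstream node, so no ratio can be computed
        loss_ratio = (up_count - down_count) / up_count
        if loss_ratio > loss_threshold:
            faulty_segments.append((up_node, down_node, loss_ratio))
    return faulty_segments

# Hypothetical path PE1 -> P1 -> P2 -> PE2 with packets lost between P1 and P2.
path_counters = [("PE1", 100000), ("P1", 100000), ("P2", 98000), ("PE2", 98000)]
for up, down, ratio in locate_loss(path_counters):
    print(f"loss on segment {up} -> {down}: {ratio:.2%}")
```

Comparing counters between adjacent nodes narrows the fault to a single segment, which is why per-hop data shortens localization compared with end-to-end probing alone.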

Automated Error Correction for Uninterrupted Service Performance

  • Leveraging a rule engine, intelligence engine, and knowledge graph, the advanced network O&M system excels at executing big data mining and analytical tasks, leading to the swift detection and pinpointing of system faults. It is designed to integrate with a controller, allowing immediate fault correction or isolation through a single-click operation. The system has the capability to assess and report the effects of specific faults on both network functionality and service delivery. Prior to implementing any corrective or isolation measures, it can also forecast and illustrate the potential repercussions on the network and services, which significantly streamlines the decision-making process.

  • For poor-QoE services, the intelligent network O&M system can automatically adjust service paths to avoid the links or nodes that cause poor quality, implementing automatic service SLA recovery.
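Automatic path adjustment can be pictured as recomputing a path while excluding the links found to cause poor quality. The sketch below uses a plain Dijkstra search over a hypothetical four-node topology. It is an assumption made for illustration and does not represent how the controller actually computes or installs paths.

```python
import heapq

def shortest_path(graph, src, dst, avoid_links=frozenset()):
    """Dijkstra shortest path that skips any link listed in avoid_links.

    graph maps node -> {neighbor: cost}; avoid_links holds frozenset({a, b}) pairs.
    Returns (total_cost, [nodes]) or None if dst is unreachable.
    """
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if frozenset((node, neighbor)) in avoid_links:
                continue  # link flagged as degraded
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None

# Hypothetical topology with symmetric link costs.
topology = {
    "A": {"B": 1, "C": 2},
    "B": {"A": 1, "D": 1},
    "C": {"A": 2, "D": 2},
    "D": {"B": 1, "C": 2},
}

print(shortest_path(topology, "A", "D"))  # (2, ['A', 'B', 'D'])

degraded = {frozenset(("B", "D"))}  # link reported as causing poor QoE
print(shortest_path(topology, "A", "D", degraded))  # (4, ['A', 'C', 'D'])
```

The rerouted path costs more but avoids the degraded link, which matches the goal of restoring the service SLA rather than minimizing hop count.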

Architectures of Intelligent Network O&M

Intelligent network O&M is commonly deployed on data center networks and carrier networks. The following sections describe its architecture in these two scenarios.

Intelligent O&M Architecture of Data Center Networks

The intelligent O&M architecture of data center networks consists of three layers: the network layer, the control layer, and the analysis layer, as illustrated in the figure below.

  • The network layer comprises the network devices within the data center, which send mirrored packets, performance metrics, and logs to the analysis layer for processing and display. It serves as the data source for the analysis layer.

  • The control layer is founded on the iMaster NCE-Fabric platform, which is a sophisticated network management and control system. It establishes a connection with iMaster NCE-FabricInsight, a smart network analysis system, to facilitate the automatic set-up of network services in operations and maintenance processes. Furthermore, the iMaster NCE-Fabric system has the capability to link with cloud platforms for cloud-network integration or with Virtual Machine Management (VMM) servers in the context of network virtualization, enabling it to coordinate logical networks as well as to automate the translation and application of configurations for network devices. In addition, the iMaster NCE-Fabric platform offers an array of other functionalities such as tracing network paths, confirming network connectivity, intelligently identifying faults, pinpointing and remedying issues, as well as isolating faults when necessary.

  • The analysis layer is built on iMaster NCE-FabricInsight. Based on a big data platform, iMaster NCE-FabricInsight receives data from network devices through telemetry and uses intelligent algorithms to analyze and display the reported data. It can proactively detect network faults and locate their root causes in minutes, implementing intelligent O&M.

Intelligent O&M architecture of data center networks

Intelligent O&M Architecture of Carrier Networks

The intelligent O&M architecture of carrier networks is logically divided into three layers: data collection layer, data analysis layer, and data presentation layer, as shown in the following figure.

  • Data collection layer: iMaster NCE-IP, an intelligent network management and control system, delivers subscription messages to network devices, which then send running data, configuration data, and resource data to the data analysis layer in real time through network management protocols.

  • Data analysis layer: analyzes network data (device, connection, protocol, and security data) and service data in the following aspects:

  • 1. Analyzes network data to evaluate network health, sends the health evaluation results and potential network risks to the data presentation layer, and enables preventive network O&M.

  • 2. Correlates network data with service data, uses AI-powered big data analytics together with expert experience to perform intelligent fault diagnosis, generates fault diagnosis reports, and sends these reports to the data presentation layer.

  • Data presentation layer: iMaster NCE-IP presents the received data analysis results in various forms, such as dashboards, charts, reports, and relationship diagrams. Additionally, northbound application programming interfaces (APIs) are provided for third-party systems to obtain data analysis results.

Intelligent O&M architecture of carrier networks

Application Scenarios of Intelligent Network O&M

Data Center Networks

  • Service change:

  • 1. Simulation and verification are performed to check whether the services to be delivered meet user expectations.

  • 2. Network changes are visualized in real time, and snapshot data and configuration differences before and after a change are identified, helping analyze the network status.

  • 3. Configuration rollback is supported to rapidly restore services when faults occur, minimizing service interruption loss.

  • 4. The automatic scaling of server capabilities guarantees rapid deployment of services.

  • Routine preventive maintenance inspection (PMI):

  • 1. Network wellness is assessed across various aspects such as equipment, connectivity, protocols, overlays, and services.

  • 2. Through telemetry, network-wide data, including configurations, entries, logs, and key performance metrics, is collected to identify problems and risks at every layer of the network in real time.

  • 3. The operational condition of devices, the capacity of the network, the state of individual components, and the service interplay are scrutinized.

  • 4. Intelligent detection of network performance anomalies ensures that potential threats are identified before they impact services.

  • All these help O&M personnel comprehensively learn the network status and overall user experience.

  • Emergency fault rectification:

  • 1. Data pertaining to various network malfunctions is gathered and examined to extract meaningful correlations of faults from extensive datasets, allowing for swift and precise fault analysis and localization.

  • 2. The capability for one-click fault resolution guarantees uninterrupted and steady operation of services.

  • Fault root cause location:

  • 1. The inference engine of the knowledge graph is employed to scrutinize the accumulated network data, promptly pinpointing the fundamental reasons for errors.

  • 2. Unknown faults are learned and inferred, helping O&M personnel deeply explore the root causes of these faults.
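As one concrete illustration of the service change scenario, the comparison of configuration snapshots before and after a change can be sketched with Python's standard difflib module. The configuration text below is hypothetical and not tied to any particular device syntax, and the helper name is an assumption for this example.

```python
import difflib

def config_diff(before, after):
    """Return only the added ('+') and removed ('-') lines between two snapshots."""
    diff_lines = list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile="before", tofile="after", lineterm="", n=0))
    # The first two lines are the '---'/'+++' file headers; '@@' lines are hunk markers.
    return [line for line in diff_lines[2:] if line.startswith(("+", "-"))]

# Hypothetical configuration snapshots taken before and after a change.
snapshot_before = """interface 10GE1/0/1
 description uplink-to-spine-1
 mtu 1500"""

snapshot_after = """interface 10GE1/0/1
 description uplink-to-spine-1
 mtu 9216"""

for line in config_diff(snapshot_before, snapshot_after):
    print(line)
```

Keeping the pre-change snapshot is also what makes rollback straightforward: if a fault appears after the change, the earlier snapshot is the configuration to restore.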

Carrier Networks

Intelligent O&M techniques have been implemented within the intelligent cloud-network framework designed for service providers.

For example, a carrier has applied intelligent O&M to their network. The use cases consist of:

  • Display of VPN service quality across multiple dimensions and advance warnings for subpar Quality of Experience (QoE).

  • 1. The smart network operations and maintenance system delivers analyses of irregular VPN Key Performance Indicators (KPIs), VPN traffic anomalies, and access point KPI examinations.

  • 2. VPN clients have the ability to monitor Service Level Agreement (SLA) metrics including packet loss ratio and latency for VPN services in a real-time manner, as well as establish thresholds for unsatisfactory Quality of Experience. In the event that these thresholds are surpassed, an alert is triggered proactively.

  • Automated and precise identification of faults along with accurate dispatch of trouble tickets for those faults.

  • 1. Accurately visualizing the routes of VPN services and pinpointing issues at each hop assists operations and maintenance staff in swiftly correcting problems.

  • 2. Supporting 7-day historical playback of VPN services facilitates post-event fault analysis.
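The threshold-based poor-QoE warning described in the first use case can be sketched as follows. This is a minimal illustration under assumed names: the SlaThresholds class, the sample fields, and the threshold values are hypothetical and do not reflect a real product interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SlaThresholds:
    """Hypothetical per-service poor-QoE thresholds set by the VPN customer."""
    max_loss_ratio: float   # for example, 0.001 means 0.1% packet loss
    max_latency_ms: float

def check_sla(service, sample, thresholds):
    """Return alert messages for every SLA metric that exceeds its threshold."""
    alerts = []
    if sample["loss_ratio"] > thresholds.max_loss_ratio:
        alerts.append(
            f"{service}: packet loss ratio {sample['loss_ratio']:.2%} "
            f"exceeds {thresholds.max_loss_ratio:.2%}")
    if sample["latency_ms"] > thresholds.max_latency_ms:
        alerts.append(
            f"{service}: latency {sample['latency_ms']} ms "
            f"exceeds {thresholds.max_latency_ms} ms")
    return alerts

thresholds = SlaThresholds(max_loss_ratio=0.001, max_latency_ms=50.0)
healthy = {"loss_ratio": 0.0002, "latency_ms": 12.5}
degraded = {"loss_ratio": 0.005, "latency_ms": 80.0}

print(check_sla("vpn-branch-1", healthy, thresholds))
for alert in check_sla("vpn-branch-1", degraded, thresholds):
    print(alert)
```

Evaluating each real-time sample against customer-defined thresholds is what turns passive SLA monitoring into a proactive warning.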
