Designing and operating ERP capabilities in the cloud has become the epicenter of digital transformation for many of our SAP customers. SAP ERP runs mission-critical processes that are foundational for enterprises. These applications must run without incidents, and when issues do occur, they must be resolved quickly.
An SAP Operations Manager needs to ensure that the SAP systems run at maximum efficiency and remain available to meet business needs. As a result, they are under continuous pressure to meet performance metrics and service levels such as Mean Time to Resolution (MTTR). To detect and prevent failures, they need a monitoring tool that analyzes operations and performance and raises alerts about possible errors.
The challenge only grows because the monitoring tools they have historically relied on aren't suited to today's dynamic, 24/7 business world, and landscapes and systems keep becoming more complex. What is needed is a solution that provides observability by gathering metrics, logs, and traces. An observable system is easy to understand, and instrumentation is a big part of building one. It lets you create an early warning system that catches issues before they turn into incidents.
Based on these principles, we designed and implemented the SAP NetWeaver provider as part of Azure Monitor for SAP Solutions (AMS). It provides deep visibility into your SAP applications and databases, helping you identify and resolve issues that affect them and maintain optimal performance. The solution is simple and intuitive. It provides a unified source of multi-dimensional insights into application and infrastructure health and performance, and it supports proactive anomaly detection, fast incident response and resolution, and smart alerting that keeps alerts effective. It collects data with a pull mechanism that requires zero customization in your core ERP system, uses serverless computing, and stores the information in Azure Log Analytics. The data is visualized with Azure workbooks, which makes the entire product Azure-native. With this, you can:
Monitor - Monitor the landscape by acquiring and processing system-level information.
Analyze - Analyze and understand the collected information by adding meaning and insights.
Act - Act on the system based on that understanding.
Learn - Improve performance (quality, consistency, and accuracy) based on your data and system details.
Over the next few minutes, we will walk through a few scenarios that show how the above can be applied to maximize observability using instrumentation techniques.
Proactively identifying locked accounts before users report them – Using the system log (SM21) data collected by AMS, you can set up a dedicated alert that captures each account lock event, its reason, and the terminal it came from. This lets you take the necessary action proactively and inform the end user.
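The filtering logic behind such an alert can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: SM21 entries are modeled as simple records with a timestamp, user, terminal, and message text, and a lock is recognized by the standalone word "locked". The `SysLogEntry` type, its fields, and the match rule are hypothetical; a real AMS alert would be a Log Analytics query over whatever table and columns the provider actually writes.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SysLogEntry:
    # Hypothetical shape of an SM21 system log record; the real AMS schema differs.
    timestamp: datetime
    user: str
    terminal: str
    message: str


# Hypothetical match rule: the standalone word "locked" (so "unlocked" does not match).
# Real SM21 message IDs and texts vary by release and logon language.
LOCK_PATTERN = re.compile(r"\blocked\b", re.IGNORECASE)


def account_lock_alerts(entries):
    """Build one alert per log entry that looks like an account lock event."""
    alerts = []
    for entry in entries:
        if LOCK_PATTERN.search(entry.message):
            alerts.append({
                "time": entry.timestamp.isoformat(),
                "user": entry.user,
                "terminal": entry.terminal,  # where the lock originated
                "reason": entry.message,     # the log text explains why
            })
    return alerts


sample = [
    SysLogEntry(datetime(2021, 5, 1, 9, 0), "JDOE", "TERM01",
                "User JDOE locked after too many failed logon attempts"),
    SysLogEntry(datetime(2021, 5, 1, 9, 5), "JDOE", "TERM01",
                "User JDOE unlocked by administrator"),
]
for alert in account_lock_alerts(sample):
    print(alert)
```

The word-boundary match is the important design choice here: a naive substring test for "locked" would also fire on unlock events and add noise to the alert.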
White-box monitoring – The OS can already tell us how much memory HANA has allocated, the HANA provider can tell us in more detail what that memory is used for, and the NetWeaver provider can report CPU, I/O, and memory usage. Correlating this information helps you optimize system performance and avoid I/O throttling.
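The correlation step can be illustrated with a short Python sketch. It assumes that metrics from different providers have already been exported as timestamp-keyed samples; the series names and values below are hypothetical. The sketch aligns the series on shared timestamps and computes a Pearson correlation coefficient, a generic statistic chosen here for illustration rather than something AMS is documented to compute itself.

```python
from math import sqrt


def align(*series):
    """Keep only timestamps present in every series and return aligned value lists."""
    common = sorted(set.intersection(*(set(s) for s in series)))
    return [[s[t] for t in common] for s in series]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for a constant series; treat as none
    return cov / (sx * sy)


# Hypothetical samples keyed by minute: OS disk throughput (MB/s) and
# NetWeaver dialog response time (ms). Minute 4 exists in only one series.
os_disk_mb_s = {0: 120, 1: 180, 2: 260, 3: 340}
nw_dialog_ms = {0: 450, 1: 610, 2: 900, 3: 1300, 4: 700}

disk, resp = align(os_disk_mb_s, nw_dialog_ms)
print(round(pearson(disk, resp), 3))
```

A strong positive correlation between disk throughput and dialog response time is a hint, not proof, that storage is the bottleneck; it tells you where to look next.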
Correlating short dumps with releases and object locks – As a Basis administrator or developer, you can correlate ST22 short dumps with recent changes to see whether a dump was caused by a change that was just moved (STMS), and whether the dump caused any batch job (SM37) to fail, resulting in failed updates (SM13) or lock entries (SM12).
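One way to frame this correlation is as a time-window join, sketched below in Python. The `Event` type, the two-hour lookback, and the 15-minute lookahead are hypothetical choices for illustration; AMS stores this data in its own Log Analytics tables, and the windows would need tuning per landscape.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    # Hypothetical normalized event; the real AMS data for ST22, STMS and SM37 differs.
    time: datetime
    detail: str


def correlate_dump(dump, imports, job_failures,
                   lookback=timedelta(hours=2), lookahead=timedelta(minutes=15)):
    """Find transports imported shortly before a dump and jobs that failed shortly after it."""
    suspects = [e.detail for e in imports
                if dump.time - lookback <= e.time <= dump.time]
    failed = [e.detail for e in job_failures
              if dump.time <= e.time <= dump.time + lookahead]
    return {"dump": dump.detail, "suspect_transports": suspects, "failed_jobs": failed}


# Hypothetical sample data.
dump = Event(datetime(2021, 5, 1, 10, 0), "ST22: ZBILLING_REPORT")
imports = [Event(datetime(2021, 5, 1, 9, 30), "DEVK900123"),
           Event(datetime(2021, 5, 1, 7, 0), "DEVK900100")]
failures = [Event(datetime(2021, 5, 1, 10, 5), "ZBILLING_RUN"),
            Event(datetime(2021, 5, 1, 11, 0), "ZNIGHTLY_SYNC")]
print(correlate_dump(dump, imports, failures))
```

The same window test extends naturally to failed updates (SM13) and lock entries (SM12). Overlap in time only marks candidates for investigation; it does not prove that the transport or the dump was the cause.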
Finding a window to schedule a new batch job and detecting execution anomalies – Using the graph and time-series capabilities of Azure Log Analytics and workbooks, you can find the most suitable time to schedule a new job by looking at previous job execution windows and CPU/DB performance. You can also use native Log Analytics capabilities to find patterns in job execution data that do not conform to expected behavior (runtime, delay).
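Both ideas can be sketched in Python under simple assumptions: hourly average CPU utilization is available as a 24-element list, and a job's past runtimes are a list of minutes. The window search is a plain sliding-window minimum that wraps past midnight. For anomalies, the sketch swaps in the modified z-score of Iglewicz and Hoaglin (median and median absolute deviation), a robust stand-in for Log Analytics' native series functions such as `series_decompose_anomalies`; the input values and the 3.5 threshold are illustrative assumptions.

```python
from statistics import median


def best_start_hour(hourly_cpu, duration_hours):
    """Start hour (0-23) whose window has the lowest mean CPU, wrapping past midnight."""
    if not 1 <= duration_hours <= 24:
        raise ValueError("duration_hours must be between 1 and 24")
    best_hour, best_load = 0, float("inf")
    for start in range(24):
        window = [hourly_cpu[(start + i) % 24] for i in range(duration_hours)]
        load = sum(window) / duration_hours
        if load < best_load:  # strict '<' keeps the earliest hour on ties
            best_hour, best_load = start, load
    return best_hour, best_load


def runtime_anomalies(runtimes, threshold=3.5):
    """Indices of runs whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(runtimes)
    mad = median([abs(r - med) for r in runtimes])
    if mad == 0:
        return []  # most runs equal the median, so there is no spread to score against
    return [i for i, r in enumerate(runtimes)
            if 0.6745 * abs(r - med) / mad > threshold]


# Hypothetical average CPU % per hour of day and past runtimes (minutes) of one job.
hourly_cpu = [50] * 24
hourly_cpu[1:4] = [10, 15, 12]
print(best_start_hour(hourly_cpu, 3))
print(runtime_anomalies([30, 32, 31, 29, 30, 33, 31, 95]))
```

The median-based score is used instead of a plain mean/standard-deviation z-score because a single extreme run inflates the standard deviation enough to hide itself in small samples.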
Managing queues to ensure transaction reliability – You can monitor outbound (SMQ1), inbound (SMQ2), and RFC (SM59) queues to provide a better experience for business users, especially during special events when the queues process more data than usual.
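A simple queue health rule might combine an absolute depth limit with a check for steady growth, which can indicate a stuck queue before it hits the limit. The Python sketch below assumes time-ordered depth samples per queue; the sample shape, the queue names, the 500-entry limit, and the three-interval growth window are all hypothetical.

```python
def queues_needing_attention(samples, max_depth=500, growth_window=3):
    """
    samples: {queue_name: [depth_t0, depth_t1, ...]} in time order (hypothetical shape).
    Flags a queue if its latest depth exceeds max_depth, or if its depth strictly
    increased across the last `growth_window` intervals (a possible stuck queue).
    """
    flagged = {}
    for name, depths in samples.items():
        reasons = []
        if depths and depths[-1] > max_depth:
            reasons.append("depth")
        tail = depths[-(growth_window + 1):]
        if len(tail) == growth_window + 1 and all(b > a for a, b in zip(tail, tail[1:])):
            reasons.append("growing")
        if reasons:
            flagged[name] = reasons
    return flagged


# Hypothetical depth samples taken at regular intervals.
samples = {
    "SMQ1:OUTQ_A": [10, 12, 9, 11],
    "SMQ2:INQ_B": [100, 150, 220, 310],
    "SMQ1:OUTQ_C": [400, 600, 550, 700],
}
print(queues_needing_attention(samples))
```

The growth check matters most during special events: a queue that keeps climbing is worth attention even while it is still below any fixed limit.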
Minimizing alert fatigue – The solution provides a rich set of predefined templates to help customers get started, along with an easy way to extend and customize them as needed. Alerts are designed around symptoms rather than causes. For example, we report “SAP system performance is suboptimal”: this is the symptom, while the cause is determined from the metrics and logs, which might highlight high CPU, database overload, or all dialog (DIA) work processes being in use.
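The symptom-over-cause idea can be sketched as a single alert that carries its likely causes as evidence. In this Python sketch, the metric names, the CPU and database thresholds, and the alert dictionary format are hypothetical; AMS's predefined templates are not documented here and may work differently.

```python
def performance_symptom_alert(metrics, cpu_limit=90.0, db_limit=85.0):
    """
    Raise one symptom-level alert and attach the likely causes as evidence.
    metrics: hypothetical dict with 'cpu_pct', 'db_load_pct', 'dia_busy', 'dia_total'.
    """
    causes = []
    if metrics["cpu_pct"] > cpu_limit:
        causes.append(f"high CPU ({metrics['cpu_pct']:.0f}%)")
    if metrics["db_load_pct"] > db_limit:
        causes.append(f"database overload ({metrics['db_load_pct']:.0f}%)")
    if metrics["dia_busy"] >= metrics["dia_total"]:
        causes.append("all DIA work processes in use")
    if not causes:
        return None  # no symptom, so no alert
    return {"alert": "SAP system performance is suboptimal", "causes": causes}


# Hypothetical snapshot: CPU is saturated and every dialog work process is busy.
snapshot = {"cpu_pct": 95, "db_load_pct": 40, "dia_busy": 20, "dia_total": 20}
print(performance_symptom_alert(snapshot))
```

When several causes fire at once, the operator still receives one alert instead of three, which is the point of designing alerts around symptoms.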
Beyond these, many similar scenarios can be implemented with the core capabilities of AMS, making troubleshooting and gathering evidence for root cause analysis easier.
As the product and technology evolve, customers have a unique opportunity to use Azure-native capabilities to extend and innovate on AMS. For example, they could use PowerApps to automate account unlocks, or dynamically scale systems based on I/O throttling.