This blog describes a failure that, over the last 12 months, has affected some customers running DBMS High Availability solutions on Azure that require an Azure Load Balancer. These customers may have observed ABAP short dumps with the error message 10054. We have since identified the problem and described a solution in the new SAP Note 3083711 - Azure - ST22 shows DBSQL_SQL_ERROR.
1. How do you Diagnose the Problem?
To determine whether you have encountered this problem, check the conditions below. All of them must be true for this note to apply:
The DBMS solution must use a Virtual IP Address and the Azure Standard Load Balancer (such as SQL Server AlwaysOn or HANA HSR architectures)
The DBMS solution must use Operating System clustering. Examples include Windows Cluster for SQL Server AlwaysOn or FCI, or Pacemaker for HANA
The Standard Load Balancer must have explicit TCP port rules configured and NOT HA Ports
The ABAP dump DBSQL_SQL_ERROR (or similar) will contain a network level error
The error message in the dump may contain “connection was forcibly closed”, “TCP reset” or 10054
The ABAP dump will be triggered on a failure of a Secondary Service Connection. An example of a Secondary Connection is illustrated below. The “>>>>>” indicates the line of code that triggered the DBSQL_SQL_ERROR
The screenshot shows an example of a Secondary Service Connection. Note: the actual name of the Secondary Connection will differ between ABAP programs. The ABAP syntax for a Secondary Service Connection is “connection (<connection name or variable>)”.
The SQL operation will typically be on a very high concurrency table such as NRIV or VARINUM
The problem will not occur on the Basic Load Balancer, as the Basic Load Balancer runs different logic. Nevertheless, as general guidance we advise customers that are using the Basic Load Balancer to move to the Standard Load Balancer because of its significant latency improvements, independent of the issue described in 3083711 - Azure - ST22 shows DBSQL_SQL_ERROR.
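The checklist above can be expressed as a small script, which may help when many dumps have to be screened. This is only a minimal sketch: the function names, the parameters and the sample strings are hypothetical and do not come from the SAP Note, and the load balancer and cluster facts have to be supplied by hand.

```python
import re

# Error markers taken from the dump description in this post.
# Matching them with regular expressions is an assumption of this sketch.
NETWORK_ERROR_PATTERNS = [
    re.compile(r"\b10054\b"),
    re.compile(r"connection was forcibly closed", re.IGNORECASE),
    re.compile(r"tcp reset", re.IGNORECASE),
]

# Matches the "connection (<connection name or variable>)" addition.
SECONDARY_CONNECTION = re.compile(r"\bconnection\s*\(\s*[^\s()]+\s*\)", re.IGNORECASE)


def dump_has_network_error(dump_text: str) -> bool:
    """True if the dump text contains one of the network level error markers."""
    return any(p.search(dump_text) for p in NETWORK_ERROR_PATTERNS)


def uses_secondary_connection(abap_line: str) -> bool:
    """True if the failing ABAP statement uses a Secondary Service Connection."""
    return SECONDARY_CONNECTION.search(abap_line) is not None


def note_may_apply(dump_text: str, abap_line: str, lb_sku: str,
                   lb_uses_ha_ports: bool, uses_os_cluster: bool) -> bool:
    """All conditions from the checklist must hold at the same time."""
    return (
        lb_sku.strip().lower() == "standard"
        and not lb_uses_ha_ports
        and uses_os_cluster
        and dump_has_network_error(dump_text)
        and uses_secondary_connection(abap_line)
    )


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    dump = "DBSQL_SQL_ERROR: An existing connection was forcibly closed (10054)"
    line = "SELECT SINGLE * FROM nriv CONNECTION (lv_con) WHERE object = 'X'."
    print(note_may_apply(dump, line, "Standard", False, True))
```

A positive result only means the dump fits the pattern described here. The conditions should still be confirmed against the SAP Note itself.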
2. What Causes this Problem?
The problem is caused by a regression in how Azure networking handles TCP reset injection. The problem will only occur with High Availability solutions that use the Azure Standard Load Balancer to present a Virtual IP Address created by either Windows Cluster or Pacemaker.
3. How to Resolve this Problem?
The problem can be quickly and easily resolved by following the procedure below.
Arrange a short downtime for the SAP application
Stop the SAP application servers
Shut down the DBMS using the High Availability solution (Windows Cluster or Linux Pacemaker)
Delete port rules from the Azure Standard Load Balancer
Configure “HA Ports”
Save configuration on the Azure Standard Load Balancer
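The load balancer part of the procedure above (deleting the port rules and configuring HA Ports) can also be scripted. The sketch below only builds Azure CLI command lines and does not run them. It assumes that an HA Ports rule is expressed as a load balancing rule with protocol All and frontend and backend port 0. The function name, the default rule name and all resource names are hypothetical placeholders. Any other settings carried by the deleted rules would need to be added to the new rule as well.

```python
from typing import List


def ha_ports_migration_commands(resource_group: str, lb_name: str,
                                old_rule_names: List[str],
                                frontend_ip_name: str,
                                backend_pool_name: str,
                                probe_name: str,
                                ha_rule_name: str = "ha-ports-rule") -> List[List[str]]:
    """Build (but do not execute) az commands that replace port rules with HA Ports."""
    base = ["az", "network", "lb", "rule"]
    scope = ["--resource-group", resource_group, "--lb-name", lb_name]

    # One delete command per existing explicit port rule.
    commands = [base + ["delete"] + scope + ["--name", name]
                for name in old_rule_names]

    # A single rule with protocol All and port 0 on both sides means HA Ports.
    commands.append(base + ["create"] + scope + [
        "--name", ha_rule_name,
        "--protocol", "All",
        "--frontend-port", "0",
        "--backend-port", "0",
        "--frontend-ip-name", frontend_ip_name,
        "--backend-pool-name", backend_pool_name,
        "--probe-name", probe_name,
    ])
    return commands


if __name__ == "__main__":
    # Placeholder names; replace them with the values of your own load balancer.
    for cmd in ha_ports_migration_commands(
            "my-rg", "my-lb", ["rule-1433", "rule-5022"],
            "my-frontend", "my-backend-pool", "my-probe"):
        print(" ".join(cmd))
```

Printing the commands first makes it possible to review them before anything is changed during the downtime window.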