JimmyYang
Advisor

Overview


 

 

This blog is part of a series of troubleshooting blogs that each tell the story of how an issue got resolved. I will include the entire troubleshooting process to give you a fully transparent account of what went on. I hope you find these interesting. Please leave feedback in the comments if you like the format or see things I can improve on :smile:

 

Let's get started!

 

 

Problem Description


 

 

Registering the secondary site for System Replication fails with the error "remoteHost does not match with any host of the source site".

 

 

Environment Details


 

 

This incident occurred on Revision 73

 

 

Symptoms


 

 

Running the following command:

 

hdbnsutil -sr_register --name=SITEB --remoteHost=<hostname primary> --remoteInstance=<inst> --mode=<sync mode>

 
This gives the following error:

 

adding site ..., checking for inactive nameserver ..., nameserver <hostname_secondary>:3<inst>01
not responding., collecting information ..., Error while registering new
secondary site: remoteHost does not match with any host of the source site.
please ensure that all hosts of source and target site can resolve all
hostnames of both sites correctly., See primary master nameserver tracefile for
more information at <hostname_primary>, failed. trace file nameserver_<hostname_secondary>00000.000.trc
may contain more error details.]

 

 

SAP HANA Studio showed a similar error as well.



 

 

Troubleshooting


 

The error message indicates that the secondary system could not be reached when performing sr_register.

 

First, when dealing with System Replication, it is always good to double-check that all the prerequisites have been completed. Refer to the Administration Guide for this (http://help.sap.com/hana/SAP_HANA_Administration_Guide_en.pdf).

 

 

Let’s make sure the network connectivity is fine between the primary master nodes and the secondary master nodes.

 

 

Are the servers able to ping each other?

 

From the O/S, type “ping <hostname>”. Perform this from the primary to secondary and secondary to primary.

 

 

In this customer’s case, ping was successful.

 

 

What about firewalls? Could the ports be blocked?

 

 

From the O/S, type “telnet <hostname> <port>”. Perform this from the primary to the secondary and from the secondary to the primary.
The port to use is the SQL port, in this case 3<instance number>15.

 

 

In this customer’s case, telnet was successful.
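If telnet is not installed on the hosts, the same port check can be sketched in shell. This is a minimal sketch, not what was run in the incident: the function names `sql_port` and `port_reachable` and the host name `hana-secondary` are made up for illustration, and the check relies on bash's built-in /dev/tcp redirection and the coreutils `timeout` command being available.

```shell
# Build the SQL port 3<instance number>15 from a two-digit instance number.
# The instance number is handled as a string, so "08" is not read as octal.
sql_port() {
    inst="$1"
    case "$inst" in
        [0-9][0-9]) printf '3%s15\n' "$inst" ;;
        *) echo "instance number must be two digits, got: $inst" >&2
           return 1 ;;
    esac
}

# Try to open a TCP connection to host:port and give up after 5 seconds.
# Returns 0 if the connection could be opened, non-zero otherwise.
port_reachable() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example (hypothetical host name). Run it on the primary against the
# secondary, then on the secondary against the primary:
# port_reachable hana-secondary "$(sql_port 00)" && echo open || echo blocked
```

A successful connection only shows that the port is open for small packets. It says nothing about whether larger payloads get through.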

 

 

 

Comparing the host files between the primary and secondary sites


The customer noticed an error in the /etc/hosts file: the short name was not filled in correctly. They fixed this, but the problem still occurred :sad:
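A comparison like this can be made less error-prone with a small helper that prints how a hosts file maps a given name. This is only a sketch under assumptions: `hosts_entry` is a hypothetical helper, and the host names in the example are placeholders.

```shell
# Print every line of a hosts file that lists the given name, with comments
# stripped and whitespace normalized to single spaces, so that the output
# from two sites can be compared directly.
hosts_entry() {
    name="$1"
    file="$2"
    awk -v name="$name" '
        { sub(/#.*/, "") }              # drop comments
        NF >= 2 {
            for (i = 2; i <= NF; i++) {
                if ($i == name) {
                    $1 = $1             # rebuild the line with single spaces
                    print
                    break
                }
            }
        }
    ' "$file"
}

# Example (hypothetical names). Run on every host of both sites and compare:
# hosts_entry hana-primary /etc/hosts
# hosts_entry hana-secondary /etc/hosts
```

An empty result for a short name means that this hosts file does not list it, which is the kind of gap the customer found.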

 

 

 

 

Network Communication and System Replication


There is an SAP Note on this topic: 1876398 - Network configuration for System Replication in HANA SP6.


 

 

 

The symptoms in the note match what we are experiencing: “When using SAP HANA Support Package 6, a System Replication secondary system may not be able to establish a connection to the primary system.”

 

The note explains: “Therefore, the listener hears only on the local network. System Replication also uses the infrastructure for internal network communication for exchanging data between the name servers of the primary and the secondary system.  Therefore, the name servers of the two systems can no longer communicate with each other in this case.”

 

 

It is worth noting that this is a very common cause of the issue, but in the customer's case it was not the problem.

 

 

 

 

Strace

 

 

We performed an strace. Here is some of the output:

 

 

sendto(13,"?\0\50\50\50\60\0\0\0\1\2\6,\0\0\0dr_gethdbversion"..., 86, 0, NULL, 0) = 86


recvfrom(13,0x7f1bd94549264, 8337, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)

poll([{fd=13,events=POLLIN|POLLPRI}], 1, -1) = 1 ([{fd=13, revents=POLLIN}])

recvfrom(13,"\323\346\v\333\333\333\333\333\333\333\333F\1I\nhdbversionI\0221.00."..., 8337, 0, NULL, NULL) = 52


recvfrom(13,0x7fff22c5277f, 1, 2, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)

recvfrom(13,0x7fff22c528bf, 1, 2, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)

gettid()                                = 35760

sendto(13,"?\0\32\33\45\33\0\0\0\1\2\0033\0\0\0dr_registerdatac"..., 413, 0,NULL, 0) = 413

recvfrom(13,0x7f1bd9745564, 8337, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)

poll([{fd=13,events=POLLIN|POLLPRI}], 1, -1

 

 

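The small dr_gethdbversion request (86 bytes) gets its 52-byte reply, but after the 413-byte dr_registerdatac... request is sent, the trace ends in a poll() with an infinite timeout (-1) that never returns: no reply arrives. This looks like some sort of packet loss.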

 

 

 

Involving the Networking Team

 

 

 

We involved the customer’s networking team and found that the MTU size was set to 9000. They set the MTU size to 1500, ran the register step again, and it worked! The registration completed!

 

 

The networking team did not explain exactly what was going on, but we suspect they ran a tcpdump to see if there was packet loss.
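One way to test whether a given MTU works end to end is to send pings that are not allowed to be fragmented. The sketch below is an assumption about how such a check could look, not what the networking team actually ran. The function names and the host name `hana-secondary` are hypothetical, and the `-M do` option is the one provided by the Linux iputils ping.

```shell
# Largest ICMP payload that fits into a single IPv4 packet of the given MTU:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send three pings of that size with "don't fragment" set (-M do).
# If a hop on the path has a smaller MTU, these pings fail or time out.
probe_mtu() {
    host="$1"
    mtu="$2"
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$mtu")" "$host"
}

# Example (hypothetical host name):
# ip link show | grep mtu          # MTU configured on the local interfaces
# probe_mtu hana-secondary 9000    # do jumbo frames get through?
# probe_mtu hana-secondary 1500    # does the standard Ethernet MTU work?
```

If the probe with 1500 succeeds while the probe with 9000 fails, some device between the two sites does not pass jumbo frames. That would fit the symptoms here, where small requests were answered and a larger exchange stalled.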

 

 

** The MTU size may need to be changed back later for performance optimization, see SAP Note 2081065 - Troubleshooting SAP HANA Network **

 

 

 

Disclaimer


 

This blog detailed the steps that SAP and the customer worked through to resolve a problem. This may not be the exact resolution for every incident that has the same symptoms. If you are encountering the same issue, you can review these steps with your HANA Administrator and Networking team.
