Oracle customers use Oracle Active Dataguard to create and maintain one or many standby databases that protect their mission critical primary databases from disaster. Typically, in such deployments, the primary databases and standby databases are in geographically separate locations connected via a WAN (Wide Area Network). Log Transport Services, transfers Large volumes of redo logs from the primary location to the standby, using Sql*Net.
We have to ensure that all the components from the source to target are setup correctly to ensure that the data transfer can be done with the best throughput possible. If sufficient network bandwidth is not available with reasonable latencies, then we will start seeing the log transfer and apply,lagging on the standby site (Which is oracle speak for, your primary and standby database is now out of sync from a data perspective).
One key point to keep in mind is that, lower the network round trip time (aka latency), higher your data transfer throughput. Higher the network round trip time (aka latency), lower your data transfer throughput. So it is very important to maintain low round trip times on your Wide area network.
To understand network data transfer throughput, It is important to understand the the concepts of Tcp Window Size and Bandwidth Delay Product (aka BDP).
Tcp Window size is the amount of bytes that can be transmitted without receiving an acknowledgement from the other side. Once Tcp Window size amount of bytes are send, the sender stops sending any more bytes and waits for an acknowledgement from the receiver.
Bandwidth delay product is calculated as the product of the network bandwidth and network round trip time. bdp=network bandwidth*round trip time. This is the amount of data that left the sender before the first acknowledgement was received by the sender. If the senders output bandwidth is stable, and the bandwidth is fully used, then the BDP calculates the number of packets in transit. If we set the Tcp Window size equal to the bandwidth delay product, then in theory we should be able to fully utilize the available bandwidth.
Setup the network
We have to start by setting up the networking components to support the desired/stated bandwidth. So if you have a WAN that is a 10GigE network, all the NIC’s (Network interface cards), ports, switches in the configuration should be configured to support 10GigE full Duplex settings. After setup, run the configuration display utilities and ensure that all these component levels the transfer speeds are set to be 10 GigE. Customers often run into trouble when Auto Negotiation causes some NIC’s to set the transfer speeds to 1GigE because of configuration mismatches.
Use tools like Iperf to test the transfer speeds that your network is capable of achieving.
One important aspect to keep in mind is that it is probable that the WAN is shared by other traffic (e-mail, data replication, san replication). This has two important implications that we should consider.
- If there is a lot of bandwidth consumption by some of this miscellaneous traffic, round trip times could be degraded periodically on the network.
- We should be careful in our calculations that we do not completely consume the entire bandwidth for redo transport. (This could impact other processes)
- So it is important to figure out (Working with the network admins) what the bandwidth entitlements are for redo transport and base our calculations on those numbers.
Caclulate our BDP
Use the following formula to calculate our Bandwidth delay product (BDP)
(bandwidth/8)*rount trip time in seconds.
The network bandwidth is expressed in bits per second, so we divide by 8 to convert to bytes.
Round Trip Time is usually in milli seconds, so we divide by 1000 to convert to seconds.
So for example, if we have a 10Gbit network bandwidth and a 40ms round trip time
BDP=(10000000000/8)*(40/1000) = 50,000,000 bytes.
Setup Sql*Net Parameters
The current recommendations for Dataguard Redo transport are as follows.
Set the SDU size to 65535
- We can set SDU on a per connection basis using the SDU parameter in the local naming configuration file (TNSNAMES.ORA) and the listener configuration file (LISTENER.ORA)
- We can set the SDU for all Oracle Net connections with the profile parameter DEFAULT_SDU_SIZE in the SQLNET.ORA file.
Set TCP.NODELAY to YES
To preempt delays in buffer flushing in the TCP protocol stack, disable the TCP Nagle algorithm by setting TCP.NODELAY to YES in the SQLNET.ORA file on both the primary and standby systems.
Setup RECV_BUF_SIZE and SEND_BUF_SIZE
The current recommendation is to set the SEND_BUF_SIZE and RECV_BUF_SIZE parameters (Which are the send and receive socket buffer sizes for SQL*Net) to 3 Times the BDP.
As per the above example we would set them to 50,000,000*3 = 150,000,000
Setup Operating system Kernel Parameters
If you are using the Linux operating system make sure that the values for the following kernel parameters are setup to be higher than the values set for RECV_BUF_SIZE and SEND_BUF_SIZE.
net.core.rmem_max
net.core.wmem_max
Once we have configured the network, operating system and the sql*net, and we have redo transfer, we can perform further network monitoring to see how the network bandwidth is being utilized, and make appropriate adjustments.
Links to helpful Documents
How to calcluate Tcp throughput for long distance links (blog)
Oracle Net Performance Tuning (Mos)
Setting Send and Receive Buffer Sizes (Mos)
Tuning Sql*Net peformance (Oracle Docs)