[Load Balancing] Analysis of a Failure Caused by TCP Multiplexing --- A Very Practical Case
Tags: tcp, TCP port reuse, time_out, load balancing
Category: Internet operations technology
Client
  |------ syn1 ----->|
  |                  |
  |                  |
  |<----- ack -------|
  |------ rst ------>|
  |                  |
  |                  |
  |                  |
  |------ syn2 ----->|
  |                  |
[Service-Type http: the 3-second delay does not occur]
Client
  |------ syn1 ----->|
  |<--- syn_ack -----|
  |------ ack ------>|
  |------ GET ------>|
  |                  |
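As the two TCP flows above show,
Further analysis of the problem:
With a hypothesis in hand, we ran the following experiments:
2 Confirm that the TCP connection is in the time_wait state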
14:28:31.460914 IP 10.192.144.177.35000 > 10.192.144.172.http: S 2256667492:2256667492(0) win 65535
14:28:31.461764 IP 10.192.144.172.http > 10.192.144.177.35000: . ack 34418700 win 1448 <nop,nop,timestamp 102151303 71539512>
14:28:31.462849 IP 10.192.144.177.35000 > 10.192.144.172.http: R 34418700:34418700(0) win 0
14:29:56.534495 IP 10.192.144.177.35000 > 10.192.144.172.http: S 1557240063:1557240063(0) win 5840 <mss 1460,sackOK,timestamp 71563072 0,nop,wscale 6>
14:29:56.535583 IP 10.192.144.172.http > 10.192.144.177.35000: S 2358231624:2358231624(0) ack 1557240064 win 5792 <mss 1460,sackOK,timestamp 102236030 71563072,nop,wscale 2>
14:29:56.537262 IP 10.192.144.177.35000 > 10.192.144.172.http: . ack 2358231625 win 92 <nop,nop,timestamp 71563072 102236030>
14:30:02.904552 IP 10.192.144.177.35000 > 10.192.144.172.http: P 1557240064:1557240089(25) ack 2358231625 win 92 <nop,nop,timestamp 71564664 102236030>
14:30:02.905083 IP 10.192.144.172.http > 10.192.144.177.35000: . ack 1557240089 win 1448 <nop,nop,timestamp 102242382 71564664>
14:30:03.208222 IP 10.192.144.177.35000 > 10.192.144.172.http: P 1557240089:1557240090(1) ack 2358231625 win 92 <nop,nop,timestamp 71564740 102242382>
14:30:03.209047 IP 10.192.144.172.http > 10.192.144.177.35000: . ack 1557240090 win 1448 <nop,nop,timestamp 102242685 71564740>
14:30:03.211047 IP 10.192.144.172.http > 10.192.144.177.35000: P 2358231625:2358232110(485) ack 1557240090 win 1448 <nop,nop,timestamp 102242688 71564740>
14:30:03.211375 IP 10.192.144.172.http > 10.192.144.177.35000: F 2358232110:2358232110(0) ack 1557240090 win 1448 <nop,nop,timestamp 102242688 71564740>
14:30:03.211843 IP 10.192.144.177.35000 > 10.192.144.172.http: . ack 2358232110 win 108 <nop,nop,timestamp 71564740 102242688>
14:30:03.212353 IP 10.192.144.177.35000 > 10.192.144.172.http: F 1557240090:1557240090(0) ack 2358232111 win 108 <nop,nop,timestamp 71564740 102242688>
14:30:03.212465 IP 10.192.144.172.http > 10.192.144.177.35000: . ack 1557240091 win 1448 <nop,nop,timestamp 102242689 71564740>
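As the output above shows, the TCP connection is established normally. Once the data transfer completes, the server side actively closes the connection, which then enters the time_wait (2MSL) state.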
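To make the "who ends up in time_wait" reasoning concrete, here is a minimal Python sketch. It assumes the capture has already been reduced to (source address, tcpdump flag) pairs; the function name `active_closer` and the `trace` list are illustrative, not part of any tool used in the experiment. The side that sends the first FIN is the active closer, and that is the side that passes through time_wait.

```python
def active_closer(packets):
    """Return the source address of the first packet carrying a FIN flag.

    packets: iterable of (source_address, flags) tuples, where flags is the
    tcpdump flag field ("S", "F", "P", "R" or "." for a bare ACK).
    The sender of the first FIN performs the active close and is the side
    that later sits in time_wait.
    """
    for src, flags in packets:
        if "F" in flags:
            return src
    return None


# Flag sequence transcribed from the capture above, starting at 14:29:56.
CLIENT = "10.192.144.177"
SERVER = "10.192.144.172"
trace = [
    (CLIENT, "S"), (SERVER, "S"), (CLIENT, "."),   # three-way handshake
    (CLIENT, "P"), (SERVER, "."),                  # request, part 1
    (CLIENT, "P"), (SERVER, "."),                  # request, part 2
    (SERVER, "P"),                                 # response
    (SERVER, "F"), (CLIENT, "."),                  # server closes first
    (CLIENT, "F"), (SERVER, "."),                  # client closes
]

print(active_closer(trace))
```

On this trace the function returns the server address, which is consistent with the server-side socket being the one left in time_wait.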
3 Use the sendip tool to connect again on the same socket pair
14:30:20.923480 IP 10.192.144.172.http > 10.192.144.177.35000: S 2358297648:2358297648(0) ack 2380432428 win 5840
14:30:20.924389 IP 10.192.144.177.35000 > 10.192.144.172.http: R 2380432428:2380432428(0) win 0
This led to experiment 3 below.
19:25:47.477340 IP 10.192.144.177.32785 > 10.192.144.172.http: S 2054603651:2054603651(0) win 5840 <mss 1460,sackOK,timestamp 180993208 0,nop,wscale 6>
19:25:47.477378 IP 10.192.144.172.http > 10.192.144.177.32785: S 4041734616:4041734616(0) ack 2054603652 win 5792 <mss 1460,sackOK,timestamp 340710 180993208,nop,wscale 3>
19:25:47.477831 IP 10.192.144.177.32785 > 10.192.144.172.http: . ack 4041734617 win 92 <nop,nop,timestamp 180993209 340710>
19:25:47.480333 IP 10.192.144.177.32785 > 10.192.144.172.http: P 2054603652:2054603829(177) ack 4041734617 win 92 <nop,nop,timestamp 180993209 340710>
19:25:47.480360 IP 10.192.144.172.http > 10.192.144.177.32785: . ack 2054603829 win 858 <nop,nop,timestamp 340713 180993209>
19:25:47.482095 IP 10.192.144.172.http > 10.192.144.177.32785: P 4041734617:4041734861(244) ack 2054603829 win 858 <nop,nop,timestamp 340715 180993209>
19:25:47.482508 IP 10.192.144.172.http > 10.192.144.177.32785: F 4041734861:4041734861(0) ack 2054603829 win 858 <nop,nop,timestamp 340715 180993209>
19:25:47.483351 IP 10.192.144.177.32785 > 10.192.144.172.http: . ack 4041734861 win 108 <nop,nop,timestamp 180993210 340715>
19:25:47.487574 IP 10.192.144.177.32785 > 10.192.144.172.http: F 2054603829:2054603829(0) ack 4041734862 win 108 <nop,nop,timestamp 180993211 340715>
19:25:47.487621 IP 10.192.144.172.http > 10.192.144.177.32785: . ack 2054603830 win 858 <nop,nop,timestamp 340720 180993211>
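2 Confirm that the server side is in the time_wait state
3 While the connection is in the time_wait state, connect again using the same socket pair; here we force the use of ...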
$ sudo sendip -d '' -p ipv4 -p tcp -is 10.192.144.177 -ts 32785 -td 80 10.192.144.172 -tfr 0 -tfs 1 -tn 2054603820
19:26:20.606046 IP 10.192.144.177.32785 > 10.192.144.172.http: S 2054603820:2054603820(0) win 65535
19:26:20.606086 IP 10.192.144.172.http > 10.192.144.177.32785: . ack 2054603830 win 858 <nop,nop,timestamp 373795 180993211>
19:26:20.606765 IP 10.192.144.177.32785 > 10.192.144.172.http: R 2054603830:2054603830(0) win 0
This result shows that, for the newly arrived SYN, the server replies with a bare ACK (rather than SYN+ACK), indicating that the segment is not acceptable (RFC 793).
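The "not acceptable" verdict can be reproduced with the segment acceptability test from RFC 793. The Python sketch below is a simplified model, not the server's actual code: the function names are illustrative, the receive window of 858 << 3 assumes the advertised window 858 is scaled by the wscale 3 seen in the handshake, and real TCP stacks apply further checks (timestamps, for example) that are left out here.

```python
MOD = 2 ** 32  # TCP sequence numbers wrap modulo 2^32


def in_window(seq, rcv_nxt, rcv_wnd):
    """True if rcv_nxt <= seq < rcv_nxt + rcv_wnd in modular arithmetic."""
    return (seq - rcv_nxt) % MOD < rcv_wnd


def segment_acceptable(seg_seq, seg_len, rcv_nxt, rcv_wnd):
    """RFC 793 acceptability test; seg_len counts SYN and FIN as one each."""
    if seg_len == 0:
        if rcv_wnd == 0:
            return seg_seq == rcv_nxt
        return in_window(seg_seq, rcv_nxt, rcv_wnd)
    if rcv_wnd == 0:
        return False
    last = (seg_seq + seg_len - 1) % MOD
    return (in_window(seg_seq, rcv_nxt, rcv_wnd)
            or in_window(last, rcv_nxt, rcv_wnd))


def reply_to_segment(seg_seq, seg_len, rcv_nxt, rcv_wnd):
    """An unacceptable segment is answered with a bare ACK carrying RCV.NXT."""
    if not segment_acceptable(seg_seq, seg_len, rcv_nxt, rcv_wnd):
        return ("ACK", rcv_nxt)
    return ("PROCESS", None)


# Values from experiment 3: the forged SYN uses seq 2054603820, while the
# server in time_wait expects 2054603830 next (its last ACK to the client).
RCV_NXT = 2054603830
RCV_WND = 858 << 3  # assumption: advertised window 858 scaled by wscale 3

print(reply_to_segment(2054603820, 1, RCV_NXT, RCV_WND))
```

With these inputs the model answers with an ACK of 2054603830, the same acknowledgment number the server sent in the capture at 19:26:20.606086.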
Based on this result, we refined our hypothesis about the problem,
To verify the hypothesis above, we ran a further test on the LB to see how it performs SNAT.
12:31:53.967580 IP 192.168.11.6.57516 > 110.44.179.124.443: Flags [S], seq 3348951429, win 5840, options [mss 1460,sackOK,TS val 1478305370 ecr 0,nop,wscale 6], length 0
Data captured on the real web server:
12:31:53.970874 IP 172.24.2.21.17708 > 172.24.2.242.snpp: S 3348951429:3348951429(0) win 5840 <mss 1400,sackOK,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,wscale 6>
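One way to compare the client-side and server-side captures is to pull the source endpoint and the SYN sequence number out of each line. The Python sketch below does this for the two tcpdump output styles that appear in this post; the helper names `syn_seq` and `source_endpoint` are illustrative, and the regular expressions are only assumed to cover these two styles.

```python
import re


def syn_seq(line):
    """Extract the SYN sequence number from a tcpdump line, or None."""
    m = re.search(r"seq (\d+)", line)            # style: "Flags [S], seq N"
    if m:
        return int(m.group(1))
    m = re.search(r"\bS (\d+):\d+\(0\)", line)   # style: "S N:N(0)"
    if m:
        return int(m.group(1))
    return None


def source_endpoint(line):
    """Extract the 'address.port' that appears before the '>' sign."""
    m = re.search(r"IP (\S+) >", line)
    return m.group(1) if m else None


client_line = ("12:31:53.967580 IP 192.168.11.6.57516 > 110.44.179.124.443: "
               "Flags [S], seq 3348951429, win 5840, options [mss 1460,sackOK,"
               "TS val 1478305370 ecr 0,nop,wscale 6], length 0")
server_line = ("12:31:53.970874 IP 172.24.2.21.17708 > 172.24.2.242.snpp: "
               "S 3348951429:3348951429(0) win 5840 <mss 1400,sackOK,nop,nop,"
               "nop,nop,nop,nop,nop,nop,nop,nop,nop,wscale 6>")

print(source_endpoint(client_line), syn_seq(client_line))
print(source_endpoint(server_line), syn_seq(server_line))
```

For these two lines the source address and port differ, while the sequence number of the SYN is the same on both sides of the LB.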
At this point the problem is essentially clear. The causes are summarized as follows:
  |------ syn1 ----->|
  |                  |
  |<---
  |------
  |
