枝叶飞扬

German tank problem

(2016-03-19 23:18:21)
Tags: point estimation, maximum likelihood estimation, minimum-varianceunbi, serial numbers, German tank problem

Category: Mathematics

[Image: Crusader tank and German tank]


During World War 2, the Western Allies used a simple formula to estimate the rate at which German tanks were being produced, based on the serial numbers obtained from captured and destroyed tanks.  
  
The formula is the following:

\hat{N} = m + \frac{m}{n} - 1

where   \hat{N} is the estimated total number of objects (e.g. German tanks)
            m  is the highest sampled serial number
            n  is the sample size (e.g. the number of captured/destroyed German tanks)

  
For example, let’s say 10 tanks were captured/destroyed, and the following serial numbers were obtained: 
117, 232, 122, 173, 167, 12, 168, 204, 4, 229 
  
  
The highest serial number obtained was 232, therefore m = 232.

\hat{N} = 232 + \frac{232}{10} - 1 = 254.2

It so happens that these 10 serial numbers were drawn randomly from a (rounded) uniform distribution with minimum 1, and maximum 255.
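The calculation for this example can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `estimate_total` is made up for illustration and does not come from the original article.

```python
def estimate_total(serials):
    """Estimate the largest serial number in use as m + m/n - 1."""
    if not serials:
        raise ValueError("need at least one serial number")
    m = max(serials)  # highest sampled serial number
    n = len(serials)  # sample size
    return m + m / n - 1


# The 10 serial numbers from the example above.
sample = [117, 232, 122, 173, 167, 12, 168, 204, 4, 229]
estimate = estimate_total(sample)
print(round(estimate, 1))  # prints 254.2
```

The estimate of 254.2 is close to the true maximum of 255 used to generate the sample.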



How well the formula performed 
  
The formula performed much better than the conventional intelligence estimates.  Conventional intelligence estimates were based on counting the number of tanks on the battlefield and on secretly observing factories. 
  
Through conventional intelligence it was estimated that the Germans were producing around 1400 tanks per month, from June 1940 to September 1942.  The statistical estimate was 246 tanks per month.  After the war, German production figures showed the actual number to be 245. 
  
Estimates for some specific months:

Month         Statistical Estimate   Intelligence Estimate   German Records
June 1940     169                    1000                    122
June 1941     244                    1550                    271
August 1942   327                    1550                    342

The statistical estimates were useful because they gave the Allies an idea of whether or not an attack on the western front could succeed.



Other applications 
  
This formula can be applied to other things with serial numbers.  For example, with serial numbers gathered through online discussions, the same formula was used to estimate the number of iPhones sold.  It was estimated that Apple had sold around 9.1 million phones to the end of September 2008.

from: http://www.statisticalconsultants.co.nz/blog/the-german-tank-problem.html

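Minimum-variance unbiased estimator

For point estimation (estimating a single value \hat{N} for the population total), the minimum-variance unbiased estimator (MVUE, or UMVU estimator) is given by: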

\hat{N} = m\left(1 + k^{-1}\right) - 1

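where m is the largest serial number observed (the sample maximum) and k is the number of tanks observed (the sample size, written n in the section above). Note that once a serial number has been observed, it is no longer in the pool and will not be observed again.

Its variance is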

 \operatorname{var}(\hat{N}) = \frac{1}{k}\frac{(N-k)(N+1)}{(k+2)} \approx \frac{N^2}{k^2} \text{ for small samples } k \ll N

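The standard deviation is therefore approximately N/k, the (population) average size of a gap between samples; compare this with m/k above.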
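A small simulation can make the variance formula more concrete. The sketch below is illustrative only: the helper `simulate_estimates`, the choice N = 255 and k = 10 (borrowed from the example earlier), the number of trials and the fixed seed are all assumptions made here, not part of the original text.

```python
import random
import statistics


def simulate_estimates(N, k, trials, seed=0):
    """Draw k distinct serials from 1..N many times and apply the estimator."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    estimates = []
    for _ in range(trials):
        # Sampling without replacement: a serial number is never seen twice.
        sample = rng.sample(range(1, N + 1), k)
        m = max(sample)
        estimates.append(m * (1 + 1 / k) - 1)
    return estimates


N, k = 255, 10
estimates = simulate_estimates(N, k, trials=20000)

# Standard deviation implied by the variance formula above.
exact_sd = ((N - k) * (N + 1) / (k * (k + 2))) ** 0.5

print("mean of estimates:", statistics.mean(estimates))
print("simulated standard deviation:", statistics.pstdev(estimates))
print("standard deviation from the formula:", exact_sd)
print("rough approximation N/k:", N / k)
```

The mean of the simulated estimates should sit near N, and the simulated standard deviation should sit near the value from the formula. The approximation N/k is coarser here because k = 10 is not very small relative to N = 255.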


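Intuition

The formula can be understood intuitively as the sample maximum plus the average gap between observations in the sample. The sample maximum is used as the initial estimate because it is the maximum likelihood estimate, and the gap is added to compensate for the negative bias of the sample maximum as an estimate of the population maximum. The estimator can therefore be written as: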

\hat{N} = m + \frac{m - k}{k}= m + mk^{-1} - 1 = m\left(1 + k^{-1}\right) - 1

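This can be visualized by imagining that the observations are evenly spaced throughout the range, with additional observations just outside the range, at 0 and N + 1. If one starts with an initial gap between 0 and the lowest observation in the sample (the sample minimum), the average gap between consecutive observations is (m - k)/k; the -k appears because the observations themselves are not counted when measuring the gaps between them.

This idea is formalized and generalized in the method of maximum spacing estimation.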
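The gap interpretation can be checked directly on the ten serial numbers from the example earlier in this post. The following is a rough sketch that assumes all serial numbers in the sample are distinct; the helper name `average_gap` is invented here for illustration.

```python
def average_gap(serials):
    """Average number of unobserved serials before each observation.

    The first gap runs from 0 up to the sample minimum. The observed
    serials themselves are not counted as part of any gap.
    """
    previous = 0
    gaps = []
    for s in sorted(serials):
        gaps.append(s - previous - 1)
        previous = s
    return sum(gaps) / len(gaps)


sample = [117, 232, 122, 173, 167, 12, 168, 204, 4, 229]
m, k = max(sample), len(sample)

gap = average_gap(sample)
gap_estimate = m + gap                   # sample maximum plus average gap
formula_estimate = m * (1 + 1 / k) - 1   # closed form from the text

print(gap)                             # (m - k) / k
print(gap_estimate, formula_estimate)  # the two estimates agree
```

The gaps always sum to m - k, whatever the individual serial numbers are, which is why only the sample maximum and the sample size appear in the final formula.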


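Derivation

The probability that the sample maximum equals m is \tbinom{m - 1}{k - 1}\big/\tbinom Nk, where \tbinom \cdot\cdot is the binomial coefficient.

The expected value of the sample maximum is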

\begin{align} \mu &= \sum_{m=k}^N m\frac{\tbinom{m - 1}{k - 1}}{\tbinom Nk} = \frac{k(N + 1)}{k + 1} \\ \Rightarrow N &= \mu\left(1 + k^{-1}\right) - 1 \end{align}
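The closed form for the sum is stated without intermediate steps. One way to fill them in, offered here as a sketch rather than as the original author's argument, uses the identity m\tbinom{m - 1}{k - 1} = k\tbinom mk together with the hockey-stick identity \sum_{m=k}^N \tbinom mk = \tbinom{N + 1}{k + 1}:

```latex
\begin{align}
\sum_{m=k}^N m\binom{m - 1}{k - 1}
  &= k\sum_{m=k}^N \binom{m}{k}
   = k\binom{N + 1}{k + 1} \\
\Rightarrow \mu
  &= \frac{k\binom{N + 1}{k + 1}}{\binom{N}{k}}
   = k\cdot\frac{N + 1}{k + 1}
   = \frac{k(N + 1)}{k + 1}
\end{align}
```

The last step uses \tbinom{N + 1}{k + 1} = \frac{N + 1}{k + 1}\tbinom Nk.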

Hence

\begin{align} \mu\left(1 + k^{-1}\right) - 1 &= E\left[m\left(1 + k^{-1}\right) - 1\right] \\ \Rightarrow \hat{N} &= m\left(1 + k^{-1}\right) - 1 \end{align}

so \hat{N} is an unbiased estimator of N.
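The expectation and the unbiasedness claim can also be verified with exact rational arithmetic for a specific case. The sketch below assumes N = 255 and k = 10, matching the earlier example; the function names `max_pmf` and `expected_estimate` are illustrative and not taken from the source. It requires Python 3.8 or later for `math.comb`.

```python
from fractions import Fraction
from math import comb


def max_pmf(N, k):
    """Exact distribution of the sample maximum m for k draws from 1..N."""
    total = comb(N, k)
    return {m: Fraction(comb(m - 1, k - 1), total) for m in range(k, N + 1)}


def expected_estimate(N, k):
    """Exact expected value of m(1 + 1/k) - 1 under that distribution."""
    pmf = max_pmf(N, k)
    return sum(p * (m * Fraction(k + 1, k) - 1) for m, p in pmf.items())


N, k = 255, 10
pmf = max_pmf(N, k)
mu = sum(m * p for m, p in pmf.items())  # expected sample maximum

print(sum(pmf.values()) == 1)               # probabilities sum to one
print(mu == Fraction(k * (N + 1), k + 1))   # matches k(N + 1)/(k + 1)
print(expected_estimate(N, k) == N)         # the estimator is unbiased
```

Using `Fraction` avoids floating-point rounding, so the comparisons are exact equalities rather than approximate ones.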

To show that this is the UMVU estimator:
