The Quirks of Five Nobel Laureate Scientists (Rereading the Same Inscrutable Book Year After Year)
Frank Wilczek
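Frank Wilczek received the 2004 Nobel Prize in Physics for the discovery of asymptotic freedom in quantum chromodynamics. He is a member of the Steering Committee of the World Laureates Association (WLA), a professor of physics at the Massachusetts Institute of Technology, and one of the founders of quantum chromodynamics. For more on his personal story, see "WLA Scientists Say ⑩ | Restless in the lab, he turned to theoretical physics and won a Nobel Prize".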
Chinese version
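One summer many years ago, in a used bookstore I often visit, I came across a book titled "A Million Random Digits with 100,000 Normal Deviates". Drawn in by the title, I could not help leafing through it. The book did not disappoint. Its main content is a 400-page table of random digits, 50 lines to a page and 50 digits to a line. In the neat tables, the ten digits 0 through 9 are arranged without any pattern, page after page. Every year since then, I have gone back to read the book and then returned it to the shelf.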
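As far as I know, the book was published by the Rand Corporation in 1955. Rand generated the random numbers physically, using electronic pulses to simulate a random roulette wheel, and recorded each result. Before the advent of modern computers, this classic was widely used by engineers and scientists. (Some errors in the book were discovered recently, as an article in The Wall Street Journal reported last year.)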
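What is a sequence that looks random? Its simplest feature is that every digit appears with equal probability, meaning the electronic die has not been tampered with. Beyond that, every pair of digits also appears with equal probability. This means that the different digits are generated independently of one another. If we cannot use the earlier digits in a sequence to predict the later ones, then the sequence is completely unpredictable. In that sense, the sequence looks random.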
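Such random sequences have many uses. One is what statisticians call unbiased sampling: to obtain the average properties of a large data set, we only need to sample a small subset of it. The sampling is valid only if it is free of bias, and a table of random numbers can help us achieve that.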
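We live in an era in which big data and the video game industry are booming, and random numbers are more valuable than ever. Many applications constantly need to generate more than a million random numbers and deliver them quickly. Market surveys and shooter games, for example, are giant consumers of random numbers.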
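Today, computers usually produce random numbers by generating them on the fly. More precisely, the method starts from a large random "seed" number and then repeats two operations. The first is the "shuffle": a specific sequence of mathematical operations scrambles the seed. The second is the "deal": a few digits are taken from the scrambled number as the next output. The number just processed is then shuffled and dealt again, and the cycle repeats. Apart from the initial seed and the shuffling procedure, the whole process is self-contained, mechanical, and fixed.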
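The seemingly random output of a deterministic process like this is called pseudo-random numbers. Pseudo-random numbers have an important advantage in scientific and cryptographic applications: by sharing the key and the program, that is, the seed and the shuffle described above, others can reproduce choices that would otherwise be unpredictable.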
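"Pseudo-randomness" is also a beautiful metaphor for how we come to feel that we can choose freely in a world that runs according to definite laws. Just as pseudo-random numbers look truly random when you know nothing about the seed and the program, human behavior looks entirely free when you do not understand the brain's subconscious. I suspect that this is more than a metaphor.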
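"A Million Random Digits with 100,000 Normal Deviates" is technically obsolete. But to me it is the ultimate free verse. This summer, I may deliberately shelve it in the poetry section, or finally buy it, or put it back where it was, as usual. I am happy to feel that the choice is mine, even though (I think) I know better.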
English version
Many summers ago, I discovered a book called "A Million Random Digits with 100,000 Normal Deviates" at a used bookstore I visit. That title being irresistible, I looked inside. It did not disappoint. The main table takes up 400 pages, each with 50 lines of 50 digits. Page after page, in neat columns, it parades digits chosen from 0 to 9, guaranteed to be free of rhyme or reason. Every year since, I’ve admired it again before putting it back on the shelf.
The book, I learned, was published by the Rand Corporation in 1955. Rand generated the numbers physically, by spinning a series of wobbly electronic "roulette wheels" over and over and recording the results. Their classic book was widely used by engineers and scientists for years, in the days before modern computers. (Though errors were recently found, as Michael Phillips reported last year in The Wall Street Journal.)
What does it mean for a series to look random? The simplest property is that each digit appears equally often: the digital die is not loaded. Each run of two digits also appears equally often. This means that each digit conveys no information about the next one. If no method works to use earlier digits to predict the values of later ones, our series has proven completely unpredictable, and it is fair to say that it looks random.
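The two frequency properties above can be checked directly. The following is a minimal sketch, not a full randomness test suite: it uses Python's built-in generator as a stand-in for the Rand table, and the seed, the sequence length, and the helper names `digit_frequencies` and `pair_frequencies` are illustrative assumptions.

```python
import random
from collections import Counter

def digit_frequencies(digits):
    """Fraction of the sequence taken up by each digit 0-9."""
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(10)}

def pair_frequencies(digits):
    """Fraction of adjacent (overlapping) pairs for each of the 100 possible pairs."""
    counts = Counter(zip(digits, digits[1:]))
    total = len(digits) - 1
    return {(a, b): counts[(a, b)] / total for a in range(10) for b in range(10)}

def worst_deviations(digits):
    """Largest gap between observed and ideal frequencies (0.1 per digit, 0.01 per pair)."""
    worst_single = max(abs(f - 0.1) for f in digit_frequencies(digits).values())
    worst_pair = max(abs(f - 0.01) for f in pair_frequencies(digits).values())
    return worst_single, worst_pair

# Stand-in for a random digit table (assumption: seed 1955 is arbitrary).
rng = random.Random(1955)
random_digits = [rng.randrange(10) for _ in range(1_000_000)]

# A deliberately predictable sequence: 0, 1, 2, ..., 9, 0, 1, 2, ...
cycling_digits = [i % 10 for i in range(1_000_000)]

random_single, random_pair = worst_deviations(random_digits)
cycling_single, cycling_pair = worst_deviations(cycling_digits)

print("random digits: ", random_single, random_pair)
print("cycling digits:", cycling_single, cycling_pair)
```

The cycling sequence passes the single-digit check perfectly, since every digit appears exactly one time in ten, yet it fails the pair check badly, because only ten of the hundred possible pairs ever occur. That is why equal digit counts alone do not make a series look random.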
Sequences of such randomly generated numbers are useful for many purposes, such as what statisticians call unbiased sampling. To estimate average properties of a big data set, you sample a small subset. This is legitimate provided that you choose the sample without any bias. A table of random numbers can help you do that.
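As a rough illustration of why the absence of bias matters, the sketch below compares a randomly chosen sample with a conveniently chosen one. The population is invented for the example (normally distributed values with mean 50, stored in sorted order), and the seed and sample size are arbitrary assumptions.

```python
import random
from statistics import mean

rng = random.Random(7)  # arbitrary seed, chosen only for repeatability

# Hypothetical data set: 100,000 values stored in ascending order,
# so an item's position in the list is correlated with its value.
population = sorted(rng.gauss(50, 10) for _ in range(100_000))
true_mean = mean(population)

# Unbiased sample: every item is equally likely to be picked.
unbiased_sample = rng.sample(population, 1_000)

# Biased sample: just take the first 1,000 items, which are the smallest values.
biased_sample = population[:1_000]

print("true mean:      ", true_mean)
print("unbiased sample:", mean(unbiased_sample))
print("biased sample:  ", mean(biased_sample))
```

The random sample lands close to the true average even though it covers only one percent of the data, while the convenient sample misses it by a wide margin.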
In the age of ginormous data sets and intense computer games, random numbers are more valuable than ever. Many applications call for more than a million random digits, delivered rapidly. Market surveys and shoot-'em-up games are both voracious consumers.
Nowadays, when a computer needs random numbers, the usual method is to generate them on the fly. More precisely, it starts with a big "seed" number and then does two things, over and over. It puts the big number through a fixed series of mathematical operations that scramble it (akin to thoroughly shuffling a deck of cards). Then it records a few digits of the result as the next entries in its output (like dealing a hand). Then it does the same things again, starting with the processed number. Rinse, lather, repeat. Aside from the seed number and the shuffling procedure, the whole process is self-contained, mechanical and free of any choices.
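The shuffle-and-deal loop can be sketched in a few lines. The essay does not say which scrambling operations real generators use, so this toy example swaps in a linear congruential step (multiply, add, wrap around) as the "shuffle". The class name `ToyGenerator`, the constants, and the seed are all illustrative assumptions, and the result is not suitable for serious statistical or cryptographic work.

```python
class ToyGenerator:
    """Toy pseudo-random digit source: shuffle the state, then deal one digit."""

    # Illustrative constants for a 64-bit linear congruential step (an assumption,
    # not something the essay specifies).
    MODULUS = 2**64
    MULTIPLIER = 6364136223846793005
    INCREMENT = 1442695040888963407

    def __init__(self, seed):
        # The seed is the only input; everything afterwards is mechanical.
        self.state = seed % self.MODULUS

    def shuffle(self):
        """Scramble the big number with a fixed series of arithmetic operations."""
        self.state = (self.MULTIPLIER * self.state + self.INCREMENT) % self.MODULUS

    def deal(self):
        """Read off one decimal digit from the scrambled number.

        The high bits are used because the low bits of this kind of step are
        poorly mixed. Reducing modulo 10 leaves a tiny bias, ignored here.
        """
        return (self.state >> 32) % 10

    def next_digit(self):
        self.shuffle()
        return self.deal()


generator = ToyGenerator(seed=20040101)  # arbitrary seed
digits = [generator.next_digit() for _ in range(20)]
print(digits)
```

Each call to `next_digit` starts from the number left behind by the previous call, which is the "do the same things again, starting with the processed number" step in the description above.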
When a deterministic procedure like this gives an output that looks random, we say its output is pseudo-random. The pseudo version of randomness has important advantages in scientific and cryptographic applications, because by sharing the key and program (here, the seed and shuffling mechanism) you can allow others to reproduce your (otherwise) unpredictable choices.
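Reproducibility through a shared seed is easy to demonstrate with Python's standard `random` module. This is a small sketch under stated assumptions: the function name `pick_sample`, the seeds, and the population size are made up for the example, and the `random` module is meant for simulation rather than cryptography (Python's documentation points to the `secrets` module for security purposes).

```python
import random

def pick_sample(seed, population_size, sample_size):
    """Choose sample_size distinct item numbers, determined entirely by the seed."""
    rng = random.Random(seed)  # private generator, so no global state is involved
    return rng.sample(range(population_size), sample_size)

# Two people who share the seed and the program make identical "random" choices.
alice = pick_sample(seed=42, population_size=10_000, sample_size=5)
bob = pick_sample(seed=42, population_size=10_000, sample_size=5)

# Someone without the seed has to guess it; a different seed gives its own sequence.
carol = pick_sample(seed=43, population_size=10_000, sample_size=5)

print(alice)
print(bob)
print(carol)
print(alice == bob)  # True
```

A colleague who is given the seed can rerun a simulation or audit a survey sample exactly, while the choices still look arbitrary to anyone who lacks it.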
Pseudo-randomness is a beautiful metaphor for how our own perception of free choice can emerge from underlying determinism, as we navigate through the world. Just as a sequence of pseudo-random numbers appears freely chosen if you don't have access to the program and seed, so can human actions appear to be freely chosen if you lack access to the brain's underlying subconscious processing. I suspect that this is more than a metaphor.
"A Million Random Digits with 100,000 Normal Deviates" is now obsolete. But to me it's cerebral poetry, the ultimate in free verse. This summer I might re-shelve it in the poetry section, or finally buy it, or maybe just put it back as usual. I'm happy to think the choice is mine, even though (I think) I know better.
Author | Frank Wilczek
Translation | 胡风, 梁丁当