Why Shall We Bayesian: Parameter Estimation

Let’s say we have data points plotted on the x-y plane. We want to draw the line or curve that best fits the data points. In other words, we aim to draw a curve that passes as close to all the data points, within their error bars, as possible. Since the shape of the curve is described by mathematical variables called parameters, this process is called parameter estimation.

In astronomy and astrophysics, such a curve is called a model. A model can be motivated by theory or by observation. There is no fundamental difference between fitting a theoretical model and an empirical one.

Here comes the question: how can we draw this best-fit curve? The conventional frequentist approach goes like this: we guess a reasonable set of parameters, compute the difference between the curve and the data points, and then minimise that difference (using some efficient algorithm) by changing the shape of the curve, i.e., by drawing a new curve. Repeating this process is called “fitting”, or regression. The curve is said to be the “best fit” when the difference, as measured by some ad hoc statistic known as the “goodness of fit”, is smaller than or approximately equal to a chosen threshold.

In frequentist statistics, we obtain a point estimate, e.g., the coefficients of a polynomial curve. Note that the resulting point estimate depends on the choice of the goodness-of-fit function.

In contrast, the main idea of Bayesian statistics is extremely simple. We don’t need any goodness of fit. All we have is one and only one formula, the Bayes Rule:

(Probability that the model is true given the observed data) = (Probability of observing the data given that the model is true) × (Probability of the model being true)

[up to a normalisation constant called the evidence, which ensures that the posterior integrates to 1 over all possible parameter values]
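In symbols, writing θ for the model parameters and D for the data, the statement above reads:

$$
P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)},
\qquad
P(D) = \int P(D \mid \theta)\, P(\theta)\, \mathrm{d}\theta .
$$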

The left-hand side is called the posterior, which contains all the information inferred from the data, summarised as a probability distribution over the model parameters. The first term on the right is called the likelihood, and the second term is called the prior. The posterior is completely determined by the likelihood and the prior.

The likelihood is a probability function that connects the model parameters to the observed data. We need to choose the likelihood function ourselves, according to the physical processes under consideration. For instance, are we looking at a Poisson process? Are the error bars distributed as a Gaussian? And so on.
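Here is a minimal sketch of what a likelihood looks like in code, assuming (purely for illustration) a straight-line model y = m·x + c with known Gaussian error bars. The function and variable names are mine, not from any particular library:

```python
import numpy as np

def log_likelihood(theta, x, y, sigma):
    """Gaussian log-likelihood for a straight-line model y = m*x + c."""
    m, c = theta
    model = m * x + c
    # Each data point contributes an independent Gaussian term,
    # centred on the model curve, with width set by its error bar.
    return -0.5 * np.sum(((y - model) / sigma) ** 2
                         + np.log(2.0 * np.pi * sigma ** 2))
```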

The prior is a probability distribution chosen from expert knowledge: one distribution for each parameter under consideration, expressing what we believe about that parameter before seeing the data.
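Continuing the straight-line sketch above (reusing numpy as np from the previous snippet), flat priors over broad, hypothetical bounds are one illustrative choice, not a recommendation:

```python
def log_prior(theta):
    """Flat (uniform) priors on the slope m and intercept c."""
    m, c = theta
    if -10.0 < m < 10.0 and -100.0 < c < 100.0:
        return 0.0       # log of a constant; constants cancel in sampling
    return -np.inf       # zero probability outside the assumed bounds
```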

The point is that once the likelihood and the prior are chosen, the shape of the posterior is fixed. There is no fitting process, no iterative refitting of the curve. Instead, we need to find an efficient way to map out the shape of the posterior. Often the posterior is multi-dimensional, so we need methods cleverer than brute force.
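In the sketch above, “fixed” simply means the posterior is the product of the two functions we already wrote down, up to the evidence, which is a constant:

```python
def log_posterior(theta, x, y, sigma):
    """Unnormalised log-posterior = log-prior + log-likelihood."""
    lp = log_prior(theta)
    if not np.isfinite(lp):
        return -np.inf   # outside the prior support: zero posterior
    return lp + log_likelihood(theta, x, y, sigma)
```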

Fortunately, the so-called Markov chain Monte Carlo (MCMC) comes to the rescue. It is a family of algorithms that efficiently sample the posterior over the parameter space. Once the posterior is mapped out, the error bars can be obtained readily by locating the highest posterior density interval (HPDI): the interval of parameter values containing the highest density of probability. In contrast, frequentists have to rely on simulations to obtain the so-called confidence intervals, which are not even the same thing as error bars.
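Here is a minimal random-walk Metropolis sketch of the idea, continuing the snippets above. Real analyses typically use mature samplers such as emcee; the step size, chain length, and starting point below are illustrative guesses, and the percentile interval at the end is an equal-tailed approximation rather than a true HPDI:

```python
def metropolis(log_post, theta0, args, n_steps=20000, step=0.05, seed=0):
    """Random-walk Metropolis: the simplest Markov chain Monte Carlo."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta, *args)
    chain = np.empty((n_steps, theta.size))
    for i in range(n_steps):
        proposal = theta + step * rng.standard_normal(theta.size)
        lp_new = log_post(proposal, *args)
        # Accept the move with probability min(1, posterior ratio).
        if np.log(rng.random()) < lp_new - lp:
            theta, lp = proposal, lp_new
        chain[i] = theta
    return chain

# Usage sketch, with x, y, sigma being the data arrays from above:
# chain = metropolis(log_posterior, [1.0, 0.0], (x, y, sigma))
# burn = chain[len(chain) // 2:]                # discard warm-up half
# lo, hi = np.percentile(burn[:, 0], [16, 84])  # ~68% interval on m
```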

There are a lot of statistical reasons to argue for Bayesian statistics against frequentist statistics. In my view, the main reason is that Bayesian statistics is simple. It directly answers the fundamental question, “How likely is a scientific theory to be true, given the data?”, without the need to introduce any extra concepts. Simple. Only the Bayes Rule.

I will illustrate the advantages of Bayesian statistics in the upcoming posts of this “Bayesian Astrostatistics” series.


Two Little Stories about Feynman and Einstein

Let me tell you two little stories about scientists.

Once, Feynman was visiting the European Organization for Nuclear Research (CERN).

The staff took Feynman to see the enormous particle collider. Feynman asked, “What are these machines for?”

The staff said, “Professor Feynman, these machines are here to test your theory!”

“How much did they cost?”

“Thirty-seven million US dollars.”

Feynman laughed: “Do you trust my theory that little?”

In 1919, when a student told Einstein that Eddington’s total solar eclipse observations had found the evidence confirming general relativity, Einstein said, “I knew it all along.”

The student pressed on: “But what if the result had disagreed?”

“Then I would have felt sorry for the dear Lord. My theory is correct.”

We may well ask: aren’t scientists supposed to be humble and careful? Why would Feynman and Einstein say such bold things?

Of course, part of the reason is that scientists are human too, with personal preferences of their own. In both stories we can see that Feynman and Einstein had considerable confidence in their own theories.

Their confidence, however, was not mere personal preference. It rested on two very important features:

(1) They knew that their theories could explain all past experimental and observational data;

(2) Their theories could make predictions for future, more precise experiments and observations.

These two points can be said to be the essence of the scientific spirit. In addition, the “beauty” of a theory’s mathematical structure is often another reason scientists develop a certain confidence in it.

Most importantly, when a very large and solid body of experimental or observational results contradicts a theory, scientists do not insist that the theory is correct; instead, they are the first to abandon their own theories. For a scientist, the most important thing is not to be forever right. The most important thing is that we get to see more of nature’s beauty.

If one day we were to find that Feynman or Einstein was wrong, I believe they would not feel discouraged. Instead, they would say, “I don’t understand it, but how interesting.”

As Mendeleev put it: “Very well, then let us continue our work.”

Cover image: 《漫畫費曼》

Cassini’s Final Confession

Reporting: I am now plunging into Saturn’s atmosphere, and I am beginning to feel a slight vibration in my body. Thrusters still operating normally; beginning transmission of Saturn atmospheric data. 3, 2, 1, transmit.

Hello, friends of Earth. I’m sorry, the communication may be a little unstable. This is our first conversation, and it will also be our last. There isn’t much time.

I have completed my exploration of the solar system and of Saturn, and I am now carrying out my final mission: plunging into Saturn’s atmosphere to send atmospheric data that no one has ever seen back to my scientist friends on Earth. Then I will burn up in Saturn’s atmosphere and become one with the planet.

Don’t worry, Saturn is my good friend. Although he is much, much bigger than I am, he has never grown tired of my endless orbits around him. Of my 20-year journey through space, I spent 13 years accompanying Saturn. Or perhaps I should say it was Saturn who kept me company. Before that, I saw Venus, Earth, the Moon, an asteroid, and Jupiter. They were all friendly: when I told them my destination was Saturn, they pointed me in the right direction and even gave me part of their energy so that I could fly farther.

My journey was not a lonely one. A good friend flew the 3.4-billion-kilometre route to Saturn with me: Huygens. His mission was to land on Saturn’s largest moon, Titan, and collect scientific data directly from its surface. He would tell me his data, and I would then use my antenna to relay them back to Earth. Because Saturn is so far from Earth, my friends there have to wait more than an hour to receive the signals I send.

[Image: PIA21889_Enceladus_FigB_Movie]

Now that I am about to leave, I worry a little about Huygens; he will be all alone. No, I know: Titan and Saturn are his good friends, so I need not worry. Reporting: I am de… tecting that my body is becoming unstable; the vibrations… keep intensifying. Now… attempting… to use the thrusters to keep the antenna steadily pointed. Saturn atmospheric data transmission continuing. Temperature rising.

Friends on Earth, please do not worry about me. Although I am about to go up in smoke, I am not afraid. Over these 13 years I have contributed to science: I discovered 7 new moons of Saturn, and even witnessed with my own eyes one of them being born within Saturn’s rings. Have you seen the photos I sent back of Saturn, its moons, and its rings? Those are my masterpieces.

What I fear most is not death. What I fear most is that my death might contaminate my friends, the moons of Saturn, because unlike Huygens, I was not fully sterilised before departure. Although my data have brought much progress to the science of Saturn and the solar system, there are problems I have not been able to solve for humanity. One of them is Saturn’s rotation rate. Determining it requires an accurate measurement of the angle between Saturn’s magnetic axis and its rotation axis. But that angle is extremely small, and I had no way to measure it precisely. Unless I plunged into Saturn.

On April 22, Earth time, I began my final mission. I dived through the gap between Saturn and its rings 21 times, measuring data no one had ever seen. I hear that my scientist friends on Earth have already made progress on Saturn’s rotation rate; it seems the problem will soon be solved. Although I will never know the answer, I will become part of Saturn, turning with him forever.

Reporting: my body is starting to burn. It is time, my friends. Temperature… has risen beyond the limit; science instruments shutting down. I am… doing… my best to control the antenna direction, to send out the last… the very last data. Can… you… hear… me…?

September 15, 2017. Mission elapsed time: 19 years, 11 months, 3 hours, 12 minutes, 46 seconds. Farewell.

Further reading:

《土星的自白》 (The Confession of Saturn) - 余海峯
《卡西尼號:在土星環看見宇宙》 (Cassini: Seeing the Universe from Saturn’s Rings) - 余海峯