The principle of maximum entropy states that, given precisely stated prior data (such as propositions expressing testable information), the probability distribution that best represents the current state of knowledge is the one with the largest entropy. The prior data act as constraints on the probability distribution.
By the second law of thermodynamics (the principle of entropy increase), an isolated system spontaneously evolves toward thermodynamic equilibrium, the state of maximum entropy under certain constraints, so the maximum entropy distribution arises as the most natural distribution. In this blog post, I would like to discuss entropy maximization and several maximum entropy distributions.
We first recall the Gaussian integral:

\[
\begin{align}
\int_{-\infty}^{\infty} e^{-x^2} dx = \sqrt{\pi} \\
\end{align}
\]

I will skip the proof here, since the proof on Wikipedia is not hard to follow.
\[
\begin{align}
\int_{-\infty}^{\infty} x e^{-x^2} dx &= -\frac{1}{2} \int_{-\infty}^{\infty} e^{-x^2} d(-x^2) \\
&= -\frac{1}{2} e^{-x^2} \Big\rvert_{-\infty}^{\infty} \\
&= 0 \\
\end{align}
\]

\[
\begin{align}
\int_{-\infty}^{\infty} x^2 e^{-x^2} dx &= -\frac{1}{2} \int_{-\infty}^{\infty} x \, d(e^{-x^2}) \\
&= -\frac{1}{2} \Big( x e^{-x^2} \Big\rvert_{-\infty}^{\infty} - \int_{-\infty}^{\infty} e^{-x^2} dx \Big) \\
&= -\frac{1}{2} \Big( 0 - \sqrt{\pi} \Big) \\
&= \frac{\sqrt{\pi}}{2} \\
\end{align}
\]

The Shannon entropy of a discrete probability distribution $P$ over states $X$ is

\[
\begin{align}
H(P) &= -\sum_{x \in X}^{} P(x) \log P(x) \\
\end{align}
\]

We maximize $H(P)$ subject to the constraints

\[
\begin{align}
P(x) &\geq 0 \quad \text{for all } x \in X \\
\sum_{x \in X}^{} P(x) &= 1 \\
\sum_{x \in X}^{} P(x) r_i(x) &= \alpha_i \quad \text{for } i = 1, 2, \cdots, m \\
\end{align}
\]

The first two constraints are trivial, given that $P$ is a probability distribution. The third kind of constraint is optional; it encodes constraints on the system as a whole. Note that there could be multiple such constraints if $m > 1$.
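The integrals and the entropy definition above are easy to sanity-check numerically. The sketch below is my own illustration (not part of the original derivation): it verifies the three Gaussian integrals with `scipy.integrate.quad` and computes the Shannon entropy of a small distribution.

```python
import math
from scipy.integrate import quad

# Numerically verify the three Gaussian integrals above.
g0, _ = quad(lambda x: math.exp(-x**2), -math.inf, math.inf)
g1, _ = quad(lambda x: x * math.exp(-x**2), -math.inf, math.inf)
g2, _ = quad(lambda x: x**2 * math.exp(-x**2), -math.inf, math.inf)
print(g0 - math.sqrt(math.pi))      # ~0
print(g1)                            # ~0
print(g2 - math.sqrt(math.pi) / 2)  # ~0

# Shannon entropy of a discrete distribution P (0 log 0 is taken as 0).
def shannon_entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

# The uniform distribution over 4 states has entropy log(4).
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]), math.log(4))
```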
The optimization problem is

\[
\max_{P} H(P) = \max_{P} \Big( -\sum_{x \in X}^{} P(x) \log P(x) \Big)
\]

Let us try to solve this optimization problem. We will use Lagrange multipliers for the constraints.
\[
\begin{align}
L(P, \lambda_0, \lambda_1, \cdots, \lambda_m) &= -\sum_{x \in X}^{} P(x) \log P(x) + \lambda_0 \Big( \sum_{x \in X}^{} P(x) - 1 \Big) + \sum_{i=1}^{m} \lambda_i \Big( \sum_{x \in X}^{} P(x) r_i(x) - \alpha_i \Big) \\
\end{align}
\]

We take the derivative of $L(P, \lambda_0, \lambda_1, \cdots, \lambda_m)$ with respect to $P(x)$, and the derivative should be $0$.
\[
\begin{align}
\frac{\partial}{\partial P(x)} L(P, \lambda_0, \lambda_1, \cdots, \lambda_m) &= -\log P(x) - 1 + \lambda_0 + \sum_{i=1}^{m} \lambda_i r_i(x) \\
&= 0 \\
\end{align}
\]

Solving for $P(x)$,

\[
\begin{align}
P(x) &= e^{\big( \sum_{i=1}^{m} \lambda_i r_i(x) \big) + \lambda_0 - 1} \\
&= \frac{e^{\sum_{i=1}^{m} \lambda_i r_i(x)}}{e^{1 - \lambda_0}} \\
\end{align}
\]

Applying the normalization constraint,

\[
\begin{align}
\sum_{x \in X}^{} P(x) &= \sum_{x \in X}^{} e^{\big( \sum_{i=1}^{m} \lambda_i r_i(x) \big) + \lambda_0 - 1} \\
&= e^{\lambda_0 - 1} \sum_{x \in X}^{} e^{\sum_{i=1}^{m} \lambda_i r_i(x)} \\
&= 1 \\
\end{align}
\]

\[
e^{1 - \lambda_0} = \sum_{x \in X}^{} e^{\sum_{i=1}^{m} \lambda_i r_i(x)}
\]

\[
\begin{align}
P(x) &= \frac{e^{\sum_{i=1}^{m} \lambda_i r_i(x)}}{\sum_{x' \in X}^{} e^{\sum_{i=1}^{m} \lambda_i r_i(x')}} \\
\end{align}
\]

For a continuous distribution, the differential entropy is

\[
\begin{align}
H(P) &= -\int_{X}^{} P(x) \log P(x) dx \\
\end{align}
\]

and the Lagrangian becomes

\[
\begin{align}
L(P, \lambda_0, \lambda_1, \cdots, \lambda_m) &= -\int_{X}^{} P(x) \log P(x) dx + \lambda_0 \Big( \int_{X}^{} P(x) dx - 1 \Big) + \sum_{i=1}^{m} \lambda_i \Big( \int_{X}^{} P(x) r_i(x) dx - \alpha_i \Big) \\
\end{align}
\]

As before, we take the derivative of $L(P, \lambda_0, \lambda_1, \cdots, \lambda_m)$ with respect to $P(x)$, and the derivative should be $0$. Here the derivative is computed with the calculus of variations, which is slightly more involved. Without going through all the details, the resulting maximum entropy density has the same exponential form as in the discrete case.
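The discrete solution $P(x) \propto e^{\sum_i \lambda_i r_i(x)}$ can be realized numerically. As a sketch (the die example and the target mean of $4.5$ are my own choices, not from the post), the snippet below finds the maximum entropy distribution over the faces of a die subject to a single mean constraint $\sum_x P(x)\,x = 4.5$, solving for the multiplier $\lambda_1$ with a root finder:

```python
import numpy as np
from scipy.optimize import brentq

xs = np.arange(1, 7, dtype=float)  # states: faces of a die
alpha = 4.5                        # constrained mean E[x] (hypothetical)

def constraint_gap(lam):
    # Mean of P(x) = exp(lam * x) / Z, minus the target alpha.
    w = np.exp(lam * xs)
    p = w / w.sum()
    return p @ xs - alpha

lam = brentq(constraint_gap, -5.0, 5.0)  # solve for lambda_1
w = np.exp(lam * xs)
p = w / w.sum()
print(p)        # maximum entropy distribution over the faces
print(p @ xs)   # ~4.5, the constraint is satisfied
```

With a mean above the fair value $3.5$, the solver returns a positive $\lambda_1$, tilting probability toward the high faces, exactly the exponential-family form derived above.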
\[
\begin{align}
S(P) &= -k_B \sum_{x \in X}^{} P(x) \log P(x) \\
\end{align}
\]

This entropy is called the Gibbs entropy; it differs from the Shannon entropy only by the Boltzmann constant $k_B$. We can still use entropy maximization to derive the Boltzmann distribution. The constraints for the maximum entropy distribution are

\[
\begin{align}
\sum_{x \in X}^{} P(x) &= 1 \\
\sum_{x \in X}^{} P(x) \varepsilon(x) &= U \\
\end{align}
\]
The $U$ in the constraints is just the internal energy of the system, and $\varepsilon(x)$ is the energy of state $x$ of the system; both are quantified via thermodynamics.
Given all the derivations earlier in this post, it is not hard to see that for the Gibbs entropy the maximum entropy probability distribution is
\[
\begin{align}
P(x) &= e^{\big( \frac{1}{k_B} \sum_{i=1}^{m} \lambda_i r_i(x) \big) + \frac{\lambda_0}{k_B} - 1} \\
&= \frac{e^{\frac{1}{k_B} \sum_{i=1}^{m} \lambda_i r_i(x)}}{e^{1 - \frac{\lambda_0}{k_B}}} \\
\end{align}
\]

\[
e^{1 - \frac{\lambda_0}{k_B}} = \sum_{x \in X}^{} e^{\frac{1}{k_B} \sum_{i=1}^{m} \lambda_i r_i(x)}
\]

With the single energy constraint $r_1(x) = \varepsilon(x)$, this becomes

\[
\begin{align}
P(x) &= e^{\frac{1}{k_B} \lambda_1 \varepsilon(x) + \frac{\lambda_0}{k_B} - 1} \\
&= \frac{e^{\frac{1}{k_B} \lambda_1 \varepsilon(x)}}{e^{1 - \frac{\lambda_0}{k_B}}} \\
\end{align}
\]

\[
e^{1 - \frac{\lambda_0}{k_B}} = \sum_{x \in X}^{} e^{\frac{1}{k_B} \lambda_1 \varepsilon(x)}
\]

Plugging this distribution into the Gibbs entropy,

\[
\begin{align}
S(P) &= -k_B \sum_{x \in X}^{} P(x) \log P(x) \\
&= -k_B \sum_{x \in X}^{} P(x) \log \big( e^{\frac{1}{k_B} \lambda_1 \varepsilon(x) + \frac{\lambda_0}{k_B} - 1} \big) \\
&= -k_B \sum_{x \in X}^{} P(x) \Big( \frac{1}{k_B} \lambda_1 \varepsilon(x) + \frac{\lambda_0}{k_B} - 1 \Big) \\
&= -k_B \bigg( \frac{1}{k_B} \lambda_1 \sum_{x \in X}^{} P(x) \varepsilon(x) + \Big( \frac{\lambda_0}{k_B} - 1 \Big) \sum_{x \in X}^{} P(x) \bigg) \\
&= -k_B \bigg( \frac{1}{k_B} \lambda_1 U + \Big( \frac{\lambda_0}{k_B} - 1 \Big) \bigg) \\
&= -\lambda_1 U - \lambda_0 + k_B \\
\end{align}
\]

From the definition of internal energy and the first law of thermodynamics, we have the following thermodynamic identity.
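As a quick sanity check on the closed form $S(P) = -\lambda_1 U - \lambda_0 + k_B$, the sketch below (natural units $k_B = 1$ and arbitrary energy levels, both my own choices for illustration) builds the distribution for a chosen $\lambda_1$, recovers $\lambda_0$ from the normalization $e^{1 - \lambda_0 / k_B} = Z$, and compares the directly computed Gibbs entropy against the closed form:

```python
import numpy as np

k_B = 1.0                          # natural units, for illustration only
eps = np.array([0.0, 1.0, 3.0])    # hypothetical energy levels
lam1 = -1.0                        # a chosen lambda_1 (= -1/T with T = 1)

w = np.exp(lam1 * eps / k_B)
Z = w.sum()                        # Z = e^{1 - lambda_0 / k_B}
p = w / Z
lam0 = k_B * (1.0 - np.log(Z))     # recover lambda_0 from normalization

U = p @ eps                          # internal energy: sum of P(x) eps(x)
S = -k_B * (p * np.log(p)).sum()     # Gibbs entropy, computed directly
print(S, -lam1 * U - lam0 + k_B)     # the two expressions agree
```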
\[
dU = T dS - p dV
\]

where $U$ is the internal energy, $T$ is the temperature, $S$ is the entropy, $p$ is the pressure, and $V$ is the volume of the system.
At constant volume,

\[
\frac{\partial S}{\partial U} = \frac{1}{T}
\]

and from our expression for $S(P)$,

\[
\frac{\partial S(P)}{\partial U} = -\lambda_1 = \frac{1}{T}
\]

so $\lambda_1 = -\frac{1}{T}$, and we obtain the Boltzmann distribution

\[
\begin{align}
P(x) &= \frac{e^{-\frac{\varepsilon(x)}{k_B T}}}{\sum_{x' \in X} e^{-\frac{\varepsilon(x')}{k_B T}}} \\
\end{align}
\]

Maximum entropy distributions are everywhere. They reflect the nature of a system under certain constraints. A collection of maximum entropy distributions and their constraints can be found on Wikipedia.
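Putting it all together, the Boltzmann distribution is a one-liner. In the sketch below (the energy levels and temperatures are hypothetical, chosen for illustration), probability concentrates on the ground state at low temperature and approaches uniform at very high temperature:

```python
import numpy as np

k_B = 1.380649e-23  # Boltzmann constant in J/K

def boltzmann(energies, T):
    """Boltzmann distribution P(x) = exp(-eps(x) / (k_B T)) / Z."""
    e = np.asarray(energies, dtype=float)
    w = np.exp(-(e - e.min()) / (k_B * T))  # shift by min energy for stability
    return w / w.sum()

eps = np.array([0.0, 1.0, 2.0]) * 1.602176634e-19  # levels at 0, 1, 2 eV in J
print(boltzmann(eps, 300.0))  # ~[1, 0, 0]: ground state dominates at room temperature
print(boltzmann(eps, 1e6))    # nearly uniform at very high temperature
```

Shifting the energies by their minimum before exponentiating leaves the distribution unchanged (the shift cancels in the normalization) but avoids underflow at low temperature.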