I'll repeat the setup here:
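- World chooses $(x, \omega, r)$ from $D$ and reveals $(x, \omega)$.
- Player chooses $a \in A$ via $p(a | x, \omega)$.
- World chooses $\mathcal{A} \in \mathcal{P}(A)$ via $q(\mathcal{A} | x, \omega, r, a)$, where $a \in \mathcal{A}$ is required.
- World reveals $\{ r(a) | a \in \mathcal{A} \}$.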
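To make the protocol concrete, here is a minimal sketch of a single round. Everything in it is a hypothetical stand-in: `play_round`, `sample_world`, `uniform_p`, and `neighbor_q` are names I made up, the action set is a toy list, and I am treating ω as a set of filtered-out actions.

```python
import random

def play_round(sample_world, p, q, rng):
    """Run one round of the protocol; all callables are illustrative stand-ins."""
    x, omega, r = sample_world(rng)        # world draws (x, omega, r) from D
    a = p(x, omega, rng)                   # player samples a ~ p(a | x, omega)
    revealed = q(x, omega, r, a, rng)      # world samples the revealed set
    if a not in revealed:
        raise ValueError("q must always include the chosen action a")
    # the player only ever sees rewards for actions in the revealed set
    return x, omega, a, {b: r[b] for b in revealed}

ACTIONS = [1, 2, 3]  # toy action set (assumption)

def sample_world(rng):
    r = {b: rng.random() for b in ACTIONS}
    return "context", frozenset(), r       # assumption: nothing is filtered out

def uniform_p(x, omega, rng):
    return rng.choice([b for b in ACTIONS if b not in omega])

def neighbor_q(x, omega, r, a, rng):
    # toy q that ignores r: reveal a plus one other action picked at random
    other = rng.choice([b for b in ACTIONS if b != a])
    return {a, other}

x, omega, a, feedback = play_round(sample_world, uniform_p, neighbor_q, random.Random(0))
```

The toy `neighbor_q` ignores r entirely, which is the easy case; the difficulty discussed below arises when the revealed set depends on rewards the player never sees.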
A Filter-Offset Style Update (Fail)
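Consider a filter-offset tree style solution. Fix $(x, \omega, r)$, and consider a fixed internal node with inputs $\lambda \notin \omega$ and $\phi \notin \omega$. Writing $\mathcal{A}$ for the set whose rewards are revealed, the expected importance weight of $\lambda$ would be
$$
w_{\lambda|r} = \frac{E_{a \sim p} \left[ E_{\mathcal{A} \sim q|r,a} \left[ \alpha_{\lambda,\neg\phi} 1_{\lambda \in \mathcal{A}} 1_{\phi \notin \mathcal{A}} 1_{r(\lambda) \geq \frac{1}{2}} \left( r(\lambda) - \frac{1}{2} \right) + \alpha_{\neg\lambda,\phi} 1_{\lambda \notin \mathcal{A}} 1_{\phi \in \mathcal{A}} 1_{r(\phi) \leq \frac{1}{2}} \left( \frac{1}{2} - r(\phi) \right) + \alpha_{\lambda,\phi} 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} 1_{r(\lambda) > r(\phi)} \left( r(\lambda) - r(\phi) \right) \right] \right]}{E_{a \sim p} \left[ E_{\mathcal{A} \sim q|r,a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \notin \mathcal{A}} + 1_{\lambda \notin \mathcal{A}} 1_{\phi \in \mathcal{A}} + 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]}.
$$
Analogy with the filter-offset update suggests the choices
$$
\begin{aligned}
\alpha_{\lambda,\neg\phi} &= (1 - \gamma) \frac{E_{a \sim p} \left[ E_{\mathcal{A} \sim q|r,a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \notin \mathcal{A}} + 1_{\lambda \notin \mathcal{A}} 1_{\phi \in \mathcal{A}} + 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]}{E_{a \sim p} \left[ E_{\mathcal{A} \sim q|r,a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \notin \mathcal{A}} \right] \right]}, \\
\alpha_{\neg\lambda,\phi} &= (1 - \gamma) \frac{E_{a \sim p} \left[ E_{\mathcal{A} \sim q|r,a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \notin \mathcal{A}} + 1_{\lambda \notin \mathcal{A}} 1_{\phi \in \mathcal{A}} + 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]}{E_{a \sim p} \left[ E_{\mathcal{A} \sim q|r,a} \left[ 1_{\lambda \notin \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]}, \\
\alpha_{\lambda,\phi} &= \gamma \frac{E_{a \sim p} \left[ E_{\mathcal{A} \sim q|r,a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \notin \mathcal{A}} + 1_{\lambda \notin \mathcal{A}} 1_{\phi \in \mathcal{A}} + 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]}{E_{a \sim p} \left[ E_{\mathcal{A} \sim q|r,a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]},
\end{aligned}
$$
for some $\gamma \in [0, 1]$. Unfortunately, in general these quantities cannot be computed, since $r$ is only partially revealed per instance. For the price differentiation $q$, for instance, only when $a$ is the largest possible price and $r(a) > 0$, or when $a$ is the smallest possible price and $r(a) = 0$, can these quantities be computed.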
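It may help to see what these choices would buy if they could be computed. Since $(x, \omega, r)$ is fixed, the indicators on $r$ come out of the expectations, and each $\alpha$ cancels the probability of its own event against the common denominator. Assuming all three of those probabilities are nonzero, and using $(z)_+ = \max(z, 0)$ as my own shorthand, the weight collapses to

$$
w_{\lambda|r} = (1 - \gamma) \left[ \left( r(\lambda) - \frac{1}{2} \right)_+ + \left( \frac{1}{2} - r(\phi) \right)_+ \right] + \gamma \left( r(\lambda) - r(\phi) \right)_+ .
$$

So the target weight involves only $r(\lambda)$ and $r(\phi)$, but forming the $\alpha$ factors requires expectations under $q$ conditioned on the full $r$, and that is the part that is not available.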
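My suspicion is that the only way to proceed with this filter-offset style update is if the set of rewards that $q$ depends upon is always revealed. So something like
$$
q(\mathcal{A} | x, \omega, r, a) = \begin{cases} \{ \tilde a \} \cup \{ a^\prime | a^\prime \geq a \} & \mbox{if } r(\tilde a) > 0; \\ \{ a, \tilde a \} & \mbox{if } r(\tilde a) = 0; \\ \{ \tilde a \} \cup \{ a^\prime | a^\prime \leq a \} & \mbox{if } r(\tilde a) < 0, \end{cases}
$$
would work, since $q$ only depends upon $r(\tilde a)$, which is always revealed, so the above expectations can always be computed. Here $\mathcal{A}$ is the set whose rewards are revealed. With such a cooperative $q$, the rest of the filter-offset tree crank can be turned and the weighting factors would be
$$
\begin{aligned}
\alpha_{\lambda,\neg\phi} &= \frac{E_{a \sim p} \left[ E_{\mathcal{A} \sim q|r,a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \notin \mathcal{A}} + 1_{\lambda \notin \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]}{E_{a \sim p} \left[ E_{\mathcal{A} \sim q|r,a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \notin \mathcal{A}} \right] \right]}, \\
\alpha_{\neg\lambda,\phi} &= \frac{E_{a \sim p} \left[ E_{\mathcal{A} \sim q|r,a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \notin \mathcal{A}} + 1_{\lambda \notin \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]}{E_{a \sim p} \left[ E_{\mathcal{A} \sim q|r,a} \left[ 1_{\lambda \notin \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]}, \\
\alpha_{\lambda,\phi} &= 1,
\end{aligned}
$$
which is neat, but still leaves me wondering how to exploit the additional information available in the price differentiation problem.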
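As a sanity check on the cooperative case, here is a small sketch that computes the three weighting factors by direct enumeration. It assumes a finite, ordered action set and a known sampling distribution `p` given as a dict. The function names `cooperative_set` and `cooperative_alphas` are mine, and returning `None` when an event has zero probability is just one way to flag an undefined ratio.

```python
from fractions import Fraction

def cooperative_set(actions, a, a_tilde, r_tilde):
    """Revealed set under the cooperative q; depends on r only via r(a_tilde)."""
    if r_tilde > 0:
        return {a_tilde} | {b for b in actions if b >= a}
    if r_tilde < 0:
        return {a_tilde} | {b for b in actions if b <= a}
    return {a, a_tilde}

def cooperative_alphas(actions, p, a_tilde, r_tilde, lam, phi):
    """Return (alpha_{lam,not phi}, alpha_{not lam,phi}, alpha_{lam,phi})."""
    only_lam = Fraction(0)   # P(lam revealed, phi not revealed)
    only_phi = Fraction(0)   # P(phi revealed, lam not revealed)
    for a in actions:
        # q is deterministic given (a, r(a_tilde)), so the inner expectation
        # is just an indicator evaluated on the single revealed set
        revealed = cooperative_set(actions, a, a_tilde, r_tilde)
        if lam in revealed and phi not in revealed:
            only_lam += p[a]
        elif phi in revealed and lam not in revealed:
            only_phi += p[a]
    total = only_lam + only_phi
    alpha_lam = total / only_lam if only_lam else None   # undefined if never seen
    alpha_phi = total / only_phi if only_phi else None
    return alpha_lam, alpha_phi, Fraction(1)

actions = [1, 2, 3, 4]
p = {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 8), 4: Fraction(1, 8)}
alpha_lam, alpha_phi, alpha_both = cooperative_alphas(
    actions, p, a_tilde=1, r_tilde=0, lam=2, phi=3)
```

In this toy run only `r_tilde`, the always-revealed reward, enters the computation, which is exactly why the expectations are available here and not for a general q.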