
Saturday, October 9, 2010

Dependent Reward Revelation: Part II

In my previous post I talked about price differentiation and how it motivates dependent reward revelation, which is a fancy way of saying that which rewards are revealed depends upon the reward values themselves. I have to admit I'm somewhat stuck about how to exploit this additional information.

I'll repeat the setup here:
  1. World chooses $(x, \omega, r)$ from $D$ and reveals $(x, \omega)$.
  2. Player chooses $a \in A$ via $p (a | x, \omega)$.
  3. World chooses $\mathcal{A} \in \mathcal{P} (A)$ via $q (\mathcal{A} | x, \omega, r, a)$, where $a \in \mathcal{A}$ is required.
  4. World reveals $\{ r (a) | a \in \mathcal{A} \}$.
The usual "see what you did" scenario is $q (\mathcal{A} | x, \omega, r, a) = 1_{\mathcal{A} = \{ a \}}$, and for price differentiation it is
\[ q (\mathcal{A} | x, \omega, r, a) = \begin{cases} \{ a^\prime | a^\prime \leq a \} & \mbox{if } r (a) > 0; \\ \{ a^\prime | a^\prime \geq a \} & \mbox{if } r (a) = 0. \end{cases} \]
Since $a \in \mathcal{A}$ is required, I can always throw away the additional information and convert any $q$ into $q^\prime = 1_{\mathcal{A} = \{ a \}}$, then use the offset tree. That seems wasteful, but at the moment I don't have another option for the price differentiation problem.
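To make the two revelation rules concrete, here is a minimal Python sketch. The price-index encoding, the function names, and the convention that $r (a) > 0$ exactly when a sale occurs at price $a$ are my own assumptions for illustration, not part of the setup above.

```python
# Sketch of the two revelation rules q above, assuming prices are indexed
# 0 .. num_prices - 1 in increasing order and r[a] > 0 exactly when a sale
# occurs at price a.  Both rules are deterministic, so they return the
# revealed set A directly rather than a distribution over sets.

def q_see_what_you_did(a, r, num_prices):
    """Usual bandit feedback: only the chosen action's reward is revealed."""
    return {a}

def q_price_differentiation(a, r, num_prices):
    """A sale at price a implies a sale at any lower price; no sale at
    price a implies no sale at any higher price."""
    if r[a] > 0:
        return {ap for ap in range(num_prices) if ap <= a}
    else:
        return {ap for ap in range(num_prices) if ap >= a}
```

In both cases $a \in \mathcal{A}$, so discarding everything except $r (a)$ recovers the usual offset tree setting.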

A Filter-Offset Style Update (Fail)


Consider a filter-offset tree style solution. Fix $(x, \omega, r)$, and consider a fixed internal node with inputs $\lambda \not\in \omega$ and $\phi \not\in \omega$. The expected importance weight of $\lambda$ would be
\[ w_{\lambda | r} = \frac{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ \alpha_{\lambda, \neg \phi}\, 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} 1_{r (\lambda) \geq \frac{1}{2}} \left( r (\lambda) - \frac{1}{2} \right) + \alpha_{\neg \lambda, \phi}\, 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} 1_{r (\phi) \leq \frac{1}{2}} \left( \frac{1}{2} - r (\phi) \right) + \alpha_{\lambda, \phi}\, 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} 1_{r (\lambda) > r (\phi)} \left( r (\lambda) - r (\phi) \right) \right] \right]}{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} + 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]}. \]
Analogy with the filter-offset update suggests the choices
\[ \begin{aligned} \alpha_{\lambda, \neg \phi} &= (1 - \gamma) \frac{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} + 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]}{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} \right] \right]}, \\ \alpha_{\neg \lambda, \phi} &= (1 - \gamma) \frac{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} + 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]}{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]}, \\ \alpha_{\lambda, \phi} &= \gamma \frac{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} + 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]}{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]}, \end{aligned} \]
for some $\gamma \in [0, 1]$. Unfortunately, in general these quantities cannot be computed, since $r$ is only partially revealed per instance. For the price differentiation $q$, for instance, only when $a$ is the largest possible price and $r (a) > 0$, or when $a$ is the smallest possible price and $r (a) = 0$, can these quantities be computed.
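For intuition, here is a rough Python sketch (my own illustration, not an existing implementation) of the expectations the $\alpha$'s require when $q$ is deterministic and the historical policy $p$ is known. The catch is visible in the signature: it needs the full reward vector $r$, which is exactly what a real instance does not provide.

```python
# Hypothetical sketch: evaluate the indicator expectations and the alpha
# weights above for a deterministic revelation rule q and a known historical
# policy p (a dict mapping action -> probability).  It consults the full
# reward vector r, which per-instance is only partially revealed -- hence
# the obstruction described in the text.

def indicator_expectations(lam, phi, r, q, p, num_prices):
    """E_{a~p} of 1_{lam in A} 1_{phi not in A}, 1_{lam not in A} 1_{phi in A},
    and 1_{lam in A} 1_{phi in A}, with A = q(a, r, num_prices)."""
    only_lam = only_phi = both = 0.0
    for a, pa in p.items():
        A = q(a, r, num_prices)
        only_lam += pa * ((lam in A) and (phi not in A))
        only_phi += pa * ((lam not in A) and (phi in A))
        both += pa * ((lam in A) and (phi in A))
    return only_lam, only_phi, both

def alpha_weights(lam, phi, r, q, p, num_prices, gamma):
    """Filter-offset style alpha's for a given gamma in [0, 1]."""
    e_lam, e_phi, e_both = indicator_expectations(lam, phi, r, q, p, num_prices)
    total = e_lam + e_phi + e_both
    alpha_lam_notphi = (1 - gamma) * total / e_lam if e_lam > 0 else 0.0
    alpha_notlam_phi = (1 - gamma) * total / e_phi if e_phi > 0 else 0.0
    alpha_lam_phi = gamma * total / e_both if e_both > 0 else 0.0
    return alpha_lam_notphi, alpha_notlam_phi, alpha_lam_phi
```

Plugging in `q_price_differentiation` from the earlier sketch makes the problem explicit: evaluating $q$ at counterfactual actions $a$ requires entries of $r$ that were never revealed.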

My suspicion is that the only way to proceed with this filter-offset style update is if the set of rewards that $q$ depends upon is always revealed. So something like
\[ q (\mathcal{A} | x, \omega, r, a) = \begin{cases} \{ \tilde a \} \cup \{ a^\prime | a^\prime \leq a \} & \mbox{if } r (\tilde a) > 0; \\ \{ a, \tilde a \} & \mbox{if } r (\tilde a) = 0; \\ \{ \tilde a \} \cup \{ a^\prime | a^\prime \geq a \} & \mbox{if } r (\tilde a) < 0, \end{cases} \]
would work, since $q$ only depends upon $r (\tilde a)$, which is always revealed, so the above expectations can always be computed. With such a cooperative $q$, the rest of the filter-offset tree crank can be turned, and the weighting factors would be
\[ \begin{aligned} \alpha_{\lambda, \neg \phi} &= \frac{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]}{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} \right] \right]}, \\ \alpha_{\neg \lambda, \phi} &= \frac{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]}{E_{a \sim p} \left[ E_{\mathcal{A} \sim q | r, a} \left[ 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right] \right]}, \\ \alpha_{\lambda, \phi} &= 1, \end{aligned} \]
which is neat, but still leaves me wondering how to exploit the additional information available in the price differentiation problem.
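Under the same hypothetical price-index encoding as the earlier sketches, such a cooperative $q$ might look like the following; the distinguished action $\tilde a$ is passed in explicitly, and because the branch taken depends only on $r[\tilde a]$, which sits in every returned set, that branch can always be reconstructed from logged data.

```python
# Sketch of the "cooperative" q above: the revealed set depends on r only
# through r[a_tilde], and a_tilde itself is always revealed, so the
# expectations needed for the alpha's can be evaluated from observed data.

def q_cooperative(a, r, num_prices, a_tilde):
    if r[a_tilde] > 0:
        return {a_tilde} | {ap for ap in range(num_prices) if ap <= a}
    elif r[a_tilde] == 0:
        return {a, a_tilde}
    else:  # r[a_tilde] < 0
        return {a_tilde} | {ap for ap in range(num_prices) if ap >= a}
```

Wrapping it as `lambda a, r, n: q_cooperative(a, r, n, a_tilde)` lets it drop into the `alpha_weights` sketch above without modification.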
