2 Comments
The Intelligence Layer (TIL)

Hey! Thanks for the great post about DPO vs RLHF. Just a couple of things:

1. DPO was a runner-up for the Outstanding Paper award at last year's NeurIPS (2023).

2. I'm not sure whether this is an issue with my browser or with Substack, but the MathJax isn't rendering in the web browser version of Substack:

\( \mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) = -\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}} \left[ \log \sigma \left( \beta \log \frac{\textcolor{green}{\pi_\theta(y_w \mid x)}}{\textcolor{blue}{\pi_{\text{ref}}(y_w \mid x)}} - \beta \log \frac{\textcolor{red}{\pi_\theta(y_l \mid x)}}{\textcolor{blue}{\pi_{\text{ref}}(y_l \mid x)}} \right) \right]. \)
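For readers who can't see the equation rendered, here is a minimal Python sketch of the same loss, averaged over a batch. It assumes each input is a sequence-level log-probability log π(y | x), i.e. token log-probs already summed over the response; the function names and the default β = 0.1 are hypothetical choices for illustration, not taken from the post.

```python
import math
from typing import Sequence


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigma(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_logp_w: Sequence[float],  # log pi_theta(y_w | x) per example
    policy_logp_l: Sequence[float],  # log pi_theta(y_l | x) per example
    ref_logp_w: Sequence[float],     # log pi_ref(y_w | x) per example
    ref_logp_l: Sequence[float],     # log pi_ref(y_l | x) per example
    beta: float = 0.1,               # hypothetical default
) -> float:
    """Empirical DPO loss: mean of -log sigma(beta * (chosen log-ratio - rejected log-ratio))."""
    n = len(policy_logp_w)
    if n == 0 or not (len(policy_logp_l) == len(ref_logp_w) == len(ref_logp_l) == n):
        raise ValueError("all inputs must be non-empty and of equal length")
    total = 0.0
    for pw, pl, rw, rl in zip(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l):
        # beta * log(pi_theta / pi_ref) for the preferred minus the dispreferred response
        margin = beta * (pw - rw) - beta * (pl - rl)
        total += -log_sigmoid(margin)
    return total / n
```

When the policy equals the reference, the margin is 0 and the loss is log 2 ≈ 0.693; for example, `dpo_loss([-10.0], [-12.0], [-10.0], [-12.0])` returns log 2. Raising the policy's likelihood of y_w relative to the reference pushes the loss below that value.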

AI Coffee Break with Letitia

Hey, thanks a lot for noticing and for taking the time to let me know! 🤗

1. I've updated the NeurIPS year.

2. I've replaced the LaTeX with images, since Substack fails to render it correctly in the published version. Oddly, the equations render almost correctly in the editor. So yeah, I'll use images here and go on to fix all my other posts too. It's quite sad, actually, that LaTeX isn't working properly. :(
