Aspirational Links (2024-12-03)

Reward Hacking in Reinforcement Learning

Back in grad school, Sean Luke shared a story about some RL they worked on for a soccer game. After letting it train for a few days, they came back and watched some of the play. It looked like a bad result. The low-poly men on the field would run forward, then stop and run backward intermittently, even when the objective obviously entailed "go down field." It turns out the game implemented stamina mechanics. Run too long and you slow to a crawl. But... if you ran backward, a bug in the game would reset your stamina. Kinda a bad example, because in this case the exploited flaw led to higher scores in a way that was aligned with the task of winning the game. But the lesson stuck, and now I think about it every time I do RL (rarely).

Anyways, if Lilian Weng writes something, it's generally worth reading. Plus, she recently left OpenAI, so I kinda wonder if this is related to something she may work on next.

DumbPipe

Damn. This one kinda hurts.

Both during grad school (for academic reasons) and as a data science consultant (teaching really did not appeal to me), I kept needing to send large files and folders to people in my cohort or to clients. It was always annoying to stash them on S3 or something first. I wanted simple P2P, but it turns out that wasn't easy. After I defended, not really knowing what I wanted to do, I started trying to scratch that itch.

There were antecedents. libp2p had UPnP, and that worked sometimes. (Its stubbed hole-punching implementation didn't work well back then.) Tailscale was starting to grow fast. But I didn't want exactly that. I wanted to just enable P2P, including for ephemeral cases. I wrote a decent version over a few months in Go. The simplest usage was just:

$ wormhole peerName peerService

and it would use my main server as something like a STUN/TURN server, and then you would communicate P2P over a hole-punched QUIC connection. You could use it as a pipe or directly in your app (if it was written in Go). It eventually had a lot of other features, too. It worked okay, and it got me a YC interview. But in the end I didn't get in, and I kept running into the problem that "people who want P2P don't really pay for things." I moved on. (It was fun to learn a lot of cryptography 🚩🚩🚩 though.)

DumbPipe seems really close to a lot of what I did, except it uses Rust instead of Go. I'll definitely use it.

Notes on the Tao Te Ching

This was really cool: it's like watching someone thoughtfully demonstrate how they tried to read a map they didn't understand, for the benefit of you, who doesn't understand it either.

I reread the Bible every few years. I was raised Roman Catholic by an Italian family in North Jersey, so... I think it was mostly just me trying to honor my mom or something. (She's still alive. But I'm something of an uninteresting agnostic.)

Anyways, the last time I did that, during the pandemic, it stopped being just a cover-to-cover ritual. I started jumping around more, actually trying to connect the dots and reflect on things.

I ended up starting to develop some of those "bigger on the inside" perceptions that you get when the work and the focus of your attention support it. Since then, I've done it for 15 minutes a day, at a ratio of something like 1 read to 3 writes in Obsidian (although I stray into other Christian works, too).

Admittedly, "15 minutes a day" is still kinda checkboxing. I'm very time-poor now, though, so it's almost about giving myself permission. But thanks to Michael, I have a new approach to try.

Physicists wiggling on floors searching for an intuition.

