9 Comments
Onyekachukwu Okonji:

Really enjoyed reading this. I'm currently doing research on hallucinations, and I'm glad I came across this post.

Victor Dibia, PhD:

Glad you found it useful. Thanks for sharing your feedback!

Beyond AI:

Great work. Very in-depth coverage.

Victor Dibia, PhD:

Thanks @derrick! Glad you found it useful, and thanks for sharing and subscribing. Your support is much appreciated!

Beyond AI:

Absolutely! Don't mention it.

Doug Ross:

Great summary, thank you.

Wondering if there are visualizations/UIs that can help capture model performance in various interim states?

Or provenance/bias of training datasets?

Victor Dibia, PhD:

Model performance benchmarking is an interesting emergent area.

It is also hard because performance is affected by factors beyond the model's weights (e.g., prompt design).

Some benchmarks (see the sketch after this list for running the Eleuther harness):

- HELM: https://crfm.stanford.edu/helm/latest/

- Eleuther AI LM Evaluation Harness: https://github.com/EleutherAI/lm-evaluation-harness
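
As a concrete example, here is a minimal sketch of scoring a small HuggingFace model on a single task with the Eleuther harness. It assumes the harness's Python API (lm_eval.simple_evaluate) from the v0.4+ releases; argument names can differ across versions, and the model and task shown are purely illustrative.

```python
# Minimal sketch: evaluate a small HF model on one benchmark task with
# EleutherAI's lm-evaluation-harness (pip install lm-eval).
# Assumes the v0.4+ API; exact argument names may vary by release.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                      # HuggingFace backend
    model_args="pretrained=EleutherAI/pythia-160m",  # illustrative model choice
    tasks=["hellaswag"],                             # one task from the suite
    num_fewshot=0,                                   # zero-shot setting
    batch_size=8,
)

# Per-task metrics (e.g., accuracy) are keyed by task name.
print(results["results"]["hellaswag"])
```

Note that even with a fixed harness, scores shift with prompt format and few-shot count, which is part of why benchmarking is hard.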

UIs for benchmarking are typically task-based.

Some tools for comparing model outputs:

- Scale Spellbook: https://scale.com/spellbook

- https://twitter.com/natfriedman/status/1633582489850773504?lang=en

Jon:

Great work! Curious, how do you generate your post images? They look great.

Victor Dibia, PhD:

Thank you!

Designed by hand in Figma.
