9 Comments
Dec 4, 2023 · Liked by Victor Dibia

Really enjoyed reading this. Currently doing research on hallucinations and I'm glad I came across this.

Author · Dec 4, 2023 (edited Dec 4, 2023)

Glad you found it useful.

Thanks for sharing feedback.

Great work. Very in-depth coverage.

Author

Thanks @derrick! Glad you found it useful, and thanks for sharing and subscribing. Your support is much appreciated!

Absolutely! Don't mention it.

Great summary, thank you.

Wondering if there are visualizations/UIs that can help capture model performance in various interim states?

Or provenance/bias of training datasets?

Author

Model performance benchmarking is an interesting emerging area.

It is also hard because performance is affected by factors beyond the model's weights (e.g., prompt design).

Some benchmarks:

- HELM: https://crfm.stanford.edu/helm/latest/

- EleutherAI LM Evaluation Harness: https://github.com/EleutherAI/lm-evaluation-harness
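
As a rough illustration (not a definitive recipe), here is a minimal sketch of driving the Eleuther harness from Python. It assumes the lm_eval package and its simple_evaluate entry point; argument names and available tasks vary across harness versions, and the model id is just an example.

```python
# Minimal sketch: running benchmarks with EleutherAI's lm-evaluation-harness.
# Assumes `pip install lm-eval`; simple_evaluate and its argument names may
# differ between harness versions, so treat this as illustrative only.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-160m",  # example model id
    tasks=["hellaswag", "arc_easy"],                 # benchmark tasks to run
    num_fewshot=0,                                   # zero-shot evaluation
    batch_size=8,
)

# Per-task metrics (accuracy, etc.) live under results["results"].
print(results["results"])
```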

UIs for benchmarking are typically task-based.

Some tools for comparing model outputs:

- https://scale.com/spellbook

- https://twitter.com/natfriedman/status/1633582489850773504?lang=en

Mar 3, 2023 · Liked by Victor Dibia

Great work; curious, how do you generate your post images? They look great.

Author

Thank you!

Designed by hand in Figma.
