Issue #11 | How can we make systems that integrate LLMs like ChatGPT more reliable? Here are practical techniques (and research) to mitigate hallucination and improve overall performance.
Really enjoyed reading this. Currently doing research on hallucinations and I'm glad I came across this.
Glad you found it useful.
Thanks for sharing your feedback.
Great work. Very in-depth coverage.
Thanks, @derrick! Glad you found it useful, and thanks for sharing and subscribing. Your support is much appreciated!
Absolutely! Don't mention it.
Great summary, thank you.
Wondering if there are visualizations/UIs that can help capture model performance in various interim states?
Or provenance/bias of training datasets?
Model performance benchmarking is an interesting emerging area.
It is also hard because performance is affected by factors beyond the model's weights (e.g., prompt design).
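As a toy illustration of that prompt sensitivity, here is a minimal Python sketch. Everything in it is made up for demonstration: `fake_model` is a hypothetical stand-in for a real LLM API call, and the dataset and templates are invented.

```python
# Toy demo: the same "model" scores differently under two prompt templates.
# fake_model is a hypothetical stand-in for a real LLM call.

DATASET = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "What is 2 + 2?", "answer": "4"},
]

TEMPLATES = {
    "terse": "Q: {question}\nA:",
    "instruction": "Answer concisely.\nQuestion: {question}\nAnswer:",
}

def fake_model(prompt: str) -> str:
    # Pretend model that only behaves well under the instruction-style
    # prompt, mimicking the prompt sensitivity seen in real LLMs.
    answers = {"France": "Paris", "2 + 2": "4"}
    key = next((k for k in answers if k in prompt), None)
    if key and prompt.startswith("Answer concisely."):
        return answers[key]
    return "I'm not sure."

def accuracy(template: str) -> float:
    # Score the same model and data under one prompt template.
    hits = 0
    for ex in DATASET:
        out = fake_model(template.format(question=ex["question"]))
        hits += ex["answer"].lower() in out.lower()
    return hits / len(DATASET)

for name, tpl in TEMPLATES.items():
    print(f"{name}: {accuracy(tpl):.0%}")
# Same weights, same data; only the prompt changed, yet the score differs,
# which is why benchmarks have to pin down prompt format.
```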
Some benchmarks:
- HELM: https://crfm.stanford.edu/helm/latest/
- EleutherAI LM Evaluation Harness: https://github.com/EleutherAI/lm-evaluation-harness
UIs for benchmarking are typically task-based.
Some tools for comparing model outputs (a rough DIY sketch follows the list):
- https://scale.com/spellbook
- https://twitter.com/natfriedman/status/1633582489850773504?lang=en
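If you want something scrappier than those hosted tools, a side-by-side comparison is easy to script yourself. Here is a minimal sketch; the model callables are hypothetical placeholders, and in practice each would wrap a real provider's API client:

```python
# Minimal DIY side-by-side comparison of model outputs on shared prompts,
# i.e. a text-only version of what hosted comparison tools render.
# The callables are hypothetical placeholders for real API clients.
from typing import Callable, Dict

MODELS: Dict[str, Callable[[str], str]] = {
    "model-a": lambda p: f"[model-a output for: {p!r}]",  # placeholder
    "model-b": lambda p: f"[model-b output for: {p!r}]",  # placeholder
}

PROMPTS = [
    "Summarize the causes of LLM hallucination in one sentence.",
    "Name two ways to ground answers in retrieved documents.",
]

# Print a prompt-by-model grid so outputs can be compared by eye.
for prompt in PROMPTS:
    print(f"\nPROMPT: {prompt}")
    for name, call in MODELS.items():
        print(f"  {name}: {call(prompt)}")
```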
Great work! Curious: how do you generate your post images? They look great.
Thank you!
Designed by hand in Figma.