ML Reasoning Eval

Posted 22 Mar 2023

I contributed an evaluation to OpenAI's Evals repository. Evals is a framework for evaluating OpenAI models and an open-source registry of benchmarks. My evaluation is an adapted version of https://fr3ddie.me/article/jobtestpractice . The pull request was merged and is now used for evaluating and improving OpenAI models.

See how GPT-3.5 Turbo got on here:

Answers marked '🤖' are GPT's choice; '✅' marks the correct answer.
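The marking scheme above can be illustrated with a minimal Python sketch. It is not the code behind this page or the Evals framework; the function names (`mark_options`, `accuracy`) and the sample question are hypothetical, and it assumes each question is multiple choice with a single correct option.

```python
def mark_options(options, model_choice, correct):
    """Return display lines, tagging the model's pick with 🤖 and the correct answer with ✅."""
    lines = []
    for label, text in options.items():
        marks = ""
        if label == model_choice:
            marks += " 🤖"
        if label == correct:
            marks += " ✅"
        lines.append(f"{label}) {text}{marks}")
    return lines


def accuracy(results):
    """Fraction of (model_choice, correct) pairs where the two labels match."""
    if not results:
        return 0.0
    hits = sum(1 for choice, correct in results if choice == correct)
    return hits / len(results)


# Hypothetical question, for illustration only (not taken from the eval).
options = {"A": "12", "B": "16", "C": "20"}
for line in mark_options(options, model_choice="A", correct="B"):
    print(line)

# Hypothetical results for two questions: one miss, one hit.
print(accuracy([("A", "B"), ("B", "B")]))
```

When the model picks the correct option, that line carries both marks, so a question answered correctly shows '🤖 ✅' together.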

Overview of GPT-3.5 Turbo performance:

image

See the pull request here: https://github.com/openai/evals/pull/341

Copyright © 2025 Freddie Nicholson All rights reserved.