# Multimodal Evaluation

## Environment Settings
Before running Light-eval, users must ensure that all necessary environments are correctly installed and configured according to the instructions in the Installation Document.
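For orientation, a minimal setup might look like the sketch below; the environment name, Python version, and requirements file location are assumptions, and the Installation Document remains the authoritative reference.

```bash
# Minimal setup sketch -- env name, Python version, and requirements file
# are assumptions; follow the Installation Document for authoritative steps.
conda create -n light-eval python=3.10 -y
conda activate light-eval
pip install -r requirements.txt
```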
## LLaVA-benchmark

### Prerequisites
```
dataset
├── data
│   └── LLaVA-benchmark
│       ├── images
│       │   ├── 001.jpg
│       │   ├── 002.jpg
│       │   └── ...
│       ├── answers_gpt4.jsonl
│       ├── context.jsonl
│       └── ...
└── ...
```
Store the `images` folder in `data` according to the file structure given above. The dataset is available at 🤗Hugging Face/liuhaotian.
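For reference, one way to fetch the data is with the Hugging Face CLI; the dataset repository id below is an assumption inferred from the link above, so verify it before use.

```bash
# Sketch: download the LLaVA-benchmark data via the Hugging Face CLI.
# The repo id is an assumption -- check the 🤗 page above for the exact name.
huggingface-cli download liuhaotian/llava-bench-in-the-wild \
    --repo-type dataset \
    --local-dir dataset/data/LLaVA-benchmark
```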
### Evaluating
Please make sure the dataset is stored according to the storage structure described above.
Change the following parameters in `scripts/run_llavabenchmark.sh`: `model_name`, `pretrained_path`, `llama_config`, `tokenizer_path`, `openai_key`, and `mode` (a placeholder sketch of this parameter block follows the mode list below).
`mode` settings:

- `inference`: Get the model's answers.
- `eval`: Use GPT-4 to score the model's answers against the GPT-4 reference answers.
- `show`: Output the scored results.
- `all`: Run inference, scoring, and result output in one pass.
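For orientation, the parameter block at the top of `scripts/run_llavabenchmark.sh` might look like the sketch below; every value is a placeholder, and the exact variable layout in the shipped script may differ.

```bash
# Placeholder values only -- substitute your own paths and key.
model_name=my_model                      # hypothetical model identifier
pretrained_path=/path/to/checkpoint      # directory with model weights
llama_config=/path/to/params.json        # LLaMA architecture config
tokenizer_path=/path/to/tokenizer.model  # tokenizer file
openai_key=sk-...                        # needed for eval/show/all modes
mode=all                                 # inference | eval | show | all
```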
After changing the parameters, you can use the following script to run the LLaVA-benchmark evaluation code for your model.
```bash
sh scripts/run_llavabenchmark.sh
```
## MM-Vet Benchmark

### Prerequisites
```
dataset
├── data
│   └── MM-Vet
│       ├── images
│       │   ├── v1_0.png
│       │   ├── v1_2.png
│       │   └── ...
│       ├── mm-vet.json
│       └── bard_set.json
└── ...
```
Store the `images` folder in `data` according to the file structure given above. Download the MM-Vet data (yuweihao/mm-vet.zip) and unzip it so the files match the format described above.
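For reference, the download and unpack step might look like the sketch below; the release URL is taken from the yuweihao/MM-Vet repository and should be verified, and the extracted folder may need renaming to match the structure above.

```bash
# Sketch: fetch and unpack the MM-Vet data.
# Verify the URL against the yuweihao/MM-Vet repository before use.
wget https://github.com/yuweihao/MM-Vet/releases/download/v1/mm-vet.zip
unzip mm-vet.zip -d dataset/data/
# The archive may extract to a folder named mm-vet; rename it to MM-Vet
# if needed so the layout matches the tree above.
mv dataset/data/mm-vet dataset/data/MM-Vet
```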
### Evaluating
Please make sure the dataset is stored according to the storage structure described above.
Change the following parameters in `scripts/run_mmvet.sh`: `model_name`, `pretrained_path`, `llama_config`, `tokenizer_path`, `openai_key`, `use_sub_set`, and `mode` (a placeholder sketch of this parameter block follows the lists below).
`mode` settings:

- `inference`: Get the model's answers.
- `eval`: Use GPT-4 to score the model's answers against the reference answers.
- `all`: Run inference and output the scored results in one pass.

`use_sub_set` settings:

- `True`: use a subset of the data for evaluation.
- `False`: use the full dataset for evaluation.
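As with the LLaVA-benchmark script, the parameter block in `scripts/run_mmvet.sh` might look like the placeholder sketch below; note the extra `use_sub_set` switch, and expect the shipped script's layout to differ.

```bash
# Placeholder values only -- substitute your own paths and key.
model_name=my_model
pretrained_path=/path/to/checkpoint
llama_config=/path/to/params.json
tokenizer_path=/path/to/tokenizer.model
openai_key=sk-...
use_sub_set=False  # True: evaluate on the subset; False: full dataset
mode=all           # inference | eval | all
```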
After changing the parameters, you can use the following script to run the MM-Vet benchmark evaluation code for your model.
```bash
sh scripts/run_mmvet.sh
```