Equivariance Benchmark for Vision-Language Model (EqBen)
Tan Wang,
Kevin Lin,
Linjie Li,
Chung-Ching Lin,
Zhengyuan Yang,
Hanwang Zhang,
Zicheng Liu,
Lijuan Wang
Nanyang Technological University, Microsoft Corporation
About
Welcome to EqBen, a benchmark for evaluating your Vision-Language Pretrained (VLP) model effectively and efficiently through an image-text matching task.
Compared with recent benchmarks (Winoground and VALSE) that focus on minimal semantic changes in captions, EqBen centers on diverse visually minimal changes, automatically curated from time-varying visual content in natural videos and from synthetic engines that allow more precise control.
This repo contains a one-stop, ready-to-use PyPI toolkit that supports multiple evaluations:
- EqBen data evaluation
- Winoground and VALSE data evaluation
Installation & Usage
pip install -i https://test.pypi.org/simple/ eqben==0.0.6
The toolkit can be inserted into your VL model framework with only a few lines of added code. We provide a code template and examples (#1 and #2) for two popular VL models (CLIP and FIBER).
To run an actual evaluation, you also need to download the data. See the following sections for details.
EqBen
1. Data Download
You can download the raw image data via OneDrive or Baidu Drive.
2. Modify Data Path
Please refer to the template (example) to modify the data path and annotation path. Then follow the example to insert EqBen evaluation code into your VL model framework.
3. Submit to Server for Score
Run the evaluation script to produce the score.npy file, then zip the file and submit it to our CodaLab server to obtain the final score.
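The zipping step can be scripted. The following is a minimal sketch, assuming score.npy already exists in the working directory and that the server accepts an archive with the file at its top level. The function name `zip_score_file` and the archive name `submission.zip` are hypothetical and not part of the toolkit; check the CodaLab page for the exact layout it expects.

```python
import zipfile
from pathlib import Path


def zip_score_file(score_path="score.npy", zip_path="submission.zip"):
    """Pack the score file into a zip archive for upload.

    Both default names are assumptions for illustration only.
    """
    score_path = Path(score_path)
    if not score_path.is_file():
        raise FileNotFoundError(f"score file not found: {score_path}")

    # Store the file at the archive root, without its parent directories.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(score_path, arcname=score_path.name)
    return Path(zip_path)
```

Calling `zip_score_file()` after the evaluation script finishes produces the archive to upload.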
Winoground & VALSE
Our toolkit also supports the earlier Winoground and VALSE benchmarks. You can set them up with the following steps.
1. Data Download
You can download the raw data from the official websites of Winoground and VALSE.
2. Modify Data Path
Please refer to the template (example) to modify the data path and annotation path. Then follow the example to insert the EqBen evaluation toolkit into your VL model framework.
3. Run the Script and Check the Score
For these benchmarks the score is computed offline, so no server submission is needed: run the script and read the score from its output.
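For orientation when reading the output, the sketch below shows how Winoground-style text, image, and group scores are defined for a single sample of two captions and two images. It illustrates the metric as described for Winoground, not the toolkit's actual code; the function name `winoground_style_scores` and the 2x2 input layout are assumptions made for this example.

```python
def winoground_style_scores(sim):
    """Score one sample from a 2x2 similarity table.

    sim[i][j] is the model's matching score between caption i and image j;
    caption 0 belongs with image 0 and caption 1 with image 1.
    (Hypothetical helper for illustration, not part of the eqben package.)
    """
    (c0_i0, c0_i1), (c1_i0, c1_i1) = sim

    # Text score: for each image, the correct caption must win.
    text_ok = c0_i0 > c1_i0 and c1_i1 > c0_i1
    # Image score: for each caption, the correct image must win.
    image_ok = c0_i0 > c0_i1 and c1_i1 > c1_i0
    # Group score: both conditions must hold at once.
    return {"text": text_ok, "image": image_ok, "group": text_ok and image_ok}


if __name__ == "__main__":
    print(winoground_style_scores([[0.9, 0.2], [0.3, 0.8]]))
```

A benchmark-level score is then the fraction of samples for which each flag is true.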