When I evaluate the LanguageBind/MoE-LLaVA-Phi2-2.7B-4e model, the evaluation results vary on every run
(e.g. 1st: 61.42, 2nd: 61.32, 3rd: 61.22 for GQA evaluation).
I attempted to debug this and realized that the routing results in the MoE layers vary with each inference.
In the GQA evaluation, the sample below is answered with either "red" or "brown" depending on the run:
{"question_id": "2059565", "image": "n130638.jpg", "text": "What color is the dirt?\nAnswer the question using a single word or phrase.", "category": "default"}
I printed self.deepspeed_moe.exp_counts at https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/moe/layer.py#L132 and got the following:
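As an aside, the same counts can be read without patching the installed package by using a forward hook. A minimal sketch, assuming `model` is the already-loaded MoE-LLaVA checkpoint:

```python
from deepspeed.moe.layer import MoE

def log_exp_counts(module, inputs, output):
    # exp_counts is populated by the wrapped MOELayer during its forward pass
    print(module.deepspeed_moe.exp_counts)

# `model` is assumed to be the already-loaded MoE-LLaVA model
for name, module in model.named_modules():
    if isinstance(module, MoE):
        module.register_forward_hook(log_exp_counts)
```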
=======================================
Case 1:
tensor([ 63, 140, 157, 270])
tensor([ 9, 185, 0, 436])
tensor([211, 44, 49, 326])
tensor([134, 308, 11, 177])
tensor([351, 203, 56, 20])
tensor([ 9, 375, 33, 213])
tensor([ 58, 136, 264, 172])
tensor([ 30, 12, 582, 6])
tensor([150, 450, 10, 20])
tensor([ 93, 147, 1, 389])
tensor([ 48, 221, 30, 331])
tensor([541, 30, 46, 13])
tensor([ 24, 169, 410, 27])
tensor([ 10, 293, 92, 235])
tensor([ 0, 36, 1, 593])
tensor([116, 0, 514, 0])
tensor([1, 0, 0, 0])
tensor([0, 0, 0, 1])
tensor([0, 0, 0, 1])
tensor([1, 0, 0, 0])
tensor([1, 0, 0, 0])
tensor([0, 1, 0, 0])
tensor([0, 0, 0, 1])
tensor([0, 1, 0, 0])
tensor([1, 0, 0, 0])
tensor([0, 1, 0, 0])
tensor([0, 0, 0, 1])
tensor([1, 0, 0, 0])
tensor([0, 0, 1, 0])
tensor([0, 0, 1, 0])
tensor([0, 0, 0, 1])
tensor([0, 0, 1, 0])
Red
Case 2:
tensor([ 63, 140, 157, 270])
tensor([ 8, 188, 0, 434])
tensor([200, 46, 52, 332])
tensor([136, 314, 14, 166])
tensor([351, 201, 59, 19])
tensor([ 7, 371, 36, 216])
tensor([ 55, 134, 261, 180])
tensor([ 34, 14, 579, 3])
tensor([156, 443, 10, 21])
tensor([ 95, 147, 1, 387])
tensor([ 48, 222, 29, 331])
tensor([548, 31, 40, 11])
tensor([ 26, 168, 411, 25])
tensor([ 8, 296, 97, 229])
tensor([ 0, 39, 3, 588])
tensor([113, 0, 517, 0])
tensor([0, 1, 0, 0])
tensor([0, 0, 0, 1])
tensor([0, 0, 0, 1])
tensor([0, 0, 0, 1])
tensor([1, 0, 0, 0])
tensor([0, 1, 0, 0])
tensor([0, 0, 1, 0])
tensor([0, 0, 0, 1])
tensor([1, 0, 0, 0])
tensor([0, 1, 0, 0])
tensor([0, 0, 0, 1])
tensor([1, 0, 0, 0])
tensor([0, 0, 1, 0])
tensor([0, 0, 1, 0])
tensor([0, 0, 0, 1])
tensor([0, 0, 1, 0])
Brown
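For what it's worth, the prefill counts above differ only slightly between the two runs, while the per-token decode steps (the one-hot tensors) route to different experts outright. A quick diff of one layer's prefill counts (copied from the dumps above; picking the second MoE layer is arbitrary) gives a lower bound on how many tokens changed experts:

```python
import torch

# Second MoE layer's prefill expert counts, copied from Case 1 and Case 2 above
case1 = torch.tensor([9, 185, 0, 436])
case2 = torch.tensor([8, 188, 0, 434])

# Each re-routed token decrements one expert's count and increments another's,
# so half the absolute difference is a lower bound on the number of tokens
# that switched experts in this layer.
flipped_lower_bound = (case1 - case2).abs().sum().item() // 2
print(flipped_lower_bound)  # -> 3 of 630 tokens
```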
Please check whether this phenomenon is expected, or whether there is a bug in the code.
Thanks