Phi-3 technical report

Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide some initial parameter-scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench).
