AI has become central to the tech industry, and lately companies such as Meta and Microsoft have been launching AI models aimed at different kinds of use cases. Recently, Microsoft unveiled a smaller AI model, ‘Phi-3 Mini’. Last December, Microsoft launched the Phi-2 model, which delivered remarkably strong output compared to considerably bigger models like Llama 2.
Small AI models cost less to run than larger models, and they perform better on personal devices like phones and laptops. The company has vouched for the performance of this model, saying it is comparable in capability to LLMs like GPT-3.5, but in a far more portable package.
Microsoft plans to release two more small models after Phi-3 Mini: Phi-3 Small and Phi-3 Medium. Phi-3 Mini tips the proverbial scales at 3.8 billion parameters and is trained on a data set that is smaller relative to large language models like GPT-4. It is now available on Azure, Hugging Face, and Ollama. Alongside the Phi models, the company has built a model focused on solving math problems, ‘Orca-Math’.
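For readers who want to try the model locally, a minimal sketch of pulling it through Ollama might look like the following; the model tag `phi3` is an assumption based on Ollama's usual naming conventions, and the exact tag may differ in the library listing.

```shell
# Download the Phi-3 Mini weights from the Ollama library
# (the "phi3" tag is assumed; check `ollama list` / the library for the exact name)
ollama pull phi3

# Start an interactive chat session with the model on your own machine
ollama run phi3
```

Because the model is small, this runs on an ordinary laptop without a dedicated GPU, which is precisely the portability advantage the article describes.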
Speaking to The Verge, Eric Boyd, corporate vice president of Microsoft Azure AI Platform, explained that various companies have their own smaller AI models tailored for specific tasks like document summarization or coding help. For instance, Google’s Gemma 2B and 7B are handy for basic chatbots and language tasks. Anthropic’s Claude 3 Haiku excels at reading complex research papers with graphs and summarizing them quickly. Meta’s newly launched Llama 3 8B can be used for chatbots and coding help too. Each of these AI models thus has its own strengths and quirks.
Boyd describes how developers trained Phi-3 using a method inspired by children’s learning, creating a “curriculum” similar to how kids learn from bedtime stories and books with simpler language. Phi-3 builds upon what its predecessors learned: Phi-1 focused on coding, Phi-2 started reasoning, and Phi-3 improves on both fronts. However, while the Phi-3 family has some general knowledge, it can’t match the breadth of GPT-4 or another large language model trained on the entire internet.
Boyd emphasized that smaller models like Phi-3 often suit companies’ custom applications better, especially since many have smaller internal datasets. Plus, these models consume less computing power, making them more cost-effective. Consumer applications might well be fascinating in the years to come, particularly given the small footprints of these AI models.