This week, OpenAI unveiled a substantial enhancement to ChatGPT, enabling their GPT-3.5 and GPT-4 AI models to analyse images and respond to them within text conversations.
OpenAI also disclosed that their ChatGPT mobile app will soon incorporate speech synthesis capabilities, facilitating fully verbal interactions with the AI assistant when combined with its existing speech recognition features. OpenAI intends to roll out these features to their Plus and Enterprise subscribers within the next two weeks.
The new image recognition capability in ChatGPT allows users to upload one or more images during a conversation, utilising either the GPT-3.5 or GPT-4 models. OpenAI claims that this feature has diverse practical applications, from assisting users in deciding what to cook by analysing pictures of their fridge and pantry contents to helping troubleshoot issues with malfunctioning grills. Users can also employ their device’s touch screen to highlight specific areas of the image for ChatGPT’s attention.
OpenAI has published a promotional video on its website illustrating a hypothetical interaction with ChatGPT. In this scenario, a user seeks guidance on adjusting a bicycle seat and provides photos, an instruction manual and an image of their toolbox. ChatGPT responds with step-by-step instructions for completing the task. Note that the feature’s effectiveness has not yet been independently tested in real-world use.
Regarding the technical aspects, OpenAI has not divulged specific details about the inner workings of GPT-4 or its multimodal variant, GPT-4V. However, based on existing AI research, including that of OpenAI’s partner, Microsoft, multimodal AI models generally transform both text and images into a shared encoding space. This allows them to process various types of data using the same neural network. OpenAI may employ techniques such as CLIP to align image and text representations in the same latent space, enabling ChatGPT to make contextual inferences across text and images, although this remains speculative.
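To make the idea of a shared encoding space concrete, here is a minimal sketch in Python. It is not OpenAI’s implementation, which has not been disclosed. The three-dimensional vectors below are hand-made, hypothetical stand-ins for what trained image and text encoders might produce, and the function names are chosen for illustration only. The sketch shows just the comparison step: once an image and several captions live in the same space, the closest caption can be found with cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_caption(image_vec, captions):
    """Pick the caption whose embedding is closest to the image embedding."""
    return max(captions, key=lambda text: cosine_similarity(image_vec, captions[text]))


# Hypothetical embeddings (assumption): a real system would obtain these from
# trained image and text encoders, and they would have hundreds of dimensions.
image_embedding = [0.9, 0.1, 0.2]  # stands in for an encoded photo of a bicycle
caption_embeddings = {
    "a photo of a bicycle": [0.8, 0.2, 0.1],
    "a photo of a fridge": [0.1, 0.9, 0.3],
    "a photo of a grill": [0.2, 0.3, 0.9],
}

print(best_caption(image_embedding, caption_embeddings))
```

With these toy numbers the bicycle caption scores highest, because its vector points in nearly the same direction as the image vector. In a CLIP-style model the encoders are trained so that matching image and text pairs end up close together in this way.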
In terms of audio capabilities, ChatGPT’s new voice synthesis feature reportedly enables spoken interactions with the AI. OpenAI describes it as a “new text-to-speech model.” Once the feature is introduced, users will be able to enable voice conversations in the app’s settings and choose from five synthetic voices named “Juniper,” “Sky,” “Cove,” “Ember” and “Breeze.” These voices were created in collaboration with professional voice actors. OpenAI’s Whisper, an open-source speech recognition system, will continue to handle the transcription of user speech input.
OpenAI acknowledges several limitations in the expanded features of ChatGPT, including the potential for visual misidentifications and imperfect recognition of non-English languages. The company has conducted risk assessments and sought input from alpha testers, and it advises users to exercise caution, especially in high-stakes or specialised contexts.
In light of privacy concerns, OpenAI has implemented technical measures to restrict ChatGPT’s ability to analyse and make direct statements about individuals, recognising that ChatGPT is not always accurate and that privacy should be respected.
While OpenAI promotes these new features as granting ChatGPT the ability to “see, hear, and speak,” there is ongoing debate about whether such language is anthropomorphic and exaggerated. Notably, some AI researchers caution against anthropomorphising AI models.
Although ChatGPT and its associated AI models are unequivocally not human, these updates have the potential to significantly expand the capabilities of OpenAI’s assistant. However, their actual performance and effectiveness have yet to be assessed. OpenAI plans to introduce these features gradually, allowing for ongoing improvements and risk mitigation while preparing for more advanced systems in the future.