Meta AI Virtual Assistant Trained Using Public Social Media Posts

Liam Williams


Meta Platforms has disclosed that it used public posts from Facebook and Instagram to train its new Meta AI virtual assistant. To protect user privacy, however, it excluded private posts shared only among close circles from the training dataset, the company's top policy executive said in an interview.

Respecting the Privacy Boundaries

Meta’s President of Global Affairs, Nick Clegg, said in an interview at the company’s annual Connect conference that protecting users' private content was of the utmost importance. Private messages on the company's platforms were therefore not included in the training data for the model. Meta also took steps to filter private information out of the public datasets used for training.

Shift in Focus to Artificial Intelligence

The recently unveiled Meta AI was the standout product among the first batch of consumer-facing AI tools introduced by Meta’s CEO, Mark Zuckerberg, at Connect, the company's annual products conference. This year’s event focused predominantly on artificial intelligence, in contrast to previous years, which were dedicated primarily to augmented and virtual reality.

Meta’s Learning Model and Functionality

Meta built the virtual assistant on a custom model that combines two components: the Llama 2 large language model, which the company released for public commercial use in July, and a new model called Emu, which generates images in response to text prompts. The resulting product is capable of creating text, audio, and image content.

Wrapping Up the Meta Reveal

The product will also be able to draw on real-time information through a partnership with Microsoft's Bing search engine. With this announcement, Meta reaffirms its commitment to offering users new AI tools while maintaining a strict approach to privacy by deliberately excluding users’ private content from the product’s training data.