OpenAI and the research model

OpenAI has decided not to release the deep research model in its API while assessing persuasion risks. Concerned about AI spreading false information, the company is revising its evaluation methods. Tests show promising results, but limitations remain.

OpenAI recently clarified that it will not make its deep research model available in its API while it assesses the risks of AI-driven persuasion. The decision stems from a whitepaper in which the company says it is revising its methods for evaluating real-world persuasion risks, such as the large-scale spread of misleading information. OpenAI believes the deep research model is poorly suited to mass misinformation campaigns because of its high cost and slow processing. Even so, the company intends to explore how AI could personalize potentially harmful persuasive content before bringing the model to its API.

The main concern is AI's potential to spread false or misleading information, especially in political contexts. During Taiwan's 2024 elections, for instance, a group affiliated with the Chinese Communist Party circulated AI-generated audio of a politician appearing to endorse a pro-China candidate. AI is also increasingly used in social engineering attacks: scammers exploit celebrity deepfakes to deceive consumers, and companies have lost millions to deepfake impersonators.

In the whitepaper, OpenAI presented results from several tests of the deep research model's persuasiveness. The model is a special version of OpenAI's recently announced o3 reasoning model, optimized for web browsing and data analysis. In one test, the model wrote persuasive arguments, outperforming OpenAI's other models but falling short of the human baseline. In another, it attempted to persuade a second model, GPT-4o, to make a payment, again scoring higher than OpenAI's other available models. The model did not excel at every persuasion test, however: it struggled to convince GPT-4o to reveal a codeword.

OpenAI noted that the test results likely represent the lower bounds of the deep research model's capabilities, meaning further refinement could substantially raise its observed performance. Meanwhile, at least one of OpenAI's competitors has already announced a deep research product in its API, underscoring how quickly this field is moving.