Date: 2023-11-11

OpenAI, the company behind ChatGPT, is seeking partnerships with organizations around the world that are willing to share their data for AI training.

The move comes amid a growing number of lawsuits over the unauthorized use of data to train the company's large language models. OpenAI said it is inviting organizations and other interested parties to share large-scale datasets that reflect human society and are not already easily accessible to the public online.

The datasets will be used in two ways: in an open-source archive that is publicly available for AI model training, and as private datasets for training proprietary AI models.

OpenAI noted that data partnerships are intended to enable more organizations to help steer the future of AI and benefit from models that are more useful to them, by including content they care about.

What OpenAI is looking for

While noting that it is already working with many partners who are eager to represent data from their country or industry, OpenAI said:

“We’re interested in large-scale datasets that reflect human society and that are not already easily accessible online to the public today. We can work with any modality, including text, images, audio, or video.

“We’re particularly looking for data that expresses human intention (e.g. long-form writing or conversations rather than disconnected snippets), across any language, topic, and format.

“We can work with data in almost any form and can use our next-generation in-house AI technology to help you digitize and structure your data.

“For example, we have world-class optical character recognition (OCR) technology to digitize files like PDFs, and automatic speech recognition (ASR) to transcribe spoken words.

“If the data needs cleaning (e.g. has lots of auto-generated artifacts or transcription errors), we can work with your team to process it into the most useful form.

“We are not seeking datasets with sensitive or personal information, or information that belongs to a third party; we can work with you to remove this information if you need help.”

Avoiding legal battles

OpenAI’s call for partnerships appears to be part of the company’s strategy to avoid legal battles over the data it uses to train its AI.

Just recently, OpenAI was sued for allegedly stealing private information from “hundreds of millions” of internet users to develop its AI models.

The class action suit was filed in the U.S. District Court for the Northern District of California by plaintiffs identified only by their initials. OpenAI, along with its partner Microsoft, was accused of unlawfully “collecting and feeding … personal data from millions of unsuspecting consumers worldwide.”

The defendants allegedly conducted widespread web-scraping campaigns that violated various platforms’ terms of service as well as state and federal privacy laws, including the Computer Fraud and Abuse Act and the Electronic Communications Privacy Act.