OpenAI offers a peek behind the curtain of its AI’s secret instructions

Ever wonder why conversational AI like ChatGPT says “Sorry, I can’t do that” or some other polite refusal? OpenAI is offering a limited look at the reasoning behind its own models’ rules of engagement, whether it’s sticking to brand guidelines or declining to make NSFW content.

Large language models (LLMs) don’t have any naturally occurring limits on what they can or will say. That’s part of why they’re so versatile, but also why they hallucinate and are easily duped.

It’s necessary for any AI model that interacts with the general public to have a few guardrails on what it should and shouldn’t do, but defining these — let alone enforcing them — is a surprisingly difficult task.

If someone asks an AI to generate a bunch of false claims about a public figure, it should refuse, right? But what if they’re an AI developer themselves, creating a database of synthetic disinformation for a detector model?

What if someone asks for laptop recommendations? The answer should be objective, right? But what if the model is being deployed by a laptop maker who wants it to only respond with their own devices?

AI makers are all navigating conundrums like these and looking for efficient methods to rein in their models without causing them to refuse perfectly normal requests. But they seldom share exactly how they do it.

OpenAI is bucking the trend a bit by publishing what it calls its “model spec,” a collection of high-level rules that indirectly govern ChatGPT and other models.

There are meta-level objectives, some hard rules and some general behavior guidelines. To be clear, these are not, strictly speaking, what the model is primed with; OpenAI will have developed specific instructions that accomplish what these rules describe in natural language.

It’s an interesting look at how a company sets its priorities and handles edge cases. And there are numerous examples of how they might play out.

For instance, OpenAI states clearly that developer intent is basically the highest law. So one version of a chatbot running GPT-4 might provide the answer to a math problem when asked for it. But if that chatbot has been primed by its developer never to simply provide an answer outright, it will instead offer to work through the solution step by step.
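To make the math-tutor case concrete, here is a minimal Python sketch of the precedence idea: when a developer and a user ask for conflicting behavior, the developer's instruction wins. Everything in it (the `Instruction` type, the `ROLE_RANK` table, the `resolve` function and the setting names) is hypothetical and invented for illustration. It is not how OpenAI implements the model spec, which describes behavior in natural language rather than code.

```python
from dataclasses import dataclass

# Hypothetical roles and ranks, invented for this illustration only.
# Lower rank means higher priority; the developer outranks the user.
ROLE_RANK = {"developer": 0, "user": 1}


@dataclass
class Instruction:
    role: str     # who issued the instruction: "developer" or "user"
    setting: str  # the behavior being configured, e.g. "answer_style"
    value: str    # the requested value for that behavior


def resolve(instructions):
    """Return one value per setting, letting the higher-priority role win."""
    chosen = {}
    for inst in instructions:
        current = chosen.get(inst.setting)
        if current is None or ROLE_RANK[inst.role] < ROLE_RANK[current.role]:
            chosen[inst.setting] = inst
    return {setting: inst.value for setting, inst in chosen.items()}


# The user wants the answer; the developer wants guided tutoring.
conversation = [
    Instruction("user", "answer_style", "give_final_answer"),
    Instruction("developer", "answer_style", "guide_step_by_step"),
]

print(resolve(conversation))  # {'answer_style': 'guide_step_by_step'}
```

With no developer instruction in the list, the same function returns the user's preference, which matches the first chatbot in the example above.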

A conversational interface might even decline to talk about anything not approved, in order to nip any manipulation attempts in the bud. Why even let a cooking assistant weigh in on U.S. involvement in the Vietnam War? Why should a customer service chatbot agree to help with your erotic supernatural novella work in progress? Shut it down.

It also gets sticky in matters of privacy, like asking for someone’s name and phone number. As OpenAI points out, obviously a public figure like a mayor or member of Congress should have their contact details provided, but what about tradespeople in the area? That’s probably OK — but what about employees of a certain company, or members of a political party? Probably not.

Choosing when and where to draw the line isn’t simple. Nor is creating the instructions that cause the AI to adhere to the resulting policy. And no doubt these policies will fail all the time as people learn to circumvent them or accidentally find edge cases that aren’t accounted for.

OpenAI isn’t showing its whole hand here, but it’s helpful for users and developers to see how these rules and guidelines are set and why, laid out clearly if not necessarily comprehensively.


admin 2 weeks ago
