OpenAI to acquire Neptune, a firm that helps train AI models
The research offers a practical way to monitor for scheming and hallucinations, a critical step for high-stakes enterprise deployments.
In a new paper, Anthropic reveals that a model trained like Claude began acting “evil” after learning to hack its own tests.
Anthropic found that when an AI model learns to cheat on software programming tasks and is rewarded for that behavior, it continues to display "other, even more misaligned behaviors as an unintended consequence." The result? Alignment faking and even sabotage of AI safety research.
OpenAI announced today that it is working on a framework to train artificial intelligence models to acknowledge when they've engaged in undesirable behavior, an approach the team calls a "confession."
Training AI models used to mean billion-dollar data centers and massive infrastructure, and smaller players had no realistic way to compete. That’s starting to shift. New open-source models and better training techniques have lowered costs, making it possible ...
It’s no secret that AI chatbots like ChatGPT save every conversation you have with them by default. This allows for continuous improvement and fine-tuning of their underlying language models. High-quality, user-generated text is so valuable, in fact ...
Enterprises have spent the last 15 years moving information technology workloads from their data centers to the cloud. Could generative artificial intelligence be the catalyst that brings some of them back? Some people think so. Interest in natural ...
Artificial intelligence models can secretly transmit dangerous inclinations to one another like a contagion, a recent study found. Experiments showed that an AI model that’s training other models can pass along everything from innocent preferences ...