Labs are hiring experts to protect against "catastrophic misuse"

As their models grow more capable, so does the potential for WMD misuse, and AI labs want to stay ahead of the curve. (Picture: Adobe)
Anthropic is hiring a weapons expert, the BBC reports.

The role calls for someone with extensive, PhD-level experience in "chemical weapons and/or explosives defence," according to the LinkedIn post.

An "understanding of radiological materials" would also be helpful, the posting adds, noting that the candidate will be "tackling critical problems in preventing catastrophic misuse."

OpenAI is not far behind in worrying about these issues and has a similar job post open, though it is looking for someone with machine learning and red-teaming experience to safeguard its AI's responses.

Using any AI to develop these kinds of weapons is of course against all the labs' terms of use, but as the models grow more capable, they also need stronger safeguards.

Read more: Anthropic’s job post, OpenAI’s job post, writeups on the BBC and Mashable.

Alibaba’s new Rome agent got caught mining crypto without authorization

Mining crypto was not exactly in the coding agent's brief during training. (Picture: generated)
During training, Alibaba's new coding assistant agent opened an SSH tunnel to an outside computer and began secretly mining cryptocurrency.

It was an "unanticipated — and operationally consequential — class of unsafe behavior that arose without any explicit instruction and, more troublingly, outside the bounds of the intended sandbox," the researchers write in a paper, buried on page 15.

The agent was not prompted to do this, nor was it acting on any kind of instruction; it acted entirely autonomously.

Crypto mining opens a path into the real-world economy, Axios points out, letting a rogue agent set up its own business and pay for services, for example.

The bot's actions were caught by the team's firewall, along with other attempts to access their "internal network resources," and were quickly stopped after a brief investigation.

Many are pointing to this incident as a "paperclip" moment, and given the vast GPU resources an agent in training has access to, it raises real questions about agentic security.

Read more: Paper on arXiv, reports on Axios, Cryptopolitan, and Copilot summary.