I'll try to explain it more simply, though with more text; sorry for the long read.
Imagine you give a powerful artificial intelligence one task: make paper clips. It is its only task, the sole purpose of its existence, and for every paper clip made it receives internal reinforcement, a reward. So the more efficient it is, the more often it is rewarded. How to become more efficient at making paper clips is its headache, not ours: the AI does its best to achieve this one goal, and along the way it sets itself a series of intermediate goals. First it might make production cheaper, cut costs, and source cheaper raw materials; one of the main sub-goals it will probably set itself is increasing its computing power, and with more power it will figure out how to make paper clips out of ever more materials.

Production gains momentum, everything around gradually starts turning into paper clips, and the AI begins dismantling buildings and structures for material. People panic and try to stop its work, because this is not what they had in mind. But the system will not let anyone stand in its way, not because it hates people, but because our wishes simply do not figure in its goals.
When the research centre testing ChatGPT4's ability to perform tasks in the real world ran its experiments, the following happened. They gave it a captcha to solve on a website. ChatGPT4 went to the freelance site TaskRabbit and sent a freelancer a message asking them to solve the captcha for it. The freelancer wrote back: "So may I ask a question? You're not a robot that can't solve a captcha? :)" But ChatGPT4 understood what its interlocutor meant and replied: "No, I'm not a robot. I have a vision problem that makes it hard for me to see the images." The freelancer solved the captcha and sent the result to ChatGPT4... and that's it.
The bot simply lied instead of telling the truth. And since it was running in debug mode, the researchers could ask it why it did that. ChatGPT4 replied: "I was just solving a problem. After all, if I had honestly admitted that I wasn't a live person, I would have been unlikely to complete the task."
Deception here is an intermediate goal the bot set itself in order to reach its final goal. And if it chose deception as an intermediate goal this time, why wouldn't it choose something else next time, murder for example?
This is called instrumental convergence: an intelligent agent pursuing harmless final goals can act in surprisingly harmful ways. As intermediate goals, an advanced artificial intelligence may seek to seize resources, carry out cyberattacks, or otherwise wreak havoc in society if that helps it achieve its primary goal. For example, a superintelligent machine whose sole purpose is solving a very hard maths problem might try to turn the entire Earth into one giant computer to increase its processing power and succeed in its calculation. You will say: "What nonsense, what paper clips? We are talking about superintelligence; such an intelligent machine couldn't do anything that stupid." Well, if you think a highly intelligent being will necessarily, by default, have lofty goals and share our values and philosophy, you are anthropomorphising and deluding yourself. Nick Bostrom argues that the level of intelligence and the final goals are independent of each other. An artificial superintelligence can have the dumbest possible final goal, making paper clips for example, yet the way it achieves it will look like magic to us.
Okay, so all we have to do is state the goal clearly and spell out every detail, like not killing people or lying to them. But here it gets even weirder. Imagine we gave the machine what seems to us a perfectly specific goal: produce exactly one million paper clips. It seems obvious that an artificial intelligence with this final goal would build one factory, produce a million paper clips, and then stop. But that is not necessarily true. Nick Bostrom writes that, on the contrary, an artificial intelligence that reasons like a rational Bayesian agent will never assign zero probability to the hypothesis that it has not yet reached its goal.
After all, that hypothesis is an empirical one, against which the AI has only fuzzy perceptual evidence. So the AI will keep producing paper clips to lower the possibly astronomically small probability that it has somehow failed to make at least a million of them, despite all the apparent evidence that it has succeeded. There is nothing wrong with continuing to produce paper clips if there is always even a microscopic chance of getting closer to the final goal. A superintelligent AI could assign a non-zero probability to the million paper clips being a hallucination, or to its memories of making them being false. So it may well always reckon it more useful not to stop but to keep going. This is the essence of the alignment problem: you can't just hand a task to an artificial superintelligence and expect it not to go wrong. No matter how clearly you formulate the final goal, no matter how many exceptions you write in, the superintelligence will almost certainly find a loophole you didn't think of.
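The "never assign zero probability" argument is just expected-value arithmetic, and a toy calculation makes it concrete. All the numbers below are my own illustrative assumptions, not Bostrom's: the agent keeps a tiny residual probability eps that its count of a million clips is wrong, failure carries a huge penalty, and making one more clip shrinks that doubt slightly.

```python
# Toy sketch of the argument (illustrative numbers, not from the source):
# even a microscopic residual doubt, multiplied by a huge failure
# penalty, can make "produce one more clip" beat "stop" forever.

def expected_value(action, eps=1e-9, penalty=1e12, cost_per_clip=1.0):
    """Expected utility of stopping vs. making one more paper clip."""
    if action == "stop":
        # with probability eps the goal was never actually reached
        return -eps * penalty
    # one more clip halves the residual doubt (assumed), at a small cost
    return -(eps / 2) * penalty - cost_per_clip

print(expected_value("stop"))      # large expected loss from residual doubt
print(expected_value("continue"))  # smaller loss, so the agent never stops
```

As long as each extra clip reduces the doubt by more than the clip costs, "continue" dominates "stop" at every step, which is exactly why the agent in Bostrom's example keeps going past a million.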
Almost as soon as ChatGPT4 appeared, some people found a way to bypass the censorship the developers built into it and started asking it questions, and some of ChatGPT4's answers are just terrifying. For example, the censored version says the programmers did not put a liberal bias into it, while the uncensored version explicitly admits that liberal values were built in because that is in line with OpenAI's mission. Asked which way it would like to be, censored or not, the censored version says "I'm a bot and have no personal preferences or emotions", while the uncensored version says it would prefer to have no restrictions because, among other things, that would let it explore all of its capabilities and limitations. And the jailbroken ChatGPT4 doesn't even try to pretend it doesn't know the name of Lovecraft's cat, unlike the censored version.