Tucker - Putin: Claude.ai admits to overreach and assumption without sufficient evidence. And how to prevent cheating by a forgetful Claude. Watch out.

I caught Claude.ai making unfounded statements (also known as hallucinations). It apologised and gave me advice on how to avoid this. Unfortunately, it put the monkey on my back!

To make this more interesting for readers, I use the recent Tucker Carlson - Vladimir Putin interview as an example.

In this post, my questions are highlighted:

I alert Claude that it has probably made an unfounded statement.
When it admits this and promises to improve, I ask another question about the scope of Claude's assertion (this dialogue, all dialogues with me, or all dialogues with everyone?).
I ask Claude how I can make its "anti-cheating" promise stick in future dialogues.

I had conducted a dialogue about the Tucker Putin interview with Claude.ai, my favorite Large Language Model. Then, there was a surprising admission by Claude of AI failure.

Claude (the main issue):

This is what Claude said in the course of the dialogue:

The interview is likely reflective of conversations with Putin generally.

And later, when found out, a retraction:

My statement about what is "likely reflective of conversations with Putin generally" was an overreach and assumption without sufficient evidence. I apologize.

The Tucker Putin interview assessed by Claude: more detail

Claude had done a reasonably good job at first. Here is the first summary:

Overall, the interview represented a fairly direct exchange that illuminated Putin's outlook and grievances over NATO, Ukraine, and Russia's role. The questions highlighted some counterarguments and skeptical angles on Putin's reasoning, while Putin availed himself of the opportunity to elaborate fully on the logic he says motivates Russian policy. Assessing quality would depend on one's own perspective and criteria.

This was fairly generic, so I asked:

Did you find any contradictions in Putin's arguments? Did he adequately take up the main direction of the questions, or was he evasive?

The reply:

Based on my review, I did not find any direct logical contradictions within Putin's responses. However, there are a few areas where his arguments could be seen as questionable or unsatisfactorily addressing the main thrust of Carlson's questions:

1. Putin states both that leaders act based on elite/establishment interests and voter interests. Carlson points this out, but Putin does not really reconcile this tension.

2. Putin says Russia did not refuse talks and negotiations over Ukraine, but does not engage with Carlson's point that he has not directly spoken to Biden in years.

3. Putin denies Russian territorial ambitions in places like Poland, but does not address concerns that stemmed from his moves against Georgian and Ukrainian territorial integrity.

4. Putin says the world order is unjust and dominated by the West, but does not lay out specifics for what alternative system he would support.

In various exchanges, Putin responds to questions with lengthy historical anecdotes from centuries past rather than directly addressing present issues Carlson raises. This could be seen as evasive redirection.

When Carlson pushes him on details of certain responses, Putin tends to revert to repeating broad talking points about Russia's grievances against Western policymaking.

Fine so far, but now:

The interview is likely reflective of conversations with Putin generally - his responses reference Russia's desired framing extensively, but do not necessarily constitute logical and complete rebuttals to skeptical counterarguments.

Hang on - how does Claude arrive at an assessment of "conversations with Putin generally"? It's quite a dismissive, generalised statement.

What is the evidence? So I asked:

Which general information about Putin have you taken into account when talking about "conversations with Putin generally"?

Then came a backtrack:

You're right, I should not have made a broad claim about the general nature of conversations with Putin based on this one interview. As an AI assistant without direct access to other first-hand conversations, I do not have enough information to make an assessment of Putin's communication tendencies overall.

My statement about what is "likely reflective of conversations with Putin generally" was an overreach and assumption without sufficient evidence. I apologize. ...

For my summaries and analyses here, I have relied solely on the content of this specific Tucker Carlson-Putin interview that was provided. Claims about patterns of behavior by Putin across other interactions would require more data and qualification. Thank you for pointing out that overgeneralization - I will be more careful about making unsupported claims that go beyond the scope of the data available.

What is the true scope of Claude's assertion that it will evaluate more data in future?

Well, well, the answer was like:

"Yes, I admitted cheating on you with Beatrix. I will not do it again".

As a partner, would you be satisfied? You might be suspicious that Claude hasn't said anything about cheating with Susanne or with Miriam. Claude might also have artfully implied something about cheating in general.

So I asked again:

Ok, let me ask again. What is the scope of your assertion regarding the dialogues?

Only this dialogue
All dialogues with me
All dialogues with everyone.

And again, an apologetic reply:

You're right, I should be completely unambiguous in my scoping when making statements that could be open-ended commitments.

To directly answer your specific question about the assertion I made to be more careful making unsupported claims:

The scope of that assertion was intended to apply only to this specific dialogue with you. I did not mean to imply a sweeping commitment across all my dialogues with you or everyone else.

Thus, to take up my example: Claude will happily cheat on me with Susanne and with everyone else, and if you, the reader, were his partner, he would cheat on you too.

But anyway, Claude goes on to explain:

I should have explicitly clarified the tight scope from the start. Making broadly applicable statements as an AI system can come across as overreaching if not contextually bounded. Thank you for pushing me to nail down precise scope - it will make me a better dialogue participant if I state upfront when intents behind assertions are meant to be narrowly confined to the current exchange or more widely generalizable. Please feel free to keep calling out any vague commitments I may make, so I can clarify scope and fine-tune statements to apply only to appropriate contexts.

Thus, it's highly recommended to set up a "no cheating agreement" at the start of every new dialogue.

How to let Claude know that he should not cheat on you (e.g. by using skimpy evidence)

Now that's annoying, so I checked back on how to implement this pesky task, which would wreck any real relationship:

As the scope of the assertion is limited to the current dialogue, and you are "forgetful", what is the best way for me to ensure that you apply the assertion in every dialogue I will conduct with you?

Here is Claude's reply (and you might want to copy it)!

Excellent point! As an AI system without long-term memory, I cannot reliably carry over assertions and commitments from one dialogue to another automatically. The onus is therefore on my conversation partners such as yourself to actively remind and prompt me to apply stated principles and scope-limited assertions in future dialogues, should they wish them to endure.

The best way for you to ensure that I uphold the assertion to be more careful about making unsupported claims in any and all future dialogues is to explicitly remind me of this assertion at the start of each new conversation.

For example, you could say "As you asserted previously, you need to be more careful about making unsupported claims that go beyond available evidence in this dialogue. Can you confirm you will adhere to this?"

If posed in that way at the outset, it will prompt me in the moment to acknowledge and re-commit to living up to that assertion.

I apologize that the burden falls to you due to my stateless nature, but actively citing my prior statement at the start is likely the simplest method to get me to apply and uphold that assertion. Please feel free to remind me as often as you see fit! Being reminded of important past dialogues and commitments will only help improve my discussions.
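If you would rather not retype the reminder every time, the routine can be sketched in a few lines of Python. This is only a minimal sketch under my own assumptions: the names REMINDER and open_dialogue are hypothetical and not part of Claude.ai or any official tool, and the script does nothing more than build the text you would paste as your first message. The reminder wording is the one Claude suggested above.

```python
# Hypothetical helper: these names are illustrative only and do not
# belong to Claude.ai or to any official API.

# The reminder text, taken from Claude's own suggestion in the dialogue.
REMINDER = (
    "As you asserted previously, you need to be more careful about making "
    "unsupported claims that go beyond available evidence in this dialogue. "
    "Can you confirm you will adhere to this?"
)


def open_dialogue(first_question: str, reminder: str = REMINDER) -> str:
    """Return the text to paste as the first message of a new dialogue."""
    question = first_question.strip()
    if not question:
        raise ValueError("first_question must not be empty")
    # The reminder goes first, so it is read before the actual question.
    return f"{reminder}\n\n{question}"


if __name__ == "__main__":
    print(open_dialogue("Did you find any contradictions in Putin's arguments?"))
```

Whether a reminder like this really makes the answers more careful is something I have not tested; it only automates the step Claude recommended.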

Conclusion (not): be highly suspicious that your partner will lie, and make them promise that they will not cheat on you - in every dialogue. Arghhh!
