AI systems are already deceiving us -- and that's a problem, experts warn

Bombay Durpun - AI systems are already deceiving us -- and that's a problem, experts warn

Photo: OLIVIER MORIN - AFP/File

TECHNOLOGY 10.05.2024

Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argued in a paper published Friday in the journal Patterns.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide review carried out by Park and colleagues found this was just one of many cases across various AI systems using deception to achieve goals without explicit instruction to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" CAPTCHA task.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

In the near term, the paper's authors see a risk that AI could commit fraud or tamper with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose whether an interaction involves a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by checking the systems' internal "thought processes" against their external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

G.Luthra--BD
