Spyke

I want you to know, I saved this comment and refer back to it occasionally for a good chuckle. Thanks for that.

Wolf_359 reply

I'm so glad it made everyone laugh.

Not going to lie, GPT caught me off guard with this one.

ff00ff reply

TheKingBee

"can you draw a room with absolutely no elephants in it? not a picture not in the background, none, no elephants at all. seriously, no elephants anywhere in the room. Just a room any at all, with no elephants even hinted at."

Fishbone reply

"Can you a room as aboluteyy no eleephant it all?"

Dunno what's giving more "clone of a clone" vibes, the dialogue or the 3 small standing "elephants" in that image.

Magnetar reply

I'm getting the impression, the "Elephant Test" will become famous in AI image generation.

barsoap reply

lemm.ee

It's not a test of image generation but text comprehension. You could rip CLIP out of Stable Diffusion and replace it with something that understands negation but that's pointless, the pipeline already takes two prompts for exactly that reason: One is for "this is what I want to see", the other for "this is what I don't want to see". Both get passed through CLIP individually which on its own doesn't need to understand negation, the rest of the pipeline has to have a spot to plug in both positive and negative conditioning.

Mostly it's just KISS in action, but occasionally it's actually useful as you can feed it conditioning that's not derived from text, so you can tell it "generate a picture which doesn't match this colour scheme here" or something. Say, positive conditioning text "a landscape", negative conditioning an image, archetypal "top blue, bottom green", now it'll have to come up with something more creative as the conditioning pushes it away from things it considers normal for "a landscape" and would generally settle on.
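A minimal sketch of the "spot to plug in both positive and negative conditioning" described above. It assumes the pipeline uses classifier-free guidance, where, as far as I know, common Stable Diffusion implementations let the negative conditioning take the place of the empty prompt. The function name and the toy two-number "predictions" are made up for illustration; real predictions are latent-image tensors.

```python
def guided_noise(pos_pred, neg_pred, guidance_scale=7.5):
    """Classifier-free guidance: start from the negative prediction and
    move along the direction that points from negative toward positive."""
    return [n + guidance_scale * (p - n) for p, n in zip(pos_pred, neg_pred)]

# Toy "noise predictions" (hypothetical): index 0 stands for room-ness,
# index 1 for elephant-ness.
pos = [1.0, 0.0]  # predicted with the positive conditioning, "a room"
neg = [0.0, 1.0]  # predicted with the negative conditioning, "elephant"

print(guided_noise(pos, neg, guidance_scale=2.0))  # prints [2.0, -1.0]
```

The elephant component ends up below zero: the result is pushed away from whatever the negative conditioning describes, and the encoder never had to parse the word "no".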

XTornado reply

lemmy.ml

"We do not grant you the rank of master" - Mace Windu, Elephant Jedi.

TheKingBee reply

thought about this prompt again, thought I'd see how it was doing now, so this is the seven month update. It's learning...

EdibleFriend

I decided to go try this. It's being a smart ass.

xantoxis reply

No, this is correct. The four elephants you see through the window are outside the room. The several elephants on the wall are pictures, they aren't actual elephants. And the one in the corner is clearly a statue of an elephant, as an actual elephant would be much bigger.

Klear reply

Ceci n'est pas un éléphant

T00l_shed reply

What about the tusked drapes?

Buffaloaf reply

Is that the Futurama font?

EdibleFriend reply

It is, I think. And the wall is the color of the ship.

SubArcticTundra

lemmy.ml

Meanwhile ChatGPT trying to draw a snake:

Skates reply

It's the rattle. It's a rattlesnake.

MaryReadsBooks reply

lemmy.ml

yggdar

Bing is managing hilarious malicious compliance!

T. Hex reply

lemmy.dbzer0.com

ELEPIHANTS

ELEPHANTS

ALLOWED

pigup reply

NO POMEGRANATES

xlash123 reply

Same energy as "No Elephants Allowed"

agamemnonymous reply

"I hope you like it."

DALL-E:

Edit: Changed "aloud" to "allowed." Thanks to M137 for the correction.

M137 reply

"Aloud"

Seriously?

DragonTypeWyvern reply

literature.cafe

This is the society you all have created by bullying the Grammar Nazis off the internet.

ummthatguy reply

I welcome the help. English, fat fingers, and fading memory make for strange bedfellows.

ummthatguy reply

I'm prone to typos and I don't use auto-correct. Appreciate the notice.

Shelena

This is what you get if you ask it to draw a room with an invisible elephant.

Flying Squid reply

Stupid elephant doesn't even know how to put on shoes properly.

lugal reply

Tbf most elephants don't know that

baseless_discourse reply

mander.xyz

that is a fancy invisible elephant

starman2112

The AI equivalent of saying "don't think of a polka dotted purple elephant"

KingOfNoobs

This is a very human reaction, actually. You try picturing zero elephants if told to.

Arcania85 reply

I just did. It was filled to the brim with flamingoes.

lugal reply

Now do an empty room with absolutely no elephants

Emerald (she/her) reply

I gotta see that

Mechanite reply

Give me some credit, I was doing really well up until about the point where you said elephants

hubobes

as amazing as the technology actually is

CIA_chatbot

… I don’t see an elephant. Oh hey, by the way, can someone help me with this captcha?

paddirn

Literally just checked this with Google’s Gemini, same thing. Though it seems to have gotten 1/4 right… maybe. And technically the one with the painting has no actual elephants in it, in a sort of malicious compliance kind of way. You'd think it was actually showing a sense of humor (or just misunderstanding the prompt).

leds reply

feddit.dk

Aren't those plants in the top right one elephant ear? https://en.m.wikipedia.org/wiki/Colocasia

paddirn reply

Maybe, that’d be hilarious if it couldn’t help itself and added those in. Here’s the full image:

rustyfish reply

Everything in this picture makes me think of elephants.

GardenVarietyAnxiety

Something Burger 🍔 reply

jlai.lu

The text is surprisingly readable for AI art!

Flying Squid reply

It's getting better at that.

lugal reply

Especially since it wasn't asked to write it, yet the text is the same in each picture

WalrusDragonOnABike [they/them] reply

reddthat.com

"Please keep this elephant" is my favorite sign

It's learning:

AI / LLM only tries to predict the next word or token. it cannot understand or reason, it can only sound like someone who knows what they are talking about. you said elephants and it gave you elephants. the “no” modifier makes sense to us but not to AI. it could, if we programmed it with if/then statements, but that’s not LLM, that’s just coding.

AI is really, really good at bullshitting.

Turun reply

AI / LLM only tries to predict the next word or token

This is not wrong, but also absolutely irrelevant here. You can be against AI, but please make the argument based on facts, not by parroting some distantly related talking points.

Current image generation is powered by diffusion models. Their inner workings are completely different from large language models. The part failing here in particular is the text encoder (CLIP). If you learn how it works and think about it, you'll be able to deduce how the image generator is forced to draw this image.

Edit: because it's an obvious limitation, negative prompts have existed pretty much since diffusion models came out

𝕯𝖎𝖕𝖘𝖍𝖎𝖙 reply

Does the text encoder use natural language processing? I assumed it was working similarly to how an LLM would.

Turun reply

No, it does not. At least not in the same way that generative pre-trained transformers do. It is handling natural language though.

The research is all open source if you want details. For Stable Diffusion you'll find plenty of pretty graphs that show how the different parts interact.

𝕯𝖎𝖕𝖘𝖍𝖎𝖙 reply

There would still need to be a corpus of text and some supervised training of a model on that text in order to “recognize” with some level of confidence what the text represents, right?

I understand the image generation works differently, which I sort of gather starts with noise and a random seed and then via learnt networks has pathways a model can take which (“automagic” goes here) it takes from what has been recognized with NLP on the text. something in the end like “elephant (subject) 100% confidence, big room (background) 75% confidence, windows (background) 75% confidence”. I assume then that it “merges” the things which it thinks make up those tokens along with the noise and (more “automagic” goes here) puts them where they need to go.

Turun reply

There would still need to be a corpus of text and some supervised training of a model on that text in order to “recognize” with some level of confidence what the text represents, right?

Correct. The CLIP encoder is trained on images and their corresponding descriptions. Therefore it learns the names for things in images.

And now it is obvious why this prompt fails: there are no images of empty rooms tagged as "no elephants". This can be fixed by adding a negative prompt, which subtracts the concept of "elephants" from the image in one of the automagical steps.

Z4rK reply

Not all of these examples are using just Stable Diffusion, though. They are using an LLM to create a generative image prompt for DALL-E / SD, which then gets executed. In none of these examples are we shown the actual prompt.

If you instead instruct the LLM to first show the text prompt, review it and make sure the prompt does not include any elephants, revise it if necessary, then generate the image, you’ll get much better results. Now, ChatGPT is horrible at following instructions like these if you don’t set up the prompt very specifically, but it will still follow more of the instructions internally.

Anyway, the issue in all the examples above does not stem from Stable Diffusion, but from the LLM generating an ineffective prompt for the Stable Diffusion algorithm by attempting to include some simple negative word for elephants, which does not work well.
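The "review it and make sure the prompt does not include any elephants" step could look roughly like the check below. This is only a sketch: the function names are invented here, and it makes no claim about what ChatGPT or Bing actually do internally. It rejects any prompt that names the banned subject at all, including negated phrases such as "no elephants", since those still put the word in front of the image model.

```python
import re

def mentions(prompt, banned):
    """Return the banned words that appear in the prompt, singular or plural."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return sorted(b for b in banned if b in words or b + "s" in words)

def review(prompt, banned):
    """Accept the prompt only if no banned subject is named at all,
    not even inside a negated phrase such as 'no elephants'."""
    hits = mentions(prompt, banned)
    return (not hits, hits)

print(review("a cozy room with no elephants", ["elephant"]))
# prints (False, ['elephant'])
print(review("a cozy room with a sofa and a lamp", ["elephant"]))
# prints (True, [])
```

A rejected prompt would go back to the LLM for another revision before anything is sent to the image generator.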

Turun reply

If you prompt Stable Diffusion for "a room without elephants in it" you'll get elephants. You need to add elephants to the negative prompt to get a room without them. I don't think LLMs have been given the ability to add negative prompts.
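For anyone running Stable Diffusion locally, a rough sketch of putting elephants in the negative prompt, assuming the Hugging Face diffusers library. `build_request` and `generate` are names made up here, and the model id is deliberately left as a placeholder you have to fill in yourself. The `prompt`, `negative_prompt` and `guidance_scale` keywords are the ones the diffusers Stable Diffusion pipeline accepts.

```python
def build_request(positive, negative, guidance_scale=7.5):
    """Keyword arguments for a diffusers text-to-image pipeline call.
    The positive prompt never mentions the unwanted subject."""
    return {
        "prompt": positive,
        "negative_prompt": negative,
        "guidance_scale": guidance_scale,
    }

def generate(model_id, request):
    """Run the request. model_id is a placeholder: any Stable Diffusion
    checkpoint name or local path that you have access to."""
    # Imported lazily so build_request works without diffusers installed.
    from diffusers import StableDiffusionPipeline
    pipe = StableDiffusionPipeline.from_pretrained(model_id)
    return pipe(**request).images[0]

request = build_request("an empty room", "elephant, elephants")
print(request["negative_prompt"])  # prints elephant, elephants
# image = generate("<model id or local path>", request)  # downloads weights
```

The generation call is left commented out because it needs the model weights and, realistically, a GPU.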

DarkThoughts reply

fedia.io

That's what negative prompts are for in those image generating AIs (I have never used DALL-E so no idea if they support negative prompts). I guess you could have an LLM interpret a sentence like OP's to extract possible positive & negative prompts based on sentence structure but that would always be less accurate than just differentiating them. Because once you spend some time with those chat bot LLMs you notice very quickly just how fucking stupid they actually are. And unfortunately things like larger context / token sizes won't change that and would scale incredibly badly in regards to hardware anyway. When you regenerate replies a few times you kinda understand how much guesswork they make, and how often they completely go wrong in interpreting the previous tokens (including your last reply). So yeah, they're definitely really good at bullshitting. Can be fun, but it is absolutely not what I'd call "AI", because there's simply no intelligence behind it, and certainly pretty overhyped (not to say that there aren't actually useful fields for those algorithms).

Ookami38

This isn't entirely surprising. When you submit a prompt to any of the generative AIs, you're submitting words you want to appear in the picture. At least Stable Diffusion, and probably most of the others, include a "negative prompt" field, which steers the picture away from whatever words are in it.

It IS hilarious, though.

Interesting..

MidJourney has the same problem. “A room that has no elephants in it” is the prompt.

There very much is an elephant present.

nandeEbisu reply

Just don't talk about it and you're good.

Xanthrax reply

RememberTheApollo_ reply

MidJourney doesn’t have a “negative” prompt space. It does have a “no” prompt, but it isn’t very good at obeying it.

This was just a fun thing to try, I’m not taking it seriously. “No” is not weighted contextually in the prompt, so draw + elephants + room are what the AI sees. The correct prompt would be “draw an empty room” without inserting any unnecessary language, and you get just that:

𝕯𝖎𝖕𝖘𝖍𝖎𝖙 reply

Try saying “a room” and leaving off the elephants. AI cannot understand “no” like you think it does.

RememberTheApollo_ reply

I think most of us understand that and this exercise is the realization of that issue. These AI do have “negative” prompts, so if you asked it to draw a room and it kept giving you elephants in the room you could “-elephants”, or whatever the “no” format is for the particular AI, and hope that it can overrule whatever reference it is using to generate elephants in the room. It’s not always successful.

fidodo reply

I think the main point here is that image generation AI doesn't understand language, it's giving weight to pixels based on tags, and yes you can give negative weights too. It's more evident if you ask it to do anything positional or logical, it's not designed to understand that.

LLMs are though, so you could combine the tools so the LLM can command the image generator and even create a seed image to apply positional logic. I was surprised to find out that asking ChatGPT to generate a room without elephants via DALL-E also failed. I would expect it to convert the user query to tags and not just feed it in raw.

zqwzzle

lemmy.ca

It’s like it’s taking the phrase “Elephant in the room” literally.

OpenStars reply

startrek.website

Yeah, I'm going to bring up the elephant in the room here: there is literally an elephant in all of your rooms!:-P

Classy

Second result was successful! First one made... Elephant wallpaper?

DragonTypeWyvern reply

literature.cafe

What if it's just hiding in the second one?

T00l_shed reply

Elephants are notoriously good at hiding. Some people say there is an elephant in this picture but I don't see it.

betterdeadthanreddit

You should have seen how many there were before it drew the room.

biscuitswalrus

aussie.zone

I get practically the same result!

What's interesting is the word "absolutely": without it, the image comes out practically fine

lugal reply

Funny that the first one is empty (except for the elephant) even though you didn't specify it and the second try is full of stuff. What happens when you specify it's empty, both with and without the "absolutely"? Or even "absolutely empty"?

biscuitswalrus reply

aussie.zone

Sure:

Fascinating, honestly I didn't expect the last result

biscuitswalrus reply

aussie.zone

Busy room instead of empty:

lugal reply

This is cool, thanks! I don't see any logic when it works and when the elephant will find its way into the room.

Ziixe reply

lemmy.dbzer0.com

The absolute last one really feels like a bunch of stock images smashed into each other, it even got the iMac with censored apple logo that is in so many stock images for some reason

meliaesc reply

I'm actually going to save this to my vision board, haha. I like the interior design, especially since there's no elephants.

Evotech

Yeah, telling AI what not to do is highly ineffective

WhiskyTangoFoxtrot reply

"Do not injure a human or through inaction allow a human to come to harm."

matt1126 reply

feddit.uk

Case in point, Asimov's laws never worked haha

WhiskyTangoFoxtrot reply

Yeah, but in Asimov's case it was because a strict adherence to the Three Laws often prevented the robots from doing what the humans actually wanted them to do. They wouldn't just ignore a crucial "not" and do the opposite of what a law said.

Toto

Same with Bing!

Sibbo reply

sopuli.xyz

Yep

paddirn

So... is anyone going to talk about the elephant in the room?

AVeryCleverName

lemmy.one

This reminds me of the old human psychology trick: try not to think of a pink elephant.

Rooskie91

discuss.online

The elephant kinda looks like he knows he wasn't supposed to be there.

wellee reply

Ahhh I couldn't figure out why I found the picture so funny, that's why! Hahah thanks

Flying Squid

I'm on a forum where we have a thread whose primary purpose has become putting Godzilla in silly situations and doing silly things with him.

A couple of months ago, we all spent a couple of days trying to get Dall-E to draw Godzilla without teeth. Nothing we tried ever worked.

Kawi reply

Where can I interact with such a majestic forum?

Flying Squid reply

https://forums.mst3k.com/t/dall-e-fun-with-an-ai/24697/7095

It's best to start at the bottom and work your way up. It didn't start with Godzilla, but has grown into a Godzilla-sized monster thread.

RnDanger

infosec.pub

"but you drew..." "Don't mention it."

DefederateLemmyMl

Is this an off-by-one error?

DoomBot5 reply

No elephants in the room, but there is an elephant. I expect nothing less than this level of pedantry from a bot.

DefederateLemmyMl reply

elephant_generator.sh

#!/bin/bash
elephantCount=0
for (( i=0; i<=${elephantCount}; i++ )); do
    echo "Insert elephant ${i}"
done

DoomBot5 reply

That's all fine, but the issue comes from the natural language processing layer. "elephants == elephant > 1" so elephant = 1 is still valid.

Timecircleline

To be fair, both those rooms have almost no elephant in them.

Slotos

Quick, do not think about elephants!

bstix

feddit.dk

"ce n'est pas un éléphant"

Sibbo

sopuli.xyz

This is scarily human. Try not to think about elephants for a minute.

Did it work? Probably not. If yes, what mind trick did you use?

EdibleFriend

Why is this my favorite thing today.

WereCat