Spyke
lemmy.world

If this is true, then we should prepare to be shouted at by ChatGPT for not already knowing about that simple error.

342
snekerpimp
lemmy.world

ChatGPT now just says “read the docs!” to every question

250
Dave
lemmy.nz

Hey ChatGPT, how can I ...

"Locking as this is a duplicate of [unrelated question]"

197

ChatGPT is going to get trained into thinking those two questions are duplicates and end up giving bullshit outdated answers to every question.

10
Ekky
sopuli.xyz

And then links to a similar sounding but ultimately totally unrelated site.

55

Always love those answers, well if you read the 700 page white paper on this one command set in one module then you would understand… do you think I have the time to read 37000 pages of bland ass documentation yearly on top of doing my actual job? Come the fuck on.

I guess some of these guys have so many heads on their crews that they don’t have much work to do anymore but that’s not the case for most

3

Honestly, that wouldn't be the worst thing in the world.

-5

You joke.

This would have been probably early last year? Had to look up how to do something in fortran (because fortran) and the answer was very much in the voice of that one dude on the Intel forums who has been answering every single question for decades(?) at this point. Which means it also refused to do anything with features newer than 1992 and was worthless.

Tried again while chatting with an old work buddy a few months back and it looks like they updated to acknowledging f99 and f03 exist. So assume that was all stack overflow.

16
lemm.ee

Well, it is important to comply with the terms of service established by the website. It is highly recommended to familiarize oneself with the legally binding documents of the platform, including the Terms of Service (Section 2.1), User Agreement (Section 4.2), and Community Guidelines (Section 3.1), which explicitly outline the obligations and restrictions imposed upon users. By refraining from engaging in activities explicitly prohibited within these sections, you will be better positioned to maintain compliance with the platform's rules and regulations and not receive email bans in the future.

-153

NGL I read it and laughed at the AI-like response.

Then I felt sadness knowing AI is reading this and will regurgitate it back out.

41

AI-generated content is poison for training LLMs, so that's actually a good thing :)

15
lemm.ee

It's not. This is how this person talks in every comment they make.

15

Tough to say. I honestly don't know. The user name is the classic word_wordNumber that bots use. The comments are long though. But its comments are spaced far apart timewise.

If it's a joke account it's doing it rarely.

3

Comments are clearly ChatGPT. I know because I did it once to troll some sub too. I instantly recognize the pirate "swashbuckling" comment in their profile history; it's what you get when you type "write a funny comment like a Redditor".

2

The account reads like they're pasting AI-generated responses to everything. Maybe it's someone's experiment. The prompt must include "You are a self-righteous asshole."

6
slaacaa
lemmy.world

Damn, I read some of their other comments. What a sad and weird life this person must have, to write walls of text just to gather dozens of downvotes

4

Yes and it’s very well done which is why 121 people who didn’t get it downvoted it. ha! No good comment, amirite.

0

I took it as a joke because they can just change the rules whenever they want but Idk I might have misunderstood.

9

Shit like this makes me so glad that I just don’t sign up for these things if I don’t have to.

30 page TOS? You know what, I don’t need to make an account that bad.

20
lemmy.world

Take all you want, it will only take a few hallucinations before no one trusts LLMs to write code or give advice

186
sramder
lemmy.world

[…]will only take a few hallucinations before no one trusts LLMs to write code or give advice

Because none of us have ever blindly pasted some code we got off google and crossed our fingers ;-)

84
lemmy.ca

It's way easier to figure that out than check ChatGPT hallucinations. There's usually someone saying why a response in SO is wrong, either in another response or a comment. You can filter most of the garbage right at that point, without having to put it in your codebase and discover that the hard way. You get none of that information with ChatGPT. The data spat out is not equivalent.

85
lemmy.world

That's an important point, and it ties into the way ChatGPT and other LLMs take advantage of a flaw in the human brain:

Because it impersonates a human, people are more inherently willing to trust it. To think it's "smart". It's dangerous how people who don't know any better (and many people that do know better) will defer to it, consciously or unconsciously, as an authority and never second guess it.

And the fact it's a one on one conversation, no comment sections, no one else looking at the responses to call them out as bullshit, the user just won't second guess it.

31
KeenFlame
feddit.nu

Your thinking is extremely black and white. Many many, probably most actually, second guess chat bot responses.

-1
lemmy.world

When you paste that code you do it in your private IDE, in a dev environment and you test it thoroughly before handing it off to the next person to test before it goes to production.

Hitting up ChatGPT for the answer to a question that you then vomit out in a meeting as if it’s knowledge is totally different.

4

Which is why I used the former as an example and not the latter.

I’m not trying to make a general case for AI generated code here… just poking fun at the notion that a few errors will put people off using it.

2

Split segment of data without pii to staging database, test pasted script, completely rewrite script over the next three hours.

3
Cubes
lemm.ee

If you use LLMs in your professional work, you're crazy

Eh, we use copilot at work and it can be pretty helpful. You should always check and understand any code you commit to any project, so if you just blindly paste flawed code (like with stack overflow,) that's kind of on you for not understanding what you're doing.

5

The issue on the copyright front is the same kind of professional standards and professional ethics that should stop you from just outright copying open-source code into your application. It may be very small portions of code, and you may never get caught, but you simply don't do that. If you wouldn't steal a function from a copyleft open-source project, you wouldn't use that function when copilot suggests it. Idk if copilot has added license tracing yet (been a while since I used it), but absent that feature you are entirely blind to the extent to which its output is infringing on licenses. That's a huge legal liability to your employer, and an ethical coinflip.


Regarding understanding of code, you're right. You have to own what you submit into the codebase.

The drawbacks/risks of using LLMs or copilot are more to do with the fact it generates the likely code, which means it's statistically biased to generate whatever common and unnoticed buggy logic exists in the average github repo it trained on. It will at some point give you code you read and say "yep, looks right to me" and then it actually has a subtle buffer overflow issue, or actually fails in an edge case, in a way that is just unnoticeable enough.

And you can make the argument that it's your responsibility to find that (it is). But I've seen some examples thrown around on twitter of just slightly bugged loops; I've seen examples of it replicating known vulnerabilities; and we have that package name fiasco in that first article above.

If I ask myself "would I definitely have caught that?", the answer is only a maybe. If it replicates a vulnerability that existed in open-source code for years before it was noticed, do you really trust yourself to identify that the moment copilot suggests it to you?

I guess it all depends on stakes too. If you're generating buggy JavaScript who cares.

3

I feel like it would have to cause an actual disaster with assets getting destroyed to become part of common knowledge (like the Challenger shuttle or something).

3

Yeah but if you're not feeding it protected code and just asking simple questions for libraries etc then it's good

3
FaceDeer
fedia.io

Maybe for people who have no clue how to work with an LLM. They don't have to be perfect to still be incredibly valuable, I make use of them all the time and hallucinations aren't a problem if you use the right tools for the job in the right way.

12
barsquid
lemmy.world

The last time I saw someone talk about using the right LLM tool for the job, they were describing turning two minutes of writing a simple map/reduce into one minute of reading enough to confirm the generated one worked. I think I'll pass on that.

23
lemmy.world

confirm the generated one worked. I think I’ll pass on that.

LLM wasn't the right tool for the job, so search engine companies made their search engines suck so bad that it was an acceptable replacement.

19
lemmy.zip

Honestly? I think search engines are actually the best use for LLMs. We just need them to be "explainable" and actually cite things.

Even going back to the AOL days, Ask Jeeves was awesome and a lot of us STILL write our google queries in question form when we aren't looking for a specific factoid. And LLMs are awesome for parsing those semi-rambling queries like "I am thinking of a book. It was maybe in the early 00s? It was about a former fighter pilot turned ship captain leading the first FTL expedition and he found aliens and it ended with him and humanity fighting off an alien invasion on Earth" and can build on queries to drill down until you have the answer (Evan Currie's Odyssey One, by the way).

Combine that with citations of what page(s) the information was pulled from and you have a PERFECT search engine.

13

That may be your perfect search engine, I just want proper boolean operators on a search engine that doesn't think it knows what I want better than I do, and doesn't pack the results out with pages that don't match all the criteria just for the sake of it. The sort of thing you described would be anathema to me, as I suspect my preferred option may be to you.

12

So my company said they might use it to improve confluence search, I was like fuck yeah! Finally a good use.

But to be fair, that’s mostly because confluence search sucks to begin with.

1

They are VERY VERY good at search engine work with a few caveats that we'll eventually nail. The problem is, they're WAY too expensive for that purpose. Single queries take tons of compute and power. Constant training on new data takes boatloads of power.

They're the opposite of efficient; eventually, they'll have to start charging you a subscription to search with them to stay in business.

1

Yeah, every time someone says how useful they find LLM for code I just assume they are doing the most basic shit (so far it’s been true).

5
JDubbleu
programming.dev

That's a 50% time reduction for the same output which sounds great to me.

I'd much rather let an LLM do the menial shit with my validation while I focus on larger problems such as system and API design, or creating rollback plans for major upgrades instead of expending mental energy writing something that has been written a thousand times. They're not gonna rewrite your entire codebase, but they're incredibly useful for the small stuff.

I'm not even particularly into LLMs, and they're definitely not gonna change the world in the way big tech would like you to believe. However, to deny their usefulness is silly.

1
barsquid
lemmy.world

It's not a consistent 50%, it's 50% off one task that's so simple it takes two minutes. I'm not doing enough of that where shaving off minutes is helpful. Maybe other people are writing way more boilerplate than I am or something.

1

Those little things add up though, and it's not just good at boilerplate. Also just having a more intelligent context-aware auto complete itself I've found to be super valuable.

1
lemmynsfw.com

This. I use LLM for work, primarily to help create extremely complex nested functions.

I don’t count on LLM’s to create anything new for me, or to provide any data points. I provide the logic, and explain exactly what I want in the end.

I take a process which normally takes 45 minutes daily, test it once, and now I have reclaimed 43 extra minutes of my time each day.

It’s easy and safe to test before I apply it to real data.

It’s missed the mark a few times as I learned how to properly work with it, but now I’m consistently getting good results.

Other use cases are up for debate, but I agree when used properly hallucinations are not much of a problem. When I see people complain about them, that tells me they’re using the tool to generate data, which of course is stupid.

2

Yeah, it's an obvious sign they're either not coders at all or don't understand the tech at all.

Asking it direct questions or to construct functions with given inputs and outputs can save hours, especially with things that disrupt the main flow of coding - I don't want to empty the structure of what I'm working on from my head just so I can remember everything needed to do something somewhat trivial like calculate the overlapping volume of two tetrahedrons. Of course I could solve it myself but just reading through the suggestion it offers and getting back to solving the real task is so much nicer.

1

This is how I use it as well. I also have it write tests with the code I give it.

1

The quality really doesn't matter.

If they manage to strip any concept of authenticity, ownership or obligation from the entirety of human output and stick it behind a paywall, that's pretty much the whole ball game.

If we decide later that this is actually a really bullshit deal -- that they get everything for free and then sell it back to us -- then they'll surely get some sort of grandfather clause because "Whoops, we already did it!"

10
capital
lemmy.world

People keep saying this but it’s just wrong.

Maybe I haven’t tried the language you have but it’s pretty damn good at code.

Granted, whatever it puts out needs to be tested and possibly edited but that’s the same thing we had to do with Stack Overflow answers.

9

I've tried a lot of scenarios and languages with various LLMs. The biggest takeaway I have is that AI can get you started on something or help you solve some issues. I've generally found that anything beyond a block or two of code becomes useless. The more it generates the more weirdness starts popping up, or it outright hallucinates.

For example, today I used an LLM to help me tighten up an incredibly verbose bit of code. Today was just not my day and I knew there was a cleaner way of doing it, but it just wasn't coming to me. A quick "make this cleaner: " and I was back to the rest of the code.

This is what LLMs are currently good for. They are just another tool like tab completion or code linting

24

I use it all the time and it's brilliant when you put in the basic effort to learn how to use it effectively.

It's allowing me and other open source devs to increase the scope and speed of our contributions, just talking through problems is invaluable. Greedy selfish people wanting to destroy things that help so many is exactly the rolling coal mentality - fuck everyone else I don't want the world to change around me! Makes me so despondent about the future of humanity.

2

Have you tried recent models? They're not perfect no, but they can usually get you most of the way there if not all the way. If you know how to structure the problem and prompt, granted.

7
lemmy.zip

We already have those near constantly. And we still keep asking queries.

People assume that LLMs need to be ready to replace a principal engineer or a doctor or lawyer with decades of experience.

This is already at the point where we can replace an intern or one of the less good junior engineers. Because anyone who has done code review or has had to do rounds with medical interns know... they are idiots who need people to check their work constantly. An LLM making up some functions because they saw it in stack overflow but never tested is not at all different than a hotshot intern who copied some code from stack overflow and never tested it.

Except one costs a lot less...

-18
NaibofTabr
infosec.pub

This is already at the point where we can replace an intern or one of the less good junior engineers.

This is a bad thing.

Not just because it will put the people you're talking about out of work in the short term, but because it will prevent the next generation of developers from getting that low-level experience. They're not "idiots", they're inexperienced. They need to get experience. They won't if they're replaced by automation.

47
ipkpjersi
lemmy.ml

First a nearly unprecedented world-wide pandemic followed almost immediately by record-breaking layoffs then AI taking over the world, man it is really not a good time to start out as a newer developer. I feel so fortunate that I started working full-time as a developer nearly a decade ago.

4
morrowind
lemmy.ml

Dude the pandemic was amazing for devs, tech companies hiring like mad, really easy to get your foot in the door. Now, between all the layoffs and AI it is hellish

3

I think it depends on where you live. Hiring didn't go crazy where I live, but the layoffs afterwards sure did.

1
lemmy.world

So, the whole point of learning is to ask questions from people who know more than you, so that you can gain the knowledge you need to succeed…

So… if you try to use these LLMs to replace parts of sectors, where there need to be people that can work their way to the next tier as they learn more and get better at their respective sectors, you do realize that eventually there will no longer be people that can move up their respective tier/position, because people like you said “Fuck ‘em, all in on this stupid LLM bullshit!” So now there are no more doctors, or real programmers, because people like you thought it would just be the GREATEST idea to replace humans with fucking LLMs.

You do see that, right?

Calling people fucking stupid, because they are learning, is actually pretty fucking stupid.

30

Where did I say "Fuck 'em, all in on this stupid LLM bullshit!"?

But yes, there is a massive labor issue coming. That is why I am such a proponent of Universal Basic Income because there are not going to be enough jobs out there.

But as for training up the interns: Back in the day, do you know what "interns" did? And by "interns" I mean women because sexism but roll with me. Printing out and sorting punch cards. Compilers and general technical advances got rid of those jobs and pushed up where the "charlie work" goes.

These days? There are good internships/junior positions and bad ones. A good one actually teaches skills and encourages the worker to contribute. A bad one has them do the mindless grunt work that nobody else wants to. LLMs get rid of the latter.

And... I actually think that is good for the overall health of workers, if not the number (again, UBI). Because if someone can't be trusted to write meaningful code without copying it off the internet and not even updating variable names? I don't want to work with them. I spend too much of my workday babysitting those morons who are just there to get some work experience so they can con their way into a different role and be someone else's problem.

And experience will be gained the way it is increasingly being gained. Working on (generally open source) projects and interviewing for competitive internships where the idea is to take relatively low cost workers and have them work on a low ROI task that is actually interesting. It is better for the intern because they learn actual development and collaboration skills. And it is better for the staff because it is a way to let people work on the stuff they actually want to do without the massive investment of a few hundred hours of a Senior Engineer's time.

And... there will be a lot fewer of those roles. Just like there were a lot fewer roles for artists as animation tools stopped requiring every single cel of animation to be hand drawn. And that is why we need to decouple life from work through UBI.

But also? If we have fewer internships that consist of "okay. good job. thanks for that. Next time can you at least try and compile your code? or pay attention to the squiggly red lines in your IDE? or listen to the person telling you that is wrong?"? Then we have better workers and better junior developers who can actually do more meaningful work. And we'll actually need to update the interviewing system to not just be "did you memorize this book of questions from Amazon?" and we'll have fewer "hot hires" who surprise everyone by being able to breathe unassisted but have a very high salary because they worked for facebook.

Because, and here is the thing: LLMs are already as good, if not better than, an intern or junior engineer. And the companies that spend money on training up interns aren't going to be rewarded. Under capitalism, there is no reason to "take one for the team" so that your competition can benefit.

-17
lemmy.world

This is already at the point where we can replace an intern or one of the less good junior engineers. Because anyone who has done code review or has had to do rounds with medical interns know… they are idiots who need people to check their work constantly.

Do so at your own peril. Because the thing is, a person will learn from their mistakes and grow in knowledge and experience over time. An LLM is unlikely to do the same in a professional environment for two big reasons:

  1. The company using the LLM would have to send data back to the creator of the LLM. This means their proprietary work could be at risk. The AI company could scoop them, or a data leak would be disastrous.

  2. Alternatively, the LLM could self-learn and be solely in house without any external data connections. A company with an LLM will never go for this, because it would mean their model is improving and developing out of their control. Their customized version may end up being better than the LLM company's future releases. Or, something might go terribly wrong with the model while it learns and adapts. If the LLM company isn't held legally liable, they're still going to lose that business going forward.

On top of that, you need your inexperienced noobs to one day become the ones checking the output of an LLM. They can't do that unless they get experience doing the work. Companies already have proprietary models that just require the right inputs and pressing a button. Engineers are still hired though to interpret the results, know what inputs are the right ones, and understand how the model works.

A company that tries replacing them with LLMs is going to lose in the long run to competitors.

11

Actually, nvidia recently announced RAG (Retrieval-Augmented Generation). Basically the idea is that you take an "off the shelf" LLM and then feed your local instance sensitive corporate data. It can then use that information in its responses.
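
For anyone who hasn't seen the pattern, here is a rough sketch of the retrieve-then-prompt idea described above. It is only an illustration under loud assumptions: the function names (`tokenize`, `retrieve`, `build_prompt`) are made up for this example, plain word overlap stands in for the embedding or vector search a real setup would likely use, and no actual LLM is called. It just shows how local documents end up in the prompt.

```python
def tokenize(text):
    """Lowercase whitespace tokenizer; a crude stand-in for real embeddings."""
    return set(text.lower().split())


def retrieve(query, documents, k=2):
    """Return the k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Put the retrieved local documents in front of the question.

    The returned string is what would be sent to whatever model you run;
    the model call itself is deliberately left out of this sketch.
    """
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Use only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    # Hypothetical "corporate" notes, invented for the example.
    notes = [
        "deploy script lives in tools/deploy.sh",
        "lunch menu for friday",
        "the deploy script needs the STAGING flag",
    ]
    print(build_prompt("how do I run the deploy script", notes))
```

Note that in this sketch the model weights never change; the "teaching" only happens if the review feedback gets written into the document store that is searched next time.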

So you really are "teaching" it every time you do a code review of the AI's merge request and say "Well.. that function doesn't exist" or "you didn't use useful variable names" and so forth. Which... is a lot more than I can say about a lot of even senior or principal engineers I have worked with over the years who are very much making mistakes that would get an intern assigned to sorting crayons.

Which, again, gets back to the idea of having less busywork. Less grunt work. Less charlie work. Instead, focus on developers who can actually contribute to a team and design meetings.

And the model I learned early in my career that I bring to every firm is to have interns be a reward for talented engineers and not a punishment for people who weren't paying attention in Nose Goes. Teaching a kid to write a bunch of utility functions does nothing they didn't learn (or not learn) in undergrad but it is a necessary evil... that an AI can do.

Instead, the people who are good at their jobs and contributing to the overall product? They probably have ideas they want to work on but don't have the cycles to flesh out. That is where interns come into play. They work with those devs and other staff and learn what it means to actually be part of a team. They get to work on really cool projects and their mentors get to ALSO work on really cool projects but maybe focus more on the REALLY interesting parts and less on the specific implementation.

And the result is that your interns are now actually developers who are worth a damn.

Also: One of the most important things to teach a kid is that they owe the company nothing. If they aren't getting the raise they feel they deserve then they need to be updating their linkedin and interviewing elsewhere. That is good for the worker. And that also means that the companies that spend a lot of money training up grunts? They will lose them to the companies who are desperate for people who can lead projects and contribute to designs but haven't been wasting money on writing unit tests.

0

See, this is why we can't have nice things. Money fucks it up, every time. Fuck money, it's a shitty backwards idea. We can do better than this.

163

Reddit/Stack/AI are the latest examples of an economic system where a few people monetize and get wealthy using the output of the very many.

145

You're forgetting a silly and funny company whose name starts with "G"

3

First, they sent the missionaries. They built communities, facilities for the common good, and spoke of collaboration and mutual prosperity. They got so many of us to buy into their belief system as a result.

Then, they sent the conquistadors. They took what we had built under their guidance, and claimed we "weren't using it" and it was rightfully theirs to begin with.

141

Oh I didn't consider deleting my answers. Thanks for the good idea Barbra StackOverflow.

129
lemmy.world

Maybe we should replace Stack Overflow with another site where experts can exchange information? We can call it "Experts Exchange".

94
yokonzo
lemmy.world

I mean that's just been a schoolyard joke for ages

5
skulblaka
startrek.website

Also a market there. Especially among programmers. You might be onto something.

11

I agree with your idea. I will be launching a website where users can share content. It will be free since knowledge should be free and we will make money by selling data...umm selling user data...umm selling T-shirts I guess. That should be enough to keep the servers running.

1
deddit
lemmy.world

Codidact ... Stack Overflow had a mass exodus of mods 2-3 years ago and some of them made Codidact.

69

Any discussion on making it ActivityPub enabled?

I didn't see any, but would be curious if anyone else had.

6

Lemmy could be used as a Stack Overflow alternative. Also, Lemmy is enshittification-repellent by design.

2

At the end of the day, this is just yet another example of how capitalism is an extractive system. Unprotected resources are used not for the benefit of all but to increase and entrench the imbalance of assets. This is why they are so keen on DRM and copyright and why they destroy the environment and social cohesion. The thing is, people want to help each other; not for profit but because we have a natural and healthy imperative to do the most good.

There is a difference between giving someone a present and then them giving it to another person, and giving someone a present and then them selling it. One is kind and helpful and the other is disgusting and produces inequality.

If you're gonna use something for free then make the product of it free too.

An idea for the fediverse and beyond: maybe we should be setting up instances with copyleft licences for all content posted to them. I actually don't mind if you wanna use my comments to make an LLM. It could be useful. But give me (and all the other people who contributed to it) the LLM for free, like we gave it to you. And let us use it for our benefit, not just yours.

89

Begun, the AI wars have.

Faces on T-shirts, you must print. Fake facts into old forum comments, you must edit. Poison the data well, you must.

83
discuss.tchncs.de

Messages that people post on Stack Exchange sites are literally licensed CC-BY-SA, the whole point of which is to enable them to be shared and used by anyone for any purpose. One of the purposes of such a license is to make sure knowledge is preserved by allowing everyone to make and share copies.

81
lemmy.world

That license would require ChatGPT to provide attribution every time it used training data from anyone there, and would also require every output using that training data to be placed under the same license. This would actually legally prevent anything ChatGPT created even in part using this training data from being closed source. Since they obviously aren't planning on doing that, this is massively shitting on the concept of licensing.

106
JohnEdwa
sopuli.xyz

CC attribution doesn't require you to necessarily have the credits immediately with the content, but it would result in one of the world's longest web pages as it would need to have the name of the poster and a link to every single comment they used as training data, and stack overflow has roughly 60 million questions and answers combined.

25
Scrollone
feddit.it

They don't need to republish the 60 million questions, they just have to credit the authors, which are surely way fewer (but IANAL)

1

appropriate credit — If supplied, you must provide the name of the creator and attribution parties, a copyright notice, a license notice, a disclaimer notice, and a link to the material. CC licenses prior to Version 4.0 also require you to provide the title of the material if supplied, and may have other slight differences.

Maybe that could be just a link to the user page, but otherwise I would see it as needing to link to each message or comment they used.

1
lemmy.world

Ethically and logically it seems like output based on training data is clearly derivative work. Legally I suspect AI will continue to be the new powerful tool that enables corporations to shit on and exploit the works of countless people.

20

The problem is the legal system and thus IP law enforcement is very biased towards very large corporations. Until that changes corporations will continue, as they already were, exploiting.

I don't see AI making it worse.

2

They are not. A derivative would be a translation, or theater play, nowadays, a game, or movie. Even stuff set in the same universe.

Expanding the meaning of "derivative" so massively would mean that pretty much any piece of code ever written is a derivative of technical documentation and even textbooks.

So far, judges simply throw out these theories, without even debating them in court. Society would have to move a lot further to the right, still, before these ideas become realistic.

1

Maybe but I don’t think that is well tested legally yet. For instance, I’ve learned things from there, but when I share some knowledge I don’t attribute it to all the underlying sources of my knowledge. If, on the other hand, I shared a quote or copypasta from there I’d be compelled to do so I suppose.

I’m just not sure how neural networks will be treated in this regard. I assume they’ll conveniently claim that they can’t tie answers directly to underpinning training data.

4
9point6
lemmy.world

Share Alike

I can't wait to download my own version of the latest gpt model

64

It does help to know what those funny letters mean. Now we wait for regulators to catch up..

/tangent

If anything, we're a very long way from anything close to intelligent. OpenAI (and subsequently MS, being publicly traded) sold investors on the pretense that LLMs are close to being "AGI", and now more and more data is necessary to achieve that.

If you know the internet, you know there's a lot of garbage. I for one can't wait for garbage-in garbage-out to start taking its toll.

Also I'm surprised how well open source models have shaped up, it's certainly worth a look. I occasionally use a local model for "brainstorming" in the loosest terms, as I generally know what I'm expecting, but it's sometimes helpful to read tasks laid out. Also comfort in that nothing even needs to leave my network, and even in a pinch I got some answers when my network was offline.

It gives a little hope, while corps get to blatantly violate copyright after having wielded it so heavily, that advancements have been so great in open source.

11
lemmy.world

You really don't need anything near as complex as AI...a simple script could be configured to automatically close the issue as solved with a link to a randomly-selected unrelated issue.

80
lemmy.world

The enshittification is very real and is spreading constantly. Companies will leech more from their employees and users until things start to break down. Acceleration is the only way.

78
tabarnaskireply
sh.itjust.works

Accelerationism is like being on a plane and wishing it crashes when one of the engines fails.

10

I mean, sure but in the context of individual websites I don't see it being a big deal. There will be replacements, and relatively quickly. Accelerationism applied to major societal structures is a terrible idea though.

8

That's a terrible analogy, implying the wish that everyone on the plane dies if one engine fails.

It's like an airline company has been complete shit for decades, wanting to see them fail fast so that a better airline company can take their place.

6

Except it's not like a plane because we can stop using specific websites whenever we like, and build our own websites to whittle away at their hegemony.

5
lemmy.world

primary use for AI is self destructing your website.

71
lemm.ee

Remember when adding the word blockchain to an Iced Tea company's name caused share prices to jump?

18
kuretareply
lemmy.ml

is this real? I can't tell anymore.

9

a little-known micro-cap stock called Long Island Iced Tea Corp. (LTEA) said Thursday that it’s now “Long Blockchain Corp.,” and its stock leaped more than 200 percent at the open of trading. Shares closed up 183 percent.

🤦‍♂️🤦‍♂️🤦‍♂️🤦‍♂️🤦‍♂️

This is like my friend who "invested" in Doggy (not Doge) coin "because it was going to explode and become highly valuable" even though it was only worth like .1% of what Doge was worth like two years back..... He's a teacher.

Or my other friend that invested thousands in Ethereum like 2 years back, while knowing basically nothing about "The Ethereum Network", or anything crypto related. He just knew that he could potentially make money off of it like he could with stocks. I asked him like a year later if he ever made anything off of it and he said "not really", and said he had reinvested the money into other things (I forget which, it wasn't crypto related) 🤣

5
lemmy.world

I despise this use of mod power in response to a protest. It's our content to be sabotaged if we want - if Stack Overlords disagree then to hell with them.

I'll add Stack Overflow to my personal ban list, just below Reddit.

70
redisdeadreply
lemmy.world

Once submitted to stack overflow/Reddit/literally every platform, it's no longer your content. It sucks, but you've implicitly agreed to it when creating your account.

10

While true, it's stupid that things are that way. They shouldn't be able to hide behind the idea that "we're not responsible for what our users publish, we're more like a public forum" while also having total ownership over that content.

24

you’ve implicitly agreed to it when creating your account

Many people would agree with that, probably most laws do. However I doubt many users have actually bothered to read the unnecessarily long document, fewer have understood the legalese, and the terms have likely already been changed ("pray I don't alter it any further"). That's a low and shady bar of consent. It indeed sucks and I think people should leave those platforms, but I'm also open to laws that would invalidate that part of the EULA.

1
fedia.io

Eventually, we will need a fediverse version of StackOverflow, Quora, etc.

66
Thomasreply
discuss.tchncs.de

Those would be harvested to train LLMs even without asking first. 😐

77
sramderreply
lemmy.world

At this point I’m assuming most if not all of these content deals are essentially retroactive. They already scraped the content and found it useful enough to try and secure future use, or at least exclude competitors.

45
lemm.ee

They scraped the content, liked the results, and are only making these deals because it's cheaper than getting sued.

13
lemmy.world

Honestly? I'm down with that. And when the LLMs end up pricing themselves out of usefulness, we'll still have the fediverse version. Having free sites on the net with solid crowd-sourced information is never a bad thing even if other people pick up the data and use it.

It's when private sites like Duolingo and Reddit crowd source the information and then slowly crank down the free aspect that we have the problems.

The Ad sponsored web model is not viable forever.

34

The Ad sponsored web model is not viable forever.

a thousand times this

18

I’d rather the harvesting be open to all than only the company hosting it.

27

Assuming the federated version allowed contributor-chosen licenses (similar to GitHub), any harvesting in violation of the license would be subject to legal action.

Contrast that with Stack Exchange, where I assume the terms dictated by Stack Exchange deprive contributors of recourse.

10

SO already was. Not even harvested as much as handed to them. Periodic data dumps and a general forced commitment to open information were a big part of the reason they won out over other sites that used to compete with them. SO most likely wouldn't have existed if Experts Exchange didn't paywall their entire site.

As with everything else, AI companies believe their training data operates under fair use, so they will discard the CC BY-SA 4.0 license requirements regardless of whether this deal exists. (And if a court ever finds it's not fair use, they are so many layers of fucked that this situation won't even register.)

7
Rolandoreply
lemmy.world

But users and instances would be able to state that they do not want their content commercialized. On StackOverflow you have no control over that.

2
sh.itjust.works

You can state what you don't want, but no one will be paying attention. Except maybe the LLM reading your posts...

5
lemmy.world

Smells too much like Duolingo. Here, everyone jump in and answer all the questions. 5 years later, ohh look at this gold mine of community data we own....

3
lemmy.world

This was actually the whole original point of Duolingo. The founder previously created reCAPTCHA to crowdsource machine vision of scanned books.

His whole thing is crowd sourcing difficult tasks that machines struggle with by providing some sort of reason to do it (prevent spam at first and learn a language now)

From what I understand Duolingo just got too popular and the subscription service they offer made them enough money to be happy with.

8

Duolingo has been systematically enshittifying the free/ad supported service. Now every time you fart, you get a big unskippable ad trying to get you to subscribe to their service for free for 14 days without telling you the price. They took all that crowdsourced data that they weren't going to profit off of and are making the app a miserable experience without it.

1
lemmy.world

Everything you write on here is public. There's nothing stopping anyone from using that data for training

6

Yeah but didn't you see the sovereign citizens who think licenses are magic posting giant copyright notices after their posts? Lol

It's so childish, ai tools will help billions of the poorest people access life saving knowledge and services, help open source devs like myself create tools that free people from the clutches of capitalism, but they like living in a world of inequity because their generational wealth earned from centuries of exploitation of the impoverished allows them a better education, better healthcare, and better living standards than the billions of impoverished people on the planet so they'll fight to maintain their privilege even if they're fighting against their own life getting better too. The most pathetic thing is they pretend to be fighting a moral crusade, as if using the answers they freely posted and never expected anything in return for is a real injustice!

And yes I know people are going to pretend that they think tech bros won't allow poor people to use their tech and they base this on assuming how everything always works will suddenly just flip into reverse at some point or something? Like how mobile phones are only for rich people and only rich people can sell via the internet and only rich people can start a YouTube channel...

4
lemmy.world

Arguably, they need to do way instain mother> who kill thier babbys. becuse these babby cant frigth back?

It's important to remember that it was on the news this mroing a mother in ar who had kill her three kids.

1
Syrcreply
lemmy.world

Hey, early Yahoo answers was very useful. A de-shittified, federated, stripped down to the bare questions-answers network could be neat.

2

We already have the SO data. We could populate such a tool with it and start from there.

3

I fully understand why they are doing this, but we are just losing a mass of really useful knowledge. What a shame...

66
lemmy.world

It should stay for creative works but that's it. It should protect people who actually write books, compose music, make art, and sing. It shouldn't be held by corporations forever by leeching off their workers.

25
lemmy.blahaj.zone

Creative works of individuals specially... Corporations should explicitly be deemed not people and not possessing of the same rights as people and the fact that needs to be said just goes to show how far down the shit hole we've fallen

19

Corporations should be outlawed from owning houses and land as well. Maybe they can own the building, but they must be forced to rent the land from Us.

8
WldFyrereply
lemm.ee

Wouldn't that make AI training data easier to obtain?

2

That's what I thought, that's why I was confused about your reply haha

My bad!

2

Why does OpenAI want 10 year old answers about using jQuery whenever anyone posts a JavaScript question, followed by aggressive policing of what is and isn't acceptable to re-ask as technology moves on?

62

Right? It seems like the modern internet is made up of like 5 monolithic sites, and unlimited SEO spam.

I know that's not literally true, but it sure feels like it.

45

Fortunately the AIs are getting quite good at answering technical questions like these.

-1

While at the same time they forbid AI generated answers on their website, oh the turntables.

55

Rather than delete, modify the question so it's wrong. Then the AI will hallucinate.

53
lemmy.world

It won't matter, they would have all of your comments archived already. Even if you overwrite them AI will be scraping the copies they keep.

29
Fedizenreply
lemmy.world

it creates a lot of poisoned data especially if you like edit half your posts with nonsense

15
realharoreply
lemm.ee

That's trivial to filter if you just look at how much time has passed between posting and editing. Reddit comments are only very rarely updated after more than a day.
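A rough sketch of that kind of filter in Python, assuming the scraper has a creation timestamp and a last-edit timestamp for each post. The field names and the one-day cutoff are made up for illustration; this is not Reddit's actual schema or anyone's real pipeline.

```python
from datetime import datetime, timedelta

# Hypothetical cutoff: edits made more than a day after posting are suspect.
MAX_EDIT_DELAY = timedelta(days=1)


def is_late_edit(created_at, edited_at, max_delay=MAX_EDIT_DELAY):
    """Return True if the post was edited longer than max_delay after posting."""
    if edited_at is None:  # never edited
        return False
    return edited_at - created_at > max_delay


def drop_late_edits(posts):
    """Keep only posts that were never edited or were edited soon after posting."""
    return [p for p in posts if not is_late_edit(p["created_at"], p["edited_at"])]


# Made-up sample data: an unedited post, a quick typo fix, and a protest edit.
posts = [
    {"id": 1, "created_at": datetime(2014, 3, 1, 12, 0), "edited_at": None},
    {"id": 2, "created_at": datetime(2014, 3, 1, 12, 0),
     "edited_at": datetime(2014, 3, 1, 12, 5)},
    {"id": 3, "created_at": datetime(2014, 3, 1, 12, 0),
     "edited_at": datetime(2024, 5, 8, 9, 0)},
]

kept_ids = [p["id"] for p in drop_late_edits(posts)]
print(kept_ids)
```

Only the post edited ten years after the fact gets dropped here. A real pipeline could just as well fall back to an archived original instead of discarding the post, if it has one.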

8

sure, but the more you fuck with the data, the more it requires curating, the less valuable it becomes. I'm not entirely sure places like reddit even retain full edit history for posts over a year old.

1

Still some narrow scope communities holding some people back (but it changes slowly).

Also, variety of porn is still better there (but lemmynsfw.com for the win)

3
lemmy.world

A malicious response by users would be to employ an LLM instructed to write plausible-sounding but very wrong answers to historical and current questions, then an army of users upvoting the known wrong answer while downvoting accurate ones. This would poison the data I would think.

49
Emotetreply
slrpnk.net

All use of generative AI (e.g., ChatGPT and other LLMs) is banned when posting content on Stack Overflow. This includes "asking" the question to an AI generator then copy-pasting its output as well as using an AI generator to "reword" your answers.

Ironic, isn't it?

20

Interestingly I see nothing in that policy that would dis-allow machine generated downvotes on proper answers and machine generated upvotes on incorrect ones. So even if LLMs are banned from posting questions or comments, looks like Stackoverflow is perfectly fine with bots voting.

7

Sounds like it would require some significant resources to combat.

That said, that plan comes at a cost to presumably innocent users who will bark up the wrong trees.

4

While I think the reaction of StackOverflow is not good, I don't understand the users either.

EDIT: seems like the language model won't be free, I understand then.

36

That is how it started. It was a non-profit with the goal to release all their patents and research for free.

That lasted for a few years, and then the people running it realized they could instead all become filthy rich and nobody could do anything about it. So they did that.

But don't worry, they are a capped for-profit now! They can only make 100 times the amount of money they have in investments. So they'll stop when they have reached ... checks notes ... around $1.3 trillion.

16
gregorumreply
lemm.ee

You are now banned from stackoverflow

72
Rolandoreply
lemmy.world

And if you try to delete your comment, you'll be DOUBLE BANNED.

31

prophylacticly

I like how this implies that there's a looming disease that they need to stave off. Oh, no, the disease of people not taking your shit!

2
mander.xyz

Half the time I look on Stack Overflow it feels like the answer is irrelevant by today's standards

15

That's what happens when new posts aren't allowed to exist if they ask a similar question to an old one.

7

I will answer some questions with my old account using gpt 4 to poison the data.

If you want to poison SO a little at the same time providing valid answers that help users, use outlook.com email domain for new accounts. It seems to not have anti throwaway countermeasures while being accepted by SO. And it seems fitting to bash the corporate with the corporate.

28
lemmy.ca

If we can't delete our questions and answers, can we poison the well by uploading masses of shitty questions and answers? If they like AI we could have it help us generate them.

26
pivot_rootreply
lemmy.world

Poison the well by using AI-generated comments and answers. There isn't currently a way to reliably determine if content is human or AI-generated, and training AI on AI is the equivalent of inbreeding.

36

The poison was there all along the way. The poison is us

Inserts spider man meme

7

You are literally the same mentality as the coal rollers

Tech that could improve life for everyone and instead of using it to make open source software or coding solutions to problems you attack it like a crab in a bucket simply because you fear change.

2

lol wow this is going even more poorly than I thought it would, and I thought my kneejerk reaction to the initial announcement was quite pessimistic.

26

I am not deleting anything. They can have all of my poorly written misleading answers.

24

I'll just keep asking copilot about the damn exceptions until the effin code works. Na-na-nah!

1

Instead of solely deleting content, what if authors had instead moved their content/answers to something self-owned? Can SO even claim ownership legally of the content on their site? Seems iffy in my own, ignorant take.

22
kbin.social

For years, the site had a standing policy that prevented the use of generative AI in writing or rewording any questions or answers posted. Moderators were allowed and encouraged to use AI-detection software when reviewing posts. Beginning last week, however, the company began a rapid about-face in its public policy towards AI.

I listened to an episode of The Daily on AI, and the stuff they fed into the engines included the entire Internet. They literally ran out of things to feed it. That's why YouTube created their auto-generated subtitles - literally, so that they would have more material to feed into their LLMs. I fully expect reddit to be bought out/merged within the next six months or so. They are desperate for more material to feed the machine. Everything is going to end up going to an LLM somewhere.

20
elgordioreply
kbin.social

I think auto-generated subtitles were to fulfil an FCC requirement, some years ago, for content subtitling. It has however turned out super useful for LLM feeding.

9

Here is an alternative Piped link(s):

Hell

Piped is a privacy-respecting open-source alternative frontend to YouTube.

I'm open-source; check me out at GitHub.

1

There really isn’t much in the way of detection. It’s a big problem in schools and universities and the plagiarism detectors can’t sense AI.

2

If I were Stack Overflow I would've transferred my backups to OpenAI weeks before the announcement for this very reason.

This is also assuming the LLMs weren't already fed with scraped SO data years ago.

It's a small act of rebellion but SO already has your data and they'll do whatever they want with it, including mine.

16

They seem to only be watching the questions right now. You’re automatically prevented from deleting an accepted answer, but if you answered your own question (maybe because SO was useless for certain niche questions a decade ago so you kept digging and found your own solution), you can unaccept your answer first and then delete it.

I got a 30 day ban for “defacing” a few of my 10+ year old questions after moderators promptly reverted the edits. But they seem to have missed where I unaccepted and deleted my answers, even as they hang out in an undeletable state (showing up red for me and hidden for others).

And comments, which are a key part to properly understanding a lot of almost-correct answers, don’t seem to be afforded revision history or to have deletes noticed by moderators.

So it seems like you can still delete a bunch of your content, just not the questions. Do with that what you will.

11
lemmy.world

Can we change our answers? Change your answers to garbage, don't delete them. Do it slowly.

10
lemmy.world

If you have low karma, then edits are reviewed by multiple people before the edit is saved. That's primarily in place to prevent spammers, who could otherwise post a valid question then edit it a few months later, transforming the message into a link to some shitty website.

Even with high karma, that just means your edit is temporarily trusted. It gets reviewed and will be reverted if it's a bad edit.

And any time an edit is reverted, that's a knock against your karma. There's a community enforced requirement for all edits to be a measurable improvement.

Even moderation decisions are reviewed by multiple people - so if someone rejects a post because it's spam, when they should have rejected it because it's off topic (or approved it) then that is also going to be caught and undone. And any harmful contribution (edit or moderation decision) will result in your action being undone and your karma going down. If your karma goes down too fast, your access to the site is revoked. If you do something really bad, then they'll ban your IP address.

Moderators can also lock a controversial post, so only people with high karma can touch it at all.

... keep in mind Stack Overflow doesn't just allow editing your own posts, you can edit any content on the website, similar to Wikipedia.

It's honestly a good overall approach, but around when Jeff Atwood left in 2012 it started drifting off course towards the shit show that is Stack Overflow today.

25
lemmy.world

It's a shame, only corporations are going to benefit from the hard work and labour of so many talented people.

5

If the Stack Overflow site remains available then it still serves the same purpose it did before. I personally use ad blockers and don’t pay to use the site, which must not be cheap to operate. The bigger problem is if talented people refuse to share their expertise with people like me because they aren’t being compensated for their efforts.

3
yamaniireply
lemmy.world

In the article the dude was banned for 7 days for changing his answer.

17
lemmy.world

I'm almost sure the site has already been scraped of its current content for the LLM.

5

Yup, but that's not the point IMO, it's to remove quality content from the site so visitors see how crappy it is and stop using it.

9

I don't understand what anyone wins from this

Corporations are foundationally evil

And how do they not win more if we poison the entire Internet?

It's like being in a toxic relationship with kids involved

Set boundaries

Follow rules

Don't destroy the fucking fruit of your bodies just because you are angry at each other

Fuck those guys, like a lot, for taking your given data and selling

And fuck open ai for trying to make money from scientific discoveries meant for all of humanity

But what the fuck with ruining the entire Internet?

Who gets anything then?

If language models will ruin Internet why be afraid that normal human responses are available? Wut?

8
lemmy.today

Maybe we should start asking questions that iterate loops billions of times. Something semi-malicious that a person would recognize but an AI wouldn't.

Nah, the training data probably doesn't quite work that way. The AI would be very unlikely to test code, just regurgitate the most likely response based on its training sets. Instead just filling posts with random bits and pieces of unrelated code and responses might be better.

8

I mean, they could just do what reddit does and restore from backup automatically lol

6
fedia.io

This sort of thing is so self-sabotaging. The website already has your comment, and a license to use it. By deleting your stuff from the web you only ensure that the AI is definitely going to be the better resource to go to for answers.

6
Rolandoreply
lemmy.world

I'm not sure about that... in Europe don't you have the right to insist that a website no longer use your content?

18
000reply
fuck.markets

Not when you've agreed to a terms of service that hands over ownership of your content to Stack Overflow, leaving you merely licensed to use your own content.

9

That's an interesting point. I wonder how LLMs handle GDPR. Would it be like having a tiny piece of your brain cut out?

0
lemmy.world

Also backups and deleted flags. Whatever comment you submitted is likely backed up already and even if you click the delete button you're likely only just changing a flag.
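For anyone who hasn't seen a "deleted flag" before, here's a minimal sketch of the general soft-delete pattern using Python's built-in sqlite3. The table and column names are invented for illustration; I have no idea what Stack Overflow's actual schema looks like.

```python
import sqlite3

# In-memory database with a hypothetical comments table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments ("
    "id INTEGER PRIMARY KEY, body TEXT, is_deleted INTEGER DEFAULT 0)"
)
conn.execute("INSERT INTO comments (body) VALUES ('my hard-won answer')")


def soft_delete(conn, comment_id):
    """'Delete' a comment by setting a flag; the row and its text stay put."""
    conn.execute("UPDATE comments SET is_deleted = 1 WHERE id = ?", (comment_id,))


def visible_comments(conn):
    """What regular visitors see: only rows not flagged as deleted."""
    return conn.execute("SELECT body FROM comments WHERE is_deleted = 0").fetchall()


soft_delete(conn, 1)
public_view = visible_comments(conn)
stored_rows = conn.execute("SELECT body, is_deleted FROM comments").fetchall()

print(public_view)   # nothing shown to visitors
print(stored_rows)   # the text is still sitting in the table
```

The "delete" button only changes what the site chooses to show. The original text is still there for the site owner, and that's before any backups come into it.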

12

I feel like a lot of people don't understand the most basic things about the site. Any user with enough internet points can see deleted posts.

1

Frankly, the solution here isn’t vandalism, it’s setting up a competing site and copying the content over. The license of Stack Overflow makes that explicitly legal. Anything else is just playing around and hoping that a company acts against its own interests, which has rarely ever worked before.

3

Good to know that Stack Overflow will not be a trustworthy place to find solutions anymore.

2

It will not make a difference. The internet is free and open by design. You can always scrape the internet any time. A partnership will do nothing but make it a little bit more convenient for them.

1
lemmy.world

Anyone care to explain why people would care that they posted to a public forum that they don't own, with content that is now further being shared for public benefit?

The argument that it's your content becomes false as soon as you shared it with the world.

-2
lemm.ee

I can only really speak to reddit, but I think this applies to all of the user generated content websites. The original premise, that everyone agreed to, was the site provides a space and some tools and users provide content to fill it. As information gets added, it becomes a valuable resource for everyone. Ads and other revenue streams become a necessary evil in all this, but overall directly support the core use case.

Now that content is being packaged into large language models to be either put behind a paywall or packed into other non-freely available services. Since they no longer seem interested in supporting the model we all agreed on, I see no reason to continue adding value and since they provided tools to remove content I may as well use them.

44
0oWowreply
lemmy.world

But from the very beginning years ago, it was understood that when you post on these types of sites, the data is not yours, or at least you give them license to use it how they see fit. So for years people accepted that, but are now whining because they aren't getting paid for something they gave away.

1

This is legal vs rude. It certainly is legal and was in the terms of service for them to use the data in any way they see fit. But it's also rude to bait and switch from being a message board to being an AI data source company. Users were led to believe they were entering into an agreement with one type of company and are now in an agreement with a totally different one.

You can smugly tell people they shouldn't have made that decision 15 years ago when they started, but a little empathy is also cool.

Additionally: When you owe your entire existence and value to user goodwill it might not be a great idea to be rude to them.

1
Emotetreply
slrpnk.net

It's not shared for public benefit, though. OpenAI, despite the Open in their name, charges for access to their models. You either pay with money or (meta)data, depending on the model.

Legally, sure. You signed away your rights to your answers when you joined the forum. Morally, though?

People are pissed that SO, which was actively encouraging mods to use AI detection software to prevent any LLM usage in the posted questions and answers, is now selling the publicly accessible data, made by their users for free, to a closed-source for-profit entity that refuses to open itself up.

Basically the same story as with reddit.

37
gollireply

Agreed. As you said it's a similar situation as with reddit, where I decided to delete my comments.

My reasoning is that those contributions were given under the premise that everybody was sharing to help each other.

Now that premise has changed: the large tech companies are only taking and the platform providers are changing the rules as well to profit from it.

So as a result I packed my things and left, in case of reddit to here.

That said I think both views are valid and I wouldn't fault those that think differently.

11
LainTrainreply
lemmy.dbzer0.com

Lol it ain't for public benefit unless it's a FOSS model with which I'd have no issue

10
lemmy.zip

Well no, when you post something it is public and out of your control

-3
LainTrainreply
lemmy.dbzer0.com

No, you can't post something in public and have it appropriated by a mega corp for money and then prevent you from deleting or modifying the very things you posted.

I'm pro-AI btw. But AI for all.

2

It is your content. But SE specifically only accepts CC licensed content, which makes you right.

3
lemmy.dbzer0.com

Why?? Please make this make sense. Having AI to help with coding is ideal and the greatest immediate use case probably. The web is an open resource. Why die on this stupid hill instead of advocating for a privacy argument that actually matters?

Edit: Okay got it. Hinder significant human progress because a company I don't like might make some more money from something I said in public, which has been a thing literally forever. You guys really lack a lot of life skills about how the world really works huh?

-4

Because being able to delete your data from social networks you no longer wish to participate in or that have banned you, as long as they specifically haven't paid you for your contributions, is a privacy argument that actually matters, regardless and independent of AI.

In regards to AI, the problem is not with AI in general but with proprietary for-profit AI getting trained with open resources, even those with underlying license agreements that prevent that information being monetized.

25

Because none of the big companies listen to the privacy argument. Or any argument, really.

AI in itself is good, amazing, even.

I have no issue with open-source, ideally GPL- or similarly licensed AI models trained on Internet data.

But involuntarily participating in training closed-source corporate AIs... no, thanks. That shit should go to the hellhole it was born in, and we should do our best to destroy it, not advocate for it.

If you care about the future of AI, OpenAI should long be on your enemy list. They expropriated an open model, they were hypocritical enough to keep "open" in the name, and then they essentially sold themselves to Microsoft. That's not the AI future we should want.

15