The fact that users are encouraged to include text descriptions with media content makes it perfect training data for AI.
Lemmy hates AI.
To be clear, I fully support accessibility for people with disabilities. It's ironic, though. Does Lemmy's open source code make it easier for bots to scrape it?
It's not perfect training data. Being encouraged to add alt text and actually doing it are two different things, and writing good alt text is another matter altogether. Anything that's on the internet is training data whether people want it to be or not. The only difference is an ethical one: whether the scraper accepts and respects something like robots.txt, i.e. a "do not scrape" notice that communicates the intentions of whoever holds the data. And if they torrent books, you can guess how respectful they are.
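To illustrate that robots.txt only states intent, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `GPTBot` rule and the example.com URL are hypothetical. A polite crawler runs a check like this before fetching; nothing in the protocol forces a scraper to do so.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler by name, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that was already fetched (no network access here)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
for agent in ("GPTBot", "Mozilla/5.0"):
    # can_fetch only reports what the site asks for; obeying it is voluntary.
    allowed = parser.can_fetch(agent, "https://example.com/post/1")
    print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

The check is purely advisory: a scraper that skips `can_fetch` gets the same pages as one that calls it.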
At this point all the imagery data they need is already out there. Not like your picture of a cat you post to Lemmy is gonna help these companies make a better model.
Web content should always strive to be more accessible. Things like AI should be better regulated instead. I think we've missed the boat on a big part of that, though; we should have legally clamped down on these activities a long time ago.
Something I mention every time this comes up: AI doesn’t need to scrape Lemmy. All someone has to do is set up their own federated instance, and ActivityPub will wrap everything up in a nice JSON format for them to parse however they want. There’s fundamentally nothing anyone can do about it.
It’s just best to realize anything and everything on Lemmy is publicly available for any use, good or bad.
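As a rough illustration of how little parsing that takes, here is a minimal sketch that pulls image URLs and their descriptions out of an activity an instance might receive. The payload is hypothetical: the field names follow the ActivityStreams 2.0 vocabulary, but treating an attachment's `name` as its alt text is an assumption, not Lemmy's documented wire format.

```python
import json

# Hypothetical activity as a federated instance might receive it.
# Using the attachment's "name" field for alt text is an assumption.
PAYLOAD = """
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Create",
  "actor": "https://lemmy.example/u/alice",
  "object": {
    "type": "Page",
    "name": "My cat",
    "attachment": [
      {
        "type": "Image",
        "url": "https://lemmy.example/pictrs/image/cat.jpg",
        "name": "A grey tabby cat asleep on a laptop keyboard"
      }
    ]
  }
}
"""


def image_alt_pairs(activity: dict) -> list[tuple[str, str]]:
    """Return (image URL, alt text) pairs from the activity's attachments."""
    obj = activity.get("object", {})
    pairs = []
    for att in obj.get("attachment", []):
        # Keep only image attachments that carry a description.
        if att.get("type") == "Image" and "name" in att:
            pairs.append((att["url"], att["name"]))
    return pairs


activity = json.loads(PAYLOAD)
for url, alt in image_alt_pairs(activity):
    print(url, "->", alt)
```

Each pair is exactly an image plus a human-written caption, which is why described media is attractive to anyone assembling a dataset.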
Yes, but that doesn't mean we should eschew accessibility.
We can just use an AI to describe the images /s
Alt text is meant for screen readers, so that blind people can hear a description of the image.
Also, the alt texts vary in descriptiveness for that exact purpose. They're meant to be useful for humans, not for training data.
What would a blind person rather have as the alt text:
(there are no photos here, for the blind people listening)
1:
2:
And I actually really like that one particular use case of AI, because needing less help from other people gives the blind user more independence. The remaining issue, the corporatization and private ownership of something that should be a publicly owned resource (as with many other assistive technologies), is a society-wide problem, and framing it as a futurist vs. Luddite debate is a powerful misdirection.
AI is already amazing at image recognition; training or no training, it’s already here.
We've been training it through the CAPTCHA system for decades now.
All technicalities aside, models trained on images and their text descriptions for blind people are one of the few good use cases for machine learning.