This is possible because Lemmy doesn't proxy external images; instead, your browser loads them directly. While not all that bad, this could be used for spy pixels by nefarious posters and commenters.
Note that the only thing I willingly log is the "hit count" visible in the image, and I have no intention of misusing the data.
Nice example!
I think proxying everything through Lemmy would have a pretty big bandwidth/scalability impact. I expect the Lemmy clients don't send any unique user info on these image requests, so I'm not sure how useful it would be as a spy pixel? Maybe I'm missing something :-)
It would be interesting to see just how much info is shared when lemmy requests the image. If there is [potentially] sensitive info being shared, the devs might be interested in working on it too (I have no idea how to check such a thing, this comment is just so I can find the post later when more people have shared their wisdom on it)
None (by Lemmy), as Lemmy doesn't actually request the image (that would be proxying). Your browser requests the image directly by URL. Lemmy, technically, doesn't even know an image exists. It just provides the HTML and lets your browser do the work.
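A minimal sketch of that point, using only Python's standard library: the HTML Lemmy serves contains ordinary `<img>` tags, and every host those tags point at is contacted by the browser, not by Lemmy. The sample HTML and the host names `lemmy.example` and `tracker.example.net` are made up for illustration.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ImageHostCollector(HTMLParser):
    """Collects the hosts that <img src=...> tags would make a browser contact."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.hosts: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Relative URLs resolve against the page, i.e. the instance itself.
            self.hosts.add(urlsplit(urljoin(self.page_url, src)).hostname)


def third_party_image_hosts(html: str, page_url: str) -> set[str]:
    """Hosts other than the instance that the browser will fetch images from."""
    collector = ImageHostCollector(page_url)
    collector.feed(html)
    own_host = urlsplit(page_url).hostname
    return {h for h in collector.hosts if h and h != own_host}


sample = """
<div class="post">
  <img src="/pictrs/image/abc.webp">
  <img src="https://tracker.example.net/image.jpg">
</div>
"""
print(third_party_image_hosts(sample, "https://lemmy.example/post/1"))
# {'tracker.example.net'}
```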
Exactly. The text of this post is simply:
I get the same result when I browse directly to the link.
So if OP links a malicious website, we have a problem... (?)
Oh dangit, it's simpler than I thought. So the only data being sent is...just whatever is sent in your average GET request.
Yes. It's also a pretty standard way of serving images. A lot of email clients do that too.
That's also how those services that show you when an email has been read work.
Not really that huge of a problem. When making requests you also usually send a header which includes the user agent.
The program just logs how many times the image has been requested and it reads the user agent data. No Javascript is actually executed.
Well, it might be possible to have an XSS somehow, but I haven't really done much research into this possibility.
In general it's a pretty standard way of handling embedded images. Email does this too. That's how you have those services that can check whether someone read an email.
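For anyone curious what the counting side can look like, here is a hypothetical stdlib-only sketch of a tracking-pixel endpoint. It is not OP's code (OP used Flask and rendered a text image with Pillow); it only counts hits, records the `User-Agent` header and returns a 1x1 transparent GIF, the same trick email read-tracking services use. `HitLog`, `make_handler`, `run` and the port are assumptions.

```python
from __future__ import annotations

import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# A 43-byte transparent 1x1 GIF: the classic "spy pixel" payload.
PIXEL = (
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
    b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00"
    b"\x02\x02D\x01\x00;"
)


class HitLog:
    """Thread-safe hit counter plus the User-Agent of every request."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.count = 0
        self.user_agents: list[str] = []

    def record(self, user_agent: str) -> int:
        with self._lock:
            self.count += 1
            self.user_agents.append(user_agent)
            return self.count


def make_handler(log: HitLog) -> type[BaseHTTPRequestHandler]:
    class PixelHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            # Only request headers are read; nothing the client sends is executed.
            log.record(self.headers.get("User-Agent", "unknown"))
            self.send_response(200)
            self.send_header("Content-Type", "image/gif")
            self.send_header("Content-Length", str(len(PIXEL)))
            # Ask browsers, apps and CDNs not to cache, so each view is a fresh hit.
            self.send_header("Cache-Control", "no-store")
            self.end_headers()
            self.wfile.write(PIXEL)

        def log_message(self, format: str, *args) -> None:
            pass  # silence the default per-request stderr logging

    return PixelHandler


def run(port: int = 8000) -> None:
    """Serve the pixel on localhost until interrupted."""
    log = HitLog()
    server = ThreadingHTTPServer(("127.0.0.1", port), make_handler(log))
    server.serve_forever()
```

Calling `run()` and embedding `http://127.0.0.1:8000/anything.gif` in a page would bump the counter on every uncached view. The `Cache-Control: no-store` header matters because, as noted elsewhere in the thread, apps may otherwise keep showing a cached copy.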
Okay, so I made a test here with this:
I believe this web page doesn't load automatically.
::: spoiler FWi
The domain names example.com, example.net and example.org are second-level domain names in the Domain Name System of the Internet. They are reserved by the Internet Assigned Numbers Authority (IANA) at the direction of the Internet Engineering Task Force (IETF) as special-use domain names for documentation purposes. (...wikipedia)
:::
Yup. And to add, your browser will send things like:
Your IP address. Technically this is sent by the OS doing networking and is unavoidable. At best, a VPN can hide this, because the VPN sits in the middle.
Various basic request headers, most notably the user agent (identifies your browser) and language headers, both of which you can fake if you want to.
Cookies for that domain (if you have any). Those can track you across multiple requests and thus build up a profile of you.
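To make the list above concrete, here is a sketch of what an image host can pull out of one plain image request. The IP address comes from the TCP connection rather than from any header, which is why only something sitting in the middle (a VPN, Tor) can hide it. The raw request, the IP (from the 203.0.113.0/24 documentation range) and the cookie are invented examples.

```python
from http.cookies import SimpleCookie


def summarize_image_request(raw_request: str, peer_ip: str) -> dict:
    """Summarize what an image host learns from a single GET request.

    peer_ip is taken from the socket, not from the request text: the
    browser cannot leave it out, only route around it.
    """
    request_line, *header_lines = raw_request.split("\r\n")
    _method, path, _version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        if not line:
            break  # a blank line ends the header block
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    cookies = SimpleCookie()
    cookies.load(headers.get("cookie", ""))
    return {
        "ip": peer_ip,
        "path": path,
        "user_agent": headers.get("user-agent"),
        "languages": headers.get("accept-language"),
        "referer": headers.get("referer"),
        "cookies": {key: morsel.value for key, morsel in cookies.items()},
    }


EXAMPLE_REQUEST = (
    "GET /image.jpg HTTP/1.1\r\n"
    "Host: tracker.example.net\r\n"
    "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0\r\n"
    "Accept-Language: en-US,en;q=0.5\r\n"
    "Referer: https://lemmy.example/\r\n"
    "Cookie: visitor=abc123\r\n"
    "\r\n"
)
print(summarize_image_request(EXAMPLE_REQUEST, "203.0.113.7"))
```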
That's why you should use a native app, which won't send any of that identifying info (except for your IP, but there's nothing you can do about that).
Notably, this allows remote parties to associate your IP address with your interests, as revealed by the Lemmy communities that you browse.
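One way is for the image host to use the HTTP Referer field. (Browsers tell the server hosting the image which page embedded it; under the default referrer policy of current browsers, a cross-origin request carries only the origin, e.g. your instance's domain, rather than the full page URL.)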
Another way is by posting an image with a unique URL.
Even if Referer is withheld and the image is not unique, the image host can still do basic fingerprinting of your client's request header and your OS's TCP quirks, and associate that fingerprint with your IP address.
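A rough illustration of the three techniques above, under stated assumptions: `header_fingerprint` hashes a few request headers that tend to stay stable for one browser install (it does not model TCP/IP stack quirks, which need packet-level data), `referer_origin` reads whatever the browser chose to put in `Referer`, and `UniqueUrlIssuer` hands out a different image URL per target. All names and the tracker URL are hypothetical.

```python
from __future__ import annotations

import hashlib
import secrets
from urllib.parse import urlsplit

# Request headers (lowercased names) that tend to stay the same for one browser install.
FINGERPRINT_HEADERS = ("user-agent", "accept-language", "accept-encoding", "accept")


def header_fingerprint(headers: dict[str, str]) -> str:
    """Hash stable-ish headers into a short identifier.

    Only HTTP headers are used here; TCP/IP-level quirks such as TTL or
    window size would need packet capture and are not modelled.
    """
    material = "\n".join(headers.get(name, "") for name in FINGERPRINT_HEADERS)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]


def referer_origin(headers: dict[str, str]) -> str | None:
    """Scheme and host of the embedding page, if the browser sent a Referer."""
    referer = headers.get("referer")
    if not referer:
        return None
    parts = urlsplit(referer)
    return f"{parts.scheme}://{parts.netloc}"


class UniqueUrlIssuer:
    """Hands out a distinct image URL per target, so any hit names the target."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")
        self.targets: dict[str, str] = {}

    def url_for(self, target: str) -> str:
        token = secrets.token_urlsafe(8)
        self.targets[token] = target
        return f"{self.base_url}/{token}.jpg"

    def who_loaded(self, path: str) -> str | None:
        token = path.rsplit("/", 1)[-1]
        if token.endswith(".jpg"):
            token = token[: -len(".jpg")]
        return self.targets.get(token)
```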
An option for Lemmy to proxy media would be very helpful. Small instances could perhaps disable it, although they might not need to, since the additional load would scale with the number of users on that instance.
I suspect with a coordinated pool of posts or multiple comments on the same post, you could narrow that IP address down to an actual user account.
When a new comment is posted by a user, store, against their username, all IP addresses that visited since the last comment in that thread (by anyone). When a second comment is posted by a user, remove any IP addresses that don't appear in both lists.
I suspect you would have a very short list after two comments, and a single address after three. It would also be extremely easy to both lure someone into viewing an image and bait them into multiple replies. Geolocate that IP and you now know roughly where that user lives.
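The narrowing attack described above is repeated set intersection. A hypothetical sketch, assuming the attacker sees every IP that fetched their image and knows when each comment in the thread was posted; `CommentCorrelator` and its methods are illustrative names.

```python
from __future__ import annotations


class CommentCorrelator:
    """Narrow down which image-fetching IP belongs to which commenter."""

    def __init__(self) -> None:
        self.candidates: dict[str, set[str]] = {}
        # IPs that fetched the image since the last comment in the thread (by anyone).
        self.ips_since_last_comment: set[str] = set()

    def image_hit(self, ip: str) -> None:
        self.ips_since_last_comment.add(ip)

    def comment_posted(self, username: str) -> set[str]:
        seen = set(self.ips_since_last_comment)
        if username in self.candidates:
            # Keep only IPs that showed up before every comment by this user.
            self.candidates[username] &= seen
        else:
            self.candidates[username] = seen
        self.ips_since_last_comment.clear()  # start a new window
        return set(self.candidates[username])
```

In practice, shared addresses (CGNAT, campus networks) and users whose IP changes between comments make the result noisier, and the intersection can even come out empty.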
Time to make sure you're always on a VPN I guess.
You could also send the image through a DM if you want to find a particular user.
Oh yeah, that'd be much less effort.
Even without that, once your Lemmy interests are sold/shared by IP address, they can be associated with your real identity as soon as you log in to a service that knows who you are.
Were you expecting otherwise? Loading an external image is no different from loading an external website with images. Lemmy and Reddit are link aggregators, not proxies. Having to proxy everything would mean significant bandwidth costs for instance admins, who are often paying for hosting out of pocket.
Any chance that's why this account is posting the same image and gibberish? @googa
From what I remember, that image was hosted on hexbear.net, so I don't think so.
How do you get an image to run code? I guess I somehow missed something important in website development.
Edit: I saw that you said you're using Pillow to actually render the image from code. That's neat! ...and scary
Proxying external images means that instead of the image being downloaded from the original link, your Lemmy server would download it and serve it for you. The Lemmy server acts as a proxy.
But it means a lot of extra traffic. And realistically you'd want to cache the image, because otherwise your server will likely get banned for the high volume of requests it sends. But caching the images requires more storage and carries potential legal issues.
And images are one thing, but literally any content has this problem. Images are just the most obvious because they often load without you even having to click on them, so they leak a far higher volume of user data. Literally anything you link to has this issue, and you cannot proxy all of it.
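A minimal sketch of the proxy-with-cache idea, assuming a simple in-memory cache with a fixed TTL. `CachingImageProxy`, the `instance-image-proxy/0.1` User-Agent and the one-hour default are hypothetical; a real instance would also need persistent storage, size limits and eviction, which is where the storage cost comes from.

```python
from __future__ import annotations

import time
from typing import Callable
from urllib.request import Request, urlopen


def fetch_remote(url: str) -> bytes:
    """Fetch from the origin with a fixed identity: the instance's, not the user's."""
    req = Request(url, headers={"User-Agent": "instance-image-proxy/0.1"})
    with urlopen(req, timeout=10) as resp:
        return resp.read()


class CachingImageProxy:
    """Serves remote images to users, hitting the origin at most once per TTL."""

    def __init__(
        self,
        fetch: Callable[[str], bytes] = fetch_remote,
        ttl_seconds: float = 3600,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.fetch = fetch
        self.ttl = ttl_seconds
        self.clock = clock
        self._cache: dict[str, tuple[float, bytes]] = {}  # url -> (fetched_at, body)
        self.origin_requests = 0

    def get(self, url: str) -> bytes:
        now = self.clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # served from the instance; the origin sees nothing
        self.origin_requests += 1
        body = self.fetch(url)
        self._cache[url] = (now, body)
        return body
```

Injecting `fetch` and `clock` keeps the sketch testable; with the defaults it would really download through `urllib`.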
Share source code? I'm curious
It's just a simple Flask server. I parse the user-agent using the `user_agents` Python library, apply some conditionals upon the result, render the image using Pillow and send it to the user.
[This comment has been deleted by an automated system]
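OP's actual conditionals aren't shown. As a hedged, stdlib-only stand-in for the `user_agents` library, the sketch below does crude substring matching on the User-Agent string and falls back to "unknown" or "unknown (mobile?) client", the two fallbacks people report seeing in this thread. The tables, labels and `describe_client` are assumptions; the real library is far more thorough. The resulting string would then be drawn onto an image, which OP did with Pillow.

```python
from __future__ import annotations

# (substring, label) pairs checked in order. Order matters: Edge's UA also
# contains "Chrome/", Chrome's contains "Safari/", Android's contains "Linux"
# and iOS UAs contain "like Mac OS X".
BROWSERS = [
    ("Firefox/", "Firefox"),
    ("Edg/", "Edge"),
    ("Chrome/", "Chrome"),
    ("Safari/", "Safari"),
]
SYSTEMS = [
    ("Android", "Android"),
    ("iPhone", "iOS"),
    ("iPad", "iOS"),
    ("Windows", "Windows"),
    ("Mac OS X", "Mac OS X"),
    ("Linux", "Linux"),
]


def first_match(user_agent: str, table: list[tuple[str, str]]) -> str | None:
    for needle, label in table:
        if needle in user_agent:
            return label
    return None


def describe_client(user_agent: str) -> str:
    browser = first_match(user_agent, BROWSERS)
    system = first_match(user_agent, SYSTEMS)
    if browser and system:
        return f"You are using {browser} on {system}"
    # Apps may send a library-style UA or only a platform hint, so fall back.
    if system in ("Android", "iOS") or "Mobile" in user_agent:
        return "unknown (mobile?) client"
    return "unknown"
```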
Finally. Someone noticed 🥹
Joke's on you. IP geolocation where I am is an unreliable mess and your image got it wrong by about 1000km!
[This comment has been deleted by an automated system]
Location is right, but I highly doubt anyone near me is using Lemmy (dictatorship here).
[This comment has been deleted by an automated system]
Great, hot milfs near my location
Thanks for the heads-up.
Routing my Lemmy mobile app through Orbot from now on. Seems to have fixed the issue.
I’m not using a VPN or anything and it got my location wrong by 700 kilometers 🤔
Are you sure you are where you think you are? When's the last time you looked outside?
Oh no! I've been kidnapped!
Woah, this is really cool. Though it was way off for me, and I'm not on a VPN right now.
[This comment has been deleted by an automated system]
You can run Geolocation with images now? What the heck? How?
[This comment has been deleted by an automated system]
It's not the image, it's a normal image. The server does the hard work when you make the request, and then it just builds the image accordingly.
Yeah, I saw OP's explanation in the comments. That is fucking cool! And scary! I've never needed to generate images with code before, so I've never even considered something like this.
I wonder why the Baltimore community is so dead, then.
Thought about adding the user's location, but was worried PythonAnywhere could somehow cache the image between multiple people. A great demo though!
You have the code for this? Very interested in how you implemented it
Damn, PHP is such a sleeper of a language; I always forget how useful it can be. Thanks for sharing!
[This comment has been deleted by an automated system]
Nice, sounds like it's getting modernized. I'll have to give it another round, thanks!
Genuinely curious, how is it superior to Python in your opinion?
Edit: Apart from the things you listed 😅
It can run natively on an Apache server without any frameworks required to render user website markup and serve pages. That's a pretty awesome advantage.
PHP is the OG bad-ass for getting shit done. No setup, no compile, no deployment pipelines. Hell, you can create and write the files right there on the server with nothing more than an SSH terminal if you want.
PHP is pretty damn awesome really... Sad that it's gone out of favor IMHO
My location is accurate, to give some good feedback on your program too lol
[This comment has been deleted by an automated system]
Hah, not my town, but close. That's where my ISP is located though.
I'm not using a VPN and the location isn't accurate.
Hey. I wanted to do this tomorrow.
Well I have a new idea which is pretty similar
[This comment has been deleted by an automated system]
I'm planning to make one of these "dox'd memes" where someone says something controversial and someone else answers with their IP address.
[This comment has been deleted by an automated system]
It’s got me about an hour from where I actually am
I hate this so much. It's super cool, but MAN, what the hell. I don't think I'm ever going to turn off my VPN again. I'm in a super small town and that image is correct.
It's cached somewhere because I can't get it to update. Maybe time for a new account too. Hmmmm
[This comment has been deleted by an automated system]
Yeah, app cache had to be cleared. We good
I'm fine with this. Instances shouldn't proxy or cache images because it opens instance owners to a lot more liability than text. A client side setting to not load images in comments by default is better.
Each instance stores post thumbnails locally even if the post was on another server. It actually takes up quite a bit of hdd space.
Same for Liftoff on Android
Doesn't know it's Sync.
Voyager on Android
Which would be correct as Voyager is a Web App
Oh neat, Jerboa doesn't identify itself. Cool.
Same on Sync, and on Infinity.
I get "unknown (mobile?) client" using Jerboa
What is it supposed to say?
"You are viewing this from The Black Pearl, Davy Jones."
It names your browser and OS.
It got mine wrong because I changed the default user agent and platform in the browser.
Salient demonstration, but if image proxying were to come to Lemmy I'd hope it was made optional, as it could overburden smaller instances, especially one-person instances (like mine). We also need a simple integrated way of configuring object storage.
[This comment has been deleted by an automated system]
I'll be damned. I tried this from three different platforms and you've nailed it.
I'm using Firefox on Mac and it thought I was on Windows. Still a big issue, though.
It said I'm on Mac OS X, but that's wrong. It's been macOS for some years now. /s
It still makes me wanna cry.
Yeah, I just use whatever the `user_agents` Python library gives me as `user_agent.os.family`.
VPN using Librewolf user checking in. This post got nothing on me.
The easiest way to stop this from happening is to use uBlock Origin to block all third-party requests on your instance.
One way to do this is via dynamic filtering. This is for advanced users so be sure to read the info page: https://github.com/gorhill/uBlock/wiki/Dynamic-filtering
(Consider backing up your ublock settings before doing this)
If you are using lemmy.ml, your rule would be this:
lemmy.ml * 3p block
If you're using another instance, change the domain, or use rules for both, because you might end up visiting the others as well. Note that adding this rule won't work unless you enable advanced features in uBlock Origin.
EDIT: THIS MIGHT BREAK THINGS ON YOUR INSTANCE. It's recommended to learn how to use dynamic filtering to unbreak it: https://github.com/gorhill/uBlock/wiki/Dynamic-filtering:-quick-guide If it breaks stuff, just remove that rule.
You could also block it using static filters but I can't remember how to do that exactly, if you know please reply below.
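If you browse several instances, a tiny helper can generate one dynamic rule per instance in the `lemmy.ml * 3p block` format, ready to paste into uBlock Origin's "My rules" pane. Each rule has the form `source destination type action`; the helper and the instance names are just an illustration.

```python
from __future__ import annotations


def dynamic_block_rules(instances: list[str]) -> str:
    """Build one uBlock Origin dynamic filtering rule per Lemmy instance.

    Source is the instance, "*" means any destination, and "3p" covers
    every third-party request, not only images, so some embeds may break.
    """
    # Normalize and de-duplicate while keeping the caller's order.
    hosts = dict.fromkeys(h.strip().lower() for h in instances if h.strip())
    return "\n".join(f"{host} * 3p block" for host in hosts)


print(dynamic_block_rules(["lemmy.ml", "lemmy.world"]))
# lemmy.ml * 3p block
# lemmy.world * 3p block
```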
Yeah, I'm using Mullvad with misc DNS blockers enabled so it has nothing on me ᕕ( ᐛ )ᕗ
I'm also on Jerboa, but on a Samsung with GPS, and it also tells me unknown device. Must be Jerboa.
It says unknown (mobile?) client for me too, using Sync with Bluetooth and location enabled and Play Store Services installed.
Whoever wrote that image tracking over-hyped it?
The user-agent detection definitely isn't great; this was just meant as a quick proof of concept for anyone curious.
It successfully identified Firefox when I checked it from the browser. Maybe some of the apps don't identify themselves in the useragent string?
I would've hoped that Lemmy users in a community called privacy would understand the technology better, but I guess not.
For a little extra creepiness, modify the image-generating script to add GeoIP location data and the HTTP referer to the image.
[This comment has been deleted by an automated system]
Thought about adding the user's location, but was worried PythonAnywhere could somehow cache the image between multiple people.
What does it say? On Jerboa it states that I use an unknown mobile client, and with Infinity, an Android client. All I have is AdAway on my phone.
Man, I remember I scared the crap out of trolls on Reddit when we started arguing over DM, and I added a link to a meme that tracked their IP and system info (without them knowing ofc). Let's just say they went AFK quickly after that. Good times!
unknown device?
unkown
The unkown sounds pretty fucking scary to me
Ok.. I usually don't laugh much at comments and replies but that right there was pretty funny... I don't care who ya are.
Oh, how did I not notice that before? It should be fixed now.
Still says unkown for me.
The user-agent detection definitely isn’t great. If it doesn't recognize a client, it just says unknown. But that wasn't the main point of the post anyway, this was just meant as a quick proof of concept for anyone curious.
What's the point of unknown?
Image proxies are a must have, let's hope we get those soon!
I feel like there isn't a real way to fix this, since Lemmy isn't a single service; I can choose any image host I want. The only way I could think of would be to have your instance download the images, but that's currently not even supported on Mastodon-like platforms. The only thing you can do on Mastodon that I'm aware of is cache the images on your own server, which could get costly.
Even without an instance proxy, it should be easy enough on the client side to not pull remote images unless directed to, similar to most email clients these days. At least it warns people that they're passing data to a third-party location.
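A sketch of that client-side option: before rendering a comment's Markdown, rewrite images from untrusted hosts into ordinary links, so nothing is fetched until the user clicks. It assumes the client has the raw Markdown and a set of hosts it trusts (for example the user's own instance); `defer_remote_images` and the link label wording are hypothetical, and the regex only covers the basic `![alt](url "title")` form.

```python
import re
from urllib.parse import urlsplit

# Basic Markdown image syntax: ![alt](url) or ![alt](url "title")
IMAGE_RE = re.compile(r'!\[([^\]]*)\]\((\S+?)(?:\s+"[^"]*")?\)')


def defer_remote_images(markdown, trusted_hosts):
    """Replace images from untrusted hosts with links the user has to click.

    markdown: comment or post body as raw Markdown.
    trusted_hosts: set of hostnames whose images may load automatically.
    """

    def replace(match):
        alt, url = match.group(1), match.group(2)
        host = urlsplit(url).hostname
        if host is None or host in trusted_hosts:
            # Relative URLs point at the instance itself; trusted hosts are kept.
            return match.group(0)
        label = alt or "image"
        return f"[{label} (external image from {host}, click to load)]({url})"

    return IMAGE_RE.sub(replace, markdown)
```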
That's pretty stupid for a platform mainly based on images.
Maybe, but on the flip side, as an instance owner, I don't necessarily want my node to show up in the logs accessing questionable content on behalf of the end user.
That's why it isn't done this way I guess
And because it's more resource efficient.
I got mobile client from Liftoff.
Holy shit. How do we avoid this? VPN?
By not using internet. No, seriously, if you access something over the internet, you will leave tracks. This here post is nothing new or inherently scary on its own. I used to have forum signatures that would tell people what browser they were using or from what IP they were coming.
What you really want to do is disable third-party cookies on everything you own. That (and things like HSTS supercookies) is what tracks you.
If you’re using an app to browse Lemmy, you might ask for their implementation to reject cookies and fingerprinting attempts when displaying images and other embeddables.
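On the app side, the image request can be built so that it carries no cookies, no Referer and only a generic User-Agent. A minimal Python sketch with `urllib`; `GENERIC_UA`, `build_image_request` and `fetch_image` are illustrative names, and none of this hides the IP address, which still needs a VPN or Tor.

```python
from urllib.request import Request, build_opener

# Hypothetical: one uninformative value shared by every user of the app,
# so this header cannot tell users apart.
GENERIC_UA = "Mozilla/5.0"

# build_opener() without extra handlers has no HTTPCookieProcessor, so
# cookies set by image hosts are never stored or sent back.
opener = build_opener()


def build_image_request(url: str) -> Request:
    # urllib/http.client add a few transport headers themselves (Host,
    # Connection, Accept-Encoding), but never Referer or Cookie.
    return Request(url, headers={"User-Agent": GENERIC_UA, "Accept": "image/*"})


def fetch_image(url: str, timeout: float = 10.0) -> bytes:
    with opener.open(build_image_request(url), timeout=timeout) as resp:
        return resp.read()
```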
a minute later edit: And yeah, if you don’t like web services to know the IP address given to you by your ISP, VPN is a decent option.
I would say a user agent spoofer would be more useful for this particular image. The Mozilla team recommends User-Agent Switcher and Manager for Firefox users.
Where can I learn more about using this Firefox extension? I've installed it, but it hasn't changed the results of (https://trilinder.pythonanywhere.com/image.jpg).
I see I am able to blacklist pythonanywhere.com.
That's weird. The extension should definitely work with the image, as that's what I used when building this quick demo. Does the content of a site like this update?
Here's a six-minute YouTube video explaining how to use it
TL;DW: Click on the extension icon, use the drop-down lists to find a browser and OS, select a pre-configured user-agent string from the list, and click "apply (container)" or "apply (all windows)". Having your user-agent string change randomly with each request is possible but requires writing a bit of JSON in the options.
TY!! That link works on Invidious, Yay! I'll check it when I get a break.
Wow! But mine didn't. Which filter lists are you using?
I’m using a VPN, and the picture knows everything about me regardless.
Joke's on you! I use a Firefox extension that spoofs my browser profile. https://addons.mozilla.org/en-US/firefox/addon/chameleon-ext/
All these people correcting the result effectively giving useful data to improve data collection and detection methods.
It is because the website providing the image is overloaded and cannot create an image.
You just have to reload the image and eventually you will see one.
Lemmy clients should really include an option to group or only show the first instance of a link for cases like this; where the same link is posted to multiple places.
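A rough sketch of that grouping, assuming the client has (post id, link) pairs: normalize each link crudely (lowercase scheme and host, drop the fragment and a trailing slash), then either group posts per link or keep only the first one. `normalize_link`, `group_crossposts` and `first_of_each` are hypothetical; a real client might also strip tracking query parameters.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator
from urllib.parse import urlsplit, urlunsplit


def normalize_link(url: str) -> str:
    """Crude normalization so trivially different copies of a link compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    # Drop the fragment; keep the query, since it can change what is served.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def group_crossposts(posts: Iterable[tuple[int, str]]) -> dict[str, list[int]]:
    """Map each normalized link to the ids of every post that shares it."""
    groups: dict[str, list[int]] = defaultdict(list)
    for post_id, link in posts:
        groups[normalize_link(link)].append(post_id)
    return dict(groups)


def first_of_each(posts: Iterable[tuple[int, str]]) -> Iterator[int]:
    """Yield only the first post id seen for each normalized link."""
    seen: set[str] = set()
    for post_id, link in posts:
        key = normalize_link(link)
        if key not in seen:
            seen.add(key)
            yield post_id
```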
User Agent Switcher and Manager
The image doesn't load on Alexandrite.
Whoa, I'm totally pigged out