Donations
If you would like to make a donation to support the cost of running this platform, please do so at the mastodon.world donation URLs:
https://opencollective.com/mastodonworld
https://patreon.com/mastodonworld
Er, because we should all be working together to try to help Lemmy grow and be stable…?
You can run on a single box, but a single problem will bring down your single box. This is a basic problem commonly discussed in DevOps circles.
For example, one reasonably priced server on most providers is like $20-40/month. Say a load balancer as a service is another $10-20, and a database server or database as a service is also like $20-$40. A distributed, redundant setup would be like 2 webservers, a database, and a load balancer so like, $70?
I’ve deployed clustered applications myself, I just haven’t looked into doing it with Lemmy and was curious if they had a run book or documentation.
4) Even if you do want a pizza, you should probably be careful with this provider. In testing, I once nearly ordered every item on the Domino's menu, which would probably have been expensive and embarrassing.
A computer lets you make more mistakes faster than any invention in human history -- with the possible exceptions of hand guns and tequila.
I really enjoy your transparency and style of communication!
Comparing it to Spez and how Reddit became prior to the migration, this is such a refreshing change
/u/Ruud is like /u/Spez but only if /u/Spez was actually cool.
Sooo… /u/Ruud is nothing like /u/Spez? Same energy as “Communism is like Capitalism but only if Capitalism got rid of the concept of capital”.
Yes that was the joke.
It gets me every time seeing people using the product I build 🥹
You worked on Grafana? Your product is awesome; I use it in my homelab for performance metrics.
Yes, I'm one of the designers 👍🏾
Poggers. Couldn't live without it. Thank you for your work!
Love Grafana, especially the new UI. Great work, man. :)
That’s so cool! Grafana is awesome, the whole team did a great job
Grafana is one of those tools which everyone should use if they have something they maintain themselves. Superb tool.
Thank you very much.
Do you work on Loki too?
Grafana is the most essential application in my job. I can use Notepad to code in a world without IDEs. I couldn't keep a damn thing running in the real world without Grafana. And I've been forced against my will to use alternatives in the past.
How did you learn it?
Basically brute force, I'm not great with it but I was the one on my team responsible for setting up our dashboards. I wrote the prometheus metric collection in our microservices and built the dashboards from that data.
There are tons of free dashboards though for monitoring resources and such so a lot of things I use are just downloaded from the Grafana website. And the docs are good too. So looking at examples + documentation is how I learn. It would be helpful if I was better with math though.
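For anyone curious what "Prometheus metric collection" in a service boils down to: the service exposes plain-text metrics over HTTP and Prometheus scrapes them. A real service would use an official Prometheus client library; this stdlib-only Python sketch just renders a counter in the Prometheus text exposition format so the shape of the data is visible. The `Counter` class and the `http_requests_total` metric with its labels are hypothetical.

```python
from collections import defaultdict


def _escape(value: str) -> str:
    # Label values must escape backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class Counter:
    """Hypothetical, minimal stand-in for a client-library counter."""

    def __init__(self, name: str, help_text: str, label_names: tuple):
        self.name = name
        self.help_text = help_text
        self.label_names = label_names
        self._values = defaultdict(float)  # label-values tuple -> count

    def inc(self, *label_values: str, amount: float = 1.0) -> None:
        if len(label_values) != len(self.label_names):
            raise ValueError("wrong number of label values")
        if amount < 0:
            raise ValueError("counters can only increase")
        self._values[label_values] += amount

    def render(self) -> str:
        """Render every series in the Prometheus text exposition format."""
        lines = [f"# HELP {self.name} {self.help_text}",
                 f"# TYPE {self.name} counter"]
        for values, count in sorted(self._values.items()):
            pairs = ",".join(f'{k}="{_escape(v)}"'
                             for k, v in zip(self.label_names, values))
            labels = f"{{{pairs}}}" if pairs else ""
            number = str(int(count)) if count.is_integer() else repr(count)
            lines.append(f"{self.name}{labels} {number}")
        return "\n".join(lines) + "\n"


http_requests = Counter("http_requests_total", "Total HTTP requests handled.",
                        ("method", "code"))
http_requests.inc("GET", "200")
http_requests.inc("GET", "200")
http_requests.inc("POST", "500")
print(http_requests.render(), end="")
```

Running it prints the `# HELP` and `# TYPE` lines followed by `http_requests_total{method="GET",code="200"} 2` and `http_requests_total{method="POST",code="500"} 1`. Served from an HTTP endpoint, that text is what Prometheus scrapes and what Grafana dashboards end up querying through Prometheus.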
I guess it's time to start browsing the dashboards, thanks!
You work at Grafana?
It's been very snappy today, nice work! Is it all under Docker Compose with the node handling Nginx and Postgres as well?
Yes.
Why did you guys roll back the UI to .7 from .10? I enjoyed some of the UI improvements, but I guess there were some bugs?
Edit: I see it's back to .10; maybe I had a browser tab open from before that I never refreshed.
I'm really grateful for your and your colleagues' work. Thank you for letting us lemmy around here!!!
Dang that's a lot of RAM
mastodon.world has the same server but with twice the RAM :-)
What chassis? I’ve got 256GB in an R720 but only 32 cores here!
It's an AX161 server at Hetzner.
€142 is more reasonable than I expected! I’ll toss some cash to help!
You should see some of our VM hosts at work...
I can’t believe how fast you’ve managed to crowdsource and fix things on this instance. I haven’t seen many problems at all sharing comments and things.
From the lemmy.world front page:
Where on the front page can we see this?
Edit: thank you all!
It's on the right-hand sidebar of lemmy.world:
Awesome! I'm on mobile, so I cannot see it. Will check it out when I get to my computer.
You can view sidebar on mobile. I think it's in the three dots, but it's somewhere!
EDIT: On Jerboa it's under Community Info, under the three dots. On the mobile web app for L.W. there's a sidebar button.
Just go to lemmy.world and click sidebar.
This is awesome! As a systems engineer for my day job, I love seeing stuff like this!
Some of my usage is in this data and I like that.
Pretty gauges. The instance seems to be more stable/responsive today.
How much is that in beans?
At least 1
Possibly 2
Let’s not go crazy
About tree fiddy
Damn that’s a huge chunk of (what looks like) a 64 core CPU there. Impressive!
It’s cool it can aggressively cache that much. Although I am perplexed why one would have a swap file configured in this case? What does it give you here? Sorry not trying to be an elitist or anything just have no idea what advantage you get!
To be honest I tend to use swap less and less. But this was in the build that Hetzner does and I didn't remove it.
If your application goes wild with RAM usage, a properly configured swap will make sure the underlying OS remains responsive enough to deal with it.
The OOM killer is usually triggered after it starts hitting the disk. Which means your system is unresponsive for a long time until it finally kills something.
Using something like oomd can help trigger before it hits swap but then why are you using swap in the first place?
The bigger issue is that the kernel sometimes ignores the swappiness setting and will evict code/data pages long before file cache, even when it's set to 0 or 1. I'm still not sure if that was because of an Ubuntu patch or if it was an issue that's been resolved in the years since I last saw this.
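The "act before the box grinds into swap" idea can be sketched as a threshold check on /proc/meminfo. This Python sketch is hypothetical: the 10% thresholds and the both-must-be-low rule are illustrative, loosely in the style of userspace early-OOM tools. oomd and systemd-oomd actually base their decisions on kernel pressure stall information (PSI), which this does not model.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value}; values are in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def should_act(meminfo: dict, min_avail_pct: float = 10.0,
               min_swap_free_pct: float = 10.0) -> bool:
    """Hypothetical rule: act only when BOTH available RAM and free swap are low.

    A machine with no swap configured counts as having 0% swap free, so the
    decision then depends on available RAM alone.
    """
    avail_pct = 100.0 * meminfo["MemAvailable"] / meminfo["MemTotal"]
    swap_total = meminfo.get("SwapTotal", 0)
    swap_free_pct = (100.0 * meminfo.get("SwapFree", 0) / swap_total
                     if swap_total else 0.0)
    return avail_pct < min_avail_pct and swap_free_pct < min_swap_free_pct
```

On a Linux host you would feed it the contents of /proc/meminfo on a timer; what to actually kill or restart when it returns True is deliberately left out.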
How far do you see lemmy.world capable of scaling to? One thing I've been noticing is the centralisation of Lemmy users on a few top servers, surely that cannot be healthy for federation? What are your thoughts on this?
Not entirely sure of what you are asking, but the only reason they need a clustered setup is simply because of their scale. Making the details of their setup public does not help with the issue I addressed, since in an ideal scenario, communities and users would be evenly distributed amongst the many Lemmy instances in the fediverse, making the need to do any sort of clustering for performance reasons unnecessary.
We do run on 1 server, but we’ve now seen that Lemmy scales horizontally so the k8s path forward is open 😊 With all these latest improvements we can have a bit more users on the current box.
Oh, I could've sworn I read somewhere that you went multi-server. Maybe I'm confusing it with another instance.
Not trying to be pedantic, but why do they have to do so? Why can't people figure it out themselves? Also, why can't Lemmy instances run on single non-redundant boxes? Most instance operators don't have the budget of enterprises, so why would they have to run their Lemmys like enterprises?
I agree with this point, but I disagree with the context in which you said "They should post their clustered setup so others can replicate more easily", right as a reply to my original comment asking how Ruud felt about the centralisation of users in a federated application. This should've been an entirely separate reply, or perhaps an issue on GitHub for the Lemmy authors.
Again, I agree, but the context in which you mentioned it basically suggests that everyone who runs a single-instance Lemmy is doing it wrong, which I disagree with.
Lowering the entry requirements is part of how we can get widespread adoption of federated software, not telling people that they need at least 2 instances with redundancy or else they're doing it entirely wrong.
The bare minimum I would ask of anyone running their own instance is to have backups. They don't need fancy load balancers, or slaved Postgres database setups, or even multi-node Redis caches for their instances of sub-thousand users.
Seriously? That may be an acceptable price tag for an extremely public Lemmy host, like lemmy.world or lemmy.ml, but in no way should it be a reasonable price tag for the vast majority of Lemmy instances set up out there, especially when most of them have sub-thousand users. $70/mo? That has to be a joke. You can easily host a Lemmy on a $5-$10 droplet for ~100 users.
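For reference, the $70/month figure being argued about is the low end of the ranges quoted earlier in the thread (2 web servers at $20-40, a load balancer at $10-20, a database at $20-40). A quick Python check using only those ballpark numbers, which are the commenter's estimates rather than real provider prices:

```python
# Commenter's ballpark monthly prices in USD as (low, high); not real quotes.
WEB_SERVER = (20, 40)
LOAD_BALANCER = (10, 20)
DATABASE = (20, 40)


def monthly_cost(web_servers: int) -> tuple:
    """(low, high) monthly cost for N web servers + one LB + one database."""
    low = web_servers * WEB_SERVER[0] + LOAD_BALANCER[0] + DATABASE[0]
    high = web_servers * WEB_SERVER[1] + LOAD_BALANCER[1] + DATABASE[1]
    return low, high


print(monthly_cost(2))  # (70, 140)
```

So $70 is the floor of that estimate and $140 the top of it; next to a single $5-10 droplet, even the floor is 7 to 14 times the cost.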
No offense, but you definitely seem like the kind of person to shill for cloud-scaling and disregard cost-savings.
How much is this costing you? Also who is your host? Is it on a virtual machine?
They have a dedicated server: https://lemmy.world/post/75556
It's actually pretty funny to see him mention the growth (almost 12k users!) considering they've added, what, 50k or so users recently?
I signed up three days before that post. They were the largest instance with open signups. Almost 1000 users.
Whoa, cool. Thanks. Only a matter of time until it gets overloaded though. Can't Lemmy run in a container service like Cloud Run or AWS App Runner?
Yeah, you could do it in AWS with ECS or Fargate.
https://github.com/jetbridge/lemmy-cdk
Indeed you can, very cool.
Dedicated means local?
Dedicated usually means it’s not splitting cpu time with another instance. It could mean a local machine but it does not have to be one.
Tbh I'd find it hard to believe it's local, so maybe it's cloud computing but a standalone instance, as you just said.
No, it means it's got the physical machine all to itself. It's a rented server located in a Hetzner data center.
My homies love dedicated servers
I know that the RAM cache is just taking advantage of otherwise free RAM and will be dropped in favor of anything else, but it does stress me out a bit to see it "full" like that.
It would stress me even more to see a lot of RAM doing nothing, that would be a shame! ;-)
Difference between Windows and Linux. Windows would only use what it needs. Linux pre-empts more and fills the RAM with what could be needed.
It used to stress the shit out of me when I switched to Linux as I'd gotten used to opening task manager and seeing 90% free RAM. On Linux I'd be seeing 10% free and panicking thinking it was a resource hog.
The Linux-way is the best way.
I use Arch btw ;)
Both OSes do pre-caching and for both the standard tools to check usage nowadays ignore pre-cached elements when counting RAM usage.
I had a feeling that 'factoid' may be out of date! Since I learnt it about the time of Windows XP when we were shown examples of how Linux and Windows memory management differed. It all made sense why Linux seemed to have full RAM even after a big upgrade but WinXP gave the 'illusion' of having lots of free RAM to use. ~ 20yrs ago!
I think we used SuSE Linux 7.3!
I still hold a savage hatred of all RPM-based distros after dealing with the hell of early-2000s editions (Red Hat, Mandrake & SuSE). Though I did like SuSE KDE's colours when it worked!
But Windows also does pre-caching?
It probably just didn't mark that memory as "used" in the task manager.
I discovered this about 20yrs ago and there's been a lot of drugs & drink since then.
I do remember I could open my shit-hot 256MB RAM desktop with Windows XP Task Manager and it showed a whopping 128MB free RAM. 😎
Then I'd boot into my '733T H4X0r' SuSE Linux 7.3 and top would show 5MB free RAM. 😱
This caused much upset until I found out the two OSes have (had?) fundamentally different memory utilisation philosophies.
May not be the case anymore but it was late 90s/early 00s.
That's how it supposed to work, free RAM does nothing :)
It’s free real estate!
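To see why a Linux box can look "full" while having plenty of headroom, split /proc/meminfo into application memory and reclaimable cache. This Python sketch is a simplification: it treats Buffers, Cached and SReclaimable as reclaimable, and the example numbers are made up. Real tools such as `free` differ slightly between versions in exactly where they draw these lines.

```python
def memory_breakdown(m: dict) -> dict:
    """Split RAM (kB, /proc/meminfo field names) into used/cache/free/available.

    Simplification: Buffers + Cached + SReclaimable count as cache that the
    kernel can hand back to applications when they need it.
    """
    cache = m["Buffers"] + m["Cached"] + m.get("SReclaimable", 0)
    return {
        "used": m["MemTotal"] - m["MemFree"] - cache,  # what apps really hold
        "cache": cache,
        "free": m["MemFree"],                          # truly idle RAM
        "available": m["MemAvailable"],                # kernel's own estimate
    }


# Made-up numbers shaped like the "only 10% free" panic described above.
example = {"MemTotal": 1000, "MemFree": 100, "Buffers": 20, "Cached": 700,
           "SReclaimable": 30, "MemAvailable": 820}
print(memory_breakdown(example))
```

Here only 10% shows as free, yet applications hold just 15% of RAM, and MemAvailable (the kernel's estimate of memory available for starting new applications without swapping) reports 82%.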
If you had this much buffer memory what are the reasons to have swap space as well?
With my servers, I’m paranoid that having swap enabled will inadvertently slow stuff down. Perhaps there’s a reason to have it that I’m unaware of?
Many programs do stuff once during startup that they never do again, sometimes creating redundant data objects that will never get accessed in the configuration it's being run in. Eventually the kernel's memory manager figures out that some pages are never used, but it can't just delete them. If swap is enabled, it can swap them out to disk instead, freeing up that RAM for something more important. It's usually minor, but every few MB helps.
I personally like having some swap, as during low-memory situations (which Lemmy hits at least once a day on my small instance) everything slows down rather than getting culled by the OOM killer. It's not a replacement for monitoring, but it does extend the timeframe to react to things.
Memcache usually takes all the assigned memory regardless of usage so seeing high usage isn't always unusual. That's assuming the lemmy servers are using some kind of session caching solution.
I hate that radial graphs are so popular with Grafana dashboards. Radial/pie charts are terrible representations for humans to interpret. I tend to try and convert them either to a stat with the line/time display or to a bar chart. Humans are better at judging linear relationships than radial ones.
Who says I'm human?
Or are you dancer?
Killers ❤️
Radial graphs are a bit of a meme where I work as one of the C-suite managers despises them for precisely that reason.
Now that’s hot
As a server admin, I really hope it's not hot
2hot
Looks Awesome! Glad to see the patches seem to be working.
Awesome. Gotta love Grafana!
This is so cool to see. Thanks for posting! Lemmy.world has been super smooth today
Love me some grafana.
Every time I open a post and go back to the previous page, it scrolls back to the top. Is this fixable? I'm on Windows 11, Chrome.
I was hoping to see some uptime, but thanks for the window into your server! Are you still having to kill the instance every half hour?
It says uptime is 3.3 weeks in the top right.
Hmmm... maybe the instance uptime is different from the server uptime.
Great stats. Thanks for posting!
Always fun to see system dashboards.
Quite a beefy setup 😄
Thanks for all the hard work. It has been running so well all day!
I notice your defederation list is completely depopulated today. Is that intentional?
No it's just moved to the bottom of the page apparently. I preferred it on the side. Maybe a tab would be better.
On 0.18.1-rc.10 the defederated instances are at the very bottom, not on the right hand side.
OOoooooh! Thanks for the info.
Also infinitely less clear and helpful...
It may be a bug, I'm not sure.
THE DROP???!! >:O
Seriously! Talk about amazing optimization and debugging of the network service.
Is the memory leak still there?
No! Restarts are disabled and it's OK now!
Great! 😁 I was just wondering because the memory graph showed sharp falls in memory usage every ~30 mins.
That is probably the Garbage Collector running.
Rust has no garbage collector though. Memory is freed up as soon as the variable leaves the current scope.
I’m guessing the server was still set up to restart every 30 mins at the time this pic was taken. Then they tried disabling that and it was fine.
It was a mildly educated guess; I know very little about Rust.
I fucking love a sexy Grafana dashboard.
So I'm currently planning to host an instance myself. This graph helped me quite a lot to get an idea of what system resources are required.
Do you use any reverse proxy in front of it?
Nginx runs on the server, proxying to the Lemmy Docker containers.
That's what I had in mind: run nginx on a separate VPS so I can scale it more easily, and run fediverse instances in the back, either all on one VPS or on different ones. This way I could provide a hub while increasing performance (thanks to compression and caching) and provide redundancy/load balancing if necessary.
What's the typical traffic you experience? Peak (Gbit/s) and average/daily traffic (GB)?
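Peak bandwidth and daily volume are easy to mix up, so here is a quick Python conversion between the two units asked about. It assumes decimal units (1 GB = 10^9 bytes) and, for the per-day figure, a rate sustained around the clock, which real traffic never is.

```python
SECONDS_PER_DAY = 86_400


def gb_per_day(gbit_per_s: float) -> float:
    """Decimal GB moved in a day at a constant rate given in Gbit/s."""
    return gbit_per_s / 8 * SECONDS_PER_DAY  # /8 converts bits to bytes


def avg_mbit_per_s(gb_per_day_total: float) -> float:
    """Average rate in Mbit/s implied by a daily volume in decimal GB."""
    return gb_per_day_total * 8 * 1000 / SECONDS_PER_DAY


print(gb_per_day(1.0))        # 10800.0
print(avg_mbit_per_s(100.0))  # roughly 9.26
```

In other words, 1 Gbit/s sustained all day would be about 10.8 TB, while 100 GB/day averages under 10 Mbit/s, so the peak figure says little about the daily one.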
Thanks, that’s super helpful!
Lemmy world has a lot of users. So your instance initially will require a lot less resources ✌️
Yeah, I saw that. I'm a big fan of minimalistic yet super-performant architectures, and I'm just trying to get a feeling for how I could solve this problem. I try to avoid any downtime whenever possible.
Ahh look at all those nice charts and diagrams, that's true server porn lol.
Again thank you very much for your awesome job. We all really appreciate that <3
Can someone give me a hand? I see tons of posts of people talking about a picture in the OP, but I see nothing. Am I doing something wrong? Is my connection bad? This seems to be happening quite a lot. For example, the meme instance has almost zero pictures, but I know just about every post should have one.
hmm yeah it was gone.. need to investigate..
The entire team is doing an amazing job. Lemmy is getting smoother with each passing day. I hope it keeps growing (and none of you get too burnt out in the process)!
I think you can export the dashboard the way it looks to you into Grafana Cloud, like a snapshot. Click "Share", then "upload", and share the link.
We won't be able to see historical data, as it only snapshots the dashboard with the data currently visible.
Would be cool, wouldn't it?
Used some provisioning templates to get started 😁
🤤
This is indeed interesting, thanks again for the service!
Such pretty gobbledygook!
Nice! That's a nice-looking dashboard, would you mind sharing its JSON config? Thanks!
it's the popular one on grafana.com - https://grafana.com/grafana/dashboards/1860-node-exporter-full/
Thanks!
I could share the template, if ya like.
Thanks for the offering, but no worries, some user posted it and I found it already
Sexy loads.
I imagine sexy sax man playing in the background while watching these graphs.
You just can't beat the dopamine hit from "pointy chaos graph go smooth". Delicious. Great work!
Does Lemmy have a memory leak?
Lemmory meak?
Yes at least until yesterday's version...
heh
From those graphs, memory usage is very low. Most of it is being used for disk caching, which is what linux does with memory it has no other use for (may as well use it for something).
Yes, but we still restart the containers every 30 min. I'm gonna see if that's still needed after the recent changes.
Ah, so that's the reason for the regular dips in the memory graph I assume? They do indeed seem to be spaced every 30 minutes.
The consistent, sharp dips every 15 minutes made me assume that the container was being restarted.
These graphs were generated from https://github.com/prometheus/node_exporter (I believe, not my graphs). They're showing system-level data, not lemmy specific data.
Correct!
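node_exporter serves its readings in the same Prometheus text format on its /metrics endpoint, and memory panels like these are built from host-wide series such as `node_memory_MemAvailable_bytes` and `node_memory_MemTotal_bytes`, which know nothing about Lemmy itself. A simplified Python reader, assuming no timestamps or escaped quotes in label values (a real consumer should use a proper parser); the sample values are made up:

```python
import re

# name, optional {labels}, value; timestamps and escaped quotes not handled.
SAMPLE_LINE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)")
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text: str) -> list:
    """Return (metric name, labels dict, float value) for each sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE
            continue
        match = SAMPLE_LINE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            labels = dict(LABEL_PAIR.findall(raw_labels or ""))
            samples.append((name, labels, float(value)))
    return samples


scrape = """\
# TYPE node_memory_MemAvailable_bytes gauge
node_memory_MemAvailable_bytes 1.2e+11
node_memory_MemTotal_bytes 1.6e+11
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
"""
values = {name: v for name, labels, v in parse_exposition(scrape) if not labels}
used = 1 - values["node_memory_MemAvailable_bytes"] / values["node_memory_MemTotal_bytes"]
print(f"memory in use: {used:.0%}")  # memory in use: 25%
```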
I have a love and hate relationship with Grafana but it probably feels the same
Who you hosting with?
They have a dedicated server: https://lemmy.world/post/75556
I figured haha. I was wondering which company they used
Whois shows Hetzner which answers my question :D
Is that kibana or graphana?
That's the Grafana icon, and it says Grafana in the top right.
grafana, judging by the logo
Graphana
Is there anything Grafana can't do?
I have so many things pumping data “into” Grafana these days I’m surprised they haven’t tried to force me to pay for an enterprise license.
Anyway, thanks for sharing these, @[email protected]. As a performance engineer, I love to see this level of detail and commitment on your part to keep the user experience for lemmy.world at acceptable levels.
It can't make me pancakes.
Wrong tool for the job, but if you want to order pizza, you can use terraform:
https://registry.terraform.io/providers/MNThomson/dominos/latest/docs
I suppose you could then feed your Terraform runs into Grafana and use it to track your pizza consumption.
In the early days of the pandemic…and the early days of my Ansible learning…I set up a playbook to scrape several websites for hand sanitizer and Clorox wipes.
If it found one in stock, it would email my cell phone carrier's SMS gateway. Tasker would then make a loud audible alert.
Ran for weeks before it found some in stock. And then it did. At 2am. And again at 2:05, and 2:10, and 2:15…
And it was an error on the shop's webpage. It wasn’t actually orderable…once it got in your cart, it wouldn’t let you check out.
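The 2:05, 2:10, 2:15 repeats are the classic symptom of alerting on every poll that sees the condition; alerting only when the state changes avoids that. The original was an Ansible playbook, so this is just a Python sketch of the edge-triggered logic with made-up names, and it would not have caught the shop's bogus stock status.

```python
class StockAlerter:
    """Hypothetical sketch: alert on the out-of-stock -> in-stock transition only."""

    def __init__(self) -> None:
        self._last_seen = {}  # item -> was it in stock at the previous poll?

    def observe(self, item: str, in_stock: bool) -> bool:
        """Record one poll result; return True if this poll should alert."""
        was_in_stock = self._last_seen.get(item, False)
        self._last_seen[item] = in_stock
        return in_stock and not was_in_stock


alerter = StockAlerter()
polls = [False, True, True, True, False, True]  # e.g. 1:55, 2:00, 2:05, ...
print([alerter.observe("hand sanitizer", p) for p in polls])
# [False, True, False, False, False, True]
```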
Bwahaha:
Reminds me of the old adage:
I don't know what any of that means, but graag gedaan (you're welcome)!
DataIsBeautiful vibe
I am not seeing them. Are they gone?
That's ~19 cores pegged at 100%, eating 128GiB of RAM (OS disk cache included) and bleeding onto swap. 🤯
I think you're misreading it. The olive green in the CPU chart is idle. RAM cache taking up most of system memory is also normal on most Linux systems, even on desktop. That cache is freed for applications to use as needed.
Welp, my only calculation was "64 CPU threads * 30% load -> ~19 cores busy", so I may be guilty of rounding up too much... The RAM usage is interesting, however, since the kernel seems to be caching all it can, to the point of ejecting unneeded data into swap in order to retain the disk cache. If more RAM is reserved by running processes, disk access times for the (likely pict-rs and database) services will begin to degrade.
It could also be all 64 threads being used lightly with the scheduler trying to spread the load out evenly.
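The ~19 figure is 64 threads times 30% load, which gives 19.2 cores' worth of work but says nothing about how that work is spread. Summing per-CPU busy fractions gives the same total for very different situations, so checking the busiest CPU as well helps tell them apart. In a node_exporter setup, per-CPU idle fractions would typically come from the rate of `node_cpu_seconds_total{mode="idle"}`; this Python sketch just takes them as plain numbers, and both example machines are hypothetical.

```python
def cpu_summary(idle_fractions: list) -> tuple:
    """Return (busy core-equivalents, busiest single CPU) from per-CPU idle fractions.

    Each fraction is the share of the interval a logical CPU spent idle (0.0-1.0).
    """
    busy = [1.0 - idle for idle in idle_fractions]
    return sum(busy), max(busy)


# Two hypothetical 64-thread machines that both read as roughly "30% CPU".
spread_out = [0.70] * 64            # every thread 30% busy
pegged = [0.0] * 19 + [1.0] * 45    # 19 threads maxed out, 45 idle
print(cpu_summary(spread_out))      # about (19.2, 0.3)
print(cpu_summary(pegged))          # (19.0, 1.0)
```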
Not sure what the exact situation is regarding swap in that graph, but I've also had the kernel preemptively use swap for rarely used chunks of memory in favor of cache when running long-running processes. It's probably relatively normal.
This is cool! Thanks for sharing
Pretty cool stuff.
I love it, thank you!
calckey widgets