submitted 2 years ago* (last edited 2 years ago) by Averrin@lemmy.world to c/selfhosted@lemmy.world

Correct me if I'm wrong. I read the ActivityPub standards and dug a little into the Lemmy sources to understand how federation works, and I'm a bit disappointed. Every server just has a cache and the ability to fetch something from another known server. So if you start your own instance, it brings no benefit to the network as a whole until it has a significant audience (e.g. private instances or servers with no users add nothing). Are there any "balancers" to utilize these empty instances? Should we promote (or create in the first place) a way to passively help Lemmy with such fast growth?

[-] majorswitcher@lemmyfly.org 11 points 2 years ago

Every instance shares in the traffic of browsing the fediverse. No single server is responsible for serving all content; you (the instance admin) only serve your own members.

The downside of this is that a huge amount of replicated data is stored everywhere. Content from popular communities will be fetched and stored by many, many servers, filling up their disks and increasing storage and bandwidth bills for all of them.

[-] deadcyclo@lemmy.world 6 points 2 years ago

I'm not sure your second paragraph is correct. First of all, it's "just in time", so content will only be replicated if somebody on that instance is following it. But more importantly, I read a statement from a server owner somewhere that the software purges older content regularly (and refetches it "just in time" when somebody tries to view the old content) to keep storage size down.
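
To make the purge-and-refetch idea concrete, here is a minimal sketch of how such a store could behave. It is not taken from Lemmy's code; `PurgingCache`, `max_age`, `fetch_from_origin`, and the injectable `clock` are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class PurgingCache:
    """Toy local store that drops old entries and refetches them on demand."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], str],  # hypothetical remote fetch
        max_age: float,                           # seconds before an entry is purgeable
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin
        self._max_age = max_age
        self._clock = clock
        # post_id -> (time stored, body)
        self._store: Dict[str, Tuple[float, str]] = {}

    def purge(self) -> int:
        """Delete entries older than max_age and return how many were removed."""
        now = self._clock()
        stale = [
            post_id
            for post_id, (stored_at, _) in self._store.items()
            if now - stored_at > self._max_age
        ]
        for post_id in stale:
            del self._store[post_id]
        return len(stale)

    def get(self, post_id: str) -> str:
        """Return a post, fetching it from the origin only if it is not held locally."""
        entry = self._store.get(post_id)
        if entry is None:
            body = self._fetch(post_id)
            self._store[post_id] = (self._clock(), body)
            return body
        return entry[1]
```

Under these assumptions, a purged post costs the origin exactly one extra request the next time a local user opens it, which is the trade-off between storage and traffic being discussed here.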

[-] JustEnoughDucks@feddit.nl 3 points 2 years ago* (last edited 2 years ago)

If this is the case, then wouldn't his first paragraph be incorrect too? Because if it is "just in time" with quick purging, the main server still has to constantly serve the content to the other instances. It would only be beneficial if many users of an instance are trying to view the exact same content at around the same time (so for the "massive" communities maybe?)

[-] deadcyclo@lemmy.world 3 points 2 years ago

From my understanding you are correct. Each instance is responsible for serving all of the content of the communities created on it. So many small instances with a smaller number of communities each = good, a few huge instances with lots of communities = bad.

[-] majorswitcher@lemmyfly.org 1 points 2 years ago

The purging of older content is good news; I didn’t know that.

[-] Averrin@lemmy.world 5 points 2 years ago

Please elaborate on how "every instance is sharing in the traffic to browse the fediverse". I didn't find it in either the AP standards or the activitypub_federation lib docs. If there are some balancing mechanisms inside Lemmy's code, would you mind pointing me to them?

[-] majorswitcher@lemmyfly.org 3 points 2 years ago

Looking into the database, it contains many thousands of posts. I’m assuming these are stored in the local db for serving to instance members. So when you open a post from instance B on instance A, A fetches the post data from B, stores it in A's database, then serves the content from A's db to the browser.
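
The flow described above can be sketched as a toy model. This is only an illustration of the assumed behaviour, not Lemmy's implementation; `OriginInstance`, `LocalInstance`, `fetch_post`, and `view_post` are made-up names.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class OriginInstance:
    """Stands in for instance B, which hosts the post."""
    posts: Dict[str, str]
    requests_served: int = 0

    def fetch_post(self, post_id: str) -> str:
        # Every call here represents traffic that B has to handle.
        self.requests_served += 1
        return self.posts[post_id]


@dataclass
class LocalInstance:
    """Stands in for instance A, which keeps a copy in its own db."""
    origin: OriginInstance
    db: Dict[str, str] = field(default_factory=dict)

    def view_post(self, post_id: str) -> str:
        # Fetch from B only on the first view, then serve from the local db.
        if post_id not in self.db:
            self.db[post_id] = self.origin.fetch_post(post_id)
        return self.db[post_id]


b = OriginInstance(posts={"42": "hello from B"})
a = LocalInstance(origin=b)
for _ in range(5):
    a.view_post("42")
print(b.requests_served)  # 1
```

In this model five views on A cost B a single request, so A only takes load off B when several of A's members read the same content.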

[-] Averrin@lemmy.world 2 points 2 years ago

Yes, you are right, if the instance has members. A server will actively fetch "foreign" content and cache it when one of its users asks for it. But aside from the top 10 servers, there is no benefit in having more instances until each has a couple dozen users. If any server were able to "delegate" request handling to less busy servers, that would be a solution to this uneven load.

[-] jon@lemmy.tf 3 points 2 years ago

The replication isn't all that bad. Images stay on their home instance; the federated data is all JSON payloads and metadata. Yes, it will pile up over time, but only instances with hundreds of users and thousands of indexed communities are at risk of massive storage needs.

this post was submitted on 12 Jun 2023
147 points (100.0% liked)
