Are the storage expectations for self-hosting ATProto, including a relay, really 5 TB (with the expectation that this will also grow)? https://alice.bsky.sh/post/3laega7icmi2q
On Linode's default shared hosting, that gets into full-salary territory, something like $55k/year https://www.linode.com/pricing/
Is this right?
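For a sense of the arithmetic, here's a tiny sketch; the monthly plan price is a placeholder for whatever tier actually bundles ~5 TB of disk, not a quoted Linode rate:

```python
# Back-of-envelope: what monthly plan price lands in "$55k/year" territory?
# PLAN_PER_MONTH is an assumed placeholder, not an actual Linode price.
PLAN_PER_MONTH = 4_600  # assumed $/month for a plan with ~5 TB of disk

print(f"${PLAN_PER_MONTH * 12:,}/year")  # -> $55,200/year
```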
Every self-hosted user on the instance also needs to fetch every self-hosted user on the network, which seems like it would mean the total network traffic for retrieving information is O(n^2): n hosts each fetching from n other hosts?
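A quick sketch of that scaling, with a made-up average repo size:

```python
# If each of n self-hosted nodes fetches every node's repository,
# total transfer grows as n * n * (average repo size).
AVG_REPO_MB = 10  # assumption, purely illustrative

def total_transfer_gb(n: int) -> float:
    """n nodes each fetching n repos -> O(n^2) total transfer."""
    return n * n * AVG_REPO_MB / 1000

for n in (100, 1_000, 10_000):
    print(f"{n:>6} nodes -> {total_transfer_gb(n):>14,.0f} GB")
# 10x the nodes means 100x the traffic
```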
@cwebber if by “instance” you mean the relay+AppView, no, users would just query it and get ready-to-use results like you can currently do on bsky.app?
but yeah, you are not meant to self-host your own bluesky, only your own data repository basically
@Claire The article I pointed to was about fully self-hosting your own atproto infrastructure, including a relay, and people on Bluesky seem to be getting excited about it as being feasible; I'm feeling uncertain it is
@cwebber @Claire Clearly for an individual it's far too expensive to run the whole stack, but I guess the expectation is that 3rd parties will, like... form to gather that kind of resources/funding to run relays? Still unclear to me how things actually work when more than one Relay exists. Is the Bluesky app supposed to be able to switch between multiple at once?
@eramdam @cwebber the front-facing Bluesky app is built to use Bluesky PBC's AppView, i don't think it has any provision to switch to another provider with the same API (but i might be wrong)
you can theoretically build an equivalent relay+AppView that works in the same way with the same data (although you'd have to build some of the pieces yourself; not everything in bsky is open source afaik), but it's unclear what incentives you'd have to do that
@Claire @eramdam Well I mean, Bluesky's ATproto has recently made the rounds as the "more decentralized" protocol compared to ActivityPub, and clearly a lot of people on my Bluesky feed right now seem to think it's more decentralized. It does have one major improvement, which is that it uses content-addressed posts (I don't think the DID methods currently do their job, though the goal is good; I can expand on that at a future time)
Which is what's leading me to look more into it: in what ways is it more decentralized in practice? Is it even particularly decentralized in practice? I'm trying to gain a technical understanding.
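To make the content-addressing point concrete, here's a minimal illustration of the idea: a record's address is derived from a hash of its deterministically serialized content, so a copy from any host can be verified. (atproto actually uses DAG-CBOR encoding and CIDs; this JSON/hex version is only a sketch of the concept.)

```python
import hashlib
import json

def content_address(record: dict) -> str:
    """Hash a deterministic serialization of the record.
    atproto really uses DAG-CBOR + CIDs; this is just the idea."""
    blob = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()

post = {"$type": "app.bsky.feed.post", "text": "hello", "createdAt": "2024-11-08T00:00:00Z"}
print(content_address(post))  # same content -> same address, on any host
```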
@cwebber @Claire @eramdam don't think the goal w/ atproto is to be "more decentralized" in the abstract. we (team) had worked on SSB and dat, which were radically decentralized/p2p but hard to work with and grow. would not supplant "the platforms".
atproto came out of identifying the *minimum* necessary decentralization properties, and ensuring those are strongly locked in. we basically settled on:
@cwebber @Claire @eramdam
unbundling and composition: ability to swap components w/o swapping full stack.
credible exit: ability for new interoperable providers to enter network. public data must be accessible w/o permission, and schemas/APIs declared (see the fetch sketch after this list)
control of identity: the whole network/proto doesn't need to be individualist, but identity is personal
easy to build new apps: don't build for the old modalities, enable new modalities. accommodate opinionated devs.
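On the credible-exit point, public repo data really is fetchable without permission: a PDS serves the full signed repo as a CAR file via com.atproto.sync.getRepo. A minimal sketch, where the PDS host and DID are placeholders:

```python
import requests

PDS = "https://pds.example.com"  # placeholder host
DID = "did:plc:example"          # placeholder DID

# com.atproto.sync.getRepo returns the whole repo as a CAR file,
# no authentication needed for public data.
resp = requests.get(f"{PDS}/xrpc/com.atproto.sync.getRepo", params={"did": DID})
resp.raise_for_status()
with open("repo.car", "wb") as f:
    f.write(resp.content)
print(f"fetched {len(resp.content):,} bytes")
```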
@cwebber @Claire @eramdam I have a longer post on this, and our progress, on my personal blog:
https://bnewbold.net/2024/atproto_progress/
@cwebber @Claire @eramdam
I think a bunch about this post about the history of mp3 piracy and "minimum viable decentralization":
https://web.archive.org/web/20180725200137/https://medium.com/@jbackus/resistant-protocols-how-decentralization-evolves-2f9538832ada
(though it wasn't directly influential on atproto design, and Backus has since pulled the post)
@bnewbold Thank you Bryan, this is extremely helpful!
I hope to see multiple #BlueSky relays soon (incentives unclear: https://neuromatch.social/@jonny/113364719373034539). I worry about the climate costs of many full copies.
One accidental design feature in #Mastodon is how an instance serves as a "relay" with a cache of posts and media, populated haphazardly by whatever happens to federate. This is messy but flexible. https://masto.host/re-mastodon-media-storage/ Instances can share deduplicated object storage. https://jortage.com/
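The pooled-storage dedup idea boils down to content addressing again: if two instances cache the same file, store it once under its digest. A toy sketch of that approach (not Jortage's actual code):

```python
import hashlib

class DedupStore:
    """Toy content-addressed blob store: identical media cached by
    different instances is stored exactly once."""
    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)  # no-op if already stored
        return key

store = DedupStore()
a = store.put(b"cat.jpg bytes")  # cached by instance A
b = store.put(b"cat.jpg bytes")  # same file cached by instance B
assert a == b and len(store.blobs) == 1  # one copy serves both
```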
@luca It's the opposite. The amount of media to be cached per user grows less than linearly with the amount of users on an instance. A single-user instance has the highest level of storage waste. An instance with a million users (like mastodon.social) already mirrors pretty much everything so they would have little benefit from deduplication.
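That sublinear growth can be sanity-checked with a toy model: if each of n users follows k of N remote accounts at random, expected coverage is N * (1 - (1 - k/N)^n), so the cache each new user adds shrinks as the instance grows. All numbers here are made up:

```python
# Toy model of cache coverage vs. instance size; parameters are invented.
N, k = 1_000_000, 500  # remote accounts on the network, follows per user

def expected_coverage(n_users: int) -> float:
    return N * (1 - (1 - k / N) ** n_users)

for n in (1, 100, 10_000):
    c = expected_coverage(n)
    print(f"{n:>6} users: caches {c / N:6.1%} of network, {c / n:>7,.0f} accounts/user")
# per-user cache cost falls as the instance grows
```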
You don't need to guess, we already have actual numbers. Check the figures of Jortage members.
@luca We're saying the same thing but I'm not sure about the details. Perhaps the optimal size for storage is closer to 1000 users than 100. I'm not sure I'd call 1 TiB of storage (the average in the Jortage pool) "minimal". It's still much better than the ~90 TiB each would need if every member of the pool mirrored everything.
Recently #Mamot started purging old media and posts with the extremely broken https://github.com/mastodon/mastodon/discussions/19260 . It's not clear why, perhaps to simplify PostgreSQL scaling.
@luca A single-user Pleroma instance is indeed a very different matter because Pleroma relies on hotlinking from remote instances. Only other Pleroma users or people who visit your instance directly will see your own media served by your server. I see it through Mamot's cache.