mamot.fr is one of the many independent Mastodon servers you can use to participate in the fediverse.
Mamot.fr is a French-language Mastodon server run by La Quadrature du Net.

#sre

So, I've been using Thanos to receive and store my Prometheus metrics long term in a self-hosted S3 bucket. Thanos also acts as a data source for my dashboards in Grafana, and provides a Ruler, which evaluates alerting rules and forwards the resulting alerts to my Alertmanager. It's OK. It's certainly got its downsides, which I can go into later, but I've been thinking... what about Mimir?

How do you all feel about Grafana's Mimir (source on GitHub)? It's AGPL and seems to literally be a replacement for Thanos, which is Apache 2.0.

Thanos description from their website:

Open source, highly available Prometheus setup with long term storage capabilities.

Mimir description from their website:

...open source software project that provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus and OpenTelemetry metrics.

Both work with Alloy and Prometheus alike. Both require you to configure initially confusing hash rings and replication parameters. Both have a bunch of large companies adopting them, so... now I feel conflicted. Should I try Mimir? Poll in reply.
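To make "hash rings and replication parameters" a bit more concrete, here is a toy consistent-hash ring in Python. It is a generic illustration, not the actual hashring code used by Thanos or Mimir: each series is keyed by its sorted label set, hashed onto a ring of virtual tokens, and assigned to the next `replication_factor` distinct nodes. The node names, `tokens_per_node`, and the simple-majority `write_quorum` rule are assumptions for illustration only; each project documents its own quorum behavior.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Use the first 8 bytes of a SHA-256 digest as a 64-bit ring position.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def series_key(labels: dict) -> str:
    # Canonical form of a label set: sorted name=value pairs.
    return ",".join(f"{name}={value}" for name, value in sorted(labels.items()))


class HashRing:
    """Toy consistent-hash ring; every node owns several virtual tokens."""

    def __init__(self, nodes, tokens_per_node=64):
        self.nodes = sorted(set(nodes))
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in self.nodes
            for i in range(tokens_per_node)
        )
        self._positions = [pos for pos, _ in self._ring]

    def replicas(self, key: str, replication_factor: int) -> list:
        """Walk clockwise from the key's position, collecting distinct nodes."""
        if not 1 <= replication_factor <= len(self.nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        start = bisect.bisect_right(self._positions, _hash(key))
        chosen = []
        for offset in range(len(self._ring)):
            _, node = self._ring[(start + offset) % len(self._ring)]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == replication_factor:
                    break
        return chosen


def write_quorum(replication_factor: int) -> int:
    # Simple majority: a write counts as successful once this many replicas acknowledge it.
    return replication_factor // 2 + 1
```

With three hypothetical receivers (`HashRing(["receive-0", "receive-1", "receive-2"])`) and a replication factor of 3, every series lands on all three nodes and a write needs two acknowledgements. Adding a node only moves the series whose hashes fall near its new tokens, which is the usual reason for ring-based placement over a plain `hash % n`.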

Hello, hachyderm! We've been working hard on building up our ansible runbooks and improving hachyderm's overall resilience. Recently, we've been focusing on database resilience.

We're getting close to retiring our original database server (finally!) and preparing to move to a fully ansible-managed set of database servers, a primary and a replica, on new hardware. We'll send another announcement when we do the cutover. The team has done excellent work to make this highly automated, quick, and painless! :blobfoxscience:

Done:

✅ author ansible roles for managing postgresql, pgbackrest (backups), pgbouncer, and primary/replica failover
✅ decide to continue with pgbouncer and *not* use pgcat
✅ rotate database passwords
✅ order new replica database hardware
✅ order new future primary database hardware

To do soon:

🟨 rebuild replica database with ansible scripts
🟨 prepare primary database with ansible scripts
🟨 start replicating to new database replica
🟨 cut over to new database server 🎉

We're also planning on open-sourcing our ansible roles in the coming weeks - just a little housekeeping & tidying up before we do!
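Before a primary/replica cutover like the one planned above, a common sanity check is that the replica has replayed all the WAL the primary has written. The sketch below compares two PostgreSQL LSNs, for example the output of `pg_current_wal_lsn()` on the primary and `pg_last_wal_replay_lsn()` on the replica. It is only an illustration and not part of hachyderm's ansible roles; the function names and the `max_lag_bytes` threshold are hypothetical.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to an absolute WAL byte position.

    The text form is two hexadecimal numbers: the high and low 32 bits.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_lag_bytes(primary_lsn: str, replica_replay_lsn: str) -> int:
    # WAL bytes the replica still has to replay (the same idea as pg_wal_lsn_diff).
    return lsn_to_int(primary_lsn) - lsn_to_int(replica_replay_lsn)


def ready_for_cutover(primary_lsn: str, replica_replay_lsn: str, max_lag_bytes: int = 0) -> bool:
    # Hypothetical gate: proceed only once the replica is within max_lag_bytes of the primary.
    return replication_lag_bytes(primary_lsn, replica_replay_lsn) <= max_lag_bytes
```

A zero-lag result is only meaningful if writes to the primary have been paused first (for example, with pgbouncer's `PAUSE` admin command); otherwise the primary keeps moving ahead between the two queries.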

hey, fediverse friends - i'm excited that we're finally announcing our Fediverse Security Fund over at @nivenly to help make fedi software more secure.

we're starting off super small to see if the Fund is a thing that can help. along the way we'll learn and improve our intake/payout process. and if there's solid interest and we see good impact, we'll hold a member vote near the end of the experiment to decide if we'll renew/expand the program.

thanks to @thisismissem for her contributions and for submitting the first disclosure, which validated the process.

let's close some vulns! :blobfoxscience:

Pushing my core workouts lately and being rewarded with more mornings free of migraine.

I played deeply into my music the past few nights, awaking the next morning scrubbed of a migraine.

Having those who listen and witness allows me to let go of emotions when I am having them, not carry them around. Less migraine activity ensues.

This week I learned that my anxiety about others is entwined with a particularly evil symptom of religious trauma. I saw both but never saw how they were connected.

I can recognize it now. And the feeling of not needing to "save" someone is a really powerful emotion - or lack of one - that, today, I am thankful for contributing to a clear head and no migraine.

Also feeling self-assured that fixing failures in our systems looks a lot more like treating a migraine than like reaching for quick fixes and low-hanging fruit.

Continued thread

System Administration

Week 8, The Simple Mail Transfer Protocol, Part III

In this video, we look at ways to combat Spam. In the process, we learn about email headers, the Sender Policy Framework (#SPF), DomainKeys Identified Mail (#DKIM), and Domain-based Message Authentication, Reporting and Conformance (#DMARC). #SMTP doesn't seem quite so simple any more...

youtu.be/KwCmv3GHGfc
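DMARC policies are published as a DNS TXT record at `_dmarc.<domain>` and written as semicolon-separated `tag=value` pairs. The simplified Python parser below splits such a record into a dict and fills in the defaults RFC 7489 gives for `adkim`, `aspf`, and `pct`. It is a sketch, not a complete implementation: it skips reporting-URI validation and the RFC's fallback rules, and `parse_dmarc` is a hypothetical helper name.

```python
def parse_dmarc(record: str) -> dict:
    """Parse a DMARC TXT record like 'v=DMARC1; p=reject; rua=mailto:...' into a dict."""
    parts = [part.strip() for part in record.split(";") if part.strip()]
    # The record must begin with the version tag.
    if not parts or parts[0].replace(" ", "") != "v=DMARC1":
        raise ValueError("not a DMARC record")
    tags = {}
    for part in parts:
        name, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed tag: {part!r}")
        tags[name.strip().lower()] = value.strip()
    # RFC 7489 defaults for tags the record leaves out.
    tags.setdefault("adkim", "r")  # relaxed DKIM alignment
    tags.setdefault("aspf", "r")   # relaxed SPF alignment
    tags.setdefault("pct", "100")  # apply the policy to 100% of failing mail
    return tags
```

For a record such as `v=DMARC1; p=reject; rua=mailto:dmarc-reports@example.com`, the result tells receivers to reject mail that fails both SPF and DKIM alignment and to send aggregate reports to the `rua` address.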

Continued thread

System Administration

Week 8, The Simple Mail Transfer Protocol, Part II

In this video, we observe the incoming mail on our MTA, look at how STARTTLS can help protect information in transit, how MTA-STS can help defeat a MitM performing a STARTTLS-stripping attack, and how DANE can be used to verify the authenticity of the mail server's certificate.

youtu.be/RgEiAOKv640
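MTA-STS (RFC 8461) works by publishing a small key/value policy file over HTTPS, at `https://mta-sts.<domain>/.well-known/mta-sts.txt`, that lists which MX hosts may receive mail for the domain. In `enforce` mode a sending MTA must not deliver to an MX that fails to match the policy or fails certificate validation, which is what defeats a MitM that strips STARTTLS. The sketch below parses such a policy and checks an MX host name against its `mx` patterns, where a leading `*.` matches exactly one label. It is simplified: fetching, caching for `max_age`, and certificate checks are left out, and the function names are hypothetical.

```python
def parse_mta_sts_policy(text: str) -> dict:
    """Parse an MTA-STS policy body into a dict; 'mx' may repeat, so it becomes a list."""
    policy = {"mx": []}
    for line in text.splitlines():  # handles both CRLF and LF line endings
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        key, value = key.strip(), value.strip()
        if key == "mx":
            policy["mx"].append(value.lower())
        else:
            policy[key] = value
    if policy.get("version") != "STSv1":
        raise ValueError("unsupported or missing version")
    return policy


def mx_matches(pattern: str, host: str) -> bool:
    host = host.lower().rstrip(".")
    pattern = pattern.lower().rstrip(".")
    if pattern.startswith("*."):
        # The wildcard stands for exactly one leftmost label.
        label, dot, rest = host.partition(".")
        return bool(label) and dot == "." and rest == pattern[2:]
    return host == pattern


def mx_allowed(policy: dict, mx_host: str) -> bool:
    return any(mx_matches(pattern, mx_host) for pattern in policy["mx"])
```

With a policy listing `mx: *.example.net`, the host `mx1.example.net` is accepted while `a.b.example.net` and `example.net` are not.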

howdy, #hachyderm!

over the last week or so, we've been preparing to move hachy's #DNS zones from #AWS route 53 to bunny DNS.

since this could be a pretty scary thing -- going from one geo-DNS provider to another -- we want to make sure *before* we move that records are resolving in a reasonable way across the globe.

to help us to do this, we've started a small, lightweight tool that we can deploy to a provider like bunny's magic containers to quickly get DNS resolution info from multiple geographic regions. we then write this data to a backend S3 bucket, at which point we can use a tool like #duckdb to analyze the results and find records we need to tweak to improve performance. all *before* we make the change.
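As a rough stand-in for the kind of DuckDB aggregation described above, the following stdlib-only Python groups probe results by record name and region and flags the pairs whose median lookup time exceeds a threshold. The row shape (`name`, `region`, `rtt_ms`) is hypothetical and not hachyboop's actual output schema, and any threshold you pass in is your own choice.

```python
from collections import defaultdict
from statistics import median


def slow_records(rows, threshold_ms: float) -> dict:
    """Return {(name, region): median_rtt_ms} for pairs slower than threshold_ms.

    Each row is assumed to be a dict with hypothetical keys
    'name', 'region', and 'rtt_ms'.
    """
    samples = defaultdict(list)
    for row in rows:
        samples[(row["name"], row["region"])].append(row["rtt_ms"])
    slow = {}
    for key, values in samples.items():
        typical = median(values)  # the median is less sensitive to one-off spikes than the mean
        if typical > threshold_ms:
            slow[key] = typical
    return slow
```

In SQL terms this is a `GROUP BY name, region` with a median aggregate and a `HAVING` filter.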

then, after we've flipped the switch and while DNS is propagating -- :blobfoxscared: -- we can watch in real-time as different servers begin flipping over to the new provider.
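While the change propagates, the same probe data can be reduced to a single "how far along are we" number per time window. This sketch assumes each hypothetical observation carries a `bucket` (for example, a minute index), a `region`, and the `answers` returned. It reports the fraction of regions whose answers already equal the new provider's expected records; it is an illustration, not a hachyboop feature.

```python
from collections import defaultdict


def propagation_progress(observations, new_answers) -> dict:
    """Map each time bucket to the fraction of regions already seeing new_answers."""
    target = frozenset(new_answers)
    # bucket -> region -> does this region see the new answers?
    seen = defaultdict(dict)
    for obs in observations:
        # If a region reports several times in one bucket, the last observation wins.
        seen[obs["bucket"]][obs["region"]] = frozenset(obs["answers"]) == target
    return {
        bucket: sum(regions.values()) / len(regions)
        for bucket, regions in sorted(seen.items())
    }
```

Comparing answers as sets means record ordering in the DNS response does not affect the result.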

we named the tool hachyboop and it's available publicly --> github.com/hachyderm/hachyboop

please keep in mind that it's early in the booper's life, and there's a lot we can do, including cleaning up my hacky code. :blobfoxlaughsweat:

attached is an example of a quick run across 17 regions for a few minutes. the data is spread across multiple files but duckdb makes it quite easy for us to query everything like it's one table.

Ugh, slept like shit. The drama and stress from work is giving me insomnia. The boss who quit didn't leave ANYTHING for my new boss to understand what I do with DevEx and incidents.

I had to bust my ass yesterday just to hold the ground I'd spent the past four months working extremely hard to gain, so that I could manage incidents. My new boss wanted to yank me out of it and put me back on "SRE" infrastructure.

CTO says incidents are staying where they are, with Engineering. So I told him I want to transfer to Engineering.

And then I get the question "do you want to be an SRE?"

Ridiculous. Makes me want to scream.

I really liked this informal community poll and thematic analysis on SLO usage. It does a better job at highlighting the hurdles to adopting them at a Company Who Is Not Google than a lot of "Here's how to do SLOs" pieces that just don't cover it.

If there is ever a "Seeking SLOs" book, this should be the first chapter.

ericmustin.substack.com/p/note

A Small, Good Thing · Notes on Service Level Objectives, by Eric Mustin