this post was submitted on 14 Aug 2024

Programmer Humor

All I wanted to do was push my changes and log off... [GitHub Outage] (dubvee.org)

submitted 1 month ago* (last edited 1 month ago) by [email protected] to c/[email protected]

GitHub seems to be down.

Edit: After I made this, their status page finally updated to indicate an issue.

Update - We are experiencing interruptions in multiple public GitHub services. We suspect the impact is due to a database infrastructure related change that we are working on rolling back.

[–] [email protected] 1 points 1 month ago

So... you crashed github? 😋

[–] [email protected] 1 points 1 month ago (3 children)

This is a common problem. The same thing happens with AWS outages. Business people get to manually flip the switches here, and the process is completely divorced from proper monitoring. An internal alert triggers, engineers start looking at it, and only when someone approves publishing the outage does it actually appear on the status page. Outages for places like GitHub and AWS are tied to SLAs, which are tied to payouts or discounts for huge customers, so there’s an immense incentive not to declare an outage even when everything is on fire. I have yelled at AWS, GitHub, Azure, and a few smaller vendors for this exact bullshit. One time we had a Textract outage for over six hours before AWS finally decided to declare one. We were fucking screaming at our TAM by the end because no one in our collective networks could use it, but they refused to declare an outage.

[–] [email protected] 1 points 1 month ago

Yeah, my second stop when a status page is green is always https://downdetector.com/ since it’s user-generated.

[–] [email protected] 1 points 1 month ago

Or, alternatively, comms management is important, and formally declaring an incident is an important part of outage response. Going from "hey Bob, something isn't looking right, can you check when you get a sec" to "ok, shit's broken, everyone put down what you are working on and help with this. Jim is in charge of coordinating the technical people so we don't make things worse, and should feed updates to Mike, who is going to handle comms to non-technical internal people and to externals" takes management input.

[–] [email protected] 0 points 1 month ago (1 children)

It's manual?? Holy shit, that explains some previous hair pulling.

[–] [email protected] 1 points 1 month ago

To be clear, usually there’s an approval gate. Something is generated automatically, but a product or business person has to actually approve the alert going out. Behind the scenes everyone internal knows shit is on fire (unless they have shitty monitoring, metrics, and alerting, which is true for a lot of places but not for major cloud or SaaS providers).
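
For anyone who wants to picture the approval gate, here is a rough Python sketch. Everything in it (`StatusPage`, `on_internal_alert`, `approve`) is a made-up name for illustration and is not how any particular vendor actually implements this. The point is only that the alert creates a draft automatically, while the public status stays green until a person signs off.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    OPERATIONAL = "operational"
    OUTAGE = "outage"


@dataclass
class StatusPage:
    """Hypothetical status page with a manual approval gate (illustrative only)."""

    public_status: Status = Status.OPERATIONAL
    pending_drafts: List[str] = field(default_factory=list)

    def on_internal_alert(self, summary: str) -> None:
        # Monitoring fires automatically, but it only queues a draft.
        # The public status is deliberately left untouched here.
        self.pending_drafts.append(summary)

    def approve(self, approver: str) -> Optional[str]:
        # Nothing becomes public until a human approves the oldest draft.
        if not self.pending_drafts:
            return None
        summary = self.pending_drafts.pop(0)
        self.public_status = Status.OUTAGE
        return f"{summary} (approved by {approver})"
```

The gap between `on_internal_alert` and `approve` is the window where everything is on fire internally but the page still shows green.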

[–] [email protected] 0 points 1 month ago (1 children)

https://git.30p87.de with 90% uptime (I reboot the server 10% of the time due to bleeding-edge Arch testing kernel updates)

[–] [email protected] 1 points 1 month ago

Funny how I just hit a downtime when trying to explore your GL.

[–] [email protected] 0 points 1 month ago (1 children)

It's all red, how did they fuck up that badly?

[–] [email protected] 1 points 1 month ago

Some intern is having a bad day right now.