Over 150 Major Incidents in a single month.
Formerly, I was on the Major Incident Response team for a national insurance company. IT Security has always been in their own ivory tower in every company I've worked for. But this company's IT Security department was about the worst case I've ever seen, up until that time and since.
They refused to file changes or discuss any type of change control with the rest of IT. I get that Change Management is a bitch for most of IT, but if you want to avoid major outages, file a fucking Change record and follow the approval process. The security directors would get some harebrained idea in a meeting in the morning and assign one of their barely competent techs to implement it that afternoon. They'd bring down whatever system they were fucking with. Then my team had to spend hours, usually after business hours, figuring out why a system that had not seen a change control in two weeks suddenly stopped working. Would security send someone to the MI meeting? Of course not. What would happen is, we would call the IT Security response team and ask if anything changed on their end. Suddenly, 20 minutes later, everything was back up and running, with the MI team not having done anything. We would try to talk to security and ask what they changed. They answered "nothing" every goddamn time.
They got their asses handed to them when they brought down a billing system that brought in over $10 Billion (yes, with a "B") a year, and people could not pay their bills. That outage went straight to the CIO, and even the CEO sat in on that call. All of a sudden there was a hard change freeze for a month, and security was required to file changes in the common IT record system, which was ServiceNow at the time.
We went from 150 major outages (defined as having financial or reputational impact to the company) in a single month to 4 or 5.
Fuck IT Security. It's a very important part of every IT Department, but it is almost always filled with the most narcissistic, incompetent asshats of the entire industry.
Jesus Christ, I never thought I'd be happy to have a change control process.
Lots of safety measures really suck. But they generally get implemented because the alternative is far worse.
At my current company all changes have to happen via GitHub PR and commit because we use GitOps (e.g., ArgoCD with Kubernetes). Any changes you make manually are immediately overwritten when ArgoCD notices the config drift.
This makes development more annoying sometimes but I'm so damn glad when I can immediately look at GitHub for an audit trail and source of truth.
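For anyone who hasn't seen GitOps, here's a toy sketch of the "drift gets overwritten" idea. This is not how ArgoCD is actually implemented; `reconcile` and the config keys are made-up names, and both states are just plain dicts.

```python
from typing import Any

_MISSING = object()  # sentinel so an absent key differs from a key set to None


def reconcile(desired: dict[str, Any], live: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Return the corrected live state plus the sorted list of keys that had drifted.

    Hypothetical helper: 'desired' stands in for what is committed in Git,
    'live' for what is actually running.
    """
    drifted = sorted(
        key
        for key in desired.keys() | live.keys()
        if desired.get(key, _MISSING) != live.get(key, _MISSING)
    )
    # Git wins: the live state is replaced wholesale with the desired state.
    return dict(desired), drifted


desired = {"replicas": 3, "image": "billing:1.4.2"}
# Someone bumped replicas by hand and added a debug flag outside of Git.
live = {"replicas": 5, "image": "billing:1.4.2", "debug": "true"}

live, drifted = reconcile(desired, live)
print(drifted)  # ['debug', 'replicas']
```

The manual edits vanish on the next sync, so the only change that sticks is one that went through a commit, which is exactly why the Git history doubles as the audit trail.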
It wasn't InfoSec in this case but I had an annoying tech lead that would merge to main without telling people, so anytime something broke I had his GitHub activity bookmarked and could rule that out first.
You can also lock down the repo to require approvals before merging into the main branch to avoid this.
Since we were on the platform team we were all GitHub admins 😩. So it all relied on trust. Is there a way to block even admins?
Hm, can't say. I'm using Bitbucket and it does block admins, though they all have the ability to go into settings and remove the approval requirement. No one does, though, because then the bad devs would be able to get changes in without reviews.
That sounds like a good idea. I'll take another look at GitHub settings. Thanks!
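As far as I know, GitHub's branch protection REST endpoint has an `enforce_admins` flag for this. Here's a rough sketch that only builds the request and doesn't send anything; `branch_protection_request`, `example-org`, and `platform-config` are placeholder names, and you should check the current GitHub docs before relying on it.

```python
import json


def branch_protection_request(owner: str, repo: str, branch: str = "main", approvals: int = 1):
    """Build the URL and JSON body for GitHub's 'update branch protection' call.

    Hypothetical helper: it only constructs the request, it does not send it.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/branches/{branch}/protection"
    payload = {
        # No required CI checks in this sketch.
        "required_status_checks": None,
        # Apply the rules to repository admins as well.
        "enforce_admins": True,
        # Require at least `approvals` approving reviews on every pull request.
        "required_pull_request_reviews": {
            "required_approving_review_count": approvals,
        },
        # No restriction on who may push once the rules are satisfied.
        "restrictions": None,
    }
    return url, payload


url, payload = branch_protection_request("example-org", "platform-config")
print("PUT", url)
print(json.dumps(payload, indent=2))
```

You'd send that as an authenticated PUT. Same caveat as with Bitbucket: anyone with admin rights can still go into settings and turn the rule off, so it stops accidents more than it stops a determined admin.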
The past several years I have been working more as a process engineer than a technical one. I've worked in Problem Management, Change Management, and currently in Incident for a major defense contractor (yes, you've heard of it). So I've been on both sides. Documenting an incident is a PITA. File a Change record to restart a server that is in an otherwise healthy cluster? You're kidding, right? What the hell is a "Problem" record and why do I need to mess with it?
All things I've heard and even thought over the years. What it comes down to is this: the difference between a Mom and Pop operation that has limited scalability and a full Enterprise Environment that can support a multi-billion dollar business is documentation. That's what those numb nuts in that Insurance Company were too stupid to understand.
You poor man. I've worked with those exact fukkin' bozos.
Lack of a Change Control process has nothing to do with IT Security except within the domain of Availability. Part of Security is ensuring IT systems are available and working.
You simply experienced working at an organization with poor enforcement of Change Control policies. That was a failure of oversight, because with competent oversight anyone causing outages by making unapproved changes would be reprimanded and instructed to follow policy properly.