Global BSOD

Mr Jack · 24 Jul 2024 at 12:22

CGrieves said:
Aaaaaaaaaanyway, at the risk of interrupting the squabbling, Crowdstrike's response is here:

Interesting reading, and I suspect it will feature prominently in the coming lawsuits. Surely it is negligent for a cybersecurity company on the scale of Crowdstrike to be pushing these "Rapid Response Content" updates without using canaries, or even manual local testing? The other issues are, I think, understandable. They should have coded the handling of these content files better, they should have had better automated testing, and they should probably have realised that faults in these files could be devastating and designed the code to recover gracefully if it ever happened - but no code is faultless, and hindsight is 20:20, etc.
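
To make the "gracefully recover" point concrete, here is a minimal sketch of one way to do it: validate a new content file before using it, and keep the last known good rules if it fails. Everything here is an assumption for illustration. The pipe-delimited format, the `Rule` fields and the function names are invented and are not Crowdstrike's actual file format or code.

```python
from dataclasses import dataclass

EXPECTED_FIELDS = 3  # hypothetical: every rule line carries three fields


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: str
    action: str


def parse_content(raw: str) -> list[Rule]:
    """Parse one rule per line; raise ValueError on any malformed line."""
    rules = []
    for lineno, line in enumerate(raw.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != EXPECTED_FIELDS or not all(fields):
            raise ValueError(
                f"line {lineno}: expected {EXPECTED_FIELDS} non-empty fields"
            )
        rules.append(Rule(*fields))
    if not rules:
        raise ValueError("content file contains no rules")
    return rules


def load_with_fallback(
    new_raw: str, last_known_good: list[Rule]
) -> tuple[list[Rule], bool]:
    """Return (rules, accepted); keep the old rules if the new file is bad."""
    try:
        return parse_content(new_raw), True
    except ValueError:
        return last_known_good, False
```

The design choice is that a bad file costs you an update rather than the machine: the caller can log the rejection and carry on with the previous rules.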

CGrieves · 24 Jul 2024 at 13:35

Mr Jack said:
Interesting reading, and I suspect it will feature prominently in the coming lawsuits. Surely it is negligent for a cybersecurity company on the scale of Crowdstrike to be pushing these "Rapid Response Content" updates without using canaries, or even manual local testing? The other issues are, I think, understandable. They should have coded the handling of these content files better, they should have had better automated testing, and they should probably have realised that faults in these files could be devastating and designed the code to recover gracefully if it ever happened - but no code is faultless, and hindsight is 20:20, etc.

100% agree. If Crowdstrike survive (and I hope they do; I think the product is good, it's the process that's bad), then I'd at least expect a configurable ability for customers to apply N-X and canary policies to definition updates, the same way we can for agent updates.

The risk is of course that customers may be behind on protection against emerging threats, but at least that's a risk we can manage ourselves.
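
As a rough illustration of what a customer-side N-X and canary setting could look like, here is a small sketch. The `UpdatePolicy` type and `select_version` function are hypothetical names made up for this example, not anything from the Crowdstrike console or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdatePolicy:
    versions_behind: int   # the "X" in N-X; 0 means always take the latest
    canary: bool = False   # canary hosts take the latest regardless


def select_version(published: list[int], policy: UpdatePolicy) -> int:
    """Pick which published definition version a host should run.

    `published` is ordered oldest to newest, so published[-1] is N.
    """
    if not published:
        raise ValueError("no versions published")
    if policy.versions_behind < 0:
        raise ValueError("versions_behind must be >= 0")
    if policy.canary or policy.versions_behind == 0:
        return published[-1]
    # Clamp so a short history never indexes past the oldest version.
    index = max(len(published) - 1 - policy.versions_behind, 0)
    return published[index]
```

For example, with versions 101 to 104 published, a host on N-1 would run 103 while a canary host would run 104, so a bad 104 only hits the canary group first.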

Rroff · 24 Jul 2024 at 13:39

Mr Jack said:
but no code is faultless, and hindsight is 20:20, etc.

Hindsight is 20:20, especially with security, but almost no one seems to have any vision any more. It probably doesn't help that the newer generation of coders aren't the pioneers some of the older ones were, and so don't have the experience learnt along the way, but still.

Mr Jack · 24 Jul 2024 at 14:26

Rroff said:
Hindsight is 20:20, especially with security, but almost no one seems to have any vision any more. It probably doesn't help that the newer generation of coders aren't the pioneers some of the older ones were, and so don't have the experience learnt along the way, but still.

Yeah, I dunno: it seems like the kind of thing I'd have thought of if I was writing that kind of code, but is that just hindsight? Clearly they don't have a big problem here, because they have a huge installed base and this is a rare event; on the other hand, they also managed to brick a Linux distribution with one of their updates not so long ago. I did see a report that they'd recently had a big headcount reduction, and I do wonder whether they fired the people who'd been stopping this from happening.

CGrieves said:
100% agree. If Crowdstrike survive (and I hope they do; I think the product is good, it's the process that's bad), then I'd at least expect a configurable ability for customers to apply N-X and canary policies to definition updates, the same way we can for agent updates.

Agreed.

Murphy · 24 Jul 2024 at 18:13

Mr Jack said:
The blame for this lies almost entirely on Crowdstrike. Everything else is basically an irrelevance.

Personally speaking I wouldn't go that far. IMO it's 50/50, as what Cs managed to do either shouldn't be possible in the first place, or the Windows kernel should automatically unload/not load certain boot drivers if the machine fails to boot a few times.

CGrieves said:
The risk is of course that customers may be behind on protection against emerging threats, but at least that's a risk we can manage ourselves.

The only issue I have with that is that it gives bad actors a window to reverse engineer the protection: basically, sign up to the canary release channel so they can discover emerging threats they may not be aware of and exploit them before the wider community is protected.

d_brennen · 24 Jul 2024 at 18:25

If it compiles, send it

SupraWez · 24 Jul 2024 at 19:00

d_brennen said:
If it compiles, send it

Agile

Pho · 24 Jul 2024 at 19:01

Murphy said:
Personally speaking I wouldn't go that far. IMO it's 50/50, as what Cs managed to do either shouldn't be possible in the first place, or the Windows kernel should automatically unload/not load certain boot drivers if the machine fails to boot a few times.

The only issue I have with that is that it gives bad actors a window to reverse engineer the protection: basically, sign up to the canary release channel so they can discover emerging threats they may not be aware of and exploit them before the wider community is protected.

According to the Dave Plummer video (https://www.youtube.com/watch?v=wAzEJxOo1ts), Crowdstrike set their driver to be a boot-start driver, marking it as a required driver for Windows to boot. Microsoft use boot-start for their shipped drivers (https://learn.microsoft.com/en-us/windows-hardware/drivers/install/installing-a-boot-start-driver) - I guess they're designed for things like keyboard/mouse drivers, stuff you really do need. And he explains that normally Windows should detect a bad driver and stop it, but if it's a boot-start one apparently it doesn't.

Mr Jack · 24 Jul 2024 at 19:26

Murphy said:
Personally speaking I wouldn't go that far. IMO it's 50/50, as what Cs managed to do either shouldn't be possible in the first place, or the Windows kernel should automatically unload/not load certain boot drivers if the machine fails to boot a few times.

Windows does, but some drivers can be marked as essential so it won't try and boot without them.

This is on Crowdstrike.

Murphy · 24 Jul 2024 at 23:00

Pho said:
And he explains that normally Windows should detect a bad driver and stop it, but if it's a boot-start one apparently it doesn't.

Therein lies the problem: a failed boot driver, unless it's truly critical, shouldn't cause a bug check simply because it fails to load. Even if it does, you'd expect a more fault-tolerant system to automatically disable the loading of a driver after X failed boots.

Basically you'd expect/hope something like a problematic driver would be handled in a more graceful manner than creating a bug check and leaving people with an unusable system.

That's why I say it's 50/50 IMO: while the problem was caused by Cs, they were only able to cause it because of how MS/Windows handles, or rather doesn't handle, such problems.
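
The "disable after X failed boots" idea can be sketched as a simple counter. This is only an illustration of the policy being argued for, not how Windows actually behaves; the `Driver` type, the threshold of 3 and the function names are all assumptions.

```python
from dataclasses import dataclass

MAX_FAILED_BOOTS = 3  # hypothetical threshold, the "X" in the post above


@dataclass
class Driver:
    name: str
    critical: bool          # truly required to boot (e.g. storage)
    failed_boots: int = 0   # consecutive failed boots blamed on this driver
    disabled: bool = False


def record_boot(driver: Driver, crashed: bool) -> None:
    """Update a driver's consecutive-failure count after one boot attempt."""
    if not crashed:
        driver.failed_boots = 0
        return
    driver.failed_boots += 1
    # Only non-critical drivers are ever skipped automatically.
    if not driver.critical and driver.failed_boots >= MAX_FAILED_BOOTS:
        driver.disabled = True


def drivers_to_load(drivers: list[Driver]) -> list[str]:
    """Names of the drivers that should still be loaded on the next boot."""
    return [d.name for d in drivers if not d.disabled]
```

Whether a security sensor should ever count as non-critical is the real argument, of course; the code only shows that the mechanism itself is not complicated.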

CGrieves · 24 Jul 2024 at 23:07

Murphy said:
Therein lies the problem: a failed boot driver, unless it's truly critical, shouldn't cause a bug check simply because it fails to load. Even if it does, you'd expect a more fault-tolerant system to automatically disable the loading of a driver after X failed boots.

Basically you'd expect/hope something like a problematic driver would be handled in a more graceful manner than creating a bug check and leaving people with an unusable system.

That's why I say it's 50/50 IMO: while the problem was caused by Cs, they were only able to cause it because of how MS/Windows handles, or rather doesn't handle, such problems.

Would that be any better? Imagine if a bad actor could somehow orchestrate the disabling of an EDR solution worldwide. I think I'd prefer the BSOD and the cleanup effort....

As our global head of security semi-joked "At least we were truly protected for a day or two".

Murphy · 24 Jul 2024 at 23:20

CGrieves said:
Would that be any better? Imagine if a bad actor could somehow orchestrate the disabling of an EDR solution worldwide. I think I'd prefer the BSOD and the cleanup effort....

Depends on how you deal with such a situation. You could simply set up dependencies so that if X fails to load then N doesn't load either (N = network, so disable that if there's a problem loading the EDR protection).

There are more graceful ways of dealing with faults than a bug check: disable drivers, auto boot to safe mode, drop to a command prompt, etc.
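
The dependency idea (no EDR, no network) could look something like the sketch below. The component names and the `resolve_load_set` function are made up for illustration; this is not a real Windows mechanism or configuration format.

```python
def resolve_load_set(
    dependencies: dict[str, list[str]], failed: set[str]
) -> list[str]:
    """Return the components that may load.

    A component is skipped if it failed, or if anything it depends on
    (directly or indirectly) failed or was skipped.
    """
    blocked = set(failed)
    changed = True
    while changed:  # repeat until no new component becomes blocked
        changed = False
        for name, needs in dependencies.items():
            if name not in blocked and any(n in blocked for n in needs):
                blocked.add(name)
                changed = True
    return [name for name in dependencies if name not in blocked]


# Hypothetical boot components: network is only allowed up if EDR loaded.
BOOT_DEPENDENCIES = {
    "storage": [],
    "edr": ["storage"],
    "network": ["edr"],
    "vpn": ["network"],
    "display": [],
}
```

With `failed={"edr"}` the machine would still come up with storage and display but stay off the network, which keeps it unprotected but isolated rather than unusable.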

Uther · 25 Jul 2024 at 07:14

CrowdStrike: Company that caused global techno meltdown offers partners $10 vouchers to say sorry - and they don't work

The company behind the world's worst IT outage has given gift cards to its teammates and partners to apologise and thank them for the extra work during last week's meltdown.

news.sky.com

fez · 25 Jul 2024 at 13:24

Sometimes nothing at all is less insulting than something. My partner works for the NHS and over COVID the hospital she was at said they want to give their staff a gift for all their hard work. They got some flower seeds...

StriderX · 25 Jul 2024 at 13:49

I'd drop them over that insult, jesus.

Rroff · 25 Jul 2024 at 13:57

fez said:
Sometimes nothing at all is less insulting than something. My partner works for the NHS and over COVID the hospital she was at said they want to give their staff a gift for all their hard work. They got some flower seeds...

Ouch, yeah, the optics on rewards are sometimes really off.

fez · 25 Jul 2024 at 14:35

Part of the problem these days is that companies are always hunting maximum profit, and things like QA, testing and a slower pace of development are the first things to go. Hell, trillion dollar companies leak your data on a semi-regular basis due to genuinely amateur level security issues, and the slap on the wrist isn't even remotely approaching the amount of money they will have saved by ignoring it for years. That's if there are any repercussions.

Programming has a host of best practices but they are rarely adhered to in my experience. None of the companies I have worked at implement proper testing. Admittedly they have all been smallish companies, but it's usually a case of having the guy at the top screaming to "get it done" and the people between the developers and the top brass just make that **** roll downhill and cut corners wherever possible. Who cares about tomorrow, and the fact that after 6 months of this your product is a fragile mess and development is now at a glacial pace because you spend 50% of your time fixing bugs you have introduced due to lack of testing?

Murphy · 25 Jul 2024 at 16:06

fez said:
Programming has a host of best practices but they are rarely adhered to in my experience.

I've often found myself wondering if we need something like health and safety standards for software, a set of rules that must be followed for the protection of the workforce and public.

Problem is the people who would be responsible for ensuring 'best practices', the politicians, often don't have any idea of what's risky or dangerous.

fez · 25 Jul 2024 at 16:37

Murphy said:
I've often found myself wondering if we need something like health and safety standards for software, a set of rules that must be followed for the protection of the workforce and public.

Problem is the people who would be responsible for ensuring 'best practices', the politicians, often don't have any idea of what's risky or dangerous.

The Cambridge Analytica scandal was eye-opening. Facebook should have been dead and buried after that, but the people responsible for taking them to task could barely understand what decade we are in, let alone anything remotely cutting edge tech-wise.

Yaayuh! · 30 Jul 2024 at 14:36

Azure status

Looks like Microsoft is having fun times with a networking issue.

I'm struggling with 365 Admin and Azure Portal.