Global BSOD

Aaaaaaaaaanyway, at the risk of interrupting the squabbling, Crowdstrike's response is here:

Interesting reading, and I suspect it will feature prominently in the coming lawsuits. Surely, it is negligent for a cybersecurity company on the scale of Crowdstrike to be pushing these "Rapid Response Content" updates without using canaries, or even manual local testing? The other issues are, I think, understandable. They should have coded the handling of these content files better, they should have had better automated testing, and they should probably have realised that faults in these files could potentially be devastating and designed the code to gracefully recover if it ever happened - but no code is faultless, and hindsight is 20:20, etc.
 
Interesting reading, and I suspect it will feature prominently in the coming lawsuits. Surely, it is negligent for a cybersecurity company on the scale of Crowdstrike to be pushing these "Rapid Response Content" updates without using canaries, or even manual local testing? The other issues are, I think, understandable. They should have coded the handling of these content files better, they should have had better automated testing, and they should probably have realised that faults in these files could potentially be devastating and designed the code to gracefully recover if it ever happened - but no code is faultless, and hindsight is 20:20, etc.
100% agree. If Crowdstrike survive (and I hope they do; I think the product is good, it's the process that's bad) then I'd at least expect a configurable ability for customers to run N-X and canary definition updates the same way we can for agent updates.

The risk is of course that customers may be behind on protection against emerging threats, but at least that's a risk we can manage ourselves.
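Roughly what I have in mind, as a made-up sketch only (none of these names are real CrowdStrike settings or APIs): a per-tenant policy for how far behind content releases you sit and which hosts act as canaries.

```python
# Hypothetical sketch of an N-X / canary policy for content updates.
# Nothing here corresponds to an actual CrowdStrike feature; the names
# are invented purely to illustrate the idea.
from dataclasses import dataclass, field


@dataclass
class ContentUpdatePolicy:
    """Per-tenant policy for staging definition/content updates."""
    n_minus: int = 1                # stay N-1 behind the latest content release
    canary_hosts: list[str] = field(default_factory=list)  # hosts that get new content first
    canary_soak_hours: int = 4      # how long canaries must run cleanly before wider rollout


def hosts_to_update(policy: ContentUpdatePolicy, all_hosts: list[str],
                    canary_ran_clean: bool) -> list[str]:
    """Decide which hosts should receive the new content version right now."""
    if not canary_ran_clean:
        return policy.canary_hosts  # only the canaries until they've soaked cleanly
    return all_hosts                # then promote to the rest of the fleet


policy = ContentUpdatePolicy(canary_hosts=["edge-test-01", "edge-test-02"])
print(hosts_to_update(policy, ["edge-test-01", "edge-test-02", "dc-prod-01"],
                      canary_ran_clean=False))
```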
 
but no code is faultless, and hindsight is 20:20, etc.

Hindsight is 20:20, especially with security, but almost no one seems to have any vision any more. It probably doesn't help that the newer generation of coders aren't the pioneers some of the older ones were, and so don't have the experience learnt along the way, but still.
 
Hindsight is 20:20, especially with security, but almost no one seems to have any vision any more. It probably doesn't help that the newer generation of coders aren't the pioneers some of the older ones were, and so don't have the experience learnt along the way, but still.

Yeah, I dunno: it seems like the kind of thing I'd have thought of if I was writing that kind of code, but is that just hindsight? Clearly they don't have a big problem here in general, because they have a huge installed base and this is a rare event; on the other hand, they also managed to brick a Linux distribution with one of their updates not so long ago. I did see a report that they'd recently had a big headcount reduction, and I do wonder whether they fired the people who'd been stopping this happening.

100% agree. If Crowdstrike survive (and I hope they do; I think the product is good, it's the process that's bad) then I'd at least expect a configurable ability for customers to run N-X and canary definition updates the same way we can for agent updates.

Agreed.
 
The blame for this lies almost entirely on Crowdstrike. Everything else is basically an irrelevance.
Personally speaking I wouldn't go that far. IMO it's 50/50, as what CS managed to do either shouldn't be possible in the first place, or the Windows kernel should automatically unload/not load certain boot drivers if the machine fails to boot a few times.
The risk is of course that customers may be behind on protection against emerging threats, but at least that's a risk we can manage ourselves.
The only issue I have with that is it gives bad actors a window to reverse engineer the protection: basically, sign up to the canary release channel so they can discover emerging threats they may not be aware of and exploit them before the wider community is protected.
 
Personally speaking I wouldn't go that far. IMO it's 50/50, as what CS managed to do either shouldn't be possible in the first place, or the Windows kernel should automatically unload/not load certain boot drivers if the machine fails to boot a few times.

The only issue I have with that is it gives bad actors a window to reverse engineer the protection: basically, sign up to the canary release channel so they can discover emerging threats they may not be aware of and exploit them before the wider community is protected.

According to the Dave Plummer video (https://www.youtube.com/watch?v=wAzEJxOo1ts), Crowdstrike set their driver to be a boot-start driver, marking it as a driver required for Windows to boot. Microsoft use boot-start for their shipped drivers (https://learn.microsoft.com/en-us/windows-hardware/drivers/install/installing-a-boot-start-driver) - I guess they're designed for things like keyboard/mouse drivers, stuff you really do need. And he explains that normally Windows should detect a bad driver and stop loading it, but if it's a boot-start one apparently it doesn't.
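If anyone wants to check how a driver service is registered on their own machine, the start type lives in the registry; a quick look might be something like this (the "CSAgent" service name is what's commonly reported for the Falcon sensor, so treat it as an assumption and substitute whichever driver you care about):

```python
# Read a Windows driver service's start type from the registry (Windows only).
# Start=0 (SERVICE_BOOT_START) is the boot-start case discussed above.
import winreg

START_TYPES = {
    0: "SERVICE_BOOT_START",    # loaded by the boot loader; treated as needed to boot
    1: "SERVICE_SYSTEM_START",
    2: "SERVICE_AUTO_START",
    3: "SERVICE_DEMAND_START",
    4: "SERVICE_DISABLED",
}


def driver_start_type(service_name: str) -> str:
    key_path = rf"SYSTEM\CurrentControlSet\Services\{service_name}"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        start, _ = winreg.QueryValueEx(key, "Start")
    return START_TYPES.get(start, f"unknown ({start})")


# "CSAgent" is assumed here; swap in any service under CurrentControlSet\Services.
print(driver_start_type("CSAgent"))
```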
 
Personally speaking I wouldn't go that far. IMO it's 50/50, as what CS managed to do either shouldn't be possible in the first place, or the Windows kernel should automatically unload/not load certain boot drivers if the machine fails to boot a few times.

Windows does, but some drivers can be marked as essential so it won't try and boot without them.

This is on Crowdstrike.
 
And he explains that normally Windows should detect a bad driver and stop loading it, but if it's a boot-start one apparently it doesn't.
Therein lies the problem: a failed boot driver, unless it's truly critical, shouldn't cause a bug check simply because it fails to load. Even if it does, you'd expect a more fault-tolerant system to automatically disable the loading of a driver after X failed boots.

Basically you'd expect/hope something like a problematic driver would be handled in a more graceful manner than creating a bug check and leaving people with an unusable system.

That's why I say it's 50/50 IMO: while the problem was caused by CS, they were only able to cause that problem because of how MS/Windows handles, or rather doesn't handle, such problems.
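Just to illustrate the "disable it after X failed boots" idea, here's a made-up bit of pseudologic (this is not how Windows actually behaves, and all the names are invented):

```python
# Invented illustration of "skip a non-critical driver after repeated failed boots".
# Windows does not do this for boot-start drivers; this only sketches the idea.
MAX_FAILED_BOOTS = 3


def next_boot_plan(driver: str, failed_boots: int, truly_critical: bool) -> str:
    """Decide whether to load a driver on the next boot attempt."""
    if truly_critical:
        return f"load {driver} (truly critical: a failure still has to halt boot)"
    if failed_boots >= MAX_FAILED_BOOTS:
        return f"skip {driver} and boot degraded instead of bug checking"
    return f"load {driver} (attempt {failed_boots + 1} of {MAX_FAILED_BOOTS})"


for boots in range(5):
    print(next_boot_plan("csagent.sys", boots, truly_critical=False))
```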
 
Therein lies the problem: a failed boot driver, unless it's truly critical, shouldn't cause a bug check simply because it fails to load. Even if it does, you'd expect a more fault-tolerant system to automatically disable the loading of a driver after X failed boots.

Basically you'd expect/hope something like a problematic driver would be handled in a more graceful manner than creating a bug check and leaving people with an unusable system.

That's why I say it's 50/50 IMO: while the problem was caused by CS, they were only able to cause that problem because of how MS/Windows handles, or rather doesn't handle, such problems.
Would that be any better? Imagine if a bad actor could somehow orchestrate the disabling of an EDR solution worldwide. I think I'd prefer the BSOD and the cleanup effort....

As our global head of security semi-joked "At least we were truly protected for a day or two".
 
Would that be any better? Imagine if a bad actor could somehow orchestrate the disabling of an EDR solution worldwide. I think I'd prefer the BSOD and the cleanup effort....
Depends on how you deal with such a situation; you could simply set up dependencies so that if X fails to load then don't load N (N = network, so disable that if there's a problem loading the EDR protection).

There are more graceful ways of dealing with faults than a bug check: disable the driver, auto-boot to safe mode, drop to a command prompt, etc.
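Again, purely a made-up sketch of that dependency idea: keep the machine up but hold back the things that depend on the EDR (networking here) rather than bug checking.

```python
# Invented sketch of dependency-based degradation, not how any real OS loader works:
# if the EDR fails to load, stay offline and drop to something recoverable.
def plan_startup(edr_loaded: bool) -> list[str]:
    steps = ["storage", "display", "input"]
    if edr_loaded:
        steps += ["edr", "network"]       # protection first, then connectivity
    else:
        steps += ["safe mode / command prompt (no network)"]  # stay offline, ask a human
    return steps


print(plan_startup(edr_loaded=True))
print(plan_startup(edr_loaded=False))
```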
 
Sometimes nothing at all is less insulting than something. My partner works for the NHS, and over COVID the hospital she was at said they wanted to give their staff a gift for all their hard work. They got some flower seeds...

Ouch, yeah, the optics on rewards like that are sometimes really off.
 
Part of the problem these days is that companies are always hunting maximum profit, and things like QA, testing and a slower pace of development are the first things to go. Hell, trillion-dollar companies leak your data on a semi-regular basis due to genuinely amateur-level security issues, and the slap on the wrist isn't even remotely approaching the amount of money they will have saved by ignoring it for years. That's if there are any repercussions.

Programming has a host of best practices but they are rarely adhered to in my experience. None of the companies I have worked at implement proper testing. Admittedly they have all been smallish companies, but it's usually a case of the guy at the top screaming to "get it done" while the people between the developers and the top brass just make that **** roll downhill and cut corners wherever possible. Who cares about tomorrow, and the fact that after 6 months of this your product is a fragile mess and development is at a glacial pace because you spend 50% of your time fixing bugs you introduced due to lack of testing.
 
Programming has a host of best practices but they are rarely adhered to in my experience.
I've often found myself wondering if we need something like health and safety standards for software, a set of rules that must be followed for the protection of the workforce and public.

Problem is the people who would be responsible for ensuring 'best practices', the politicians, often don't have any idea of what's risky or dangerous.
 
I've often found myself wondering if we need something like health and safety standards for software, a set of rules that must be followed for the protection of the workforce and public.

Problem is the people who would be responsible for ensuring 'best practices', the politicians, often don't have any idea of what's risky or dangerous.

The Cambridge Analytica scandal was eye-opening. Facebook should have been dead and buried after that, but the people responsible for taking them to task could barely understand what decade we are in, let alone anything remotely cutting edge tech-wise.
 