Global BSOD

Soldato
Joined
29 Aug 2006
Posts
4,161
Location
In a world of my own
I wondered whether it was actually a deployment failure rather than a testing failure. It seems so severe that it surely would have been picked up by even the most cursory of checks.

First question our head of development asked when he saw this was 'Where are their canaries?'.

We upgrade our customers over the Internet too, but do it in stages. Each stage has customers who are randomly designated as Canaries - if their update fails, the whole update schedule is frozen until we figure out why.
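For anyone wondering what that looks like in practice, here is a minimal Python sketch of the idea, not our actual tooling. The names `staged_rollout`, `apply_update` and `canaries_per_stage` are made up for illustration, and it assumes `apply_update` returns True on success. Each stage picks its canaries at random, updates them first, and freezes the whole schedule the moment one of them fails.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence


def staged_rollout(
    stages: Sequence[Sequence[str]],
    apply_update: Callable[[str], bool],
    canaries_per_stage: int = 2,
    rng: Optional[random.Random] = None,
) -> Dict[str, object]:
    """Update customers stage by stage, freezing on the first canary failure.

    All names here are hypothetical; apply_update(customer) is assumed to
    return True when the update succeeded.
    """
    rng = rng or random.Random()
    updated: List[str] = []

    for index, customers in enumerate(stages):
        # Randomly designate this stage's canaries.
        count = min(canaries_per_stage, len(customers))
        canaries = rng.sample(list(customers), count)

        # Canaries go first; any failure freezes the whole schedule.
        for customer in canaries:
            if not apply_update(customer):
                return {
                    "frozen": True,
                    "stage": index,
                    "failed_canary": customer,
                    "updated": updated,
                }
            updated.append(customer)

        # Canaries were fine, so update the rest of the stage.
        # In this sketch a non-canary failure is simply not recorded as updated.
        for customer in customers:
            if customer not in canaries and apply_update(customer):
                updated.append(customer)

    return {"frozen": False, "stage": None, "failed_canary": None, "updated": updated}
```

The point of the design is that a bad update can only ever reach the earlier stages plus a handful of canaries, never the whole customer base at once.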
 
Caporegime
Joined
19 May 2004
Posts
32,093
Location
Nordfriesland, Germany
We upgrade our customers over the Internet too, but do it in stages. Each stage has customers who are randomly designated as Canaries - if their update fails, the whole update schedule is frozen until we figure out why.

Sensible approach - and even without that, wouldn't you space updates over time? If nothing else so that customers in Sydney and London are both updated during the night, in which case why weren't they able to pull the update before it made it out to 8.5 million PCs?

I'd love to know the details of what went wrong inside the company, and I suspect we eventually will. Governments are likely to get involved if nothing else.
 
Soldato
Joined
14 Jun 2004
Posts
5,715
It's one of the big questions at the moment: what went wrong and why,
from internal testing to deployment.
Hindsight is great though.
 
Man of Honour
Joined
13 Oct 2006
Posts
91,940
Sensible approach - and even without that, wouldn't you space updates over time? If nothing else so that customers in Sydney and London are both updated during the night, in which case why weren't they able to pull the update before it made it out to 8.5 million PCs?

I'd love to know the details of what went wrong inside the company, and I suspect we eventually will. Governments are likely to get involved if nothing else.

I might be wrong but I believe the update ignored policies and was pushed to everything at once - even where businesses had configured to have some systems ahead of others...
 
Soldato
Joined
29 Aug 2006
Posts
4,161
Location
In a world of my own
I might be wrong but I believe the update ignored policies and was pushed to everything at once - even where businesses had configured to have some systems ahead of others...

Which a Vendor should never be able to do imho. We're in a different boat to CRWD, we supply an out of band network appliance. If an update to that fails we don't break anything, but we still operate as if we could.

I used to work for a vendor who deployed agents to workstations. I remember one deployment we did for a PoC at a large telco - 10,000 devices sent out over SCCM after rigorous testing by their resilience teams, 10% of the devices crashed with a BSOD and we were mortified.
Management demanded an immediate rollback and explanation - our deal looked dead in the water. Then investigation showed that the machines impacted were running a beta version of IBM's version control software and IBM had left .dlls compiled in debug mode in the software - it was them crashing, not us, and our bacon was saved. Was brown pants time all round though, I can tell you!
 
Caporegime
Joined
19 May 2004
Posts
32,093
Location
Nordfriesland, Germany
Which a Vendor should never be able to do imho. We're in a different boat to CRWD, we supply an out of band network appliance. If an update to that fails we don't break anything, but we still operate as if we could.

I assume that CrowdStrike pushed this update because they believed it dealt with a very serious threat, although I don't think they've commented on that yet. The difference between their business area and any I've worked in is that failing to update can mean leaving their clients at risk during that time.
 
Soldato
Joined
14 Jun 2004
Posts
5,715
Sensible approach - and even without that, wouldn't you space updates over time? If nothing else so that customers in Sydney and London are both updated during the night, in which case why weren't they able to pull the update before it made it out to 8.5 million PCs?

I'd love to know the details of what went wrong inside the company, and I suspect we eventually will. Governments are likely to get involved if nothing else.
Remember, this was billed as a definition update, as I recall, and for an AV product those are generally deployed ASAP. But from the posts I've seen it was deployed regardless of the policies set by companies, which is in itself a bit of a worry: that they have so much control.
 
Man of Honour
Joined
13 Oct 2006
Posts
91,940
I remember one deployment we did for a PoC at a large telco - 10,000 devices sent out over SCCM after rigorous testing by their resilience teams, 10% of the devices crashed with a BSOD and we were mortified.
Management demanded an immediate rollback and explanation - our deal looked dead in the water. Then investigation showed that the machines impacted were running a beta version of IBM's version control software and IBM had left .dlls compiled in debug mode in the software - it was them crashing, not us, and our bacon was saved. Was brown pants time all round though, I can tell you!

Difficult to take issues like that into account. As I mentioned in a recent post, at work we similarly had BSODs on about 10% of machines, because they used a device with a substitute chipset which was supposed to be identical but wasn't 100% so. It slipped through testing because head office only had devices with the normal chipset.
 