Associate
- Joined
- 2 Aug 2005
- Posts
- 9
Hi Guys/Girls,
First proper post here, but I'm hoping some of the more experienced debuggers (or users of WinDBG) might be able to help with a major problem we're having.
Basically, our live web server, which is running Windows 2000 SP4 and IIS 5 with all hotfixes applied, keeps randomly crashing. To be more precise, IIS keeps crashing and therefore all of our sites go down.
Once the server has 'crashed', an IIS Reset is unable to restart IIS, and the only way to get the sites back online is to reboot the server.
This has been a problem for quite some time now, and it's very random: sometimes it happens twice in a week, sometimes twice in a month, and sometimes it won't happen for a couple of months, but the problem never goes away.
We host many e-commerce websites, so obviously this is causing a major headache for both us and our customers when the sites are offline.
So, what I've been doing recently is setting up some tools to try to capture what is causing IIS to fail, and one tool I'm using is the IIS Crash/Hang Agent. This has created many logs and dumps from when IIS has failed.
Unfortunately, I don't really understand the information that WinDBG gives me when I open the dump and run the '!analyze -v' command.
This is what is returned when the analyze command is given:
___________________________________________________
0:000> !analyze -v
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
*************************************************************************
*** ***
*** ***
*** Your debugger is not using the correct symbols ***
*** ***
*** In order for this command to work properly, your symbol path ***
*** must point to .pdb files that have full type information. ***
*** ***
*** Certain .pdb files (such as the public OS symbols) do not ***
*** contain the required information. Contact the group that ***
*** provided you with these symbols if you need this command to ***
*** work. ***
*** ***
*** Type referenced: ntdll!_PEB ***
*** ***
*************************************************************************
FAULTING_IP:
+0
00000000 ?? ???
EXCEPTION_RECORD: ffffffff -- (.exr ffffffffffffffff)
ExceptionAddress: 00000000
ExceptionCode: 80000007 (Wake debugger)
ExceptionFlags: 00000000
NumberParameters: 0
BUGCHECK_STR: 80000007
DEFAULT_BUCKET_ID: APPLICATION_FAULT
PROCESS_NAME: DLLHOST.EXE
ERROR_CODE: (NTSTATUS) 0x80000007 - {Kernel Debugger Awakened} the system debugger was awakened by an interrupt.
LAST_CONTROL_TRANSFER: from 7c59a072 to 77f88f13
STACK_TEXT:
0006fd28 7c59a072 0000004c 00000000 00000000 NTDLL!ZwWaitForSingleObject+0xb
0006fd50 7c57b3e9 0000004c ffffffff 00000000 KERNEL32!WaitForSingleObjectEx+0x71
0006fd60 7ce7b194 0000004c ffffffff 000745fc KERNEL32!WaitForSingleObject+0xf
0006fd80 7ce7a991 000745e8 ffffffff 0006fdbf OLE32!CSurrogateProcessActivator::WaitForSurrogateTimeout+0x4f
0006fd9c 01001230 0006ff10 00000000 00072d80 OLE32!CoRegisterSurrogateEx+0x169
0006ff24 010014c6 01000000 00000000 00072d80 DLLHOST!WinMain+0xb0
0006ffc0 7c5989a5 ffffffff 00caef70 7ffdf000 DLLHOST!WinMainCRTStartup+0x156
0006fff0 00000000 01001370 00000000 000000c8 KERNEL32!BaseProcessStart+0x3d
STACK_COMMAND: ~0s; .ecxr ; kb
FAULTING_THREAD: 00000a0c
FOLLOWUP_IP:
DLLHOST!WinMain+b0
01001230 ff1578100001 call dword ptr [DLLHOST!_imp__CoUninitialize (01001078)]
SYMBOL_STACK_INDEX: 5
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: DLLHOST
IMAGE_NAME: DLLHOST.EXE
DEBUG_FLR_IMAGE_TIMESTAMP: 3e7b8905
SYMBOL_NAME: DLLHOST!WinMain+b0
FAILURE_BUCKET_ID: 80000007_DLLHOST!WinMain+b0
BUCKET_ID: 80000007_DLLHOST!WinMain+b0
Followup: MachineOwner
___________________________________________________
I would really appreciate it if anyone could suggest what the details above might mean.
If you need any further details (i.e. the complete log file that was generated), then I'll be more than happy to post them up.
One thing I noticed that seemed to aggravate the problem was a script that one of our developers executed prior to a site launch. This script sent out 1400 plain text e-mails containing usernames/passwords for a site, and around 2 hours after the script was executed, the server went down. Incidentally, the same script was run the next day at a different time, and funnily enough, 2 hours after it was run the server went down.
Could it be something to do with e-mails? I believe it was just using the PHP mail() function.
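To check whether that two-hour gap is a real pattern rather than coincidence, something like the rough Python sketch below could line up script run times against crash times. The timestamps are made-up placeholders (not values from our logs), and the function name `crashes_following` and the 3-hour window are just assumptions for illustration.

```python
from datetime import datetime, timedelta

def crashes_following(runs, crashes, window=timedelta(hours=3)):
    """For each script run, return the delay until the first crash
    that happened within `window` after it, or None if there was none."""
    result = []
    for run in runs:
        # Keep only crashes that occurred after the run and inside the window.
        delays = [c - run for c in crashes if timedelta(0) <= c - run <= window]
        result.append(min(delays) if delays else None)
    return result

# Placeholder timestamps for illustration only (not taken from real logs).
runs = [datetime(2005, 8, 1, 10, 0), datetime(2005, 8, 2, 14, 30)]
crashes = [datetime(2005, 8, 1, 12, 5), datetime(2005, 8, 2, 16, 20)]

for run, delay in zip(runs, crashes_following(runs, crashes)):
    print(run, "->", delay)
```

If most runs show a similar delay and crashes on days without a run show None, that would at least suggest the mail script is worth a closer look.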
Anyway, I would really appreciate any help with this, so thanks in advance for any suggestions.
Kieran Welch