| Author | Messages | |
activedirsmaporg
Posts:0
 | | 08/26/2005 1:20 AM |
| Alex,
Unfortunately, only the developer version of eseutil.exe gives more
info, including a raw hex dump of the page. I'm a little curious to see
whether the tail of page 81183 and the head of page 81184 look skewed.
Sometimes we've seen disk corruption where the bytes seem right, just
shifted by several bytes ... but maybe a probable explanation will present
itself from just the output of the header ...
If you make a copy of the bad database (& logs), before you defrag or
restore, it gives you / us the chance to ask more questions about the
nature of the corruption later ...
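For anyone without the developer build, the boundary between the two pages can still be eyeballed from a copy of the file. This is a minimal sketch and not an ESE tool. It assumes 8192-byte pages and that page N starts at file offset (N + 1) * 8192, which is what the event's numbers imply (offset 665067520 for page 81184). The function names and the file path are hypothetical, and it should only ever be pointed at a copy, never the live database.

```python
PAGE_SIZE = 8192   # bytes per page, taken from the event text
HEADER_PAGES = 2   # assumption: two header-sized pages precede page 1


def page_offset(pgno, page_size=PAGE_SIZE, header_pages=HEADER_PAGES):
    """File offset at which database page `pgno` starts (pages count from 1)."""
    return (pgno + header_pages - 1) * page_size


def read_boundary(path, pgno, span=64):
    """Return (last `span` bytes of page pgno-1, first `span` bytes of page pgno)."""
    start = page_offset(pgno)
    with open(path, "rb") as f:
        f.seek(start - span)
        data = f.read(2 * span)
    return data[:span], data[span:]


def hexdump(data, base):
    """Format bytes as 16-per-line hex rows, each labelled with its file offset."""
    rows = []
    for i in range(0, len(data), 16):
        chunk = data[i:i + 16]
        rows.append(f"{base + i:010x}  {chunk.hex(' ')}")
    return "\n".join(rows)
```

Usage would look something like `tail, head = read_boundary("copy-of-ntds.dit", 81184)` followed by `print(hexdump(tail, page_offset(81184) - len(tail)))` and `print(hexdump(head, page_offset(81184)))`. If the bytes look plausible but shifted by a few positions across the boundary, that matches the kind of skew described above.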
Cheers,
BrettSh [msft]
This posting is provided "AS IS" with no warranties, and confers no
rights.
On Tue, 23 Aug 2005, Al Mulnick wrote:
> Hopefully it's just an index that's taken one for the team.
> Take the advice and ensure that the hardware is solid before
> declaring things well enough to be restored etc. This was the type of
> error in the Exchange world that would bug you till the end. It was
> associated with everything from disk controller settings (battery
> backup) to faulty disks, to transient hardware errors. Tough to
> diagnose, but almost always a hardware error (like >99% of the time)
> was the root cause. Software issues were sometimes to blame
> (misconfigured AV etc.) and would take things out, but see above for the
> frequency of that.
> The fact that it stays the same is a good thing. The fact that it
> occurred at all is not. Disk or other hardware would be my next
> suspect. All the way down to the motherboard (checked the revs to
> ensure no issues yet?)
> I also have to admit that a restore is not my favorite method if the
> bandwidth can support re-promotion. I'd prefer to dcpromo the repaired piece of
> hardware, especially for a smaller DIT. That's just my preference
> though.
> Good luck,
> > Al
> > ________________________________
> > From: ActiveDir-owner@xxxxxxxxxxxxxxxxxx on behalf of Alex Fontana
> Sent: Mon 8/22/2005 9:30 PM
> To: ActiveDir@xxxxxxxxxxxxxxxxxx
> Subject: RE: [ActiveDir] Database Corruption
> > > > ECC memory, no errors in the event logs relating to memory. The ntds.dit is
> about 800MB. There are multiple events, the page number is always the same
> (81184).
> > Haven't fixed it yet - it's limping along until this weekend when I'll dump
> the pages to see what the header shows - then either defrag or restore...
> > -----Original Message-----
> From: ActiveDir-owner@xxxxxxxxxxxxxxxxxx
> [mailto:ActiveDir-owner@xxxxxxxxxxxxxxxxxx] On Behalf Of Brett Shirley
> Sent: Monday, August 22, 2005 10:22 AM
> To: ActiveDir@xxxxxxxxxxxxxxxxxx
> Subject: RE: [ActiveDir] Database Corruption
> > Steve's, Hunter's, and your original advice are all sound ... I think it is
> very likely if you call PSS, they'll tell you to do Steve's, yours, and
> Hunter's advice in about that order.
> > My favorite disk sub-system diagnostic is jetstress, but dedicated disk
> sub-system stressers are better, as they try odd patterns of bits that
> they know buses, electrical systems, and disks get fouled up on. Also do
> not ignore RAM checkers; bad memory is almost as likely a cause, perhaps
> even more likely here.
> > Do you have ECC or parity memory? Any events in system or app event log
> related to parity memory issues?
> > BTW, how big is your ntds.dit file? Is it over 1.5-2.5 GB? That
> increases the likelihood of memory issues.
> > So you have multiple of these events? If you do, do they always happen
> for the same page numbers ("pgno") and offsets? If different, does their
> frequency increase?
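One way to answer the "same page numbers and offsets?" question is to tally them across the saved event descriptions. This is a minimal sketch, assuming each description is already available as a plain string worded like the Event ID 475 text in the original post. The regular expression and the function name are hypothetical, and how the descriptions get exported from the event log is left out.

```python
import re
from collections import Counter

# Matches the wording of the NTDS ISAM event 475 description quoted in
# this thread; other events or locales would need a different pattern.
PATTERN = re.compile(
    r"at\s+offset\s+(\d+)\b.*?expected\s+page\s+number\s+was\s+(\d+)",
    re.DOTALL,
)


def tally_pages(descriptions):
    """Count how often each (page number, offset) pair shows up."""
    counts = Counter()
    for text in descriptions:
        match = PATTERN.search(text)
        if match:
            offset, pgno = int(match.group(1)), int(match.group(2))
            counts[(pgno, offset)] += 1
    return counts
```

A single key in the result means every event points at the same page. Several keys, or counts that grow over time, would point toward a spreading problem.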
> > If you haven't restored it already, I'd be curious if you felt like
> sharing, what the page looked like from:
> esentutl /m ntds.dit /p81184 /v
> ... then we could see how badly the header was corrupted. Also this will
> tell you if the page is an "Index page", and thus likely to be fixed by an
> offline defrag. If you see a "primary" or "long value" page, offline defrag
> probably won't fix it.
> > Also get the previous page too (change 81184 to 81183 in the above
> command). But again, only if you feel like sharing.
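The two dumps asked for above can be scripted so that both pages are captured the same way. This is a minimal sketch, assuming esentutl is on the PATH and the script runs from a directory holding a copy of ntds.dit. The helper names are hypothetical, and the flags are copied verbatim from the command above rather than checked against esentutl's documentation.

```python
import subprocess


def dump_page_command(pgno, database="ntds.dit"):
    """Build the argument list for the page dump command quoted in the thread."""
    return ["esentutl", "/m", database, f"/p{pgno}", "/v"]


def dump_pages(pgno, database="ntds.dit", run=False):
    """Return {page: output} for the previous page and the suspect page.

    With run=False (the default) nothing is executed and each value is the
    command line as a string, which is handy for review before running.
    """
    results = {}
    for page in (pgno - 1, pgno):
        cmd = dump_page_command(page, database)
        if run:
            done = subprocess.run(cmd, capture_output=True, text=True)
            results[page] = done.stdout
        else:
            results[page] = " ".join(cmd)
    return results
```

Calling `dump_pages(81184)` only prepares the two command lines. Passing `run=True` executes them and collects the output, which can then be saved next to the copied database.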
> > Cheers,
> BrettSh
> > This posting is provided "AS IS" with no warranties, and confers no
> rights.
> > > > On Sat, 20 Aug 2005, Coleman, Hunter wrote:
> > > I'd also look at running hardware diagnostics, particularly on the
> > disk subsystem and controller. No point in restoring or repromoting if
> > there is an unresolved hardware problem.
> > > > -----Original Message-----
> > From: ActiveDir-owner@xxxxxxxxxxxxxxxxxx on behalf of Steve Linehan
> > Sent: Fri 8/19/2005 8:18 PM
> > To: ActiveDir@xxxxxxxxxxxxxxxxxx
> > Cc:
> > Subject: RE: [ActiveDir] Database Corruption
> > > > Well the first thing I always recommend is to try an offline
> > defrag as it is possible that the corruption is in an index, i.e.
> > metadata, that can be rebuilt. If the offline defrag fails then
> > restoring from backup or repromoting will be your next step.
> > > > Thanks,
> > -Steve
> > _____
> > > > From: ActiveDir-owner@xxxxxxxxxxxxxxxxxx
> [mailto:ActiveDir-owner@xxxxxxxxxxxxxxxxxx] On Behalf Of Ayers, Diane
> > Sent: Friday, August 19, 2005 6:43 PM
> > To: ActiveDir@xxxxxxxxxxxxxxxxxx
> > Subject: RE: [ActiveDir] Database Corruption
> > > > My preferred approach would be to demote the box to member
> > server and re-promote to a domain controller to ensure a good fresh
> > copy of the DIT. YMMV as the specific requirements at your location
> > may prevent this. We have only run into this once early in our AD
> > days and this was the approach we used with good success.
> > > > Diane
> > _____
> > > > From: ActiveDir-owner@xxxxxxxxxxxxxxxxxx
> [mailto:ActiveDir-owner@xxxxxxxxxxxxxxxxxx] On Behalf Of Alex Fontana
> > Sent: Friday, August 19, 2005 3:29 PM
> > To: ActiveDir@xxxxxxxxxxxxxxxxxx
> > Subject: [ActiveDir] Database Corruption
> > > > Started getting the error below a few weeks ago on one of our
> > DCs. My first reaction is to run a non-auth restore from a day before
> > this started happening and let replication take care of everything
> > else. Any reason NOT to do this? I'm concerned that this may
> > happen again and wasn't able to find anything specific to the error
> > below. Besides calling PSS, anything else I should look into before
> > restoring? This box holds all FSMO roles, Win2k3, server for NIS.
> > > > TIA
> > -alex
> > > > > > Event Type: Error
> > Event Source: NTDS ISAM
> > Event Category: Database Page Cache
> > Event ID: 475
> > Date: 8/19/2005
> > Time: 2:00:24 PM
> > User: N/A
> > Computer: DC
> > Description:
> > > > NTDS (528) NTDSA: The database page read from the file
> > "C:\WINNT\NTDS\ntds.dit" at offset 665067520 (0x0000000027a42000) for
> > 8192 (0x00002000) bytes failed verification due to a page number
> > mismatch. The expected page number was 81184 (0x00013d20) and the
> > actual page number was 2349964126 (0x8c119b5e). The read operation
> > will fail with error -1018 (0xfffffc06). If this condition persists
> > then please restore the database from a previous backup. This problem
> > is likely due to faulty hardware. Please contact your hardware vendor
> > for further assistance diagnosing the problem.
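The numbers in this event are internally consistent, which is worth checking before reading anything into them. This is a minimal sketch, assuming the file begins with two header-sized pages ahead of database page 1. That assumption is what makes offset 665067520 line up with page 81184 at 8192 bytes per page. The function names are hypothetical.

```python
PAGE_SIZE = 8192   # "for 8192 (0x00002000) bytes" in the event text
HEADER_PAGES = 2   # assumption: header plus a second header-sized page


def page_from_offset(offset, page_size=PAGE_SIZE, header_pages=HEADER_PAGES):
    """Map a page-aligned file offset to a database page number."""
    if offset % page_size:
        raise ValueError("offset is not page-aligned")
    return offset // page_size - header_pages + 1


def offset_from_page(pgno, page_size=PAGE_SIZE, header_pages=HEADER_PAGES):
    """Map a database page number back to its file offset."""
    return (pgno + header_pages - 1) * page_size


def as_unsigned32(value):
    """Show a signed 32-bit error code the way the event prints it in hex."""
    return value & 0xFFFFFFFF
```

With these, `page_from_offset(665067520)` gives 81184, matching the expected page number, and `hex(as_unsigned32(-1018))` gives `0xfffffc06`. The "actual" page number 2349964126 would correspond to an offset far beyond an 800MB file, which fits the reading that the page header holds garbage rather than a misplaced but valid page.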
> > > > > > > > > > List info : http://www.activedir.org/List.aspx
> List FAQ : http://www.activedir.org/ListFAQ.aspx
> List archive: http://www.mail-archive.com/activedir%40mail.activedir.org/
| | | |
|
|