How do I know if my DAG replication is corrupted?

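Corruption usually shows up as a copy Status of "FailedAndSuspended" in Get-MailboxDatabaseCopyStatus, or as Event IDs 4113, 3154, or 474 in the Application log (474 is the ESE page checksum mismatch error). A ContentIndexState of "Failed" points to a damaged search index rather than damaged mailbox data, but it still needs fixing. Running Test-ReplicationHealth will also flag failed or suspended copies.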

Should I reseed a database copy or restore from backup?

Reseed from another healthy copy whenever possible: it is faster and uses existing infrastructure. Restore from backup only if all DAG copies are corrupted, you need point-in-time recovery, or reseeding repeatedly fails because of problems with the source database.

How long does database reseeding take?

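Reseeding time depends on database size, network throughput, and disk performance. At the full 1Gbps line rate (about 125MB/s), a 100GB database needs at least about 13 minutes and a 500GB database at least about 67 minutes. Real-world throughput is lower, so plan for roughly 15-30 minutes and 1-2 hours respectively, or longer on busy links. Use the -Network parameter of Update-MailboxDatabaseCopy to seed over a dedicated replication network.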
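The arithmetic behind these figures is database size divided by effective throughput. The sketch below makes that explicit. `Get-EstimatedSeedMinutes` is a hypothetical helper, not an Exchange cmdlet, and its `-Efficiency` factor is an assumed allowance for protocol overhead, disk speed, and competing traffic. It also assumes decimal units (1 GB = 8 gigabits).

```powershell
# Hypothetical helper (not an Exchange cmdlet): estimate reseed duration in minutes.
# Assumes decimal units (1 GB = 8 Gb) and a caller-chosen efficiency factor
# between 0.01 and 1.0 (1.0 = full line rate, which real seeding rarely reaches).
function Get-EstimatedSeedMinutes {
    param(
        [Parameter(Mandatory)][double]$DatabaseSizeGB,
        [Parameter(Mandatory)][double]$LinkGbps,
        [ValidateRange(0.01, 1.0)][double]$Efficiency = 0.5
    )
    $effectiveGbps = $LinkGbps * $Efficiency
    $seconds = ($DatabaseSizeGB * 8) / $effectiveGbps
    [math]::Round($seconds / 60, 1)
}

# Line-rate floor for 100GB on 1Gbps, then 500GB at the assumed 50% default efficiency
Get-EstimatedSeedMinutes -DatabaseSizeGB 100 -LinkGbps 1 -Efficiency 1.0
Get-EstimatedSeedMinutes -DatabaseSizeGB 500 -LinkGbps 1
```

The first call returns 13.3 (the theoretical floor). The second returns 133.3, which shows how a 500GB seed stretches past two hours once throughput falls to half the link speed.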

Can DAG replication issues cause data loss?

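If the active copy fails while passive copies are behind in replication (copy queue length > 0), you may lose the transactions that have not yet been copied. A long replay queue does not lose data by itself, because those logs are already on the passive server, but it delays activation. This is why monitoring both copy queue length and replay queue length is critical. Best practice is to keep the copy queue under 10 logs.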

DAG Replication Timeout & Corruption - Fix Guide 2025

Q: What causes DAG replication timeout in Exchange Server?

DAG replication timeouts occur when the passive database copy cannot receive transaction logs from the active copy fast enough. Common causes include network latency/bandwidth issues, disk I/O bottlenecks, storage failures, or the replication network being overloaded with client traffic.

Complete troubleshooting guide for Exchange Server Database Availability Group (DAG) replication timeout and corruption issues. Learn how to diagnose network problems, reseed database copies, and restore high availability in your Exchange environment.

Medha Cloud Exchange Server Team

Exchange Database Recovery Team•January 15, 2025•15 min read

Database Availability Group (DAG) replication timeout and corruption threaten Exchange high availability by preventing passive database copies from staying synchronized with the active copy. When replication fails, failover protection is lost, and data loss becomes a real risk. This guide shows you how to diagnose replication issues and restore full DAG functionality.

Our Exchange Server DAG team has restored hundreds of DAG environments with zero data loss. This guide provides the same diagnostic and recovery process we use.

Understanding DAG Replication

DAG provides automatic database-level failover for Exchange mailbox databases. The active copy serves client requests while passive copies on other DAG members maintain synchronized copies through continuous log shipping and replay.

DAG Replication Flow: Active Copy (writes transaction logs) → logs → Passive Copy (copies and replays logs)

Healthy State: Copy Queue Length = 0, Replay Queue Length ≤ 10

Key Replication Metrics

# Check replication health
Get-MailboxDatabaseCopyStatus -Identity "DB01" | Format-List Name,Status,
  CopyQueueLength,ReplayQueueLength,ContentIndexState,LastInspectedLogTime

# Key metrics to monitor:
# CopyQueueLength: Logs waiting to be copied (should be 0-2)
# ReplayQueueLength: Logs waiting to be replayed (should be < 10)
# ContentIndexState: Should be "Healthy"
# Status: Should be "Mounted" (active) or "Healthy" (passive)
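The same thresholds can be checked programmatically. `Get-CopyHealthIssue` below is a hypothetical helper, a sketch rather than a supported tool. It assumes only that each input object exposes the Name, Status, CopyQueueLength, ReplayQueueLength, and ContentIndexState properties that Get-MailboxDatabaseCopyStatus returns, so it can also be exercised with plain objects.

```powershell
# Hypothetical helper: compare one copy-status record against the thresholds above
# (Status Mounted/Healthy, CopyQueueLength 0-2, ReplayQueueLength < 10, index Healthy).
function Get-CopyHealthIssue {
    param([Parameter(Mandatory, ValueFromPipeline)]$Copy)
    process {
        $issues = @()
        # Convert enum values to strings so live objects and test objects compare the same way
        if ("$($Copy.Status)" -notin @('Mounted', 'Healthy')) {
            $issues += "Status is $($Copy.Status)"
        }
        if ($Copy.CopyQueueLength -gt 2) { $issues += "CopyQueueLength $($Copy.CopyQueueLength) > 2" }
        if ($Copy.ReplayQueueLength -ge 10) { $issues += "ReplayQueueLength $($Copy.ReplayQueueLength) >= 10" }
        if ("$($Copy.ContentIndexState)" -ne 'Healthy') {
            $issues += "ContentIndexState is $($Copy.ContentIndexState)"
        }
        [pscustomobject]@{ Name = $Copy.Name; Healthy = ($issues.Count -eq 0); Issues = $issues }
    }
}
```

In the Exchange Management Shell you would pipe live data into it, for example `Get-MailboxDatabaseCopyStatus * | Get-CopyHealthIssue | Where-Object { -not $_.Healthy }`.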

Replication Timeout

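Passive copy cannot receive or replay logs fast enough. A continuously growing CopyQueueLength usually points to a network bottleneck, while a growing ReplayQueueLength usually points to a disk I/O bottleneck on the passive server.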

Replication Corruption

Log divergence or database inconsistency detected. Status shows "FailedAndSuspended". Requires reseed to fix.
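The two failure modes can be told apart with a simple heuristic. The hypothetical sketch below is deliberately crude: it treats any status containing "Failed" as the corruption path, and a copy queue that grew across every sample as the timeout path. The number of samples and the interval between them are left to the caller.

```powershell
# Hypothetical triage helper (a heuristic, not an Exchange check):
#   'Corruption' - the copy reports a Failed or FailedAndSuspended status
#   'Timeout'    - CopyQueueLength increased between every pair of samples
#   'OK'         - neither pattern was seen
function Get-ReplicationProblemType {
    param(
        [Parameter(Mandatory)][string]$Status,
        [Parameter(Mandatory)][int[]]$CopyQueueSamples   # oldest sample first
    )
    if ($Status -match 'Failed') { return 'Corruption' }
    if ($CopyQueueSamples.Count -ge 2) {
        $growing = $true
        for ($i = 1; $i -lt $CopyQueueSamples.Count; $i++) {
            if ($CopyQueueSamples[$i] -le $CopyQueueSamples[$i - 1]) { $growing = $false; break }
        }
        if ($growing) { return 'Timeout' }
    }
    'OK'
}
```

On a live system you could read `(Get-MailboxDatabaseCopyStatus -Identity DB01\EXCH02).CopyQueueLength` a few times, a minute or so apart, and pass the values oldest first.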

Symptoms & Business Impact

Replication Timeout Symptoms:

CopyQueueLength steadily increasing (10+, 50+, 100+ logs behind)
Get-MailboxDatabaseCopyStatus shows "Seeding" or "SeedingSource"
Event ID 4113 (MSExchangeRepl): database redundancy health check failed, meaning the database has fewer healthy copies than expected
Cluster events about heartbeat failures between DAG members

Corruption Symptoms:

Status shows "FailedAndSuspended" or "Failed"
ContentIndexState shows "Failed" or "FailedAndSuspended"
Event ID 3154: "The Active Manager was unable to mount database"
Event ID 474/475: Database corruption detected
Resume-MailboxDatabaseCopy fails repeatedly

⚠️ Critical Business Impact: While replication is broken, the affected copy provides NO automatic failover protection. If the active copy fails, you risk losing data equal to the copy queue length (each transaction log file is 1MB). Treat DAG replication failures as high-priority incidents.
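The exposure described here is simple arithmetic on the copy queue. `Get-DataLossExposure` is a hypothetical helper that applies the 1MB-per-log rule of thumb and the 10-log threshold from the FAQ. Both values are parameters because they are planning assumptions, not Exchange settings.

```powershell
# Hypothetical helper: approximate the data at risk on each passive copy.
# Assumes ~1 MB per transaction log and a 10-log alert threshold (both adjustable).
function Get-DataLossExposure {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]$Copy,
        [int]$LogSizeMB = 1,
        [int]$ThresholdLogs = 10
    )
    process {
        [pscustomobject]@{
            Name       = $Copy.Name
            ExposureMB = $Copy.CopyQueueLength * $LogSizeMB
            OverLimit  = $Copy.CopyQueueLength -gt $ThresholdLogs
        }
    }
}
```

For live data, something like `Get-MailboxDatabaseCopyStatus * | Where-Object { "$($_.Status)" -ne 'Mounted' } | Get-DataLossExposure` limits the report to passive copies.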

Common Causes of DAG Replication Issues

1. Network Latency/Bandwidth (35% of cases)

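Most Common Cause: Replication network saturated, high latency between DAG members (typically under 1ms on a same-site LAN; Microsoft's supported maximum round-trip latency between DAG members is 500ms), or the MAPI network being used instead of a dedicated replication network.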

Identified by: Network traces show packet loss, latency >10ms, or bandwidth saturation

2. Disk I/O Bottleneck (25% of cases)

Storage Issue: Slow disk write performance on passive copy prevents log replay from keeping up. SAN congestion, RAID rebuild, or insufficient IOPS.

Identified by: High disk queue length, Event ID 1018 (database I/O errors)

3. Log File Corruption (20% of cases)

Data Integrity Issue: Transaction log file corrupted during copy or storage failure. Passive copy cannot replay corrupted log.

Identified by: Event ID 454 "Log file signature mismatch", Status "FailedAndSuspended"

4. Content Index Issues (15% of cases)

Search Catalog Problem: Content index database corrupted or out of sync. Often occurs after storage issues or Exchange service crashes.

Identified by: ContentIndexState "Failed", search not returning results

5. Cluster Communication Failure (5% of cases)

Infrastructure Issue: Windows Failover Cluster heartbeat failures, witness server unreachable, or cluster database corruption.

Identified by: Cluster events in System log, DAG members showing as "Down"

Quick Diagnosis: PowerShell Commands

📌 Version Compatibility: This guide applies to Exchange 2016 and Exchange 2019. Commands may differ for other versions.

Run these commands in Exchange Management Shell (run as Administrator) to identify replication issues:

Step 1: Check All Database Copy Status

# Overview of all database copies
Get-MailboxDatabaseCopyStatus * | Sort-Object DatabaseName | Format-Table `
  DatabaseName, MailboxServer, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState

# Identify problematic copies
Get-MailboxDatabaseCopyStatus * | Where-Object {
    $_.Status -notmatch "Mounted|Healthy" -or
    $_.CopyQueueLength -gt 10 -or
    $_.ContentIndexState -ne "Healthy"
}

What to look for:

Status should be "Mounted" (active) or "Healthy" (passive)
CopyQueueLength should be 0-2 (higher = replication behind)
ReplayQueueLength should be <10 (higher = replay behind)
ContentIndexState should be "Healthy"

Step 2: Run Replication Health Test

# Comprehensive replication health check
Test-ReplicationHealth | Format-Table Server, Check, Result, Error -AutoSize

# Check specific DAG member
Test-ReplicationHealth -Identity EXCH01 | Where-Object {$_.Result -ne "Passed"}
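When several servers report problems, a per-server summary is easier to read than the raw table. `Get-FailedCheckSummary` is a hypothetical helper. It assumes each result exposes Server, Check, and Result, and that a passing Result renders as the text "Passed". Converting Result to a string keeps the filter identical for live output and plain test objects.

```powershell
# Hypothetical helper: group non-passing Test-ReplicationHealth checks by server.
function Get-FailedCheckSummary {
    param([Parameter(Mandatory, ValueFromPipeline)]$Check)
    begin   { $failed = @() }
    process { if ("$($Check.Result)" -ne 'Passed') { $failed += $Check } }
    end {
        # One output row per server, listing the names of its failing checks
        $failed | Group-Object -Property Server | ForEach-Object {
            [pscustomobject]@{
                Server       = $_.Name
                FailedChecks = ($_.Group | ForEach-Object { $_.Check }) -join ', '
            }
        }
    }
}
```

Usage: `Test-ReplicationHealth | Get-FailedCheckSummary`.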

Step 3: Check Network Connectivity

# Test replication network between DAG members
Test-NetConnection -ComputerName EXCH02 -Port 64327 # Replication port

# Check network latency
$servers = (Get-DatabaseAvailabilityGroup).Servers
foreach ($server in $servers) {
    $ping = Test-Connection -ComputerName $server -Count 4
    "$server - Avg: $([math]::Round(($ping.ResponseTime | Measure-Object -Average).Average, 2))ms"
}
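The ping averages can be graded against the latency figures in Common Causes (under 1ms healthy, over 10ms a likely problem). In the hypothetical sketch below, the "Investigate" band between those two figures is an added assumption, not something the guide defines.

```powershell
# Hypothetical helper: grade average round-trip time between DAG members.
#   < 1 ms   -> 'OK' (same-site target)
#   1-10 ms  -> 'Investigate' (assumed middle band)
#   > 10 ms  -> 'Problem'
function Get-LatencyVerdict {
    param([Parameter(Mandatory)][double[]]$RoundTripMs)
    $avg = ($RoundTripMs | Measure-Object -Average).Average
    $verdict = if ($avg -lt 1) { 'OK' } elseif ($avg -le 10) { 'Investigate' } else { 'Problem' }
    [pscustomobject]@{ AverageMs = [math]::Round($avg, 2); Verdict = $verdict }
}
```

With the loop above you could call `Get-LatencyVerdict -RoundTripMs $ping.ResponseTime` for each server.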

Step 4: Check for Recent Errors

# Exchange replication errors
Get-EventLog -LogName Application -Source "MSExchangeRepl" -EntryType Error -Newest 50 |
    Format-Table TimeGenerated, EventID, Message -AutoSize

# Cluster errors (the System log source is Microsoft-Windows-FailoverClustering)
Get-EventLog -LogName System -Source "*FailoverClustering*" -EntryType Error -Newest 30
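To turn the error list into a starting diagnosis, you can count events by ID and tag each ID with the cause this guide associates with it. `Get-EventTriage` and its lookup table are hypothetical. The table contains only IDs cited in this guide, and anything else is reported as "Unmapped" rather than guessed.

```powershell
# Hypothetical helper: count error events by ID and attach the likely cause named
# in this guide. Works on Get-EventLog output or any object with an EventID property.
function Get-EventTriage {
    param([Parameter(Mandatory, ValueFromPipeline)]$Entry)
    begin {
        $map = @{
            1018 = 'Disk I/O'
            454  = 'Log file corruption'
            474  = 'Database corruption'
            475  = 'Database corruption'
            4113 = 'Database copy health'
            3154 = 'Database mount failure'
        }
        $ids = @()
    }
    process { $ids += [int]$Entry.EventID }
    end {
        $ids | Group-Object | ForEach-Object {
            $id = [int]$_.Name
            $cause = if ($map.ContainsKey($id)) { $map[$id] } else { 'Unmapped' }
            [pscustomobject]@{ EventID = $id; Count = $_.Count; LikelyCause = $cause }
        } | Sort-Object -Property Count -Descending
    }
}
```

Usage: `Get-EventLog -LogName Application -Source "MSExchangeRepl" -EntryType Error -Newest 200 | Get-EventTriage`.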

Quick Fix (15 Minutes) - Resume Suspended Copy

⚠️ Only use this if:

Status shows "Suspended" (not "FailedAndSuspended")
CopyQueueLength is manageable (<100 logs)
No corruption indicators in event logs
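The three preconditions above can be combined into one check before running the quick fix. `Test-QuickResumeEligible` is a hypothetical helper. The caller supplies the corruption-event count, for example the number of 474/475 errors found in Step 4, and the 100-log limit mirrors the list above.

```powershell
# Hypothetical helper: $true only when the copy is plainly Suspended (not
# FailedAndSuspended), its copy queue is under the limit, and no corruption events exist.
function Test-QuickResumeEligible {
    param(
        [Parameter(Mandatory)]$Copy,
        [int]$CorruptionEventCount = 0,
        [int]$MaxCopyQueue = 100
    )
    (("$($Copy.Status)" -eq 'Suspended') -and
     ($Copy.CopyQueueLength -lt $MaxCopyQueue) -and
     ($CorruptionEventCount -eq 0))
}
```

Usage: `Test-QuickResumeEligible -Copy (Get-MailboxDatabaseCopyStatus -Identity "DB01\EXCH02")`.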

Solution: Resume and Monitor

Resume Suspended Database Copy

# Check current status
$dbCopy = "DB01\EXCH02"
Get-MailboxDatabaseCopyStatus -Identity $dbCopy | Format-List Status,*Queue*,LastInspectedLogTime

# Resume the copy
Resume-MailboxDatabaseCopy -Identity $dbCopy

# Monitor replication progress (run every 30 seconds)
while ($true) {
    $status = Get-MailboxDatabaseCopyStatus -Identity $dbCopy
    Write-Host "$(Get-Date -Format 'HH:mm:ss') - Copy: $($status.CopyQueueLength) | Replay: $($status.ReplayQueueLength) | Status: $($status.Status)"
    if ($status.CopyQueueLength -eq 0 -and $status.Status -eq "Healthy") {
        Write-Host "Replication caught up!" -ForegroundColor Green
        break
    }
    Start-Sleep -Seconds 30
}
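The loop above runs until the copy catches up, so it never ends if the copy stalls. The hypothetical variant below adds a deadline. The status source is passed in as a scriptblock so the logic can be tested without Exchange. The 120-minute default timeout is an arbitrary assumption.

```powershell
# Hypothetical helper: a bounded version of the monitoring loop above.
# -GetStatus is any scriptblock returning an object with Status and CopyQueueLength,
# e.g. { Get-MailboxDatabaseCopyStatus -Identity "DB01\EXCH02" } in production.
function Wait-CopyCaughtUp {
    param(
        [Parameter(Mandatory)][scriptblock]$GetStatus,
        [int]$IntervalSeconds = 30,
        [int]$TimeoutMinutes = 120
    )
    $deadline = (Get-Date).AddMinutes($TimeoutMinutes)
    while ((Get-Date) -lt $deadline) {
        $status = & $GetStatus
        if ($status.CopyQueueLength -eq 0 -and "$($status.Status)" -eq 'Healthy') {
            return $true
        }
        Start-Sleep -Seconds $IntervalSeconds
    }
    return $false   # deadline passed without the copy catching up
}
```

A `$false` result is the point to stop waiting and move on to the reseed procedure below.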

✅ Expected Result:

CopyQueueLength decreases steadily toward 0
Status changes from "Suspended" to "Healthy"
ContentIndexState remains or becomes "Healthy"
No new error events in Application log

Detailed Solution: Reseed Database Copy

If resume fails or status shows "FailedAndSuspended", you need to reseed the database copy. This creates a fresh copy from another healthy source.

⚠️ Important: Reseeding copies the entire database over the network. During this time, the copy provides no failover protection. Schedule reseeding during low-usage periods if possible.

Scenario 1: Reseed from Active Copy

Reseed Database Copy

# Step 1: Suspend the problematic copy
$dbCopy = "DB01\EXCH02"
Suspend-MailboxDatabaseCopy -Identity $dbCopy -Confirm:$false

# Step 2: Remove existing database files (optional, speeds up reseed)
# WARNING: This deletes the local copy - ensure other copies exist!
# Run this on the target server (EXCH02)
$dbPath = (Get-MailboxDatabase DB01).EdbFilePath.PathName
$logPath = (Get-MailboxDatabase DB01).LogFolderPath.PathName
# Remove-Item "$dbPath" -Force
# Remove-Item "$logPath\*.log" -Force

# Step 3: Start reseed
Update-MailboxDatabaseCopy -Identity $dbCopy -DeleteExistingFiles

# Step 4: Monitor reseed progress
while ($true) {
    $status = Get-MailboxDatabaseCopyStatus -Identity $dbCopy
    Write-Host "$(Get-Date -Format 'HH:mm:ss') - Status: $($status.Status) | $($status.SeedingProgress)%"
    if ($status.Status -eq "Healthy") {
        Write-Host "Reseed complete!" -ForegroundColor Green
        break
    }
    Start-Sleep -Seconds 60
}
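The WARNING in Step 2 ("ensure other copies exist") is worth enforcing in code before any files are deleted. `Test-OtherHealthyCopyExists` is a hypothetical guard. It assumes `$Copies` holds every copy of one database, each with Status and MailboxServer properties.

```powershell
# Hypothetical guard: $true only if some copy of the database on a server other
# than the target is Mounted or Healthy, i.e. a usable source still exists.
function Test-OtherHealthyCopyExists {
    param(
        [Parameter(Mandatory)][object[]]$Copies,
        [Parameter(Mandatory)][string]$TargetServer
    )
    $others = @($Copies | Where-Object {
        "$($_.MailboxServer)" -ne $TargetServer -and
        "$($_.Status)" -in @('Mounted', 'Healthy')
    })
    $others.Count -gt 0
}
```

Usage: `Test-OtherHealthyCopyExists -Copies @(Get-MailboxDatabaseCopyStatus -Identity DB01) -TargetServer EXCH02`. Passing only the database name to Get-MailboxDatabaseCopyStatus should return all of its copies.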

Scenario 2: Reseed from Specific Source

Reseed from Specific DAG Member

# Use a specific server as seed source (useful when active copy is busy)
$dbCopy = "DB01\EXCH02"
$sourceServer = "EXCH03"  # Another healthy passive copy

Update-MailboxDatabaseCopy -Identity $dbCopy -SourceServer $sourceServer -DeleteExistingFiles

# Or use a specific network for faster seeding
Update-MailboxDatabaseCopy -Identity $dbCopy -Network "DAGNetwork02" -DeleteExistingFiles
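Choosing `$sourceServer` by hand works for small DAGs. The hypothetical sketch below picks one automatically. The selection rule is an assumption, not Exchange behavior: prefer the Healthy passive copy with the smallest combined queue (keeping load off the active copy, as suggested above), and fall back to the Mounted active copy.

```powershell
# Hypothetical helper: choose a seed source other than the target server.
function Select-SeedSource {
    param(
        [Parameter(Mandatory)][object[]]$Copies,
        [Parameter(Mandatory)][string]$TargetServer
    )
    $candidates = @($Copies | Where-Object { "$($_.MailboxServer)" -ne $TargetServer })
    # Healthy passive copy with the fewest queued logs
    $passive = $candidates |
        Where-Object { "$($_.Status)" -eq 'Healthy' } |
        Sort-Object -Property { $_.CopyQueueLength + $_.ReplayQueueLength } |
        Select-Object -First 1
    if ($passive) { return "$($passive.MailboxServer)" }
    # Otherwise fall back to the active copy
    $active = $candidates | Where-Object { "$($_.Status)" -eq 'Mounted' } | Select-Object -First 1
    if ($active) { return "$($active.MailboxServer)" }
    return $null
}
```

The result can feed the command above, for example `-SourceServer (Select-SeedSource -Copies @(Get-MailboxDatabaseCopyStatus -Identity DB01) -TargetServer EXCH02)`.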

Scenario 3: Fix Content Index Only

If database replication is healthy but ContentIndexState is "Failed", you can reseed just the content index:

Reseed Content Index Catalog

# Reseed only the search catalog (much faster than full reseed)
$dbCopy = "DB01\EXCH02"

Update-MailboxDatabaseCopy -Identity $dbCopy -CatalogOnly

# Monitor content index status
Get-MailboxDatabaseCopyStatus -Identity $dbCopy | Select-Object ContentIndexState,ContentIndexErrorMessage
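The decision between resuming, a catalog-only reseed, and a full reseed follows directly from the scenarios in this guide. `Get-ReseedType` is a hypothetical helper that encodes those rules for a passive copy. Active (Mounted) copies are left out because Update-MailboxDatabaseCopy is run against passive copies.

```powershell
# Hypothetical helper encoding this guide's rules for a passive copy:
#   Status contains "Failed"                      -> 'Full' reseed
#   Status "Suspended"                            -> 'Resume' (quick fix first)
#   Status "Healthy" but ContentIndexState failed -> 'CatalogOnly'
#   anything else                                 -> 'None'
function Get-ReseedType {
    param([Parameter(Mandatory)]$Copy)
    $status = "$($Copy.Status)"
    $ci     = "$($Copy.ContentIndexState)"
    if ($status -match 'Failed') { 'Full' }
    elseif ($status -eq 'Suspended') { 'Resume' }
    elseif ($status -eq 'Healthy' -and $ci -match 'Failed') { 'CatalogOnly' }
    else { 'None' }
}
```

Usage: `Get-ReseedType -Copy (Get-MailboxDatabaseCopyStatus -Identity $dbCopy)`.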

Scenario 4: Network Performance Fix

Configure Dedicated Replication Network

# Check current DAG network configuration
Get-DatabaseAvailabilityGroupNetwork -Identity DAG01 | Format-List Name,Subnets,ReplicationEnabled

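# Make sure the replication network is enabled for replication and not ignored
# (DAG networks can only be edited once manual network configuration is enabled)
Set-DatabaseAvailabilityGroup -Identity DAG01 -ManualDagNetworkConfiguration $true
Set-DatabaseAvailabilityGroupNetwork -Identity "DAG01\ReplicationNetwork" -ReplicationEnabled $true -IgnoreNetwork $false

# Keep replication off the client (MAPI) network; "MapiNetwork" is an example name
Set-DatabaseAvailabilityGroupNetwork -Identity "DAG01\MapiNetwork" -ReplicationEnabled $false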

# Enable compression for WAN replication
Set-DatabaseAvailabilityGroup -Identity DAG01 -NetworkCompression Enabled

# Enable encryption for secure replication
Set-DatabaseAvailabilityGroup -Identity DAG01 -NetworkEncryption Enabled

💡 Pro Tip: For large databases (500GB+), use the -ManualResume parameter with Update-MailboxDatabaseCopy to prevent automatic resumption after seeding. This lets you verify the seed completed successfully before enabling replication.

Verify the Fix

After reseeding or resuming, verify full replication health:

Verification Commands

# 1. Check all database copy status
Get-MailboxDatabaseCopyStatus * | Format-Table DatabaseName,MailboxServer,Status,CopyQueueLength,ReplayQueueLength,ContentIndexState

# 2. Run full replication health test
Test-ReplicationHealth | Format-Table Server, Check, Result -AutoSize

# 3. Verify no pending failures
Get-MailboxDatabaseCopyStatus * | Where-Object {$_.Status -notmatch "Mounted|Healthy"}

# 4. Test failover capability (optional - causes brief disruption)
# Move-ActiveMailboxDatabase -Identity DB01 -ActivateOnServer EXCH02 -Confirm:$false
# Then move back:
# Move-ActiveMailboxDatabase -Identity DB01 -ActivateOnServer EXCH01 -Confirm:$false

# 5. Check event logs for any new errors
Get-EventLog -LogName Application -Source "MSExchangeRepl" -Newest 20 |
    Where-Object {$_.EntryType -eq "Error"}

✅ Success Indicators:

All copies show Status "Mounted" or "Healthy"
CopyQueueLength = 0 on all passive copies
ReplayQueueLength < 10 on all passive copies
ContentIndexState = "Healthy" on all copies
Test-ReplicationHealth shows all checks "Passed"
Manual failover test succeeds (if performed)

Prevention: Maintain Healthy DAG Replication

1. Monitor Replication Metrics

DAG Replication Monitoring Script

# Set up scheduled monitoring (run every 5 minutes)
$threshold = 10  # Alert if copy queue exceeds this

$badCopies = Get-MailboxDatabaseCopyStatus * | Where-Object {
    $_.CopyQueueLength -gt $threshold -or
    $_.Status -notmatch "Mounted|Healthy" -or
    $_.ContentIndexState -ne "Healthy"
}

if ($badCopies) {
    $body = $badCopies | Format-Table DatabaseName,MailboxServer,Status,CopyQueueLength | Out-String
    Send-MailMessage -To "admin@company.com" -Subject "DAG Replication Alert" `
        -Body $body -SmtpServer "mail.company.com"
}

2. Use Dedicated Replication Network

Configure separate VLAN/subnet for DAG replication traffic
Ensure minimum 1Gbps bandwidth between DAG members
Keep network latency under 1ms for same-site DAG
Enable compression for WAN-based DAG replication

3. Proper Storage Configuration

Use separate physical disks for database and transaction logs
Ensure storage provides consistent IOPS (not burst)
Monitor disk queue length - should be under 20
Plan storage capacity for 25% growth

4. Regular Health Checks

Weekly DAG Health Audit

# Comprehensive DAG health check
Write-Host "=== DAG Health Report ===" -ForegroundColor Cyan

# Check DAG configuration
Get-DatabaseAvailabilityGroup | Format-List Name,WitnessServer,WitnessDirectory

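# Check all DAG members (activation policy is a property of each Mailbox server)
Get-MailboxServer | Where-Object {$_.DatabaseAvailabilityGroup} | Format-Table Name,DatabaseAvailabilityGroup,DatabaseCopyAutoActivationPolicy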

# Run replication health
Test-ReplicationHealth | Format-Table Server,Check,Result

# Check for long replay queues
Get-MailboxDatabaseCopyStatus * | Where-Object {$_.ReplayQueueLength -gt 100}

# Verify cluster health
Get-ClusterNode | Format-Table Name,State

DAG Issues Beyond Reseeding?

Complex DAG failures involving cluster quorum issues, split-brain scenarios, or multi-copy corruption require expert intervention to prevent data loss. Our Exchange high availability specialists can restore your DAG and implement monitoring to prevent recurrence.

Medha Cloud Exchange Server Team

Microsoft Exchange Specialists

Our Exchange Server specialists have 15+ years of combined experience managing enterprise email environments. We provide 24/7 support, emergency troubleshooting, and ongoing administration for businesses worldwide.
