Snippet Name: SQL_Server_Health_Check
 
 
 Tags: 
 
 
 Created Date: 2025-06-16 13:12:44
Last Modified Date: 2025-05-31 10:32:01
 
 
 
 
 
 Dependencies: 
 
 
  
 
 DBA Notes: 
 
 
 Sample Output: 
 
 
 
Title: SQL Server Health Check
Author: user
I am doing a SQL Server health check help me with teh categories of things to review. for example, I have Performance, Server Configuration, SSRS, SSIS among others.
Author: ChatGPT
For a **comprehensive SQL Server health check**, here are the key categories along with specific items to review within each:
---
### 1. **Performance**
   - **Server Performance**  
     - CPU usage and bottlenecks.  
     - Memory allocation (Page Life Expectancy, Buffer Cache Hit Ratio).  
     - Disk I/O metrics (Disk Queue Length, Read/Write Latency).  
     - Query Execution Plans (Identify slow queries, missing indexes).  
   - **Database Performance**  
     - Index fragmentation and maintenance.  
     - Statistics up-to-date status.  
     - Wait stats analysis (e.g., `sys.dm_os_wait_stats`).  
     - TempDB contention and configuration (files, growth settings).  
   - **Workload Monitoring**  
     - Query execution time and duration.  
     - Long-running transactions.  
     - Blocking and deadlocks.  
---
### 2. **Server Configuration**
   - **Instance Settings**  
     - Max Server Memory (`sp_configure`).  
     - Max Degree of Parallelism (MAXDOP) and Cost Threshold for Parallelism.  
     - Default fill factor and autogrowth settings.  
     - Priority Boost (should be disabled).  
     - CLR Integration and XP_CMDSHELL settings.  
   - **Service Accounts**  
     - Verify SQL Server and Agent service accounts (principle of least privilege).  
   - **Security**  
     - Authentication Mode (Windows vs. Mixed).  
     - Unused logins and orphaned users.  
     - SA account status (disable if not used).  
---
### 3. **Database Integrity and Availability**
   - **DBCC CHECKDB**  
     - Verify database integrity regularly.  
   - **Backups**  
     - Review recovery model (Full, Simple, or Bulk-Logged).  
     - Verify full, differential, and transaction log backups are running.  
     - Check backup history and last backup times.  
   - **High Availability/Disaster Recovery (HA/DR)**  
     - Status of AlwaysOn Availability Groups or Log Shipping.  
     - Failover Cluster or Database Mirroring configurations.  
     - Verify database replicas are in sync.  
     - RPO (Recovery Point Objective) and RTO (Recovery Time Objective) alignment.  
---
### 4. **SQL Server Agent**
   - **Job Management**  
     - Review enabled/disabled jobs and schedules.  
     - Check for job failures (last run status).  
   - **Alerts and Operators**  
     - Review configured alerts (e.g., severity level errors).  
     - Ensure notifications for failures are active.  
---
### 5. **SSRS (SQL Server Reporting Services)**
   - **SSRS Configuration**  
     - Verify `RSReportServer.config` settings.  
     - URL reservations for Report Manager and Web Service.  
   - **Performance and Logs**  
     - Analyze SSRS Execution Log for slow reports.  
     - Monitor report subscriptions and data refresh schedules.  
   - **Security**  
     - Review permissions (folder, data source access).  
     - Ensure service accounts are configured securely.  
---
### 6. **SSIS (SQL Server Integration Services)**
   - **Package Validation**  
     - Check for package failures and error logs.  
     - Ensure SSISDB is healthy (if using the SSIS Catalog).  
   - **Configuration**  
     - Package execution schedules and resource usage.  
     - Review deployment methods and version control.  
   - **Performance**  
     - Monitor SSIS packages for bottlenecks (source/destination delays, memory issues).  
---
### 7. **Security**
   - **Logins and Users**  
     - Review logins, roles, and users with elevated privileges.  
     - Remove unused or orphaned logins.  
   - **Encryption**  
     - Transparent Data Encryption (TDE) enabled where required.  
     - Column-level encryption if sensitive data exists.  
   - **Auditing**  
     - Review SQL Server Audit logs for critical events.  
     - Check for unauthorized login attempts.  
---
### 8. **Index and Storage Management**
   - **Index Health**  
     - Analyze index fragmentation (use `sys.dm_db_index_physical_stats`).  
     - Identify unused indexes and missing indexes.  
   - **Storage and File Sizes**  
     - Check database file sizes and autogrowth settings.  
     - Monitor space usage in database and log files.  
   - **TempDB**  
     - Multiple data files with uniform growth.  
     - Review TempDB usage and contention.
---
### 9. **Monitoring and Logs**
   - **SQL Server Logs**  
     - Check for errors, warnings, and startup issues.  
   - **Extended Events**  
     - Set up event sessions to capture slow queries, deadlocks, etc.  
   - **System Health Session**  
     - Analyze built-in `system_health` Extended Events for anomalies.  
---
### 10. **Patching and Version Management**
   - **SQL Server Version**  
     - Check for latest Cumulative Updates (CUs) and Service Packs.  
     - Validate SQL Server version compatibility with applications.  
   - **Windows Updates**  
     - Ensure the OS hosting SQL Server is up-to-date.  
---
### 11. **Licensing and Compliance**
   - **License Usage**  
     - Check licensing mode (e.g., Core-based, CAL-based).  
   - **Compliance**  
     - Ensure GDPR, HIPAA, or other regulatory compliance is met.
---
### 12. **Extended Services**
   - **Linked Servers**  
     - Validate linked server connections and credentials.  
   - **Replication**  
     - Check status and health of replication (Transactional, Snapshot, or Merge).  
   - **Full-Text Search**  
     - Ensure catalogs are up-to-date and queries are efficient.
---
If you want **queries, scripts, or checklists** for any of these categories, let me know!
Author: user
can you give me a query related to Page Life Expectancy
Author: ChatGPT
Here is a **query** to check the **Page Life Expectancy (PLE)** for each buffer node in SQL Server. Page Life Expectancy is a critical performance metric indicating how long pages stay in memory before being flushed to disk. A low PLE can signal memory pressure.
### Query for Page Life Expectancy
```sql
SELECT 
    RTRIM(instance_name) AS [NUMA Node],
    cntr_value AS [Page Life Expectancy (Seconds)]
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
    AND OBJECT_NAME LIKE '%Buffer Manager%';
```
---
### **How to Interpret Results**
- **Threshold**: The general rule of thumb is **PLE > 300 seconds** (5 minutes) per NUMA node.
- **Low PLE**: If PLE drops significantly below 300, it often indicates memory pressure due to heavy read operations or insufficient memory.
---
### **Additional Notes**
- If your server has multiple **NUMA nodes**, this query will return a PLE value for each node.
- Modern systems may require a higher threshold depending on memory size. For example:
  - Servers with **64 GB RAM or more** might aim for **PLE > 1,200 seconds** (20 minutes).
Author: user
how about a query for Buffer Cache Hit Ratio
Author: ChatGPT
Here???s a **query to check the Buffer Cache Hit Ratio** in SQL Server. This metric represents the percentage of pages read from the buffer cache instead of disk. A high ratio (close to 100%) indicates good memory performance.
### **Query for Buffer Cache Hit Ratio**
```sql
SELECT 
    (1.0 - (a.cntr_value * 1.0 / b.cntr_value)) * 100 AS [Buffer Cache Hit Ratio (%)]
FROM 
    sys.dm_os_performance_counters a
JOIN 
    sys.dm_os_performance_counters b 
    ON a.object_name = b.object_name
WHERE 
    a.counter_name = 'Cache Pages' 
    AND b.counter_name = 'Page Reads/sec'
    AND a.object_name LIKE '%Buffer Manager%';
```
---
### **How to Interpret Results**
- **Ideal Value**: Close to **100%**. A value above **95%** is considered healthy.
- **Low Ratio**: If below **90%**, it may indicate:
   - Insufficient memory for the workload.
   - High I/O due to inefficient queries or index fragmentation.
   - A need to review indexing and query performance.
---
### **Notes**
- If your system has **NUMA nodes**, this ratio should ideally hold true for each node.
- You can cross-reference this with **Page Life Expectancy** to confirm memory pressure.
Let me know if you need scripts for monitoring other SQL Server performance metrics! ????