This page last changed on Jul 20, 2009 by mmcgarry.
HQ Health
Topics marked with * relate to HQ Enterprise-only features.
This screen provides real-time diagnostic data for the machine on which the HQ Server is installed. This data is mostly useful for HQ Support to troubleshoot problems with the system and is not intended for customers to use on their own. This screen is available to Administrators only.
This screen displays statistics from HQ's internal caches. For information about modifying cache settings, see Configure HQ Cache Settings for Improved Performance.
Tasks Available on This Screen
On this screen, users can do one thing: print the displayed diagnostic data.
To print the diagnostic data:
- Click Print, located in the center of the page, under the statistics displayed at the top.
Gathering all the data to be printed might take a few moments. When that is done, a simple text version of the statistics is displayed in a browser window, from which you can print.
Sections on this Screen
The top of the screen displays standard health statistics for a server. Below the statistics, the following tabs display more data about the state of the server and database:
Tab |
What Data It Displays |
Diagnostics |
This tab contains the information that HQ prints to log files every 15 minutes. Select one of the following diagnostics from the drop-down list:
- Batch Aggregate AvailabilityInserter*: Status of queue for availability updates to database. Available only if using the Batch Aggregate inserter.
- Batch Aggregate DataInserter*: Status of queue for metric updates to the database. Available only if using the Batch Aggregate inserter.
- Event Tracker Stats: Shows minimum, maximum, and average invocation times, and the number of invocations since the HQ Server last started.
- EhCache: Size and hit ratio of HQ caches.
- Metric Reports Stats: A running average of how fast metrics are being pushed into the database.
- ZEvents: Status of the internal bus.
|
Cache |
The same data as the EhCache diagnostic in the Diagnostics tab, without the "Bytes" column, in a more easily viewable format. The data in this tab can be sorted. |
Load |
Current load on HQ Server, including:
- Metrics collected per minute
- Platforms
- CPUs
- Agents
- Active Agents
- Servers
- Services
- Applications
- Roles
- Users
- Alerts
- Resources
- Resource Types
- Groups
- Escalations
- Active Escalations
|
Database |
- Actions:
- Purge AIQ Data - In HQ 4.1 and later, deletes the contents of the auto-discovery queue. This is useful if the queue contains resources that for some reason cannot be imported. Deleting resources from the queue will cause the agent to re-discover them.
- Purge Stalled Executions - In HQ 4.1 and later, deletes escalations that are stalled for some reason.
- Queries:
- AutoInventory IPs
- AutoInventory Platforms
- AutoInventory Servers
- Orphaned AlertDef Count (See Orphaned Database Rows)
- Orphaned Audit Count (See Orphaned Database Rows)
- Stalled Escalations - In HQ 4.1 and later.
- Orphaned Group Count (See Orphaned Database Rows)
- Postgres Locks - In HQ 4.1 and later.
- Postgres Activity - In HQ 4.1 and later.
- Active But Disabled Resource Alert Defs
- Resource Type Alert Defs with Triggers
- Database Version Information
|
Agents |
For each HQ Agent connected to the HQ Server, the following information is listed:
- FQDN - The fully qualified domain name of the machine the agent runs on; this is the identifier of the monitored platform in HQ.
- Address - The IP address upon which the agent listens for server communications.
- Port - The port on the agent's listen address upon which it listens for server communications. By default, the listen port is 2144.
- Version - HQ version
- Build Version - HQ build
- Creation Time - When the platform where the agent runs was first added to HQ inventory.
- # Platforms - The number of platforms the agent is monitoring. Typically this value is "1", indicating that the agent is monitoring only the platform where it runs. The value is greater if the agent is also monitoring an agentless device, for instance an SNMP device.
- # Metrics - The number of metrics the agent collects. Hyperic recommends balancing the metric collection load across agents. For example, don't use a single agent to monitor every SNMP device in your network: this would constitute a single point of failure, and the metric load might degrade the performance of other services running on the host.
- Time Offset (ms) - The system time offset between HQ Server and HQ Agent. Time synchronization between HQ Server and HQ Agents is very important for determining the availability of platforms and services correctly. Single- or double-digit values are okay. Higher values indicate a problem; in that case, set up NTP daemons on your server and agent hosts. You can monitor the NTP daemons and set an alert on the offset value.
- License count
|
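The guidance for the # Metrics and Time Offset columns in the Agents tab can be turned into a quick check. The following is a minimal sketch, not an HQ API: the `AgentRow` structure, the host names, and the sample values are hypothetical stand-ins for rows read off the Agents tab by hand, and the 99 ms cutoff is an assumption that interprets "single or double digit values" as an absolute offset below 100 ms.

```python
from dataclasses import dataclass

# Assumption: "single or double digit" means an absolute offset below 100 ms.
MAX_OK_OFFSET_MS = 99


@dataclass
class AgentRow:
    """One row copied by hand from the Agents tab (hypothetical structure)."""
    fqdn: str
    metrics: int
    time_offset_ms: int


def agents_with_clock_drift(rows):
    """Return the FQDNs whose time offset has three or more digits."""
    return [r.fqdn for r in rows if abs(r.time_offset_ms) > MAX_OK_OFFSET_MS]


def metric_share(rows):
    """Return each agent's share of the total metric count, as a fraction."""
    total = sum(r.metrics for r in rows)
    if total == 0:
        return {r.fqdn: 0.0 for r in rows}
    return {r.fqdn: r.metrics / total for r in rows}


# Made-up sample values, for illustration only.
rows = [
    AgentRow("web01.example.com", metrics=400, time_offset_ms=12),
    AgentRow("db01.example.com", metrics=1200, time_offset_ms=-3500),
    AgentRow("snmp-proxy.example.com", metrics=6400, time_offset_ms=87),
]

print("Check NTP on:", agents_with_clock_drift(rows))
for fqdn, share in metric_share(rows).items():
    print(f"{fqdn}: {share:.0%} of collected metrics")
```

An agent that carries most of the metric load, like the hypothetical SNMP proxy in this sample, is a candidate for splitting its monitored devices across several agents.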
Orphaned Database Rows
Interrupted database updates can result in orphaned rows in the HQ database. Orphaned rows can cause HQ exceptions. For example:
- If the HQ database contains alert definitions that are no longer associated with a resource, trying to edit a resource type alert can result in a stack trace similar to:
org.hyperic.hq.events.AlertConditionCreateException:
org.hyperic.hq.measurement.MeasurementNotFoundException: No measurement found for 10288 with template 33434
    at org.hyperic.hq.bizapp.server.session.EventsBossEJBImpl.updateAlertDefinition(EventsBossEJBImpl.java:887)
- If the HQ database contains orphaned resources, an exception like this may result:
java.rmi.ServerException: RuntimeException; nested exception is: org.hyperic.hq.common.SystemException: javax.ejb.TransactionRolledbackLocalException: Error updating platform with new AIServer data.; CausedByException is: Error updating platform with new AIServer data.
    at org.jboss.ejb.plugins.LogInterceptor.handleException(LogInterceptor.java:386)
    at org.jboss.ejb.plugins.LogInterceptor.invoke(LogInterceptor.java:196)
    at
The HQ Health page provides orphaned item queries for alert definitions, resources, groups, and audits.
If these queries indicate that there are any orphaned rows in the HQ database, contact Hyperic support for assistance in removing the orphaned rows.
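To make the idea of an orphaned row concrete, here is a minimal sketch using an in-memory SQLite database. The `resource` and `alert_def` tables and their columns are hypothetical and are not the real HQ schema; the sketch only shows how a count of child rows whose parent no longer exists can be computed. It does not reproduce the queries that the HQ Health page runs, and it deletes nothing.

```python
import sqlite3

# Hypothetical two-table schema for illustration only; these are NOT the
# real HQ table or column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resource (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE alert_def (id INTEGER PRIMARY KEY, resource_id INTEGER, name TEXT);
    INSERT INTO resource VALUES (1, 'web01'), (2, 'db01');
    INSERT INTO alert_def VALUES
        (10, 1, 'High CPU'),
        (11, 2, 'Low disk'),
        (12, 3, 'Down');  -- resource 3 does not exist: an orphaned row
""")

# Count alert definitions whose resource_id matches no row in resource.
orphan_count = conn.execute("""
    SELECT COUNT(*)
    FROM alert_def AS a
    LEFT JOIN resource AS r ON r.id = a.resource_id
    WHERE r.id IS NULL
""").fetchone()[0]

print("Orphaned alert definitions:", orphan_count)
```

A non-zero count in this sketch corresponds to the situation in which the orphaned item queries on the HQ Health page report rows.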
Navigating via the Masthead Menu
Main screens in Hyperic contain a masthead menu. Users can navigate to the following parts of HQ using this menu:
Dashboard |
Displays the Dashboard, the starting screen in HQ, which displays information about resource health, recent alerts, recently performed auto-discovery scans, and control actions. |
Resources |
Click the menu name to access Browse (takes users to the Browse Resources screen, where they can locate and navigate through information about all managed resources), Currently Down (takes users to the Currently Down screen, where they can look at all managed resources that are unavailable), Nagios Availability (available only when Nagios is installed), or Recently Viewed (provides a drop-down list of resources that the user has recently viewed; selecting one of those resources takes the user to the Current Health screen for the resource). |
Analyze |
Click the menu name to access Reporting* (presents the Reporting screen, from which reports can be generated), Alert Center (presents the Alert Center, which displays a deployment-wide alert summary), or Event Center (presents the Event Center, which displays a deployment-wide event summary). |
Administration |
Takes users with "Administer Hyperic HQ Server Configuration" permissions to the Administration screen, where they can manage system-wide settings. |
Search |
The Search box in the upper right of the Masthead allows you to search for HQ resources of any type and HQ users.
Search results appear after four characters are entered in the text box. Results will include the first 10 platforms, servers, services, and groups whose name includes the search string.
Double-click a resource in the list to navigate to its Resource page. |
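The search behavior described above (no results until four characters are entered, then at most the first 10 matching names) can be sketched as follows. This is an illustration of the described rules, not HQ's implementation: the function name, the sample resource names, the case-insensitive matching, and the use of input order to decide which matches come "first" are all assumptions.

```python
MIN_QUERY_LENGTH = 4   # results appear after four characters are entered
MAX_RESULTS = 10       # only the first 10 matches are listed


def search_resources(names, query):
    """Return up to MAX_RESULTS names that contain the query string.

    Assumptions: matching ignores case, and "first" means input order.
    """
    if len(query) < MIN_QUERY_LENGTH:
        return []
    needle = query.casefold()
    matches = [name for name in names if needle in name.casefold()]
    return matches[:MAX_RESULTS]


# Hypothetical resource names.
names = [f"web{i:02d}.example.com" for i in range(1, 13)] + ["db01.example.com"]

print(search_resources(names, "web"))           # too short: no results yet
print(len(search_resources(names, "web0")))     # matches web01 through web09
print(len(search_resources(names, "EXAMPLE")))  # capped at MAX_RESULTS
```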
The masthead also displays the two most recently triggered alerts. Users can click an alert's time to go to the alert detail.