Posted by Aashish Patil (patil_aashish AT emc.com) on December 31, 2006
Over the past few years, EMC's software portfolio has been steadily growing. Smarts (System Management Arts) was acquired to provide software and technology that lets you monitor resources such as networks and routers and locate the source of faults. Events generated by these resources are piped to a correlation engine, which displays root cause alarms on its console.
A single failure may generate a large number of events, most of which represent failures that occur as a consequence of the original failure. Without the Smarts solution, one way to analyze the data is to have a human wade through this mass of events and determine the original cause of failure. Another is to define rules that pinpoint the original cause of failure (the root cause). However, writing rules does not scale. Rules need to be written for every possible scenario in an environment and rewritten whenever anything in it changes, which happens quite often in the real world.
This is where Smarts adds some smarts. Smarts performs analysis on all the events received and determines the root cause of failure instead of bombarding the user with all possible events. In addition, Smarts can initially perform a discovery of an environment and automatically create the correlation logic for detecting failures.
Smarts' "magic potion" consists of a Common Information Model (CIM) and a patented Codebook Correlation Technology.
Note: The following discussion uses the word 'domain'. In Smarts terminology, an 'environment' under consideration is called a 'domain' (not to be confused with a network domain such as emc.com), and reporting affected elements is called 'impact analysis'. Examples of domains are an 'IP Network', 'VOIP' or 'an application such as Oracle database or Documentum content server'.
CIM 'models' a domain. For example, in a network, Smarts models the network pipes, routers, switches, ports and so on, including how these elements are interconnected. When Smarts discovers a domain, it maps all the elements of the domain to the model. Thus, Smarts has complete information about a domain mapped to the model of that domain. The discovery process is automatic and does not need human intervention. Not only is the environment discovered but the relations between the different elements are also discovered. This discovery of relationships helps Smarts to detect all affected elements when one of them develops a problem.

Fig - Discovered Network
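To make the idea of a discovered model concrete, here is a minimal sketch of impact analysis over such a model. It is not how Smarts represents a domain internally; the element names, the `DEPENDENTS` mapping and the `impacted_elements` function are all assumptions made up for illustration. The point is only that once the relationships between elements are known, finding everything affected by a failed element becomes a graph traversal.

```python
from collections import deque

# Hypothetical discovered topology (not a real Smarts model):
# each element maps to the elements that directly depend on it.
DEPENDENTS = {
    "router-1": ["switch-1", "switch-2"],
    "switch-1": ["host-a", "host-b"],
    "switch-2": ["host-c"],
    "host-a": [],
    "host-b": [],
    "host-c": [],
}


def impacted_elements(dependents, failed):
    """Return, sorted, every element that depends directly or indirectly on `failed`."""
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    # The failed element is the cause, not part of its own impact.
    seen.discard(failed)
    return sorted(seen)


print(impacted_elements(DEPENDENTS, "switch-1"))  # ['host-a', 'host-b']
```

Because the relationships come from discovery rather than from hand-written rules, a change in the environment only changes the data in the mapping, not the traversal logic.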
The codebook correlation technology is layered over the 'model' and it monitors for problems in the domain. When a set of problems is detected, the algorithm detects the root cause of the problem and notifies the person responsible.
Here is an example I found in one of the internal documents that illustrates the Common Information Model and Codebook Correlation Technology really well.
"To give you an idea of how these technologies work together, picture someone who is sick with bad cough and high fever. At the emergency room, doctors are knowledgeable about the human body, in this case, the 'common information model'. But given the set of symptoms, they don't look at the patient's elbow or ankle, instead they take the person's temperature, listen to his heart and lungs, do blood work, take a chest x-ray. Once they confirm a diagnosis of pneumonia, they can prescribe the appropriate treatment.
This is exactly how the Smarts model and Codebook work together. The model stores information about the devices and the problems that can affect them. The Codebook takes that information and looks for specific patterns of symptoms that indicate a serious problem. When we see this set of symptoms then and only then does Smarts generate a root cause event, which we refer to as an authentic problem.
Interestingly, using the example of the patient with pneumonia, imagine if the doctor could determine how that person became ill in the first place, the first sneeze sending the first germs across a room, and how those germs spread from person to person until our patient became sick. That's what our model and our Codebook can do - trace a problem to its root cause. And because our common information model can map the IT environment end-to-end, we can correlate the impacts of these root cause problems not only within a single technology silo, but across domains to the business level."
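The symptom-pattern idea in the quote can be sketched in a few lines. This is an illustrative toy, not the patented Smarts algorithm: the `CODEBOOK` contents, the `root_cause` function and the `max_mismatches` parameter are hypothetical names chosen for this example. Each candidate problem is associated with the set of symptoms it is expected to cause, and the observed events are compared against every such signature; a root cause is reported only when the best match is close enough.

```python
# Hypothetical codebook (illustrative only): each candidate root cause maps
# to the set of symptoms it is expected to produce in the domain.
CODEBOOK = {
    "router-1 down": {
        "switch-1 unreachable",
        "switch-2 unreachable",
        "host-a unreachable",
        "host-c unreachable",
    },
    "switch-1 down": {"switch-1 unreachable", "host-a unreachable"},
    "switch-2 down": {"switch-2 unreachable", "host-c unreachable"},
}


def root_cause(codebook, observed, max_mismatches=0):
    """Return the problem whose signature is closest to `observed`, or None.

    Distance is the number of symptoms that are expected but missing plus
    the number observed but not expected. On a tie, the first entry wins.
    """
    best, best_distance = None, None
    for problem, signature in codebook.items():
        distance = len(signature ^ observed)  # size of the symmetric difference
        if best_distance is None or distance < best_distance:
            best, best_distance = problem, distance
    if best_distance is not None and best_distance <= max_mismatches:
        return best
    return None


events = {"switch-1 unreachable", "host-a unreachable"}
print(root_cause(CODEBOOK, events))  # switch-1 down
```

Allowing a small `max_mismatches` is one way to tolerate a lost or delayed event: with `max_mismatches=1`, observing only `"host-a unreachable"` would still point to `"switch-1 down"`, whereas the default of `0` reports nothing until the full pattern is seen.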
EMC's acquisition of nLayers (and its integration with Smarts) extends this root cause detection technology from networks to applications. nLayers can automatically discover applications in an enterprise based on known application signatures. For example, you could discover all the Documentum repositories that are deployed in your enterprise (and might even be surprised to find repositories that you did not know existed). Below is a screenshot of one of the views displaying discovered applications. In the interest of keeping this article short and readable, let us leave the detailed discussion of application discovery for another day.

Fig - Application Discovery
Many thanks to Alan Z (Editor, EMC Developer Network) and Gerard Berthet (EMC Developer Programs) for editing the article for accuracy.