Application of Service Monitoring Tools Configuration

Definition: Architecture Decision Record from which the ADR SBBs regarding the Application of Service Monitoring Tools Configuration should be specialised

Source: ISO/IEC/IEEE 42010:2022

Source reference: https://www.iso.org/standard/74393.html

Additional information: The Application of Service Monitoring Tools Configuration concept refers to the process of configuring service monitoring tools so that they can effectively monitor and manage the performance of IT services. This involves setting up the tools to collect data on key performance indicators (KPIs) such as response time, availability and throughput, and configuring alerts and notifications so that IT teams are informed of any issues as they arise. The goal of this concept is to enable IT teams to proactively identify and address issues before they affect end users, thereby improving the overall performance and reliability of IT services.
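
The following Python sketch illustrates the KPI collection and alerting steps described above. It assumes hypothetical health-check results (`Probe`) and hypothetical alert rules (at least 99 % availability, at most 500 ms average response time); it is not the configuration format of any particular monitoring tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Probe:
    """One hypothetical health-check result for a monitored service."""
    ok: bool            # True if the service answered successfully
    latency_ms: float   # response time of the request in milliseconds


def compute_kpis(probes: list[Probe], window_s: float) -> dict[str, float]:
    """Derive availability, average response time and throughput over one window."""
    successes = [p for p in probes if p.ok]
    availability = len(successes) / len(probes) if probes else 0.0
    avg_latency = mean(p.latency_ms for p in successes) if successes else 0.0
    throughput = len(successes) / window_s  # successful requests per second
    return {
        "availability": availability,
        "avg_latency_ms": avg_latency,
        "throughput_rps": throughput,
    }


# Hypothetical alert rules: (KPI name, comparison operator, threshold)
ALERT_RULES = [
    ("availability", "<", 0.99),
    ("avg_latency_ms", ">", 500.0),
]


def evaluate_alerts(kpis: dict[str, float]) -> list[str]:
    """Return one notification message per KPI that breaches its threshold."""
    alerts = []
    for name, op, limit in ALERT_RULES:
        value = kpis[name]
        breached = value < limit if op == "<" else value > limit
        if breached:
            alerts.append(f"ALERT {name}={value:.3f} (threshold {op} {limit})")
    return alerts


probes = [Probe(True, 120.0), Probe(True, 180.0), Probe(False, 0.0), Probe(True, 900.0)]
kpis = compute_kpis(probes, window_s=60.0)
for message in evaluate_alerts(kpis):
    print(message)
```

Here three of the four probes succeed, so availability is 0.75 and only the availability rule fires; the average response time of the successful probes (400 ms) stays below the 500 ms limit.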

Example: Network Optimization:
Decision: The GitHub alertmanager receiver can easily be configured and operated to work with Prometheus alerts. It automatically creates issues in GitHub repositories for every active (firing) alert, making the alerts visible to any user who wants to track them.
All communication, updates and concerns related to an incident can be handled by adding comments to the issues created by the GitHub receiver.
Unlike Option 1, it involves no additional cost.
It does not require JIRA or Slack for incident tracking, which are the only supported options for some of the tools listed in Option 2 (such as Dispatch and Response). If such a requirement arises, we can use GitHub bots for other platforms, such as GitHub for Slack and Google Chat, to notify us of new issues immediately.
It is actively maintained and supported, unlike some of the tools in Option 1 (such as Cabot and OpenDuty), which lack community support.
Rationale: Because multiple services and applications are deployed and monitored in the Operate First environment (e.g. JupyterHub, Argo, Superset, Observatorium, Project Thoth and the AICoE CI pipelines), we need an incident reporting setup for handling outages and incidents related to these services.
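
A minimal Python sketch of the behaviour attributed to the GitHub alertmanager receiver in the decision above: a firing alert opens an issue, and a resolved alert closes it. It assumes the JSON body that Alertmanager posts to webhook receivers (an `alerts` list whose items carry `status`, `labels`, `annotations` and `startsAt`). The function names, the issue title scheme and the sample `namespace` labels are hypothetical, and the sketch only plans actions instead of calling the GitHub API; it is not the receiver's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class IssueAction:
    """What a hypothetical receiver would do on the GitHub side."""
    action: str  # "open" or "close"
    title: str
    body: str


def issue_title(alert: dict) -> str:
    """Build a stable title so repeated notifications map to the same issue."""
    labels = alert.get("labels", {})
    return f"{labels.get('alertname', 'UnknownAlert')} ({labels.get('namespace', 'n/a')})"


def plan_issue_actions(payload: dict, open_titles: set[str]) -> list[IssueAction]:
    """Map an Alertmanager webhook payload to open/close actions on GitHub issues.

    open_titles holds the titles of issues that are already open, so a firing
    alert is reported only once and a resolved alert closes its issue.
    """
    actions = []
    for alert in payload.get("alerts", []):
        title = issue_title(alert)
        summary = alert.get("annotations", {}).get("summary", "")
        if alert.get("status") == "firing" and title not in open_titles:
            body = f"{summary}\n\nStarted at: {alert.get('startsAt', 'unknown')}"
            actions.append(IssueAction("open", title, body))
        elif alert.get("status") == "resolved" and title in open_titles:
            actions.append(IssueAction("close", title, f"Resolved: {summary}"))
    return actions


# Hypothetical payload: one new firing alert and one alert that has just resolved.
payload = {
    "status": "firing",
    "alerts": [
        {"status": "firing",
         "labels": {"alertname": "JupyterHubDown", "namespace": "opf-jupyterhub"},
         "annotations": {"summary": "JupyterHub is unreachable"},
         "startsAt": "2023-05-15T18:09:47Z"},
        {"status": "resolved",
         "labels": {"alertname": "ArgoSyncFailed", "namespace": "argocd"},
         "annotations": {"summary": "Argo CD sync failing"}},
    ],
}
actions = plan_issue_actions(payload, open_titles={"ArgoSyncFailed (argocd)"})
for a in actions:
    print(a.action, a.title)
```

Deriving the issue title deterministically from the alert labels is one simple way to avoid opening duplicate issues when Alertmanager re-sends the same firing alert.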

LOST view: Digital Solution Architecture Decisions Catalogue view

Identifier: http://data.europa.eu/dr8/egovera/ApplicationOfServiceMonitoringToolsConfigurationGoal

EIRA traceability: eira:DigitalSolutionArchitectureDecisionGoal

ABB name: egovera:ApplicationOfServiceMonitoringToolsConfigurationGoal

EIRA concept: eira:ArchitectureBuildingBlock

Last modification: 2023-06-15

dct:identifier: ADR-20230515180947662

dct:title: Architecture Decision Record about Application of Service Monitoring Tools Configuration

eira:adr_context: The context explains why we need to make a decision. It also describes the alternatives along with the pros and cons.

eira:adr_decision: The decision describes the justification for why the particular solution was accepted. It emphasises the why rather than the how.

eira:adr_status: [Proposed (under review)|Accepted (approved and ready for implementation)|Superseded (superseded by another decision)]

eira:adr_consecuences: The consequences section contains information about the overall impact of an architectural decision. Every decision has trade-offs, which is why it is crucial to include this analysis to provide a clear picture.
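
The ADR fields above can be captured in a small data structure. The following Python sketch is hypothetical: the class and attribute names are illustrative, the comments map them to the properties used in this record, and the allowed status transitions (Proposed to Accepted, Accepted to Superseded) are an assumption rather than part of the EIRA definition.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AdrStatus(Enum):
    PROPOSED = "Proposed"      # under review
    ACCEPTED = "Accepted"      # approved and ready for implementation
    SUPERSEDED = "Superseded"  # superseded by another decision


# Assumed lifecycle: a proposal is accepted; an accepted decision may later be superseded.
ALLOWED = {
    AdrStatus.PROPOSED: {AdrStatus.ACCEPTED},
    AdrStatus.ACCEPTED: {AdrStatus.SUPERSEDED},
    AdrStatus.SUPERSEDED: set(),
}


@dataclass
class ArchitectureDecisionRecord:
    identifier: str    # dct:identifier
    title: str         # dct:title
    context: str       # eira:adr_context
    decision: str      # eira:adr_decision
    consequences: str  # eira:adr_consecuences
    status: AdrStatus = AdrStatus.PROPOSED  # eira:adr_status
    superseded_by: Optional[str] = None     # hypothetical link to the replacing ADR

    def transition(self, new_status: AdrStatus, superseded_by: Optional[str] = None) -> None:
        """Move to a new status, rejecting transitions outside the assumed lifecycle."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot go from {self.status.value} to {new_status.value}")
        if new_status is AdrStatus.SUPERSEDED and not superseded_by:
            raise ValueError("a superseded ADR must name the decision that replaces it")
        self.status = new_status
        self.superseded_by = superseded_by


adr = ArchitectureDecisionRecord(
    identifier="ADR-20230515180947662",
    title="Architecture Decision Record about Application of Service Monitoring Tools Configuration",
    context="Incident reporting is needed for services in the Operate First environment.",
    decision="Use the GitHub alertmanager receiver for Prometheus alerts.",
    consequences="No additional cost; incidents are tracked as GitHub issues.",
)
adr.transition(AdrStatus.ACCEPTED)
print(adr.status.value)
```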


dct:modified	2024-01-28
dct:identifier	ADR-20230515180947662
dct:title	Architecture Decision Record about Application of Service Monitoring Tools Configuration
skos:example	Network Optimization: Decision: The GitHub alertmanager receiver can easily be configured and operated to work with Prometheus alerts. It automatically creates issues in GitHub repositories for every active (firing) alert, making the alerts visible to any user who wants to track them. All communication, updates and concerns related to an incident can be handled by adding comments to the issues created by the GitHub receiver. Unlike Option 1, it involves no additional cost. It does not require JIRA or Slack for incident tracking, which are the only supported options for some of the tools listed in Option 2 (such as Dispatch and Response). If such a requirement arises, we can use GitHub bots for other platforms, such as GitHub for Slack and Google Chat, to notify us of new issues immediately. It is actively maintained and supported, unlike some of the tools in Option 1 (such as Cabot and OpenDuty), which lack community support. Rationale: Because multiple services and applications are deployed and monitored in the Operate First environment (e.g. JupyterHub, Argo, Superset, Observatorium, Project Thoth and the AICoE CI pipelines), we need an incident reporting setup for handling outages and incidents related to these services.
eira:adr_context	The context explains why we need to make a decision. It also describes the alternatives along with the pros and cons.
eira:adr_decision	The decision describes the justification for why the particular solution was accepted. It emphasises the why rather than the how.
eira:adr_status	[Proposed (under review)|Accepted (approved and ready for implementation)|Superseded (superseded by another decision)]
eira:adr_consecuences	The consequences section contains information about the overall impact of an architectural decision. Every decision has trade-offs, which is why it is crucial to include this analysis to provide a clear picture.
eira:concept	eira:ArchitectureBuildingBlock
eira:definitionSource	ISO/IEC/IEEE 42010:2022
eira:definitionSourceReference	https://www.iso.org/standard/74393.html
skos:note	The Application of Service Monitoring Tools Configuration concept refers to the process of configuring service monitoring tools to ensure that they can effectively monitor and manage the performance of IT services. This involves setting up the tools to collect data on key performance indicators (KPIs) such as response time, availability, and throughput, and configuring alerts and notifications to ensure that IT teams are alerted to any issues that arise. The goal of this concept is to enable IT teams to proactively identify and address issues before they impact end-users, thereby improving the overall performance and reliability of IT services.
eira:PURI	http://data.europa.eu/dr8/ApplicationOfServiceMonitoringToolsConfigurationGoal
dct:type	eira:ApplicationOfServiceMonitoringToolsConfigurationGoal
skos:definition	Architecture Decision Record from where you should specialise the ADR SBBs regarding the Application of Service Monitoring Tools Configuration
eira:view	Digital Solution Architecture Decisions Catalogue view
eira:eifLayer	N/A
skos:broader	http://data.europa.eu/dr8/DigitalSolutionArchitectureDecisionGoal



Digital Solution Architecture Decisions Catalogue viewpoint



		Digital Solution High Availability and Resilience Architecture Decision Catalogue	Application of Service Monitoring Tools Configuration