Definition: Architecture Decision Record (ADR) from which the ADR Solution Building Blocks (SBBs) for the Application of Service Monitoring Tools Configuration should be specialised.
Source: ISO/IEC/IEEE 42010:2022
Source reference: https://www.iso.org/standard/74393.html
Additional information: The Application of Service Monitoring Tools Configuration concept refers to the process of configuring service monitoring tools to ensure that they can effectively monitor and manage the performance of IT services. This involves setting up the tools to collect data on key performance indicators (KPIs) such as response time, availability, and throughput, and configuring alerts and notifications to ensure that IT teams are alerted to any issues that arise. The goal of this concept is to enable IT teams to proactively identify and address issues before they impact end-users, thereby improving the overall performance and reliability of IT services.
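As a rough illustration of the KPI and alerting configuration described above, the following Python sketch checks one measurement window against a set of thresholds. The class names, field names, and threshold values are all hypothetical and chosen only for the example; a real monitoring tool would express the same idea in its own rule syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KpiThresholds:
    """Hypothetical alerting thresholds; the default values are illustrative only."""
    max_response_time_ms: float = 500.0
    min_availability: float = 0.999   # fraction of successful health checks
    min_throughput_rps: float = 10.0  # requests per second


@dataclass(frozen=True)
class KpiSample:
    """Measurements collected for one service over one time window."""
    response_time_ms: float
    successful_checks: int
    total_checks: int
    requests: int
    window_seconds: float


def evaluate(sample: KpiSample, limits: KpiThresholds) -> list[str]:
    """Return one alert message per KPI that breaches its threshold."""
    # Treat an empty window as fully unavailable / zero throughput
    # instead of dividing by zero.
    availability = (
        sample.successful_checks / sample.total_checks if sample.total_checks else 0.0
    )
    throughput = (
        sample.requests / sample.window_seconds if sample.window_seconds else 0.0
    )

    alerts = []
    if sample.response_time_ms > limits.max_response_time_ms:
        alerts.append(f"response time {sample.response_time_ms:.0f} ms exceeds limit")
    if availability < limits.min_availability:
        alerts.append(f"availability {availability:.4f} below limit")
    if throughput < limits.min_throughput_rps:
        alerts.append(f"throughput {throughput:.1f} req/s below limit")
    return alerts


if __name__ == "__main__":
    degraded = KpiSample(750.0, 995, 1000, 1200, 60.0)
    for message in evaluate(degraded, KpiThresholds()):
        print(message)
```

Keeping the thresholds in a separate object from the measurements means the same check can be reused per service with different limits.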
Example: Network Optimization:
Decision: The GitHub Alertmanager receiver can easily be configured and operated to work with Prometheus alerts. It automatically creates issues in GitHub repositories for any alerts that are firing, making them visible for any user to track.
All communication, updates, and concerns related to an incident can be handled by adding comments to the issues created by the GitHub receiver.
Unlike Option 1, there is no additional cost involved.
There is no requirement to use JIRA or Slack for incident tracking, which are the only supported options in some of the tools listed in Option 2 (such as Dispatch and Response). Should such a requirement arise, we can use GitHub bots for different platforms, such as GitHub for Slack and Google Chat, to notify us of the issues immediately.
It is actively maintained and supported, unlike some of the tools in Option 1 (such as Cabot and OpenDuty), which lack community support.
Rationale: As we have multiple services and applications deployed and monitored in the Operate First environment (e.g. JupyterHub, Argo, Superset, Observatorium, Project Thoth, AICoE CI pipelines), we need an incident reporting setup for handling outages and incidents related to these services.
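To make the alert-to-issue flow in the example more concrete, the sketch below maps an Alertmanager-style webhook payload to issue drafts. It is not the implementation of the GitHub receiver named in the decision: the function name is hypothetical, the input keys (`alerts`, `status`, `labels`, `annotations`, `startsAt`) are assumed to follow the Alertmanager webhook payload format, and the output keys (`title`, `body`, `labels`) are assumed to match what GitHub's issue-creation endpoint accepts. No network call is made.

```python
def firing_alerts_to_issues(payload: dict) -> list[dict]:
    """Build one issue draft per firing alert in an Alertmanager-style payload.

    Hypothetical helper for illustration only; resolved alerts are skipped.
    """
    issues = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        name = labels.get("alertname", "unknown alert")
        severity = labels.get("severity", "unknown")
        body_lines = [
            f"Severity: {severity}",
            f"Started at: {alert.get('startsAt', 'unknown')}",
            f"Summary: {annotations.get('summary', 'n/a')}",
        ]
        issues.append(
            {
                "title": f"[{severity}] {name}",
                "body": "\n".join(body_lines),
                "labels": ["incident", severity],
            }
        )
    return issues


if __name__ == "__main__":
    # Made-up sample payload, not taken from a real system.
    sample = {
        "alerts": [
            {
                "status": "firing",
                "labels": {"alertname": "ServiceDown", "severity": "critical"},
                "annotations": {"summary": "Health check failing"},
                "startsAt": "2023-05-15T18:09:47Z",
            },
            {"status": "resolved", "labels": {"alertname": "HighLatency"}},
        ]
    }
    for issue in firing_alerts_to_issues(sample):
        print(issue["title"])
```

Follow-up discussion of an incident would then happen as comments on the created issue, as the decision describes.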
LOST view: Digital Solution Architecture Decisions Catalogue view
Identifier: http://data.europa.eu/dr8/egovera/ApplicationOfServiceMonitoringToolsConfigurationGoal
EIRA traceability: eira:DigitalSolutionArchitectureDecisionGoal
ABB name: egovera:ApplicationOfServiceMonitoringToolsConfigurationGoal
EIRA concept: eira:ArchitectureBuildingBlock
Last modification: 2023-06-15
dct:identifier: ADR-20230515180947662
dct:title: Architecture Decision Record about Application of Service Monitoring Tools Configuration
eira:adr_context: The context explains why we need to make a decision. It also describes the alternatives along with the pros and cons.
eira:adr_decision: The decision describes the justification for why the particular solution was accepted. It has more emphasis on the why rather than the how.
eira:adr_status: [Proposed (under review)|Accepted (approved and ready for implementation)|Superseded (superseded by another decision)]
eira:adr_consecuences: The consequences section contains information about the overall impact of an architectural decision. Every decision has trade-offs. That’s why it’s crucial to include the analysis to provide a clear picture.
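The ADR attributes listed above (context, decision, status, consequences) can be modelled as a small record type. The following is a minimal sketch, assuming Python class and field names that are purely illustrative and not part of EIRA; only the three status values come from the attribute description.

```python
from dataclasses import dataclass
from enum import Enum


class AdrStatus(Enum):
    """The three status values listed for eira:adr_status."""
    PROPOSED = "Proposed"      # under review
    ACCEPTED = "Accepted"      # approved and ready for implementation
    SUPERSEDED = "Superseded"  # superseded by another decision


@dataclass
class ArchitectureDecisionRecord:
    """Hypothetical in-memory form of an ADR; field names are illustrative."""
    identifier: str
    title: str
    context: str
    decision: str
    status: AdrStatus
    consequences: str

    def missing_sections(self) -> list[str]:
        """Return the names of free-text sections that are still empty."""
        sections = ("context", "decision", "consequences")
        return [name for name in sections if not getattr(self, name).strip()]


if __name__ == "__main__":
    adr = ArchitectureDecisionRecord(
        identifier="ADR-20230515180947662",
        title="Application of Service Monitoring Tools Configuration",
        context="Several services need incident reporting.",
        decision="Use the GitHub Alertmanager receiver.",
        status=AdrStatus.PROPOSED,
        consequences="",
    )
    print(adr.missing_sections())
```

A check such as `missing_sections` can help ensure that the trade-off analysis is written before a record moves from Proposed to Accepted.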
dct:modified | 2024-01-28 |
dct:identifier | ADR-20230515180947662 |
dct:title | Architecture Decision Record about Application of Service Monitoring Tools Configuration |
skos:example | Network Optimization:
Decision: The GitHub Alertmanager receiver can easily be configured and operated to work with Prometheus alerts. It automatically creates issues in GitHub repositories for any alerts that are firing, making them visible for any user to track.
All communication, updates, and concerns related to an incident can be handled by adding comments to the issues created by the GitHub receiver.
Unlike Option 1, there is no additional cost involved.
There is no requirement to use JIRA or Slack for incident tracking, which are the only supported options in some of the tools listed in Option 2 (such as Dispatch and Response). Should such a requirement arise, we can use GitHub bots for different platforms, such as GitHub for Slack and Google Chat, to notify us of the issues immediately.
It is actively maintained and supported, unlike some of the tools in Option 1 (such as Cabot and OpenDuty), which lack community support.
Rationale: As we have multiple services and applications deployed and monitored in the Operate First environment (e.g. JupyterHub, Argo, Superset, Observatorium, Project Thoth, AICoE CI pipelines), we need an incident reporting setup for handling outages and incidents related to these services. |
eira:adr_context | The context explains why we need to make a decision. It also describes the alternatives along with the pros and cons. |
eira:adr_decision | The decision describes the justification for why the particular solution was accepted. It has more emphasis on the why rather than the how. |
eira:adr_status | [Proposed (under review)|Accepted (approved and ready for implementation)|Superseded (superseded by another decision)] |
eira:adr_consecuences | The consequences section contains information about the overall impact of an architectural decision. Every decision has trade-offs. That’s why it’s crucial to include the analysis to provide a clear picture. |
eira:concept | eira:ArchitectureBuildingBlock |
eira:definitionSource | ISO/IEC/IEEE 42010:2022 |
eira:definitionSourceReference | https://www.iso.org/standard/74393.html |
skos:note | The Application of Service Monitoring Tools Configuration concept refers to the process of configuring service monitoring tools to ensure that they can effectively monitor and manage the performance of IT services. This involves setting up the tools to collect data on key performance indicators (KPIs) such as response time, availability, and throughput, and configuring alerts and notifications to ensure that IT teams are alerted to any issues that arise. The goal of this concept is to enable IT teams to proactively identify and address issues before they impact end-users, thereby improving the overall performance and reliability of IT services. |
eira:PURI | http://data.europa.eu/dr8/ApplicationOfServiceMonitoringToolsConfigurationGoal |
dct:type | eira:ApplicationOfServiceMonitoringToolsConfigurationGoal |
skos:definition | Architecture Decision Record (ADR) from which the ADR Solution Building Blocks (SBBs) for the Application of Service Monitoring Tools Configuration should be specialised. |
eira:view | Digital Solution Architecture Decisions Catalogue view |
eira:eifLayer | N/A |
skos:broader | http://data.europa.eu/dr8/DigitalSolutionArchitectureDecisionGoal |