We will discuss one of the most important aspects of a Microservices system: logging and monitoring. Without proper logging and monitoring we are going to run into a lot of problems with our system, and it might even lead to its failure.
As I said earlier, this is even more important in a Microservices project than in a monolith. The reason is that in Microservices, a flow goes through multiple processes, so it is difficult to get a holistic view of the system or of an entire flow. In a traditional monolith we can usually examine the logs and see what went wrong or what is happening. With Microservices it is difficult to stitch things together, as a single flow involves many processes.
We can look at a specific service and understand whether there are any problems with it, but it is very difficult to figure out what is going on with all the services working together. All of these problems are handled by well-designed logging and monitoring.
Logging vs Monitoring
These two terms are often used interchangeably, so what is the difference between them?
- Recording the system’s activity -We document what the system did, what the users did, how the system behaved and so on. This is useful for analyzing the system’s behavior and making sure everything is working as expected.
- Auditing -We can see the users’ behavior and what they did.
- Documenting errors -Logging is the best way to record everything related to errors happening in the system.
- Based on system’s metrics -Monitoring tools look at the various metrics of the system, from infrastructure-related metrics such as CPU, RAM and disk usage, to application-related metrics such as requests per minute or orders per day. These metrics are then available to users via sophisticated dashboards.
- Alerting -We can define alerts that are raised when a specific metric goes outside its normal range. For example, when CPU or memory usage goes high, we can configure the monitoring tools to send alerts to a predefined group of people so they can handle the problem.
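The alerting idea above can be sketched as a simple threshold check. This is a minimal illustration, not any particular monitoring product; the names (`AlertRule`, `check_metrics`, the metric names) are all hypothetical:

```python
# Minimal sketch of threshold-based alerting: a rule fires when its
# metric goes outside the normal range (here: above a threshold).
from dataclasses import dataclass

@dataclass
class AlertRule:
    metric: str          # name of the metric to watch
    threshold: float     # alert fires when the metric exceeds this value
    recipients: list     # predefined group of people to notify

def check_metrics(metrics: dict, rules: list) -> list:
    """Return (rule, value) pairs for every rule whose metric fired."""
    fired = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is not None and value > rule.threshold:
            fired.append((rule, value))
    return fired

rules = [
    AlertRule("cpu_percent", 90.0, ["oncall@example.com"]),
    AlertRule("memory_percent", 85.0, ["oncall@example.com"]),
]
fired = check_metrics({"cpu_percent": 97.2, "memory_percent": 60.0}, rules)
for rule, value in fired:
    print(f"ALERT: {rule.metric}={value} exceeds {rule.threshold}")
```

A real monitoring tool would evaluate such rules continuously and send notifications; the check itself is this simple comparison at its core.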
As you can see, both should be implemented in order to make sure our Microservices system is reliable and stable.
As I said, logging should provide a holistic view of the system. It should allow tracing an end-to-end flow, meaning from the service where the flow originated to the service where it ended, with all the services in between included. Logging should contain as much information as possible, so make sure to log whatever information you can think of, even things that at first look might seem useless.
For example, we definitely want to log at least the timestamp, severity level, message, machine ID and maybe the IP address. You should then be able to filter the logs based on these values.
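As a sketch, assuming Python's standard `logging` module, a custom formatter can emit each record as one JSON object carrying these fields, so they stay easy to filter on. The exact field names here are illustrative, not a standard:

```python
import json
import logging
import socket
import sys
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON object, so every record
    carries the timestamp, severity, message and machine ID fields."""
    def format(self, record):
        return json.dumps({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "severity": record.levelname,
            "message": record.getMessage(),
            "machine_id": socket.gethostname(),
            "service": record.name,
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("orders-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order created")   # emits one JSON line with all fields
```

Because every record is self-describing JSON, a central log store can later index and filter on any of these fields.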
Logging in a Microservices architecture should work quite differently from traditional logging.
In traditional logging, if we have, let’s say, two services, then each one of them has its own logging infrastructure. Each service uses its own logging library and logs to its own repository, maybe a file or a database. This is the easiest way, and when our application is composed of a single process there is no problem with such an implementation.
However, with Microservices, implementing logging like that causes some problems. Since we want to look at the logs and trace an end-to-end flow, separate repositories are a problem. Using different logging libraries also means we might end up with different logging formats, for example JSON in one file and plain text in another. In addition, we cannot easily aggregate the logs to answer questions such as how many errors per day or how many DB accesses per day.
In order to overcome such issues, we have to keep the log data in a single, central place where we can run queries on it and analyze it.
Below is an example implementation.
In this way, the logs are stored in a single format, in a single place, and can be queried and analyzed.
How exactly can this approach be implemented? We have to decide on three main parts: the logging library, the log transport and the central logging service.
The recommended approach is to use a single logging library for all the services. You can pick any library you want, as long as it is flexible enough and suits the development platform.
The log transport carries the logs from each service to the central logging service, so it has quite an important role. Usually we prefer using a queue for this transport: a queue balances the load and avoids a performance hit on the client side, since the service does not wait for the logs to be delivered.
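A minimal sketch of that decoupling, using Python's in-process `queue.Queue` to stand in for a real message queue (the function names and the list standing in for the central store are hypothetical):

```python
# Queue-based log shipping sketch: the service only enqueues and returns
# immediately; a background shipper drains the queue, so slow delivery
# never blocks request handling on the client side.
import queue
import threading

log_queue = queue.Queue()
central_store = []          # stands in for the central logging service

def emit(service, message):
    """Called on the hot path -- just enqueue, no network wait."""
    log_queue.put({"service": service, "message": message})

def shipper():
    """Background worker that forwards records to the central store."""
    while True:
        record = log_queue.get()
        if record is None:  # shutdown sentinel
            break
        central_store.append(record)

worker = threading.Thread(target=shipper)
worker.start()

emit("orders-service", "order created")
emit("billing-service", "invoice issued")

log_queue.put(None)         # signal shutdown and wait for the drain
worker.join()
print(len(central_store))   # 2
```

In production the queue would be an external broker, so the buffering also survives bursts of load and short outages of the logging service.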
This is the central component that receives and stores the log records from the other services in the system. This service is preferably based on an existing indexing and search product. These products are designed from the ground up to handle huge amounts of data in various formats, and to make this data accessible using easy-to-use visualization tools and a query language.
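To show the kind of aggregation a central store enables, such as "how many errors per day", here is a sketch where a plain Python list stands in for the indexing/search product and the records are hypothetical:

```python
# Aggregating log records held in one central place.
# A real deployment would run this as a query against an
# indexing/search product instead of a list comprehension.
from collections import Counter

central_store = [
    {"timestamp": "2024-05-01T10:00:00Z", "severity": "ERROR"},
    {"timestamp": "2024-05-01T11:30:00Z", "severity": "INFO"},
    {"timestamp": "2024-05-01T12:00:00Z", "severity": "ERROR"},
    {"timestamp": "2024-05-02T09:15:00Z", "severity": "ERROR"},
]

# Count ERROR records per day by truncating the timestamp to its date.
errors_per_day = Counter(
    r["timestamp"][:10] for r in central_store if r["severity"] == "ERROR"
)
print(dict(errors_per_day))  # {'2024-05-01': 2, '2024-05-02': 1}
```

With the logs scattered across per-service files in different formats, this one-liner would instead require collecting and parsing every file first, which is exactly the problem the central store solves.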
Monitoring looks at metrics and detects anomalies. It also provides a simplified view of the system’s status, usually via dashboards that present the important metrics in an easy-to-grasp format.
In addition, we can set alerts to trigger when there is a problem, so the relevant people can be notified. There are two types of monitoring,
Here we monitor the underlying servers. This means we usually monitor things such as CPU, RAM, disk and network, and any other infrastructure-related metrics. We can then get alerts when a problem is detected in the infrastructure, such as high CPU or high disk I/O.
Here we monitor the application itself. Application monitoring looks at metrics that are published by the application, such as requests per minute or orders per day. We can set alerts to trigger when a problem is detected in the application itself, such as a bug in the code or a slowdown in response time. The data source for application monitoring is usually the logs generated by the application; it can be the log database itself or another logging mechanism such as the event log.
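As a sketch of deriving an application metric from the logs, here is requests per minute computed from a few hypothetical request log lines (the log format is made up for the example):

```python
# Deriving an application-level metric (requests per minute) from
# application logs -- the usual data source for application monitoring.
from collections import Counter

request_logs = [
    "2024-05-01T10:00:05Z GET /orders",
    "2024-05-01T10:00:40Z POST /orders",
    "2024-05-01T10:01:12Z GET /orders",
]

# Truncate each ISO timestamp to minute precision and count per minute.
requests_per_minute = Counter(line[:16] for line in request_logs)
for minute, count in sorted(requests_per_minute.items()):
    print(minute, count)
```

A dashboard would chart this series over time, and an alert could fire when the rate drops or spikes outside its normal range.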
Most monitoring products provide both infrastructure and application monitoring.
The takeaway is: make sure logging and monitoring are there when you design your application, and make sure they are properly implemented and configured :)