The term purge means removing or erase something completely. Data purging is a process or activity mostly heard with Database technologies, i.e. Cleaning up of Database.
There are many different techniques and strategies for data purging, which is often contrasted with data deletion. Deletion is more of a temporary process whereas Purging removes data permanently which in turn frees up the storage and/or memory space for other uses.
Purging process lets you archive the data even though it removed from the main source permanently, giving us an option to retrieve from the archive in case there is a need for it. The deleting process also removes data permanently but doesn’t necessarily involve keeping a backup and generally involves on insignificant amounts of data.
For any organization and in any environment, it is important to keep track of database growth and execute the data purging policies whenever necessary.
When the database size grows, the SQL server needs more memory and CPU to read data from the tables, which slows down the database operation.
Besides that, there is also the risk that, due to the growing database(s), you are running out of disk space.
Database Administrators spend a lot of their time dealing with the problem of database processes consuming too much disk space. So, it’s very important to monitor the database size to ensure the database is not seizing the memory and CPU for smooth operation. Also, it’s essential and necessary to automate the purging process.
The purge process moves data between and deletes data from three categories of data or data sets:
In any BizTalk environment, to purge the SQL database there are two main categories of options available. The most common way is
The purpose of the SQL Agent is to serve as a job scheduler. Many experienced DBAs use jobs running inside the SQL Agent to perform routine tasks such as backups, updating statistics, and rebuilding indexes as needed.
Scheduling jobs is one of the core SQL Server functions. Many businesses have numerous SQL Server jobs scheduled that perform any number of different tasks from database maintenance jobs like backup and index rebuilds to running queries and kicking off ETL (Extract. Transform and Load) tasks.
The SQL Agent is the SQL Server subsystem that’s responsible for automatic task scheduling. The SQL Agent is available in all of the editions of SQL Server except the SQL Server Express edition. Taking advantage of SQL Agent enables you to automate many of your routine IT database infrastructure tasks.
The SQL Server Agent gives you a more specific focus around scheduling specific types of jobs around SQL Server. As the Agent is a part of the cluster resource when a failover occurs as long as the cluster has been set up correctly the SQL Agent job will start once the SQL Server Service has started.
Windows task Scheduler is a component of Microsoft Windows that provides the ability to schedule the launch of programs or scripts at pre-defined times or after specified time intervals. SQL Server Scheduler is created for Agent job scheduling, so using SQL Server Scheduler to schedule the job is the best method.
If the Agent is not running at the time a job is supposed to run, when the Agent is started it will run the job at the next scheduled time. It will not automatically run the job.
SQL Server Agent jobs that are running at the time of a failover event on a SQL Server failover cluster instance do not resume after failover to another failover cluster node. Jobs that begin but fail to complete because of a failover event are logged as started but do not show additional log entries for completion or failure. SQL Server Agent jobs in these scenarios appear to have never ended. So if your server failover, you need to run this job again in the other node.
Windows Services is a core component of the Microsoft Windows operating system and enables the creation and management of long-running processes.
Unlike regular software that is launched by the end-user and only runs when the user is logged on, Windows Services can start without user intervention and may continue to run long after the user has logged off. The services run in the background and will usually kick in when the machine is booted. An option ideal for use on servers when long-running functionality is needed without interference with other users on the same system.
BizTalk360 consists own database. Collecting the necessary data is important to be able to represent the data in BizTalk360. As many of you aware that BizTalk360 is a one-stop solution for BizTalk monitoring. So for monitoring and analytics purposes, we do collect different data from various environments (it collects the information from BizTalk, SQL Server, and Tracking database) and stored in the BizTalk360’s database for future references.
They are the sparkling examples for the data collection within BizTalk360 when you have configured/added more environments (for each environment there will be multiple machines say example a simple multi-server environment consists of 2-BizTalk servers for HA, 1-SQL server. For cluster servers and Load balancing you can imagine the numbers). However, you don’t want unlimited growth of your database.
BizTalk360 comes out of the box with the “Data Purging” feature to able to manage the size of the BizTalk360 database.
There might be a situation where the BizTalk360 database grows due to the amount and size of event log data and performance counter data. The BizTalk360 purging policy will take care of the data growth, but it is also important to monitor the size regularly to ensure BizTalk360 is working seamlessly.
As mentioned, BizTalk360 comes out of the box with the ability to purge the historic data by setting up the purging duration (days to be monitor) after the specified period. The Administrators/Superusers can set up the “Purge duration” under “Settings -> BizTalk360 Health -> Data Purging”. This will control the database growth and hence the performance of BizTalk360 will not get affected.
BizTalk360 uses Windows Services to trigger the purging. BizTalk360 owns two services,
Out of those, purging was created as a sub-service of Monitoring service.
In a regular interval time (for every 2 hours) the “Purge Data – a sub-service of Monitoring service” will purge the data by hitting the Stored Procedures for purging available in the BizTalk360 database.
We have chosen the windows service to utilize the advantages of Sub-service and the services that can be started and paused. As most of you aware that the BizTalk360 services should be always running to collect the data and perform few operations, so we would like to effectively use the service rather than creating SQL job and monitor separately. Also, if any exception appears for the sub-service, the exception will be captured from the log files and it will be shown adjacent to the respective sub-service in the UI.
Based on the values set in the data purge duration, for every 2 hours, the data will be purged using the stored procedures with the trigger of Purge Data sub-service.
Until BizTalk360 v5.0, there was a single stored procedure (sp_b360_admin_PurgePolicies) called from our monitoring service which takes care of all the purging. In v6.0, we improved the logic by splitting it into multiple stored procedures (sp_b360_admin_PurgePolicies) which takes care of various parts of data purging.
Purge Data – a sub-service of Monitoring service from BizTalk360 Settings -> BizTalk360 Health -> Monitoring Service Status -> Purge Data, will help you to understand and find the information that the purging is running for every 2 hours.
If you can compare the started time and Last execution time, for every 2 hours the execution should happen. Even if there is an exception, the exception will be captured in the log files and it will be shown adjacent to the purge data sub-service.
Along with the purging service within BizTalk360, there are few more options to check if the database size is under the expected size.
Whenever the database grows beyond expectations, the first thing to be checked is the standard database reports. This will ensure which table occupies more space. Then we can act on the purging policy and change it according to the business needs and data flow.
Till the latest release, there were 14 Stored Procedures involved in Purging for 32 tables. Each stored procedure will purge multiple tables. Let’s see what the stored procedures are used for purging the corresponding tables.
Let’s see how purge helps. The default purging settings in BizTalk360 can be seen in the below screenshot;
Based on the values/duration provided by the purging policy, the data will be purged. We can see that the purging duration for Data Monitoring is 2 months. Hence the historical data for 2 months will be present in the BizTalk360 database.
Purging needs to be done to remove the historical data, thereby making the database healthy. BizTalk360 purges the data by running the stored procedure in the specified duration specified in the settings. The purging settings can be altered by the customers according to their business needs and data flow. If a large volume of data flows through the ports, they can set the purge duration to a minimum value so that data growth is controlled.
The customer configured 3 environments in a single installation of BizTalk360. In each environment, there were 5 servers available due to BizTalk and SQL server. Hence there were 15 servers to collect data for Advanced Event Viewer and they have set-up 30days to monitor. The Database has grown almost to 300 GB because of the number of servers to monitor.
We have recommended the customer to decrease the purge duration from 1 month to 2days and TRUNCATE the particular table (Since the table belongs to BizTalk360 and it is historical data, it won’t affect the BizTalk server). After modifying the purge duration, the data were reduced to 2 GB started working as expected.
BizTalk360 will purge the data automatically. Sometimes the data collection may be out/beyond our expectation and that leads to a huge volume of data.
Our first suggestion is to reduce the number of purge days. So that the data will be reduced.
Until v6.0, the stored procedure sp_b360_admin_PurgePolicies was used and in recent versions of BizTalk360, you can use the stored procedure sp_b360_admin_CleanupDB to manually purge the database in case of large database growth.
When the number of records in the database grows quickly, call to the stored procedure(s) can start to get timed out from the monitoring services. This will result in data not being cleared and resulting in unusual database growth.
Considering the feedback provided by our customers we have improved and fine-tuned the collection of the data into the BizTalk360 database.
Whenever the database grows beyond expectations, the first thing to be checked is the standard database reports. This will ensure which table occupies more space. Then we can act on the purging policy and change it according to the business needs and data flow.
If you have any questions, contact us at support@biztalk360.com. Also, feel free to leave your feedback in our forum.