When it comes to automation, one of your first thoughts is usually Cron. With Cron you can schedule your tasks in many different ways, and almost everything is possible, but some things can be tricky. For example, scheduling a job to run on the first Monday of every month. The following line should do the trick, or maybe not?
0 14 1-7 * Mon /bin/task
Read in the usual way, this should mean: at 14:00 on the 1st to 7th day of each month, if that day is a Monday, /bin/task will be executed. Wrong! Take a look at the documentation and you will understand the "correct" behaviour.
$ man 5 crontab
Note: The day of a command’s execution can be specified by two fields — day of month, and day of week. If both fields are restricted
(ie, aren’t *), the command will be run when either field matches the current time. For example,
"30 4 1,15 * 5" would cause a command to be run at 4:30 am on the 1st and 15th of each month, plus every Friday.
So the fields for day of month and day of week are linked in an OR fashion. When either of these conditions is met, the job will be executed. To get an AND linking of these fields, restrict only the day of month in Cron and check the day of week in the command itself. One possible way to do this is:
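0 14 1-7 * * [ "$(date +\%u)" = 1 ] && /bin/task
The "[ … ]" is the shell's test command. It is used here instead of "[[ … ]]" because Cron runs commands with /bin/sh by default, which is not necessarily bash. date +%u prints the weekday as a number (1 is Monday), so the check does not depend on the locale the way a weekday name like "Mo" or "Mon" does. Since % has a special meaning in a crontab, it must be escaped as \%. The "&&" ensures that the rest of the line is only executed if the test succeeds.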
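If you want to try out the check before trusting it to Cron, the condition can be wrapped in a small function. This is only a sketch: is_first_monday is a name I made up for illustration, and it takes the day of month and the weekday number as arguments so that it can be tested without changing the system clock.

```shell
#!/bin/sh
# Hypothetical helper (not part of cron): succeeds only for a Monday
# that falls within the first seven days of a month.
#   $1 = day of month, as printed by: date +%d
#   $2 = ISO weekday,  as printed by: date +%u  (1 = Monday)
is_first_monday() {
    dom=${1#0}          # strip a leading zero, e.g. "07" -> "7"
    dow=$2
    [ "$dow" -eq 1 ] && [ "$dom" -le 7 ]
}

# Example call with today's date:
if is_first_monday "$(date +%d)" "$(date +%u)"; then
    echo "first Monday of the month"
fi
```

Inside a script no backslash is needed before the %; that escaping only applies to lines in the crontab itself.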
As many admins know, logfiles are a cool thing. That is, if they are readable. Take for example the log written by Nagios.
$ grep restarting nagios.log
 Caught SIGHUP, restarting...
 Caught SIGHUP, restarting...
If you want to know when the restarts happened, you must convert the Unix timestamp that Nagios writes in square brackets at the start of each log line into a readable time. The Swiss Army knife Perl and some regex know-how can do this easily.
$ grep restarting nagios.log | perl -pe 's/(\d+)/localtime($1)/e'
[Mon Feb 9 09:09:32 2015] Caught SIGHUP, restarting...
[Mon Feb 9 14:01:19 2015] Caught SIGHUP, restarting...
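On a machine without Perl, the same conversion can be sketched in plain shell. This assumes GNU date (for the -d @N syntax) and that every log line starts with the timestamp in square brackets. convert_epochs is a hypothetical name, and the output format is my choice, not the one Perl's localtime produces.

```shell
#!/bin/sh
# Hypothetical filter: rewrites "[<epoch>] message" lines read from stdin.
# Lines that do not start with a numeric timestamp are passed through unchanged.
convert_epochs() {
    while IFS= read -r line; do
        case $line in
            \[*\]*) epoch=${line#\[}; epoch=${epoch%%\]*} ;;  # text between "[" and the first "]"
            *)      epoch= ;;
        esac
        case $epoch in
            ''|*[!0-9]*)
                printf '%s\n' "$line" ;;
            *)
                # GNU date: "@N" means N seconds since the epoch
                printf '[%s]%s\n' "$(date -d "@$epoch" '+%Y-%m-%d %H:%M:%S')" "${line#*\]}" ;;
        esac
    done
}
```

It would be used like the Perl one-liner: grep restarting nagios.log | convert_epochs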
A few days ago a colleague came to me with a strange problem. His server could not write to the /var partition. Every attempt failed with a "No space left on device" error message. Of course he looked at the df output to see whether there was really nothing left. This is what he got:
# df -h /var
504M 309M 171M 65% /var
# touch /var/test
No space left on device
So, what is happening here? The first thing I noticed was the relatively small size of the partition, so among the first things that came to my mind were quotas or inodes. And inodes were the way to go: they were all used up, and since every new file needs at least one inode, no file could be created. Now that we know the problem, we can search for the root cause. First we need to find the folder that contains all these files. A quick way to do this is the following script, which shows the number of files contained in each subdirectory of /var.
# Print each subdirectory of /var followed by the number of entries below it
for DIR in /var/*/; do
    echo -n "$DIR "
    find "$DIR" | wc -l
done
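To confirm the diagnosis first, df -i shows inode usage instead of block usage. The loop can also be sketched as a function that sorts its output, so the worst offender ends up on top. count_entries is a hypothetical name. Note that find counts the directory itself as well, so every number is one higher than the number of entries below it.

```shell
#!/bin/sh
# Inode usage of the filesystem holding /var (IUse% at 100% means no new files):
#   df -i /var

# Hypothetical helper: print "<count> <dir>" for every immediate
# subdirectory of $1, largest count first.
count_entries() {
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue      # skips the unexpanded pattern if there are no subdirectories
        count=$(find "$dir" | wc -l | tr -d ' ')
        printf '%s %s\n' "$count" "$dir"
    done | sort -rn
}
```

Calling count_entries /var and then repeating it on the top entry walks you down to the directory that eats the inodes.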
In the end we found that Pacemaker stores every cluster transition in a separate file. These files are located in /var/lib/pengine (Pacemaker) or /var/lib/heartbeat/pengine (heartbeat2). Pacemaker never deletes any of them, so over time they pile up until the partition runs out of inodes or space. If your /var partition is multiple gigabytes in size, you will usually never notice. But I think it is better to prevent this from ever happening. Both Pacemaker and heartbeat2 have options to set the maximum number of files that are kept as history. I think 1000 files per series is enough for debugging possible problems. You will have to delete all previously created files manually before setting this, but from then on Pacemaker will keep no more than 1000 files each for the input, warning and error series of pengine.
crm(live)configure# property pe-error-series-max=1000 pe-input-series-max=1000 pe-warn-series-max=1000
crm_attribute -t crm_config -n pe-error-series-max -v 1000
crm_attribute -t crm_config -n pe-warn-series-max -v 1000
crm_attribute -t crm_config -n pe-input-series-max -v 1000
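The manual cleanup of the old history files could look roughly like this. It is a cautious sketch: purge_pe_history is a hypothetical helper, and the pe-* name pattern is my assumption about how the files are named, so list the directory first and adjust the pattern if yours looks different. The -maxdepth option is not in POSIX, but GNU and BSD find both support it.

```shell
#!/bin/sh
# Hypothetical helper: delete regular files named pe-* directly inside $1.
# The pe-* pattern is an assumption; verify it with ls before deleting anything.
purge_pe_history() {
    find "$1" -maxdepth 1 -type f -name 'pe-*' -exec rm -f {} +
}

# Example (count first, then delete):
#   find /var/lib/pengine -maxdepth 1 -type f -name 'pe-*' | wc -l
#   purge_pe_history /var/lib/pengine
```

Using find with -exec instead of rm /var/lib/pengine/pe-* avoids the "Argument list too long" error you are likely to hit with that many files.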
This should be at the beginning of every crontab ...
# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * (user) command to be executed