Takeaway: Debugging rsyslog and logrotate is easier when using Vagrant and logrotate's debug switch (-d).
I’ve been getting my hands greasy with more involvement in Ubuntu server maintenance lately, so I thought I’d share some of my experiences with debugging rsyslog and logrotate.
We’re using an EC2 instance running Ubuntu 12.04 (Precise Pangolin) to receive syslog packets from Heroku and elsewhere. They’re received with rsyslog, and we’re using logrotate to rotate and compress the log files. We’re being a little lazy right now and just letting everything go into /var/log/syslog. So far, that’s been a good choice as the machine only handles log aggregation. Most of the setup is similar to what Heroku recommends.
My first bump in the road was that Ubuntu 12.04 doesn't give rsyslog port 514 (the standard syslog port), unlike Ubuntu 10.04. This is because rsyslog no longer runs as root, and binding ports below 1024 requires root privileges. As a result, the listening port has to be above 1024, so I chose 1514 to make it easy to remember and to recognize its purpose. It's not completely clear why Ubuntu changed rsyslog to run under its own user (I would guess security reasons), but there is a bug report which helped me figure this out.
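For reference, here is a minimal sketch of the rsyslog configuration lines involved, using the legacy directive syntax that shipped with Precise (shown for UDP; the TCP equivalent uses imtcp and $InputTCPServerRun):

```
# /etc/rsyslog.conf (or a file in /etc/rsyslog.d/)
# Listen for remote syslog on a non-privileged port,
# since rsyslog no longer runs as root on 12.04.
$ModLoad imudp
$UDPServerRun 1514
```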
Following that I ran into an issue with getting logrotate to rotate the log files in the way we wanted. We decided that storing 30 days worth of logs, marking the rotation with the date, and compressing older logs would be best for our needs. It sounded easy enough to configure after looking at man logrotate and /etc/logrotate.conf. But it didn’t work! No matter what I tried, only 7 days of logs were retained, although compressing and adding a date extension worked fine.
It turned out to be a simple problem, but it’s more important to know how to debug these problems, I think. Below is what I did.
You probably don’t want to debug configuration problems directly on your production instance, or even on a staging instance. To develop the changes locally, I found that the easiest solution was to use Vagrant. The Vagrant project provides a precise64 box which was perfect for my needs:
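The setup is just a few commands (the box URL is the one the Vagrant docs listed at the time; adjust to whatever source you use):

```
$ vagrant box add precise64 http://files.vagrantup.com/precise64.box
$ vagrant init precise64
$ vagrant up
$ vagrant ssh
```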
From there, you can test logrotate with the -d switch. Point it to your configuration file and then see what it says it will do:
logrotate -d /etc/logrotate.conf
The problem behavior was clearly visible in the output; /var/log/syslog was only being kept for 7 days. Changes to /etc/logrotate.conf did not make a difference for the rotation count (though I could switch to dateext). Around that time, I started poking around in /etc/logrotate.d/, where I found /etc/logrotate.d/rsyslog.
This is the original configuration for /var/log/syslog:
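The stock stanza for /var/log/syslog on Precise looks roughly like this (reproduced from memory; details may vary by package version). Note the rotate 7, which overrides whatever you set in /etc/logrotate.conf:

```
/var/log/syslog
{
        rotate 7
        daily
        missingok
        notifempty
        delaycompress
        compress
        postrotate
                reload rsyslog >/dev/null 2>&1 || true
        endscript
}
```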
After that point, it was simple to try my changes and retest using logrotate as above:
# NOTE: `dateext` means you can rotate at **most** once a day. Be careful setting it globally.
reload rsyslog >/dev/null 2>&1 || true
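Putting it together, the modified stanza looked something like the following (illustrative; base yours on the stock file that ships with your rsyslog package):

```
/var/log/syslog
{
        rotate 30
        daily
        dateext
        missingok
        notifempty
        delaycompress
        compress
        postrotate
                reload rsyslog >/dev/null 2>&1 || true
        endscript
}
```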
The debug output then showed it would retain the logs for 30 days. Great! It was then a simple matter of installing the same configuration in production.
While the result is important, I think the bigger takeaway is a process I can use in the future. With Vagrant and logrotate -d /path/to/conf in hand, future configuration changes are easier to develop, test, and deploy.
Summary: Switching to Heroku Postgres? If you use a Crane or Kappa database instance, you might run into problems with the non-standard port used when connecting to PostgreSQL in production. (The problem may not appear in your staging environment if you use the dev plan.) Our app is a Padrino-based web service, but the problem described could happen to any framework/code that only expects standard port numbers. Check to make sure the non-standard port is handled correctly.
I hope this helps someone else in a similar predicament! Please let me know in the comments.
Note: Much of the below is based off of a written discussion we had with Heroku support. Thanks for your help, guys! (I’ve changed usernames, passwords, etc., for obvious reasons.)
The steps went as expected for our staging application, but NOT for the production one. This app runs on the Bamboo stack. We could access the PostgreSQL database via heroku pg:psql HEROKU_POSTGRESQL_OUR_COLOR as expected, but when we did heroku console, we got the following error:
PGError: could not connect to server: Connection refused
Is the server running on host "ec2-12-34-567-890.compute-1.amazonaws.com" and accepting
TCP/IP connections on port 5432?
That confused us, as we would expect both psql and console to fail the same way.
We investigated and noticed that HEROKU_POSTGRESQL_OUR_COLOR had a non-standard port (5602):
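For illustration, the config variable looked something like this (hostname and credentials changed, of course):

```
$ heroku config:get HEROKU_POSTGRESQL_OUR_COLOR_URL
postgres://someuser:somepassword@ec2-12-34-567-890.compute-1.amazonaws.com:5602/somedb
```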
However, that did NOT make a difference. We also checked config/database.yml, and it seemed to have correct values.
We were left with a production application that could not connect to its database. As a workaround, we promoted the shared database back to being in use:
heroku pg:promote SHARED_DATABASE
We opened a support ticket with Heroku, and later heard back that there were issues with the Heroku-generated config/database.yml. It did not pick up the non-default port number, which is required for Crane. They updated some code on their end to generate the config/database.yml correctly.
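After their fix, the generated config/database.yml included the port explicitly, roughly like this (values are placeholders):

```
production:
  adapter: postgresql
  host: ec2-12-34-567-890.compute-1.amazonaws.com
  port: 5602          # non-standard Crane/Kappa port, not 5432
  database: somedb
  username: someuser
  password: somepassword
```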
We confirmed that the database.yml was created correctly, but unfortunately that wasn’t the entire problem. Our config/database.rb file (used by the Padrino framework) did not handle the non-standard Postgres port either. After realizing that, it was a simple fix to get the database connection working again; we just had to include the port value in the call to ActiveRecord::Base.establish_connection.
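The shape of the fix can be sketched as follows. This is not our actual config/database.rb; the helper name and structure are mine, but it shows the essential point: parse the database URL and pass the port through to establish_connection instead of assuming 5432.

```ruby
require 'uri'

# Hypothetical helper: build an ActiveRecord-style connection hash
# from a Heroku-style database URL, preserving the port.
def connection_settings(database_url)
  uri = URI.parse(database_url)
  {
    adapter:  'postgresql',
    host:     uri.host,
    port:     uri.port || 5432,        # Crane/Kappa use non-standard ports
    username: uri.user,
    password: uri.password,
    database: uri.path.sub(%r{\A/}, '')
  }
end

# Fake credentials for illustration:
settings = connection_settings(
  'postgres://someuser:somepassword@ec2-12-34-567-890.compute-1.amazonaws.com:5602/somedb'
)
settings[:port]  # => 5602
# The actual fix boiled down to something like:
#   ActiveRecord::Base.establish_connection(settings)
```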
(For those keeping score, there were a total of two “non-standard port” bugs happening at the same time.)
Out of curiosity, we asked why non-standard port numbers were used. We were told that they’re only used in the Crane and Kappa plans, so one of those would need to be used in staging to have the same setup. The current implementation of Crane and Kappa use a multi-tenant system, so they listen on multiple ports instead of a single port. (Crane and Kappa are the least expensive production database plans.) We were also told the port numbers could change in the future.
Moral of the story: Make sure your app works without depending on the defaults. That’s good advice in general; it would probably be best to test that you don’t depend on tacit default values for anything (databases or otherwise). One way of doing this would be testing with non-standard values when developing.
To summarize, I have an Ubuntu 10.04 instance on EC2 that is running rsyslogd. I’ve also set up the security groups as Heroku describes, and added a syslog drain using a command like heroku drains:add syslog://host1.example.com:514.
I can send messages from the Heroku console to my rsyslogd instance via nc. I see them appear in the log file, so I know there isn’t a firewall/security group issue. However, Heroku does not seem to be forwarding log messages to the server that heroku drains lists. I would expect to see HTTP requests, Rails messages, etc.
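The manual check was along these lines (hostname is a placeholder; use your drain target):

```
$ nc host1.example.com 514
test message from the Heroku console
```

Seeing that line land in the log file told me the packets could reach rsyslogd, which ruled out the firewall/security groups as the cause.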
Is there something else I can try to do to figure this out? I’m new to rsyslogd, so I could easily be missing something.
I found out what I had done wrong, with the help of Heroku Support. (Thanks guys!)
I had used the wrong security group name (default instead of logplex).
These are the appropriate values for the fields in the AWS Security Group web interface: