
Benjamin Oakes


Hi, I'm Ben Oakes and this is my geek blog. Currently, I'm a Ruby/JavaScript Developer at Liaison. Previously, I was a Developer at Continuity and Hedgeye, a Research Assistant in the Early Social Cognition Lab at Yale University, and a student at the University of Iowa. I also organize TechCorridor.io, ICRuby, OpenHack Iowa City, and previously organized NewHaven.rb. I have an amazing wife named Danielle Oakes.

Filtering for the Debugging category.

SSH Agent Forwarding with Vagrant AWS

by Ben

The in-progress Vagrant AWS plugin has a lot of promise, especially for devops. The ability to test your Puppet or Chef scripts on an EC2 instance using Vagrant is very tempting. Unfortunately, in my experience it’s not yet quite stable enough to rely on. Some errors seem to happen sporadically, and most are related to SSH, even though running SSH manually works fine (either vagrant ssh or ssh user@host).

Sometimes, something as simple as mkdir fails for no apparent reason:

The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!

mkdir -p '/vagrant'

Other times, rsync completes, but then it immediately terminates the instance:

[default] Rsyncing folder: /home/ben/aws-sandbox/ => /vagrant
[default] Terminating the instance...

I’m still hopeful that it can be useful to us in the future. Like I said, there’s a lot of promise in this young project.

At any rate, we took some time to research how to get SSH agent forwarding working, which is valuable for us when remote pairing. We were getting stuck with errors like this:

Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

It turns out that vagrant itself tells SSH to ignore everything but identity files, which was the key to getting agent forwarding to work. You can inspect this behavior using vagrant ssh-config.
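For a typical box, that output looks something like the following (the host name and key path here are illustrative, not copied from a real run):

```text
Host default
  HostName ec2-XX-XX-XX-XX.compute-1.amazonaws.com
  User ubuntu
  Port 22
  UserKnownHostsFile /dev/null
  StrictHostKeyChecking no
  PasswordAuthentication no
  IdentityFile "/home/ben/.vagrant.d/insecure_private_key"
  IdentitiesOnly yes
  LogLevel FATAL
```

Note the IdentitiesOnly yes line: that is what keeps SSH from offering the keys loaded in your agent.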

It turns out that lib/vagrant/util/ssh.rb can be modified like so:

--- a/lib/vagrant/util/ssh.rb
+++ b/lib/vagrant/util/ssh.rb
@@ -108,7 +108,7 @@ module Vagrant
         # IdentitiesOnly option. Also, we don't enable it in plain mode so
         # that SSH properly searches our identities and tries to do it itself.
         if !Platform.solaris? && !plain_mode
-          command_options += ["-o", "IdentitiesOnly=yes"]
+          command_options += ["-o", "IdentitiesOnly=no"]
         end
 
         # If we're not in plain mode, attach the private key path.

There’s a related change that can be made to make vagrant ssh-config match, but it seems to be cosmetic:

--- a/templates/commands/ssh_config/config.erb
+++ b/templates/commands/ssh_config/config.erb
@@ -6,7 +6,7 @@ Host <%= host_key %>
   StrictHostKeyChecking no
   PasswordAuthentication no
   IdentityFile "<%= private_key_path %>"
-  IdentitiesOnly yes
+  IdentitiesOnly no
   LogLevel FATAL
 <% if forward_agent -%>
   ForwardAgent yes

That was enough to get our SSH agent forwarding to work. These changes make sense in the context of AWS, but probably not for Vagrant at large. I’m tempted to make a pull request, but the above changes are a little half-baked, and vagrant-aws still needs some fine-tuning before the change can really be tested.

All about to_h in Ruby 2.0

by Ben

I gave a talk at ICRuby today about Ruby 2.0, partially as a learning experience for myself. I hadn’t done much with Ruby 2.0 before, and I had fun learning more about what to expect. If you’d like to see what I presented, my slides are available.

A lot of what I showed about Ruby 2.0 was a pretty standard overview, but I paid special attention to to_h. I ended up doing some research that I haven’t seen written up elsewhere, and thought I should share it as a blog post as well.

What’s to_h?

It’s a new convention for retrieving a Hash representation of an object. When I was first learning Ruby in 2008, I remember expecting to_h to exist after learning about to_s for String and to_a for Array. In concept, it’s similar to serializable_hash in Rails as well. I’m happy to see this become a part of core Ruby.

What can I do with it?

Now that there’s an official method for getting the “Hash version” of an object, you can start to use it in your methods when using a Hash makes things easier, or you’d like to duck type.

Since to_h is now a part of Ruby’s core and std-lib, I thought I’d see how it’s used.

A quick word about the examples: we tend to use panda and bamboo as metasyntactic variables at Continuity Control, partially because of the great Jonathan Magen. Plus, they’re fun compared to plain old foo and bar.

Core

Searching ruby-doc.org core gave:

ENV.to_h
Hash#to_h
NilClass#to_h
Struct#to_h

…and a handful of aliases called to_hash, which were also present in 1.9.

Here’s what they do:

ENV.to_h                 # => {"TERM"=>"xterm", "SHELL"=>"/bin/bash", ...}
{ panda: 'bamboo' }.to_h # => {:panda=>"bamboo"}
nil.to_h                 # => {}

s = Struct.new(:panda, :bamboo).new
s.to_h                   # => {:panda=>nil, :bamboo=>nil}

Std-lib

Searching ruby-doc.org std-lib gave a few more results, but OpenStruct#to_h is the easiest to demonstrate:

require 'ostruct'
os = OpenStruct.new(panda: 'bamboo')
os      # => #<OpenStruct panda="bamboo">
os.to_h # => {:panda=>"bamboo"}

Any gotchas?

Not everything about to_h works the way I’d expect.

Arrays

This doesn’t work:

%w(panda bamboo).to_h # => NoMethodError: undefined method `to_h'

I might have expected behavior like this:

Hash['panda', 'bamboo'] # => {"panda"=>"bamboo"}

That would be especially nice, since then you could convert back and forth from Array to Hash:

{ panda: 'bamboo' }.to_a.to_h # => NoMethodError: undefined method `to_h'

…but alas, that’s just not how it works. However, we can try to convince matz otherwise.
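In the meantime, Hash[] will happily take the array of pairs that to_a produces, so the round trip is still possible by hand (a workaround, not a Ruby 2.0 feature):

```ruby
pairs = { panda: 'bamboo' }.to_a # => [[:panda, "bamboo"]]
Hash[pairs]                      # => {:panda=>"bamboo"}
```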

Something I found by accident: I screwed up Hash[] the first time and got a bunch of new warnings on STDERR.

Hash[['panda', 'bamboo']]
# (irb):5: warning: wrong element type String at 0 (expected array)
# (irb):5: warning: ignoring wrong elements is deprecated, remove them explicitly
# (irb):5: warning: this causes ArgumentError in the next release

In Ruby 1.9.3, it would print no warnings and simply return {}.
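The element type Hash[] wants is a [key, value] pair, so one more level of nesting fixes it:

```ruby
# Each element must itself be a two-element [key, value] array:
Hash[[['panda', 'bamboo']]] # => {"panda"=>"bamboo"}
```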

JSON

I was also hoping that JSON would take advantage of to_h, since JSON is itself a part of Ruby’s stdlib.

require 'json'
JSON.generate(ENV)
# /usr/local/lib/ruby/2.0.0/json/common.rb:223:in `generate': only generation of JSON objects or arrays allowed (JSON::GeneratorError)
# from /usr/local/lib/ruby/2.0.0/json/common.rb:223:in `generate'
# from tmp/talk_code.rb:3:in `<main>'

I would have expected something like this:

require 'json'
# NOTE: This doesn't actually work this way.  Blog skimmers take notice!
JSON.generate(ENV) # => "{\"TERM\":\"xterm\",\"SHELL\": ...

Fortunately, you can do this:

require 'json'
JSON.generate(ENV.to_h) # => "{\"TERM\":\"xterm\",\"SHELL\": ...

…but that feels like an excellent use of to_h that should have been a part of JSON.

Enough complaining! Show something useful.

Duck typing is probably the most compelling use case I can think of. It’s a good alternative to Hash[] in some cases as well.

to_h

Here’s a simple example. Let’s say I have a method called eat:

def eat(diet)
  "A panda eats #{ diet[:eats] }"
end

…but I want to make sure the diet that is passed in is treated like a Hash. That’s possible now:

def eat(diet)
  "A panda eats #{ diet.to_h[:eats] }"
end

# It works with a Hash
panda_diet = { eats: 'bamboo' }
eat(panda_diet) # => "A panda eats bamboo"

# ...a Struct
panda_diet = Struct.new(:eats).new('shoots and leaves')
eat(panda_diet) # => "A panda eats shoots and leaves"

# ...or even nil
eat(nil) # => "A panda eats "

Hash()

One other addition I noticed (but haven’t seen mentioned elsewhere) is the new Hash() method. It’s kind of like Array() or String() in Ruby 1.9.3 and below. These methods let you coerce values, in a sense.

Here’s an example using Array:

def eat_up(foods)
  # Turns anything into an `Array`:
  #
  #     nil        => []
  #     'bamboo'   => ['bamboo']
  #     ['bamboo'] => ['bamboo']
  #
  Array(foods).each do |food|
    puts eat(eats: food)
  end
end

eat_up(nil)
# [no output]
eat_up('bamboo')
# A panda eats bamboo
eat_up(['bamboo', 'shoots and leaves'])
# A panda eats bamboo
# A panda eats shoots and leaves

That’s actually very useful behavior; a lot of annoying type and error checking just goes away. Ever since I first saw Avdi Grimm present it, I’ve found many uses for it.

The good news is that you can now do something similar with Hash():

def eat(diet)
  diet = Hash(diet)
  "A panda eats #{ diet[:eats] }"
end

panda_diet = { eats: 'bamboo' }
eat(panda_diet) # => "A panda eats bamboo"
eat(nil) # => "A panda eats "

If used in the right situation, that might just be as useful as Array().

But strangely enough, Hash() doesn’t work exactly like to_h:

Hash([]) # => {}

Hash(OpenStruct.new)
# TypeError: can't convert OpenStruct into Hash
#     from (irb):100:in `Hash'
#     from (irb):100
#     from /usr/local/bin/irb:12:in `<main>'

My best guess is that Hash() relies on the implicit to_hash conversion method rather than to_h, and OpenStruct defines only to_h. Unless you need the specific behavior of Hash(), you may prefer to use to_h.
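One way to probe the difference (this sketch assumes Kernel#Hash uses the implicit to_hash conversion hook; HashableStruct is a made-up name):

```ruby
require 'ostruct'

# OpenStruct defines to_h but not to_hash; adding the implicit
# conversion method makes Hash() accept it:
class HashableStruct < OpenStruct
  alias_method :to_hash, :to_h
end

Hash(HashableStruct.new(panda: 'bamboo')) # => {:panda=>"bamboo"}
```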

Slightly less contrived examples

Flexible constructor

If you have a use case for it, to_h could make constructors more flexible:

class Panda
  def initialize(params)
    params = params.to_h

    @name = params[:name]
    @age = params[:age]
    @weight = params[:weight]
  end
end
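Restated as a self-contained sketch (the panda’s stats are invented), the same constructor then accepts a Hash, a Struct, or even nil:

```ruby
class Panda
  attr_reader :name, :age, :weight

  def initialize(params)
    params = params.to_h # anything hash-like works

    @name   = params[:name]
    @age    = params[:age]
    @weight = params[:weight]
  end
end

Panda.new(name: 'Ping', age: 3, weight: 80).name                   # => "Ping"
Panda.new(Struct.new(:name, :age, :weight).new('Ping', 3, 80)).age # => 3
Panda.new(nil).name                                                # => nil
```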

Flexible constructor with OpenStruct

Or even use an OpenStruct instead, if you’d like:

require 'ostruct'

class Panda
  def initialize(params)
    params = OpenStruct.new(params.to_h)

    @name = params.name
    @age = params.age
    @weight = params.weight
  end
end

OpenStruct conversion

If you felt like it, you could even refactor that into this:

require 'ostruct'

# Convert to an `OpenStruct`
def OpenStruct(hash_like)
  OpenStruct.new(hash_like.to_h)
end

env = OpenStruct(ENV)
env.TERM # => 'xterm'

Reusable to_h definition

You could even parameterize how to define to_h:

# Related to my concept of `to_h` back in 2010: https://github.com/benjaminoakes/snippets/blob/master/ruby/to_h.rb
require 'ostruct'

module ConversionHelper
  def pick(*methods)
    h = {}
    methods.each { |m| h[m] = send(m) }
    h
  end
end

class Panda
  include ConversionHelper

  attr_reader :name, :age

  def initialize(params)
    params = OpenStruct.new(params.to_h)

    @name = params.name
    @age = params.age
    @weight = params.weight
  end

  def to_h
    # Pandas are sensitive about their weight, after all...
    pick(:name, :age)
  end
end
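Putting the pieces together (restated so the snippet stands alone; Ping’s stats are invented):

```ruby
require 'ostruct'

module ConversionHelper
  # Build a Hash from the results of the named reader methods.
  def pick(*methods)
    h = {}
    methods.each { |m| h[m] = send(m) }
    h
  end
end

class Panda
  include ConversionHelper

  attr_reader :name, :age

  def initialize(params)
    params = OpenStruct.new(params.to_h)

    @name   = params.name
    @age    = params.age
    @weight = params.weight
  end

  def to_h
    pick(:name, :age) # weight deliberately omitted
  end
end

Panda.new(name: 'Ping', age: 3, weight: 80).to_h # => {:name=>"Ping", :age=>3}
```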

I haven’t decided whether the last few ideas would actually be useful in practice, but these are the types of things that Hash() and to_h open up for Rubyists.

That’s not doing quite what you think…

by Ben

I recently helped an intern at Hedgeye work through a problem with a database query. Because I’m working in a separate timezone, I ended up making suggestions through a GitHub pull request. We discussed and decided that what I wrote was self-contained enough that I should re-post so it can help others.

:conditions => ["event_type != ?", 'LOGIN'||'LOGOUT'],

I don’t think this is doing quite what you think…

'LOGIN' || 'LOGOUT' # => 'LOGIN'

So this turns into:

where event_type != 'LOGIN'

I’m guessing you meant to do:

where event_type != 'LOGIN' and event_type != 'LOGOUT'

But, believe it or not, != is a nonstandard extension to SQL (MySQL and others support it, but ANSI SQL specifies <>). It would probably be best to use something that’s a part of ANSI SQL:

where event_type <> 'LOGIN' and event_type <> 'LOGOUT'
-- alternative:
where event_type not in ('LOGIN', 'LOGOUT')

Because these are literals (not user-provided values), there’s no point in sanitization using ?.

Conclusion:

:conditions => "event_type not in ('LOGIN', 'LOGOUT')",

database configuration does not specify adapter

by Ben

From an answer I wrote for StackOverflow:

Another possible cause:

In Rails 3.2.x, establish_connection has a default argument set from the environment:

From connection_specification.rb:

def self.establish_connection(spec = ENV["DATABASE_URL"])
  resolver = ConnectionSpecification::Resolver.new spec, configurations
  spec = resolver.spec

The way ConnectionSpecification::Resolver works depends on ENV['DATABASE_URL'] giving a nil if not set. (Normally, it would be something like postgres://...).

So, if you happen to have misconfigured DATABASE_URL such that ENV['DATABASE_URL'] == '', that will give you database configuration does not specify adapter.
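The distinction is easy to demonstrate: an unset variable comes back as nil (so the default argument behaves), while a set-but-empty one comes back as '' and gets passed along as a bogus spec:

```ruby
# Unset: the ENV lookup returns nil, which the Resolver handles gracefully.
ENV.delete('DATABASE_URL')
ENV['DATABASE_URL']      # => nil

# Misconfigured as empty: not nil, so it is used as a (useless) spec.
ENV['DATABASE_URL'] = ''
ENV['DATABASE_URL']      # => ""
ENV['DATABASE_URL'].nil? # => false
```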

Debugging rsyslog and logrotate

by Ben

Takeaway: Debugging rsyslog and logrotate is easier when using Vagrant and logrotate’s debug switch (-d).

I’ve been getting my hands greasy with more involvement in Ubuntu server maintenance lately, so I thought I’d share some of my experiences with debugging rsyslog and logrotate.

We’re using an EC2 instance running Ubuntu 12.04 (Precise Pangolin) to receive syslog packets from Heroku and elsewhere. They’re received with rsyslog, and we’re using logrotate to rotate and compress the log files. We’re being a little lazy right now and just letting everything go into /var/log/syslog. So far, that’s been a good choice as the machine only handles log aggregation. Most of the setup is similar to what Heroku recommends.

My first bump in the road was that Ubuntu 12.04 doesn’t give rsyslog port 514 (the standard syslog port), unlike Ubuntu 10.04. This is because binding port 514 requires root privileges, and rsyslog no longer runs as root. As a result, the port has to be above 1024, so I chose 1514 to make it easy to remember and to hint at its purpose. It’s not completely clear why Ubuntu changed rsyslog to run under its own user (I would guess security reasons), but there is a bug report which helped me figure this out.

Following that, I ran into an issue getting logrotate to rotate the log files the way we wanted. We decided that keeping 30 days’ worth of logs, marking each rotation with the date, and compressing older logs would best fit our needs. It sounded easy enough to configure after looking at man logrotate and /etc/logrotate.conf. But it didn’t work! No matter what I tried, only 7 days of logs were retained, although compressing and adding a date extension worked fine.

It turned out to be a simple problem, but I think the more important lesson is knowing how to debug problems like this. Below is what I did.

You probably don’t want to debug configuration problems directly on your production instance, or even on a staging instance. To develop the changes locally, I found that the easiest solution was to use Vagrant. They provide a precise64 box which was perfect for my needs:

vagrant box add precise64 http://files.vagrantup.com/precise64.box

From there, you can test logrotate with the -d switch. Point it to your configuration file and then see what it says it will do:

logrotate -d /etc/logrotate.conf

The problem behavior was clearly visible in the output: /var/log/syslog was only being kept for 7 days. Changes to /etc/logrotate.conf did not make a difference to the rotation count (though I could change to doing dateext). Around that time, I started poking around in /etc/logrotate.d/, where I found /etc/logrotate.d/rsyslog.

This is the original configuration for /var/log/syslog:

/var/log/syslog
{
        rotate 7
        daily
        missingok
        notifempty
        delaycompress
        compress
        postrotate
                reload rsyslog >/dev/null 2>&1 || true
        endscript
}

After that point, it was simple to try my changes and retest using logrotate as above:

/var/log/syslog
{
        rotate 30
        daily
        # NOTE: `dateext` means you can rotate at **most** once a day.  Be careful setting it globally.
        dateext
        missingok
        notifempty
        delaycompress
        compress
        postrotate
                reload rsyslog >/dev/null 2>&1 || true
        endscript
}

The debug output then showed it would retain the logs for 30 days. Great! It was then a simple matter of installing the same configuration in production.

While the result is important, I think the bigger takeaway is a process I can use in the future. With Vagrant and logrotate -d /path/to/conf in hand, future configuration changes are easier to develop, test, and deploy.

How do I embed images inside a GitHub wiki (gollum) repository?

by Ben

After pushing images into the wiki repository (clone, add images, push), you can use relative paths like so:

[[foo.jpg]]

For more info, see the demo wiki’s page on images.

Update (2012-05-02): This is part of my answer on StackOverflow.

How do I do multiple assignment in MATLAB?

by Ben

From my question on StackOverflow (CC BY-SA 3.0):

Here’s an example of what I’m looking for:

>> foo = [88, 12];
>> [x, y] = foo;

I’d expect something like this afterwards:

>> x

x =

    88

>> y

y =

    12

But instead I get errors like:

??? Too many output arguments.

I thought deal() might do it, but it seems to only work on cells.

>> [x, y] = deal(foo{:});
??? Cell contents reference from a non-cell array object.

How do I solve my problem? Must I constantly index by 1 and 2 if I want to deal with them separately?

What’s the closest thing to #define in Matlab?

by Ben

From my question on StackOverflow (CC BY-SA 3.0):

In C, I might do something like this:

#define MAGIC_NUMBER (88)

int foo(int a, int b, int c) {
  return a + b + c + MAGIC_NUMBER;
}

double bar(double x, double n) {
  return x + n + MAGIC_NUMBER;
}

/*
 * ...and so on with many kind-of-long functions using
 * MAGIC_NUMBER instead of writing a literal 88 like so:
 */

double bar(double x, double n) {
  return x + n + 88;
}

What should I do in Matlab? (Needs to work across multiple files.)

How can I get the column names when querying with DBI in Perl?

by Ben

From my question on StackOverflow (CC BY-SA 3.0):

I’m using DBI to query a SQLite3 database. What I have works, but it doesn’t return the columns in order. Example:

Query:  select col1, col2, col3, col4 from some_view;
Output:

    col3, col2, col1, col4
    3, 2, 1, 4
    3, 2, 1, 4
    3, 2, 1, 4
    3, 2, 1, 4
    ...

(values and columns are just for illustration)

I know this is happening because I’m using a hash, but how else do I get the column names back if I only use an array? All I want to do is get something like this for any arbitrary query:

    col1, col2, col3, col4
    1, 2, 3, 4
    1, 2, 3, 4
    1, 2, 3, 4
    1, 2, 3, 4
    ...

(That is, I need the output in the right order and with the column names.)

I’m very much a Perl novice, but I really thought this would be a simple problem. (I’ve done this before in Ruby and PHP, but I’m having trouble tracking down what I’m looking for in the Perl documentation.)

Here’s a pared down version of what I have at the moment:

use Data::Dumper;
use DBI;

my $database_path = '~/path/to/db.sqlite3';

$database = DBI->connect(
  "dbi:SQLite:dbname=$database_path",
  "",
  "",
  {
    RaiseError => 1,
    AutoCommit => 0,
  }
) or die "Couldn't connect to database: " . DBI->errstr;

my $result = $database->prepare('select col1, col2, col3, col4 from some_view;')
    or die "Couldn't prepare query: " . $database->errstr;

$result->execute
    or die "Couldn't execute query: " . $result->errstr;

########################################################################################### 
# What goes here to print the fields that I requested in the query?
# It can be totally arbitrary or '*' -- "col1, col2, col3, col4" is just for illustration.
# I would expect it to be called something like $result->fields
########################################################################################### 

while (my $row = $result->fetchrow_hashref) {
    my $csv = join(',', values %$row);
    print "$csv\n";
}

$result->finish;

$database->disconnect;