☃

What's with the _utf8/_e/_snowman tag in Rails 3?

The _utf8 input tag forces Internet Explorer to properly respect your form's character encoding.

Rails sets the accept-charset attribute on your form element to tell the browser which character encoding to use when it submits the form, so that the server can deal with Unicode characters (think of a user searching for café). Here's an example:

<form accept-charset='UTF-8' action='/posts' id='create-post' method='post'>
  …
</form>

Unfortunately, IE5 through IE8 will not look at accept-charset unless at least one character in the form's values is not in the page's charset. Since the user can override the default charset at the browser level, Rails adds a hidden input containing a Unicode character that falls outside those charsets (Rails' default encoding is UTF-8), forcing IE to look at accept-charset. The Unicode character is a snowman. Here's what the tag looks like:

<input name="_utf8" type="hidden" value="&#9731;" />
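To see why a snowman does the job, here is a minimal Ruby sketch (Ruby 1.9 or later). It only illustrates the encoding facts, not how Rails builds the tag; the variable names are made up for the example. The entity &#9731; is code point U+2603, which has no Latin-1 representation, while a character like é does.

```ruby
# &#9731; is decimal for code point U+2603 (SNOWMAN).
snowman = [9731].pack("U")            # UTF-8 string built from the code point
same_char = (snowman == "\u2603")     # true

# The snowman cannot be expressed in Latin-1 (ISO-8859-1)...
snowman_fits_latin1 =
  begin
    snowman.encode("ISO-8859-1")
    true
  rescue Encoding::UndefinedConversionError
    false
  end

# ...whereas "café" can: é becomes the single byte 0xE9.
cafe_latin1_bytes = "caf\u00E9".encode("ISO-8859-1").bytes

puts same_char            # true
puts snowman_fits_latin1  # false
puts cafe_latin1_bytes.inspect  # [99, 97, 102, 233]
```

A form that contains only characters like é gives IE no reason to consult accept-charset, which is why the extra hidden value is needed.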

You can safely ignore params[:_utf8] in your Rails application.

Note: The _utf8 tag might be rendered as _e or _snowman depending on how recent your edge version of Rails is.
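If you would rather drop the marker than ignore it (for example before logging or echoing parameters back), a rough sketch follows. It uses a plain Hash as a stand-in for params, and the helper name without_encoding_marker is hypothetical, not part of Rails.

```ruby
# The three names the marker has gone by, as listed above.
ENCODING_MARKERS = %w[_utf8 _e _snowman].freeze

# Hypothetical helper: returns a copy of the hash without the marker key.
# Accepts string or symbol keys.
def without_encoding_marker(params)
  params.reject { |key, _value| ENCODING_MARKERS.include?(key.to_s) }
end

# A plain Hash standing in for the request parameters.
params  = { "_utf8" => "\u2603", "q" => "caf\u00E9" }
cleaned = without_encoding_marker(params)

puts cleaned.keys.inspect  # ["q"]
```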

Why does it matter? What's the impact on my users?

From wycats, author of the patch:

This bug exists in IE5, IE6, IE7, and IE8. If the user switches the browser's encoding to Latin-1 (to understand why a user would decide to do something seemingly so crazy, check out this Google search), any form submission will be sent in Latin-1.

This means that if a user searches for "Ché Guevara", it will come through incorrectly on the server-side. In Ruby 1.9, this will result in an encoding error when the text inevitably makes its way into the regular expression engine. In Ruby 1.8, it will result in broken results for the user.

By creating a parameter that can only be understood by IE as a unicode character, we are forcing IE to look at the accept-charset attribute, which then tells it to encode all of the characters as UTF-8, even ones that can be encoded in Latin-1.

Keep in mind that in Ruby 1.8, it is extremely trivial to get Latin-1 data into your UTF-8 database (since NOTHING in the entire stack checks that the bytes that the user sent at any point are valid UTF-8 characters). As a result, it's extremely common for Ruby applications (and PHP applications, etc. etc.) to exhibit this user-facing bug, and therefore extremely common for users to try to change the encoding as a palliative measure.
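The mismatch described above can be reproduced in plain Ruby (1.9 or later). This is a sketch under simple assumptions: the bytes are built by hand rather than read from a real request, and the variable names are illustrative. It shows what a Latin-1 submission of "Ché Guevara" looks like when the server labels it as UTF-8, and how a validity check catches it.

```ruby
query = "Ch\u00E9 Guevara"

# A UTF-8 submission encodes é as two bytes: 0xC3 0xA9.
utf8_bytes = query.bytes

# A browser switched to Latin-1 sends é as the single byte 0xE9.
latin1_bytes = query.encode("ISO-8859-1").bytes

# The server expects UTF-8, so the raw bytes end up labelled as UTF-8.
received = latin1_bytes.pack("C*").force_encoding("UTF-8")

# 0xE9 followed by a space is not a valid UTF-8 sequence.
valid = received.valid_encoding?

# If the real encoding is known, the text can be recovered.
repaired = received.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts utf8_bytes[2, 2].inspect  # [195, 169]
puts latin1_bytes[2]           # 233
puts valid                     # false
puts repaired == query         # true
```

The recovery step only works when you already know the bytes are Latin-1, which the server generally cannot tell from the request alone. That is why forcing the browser to send UTF-8 in the first place is the more reliable fix.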