What's with the _snowman tag in Rails 3?
The _utf8 input tag forces Internet Explorer to properly respect your form's character encoding.
Rails sets the accept-charset attribute on your form element so that the browser submits the form as UTF-8 and the server can deal with Unicode characters (think of a user searching for café). Here's an example:
<form accept-charset='UTF-8' action='/posts' id='create-post' method='post'> … </form>
Unfortunately, IE5 through IE8 will not look at accept-charset unless at least one character in the form's values is not in the page's charset. Since the user can override the default charset at the browser level, Rails provides a hidden input containing a Unicode character (Rails' default encoding is UTF-8), forcing IE to look at accept-charset. The Unicode character is a snowman. Here's what the tag looks like:
<input name="_utf8" type="hidden" value="☃" />
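Why a snowman works can be sketched in a few lines of Ruby (1.9 or later). The snowman has no Latin-1 representation, so a form containing it always has a value outside a Latin-1 page charset, whereas a word like café does not. The helper name representable_in_latin1? is hypothetical and only for illustration; it is not part of Rails.

```ruby
SNOWMAN = "\u2603" # ☃, the value of the hidden _utf8 input

# Hypothetical helper: true if every character of text exists in Latin-1.
def representable_in_latin1?(text)
  text.encode(Encoding::ISO_8859_1)
  true
rescue Encoding::UndefinedConversionError
  false
end

representable_in_latin1?("caf\u00E9") # => true  (é is in Latin-1)
representable_in_latin1?(SNOWMAN)     # => false (no Latin-1 snowman)
```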
You can safely ignore params[:_utf8] in your Rails application.
The _utf8 tag might be rendered as _snowman depending on how recent your edge version of Rails is.
Why does it matter? What's the impact on my users?
From wycats, author of the patch:
This bug exists in IE5, IE6, IE7, and IE8. If the user switches the browser's encoding to Latin-1 (to understand why a user would decide to do something seemingly so crazy, check out this google search), any form submission will be sent in Latin-1.
This means that if a user searches for "Ché Guevara", it will come through incorrectly on the server-side. In Ruby 1.9, this will result in an encoding error when the text inevitably makes its way into the regular expression engine. In Ruby 1.8, it will result in broken results for the user.
By creating a parameter that can only be understood by IE as a unicode character, we are forcing IE to look at the accept-charset attribute, which then tells it to encode all of the characters as UTF-8, even ones that can be encoded in Latin-1.
Keep in mind that in Ruby 1.8, it is extremely trivial to get Latin-1 data into your UTF-8 database (since NOTHING in the entire stack checks that the bytes that the user sent at any point are valid UTF-8 characters). As a result, it's extremely common for Ruby applications (and PHP applications, etc. etc.) to exhibit this user-facing bug, and therefore extremely common for users to try to change the encoding as a palliative measure.
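To make the failure mode described in the quote concrete, here is a minimal Ruby sketch (Ruby 2.0 or later). It shows the bytes a browser would send for "Ché" as Latin-1 versus UTF-8, and one possible check a server could run before trusting them. The helper name normalize_param and its policy of treating invalid UTF-8 as Latin-1 are assumptions made for illustration; they are not part of Rails or of the patch.

```ruby
# "Ché" as typed by the user (\u00E9 is é).
term = "Ch\u00E9"

utf8_bytes   = term.bytes                      # [67, 104, 195, 169]
latin1_bytes = term.encode("ISO-8859-1").bytes # [67, 104, 233]

# A server that assumes UTF-8 labels the incoming Latin-1 bytes as UTF-8
# without checking them. The lone 0xE9 byte is not valid UTF-8.
received = term.encode("ISO-8859-1").force_encoding("UTF-8")
received.valid_encoding? # => false

# Hypothetical helper: accept valid UTF-8 as-is, otherwise assume the
# bytes were Latin-1 and transcode them. The Latin-1 fallback is a guess,
# not something the byte stream can prove.
def normalize_param(raw)
  utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

normalize_param("Ch\xE9".b) # => "Ché" (recovered from Latin-1 bytes)
```

With the hidden _utf8 input in place, IE sends UTF-8 to begin with, so a fallback like this should rarely be needed.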