I migrated from TextDrive to a Joyent Shared Accelerator, and in the process I had to migrate the WordPress MySQL database as well. After the migration, the “ and ’ characters were showing as â€œ and â€™ respectively. It was a charset problem: apparently the data itself was already UTF-8, stored in a Latin1 database (due to the WP default charset).
So I did the backup again, like this:
$ mysqldump --user=$DB1USER --password=$DB1PASSWD \
--default-character-set=latin1 $DB1NAME > dump.sql
and then I imported the dump.sql file again into the Joyent utf8 MySQL:
$ cat dump.sql |sed -e 's/DEFAULT CHARSET=latin1;/DEFAULT CHARSET=utf8;/'>dp2.sql
$ mysqldump --user=$DB2USER --password=$DB2PASSWORD --add-drop-table \
--no-data $DB2NAME | grep ^DROP | mysql --user=$DB2USER --password=$DB2PASSWORD \
$DB2NAME # to drop all existing tables
$ mysql --user=$DB2USER --password=$DB2PASSWORD $DB2NAME <dp2.sql
and the problem was solved!
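Before importing, it can help to confirm that the dump really contains UTF-8 bytes. The following is only a minimal sketch, not part of what I ran at the time: is_valid_utf8 is a hypothetical helper name, and it assumes an iconv that exits non-zero on invalid input (GNU and BSD iconv both do).

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
# Converting UTF-8 to UTF-8 changes nothing, but iconv still validates the input
# and exits with a non-zero status on the first illegal byte sequence.
is_valid_utf8() {
    iconv -f utf-8 -t utf-8 "$1" > /dev/null 2>&1
}

# Report on dump.sql when it is present in the current directory.
if [ -f dump.sql ]; then
    if is_valid_utf8 dump.sql; then
        echo "dump.sql: valid UTF-8"
    else
        echo "dump.sql: not valid UTF-8"
    fi
fi
```

A dump that contains raw Latin1 accented characters fails this check. Note the limits: a pure ASCII file passes too, and so does text that has already been double-encoded, so this only rules out a dump that is still raw Latin1.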
Then it got me thinking: how do you actually get from “Japonés en viñetas” to â€œJaponÃ©s en viÃ±etasâ€? So I tried to achieve the same result from the command line:
$ echo \“Japonés en viñetas\”
“Japonés en viñetas”
$ echo \“Japonés en viñetas\” |iconv -f latin1 -t utf-8
âJaponÃ©s en viÃ±etasâ
That’s not quite what I was expecting. Then I read the Wikipedia article on ISO-8859-1/Latin1 and found that Latin-1 is often confused with Windows-1252, and that “Many web browsers and e-mail clients will interpret ISO-8859-1 control codes as Windows-1252 characters in order to accommodate such mislabeling”.
So I tried it:
$ echo \“Japonés en viñetas\”
“Japonés en viñetas”
$ echo \“Japonés en viñetas\” |iconv -f windows-1252 -t utf-8
â€œJaponÃ©s en viÃ±etasâ€
iconv: (stdin):1:25: cannot convert
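There it is: the “ becomes â€œ when it is actually UTF-8 but misinterpreted as Windows-1252. The closing ” only gets as far as â€ because its third byte, 0x9D, has no character assigned in Windows-1252, which is why iconv stops with “cannot convert”.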
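To see what happens at the byte level, here is a small sketch that takes the three UTF-8 bytes of the opening curly quote (E2 80 9C), misreads them as Windows-1252 and as Latin1, and then reverses the misreading. The function names (utf8_quote, misread_as, undo_misread, hex) are made up for this illustration, and the bytes are written as octal escapes so the script does not depend on the encoding of the file it is saved in.

```shell
#!/bin/sh
# U+201C (left curly quote) encoded as UTF-8: bytes E2 80 9C, in octal escapes.
utf8_quote() { printf '\342\200\234'; }

# Treat the bytes on stdin as the given charset and re-encode them as UTF-8.
# Feeding UTF-8 bytes through this with the wrong charset produces the mojibake.
misread_as() { iconv -f "$1" -t utf-8; }

# Reverse step: encode the mojibake back into the wrongly assumed charset,
# which recovers the original UTF-8 bytes.
undo_misread() { iconv -f utf-8 -t "$1"; }

# Print stdin as a continuous lowercase hex string.
hex() { od -An -tx1 | tr -d ' \n'; }

echo "original:        $(utf8_quote | hex)"
echo "as windows-1252: $(utf8_quote | misread_as windows-1252 | hex)"
echo "as latin1:       $(utf8_quote | misread_as latin1 | hex)"
echo "round trip:      $(utf8_quote | misread_as windows-1252 | undo_misread windows-1252 | hex)"
```

With Windows-1252 the three bytes become â (c3 a2), € (e2 82 ac) and œ (c5 93). With Latin1 the second and third bytes map to the invisible control characters U+0080 and U+009C, which is why only the â was visible in the first attempt. The round trip only works for characters whose bytes are all defined in Windows-1252, so it would fail for the closing ” in the same way iconv did above.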