Type: Bug
Resolution: Unresolved
Priority: Major
The phpBB 2.0 converter should check for non-UTF-8 characters and convert them to UTF-8 as part of the conversion process. I perform these conversions for clients, have run into this issue before, and keep having to fix it after the fact. The symptoms vary, but most recently the problem showed up in search: the built-in XML parser could not parse the non-UTF-8 content, so the resulting document was not well-formed, and the search failed with an XML parsing error and an ugly error message.
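To make the failure mode concrete, here is a minimal Python sketch, assuming the post text reaches an XML parser as raw bytes (the helpers `is_valid_utf8` and `parses_as_xml` are hypothetical and not part of phpBB). A Latin-1 byte such as 0xE9 is not valid UTF-8 on its own, so a strict decode fails and Python's expat-based parser rejects the document as not well-formed, while the same text encoded as UTF-8 passes both checks.

```python
import xml.etree.ElementTree as ET


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def parses_as_xml(body: bytes) -> bool:
    """Wrap the bytes in an element and report whether the XML parser accepts them."""
    try:
        # With no XML declaration, the parser assumes UTF-8 input.
        ET.fromstring(b"<post>" + body + b"</post>")
    except ET.ParseError:
        return False
    return True


# "café" encoded as Latin-1 ends in the single byte 0xE9, which is not valid UTF-8.
latin1_text = "café".encode("latin-1")
utf8_text = "café".encode("utf-8")

print(is_valid_utf8(latin1_text), parses_as_xml(latin1_text))  # False False
print(is_valid_utf8(utf8_text), parses_as_xml(utf8_text))      # True True
```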
Doing this conversion in SQL may not be possible with every DBMS you support, but it can certainly be done with MySQL/MariaDB, which account for most of your user base. Here is an example that works:
UPDATE phpbb_posts SET post_text = CONVERT(CAST(CONVERT(post_text USING latin1) AS BINARY) USING utf8);
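The UPDATE above reinterprets text that was stored with one layer of double encoding: it turns each character back into a single Latin-1 byte and then reads those bytes as UTF-8. The following Python function is a sketch of the same round trip (the name `repair_double_encoded` is hypothetical). Unlike the unconditional UPDATE, it returns the value unchanged when the round trip fails, which is one way to check for bad data before converting. Note that Python's `latin-1` codec is ISO-8859-1, whereas MySQL's `latin1` is closer to cp1252, so results can differ for bytes 0x80 to 0x9F.

```python
def repair_double_encoded(text: str) -> str:
    """Undo one layer of Latin-1 mojibake, mirroring
    CONVERT(CAST(CONVERT(col USING latin1) AS BINARY) USING utf8).

    Values for which the round trip fails are returned unchanged, so text
    that is already correct is left alone.
    """
    try:
        # CONVERT(... USING latin1) + CAST(... AS BINARY): recover the raw bytes.
        raw = text.encode("latin-1")
    except UnicodeEncodeError:
        # Characters outside Latin-1: this value was not double-encoded this way.
        return text
    try:
        # CONVERT(... USING utf8): reinterpret those bytes as UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # The bytes are not valid UTF-8, so the value was genuine Latin-1 text.
        return text


print(repair_double_encoded("cafÃ©"))  # café
print(repair_double_encoded("café"))   # café (left unchanged)
```

This is a heuristic: a string that legitimately contains a sequence like "Ã©" would also be rewritten, so spot-checking converted rows is still worthwhile.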
I recommend running the following columns through this conversion:
- phpbb_posts.post_subject
- phpbb_posts.post_text
- phpbb_topics.topic_title
- phpbb_topics.topic_last_post_subject
- phpbb_topics.poll_title
- phpbb_privmsgs.message_subject
- phpbb_privmsgs.message_text
- phpbb_forums.forum_name
- phpbb_forums.forum_desc
Other columns may need this too; use your judgment.
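Because the same statement has to be repeated for every column in the list, a converter could generate it instead of hard-coding each one. A minimal Python sketch, assuming the names passed in already include the default `phpbb_` table prefix (installs with a custom prefix would pass their own table names); `build_update` is a hypothetical helper. Identifiers are validated with a regular expression because table and column names cannot be passed as bound query parameters.

```python
import re

# Accept only plain SQL identifiers, since names are interpolated into the statement.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_update(qualified_column: str) -> str:
    """Turn 'table.column' into the MySQL repair statement from this report."""
    table, column = qualified_column.split(".")
    for name in (table, column):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return (
        f"UPDATE {table} SET {column} = "
        f"CONVERT(CAST(CONVERT({column} USING latin1) AS BINARY) USING utf8);"
    )


for col in ["phpbb_posts.post_text", "phpbb_topics.topic_title"]:
    print(build_update(col))
```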