phpBB / PHPBB-10921

Second ordered pair from confusables.php is reversing a valid clean character when run through utf8_clean_string()


    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major
    • 3.1.12-RC1, 3.2.2-RC1
    • 3.0.11-RC1, 3.1.0-dev
    • Other
    • None

      The second pair in confusables.php should not be part of the array because it is the inverse of the 1309th pair. As a result, the two pairs undo each other, depending on which of the "from" values is present in the string provided. The 1309th pair is the valid one of the two: its replacement value is also used for other exclamation point lookalikes, for example in pairs 1304 ('⁈'=>'ʔǃ'), 1310 ('!'=>'ǃ'), 1311 ('⁉'=>'ǃʔ') and 1312 ('‼'=>'ǃǃ'). To clarify, the KEY in the second pair is the clean string.
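To illustrate the effect described above, here is a minimal sketch that does not load phpBB at all. It assumes the confusables map is applied with a single strtr() call, and it uses a hypothetical two-entry array in which the counterpart of 'ǃ' (U+01C3) is the ASCII '!'. Both the array and that choice of character are assumptions made for illustration, not the contents of the real file. The "\u{...}" escapes need PHP 7 or later.

```php
<?php
// Hypothetical stand-in for the confusables map (assumption: the real
// pairs 2 and 1309 behave like these two entries).
$confusables = array(
    "\u{01C3}" => '!',        // stand-in for pair 2: the key is the clean form
    '!'        => "\u{01C3}", // stand-in for pair 1309: the inverse mapping
);

// strtr() with an array replaces each match once and does not rescan
// the text it has just inserted.
$cleaned_once  = strtr('Hi!', $confusables);
$cleaned_twice = strtr($cleaned_once, $confusables);

var_dump($cleaned_once === "Hi\u{01C3}"); // first pass maps '!' to U+01C3
var_dump($cleaned_twice === 'Hi!');       // second pass maps it straight back
```

Under these assumptions, an already clean string gets flipped back to its lookalike, so cleaning the same value twice does not give a stable result.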

      Here's a quick script that I wrote to prove that it's happening:

      <?php

      define('IN_PHPBB', true);
      $phpbb_root_path = './';
      $phpEx = substr(strrchr(__FILE__, '.'), 1);
      include($phpbb_root_path . 'common.' . $phpEx);

      $homographs = include($phpbb_root_path . 'includes/utf/data/confusables.' . $phpEx);
      $i = 1;
      foreach ($homographs as $from => $to)
      {
          // Is the value getting reversed?
          if (utf8_clean_string($to) == $from)
          {
              echo "Reversal happening at pair $i- $from : $to <br />";
          }
          // Check if value from pair 1309 exists
          if (utf8_strpos($to, 'ǃ') !== false)
          {
              echo "1309 value appears at $i <br />";
          }
          $i++;
      }
      ?>
      

      The script also reports a reversal at pair 518, but that seems to be because the key and the value of that pair are the same character?
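To tell the two cases apart (a pair whose key equals its value versus two pairs that are exact inverses of each other), a static scan of the array might help. The sketch below is only an illustration: find_suspect_pairs() is a hypothetical helper, not a phpBB function, and $sample is made-up data rather than the real confusables map. It compares raw keys and values only, so it does not account for the case folding and normalisation that utf8_clean_string() may also apply.

```php
<?php
/**
 * Hypothetical helper (not part of phpBB): report identity pairs and
 * pairs that are exact inverses of each other, using 1-based positions.
 */
function find_suspect_pairs(array $map)
{
    $identity = array();
    $inverse = array();
    $positions = array();
    $i = 1;

    foreach ($map as $from => $to)
    {
        // PHP turns numeric-string keys into integers; cast back to string.
        $from = (string) $from;
        $positions[$from] = $i;

        if ($from === $to)
        {
            // Key and value are the same character(s).
            $identity[] = $i;
        }
        else if (isset($map[$to]) && (string) $map[$to] === $from && isset($positions[$to]))
        {
            // The inverse was seen earlier: record (earlier, later).
            $inverse[] = array($positions[$to], $i);
        }
        $i++;
    }

    return array('identity' => $identity, 'inverse' => $inverse);
}

// Made-up sample data, for illustration only.
$sample = array(
    "\u{01C3}" => '!',
    'a'        => 'b',
    'x'        => 'x',
    '!'        => "\u{01C3}",
);
$result = find_suspect_pairs($sample);
print_r($result);
```

Run against the array returned by confusables.php, the 'identity' list would be expected to contain 518 if that pair really maps a character to itself, and the 'inverse' list would be expected to contain the pair (2, 1309) if the report above is right.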

            CHItA
            prototech (Inactive)