As of today, there is no widely available software that can recognize ancient characters, such as ancient Chinese characters in seal script or oracle bone script, ancient Egyptian hieroglyphs, Maya script, etc. Most of these scripts are not encoded in Unicode (Egyptian hieroglyphs, which do have a Unicode block, are an exception), so in practice they remain as images. A human has to read and interpret them, which takes years of practice and is error-prone. Since OCR (optical character recognition) and even facial recognition have become commonplace, ancient character recognition should be achievable without too much modification of existing technology. Once that's done, a Web site can be set up for the convenience of scholars and hobbyists alike.
Beyond recognition itself, software that can recognize ancient characters could be extended to produce a "Levenshtein distance"-like metric or index. There is at least one use case for such a metric. Scholars studying ancient Chinese characters know that before Qin Shi Huang (literally, first emperor of Qin) united China, the same character was written in different styles in different states. Researchers judge the similarity of these styles and group the states accordingly. But this human judgement is inevitably subjective and varies from person to person. Software-assisted similarity judgement would be a great step toward standardization. Levenshtein distance works on words composed of letters: it counts the minimum number of single-letter insertions, deletions, and substitutions needed to turn one word into another. But the concept could be extended to glyphs if a computer scientist cooperated in this research.
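To make the idea a little more concrete, here is a minimal Python sketch of how Levenshtein distance might carry over from letters to glyphs. It assumes, purely for illustration, that each glyph variant has already been reduced to an ordered sequence of stroke labels; that encoding, the label names, the two sample variants, and the function names are all hypothetical and not an established method. A real system would more likely have to compare image features, which this sketch does not attempt.

```python
from typing import Hashable, Sequence


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance between two sequences of tokens.

    Works on strings (tokens are letters) and equally on tuples of
    stroke labels (tokens are strokes).
    """
    # prev[j] is the distance between the first i-1 tokens of a
    # and the first j tokens of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning i tokens of a into an empty sequence
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # substitute x with y (or match)
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Scale the distance to the range 0..1 so glyphs of different
    complexity can be compared; 0 means identical sequences."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


# Hypothetical stroke sequences for one character as written in two states.
# The labels are made up for this example.
variant_state_a = ("horizontal", "vertical", "horizontal", "dot")
variant_state_b = ("horizontal", "vertical", "curve", "dot", "dot")

if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(edit_distance(variant_state_a, variant_state_b))
    print(normalized_distance(variant_state_a, variant_state_b))
```

Averaging such a normalized distance over many characters would give one number per pair of states, which could then be used to group the states. Whether a stroke sequence is the right way to represent a glyph is exactly the kind of question that needs a computer scientist and a paleographer working together.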