text-scrambler
Using Unicode confusable characters and a few other tricks, we can transform a text into another one that looks exactly the same to a human reader but is different from a machine's point of view.
Examples
Randomly replacing Latin characters with Greek or Cyrillic look-alike letters and inserting zero-width joiners and non-joiners, ZW(N)J.
Original text:
Herman Melville (August 1, 1819 – September 28, 1891) was an American novelist, short story writer, and poet of the American Renaissance period. Among his best-known works are Moby-Dick (1851), Typee (1846), a romanticized account of his experiences in Polynesia, and Billy Budd, Sailor, a posthumously published novella. Although his reputation was not high at the time of his death, the centennial of his birth in 1919 was the starting point of a Melville revival and Moby-Dick grew to be considered one of the great American novels.
Scrambled text with ZW(N)J added (it looks the same but is totally different):
Hâeârâmâaânâ âMâeâlâvâiâlâlâeâ â(âAâuâgâuâsâtâ â1â,â â1â8â1â9â âââ âSâeâpâtâeâmâbâeârâ â2â8â,â â1â8â9â1â)â âwâaâsâ âaânâ âAâmâeârâiâcâaânâ ânâoâvâeâlâiâsâtâ,â âsâhâoârâtâ âsâtâoârâyâ âwârâiâtâeârâ,â âaânâdâ âpâoâeâtâ âoâfâ âtâhâeâ âAâmâeârâiâcâaânâ âRâeânâaâiâsâsâaânâcâeâ âpâeârâiâoâdâ.â âAâmâoânâgâ âhâiâsâ âbâeâsâtâ-âkânâoâwânâ âwâoârâkâsâ âaârâeâ âMâoâbâyâ-âDâiâcâkâ â(â1â8â5â1â)â,â âTâyâpâeâeâ â(â1â8â4â6â)â,â âaâ ârâoâmâaânâtâiâcâiâzâeâdâ âaâcâcâoâuânâtâ âoâfâ âhâiâsâ âeâxâpâeârâiâeânâcâeâsâ âiânâ âPâoâlâyânâeâsâiâaâ,â âaânâdâ âBâiâlâlâyâ âBâuâdâdâ,â âSâaâiâlâoârâ,â âaâ âpâoâsâtâhâuâmâoâuâsâlâyâ âpâuâbâlâiâsâhâeâdâ ânâoâvâeâlâlâaâ.â âAâlâtâhâoâuâgâhâ âhâiâsâ ârâeâpâuâtâaâtâiâoânâ âwâaâsâ ânâoâtâ âhâiâgâhâ âaâtâ âtâhâeâ âtâiâmâeâ âoâfâ âhâiâsâ âdâeâaâtâhâ,â âtâhâeâ âcâeânâtâeânânâiâaâlâ âoâfâ âhâiâsâ âbâiârâtâhâ âiânâ â1â9â1â9â âwâaâsâ âtâhâeâ âsâtâaârâtâiânâgâ âpâoâiânâtâ âoâfâ âaâ âMâeâlâvâiâlâlâeâ ârâeâvâiâvâaâlâ âaânâdâ âMâoâbâyâ-âDâiâcâkâ âgârâeâwâ âtâoâ âbâeâ âcâoânâsâiâdâeârâeâdâ âoânâeâ âoâfâ âtâhâeâ âgârâeâaâtâ âAâmâeârâiâcâaânâ ânâoâvâeâlâsâ.
Scrambled text with Latin letters replaced by their Cyrillic/Greek equivalents:
Ðеrman MelvÑllе (ÎuguÑt 1, 1819 â Septеmber 28, 1891) wÐ°Ñ an ÐmеrÑÑаn nοvеlist, shοrt story writеr, and poеt Пf the Americаn RеnaÑssanÑe pеriПd. AmПng his bеst-known works arе Îoby-DÑÑk (1851), ΀yÑеe (1846), a rПmаnticizеd accПunt Пf hÑs eÑ ÑerÑencеs in ΡПlynеÑiа, аnd Ðilly Budd, РаÑlοr, а pοsthumously ÑublÑshed nПvеllа. Although hiÑ reputation was nοt hÑgh at thе tÑme οf hÑÑ dеаth, the ÑentennÑаl Пf hÑs bÑrth in 1919 waÑ thе stаrting point οf a Îelville revival and Îοby-Dick grew tο bе ÑПnÑidеrеd οne Пf the great American novels.
Scrambled text with both changes:
Hâeârâmâaânâ âÎâeâlâvâÑâlâlâеâ â(âÐâuâgâuâÑâtâ â1â,â â1â8â1â9â âââ âSâeâpâtâeâmâbâeârâ â2â8â,â â1â8â9â1â)â âwâaâÑâ âaânâ âÎâmâeârâiâÑâaânâ ânâoâvâeâlâiâÑâtâ,â âsâhâοârâtâ âÑâtâοârâyâ âwârâiâtâеârâ,â âаânâdâ âpâПâеâtâ âοâfâ âtâhâeâ âÐâmâeârâÑâcâаânâ âRâеânâaâiâsâsâaânâÑâеâ âpâеârâiâПâdâ.â âAâmâοânâgâ âhâiâÑâ âbâеâsâtâ-âkânâοâwânâ âwâПârâkâÑâ âaârâеâ âMâПâbâyâ-âDâiâcâkâ â(â1â8â5â1â)â,â âTâyâpâеâеâ â(â1â8â4â6â)â,â âaâ ârâοâmâаânâtâÑâÑâÑâzâeâdâ âaâÑâcâПâuânâtâ âοâfâ âhâÑâsâ âeâxâÑâeârâÑâеânâcâeâsâ âÑânâ âÐ âПâlâyânâеâsâÑâаâ,â âaânâdâ âÐâiâlâlâyâ âÎâuâdâdâ,â âÐ âаâiâlâοârâ,â âaâ âpâοâÑâtâhâuâmâПâuâÑâlâyâ âpâuâbâlâiâÑâhâеâdâ ânâПâvâеâlâlâaâ.â âAâlâtâhâПâuâgâhâ âhâiâsâ ârâeâÑâuâtâaâtâÑâoânâ âwâаâÑâ ânâПâtâ âhâÑâgâhâ âаâtâ âtâhâеâ âtâÑâmâeâ âoâfâ âhâÑâsâ âdâeâаâtâhâ,â âtâhâеâ âÑâeânâtâeânânâÑâaâlâ âoâfâ âhâÑâÑâ âbâiârâtâhâ âÑânâ â1â9â1â9â âwâаâsâ âtâhâeâ âsâtâаârâtâÑânâgâ âÑâοâÑânâtâ âοâfâ âaâ âÎâeâlâvâiâlâlâеâ ârâеâvâiâvâаâlâ âaânâdâ âÐâoâbâyâ-âDâÑâÑâkâ âgârâеâwâ âtâοâ âbâeâ âÑâoânâsâiâdâeârâeâdâ âПânâeâ âoâfâ âtâhâеâ âgârâeâаâtâ âÐâmâеârâiâÑâаânâ ânâoâvâeâlâsâ.
Note that search engines cannot match the scrambled text to the original web page, and neither can free online plagiarism checkers. For example, searching Google for Μelvillе (written with Greek and Cyrillic look-alike letters; copy and paste it) returns no match, whereas the original word Melville does.
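The two tricks shown in the examples above can be sketched with the standard library alone. This is a minimal illustration and not text-scrambler's actual implementation: the `LOOKALIKES` table (a tiny hand-picked subset of the confusables) and the helper names `insert_zero_width`, `swap_lookalikes` and `strip_zero_width` are assumptions made for this example.

```python
import random

ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Hypothetical, hand-picked look-alike table; the real confusables list is far larger.
LOOKALIKES = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u03bf",  # GREEK SMALL LETTER OMICRON
    "s": "\u0455",  # CYRILLIC SMALL LETTER DZE
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
    "M": "\u039c",  # GREEK CAPITAL LETTER MU
    "T": "\u0422",  # CYRILLIC CAPITAL LETTER TE
}


def insert_zero_width(text, rng):
    """Put a randomly chosen ZWJ or ZWNJ between every pair of characters."""
    if not text:
        return text
    pieces = [text[0]]
    for ch in text[1:]:
        pieces.append(rng.choice((ZWJ, ZWNJ)))
        pieces.append(ch)
    return "".join(pieces)


def swap_lookalikes(text, rng, probability=0.5):
    """Replace each letter found in LOOKALIKES with the given probability."""
    return "".join(
        LOOKALIKES[ch] if ch in LOOKALIKES and rng.random() < probability else ch
        for ch in text
    )


def strip_zero_width(text):
    """Undo insert_zero_width by removing every ZWJ and ZWNJ."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")


rng = random.Random(0)  # seeded only to make the sketch reproducible
original = "Melville"
with_zw = insert_zero_width(original, rng)
swapped = swap_lookalikes(original, rng, probability=1.0)

print(len(original), len(with_zw))  # 8 15
print(original in with_zw)          # False: a plain substring search no longer matches
print(swapped == original)          # False, although both render almost identically
```

A plain substring search for `Melville` fails on both variants, which is consistent with the search engine behaviour described above. Removing the zero-width characters with `strip_zero_width` restores the original string, whereas reversing the letter swap would need the inverse of the look-alike table.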
Using all of the Unicode confusable characters (see the list of confusables linked below), we can generate weird-looking text worthy of old spam messages:
ð®âðâð£âmâðªânâ âð¡âÒœâðââšâðªâðâðºâð®â ââðâðâð°âêâð£âtâ â1â,â â1â8â1ââ³â âââ âáâðâðºâðâðŸâmâÆâð¢âð¯â âƧâðâê¹â â1âà¬âð¿â1ââ âðžâðâðâ âðºâð«â âÎâmââ¯âð¯âð²âꮯâð¶âð·â ânâàŽâðŒâð¢âðžâïœâsâðâØâ âðâðâ꬜âêâðœâ âðŒâðââ²ârâð£â âð°âð»âÑâðâеâð£âÙ«â âαâðâðâ âð¥âðâïœ âð¥â âﮚâfâ âðµâïœâð²â âÎâmâðâð«âêâðžâïœânâ âðŒµâðŠâðâðâðŸâð âð£âð¶âðâð°âðâ âðâðŸârââ³âﮫâá¯âð©â âÎâmâïœâðâðâ âð±âá¥âð¬â âáâðâðâð¥âÛâðâðâïœâð€âð§â âðâПâê®âð€âðâ âð¶âð¿âðŸâ âðžâà»âáâð®âⲺâð£âðâð âðâ âãâ1âðªâ5â1âãâê¹â âð³âðâð¹âðŠâðâ âãâ1âð¯žâðâ6ââ³âê¹â âðâ âð£â꬜âmââºâð¯âðµâÑâꮯâðâð³ââ âðâ âðâïœâáŽâá¿âðâðâðâ âð¹€âð£â âðâÓâðâ âðâð¥âð¥âð¢âð¿âêâeâð·âïœââ®âꮪâ âðâðâ âðâð°âÓâγâð·âðŸâð°âðâð®âØâ âðŒâð«âðâ âð âá¥âðâlâðâ âðâð®âðâð¹âââ âáâаâêâðâðâðâ,â âαâ âðâ꬜âðâðœâÒ»âðâmâðºâáŽâð°âð¹âðŠâ âðâáŽâáâðâðâsâïœâð²âêâ âðâðâðâðâðâ×âðªâêâ âðœâð€âð¡âÒ»âð€âð¢âÖâð©â âðâιâÑâ âðâðâð âðâðâðªâð©âɪâﮚâð·â âðâðºâsâ âð¯âð¹€âðâ âð¡âðâá¶âðâ âðâðâ âð©âïœâꬲâ âðâðŠâmâеâ âðŒâáºâ âââıâÆœâ âðâðâðâðâð¥âê¹â âð©âáâꬲâ âð°ââ âð»âð±âðâðânâðâðâð â âﻫâð§â âðœâðâðŽâ âbâıâðâðœâð©â âïœâð§â â1âð£â1âðµâ âðâαâðâ âðâð¡âÒœâ âð€âð¡âðârâðâá¥âðâá¶â âðâסâðŸâð»âðâ âðâðâ âðâ âêâðââµâðâËâÐâðâÒœâ âð¯âðâïœâð²âðâðâlâ âÉâð¯âðœâ âðâà¶âðâð¢âââð·âͺâðâðžâ âð°âêâеâᎡâ âðâï®â âá²ââ¯â âïœââŽâð£âð°âð£âdââ âð¯ââ®ââ Ÿâ âﻬâðââ®â âà©Šâðâ âð©âð¥âð¢â âðâê®ââ¯âð¶âð©â âðâmâðŠâᎊâðŸâðâðâð§â âðâïœâð¿ââ â|âðâêž
Full documentation at https://text-scrambler.readthedocs.io
Installation
pip install text-scrambler
Quickstart
Python
>>> from text_scrambler import Scrambler
>>> scr = Scrambler()
>>> text = "This is an example"
>>> text_1 = scr.scramble(text, level=1)
>>> #############
>>> # adding only zwj/zwnj characters
>>> print(text, text_1, sep="\n")
This is an example
Tâhâiâsâ âiâsâ âaânâ âeâxâaâmâpâlâe
>>> assert text != text_1
>>> print(len(text), len(text_1))
18 35
>>> # though the texts look similar, the second one has more characters
>>> #############
>>> text_2 = scr.scramble(text, level=2)
>>> # replacing some latin letters by their cyrillic/greek equivalent
>>> print(text_2)
Тhiѕ iѕ an ехаmple
>>> for char, char_2 in zip(text, text_2):
...     if char != char_2:
...         print(char, char_2)
...
T Т
s ѕ
s ѕ
e е
x х
a а
>>> #############
>>> text_3 = scr.scramble(text, level=3)
>>> # adding zwj/zwnj characters and replacing latin letters
>>> print(text_3)
TâhâÑâsâ âiâÑâ âаânâ âeâÑ âаâmâpâlâe
>>> print(text, text_3, sep="\n")
This is an example
TâhâÑâsâ âiâÑâ âаânâ âeâÑ âаâmâpâlâe
>>> assert text_3 != text
>>> #############
>>> text_4 = scr.scramble(text, level=4)
>>> # replacing all characters by any unicode looking like character
>>> print(text_4)
â€âðœâð¢âðŽâ âðªâðšâ âðªâÕŒâ âðâ⚯âðâmâÏâðâÒœ
>>> #
>>> # generating several versions
>>> versions = scr.generate(text, 10, level=4)
>>> for txt in versions:
...     print(txt)
...
ðâðµâðâð°â âðâÑâ âÉâðâ âꬲâðâðâmâð ââ²âð
ðâÒ»âð£âÆœâ âËâê±â âðŒâð§â âðâðâðŒâmâðâðââ¯
âïœâð²âð°â ââ âð°â âαânâ âꬲâ‬âαâmââŽâðžâïœ
ð§âðµâiâð â âïœâðâ ââºâð¯â âð²âðâаâmâð±âðžâð¢
â€âðâðâïœâ âɪâðâ âð¶âðâ âðŸâð âð¶âmâðâðâð®
ðâïœâðâꮪâ ââ âðâ âð®âð§â âꬲâáœâðªâmâðââœâð®
ðâðâÑâðâ âıâê±â âðâðâ âð²âð©âðâmâÑâðâð
ð¿âáââ¹âð¬â âð¶âðâ âðŒâð«â âð²âð±âðªâmâðâð¡âð
ââïœâðŸâꮪâ âïœâðŽâ âð°âðâ âðâáœâð®âmâðœâðâð²
ð³âÕ°âðâsâ âðâðšâ âðâðâ âðŠâðâаâmâðâðâð²
>>> versions = scr.generate(text, 1000, level=1)
>>> assert len(versions) == len(set(versions))
>>> # all unique
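To see why two strings that render alike compare as different, it can help to list their code points. The sketch below uses only the standard `unicodedata` module; the helper `describe` and the hand-built sample strings are assumptions for illustration and are not produced by text-scrambler.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) for every character of the string."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


plain = "is"
# Hand-built stand-in for a scrambled string: 'i', ZWJ, Cyrillic look-alike of 's'
scrambled = "i\u200d\u0455"

for code, name in describe(scrambled):
    print(code, name)
# U+0069 LATIN SMALL LETTER I
# U+200D ZERO WIDTH JOINER
# U+0455 CYRILLIC SMALL LETTER DZE

# A set compares strings by exact code points, so visually identical
# variants still count as distinct values.
variants = [plain, scrambled, "i\u200c\u0455"]
print(len(variants) == len(set(variants)))  # True
```

This is the same kind of check as the `len(versions) == len(set(versions))` assertion in the quickstart: uniqueness is decided on code points, not on how the text is displayed.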
Command line interface (CLI)
To scramble the contents of a UTF-8 encoded file from the command line, run
$ python -m text_scrambler
usage: Usage : python -m text_scrambler file

Replace/insert the charaters of the file using the unicode confusable characters

positional arguments:
  file                  encoded in UTF-8

optional arguments:
  -h, --help            show this help message and exit
  -l LEVEL, --level LEVEL
                        1: insert non printable characters within the text
                        2: replace some latin letters to their Greek or Cyrillic equivalent
                        3: insert non printable characters and change the some latin to their Greek or Cyrillic equivalent
                        4: insert non printable chraracters change all possible letter to a randomly picked unicode letter equivalent
                        default=1
  -n N, --generate N    Scramble n times the string
                        default=1
Links
See https://en.wikipedia.org/wiki/Word_joiner for more info on word joiners
See https://unix.stackexchange.com/questions/469347/using-uniq-on-unicode-text for why the sort command is not a reliable way to check the uniqueness of these strings
See http://www.unicode.org/Public/security/revision-03/confusablesSummary.txt for the complete list of confusable characters.