Korean Noise Generator


License
MIT
Install
pip install konoise==1.0.8.5

Documentation

Korean Noise Addition (konoise)

konoise is a Python library that helps add noise to Korean text.

Support

  • manylinux (latest version, 1.7.5)
  • windows, macOS (old version)

Installation

$ pip install konoise

Usage

from konoise import NoiseGenerator

text = "행복한 가정은 모두가 닮았지만, 불행한 가정은 모두 저마다의 이유로 불행하다."
generator = NoiseGenerator(num_cores=8)
text = generator.generate(text, methods='disattach-letters', prob=1., delimeter='newline')
text
>>> 행복한 ㄱㅏ정은 모두ㄱㅏ 닮았ㅈㅣ만, 불행한 ㄱㅏ정은 모두 ㅈㅓㅁㅏㄷㅏ의 ㅇㅣ유로 불행ㅎㅏㄷㅏ.
  • text: The text to add noise to.

  • methods: The noise generation method(s) to use (see the available methods below, default:).

  • prob: The probability of generating noise (applied per delimeter unit; a real number between 0 and 1).

  • delimeter: The unit by which noise and multiprocessing are applied ('total': the whole text, 'newline': line break (\n), 'sentence': sentence).

  • use_rust_tokenizer: Whether to use the Rust-based noise generator.

Noise generation methods

The following eight noise generation methods are implemented.

'disattach-letters': disattach_letters,
'change-vowels': change_vowels,
'palatalization': partial(phonetic_change, func='palatalization'),
'linking': partial(phonetic_change, func='linking'),
'liquidization': partial(phonetic_change, func='liquidization'),
'nasalization': partial(phonetic_change, func='nasalization'),
'assimilation': partial(phonetic_change, func='assimilation'),
'yamin-jungum': yamin_jungum
  • Multiple methods can be used together by separating them with commas (,).

[disattach-letters] Adds noise by jamo separation (alphabet separation): a syllable is split into its consonant and vowel. For readability, this is applied only when the syllable has no final consonant (jongseong) and its medial vowel (jungseong) is not one of 'ㅘ', 'ㅙ', 'ㅚ', 'ㅛ', 'ㅜ', 'ㅝ', 'ㅞ', 'ㅟ', 'ㅠ', 'ㅡ', 'ㅢ', 'ㅗ' (e.g. 안녕하세요 > 안녕ㅎㅏㅅㅔ요).
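The rule above can be sketched with plain Unicode arithmetic on precomposed Hangul syllables. This is only an illustration of the described behavior, not konoise's actual implementation; the names `disattach_syllable` and `disattach_letters_sketch` are hypothetical, and the sketch applies the rule to every eligible syllable (as with prob=1).

```python
# Illustrative sketch only; not the library's own code.
HANGUL_BASE = 0xAC00  # code point of '가', the first precomposed syllable
SYLLABLE_COUNT = 11172  # 19 initials * 21 medials * 28 finals
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 medial vowels
# Medial vowels that are left untouched for readability (from the text above).
EXCLUDED_VOWELS = set("ㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅗ")


def disattach_syllable(ch: str) -> str:
    """Split one syllable into initial + medial jamo when the rule allows it."""
    offset = ord(ch) - HANGUL_BASE
    if not 0 <= offset < SYLLABLE_COUNT:
        return ch  # not a precomposed Hangul syllable
    initial, rest = divmod(offset, 21 * 28)
    medial, final = divmod(rest, 28)
    if final != 0 or JUNGSEONG[medial] in EXCLUDED_VOWELS:
        return ch  # has a final consonant, or the vowel is excluded
    return CHOSEONG[initial] + JUNGSEONG[medial]


def disattach_letters_sketch(text: str) -> str:
    """Apply the rule to every character of the text."""
    return "".join(disattach_syllable(ch) for ch in text)


print(disattach_letters_sketch("안녕하세요"))
```

Syllables with a final consonant (안, 녕) and syllables whose vowel is in the excluded set (요) pass through unchanged, which is why only 하 and 세 are split in the example.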

[change-vowels] Adds noise by vowel modification: the vowel of a syllable is changed. For readability, this is applied only when the syllable has no final consonant (jongseong) and its medial vowel (jungseong) is one of 'ㅏ', 'ㅑ', 'ㅓ', 'ㅕ', 'ㅗ', 'ㅛ', 'ㅜ', 'ㅠ' (e.g. 안녕하세요 > 안녕햐세오).
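A minimal sketch of this rule follows. The text above does not say which vowel each one is changed to, so the pairwise swap table `VOWEL_SWAP` is an assumption inferred from the examples in this README (하 > 햐, 요 > 오); the function names are hypothetical and this is not konoise's actual code.

```python
# Illustrative sketch only; the swap table is an assumption, not library code.
HANGUL_BASE = 0xAC00  # code point of '가', the first precomposed syllable
SYLLABLE_COUNT = 11172  # 19 initials * 21 medials * 28 finals
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 medial vowels
# Assumed mapping: each eligible vowel is swapped with its y-glide partner.
VOWEL_SWAP = {
    "ㅏ": "ㅑ", "ㅑ": "ㅏ",
    "ㅓ": "ㅕ", "ㅕ": "ㅓ",
    "ㅗ": "ㅛ", "ㅛ": "ㅗ",
    "ㅜ": "ㅠ", "ㅠ": "ㅜ",
}


def change_vowel(ch: str) -> str:
    """Swap the medial vowel of one syllable when the rule allows it."""
    offset = ord(ch) - HANGUL_BASE
    if not 0 <= offset < SYLLABLE_COUNT:
        return ch  # not a precomposed Hangul syllable
    initial, rest = divmod(offset, 21 * 28)
    medial, final = divmod(rest, 28)
    target = VOWEL_SWAP.get(JUNGSEONG[medial])
    if final != 0 or target is None:
        return ch  # has a final consonant, or the vowel is not eligible
    # Recompose the syllable with the new medial vowel and no final consonant.
    return chr(HANGUL_BASE + initial * 21 * 28 + JUNGSEONG.index(target) * 28)


def change_vowels_sketch(text: str) -> str:
    """Apply the rule to every character of the text."""
    return "".join(change_vowel(ch) for ch in text)


print(change_vowels_sketch("안녕하세요"))
```

Because the result is recomposed into a single syllable, the output stays the same length as the input, unlike jamo separation.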

[palatalization] Implements palatalization (구개음화), one of the Korean phonological changes.

[linking] Implements linking (연음), one of the Korean phonological changes.

[liquidization] Implements liquidization (유음화), one of the Korean phonological changes.

[nasalization] Implements nasalization (비음화), one of the Korean phonological changes.

[assimilation] Implements phonological assimilation (음운동화), one of the Korean phonological changes.

[yamin-jungum] Converts some characters to Yamin Jeongeum (야민정음). Some expressions that would hurt readability are excluded (귀여워 > 커여워).

Transformation examples

[original] 행복한 가정은 모두가 닮았지만, 불행한 가정은 모두 저마다의 이유로 불행하다.

[disattach-letters, prob=1.] 행복한 ㄱㅏ정은 모두ㄱㅏ 닮았ㅈㅣ만, 불행한 ㄱㅏ정은 모두 ㅈㅓㅁㅏㄷㅏ의 ㅇㅣ유로 불행ㅎㅏㄷㅏ.

[change-vowels, prob=1.] 행복한 갸정은 묘듀갸 닮았지만, 불행한 갸정은 묘듀 져먀댜의 이우료 불행햐댜.

[palatalization, prob=1.] 행보칸 가정은 모두가 닮았지만, 불행한 가정은 모두 저마다의 이유로 불행하다.

[linking, prob=1.] 행복한 가정은 모두가 달맜지만, 불행한 가정은 모두 저마다의 이유로 불행하다.

[liquidization, prob=1.] 행복한 가정은 모두가 닮았지만, 불행한 가정은 모두 저마다의 이유로 불행하다. (No Change)

[nasalization, prob=1.] 행복한 가정은 모두가 닮았지만, 불행한 가정은 모두 저마다의 이유로 불행하다. (No Change)

[assimilation, prob=1.] 행복한 가정은 모두가 닮았지만, 불행한 가정은 모두 저마다의 이유로 불행하다. (No Change)

[yamin-jungum, prob=1.] 행복한 가정은 모두가 닮았지만, 불행한 가정은 모두 저마다의 이윾로 불행하다.

Noise Generator in Rust

  1. Use within the noise generator
    from konoise import NoiseGenerator

    # The Rust generator provides the same methods, except 'yamin-jungum'.
    # If the string 'yamin-jungum' is passed, it might be applied
    # with the Python generator even if 'use_rust_tokenizer' is True.

    text = "행복한 가정은 모두가 닮았지만, 불행한 가정은 모두 저마다의 이유로 불행하다."
    generator = NoiseGenerator()
    generator.generate(text, 'disattach-letters', 0.5, use_rust_tokenizer=True)
    >>> '행복한 ㄱㅏ정은 모두ㄱㅏ 닮았ㅈㅣ만, 불행한 ㄱㅏ정은 모두 ㅈㅓㅁㅏㄷㅏ의 ㅇㅣ유로 불행ㅎㅏㄷㅏ.'
    
  2. Use the Rust module directly
    from konoise import rust_generator

    text = "행복한 가정은 모두가 닮았지만, 불행한 가정은 모두 저마다의 이유로 불행하다."
    rust_generator.get_noise(text, 'disattach-letters', 0.5)  # provides the same methods (except yamin-jungum)
    >>> '행복한 ㄱㅏ정은 모두ㄱㅏ 닮았ㅈㅣ만, 불행한 ㄱㅏ정은 모두 ㅈㅓㅁㅏㄷㅏ의 ㅇㅣ유로 불행ㅎㅏㄷㅏ.'
    

Notes

  • Not all rules for nasalization, liquidization, palatalization, linking, and phonological assimilation are implemented yet; coverage will be expanded later (some rules may be missing, so feedback is appreciated if you find any).
  • prob is applied stochastically, with the given probability, to the characters that can be transformed (prob=1 does not mean that all of the text is changed).
  • When two or more methods are used (comma-separated), they are not all applied to the same unit of text; instead, one method is chosen at random for each unit of text.
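
The per-unit behavior described in these notes can be sketched as follows. This is a hedged illustration, not konoise's code: `split_units`, `apply_noise_sketch`, and the `registry` argument are hypothetical, the toy methods stand in for the real noise functions, and the sketch assumes that prob decides per unit whether noise is applied at all. Only the 'total' and 'newline' units are covered; sentence splitting is left out.

```python
import random
from typing import Callable, Dict, List, Optional


def split_units(text: str, delimeter: str) -> List[str]:
    """Split text into the units that noise is applied to (hypothetical helper)."""
    if delimeter == "total":
        return [text]
    if delimeter == "newline":
        return text.split("\n")
    raise ValueError(f"unsupported delimeter in this sketch: {delimeter!r}")


def apply_noise_sketch(
    text: str,
    methods: str,
    prob: float,
    registry: Dict[str, Callable[[str], str]],
    delimeter: str = "newline",
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one method at random per unit and apply it with probability prob."""
    rng = rng or random.Random()
    names = [name.strip() for name in methods.split(",")]
    noised = []
    for unit in split_units(text, delimeter):
        if rng.random() < prob:
            # Exactly one of the requested methods is used for this unit.
            noised.append(registry[rng.choice(names)](unit))
        else:
            noised.append(unit)
    return "\n".join(noised) if delimeter == "newline" else noised[0]


# Toy stand-ins for real noise methods, used only to make the sketch runnable.
toy_registry = {"upper": str.upper, "reverse": lambda s: s[::-1]}
print(apply_noise_sketch("abc\ndef", "upper,reverse", 1.0, toy_registry))
```

With two methods and prob=1.0, every line is transformed by exactly one of the two methods, never by both.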