# KoBART-Transformers
KoBART, released by SKT, ported to `transformers` for convenient use.
## Install (Optional)

If you use `BartModel` and `PreTrainedTokenizerFast` directly, you do not need to install this package.

```console
pip install kobart-transformers
```
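If you prefer to skip the package entirely, the same checkpoint can be loaded straight from the Hugging Face Hub. A minimal sketch using only `transformers` (equivalent to the helpers below, per the notes in each section):

```python
# Equivalent setup without kobart-transformers: load the
# hyunwoongko/kobart checkpoint directly via transformers.
from transformers import BartModel, PreTrainedTokenizerFast

kobart_tokenizer = PreTrainedTokenizerFast.from_pretrained("hyunwoongko/kobart")
model = BartModel.from_pretrained("hyunwoongko/kobart")
```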
## Tokenizer

Implemented with `PreTrainedTokenizerFast`. It is identical to `PreTrainedTokenizerFast.from_pretrained("hyunwoongko/kobart")`.
```python
>>> from kobart_transformers import get_kobart_tokenizer
>>> # from transformers import PreTrainedTokenizerFast
>>> kobart_tokenizer = get_kobart_tokenizer()
>>> # kobart_tokenizer = PreTrainedTokenizerFast.from_pretrained("hyunwoongko/kobart")
>>> kobart_tokenizer.tokenize("안녕하세요. 한국어 BART 입니다.🤣:)l^o")
['▁안녕하', '세요.', '▁한국어', '▁B', 'A', 'R', 'T', '▁입', '니다.', '🤣', ':)', 'l^o']
```
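Besides `tokenize()`, the tokenizer also encodes text to input IDs and decodes them back. A minimal round-trip sketch (the exact IDs depend on the checkpoint's vocabulary):

```python
from kobart_transformers import get_kobart_tokenizer

kobart_tokenizer = get_kobart_tokenizer()

# Encode a sentence to token IDs, then decode the IDs back to text.
encoded = kobart_tokenizer("안녕하세요. 한국어 BART 입니다.")
print(encoded["input_ids"])                    # IDs are checkpoint-specific
print(kobart_tokenizer.decode(encoded["input_ids"]))
```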
## Model

Implemented with `BartModel`. It is identical to `BartModel.from_pretrained("hyunwoongko/kobart")`.
```python
>>> from kobart_transformers import get_kobart_model, get_kobart_tokenizer
>>> # from transformers import BartModel
>>> kobart_tokenizer = get_kobart_tokenizer()
>>> model = get_kobart_model()
>>> # model = BartModel.from_pretrained("hyunwoongko/kobart")
>>> inputs = kobart_tokenizer(['안녕하세요.'], return_tensors='pt')
>>> model(inputs['input_ids'])
Seq2SeqModelOutput(last_hidden_state=tensor([[[-0.4488, -4.3651,  3.2349,  ...,  5.8916,  4.0497,  3.5468],
         [-0.4096, -4.6106,  2.7189,  ...,  6.1745,  2.9832,  3.0930]]],
       grad_fn=<TransposeBackward0>), past_key_values=None, decoder_hidden_states=None, decoder_attentions=None, cross_attentions=None, encoder_last_hidden_state=tensor([[[ 0.4624, -0.2475,  0.0902,  ...,  0.1127,  0.6529,  0.2203],
         [ 0.4538, -0.2948,  0.2556,  ..., -0.0442,  0.6858,  0.4372]]],
       grad_fn=<TransposeBackward0>), encoder_hidden_states=None, encoder_attentions=None)
```
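The call returns a `Seq2SeqModelOutput`, whose fields can be read as attributes. A small sketch of inspecting the hidden-state shapes (`[batch_size, sequence_length, hidden_size]`):

```python
from kobart_transformers import get_kobart_model, get_kobart_tokenizer

kobart_tokenizer = get_kobart_tokenizer()
model = get_kobart_model()

inputs = kobart_tokenizer(["안녕하세요."], return_tensors="pt")
outputs = model(inputs["input_ids"])

# Decoder and encoder hidden states: (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
print(outputs.encoder_last_hidden_state.shape)
```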
## For Seq2Seq Training

For seq2seq training, use `get_kobart_for_conditional_generation()` as shown below. It is identical to `BartForConditionalGeneration.from_pretrained("hyunwoongko/kobart")`.
```python
>>> from kobart_transformers import get_kobart_for_conditional_generation
>>> # from transformers import BartForConditionalGeneration
>>> model = get_kobart_for_conditional_generation()
>>> # model = BartForConditionalGeneration.from_pretrained("hyunwoongko/kobart")
```
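A minimal sketch of a single training step, assuming a hypothetical source/target pair; the sentences, optimizer, and learning rate below are placeholders, not part of the library:

```python
import torch
from kobart_transformers import (
    get_kobart_for_conditional_generation,
    get_kobart_tokenizer,
)

tokenizer = get_kobart_tokenizer()
model = get_kobart_for_conditional_generation()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # placeholder hyperparameters

# Hypothetical source/target pair; replace with your own dataset.
src = tokenizer(["한국어 BART 모델을 소개합니다."], return_tensors="pt")
tgt = tokenizer(["KoBART 소개"], return_tensors="pt")

outputs = model(
    input_ids=src["input_ids"],
    attention_mask=src["attention_mask"],
    labels=tgt["input_ids"],  # transformers computes cross-entropy loss from labels
)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```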
## Update Notes

### version 0.1

Fixed an error caused by the `pad` token not being set.
```python
from kobart_transformers import get_kobart_tokenizer

kobart_tokenizer = get_kobart_tokenizer()
kobart_tokenizer(["한국어", "BART 모델을", "소개합니다."], truncation=True, padding=True)

{
    'input_ids': [[28324, 3, 3, 3, 3], [15085, 264, 281, 283, 24224], [15630, 20357, 3, 3, 3]],
    'token_type_ids': [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]],
    'attention_mask': [[1, 0, 0, 0, 0], [1, 1, 1, 1, 1], [1, 1, 0, 0, 0]]
}
```
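To confirm the fix on a loaded tokenizer, the pad token and its ID can be inspected directly (the output above suggests the pad ID is 3, but verify against the checkpoint rather than hard-coding it):

```python
from kobart_transformers import get_kobart_tokenizer

kobart_tokenizer = get_kobart_tokenizer()

# After the 0.1 fix the pad token is set; print it instead of assuming its ID.
print(kobart_tokenizer.pad_token, kobart_tokenizer.pad_token_id)
```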
### version 0.1.3

Registered `get_kobart_for_conditional_generation()` in `__init__.py`.
### version 0.1.4

Added the missing `special_tokens_map.json`. KoBART can now be used without `pip install`.
Thanks to bernardscumm.