fasta-one-hot-encoder

Simple Python package to lazily one-hot encode FASTA files using multiple processes, either as single bases or as kmers of arbitrary length.


License
MIT
Install
pip install fasta-one-hot-encoder==1.2.2

Documentation

Fasta One-Hot Encoder



Installation

Simply run:

pip install fasta_one_hot_encoder

Examples

Bases


One-hot encode single bases.

from fasta_one_hot_encoder import FastaOneHotEncoder

# Configure an encoder over the four-letter alphabet "acgt".
encoder = FastaOneHotEncoder(
    nucleotides="acgt",
    lower=True,
    sparse=False,
    handle_unknown="ignore"
)
path = "test_data/my_test_fasta.fa"
# Encode the FASTA file and write the result to a CSV file.
encoder.transform_to_df(path, verbose=True).to_csv(
    "my_result.csv"
)

Obtained results should look like:

   a  c  g  t
0  0  0  1  0
1  0  1  0  0
2  0  1  0  0
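
To make the layout of this table concrete, here is a minimal pure-Python sketch of per-base one-hot encoding. It is an illustration only, not the package's implementation: the helper `one_hot_bases` is hypothetical, and it assumes that `lower=True` lowercases the sequence before encoding and that `handle_unknown="ignore"` maps any symbol outside `nucleotides` to an all-zero row (the behaviour scikit-learn's `OneHotEncoder` has for that setting).

```python
def one_hot_bases(sequence, nucleotides="acgt", lower=True):
    """Hypothetical helper: one row per base, one column per nucleotide."""
    if lower:
        sequence = sequence.lower()
    # Map each known nucleotide to its column index.
    index = {base: i for i, base in enumerate(nucleotides)}
    rows = []
    for base in sequence:
        row = [0] * len(nucleotides)
        # Assumption: unknown symbols (e.g. "n") are left as all zeros.
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


for row in one_hot_bases("GCCN"):
    print(row)
```

Under these assumptions, the first three rows for the input `GCCN` match the sample table above (`g`, `c`, `c`), and the trailing `N` produces a row of zeros.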

Kmers


One-hot encode kmers of a given length.

from fasta_one_hot_encoder import FastaOneHotEncoder

# Configure an encoder over all kmers of length 2 built from "acgt".
encoder = FastaOneHotEncoder(
    nucleotides="acgt",
    kmers_length=2,
    lower=True,
    sparse=False,
    handle_unknown="ignore"
)
path = "test_data/my_test_fasta.fa"
# Encode the FASTA file and write the result to a CSV file.
encoder.transform_to_df(path, verbose=True).to_csv(
    "my_result.csv"
)

Obtained results should look like:

   aa  ac  ag  at  ca  cc  cg  ct  ga  gc  gg  gt  ta  tc  tg  tt
0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0
1   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0
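
The column layout above can be reproduced with a short pure-Python sketch. This is an illustration under stated assumptions, not the package's implementation: the helper `one_hot_kmers` is hypothetical, the columns are assumed to be every kmer over `nucleotides` in lexicographic order, and the kmers are assumed to be read with an overlapping window that advances one base at a time, which is consistent with the sample rows (`gc` followed by `cc`).

```python
from itertools import product


def one_hot_kmers(sequence, nucleotides="acgt", kmers_length=2, lower=True):
    """Hypothetical helper: one row per overlapping kmer of the sequence."""
    if lower:
        sequence = sequence.lower()
    # All possible kmers in lexicographic order: aa, ac, ag, at, ca, ...
    columns = ["".join(p) for p in product(nucleotides, repeat=kmers_length)]
    index = {kmer: i for i, kmer in enumerate(columns)}
    rows = []
    # Assumption: the window slides by one base, so kmers overlap.
    for start in range(len(sequence) - kmers_length + 1):
        kmer = sequence[start:start + kmers_length]
        row = [0] * len(columns)
        # Assumption: kmers containing unknown symbols stay all zeros.
        if kmer in index:
            row[index[kmer]] = 1
        rows.append(row)
    return columns, rows


columns, rows = one_hot_kmers("gcc")
print(columns)
for row in rows:
    print(row)
```

With `kmers_length=2` there are 4² = 16 columns, and the input `gcc` yields two rows, with the single 1 in column `gc` and then in column `cc`, as in the table above.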