embulk-filter-gsub

Embulk filter plugin to convert text column values with regular expressions


License
MIT
Install
gem install embulk-filter-gsub -v 0.2.0

Documentation

Gsub filter plugin for Embulk

CircleCI

Embulk filter plugin to convert text column values with regular expressions.

Overview

  • Plugin type: filter

Configuration

  • target_columns: columns to convert text value (array, default: [])

Example

filters:
  - type: gsub
    target_columns:
      foo:
        - type: regexp_replace
          pattern: '(\w*):\s*(\w*)'
          to: "$1 = [$2]"
      bar:
        - type: to_lower_case
      baz:
        - type: to_upper_case
          pattern: "test"

target column configuration

  • type: type of text substitution (string, default: regexp_replace)

Supported Types are:

  • regexp_replace
  • to_lower_case
  • to_upper_case

regexp_replace

  • pattern: regular expression pattern to be substituted (string, required)
  • to: replacement string (string, required)
Example
target_columns:
  foo:
    - type: regexp_replace
      pattern: '(\w*):\s*(\w*)'
      to: "$1 = [$2]"

it converts input like this

foo bar
example-foo: 1234 example-bar: 9876

into the output

foo bar
example-foo = [1234] example-bar: 9876

to_lower_case

  • pattern: regular expression pattern to be substituted (string, optional)

If pattern is omitted, whole text is converted into lower case letters.

Example
target_columns:
  foo:
    - type: to_lower_case

it converts input like this

foo bar
ABC XYZ

into the output

foo bar
abc XYZ

to_upper_case

  • pattern: regular expression pattern to be substituted (string, optional)

If pattern is omitted, whole text is converted into upper case letters.

Example
target_columns:
  foo:
    - type: to_upper_case

it converts input like this

foo bar
abc xyz

into the output

foo bar
ABC xyz

Multiple conversion

You can apply multiple conversion on a column value.

target_columns:
  foo:
    - type: regexp_replace
      pattern: '</?\w*\s*/?>'
      to: ""
    - type: regexp_replace
      pattern: '(\w*):\s*(\w*)'
      to: "$1 = [$2]"

Regular expression options

You can specify some regular expression options.

target_columns:
  foo:
    - type: regexp_replace
      pattern: 'foo'
      to: "***"
      regexp_options:
        ignore_case: true

Supported options are:

  • ignore_case (boolean, default: false)
  • multiline (boolean, default: true)
  • dot_matches_all (boolean, default: false)
  • enable_comments (boolean, default: false)

Build

$ ./gradlew gem  # -t to watch change of files and rebuild continuously