Reads a Python module and statically analyzes it. This works well with Jupyter extensions in VS Code, and performs better when the module files are formatted according to PEP 8.
$ pip install textpy
lazyr>=0.0.16
hintwith>=0.1.3
typing-extensions
black
NOTE: pandas>=1.4.0 is recommended but not necessary.
To demonstrate the usage of this module, we put a file named myfile.py under ./examples/ (you can find it in the repository, or create a new file of your own):
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from typing import Optional


class MyBook:
    """
    A book that records a story.

    Parameters
    ----------
    story : str, optional
        Story to record, by default None.

    """

    def __init__(self, story: Optional[str] = None) -> None:
        if story is None:
            self.content = "This book is empty."
        self.content = story


def print_my_book(book: MyBook) -> None:
    """
    Print a book.

    Parameters
    ----------
    book : MyBook
        A book.

    """
    print(book.content)
Run the following code to find all occurrences of a pattern (for example, "MyBook") in myfile.py:
>>> import textpy as tx
>>> myfile = tx.module("./examples/myfile.py") # reads the python module
>>> myfile.findall("MyBook")
examples/myfile.py:7: 'class <MyBook>:'
examples/myfile.py:24: 'def print_my_book(book: <MyBook>) -> None:'
examples/myfile.py:30: ' book : <MyBook>'
If you are using a Jupyter notebook, you can run a cell like this:
>>> myfile.findall("MyBook")
| source | match |
|---|---|
| myfile.MyBook:7 | class MyBook: |
| myfile.print_my_book():24 | def print_my_book(book: MyBook) -> None: |
| myfile.print_my_book():30 | book : MyBook |
Note that in the jupyter notebook case, the matched substrings are clickable, linking to where the patterns were found.
The previous demonstration introduced the core function tx.module(). It returns an instance of a subclass of the abstract class PyText, which supports various text manipulation methods:
>>> isinstance(myfile, tx.PyText)
True
A Python module may consist of more than one file; tx.module() supports complex file hierarchies as well. If the path points to a single file, the return type will be PyFile; otherwise, the return type will be PyDir. Both are subclasses of PyText.
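The file-versus-directory distinction can be pictured with a small standalone sketch. It does not call textpy: describe_target() is a hypothetical helper, and the strings it returns only mirror the class names PyFile and PyDir mentioned above. How tx.module() actually decides is not shown in this README.

```python
from pathlib import Path


def describe_target(path: str) -> str:
    """Name the PyText subclass one would expect for `path`.

    Hypothetical helper for illustration only; textpy's own dispatch
    logic may differ.
    """
    target = Path(path)
    if target.is_file():
        return "PyFile"  # a single module file
    if target.is_dir():
        return "PyDir"  # a package or any directory of modules
    raise FileNotFoundError(f"no such file or directory: {path}")
```

Checking the result with isinstance(obj, tx.PyFile) or obj.is_file() on a real textpy object is the more direct route; the sketch only illustrates the rule stated in the text.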
In short, once you have a Python package, you can simply pass the package directory path to tx.module() and proceed as before:
>>> pkg_dir = "" # type any path here
>>> pattern = "" # type any regex pattern here
>>> res = tx.module(pkg_dir).findall(pattern)
As mentioned before, you can use .findall() to find all non-overlapping matches of a pattern in a Python module.
>>> myfile.findall("optional")
examples/myfile.py:13: ' story : str, <optional>'
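To make the shape of this output concrete, here is a minimal sketch of a line-by-line search built on the standard re module. It is not textpy's implementation; find_in_lines() and the sample text are made up for illustration, and only the angle-bracket marking imitates the output above.

```python
import re
from typing import List, Tuple


def find_in_lines(text: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (line number, marked line) pairs for lines matching `pattern`.

    Every non-overlapping match is wrapped in angle brackets.
    Illustrative only; this is not how textpy is implemented.
    """
    compiled = re.compile(pattern)
    results = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if compiled.search(line):
            # Wrap each match on this line in <...>
            marked = compiled.sub(lambda m: f"<{m.group(0)}>", line)
            results.append((lineno, marked))
    return results


sample = (
    "class MyBook:\n"
    "    pass\n"
    "\n"
    "def show(book: MyBook) -> None:\n"
    "    print(book)\n"
)
for lineno, marked in find_in_lines(sample, "MyBook"):
    print(f"{lineno}: {marked!r}")
```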
The object returned by .findall() has a _repr_mimebundle_() method that beautifies its representation inside a Jupyter notebook. You can disable this feature explicitly by setting display_params.use_mimebundle to False:
>>> from textpy import display_params
>>> display_params.use_mimebundle = False
In addition, the .findall() method has some optional parameters to customize the pattern, including whole_word=, case_sensitive=, and regex=.
>>> myfile.findall("mybook", case_sensitive=False, regex=False, whole_word=True)
examples/myfile.py:7: 'class <MyBook>:'
examples/myfile.py:24: 'def print_my_book(book: <MyBook>) -> None:'
examples/myfile.py:30: ' book : <MyBook>'
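One way to picture how such options could translate into a regular expression is sketched below with the standard re module. build_pattern() is a hypothetical helper, not part of textpy, and the library may combine these options differently.

```python
import re


def build_pattern(
    pattern: str,
    whole_word: bool = False,
    case_sensitive: bool = True,
    regex: bool = True,
) -> re.Pattern:
    """Compile `pattern` according to three search options (illustrative)."""
    if not regex:
        # Treat the input as literal text rather than a regex
        pattern = re.escape(pattern)
    if whole_word:
        # Require word boundaries on both sides of the match
        pattern = rf"\b(?:{pattern})\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(pattern, flags)


line = "def print_my_book(book: MyBook) -> None:"
matcher = build_pattern("mybook", whole_word=True, case_sensitive=False, regex=False)
print(matcher.findall(line))
```

Under these assumptions, only "MyBook" is reported for the sample line: "my_book" contains an underscore and therefore does not match the literal text "mybook".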
Use .replace() to find all non-overlapping matches of a pattern and replace them with another string:
>>> replacer = myfile.replace("book", "magazine")
>>> replacer
examples/myfile.py:9: ' A <book/magazine> that records a story.'
examples/myfile.py:20: ' self.content = "This <book/magazine> is empty."'
examples/myfile.py:24: 'def print_my_<book/magazine>(<book/magazine>: MyBook) -> None:'
examples/myfile.py:26: ' Print a <book/magazine>.'
examples/myfile.py:30: ' <book/magazine> : MyBook'
examples/myfile.py:31: ' A <book/magazine>.'
examples/myfile.py:34: ' print(<book/magazine>.content)'
At this point, the replacement has not taken effect yet. Use .confirm() to confirm the changes and write them to the file(s):
>>> replacer.confirm()
{'successful': ['examples/myfile.py'], 'failed': []}
If you want to roll back the changes, run:
>>> replacer.rollback()
{'successful': ['examples/myfile.py'], 'failed': []}
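The preview, confirm, and roll back workflow can be sketched independently of textpy as follows. PreviewReplacer is a hypothetical class written for this illustration; it works on a single temporary file and says nothing about how textpy's Replacer is actually implemented.

```python
import re
import tempfile
from pathlib import Path


class PreviewReplacer:
    """Hypothetical two-phase replacer: compute the change, write it later."""

    def __init__(self, path: Path, pattern: str, repl: str) -> None:
        self.path = path
        self.original = path.read_text(encoding="utf-8")
        # Nothing is written here; the new text is only held in memory.
        self.updated = re.sub(pattern, repl, self.original)

    def confirm(self) -> None:
        """Write the replaced text to disk."""
        self.path.write_text(self.updated, encoding="utf-8")

    def rollback(self) -> None:
        """Restore the text that was read when the replacer was created."""
        self.path.write_text(self.original, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "demo.py"
    target.write_text("print('a book')\n", encoding="utf-8")

    replacer = PreviewReplacer(target, "book", "magazine")
    unchanged = target.read_text(encoding="utf-8")  # file not touched yet
    replacer.confirm()
    confirmed = target.read_text(encoding="utf-8")  # replacement written
    replacer.rollback()
    restored = target.read_text(encoding="utf-8")  # original text again
```

Keeping the original text in memory is the simplest way to support a rollback in a sketch like this; a real tool would also need to handle files that changed on disk in the meantime.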
Use .delete() to find all non-overlapping matches of a pattern and delete them:
>>> deleter = myfile.delete("book")
>>> deleter
examples/myfile.py:9: ' A <book> that records a story.'
examples/myfile.py:20: ' self.content = "This <book> is empty."'
examples/myfile.py:24: 'def print_my_<book>(<book>: MyBook) -> None:'
examples/myfile.py:26: ' Print a <book>.'
examples/myfile.py:30: ' <book> : MyBook'
examples/myfile.py:31: ' A <book>.'
examples/myfile.py:34: ' print(<book>.content)'
>>> deleter.confirm()
{'successful': ['examples/myfile.py'], 'failed': []}
>>> deleter.rollback()
{'successful': ['examples/myfile.py'], 'failed': []}
This project falls under the BSD 3-Clause License.
- Updated PyText.check_format(): now returns a boolean value instead of None.
- Updated the ignore= parameter for module(): it now accepts a list of path-patterns; paths matching any of these patterns will be ignored when searching for files.
- Fixed issue: cannot display special characters in *._repr_mimebundle_().
- New global parameters: tree_style=, table_style=, use_mimebundle=, and skip_line_numbers=; find them under tx.display_params.
- Defined display_params.defaults() for users to get the default values of the parameters.
- New subclass PyProperty inherited from PyMethod. Class properties will be stored in instances of PyProperty instead of PyMethod in the future.
- Updated the method PyText.jumpto(): it now allows "/" as a delimiter (in addition to "."); if a class or callable is defined more than once, it jumps to the last (previously first) place where it was defined.
- PyText has a _repr_mimebundle_() method now.
- New property PyText.imports.
- Created a utility class HTMLTableMaker in place of Styler; this significantly reduces the running overhead of *._repr_mimebundle_().
- Updated utils.re_extensions:
  - bugfix for rsplit();
  - new string operation quote_collapse().
- Updated utils.re_extensions:
  - Important: we've decided to extract utils.re_extensions into an independent package named re_extensions (presently at v0.0.3), so any future updates should be looked up in https://github.com/Chitaoji/re-extensions instead; we will stay in sync with it, however;
  - real_findall() now returns match objects instead of spans and groups;
  - smart_sub() accepts a new optional parameter called count=;
  - SmartPattern supports [] to indicate a Unicode (str) or bytes pattern (like what re.Pattern does);
  - new regex operations smart_split(), smart_findall(), line_findall(), smart_subn(), and smart_fullmatch();
  - created a namespace Smart for all the smart operations;
  - bugfixes for rsplit(), lsplit(), and smart_sub().
- Reduced the running cost of PyText.findall() by taking advantage of the new regex operation line_findall().
- New methods PyText.is_file() and PyText.is_dir() to find out whether the instance represents a file / directory.
- New method PyText.check_format() for format checking.
- Defined the comparison ordering methods __eq__(), __gt__(), and __ge__() for PyText. They compare two PyText objects via their absolute paths.
- Updated utils.re_extensions:
  - new regex operations smart_search(), smart_match(), and smart_sub();
  - new string operation counted_strip();
  - new utility classes SmartPattern and SmartMatch;
  - new utility functions find_right_bracket() and find_left_bracket().
- New string operation utils.re_extensions.word_wrap().
. - Various improvements.
- The module-level function textpy() is going to be deprecated to avoid conflicts with the package name textpy. Please use module() instead.
- New methods PyText.replace() and PyText.delete().
- New class Replacer as the return type of PyText.replace(), with public methods .confirm(), .rollback(), etc.
- Added a dunder method PyText.__truediv__() as an alternative to PyText.jumpto().
- New subclass PyContent inherited from PyText. A PyContent object stores a part of a file that is not storable by instances of other subclasses.
- Improved behavior of clickables.
- Fixed issue: incorrect file links in the output of TextPy.findall().
- Various improvements.
- Updated LICENSE.
- Refactored README.md.
- Lazily imported pandas to reduce the time cost for importing.
- New optional parameters for TextPy.findall():
  - whole_word=: whether to match whole words only;
  - case_sensitive=: specifies case sensitivity.
- New optional parameter encoding= for textpy().
- Removed unnecessary dependencies.
- Bugfix under Windows system.
- Provided compatibility with pandas versions lower than 1.4.0.
- Updated textpy():
  - Path object is now acceptable as the positional argument;
  - new optional parameter home= for specifying the home path.
- More flexible presentation of output from TextPy.findall().
- Fixed a display issue of README.md on PyPI.
- Initial release.