# csv2sqlLike

csv2sqlLike is a package for simple data analysis on light data sets (<30 MB). It provides filtering methods similar to SQL's filtering functions. We hope this package is helpful for those who analyze data in social science.
csv2sqlLike consists of two main classes:

- PseudoSQLFromCSV
- Transfer2SQLDB

PseudoSQLFromCSV is in charge of handling data:

- load data and heads from a CSV file as a nested list and a list
- filter data under a specific condition
- group data by a specific head
- write the data inside this object to a CSV file

Transfer2SQLDB is in charge of transferring data between PseudoSQLFromCSV and a DB:

- create a table in the DB from the data inside PseudoSQLFromCSV
- bring data from a table in the DB as a nested list
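The round trip that Transfer2SQLDB performs can be sketched with the standard-library `sqlite3` module. This is a conceptual sketch only: the table name `sample` and the data are made up here, and this is not the package's actual API.

```python
import sqlite3

# Data in the shape PseudoSQLFromCSV holds: a head list and a nested list.
head = ["region", "country", "age"]
rows = [["east-asia", "korea", 25], ["europe", "france", 31]]

conn = sqlite3.connect(":memory:")

# Create a table whose columns follow the head list.
cols = ", ".join(f"{h} TEXT" for h in head[:-1]) + f", {head[-1]} INTEGER"
conn.execute(f"CREATE TABLE sample ({cols})")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", rows)

# Bring the data back as a nested list, as Transfer2SQLDB does.
fetched = [list(row) for row in conn.execute("SELECT * FROM sample")]
print(fetched)  # [['east-asia', 'korea', 25], ['europe', 'france', 31]]
```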
## Installation

PIP:

```sh
pip3 install csv2sqllike
```
## Usage

### Load data from a CSV file

```python
data = csv2sqllike.get_data_from_csv("[path_to_file]")

# example
data = csv2sqllike.get_data_from_csv("./data.csv")
data = csv2sqllike.get_data_from_csv(
    "./test.csv",
    type_dict={"region": "str", "country": "str", "name": "str",
               "sex": "str", "university": "str", "age": "int"},
)

# check the type of the loaded data: nested list
print(type(data.data))
# check the data head
print(data.head)
# check the first row of data
print(data.data[0])
# check the data
print(data.data)
```
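Conceptually, the loader reads the header row, then reads each body row and casts its columns according to `type_dict`. A minimal sketch with the standard library (`load_csv_like` below is a hypothetical helper, not part of csv2sqllike):

```python
import csv
import io

def load_csv_like(fileobj, type_dict=None):
    """Read a CSV stream into (head, rows), casting columns per type_dict."""
    casts = {"int": int, "float": float, "str": str}
    reader = csv.reader(fileobj)
    head = next(reader)
    rows = []
    for record in reader:
        rows.append([
            casts.get((type_dict or {}).get(h, "str"), str)(value)
            for h, value in zip(head, record)
        ])
    return head, rows

# usage with an in-memory CSV
text = "region,age\neast-asia,25\neurope,31\n"
head, rows = load_csv_like(io.StringIO(text), type_dict={"age": "int"})
print(head)  # ['region', 'age']
print(rows)  # [['east-asia', 25], ['europe', 31]]
```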
### Filter data using a condition

```python
data.where("[head] [operator] [specific value]")

# example
data.where("region == east-asia")
# check the conditions used
print(data.condition_where)
# check the filtered data
print(data.cache_data)
```
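The condition string can be read as "head operator value" split on whitespace. A simplified sketch of such filtering over a nested list (`filter_where` is a hypothetical helper, not the package's implementation; real numeric comparisons would also need type casting):

```python
import operator

OPS = {"==": operator.eq, "!=": operator.ne,
       ">": operator.gt, "<": operator.lt,
       ">=": operator.ge, "<=": operator.le}

def filter_where(head, rows, condition):
    """Keep rows satisfying a 'head operator value' condition string."""
    col, op, value = condition.split(maxsplit=2)
    idx = head.index(col)
    return [row for row in rows if OPS[op](row[idx], value)]

head = ["region", "country"]
rows = [["east-asia", "korea"], ["europe", "france"]]
matched = filter_where(head, rows, "region == east-asia")
print(matched)  # [['east-asia', 'korea']]
```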
### Group data by a specific head

```python
data.group_by("[head]")

# example
data.group_by("region")
# check the heads used for grouping data
print(data.condition_group_by)
# check the grouped data stored in a dictionary
print(data.cache_dict)
```
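Grouping maps each distinct value of the chosen head to the rows that carry it. A standalone sketch of that dictionary shape (`group_rows` is a hypothetical helper, not the package's method):

```python
from collections import defaultdict

def group_rows(head, rows, key):
    """Group rows into a dict keyed by the values of one column."""
    idx = head.index(key)
    groups = defaultdict(list)
    for row in rows:
        groups[row[idx]].append(row)
    return dict(groups)

head = ["region", "country"]
rows = [["east-asia", "korea"], ["europe", "france"], ["east-asia", "japan"]]
grouped = group_rows(head, rows, "region")
print(grouped["east-asia"])  # [['east-asia', 'korea'], ['east-asia', 'japan']]
```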
### Clear cache storage (used for filtering and grouping)

```python
# check cache storage before clearing caches
print(data.condition_where)
print(data.cache_data)
print(data.condition_group_by)
print(data.cache_dict)

# clear cache storage
data.clear_cache_data()

# check cache storage after clearing caches
print(data.condition_where)
print(data.cache_data)
print(data.condition_group_by)
print(data.cache_dict)
```
### Add and delete heads

```python
print(data.header)

# add a new head
data.add_head("new_head")
# check the added head
print(data.header)

# delete a head
data.delete_head("new_head")
# check the heads after deleting the head
print(data.header)
```
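Adding or deleting a head amounts to inserting or removing a column across the head list and every row. A rough sketch (`add_head` and `delete_head` below are standalone helpers written for illustration, not the package's methods):

```python
def add_head(head, rows, new_head, fill=""):
    """Append a column to the head list and pad every row with a fill value."""
    head.append(new_head)
    for row in rows:
        row.append(fill)

def delete_head(head, rows, target):
    """Remove a column from the head list and from every row."""
    idx = head.index(target)
    head.pop(idx)
    for row in rows:
        row.pop(idx)

head = ["region", "country"]
rows = [["east-asia", "korea"]]
add_head(head, rows, "new_head")
print(head)  # ['region', 'country', 'new_head']
delete_head(head, rows, "new_head")
print(rows)  # [['east-asia', 'korea']]
```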
For more examples and usage, please refer to the Jupyter notebook.
## Release History

- 1.0.0
  - First release
- 1.0.1
  - Add encoding option (default encoding is utf-8)
- 1.0.2
  - Add automatic installation of required packages
- 1.0.3
  - Improve precision of the data-shape check function