arXivTrends
An arXiv scraper that retrieves records from given research areas in mathematics and detects trends in hyper-specialization and in the growth rate of scientific production in those fields.
Install
Use the package manager pip (or pip3 for python3):
$ pip install arxivtrends
Alternatively, download the source and use setup.py:
$ python setup.py install
To update the module using pip:
$ pip install arxivtrends --upgrade
Examples
Let's import arxivtrends and create a scraper to fetch all preprints on partial differential equations of elliptic type (for other fields, see Research Areas below):
import arxivtrends
scraper = arxivtrends.Scraper(macro_field='Partial differential equations of elliptic type')
Instantiating the class Scraper with the parameter macro_field set to 'Partial differential equations of elliptic type' returns a dictionary-like object containing all the information (authors, title, submission date, etc.) about the arXiv preprints whose Mathematics Subject Classification (MSC) falls under the category Partial differential equations of elliptic type.
Once scraper is built, we can start the parsing process and extract the information we want for each preprint: submission date, list of authors and number of pages.
output_df = scraper.scrape()
While scrape() is running, it prints its status:
Total number of papers scraped: 100
Total number of papers scraped: 200
...
Finally, the extracted information is saved both into the pandas DataFrame output_df and into a .csv file. The .csv file is useful for overnight runs, where the kernel may shut down after a period of inactivity: the parsing process can take up to a few hours (see the script arxivtrends.py).
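Since the results are also written to a .csv file, an interrupted session can pick up from disk instead of scraping again. The following is only a minimal sketch: the file name is not specified here, so an in-memory buffer stands in for the file, and the column names date, authors and pages are assumptions rather than the actual output schema.

```python
import io

import pandas as pd

# Hypothetical stand-in for the .csv written by scrape(); the real file
# name and column names may differ.
csv_text = """date,authors,pages
2019-03-14,"A. Author; B. Author",12
2020-07-01,"C. Author",30
"""

# Reload the saved records, parsing the submission date as a datetime.
reloaded_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(reloaded_df.dtypes)
```

With a real file, the path to the .csv would replace the io.StringIO buffer.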
Once the parsing is complete, we can call the data visualization methods (see the script graphics.py) and see what the data can tell us. For example, the following call to plot_N_authors_papers() shows the number of uploaded arXiv preprints with at least 3 authors, year by year:
plot_N_authors_papers(output_df, 3)
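To make the quantity behind this plot concrete, here is a rough sketch of the underlying count on toy data. It is not the code in graphics.py: the helper count_papers_with_min_authors is hypothetical, and the column names date and authors (with authors stored as a list) are assumptions about the layout of output_df.

```python
import pandas as pd

# Toy records; the column names "date" and "authors" are assumptions.
toy_df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2018-01-10", "2018-06-02", "2019-03-05", "2019-11-20"]
    ),
    "authors": [["A", "B", "C"], ["A"], ["A", "B", "C", "D"], ["B", "C", "D"]],
})


def count_papers_with_min_authors(df, n_authors):
    """Return, per submission year, the number of papers with >= n_authors authors."""
    # Keep only the papers whose author list is long enough.
    selected = df.loc[df["authors"].apply(len) >= n_authors]
    # Group the remaining papers by submission year and count them.
    return selected.groupby(selected["date"].dt.year).size()


counts = count_papers_with_min_authors(toy_df, 3)
print(counts)
```

A bar chart of the resulting Series, for instance with counts.plot(kind="bar"), would give a year-by-year view of the same kind.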
Research Areas
Currently available options for the parameter macro_field:
Harmonic analysis on Euclidean spaces (MSC codes: 42A05 - 42C40)
Abstract harmonic analysis (MSC codes: 43A05 - 43A90)
Partial differential equations of elliptic type (MSC codes: 35J05 - 35J85)
Partial differential equations of fluid mechanics (MSC codes: 76A02 - 76S05)
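As an illustration of how these ranges can be read, the sketch below maps a single five-character MSC code to the macro field whose range contains it. The dictionary MSC_RANGES and the helper macro_field_for are hypothetical and not part of arxivtrends. The sketch also assumes each range can be treated as a plain string interval, which may not match how the package selects codes internally.

```python
# MSC code ranges copied from the list of research areas; the dictionary
# name and the lookup helper are hypothetical, not part of arxivtrends.
MSC_RANGES = {
    "Harmonic analysis on Euclidean spaces": ("42A05", "42C40"),
    "Abstract harmonic analysis": ("43A05", "43A90"),
    "Partial differential equations of elliptic type": ("35J05", "35J85"),
    "Partial differential equations of fluid mechanics": ("76A02", "76S05"),
}


def macro_field_for(msc_code):
    """Return the macro field whose range contains msc_code, or None."""
    for field, (low, high) in MSC_RANGES.items():
        # Five-character codes with the same layout (two digits, a letter,
        # two digits) compare correctly as plain strings.
        if low <= msc_code <= high:
            return field
    return None


print(macro_field_for("42B20"))
print(macro_field_for("11A41"))
```

The first call falls inside the 42A05 - 42C40 range, while the second code belongs to none of the listed areas, so the helper returns None.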