Alt Job
Scraping alternative websites for jobs.
Alt Job scrapes a set of green/social/alternative websites and sends a digest of new job postings by email. It also generates an Excel file with job posting information.
The scraped data include: job title, type, salary, week_hours, date posted, apply-before date and full description. Additionally, a set of keyword matches is automatically checked against all jobs and added as a new column. (See screenshots)
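The keyword check described above can be pictured with a short sketch. The helper below is hypothetical and only illustrates the idea (case-insensitive matching of a keyword list against a job's text fields); it is not Alt Job's actual implementation.

```python
import re

def match_keywords(job, keywords):
    """Return the subset of `keywords` found in a job's title or description.

    `job` is a dict with "title" and "description" keys. This is an
    illustrative sketch, not Alt Job's real code.
    """
    text = f"{job.get('title', '')} {job.get('description', '')}".lower()
    return [kw for kw in keywords if re.search(re.escape(kw.lower()), text)]

job = {"title": "Community organizer",
       "description": "Non-profit seeking a bilingual organizer in Montréal."}
print(match_keywords(job, ["bilingual", "remote", "organizer"]))
# ['bilingual', 'organizer']
```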
🔥
Job postings mailing list - Montréal / Québec:
alt_job_mtl Google Group. Join it to receive a daily digest of new job postings from Montréal and the Province of Québec.
Supported websites
Alt Job is written in an extensible way: only 30 lines of code are required to support a new job posting site! The focus is on Canada/Québec for now; please contribute to improve the software or expand its scope.
Supports the following websites:
- arrondissement.com: Québec (full parsing)
- cdeacf.ca: Québec (full job PDFs parsing)
- chantier.qc.ca: Québec (full parsing)
- goodwork.ca: Québec and Canada (full parsing, form search still TODO)
- engages.ca: Québec (paging TODO)
- enviroemplois.org: Québec (full parsing)
- charityvillage.com: Québec and Canada (full parsing, requires chromedriver)
- aqoci.qc.ca: Québec, International (full parsing)
Support for the following websites is on the TODO list:
- undpjobs.net: World Wide
- eco.ca
- novae.ca
- quebecmunicipal.qc.ca
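To give a feel for what "30 lines for a new site" means: a scraper's core job is turning a listing page into job dicts (title, URL, and so on). The real Alt Job scrapers are Scrapy spiders with their own base class, which is not shown here; the stdlib-only sketch below, with hypothetical class and field names, only illustrates the parsing step.

```python
from html.parser import HTMLParser

class ListingParser(HTMLParser):
    """Illustrative sketch of the parsing a new scraper performs:
    extract {"url", "title"} dicts from <a class="job"> links in a
    listing page. Class and attribute names are hypothetical."""

    def __init__(self):
        super().__init__()
        self.jobs = []          # completed job dicts
        self._current = None    # job being built while inside an <a> tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "job":
            self._current = {"url": attrs.get("href", ""), "title": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.jobs.append(self._current)
            self._current = None

html = '<ul><li><a class="job" href="/jobs/1">Project officer</a></li></ul>'
parser = ListingParser()
parser.feed(html)
print(parser.jobs)  # [{'url': '/jobs/1', 'title': 'Project officer'}]
```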
Install
Install all requirements (see setup.py for more details)
python3 -m pip install 'alt_job[all]'
Requires Python >= 3.6
Configure
Sample config file
[alt_job]
##### General config #####
# Logging
log_level=INFO
scrapy_log_level=ERROR
# Jobs data file, default is ~/jobs.json
# jobs_datafile=/home/user/Jobs/jobs-mtl.json
# Asynchronous workers: number of sites to scan at the same time.
# Defaults to 5.
# workers=10
##### Mail sender #####
# Email server settings
smtphost=smtp.gmail.com
mailfrom=you@gmail.com
smtpuser=you@gmail.com
smtppass=password
smtpport=587
smtptls=Yes
# Email notification settings
mailto=["you@gmail.com"]
##### Scrapers #####
# Website domain
[goodwork.ca]
# URL to start the scraping, required for all scrapers
url=https://www.goodwork.ca/jobs.php?prov=QC
[cdeacf.ca]
url=http://cdeacf.ca/recherches?f%5B0%5D=type%3Aoffre_demploi
# Load full job details: if supported by the scraper,
# this will follow each job posting link in the listing and parse the full job description.
# Turn on to parse all job information.
# Default to False!
load_full_jobs=True
[arrondissement.com]
url=https://www.arrondissement.com/tout-list-emplois/
# Load all new pages: if supported by the scraper,
# this will follow each "next page" link and parse the next listing page
# until job postings already in the database are found.
# Default to False!
load_all_new_pages=True
[chantier.qc.ca]
url=https://chantier.qc.ca/decouvrez-leconomie-sociale/offres-demploi
load_full_jobs=Yes
# Disabled scraper
# [engages.ca]
# url=https://www.engages.ca/emplois?search%5Bkeyword%5D=&search%5Bjob_sector%5D=&search%5Bjob_city%5D=Montr%C3%A9al
[enviroemplois.org]
# Crawl from multiple start URLs
start_urls=["https://www.enviroemplois.org/offres-d-emploi?sector=®ion=6&job_kind=&employer=",
"https://www.enviroemplois.org/offres-d-emploi?sector=®ion=3&job_kind=&employer="]
Run it
python3 -m alt_job -c /home/user/Jobs/alt_job.conf
Arguments
Some config options can be overridden with CLI arguments.
-c <File path> [<File path> ...], --config_file <File path> [<File path> ...]
configuration file(s). Default locations will be
checked and loaded if file exists:
`~/.alt_job/alt_job.conf`, `~/alt_job.conf` or
`./alt_job.conf` (default: [])
-t, --template_conf print a template config file and exit. (default:
False)
-V, --version print Alt Job version and exit. (default: False)
-x <File path>, --xlsx_output <File path>
Write all NEW jobs to Excel file (default: None)
-s <Website> [<Website> ...], --enabled_scrapers <Website> [<Website> ...]
List of enabled scrapers. By default it's all scrapers
configured in config file(s) (default: [])
-j <File path>, --jobs_datafile <File path>
JSON file to store ALL jobs data. Default is
'~/jobs.json'. Use the 'null' keyword to disable the
storage of the datafile; all jobs will then be
considered new and loaded (default: )
--workers <Number> Number of websites to scrape asynchronously (default:
5)
--full, --load_all_jobs
Load the full job description page to parse
additional data. This setting is applied to all
scrapers (default: False)
--all, --load_all_new_pages
Load new job listing pages until older jobs are found.
This setting is applied to all scrapers (default:
False)
--quick, --no_load_all_jobs
Do not load the full job description page to parse
additional data (much faster). This setting is
applied to all scrapers (default: False)
--first, --no_load_all_new_pages
Load only the first job listing page. This setting is
applied to all scrapers (default: False)
--mailto <Email> [<Email> ...]
Emails to notify of new job postings (default: [])
--log_level <String> Alt Job log level. Example: DEBUG (default: INFO)
--scrapy_log_level <String>
Scrapy log level. Example: DEBUG (default: ERROR)
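The jobs datafile is what lets Alt Job tell new postings from ones it has already seen and mailed out. The exact deduplication key is not documented here, so the sketch below assumes the posting URL is used; function and field names are hypothetical illustrations, not Alt Job's real code.

```python
import json
import os
import tempfile

def filter_new_jobs(scraped, datafile):
    """Return the scraped jobs not already recorded in `datafile`,
    then append them to it. Keying on the posting URL is an
    assumption for illustration only."""
    try:
        with open(datafile) as f:
            seen = json.load(f)
    except FileNotFoundError:
        seen = []  # no datafile yet: every job is new
    known_urls = {job["url"] for job in seen}
    fresh = [job for job in scraped if job["url"] not in known_urls]
    with open(datafile, "w") as f:
        json.dump(seen + fresh, f)
    return fresh

path = os.path.join(tempfile.mkdtemp(), "jobs.json")
batch = [{"url": "https://example.org/jobs/1", "title": "Coordinator"}]
print(len(filter_new_jobs(batch, path)))  # 1 -- first run: everything is new
print(len(filter_new_jobs(batch, path)))  # 0 -- second run: already stored
```

This also explains the `-j null` behaviour documented above: with no datafile, nothing is ever remembered, so every scraped job counts as new on each run.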