Get information from various web resources

pip install webrecon==0.0.2


Recon project


This code takes directions from a message and carries out tasks suited to a distributed workflow.
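Here is a minimal sketch of what taking directions from a message could look like: a worker decodes a JSON message and routes it to a registered handler. The message shape (an `action` name plus `args`), the `dispatch` function, and the `fetch_page`/`score_page` handlers are hypothetical illustrations. They are not part of the webrecon package.

```python
import json
from typing import Callable, Dict

# Hypothetical registry mapping an action name to the function that performs it.
HANDLERS: Dict[str, Callable[[dict], str]] = {}


def handler(action: str):
    """Register the decorated function as the handler for `action`."""
    def register(func: Callable[[dict], str]) -> Callable[[dict], str]:
        HANDLERS[action] = func
        return func
    return register


@handler("fetch_page")
def fetch_page(args: dict) -> str:
    # Placeholder: a real worker would download args["url"] here.
    return f"fetched {args['url']}"


@handler("score_page")
def score_page(args: dict) -> str:
    # Placeholder: a real worker would compare the page against a corpus here.
    return f"scored {args['url']}"


def dispatch(raw_message: str) -> str:
    """Decode a JSON message and run the handler named by its 'action' field."""
    message = json.loads(raw_message)
    action = message.get("action")
    if action not in HANDLERS:
        raise ValueError(f"unknown action: {action!r}")
    return HANDLERS[action](message.get("args", {}))


if __name__ == "__main__":
    print(dispatch('{"action": "fetch_page", "args": {"url": "https://example.com"}}'))
```

A registry keeps each worker generic. Adding a new kind of task means registering one more handler rather than changing the dispatch loop.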

Local Setup

  • install minikube

  • install kubectl

      $ brew install minikube
      $ brew install kubernetes-cli
      $ git clone <this repo>
      $ cd <repo directory>
      $ minikube start
      $ kubectl create -f deployments.yaml
      $ kubectl get deployments
      $ kubectl expose deployment <deployment name from the previous command>

Alternatively, run it in plain Docker using docker-compose.


For each app in the system, we create the platform's deployment resource: Kubernetes gets a Deployment object, ECS gets an AWS::ECS::Service resource, and so on. These services do different things and all of them are required, but they don't need to run at the same time; each one just needs to run at some point.
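Because each service only needs to run at some point, a durable queue between services lets one finish and exit before the next starts. The sketch below uses a local SQLite table as a stand-in for a real queue such as SQS. The `run_loader`/`run_worker` functions and the message shape are hypothetical, loosely named after the job_loader and pagesim apps. They are not those apps' actual code.

```python
import json
import os
import sqlite3
import tempfile
from contextlib import closing
from typing import List

# Stand-in for a durable queue such as SQS: one SQLite table of pending jobs.
SCHEMA = "CREATE TABLE IF NOT EXISTS jobs (id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"


def run_loader(db_path: str, urls: List[str]) -> None:
    """Producer (hypothetical, job_loader-like): enqueue one job per URL, then exit."""
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commits on success, rolls back on error
            conn.execute(SCHEMA)
            conn.executemany(
                "INSERT INTO jobs (body) VALUES (?)",
                [(json.dumps({"action": "score_page", "args": {"url": u}}),) for u in urls],
            )


def run_worker(db_path: str) -> List[str]:
    """Consumer (hypothetical, pagesim-like): drain whatever jobs are waiting, then exit."""
    processed = []
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:
            conn.execute(SCHEMA)
            rows = conn.execute("SELECT id, body FROM jobs ORDER BY id").fetchall()
            for job_id, body in rows:
                message = json.loads(body)
                processed.append(message["args"]["url"])  # real work would happen here
                conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
    return processed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "jobs.db")
        run_loader(db, ["https://example.com/a", "https://example.com/b"])  # producer runs and exits
        # Any amount of time can pass here; the worker does not need to be running yet.
        print(run_worker(db))  # ['https://example.com/a', 'https://example.com/b']
        print(run_worker(db))  # [] because the queue is already drained
```

The producer and consumer never share a process or overlap in time. The queue is the only thing they have in common, which is what lets each service be scheduled independently.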

The services include the Recon, job_loader, and pagesim apps.

TODO


  • Create Kubernetes deployments for the job_loader and pagesim apps.
  • Improve the corpus.
    • The current corpus is a Wikipedia dump. The Recon app needs to be configured to work against several different domains, or someone needs to point me to better data sets. The PageSims model will perform poorly until it has good documents to compare against, and Wikipedia isn't a good fit for every domain.
  • Improve the README.
  • Think harder about switching from Capybara to Twisted for the WebRecon project.
  • Improve the Kubernetes orchestration.
  • Create cloud resource templates (e.g. CloudFormation templates for a cluster of EC2 instances).
    • AWS:
      • An SQS queue; nothing special
      • N EC2 instances; start with one or two
      • An S3 bucket for artifacts such as the corpora and the model(s)
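The AWS resource list above could be expressed as a CloudFormation template. Here is a minimal sketch that generates a JSON template from Python. The logical IDs (`JobQueue`, `ArtifactBucket`, `WorkerInstance0`, ...), the parameter names `WorkerImageId`/`WorkerInstanceType`, and the `t3.micro` default are hypothetical choices. The instances get no networking, security group, or IAM configuration.

```python
import json


def build_template(instance_count: int = 2) -> dict:
    """Return a CloudFormation template (as a dict) with one SQS queue,
    one S3 bucket, and `instance_count` EC2 instances."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    resources = {
        # Queue that carries job messages between the services.
        "JobQueue": {"Type": "AWS::SQS::Queue"},
        # Bucket for artifacts such as the corpora and the model(s).
        "ArtifactBucket": {"Type": "AWS::S3::Bucket"},
    }
    # Emit one EC2 resource per requested instance.
    for i in range(instance_count):
        resources[f"WorkerInstance{i}"] = {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": {"Ref": "WorkerImageId"},
                "InstanceType": {"Ref": "WorkerInstanceType"},
            },
        }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Recon cluster sketch: SQS queue, S3 bucket, EC2 workers",
        "Parameters": {
            "WorkerImageId": {"Type": "AWS::EC2::Image::Id"},
            "WorkerInstanceType": {"Type": "String", "Default": "t3.micro"},
        },
        "Resources": resources,
        "Outputs": {
            # Ref on an SQS queue yields its URL; Ref on an S3 bucket yields its name.
            "QueueUrl": {"Value": {"Ref": "JobQueue"}},
            "BucketName": {"Value": {"Ref": "ArtifactBucket"}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template(2), indent=2))
```

Generating the template in code turns "N EC2s" into a single argument instead of hand-copied resource blocks. The printed JSON could then be deployed with `aws cloudformation deploy --template-file <file> --stack-name <name> --parameter-overrides WorkerImageId=<ami id>`.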