subhh/discovery-distribution

Install script for TYPO3, subugoe/find extension and subhh/discovery extension for Hamburg Open Science project


Keywords
TYPO3 CMS, typo3find HamburgOpenScience Discovery Appwerft
License
GPL-2.0+

Documentation

HOS-Discovery


Introduction

This repository describes the use of the TYPO3 extension discovery. The extension builds on the typo3find extension of SUB Göttingen (subugoe) and implements the Schaufenster (showcase) for the HamburgOpenScience project "HOS-discovery".

Here are some screenshots:

Search with autocompletion

Heatmap with geolocations

Interactive DDC tree

Wordcloud of subjects

Installation

Solr

Solr requires an installed Java runtime: Java 7 or later in general, and Java 8 or later for Solr 6 and newer, including the 7.3.1 release used below.

Java

The installation of Java differs by platform:

UBUNTU

On Ubuntu we can use the package manager APT (Advanced Packaging Tool) to do this. To install Java, run the following command in a shell:

sudo apt-get update
sudo apt-get -y install default-jre

CENTOS

On CentOS we can use the package manager yum (Yellowdog Updater, Modified) to install Java. The default-jre package exists only on Debian-based systems, so install an OpenJDK package instead:

sudo yum install java-1.8.0-openjdk

Testing Java

Once Java is installed, you can verify it by running the following command (root privileges are not needed):

java -version

Expected output:

openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-b15)
OpenJDK 64-Bit Server VM (build 25.111-b15, mixed mode)
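
Older Java releases report their version as 1.x, so a scripted check needs a little parsing. The following is a minimal sketch, assuming Bash; the function name java_major is hypothetical and not part of this repository:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce a Java version string to its major version.
# "1.8.0_111" (legacy scheme) -> 8, "11.0.2" -> 11
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;          # strip the legacy "1." prefix
  esac
  printf '%s\n' "${v%%[._-]*}"   # keep everything before the first ".", "_" or "-"
}

# Example with the version string from the sample output above.
java_major "1.8.0_111"
```

To feed it the installed version, note that java -version writes to standard error: java_major "$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p')"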

Downloading and Installing Apache Solr

First you will need to download the latest version of Apache Solr from the Apache website. You can easily download it using the wget command:

wget http://apache.org/dist/lucene/solr/7.3.1/solr-7.3.1.tgz

Adjust the version number as needed. The available versions are listed at http://apache.org/dist/lucene/solr/

Once the download is completed, extract the service installation file with the following command:

tar xzf solr-7.3.1.tgz solr-7.3.1/bin/install_solr_service.sh --strip-components=2

Don't forget to adjust the version number here as well.
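
Since the version number appears in several commands, it can help to keep it in one variable. A minimal sketch, assuming Bash; the helper names solr_tarball and solr_url are hypothetical, and the URL pattern is the one used in the wget command above:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: derive file name and URL from a single version value.
SOLR_VERSION="${SOLR_VERSION:-7.3.1}"

solr_tarball() { printf 'solr-%s.tgz\n' "$1"; }
solr_url()     { printf 'http://apache.org/dist/lucene/solr/%s/%s\n' "$1" "$(solr_tarball "$1")"; }

# Print the commands instead of running them, so they can be reviewed first.
echo "wget $(solr_url "$SOLR_VERSION")"
echo "tar xzf $(solr_tarball "$SOLR_VERSION") solr-$SOLR_VERSION/bin/install_solr_service.sh --strip-components=2"
```

Running the script with another version, e.g. SOLR_VERSION=7.4.0, prints the matching commands without touching the system.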

Install Solr as a service by running the following command:

sudo bash ./install_solr_service.sh solr-7.3.1.tgz

Expected output:

We recommend installing the 'lsof' command for more stable start/stop of Solr

Extracting solr-7.3.1.tgz to /opt

Installing symlink /opt/solr -> /opt/solr-7.3.1 ...

Installing /etc/init.d/solr script ...

Installing /etc/default/solr.in.sh ...

Service solr installed.
Customize Solr startup configuration in /etc/default/solr.in.sh
NOTE: Please install lsof as this script needs it to determine if Solr is listening on port 8983.

Started Solr server on port 8983 (pid=6426). Happy searching!

Found 1 Solr nodes:

Solr process 6426 running on port 8983
{
 "solr_home":"/var/solr/data",
 "version":"7.3.1 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2018-11-02 19:52:42",
 "startTime":"2016-11-30T06:49:18.927Z",
 "uptime":"0 days, 0 hours, 0 minutes, 18 seconds",
 "memory":"85.4 MB (%17.4) of 490.7 MB"}

We will fix this warning later.

You can start, stop, or restart the Solr service with the following commands:

sudo service solr start
sudo service solr stop
sudo service solr restart

More about solr installation

Avoiding crashes

In /etc/security/limits.conf you can add these lines:

solr             soft    nofile          500000
solr             hard    nofile          500000
solr             soft    nproc           65000
solr             hard    nproc           65000

These lines raise the limits for open files and processes of the solr user and thereby suppress the warning that Solr prints at startup.
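
To verify that the raised limits are actually in effect, the values reported by ulimit can be compared with the configured ones. A minimal sketch, assuming Bash; limit_ok is a hypothetical helper, and the thresholds are simply the values from the lines above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a limit reported by ulimit meets a required minimum.
limit_ok() {
  local current="$1" required="$2"
  [ "$current" = "unlimited" ] || [ "$current" -ge "$required" ]
}

# Check the open-file and process limits of the current shell.
limit_ok "$(ulimit -n)" 500000 || echo "open file limit too low: $(ulimit -n)"
limit_ok "$(ulimit -u)" 65000  || echo "process limit too low: $(ulimit -u)"
```

The limits apply per user and per session, so the check is only meaningful in a fresh session of the solr user.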

Creating the Solr index

sudo -u solr $(grep SOLR_INSTALL_DIR= /etc/init.d/solr | sed 's/\"//g' | sed 's/SOLR_INSTALL_DIR=//')/bin/solr create -c HOS

Hint:

The command substitution $(……) reads the value of SOLR_INSTALL_DIR from the init.d script (in most cases /opt/solr).
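
The extraction step can also be written as a small function, which makes it easier to test on its own. A minimal sketch, assuming Bash and that the init script contains a line of the form SOLR_INSTALL_DIR="/opt/solr"; the function name solr_install_dir is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read SOLR_INSTALL_DIR from an init script.
# The default path is the script installed by install_solr_service.sh.
solr_install_dir() {
  local script="${1:-/etc/init.d/solr}"
  # Accept the value with or without surrounding double quotes.
  sed -n 's/^SOLR_INSTALL_DIR="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$script" | head -n 1
}

# Demonstration on a throwaway file instead of the real init script.
demo="$(mktemp)"
printf 'SOLR_ENV="/etc/default/solr.in.sh"\nSOLR_INSTALL_DIR="/opt/solr"\n' > "$demo"
solr_install_dir "$demo"
rm -f "$demo"
```

With this helper the index could be created with: sudo -u solr "$(solr_install_dir)/bin/solr" create -c HOS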

You can test with:

curl http://localhost:8983/solr/admin/cores

This command shows the state of the server in JSON format.
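
Directly after a restart Solr may need a few seconds before it answers. In scripts it can therefore be useful to repeat the check until it succeeds. A minimal sketch, assuming Bash; wait_until is a hypothetical generic retry helper, not a Solr tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" >/dev/null 2>&1 && return 0               # success: stop waiting
    [ "$i" -lt "$attempts" ] && sleep "$delay"    # pause before the next attempt
  done
  return 1
}
```

For the check above this could be called as: wait_until 30 1 curl -sf http://localhost:8983/solr/admin/cores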

After the successful installation of Java and Solr, we install LAMP as the foundation for TYPO3.

We tested the installation on Ubuntu and CentOS. Since setting up LAMP is a standard task for a developer, we only point to the existing recipes for Ubuntu and CentOS.

Completion criteria:

  • a MySQL server is running with a database typo3, an admin user, and a typo3 user,
  • PHP 7.2 is installed with mysql, xml, json, gd, imagemagick, png, jpeg, gif,
  • the HTTP server is up and running.

Apache

Some modifications of PHP

TYPO3 is resource-hungry, so it makes sense to raise some PHP limits. You can do this with the script below.

sudo sed -i 's/max_execution_time = 30/max_execution_time = 240/' /etc/php/7.2/apache2/php.ini
sudo sed -i 's/; max_input_vars = 1000/max_input_vars = 1500/' /etc/php/7.2/apache2/php.ini
sudo sed -i 's/upload_max_filesize = 2M/upload_max_filesize = 8M/' /etc/php/7.2/apache2/php.ini

Alternatively, edit the values manually in php.ini (on CentOS: /etc/php.ini). Don't forget to restart the HTTP server afterwards:

sudo service apache2 restart
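
The sed commands above only match while php.ini still contains the exact default values. The following is a minimal sketch of a variant that sets a key regardless of its current value, assuming Bash; set_php_ini is a hypothetical helper, and the path of php.ini is passed as an argument because it differs between systems:

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "key = value" in an ini file.
# Replaces an existing (possibly commented-out) line or appends a new one.
# Assumes that key and value contain no "|" or "&" characters.
set_php_ini() {
  local key="$1" value="$2" file="$3" tmp
  tmp="$(mktemp)" || return 1
  if grep -Eq "^[;[:space:]]*${key}[[:space:]]*=" "$file"; then
    sed -E "s|^[;[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file" > "$tmp"
  else
    cat "$file" > "$tmp"
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
  fi
  # Write back with cat so that owner and permissions of the file are kept.
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Demonstration on a throwaway file instead of the real php.ini.
ini="$(mktemp)"
printf 'max_execution_time = 30\n; max_input_vars = 1000\n' > "$ini"
set_php_ini max_execution_time 240 "$ini"
set_php_ini max_input_vars 1500 "$ini"
cat "$ini"
rm -f "$ini"
```

On the real file the function has to run with root privileges, and the HTTP server still needs the restart described above.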

If this step was successful, we can prepare the Apache server to deliver our content.

.htaccess

The URLs generated by TYPO3 are very long. For simple URLs of the document detail pages we suggest the realurl extension by Martin Poelstra, Kasper Skårhøj, and Дмитрий Дулепов. In the next TYPO3 version this functionality will be part of the core. TYPO3 generates the internal links, but the routing must be done by the web server. For this you need a .htaccess file in the document root. Copy this file into your document root, in our case /var/www/servers/openscience.hamburg.de/web/.

robots.txt

User-Agent: * 
Disallow: / 
Allow: /ID/

Copy this file to your document root.

Apache VirtualHost

For this we create a file named openscience.hamburg.de.conf in the folder /etc/apache2/sites-available/ with this content:

<VirtualHost *:80>
        DocumentRoot /var/www/servers/openscience.hamburg.de/web
        ServerName hosdev.sub.uni-hamburg.de
        Options -Indexes
        DirectoryIndex index.php
        # Basic Auth for solrAdmin:
        <Location /solrAdmin>
                AuthType Basic
                AuthName "Restricted Files"
                AuthBasicProvider file
                AuthUserFile "/etc/apache2/.htpassword"
                Require user solradmin
        </Location>
        ProxyPreserveHost On
        ProxyRequests Off
        # Tunneling for solrAdmin:
        ProxyPass /solrAdmin http://localhost:8983/solr
        ProxyPassReverse /solrAdmin http://localhost:8983/solr
</VirtualHost>

This configuration supports only HTTP (without SSL). In production it makes sense to enable SSL, either in the Apache configuration or in the load balancer.

Firewall

The firewall (ufw) allows only ports 22 and 80.
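
Such a rule set could be created as sketched below. This is only an illustration, assuming Bash and ufw's default policy of denying other incoming connections; the wrapper run and the function firewall_rules are hypothetical. By default the commands are only printed, so they can be reviewed before they are applied:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print a command when DRY_RUN=1 (the default), otherwise run it with sudo.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf 'sudo %s\n' "$*"
  else
    sudo "$@"
  fi
}

firewall_rules() {
  run ufw allow 22/tcp   # SSH
  run ufw allow 80/tcp   # HTTP
  run ufw enable         # activate the rule set
}

# Dry run by default: only prints the three commands.
firewall_rules
```

Setting DRY_RUN=0 executes the commands. The Solr port 8983 is deliberately not opened, because Solr is reached through the Apache proxy.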

Requests beginning with /solrAdmin are tunneled to the native Solr port 8983. With the htpasswd tool we can add a user to /etc/apache2/.htpassword.

Activating the VirtualHost

To activate the configuration we set a symlink:

sudo ln -s /etc/apache2/sites-available/openscience.hamburg.de.conf /etc/apache2/sites-enabled/openscience.hamburg.de.conf

or, on CentOS:

sudo ln -s /etc/httpd/sites-available/openscience.hamburg.de.conf /etc/httpd/sites-enabled/openscience.hamburg.de.conf
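
Both variants differ only in the configuration root. A minimal sketch of a helper that covers both and can be repeated safely, assuming Bash; the function name enable_site is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: link a site config from sites-available into sites-enabled.
# $1 = Apache config root (e.g. /etc/apache2 or /etc/httpd), $2 = config file name
enable_site() {
  local root="$1" conf="$2"
  if [ ! -f "$root/sites-available/$conf" ]; then
    echo "missing: $root/sites-available/$conf" >&2
    return 1
  fi
  mkdir -p "$root/sites-enabled"
  # -f replaces an existing link, so the call can be repeated without error.
  ln -sfn "$root/sites-available/$conf" "$root/sites-enabled/$conf"
}
```

Run as root, the call would be enable_site /etc/apache2 openscience.hamburg.de.conf. On Ubuntu the a2ensite tool shipped with Apache does the same job.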

Creating the solr-admin user

This command:

sudo htpasswd -c /etc/apache2/.htpassword solradmin

creates a new file .htpassword in the Apache configuration root (the VirtualHost section above refers to it) and adds a user solradmin. On CentOS this folder is named /etc/httpd/.../.

Now we can access the admin UI via a URL like http://myserver.com/solrAdmin.

Testing Apache (esp. PHP 7.2)

For testing purposes you can place a small file named info.php with this content:

<?php phpinfo(); ?>

into the document root. This script may be useful:

sudo mkdir -p /var/www/servers/openscience.hamburg.de/web;\
echo '<?php phpinfo(); ?>' | sudo tee /var/www/servers/openscience.hamburg.de/web/info.php > /dev/null;\
sudo chown -R www-data:www-data /var/www/servers/*

Now you can open the website in a browser and call info.php. There you can check the PHP version and other details such as the MySQL client.
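
In addition to the info page, the loaded PHP modules can be checked on the command line with php -m. The following is a minimal sketch, assuming Bash; missing_modules is a hypothetical helper, and the module names in the example (mysqli, xml, json, gd) are our reading of the completion criteria above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a module list (one name per line) from stdin
# and print every required module that is not in the list.
missing_modules() {
  local loaded m
  loaded="$(cat)"
  for m in "$@"; do
    # -i: ignore case, -x: match the whole line, -F: fixed string
    printf '%s\n' "$loaded" | grep -qixF -- "$m" || printf '%s\n' "$m"
  done
}

# Demonstration with an invented module list.
printf 'Core\ngd\njson\n' | missing_modules mysqli xml json gd
```

On the server the call would be php -m | missing_modules mysqli xml json gd. Note that the command line PHP may use a different php.ini than the Apache module, so the info page remains the authoritative check.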

TYPO3

Installing composer

For installing TYPO3 and the extensions we use composer.

First we install curl:

sudo apt-get install curl   # Ubuntu

or

sudo yum install curl   # CentOS

Next, download the installer:

curl -sS https://getcomposer.org/installer | php

and move the composer.phar file:

sudo mv composer.phar /usr/local/bin/composer

Use the composer command to test the installation. If Composer is installed correctly, the server will respond with a long list of help information and commands:

user@localhost:~# composer
   ______
  / ____/___  ____ ___  ____  ____  ________  _____
 / /   / __ \/ __ `__ \/ __ \/ __ \/ ___/ _ \/ ___/
/ /___/ /_/ / / / / / / /_/ / /_/ (__  )  __/ /
\____/\____/_/ /_/ /_/ .___/\____/____/\___/_/
                /_/
Composer version 1.3.2 2017-01-27 18:23:41

Usage:
  command [options] [arguments]

Options:
  -h, --help                     Display this help message
  -q, --quiet                    Do not output any message

Installation of TYPO3 with all needed extensions

The Apache server listens on port 80 and expects the project at

/var/www/servers/openscience.hamburg.de

We change to the parent of this folder and start:

cd /var/www/servers/ &&\
sudo rm -rf openscience.hamburg.de/ &&\
sudo composer create-project -vvv subhh/discovery-distribution openscience.hamburg.de dev-master &&\
sudo chown -R www-data:www-data * &&\
sudo touch openscience.hamburg.de/web/FIRST_INSTALL

Potential pitfall

If your Apache runs under a different user, you have to modify the chown command (line 4) accordingly.
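
The user name can also be detected instead of hard-coded. A minimal sketch, assuming Bash and that Apache runs as www-data (usual on Ubuntu) or apache (usual on CentOS); the function name first_existing_user is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first of the given user names that exists on this system.
first_existing_user() {
  local u
  for u in "$@"; do
    if id -u "$u" >/dev/null 2>&1; then
      printf '%s\n' "$u"
      return 0
    fi
  done
  return 1
}

# Try the usual candidates and report the result.
WEB_USER="$(first_existing_user www-data apache)" || echo "no known web server user found" >&2
echo "web server user: ${WEB_USER:-unknown}"
```

The detected value could then replace the fixed name in the chown command, e.g. sudo chown -R "$WEB_USER:$WEB_USER" openscience.hamburg.de.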

Please check that you use the right DocumentRoot inside your VirtualHost section. In our case it is:

/var/www/servers/openscience.hamburg.de/web

This web subfolder is new in Composer-controlled TYPO3 installations and avoids git issues.

Don't forget to restart the server with:

sudo service httpd restart

after editing the Apache configuration (on Ubuntu the service is named apache2).

Configuration of TYPO3

In a browser of your choice, open the page, e.g. http://openscience.hamburg.de

System environment check

The server redirects to typo3/sysext/install/Start/Install.php and asks for some data:

Select database

After clicking, enter your DB credentials into the form:

After the next click you have to select an empty database. If the database is not empty (e.g. because you restarted the installation), drop and recreate it (see the chapter about the database).

Create user and import base data

After the next click you have to create an admin user:

Done!

After this step you can start the TYPO3 backend.

Activating extensions

First you have to activate three extensions:

  • scriptmerger
  • find
  • discovery

Potential pitfall during extension activation

In some cases the activation doesn't work: the orange progress bar grows very slowly to the right and stops without a message, and nothing shows up in the error logs. In this case you can open typo3conf/PackageStates.php and add this snippet to the end of the array:

'find' => [
    'packagePath' => 'typo3conf/ext/find/',
],
'discovery' => [
    'packagePath' => 'typo3conf/ext/discovery/',
],
'scriptmerger' => [
    'packagePath' => 'typo3conf/ext/scriptmerger/',
],

After this you can check in the extension manager of the backend whether the three extensions are activated.
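
Whether the three entries are present in the file can also be checked from the shell. A minimal sketch, assuming Bash and the 'key' => [ notation shown above; missing_packages is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the extension keys that do not appear in PackageStates.php.
# $1 = path to PackageStates.php, remaining arguments = extension keys
missing_packages() {
  local file="$1" key
  shift
  for key in "$@"; do
    # Look for a line fragment such as: 'find' =>
    grep -q "'$key' *=>" "$file" || printf '%s\n' "$key"
  done
}
```

Called in the web folder as missing_packages typo3conf/PackageStates.php scriptmerger find discovery, it prints nothing when all three keys are listed. This only checks the file content; the extension manager remains the authoritative view.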

Adding static templates

In the section WEB/TEMPLATE you have to add the static templates from the extensions. Click on Edit the whole template record:

Then on Includes

Now you can add them by clicking the entries on the right side of the table (Available Items):

Don't forget to save. The save button is at the top of the section.

Adding plugin to page

In the top section WEB/Page, click on the page under the root element and then choose the first tab, named General. On this tab you can enter the title of the page under Header. In our case: "Hamburg Open Science: Discovery".

In the tab Plugin you can select Find.

Adding setup to page template

plugin.tx_scriptmerger {
    javascript {
        compress.enable = 0
        minify.enable = 1
        merge.enable = 1
    }
    css {
        merge.enable = 1
        compress.enable = 0
    }
}

plugin.tx_find.settings.connections.default.options {
    host = localhost
    port = 8983
    path = /solr/HOS
}
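
Before testing in the browser, it can be useful to check that Solr answers under exactly these connection values. A minimal sketch, assuming Bash and Solr's standard select handler; the function name solr_select_url is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a query URL from the same host, port and path
# that are configured in the TypoScript above.
solr_select_url() {
  local host="$1" port="$2" path="$3"
  # q=*:* matches all documents, rows=0 returns only the hit count.
  printf 'http://%s:%s%s/select?q=*:*&rows=0\n' "$host" "$port" "$path"
}

solr_select_url localhost 8983 /solr/HOS
```

The printed URL can be requested on the server with curl -s "$(solr_select_url localhost 8983 /solr/HOS)". An error response suggests that the core name or the port in the TypoScript does not match the Solr installation.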

Description of some features

The Discovery app uses an extended version of subugoe/find. Most of the facet functionality is implemented in JavaScript inside the schaufenster extension.

Searchfield

The logic is implemented in file Resources/Public/Javascript/schaufenster.searchfield.js.

The search field consists of three parts:

  • input field(s)
  • input selector
  • submit button

Input field(s)

Every field is configured in TypoScript (setup.txt).

Input selector

The original HTML element SELECT is difficult to style. Therefore we use a custom element following this instruction: https://www.w3schools.com/howto/howto_custom_select.asp. The selector controls the visibility of the input fields. When the focus changes, the previously used field is emptied. After a reload the selector is preselected.

Submit button

Clicking the Submit button submits the form.

Heatmap with geolocation of publications

Obviously the Solr query returns more hits than a common map API can process, since the API limits the number of markers on a map. There are several render modes for this situation; the best known is a cluster manager. In our case we have only a few geolocations but a large number of hits per location, so a heatmap is a good solution. The model consists of a collection of geolocations with an optional value for every location.
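
The model can be illustrated with a small aggregation that counts the hits per location. This is only a sketch in Bash and not the extension's JavaScript code; the function name heat_points, the input format (one lat,lon pair per line, without spaces) and the sample coordinates are assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn one "lat,lon" line per hit into "lat,lon,count" lines.
heat_points() {
  # sort groups identical locations, uniq -c counts them,
  # awk moves the count to the end of the line.
  sort | uniq -c | awk '{ print $2 "," $1 }'
}

# Invented sample coordinates: two hits at one location, one hit at another.
printf '53.55,9.99\n53.55,9.99\n52.52,13.40\n' | heat_points
```

Each output line then corresponds to one geolocation with its value, which is the shape of data a heatmap layer typically expects.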

The UI has two parts: a "thumbnail" in the facet column

and a large version in a lightbox overlay:

The project uses Leaflet as framework and API. This is an open source library for handling slippy tile maps. Most mapping providers (such as Google, Mapbox, Bing, OSM) work with this technology: the world map is divided into a fixed grid of tiles (in most cases 256x256 px) for every zoom level. Another technology (WMS) renders the maps on the server in real time. The most modern approach renders on the client, and only vector data is transferred from server to client.

Wordcloud with subjects

The script /Resources/Public/Javascript/schaufenster.wordcloud.js reads all subjects from the subject facet and replaces the old DOM part with the new one. The d3 library used here is a singleton, so it is not possible to render both versions (small and large) in the same namespace. The large version is therefore realized with an iframe, which creates a new HTML page.

Creators as Donut

The simple logic is implemented in /Resources/Public/Javascript/schaufenster.publisher.js. Basically the old DOM part is replaced with the new one.

DDC as (file-) tree

The Dewey Decimal Classification (DDC) is a library classification system that organizes knowledge hierarchically: ten main classes, each divided into ten divisions, each of which is divided into ten sections. The first 3 levels are licence free.

The facet ddc contains only the DDC numbers. Resolving these numbers to labels happens in the JavaScript layer. The server delivers only a simple list, which is transformed into a tree model.

The script /Resources/Public/Javascript/schaufenster.ddc.js replaces the original DOM part with a graphical tree.
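
The transformation from list to tree can be illustrated as follows. This is only a sketch in Bash and awk, not the code of the extension; it assumes that the facet delivers three-digit DDC numbers, and the function name ddc_tree is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a flat list of three-digit DDC numbers as an indented tree.
# Missing parents (main class x00, division xy0) are derived from the number itself.
ddc_tree() {
  sort -u | awk '
    /^[0-9][0-9][0-9]$/ {
      cls = substr($0, 1, 1) "00"   # main class, e.g. 500
      dvs = substr($0, 1, 2) "0"    # division, e.g. 510
      if (!(cls in seen))               { print cls;       seen[cls] = 1 }
      if (dvs != cls && !(dvs in seen)) { print "  " dvs;  seen[dvs] = 1 }
      if ($0 != dvs && !($0 in seen))   { print "    " $0; seen[$0] = 1 }
    }'
}

# Two sample numbers: division 510 and section 516.
printf '516\n510\n' | ddc_tree
```

The indentation depth stands for the tree level. A division that carries the same number as its main class (such as 000) is merged with it in this simplified model.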


Adding new facet components

Currently all new components are realized with pure jQuery. This page describes the clean, TYPO3-conformant way.

Working environment

This recipe gives some details on how to work remotely with standard UI programs via sshfs.