Occupations Classification

This vignette details the hierarchical relationship between ESCO and ISCO and explains how the labourR package uses that relationship to suggest occupations for multilingual free-text vacancy data.

ESCO - ISCO relationship

ESCO is the multilingual classification of European Skills, Competences, Qualifications and Occupations. ESCO works as a dictionary, describing, identifying and classifying professional occupations, skills, and qualifications relevant for the EU labour market and education and training. Those concepts and the relationships between them can be understood by electronic systems, which allows different online platforms to use ESCO for services like matching jobseekers to jobs on the basis of their skills, suggesting training to people who want to reskill or upskill etc.

ISCO, the International Standard Classification of Occupations, is a four-level classification of occupation groups managed by the International Labour Organization (ILO). Its structure follows a grouping by education level. The two latest versions of ISCO are ISCO-88 (dating from 1988) and ISCO-08 (dating from 2008).

In ESCO, each occupation is mapped to exactly one ISCO-08 code. ISCO-08 can therefore be used as a hierarchical structure for the occupations pillar. ISCO-08 provides the top four levels for the occupations pillar. ESCO occupations are located at level 5 and lower.
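Because ISCO-08 codes are hierarchical, the group at any higher level can be read off as a prefix of the four-digit unit group code. The following base R sketch illustrates the idea. The helper isco_prefix is hypothetical and not part of labourR, and the code 2341 (unit group Primary school teachers) is used purely as an example.

```r
# Hypothetical helper (not part of labourR): truncate a four-digit ISCO-08
# unit group code to the requested hierarchical level (1 to 4).
# Codes are handled as character strings so that leading zeros are preserved.
isco_prefix <- function(code, level) {
  stopifnot(level %in% 1:4)
  substr(as.character(code), 1, level)
}

unit_group <- "2341"  # ISCO-08 unit group: Primary school teachers

# Walk up the hierarchy: major, sub-major, minor and unit group.
groups <- vapply(1:4, function(l) isco_prefix(unit_group, l), character(1))
names(groups) <- c("major", "sub_major", "minor", "unit")
print(groups)
```

An ESCO occupation mapped to unit group 2341 therefore also belongs to minor group 234, sub-major group 23 and major group 2.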

Fig.1 - ESCO is mapped to the 4th level of the ISCO hierarchical model.


To find more about ESCO and its relationship with ISCO, visit ESCOpedia, the online reference to the ESCO classification.

Mapping job titles

The goal of labourR is to map multilingual free-text descriptions of occupations, such as a job title in a Curriculum Vitae, to the existing hierarchical ontologies of the ESCO and ISCO classifications, and to showcase their importance in understanding and analyzing the labour market. Computations are vectorized, and the data.table package is used for high performance and memory efficiency.

The following explains how the classifier maps free-text vacancy data onto the official ESCO and ISCO ontologies and takes advantage of their hierarchy.

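The occupations classifier takes as input a corpus of free-text entries with their identifiers, the language of the text (lang), the ISCO level at which the suggestion is reported (isco_level), and the number of ESCO occupations to retrieve (num_leaves).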

First, the input text is cleansed and tokenized. The tokens are then matched against the ESCO occupations vocabulary, which is built from the labels of the occupations. They are joined with the weighted tokens of the ESCO occupations, and the sum of the tf-idf scores is used to retrieve the suggested ontologies. More precisely, the suggested ESCO occupations are retrieved by solving the optimization problem

\[\arg \max_d \left\{ \vec{u}_{binary} \cdot \vec{u}_d \right\}\]

where \(\vec{u}_{binary}\) is the binary representation of a query in the ESCO vocabulary space and \(\vec{u}_d\) is the normalized vector of ESCO occupation \(d\), weighted by the tf-idf numerical statistic. If an ISCO level is specified, the k-NN algorithm determines the suggested occupation by a plurality vote among its neighbors at the corresponding hierarchical level.
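The scoring and voting steps can be illustrated with a small base R sketch. It is not the labourR implementation: the four occupations, their ISCO codes, the tokenizer, the idf formula and the unweighted vote are all assumptions chosen to keep the example self-contained.

```r
# Toy stand-in for the ESCO occupations table; labels and codes are illustrative.
occupations <- data.frame(
  label = c("primary school teacher", "early childhood teacher",
            "secondary school teacher", "data scientist"),
  isco = c("2341", "2342", "2330", "2511"),
  stringsAsFactors = FALSE
)

# Cleanse and tokenize: lower-case, keep ASCII letters only, split on spaces.
tokenize <- function(x) {
  x <- gsub("[^a-z]+", " ", tolower(x))
  tokens <- strsplit(trimws(x), " +")[[1]]
  tokens[nzchar(tokens)]
}

docs <- lapply(occupations$label, tokenize)
vocab <- sort(unique(unlist(docs)))

# Term-frequency matrix: one row per occupation, one column per vocabulary token.
tf <- t(vapply(
  docs,
  function(d) as.numeric(table(factor(d, levels = vocab))),
  numeric(length(vocab))
))
colnames(tf) <- vocab

# tf-idf weights, then L2-normalize each occupation vector (the u_d vectors).
idf <- log(nrow(tf) / colSums(tf > 0))
tfidf <- sweep(tf, 2, idf, `*`)
u_d <- tfidf / sqrt(rowSums(tfidf^2))

# Binary query vector in the vocabulary space; unknown tokens are ignored.
query <- "Primary School Teacher (m/f)"
u_binary <- as.numeric(vocab %in% tokenize(query))

# Score every occupation by the dot product and take the arg max.
scores <- as.vector(u_d %*% u_binary)
best <- occupations$label[which.max(scores)]

# k-NN step: keep the num_leaves best matches and take an unweighted
# plurality vote over their ISCO groups at the requested level.
num_leaves <- 3
isco_level <- 3
neighbors <- order(scores, decreasing = TRUE)[seq_len(num_leaves)]
votes <- table(substr(occupations$isco[neighbors], 1, isco_level))
suggested_group <- names(votes)[which.max(votes)]

print(best)
print(suggested_group)
```

In this toy setting the three nearest occupations fall into groups 234, 233 and 234 at level 3, so the vote selects 234 even though one neighbor belongs to a different group.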

library(labourR)
library(data.table)
library(magrittr)

corpus <- data.table(
  id = 1:3,
  text = c(
    "Insegnante di scuola primaria",
    "Sales and marketing assistant manager",
    "Data Scientist"
  )
)

The package also provides language identification through the cld2 package, which is based on a Naive Bayes classifier.

corpus[, language := identify_language(text)]

With num_leaves set to 10 (the number of ESCO occupations retrieved) and isco_level set to 3, the classifier returns one suggested ISCO group per entry. It is run separately for each identified language:

languages <- unique(corpus$language)
suggestions <- lapply(languages, function(lang) {
  classify_occupation(
    corpus = corpus[language == lang],
    lang = lang,
    isco_level = 3,
    num_leaves = 10
  )
}) %>% rbindlist()

# Print the combined suggestions for all languages.
suggestions
#>    id iscoGroup                                      preferredLabel
#> 1:  1       234         Primary school and early childhood teachers
#> 2:  2       243 Sales, marketing and public relations professionals
#> 3:  3       251   Software and applications developers and analysts
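
To read the suggestions next to the original text, they can be joined back to the corpus on id. The sketch below uses plain data frames whose values are copied from the example and output above, so that it runs without labourR installed. The object names corpus_df, suggestions_df and result are introduced here for illustration only.

```r
# Input texts, copied from the example corpus above.
corpus_df <- data.frame(
  id = 1:3,
  text = c(
    "Insegnante di scuola primaria",
    "Sales and marketing assistant manager",
    "Data Scientist"
  ),
  stringsAsFactors = FALSE
)

# Suggested ISCO level 3 groups, copied from the printed output above.
suggestions_df <- data.frame(
  id = 1:3,
  iscoGroup = c("234", "243", "251"),
  preferredLabel = c(
    "Primary school and early childhood teachers",
    "Sales, marketing and public relations professionals",
    "Software and applications developers and analysts"
  ),
  stringsAsFactors = FALSE
)

# Left join on id, so entries without a suggestion would be kept with NA values.
result <- merge(corpus_df, suggestions_df, by = "id", all.x = TRUE)
print(result[, c("text", "iscoGroup", "preferredLabel")])
```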