Research in programming Wikidata/Business enterprise

This article studies the Wikidata objects of the type "business enterprise" (commercial organizations). With the help of SPARQL queries over these objects, the following tasks have been solved: the distribution of organizations by industry was shown as a bubble chart, the number of organizations per country was counted, and a graph of existing organizations and their subsidiaries was drawn. Conclusions were drawn regarding the completeness of Wikidata on this topic, including a map of the organizations of the world.

Instances of object "Business enterprise"

Using the following query, we can get a list of all commercial organizations.

#added 2017-02
#List of `instances of` "business enterprise"
SELECT ?item ?itemLabel
WHERE
{
    ?item wdt:P31 wd:Q4830453.   # instance of (P31) business enterprise (Q4830453)
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

SPARQL-query, 109383 Results

👍> The most complete and elaborated business enterprises on Wikidata are: Google, Apple, Microsoft

👎> Almost empty and uninformative business enterprises on Wikidata are: Pininfarina, ANHUI EXPRESSWAY COMPANY LIMITED, Futura et Marge

The defect of the resulting list is that some objects turn out to be nameless on Wikidata (No label defined). Let's try to get a list of organizations whose English "label" field is non-empty.

#List of `instances of` "business enterprise" only with a label.
SELECT ?item ?item_label
WHERE
{
    ?item wdt:P31 wd:Q4830453
    ; rdfs:label ?item_label. 

    FILTER (LANG(?item_label) = "en"). 
}

SPARQL-query, 74556 Results
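
The two result counts above differ by 109383 − 74556 = 34827, which suggests how many business enterprises had no English label at the time. As a minimal sketch (not part of the original study), this number could also be counted directly with `FILTER NOT EXISTS`; the variable name `?withoutEnglishLabel` is chosen here for illustration only.

```sparql
#Count of "business enterprise" instances without an English label (sketch)
SELECT (COUNT(?item) AS ?withoutEnglishLabel)
WHERE
{
    ?item wdt:P31 wd:Q4830453.
    # keep only items for which no English rdfs:label exists
    FILTER NOT EXISTS {
        ?item rdfs:label ?label.
        FILTER (LANG(?label) = "en")
    }
}
```

Since Wikidata changes constantly, the result today will not match the 2017 figures.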

Distribution of organizations by industry

Each organization specializes in some industry. To find out which industry is the most popular (that is, in which industry the most organizations work), we can build a diagram.

Type of result: bubble diagram.

Properties used: instance of (P31), industry (P452).

#enterprise industry ranking
#defaultView:BubbleChart
SELECT ?industry ?industryLabel (COUNT(*) AS ?count)
WHERE
{
    ?org wdt:P31 wd:Q4830453.   # instance of business enterprise
    ?org wdt:P452 ?industry.    # industry
    OPTIONAL {
        # English label of the industry, if there is one
        ?industry rdfs:label ?industryLabel
        FILTER (LANG(?industryLabel) = "en")
    }
}
GROUP BY ?industry ?industryLabel
ORDER BY DESC(?count) ASC(?industryLabel)

SPARQL query, 864 Results.

From this diagram (Fig. 1) we can determine the number of organizations involved in each industry. Based on the data obtained, we can build a table of the 5 most popular industries:

TOP5 most popular industries
Industry name            Quantity of organizations
automotive industry      1149
retail                   843
telecommunications       648
video game industry      633
manufacturing            506
 
Fig. 1: Diagram of organizations of the world by industry


Let's answer the question: which industries exist in Russia, and how many organizations work in each of them?

#enterprise industry ranking in Russia
#defaultView:BubbleChart
SELECT ?industry ?industryLabel (COUNT(*) AS ?count)
WHERE
{
    ?org wdt:P31 wd:Q4830453.   # instance of business enterprise
    ?org wdt:P452 ?industry.    # industry
    ?org wdt:P17 wd:Q159.       # country: Russia
    OPTIONAL {
        # English label of the industry, if there is one
        ?industry rdfs:label ?industryLabel
        FILTER (LANG(?industryLabel) = "en")
    }
}
GROUP BY ?industry ?industryLabel
ORDER BY DESC(?count) ASC(?industryLabel)

SPARQL-query, 60 Results.

TOP5 most popular industries in Russia
Industry name            Quantity of organizations
retail                   78
automotive industry      13
arms industry            10
aerospace industry       9
video game industry      9

It can be concluded that retail strongly dominates the other industries in Russia: 78 organizations work in retail, while the next industry (automotive industry) has only 13.

For comparison, we can build a list of the industries of some other country (for example, Norway).

#enterprise industry ranking in Norway
#defaultView:BubbleChart
SELECT ?industry ?industryLabel (COUNT(*) AS ?count)
WHERE
{
    ?org wdt:P31 wd:Q4830453.   # instance of business enterprise
    ?org wdt:P452 ?industry.    # industry
    ?org wdt:P17 wd:Q20.        # country: Norway
    OPTIONAL {
        # English label of the industry, if there is one
        ?industry rdfs:label ?industryLabel
        FILTER (LANG(?industryLabel) = "en")
    }
}
GROUP BY ?industry ?industryLabel
ORDER BY DESC(?count) ASC(?industryLabel)

SPARQL-query, 41 Results.

The dominant industry here is manufacturing (Q187939).
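
The queries for Russia and Norway differ only in the country item. As a sketch under that observation (not a query from the original study), both rankings could be obtained in one result set by listing the countries in a `VALUES` clause. The label service is used here instead of the manual `rdfs:label` filter, so the result counts may differ slightly from those reported above.

```sparql
#enterprise industry ranking for several countries at once (sketch)
SELECT ?countryLabel ?industryLabel (COUNT(?org) AS ?count)
WHERE
{
    VALUES ?country { wd:Q159 wd:Q20 }   # Russia, Norway
    ?org wdt:P31 wd:Q4830453.            # instance of business enterprise
    ?org wdt:P452 ?industry.             # industry
    ?org wdt:P17 ?country.               # country
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
GROUP BY ?country ?countryLabel ?industry ?industryLabel
ORDER BY ?countryLabel DESC(?count)
```

Adding another country only requires adding its item to the `VALUES` list.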

Number of organizations by country

The next query displays the number of commercial organizations in each country of the world.

Properties used: instance of (P31), country (P17).

SELECT ?countryLabel (COUNT(?org) AS ?count)
WHERE
{
    ?org wdt:P31 wd:Q4830453.
    ?org wdt:P17 ?country.

    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
GROUP BY ?country ?countryLabel
ORDER BY DESC(?count)

SPARQL-query, 198 Results

Organizations and their subsidiaries

It is necessary to build a graph of existing organizations and their subsidiaries. The query below is restricted to banks (Q22687).

Properties used: instance of (P31), subsidiary (P355).

#subsidiary graph
#defaultView:Graph
SELECT ?org ?orgLabel ?subsidiary ?subsidiaryLabel
WHERE
{
    ?org wdt:P31 wd:Q22687        # instance of bank (Q22687)
    ; rdfs:label ?item_label.     # the item must have a label...

    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    OPTIONAL { ?org wdt:P355 ?subsidiary. }   # subsidiary, if any
    FILTER (LANG(?item_label) = "en")         # ...in English
}

SPARQL-query, 428 Results(edges).

The resulting graph (Fig. 2) consists of hanging vertices and isolated vertices. The isolated vertices are organizations without subsidiaries, so the next step is to construct a graph where these vertices are absent.

 
Fig. 2: Diagram of subsidiaries of the world


#subsidiary graph without isolated vertices
#defaultView:Graph
SELECT ?org ?orgLabel ?subsidiary ?subsidiaryLabel
WHERE
{
    ?org wdt:P31 wd:Q22687
    ; rdfs:label ?item_label.
    ?org wdt:P355 ?subsidiary.    # mandatory: only organizations with a subsidiary

    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }

    FILTER (LANG(?item_label) = "en")
}

SPARQL-query, 55 Results(edges).
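
The difference between the two graphs can also be inspected as a table. The following is only a sketch, not a query from the original study: it counts the subsidiaries of each organization, so that isolated vertices appear with a count of 0. The English-label filter of the queries above is omitted for brevity, and the variable name `?subsidiaries` is chosen here for illustration.

```sparql
#number of subsidiaries per organization (sketch)
SELECT ?org ?orgLabel (COUNT(?subsidiary) AS ?subsidiaries)
WHERE
{
    ?org wdt:P31 wd:Q22687.                   # same class as in the graph queries
    OPTIONAL { ?org wdt:P355 ?subsidiary. }   # unbound for isolated vertices
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
GROUP BY ?org ?orgLabel
ORDER BY DESC(?subsidiaries)
```

`COUNT(?subsidiary)` ignores unbound values, which is why organizations without subsidiaries get 0 rather than 1.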

Fullness of the Wikidata

According to the English Wikipedia category "List of companies of Russia", English Wikipedia covers at least 208 commercial organizations in Russia. Note that what is listed there is a rating of the largest companies of Russia. It can be concluded that not even all big organizations are included in this list, let alone small and medium ones.

It is impossible to obtain up-to-date data on the number of commercial organizations, because their number grows every day and information about them is not publicly available. For example, the Unified State Register of Legal Entities (USRLE) provides data for a fee. [1]

According to the website of the Federal Tax Service (FTS), 420.5 thousand commercial organizations were entered in the state register as newly created in 2014. In June 2015, orders of the Ministry of Finance of Russia came into force under which data about existing organizations is no longer made public. The data can be provided only to state authorities, local self-government bodies and so on. Therefore, it is not possible to obtain reliable data on the number of existing organizations.

The completeness can instead be explored with the help of Wikidata. Recall the total number of organizations on Wikidata found at the beginning (about 110 000, and their number is constantly growing). A typical user who has a general understanding of organizations may be interested to see what an organization looks like or where it is located on the map.

To see how many organizations have an image (that is, the 'image' field is filled in), we can run the following query.

#List of organizations with image

SELECT ?org ?orgLabel ?image
WHERE
{
  ?org wdt:P31 wd:Q4830453. #instance of orgs
  ?org wdt:P18 ?image #has image
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en"}
}

SPARQL-query, 2913 Results.

It can be concluded that the number of organizations with an image is 2913. This is not much, which indicates that the information is incomplete.
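
With the figures reported so far, 2913 of 109383 organizations is about 2.7%. As a hedged sketch (not part of the original study), both numbers could be obtained in a single query; the variable names `?total`, `?haveImage` and `?withImage` are chosen here for illustration.

```sparql
#Total number of organizations and number of organizations with an image (sketch)
SELECT (COUNT(DISTINCT ?org) AS ?total)
       (COUNT(DISTINCT ?withImage) AS ?haveImage)
WHERE
{
  ?org wdt:P31 wd:Q4830453.        #instance of orgs
  OPTIONAL {
    ?org wdt:P18 ?image.           #has image
    BIND(?org AS ?withImage)       #bound only when an image exists
  }
}
```

`DISTINCT` is used because an organization with several images would otherwise be counted several times.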

Let's build a table of properties that users are likely to look up for organizations (which ones depends on what the user is interested in). The table is sorted by descending number of results.

Table of requests in Wikidata
Request name     Quantity of results
inception        30995
founded by       5722
subsidiary       3398
image            2913
location         577
motto            2
The results of this table indicate that the quantity of necessary information about organizations is very small, considering their total number on the Wikidata.
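
The rows of such a table could be produced by one query instead of one query per property. The sketch below is an assumption about how the counts might be obtained, not the method of the original study. The mapping of request names to properties is also assumed: inception (P571), founded by (P112), subsidiary (P355), image (P18), motto text (P1451). The "location" row is left out because the source does not say which property it refers to.

```sparql
#Number of organizations that have a value for each listed property (sketch)
SELECT ?property (COUNT(DISTINCT ?org) AS ?count)
WHERE
{
  # assumed property list, see the note above
  VALUES ?property { wdt:P571 wdt:P112 wdt:P355 wdt:P18 wdt:P1451 }
  ?org wdt:P31 wd:Q4830453.   #instance of orgs
  ?org ?property ?value.      #has at least one value for the property
}
GROUP BY ?property
ORDER BY DESC(?count)
```

This counts distinct organizations, so the numbers can be lower than result counts that include one row per value.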

There is an opportunity to investigate organizations in Russia too. We can try to get a list of organizations in Russia with the help of the Wikidata.

#List of organizations 

SELECT ?org ?orgLabel
WHERE
{
  ?org wdt:P31 wd:Q4830453. #instance of organizations
  ?org wdt:P17 wd:Q159. #Russia country

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en"}
}

SPARQL-query, 577 Results.

The query outputs 577 organizations. Suppose a user wants to see where these organizations are located on the map. This requires another query.

#Map of organizations 
#defaultView:Map

SELECT ?org ?orgLabel ?location
WHERE
{
  ?org wdt:P31 wd:Q4830453. #instance of orgs
  ?org wdt:P17 wd:Q159. #Russia country
  ?org wdt:P625 ?location #display location

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en"}
}

SPARQL-query, 9 Results.

Result: very few organizations in Russia have geographic coordinates. We can get a map not only of the organizations in Russia, but of all organizations in the world by using the following query.

#Map of organizations of the world
#defaultView:Map

SELECT ?org ?orgLabel ?location
WHERE
{
  ?org wdt:P31 wd:Q4830453. #instance of orgs
  ?org wdt:P625 ?location

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en"}
}

SPARQL-query, 511 Results.

The result (Fig. 3) is again very small: only 511 organizations. The number of organizations with a location worldwide is even smaller than the total number of organizations in Russia (577).

 
Fig. 3: World organizations map
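
One possible reason for the small map is that coordinates may be stored on the item of the place where an organization has its headquarters rather than on the organization itself. The following sketch rests on that assumption and is not a query from the original study: it follows headquarters location (P159) to an item that has a coordinate location (P625). The `LIMIT` is added here only to keep the map responsive.

```sparql
#Map of organizations via their headquarters location (sketch)
#defaultView:Map
SELECT ?org ?orgLabel ?location
WHERE
{
  ?org wdt:P31 wd:Q4830453.   #instance of orgs
  ?org wdt:P159 ?hq.          #headquarters location (assumed to be filled in)
  ?hq wdt:P625 ?location.     #coordinates of the headquarters place

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
LIMIT 1000
```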


Analyzing the data obtained, it can be concluded that the information about organizations on Wikidata is only partially filled in. There is not enough information to draw any definite conclusions about the organizations and their components. The small amount of information can be explained by the chaotic appearance and disappearance of organizations (it is not easy to survive under such competition and in the existing economy). But even the information about major organizations (Apple, Microsoft, Intel) is incomplete and needs to be improved (for example, Intel does not have a motto on Wikidata).

Future work

  1. Output the 20 organizations with the largest revenue.
  2. Show as a diagram how many commercial organizations appear each year.
  3. Find the distribution of the number of commercial organizations by industry in different countries.
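
As a starting point for the second item, the following sketch groups organizations by the year of their inception (P571). It is only one possible approach under stated assumptions: the chart view, the variable names and the datatype filter (which skips inception values that are not regular dates) are choices made here, not results of the study.

```sparql
#Number of commercial organizations by year of inception (sketch)
#defaultView:BarChart
SELECT ?year (COUNT(DISTINCT ?org) AS ?count)
WHERE
{
  ?org wdt:P31 wd:Q4830453.    #instance of orgs
  ?org wdt:P571 ?inception.    #inception date
  FILTER (DATATYPE(?inception) = xsd:dateTime)   #skip unknown values
  BIND(YEAR(?inception) AS ?year)
}
GROUP BY ?year
ORDER BY ?year
```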

Test

1 The following commercial organizations are listed: Tele2, Lada, Aviakor, Uralmash. Match each organization with one of the images below.

1 (Tele2), 2 (Lada), 3 (Aviakor), 4 (Uralmash)

2 The following commercial organizations are given: MegaFon, Svyaznoy, Evroset, Sportmaster. Their years of creation are known: 1992, 1995, 1997, 2002.
Arrange the organizations in order of increasing year of creation (1st place is the oldest organization, 4th place is the newest one).

1st place (1992), 2nd place (1995), 3rd place (1997), 4th place (2002)
MegaFon
Svyaznoy
Evroset
Sportmaster

3 Arrange the countries in ascending order of the number of organizations (1st place: the smallest number of organizations):

1 2 3 4
Sweden
United Kingdom
USA
Germany


SPARQL-queries with answers:

List of all organizations,

List of all organizations with years of creation,

List of all organizations In Russia with image,

List of organizations by country in descending order

References

  • "Access to EGRUL and EGRIP". 2017.
  1. EGRUL 2017.