Research in programming Wikidata/Business enterprise
This article studies the Wikidata objects of type "commercial organization" (business enterprise). Using SPARQL queries over these objects, the following tasks were solved: the distribution of organizations by industry was built as a bubble chart, the number of organizations per country was counted, and a graph of organizations and their subsidiaries was drawn. Conclusions are drawn about the completeness of Wikidata on this topic, illustrated, among other things, by a map of the organizations of the world.
Instances of object "Business enterprise"
- Objects: business enterprise (Q4830453)
The following query returns the list of all commercial organizations.
#added 2017-02
#List of `instances of` "business enterprise"
SELECT ?item ?itemLabel
WHERE
{
?item wdt:P31 wd:Q4830453. #instance of business enterprise
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
SPARQL-query, 109383 Results
👍> The most complete and elaborate business enterprise items on Wikidata are: Google, Apple, Microsoft
👎> Almost empty and uninformative business enterprise items on Wikidata are: Pininfarina, ANHUI EXPRESSWAY COMPANY LIMITED, Futura et Marge
A defect of the resulting list is that some objects turn out to be nameless on Wikidata (No label defined). Let's get a list of only those organizations whose English label is non-empty.
#List of `instances of` "business enterprise" only with a label.
SELECT ?item ?item_label
WHERE
{
?item wdt:P31 wd:Q4830453
; rdfs:label ?item_label.
FILTER (LANG(?item_label) = "en").
}
SPARQL-query, 74556 Results
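When only the totals are of interest, both numbers can be computed on the server instead of downloading the full lists. The following is a minimal sketch and not one of the original queries of this study: the variable names ?total and ?withEnglishLabel are chosen here for illustration, and it relies on the assumption that an item has at most one label per language, so every item contributes exactly one row.

```sparql
#Sketch: count business enterprises, in total and with an English label
SELECT (COUNT(?item) AS ?total)
       (COUNT(?item_label) AS ?withEnglishLabel)
WHERE
{
  ?item wdt:P31 wd:Q4830453.
  #OPTIONAL keeps items without an English label in the result;
  #COUNT(?item_label) then skips the rows where the label is unbound
  OPTIONAL {
    ?item rdfs:label ?item_label.
    FILTER (LANG(?item_label) = "en")
  }
}
```

The numbers returned today will differ from the 2017 figures quoted above, since Wikidata keeps growing.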
Distribution of organizations by industry
Each organization specializes in some industry. To see which industries are the most popular (that is, in which industries the most organizations work), we can build a diagram.
Type of result: bubble chart.
The following are used:
- object business enterprise (Q4830453),
- property industry (P452).
#enterprise industry ranking
#defaultView:BubbleChart
SELECT ?industry ?industryLabel (count(*) as ?count)
WHERE
{
?org wdt:P31 wd:Q4830453.
?org wdt:P452 ?industry.
#English name of the industry, if there is one
OPTIONAL {
?industry rdfs:label ?industryLabel
filter (lang(?industryLabel) = "en")
}
}
GROUP BY ?industry ?industryLabel
ORDER BY DESC(?count) ASC(?industryLabel)
SPARQL query, 864 Results.
From this diagram (Fig. 1) we can read off the number of organizations involved in each industry. Based on the data obtained, we can build a table of the 5 most popular industries:
Industry name | Quantity of organizations |
---|---|
automotive industry | 1149 |
retail | 843 |
telecommunications | 648 |
video game industry | 633 |
manufacturing | 506 |
Let's answer the question: which industries are represented in Russia, and by how many organizations?
#enterprise industry ranking in Russia
#defaultView:BubbleChart
SELECT ?industry ?industryLabel (count(*) as ?count)
WHERE
{
?org wdt:P31 wd:Q4830453.
?org wdt:P452 ?industry.
?org wdt:P17 wd:Q159. #Russia country
#English name of the industry, if there is one
OPTIONAL {
?industry rdfs:label ?industryLabel
filter (lang(?industryLabel) = "en")
}
}
GROUP BY ?industry ?industryLabel
ORDER BY DESC(?count) ASC(?industryLabel)
SPARQL-query, 60 Results.
Industry name | Quantity of organizations |
---|---|
retail | 78 |
automotive industry | 13 |
arms industry | 10 |
aerospace industry | 9 |
video game industry | 9 |
It can be concluded that retail dominates over the other industries in Russia by a wide margin: 78 organizations work in this area, while the next industry (automotive industry) has only 13.
For comparison, we can build the same list for some other country (for example, Norway).
#enterprise industry ranking in Norway
#defaultView:BubbleChart
SELECT ?industry ?industryLabel (count(*) as ?count)
WHERE
{
?org wdt:P31 wd:Q4830453.
?org wdt:P452 ?industry.
?org wdt:P17 wd:Q20. #Norway country
#English name of the industry, if there is one
OPTIONAL {
?industry rdfs:label ?industryLabel
filter (lang(?industryLabel) = "en")
}
}
GROUP BY ?industry ?industryLabel
ORDER BY DESC(?count) ASC(?industryLabel)
SPARQL-query, 41 Results.
The dominant industry here is manufacturing (Q187939).
Number of organizations by country
The next query displays the number of commercial organizations in each country of the world.
The following are used:
- object business enterprise (Q4830453),
- property country (P17).
SELECT ?countryLabel (count(?org) as ?count)
WHERE
{
?org wdt:P31 wd:Q4830453.
?org wdt:P17 ?country.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
GROUP BY ?country ?countryLabel
ORDER BY DESC (?count)
SPARQL-query, 198 Results
Organizations and their subsidiaries
A graph is built from existing organizations and their subsidiaries.
The following are used:
- object business enterprise (Q4830453),
- property subsidiary (P355).
Note that the two queries below select instances of wd:Q22687 rather than wd:Q4830453, so the edge counts refer to that class only.
#subsidiary graph
#defaultView:Graph
SELECT ?org ?orgLabel ?subsidiary ?subsidiaryLabel
WHERE
{
?org wdt:P31 wd:Q22687
; rdfs:label ?item_label.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
OPTIONAL { ?org wdt:P355 ?subsidiary. }
FILTER (LANG(?item_label) = "en")
}
SPARQL-query, 428 Results(edges).
The resulting graph (Fig. 2) consists mostly of hanging vertices and isolated vertices. Let's construct a graph without the isolated ones by making the subsidiary pattern mandatory instead of OPTIONAL.
#subsidiary graph, organizations without subsidiaries excluded
#defaultView:Graph
SELECT ?org ?orgLabel ?subsidiary ?subsidiaryLabel
WHERE
{
?org wdt:P31 wd:Q22687
; rdfs:label ?item_label.
?org wdt:P355 ?subsidiary.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
FILTER (LANG(?item_label) = "en")
}
SPARQL-query, 55 Results(edges).
Completeness of Wikidata
According to the category "List of companies of Russia", English Wikipedia has articles on at least 208 commercial organizations in Russia. A rating of the largest companies of Russia is also listed there. It can be concluded that even some big organizations have not been included in this list, not to mention small and medium ones.
It is impossible to obtain up-to-date data on the number of commercial organizations, because their number grows every day and information about them is not publicly available. For example, the Unified State Register of Legal Entities (USRLE, EGRUL) provides data only for a fee. [1]
The number of commercial organizations entered in the state register as newly created amounted to 420.5 thousand in 2014, according to data on the site of the Federal Tax Service (FTS). In June 2015, orders of the Ministry of Finance of Russia came into force under which data on existing organizations is no longer made public. The data can be provided only to state authorities, local self-government bodies and so on. Therefore, it is not possible to obtain reliable data on the number of existing organizations.
Completeness can nevertheless be explored with the help of Wikidata itself. Recall the total number of organizations on Wikidata found at the beginning (about 110,000; the number is constantly growing). A typical user who has a general understanding of organizations may be interested to see what an organization looks like or where it is located on the map.
To see how many organizations have an image (that is, the 'image' field is filled in), we write the following query.
#List of organizations with image
SELECT ?org ?orgLabel ?image
WHERE
{
?org wdt:P31 wd:Q4830453. #instance of orgs
?org wdt:P18 ?image #has image
SERVICE wikibase:label { bd:serviceParam wikibase:language "en"}
}
SPARQL-query, 2913 Results.
The number of organizations with an image is 2913, under 3% of the 109383 organizations found by the first query. This is not much and indicates that the information is incomplete.
Let's build a table of requests that users are likely to make about organizations (depending on what interests them), sorted by the number of results in descending order.
Request name | Quantity of results |
---|---|
inception | 30995 |
founded by | 5722 |
subsidiary | 3398 |
image | 2913 |
location | 577 |
motto | 2 |
The table shows that the amount of such information about organizations is very small compared with their total number on Wikidata.
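Several rows of the table above can be obtained in one request by counting, for each property, the organizations that have it filled in. The query below is a sketch under assumptions: the article gives the identifiers only for subsidiary (P355) and image (P18), while P571 (inception), P112 (founded by) and P1451 (motto text) are chosen here and may not be the properties behind the original figures. The "location" row is left out because the article does not say which property it used.

```sparql
#Sketch: number of business enterprises that have each property filled in
SELECT ?property ?propertyLabel (COUNT(DISTINCT ?org) AS ?count)
WHERE
{
  #P571, P112 and P1451 are assumptions; P355 and P18 come from the article
  VALUES ?p { wdt:P571 wdt:P112 wdt:P355 wdt:P18 wdt:P1451 }
  ?org wdt:P31 wd:Q4830453.
  ?org ?p ?value.
  #map the wdt: predicate back to its property entity to get a readable name
  ?property wikibase:directClaim ?p.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
GROUP BY ?property ?propertyLabel
ORDER BY DESC(?count)
```

COUNT(DISTINCT ?org) is used so that an organization with several values of one property (for example, several subsidiaries) is counted once.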
There is an opportunity to investigate organizations in Russia too. We can try to get a list of organizations in Russia with the help of the Wikidata.
#List of organizations
SELECT ?org ?orgLabel
WHERE
{
?org wdt:P31 wd:Q4830453. #instance of organizations
?org wdt:P17 wd:Q159. #Russia country
SERVICE wikibase:label { bd:serviceParam wikibase:language "en"}
}
SPARQL-query, 577 Results.
The query returns 577 organizations. Suppose the user wants to see where these organizations are located on the map. This requires another query that also asks for coordinates.
#Map of organizations
#defaultView:Map
SELECT ?org ?orgLabel ?location
WHERE
{
?org wdt:P31 wd:Q4830453. #instance of orgs
?org wdt:P17 wd:Q159. #Russia country
?org wdt:P625 ?location #display location
SERVICE wikibase:label { bd:serviceParam wikibase:language "en"}
}
SPARQL-query, 9 Results.
Result: very few records in Russia have geographic coordinates. The following query builds the same map for all organizations in the world, not only those in Russia.
#Map of organizations in the world
#defaultView:Map
SELECT ?org ?orgLabel ?location
WHERE
{
?org wdt:P31 wd:Q4830453. #instance of orgs
?org wdt:P625 ?location
SERVICE wikibase:label { bd:serviceParam wikibase:language "en"}
}
SPARQL-query, 511 Results.
The result (Fig. 3) is again very small: only 511 organizations. The number of organizations in the whole world that have a location is even less than the total number of organizations in Russia (577).
Analyzing the data obtained, it can be concluded that the information about organizations on Wikidata is only partially filled in. There is not enough information to draw any definite conclusions about the organizations and their components. The small amount of information can partly be explained by the chaotic appearance and disappearance of organizations (it is not easy to survive under such competition and in the existing economy). But the information even about such major organizations as Apple, Microsoft and Intel is incomplete and needs to be improved (for example, Intel does not have a motto on Wikidata).
Future work
- Output the 20 organizations with the largest revenue.
- Show as a diagram how many commercial organizations appear each year.
- Find the distribution of the number of commercial organizations by industry in different countries.
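As a possible starting point for the second item, the organizations can be grouped by the year of their inception. This is only a sketch: it assumes that inception is stored in property P571 (an identifier not given in this article), and ?year and ?count are names chosen here.

```sparql
#Sketch: number of business enterprises by year of inception
SELECT ?year (COUNT(DISTINCT ?org) AS ?count)
WHERE
{
  ?org wdt:P31 wd:Q4830453.
  ?org wdt:P571 ?inception. #assumed property: inception
  #values that are not dates leave ?year unbound
  BIND (YEAR(?inception) AS ?year)
}
GROUP BY ?year
ORDER BY ?year
```

The result is a table of years and counts; a chart view can then be selected in the query service to show it as a diagram.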
Test
SPARQL-queries with answers:
- List of all organizations with years of creation
- List of all organizations in Russia with image
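The text of these test queries is not reproduced here. For the second one, a plausible reconstruction combines the patterns already used in this article (country P17 = Russia, image P18); it is a sketch and may differ from the original answer. The ImageGrid view is an addition made here.

```sparql
#Sketch: organizations in Russia that have an image
#defaultView:ImageGrid
SELECT ?org ?orgLabel ?image
WHERE
{
  ?org wdt:P31 wd:Q4830453. #instance of business enterprise
  ?org wdt:P17 wd:Q159.     #Russia country
  ?org wdt:P18 ?image.      #has image
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
```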
References
- "Access to EGRUL and EGRIP". 2017.
- Andrew Krizhanovsky, Nikita Nikolaev (2017). "Коммерческие организации" [Business Enterprise]. Authorea.