Research in programming Wikidata/University

This research studies the Wikidata item "university". Using SPARQL queries to Wikidata, the following tasks were solved:

  • building a list of all universities;
  • building a bubble chart showing the ratio of universities in different countries;
  • mapping the universities located in Russia.

In the course of the work, the Wikidata items corresponding to Russian universities were supplemented with missing information, and conclusions were drawn about the completeness of the data in Wikipedia and in Wikidata.

List of universities

Let's create a list of all universities.

#added 2017-02
#List of `instances of` "university" 
SELECT ?university ?universityLabel
WHERE
{
    ?university wdt:P31 wd:Q3918.
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

The SPARQL query returned 13,298 records.
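The same query can also be sent to Wikidata from a program rather than through the web interface. The following is a minimal Python sketch, assuming the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql and its SPARQL JSON result format. The helper names (`build_url`, `extract_labels`, `run_query`) and the User-Agent string are illustrative assumptions, not part of the original study.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint (assumed to be reachable).
ENDPOINT = "https://query.wikidata.org/sparql"

# The query from the listing above, unchanged.
QUERY = """
SELECT ?university ?universityLabel
WHERE
{
    ?university wdt:P31 wd:Q3918.
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
"""


def build_url(query):
    """Return the GET URL that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return ENDPOINT + "?" + params


def extract_labels(result):
    """Pull the ?universityLabel values out of a SPARQL JSON result."""
    return [row["universityLabel"]["value"]
            for row in result["results"]["bindings"]]


def run_query(query, user_agent="university-research-sketch/0.1 (hypothetical)"):
    """Send the query over HTTP. Needs network access, so it is not called here."""
    request = urllib.request.Request(build_url(query),
                                     headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling `len(extract_labels(run_query(QUERY)))` would give the current record count, which will differ from the figure quoted above because Wikidata changes over time.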

👍> The most complete and detailed university items in Wikidata are: University of Tokyo, Massachusetts Institute of Technology, Moscow State University.

👎> Nearly empty, uninformative university items include: Moscow State University of Food Production, Technical University of UMMC, Saratov State Socio-Economic University.

Completeness of Wikidata: world universities

According to the Webometrics Ranking of World Universities, an international university ranking, there are more than 19,000 universities worldwide. Its main list [1] includes about 12,000 of them.

The non-profit ranking [2] covers 12,000 universities and colleges that have a website.

The Russian Wikipedia category "Universities alphabetically" contains more than 2,500 universities. The English Wikipedia has no single list of all universities, but it has per-country lists and the category "Universities and colleges by country".

The figures obtained from these categories differ from those used in the international university rankings. The reason is that many university pages are poorly filled in, both in Wikipedia and in Wikidata: they contain too little information and are not assigned to the categories considered here. For example, the English Wikipedia article on Lincoln University of Business and Management has only the category "Companies based in Sharjah", which has nothing to do with universities.

Completeness of Wikidata: Russian universities

According to the website Statistics of Russian education[3], there were 4,157 universities in Russia in 2004. Let's build a SPARQL query to find out how many of them are represented in Wikidata:

#added 2017-03
#Number of universities in Russia
SELECT ?university ?universityLabel
WHERE
{
    ?university wdt:P31 wd:Q3918;        # instance of university
                wdt:P17 wd:Q159;         # with country Russia
                rdfs:label ?item_label.

    # keep only universities that have an English label
    FILTER (LANG(?item_label) = "en") .
    SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en".
    }
}

The SPARQL query returned 436 records.

In the category "Universities in Russia", the Russian Wikipedia has the page Project:Education/Lists/Universities in Russia, which lists more than 350 universities. The category "Universities in Russia" of the English Wikipedia covers 77 Russian universities. Thus, Wikipedia provides information on no more than about a tenth of the universities in Russia.
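The "tenth" estimate follows from simple arithmetic on the figures quoted in this section. A small Python sketch of that calculation is given here; the dictionary keys and the `coverage` helper are names introduced only for illustration, and the Russian Wikipedia figure is a lower bound because the text says "more than 350".

```python
# Universities in Russia in 2004, per the statistics site cited in the text.
TOTAL_UNIVERSITIES = 4157

# Counts quoted in this section.
sources = {
    "Wikidata": 436,
    "Russian Wikipedia list": 350,      # "more than 350", so a lower bound
    "English Wikipedia category": 77,
}


def coverage(count, total=TOTAL_UNIVERSITIES):
    """Return the share of universities covered, in percent."""
    return 100.0 * count / total


for name, count in sources.items():
    print(f"{name}: {coverage(count):.1f}%")
```

With these inputs the shares come out at roughly 10.5% for Wikidata, 8.4% for the Russian Wikipedia list and 1.9% for the English Wikipedia category.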

Universities in different countries

Let's build a bubble chart showing the relative number of universities in different countries of the world.

#added 2017-03
#Universities in different countries
#defaultView:BubbleChart
SELECT ?country ?countryLabel (COUNT(*) AS ?count)
WHERE
{
    ?university wdt:P31 wd:Q3918 ;   # instance of university
                wdt:P17 ?country .   # country of the university
    OPTIONAL {
        ?country rdfs:label ?countryLabel .
        FILTER (LANG(?countryLabel) = "en")
    }
}
GROUP BY ?country ?countryLabel
ORDER BY DESC(?count)

The SPARQL query returned 199 records.

The result is shown in the screenshot below.

 
Bubble chart visualization of number of universities in different countries


Altogether, of roughly 250 countries, about two hundred have universities recorded in Wikidata. The leaders by number of universities are the United States of America with 1,608 universities, India with 930 and Japan with 836. Russia is in fifth place with 453 universities.

Map of Russian universities

Let's display on a map the universities located in Russia ("country" property). If a university has a website, it is added to the tooltip ("official website" property).

#added 2017-02
#Locations of universities in Russia 
#defaultView:Map
SELECT ?universityLabel ?universityDescription ?website ?coord
WHERE {
	?university wdt:P31 wd:Q3918;
		wdt:P17 wd:Q159;
		wdt:P625 ?coord.
	OPTIONAL {
		?university wdt:P856 ?website
	}
	SERVICE wikibase:label {
		bd:serviceParam wikibase:language "en, ru".
	}
}

The SPARQL query returned 205 records as of May 2017. As of January 2018, it returns 387 Russian universities.

Incorrect filling

PetrSU

Running the query above leaves Petrozavodsk State University (PetrSU) off the map, because the Wikidata item corresponding to PetrSU lacks the "coordinate location" property. To fix this, coordinates were added to the PetrSU item.

TSU

 
An incorrect value of the "coordinate location" property placed Tver State University in the ocean

As the screenshot shows, Tver State University (TSU) is displayed on the map in the Atlantic Ocean. This, too, was caused by an incorrect value of the "coordinate location" property. The problem has been fixed.
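Misplaced items like this one can be spotted automatically by checking the returned coordinates against a rough bounding box. The sketch below is one possible approach, not the procedure used in this study. It assumes that the `?coord` values arrive as WKT literals of the form `Point(longitude latitude)`, and the bounding box for Russia is a deliberately crude assumption that also admits some points in neighbouring countries. The function names and sample values are hypothetical.

```python
import re

# WKT point literal: longitude comes first, then latitude.
POINT_RE = re.compile(r"Point\(\s*([-+0-9.eE]+)\s+([-+0-9.eE]+)\s*\)",
                      re.IGNORECASE)


def parse_point(wkt):
    """Return (latitude, longitude) from a 'Point(lon lat)' literal."""
    match = POINT_RE.fullmatch(wkt.strip())
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon


def plausibly_in_russia(lat, lon):
    """Crude bounding-box test (assumed limits, not an exact border)."""
    in_latitude = 41.0 <= lat <= 82.0
    # Russia crosses the 180th meridian, so two longitude ranges are needed.
    in_longitude = 19.0 <= lon <= 180.0 or -180.0 <= lon <= -169.0
    return in_latitude and in_longitude


def find_suspicious(rows):
    """Return labels of (label, wkt) rows whose point falls outside the box."""
    return [label for label, wkt in rows
            if not plausibly_in_russia(*parse_point(wkt))]


# Illustrative values only: an approximate location of Tver and a
# made-up point in the Atlantic Ocean.
rows = [
    ("University with plausible coordinates", "Point(35.9 56.85)"),
    ("University placed in the ocean", "Point(-40.0 30.0)"),
]
print(find_suspicious(rows))
```

A point that passes the check can still be wrong, so the list of suspicious items is only a starting point for manual review.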

100 objects

 
Map of Wikidata objects corresponding to Russian universities and having the property "coordinate location"

Let's add location data for one hundred universities to Wikidata. To check that the data were filled in, we run the SPARQL query presented above once more. As of May, it returns 308 records.

Universities and the scientists who studied at them

The graph of universities and scientists was drawn with the help of a SPARQL query. The vertices are universities and scientists. An edge corresponds to the "educated at" property, which shows that the scientist studied at the university.

 
Graph visualization: universities and the scientists who studied at them. The vertices are universities and scientists. The "educated at" property is an edge

The numerical parameters of the constructed graph are:

  • Number of scientists:
  • Number of universities:
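The two parameters listed above can be computed from the edge list of the graph. The following Python sketch assumes that the query result has been reduced to (scientist, university) pairs, one per "educated at" statement. The function name and the sample data are hypothetical and are not taken from Wikidata.

```python
def graph_parameters(edges):
    """Count distinct scientists, universities and edges in an edge list.

    `edges` is an iterable of (scientist, university) pairs, one pair per
    "educated at" statement.
    """
    unique_edges = set(edges)  # drop duplicate rows
    scientists = {scientist for scientist, _ in unique_edges}
    universities = {university for _, university in unique_edges}
    return {
        "scientists": len(scientists),
        "universities": len(universities),
        "edges": len(unique_edges),
    }


# Hypothetical sample data.
sample_edges = [
    ("Scientist A", "University X"),
    ("Scientist A", "University Y"),
    ("Scientist B", "University X"),
    ("Scientist B", "University X"),  # duplicate row
]
print(graph_parameters(sample_edges))
```

Using sets means that a scientist who studied at several universities is counted once as a vertex but contributes one edge per university.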

Future work

  1. List all universities named after someone ("named after" property). For this list:
    • find out which people universities are named after. What professions did they have? The answer should take the form of a bubble chart.
    • find out whether universities are named after living or dead people. How many of each are there, by country?
    • draw a histogram in which the X axis is the number of years between the death of the person the university is named after and the institution's founding, and the Y axis is the number of such universities. For example, if A. A. Ivanov died in 1825 and a university bearing his name was founded in 1850, the bar at 25 on the X axis grows by one.
  2. Find universities with founders ("founded by" property). Mark them on the map.
  3. Rank universities by the number of awards recorded in Wikidata ("award received" property).
  4. Find the oldest university and output the following information:
    • the city in which it is located,
    • how old the university is now,
    • a list of famous scientists who studied at this university.
  5. Draw a bar chart of the number of universities in the cities of Russia.
  6. Calculate the number of universities named after a famous person.
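The histogram proposed in the first item can be sketched in Python once the years have been retrieved. The example assumes that each university is reduced to a (death year, founding year) pair. The function names and the sample pairs are hypothetical, with the first pair taken from the worked example in the list (1825 and 1850).

```python
from collections import Counter


def naming_gap_histogram(pairs):
    """Map 'years between death and founding' to the number of universities.

    `pairs` is an iterable of (death_year, founding_year) tuples.
    """
    return Counter(founded - died for died, founded in pairs)


def render(histogram):
    """Return a plain-text histogram, one line per gap, sorted by gap."""
    lines = []
    for gap in sorted(histogram):
        lines.append(f"{gap:>4} | " + "#" * histogram[gap])
    return "\n".join(lines)


# Hypothetical data: two universities with a 25-year gap, one with 10 years.
sample_pairs = [(1825, 1850), (1900, 1925), (1950, 1960)]
print(render(naming_gap_histogram(sample_pairs)))
```

A negative gap would indicate a university founded while the person was still alive, which ties in with the question about living and dead namesakes.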

References

  1. "Ranking of World Universities". Retrieved 2017-03-08.
  2. "4 International Colleges & Universities (4ICU)". Retrieved 2017-03-08.
  3. "Статистика Российского образования" [Statistics of Russian education]. Retrieved 2017-03-16.