BIM-224 Research Infrastructures 23

Materials and Tasks for the module "BIM-224, SoSe 2023, Blümel/Rossenova" for students at Hochschule Hannover. The materials are prepared with several colleagues from the Open Science Lab at TIB Hannover.

Session 1: Data harvesting interfaces / data collection

Slides are available here: https://docs.google.com/presentation/d/1IxRTQhTY8nwFaijHq78m0NvW6Qw_YAj3YQtyO9Nn6dg/edit?usp=sharing

Student homework task pages

Group task 1

Platform list
  • Radar4Culture
  • GNM catalog
  • Forschungsbibliothek Gotha der Universität Erfurt
  • Datenportal des MfN Berlin
  • Herbarium Berolinense
  • Sketchfab
  • Porta Fontium
  • Coding da Vinci
Type of API list

Group task 2

Session 2: Data cleaning, reconciliation and enrichment

Slides are available here: https://docs.google.com/presentation/d/1HpXUXYcs-LDOQYuQzFv1SYYutKUYP8qG0mR5BG3fLyw/edit?usp=sharing

OpenRefine official documentation:

https://openrefine.org/docs/manual/facets

https://openrefine.org/docs/manual/transforming

OpenRefine video tutorial:

https://youtu.be/jyUlT8ohlG4

Homework presentations:

Session 3: Data in Wikidata

Slides are available here: https://docs.google.com/presentation/d/1bCilgycOApKcFjzelntD6zRf5WBU9804t_9Fb-Lc1E8/edit?usp=sharing

Homework presentations:

Session 4: Data upload and querying (26.05)

Slides are available here: https://docs.google.com/presentation/d/1ebFJXSKikUSyjjPIsXFwTqVV2-igku6ra5Vm83h5SWQ/edit?usp=sharing

Additional tutorials:

Complete upload pipeline tutorial: https://en.wikiversity.org/wiki/OpenRefine_to_Wikibase%3A_Data_Upload_Pipeline

Upload tutorial for media files in Wikimedia Commons: https://en.wikiversity.org/wiki/Uploading_media_files_to_a_Wikibase_with_OpenRefine

Homework presentations:

Session Workshop: Fermenting Data Workshop (02.06)

Slides are available here: https://docs.google.com/presentation/d/1BHlO17nTTXccoPMgqXZBx46zuDvhnM52h5Wj9X8p36M/edit?usp=sharing

Wikibase instance:

https://fermentingdata.wikibase.cloud/w/index.php?title=Special:CreateAccount&returnto=Main+Page

Session 5: Data upload and querying (cont.) / Data visualisation and presentation (09.06)

Video recording of the lecture: https://drive.google.com/file/d/1q94LdQauMPErzK5Yp2jD1zq0_MjWWgCX/view?usp=sharing

Slides are available here: https://docs.google.com/presentation/d/1T1fPDI2jSQJ1Q6rAaARIgxTbmST5Py_C8pmCBBAlXWQ/edit?usp=sharing

Book an individual feedback session - 15 mins per person:

  • 15:00: Lisa Sommer
  • 15:15: -
  • 15:30: Ahmad Aroud
  • 15:45: -
  • 16:00: Anna Rahr
  • 16:15: Gizem Ergün
  • 16:30: Memo Loran Tuku
  • 16:45: Josef Debase
  • 17:00: Ahmad Hasan Ahmad
  • 17:15: Jana Cornelius
  • 17:30: -

Session 6: Data publication and review

In this session we will review the homework and discuss the requirements for the final assignment submission.

The final submission deadline is July 7th.

Final assignment submission instructions

1) Spreadsheet with data you uploaded to Wikidata
2) Spreadsheet with the data you can download from the SPARQL endpoint with your main data query
3) Publication on GitHub Pages containing:

  • your custom query results
  • customized title / author / cover image
  • customized additional text and optionally embedded data visualization as .svg and/or live results in an iframe.
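
For the "live results in an iframe" option, the following is a minimal sketch of how the embed code could be generated. It assumes your query runs on the public Wikidata Query Service and that its embed page is reachable at query.wikidata.org/embed.html, with the URL-encoded query after the "#". The function name and the default sizes are hypothetical. If your data lives on a different Wikibase, the base URL will differ.

```python
from html import escape
from urllib.parse import quote

# Assumption: the Wikidata Query Service embed page; other Wikibase
# instances use a different host.
EMBED_BASE = "https://query.wikidata.org/embed.html#"


def build_embed_iframe(query: str, width: str = "100%", height: str = "400px") -> str:
    """Return an HTML iframe snippet that shows live results for a SPARQL query."""
    # Percent-encode every reserved character so the query survives in the URL fragment.
    src = EMBED_BASE + quote(query, safe="")
    return (
        f'<iframe src="{escape(src, quote=True)}" '
        f'style="width: {width}; height: {height}; border: none;"></iframe>'
    )


# Illustrative query only: five items that are instances of "painting" (Q3305213).
print(build_embed_iframe("SELECT ?item WHERE { ?item wdt:P31 wd:Q3305213 } LIMIT 5"))
```

The printed snippet can be pasted into the text of your publication page. The query service's own "Link" menu offers an equivalent embed code, so this helper is only a convenience if you want to generate several snippets.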

Information discussed during today's session

1) Adding proper Wikitext to Images in Commons when Uploading via OpenRefine

- A more detailed tutorial page, if you want to go more in-depth (esp. page 5 & 6): https://docs.google.com/document/d/1ENpZBOHvMESOst4Phh5gSRWlnAdBs-OMZt5j_cL-YGA/edit?usp=sharing

- For quick reference, check the screenshot and try to replicate it in your schema builder when uploading. Make sure you have all of these statements for the images, in addition to the Wikitext. The Depicts / Main subject statements link your image to the main object / artwork you uploaded to Wikidata.

- If you have photos of objects, you can use this simple Wikitext for all your photos (in addition to the statements as shown in the screenshot)

== {{int:filedesc}} ==
{{Art photo}}
== {{int:license-header}} ==
{{CC-BY-4.0}}

Remember to check the license; the above is just an example!

Note that if you copy the schema from my screenshot, you will need to update the museum and the license to match the ones you are working with.

- If you have photos of paintings / artworks, you can use this simple Wikitext for all your photos (in addition to the statements as shown in the screenshot):

== {{int:filedesc}} ==
{{Artwork}}
== {{int:license-header}} ==
{{CC-BY-4.0}}

- More details are available in the Google Doc I shared above, but these instructions should be sufficient, too.
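
If you want to prepare the Wikitext outside of OpenRefine (for example to paste it into a spreadsheet column), the two variants above can be generated with a small helper. This is only a sketch: the function name is hypothetical, and the template and license names are the examples from this page, so replace the license with the one that actually applies to your images. It puts each heading and template on its own line, which is how Wikitext headings need to be written.

```python
def build_wikitext(info_template: str, license_template: str) -> str:
    """Build the minimal Commons Wikitext: a description section and a license section."""
    lines = [
        "== {{int:filedesc}} ==",        # localized "Summary" heading
        "{{" + info_template + "}}",     # e.g. Art photo (objects) or Artwork (paintings)
        "== {{int:license-header}} ==",  # localized "Licensing" heading
        "{{" + license_template + "}}",  # example only: check your actual license
    ]
    return "\n".join(lines)


# Photos of objects:
print(build_wikitext("Art photo", "CC-BY-4.0"))
# Photos of paintings / artworks:
print(build_wikitext("Artwork", "CC-BY-4.0"))
```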

2) Using OpenRefine online

- There is also an online version of OpenRefine. It is somewhat outdated and lacks some newer features (e.g. you can't upload images with it), but it can be helpful when you can't run OpenRefine on a personal or institutional computer for technical or other reasons. Go to hub-paws.wmcloud.org and log in with your Wikimedia account, then select OpenRefine from the set of tools available.

3) Issues with SPARQL queries, e.g. removing multiple result rows for the same item

- You can use the GROUP_CONCAT aggregate function (together with GROUP BY) to concatenate multiple values into a single column, so that the same item is not repeated over multiple rows, e.g. see this example: https://w.wiki/6qbP
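
As an illustration of the pattern, here is a minimal Python sketch that holds a GROUP_CONCAT query and turns the endpoint's JSON response into flat rows for a spreadsheet. The query is an illustrative stand-in, not the one behind the short link above: it takes a small sample of paintings (Q3305213) and merges all creators (P170) of each painting into one cell. The function names and the User-Agent string are hypothetical, and the endpoint is assumed to be the public Wikidata Query Service.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wikidata SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

# One row per painting; multiple creators are joined with "; " instead of
# producing one row per creator.
QUERY = """
SELECT ?item ?itemLabel
       (GROUP_CONCAT(DISTINCT ?creatorLabel; separator="; ") AS ?creators)
WHERE {
  { SELECT ?item WHERE { ?item wdt:P31 wd:Q3305213 } LIMIT 50 }
  ?item wdt:P170 ?creator .
  ?item rdfs:label ?itemLabel . FILTER(LANG(?itemLabel) = "en")
  ?creator rdfs:label ?creatorLabel . FILTER(LANG(?creatorLabel) = "en")
}
GROUP BY ?item ?itemLabel
"""


def fetch_results(query: str) -> dict:
    """Send the query to the endpoint and return the parsed SPARQL JSON results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": "bim224-example/0.1 (course exercise)",  # hypothetical identifier
    })
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def flatten_bindings(results: dict) -> list[dict]:
    """Convert SPARQL JSON results into a list of {variable: value} rows."""
    variables = results["head"]["vars"]
    return [
        {var: binding.get(var, {}).get("value", "") for var in variables}
        for binding in results["results"]["bindings"]
    ]
```

Calling `flatten_bindings(fetch_results(QUERY))` requires network access and should return one row per item. Every variable in the SELECT line that is not aggregated must also appear in GROUP BY, otherwise the query service rejects the query.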

- If you need more help customizing your queries, you can ask your peers or ask ChatGPT (though do not rely on it too much: it is still not very good with SPARQL, and you have to be a magician with the prompts to get it all correct). You can also consult trusted sources like Stack Overflow and this very helpful SPARQL learning page on Wikidata: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples

Final updates regarding publications:

For reference, you can look at the publications of your peers, or check my own publication, which exemplifies the different parts of the assignment.
- Published view: https://lozanaross.github.io/catalogue-003/
- GitHub code view: https://github.com/lozanaross/catalogue-003

FINAL SUBMISSION

Send the spreadsheets to the instructor via email.

Add your name & link to your publication below: