Wikidata is the next big thing coming out of Wikimedia. It takes the idea of Wikipedia a step further by making data not only accessible to everyone but also machine-readable. This means that we can now ask our computers things like “Give me a list of the largest cities that have a female mayor”.
The Code For Germany community is putting a focus on election data this year, as country-wide elections are coming up. A couple of months ago, we came together to build tools around election data and Wikidata, and I built a small website that lets you draw charts of the seat distribution of parliaments. In this article I will share how to do this. The result will look like this (this is the 10th Landtag, or state parliament, of the German state Nordrhein-Westfalen):
The cool thing about this is that all data comes from Wikidata. We just need to tell it which governmental body and which period we are interested in and it will provide all data (including the color of the parties).
The query that was used to get the data for the chart above looks as follows. I will go through it step by step and explain what each line does.
SELECT ?partyLabel ?rgb ?party (COUNT(*) AS ?count)
WHERE
{
  ?politician wdt:P39 wd:Q17781726 .
  ?politician p:P39 ?membership .
  ?membership pq:P2937 wd:Q30544760 .
  ?politician wdt:P102 ?party .
  ?party wdt:P462 ?color .
  ?color wdt:P465 ?rgb .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de" }
}
GROUP BY ?party ?partyLabel ?rgb
The query is written in SPARQL, a query language for graph data. It looks kind of weird when you are used to writing SQL queries, but it is actually quite cool.
The basic idea is as follows: everything in Wikidata is a triple of Subject, Predicate, Object.
So you could have something like Angela Merkel (Subject) has position (Predicate) Federal Chancellor of Germany (Object). The cool thing is that we can continue from here and make our current object (Federal Chancellor of Germany) our new subject. So we could find something like Federal Chancellor of Germany (now the Subject) is elected by (Predicate) Bundestag of Germany (Object).
This allows us to store information about the world in a structured format very easily. And because we can store it, we can also query it. So now we could ask questions like: “Give me all people that claim to have the position Chancellor of Germany”. The query for it looks like this:
SELECT ?person ?personLabel
WHERE
{
  ?person wdt:P39 wd:Q4970706 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de" . }
}
You can try it live in the query interface. What we are saying here is:
- Define a new variable called person.
- It should be every item (Subject) that has the property “Position held” (P39) (Predicate) with a value of “Chancellor of Germany” (Q4970706) (Object).
- Call the magic label service, which also creates a variable personLabel that holds the name of the item in German (de).

What we do in the query for the parliament visualization builds on the same ideas; it just follows the path a little further. There are two more things in there that are new: a COUNT(*) and a GROUP BY. They allow us to count how many people are in the parliament for each party. This part works just like SQL.

The last thing to explain are lines 5 and 6 of the query (the two lines that mention ?membership). I did not tell the whole truth when I said that in Wikidata we always have Subject, Predicate, Object. Actually, statements can also have so-called qualifiers, which add more information about a statement. So if we have a statement that claims that Angela Merkel is the chancellor of Germany, it could have qualifiers for its start and end date. In our example we are looking at the membership (P39) in more detail. We only want the memberships that have a Parliamentary Term (P2937) qualifier whose value is the 10th election period in Nordrhein-Westfalen (Q30544760).
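To make the idea of statements and qualifiers a bit more concrete, here is a minimal sketch in Python of a toy in-memory model. This is not how Wikidata stores anything internally. The Statement class, the helper names, and the politicians and parties are all made up for illustration; only the P and Q identifiers are the ones from the query.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Statement:
    """A Wikidata-style statement: subject, predicate, object, plus optional qualifiers."""
    subject: str
    predicate: str
    obj: str
    qualifiers: dict = field(default_factory=dict)


# Hypothetical toy data: the people and parties are invented.
statements = [
    Statement("Politician A", "P39", "Q17781726", {"P2937": "Q30544760"}),
    Statement("Politician B", "P39", "Q17781726", {"P2937": "Q30544760"}),
    Statement("Politician C", "P39", "Q17781726", {"P2937": "some other term"}),
    Statement("Politician D", "P39", "Q17781726", {"P2937": "Q30544760"}),
    Statement("Politician A", "P102", "Party X"),
    Statement("Politician B", "P102", "Party Y"),
    Statement("Politician C", "P102", "Party X"),
    Statement("Politician D", "P102", "Party X"),
]


def members_in_term(stmts, position, term):
    """Subjects whose 'position held' (P39) statement has the given term (P2937) qualifier."""
    return {
        s.subject
        for s in stmts
        if s.predicate == "P39"
        and s.obj == position
        and s.qualifiers.get("P2937") == term
    }


def seats_per_party(stmts, position, term):
    """Count members per party (P102), the toy equivalent of COUNT(*) with GROUP BY."""
    members = members_in_term(stmts, position, term)
    return Counter(
        s.obj for s in stmts if s.predicate == "P102" and s.subject in members
    )


seats = seats_per_party(statements, "Q17781726", "Q30544760")
print(dict(seats))
```

Politician C is left out because the qualifier on that membership points to a different term, which is exactly the filtering the qualifier lines of the query do.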
So let’s walk through our original query step by step:
- We are interested in getting the name of the party, the party’s color in RGB, the actual party item in Wikidata and the number of members for this election period.
- We declare a new variable called politician, which is every item that claims to be a member of the Landtag in Nordrhein-Westfalen (Q17781726).
- We also store this membership in a variable called membership.
- Now we say that we only want items for which the membership has a qualifier that puts it in the 10th election period.
- With our filtered results, we get the party for the politician.
- We get the party’s color property.
- And the color’s RGB value.
- Lastly, we get the label for all items (we will only be using the partyLabel).
- We group by everything that we do not want to count (to get a count per party).
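Since the only things that change between parliaments are the position item and the term item, the query can be generated from those two ids. Here is a rough sketch in Python of how that could look. The function names and the User-Agent string are my own inventions for this example; the endpoint is the public Wikidata query service. The fetch function is defined but not called here, so nothing goes over the network.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Public SPARQL endpoint of the Wikidata query service.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_seat_query(position_qid, term_qid, language="de"):
    """Fill the position and term items into the seat distribution query."""
    for qid in (position_qid, term_qid):
        # Only accept plain item ids so nothing else ends up in the query text.
        if not re.fullmatch(r"Q\d+", qid):
            raise ValueError(f"not a Wikidata item id: {qid!r}")
    if not re.fullmatch(r"[a-z-]+", language):
        raise ValueError(f"unexpected language code: {language!r}")
    return f"""SELECT ?partyLabel ?rgb ?party (COUNT(*) AS ?count)
WHERE
{{
  ?politician wdt:P39 wd:{position_qid} .
  ?politician p:P39 ?membership .
  ?membership pq:P2937 wd:{term_qid} .
  ?politician wdt:P102 ?party .
  ?party wdt:P462 ?color .
  ?color wdt:P465 ?rgb .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}" }}
}}
GROUP BY ?party ?partyLabel ?rgb"""


def build_request_url(query):
    """URL that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def fetch_results(url):
    """Send the request; the User-Agent value is a made-up placeholder."""
    request = Request(url, headers={"User-Agent": "parliament-chart-example/0.1"})
    with urlopen(request) as response:
        return json.load(response)


query = build_seat_query("Q17781726", "Q30544760")
print(build_request_url(query))
```

Calling fetch_results with the printed URL should return the counts per party as JSON, assuming the query service is reachable.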
Once this is done, we transform the results with some JavaScript code into the format that the excellent parliament-svg expects and get a beautiful chart.
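The reshaping step is small. My actual code is JavaScript and not shown here, so this is only a sketch of the same idea in Python. It assumes the standard SPARQL JSON result layout (results, bindings, value), that the RGB values come back as hex strings without a leading "#", and that parliament-svg wants an object keyed by party name with seats and colour fields. Please check the parliament-svg README before relying on that last assumption. The sample rows are invented.

```python
def to_parliament_input(sparql_json):
    """Turn SPARQL JSON result rows into a {party: {seats, colour}} mapping."""
    parties = {}
    for row in sparql_json["results"]["bindings"]:
        name = row["partyLabel"]["value"]
        parties[name] = {
            # Counts arrive as strings in the JSON results, so convert them.
            "seats": int(row["count"]["value"]),
            # Assumption: the hex value has no leading '#'.
            "colour": "#" + row["rgb"]["value"],
        }
    return parties


# Invented sample in the shape of a SPARQL JSON response.
sample = {
    "head": {"vars": ["partyLabel", "rgb", "party", "count"]},
    "results": {
        "bindings": [
            {
                "partyLabel": {"type": "literal", "value": "Party X"},
                "rgb": {"type": "literal", "value": "FF0000"},
                "count": {"type": "literal", "value": "2"},
            },
            {
                "partyLabel": {"type": "literal", "value": "Party Y"},
                "rgb": {"type": "literal", "value": "000000"},
                "count": {"type": "literal", "value": "1"},
            },
        ]
    },
}

print(to_parliament_input(sample))
```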
I hope I was able to show you the power of Wikidata and the cool questions that you can answer with it. There is one thing that I need to confess though: most of the data that we are displaying here was added by me, because it was missing in Wikidata. This might happen to you too - maybe you will find the data model but there will be no data yet. Fortunately, Wikidata also makes it very easy to import data - but that is a topic for another blog post.
Let me know if you have any questions (I mean it!) on Twitter or via email.