Challenges and lessons learned in acquiring data from external data sources: the Internet Festival use case
How do citizens perceive a cultural event such as the Internet Festival? For the Me-Mind project, this is a useful indicator with which to approach the impact analysis, not only because perception gives us an external view, but also because of what it implies. If citizens’ perception of an event is positive, it suggests that citizens, associations and businesses see the event as an opportunity to benefit directly and indirectly. If, on the other hand, the perception is “nil” or even negative, it suggests that the event is not seen as creating any benefits, or is even seen as a disruptive element. In this case it is important to understand the reasons for this absent or negative perception, in order to create collaborative strategies and benefits both for the city fabric and for the event itself.
Let’s start from the beginning.
The Me-Mind project followed several steps of data analysis to measure the impacts of the Internet Festival. After analysing the internal data available to the festival (such as the budget invested in the territory of Pisa, the media coverage and the audience feedback), the project turned its attention to collecting heterogeneous data, both structured and unstructured, from external stakeholders. In particular, the focus was on economic data about the city fabric (such as the number of overnight stays in hotels, the number of tourists registered in the city of Pisa, and analytics data from websites promoting the Tuscan city), with the aim of understanding whether there are common trends and patterns that relate the festival’s activities and growth to the economic activities of the city of Pisa, which hosts the event every year.
The process of acquiring data from external organisations brought us into direct contact with important stakeholders in order to share objectives and methodologies: collecting data that belong to external organisations is, first and foremost, a networking exercise. The data collection process included both identifying the data available to external organisations and investigating the ways in which these organisations collect, store and organise their data.
Let’s take an example.
Can we say that the festival has an impact on the city’s accommodation facilities? What data would help the Internet Festival in this regard? Since accommodation facilities normally keep a database of guest check-ins and check-outs, we can use that dataset to see whether the number of guests in the city’s hotels was higher during the festival days than the seasonal average. We can also use it to assess whether the number of hotel guests during the week of the event changed as the reputation of the festival grew (taking into account, of course, the dividing line created by COVID). The analysis of these data, cross-referenced with other datasets, can open a monitoring window that may help us to focus on the impact of the festival.
The biggest obstacle we have encountered so far in this data acquisition process, apart from a certain reluctance on the part of external organisations to make data available, is the mismatch between the way external organisations store and structure their data and the way we need the data to be structured.
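The hotel comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `festival_uplift`, the guest counts and the dates are all invented placeholders, and the "seasonal average" is simplified to the mean of the non-festival days in the same dataset.

```python
from datetime import date
from statistics import mean


def festival_uplift(daily_guests, festival_days):
    """Relative difference between mean guests on festival days and on other days.

    daily_guests: dict mapping a date to the number of guests on that day.
    festival_days: set of dates on which the festival took place.
    Returns e.g. 0.2 when festival days average 20% more guests than other days.
    """
    festival = [n for d, n in daily_guests.items() if d in festival_days]
    baseline = [n for d, n in daily_guests.items() if d not in festival_days]
    if not festival or not baseline:
        raise ValueError("need both festival and non-festival days")
    return mean(festival) / mean(baseline) - 1.0


# Invented placeholder data: 100 guests on ordinary days, 120 on festival days.
festival_days = {date(2022, 10, d) for d in range(13, 17)}
daily_guests = {
    date(2022, 10, d): (120 if date(2022, 10, d) in festival_days else 100)
    for d in range(1, 32)
}

uplift = festival_uplift(daily_guests, festival_days)
print(f"Guests on festival days vs. other days: {uplift:+.0%}")
```

A difference of this kind is only a starting point: it shows that two trends coincide, not that the festival caused the change, which is why the data need to be cross-referenced with other datasets.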
Let’s take another example.
We are interested in understanding whether the festival has a positive impact on the popularity of Pisa as a destination. If a correlation exists between the number of tourists and the number of reviews, can we identify a growing trend from the oldest to the most recent editions of the festival? The number of tourists might help us to address this question.
Our event lasts four days every year, from the second Thursday of October until the following Sunday. The tourist flows dataset is useful for a comparative analysis only if the data are organised with a daily (or at least weekly) granularity: how many tourists stayed in Pisa every day (or every week) from the beginning of October to the end of the month? A dataset organised on a monthly basis is not detailed enough to let us superimpose the trend of tourist flows on the days of the festival, i.e. to focus on the second week of October.
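To make the granularity point concrete, here is a minimal Python sketch that derives the four festival days from the rule stated above (second Thursday of October until Sunday). The function name `festival_days` is hypothetical, and the dates it returns follow from the rule alone, not from the official programme of any edition.

```python
from datetime import date, timedelta


def festival_days(year):
    """Return the four dates from the second Thursday of October to Sunday."""
    first_of_october = date(year, 10, 1)
    # date.weekday() counts Monday as 0, so Thursday is 3.
    days_to_first_thursday = (3 - first_of_october.weekday()) % 7
    second_thursday = first_of_october + timedelta(days=days_to_first_thursday + 7)
    return [second_thursday + timedelta(days=i) for i in range(4)]


for day in festival_days(2022):
    print(day.isoformat(), day.strftime("%A"))
```

With daily records, these four dates can be used to select exactly the rows that fall within the festival. With a single monthly total for October, there is nothing to select: the festival days and the other 27 days are merged into one number.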
Therefore, in addition to the need to share heterogeneous data, it is also necessary to think about how the data are structured, so that they benefit all the actors involved.
Let us therefore return to where we began: the public’s perception of a cultural event. Discovering correlations between the activities of a cultural event and the economic fabric of the city in which it is held, and being able to demonstrate these correlations (if they exist), improves the perception of the event itself and consequently strengthens its capacity to build direct relations with other local players through shared and integrated actions.