Scraping & Big Data


Web scraping (also called web harvesting or web data extraction) is a technique for extracting information from websites by means of software. It often involves transforming unstructured data from web pages into structured records, stored in databases, for content analysis or reuse.

Programs that simulate human navigation are written and launched. As they visit web pages, these programs collect the required data and write it to files or databases. The data is then used for offline analysis.

In market research, web scraping is often used to collect contact information for particular targets. For example, for some business surveys, contact information is first collected from the yellow pages and then used in CATI (computer-assisted telephone interviewing) or CAWI (computer-assisted web interviewing) surveys.
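The process described above (read a page, pick out the needed fields, write them to a file) can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the HTML snippet, the class names "company", "name" and "phone", and the company entries are all invented for the example, and a real scraper would download the page and adapt the parsing to the actual site.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical directory page; a real scraper would download this HTML instead.
SAMPLE_HTML = """
<ul>
  <li class="company"><span class="name">Acme Srl</span><span class="phone">555-0101</span></li>
  <li class="company"><span class="name">Beta SpA</span><span class="phone">555-0102</span></li>
</ul>
"""


class ContactParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="phone"> elements."""

    def __init__(self):
        super().__init__()
        self.rows = []       # one dict per company entry
        self._field = None   # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "company":
            # A new company entry starts here.
            self.rows.append({"name": "", "phone": ""})
        elif tag == "span" and attrs.get("class") in ("name", "phone") and self.rows:
            self._field = attrs["class"]

    def handle_data(self, data):
        # Keep text only while inside one of the spans of interest.
        if self._field:
            self.rows[-1][self._field] += data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def extract_contacts(html):
    """Turn the unstructured page into a list of structured records."""
    parser = ContactParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_csv(rows):
    """Serialize the records as CSV text, ready to be saved to a file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "phone"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


contacts = extract_contacts(SAMPLE_HTML)
print(to_csv(contacts))
```

The resulting CSV text could then be saved and loaded into the contact list of a telephone or web survey.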

Data collection on Social Networks, Twitter, Blogs ...

Big Data is information that users generate on the web and on social networks, either involuntarily (by browsing) or voluntarily (by writing blog posts). It also includes information generated by administrative operations such as credit card payments.

Public Opinion Quarterly (POQ), the quarterly journal of AAPOR, devoted a special issue to the past, present and future of survey research. Cooper has called this type of data “organic data”. Hardly anyone uses this term, so we will call it Big Data for clarity.

Big Data does not replace statistical research, opinion polls or market research for one simple reason: it often captures little information about each case. The information most commonly shared looks like this: a like for a certain brand, gender, age, location, time of post. The advantage is that Big Data can include tens of thousands of cases (e.g. Facebook likes), while a survey typically does not exceed 2,000 respondents. On the other hand, a questionnaire normally consists of dozens of questions, which makes it possible to analyze the relationships between answers (e.g. which product I consume and for what reasons).

Another aspect to consider is the two biases that affect Big Data:

  • coverage: what share of the population does that particular social network reach?
  • measurement: how many members of that social network are willing to share their opinion on a particular product publicly?
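As a rough illustration of how the two questions above combine, the sketch below expresses each one as a simple proportion. All figures are placeholders invented for the example, not real data, and the function names are hypothetical.

```python
def coverage_rate(network_members, population):
    """Share of the target population that is on the social network."""
    return network_members / population


def sharing_rate(members_who_posted, network_members):
    """Share of the network's members who publicly expressed an opinion."""
    return members_who_posted / network_members


# Placeholder figures, invented for illustration only.
population = 1_000_000
network_members = 400_000
members_who_posted = 20_000

coverage = coverage_rate(network_members, population)
sharing = sharing_rate(members_who_posted, network_members)

# Share of the whole population whose opinion is actually observed.
observed_share = coverage * sharing

print(f"coverage: {coverage:.0%}")
print(f"sharing: {sharing:.0%}")
print(f"observed share of population: {observed_share:.0%}")
```

Even a large absolute number of posts can therefore correspond to a small and self-selected fraction of the population.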

This service consists of collecting and organizing the information that "surfers" leave on the web, voluntarily or involuntarily: for example, a like for a brand on Facebook, an opinion on a politician posted on Twitter or on a blog, or the paths visitors follow from one site to another.

This type of information is constantly growing and gives the techno-researchers at Demetra Opinions.net an opportunity to offer their customers a new service.

Do you have any questions or want to ask us for a quote?