October 15, 2025

What Is Data?

Data is the new gold: discover what data is, which types exist — and how companies create real competitive advantages with Big Data and web crawling.
Visualisation of data streams and digital information – symbolic image for the DataHive GmbH blog post "What Is Data?".

Data is everywhere and shapes our daily lives. It is generated through online shopping, social networks, smart home use, and sensors in public spaces and on smartphones. Much of this data is created directly by users through their activities, such as reviews, search queries, or interactions on social media.

The term "data" comes from Latin and is the plural of "datum," meaning "something given." In everyday language, data describes various pieces of information or values. The definition of data forms the foundation for how content is interpreted and further processed. Data serves as a reflection of reality and is essential for making reliable decisions.

But what exactly does "data" mean?

At its core, data refers to information in a specific form that can be stored, transmitted, and interpreted. Correctly categorising different types of data is crucial so that data can be used efficiently and correctly interpreted by systems like search engines. Data forms the basis for knowledge, decisions, and technological developments.

Data Types

Data can be divided into different types depending on the form it takes and how it can be used. A fundamental distinction is made between qualitative and quantitative data. Another important difference lies between structured and unstructured data. In addition, semi-structured datasets exist that combine elements of both.

Qualitative data describes characteristics or properties that cannot be expressed directly in numbers — such as colours, opinions, or categories.

Quantitative data, on the other hand, is measurable and representable in numbers, such as height, temperature, or revenue.

Structured data comes in a clear, tabular form, such as in databases or Excel spreadsheets. It is based on a predefined data model and can be stored in a SQL database.

Unstructured data has no fixed form. This includes texts, images, videos, audio recordings, or social media posts. It often comes in different formats, making search and organisation difficult.

Semi-structured data, such as XML or JSON files, has a certain level of order but is more flexible than classic tables.
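The difference becomes concrete in a few lines of code. The following sketch (field names are invented for this illustration) loads a semi-structured JSON record and flattens it into the fixed-column row a SQL database or spreadsheet expects:

```python
import json

# A semi-structured record: nested, flexible, no fixed schema.
raw = json.loads("""
{
  "listing": "LS-001",
  "price": 850000,
  "features": {"rooms": 4, "balcony": true},
  "tags": ["renovated", "city-centre"]
}
""")

# Flatten into a structured row with fixed, predictable columns.
row = {
    "listing": raw["listing"],
    "price": raw["price"],
    "rooms": raw["features"]["rooms"],
    "balcony": raw["features"]["balcony"],
    "tags": ",".join(raw["tags"]),
}
print(row)
```

The nested `features` object and variable-length `tags` list are exactly what a rigid table cannot hold directly, which is why such records are called semi-structured.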

Big Data

Big Data refers to the enormous volumes of data generated in companies and organisations today, and to the technologies used to make this data usable. Big Data encompasses both structured and unstructured data from a wide variety of sources: from social media and e-commerce platforms to financial transactions and industrial sensor data.

The special feature of Big Data lies not only in volume, but also in the variety of datasets and the speed at which new data is generated. To extract valuable insights from these vast amounts of data, companies use specialised tools and technologies such as Hadoop, Spark, or cloud-based data platforms. Big Data makes it possible to recognise patterns, predict trends, and optimise processes.

Data Origin

Beyond structure and measurability, the context in which data is created also matters. We distinguish between observational data (collected through measurements or sensors), experimental data (deliberately generated under controlled conditions), and transactional data (e.g. from online purchases or card payments).

Challenges in Handling Data

Illustration: challenges in handling data – data overload, quality, and data protection
A world of challenges in dealing with data

Despite their value, many companies face obstacles when trying to systematically use data:

Data overload — Every day, enormous amounts of new information are generated. Without clear structures and automated processes, time is lost and decisions become inefficient.

Quality and reliability — Not all data is correct, current, or complete. Erroneous or outdated information can lead to wrong conclusions. This is why raw data must be verified, cleaned, and brought into a usable form.

Accessibility — Many potentially valuable datasets are publicly available but not in an easy-to-use format: they are scattered across websites or locked in unstructured documents. This is where automated web crawling and data extraction services offer an efficient solution.

Data protection and legal aspects — Using data also brings legal questions, particularly regarding data protection (e.g. DSG or GDPR) and copyright. Reputable data services address these aspects from the outset.

What Is Data Extraction?

Data extraction is a central step in data management. It refers to the process of deliberately obtaining information from various sources, preparing it, and making it available for further use.

The three core steps of data extraction:

  1. Identify the data source — Which systems or files contain the relevant information?
  2. Extract — The desired information is specifically pulled out using ETL tools, APIs, or automated scripts.
  3. Transform and prepare — Raw data is often differently structured and must be cleaned, standardised, and converted into a unified format.
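The three steps can be sketched in a few lines. This is a minimal, illustrative example with invented field names and an in-memory "source" standing in for a real system or file:

```python
# 1. Identify the data source: here, raw exports with inconsistent fields.
source_records = [
    {"Price": "850'000", "city": "Zurich "},
    {"price_chf": "1200000", "City": "basel"},
]

def transform(record: dict) -> dict:
    """3. Clean and standardise a raw record into a unified format."""
    price = record.get("Price") or record.get("price_chf")
    city = record.get("city") or record.get("City")
    return {
        "price_chf": int(str(price).replace("'", "")),
        "city": city.strip().title(),
    }

# 2. Extract: pull every record out of the source and prepare it.
clean = [transform(r) for r in source_records]
print(clean)
```

Notice that both records end up with the same keys, units, and formatting, even though the raw inputs disagreed on every detail. That standardisation is the point of the transform step.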

Real-world example: Property investor

Data analysis of a property investor – graphical representation of property, building-potential, market, and valuation data
Data analysis of a real estate investor

A property investor wants to identify which assets in a city have the best return potential. Data extraction allows them to automatically combine purchase prices, building potential, rental yields, upcoming zoning revisions, building activity, and expected population growth, thereby enabling instant comparisons at the push of a button.
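The "comparison at the push of a button" reduces to a simple computation once the data is combined. The figures below are hypothetical and chosen only to illustrate ranking assets by gross rental yield:

```python
# Hypothetical asset figures (illustration only): combine purchase
# price and annual rent, then rank by gross yield.
assets = [
    {"id": "A", "purchase_price": 900_000, "annual_rent": 36_000},
    {"id": "B", "purchase_price": 650_000, "annual_rent": 31_200},
    {"id": "C", "purchase_price": 1_200_000, "annual_rent": 42_000},
]

for a in assets:
    a["gross_yield_pct"] = round(100 * a["annual_rent"] / a["purchase_price"], 2)

# Instant comparison: best return potential first.
ranked = sorted(assets, key=lambda a: a["gross_yield_pct"], reverse=True)
print([a["id"] for a in ranked])
```

In practice the same pattern extends to the other extracted signals (zoning revisions, building activity, population growth) by weighting them into a combined score.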

Web Crawling as a Data Source

Web crawling is the automated process in which programmes, so-called crawlers or bots, systematically search websites and capture their contents.

How web crawling works:

  1. Set a starting point (URL)
  2. Follow links and capture content (texts, prices, images, metadata)
  3. Store and structure the data in a database for analysis
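The three steps above can be sketched with Python's standard library. To keep the sketch self-contained and runnable offline, it parses an inline HTML snippet; in a real crawler the page body would come from an HTTP request to the starting URL, and the captured links would be fed back into step 1:

```python
from html.parser import HTMLParser

class LinkAndTextCrawler(HTMLParser):
    """Capture outgoing links and visible text from one fetched page."""

    def __init__(self):
        super().__init__()
        self.links = []   # step 2: links to follow next
        self.texts = []   # step 2: content captured for storage

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.texts.append(data.strip())

# Step 1: starting point. In reality, HTML fetched from a start URL.
page = '<h1>Listings</h1><a href="/listing/1">Flat, CHF 850,000</a>'

crawler = LinkAndTextCrawler()
crawler.feed(page)

# Step 3: store and structure the captured data for analysis.
records = {"links": crawler.links, "texts": crawler.texts}
print(records)
```

Production crawlers add politeness on top of this skeleton: respecting robots.txt, rate-limiting requests, and deduplicating already-visited URLs.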

Real-world example: Property developer

Construction site overlaid with data streams – symbolic image for data-driven decisions in property development
Data analysis of a property developer

A developer planning a new residential project can use web crawling to monitor authority and government websites for upcoming zoning revisions and participation processes, track property portals for land prices, monitor news portals for planned infrastructure projects, and identify opportunities to positively influence the regulatory environment around their assets.

The Role of Data in Artificial Intelligence

Data is the foundation of every successful AI application. The quality and variety of data used largely determines how powerful and reliable an AI model is. Modern AI technologies such as machine learning and deep learning can extract valuable information from complex data sources. Companies that use their data deliberately for AI applications gain a decisive advantage in digital transformation.

Best Practices for Companies Working with Data

  1. Start small, think big — Begin with your most important business data before launching complex Big Data projects.
  2. Ensure data quality — Only clean, current, and complete data delivers reliable results.
  3. Use visualisation — Dashboards and simple graphics help identify patterns quickly.
  4. Choose tools wisely — You don't need the most expensive solutions; lean tools often suffice.
  5. Translate insights into action — Data is only valuable when it leads to concrete measures: adjusting campaigns, optimising investments, or configuring opportunity radars.

DataHive helps companies meaningfully collect, prepare, and transform data into clear recommendations for action — turning the question "What is data?" into a clear answer: data is the foundation for faster, better decisions ahead of the competition.

FAQ

  1. What types of data does DataHive collect for real estate?
    DataHive collects structured geodata, zoning information, building registers, and publicly available web data across all 26 Swiss cantons — normalised into a single, consistent dataset. Every parcel gets a complete development potential score, updated at least monthly.
  2. How is data extraction different from a standard report?
    Unlike a one-off PDF report, data extraction continuously pulls live information from dozens of sources and delivers it in the format you already use — Excel, API, or a custom dashboard. No manual copying, no outdated numbers, no data locked in attachments.
  3. Can web crawling really give us a competitive edge?
    Yes. Automated crawling monitors thousands of municipal websites, authority portals, and property platforms simultaneously — flagging zoning changes or new opportunities the moment they appear. That means you act before competitors even know something has changed.
  4. What's the difference between structured and unstructured data?
    Structured data lives in tables and databases — clean and queryable. Unstructured data includes PDFs, planning documents, and web pages. DataHive processes both, turning fragmented, hard-to-access public data into analysis-ready datasets for your team.
  5. How do I know if data quality is good enough to trust?
    DataHive guarantees at least 95% data accuracy — and if that's not met, we fix it at no charge. You can also test quality risk-free with a single parcel from CHF 25 before any commitment, and compare it directly against your current data source.