3.1.2. Difference between data formats and structures
- Discrete data: Data that is counted and has a limited number of values
- Continuous data: Data that is measured and can have almost any numeric value
- Nominal data: A type of qualitative data that is categorized without a set order
- Ordinal data: A type of qualitative data with a set order or scale
- Internal data: Data that lives within a company's own systems
- External data: Data that lives and is generated outside of an organization
- Structured data: Data organized in a certain format such as rows and columns
- Unstructured data: Data that is not organized in any easily identifiable manner
An entertainment website displays a star rating for a movie based on user reviews. Users can select from one to five whole stars to rate the movie. The star rating is an example of what type of data? Select all that apply.
- Ordinal
- Continuous
- Nominal
- Discrete
Correct. The star rating is an example of ordinal data because the stars rank how much each person liked the movie. It’s also an example of discrete data because a person has to choose a whole number of stars; half stars aren’t an option.
The use of external data is particularly valuable in which circumstances?
- When analysis involves data that hasn’t been cleaned
- When analysis requires a lot of structured data
- When analysis includes data from audio files
- When analysis depends on as many data sources as possible
Correct. External data is particularly valuable when an analysis depends on as many sources as possible.
When you think about the word "format," a lot of things might come to mind. Think of an advertisement for your favorite store. You might find it in the form of a print ad, a billboard, or even a commercial. The information is presented in the format that works best for you to take it in. The format of a dataset is a lot like that, and choosing the right format will help you manage and use your data in the best way possible.
As with most things, it is easier for definitions to click when we can pair them with real life examples. Review each definition first and then use the examples to lock in your understanding of each data format.
- Differences between primary and secondary data and examples of each
Data Format Classification | Definition | Examples |
---|---|---|
Primary data | Collected by a researcher from first-hand sources | - Data from an interview you conducted - Data from a survey returned from 20 participants - Data from questionnaires you got back from a group of workers |
Secondary data | Gathered by other people or from other research | - Data you bought from a local data analytics firm’s customer profiles - Demographic data collected by a university - Census data gathered by the federal government |
- Differences between internal and external data and examples of each
Data Format Classification | Definition | Examples |
---|---|---|
Internal data | Data that lives inside a company’s own systems | - Wages of employees across different business units tracked by HR - Sales data by store location - Product inventory levels across distribution centers |
External data | Data that lives outside of a company or organization | - National average wages for the various positions throughout your organization - Credit reports for customers of an auto dealership |
- Differences between continuous and discrete data and examples of each
Data Format Classification | Definition | Examples |
---|---|---|
Continuous data | Data that is measured and can have almost any numeric value | - Height of kids in third grade classes (52.5 inches, 65.7 inches) - Runtime markers in a video - Temperature |
Discrete data | Data that is counted and has a limited number of values | - Number of people who visit a hospital on a daily basis (10, 20, 200) - Room’s maximum capacity allowed - Tickets sold in the current month |
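
As a minimal sketch of the distinction in the table above, the Python snippet below stores counted values as integers and measured values as floats. The variable names and the `is_discrete` helper are illustrative assumptions, not part of the course material.

```python
# Discrete data: counted, so only whole-number values make sense.
daily_hospital_visits = [10, 20, 200]

# Continuous data: measured, so values can fall anywhere on a scale.
third_grade_heights_in = [52.5, 65.7]


def is_discrete(values):
    """Rough heuristic (an assumption for this sketch): every value is a whole number."""
    return all(float(v).is_integer() for v in values)


print(is_discrete(daily_hospital_visits))   # True
print(is_discrete(third_grade_heights_in))  # False
```

The heuristic is only a rough check: a measured height of exactly 52.0 inches is still continuous data, so what really decides the type is how the value was obtained (counted or measured).
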
- Differences between qualitative and quantitative data and examples of each
Data Format Classification | Definition | Examples |
---|---|---|
Qualitative | Subjective and explanatory measures of qualities and characteristics | - Exercise activity most enjoyed - Favorite brands of most loyal customers - Fashion preferences of young adults |
Quantitative | Specific and objective measures of numerical facts | - Percentage of board certified doctors who are women - Population of elephants in Africa - Distance from Earth to Mars |
- Differences between nominal and ordinal data and examples of each
Data Format Classification | Definition | Examples |
---|---|---|
Nominal | A type of qualitative data that isn’t categorized with a set order | - First time customer, returning customer, regular customer - New job applicant, existing applicant, internal applicant - New listing, reduced price listing, foreclosure |
Ordinal | A type of qualitative data with a set order or scale | - Movie ratings (number of stars: 1 star, 2 stars, 3 stars) - Ranked-choice voting selections (1st, 2nd, 3rd) - Income level (low income, middle income, high income) |
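
One way to see the practical difference is that ordinal values can be sorted by their scale, while nominal values can only be grouped and counted. The sketch below uses the income levels and listing types from the table; the `sort_ordinal` helper and the sample responses are hypothetical.

```python
from collections import Counter

# Ordinal data: the categories have a meaningful order, so they can be ranked.
INCOME_ORDER = ["low income", "middle income", "high income"]


def sort_ordinal(values, order):
    """Sort ordinal values by their position in the agreed scale."""
    rank = {label: position for position, label in enumerate(order)}
    return sorted(values, key=rank.__getitem__)


responses = ["high income", "low income", "middle income", "low income"]
print(sort_ordinal(responses, INCOME_ORDER))
# ['low income', 'low income', 'middle income', 'high income']

# Nominal data: the categories are only labels, so counting is meaningful
# but ranking is not.
listings = ["foreclosure", "new listing", "foreclosure", "reduced price listing"]
listing_counts = Counter(listings)
print(listing_counts["foreclosure"])  # 2
```
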
- Differences between structured and unstructured data and examples of each
Data Format Classification | Definition | Examples |
---|---|---|
Structured data | Data organized in a certain format, like rows and columns | - Expense reports - Tax returns - Store inventory |
Unstructured data | Data that isn’t organized in any easily identifiable manner | - Social media posts - Emails - Videos |
- Data model: A model that is used for organizing data elements and how they relate to one another
- Data elements: Pieces of information, such as people's names, account numbers, and addresses
In this reading, you will learn about data modeling and some different types of data models. Data models help keep data consistent and give us a map of how data is organized. This makes it easier for analysts and other stakeholders to make sense of their data and use it in the right ways. As a junior data analyst, you will probably be working with the data models your organization already has in place — but understanding how data models work can help you make sense of other models you might come across on the job.
Data modeling is the process of creating diagrams that visually represent how data is organized and structured. These visual representations are called data models. You can think of data modeling as a blueprint of a house. At any point, there might be electricians, carpenters, and plumbers using that blueprint. Each one of these builders has a different relationship to the blueprint, but they all need it to understand the overall structure of the house. Data models are similar; different users might have different data needs, but the data model gives them an understanding of the structure as a whole.
Each level of data modeling has a different level of detail.
- Conceptual data modeling gives you a high-level view of your data structure, such as how you want data to interact across an organization.
- Logical data modeling focuses on the technical details of the model such as relationships, attributes, and entities.
- Physical data modeling depicts how the database will actually be built. At this stage, you lay out how each database will be put in place and how the databases, applications, and features will interact in specific detail.
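
The three levels can be sketched loosely in Python. In this illustration, the conceptual statement is a comment, dataclasses stand in for the logical model (entities, attributes, and a relationship), and an in-memory SQLite schema stands in for the physical model. The `Customer` and `Order` entities and all table and column names are hypothetical examples, not taken from the reading.

```python
import sqlite3
from dataclasses import dataclass

# Conceptual level (assumed example): "a customer places orders".


# Logical level: entities, their attributes, and the relationship between them.
@dataclass
class Customer:
    customer_id: int
    name: str


@dataclass
class Order:
    order_id: int
    customer_id: int  # relationship: each order belongs to one customer
    total: float


# Physical level: concrete tables, column types, and keys in an actual database.
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total       REAL NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

customer = Customer(customer_id=1, name="Ada")
order = Order(order_id=100, customer_id=customer.customer_id, total=42.5)
conn.execute("INSERT INTO customer VALUES (?, ?)", (customer.customer_id, customer.name))
conn.execute(
    "INSERT INTO customer_order VALUES (?, ?, ?)",
    (order.order_id, order.customer_id, order.total),
)

# The relationship defined in the model is what makes this join possible.
rows = conn.execute(
    "SELECT c.name, o.total FROM customer c "
    "JOIN customer_order o ON o.customer_id = c.customer_id"
).fetchall()
print(rows)  # [('Ada', 42.5)]
```
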
More information can be found in this comparison of data models.
There are a lot of approaches when it comes to developing data models, but two common methods are the Entity Relationship Diagram (ERD) and the Unified Modeling Language (UML) diagram. ERDs are a visual way to understand the relationships between entities in the data model. UML diagrams are very detailed diagrams that describe the structure of a system by showing the system's entities, attributes, operations, and their relationships. As a junior data analyst, you will need to understand that there are different data modeling techniques, but in practice, you will probably be using your organization’s existing model.
You can read more about ERD, UML, and data dictionaries in this data modeling techniques article.
Data modeling can help you explore the high-level details of your data and how it is related across the organization’s information systems. Data modeling sometimes requires data analysis to understand how the data is put together; that way, you know how to map the data. And finally, data models make it easier for everyone in your organization to understand and collaborate with you on your data. This is important for you and everyone on your team!
Data is everywhere and it can be stored in lots of ways. Two general categories of data are:
- Structured data: Organized in a certain format, such as rows and columns.
- Unstructured data: Not organized in any easy-to-identify way.
For example, when you rate your favorite restaurant online, you're creating structured data. But when you use Google Earth to check out a satellite image of a restaurant location, you're using unstructured data.
Here's a refresher on the characteristics of structured and unstructured data:
As we described earlier, structured data is organized in a certain format. This makes it easier to store and query for business needs. If the data is exported, the structure goes along with the data.
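
A short sketch of what "easier to query" and "the structure goes along with the data" can look like in practice, using Python's standard `csv` module. The restaurant names and ratings are made-up sample data.

```python
import csv
import io

# Structured data: every record has the same named fields (rows and columns).
ratings = [
    {"restaurant": "Cafe A", "stars": 5},
    {"restaurant": "Diner B", "stars": 3},
    {"restaurant": "Bistro C", "stars": 4},
]

# Because the fields are known, querying is a simple filter.
top_rated = [r["restaurant"] for r in ratings if r["stars"] >= 4]
print(top_rated)  # ['Cafe A', 'Bistro C']

# Exporting to CSV keeps the structure: the header row travels with the data.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["restaurant", "stars"])
writer.writeheader()
writer.writerows(ratings)

# Reading the export back recovers the same columns (values come back as text).
buffer.seek(0)
reloaded = [dict(row) for row in csv.DictReader(buffer)]
print(reloaded[0])  # {'restaurant': 'Cafe A', 'stars': '5'}
```
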
Unstructured data can’t be organized in any easily identifiable manner. And there is much more unstructured than structured data in the world. Video and audio files, text files, social media content, satellite imagery, presentations, PDF files, open-ended survey responses, and websites all qualify as types of unstructured data.
The lack of structure makes unstructured data difficult to search, manage, and analyze. But recent advancements in artificial intelligence and machine learning algorithms are beginning to change that. Now, the new challenge facing data scientists is making sure these tools are inclusive and unbiased. Otherwise, certain elements of a dataset will be more heavily weighted and/or represented than others. And as you're learning, an unfair dataset does not accurately represent the population, causing skewed outcomes, low accuracy levels, and unreliable analysis.
Fill in the blank: The running time of a movie is an example of _____ data.
- qualitative
- continuous
- discrete
- nominal
Correct. Running times of movies are an example of continuous data, which is measured and can have almost any numeric value.
What are the characteristics of unstructured data? Select all that apply.
- Fits neatly into rows and columns
- May have an internal structure
- Is not organized
- Has a clearly identifiable structure
Correct. Unstructured data is not organized, although it may have an internal structure.
Structured data enables data to be grouped together to form relations. This makes it easier for analysts to do what with the data? Select all that apply.
- Analyze
- Rewrite
- Search
- Store
Correct. Structured data that is grouped together to form relations enables analysts to more easily store, search, and analyze the data.
Which of the following is an example of unstructured data?
- Email message
- Contact saved on a phone
- Rating of a local favorite restaurant
- GPS location
Correct. An example of unstructured data is an email message. Other examples of unstructured data are video files and social media content.
https://www.coursera.org/professional-certificates/google-data-analytics
© 2021 Coursera Inc. All rights reserved.