This SAS case study prepares inbound and outbound tourism data for countries and continents in 2014. The program addresses the data requirements, enabling the analyst team to perform analyses aimed at growing the company's market share.

MaxineXiong/Wourld_Tourism_Data_Preparation

SAS Case Study: World Tourism Data Preparation

License: MIT | Platform: SAS 9.4


The objective of this case study is to organize and prepare data so that a company's analyst team can build reports, visualizations, and statistical models aimed at expanding the company's market share. The data covers inbound and outbound tourism for countries and continents in 2014. The project delivers three tables: cleaned_tourism, final_tourism, and nocountryfound. The first task restructures the raw tourism table to meet the data requirements, producing cleaned_tourism. The second task merges cleaned_tourism with the country_info table to produce final_tourism, which contains only matching rows. The third task creates nocountryfound, a distinct list of countries in cleaned_tourism that have no matching row in country_info. The SAS program in this repository implements all three tasks.


Case Study Requirements

The raw data for this case study is the tourism table located in the CR library.

[Screenshot 1: the raw tourism table]

The final table should be a combination of the cleaned_tourism table and the country_info table.

See the example of desired outcome below:

[Screenshot 2: example of the desired final table]

Data Requirements

Create the cleaned_tourism table with the following column requirements:

  • Country_Name – contains the country name from the original Country column.
  • Tourism_Type – contains the type of tourism from the original Country column. Valid values are Inbound tourism or Outbound tourism.
  • Category – contains category names by extracting and modifying values from the original Country column. There should be six distinct values for Category as shown in the table below:

  • Series – All values should be in uppercase and data that is not available (coded as "..") should be changed to a missing character value.
  • Y2014 – contains numeric values calculated from the scaled character values in the original _2014 column. Each scaled value is multiplied by one thousand or one million (abbreviated Mn), depending on the category listed in the Country column. Format the new Y2014 values with the COMMA format.
    • Example: if the category is Travel - US$ Mn and the value for _2014 is 4.26, Y2014 is equal to 4.26 * 1000000, or 4,260,000.
  • Include only Country_Name, Tourism_Type, Category, Series, and Y2014 in the output table.
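The Series and Y2014 rules above can be checked with a small sketch. It is written in Python purely to illustrate the logic and is not the repository's SAS program. The function names are hypothetical, and it assumes the unit is the last word of the raw category label; only the "Mn" spelling comes from the requirements, while "Thousands" is a guess at how the other unit is written.

```python
from typing import Optional

MISSING_CODE = ".."  # the requirements say unavailable data is coded as ".."

# Assumption: the unit is the last word of the raw category label.
# "Mn" appears in the requirements; the "Thousands" spelling is a guess.
MULTIPLIERS = {"mn": 1_000_000, "thousands": 1_000}


def clean_series(series: str) -> Optional[str]:
    """Uppercase a Series value; return None (missing) for the '..' code."""
    value = series.strip()
    if value in ("", MISSING_CODE):
        return None
    return value.upper()


def scale_y2014(raw_value: str, raw_category: str) -> Optional[float]:
    """Turn a scaled character value into a full numeric value."""
    value = raw_value.strip()
    if value in ("", MISSING_CODE):
        return None
    parts = raw_category.split()
    unit = parts[-1].lower() if parts else ""
    if unit not in MULTIPLIERS:
        raise ValueError(f"unrecognised unit in category: {raw_category!r}")
    return float(value) * MULTIPLIERS[unit]


def comma_format(value: Optional[float]) -> str:
    """Render a number with thousands separators, similar to a comma-style display."""
    return "" if value is None else f"{value:,.0f}"


# The worked example from the requirements: 4.26 in a "Mn" category.
print(comma_format(scale_y2014("4.26", "Travel - US$ Mn")))  # 4,260,000
print(clean_series(".."))  # None
```

Keeping the multiplier lookup in one dictionary makes the unit assumption easy to adjust once the actual category labels in the raw table are known.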

Merge the cleaned_tourism table with the country_info table and do the following:

  1. Create two new tables:
    • final_tourism should contain only merged data.
    • nocountryfound should contain a list of distinct countries from the cleaned_tourism table that do not have a match in the country_info table.
  2. Create a format for the Continent column that labels continent IDs with the corresponding continent names. Permanently apply the format in the final_tourism table.
    • 1 = North America
    • 2 = South America
    • 3 = Europe
    • 4 = Africa
    • 5 = Asia
    • 6 = Oceania
    • 7 = Antarctica
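The merge requirements above can be sketched as follows. This is an illustration in Python, not the repository's SAS code. It assumes the two tables are matched on the country name and models country_info as a hypothetical mapping from country name to continent ID; the sample rows are invented for the example. A SAS format changes how a stored ID is displayed rather than the stored value, so the sketch keeps the numeric ID and looks the label up separately.

```python
from typing import Dict, List, Tuple

# Continent IDs and names exactly as listed in the requirements.
CONTINENT_LABELS = {
    1: "North America",
    2: "South America",
    3: "Europe",
    4: "Africa",
    5: "Asia",
    6: "Oceania",
    7: "Antarctica",
}


def continent_label(continent_id: int) -> str:
    """Return the display label for a continent ID (the ID itself if unknown)."""
    return CONTINENT_LABELS.get(continent_id, str(continent_id))


def merge_tourism(
    cleaned_rows: List[dict], country_info: Dict[str, int]
) -> Tuple[List[dict], List[str]]:
    """Split cleaned rows into matched rows and a distinct list of unmatched countries.

    Assumption: rows are matched on Country_Name.
    """
    final_tourism: List[dict] = []
    no_country_found: List[str] = []
    for row in cleaned_rows:
        name = row["Country_Name"]
        if name in country_info:
            # Matched: keep the row and attach the stored continent ID.
            final_tourism.append({**row, "Continent": country_info[name]})
        elif name not in no_country_found:
            # Unmatched: record each country only once.
            no_country_found.append(name)
    return final_tourism, no_country_found


# Hypothetical sample rows for illustration only, not taken from the CR library.
cleaned = [
    {"Country_Name": "Exampleland", "Y2014": 4_260_000},
    {"Country_Name": "Nowhereland", "Y2014": 1_000},
    {"Country_Name": "Nowhereland", "Y2014": 2_000},
]
info = {"Exampleland": 3}

final, missing = merge_tourism(cleaned, info)
print([(r["Country_Name"], continent_label(r["Continent"])) for r in final])
# [('Exampleland', 'Europe')]
print(missing)  # ['Nowhereland']
```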

Repository Structure

This repository is structured as follows:

Wourld_Tourism_Data_Preparation
├── World_Tourism_Data_Preparation.sas
├── ECRB94/
│   └── createdataCRB_oda.sas
├── README.md
└── LICENSE

  • World_Tourism_Data_Preparation.sas: This SAS program file addresses all data requirements in the case study. It is the main file for preparing the 2014 inbound and outbound tourism data for countries and continents.
  • ECRB94/createdataCRB_oda.sas: This SAS program generates all the necessary libraries, tables, and data files in SAS Studio. It sets up the required environment for running the main data preparation program, World_Tourism_Data_Preparation.sas.
  • README.md: This file provides an overview of the repository, including descriptions of the case study and relevant information for usage.
  • LICENSE: The license file for the project.

Please note that the World_Tourism_Data_Preparation.sas program should be used as the primary entry point for data preparation, while ECRB94/createdataCRB_oda.sas sets up the required environment to execute the data preparation program successfully.


Prerequisites

To run the World Tourism Data Preparation program, you need an active account for either SAS® OnDemand for Academics or SAS® Viya. These platforms provide the environment needed to execute SAS programs and analyse the data.


Usage

To use this repository, follow the steps outlined below:

  1. Download the repository to your local machine.
  2. Launch SAS Studio using your SAS software.
  3. In the Server Files and Folders panel, click New at the top and select Folder.
  4. Enter "ECRB94" in uppercase exactly as shown in the Name box. Click Save.
  5. Verify that the ECRB94 folder has been successfully created under Files (Home).
  6. Select the ECRB94 folder and click the Upload tool.
  7. In the Upload Files window, click Choose Files and navigate to the ECRB94 folder on your computer. Select the file createdataCRB_oda.sas and click Open. Click Upload to add the program to the ECRB94 folder on the server.
  8. In SAS Studio, double-click createdataCRB_oda.sas to open the program.
  9. Click the Run tool or press F3 to execute the program. This will generate all the necessary SAS files and data required for the program. After the program completes, you will see a list of SAS tables displayed in the Results tab.
  10. In the Server Files and Folders panel, expand the ECRB94 folder and collapse any open subfolders. You should see three subfolders: data, output, and programs.
  11. The target libraries and tables used in the SAS program are now accessible in the ~/ECRB94/data directory.

By following these steps, you will have successfully set up the required environment and data files to utilize the SAS program included in this repository.


License

This project is licensed under the MIT License.
