Analyzing JSON Data with Pandas in Python
This example uses Python's Pandas library to parse JSON data supplied as a string and run a simple analysis on it.
Introduction
Backstory and Motivation
Meet Alex, a data analyst working for a research firm that specializes in environmental studies. Alex's team is responsible for analyzing and visualizing data related to pollution levels in various cities. They frequently receive data from different sources in JSON format. To streamline the analysis process, Alex decides to create a Python script that reads JSON data from a string and converts it into a structured Pandas DataFrame. This will enable the team to easily manipulate and visualize the pollution data.
Statement
The input is a JSON-formatted string containing an array of pollution readings. Each reading has a city, a pollutant, and a value:
[
{
"city": "New York",
"pollutant": "CO2",
"value": 320
},
{
"city": "Los Angeles",
"pollutant": "NO2",
"value": 45
},
{
"city": "Chicago",
"pollutant": "SO2",
"value": 20
}
]
The input data has the following structure, where the string holds the JSON array shown above:
"data": string
import pandas as pd
import json
# Convert JSON data string to a list of dictionaries
data_list = json.loads(INPUT_DATA[0]["data"])
# Create a Pandas DataFrame from the list of dictionaries
df = pd.DataFrame(data_list)
# Display the DataFrame
log.info("Original DataFrame:")
log.info(df)
# Aggregate the readings: compute the average value for each pollutant
avg_pollution = df.groupby("pollutant")["value"].mean()
log.info("Average pollution levels by pollutant:")
log.info(avg_pollution)
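The script above relies on two names that it does not define, INPUT_DATA and log, which are presumably supplied by the environment it runs in. To try the same steps elsewhere, a minimal sketch could use stand-ins for both. The shape of INPUT_DATA (a list whose first element is a dictionary with a "data" key) is an assumption inferred from the expression INPUT_DATA[0]["data"], and the logger name "pollution" is arbitrary.

```python
import json
import logging

import pandas as pd

# Stand-in for the environment-provided `log` object (assumption).
logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("pollution")

# Stand-in for the environment-provided INPUT_DATA (assumed shape:
# a list of dictionaries, each carrying the JSON text under "data").
INPUT_DATA = [{
    "data": '[{"city": "New York", "pollutant": "CO2", "value": 320},'
            ' {"city": "Los Angeles", "pollutant": "NO2", "value": 45},'
            ' {"city": "Chicago", "pollutant": "SO2", "value": 20}]'
}]

# Same steps as the script above: parse, build the DataFrame, aggregate.
df = pd.DataFrame(json.loads(INPUT_DATA[0]["data"]))
avg_pollution = df.groupby("pollutant")["value"].mean()

# Passing the object as a %s argument logs its string form on its own lines.
log.info("Original DataFrame:\n%s", df)
log.info("Average pollution levels by pollutant:\n%s", avg_pollution)
```

With these stand-ins the DataFrame has three rows and the columns city, pollutant, and value.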
Explanation
- Converting JSON to List of Dictionaries: We use Python's json.loads() function to convert the JSON data string into a list of dictionaries.
- Creating Pandas DataFrame: We create a Pandas DataFrame from the list of dictionaries. Each dictionary becomes one row, and its keys (city, pollutant, value) become the columns.
- Displaying Data: We log the DataFrame so the parsed data can be inspected in tabular form.
- Data Analysis: We group the rows by pollutant and compute the mean of the value column for each group. In the sample input every pollutant appears only once, so each average equals the single reading for that pollutant.
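The grouping step is easier to see when a pollutant has more than one reading. The sketch below uses hypothetical readings, invented purely for illustration, in which NO2 appears twice. The threshold of 30 in the filtering step is likewise an arbitrary example value.

```python
import pandas as pd

# Hypothetical readings: these values are invented for illustration only.
rows = [
    {"city": "New York", "pollutant": "NO2", "value": 40},
    {"city": "Los Angeles", "pollutant": "NO2", "value": 50},
    {"city": "Chicago", "pollutant": "SO2", "value": 20},
]

# One row per dictionary, one column per key.
df = pd.DataFrame(rows)

# Group by pollutant and average the readings within each group.
avg_by_pollutant = df.groupby("pollutant")["value"].mean()
print(avg_by_pollutant["NO2"])  # 45.0 (mean of 40 and 50)
print(avg_by_pollutant["SO2"])  # 20.0

# Filtering: keep only readings above an arbitrary threshold of 30.
high = df[df["value"] > 30]
print(high["city"].tolist())  # ['New York', 'Los Angeles']
```

The result of the groupby call is a Series indexed by pollutant name, which is why individual averages can be looked up by label.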
Conclusion
Loading JSON into a Pandas DataFrame makes data received in JSON format straightforward to handle. Once the data is in a DataFrame, it can be filtered, aggregated, and passed on to plotting tools, which makes this approach useful for data analysts in many fields, including environmental studies.