# What is the Difference Between axis=0 and axis=1 When Working with Pandas Dataframes?

Sometimes, functions ask you to specify an `axis`. The documentation can often feel vague and/or technical.

For instance, here’s a quote from the `apply` function’s documentation:

> axis : {0 or ‘index’, 1 or ‘columns’}, default 0
>
> Axis along which the function is applied:
>
> - 0 or ‘index’: apply function to each column.
> - 1 or ‘columns’: apply function to each row.

Uuuum… right.

So what’s the difference? Here’s an example…

## Load a CSV file to play with

### Prerequisites (if you want to practice)

- Install the Pandas library for your Python environment
- Cells in this notebook expect the Car Sales.csv file to be in the same directory as your notebook
- Resources to help you practice

### First Things First

```
import pandas as pd

# Read the CSV file
# This assumes "Car Sales.csv" is in the same directory as your notebook
car_sales_data = pd.read_csv("Car Sales.csv")

# Show the first 5 rows
first_five = car_sales_data.head(5)
display(first_five)
```

| | DealershipName | RedCars | SilverCars | BlackCars | BlueCars | MonthSold | YearSold |
|---|---|---|---|---|---|---|---|
| 0 | Clyde's Clunkers | 902.0 | 650.0 | 754.0 | 792.0 | 1.0 | 2018.0 |
| 1 | Clyde's Clunkers | 710.0 | 476.0 | 518.0 | 492.0 | 2.0 | 2018.0 |
| 2 | Clyde's Clunkers | 248.0 | 912.0 | 606.0 | 350.0 | 3.0 | 2018.0 |
| 3 | Clyde's Clunkers | 782.0 | 912.0 | 858.0 | 446.0 | 4.0 | 2018.0 |
| 4 | Clyde's Clunkers | 278.0 | 354.0 | 482.0 | 752.0 | 5.0 | 2018.0 |

The car sales data looks like it contains one row that summarizes the **total sales of each color** of car for a **given dealership**, for **each month** of the **year**.

To state the “grain” of the data frame another way, the data frame contains **one row per dealership, month, year** combo and reports the total number of cars sold by color.

### Choose Your Scenario

Suppose that two people come to you and ask separate questions about **average car sales**.

Lucy asks, “Can you calculate the average number of cars sold for each color?”

Zack asks, “Can you calculate the average number of cars sold (regardless of color) for each dealership in each month & year?” (so basically the average of red, silver, black, and blue cars for each row)

#### Start with Lucy.

Think about what you’d do to answer Lucy’s question by hand, manually, if you didn’t have Pandas to do the work for you. Here’s what I’d do:

1. Start with the RedCars column.
2. Add up 902, 710, 248, 782, 278, and so on.
3. Divide that sum by the total number of values. Boom. RedCars average.
4. Rinse and repeat steps 1-3 for SilverCars, BlackCars, and BlueCars.

**This is an axis=0 scenario** in Pandas.

`first_five[['RedCars', 'SilverCars', 'BlackCars', 'BlueCars']].mean(axis=0)`

```
RedCars 584.0
SilverCars 660.8
BlackCars 643.6
BlueCars 566.4
dtype: float64
```
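If you want to convince yourself that `axis=0` really is the by-hand recipe above, here is a rough sketch that does both and compares them. It assumes nothing beyond the five rows shown earlier, which are retyped by hand here so the cell runs without the CSV file; the names `by_hand` and `with_pandas` are just illustrative.

```python
import pandas as pd

# The five rows from the table above, retyped so this runs without the CSV
first_five = pd.DataFrame({
    "RedCars": [902.0, 710.0, 248.0, 782.0, 278.0],
    "SilverCars": [650.0, 476.0, 912.0, 912.0, 354.0],
    "BlackCars": [754.0, 518.0, 606.0, 858.0, 482.0],
    "BlueCars": [792.0, 492.0, 350.0, 446.0, 752.0],
})

# Lucy's recipe by hand: pick a column, walk DOWN its rows, then move on
by_hand = {}
for color in first_five.columns:
    values = first_five[color].tolist()
    by_hand[color] = sum(values) / len(values)

# The same thing, letting Pandas do the walking
with_pandas = first_five.mean(axis=0)

print(by_hand)
print(with_pandas.to_dict())
```

Both printouts should show one average per color, which is the shape of answer Lucy asked for.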

#### What about Zack?

What would you do to answer his question by hand, without Pandas? How’s this…

1. Start with the first row of data (Row 0), since his question matches the “grain” of the data frame… one row per dealership per month & year.
2. Add up the RedCars, SilverCars, BlackCars, and BlueCars values for Row 0 and divide by 4. So (902 + 650 + 754 + 792)/4
3. Rinse and repeat steps 1 & 2 for every row in the data frame. Boom. Average cars sold by dealer/month/year.

**This is an axis=1 scenario**.

`first_five[['RedCars', 'SilverCars', 'BlackCars', 'BlueCars']].mean(axis=1)`

```
0 774.5
1 549.0
2 529.0
3 749.5
4 466.5
dtype: float64
```
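The same sanity check works for Zack's recipe. This is a minimal sketch, again assuming only the five rows shown earlier (retyped by hand so the CSV is not needed); `colors`, `by_hand`, and `with_pandas` are illustrative names.

```python
import pandas as pd

colors = ["RedCars", "SilverCars", "BlackCars", "BlueCars"]

# The five rows from the table above, retyped so this runs without the CSV
first_five = pd.DataFrame({
    "RedCars": [902.0, 710.0, 248.0, 782.0, 278.0],
    "SilverCars": [650.0, 476.0, 912.0, 912.0, 354.0],
    "BlackCars": [754.0, 518.0, 606.0, 858.0, 482.0],
    "BlueCars": [792.0, 492.0, 350.0, 446.0, 752.0],
})

# Zack's recipe by hand: pick a row, walk ACROSS its color columns, then move on
by_hand = []
for _, row in first_five[colors].iterrows():
    values = row.tolist()
    by_hand.append(sum(values) / len(values))

# The same thing, letting Pandas do the walking
with_pandas = first_five[colors].mean(axis=1).tolist()

print(by_hand)
print(with_pandas)
```

This time there is one average per row, so the result has as many entries as the data frame has rows.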

### Summarizing the Findings

Specifying an `axis` to a function in Pandas is helping answer one of the following questions:

- Should I (Pandas) **start with a column** and make this function do its job *downward* on all the “cells” for that column, and then continue doing the same thing **for all the rest of the columns** in the data frame? (`axis=0`)

*or*

- Should I (Pandas) **start with the first row** of data in the data frame and make this function do its job *horizontally* on all of the “cells” for that row, and then continue doing the same thing **for all the rest of the rows** in the data frame? (`axis=1`)
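To tie this back to the `apply` quote from the start, here is a small sketch of what the function gets handed under each setting. The tiny `small` data frame is made up from the first two rows of the table purely for illustration. It also shows that the string names `"index"` and `"columns"` behave the same as `0` and `1`.

```python
import pandas as pd

# A tiny made-up frame: two rows, two color columns
small = pd.DataFrame({
    "RedCars": [902.0, 710.0],
    "SilverCars": [650.0, 476.0],
})

# axis=0: the function receives one COLUMN at a time,
# so the result is labeled by column name
per_column = small.apply(lambda s: s.max(), axis=0)

# axis=1: the function receives one ROW at a time,
# so the result is labeled by row index
per_row = small.apply(lambda s: s.max(), axis=1)

print(per_column)
print(per_row)

# The string names are interchangeable with the numbers
same_as_0 = small.mean(axis="index").equals(small.mean(axis=0))
same_as_1 = small.mean(axis="columns").equals(small.mean(axis=1))
print(same_as_0, same_as_1)
```

A handy way to read the result: the labels on the output tell you which way the function traveled. Column names in the output mean it worked downward (`axis=0`); row labels mean it worked across (`axis=1`).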