
This is a Python/Pandas vs Julia cheatsheet and comparison. It shows the Julia equivalent of common Pandas operations and vice versa, with links to the documentation and other useful Pandas/Julia resources.

The table below shows useful links for both:

Below you can find equivalent code for Pandas and Julia. Keep in mind that some examples might differ because of indexing: Python is 0-based while Julia is 1-based.

Import and package installation

import pandas as pd
import numpy as np

using DataFrames
using Statistics
using CSV

Import libraries and modules

pip install pandas

using Pkg
Pkg.add("DataFrames")

Install package

Search Packages

Pandas Series vs Julia Array DataFrame comparison

s = pd.Series(['a', 'b', 'c'], index=[0, 1, 2])

s = [1, 2, 3]

Pandas series vs Julia vector



s.iloc[0]

s[1]

Get first element of array or Series

df = pd.DataFrame(
    {'col_1': [11, 12, 13],
     'col_2': [21, 22, 23]},
    index=[0, 1, 3])

df = DataFrame(a=11:13, b=21:23)

Pandas vs Julia DataFrame

import numpy as np
import pandas as pd
data = np.random.randint(0, 10, size=(10, 3))
df = pd.DataFrame(data, columns=list('abc'))

using Random
df = DataFrame(rand(10, 3), [:a, :b, :c])

Create random DataFrame

Import Data Julia vs Pandas

df = pd.read_csv('file.csv')

df = CSV.read("file.csv", DataFrame)

Read CSV file


using JSON

Read JSON file


using UrlDownload
A = urldownload("")
A |> DataFrame

Read data from URL

df = pd.read_fwf('delim_file.txt')

using DelimitedFiles
readdlm("delim_file.txt", ' ', Int, '\n')

Read delimited file

Data export – Pandas vs Julia


df.to_csv('file.csv', index=False)

CSV.write("file.csv", df)

Writes to a CSV file


using JSON3

Writes to a file in JSON format
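The import and export calls above can be tried without touching the disk. The sketch below is a minimal pandas round trip through CSV and JSON using in-memory buffers; the sample data and the variable names are made up for illustration, and `index=False` is an assumption chosen so the CSV carries no pandas row index.

```python
import io
import pandas as pd

# Illustrative data only
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})

# Write CSV to a string; index=False leaves out the pandas row index
csv_text = df.to_csv(index=False)

# Read it back from an in-memory buffer instead of a file path
df_csv = pd.read_csv(io.StringIO(csv_text))

# Same round trip through JSON, one object per row
json_text = df.to_json(orient='records')
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv)
print(df_json)
```

Replacing the buffers with file paths gives the file-based versions shown in the rows above.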

Statistics, samples and summary of the data


df.head(6)

first(df, 6)

First n rows


df.tail(6)

last(df, 6)

Last n rows



df.describe()

describe(df)

Summary statistics

df.loc[:, ['a']].describe()

describe(df[!, [:a]])

Describe columns


using Statistics

Statistical functions

Select data by index, by label, get subset

df.iloc[0:3, :]

df[1:3, :]

Select first N rows – all columns

df.loc[[1, 2, 3], :]

df[[1, 2, 3], :]

Select rows by index

df.loc[:, ['a', 'b']].copy()

df[:, [:a, :b]]

Select columns by name(copy)

df.loc[:, ['a']]

df[!, [:a]]

Select columns by name(reference)

df.loc[1:3, ['b', 'a']]

df[1:3, [:b, :a]]

Subset rows and columns

df.loc[[3,1], ['b', 'a']]

df[[3, 1], [:b, :a]]

Reverse selection
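Row selection is where the indexing difference matters most. In pandas, `.iloc` selects by 0-based position and excludes the end of a slice, while `.loc` selects by label and includes the end label, so a Julia slice such as `df[1:3, :]` corresponds to `df.iloc[0:3, :]`. A minimal sketch with made-up data:

```python
import pandas as pd

# Small frame with the default integer index 0..4 (illustrative data)
df = pd.DataFrame({'a': [10, 11, 12, 13, 14],
                   'b': [20, 21, 22, 23, 24]})

# Position-based: first three rows, end position excluded
first_three = df.iloc[0:3, :]

# Label-based: labels 1 through 3, end label included
by_label = df.loc[1:3, :]

# Rows and columns in a chosen order
picked = df.loc[[3, 1], ['b', 'a']]

print(first_three['a'].tolist())  # [10, 11, 12]
print(by_label['a'].tolist())     # [11, 12, 13]
print(picked.values.tolist())     # [[23, 13], [21, 11]]
```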


df[df['a'].isna()]

findall(ismissing, df[:, "a"])

Select NaN values


df[df['a'].notna()]

filter(!ismissing, df[:, "a"])

Select non NaN values

df['new col'] = df['col'] * 100

df[!, "d"] = df[!, "a"] * 100

Add new column based on other column

df['new col'] = False

df[!, "e"] .= false

Add new column single value

df.loc[-1] = [1, 2, 3]

push!(df, [1, 2, 3])

Add new row at the end of DataFrame

pd.concat([df, df2], ignore_index=True)

append!(df, df2)

Add rows from DataFrame to existing DataFrame
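`DataFrame.append` was removed in pandas 2.0, and `pd.concat` is the replacement for stacking rows. The following is a small sketch of appending rows and then adding columns; the frames and the column names `d` and `e` are illustrative only.

```python
import pandas as pd

# Illustrative frames with the same columns
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df2 = pd.DataFrame({'a': [5], 'b': [6]})

# Stack the rows of df2 under df; ignore_index renumbers rows 0..n-1
combined = pd.concat([df, df2], ignore_index=True)

# New column computed from an existing one
combined['d'] = combined['a'] * 100

# New column holding a single constant value
combined['e'] = False

print(combined)
```

Unlike Julia's `append!`, `pd.concat` returns a new DataFrame and leaves `df` unchanged.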




s.drop([1, 2])

deleteat!(s, [1, 2])

(Series) Drop values from Series by index (row axis)

df.drop('b', axis=1)

select(df, Not(:b))

Drop column by name (column axis)





df.dropna()

df[all.(!ismissing, eachrow(df)), :]

Drops all rows that contain null values


df.dropna(axis=1)

df[:, all.(!ismissing, eachcol(df))]

Drops all columns that contain null values
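On the pandas side, missing values are `NaN` rather than Julia's `missing`. Here is a minimal sketch of selecting and dropping them, using made-up data:

```python
import numpy as np
import pandas as pd

# Illustrative data: column a has one missing value
df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': [4.0, 5.0, 6.0]})

rows_with_nan = df[df['a'].isna()]       # rows where column a is missing
rows_without_nan = df[df['a'].notna()]   # rows where column a is present

no_null_rows = df.dropna()               # drop every row containing a null
no_null_cols = df.dropna(axis=1)         # drop every column containing a null

print(no_null_rows)
print(no_null_cols)
```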

Sorting and rank values in Pandas vs Julia



sorted([2, 3, 1])

sort([2, 3, 1])

sort array of values

sorted([2,3,1], reverse=True)

sort([2,3,1], rev=true)

sort in reverse order


df.sort_values('a')

sort(df, [:a])

sort DataFrame by column

df.sort_values(['a', 'b'], ascending=[False, True])

sort(df, [order(:a, rev=true), :b])

sort DataFrame by multiple columns
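The sorting calls above can be checked on a small frame. This is a sketch with illustrative data; `ascending` takes one flag per sort column, in the same order as the column list.

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({'a': [1, 2, 2, 1], 'b': [4, 3, 1, 2]})

# Single column, ascending
by_a = df.sort_values('a')

# Column a descending, ties broken by column b ascending
by_a_desc_b_asc = df.sort_values(['a', 'b'], ascending=[False, True])

print(sorted([2, 3, 1], reverse=True))   # [3, 2, 1]
print(by_a_desc_b_asc.values.tolist())   # [[2, 1], [2, 3], [1, 2], [1, 4]]
```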

Filter data based on multiple criteria

df.loc[:, df.isna().any()]

mapcols(x -> any(ismissing, x), df)

find columns with na

df[df['col_1'] > 100]

filter(row -> row.a > 100, df)

Values greater than X


df[(df['a'] == 'a') & (df['b'] >= 5)]

filter(row -> row.a == "a" && row.b >= 5, df)

Filter Multiple Conditions – & – and; | – or

df[df['a'] == 'test']

df[(df.a .== "test"), :]

Filter by string value

df[(df['a'] == 'test') & (df['b'] == 'a2') ]

df[(df.a .== "test") .& (df.b .== "a2"), :]

combine conditions
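In pandas each condition needs its own parentheses, because `&` and `|` bind more tightly than comparisons such as `==`. A short sketch with made-up data:

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({'a': ['test', 'test', 'other'],
                   'b': ['a2', 'b7', 'a2'],
                   'c': [150, 50, 200]})

greater = df[df['c'] > 100]                           # values greater than X
both = df[(df['a'] == 'test') & (df['b'] == 'a2')]    # and
either = df[(df['a'] == 'test') | (df['c'] > 100)]    # or

print(both)
```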

Group by and summarize data


df.groupby('a')

groupby(df, [:a])

Group by single column

df.groupby(['a', 'b']).c.sum()

gdf = groupby(df, [:a, :b])
combine(gdf, :c => sum)

group by multiple columns and sum third


df.groupby('x1').size()

combine(groupby(df, [:x1]), nrow => :count)

group by and count
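Julia's `combine` returns a DataFrame, while a pandas group-by aggregation returns a Series indexed by the group keys. Calling `reset_index` turns it back into a flat table, as in this sketch with illustrative data:

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({'a': ['x', 'x', 'y', 'y'],
                   'b': ['p', 'p', 'p', 'q'],
                   'c': [1, 2, 3, 4]})

# Group by two columns and sum the third
sums = df.groupby(['a', 'b'])['c'].sum().reset_index()

# Group by one column and count rows per group
counts = df.groupby('a').size().reset_index(name='count')

print(sums)
print(counts)
```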

Convert to date, string, numeric


df['a'].fillna(0)

replace(df.a, missing => 0)

replace NA values

df.replace('..', np.nan)

ifelse.(df .== "..", missing, df)

convert .. to NA


df['a'] = df['a'].astype(int)

df[!, :a] = parse.(Int64, df[!, :a])

convert string to int

pd.to_datetime(df['date'], format='%Y-%m-%d')

using Dates
df.date = Date.(df.date, "yyyy-mm-dd")

convert string to date
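The conversions in this section can be chained on one frame. The sketch below uses made-up data, and swaps in `pd.to_numeric(..., errors='coerce')` as an alternative to replacing `'..'` by hand: it turns any unparsable string into `NaN` in one step.

```python
import pandas as pd

# Illustrative data: everything arrives as strings
df = pd.DataFrame({'a': ['1', '2', '3'],
                   'b': ['5', '..', '7'],
                   'date': ['2023-01-15', '2023-02-20', '2023-03-25']})

df['a'] = df['a'].astype(int)                         # string -> int
df['b'] = pd.to_numeric(df['b'], errors='coerce')     # '..' becomes NaN
df['b'] = df['b'].fillna(0)                           # NaN -> 0
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

print(df.dtypes)
```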

Install Julia Packages

To install new packages in Julia we can also use the Julia package manager:

  • open Linux Terminal
  • start Julia – julia
  • Type ] (right bracket). You don’t have to hit Return.
    • The terminal prompt will change to (@v1.8) pkg>
  • Type add followed by the package name to add a package
    • you can provide the names of several packages separated by spaces.
  • Press Control-C (or Backspace) to exit the package manager

Example: (@v1.8) pkg> add JSON StaticArrays

Differences: Julia and Pandas

Pandas and Julia are both popular tools for data analysis and manipulation. Some key differences between them:


One big difference between Julia and Pandas is indexing: Julia arrays and DataFrame rows are 1-based, while Python and Pandas positions are 0-based.


Personally I prefer SQL syntax over both Julia and Pandas, though I can work fine with either. Since I have more experience with Python, I would go with Pandas. Some people consider Julia to have the better syntax since it was designed for data science. Here is an example of the syntax difference between Julia and Pandas:

# pandas
import pandas as pd
df = pd.read_csv('sales_data.csv')
totals = df.groupby('product')['sales'].sum()

# julia
using DataFrames
using CSV
df = CSV.read("sales_data.csv", DataFrame)
totals = combine(groupby(df, :product), :sales => sum)


In general Julia is faster for most operations and for bigger datasets. For smaller datasets Pandas might be close to or even faster than Julia, because of Julia's compilation time.

To test performance we can use a dataset with 10M rows, Game Recommendations on Steam:

# pandas
import pandas as pd
df = pd.read_csv('recommendations.csv')

# julia
using CSV, DataFrames, Statistics
@time begin
    df = CSV.File("recommendations.csv") |> DataFrame
    result = mean(df[:, "hours"])
end

The results are:

  • Pandas
    • CPU times: user 5.67 s, sys: 1.85 s, total: 7.52 s
    • Wall time: 7.71 s
  • Julia
    • 7.257497 seconds (1.13 k allocations: 1.349 GiB, 2.63% gc time)

For a dataset with 12M rows we get:

  • Pandas
    • CPU times: user 34.8 s, sys: 3.74 s, total: 38.5 s
    • Wall time: 42.4 s
  • Julia
    • 29.964544 seconds (162.12 M allocations: 9.878 GiB, 15.79% gc time)

The first Julia execution is slower because of compilation, so we take the second one.
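The pandas timings above look like Jupyter timing output, which is an assumption on my part. Outside a notebook, a rough wall-clock measurement can be sketched with `time.perf_counter`. The data below is a synthetic stand-in with a hypothetical `hours` column, not the Steam dataset, so the numbers it prints are not comparable to the results above.

```python
import time
import numpy as np
import pandas as pd

# Synthetic stand-in for the real CSV: one numeric "hours" column
rng = np.random.default_rng(0)
df = pd.DataFrame({'hours': rng.uniform(0, 100, size=1_000_000)})

# Time only the aggregation, using a monotonic high-resolution clock
start = time.perf_counter()
result = df['hours'].mean()
elapsed = time.perf_counter() - start

print(f"mean={result:.2f}, elapsed={elapsed:.4f}s")
```

Running the measurement several times and keeping the best or median value gives a steadier number than a single run.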

Libraries and Ecosystem

Pandas has a bigger community and ecosystem. Python offers a greater variety of packages in many areas:

  • web scraping
  • data science
  • science
  • etc

Language Features

I prefer Julia for distributed and parallel computing. To me, Pandas seems much better for visualization and EDA.

Learning Curve

Again, it depends on personal choice. Python is considered one of the best programming languages for beginners. Julia ranked above Python among the most loved languages in recent surveys:

Stack Overflow survey – Most loved, dreaded, and wanted

Note: I need to add that I’m still learning and discovering Julia, so the statements above might change in the future :)

Pandas vs Julia docs


In summary, Pandas and Julia are both powerful tools for data analysis, but they have different strengths and weaknesses.

Pandas has a larger ecosystem of tools and is generally easier to learn. Julia is faster and has some unique language features that can make it more powerful for certain types of data analysis tasks.

Ultimately, the choice between Pandas and Julia depends on your specific requirements and preferences.


