Transitioning From Excel To Python: SEO Data Analysis

Published by: Aryan Rana (Safalta Expert) | Updated: Mon, 05 Dec 2022 12:54 AM IST


Table of Contents
Prerequisites
LEN
Eliminating Duplicates
Columns From Text
CONCATENATE
SEARCH/FIND




Working with larger datasets and automating tedious tasks are just two of the many advantages of learning to code, whether in Python, JavaScript, or another programming language.

Despite these advantages, I completely understand why many SEO specialists haven't made the switch. We're all busy, and coding isn't a prerequisite for SEO.




If you need to get a task done quickly and already know how to do it in Excel or Google Sheets, switching to code can feel like reinventing the wheel.

When I first started coding, I mainly used Python for tasks that Excel couldn't handle, so it took me a while to reach the point where it became my default choice for data processing.

Looking back, I'm so glad I persisted, even though it was difficult at times and meant spending hours scouring Stack Overflow threads.

I hope this post spares other SEO specialists the same struggle.

In it, we'll cover the Python equivalents of the most popular Excel formulas and features for analysing SEO data; all of them are available in the Google Colab notebook referenced in the summary.




You'll learn the equivalents of:
  • LEN.
  • Remove Duplicates.
  • Text to Columns.
  • SEARCH/FIND.
  • CONCATENATE.
  • Find and Replace.
  • LEFT/MID/RIGHT.
  • IF.
  • IFS.
  • VLOOKUP.
  • COUNTIF/SUMIF/AVERAGEIF.
  • Pivot tables.

Best of all, we'll do all of this relying mainly on Pandas, with a little help from its big brother, NumPy.


Prerequisites

For the sake of time, there are a few things we won't cover today, including:
  • Installing Python.
  • Basic Pandas operations such as importing CSVs, filtering, and previewing data frames.

If you have any questions, Hamlet's introduction to Python data analysis for SEO is the best resource.

Without further ado, let's get started.


LEN

LEN returns the number of characters in a text string.

A typical SEO use case for measuring length is checking whether title tags or meta descriptions will be truncated in search results.

To count the characters in cell A2 in Excel, we would type:

=LEN(A2)

Python isn't too different: we can use the built-in len function together with Pandas' loc[] to access a specific row of data within a column:

len(df['Title'].loc[0])

In this example, we're getting the length of the value in the first row of the dataframe's "Title" column.

However, the length of a single cell isn't very useful for SEO. Normally, we'd want to apply the function to every row in a column!

In Excel, this is done by double-clicking the fill handle in the bottom-right corner of the formula cell, or by dragging it down the column.

In Pandas, we can use the str.len method to calculate the length of every row in a series, then store the results in a new column:

df['Length'] = df['Title'].str.len()

str.len is a "vectorized" operation, designed to be applied to a whole series of values at once. We'll use these operations a lot in this article, since they almost always end up being faster than a loop.
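To make the title-tag use case concrete, here's a minimal sketch on a small made-up dataframe. The 60-character threshold and the "Too Long" column name are assumptions for illustration only, not an official limit; adjust them to your own guidelines.

```python
import pandas as pd

# Hypothetical sample data standing in for a crawl export
df = pd.DataFrame({
    "Title": [
        "Transitioning From Excel To Python",
        "A Very Long Title Tag That Keeps Going And Going Well Past Sixty Characters",
        "SEO Basics",
    ]
})

# Vectorized: str.len() computes the character count of every row at once
df["Length"] = df["Title"].str.len()

# Assumed threshold for illustration; not an official search engine limit
MAX_TITLE_LENGTH = 60
df["Too Long"] = df["Length"] > MAX_TITLE_LENGTH

print(df[["Length", "Too Long"]])
```

The comparison on the last line is vectorized too, so flagging thousands of titles takes a single expression rather than a loop.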

LEN is frequently used in conjunction with SUBSTITUTE to count the number of words in a cell:

=LEN(TRIM(A2))-LEN(SUBSTITUTE(A2," ",""))+1

In Pandas, we can achieve this by combining the str.split and str.len methods:

df['No. Words'] = df['Title'].str.split().str.len()

More specifically, str.split splits each string wherever whitespace appears, and str.len then counts the resulting parts.
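Here's a minimal sketch of the word count on made-up titles. One detail worth knowing: calling str.split() with no arguments splits on any run of whitespace and ignores leading and trailing spaces, so stray double spaces don't inflate the count, much as wrapping the cell in TRIM does in the Excel formula.

```python
import pandas as pd

# Hypothetical titles, including one with stray spaces
df = pd.DataFrame({
    "Title": [
        "Transitioning From Excel To Python",
        "  SEO   Basics ",
        "Python",
    ]
})

# str.split() with no arguments splits on runs of whitespace and drops
# leading/trailing whitespace; str.len() then counts the parts in each list
df["No. Words"] = df["Title"].str.split().str.len()

print(df["No. Words"].tolist())  # [5, 2, 1]
```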


Eliminating Duplicates

Excel's Remove Duplicates tool makes it easy to remove duplicate values from a dataset, either by deleting entirely duplicate rows (when all columns are selected) or by removing rows that share the same values in specific columns.

In Pandas, this functionality is provided by drop_duplicates.

To drop duplicate rows from a data frame, type:

df.drop_duplicates(inplace=True)

Include the subset parameter to remove rows based on duplication in a single column:

df.drop_duplicates(subset='column', inplace=True)

Or specify multiple columns in a list:

df.drop_duplicates(subset=['column', 'column2'], inplace=True)

One addition in the examples above deserves special attention: the inplace parameter. With inplace=True, we overwrite our existing data frame instead of creating a new one.

Of course, there are times when we want to preserve the raw data. In that case, we can assign our deduplicated data frame to a different variable:

df2 = df.drop_duplicates(subset='column')
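Here's a small sketch on made-up data showing how the result changes depending on whether subset is used. It assigns results to new variables, so the raw df is left untouched. The keep parameter (default 'first') controls which occurrence of each duplicate survives; the URL and Title values are hypothetical.

```python
import pandas as pd

# Hypothetical crawl data: one fully duplicated row, plus two
# different URLs that share the same title
df = pd.DataFrame({
    "URL": ["/a", "/a", "/b", "/c"],
    "Title": ["Home", "Home", "Blog", "Blog"],
})

# Fully duplicate rows only (all columns compared)
full_dedupe = df.drop_duplicates()

# Rows duplicated on the Title column only; keep="first" (the default)
# retains the first occurrence of each title
title_dedupe = df.drop_duplicates(subset="Title", keep="first")

print(len(df), len(full_dedupe), len(title_dedupe))  # 4 3 2
```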


Columns From Text

The Text to Columns feature, another indispensable tool, lets you split a text string on a delimiter such as a slash, comma, or whitespace.

For example, you might use it to split a URL into its domain and individual subfolders.
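In Pandas, the counterpart to Text to Columns is the str.split method with expand=True, which returns each part in its own column. The sketch below is a minimal example under a few assumptions: the URLs are made up, they omit the protocol (a URL starting with https:// would produce an extra empty part from the double slash), and every URL has the same number of parts, so the hypothetical column names line up.

```python
import pandas as pd

# Hypothetical URLs without the protocol, to keep the split simple
df = pd.DataFrame({
    "URL": [
        "www.example.com/blog/seo/python",
        "www.example.com/blog/ppc/excel",
    ]
})

# expand=True returns a DataFrame with one column per part, similar to
# Excel's Text to Columns with "/" as the delimiter
parts = df["URL"].str.split("/", expand=True)
parts.columns = ["Domain", "Folder 1", "Folder 2", "Slug"]  # hypothetical names

# Join the new columns back onto the original data (aligned on the index)
df = df.join(parts)
print(df)
```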



CONCATENATE

The CONCAT function lets users combine multiple text strings, which is handy for building keyword lists with various modifiers.

In this case, we're adding "mens" and a space to each product category in column A, with "mens" stored in cell F1:

=CONCAT($F$1," ",A2)

If we're working with strings, Python's + operator achieves the same thing:

df['Combined'] = 'mens' + ' ' + df['Keyword']

Or, to combine multiple columns of data:

df['Combined'] = df['Subdomain'] + df['URL']

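Here's a minimal runnable sketch of both string patterns on made-up data. The category values and subdomains are hypothetical, and the .astype(str) calls are an addition beyond the snippets above: they convert any non-string values (such as numbers) to text so that + joins them as strings.

```python
import pandas as pd

# Hypothetical product categories and a modifier, mirroring the Excel example
df = pd.DataFrame({"Keyword": ["shoes", "jackets", "watches"]})
modifier = "mens"

# + works element-wise on a string Series, and a plain string is
# applied to every row
df["Combined"] = modifier + " " + df["Keyword"]
print(df["Combined"].tolist())  # ['mens shoes', 'mens jackets', 'mens watches']

# Two columns combine row by row; .astype(str) converts any
# non-string values to text first
urls = pd.DataFrame({
    "Subdomain": ["blog.", "shop."],
    "URL": ["example.com", "example.com"],
})
urls["Combined"] = urls["Subdomain"].astype(str) + urls["URL"].astype(str)
print(urls["Combined"].tolist())
```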
Pandas also has a separate concat function, but it's meant for combining multiple data frames that share the same columns rather than joining strings.

For instance, if our preferred link analysis tool had produced multiple exports:

import pandas as pd

df = pd.read_csv('data.csv')
df2 = pd.read_csv('data2.csv')
df3 = pd.read_csv('data3.csv')

dflist = [df, df2, df3]

# ignore_index=True gives the combined frame a fresh 0..n-1 index
df = pd.concat(dflist, ignore_index=True)
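To try this without CSV files on disk, the sketch below builds three small data frames in memory in place of the read_csv calls. The column names and values are made up, and the final drop_duplicates step is an optional addition, not part of the example above: it removes rows that appear in more than one export.

```python
import pandas as pd

# Hypothetical stand-ins for three exports sharing the same columns
df = pd.DataFrame({"Referring Domain": ["a.com", "b.com"], "Links": [3, 1]})
df2 = pd.DataFrame({"Referring Domain": ["c.com"], "Links": [5]})
df3 = pd.DataFrame({"Referring Domain": ["a.com", "d.com"], "Links": [3, 2]})

# Stack the frames vertically; ignore_index=True builds a fresh index
# instead of repeating each frame's original index values
combined = pd.concat([df, df2, df3], ignore_index=True)

# Optional: drop rows that appear in more than one export
combined = combined.drop_duplicates()

print(combined)
```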

 

SEARCH/FIND
 

The SEARCH and FIND formulas provide a way of locating a substring within a text string.

These formulas are often combined with ISNUMBER to create a Boolean column that helps filter a dataset, which can be extremely useful for tasks such as log file analysis, as shown in this article. For example:

=ISNUMBER(SEARCH("searchthis",A2))


The difference between the two is that FIND is case-sensitive, while SEARCH is not.


The equivalent Pandas function, str.contains, is case-sensitive by default:

df['Journal'] = df['URL'].str.contains('engine', na=False)

Setting the case argument to False makes the match case-insensitive:

df['Journal'] = df['URL'].str.contains('engine', case=False, na=False)

In either case, including na=False prevents null values from appearing in the Boolean column.
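Here's a minimal sketch of both behaviours on made-up URLs, including one missing value. The "Journal (any case)" column name is hypothetical, added only so the two results can be compared side by side.

```python
import pandas as pd

# Hypothetical URL column with mixed case and one missing value
df = pd.DataFrame({
    "URL": [
        "https://example.com/engine-tips",
        "https://example.com/ENGINE-GUIDE",
        None,
        "https://example.com/about",
    ]
})

# Case-sensitive by default: "ENGINE" does not match "engine"
df["Journal"] = df["URL"].str.contains("engine", na=False)

# case=False makes the match case-insensitive; na=False turns the
# missing URL into False instead of a null value
df["Journal (any case)"] = df["URL"].str.contains("engine", case=False, na=False)

print(df[["Journal", "Journal (any case)"]])
```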

One big advantage of Pandas here is that str.contains supports regex natively, much as Google Sheets does through REGEXMATCH, whereas Excel's SEARCH and FIND do not.

You can chain multiple substrings together with the pipe character, which acts as the OR operator in regex:

df['Journal'] = df['URL'].str.contains('engine|search', na=False)
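Here's a minimal sketch of the OR pattern on made-up URLs. It also shows regex=False, a real str.contains parameter, for when you want a literal match: characters such as "?" have special meaning in regex, so searching for them without it would not behave as expected. The "Has Query" column name is hypothetical.

```python
import pandas as pd

# Hypothetical URLs
df = pd.DataFrame({
    "URL": [
        "https://example.com/engine-tips",
        "https://example.com/search-guide",
        "https://example.com/about?ref=nav",
    ]
})

# The pattern is treated as a regular expression by default, so "|"
# means OR: match rows containing "engine" or "search"
df["Journal"] = df["URL"].str.contains("engine|search", na=False)

# regex=False treats the pattern as literal text, so "?" matches a
# literal question mark (e.g. URLs with query strings)
df["Has Query"] = df["URL"].str.contains("?", regex=False)

print(df)
```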

Although Python technically offers different functionality than Excel, its popularity has grown as more people discover its power and potential. Many developers and members of the wider data science community consider it a superior data analysis tool.

Can Python take the place of Excel?

Excel and Python serve different purposes.

Python is a programming language that can be used to build a wide range of programmes beyond data management. Naturally, using it means learning to code.

Do Excel and Python play nicely together?

Using Python with Excel spreadsheets can be a great way to boost productivity and remove the need to import and export data. As with VBA, you can use Python code to build interactive worksheets, but with all of Python's advantages.
