Use the Blank Sheet of Paper Test to Optimize for Natural Language Processing

Evan Hall

If you handed someone a blank sheet of paper and the only thing written on it was the page’s title, would they understand what the title meant? Would they have a clear idea of what the actual document might be about? If so, then congratulations! You just passed the Blank Sheet of Paper Test for page titles because your title was descriptive.

The Blank Sheet of Paper Test (BSoPT) is an idea Ian Lurie has talked about a lot over the years, and recently on his new website. It’s a test to see if what you’ve written has meaning to someone who has never encountered your brand or content before. In Ian’s words, “Will this text, written on a blank sheet of paper, make sense to a stranger?” The Blank Sheet of Paper Test is about clarity without context.

But what if we’re performing the BSoPT on a machine instead of a person? Does our thought experiment still apply? I think so. Machines can’t read—even sophisticated ones like Google and Bing. They can only guess at the meaning of our content, which makes the test especially relevant.

I have an alternative version of the BSoPT, but for machines: If all a machine could see is a list of words that appear in a document and how often, could it reasonably guess what the document is about?

The Blank Sheet of Paper Test for word frequency

If you handed someone a blank sheet of paper and the only thing written on it was this table of words and frequencies, could they guess what the article is about?

An article about sharpening a knife is a pretty good guess. The article I took this word frequency table from was a how-to guide for sharpening a kitchen knife.

What if the words “step” and “how” appeared in the table? Would the person reading be more confident this article is about sharpening knives, or less? Could they tell if this article is about sharpening kitchen knives or pocket knives?

If we can’t get a pretty good idea of what the article is about based on which words it uses, then it fails the BSoPT for word frequency.
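As a rough illustration of what the machine gets to see in this version of the test, here is a minimal Python sketch that turns a text into a word frequency table. The `word_frequencies` function, its tiny stop word list, and the sample sentences are all assumptions made up for this example, not taken from any particular tool.

```python
import re
from collections import Counter

# A deliberately tiny stop word list, assumed for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "it", "is", "on", "until"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each non-stop word appears, ignoring case and word order."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

# Made-up sample text standing in for a how-to article.
sample = (
    "Hold the knife at an angle and draw the blade across the whetstone. "
    "Repeat on the other side of the blade until the knife is sharp."
)

freq = word_frequencies(sample)
for word, count in freq.most_common(5):
    print(f"{word:<12}{count}")
```

The output is just a list of words and counts, which is all the reader of the blank sheet of paper has to go on.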

Can we still use word frequency for BERT?

Earlier natural language processing (NLP) approaches employed by search engines used statistical analysis of word frequency and word co-occurrence to determine what a page is about. They ignored the order and part of speech of the words in our content, basically treating our pages like bags of words.
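To make the bag-of-words idea concrete, the Python sketch below uses two made-up sentences that mean different things but contain exactly the same words. Once word order is thrown away, a simple counting approach like this one cannot tell them apart.

```python
from collections import Counter

def bag_of_words(text):
    """Reduce a text to unordered word counts."""
    return Counter(text.lower().split())

# Illustrative sentences: same words, different meaning.
a = "the chef sharpens the knife"
b = "the knife sharpens the chef"

# The two bags are identical even though the sentences are not.
print(bag_of_words(a) == bag_of_words(b))
```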

The tools we used to optimize for that kind of NLP compared the word frequency of our content against that of our competitors, and told us where the gaps in word usage were. Hypothetically, if we added those words to our content, we would rank higher, or at least help search engines understand our content better.
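A word usage gap analysis of this kind can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any specific tool works: the `word_gaps` function, the `min_pages` threshold, and the sample texts are all hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def word_gaps(our_text, competitor_texts, min_pages=2):
    """Return words used on at least `min_pages` competitor pages but missing from ours."""
    ours = set(tokenize(our_text))
    pages_using = Counter()
    for page in competitor_texts:
        # Count each word once per page, so one long page cannot dominate.
        pages_using.update(set(tokenize(page)))
    return sorted(w for w, n in pages_using.items() if n >= min_pages and w not in ours)

# Made-up texts for illustration.
ours = "A good knife should keep its edge and have a comfortable handle."
competitors = [
    "A hard steel blade keeps its edge longer.",
    "Look for a steel blade and a handle you can grip.",
    "The blade should be high-carbon steel.",
]

print(word_gaps(ours, competitors))
```

Counting pages rather than raw occurrences is a design choice made here so that the result reflects words several competitors share.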

Those tools still exist: MarketMuse, SEMRush, seobility, Ryte, and others have some sort of word frequency or TF-IDF gap analysis capability. I’ve been using a free word frequency tool called Online Text Comparator, and it works pretty well. Are these tools still useful now that search engines have advanced with NLP approaches like BERT? I think so, but it’s not as simple as more words = better rankings.
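For readers who have not met TF-IDF before, here is a minimal Python sketch of the basic idea: a word scores highly in a document when it is frequent there but rare across the other documents. It uses a plain, unsmoothed formula (term frequency times the log of inverse document frequency). Real tools use their own variants, and the `tf_idf` function and sample documents are assumptions for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def tf_idf(docs):
    """Score each word in each document with a basic, unsmoothed TF-IDF."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # number of documents containing each word
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            w: (c / len(tokens)) * math.log(len(docs) / doc_freq[w])
            for w, c in counts.items()
        })
    return scores

# Made-up documents for illustration.
docs = [
    "sharpen the knife on the whetstone",
    "wash the knife by hand",
    "store the pan in the cupboard",
]

scores = tf_idf(docs)
# Words ranked by how distinctive they are for the first document.
print(sorted(scores[0], key=scores[0].get, reverse=True))
```

In this sketch, a word that appears in every document scores zero no matter how often it is used, which is why stop words drop out naturally.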

BERT is a lot more sophisticated than a bag-of-words approach. BERT looks at the order of words, part of speech, and any entities present in our content. It’s robust and can be trained to do many things including question answering and named entity recognition—definitely more advanced than basic word frequency.

However, BERT still needs to look at the words present on the page to function, and word frequency is a basic summary of that. Now, word location and part of speech matter more. We can’t just sprinkle the words we found in our gap analysis around the page.

Enhancing content with word frequency tools

To help make our content unambiguous to machines, we need to make it unambiguous to users. Reducing ambiguity in our writing is about choosing words that are specific to the topic we’re writing about. If our writing uses a lot of generic verbs, pronouns, and non-thematic adjectives, then not only is our content bland, it’s hard to understand.

Consider this extreme example of non-specific language:

“The trick to finding the right chef’s knife is finding a good balance of features, qualities and price. It should be made from metal strong enough to keep its edge for a decent amount of time. You should have a comfortable handle that won’t make you tired. You don’t need to spend a lot either. The home cook doesn’t need a fancy $350 Japanese knife.”

This copy isn’t great. It looks almost machine-generated. I can’t imagine a full article written like this would pass the BSoPT for word frequency.

Here’s what the word frequency table looks like with some stop words removed:

Now suppose we used a word frequency tool on a few pages that are ranking well for “how to pick a chef’s knife” and found that these words were being used fairly often:

Entities: blade, steel, fatigue, damascus steel, santoku, Shun (brand)
Verbs: grip, chopping
Adjectives: perfect, hard, high-carbon

Incorporating these words into our copy would yield text that’s significantly better:

“The trick to finding the perfect chef’s knife is getting the right balance of features, qualities, and price. The blade should be made from steel hard enough to keep a sharp edge after repeated use. You should have an ergonomic handle that you can grip comfortably to prevent fatigue from extended chopping. You don’t need to spend a lot, either. The home cook doesn’t need a $350 high-carbon damascus steel santoku from Shun.”

This upgraded text will be easier for machines to classify, and better for users to read. It’s also just good writing to use words relevant to your topic.
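The difference between the two versions of the copy can be checked with a short Python sketch that looks for the words from the hypothetical gap analysis in each passage. The `terms_present` helper is an assumption for this example, and the two-word entity “damascus steel” is split into single words to keep the matching simple.

```python
import re

# Words from the hypothetical gap analysis, lowercased.
# "damascus steel" is represented by "damascus" plus the separate entry "steel".
TARGET_TERMS = ["blade", "steel", "fatigue", "damascus", "santoku", "shun",
                "grip", "chopping", "perfect", "hard", "high-carbon"]

before = (
    "The trick to finding the right chef's knife is finding a good balance of "
    "features, qualities and price. It should be made from metal strong enough "
    "to keep its edge for a decent amount of time. You should have a comfortable "
    "handle that won't make you tired. You don't need to spend a lot either. "
    "The home cook doesn't need a fancy $350 Japanese knife."
)
after = (
    "The trick to finding the perfect chef's knife is getting the right balance "
    "of features, qualities, and price. The blade should be made from steel hard "
    "enough to keep a sharp edge after repeated use. You should have an ergonomic "
    "handle that you can grip comfortably to prevent fatigue from extended "
    "chopping. You don't need to spend a lot, either. The home cook doesn't need "
    "a $350 high-carbon damascus steel santoku from Shun."
)

def terms_present(text, terms=TARGET_TERMS):
    """Return the target terms that occur as whole words in the text."""
    words = set(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()))
    return [t for t in terms if t in words]

print(len(terms_present(before)), "of", len(TARGET_TERMS), "terms in the first version")
print(len(terms_present(after)), "of", len(TARGET_TERMS), "terms in the second version")
```

A check like this only confirms the words are present. It says nothing about whether they are used naturally, which still takes a human read.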

Looking toward the future of NLP

Is improving our content with the Blank Sheet of Paper Test optimizing for BERT or other NLP algorithms? No, I don’t think so. I don’t think there is a special set of words we can add to our content to magically rank higher through exploiting BERT. I see this as a way to ensure our content is understood clearly by both users and machines.

I anticipate that we’re getting pretty close to the point where the idea of optimizing for NLP will be considered absurd. Maybe in 10 years, writing for users and writing for machines will be the same thing because of how far the technology has advanced. But even then, we’ll still have to make sure our content makes sense. And the Blank Sheet of Paper Test will still be a great place to start.

Bear Design - WordPress Development

Bear Design provides website development and design, creating content uploaded websites and improving web page placements and web traffic. Bear Design websites are unique, easy to use and responsive. Site owners can easily edit the content, or can trust the Bear Design & Communications to keep them up to date and supply quality content regularly.


GET IN TOUCH
160 City Road, EC1V 2NX London, United Kingdom
Monday – Thursday: 9:00 AM – 5:00 PM
Friday: 9:00 AM – 2:00 PM
