Methodology Summary

Ad Fontes Media’s ranking method has evolved from its initial form, in which its Founder, Vanessa Otero, performed the rankings, to its current method of multi-analyst content analysis ratings. Ad Fontes finished its first extensive multi-analyst content ratings research project in June 2019, and the current version of the Media Bias Chart is generated from those ratings plus ongoing additional ratings. See the White Paper about this project here.

Since the first Media Bias Chart was launched, people have asked for more data and transparency about the news source ratings. In response, we conducted our first large content analysis ratings project. The first iteration of this ratings project was conducted to provide a large enough initial set of rankings to populate the Media Bias Chart reasonably well. Extensive details of that project, which ran from March to June 2019, are described in this white paper here, and are summarized in the next few paragraphs.

During this project, nearly 1800 individual articles and TV news shows were rated by at least three analysts with different political views (left, right and center). We had 20 analysts, each analyst having analyzed about 370 articles and about 17 TV shows. Each analyst rated approximately three articles from each of the over 100 news sources available for viewing on the Chart. As a result, we have nearly 7,000 individual ratings.

The multi-person ranking per article was designed to minimize the impact of any one person’s political bias on the published ranking, and the breadth of coverage by each analyst over all of the sources was designed to enhance each analyst’s familiarity with sources across the spectrum.

The content ratings research project ran over the course of twelve weeks, with three study periods of three to four weeks each. We rated a minimum of seven articles per source, and for some larger sources we rated more than 80 articles.

The sample sets of articles and shows were pulled from sites three times throughout the project, and each time, all articles were pulled on the same day, meaning that they were from the same news cycle. The purpose of pulling all articles from the same day was so that analysts would be able to incorporate evaluations of bias by omission and bias by topic selection.

The type of rating we asked each analyst to provide was an overall coordinate ranking on the chart (e.g., “40, -12”). These rankings were based on the methodology shown and described in Ad Fontes Media’s grading rubrics as of early 2019 (which you may have seen on our website). The ranking methodology is rigorous and rule-based. There are many specific factors we take into account for both reliability and bias because there are many measurable indicators of each. Therefore, the ratings are not simply subjective opinion polling, but rather methodical content analysis. Overall source rankings are composite weighted rankings of the individual article and story scores.

We continue to refine our methodology as we discover ways to have analysts classify rating factors more consistently. Our analysts used the first versions of our analyst rating software interface. A version of this ratings software will soon be available for use by educators in classrooms. For educators interested in teaching students how to rate articles like Ad Fontes Media does, see here.

Methodology Deep-Dive

This process has evolved over time and with input from many thoughtful commentators and experts. You can read about how we got to this methodology over in this series of methodology posts (these are long—Vanessa will try to put them in a book eventually). If you have questions about why we now rank sources as described below, you will find a lot of the answers in those posts. You may also have questions about the taxonomy itself (like how we define “reliability” and “bias”). You’ll find a lot of those answers in those posts as well.

Before getting into the criteria (shown in rubric form below), it is helpful to understand some principles and caveats:

  • The main principle of Ad Fontes (which means “to the source” in Latin) is that we analyze content. We look as closely as possible at individual articles, shows, and stories, and analyze what we are looking at: pictures, headlines, and most importantly, sentences and words.
  • The overall source ranking is a result of a weighted average, algorithmic translation of article raw scores. Low quality and highly biased content weight the overall source down and outward. Further, the reach (ratings, popularity, etc.) of individual articles is weighted as well within each source. That means if a source’s low-quality and highly biased content is highly read or watched, it weights the overall source down even more. The exact weighting algorithm is not included here because it is proprietary. Aspects of what is disclosed here are patent pending.
  • The current rankings are based on a small sample size from each source. We believe these sample articles and shows are representative of their respective sources, but these rankings will certainly get more accurate as we rate more articles over time (you can help us with that here).
  • Keep in mind that this ratings system currently uses humans with subjective biases to rate things that are created by other humans with subjective biases and place them on an objective scale. That is inherently difficult, but can be done well and in a fair manner. There are other good models for doing something similar, such as grading standardized written exams (like AP tests and the bar exam), or judging athletic competitions (such as gymnastics and figure skating). You can get to good results as long as you have standards on how to judge many granular details, and have experts that are trained on such standards implementing them. We’ve begun to create that process here. Below are some of those granular details.
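To make the weighted-average principle above more concrete, here is a minimal Python sketch. It is not our actual algorithm, which is proprietary and not disclosed here. The names `ArticleScore` and `source_score`, the two thresholds, and the penalty multiplier are all hypothetical, and the sketch assumes that the first chart coordinate is reliability (higher is better) and the second is bias (negative is left).

```python
from dataclasses import dataclass


@dataclass
class ArticleScore:
    reliability: float  # vertical chart coordinate (assumed: higher is more reliable)
    bias: float         # horizontal chart coordinate (assumed: negative is left)
    reach: float        # relative readership or viewership (hypothetical unit)


def source_score(articles, low_reliability=24.0, high_bias=18.0, penalty=2.0):
    """Illustrative reach-weighted average; NOT the proprietary algorithm.

    The thresholds and the penalty multiplier are made-up values chosen
    only to show low-quality or highly biased content counting for more.
    """
    total_weight = 0.0
    reliability_sum = 0.0
    bias_sum = 0.0
    for article in articles:
        # Start from reach, so widely read or watched content counts for more.
        weight = article.reach
        # Give extra weight to low-reliability or highly biased content.
        if article.reliability < low_reliability or abs(article.bias) > high_bias:
            weight *= penalty
        total_weight += weight
        reliability_sum += weight * article.reliability
        bias_sum += weight * article.bias
    if total_weight == 0:
        raise ValueError("need at least one article with nonzero reach")
    return reliability_sum / total_weight, bias_sum / total_weight
```

With two equally read articles, one reliable and mild and one unreliable and strongly left, the result lands below and to the left of the plain average, which is the "down and outward" effect described above.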

Formal Ratings Process

Our full article grading rubric shows each of the factors analysts are asked to consider before providing a final rating for reliability and bias. Displaying multiple factors in rubric form accomplishes a few things:

  • It shows the exact criteria and quantitative measures used to rate each article, which increases transparency;
  • It makes it easier to compile and record more data over time, which will eventually be of much value;
  • It creates safeguards against individual raters’ subjective political biases (including our own);
  • It allows results to be replicated by others.

Our current analyst team has received training on these ratings standards, and using multiple raters on particular articles allows averaging of scores to minimize effects of bias.

If it sounds like using this rating rubric with multiple raters per article is a lot of work, that’s because it is. Each analyst does not fill out an entire rubric in order to provide an overall rating; rather, each analyst uses it as a mental guide to consider each factor, and then provides a two-dimensional rating in the form of overall reliability and bias scores.

However, we have already been able to show that we get reasonably consistent ratings results by raters with diverging political views, so eventually, this ratings process can be automated and scaled up via machine learning forms of artificial intelligence. An important part of this scaling-up automation process will be quality checking the AI results against subjective ratings by humans to ensure the scoring and algorithms produce results consistent with human judgments. For example, we would have the same article subjectively rated on the chart by a panel of three humans, one who identifies as fairly right, one who identifies as fairly left, and one who identifies as centrist. These three ratings would be averaged for an overall subjective ranking, and a machine-scored article would have to match that average ranking. Aspects of this process are patent pending.
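The quality check described above could be sketched roughly as follows. This is only an illustration: the function names are hypothetical, ratings are assumed to be (reliability, bias) pairs, and the `tolerance` value is an assumption, since the text does not say how closely a machine score must match the human average.

```python
from statistics import mean


def panel_average(ratings):
    """Average a list of (reliability, bias) pairs from a human panel."""
    reliabilities, biases = zip(*ratings)
    return mean(reliabilities), mean(biases)


def machine_matches_panel(machine, ratings, tolerance=4.0):
    """Return True if the machine score is within `tolerance` chart units
    of the panel average on both axes (tolerance is a hypothetical value)."""
    avg_reliability, avg_bias = panel_average(ratings)
    return (abs(machine[0] - avg_reliability) <= tolerance
            and abs(machine[1] - avg_bias) <= tolerance)


# Example panel: one right-leaning, one left-leaning, one centrist analyst.
panel = [(44, -10), (40, -14), (42, -12)]
print(panel_average(panel))
print(machine_matches_panel((41, -13), panel))
```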

Article and Show Rating Methodologies

Note that there are different, additional criteria that go into rating TV shows as compared to written articles. First I’ll discuss the article rating methodology because the show rating methodology uses the article rating methodology as a first step and adds additional show ranking criteria, which mostly deal with the quality and purpose of show guests. (See both rubrics below.)

  • Article Rating Methodology

Step 1: Rubric Grading

Below is an article grading rubric that we use to guide the rating of articles. As shown, there are two main parts, one for a quality (now known as “reliability”) score and one for a bias score.

Quality

  • Element scores: Each can be evaluated on a scale of 1-8, which corresponds to the vertical categories on the chart.
  • Sentence scores: Each sentence (or sometimes multiple sentences) can be evaluated for both Veracity (1 being completely true and 5 being completely false) and Expression (1 being a fact statement and 5 being an opinion statement). For more on these scales, see here.
  • Unfairness instances: We look for discrete unfairness instances. For more on what constitutes something being unfair, see here.

Bias

  • Topic Selection and/or Presentation: The topic itself, and how it is initially presented in the headline, are categorized in one of the seven horizontal categories on the chart (MEL=Most Extreme Left, HPL=Hyper-Partisan Left, etc.). This is one of the ways to measure bias by omission. Here, we categorize a topic in part by what it means that the source covered this topic as opposed to other available topics covered in other sources.
  • Sentence Metrics: Not every sentence contains instances of bias related to the three types listed here, which are biases based on “political position,” “characterization,” and “terminology.” Sometimes these instances overlap. Discrete instances throughout the article are considered.
  • Comparison: The overall bias is scored in comparison to other known articles about the subject. This is a second way (and probably most important way) we measure bias by omission. Comparison is done in view of other contemporaneous stories about the same topic, and bias can be determined when we know all the possible facts that could reasonably be covered in a story.

Step 2: Weighting and Overall Score

Analysts are asked to consider each factor they evaluated and consider whether any particular factors showed extremely low reliability or extreme bias. For example, misleading or false statements are factors that show extremely low reliability, and name-calling or personal attacks are factors that show extreme bias. If so, analysts are instructed to weight their overall scores downward or outward in view of those factors. Otherwise, the individual factors may be averaged to provide an overall reliability and bias score.
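In practice analysts apply this step as a judgment call, not a formula, but the idea can be sketched in Python. Everything here is hypothetical: the function name, the two flags, and the `pull` parameter (how far the score moves toward the worst factor) are assumptions made only for illustration.

```python
from statistics import mean


def overall_article_score(reliability_factors, bias_factors,
                          low_reliability_flag=False, extreme_bias_flag=False,
                          pull=0.5):
    """Average the factor scores, then optionally weight downward/outward.

    `pull` is a made-up fraction of the distance to move toward the worst
    factor when an analyst flags extremely low reliability or extreme bias.
    """
    reliability = mean(reliability_factors)
    bias = mean(bias_factors)
    if low_reliability_flag:
        # Weight downward: move toward the lowest reliability factor.
        reliability += pull * (min(reliability_factors) - reliability)
    if extreme_bias_flag:
        # Weight outward: move toward the factor farthest from center.
        most_extreme = max(bias_factors, key=abs)
        bias += pull * (most_extreme - bias)
    return reliability, bias
```

With no flags set, the function reduces to a plain average of the factors, matching the "otherwise" case in the paragraph above.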

Note: The following rubric is subject to copyright.

  • Show Rating Methodology

Step 1: Rubric Grading

Grading TV shows (or video, e.g., YouTube shows) involves grading everything according to the Article Grading Rubric but also adds the Show Grading Rubric shown below.

There are a couple of major format differences between articles and shows, the first of which is that there are many more visual elements (titles, graphics, ledes, and chyrons), each of which may be scored. The second is that a major component of most cable news shows is guest interactions, which is what the show grading rubric focuses on. It is critically important to individually rate the Type, Political Stance, and Subject Matter Expertise of each guest, as well as the Host Posture towards each guest. Although at first glance, many cable news shows seem to follow the same format, these guest metrics provide the greatest insight into the differences in quality and bias between shows.

I’ve received many comments to the effect of “I can tell you are biased because Fox and MSNBC (or Fox and CNN) are not at similar places on opposite sides of the chart.” I disagree with the notion that each of these networks should simply be viewed as different sides of the same coin, the only difference being political position. Given the way we rate the content of these shows, it is highly improbable that dozens of hours of programming each day on each network would have very similar scores. It is illogical to assume they would, given that the producers of the shows have different goals, are trying to fill different niches, and are trying to appeal to different audiences.

In order to compete in the news business, many sources purposely try to differentiate themselves from similar sources. Cable news hosts themselves are typically employed to bring a particular kind of contribution that is unique based on their styles, backgrounds, and viewpoints, which naturally results in different content analysis rankings by our metrics.

Guest Type:

“Guest” is a term for anyone who appears on the show who is not a host. These guests can be called any number of titles depending on the show. They can include on-site reporters, who report in a traditional style seen on network evening news programs or local news programs, but a large number of guests on cable news shows are commentators, and are called “contributors,” “analysts,” “interviewees,” etc. Many shows commonly have up to ten such guests per show, which is why there are ten columns on the rubric. Of the guest types listed (politician, journalist, paid contributor, etc.), none are necessarily indicative of quality or bias on their own.

Quality and bias of guest appearances are instead determined by the “guest type” in conjunction with each of the other metrics for each guest.

Guest Political Stance on Subject:

A guest’s political stance on a particular subject, if known or described during the guest appearance, is rated according to the horizontal scale (Most Extreme, Hyper-partisan, Neutral, etc.). It is key to rate the stance of the guest on the particular issue at the particular time of the appearance, rather than to rate the stance based on a person’s historical or reputational affiliation, or a broad categorization of a person’s political leanings, which is a less accurate basis for rating bias of a guest appearance. That is, it is less accurate to say “this person is liberal (or conservative)” than to say “this person took this liberal (or conservative) stance at this time.” People and their histories are complex.

For politicians, political stances on particular issues are often publicly available information via their platform or other statement of issues on their websites, and their historical/reputational stances are often the same as their stances during a particular appearance. However, it is especially important to distinguish between a guest’s current stances and past affiliations, particularly during times of rapid change in politics. For example, if the current Governor of Ohio, John Kasich, appears on a show and fairly criticizes President Trump for a particular statement or action, such a stance should be rated as neutral or skews left, instead of using his party affiliation (Republican) to rate his stance as skews right. However, if he were talking about his positions on abortion or taxes, his stance would likely be rated as skews right (based on such stated right-leaning positions on Kasich’s website).

Guest Expertise on Subject Matter

This rating takes into account both the expertise of the guest as well as the subject matter about which the guest is asked to speak. An “expert” does not necessarily have to have particular titles, degrees, or ranks. Rather, “expertise” is defined here as the ability to provide unique insight on a topic based on experience. Although many guests have expertise and a title, degree, and/or rank, others have expertise by virtue of a particular experience instead. For example, an ordinary person who has experienced addiction to opioids may have expertise on the subject of “how opioid addiction can affect one’s life.” We can refer to this type of expert as an “anecdotal” expert. However, that same person may or may not have expertise on the related subject “what are the best ways to address the opioid epidemic,” and a different kind of expert may be a physician or someone with public health policy experience. We can refer to such an expert as a “credentialed” expert.

Expertise is rated on a scale of 1-5, as follows:

1: Unqualified to comment on subject matter

2: No more qualified to comment than any other avid political/news observer on political/news topic

3: Qualified on ordinarily complex topic or common experience

4: Qualified on very complex topic/Very qualified on ordinarily complex topic/Qualified on uncommon experience

5: Very qualified on very complex topic/Very qualified on very uncommon experience

Host Posture Metric

The interaction between the guest and the host also impacts the bias of the guest appearance. For example, the bias present when a host is challenging a hyper-partisan guest is very different than the bias present when another host is sympathetic with the same hyper-partisan guest.

The scale, as shown below, identifies several types of host postures, each of which is fairly self-explanatory. They are listed roughly in order of “worst” to “best,” but some postures, such as “challenging” or “sympathetic,” are not necessarily good or bad, and determinations of bias depend on the context.

Note: The following rubric is subject to copyright.

Step 2: Overall score.

Analysts may take combinations of multiple factors into account for an overall score. For example, if a guest is a politician, and has a hyper-partisan right stance, and the host simply provides a platform for that politician to advocate his or her opinion, that guest appearance will be rated as “hyper-partisan right, opinion.” If the same politician is on a different show in which the host takes a challenging posture toward the guest, the appearance may be rated as “neutral/balance, analysis.” If the host takes a hostile posture toward the same guest, the appearance may be rated as “skews left, unfair persuasion.” As one can surmise from the options shown in the rubric, there are many possible combinations of guest type, guest stance, guest expertise, and host posture.

Similar to how the article scoring works, analysts may weight certain factors or average them. For example, if a guest is unqualified for the subject matter and hyper-partisan, and the host takes a “cheerleading” posture, this combination would be weighted heavily downward and outward. Then the analysts provide scores as coordinates on the chart (e.g., 48, -18).
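One way to picture the per-guest data an analyst considers is the small Python sketch below. It is only an illustration: the class and function names, the string labels, and the decision to treat "most extreme" stances the same as "hyper-partisan" ones are assumptions, and the check covers only the single example combination given above, not the full rubric.

```python
from dataclasses import dataclass

# Stance labels treated as far from center (hypothetical spelling of the
# horizontal chart categories; including "most extreme" is an assumption).
FAR_FROM_CENTER = {
    "most extreme left", "hyper-partisan left",
    "hyper-partisan right", "most extreme right",
}


@dataclass
class GuestAppearance:
    guest_type: str    # e.g. "politician", "journalist", "paid contributor"
    stance: str        # stance taken on this subject during this appearance
    expertise: int     # 1 (unqualified) to 5 (very qualified)
    host_posture: str  # e.g. "challenging", "sympathetic", "cheerleading"

    def __post_init__(self):
        if not 1 <= self.expertise <= 5:
            raise ValueError("expertise must be between 1 and 5")


def weight_heavily(appearance):
    """True for the example combination in the text: an unqualified guest
    with a far-from-center stance and a cheerleading host."""
    return (appearance.expertise == 1
            and appearance.stance in FAR_FROM_CENTER
            and appearance.host_posture == "cheerleading")
```

Keeping the four metrics as separate fields mirrors the point made earlier: no single field, such as guest type, determines quality or bias on its own.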

Data Availability

There are several layers of data that may be of interest to researchers, media organizations, and regular chart users. Our overall source ranking coordinates and individual article rankings (which are averages of the ratings for each of the three analysts who rated the article) are available for download on our main Media Bias Chart page here.

If you would like additional underlying data, for either commercial or non-commercial purposes, please contact us here.