This process has evolved over time with input from many thoughtful commentators and experts. You can read about how I arrived at this methodology in this series of methodology posts (these are long—I’ll try to put them in a book eventually). If you have questions about why I now rank sources as described below, or about the taxonomy itself (like how I define “quality” and “bias”), you will find a lot of the answers in those posts.
Before getting into the criteria (shown in rubric form below), it is helpful to understand some principles and caveats:
- The main principle of Ad Fontes (which means “to the source” in Latin) is that we analyze content. We look as closely as possible at individual articles, shows, and stories, and analyze what we are looking at: pictures, headlines, and most importantly, sentences and words.
- Though Ad Fontes currently consists of *just me* for content analysis, plus a handful of awesome helpers and advisors, I am working on expanding this project to include many more content analysts, so I will aspirationally refer to Ad Fontes herein as “we.”
- The overall source ranking results from a weighted-average, algorithmic translation of article raw scores. Low-quality and highly biased content weights the overall source down and outward. Further, the reach (ratings, popularity, etc.) of individual articles is also weighted within each source. That means if a source’s low-quality and highly biased content is highly read or watched, it weights the overall source down even more. The exact weighting algorithm is not included here because it is proprietary. Aspects of what is disclosed here are patent pending.
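The reach-weighted averaging described above can be sketched in code. This is only a toy illustration under invented assumptions (the actual weighting algorithm is proprietary; the quality scores, reach figures, and simple linear weighting here are all made up):

```python
# Toy sketch of reach-weighted source scoring (NOT the proprietary
# Ad Fontes algorithm; all numbers and the weighting scheme are invented).

def source_score(articles):
    """Average article quality scores, weighted by each article's reach,
    so widely read low-quality content drags the overall score down more."""
    total_reach = sum(a["reach"] for a in articles)
    return sum(a["quality"] * a["reach"] for a in articles) / total_reach

articles = [
    {"quality": 52, "reach": 1_000},  # high-quality piece, modest readership
    {"quality": 20, "reach": 9_000},  # low-quality piece, widely read
]
print(source_score(articles))  # the popular low-quality piece dominates: 23.2
```

A straight (unweighted) average of the two quality scores would be 36; weighting by reach pulls the source score far closer to its most-read content.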
- The current rankings are based on a small sample size from each source. We believe these sample articles and shows are representative of their respective sources, but these rankings will certainly get more accurate as we rate more articles over time (you can help us with that here).
- Keep in mind that this ratings system currently uses humans with subjective biases (well, as of right now, one human with subjective biases) to rate things that are created by other humans with subjective biases and place them on an objective scale. That is inherently difficult, but can be done well and in a fair manner. There are other good models for doing something similar, such as grading standardized written exams (like AP tests and the bar exam), or judging athletic competitions (such as gymnastics and figure skating). You can get good results as long as you have standards for judging many granular details, and experts trained on those standards to implement them. We’ve begun to create that process here. Below are some of those granular details.
For the past couple of years, I have been rating individual articles using a shorthand, or “quick rating,” version of the formalized full rubric below. I have rated over a thousand articles, including at least ten from each source listed on the chart, and substantially more from the most popular sources (e.g., over a hundred each for the Washington Post, NYTimes, and AP). I have rated these using my shorthand version because of the enormous amount of time it takes to read and rate that many stories.
Formal Ratings Process
I have formalized the above shorthand process into the full article grading rubric shown below. Standardizing this process accomplishes a few things:
- It shows the exact criteria and quantitative measures used to rate each article, which increases transparency;
- It makes it easier to compile and record more data over time, which will eventually be of much value;
- It creates safeguards against individual raters’ subjective political biases (including my own);
- It allows results to be replicated by others.
I plan on having a team of people (including those with differing political views than mine) trained on these ratings standards, and using multiple raters on particular articles to allow averaging of scores to minimize effects of bias.
If it sounds like using this rating rubric with multiple raters per article is a lot of work, that’s because it is. However, once we can show that we get consistent ratings results by raters with diverging political views, eventually, this ratings process can be automated and scaled up via machine learning forms of artificial intelligence. An important part of this scaling-up automation process will be quality checking the AI results against subjective ratings by humans to ensure the scoring and algorithms produce results consistent with human judgments. For example, we would have the same article subjectively rated on the chart by a panel of three humans, one who identifies as fairly right, one who identifies as fairly left, and one who identifies as centrist. These three ratings would be averaged for an overall subjective ranking, and a machine scored article would have to match that average ranking. Aspects of this process are patent pending.
Article and Show Rating Methodologies
Note that there are different, additional criteria that go into rating TV shows as compared to written articles. I’ll discuss the article rating methodology first, because the show rating methodology uses it as a first step and then adds show-specific criteria, which mostly deal with the quality and purpose of show guests. (See both rubrics below.)
- Article Rating Methodology
Step 1: Rubric Grading
Below is an article grading rubric that we currently use for full rankings of articles. As shown, there are two main parts, one for a quality score and one for a bias score.
- Element scores: We score each element on a scale of 1-8, which corresponds to the vertical categories on the chart.
- Sentence scores: Each sentence is rated for both Veracity (1 being completely true and 5 being completely false) and Expression (1 being a fact statement and 5 being an opinion statement). We put hash marks under each 1-5 category for each sentence and count how many are in each category. For more on these scales, see here.
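Tallying hash marks under each 1-5 category amounts to building a histogram of sentence scores. A minimal sketch, with made-up per-sentence ratings:

```python
from collections import Counter

# Hypothetical per-sentence ratings for one short article.
# Veracity: 1 = completely true ... 5 = completely false
# Expression: 1 = fact statement ... 5 = opinion statement
veracity_scores = [1, 1, 1, 2, 3, 1, 5, 1]
expression_scores = [1, 2, 1, 3, 3, 1, 4, 1]

veracity_tally = Counter(veracity_scores)      # sentences per Veracity category
expression_tally = Counter(expression_scores)  # sentences per Expression category

print(veracity_tally[1])  # 5 sentences rated completely true
```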
- Unfairness instances: We count the raw number of unfairness instances. For more on what constitutes something being unfair, see here.
- Topic Selection and/or Presentation: The topic itself, and how it is initially presented in the headline, categorized in one of the seven horizontal categories on the chart (MEL=Most Extreme Left, HPL=Hyper-Partisan Left, etc.). This is one of the ways to measure bias by omission. Here, we categorize a topic in part by what it means that the source covered this topic as opposed to other available topics covered in other sources.
- Sentence Metrics: Not every sentence contains instances of bias related to the three types listed here, which are biases based on “political position,” “characterization,” and “terminology.” Sometimes these instances overlap. Each one throughout the article is counted.
- Comparison: The overall bias is scored in comparison to other known articles about the subject. This is a second way (and probably the most important way) we measure bias by omission. Comparison is done in view of other contemporaneous stories about the same topic, and bias can be determined when we know all the possible facts that could reasonably be covered in a story.
Step 2: Algorithm Translation
The raw scores are then input into a proprietary algorithm that weights certain categories of scores and averages them, and then translates those weighted average scores into coordinates on the chart (e.g., 48, -18).
The exact weighting formulas are somewhat complex, but as an example of some of the effects of the weighting decisions, consider an article that has 20 sentences, and on the Veracity scale (how true each sentence is), 14 of the sentences are 1’s (completely true), 4 sentences are 3’s (neither true nor false), and 2 sentences are 5’s (completely false). A straight average would give this a Veracity score of 1.8 (mostly true) on this scale, but that would be a bad result, because an article containing two completely, demonstrably false statements is really, really bad according to journalism standards. Therefore, we weight any Veracity “5” scores very heavily.
Not all weighting decisions are that extreme though, and some are a straight average. For example, on the Expression scale, an article that has an equal number of 1’s (stated very factually) and 3’s (stated as analysis) would likely get an Expression score of 2 (stated factually with some analysis). There are many relationships between different raw scores on the rubric that get translated in the algorithm.
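The 20-sentence Veracity example above can be worked through in code. The straight average comes directly from the text; the penalty factor applied to Veracity 5’s is an invented stand-in, since the real weights are not public:

```python
# 20 sentences from the example: fourteen 1's, four 3's, two 5's.
scores = [1] * 14 + [3] * 4 + [5] * 2

straight = sum(scores) / len(scores)
print(straight)  # 1.8 -- "mostly true", despite two outright falsehoods

# Hypothetical heavy weighting of completely false sentences: each
# Veracity 5 counts as if it were ten sentences. The factor of 10 is
# an assumption for illustration; the proprietary weights differ.
PENALTY = 10
weights = [PENALTY if s == 5 else 1 for s in scores]
weighted = sum(s * w for s, w in zip(scores, weights)) / sum(weights)
print(round(weighted, 2))  # noticeably worse than the straight 1.8
```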
Regarding how these scores are translated onto the coordinates on the chart, a number of different raw scores may result in placements in the different categories. For example, a source that has a lot of foul language used to characterize political opponents would have high raw scores in the “unfairness instances” metric and “characterization” metric in the “Most Extreme” columns, which would result in its placement in the low bottom right or left under “Propaganda/Contains Misleading Info.” This could be the case because the content is categorized in this system as “propaganda,” even if the content was not misleading. That is, it may not have any completely false statements (no Veracity “5’s”). Conversely, a different article from a different source may be placed in a similar spot on the chart because it has several Veracity “4’s,” and Expression “4’s,” even though it does not have high raw scores for the unfairness instances or extreme characterization metrics.
Note: The following rubric is subject to copyright. Educators can request royalty-free use at firstname.lastname@example.org.
- Show Rating Methodology
Step 1: Rubric Grading
Grading TV shows (or video, e.g., YouTube shows) involves grading everything according to the Article Grading Rubric but also adds the Show Grading Rubric shown below.
There are a couple of major format differences between articles and shows. The first is that shows have many more visual elements (titles, graphics, ledes, and chyrons), each of which may be scored. The second is that guest interactions are a major component of most cable news shows, and they are what the show grading rubric focuses on. It is critically important to individually rate the Type, Political Stance, and Subject Matter Expertise of each guest, as well as the Host Posture towards each guest. Although at first glance many cable news shows seem to follow the same format, these guest metrics provide the greatest insight into the differences in quality and bias between shows.
I’ve received many comments to the effect of “I can tell you are biased because Fox and MSNBC (or Fox and CNN) are not at similar places on opposite sides of the chart.” I disagree with the notion that each of these networks should simply be viewed as different sides of the same coin, the only difference being political position. Given the way we rate the content of these shows, it is highly improbable that dozens of hours of programming each day on each network would have very similar scores. It is illogical to assume they would, given that the producers of the shows have different goals, are trying to fill different niches, and are trying to appeal to different audiences.
In order to compete in the news business, many sources purposely try to differentiate themselves from similar sources. Cable news hosts themselves are typically employed to bring a particular kind of contribution that is unique based on their styles, backgrounds, and viewpoints, which naturally results in different content analysis rankings by our metrics.
“Guest” is a term for anyone who appears on the show who is not a host. These guests can go by any number of titles depending on the show. They can include on-site reporters, who report in a traditional style seen on network evening news programs or local news programs, but a large number of guests on cable news shows are commentators, called “contributors,” “analysts,” “interviewees,” etc. Many shows commonly have up to ten such guests per show, which is why there are ten columns on the rubric. Of the guest types listed (politician, journalist, paid contributor, etc.), none is necessarily indicative of quality or bias on its own.
Quality and bias of guest appearances are instead determined by the “guest type” in conjunction with each of the other metrics for each guest.
Guest Political Stance on Subject:
A guest’s political stance on a particular subject, if known or described during the guest appearance, is rated according to the horizontal scale (Most Extreme, Hyper-partisan, Neutral, etc.). It is key to rate the stance of the guest on the particular issue at the particular time of the appearance, rather than to rate the stance based on a person’s historical or reputational affiliation, or a broad categorization of a person’s political leanings, which is a less accurate basis for rating bias of a guest appearance. That is, it is less accurate to say “this person is liberal (or conservative)” than to say “this person took this liberal (or conservative) stance at this time.” People and their histories are complex.
For politicians, political stances on particular issues are often publicly available via their platforms or other statements of issues on their websites, and their historical/reputational stances are often the same as their stances during a particular appearance. However, it is especially important to distinguish between a guest’s current stances and past affiliations, particularly during times of rapid change in politics. For example, if the current Governor of Ohio, John Kasich, appears on a show and fairly criticizes President Trump for a particular statement or action, such a stance should be rated as neutral or skews left, instead of using his party affiliation (Republican) to rate his stance as skews right. However, if he were talking about his positions on abortion or taxes, his stance would likely be rated as skews right (based on such stated right-leaning positions on Kasich’s website).
Guest Expertise on Subject Matter
This rating takes into account both the expertise of the guest as well as the subject matter about which the guest is asked to speak. An “expert” does not necessarily have to have particular titles, degrees, or ranks. Rather, “expertise” is defined here as the ability to provide unique insight on a topic based on experience. Although many guests have expertise and a title, degree, and/or rank, others have expertise by virtue of a particular experience instead. For example, an ordinary person who has experienced addiction to opioids may have expertise on the subject of “how opioid addiction can affect one’s life.” We can refer to this type of expert as an “anecdotal” expert. However, that same person may or may not have expertise on the related subject “what are the best ways to address the opioid epidemic,” and a different kind of expert may be a physician or someone with public health policy experience. We can refer to such an expert as a “credentialed” expert.
Expertise is rated on a scale of 1-5, as follows:
1: Unqualified to comment on subject matter
2: No more qualified to comment than any other avid political/news observer on political/news topic
3: Qualified on ordinarily complex topic or common experience
4: Qualified on very complex topic / Very qualified on ordinarily complex topic / Qualified on uncommon experience
5: Very qualified on very complex topic / Very qualified on very uncommon experience
Host Posture Metric
The interaction between the guest and the host also impacts the bias of the guest appearance. For example, the bias present when a host is challenging a hyper-partisan guest is very different than the bias present when another host is sympathetic with the same hyper-partisan guest.
The scale, as shown below, identifies several types of host postures, each of which is fairly self-explanatory. They are listed roughly in order of “worst” to “best,” but some postures, such as “challenging” or “sympathetic,” are not necessarily good or bad, and determinations of bias depend on the context.
Note: The following rubric is subject to copyright. Educators can request royalty-free use at email@example.com.
Step 2: Algorithm Translation
The algorithm decisions for show grading mainly involve assigning scores for particular combinations of scores in the categories. For example, if a guest is a politician, and has a hyper-partisan right stance, and the host simply provides a platform for that politician to advocate his or her opinion, that guest appearance will be rated as “hyper-partisan right, opinion.” If the same politician is on a different show in which the host takes a challenging posture toward the guest, the appearance may be rated as “neutral/balance, analysis.” If the host takes a hostile posture toward the same guest, the appearance may be rated as “skews left, unfair persuasion.” As one can surmise from the options shown in the rubric, there are many possible combinations of guest type, guest stance, guest expertise, and host posture.
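The three cases in the example can be sketched as a lookup over (guest stance, host posture) pairs. Only those three entries come from the text; representing the combination scoring as a simple table is itself an assumption:

```python
# Sketch of combination scoring for a guest appearance. Only the three
# (stance, posture) -> rating pairs below are described in the text;
# the dict-lookup structure is an illustrative assumption.
APPEARANCE_RATINGS = {
    ("hyper-partisan right", "platform"):    "hyper-partisan right, opinion",
    ("hyper-partisan right", "challenging"): "neutral/balance, analysis",
    ("hyper-partisan right", "hostile"):     "skews left, unfair persuasion",
}

def rate_appearance(guest_stance, host_posture):
    return APPEARANCE_RATINGS.get(
        (guest_stance, host_posture), "not covered by this sketch")

print(rate_appearance("hyper-partisan right", "challenging"))
# -> neutral/balance, analysis
```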
Similar to how the article scoring works, the raw scores for show grading are also input into an algorithm that weights certain categories of scores and averages them. In addition, the guest appearance combination is then scored and certain appearances are weighted. For example, if a guest is unqualified for the subject matter and hyper-partisan, and the host takes a “cheerleading” posture, this combination would be weighted heavily downward and outward. Then the algorithm translates those weighted average scores into coordinates on the chart (e.g., 48, -18).
Many people have inquired about whether the underlying scoring data is available. There are several layers of data that may be of interest to researchers, media organizations, and regular chart users. Each layer is, will be, or will not be available as follows:
Chart coordinate data: This is simple output data that lists the coordinate positions of sources, shows, and articles. Some of this data is already available in a sortable list format for the 4.0 chart and cable network sub-charts. As this project expands to allow viewing of all ranked articles on an interactive chart, those article coordinates will be publicly viewable and available.
Rubric grading raw and weighted scores: This data is currently not available because the data set is not yet large enough, or in a uniform enough format, for public production. We plan on always making sample sets available for public view for transparency, and the rest will be available under commercial license terms. We anticipate that the first sets will be available in mid-2019.
Algorithms: These are proprietary and will not be publicly available.