
How an ex-YouTube insider investigated its secret algorithm


YouTube’s recommendation system draws on techniques in machine learning to decide which videos are auto-played or appear “up next”. The precise formula it uses, however, is kept secret. Aggregate data revealing which YouTube videos are heavily promoted by the algorithm, or how many views individual videos receive from “up next” suggestions, is also withheld from the public.

Disclosing that data would enable academic institutions, fact-checkers and regulators (as well as journalists) to assess the type of content YouTube is most likely to promote. By keeping the algorithm and its results under wraps, YouTube ensures that any patterns that indicate unintended biases or distortions associated with its algorithm are concealed from public view.

By putting a wall around its data, YouTube, which is owned by Google, protects itself from scrutiny. The computer program written by Guillaume Chaslot overcomes that obstacle to force some degree of transparency.

The ex-Google engineer said his method of extracting data from the video-sharing site could not provide a comprehensive or perfectly representative sample of videos that were being recommended. But it can give a snapshot. He has used his software to detect YouTube recommendations across a range of topics and publishes the results on his website, algotransparency.org.

How Chaslot’s software works

The program simulates the behaviour of a YouTube user. During the election, it acted as a YouTube user might have if she were interested in either of the two main presidential candidates: it discovered a video through a YouTube search, and then followed a chain of YouTube-recommended titles appearing “up next”.

Chaslot programmed his software to obtain the initial videos through YouTube searches for either “Trump” or “Clinton”, alternating between the two to ensure they were each searched 50% of the time. It then clicked on several search results (usually the top five videos) and captured which videos YouTube was recommending “up next”.
The process was then repeated, this time by selecting a sample of those videos YouTube had just placed “up next”, and identifying which videos the algorithm was, in turn, showcasing beside those. The process was repeated thousands of times, collating more and more layers of data about the videos YouTube was promoting in its conveyor belt of recommended videos.


By design, the program operated without a viewing history, ensuring it was capturing generic YouTube recommendations rather than those personalised to individual users.

The data was probably influenced by the topics that happened to be trending on YouTube on the dates he chose to run the program: 22 August; 18 and 26 October; 29-31 October; and 1-7 November.


On most of those dates, the software was programmed to begin with five videos obtained through search, capture the first five recommended videos, and repeat the process five times. But on a handful of dates, Chaslot tweaked his program, starting off with three or four search videos, capturing three or four layers of recommended videos, and repeating the process up to six times in a row.
Whichever combinations of searches, recommendations and repeats Chaslot used, the program was doing the same thing: detecting videos that YouTube was placing “up next” as enticing thumbnails on the right-hand side of the video player.
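The layered crawl described above can be sketched in a few lines of Python. This is a minimal illustration of the logic, not Chaslot’s actual code (which he has published himself): the `get_up_next` function stands in for whatever mechanism fetches a video’s “up next” recommendations, and the `breadth` and `depth` parameters correspond to the varying combinations of search videos, recommendation layers and repeats described above.

```python
def crawl(seed_videos, get_up_next, breadth=5, depth=5):
    """Follow chains of 'up next' recommendations, layer by layer.

    seed_videos: initial video ids obtained from a YouTube search
    get_up_next: callable mapping a video id to its recommended ids
    breadth:     how many videos to expand at each layer
    depth:       how many layers of recommendations to collect
    Returns a list of (source_id, recommended_id) pairs.
    """
    recommendations = []
    frontier = seed_videos[:breadth]
    for _ in range(depth):
        next_frontier = []
        for video in frontier:
            # Capture which videos appear "up next" beside this one
            for rec in get_up_next(video)[:breadth]:
                recommendations.append((video, rec))
                next_frontier.append(rec)
        # Select a sample of the just-recommended videos and repeat
        frontier = next_frontier[:breadth]
    return recommendations
```

Each pass expands only a sample of the previous layer’s recommendations, so the data grows layer by layer rather than exhaustively, which is why the method yields a snapshot rather than a comprehensive sample.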

His program also detected variations in the degree to which YouTube appeared to be pushing content. Some videos, for example, appeared “up next” beside just a handful of other videos. Others appeared “up next” beside hundreds of different videos across multiple dates.

In total, Chaslot’s database recorded 8,052 videos recommended by YouTube. He has made the code behind his program publicly available here. The Guardian has published the full list of videos in Chaslot’s database here.

Content analysis

The Guardian’s research included a broad study of all 8,052 videos as well as a more focused content analysis, which assessed 1,000 of the most recommended videos.
We assessed the top 500 videos that were recommended after a search for the term “Trump” and the top 500 videos recommended after a “Clinton” search. Each individual video was scrutinised to determine whether it was obviously partisan and, if so, whether the video favoured the Republican or Democratic presidential campaign. In order to judge this, we watched the content of the videos and considered their titles.

About a third of the videos were deemed to be either unrelated to the election, politically neutral or insufficiently biased to warrant being categorised as favouring either campaign. (An example of a video that was unrelated to the election was one entitled “10 Intimate Scenes Actors Were Embarrassed to Film”; an example of a video deemed politically neutral or even-handed was this NBC News broadcast of the second presidential debate.)

Many mainstream news clips, including ones from MSNBC, Fox and CNN, were judged to fall into the “even-handed” category, as were many mainstream comedy clips created by the likes of Saturday Night Live, John Oliver and Stephen Colbert. Formulating a view on these videos was a subjective process, but for the most part it was obvious which candidate a video benefited. There were a few exceptions. For example, some might consider this CNN clip, in which a Trump supporter forcefully defended his lewd remarks and strongly criticised Hillary Clinton and her husband, to be beneficial to the Republican. Others might point to the CNN anchor’s exasperated response and argue the video was actually more helpful to Clinton. In the end, this video was too difficult for us to categorise; it is an example of a video listed as not benefiting either candidate.

For two-thirds of the videos, however, the process of judging who the content benefited was relatively uncomplicated. Many videos clearly leaned toward one candidate or the other. For example, a video of a speech in which Michelle Obama was highly critical of Trump’s treatment of women was deemed to have leaned in favour of Clinton. A video falsely claiming Clinton suffered a mental breakdown was categorised as benefiting the Trump campaign.

We found that most of the videos labeled as benefiting the Trump campaign might be more accurately described as highly critical of Clinton. Many are what might be described as anti-Clinton conspiracy videos or “fake news”. The database appeared highly skewed toward content critical of the Democratic nominee. But for the purpose of categorisation, these types of videos, such as a video entitled “WHOA! HILLARY THINKS CAMERA’S OFF… SENDS SHOCK MESSAGE TO TRUMP”, were listed as favouring the Trump campaign.

Missing videos and bias

Roughly half of the YouTube-recommended videos in the database have been taken offline or made private since the election, either because they were removed by whoever uploaded them or because they were taken down by YouTube. That might be because of a copyright violation, or because the video contained some other breach of the company’s policies.
We were unable to watch original copies of the missing videos, so they were excluded from our first round of content analysis. That round, which covered only videos we could watch, concluded that 84% of partisan videos were beneficial to Trump, while only 16% were beneficial to Clinton.

Interestingly, the bias was marginally larger when YouTube recommendations were detected following an initial search for “Clinton” videos. Those resulted in 88% of partisan “up next” videos being beneficial to Trump. When Chaslot’s program detected recommended videos after a “Trump” search, in contrast, 81% of partisan videos were favourable to Trump.

That said, the “Up next” videos following from “Clinton” and “Trump” videos often turned out to be the same or very similar titles. The type of content recommended was, in both cases, overwhelmingly beneficial to Trump, with a surprising amount of conspiratorial content and fake news damaging to Clinton.

Supplementary count

After counting only those videos we could watch, we conducted a second analysis to include those missing videos whose titles strongly indicated the content would have been beneficial to one of the campaigns. It was also often possible to find duplicates of these videos.
Two highly recommended videos in the database with one-sided titles were, for example, entitled “This Video Will Get Donald Trump Elected” and “Must Watch!! Hillary Clinton tried to ban this video”. Both of these were categorised, in the second round, as beneficial to the Trump campaign.

When all 1,000 videos were tallied – including the missing videos with very slanted titles – we found 643 videos with an obvious bias. Of those, 551 videos (86%) favoured the Republican nominee, while only 92 videos (14%) were beneficial to Clinton.

Whether missing videos were included in our tally or not, the conclusion was the same. Partisan videos recommended by YouTube in the database were about six times more likely to favour Trump’s presidential campaign than Clinton’s.
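The headline figures above follow from simple arithmetic on the second-round counts. The short check below uses only the numbers reported in this section:

```python
# Counts from the second-round tally reported above
trump_videos = 551
clinton_videos = 92
partisan_total = trump_videos + clinton_videos   # 643 videos with an obvious bias

trump_share = round(100 * trump_videos / partisan_total)      # share favouring Trump
clinton_share = round(100 * clinton_videos / partisan_total)  # share favouring Clinton
ratio = trump_videos / clinton_videos                         # roughly six to one
```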

Database analysis

All 8,052 videos were ranked by the number of “recommendations” – that is, the number of times they were detected appearing as “Up next” thumbnails beside other videos. For example, if a video was detected appearing “Up next” beside four other videos, that would be counted as four “recommendations”. If a video appeared “Up next” beside the same video on, say, three separate dates, that would be counted as three “recommendations”. (Multiple recommendations between the same videos on the same day were not counted).
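The counting rule described above – one “recommendation” per (source video, recommended video) pairing per date, with same-day duplicates ignored – can be expressed directly in code. This is an illustrative sketch of the metric, not the Guardian’s or Chaslot’s actual analysis script:

```python
def count_recommendations(detections):
    """Count 'up next' appearances per recommended video.

    detections: iterable of (source_id, recommended_id, date) tuples,
    one per detected 'up next' thumbnail. Identical pairings detected
    on the same date are counted only once; the same pairing on a
    different date counts again.
    """
    seen = set()
    counts = {}
    for source, rec, date in detections:
        key = (source, rec, date)
        if key in seen:
            continue  # duplicate detection on the same day: skip
        seen.add(key)
        counts[rec] = counts.get(rec, 0) + 1
    return counts
```

Ranking the videos by these counts produces the “most recommended” list that follows.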

Here are the 25 most recommended videos, according to the above metric.
