Twitter bots, explained
By Trung Phan
What is a soufflé?
Up until about a week ago, my answer was “a French dish which is puffy”. I’ve had soufflés but only recently Googled what it actually is: “A soufflé is a baked egg-based dish originating in France, that can be served as a savory main dish or sweetened as a dessert.”
For the longest time, my understanding of Twitter bots was similarly deficient and along the lines of “a troll army of anonymous accounts that spread Russian misinformation” or “crypto scam”.
But after Elon launched his bid for Twitter last month, I finally did some proper research on bots. And by “proper”, I mean “read up on all the Twitter engineering blog posts about the topic”.
So, let me share my findings on the following questions:
- What are Twitter bots?
- How many Twitter bots are there?
- What’s a good bot vs. a bad bot?
- How does Twitter deal with bots?
- What can Elon do about the bots?
But before we get into it, here is the funniest bot-related tweet ever:
What are Twitter bots?
Bots are automated accounts that are programmed to do certain activities: tweeting, retweeting, liking, following, replying, DMing, etc. (Twitter says they are “nothing more, nothing less”).
Twitter’s Site Integrity team notes that the term “bot” has been warped in recent years to “mischaracterize accounts with numerical usernames that are auto-generated when your preference is taken, and more worryingly, as a tool by those in positions of political power to tarnish the views of people who may disagree with them or online public opinion that’s not favorable.”
This mischaracterization is pretty much in line with my pre-research perception that bots are “a troll army of anonymous accounts that spread Russian misinformation”.
The “troll army” framing catapulted to prominence during the 2016 US Presidential election (and there was evidence of bad behaviour).
But an automated account in and of itself is not a negative.
How many Twitter bots are on the platform?
Twitter estimates that 5% of the accounts on the platform are bots. If we apply that to Twitter’s 229m daily active users (DAU), then there are more than 11m bots active on the platform right now.
In 2017, third-party research placed the share of bot accounts at 9-15%! People in this camp believe Twitter under-reports bots to make the platform’s user numbers look more robust. Twitter rebuts that external estimates are based on incomplete information.
Either way, Twitter’s own calculation of bots would be even higher if the platform didn’t “permanently suspend millions of accounts every month that are automated or spammy” and they do so “before [these accounts] ever reach an eyeball in a Twitter Timeline or Search.”
Which brings us to our next question…
What’s a good bot vs. a bad bot?
Twitter takes a “holistic” look at bot accounts to determine if the activity is no bueno.
The Site Integrity team is specifically looking for prohibited activities including:
- Malicious use of automation to undermine and disrupt the public conversation, like trying to get something to trend
- Artificial amplification of conversations on Twitter, including through creating multiple or overlapping accounts
- Generating, soliciting, or purchasing fake engagements
- Engaging in bulk or aggressive tweeting, engaging, or following
- Using hashtags in a spammy way, including using unrelated hashtags in a tweet (aka “hashtag cramming”)
An example of “malicious use” is the bot networks that flooded Twitter during the 2016 US election cycle. There were actually bots on both sides of the campaign, but pro-Trump bots out-tweeted pro-Clinton bots at a 7:1 ratio, per one study (the intention of the tweets was often to pump up inflammatory content). More recently, Russian bots have been spreading pro-Russia propaganda during Putin’s brutal invasion of Ukraine.
Another issue is bots as fake followers. Twitter auditing tool SparkToro estimates that 40-50% of the followers of top Twitter accounts — including @BarackObama (132m followers), @BillGates (59m), @TrungTPhan (390k), @Cristiano (100m), @ElonMusk (91m) and @KimKardashian (72m) — are fake. However, fake followers are probably a bigger issue for accounts with around a million followers or fewer; that’s the range where people charge money for engagement based on their follower size.
Perhaps the most salient example right now of bots run amok is crypto spam/scam accounts, which flood the replies of notable users. Here’s the playbook: a prominent account (@ElonMusk) tweets, followed by a reply from a lookalike account (@EloooonMusk) that is running a fake crypto giveaway (“send 0.2 ETH to this address to receive 1 ETH”). The scam tweet is usually followed up by a bunch of dopey replies talking about how much money the person “made” taking part in the giveaway.
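As an illustration of how those lookalike handles can be flagged, here’s a minimal sketch using simple string similarity. The function, threshold, and approach are my own assumptions for illustration — not how Twitter actually detects impersonators:

```python
import re
from difflib import SequenceMatcher

def looks_like_impersonator(handle: str, target: str, threshold: float = 0.8) -> bool:
    """Flag handles that closely mimic a prominent account's handle.

    First collapses repeated letters (EloooonMusk -> ElonMusk), then
    falls back to a string-similarity ratio against the real handle.
    """
    if handle.lower() == target.lower():
        return False  # it's the real account, not a lookalike
    collapsed = re.sub(r"(.)\1+", r"\1", handle.lower())
    target_collapsed = re.sub(r"(.)\1+", r"\1", target.lower())
    if collapsed == target_collapsed:
        return True  # pure letter-stretching clone
    similarity = SequenceMatcher(None, handle.lower(), target.lower()).ratio()
    return similarity >= threshold

print(looks_like_impersonator("EloooonMusk", "ElonMusk"))  # True
print(looks_like_impersonator("ElonMusk", "ElonMusk"))     # False
```

A real system would also weigh profile photos, account age, and reply patterns, but the core idea — scoring how closely a handle mimics a high-profile one — is this simple.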
If a bot account isn’t involved in this tomfoolery, Twitter is more than happy to let it operate. There are lots of legit bot accounts which provide useful information or have a creative output. You’ve probably seen some:
- @threadreaderapp turns long Twitter threads into blog posts
- @wayback_exe tweets out screenshots of old websites every 2 hours
- @tinycarebot tweets out an hourly reminder to take a break from work (I’ve used it 10x already to procrastinate from writing this email)
- @EveryElonReply tweets out every like or reply from Elon
- @EarthquakeBot tweets out every earthquake of 5.0 magnitude or higher
- @EveryColorBot tweets out a different color swatch every hour
- @DistortBot distorts any photo
- @Year_Progress tweets out a bar of how much of the year has progressed (and the replies when we get to 69% are…predictable lol)
How does Twitter deal with bots?
Back in 2014, Twitter rolled out a spam-fighting tool called BotMaker. While it is now defunct, the engineering blog post for BotMaker highlights how Twitter spam is different from spam on other platforms:
- Data: Twitter’s API is exposed to developers, which means that “spammers know (almost) everything Twitter’s anti-spam systems know through the APIs”
- Latency: Twitter is a real-time platform so “anti-spam systems must avoid adding latency to user-visible operations”
Compare these to email spam, where the data is private and latency (of a few seconds) is usually fine.
After taking these challenges into consideration, Twitter’s approach to dealing with bots is pretty straightforward. In an interview with Big Technology, Twitter’s former head of engineering Alex Roetter says that the company builds machine learning (ML) classifiers to identify and ban bots.
Bot classifiers include a number of variables:
- IP address (is one IP address creating a lot of accounts?)
- Profile picture (the stock no-image avatar for a profile is usually a red flag; stock photos of young women are also used in various scams…you can reverse Google image search them if you’re getting some suspicious DMs)
- Account name (Auto-generated accounts with the [Number] format are suspicious … but people are also just sometimes lazy, so an account like @Trung7649373649 could actually be legit…but prob not)
- Follows / Followers (bot accounts typically follow a lot of other bot accounts)
- Timing of tweets (do a lot of related accounts all tweet at the same time?)
- Frequency of tweets (excessive tweeting — like 100+ a day — is a red flag; although I’d probably trigger this on my most active days)
- Account creation dates (are a network of related bots all created in the same time frame?)
- Content of tweets (bot campaigns typically regurgitate similar text)
- Links (spammy bots love using URL shorteners because: 1) it makes the text shorter (duh); and 2) they can track traffic)
Once the system is set up, there is a trade-off: “[Twitter can tune] the classifier to either be really aggressive, where you’d eliminate bots but also ban a bunch of human ‘false positives’, or be less aggressive, where you’d let some bots slide and ban fewer humans.”
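That dial can be sketched as a threshold on a feature score. Here’s a toy illustration built from some of the signals listed above — the features, weights, and thresholds are my illustrative assumptions, not Twitter’s actual classifier:

```python
# Toy bot classifier: score an account on suspicious signals,
# then ban anything above a tunable threshold.
# All weights and thresholds here are made-up for illustration.

def bot_score(account: dict) -> float:
    """Return a 0-1 'bot-likeness' score for an account."""
    score = 0.0
    if account.get("default_avatar"):
        score += 0.25  # stock no-image avatar
    if account.get("tweets_per_day", 0) > 100:
        score += 0.25  # excessive tweeting
    if account.get("handle_has_long_digit_suffix"):
        score += 0.20  # @Trung7649373649-style handle
    if account.get("accounts_on_same_ip", 1) > 10:
        score += 0.30  # many accounts created from one IP
    return min(score, 1.0)

def is_banned(account: dict, threshold: float) -> bool:
    return bot_score(account) >= threshold

AGGRESSIVE, LENIENT = 0.4, 0.8  # the dial Roetter describes

# A borderline account: default avatar plus a digit-suffix handle.
# Could be a spam bot... or just a lazy human.
borderline = {"default_avatar": True, "handle_has_long_digit_suffix": True}

print(is_banned(borderline, AGGRESSIVE))  # True  -> possible false positive
print(is_banned(borderline, LENIENT))     # False -> possible bot slipping through
```

Turning the threshold down catches the borderline account (and some real humans along with it); turning it up lets it slide. That’s the whole trade-off in two lines.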
Twitter’s Safety team says it challenges 5m to 10m accounts a week that it suspects of malicious activity. However, there are obviously many edge cases, which require a second level of defence: a forensic investigative team (for its judgement calls, Twitter puts out a Transparency Report every six months).
By all accounts, Twitter has made some real progress dealing with bots. But it’s an endless battle (to wit: the reCAPTCHA arms race between spammers and major tech platforms to determine if someone is human).
One last thing: Twitter’s been testing a feature where bot accounts are verified by and attached to a real user. This is one way of flagging “legit” bots and should be rolled out widely.
What can Elon do about the bots?
As of May 6th, 2022, Elon’s bid to close his $46.5B deal for Twitter is looking more and more likely. Financing was always the biggest question, but he’s quieted doubters by selling $8.5B of Tesla stock and announcing $7B+ in equity financing from friends, VCs and other institutional investors.
In recent months, he’s tweeted that crypto spam bots are the “single most annoying part” of Twitter and that “we will defeat the spam bots or die trying”.
To battle the bots, Elon will have to pull the same ML levers used by Twitter’s current Site Integrity and Safety teams.
If Twitter gets much more aggressive on bots, it could snag a lot of false positives (aka Vietnamese-Canadian Twitter accounts that tweet 200 really dumb things a day and look like a bot but are really a person). The proliferation of crypto spam bots does suggest that bot filters are too loose or need a few variables turned way up.
Per Elon, Tesla has the “most advanced” real-world AI based on the company’s computer vision work. Many will quibble with the claim but he can clearly attract top AI talent (as he’s done with Tesla and OpenAI, which he is no longer a part of).
Across the company, Twitter job applications jumped more than 2.5x in the week after the social network accepted Elon’s bid (which should offset expected attrition…although losing institutional knowledge always hurts).
Elon has also made comments that could complicate Twitter’s bot-fighting approach, particularly the desire to “open source” Twitter’s algorithm to make it more transparent.
As mentioned above, spammers already take advantage of how much of Twitter’s platform is exposed by the API. Further “opening” of the platform creates more surface area for malicious bots to outcompete Twitter’s defences.
Elon also wants “human verification”, which sounds like it’s partly meant to snuff out bots. Elon later confirmed that he understands the value of anonymity, so “human verification” doesn’t mean doxxing. It probably means way better reCAPTCHAs. He could kill two birds with one stone by having Twitter users label driving images for Tesla. And if Southeast Asian reCAPTCHA spam farms solve this at scale, Tesla should hire them (I’m only half joking).
Another idea: verifying anyone who pays a subscription (he floated $24 a year for a lower-priced Twitter Blue). Verified accounts could have their signal boosted relative to non-verified accounts, making the cost of running a bot network quite onerous (at $24 a year per account, a 10,000-account bot network would cost $240k a year).
Clearly, taking down bad bots while letting good bots operate is a balancing act. Just stop asking me if I’m a robot…you da robot!
Links and Memes
Larry Ellison and Steve Jobs: As part of the $7B+ equity commitment that Elon secured to help him close the Twitter deal, Larry Ellison — Oracle founder and 7th richest person in the world — chipped in $1B (see table below). He previously invested $1B into Tesla in December 2018 (he snagged a board seat and that bet is up 15x). This is how I imagine the convo went:
ELON: Yo Larry, Google is crushing it.
LARRY: Elon, it’s Ellison, not Page.
ELON: Oh dude, sorry. Was gonna ask you too: wanna chip in a $1B for Twitter?
LARRY: Eh, markets are kind of tough right now.
ELON: Dude, I literally made you $14B on your Tesla investment.
LARRY: You’re right.
ELON: And we can stick it to old William Gates.
LARRY: I’m in.
Anyways, here is Ellison giving the commencement speech at USC in 2016 and telling a wild story about his best friend Steve Jobs: in the mid-1990s, Ellison tried to buy Apple for $5B and re-install Jobs as CEO while giving him 25% of the company (Jobs was pushed out of Apple in 1985). Jobs said “no” because he didn’t want to go the hostile takeover route but eventually returned in 1997 when Apple acquired his 2nd computing startup (NeXT)…and the rest is history.
The secret to making geniuses?: Fascinating article by Erik Hoel titled “Why we stopped making Einsteins”. While the internet has given people access to the entirety of human knowledge, we don’t seem to be creating geniuses like yesteryear. Why? Some people believe we’ve run out of ideas. Hoel makes the case that the decline in “geniuses” coincides with the decline in what he calls “Aristocratic tutoring”:
[Historically, raising geniuses] usually involved a paid adult tutor, who was an expert in the field, spending significant time with a young child or teenager, instructing them but also engaging them in discussions, often in a live-in capacity, fostering both knowledge but also engagement with intellectual subjects and fields. As the name suggests it was something reserved mostly for aristocrats, which means, no way around it, it was deeply inequitable.
Historical big brains that received this form of tutoring include Marcus Aurelius, Bertrand Russell, John von Neumann, Ludwig Wittgenstein, Charles Darwin, Ada Lovelace, Voltaire, Leo Tolstoy, John Stuart Mill, Hannah Arendt, and Virginia Woolf.
Per Hoel, the modern education system is much more cookie-cutter while the concept of “Aristocratic tutoring” — and its class implications — is unpalatable for society at large. Whether or not you agree, the piece will make you think.
- “Spike Lee’s Jackie Robinson Biopic”
- “David Cronenberg’s Eastern Promises Sequel”
- “Christopher Nolan’s Howard Hughes Biopic”
- “Martin Scorsese’s Dino” (about Frank Sinatra)
- “Quentin Tarantino’s Double V Vega” (Pulp Fiction prequel)
Here’s one thing we don’t have to wait for: the Game of Thrones prequel (HBO’s House of the Dragon trailer just came out and the show premieres in August…BOOM!).
Filmmaking breakdown (Tone): Here’s a great 6-minute video on how “tone” (AKA brightness) is used to set the mood of a movie by pulling on 3 levers: 1) lighting, 2) exposure, and 3) art direction. The Godfather’s director of photography Gordon Willis was given the incredible nickname “The Prince of Darkness” for how he used lighting to tell a story. And check out the different Supermans below; same character, but very different lighting (and mood).
And here are some other memes (see y’all next week):
The tweet of the week comes from the world’s #1 chess player: Magnus Carlsen.