Facebook, Analyze This!

Mustapha Hamoui
Published in Geek Living · Jun 2, 2016

So apparently, Facebook’s robots are getting some new skills:

Today, Facebook announced DeepText, an AI engine it’s building to understand the meaning and sentiment behind all of the text posted by users to Facebook. […] “We want DeepText to be used in categorizing content within Facebook to facilitate searching for it and also surfacing the right content to users,” said Hussein Mehanna, an engineering director on Facebook’s machine learning team.

“All of the text,” eh? Mr. Mehanna, your name sounds Lebanese. Can you explain to me how Facebook plans to analyze the sentiment behind a typical Lebanese Facebook comment like the following? (P.S. Even if you don’t understand Arabic, you can read on after the paragraph below.)

ya3niii ya mannnn, yalatif shu nefish rishak! kess ekhta 7sebak James Bond!! yalli sta7ou metou… Anywayzz baddak nedhar lyoum? khabbarouni 3an matra7 naarrrrrr bi Mar Mkhayel.. 2al les filles son gher shikil over there… yalla talk ltr! Ciao!

(Very roughly: “I mean, maaan, look at you showing off! [expletive] you’d think you were James Bond!! Shame is dead… Anywayzz, want to go out today? They told me about an amaaaazing spot in Mar Mkhayel.. apparently the girls are something else over there… OK, talk ltr! Ciao!”)

What seems like a casual conversation to Lebanese humans is a nightmare for artificial intelligence, no matter how advanced, to understand. Even advanced natural language processing (NLP) software needs some basic rules and heuristics to work, and the paragraph above casually breaks any rule you can come up with. But for kicks, let’s try to see what’s going on.

This paragraph contains:

  • Transliterated Lebanese colloquialisms and expressions
  • Informal conventions for writing Arabic letters as numerals (3 = ع)
  • Seamless mixing of English and French phrases (and the odd Italian Ciao)
  • Words modified (in all languages) to emphasize emotion (mannnnn, Anywayzz, naarrrrr)
  • Word shortening (ltr)
  • References to foreign popular culture (James Bond)
  • Swearing used as a linguistic filler (kess ekhta)
  • The use of “Yalla,” which I would classify as a challenge in its own right given its many, many possible meanings

You can already begin to get a sense of the hurdles a robot runs into when trying to classify that short comment.
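To see how quickly the naive approach falls apart, here is a minimal Python sketch (nothing like Facebook’s actual code) of just the digit convention from the list above. The digit-to-letter mapping reflects common Arabizi usage, and even that is contested between writers; the function name and sample string are purely illustrative.

```python
# A minimal sketch (not Facebook's code) of the informal digit-for-letter
# convention. The mapping covers common substitutions; even these vary
# between writers and regions.
ARABIZI_DIGITS = {
    "2": "ء",  # hamza
    "3": "ع",  # ayn
    "5": "خ",  # kha
    "7": "ح",  # hah
    "9": "ق",  # qaf (some writers use 9 for ص instead)
}

def naive_detransliterate(text: str) -> str:
    """Swap Arabizi digits for Arabic letters, one character at a time."""
    return "".join(ARABIZI_DIGITS.get(ch, ch) for ch in text)

print(naive_detransliterate("ya3niii 7sebak James Bond!!"))
# -> "yaعniii حsebak James Bond!!"
```

And the digits are the easy part: the Latin letters around them still need a full transliteration model, the emotional elongation (“niii”) still needs normalizing, and “James Bond” must not be transliterated at all.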

If you’re constantly cursing your iPhone for “auto-correcting” your “walla” into “walls” and your “shu” into “shy,” you get an idea of how stupid robots are when it comes to transliterated Arabic.

But What About Machine Learning?

But ya Mustapha, isn’t the whole point of machine learning, the technology where AIs learn from millions and millions of examples instead of rules, to solve exactly this kind of problem?

The thing is, there will never be enough training data for “transliterated Lebanese modified colloquialism peppered with foreign languages.” A machine needs millions and millions of examples with a certain consistency to be of any use. But the pool of users (Lebanese on the Internet) is tiny, and the variation is huge: between dialects (Baddi vs. Baddeh vs. Biddy), educational levels, and choices of foreign language to mix in, not to mention the constant stream of new popular-culture references (Lebanese political developments, memes, celebrities, etc.). In my opinion, that makes this an impossible problem to solve, even with a combination of smart algorithms and machine learning.
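To make the sparsity point concrete, here is a toy Python sketch built on the Baddi/Baddeh/Biddy example above. The three strings are hypothetical spellings of the same sentence (“I want to go to the sea tomorrow”), invented for illustration; the point is that they share no surface tokens at all.

```python
# Three hypothetical spellings of "I want to go to the sea tomorrow",
# invented to illustrate dialect and spelling variation.
variants = [
    "baddi rou7 3al ba7er bukra",
    "baddeh rawwi7 3al ba7r bokra",
    "biddy rou7 al ba7ir boukra",
]

def tokens(sentence: str) -> set:
    return set(sentence.lower().split())

shared = set.intersection(*(tokens(v) for v in variants))
print(f"tokens shared by all three spellings: {shared or 'none'}")
# -> none. To a word-count model these are three unrelated sentences,
#    so every dialect-and-spelling combination needs its own pile of
#    training examples.
```

Character-level models soften this somewhat, but they still need volumes of in-dialect text that a few million users, splintered across spellings, simply don’t produce.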

So, dear Facebook: Rou7ou balltou el ba7er. (Lebanese for “go tile the sea,” i.e., good luck with that.)
