Evangelos Bitsikas, who is pursuing a PhD in cybersecurity at Northeastern University in the US, applied a new machine-learning program to data gleaned from the SMS system of mobile devices.
Receiving an SMS inevitably generates delivery reports, and the time they take to arrive gives the sender a timing attack vector. Bitsikas developed an ML model that lets the SMS sender determine the recipient's location with 96% accuracy for locations across different countries, the researcher says in a study.
The basic idea is that a hacker would send multiple text messages to the target phone, and the timing of each automated delivery reply creates a fingerprint of the target's location. These fingerprints have always been there, but they weren't a problem until Bitsikas' group used ML to develop an algorithm capable of reading them. They can be fed into the machine-learning model, which then responds with the predicted location.
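To make the "fingerprint" idea concrete, here is a minimal sketch in Python. It is not the researchers' model: it swaps in a simple nearest-centroid classifier, and the feature choice, the function names and all delay values are made up for illustration.

```python
import statistics

def fingerprint(delays):
    """Summarize a burst of delivery-report delays (seconds) as a feature vector.
    The features (mean, spread, min, max) are an assumption for illustration."""
    return (statistics.mean(delays), statistics.pstdev(delays),
            min(delays), max(delays))

def train_centroids(labelled_runs):
    """labelled_runs maps a location label to a list of delay bursts.
    Returns one averaged fingerprint (centroid) per location."""
    centroids = {}
    for location, runs in labelled_runs.items():
        features = [fingerprint(run) for run in runs]
        centroids[location] = tuple(statistics.mean(col) for col in zip(*features))
    return centroids

def predict(centroids, delays):
    """Return the location whose centroid is closest to the new fingerprint."""
    observed = fingerprint(delays)
    return min(
        centroids,
        key=lambda loc: sum((a - b) ** 2 for a, b in zip(observed, centroids[loc])),
    )

# Hypothetical training data: delays measured while the target's location was known.
training = {
    "location_A": [[1.9, 2.1, 2.0], [2.0, 2.2, 2.1]],
    "location_B": [[3.4, 3.9, 3.6], [3.5, 3.8, 3.7]],
}
centroids = train_centroids(training)
print(predict(centroids, [2.0, 2.05, 2.15]))
```

The point is only that the attacker needs labelled timing data first and then matches new measurements against it. A real model would presumably need far more samples and richer features.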
According to the researcher, it doesn't matter whether or not the communication is encrypted.
This is another excellent reason to never give anyone at all your cell phone number. Give them a voice number, like Google Voice, Google Fi, or voip.ms. The number people have should not be the number attached to the device you walk around with.
Then if somebody wants to track you by your phone number, they'll have to go to the phone service, which is not connected directly to your phone other than through the internet. And then they'll have to track you through the internet. So it won't be a data broker selling your location data en masse, indexable by your known phone number.
This is unlikely to work for internet messaging services. If you're finding the location of the phone based on the location of the tower that delivers the message to the phone, the analogous part in modern internet messaging services would be a cloud server in a cloud data center hub. There are few of these in the world, so even if you could narrow it down that way, you'd end up with vague locations like "western North America" or "Europe".
Additionally, the routing of messages in internet messaging services is usually not so sophisticated. You can only tell the difference between sending a message to somebody from their east and sending a message to somebody from their west if the message takes a different route to get to the user based on the physical direction. If the path of the message is always sender->infrastructure->central database->infrastructure->receiver, you can only change the sender->infrastructure latency and maybe the infrastructure->central db latency. Without being able to change the path the message takes back out of the system to the target, you can't gain any useful information.
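A toy calculation of that argument, assuming end-to-end latency is just the sum of three legs (the function names and all millisecond values are made up):

```python
def end_to_end_ms(sender_leg_ms, core_ms, target_leg_ms):
    """Total latency when every message goes sender -> central service -> target."""
    return sender_leg_ms + core_ms + target_leg_ms

def observed_deltas(sender_legs_ms, core_ms, target_leg_ms):
    """Latency differences the attacker sees when only their own position changes."""
    totals = [end_to_end_ms(s, core_ms, target_leg_ms) for s in sender_legs_ms]
    return [t - totals[0] for t in totals]

# Same three attacker positions, two very different target positions.
near_target = observed_deltas([10, 40, 80], core_ms=5, target_leg_ms=20)
far_target = observed_deltas([10, 40, 80], core_ms=5, target_leg_ms=120)
print(near_target, far_target)
```

Both lists come out identical, because the target leg is a constant that cancels when you compare measurements. Moving the attacker tells you about the attacker's own leg and nothing about the target's.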
It should work with direct IP networking, but for locating IP addresses we already have location databases and traceroute so it wouldn't be necessary. Maybe it could work if there was a pseudo p2p service where clients connect to the nearest Cloudflare edge compute node or something and then the nodes connect directly between each other at the IP layer, because in that case you would be going through sufficiently sophisticated internet routing but the target's IP wouldn't be available for a less sophisticated and more accurate approach.
I don't think I understand the attack then. How does a timing attack on read receipts give you an approximate location?
I understood the SMS case because the tower data could then be extrapolated. But if we're just talking about a standard internet application like Signal, the read receipts are coming over the internet and not from tower records.
Or at least that's my understanding. If I have a computer attached to some point on the internet, people could use ping timings to theoretically restrict the location, but not very accurately, right?
You just measure the time until the delivery receipt arrives. From that you can approximate how far away the recipient is. Now you keep doing that while changing your own location (use VPNs etc.) and you can slowly get a more accurate location of the target. Now you automate that stuff and also utilize machine learning to interpret the data.
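Roughly what "keep measuring from different places" could look like, as a sketch. It assumes a flat map in kilometres, an idealized signal speed of 200 km per millisecond, and that the round-trip time is pure propagation delay, which real networks never give you. All names and numbers are hypothetical.

```python
import math

SPEED_KM_PER_MS = 200.0  # assumption: roughly two thirds of light speed, as in fibre

def rtt_to_km(rtt_ms):
    """Turn a round-trip time into a one-way distance estimate (idealized)."""
    return rtt_ms / 2 * SPEED_KM_PER_MS

def locate(vantages, step_km=10, extent_km=1000):
    """vantages: list of ((x_km, y_km), rtt_ms) measured from known positions.
    Brute-force grid search for the point whose distances best fit the RTTs."""
    best, best_err = None, math.inf
    steps = range(0, extent_km + 1, step_km)
    for x in steps:
        for y in steps:
            err = sum(
                (math.hypot(x - vx, y - vy) - rtt_to_km(rtt)) ** 2
                for (vx, vy), rtt in vantages
            )
            if err < best_err:
                best, best_err = (x, y), err
    return best

# Simulate a target at a made-up position and three attacker vantage points.
target = (300, 400)
vantage_points = [(0, 0), (1000, 0), (0, 1000)]
measurements = [
    (v, 2 * math.hypot(target[0] - v[0], target[1] - v[1]) / SPEED_KM_PER_MS)
    for v in vantage_points
]
print(locate(measurements))
```

With perfect measurements the grid search lands exactly on the simulated target. In practice queueing and routing detours add noise to every measurement, which is where the automation and the machine learning come in.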
That makes sense. It wouldn't give you very accurate data. But it'll get you within a hundred kilometers or so?
Though it seems like the solution here is an always-on VPN. Then the measurements would only get to your VPN endpoint, which is trivial to know from the IP address anyway.