Today was ... interesting. If you've followed me over the past few months on the shitbird site, you might have seen a bunch of angry German words, lots of graphs, and the occasional newspaper, radio, or TV snippet featuring yours truly. Let me explain.
In Austria, inflation is way above the EU average. ...
It would be nice to be able to bring to light the price gouging that is taking place at Canadian grocery stores.
At the bottom of the thread on Mastodon, the creator says they use the search APIs of the store websites. I wouldn't have expected those to be easily accessible!
Yeah, a lot of chains even have a documented, developer-friendly API. If that's not available, though, you can usually figure out the API just by looking at the calls your browser makes when visiting a page. Most sites fetch their catalog pages from a REST API and then render the results with JavaScript.
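Once you've spotted the catalog call in the network tab, the usual next step is to replay it with a few query parameters changed (next page, another search term, another store). A minimal sketch of that, using only the standard library; the host, path, and the `term`/`page`/`size` parameter names are made up for illustration and will differ per site.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_params(captured_url: str, **overrides: object) -> str:
    """Return captured_url with some query parameters replaced or added.

    captured_url is whatever you copied out of the browser's network tab.
    Which parameters exist (e.g. "page") depends entirely on the site.
    """
    parts = urlsplit(captured_url)
    query = parse_qs(parts.query, keep_blank_values=True)  # {name: [values]}
    for key, value in overrides.items():
        query[key] = [str(value)]  # overwrite existing values, or append a new parameter
    new_query = urlencode(query, doseq=True)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, new_query, parts.fragment))


# Hypothetical captured URL, not a real store endpoint.
captured = "https://api.example-grocer.test/v1/products/search?term=milk&page=1&size=48"
print(with_params(captured, page=2))
```

Parsing and re-encoding the query, rather than doing string replacement, keeps every other parameter the site expects intact.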
If that doesn't work, you can usually scrape everything with Selenium. It's a little harder, but still quite manageable; it usually has to run as a background job, though, because it's slow.
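A rough sketch of the "background job" part, assuming the slow step is wrapped in a function you supply. `slow_fetch` below only sleeps to simulate a slow browser-driven page load; it is a stand-in, not Selenium code. The job runs on its own thread, fetches one page at a time, and records failures instead of crashing.

```python
import threading
import time
from typing import Callable


def start_background_scrape(urls: list[str], fetch: Callable[[str], str]):
    """Run a slow scrape on a background thread so the caller isn't blocked.

    `fetch` is a placeholder for the slow part (e.g. a Selenium page load).
    Returns the thread plus the dict/list it fills in as it goes.
    """
    results: dict[str, str] = {}
    failed: list[str] = []

    def job() -> None:
        for url in urls:  # one page at a time, which is also gentler on the site
            try:
                results[url] = fetch(url)
            except Exception:
                failed.append(url)  # note it and keep going; retry on a later run

    thread = threading.Thread(target=job, name="scrape-job", daemon=True)
    thread.start()
    return thread, results, failed


def slow_fetch(url: str) -> str:
    time.sleep(0.1)  # stand-in for a slow headless-browser page load
    if url.endswith("/broken"):
        raise RuntimeError("page layout changed")
    return f"<html>{url}</html>"


urls = ["https://shop.example/p/1", "https://shop.example/p/2", "https://shop.example/broken"]
thread, results, failed = start_background_scrape(urls, slow_fetch)
print("main thread is free to do other work while the job runs")
thread.join()  # a real service would check back later instead of blocking here
print(len(results), "pages scraped, failed:", failed)
```

In practice you'd more likely hand this to a scheduler (cron, a task queue) than a bare thread, but the shape is the same: slow work off the main path, results written somewhere for later.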
The issue with this sort of thing is primarily one of data entry, rather than "tech savvy" as such. Defining the database is easy compared to getting the data in there.
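To put "defining the database is easy" in perspective, a workable starting schema is a dozen lines of SQL. This is one possible layout, sketched with Python's built-in `sqlite3`; the table and column names and the sample chain/product are hypothetical. Prices are stored as integer cents, and each observation records its source so scraped, API, and crowdsourced data can coexist.

```python
import sqlite3

SCHEMA = """
CREATE TABLE store   (id INTEGER PRIMARY KEY, chain TEXT NOT NULL, name TEXT NOT NULL);
CREATE TABLE product (id INTEGER PRIMARY KEY, chain TEXT NOT NULL, sku TEXT NOT NULL,
                      name TEXT NOT NULL, UNIQUE (chain, sku));
CREATE TABLE price   (product_id  INTEGER NOT NULL REFERENCES product(id),
                      store_id    INTEGER NOT NULL REFERENCES store(id),
                      observed_on TEXT NOT NULL,     -- ISO date, e.g. 2023-06-01
                      price_cents INTEGER NOT NULL,  -- integer cents avoid float rounding
                      source      TEXT NOT NULL,     -- 'api', 'scrape', 'receipt', ...
                      PRIMARY KEY (product_id, store_id, observed_on, source));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO store VALUES (1, 'ExampleMart', 'Main St')")
conn.execute("INSERT INTO product VALUES (1, 'ExampleMart', '12345', 'Milk 2% 4L')")
conn.executemany(
    "INSERT INTO price VALUES (?, ?, ?, ?, ?)",
    [(1, 1, "2023-06-01", 599, "api"), (1, 1, "2023-06-08", 649, "api")],
)

# Price history for one product at one store.
history = conn.execute(
    "SELECT observed_on, price_cents FROM price "
    "WHERE product_id = 1 AND store_id = 1 ORDER BY observed_on"
).fetchall()
print(history)
```

The hard part, as the comment says, is everything that feeds the `INSERT` statements.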
Quick options would include parsing the information out of the stores' websites (possible, but if JavaScript is involved you may be looking at puppeting a browser with Selenium, which isn't fast and can get tedious; the approach also depends on the websites being complete, accurate, and up to date), or hacking or snooping on the stores' own mobile apps (if they have them) to get price information in a usable format. Approaches like this are inherently brittle: even trivial changes on the grocery chains' end can break them. Scraping information without a defined API or the cooperation of the data's owner means chasing a moving target. From experience, I can tell you that it gets annoying fast.
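You can't remove the brittleness, but you can make breakage loud instead of silent. A minimal sketch, assuming a JSON response with a top-level `results` list whose items carry `code`, `name`, and `price` (all hypothetical field names): validate the shape up front and raise a dedicated error, so a site redesign stops the run instead of quietly writing garbage into the database.

```python
import json

# Hypothetical field names; each chain's response will look different.
REQUIRED_FIELDS = {"code": str, "name": str, "price": (int, float)}


class SchemaChanged(Exception):
    """The response no longer matches what the scraper was written against."""


def parse_products(raw: str) -> list[dict]:
    data = json.loads(raw)
    items = data.get("results") if isinstance(data, dict) else None
    if not isinstance(items, list):
        raise SchemaChanged("top-level 'results' list is missing")
    products = []
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            raise SchemaChanged(f"item {i} is not an object")
        for field, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(item.get(field), expected_type):
                raise SchemaChanged(f"item {i}: field {field!r} missing or wrong type")
        # Keep only the fields we rely on; extra fields are ignored.
        products.append({field: item[field] for field in REQUIRED_FIELDS})
    return products


ok = '{"results": [{"code": "123", "name": "Milk 2L", "price": 5.49, "promo": null}]}'
print(parse_products(ok))

# The same data after a hypothetical redesign nests the price one level deeper.
changed = '{"results": [{"code": "123", "name": "Milk 2L", "pricing": {"value": 5.49}}]}'
try:
    parse_products(changed)
except SchemaChanged as err:
    print("scraper needs fixing:", err)
```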
In the case of the Austrian government, they probably wanted that cooperation and a defined API, which would have required careful negotiations with each company and paid programmers looking at the corporate databases. That would have increased their costs and lengthened their projected timeframe. Corruption and corporate greed did the rest.
The responsible minister claimed it's an immense task that will take until autumn. It will only include 16 product categories (think flour, milk, etc.), and it will only be updated once a week.
I mean that's pretty pathetic. Better than nothing, but "only updated once a week" sounds like "the intern who has to enter the prices works only for 20 hours", not like they created an API and told the grocery chains to upload their prices.
Unknown. I don't use the grocery chains' websites (I'm of the "go to the nearest physical store and figure it out once there" persuasion), so I don't know what the complexity level would be. It's possible that they're all older-school sites where you can lift the data straight from the HTML, which is relatively fast.
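For the "lift the data straight from the HTML" case, Python's standard-library `html.parser` is enough for simple, server-rendered markup. This sketch assumes each product has a name element with class `name` followed by one with class `price`; those class names and the sample markup are invented, and a real site would need its own selectors.

```python
from html.parser import HTMLParser
from typing import Optional


class PriceParser(HTMLParser):
    """Collect (name, price) pairs from markup shaped like
    <div class="product"><span class="name">...</span><span class="price">$4.99</span></div>.
    The class names are assumptions, not any particular store's markup."""

    def __init__(self) -> None:
        super().__init__()
        self.current_field: Optional[str] = None  # which field the next text belongs to
        self.current: dict = {}
        self.items: list = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "name" in classes:
            self.current_field = "name"
        elif "price" in classes:
            self.current_field = "price"

    def handle_endtag(self, tag):
        self.current_field = None

    def handle_data(self, data):
        if self.current_field is None or not data.strip():
            return
        self.current[self.current_field] = data.strip()
        if "name" in self.current and "price" in self.current:
            price = float(self.current["price"].lstrip("$").replace(",", ""))
            self.items.append((self.current["name"], price))
            self.current = {}


html = """
<div class="product"><span class="name">Milk 2% 4L</span><span class="price">$6.49</span></div>
<div class="product"><span class="name">Flour 2.5kg</span><span class="price">$4.99</span></div>
"""
parser = PriceParser()
parser.feed(html)
print(parser.items)
```

For anything messier, a dedicated HTML library is more forgiving, but plain HTML like this is genuinely quick to parse.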
Are we looking to expand what’s here on the Grocery Tracker to incorporate what they are doing with the Austrian site?
I’d also like to look at other pinch points of government heel-dragging: housing, energy, medical, transportation, telecom, news, etc.
We all see these government contracts go out for seven figures, and the cost always turns out to be blown out of proportion.
A nice added bonus to the project in Austria was someone giving historical data. It would be great to have a similar leg up for Canada.
Firefox will basically hand you what you need to interact with APIs. Here is an example in PowerShell of getting a milk price from Superstore for a specific store.
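A rough Python counterpart of replaying a request like that, assuming you've copied the URL and the key out of Firefox's network tab. The host, path, `storeId` parameter, and `x-apikey` header name below are placeholders for illustration, not Superstore's actual API.

```python
import json
import urllib.request


def build_price_request(product_code: str, store_id: str, api_key: str) -> urllib.request.Request:
    # Hypothetical endpoint and header name; use whatever the real request
    # in your browser's network tab shows.
    url = f"https://api.example-grocer.test/product/{product_code}?storeId={store_id}"
    return urllib.request.Request(
        url,
        headers={"x-apikey": api_key, "Accept": "application/json"},
    )


def fetch_price(req: urllib.request.Request) -> dict:
    # Sends the request and decodes the JSON body (not called below).
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


req = build_price_request("12345", "1234", "KEY-FROM-DEVTOOLS")
print(req.get_method(), req.full_url)
print(req.get_header("X-apikey"))  # urllib stores header names capitalized
```

Nothing goes over the network until `fetch_price(req)` is called, which makes it easy to check what you're about to send.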
The real question is that apikey. I didn't see it change across browsers or anything. Is it hardcoded? Does it change regularly? Someone could probably find out if they did enough digging.
Even without an API it should be possible, in theory, to just parse the data directly from their websites.
This also gives the grocery stores less of a leg to stand on in terms of legal or practical recourse. They chose to create a publicly browsable database of their prices; all you're doing is browsing it.
Huh. Now you've got me wondering... Could you leave a device hidden in the store that receives the IR signals that program the tags, capture that information, and then parse it out later? You could literally log price changes at the shelves, in real time.
If you do get started with this, I'd love to follow along and find a place where I can help. If you guys make a community or Mastodon account, for example, please link it :)
Would a system identifying products from a receipt work for this? Combined with other data sources (like web scraping), it would make it a lot easier to crowdsource the data, even if only sorta technically inclined people do it.
It would help to some extent, but to really get people to buy in you'd need an app to do the heavy lifting (that is, it's easier to get people to snap a photo of their receipt than to type the info in one character at a time). Some people might still be willing to do it without, but how many?
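Once a receipt photo has been run through OCR, the next step is splitting the text into item lines and prices. A minimal sketch, assuming lines shaped like `DESCRIPTION   6.49`, optionally followed by a single-letter tax flag; that layout is an assumption about typical receipts, and the sample receipt is made up. Subtotal, total, and tax lines are skipped.

```python
import re

# "DESCRIPTION <spaces> 12.34", optionally followed by a one-letter tax flag.
LINE = re.compile(r"^(?P<desc>.+?)\s+(?P<price>\d+\.\d{2})(?:\s+[A-Z])?$")
SKIP_PREFIXES = ("SUBTOTAL", "TOTAL", "TAX", "HST", "GST")


def parse_receipt(text: str) -> list[tuple[str, float]]:
    items = []
    for raw in text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue  # headers, addresses, blank lines, OCR noise
        desc = m.group("desc").strip()
        if desc.upper().startswith(SKIP_PREFIXES):
            continue
        items.append((desc, float(m.group("price"))))
    return items


receipt = """
HOMO MILK 4L        6.49
BANANAS             1.32 H
SUBTOTAL            7.81
TOTAL               7.81
"""
print(parse_receipt(receipt))
```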
You'd also have to relate the abbreviations that often appear on grocery receipts back to the items they represent, which is more data entry.
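A sketch of that abbreviation step: an exact lookup in a (crowdsourced) table first, then a fuzzy fallback for OCR typos using the standard library's `difflib.get_close_matches`. The table contents and the 0.8 cutoff are illustrative assumptions; the table itself is exactly the data entry the comment is talking about.

```python
import difflib
from typing import Optional

# Hypothetical crowdsourced table: receipt abbreviation -> product name.
KNOWN = {
    "HOMO MILK 4L": "Homogenized milk 3.25%, 4 L",
    "FLR AP 2.5KG": "All-purpose flour, 2.5 kg",
    "BANANAS": "Bananas, per kg",
}


def resolve(abbrev: str, cutoff: float = 0.8) -> Optional[str]:
    key = abbrev.upper().strip()
    if key in KNOWN:
        return KNOWN[key]
    # Fall back to the closest known abbreviation, e.g. for OCR misreads.
    close = difflib.get_close_matches(key, KNOWN.keys(), n=1, cutoff=cutoff)
    return KNOWN[close[0]] if close else None


print(resolve("homo milk 4l"))  # exact match after normalizing case
print(resolve("HOMO MLK 4L"))   # OCR dropped a letter; fuzzy match still finds it
print(resolve("XYZZY"))         # unknown: goes back to the queue for a human
```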
Yeah, I was thinking something along the lines of lots n lots of easy, shitty data (e.g. anyone who can take a picture), some pretty good data (e.g. hand-labeled receipts), some 100% reliable data (scraped/API), and then some sort of system to correlate the three. Especially when prices match identically between receipt and API, a fair-sized database could build itself.
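One way the "prices match identically" idea could work, sketched under simple assumptions: receipt lines and API prices are keyed by store, day, and price in cents. A receipt line only votes for a product when exactly one API product has that price at that store on that day, and a mapping is accepted once enough receipts agree with none disagreeing. The tuple layouts, `min_votes=2`, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def correlate(receipt_items, api_prices, min_votes=2):
    """Guess which receipt text refers to which product.

    receipt_items: (store_id, day, receipt_text, price_cents) tuples
    api_prices:    (store_id, day, product_name, price_cents) tuples
    """
    names_by_price = defaultdict(set)
    for store, day, name, cents in api_prices:
        names_by_price[(store, day, cents)].add(name)

    votes = defaultdict(Counter)
    for store, day, text, cents in receipt_items:
        candidates = names_by_price.get((store, day, cents), set())
        if len(candidates) == 1:  # ambiguous price matches are skipped, not guessed
            votes[text][next(iter(candidates))] += 1

    links = {}
    for text, counter in votes.items():
        name, count = counter.most_common(1)[0]
        # Accept only when enough receipts agree and none point elsewhere.
        if count >= min_votes and len(counter) == 1:
            links[text] = name
    return links


api = [
    ("S1", "2023-06-01", "Milk 4L", 649),
    ("S1", "2023-06-01", "Flour 2.5kg", 499),
    ("S1", "2023-06-01", "Sugar 2kg", 499),
    ("S2", "2023-06-02", "Milk 4L", 649),
]
receipts = [
    ("S1", "2023-06-01", "HOMO MILK 4L", 649),
    ("S2", "2023-06-02", "HOMO MILK 4L", 649),
    ("S1", "2023-06-01", "FLR AP 2.5KG", 499),  # ambiguous: flour and sugar both cost 4.99
]
print(correlate(receipts, api))
```

Every accepted link can then label future receipts automatically, which is roughly how the database would grow on its own.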
It would also need some sort of processing center to handle the many image-processing requests, but maybe that could be done client-side.
Pretty easy to get something basic set up if you get enough people to crowd-source data with photos of stuff in grocery stores and their receipts, along with some scraping to get data that's available online. It's a project that's been on my backlog for a while, but I can bump it up if others want to join me in making this.
Can't; this would be illegal within a year. Scraping data is already taboo. How fucking dumb is that?
It's why I hate the 'starving artist worried about AI scraping' stories. They will be used to usher in stronger laws to prevent us from scraping this data. It's a double-edged sword.