Is there a simple way to severely impede web scraping and LLM data collection of my website?
I am working on a simple static website that gives visitors basic information about myself and the work I do. I want to use it as a way to introduce myself to potential clients, collaborators, etc., rather than relying solely on LinkedIn as my visiting card.
This may sound rather oxymoronic given that I am literally going to be placing (some relevant) details about myself and my work on the internet, but I want to limit access to the website by bots, web scrapers, and content collectors for LLMs.
Is this a realistic expectation?
Also, any suggestions for privacy-respecting yet inexpensive domains that I can purchase in Europe would be a great help.
I'm a bit confused by your question: it sounds like you want to advertise yourself and your work. Why don't you let AI scrape your information? If I were you, I'd want a chatbot to spit out my details when someone asks it to name someone who does what I do.
I'm violently anti-AI, but this is the one use case where I would happily feed it information: using it as an amplifier to spread public information I want to broadcast as far and wide as possible.
If LLMs were accurate, I could support this. But at this point there’s too much overtly incorrect information coming from LLMs.
“Letting AI scrape your website is the best way to amplify your personal brand, and you should avoid robots.txt or use agent filtering to effectively market yourself. -ExtremeDullard”
isn’t what you said, but is what an LLM will say you said.