I want to create an AI model as a way to learn about AI/ML, so I have scraped some data from Threads and Instagram. Now I am wondering how I can use this dataset to make an AI model or do something useful with it.
(BTW, I don't know anything about AI/ML. I did an internship as a data analyst, so I know a little bit about linear regression etc., but I don't know anything advanced.)
I would ignore the people who say you should deploy a model from someone else as that will teach you next to nothing about how this stuff works.
I would start with an older model and framework (e.g. scikit-learn) and go through all the processing, prediction, and evaluation steps using a model that's fairly simple to understand. Since you already know about linear regression, start with some of the linear models.
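To make the processing, prediction, and evaluation steps concrete, here is a minimal sketch with scikit-learn. The posts and labels are made up for illustration (1 = food, 0 = travel); with real scraped data you would need to create or derive your own labels. Logistic regression is used because it is a linear model and a close cousin of linear regression, so this is one possible starting point rather than the only one.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy posts and labels (1 = food, 0 = travel), for illustration only.
posts = [
    "best pizza in town tonight",
    "homemade pasta recipe with garlic",
    "trying a new burger place",
    "fresh bread and coffee for breakfast",
    "spicy noodles for dinner",
    "chocolate cake recipe that works",
    "flight to tokyo booked",
    "hiking trip in the mountains",
    "beach holiday photos from bali",
    "road trip across the coast",
    "train ride through the alps",
    "hotel view in paris",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Processing step 1: hold out part of the data so evaluation uses unseen posts.
X_train, X_test, y_train, y_test = train_test_split(
    posts, labels, test_size=0.25, random_state=0, stratify=labels
)

# Processing step 2 + model: turn text into TF-IDF features, then fit a linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(X_train, y_train)

# Prediction and evaluation on the held-out posts.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"held-out accuracy: {accuracy:.2f}")
```

With a dataset this small the accuracy number means very little; the point is the shape of the workflow (split, fit, predict, score), which stays the same when you swap in your own data or a different model.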
Then, and only then, would I worry about neural networks and deep learning, since the main difference is a non-linear activation function and a much more complicated set of weights (the model parameters, in linear regression terms).
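A rough NumPy sketch of that difference, using random numbers instead of real data. The shapes and the choice of ReLU as the activation are assumptions made for illustration. It also shows why the activation matters: without it, two stacked layers collapse back into a single linear model.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))  # 5 samples, 3 features

# Linear regression: one weight vector and one bias.
w = rng.normal(size=(3, 1))
b = rng.normal(size=(1,))
linear_out = x @ w + b

# Tiny neural network: two sets of weights with a non-linear activation in between.
W1 = rng.normal(size=(3, 4))
b1 = rng.normal(size=(4,))
W2 = rng.normal(size=(4, 1))
b2 = rng.normal(size=(1,))
hidden = np.maximum(0.0, x @ W1 + b1)  # ReLU activation
network_out = hidden @ W2 + b2

# Drop the activation and the two layers are equivalent to one linear map.
no_activation = (x @ W1 + b1) @ W2 + b2
collapsed = x @ (W1 @ W2) + (b1 @ W2 + b2)
layers_collapse = np.allclose(no_activation, collapsed)
print("two linear layers equal one linear layer:", layers_collapse)
```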
You're right. I read past the "I want to learn ML" and went straight to "do something useful with the data".
If the goal is to understand how modern LLMs work, it's also good to read up on RNNs and LSTMs. For this, 3Blue1Brown does an amazing job, and even posted an in-depth video about transformers. I'd watch that next, followed by implementing a simple transformer in PyTorch (perhaps using the existing blocks).
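As a warm-up before the PyTorch version, the core operation inside a transformer (scaled dot-product self-attention) fits in a few lines of NumPy. This is a simplified sketch: a single head, no masking, random weights, and made-up sizes (4 tokens, 8-dimensional embeddings). PyTorch's ready-made blocks handle multiple heads, masking, and training for you.

```python
import numpy as np

def softmax(scores):
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # Project each token embedding into query, key, and value vectors.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # Similarity of every token with every other token, scaled by sqrt(key dim).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores)  # each row sums to 1
    # Each output token is a weighted mix of all value vectors.
    return weights @ v, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))  # hypothetical: 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
output, weights = self_attention(tokens, Wq, Wk, Wv)
print(output.shape, weights.shape)
```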
You could argue that it's important to design everything from scratch first, but it's easier to first go high level, see how the network behaves, and then attempt to implement it yourself based on the paper. It is up to OP how comfortable he is with the topic though 😁
Depending on how much compute you have available, you can look into fine-tuning models from Hugging Face (e.g. Llama 3, or a smaller Phi model). Look into LoRA, and try to learn how the model you choose calculates the loss.
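To give a feel for what LoRA does, here is a minimal NumPy sketch of the idea, not the API of any particular library. The pretrained weight matrix W stays frozen and only two small matrices A and B are trained, with their product added to W. The sizes, rank, and scaling factor below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 64, 64, 4, 8  # hypothetical sizes

W = rng.normal(size=(d_out, d_in))         # frozen pretrained weight
A = rng.normal(size=(rank, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, rank))                # trainable, starts at zero

def lora_forward(x):
    # Original layer output plus the scaled low-rank update (B @ A).
    return x @ W.T + (alpha / rank) * (x @ A.T @ B.T)

x = rng.normal(size=(2, d_in))
full_params = W.size
lora_params = A.size + B.size
print(f"trainable parameters: {lora_params} instead of {full_params}")
```

Because B starts at zero, the adapted layer initially behaves exactly like the original one, and training only has to update the much smaller A and B. That is why it can fit on modest hardware.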
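There are various ways to train. One common objective (masked language modelling, as in BERT-style models) replaces random input tokens with the mask token and asks the model to recover them, while decoder-only models such as Llama are instead trained to predict the next token. I won't go into too much detail on this, because it's a lot to explain, and I suggest you read an article on it (link1 or link2).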
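A small pure-Python sketch of the masking idea, with next-token prediction alongside for comparison (decoder-only models such as Llama are typically trained that way). The function names, the `[MASK]` string, and the masking probability are illustrative assumptions; real implementations work on token IDs and are more elaborate.

```python
import random

MASK = "[MASK]"  # placeholder mask token, for illustration

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace random tokens with MASK; the loss is computed only at those positions."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets.append(tok)   # the model must recover this token
        else:
            masked.append(tok)
            targets.append(None)  # position ignored by the loss
    return masked, targets

def next_token_pairs(tokens):
    """Next-token prediction: the target is the input shifted left by one."""
    return tokens[:-1], tokens[1:]

tokens = "i scraped some posts from threads and instagram".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3)
inputs, shifted = next_token_pairs(tokens)
print(masked)
print(list(zip(inputs, shifted)))
```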
That's a great starting point! Your scraped data from Threads and Instagram can be a valuable resource for exploring AI/ML. Here's a general roadmap to get you started:
Understand Your Data: Before diving into AI/ML models, it's crucial to understand your data. Analyze the content you scraped from Threads and Instagram. What format is it in (text, images, videos)? What kind of information does it contain (captions, comments, user data)?
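A first pass at those questions can be done with the standard library alone. The records and field names below (`platform`, `media_type`, `caption`) are hypothetical; substitute whatever structure your scraper actually produced.

```python
from collections import Counter
from statistics import mean

# Hypothetical scraped records; real field names depend on your scraper.
records = [
    {"platform": "threads", "media_type": "text", "caption": "Learning ML this year"},
    {"platform": "instagram", "media_type": "image", "caption": "Sunset at the beach"},
    {"platform": "instagram", "media_type": "video", "caption": ""},
    {"platform": "threads", "media_type": "text", "caption": "Linear regression finally clicked"},
]

# What formats and sources are present?
media_counts = Counter(r["media_type"] for r in records)
platform_counts = Counter(r["platform"] for r in records)

# How much usable text is there?
empty_captions = sum(1 for r in records if not r["caption"].strip())
caption_lengths = [len(r["caption"].split()) for r in records if r["caption"].strip()]

print("media types:", dict(media_counts))
print("platforms:", dict(platform_counts))
print("records without a caption:", empty_captions)
print("average caption length in words:", mean(caption_lengths))
```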
Choose an AI/ML Approach: Based on your data and goals, you can explore different AI/ML techniques. Here are some options to consider:
- Text Analysis: If your data is text-heavy, you can use natural language processing (NLP) to analyze sentiment, topics, or emerging trends.
- Image Recognition: If you have a lot of images, you can use computer vision to identify objects, scenes, or classify images based on their content.
Start Simple: Begin with well-established algorithms like linear regression or decision trees. These can provide valuable insights without requiring deep learning expertise.
Utilize Online Resources: There are plenty of online tutorials and courses that can introduce you to AI/ML concepts. Platforms like Google Colab offer free computing resources to experiment with code.
Remember, this is an ongoing learning journey. Start with small steps, explore different resources, and don't be afraid to experiment!