Scraping Instagram with Python

In today’s post we are going to look at how you can extract information from a user’s Instagram profile. It’s surprisingly easy to extract profile information, such as the number of followers a user has, along with details and image files for the user’s most recent posts. With a bit more effort it would be relatively easy to extract large amounts of data about a user, and this could then be applied at a very broad scale to extract a large share of the public posts featured on Instagram’s site.

Imports & Setup

We begin by making our imports and writing the dunder init method for our class. Our code requires two packages not included in the standard library: requests, for making HTTP requests, and BeautifulSoup, to make HTML parsing more user-friendly. If you do not already have these libraries installed, you can install them with pip by running: pip install requests beautifulsoup4

The init method of our class takes two optional keyword arguments, which we simply store as instance attributes. This allows us to supply our own list of user agents in place of the default, and to route requests through a proxy should we wish to avoid detection.
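The original listing is not reproduced in this post, so the following is only a minimal sketch of such a constructor. The class name InstagramScraper, the parameter names user_agents and proxy, and the assumption that the proxy is a single URL string are all hypothetical. The small proxies property converts that string into the scheme-to-URL mapping that requests accepts for its proxies argument.

```python
from typing import Dict, List, Optional


class InstagramScraper:
    """Hypothetical skeleton; names are illustrative, not the original code."""

    def __init__(self, user_agents: Optional[List[str]] = None,
                 proxy: Optional[str] = None) -> None:
        # Optional caller-supplied list that replaces the default user agent.
        self.user_agents = user_agents
        # Optional proxy URL such as "http://host:port" (assumed format).
        self.proxy = proxy

    @property
    def proxies(self) -> Optional[Dict[str, str]]:
        # requests expects a {scheme: proxy_url} mapping, or None for no proxy.
        if self.proxy is None:
            return None
        return {"http": self.proxy, "https": self.proxy}
```

Both arguments default to None, so InstagramScraper() gives a scraper that uses the default user agent and connects directly.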

We then write two helper methods. The first is a very simple method that returns a random user agent. Switching user agents is a common best practice when web scraping and can help you avoid detection. If the caller of our class has provided their own list of user agents, we take a random agent from that list; otherwise, we return our default user agent.
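A minimal sketch of the user-agent helper, written as a standalone function so it can be tried in isolation. The name random_agent is hypothetical, and DEFAULT_USER_AGENT is a placeholder browser string, since the post does not show which default the original code used.

```python
import random
from typing import Optional, Sequence

# Placeholder default; the original default string is not shown in this post.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def random_agent(user_agents: Optional[Sequence[str]] = None) -> str:
    """Return a random agent from the caller's list, else the default."""
    if user_agents:
        # random.choice picks uniformly from a non-empty sequence.
        return random.choice(user_agents)
    return DEFAULT_USER_AGENT
```

Treating an empty list the same as no list means random.choice is never called on an empty sequence, where it would raise IndexError.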

Our second helper method is simply a wrapper around requests. We pass in a URL and make a request using the chosen user agent and proxy. If we are unable to make the request, or Instagram responds with a non-200 status code, we simply raise the error to the caller. If everything goes fine, we return the page’s HTML.
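A sketch of the request wrapper, assuming the requests library. The function name fetch_html and the timeout parameter are assumptions added for illustration. Rather than catching and re-raising, it lets request errors propagate unchanged, which has the same effect. It raises requests.HTTPError explicitly for any status other than 200, matching the behaviour described above (response.raise_for_status() on its own only flags 4xx and 5xx codes).

```python
from typing import Dict, Optional

import requests


def fetch_html(url: str, user_agent: str,
               proxies: Optional[Dict[str, str]] = None,
               timeout: float = 10.0) -> str:
    """Fetch a page and return its HTML, raising on any failure."""
    # Connection problems, timeouts and malformed URLs propagate to the
    # caller as subclasses of requests.exceptions.RequestException.
    response = requests.get(
        url,
        headers={"User-Agent": user_agent},
        proxies=proxies,
        timeout=timeout,
    )
    # Treat anything other than 200 OK as an error, as the post describes.
    if response.status_code != 200:
        raise requests.HTTPError(
            f"unexpected status {response.status_code} for {url}",
            response=response,
        )
    return response.text
```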

Extracting JSON from JavaScript

Instagram serves all of the information about a user in the form of a JavaScript object embedded in the profile page. This means that we can extract all of a user’s profile information and their recent posts by making a single HTTP request to their profile page. We simply need to turn this JavaScript object into JSON, which is very easy to do.

We can write a very hacky but effective method to extract the JSON from a user’s profile page. We apply the staticmethod decorator to this method, as it can be used without initializing our class. We create a soup from the HTML, select the body of the page and pull out the first ‘script’ tag. We then do a couple of text replacements on the script tag’s contents to derive a string which can be loaded into a dictionary using the json.loads method.
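A minimal sketch of the extraction step using BeautifulSoup and json. It assumes the first script tag in the body contains an assignment of the form window._sharedData = {...}. That prefix reflects how Instagram profile pages were built at the time and should be treated as an assumption, since the markup has changed over time. The name extract_shared_data is hypothetical. Only the prefix and a single trailing semicolon are stripped, so semicolons inside captions or bios are left intact.

```python
import json
from typing import Any, Dict

from bs4 import BeautifulSoup

# Assumed start of the inline script that holds the profile data.
PREFIX = "window._sharedData ="


def extract_shared_data(html: str) -> Dict[str, Any]:
    """Turn the first inline <script> in <body> into a Python dict."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("body")
    script = body.find("script") if body else None
    if script is None or script.string is None:
        raise ValueError("no inline <script> found in <body>")
    raw = script.string.strip()
    if not raw.startswith(PREFIX):
        raise ValueError("first script does not look like window._sharedData")
    # Drop the JavaScript assignment and the statement's trailing semicolon,
    # leaving a plain JSON object literal.
    raw = raw[len(PREFIX):].strip()
    if raw.endswith(";"):
        raw = raw[:-1]
    return json.loads(raw)
```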

Bringing it all together

We then bring it all together in two methods which we can use to extract information from this very large JSON object. Each one first makes a request to the page and then extracts the JSON. We then use two different selectors to pull out the relevant bits of information, as the full JSON object contains lots of information we don’t really need.

When extracting profile information, we extract all attributes from the “user” object, excluding their recent posts. In the “recent posts” method, we use a slightly different selector and pull out all the information about each of the recent posts made by our targeted user.
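The two selectors can be sketched as plain functions operating on the dictionary produced by the JSON extraction step. The key path (entry_data, ProfilePage[0], graphql, user) and the edge_owner_to_timeline_media key holding the recent posts are assumptions about the historical structure of Instagram's page data, not something this post confirms. The function names are hypothetical.

```python
from typing import Any, Dict, List

# Assumed location of the user object inside the page data; it may change.
USER_PATH = ("entry_data", "ProfilePage", 0, "graphql", "user")
# Assumed key under which the user's recent posts are stored.
POSTS_KEY = "edge_owner_to_timeline_media"


def _user(shared_data: Dict[str, Any]) -> Dict[str, Any]:
    node: Any = shared_data
    for key in USER_PATH:
        node = node[key]  # KeyError/IndexError if the layout differs
    return node


def profile_info(shared_data: Dict[str, Any]) -> Dict[str, Any]:
    """Every attribute of the user object except the recent posts."""
    return {k: v for k, v in _user(shared_data).items() if k != POSTS_KEY}


def recent_posts(shared_data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """The 'node' dictionary of each recent post."""
    edges = _user(shared_data).get(POSTS_KEY, {}).get("edges", [])
    return [edge["node"] for edge in edges if isinstance(edge.get("node"), dict)]
```

Keeping the path in one constant means that if Instagram moves the user object, only USER_PATH needs updating.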

Example Usage

We can then use the Instagram scraper very simply to pull out the most recent posts from our favorite users. You could do lots of things with the resulting data: it could feed an Instagram analytics app, for instance, or you could simply download all the images relating to that user programmatically.
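As one example of using the results, here is a sketch that downloads the images for a list of recent-post dictionaries. It assumes each post carries its full-size image URL under a display_url key, which is an assumption about Instagram's data rather than something shown in this post. The names image_urls and download_images are hypothetical.

```python
import os
from typing import Any, Dict, Iterable, List
from urllib.parse import urlparse

import requests


def image_urls(posts: Iterable[Dict[str, Any]]) -> List[str]:
    """Collect the image URL of each post that has one."""
    # "display_url" is an assumed field name for the post's image.
    return [p["display_url"] for p in posts if p.get("display_url")]


def download_images(posts: Iterable[Dict[str, Any]], dest_dir: str,
                    timeout: float = 10.0) -> List[str]:
    """Download each post image into dest_dir and return the saved paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for i, url in enumerate(image_urls(posts)):
        # Use the file name from the URL path, falling back to an index.
        name = os.path.basename(urlparse(url).path) or f"image_{i}.jpg"
        path = os.path.join(dest_dir, name)
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
        with open(path, "wb") as fh:
            fh.write(response.content)
        saved.append(path)
    return saved
```

Separating image_urls from the download loop makes it easy to inspect or filter the URLs before fetching anything.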

There is certainly room for improvement and modification. It would also be possible to use Instagram’s graph API to pull out further posts from a particular user, or to pull out lists of a user’s recent followers, allowing you to collect large amounts of data without having to deal with Facebook’s restrictive API limitations and policies.

Full Code

20 thoughts to “Scraping Instagram with Python”

  1. I tried the code in the example usage with a loop over 500 usernames and it breaks after 101. Do you know any reason for this?

    1. Are you being blocked? It’s possible that Instagram realizes that you are scraping their site and blocks you. Are you delaying requests, or just making all 101 requests back to back?

      1. Now it’s working perfectly fine. Thanks!
        With this code we can extract only up to 12 posts. Can we combine this with Selenium just to scroll down and then run BeautifulSoup again until the end? If so, can you guide me through it? Thanks!

    1. It appears that the best way to do this would be to use the Instagram graph API. However, this is protected with a hash which appears to be dynamically generated.

      I would recommend you try looking at using a browser rendering solution, such as Selenium, Splash or Pyppeteer.

  2. Thanks for posting!
    I am new to Python and trying to figure out how to use this code. I do not understand where I write the profile URL to be analyzed. I know it may sound like a stupid question, but I would appreciate any help here 🙂

    1. If you take a look at the example usage section, you will find an example of how you can use this code.

      You could simply copy the full code and then paste the example usage code below it. That is where you would change out the URL that you want to use.

      1. Thanks Edmund for the quick reply.
        I have already managed to do so with similar code I found published here:

        I learned how to push the data to Google Sheets and it is working fine.
        I do have a few questions: Is it possible to import information regarding a specific post by a user, or only bulk info regarding the last 12 posts?
        What about stories? Any way to get info regarding these?

        Thanks again!

        1. To get more information regarding a user’s posts, you have to take one of two routes. Option one would be to manipulate the Graph API, which was relatively easy in the past but has become more difficult since the Cambridge Analytica scandal. You should also be aware that you can get your account blocked. The second option is to use a browser automation solution such as Selenium. Even this has extraction limits, with Instagram limiting the rate at which you can scroll down pages and interact with page options. It’s really the same story with stories.

          1. I believe some of the rate limiting is applied on a per-account basis, but I could be mistaken.
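The replies above about being blocked after roughly 100 profiles, and about rate limits, both come down to how quickly requests are made. A minimal sketch of spacing out requests follows. The name throttled and the delay bounds are assumptions, not values known to stay under Instagram's limits, and the sleep parameter exists only so the function can be tested without actually waiting.

```python
import random
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(items: Iterable[T], min_delay: float = 2.0,
              max_delay: float = 5.0,
              sleep: Callable[[float], None] = time.sleep) -> Iterator[T]:
    """Yield items one at a time, pausing a random interval between them."""
    for i, item in enumerate(items):
        if i:  # no pause before the first item
            # A randomized gap looks less mechanical than a fixed one.
            sleep(random.uniform(min_delay, max_delay))
        yield item
```

Wrapping a list of usernames, as in for username in throttled(usernames), would insert a pause before each profile request after the first.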
