Scraping Instagram with Python

In today’s post we are going to look at how you can extract information from a user’s Instagram profile. It’s surprisingly easy to extract profile information, such as the number of followers a user has, along with the metadata and image files for a user’s most recent posts. With a bit of effort it would be relatively easy to extract large chunks of data regarding a user. This could then be applied at a very broad scale to extract a large share of the public posts featured on Instagram’s site.

Imports & Setup

We begin by making our imports and writing the dunder init method for our class. Our code requires two packages not included in the standard library: requests for making HTTP requests and BeautifulSoup to make HTML parsing more user friendly. If you do not already have these libraries installed, you can install them with pip: pip install requests beautifulsoup4

The init method of our class takes two optional keyword arguments, which we simply store in self. This will allow us to override the default user agent list and use a proxy should we wish to avoid detection.
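As a minimal sketch of the constructor described above: the class name InstagramScraper, the argument names user_agents and proxy, and the default user agent string are illustrative assumptions rather than the exact names used in the original code.

```python
class InstagramScraper:
    # Assumed placeholder default; any realistic browser user agent would do.
    DEFAULT_USER_AGENT = (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/70.0 Safari/537.36"
    )

    def __init__(self, user_agents=None, proxy=None):
        # Optional list of user agent strings to rotate through.
        self.user_agents = user_agents
        # Optional proxy URL, e.g. "http://127.0.0.1:8080".
        self.proxy = proxy
```

Both arguments default to None, so the class can be created with no configuration at all and customised only when needed.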

We then write two helper methods. First, we write a very simple method that returns a random user agent. Switching user agents is often a best practice when web scraping and can help you avoid detection. If the caller of our class has provided their own list of user agents, we take a random agent from that list. Otherwise we return our default user agent.
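The user agent helper could look roughly like this. It is written here as a standalone function so the snippet runs on its own; the function name random_agent and the default string are assumptions for illustration.

```python
import random

# Assumed placeholder default user agent.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/70.0 Safari/537.36"
)


def random_agent(user_agents=None):
    """Return a random agent from the caller's list, else the default."""
    if user_agents:
        return random.choice(user_agents)
    # No list supplied (or an empty one): fall back to the default.
    return DEFAULT_USER_AGENT
```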

Our second helper method is simply a wrapper around requests. We pass in a URL and try to make a request using the chosen user agent and proxy. If we are unable to make the request, or Instagram responds with a non-200 status code, we simply re-raise the error. If everything goes fine, we return the HTML of the page in question.
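A sketch of such a wrapper follows. The function name fetch_html, the timeout value and the optional session parameter are assumptions added for illustration. Note that raise_for_status only raises for 4xx and 5xx responses, which is slightly narrower than "any non-200 status".

```python
import requests


def fetch_html(url, user_agent, proxy=None, session=None, timeout=10):
    """Fetch a page and return its HTML, re-raising any request error."""
    # Allow a session to be passed in (handy for reuse and for testing).
    session = session or requests.Session()
    # requests expects a mapping of scheme to proxy URL.
    proxies = {"http": proxy, "https": proxy} if proxy else None
    try:
        response = session.get(
            url,
            headers={"User-Agent": user_agent},
            proxies=proxies,
            timeout=timeout,
        )
        # Raises requests.HTTPError for 4xx/5xx status codes.
        response.raise_for_status()
    except requests.RequestException:
        # Logging or retry logic could go here; we simply re-raise.
        raise
    return response.text
```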

Extracting JSON from JavaScript

Instagram serves all of the information regarding a user in the form of a JavaScript object embedded in the page. This means that we can extract all of a user’s profile information and their recent posts by just making an HTTP request to their profile page. We simply need to turn this JavaScript object into JSON, which is very easy to do.

We can write this very hacky but effective method to extract JSON from a user profile. We apply the static method decorator to this function, as it does not depend on any instance state and can be used without initializing our class. We simply create a soup from the HTML, select the body of the content and then pull out the first ‘script’ tag. We can then do a couple of text replacements on the script tag’s contents to derive a string that can be loaded into a dictionary using the json.loads method.
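The extraction step can be sketched as below. The "window._sharedData =" prefix and trailing semicolon are assumptions about how the script tag is laid out, and Instagram may change this markup at any time. The function is shown standalone; inside the class it would carry the @staticmethod decorator.

```python
import json

from bs4 import BeautifulSoup


def extract_json(html):
    """Pull the embedded JavaScript object out of the page as a dict."""
    soup = BeautifulSoup(html, "html.parser")
    # First <script> tag inside <body>, as described above.
    script = soup.find("body").find("script")
    raw = script.string.strip()
    # Assumed layout: `window._sharedData = {...};`
    raw = raw.replace("window._sharedData =", "", 1).rstrip(";")
    return json.loads(raw)
```

Because the function only needs an HTML string, it can be tried out against a saved copy of a profile page without making any requests.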

Bringing it all together

We then bring it all together in two functions which we can use to extract information from this very large JSON object. We first make a request to the page, before extracting the JSON result. We then use two different selectors to pull out the relevant bits of information, as the default JSON object has lots of information we don’t really need.

When extracting profile information we extract all attributes from the “user” object, excluding their recent posts. In the “recent posts” function, we use a slightly different selector and pull out all the information about all of the recent posts made by our targeted user.
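The two selectors might look something like the following. The key paths (entry_data, ProfilePage, graphql, user, edge_owner_to_timeline_media) are assumptions about the shape of the JSON object and should be checked against a real response. The function names are also illustrative.

```python
# Assumed key under which the user's recent posts are stored.
POSTS_KEY = "edge_owner_to_timeline_media"


def _user(shared_data):
    # Assumed path to the "user" object inside the page's JSON.
    return shared_data["entry_data"]["ProfilePage"][0]["graphql"]["user"]


def profile_info(shared_data):
    """Return every attribute of the user except their recent posts."""
    return {k: v for k, v in _user(shared_data).items() if k != POSTS_KEY}


def recent_posts(shared_data):
    """Return a list with one dict per recent post."""
    edges = _user(shared_data)[POSTS_KEY]["edges"]
    return [edge["node"] for edge in edges]
```

In the class itself, each of these would first fetch the page and extract the JSON before applying its selector.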

Example Usage

We can then use the Instagram scraper in a very simple fashion to pull out the most recent posts from our favorite users. You could do lots of things with the resulting data: it could be used in an Instagram analytics app, for instance, or you could simply programmatically download all the images relating to that user.
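As one illustration of the image download idea, the helper below turns a list of post dictionaries into (url, local path) pairs ready to be fetched. The display_url key and the function name image_targets are assumptions for this sketch; the actual download step is left out.

```python
import os
from urllib.parse import urlparse


def image_targets(posts, directory="images"):
    """Map each post to an (image URL, local file path) pair."""
    targets = []
    for post in posts:
        # Assumed key holding the full-size image URL.
        url = post.get("display_url")
        if not url:
            continue  # Skip posts without an image URL.
        # Use the last path segment of the URL as the file name.
        filename = os.path.basename(urlparse(url).path)
        targets.append((url, os.path.join(directory, filename)))
    return targets
```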

There is certainly room for improvement and modification. It would also be possible to use Instagram’s graph API to pull out further posts from a particular user, or lists of a user’s recent followers, allowing you to collect large amounts of data without having to deal with Facebook’s restrictive API limitations and policies.

Full Code
