Yelp API

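Using an API is faster and returns more accurate data, but many companies open up only part of their data to protect their competitive position. For example, user review data is very valuable for improving business operations, and Yelp restricts it in the API: only 3 reviews are returned per business ID. If you need all of the review data, you still have to use a scraper.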

The following article comes from the link.

STEP 1: OBTAINING ACCESS TO THE YELP API

Before you can use the Yelp API, you need to submit a developer request. This can be done here. I’m not sure what the requirements are, but my guess is they approve almost everyone. After getting access, you will need to get your API keys from the Manage API access section on the site.

STEP 2: GETTING THE RAUTH LIBRARY

Yelp’s API uses OAuth authentication for API calls. Unless you want to do a lot of work, I suggest that you use a third party library to handle the OAuth for you. For this tutorial I’m using rauth, but feel free to use any library of your choice.

You can use easy_install rauth or pip install rauth to download the library.

STEP 3: WRITE THE CODE TO QUERY THE YELP API

You’ll first need to figure out what information you actually want to query. The API Documentation gives you all of the different parameters that you can specify and the correct syntax.

For this example, we’re going to be doing some location-based searching for restaurants. If you store each of the search parameters in a dictionary, you can save yourself some formatting. Here’s a method that accepts a latitude and longitude and returns the search parameter dictionary:

def get_search_parameters(lat,long):
    #See the Yelp API for more details
    params = {}
    params["term"] = "restaurant"
    params["ll"] = "{},{}".format(str(lat),str(long))
    params["radius_filter"] = "2000"
    params["limit"] = "10"

    return params
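To see what this dictionary looks like once it is sent, here is a minimal sketch that encodes it with the standard library's urlencode. This is only for illustration: rauth builds the query string itself when you pass params=, and the exact encoding it produces may differ slightly. The function is repeated so the snippet runs on its own.

```python
from urllib.parse import urlencode


def get_search_parameters(lat, long):
    # Same parameters as in the tutorial's function
    params = {}
    params["term"] = "restaurant"
    params["ll"] = "{},{}".format(lat, long)
    params["radius_filter"] = "2000"
    params["limit"] = "10"
    return params


params = get_search_parameters(39.98, -82.98)

# urlencode percent-encodes the comma in the "ll" value
query = urlencode(params)
print(query)
```

Printing the query string is a quick way to check that a parameter name or value is spelled the way the API documentation expects before making a real call.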

Next we need to build our actual API call. Using the codes from the Manage API access page, we’re going to create an OAuth session. After we have a session, we can make an actual API call using our search parameters. Finally, we take that data and put it into a Python dictionary.

import rauth

def get_results(params):
    #Obtain these from Yelp's manage access page
    consumer_key = "YOUR_KEY"
    consumer_secret = "YOUR_SECRET"
    token = "YOUR_TOKEN"
    token_secret = "YOUR_TOKEN_SECRET"

    session = rauth.OAuth1Session(
        consumer_key = consumer_key
        ,consumer_secret = consumer_secret
        ,access_token = token
        ,access_token_secret = token_secret)

    request = session.get("http://api.yelp.com/v2/search",params=params)

    #Transforms the JSON API response into a Python dictionary
    data = request.json()
    session.close()

    return data

Now we can put it all together. Since Yelp will only return a max of 40 results at a time, you will likely want to make several API calls if you’re putting together any sort of sizable dataset. Currently, Yelp allows 10,000 API calls per day which should be way more than enough for compiling a dataset! However, when I’m making repeat API calls, I always make sure to rate-limit myself.

Companies with APIs will almost always have mechanisms in place to prevent too many requests from being made at once. Often this is done by IP address. They may have some code in place to only handle X calls in Y time per IP or X concurrent calls per IP, etc. If you rate limit yourself you can increase your chances of always getting back a response.

import time

def main():
    locations = [(39.98,-82.98),(42.24,-83.61),(41.33,-89.13)]
    api_calls = []
    for lat,long in locations:
        params = get_search_parameters(lat,long)
        api_calls.append(get_results(params))
        #Be a good internet citizen and rate-limit yourself
        time.sleep(1.0)

    ##Do other processing
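If the calls themselves take a noticeable amount of time, a fixed sleep waits longer than it needs to. One alternative is a small helper that only sleeps for whatever remains of a minimum interval. This is a sketch, not part of the original script: the Throttle class and its names are hypothetical, and the 0.2 second interval is chosen only to keep the demo short.

```python
import time


class Throttle:
    """Ensure at least min_interval seconds pass between calls to wait()."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None  # time of the previous wait(), None before the first

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


throttle = Throttle(0.2)
start = time.monotonic()
for _ in range(3):
    throttle.wait()
    # An API call such as get_results(params) would go here
elapsed = time.monotonic() - start
print("3 throttled iterations took {:.2f}s".format(elapsed))
```

The first call to wait() returns immediately, so three iterations cost roughly two intervals rather than three.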


At this point you have a list of dictionaries that represent each of the API calls you made. You can then do whatever additional processing you want to each of those dictionaries to extract the information you are interested in.
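As an illustration of that processing step, here is a minimal sketch that pulls the name and rating of each business out of the list of responses. It assumes each response has a "businesses" list whose entries carry "name" and "rating" keys; check those key names against the API documentation. The extract_businesses helper and the sample data are made up for the example and are not real Yelp output.

```python
def extract_businesses(api_calls):
    """Collect (name, rating) pairs from a list of response dictionaries."""
    rows = []
    for data in api_calls:
        # .get() keeps the loop going if a response has no "businesses" key,
        # for example when the API returned an error instead of results
        for business in data.get("businesses", []):
            rows.append((business.get("name"), business.get("rating")))
    return rows


# Hypothetical sample data standing in for real API responses
sample_calls = [
    {"businesses": [{"name": "Example Diner", "rating": 4.0}]},
    {"businesses": [{"name": "Sample Cafe", "rating": 3.5},
                    {"name": "No Rating Place"}]},
    {"error": {"text": "hypothetical error"}},
]

for name, rating in extract_businesses(sample_calls):
    print(name, rating)
```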

When working with a new API, I sometimes find it useful to open an interactive Python session and actually play with the API responses in the console. This helps me understand the structure so I can code the logic to find what I’m looking for.
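A small helper can make that exploration quicker. The sketch below walks a nested response and lists each key with the type of its value, looking only at the first element of any list. The describe function is hypothetical, and the sample dictionary is invented rather than a real response.

```python
def describe(value, indent=0, lines=None):
    """Return indented 'key: type' lines summarizing a nested structure."""
    if lines is None:
        lines = []
    pad = "  " * indent
    if isinstance(value, dict):
        for key, item in value.items():
            lines.append("{}{}: {}".format(pad, key, type(item).__name__))
            describe(item, indent + 1, lines)
    elif isinstance(value, list) and value:
        # Only the first element is inspected; the rest are assumed similar
        lines.append("{}[0]: {}".format(pad, type(value[0]).__name__))
        describe(value[0], indent + 1, lines)
    return lines


# Invented sample standing in for a real API response
sample = {"total": 1, "businesses": [{"name": "Example Diner", "rating": 4.0}]}
print("\n".join(describe(sample)))
```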

You can get this complete script here. Every API is different, but Yelp is a friendly introduction to the world of making API calls through Python. With this skill you can construct your own datasets from any of the companies with public APIs.
