iorewblaster.blogg.se - Webscraper for games

#Webscraper for games code

movies['us_grossMillions'] is stripped of the elements we don’t need, and now we’ll assign the converted data back to it to finish it up.

x.lstrip('$').rstrip('M') is the body of our function. It tells the function to strip the $ from the left side of each value and the M from the right side.

lambda x: defines an anonymous function in Python (one without a name). Normal functions are defined using the def keyword.

The map() function calls the specified function for each item of an iterable.

movies['us_grossMillions'] tells pandas to go to the column us_grossMillions in our DataFrame.

We’ll be assigning our new cleaned up data to our us_grossMillions column.

movies['us_grossMillions'] is our gross data in our movies DataFrame.

Looks like we have some unwanted elements in our data: dollar signs, Ms, mins, commas, parentheses, and extra white space in the Metascores.

So far so good, but we aren’t quite there yet.

us_gross.append(grosses) tells the scraper to take what we found and stored in grosses and to add it into our empty list called us_gross (which we created in the beginning).

If the length of nv isn’t greater than one, meaning the gross is missing, we put a dash there instead.
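The clean-up of the us_grossMillions values described above can be sketched in plain Python. This is only an illustration under a few assumptions: the helper names clean_gross and to_number and the sample values are hypothetical, and a plain list stands in for the DataFrame column.

```python
def clean_gross(values):
    """Strip a leading '$' and a trailing 'M' from each string in values."""
    # map() calls the lambda once for every item of the iterable
    return list(map(lambda x: x.lstrip('$').rstrip('M'), values))


def to_number(value):
    """Convert a cleaned string to float; return None when it is not numeric."""
    try:
        return float(value)
    except ValueError:
        # The '-' placeholder for a missing gross ends up here
        return None


# Hypothetical sample data, including the '-' used when the gross is missing
raw = ['$53.37M', '$0.35M', '-']
cleaned = clean_gross(raw)
numbers = [to_number(v) for v in cleaned]

print(cleaned)
print(numbers)
```

With pandas, the same lambda could probably be passed straight to the column, as in movies['us_grossMillions'].map(lambda x: x.lstrip('$').rstrip('M')), and a conversion such as pd.to_numeric with errors='coerce' would play the role of to_number here.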

nv[1].text if len(nv) > 1 else '-' says: if the length of nv is greater than one, take the text of the second item that’s stored; otherwise, use a dash.
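To see the fallback on its own, here is a small sketch that uses plain lists of strings in place of the scraped tags. The function name second_or_dash and the sample values are hypothetical.

```python
def second_or_dash(items):
    """Return the second item when it exists, otherwise '-'."""
    # Checking len() first avoids an IndexError on lists that are too short
    return items[1] if len(items) > 1 else '-'


print(second_or_dash(['1,234,567', '$53.37M']))  # votes and gross both present
print(second_or_dash(['1,234,567']))             # gross missing
```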

nv[1] tells the scraper to go into nv and grab the second item in the list, which is the gross, because gross comes second in our HTML code.

grosses is the variable we’ll use to store the gross we find in the nv tag.

votes.append(vote) tells the scraper to take what we found and stored in vote and to add it into our empty list called votes (which we created in the beginning).

.text tells the scraper to grab the text inside that tag.

nv[0] tells the scraper to go into nv and grab the first item in the list, which is the votes, because votes come first in our HTML code (Python lists are zero-indexed, so counting starts at 0, not 1).

vote is the variable we’ll use to store the votes we find in the nv tag.

('span', attrs={'name': 'nv'}) is how we can grab attributes of that specific tag.

find_all() is the method we’ll use to grab both of the tags.

container is the variable we used in our for loop; it holds the current item on each iteration.

nv is an entirely new variable we’ll use to hold both the votes and the gross tags.
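Put together, the votes and gross steps might look like the sketch below. It assumes the BeautifulSoup library (bs4) is installed. The HTML snippet and the div class item are made up for illustration; only the span tags with name nv, the variable names, and the dash fallback come from the text above.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: the first block has votes and gross, the second only votes
html = """
<div class="item"><span name="nv">1,234</span><span name="nv">$53.37M</span></div>
<div class="item"><span name="nv">567</span></div>
"""

soup = BeautifulSoup(html, 'html.parser')

# Empty lists created at the beginning, filled inside the loop
votes = []
us_gross = []

for container in soup.find_all('div', class_='item'):
    # Grab every span whose name attribute is 'nv' (votes first, gross second)
    nv = container.find_all('span', attrs={'name': 'nv'})

    vote = nv[0].text
    votes.append(vote)

    # Fall back to a dash when the gross span is missing
    grosses = nv[1].text if len(nv) > 1 else '-'
    us_gross.append(grosses)

print(votes)
print(us_gross)
```

Passing the attribute through attrs is needed here because name is already the first parameter of find_all(), so it cannot be used as a keyword filter.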