Channel: Active questions tagged feedparser - Stack Overflow

Import RSS with FeedParser and Get Both Posts and General Information to Single Pandas DataFrame

I am a Python novice working on an exercise to practice importing data in Python. Eventually I want to analyze data from different podcasts (information on the podcasts themselves and on every episode) by putting the data into a coherent dataframe and working on it with NLP.

So far I have managed to read a list of RSS feeds and get the information on every single episode of the RSS feed (a post).

But I am having trouble finding an integrated workflow in Python that gathers both

  1. information on every single episode of the RSS feed (a post)
  2. and general information about the RSS feed (like the title of the podcast) in one go.

Code

This is what I have got so far:

    import feedparser
    import pandas as pd

    # number of feeds is reduced for testing
    rss_feeds = [
        'http://feeds.feedburner.com/TEDTalks_audio',
        'https://joelhooks.com/rss.xml',
        'https://www.sciencemag.org/rss/podcast.xml',
    ]

    posts = []
    feed = []
    for url in rss_feeds:
        feed = feedparser.parse(url)
        for post in feed.entries:
            posts.append((post.title, post.link, post.summary))

    df = pd.DataFrame(posts, columns=['title', 'link', 'summary'])

Output

The dataframe includes 652 non-null objects in each of the three columns (as intended), which is basically every post made in every podcast. The column title refers to the title of the episode, not to the title of the podcast (which in this example is 'Ted Talk Daily').

   title                                              link                                               summary
0  3 questions to ask yourself about everything y...  https://www.ted.com/talks/stacey_abrams_3_ques...  How you respond to setbacks is what defines yo...
1  What your sleep patterns say about your relati...  https://www.ted.com/talks/tedx_shorts_what_you...  Wendy Troxel looks at the cultural expectation...
2  How we can actually pay people enough -- with ...  https://www.ted.com/talks/ted_business_how_we_...  Capitalism urgently needs an upgrade, says Pay...

I am struggling to find a way to include the title of the podcast in this dataframe, too. I always get an error when I try to select parts of the feed-level information, e.g. ['feed']['title'].

Thanks for any hints on this!

Source

I adapted what I have so far from this source: Get Feeds from FeedParser and Import to Pandas DataFrame
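
On the question of getting the podcast title next to every episode, one possible approach is sketched below. It is not a confirmed fix for the error described above, since the failing code is not shown. It relies on the object returned by feedparser.parse(url) keeping feed-level metadata under 'feed' and the episodes under 'entries', so the title can be read once per feed and repeated on each row. The names rows_from_parsed, build_dataframe and the 'podcast' column are made up for illustration, and the sample values are invented so the row shape can be checked without network access.

```python
def rows_from_parsed(parsed):
    """Return one dict per episode, each carrying the feed-level title.

    `parsed` is assumed to look like the result of feedparser.parse():
    a mapping with a 'feed' mapping and an 'entries' list.
    """
    # Feed-level metadata lives under 'feed', episodes under 'entries'.
    podcast_title = parsed["feed"].get("title")
    rows = []
    for post in parsed["entries"]:
        rows.append({
            "podcast": podcast_title,
            # .get() yields None instead of raising when a field is absent.
            "title": post.get("title"),
            "link": post.get("link"),
            "summary": post.get("summary"),
        })
    return rows


def build_dataframe(urls):
    """Parse every URL and collect all episodes into one DataFrame."""
    # Third-party imports are kept local so rows_from_parsed can be
    # tried on plain dicts without feedparser or pandas installed.
    import feedparser
    import pandas as pd

    rows = []
    for url in urls:
        rows.extend(rows_from_parsed(feedparser.parse(url)))
    return pd.DataFrame(rows, columns=["podcast", "title", "link", "summary"])


# Stand-in for one parsed feed (invented values, not real feed data).
sample = {
    "feed": {"title": "Example Podcast"},
    "entries": [
        {"title": "Episode 1", "link": "https://example.com/1", "summary": "First."},
        {"title": "Episode 2", "link": "https://example.com/2"},  # no summary
    ],
}
sample_rows = rows_from_parsed(sample)
```

With the list from the question, df = build_dataframe(rss_feeds) would then give one row per episode with the podcast title in its own column. Using .get() rather than attribute access is a design choice here: feeds that omit a summary produce an empty cell instead of stopping the loop.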

