Channel: Active questions tagged feedparser - Stack Overflow

Why does feedparser stop without any error message when pulling an RSS channel in Python 3?


I am using feedparser 6.0.2 to parse RSS feeds in Python 3.10. When I call feedparser to fetch a feed on CentOS 7.x, the call just exits without any exception message. Why does this happen? My Python code looks like this:

def feeder_parse(self, source: any, level: str, task_id: str):
    try:
        logger.info(str(level) + " level feeder parse source:" + source.sub_url + ",task_id:" + task_id)
        if hasattr(ssl, '_create_unverified_context'):
            ssl._create_default_https_context = ssl._create_unverified_context
        #
        # when fetch rss subscribe content
        # we should send the etag and last modified info
        # so the server will tell us if the source not modified
        # it will save the network traffic
        # and make massive rss source update possible
        #
        logger.info(str(level) + " level set ssl:" + source.etag + ",last modified:" + source.last_modified + ",sub url:" + source.sub_url)
        feed = feedparser.parse(source.sub_url,
                                etag=source.etag if source.etag is None else ast.literal_eval(source.etag),
                                modified=source.last_modified)
        logger.info(str(level) + " level get feeder:")
        if not hasattr(feed, 'status'):
            logger.error("do not contain status:" + source.sub_url)
            return
        if feed.status == 522:
            RssSource.unsubscribe(source, -1)
            return
        if feed.status == 403:
            RssSource.unsubscribe(source, -2)
            return
        if feed.status == 304:
            self.sub_source_update_dynamic_interval(source, 5)
            return
        etag = ''
        last_modified = ''
        if hasattr(feed, 'etag'):
            etag = feed.etag
        if hasattr(feed, 'updated'):
            last_modified = feed.updated
        if feed.entries is None or len(feed.entries) == 0:
            logger.warn(str(level) + " level get null entry:" + source.sub_url + ",task_id:" + task_id)
            return
        for entry in feed.entries:
            logger.info(str(level) + " level get entry:" + source.sub_url + ",task_id:" + task_id)
            source.etag = etag
            source.last_modified = last_modified
            RssParser.parse_single(entry, source, level)
        self.sub_source_meta_compare(source, feed)
        self.parse_fav_icon(source)
    except requests.ReadTimeout:
        logger.error(str(level) + " level read timeout:" + source.sub_url + ",task_id:" + task_id)
        return
    except socket.timeout:
        logger.error(str(level) + " level socket timeout:" + source.sub_url + ",task_id:" + task_id)
        return
    except RemoteDisconnected:
        logger.error(str(level) + " level remote disconnected:" + source.sub_url + ",task_id:" + task_id)
        return
    except URLError:
        logger.error(str(level) + " level read url error:" + source.sub_url + ",task_id:" + task_id)
        return
    except Exception as e:
        logger.error("feed parser error, url:" + source.sub_url, e)
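One detail in the last `except Exception` handler is worth checking: it passes `e` as a second argument to `logger.error`, but the message has no `%s` placeholder. With the standard `logging` module, such a record fails while it is being formatted, so the handler writes nothing to the log and only reports the formatting failure on stderr. The sketch below reproduces that behaviour in isolation. The `make_logger` helper, the logger name `feed_demo`, and the example URL are hypothetical and not part of the original code, and whether this is what actually happens inside the Celery worker is an assumption.

```python
import io
import logging


def make_logger(name, stream):
    """Hypothetical helper: build an isolated logger that writes to `stream`."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


log_output = io.StringIO()
demo_logger = make_logger("feed_demo", log_output)
demo_url = "https://example.invalid/feed"  # placeholder URL for illustration

try:
    raise ValueError("simulated failure")
except Exception as e:
    # Same shape as the handler in the question: an extra positional
    # argument, but no "%s" in the message to consume it.
    demo_logger.error("feed parser error, url:" + demo_url, e)

# Nothing reached the log stream; logging reported the problem on stderr only.
swallowed = log_output.getvalue()

try:
    raise ValueError("simulated failure")
except Exception:
    # logger.exception logs at ERROR level and appends the traceback.
    demo_logger.exception("feed parser error, url: %s", demo_url)

recorded = log_output.getvalue()
print(repr(swallowed))
print(recorded)
```

If the worker's stderr is not captured, an exception that lands in that handler would leave no trace in the log file, which would look like the function stopping silently.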

Part of the log output looks like this:

2022-05-13 18:21:00,599 - RssParser.py:25 - cruise-task-executor - 2 level feeder parse source:https://incidentdatabase.ai/rss.xml,task_id:eed2adee-9f92-4d5c-8d59-fe672371ab9f
2022-05-13 18:21:00,599 - RssParser.py:43 - cruise-task-executor - 2 level set ssl:ecc8bd172ed1d9a9f17d999074b15c30-ssl-df,last modified:,sub url:https://incidentdatabase.ai/rss.xml
2022-05-13 18:21:00,638 - RssParser.py:25 - cruise-task-executor - 2 level feeder parse source:https://blog.crunchydata.com/blog/rss.xml,task_id:ec57d2ba-8a38-45c0-9b8b-46fc77e036ee
2022-05-13 18:21:00,638 - RssParser.py:43 - cruise-task-executor - 2 level set ssl:,last modified:,sub url:https://blog.crunchydata.com/blog/rss.xml
2022-05-13 18:21:00,727 - RssParser.py:25 - cruise-task-executor - 2 level feeder parse source:http://www.jenlawrence.org/feed,task_id:8ce67905-9a53-463c-aec7-f8c5ca4012e4
2022-05-13 18:21:00,728 - RssParser.py:43 - cruise-task-executor - 2 level set ssl:b627eec4f37d61083b462f430e8a46bf,last modified:Mon, 04 Apr 2022 23:09:39 GMT,sub url:http://www.jenlawrence.org/feed
2022-05-13 18:21:00,778 - RssParser.py:25 - cruise-task-executor - 2 level feeder parse source:https://www.cicoding.cn/atom.xml,task_id:43007f6d-483c-46f0-9d0a-43742cc317af
2022-05-13 18:21:00,778 - RssParser.py:43 - cruise-task-executor - 2 level set ssl:,last modified:Thu, 11 Mar 2021 05:53:55 GMT,sub url:https://www.cicoding.cn/atom.xml
2022-05-13 18:21:00,860 - RssParser.py:25 - cruise-task-executor - 2 level feeder parse source:https://tlanyan.me/feed,task_id:418b70a4-a7a0-468a-a0a7-e29ef264fff7
2022-05-13 18:21:00,860 - RssParser.py:43 - cruise-task-executor - 2 level set ssl:"d5df98785beddd62763cc8ea64ac659f",last modified:,sub url:https://tlanyan.me/feed
2022-05-13 18:21:13,948 - RssParser.py:87 - cruise-task-executor - 2 level read url error:https://tlanyan.me/feed,task_id:418b70a4-a7a0-468a-a0a7-e29ef264fff7
2022-05-13 18:24:00,155 - RssParser.py:25 - cruise-task-executor - 3 level feeder parse source:https://www.cnbeta.com/backend.php,task_id:87c75c99-89e0-4260-a54d-7c98575ad5dc
2022-05-13 18:24:00,155 - RssParser.py:43 - cruise-task-executor - 3 level set ssl:,last modified:,sub url:https://www.cnbeta.com/backend.php

You can see from the log that it never printed the line:

logger.info(str(level) +" level get feeder:")

It seems the RSS scraping stopped without any error. Why did this happen, and what should I do to fix it? I tried printing the variables and running the function in a unit test, and there it works fine. I still cannot figure out what is going wrong after reading the function again and again. Can someone help me find the problem? The scheduled task runs as a Celery task.
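One way to narrow this down is to run the etag values printed in the log through `ast.literal_eval` on their own, since that call sits between the last log line that appears and the first one that does not. In the log, the only feed that reaches a later log line (the URL error) is the one whose stored etag is wrapped in double quotes. The sketch below checks how `ast.literal_eval` treats each kind of value and shows a tolerant alternative. `literal_eval_outcome` and `decode_etag` are hypothetical helpers, not part of feedparser or of the original code, and the idea that this call is where the function stops is an assumption rather than a confirmed diagnosis.

```python
import ast

# Etag values copied from the log output in the question.
LOGGED_ETAGS = [
    "ecc8bd172ed1d9a9f17d999074b15c30-ssl-df",
    "",
    "b627eec4f37d61083b462f430e8a46bf",
    '"d5df98785beddd62763cc8ea64ac659f"',
]


def literal_eval_outcome(raw):
    """Hypothetical helper: report what ast.literal_eval does with `raw`."""
    try:
        return ("ok", ast.literal_eval(raw))
    except (ValueError, SyntaxError) as exc:
        return ("error", type(exc).__name__)


def decode_etag(raw):
    """Hypothetical helper: return an etag to send, or None if there is none.

    Falls back to the raw stored value when it is not a quoted Python string.
    """
    if not raw:
        return None
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw
    return value if isinstance(value, str) else raw


for raw in LOGGED_ETAGS:
    print(repr(raw), literal_eval_outcome(raw), repr(decode_etag(raw)))
```

Passing `etag=decode_etag(source.etag)` to `feedparser.parse` would avoid an exception on unquoted or empty values, at the cost of sending the stored string unchanged when it is not a Python literal.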

