I am using feedparser 6.0.2 (`feedparser==6.0.2`)
to parse some RSS feeds in Python 3.10. When I run feedparser on CentOS 7.x, the call just exits without raising any exception. Why does this happen? This is what my Python code looks like:
```python
def feeder_parse(self, source: any, level: str, task_id: str):
    try:
        logger.info(str(level) + " level feeder parse source:" + source.sub_url + ",task_id:" + task_id)
        if hasattr(ssl, '_create_unverified_context'):
            ssl._create_default_https_context = ssl._create_unverified_context
        #
        # when fetching the rss subscription content
        # we should send the etag and last-modified info,
        # so the server can tell us if the source has not been modified;
        # this saves network traffic
        # and makes updating a massive number of rss sources possible
        #
        logger.info(str(level) + " level set ssl:" + source.etag + ",last modified:" + source.last_modified + ",sub url:" + source.sub_url)
        feed = feedparser.parse(source.sub_url,
                                etag=source.etag if source.etag is None else ast.literal_eval(source.etag),
                                modified=source.last_modified)
        logger.info(str(level) + " level get feeder:")
        if not hasattr(feed, 'status'):
            logger.error("do not contain status:" + source.sub_url)
            return
        if feed.status == 522:
            RssSource.unsubscribe(source, -1)
            return
        if feed.status == 403:
            RssSource.unsubscribe(source, -2)
            return
        if feed.status == 304:
            self.sub_source_update_dynamic_interval(source, 5)
            return
        etag = ''
        last_modified = ''
        if hasattr(feed, 'etag'):
            etag = feed.etag
        if hasattr(feed, 'updated'):
            last_modified = feed.updated
        if feed.entries is None or len(feed.entries) == 0:
            logger.warn(str(level) + " level get null entry:" + source.sub_url + ",task_id:" + task_id)
            return
        for entry in feed.entries:
            logger.info(str(level) + " level get entry:" + source.sub_url + ",task_id:" + task_id)
            source.etag = etag
            source.last_modified = last_modified
            RssParser.parse_single(entry, source, level)
        self.sub_source_meta_compare(source, feed)
        self.parse_fav_icon(source)
    except requests.ReadTimeout:
        logger.error(str(level) + " level read timeout:" + source.sub_url + ",task_id:" + task_id)
        return
    except socket.timeout:
        logger.error(str(level) + " level socket timeout:" + source.sub_url + ",task_id:" + task_id)
        return
    except RemoteDisconnected:
        logger.error(str(level) + " level remote disconnected:" + source.sub_url + ",task_id:" + task_id)
        return
    except URLError:
        logger.error(str(level) + " level read url error:" + source.sub_url + ",task_id:" + task_id)
        return
    except Exception as e:
        logger.error("feed parser error, url:" + source.sub_url, e)
```
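To make the conditional-GET part clearer, here is a minimal standalone sketch of what the comments in the function describe. The URL is just a placeholder, not one of my real sources, and I am only using the `etag`/`modified` arguments and `status` attribute that feedparser itself provides:

```python
# Minimal sketch of the conditional GET described in the comments above.
# The URL below is a placeholder, not one of my real subscription sources.
import feedparser

url = "https://example.com/feed.xml"

first = feedparser.parse(url)
if getattr(first, "status", None) == 200:
    etag = getattr(first, "etag", None)
    modified = getattr(first, "modified", None)

    # Send the cached validators back on the next poll. If the feed has not
    # changed, a well-behaved server answers 304 and feedparser returns an
    # empty entries list, which is what saves the network traffic.
    second = feedparser.parse(url, etag=etag, modified=modified)
    print(second.status, len(second.entries))
```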
Part of the log output looks like this:
```
2022-05-13 18:21:00,599 - RssParser.py:25 - cruise-task-executor - 2 level feeder parse source:https://incidentdatabase.ai/rss.xml,task_id:eed2adee-9f92-4d5c-8d59-fe672371ab9f
2022-05-13 18:21:00,599 - RssParser.py:43 - cruise-task-executor - 2 level set ssl:ecc8bd172ed1d9a9f17d999074b15c30-ssl-df,last modified:,sub url:https://incidentdatabase.ai/rss.xml
2022-05-13 18:21:00,638 - RssParser.py:25 - cruise-task-executor - 2 level feeder parse source:https://blog.crunchydata.com/blog/rss.xml,task_id:ec57d2ba-8a38-45c0-9b8b-46fc77e036ee
2022-05-13 18:21:00,638 - RssParser.py:43 - cruise-task-executor - 2 level set ssl:,last modified:,sub url:https://blog.crunchydata.com/blog/rss.xml
2022-05-13 18:21:00,727 - RssParser.py:25 - cruise-task-executor - 2 level feeder parse source:http://www.jenlawrence.org/feed,task_id:8ce67905-9a53-463c-aec7-f8c5ca4012e4
2022-05-13 18:21:00,728 - RssParser.py:43 - cruise-task-executor - 2 level set ssl:b627eec4f37d61083b462f430e8a46bf,last modified:Mon, 04 Apr 2022 23:09:39 GMT,sub url:http://www.jenlawrence.org/feed
2022-05-13 18:21:00,778 - RssParser.py:25 - cruise-task-executor - 2 level feeder parse source:https://www.cicoding.cn/atom.xml,task_id:43007f6d-483c-46f0-9d0a-43742cc317af
2022-05-13 18:21:00,778 - RssParser.py:43 - cruise-task-executor - 2 level set ssl:,last modified:Thu, 11 Mar 2021 05:53:55 GMT,sub url:https://www.cicoding.cn/atom.xml
2022-05-13 18:21:00,860 - RssParser.py:25 - cruise-task-executor - 2 level feeder parse source:https://tlanyan.me/feed,task_id:418b70a4-a7a0-468a-a0a7-e29ef264fff7
2022-05-13 18:21:00,860 - RssParser.py:43 - cruise-task-executor - 2 level set ssl:"d5df98785beddd62763cc8ea64ac659f",last modified:,sub url:https://tlanyan.me/feed
2022-05-13 18:21:13,948 - RssParser.py:87 - cruise-task-executor - 2 level read url error:https://tlanyan.me/feed,task_id:418b70a4-a7a0-468a-a0a7-e29ef264fff7
2022-05-13 18:24:00,155 - RssParser.py:25 - cruise-task-executor - 3 level feeder parse source:https://www.cnbeta.com/backend.php,task_id:87c75c99-89e0-4260-a54d-7c98575ad5dc
2022-05-13 18:24:00,155 - RssParser.py:43 - cruise-task-executor - 3 level set ssl:,last modified:,sub url:https://www.cnbeta.com/backend.php
```
You can see from the log that it never printed the line:
logger.info(str(level) +" level get feeder:")
It seems the RSS scraping just stopped without any error. Why did this happen, and what should I do to fix it? I have tried printing the variables and calling the same code from a unit test function (a simplified sketch is below), and there it works fine. I still could not figure out what is going wrong after reading the function again and again. Can someone give me a hand finding the problem? The scheduled job runs as a Celery task.
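This is roughly the kind of standalone check I used outside Celery; it is simplified, and the URL and the 30-second timeout are placeholder choices, not values from the project:

```python
# Simplified sketch of the standalone check that works fine outside Celery.
# The URL and the 30-second timeout are placeholder choices for illustration.
import socket
import feedparser

# feedparser.parse() has no timeout parameter, so the global socket default
# is the only timeout that applies to the underlying HTTP request.
socket.setdefaulttimeout(30)

def test_feeder_parse_standalone():
    feed = feedparser.parse("https://incidentdatabase.ai/rss.xml")
    # Outside Celery this returns normally and 'status' is present,
    # which is why I cannot reproduce the silent stop here.
    assert hasattr(feed, "status")
    print("status:", feed.status, "entries:", len(feed.entries))

if __name__ == "__main__":
    test_feeder_parse_standalone()
```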