Newspaper is a Python library for extracting and curating articles, inspired by requests for its simplicity and powered by lxml for its speed. It delivers Instapaper-style article extraction and works in 10+ languages, including English, Chinese, German, and Arabic. If you are certain that an entire news source is in one language, you can use the same API for the whole source.

Newspaper is a Python 3 library. On Python 3 you must install newspaper3k, not newspaper; the package named newspaper is the Python 2 library. Installing with pip is simple, but you will run into fixable issues if you are installing on Ubuntu.

Source objects are an abstraction of online news media websites like CNN or ESPN. You can initialize them in two different ways. Building a Source extracts its categories, feeds, articles, brand, and description for you. You may also provide configuration parameters such as language and browser_user_agent.
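As a rough illustration of the Source workflow described above, here is a minimal sketch, assuming newspaper3k is installed and the site is reachable. The helper names build_source and describe_source are hypothetical and not part of the library; the library calls used are newspaper.build, category_urls(), feed_urls(), and the brand, description, and articles attributes. The other way to initialize a source is to construct Source(url) yourself and call its build() method later.

```python
def build_source(url, language="en", user_agent=None):
    """Build a newspaper Source for `url` (hypothetical helper, uses the network)."""
    # Imported lazily so this module loads even where newspaper3k is absent.
    import newspaper  # installed as the newspaper3k package on Python 3

    # Configuration parameters are passed as keyword arguments.
    options = {"language": language}
    if user_agent is not None:
        options["browser_user_agent"] = user_agent

    # newspaper.build() creates the Source and builds it in one step.
    return newspaper.build(url, **options)


def describe_source(source):
    """Collect what building a Source extracted into a plain dict (hypothetical helper)."""
    return {
        "brand": source.brand,
        "description": source.description,
        "categories": source.category_urls(),
        "feeds": source.feed_urls(),
        "article_urls": [article.url for article in source.articles],
    }
```

Calling describe_source(build_source("http://cnn.com")) would return the extracted fields for that site; the actual contents depend on the site at the time of the request.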

Features

  • Multi-threaded article download framework
  • News URL identification
  • Text extraction from HTML
  • Top image extraction from HTML
  • All image extraction from HTML
  • Keyword extraction from text
  • Summary extraction from text
  • Author extraction from text
  • Google trending terms extraction
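
The text, image, author, keyword, and summary features listed above can be sketched for a single article as follows. This is a minimal sketch, assuming newspaper3k is installed and the NLTK tokenizer data needed by nlp() is available; fetch_article and article_fields are hypothetical helper names, while Article, download(), parse(), nlp(), and the attributes read below come from the library. Multi-threaded downloading and trending terms are not covered here.

```python
def fetch_article(url, language="en", run_nlp=True):
    """Download and parse one article (hypothetical helper, uses the network)."""
    # Imported lazily so this module loads even where newspaper3k is absent.
    from newspaper import Article

    article = Article(url, language=language)
    article.download()  # fetch the HTML
    article.parse()     # extract text, images, and authors
    if run_nlp:
        article.nlp()   # fill in keywords and summary
    return article


def article_fields(article):
    """Map the listed features to the attributes of a parsed article."""
    return {
        "text": article.text,
        "top_image": article.top_image,
        "images": sorted(article.images),
        "authors": list(article.authors),
        "keywords": list(article.keywords),
        "summary": article.summary,
    }
```

Keeping the network step and the field collection separate means article_fields can be exercised against any object that exposes the same attributes.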

License

MIT License

Additional Project Details

Operating Systems

Mac

Programming Language

Python

Related Categories

Python MARC and Book Library Metadata, Python Metadata Editors

Registered

2021-05-26