
Baidu blocks Google, Bing from scraping content amid demand for data used on AI projects


Pivot 5: 5 stories. 5 minutes a day. 5 days a week.

The Daily Newsletter for Intellectually Curious Readers

  • We scour 100+ sources daily

  • Read by CEOs, scientists, business owners and more

  • 3.5 million subscribers

1. Baidu blocks Google, Bing from scraping content amid demand for data used on AI projects

Chinese internet search giant Baidu has blocked the Google and Bing search engines from scraping content from Baidu Baike, its Wikipedia-style service. A recent update to Baidu Baike's robots.txt file outright blocks the Googlebot and Bingbot crawlers from indexing content on the Chinese platform. The change follows a similar step in July by US social news aggregation platform Reddit, which blocked various search engines, except Google, from indexing its online posts and discussions.

The move reflects Beijing-based Baidu's increased effort to safeguard its online assets as demand grows for the vast troves of data used to train and build artificial intelligence models and applications. Before the update, Baidu Baike had still allowed Google and Bing to browse and index its online repository of nearly 30 million entries.
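For readers curious how a robots.txt block of this kind works, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The rules and the `example.com` URL are hypothetical and written only for illustration; they are not the actual contents of Baidu Baike's robots.txt, and `can_crawl` is a helper name invented for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real Baidu Baike file.
# Two named crawlers are shut out of the whole site, everyone else is allowed.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: Bingbot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/item/123"  # placeholder URL
    for agent in ("Googlebot", "Bingbot", "ExampleBot"):
        print(agent, can_crawl(HYPOTHETICAL_ROBOTS_TXT, agent, page))
```

Under these assumed rules, the check comes back negative for Googlebot and Bingbot and positive for any other crawler name. Note that robots.txt is a voluntary convention: it tells well-behaved crawlers what the site owner wants, but it does not technically prevent access.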

Read the full story here

2. German AI Startup Aleph Alpha Launches Pharia-1-LLM Model Family

German AI startup Aleph Alpha has announced the release of its latest foundation model family, Pharia-1-LLM, featuring Pharia-1-LLM-7B-control and Pharia-1-LLM-7B-control-aligned.

These models are now publicly available under the Open Aleph License, which permits non-commercial research and educational use. Pharia-1-LLM-7B-control is designed to produce concise, length-controlled responses.

Read the full story here

3. Grok-2 gets a speed bump after developers rewrite code in three days

Elon Musk's xAI has released its Grok-2 large language model chatbot, available for an $8 monthly subscription on X. Both Grok-2 and Grok-2 mini have become faster and more accurate at analysis after two developers rewrote the inference code stack using SGLang, an open-source system for executing complex language model programs.

SGLang can achieve up to 6.4 times higher throughput than existing systems. It supports many models, including Llama, Mistral, and LLaVA, and is compatible with both open-weight models and API-based models such as OpenAI's GPT-4. The main Grok-2 model has secured the #2 spot on the third-party LMSYS Chatbot Arena leaderboard, based on 6,686 votes.

Read the full story here

4. Penny-a-minute voice bots help India startups vie with OpenAI


Earlier this month, executives from Alphabet Inc.’s Google DeepMind, Microsoft Corp. and Meta Platforms Inc. joined tech founders in Bangalore to watch one of India’s top AI startups unveil a new product that might change how the world’s most populous country uses the technology.

Sarvam AI, often described as India’s OpenAI, introduced software for businesses that can interact with customers using spoken voice rather than just text.

Read the full story here

5. OpenAI supports California AI bill requiring 'watermarking' of synthetic content

OpenAI, the developer of ChatGPT, is supporting a California bill that would require tech companies to label AI-generated content, which can range from harmless memes to deepfakes spreading misinformation about political candidates. The bill, AB 3211, has been overshadowed by SB 1047, which mandates that AI developers conduct safety testing on their own models.

OpenAI believes transparency and provenance requirements, such as watermarking, are important for AI-generated content, especially in an election year. With countries representing a third of the world's population holding elections this year, experts are concerned about the role AI-generated content will play in them.

Read the full story here
