
Scraping NetEase Cloud Music Songs with Java and XPath

Date: 2023-08-25 17:11:44
Tags: Xpath, map, songId, Java, String, scraping, header, result, put

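This post uses Java and XPath to scrape songs from NetEase Cloud Music (music.163.com) and download them to the local disk.
The code is for personal study only. Suggestions are welcome.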

1. Add the dependencies

        <dependency>
            <groupId>cn.wanghaomiao</groupId>
            <artifactId>JsoupXpath</artifactId>
            <version>2.2</version>
        </dependency>
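        <dependency>
            <groupId>cn.hutool</groupId>
            <artifactId>hutool-all</artifactId>
            <version>5.8.9</version>
        </dependency>
        <!-- Needed for com.alibaba.fastjson.JSONObject in the lyrics code (step 3).
             The original post does not list it; the version below is an assumption. -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.83</version>
        </dependency>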

2. Get the song IDs and URLs

    /**
     * Fetch the song list of a playlist or artist page.
     *
     * @param url page URL, for example https://music.163.com/#/playlist?id=3778678
     * @return map holding "title" (playlist or artist name) and "songs" (one map per song)
     */
    public Map<String, Object> getMusicInfo(String url) {
        Map<String, Object> result = new HashMap<>();
        // A "#" fragment is never sent to the server, so drop "/#" to request the page itself
        url = url.replace("/#", "");
        Map<String, String> header = new HashMap<>();
        header.put("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36");
        header.put("Referer", "https://music.163.com/");
        header.put("Host", "music.163.com");
        // HttpUtil.get(url, map) treats the map as query parameters, so attach the headers to the request explicitly
        String res = HttpUtil.createGet(url).headerMap(header, true).execute().body();

        JXDocument jxDocument = JXDocument.create(res);
        // Song links
        List<JXNode> songs = jxDocument.selN("//ul[@class=\"f-hide\"]/li/a");
        // Playlist name
        JXNode jxsonglistName = jxDocument.selNOne("//h2[contains(@class,\"f-ff2\")]/text()");
        // Artist name
        JXNode jxsingerName = jxDocument.selNOne("//h2[@id=\"artist-name\"]/text()");
        String songlistName = null != jxsonglistName ? jxsonglistName.toString() : "";
        String singerName = null != jxsingerName ? jxsingerName.toString() : "";
        // Use the playlist name when present, otherwise the artist name
        String title = StrUtil.isBlank(songlistName) ? singerName : songlistName;

        System.out.println(String.format("=======================%s=======================", title));
        List<Map<String, Object>> musics = new ArrayList<>();
        result.put("title", title);
        for (JXNode song : songs) {
            Element element = song.asElement();
            String songName = element.text();
            // The href is expected to look like "/song?id=<songId>"; skip links without an id
            String[] hrefParts = element.attr("href").split("=");
            if (hrefParts.length < 2) {
                continue;
            }
            String songId = hrefParts[1];
            // OUT_LINK is the download URL prefix declared in the complete code in section 4
            String songUrl = OUT_LINK + songId;
            Map<String, Object> map = new HashMap<>();
            map.put("songId", songId);
            map.put("songName", songName);
            map.put("songUrl", songUrl);
            map.put("title", title);
            //map.put("lyric", getMusicLyric(songId));
            musics.add(map);
            // Single-threaded download
            //downloadSong(songName, songUrl, title);
        }
        musics.forEach(System.out::println);

        // Multithreaded download
        //multiDownload(musics);
        result.put("songs", musics);
        //System.out.println(result);
        return result;
    }
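
The listing above takes the song ID by splitting the `href` on `=`, which only works while the link has exactly one query parameter. As a minimal sketch of a more defensive alternative, the helper below pulls the `id` parameter out with a regular expression. `SongIdParser` and `extractId` are hypothetical names introduced here for illustration; they are not part of the original code, and the sample `href` values are assumptions about what the page contains.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original post.
class SongIdParser {
    // Matches a numeric "id" query parameter wherever it appears in the query string.
    private static final Pattern ID_PARAM = Pattern.compile("[?&]id=(\\d+)");

    /** Returns the numeric id from an href such as "/song?id=186016", or empty if there is none. */
    static Optional<String> extractId(String href) {
        if (href == null) {
            return Optional.empty();
        }
        Matcher matcher = ID_PARAM.matcher(href);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractId("/song?id=186016"));        // Optional[186016]
        System.out.println(extractId("/song?from=a&id=186016")); // Optional[186016]
        System.out.println(extractId("/song"));                  // Optional.empty
    }
}
```

Returning an `Optional` lets the caller skip a malformed link instead of failing with an `ArrayIndexOutOfBoundsException`.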

3. Get the lyrics

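    /**
     * Fetch the lyrics of one song.
     *
     * @param songId song ID
     * @return the lyric text, or an empty string if the response carries no "lrc" object
     */
    public String getMusicLyric(String songId) {
        String url = String.format("http://music.163.com/api/song/lyric?id=%s&lv=-1&kv=-1&tv=-1", songId);
        Map<String, String> header = new HashMap<>();
        header.put("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36");
        header.put("Referer", "https://music.163.com/");
        header.put("Host", "music.163.com");
        // HttpUtil.get(url, map) treats the map as query parameters, so attach the headers to the request explicitly
        String res = HttpUtil.createGet(url).headerMap(header, true).execute().body();
        // Guard against responses without lyrics to avoid a NullPointerException
        JSONObject lrc = JSONObject.parseObject(res).getJSONObject("lrc");
        return lrc == null ? "" : lrc.getString("lyric");
    }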
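
The post does not show what the returned `lyric` string looks like. Assuming it is in the common LRC layout, where each line starts with one or more time tags such as `[00:12.34]`, the sketch below strips those tags to leave plain text. `LrcCleaner` and `plainLines` are hypothetical names, and the LRC assumption should be checked against a real response before relying on this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original post.
class LrcCleaner {
    // One or more leading time tags, e.g. [00:12.34] or [01:02.345] (assumed LRC format).
    private static final Pattern TIME_TAGS =
            Pattern.compile("^(\\[\\d{1,2}:\\d{2}(?:[.:]\\d{1,3})?\\])+");

    /** Returns the lyric lines with leading time tags removed; empty lines are dropped. */
    static List<String> plainLines(String lrc) {
        List<String> lines = new ArrayList<>();
        if (lrc == null) {
            return lines;
        }
        for (String line : lrc.split("\\r?\\n")) {
            String text = TIME_TAGS.matcher(line).replaceFirst("").trim();
            if (!text.isEmpty()) {
                lines.add(text);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String sample = "[00:00.00]First line\n[00:05.12][00:20.50]Second line\n[00:10.00]\n";
        System.out.println(plainLines(sample)); // [First line, Second line]
    }
}
```

Lines that carry other bracketed tags (for example metadata tags) do not match the pattern and are kept unchanged.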

4. Complete code

The complete class, including the multithreaded download code:

import cn.hutool.core.io.FileUtil;
import cn.hutool.core.lang.Console;
import cn.hutool.core.util.StrUtil;
import cn.hutool.http.HttpUtil;
import com.alibaba.fastjson.JSONObject;
import org.jsoup.nodes.Element;
import org.seimicrawler.xpath.JXDocument;
import org.seimicrawler.xpath.JXNode;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Music163 {
    // Download URL prefix; the song ID is appended to it
    private static final String OUT_LINK = "http://music.163.com/song/media/outer/url?id=";
    // Local download directory
    private static final String DOWNLOAD_PATH = "E:\\music\\";

    public static void main(String[] args) {
        String musicUrl;
        // Playlists
        // Hot songs 3778678, originals 2884035, new songs 3779629, rising 19723756
        musicUrl = "https://music.163.com/#/playlist?id=3778678";
        // Artist chart, 8325 -> 梁静茹
        //musicUrl = "https://music.163.com/#/artist?id=8325";
        // Search results
        // musicUrl = "https://music.163.com/#/search/m/?order=hot&cat=全部&limit=435&offset=435&s=梁静茹";
        Music163 music163 = new Music163();
        music163.getMusicInfo(musicUrl);
    }

    /**
     * Fetch the song list of a playlist or artist page.
     *
     * @param url page URL, for example https://music.163.com/#/playlist?id=3778678
     * @return map holding "title" (playlist or artist name) and "songs" (one map per song)
     */
    public Map<String, Object> getMusicInfo(String url) {
        Map<String, Object> result = new HashMap<>();
        // A "#" fragment is never sent to the server, so drop "/#" to request the page itself
        url = url.replace("/#", "");
        Map<String, String> header = new HashMap<>();
        header.put("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36");
        header.put("Referer", "https://music.163.com/");
        header.put("Host", "music.163.com");
        // HttpUtil.get(url, map) treats the map as query parameters, so attach the headers to the request explicitly
        String res = HttpUtil.createGet(url).headerMap(header, true).execute().body();

        JXDocument jxDocument = JXDocument.create(res);
        // Song links
        List<JXNode> songs = jxDocument.selN("//ul[@class=\"f-hide\"]/li/a");
        // Playlist name
        JXNode jxsonglistName = jxDocument.selNOne("//h2[contains(@class,\"f-ff2\")]/text()");
        // Artist name
        JXNode jxsingerName = jxDocument.selNOne("//h2[@id=\"artist-name\"]/text()");
        String songlistName = null != jxsonglistName ? jxsonglistName.toString() : "";
        String singerName = null != jxsingerName ? jxsingerName.toString() : "";
        // Use the playlist name when present, otherwise the artist name
        String title = StrUtil.isBlank(songlistName) ? singerName : songlistName;

        System.out.println(String.format("=======================%s=======================", title));
        List<Map<String, Object>> musics = new ArrayList<>();
        result.put("title", title);
        for (JXNode song : songs) {
            Element element = song.asElement();
            String songName = element.text();
            // The href is expected to look like "/song?id=<songId>"; skip links without an id
            String[] hrefParts = element.attr("href").split("=");
            if (hrefParts.length < 2) {
                continue;
            }
            String songId = hrefParts[1];
            String songUrl = OUT_LINK + songId;
            Map<String, Object> map = new HashMap<>();
            map.put("songId", songId);
            map.put("songName", songName);
            map.put("songUrl", songUrl);
            map.put("title", title);
            //map.put("lyric", getMusicLyric(songId));
            musics.add(map);
            // Single-threaded download
            //downloadSong(songName, songUrl, title);
        }
        musics.forEach(System.out::println);

        // Multithreaded download
        //multiDownload(musics);
        result.put("songs", musics);
        //System.out.println(result);
        return result;
    }

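    /**
     * Fetch the lyrics of one song.
     *
     * @param songId song ID
     * @return the lyric text, or an empty string if the response carries no "lrc" object
     */
    public String getMusicLyric(String songId) {
        String url = String.format("http://music.163.com/api/song/lyric?id=%s&lv=-1&kv=-1&tv=-1", songId);
        Map<String, String> header = new HashMap<>();
        header.put("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36");
        header.put("Referer", "https://music.163.com/");
        header.put("Host", "music.163.com");
        // HttpUtil.get(url, map) treats the map as query parameters, so attach the headers to the request explicitly
        String res = HttpUtil.createGet(url).headerMap(header, true).execute().body();
        // Guard against responses without lyrics to avoid a NullPointerException
        JSONObject lrc = JSONObject.parseObject(res).getJSONObject("lrc");
        return lrc == null ? "" : lrc.getString("lyric");
    }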

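    /**
     * Download one song into DOWNLOAD_PATH/title/songName.mp3.
     *
     * @param songName song name, used as the file name
     * @param songUrl  download URL
     * @param title    playlist or artist name, used as the folder name
     */
    public void downloadSong(String songName, String songUrl, String title) {
        // Strip characters that are not allowed in Windows file names (\ / : * ? " < > |)
        HttpUtil.downloadFile(songUrl,
                FileUtil.file(DOWNLOAD_PATH + FileUtil.cleanInvalid(title), FileUtil.cleanInvalid(songName) + ".mp3"));
        Console.log("Download finished ==> {}", songName);
    }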

    /**
     * Download the songs on several threads.
     *
     * @param list song maps as built by getMusicInfo
     */
    public void multiDownload(List<Map<String, Object>> list){
        // Use 10 threads to speed up the downloads; a single thread is enough for short lists
        int threadNum = 10;
        if (list.size() < 10) {
            threadNum = 1;
        }
        ExecutorService executorService = Executors.newFixedThreadPool(threadNum);
        CountDownLatch countDownLatch = new CountDownLatch(threadNum);

        // Split the list into threadNum consecutive slices; the last slice takes the remainder
        int perSize = list.size() / threadNum;
        for (int i = 0; i < threadNum; i++) {
            int start = i * perSize;
            int end = (i + 1) * perSize;
            if (i == threadNum - 1) {
                end = list.size();
            }
            List<Map<String, Object>> maps = list.subList(start, end);
            MultiThread thread = new MultiThread();
            thread.setProjectList(maps);
            thread.setCountDownLatch(countDownLatch);
            executorService.submit(thread);
        }
        try {
            // Wait until every slice has finished
            countDownLatch.await();
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can see the interruption
            Thread.currentThread().interrupt();
            e.printStackTrace();
        }
        executorService.shutdown();
    }

    // A task run by the thread pool, so it implements Runnable rather than extending Thread
    class MultiThread implements Runnable {
        // The slice of songs this task downloads
        private List<Map<String, Object>> projectList;

        private CountDownLatch countDownLatch;

        public void setProjectList(List<Map<String, Object>> projectList) {
            this.projectList = projectList;
        }

        public void setCountDownLatch(CountDownLatch countDownLatch) {
            this.countDownLatch = countDownLatch;
        }

        @Override
        public void run() {
            try {
                for (Map<String, Object> map : projectList) {
                    downloadSong((String)map.get("songName"),(String)map.get("songUrl"),(String)map.get("title"));
                }
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                // Always count down, even on failure, so multiDownload does not wait forever
                if (countDownLatch != null) {
                    countDownLatch.countDown();
                }
            }
        }
    }
}
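
`multiDownload` above cuts the list into fixed slices, one per thread, so a slice full of slow downloads can leave the other threads idle, and one exception ends that whole slice. A possible alternative, sketched here with only the JDK, is to submit one task per song and let the pool balance the work. `PerSongPool`, `runAll`, and the `worker` function are hypothetical names for illustration; in the class above the worker would wrap a call to `downloadSong`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of the original post.
class PerSongPool {
    /** Runs worker once per item on a fixed-size pool and returns how many calls returned true. */
    static <T> int runAll(List<T> items, int threads, Function<T, Boolean> worker) {
        if (items.isEmpty()) {
            return 0;
        }
        int poolSize = Math.max(1, Math.min(threads, items.size()));
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        int succeeded = 0;
        try {
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (T item : items) {
                tasks.add(() -> worker.apply(item));
            }
            // invokeAll blocks until every task has completed, so no CountDownLatch is needed
            for (Future<Boolean> future : pool.invokeAll(tasks)) {
                try {
                    if (Boolean.TRUE.equals(future.get())) {
                        succeeded++;
                    }
                } catch (ExecutionException e) {
                    // A failure in one task does not stop the others
                    System.err.println("Task failed: " + e.getCause());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return succeeded;
    }

    public static void main(String[] args) {
        List<String> songs = Arrays.asList("song-a", "song-b", "song-c");
        // Stand-in worker: a real one would download the song and return true on success
        int ok = runAll(songs, 2, name -> !name.isEmpty());
        System.out.println(ok + " of " + songs.size() + " tasks succeeded");
    }
}
```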

From: https://www.cnblogs.com/995i996/p/17657468.html
